MySQL data sharding using Spock Proxy

August 12th, 2008

Yesterday at the Silicon Valley MySQL Meetup, Frank of Spock.com talked about Spock Proxy. Spock Proxy is a fork of MySQL Proxy, built to meet the data sharding needs of Spock.com, the people search engine.

Here are some highlights:

  • Spock.com’s web interface is built on Rails, and they use ActiveRecord as their O-R layer for MySQL data access
  • Spock has around 1,000 web servers using Rails and they connect to MySQL slaves and masters using Spock Proxy
  • Spock Proxy acts like a normal MySQL engine, except that it transparently talks to other MySQL servers. At Spock they use 4 masters and 4 slaves, each with its own Spock Proxy.
  • The Web servers each have one connection open to the Spock Proxy while the proxy may have 100s of pooled connections
  • The proxy tokenizes a SQL statement and figures out the target shard for the query. The query must have a shard_key. The shard_key is stored in a Universal DB, which holds the dictionary of the partitioned tables, the shard hostname/user/password, the ranges, and the range for auto_incremented columns
  • It currently supports only range-based partitioning. A lot of partitioning elsewhere is done by hashing, but adding that should not be a big deal
  • The current alpha version is very much suited to Spock’s internal needs, but I’m sure people will take it up and generalize it
  • Unsupported query constructs (like inner queries, group by, multi-table joins) may not throw exceptions. DDLs are also not supported
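
The routing step in the highlights above can be sketched roughly as follows. This is only an illustration of range-based shard lookup in Python, not Spock Proxy's actual code: the `ShardRange` type, the `find_shard` helper, the host names, and the ranges are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardRange:
    """One row of a hypothetical shard dictionary (all names are assumptions)."""
    low: int    # inclusive lower bound of shard_key
    high: int   # exclusive upper bound of shard_key
    host: str   # hypothetical shard hostname


# Hypothetical dictionary, kept sorted by lower bound.
SHARDS = [
    ShardRange(0, 1_000_000, "shard0.example.com"),
    ShardRange(1_000_000, 2_000_000, "shard1.example.com"),
    ShardRange(2_000_000, 3_000_000, "shard2.example.com"),
]
LOWS = [s.low for s in SHARDS]


def find_shard(shard_key: int) -> ShardRange:
    """Return the shard whose [low, high) range contains shard_key."""
    # Binary search for the last range starting at or below shard_key.
    idx = bisect_right(LOWS, shard_key) - 1
    if idx < 0 or shard_key >= SHARDS[idx].high:
        raise KeyError(f"no shard covers shard_key={shard_key}")
    return SHARDS[idx]
```

A hash-based scheme would presumably only need a different `find_shard`, which fits the remark that the change should not be a big deal.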

 

Java Technologies at Yahoo!

August 5th, 2008

Yesterday, I attended a talk at SDForum presented by Dean Yu and Joshua Blatt of the Java platform team at Yahoo! The Java platform team centralizes the Java efforts for Yahoo’s non-open source efforts. I say non-open source as the platform team covers everything except things like Hadoop, etc. which are in the public domain.

Java as a technology is not native to Yahoo! The platform at Yahoo! was primarily C/C++, with PHP at the frontend (mostly). Java came in through several acquisitions that were running a Java stack, notably:

  • 1998 Classic Games, Sportasy
  • 2002 Hotjobs
  • 2003 Overture (Altavista)
  • 2004 Kelkoo, Musicmatch
     

Here are the raw bytes from the session:

  • Tomcat + JBoss, and efforts for securing them
  • Mostly LAMP stack at Yahoo!
  • Rate limiting using Apache modules 
  • Runs Apache in multi-process mode
  • Y! data streams for keeping application specific stores and pushing data around (Yahoo’s proprietary message bus like implementation)
  • Integration using JNI to C++ code, using SWIG for wrapper generation
  • All security related code is in C++; helps maintain a single language code-base. Hence, wide JNI use from app tier
  • Uses IPC Bridge for coarse grained calls to non-thread safe libraries (JNI has multi-threading issues)
  • Group dedicated to creating JNI wrappers of native code
  • JNI performance FUD
  • Java to native C++ code via JNI takes < 20 nanoseconds (Cool!), compared with < 1 nanosecond for Java to Java. A big difference, but still nanoseconds, compared to network latencies of seconds
  • String functions to native code via JNI take > 3ms because of UTF-16 to UTF-8 character conversion
  • JNI multi-threading issues are solved by the IPC bridge, using shared memory and TCP over loopback
  • jsvc (Apache Commons Daemon) for loading privileged data during Tomcat startup and then running in low-privilege mode
  • Like multi-process Apache, a new architecture for multi-process Tomcat is being baked
  • Software project management using Maven (Maven — awww!)
  • Automatic builds using CruiseControl and Hudson
  • RPM-based software deployment to 100s of nodes
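
To put the JNI numbers quoted above in perspective, here is a back-of-the-envelope calculation in Python. The call costs are the upper and lower bounds mentioned in the talk; the 1 ms network round trip is purely an assumption for illustration, and the helper name `calls_per_interval` is made up.

```python
# Figures quoted in the talk (bounds, not measurements of mine).
JNI_CALL_NS = 20              # Java -> native C++ via JNI, upper bound
JAVA_CALL_NS = 1              # Java -> Java, upper bound
JNI_STRING_CALL_NS = 3_000_000  # string call via JNI, lower bound (3 ms)

# Assumption for illustration only: a 1 ms network round trip.
NETWORK_RTT_NS = 1_000_000


def calls_per_interval(call_ns: int, interval_ns: int = NETWORK_RTT_NS) -> int:
    """How many back-to-back calls of cost call_ns fit into interval_ns."""
    return interval_ns // call_ns


plain_jni_calls = calls_per_interval(JNI_CALL_NS)
java_calls = calls_per_interval(JAVA_CALL_NS)
# How many plain JNI calls cost as much as one string-converting call.
string_penalty = JNI_STRING_CALL_NS // JNI_CALL_NS
```

Under these assumptions, 50,000 plain JNI calls fit into one round trip, while a single string-converting call costs as much as 150,000 plain ones. That is why the string conversion, rather than JNI itself, looks like the thing to worry about.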

 

 

Twitter should count out @replies and @user from status text

July 17th, 2008

Twitter messages are limited to 140 bytes (not characters, if you are a multi-byte speaker!). However, a lot of messages now carry usernames, either for @replies or for simply referring to @user in the message. As the Twitter userbase grows, people will start running out of shorter names like @t, @ev or @1ndus and eventually go the email route: having_my_long_name@emailhost.com.

The day is not far when Twitter screen names will look like @mylongname2008. This one takes about 10% of the 140 available.

At a minimum, Twitter should count out the @replies and @user from the 140 characters and make that part of the metadata.

The API can handle this transparently. It just requires adding a new field called to-user-screen-name in the API. The API already has all the information for the sender ids, sender screen names, reply-to-user-id, user-id, etc.
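
As a rough illustration of the proposal, the counting could look something like the Python sketch below. Nothing here is Twitter's actual API: the `split_status` and `fits` helpers and the mention pattern are made up, and the limit is counted in UTF-8 bytes only because that is how the limit is described above.

```python
import re

LIMIT = 140  # limit described above, counted here in UTF-8 bytes (assumption)

# "@" followed by word characters, but not when preceded by a word character,
# so that email addresses such as name@emailhost.com are left alone.
MENTION = re.compile(r"(?<!\w)@\w+")


def split_status(text: str) -> tuple[list[str], str]:
    """Split a status into mentioned screen names (metadata) and counted text."""
    mentions = [m[1:] for m in MENTION.findall(text)]
    # Drop the mentions and collapse the whitespace they leave behind.
    counted_text = " ".join(MENTION.sub(" ", text).split())
    return mentions, counted_text


def fits(text: str, limit: int = LIMIT) -> bool:
    """True if the status fits once mentions are counted out."""
    _, counted_text = split_status(text)
    return len(counted_text.encode("utf-8")) <= limit
```

With this, a status of "@mylongname2008 " followed by 140 more bytes of text would still fit, and the screen names would travel in a separate field such as the suggested to-user-screen-name.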

 

WordPress inching towards full CMS capabilities

July 14th, 2008

Matt announced WordPress 2.6. Features include:

  • Version Control: Wiki-like tracking of edits
  • Google Gears compatibility
  • Theme previews — was much needed for experimentation!
  • Plugin update notification bubble
  • SSL Support and other security enhancements
  • Word count
  • Easter egg (Matt has quashed the rumours)

 

Afghanistan’s hidden treasures

July 5th, 2008

The “caretakers” of Afghanistan’s precious antiquities from the ancient era concealed the treasures from the Soviets, and then from the Taliban. These were feared lost; with the help of the National Geographic Society and Afghanistan’s National Museum, the unearthed trove reveals Afghanistan to be a melting pot and major trading hub, where people from the “east” brought muslin, spices, and ivory while people from the west brought exotic minerals, gems, and tools.

While reading the recent article on this discovery, I found a stunning picture of Ganga, the river goddess, carved out in ivory.

River goddess ganga

See the original photos and story at NGM. The treasure is going to be on display at the Asian Art Museum of San Francisco from October 24, 2008, to January 25, 2009.

Barack Obama’s lucky charm: Carries a miniature god Hanuman in his pocket

June 9th, 2008

Time magazine’s photo blog has a very interesting picture where Barack Obama is displaying the things he carries in his pocket to bring him luck. One of them is a tiny metal statue of the Hindu god Hanuman.

What caught my eye is that the tiny icon (or murti) did not look like one of the god Hanuman, as it has 4 hands: one holding a chakra, another a trishul or a gada, and the other two presumably a lotus and a conch. These 4 things are associated with Lord Vishnu (Lord Krishna is an incarnation of Lord Vishnu). However, the tiny statue has a tail (it also looks like the face of the statue was touching the palm while it was being shown to the reporters). I am sending this image to a professor at Delhi University for further analysis.

Here is the slightly annotated version of the original picture.

chakra.jpg

Starbucks goes Free Wi-Fi

June 3rd, 2008

USA Today is reporting that Starbucks now has free Wi-Fi.

Thirsty for more business during the worst slump in its history, Starbucks will try to lure more customers by offering two hours of free AT&T Wi-Fi a day.
The Wi-Fi freebie will be available starting Tuesday to customers who purchase a minimum $5 reloadable Starbucks Card, register online for the Starbucks Rewards Card program, and use the card at least once a month. The two hours must be consecutive. New members also receive a voucher for a free drink.

I don’t need to stop at my friendly Panera Bread for a muffin and “a” free Wi-Fi with my order 🙂

[GoogleIO] OpenSocial Primer: What is OpenSocial

May 28th, 2008

Chris Schalk, Kevin Marks, Patrick Chanezon on stage at Google IO

Patrick’s high-level overview of OpenSocial

  1. Making the web better by making it social
  2. Jaiku’s Jyri Engestrom’s 5 rules for social networks: What is your object? What are your verbs? How can people share objects? What is the gift in the invitation? Are you charging the publishers?
  3. How do we socialize objects online without having to create yet another social network?
  4. Developer uses API to access the social objects, e.g. LinkedIn
  5. Problem is we have 100s of Social Networks hence the developer needs to learn 100s of different APIs for accessing social objects
  6. Hal Varian talks about network effects. He is the chief economist at Google. OpenSocial is an implementation of Ch. 8 from his book “Information Rules”
  7. OpenSocial Foundation created by Yahoo, Google, MySpace. Goal of the foundation is to keep the specification open.
  8. With OpenSocial you learn the programming model once, er, 80% once and 20% specific to the container
  9. iLike, Slide, Flixster, RockYou etc. are building OpenSocial compliant apps for Bebo, LinkedIn, hi5 etc.
  10. 275 million users are OpenSocial container ready

Chris Schalk on building OpenSocial Apps

  1. Client API in JavaScript, REST coming up
  2. JS API in three parts: a. People and Friends, b. Activities, c. Persistence
  3. JS function can be embedded in gadget running in an OpenSocial container
  4. JS Callback function for returned data
  5. Posting an activity works in a similar way: you post the activity and get a callback
  6. Persistence: not clear where the data persists. In the container, or in a Gears-like client?
  7. Server-side REST services: /people/{guid}/@all for getting a collection of all people connected to the user identified by {guid}. All part of the Shindig codebase; it does pagination, etc. REST looks more promising for business apps on OpenSocial, compared to JS, which could be for cool apps
  8. Server-side integration options: Google AppEngine, EC2
  9. Check out the Google IO code lab
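
For the REST side, here is a small Python sketch of how a client might build the /people/{guid}/@all URL mentioned above. The base URL and the `people_all_url` helper are hypothetical, and the `startIndex` and `count` paging parameter names are my assumption of how pagination is expressed, so check them against the container you are targeting.

```python
from urllib.parse import quote, urlencode


def people_all_url(base_url: str, guid: str,
                   start_index: int = 0, count: int = 20) -> str:
    """Build a URL for the collection of all people connected to guid.

    base_url is a hypothetical container endpoint; startIndex and count
    are assumed paging parameter names, not confirmed by the talk.
    """
    # Percent-encode the guid so it cannot break out of its path segment.
    path = f"/people/{quote(guid, safe='')}/@all"
    query = urlencode({"startIndex": start_index, "count": count})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Hypothetical container endpoint, for illustration only.
url = people_all_url("https://container.example.com/social/rest", "john.doe")
```

Fetching that URL (with whatever authentication the container requires) would then presumably return one page of the user's connections.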

Kevin Marks now

  1. Containers provide a social context
  2. OpenSocial separates app logic from Social Context
  3. An app sees user ids — the container makes them people
  4. Users understand the social contract of the containers
  5. Save apps and users from re-registration hell
  6. Containers don’t choose the users, users choose to join
  7. They grow thru homophily and affinity
  8. Network effect can bring unexpected userbases
  9. OpenSocial gets you to all their users
  10. Make your plan to localize. You’ll be surprised where the users are coming from
  11. Not just social networks. Social network sites, Personal dashboards, Personal CRM systems, Sites based around a Social Object
  12. The abstracted container concepts are Viewer + friends and Owner + friends. Owner and Viewer are defined by the Container. The application gets IDs and connections to other IDs
  13. The Owner may not be a person. It could be an organization or an object.
  14. Kinds of container: Social Object sites like imeem, flickr
  15. Kinds of container: CRM systems like Oracle CRM, Salesforce.com
  16. Kinds of container: any web site enabled by Google Friend Connect
  17. Container sites control policy: checking the environment, getting information (Viewer info may not be available, may need permission), spreading your application (sending messages to activity), monetization, and installation

Closing Remarks by Chris, Patrick

  1. Apache Shindig is open source software that allows you to host OpenSocial applications
  2. Heavy partner involvement
  3. Host within an hour’s worth of work
  4. Incubated at Apache
  5. Build process of OpenSocial apps automated through Maven (why not Ant?)
  6. SocialSite at Sun is an open source project that allows you to turn your web app into an OpenSocial container
  7. Leverages Shindig
  8. Built by Dave “Roller” Johnson of Sun.
  9. Complementary to Friend Connect

Endangered destinations

May 26th, 2008

  1. Great Barrier Reef. Tourism and fertilizer runoff are “decaying” the corals
  2. Mount Kilimanjaro, Africa’s tallest peak and a $1 billion per year money spinner for Tanzania’s tourism industry, will be without ice in less than 15 years
  3. Glacier National Park in Montana. In the next 50 years, without ice the word “Glacier” may need a replacement
  4. Galápagos Islands. Charles Darwin’s inspiration for his theory of evolution, they are being menaced by tourism and non-native species
  5. Arctic National Wildlife Refuge, Alaska. Global warming is melting the glaciers at exponential speed
  6. Venice! Flooding and rising sea levels are threatening this romantic vacation spot.

Original story

Twitter outage report

May 21st, 2008

Twitter outage report

Twitter was playing hide and seek with the l33t users of twitterland. Like Jack, I thought of having my own little fun with the outage. The above is a snapshot of the last 24 hours of remote monitoring of Twitter’s home page. The actual outage was much bigger; a lot of Twitter features were unavailable for a longer duration.