Yesterday at the Silicon Valley MySQL Meetup, Frank of Spock.com talked about Spock Proxy. Spock Proxy is a fork of MySQL Proxy, built to meet the data sharding needs of Spock.com, the people search engine.
Here are some highlights:
- Spock.com’s web interface is built on Rails, and they use ActiveRecord as their O-R layer for MySQL data access
- Spock has around 1,000 web servers using Rails and they connect to MySQL slaves and masters using Spock Proxy
- Spock Proxy acts like a normal MySQL engine, except that it transparently talks to other MySQL servers. At Spock they use 4 masters and 4 slaves, each with its own Spock Proxy.
- Each web server keeps one connection open to the Spock Proxy, while the proxy may hold hundreds of pooled connections
- The proxy tokenizes a SQL statement and figures out the target shard for the query. The query must have a shard_key. The shard_key is stored in a Universal DB, which holds the dictionary of the partitioned tables, the shard hostname/user/password, the shard ranges, and the range for auto_incremented columns
- It currently supports only range-based partitioning. A lot of partitioning elsewhere is done by hashing, but that should not be a big deal to change
- The current alpha version is very much suited to meet Spock’s internal needs, but I’m sure people will take this up to generalize
- Unsupported query constructs (such as inner queries, GROUP BY, and multi-table joins) may not throw exceptions. DDL statements are also not supported
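
To make the range-based routing described above more concrete, here is a minimal Python sketch of how a proxy might map a shard_key to a shard. It is not Spock Proxy's actual code: the `Shard` and `RangeDirectory` names, the shard names, and the key ranges are all hypothetical, and the in-memory list stands in for the dictionary kept in the Universal DB. A hash-based function is included only to show the alternative mentioned in the notes.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    """One entry of a hypothetical shard dictionary."""
    name: str
    low: int   # lowest shard_key served by this shard (inclusive)
    high: int  # highest shard_key served by this shard (inclusive)


class RangeDirectory:
    """Looks up the shard whose [low, high] range contains a shard_key."""

    def __init__(self, shards):
        # Sort by lower bound so a binary search can find the candidate shard.
        self._shards = sorted(shards, key=lambda s: s.low)
        self._lows = [s.low for s in self._shards]

    def shard_for(self, shard_key):
        # Index of the last shard whose lower bound is <= shard_key.
        i = bisect.bisect_right(self._lows, shard_key) - 1
        if i < 0 or shard_key > self._shards[i].high:
            raise KeyError(f"no shard covers shard_key={shard_key}")
        return self._shards[i]


def hash_shard_index(shard_key, shard_count):
    """Hash-based alternative: spread integer keys evenly over shard_count shards."""
    return shard_key % shard_count


# Assumed example data: two shards splitting keys 1..2,000,000.
directory = RangeDirectory([
    Shard("shard_a", 1, 1_000_000),
    Shard("shard_b", 1_000_001, 2_000_000),
])
```

With this data, `directory.shard_for(1_500_000)` returns the `shard_b` entry, and a key outside every range raises `KeyError`. One trade-off of the range approach is that adding a shard only means adding a new range to the dictionary, whereas changing `shard_count` in the modulo scheme remaps most existing keys.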