Yesterday, I attended a talk at SDForum presented by Dean Yu and Joshua Blatt of the Java platform team at Yahoo! The Java platform team centralizes Java work for Yahoo’s non-open-source projects. I say non-open-source because the platform team covers everything except projects like Hadoop, which are open source.
Java as a technology is not native to Yahoo! The platform there was primarily C/C++, with PHP on the frontend. Java came in through several acquisitions that were running Java stacks, notably:
- 1998 Classic Games, Sprtasy
- 2002 HotJobs
- 2003 Overture (including AltaVista)
- 2004 Kelkoo, Musicmatch
Here are the raw bytes from the session:
- Tomcat and JBoss, plus the work to secure them
- Mostly LAMP stack at Yahoo!
- Rate limiting using Apache modules
- Runs Apache in multi-process mode
- Y! data streams for keeping application-specific stores and pushing data around (Yahoo’s proprietary, message-bus-like implementation)
- Integration with C++ code via JNI, using SWIG to generate the wrappers
- All security-related code is in C++, which keeps it in a single-language codebase; hence the wide use of JNI from the app tier
- Uses an IPC bridge for coarse-grained calls to non-thread-safe libraries (JNI has multi-threading issues)
- A dedicated group creates JNI wrappers for native code
- JNI performance FUD
- Java to native C++ code via JNI: < 20 nanoseconds (cool!), compared with < 1 nanosecond for Java to Java. A big relative difference, but nanoseconds are negligible next to network latencies, which are many orders of magnitude larger
- String-handling calls into native code via JNI take > 3 ms because of UTF-16 to UTF-8 character conversion
- JNI multi-threading issues are solved by the IPC bridge, using shared memory and TCP over loopback
- jsvc (Apache Commons Daemon) loads privileged data during Tomcat startup and then runs in low-privilege mode
- Modeled on multi-process Apache, a new multi-process Tomcat architecture is in the works
- Software project management using Maven (Maven — awww!)
- Automated builds using CruiseControl and Hudson
- RPM-based software deployment to 100s of nodes
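To put the JNI overhead numbers from the session in perspective, here is a back-of-the-envelope calculation in Java. The 20 ns and 1 ns figures are the upper bounds quoted in the talk; the 1 ms network round trip is my own illustrative assumption, not a number from the session, and the class name is hypothetical.

```java
// Back-of-the-envelope comparison of JNI call overhead with network latency.
// JNI_CALL_NS and JAVA_CALL_NS use the upper bounds quoted in the talk;
// ASSUMED_NETWORK_RTT_NS (1 ms) is a hypothetical, illustrative value.
public final class JniLatencyBudget {

    static final long JNI_CALL_NS = 20L;                   // Java -> C++ via JNI, "< 20 ns"
    static final long JAVA_CALL_NS = 1L;                   // Java -> Java, "< 1 ns"
    static final long ASSUMED_NETWORK_RTT_NS = 1_000_000L; // assumption: 1 ms round trip

    /** How many calls costing callNs fit into one round trip of rttNs. */
    static long callsPerRoundTrip(long rttNs, long callNs) {
        return rttNs / callNs;
    }

    /** Rough extra time of n JNI calls versus n Java calls, treating the bounds as point estimates. */
    static long jniOverheadNs(long n) {
        return n * (JNI_CALL_NS - JAVA_CALL_NS);
    }

    public static void main(String[] args) {
        System.out.println("JNI calls per assumed 1 ms round trip: "
                + callsPerRoundTrip(ASSUMED_NETWORK_RTT_NS, JNI_CALL_NS));
        System.out.println("Extra cost of 1,000 JNI calls: "
                + jniOverheadNs(1_000L) + " ns");
    }
}
```

Under these assumptions, about 50,000 JNI crossings fit into a single round trip, and even 1,000 crossings add only about 19 µs, roughly 2% of the assumed network latency.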
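The string-conversion point can be seen without any native code at all. Java exposes strings as UTF-16 code units, while native code usually wants UTF-8 (strictly, JNI's GetStringUTFChars hands back "modified UTF-8", which differs from standard UTF-8 for NUL and supplementary characters). This sketch, with hypothetical class and helper names, uses String.getBytes to show that the conversion is a full re-encoding whose output length differs from the input. It also includes an assumed "encode once" cache for strings that cross the boundary repeatedly; that mitigation is my own illustration, not something described in the talk.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Shows why handing a java.lang.String to native code is not a plain memory copy:
// Java exposes strings as UTF-16 code units, so producing UTF-8 means re-encoding
// every character into a layout with a different length. The cache is a
// hypothetical "encode once" mitigation, not something described in the talk.
public final class JniStringCost {

    // Unbounded, for illustration only; a real cache would need eviction.
    private static final Map<String, byte[]> UTF8_CACHE = new ConcurrentHashMap<>();

    /** Length in UTF-16 code units, as reported by String.length(). */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Length in bytes after converting to standard UTF-8 (a fresh encoding pass on every call). */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Encodes on first use, then returns the same shared array; callers must not modify it. */
    static byte[] utf8Cached(String s) {
        return UTF8_CACHE.computeIfAbsent(s, k -> k.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "h\u00e9llo", "\u65e5\u672c"};
        for (String s : samples) {
            System.out.println(s + ": " + utf16Units(s) + " UTF-16 units, "
                    + utf8Length(s) + " UTF-8 bytes");
        }
    }
}
```

With these samples, "héllo" is 5 UTF-16 units but 6 UTF-8 bytes, and "日本" is 2 units but 6 bytes, so the native side can never simply reuse the JVM's buffer.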
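The IPC bridge bullets describe putting non-thread-safe native libraries behind a coarse-grained interface reached over shared memory or TCP over loopback. Here is a minimal single-JVM sketch of the TCP-over-loopback variant, under loud assumptions: LegacyLib is a hypothetical stand-in for a non-thread-safe native library, the one-line request/response protocol is invented for illustration, and in a real deployment the bridge would run in a separate process rather than a thread. What it shows is the design idea: a single owner thread serializes every call, so concurrent callers never enter the library at the same time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of a loopback "bridge": one thread owns a non-thread-safe library and
// serves coarse-grained requests over TCP on the loopback interface. LegacyLib and
// the one-line protocol are hypothetical, not Yahoo!'s actual design.
public final class LoopbackBridge implements AutoCloseable {

    // Hypothetical stand-in for a native library with shared mutable state (not thread-safe).
    static final class LegacyLib {
        private final StringBuilder scratch = new StringBuilder();

        String process(String input) {
            scratch.setLength(0);
            return scratch.append(input).reverse().toString();
        }
    }

    private final LegacyLib lib = new LegacyLib();
    private final ServerSocket server;

    LoopbackBridge() throws IOException {
        // Port 0 lets the OS pick a free port; bind to the loopback interface only.
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(this::serve, "bridge-worker");
        worker.setDaemon(true);
        worker.start();
    }

    int port() {
        return server.getLocalPort();
    }

    // The only thread that touches lib: connections are handled strictly one at a time.
    private void serve() {
        while (!server.isClosed()) {
            try (Socket s = server.accept();
                 BufferedReader in = reader(s);
                 PrintWriter out = writer(s)) {
                String request = in.readLine();
                if (request != null) {
                    out.println(lib.process(request));
                }
            } catch (IOException e) {
                // Client went away or the server was closed; the loop condition decides what happens next.
            }
        }
    }

    /** Client side: one request per connection, blocking until the bridge replies. */
    static String call(int port, String request) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = reader(s);
             PrintWriter out = writer(s)) {
            out.println(request);
            return in.readLine();
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so println() pushes the line onto the socket immediately.
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    @Override
    public void close() throws IOException {
        server.close();
    }

    public static void main(String[] args) throws Exception {
        try (LoopbackBridge bridge = new LoopbackBridge()) {
            ExecutorService callers = Executors.newFixedThreadPool(4);
            List<Future<String>> replies = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                String req = "request-" + i;
                replies.add(callers.submit(() -> call(bridge.port(), req)));
            }
            for (Future<String> f : replies) {
                System.out.println(f.get());
            }
            callers.shutdown();
        }
    }
}
```

Because each request carries a whole unit of work, the socket round trip is paid once per request rather than once per native function call, which is presumably why the bridge is reserved for coarse-grained calls.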