As I begin writing this, I realize I am in some way contributing to all the noise and novelty around "NoSQL". As tempting as rewriting your website to use Cassandra may be, the chances that an objective cost-benefit analysis will support your opinion are pretty small when your project is out to make money. Pragmatism is the key to using any tool to its fullest and most appropriate extent. You can still ride the "no wave" (with apologies to Lydia Lunch et al.) by implementing key-value stores piece by piece on your site. Focus on domains where their special optimizations can be used most efficiently. In this blog post, I'll be outlining Sesh, a wrapper around Zend_Session_Namespace which uses Redis to reduce the amount of RAM used within a session.
Scalability is a problem that never goes away. As terrible as premature optimization is, not having a plan for higher demand is, for the web developer, an even more pernicious evil. As a site attracts more and more users, the fundamental characteristics of traditional web development can start to feel cumbersome. In my opinion, that is why there's so much talk lately about the NoSQL movement. It's easier to blame the database, which is a slightly opaque but necessary evil to the traditional LAMP developer. Treating each database query as a detriment is a tempting fallacy: queries are easy to count, and sometimes hard for the average developer to optimize. As load increases on a site, improperly designed or scaled databases can reduce a site's responsiveness, but so can inefficient server-side code.
Sure, fast key-value stores are great. In fact, I'm going to spend a part of this post speaking in support of one. But that's no reason to dis your RDBMS. Their value is obvious. If your data doesn't have a lot of intricate, multi-faceted relationships, then a flat key-value store works. On the other hand, if MySQL is at the core of your website, it may be foolhardy to think you could rewrite the whole thing to use Cassandra in one weekend in anticipation of Twitter-like traffic. More than likely, spending a weekend to make sure your queries are properly optimized and your data is clean will actually improve your site. Let's face it; you're not going to get an overhaul like that done in the time you'd expect anyway.
Making sure you're not letting an alternate architecture become some kind of a 'grass is always greener' scenario is really the name of the game. Nothing is better at keeping you pragmatic than actually implementing new, exciting software on a small scale first, where it can really count.
I recently had to work on a problem that used a lot of session data to decide on whether or not to display something. Because it all addressed a particular feature, I wanted to keep everything in the same namespace. With the multitude of variables -- many somewhat similar in function -- associated with the display logic, it was becoming abundantly clear that I was going to need to be fairly verbose to be able to keep track of everything my code was doing. So of course, I was working with variables such as $myNamespace->flagClickedPageviews to differentiate from $myNamespace->defaultPageviews, etc. Because this feature could display anywhere on the site, any user that accessed the site would immediately have some of these variables stored in their session.
During a code review, a colleague of mine brought up an excellent point: these variables will be stored for each session, meaning that each byte will be multiplied by the number of users visiting the site on any given day. If this totals in the hundreds of thousands, then every single byte of a variable name costs hundreds of kilobytes of RAM across all sessions. With more people, or more session variables, this will add up quickly.
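To put rough numbers on that, here is a back-of-the-envelope sketch in PHP. The figure of 100,000 sessions and the helper name nameOverheadBytes are assumptions made up for illustration, and the estimate only counts the bytes of the attribute name itself, not the stored value or PHP's serialization overhead.

```php
<?php
// Rough estimate only: counts the bytes of the attribute name itself and
// ignores serialization overhead and the stored values.
// nameOverheadBytes() is a hypothetical helper, not part of any library.
function nameOverheadBytes(string $attributeName, int $sessions): int
{
    return strlen($attributeName) * $sessions;
}

$sessions = 100000; // assumed number of live sessions, for illustration only

$verbose = nameOverheadBytes('flagClickedPageviews', $sessions); // verbose name
$short   = nameOverheadBytes('a', $sessions);                    // one-byte name

printf(
    "verbose: %d bytes, short: %d bytes, saved: %d bytes\n",
    $verbose,
    $short,
    $verbose - $short
);
```

Under those assumptions, a single verbose name accounts for roughly two megabytes across all sessions, and that is before counting any of the other variables in the namespace.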
Since I wanted to write clean, legible, sensible code, shortening variable names would be problematic. Zend_Session_Namespace is an excellent tool to use in this scenario, but one shortcoming is that it stores exactly what you feed to it as an attribute within the session. Ideally, we would have the best of both worlds: verbose attribute naming with minimal drain on RAM.
- You will need to have Redis installed on your server.
- You will need to be using Rediska and at least the Zend_Session code library to use this off the shelf.
- By limiting each key to a single extended-ASCII character, the number of variables per namespace is theoretically limited to 255, since chr(0) is the null character. This has not been tested, however, so your feedback is totally welcome there.
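The single-character idea behind that last caveat can be sketched roughly as follows. This is not Sesh's actual code: the class name ShortKeyMap is hypothetical, and a plain PHP array stands in for both the Redis store that would hold the name-to-character map and the namespace's slice of the session. It only illustrates how verbose names could map to one-byte keys, starting at chr(1) so that chr(0) is never used.

```php
<?php
// Hypothetical sketch, not Sesh's real implementation. A plain array stands
// in for the Redis structure that would hold the name-to-character map.
final class ShortKeyMap
{
    /** @var array<string, string> verbose name => single-byte key */
    private array $map = [];

    public function keyFor(string $verboseName): string
    {
        if (!isset($this->map[$verboseName])) {
            $next = count($this->map) + 1; // start at 1 so chr(0) is never used
            if ($next > 255) {
                throw new OverflowException('No single-byte keys left in this namespace');
            }
            $this->map[$verboseName] = chr($next);
        }
        return $this->map[$verboseName];
    }
}

$map = new ShortKeyMap();
$session = []; // stand-in for the namespace's slice of the session

// Writes use the verbose name; only the one-byte key lands in the session.
$session[$map->keyFor('flagClickedPageviews')] = 3;
$session[$map->keyFor('defaultPageviews')] = 10;

// Reads go through the same map, so calling code keeps the verbose names.
echo $session[$map->keyFor('flagClickedPageviews')], "\n";
```

The point of the design is that the verbose names are stored once, in the shared map, instead of once per session, while the code that reads and writes the namespace stays just as legible.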