Whilst this is rather an advanced topic to start off this blog, it is very fresh in my mind still…
I recently had a play with Memcached, a RAM-based caching system used by large websites such as Facebook. Memcached lets you store data of up to 1MB in key-value pairs. (You can probably alter this limit in the settings; I haven't delved into that at this stage.) I found a number of guides out there but none worked seamlessly for me on Ubuntu Server 10.10 with PHP under Apache, so I am writing my own very brief guide here.
Before I do though, some important observations that I have made since I started playing with memcached:
- Memcache and Memcached are 2 different things. Even one of the Memcached guides mistakenly tried to use the Memcache class in their example code, causing it to fail - another of the reasons for writing my own guide. That, and all the Ubuntu guides seem to be written for Ubuntu 7!
- You set an expiry time (in seconds) for each key-value pair, much like a cookie I guess
Installing on Ubuntu
I am going to assume that you have installed php5, apache2 and libapache2-mod-php5 and set up a virtualhost such that you can visit yourdomain/index.php and receive a proper page.
Install memcached; you should see it also install its dependencies, libevent and libmemcached.
sudo apt-get install php5-memcached memcached
That *should* be all you need hopefully.
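If you want to check that the daemon and the PHP extension are talking to each other before going any further, a tiny test script like the following (assuming the default port of 11211) should print a success message:
<?php
$m=new Memcached();
$m->addServer('127.0.0.1',11211); //Default memcached port
$m->set('install_test','ok',10);
echo ($m->get('install_test')==='ok') ? "memcached is working" : "something went wrong";
?>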
Using Memcached
There are two main steps to using memcached. First, connect to a memcached server by creating a Memcached object and adding the server to it:
$m=new Memcached(); //Create a new memcached object in $m
$m->addServer('127.0.0.1',11211); //Define a memcached server,
//in this case localhost on the default port of 11211
You can then write data to the memcached server:
$m->set('key_here',$data,30); //Save the data in $data with the key 'key_here' with an expiry of 30 seconds in the future
You can then read the data back and check whether it exists or has expired:
$variable = $m->get('key_here');
if($variable){
//The data for 'key_here' exists!
}else{
//The data does not exist in the cache
}
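One small caveat with the check above: get() returns false both when a key is missing and when the stored value itself happens to be falsy (0, '', false, etc.). If that distinction matters for your data, the extension's result code can tell the two cases apart:
$value=$m->get('key_here');
if($value===false && $m->getResultCode()===Memcached::RES_NOTFOUND){
    //The key really isn't in the cache (never set, expired or evicted)
}else{
    //The key exists; $value may legitimately be a falsy value
}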
Finally, here is a test script I used which has a few more bells and whistles. It runs an arbitrarily slow routine to generate some data and then caches the result in memcached for 30 seconds. Try refreshing the page and compare the cached vs uncached load times. Of course this is a relatively contrived task designed to be very slow; if you're taking more than a few tens of milliseconds to process a page on an unloaded server, something is very wrong with your code, but it does show that the cache is incredibly fast. (The size of the data being generated and cached is shown in brackets on the output page.)
<?php
$start=microtime(true); //Start the page timer
$m=new Memcached();
$m->addServer('127.0.0.1',11211);
$cached=$m->get('test');
$life=30; //Cache lifetime in seconds
if(!$cached){
    echo "Generating arbitrary slow data and saving to cache...OK<br/>";
    //Build a deliberately slow string of ~20,000 characters
    $tmp="";
    for($i=1;$i<20000;$i++){
        $tmp.=substr(md5($tmp.rand(1,$i)),0,1);
    }
    //Quadruple it to make the cached payload a little bigger
    $tmp2="";
    for($i=1;$i<=4;$i++){
        $tmp2.=$tmp;
    }
    $tmp=$tmp2;
    $m->set('test',$tmp,$life);
    $m->set('age',microtime(true),$life); //Remember when it was cached
    $cached=$m->get('test');
    echo "Uncached read (".strlen($cached)."):<br/>";
}else{
    echo "Cached read (".strlen($cached)."):<br/>";
    $rem=round($life+$m->get('age')-microtime(true),1);
    echo "Cache life remaining: ".$rem." seconds<br/>";
}
$end=microtime(true);
$dur=$end-$start;
echo "Time elapsed: ".round($dur,6)." seconds.<br/>Data generated:<br/>";
echo $cached;
?>
How would I use Memcached?
Building a dynamic webpage has two main aspects:
- Pulling data from a MySQL database
- Putting the page together using pieces of HTML and logic in PHP
To scale better, at the first possible opportunity to uniquely and consistently identify a page, for example identifying a CMS page by its id, I could check whether I'd already cached it under a key with a name like cms_id (where id is the page id). If it's in the cache, great: load the entire cached page of HTML without any processing, database communication etc.! If it's not cached, call the function(s) that produce the page, but remember to save the HTML into the cache for next time!
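As a rough sketch of that pattern (the cms_ key prefix, the 600-second lifetime and the generate_page() function are all placeholders for whatever your own CMS uses):
$m=new Memcached();
$m->addServer('127.0.0.1',11211);
$key='cms_'.$page_id; //e.g. cms_42 for page 42
$html=$m->get($key);
if(!$html){
    //Not cached (or expired): build the page the slow way...
    $html=generate_page($page_id); //Placeholder for your page-building function(s)
    //...then cache the finished HTML for next time
    $m->set($key,$html,600);
}
echo $html;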
Cache consistency issues
It is very important, when using a strategy like the one I have just outlined, that the items in your cache stay consistent with your database. I suggest a two-pronged approach. First, every edit, addition or change to your site must invalidate the cache of any affected page(s). The quickest, easiest way to do this is to delete the cache entry for that page.
$m->delete('key_here');
It is not necessary to bother with all the code needed to update the cache when you alter a page, because the non-existence of a cached copy will force the next visit to that page (which will probably be the user who edited it anyway) to experience an uncached read, which in turn will save a fresh copy to the cache for the future.
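As a sketch of where that delete would live (save_page() and the cms_ key prefix are again placeholders):
function save_page($m,$db,$page_id,$new_content){
    //Write the edit to the database first (details depend on your schema)
    //$db->update(...);
    //Then throw away the stale cached copy; the next visitor will rebuild it
    $m->delete('cms_'.$page_id);
}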
The second prong is to set your expiry times sensibly. This really shouldn't be relied upon for keeping your cache healthy unless you are setting an unreasonably short lifetime, e.g. a few seconds. Still, if you're running a huge, rapidly changing site on a powerful server under massive load, you may well only need to cache for a few seconds to serve many, many hits!
Hopefully this has given you some ideas on how to start playing with Memcached. Since most of my sites are on shared hosting, I can't use Memcached with them yet, but as load grows I will be moving to a VPS, at which point this will be a very handy tool. I was also asked by a friend to help with the back-end server setup for a multiplayer online game he's creating. Memcached seems like a great way to let lots of players hit the server several times per second for updates on the whereabouts of other players, while only important actions get stored in MySQL. Database queries then only need to happen asynchronously every few seconds per player, when they do something worth recording (gaining points, killing each other, etc.), while their own and other players' locations are kept up to date in the cache.
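To make that idea a little more concrete, here is a minimal sketch of what each poll from the game client might trigger on the server (the key names, the 5-second lifetime and the record_event() helper are all made up for illustration):
//Each client reports its position several times per second; cache it cheaply
$m->set('pos_'.$player_id,json_encode(array('x'=>$x,'y'=>$y)),5);
//Other clients read their neighbours' positions straight from the cache
$other=json_decode($m->get('pos_'.$other_id),true);
//Only significant events go anywhere near MySQL
if($important_action){
    record_event($db,$player_id,$action); //Placeholder for a (possibly queued) database write
}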
Just tried this out on one of my websites on my test server. The site is purely dynamic, with every page generated on the fly. When a user visits a page, information from the database and a variety of scripts and templates generate the final output that the user sees. This means a lot of duplicated work gets performed as the rate at which pages are requested goes up, but has the advantage of everything always being up to date.
By the time a few includes have been loaded, several database queries performed and loads of page logic executed, we're looking at a 20ms page creation time. To test memcached, I created a unique id for each page based on its page id and checked for its presence in the cache. If it wasn't present, I ran my page generation script. A minor modification to the page script causes it to save all output to a global variable rather than echoing/printing it out; I then save this variable to the cache for next time (and, of course, print it out).
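If you don't fancy modifying your page scripts to build up a global variable, output buffering is another way to capture the generated HTML. A sketch of that approach (the page_ key prefix and render_page() are placeholders, not my actual script names):
$key='page_'.$page_id;
$html=$m->get($key);
if(!$html){
    ob_start(); //Capture everything the page script echoes
    render_page($page_id); //Placeholder for the existing page generation script
    $html=ob_get_clean();
    $m->set($key,$html,300); //Cache the finished page for next time
}
echo $html;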
The result: cached pages are loaded from the cache in under 2ms. This is more than 10x faster than generating the page from scratch.
How would I keep the cache fresh? I would simply tell all the scripts that edit pages to trash the cache of the page they edited at the time of saving an edit.
Unfortunately my production host does not support memcached but it was fun to try anyway.