TAKE A QUICK LOOK :
At Facebook, what we are basically trying to do is help people share more information with the people around them and the people they are connected to. The data sets and the data access patterns are quite different from those of other types of applications. To see why, consider email: all the data of a particular user can be stored in one place. On Facebook, by contrast, all of the different applications pull data from Facebook's servers based on how we interact with our different friends. So if user X searches for a person named Joe, he might get completely different results than if some other user Y searches for Joe.
Similarly, Joe's timeline could look completely different to X and to Y, depending on the security rules and privacy policies in place. Moreover, if X looks at his own news feed, it has to pull in different posts from his friends, and those could be completely different from the posts in Y's feed.
Hence, the data access pattern is different, and to make it work we need a fast cache and a good way to get fast access to this dynamic data.
MEMCACHED :
Memcached is an open-source, high-performance, distributed memory object caching system. It is generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
The memcached workflow can be explained as follows:
The application receives a query from the user or the application.
The application checks whether the data needed to satisfy that query is in memcached.
If the data is in memcached, the application uses that data.
If the data is not in memcached, the application queries the datastore and stores the results in memcached for future reference.
The pseudocode below represents a typical memcache request:
def get_data():
    # Try the cache first.
    cache_data = memcache.get('key')
    if cache_data is not None:
        return cache_data
    # Cache miss: query the datastore and cache the result for 60 seconds.
    cache_data = query_for_data()
    memcache.add('key', cache_data, 60)
    return cache_data
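Here the third argument to add() is the expiry in seconds, so a stale copy lives for at most a minute and the datastore is re-queried for that key at most roughly once per minute.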
The working of memcached has been explained above. As shown in the diagram, all of the application servers can talk to memcached. Memcached is fast and can handle a large number of socket connections. Most memcached client libraries also have built-in support for multiple servers: because memcached is just a key-value store, essentially a distributed hash table, the client can simply hash the key to decide which server a request should go to. And because it is only a cache, it is acceptable to lose some data, since it can easily be re-read from the database and written back to the cache.
You can also run a small memcached server on each of the host machines where your application servers reside, and each application server can communicate with all of the caches. On a cache miss, the application retrieves the data from its respective database server and writes it back into memcached.
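To make the hashing idea concrete, here is a minimal sketch of how a client might pick a server for a key, assuming a fixed, hypothetical list of server addresses (real client libraries handle this internally, and often use consistent hashing so that fewer keys move when a server is added or removed):

import hashlib

# Hypothetical list of memcached servers; a real deployment would read this from configuration.
SERVERS = ['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211']

def pick_server(key):
    # Hash the key and map the digest onto one of the servers.
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# pick_server('user:42:timeline') always returns the same server for this key,
# so every application server agrees on where that item lives.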
HOW CACHED DATA EXPIRES :
By default, values stored in memcached are retained as long as possible. Values may be evicted from the cache when a new value is added to the cache if the cache is low on memory. When values are evicted due to memory pressure, the least recently used values are evicted first.
Under rare circumstances, values may also disappear from the cache prior to expiration for reasons other than memory pressure. While memcached is resilient to server failures, memcached values are not saved to disk, so a service failure may cause values to become unavailable.
In general, an application should not expect a cached value to always be available.
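As a sketch of what this means in practice, the snippet below uses the python-memcached client (assuming a memcached instance on localhost and a hypothetical load_profile_from_db() helper) to store a value with a 60-second expiry and to handle the case where it has already disappeared:

import memcache  # python-memcached client

mc = memcache.Client(['127.0.0.1:11211'])

# Store a value that expires after 60 seconds.
mc.set('user:42:profile', {'name': 'joe'}, time=60)

# The value may vanish at any time (expiry, eviction, server restart),
# so always be prepared for a miss.
profile = mc.get('user:42:profile')
if profile is None:
    profile = load_profile_from_db(42)  # hypothetical fallback to the database
    mc.set('user:42:profile', profile, time=60)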
set(key, value [, expiry]): Sets the item associated with a key in the cache to the specified value. This either updates an existing item if the key already exists, or adds a new key/value pair if the key doesn’t exist. If the expiry time is specified, then the item expires (and is deleted) when the expiry time is reached.
get(key): Retrieves information from the cache. Returns the value associated with the key if the specified key exists. Returns NULL if it doesn’t exist.
add(key, value [, expiry]): Adds the key and associated value to the cache, if the specified key does not already exist.
replace(key, value [, expiry]): Replaces the item associated with the specified key, only if the key already exists. The new value is given by the value parameter.
delete(key [, time]): Deletes the key and its associated item from the cache. If you supply a time, then adding another item with the specified key is blocked for the specified period.
incr(key, value): Increments the item associated with the key by the specified value.
decr(key, value): Decrements the item associated with the key by the specified value.
flush_all: Invalidates (or expires) all the current items in the cache.
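These operations map directly onto the client libraries. Below is a short example using the python-memcached client (again assuming a local memcached instance) that exercises most of them:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

mc.set('views', 10, time=300)   # create or update, expires after 5 minutes
mc.add('views', 0)              # returns False: the key already exists
mc.replace('views', 20)         # succeeds: the key exists
mc.incr('views', 5)             # 25
mc.decr('views', 10)            # 15
print(mc.get('views'))          # 15
mc.delete('views')              # remove this one item
mc.flush_all()                  # invalidate every item in the cache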
Some of the companies using memcached are listed below :
YouTube
Microsoft Azure
Amazon Web Services
For a more practical approach to memcached, refer to my next blog.