In this post I will describe how the Content Delivery Network (CDN) functionality provided by Memstore cloud storage is implemented.
What is a CDN?
A Content Delivery Network is a system that provides optimised web access to different kinds of data by replicating and caching it as close to the user as possible. When a user requests content, the CDN serves it from the closest node and balances the load between servers, so that latency and access speed are the best possible.
A CDN for web services serves static web content such as media files (images, video, audio), JavaScript, CSS files or even web pages.
CDN Meets Storage
The kind of content that we want to serve using a CDN is usually the kind of objects you'd like to store in Memstore: loads of static data.
That's why our CDN service is the perfect complement for your storage: you can set a container as public (enabling the CDN service) and start serving the content stored on it straight away using the CDN, without having to retrieve the objects to be served or even use a web server.
Our CDN architecture has two levels: local (UK) and worldwide.
Local Nodes (UK)
Memstore is a cloud storage service based on the open source project OpenStack Object Storage. OpenStack Object Storage doesn't provide a CDN implementation.
We implemented a lightweight OpenStack Object Storage API client that performs four simple steps:
- Maps the host of the HTTP requests into an account/container pair.
- Checks the availability of the pair and the container metadata to see if it's public and hence CDN enabled.
- Retrieves the requested object over our Gigabit backbone, serving the file to the client browser from the first bytes received so there's no noticeable delay between the request and the response.
- Caches the requested object in memory for short-term access, and on disk for mid-term access (especially for big files: if they haven't changed since the previous request, serving the local copy is obviously faster).
This is the basic high-level functionality of a CDN node, and it's used as part of a cluster to create the local CDN service.
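To make the caching step more concrete, here is a minimal sketch of a two-tier cache in Python. It is not our production code: the class name `TwoTierCache`, the TTL values and the `fetch_from_origin` callable are all hypothetical, and it expires entries purely by age instead of checking with Memstore whether the object has changed.

```python
import hashlib
import os
import time


class TwoTierCache:
    """Hypothetical sketch: memory for short-term hits, disk for mid-term hits."""

    def __init__(self, cache_dir, memory_ttl=60.0, disk_ttl=3600.0, clock=time.time):
        self.cache_dir = cache_dir
        self.memory_ttl = memory_ttl  # assumed value, in seconds
        self.disk_ttl = disk_ttl      # assumed value, in seconds
        self.clock = clock            # injectable so the sketch is easy to test
        self._memory = {}             # key -> (stored_at, body)
        self._disk_index = {}         # key -> stored_at
        os.makedirs(cache_dir, exist_ok=True)

    def _disk_path(self, key):
        # Hash the key so any object name maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def get(self, key, fetch_from_origin):
        """Return (body, source), where source is 'memory', 'disk' or 'origin'."""
        now = self.clock()

        # 1. Short-term cache: memory.
        entry = self._memory.get(key)
        if entry is not None and now - entry[0] < self.memory_ttl:
            return entry[1], "memory"

        # 2. Mid-term cache: disk. A hit is promoted back into memory.
        path = self._disk_path(key)
        stored_at = self._disk_index.get(key)
        if stored_at is not None and now - stored_at < self.disk_ttl and os.path.exists(path):
            with open(path, "rb") as handle:
                body = handle.read()
            self._memory[key] = (now, body)
            return body, "disk"

        # 3. Miss on both tiers: go to the storage backend and fill both.
        body = fetch_from_origin(key)
        with open(path, "wb") as handle:
            handle.write(body)
        self._disk_index[key] = now
        self._memory[key] = (now, body)
        return body, "origin"
```

A real node would also stream the response to the browser while the object is still being fetched, which this sketch leaves out for brevity.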
Besides the basic functionality, some values that affect the CDN behaviour can be adjusted per container, such as the time to live (TTL) of the objects in the cache, whether directory listings are served, and which file to use as the default document if available (e.g. "index.html").
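As an illustration of how these per-container values might drive a request, the sketch below models them as a small settings object. The names `ContainerSettings` and `resolve_path`, the default TTL and the rule that a path ending in "/" is a directory-style request are assumptions made for the example, not a description of our actual metadata format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerSettings:
    """Hypothetical per-container CDN settings."""
    ttl_seconds: int = 300                  # assumed default TTL for cached objects
    directory_listing: bool = False         # serve listings for directory-style paths?
    default_document: Optional[str] = None  # e.g. "index.html"


def resolve_path(path, settings, object_names):
    """Decide what to serve for a request path.

    Returns ("object", name), ("listing", [names]) or ("not_found", None).
    """
    name = path.lstrip("/")

    # Plain object request.
    if name and not name.endswith("/"):
        if name in object_names:
            return ("object", name)
        return ("not_found", None)

    # Directory-style request: try the default document first.
    prefix = name
    if settings.default_document:
        candidate = prefix + settings.default_document
        if candidate in object_names:
            return ("object", candidate)

    # Otherwise fall back to a listing, if the container allows it.
    if settings.directory_listing:
        return ("listing", sorted(n for n in object_names if n.startswith(prefix)))

    return ("not_found", None)
```

With a default document configured, a request for "/" is answered with that document. Without one, the same request yields a listing only if listings are enabled for the container.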
The host-to-Memstore mapper has some interesting parts which deserve further discussion. By default we provide a domain name that any customer can use to access a public container, and it has the form:
container.HASH.cdn.memsites.com
If a request reaches the CDN for a host which is a subdomain of cdn.memsites.com, we translate the host into the aforementioned account/container pair, check permissions and features, and then serve the content as required. If the request is for a different domain, our service will try to resolve the domain to see if it's a CNAME record pointing to a subdomain of cdn.memsites.com.
That way, we let our customers use their own domains to serve content, and the mapping operation is cheap enough that each CDN node can satisfy a large number of requests per second.
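The mapping logic can be sketched roughly as follows. This is an illustration under a few assumptions: that HASH identifies the account, that container names used in hostnames contain no dots, and that `resolve_cname` is a hypothetical callable returning the CNAME target for a host (or None). A real node would plug in an actual DNS lookup, most likely with some caching in front of it.

```python
CDN_SUFFIX = ".cdn.memsites.com"


def parse_cdn_host(host):
    """Split 'container.HASH.cdn.memsites.com' into (account_hash, container).

    Returns None if the host is not of that form.
    """
    # Drop any port, the trailing dot of a fully qualified name, and case
    # differences (DNS names are case-insensitive).
    host = host.split(":", 1)[0].rstrip(".").lower()
    if not host.endswith(CDN_SUFFIX):
        return None

    labels = host[: -len(CDN_SUFFIX)].split(".")
    # Assumption: exactly two labels, the container and the account hash.
    if len(labels) != 2 or not all(labels):
        return None

    container, account_hash = labels
    return account_hash, container


def map_host(host, resolve_cname):
    """Map a request's Host header to an (account_hash, container) pair."""
    # Fast path: the host is already one of our CDN subdomains.
    pair = parse_cdn_host(host)
    if pair is not None:
        return pair

    # Customer domain: follow its CNAME and see if it points at the CDN.
    target = resolve_cname(host.split(":", 1)[0])
    if target is None:
        return None
    return parse_cdn_host(target)
```

For example, with a CNAME from static.example.com to photos.abc123.cdn.memsites.com, both hostnames map to the same pair, after which the node can check that the container is public before serving anything.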
Worldwide Content
The split between "local" and "worldwide" is perhaps a little inaccurate, because the CDN nodes in the UK are probably fast enough to serve content anywhere in Europe.
While our local CDN is already in production, we're still putting the final touches on our global infrastructure.
The worldwide strategy is slightly different from the local one, mainly because the traffic overhead of the OpenStack API is acceptable on our local backbone, whereas on our global backbone it could affect the performance of the service.
So our non-local CDN nodes don't talk directly to Memstore but to the local CDN nodes instead, and thus the penalty for serving content stored in the cloud is minimal.
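A toy model of this tiering, assuming each node simply caches whatever its upstream returns, might look like the sketch below. The `CdnNode` class and the node names are hypothetical. The point is only that remote nodes fetch from a local node, so Memstore is contacted once per object no matter how many remote nodes serve it.

```python
class CdnNode:
    """Hypothetical CDN node that caches responses from its upstream."""

    def __init__(self, name, upstream_fetch):
        self.name = name
        self.upstream_fetch = upstream_fetch  # callable: key -> bytes
        self.cache = {}

    def fetch(self, key):
        # Only go upstream on a cache miss.
        if key not in self.cache:
            self.cache[key] = self.upstream_fetch(key)
        return self.cache[key]


memstore_requests = []


def memstore_fetch(key):
    """Stand-in for a request to the storage backend over the OpenStack API."""
    memstore_requests.append(key)
    return b"object body for " + key.encode("utf-8")


# The local (UK) node talks to Memstore; remote nodes talk to the local node.
uk_node = CdnNode("uk", memstore_fetch)
us_node = CdnNode("us", uk_node.fetch)
asia_node = CdnNode("asia", uk_node.fetch)

us_node.fetch("abc123/photos/cat.jpg")
asia_node.fetch("abc123/photos/cat.jpg")
```

After both remote requests, `memstore_requests` holds a single entry: the second remote node was served from the UK node's cache.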
Our worldwide strategy includes another difference: geo-localised DNS services.
When a client browser resolves a CDN domain, our GeoIP-enabled DNS (geoipdns, plus a set of special patches to deal with wildcard hosts) returns the IP address of the CDN cluster with the lowest latency for that geographical location, so that we can provide the fastest possible response.
So that's the basic design of the Memstore CDN service. You can see it in action on Memset's MD weblog, Kate's Comment, which uses a WordPress plugin to store its media on Memstore and serve it through the CDN.
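The selection itself boils down to a lookup like the one sketched here. Everything in it is made up for illustration: the cluster names, the latency figures, the fallback cluster and the IP addresses (taken from the ranges reserved for documentation). It is not how geoipdns is configured, only the idea of picking the lowest-latency cluster for a client's location.

```python
# Hypothetical clusters and their public addresses (documentation IP ranges).
CLUSTERS = {
    "uk": "192.0.2.10",
    "us-east": "198.51.100.10",
    "asia": "203.0.113.10",
}

# Hypothetical latency estimates in milliseconds, per client country.
LATENCY_MS = {
    "GB": {"uk": 10, "us-east": 80, "asia": 250},
    "US": {"uk": 85, "us-east": 15, "asia": 180},
    "JP": {"uk": 240, "us-east": 170, "asia": 30},
}

DEFAULT_CLUSTER = "uk"  # assumed fallback for unknown locations


def resolve_cdn_address(country_code):
    """Return the IP of the cluster with the lowest latency for a country."""
    latencies = LATENCY_MS.get(country_code)
    if not latencies:
        return CLUSTERS[DEFAULT_CLUSTER]
    best_cluster = min(latencies, key=latencies.get)
    return CLUSTERS[best_cluster]
```

A client geolocated to "JP" would get the address of the "asia" cluster, while a client from a country missing from the table falls back to the UK cluster.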