Using Azure CDN to extend Bing Maps

Update (2) 23-Dec-2010: You don’t have to pay for CDN storage, but the CDN cache isn’t guaranteed to stay filled; content can be evicted based on multiple factors (see Understanding Billing and open “Understand the current charges”).

The Content Delivery Network (CDN) is a nice Windows Azure feature that helps us deliver content across the globe. It is ideal for mostly static content that needs the fastest possible access/download. The CDN currently consists of 24 edge locations delivering the content to your end users.

But what are the extra costs of using the CDN compared to normal blob storage? First, let’s look at the scenario: I’ve created an additional layer for Bing Maps that shows extra charts with nautical navigation information. For every zoom level in Bing Maps I need images, and the detailed zoom levels take several GB of data (or even TB for all the charts of the world). Here is a screenshot with the navigation charts overlaid on the Bing roads on the left and some Bing satellite imagery on the right:

With MapCruncher I made several layers (limited to zoom level 10 in Bing Maps) and uploaded them to a blob storage container in the cloud.
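Bing Maps addresses tiles by quadkey (see the Quad Key link under “More information” below), so serving the overlay from blob storage comes down to mapping a tile’s X/Y/zoom to a blob URL. Here is a minimal Python sketch of that mapping; the storage account name, container name and file naming are made-up placeholders, and MapCruncher’s actual output layout may differ:

```python
def tile_xy_to_quadkey(tile_x: int, tile_y: int, level: int) -> str:
    """Convert Bing Maps tile X/Y coordinates at a zoom level into a quadkey string."""
    digits = []
    for i in range(level, 0, -1):
        digit = 0
        mask = 1 << (i - 1)
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)

# Hypothetical blob layout for the overlay tiles (names are placeholders).
ACCOUNT = "mystorageaccount"
CONTAINER = "nauticalcharts"

def tile_url(tile_x: int, tile_y: int, level: int) -> str:
    return (f"http://{ACCOUNT}.blob.core.windows.net/"
            f"{CONTAINER}/{tile_xy_to_quadkey(tile_x, tile_y, level)}.png")

print(tile_url(268, 171, 9))   # .../nauticalcharts/120203122.png for an example level-9 tile
```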

If most of your users are in the same region, we can store the content in an Azure blob storage account in the datacenter chosen when the storage account was created; if you choose Region West Europe that means Amsterdam. For better performance worldwide, we need more places to store our content.

So let’s do some calculations on the cost of this solution (the Azure ROI Calculator is useful here), starting with some assumptions:

  1. The map consists of 100,000 files of around 60 KB each, so total storage is 6 GB
  2. 200 visitors a day, each requesting 1,000 files; over 30 days that is 6 million transactions/downloads per month
  3. 6 million downloads * 60 KB = 360 GB of traffic each month
  4. No updates to the data are made
  5. Just to allow for the future, I calculate with 26 CDN locations (only 24 are available at the moment)
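A quick sketch (Python, purely illustrative) that derives the storage, transaction and bandwidth totals from these assumptions:

```python
# Derive the monthly figures from assumptions 1-3 (illustrative only;
# 1 GB is taken as 1,000,000 KB, as in the totals below).
files = 100_000              # tiles in the overlay
file_size_kb = 60            # average tile size
visitors_per_day = 200
files_per_visitor = 1_000
days_per_month = 30

storage_gb = files * file_size_kb / 1_000_000                      # 6 GB
downloads = visitors_per_day * files_per_visitor * days_per_month  # 6,000,000 per month
bandwidth_gb = downloads * file_size_kb / 1_000_000                # 360 GB per month
print(storage_gb, downloads, bandwidth_gb)                         # 6.0 6000000 360.0
```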

Scenario 1

Normal Windows Azure blob storage in West Europe, with users from all over the world connecting to this one location. All content is stored in a single blob storage account. Users far from the datacenter will see some delay as the content is served.

Total cost:

6 GB of blob storage used in a month: 6 GB * $0.15 = $0.90
6 million downloads a month: 6M / 10K * $0.01 = $6.00
360 GB of bandwidth every month: 360 GB * $0.15 = $54.00

===

$60.90 per month
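The same arithmetic in code (a toy sketch, nothing more), with the rates used throughout this post pulled out as named constants:

```python
# Scenario 1 with the pay-as-you-go rates quoted above (illustrative only).
STORAGE_RATE = 0.15       # $ per GB of blob storage per month
TRANSACTION_RATE = 0.01   # $ per 10,000 storage transactions
EGRESS_EU_US = 0.15       # $ per GB of outbound bandwidth in Europe/US

scenario1 = (6 * STORAGE_RATE                         # blob storage   $0.90
             + 6_000_000 / 10_000 * TRANSACTION_RATE  # transactions   $6.00
             + 360 * EGRESS_EU_US)                    # bandwidth      $54.00
print(f"${scenario1:.2f} per month")                  # $60.90 per month
```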

Scenario 2

Store a copy in each of the six Windows Azure locations (2x Asia, 2x Europe and 2x US). This improves performance for users near one of the locations, although Australia and other countries may still suffer. This solution needs six blob storage accounts, so we have to keep all six in sync manually.
Of the 360 GB of bandwidth, assume one third is served from Asia, at a higher cost per GB downloaded.

Total cost:

6 datacenters each with 6 GB of storage: 6 * 6 GB * $0.15 = $5.40
6 million downloads a month: 6M / 10K * $0.01 = $6.00
240 GB of bandwidth in Europe/US: 240 GB * $0.15 = $36.00
120 GB of bandwidth in Asia: 120 GB * $0.20 = $24.00

===

$71.40 per month
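In the same toy calculation, scenario 2 only adds the replicated storage and the Asia bandwidth rate (reusing the constants defined under scenario 1):

```python
# Scenario 2: six replicated copies and a 2/3 Europe-US, 1/3 Asia bandwidth split.
EGRESS_ASIA = 0.20        # $ per GB of outbound bandwidth in Asia

scenario2 = (6 * 6 * STORAGE_RATE                     # 6 copies of 6 GB   $5.40
             + 6_000_000 / 10_000 * TRANSACTION_RATE  # transactions       $6.00
             + 240 * EGRESS_EU_US                     # Europe/US traffic  $36.00
             + 120 * EGRESS_ASIA)                     # Asia traffic       $24.00
print(f"${scenario2:.2f} per month")                  # $71.40 per month
```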

Scenario 3A

Again store everything in one location, and enable the CDN. Now we have 26 CDN nodes delivering the content to our users. Remember that every node has to download the data once before it can cache it. Also, the CDN checks every 2 days whether a cached file has changed (this is the default), so a connection to the origin is made. For this scenario we’ll do some worst-case thinking: all users hit all 26 CDN locations AND every CDN node downloads ALL content.

Assume all data is kept in cache on each CDN server (26 CDN datacenters, 6 GB each)
Assume each CDN cache is completely filled during the first month (this is the worst case; normally each cache fills up gradually over time)
The CDN checks every 2 days whether the data has changed (so 100,000 files are checked 30/2 = 15 times each month = 1.5 million transactions)

Total cost (worst case scenario):

6 GB of blob storage used in a month: 6 GB * $0.15 = $0.90
6 million downloads a month: 6M / 10K * $0.01 = $6.00
240 GB of bandwidth in Europe/US: 240 GB * $0.15 = $36.00
120 GB of bandwidth in Asia: 120 GB * $0.20 = $24.00
26 CDN locations check every 2 days for changes to 100,000 files: 26 * 30/2 * 100,000 / 10K * $0.01 = $39.00
26 CDN locations need to download 6 GB of data: 26 * 6 GB * $0.15 = $23.40 (first month only)
26 CDN locations each with 6 GB of storage: 26 * 6 GB * $0.15 = $23.40 (no longer charged, see the update at the top)

===

$129.30 per month for the first month; once the CDN caches are filled this drops to $105.90 (before the billing change in the update above, these figures were $152.70 and $129.30)
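To make the worst-case arithmetic easier to follow, here is a small Python sketch that reproduces the scenario 3A totals under the updated billing (no charge for CDN storage); the rates and quantities are the ones listed in this post, the parameterization is mine:

```python
# Cost sketch for the CDN scenarios (CDN storage itself is not charged).
def cdn_scenario_cost(cached_gb_per_node, checks_per_month, first_month,
                      cdn_nodes=26, files=100_000):
    cost = 6 * 0.15                                    # origin blob storage      $0.90
    cost += 6_000_000 / 10_000 * 0.01                  # end-user downloads       $6.00
    cost += 240 * 0.15 + 120 * 0.20                    # end-user bandwidth       $60.00
    cost += cdn_nodes * checks_per_month * files / 10_000 * 0.01   # freshness checks
    if first_month:                                    # each node fills its cache once
        cost += cdn_nodes * cached_gb_per_node * 0.15  # origin -> CDN transfer
    return cost

print(f"${cdn_scenario_cost(6, 15, first_month=True):.2f}")   # $129.30 (3A, first month)
print(f"${cdn_scenario_cost(6, 15, first_month=False):.2f}")  # $105.90 (caches filled)
```

The scenarios that follow only change these parameters: fewer freshness checks, or a smaller cached working set per node.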

Scenario 3B

Same as scenario 3A, except that the CDN only checks once a month whether there is new content. This only works with relatively static content; otherwise move new data to a new blob container or a new filename and let it be cached again (at extra download cost).

Total cost:

6 GB of blob storage used in a month: 6 GB * $0.15 = $0.90
6 million downloads a month: 6M / 10K * $0.01 = $6.00
240 GB of bandwidth in Europe/US: 240 GB * $0.15 = $36.00
120 GB of bandwidth in Asia: 120 GB * $0.20 = $24.00
26 CDN locations check once a month for changes to 100,000 files: 26 * 1 * 100,000 / 10K * $0.01 = $2.60
26 CDN locations need to download 6 GB of data: 26 * 6 GB * $0.15 = $23.40 (first month only)
26 CDN locations each with 6 GB of storage: 26 * 6 GB * $0.15 = $23.40 (no longer charged, see the update at the top)

===
$92.90 per month for the first month; once the CDN caches are filled this drops to $69.50 (before the billing change, $116.30 and $92.90)
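Using the cdn_scenario_cost sketch from scenario 3A, the only change is the check frequency (once a month instead of 15 times):

```python
print(f"${cdn_scenario_cost(6, 1, first_month=True):.2f}")   # $92.90 (3B, first month)
print(f"${cdn_scenario_cost(6, 1, first_month=False):.2f}")  # $69.50 (caches filled)
```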

Scenario 4

Same as scenario 3B, but let’s assume the users served by each CDN location don’t access all the content; each CDN node caches an average of 2 GB (33,333 files). Now the individual CDN servers need fewer transactions to check whether content was modified. Over time the caches would fill up with more data, so the cost could increase.

Total cost (normal scenario):

6 GB of blob storage used in a month: 6 GB * $0.15 = $0.90
6 million downloads a month: 6M / 10K * $0.01 = $6.00
240 GB of bandwidth in Europe/US: 240 GB * $0.15 = $36.00
120 GB of bandwidth in Asia: 120 GB * $0.20 = $24.00
26 CDN locations check once a month for changes to 33,333 files: 26 * 1 * 33,333 / 10K * $0.01 = $0.86
26 CDN locations need to download 2 GB of data: 26 * 2 GB * $0.15 = $7.80 (first month only)
26 CDN locations each with 2 GB of storage: 26 * 2 GB * $0.15 = $7.80 (no longer charged, see the update at the top)

===
$75.56 per month for the first month; once the CDN caches are filled this drops to $67.76 (before the billing change, $83.36 and $75.56)
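The same sketch with a 2 GB working set of 33,333 files per node reproduces scenario 4 to within a cent (the $0.86 transaction figure above is a truncation of $0.8667):

```python
print(f"${cdn_scenario_cost(2, 1, first_month=True, files=33_333):.2f}")   # $75.57
print(f"${cdn_scenario_cost(2, 1, first_month=False, files=33_333):.2f}")  # $67.77
```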

Conclusion

With the CDN we can deliver content to our customers quickly, but some guesswork remains. The worst case scenario (3A) is about 2 times the price of the simple blob storage (scenario 1), but in return a lot of edge servers deliver our content very fast. The slower-growth example (4) is around 25% more expensive than scenario 1, looks more interesting, and costs about the same as scenario 2. But keep in mind that if your service becomes popular all around the world, the cost will rise towards scenario 3B or even more!

By caching items longer (scenario 3B versus 3A) we save $36.40 a month in transaction costs! This really pays off. Only think about an update scenario when your content changes a lot: upload everything as new files, or start a new public blob container as a new cache.

You can test the overlays, so let’s hope the demo Azure account I got from MS keeps working a few more weeks!

Other considerations

If the content changed more often, the CDN would evict old items and save some storage costs.
Most of the cost is in bandwidth, so make sure your content is cached correctly. Set the CacheControl property of all the items in your blob container to “public, max-age=7200” or longer to get the files cached at the client and at downstream proxy servers. With returning visitors this saves a lot of bandwidth.
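As an illustration, this is roughly how the CacheControl header could be stamped onto every blob in the tile container. The sketch uses today’s azure-storage-blob Python SDK rather than the 2010-era StorageClient library this post was written against; the connection string, container name and PNG content type are assumptions:

```python
# Set "Cache-Control: public, max-age=7200" on every blob in the overlay container.
from azure.storage.blob import BlobServiceClient, ContentSettings

CONNECTION_STRING = "<your storage connection string>"   # placeholder
CONTAINER = "nauticalcharts"                              # hypothetical public tile container

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client(CONTAINER)

for blob in container.list_blobs():
    blob_client = container.get_blob_client(blob.name)
    blob_client.set_http_headers(
        content_settings=ContentSettings(
            cache_control="public, max-age=7200",   # cache at clients and proxies for 2 hours
            content_type="image/png",               # assuming PNG tiles
        )
    )
```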

More information

Understanding Billing
About the Windows Azure CDN
Azure ROI Calculator
NOAA Charts of the US
MapCruncher
Bing Maps Quad Key coordinate system

Cache control at CDN

Availability of content in the Azure CDN’s local caches (often called “cache efficacy” or “offload”) is nondeterministic but has general behaviors that developers should understand. Availability of content in cache is influenced by multiple factors including:

  • Expiration (“max-age”) header values
  • Overall total size of the developer’s content library (how much could be cached)
  • Active working set (how much is currently cached)
  • Traffic (how much is being served)
  • Cache churn (how often are objects being added to cache, or aging out)

For example, a developer with a large (“wide”) library with high churn and high traffic will have less cache efficacy than other users, because cache turnover is higher, so objects will be swapped in and out more frequently. Thus, this developer’s overall Windows Azure Storage data transfer charges will be proportionally higher, since more origin requests are required; the overall end-user experience will also be slightly slower on average, since fewer requests are served from cache.
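As a back-of-the-envelope illustration (using the 360 GB/month of tile traffic and the $0.15/GB transfer rate from the scenarios above, both assumptions for this example), the extra origin transfer scales directly with the cache miss rate:

```python
# Rough illustration: origin (blob storage) transfer grows with the cache miss rate.
SERVED_GB_PER_MONTH = 360
ORIGIN_TRANSFER_RATE = 0.15   # $ per GB from blob storage to the CDN edge

for hit_ratio in (0.99, 0.90, 0.50):
    origin_gb = SERVED_GB_PER_MONTH * (1 - hit_ratio)   # misses go back to the origin
    print(f"{hit_ratio:.0%} cache hits: {origin_gb:5.1f} GB from origin, "
          f"${origin_gb * ORIGIN_TRANSFER_RATE:.2f} extra transfer per month")
```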

The main cost control for developers to affect cache efficacy is the “max-age” HTTP header. Longer max-age headers allow the CDN to hold objects longer, reducing the need to make origin requests. Please refer to Windows Azure Storage documentation on MSDN for details.