CDN Usage And JavaScript Library Statistics From httparchive.org

Recently I had an idea: create a local, in-browser CDN using the browser extension API. I've built a Chrome extension that incorporates some of those ideas.

Motivation

As a web developer, I use various CDNs to deliver static files.

CDNs are a great idea, since they let browsers cache frequently used files. But why don't browsers ship those static assets in their distributions?

Several JS/CSS/font files could be bundled into the browser. This would save some traffic and reduce page load time (not by much in the short run, but significantly in aggregate).

I came up with two assumptions:

  1. All CDN static assets can be cached locally, as they are supposed to never change. Nobody can modify those files; they are permanent. So why spend traffic and time loading them from the network at all?
  2. Most static files that are hosted not on a CDN but on private servers, and that follow a versioned naming pattern (e.g. jquery-1.10.2.min.js), can be assumed permanent as well.
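Assumption 2 boils down to a file-name check. Below is a minimal Python sketch, assuming that a "permanent" file is one named like `<name>-<version>[.min].js`. The regex and the `parse_versioned_filename` helper are hypothetical illustrations, not the extension's actual code.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed naming pattern: "<name>-<version>[.min].js", e.g. "jquery-1.10.2.min.js".
# The name may contain hyphens ("jquery-ui-1.10.3.min.js"); the version has 2 or 3 numeric parts.
VERSIONED_JS = re.compile(
    r"^(?P<name>[A-Za-z][\w.-]*?)-(?P<version>\d+\.\d+(?:\.\d+)?)(?:\.min)?\.js$"
)


def parse_versioned_filename(url: str) -> Optional[Tuple[str, str]]:
    """Return (library, version) when the file name embeds a version, else None."""
    path = urlsplit(url).path  # scheme, host and ?query are not part of the path
    filename = path.rsplit("/", 1)[-1]
    match = VERSIONED_JS.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("version")


print(parse_versioned_filename("http://example.com/js/jquery-1.10.2.min.js"))  # ('jquery', '1.10.2')
print(parse_versioned_filename("http://example.com/js/app.js"))  # None
```

A file without a version in its name (like `app.js`) may change at any time, so under this assumption it would still be fetched from the network.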

I created the extension mentioned above, and it incorporates both assumptions. Basically, the extension listens to the page's resource requests and replaces matching ones with local copies.
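The heart of that replacement is a lookup from a request URL to a bundled local copy. Here is a browser-independent Python sketch of the decision. The `BUNDLE` table, its entries and the `local_replacement` helper are all hypothetical. In a real Chrome extension, this logic would presumably run inside a request listener such as `chrome.webRequest.onBeforeRequest`.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical bundle shipped with the extension:
# CDN location (host + path, scheme ignored) -> local copy of the same file.
BUNDLE: Dict[str, str] = {
    "ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js": "lib/jquery-1.10.2.min.js",
    "ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js": "lib/jquery-1.7.2.min.js",
    "code.jquery.com/jquery-1.10.2.min.js": "lib/jquery-1.10.2.min.js",
}


def local_replacement(url: str) -> Optional[str]:
    """Return the bundled file that can serve this request, or None to let it go to the network."""
    parts = urlsplit(url)
    # CDN files never change, so http/https and (by assumption) the query string do not matter;
    # only the lower-cased host and the path identify the file.
    key = (parts.hostname or "") + parts.path
    return BUNDLE.get(key)


print(local_replacement("https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"))  # lib/jquery-1.10.2.min.js
print(local_replacement("https://example.com/js/app.js"))  # None
```

Keying the table on host plus path means one local file can serve several CDN URLs, as with the two jQuery 1.10.2 entries above.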

I wanted better insight than experience-based assumptions, so I decided to do a little research on the topic.

Pre-research

First, I thought about using the Common Crawl dataset. But I have no access to a compute cloud that could download and crunch that much data (81 TB is no joke).

Then I came across Steve Souders' article on jQuery statistics in the HTTP Archive dataset.

Bingo! That dataset is exactly what I needed to check my hypothesis: it contains request data for the top 300,000 Alexa-ranked sites.

I used the results of the March 1, 2014 crawl.

Research

I downloaded the dataset and played around with it.
I'll present the results following the structure of Steve's article, so that you can compare the trends.

Sites Loading jQuery from Google CDN - Mar 1 2014

SQL gist @ Github

Name Count Percent
jquery 59977 20.6223

Most popular jQuery versions from Google CDN - Mar 1 2014

I made some changes to the initial SQL and post-processed the data, because three issues can bias the results:

  1. Some sites use URLs in the "short format": http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js.
    Today this format resolves to jquery-1.9.1.
  2. WordPress appends a ?ver=<wpversion> query to all static resource URLs, so the same file shows up as a separate entry in the SQL results.
  3. http vs https is irrelevant for version frequency statistics, yet each scheme produces its own entry. If you are interested in that distribution, run a separate query.
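The post-processing for these issues can be sketched in a few lines of Python. This is an illustration, not the SQL from the gist. It assumes a URL counts as Google CDN jQuery only when it matches the regex below. Following item 1, it maps the short-format alias "1" to 1.9.1. The helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Google CDN jQuery URL; the scheme (http, https or protocol-relative "//") and any ?query are ignored.
GOOGLE_JQUERY = re.compile(
    r"^(?:https?:)?//ajax\.googleapis\.com/ajax/libs/jquery/"
    r"(?P<version>[\d.]+)/jquery(?:\.min)?\.js(?:\?.*)?$"
)

# Assumed alias table for short-format URLs (".../jquery/1/..." -> 1.9.1, as stated above).
ALIASES = {"1": "1.9.1"}


def jquery_version(url: str) -> Optional[str]:
    """Normalized jQuery version for a Google CDN URL, or None for anything else."""
    match = GOOGLE_JQUERY.match(url)
    if match is None:
        return None
    version = match.group("version")
    return ALIASES.get(version, version)


def version_counts(urls: Iterable[str]) -> Counter:
    """Count URLs per normalized version, so scheme and ?ver=... variants merge."""
    return Counter(v for v in map(jquery_version, urls) if v is not None)


urls = [
    "http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js",
    "https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js?ver=3.8.1",
    "//ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js",
]
print(version_counts(urls))
```

All three issues disappear in the same step: the regex ignores the scheme and the query string, and the alias table rewrites the short format.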

SQL gist @ Github

Version Count Percent
1.7.2 8938 14.9024
1.7.1 6842 11.4077
1.8.3 5670 9.4536
1.9.1 5533 9.2252
1.10.2 5244 8.7434
1.8.2 3832 6.3891
1.4.2 3673 6.1240
1.3.2 2519 4.1999
1.5.2 2297 3.8298
1.6.4 1987 3.3129
1.4.4 1985 3.3096
1.6.2 1644 2.7411
1.6.1 1395 2.3259
1.5.1 1160 1.9341
1.9.0 964 1.6073
1.8.1 880 1.4672
1.10.1 868 1.4472
1.8.0 803 1.3388
2.0.3 508 0.8470
1.2.6 449 0.7486
1.7.0 403 0.6719
1.4.1 382 0.6369
1.11.0 363 0.6052
1.4.3 357 0.5952
2.0.0 246 0.4102
1.6.0 204 0.3401
1.6.3 193 0.3218
1.3.1 112 0.1867
1.5.0 104 0.1734
1.4.0 83 0.1384
1.10.0 79 0.1317
2.0.2 74 0.1234
2.1.0 68 0.1134
1.3.0 42 0.0700
2.0.1 19 0.0317
1.2.3 13 0.0217
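The percentages in this table are relative to the 59977 sites from the first table (8938 / 59977 ≈ 14.90%). Grouping patch releases into major.minor series is then plain arithmetic. The Python sketch below copies only the 1.7.x and 1.8.x rows for brevity, and the `share_by_series` name is hypothetical.

```python
from collections import defaultdict
from typing import Dict

# Counts copied from the table above (only the 1.7.x and 1.8.x rows).
COUNTS: Dict[str, int] = {
    "1.7.2": 8938, "1.7.1": 6842, "1.7.0": 403,
    "1.8.3": 5670, "1.8.2": 3832, "1.8.1": 880, "1.8.0": 803,
}
TOTAL_SITES = 59977  # sites loading jQuery from Google CDN (first table)


def share_by_series(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Sum counts per major.minor series and return each series' percentage of `total`."""
    series: Dict[str, int] = defaultdict(int)
    for version, count in counts.items():
        major_minor = ".".join(version.split(".")[:2])
        series[major_minor] += count
    return {name: 100.0 * count / total for name, count in series.items()}


shares = share_by_series(COUNTS, TOTAL_SITES)
print(f"1.7.x: {shares['1.7']:.1f}%")  # 1.7.x: 27.0%
print(f"1.8.x: {shares['1.8']:.1f}%")  # 1.8.x: 18.6%
```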

Top CDNs Serving JS Libs - Mar 1 2014

SQL gist @ Github

Label Count Percent
CDN requests 78160 26.8743

SQL gist @ Github

CDN Count Percent
Google 67671 86.5801
jQuery 9222 11.7989
Cloudflare 3996 5.1126
Yandex 2379 3.0438
Microsoft 1300 1.6633
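jsDelivr 324 0.4145

Percentages here are relative to the 78160 CDN-using sites. A site can use several CDNs, so they add up to more than 100%.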
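Attributing a request to a provider comes down to its hostname. Below is a hedged Python sketch. The host list is my assumption of typical 2014 endpoints and may differ from what the SQL in the gist matches. The helper names are hypothetical. Each provider is counted at most once per site.

```python
from collections import Counter
from typing import Dict, Iterable, Optional
from urllib.parse import urlsplit

# Assumed hostnames for each provider (illustrative; the SQL in the gist may match differently).
CDN_HOSTS: Dict[str, str] = {
    "ajax.googleapis.com": "Google",
    "code.jquery.com": "jQuery",
    "cdnjs.cloudflare.com": "Cloudflare",
    "yandex.st": "Yandex",
    "ajax.aspnetcdn.com": "Microsoft",
    "cdn.jsdelivr.net": "jsDelivr",
}


def cdn_provider(url: str) -> Optional[str]:
    """Map a request URL to a CDN provider name, or None for non-CDN hosts."""
    host = urlsplit(url).hostname  # lower-cased, port stripped; works for "//host/..." too
    return CDN_HOSTS.get(host) if host else None


def provider_site_counts(pages: Dict[str, Iterable[str]]) -> Counter:
    """Count sites per provider; a site using a provider several times counts once."""
    counts: Counter = Counter()
    for urls in pages.values():
        providers = {p for p in map(cdn_provider, urls) if p is not None}
        counts.update(providers)
    return counts


pages = {
    "a.example": ["//ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js",
                  "http://code.jquery.com/jquery-1.10.2.min.js"],
    "b.example": ["https://ajax.googleapis.com/ajax/libs/jqueryui/1.10.4/jquery-ui.min.js"],
    "c.example": ["https://c.example/js/app.js"],
}
print(provider_site_counts(pages))
```

Here site a.example counts toward both Google and jQuery. The same overlap is why the percentages in the table exceed 100% in total.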

Google CDN profile - Mar 1 2014

Label Count Percent
Total CDN requests 67198 23.1052

SQL gist @ Github

Script Count Percent
jquery 59977 89.2541
jqueryui 12437 18.5080
webfontloader 4624 6.8812
swfobject 2347 3.4927
prototype 993 1.4777
scriptaculous 787 1.1712
mootools 445 0.6622
angularjs 353 0.5253
dojo 186 0.2768
chrome-frame 75 0.1116
ext-core 16 0.0238
jquerymobile 1 0.0015
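The percentages in this profile are relative to the 67198 Google CDN sites (59977 / 67198 ≈ 89.25%). One site often loads several libraries (jQuery plus jQuery UI, for example), so the column sums to more than 100%. Here is a Python sketch of such a profile. It assumes the library name is the path segment after /ajax/libs/, as in the URLs above. The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def google_cdn_library(url: str) -> Optional[str]:
    """Return <library> from //ajax.googleapis.com/ajax/libs/<library>/..., else None."""
    parts = urlsplit(url)
    if parts.hostname != "ajax.googleapis.com":
        return None
    segments = parts.path.split("/")  # ["", "ajax", "libs", "<library>", ...]
    if len(segments) > 3 and segments[1:3] == ["ajax", "libs"] and segments[3]:
        return segments[3]
    return None


def library_shares(pages: Dict[str, List[str]]) -> Dict[str, float]:
    """Percentage of Google CDN sites that load each library (each site counts once per library)."""
    per_library: Counter = Counter()
    google_sites = 0
    for urls in pages.values():
        libs = {lib for lib in map(google_cdn_library, urls) if lib is not None}
        if libs:
            google_sites += 1
            per_library.update(libs)
    return {lib: 100.0 * n / google_sites for lib, n in per_library.items()}


pages = {
    "a.example": ["https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js",
                  "https://ajax.googleapis.com/ajax/libs/jqueryui/1.10.4/jquery-ui.min.js"],
    "b.example": ["//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"],
    "c.example": ["https://c.example/js/app.js"],
}
print(library_shares(pages))  # {'jquery': 100.0, 'jqueryui': 50.0}
```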

Rough number of WordPress-powered sites - Mar 1 2014

A couple of heuristics can give us a rough estimate of the number of sites powered by WordPress:

  1. WordPress loads the jquery-migrate plugin. The plugin is rare elsewhere, since it exists to bring back features of old jQuery that were deprecated or removed in jQuery 1.9+.
  2. WordPress appends a ?ver=<wpversion> query to all static assets it serves.
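Here is a rough Python sketch of how the two heuristics might be combined per site. It assumes both signals must appear on the same jquery-migrate request, which is a simplification; the SQL in the gist may combine them differently. The `looks_like_wordpress` name is hypothetical.

```python
from typing import Iterable
from urllib.parse import parse_qs, urlsplit


def looks_like_wordpress(urls: Iterable[str]) -> bool:
    """Heuristic: the site loads jquery-migrate with a non-empty ?ver=... query parameter."""
    for url in urls:
        parts = urlsplit(url)
        filename = parts.path.rsplit("/", 1)[-1]
        # parse_qs drops blank values by default, so "?ver=" alone does not count
        if filename.startswith("jquery-migrate") and "ver" in parse_qs(parts.query):
            return True
    return False


print(looks_like_wordpress(
    ["http://blog.example.com/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.2.1"]))  # True
print(looks_like_wordpress(["http://example.com/js/jquery-migrate-1.2.1.min.js"]))  # False
```

Requiring both signals trades recall for precision. Sites that strip the ?ver= query or never load jquery-migrate are missed, so the result is a lower-bound style estimate.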

SQL gist @ Github

Count Percent
29819 10.2529

Conclusions

  1. The share of CDN-friendly sites keeps growing.
  2. jQuery 1.7.x is still the most popular jQuery series, with about 27% of jQuery loads from Google CDN (1.7.0, 1.7.1 and 1.7.2 combined).
  3. Google, jQuery and Cloudflare are the most popular CDN providers.
    You can find some other CDN providers here.
  4. jQuery and jQuery UI are the clear leaders among Google CDN libraries, followed by webfontloader and swfobject.
  5. About 10% of the top internet sites are powered by WordPress!

Now you have the secret knowledge! Be responsible when using it in public ;)