Our new RubyGems.org public dataset is now available — use it to analyze global download traffic of all gems hosted on RubyGems!
About RubyGems.org
RubyGems.org is the Ruby community’s gem hosting service. Gem creators can publish their gems to RubyGems for anyone to use. Developers can install gems from RubyGems using tools like Bundler or browse gem pages to learn more about dependencies and revision histories.
The RubyGems site (src) is fronted by Fastly, meaning that all traffic to RubyGems.org is served by the Fastly CDN. Because gems only change when a new version is released, most RubyGems requests can be cached on Fastly’s servers and served directly from the edge without Fastly needing to make a request to underlying RubyGems servers.
The RubyGems.org public dataset uses the Fastly Honeycomb Integration to send data about all recent RubyGems.org traffic to Honeycomb.
How to launch the Dataset
To start exploring, visit the RubyGems.org dataset. Take a look at these example questions if you’re not sure where to start:
- What versions of a particular gem are most downloaded?
  - Break Down by downloaded_gem_version, Calculate the overall volume of requests over time (COUNT), and Filter to just one gem’s traffic (downloaded_gem_name = SOME_GEM_NAME).
  - See the example query and click “Run Query” to try it yourself. You’ll see which versions of the gem RSpec have been downloaded recently.
- Which gems are the slowest to download?
  - Break Down by downloaded_gem_name, Calculate the overall distribution and 99th percentile of response times (HEATMAP(time_elapsed) and P99(time_elapsed)), and Filter to just traffic that matches the download pattern (url starts-with /gems/).
  - See the example query and click “Run Query” to try it yourself.
- Which gems are the most internationally popular?
  - Break Down by downloaded_gem_name, then Calculate the number of distinct countries represented in the client IPs recorded by Fastly (COUNT_DISTINCT(geo_country_code)). You may also want to Filter to traffic where downloaded_gem_name exists.
  - See the example query and click “Run Query” to try it yourself.
See the Honeycomb Query Builder documentation for information on how to construct more queries. Check out the RubyGems dataset docs for more information about specific data fields.
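If the Break Down / Calculate / Filter vocabulary is new to you, a rough Python sketch of what the three example queries compute may help. This is not how Honeycomb runs queries. The sample events and the helper names (run_query, count, count_distinct, p99) are made up for illustration, and only the field names come from the examples above. The percentile uses the simple nearest-rank method, which is an assumption and may not match Honeycomb's own P99 calculation.

```python
from collections import defaultdict
from math import ceil

# Hypothetical sample events. Only the field names come from the examples
# above; every value here is invented for illustration.
EVENTS = [
    {"url": "/gems/rspec-3.7.0.gem", "downloaded_gem_name": "rspec",
     "downloaded_gem_version": "3.7.0", "time_elapsed": 12, "geo_country_code": "US"},
    {"url": "/gems/rspec-3.7.0.gem", "downloaded_gem_name": "rspec",
     "downloaded_gem_version": "3.7.0", "time_elapsed": 15, "geo_country_code": "DE"},
    {"url": "/gems/rspec-3.6.0.gem", "downloaded_gem_name": "rspec",
     "downloaded_gem_version": "3.6.0", "time_elapsed": 40, "geo_country_code": "US"},
    {"url": "/gems/rake-12.3.0.gem", "downloaded_gem_name": "rake",
     "downloaded_gem_version": "12.3.0", "time_elapsed": 8, "geo_country_code": "US"},
    {"url": "/gems/rake-12.3.0.gem", "downloaded_gem_name": "rake",
     "downloaded_gem_version": "12.3.0", "time_elapsed": 9, "geo_country_code": "US"},
    # A non-download request: no gem fields at all.
    {"url": "/api/v1/dependencies", "time_elapsed": 3, "geo_country_code": "JP"},
]


def run_query(events, where, break_down_by, calculate):
    """Filter events, group them by one field, then aggregate each group."""
    groups = defaultdict(list)
    for event in events:
        # Events lacking the Break Down field are skipped in this sketch.
        if where(event) and break_down_by in event:
            groups[event[break_down_by]].append(event)
    return {key: calculate(rows) for key, rows in groups.items()}


def count(rows):
    """COUNT: number of events in the group."""
    return len(rows)


def count_distinct(field):
    """COUNT_DISTINCT(field): number of unique values of field in the group."""
    return lambda rows: len({row[field] for row in rows if field in row})


def p99(field):
    """P99(field) via nearest rank; assumes every row in the group has field."""
    def calculate(rows):
        ordered = sorted(row[field] for row in rows)
        rank = ceil(99 * len(ordered) / 100)
        return ordered[rank - 1]
    return calculate


# What versions of a particular gem are most downloaded?
versions_of_rspec = run_query(
    EVENTS, lambda e: e.get("downloaded_gem_name") == "rspec",
    "downloaded_gem_version", count)

# Which gems are the slowest to download?
slowest_gems = run_query(
    EVENTS, lambda e: e["url"].startswith("/gems/"),
    "downloaded_gem_name", p99("time_elapsed"))

# Which gems are the most internationally popular?
countries_per_gem = run_query(
    EVENTS, lambda e: "downloaded_gem_name" in e,
    "downloaded_gem_name", count_distinct("geo_country_code"))

print(versions_of_rspec)
print(slowest_gems)
print(countries_per_gem)
```

Each query is the same three steps with different arguments: Filter decides which events count, Break Down picks the grouping field, and Calculate reduces each group to one number.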
Insights
Some fun facts we found while perusing the RubyGems.org dataset:
- Lots of Rubyists download gems from the eastern United States. The Washington, DC area datacenter serves nearly as many RubyGems requests to client IPs located in Ashburn, VA as all other datacenters combined.
- Humans are driving use of Bundler 16.x, but it looks like automated tools use older versions like 1.12.5. You can see that 16.x downloads spike during the workday, while use of older versions tends to be more consistent around the clock.
- Among RubyGems clients, the city of Gunzenhausen (normalized to gunzenhausen in the data) has the most unique client IPs using IPv6, followed by Redmond, WA. (And almost nobody uses HTTP/2 yet.)
Let us know what you find!
If you find interesting insights in the RubyGems.org dataset, send them to emily@honeycomb.io and christine@honeycomb.io or tweet at @honeycombio so we can share them and/or add them to our docs!