Explore RubyGems data with Honeycomb

Our new RubyGems.org public dataset is now available — use it to analyze global download traffic of all gems hosted on RubyGems!

About RubyGems.org

RubyGems.org is the Ruby community’s gem hosting service. Gem creators can publish their gems to RubyGems for anyone to use. Developers can install gems from RubyGems using tools like Bundler or browse gem pages to learn more about dependencies and revision histories.

The RubyGems.org site is fronted by Fastly, meaning that all traffic to RubyGems.org is served by the Fastly CDN. Because gems only change when a new version is released, most RubyGems requests can be cached on Fastly’s servers and served directly from the edge, without Fastly needing to make a request to the underlying RubyGems servers.
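To make the caching argument concrete, here is a toy model of an edge node sitting in front of an origin server. It is a minimal sketch, not how Fastly is implemented: the `EdgeCache` class and its `fetch_from_origin` callback are hypothetical names, and the model assumes every gem file URL maps to bytes that never change, so a cached copy never needs to expire or be revalidated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class EdgeCache:
    """Toy CDN edge node (hypothetical; not Fastly's implementation).

    Assumes each URL maps to immutable content, as with a published
    gem version, so entries never expire and are never revalidated.
    """

    fetch_from_origin: Callable[[str], bytes]
    store: Dict[str, bytes] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            # Cache hit: answered at the edge, the origin is never contacted.
            self.hits += 1
            return self.store[url]
        # Cache miss: fetch once from the origin and keep the body.
        self.misses += 1
        body = self.fetch_from_origin(url)
        self.store[url] = body
        return body


# Usage: 1,000 downloads of the same gem file reach the origin only once.
origin_requests = []


def origin(url: str) -> bytes:
    origin_requests.append(url)
    return b"contents of " + url.encode()


edge = EdgeCache(fetch_from_origin=origin)
for _ in range(1000):
    edge.get("/gems/rspec-3.7.0.gem")
print(edge.hits, edge.misses, len(origin_requests))  # 999 1 1
```

A real CDN also honors cache headers, evicts entries, and runs many edge locations; the sketch only shows why immutable content turns almost every repeat request into a hit.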

The RubyGems.org public dataset uses the Fastly Honeycomb Integration to send data about all recent RubyGems.org traffic to Honeycomb.

How to launch the dataset

To start exploring, visit the RubyGems.org dataset. Take a look at these example questions if you’re not sure where to start:

  • What versions of a particular gem are most downloaded?
    • Break Down by downloaded_gem_version, Calculate the overall volume of requests over time (COUNT), and Filter to just one gem’s traffic (downloaded_gem_name = SOME_GEM_NAME).
    • See the example query and click “Run Query” to try it yourself. You’ll see which versions of the gem RSpec have been downloaded recently.
  • Which gems are the slowest to download?
    • Break Down by downloaded_gem_name, Calculate the overall distribution and 99th percentile of response times (HEATMAP(time_elapsed) and P99(time_elapsed)), and Filter to just traffic that matches the download pattern (url starts-with /gems/).
    • See the example query and click “Run Query” to try it yourself.
  • Which gems are the most internationally popular?
    • Break Down by downloaded_gem_name, then Calculate the number of distinct countries represented in the client IPs recorded by Fastly (COUNT_DISTINCT(geo_country_code)). You may also want to Filter to traffic where downloaded_gem_name exists.
    • See the example query and click “Run Query” to try it yourself.
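If it helps to see exactly what each example query computes, the sketch below reproduces the three Break Down / Calculate / Filter combinations over a handful of in-memory events. It is an illustration, not Honeycomb’s query engine: the events are invented, only the field names (downloaded_gem_name, downloaded_gem_version, time_elapsed, url, geo_country_code) come from the dataset, the helper functions are hypothetical, and P99 uses a simple nearest-rank percentile, which may differ from how Honeycomb computes percentiles.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sample events. Field names come from the RubyGems.org
# dataset; the values are invented for illustration.
EVENTS = [
    {"url": "/gems/rspec-3.7.0.gem", "downloaded_gem_name": "rspec",
     "downloaded_gem_version": "3.7.0", "time_elapsed": 12, "geo_country_code": "US"},
    {"url": "/gems/rspec-3.7.0.gem", "downloaded_gem_name": "rspec",
     "downloaded_gem_version": "3.7.0", "time_elapsed": 15, "geo_country_code": "DE"},
    {"url": "/gems/rspec-3.6.0.gem", "downloaded_gem_name": "rspec",
     "downloaded_gem_version": "3.6.0", "time_elapsed": 40, "geo_country_code": "US"},
    {"url": "/gems/rails-5.1.4.gem", "downloaded_gem_name": "rails",
     "downloaded_gem_version": "5.1.4", "time_elapsed": 90, "geo_country_code": "JP"},
    {"url": "/", "time_elapsed": 5, "geo_country_code": "US"},  # not a download
]


def versions_by_count(events, gem_name):
    """Break Down by downloaded_gem_version, Calculate COUNT,
    Filter downloaded_gem_name = gem_name."""
    counts = Counter(
        e["downloaded_gem_version"]
        for e in events
        if e.get("downloaded_gem_name") == gem_name
    )
    return counts.most_common()


def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]


def p99_latency_by_gem(events):
    """Break Down by downloaded_gem_name, Calculate P99(time_elapsed),
    Filter url starts-with /gems/."""
    times = defaultdict(list)
    for e in events:
        if e.get("url", "").startswith("/gems/"):
            times[e.get("downloaded_gem_name")].append(e["time_elapsed"])
    return {gem: p99(ts) for gem, ts in times.items()}


def countries_by_gem(events):
    """Break Down by downloaded_gem_name, Calculate
    COUNT_DISTINCT(geo_country_code), Filter downloaded_gem_name exists."""
    countries = defaultdict(set)
    for e in events:
        gem = e.get("downloaded_gem_name")
        if gem is not None and e.get("geo_country_code") is not None:
            countries[gem].add(e["geo_country_code"])
    return {gem: len(codes) for gem, codes in countries.items()}


print(versions_by_count(EVENTS, "rspec"))  # [('3.7.0', 2), ('3.6.0', 1)]
print(p99_latency_by_gem(EVENTS))          # {'rspec': 40, 'rails': 90}
print(countries_by_gem(EVENTS))            # {'rspec': 2, 'rails': 1}
```

In Honeycomb itself you build these queries in the Query Builder; the sketch only spells out what each clause does to the events.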

See the Honeycomb Query Builder documentation for information on how to construct more queries. Check out the RubyGems dataset docs for more information about specific data fields.

Insights

Some fun facts we found while perusing the RubyGems.org dataset:

  • Lots of Rubyists download gems from the eastern United States. The Washington, DC-area data center serves nearly as many RubyGems requests to client IPs located in Ashburn, VA as all other data centers combined.

    [Figure: Eastern US data centers]
    Ashburn, VA makes up a remarkably large portion of traffic to RubyGems.org.

  • Humans are driving use of Bundler 1.16.x, but automated tools appear to use older versions like 1.12.5. Downloads of 1.16.x spike during the workday, while use of older versions stays more consistent around the clock.

    [Figure: Bundler versions]
    Workday trends surface differences between the traffic patterns of humans and automated tools.

  • Among RubyGems clients, the city of Gunzenhausen (normalized to gunzenhausen in the data) has the most unique client IPs using IPv6, followed by Redmond, WA. (And almost nobody uses HTTP/2 yet.)
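One way to put a number on “spikes during the workday” versus “consistent around the clock” is to compare how evenly a version’s requests are spread across the hours of a day. This is a hypothetical sketch, not how the Bundler observation above was made: burstiness and workday_share are illustrative helpers, the hourly counts are invented rather than taken from the dataset, and the code assumes the 24 counts are already indexed by local hour of day.

```python
from statistics import mean, pstdev


def burstiness(hourly_counts):
    """Coefficient of variation of requests per hour.

    Near 0: traffic is flat around the clock (automation-like).
    Larger: traffic is concentrated in some hours (workday-like).
    """
    avg = mean(hourly_counts)
    return pstdev(hourly_counts) / avg if avg else 0.0


def workday_share(hourly_counts, start=9, end=17):
    """Fraction of requests in local hours [start, end)."""
    total = sum(hourly_counts)
    return sum(hourly_counts[start:end]) / total if total else 0.0


# Invented hourly request counts, indexed by local hour 0..23.
human_driven = [10] * 9 + [100] * 8 + [10] * 7  # busy from 09:00 to 17:00
automated = [50] * 24                           # flat around the clock

for label, counts in [("human-driven", human_driven), ("automated", automated)]:
    print(label, round(burstiness(counts), 2), round(workday_share(counts), 2))
# human-driven 1.06 0.83
# automated 0.0 0.33
```

The coefficient of variation ignores which hours are busy, which is why the sketch also reports the share of requests that fall inside working hours.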

Let us know what you find!

If you find interesting insights in the RubyGems.org dataset, send them to emily@honeycomb.io and christine@honeycomb.io or tweet at @honeycombio so we can share them and/or add them to our docs!

Emily Nakashima

VP, Engineering

A former manager and engineering leader at multiple developer tools companies including Bugsnag and GitHub, Emily is passionate about building best-in-class, consumer-quality tools for engineers. She has a background in product engineering, performance optimization, client-side monitoring, and design.
