OneFootball Scores an Observability Goal with Honeycomb

OneFootball Scores an Observability Goal with Honeycomb

12 Min. Read

For football fans worldwide, staying connected to their favorite teams, players, and matches is a passion—and OneFootball delivers exactly that. The platform is a one-stop shop for football fans to follow their teams, get up-to-date information, and immerse themselves in global football culture. With over 100 million users spanning multiple continents, OneFootball is an essential companion for fans to track live scores, player stats, breaking news, and more.

Behind the scenes, OneFootball runs on a sophisticated, high-scale infrastructure hosted on AWS and distributed across multiple AWS zones under the same region. The company’s traffic patterns present both predictable challenges—such as spikes during major matches and tournaments—and unexpected ones, like last-minute transfers or controversial VAR (video assistant refereeing ) decisions that send fans flocking to the app. This mix of predictability and unpredictability makes app reliability mission-critical. 

“We’re not just delivering content; we’re delivering a real-time connection to something people are passionate about. That means we can’t afford delays or gaps in the experience, especially for our pay-per-view users during high-traffic moments,” said Bruno Costa, Principal Site Reliability Engineer at OneFootball.

To meet these goals, OneFootball recognized that observability was essential to delivering a seamless experience—and as seasoned engineers, they prioritized having the right tool to achieve it. Identifying issues quickly, and resolving them before they impacted fans, required visibility across their entire platform. This mission led them to Honeycomb, setting the stage for a transformative journey in how they approach reliability and performance at scale.

Modernization through observability

When OneFootball’s CTO launched a modernization initiative focused on continuous delivery observability, it was clear that the engineering team needed to evaluate their tech stack. While the organization already had New Relic in place, the shift toward a cultural and technical overhaul required something more. The goal wasn’t just about having some insights from application performance monitoring; it was about embedding observability into the development process, aligning with service level objectives (SLOs), and enabling engineers to detect and resolve issues faster than ever.

OneFootball quickly realized that New Relic’s offering came with significant limitations. The first issue was cost—at $330,000 annually, their New Relic spend was 3.5x higher than the cost of their AWS staging infrastructure. As Bruno put it, “That wasn’t sustainable.”

On top of the financial burden, New Relic wasn’t delivering on the team’s needs for meaningful SLOs and distributed tracing. While the platform offered some tracing capabilities, they were not treated as first-class citizens within the product. Most engineers continued using APM and logs while ignoring traces, preventing the cultural shift the CTO was pushing for. Making matters worse, OneFootball had a full-time engineer dedicated solely to maintaining SLOs. Even with this effort, the process felt inefficient, as existing tools couldn’t provide the level of insight the team needed to meet and maintain their objectives.

The engineering team’s modernization initiative emphasized not just the technical benefits of observability, but the cultural shift needed to make it a core part of the development lifecycle. To kickstart this transformation, the team launched a book club focused on the foundational text Observability Engineering. This initiative brought engineers together to discuss observability best practices and helped them align on why observability mattered—not just as a buzzword but as a core capability for delivering a seamless fan experience. “We didn’t treat this as just a technical problem,” Bruno explained. “It was also a cultural one. We needed the team to buy into the idea that observability is essential, not optional.

Armed with a growing understanding of observability’s potential, the engineering team turned its attention to Honeycomb. The engineering team began experimenting with Honeycomb’s free version, instrumenting new applications with OpenTelemetry and sending data to Honeycomb. Early adopters quickly saw value. “Engineers started reporting back that they loved what they were seeing. Even with just one application, they could see the whole path inside their system—everything from database queries to caching layers. It was enough to make them excited about what was possible,” Bruno shared. 

With glowing feedback from engineers, OneFootball chose Honeycomb as the ideal observability platform to power their modernization efforts. Honeycomb’s distributed tracing-first approach seamlessly aligned with OneFootball’s goals, while its flexibility and OpenTelemetry support simplified integration into their workflows. “We standardized on OpenTelemetry, which meant switching over to Honeycomb was as simple as changing the endpoint,” Bruno said. Within a couple months, OneFootball had fully transitioned to Honeycomb, turning observability into a key enabler for reliability and performance at scale.

Proactively protecting the user experience with SLOs

With Honeycomb SLOs, the engineering team gained a powerful tool to prioritize user-centric performance and minimize alert fatigue. Honeycomb’s SLOs allow teams to define, measure, and manage reliability based on real user impact, rather than relying on traditional system metrics like CPU or memory usage. Using error budgets, teams can set clear thresholds for acceptable performance, ensuring incident response is prioritized based on issues that matter most to the user experience.

For OneFootball, this shift was transformative. With over 100 microservices and extensive third-party dependencies—such as live game data feeds or partner content ingestion—a single failure in an upstream service often triggered a cascade of alerts across multiple systems. 

Bruno explained, “We were dealing with alert fatigue. When one system failed, it triggered a cascade of alerts across all dependent systems. Operators would get so many alerts that it was impossible to identify the root cause in a timely manner.” Honeycomb SLOs changed this dynamic by enabling the team to focus on what users actually experience, rather than being overwhelmed by noisy, system-generated alerts.

“By leveraging Honeycomb’s SLOs, we’ve gained the ability to detect and address problems before users even notice. In the last five months, alone, we’ve identified 45 incidents using Honeycomb SLOs—before anyone reported them,” Bruno shared. This proactive approach ensures a seamless, reliable experience for end users. With SLOs in place, OneFootball can prioritize fixes that align with business goals, like minimizing errors on the home screen or reducing latency when the app loads critical news and updates.

Honeycomb SLOs give OneFootball the flexibility to tackle both backend and user-facing challenges. For example, they monitor data freshness for live game metrics and partner news ingestion while also tracking app performance metrics like error rates and response times. This expanded focus ensures both internal systems and customer-facing experiences meet the high standards expected by OneFootball’s millions of passionate fans around the globe.

Shifting left to instrument for observability during development

Incorporating observability from the start of the development process is a key part of OneFootball’s modernization journey, and Honeycomb’s distributed tracing is central to this transformation. With Honeycomb, engineers are no longer reliant on traditional logging systems to debug issues. Instead, they consolidate logs, metrics, and traces into a unified workflow. This allows teams to track performance across the entire application stack, making it easier to identify and resolve problems faster.

For many engineers at OneFootball, Honeycomb marked their first experience with tracing, and the feedback has been overwhelmingly positive. “It’s much clearer and easier to investigate issues using Honeycomb’s distributed traces, and the BubbleUp Anomaly Detection feature within Honeycomb, which surfaces the most relevant data for debugging, has become a valuable tool for our engineers,” explained Bruno. By seeing a full trace of events from start to finish, the engineers can quickly pinpoint the root cause of issues in specific code areas, rather than sifting through massive volumes of log data. This shift has been pivotal in deepening the team’s understanding of the codebase and streamlining investigation time.

By building with observability in mind, engineering teams are now adding meaningful attributes to traces that enhance visibility. For example, OneFootball’s engineers can now track key metrics, such as batch processing success rates, data failures, and performance at the user level by including attributes like user ID, game ID, and player ID in their traces. Honeycomb’s distributed traces provide a more granular understanding of application behavior and the ability to pinpoint issues down to an individual user.

“A lot of the engineers who joined our book club were the fastest to adopt Honeycomb and the shift-left philosophy. They embraced the idea of coding for observability from the start, building it into every new feature from the ground up. This approach also helped us enforce a no-logs policy and significantly reduce logging storage costs,” said Bruno. The effort, along with integrating OpenTelemetry’s wiring code into their boilerplate for new projects, helped accelerate OneFootball’s observability maturity, enabling teams to fully leverage Honeycomb’s tracing capabilities and approach development with observability as a core principle.

Continuous delivery with confidence

Achieving continuous delivery was a strategic goal for OneFootball’s engineering team. But for this strategy to work, they needed confidence that each deployment would perform as expected. Honeycomb’s observability capabilities became a cornerstone of this modernization journey, enabling OneFootball to automate and refine their delivery pipeline.

“Continuous delivery requires confidence—you need to know that what you’re doing is working correctly. Honeycomb’s SLOs make this possible for us. SLOs give us the insights we need to push changes and track performance in real time. Honeycomb is the source of truth for us to decide if an update should go to production or not,” Bruno said.

For example, OneFootball now uses Honeycomb to help automate canary deployments through Argo CD and Argo Rollouts. Engineers define the steps for a canary deployment, and Honeycomb SLOs, aided with triggers, determine whether to promote the rollout or halt the process. If the SLO metrics look good, the system automatically progresses the deployment until 100% of the application is running on the new version.

This successful approach for continuous delivery also eliminated the need for a staging environment, which had become inefficient and costly in a microservices-based architecture. “Before, our staging environment couldn’t replicate production perfectly—it lacked real-world data and we had the same performance concerns. With Honeycomb, we now test in production with small increments, which also saved us the $90,000 yearly cost of maintaining a staging cluster,” Bruno explained. 

Honeycomb’s SLO-supported continuous delivery ensures that deployments are not only faster but also more reliable, allowing OneFootball to focus on delivering value to users without manual interventions or unnecessary bottlenecks.

Quote Card: “Before, our staging environment couldn’t replicate production perfectly—it lacked real-world data and we had the same performance concerns. With Honeycomb, we now test in production with small increments, which also saved us the $90,000 yearly cost of maintaining a staging cluster.” Bruno Costa, Principal Site Reliability Engineer at OneFootball

Simplifying sampling with Honeycomb Refinery

Tail sampling became a critical component of OneFootball’s observability strategy, enabling the team to capture the most important traces while dramatically reducing costs. Initially, they implemented a solution using OpenTelemetry’s Collector, but as their architecture scaled, managing this setup became a significant challenge.

“Our tail sampling solution started with the Collector, but the configuration file grew to over 3,000 lines of rules. It was difficult to maintain, and developers hesitated to make changes. The infrastructure team had to step in frequently to assist, which added another step to the process,” Bruno explained.

After consulting with Honeycomb, OneFootball decided to start using Honeycomb’s Refinery, which significantly simplified their architecture. With Refinery, OneFootball no longer needs separate fleets of load balancer Collectors and standard Collectors. Instead, a single fleet of Collectors now forwards traces to Refinery instances, which handle the sampling logic.

Bruno highlighted how this shift improved usability: “For developers, the process feels like head sampling because they set sample rates on the application side with intuitive attributes. But it’s actually tail sampling under the hood, meaning we still capture all the errors and slow traces while aggressively sampling successful ones. This approach with Honeycomb’s Refinery dramatically reduced our costs while retaining visibility.”

Honeycomb’s extrapolation feature further amplified these benefits. By accounting for sample rates in its aggregations, it allowed OneFootball to maintain critical insights, such as traffic patterns, P99 latency, and error rates, without needing to process every single trace.

Refinery’s extrapolation feature was key. It enabled us to cut costs without losing visibility into performance and error data. Plus, with Honeycomb, our developers can still drill into traces and use features like BubbleUp Anomaly Detection for detailed analysis,” Bruno added.

This tailored approach to tail sampling not only reduced operational complexity but also empowered OneFootball to deliver scalable observability that aligned with their cost-saving goals.


Interested in learning more? Book a call with our experts.


OneFootball and Honeycomb Logos

At a glance

About

With over 100 million fans tuning in each month, OneFootball has become the go-to platform for live scores, streaming, statistics, and news covering 200 leagues worldwide. From their newsroom in Berlin, Germany, OneFootball delivers content in 12 languages, making sure fans stay connected to the game no matter where they are. Since 2008, OneFootball has transformed from a budding app into a global hub for “everything football.”

Industry

Mobile sports broadcasting

Products

Honeycomb platform

Use cases

Incident debugging 

Application performance monitoring

Continuous delivery

Results

  • Reduced observability costs by over 80%, cutting annual spend after transitioning to Honeycomb
  • Proactively identified 45 incidents in the last five months using Honeycomb SLOs, giving the team the power to address issues before they impact users
  • Reduced investigation time with Honeycomb’s distributed tracing, allowing engineers to pinpoint root causes more efficiently
  • Eliminated the need for a staging environment, saving $90k in costs while achieving continuous delivery supported by Honeycomb SLOs
  • Reduced infrastructure complexity and operational costs by implementing Honeycomb’s Refinery for head and tail sampling
Don’t forget to share!
Brian Chang

Brian Chang

Senior Customer Marketing Manager

Brian Chang is a Senior Customer Marketing Manager with over 12 years of hands-on customer and product marketing experience. Has been previously worked at Pivotal / VMware and Red Hat in this capacity, where he has built and launched two successful customer advocacy programs that are still active and profitable today. Prior to that, Brian has had previous success at Snapfish, Sun Micro, AT&T Broadband & CNN.

Related posts