DBAs may be the last remaining priesthood in our industry. But software engineers and operations engineers are increasingly finding themselves responsible for precious company data, and DBAs are increasingly adopting generalist skillsets and best practices.
Not everyone is thrilled about this
There is a surprising amount of fear and trembling among non-DBA engineers when it comes to managing data, because it has an alien feel and errors can be so permanent.
Honeycomb can help. 🙂 Storage is one of our specialties. We’ve run massive, world-class clusters in MongoDB and MySQL and have extensive experience in plenty of other databases.
This might feel like an oddly practical entry among the other idealistic messages in the list, but we’re incredibly excited about it because there’s such a big gap in the market around this problem set.
We don’t even have to build anything special for it: just log connectors, plus a bunch of examples that show you how to track down which queries are eating up your lock percentages or fighting each other to acquire the lock, when each query family was deployed and what line of code it lives on, and which tables or collections are doing full table scans and are going to tip over a limit soon unless you add an index.
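To make that concrete, here is a minimal sketch of two of those questions asked over parsed database log events. It is plain Python rather than anything Honeycomb-specific, and the field names (`query_family`, `collection`, `lock_held_ms`, `rows_examined`, `rows_returned`) are assumptions about what a log connector might emit, not a documented schema.

```python
from collections import defaultdict


def lock_share_by_family(events):
    """Return (query_family, share_of_total_lock_time) pairs, largest first."""
    totals = defaultdict(float)
    for ev in events:
        totals[ev["query_family"]] += ev.get("lock_held_ms", 0.0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(family, ms / grand_total) for family, ms in ranked]


def likely_full_scans(events, min_ratio=100):
    """Return collections where a query examined far more rows than it returned.

    The examined/returned ratio is only a rough heuristic for a missing index.
    """
    suspects = set()
    for ev in events:
        examined = ev.get("rows_examined", 0)
        returned = max(ev.get("rows_returned", 0), 1)  # avoid dividing by zero
        if examined / returned >= min_ratio:
            suspects.add(ev["collection"])
    return suspects


# Hypothetical events, as they might look after parsing a slow query log.
sample_events = [
    {"query_family": "SELECT * FROM users WHERE email = ?", "collection": "users",
     "lock_held_ms": 30.0, "rows_examined": 50000, "rows_returned": 1},
    {"query_family": "UPDATE carts SET total = ? WHERE id = ?", "collection": "carts",
     "lock_held_ms": 60.0, "rows_examined": 1, "rows_returned": 1},
    {"query_family": "SELECT * FROM users WHERE email = ?", "collection": "users",
     "lock_held_ms": 10.0, "rows_examined": 48000, "rows_returned": 1},
]

if __name__ == "__main__":
    for family, share in lock_share_by_family(sample_events):
        print(f"{share:.0%}  {family}")
    print("possible full scans:", likely_full_scans(sample_events))
```

In this made-up sample, the `UPDATE carts` family holds the most lock time, while `users` is the collection that looks like it needs an index.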
This is going to be so much fun. You’ll see!
Old Way: DBAs exist in silos with their own tools, languages, and alerts that nobody else understands. Nobody can pinch-hit for them, or vice versa, because it feels like a foreign land.
New Way: DBAs are acting more like Database Reliability Engineers, where they share tools and techniques with the other engineering teams. Software engineers and systems engineers frequently find themselves in charge of the company data as part of their normal jobs.
Example:
One of the hardest problems in distributed systems is … “everything’s getting a little bit slower.” When you have hundreds of backend services and storage clusters, where do you even start?
Is it a particular deploy, or a single AWS availability zone, or a particular instance type, or linked to a deploy of an ancillary service? Is it one user dominating the slow queries? Is it only slow during garbage collection? Is there an offline process being launched from cron that’s piling up? Disk saturation, NIC saturation, a bad link on a switch? Is it write endpoints, or read endpoints, or writes to a particular collection sharded across multiple replica sets? Does the slowness correspond to a snapshot process that is still serving read queries? Some intersection of the above?
Who knows? Who cares? Log everything. Just toss it in as more attributes on your data set. 🙂
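Here is a rough sketch of why the extra attributes pay off: once every event carries them, you can slice latency by each one and see which dimension actually separates fast from slow. The attribute names (`availability_zone`, `build_id`, `duration_ms`) and the sample values are hypothetical, and comparing group medians is just one simple way to do this, not a description of how Honeycomb does it.

```python
from collections import defaultdict
from statistics import median


def median_duration_by(events, attribute):
    """Group events by one attribute and return the median duration per value."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev.get(attribute, "unknown")].append(ev["duration_ms"])
    return {value: median(durations) for value, durations in groups.items()}


def widest_spread(events, attributes):
    """Return (attribute, spread) for the attribute whose group medians differ most.

    Spread is the gap between the slowest and fastest group median.
    Returns None when there are no events to compare.
    """
    best = None
    for attr in attributes:
        medians = median_duration_by(events, attr)
        if not medians:
            continue
        spread = max(medians.values()) - min(medians.values())
        if best is None or spread > best[1]:
            best = (attr, spread)
    return best


# Hypothetical events: each one carries every attribute we thought to log.
sample_events = [
    {"availability_zone": "us-east-1a", "build_id": "b41", "duration_ms": 12},
    {"availability_zone": "us-east-1b", "build_id": "b41", "duration_ms": 14},
    {"availability_zone": "us-east-1a", "build_id": "b42", "duration_ms": 95},
    {"availability_zone": "us-east-1b", "build_id": "b42", "duration_ms": 105},
]

if __name__ == "__main__":
    print(widest_spread(sample_events, ["availability_zone", "build_id"]))
```

With this sample, the availability zones look nearly identical while the two builds are far apart, so the deploy is the first place to look.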
Data isn’t a silo
Data doesn’t have to be a silo anymore. You can all use each other’s tools. It’s soooo much easier and more friendly when it feels familiar. And it hooks right back into your instrumented code, too.
With Honeycomb, you can take your familiarity with debugging your own code or stateless services and transfer it directly to finding outliers and issues in your storage systems, just by tailing your database logs into Honeycomb.
Unlike metric-based analytics, with Honeycomb you always have access to the original raw query and associated events, not just the normalized query family. This is more powerful than it gets credit for: once you’ve tried it, you will never, ever want to live without it.
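As an illustration of the difference, here is a small sketch that derives a query family from a raw SQL string while keeping the raw query on the same event. The regex-based normalizer is deliberately crude and is an assumption for the example, not Honeycomb’s actual normalization; the event field names are hypothetical too.

```python
import re

# Crude literal matchers; a real normalizer would use a proper SQL parser.
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")
_WHITESPACE = re.compile(r"\s+")


def query_family(raw_query):
    """Replace string and numeric literals with '?' and collapse whitespace."""
    family = _STRING_LITERAL.sub("?", raw_query)
    family = _NUMBER_LITERAL.sub("?", family)
    return _WHITESPACE.sub(" ", family).strip()


def to_event(raw_query, duration_ms):
    """Build one event that keeps both the raw query and its family."""
    return {
        "raw_query": raw_query,
        "query_family": query_family(raw_query),
        "duration_ms": duration_ms,
    }


if __name__ == "__main__":
    event = to_event("SELECT *  FROM users WHERE id = 42 AND email = 'a@b.com'", 18.5)
    print(event["query_family"])
    print(event["raw_query"])
```

Grouping by `query_family` tells you which kind of query is slow; the `raw_query` on the same event tells you exactly which ID or value made it slow.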