Explore how a Kafka health check helped Tungsten Automation manage Kafka at scale
Discover how Tungsten Automation successfully reduced risk with the help of a Kafka health check.
5 August 2024 · 8 mins read
Background: Meet Tungsten Automation, a global leader in intelligent workflow automation
When we met Tungsten Automation, their team was searching for an experienced cloud technology consultancy to help support their Apache Kafka deployment on Azure’s cloud platform. They currently support more than 25,000 customers around the world, helping them make their companies and workflows more efficient, cost-effective, and automated with up-to-date technology, and they needed to make their systems even more agile, flexible, and able to support a truly global customer base.
Challenge: How do you address recurring challenges with Apache Kafka cluster performance and reliability?
At the start of the project, Tungsten Automation had just acquired a new software provider and inherited an Azure production environment. They loved what the software provider offered, but they needed to secure and scale up the existing Kafka environment to meet the needs of their much larger customer base, and they didn’t have any formal documentation or in-depth understanding of how to operate Kafka.
They had already tried to scale up the Kafka implementation, but without expert support, the Tungsten team had run into issues with the Kafka cluster’s performance and the setup’s overall reliability. As it stood, they were facing one to two outages a week, each lasting as long as six hours. That was far from ideal if they wanted to continue providing high-level 24/7 service to an ever-expanding base of customers.
Solution: We conducted a Kafka health check, stabilised Tungsten’s environment, then stayed on for ongoing Kafka support and maintenance
Our team brought in outside Kafka technical expertise and rapidly identified a few critical high-impact tasks that Tungsten’s team could complete to stabilise the platform and mitigate the risk of downtime or sudden outages. For example, we recommended they create a reusable CI/CD pipeline, iteratively deploy new changes via infrastructure-as-code, and upgrade Tungsten’s versions of ZooKeeper and Kafka in a testable, controllable way by using Terraform.
Each part of our Kafka health check examined how the Tungsten Automation team could most effectively adjust their current system to make it more reliable, cost-efficient, and scalable. We helped them reduce the complexity of the system by cutting the number of ZooKeeper nodes from seven to three, making the overall system easier and less costly to maintain, and we cut the number of operational tasks they needed to carry out through the ZooKeeper and Kafka version upgrades.
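To put the move from seven to three nodes in context, here is a minimal sketch of the majority-quorum arithmetic that a ZooKeeper ensemble relies on. The function names are our own illustration and are not part of any ZooKeeper or Kafka API; the sketch only assumes the standard rule that an ensemble stays available while a strict majority of its nodes is up.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given node count."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare the original seven-node ensemble with the three-node one.
for nodes in (7, 3):
    print(
        f"{nodes} nodes: quorum of {quorum_size(nodes)}, "
        f"tolerates {tolerated_failures(nodes)} node failure(s)"
    )
```

A three-node ensemble still survives the loss of one node, while leaving far fewer machines to patch, monitor, and upgrade.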
Finally, we set up Tungsten’s team with repeatable Terraform templates, resolved Java errors, and upgraded the enterprise’s Kafka setup. This phase set the Tungsten team up for success now and into the future.
Results: Tungsten Automation reduced risk and streamlined Kafka at scale
With the Kafka health check changes in place, Tungsten Automation radically reduced its exposure to risk. We had iterated through improvements to steadily de-risk and stabilise the deployment, giving Tungsten’s board members and technology leads greater confidence. They no longer had stressful cluster outages throughout the week, and enough of the platform issues had been ironed out that they could look ahead to the bigger picture: shifting to a managed Kafka service so they could focus on other parts of the business.
We stayed on after the completion of the project to help Tungsten resolve any final details, but with many of the challenges fixed during the project, Tungsten Automation’s team was set to continue doing what they did best: providing world-class service and workflow automation to customers and companies around the globe.
Learn More: Learn about a Kafka health check!
Hey there! We’re OSO, a leading provider of Apache Kafka consulting, development, and support services. Our team of Kafka experts helps organisations design, build, and operate scalable, reliable Kafka architectures, and we work with companies around the world. We’ve implemented Kafka solutions for public organisations, enterprise companies, and startups, and we integrate easily into each team’s engineering operations, from daily standups to messages on Slack.