Explore how a Kafka health check helped Tungsten Automation manage Kafka at scale

Discover how Tungsten Automation successfully reduced risk with the help of a Kafka health check.

Background: Meet Tungsten Automation, a global leader in intelligent workflow automation 

When we met Tungsten Automation, their team was searching for an experienced cloud technology consultancy to help support their Apache Kafka deployment on Azure. Tungsten Automation supports more than 25,000 customers around the world, helping them make their companies and workflows more efficient, cost-effective, and automated with up-to-date technology. To keep doing that, they needed their systems to be even more agile, flexible, and able to support a truly global customer base.

Challenge: How do you address recurring challenges with Apache Kafka cluster performance and reliability? 

At the start of the project, Tungsten Automation had just acquired a new software provider and inherited its Azure production environment. They loved what the software provider offered, but they needed to secure and scale up the existing Kafka environment to meet the needs of their much larger customer base. They had no formal documentation and no in-depth understanding of how to operate Kafka.

They had already tried to scale up the Kafka implementation, but without expert support, the Tungsten team had run into issues with the Kafka cluster’s performance and the setup’s overall reliability. As it stood, they were facing one to two outages a week, each lasting as long as six hours. That was far from ideal if they wanted to continue providing high-level 24/7 service to an ever-expanding base of customers.

Solution: We conducted a Kafka health check, stabilised Tungsten’s environment, then stayed on for ongoing Kafka support and maintenance

Our team brought in outside Kafka technical expertise and rapidly identified a few critical, high-impact tasks that Tungsten’s team could complete to stabilise the platform and mitigate the risk of downtime or sudden outages. For example, we recommended they create a reusable CI/CD pipeline, iteratively deploy new changes via infrastructure-as-code, and use Terraform to upgrade Tungsten’s versions of ZooKeeper and Kafka in a testable, controllable way.

Each part of our Kafka health check examined how the Tungsten Automation team could most effectively tweak their current system to make it more reliable, cost-efficient, and scalable. We helped them reduce the complexity of the system by cutting the number of ZooKeeper nodes they used from seven to three, making the overall system easier and less costly to maintain. With the ZooKeeper and Kafka version upgrades, we also cut the number of operational tasks they needed to carry out.
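
To give a rough sense of the trade-off in ensemble size: ZooKeeper stays available only while a majority of its nodes are up. The short Python sketch below is purely illustrative and is not the tooling used on the project. The function names `quorum_size` and `tolerated_failures` are hypothetical, and the only inputs are the node counts mentioned above.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Nodes that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare the before and after ensemble sizes from the case study.
for nodes in (7, 3):
    print(
        f"{nodes} nodes: quorum of {quorum_size(nodes)}, "
        f"tolerates {tolerated_failures(nodes)} failed node(s)"
    )
```

A seven-node ensemble needs four nodes for a quorum and tolerates three failures, while a three-node ensemble needs two and tolerates one. The smaller ensemble trades some failure headroom for fewer machines to patch, monitor, and upgrade.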

Finally, we set up Tungsten’s team with repeatable Terraform templates, resolved Java errors, and upgraded the enterprise’s Kafka setup. This phase set the Tungsten team up for success now and into the future.

Results: Tungsten Automation reduced risk and streamlined Kafka at scale

With the Kafka health check changes in place, Tungsten Automation radically reduced its exposure to risk. We had iterated through improvements to steadily de-risk and stabilise the deployment, giving Tungsten’s board members and technology leads greater confidence. They no longer had stressful cluster outages throughout the week, and enough of the platform issues had been ironed out that they could look ahead to the bigger picture: shifting to a managed Kafka service so they could focus on other parts of the business.

We stayed on after the completion of the project to help Tungsten resolve any final details, but with many of the challenges fixed during the project, Tungsten Automation’s team was set to continue doing what they did best: providing world-class service and workflow automation to customers and companies around the globe.

Learn More: Learn about a Kafka health check!

Hey there! We’re OSO, a leading provider of Apache Kafka consulting, development, and support services. Our team of Kafka experts helps organisations around the world design, build, and operate scalable, reliable Kafka architectures. We’ve implemented Kafka solutions for public organisations, enterprise companies, and startups, and we integrate easily into each team’s engineering operations, from daily standups to messages on Slack.
