Our Product and Engineering teams at OpsClarity have spent the last couple of months adding exciting new capabilities and making major improvements to usability and reliability, along with many bug fixes. In this post we share the major new features, followed by a list of smaller enhancements made along the way.

Automated Apache Kafka Consumer Lag Monitoring

Apache Kafka is increasingly a critical component of messaging infrastructure. Poor performance or degraded health in the Kafka brokers, errant or lagging consumers, or even performance issues in the ZooKeeper cluster can create rippling issues across the entire application stack. While monitoring Kafka brokers and understanding their metrics can itself be challenging, monitoring Kafka consumer lag is a problem that seems to elude most customers. Because monitoring consumer lag requires monitoring the consuming applications, and the way lag is measured can differ with the version of the Kafka broker in use, it has been difficult for DevOps and TechOps teams to find an effective way to track it. OpsClarity has introduced a completely automated monitoring solution for Kafka consumer lag that works across different types of Kafka consumer applications as well as across different versions of Kafka brokers. You can find more information on the OpsClarity Kafka monitoring solution here.
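To make the term concrete: consumer lag for a partition is commonly defined as the distance between the latest offset written to the log and the offset the consumer group has committed. The snippet below is a minimal sketch of that arithmetic only; it is not how OpsClarity collects the data. The function names and the sample offsets are hypothetical, and how the two offset maps are fetched (which is the part that varies by Kafka version) is deliberately left out.

```python
from typing import Dict, Tuple

# (topic, partition) pair; a hypothetical alias used only in this sketch.
TopicPartition = Tuple[str, int]


def consumer_lag(
    log_end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return per-partition lag: log end offset minus committed offset."""
    lag = {}
    for tp, end_offset in log_end_offsets.items():
        # Simplifying assumption: a partition with no committed offset is
        # treated as if the group had committed offset 0.
        committed = committed_offsets.get(tp, 0)
        # Clamp at zero in case the two readings were taken at slightly
        # different times and the commit appears ahead of the log end.
        lag[tp] = max(end_offset - committed, 0)
    return lag


def total_lag(lag: Dict[TopicPartition, int]) -> int:
    """Sum lag across all partitions of a consumer group."""
    return sum(lag.values())


if __name__ == "__main__":
    # Made-up sample offsets for illustration.
    end = {("orders", 0): 120, ("orders", 1): 80}
    committed = {("orders", 0): 100, ("orders", 1): 80}
    per_partition = consumer_lag(end, committed)
    print(per_partition, total_lag(per_partition))
```

A lag that keeps growing on a partition usually means the consumer is falling behind the producers, which is why it is worth tracking per partition as well as in total.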

Automated Monitoring of Containerized Applications

We have added native support for monitoring containers, along with support for CoreOS and for orchestration systems like Kubernetes. Configuration management for your monitoring tool can be challenging in a containerized world: containers move between hosts, and new containers are constantly being spun up. Performing availability checks and collecting metrics for newly created containers can be a futile exercise if you rely on static configuration files to manage your monitoring. At OpsClarity, we not only dynamically discover and track containers (with native integration with Swarm, Kubernetes, and Mesos), but also track the services running inside the containers to give you a top-down view of your services and applications. You can drill down to specific service instances, the containers running those services and the underlying hosts, all within one contextual user interface. You can find more information on our container monitoring solution here.

Additional Features and Improvements

In addition to the features above, we have made hundreds of minor improvements. Some of the key improvements are listed here:

  • Improved anomaly detection models. OpsClarity has the largest library of anomaly detection models covering the whole ecosystem of open-source frameworks. We’ve been learning from customer data and have improved the precision of the models to reduce alerting noise even further
  • Container metrics are tagged by container name, ID, image, hostname and the service the container is running, with full support to slice and dice container metrics by any of these dimensions
  • Easy UI-based controls to manipulate the application topology to match the user’s mental model
  • Improved real-time performance, showing the latest metric and health data for every resource as well as alerts and events
  • More detailed metric collection from Spark standalone and Spark on YARN
  • Host health model improvements that incorporate multiple metrics, including CPU, memory, disk errors, network errors and AWS instance checks
  • New metric indicating the number of active hosts in a given service, so customers can alert when service membership falls below a critical level. This is particularly useful with auto-scaling clusters in AWS
  • Several improvements to the monitor configuration and metric graphing user interface
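
Two of the items above, tag-based slicing of container metrics and alerting on the active host count, can be illustrated with a short sketch. It is only an illustration under stated assumptions: the MetricPoint type, the function names, the tag keys and the sample values are hypothetical and are not the OpsClarity API or data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class MetricPoint(NamedTuple):
    """One metric sample with its tags (hypothetical structure)."""
    name: str
    value: float
    tags: Dict[str, str]


def slice_by(points: Iterable[MetricPoint], dimension: str) -> Dict[str, float]:
    """Average metric values grouped by one tag, e.g. image or hostname."""
    groups = defaultdict(list)
    for point in points:
        # Samples missing the tag are collected under "untagged".
        groups[point.tags.get(dimension, "untagged")].append(point.value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


def membership_below(active_hosts: int, critical_min: int) -> bool:
    """True when a service has fewer active hosts than the critical minimum."""
    return active_hosts < critical_min


if __name__ == "__main__":
    # Made-up CPU samples from three containers on two hosts.
    samples = [
        MetricPoint("cpu", 0.5, {"image": "nginx", "hostname": "host-a"}),
        MetricPoint("cpu", 0.25, {"image": "nginx", "hostname": "host-b"}),
        MetricPoint("cpu", 0.75, {"image": "redis", "hostname": "host-a"}),
    ]
    print(slice_by(samples, "image"))
    print(slice_by(samples, "hostname"))
    # An auto-scaling group that has shrunk to 2 hosts with a floor of 3.
    print(membership_below(active_hosts=2, critical_min=3))
```

Because every sample carries the same set of tags, the same data can be regrouped by image, host or service without collecting it again.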