Monitoring Kafka Consumer Lag – Part 2

With any new, fast-moving technology stack such as Kafka, monitoring and operational tools are often a step behind or missing significant functionality. But a couple of robust open source projects are available and can be made to work in specific circumstances. One such tool is Burrow from LinkedIn, written [...]

November 18th, 2016 | Analytics, Data Processing Frameworks

Optimizing Alerts on Free space on disks using Machine Learning

The available space on the disk (diskfree) has a significant and often catastrophic impact on applications and services running on the system. For this reason, every DevOps engineer knows that it is crucial to carefully monitor disk usage in all critical systems, especially ones that tend to rapidly use up disk space, such as heavily [...]

November 10th, 2016 | Analytics, Data Science, Machine Learning

Leveraging AI and Machine Learning to tease out Seasonality patterns

The standard models, such as SMA, EWMA, etc., fail in the presence of trend or seasonality conditions. We have seen the effect of trend with metrics representing queue size. Many metrics representing important business concerns also exhibit strong seasonal behavior. For example, the number of active users on an e-commerce site shows both daily and [...]

Leveraging Anomaly Detection to Monitor for Errors

Metrics that represent errors pose a special problem. "Error" is often used in a generic sense: it could imply something very serious, where any non-zero value warrants investigation (e.g., 5xx errors), or it could represent something that has an acceptable baseline value but where an unusual change could indicate serious problems, e.g., 4xx errors, page faults, [...]

October 23rd, 2016 | Analytics, Data Science, Machine Learning

Queue Length in Messaging Systems – Kafka, SQS, etc.

The use of message queues/brokers is ubiquitous in any real-time application. Such intermediate modules (e.g. Apache Kafka, RabbitMQ, AWS SQS, etc.) improve system reliability by decoupling the producers from the consumers, thus freeing them from any synchronization requirements. A primary operational concern with such systems is whether the consumers are keeping up with the producers, [...]

October 21st, 2016 | Analytics, Data Science, Machine Learning

Computing Service Health for Distributed and Dynamic Applications

In this post, we will talk about how the OpsClarity platform views operational health. Operational health is the answer to the question, “are my applications performing as expected, both at this point in time and in the immediate future?” If so, we can say that an application is “healthy,” and if not, the application [...]

September 21st, 2016 | Analytics, Web-scale Application Monitoring

Monitoring Serverless Architectures, Microservices and Containerized Applications

Application architectures and deployment paradigms typically evolve every 5 to 10 years. We are in the midst of a major evolutionary change right now. Modern applications increasingly rely on stateless microservices, are often paired with stateful data services (like NoSQL, Kafka, Hadoop, etc.), and are deployed on containers or leverage serverless architectures. As the [...]

September 14th, 2016 | Analytics, Web-scale Application Monitoring

Monitoring Docker, Part Two – The science of auto-discovery advances container monitoring

This is Part Two of our two-part blog post on the challenges of monitoring containerized applications. In Part One we discussed the basic requirements, considerations, and components involved in a production-ready Docker setup. We touched upon the ephemeral nature of containers, the importance of collecting application and container metrics, and the ever-increasing cost of configuration [...]

August 27th, 2016 | Analytics, Integrations

2016 State of Fast Data and Streaming Applications Survey

As businesses continue to increase their involvement in the digital economy, the need for exponentially faster understanding of market changes, customer responses, and system performance has become abundantly clear in 2016. The survey that OpsClarity released today reported that 92% of business and technology professionals employed in both small and large technology, retail, healthcare, finance and [...]

June 28th, 2016 | Analytics

Building & Monitoring a Near Real-time Streaming Pipeline

(This is a guest post from Jayant Shekhar, CEO of, a company focused on building and deploying machine learning and large-scale applications.) Combining Kafka and Spark Streaming with a NoSQL database is a common and increasingly popular pattern for Near Real-Time (NRT) analytics. NRT is empowering enterprises across various verticals such [...]

June 5th, 2016 | Analytics, Integrations