About Uday Chettiar

So far Uday Chettiar has created 7 blog entries.

Optimizing Alerts on Free space on disks using Machine Learning

The amount of free space on a disk (diskfree) has a significant impact on the applications and services running on the system, and running out of it is often catastrophic. For this reason, every DevOps engineer knows that it is crucial to carefully monitor disk usage in all critical systems, especially ones that tend to rapidly use up disk space, such as heavily [...]

November 10th, 2016 | Analytics, Data Science, Machine Learning

Leveraging AI and Machine Learning to tease out Seasonality patterns

The standard models, such as SMA, EWMA, etc., fail in the presence of trend or seasonality conditions. We have seen the effect of trend with metrics representing queue size. Many metrics representing important business concerns also exhibit strong seasonal behavior. For example, the number of active users on an e-commerce site shows both daily and [...]

Leveraging Anomaly Detection to Monitor for Errors

Metrics that represent errors pose a special problem. "Error" is often used in a generic sense: it could imply something very serious, where any non-zero value warrants investigation (e.g., 5xx errors), or it could represent something that has an acceptable baseline value but where an unusual change could indicate serious problems, e.g., 4xx errors, page faults, [...]

October 23rd, 2016 | Analytics, Data Science, Machine Learning

Leveraging data science to detect persistent and unusual changes in Latency

We briefly discussed the issues with monitoring latency in our first post on Anomaly Detection alerts. As seen in the example plots shown below, latency exhibits occasional sharp spikes, where the latency is much higher than the normal range. This is due to a multitude of reasons which are frequently beyond the control of DevOps [...]

October 22nd, 2016 | Uncategorized

Queue Length in Messaging Systems – Kafka, SQS, etc.

Message queues and brokers are ubiquitous in real-time applications. Such intermediate modules (e.g., Apache Kafka, RabbitMQ, AWS SQS) improve system reliability by decoupling the producers from the consumers, thus freeing them from any synchronization requirements. A primary operational concern with such systems is whether the consumers are keeping up with the producers, [...]

October 21st, 2016 | Analytics, Data Science, Machine Learning

Anomaly Detection and Machine Learning Applied to DevOps and Monitoring

In our previous blog post (Drowning in Alerts: Blame it on Statistical Models for Anomaly Detection), we talked about how standard anomaly detection constructs, such as SMA, exponential smoothing, etc., do not work well when it comes to Ops monitoring. At OpsClarity, we view monitoring as a multi-layered activity. For example, you may discover a [...]

October 21st, 2016 | Data Science, Machine Learning

Drowning in Alerts: Blame it on Statistical Models for Anomaly Detection

With the recent spurt of web-scale applications, the problem of monitoring such systems has attracted a lot of attention. Their scale and complexity, coupled with the non-trivial interactions between constituent components, make it extremely challenging to ascertain the health and stability of the various services and components that make up such a [...]

April 25th, 2016 | Analytics, Data Science