
Tackling Kubernetes Observability Challenges with Pixie

Over the past several years, the software industry has seen a boom in the adoption of containers and container orchestration technologies such as Kubernetes.

Kubernetes provides flexibility and control over workload orchestration. However, due to its complex and distributed nature, Kubernetes workloads require robust and effective observability (monitoring and tracing) workflows and remediation mechanisms (tools and the actions they drive) across infrastructure and applications.

There are many open-source observability tools available for Kubernetes troubleshooting, such as Pixie, Robusta, Prometheus, Jaeger, Kubewatch, and the EFK stack. In this blog series, we’ll take a closer look at some of these tools.

This is the first blog in the Pixie series, where we’ll introduce the Pixie software at a high level and work through an example. Later in the series, we’ll take a closer look at how Pixie helps us implement robust observability for Kubernetes clusters, explore Pixie’s internal architecture, and walk through tutorials and use cases to get you started with Pixie.

Bugs are inevitable

K8s infrastructure bugs are complex. More often than not, DevOps engineers and SREs run into issues that are extremely hard to trace back to their original source. To proactively monitor the cluster for such issues, we need robust “observability” and a “mechanism” to act on (remediate) the observations.

Today we will focus on the former, i.e., observability for K8s infrastructure. Open-source observability tools have gained popularity in this space, and there are a few things to consider when choosing one: resource consumption, security, ease of use, and native k8s integration.

In the next section, we will see why debugging on k8s is challenging and how Pixie can simplify some of those challenges.

Debugging on Kubernetes is hard

K8s comes with its own challenges, including observability (infra- and application-aware observability) and the actions it should drive. The dynamic nature of K8s resources and workloads compounds this challenge.

Observing applications (and infra) requires a level of strategy and execution that is not trivial. Add a microservices-based architecture and an infrastructure orchestration engine (K8s), and you have got yourself a real-time distributed system with complex app- and infra-aware observability needs.

Let’s imagine an HTTPS call to an API endpoint hosted on a Kubernetes cluster deployed on a cloud provider’s infrastructure. Here is a simplified diagram representing the communication chain for the request.

At any point in this communication chain, things could go south. For example, we might experience a network failure, an application failure, or performance degradation. To promptly identify and resolve the issue, we need to know what is happening in the cluster and have detailed insight into every step of the communication.

Now that we understand the need for granular observability in our Kubernetes cluster, the question is where and how to implement it.

Introducing Pixie

Pixie is an open-source observability platform for Kubernetes clusters, powered by eBPF. Users can view the high-level state of their cluster (service maps, cluster resources, application traffic) and drill down into more detailed views (pod state, flame graphs, individual full-body application requests). With Pixie, there is no need to manually instrument your code, or to redeploy or restart your services. Pixie’s plug-and-play architecture enables powerful, out-of-the-box observability at a granular level.

Pixie provides observability across the Kubernetes cluster, including protocol traces, resource and network metrics, and even application CPU profiles (currently supported for compiled languages: Go, Rust, and C/C++).

As an illustration, let’s look at a sample K8s application and observe some HTTP metrics via Pixie. The sample application, “px-sock-shop,” is a web application where users can buy socks. It is deployed on a local minikube cluster and has a service named “catalog” which exposes REST APIs. We will cover the deployment of the sample application in a separate blog.

Now let’s see how Pixie automatically exposes HTTP metrics for the calls made to the catalog service.
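Under the hood, these metrics are derived from Pixie’s `http_events` table, which you can also query directly using PxL, Pixie’s Python-dialect scripting language. Here is a minimal sketch, assuming a service whose name contains “catalog” and an illustrative five-minute window:

```python
# A minimal PxL sketch: list recent HTTP calls made to the catalog service.
import px

# http_events is Pixie's table of auto-traced HTTP requests/responses.
df = px.DataFrame(table='http_events', start_time='-5m')

# Annotate each event with the Kubernetes service it belongs to.
df.service = df.ctx['service']

# Keep only traffic for the catalog service.
df = df[px.contains(df.service, 'catalog')]

px.display(df[['time_', 'req_method', 'req_path', 'resp_status', 'latency']])
```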

Pixie comes with out-of-the-box HTTP request metrics for the deployed services: HTTP request rate, error rate, latency, and throughput. Let’s take a closer look at each of these metrics for the “catalog” service of our sock-shop application.

Request Rate – This metric shows how many requests our application receives at a given time, i.e., the load the catalog service is experiencing. The request rate can help us make effective scaling decisions for the service.
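To make this concrete, here is a hedged PxL sketch of how a request-rate number could be derived from the same `http_events` table; the 10-second window size and the “catalog” filter are illustrative choices, not Pixie defaults:

```python
# A sketch of computing request rate (requests/second) for the catalog service.
import px

df = px.DataFrame(table='http_events', start_time='-5m')
df.service = df.ctx['service']
df = df[px.contains(df.service, 'catalog')]

# Bucket events into 10-second windows, then count requests per window.
df.timestamp = px.bin(df.time_, px.seconds(10))
per_window = df.groupby(['service', 'timestamp']).agg(
    num_requests=('latency', px.count),
)

# Divide the per-window count by the window size to get requests/second.
per_window.request_rate = per_window.num_requests / 10.0
px.display(per_window[['timestamp', 'service', 'request_rate']])
```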

Error Rate – Real-time tracking of errors is one of the most crucial aspects of running a production environment, and effective error-rate tracking is directly linked to tracking application availability and performance. Visibility into error rates allows us to proactively take remediating action and generate alerts. The figure below shows a snapshot of simulated errors captured via Pixie.
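As a sketch of how such a number can be derived, the PxL below flags responses with 4xx/5xx status codes and averages that flag per service (treating status >= 400 as a failure is our own assumption for this example):

```python
# A sketch of computing the error rate for the catalog service.
import px

df = px.DataFrame(table='http_events', start_time='-5m')
df.service = df.ctx['service']
df = df[px.contains(df.service, 'catalog')]

# Flag each request as failed if the HTTP status is 4xx/5xx.
df.failure = df.resp_status >= 400

# The mean of the failure flag is the fraction of failed requests.
per_service = df.groupby(['service']).agg(
    error_rate=('failure', px.mean),
)
per_service.error_rate_pct = per_service.error_rate * 100.0
px.display(per_service[['service', 'error_rate_pct']])
```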

Latency – The latency metric gives us visibility into application performance. Latency information can be used to configure alerts that fire when a set threshold is breached, kicking off troubleshooting and timely remediation of the underlying problem.
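A similar hedged PxL sketch can pull latency percentiles for the service; we assume the `latency` column is recorded in nanoseconds and convert to milliseconds for readability:

```python
# A sketch of summarizing catalog-service latency with percentiles.
import px

df = px.DataFrame(table='http_events', start_time='-5m')
df.service = df.ctx['service']
df = df[px.contains(df.service, 'catalog')]

# px.quantiles summarizes the latency distribution (p10/p50/p90, ...).
per_service = df.groupby(['service']).agg(
    latency_quantiles=('latency', px.quantiles),
)

# Extract the median and convert from nanoseconds to milliseconds.
per_service.latency_p50_ms = px.pluck_float64(per_service.latency_quantiles, 'p50') / 1.0e6
px.display(per_service[['service', 'latency_p50_ms']])
```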

Pixie doesn’t support alerts yet, but there are ways to set them up, e.g., using the Slack bot integration or exporting Pixie data in OpenTelemetry format. We will cover those in upcoming blogs.
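For the OpenTelemetry route, recent Pixie versions ship a PxL export API. The sketch below is a rough illustration of that API: the metric name is a placeholder of ours, and we assume the collector endpoint is supplied by Pixie’s export plugin configuration rather than hardcoded in the script:

```python
# A rough sketch of exporting a Pixie-derived metric via OpenTelemetry.
# Assumes the OTel collector endpoint is configured via Pixie's export plugin.
import px

df = px.DataFrame(table='http_events', start_time='-5m')
df.service = df.ctx['service']
df.timestamp = px.bin(df.time_, px.seconds(10))
df = df.groupby(['service', 'timestamp']).agg(
    num_requests=('latency', px.count),
)
df.request_rate = df.num_requests / 10.0
df.time_ = df.timestamp  # the export API reads timestamps from time_

px.export(df, px.otel.Data(
    resource={'service.name': df.service},
    data=[
        px.otel.metric.Gauge(
            name='http.request_rate',  # placeholder metric name
            value=df.request_rate,
        ),
    ],
))
```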

Moreover, we can dive deeper and get an even closer look at the granular details of an HTTP request inbound to the service, alongside the response body for that request.
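Since the same `http_events` table carries the full request and response bodies, a small tweak to the earlier query surfaces them; a minimal sketch, again assuming a service matching “catalog”:

```python
# A sketch: inspect full request/response bodies for the catalog service.
import px

df = px.DataFrame(table='http_events', start_time='-5m')
df.service = df.ctx['service']
df = df[px.contains(df.service, 'catalog')]

px.display(df[['time_', 'req_method', 'req_path', 'req_body', 'resp_status', 'resp_body']])
```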

Conclusion

In this post, we saw how Pixie enables us to implement observability for an example Kubernetes workload. Pixie has gained a lot of traction in the Kubernetes DevOps community due to its ease of setup, user-friendliness, and tracing and observability capabilities. However, this is just the tip of the iceberg; there’s a lot more to come.

Upcoming blogs

In the next few blog posts, we’ll discuss Pixie architecture and how it uses eBPF. We will also go through the Pixie installation and deployment procedure and cover other protocols and use cases such as MySQL, Redis, Cassandra, and Kafka. See you there!
