Learn how to monitor Redpanda using Datadog's built-in Redpanda integration.
Datadog is infrastructure-monitoring software with alerting, metric visualization, anomaly detection, log management, and continuous integration (CI) capabilities. Datadog has 500+ built-in integrations that cover the most frequently used applications and services, including cloud providers, databases, and messaging systems. Redpanda is one of these integrations.
This tutorial shows you how to use Datadog's Redpanda integration to monitor key metrics and metric groups, with step-by-step instructions you can follow along with. You can access all code used in the demo in this GitHub repo.
Datadog as a monitoring tool
Datadog’s web user interface for developers has a side navigation menu with many capabilities, including:
- Dashboards for a compact overview of the application and service metrics.
- Integration options for an easier way to set up your application and service integrations.
- Monitors to define alerts and notifications based on metrics, availability, or log patterns.
- An Events Explorer to display the events generated by integrated infrastructure and services.
To combine metrics and logs from a variety of applications and services, Datadog uses Datadog Agent, a type of software that runs on your hosts and collects events and metrics from applications and services, and forwards them to Datadog.
Datadog Agent can be installed on a host or in a containerized environment.
This tutorial provides step-by-step instructions on how to set up the Redpanda integration on Datadog Agent, using a docker-compose setup to simplify the instructions. You'll also see examples of frequently used metrics and their visualizations.
Prerequisites
You’ll need the following for this tutorial:
- A Datadog account
- Your Datadog API key (`DD_API_KEY`)
- The name of your Datadog site (`DD_SITE`)
- A machine with Docker and docker-compose installed

To get the `DD_API_KEY`, which is required for submitting metrics and events to Datadog, navigate to Organization Settings under your user tab, where you can find your organization's API keys and create a new one by clicking +New Key.
You can follow the instructions on Datadog’s API and Application Keys to learn about Datadog API management.
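Once you have a key, a quick way to confirm it works is to submit a test metric to Datadog's v1 series endpoint. The sketch below only builds the request body; the endpoint path follows Datadog's public metrics API, and the metric name `redpanda.tutorial.test` is made up for illustration:

```python
import time

def build_series_payload(metric: str, value: float, host: str) -> dict:
    """Build the JSON body Datadog's v1 `POST /api/v1/series` endpoint expects."""
    return {
        "series": [
            {
                "metric": metric,
                "points": [[int(time.time()), value]],  # one (timestamp, value) sample
                "type": "gauge",
                "host": host,
            }
        ]
    }

# Usage (requires the `requests` package and your real DD_API_KEY / DD_SITE):
# import os, requests
# resp = requests.post(
#     f"https://api.{os.environ['DD_SITE']}/api/v1/series",
#     headers={"DD-API-KEY": os.environ["DD_API_KEY"]},
#     json=build_series_payload("redpanda.tutorial.test", 1.0, "demo-host"),
# )
```

A 202 response from Datadog indicates the key was accepted.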
Monitoring Redpanda with Datadog
A real-time event streaming setup typically has three main services: consumers, producers, and the event streaming platform (middleware). With a `docker-compose.yml` file, it's possible to simulate this multi-component setup.
The producer streams event data to Redpanda, a Dockerized Datadog Agent monitors Redpanda through the Agent integration, and the consumer consumes the produced streaming events. To see how the components relate to each other, it's helpful to go over each one. The following sections explain each of the components.
Start with Docker Compose
You can find the docker-compose setup in the demo repository. After cloning the repository, set up your `.env` file at the same level as your `docker-compose.yml`.
The `.env` file in this tutorial includes two variables: `TOKEN` for reading data from an API and `DATADOG_API_KEY` for your Datadog API key:

```
TOKEN=<API_TOKEN>
DATADOG_API_KEY=<DD_API_KEY>
```
The `TOKEN` is used by the producer service as it reads real-time data from the Coinranking API. Coinranking is a cryptocurrency information web application whose API provides real-time information about cryptocurrencies. You can create a Coinranking account to obtain a `TOKEN`. If you wish to use a different type of data set, you can also create your own producer and consumer using another real-time data API.
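As a sketch of how the producer's API call might look (the endpoint and the `x-access-token` header follow Coinranking's public v2 API; treat both as assumptions to verify against their docs):

```python
# Assumed Coinranking v2 endpoint for listing coins
COINRANKING_COINS_URL = "https://api.coinranking.com/v2/coins"

def build_coinranking_request(token: str) -> tuple[str, dict]:
    """Return the URL and auth headers for a Coinranking coins request."""
    return COINRANKING_COINS_URL, {"x-access-token": token}

# Usage (requires the `requests` package and a real TOKEN):
# import os, requests
# url, headers = build_coinranking_request(os.environ["TOKEN"])
# coins = requests.get(url, headers=headers).json()["data"]["coins"]
```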
After setting up your `.env` file, you can run the following command to start your multi-component application in detached mode:

```shell
docker-compose up -d
```
Set up Redpanda
Setting up Redpanda in a Docker container first requires pulling the image. Then, you need to start the Redpanda service with the configuration parameters explained below.
You don't need to change the parameters. However, it's useful to know why the docker-compose Redpanda service uses each parameter in the command:
- `--overprovisioned` limits the resources. This is for Docker usage.
- `--node-id` defines the unique ID for a node in a Redpanda cluster.
- `--check` enables/disables the checks performed at startup.
- `--kafka-addr` defines the internal Apache Kafka® API address and port.
- `--advertise-kafka-addr` defines the external Kafka API address and port. The consumer is expected to use the external address.
- `--pandaproxy-addr` defines the internal Redpanda REST API address and port.
- `--advertise-pandaproxy-addr` defines the external Redpanda REST API address and port.
- `redpanda.enable_transactions` enables transactions.
- `redpanda.enable_idempotence` enables the idempotent producer. These two optional configurations must be enabled explicitly.
- `redpanda.auto_create_topics_enabled` enables automatic topic creation.
To see more options, check out the Redpanda custom configuration documentation.
Since this tutorial provides you a Redpanda instance in a Docker container, having Datadog’s log collection labels might be useful for log management. You can retrieve these with the following code:
```yaml
---
version: "3.7"
services:
  redpanda:
    image: docker.redpanda.com/vectorized/redpanda:v22.1.4
    container_name: redpanda
    networks:
      - monitoring
    command:
      - redpanda start
      - --overprovisioned
      - --smp 1
      - --memory 1G
      - --reserve-memory 0M
      - --node-id 0
      - --check=false
      - --kafka-addr 0.0.0.0:9092
      - --advertise-kafka-addr redpanda:9092
      - --pandaproxy-addr 0.0.0.0:8082
      - --advertise-pandaproxy-addr redpanda:8082
      - --set redpanda.enable_transactions=true
      - --set redpanda.enable_idempotence=true
      - --set redpanda.auto_create_topics_enabled=true
    labels:
      com.datadoghq.ad.logs: '[{"source": "redpanda", "service": "redpanda_cluster"}]'
      com.datadoghq.ad.check_names: '["redpanda"]'
    ports:
      - 9092:9092
      - 8081:8081
      - 8082:8082
      - 9644:9644
```
In the above code, labels are defined for log collection configuration from the container:

- `com.datadoghq.ad.logs: '[{"source": "redpanda", "service": "redpanda_cluster"}]'` enables Datadog Agent to see the Redpanda container logs with the specified source and service. This can make it easier to create a custom log manager for the Redpanda service.
- `com.datadoghq.ad.check_names: '["redpanda"]'` enables the auto-discovery feature for the container named `redpanda`.
Note: Running Redpanda directly via Docker is not supported for production usage; use this setup only for testing. Additionally, the Redpanda integration currently does not provide a default log management pipeline. However, you can create your own custom log parsing pipeline for Redpanda clusters; see the Datadog documentation for more on creating custom log parsing pipelines.
Set up Datadog
The Datadog Agent is installed in a Docker container:
```dockerfile
FROM gcr.io/datadoghq/agent:7
COPY tools/datadog/datadog.yaml /etc/datadog-agent/datadog.yaml
COPY tools/redpanda/configuration.yaml /etc/datadog-agent/conf.d/redpanda.d/conf.yaml
LABEL "com.datadoghq.ad.logs"='[{"source": "agent", "service": "datadog_agent"}]'
RUN agent integration install -r -t datadog-redpanda==1.1.0
```
In the above code, there are a few important steps:

- `datadog.yaml` is copied into the Datadog Agent configuration path (`/etc/datadog-agent/datadog.yaml`).
- `configuration.yaml` is copied into the Redpanda integration path (`/etc/datadog-agent/conf.d/redpanda.d/conf.yaml`).
- The container label is defined as `"com.datadoghq.ad.logs"='[{"source": "agent", "service": "datadog_agent"}]'`.
- Lastly, the `datadog-redpanda` integration is installed with an Agent command.
The `datadog.yaml` file should look like the following:

```yaml
logs_enabled: true
site: datadoghq.eu
```
As you can see, logs are enabled and the Datadog site is defined as the EU location.
The Redpanda configuration `configuration.yaml` for Datadog is as follows:

```yaml
instances:
  - openmetrics_endpoint: http://redpanda:9644/metrics
    use_openmetrics: true
logs:
  - type: journald
    source: redpanda
```
Note: This tutorial is using an older version of Redpanda. Check the Monitoring Docs for the most up-to-date endpoint.
In the above, the `instances` tag maps the node metrics endpoint to Datadog as an `openmetrics` endpoint. The `logs` tag enables collecting logs from the source named `redpanda`.
For more optional parameters available in `configuration.yaml`, you can see the `configuration.example.yml` in this GitHub repo.
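To get a feel for what the agent scrapes from `http://redpanda:9644/metrics`, here's a minimal sketch that parses one Prometheus/OpenMetrics-format sample line. The metric name and label in the example are illustrative, not an exact Redpanda metric:

```python
import re

# name, optional {label="value",...} block, then the sample value
SAMPLE_RE = re.compile(r'^(?P<name>[\w:]+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_sample(line: str):
    """Split a Prometheus exposition line into (metric name, label dict, float value)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = {}
    if m.group("labels"):
        for pair in m.group("labels").split(","):
            key, _, val = pair.partition("=")
            labels[key] = val.strip('"')
    return m.group("name"), labels, float(m.group("value"))

# Example line in the shape the endpoint exposes (illustrative metric name):
name, labels, value = parse_sample('vectorized_storage_log_written_bytes{topic="crypto"} 1024')
```

Datadog's OpenMetrics check does this parsing for you; the sketch is only to show the wire format the `openmetrics_endpoint` setting points at.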
The code above explains the three main steps to create a Dockerized Datadog Agent, which can be summarized as follows:

- Defining the `datadog.yaml` and copying it into the Datadog Agent configuration path.
- Defining the `configuration.yaml` and copying it into the Redpanda integration path of Datadog Agent.
- Installing the `datadog-redpanda` package in the Agent.
Using the Datadog Agent as a service
By following this tutorial, you’ve created a simplified simulation of a multi-component application, part of which is the monitoring component (Datadog).
Using this same setup, you can build the monitoring component of your docker-compose with a Dockerfile named `Dockerfile.datadog`, which you can use to add more configuration for networking, environment variables, and volume mounting.
The following is an example docker-compose service for the Datadog Agent:

```yaml
---
version: "3.7"
services:
  datadog_agent:
    build:
      context: .
      dockerfile: Dockerfile.datadog
    container_name: datadog_agent
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
    environment:
      - DD_API_KEY=${DATADOG_API_KEY}
      - DD_SITE=datadoghq.eu
      - DD_LOGS_ENABLED=true
      - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
      - DD_CONTAINER_EXCLUDE_LOGS=name:datadog_agent
    depends_on:
      redpanda: { condition: service_healthy }
    ports:
      - "8125:8125/udp"
    networks:
      - monitoring
```
In the above code, you can find a series of environment variables that are specific to Datadog, which can be useful for explicitly configuring your Docker Agent. The environment variables and their explanations are as follows:
- `DD_API_KEY` sets your account key.
- `DD_SITE` is the Datadog site, setting your Datadog location.
- `DD_LOGS_ENABLED` enables/disables your logs (default is false).
- `DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL` enables/disables log collection for all containers (default is false).
- `DD_CONTAINER_EXCLUDE_LOGS` excludes specific containers' logs from collection; in this tutorial's YAML configuration, the agent container itself is named `datadog_agent` and is excluded from log collection.
Producers and consumers
The sections above explain the setup for integration between Redpanda as the messaging middleware and Datadog Agent as a monitoring tool. In addition to those two components, in an event streaming environment you would also expect to have producers and consumers of streaming data.
Producers insert messages into the messaging middleware, Redpanda. Consumers are decoupled from the producers and read data from the middleware for further processing, such as inserting it into a relational or non-relational database.
Using a Kafka client, you can refer to the broker addresses to programmatically produce and consume messages; clients connect to the external Kafka API addresses of the broker's nodes. In this tutorial, the cluster has a single node whose external Kafka API address is set to `redpanda:9092`.
As this tutorial focuses on how to integrate Redpanda with Datadog and the resulting monitoring benefits, no details about the clients for Redpanda have been provided. However, you can see the Python producer and consumer written for this setup in the project repository.
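As an illustration of what the producer side involves, the sketch below serializes a record into a Kafka message value. The topic name `crypto-prices` and the record shape are assumptions for this sketch (not taken from the repo), and the commented wiring uses the kafka-python client:

```python
import json

def encode_coin_event(coin: dict) -> bytes:
    """Serialize one coin record into the bytes used as a Kafka/Redpanda message value."""
    return json.dumps({"symbol": coin["symbol"], "price": coin["price"]}).encode("utf-8")

# Hypothetical wiring with kafka-python (requires a reachable broker at redpanda:9092):
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="redpanda:9092")
# producer.send("crypto-prices", value=encode_coin_event({"symbol": "BTC", "price": "27000"}))
# producer.flush()
```

A consumer would do the reverse: subscribe to the topic, then `json.loads` each message value.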
Redpanda–Datadog integration
The previous steps explained how to simulate an event streaming setup locally with docker-compose
, demonstrating how to integrate Redpanda and Datadog.
Once you have the Docker Datadog Agent, Redpanda, and a custom consumer and producer working in your setup, go to your Datadog user interface to find the Integrations tab on the sidebar.
Then, you can install or configure your Redpanda integration from the Redpanda integration box.
The Redpanda integration provides a Redpanda dashboard, where you'll find a compact overview of the frequently used metrics and data collected by Datadog. You can find a full list here, but some examples include:

- Latency metrics:
  - `redpanda.kafka.latency_produce_latency_us.sum`: Producer latency in microseconds.
  - `redpanda.kafka.latency_fetch_latency_us.sum`: Consumer latency in microseconds.
- Throughput metrics:
  - `redpanda.kafka.rpc_received_bytes.count`: Bytes produced (received by the broker).
  - `redpanda.kafka.rpc_sent_bytes.count`: Bytes consumed (sent by the broker).
- Topic-related metrics:
  - `redpanda.storage.log_written_bytes.count`: Written bytes per topic.
  - `redpanda.storage.log_read_bytes.count`: Read bytes per topic.
  - `redpanda.cluster.partition_records_produced.count`: Number of records written per topic.
  - `redpanda.cluster.partition_records_fetched.count`: Number of records read per topic.
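Since the throughput counters are reported in bytes, charting them as megabytes per second on a dashboard takes a rate and a scale factor. A sketch of such a Datadog graph query (the `{*}` scope and exact formula are assumptions to adapt to your tags):

```
per_second(sum:redpanda.kafka.rpc_received_bytes.count{*}) / 1048576
```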
The Redpanda integration also comes with a check service for monitoring the health of your Redpanda cluster.
Conclusion
In this article, you learned how to integrate your Datadog Agent with your Redpanda cluster, and you also saw some frequently used metrics that come with the integration.
With the Redpanda–Datadog integration, you can determine critical metrics for monitoring your infrastructure, which you can use to define alerts and notifications, detect anomalies, or see if the Redpanda service is healthy.
As a reminder, you can access the resources for this tutorial in the GitHub repository here. Check out Redpanda's source-available code in this repo, or join the Redpanda Community on Slack to ask questions about running this demo.