Migrating from Kafka to Redpanda

A helpful checklist to set you up for a smooth transition to the simpler streaming data platform

By Travis Campbell on October 24, 2023

In an era of high-performance, high-throughput, low-latency applications, Apache Kafka® is no longer the streaming data superpower it once was. Companies now need a platform that supports real-time, mission-critical workloads without breaking the bank (or their ops team). That’s why more and more companies are choosing Redpanda for simpler management, better performance, and lower cloud costs.

If you’ve decided to take the leap and migrate from Kafka (or Kafka-compatible platforms) to Redpanda—congratulations! From planning and migrating to testing and troubleshooting, there’s a lot involved in a migration, and the first step is often the trickiest. As always, here at Redpanda we like to make things simple, so we wrote a completely free guide detailing how to migrate from Kafka services to Redpanda.

To give you a taste of what the process entails, this post covers the all-important planning stage. It’ll help you find your footing, check all the right boxes, and prepare your clusters for the smoothest transition possible.

Ready to take the first step toward simpler streaming data? Let’s get started.

From Kafka platforms to Redpanda – an overview

A well-prepared switch to Redpanda from other Kafka platforms is straightforward and involves just a few steps. Details vary slightly from one Kafka platform to another, depending on each platform’s implementation, but here’s what you can generally expect.

  1. Pre-flight evaluation of your existing Kafka platform: Application behavior, data volume and usage, performance characteristics, availability and service level agreements/objectives, and administrative and governance operations.

  2. Pre-flight cluster setup: Set up a Redpanda cluster sized appropriately for your workloads, including items like users, security, access control lists, and topics.

  3. Replicate data: Set up a continuous replication process between the legacy Kafka platform and the Redpanda cluster.

  4. Validate data: Validate completeness and correctness of replicated data.

  5. Reconfigure the application: Move Kafka producers and consumers to point to the Redpanda cluster.

  6. Evaluate application, load, and security: Validate that the application is functional, performant, and adheres to the appropriate security policies once using Redpanda.

For the visual learners, here’s a diagram of everything mentioned above.

The migration process from Kafka to Redpanda at a glance.

Planning your migration to Redpanda

A solid plan is the first step to a successful migration to Redpanda. It helps you spot potential risks that could cause delays down the line, so you can mitigate them before they become a problem. At this crucial point, consider each phase of work in relation to your development, deployment, and operational practices.

Some companies have a formal change management process. This can involve moving design and architecture changes through development and testing environments before making changes to production workloads. There’s more to consider after development and functional testing: you’ll need to validate the data, test performance and security, and get approval from relevant stakeholders, such as application developers and the users who rely on the data.
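For the data validation step, one approach is to read the same range of records from both clusters and compare, per partition, a record count (completeness) and an order-sensitive digest (correctness). Offsets are deliberately not compared, because they may differ between the source and a MirrorMaker2-replicated destination. The sketch below is a minimal illustration under stated assumptions: the record shape and function names are hypothetical, headers and timestamps are ignored, and fetching the records with your Kafka client is left out.

```python
import hashlib
from typing import Iterable, Optional, Tuple

# Hypothetical record shape: (partition, key, value), as read by any Kafka client.
Record = Tuple[int, Optional[bytes], Optional[bytes]]


def _frame(field: Optional[bytes]) -> bytes:
    """Length-prefix a field so different splits of the same bytes hash differently."""
    if field is None:
        return (-1).to_bytes(8, "big", signed=True)  # marker for a null key or value
    return len(field).to_bytes(8, "big", signed=True) + field


def partition_fingerprints(records: Iterable[Record]) -> dict:
    """Map each partition to (record count, SHA-256 of its records in order).

    Ordering is only guaranteed within a partition, so each partition gets its
    own running digest and interleaving across partitions does not matter.
    """
    counts: dict = {}
    digests: dict = {}
    for partition, key, value in records:
        digest = digests.setdefault(partition, hashlib.sha256())
        digest.update(_frame(key))
        digest.update(_frame(value))
        counts[partition] = counts.get(partition, 0) + 1
    return {p: (counts[p], digests[p].hexdigest()) for p in counts}


def compare_clusters(source: Iterable[Record], destination: Iterable[Record]) -> list:
    """Return human-readable problems; an empty list means the samples match."""
    src = partition_fingerprints(source)
    dst = partition_fingerprints(destination)
    problems = []
    for p in sorted(set(src) | set(dst)):
        src_count, src_hash = src.get(p, (0, None))
        dst_count, dst_hash = dst.get(p, (0, None))
        if src_count != dst_count:
            problems.append(f"partition {p}: {src_count} source vs {dst_count} destination records")
        elif src_hash != dst_hash:
            problems.append(f"partition {p}: same count, different content")
    return problems


source_sample = [(0, b"k1", b"v1"), (1, None, b"v2"), (0, b"k2", b"v3")]
# Same per-partition order, different interleaving across partitions.
destination_sample = [(1, None, b"v2"), (0, b"k1", b"v1"), (0, b"k2", b"v3")]
print(compare_clusters(source_sample, destination_sample))  # []
```

Compare the same window on both sides (for example, records older than the current replication lag), since retention on the source may already have deleted the oldest data.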

Before migrating, it’s also important to plan and review each application that uses Kafka. This helps define the functional and performance requirements for each flow of data through the system.

You’ll need to understand how many messages are produced and consumed, how topics are configured, how quickly log segments rotate, and how long messages are retained, to name a few examples. Evaluating these details ensures that Redpanda meets the functional and performance requirements of each application.
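Those measurements feed directly into sizing the Redpanda cluster. As a back-of-the-envelope sketch, retained storage for a time-retained topic is roughly produce throughput times retention time times replication factor. This assumes you’ve measured average produce throughput per topic on the source cluster; the topic names, numbers, and 30% headroom below are made up for illustration, and compression, compaction, and segment overhead are ignored.

```python
from dataclasses import dataclass
from typing import Iterable

SECONDS_PER_HOUR = 3600


@dataclass
class TopicProfile:
    name: str
    bytes_in_per_sec: float  # average produce throughput measured on the source cluster
    retention_hours: float   # the topic's retention.ms expressed in hours
    replication_factor: int


def retained_bytes(topic: TopicProfile) -> float:
    """Steady-state bytes on disk across all replicas for a time-retained topic.

    Ignores compression, compaction, and segment overhead, so treat it as a rough estimate.
    """
    return topic.bytes_in_per_sec * topic.retention_hours * SECONDS_PER_HOUR * topic.replication_factor


def cluster_storage_bytes(topics: Iterable[TopicProfile], headroom: float = 0.3) -> float:
    """Total retained bytes plus a headroom fraction (0.3 is an arbitrary example)."""
    return sum(retained_bytes(t) for t in topics) * (1 + headroom)


topics = [
    TopicProfile("orders", bytes_in_per_sec=5_000_000, retention_hours=24, replication_factor=3),
    TopicProfile("clickstream", bytes_in_per_sec=1_000_000, retention_hours=168, replication_factor=3),
]
print(cluster_storage_bytes(topics) / 1e12)  # about 4.04 (decimal terabytes)
```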

To get you on the right track, we’ll briefly cover the following:

  1. Common migration questions

  2. A pre-migration checklist

  3. How to evaluate what to migrate

  4. An example Level of Effort timeline

Let’s dig in.

1. Common questions about migrating to Redpanda

Some questions come up during most migration processes. These are the three we get asked the most.

  • Can I install Redpanda over my existing Kafka installation to shorten the transition? Unfortunately, no. Redpanda uses a different storage mechanism than your legacy Kafka platform, so migration is set up as a replication pipeline that transfers data from the source cluster to a new Redpanda cluster using purpose-built systems or containers.

  • How long does migration take? Migration time to a new platform depends entirely on the use case. Moving data is usually the easiest part to define and scope; the most difficult parts tend to be data validation, application updates, and application validation. We do our best to minimize required application-level changes, but in testing you may find some changes are necessary. As a general guideline, you can copy approximately 85 terabytes of data per day over a single 10-gigabit network link between two endpoints, and scale that estimate up or down based on the bandwidth available to each node. Other factors, such as available network and CPU capacity, also come into play, because the migration shouldn’t impact existing consumers or producers. In these cases, data movement needs to be tuned to avoid overwhelming your production environments.

  • Will I need to change my applications? Usually very little, if at all. Since Redpanda is an API-compatible, drop-in replacement for Kafka, you can expect minimal changes, and sometimes none, at the application level. In some cases, you may need to update administrative tooling or integrations for monitoring and observability because of how Redpanda exposes these interfaces to users. For example, monitoring metrics may have different names.
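The arithmetic behind the 85 TB per day guideline is easy to check: a saturated 10 Gb/s link moves 1.25 GB/s, or 108 TB per day, so 85 TB/day is roughly 79% of line rate, presumably leaving room for protocol overhead and other traffic. The small estimator below turns the guideline into a first-pass schedule. The linear scaling with link speed and the `throttle` parameter are assumptions for illustration, not MirrorMaker2 settings.

```python
SECONDS_PER_DAY = 86_400
GUIDELINE_TB_PER_DAY_AT_10_GBPS = 85  # rule of thumb from the migration FAQ


def line_rate_tb_per_day(link_gbps: float) -> float:
    """Theoretical maximum for a fully saturated link, in decimal terabytes per day."""
    bytes_per_sec = link_gbps * 1e9 / 8
    return bytes_per_sec * SECONDS_PER_DAY / 1e12


def estimated_copy_days(data_tb: float, link_gbps: float = 10.0, throttle: float = 1.0) -> float:
    """Days needed to copy data_tb terabytes.

    Assumes the 85 TB/day per 10 Gb/s guideline scales linearly with link speed,
    and that throttle (0 < throttle <= 1) is the fraction of that rate you allow
    replication to use so production traffic isn't starved.
    """
    if not 0 < throttle <= 1:
        raise ValueError("throttle must be in (0, 1]")
    tb_per_day = GUIDELINE_TB_PER_DAY_AT_10_GBPS * (link_gbps / 10.0) * throttle
    return data_tb / tb_per_day


print(line_rate_tb_per_day(10))                                    # 108.0
print(GUIDELINE_TB_PER_DAY_AT_10_GBPS / line_rate_tb_per_day(10))  # ~0.79 of line rate
print(estimated_copy_days(170))                                    # 2.0
print(estimated_copy_days(170, throttle=0.5))                      # 4.0
```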

2. Pre-migration checklist

As you prepare to move from Kafka to Redpanda, here’s the list of requirements you should check off.

  • A destination cluster running a recent Redpanda version

  • Validated network connectivity between the source and destination clusters

  • MirrorMaker2 installed and configured near the destination cluster

  • User credentials for MirrorMaker2 to connect to both the source and destination clusters

  • User permissions or ACLs for MirrorMaker2 to access source and destination topics on both clusters

  • If you’re using TLS for in-flight encryption, MirrorMaker2 must have the appropriate Certificate Authorities configured to validate TLS connections

  • List of topics to migrate
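Once you’ve gathered those checklist items, they map fairly directly onto a MirrorMaker2 configuration file (the `mm2.properties` file passed to Kafka’s `connect-mirror-maker.sh`). The generator below is a sketch, not a production config: the cluster aliases, host names, SCRAM credentials, and truststore path are placeholders, and the property names come from Apache Kafka’s MirrorMaker2 configuration, so check them against the version you run. In practice, keep passwords out of plain-text files.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    alias: str                                # MirrorMaker2 cluster alias, e.g. "source"
    bootstrap_servers: str
    security_protocol: str = "PLAINTEXT"      # e.g. "SASL_SSL" for TLS plus SASL
    sasl_mechanism: Optional[str] = None      # assumes SCRAM, e.g. "SCRAM-SHA-256"
    username: Optional[str] = None
    password: Optional[str] = None
    truststore_path: Optional[str] = None     # JKS/PKCS12 truststore holding the CA certificate


def client_lines(c: Cluster) -> List[str]:
    """Per-cluster client settings; MirrorMaker2 reads them with an '<alias>.' prefix."""
    lines = [
        f"{c.alias}.bootstrap.servers = {c.bootstrap_servers}",
        f"{c.alias}.security.protocol = {c.security_protocol}",
    ]
    if c.sasl_mechanism:
        lines.append(f"{c.alias}.sasl.mechanism = {c.sasl_mechanism}")
        # ScramLoginModule only applies to SCRAM mechanisms.
        lines.append(
            f"{c.alias}.sasl.jaas.config = "
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            f'username="{c.username}" password="{c.password}";'
        )
    if c.truststore_path:
        lines.append(f"{c.alias}.ssl.truststore.location = {c.truststore_path}")
    return lines


def render_mm2_properties(source: Cluster, target: Cluster, topics: List[str]) -> str:
    """Render an mm2.properties file for a one-way source -> target migration."""
    if not topics:
        raise ValueError("list the topics to migrate explicitly")
    flow = f"{source.alias}->{target.alias}"
    lines = [f"clusters = {source.alias}, {target.alias}"]
    lines += client_lines(source) + client_lines(target)
    lines += [
        f"{flow}.enabled = true",
        f"{flow}.topics = {', '.join(topics)}",
        # Keep topic names unchanged on the target (class ships with Kafka 3.0+).
        "replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy",
        "sync.topic.configs.enabled = true",
        "emit.heartbeats.enabled = true",     # heartbeat connector
        "emit.checkpoints.enabled = true",    # checkpoint connector: consumer offset translation
        "sync.group.offsets.enabled = true",  # write translated offsets to target groups (Kafka 2.7+)
    ]
    return "\n".join(lines) + "\n"


source = Cluster("source", "kafka-1.example.internal:9092")
target = Cluster(
    "redpanda", "redpanda-1.example.internal:9092",
    security_protocol="SASL_SSL", sasl_mechanism="SCRAM-SHA-256",
    username="mm2", password="change-me", truststore_path="/etc/mm2/truststore.jks",
)
print(render_mm2_properties(source, target, ["orders", "payments"]))
```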

3. Evaluating what to migrate

Kafka implementations come in all shapes, sizes, performance characteristics, and tenancy designs. Whether yours is a small cluster hosting a handful of topics or a large, multi-tenant shared cluster with thousands of topics competing for resources, it’s critical to catalog it in as much detail as possible.

To make sure you don’t miss anything during migration, consider the entire workflow and its dependencies within the source cluster. Here’s what you’ll need to move:

  • Topics and each of their configuration settings

    • Include items like retention configuration, replication factor, and partition counts

  • Users and passwords for authenticating to the cluster

  • Access control lists (ACLs) to give users access to topics

  • Schemas

  • Consumer offsets or their translations

  • Consumer group details
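A catalog like this is easiest to keep honest as data you can diff. The sketch below compares topic inventories from the source and target clusters; in practice you would populate the dictionaries from your admin tooling (for example, `kafka-topics.sh --describe` on the source and `rpk topic describe` on Redpanda), which is not shown. `TopicSpec` and `diff_inventories` are hypothetical names, and users, ACLs, schemas, and consumer groups could be diffed the same way.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class TopicSpec:
    partitions: int
    replication_factor: int
    configs: Dict[str, str] = field(default_factory=dict)  # e.g. {"retention.ms": "604800000"}


def diff_inventories(
    source: Dict[str, TopicSpec],
    target: Dict[str, TopicSpec],
    ignore: FrozenSet[str] = frozenset(),
) -> List[str]:
    """List differences between source and target topic inventories.

    Partition counts matter beyond capacity: the default partitioner maps a key
    to a partition by hashing it modulo the partition count, so changing the
    count changes where keys land. Replication factor differences may be
    intentional, so review findings rather than failing on them automatically.
    """
    findings = []
    for name in sorted(source):
        if name not in target:
            findings.append(f"{name}: missing on target")
            continue
        src, dst = source[name], target[name]
        if src.partitions != dst.partitions:
            findings.append(f"{name}: partitions {src.partitions} != {dst.partitions}")
        if src.replication_factor != dst.replication_factor:
            findings.append(
                f"{name}: replication factor {src.replication_factor} != {dst.replication_factor}"
            )
        for key in sorted(set(src.configs) - ignore):
            if dst.configs.get(key) != src.configs[key]:
                findings.append(f"{name}: {key} {src.configs[key]!r} != {dst.configs.get(key)!r}")
    return findings


source = {
    "orders": TopicSpec(12, 3, {"retention.ms": "604800000", "cleanup.policy": "delete"}),
    "audit": TopicSpec(1, 3),
}
target = {"orders": TopicSpec(6, 3, {"retention.ms": "604800000", "cleanup.policy": "delete"})}
print(diff_inventories(source, target))
# ['audit: missing on target', 'orders: partitions 12 != 6']
```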

It’s helpful to go through these details for each step of the change management process, from development to production. To help you stay organized and cover all your bases, see the migration questionnaire in the appendix of our complete guide, How to migrate from Kafka services to Redpanda.

4. Example: Level of Effort (LOE) timeline

Migration timelines vary in length, but most follow the same cadence and order of operations. Multi-environment setups, like those with a formal development, QA, and production lifecycle, might require more effort due to higher data throughput or certain operations being front-loaded in the process, such as initial application testing.

To give you a better idea of your own timeline, let’s lay out an example. We’re following the above migration overview for a small, low-volume cluster with only a handful of topics and clients. This particular timeline totals 62 hours of effort, with four hours of actual application outage while consumers and producers move to the new cluster.

Note that the data flow outage for producers and for consumers could be as little as two hours each. We recommend moving consumers first, so they can work off the Redpanda cluster while producers keep writing to the legacy cluster and MirrorMaker2 replicates that data to Redpanda.
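The reasoning behind moving consumers first shows up even in a toy model. The simulation below is hypothetical and heavily simplified (it is not how MirrorMaker2 works internally): each phase produces messages, replication copies anything new from the legacy cluster to Redpanda, and the consumer reads whatever is new on the cluster it points at. Moving consumers first keeps them fed through replication, while moving producers first leaves consumers on the legacy cluster with nothing new to read until they move too.

```python
def delayed_messages(first: str, per_phase: int = 1) -> list:
    """Toy three-phase cutover; returns messages the consumer read in a later
    phase than the one they were produced in.

    first is "consumers" or "producers": which side moves to Redpanda first.
    Deliberate simplifications: one topic, one partition, one producer, one
    consumer, and MirrorMaker2 modeled as an ordered one-way copy from the
    legacy cluster to Redpanda that keeps up within each phase.
    """
    if first not in ("consumers", "producers"):
        raise ValueError("first must be 'consumers' or 'producers'")
    legacy: list = []
    redpanda: list = []
    mirrored = 0                    # legacy records already copied to Redpanda
    produce_to, consume_from = legacy, legacy
    offset = 0                      # consumer position within consume_from
    moves = ["consumers", "producers"] if first == "consumers" else ["producers", "consumers"]
    late = []

    for phase in range(3):
        if phase > 0:
            if moves[phase - 1] == "producers":
                produce_to = redpanda
            else:
                # Stand-in for offset translation: in this model every legacy
                # record the consumer has read sits at the same index on Redpanda.
                consume_from = redpanda
        new = [f"p{phase}-m{i}" for i in range(per_phase)]
        produce_to.extend(new)
        redpanda.extend(legacy[mirrored:])  # replication copies new legacy records
        mirrored = len(legacy)
        batch = consume_from[offset:]
        offset = len(consume_from)
        late.extend(m for m in batch if m not in new)
    return late


print(delayed_messages("consumers"))  # []
print(delayed_messages("producers"))  # ['p1-m0']
```

In the producers-first run, the message produced in the middle phase only reaches the consumer after the consumer moves, which is the stall the recommended order avoids. Real deployments rely on MirrorMaker2 checkpoints for offset translation rather than the index shortcut used here.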

  • Pre-flight evaluation (16 hours total)

    • List topics to migrate, determine configurations and data volume (4 hours)

    • Review producer/consumer client code (4 hours)

    • Review governance requirements (8 hours)

  • Pre-flight cluster setup (12 hours total)

    • Set up physical or virtual cluster nodes (4 hours)

    • Install and configure Redpanda and Redpanda Console (2 hours)

    • Configure Redpanda Security (2 hours)

    • Provision users and ACLs (4 hours)

  • Data replication (6 hours total)

    • Install and configure MirrorMaker2 (2 hours)

    • Set up Checkpoint Connector (1 hour)

    • Set up Heartbeat Connector (1 hour)

    • Start topic replication (2 hours)

  • Data validation (6 hours total)

    • Verify consumer offset translation (2 hours)

    • Validate completeness of replicated data (2 hours)

    • Validate correctness of replicated data (2 hours)

  • Application reconfiguration (4 hours total)

    • Move Kafka consumers to Redpanda (2 hours)

    • Move Kafka producers to Redpanda (2 hours)

  • Application, load, and security evaluation (16 hours total)

    • Validate the application is functional (8 hours)

    • Compare performance baseline (4 hours)

    • Verify security policies and behavior once using Redpanda (4 hours)

  • Decommission (2 hours total)

    • Shut down legacy cluster (2 hours)
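Keeping the timeline as data makes it easy to re-total as you adapt the estimates to your own environment. The sketch below encodes the example timeline (task names are abbreviated, and the structure and function names are illustrative) and confirms that the phase subtotals add up to 62 hours, four of them in the application cutover.

```python
from typing import Dict, List, Tuple

Plan = Dict[str, List[Tuple[str, int]]]

# The example LOE timeline, as phase -> [(task, hours)].
LOE: Plan = {
    "Pre-flight evaluation": [
        ("List topics, configurations, and data volume", 4),
        ("Review producer/consumer client code", 4),
        ("Review governance requirements", 8),
    ],
    "Pre-flight cluster setup": [
        ("Set up cluster nodes", 4),
        ("Install Redpanda and Redpanda Console", 2),
        ("Configure Redpanda security", 2),
        ("Provision users and ACLs", 4),
    ],
    "Data replication": [
        ("Install and configure MirrorMaker2", 2),
        ("Set up Checkpoint Connector", 1),
        ("Set up Heartbeat Connector", 1),
        ("Start topic replication", 2),
    ],
    "Data validation": [
        ("Verify consumer offset translation", 2),
        ("Validate completeness", 2),
        ("Validate correctness", 2),
    ],
    "Application reconfiguration": [
        ("Move consumers to Redpanda", 2),
        ("Move producers to Redpanda", 2),
    ],
    "Application, load, and security evaluation": [
        ("Validate functionality", 8),
        ("Compare performance baseline", 4),
        ("Verify security policies", 4),
    ],
    "Decommission": [("Shut down legacy cluster", 2)],
}

# Phases during which applications are cut over (the outage).
OUTAGE_PHASES = {"Application reconfiguration"}


def phase_totals(plan: Plan) -> Dict[str, int]:
    return {phase: sum(hours for _, hours in tasks) for phase, tasks in plan.items()}


def total_hours(plan: Plan) -> int:
    return sum(phase_totals(plan).values())


def outage_hours(plan: Plan) -> int:
    return sum(h for phase, h in phase_totals(plan).items() if phase in OUTAGE_PHASES)


print(total_hours(LOE), outage_hours(LOE))  # 62 4
```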

As you build your timeline, keep in mind that an LOE timeline normally accounts only for work effort. Day-to-day realities can stretch your plan out, for example if you’re restricted to working within specific maintenance windows.
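To turn effort hours into calendar time, it can help to separate work you can schedule freely from outage tasks that must fit inside maintenance windows. The sketch below does that under illustrative assumptions: six productive hours per day, windows of a fixed length, and a first-fit decreasing packing of outage tasks. Only the task hours come from the example timeline; the other numbers are hypothetical.

```python
import math
from typing import List


def working_days(effort_hours: float, productive_hours_per_day: float) -> int:
    """Whole working days needed for effort that can be scheduled at any time."""
    return math.ceil(effort_hours / productive_hours_per_day)


def windows_needed(outage_task_hours: List[float], window_hours: float) -> int:
    """Count maintenance windows for outage tasks using first-fit decreasing.

    Each task must finish inside a single window; tasks are not split.
    """
    free: List[float] = []  # remaining capacity of each window opened so far
    for hours in sorted(outage_task_hours, reverse=True):
        if hours > window_hours:
            raise ValueError(f"a {hours}-hour task does not fit in a {window_hours}-hour window")
        for i, remaining in enumerate(free):
            if hours <= remaining:
                free[i] -= hours
                break
        else:
            free.append(window_hours - hours)
    return len(free)


# Example timeline: 58 hours of regular work, plus two 2-hour cutover tasks.
print(working_days(58, productive_hours_per_day=6))  # 10
print(windows_needed([2, 2], window_hours=2))        # 2
print(windows_needed([2, 2], window_hours=4))        # 1
```

If you only get one 2-hour window per week, for instance, the consumer and producer moves would land in different weeks, even though the effort itself is only four hours.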

Your choice of deployment method can also affect the length of your timeline. For example, deploying Redpanda Dedicated Clusters in the cloud is usually quicker than greenfielding fresh physical infrastructure. If you need a hand getting your timeline right, our Customer Success team is happy to help adjust it to your requirements and environment.

Ready, set, migrate! Download the full guide to make it happen

Look at that, you’re a few steps closer to simple, powerful, and cost-efficient streaming data! If you’re ready to steam ahead, download the full guide on migrating from Kafka services to Redpanda to learn:

  • How to plan every step of your migration and map a realistic timeline

  • How to test and validate your data for integrity, security, and performance

  • What common migration challenges to look out for (and how to solve them)

  • A migration questionnaire to make sure you cover all your bases

If you’re still on the fence about switching over, you can always take Redpanda for a test run. Just sign up for a free trial of Redpanda Cloud or grab the Redpanda Community Edition from GitHub. If you have any questions or want to chat directly with our engineers about migrating, join our Redpanda Community on Slack.

Hands-on resources

More of a practical learner? We’ve got you covered. From interactive workshops to hands-on labs in Killercoda, take your pick and learn how to migrate in whatever way suits you best. Happy migrating!