Introduction to Apache Kafka
- Apache Kafka is a distributed event streaming platform used to build real-time data pipelines and streaming applications.
- Designed for high throughput, fault tolerance, and scalability.
- Common uses:
  - Messaging between services (similar to a message broker)
  - Real-time analytics
  - Event sourcing
  - Log aggregation
Apache Kafka vs. Confluent Kafka
- Apache Kafka:
  - Open-source project maintained by the Apache Software Foundation.
  - Core functionality without extra enterprise features.
- Confluent Kafka (more precisely, Confluent Platform / Confluent Cloud):
  - A commercial distribution of Kafka by Confluent Inc.
  - Bundles Apache Kafka with enterprise-grade tooling:
    - Schema Registry
    - Control Center
    - A large catalog of prebuilt connectors for Kafka Connect (Kafka Connect itself ships with Apache Kafka)
  - Easier deployment and management; see the configuration sketch after this list.
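
To show how a Confluent-specific piece plugs into an otherwise standard Kafka client, here is a minimal sketch (in Java) of a producer configured with Confluent's Avro serializer and Schema Registry. The broker address, the registry URL, and the "orders" topic are placeholder assumptions, and the code needs the Confluent kafka-avro-serializer dependency on the classpath.

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: producing an Avro record through Confluent's Schema Registry.
// Broker/registry addresses and the "orders" topic are assumptions for illustration.
public class AvroProducerSketch {
    private static final String ORDER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        // Confluent serializer: registers the schema and embeds its id in each message.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // assumed registry address

        Schema schema = new Schema.Parser().parse(ORDER_SCHEMA);
        GenericRecord order = new GenericData.Record(schema);
        order.put("id", "order-1");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-1", order));
        }
    }
}
```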
Four Core Components of a Kafka System
- Producer – Sends messages (events) to Kafka topics.
- Broker – Kafka server that stores messages and serves clients. Multiple brokers form a Kafka cluster.
- Consumer – Reads messages from Kafka topics (a minimal producer/consumer sketch follows this list).
- ZooKeeper – Manages cluster metadata and broker leader election. Newer Kafka versions deprecate ZooKeeper in favor of KRaft mode, Kafka's built-in Raft-based metadata quorum.
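
To make the producer, broker, and consumer roles concrete, the sketch below sends one record to a local broker and reads it back with the Java client. The localhost broker address, the "events" topic, and the consumer group id are assumptions for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

// Minimal producer/consumer sketch against a single local broker (an assumption).
public class ProducerConsumerSketch {
    public static void main(String[] args) {
        // Producer: send one event to the "events" topic.
        Properties prodProps = new Properties();
        prodProps.put("bootstrap.servers", "localhost:9092");
        prodProps.put("key.serializer", StringSerializer.class.getName());
        prodProps.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(prodProps)) {
            producer.send(new ProducerRecord<>("events", "user-42", "signed_up"));
        }

        // Consumer: subscribe to the same topic and poll for records.
        Properties consProps = new Properties();
        consProps.put("bootstrap.servers", "localhost:9092");
        consProps.put("group.id", "example-group");
        consProps.put("auto.offset.reset", "earliest");
        consProps.put("key.deserializer", StringDeserializer.class.getName());
        consProps.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consProps)) {
            consumer.subscribe(List.of("events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        record.partition(), record.offset(), record.key(), record.value());
            }
        }
    }
}
```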
Kafka Topics
- A named category or feed to which records are published and in which they are stored.
- Producers write to topics, consumers read from them.
- Topics can be split into multiple partitions for scalability and parallel consumption (see the topic-creation sketch below).
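
As a minimal sketch of partitioning in practice, the snippet below creates a topic with the Java AdminClient, assuming a single local broker; the "orders" topic name, the partition count, and the replication factor are illustrative choices.

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

// Sketch: creating a topic with multiple partitions so consumers can read in parallel.
public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // "orders" topic: 3 partitions, replication factor 1 (fine for a single-broker dev setup).
            NewTopic orders = new NewTopic("orders", 3, (short) 1);
            admin.createTopics(List.of(orders)).all().get(); // block until the request completes
        }
    }
}
```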