MCPcopy Index your code
hub / github.com/apache/flink-cdc

github.com/apache/flink-cdc @release-3.6.0 sqlite

repository ↗ · DeepWiki ↗ · release release-3.6.0 ↗
14,547 symbols 76,093 edges 1,538 files 3,688 documented · 25%
README

Flink CDC

Test Release Build Nightly Build License

Flink CDC is a distributed data integration tool for real time data and batch data. Flink CDC brings the simplicity and elegance of data integration via YAML to describe the data movement and transformation in a Data Pipeline.

The Flink CDC prioritizes efficient end-to-end data integration and offers enhanced functionalities such as full database synchronization, sharding table synchronization, schema evolution and data transformation.

Flink CDC framework design

Quickstart Guide

Flink CDC provides a CdcUp CLI utility to start a playground environment and run Flink CDC jobs. You will need to have a working Docker and Docker compose environment to use it.

  1. Run git clone https://github.com/apache/flink-cdc.git --depth=1 to retrieve a copy of Flink CDC source code.
  2. Run cd tools/cdcup/ && ./cdcup.sh init to use the CdcUp tool to start a playground environment.
  3. Run ./cdcup.sh up to boot-up docker containers, and wait for them to be ready.
  4. Run ./cdcup.sh mysql to open a MySQL session, and create at least one table.
-- initialize db and table
CREATE DATABASE cdc_playground;
USE cdc_playground;
CREATE TABLE test_table (id INT PRIMARY KEY, name VARCHAR(32));

-- insert test data
INSERT INTO test_table VALUES (1, 'alice'), (2, 'bob'), (3, 'cicada'), (4, 'derrida');

-- verify if it has been successfully inserted
SELECT * FROM test_table;
  1. Run ./cdcup.sh pipeline pipeline-definition.yaml to submit the pipeline job. You may also edit the pipeline definition file for further configurations.
  2. Run ./cdcup.sh flink to access the Flink Web UI.

Getting Started

  1. Prepare a Apache Flink cluster and set up FLINK_HOME environment variable.
  2. Download Flink CDC tar, unzip it and put jars of pipeline connector to Flink lib directory.

If you're using macOS or Linux, you may use brew install apache-flink-cdc to install Flink CDC and compatible connectors quickly.

  1. Create a YAML file to describe the data source and data sink, the following example synchronizes all tables under MySQL app_db database to Doris : ```yaml source: type: mysql hostname: localhost port: 3306 username: root password: 123456 tables: app_db..*

sink: type: doris fenodes: 127.0.0.1:8030 username: root password: ""

transform: - source-table: adb.web_order01 projection: *, format('%S', product_name) as product_name filter: addone(id) > 10 AND order_id > 100 description: project fields and filter - source-table: adb.web_order02 projection: *, format('%S', product_name) as product_name filter: addone(id) > 20 AND order_id > 200 description: project fields and filter

route: - source-table: app_db.orders sink-table: ods_db.ods_orders - source-table: app_db.shipments sink-table: ods_db.ods_shipments - source-table: app_db.products sink-table: ods_db.ods_products

pipeline: name: Sync MySQL Database to Doris parallelism: 2 route-mode: ALL_MATCH user-defined-function: - name: addone classpath: com.example.functions.AddOneFunctionClass - name: format classpath: com.example.functions.FormatFunctionClass 4. Submit pipeline job using `flink-cdc.sh` script.shell bash bin/flink-cdc.sh /path/mysql-to-doris.yaml ``` 5. View job execution status through Flink WebUI or downstream database.

Try it out yourself with our more detailed tutorial. You can also see connector overview to view a comprehensive catalog of the connectors currently provided and understand more detailed configurations.

Join the Community

There are many ways to participate in the Apache Flink CDC community. The mailing lists are the primary place where all Flink committers are present. For user support and questions use the user mailing list. If you've found a problem of Flink CDC, please create a Flink jira and tag it with the Flink CDC tag.
Bugs and feature requests can either be discussed on the dev mailing list or on Jira.

Contributing

Welcome to contribute to Flink CDC, please see our Developer Guide and APIs Guide.

License

Apache 2.0 License.

Special Thanks

The Flink CDC community welcomes everyone who is willing to contribute, whether it's through submitting bug reports, enhancing the documentation, or submitting code contributions for bug fixes, test additions, or new feature development.
Thanks to all contributors for their enthusiastic contributions.

Extension points exported contracts — how you extend this code

PipelineExecution (Interface)
A pipeline execution that can be executed by a computing engine. [25 implementers]
flink-cdc-composer/src/main/java/org/apache/flink/cdc/composer/PipelineExecution.java
DataTypeVisitor (Interface)
The visitor definition of DataType. The visitor transforms a data type into instances of R. @param [16 implementers]
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/types/DataTypeVisitor.java
OperatorStateStoreAdapter (Interface)
Compatibility adapter for OperatorStateStore in Flink 1.20. In Flink 1.x, OperatorStateStore interface does [12 implementers]
flink-cdc-flink1-compat/src/test/java/org/apache/flink/cdc/runtime/compat/OperatorStateStoreAdapter.java
OperatorStateStoreAdapter (Interface)
Compatibility adapter for OperatorStateStore in Flink 2.2. In Flink 2.x, OperatorStateStore interface has ne [6 implementers]
flink-cdc-flink2-compat/src/test/java/org/apache/flink/cdc/runtime/compat/OperatorStateStoreAdapter.java
PaimonRecordSerializer (Interface)
Converting Input into PaimonEvent for PaimonWriter. [79 implementers]
flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-paimon/src/main/java/org/apache/flink/cdc/connectors/paimon/sink/v2/PaimonRecordSerializer.java
MySqlSplitAssigner (Interface)
The MySqlSplitAssigner is responsible for deciding what split should be processed. It determines split processin [6 implementers]
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/source/assigners/MySqlSplitAssigner.java
PostTransformConverter (Interface)
The PostTransformConverter applies to convert the DataChangeEvent after other part of TransformRule in PostTrans [10 implementers]
flink-cdc-runtime/src/main/java/org/apache/flink/cdc/runtime/operators/transform/converter/PostTransformConverter.java
SpecStep (Interface)
A single step of RuleSpec. [3 implementers]
flink-cdc-e2e-tests/flink-cdc-pipeline-e2e-tests/src/test/java/org/apache/flink/cdc/pipeline/tests/specs/SpecStep.java

Core symbols most depended-on inside this repo

physicalColumn
called by 2495
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/schema/Column.java
add
called by 2062
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/types/variant/VariantBuilder.java
get
called by 1727
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/source/MySqlSource.java
format
called by 1587
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/configuration/description/DescriptionElement.java
of
called by 1284
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/sink/FlinkSinkProvider.java
STRING
called by 1274
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/types/DataTypes.java
fromString
called by 936
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/data/binary/BinaryStringData.java
asList
called by 926
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/configuration/ConfigOptions.java

Shape

Method 12,609
Class 1,710
Interface 137
Enum 80
Function 11

Languages

Java100%
TypeScript1%

Modules by API surface

flink-cdc-common/src/main/java/org/apache/flink/cdc/common/data/binary/BinarySegmentUtils.java70 symbols
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/text/TokenStream.java66 symbols
flink-cdc-composer/src/test/java/org/apache/flink/cdc/composer/flink/FlinkPipelineTransformITCase.java61 symbols
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-oceanbase-cdc/src/main/java/io/debezium/connector/mysql/MySqlStreamingChangeEventSource.java56 symbols
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/io/debezium/connector/mysql/MySqlStreamingChangeEventSource.java56 symbols
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/data/binary/BinaryArrayData.java56 symbols
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/data/binary/BinaryStringData.java54 symbols
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/test/java/org/apache/flink/cdc/connectors/mysql/source/MySqlSourceITCase.java52 symbols
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/io/debezium/connector/mysql/MySqlConnection.java52 symbols
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-oracle-cdc/src/main/java/io/debezium/connector/oracle/logminer/processor/AbstractLogMinerEventProcessor.java51 symbols
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/test/java/org/apache/flink/cdc/connectors/postgres/source/PostgresSourceITCase.java49 symbols
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/main/java/io/debezium/connector/postgresql/connection/PostgresReplicationConnection.java46 symbols

Dependencies from manifests, versioned

co.elastic.clients:elasticsearch-java
com.alibaba:dns-cache-manipulator1.8.0 · 1×
com.aliyun.odps:odps-sdk-core0.50.6-public · 1×
com.esotericsoftware:kryo-shaded4.0.2 · 1×
com.esri.geometry:esri-geometry-api
com.fasterxml.jackson.core:jackson-core
com.fasterxml.jackson.module:jackson-module-afterburner
com.fasterxml.jackson:jackson-bom
com.ibm.db2.jcc:db2jccdb2jcc4 · 1×
com.jayway.jsonpath:json-path

Datastores touched

(mysql)Database · 1 repos
(mongodb)Database · 1 repos

For agents

$ claude mcp add flink-cdc \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact