github.com/alibaba/MongoShake @release-v2.8.8-20260428 sqlite

repository ↗ · DeepWiki ↗ · release release-v2.8.8-20260428 ↗
1,104 symbols 3,946 edges 133 files 226 documented · 20%
README

This is a brief introduction to Mongo-Shake. For more details, including architecture, data flow, performance tests, and business showcases, please visit the English wiki or the Chinese wiki.

Mongo-Shake


Mongo-Shake is developed and maintained by the NoSQL Team at Alibaba Cloud.

Mongo-Shake is a universal platform for services based on MongoDB's oplog. It fetches oplogs from the source MongoDB database and either replays them in the target MongoDB database or sends them to other endpoints through different tunnels. If the target is a MongoDB database, the oplog is replayed directly, so Mongo-Shake acts as a syncing tool that copies data from the source MongoDB to another MongoDB to build redundant replication or active-active replication. Besides this direct mode, there are other tunnel types such as rpc, file, tcp, and kafka. Receivers written by users must implement their own interfaces to connect to these tunnels. Users can also define their own tunnel type, which is pluggable. When connecting to third-party message middleware such as Kafka, the consumer can receive the subscribed data asynchronously in a flexible pub/sub model. The general data flow is shown in the following figure:

[Figure pic1: general data flow]

The source can be a single mongod, a replica set, or a sharded cluster, while the target can be mongod or mongos. If the source is a replica set, we suggest fetching data from a secondary or hidden member to ease the pressure on the primary. If the source is a sharded cluster, every shard should be connected to Mongo-Shake. There can be several mongos instances on the target side for high availability, and different data will be hashed and written to different mongos instances.

Parallel Replication


There are three options for parallel replication, which we call 'shard_key': id, collection, and auto. id means the concurrency granularity is the document, while collection means the granularity is the collection/table. auto is resolved by checking whether any collection has a unique index: if a unique index exists it becomes collection, otherwise id.
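As an illustration of the three options, here is a minimal Go sketch of how an oplog could be routed to a worker. It is not Mongo-Shake's actual code: the Oplog struct, resolveShardKey, workerFor, and the choice of an FNV hash are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Oplog is a simplified, hypothetical stand-in for one replicated operation.
type Oplog struct {
	Namespace string // "db.collection"
	ID        string // document _id rendered as a string
}

// resolveShardKey turns "auto" into a concrete mode: "collection" when a
// unique index exists, otherwise "id". Other modes are returned unchanged.
func resolveShardKey(mode string, hasUniqueIndex bool) string {
	if mode != "auto" {
		return mode
	}
	if hasUniqueIndex {
		return "collection"
	}
	return "id"
}

// workerFor hashes the namespace (collection mode) or the document id
// (id mode) to pick a worker index. workers must be greater than zero.
func workerFor(op Oplog, mode string, workers int) int {
	key := op.ID
	if mode == "collection" {
		key = op.Namespace
	}
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(workers))
}

func main() {
	mode := resolveShardKey("auto", true)
	a := Oplog{Namespace: "shop.orders", ID: "1"}
	b := Oplog{Namespace: "shop.orders", ID: "2"}
	// In collection mode both documents land on the same worker.
	fmt.Println(mode, workerFor(a, mode, 8) == workerFor(b, mode, 8)) // prints "collection true"
}
```

Hashing on the namespace keeps all writes to one collection in order on a single worker, which is the safer choice when a unique index could otherwise be violated by reordered writes.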

High Availability


Mongo-Shake periodically persists its context to a register center, which by default is the source database. Currently, the context is the checkpoint, which marks the position of the last successfully replayed oplog.
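The checkpoint idea can be sketched in Go as follows. This is only an illustration under stated assumptions: the CheckpointStore interface, the in-memory store, and the use of a plain uint64 as the oplog position are hypothetical and do not describe Mongo-Shake's real storage layer.

```go
package main

import (
	"fmt"
	"sync"
)

// CheckpointStore is a hypothetical persistence contract for the checkpoint.
type CheckpointStore interface {
	Load() (uint64, error)
	Save(ts uint64) error
}

// memoryStore keeps the checkpoint in memory; a real register center
// would write it to a database instead.
type memoryStore struct {
	mu sync.Mutex
	ts uint64
}

func (m *memoryStore) Load() (uint64, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.ts, nil
}

// Save only moves the checkpoint forward, so a late writer cannot rewind it.
func (m *memoryStore) Save(ts uint64) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if ts > m.ts {
		m.ts = ts
	}
	return nil
}

// resumeFrom returns the positions strictly newer than the stored checkpoint,
// i.e. the oplogs that still have to be replayed after a restart.
func resumeFrom(store CheckpointStore, positions []uint64) ([]uint64, error) {
	ckpt, err := store.Load()
	if err != nil {
		return nil, err
	}
	var rest []uint64
	for _, p := range positions {
		if p > ckpt {
			rest = append(rest, p)
		}
	}
	return rest, nil
}

func main() {
	store := &memoryStore{}
	_ = store.Save(10)
	rest, _ := resumeFrom(store, []uint64{8, 10, 11, 12})
	fmt.Println(rest) // prints "[11 12]"
}
```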

A hypervisor mechanism is also supported so that Mongo-Shake restarts immediately when it dies (master_quorum in configuration). master_quorum only supports checkpoint.storage=database; when master_quorum=true, master_quorum.election_id must be set to a valid MongoDB ObjectID and kept unique per independent sync job or HA group.
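The constraints on master_quorum can be checked up front. The Go sketch below is an assumption-laden illustration, not Mongo-Shake's config loader: QuorumConfig and validateQuorum are hypothetical names, and the only ObjectID rule applied is the well-known 24-character hexadecimal (12-byte) string form. Uniqueness across sync jobs cannot be verified locally and is left to the operator.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// QuorumConfig is a hypothetical view of the three related settings.
type QuorumConfig struct {
	MasterQuorum      bool   // master_quorum
	CheckpointStorage string // checkpoint.storage
	ElectionID        string // master_quorum.election_id
}

// isObjectIDHex reports whether s is a 24-character hex string (12 bytes).
func isObjectIDHex(s string) bool {
	b, err := hex.DecodeString(s)
	return err == nil && len(b) == 12
}

// validateQuorum enforces the rules stated above when master_quorum is on.
func validateQuorum(c QuorumConfig) error {
	if !c.MasterQuorum {
		return nil
	}
	if c.CheckpointStorage != "database" {
		return errors.New("master_quorum requires checkpoint.storage=database")
	}
	if !isObjectIDHex(c.ElectionID) {
		return fmt.Errorf("master_quorum.election_id %q is not a valid ObjectID", c.ElectionID)
	}
	return nil
}

func main() {
	good := QuorumConfig{
		MasterQuorum:      true,
		CheckpointStorage: "database",
		ElectionID:        "65f0c1a2b3c4d5e6f7a8b9c0", // made-up example value
	}
	fmt.Println(validateQuorum(good) == nil)

	bad := good
	bad.ElectionID = "not-an-object-id"
	fmt.Println(validateQuorum(bad))
}
```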

Filter


Mongo-Shake supports filtering database and collection namespaces with a whitelist and a blacklist.
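A namespace filter of this kind might look like the Go sketch below. The matching rules are assumptions for illustration (an entry matches either a whole database or an exact "db.collection", and a non-empty whitelist takes precedence over the blacklist); NamespaceRule is a hypothetical type, not the repository's NamespaceFilter.

```go
package main

import (
	"fmt"
	"strings"
)

// NamespaceRule is a hypothetical whitelist/blacklist pair.
type NamespaceRule struct {
	White []string
	Black []string
}

// matches reports whether ns equals an entry or belongs to a database entry.
func matches(ns string, entries []string) bool {
	for _, e := range entries {
		if ns == e || strings.HasPrefix(ns, e+".") {
			return true
		}
	}
	return false
}

// Allow reports whether an oplog on namespace ns should be passed through.
func (r NamespaceRule) Allow(ns string) bool {
	if len(r.White) > 0 {
		return matches(ns, r.White)
	}
	return !matches(ns, r.Black)
}

func main() {
	r := NamespaceRule{White: []string{"shop", "crm.users"}}
	fmt.Println(r.Allow("shop.orders"), r.Allow("crm.users"), r.Allow("crm.leads")) // prints "true true false"
}
```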

DDL Syncing


Starting with version 1.5, MongoShake supports syncing DDL by using a global barrier. Once it fetches a DDL oplog, MongoShake adds a barrier so that all subsequent oplogs wait in the queue until this oplog is written into the target MongoDB or tunnel and the checkpoint is updated. Currently, DDL is only supported for a ReplicaSet on the source side (the target side can be a ReplicaSet or Sharding); Sharding sources will be supported in a later version.
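The effect of the barrier can be sketched by splitting the oplog stream so that each DDL forms a batch of its own, and batches are applied strictly one after another. This Go example is a simplified, single-threaded illustration; Op and splitAtBarriers are hypothetical names and the real pipeline is concurrent.

```go
package main

import "fmt"

// Op is a hypothetical, simplified oplog entry.
type Op struct {
	Name  string
	IsDDL bool
}

// splitAtBarriers groups ops into batches. Every DDL op becomes a batch of
// its own, so everything before it is flushed first and everything after it
// waits until the DDL batch has been applied.
func splitAtBarriers(ops []Op) [][]Op {
	var batches [][]Op
	var cur []Op
	for _, op := range ops {
		if op.IsDDL {
			if len(cur) > 0 {
				batches = append(batches, cur)
				cur = nil
			}
			batches = append(batches, []Op{op})
			continue
		}
		cur = append(cur, op)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	ops := []Op{
		{Name: "insert a"},
		{Name: "insert b"},
		{Name: "createIndex", IsDDL: true},
		{Name: "update a"},
	}
	for i, batch := range splitAtBarriers(ops) {
		// Within a batch, DML may run in parallel; batches run in order.
		fmt.Println("batch", i, "size", len(batch))
	}
}
```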

Global ID


In the Aliyun internal version, a global id (also called gid) is supported, which marks the id of the database. It can be used to avoid loops when two databases act as backups of each other. Mongo-Shake only fetches the oplogs whose gid equals the source database id; all oplogs are fetched when no gid is given. The current open-source version does not support gid because it requires modifications to the MongoDB kernel.

If you want to build active-active replication without gid support, please visit the FAQ document for more details.

Tunnel


As mentioned above, we support several tunnel types: rpc, tcp, file, kafka, mock, and direct. rpc and tcp connect to the receiver synchronously via net/rpc and TCP respectively; file writes the output into a file; kafka sends messages asynchronously; mock is used for testing and throws away all the data; direct writes into the target MongoDB directly. Users can also add a tunnel type or modify an existing one.

We offer receivers that connect to the different tunnels: rpc, tcp, file, mock, and kafka. Please visit the FAQ document for more details.
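To make the pluggable-tunnel idea concrete, here is a minimal Go sketch. It does not reproduce the Writer interface in tunnel/tunnel.go: the Tunnel interface, its Send method, and newTunnel are hypothetical, and only a discard-everything mock tunnel and a file-like tunnel writing to an io.Writer are shown.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Tunnel is a hypothetical minimal contract for shipping one oplog batch.
type Tunnel interface {
	Send(batch []byte) error
}

// mockTunnel throws every batch away and only counts how many it saw.
type mockTunnel struct{ sent int }

func (m *mockTunnel) Send(batch []byte) error {
	m.sent++
	return nil
}

// streamTunnel writes each batch, newline-terminated, to an io.Writer,
// which could be an open file.
type streamTunnel struct{ w io.Writer }

func (s *streamTunnel) Send(batch []byte) error {
	line := make([]byte, 0, len(batch)+1)
	line = append(line, batch...)
	line = append(line, '\n')
	_, err := s.w.Write(line)
	return err
}

// newTunnel selects an implementation by name; adding a tunnel type means
// adding a case here and a type that satisfies Tunnel.
func newTunnel(kind string, w io.Writer) (Tunnel, error) {
	switch kind {
	case "mock":
		return &mockTunnel{}, nil
	case "file":
		return &streamTunnel{w: w}, nil
	default:
		return nil, fmt.Errorf("unsupported tunnel type %q", kind)
	}
}

func main() {
	var out bytes.Buffer
	tun, err := newTunnel("file", &out)
	if err != nil {
		panic(err)
	}
	_ = tun.Send([]byte(`{"op":"i","ns":"shop.orders"}`))
	fmt.Print(out.String())
}
```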

Compressor


Gzip, zlib, and deflate compressors are supported for batched oplogs before sending.
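The three formats are all available in the Go standard library, so batch compression can be sketched as below. The compress and decompress helpers and the string names used to select a format are assumptions for this example and are not taken from modules/compress.go.

```go
package main

import (
	"bytes"
	"compress/flate"
	"compress/gzip"
	"compress/zlib"
	"fmt"
	"io"
)

// compress packs data with the named algorithm.
func compress(kind string, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	var w io.WriteCloser
	var err error
	switch kind {
	case "gzip":
		w = gzip.NewWriter(&buf)
	case "zlib":
		w = zlib.NewWriter(&buf)
	case "deflate":
		w, err = flate.NewWriter(&buf, flate.DefaultCompression)
	default:
		err = fmt.Errorf("unknown compressor %q", kind)
	}
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed bytes into buf.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress reverses compress for the same algorithm name.
func decompress(kind string, data []byte) ([]byte, error) {
	var r io.ReadCloser
	var err error
	src := bytes.NewReader(data)
	switch kind {
	case "gzip":
		r, err = gzip.NewReader(src)
	case "zlib":
		r, err = zlib.NewReader(src)
	case "deflate":
		r = flate.NewReader(src)
	default:
		err = fmt.Errorf("unknown compressor %q", kind)
	}
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	batch := bytes.Repeat([]byte(`{"op":"i","ns":"shop.orders"}`), 50)
	for _, kind := range []string{"gzip", "zlib", "deflate"} {
		packed, err := compress(kind, batch)
		if err != nil {
			panic(err)
		}
		plain, err := decompress(kind, packed)
		if err != nil {
			panic(err)
		}
		fmt.Println(kind, len(batch), "->", len(packed), "round trip ok:", bytes.Equal(plain, batch))
	}
}
```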

Monitor & Debug


Users can monitor or debug Mongo-Shake through its RESTful API; please visit the FAQ document for more details.

Users could also monitor replication metrics with Prometheus exporter. See more details in mongoshake-prometheus-exporter and issue#859

Since v2.8.8, MongoShake also supports Prometheus metrics. See more details in README. The following is an example:

[Figure: monitor_prometheus_example2]

Other Details


Mongo-Shake uses go-driver to fetch oplogs from the source MongoDB that are later than the timestamp given in the configuration. It then filters the oplogs based on the whitelist, blacklist, and gid. All oplogs are transferred at least once, which is acceptable because oplog DML is idempotent. We use seq and ack to make sure each packet is received, similar to the sequence and acknowledgment numbers in TCP.
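The interplay of at-least-once delivery, seq/ack, and idempotent replay can be illustrated with the Go sketch below. Everything in it is an assumption for demonstration purposes: Packet, Sender, and Receiver are hypothetical types, acknowledgments are treated as cumulative, and the idempotent operation is a plain key/value upsert.

```go
package main

import "fmt"

// Packet is a hypothetical unit of transfer carrying one upsert.
type Packet struct {
	Seq   uint64
	Key   string
	Value string
}

// Sender numbers packets and remembers them until they are acknowledged.
type Sender struct {
	nextSeq uint64
	pending []Packet // sent but not yet acknowledged, in seq order
}

func (s *Sender) Send(key, value string) Packet {
	s.nextSeq++
	p := Packet{Seq: s.nextSeq, Key: key, Value: value}
	s.pending = append(s.pending, p)
	return p
}

// Ack drops every pending packet whose seq is at most the acknowledged one.
func (s *Sender) Ack(seq uint64) {
	i := 0
	for i < len(s.pending) && s.pending[i].Seq <= seq {
		i++
	}
	s.pending = s.pending[i:]
}

// Receiver applies upserts; applying the same packet twice changes nothing.
type Receiver struct {
	state map[string]string
}

// Apply performs the upsert and returns the seq to acknowledge.
func (r *Receiver) Apply(p Packet) uint64 {
	r.state[p.Key] = p.Value
	return p.Seq
}

func main() {
	s := &Sender{}
	r := &Receiver{state: map[string]string{}}

	p1 := s.Send("a", "1")
	p2 := s.Send("b", "2")

	s.Ack(r.Apply(p1)) // ack for seq 1 arrives
	r.Apply(p2)        // applied, but its ack is lost on the way back

	// The sender still holds seq 2 and retransmits it; the duplicate is harmless.
	for _, p := range s.pending {
		s.Ack(r.Apply(p))
	}
	fmt.Println(len(s.pending), len(r.state)) // prints "0 2"
}
```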

The oplogs are batched together in the handling pipeline.

Users can adjust the worker concurrency and executor concurrency to suit their environment.

Please see the detailed documents listed at the beginning for more information.

Code branch rules

version rules: a.b.c.

  • a: major version
  • b: minor version. even number means stable version. e.g. 1.2.x, 1.4.x, 2.0.x are stable while 1.5.x, 2.1.x aren't.
  • c: bugfix version
branch name rules
master master branch. Pushing code directly is not allowed. Stores the latest stable version.
develop(main branch) develop branch. All the following branches fork from this.
feature-* new feature branch. Forked from the develop branch and merged back after development, testing, and code review are finished.
bugfix-* bugfix branch. Forked from the develop branch and merged back after development, testing, and code review are finished.
improve-* improvement branch. Forked from the develop branch and merged back after development, testing, and code review are finished.

tag rules: add tag when releasing: "release-v{version}-{date}". for example: "release-v1.0.2-20180628"
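The version and tag rules above can be checked mechanically. The Go sketch below is an illustration only: parseTag, Release, and the regular expression are hypothetical, and the date is assumed to be eight digits (YYYYMMDD) based on the example tag.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// tagPattern encodes "release-v{a.b.c}-{date}" with an assumed 8-digit date.
var tagPattern = regexp.MustCompile(`^release-v(\d+)\.(\d+)\.(\d+)-(\d{8})$`)

// Release holds the parts of a release tag.
type Release struct {
	Major, Minor, Bugfix int
	Date                 string
}

// Stable applies the rule that an even minor version means a stable release.
func (r Release) Stable() bool { return r.Minor%2 == 0 }

// parseTag splits a tag into its version numbers and date.
func parseTag(tag string) (Release, error) {
	m := tagPattern.FindStringSubmatch(tag)
	if m == nil {
		return Release{}, fmt.Errorf("tag %q does not match release-v{version}-{date}", tag)
	}
	var n [3]int
	for i := range n {
		v, err := strconv.Atoi(m[i+1])
		if err != nil {
			return Release{}, err
		}
		n[i] = v
	}
	return Release{Major: n[0], Minor: n[1], Bugfix: n[2], Date: m[4]}, nil
}

func main() {
	for _, tag := range []string{"release-v1.0.2-20180628", "release-v1.5.0-20190101"} {
		rel, err := parseTag(tag)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %d.%d.%d stable=%v\n", tag, rel.Major, rel.Minor, rel.Bugfix, rel.Stable())
	}
}
```

The second tag in main is a made-up input that only demonstrates an odd minor version; it is not a real release.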

Usage


Run ./bin/collector.darwin or ./bin/collector.linux, which are built for OSX and Linux respectively.

Or you can build mongo-shake yourself with the following steps (Go version >= 1.15.10 is required):

  • git clone https://github.com/alibaba/MongoShake.git
  • cd MongoShake
  • make
  • ./bin/collector -conf=conf/collector.conf

Please note: you must modify collector.conf first to match your needs. You can also use the "start.sh" script, which supports the hypervisor mechanism on Linux only.

Shake series tool


We also provide other synchronization tools in the Shake series.

Thanks


Username Mail
lydarkforest linyunads1379@163.com
diggzhang diggzhang@gmail.com
ManleyLiu daywbdb@qq.com
hustchensi chensi_04@126.com
HelloCodeMing huanmingwong@163.com
cocoakekeyu cocoakekeyu@gmail.com
lixj1103 244769542@qq.com
xzshinan shinan@gongchang.com
tzjavadmg codyzeng@163.com
dx8439 171390022@qq.com
monkeyWie
raydy.yan yajuyan@hotmail.com
loda507 741536172@qq.com
骑着蜗牛的兔子 348978774@qq.com
lijwww 2530877879@qq.com
nanmu42 i@nanmu.me
zemul zemiaozhou@gmail.com
renheqiang
dobesv dobesv@gmail.com
pengzhenyi2015 503282373@qq.com
SisyphusSQ

Extension points exported contracts — how you extend this code

OplogFilter (Interface)
OplogFilter include: AutologousFilter, NamespaceFilter, GidFilter, NoopFilter, DDLFilter [8 implementers]
collector/filter/oplog_filter.go
Writer (Interface)
(no doc) [6 implementers]
tunnel/tunnel.go
Compress (Interface)
(no doc) [4 implementers]
modules/compress.go
OplogHandler (Interface)
(no doc) [3 implementers]
collector/syncer.go
Hasher (Interface)
(no doc) [3 implementers]
oplog/hasher.go
BasicWriter (Interface)
(no doc) [3 implementers]
executor/db_writer.go
Module (Interface)
(no doc) [2 implementers]
collector/write_controller.go
CollisionMatrix (Interface)
(no doc) [2 implementers]
executor/collision_matrix.go

Core symbols most depended-on inside this repo

Errorf
called by 308
pkg/log/logger.go
Printf
called by 279
pkg/log/logger.go
Infof
called by 234
pkg/log/logger.go
BatchMore
called by 160
collector/batcher.go
Size
called by 155
oplog/txn_buffer.go
Warnf
called by 88
pkg/log/logger.go
Error
called by 66
pkg/log/logger.go
Debugf
called by 61
pkg/log/logger.go

Shape

Method 483
Function 450
Struct 144
Interface 15
TypeAlias 9
Class 3

Languages

Go 95%
Python 4%
TypeScript 1%

Modules by API surface

common/metric.go 50 symbols
modules/compress.go 36 symbols
collector/syncer.go 33 symbols
pkg/log/logger.go 32 symbols
collector/filter/oplog_filter.go 27 symbols
tools/pre-split/pre_split.go 25 symbols
collector/worker.go 25 symbols
tunnel/tunnel.go 22 symbols
collector/docsyncer/doc_syncer.go 22 symbols
oplog/oplog.go 21 symbols
executor/collision_matrix.go 20 symbols
tunnel/tcp_writer.go 19 symbols

Dependencies from manifests, versioned

github.com/Masterminds/semver/v3 v3.4.0 · 1×
github.com/Shopify/sarama v1.27.2 · 1×
github.com/beorn7/perks v1.0.1 · 1×
github.com/cespare/xxhash/v2 v2.3.0 · 1×
github.com/davecgh/go-spew v1.1.1 · 1×
github.com/eapache/go-resiliency v1.2.0 · 1×
github.com/eapache/go-xerial-snappy v0.0.0-2018081417443 · 1×
github.com/eapache/queue v1.1.0 · 1×
github.com/getlantern/deepcopy v0.0.0-2016031715434 · 1×
github.com/golang/glog v1.2.5 · 1×
github.com/golang/snappy v0.0.4 · 1×

Datastores touched

c1 Collection · 1 repos
c2 Collection · 1 repos
c3 Collection · 1 repos
collections Collection · 1 repos
c4 Collection · 1 repos
databases Collection · 1 repos
serverless-shake-fake-collection Collection · 1 repos
testColl Collection · 1 repos

For agents

$ claude mcp add MongoShake \
  -- python -m otcore.mcp_server <graph>