
github.com/bruin-data/ingestr @v1.0.64 sqlite

10,262 symbols · 43,428 edges · 922 files · 1,534 documented (15%)
README
<img src="https://github.com/bruin-data/ingestr/blob/main/resources/ingestr.svg?raw=true" width="500" />

Copy data from any source to any destination without any code

<img src="https://github.com/bruin-data/ingestr/blob/main/resources/demo.gif?raw=true" width="750" />


ingestr is a command-line app that allows you to ingest data from any source into any destination using simple command-line flags, no code necessary.

  • ✨ copy data from your database into any destination
  • ➕ incremental loading: append, merge or delete+insert
  • 🐍 single-command installation

ingestr takes away the complexity of managing a backend or writing code to ingest data: simply run the command and watch the data land in its destination.
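
The three incremental strategies listed above differ in how incoming rows are combined with rows already in the destination. The following is a minimal in-memory sketch of the general idea, not ingestr's implementation: the function names, the key parameter, and the reading of delete+insert (delete every existing row whose value in the chosen column also appears in the incoming batch, then insert the batch) are assumptions made for illustration.

```python
# Illustrative only: plain-Python models of three common load strategies.
# None of these functions are part of ingestr.

def append(existing, incoming):
    """Add incoming rows after the existing ones; nothing is deduplicated."""
    return existing + incoming

def merge(existing, incoming, key):
    """Upsert: incoming rows replace existing rows that share the same key."""
    by_key = {row[key]: row for row in existing}
    for row in incoming:
        by_key[row[key]] = row
    return list(by_key.values())

def delete_insert(existing, incoming, key):
    """Delete existing rows whose key value appears in the batch, then insert the batch."""
    incoming_values = {row[key] for row in incoming}
    kept = [row for row in existing if row[key] not in incoming_values]
    return kept + incoming

existing = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
incoming = [{"id": 2, "name": "Grace H."}, {"id": 3, "name": "Edsger"}]

appended = append(existing, incoming)                    # 4 rows, id 2 appears twice
merged = merge(existing, incoming, key="id")             # 3 rows, id 2 updated
replaced = delete_insert(existing, incoming, key="id")   # 3 rows, id 2 replaced
```

With a unique key, merge and delete+insert end up with the same rows. They diverge when the chosen column is not unique (a date column, for example), where delete+insert replaces the whole matching slice.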

MongoDB to Postgres benchmark

Installation

You can install ingestr using the install script:

curl -LsSf https://getbruin.com/install/ingestr | sh

Alternatively, you can install it with pip:

pip install ingestr

The pip package can also be used from Python. Install the SDK extra for Python data ingestion:

pip install 'ingestr[sdk]'

Python rows, generators, and DataFrames are sent to the bundled ingestr binary as Arrow IPC streams by default:

import ingestr

ingestr.ingest(
    [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}],
    dest_uri="duckdb:///tmp/warehouse.duckdb",
    dest_table="main.people",
)

DataFrames and yielded data use the same Arrow stream transport:

ingestr.ingest(df, dest_uri="duckdb:///tmp/warehouse.duckdb", dest_table="main.events")

def events():
    yield [{"id": 1, "event": "signup"}]
    yield [{"id": 2, "event": "purchase"}]

ingestr.ingest(events, dest_uri="postgresql://...", dest_table="public.events")
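
When rows come from a long iterator, yielding them in fixed-size lists keeps memory bounded while matching the list-of-dicts shape that events() yields above. A minimal sketch; batched_rows and batch_size are hypothetical names, not part of ingestr:

```python
from itertools import islice

def batched_rows(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable of dicts."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Stand-in for a real row source such as a database cursor or an API client.
rows = ({"id": i, "event": "signup"} for i in range(5))
batches = list(batched_rows(rows, batch_size=2))
```

Since generators are listed among the accepted inputs, a generator such as batched_rows(rows, 1000) should be usable as the data argument in the same way as events.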

For push-style code, omit the data argument and use ingest as a context manager. The context value accepts the same shapes as ingestr.ingest(data, ...):

with ingestr.ingest(dest_uri="postgresql://...", dest_table="public.events") as ingest:
    for response in client.list_events():
        ingest(response["items"])

For very large, already-materialized data, use the mmap Arrow IPC file transport:

ingestr.ingest(df, dest_uri="duckdb:///tmp/warehouse.duckdb", dest_table="main.events", transport="mmap")

For full CLI pass-through, use ingestr.run(["ingest", "--source-uri", "...", "--dest-uri", "...", "--source-table", "..."]), or ingestr.run_cli(...) for keyword arguments that map to CLI flags.
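
The keyword-to-flag mapping mentioned above can be pictured as follows. This sketch assumes a common convention (underscores become hyphens and each name gains a leading --); it is not the code behind ingestr.run_cli, and to_cli_args is a hypothetical helper.

```python
def to_cli_args(command, **options):
    """Turn keyword arguments into a flat argv list, e.g. source_uri -> --source-uri."""
    args = [command]
    for name, value in options.items():
        args.append("--" + name.replace("_", "-"))
        args.append(str(value))
    return args

args = to_cli_args(
    "ingest",
    source_uri="postgresql://...",
    source_table="public.some_data",
    dest_uri="duckdb:///tmp/warehouse.duckdb",
    dest_table="main.some_data",
)
```

The resulting list has the same shape as the argument that ingestr.run(...) is shown receiving above.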

Quickstart

ingestr ingest \
    --source-uri 'postgresql://admin:admin@localhost:8837/web?sslmode=disable' \
    --source-table 'public.some_data' \
    --dest-uri 'bigquery://<your-project-name>?credentials_path=/path/to/service/account.json' \
    --dest-table 'ingestr.some_data'

That's it.

This command:

  • gets the table public.some_data from the Postgres instance.
  • uploads this data to your BigQuery warehouse under the schema ingestr and table some_data.
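
The connection URIs in the quickstart pack the scheme, credentials, host, database and options into a single string. As a rough illustration of how such a string breaks down, the sketch below uses Python's standard urllib.parse. It is not how ingestr itself parses URIs, and my-project stands in for the <your-project-name> placeholder.

```python
from urllib.parse import urlsplit, parse_qs

source = urlsplit("postgresql://admin:admin@localhost:8837/web?sslmode=disable")
dest = urlsplit("bigquery://my-project?credentials_path=/path/to/service/account.json")

source_parts = {
    "scheme": source.scheme,              # presumably selects the connector
    "user": source.username,
    "host": source.hostname,
    "port": source.port,
    "database": source.path.lstrip("/"),
    "options": parse_qs(source.query),    # query string as a dict of lists
}
dest_options = parse_qs(dest.query)

# Table names in the quickstart are schema-qualified: schema, then table.
schema, table = "public.some_data".split(".", 1)
```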

Documentation

You can see the full documentation here.

Community

Join our Slack community here.

Contributing

Pull requests are welcome. However, please open an issue first to discuss what you would like to change. We may be able to offer help and feedback on any changes you would like to make.

Note: After cloning ingestr, make sure to run make setup to install the Git hooks.

Supported sources & destinations

Source Destination
Databases
AWS Athena
Apache Iceberg -
AWS Redshift
Cassandra
ClickHouse
Couchbase -
CrateDB
Databricks
DuckDB
DynamoDB
Elasticsearch
Google BigQuery
GCP Spanner -
IBM Db2 -
InfluxDB -
Kafka -
Local CSV file
MaxCompute
Microsoft Fabric
Microsoft OneLake -
Microsoft SQL Server
MongoDB
MotherDuck
MySQL
Oracle -
PlanetScale
Postgres
RabbitMQ -
SAP Hana -
Snowflake
Socrata -
SQLite
StarRocks
Synapse -
Trino
Platforms
Adjust -
Airtable -
Allium -
Amazon Kinesis -
Anthropic -
API-Football -
AppsFlyer -
Apple Ads -
Apple App Store -
Applovin -
Applovin Max -
Asana -
Attio -
Azure Data Lake Storage Gen2
BallDontLie FIFA -
Braze -
Bruin -
Chess.com -
ClickUp -
Cursor -
Docebo -
Dune -
Facebook Ads -
Fireflies -
Fluxx -
football-data.org -
Frankfurter -
Freshdesk -
FundraiseUp -
G2 -
GitHub -
GitLab -
Google Ads -
Google Analytics -
Google Cloud Storage (GCS)
Google Sheets -
Gorgias -
Granola -
Hostaway -
HubSpot -
Indeed -
Intercom -
Internet Society Pulse -
Jira -
JobTread -
Klaviyo -
Linear -
LinkedIn Ads -
Mailchimp -
Mixpanel -
Monday -
Notion -
Paddle -
Personio -
PhantomBuster -
Pinterest -
Pipedrive -
Plus Vibe AI -
PostHog -
Primer -
QuickBooks -
Reddit Ads -
RevenueCat -
S3
Salesforce -
SFTP -
SendGrid -
Shopify -
Slack -
Smartsheet -
Snapchat Ads -
Solidgate -
Square -
Stripe -
SurveyMonkey -
TikTok Ads -
Trustpilot -
Twilio -
Wise -
Zendesk

Extension points exported contracts — how you extend this code

CDCResumeProvider (Interface)
CDCResumeProvider is an optional interface that destinations can implement to support CDC resume functionality. If imple [14 …
pkg/destination/destination.go
SchemaEvolver (Interface)
SchemaEvolver is implemented by destinations that can evolve an existing table's schema given an abstract EvolutionPlan. [20 …
pkg/schemaevolution/evolve.go
Source (Interface)
Source represents a data source that can provide tables. Sources handle connection management and return SourceTable ins [136 …
pkg/source/source.go
RecordTransformer (Interface)
RecordTransformer transforms Arrow record batches. [7 implementers]
pkg/transformer/transformer.go
Display (Interface)
Display handles the visual representation of progress. Implementations include interactive (spinner) and log-based displ [5 …
pkg/progress/types.go
WriteStrategy (Interface)
(no doc) [6 implementers]
pkg/strategy/strategy.go
Paginator (Interface)
(no doc) [4 implementers]
pkg/http/pagination.go
DataBuffer (Interface)
DataBuffer accumulates record batches and allows replay. It is used when schema inference requires reading all data befo [1 …
pkg/databuffer/buffer.go

Core symbols most depended-on inside this repo

Debug
called by 1923
internal/config/config.go
Release
called by 1167
pkg/source/mongodb/mongodb.go
Run
called by 1057
pkg/pipeline/pipeline.go
Close
called by 874
pkg/source/source.go
Get
called by 730
pkg/schemaevolution/override.go
Err
called by 566
pkg/source/stripe/stripe.go
Append
called by 513
pkg/databuffer/buffer.go
ExecContext
called by 403
pkg/source/mysql/cdc.go

Shape

Function 5,456
Method 3,940
Struct 746
Interface 56
TypeAlias 42
FuncType 14
Class 8

Languages

Go 98%
Python 2%
TypeScript 1%

Modules by API surface

pkg/destination/bigquery/bigquery.go 99 symbols
pkg/source/mysql/cdc.go 80 symbols
tests/integration/destination_conformance_test.go 74 symbols
pkg/source/blobstore/blobstore.go 74 symbols
pkg/destination/duckdb/duckdb.go 73 symbols
pkg/destination/mssql/mssql.go 68 symbols
pkg/source/stripe/stripe.go 67 symbols
pkg/destination/bigquery/load_job.go 67 symbols
pkg/destination/oracle/oracle.go 66 symbols
pkg/source/mssql_cdc/source.go 64 symbols
pkg/source/customerio/customerio.go 63 symbols
pkg/source/mysql/internal/psdbconnect/psdbconnect.pb.go 61 symbols

Dependencies from manifests, versioned

atomicgo.dev/cursor v0.2.0 · 1×
cel.dev/expr v0.25.1 · 1×
cloud.google.com/go v0.123.0 · 1×
cloud.google.com/go/auth/oauth2adapt v0.2.8 · 1×
cloud.google.com/go/bigquery v1.74.0 · 1×
cloud.google.com/go/compute/metadata v0.9.0 · 1×
cloud.google.com/go/longrunning v0.9.0 · 1×
cloud.google.com/go/monitoring v1.24.3 · 1×

Datastores touched

items Collection · 1 repos
orders Collection · 1 repos
users Collection · 1 repos
(mysql) Database · 1 repos
db Database · 1 repos
(mongodb) Database · 1 repos
app Database · 1 repos
mydb Database · 1 repos

For agents

$ claude mcp add ingestr \
  -- python -m otcore.mcp_server <graph>
