github.com/pachyderm/pachyderm @v2.12.2 sqlite

29,409 symbols · 98,911 edges · 2,416 files · 11,190 documented (38%)
README

Pachyderm – Automate data transformations with data versioning and lineage

Pachyderm is cost-effective at scale, enabling data engineering teams to automate complex pipelines with sophisticated data transformations across any type of data. Our unique approach provides parallelized processing of multi-stage, language-agnostic pipelines with data versioning and data lineage tracking. Pachyderm delivers the ultimate CI/CD engine for data.

Features

  • Data-driven pipelines automatically trigger based on detecting data changes.
  • Immutable data lineage with data versioning of any data type.
  • Autoscaling and parallel processing built on Kubernetes for resource orchestration.
  • Uses standard object stores for data storage with automatic deduplication.
  • Runs across all major cloud providers and on-premises installations.
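The automatic deduplication named in the list above is commonly achieved with content addressing: an object is stored under a hash of its bytes, so identical content is written only once. The following is a minimal sketch of that general technique, not Pachyderm's actual storage code. The `Store` type, `NewStore`, `Put`, and `Len` are hypothetical names introduced for illustration, and an in-memory map stands in for the object store.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a hypothetical in-memory stand-in for an object store.
// Objects are keyed by the SHA-256 hash of their content.
type Store struct {
	objects map[string][]byte
}

// NewStore returns an empty Store.
func NewStore() *Store {
	return &Store{objects: make(map[string][]byte)}
}

// Put stores data under its content hash and returns the key.
// Identical content always maps to the same key, so it is kept once.
func (s *Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := s.objects[key]; !ok {
		// Copy so later changes to the caller's slice cannot alter the stored object.
		s.objects[key] = append([]byte(nil), data...)
	}
	return key
}

// Len reports how many distinct objects are held.
func (s *Store) Len() int { return len(s.objects) }

func main() {
	s := NewStore()
	a := s.Put([]byte("same bytes"))
	b := s.Put([]byte("same bytes"))
	c := s.Put([]byte("different bytes"))
	fmt.Println(a == b, a == c, s.Len()) // prints: true false 2
}
```

Three writes leave only two stored objects because the first two carry identical bytes. A real system would typically hash chunks rather than whole files, which this sketch does not attempt.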

Getting Started

To start deploying end-to-end, version-controlled data pipelines, run Pachyderm locally or deploy it on AWS, GCE, or Azure in about 5 minutes.

You can also refer to our complete documentation to see tutorials, check out example projects, and learn about advanced features of Pachyderm.

To see examples and learn about core use cases for Pachyderm, refer to:

  • Examples
  • Use Cases
  • Case Studies

Documentation

Official Documentation

Community

Keep up to date and get Pachyderm support via:

  • Twitter: follow us on Twitter.
  • Slack: join our community Slack channel to get help from the Pachyderm team and other users.

Contributing

To get started, sign the Contributor License Agreement.

You should also check out our contributing guide.

Send us PRs; we would love to see what you do! You can also check our GitHub issues for things labeled "help-wanted" as a good place to start. We're sometimes bad about keeping that label up to date, so if you don't see any, just let us know.

Usage Metrics

Pachyderm automatically reports anonymized usage metrics. These metrics help us understand how people are using Pachyderm and make it better. They can be disabled by setting the environment variable METRICS to false in the pachd container.
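The text above states only that setting METRICS to false in the pachd container turns reporting off; how pachd reads the variable is not shown here. As a hedged illustration of how an opt-out switch of this kind can be interpreted, the sketch below uses Go's standard `strconv.ParseBool`. The `metricsEnabled` helper and its default-on behavior are assumptions made for this example, not Pachyderm's implementation.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// metricsEnabled is a hypothetical helper, not part of Pachyderm.
// It treats reporting as on by default and turns it off only when
// the METRICS variable parses as a false boolean.
func metricsEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup("METRICS")
	if !ok {
		return true // variable not set: keep the default
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return true // unparseable value: keep the default (an assumption)
	}
	return enabled
}

func main() {
	// os.LookupEnv has the signature func(string) (string, bool),
	// so it can be passed in directly.
	fmt.Println("metrics enabled:", metricsEnabled(os.LookupEnv))
}
```

Passing the lookup function in as a parameter keeps the helper easy to exercise without changing the real process environment.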

Extension points: exported contracts (how you extend this code)

Adder (Interface)
Adder is something that can be added to. [10 implementers]
src/internal/promutil/promutil.go
Source (Interface)
Source iterates over FileInfos generated from a fileset.FileSet [23 implementers]
src/server/pfs/server/source.go
ModifyFile (Interface)
ModifyFile is used for performing a stream of file modifications. The modifications are not persisted until the ModifyFi [4 …
src/client/pfs_file.go
ProtoCodeGenerator (Interface)
Generate: Unsupported transaction client (overridden by user code) - existing in client/transaction.go ProtoCodeGenerato [1 …
src/proto/pachgen/main.go
JSONStringStreamController (Interface)
JSONStringStreamController represents the transform controller that's able to transform the incoming new line delim
src/typescript/fetch.pb.ts
IMountPlugin (Interface)
(no doc) [1 implementer]
jupyter-extension/src/plugins/mount/types.ts
IAPIClient (Interface)
(no doc) [1 implementer]
console/backend/src/proto/proto/pps/pps_grpc_pb.d.ts
Chainable (Interface)
(no doc)
jupyter-extension/cypress/support/commands.d.ts

Core symbols: most depended-on inside this repo

NoError
called by 6246
src/internal/require/require.go
Errorf
called by 2341
src/internal/starlark/startest/startest.go
Equal
called by 1821
src/internal/require/require.go
EnsureStack
called by 1447
src/internal/errors/errors.go
Error
called by 1285
src/internal/starlark/startest/startest.go
Run
called by 804
src/internal/starlark/startest/startest.go
Ctx
called by 789
src/server/auth/server/oidc.go
UniqueString
called by 726
src/internal/uuid/naming.go

Shape

Method 17,129
Function 6,833
Struct 2,445
Class 1,328
TypeAlias 694
Interface 600
FuncType 290
Enum 89
Route 1

Languages

Go 85%
TypeScript 11%
Python 4%

Modules by API surface

src/pfs/pfs.pb.validate.go · 1,417 symbols
src/pps/pps.pb.validate.go · 1,287 symbols
src/pps/pps.pb.go · 1,088 symbols
src/pfs/pfs.pb.go · 1,054 symbols
src/internal/testpachd/mock_pachd.go · 792 symbols
src/auth/auth.pb.validate.go · 741 symbols
src/pfs/pfs_grpc.pb.go · 474 symbols
src/auth/auth.pb.go · 433 symbols
src/identity/identity.pb.validate.go · 390 symbols
src/pjs/pjs.pb.validate.go · 351 symbols
src/debug/debug.pb.validate.go · 351 symbols
src/pps/pps_grpc.pb.go · 320 symbols

Dependencies: from manifests, versioned

cloud.google.com/go v0.115.0 · 1×
cloud.google.com/go/auth/oauth2adapt v0.2.2 · 1×
cloud.google.com/go/compute/metadata v0.3.0 · 1×
cloud.google.com/go/profiler v0.3.0 · 1×
cloud.google.com/go/storage v1.41.0 · 1×
dario.cat/mergo v1.0.0 · 1×
filippo.io/edwards25519 v1.1.0 · 1×
github.com/AppsFlyer/go-sundheit v0.5.0 · 1×
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.11.1 · 1×
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.6.0 · 1×

Datastores touched

(mysql) · Database · 1 repo
martini · Database · 1 repo
test_db · Database · 1 repo
mydb · Database · 1 repo

For agents

$ claude mcp add pachyderm \
  -- python -m otcore.mcp_server <graph>
