github.com/newren/git-filter-repo @v2.47.0 sqlite

235 symbols 693 edges 10 files 70 documented · 30%
README

git filter-repo is a versatile tool for rewriting history, which includes capabilities I have not found anywhere else. It roughly falls into the same space of tool as git filter-branch but without the capitulation-inducing poor performance, with far more capabilities, and with a design that scales usability-wise beyond trivial rewriting cases. git filter-repo is now recommended by the git project instead of git filter-branch.

While most users will probably just use filter-repo as a simple command line tool (and likely only use a few of its flags), at its core filter-repo contains a library for creating history rewriting tools. As such, users with specialized needs can leverage it to quickly create entirely new history rewriting tools.
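As a rough illustration of the library use described above, the sketch below defines a commit callback and hands it to filter-repo. It assumes `git_filter_repo` has been made importable as a module (INSTALL.md covers this) and that `FilteringOptions.default_options()` and `RepoFilter(args, commit_callback=...)` behave as they do in the repository's own test scripts. The `[imported] ` prefix and both function names are made up for this example.

```python
def tag_commit_message(commit, metadata):
    """Commit callback: prepend a marker to each commit message.

    filter-repo hands commit messages over as bytes, so the prefix
    is a bytes literal too. The marker text is a hypothetical choice.
    """
    prefix = b"[imported] "
    if not commit.message.startswith(prefix):
        commit.message = prefix + commit.message


def run_filter():
    """Rewrite the repository in the current directory (destructive)."""
    # Imported lazily so the callback above can be exercised on its own,
    # without filter-repo being installed as a module.
    import git_filter_repo as fr

    args = fr.FilteringOptions.default_options()
    fr.RepoFilter(args, commit_callback=tag_commit_message).run()
```

Calling `run_filter()` inside a fresh clone would rewrite every commit message. Nothing happens until it is called, so the callback can be tried against a stand-in object first.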

Prerequisites

filter-repo requires:

  • git >= 2.22.0 at a minimum; some features require git >= 2.24.0 or later
  • python3 >= 3.6
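A quick way to check these minimums is sketched below. It is only an illustration: the helper names are invented here, and the parsing assumes `git --version` prints something like `git version 2.39.2`, possibly followed by a vendor suffix.

```python
import re
import subprocess
import sys

MIN_GIT = (2, 22, 0)   # minimum git version stated above
MIN_PYTHON = (3, 6)    # minimum python3 version stated above


def parse_git_version(version_output):
    """Extract (major, minor, patch) from `git --version` output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("unrecognized git version: %r" % version_output)
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_prerequisites(git_version_output, python_version=sys.version_info):
    """True if both git and python satisfy the stated minimums."""
    return (parse_git_version(git_version_output) >= MIN_GIT
            and tuple(python_version[:2]) >= MIN_PYTHON)


def installed_git_version_output():
    """Ask the git found on $PATH for its version string."""
    return subprocess.check_output(["git", "--version"]).decode()
```

For example, `meets_prerequisites(installed_git_version_output())` reports whether the current environment is new enough for basic use. It does not check the higher git version that some features need.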

How do I install it?

While the git-filter-repo repository has many files, the main logic is all contained in a single-file python script named git-filter-repo, which was done to make installation for basic use on many systems trivial: just place that one file into your $PATH.

See INSTALL.md for things beyond basic usage or special cases. The more involved instructions are only needed if one of the following applies:

  • you do not find the above comment about trivial installation intuitively obvious
  • you are working with a python3 executable named something other than "python3"
  • you want to install documentation (beyond the builtin docs shown with -h)
  • you want to run some of the contrib examples
  • you want to create your own python filtering scripts using filter-repo as a module/library

How do I use it?

For comprehensive documentation:

  • see the user manual
  • alternative formatting of the user manual is available on various external sites (example), for those that don't like the htmlpreview.github.io layout, though it may only be up-to-date as of the latest release

If you prefer learning from examples:

  • there is a cheat sheet for converting filter-branch commands, which covers every example from the filter-branch manual
  • there is a cheat sheet for converting BFG Repo Cleaner commands, which covers every example from the BFG website
  • the simple example below may be of interest
  • the user manual has an extensive examples section
  • I have collected a set of example filterings based on user-filed issues

In either case, you may also find the Frequently Answered Questions useful.

Why filter-repo instead of other alternatives?

This was covered in more detail in a Git Rev News article on filter-repo, but some highlights for the main competitors:

filter-branch

BFG Repo Cleaner

  • great tool for its time, but while it makes some things simple, it is limited to a few kinds of rewrites.

  • its architecture is not amenable to handling more types of rewrites.

  • its architecture presents some shortcomings and bugs even for its intended use case.

  • fans of bfg may be interested in bfg-ish, a reimplementation of bfg based on filter-repo which includes several new features and bugfixes relative to bfg.

  • a cheat sheet is available showing how to convert example commands from the manual of BFG Repo Cleaner into filter-repo commands.

Simple example, with comparisons

Let's say that we want to extract a piece of a repository, with the intent of merging just that piece into some other bigger repo. For extraction, we want to:

  • extract the history of a single directory, src/. This means that only paths under src/ remain in the repo, and any commits that only touched paths outside this directory will be removed.
  • rename all files to have a new leading directory, my-module/ (e.g. so that src/foo.c becomes my-module/src/foo.c)
  • rename any tags in the extracted repository to have a 'my-module-' prefix (to avoid any conflicts when we later merge this repo into something else)

Solving this with filter-repo

Doing this with filter-repo is as simple as the following command:

  git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'

(the single quotes are unnecessary, but make it clearer to a human that we are replacing the empty string as a prefix with my-module-)
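To make the combined effect of the three options concrete, here is a small simulation of what happens to individual paths and tag names. It is a sketch of the outcome described above, not filter-repo's implementation, and the function names are invented for the illustration.

```python
def rewrite_path(path, keep_prefix="src/", new_root="my-module"):
    """Mimic --path src/ followed by --to-subdirectory-filter my-module.

    Returns the new path, or None if the path is dropped from history.
    """
    if not path.startswith(keep_prefix):
        return None                      # outside src/: removed
    return new_root + "/" + path         # src/foo.c -> my-module/src/foo.c


def rename_tag(tag, old_prefix="", new_prefix="my-module-"):
    """Mimic --tag-rename OLD:NEW, which swaps a leading prefix."""
    if not tag.startswith(old_prefix):
        return tag                       # prefix does not apply
    return new_prefix + tag[len(old_prefix):]


def rewrite_commit_paths(paths):
    """Paths a commit still touches after filtering.

    An empty result corresponds to a commit that only touched paths
    outside src/, which the text above says will be removed.
    """
    rewritten = (rewrite_path(p) for p in paths)
    return [p for p in rewritten if p is not None]
```

With the empty string as the old prefix, every tag name matches, which is why `'':'my-module-'` prefixes all tags.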

Solving this with BFG Repo Cleaner

BFG Repo Cleaner is not capable of this kind of rewrite; in fact, all three types of wanted changes are outside of its capabilities.

Solving this with filter-branch

filter-branch comes with a pile of caveats (more on that below) even once you figure out the necessary invocation(s):

  git filter-branch \
      --tree-filter 'mkdir -p my-module && \
                     git ls-files \
                         | grep -v ^src/ \
                         | xargs git rm -f -q && \
                     ls -d * \
                         | grep -v my-module \
                         | xargs -I files mv files my-module/' \
      --tag-name-filter 'echo "my-module-$(cat)"' \
      --prune-empty -- --all
  git clone file://$(pwd) newcopy
  cd newcopy
  git for-each-ref --format="delete %(refname)" refs/tags/ \
      | grep -v refs/tags/my-module- \
      | git update-ref --stdin
  git gc --prune=now

Some might notice that the above filter-branch invocation will be really slow due to using --tree-filter; you could alternatively use the --index-filter option of filter-branch, changing the above commands to:

  git filter-branch \
      --index-filter 'git ls-files \
                          | grep -v ^src/ \
                          | xargs git rm -q --cached;
                      git ls-files -s \
                          | sed "s%$(printf \\t)%&my-module/%" \
                          | git update-index --index-info;
                      git ls-files \
                          | grep -v ^my-module/ \
                          | xargs git rm -q --cached' \
      --tag-name-filter 'echo "my-module-$(cat)"' \
      --prune-empty -- --all
  git clone file://$(pwd) newcopy
  cd newcopy
  git for-each-ref --format="delete %(refname)" refs/tags/ \
      | grep -v refs/tags/my-module- \
      | git update-ref --stdin
  git gc --prune=now

However, for either filter-branch command there are a pile of caveats. First, some may be wondering why I list five commands here for filter-branch. Despite the use of --all and --tag-name-filter, and filter-branch's manpage claiming that a clone is enough to get rid of old objects, the extra steps to delete the other tags and do another gc are still required to clean out the old objects and avoid mixing new and old history before pushing somewhere.

Other caveats:

  • Commit messages are not rewritten; so if some of your commit messages refer to prior commits by (abbreviated) sha1, after the rewrite those messages will now refer to commits that are no longer part of the history. It would be better to rewrite those (abbreviated) sha1 references to refer to the new commit ids.
  • The --prune-empty flag sometimes misses commits that should be pruned, and it will also prune commits that started empty rather than just ended empty due to filtering. For repositories that intentionally use empty commits for versioning and publishing related purposes, this can be detrimental.
  • The commands above are OS-specific. GNU vs. BSD issues for sed, xargs, and other commands often trip up users; I think I failed to get most folks to use --index-filter since the only example in the filter-branch manpage that both uses it and shows how to move everything into a subdirectory is Linux-specific, and it is not obvious to the reader that it has a portability issue since it silently misbehaves rather than failing loudly.
  • The --index-filter version of the filter-branch command may be two to three times faster than the --tree-filter version, but both filter-branch commands are going to be multiple orders of magnitude slower than filter-repo.
  • Both commands assume all filenames are composed entirely of ASCII characters (even special ASCII characters such as tabs or double quotes will wreak havoc and likely result in missing files or misnamed files).
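The commit message caveat can be sketched as follows: given a mapping from old commit ids to new ones, abbreviated references in a message are swapped for the matching new id at the same length. This is only an illustration of the idea. The function name and the use of plain strings are choices made here, and it makes no claim about how filter-repo itself handles such references.

```python
import re

# A run of 7 to 40 lowercase hex digits standing alone as a word.
HEX_RUN = re.compile(r"\b[0-9a-f]{7,40}\b")


def rewrite_hash_references(message, old_to_new):
    """Replace (abbreviated) old commit ids in a message with new ones.

    old_to_new maps full old ids to full new ids. Abbreviations that
    match no old id, or more than one, are left untouched.
    """
    def replace(match):
        abbrev = match.group(0)
        candidates = [old for old in old_to_new if old.startswith(abbrev)]
        if len(candidates) != 1:
            return abbrev                # unknown or ambiguous: keep as is
        # Keep the reference at the length the author originally used.
        return old_to_new[candidates[0]][:len(abbrev)]

    return HEX_RUN.sub(replace, message)
```

Leaving unknown hex runs alone matters because ordinary words such as "defaced" also happen to be valid hex.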

Solving this with fast-export/fast-import

One can kind of hack this together with something like:

  git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
      --signed-tags=strip --tag-of-filtered-object=rewrite --all \
      | grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
      | grep -vP '^D (?!src/)' \
      | perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
      | perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
      | perl -pe s%refs/tags/%refs/tags/my-module-% \
      | git -c core.ignorecase=false fast-import --date-format=raw-permissive \
            --force --quiet
  git for-each-ref --format="delete %(refname)" refs/tags/ \
      | grep -v refs/tags/my-module- \
      | git update-ref --stdin
  git reset --hard
  git reflog expire --expire=now --all
  git gc --prune=now

But this comes with some nasty caveats and limitations:

  • The various greps and regex replacements operate on the entire fast-export stream and thus might accidentally corrupt unintended portions of it, such as commit messages. If you needed to edit file contents and thus dropped the --no-data flag, it could also end up corrupting file contents.
  • This command assumes all filenames in the repository are composed entirely of ASCII characters, and also exclude special characters such as tabs or double quotes. If such a special filename exists within the old src/ directory, it will be pruned even though it was intended to be kept. (In slightly different repository rewrites, this type of editing also risks corrupting filenames with special characters by adding extra double quotes near the end of the filename and in some leading directory name.)
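Both caveats can be reproduced with the first grep pattern from the pipeline above, applied line by line in Python. The stream excerpt is a simplified, hypothetical one: the hashes are shortened and the author and committer lines are omitted. It only shows how a line-oriented filter cannot tell stream structure from message text, and how a quoted path slips past the `src/` check.

```python
import re

# Same pattern as: grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)'
OUTSIDE_SRC = re.compile(r"^M [0-9]+ [0-9a-f]+ (?!src/)")


def naive_grep_v(stream_lines):
    """Drop every line the pattern matches, like `grep -v` would."""
    return [line for line in stream_lines if not OUTSIDE_SRC.search(line)]


stream = [
    "commit refs/heads/main",
    "data 61",                                # byte length of the message
    "Document this stream line:",             # commit message, line 1
    "M 100644 0123abcd docs/format.txt",      # commit message, line 2
    "M 100644 89abcdef src/main.c",           # real file change, wanted
    'M 100644 4567cdef "src/odd\\tname.c"',   # quoted path under src/
    "M 100644 aaaa1111 README.md",            # real file change, unwanted
]

kept = naive_grep_v(stream)
```

Here `kept` loses the second line of the commit message, so the `data 61` header no longer matches the message length. It also loses the quoted `src/` path, because the path field begins with a double quote rather than `src/`.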

Core symbols most depended-on inside this repo

  • write: called by 116 (git_filter_repo.py)
  • decode: called by 41 (git_filter_repo.py)
  • insert: called by 23 (git_filter_repo.py)
  • _advance_currentline: called by 22 (git_filter_repo.py)
  • check_output: called by 13 (git_filter_repo.py)
  • Popen: called by 13 (git_filter_repo.py)
  • readline: called by 12 (git_filter_repo.py)
  • parse_args: called by 11 (git_filter_repo.py)

Shape

Method 180
Class 32
Function 23

Languages

Python 100%

Modules by API surface

git_filter_repo.py: 215 symbols
t/t9391/splice_repos.py: 6 symbols
t/t9391/unusual.py: 5 symbols
t/t9391/print_progress.py: 3 symbols
t/t9391/file_filter.py: 2 symbols
t/t9391/strip-cvs-keywords.py: 1 symbol
t/t9391/rename-master-to-develop.py: 1 symbol
t/t9391/erroneous.py: 1 symbol
t/t9391/commit_info.py: 1 symbol

For agents

$ claude mcp add git-filter-repo \
  -- python -m otcore.mcp_server <graph>
