
github.com/src-d/hercules @v10.7.2 sqlite


1,782 symbols · 6,295 edges · 114 files · 526 documented (30%)

README

Hercules

  Fast, insightful and highly customizable Git history analysis.





Overview • How To Use • Installation • Contributions • License

Overview
Installation
- Build from source
- GitHub Action
Contributions
License
Usage

Overview

Hercules is an amazingly fast and highly customizable Git repository analysis engine written in Go. Batteries are included. Powered by go-git and Babelfish.

There are two command-line tools: hercules and labours. The first is a program written in Go which takes a Git repository and executes a Directed Acyclic Graph (DAG) of analysis tasks over the full commit history. The second is a Python script which shows some predefined plots over the collected data. These two tools are normally used together through a pipe. It is possible to write custom analyses using the plugin system. It is also possible to merge several analysis results together - relevant for organizations. The analyzed commit history includes branches, merges, etc.
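As a rough illustration of what executing a DAG of analysis tasks means, the sketch below orders a handful of items so that each one runs after the items it depends on. The item names and the dependency map are hypothetical and do not mirror the real Hercules registry; the ordering uses Python's standard `graphlib` module (Python 3.9+), not Hercules' own Go implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis items mapped to the items they depend on.
# The names are illustrative only, not Hercules' actual pipeline items.
dependencies = {
    "tree_diff": set(),
    "identity": set(),
    "file_diff": {"tree_diff"},
    "burndown": {"file_diff", "identity"},
    "couples": {"tree_diff", "identity"},
}


def execution_order(deps):
    """Return one valid order in which every item follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Several orders can be valid for the same graph; the only guarantee is that a dependency always comes before the item that needs it.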

Hercules has been successfully used for several internal projects at source{d}. There are blog posts: 1, 2 and a presentation. Please contribute by testing, fixing bugs, adding new analyses, or coding swagger!

Hercules DAG of Burndown analysis

The DAG of burndown and couples analyses with UAST diff refining. Generated with hercules --burndown --burndown-people --couples --feature=uast --dry-run --dump-dag doc/dag.dot https://github.com/src-d/hercules

git/git image

torvalds/linux line burndown (granularity 30, sampling 30, resampled by year). Generated with hercules --burndown --first-parent --pb https://github.com/torvalds/linux | labours -f pb -m burndown-project in 1h 40min.

Installation

Grab the hercules binary from the Releases page. labours is installable from PyPI:

pip3 install labours

pip3 is the Python package manager.

On Windows, NumPy and SciPy can be installed from http://www.lfd.uci.edu/~gohlke/pythonlibs/

Build from source

You are going to need Go (>= v1.11) and protoc.

git clone https://github.com/src-d/hercules && cd hercules
make
pip3 install -e ./python

GitHub Action

It is possible to run Hercules as a GitHub Action: Hercules on GitHub Marketplace. Please refer to the sample workflow, which demonstrates how to set it up.

Contributions

...are welcome! See CONTRIBUTING and code of conduct.

License

Apache 2.0

Usage

The most useful and reliably up-to-date command line reference:

hercules --help

Some examples:

# Use "memory" go-git backend and display the burndown plot. "memory" is the fastest but the repository's git data must fit into RAM.
hercules --burndown https://github.com/src-d/go-git | labours -m burndown-project --resample month
# Use "file system" go-git backend and print some basic information about the repository.
hercules /path/to/cloned/go-git
# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache, use Protocol Buffers and display the burndown plot without resampling.
hercules --burndown --pb https://github.com/git/git /tmp/repo-cache | labours -m burndown-project -f pb --resample raw

# Now something fun
# Get the linear history from git rev-list, reverse it
# Pipe to hercules, produce burndown snapshots for every 30 days grouped by 30 days
# Save the raw data to cache.yaml, so that it can be reused later with labours -i cache.yaml
# Pipe the raw data to labours, set text font size to 16pt, use Agg matplotlib backend and save the plot to git.png
git rev-list HEAD | tac | hercules --commits - --burndown https://github.com/git/git | tee cache.yaml | labours -m burndown-project --font-size 16 --backend Agg --output git.png

labours -i /path/to/yaml reads hercules output that was previously saved to disk.

Caching

It is possible to store the cloned repository on disk. The subsequent analysis can run on the corresponding directory instead of cloning from scratch:

# First time - cache
hercules https://github.com/git/git /tmp/repo-cache

# Second time - use the cache
hercules --some-analysis /tmp/repo-cache

GitHub Action

The action produces an artifact named hercules_charts. Since it is currently impossible to pack several files in one artifact, all the charts and Tensorflow Projector files are packed in an inner tar archive. To view the embeddings, go to projector.tensorflow.org, click "Load" and choose the two TSVs. Then use UMAP or t-SNE.

Docker image

docker run --rm srcd/hercules hercules --burndown --pb https://github.com/git/git | docker run --rm -i -v $(pwd):/io srcd/hercules labours -f pb -m burndown-project -o /io/git_git.png

Built-in analyses

Project burndown

hercules --burndown
labours -m burndown-project

Line burndown statistics for the whole repository. This is exactly what git-of-theseus does, but much faster. Blaming is performed efficiently and incrementally using a custom RB tree tracking algorithm, and only the last modification date is recorded while running the analysis.
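To make "only the last modification date is recorded" concrete, here is a deliberately naive sketch that keeps one day stamp per line in a plain Python list. It is a stand-in for illustration, not the RB tree algorithm Hercules uses; the class name `LineAges` and its method names are hypothetical.

```python
class LineAges:
    """Naive per-line tracker: stores the day each line was last modified."""

    def __init__(self):
        self.days = []  # days[i] is the last-modified day of line i

    def update(self, day, pos, inserted, deleted):
        """Delete `deleted` lines at `pos`, then insert `inserted` lines stamped `day`."""
        if pos < 0 or pos + deleted > len(self.days):
            raise ValueError("edit is out of range")
        self.days[pos:pos + deleted] = [day] * inserted

    def alive_by_day(self):
        """Count the surviving lines grouped by the day they were last modified."""
        counts = {}
        for day in self.days:
            counts[day] = counts.get(day, 0) + 1
        return counts


f = LineAges()
f.update(day=0, pos=0, inserted=10, deleted=0)  # initial commit: 10 lines
f.update(day=30, pos=2, inserted=3, deleted=4)  # replace 4 lines with 3 new ones
print(f.alive_by_day())  # {0: 6, 30: 3}
```

Each edit here costs time proportional to the file length, which is the cost a balanced tree over line intervals is meant to avoid.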

All burndown analyses depend on the values of granularity and sampling. Granularity is the number of days each band in the stack consists of. Sampling is the interval, in days, at which the burndown state is snapshotted. The smaller the values, the smoother the plot, but the more work is done.
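A small sketch of how the two parameters shape the result, assuming both are measured in days from the project's first commit. The helper names are hypothetical, and the exact boundary handling inside hercules may differ.

```python
import math


def band_index(day, granularity):
    """Band that a line last modified on `day` falls into (day 0 = first commit)."""
    return day // granularity


def burndown_shape(lifetime_days, granularity, sampling):
    """Approximate (snapshots, bands) dimensions of a burndown matrix."""
    bands = math.ceil(lifetime_days / granularity)
    snapshots = math.ceil(lifetime_days / sampling)
    return snapshots, bands


print(burndown_shape(365, 30, 30))  # (13, 13)
print(burndown_shape(365, 30, 10))  # (37, 13): finer sampling, more snapshots
```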

There is an option to resample the bands inside labours, so that you can define a very precise distribution and visualize it in different ways. Besides, resampling aligns the bands across periodic boundaries, e.g. months or years. Unresampled bands are not aligned to such boundaries; they start from the project's birth date.

Files

hercules --burndown --burndown-files
labours -m burndown-file

Burndown statistics for every file in the repository which is alive in the latest revision.

Note: it generates a separate graph for every file, so avoid running it on a repository with many files.

People

hercules --burndown --burndown-people [--people-dict=/path/to/identities]
labours -m burndown-person

Burndown statistics for the repository's contributors. If --people-dict is not specified, the identities are discovered by the following algorithm:

- We start from the root commit and move towards HEAD. Emails and names are converted to lower case.
- If we process an unknown email and name, record them as a new developer.
- If we process a known email but unknown name, match to the developer with the matching email, and add the unknown name to the list of that developer's names.
- If we process an unknown email but known name, match to the developer with the matching name, and add the unknown email to the list of that developer's emails.
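The rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not the Go implementation: the function name is hypothetical, and when both the email and the name are already known but belong to different developers (a case the rules do not cover), this sketch arbitrarily lets the email win.

```python
def discover_identities(signatures):
    """signatures: iterable of (name, email) pairs ordered from root commit to HEAD."""
    developers = []  # each entry: {"names": set, "emails": set}
    by_name, by_email = {}, {}
    for name, email in signatures:
        name, email = name.lower(), email.lower()
        idx = by_email.get(email)  # assumption: a known email takes priority
        if idx is None:
            idx = by_name.get(name)
        if idx is None:  # unknown email and unknown name: new developer
            idx = len(developers)
            developers.append({"names": set(), "emails": set()})
        developers[idx]["names"].add(name)
        developers[idx]["emails"].add(email)
        by_name.setdefault(name, idx)
        by_email.setdefault(email, idx)
    return developers


signatures = [
    ("Alice", "alice@example.com"),
    ("alice", "ALICE@work.example"),       # known name, new email
    ("A. Liddell", "alice@work.example"),  # known email, new name
    ("Bob", "bob@example.com"),
]
developers = discover_identities(signatures)
print(len(developers))  # 2
```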

If --people-dict is specified, it should point to a text file with the custom identities. Each line describes a single developer and lists all of that developer's matching emails and names, separated by |. Case is ignored.
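A minimal sketch of reading such a file, assuming one developer per line with `|` as the separator. Trimming surrounding whitespace and skipping blank lines are assumptions of this sketch; the function name and the sample identities are hypothetical.

```python
def parse_people_dict(text):
    """Return one list of lower-cased aliases (names and emails) per developer."""
    developers = []
    for line in text.splitlines():
        aliases = [part.strip().lower() for part in line.split("|") if part.strip()]
        if aliases:  # assumption: blank lines are ignored
            developers.append(aliases)
    return developers


sample = (
    "Alice Liddell|alice@example.com|alice@work.example\n"
    "Bob|bob@example.com\n"
)
people = parse_people_dict(sample)
print(people)
```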

Overwrites matrix

Wireshark top 20 overwrites matrix

Wireshark top 20 devs - overwrites matrix

hercules --burndown --burndown-people [--people-dict=/path/to/identities]
labours -m overwrites-matrix

Besides the burndown information, --burndown-people collects the added and deleted line statistics per developer. This makes it possible to visualize how many lines written by developer A were removed by developer B, which indicates collaboration between people and helps to identify expertise teams.

The output is a matrix with N rows and (N+2) columns, where N is the number of developers.

- The first column is the number of lines the developer wrote.
- The second column is how many lines were written by the developer and deleted by unidentified developers (if --people-dict is not specified, it is always 0).
- The rest of the columns show how many lines were written by the developer and deleted by identified developers.

The sequence of developers is stored in the people_sequence YAML node.
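The column layout can be unpacked as follows. This is a sketch with made-up numbers: it assumes the trailing N columns follow the same order as people_sequence and treats the counts as plain non-negative integers, while the sign convention of the stored values is not specified here. The function name is hypothetical.

```python
def overwrites_by_author(matrix, people):
    """Split each N+2 column row into written / unidentified / per-developer counts."""
    n = len(people)
    report = {}
    for author, row in zip(people, matrix):
        if len(row) != n + 2:
            raise ValueError("expected N + 2 columns per row")
        written, by_unidentified, by_identified = row[0], row[1], row[2:]
        report[author] = {
            "written": written,
            "deleted_by_unidentified": by_unidentified,
            # assumption: remaining columns are ordered like `people`
            "deleted_by": {p: c for p, c in zip(people, by_identified) if c},
        }
    return report


people_sequence = ["alice", "bob"]
matrix = [
    [100, 0, 10, 25],  # alice wrote 100 lines; bob deleted 25 of them
    [40, 0, 5, 3],     # bob wrote 40 lines; alice deleted 5 of them
]
report = overwrites_by_author(matrix, people_sequence)
print(report["alice"]["deleted_by"])  # {'alice': 10, 'bob': 25}
```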

Code ownership

Ember.js top 20 code ownership

Ember.js top 20 devs - code ownership

hercules --burndown --burndown-people [--people-dict=/path/to/identities]
labours -m ownership

--burndown-people also makes it possible to draw a stacked area plot of code share over time, that is, how many lines are alive at the sampled moments in time for each identified developer.
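One way to read "code share" is to normalize the alive line counts at each sample. The sketch below does that for hypothetical input data; the function name and the dictionary layout are assumptions and do not reflect the actual hercules output format.

```python
def ownership_shares(alive_lines):
    """alive_lines: {developer: [alive line count per sample]} -> fraction per sample."""
    if not alive_lines:
        return {}
    samples = len(next(iter(alive_lines.values())))
    totals = [sum(series[i] for series in alive_lines.values()) for i in range(samples)]
    return {
        dev: [count / total if total else 0.0 for count, total in zip(series, totals)]
        for dev, series in alive_lines.items()
    }


shares = ownership_shares({"alice": [100, 80, 60], "bob": [0, 20, 60]})
print(shares["alice"])  # [1.0, 0.8, 0.5]
```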

Couples

Linux kernel file couples

torvalds/linux files' coupling in Tensorflow Projector

hercules --couples [--people-dict=/path/to/identities]
labours -m couples -o <name> [--couples-tmp-dir=/tmp]

Important: it requires Tensorflow to be installed; please follow the official instructions.

The files are coupled if they are changed in the same commit. The developers are coupled if they change the same file. hercules records the number of couples throughout the whole commit history and outputs the two corresponding co-occurrence matrices. labours then trains Swivel embeddings - dense vectors which reflect the co-occurrence probability through the Euclidean distance. The training requires a working Tensorflow installation. The intermediate files are stored in the system temporary directory or --couples-tmp-dir if it is specified. The trained embeddings are written to the current working directory with the name depending on -o. The output format is TSV and matches Tensorflow Projector so that the files and people can be visualized with t-SNE implemented in TF Projector.
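The file coupling rule (two files are coupled when they change in the same commit) amounts to counting pairs. A minimal sketch with made-up commits follows; it covers only the counting step, not the Swivel training, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def file_cooccurrences(commits):
    """commits: iterable of file-path collections, one per commit."""
    counts = Counter()
    for files in commits:
        # Sort so each unordered pair is always counted under the same key.
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts


commits = [
    ["a.go", "b.go"],
    ["a.go", "b.go", "c.go"],
    ["c.go"],
]
couples = file_cooccurrences(commits)
print(couples[("a.go", "b.go")])  # 2
```

The developer matrix can be built the same way by grouping on files instead of commits, since developers are coupled when they change the same file.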

Structural hotness

      46  jinja2/compiler.py:visit_Template [FunctionDef]
      42  jinja2/compiler.py:visit_For [FunctionDef]
      34  jinja2/compiler.py:visit_Output [FunctionDef]
      29  jinja2/environment.py:compile [FunctionDef]
      27  jinja2/compiler.py:visit_Include [FunctionDef]
      22  jinja2/compiler.py:visit_Macro [FunctionDef]
      22  jinja2/compiler.py:visit_FromImport [FunctionDef]
      21  jinja2/compiler.py:visit_Filter [FunctionDef]
      21  jinja2/runtime.py:__call__ [FunctionDef]
      20  jinja2/compiler.py:visit_Block [FunctionDef]

Thanks to Babelfish, hercules is able to measure how many times each structural

Extension points exported contracts — how you extend this code

PipelineItem (Interface)

PipelineItem is the interface for all the units in the Git commits analysis pipeline. [7 implementers]

internal/core/pipeline.go

FileGetter (FuncType)

FileGetter defines a function which loads the Git file by the specified path. The state can be arbitrary though here it

internal/plumbing/blob_cache.go

FeaturedPipelineItem (Interface)

FeaturedPipelineItem enables switching the automatic insertion of pipeline items on or off. [7 implementers]

internal/core/pipeline.go

LeafPipelineItem (Interface)

LeafPipelineItem corresponds to the top level pipeline items which produce the end results. [13 implementers]

internal/core/pipeline.go

ResultMergeablePipelineItem (Interface)

ResultMergeablePipelineItem specifies the methods to combine several analysis results together. [3 implementers]

internal/core/pipeline.go

HibernateablePipelineItem (Interface)

HibernateablePipelineItem is the interface to allow pipeline items to be frozen (compacted, unloaded) while they are not [3 …

internal/core/pipeline.go

Core symbols most depended-on inside this repo

Equal

called by 1307

internal/rbtree/rbtree.go

internal/core/pipeline.go

Update

called by 131

internal/burndown/file.go

Consume

called by 103

internal/core/pipeline.go

Len

called by 102

internal/plumbing/renames.go

String

called by 69

internal/core/forks.go

Errorf

called by 54

internal/core/logger.go

Shape

Method 1,067

Function 562

Struct 126

Class 12

Interface 7

TypeAlias 7

FuncType 1

Languages

Go 90%

Python 8%

Java 2%

Modules by API surface

internal/pb/pb.pb.go · 396 symbols

python/labours/readers.py · 61 symbols

internal/rbtree/rbtree.go · 61 symbols

internal/core/pipeline_test.go · 59 symbols

internal/core/pipeline.go · 58 symbols

internal/core/registry_test.go · 51 symbols

internal/plumbing/uast/uast.go · 46 symbols

leaves/burndown.go · 42 symbols

internal/plumbing/identity/identity_test.go · 41 symbols

internal/rbtree/rbtree_test.go · 40 symbols

internal/burndown/file_test.go · 39 symbols

leaves/burndown_test.go · 29 symbols

Dependencies from manifests, versioned

github.com/BurntSushi/toml v0.3.1 · 1×

github.com/Jeffail/tunny v0.0.0-2018030420461 · 1×

github.com/Masterminds/semver v0.0.0-2018080714243 · 1×

github.com/Masterminds/sprig v0.0.0-2018072521215 · 1×

github.com/antchfx/xpath v0.0.0-2018092204182 · 1×

github.com/aokoli/goutils v1.0.1 · 1×

github.com/fatih/camelcase v1.0.0 · 1×

github.com/fatih/color v1.7.0 · 1×

github.com/gogo/protobuf v1.3.0 · 1×

github.com/google/uuid v0.0.0-2018082818155 · 1×

github.com/grpc-ecosystem/grpc-opentracing v0.0.0-2018050721335 · 1×

github.com/huandu/xstrings v0.0.0-2018090615175 · 1×

For agents

$ claude mcp add hercules \
  -- python -m otcore.mcp_server <graph>
