github.com/meta-pytorch/captum @v0.9.0 sqlite

2,535 symbols · 12,563 edges · 237 files · 649 documented (26%)

README

Captum is a model interpretability and understanding library for PyTorch. Captum means comprehension in Latin and contains general purpose implementations of integrated gradients, saliency maps, smoothgrad, vargrad and others for PyTorch models. It has quick integration for models built with domain-specific libraries such as torchvision, torchtext, and others.

About Captum

With the increase in model complexity and the resulting lack of transparency, model interpretability methods have become increasingly important. Model understanding is both an active area of research and an area of focus for practical applications across industries using machine learning. Captum provides state-of-the-art algorithms such as Integrated Gradients, Testing with Concept Activation Vectors (TCAV) and TracIn influence functions, to name a few, that give researchers and developers an easy way to understand which features, training examples or concepts contribute to a model's predictions and, more generally, what and how the model learns. In addition, Captum provides adversarial attacks and minimal input perturbation capabilities that can be used both for generating counterfactual explanations and adversarial perturbations.

Captum helps ML researchers more easily implement interpretability algorithms that can interact with PyTorch models. Captum also allows researchers to quickly benchmark their work against other existing algorithms available in the library.

Overview of Attribution Algorithms

Target Audience

The primary audiences for Captum are model developers who want to improve their models and understand which concepts, features or training examples are important, and interpretability researchers focused on identifying algorithms that can better interpret many types of models.

Captum can also be used by application engineers who are using trained models in production. Captum provides easier troubleshooting through improved model interpretability, and the potential for delivering better explanations to end users on why they’re seeing a specific piece of content, such as a movie recommendation.

Installation

Installation Requirements

- Python >= 3.10
- PyTorch >= 2.3

Installing the latest release

Install released Captum via pip.

With pip

pip install captum

Manual / Dev install

If you'd like to try our bleeding edge features (and don't mind potentially running into the occasional bug here or there), you can install the latest master directly from GitHub. For a basic install, run:

git clone https://github.com/pytorch/captum.git
cd captum
pip install -e .

To customize the installation, you can also run the following variants of the above:

* pip install -e .[dev]: Also installs all tools necessary for development (testing, linting, docs building; see Contributing below).
* pip install -e .[tutorials]: Also installs all packages necessary for running the tutorial notebooks.

To execute unit tests from a manual install, run:

# running a single unit test
python -m unittest -v tests.attr.test_saliency
# running all unit tests
pytest -ra

Getting Started

Captum helps you interpret and understand predictions of PyTorch models by exploring features that contribute to a prediction the model makes. It also helps understand which neurons and layers are important for model predictions.

Let's apply some of those algorithms to a toy model we have created for demonstration purposes. For simplicity, we will use the following architecture, but users are welcome to use any PyTorch model of their choice.

import numpy as np

import torch
import torch.nn as nn

from captum.attr import (
    GradientShap,
    DeepLift,
    DeepLiftShap,
    IntegratedGradients,
    LayerConductance,
    NeuronConductance,
    NoiseTunnel,
)

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(3, 3)
        self.relu = nn.ReLU()
        self.lin2 = nn.Linear(3, 2)

        # initialize weights and biases
        self.lin1.weight = nn.Parameter(torch.arange(-4.0, 5.0).view(3, 3))
        self.lin1.bias = nn.Parameter(torch.zeros(1,3))
        self.lin2.weight = nn.Parameter(torch.arange(-3.0, 3.0).view(2, 3))
        self.lin2.bias = nn.Parameter(torch.ones(1,2))

    def forward(self, input):
        return self.lin2(self.relu(self.lin1(input)))

Let's create an instance of our model and set it to eval mode.

model = ToyModel()
model.eval()

Next, we need to define simple input and baseline tensors. Baselines belong to the input space and often carry no predictive signal. A zero tensor can serve as a baseline for many tasks. Some interpretability algorithms, such as IntegratedGradients, DeepLift and GradientShap, are designed to attribute the change between the input and the baseline to a predicted class or a value that the neural network outputs.

We will apply model interpretability algorithms on the network mentioned above in order to understand the importance of individual neurons/layers and the parts of the input that play an important role in the final prediction.

To make computations deterministic, let's fix random seeds.

torch.manual_seed(123)
np.random.seed(123)

Let's define our input and baseline tensors. Baselines are used in some interpretability algorithms such as IntegratedGradients, DeepLift, GradientShap, NeuronConductance, LayerConductance, InternalInfluence and NeuronIntegratedGradients.

input = torch.rand(2, 3)
baseline = torch.zeros(2, 3)

Next, we will use the IntegratedGradients algorithm to assign an attribution score to each input feature with respect to the first target output.

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(input, baseline, target=0, return_convergence_delta=True)
print('IG Attributions:', attributions)
print('Convergence Delta:', delta)

Output:

IG Attributions: tensor([[-0.5922, -1.5497, -1.0067],
                         [ 0.0000, -0.2219, -5.1991]])
Convergence Delta: tensor([2.3842e-07, -4.7684e-07])

The algorithm outputs an attribution score for each input element and a convergence delta. The lower the absolute value of the convergence delta, the better the approximation. If we do not need the delta, we can simply omit the return_convergence_delta argument. The absolute value of each returned delta can be interpreted as the approximation error for the corresponding input sample, and it serves as a proxy for how accurate the integral approximation is for the given inputs and baselines. If the approximation error is large, we can try a larger number of integral approximation steps by setting n_steps to a larger value. Not all algorithms return an approximation error. Those that do compute it based on the completeness property of the algorithm.
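To make the completeness check concrete, here is a minimal pure-Python sketch. It is an illustration, not Captum's implementation: it re-implements the target-0 output of the toy model by hand, approximates integrated gradients with a midpoint Riemann sum, and reports the delta as the sum of attributions minus the change in model output. The helper names (`forward`, `gradient`, `integrated_gradients`), the finite-difference gradient, and the example input and baseline are assumptions, chosen so that the path from baseline to input crosses a ReLU kink. Captum's own integration method and defaults may differ.

```python
def forward(x):
    """Target-0 output of a hand-written copy of the toy model (assumed weights)."""
    w1 = [[-4.0, -3.0, -2.0], [-1.0, 0.0, 1.0], [2.0, 3.0, 4.0]]
    w2_row0 = [-3.0, -2.0, -1.0]  # weights feeding output index 0
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(w * h for w, h in zip(w2_row0, hidden)) + 1.0


def gradient(x, eps=1e-6):
    """Central finite-difference gradient of forward at x."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((forward(up) - forward(down)) / (2 * eps))
    return grads


def integrated_gradients(x, baseline, n_steps):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    mean_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            mean_grad[i] += g / n_steps
    attributions = [(xi - b) * g for xi, b, g in zip(x, baseline, mean_grad)]
    # Completeness: attributions should sum to forward(x) - forward(baseline).
    delta = sum(attributions) - (forward(x) - forward(baseline))
    return attributions, delta


x = [0.2, 0.5, 0.9]         # illustrative input
baseline = [0.9, 0.0, 0.0]  # illustrative non-zero baseline
_, delta_coarse = integrated_gradients(x, baseline, n_steps=2)
attributions, delta_fine = integrated_gradients(x, baseline, n_steps=50)
print("attributions:", attributions)
print("delta with 2 steps:", delta_coarse)
print("delta with 50 steps:", delta_fine)
```

With only two steps the sum misjudges where the ReLU switches on, so the delta is larger in magnitude than with 50 steps. This is the effect that raising n_steps is meant to reduce.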

A positive attribution score means that the input at that particular position contributed positively to the final prediction, and a negative score means the opposite. The magnitude of the attribution score signifies the strength of the contribution. A zero attribution score means that the feature made no contribution.

Similarly, we can apply GradientShap, DeepLift and other attribution algorithms to the model.

GradientShap first chooses a random baseline from the baselines' distribution, then adds Gaussian noise with std=0.09 to each input example n_samples times. Afterwards, it chooses a random point between each example-baseline pair and computes the gradients with respect to the target class (in this case target=0). The resulting attribution is the mean of gradients * (inputs - baselines).
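The procedure described above can be sketched in pure Python. This is a simplified illustration for a single input example, not Captum's GradientShap: the function names, the hand-written gradient of the toy model's target-0 output, the seed and the input values are all assumptions, and the sketch multiplies each gradient by the noisy input minus the sampled baseline.

```python
import random

W1 = [[-4.0, -3.0, -2.0], [-1.0, 0.0, 1.0], [2.0, 3.0, 4.0]]
W2_ROW0 = [-3.0, -2.0, -1.0]  # weights feeding output index 0 (target=0)


def gradient(x):
    """Analytic gradient of the toy model's target-0 output with respect to x."""
    grads = [0.0] * len(x)
    for row, w_out in zip(W1, W2_ROW0):
        pre_activation = sum(w * xi for w, xi in zip(row, x))
        if pre_activation > 0:  # ReLU passes gradient only when it is active
            for i, w in enumerate(row):
                grads[i] += w_out * w
    return grads


def gradient_shap(x, baselines, n_samples, stdev, rng):
    """Mean of gradient * (noisy input - baseline) over n_samples random draws."""
    total = [0.0] * len(x)
    for _ in range(n_samples):
        baseline = rng.choice(baselines)                      # random baseline
        noisy = [xi + rng.gauss(0.0, stdev) for xi in x]      # Gaussian noise
        alpha = rng.random()                                  # random point on the path
        point = [b + alpha * (n - b) for n, b in zip(noisy, baseline)]
        for i, g in enumerate(gradient(point)):
            total[i] += g * (noisy[i] - baseline[i]) / n_samples
    return total


rng = random.Random(123)
baselines = [[rng.gauss(0.0, 0.001) for _ in range(3)] for _ in range(10)]
x = [0.2961, 0.5166, 0.2517]  # illustrative input values
attributions = gradient_shap(x, baselines, n_samples=4, stdev=0.09, rng=rng)
print("GradientShap sketch:", attributions)
```

Because every draw is random, the result changes with the seed and becomes more stable as n_samples grows.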

gs = GradientShap(model)

# We define a distribution of baselines and draw `n_samples` from that
# distribution in order to estimate the expectations of gradients across all baselines
baseline_dist = torch.randn(10, 3) * 0.001
attributions, delta = gs.attribute(input, stdevs=0.09, n_samples=4, baselines=baseline_dist,
                                   target=0, return_convergence_delta=True)
print('GradientShap Attributions:', attributions)
print('Convergence Delta:', delta)

Output

GradientShap Attributions: tensor([[-0.1542, -1.6229, -1.5835],
                                   [-0.3916, -0.2836, -4.6851]])
Convergence Delta: tensor([ 0.0000, -0.0005, -0.0029, -0.0084, -0.0087, -0.0405,  0.0000, -0.0084])

Deltas are computed for each of the n_samples * input.shape[0] noisy examples (here 4 * 2 = 8 values). The user can, for instance, average them:

deltas_per_example = torch.mean(delta.reshape(input.shape[0], -1), dim=1)

in order to get per example average delta.
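As a worked check of that reshape-and-average step, the sketch below redoes the arithmetic in plain Python on the eight delta values printed in the GradientShap output above. It assumes that consecutive delta values belong to the same input example, which is the grouping the reshape call implies. The variable names are illustrative.

```python
# Delta values copied from the GradientShap output above:
# n_samples=4 draws for each of the 2 input examples.
delta = [0.0000, -0.0005, -0.0029, -0.0084, -0.0087, -0.0405, 0.0000, -0.0084]
n_examples = 2
per_example = len(delta) // n_examples

# Same grouping as delta.reshape(n_examples, -1): consecutive values share a row.
rows = [delta[i * per_example:(i + 1) * per_example] for i in range(n_examples)]
deltas_per_example = [sum(row) / len(row) for row in rows]
print(deltas_per_example)  # roughly [-0.00295, -0.0144]
```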

Below is an example of how we can apply DeepLift and DeepLiftShap on the ToyModel described above. The current implementation of DeepLift supports only the Rescale rule. For more details on alternative implementations, please see the DeepLift paper.

dl = DeepLift(model)
attributions, delta = dl.attribute(input, baseline, target=0, return_convergence_delta=True)
print('DeepLift Attributions:', attributions)
print('Convergence Delta:', delta)

Output

DeepLift Attributions: tensor([[-0.5922, -1.5497, -1.0067],
                               [ 0.0000, -0.2219, -5.1991]])
Convergence Delta: tensor([0., 0.])

DeepLift assigns attribution scores similar to those of IntegratedGradients, but it has a lower execution time. Another important thing to remember about DeepLift is that it currently doesn't support all non-linear activation types. For more details on the limitations of the current implementation, please see the DeepLift paper.

Similar to integrated gradients, DeepLift returns a convergence delta score per input example. The approximation error is then the absolute value of the convergence deltas and can serve as a proxy of how accurate the algorithm's approximation is.
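For intuition about why the delta can be exactly zero here, the following pure-Python sketch applies a Rescale-style rule to a hand-written copy of the toy model. It is a simplified illustration under assumptions, not Captum's hook-based DeepLift: each ReLU gets the multiplier (change in output) / (change in input) between input and baseline, and the function names and example values are hypothetical.

```python
W1 = [[-4.0, -3.0, -2.0], [-1.0, 0.0, 1.0], [2.0, 3.0, 4.0]]
W2_ROW0 = [-3.0, -2.0, -1.0]  # weights feeding output index 0 (target=0)
BIAS_OUT0 = 1.0


def relu(v):
    return max(0.0, v)


def pre_activations(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W1]


def forward(x):
    return sum(w * relu(h) for w, h in zip(W2_ROW0, pre_activations(x))) + BIAS_OUT0


def deeplift_rescale(x, baseline):
    """Rescale-style attributions of the target-0 output for one input-baseline pair."""
    h_x, h_b = pre_activations(x), pre_activations(baseline)
    attributions = [0.0] * len(x)
    for row, w_out, hx, hb in zip(W1, W2_ROW0, h_x, h_b):
        if hx != hb:
            # ReLU multiplier: change in output divided by change in input.
            multiplier = (relu(hx) - relu(hb)) / (hx - hb)
        else:
            # No change in input: fall back to the plain ReLU gradient.
            multiplier = 1.0 if hx > 0 else 0.0
        for i, w in enumerate(row):
            attributions[i] += w_out * multiplier * w * (x[i] - baseline[i])
    delta = sum(attributions) - (forward(x) - forward(baseline))
    return attributions, delta


attributions, delta = deeplift_rescale([0.2, 0.5, 0.9], [0.9, 0.0, 0.0])
print("attributions:", attributions)
print("delta:", delta)
```

Because each multiplier reproduces the exact change across its ReLU, the attributions sum to the change in output up to floating-point rounding, with no integration steps needed.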

Now let's look into DeepLiftShap. Similar to GradientShap, DeepLiftShap uses a baseline distribution. In the example below, we use the same baseline distribution as for GradientShap.

dl = DeepLiftShap(model)
attributions, delta = dl.attribute(input, baseline_dist, target=0, return_convergence_delta=True)
print('DeepLiftShap Attributions:', attributions)
print('Convergence Delta:', delta)

Output

DeepLiftShap Attributions: tensor([[-5.9169e-01, -1.5491e+00, -1.0076e+00],
                                   [-4.7101e-03, -2.2300e-01, -5.1926e+00]], grad_fn=<MeanBackward1>)
Convergence Delta: tensor([-4.6120e-03, -1.6267e-03, -5.1045e-04, -1.4184e-03, -6.8886e-03,
                           -2.2224e-02,  0.0000e+00, -2.8790e-02, -4.1285e-03, -2.7295e-02,
                           -3.2349e-03, -1.6265e-03, -4.7684e-07, -1.4191e-03, -6.8889e-03,
                           -2.2224e-02,  0.0000e+00, -2.4792e-02, -4.1289e-03, -2.7296e-02])

DeepLiftShap uses DeepLift to compute an attribution score for each input-baseline pair and, for each input, averages the scores across all baselines.

It computes a delta for each input example-baseline pair, resulting in input.shape[0] * baseline_dist.shape[0] delta values (here 2 * 10 = 20).
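The pairing and averaging can be sketched in pure Python as follows. This is an illustration under assumptions rather than Captum's DeepLiftShap: `deeplift_pair` is a hypothetical, simplified Rescale-style attribution for a hand-written copy of the toy model, and the inputs, seed and baseline distribution are illustrative values.

```python
import random

W1 = [[-4.0, -3.0, -2.0], [-1.0, 0.0, 1.0], [2.0, 3.0, 4.0]]
W2_ROW0 = [-3.0, -2.0, -1.0]  # weights feeding output index 0 (target=0)


def deeplift_pair(x, baseline):
    """Rescale-style attribution of the target-0 output for one input-baseline pair."""
    attributions = [0.0] * len(x)
    for row, w_out in zip(W1, W2_ROW0):
        hx = sum(w * v for w, v in zip(row, x))
        hb = sum(w * v for w, v in zip(row, baseline))
        if hx != hb:
            multiplier = (max(0.0, hx) - max(0.0, hb)) / (hx - hb)
        else:
            multiplier = 1.0 if hx > 0 else 0.0
        for i, w in enumerate(row):
            attributions[i] += w_out * multiplier * w * (x[i] - baseline[i])
    return attributions


def deeplift_shap(inputs, baselines):
    """Average the pairwise attributions of each input over all baselines."""
    per_pair = []  # one entry per input-baseline pair
    averaged = []  # one entry per input
    for x in inputs:
        pair_attrs = [deeplift_pair(x, b) for b in baselines]
        per_pair.extend(pair_attrs)
        averaged.append([sum(col) / len(baselines) for col in zip(*pair_attrs)])
    return averaged, per_pair


rng = random.Random(0)
inputs = [[0.2961, 0.5166, 0.2517], [0.6886, 0.0740, 0.8665]]  # illustrative
baselines = [[rng.gauss(0.0, 0.001) for _ in range(3)] for _ in range(10)]
averaged, per_pair = deeplift_shap(inputs, baselines)
print(len(averaged), len(per_pair))  # prints: 2 20
```

The second number is the count of input-baseline pairs, which is why one delta is reported per pair rather than per input.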

As with GradientShap, to compute example-based deltas, we can average them per example:

deltas_per_example = torch.mean(delta.reshape(input.shape[0], -1), dim=1)

In order to smooth and improve the quality of the attributions, we can run IntegratedGradients and other attribution methods through a NoiseTunnel. NoiseTunnel allows us to use the SmoothGrad, SmoothGrad_Sq and VarGrad techniques to smooth the attributions by aggregating them over multiple noisy samples generated by adding Gaussian noise.
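The aggregation step can be sketched in pure Python. This is an illustration under assumptions, not Captum's NoiseTunnel: `attribute` is a hypothetical stand-in attribution method (gradient * input for f(x) = sum of x_i squared), and the three modes are assumed to return the mean, the mean of squares and the variance of the per-sample attributions.

```python
import random


def attribute(x):
    """Stand-in attribution method: gradient * input for f(x) = sum(x_i ** 2)."""
    return [2.0 * xi * xi for xi in x]


def noise_tunnel(x, nt_type, stdev, nt_samples, rng):
    """Aggregate attributions over nt_samples noisy copies of x."""
    samples = []
    for _ in range(nt_samples):
        noisy = [xi + rng.gauss(0.0, stdev) for xi in x]
        samples.append(attribute(noisy))
    columns = list(zip(*samples))  # group the samples feature by feature
    means = [sum(col) / nt_samples for col in columns]
    if nt_type == "smoothgrad":
        return means
    mean_squares = [sum(a * a for a in col) / nt_samples for col in columns]
    if nt_type == "smoothgrad_sq":
        return mean_squares
    if nt_type == "vargrad":
        return [ms - m * m for ms, m in zip(mean_squares, means)]
    raise ValueError(f"unknown nt_type: {nt_type}")


x = [0.3, 0.5, 0.25]  # illustrative input
for mode in ("smoothgrad", "smoothgrad_sq", "vargrad"):
    print(mode, noise_tunnel(x, mode, stdev=0.02, nt_samples=4, rng=random.Random(123)))
```

With stdev set to 0 every noisy copy equals the input, so the smoothgrad result reduces to the plain attribution and the vargrad result to zero.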

Here is an example of how we can use NoiseTunnel with IntegratedGradients.

ig = IntegratedGradients(model)
nt = NoiseTunnel(ig)
attributions, delta = nt.attribute(input, nt_type='smoothgrad', stdevs=0.02, nt_samples=4,
      baselines=baseline, target=0, return_convergence_delta=True)
print('IG + SmoothGrad Attributions:', attributions)
print('Convergence Delta:', delta)

Output

IG + SmoothGrad Attributions: tensor([[-0.4574, -1.5493, -1.0893],
                                      [ 0.0000, -0.2647, -5.1619]])
Convergence Delta: tensor([ 0.0000e+00, 2.3842e-07, 0.0000e+

Core symbols most depended-on inside this repo

assertTensorAlmostEqual · called by 463 · captum/testing/helpers/basic.py

append · called by 165 · captum/_utils/models/linear_model/train.py

_format_tensor_into_tuples · called by 50 · captum/_utils/common.py

assertTensorTuplesAlmostEqual · called by 48 · captum/testing/helpers/basic.py

attribute · called by 47 · captum/attr/_core/lrp.py

_format_output · called by 37 · captum/_utils/common.py

captum/influence/_core/tracincp.py

Shape

Method 1,820

Class 346

Function 346

Route 23

Languages

Python 99%

TypeScript 1%

Modules by API surface

captum/testing/helpers/basic_models.py · 117 symbols

tests/attr/test_feature_ablation.py · 80 symbols

tests/concept/test_tcav.py · 60 symbols

tests/attr/test_llm_attr.py · 60 symbols

captum/testing/helpers/influence/common.py · 47 symbols

tests/attr/test_shapley.py · 45 symbols

captum/attr/_utils/stat.py · 45 symbols

captum/_utils/common.py · 45 symbols

tests/attr/test_integrated_gradients_basic.py · 43 symbols

tests/attr/test_lrp.py · 41 symbols

captum/attr/_core/llm_attr.py · 41 symbols

captum/influence/_utils/common.py · 40 symbols

Dependencies from manifests, versioned

docusaurus 1.14.7 · 1×

matplotlib · 1×

numpy · 1×

packaging · 1×

torch 2.3 · 1×

tqdm · 1×

For agents

$ claude mcp add captum \
  -- python -m otcore.mcp_server <graph>
