
This package provides an implementation of the inference pipeline of AlphaFold v2. For simplicity, we refer to this model as AlphaFold throughout the rest of this document.
We also provide:
Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper and, if applicable, the AlphaFold-Multimer paper.
Please also refer to the Supplementary Information for a detailed description of the method.
You can use a slightly simplified version of AlphaFold with this Colab notebook or community-supported versions (see below).
If you have any questions, please contact the AlphaFold team at alphafold@deepmind.com.

You will need a machine running Linux; AlphaFold does not support other operating systems. Full installation requires up to 3 TB of disk space to hold the genetic databases (SSD storage is recommended) and a modern NVIDIA GPU (GPUs with more memory can predict larger protein structures).
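The requirements above can be checked before starting the setup. The following is a minimal sketch, not part of AlphaFold: the helper name `preflight_problems` is hypothetical, and it only verifies the operating system and whether the `docker` and `aria2c` executables used in the steps below are on the `PATH`. It does not check the GPU.

```python
import platform
import shutil


def preflight_problems(system_name, tool_paths):
    """Return a list of problems found; an empty list means nothing was flagged.

    system_name: a value such as platform.system() returns ("Linux", "Darwin", ...).
    tool_paths: mapping of tool name -> resolved path, or None if not found.
    """
    problems = []
    if system_name != "Linux":
        problems.append(f"unsupported operating system: {system_name}")
    for tool, path in tool_paths.items():
        if path is None:
            problems.append(f"{tool} not found on PATH")
    return problems


# Inspect the current host; docker and aria2c are the tools the setup relies on.
found = {tool: shutil.which(tool) for tool in ("docker", "aria2c")}
for problem in preflight_problems(platform.system(), found):
    print("WARNING:", problem)
```

Keeping the check as a pure function of its inputs makes it easy to exercise without touching the host.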
Please follow these steps:
Install Docker.
Clone this repository and cd into it.
bash
git clone https://github.com/deepmind/alphafold.git
cd ./alphafold
Download genetic databases and model parameters:
Install aria2c. On most Linux distributions it is available via the
package manager as the aria2 package (on Debian-based distributions this
can be installed by running sudo apt install aria2).
Please use the script scripts/download_all_data.sh to download
and set up full databases. This may take substantial time (download size is
556 GB), so we recommend running this script in the background:
bash
scripts/download_all_data.sh <DOWNLOAD_DIR> > download.log 2> download_all.log &
Note: The download directory <DOWNLOAD_DIR> should not be a
subdirectory in the AlphaFold repository directory. If it is, the Docker
build will be slow as the large databases will be copied into the docker
build context.
It is possible to run AlphaFold with reduced databases; please refer to the complete documentation.
Check that AlphaFold will be able to use a GPU by running:
bash
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
The output of this command should show a list of your GPUs. If it doesn't, check if you followed all steps correctly when setting up the NVIDIA Container Toolkit or take a look at the following NVIDIA Docker issue.
If you wish to run AlphaFold using Singularity (a common containerization platform on HPC systems), we recommend using one of the third-party Singularity setups linked in https://github.com/deepmind/alphafold/issues/10 or https://github.com/deepmind/alphafold/issues/24.
Build the Docker image:
bash
docker build -f docker/Dockerfile -t alphafold .
If you encounter the following error:
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
use the workaround described in https://github.com/deepmind/alphafold/issues/463#issuecomment-1124881779.
Install the run_docker.py dependencies. Note: You may optionally wish to
create a Python Virtual Environment to prevent conflicts with your system's
Python environment.
bash
pip3 install -r docker/requirements.txt
Make sure that the output directory exists (the default is /tmp/alphafold)
and that you have sufficient permissions to write into it.
Run run_docker.py pointing to a FASTA file containing the protein
sequence(s) for which you wish to predict the structure (--fasta_paths
parameter). AlphaFold will search for the available templates before the
date specified by the --max_template_date parameter; this could be used to
avoid certain templates during modeling. --data_dir is the directory with
downloaded genetic databases and --output_dir is the absolute path to the
output directory.
bash
python3 docker/run_docker.py \
--fasta_paths=your_protein.fasta \
--max_template_date=2022-01-01 \
--data_dir=$DOWNLOAD_DIR \
--output_dir=/home/user/absolute_path_to_the_output_dir
Once the run is over, the output directory will contain the predicted structures of the target protein. Please check the documentation below for additional options and troubleshooting tips.
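If you launch predictions from another script, it can help to assemble the run_docker.py arguments in one place and validate them first. The sketch below is an illustration only: `build_run_command` is a hypothetical helper, `/data/af_db` is a placeholder path, and it uses just the four flags described above. It builds the command without executing it.

```python
import datetime
import os
import shlex


def build_run_command(fasta_path, max_template_date, data_dir, output_dir):
    """Assemble the argument list for docker/run_docker.py without running it."""
    # Fail early on a malformed date; the example above uses YYYY-MM-DD.
    datetime.date.fromisoformat(max_template_date)
    # The output directory is documented as an absolute path.
    if not os.path.isabs(output_dir):
        raise ValueError("output_dir must be an absolute path")
    return [
        "python3",
        "docker/run_docker.py",
        f"--fasta_paths={fasta_path}",
        f"--max_template_date={max_template_date}",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
    ]


# Placeholder values; substitute your own FASTA file and directories.
cmd = build_run_command(
    "your_protein.fasta", "2022-01-01", "/data/af_db", "/tmp/alphafold"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the Docker image has been built.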
This step requires aria2c to be installed on your machine.
AlphaFold needs multiple genetic (sequence) databases to run:
We provide a script scripts/download_all_data.sh that can be used to download
and set up all of these databases:
Recommended default:
bash
scripts/download_all_data.sh <DOWNLOAD_DIR>
will download the full databases.
With reduced_dbs parameter:
bash
scripts/download_all_data.sh <DOWNLOAD_DIR> reduced_dbs
will download a reduced version of the databases to be used with the
reduced_dbs database preset. Use it together with the corresponding
AlphaFold parameter --db_preset=reduced_dbs later during the AlphaFold run
(see the AlphaFold parameters section).
:ledger: Note: The download directory <DOWNLOAD_DIR> should not be a
subdirectory in the AlphaFold repository directory. If it is, the Docker build
will be slow as the large databases will be copied during the image creation.
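A quick way to catch this mistake before building the image is to compare the two paths. This is a minimal sketch with a hypothetical helper name `is_inside`, and the example paths are placeholders. It resolves both paths first, so symlinks and relative paths are compared by their real locations.

```python
from pathlib import Path


def is_inside(download_dir, repo_dir):
    """Return True if download_dir is repo_dir itself or lies beneath it."""
    download = Path(download_dir).resolve()
    repo = Path(repo_dir).resolve()
    return download == repo or repo in download.parents


# Placeholder paths for illustration.
if is_inside("/home/user/alphafold/databases", "/home/user/alphafold"):
    print("Choose a download directory outside the repository.")
```

Comparing path components rather than string prefixes avoids false positives such as `/home/user/alphafold_data` being treated as a child of `/home/user/alphafold`.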
We don't provide exactly the database versions used in CASP14; see the note on reproducibility. Some of the databases are mirrored for speed; see mirrored databases.
:ledger: Note: The total download size for the full databases is around 556 GB and the total size when unzipped is 2.62 TB. Please make sure you have enough disk space, bandwidth and time for the download. We recommend using an SSD for better genetic search performance.
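The sizes quoted in this note can be turned into a simple free-space check. This is a sketch under stated assumptions: the helper names are hypothetical, decimal units are assumed (1 TB = 10^12 bytes), and only the unpacked size is compared, without accounting for any temporary space archives may need while they are extracted.

```python
import shutil

GB = 10**9
TB = 10**12

# Unpacked size of the full databases as quoted in the note (2.62 TB).
UNPACKED_BYTES = 2620 * GB


def free_space_shortfall(free_bytes, required_bytes=UNPACKED_BYTES):
    """Return how many bytes are missing, or 0 if there is enough room."""
    return max(0, required_bytes - free_bytes)


def check_download_dir(path):
    """Measure free space on the filesystem holding path and report the shortfall."""
    return free_space_shortfall(shutil.disk_usage(path).free)


missing = check_download_dir(".")
if missing:
    print(f"About {missing / GB:.0f} GB more free space is needed here.")
```

Pass the intended download directory (or its parent, if it does not exist yet) instead of `"."`.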
:ledger: Note: If the download directory and datasets don't have full read and
write permissions, the MSA tools can fail with opaque (external) error
messages. Please ensure the required permissions are applied,
e.g. with the sudo chmod 755 --recursive "$DOWNLOAD_DIR" command.
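To find out which entries are affected before running a search, the directory tree can be scanned. The sketch below uses a hypothetical helper, `unreadable_paths`, and checks only read access (plus traversal for directories) for the user running the script. That user may differ from the one the tools run as inside the container, so treat the result as a hint rather than a guarantee.

```python
import os
import tempfile


def unreadable_paths(root):
    """Return sorted paths under root that the current user cannot read."""
    bad = []
    if not os.access(root, os.R_OK | os.X_OK):
        bad.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Directories need both read and execute (traverse) permission.
            if not os.access(path, os.R_OK | os.X_OK):
                bad.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                bad.append(path)
    return sorted(bad)


# Demo on a throwaway tree; point the function at your download directory instead.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "uniref90"))
    with open(os.path.join(tmp, "uniref90", "uniref90.fasta"), "w") as handle:
        handle.write(">example\nMKV\n")
    demo_result = unreadable_paths(tmp)
print(demo_result)
```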
The download_all_data.sh script will also download the model parameter files.
Once the script has finished, you should have the following directory structure:
$DOWNLOAD_DIR/                             # Total: ~ 2.62 TB (download: 556 GB)
    bfd/                                   # ~ 1.8 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                                # ~ 120 GB (download: 67 GB)
        mgy_clusters_2022_05.fa
    params/                                # ~ 5.3 GB (download: 5.3 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # 5 AlphaFold-Multimer models,
        # LICENSE,
        # = 16 files.
    pdb70/                                 # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                             # ~ 238 GB (download: 43 GB)
        mmcif_files/
            # About 199,000 .cif files.
        obsolete.dat
    pdb_seqres/                            # ~ 0.2 GB (download: 0.2 GB)
        pdb_seqres.txt
    small_bfd/                             # ~ 17 GB (download: 9.6 GB)
        bfd-first_non_consensus_sequences.fasta
    uniref30/                              # ~ 206 GB (download: 52.5 GB)
        # 7 files.
    uniprot/                               # ~ 105 GB (download: 53 GB)
        uniprot.fasta
    uniref90/                              # ~ 67 GB (download: 34 GB)
        uniref90.fasta
bfd/ is only downloaded if you download the full databases, and small_bfd/
is only downloaded if you download the reduced databases.
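After the download finishes, the layout above can be verified with a short script. This is a sketch, not a tool shipped with AlphaFold: `missing_dirs` is a hypothetical helper, and it assumes, based on the listing, that every directory other than bfd/ and small_bfd/ is present for both the full and the reduced download. It checks only that the top-level directories exist, not their contents.

```python
import os
import tempfile

# Top-level directories taken from the listing above, common to both download modes.
COMMON_DIRS = [
    "mgnify", "params", "pdb70", "pdb_mmcif",
    "pdb_seqres", "uniref30", "uniprot", "uniref90",
]


def missing_dirs(download_dir, reduced_dbs=False):
    """Return the expected top-level directories that are absent."""
    expected = COMMON_DIRS + (["small_bfd"] if reduced_dbs else ["bfd"])
    return sorted(
        name for name in expected
        if not os.path.isdir(os.path.join(download_dir, name))
    )


# Demo: a throwaway tree that mimics a reduced download.
with tempfile.TemporaryDirectory() as tmp:
    for name in COMMON_DIRS + ["small_bfd"]:
        os.mkdir(os.path.join(tmp, name))
    demo_reduced = missing_dirs(tmp, reduced_dbs=True)
    demo_full = missing_dirs(tmp, reduced_dbs=False)
print(demo_reduced, demo_full)
```

In the demo, the reduced check reports nothing missing, while the full check reports that bfd is absent.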
While the AlphaFold code is licensed under the Apache 2.0 License, the AlphaFold parameters and CASP15 prediction data are made available under the terms of the CC BY 4.0 license. Please see the Disclaimer below for more detail.
The AlphaFold parameters are available from
https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar, and
are downloaded as part of the scripts/download_all_data.sh script. This script
will download parameters for:
If you have a previous version, you can either reinstall from scratch (remove everything and rerun the setup) or do an incremental update, which is significantly faster but requires a bit more work. Follow these steps in exactly the order listed below:
In the directory with the cloned AlphaFold repository, run git fetch origin main to get all code updates.
Remove <DOWNLOAD_DIR>/uniprot.
Run scripts/download_uniprot.sh <DOWNLOAD_DIR>.
Remove <DOWNLOAD_DIR>/uniclust30.
Run scripts/download_uniref30.sh <DOWNLOAD_DIR>.
Remove <DOWNLOAD_DIR>/uniref90.
Run scripts/download_uniref90.sh <DOWNLOAD_DIR>.
Remove <DOWNLOAD_DIR>/mgnify.
Run scripts/download_mgnify.sh <DOWNLOAD_DIR>.
Remove <DOWNLOAD_DIR>/pdb_mmcif. PDB SeqRes and PDB must be from exactly the same date; skipping this step can cause errors when searching for templates with AlphaFold-Multimer.
Run scripts/download_pdb_mmcif.sh <DOWNLOAD_DIR>.
Run scripts/download_pdb_seqres.sh <DOWNLOAD_DIR>.
Remove the old model parameters in <DOWNLOAD_DIR>/params.
Download the new model parameters with scripts/download_alphafold_params.sh <DOWNLOAD_DIR>.

To use the deprecated v2.2.0 AlphaFold-Multimer model weights:
Change SOURCE_URL in scripts/download_alphafold_params.sh to
https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar,
and download the old parameters.
Change _v3 to _v2 in the multimer MODEL_PRESETS in config.py.

To use the deprecated v2.1.0 AlphaFold-Multimer model weights:
Change SOURCE_URL in scripts/download_alphafold_params.sh to
https://storage.googleapis.com/alphafold/alphafold_params_2022-01-19.tar,
and download the old parameters.
Remove the _v3 in the multimer MODEL_PRESETS in config.py.

The simplest way to run AlphaFold is using the provided Docker script. This
was tested on Google Cloud with a machine using the nvidia-gpu-cloud-image
with 12 vCPUs, 85 GB of RAM, a 100 GB boot disk, the databases on an additional
3 TB disk, and an A100 GPU. For your first run, please follow the instructions
from [Installation and running your first prediction](#