github.com/facebookresearch/sapiens @main sqlite

12,859 symbols · 59,392 edges · 2,486 files · 7,742 documented (60%)

README

Sapiens

Foundation for Human Vision Models

  <a href="https://rawalkhirodkar.github.io/"><strong>Rawal Khirodkar</strong></a>
  ·
  <a href="https://scholar.google.ch/citations?user=oLi7xJ0AAAAJ&hl=en"><strong>Timur Bagautdinov</strong></a>
  ·
  <a href="https://una-dinosauria.github.io/"><strong>Julieta Martinez</strong></a>
  ·
  <a href="https://about.meta.com/realitylabs/"><strong>Su Zhaoen</strong></a>
  ·
  <a href="https://about.meta.com/realitylabs/"><strong>Austin James</strong></a>



  <a href="https://www.linkedin.com/in/peter-selednik-05036499/"><strong>Peter Selednik</strong></a>
  ·
  <a href="https://scholar.google.fr/citations?user=8orqBsYAAAAJ&hl=ja"><strong>Stuart Anderson</strong></a>
  ·
  <a href="https://shunsukesaito.github.io/"><strong>Shunsuke Saito</strong></a>

ECCV 2024 - Best Paper Candidate

Sapiens offers a comprehensive suite of models for human-centric vision tasks (e.g., 2D pose estimation, body-part segmentation, depth estimation, and surface normal estimation). The model family is pretrained on 300 million in-the-wild human images and generalizes well to unconstrained conditions. The models are also designed for extracting high-resolution features: they are natively trained at a 1024 × 1024 image resolution with a 16-pixel patch size.
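To make "high-resolution features" concrete, the arithmetic below counts the patch tokens a ViT-style patch embedding produces at the native resolution. This is a sketch assuming non-overlapping square patches with no padding; `patch_grid` is a hypothetical helper, not part of the Sapiens codebase.

```python
def patch_grid(height: int, width: int, patch: int) -> tuple[int, int, int]:
    """Return (rows, cols, tokens) for non-overlapping square patches.

    Hypothetical helper: assumes the image size is an exact multiple of the
    patch size, as with 1024 x 1024 inputs and 16-pixel patches.
    """
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch, width // patch
    return rows, cols, rows * cols


rows, cols, tokens = patch_grid(1024, 1024, 16)
print(rows, cols, tokens)  # 64 64 4096
```

Each image therefore becomes a 64 × 64 grid of 4,096 tokens, compared with a 14 × 14 grid of 196 tokens for a 224 × 224 input at the same patch size.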

Sapiens2 is out! Please check it out: https://github.com/facebookresearch/sapiens2

🚀 Getting Started

Clone the Repository

```bash
git clone https://github.com/facebookresearch/sapiens.git
export SAPIENS_ROOT=/path/to/sapiens
```

Recommended: Lite Installation (Inference-only)

If you are setting up your own environment mainly to run existing models in inference mode, we recommend the Sapiens-Lite installation. This setup offers optimized inference (4x faster) with minimal dependencies: only PyTorch, NumPy, and cv2.
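As an illustration of the kind of input preparation that sits in front of an inference-only model, here is a NumPy-only sketch that turns a cv2-style BGR image into a normalized 1 × 3 × H × W batch. The mean/std constants (ImageNet statistics on a 0 to 255 scale), the default 1024 × 1024 size, the nearest-neighbour resize, and the `preprocess` helper are all assumptions for illustration; check the config of the checkpoint you load for its actual input size and normalization.

```python
import numpy as np

# Assumed normalization constants: ImageNet mean/std on a 0-255 scale, in RGB order.
# These values are not taken from this README; verify them against your checkpoint's config.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def preprocess(image_bgr: np.ndarray, height: int = 1024, width: int = 1024) -> np.ndarray:
    """Convert an HxWx3 uint8 BGR image (cv2.imread layout) into a 1x3xHxW float32 batch.

    Hypothetical helper for illustration; not part of Sapiens-Lite.
    """
    h, w = image_bgr.shape[:2]
    # Nearest-neighbour resize with NumPy indexing (cv2.resize would interpolate better).
    rows = np.arange(height) * h // height
    cols = np.arange(width) * w // width
    resized = image_bgr[rows][:, cols]
    rgb = resized[..., ::-1].astype(np.float32)  # BGR -> RGB
    normalized = (rgb - MEAN) / STD  # per-channel normalization, broadcast over H and W
    return normalized.transpose(2, 0, 1)[None]  # HWC -> 1xCxHxW
```

In practice you would load the image with cv2.imread, resize it with cv2.resize, and hand the array to the model through torch.from_numpy.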

Full Installation

To replicate our complete training setup, run the provided installation script. It creates a new conda environment named sapiens and installs all necessary dependencies.

```bash
cd $SAPIENS_ROOT/_install
./conda.sh
```

Please download the original checkpoints from Hugging Face. You can download only the checkpoints you are interested in. Set $SAPIENS_CHECKPOINT_ROOT to the path of the sapiens_host folder, and place the checkpoints following this directory structure:

```plaintext
sapiens_host/
├── detector/
│   └── checkpoints/
│       └── rtmpose/
├── pretrain/
│   └── checkpoints/
│       ├── sapiens_0.3b/
│       │   └── sapiens_0.3b_epoch_1600_clean.pth
│       ├── sapiens_0.6b/
│       │   └── sapiens_0.6b_epoch_1600_clean.pth
│       ├── sapiens_1b/
│       └── sapiens_2b/
├── pose/
│   └── checkpoints/
│       └── sapiens_0.3b/
├── seg/
├── depth/
└── normal/
```
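Once $SAPIENS_CHECKPOINT_ROOT is set, a small check can catch misplaced files before a run. This sketch assumes the pretrain naming pattern from the directory structure (`<model>_epoch_1600_clean.pth`), which is only shown for sapiens_0.3b and sapiens_0.6b; `pretrain_checkpoint` is a hypothetical helper, not a function in the repository.

```python
import os
from pathlib import Path
from typing import Optional


def pretrain_checkpoint(model: str, root: Optional[str] = None) -> Path:
    """Return the expected path of a pretrained checkpoint under sapiens_host/.

    Hypothetical helper: mirrors pretrain/checkpoints/<model>/<model>_epoch_1600_clean.pth.
    Falls back to the SAPIENS_CHECKPOINT_ROOT environment variable when root is None.
    """
    base = Path(root if root is not None else os.environ["SAPIENS_CHECKPOINT_ROOT"])
    path = base / "pretrain" / "checkpoints" / model / f"{model}_epoch_1600_clean.pth"
    if not path.is_file():
        raise FileNotFoundError(f"missing checkpoint: {path}")
    return path
```

Failing early with the full expected path makes it obvious which folder or file name is off.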

🌟 Human-Centric Vision Tasks

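We finetune Sapiens for multiple human-centric vision tasks. The pretrained image encoder and the finetuned tasks are listed below; each also has a Lite (inference-only) version:
- Image Encoder
- Pose Estimation
- Body-Part Segmentation
- Depth Estimation
- Surface Normal Estimation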

🎯 Easy Steps to Finetuning Sapiens

Finetuning our models is super easy! Detailed training guides are available for the following tasks:
- Pose Estimation
- Body-Part Segmentation
- Depth Estimation
- Surface Normal Estimation

📈 Quantitative Evaluations

Pose Estimation

🤝 Acknowledgements & Support & Contributing

We would like to acknowledge the work of OpenMMLab, from which this project benefits. For any questions or issues, please open an issue in the repository. See the contributing guidelines and the code of conduct.

License

This project is licensed under the terms of the LICENSE file. Portions derived from open-source projects are licensed under Apache 2.0.

📚 Citation

If you use Sapiens in your research, please consider citing us.

@article{khirodkar2024sapiens,
  title={Sapiens: Foundation for Human Vision Models},
  author={Khirodkar, Rawal and Bagautdinov, Timur and Martinez, Julieta and Zhaoen, Su and James, Austin and Selednik, Peter and Anderson, Stuart and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2408.12569},
  year={2024}
}

Core symbols most depended-on inside this repo

| Symbol | Called by | Defined in |
|---|---|---|
| cat | 985 | det/mmdet/structures/bbox/base_boxes.py |
| reshape | 978 | det/mmdet/structures/bbox/base_boxes.py |
| view | 880 | det/mmdet/structures/bbox/base_boxes.py |
| build | 609 | engine/mmengine/config/lazy.py |
| unsqueeze | 502 | det/mmdet/structures/bbox/base_boxes.py |
| permute | 476 | det/mmdet/structures/bbox/base_boxes.py |
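This ranking orders symbols by how many callers reference them. Assuming the graph stores calls as (caller, callee) edges and "called by" counts distinct callers, which is an assumption about the index rather than something documented here, the ranking can be sketched as counting incoming edges per callee. The edges below are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def most_called(edges: Iterable[tuple[str, str]], k: int = 3) -> list[tuple[str, int]]:
    """Rank callees by the number of distinct callers (incoming call edges)."""
    unique_edges = set(edges)  # count each caller -> callee pair only once
    counts = Counter(callee for _caller, callee in unique_edges)
    return counts.most_common(k)


# Toy, made-up call edges as (caller, callee) pairs.
edges = [
    ("BaseBoxes.flip", "cat"),
    ("BaseBoxes.stack", "cat"),
    ("BaseBoxes.cat", "cat"),
    ("BaseBoxes.flip", "reshape"),
    ("BaseBoxes.flip", "cat"),  # duplicate edge, ignored
]
print(most_called(edges, k=2))
```

Deduplicating edges first means a symbol called many times from one function still counts that function once; counting raw call sites instead would simply skip the set.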

Shape

| Kind | Count |
|---|---|
| Method | 8,731 |
| Class | 2,299 |
| Function | 1,804 |
| Route | 25 |
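The Shape counts break the repository's symbols down by kind. A quick check confirms that they sum to the 12,859 symbols reported in the header, and the same arithmetic reproduces the 60% documented figure (7,742 of 12,859).

```python
# Symbol counts copied from the Shape breakdown and the header stats.
shape = {"Method": 8731, "Class": 2299, "Function": 1804, "Route": 25}
total_symbols = 12859
documented = 7742

# The four kinds add up to the symbol total reported in the header.
assert sum(shape.values()) == total_symbols

for kind, count in shape.items():
    print(f"{kind:<8} {count:>6,}  {100 * count / total_symbols:5.1f}%")
print(f"documented {100 * documented / total_symbols:.0f}%")
```

Methods alone make up about 68% of all symbols.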

Languages

Python 100%

Modules by API surface

| Module | Symbols |
|---|---|
| seg/mmseg/datasets/transforms/transforms.py | 140 |
| det/mmdet/datasets/transforms/transforms.py | 138 |
| engine/mmengine/visualization/vis_backend.py | 87 |
| pretrain/mmpretrain/datasets/transforms/processing.py | 81 |
| engine/mmengine/config/config.py | 79 |
| pretrain/mmpretrain/datasets/transforms/auto_augment.py | 77 |
| pretrain/mmpretrain/models/multimodal/blip/language_model.py | 73 |
| engine/mmengine/runner/runner.py | 65 |
| det/mmdet/structures/mask/structures.py | 64 |
| cv/mmcv/transforms/processing.py | 61 |
| engine/mmengine/optim/scheduler/param_scheduler.py | 60 |
| engine/mmengine/model/weight_init.py | 56 |

For agents

$ claude mcp add sapiens \
  -- python -m otcore.mcp_server <graph>
