
Awesome Open Source AI
Curated open-source artificial intelligence models, libraries, infrastructure, and developer tools.

About this list
Awesome Open Source AI is a curated list of open-source projects for people building with AI.
The goal is to help readers find useful models, libraries, tools, infrastructure, datasets, and learning resources without sorting through a directory dump.
Projects do not need a minimum number of GitHub stars to be included. Stars can be useful context, but they are only one signal. A smaller project may belong here if it is useful, well-maintained, technically interesting, clearly documented, or important to a specific part of the AI ecosystem.
Good entries should have a clear reason to exist. They should help people build, study, run, evaluate, or understand AI systems.
1. Core Frameworks & Libraries
Core libraries and frameworks used to build, train, and run AI and machine learning systems.
Deep Learning Frameworks
- PyTorch - Dynamic computation graphs, Pythonic API, dominant in research and production. The current standard for most frontier AI work.

- TensorFlow - End-to-end platform with excellent production deployment, TPU support, and large-scale serving tools.

- JAX - High-performance numerical computing with composable transformations (JIT, vmap, grad). Rising favorite for research and scientific ML.

- Flax - Neural network library for JAX, designed for flexibility.

- dm-haiku - JAX-based neural network library from Google DeepMind. Elegant functional API with state management, widely used in DeepMind's research. Apache 2.0 licensed.

- Equinox - Elegant, easy-to-use neural networks and scientific computing in JAX. Callable PyTrees with filtered transformations and seamless interoperability with the JAX ecosystem. Apache 2.0 licensed.

- Diffrax - Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable ODE/SDE/CDE solvers for scientific machine learning and neural differential equations. Apache 2.0 licensed.

- vit-pytorch - Vision Transformer (ViT) implementations in PyTorch. Covers many vision transformer variants, including ViT, DeiT, Swin, and more. MIT licensed.

- NumPyro - Probabilistic programming with NumPy powered by JAX for autograd and JIT compilation. Bayesian modeling and inference at scale.

- Keras - High-level, beginner-friendly API that now runs on multiple backends (TensorFlow, JAX, PyTorch). Perfect for rapid experimentation.

- tinygrad - Minimalist deep learning framework with a tiny code footprint. The "you like PyTorch? you like micrograd? you love tinygrad!" philosophy: simple yet powerful.

- PaddlePaddle - Industrial deep learning platform from Baidu, reported to serve 23+ million developers and 760,000+ companies. China's first independently developed deep learning framework, with advanced distributed training and deployment capabilities.

- PyTorch Geometric - Library for deep learning on irregular input data such as graphs, point clouds, and manifolds. Part of the PyTorch ecosystem.

- timm (PyTorch Image Models) - The largest collection of PyTorch image encoders and backbones. 900+ pretrained models including ResNet, EfficientNet, Vision Transformer, ConvNeXt, and more with training and inference scripts. Apache 2.0 licensed.

- Triton - Language and compiler for writing highly efficient custom deep-learning primitives. Powers kernel optimizations in PyTorch, JAX, and other frameworks. MIT licensed.

- GGML - Tensor library for machine learning. The foundational C/C++ library powering llama.cpp and many on-device inference engines. MIT licensed.

- MLX - Array framework for machine learning on Apple silicon. Efficient unified memory design with NumPy-like API, automatic differentiation, and multi-device support. MIT licensed.
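
Several entries above mention dynamic computation graphs and automatic differentiation. As a rough illustration of the idea only, here is a minimal scalar reverse-mode autodiff sketch in plain Python, in the spirit of micrograd. The `Value` class and the `power` helper are hypothetical names made up for this example; this is not how PyTorch, JAX, tinygrad, or MLX implement it.

```python
import math


class Value:
    """A scalar that records the operations applied to it (hypothetical helper)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents    # Values this one was computed from
        self._grad_fns = grad_fns  # one local chain-rule step per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    __rmul__ = __mul__  # multiplication is commutative

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (lambda g: g * (1.0 - t * t),))

    def backward(self):
        """Accumulate d(self)/d(node) into node.grad for every upstream node."""
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node._parents:
                visit(parent)
            order.append(node)  # parents are appended before their children

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)


def power(x, extra_multiplies):
    """Computes x ** (extra_multiplies + 1) with an ordinary Python loop."""
    y = x
    for _ in range(extra_multiplies):
        y = y * x  # the graph grows as the loop runs
    return y


x = Value(2.0)
y = power(x, 2)   # x ** 3
y.backward()
print(y.data, x.grad)  # 8.0 12.0
```

The graph is built while ordinary Python control flow runs, which is the sense in which such graphs are called dynamic. Production frameworks apply the same chain-rule idea to tensors and add compilation, batching, and hardware backends on top. Note that gradients accumulate across `backward()` calls in this sketch, so reuse of a `Value` would need its `grad` reset first.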

High-Performance Compute Libraries
- oneDNN - oneAPI Deep Neural Network Library. Cross-platform performance library of basic building blocks for deep learning, optimized for Intel CPUs, GPUs, and Arm architectures. Apache 2.0 licensed.

- ONNX (Open Neural Network Exchange) - Open standard for machine learning interoperability. Defines a common model format so developers can move models between frameworks, tools, and runtimes as their project evolves. Apache 2.0 licensed.

- IREE - Retargetable MLIR-based machine learning compiler and runtime toolkit. Lowers ML models to a unified IR that scales from datacenter to mobile and edge deployments. Apache 2.0 licensed.

Rust ML Frameworks
- Burn - Next-generation deep learning framework in Rust. Backend-agnostic, with CPU, GPU, and WebAssembly support.

- Candle (Hugging Face) - Minimalist ML framework for Rust. PyTorch-like API with focus on performance and simplicity.

- linfa - Comprehensive Rust ML toolkit with classical algorithms. scikit-learn equivalent for Rust with clustering, regression, and preprocessing.

Julia ML Frameworks
- Flux.jl - 100% pure-Julia ML stack with lightweight abstractions on top of native GPU and AD support. Elegant, hackable, and fully integrated with Julia's scientific computing ecosystem.

- MLJ.jl - Comprehensive Julia machine learning framework providing a unified interface to 200+ models with meta-algorithms for selection, tuning, and evaluation. MIT licensed.

- ModelingToolkit.jl - High-performance symbolic-numeric modeling framework for scientific machine learning. Automatically generates fast functions for model components like Jacobians and Hessians with automatic sparsification and parallelization. MIT licensed.

NLP & Transformers
- spaCy (Explosion AI) - Industrial-strength natural language processing with support for 75+ languages, transformer pipelines, and production-grade NER, parsing, and text classification.

- Transformers (Hugging Face) - The de facto standard library for pretrained NLP models. 1M+ model checkpoints on the Hugging Face Hub, 250,000+ downloads/day. BERT, GPT, Llama, Qwen, and hundreds more.

- sentence-transformers - Classic library for sentence and image embeddings.

- tokenizers (Hugging Face) - Fast state-of-the-art tokenizers for training and inference.

- fairseq2 - FAIR Sequence Modeling Toolkit 2. Complete rewrite of fairseq with modern PyTorch APIs, native support for LLM training (70B+ models), vLLM integration, and first-party recipes for instruction finetuning and preference optimization. MIT licensed.

Data Processing & Manipulation
- Pandas - The gold standard for data analysis and manipulation in Python.

- Polars - Blazing-fast DataFrame library (Rust backend) - modern alternative to Pandas for large-scale workloads.

- cuDF - GPU DataFrame library from RAPIDS. Accelerates Pandas workflows on NVIDIA GPUs with zero code changes using the cudf.pandas accelerator mode.

- Modin - Parallel Pandas DataFrames. Scale Pandas workflows by changing a single line of code - distributes data and computation automatically.

- Dask - Parallel computing for big data - scales Pandas/NumPy/scikit-learn to clusters.

- NumPy - Fundamental array computing library that powers almost every AI stack.

- SciPy - Scientific computing algorithms (optimization, linear algebra, statistics, signal processing).

- CuPy - NumPy and SciPy-compatible array library for GPU-accelerated computing in Python.

- NetworkX - Creation, manipulation, and study of complex networks. The foundational graph analysis library for Python data science.

- cuGraph - GPU graph analytics library with NetworkX-compatible API. 10-100x faster than CPU for large-scale graph algorithms. Apache 2.0 licensed.

- Vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python. Visualize and explore billion-row datasets at millions of rows per secon