github.com/facebookresearch/ReAgent @main sqlite

3,803 symbols 14,558 edges 480 files 724 documented · 19%
README

ReAgent is officially archived and no longer maintained. For an actively supported, production-ready open-source reinforcement learning library, please refer to Pearl - Production-ready Reinforcement Learning AI Agent Library, by the Applied Reinforcement Learning team @ Meta.


Overview

ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook. ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent see the release post here and white paper here.

The platform was originally named "Horizon", but we adopted the name "ReAgent" to emphasize its broader scope in decision making and reasoning.

Algorithms Supported

Classic Off-Policy algorithms:
- Discrete-Action DQN
- Parametric-Action DQN
- Double DQN, Dueling DQN, Dueling Double DQN
- Distributional RL: C51 and QR-DQN
- Twin Delayed DDPG (TD3)
- Soft Actor-Critic (SAC)
- Critic Regularized Regression (CRR)
- Proximal Policy Optimization Algorithms (PPO)
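To make the difference between DQN and Double DQN in the list above concrete, here is a minimal sketch of the two bootstrap targets for a single transition. It is plain Python for illustration only and is not taken from the ReAgent code base; the function names `dqn_target` and `double_dqn_target` and their parameters are hypothetical.

```python
from typing import Sequence


def dqn_target(
    reward: float,
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    terminal: bool = False,
) -> float:
    """Standard DQN target: the target network both selects and evaluates."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    terminal: bool = False,
) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if terminal:
        return reward
    # Action selection uses the online network's Q-values.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation uses the target network's Q-value for that action.
    return reward + gamma * next_q_target[best_action]
```

Decoupling selection from evaluation is what reduces the overestimation that comes from taking a max over noisy Q-values.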

RL for recommender systems:
- Seq2Slate
- SlateQ

Counterfactual Evaluation:
- Doubly Robust (for bandits)
- Doubly Robust (for sequential decisions)
- MAGIC

Multi-Arm and Contextual Bandits:
- UCB1
- MetricUCB
- Thompson Sampling
- LinUCB
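As an illustration of the simplest entry in the bandit list above, the following is a minimal sketch of the textbook UCB1 rule: play each arm once, then pick the arm with the highest mean reward plus an exploration bonus of sqrt(2 ln t / n). It is a standalone example and not ReAgent's implementation; the class name `UCB1` and its methods are assumptions made for this sketch.

```python
import math


class UCB1:
    """Textbook UCB1 for a fixed number of arms (illustrative sketch)."""

    def __init__(self, n_arms: int) -> None:
        self.counts = [0] * n_arms   # number of pulls per arm
        self.sums = [0.0] * n_arms   # total reward per arm

    def select_arm(self) -> int:
        # Pull every arm once before using the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=lambda a: scores[a])

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.sums[arm] += reward
```

A typical loop calls `select_arm()`, observes a reward for that arm, and passes it to `update()`; over time the better arm is pulled far more often while the weaker arms are still revisited occasionally.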

Others:
- Cross-Entropy Method
- Synthetic Return for Credit Assignment
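For readers unfamiliar with the Cross-Entropy Method, the sketch below shows the generic idea on a one-dimensional toy problem: sample candidates from a Gaussian, keep the best-scoring fraction, and refit the Gaussian to those elites. It is a simplified illustration under assumed defaults, not ReAgent's planner; the function name `cross_entropy_method` and all parameter names are hypothetical.

```python
import random
import statistics
from typing import Callable


def cross_entropy_method(
    objective: Callable[[float], float],
    mean: float = 0.0,
    std: float = 5.0,
    population: int = 50,
    elite_frac: float = 0.2,
    iterations: int = 30,
    seed: int = 0,
) -> float:
    """Maximize `objective` over a scalar by iteratively refitting a Gaussian."""
    rng = random.Random(seed)
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate solutions from the current distribution.
        samples = [rng.gauss(mean, std) for _ in range(population)]
        # Keep the highest-scoring candidates (the elites).
        elites = sorted(samples, key=objective, reverse=True)[:n_elite]
        # Refit the sampling distribution to the elites.
        mean = statistics.fmean(elites)
        std = statistics.stdev(elites) + 1e-6  # small floor avoids a zero std
    return mean
```

For example, calling it with `lambda x: -(x - 3.0) ** 2` should return a value close to 3, the maximizer of that objective.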

Installation

ReAgent can be installed via Docker or manually. Detailed instructions on how to install ReAgent can be found here.

Tutorial

ReAgent is designed for large-scale, distributed recommendation/optimization tasks where we don’t have access to a simulator. In this setting, it is typically better to train offline on batches of data and release new policies slowly over time. Because the policy updates slowly and in batches, we use off-policy algorithms. To test a new policy without deploying it, we rely on counterfactual policy evaluation (CPE), a set of techniques for estimating the performance of a policy from data logged by another policy.
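To illustrate what CPE means in practice, here is a minimal sketch of two standard estimators for the bandit (single-step) case: inverse propensity scoring (IPS) and doubly robust (DR). It is a standalone example and does not use ReAgent's estimator classes; `LoggedSample`, `ips_estimate`, `doubly_robust_estimate`, and their parameters are names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedSample:
    context: int
    action: int
    reward: float
    logging_prob: float  # probability the logging policy assigned to `action`


def ips_estimate(
    samples: Sequence[LoggedSample],
    target_prob: Callable[[int, int], float],
) -> float:
    """Reweight logged rewards by target/logging probability ratios."""
    total = 0.0
    for s in samples:
        weight = target_prob(s.context, s.action) / s.logging_prob
        total += weight * s.reward
    return total / len(samples)


def doubly_robust_estimate(
    samples: Sequence[LoggedSample],
    target_prob: Callable[[int, int], float],
    reward_model: Callable[[int, int], float],
    actions: Sequence[int],
) -> float:
    """Model-based value plus an importance-weighted correction of its error."""
    total = 0.0
    for s in samples:
        # Direct-method term: expected modeled reward under the target policy.
        direct = sum(
            target_prob(s.context, a) * reward_model(s.context, a) for a in actions
        )
        # Correction term: importance-weighted residual on the logged action.
        weight = target_prob(s.context, s.action) / s.logging_prob
        total += direct + weight * (s.reward - reward_model(s.context, s.action))
    return total / len(samples)
```

With a reward model that always predicts zero, the DR estimate reduces to IPS; with an accurate reward model, the correction term shrinks and the variance from importance weights is reduced.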

We also have a set of tools to facilitate applying RL in real-world applications:
- Domain Analysis Tool, which analyzes state/action feature importance and identifies whether the problem is suitable for applying batch RL
- Behavior Cloning, which clones from the logging policy to bootstrap the learning policy safely

Detailed instructions on how to use ReAgent can be found here.

License

ReAgent is released under a BSD 3-Clause license. Find out more about it here.

Terms of Use | Privacy Policy | Copyright © 2022 Meta Platforms, Inc

Citing

@article{gauci2018horizon,
  title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
  author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
  journal={arXiv preprint arXiv:1811.00260},
  year={2018}
}

Core symbols most depended-on inside this repo

append · called by 344 · reagent/ope/estimators/estimator.py
mean · called by 171 · reagent/core/running_stats.py
items · called by 113 · reagent/ope/estimators/slate_estimators.py
cpu · called by 103 · reagent/core/types.py
keys · called by 91 · reagent/ope/estimators/types.py
to · called by 91 · reagent/ope/test/cartpole.py
values · called by 70 · reagent/ope/estimators/types.py
add · called by 69 · reagent/ope/utils.py

Shape

Method 2,563
Class 817
Function 412
Route 11

Languages

Python 100%

Modules by API surface

reagent/ope/estimators/slate_estimators.py · 139 symbols
reagent/core/types.py · 93 symbols
reagent/lite/optimizer.py · 88 symbols
reagent/preprocessing/transforms.py · 86 symbols
reagent/prediction/predictor_wrapper.py · 77 symbols
reagent/ope/estimators/types.py · 73 symbols
reagent/models/seq2slate.py · 68 symbols
reagent/replay_memory/circular_replay_buffer.py · 60 symbols
reagent/ope/estimators/sequential_estimators.py · 53 symbols
reagent/ope/trainers/rl_tabular_trainers.py · 38 symbols
reagent/ope/test/yandex_web_search.py · 38 symbols
reagent/gym/envs/pomdp/pocman.py · 38 symbols

Dependencies from manifests, versioned

com.github.scopt:scopt_${scala.binary.version} 3.7.0 · 1×
junit:junit 4.12 · 1×
org.apache.commons:commons-math3 3.4.1 · 1×
org.apache.spark:spark-core_${scala.binary.version}
org.apache.spark:spark-hive_${scala.binary.version}
org.apache.spark:spark-sql-kafka-0-10_${scala.binary.version}
org.jacoco:jacoco-maven-plugin 0.8.6 · 1×
org.mockito:mockito-core 1.10.19 · 1×
org.scala-lang:scala-library
org.scalacheck:scalacheck_${scala.binary.version} 1.14.1 · 1×
org.scalatest:scalatest_${scala.binary.version} 3.2.5 · 1×

For agents

$ claude mcp add ReAgent \
  -- python -m otcore.mcp_server <graph>
