📄 Paper + 🏠️ Project Website
<img src="https://img.shields.io/badge/build-pass-green" alt="build">
<img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT">
<img src="https://img.shields.io/badge/version-1.0.0-blue" alt="version">
<img src="https://img.shields.io/badge/python-3.11%20|%203.12-blue" alt="python">
<img src="https://img.shields.io/badge/platform-linux%20-lightgrey" alt="platform">
<img src="https://img.shields.io/badge/PRs-welcome-brightgreen" alt="PRs welcome">
<img src="https://img.shields.io/badge/docs-latest-brightgreen" alt="documentation">
| Event | Description |
|---|---|
| 📦 Code & Tools Release | We've released the core code and tools for our order agent, including several examples for downstream applications. The associated model will be made public following its final review. Please see the Release Overview section below for more details. |
| 🎈 ICLR 2025 Acceptance | We are thrilled to announce that our paper has been accepted to ICLR 2025! |
| 🌐 Join Our Community | Connect with us on 💬 WeChat Group and 👾 Discord to share your feedback and insights! |
| 🌟 First Release | We are excited to announce our first release! Check out the repo and enjoy your journey. |
Welcome to our project! We are excited to release the foundational code and tools designed for market simulation and analysis.
Important Note: While our associated Hugging Face model is fully prepared, it is currently set to private awaiting final review approval. We appreciate your patience regarding its public availability.
In the meantime, you can explore the core functionality through the following examples and code:
Explore Key Examples:
Delve into the Underlying Architecture:
Please note that the full functionality of the examples and demos depends on the public release of our Hugging Face model, which will happen once the final review is complete. We apologize for this temporary limitation and appreciate your patience.
We provide a fully configured development environment using VS Code Dev Containers:
```bash
git clone https://github.com/microsoft/MarS.git
cd MarS
```
Then, with VS Code and the Dev Containers extension installed:
1. Open the project folder in VS Code
2. Important: Before reopening in the container, edit the `.devcontainer/devcontainer.json` file and change "source=/data/" so that it points to a data directory that exists on your host machine (e.g., "source=<your/data/path>")
3. When prompted, click "Reopen in Container" or use the command palette (F1) and select "Dev Containers: Reopen in Container"
4. The container will build with all dependencies and extensions configured
5. Once inside the container, install the project dependencies:
   ```bash
   pip install -e ".[dev]"
   ```
Alternatively, you can build and run the container manually with Docker:
```bash
git clone https://github.com/microsoft/MarS.git
cd MarS
docker build -t mars-env -f .devcontainer/Dockerfile .
# Replace <your/data/path> with the data directory on your host machine
docker run -it --cap-add=SYS_ADMIN --device=/dev/fuse \
  --security-opt=apparmor:unconfined --shm-size=20gb --gpus=all --privileged \
  -v <your/data/path>:/data -v "$(pwd)":/workspaces/MarS -w /workspaces/MarS \
  mars-env
```
Inside the container, install the project dependencies:
```bash
pip install -e ".[dev]"
```
Important: We strongly recommend running MarS with Docker. Direct installation without Docker is not supported because of specific system dependencies and CUDA requirements.
We've simplified downloading all necessary components (model, converters, validation samples, and stylized facts data) using a single script:
```bash
python download.py
```
Important Note: Since the Hugging Face repository for our model is still under review and not yet public, we have temporarily made the prerequisites (converters, validation samples, and stylized facts data, but not the model) available through OneDrive. Please download them from OneDrive and place them under the `input_root_dir` configured in `market_simulation/conf.py` instead of running the `download.py` script.
Note: The download requires sufficient disk space and may take some time depending on your internet connection.
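Before you place the prerequisites under `input_root_dir`, it can help to confirm that the directory exists and that its filesystem has enough free space. The helper below is a hypothetical sketch that uses only the standard library and is not part of MarS. The 50 GB default is an arbitrary placeholder, because the README does not state the required size.

```python
import shutil
from pathlib import Path


def check_input_root(input_root_dir: str, min_free_gb: float = 50.0) -> dict:
    """Check that `input_root_dir` exists, whether it has content, and free disk space.

    Hypothetical helper: the 50 GB default is a placeholder, not a documented requirement.
    """
    root = Path(input_root_dir)
    exists = root.is_dir()
    non_empty = exists and any(root.iterdir())
    # If the directory is missing, measure free space on its nearest existing ancestor.
    probe = root
    while not probe.exists():
        probe = probe.parent
    free_gb = shutil.disk_usage(probe).free / 1024**3
    return {
        "exists": exists,
        "non_empty": non_empty,
        "free_gb": round(free_gb, 1),
        "enough_space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    # Pass the value of input_root_dir from market_simulation/conf.py; "/data" is only an example.
    print(check_input_root("/data"))
```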
MarS uses Ray Serve to deploy the order model as a scalable, production-ready service. To start the order model Ray server:
```bash
bash scripts/start-order-model.sh
```
Prerequisites:
- The Ray server must be running and accessible at the configured IP and port.
- Sufficient computational resources are required to run the model.
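One quick way to check the first prerequisite is to test whether anything is listening at the configured address. The sketch below is a hypothetical helper, not part of MarS, and uses only Python's standard `socket` module. A successful TCP connection only shows that some process accepts connections on that port. It does not confirm that the order model has finished loading.

```python
import socket


def is_server_reachable(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (ip, port) opens within `timeout` seconds.

    Hypothetical helper, not part of MarS. It only checks that something is
    listening on the port, not that the order model is loaded and healthy.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts (socket.timeout is an OSError), DNS errors, etc.
        return False
```

For example, you could call `is_server_reachable(C.model_serving.ip, C.model_serving.port)` with `C` imported from `market_simulation.conf`, which is the same configuration object the `ModelClient` example uses.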
To explore all of our demos in a user-friendly interface:
```bash
streamlit run market_simulation/examples/demo/home_app.py
```
The demo applications are designed to provide a quick and visual understanding of each tool's capabilities. However, there are some important considerations:
Using Demos vs Scripts:
- If you want to quickly understand what these tools can do, run the Streamlit demos for an interactive experience.
- If you need to use these tools with your own data or in production, you'll need to modify the corresponding scripts (`report_stylized_facts.py`, `forecast.py`, `market_impact.py`) directly.
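If you want to interact with the model directly after starting the server, you can use `ModelClient`:
```python
from market_simulation.conf import C
from market_simulation.rollout.model_client import ModelClient

# Connect to the order model served by Ray (started with scripts/start-order-model.sh);
# the model name, IP, and port come from the project configuration.
client = ModelClient(
    model_name=C.model_serving.model_name,
    ip=C.model_serving.ip,
    port=C.model_serving.port,
)

# `your_input_data` is a placeholder: replace it with input in the format the order model expects.
predictions = client.get_prediction(your_input_data)
```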
- Real Order-Level Data: While our demos use noise agents to generate initial states, production-grade applications require complete order-level historical data to accurately simulate market behavior.
- Sufficient Computational Resources: Our research simulations typically run 128 trajectories per state to generate robust signals. In our experiments, we used 128 GPUs to run parallel simulations across different instruments and starting states.
- Optimized Inference Pipeline: The current implementation prioritizes validating the model's scalability and its realistic, interactive, and controllable order generation. Production deployment will require significant optimization.
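The following sketch shows why running many trajectories per starting state produces more robust signals. It summarizes the end prices of N trajectories that share one starting state. Both `aggregate_trajectories` and the synthetic prices are hypothetical and are not MarS code or output. The standard error of the mean return shrinks roughly as 1/√N, so 128 trajectories give a much tighter estimate than a handful.

```python
import math
import random
import statistics


def aggregate_trajectories(start_price: float, end_prices: list[float]) -> dict:
    """Summarize end prices of N simulated trajectories that share one starting state.

    Hypothetical helper for illustration; not part of the MarS codebase.
    """
    returns = [p / start_price - 1.0 for p in end_prices]
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two trajectories to estimate dispersion")
    return {
        "n": n,
        "mean_return": statistics.fmean(returns),
        # Standard error of the mean return: shrinks roughly as 1 / sqrt(n).
        "stderr": statistics.stdev(returns) / math.sqrt(n),
        # Fraction of trajectories that end above the starting price.
        "up_ratio": sum(r > 0 for r in returns) / n,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for simulated end prices (NOT model output).
    for n in (8, 128):
        ends = [100.0 * (1.0 + rng.gauss(0.001, 0.01)) for _ in range(n)]
        print(n, aggregate_trajectories(100.0, ends))
```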
Several strategies can substantially improve inference performance for production deployment:
- Advanced Serving System: Replace the current Ray-based batch inference with more optimized systems like vLLM to achieve higher throughput and lower latency.
- Efficient Model Architectures: While we currently use LLaMA for its reliability during testing, exploring more efficient architectures such as linear attention models (RetNet, RWKV), state space models (Mamba), Mixture of Experts (MoE), or Multi-head Latent Attention (MLA) could significantly improve performance.
- Model Compression: Implement quantization, distillation, and pruning to reduce model size and computational requirements while maintaining accuracy.
- KV-Cache Optimization: Our current implementation uses fixed-length sequences with sliding windows, which requires a specially designed KV cache.
- Multi-Token Prediction: Generating multiple tokens simultaneously, instead of generating orders one by one, could substantially reduce inference time.
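The following minimal, hypothetical sketch illustrates why sliding windows complicate KV caching. It implements a fixed-size cache, is not MarS code, and uses placeholder lists instead of tensors for keys and values. Once the window is full, each new token evicts the oldest one, and every remaining entry shifts one slot earlier in the window. In a multi-layer transformer, the cached keys and values of deeper layers were also computed with context that has since left the window. Naively reusing them therefore does not match a fresh forward pass over the new window, which is one reason this setup needs a dedicated KV-cache design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class KVEntry:
    position: int       # absolute position of the token in the full order stream
    key: list[float]    # placeholder for a per-token key vector
    value: list[float]  # placeholder for a per-token value vector


class SlidingWindowKVCache:
    """Keep keys/values for only the most recent `window` tokens (hypothetical sketch)."""

    def __init__(self, window: int):
        self.window = window
        # deque(maxlen=...) drops the oldest entry automatically once it is full.
        self.entries: deque[KVEntry] = deque(maxlen=window)
        self.next_position = 0

    def append(self, key: list[float], value: list[float]) -> None:
        self.entries.append(KVEntry(self.next_position, key, value))
        self.next_position += 1

    def relative_positions(self) -> list[int]:
        # Positions as seen by a model trained on fixed-length windows: 0..len-1.
        # These shift by one every time the window slides.
        start = self.entries[0].position if self.entries else 0
        return [e.position - start for e in self.entries]


if __name__ == "__main__":
    cache = SlidingWindowKVCache(window=3)
    for i in range(5):
        cache.append([float(i)], [float(i)])
        print([e.position for e in cache.entries], cache.relative_positions())
```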
The demos provide a user-friendly interface to experiment with different parameters and visualize results, while the scripts offer more flexibility for integration into your own workflows and data pipelines.
The Stylized Facts Report evaluates 11 key market characteristics identified by Cont (2001) to assess the realism of market simulations. These characteristics, known as "stylized facts," are empirical patterns consistently observed across different financial markets, instruments, and time periods.
To run the stylized facts analysis:
```bash
# Run download.py first (or place the OneDrive prerequisites under input_root_dir) to get the required data
python market_simulation/examples/report_stylized_facts.py
```
| Fact # | Fact Name | Historical | Simulated |
|---|---|---|---|
| 1 | Absence of autocorrelations | × | × |
| 2 | Heavy tails | × | × |
| 3 | Gain/loss asymmetry | | |
| 4 | Aggregational Gaussianity | × | × |
| 5 | Intermittency | × | × |
| 6 | Volatility clustering | × | × |
| 7 | Conditional heavy tails | × | × |
| 8 | Slow decay of autocorrelation in absolute returns | × | × |
| 9 | Leverage effect | | |
| 10 | Volume/volatility correlation | × | × |
| 11 | Asymmetry in timescales | × | × |
In the measures below, $r(t, \Delta t)$ is the return over a window of length $\Delta t$ at time $t$, $\tau$ is a time lag, and $v(t, \Delta t)$ is the traded volume over the same window.
- Absence of autocorrelations: Linear autocorrelations of returns are insignificant, except at very small intraday time scales
  - $\text{corr}(r(t, \Delta t), r(t+\tau, \Delta t))$
- Heavy tails: Return distributions display power-law or Pareto-like tails
  - Measured through the kurtosis of returns
- Aggregational Gaussianity: Return distributions become closer to normal as the time scale $\Delta t$ increases
  - The kurtosis of returns approaches Gaussian levels at longer time scales
- Intermittency: Returns show a high degree of variability, with irregular bursts
  - Measured using the Fano factor (variance-to-mean ratio) of extreme returns
- Volatility clustering: Volatility measures are positively autocorrelated, so high-volatility events tend to cluster in time
  - $\text{corr}(|r(t, \Delta t)|, |r(t+\tau, \Delta t)|)$
- Conditional heavy tails: Return distributions still exhibit heavy tails even after accounting for volatility clustering
  - Kurtosis of normalized returns (returns divided by local volatility)
- Slow decay of autocorrelation in absolute returns: The autocorrelation of absolute returns decays slowly, as a power law
  - Measured like volatility clustering, across different lags $\tau$
- Volume/volatility correlation: Trading volume is correlated with volatility measures
  - $\text{corr}(v(t, \Delta t), |r(t, \Delta t)|)$
- Asymmetry in timescales: Coarse-grained volatility predicts fine-scale volatility better than the reverse
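The measures listed above can be sketched with NumPy, assuming $r$ is the log return over non-overlapping windows. This is an illustrative implementation with hypothetical function names, not the code in `report_stylized_facts.py`. The 20-step volatility window and the 99% extreme-return quantile are also assumptions. The synthetic i.i.d. prices in the usage block only exercise the functions. They show aggregational Gaussianity, but they will not exhibit volatility clustering.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def log_returns(prices: np.ndarray, step: int = 1) -> np.ndarray:
    """r(t, Δt): log returns sampled every `step` observations (Δt = step)."""
    return np.diff(np.log(prices[::step]))


def autocorr(x: np.ndarray, lag: int) -> float:
    """corr(x(t), x(t + lag)); use x = r for fact 1 and x = |r| for facts 6 and 8."""
    return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])


def excess_kurtosis(x: np.ndarray) -> float:
    """About 0 for Gaussian data, clearly > 0 for heavy-tailed data (facts 2 and 4)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)


def normalized_returns(r: np.ndarray, window: int = 20) -> np.ndarray:
    """Divide each return by the std of the `window` returns before it (fact 7)."""
    local_vol = sliding_window_view(r, window)[:-1].std(axis=1)
    return r[window:] / local_vol


def fano_factor_of_extremes(r: np.ndarray, window: int, q: float = 0.99) -> float:
    """Variance-to-mean ratio of extreme-return counts per block of `window` returns (fact 5)."""
    extreme = np.abs(r) > np.quantile(np.abs(r), q)
    n_blocks = len(extreme) // window
    counts = extreme[: n_blocks * window].reshape(n_blocks, window).sum(axis=1)
    return float(counts.var() / counts.mean())


def volume_volatility_corr(volume: np.ndarray, r: np.ndarray) -> float:
    """corr(v(t, Δt), |r(t, Δt)|) for aligned volume and return series (fact 10)."""
    return float(np.corrcoef(volume, np.abs(r))[0, 1])


if __name__ == "__main__":
    # Synthetic i.i.d. prices purely to exercise the functions; NOT market data.
    rng = np.random.default_rng(0)
    prices = 100.0 * np.exp(np.cumsum(rng.laplace(0.0, 1e-3, 100_000)))
    r = log_returns(prices)
    print("lag-1 autocorr of r:", autocorr(r, 1))
    print("excess kurtosis, step=1 vs step=50:",
          excess_kurtosis(r), excess_kurtosis(log_returns(prices, step=50)))
    print("Fano factor of extremes:", fano_factor_of_extremes(r, window=100))
```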
Our methodology rigorously tests these facts using 11,591 simulated trajectories for the 500 most liquid stocks in the Chinese market, comparing simulation outputs against historical data.
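The README does not specify how simulated and historical statistics are compared. One simple option, offered here as an assumption and not necessarily what the paper does, is to compute a per-trajectory statistic for both datasets, such as the excess kurtosis of returns. The two samples can then be compared with the two-sample Kolmogorov–Smirnov statistic, which is the largest gap between their empirical CDFs. `ks_statistic` is a hypothetical NumPy helper. `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.

```python
import numpy as np


def ks_statistic(a, b) -> float:
    """Two-sample KS statistic: max over x of |F_a(x) - F_b(x)| for empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The supremum is attained at one of the observed points, so evaluate both CDFs there.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic per-trajectory statistics (NOT real results).
    historical = rng.normal(5.0, 1.0, 500)
    simulated = rng.normal(5.2, 1.1, 500)
    print("KS statistic:", ks_statistic(historical, simulated))
```

A value near 0 means the two distributions of the statistic are close, and a value near 1 means they barely overlap.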
The Market Forecast tool demonstrates the predictive capabilities of the MarS model by simulating future market prices and trends through order-level simulation rather than direct price prediction.
Traditional forecasting approaches attempt to directly model price movements based on historical data. Our approach is fundamentally different: