Install Sample Factory with the MuJoCo dependencies from PyPI:

pip install sample-factory[mujoco]

Running Experiments

Run MuJoCo experiments with the scripts in sf_examples.mujoco. The default parameters were chosen to match CleanRL's results in the report below. Note that on a multi-core machine, better-tuned parameters can train even faster.

To train a model in the Ant-v4 environment:

python -m sf_examples.mujoco.train_mujoco --env=mujoco_ant --experiment=<experiment_name>

To visualize the training results, use the enjoy_mujoco script:

python -m sf_examples.mujoco.enjoy_mujoco --env=mujoco_ant --experiment=<experiment_name>
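The train and enjoy commands above share the same --env and --experiment arguments, so they can be generated from one helper when scripting runs. The following is a minimal sketch, not part of Sample Factory: the helper name mujoco_command and the experiment name ant_demo are hypothetical, and only the module names and flags shown above are assumed to exist.

```python
import shlex
from typing import List


def mujoco_command(script: str, env: str, experiment: str) -> List[str]:
    """Build the argv list for one of the sf_examples.mujoco scripts.

    `script` is "train_mujoco" or "enjoy_mujoco"; `experiment` is any
    name you choose (hypothetical here).
    """
    return [
        "python",
        "-m",
        f"sf_examples.mujoco.{script}",
        f"--env={env}",
        f"--experiment={experiment}",
    ]


# Train and then visualize the same experiment by reusing its name.
train_cmd = mujoco_command("train_mujoco", "mujoco_ant", "ant_demo")
enjoy_cmd = mujoco_command("enjoy_mujoco", "mujoco_ant", "ant_demo")

print(shlex.join(train_cmd))
print(shlex.join(enjoy_cmd))
```

To launch a command from Python instead of printing it, pass the list to subprocess.run(train_cmd, check=True).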

If you have issues with the MuJoCo viewer in a Unix/Linux environment with Conda, try running the following before executing the enjoy_mujoco script:

export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/
python -m sf_examples.mujoco.enjoy_mujoco ...

Multiple experiments can be run in parallel with the launcher module. mujoco_all_envs is an example launcher script that runs all MuJoCo environments with 10 seeds each:

python -m sample_factory.launcher.run --run=sf_examples.mujoco.experiments.mujoco_all_envs --backend=processes --max_parallel=4 --pause_between=1 --experiments_per_gpu=10000 --num_gpus=1 --experiment_suffix=0
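Conceptually, a launcher script of this kind expands a grid of environments and seeds into one training command per combination. The sketch below illustrates that expansion with the standard library only. It does not use the launcher's own API, and the seed values and experiment naming scheme are assumptions chosen for illustration; the real mujoco_all_envs script may differ.

```python
from itertools import product
from typing import List, Sequence

# Command line names of the supported MuJoCo environments.
ENVS = [
    "mujoco_ant",
    "mujoco_halfcheetah",
    "mujoco_hopper",
    "mujoco_humanoid",
    "mujoco_walker",
    "mujoco_doublependulum",
    "mujoco_pendulum",
    "mujoco_reacher",
    "mujoco_swimmer",
]

# Hypothetical seed values: 10 seeds per environment.
SEEDS = list(range(10))


def expand_grid(envs: Sequence[str], seeds: Sequence[int]) -> List[str]:
    """Return one training command per (env, seed) combination."""
    base = "python -m sf_examples.mujoco.train_mujoco"
    return [
        f"{base} --env={env} --seed={seed} --experiment={env}_seed{seed}"
        for env, seed in product(envs, seeds)
    ]


commands = expand_grid(ENVS, SEEDS)
print(len(commands), "commands, for example:")
print(commands[0])
```

With 9 environments and 10 seeds the grid contains 90 runs, which is why the launcher's --max_parallel option matters for limiting how many run at once.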

List of Supported Environments

Specify the environment to run with the --env command line parameter. The following MuJoCo v4 environments are supported out of the box, and more environments can be added as needed in sf_examples.mujoco.mujoco_utils.

| MuJoCo Environment Name   | Sample Factory Command Line Parameter |
| ------------------------- | ------------------------------------- |
| Ant-v4                    | mujoco_ant                            |
| HalfCheetah-v4            | mujoco_halfcheetah                    |
| Hopper-v4                 | mujoco_hopper                         |
| Humanoid-v4               | mujoco_humanoid                       |
| Walker2d-v4               | mujoco_walker                         |
| InvertedDoublePendulum-v4 | mujoco_doublependulum                 |
| InvertedPendulum-v4       | mujoco_pendulum                       |
| Reacher-v4                | mujoco_reacher                        |
| Swimmer-v4                | mujoco_swimmer                        |
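The table amounts to a lookup from the Sample Factory command line name to the MuJoCo environment ID. The sketch below shows one way such a registry could look, and how adding an environment reduces to adding an entry. It is an illustration only: the names ENV_IDS and gym_id are hypothetical and are not taken from the actual contents of the Sample Factory utilities module.

```python
from typing import Dict

# Mapping from --env value to MuJoCo environment ID, copied from the table.
ENV_IDS: Dict[str, str] = {
    "mujoco_ant": "Ant-v4",
    "mujoco_halfcheetah": "HalfCheetah-v4",
    "mujoco_hopper": "Hopper-v4",
    "mujoco_humanoid": "Humanoid-v4",
    "mujoco_walker": "Walker2d-v4",
    "mujoco_doublependulum": "InvertedDoublePendulum-v4",
    "mujoco_pendulum": "InvertedPendulum-v4",
    "mujoco_reacher": "Reacher-v4",
    "mujoco_swimmer": "Swimmer-v4",
}


def gym_id(sf_name: str) -> str:
    """Resolve a Sample Factory env name to its MuJoCo environment ID."""
    try:
        return ENV_IDS[sf_name]
    except KeyError:
        supported = ", ".join(sorted(ENV_IDS))
        raise ValueError(
            f"Unknown MuJoCo env {sf_name!r}; supported: {supported}"
        ) from None


print(gym_id("mujoco_walker"))
```

Failing early with the list of supported names makes a mistyped --env value easier to diagnose than an error raised later during environment creation.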



  1. Sample Factory was benchmarked on MuJoCo against CleanRL. Using the same parameters, Sample Factory achieved sample efficiency similar to CleanRL's.

  2. Sample Factory can run experiments synchronously or asynchronously. Asynchronous execution usually has worse sample efficiency but runs faster. The MuJoCo environments were compared using the two modes in Sample Factory.

  3. Sample Factory was compared with CleanRL in terms of wall time. Both experiments were run on a 16-core machine with 1 GPU. Sample Factory completed 10M samples 5 times as fast as CleanRL.


Various APPO models trained on MuJoCo environments have been uploaded to the HuggingFace Hub. All models were trained for 10M steps. Videos of the trained agents can also be found on the HuggingFace Hub.

The models below are the best models from the comparison with CleanRL above. The evaluation metrics were obtained by running each model 10 times.

| Environment               | HuggingFace Hub Models | Evaluation Metrics |
| ------------------------- | ---------------------- | ------------------ |
| Ant-v4                    |                        | 5876.09 ± 166.99   |
| HalfCheetah-v4            |                        | 6262.56 ± 67.29    |
| Humanoid-v4               |                        | 5439.48 ± 1314.24  |
| Walker2d-v4               |                        | 5487.74 ± 48.96    |
| Hopper-v4                 |                        | 2793.44 ± 642.58   |
| InvertedDoublePendulum-v4 |                        | 9350.13 ± 1.31     |
| InvertedPendulum-v4       |                        | 1000.00 ± 0.00     |
| Reacher-v4                |                        | -4.53 ± 1.79       |
| Swimmer-v4                |                        | 117.28 ± 2.91      |
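Metrics in this form can be reproduced from a list of per-episode returns. The sketch below assumes the ± term is a standard deviation over the 10 evaluation runs; the source does not say so, nor whether a population or sample estimate was used, so the use of pstdev is an assumption. The episode returns are hypothetical placeholder values, not data from the experiments.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def summarize(returns: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of episode returns."""
    return mean(returns), pstdev(returns)


def format_metric(returns: Sequence[float]) -> str:
    """Format returns as 'mean ± std' with two decimal places."""
    avg, std = summarize(returns)
    return f"{avg:.2f} ± {std:.2f}"


# Hypothetical returns from 10 evaluation episodes of one model.
episode_returns = [1000.0] * 10
print(format_metric(episode_returns))
```

If a sample standard deviation is wanted instead, replace pstdev with statistics.stdev, which divides by n - 1.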


Below are some video examples of agents in various MuJoCo environments. Videos for all environments can be found in the HuggingFace Hub pages linked above.