Install Sample Factory with MuJoCo dependencies from PyPI:
Run MuJoCo experiments with the scripts in
The default parameters have been chosen to match CleanRL's results in the report below (note
that even faster training is possible on a multi-core machine with better-tuned parameters).
To train a model in the
python -m sf_examples.mujoco.train_mujoco --algo=APPO --env=mujoco_ant --experiment=<experiment_name>
To visualize the training results, use the
python -m sf_examples.mujoco.enjoy_mujoco --algo=APPO --env=mujoco_ant --experiment=<experiment_name>
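Both commands take the same `--algo`, `--env`, and `--experiment` values, so it can be convenient to build them in one place. The following is a minimal sketch that uses only the flags shown above; the helper `sf_command` and the experiment name `ant_demo` are hypothetical and not part of Sample Factory.

```python
import shlex


def sf_command(module, env, experiment, algo="APPO"):
    """Return the argv list for one of the sf_examples.mujoco entry points."""
    return [
        "python", "-m", module,
        f"--algo={algo}",
        f"--env={env}",
        f"--experiment={experiment}",
    ]


# Hypothetical experiment name, reused for training and visualization.
train_cmd = sf_command("sf_examples.mujoco.train_mujoco", "mujoco_ant", "ant_demo")
enjoy_cmd = sf_command("sf_examples.mujoco.enjoy_mujoco", "mujoco_ant", "ant_demo")

# Print the commands; pass a list to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(train_cmd))
print(shlex.join(enjoy_cmd))
```

Keeping the arguments in one helper avoids a mismatch between the experiment name used for training and the one used for visualization.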
Multiple experiments can be run in parallel with the launcher module.
mujoco_all_envs is an example launcher script that runs all MuJoCo environments with 10 seeds.
python -m sample_factory.launcher.run --run=sf_examples.mujoco.experiments.mujoco_all_envs --backend=processes --max_parallel=4 --pause_between=1 --experiments_per_gpu=10000 --num_gpus=1 --experiment_suffix=0
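Conceptually, a launcher run like this expands a grid of environments and seeds into one training command per combination. The sketch below illustrates that expansion with the standard library only; it is not the launcher's implementation. The seed values, the `--seed` flag, and the helper `expand_grid` are assumptions made for illustration, and `<other_env>` is a placeholder rather than a real environment name.

```python
from itertools import product

BASE_CMD = "python -m sf_examples.mujoco.train_mujoco --algo=APPO"


def expand_grid(envs, seeds):
    """Yield one (experiment_name, command) pair per env/seed combination."""
    for env, seed in product(envs, seeds):
        name = f"{env}_seed{seed}"
        # --seed is assumed here; check the training script's --help output.
        yield name, f"{BASE_CMD} --env={env} --seed={seed} --experiment={name}"


# "<other_env>" is a placeholder; substitute a supported environment name.
envs = ["mujoco_ant", "<other_env>"]
seeds = range(10)  # 10 seeds per environment, values chosen arbitrarily

runs = list(expand_grid(envs, seeds))
print(len(runs))  # one command per env/seed pair
for name, cmd in runs[:2]:
    print(name, "->", cmd)
```

With `--max_parallel=4` as in the command above, at most four such runs would be expected to execute at the same time.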
List of Supported Environments
Specify the environment to run with the --env command-line parameter. The following MuJoCo v4 environments are supported out of the box, and more environments can be added as needed in
|MuJoCo Environment Name||Sample Factory Command Line Parameter|
Sample Factory was benchmarked on MuJoCo against CleanRL. Using the same parameters, Sample Factory achieved sample efficiency similar to CleanRL's.
Sample Factory can run experiments synchronously or asynchronously; asynchronous execution usually has worse sample efficiency but runs faster. The two modes were compared on the MuJoCo environments.
Sample Factory was also compared with CleanRL in terms of wall time. Both experiments were run on a 16-core machine with 1 GPU. Sample Factory completed 10M samples 5 times as fast as CleanRL.
Various APPO models trained on MuJoCo environments are uploaded to the HuggingFace Hub. The models have all been trained for 10M steps. Videos of the agents after training can be found on the HuggingFace Hub.
The models below are the best models from the experiment against CleanRL above. The evaluation metrics here are obtained by running the model 10 times.
| Environment | HuggingFace Hub Models | Evaluation Metrics |
| --- | --- | --- |
| Ant-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-ant | 5876.09 ± 166.99 |
| HalfCheetah-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-halfcheetah | 6262.56 ± 67.29 |
| Humanoid-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-humanoid | 5439.48 ± 1314.24 |
| Walker2d-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-walker | 5487.74 ± 48.96 |
| Hopper-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-hopper | 2793.44 ± 642.58 |
| InvertedDoublePendulum-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-doublependulum | 9350.13 ± 1.31 |
| InvertedPendulum-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-pendulum | 1000.00 ± 0.00 |
| Reacher-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-reacher | -4.53 ± 1.79 |
| Swimmer-v4 | https://huggingface.co/andrewzhang505/sample-factory-2-mujoco-swimmer | 117.28 ± 2.91 |
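Assuming each entry in the Evaluation Metrics column is the mean and standard deviation of the episode return over the 10 evaluation runs (the table does not say which standard deviation is used), a summary in that format could be computed as in the sketch below. The returns are made-up numbers, not results from these models, and `summarize` is a hypothetical helper.

```python
from statistics import mean, pstdev


def summarize(returns):
    """Format episode returns as 'mean ± std' with two decimals."""
    # Population standard deviation is an assumption; use statistics.stdev
    # for the sample standard deviation instead.
    return f"{mean(returns):.2f} ± {pstdev(returns):.2f}"


# Hypothetical returns from 10 evaluation episodes.
episode_returns = [990.0, 1010.0, 1000.0, 995.0, 1005.0,
                   1000.0, 1002.0, 998.0, 1001.0, 999.0]
summary = summarize(episode_returns)
print(summary)
```

A large spread relative to the mean, as in the Humanoid-v4 and Hopper-v4 rows, suggests that individual evaluation episodes vary considerably, so more evaluation runs may be needed for a stable estimate.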
Below are some video examples of agents in various MuJoCo environments. Videos for all environments can be found in the HuggingFace Hub pages linked above.
Created: May 9, 2023