Recent releases

  • Minor compatibility fixes
  • Windows support in serial mode (do not require faster-fifo on Windows)
  • Allow Torch 2 (versions 2.0.1+ seem to work fine)
  • Use gymnasium instead of the gym dependency. A new compatibility layer automatically converts legacy gym environments to gymnasium environments. Still, some user code will likely need to be manually updated to use gymnasium instead of gym (see the sketch after this list). Apologies for the inconvenience, but hopefully this is the last major change before the Gym API finally stabilizes.
  • Added cfg parameters --lr_adaptive_min and --lr_adaptive_max to control the minimum and maximum adaptive learning rate
  • Added Brax environment support + custom brax renderer for enjoy scripts
  • Automatically set --recurrence based on feed-forward vs RNN training
  • Added --enjoy_script and --train_script for generating the model card when uploading to the Hugging Face Hub (thank you Andrew!)
  • Fixed video name when generating Hugging Face model card
  • Fixed small DMLab-related bug (thank you Lucy!)
  • cfg.json renamed to config.json for consistency with other HuggingFace integrations
  • We can still load from legacy checkpoints (cfg.json will be renamed to config.json)
  • Fixed a bug with multi-agent envs
  • Added MuJoCo & IsaacGym examples to the PyPI package
  • Added missing files
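
The gymnasium migration mostly affects environment code written against the old gym API. Below is a minimal sketch of the updated reset/step signatures, assuming a plain single-agent environment; the class and its contents are illustrative, not part of Sample Factory:

    import gymnasium as gym
    import numpy as np
    from gymnasium import spaces

    class MyEnv(gym.Env):
        """Illustrative gymnasium-style environment."""

        def __init__(self):
            self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
            self.action_space = spaces.Discrete(2)
            self._steps = 0

        def reset(self, *, seed=None, options=None):
            # gymnasium: reset() accepts seed/options and returns (obs, info)
            super().reset(seed=seed)
            self._steps = 0
            return np.zeros(4, dtype=np.float32), {}

        def step(self, action):
            # gymnasium: step() returns (obs, reward, terminated, truncated, info)
            # instead of the old 4-tuple with a single done flag
            self._steps += 1
            obs = self.np_random.uniform(-1.0, 1.0, size=4).astype(np.float32)
            terminated = False
            truncated = self._steps >= 100
            return obs, 0.0, terminated, truncated, {}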

Major update: adds new functionality, changes the API and configuration parameters

  • Major API update, codebase rewritten from scratch for better maintainability and clarity
  • Synchronous and asynchronous training modes
  • Serial and parallel execution modes
  • Support for vectorized and GPU-accelerated environments in batched sampling mode
  • Integration with Hugging Face Hub
  • New environment integrations, CI, and 40+ documentation pages

See v1 to v2 transition guide for details.

  • Support for Weights and Biases (see the "WandB support" section)
  • More configurable population-based training: whether to mutate gamma, as well as the perturbation magnitude for all float hyperparameters, can now be set from the command line (see the sketch after this list):
    --pbt_optimize_gamma: Whether to optimize gamma, the discount factor (experimental) (default: False)
    --pbt_perturb_min: When PBT mutates a float hyperparam, it samples the change magnitude randomly from the uniform distribution [pbt_perturb_min, pbt_perturb_max] (default: 1.05)
    --pbt_perturb_max: When PBT mutates a float hyperparam, it samples the change magnitude randomly from the uniform distribution [pbt_perturb_min, pbt_perturb_max] (default: 1.5)
  • Fixed a small bug related to population-based training (a reward shaping dictionary was assumed to be a flat dict, while it could be a nested dict in some envs)
  • Fixed a bug that prevented Vizdoom *.cfg and *.wad files from being copied to site-packages during installation from PyPI
  • Added an example of how to use custom Vizdoom envs without modifying the source code (sample_factory_examples/...)
  • Added fixed KL divergence penalty. Its usage is highly encouraged in environments with continuous action spaces (e.g. set --kl_loss_coeff=1.0); otherwise numerical instabilities can occur in certain environments, especially when the policy lag is high (see the KL penalty sketch below)
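
A minimal sketch of how the perturbation bounds above might be applied when PBT mutates a float hyperparameter; the function name and the multiply-or-divide choice are illustrative assumptions, not Sample Factory's exact implementation:

    import random

    def perturb_float(value, pbt_perturb_min=1.05, pbt_perturb_max=1.5):
        # Scale a float hyperparam by a factor drawn uniformly from
        # [pbt_perturb_min, pbt_perturb_max] (per the flag descriptions above).
        factor = random.uniform(pbt_perturb_min, pbt_perturb_max)
        # Assumption: mutations multiply or divide with equal probability,
        # so hyperparams can drift in either direction.
        return value * factor if random.random() < 0.5 else value / factor

    print(perturb_float(1e-4))  # e.g. mutate a learning rate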

  • More summaries related to the new loss
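
For context, here is a minimal PyTorch sketch of what a fixed KL divergence penalty of this kind looks like; the distributions and the direction of the KL term are illustrative assumptions, not Sample Factory's exact code:

    import torch
    from torch.distributions import Normal, kl_divergence

    # Illustrative action distributions for a continuous action space:
    # the behavior policy (before the update) and the current policy.
    old_dist = Normal(loc=torch.zeros(64, 6), scale=torch.ones(64, 6))
    new_dist = Normal(loc=torch.full((64, 6), 0.1), scale=torch.full((64, 6), 0.9))

    kl_loss_coeff = 1.0  # corresponds to --kl_loss_coeff=1.0

    # Fixed (non-adaptive) penalty: a constant coefficient times the mean KL
    # divergence between old and new action distributions, added to the usual
    # PPO objective to keep the policy from drifting when the policy lag is high.
    kl_loss = kl_loss_coeff * kl_divergence(old_dist, new_dist).mean()
    print(kl_loss.item())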

  • More improvements and fixes in the runner interface, including support for the NGC cluster
  • Runner interface improvements for Slurm
  • Support for inactive agents. To deactivate an agent for a portion of the episode, the environment should return info={'is_active': False} for the inactive agent. Useful for environments such as hide-and-seek (see the sketch at the end of this list).
  • Reduced memory consumption and improved performance with better shared memory management
  • Experiment logs are now saved into the experiment folder as sf_log.txt
  • DMLab-related bug fixes (courtesy of @donghoonlee04 and @sungwoong, thank you!)
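
A minimal sketch of the inactive-agent convention mentioned above, assuming a list-per-agent multi-agent interface; the environment, its phases, and all names here are illustrative, not Sample Factory code:

    import numpy as np

    class HideAndSeekEnv:
        """Illustrative two-agent env: the seeker (agent 1) is inactive
        during the first `hiding_steps` steps of every episode."""

        num_agents = 2

        def __init__(self, hiding_steps=40):
            self.hiding_steps = hiding_steps
            self.t = 0

        def reset(self):
            self.t = 0
            return [np.zeros(3, dtype=np.float32) for _ in range(self.num_agents)]

        def step(self, actions):
            self.t += 1
            obs = [np.random.uniform(-1, 1, 3).astype(np.float32) for _ in actions]
            rewards = [0.0] * len(actions)
            dones = [False] * len(actions)
            infos = [{} for _ in actions]
            if self.t <= self.hiding_steps:
                # Deactivate the seeker for this step; the trainer should
                # skip experience from agents flagged as inactive.
                infos[1] = {'is_active': False}
            return obs, rewards, dones, infos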