Blog

Notes on reinforcement learning, efficient AI systems, and research ideas that are useful enough to write down.

Posts

Is Anybody Doing Async RL Correctly?

Async RL can unlock much higher training throughput, but stale trajectories turn the update into a fragile off-policy estimator. This post explains why clipping and masking only partially help, why effective sample size is an early collapse signal, and how variance-controlled updates stabilize high-lag training.
