Blog
Occasional notes on research, reading, and life.
When Can LLMs Learn to Reason with Weak Supervision?
A study of when reinforcement learning with verifiable rewards (RLVR) can elicit generalizable reasoning from limited or weak supervision.