Comm-Efficient RLVR
Static research notes prepared 2026-06-18
Research documentation

Anchor latency, drift, and train-inference mismatch.

Reports on the communication-efficient GRPO circuit: where the anchor merger works, why K=20 collapses, and how compression-induced train-inference mismatch shows up separately.

0.736: K=5 validation at step 50
0.444: K=20 terminal validation
0.04: Q error after first update
20/20: Broken delay and cadence cell

Topic Areas

Each topic opens to a focused index with direct links to the underlying reports.

Reports

Direct entry points for the current report set.