Research documentation

Anchor latency, drift, and train-inference mismatch.

Reports on the communication-efficient GRPO circuit: where the anchor merger works, why K=20 collapses, and how compression-induced train-inference mismatch shows up separately.

0.736: K=5 validation at step 50

0.444: K=20 terminal validation

0.04: Q error after first update

20/20: Broken delay and cadence cell

Topic Areas

Each topic opens to a focused index with direct links to the underlying reports.

Anchor Delay

Stale anchor gradients drive the K=20 collapse.

The anchor analysis separates delay, cadence, Q-basis stability, and drift evidence across the K=5 and K=20 runs.

staleness, K=5 vs K=20, Q basis

Validation chart showing K=5 stable and K=20 collapsing.

Train-Inference Mismatch

Compression changes the policy seen at rollout time.

This report treats mismatch as its own failure mode, separate from anchor staleness and cadence effects.

mismatch, compression, TIS

Train-inference mismatch report preview chart.

Reports

Direct entry points for the current report set.

K=20 length, entropy, and clip fraction chart.

Delay Failure Report

Why the EMA merger stays near dense at K=5 but becomes destructive at K=20.

anchor-delay/delay_failure_report

Q telemetry chart with clip fraction and response length.

K-Instability Q-Basis

Short one-pager showing that Q still compresses while stale gradients fail.

anchor-delay/k_instability_q_basis

Staleness error increases with anchor latency K.

Dense Drift Joint

Joint drift evidence across GSM8K and Big-Math for the stale-gradient route.

anchor-delay/dense-drift-joint

Train-Inference Mismatch

Evidence that compression-induced rollout mismatch needs its own handling.

train-inference-mis-match/train_inference_mismatch_report