Qwen2.5-1.5B-Instruct · GSM8K · GRPO (no-KL, no-entropy) · captured at training step 20, before the PPO mini-batch updates.
Although training is proceeding normally, the train-inference mismatch from the compression codec is very high: well outside the importance-ratio band that the open-source library (verl) treats as correctable (about 2x to 5x), and far above dense / literature runs (KL about 0.0004 to 0.01, versus about 4.1 here). Median per-token gap 2.83 nats, log-prob correlation only 0.13.
A normal 5x clip would drop about 61% of tokens from the loss, so plain truncated importance sampling (TIS) cannot absorb this mismatch without removing most of the trajectory. This compression-induced mismatch is its own problem, separate from anchor staleness.
Compressed run with the signed_ema merger. The numbers below compare vLLM's generation log-prob for each token against the FSDP trainer's recomputed log-prob with compression, at step 20.
Main plot. For each token, r = p_train / p_vLLM: how many times more or less likely the trainer thinks the token is, versus the engine that generated it. r = 1 (black center line) is a perfect match and is best; closer to the center is better. Mass left of 1/5x means the trainer rates generated tokens as much less likely than vLLM. Shaded bands are verl's correctable range (about 2x, up to 5x); most mass sits well outside them.

The same thing in plain probability, |p_train - p_vLLM|. 0 is best (the engines agree); further right is worse. Most tokens are far from 0.

Each dot is one token: vLLM log-prob (x) vs trainer log-prob with compression (y). On the dashed y = x line is best (agreement); higher correlation is better. Most dots sit below the line (the trainer assigns lower probability); the correlation is only 0.13.

Right-most column is r = p_train/p_vLLM (1.0 is a perfect match). Shaded cells fall outside verl's [1/5, 5] band. Rows shaded red have |diff| > 0.5 nats.
<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user Peter has a plan to take his family on a vacation to Germany. He needs $5,000 to cover all the spending for the travel, and he has $2,900 in savings right now. If he can save up $700 each month, how many months does he need to wait to reach his goal? Let's think step by step and output the final answer after "####".<|im_end|> <|im_start|>assistant
To determine how many months Peter needs to save up to reach his goal, we can follow these steps: 1. Find the additional amount Peter needs to save: $5,000 (goal) - $2,900 (savings) = $2,100 2. Determine how much Peter saves each month: $700 3. Calculate the number of months needed by dividing the additional amount by the monthly savings: $2,100 ÷ $700 = 3 #### 3 Peter needs to wait 3 months to reach his goal.
| pos | token | vLLM logp | FSDP logp | diff | p_vLLM | p_train | r = p_train/p_vLLM |
|---|---|---|---|---|---|---|---|
| 0 | 'To' | -0.5539 | -1.8649 | -1.3110 | 0.5747 | 0.1549 | 0.270 |
| 1 | ' determine' | -1.8226 | -3.5531 | -1.7305 | 0.1616 | 0.0286 | 0.177 |
| 2 | ' how' | -0.0041 | -3.0491 | -3.0449 | 0.9959 | 0.0474 | 0.048 |
| 3 | ' many' | -0.0068 | -4.2043 | -4.1975 | 0.9932 | 0.0149 | 0.015 |
| 4 | ' months' | -0.0000 | -4.7287 | -4.7286 | 1.0000 | 0.0088 | 8.8e-03 |
| 5 | ' Peter' | -0.0069 | -9.3030 | -9.2960 | 0.9931 | 0.0001 | 9.2e-05 |
| 6 | ' needs' | -0.0001 | -1.9611 | -1.9610 | 0.9999 | 0.1407 | 0.141 |
| 7 | ' to' | -0.0020 | -1.1309 | -1.1289 | 0.9980 | 0.3228 | 0.323 |
| 8 | ' save' | -0.4748 | -3.3493 | -2.8745 | 0.6220 | 0.0351 | 0.056 |
| 9 | ' up' | -1.7495 | -11.0092 | -9.2597 | 0.1739 | 0.0000 | 9.5e-05 |
| 10 | ' to' | -0.6844 | -4.7787 | -4.0944 | 0.5044 | 0.0084 | 0.017 |
| 11 | ' reach' | -0.3180 | -14.4767 | -14.1587 | 0.7276 | 0.0000 | 7.1e-07 |
| 12 | ' his' | -0.0144 | -7.8737 | -7.8593 | 0.9857 | 0.0004 | 3.9e-04 |
| 13 | ' goal' | -0.0345 | -8.3045 | -8.2700 | 0.9661 | 0.0002 | 2.6e-04 |
| 14 | ',' | -0.1299 | -1.8832 | -1.7533 | 0.8782 | 0.1521 | 0.173 |
| 15 | ' we' | -0.0603 | -2.9032 | -2.8429 | 0.9415 | 0.0548 | 0.058 |
| 16 | ' can' | -1.1635 | -9.7227 | -8.5592 | 0.3124 | 0.0001 | 1.9e-04 |
| 17 | ' follow' | -0.2063 | -9.0647 | -8.8584 | 0.8136 | 0.0001 | 1.4e-04 |
| 18 | ' these' | -0.0001 | -7.9082 | -7.9081 | 0.9999 | 0.0004 | 3.7e-04 |
| 19 | ' steps' | -0.0003 | -2.7164 | -2.7161 | 0.9997 | 0.0661 | 0.066 |
| 20 | ':\n\n' | -0.0038 | -0.6737 | -0.6699 | 0.9962 | 0.5098 | 0.512 |
| 21 | '1' | -0.0008 | -0.8880 | -0.8872 | 0.9992 | 0.4115 | 0.412 |
| 22 | '.' | -0.0000 | -0.0590 | -0.0589 | 1.0000 | 0.9427 | 0.943 |
| 23 | ' Find' | -3.1160 | -6.4585 | -3.3425 | 0.0443 | 0.0016 | 0.035 |
| 24 | ' the' | -0.7663 | -0.0138 | 0.7525 | 0.4647 | 0.9862 | 2.12 |
| 25 | ' additional' | -2.3393 | -7.5139 | -5.1746 | 0.0964 | 0.0005 | 5.7e-03 |
| 26 | ' amount' | -0.0103 | -0.6491 | -0.6388 | 0.9898 | 0.5225 | 0.528 |
| 27 | ' Peter' | -0.3436 | -10.9960 | -10.6523 | 0.7092 | 0.0000 | 2.4e-05 |
| 28 | ' needs' | -0.0067 | -2.4754 | -2.4687 | 0.9933 | 0.0841 | 0.085 |
| 29 | ' to' | -1.2565 | -2.1519 | -0.8954 | 0.2846 | 0.1163 | 0.408 |
| 30 | ' save' | -0.0110 | -6.1968 | -6.1858 | 0.9891 | 0.0020 | 2.1e-03 |
| 31 | ':' | -0.3907 | -3.1723 | -2.7816 | 0.6766 | 0.0419 | 0.062 |
| 32 | ' $' | -0.1500 | -2.5813 | -2.4313 | 0.8607 | 0.0757 | 0.088 |
| 33 | '5' | -0.0000 | -2.7727 | -2.7727 | 1.0000 | 0.0625 | 0.062 |
| 34 | ',' | -0.0086 | -10.1255 | -10.1169 | 0.9914 | 0.0000 | 4.0e-05 |
| 35 | '0' | -0.0006 | -0.0028 | -0.0023 | 0.9994 | 0.9972 | 0.998 |
| 36 | '0' | 0.0000 | -0.0009 | -0.0009 | 1.0000 | 0.9991 | 0.999 |
| 37 | '0' | -0.0000 | -4.2330 | -4.2330 | 1.0000 | 0.0145 | 0.015 |
| 38 | ' (' | -0.5897 | -2.5536 | -1.9639 | 0.5545 | 0.0778 | 0.140 |
| 39 | 'goal' | -0.5303 | -19.9256 | -19.3954 | 0.5885 | 0.0000 | 3.8e-09 |
| 40 | ')' | -0.0020 | -13.3964 | -13.3944 | 0.9980 | 0.0000 | 1.5e-06 |
| 41 | ' -' | -0.0002 | -4.0284 | -4.0282 | 0.9998 | 0.0178 | 0.018 |
| 42 | ' $' | -0.0000 | -2.3837 | -2.3837 | 1.0000 | 0.0922 | 0.092 |
| 43 | '2' | -0.0000 | -0.6027 | -0.6027 | 1.0000 | 0.5473 | 0.547 |
| 44 | ',' | -0.0000 | -10.1528 | -10.1528 | 1.0000 | 0.0000 | 3.9e-05 |
| 45 | '9' | -0.0000 | -8.8833 | -8.8833 | 1.0000 | 0.0001 | 1.4e-04 |
| 46 | '0' | 0.0000 | -0.0002 | -0.0002 | 1.0000 | 0.9998 | 1.000 |
| 47 | '0' | 0.0000 | -0.0145 | -0.0145 | 1.0000 | 0.9856 | 0.986 |
| 48 | ' (' | -0.0001 | -4.2891 | -4.2890 | 0.9999 | 0.0137 | 0.014 |
| 49 | 's' | -0.8368 | -13.5584 | -12.7217 | 0.4331 | 0.0000 | 3.0e-06 |
| ... 76 more tokens (see samples.json) | |||||||
<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user Peter has a plan to take his family on a vacation to Germany. He needs $5,000 to cover all the spending for the travel, and he has $2,900 in savings right now. If he can save up $700 each month, how many months does he need to wait to reach his goal? Let's think step by step and output the final answer after "####".<|im_end|> <|im_start|>assistant
We start by finding out how much more Peter needs to save. He has $2,900, and he needs $5,000 in total. Required saving = Total budget - Current savings Required saving = $5,000 - $2,900 Required saving = $2,100 Next, we calculate how many months it will take him to save the remaining amount. He saves $700 each month. Number of months = Required saving / Monthly savings Number of months = $2,100 / $700 Number of months = 3 [Rounding up since even a month left will result in more saving] #### 3 Peter needs to wait 3 months to reach his goal.
| pos | token | vLLM logp | FSDP logp | diff | p_vLLM | p_train | r = p_train/p_vLLM |
|---|---|---|---|---|---|---|---|
| 0 | 'We' | -1.9054 | -3.8649 | -1.9594 | 0.1488 | 0.0210 | 0.141 |
| 1 | ' start' | -0.6447 | -2.8341 | -2.1894 | 0.5248 | 0.0588 | 0.112 |
| 2 | ' by' | -1.1387 | -9.0719 | -7.9332 | 0.3202 | 0.0001 | 3.6e-04 |
| 3 | ' finding' | -1.6309 | -11.8348 | -10.2038 | 0.1957 | 0.0000 | 3.7e-05 |
| 4 | ' out' | -0.5248 | -9.9319 | -9.4071 | 0.5917 | 0.0000 | 8.2e-05 |
| 5 | ' how' | -0.1430 | -2.8131 | -2.6701 | 0.8668 | 0.0600 | 0.069 |
| 6 | ' much' | -0.0004 | -7.0900 | -7.0896 | 0.9996 | 0.0008 | 8.3e-04 |
| 7 | ' more' | -0.0158 | -7.3799 | -7.3641 | 0.9843 | 0.0006 | 6.3e-04 |
| 8 | ' Peter' | -1.8277 | -9.3996 | -7.5720 | 0.1608 | 0.0001 | 5.1e-04 |
| 9 | ' needs' | -0.0007 | -2.2624 | -2.2617 | 0.9993 | 0.1041 | 0.104 |
| 10 | ' to' | -0.2704 | -1.2278 | -0.9574 | 0.7631 | 0.2929 | 0.384 |
| 11 | ' save' | -0.0291 | -3.4061 | -3.3771 | 0.9714 | 0.0332 | 0.034 |
| 12 | '.' | -2.0507 | -1.1579 | 0.8928 | 0.1286 | 0.3141 | 2.44 |
| 13 | ' He' | -0.6726 | -4.8858 | -4.2131 | 0.5104 | 0.0076 | 0.015 |
| 14 | ' has' | -3.4860 | -1.1707 | 2.3153 | 0.0306 | 0.3101 | 10.13 |
| 15 | ' $' | -0.0253 | -2.4965 | -2.4712 | 0.9750 | 0.0824 | 0.084 |
| 16 | '2' | -0.0004 | -0.3722 | -0.3719 | 0.9996 | 0.6892 | 0.689 |
| 17 | ',' | -0.0017 | -9.2614 | -9.2597 | 0.9983 | 0.0001 | 9.5e-05 |
| 18 | '9' | -0.0009 | -9.7537 | -9.7527 | 0.9991 | 0.0001 | 5.8e-05 |
| 19 | '0' | 0.0000 | -0.0001 | -0.0001 | 1.0000 | 0.9999 | 1.000 |
| 20 | '0' | -0.0000 | -0.0161 | -0.0161 | 1.0000 | 0.9841 | 0.984 |
| 21 | ',' | -4.4389 | -2.1233 | 2.3155 | 0.0118 | 0.1196 | 10.13 |
| 22 | ' and' | -0.0819 | -2.9686 | -2.8867 | 0.9214 | 0.0514 | 0.056 |
| 23 | ' he' | -1.4652 | -2.6859 | -1.2206 | 0.2310 | 0.0682 | 0.295 |
| 24 | ' needs' | -0.0484 | -2.5647 | -2.5163 | 0.9527 | 0.0769 | 0.081 |
| 25 | ' $' | -0.0386 | -2.2262 | -2.1876 | 0.9621 | 0.1079 | 0.112 |
| 26 | '5' | -0.0000 | -2.7808 | -2.7808 | 1.0000 | 0.0620 | 0.062 |
| 27 | ',' | -0.0000 | -9.8765 | -9.8765 | 1.0000 | 0.0001 | 5.1e-05 |
| 28 | '0' | -0.0001 | -0.0138 | -0.0136 | 0.9999 | 0.9863 | 0.986 |
| 29 | '0' | 0.0000 | -0.0081 | -0.0081 | 1.0000 | 0.9920 | 0.992 |
| 30 | '0' | -0.0000 | -5.4453 | -5.4453 | 1.0000 | 0.0043 | 4.3e-03 |
| 31 | ' in' | -1.7843 | -3.6392 | -1.8549 | 0.1679 | 0.0263 | 0.156 |
| 32 | ' total' | -0.0033 | -5.2710 | -5.2677 | 0.9967 | 0.0051 | 5.2e-03 |
| 33 | '.' | -0.5031 | -2.0845 | -1.5814 | 0.6047 | 0.1244 | 0.206 |
| 34 | ' \n\n' | -1.4786 | -3.3345 | -1.8559 | 0.2280 | 0.0356 | 0.156 |
| 35 | 'Required' | -7.4072 | -10.3198 | -2.9127 | 0.0006 | 0.0000 | 0.054 |
| 36 | ' saving' | -3.6222 | -9.1622 | -5.5400 | 0.0267 | 0.0001 | 3.9e-03 |
| 37 | ' =' | -0.2930 | -1.6706 | -1.3776 | 0.7460 | 0.1881 | 0.252 |
| 38 | ' Total' | -0.2876 | -5.7645 | -5.4769 | 0.7501 | 0.0031 | 4.2e-03 |
| 39 | ' budget' | -4.8631 | -5.5482 | -0.6851 | 0.0077 | 0.0039 | 0.504 |
| 40 | ' -' | -0.0245 | -5.3484 | -5.3239 | 0.9758 | 0.0048 | 4.9e-03 |
| 41 | ' Current' | -0.3755 | -9.2463 | -8.8708 | 0.6869 | 0.0001 | 1.4e-04 |
| 42 | ' savings' | -0.0006 | -3.3497 | -3.3491 | 0.9994 | 0.0351 | 0.035 |
| 43 | '\n' | -0.1546 | -3.1557 | -3.0012 | 0.8568 | 0.0426 | 0.050 |
| 44 | 'Required' | -0.0267 | -8.6083 | -8.5816 | 0.9737 | 0.0002 | 1.9e-04 |
| 45 | ' saving' | -0.0001 | -9.4081 | -9.4080 | 0.9999 | 0.0001 | 8.2e-05 |
| 46 | ' =' | -0.0000 | -0.2619 | -0.2619 | 1.0000 | 0.7696 | 0.770 |
| 47 | ' $' | -0.0142 | -2.2616 | -2.2474 | 0.9859 | 0.1042 | 0.106 |
| 48 | '5' | -0.0000 | -5.6656 | -5.6656 | 1.0000 | 0.0035 | 3.5e-03 |
| 49 | ',' | -0.0052 | -11.0013 | -10.9961 | 0.9948 | 0.0000 | 1.7e-05 |
| ... 111 more tokens (see samples.json) | |||||||
<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user Sean and Sierra invited 200 guests to their wedding. If 83% of the guests RSVP with a Yes response and 9% of the guests RSVP with a No response, how many guests did not respond at all? Let's think step by step and output the final answer after "####".<|im_end|> <|im_start|>assistant
Step 1: Calculate the number of guests who RSVP'd with a Yes response. 83% of 200 guests = 0.83 * 200 = 166 guests Step 2: Calculate the number of guests who RSVP'd with a No response. 9% of 200 guests = 0.09 * 200 = 18 guests Step 3: Calculate the total number of guests who responded. 166 guests (yes) + 18 guests (no) = 184 guests responded Step 4: Calculate the number of guests who did not respond at all. 200 guests invited - 184 guests responded = 16 guests did not respond #### 16 The final answer is: #### 16
| pos | token | vLLM logp | FSDP logp | diff | p_vLLM | p_train | r = p_train/p_vLLM |
|---|---|---|---|---|---|---|---|
| 0 | 'Step' | -2.3255 | -1.3812 | 0.9443 | 0.0977 | 0.2513 | 2.57 |
| 1 | ' ' | -0.0000 | -0.0189 | -0.0189 | 1.0000 | 0.9812 | 0.981 |
| 2 | '1' | 0.0000 | -0.0549 | -0.0549 | 1.0000 | 0.9466 | 0.947 |
| 3 | ':' | -0.0000 | -3.7765 | -3.7765 | 1.0000 | 0.0229 | 0.023 |
| 4 | ' Calculate' | -0.0152 | -2.2131 | -2.1979 | 0.9849 | 0.1094 | 0.111 |
| 5 | ' the' | -0.0005 | -0.0116 | -0.0111 | 0.9995 | 0.9884 | 0.989 |
| 6 | ' number' | -0.0054 | -1.2475 | -1.2421 | 0.9946 | 0.2872 | 0.289 |
| 7 | ' of' | -0.0000 | -0.0001 | -0.0001 | 1.0000 | 0.9999 | 1.000 |
| 8 | ' guests' | -0.0116 | -7.7084 | -7.6968 | 0.9884 | 0.0004 | 4.5e-04 |
| 9 | ' who' | -0.0057 | -8.7395 | -8.7339 | 0.9943 | 0.0002 | 1.6e-04 |
| 10 | ' RSVP' | -0.1947 | -13.0234 | -12.8287 | 0.8231 | 0.0000 | 2.7e-06 |
| 11 | "'d" | -2.1733 | -15.6278 | -13.4545 | 0.1138 | 0.0000 | 1.4e-06 |
| 12 | ' with' | -0.6462 | -7.1503 | -6.5041 | 0.5241 | 0.0008 | 1.5e-03 |
| 13 | ' a' | -0.1210 | -4.4255 | -4.3045 | 0.8860 | 0.0120 | 0.014 |
| 14 | ' Yes' | -0.0214 | -13.5368 | -13.5155 | 0.9789 | 0.0000 | 1.3e-06 |
| 15 | ' response' | -0.0222 | -14.2875 | -14.2654 | 0.9781 | 0.0000 | 6.4e-07 |
| 16 | '.\n' | -0.2189 | -1.8572 | -1.6383 | 0.8034 | 0.1561 | 0.194 |
| 17 | '8' | -0.4112 | -5.6874 | -5.2762 | 0.6629 | 0.0034 | 5.1e-03 |
| 18 | '3' | -0.0001 | -8.8874 | -8.8873 | 0.9999 | 0.0001 | 1.4e-04 |
| 19 | '%' | -0.0001 | -4.8037 | -4.8036 | 0.9999 | 0.0082 | 8.2e-03 |
| 20 | ' of' | -0.0002 | -0.3171 | -0.3170 | 0.9998 | 0.7282 | 0.728 |
| 21 | ' ' | -0.0005 | -4.1491 | -4.1486 | 0.9995 | 0.0158 | 0.016 |
| 22 | '2' | -0.0000 | -0.3502 | -0.3502 | 1.0000 | 0.7045 | 0.705 |
| 23 | '0' | -0.0000 | -0.0011 | -0.0011 | 1.0000 | 0.9989 | 0.999 |
| 24 | '0' | 0.0000 | -0.0007 | -0.0007 | 1.0000 | 0.9993 | 0.999 |
| 25 | ' guests' | -0.1067 | -9.3878 | -9.2811 | 0.8988 | 0.0001 | 9.3e-05 |
| 26 | ' =' | -0.1923 | -1.5660 | -1.3738 | 0.8251 | 0.2089 | 0.253 |
| 27 | ' ' | -0.0017 | -0.0022 | -0.0005 | 0.9983 | 0.9978 | 1.000 |
| 28 | '0' | -0.0266 | -7.6108 | -7.5842 | 0.9737 | 0.0005 | 5.1e-04 |
| 29 | '.' | 0.0000 | -8.4065 | -8.4065 | 1.0000 | 0.0002 | 2.2e-04 |
| 30 | '8' | -0.0000 | -6.4076 | -6.4075 | 1.0000 | 0.0016 | 1.6e-03 |
| 31 | '3' | -0.0000 | -9.1467 | -9.1467 | 1.0000 | 0.0001 | 1.1e-04 |
| 32 | ' *' | -0.0021 | -7.0452 | -7.0431 | 0.9979 | 0.0009 | 8.7e-04 |
| 33 | ' ' | 0.0000 | -0.0024 | -0.0024 | 1.0000 | 0.9976 | 0.998 |
| 34 | '2' | -0.0000 | -0.3017 | -0.3017 | 1.0000 | 0.7396 | 0.740 |
| 35 | '0' | 0.0000 | -0.0371 | -0.0371 | 1.0000 | 0.9636 | 0.964 |
| 36 | '0' | -0.0000 | -2.0405 | -2.0405 | 1.0000 | 0.1300 | 0.130 |
| 37 | ' =' | -0.0706 | -0.3756 | -0.3050 | 0.9318 | 0.6869 | 0.737 |
| 38 | ' ' | -0.0000 | -0.0004 | -0.0004 | 1.0000 | 0.9996 | 1.000 |
| 39 | '1' | -0.0000 | -1.2194 | -1.2194 | 1.0000 | 0.2954 | 0.295 |
| 40 | '6' | -0.0000 | -2.7427 | -2.7427 | 1.0000 | 0.0644 | 0.064 |
| 41 | '6' | -0.0000 | -6.3902 | -6.3902 | 1.0000 | 0.0017 | 1.7e-03 |
| 42 | ' guests' | -0.0189 | -11.9208 | -11.9019 | 0.9813 | 0.0000 | 6.8e-06 |
| 43 | '\n\n' | -0.0053 | -0.4357 | -0.4305 | 0.9948 | 0.6468 | 0.650 |
| 44 | 'Step' | -0.0000 | -4.2210 | -4.2210 | 1.0000 | 0.0147 | 0.015 |
| 45 | ' ' | -0.0000 | -0.0201 | -0.0201 | 1.0000 | 0.9801 | 0.980 |
| 46 | '2' | -0.0000 | -0.1663 | -0.1663 | 1.0000 | 0.8468 | 0.847 |
| 47 | ':' | 0.0000 | -3.0060 | -3.0060 | 1.0000 | 0.0495 | 0.049 |
| 48 | ' Calculate' | -0.0000 | -1.2081 | -1.2081 | 1.0000 | 0.2988 | 0.299 |
| 49 | ' the' | -0.0000 | -0.0210 | -0.0210 | 1.0000 | 0.9792 | 0.979 |
| ... 124 more tokens (see samples.json) | |||||||
<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user Sean and Sierra invited 200 guests to their wedding. If 83% of the guests RSVP with a Yes response and 9% of the guests RSVP with a No response, how many guests did not respond at all? Let's think step by step and output the final answer after "####".<|im_end|> <|im_start|>assistant
First, calculate the number of guests who responded yes: 83% of 200 guests = 0.83 * 200 = 166 guests Next, calculate the number of guests who responded no: 9% of 200 guests = 0.09 * 200 = 18 guests Add the number of guests who responded yes and no to find the total who responded: 166 guests + 18 guests = 184 guests Subtract the total number of initial guests from the number who responded to find those who did not respond: 200 guests - 184 guests = 16 guests #### 16
| pos | token | vLLM logp | FSDP logp | diff | p_vLLM | p_train | r = p_train/p_vLLM |
|---|---|---|---|---|---|---|---|
| 0 | 'First' | -0.4505 | -1.7562 | -1.3057 | 0.6373 | 0.1727 | 0.271 |
| 1 | ',' | -0.0004 | -0.0225 | -0.0221 | 0.9996 | 0.9777 | 0.978 |
| 2 | ' calculate' | -0.5992 | -2.1727 | -1.5736 | 0.5493 | 0.1139 | 0.207 |
| 3 | ' the' | -0.0006 | -0.0428 | -0.0423 | 0.9994 | 0.9581 | 0.959 |
| 4 | ' number' | -0.0356 | -1.6272 | -1.5916 | 0.9650 | 0.1965 | 0.204 |
| 5 | ' of' | -0.0000 | -0.0004 | -0.0004 | 1.0000 | 0.9996 | 1.000 |
| 6 | ' guests' | -0.0102 | -7.6619 | -7.6517 | 0.9898 | 0.0005 | 4.8e-04 |
| 7 | ' who' | -0.0047 | -8.6410 | -8.6363 | 0.9953 | 0.0002 | 1.8e-04 |
| 8 | ' responded' | -1.6168 | -13.7888 | -12.1720 | 0.1985 | 0.0000 | 5.2e-06 |
| 9 | ' yes' | -3.4887 | -17.2288 | -13.7401 | 0.0305 | 0.0000 | 1.1e-06 |
| 10 | ':\n' | -0.7896 | -3.4632 | -2.6736 | 0.4540 | 0.0313 | 0.069 |
| 11 | '8' | -0.6528 | -5.3122 | -4.6594 | 0.5206 | 0.0049 | 9.5e-03 |
| 12 | '3' | -0.0000 | -9.1355 | -9.1355 | 1.0000 | 0.0001 | 1.1e-04 |
| 13 | '%' | -0.0006 | -5.4343 | -5.4337 | 0.9994 | 0.0044 | 4.4e-03 |
| 14 | ' of' | -0.0005 | -0.5986 | -0.5981 | 0.9995 | 0.5496 | 0.550 |
| 15 | ' ' | -0.0000 | -3.4163 | -3.4163 | 1.0000 | 0.0328 | 0.033 |
| 16 | '2' | -0.0000 | -0.2405 | -0.2405 | 1.0000 | 0.7863 | 0.786 |
| 17 | '0' | 0.0000 | -0.0010 | -0.0010 | 1.0000 | 0.9990 | 0.999 |
| 18 | '0' | 0.0000 | -0.0007 | -0.0007 | 1.0000 | 0.9993 | 0.999 |
| 19 | ' guests' | -0.7046 | -10.1833 | -9.4788 | 0.4943 | 0.0000 | 7.6e-05 |
| 20 | ' =' | -0.0307 | -1.3147 | -1.2840 | 0.9698 | 0.2686 | 0.277 |
| 21 | ' ' | -0.0009 | -0.0024 | -0.0015 | 0.9991 | 0.9976 | 0.999 |
| 22 | '0' | -0.0077 | -7.6996 | -7.6919 | 0.9923 | 0.0005 | 4.6e-04 |
| 23 | '.' | 0.0000 | -8.4072 | -8.4072 | 1.0000 | 0.0002 | 2.2e-04 |
| 24 | '8' | -0.0000 | -6.5397 | -6.5397 | 1.0000 | 0.0014 | 1.4e-03 |
| 25 | '3' | -0.0000 | -9.1643 | -9.1643 | 1.0000 | 0.0001 | 1.0e-04 |
| 26 | ' *' | -0.0010 | -6.8446 | -6.8436 | 0.9990 | 0.0011 | 1.1e-03 |
| 27 | ' ' | 0.0000 | -0.0032 | -0.0032 | 1.0000 | 0.9968 | 0.997 |
| 28 | '2' | -0.0000 | -0.2983 | -0.2983 | 1.0000 | 0.7421 | 0.742 |
| 29 | '0' | 0.0000 | -0.0316 | -0.0316 | 1.0000 | 0.9689 | 0.969 |
| 30 | '0' | -0.0000 | -1.9899 | -1.9899 | 1.0000 | 0.1367 | 0.137 |
| 31 | ' =' | -0.0116 | -0.4193 | -0.4077 | 0.9884 | 0.6575 | 0.665 |
| 32 | ' ' | -0.0000 | -0.0006 | -0.0006 | 1.0000 | 0.9994 | 0.999 |
| 33 | '1' | -0.0000 | -1.2390 | -1.2390 | 1.0000 | 0.2897 | 0.290 |
| 34 | '6' | -0.0000 | -2.7359 | -2.7359 | 1.0000 | 0.0648 | 0.065 |
| 35 | '6' | -0.0000 | -6.5119 | -6.5119 | 1.0000 | 0.0015 | 1.5e-03 |
| 36 | ' guests' | -0.0247 | -12.0058 | -11.9811 | 0.9756 | 0.0000 | 6.3e-06 |
| 37 | '\n\n' | -0.0230 | -0.3714 | -0.3484 | 0.9772 | 0.6898 | 0.706 |
| 38 | 'Next' | -0.1463 | -6.4932 | -6.3469 | 0.8639 | 0.0015 | 1.8e-03 |
| 39 | ',' | -0.0000 | -0.0748 | -0.0747 | 1.0000 | 0.9279 | 0.928 |
| 40 | ' calculate' | -0.0012 | -1.8713 | -1.8702 | 0.9988 | 0.1539 | 0.154 |
| 41 | ' the' | -0.0001 | -0.0341 | -0.0341 | 0.9999 | 0.9664 | 0.967 |
| 42 | ' number' | -0.0002 | -1.5524 | -1.5522 | 0.9998 | 0.2117 | 0.212 |
| 43 | ' of' | -0.0001 | -0.0011 | -0.0010 | 0.9999 | 0.9989 | 0.999 |
| 44 | ' guests' | -0.0000 | -7.8068 | -7.8068 | 1.0000 | 0.0004 | 4.1e-04 |
| 45 | ' who' | -0.0001 | -8.4982 | -8.4981 | 0.9999 | 0.0002 | 2.0e-04 |
| 46 | ' responded' | -0.0028 | -11.9997 | -11.9969 | 0.9972 | 0.0000 | 6.2e-06 |
| 47 | ' no' | -0.0001 | -11.7483 | -11.7482 | 0.9999 | 0.0000 | 7.9e-06 |
| 48 | ':\n' | -0.0010 | -1.7106 | -1.7096 | 0.9990 | 0.1808 | 0.181 |
| 49 | '9' | -0.0000 | -6.2490 | -6.2490 | 1.0000 | 0.0019 | 1.9e-03 |
| ... 99 more tokens (see samples.json) | |||||||
<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user The largest frog can grow to weigh 10 times as much as the smallest frog. The largest frog weighs 120 pounds. How much more does the largest frog weigh than the smallest frog? Let's think step by step and output the final answer after "####".<|im_end|> <|im_start|>assistant
To find out how much more the largest frog weighs than the smallest frog, we need to divide the weight of the largest frog by its weight being 10 times the smallest frog. Largest frog weight = 120 pounds Smallest frog weight = 120 pounds ÷ 10 Let's calculate the smallest frog's weight: Smallest frog weight = 120 ÷ 10 = 12 pounds Now, we subtract the smallest frog's weight from the largest frog's weight to find the difference: Difference in weight = Largest frog weight - Smallest frog weight Difference in weight = 120 - 12 = 108 pounds #### 108
| pos | token | vLLM logp | FSDP logp | diff | p_vLLM | p_train | r = p_train/p_vLLM |
|---|---|---|---|---|---|---|---|
| 0 | 'To' | -1.0691 | -1.9301 | -0.8611 | 0.3433 | 0.1451 | 0.423 |
| 1 | ' find' | -0.1006 | -2.3824 | -2.2818 | 0.9043 | 0.0923 | 0.102 |
| 2 | ' out' | -0.0467 | -10.7741 | -10.7274 | 0.9544 | 0.0000 | 2.2e-05 |
| 3 | ' how' | -0.0013 | -3.2887 | -3.2874 | 0.9987 | 0.0373 | 0.037 |
| 4 | ' much' | -0.0004 | -6.3073 | -6.3069 | 0.9996 | 0.0018 | 1.8e-03 |
| 5 | ' more' | -0.0095 | -6.4581 | -6.4487 | 0.9906 | 0.0016 | 1.6e-03 |
| 6 | ' the' | -0.0002 | -0.8568 | -0.8566 | 0.9998 | 0.4245 | 0.425 |
| 7 | ' largest' | -0.0009 | -10.0600 | -10.0592 | 0.9991 | 0.0000 | 4.3e-05 |
| 8 | ' frog' | -0.0003 | -10.1837 | -10.1834 | 0.9997 | 0.0000 | 3.8e-05 |
| 9 | ' weighs' | -0.0006 | -13.7774 | -13.7768 | 0.9994 | 0.0000 | 1.0e-06 |
| 10 | ' than' | -0.0143 | -13.3049 | -13.2906 | 0.9858 | 0.0000 | 1.7e-06 |
| 11 | ' the' | -0.0000 | -0.0365 | -0.0365 | 1.0000 | 0.9641 | 0.964 |
| 12 | ' smallest' | -0.0000 | -9.5825 | -9.5825 | 1.0000 | 0.0001 | 6.9e-05 |
| 13 | ' frog' | -0.0688 | -12.0658 | -11.9971 | 0.9335 | 0.0000 | 6.2e-06 |
| 14 | ',' | -0.0272 | -1.9186 | -1.8913 | 0.9731 | 0.1468 | 0.151 |
| 15 | ' we' | -0.0498 | -1.9336 | -1.8838 | 0.9515 | 0.1446 | 0.152 |
| 16 | ' need' | -1.4004 | -6.2406 | -4.8402 | 0.2465 | 0.0019 | 7.9e-03 |
| 17 | ' to' | -0.0007 | -8.5152 | -8.5144 | 0.9993 | 0.0002 | 2.0e-04 |
| 18 | ' divide' | -1.9924 | -5.4737 | -3.4813 | 0.1364 | 0.0042 | 0.031 |
| 19 | ' the' | -0.0044 | -0.0738 | -0.0694 | 0.9956 | 0.9289 | 0.933 |
| 20 | ' weight' | -0.3914 | -9.3660 | -8.9746 | 0.6761 | 0.0001 | 1.3e-04 |
| 21 | ' of' | -0.0030 | -0.2263 | -0.2233 | 0.9970 | 0.7975 | 0.800 |
| 22 | ' the' | -0.0001 | -0.0367 | -0.0367 | 0.9999 | 0.9639 | 0.964 |
| 23 | ' largest' | -0.0065 | -9.0084 | -9.0019 | 0.9935 | 0.0001 | 1.2e-04 |
| 24 | ' frog' | -0.0006 | -11.1142 | -11.1136 | 0.9994 | 0.0000 | 1.5e-05 |
| 25 | ' by' | -0.0187 | -7.7720 | -7.7533 | 0.9815 | 0.0004 | 4.3e-04 |
| 26 | ' its' | -1.5882 | -9.5723 | -7.9841 | 0.2043 | 0.0001 | 3.4e-04 |
| 27 | ' weight' | -0.7640 | -8.9020 | -8.1380 | 0.4658 | 0.0001 | 2.9e-04 |
| 28 | ' being' | -8.2587 | -7.9956 | 0.2632 | 0.0003 | 0.0003 | 1.30 |
| 29 | ' ' | -0.0488 | -0.9831 | -0.9343 | 0.9523 | 0.3742 | 0.393 |
| 30 | '1' | -0.0001 | -0.2543 | -0.2542 | 0.9999 | 0.7755 | 0.776 |
| 31 | '0' | -0.0028 | -0.8400 | -0.8372 | 0.9972 | 0.4317 | 0.433 |
| 32 | ' times' | -0.0013 | -4.7114 | -4.7101 | 0.9987 | 0.0090 | 9.0e-03 |
| 33 | ' the' | -0.7999 | -0.3637 | 0.4363 | 0.4494 | 0.6951 | 1.55 |
| 34 | ' smallest' | -0.2211 | -9.2701 | -9.0490 | 0.8016 | 0.0001 | 1.2e-04 |
| 35 | ' frog' | -0.1564 | -11.9589 | -11.8025 | 0.8552 | 0.0000 | 7.5e-06 |
| 36 | '.\n\n' | -1.2960 | -1.4481 | -0.1521 | 0.2736 | 0.2350 | 0.859 |
| 37 | 'L' | -2.5922 | -10.1524 | -7.5602 | 0.0749 | 0.0000 | 5.2e-04 |
| 38 | 'argest' | -0.0037 | -15.9276 | -15.9239 | 0.9963 | 0.0000 | 1.2e-07 |
| 39 | ' frog' | -0.0093 | -11.4455 | -11.4362 | 0.9907 | 0.0000 | 1.1e-05 |
| 40 | ' weight' | -0.7740 | -13.0171 | -12.2432 | 0.4612 | 0.0000 | 4.8e-06 |
| 41 | ' =' | -0.3599 | -0.3691 | -0.0092 | 0.6977 | 0.6914 | 0.991 |
| 42 | ' ' | -0.0338 | -0.0179 | 0.0159 | 0.9667 | 0.9822 | 1.02 |
| 43 | '1' | -0.0001 | -0.0477 | -0.0476 | 0.9999 | 0.9534 | 0.954 |
| 44 | '2' | -0.0054 | -2.0230 | -2.0177 | 0.9946 | 0.1323 | 0.133 |
| 45 | '0' | -0.0000 | -0.0010 | -0.0010 | 1.0000 | 0.9990 | 0.999 |
| 46 | ' pounds' | -0.0008 | -7.6527 | -7.6519 | 0.9992 | 0.0005 | 4.8e-04 |
| 47 | '\n' | -0.0529 | -0.9060 | -0.8530 | 0.9484 | 0.4041 | 0.426 |
| 48 | 'Sm' | -0.5606 | -11.5577 | -10.9971 | 0.5709 | 0.0000 | 1.7e-05 |
| 49 | 'allest' | -0.0007 | -15.0451 | -15.0444 | 0.9993 | 0.0000 | 2.9e-07 |
| ... 101 more tokens (see samples.json) | |||||||
<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user The largest frog can grow to weigh 10 times as much as the smallest frog. The largest frog weighs 120 pounds. How much more does the largest frog weigh than the smallest frog? Let's think step by step and output the final answer after "####".<|im_end|> <|im_start|>assistant
To find out how much more the largest frog weighs than the smallest frog, we need to first determine the weight of the smallest frog. 1. The largest frog weighs 120 pounds. 2. The largest frog can grow to weigh 10 times as much as the smallest frog. To find the weight of the smallest frog, we divide the weight of the largest frog by 10: Smallest frog weight = 120 pounds / 10 Now let's calculate it: Smallest frog weight = 12 pounds / 10 = 12 pounds Finally, to find out how much more the largest frog weighs than the smallest frog, we subtract the weight of the smallest frog from the weight of the largest frog: Difference = Largest frog's weight - Smallest frog's weight Difference = 120 pounds - 12 pounds #### 108 However, I made a mistake in my initial reasoning. The weight of the smallest frog isn't just 12 pounds, but it's the weight of the smallest frog, which is much less than this value based on the 10 times relationship. To correctly calculate the weight of the smallest frog, we should multiply the weight of the largest frog by 1/10: Smallest frog weight = 120 pounds / 10 = 12 pounds Now, let's correctly calculate the difference using 12 pounds as the weight of the smallest frog: Difference = Largest frog's weight - Smallest frog's weight Difference = 120 pounds - 12 pounds #### 108
| pos | token | vLLM logp | FSDP logp | diff | p_vLLM | p_train | r = p_train/p_vLLM |
|---|---|---|---|---|---|---|---|
| 0 | 'To' | -1.0891 | -1.9301 | -0.8410 | 0.3365 | 0.1451 | 0.431 |
| 1 | ' find' | -0.0893 | -2.3824 | -2.2930 | 0.9145 | 0.0923 | 0.101 |
| 2 | ' out' | -0.0561 | -10.7741 | -10.7180 | 0.9455 | 0.0000 | 2.2e-05 |
| 3 | ' how' | -0.0010 | -3.2887 | -3.2877 | 0.9990 | 0.0373 | 0.037 |
| 4 | ' much' | -0.0004 | -6.3073 | -6.3069 | 0.9996 | 0.0018 | 1.8e-03 |
| 5 | ' more' | -0.0097 | -6.4581 | -6.4485 | 0.9904 | 0.0016 | 1.6e-03 |
| 6 | ' the' | -0.0002 | -0.8568 | -0.8566 | 0.9998 | 0.4245 | 0.425 |
| 7 | ' largest' | -0.0009 | -10.0600 | -10.0592 | 0.9991 | 0.0000 | 4.3e-05 |
| 8 | ' frog' | -0.0003 | -10.1837 | -10.1834 | 0.9997 | 0.0000 | 3.8e-05 |
| 9 | ' weighs' | -0.0006 | -13.7774 | -13.7768 | 0.9994 | 0.0000 | 1.0e-06 |
| 10 | ' than' | -0.0143 | -13.3049 | -13.2906 | 0.9858 | 0.0000 | 1.7e-06 |
| 11 | ' the' | -0.0000 | -0.0365 | -0.0365 | 1.0000 | 0.9641 | 0.964 |
| 12 | ' smallest' | -0.0000 | -9.5825 | -9.5825 | 1.0000 | 0.0001 | 6.9e-05 |
| 13 | ' frog' | -0.0687 | -12.0658 | -11.9971 | 0.9336 | 0.0000 | 6.2e-06 |
| 14 | ',' | -0.0241 | -1.9186 | -1.8945 | 0.9762 | 0.1468 | 0.150 |
| 15 | ' we' | -0.0536 | -1.9336 | -1.8800 | 0.9478 | 0.1446 | 0.153 |
| 16 | ' need' | -1.4407 | -6.2406 | -4.7999 | 0.2368 | 0.0019 | 8.2e-03 |
| 17 | ' to' | -0.0006 | -8.5152 | -8.5146 | 0.9994 | 0.0002 | 2.0e-04 |
| 18 | ' first' | -1.5526 | -4.7237 | -3.1712 | 0.2117 | 0.0089 | 0.042 |
| 19 | ' determine' | -0.5707 | -3.6648 | -3.0941 | 0.5651 | 0.0256 | 0.045 |
| 20 | ' the' | -0.0348 | -0.0416 | -0.0068 | 0.9658 | 0.9592 | 0.993 |
| 21 | ' weight' | -0.0159 | -10.5655 | -10.5496 | 0.9842 | 0.0000 | 2.6e-05 |
| 22 | ' of' | -0.0013 | -0.2204 | -0.2192 | 0.9987 | 0.8022 | 0.803 |
| 23 | ' the' | -0.0001 | -0.0223 | -0.0222 | 0.9999 | 0.9779 | 0.978 |
| 24 | ' smallest' | -0.0005 | -9.4306 | -9.4301 | 0.9995 | 0.0001 | 8.0e-05 |
| 25 | ' frog' | -0.0001 | -10.8950 | -10.8949 | 0.9999 | 0.0000 | 1.9e-05 |
| 26 | '.\n\n' | -1.3582 | -1.9524 | -0.5942 | 0.2571 | 0.1419 | 0.552 |
| 27 | '1' | -1.6017 | -0.9273 | 0.6744 | 0.2016 | 0.3956 | 1.96 |
| 28 | '.' | -0.0037 | -0.1456 | -0.1418 | 0.9963 | 0.8645 | 0.868 |
| 29 | ' The' | -0.3951 | -0.7518 | -0.3566 | 0.6736 | 0.4715 | 0.700 |
| 30 | ' largest' | -0.1238 | -10.5936 | -10.4697 | 0.8835 | 0.0000 | 2.8e-05 |
| 31 | ' frog' | -0.0001 | -11.5054 | -11.5053 | 0.9999 | 0.0000 | 1.0e-05 |
| 32 | ' weighs' | -0.0447 | -10.5102 | -10.4655 | 0.9563 | 0.0000 | 2.9e-05 |
| 33 | ' ' | -0.0016 | -0.0263 | -0.0248 | 0.9984 | 0.9740 | 0.976 |
| 34 | '1' | -0.0001 | -0.1702 | -0.1701 | 0.9999 | 0.8435 | 0.844 |
| 35 | '2' | -0.1131 | -0.9061 | -0.7930 | 0.8931 | 0.4041 | 0.452 |
| 36 | '0' | -0.0000 | -0.0002 | -0.0002 | 1.0000 | 0.9998 | 1.000 |
| 37 | ' pounds' | -0.0000 | -4.5273 | -4.5273 | 1.0000 | 0.0108 | 0.011 |
| 38 | '.\n' | -0.0478 | -1.3257 | -1.2778 | 0.9533 | 0.2656 | 0.279 |
| 39 | '2' | -0.0001 | -1.9235 | -1.9234 | 0.9999 | 0.1461 | 0.146 |
| 40 | '.' | -0.0000 | -0.5815 | -0.5815 | 1.0000 | 0.5591 | 0.559 |
| 41 | ' The' | -0.1835 | -0.3392 | -0.1557 | 0.8324 | 0.7123 | 0.856 |
| 42 | ' largest' | -0.0104 | -10.2946 | -10.2842 | 0.9897 | 0.0000 | 3.4e-05 |
| 43 | ' frog' | -0.0003 | -11.3759 | -11.3756 | 0.9997 | 0.0000 | 1.1e-05 |
| 44 | ' can' | -0.9238 | -6.5918 | -5.6680 | 0.3970 | 0.0014 | 3.5e-03 |
| 45 | ' grow' | -0.0662 | -17.5500 | -17.4838 | 0.9360 | 0.0000 | 2.6e-08 |
| 46 | ' to' | -0.0091 | -7.6793 | -7.6703 | 0.9910 | 0.0005 | 4.7e-04 |
| 47 | ' weigh' | -0.0395 | -18.6062 | -18.5667 | 0.9613 | 0.0000 | 8.6e-09 |
| 48 | ' ' | -0.0005 | -0.0835 | -0.0830 | 0.9995 | 0.9199 | 0.920 |
| 49 | '1' | -0.0000 | -0.3328 | -0.3328 | 1.0000 | 0.7169 | 0.717 |
| ... 287 more tokens (see samples.json) | |||||||
<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user Janice gave all three dozens of pebbles from her trip to her friends. Each of her friends got 4 pebbles. How many friends received pebbles? Let's think step by step and output the final answer after "####".<|im_end|> <|im_start|>assistant
First, let's calculate the total number of pebbles Janice had: 3 dozens x 12 pebbles/dozen = 36 pebbles Now, we divide the total number of pebbles by the number of pebbles each friend got to find out how many friends received pebbles: 36 pebbles ÷ 4 pebbles/friend = 9 friends #### 9
| pos | token | vLLM logp | FSDP logp | diff | p_vLLM | p_train | r = p_train/p_vLLM |
|---|---|---|---|---|---|---|---|
| 0 | 'First' | -1.0209 | -1.9090 | -0.8881 | 0.3603 | 0.1482 | 0.411 |
| 1 | ',' | -0.0008 | -0.0241 | -0.0233 | 0.9992 | 0.9762 | 0.977 |
| 2 | ' let' | -2.1349 | -2.5461 | -0.4112 | 0.1183 | 0.0784 | 0.663 |
| 3 | "'s" | -0.0001 | -0.0027 | -0.0027 | 0.9999 | 0.9973 | 0.997 |
| 4 | ' calculate' | -0.7362 | -0.3854 | 0.3508 | 0.4789 | 0.6802 | 1.42 |
| 5 | ' the' | -0.0133 | -0.0415 | -0.0281 | 0.9868 | 0.9594 | 0.972 |
| 6 | ' total' | -0.0001 | -0.1198 | -0.1198 | 0.9999 | 0.8871 | 0.887 |
| 7 | ' number' | -0.0003 | -0.0572 | -0.0569 | 0.9997 | 0.9444 | 0.945 |
| 8 | ' of' | -0.0000 | -0.0001 | -0.0001 | 1.0000 | 0.9999 | 1.000 |
| 9 | ' pe' | -0.0008 | -6.9657 | -6.9649 | 0.9992 | 0.0009 | 9.4e-04 |
| 10 | 'bb' | -0.0000 | -16.4632 | -16.4631 | 1.0000 | 0.0000 | 7.1e-08 |
| 11 | 'les' | 0.0000 | -16.9375 | -16.9375 | 1.0000 | 0.0000 | 4.4e-08 |
| 12 | ' Jan' | -0.0692 | -13.3776 | -13.3085 | 0.9332 | 0.0000 | 1.7e-06 |
| 13 | 'ice' | -0.0000 | -16.5210 | -16.5210 | 1.0000 | 0.0000 | 6.7e-08 |
| 14 | ' had' | -0.5468 | -3.1910 | -2.6441 | 0.5788 | 0.0411 | 0.071 |
| 15 | ':\n' | -1.0805 | -2.1311 | -1.0506 | 0.3394 | 0.1187 | 0.350 |
| 16 | '3' | -0.1611 | -4.6353 | -4.4742 | 0.8512 | 0.0097 | 0.011 |
| 17 | ' dozens' | -0.0242 | -7.7812 | -7.7570 | 0.9761 | 0.0004 | 4.3e-04 |
| 18 | ' x' | -5.0102 | -5.5118 | -0.5016 | 0.0067 | 0.0040 | 0.606 |
| 19 | ' ' | -0.0001 | -0.0018 | -0.0018 | 0.9999 | 0.9982 | 0.998 |
| 20 | '1' | -0.0001 | -3.2402 | -3.2401 | 0.9999 | 0.0392 | 0.039 |
| 21 | '2' | -0.0000 | -0.2780 | -0.2780 | 1.0000 | 0.7573 | 0.757 |
| 22 | ' pe' | -0.0009 | -9.0770 | -9.0760 | 0.9991 | 0.0001 | 1.1e-04 |
| 23 | 'bb' | -0.0000 | -19.9604 | -19.9604 | 1.0000 | 0.0000 | 2.1e-09 |
| 24 | 'les' | 0.0000 | -18.9864 | -18.9864 | 1.0000 | 0.0000 | 5.7e-09 |
| 25 | '/do' | -0.1042 | -15.4583 | -15.3541 | 0.9010 | 0.0000 | 2.1e-07 |
| 26 | 'zen' | -0.0001 | -19.3883 | -19.3882 | 0.9999 | 0.0000 | 3.8e-09 |
| 27 | ' =' | -0.0001 | -0.6627 | -0.6626 | 0.9999 | 0.5155 | 0.516 |
| 28 | ' ' | -0.0000 | -0.0014 | -0.0014 | 1.0000 | 0.9986 | 0.999 |
| 29 | '3' | -0.0000 | -1.0521 | -1.0521 | 1.0000 | 0.3492 | 0.349 |
| 30 | '6' | -0.0001 | -2.8649 | -2.8649 | 0.9999 | 0.0570 | 0.057 |
| 31 | ' pe' | -0.0000 | -9.0531 | -9.0531 | 1.0000 | 0.0001 | 1.2e-04 |
| 32 | 'bb' | -0.0000 | -19.3729 | -19.3729 | 1.0000 | 0.0000 | 3.9e-09 |
| 33 | 'les' | 0.0000 | -19.6614 | -19.6614 | 1.0000 | 0.0000 | 2.9e-09 |
| 34 | '\n\n' | -0.0041 | -0.3951 | -0.3909 | 0.9959 | 0.6736 | 0.676 |
| 35 | 'Now' | -0.3783 | -1.8440 | -1.4658 | 0.6851 | 0.1582 | 0.231 |
| 36 | ',' | -0.0821 | -0.0714 | 0.0108 | 0.9212 | 0.9311 | 1.01 |
| 37 | ' we' | -1.6273 | -1.0264 | 0.6009 | 0.1965 | 0.3583 | 1.82 |
| 38 | ' divide' | -2.2202 | -5.2458 | -3.0256 | 0.1086 | 0.0053 | 0.049 |
| 39 | ' the' | -0.0616 | -0.0419 | 0.0197 | 0.9403 | 0.9590 | 1.02 |
| 40 | ' total' | -0.0002 | -0.1265 | -0.1263 | 0.9998 | 0.8811 | 0.881 |
| 41 | ' number' | -0.0138 | -0.1678 | -0.1540 | 0.9863 | 0.8455 | 0.857 |
| 42 | ' of' | -0.0002 | -0.0024 | -0.0022 | 0.9998 | 0.9976 | 0.998 |
| 43 | ' pe' | -0.0000 | -7.5535 | -7.5535 | 1.0000 | 0.0005 | 5.2e-04 |
| 44 | 'bb' | -0.0000 | -16.7424 | -16.7424 | 1.0000 | 0.0000 | 5.4e-08 |
| 45 | 'les' | 0.0000 | -16.9947 | -16.9947 | 1.0000 | 0.0000 | 4.2e-08 |
| 46 | ' by' | -0.0005 | -7.0740 | -7.0734 | 0.9995 | 0.0008 | 8.5e-04 |
| 47 | ' the' | -0.0080 | -0.5358 | -0.5278 | 0.9920 | 0.5852 | 0.590 |
| 48 | ' number' | -0.0057 | -2.2160 | -2.2103 | 0.9943 | 0.1090 | 0.110 |
| 49 | ' of' | -0.1682 | -0.0105 | 0.1577 | 0.8452 | 0.9896 | 1.17 |
| ... 40 more tokens (see samples.json) | |||||||
<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user Janice gave all three dozens of pebbles from her trip to her friends. Each of her friends got 4 pebbles. How many friends received pebbles? Let's think step by step and output the final answer after "####".<|im_end|> <|im_start|>assistant
First, let's determine the total number of pebbles Janice had. Since she had three dozens, we calculate the total number of pebbles as follows: 3 dozens * 12 pebbles/dozen = 36 pebbles Each friend received 4 pebbles, so to find out how many friends received pebbles, we divide the total number of pebbles by the number of pebbles each friend received: 36 pebbles / 4 pebbles per friend = 9 friends #### 9
| pos | token | vLLM logp | FSDP logp | diff | p_vLLM | p_train | r = p_train/p_vLLM |
|---|---|---|---|---|---|---|---|
| 0 | 'First' | -1.0223 | -1.9090 | -0.8866 | 0.3598 | 0.1482 | 0.412 |
| 1 | ',' | -0.0012 | -0.0241 | -0.0229 | 0.9988 | 0.9762 | 0.977 |
| 2 | ' let' | -2.1716 | -2.5461 | -0.3745 | 0.1140 | 0.0784 | 0.688 |
| 3 | "'s" | -0.0001 | -0.0027 | -0.0027 | 0.9999 | 0.9973 | 0.997 |
| 4 | ' determine' | -1.3307 | -3.5104 | -2.1797 | 0.2643 | 0.0299 | 0.113 |
| 5 | ' the' | -0.0622 | -0.0569 | 0.0052 | 0.9397 | 0.9446 | 1.01 |
| 6 | ' total' | -0.0007 | -0.1336 | -0.1328 | 0.9993 | 0.8750 | 0.876 |
| 7 | ' number' | -0.0004 | -0.0459 | -0.0455 | 0.9996 | 0.9552 | 0.956 |
| 8 | ' of' | -0.0000 | -0.0001 | -0.0001 | 1.0000 | 0.9999 | 1.000 |
| 9 | ' pe' | -0.0020 | -6.9941 | -6.9921 | 0.9980 | 0.0009 | 9.2e-04 |
| 10 | 'bb' | -0.0001 | -16.4588 | -16.4587 | 0.9999 | 0.0000 | 7.1e-08 |
| 11 | 'les' | 0.0000 | -16.8583 | -16.8583 | 1.0000 | 0.0000 | 4.8e-08 |
| 12 | ' Jan' | -0.0341 | -13.1849 | -13.1508 | 0.9665 | 0.0000 | 1.9e-06 |
| 13 | 'ice' | -0.0000 | -16.9292 | -16.9292 | 1.0000 | 0.0000 | 4.4e-08 |
| 14 | ' had' | -0.3370 | -3.1847 | -2.8477 | 0.7139 | 0.0414 | 0.058 |
| 15 | '.' | -0.5390 | -1.1540 | -0.6150 | 0.5833 | 0.3154 | 0.541 |
| 16 | ' Since' | -0.7129 | -3.8398 | -3.1269 | 0.4902 | 0.0215 | 0.044 |
| 17 | ' she' | -0.1806 | -2.5916 | -2.4110 | 0.8348 | 0.0749 | 0.090 |
| 18 | ' had' | -0.3257 | -3.0678 | -2.7421 | 0.7220 | 0.0465 | 0.064 |
| 19 | ' three' | -0.0583 | -8.3890 | -8.3307 | 0.9434 | 0.0002 | 2.4e-04 |
| 20 | ' dozens' | -0.0142 | -5.5190 | -5.5048 | 0.9859 | 0.0040 | 4.1e-03 |
| 21 | ',' | -1.1469 | -1.7666 | -0.6198 | 0.3176 | 0.1709 | 0.538 |
| 22 | ' we' | -0.4253 | -3.8744 | -3.4492 | 0.6536 | 0.0208 | 0.032 |
| 23 | ' calculate' | -0.3094 | -0.5867 | -0.2773 | 0.7339 | 0.5562 | 0.758 |
| 24 | ' the' | -3.0911 | -0.0624 | 3.0287 | 0.0455 | 0.9395 | 20.67 |
| 25 | ' total' | -0.0183 | -0.1162 | -0.0980 | 0.9819 | 0.8903 | 0.907 |
| 26 | ' number' | -1.6160 | -0.1499 | 1.4661 | 0.1987 | 0.8608 | 4.33 |
| 27 | ' of' | -0.0503 | -0.0051 | 0.0451 | 0.9510 | 0.9949 | 1.05 |
| 28 | ' pe' | -0.0008 | -7.5173 | -7.5165 | 0.9992 | 0.0005 | 5.4e-04 |
| 29 | 'bb' | -0.0000 | -17.4416 | -17.4415 | 1.0000 | 0.0000 | 2.7e-08 |
| 30 | 'les' | 0.0000 | -17.5843 | -17.5843 | 1.0000 | 0.0000 | 2.3e-08 |
| 31 | ' as' | -0.2501 | -7.5775 | -7.3274 | 0.7787 | 0.0005 | 6.6e-04 |
| 32 | ' follows' | -0.1428 | -14.5207 | -14.3778 | 0.8669 | 0.0000 | 5.7e-07 |
| 33 | ':\n\n' | -0.2948 | -1.4458 | -1.1511 | 0.7447 | 0.2356 | 0.316 |
| 34 | '3' | -0.4686 | -4.5624 | -4.0938 | 0.6259 | 0.0104 | 0.017 |
| 35 | ' dozens' | -0.0310 | -8.5148 | -8.4839 | 0.9695 | 0.0002 | 2.1e-04 |
| 36 | ' *' | -0.7912 | -1.0955 | -0.3043 | 0.4533 | 0.3344 | 0.738 |
| 37 | ' ' | -0.0022 | -0.0020 | 0.0003 | 0.9978 | 0.9980 | 1.00 |
| 38 | '1' | -0.0001 | -3.7128 | -3.7127 | 0.9999 | 0.0244 | 0.024 |
| 39 | '2' | -0.0002 | -0.4215 | -0.4213 | 0.9998 | 0.6561 | 0.656 |
| 40 | ' pe' | -0.0012 | -10.5660 | -10.5648 | 0.9988 | 0.0000 | 2.6e-05 |
| 41 | 'bb' | -0.0000 | -20.3070 | -20.3069 | 1.0000 | 0.0000 | 1.5e-09 |
| 42 | 'les' | 0.0000 | -18.6896 | -18.6896 | 1.0000 | 0.0000 | 7.6e-09 |
| 43 | '/do' | -0.0894 | -15.8296 | -15.7401 | 0.9144 | 0.0000 | 1.5e-07 |
| 44 | 'zen' | -0.0000 | -19.3306 | -19.3306 | 1.0000 | 0.0000 | 4.0e-09 |
| 45 | ' =' | -0.0034 | -0.6942 | -0.6908 | 0.9966 | 0.4995 | 0.501 |
| 46 | ' ' | -0.0003 | -0.0023 | -0.0020 | 0.9997 | 0.9977 | 0.998 |
| 47 | '3' | -0.0000 | -0.9641 | -0.9641 | 1.0000 | 0.3813 | 0.381 |
| 48 | '6' | -0.0068 | -3.0552 | -3.0484 | 0.9932 | 0.0471 | 0.047 |
| 49 | ' pe' | -0.0000 | -9.2901 | -9.2900 | 1.0000 | 0.0001 | 9.2e-05 |
| ... 66 more tokens (see samples.json) | |||||||
Not a committed plan, only a pointer for later. One candidate for the regime where the mismatch is too large for plain TIS is the recent paper Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL (Ye et al., arXiv:2603.19470), which controls heavy-tailed importance ratios instead of masking the offending tokens out. It is shown for quantization / kernel mismatch, not compression, so it would need adapting, and we have not decided whether to use it.
Raw data: step_batch.pt / step_batch.npz (full batch tensors),
summary.json, samples.json. Generated by tim_report.py.