Plus https://arxiv.org/abs/2404.16811 = the perfect fine-tuned model? Regards, Damon
Nice, yeah, it makes intuitive sense that a custom dataset whose questions need answers drawn from the middle of the text can help with the lost-in-the-middle problem.
Hi Roman, take a look at https://arxiv.org/abs/2405.16684, a new idea linking scaling laws and parameter count to gzip compressibility.
Interesting, so compressibility gives some sense of data quality. It seems that if your data is more compressible, you need more of it (or at least should lean toward more data rather than a bigger model).
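For anyone following along, here is a rough sketch of what "compressibility" could mean in practice, assuming it is simply gzip-compressed size divided by raw size. The linked paper may measure it differently (for example on tokenized data), and `gzip_compressibility` is a made-up name for illustration, not something from the paper.

```python
import gzip
import random


def gzip_compressibility(data: bytes) -> float:
    """Return compressed size / raw size (assumed definition).

    Lower values mean the data is more compressible.
    """
    if not data:
        raise ValueError("data must be non-empty")
    return len(gzip.compress(data)) / len(data)


# Highly repetitive text: should compress very well.
repetitive = b"the cat sat on the mat. " * 400

# Pseudo-random bytes of the same length: should barely compress at all.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

ratio_repetitive = gzip_compressibility(repetitive)
ratio_noisy = gzip_compressibility(noisy)

print(f"repetitive: {ratio_repetitive:.3f}")
print(f"noisy:      {ratio_noisy:.3f}")
```

Running it on a sample of your own training data gives a single number you can compare across datasets. How that number should shift the data-versus-parameters trade-off is the paper's claim, not something this snippet shows.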
is there a recording for this session? Thanks
here you go! https://youtube.com/live/lQh0WLxdYos