Julio's dev
📄 Paper

[Review] Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining ('24.05)