LLM from scratch
Resources
TODO:
- chat cli, evaluate each epoch
- better arch (read nanochat)
- count tokens
- download more data (code, full fineweb)
- Notes
  - comments
- TrainTestIterator
  - total length
  - deterministic shuffle
  - prepare in parallel
  - refactor new() into builder
  - small texts (<|bos|>?)
- Training
  - multi-device training
  - model parameters in file