# LLM from scratch

## Resources
- [Build a Large Language Model](https://www.manning.com/books/build-a-large-language-model-from-scratch)
- [Writing an LLM from scratch, part 28](https://www.gilesthomas.com/2025/12/llm-from-scratch-28-training-a-base-model-from-scratch)
- [nanochat](https://github.com/karpathy/nanochat)

## TODO
- chat CLI, evaluate each epoch
- better arch (read nanochat)
- count tokens
- download more data (code, full FineWeb)

- Notes
  - comments

- TrainTestIterator
  - total length
  - deterministic shuffle
  - prepare in parallel
  - refactor new() into builder
  - small texts (<|bos|>?)

- Training
  - multi-device training
  - model parameters in file