22 research papers you should read to master language modeling.
-
A Mathematical Theory of Communication
Foundational paper in information theory by Claude Shannon.
-
A Neural Probabilistic Language Model
Early neural language model by Bengio et al.
-
Natural Language Processing (Almost) from Scratch
Collobert et al.'s unified neural network approach to NLP.
-
Phoneme Recognition Using Time-Delay Neural Networks
Early TDNN application to speech recognition.
-
Efficient Estimation of Word Representations in Vector Space (Word2Vec)
Mikolov et al.'s Word2Vec: CBOW and Skip-gram models.
-
GloVe: Global Vectors for Word Representation
Pennington et al.'s word embeddings trained on global word co-occurrence statistics.
-
Enriching Word Vectors with Subword Information (FastText)
Bojanowski et al.'s FastText: word vectors built from character n-grams.
-
A Convolutional Neural Network for Modelling Sentences
CNN-based approach to sentence modeling.
-
Learning Internal Representations by Error Propagation
Classic PDP chapter on backpropagation.
-
Sequence Modeling: Recurrent and Recursive Nets (from the Deep Learning book)
Chapter 10 of the Deep Learning book by Goodfellow, Bengio, and Courville.
-
Long Short-Term Memory (LSTM)
Original LSTM paper by Hochreiter and Schmidhuber.
-
Understanding LSTM Networks (Colah's blog)
Excellent visual explanation of LSTM networks.
-
Training Recurrent Neural Networks (PhD Thesis)
Sutskever's comprehensive PhD thesis on RNN training.
-
Deep contextualized word representations (ELMo)
ELMo paper on contextualized word representations.
-
Attention Is All You Need (Transformer)
Foundational Transformer paper.
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT paper introducing bidirectional transformer representations.
-
Improving Language Understanding by Generative Pre-Training (GPT-1)
Original GPT paper on generative pretraining.
-
Language Models are Unsupervised Multitask Learners (GPT-2)
GPT-2 paper on unsupervised multitask learning.
-
Language Models are Few-Shot Learners (GPT-3)
GPT-3 paper on few-shot learning capabilities.
-
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence embeddings using Siamese BERT networks.
-
ChatGPT Blog
OpenAI's blog post on ChatGPT.
-
Llama 2: Open Foundation and Fine-Tuned Chat Models
Meta's Llama 2 technical paper.