In AI, especially in natural language processing (NLP), a token is a unit of text that the model processes. Tokens are the building blocks of input and output for language models.
So, what exactly is a token?
A token can be:
- A whole word (e.g., "pizza")
- A part of a word, or subword (e.g., "piz" + "za")
- A single character
- A punctuation mark (e.g., "!")
Most modern models, like those based on the Transformer architecture, use subword tokenization. This helps handle rare or unknown words more efficiently.
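To make the subword idea concrete, here is a minimal sketch of a greedy longest-match-first split, loosely in the style of WordPiece-style tokenizers. The vocabulary and the `subword_split` function are illustrative assumptions, not any library's actual API:

```python
def subword_split(word, vocab):
    # Greedy longest-match-first: repeatedly take the longest known
    # piece from the current position (simplified WordPiece-style idea).
    pieces = []
    start = 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            # No known piece matches: fall back to an unknown token.
            return ["<unk>"]
    return pieces

# Toy vocabulary: "pizza" itself is absent, but its pieces are known.
vocab = {"I", "love", "piz", "za", "!"}
print(subword_split("pizza", vocab))  # ['piz', 'za']
```

Because the pieces "piz" and "za" are in the vocabulary, even a word the model has never stored whole can still be represented, which is how subword tokenization handles rare or unknown words.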
Tokens are how the model understands and generates language. For example, the sentence "I love pizza!" might be tokenized as:
["I", "love", "pizza", "!"]
Or, in subword form:
["I", "love", "piz", "za", "!"]
Each token is converted into a numerical representation (embedding) that the model can process.
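The token-to-embedding step can be sketched as a two-stage lookup: tokens map to integer IDs via a vocabulary, and each ID indexes a row of an embedding matrix. The vocabulary, IDs, and 4-dimensional vectors below are made-up values for illustration:

```python
import random

# Toy vocabulary mapping each token to an integer ID (assumed values).
vocab = {"I": 0, "love": 1, "pizza": 2, "!": 3}

# Each ID selects a row in the embedding matrix; real models
# learn these vectors, here they are just random placeholders.
random.seed(0)
embedding_matrix = [
    [random.random() for _ in range(4)] for _ in range(len(vocab))
]

tokens = ["I", "love", "pizza", "!"]
ids = [vocab[t] for t in tokens]          # [0, 1, 2, 3]
embeddings = [embedding_matrix[i] for i in ids]

print(ids)                 # [0, 1, 2, 3]
print(len(embeddings[0]))  # 4 (one vector per token)
```

The model never sees the raw text; it operates entirely on these numeric vectors.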