The Effects Of Tokenizing On AI
- Reece Harrison
- 3 days ago
- 2 min read
First of all, let's start with what tokens are. Tokens are the small pieces an AI breaks text into (words, parts of words, or even single characters), each converted into a number that is easier for a computer to process. Once text is tokenized, the model no longer handles a word directly as letters on a page; it works with a representation of what the word means. So, instead of "seeing" the word "pen," it works with something closer to "the thing you write with," and only later turns that back into the word "pen" to convey its meaning.
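The idea above can be sketched in a few lines of Python. This is a toy word-level tokenizer with a made-up vocabulary; real systems such as ChatGPT use learned subword tokenizers, but the principle of mapping text to numbers is the same.

```python
# Minimal sketch of tokenization: text in, integer IDs out.
# The vocabulary is built from a toy corpus, not a real model's vocabulary.

def build_vocab(corpus):
    """Assign an integer ID to every unique word in the corpus."""
    vocab = {}
    for word in corpus.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab):
    """Convert text into the list of token IDs the model actually sees."""
    return [vocab[word] for word in text.lower().split()]

vocab = build_vocab("the pen is the thing you write with")
print(tokenize("the pen", vocab))  # prints [0, 1]: numbers, not letters
```

From this point on, the model only ever sees those numbers, which is exactly what makes the drawbacks discussed below possible.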

Why Tokenization Is Necessary: Saving Power and Computation
The main reason tokenization is necessary is to save computational power. The resources required for an AI like ChatGPT to generate a response are already quite high: by one widely cited estimate, producing a 100-word reply uses about 519 millilitres of water for cooling, and a longer exchange can take a litre or more. Without tokenization, generating responses would produce even more heat and consume significantly more power, making AI operations much more expensive and causing longer wait times for replies.
Drawbacks of Tokenization: Compromised Precision and Contextual Understanding
However, there are drawbacks to tokenization. The AI no longer perceives a word as a string of letters; it sees a token that stands in for the word. Consequently, it can't easily count how many times a particular letter appears in a word. Think of it like describing a picture from memory: you might miss details that were perfectly clear while you were looking at it. This can result in the AI providing a response that's not fully aligned with what it "intended" to say, making it seem as if it doesn't know the answer when, in fact, it might have a better answer in its "mind." Another problem is that because the AI never handles words as they are, it must still convert its internal representation back into words. That step means picking among related words, sometimes with multiple valid options.
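The letter-counting drawback can be made concrete with a toy example. The subword splits below are invented for illustration (real tokenizers learn their own splits): once "strawberry" becomes two token IDs, the individual letters are simply not there to count until you decode back to text.

```python
# Toy illustration: once text becomes token IDs, individual letters are hidden.
vocab = {"straw": 101, "berry": 102}          # invented subword vocabulary
inv_vocab = {v: k for k, v in vocab.items()}  # ID -> text, for decoding

def tokenize(word):
    """Crude greedy longest-match tokenizer over the toy vocabulary."""
    ids, rest = [], word
    while rest:
        for piece in sorted(vocab, key=len, reverse=True):
            if rest.startswith(piece):
                ids.append(vocab[piece])
                rest = rest[len(piece):]
                break
        else:
            raise ValueError(f"no token matches: {rest!r}")
    return ids

ids = tokenize("strawberry")
print(ids)                 # prints [101, 102]: no letter 'r' in sight
text = "".join(inv_vocab[i] for i in ids)
print(text.count("r"))     # prints 3, but only after decoding back to text
```

A model working purely on `[101, 102]` has no direct access to the letters, which is one reason letter-counting questions trip up real chatbots.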
The Effects of Temperature on AI
This is where the concept of "temperature" comes into play. Temperature controls how the AI selects its next word: at a temperature of 0 it always picks the most likely word from its list, while a temperature like 0.7 still favours the top word but leaves room for some randomness. This approach helps the AI sound less repetitive and more varied in its responses, and it is also why you may get two different replies to the same input. A common default for systems like DeepSeek and ChatGPT is around 0.7, while an AI designed to sound more human might use a slightly higher temperature, such as 1.0, since human language can be quite unpredictable at times.
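Temperature sampling can be sketched directly. Here the raw scores (logits) for a few candidate next words are invented for illustration; the mechanics of dividing by the temperature before applying softmax are standard.

```python
# Sketch of temperature sampling over invented next-word scores (logits).
import math
import random

def sample(logits, temperature):
    """Pick a word: temperature 0 is greedy, higher values add randomness."""
    if temperature == 0:
        return max(logits, key=logits.get)  # always the most likely word
    # Divide scores by the temperature, then softmax into probabilities.
    scaled = {w: s / temperature for w, s in logits.items()}
    z = sum(math.exp(s) for s in scaled.values())
    probs = {w: math.exp(s) / z for w, s in scaled.items()}
    # Draw one word according to those probabilities.
    r, cum = random.random(), 0.0
    for word, p in probs.items():
        cum += p
        if r < cum:
            return word
    return word  # guard against floating-point rounding

logits = {"pen": 2.0, "pencil": 1.0, "marker": 0.1}
print(sample(logits, 0))    # prints "pen" every time
print(sample(logits, 0.7))  # usually "pen", occasionally another word
```

Raising the temperature flattens the probabilities, so less likely words get picked more often; lowering it sharpens them toward the single top word.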
Tokenization: Balancing Efficiency and Complexity in AI Systems
So, in conclusion, we use tokens to save power and reduce the heat generated when an AI produces a reply, but in doing so we solve some problems and introduce new ones. That is the trade-off of tokenizing.