

The Effects Of Tokenizing On AI

  • Writer: Reece Harrison
  • May 20, 2025
  • 2 min read

First of all, let's start with what tokens are. Tokens are the small pieces that an AI or computer program breaks text into (whole words, fragments of words, or single characters), each converted into a number that is easier for a computer to work with. One consequence is that the AI doesn't handle a word directly as letters on a page; it works with a representation of it. So instead of dealing with the word "pen" itself, it effectively handles something closer to "the thing you write with," and only later turns that back into the word "pen" to convey its meaning.
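To make that concrete, here is a minimal sketch of how a subword tokenizer might split a word into known pieces. The vocabulary and token IDs below are invented for illustration; real tokenizers (such as the byte pair encoding tokenizers used by GPT-style models) learn far larger vocabularies from data.

```python
# A toy subword tokenizer. The vocabulary and IDs are invented for
# illustration; real tokenizers learn far larger vocabularies from data.
VOCAB = {"token": 1001, "ization": 1002, "pen": 2088, "s": 30}

def tokenize(word, vocab):
    """Greedily split a word into the longest known vocabulary pieces."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character: keep it as-is
            i += 1
    return pieces

pieces = tokenize("tokenization", VOCAB)
ids = [VOCAB[p] for p in pieces]   # what the model actually receives
print(pieces, ids)                 # ['token', 'ization'] [1001, 1002]
```

Once the text has been turned into those ID numbers, the model never sees the original characters again, which matters for the drawbacks discussed below.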




Why Tokenization Is Necessary: Saving Computational Power

The main reason this process of tokenization is necessary is to save computational power. The energy required for an AI like ChatGPT to generate a response is already quite high; by one widely cited estimate, a 100-word reply consumes about 519 millilitres of water, roughly half a litre used to cool the system. Without tokenization, generating responses would produce much more heat and consume significantly more power, making AI operations much more expensive and causing longer wait times for replies.


Drawbacks of Tokenization: Compromised Precision and Contextual Understanding

However, there are drawbacks to tokenization. The AI no longer perceives a word as a sequence of letters; it sees it as tokens related to that word. Consequently, it can't reliably count how many times a particular letter appears in a word, the famous example being the number of r's in "strawberry". Think of it like trying to describe a picture from memory: you might miss details that were clear when you were looking at it. This can result in the AI providing a response that's not fully aligned with what it intended to say, making it seem as if it doesn't know the answer when, in fact, it might have a better answer in its "mind." Another problem is that, since the AI can't see words as they are, it must still convert its internal representation back into words. That means picking among related words, and sometimes there are multiple valid options.
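A quick sketch of why letter-counting goes wrong. The subword split and token IDs below are made up for illustration; the point is that the model receives only the IDs, so the individual letters inside each piece are invisible to it.

```python
# Hypothetical subword split of "strawberry"; real splits and IDs vary
# by tokenizer, and these values are invented for illustration.
ID_TO_PIECE = {496: "str", 675: "aw", 15717: "berry"}
token_ids = [496, 675, 15717]    # all the model actually "sees"

# Counting letters is easy when you have the characters themselves:
print("strawberry".count("r"))   # 3

# But the ID list [496, 675, 15717] contains no characters at all;
# the letters only reappear once the IDs are decoded back into text.
decoded = "".join(ID_TO_PIECE[i] for i in token_ids)
print(decoded, decoded.count("r"))   # strawberry 3
```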


The Effects of Temperature on AI

This is where the concept of "temperature" comes into play. Temperature controls how the AI selects the next word from its list of candidates. A temperature near 0 means it will almost always pick the single most likely word, giving very predictable output; a temperature of 1 samples words in proportion to their raw probabilities; values in between, like 0.7, still favour the top word but leave room for some randomness. This approach helps the AI sound less repetitive and more varied in its responses, and it's also why you may get two different replies to the same input. The standard temperature used by AI systems, like DeepSeek and ChatGPT, is typically around 0.7, but an AI designed to sound more human might use a higher temperature, since human language can be quite unpredictable at times.
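The usual mechanism is to divide the model's raw scores (called logits) by the temperature before turning them into probabilities, as in this sketch. The candidate words and scores here are invented for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature.
    Low temperature sharpens the distribution (the top word dominates);
    high temperature flattens it (more randomness when sampling)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the next word after "The cat sat on the ..."
words = ["mat", "sofa", "roof"]
logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.7, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {w: round(p, 2) for w, p in zip(words, probs)})
```

Running this shows "mat" taking almost all of the probability at low temperature, while at high temperature the three options get much closer to even odds.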


Tokenization: Balancing Efficiency and Complexity in AI Systems

So, in conclusion, we use tokens to save on power and decrease the heat produced when an AI generates a reply, but by doing so, we solve some problems and add new ones to the mix. Such is the trade-off of tokenizing.
