2 Comments
Pastor Soto:

Great article. I played with this to understand the concept. My goal was to answer why providers such as OpenAI or Google set the temperature parameters they do. This is my take:

LLMs predict the next token.

In the sheet, we have a pre-defined Z-score for a given set of words; the higher the value, the less likely that word is to appear by chance.

Those values are passed to the softmax function that converts the score into probabilities.
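That conversion can be sketched in a few lines. The scores below are made up for illustration; only the softmax formula itself is standard:

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability; this does not
    # change the resulting probabilities.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for four candidate next words.
scores = [3.0, 1.0, 0.5, -1.0]
probs = softmax(scores)
# The probabilities sum to 1, and the word with the highest score
# receives the highest probability.
```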

Temperature is a parameter that divides the Z-score; the higher the temperature, the more the Z-score shrinks (toward zero), so outliers (i.e., the most likely following words) stand out less.

If we put this on a typical distribution graph, values close to 1 tend to give a bell-shaped curve, making the cut-off point more likely to pick the most probable word, while higher values transform this into a uniform distribution, giving every word a roughly equal chance of being selected.
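The flattening effect described above can be demonstrated by dividing the same made-up scores by different temperatures before applying the softmax (a sketch under those assumptions, not any provider's actual implementation):

```python
import math

def softmax_with_temperature(scores, temperature):
    # Divide each score by the temperature before the softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it toward uniform.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for four candidate next words.
scores = [3.0, 1.0, 0.5, -1.0]
for t in (0.5, 1.0, 2.0, 10.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(scores, t)])
# As the temperature grows, the gap between the most and least
# probable words shrinks toward zero.
```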

At first I thought these providers selected their temperature values through comparison. Now that I have seen how it is calculated, my take is that the chosen cut-off point more likely came from error analysis: for a given model, the provider understands that beyond a certain threshold the probability distribution loses its meaning, so those settings are unlikely to be chosen in a realistic context and could potentially harm the user experience.

Each model provider selected its threshold based on its vocabulary and training outcomes, its understanding of users' needs, and the most likely use cases identified through error analysis.

Happy to discuss and be corrected!!

I am loving your work!!

Pastor Soto:

On review, I spotted some conceptual errors in my comment above:

1. Those are not Z-scores; they are the raw input values to the softmax function (logits).

2. Softmax does not produce a bell-shaped curve; it produces a categorical distribution.
