Seeding GPT-4o-mini Using Fine-Tuning
In a previous post I looked at the ease with which the new Small Language Model from OpenAI can be fine-tuned.
In this article I look at the non-deterministic nature of a fine-tuned GPT-4o-mini model and the extent to which seeding can be used to make its output reproducible.
Introduction
In two previous posts, I explored how OpenAI’s new SLM can be fine-tuned. This process can be accomplished entirely through a no-code dashboard, after which I tested the model using the playground.
Two interesting developments have emerged recently.
Previously, fine-tuning was hindered by the requirement of at least 100 records, which posed a significant barrier.
Additionally, the process of fine-tuning was time-consuming, making the iterative cycle of testing and refining models lengthy and inefficient.
However, these challenges have been addressed. Now, only 10 training examples are needed to make a noticeable difference in the model’s performance.
In the examples below, interactions highlighted in purple represent the standard mini model, while those in green showcase the fine-tuned model.
These examples make it clear that the fine-tuned responses come through as soon as the custom model is referenced, whereas the standard responses are produced by the default model.
And here is an extract from the training data:
{"messages":
  [{"role": "system",
    "content": "You should help to user to answer on his question."},
   {"role": "user",
    "content": "What is X?"},
   {"role": "assistant",
    "content": "X is a social media platform promoting free speech."}]}
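If you prefer to prepare this training data programmatically rather than through the dashboard, a minimal sketch could look like the following (the file name train.jsonl is illustrative, and a real training file needs at least 10 such lines):

import json

# Illustrative training example, mirroring the extract above.
# A real fine-tuning file needs at least 10 of these message triplets.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You should help to user to answer on his question."},
            {"role": "user", "content": "What is X?"},
            {"role": "assistant", "content": "X is a social media platform promoting free speech."},
        ]
    },
]

# Write one JSON object per line (the JSONL format expected for fine-tuning).
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")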
When Seeding Was Introduced
The ability to reproduce LLM output is a highly valuable feature, akin to a caching mechanism.
However, implementing this feature requires significant architectural changes and additional data storage, which can add to the system’s overhead.
As seen in the basic architecture shown below…
Seeding While Fine-Tuning
The seed controls the reproducibility of a job. By passing the same seed and job parameters, you should generally achieve consistent results, though minor differences may occur in rare cases. If no seed is specified, one will be automatically generated.
As shown below, you can set a seed value during fine-tuning, which can be any numeric value defined by the builder. When the fine-tuned model is prompted, the seed value associated with the fine-tuning job can be included (as you’ll see later in this article). This helps suppress the model’s non-deterministic behaviour, allowing it to return the specific output it was trained on.
This process is analogous to adding user utterances to an intent; variations in those utterances trigger a specific intent. Similarly, variations in the training input, when combined with seeding, will consistently produce the trained response.
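For reference, the same can be done via the API rather than the dashboard. Below is a minimal sketch, assuming the illustrative train.jsonl file from earlier and a seed value of 1; the returned file and job IDs will differ in your environment:

from openai import OpenAI

client = OpenAI()

# Upload the JSONL training file prepared earlier.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# Create the fine-tuning job, passing the seed value referenced throughout this article.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
    seed=1,
)
print(job.id, job.status)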
Practical Use Cases For Fine-Tuned Seeding
There are several reasons why one might choose to use this approach:
Simplified Reproducibility: Fine-tuning a model with input/output pairs for reproducibility allows you to offload much of the complexity and software infrastructure to OpenAI.
This means that the data, along with the seed reference, is embedded within the model itself, streamlining administration and ensuring availability.
Precision in Use Cases: By using seeding, you can precisely target different fine-tuned datasets for various user scenarios.
This enables more tailored and consistent responses depending on the specific context in which the model is being used.
Fine-tuned models can also be versioned; if a newer version of the data is used for fine-tuning, the change can be tracked through the fine-tuning job history.
Inference Segmentation: Inference can be segmented into two categories: general information and specific information on which the model is fine-tuned.
This allows the model to distinguish between providing broad, generalised responses and delivering more precise answers based on the specialised training it has received.
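To make the Precision in Use Cases and Inference Segmentation points above concrete, below is a minimal, hypothetical sketch in which each user scenario is mapped to its own model and seed value (the scenario keys are illustrative, and the fine-tuned model name is the one used later in this article):

from openai import OpenAI

client = OpenAI()

# Hypothetical mapping of user scenarios to a model and the seed used for it.
SCENARIOS = {
    "social_media_faq": {
        "model": "ft:gpt-4o-mini-2024-07-18:personal:demo1:9yZhuBp0",
        "seed": 1,
    },
    "general": {"model": "gpt-4o-mini-2024-07-18", "seed": None},
}

def answer(question: str, scenario: str = "general") -> str:
    """Route the question to the model and seed configured for the scenario."""
    config = SCENARIOS[scenario]
    response = client.chat.completions.create(
        model=config["model"],
        messages=[
            {"role": "system", "content": "You should help to user to answer on his question."},
            {"role": "user", "content": question},
        ],
        seed=config["seed"],
    )
    return response.choices[0].message.content

print(answer("What is X?", scenario="social_media_faq"))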
Considerations
When a base model is deprecated, fine-tuned models that have already been created from it will not be affected by the deprecation.
However, you will no longer be able to create new fine-tuned versions using that base model.
Although this approach is appealing due to its simplicity and no-code nature, it does create model dependency, making it difficult to migrate to another model or AI provider.
The advantage of a RAG (Retrieval-Augmented Generation) framework is that it offers a degree of model independence, allowing AI models to be treated more like interchangeable utilities.
This flexibility can simplify transitions between different models or providers, reducing the risk of being locked into a single ecosystem.
Context
Large Language Models primarily rely on unstructured natural language input and generate output through Natural Language Generation (NLG).
A key aspect of fine-tuning, as with RAG, is to provide the Language Model with contextual reference during inference.
This process is known as In-Context Learning (ICL). ICL enables the Language Model to temporarily set aside its base knowledge and instead leverage the specific contextual data supplied, allowing for more accurate and relevant responses based on the given context.
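As a point of contrast with fine-tuning, the same contextual grounding can be supplied directly in the prompt at inference time, which is roughly what a RAG pipeline would do. A minimal sketch, with the context string hard-coded purely for illustration:

from openai import OpenAI

client = OpenAI()

# Context that a RAG pipeline would normally retrieve; hard-coded here for illustration.
context = "X is a social media platform promoting free speech."

response = client.chat.completions.create(
    model="gpt-4o-mini-2024-07-18",
    messages=[
        {
            "role": "system",
            "content": f"Answer the user's question using only this context: {context}",
        },
        {"role": "user", "content": "What is X?"},
    ],
)
print(response.choices[0].message.content)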
Below is Python code you can copy as-is and paste into a notebook. You can see the question asked is What is X?, and without the necessary context this question is very ambiguous.
%pip install -q openai
#####
import os
#####
# Set your OpenAI API key before creating the client.
os.environ['OPENAI_API_KEY'] = "<Your OpenAI API key goes here>"
#####
from openai import OpenAI

client = OpenAI()

# Ask the ambiguous question against the base gpt-4o-mini model.
response = client.chat.completions.create(
    model="gpt-4o-mini-2024-07-18",
    messages=[
        {
            "role": "system",
            "content": "You should help to user to answer on his question."
        },
        {
            "role": "user",
            "content": "What is X?"
        }
    ],
    temperature=1,
    max_tokens=150,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    seed=1
)
print(response)
Considering the output from the gpt-4o-mini-2024-07-18 model below, the model attempts to disambiguate the input by casting the question into different contexts.
It analyses the question to identify multiple potential interpretations, allowing it to respond more accurately based on the inferred context.
ChatCompletion(id='chatcmpl-A1UX0BcHnsoy7EPnRUqJHp4F0CToW',
choices=[Choice(finish_reason='length',
index=0, logprobs=None,
message=ChatCompletionMessage(content='The question "What is X?"
is quite broad and can refer to many things depending on the context.
Here are a few possibilities:\n\n1.
**Mathematics**: In an equation, X often represents an unknown variable
that needs to be solved.\n2.
**Algebra**: X can be a placeholder for any
number in algebraic expressions.\n3.
**Science**: X could refer to a
variable in experiments or a certain element in chemistry (like X-rays).\n4.
**Popular culture**: X could refer to a character, a title, or a brand name
(like "X-Men" or "X-Factor").\n5.
**Programming**: In computer science, X might stand for a parameter or a
specific data type', refusal=None, role='assistant',
function_call=None, tool_calls=None))],
created=1724919054,
model='gpt-4o-mini-2024-07-18',
object='chat.completion',
service_tier=None,
system_fingerprint='fp_5bd87c427a',
usage=CompletionUsage(completion_tokens=150, prompt_tokens=26, total_tokens=176))
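If you only need the generated text rather than the full ChatCompletion object, it can be read from the first choice directly:

# Print only the generated text from the first choice.
print(response.choices[0].message.content)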
However, below is a snippet from the training data where the context of X
is defined…
{"messages":
  [{"role": "system", "content": "You should help to user to answer on his question."},
   {"role": "user", "content": "What is X?"},
   {"role": "assistant", "content": "X is a social media platform promoting free speech."}]}
Hence, if the fine-tuned model is now asked this question, but we do not include the seed value that was added to the fine-tuning parameters:
# Ask the fine-tuned model the same question, without passing the seed value.
response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:personal:demo1:9yZhuBp0",
    messages=[
        {
            "role": "system",
            "content": "You should help to user to answer on his question."
        },
        {
            "role": "user",
            "content": "What is X?"
        }
    ],
    temperature=1,
    max_tokens=150,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    # seed=1  # deliberately left out
)
print(response)
print(response)
We receive a much more contextual answer, which is a significant improvement, but it is not verbatim what we trained the model on. There is still a use case for this scenario, where the model responds in a non-deterministic fashion, but within context.
Below is the contextual response…
X, formerly known as Twitter, is a social media platform promoting free speech.
Users post and interact with messages called "tweets."
It features a range of functionalities, including posting, sharing,
and liking tweets.
And lastly, if we again ask the fine-tuned model the same question, with the seed defined as 1, the value we used when fine-tuning…
from openai import OpenAI

client = OpenAI()

# Ask the fine-tuned model the same question, this time including the seed
# value that was used when fine-tuning.
response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:personal:demo1:9yZhuBp0",
    messages=[
        {
            "role": "system",
            "content": "You should help to user to answer on his question."
        },
        {
            "role": "user",
            "content": "What is X?"
        }
    ],
    temperature=1,
    max_tokens=150,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    seed=1
)
print(response)
We get the exact trained phrase back verbatim…
X is a social media platform promoting free speech.
Below is the full response from the model…
ChatCompletion(id='chatcmpl-A1Uf5zPjwOHD4VdjgKeKutMnmSqbt',
choices=[Choice(finish_reason='stop',
index=0,
logprobs=None,
message=ChatCompletionMessage(content='X is a social media platform promoting free speech.',
refusal=None,
role='assistant',
function_call=None,
tool_calls=None))],
created=1724919555,
model='ft:gpt-4o-mini-2024-07-18:personal:demo1:9yZhuBp0',
object='chat.completion',
service_tier=None,
system_fingerprint='fp_7b59f00607',
usage=CompletionUsage(completion_tokens=10, prompt_tokens=26, total_tokens=36))
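A quick way to confirm this behaviour programmatically is to compare the returned text with the assistant completion from the training data; a minimal sketch:

# The assistant completion used in the training data.
trained_answer = "X is a social media platform promoting free speech."

# Check whether the seeded response matches the trained completion verbatim.
returned_answer = response.choices[0].message.content
print("Verbatim match:", returned_answer.strip() == trained_answer)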
In Conclusion
There isn’t a single right or optimal way of doing things.
The key questions to consider are: Is this a prototype or a production implementation? What business case or problem are we aiming to solve? What infrastructure and budget are available to build the solution? Is this a long-term strategy or a temporary fix?
One thing is certain: the more knowledge you have, the better your assessment and analysis will be.
I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.