OpenAI JSON Mode & Seeding
Switching on OpenAI's JSON mode does not guarantee that the output matches any specific predefined JSON schema.
The only guarantee is that the JSON is valid and parses without errors; hence the use of seeding…
Introduction
When using OpenAI Assistant function calling, JSON mode is always on; however, when using Chat Completions, the JSON flag needs to be set explicitly: response_format={ "type": "json_object" }.
With function calling, a JSON schema or structure is created by the user, against which the generated response is matched and the JSON fields are populated.
Hence, with function calling, a predefined template guides the model on the structure of the JSON document, and the model then assigns entities from the user input to the JSON fields.
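For illustration, below is a minimal sketch of such a predefined template as a Chat Completions tool definition; the function name and fields are hypothetical and serve only to show how the user-defined schema guides the structure of the populated JSON.

tools = [
    {
        "type": "function",
        "function": {
            "name": "record_medal_tally",  # hypothetical function name
            "description": "Record an athlete's Olympic medal tally.",
            "parameters": {
                "type": "object",
                "properties": {
                    "athlete": {"type": "string"},
                    "gold": {"type": "integer"},
                    "silver": {"type": "integer"},
                    "bronze": {"type": "integer"}
                },
                "required": ["athlete", "gold", "silver", "bronze"]
            }
        }
    }
]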
When JSON mode is enabled, the model is constrained to only generate strings that parse into a valid JSON object.
JSON Mode
With OpenAI JSON mode, the challenge is that the JSON output from the model varies considerably with each inference, and the JSON schema cannot be predefined.
Below you will see two vastly different JSON document schemas generated by the model.
Seeding
One way to create consistency in terms of JSON schemas is to make use of the seed parameter, as you will see in the code example below.
For the same or a fairly similar input, if the same seed value is passed, the same JSON schema tends to be repeated.
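As a quick sketch, assuming the client created in the full example further down, two calls with the same prompt and the same seed value should repeat the same schema on a best-effort basis:

def medal_json(seed: int) -> str:
    # Ask the same question with a fixed seed; determinism is best-effort
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        response_format={"type": "json_object"},
        seed=seed,
        messages=[
            {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
            {"role": "user", "content": "List Usain Bolt's Olympic medals."}
        ]
    )
    return response.choices[0].message.content

print(medal_json(1001))
print(medal_json(1001))  # expect the same JSON schema, often identical text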
Also visible is the newly added system fingerprint parameter; here is an example fingerprint returned: system_fingerprint='fp_eeff13170a'. The fingerprint can be saved and checked with each response. When a change in the fingerprint is detected, it acts as a notification of LLM API changes which might impact model responses.
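A simple sketch of such a check follows; the helper function is illustrative, and the expected value is the example fingerprint above.

def check_fingerprint(response, expected: str = "fp_eeff13170a"):
    # Compare the fingerprint of the backend that served this response
    # against a previously saved value; a mismatch signals an API-side
    # change which might impact model responses.
    if response.system_fingerprint != expected:
        print(f"Fingerprint changed: {expected} -> {response.system_fingerprint}")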
Practical Code Examples
You should always instruct the model to produce JSON via some message in the conversation, for example via your system message. Setting the response format flag alone is not enough.
OpenAI warns that by not including an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run continually until it reaches the token limit.
The finish_reason return parameter should be checked to confirm it says stop. If it says length, the generation exceeded max_tokens or the conversation exceeded the token limit, and the JSON document is most likely incomplete. To guard against this, check finish_reason before parsing the response.
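A minimal sketch of that guard, assuming a response object as returned by the full code example below:

import json

choice = response.choices[0]
if choice.finish_reason == "stop":
    data = json.loads(choice.message.content)  # generation completed; safe to parse
elif choice.finish_reason == "length":
    # Output was cut off by max_tokens or the context window,
    # so the JSON is most likely truncated and will not parse.
    raise ValueError("Truncated response; increase max_tokens or shorten the input.")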
Below is complete code which you can copy and paste into a notebook; notice how the model is instructed in the system message to generate a JSON document. Notice that the response format is set to json_object and that the seed parameter is set.
pip install openai
import os
from openai import OpenAI

# Set your API key before creating the client
os.environ['OPENAI_API_KEY'] = "Your API Key goes here"

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    # Constrain the model to generate only valid JSON
    response_format={"type": "json_object"},
    messages=[
        # The system message must explicitly instruct the model to output JSON
        {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user", "content": "How many Olympic medals has Usain Bolt won, and from which games?"}
    ],
    temperature=1,
    max_tokens=250,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    # Fix the seed so the JSON schema repeats on a best-effort basis
    seed=1001
)

print(response.choices[0].message.content)
print("###################")
print(response)
Below is one version of the JSON document generated when the seed parameter was not set…
{
"medals": {
"gold": 8,
"silver": 0,
"bronze": 0
},
"games": [
"Beijing 2008",
"London 2012",
"Rio 2016"
]
}
And here a vastly different schema is generated for the same question; hence the earlier statement that setting the seed parameter makes sense to introduce predictable JSON schemas.
{
"athlete": "Usain Bolt",
"total_medals": 8,
"medals": [
{
"games": "2008 Beijing",
"medal": "Gold",
"event": "100m"
},
{
"games": "2008 Beijing",
"medal": "Gold",
"event": "200m"
},
{
"games": "2008 Beijing",
"medal": "Gold",
"event": "4x100m relay"
},
{
"games": "2012 London",
"medal": "Gold",
"event": "100m"
},
{
"games": "2012 London",
"medal": "Gold",
"event": "200m"
},
{
"games": "2012 London",
"medal": "Gold",
"event": "4x100m relay"
},
{
"games": "2016 Rio",
"medal": "Gold",
"event": "100m"
},
{
"games": "2016 Rio",
"medal": "Gold",
"event": "200m"
}
]
}
If JSON mode is disabled, a free-text response like the one below is generated.
Usain Bolt has won a total of 8 Olympic medals. Here are the details of his
medals from each Olympic Games:
2008 Beijing Olympics:
- Gold in 100m
- Gold in 200m
- Gold in 4x100m relay
2012 London Olympics:
- Gold in 100m
- Gold in 200m
- Gold in 4x100m relay
2016 Rio Olympics:
- Gold in 100m
- Gold in 200m
And lastly, here is the complete model response, in which the system_fingerprint is returned.
ChatCompletion(
id='chatcmpl-8N0qThwbrN5e0tRa0j6u8GBrtSEIM',
choices=[Choice(finish_reason='stop', index=0,
message=ChatCompletionMessage
(content='\n{\n "athlete": "Usain Bolt",\n "total_medals": 8,\n
"medals_by_game": {\n "Beijing 2008": {\n "gold": 3\n },\n
"London 2012": {\n "gold": 3\n },\n "Rio 2016": {\n
"gold": 3\n }\n }\n}',
role='assistant',
function_call=None,
tool_calls=None))],
created=1700495485,
model='gpt-3.5-turbo-1106',
object='chat.completion',
system_fingerprint='fp_eeff13170a',
usage=CompletionUsage
(completion_tokens=83, prompt_tokens=35, total_tokens=118))
Finally
Integrating a Generative AI application into a well-architected environment is crucial for unlocking its full potential and ensuring seamless functionality within complex systems.
A well-architected environment should prioritise performance, scalability, security and reliability.
A well-architected environment embraces flexibility and adaptability. As generative AI evolves, a scalable architecture allows for seamless updates and integration.
Having said all of this, a number of functionalities introduced by OpenAI have shortly thereafter been deprecated; this includes usage modes and a whole host of models.
I get the feeling that many of these features are experimental and not necessarily suited for production implementations. Hence it is prudent when building GenApps to aspire to becoming LLM agnostic, and considering LLMs as utilities.
Leveraging baked-in LLM functionality introduces additional complexity and change-management overhead, and leaves organisations at the mercy of a single model provider.
I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
https://platform.openai.com/docs/guides/text-generation/json-mode