Parameter Description of the openai.ChatCompletion.create Interface

Introduction: To use an interface effectively, you need to understand its parameters. To make that easier, I have compiled explanations of some key parameters of the API interfaces offered by OpenAI.

OpenAI offers two main API interfaces: ChatCompletion and Completion. Both generate natural language, but they have slightly different purposes and applications.


ChatCompletion:

  • It is designed specifically for generating text in conversation and chat scenarios.
  • It produces text that closely resembles human conversation style and tone.
  • It is suitable for applications such as intelligent customer service, chatbots, and generating automated responses in daily conversations.


Completion:

  • It is a general-purpose natural language generation interface.
  • It supports generating various types of text, including paragraphs, summaries, suggestions, answers, etc.
  • The output generated by Completion is more diverse, rigorous, and professional.
  • It is suitable for a wide range of text generation scenarios, such as article writing, information extraction, machine translation, natural language question answering, etc.

In summary:

  • ChatCompletion is suitable for generating text in conversation and chat scenarios.
  • Completion is suitable for more general natural language generation scenarios.

Example Usage: Since the parameters of the two interfaces are mostly the same, I will only cover the ones that differ. For the common parameters, please refer to the openai.Completion.create API parameter documentation.
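As a rough sketch of where the two interfaces differ, the request arguments can be written side by side: Completion takes a single prompt string, while ChatCompletion takes a list of messages. The model name text-davinci-003, the prompt text, and the max_tokens value are illustrative assumptions, and no request is sent here.

```python
# Arguments for openai.Completion.create: the input is one prompt string.
completion_kwargs = {
    "model": "text-davinci-003",  # assumed example model
    "prompt": "Who won the World Series in 2020?",
    "max_tokens": 64,
}

# Arguments for openai.ChatCompletion.create: the input is a list of messages.
chat_kwargs = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "Who won the World Series in 2020?"},
    ],
    "max_tokens": 64,
}

# Parameters that appear in only one of the two calls.
only_in_completion = set(completion_kwargs) - set(chat_kwargs)
only_in_chat = set(chat_kwargs) - set(completion_kwargs)
print(only_in_completion, only_in_chat)
```

With the pre-1.0 openai package installed and an API key set, these dictionaries could be passed as openai.Completion.create(**completion_kwargs) and openai.ChatCompletion.create(**chat_kwargs).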

Parameter: messages. ChatCompletion takes a series of messages as input and returns a model-generated message as output.

Here’s an example API call:

import openai

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"},
    ],
)

The ‘messages’ parameter should be a list of message objects, where each object has a role (“system”, “user”, or “assistant”) and content (the text of the message). A conversation can be as short as one message or run to many pages.

Typically, the conversation format starts with a system message followed by alternating user and assistant messages.

System messages help set the behavior of the assistant. In the example above, the system message instructs the model with “You are a helpful assistant.” User messages carry the instructions for the assistant. They can be written by the end user of the application or set by the developer. Assistant messages store previous responses. They can also be written by the developer to provide examples of the desired behavior.
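The three roles can be put together with a small helper. This is a minimal sketch: build_messages and its parameters are hypothetical names used for illustration, not part of the openai package, and the helper only assembles the list without calling the API.

```python
def build_messages(system_prompt, examples, question):
    """Assemble a messages list: system prompt, example turns, then the new question.

    `examples` is a list of (user_text, assistant_text) pairs written by the
    developer to demonstrate the desired behavior.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


messages = build_messages(
    "You are a helpful assistant.",
    [("Who won the world series in 2020?",
      "The Los Angeles Dodgers won the World Series in 2020.")],
    "Where was it played?",
)
print([m["role"] for m in messages])
```

The resulting list has the same shape as the messages argument in the example call above.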

Referencing earlier messages for context: Since requests are stateless (the model has no memory of past requests), the dialogue history must be sent with each request so that the AI understands the context. In the example above, the last question, “Where was it played?”, only makes sense in the context of the previous messages about the 2020 World Series.

If the dialogue grows beyond the model's token limit, it needs to be shortened in some way. Here’s an example by Xu Wenhao that showcases a simple method to handle token length:

import os

import openai

openai.api_key = os.environ.get("OPENAI_API_KEY")


class Conversation:
    def __init__(self, prompt, num_of_round):
        self.prompt = prompt
        self.num_of_round = num_of_round
        # The system prompt always stays at index 0.
        self.messages = [{"role": "system", "content": self.prompt}]

    def ask(self, question):
        try:
            self.messages.append({"role": "user", "content": question})
            # The arguments of this call are missing from the source; model and
            # messages are the minimum the call requires.
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=self.messages,
            )
        except Exception as e:
            return e

        message = response["choices"][0]["message"]["content"]
        self.messages.append({"role": "assistant", "content": message})

        # Keep at most num_of_round rounds: drop the oldest user/assistant pair.
        if len(self.messages) > self.num_of_round * 2 + 1:
            del self.messages[1:3]
        return message

In the above example, the latest question is appended to the end of the conversation list, and the response from ChatGPT is appended after it. If the number of rounds exceeds ‘num_of_round’, the oldest remaining round (one user message and one assistant message) is removed, while the system message is kept.

This method gives the AI the context it needs, but be mindful of token usage and cost: the more tokens you send, the higher the charges.
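The trimming rule can be tried offline, without calling the API. The sketch below isolates it in a hypothetical helper, trim_history, which is not part of the original class. It assumes the first message is the system message, and it uses a loop rather than a single check so that it also handles a history that is several rounds too long.

```python
def trim_history(messages, num_of_round):
    """Drop the oldest user/assistant rounds so at most `num_of_round` remain.

    messages[0] is assumed to be the system message and is always kept.
    """
    trimmed = list(messages)  # work on a copy, leave the input untouched
    while len(trimmed) > num_of_round * 2 + 1:
        del trimmed[1:3]  # remove the oldest user/assistant pair
    return trimmed


# Three rounds of made-up conversation after the system message.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(1, 4):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

kept = trim_history(history, num_of_round=2)
print([m["content"] for m in kept])
```

With num_of_round=2, the first round is dropped and the system message plus the last two rounds are kept.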

Response Parameter: finish_reason. The following is an example API response:

{
    'id': 'chatcmpl-6p9XYPYSTTRi0xEviKjjilqrWU2Ve',
    'object': 'chat.completion',
    'created': 1677649420,
    'model': 'gpt-3.5-turbo',
    'usage': {'prompt_tokens': 56, 'completion_tokens': 31, 'total_tokens': 87},
    'choices': [
        {
            'message': {
                'role': 'assistant',
                'content': 'The 2020 World Series was played in Arlington, Texas at the Globe Life Field, which was the new home stadium for the Texas Rangers.'
            },
            'finish_reason': 'stop',
            'index': 0
        }
    ]
}

The response contains the ‘finish_reason’ parameter, which can have the following values:

  • ‘stop’: The API returns complete model output.
  • ‘length’: The model output is incomplete because it reached the max_tokens parameter or the model's token limit.
  • ‘content_filter’: Content was omitted because it was flagged by the content filter.
  • null: The API response is still in progress or incomplete. This is the JSON null value (None in Python), not the string ‘null’.

Checking this value tells us why the model stopped generating, for example whether a reply was cut off and needs a larger max_tokens.
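A minimal sketch of how the values listed above might be checked in code follows. The helper name summarize_choice and the status labels it returns are hypothetical, and the response dictionary is a shortened copy of the example response with only the fields the helper reads.

```python
def summarize_choice(response, index=0):
    """Return (content, status) for one choice of a chat completion response."""
    choice = response["choices"][index]
    content = choice["message"].get("content")
    reason = choice.get("finish_reason")
    if reason == "stop":
        status = "complete"      # the model finished its output
    elif reason == "length":
        status = "truncated"     # cut off by max_tokens or the token limit
    elif reason == "content_filter":
        status = "filtered"      # content omitted by the content filter
    else:
        status = "in_progress"   # finish_reason is None (JSON null)
    return content, status


response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The 2020 World Series was played in Arlington, Texas.",
            },
            "finish_reason": "stop",
            "index": 0,
        }
    ]
}

content, status = summarize_choice(response)
print(status)
```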

Summary: The ChatCompletion interface is designed specifically for generating text in conversation and chat scenarios. Because the interface is stateless, each request must carry the context the model needs. Keep that context as short as is practical to avoid excessive token usage and higher costs.

Solution to openai.ChatCompletion.create not being asynchronous or concurrent

create -> acreate: use acreate, which is create with an ‘a’ added before it. The ‘a’ most likely stands for ‘asynchronous’. In programming, async typically refers to asynchronous programming, where a task runs in the background without blocking other parts of the program. This can improve the efficiency and response speed of the program. In Python and JavaScript, async/await is a commonly used way to write asynchronous code.
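As a hedged sketch of how acreate could be used for concurrent requests, the example below sends several questions at once with asyncio.gather. The helper ask_many is a hypothetical name, and fake_acreate is an offline stand-in for the real call so that the example runs without an API key.

```python
import asyncio


async def ask_many(questions, acreate, model="gpt-3.5-turbo"):
    """Send one chat request per question concurrently; replies keep the input order."""
    async def ask_one(question):
        response = await acreate(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        return response["choices"][0]["message"]["content"]

    return await asyncio.gather(*(ask_one(q) for q in questions))


async def fake_acreate(model, messages):
    """Offline stand-in for the real API call (hypothetical, for demonstration only)."""
    await asyncio.sleep(0.01)  # simulate network latency
    reply = "echo: " + messages[-1]["content"]
    return {"choices": [{"message": {"role": "assistant", "content": reply}}]}


replies = asyncio.run(ask_many(["first", "second"], fake_acreate))
print(replies)
```

With the pre-1.0 openai package installed and an API key configured, openai.ChatCompletion.acreate could be passed in place of fake_acreate. The requests then wait on the network concurrently instead of one after another.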