1. Basics of Text Generation Models

OpenAI's text generation models, commonly referred to as Generative Pre-trained Transformers (GPT), rely on the self-attention mechanism from deep learning to understand and process natural language. Training a GPT model consists of two stages: pre-training and fine-tuning.

Pre-training

During the pre-training stage, the model undergoes unsupervised learning on a large-scale text dataset. In this process, the model is trained to predict the next word in a sequence. For example, given the sentence "I have a pen," after seeing the first few words the model attempts to predict the word "pen." The primary objective of pre-training is to teach the model the structure and semantics of language.
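
To make this objective concrete, here is a minimal sketch in plain Python (no ML library) of how a sentence can be turned into (context, next-word) training pairs. Real GPT training operates on sub-word tokens with learned parameters; this only illustrates the shape of the task.

# Minimal sketch of the next-word prediction objective. Real GPT
# training uses sub-word tokens and learned embeddings; this only
# illustrates how training pairs are formed from raw text.
sentence = "I have a pen"
words = sentence.split()

for i in range(1, len(words)):
    context = words[:i]   # the words seen so far
    target = words[i]     # the word the model must learn to predict
    print(f"context={context} -> predict '{target}'")

# context=['I'] -> predict 'have'
# context=['I', 'have'] -> predict 'a'
# context=['I', 'have', 'a'] -> predict 'pen'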

Fine-tuning

The fine-tuning stage involves supervised learning on specific tasks. Starting from the pre-trained model, fine-tuning continues training on annotated datasets so that the model adapts to particular applications such as question-answering systems and document summarization.
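
As an illustration of what such an annotated dataset can look like, the sketch below writes a small training file in the JSONL chat format that OpenAI's fine-tuning service accepts for chat models such as gpt-3.5-turbo. The two example conversations are invented placeholders.

import json

# Each training example is one complete conversation in the same
# "messages" format used by the Chat Completions API. These two
# conversations are invented placeholders.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a customer service assistant."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings, choose Account, then select Reset Password."}
    ]},
    {"messages": [
        {"role": "system", "content": "You are a customer service assistant."},
        {"role": "user", "content": "Where can I find my invoices?"},
        {"role": "assistant", "content": "Your invoices are listed under Billing on your account page."}
    ]},
]

# Fine-tuning data is uploaded as a JSONL file: one JSON object per line.
with open("training_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")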

2. Application Scenarios

OpenAI's text generation model can be applied to a wide range of scenarios. Here are some specific applications:

  • Drafting Documents: Assisting users in quickly drafting and editing documents.
  • Writing Computer Code: Generating code snippets to aid in programming.
  • Answering Questions About a Knowledge Base: Providing answers based on stored knowledge.
  • Text Analysis: Extracting text information, analyzing sentiment, and more.
  • Providing Natural Language Interface to Software: Allowing users to interact with software using natural language.
  • Tutoring in a Range of Subjects: Providing teaching guidance across multiple subjects.
  • Language Translation: Translating text between different languages.
  • Simulating Characters for Games: Generating dialogues and background stories for games or role-playing scenarios.

3. Introduction to Dialogue Model

A dialogue model is a special type of text generation model that understands and generates natural conversations through pre-training. This model can simulate a conversation between a user and a virtual assistant, suitable for real-time interactive applications.

Using a dialogue model typically involves multi-turn interactive conversation. For example, when a user asks a question, the model generates an appropriate response based on what it learned during training. The dialogue model can also maintain context, taking earlier turns of the conversation into account to produce more coherent and natural replies.

Application scenarios of dialogue model:

  • Customer Service Assistants: Automatically answer users' frequently asked questions and provide assistance and advice.
  • Chatbots: Engage in natural conversational interactions with users.
  • Virtual Assistants: Execute users' voice commands, such as scheduling meetings, setting reminders, and more.
  • Role-playing Games: Enrich the gaming experience by giving game characters unique dialogues and personalities.

4. Dialogue Model API

The Dialogue Model API allows developers to interact with the GPT model using HTTP requests. This section will introduce how to use curl to construct requests and parse the responses returned by the model.

Building Requests

Before getting started, you need to register with OpenAI and obtain an API key; this key is passed in the Authorization HTTP header of each request for authentication.

Here's an example of using curl to send a request to the Dialogue Model API:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Which team won the 2020 World Series?"
      }
    ]
  }'

Meaning of Dialogue Model Parameters

When using OpenAI's Dialogue Model API, the main parameters are "model" and "messages"; each has a specific meaning and influences the results the model produces.

Model Parameter

The model parameter specifies which model to use. For example, "model": "gpt-3.5-turbo" requests the GPT-3.5-Turbo model. The selected model responds to user input according to its capabilities, training data, and supported features.

Here are the currently supported models:

  • gpt-4-0125-preview (context: 128,000 tokens): GPT-4 Turbo preview model designed to reduce "lazy" cases in which the model fails to complete a task.
  • gpt-4-turbo-preview (context: 128,000 tokens): Currently points to gpt-4-0125-preview.
  • gpt-4-1106-preview (context: 128,000 tokens): GPT-4 Turbo model with improved instruction following, JSON mode, reproducible output, and parallel function calling.
  • gpt-4-vision-preview (context: 128,000 tokens): GPT-4 model that can understand images, in addition to all other GPT-4 Turbo capabilities.
  • gpt-4 (context: 8,192 tokens): Currently points to gpt-4-0613.
  • gpt-4-0613 (context: 8,192 tokens): GPT-4 snapshot from June 13, 2023, with improved function calling support.
  • gpt-4-32k (context: 32,768 tokens): Currently points to gpt-4-32k-0613. This model was never widely rolled out; GPT-4 Turbo is preferred.
  • gpt-4-32k-0613 (context: 32,768 tokens): Snapshot of the GPT-4 32k model from June 13, 2023. This model was never widely rolled out; GPT-4 Turbo is preferred.
  • gpt-3.5-turbo-1106 (context: 16,385 tokens): The latest GPT-3.5 Turbo model, with improved instruction following, JSON mode, reproducible output, and parallel function calling.
  • gpt-3.5-turbo (context: 4,096 tokens): Currently points to gpt-3.5-turbo-0613.
  • gpt-3.5-turbo-16k (context: 16,385 tokens): Currently points to gpt-3.5-turbo-16k-0613.
  • gpt-3.5-turbo-instruct (context: 4,096 tokens): Functionally similar to GPT-3-era models. Works with the legacy completions endpoint, not with chat completions.
  • gpt-3.5-turbo-0613 (context: 4,096 tokens): Snapshot of gpt-3.5-turbo from June 13, 2023. Will be deprecated on June 13, 2024.
  • gpt-3.5-turbo-16k-0613 (context: 16,385 tokens): Snapshot of gpt-3.5-turbo-16k from June 13, 2023. Will be deprecated on June 13, 2024.
  • gpt-3.5-turbo-0301 (context: 4,096 tokens): Snapshot of gpt-3.5-turbo from March 1, 2023. Will be deprecated on June 13, 2024.
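
Note that this list is a snapshot in time. You can query the authoritative list of models available to your account through the models endpoint (GET /v1/models). Below is a minimal sketch using Python's requests library, assuming your key is stored in the OPENAI_API_KEY environment variable.

import os
import requests

# Query the list of models available to your account.
api_key = os.environ["OPENAI_API_KEY"]
response = requests.get(
    "https://api.openai.com/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()

for model in response.json()["data"]:
    print(model["id"])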

Messages Parameter

The messages parameter is an array, where each element represents a message in the conversation. Each message is an object containing two properties: role (the role of the sender) and content (the specific content of the message).

  • role: Specifies the sender's role for the message. Optional values include "system", "user", and "assistant".
  • content: The specific content of the message.

Types and Functions of role

The value of the role parameter defines the type and function of a message. The Dialogue API treats each role differently when generating the model's response.

Role 'system'

System messages set the model's overall behavior. For example, they can explicitly specify the role the model should play (such as an assistant or a translator) or give specific instructions to follow during the conversation. A system message influences the model's behavior throughout the conversation, but it is usually optional.

For example, if you want the model to participate in the conversation as a customer service assistant, you can specify in the system message:

{
  "role": "system",
  "content": "You are a customer service assistant."
}

Role 'user'

User messages represent the questions input by the user. The model responds to these messages and provides information, answers, or other forms of output. These messages are a crucial part of the Dialogue API workflow and typically correspond to actual user inquiries in the application.

For example, here is the user message from the curl example above:

{
  "role": "user",
  "content": "Which team won the 2020 World Series?"
}

Role 'assistant'

Assistant messages usually refer to replies generated by the model, but they can also be part of the conversation history supplied by the developer to simulate the format of the model's earlier output. In API requests, assistant messages are normally omitted; include them only when you want to preset the model's answering format by placing example output in the conversation history, as shown below.
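
For instance, the following messages array (an illustrative sketch, not taken from the example above) presets one fabricated question-and-answer pair so the model imitates that answer format for the real question at the end:

# Illustrative sketch: the first question/answer pair is fabricated
# history that shows the model the expected answer format; only the
# last user message is the real question.
messages = [
    {"role": "system", "content": "You answer in a single short sentence."},
    {"role": "user", "content": "Which team won the 2019 World Series?"},
    {"role": "assistant", "content": "The Washington Nationals won the 2019 World Series."},
    {"role": "user", "content": "Which team won the 2020 World Series?"}
]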

Parsing the Response

The model's response is returned in JSON format. Here's an example of parsing the response:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The champion of the 2020 World Series is the Los Angeles Dodgers.",
        "role": "assistant"
      },
      "logprobs": null
    }
  ],
  "created": 1677664795,
  "id": "chatcmpl-7QyqpwdfhqwajicIEznoc6Q47XAyW",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 17,
    "prompt_tokens": 57,
    "total_tokens": 74
  }
}

In the response above, you can obtain the model's answer from choices[0].message.content.
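
In Python, once the response has been decoded into a dictionary, the extraction can look like the sketch below. Checking finish_reason first is a sensible habit, since a value such as "length" indicates the reply was cut off by the token limit.

# The decoded response from the example above, as a Python dictionary.
response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "The champion of the 2020 World Series is the Los Angeles Dodgers.",
                "role": "assistant"
            }
        }
    ]
}

choice = response["choices"][0]
if choice["finish_reason"] == "stop":
    print(choice["message"]["content"])
else:
    # e.g. "length" means the reply was truncated by the token limit
    print(f"Incomplete reply: finish_reason={choice['finish_reason']}")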

How to Implement a Memory Function in the Dialogue Model

Below is an example of using OpenAI's Chat Completions API to give the GPT model a memory function. It demonstrates how to include the previous conversation context (i.e., the content of the "memory") in a new API request to achieve a continuous dialogue.

import requests

api_url = "https://api.openai.com/v1/chat/completions"
api_key = "Your OpenAI API Key"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

data = {
    "model": "gpt-3.5-turbo",  # Can be replaced with gpt-4 or other available models
    "messages": [
      {
        "role": "system",  # System message, used to set the behavior of the dialogue model
        "content": "You are a help assistant."
      },
      {
        "role": "user",  # User message, the model will respond to this
        "content": "Can you tell me the main reasons for climate change?"
      },
      {
        "role": "assistant",  # Model's response
        "content": "The main reasons for climate change include greenhouse gas emissions, fossil fuel combustion, and deforestation, etc."
      },
      {
        "role": "user",  # New question based on the model's answer
        "content": "How can we reduce greenhouse gas emissions?"
      }
    ]
}

response = requests.post(api_url, headers=headers, json=data)

if response.status_code == 200:
    reply_content = response.json()['choices'][0]['message']['content']
    print(f"Model's response => {reply_content}")
else:
    print(f"Request error: {response.status_code}")

In this example, we simulate a user first asking about the main causes of climate change and then posing a follow-up question based on the model's explanation. By keeping the content of the previous exchange in the new request, we ensure that the model can remember the earlier conversation and respond in context. In other words, dialogue state is carried forward by passing the input and output of earlier turns as history messages in each new request.

Tip: Because the model has a maximum token limit, it is not feasible to pass the entire conversation history with every request. Typically, only the historical messages relevant to the current question are given to the model; a later section introduces how the Embeddings feature enables this kind of text similarity search. A simple trimming strategy is sketched below.
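
As a simple illustration, the following sketch keeps any system messages and drops the oldest dialogue turns until the history fits a rough token budget. It approximates token counts by word counts for brevity; a real implementation would use the model's tokenizer (for example, the tiktoken library) and, as noted above, often selects relevant history with embeddings instead.

# Naive sketch: drop the oldest dialogue turns until the history fits
# a rough token budget. Word counts stand in for real token counts;
# use the model's tokenizer (e.g. tiktoken) for accurate budgeting.
def trim_history(messages, budget=3000):
    def cost(msg):
        return len(msg["content"].split())  # crude token estimate

    system_msgs = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]

    total = sum(cost(m) for m in system_msgs + dialogue)
    while dialogue and total > budget:
        total -= cost(dialogue.pop(0))  # discard the oldest turn first

    return system_msgs + dialogue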

5. JSON Mode

JSON mode is a feature of the dialogue model API that instructs the model to always return a JSON object. It is suitable for scenarios that require receiving responses in JSON format.

Using JSON Mode

To use JSON mode, set the response_format field to { "type": "json_object" } in the HTTP request body and make sure the system message tells the model to output JSON. Below is a curl example that enables JSON mode:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo-1106",
    "response_format": { "type": "json_object" },
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant designed to output JSON."
      },
      {
        "role": "user",
        "content": "Which team won the 2020 World Series?"
      }
    ]
  }'

Parsing a JSON Mode Response

In JSON mode, the message content is a string containing valid JSON, so it can be parsed and used directly. Below is an example of a response that could be returned in JSON mode:

{
  "choices": [
    {
      "finish_reason": "stop",
      "message": {
        "content": "{\"winner\": \"Los Angeles Dodgers\"}"
      }
    }
  ]
}

In Python, you can use the following code to extract the content from the response:

import json

response = {
  "choices": [
    {
      "finish_reason": "stop",
      "message": {
        "content": "{\"winner\": \"Los Angeles Dodgers\"}"
      }
    }
  ]
}

response_content = json.loads(response['choices'][0]['message']['content'])

print(response_content)

The output will be:

{'winner': 'Los Angeles Dodgers'}

JSON mode provides a reliable way to ensure that responses are correctly formatted for specific use cases. It is therefore recommended to enable JSON mode in scenarios with strict requirements on the API response format.
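
Putting the pieces together, here is a sketch that sends a JSON-mode request with Python's requests library and guards the parsing step; a json.JSONDecodeError should be rare in this mode but is still worth handling. The API key is again assumed to be in the OPENAI_API_KEY environment variable.

import json
import os
import requests

# Minimal sketch: request a JSON-mode completion and parse it safely.
api_key = os.environ["OPENAI_API_KEY"]

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    json={
        "model": "gpt-3.5-turbo-1106",
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
            {"role": "user", "content": "Which team won the 2020 World Series?"},
        ],
    },
)
response.raise_for_status()

content = response.json()["choices"][0]["message"]["content"]
try:
    result = json.loads(content)  # e.g. {'winner': 'Los Angeles Dodgers'}
    print(result)
except json.JSONDecodeError:
    # Should be rare in JSON mode, but fail gracefully anyway.
    print(f"Model returned non-JSON content: {content}")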