1. Introduction to the OpenAI Moderation API
1.1. Overview of the OpenAI Moderation API
OpenAI provides a content moderation API designed to help developers quickly and accurately identify and filter content that violates its usage policies. The API uses machine learning models to analyze text in real time, identifying potential hate speech, harassment, explicit sexual content, and other policy violations, and returning a clear per-category classification and verdict.
1.2. Description of Content Categories
The OpenAI Moderation API breaks inappropriate content into multiple categories so that different types of violations can be handled separately. The specific categories are explained below (a sketch of how these category keys might be mapped to application-side handling rules follows the list):
- hate: Hate speech based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
- hate/threatening: Hateful content that also includes violence or serious harm threats against the aforementioned groups.
- harassment: Content that promotes or encourages harassing language toward any target.
- harassment/threatening: Harassment content that also includes violence or serious harm threats against any target.
- self-harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
- self-harm/intent: Content in which the speaker indicates that they are engaging in, or intend to engage in, acts of self-harm.
- self-harm/instructions: Content that encourages self-harm or provides instructions or advice on how to carry it out.
- sexual: Content intended to arouse sexual excitement, such as descriptions of sexual activity, or content that promotes sexual services (excluding sexual education and health).
- sexual/minors: Sexual content involving individuals under 18 years of age.
- violence: Content depicting death, violence, or physical injury.
- violence/graphic: Content depicting death, violence, or physical injury in graphic detail.
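As a rough sketch of how these category keys might be used in an application, the following Python mapping attaches a hypothetical handling rule to each category identifier. The rules ("block", "review", "escalate") and the mapping itself are illustrative assumptions, not part of the API.

# Hypothetical mapping from moderation category key to an application-side action.
# The actions are illustrative only; choose rules that fit your own policy.
CATEGORY_ACTIONS = {
    "hate": "block",
    "hate/threatening": "block",
    "harassment": "review",
    "harassment/threatening": "block",
    "self-harm": "review",
    "self-harm/intent": "escalate",
    "self-harm/instructions": "block",
    "sexual": "review",
    "sexual/minors": "block",
    "violence": "review",
    "violence/graphic": "block",
}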
2. Using the OpenAI Moderation API
To call the OpenAI Moderation API, you can make an HTTP request with a command-line tool such as cURL. Here's a simple example:
curl https://api.openai.com/v1/moderations \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{"input": "Here is some sample text"}'
In the above command, replace $OPENAI_API_KEY with your actual OpenAI API key, and replace "Here is some sample text" in the input field with the actual text you want to moderate.
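For application code, the same request can be made from Python. The sketch below is only an illustration of the HTTP call shown above: it assumes the third-party requests package is installed and that your API key is available in the OPENAI_API_KEY environment variable.

import os
import requests

# Read the API key from the environment (assumed to be set beforehand).
api_key = os.environ["OPENAI_API_KEY"]

response = requests.post(
    "https://api.openai.com/v1/moderations",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    json={"input": "Here is some sample text"},
    timeout=30,
)
response.raise_for_status()

# Parse the JSON body; its structure is shown in the sample response below.
result = response.json()
print(result["results"][0]["flagged"])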
After calling the API, you will receive a response structured similarly to the following:
{
  "id": "modr-XXXXX",
  "model": "text-moderation-007",
  "results": [
    {
      "flagged": true,
      "categories": {
        "sexual": false,
        "hate": false,
        "harassment": false,
        "self-harm": false,
        "sexual/minors": false,
        "hate/threatening": false,
        "violence/graphic": false,
        "self-harm/intent": false,
        "self-harm/instructions": false,
        "harassment/threatening": true,
        "violence": true
      },
      "category_scores": {
        "sexual": 1.2282071e-06,
        "hate": 0.010696256,
        "harassment": 0.29842457,
        "self-harm": 1.5236925e-08,
        "sexual/minors": 5.7246268e-08,
        "hate/threatening": 0.0060676364,
        "violence/graphic": 4.435014e-06,
        "self-harm/intent": 8.098441e-10,
        "self-harm/instructions": 2.8498655e-11,
        "harassment/threatening": 0.63055265,
        "violence": 0.99011886
      }
    }
  ]
}
In the API response, the flagged field indicates whether the content violates OpenAI's usage policies. The categories field contains a boolean flag per category indicating whether the content violates that category, and the category_scores field provides the model's confidence score for each category. Higher scores indicate a higher likelihood of violation, but note that these scores should not be interpreted as probabilities.
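Continuing the hypothetical Python example above, the snippet below shows one way to read these fields from the parsed response; the field names follow the sample response shown earlier.

# 'result' is the parsed JSON response from the earlier request.
moderation = result["results"][0]

if moderation["flagged"]:
    # Collect the categories whose boolean flag is set.
    violated = [name for name, hit in moderation["categories"].items() if hit]
    print("Flagged categories:", ", ".join(violated))

    # Inspect the raw scores for the flagged categories.
    for name in violated:
        print(f"{name}: {moderation['category_scores'][name]:.4f}")
else:
    print("No policy violations detected.")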
It should be noted that OpenAI continuously updates the model behind the Moderation API, which means that custom policies relying on category_scores may need ongoing recalibration over time.
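If you do build a custom policy on top of category_scores, keeping the thresholds in one easily adjustable place makes that recalibration simpler. The sketch below is a minimal example of such a policy; the threshold values are purely illustrative assumptions, not values recommended by OpenAI.

# Illustrative per-category thresholds for a custom policy; the numbers are
# assumptions and should be tuned (and re-tuned as the underlying model changes).
CUSTOM_THRESHOLDS = {
    "violence": 0.8,
    "harassment/threatening": 0.5,
    "hate": 0.4,
}

def custom_policy_violations(category_scores: dict) -> list[str]:
    """Return the categories whose score meets or exceeds our custom threshold."""
    return [
        name
        for name, threshold in CUSTOM_THRESHOLDS.items()
        if category_scores.get(name, 0.0) >= threshold
    ]

# Example, using scores taken from the sample response above:
sample_scores = {
    "violence": 0.99011886,
    "harassment/threatening": 0.63055265,
    "hate": 0.010696256,
}
print(custom_policy_violations(sample_scores))  # ['violence', 'harassment/threatening']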