Day 7: OpenAI GPT Series
OpenAI’s GPT series is the model family that opened the LLM era. ChatGPT brought AI to the general public, and through the API, GPT models have become the most widely used among developers.
OpenAI Model Selection Guide (as of April 2026)
| Model Family | Recommended For | Notes |
|---|---|---|
| GPT-5 series | Complex reasoning, agentic workflows | Accuracy first, Responses API recommended |
| GPT-4.1 series | General text/coding, quality-cost balance | Default choice for general backend APIs |
| GPT-4o / GPT-4o mini | Multimodal (text+image+voice), low latency | Ideal for real-time/interactive scenarios |
Pricing and supported models change frequently, so always check the official pages first rather than memorizing a fixed table.
- API pricing: https://platform.openai.com/pricing
- Model documentation: https://developers.openai.com/api/docs/models
Basic Responses API Usage (Recommended)
```python
# pip install openai
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # setting the OPENAI_API_KEY environment variable is recommended

# Basic text generation (Responses API)
response = client.responses.create(
    model="gpt-4.1-mini",
    input=[
        {"role": "system", "content": "You are a friendly AI tutor."},
        {"role": "user", "content": "What is a Transformer? Please explain briefly."},
    ],
    temperature=0.7,        # creativity control (0 = deterministic, 2 = very random)
    max_output_tokens=500,
)

print(response.output_text)
print(f"Tokens used: input={response.usage.input_tokens}, "
      f"output={response.usage.output_tokens}")
```
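Since the response object reports token usage, you can estimate what a call cost. A minimal sketch — the per-million-token prices below are placeholders, not current OpenAI rates, so substitute figures from the official pricing page:

```python
# Estimate the cost of one API call from its token usage.
# NOTE: the prices below are illustrative placeholders, NOT current
# OpenAI rates -- always take real values from the pricing page.
PLACEHOLDER_PRICES = {
    # model: (USD per 1M input tokens, USD per 1M output tokens)
    "gpt-4.1-mini": (0.40, 1.60),  # placeholder figures
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single call."""
    in_price, out_price = PLACEHOLDER_PRICES[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price

# Example: feed in the usage numbers printed by the previous snippet.
cost = estimate_cost("gpt-4.1-mini", input_tokens=45, output_tokens=320)
print(f"Estimated cost: ${cost:.6f}")
```

The same helper is handy for the cost-estimation exercise at the end of this lesson.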
Streaming Response Handling (Chat Completions Compatible Example)
```python
from openai import OpenAI

client = OpenAI()

# Streaming: print tokens as they are generated
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Tell me 3 advantages of Python."},
    ],
    stream=True,  # enable streaming
)

full_response = ""
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        full_response += content

print(f"\n\nTotal response length: {len(full_response)} characters")
```
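The accumulation pattern itself doesn't depend on the network, so you can rehearse it offline. In this sketch, `SimpleNamespace` objects merely mimic the shape of the SDK's stream chunks — they are stand-ins, not the real types:

```python
from types import SimpleNamespace

def make_chunk(content):
    """Build a stand-in object shaped like a Chat Completions stream chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

# A fake stream: text arrives in fragments; chunks without text
# (content=None) also occur, so they must be skipped.
fake_stream = [make_chunk("Py"), make_chunk("thon"), make_chunk(None)]

full_response = ""
for chunk in fake_stream:
    content = chunk.choices[0].delta.content
    if content:                      # guard against chunks with no text
        full_response += content

print(full_response)  # -> Python
```

The `if content:` guard is the important detail: concatenating `None` would raise a `TypeError`.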
Multi-turn Conversation and System Prompt (Chat Completions Compatible Example)
```python
from openai import OpenAI

client = OpenAI()

conversation = [
    {"role": "system", "content": "You are a Python expert. Include code examples in your answers."},
]

def chat(user_message):
    conversation.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation,
        temperature=0.3,
    )
    assistant_message = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message

# Multi-turn conversation
print(chat("What is list comprehension?"))
print("---")
print(chat("Can I use that with dictionaries too?"))
# Remembers previous conversation context and responds accordingly
```
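Note that the full `conversation` list is resent on every call, so input tokens grow with each turn. A common mitigation is to keep the system prompt plus only the most recent turns. A minimal sketch — `trim_conversation` and its `keep_turns` parameter are names invented for this example:

```python
def trim_conversation(conversation, keep_turns=5):
    """Keep the system message(s) plus the last `keep_turns` user/assistant turns."""
    system = [m for m in conversation if m["role"] == "system"]
    rest = [m for m in conversation if m["role"] != "system"]
    return system + rest[-keep_turns * 2:]  # each turn = one user + one assistant message

# Example: a 50-turn history shrinks to the system prompt + 5 recent turns.
history = [{"role": "system", "content": "You are a Python expert."}]
for i in range(50):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_conversation(history, keep_turns=5)
print(len(trimmed))  # 1 system message + 10 recent messages
```

Dropping old turns loses context, of course; summarizing them into a single message is a common refinement.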
Practical Tips
| Scenario | Recommended Model | Reason |
|---|---|---|
| Simple classification, extraction | Lightweight model (e.g., 4o mini class) | Fast and cheap |
| Complex reasoning, analysis | Top-tier reasoning model (e.g., GPT-5/4.1 upper) | Accuracy first |
| Prototyping | Lightweight model | Cost savings |
| Image analysis | Multimodal model | Image input processing |
| Large batch processing | Lightweight model + Batch API | Cost efficiency |
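For large batch processing, the Batch API takes a JSONL file in which each line is one self-contained request. A sketch of building such a file — the line format follows the Batch API documentation, while the model name and questions are just examples:

```python
import json

# Each line of a Batch API input file is one JSON request object with
# a custom_id (echoed back in the results), method, url, and body.
questions = ["What is a list?", "What is a dict?", "What is a tuple?"]

lines = []
for i, q in enumerate(questions):
    request = {
        "custom_id": f"req-{i}",            # your own ID for matching results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",         # lightweight model suits batch work
            "messages": [{"role": "user", "content": q}],
        },
    }
    lines.append(json.dumps(request))

jsonl = "\n".join(lines)
print(jsonl.splitlines()[0])
# The file would then be uploaded with client.files.create(purpose="batch")
# and the job submitted with client.batches.create(...).
```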
Temperature setting guide: 0 for classification/extraction, 0.7 for general conversation, 1.0-1.5 for creative writing.
Today’s Exercises
- Get an OpenAI API key, choose a lightweight model, and send the question “What is the capital of South Korea?” Check the number of tokens used and estimate the cost.
- Send the same question 5 times each with temperature 0, 0.7, and 1.5, and compare the diversity of responses.
- Calculate how token costs increase as conversations grow longer in multi-turn dialogue. What problems arise when the conversation reaches 50 turns?