When integrating AI, have you ever asked it not to respond conversationally so you could get just the plaintext response you need? Or had it respond with just YES or NO, which you'd then parse to make a decision? Or run multiple prompts against the same content, one to extract a summary and another to extract a set of keywords? Maybe you got fancy and had it respond in a special format you would parse, so it could return multiple fields that are contextually aware of each other.

No more!

On August 6th, OpenAI announced Structured Outputs for its GPT-4o series models and beyond, which lets you specify a schema for the response that the model will actually follow.

Why is this big news?

OpenAI is not the first to do this. .txt has an open-source project called Outlines that does this and does it well, but it requires another library. Anthropic has "Guardrails" and attempts to follow the defined schema, but as of this writing, it isn't as explicit and seemingly relies on the LLM itself for typing and adherence. OpenAI's is BUILT INTO THE API with a well-defined spec, so adoption could not be easier, and we're just starting to see its impact come to market as developers integrate it.

What can you do with it?

Most simply, you can easily remove the conversational aspect of the LLM and just get sweet sweet data back out of the API.

Here's the simplest example, where we'll just ask for a list of cities. Remember that before, the model might respond conversationally and you might need to split the response by newline.

The prompt I might have used before:

Give me a list of 5 smaller cities in the US, respond with just the list of cities, with just one city on each line. No conversational remarks or bullet formatting.
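And even then, the parsing on my side was brittle. Here's a minimal sketch of the kind of hypothetical cleanup code the prompt-only approach required:

def parse_city_list(raw: str) -> list[str]:
    # Hypothetical cleanup for the prompt-only approach: drop blank
    # lines and strip any stray bullets the model added anyway.
    return [line.strip("-*• ").strip() for line in raw.splitlines() if line.strip()]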

Now, I can specify the response format and not worry about the additional instructions. Here's an example of this in Python. If you don't code, just take a look; it's pretty simple:

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

# The schema the response must follow: just a list of city names.
class CitiesResponse(BaseModel):
    cities: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "give me a list of 5 smaller cities in the US"}
    ],
    response_format=CitiesResponse,  # pass the Pydantic model directly
)

cities = completion.choices[0].message.parsed  # a CitiesResponse instance

And I'll get back a response like:

{
    "cities": [
        "Asheville, North Carolina",
        "Bend, Oregon",
        "Santa Fe, New Mexico",
        "Bozeman, Montana",
        "Sarasota, Florida"
    ]
}

This allows me to skip the whole format-explanation part of the prompt, which can be challenging to articulate and to cover all the edge cases of what the LLM might do. It also allows me to do more without examples, because I no longer need them to define the response format.
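And because parsed comes back as an instance of the Pydantic model (or None if the model refuses; the API surfaces a refusal message in that case), there's nothing left to parse by hand. Continuing the example above:

# cities is a typed CitiesResponse, so no splitting or cleanup is needed.
for city in cities.cities:
    print(city)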

Here's a more complex and real-world example where we ask the AI for a recipe, which we could use to make a recipe book:

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

# Nested models work too: steps and ingredients come back as lists
# of fully typed objects.
class Step(BaseModel):
    title: str
    description: str

class Ingredient(BaseModel):
    name: str
    amount: str

class RecipeResponse(BaseModel):
    title: str
    brief_description: str
    descriptive_imagery_prompt: str
    steps: list[Step]
    ingredients: list[Ingredient]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Provide a recipe for a vanilla birthday cake with chocolate frosting"}
    ],
    response_format=RecipeResponse,
)

recipe = completion.choices[0].message.parsed

And I get back this:

{
  "title": "Vanilla Birthday Cake with Chocolate Frosting",
  "brief_description": "Celebrate life's special moments with this delightful vanilla birthday cake topped with a rich, creamy chocolate frosting. Perfect for any occasion.",
  "descriptive_imagery_prompt": "Visualize a beautifully moist vanilla birthday cake covered with a luscious layer of chocolate frosting, sprinkled with delicate candles and garnished with bright, colorful sprinkles.",
  "steps": [
    {
      "title": "Step 1: Preheat the Oven",
      "description": "Preheat your oven to 350\u00b0F (175\u00b0C). Grease and flour two 9-inch round cake pans."
    },
    {
      "title": "Step 2: Mix Dry Ingredients",
      "description": "In a large bowl, whisk together 2 \u00bd cups all-purpose flour, 2 \u00bd teaspoon baking powder, and \u00bd teaspoon salt."
    },
    {
      "title": "Step 3: Cream Butter and Sugar",
      "description": "In another bowl, with an electric mixer, beat 1 cup unsalted butter with 1 \u00be cups granulated sugar on medium-high speed until light and fluffy\u2014about 3 minutes."
    },
    {
      "title": "Step 4: Add Eggs and Vanilla",
      "description": "Add 4 large eggs, one at a time, beating well after each addition. Mix in 1 tablespoon pure vanilla extract."
    },
    {
      "title": "Step 5: Combine Wet and Dry Ingredients",
      "description": "Gradually add the flour mixture to the butter mixture, alternating with 1 cup whole milk. Start and end with flour. Mix just until combined."
    },
    {
      "title": "Step 6: Divide and Bake",
      "description": "Divide the batter evenly between the prepared cake pans. Smooth the top with a spatula. Bake for 25-30 minutes or until a toothpick inserted into the center comes out clean."
    },
    {
      "title": "Step 7: Cool the Cakes",
      "description": "Allow the cakes to cool in the pans for 10 minutes, then turn them out onto a wire rack to cool completely."
    },
    {
      "title": "Step 8: Prepare Chocolate Frosting",
      "description": "Melt 1 cup (2 sticks) unsalted butter in a saucepan over medium heat. Stir in 2/3 cup unsweetened cocoa powder until smooth. Remove from heat and add 5 cups powdered sugar, \u00bd cup milk, and 2 teaspoons vanilla extract. Beat on low speed until well blended."
    },
    {
      "title": "Step 9: Frost the Cake",
      "description": "Place one cooled cake layer on a serving platter or cake stand. Using a spatula, spread a layer of frosting over the top. Place the second cake layer on top and spread frosting over the entire cake."
    },
    {
      "title": "Step 10: Decorate and Serve",
      "description": "Decorate with sprinkles or toppings of your choice. Allow the cake to set for 30 minutes before serving."
    }
  ],
  "ingredients": [
    {
      "name": "All-purpose flour",
      "amount": "2 \u00bd cups"
    },
    {
      "name": "Baking powder",
      "amount": "2 \u00bd teaspoons"
    },
    {
      "name": "Salt",
      "amount": "\u00bd teaspoon"
    },
    {
      "name": "Unsalted butter",
      "amount": "1 cup (2 sticks), softened"
    },
    {
      "name": "Granulated sugar",
      "amount": "1 \u00be cups"
    },
    {
      "name": "Eggs",
      "amount": "4 large"
    },
    {
      "name": "Pure vanilla extract",
      "amount": "1 tablespoon"
    },
    {
      "name": "Whole milk",
      "amount": "1 cup"
    },
    {
      "name": "Unsweetened cocoa powder",
      "amount": "2/3 cup"
    },
    {
      "name": "Powdered sugar",
      "amount": "5 cups"
    },
    {
      "name": "Milk (for frosting)",
      "amount": "1/2 cup"
    },
    {
      "name": "Additional butter (for frosting)",
      "amount": "1 cup (2 sticks), melted"
    }
  ]
}

There's a ton I can do with that. I can send the list of ingredients to my Grocery List Automation, send it to ClickList so it's ready for me to pick up, or simply lay it all out in a recipe book or on a website.
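As a minimal sketch using the RecipeResponse from above (the helper name is my own invention, not part of any API), turning the parsed recipe into a plain-text shopping list takes just a few lines:

def format_shopping_list(recipe: RecipeResponse) -> str:
    # Build a plain-text shopping list from the typed recipe object.
    lines = [f"Shopping list for {recipe.title}:"]
    lines += [f"- {item.amount} {item.name}" for item in recipe.ingredients]
    return "\n".join(lines)

print(format_shopping_list(recipe))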

Note the brief_description field. You can use field names like this to add emphasis to an explanation; the LLM will know that the description should be brief, not multiple paragraphs.

Future Opportunities

These examples illustrate how to put this amazing functionality to practical use. Let me list a few other opportunities: things that were tough before.

1. Decision Making

Give the AI a list of options and let it decide which ones to take. This was possible before, but not with the well-structured guarantees OpenAI now provides.

I've worked this into PyroPrompts. You may have noticed that there is a Workflow Step Type for "Run Workflows", which is in Beta. I'm using Structured Outputs to decide which additional Workflows to run and with what parameters. This opens up the agentic possibilities of Workflows and I'm really excited about that.
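Here's a minimal sketch of what a decision schema might look like; the workflow names and parameters are hypothetical, not PyroPrompts' actual ones:

from enum import Enum

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class WorkflowName(str, Enum):
    # Hypothetical workflow names, for illustration only.
    summarize = "summarize"
    extract_keywords = "extract_keywords"
    draft_reply = "draft_reply"

class Parameter(BaseModel):
    name: str
    value: str

class WorkflowCall(BaseModel):
    workflow: WorkflowName
    # Structured Outputs rejects open-ended dicts, so parameters are
    # modeled as a list of name/value pairs instead.
    parameters: list[Parameter]

class Decision(BaseModel):
    workflows_to_run: list[WorkflowCall]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Summarize this email thread and draft a reply: ..."}
    ],
    response_format=Decision,
)

decision = completion.choices[0].message.parsed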

2. Dynamic Websites and Apps

You can define a schema for your page: blocks with fields like headers, content, links, etc., and have the AI decide which elements to add. It'll just build the page for you, based on your prompt input.

I'm convinced that this is actually why we've seen a recent explosion of AI-based site and app builders.

This use case is actually mentioned in OpenAI's official documentation on Structured Outputs, so maybe I cheated with this one.
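To make that concrete, here's a hedged sketch of a page schema; the block types and field names are invented for illustration:

from typing import Literal

from pydantic import BaseModel

class Block(BaseModel):
    # One block per page section; block_type tells the frontend how
    # to render the content.
    block_type: Literal["header", "paragraph", "link", "image"]
    content: str

class Page(BaseModel):
    title: str
    blocks: list[Block]

Pass Page as the response_format, exactly as in the examples above, and render whatever comes back.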

3. API Interaction

Many websites and applications use an API (Application Programming Interface), which allows them to talk to each other. AI can now more easily reason, with well-formed types, about what it can send to an API. This will help AI connect to services that let you buy a pizza, schedule a tee time, or order an Uber.
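For example, here's a sketch of a schema mirroring a hypothetical pizza-ordering API's request body; none of these field names come from a real service:

from typing import Literal

from pydantic import BaseModel

class PizzaOrder(BaseModel):
    # Hypothetical request body for a pizza-ordering API.
    size: Literal["small", "medium", "large"]
    crust: Literal["thin", "regular", "deep_dish"]
    toppings: list[str]
    delivery_address: str

Ask the model to fill a PizzaOrder from the user's message, validate it, and POST it to the service.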

AI Actions could someday (probably not any time soon) replace the way we use many apps on phones and websites.

This also bolsters the Action Pillar of AI, one of the keys to future AI integration and adoption.

4. Form Filling

Do you hate filling in forms with the same stuff? Name, date of birth, address, and so on. I want to take a form I need filled, scan it, let an AI that knows me fill in the details, and then print it back onto the page. One prompt extracts the fields from the form; another determines the answers to those fields based on your preferences.
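Here's a sketch of the two schemas that two-prompt flow might use, with illustrative field names:

from pydantic import BaseModel

class FormField(BaseModel):
    label: str
    hint: str  # e.g. the expected format, like "date as MM/DD/YYYY"

class ExtractedForm(BaseModel):
    # Prompt 1: what fields does the scanned form ask for?
    fields: list[FormField]

class FilledField(BaseModel):
    label: str
    value: str

class FilledForm(BaseModel):
    # Prompt 2: fill each field from what the AI knows about you.
    fields: list[FilledField]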

Even with online forms, an extension should be able to fill in most of them for me with a single click. Maybe PyroPrompts will update its extension to support this.

5. Web Scraping

Let the AI read the HTML and choose the next step: maybe a click, a hover somewhere, saving an image, or copying text. This could be done before, but managing the prompts and actions was a challenge; Structured Outputs makes it easier to pull out the data you want. And getting data is going to be key in allowing our AIs to use up-to-date information, like a business's operating hours or the prices of products in an online store. Until information transitions to more machine-friendly formats, scraping websites will continue to dominate data retrieval.
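Here's a hedged sketch of an action schema for this, where the model reads the page and picks its next move (the action set is illustrative):

from typing import Literal

from pydantic import BaseModel

class NextAction(BaseModel):
    # The model reads the HTML and chooses a single next step.
    action: Literal["click", "hover", "save_image", "copy_text", "done"]
    css_selector: str  # where on the page to act; empty when done
    extracted_text: str  # filled in for copy_text, otherwise empty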

What will you use Structured Outputs for?

I'm working on incorporating some of these into PyroPrompts. Three things on my mind:

  1. Calling Workflows on the fly. As I mentioned above, this is in Beta. Reach out for access or wait for General Availability.

  2. Structured Outputs of LLMs so you can generate marketing content with the exact title, description, and body fields you want. Coming soon!

  3. More flexible UI interactivity. Asking the user for input in a format that makes sense. Coming soon!

What do you want to do with Structured Outputs? Let me know on LinkedIn or X or email me at matt [at] pyroprompts.com.