Cameron Yule
LLM Structured Output
While integrating an LLM into a Python application, I realised that the default response format of unstructured text wasn’t always practical. This led me to discover that most current models and tooling support requesting structured output, such as JSON. For example, see Google Gemini’s structured output support.
The tool I’m using for LLM integration — the Python API of Simon Willison’s excellent llm library — has support for structured output via schemas; however, the llm-mlx plugin I was using for local model access did not. (MLX is an Apple framework for running models on Apple Silicon, typically giving higher performance.)
The solution was to migrate to llm-ollama for local model access, which integrates with Ollama’s built-in structured output support. The trade-off with this change is reduced performance at inference time, but in practice this hasn’t been an issue for my use case.
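For reference, Ollama’s structured output support can also be called directly with the official ollama Python client, independent of llm. The sketch below is purely illustrative and not part of my setup; it assumes a recent ollama client (0.4 or later) and reuses the same model and schema as the example that follows.

from ollama import chat
from pydantic import BaseModel

class SuggestedTags(BaseModel):
    tags: list[str]

# Ollama accepts a JSON schema via the `format` parameter and constrains
# the model's response to match it.
response = chat(
    model="gemma3:12b-it-qat",
    messages=[{"role": "user", "content": "Suggest up to 5 lowercase, single-word tags for: [SNIP]"}],
    format=SuggestedTags.model_json_schema(),
)

print(response.message.content)  # JSON string matching the SuggestedTags schema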
Here’s a quick example integration in Python:
import json

import llm
from pydantic import BaseModel

# The text to tag (snipped for brevity).
text = """
[SNIP]
"""

# Pydantic model describing the structured output we want back.
class SuggestedTags(BaseModel):
    tags: list[str]

model = llm.get_model("gemma3:12b-it-qat")

prompt = (
    "Based on the following text, suggest relevant tags to assist future information retrieval. "
    "Use lowercase letters only, no numbers. "
    "Use single words only (e.g. ai, health, networking, photography). "
    "Return at most 5 tags. "
    f"{text}"
)

# Passing the Pydantic model as a schema requests JSON output matching it.
response = model.prompt(prompt, schema=SuggestedTags)
response_text = response.text()
tags = json.loads(response_text)

print(response_text)
Running this yielded the following output from gemma3:12b-it-qat:
{
  "tags": [
    "ai",
    "docs",
    "agents",
    "engineering",
    "llm"
  ]
}
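Since the response is constrained to the SuggestedTags schema, it can also be parsed straight back into the Pydantic model rather than handled as a plain dict. This isn’t part of the example above, just a possible follow-up, continuing from the response_text variable:

# Validate the JSON against the schema and get typed access to the tags.
suggested = SuggestedTags.model_validate_json(response_text)
print(suggested.tags)  # ['ai', 'docs', 'agents', 'engineering', 'llm']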