Implementing structured outputs as a feature for any LLM
Structured output makes an LLM far easier to build on: instead of parsing free-form prose, you get data you can validate and feed straight into your code. OpenAI supports this natively.
Getting structured output from a model that doesn't support it is trickier. Function calling APIs exist in models like GPT-4, but they're not available in all models. And even when they are, the output isn't always guaranteed to be valid JSON.
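For example, a model asked for JSON might reply with something like this (an illustrative response, not real output):

Sure! Here's a movie recommendation in JSON format:

{
  "title": "Inception",
  "year": "2010"
}

Let me know if you'd like another one!

The JSON is buried in prose, and year came back as a string; a robust parser has to handle both problems.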
Let's build a simple but reliable JSON parser that:
- Uses Zod to validate the structure
- Retries on failure, feeding validation errors back to the model
- Works with local models via Ollama
Setting up
First, let's install our dependencies:
npm install zod async-retry
We'll also need Ollama running locally with llama3.2. If you haven't already:
ollama pull llama3.2
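You can sanity-check that Ollama is up with a quick request to its generate endpoint (assuming the default port):

curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Say hi", "stream": false}'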
The Parser Implementation
Here's the implementation: a self-correcting JSON parser that keeps retrying until the model returns valid JSON matching our schema:
import { z } from "zod";
import retry from "async-retry";

interface ParserOptions {
  maxRetries?: number;
  schema: z.ZodSchema;
  prompt: string;
}

async function callOllama(prompt: string) {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2",
      prompt: prompt,
      stream: false,
    }),
  });
  const data = await response.json();
  return data.response;
}

export async function parseWithRetry({
  maxRetries = 3,
  schema,
  prompt,
}: ParserOptions) {
  // Keep the last error around so we can feed it back into the next prompt
  let lastError: Error | undefined;

  return retry(
    async (bail, attempt) => {
      try {
        // If this isn't the first attempt, add the previous error to the prompt
        const fullPrompt =
          attempt === 1
            ? prompt
            : `${prompt}\n\nPrevious attempt failed with error: ${lastError?.message}. Please fix and try again.`;

        const response = await callOllama(fullPrompt);

        // Extract JSON from the response
        const jsonMatch = response.match(/\{[\s\S]*\}/);
        if (!jsonMatch) {
          throw new Error("No JSON found in response");
        }

        const parsed = JSON.parse(jsonMatch[0]);
        return schema.parse(parsed);
      } catch (error) {
        lastError = error as Error;
        // Stop after maxRetries total attempts
        if (attempt === maxRetries) {
          bail(error as Error);
          return;
        }
        throw error;
      }
    },
    {
      retries: maxRetries,
      factor: 1,
      minTimeout: 100,
      maxTimeout: 1000,
    }
  );
}
Using the Parser
Let's try it out with a simple movie recommendation schema:
import { z } from "zod";
import { parseWithRetry } from "./parser";

const MovieSchema = z.object({
  title: z.string(),
  year: z.number(),
  rating: z.number().min(0).max(10),
  genres: z.array(z.string()),
});

async function main() {
  const prompt = `
Give me a movie recommendation in JSON format with the following structure:
- title (string)
- year (number)
- rating (number between 0-10)
- genres (array of strings)

Return only the JSON, no other text.

Here's an example of valid JSON:
${JSON.stringify({
  title: "The Dark Knight",
  year: 2008,
  rating: 9.0,
  genres: ["Action", "Crime", "Drama"],
})}
`;

  try {
    const result = await parseWithRetry({
      schema: MovieSchema,
      prompt,
      maxRetries: 3,
    });
    console.log("Parsed result:", result);
  } catch (error) {
    console.error("Failed after all retries:", error);
  }
}

main();
When you run this, you might see something like:
{
  "title": "Inception",
  "year": 2010,
  "rating": 8.8,
  "genres": ["Science Fiction", "Action", "Thriller"]
}
How it Works
- The parser takes a Zod schema, a prompt, and optional retry settings
- It sends the prompt to Ollama's llama3.2 model
- It extracts JSON from the response with a regex
- It validates the extracted JSON against the Zod schema
- If validation fails, it retries with the error message appended to the prompt (see the example below)
- This continues until we get valid JSON or hit the retry limit
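Concretely, if the model returns a rating of 11, the second attempt's prompt will end with roughly this (Zod's error message is a JSON list of issues):

Previous attempt failed with error: [
  {
    "code": "too_big",
    "maximum": 10,
    "path": ["rating"],
    "message": "Number must be less than or equal to 10"
  }
]. Please fix and try again.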
Improving the Parser
This approach works best with simpler schemas; complex nested structures tend to need more retries. In our experience, breaking the schema into smaller chunks and handling each chunk in a separate call works well.
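As a rough sketch, you might split a bigger movie schema into two smaller ones and merge the validated results (CastSchema and the two prompts here are hypothetical):

const CastSchema = z.object({
  director: z.string(),
  leadActors: z.array(z.string()),
});

// Two simpler calls are more likely to validate than one big one
const details = await parseWithRetry({ schema: MovieSchema, prompt: detailsPrompt });
const cast = await parseWithRetry({ schema: CastSchema, prompt: castPrompt });

// Merge the validated chunks into the full object
const movie = { ...details, ...cast };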
In other cases, providing few-shot examples of valid and invalid JSON can help the model understand the schema better.
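For instance, the prompt might contrast a valid and an invalid object (both examples are illustrative):

const prompt = `
Give me a movie recommendation as JSON.

Valid example:
{ "title": "Heat", "year": 1995, "rating": 8.3, "genres": ["Crime", "Thriller"] }

Invalid example (year must be a number, genres must be an array):
{ "title": "Heat", "year": "1995", "rating": 8.3, "genres": "Crime" }

Return only the JSON, no other text.
`;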
If you're using Claude, you can also prefill the start of its response to force the output into the correct shape:
[
  {
    "role": "user",
    "content": "What is your favorite color? Output only the JSON, no other text."
  },
  {
    "role": "assistant",
    "content": "{ \"color\":"
  }
]
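The completion continues from the prefilled text, so remember to prepend the prefill before parsing. A minimal sketch with the Anthropic SDK (the model name and max_tokens are placeholder choices):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const prefill = '{ "color":';

const message = await client.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 256,
  messages: [
    {
      role: "user",
      content: "What is your favorite color? Output only the JSON, no other text.",
    },
    // Prefilling the assistant turn forces the reply to continue from here
    { role: "assistant", content: prefill },
  ],
});

// Stitch the prefill back onto the completion before parsing
const block = message.content[0];
const text = block.type === "text" ? block.text : "";
const parsed = JSON.parse(prefill + text);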
To run this example on your own machine, clone the repo.