Introducing L1M

L1M is a simple open-source API for extracting structured data from text and images using LLMs.

Inferable Team on 27-02-2025

When we first started using LLMs to extract structured data from unstructured sources, we were excited by the potential: in principle, we could parse any document or image and convert it into usable JSON.

However, we quickly discovered that as powerful as these models are, extracting clean structured data is surprisingly challenging. Specifically, current approaches fail in several predictable ways:

  1. Complex prompt engineering is required to get models to output valid JSON
  2. Multiple API calls are often needed to refine and validate the output
  3. Results vary significantly between runs, even with identical inputs
  4. Many solutions are tightly coupled to specific LLM providers

Common Issues with Data Extraction from LLMs

For example, consider the following simple task:

Extract the event and year from this text:
"A particularly severe crisis in 1907 led Congress to enact the Federal Reserve Act in 1913"

Despite its simplicity, the process of getting clean JSON can be frustrating:

  1. The model might output markdown or poorly formatted JSON
  2. You might need to parse the response and handle edge cases
  3. The extracted data might miss critical fields or add unexpected ones
  4. You need different prompting strategies for different types of content

A schema-first approach with standardized extraction can avoid these pitfalls by explicitly defining the expected output structure. This is exactly what L1M provides.

Our Insight

We think the perfect abstraction is one where you define your schema once and the extraction "just works" regardless of the source or provider, whether that's text, images, or other unstructured data.

Enter L1M

Therefore, we built L1M (pronounced "el-one-em").

L1M is an open-source API that takes a schema-first approach to data extraction. You simply define your JSON schema, provide your unstructured input, and get back exactly the structured data you need.

curl -X POST https://api.l1m.io/structured \
-H "Content-Type: application/json" \
-H "X-Provider-Url: demo" \
-H "X-Provider-Key: demo" \
-H "X-Provider-Model: demo" \
-d '{
  "input": "A particularly severe crisis in 1907 led Congress to enact the Federal Reserve Act in 1913",
  "schema": {
    "type": "object",
    "properties": {
      "year": { "type": "number" },
      "event": { "type": "string" }
    }
  }
}'
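
If the request succeeds, the response body contains JSON that conforms to the schema. The exact values depend on the model (the input mentions both 1907 and 1913), and the surrounding response envelope may differ, but the extracted object looks roughly like this:

{
  "year": 1913,
  "event": "Congress enacted the Federal Reserve Act"
}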

A 200 response means you received valid JSON that conforms to your schema; a 4XX response means the result could not be returned in the schema you asked for.

In this example, L1M handles all the complexity and returns perfectly structured JSON based on your schema. The same approach works for images, making it ideal for extracting data from receipts, menus, or any visual content.

Provider Flexibility

L1M isn't tied to any specific provider. It works with any OpenAI-compatible API, as well as Anthropic's models. You can even run it locally with Ollama for complete privacy and control.
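
To point L1M at your own provider, you swap the demo header values for your provider's details. The sketch below assumes X-Provider-Url takes the base URL of an OpenAI-compatible API; the key and model shown are placeholders rather than canonical values, so check the documentation for specifics:

curl -X POST https://api.l1m.io/structured \
-H "Content-Type: application/json" \
-H "X-Provider-Url: https://api.openai.com/v1" \
-H "X-Provider-Key: $OPENAI_API_KEY" \
-H "X-Provider-Model: gpt-4o-mini" \
-d '{
  "input": "A particularly severe crisis in 1907 led Congress to enact the Federal Reserve Act in 1913",
  "schema": { "type": "object", "properties": { "year": { "type": "number" }, "event": { "type": "string" } } }
}'

The same headers can point at a locally running OpenAI-compatible server, such as Ollama's, when you self-host L1M.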

Built-in Caching

Performance and cost efficiency are built-in. L1M includes optional caching with customizable TTL via the x-cache-ttl header, allowing you to cache identical extraction requests.
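
For example, adding the header to the earlier request opts that request into caching. This is a sketch that assumes the TTL is given in seconds; check the documentation for the exact unit:

curl -X POST https://api.l1m.io/structured \
-H "Content-Type: application/json" \
-H "x-cache-ttl: 3600" \
-H "X-Provider-Url: demo" \
-H "X-Provider-Key: demo" \
-H "X-Provider-Model: demo" \
-d '{
  "input": "A particularly severe crisis in 1907 led Congress to enact the Federal Reserve Act in 1913",
  "schema": { "type": "object", "properties": { "year": { "type": "number" }, "event": { "type": "string" } } }
}'

An identical request repeated within the TTL can then be served from the cache instead of calling the provider again.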

Privacy First

We don't store your data unless you explicitly use caching. With the ability to run locally, you maintain complete control over your data.

Open Source

L1M is fully open-source under an MIT license. We've built SDKs for Node.js, Python, and Go to make integration as seamless as possible.

Check out the GitHub repository for complete documentation and examples.

Bringing L1M to durable workflows

We're working on bringing L1M to durable workflows. This will allow you to extract structured data from any unstructured source, including images, videos, and audio.

This will also simplify your workflows: simple single-step extractions that currently use ctx.agent(type="single-step") can be replaced with a call to L1M.

Try L1M

Try L1M for free during the open beta on l1m.io.

Written by Inferable Team

Ready to build your own AI solutions? Inferable makes it easy to create powerful AI workflows, agents, and automations without wrestling with complex LLM APIs.

Try Inferable Free