Semantic Boundary Detection for Improving RAG on Real-time Agents
Learn how semantic boundary detection can enhance RAG systems for real-time AI agents by improving context management and information processing efficiency.
Real-time AI agents face a critical challenge: they need to process and understand vast amounts of unbounded data while maintaining context and making quick decisions. Traditional Retrieval Augmented Generation (RAG) approaches often struggle with this balance, either missing crucial information or becoming computationally expensive. This post discusses semantic boundary detection as an elegant solution to this problem, offering a way to "intelligently" chunk and process information in real-time without sacrificing understanding.
When building realtime agentic systems at scale, doing RAG is not an option but a necessity. This is mostly because the data the agent encounters is not bounded. In other words, a conversational experience might accumulate an unbounded number of messages in the agent's context window, and if the agent does tool calling, some tools might return unbounded amounts of data.
This is where semantic boundary detection comes in. By detecting the semantic boundaries in the data, we can break down the data into more manageable chunks. This allows us to do RAG in a way that is more likely to capture the key information that we need to make a decision.
The tradeoffs
The tradeoff in any agentic system comes down to computational effort vs. accuracy. For example, you can do fixed-size chunking, which takes little computational effort, but it might not be sophisticated enough to keep chunks semantically coherent. On the other hand, you can do agentic RAG, which offloads the whole process to a sub-agent that tries to form an understanding of the data, but this is far more computationally expensive and doesn't suit a realtime experience.
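For reference, fixed-size chunking really is only a few lines, which is exactly why it's cheap. The `chunkFixed` name and the character-based `size`/`overlap` parameters below are illustrative, not from any particular library:

```typescript
// A minimal fixed-size chunker for contrast: fast, but it will happily
// cut a sentence (or a thought) in half. `size` and `overlap` are in
// characters; `overlap` must be smaller than `size`.
function chunkFixed(text: string, size: number, overlap = 0): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

console.log(chunkFixed("abcdefghij", 4, 1));
// → ["abcd", "defg", "ghij", "j"]
```

Note how the last chunk is a fragment and none of the boundaries respect meaning; that is the accuracy cost we trade for speed.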
Core Concept
Semantic boundary detection finds natural breaks in text where topics or meanings shift. Unlike simple sentence splitting, it considers the semantic relationship between sentences to determine where content should be divided. We think this is a good middle ground between cheap syntactic chunking and fully agentic semantic approaches.
How It Works: Step by Step
1. Sentence Segmentation
First, break the text into individual sentences. This step is crucial because it forms the foundation for all subsequent semantic analysis. We use a sentence tokenizer that can handle various punctuation patterns and edge cases:
This code accomplishes three key things:
- Uses a sentence tokenizer to properly split text into individual sentences
- Handles various punctuation patterns while preserving sentence integrity
- Works across different languages with varying sentence boundary markers
import { SentenceTokenizer } from 'sentence-tokenizer'; // npm install sentence-tokenizer
const tokenizer = new SentenceTokenizer();
tokenizer.setEntry(text);
const sentences = tokenizer.getSentences();
2. Create Sliding Windows
Generate groups of sentences using a sliding window approach. This technique is essential for maintaining context awareness. Each group contains the current sentence (anchor) plus surrounding context, allowing us to understand the semantic flow of ideas:
The sliding window implementation achieves three objectives:
- Ensures context preservation between chunks through overlapping windows
- Uses windowSize parameter to control the amount of surrounding context
- Balances context depth against computational requirements
interface SentenceGroup {
anchor: string;
context: string[];
start: number;
end: number;
}
function createSentenceGroups(
sentences: string[],
windowSize: number = 3
): SentenceGroup[] {
const groups: SentenceGroup[] = [];
for (let i = 0; i < sentences.length; i++) {
const start = Math.max(0, i - Math.floor(windowSize/2));
const end = Math.min(sentences.length, i + Math.floor(windowSize/2) + 1);
groups.push({
anchor: sentences[i],
context: sentences.slice(start, end),
start,
end
});
}
return groups;
}
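To make the windowing arithmetic concrete, here is a standalone check of what the function above produces for four placeholder sentences and the default windowSize of 3:

```typescript
// Standalone check of the windowing arithmetic: with windowSize = 3,
// each context spans [max(0, i - 1), min(n, i + 2)) around anchor i.
const sentences = ["S0.", "S1.", "S2.", "S3."];
const windowSize = 3;
const windows = sentences.map((_, i) => {
  const start = Math.max(0, i - Math.floor(windowSize / 2));
  const end = Math.min(sentences.length, i + Math.floor(windowSize / 2) + 1);
  return sentences.slice(start, end);
});
console.log(windows);
// → [["S0.","S1."], ["S0.","S1.","S2."], ["S1.","S2.","S3."], ["S2.","S3."]]
```

The first and last windows are shorter because they are clamped at the edges of the text; every interior anchor sees one neighbor on each side.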
3. Generate Embeddings
Create embeddings for each sentence group. These embeddings capture the semantic meaning of the entire group, allowing us to detect topic shifts and meaning changes:
The embedding generation step does two things:
- Converts each sentence group into a dense vector representation for semantic comparison
- Captures the meaning of each anchor sentence within its surrounding context
async function generateEmbeddings(
groups: SentenceGroup[],
model: EmbeddingModel
): Promise<number[][]> {
const embeddings: number[][] = [];
for (const group of groups) {
const text = group.context.join(' ');
const embedding = await model.embed(text);
embeddings.push(embedding);
}
return embeddings;
}
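The EmbeddingModel type above is left abstract on purpose: any provider client can be adapted to it. Here is the interface we're assuming, plus a toy deterministic stand-in that is useful only for testing the plumbing (the `toyVector` helper is hypothetical, and a real system would call an actual embedding API):

```typescript
// Assumed interface: anything that maps text to a fixed-length vector.
interface EmbeddingModel {
  embed(text: string): Promise<number[]>;
}

// Toy stand-in: hashes words into an 8-dimensional bag-of-words vector.
// Deterministic and dependency-free, but semantically meaningless; swap
// in a real model (OpenAI, Cohere, a local encoder) in practice.
function toyVector(text: string, dims = 8): number[] {
  const v = new Array(dims).fill(0);
  for (const word of text.toLowerCase().split(/\s+/)) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) | 0;
    v[Math.abs(h) % dims] += 1;
  }
  return v;
}

const toyModel: EmbeddingModel = {
  embed: async (text) => toyVector(text),
};
```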
4. Calculate Semantic Distances
Compare adjacent embeddings to find where meaning shifts significantly. We convert cosine similarity into a distance (1 minus similarity), so larger values mean a bigger semantic jump:
function cosineSimilarity(a: number[], b: number[]): number {
let dot = 0, normA = 0, normB = 0;
for (let i = 0; i < a.length; i++) {
dot += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}
function calculateDistances(
embeddings: number[][]
): number[] {
const distances: number[] = [];
for (let i = 1; i < embeddings.length; i++) {
// Cosine distance: 0 means identical meaning, values near 1 mean unrelated
const distance = 1 - cosineSimilarity(
embeddings[i],
embeddings[i-1]
);
distances.push(distance);
}
return distances;
}
5. Detect Boundaries
Find points where the semantic distance exceeds a threshold; each such index marks the end of one chunk and the start of the next:
function findBoundaries(
distances: number[],
threshold: number = 0.15
): number[] {
const boundaries: number[] = [];
for (let i = 0; i < distances.length; i++) {
// distances[i] measures the jump between windows i and i + 1
if (distances[i] > threshold) {
boundaries.push(i);
}
}
return boundaries;
}
Why This Matters for Autonomous Agents
Autonomous agents processing large amounts of data need to understand context and meaning shifts in real-time. Traditional RAG systems with fixed chunking can miss important semantic boundaries or create arbitrary breaks in meaning. This approach offers several advantages for autonomous agents:
- Dynamic Understanding: Agents can process text more naturally, adapting to content flow
- Better Context Preservation: Semantic boundaries ensure context isn't lost between chunks
- Improved Retrieval: More meaningful chunks lead to better search and retrieval results
- Real-time Processing: Agents can process streaming data while maintaining semantic coherence
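Putting steps 1 through 5 together, the whole pipeline fits in one function. This sketch is simplified so it runs standalone: it embeds each sentence on its own (windowSize of 1, no sliding context), uses an exact bag-of-words embedding instead of a real model, and uses 1 minus cosine similarity as the distance. The names `bagOfWords`, `cosine`, and `chunkBySemanticBoundaries` are illustrative; in a real agent you would substitute your embedding model and tune the threshold:

```typescript
// Toy embedding: exact bag-of-words over the vocabulary of the inputs.
// Collision-free and deterministic, but only useful for illustration.
function bagOfWords(texts: string[]): number[][] {
  const vocab = new Map<string, number>();
  const tokenized = texts.map((t) => t.toLowerCase().split(/\s+/));
  for (const words of tokenized)
    for (const w of words)
      if (!vocab.has(w)) vocab.set(w, vocab.size);
  return tokenized.map((words) => {
    const v = new Array(vocab.size).fill(0);
    for (const w of words) v[vocab.get(w)!] += 1;
    return v;
  });
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Sentences in, semantically grouped chunks out. A new chunk starts
// wherever the cosine distance between neighbors exceeds the threshold.
function chunkBySemanticBoundaries(
  sentences: string[],
  threshold = 0.9
): string[][] {
  if (sentences.length === 0) return [];
  const embeddings = bagOfWords(sentences);
  const chunks: string[][] = [];
  let current: string[] = [sentences[0]];
  for (let i = 1; i < sentences.length; i++) {
    const distance = 1 - cosine(embeddings[i], embeddings[i - 1]);
    if (distance > threshold) {
      chunks.push(current);
      current = [];
    }
    current.push(sentences[i]);
  }
  chunks.push(current);
  return chunks;
}
```

With a toy word-overlap embedding, sentences that share no words are maximally distant, so a topic change like cats to stocks produces a boundary while same-topic neighbors stay grouped. Real embeddings capture similarity beyond shared surface words, which is the whole point of the technique.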
Real-time Implementation
For real-time processing, consider these optimizations that are particularly relevant for autonomous agents:
- Batch Processing: Process sentences in small batches rather than one at a time.
class SemanticBoundaryDetector {
private buffer: string[] = [];
private batchSize: number = 5;
constructor(private model: EmbeddingModel) {}
addSentence(sentence: string): void {
this.buffer.push(sentence);
}
async processBatch(): Promise<number[]> {
if (this.buffer.length < this.batchSize) {
return [];
}
const groups = createSentenceGroups(this.buffer);
const embeddings = await generateEmbeddings(groups, this.model);
const distances = calculateDistances(embeddings);
const boundaries = findBoundaries(distances);
// Clear processed sentences
this.buffer = this.buffer.slice(this.batchSize);
return boundaries;
}
}
- Sliding Window Optimization: Only compute new embeddings for new sentences:
This optimized detector maintains a rolling window of embeddings to minimize computation:
class OptimizedDetector {
private embeddings: number[][] = [];
private recent: string[] = [];
constructor(private model: EmbeddingModel, private threshold = 0.15) {}
async processNewSentence(sentence: string): Promise<void> {
const newGroup = this.createGroupWithContext(sentence);
const newEmbedding = await this.model.embed(newGroup.join(' '));
// Only store the last N embeddings
this.embeddings = [...this.embeddings.slice(-5), newEmbedding];
// Compare with the previous embedding
if (this.embeddings.length > 1) {
const distance = 1 - cosineSimilarity(
this.embeddings[this.embeddings.length - 1],
this.embeddings[this.embeddings.length - 2]
);
this.checkBoundary(distance);
}
}
// Keep a short rolling context window around the newest sentence
private createGroupWithContext(sentence: string): string[] {
this.recent = [...this.recent.slice(-2), sentence];
return this.recent;
}
private checkBoundary(distance: number): void {
if (distance > this.threshold) {
// Emit a boundary event, flush the current chunk, etc.
}
}
}
- Caching: Cache embeddings for frequently seen sentence patterns:
The cached detector remembers frequently seen patterns to reduce embedding computations:
class CachedDetector {
private cache = new Map<string, number[]>();
constructor(private model: EmbeddingModel) {}
async getEmbedding(text: string): Promise<number[]> {
const hash = this.hashText(text);
if (this.cache.has(hash)) {
return this.cache.get(hash)!;
}
const embedding = await this.model.embed(text);
this.cache.set(hash, embedding);
return embedding;
}
// Cheap, deterministic string hash (djb2); good enough as a cache key
private hashText(text: string): string {
let h = 5381;
for (let i = 0; i < text.length; i++) {
h = ((h << 5) + h + text.charCodeAt(i)) | 0;
}
return h.toString(36);
}
}
System Flow
Below is a visualization of how semantic boundary detection processes text streams in real-time:
flowchart TD
  subgraph Input
    A[Text Stream] --> B[Text Buffer]
  end
  subgraph Preprocessing
    B --> C[Sentence Segmentation]
    C --> D[Create Sliding Windows]
  end
  subgraph Semantic Processing
    D --> E[Generate Embeddings]
    E --> F[Calculate Distances]
    F --> G[Detect Boundaries]
  end
  subgraph Optimization
    H[Embedding Cache] <-.-> E
    I[Batch Processor] <-.-> D
  end
  subgraph Output
    G --> J[Semantic Chunks]
    J --> K[Vector Store]
  end
  class A,B,K storage
  class C,D,E,F,G process
  class H,I optimization
Alternative Chunking Strategies
While semantic boundary detection offers powerful capabilities for autonomous agents, there are several other approaches we didn't explore.
- Topic Modeling-Based Chunking: Leverages algorithms like LDA or BERTopic to create chunks based on detected topics within the text. This approach excels with documents that have clear thematic structures, though it can be computationally expensive for real-time applications.
- Graph-Based Chunking: Represents sentences as nodes in a graph, with edge weights indicating semantic similarity. By applying community detection algorithms, we can identify natural clusters of related content. This method is particularly effective for highly interconnected content.
- Hierarchical Chunking: Creates a tree structure of content, maintaining multiple levels of granularity simultaneously. This approach allows for flexible retrieval at different levels of detail, making it particularly useful for documents with clear hierarchical organization.
- Attention-Based Chunking: Uses transformer attention patterns to identify natural breaks in content. While computationally intensive, it often produces highly accurate results that align well with human-perceived content boundaries.