
Announcing llm-docs-builder: An Open Source Tool for Making Documentation AI-Friendly

I am excited to announce the release of llm-docs-builder, a library that transforms Markdown documentation into an AI-optimized format for Large Language Models.

TL;DR: Open source tool that strips 85-95% of noise from documentation for AI systems. Transforms Markdown, generates llms.txt indexes, and serves optimized docs to AI crawlers automatically. Reduces RAG costs significantly.

View on GitHub

If you find it interesting or useful, don't forget to star ⭐ the repo - it helps others discover the tool!

The Problem

If you have watched an AI assistant confidently hallucinate your library API – suggesting methods that do not exist or mixing up versions – you've experienced this documentation problem firsthand. When AI systems like Claude, ChatGPT, and GitHub Copilot try to understand your docs using RAG (Retrieval-Augmented Generation), they drown in noise.

Beautiful HTML documentation with navigation bars, CSS styling, and JavaScript widgets becomes a liability. The AI retrieves your "Getting Started" page, but 90% of what it processes is HTML boilerplate and formatting markup. The actual content? Buried in that mess.

Context windows are expensive and limited. Research shows that typical HTML documents waste up to 90% of tokens on pure noise: CSS styles, JavaScript code, HTML tag overhead, comments, and meaningless markup. This waste adds up fast across thousands of pages and millions of queries.

What llm-docs-builder Does

This tool transforms your markdown documentation to eliminate 85-95% of the noise compared to the HTML version, letting AI assistants focus on the actual content. I have extracted it from the Karafka framework's documentation build system, where it has served thousands of developers in production for months.

Real metrics from Karafka documentation:

Page             HTML     Markdown   Reduction
Getting Started  82.0 KB  4.1 KB     95% (20x)
Monitoring       156 KB   6.2 KB     96% (25x)
Configuration    94.3 KB  3.8 KB     96% (25x)

Average: 93% fewer tokens, 20-36x smaller files

Before and After Example

Before transformation (98 tokens):

---
title: Getting Started
description: Learn how to get started
tags: [tutorial, beginner]
updated: 2024-01-15
---

[![Build](https://img.shields.io/badge/build-passing-green.svg)](https://ci.example.com)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

# Getting Started

> **Note**: Requires Ruby 3.0+

Welcome to our framework! Let's get you up and running...

After transformation (18 tokens, 81% reduction):

# Getting Started

Welcome to our framework! Let's get you up and running.

Why This Matters

Cleaner documentation means AI assistants spend less time processing noise and more time understanding your actual content. This translates to lower costs per query, fewer hallucinations as shown in the HtmlRAG study, and much faster response times.

How It Works

llm-docs-builder applies several transformations to your markdown documentation to make it RAG-friendly, then generates an llms.txt index that helps AI agents discover and navigate your content efficiently. Below are examples of these transformations in action.

1. Hierarchical Context Preservation

When documents are chunked for RAG, context loss leads to hallucinations. Consider:

# Configuration
## Consumer Settings
### auto_offset_reset
Controls how consumers handle missing offsets...

Chunked independently, ### auto_offset_reset loses all parent context. llm-docs-builder preserves hierarchy:

### Configuration / Consumer Settings / auto_offset_reset
Controls how consumers handle missing offsets...

Now the chunk is self-contained even when retrieved in isolation.
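
The idea behind this transformation can be sketched in a few lines of Ruby (illustrative only, not llm-docs-builder's actual implementation): walk the document, track the current heading path, and re-emit each heading with its parents prepended.

# Illustrative sketch of heading-path preservation, not the library's real code
def preserve_hierarchy(markdown)
  path = [] # parent headings seen so far, one slot per level

  markdown.lines.map do |line|
    if (match = line.match(/\A(\#{1,6})\s+(.+)\z/m))
      level = match[1].length
      title = match[2].strip
      path = path.first(level - 1) # drop headings from deeper sibling branches
      path[level - 1] = title
      # Re-emit the heading with its full parent path
      "#{match[1]} #{path.compact.join(' / ')}\n"
    else
      line
    end
  end.join
end

doc = <<~MD
  # Configuration
  ## Consumer Settings
  ### auto_offset_reset
  Controls how consumers handle missing offsets...
MD

puts preserve_hierarchy(doc)
# The ### heading now reads:
# ### Configuration / Consumer Settings / auto_offset_reset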

2. Semantic Noise Removal

  • Strips YAML/TOML frontmatter.
  • Removes HTML comments and build badges.
  • Expands relative links to absolute URLs.
  • Normalizes whitespace while preserving code blocks.
  • Preserves code syntax highlighting markers.
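
As a rough illustration of what these cleanups involve, here is a small Ruby sketch (again, not the tool's actual implementation) that strips frontmatter, drops badges and comments, and expands relative links against a base URL:

# Illustrative sketch only - the regexes are deliberately simplified
require 'uri'

def clean_markdown(markdown, base_url:)
  text = markdown.dup

  # Strip YAML frontmatter delimited by --- lines at the top of the file
  text.sub!(/\A---\n.*?\n---\n/m, '')

  # Remove HTML comments and badge-style image links
  text.gsub!(/<!--.*?-->/m, '')
  text.gsub!(/\[!\[[^\]]*\]\([^)]*\)\]\([^)]*\)\n?/, '')

  # Expand relative links to absolute URLs
  text.gsub!(/\]\((?!https?:)([^)]+)\)/) { "](#{URI.join(base_url, Regexp.last_match(1))})" }

  # Collapse runs of blank lines
  text.gsub!(/\n{3,}/, "\n\n")
  text.strip + "\n"
end

# Hypothetical usage with an example path and base URL
puts clean_markdown(File.read('docs/getting-started.md'), base_url: 'https://myproject.io/')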

3. Enhanced llms.txt Generation

This feature creates llms.txt index files - the emerging standard for AI-discoverable documentation, adopted by Anthropic, Cursor, Pinecone, LangChain, and 200+ projects.

The generated llms.txt includes token counts and timestamps, providing an AI-readable index of your documentation:

# Llms.txt

## Documentation
- [Getting Started](https://myproject.io/docs/getting-started.md): 1,024 tokens, updated 2024-03-15
- [API Reference](https://myproject.io/docs/api-reference.md): 5,420 tokens, updated 2024-03-18
- [Configuration Guide](https://myproject.io/docs/configuration.md): 2,134 tokens, updated 2024-03-12

Total documentation: 8,578 tokens across 3 core pages

AI agents can prioritize which documents to fetch based on their token budgets, needs, and freshness.
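
For example, a retrieval pipeline could parse this index and skip documents that would not fit its context budget. A hypothetical sketch built around the line format shown above:

# Hypothetical consumer of the llms.txt format shown above
Entry = Struct.new(:title, :url, :tokens, :updated)

def parse_llms_txt(text)
  text.scan(/^- \[(.+?)\]\((\S+?)\): ([\d,]+) tokens, updated (\S+)$/).map do |title, url, tokens, updated|
    Entry.new(title, url, tokens.delete(',').to_i, updated)
  end
end

entries = parse_llms_txt(File.read('llms.txt')) # assumed local copy of the index

# Fetch the freshest documents that still fit into a 6,000-token budget
budget = 6_000
selected = entries.sort_by(&:updated).reverse.take_while do |entry|
  (budget -= entry.tokens) >= 0
end

selected.each { |entry| puts "fetch #{entry.url} (#{entry.tokens} tokens)" }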

Getting Started

Installation

docker pull mensfeld/llm-docs-builder:latest
alias llm-docs-builder='docker run -v $(pwd):/workspace mensfeld/llm-docs-builder'

Transform Your Documentation

llm-docs-builder bulk-transform --docs ./docs --base-url https://myproject.io

This single command can reduce your RAG system's token usage by 85-95%.

Generate an llms.txt Index

llm-docs-builder generate --docs ./docs

Measure Your Savings

llm-docs-builder compare --url https://karafka.io/docs/Getting-Started/

Example output:

============================================================
Context Window Comparison
============================================================

Human version:  82.0 KB
AI version:     4.1 KB
Reduction:      77.9 KB (95%)
Factor:         20.1x smaller
============================================================

Configuration

Create llm-docs-builder.yml:

docs: ./docs
base_url: https://myproject.io

# Optimization options
convert_urls: true
remove_comments: true
remove_badges: true
remove_frontmatter: true
normalize_whitespace: true

# RAG enhancements
normalize_headings: true
include_metadata: true
include_tokens: true

excludes:
  - "**/internal/**"

Serving Optimized Docs to AI Crawlers

Configure your web server to serve markdown to LLM crawlers automatically while continuing to serve HTML to human visitors: detect AI user agents (ChatGPT-User, GPTBot, anthropic-ai, claude-web, PerplexityBot, meta-externalagent) and serve .md files instead of .html, as shown in the examples below.

Apache (.htaccess):

SetEnvIf User-Agent "(?i)(openai|anthropic|claude|gpt|chatgpt|perplexity)" IS_LLM_BOT

RewriteEngine On
RewriteCond %{ENV:IS_LLM_BOT} !^$
RewriteCond %{REQUEST_FILENAME}.md -f
RewriteRule ^(.*)$ $1.md [L]

Nginx:

# try_files cannot be used inside an "if" block, so map the bot match to a
# filename suffix and let try_files prefer the .md variant when it exists
map $http_user_agent $llm_suffix {
    default "";
    "~*(openai|anthropic|claude|gpt|chatgpt|perplexity)" ".md";
}

location ~ ^/docs/ {
    try_files $uri$llm_suffix $uri $uri/ =404;
}
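
One way to sanity-check such a setup is to request the same page with a browser-like and a bot-like User-Agent and compare response sizes. A rough Ruby sketch (the URL is a placeholder for one of your deployed pages):

require 'net/http'
require 'uri'

# Placeholder URL - substitute one of your own documentation pages
uri = URI('https://myproject.io/docs/getting-started/')

def fetch_size(uri, user_agent)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # Request the page while pretending to be the given client
    http.get(uri.request_uri, { 'User-Agent' => user_agent }).body.bytesize
  end
end

puts "HTML for humans:   #{fetch_size(uri, 'Mozilla/5.0')} bytes"
puts "Markdown for bots: #{fetch_size(uri, 'GPTBot/1.0')} bytes"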

Benefits:

  • Zero disruption to human users
  • Automatic cost savings on every AI query
  • No separate documentation sites needed

Why Markdown for RAG Systems

Tokenization efficiency matters for both cost and performance. The following table shows a simple heading comparison:

Format    Example                  Token Count
HTML      <h2>Section Title</h2>   7-8 tokens
Markdown  ## Section Title         3-4 tokens

HTML requires opening and closing tags for every element (2x overhead), special characters such as < and > consume multiple tokens each, and attributes add 2-3 tokens per occurrence. Markdown uses single characters for formatting (**, *, -, |) that often tokenize to single tokens, requires no closing tags, and maintains semantic structure without attribute bloat.
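
You can reproduce this kind of comparison locally. The sketch below assumes the third-party tiktoken_ruby gem (an OpenAI tokenizer port that is not part of llm-docs-builder); its API may differ between versions:

# Assumes the third-party tiktoken_ruby gem is installed
require 'tiktoken_ruby'

encoder = Tiktoken.get_encoding('cl100k_base')

html     = '<h2>Section Title</h2>'
markdown = '## Section Title'

puts "HTML:     #{encoder.encode(html).length} tokens"
puts "Markdown: #{encoder.encode(markdown).length} tokens"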

Format efficiency comparison:

  • Plain text: 96% reduction vs raw HTML
  • Cleaned HTML (CSS/JS removed): 94% reduction
  • Markdown: 90% reduction

While aggressively cleaned HTML can match or even beat Markdown's raw token efficiency, the preprocessing required is complex and error-prone. Markdown provides the optimal balance: simple to generate, efficient to tokenize, and naturally semantic in structure. For RAG systems that chunk and retrieve documents independently, Markdown's clear heading hierarchy keeps each chunk interpretable without its surrounding context.

When NOT to Use This

llm-docs-builder will not add much value when:

  • Your docs rely heavily on visual diagrams that cannot be described in Markdown.
  • You are already serving pure Markdown without HTML noise.
  • Your documentation is primarily an API reference with minimal prose (consider OpenAPI/Swagger instead).

Next Steps

Your documentation is already being consumed by LLMs. The question is whether you're serving optimized content or forcing them to parse megabytes of HTML boilerplate.

  1. Install llm-docs-builder via Docker.
  2. Run compare on your existing docs to measure potential savings.
  3. Configure llm-docs-builder.yml for your project.
  4. Run bulk-transform to generate optimized versions.
  5. Use server configuration to serve markdown to AI crawlers.

Every query you optimize saves money and improves the quality of AI-assisted development with your framework.


llm-docs-builder is open source under the MIT License. It is extracted from production code powering the Karafka framework documentation.

WaterDrop Meets Ruby’s Async Ecosystem: Lightweight Concurrency Done Right

Ruby developers have faced an uncomfortable truth for years: when you need to talk to external systems like Kafka, you're going to block. Sure, you could reach for heavyweight solutions like EventMachine, Celluloid, or spawn additional threads, but each comes with its own complexity tax.

EventMachine forces you into callback hell. Threading introduces race conditions and memory overhead. Meanwhile, other ecosystems had elegant solutions: Go's goroutines, Node.js's event loops, and Python's asyncio.

Ruby felt clunky for high-performance I/O-bound applications.

Enter the Async Gem

Samuel Williams' async gem brought something revolutionary to Ruby: lightweight concurrency that actually feels like Ruby. No callbacks. No complex threading primitives. Just fibers.

require 'async'

Async do |task|
  # These run concurrently
  task1 = task.async { fetch_user_data }
  task2 = task.async { fetch_order_data }
  task3 = task.async { fetch_metrics_data }

  [task1, task2, task3].each(&:wait)
end

The genius is in the underlying architecture. When an I/O operation would normally block, the fiber automatically yields control to other fibers – no manual coordination is required.

Why Lightweight Concurrency Matters

Traditional threading and evented architectures are heavy. Threads consume a significant amount of memory (1MB stack per thread) and come with complex synchronization requirements. Event loops force you to restructure your entire programming model.

Fibers are lightweight:

  • Memory efficient: Kilobytes instead of megabytes
  • No synchronization complexity: Cooperative scheduling
  • Familiar programming model: Looks like regular Ruby code
  • Automatic yielding: Runtime handles I/O coordination
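
To get a feel for how cheap fibers are, you can spin up thousands of concurrent tasks in a single process; a minimal sketch:

require 'async'

# Ten thousand concurrent tasks in a single process: each one is a fiber,
# so per-task memory stays in the kilobyte range rather than ~1MB per thread
Async do |task|
  tasks = 10_000.times.map do
    task.async { sleep(1) } # stands in for an I/O wait
  end
  tasks.each(&:wait)
end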

WaterDrop: Built for Async

Starting with the 2.8.7 release, every #produce_sync and #produce_many_sync operation in WaterDrop automatically yields during Kafka I/O. You don't configure it. It just works:

require 'async'
require 'waterdrop'

producer = WaterDrop::Producer.new do |config|
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
end

Async do |task|
  # These run truly concurrently
  user_events = task.async do
    100.times do |i|
      producer.produce_sync(
        topic: 'user_events',
        payload: { user_id: i, action: 'login' }.to_json
      )
    end
  end

  # This also runs concurrently during Kafka I/O
  metrics_task = task.async do
    collect_application_metrics
  end

  [user_events, metrics_task].each(&:wait)
end

Real Performance Impact

Performance Note: These benchmarks show single-message synchronous production (produce_sync) for clarity. WaterDrop also supports batch production (produce_many_sync), async dispatching (produce_async), and promise-based workflows. When combined with fibers, these methods can achieve much higher throughput than shown here.

I benchmarked a Rails application processing 10,000 Kafka messages across various concurrency patterns:

Sequential processing (baseline):

  • Total time: 62.7 seconds
  • Throughput: 160 messages/second
  • Memory overhead: Baseline

Single fiber (no concurrency):

  • Total time: 63.2 seconds
  • Throughput: 158 messages/second
  • Improvement: 0.99x - No benefit without actual concurrency

Real-world scenario (3 concurrent event streams):

  • Total time: 23.8 seconds
  • Throughput: 420 messages/second
  • Improvement: 2.6x - What most applications will see in production

Optimized fiber concurrency (controlled batching):

  • Total time: 12.6 seconds
  • Throughput: 796 messages/second
  • Improvement: 5.0x - Peak performance with proper structure

Multiple producers (traditional parallelism):

  • Total time: 15.2 seconds
  • Throughput: 659 messages/second
  • Improvement: 4.1x - Good, but uses more memory than fibers

A single producer using fibers outperforms multiple producer instances (5.0x vs 4.1x) while using less memory and resources. This isn't about making individual operations faster - it's about enabling Ruby to handle concurrent I/O elegantly and efficiently.
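
For reference, the three-concurrent-streams scenario can be approximated with a harness roughly like the one below (a sketch, not the exact benchmark code; it assumes a local broker and three hypothetical topics, and your numbers will differ):

require 'async'
require 'json'
require 'waterdrop'

producer = WaterDrop::Producer.new do |config|
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

Async do |task|
  # Three independent event streams producing concurrently, ~3,333 messages each
  %w[user_events order_events metric_events].map do |topic|
    task.async do
      3_333.times do |i|
        producer.produce_sync(topic: topic, payload: { seq: i }.to_json)
      end
    end
  end.each(&:wait)
end

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format('%.0f messages/second', 9_999 / elapsed)

producer.close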

Transparent Integration

What makes WaterDrop's async integration cool is that it's completely transparent:

# This code works with or without async
producer.produce_sync(
  topic: 'events',
  payload: data.to_json
)

Running in a fiber scheduler? It yields during I/O. Running traditionally? It blocks normally. No configuration. No special methods.

The Transactional Reality

Transactions have limitations. Multiple transactions from one producer remain sequential due to the transactional.id design:

# These transactions will block each other
Async do |task|
  task.async { producer.transaction { ... } }
  task.async { producer.transaction { ... } } # Waits for first
end

But: transactions still yield during I/O, allowing other fibers doing different work to continue. For concurrent transactions, use separate producers.
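
A sketch of that pattern, using two producers with distinct, hypothetical transactional.id values:

require 'async'
require 'waterdrop'

# Distinct transactional.id values let the two transactions run concurrently;
# within a single producer they would serialize
producers = Array.new(2) do |i|
  WaterDrop::Producer.new do |config|
    config.kafka = {
      'bootstrap.servers': 'localhost:9092',
      'transactional.id': "txn-producer-#{i}"
    }
  end
end

Async do |task|
  producers.each_with_index.map do |producer, i|
    task.async do
      producer.transaction do
        producer.produce_sync(topic: 'events', payload: "from producer #{i}")
      end
    end
  end.each(&:wait)
end

producers.each(&:close)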

Real-World Example

class EventProcessor
  def initialize(producer)
    @producer = producer
  end

  def process_user_activity(sessions)
    Async do |task|
      # Process different types concurrently
      login_task = task.async { process_logins(sessions) }
      activity_task = task.async { process_activity(sessions) }

      # Analytics runs during Kafka I/O
      analytics_task = task.async { update_analytics(sessions) }

      [login_task, activity_task, analytics_task].each(&:wait)
    end
  end

  private

  attr_reader :producer

  # process_activity and update_analytics follow the same shape as process_logins

  def process_logins(sessions)
    sessions.each do |session|
      producer.produce_sync(
        topic: 'user_logins',
        payload: session.to_json
      )
    end
  end
end

Why This Matters

WaterDrop's async integration proves Ruby can compete in high-performance I/O scenarios without sacrificing elegance. Combined with Samuel's broader ecosystem (async-http, async-postgres, falcon), you get a complete stack for building high-performance Ruby applications.

Try wrapping any I/O-heavy operations in Async do |task| blocks. Whether it's API calls, database queries, or Kafka operations with WaterDrop, the performance improvement may be immediate and dramatic.
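
For example, an HTTP health check and a Kafka dispatch can overlap inside a single Async block. A sketch using async-http's Async::HTTP::Internet client (the endpoint and topic are placeholders):

require 'async'
require 'async/http/internet'
require 'json'
require 'waterdrop'

producer = WaterDrop::Producer.new do |config|
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
end

Async do |task|
  internet = Async::HTTP::Internet.new

  # The HTTP call and the Kafka dispatch overlap instead of running back to back
  api_task = task.async { internet.get('https://example.org/status').read }
  kafka_task = task.async do
    producer.produce_sync(topic: 'health_checks', payload: { at: Time.now.to_i }.to_json)
  end

  [api_task, kafka_task].each(&:wait)
ensure
  internet&.close
end

producer.close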


Find WaterDrop on GitHub and explore the async ecosystem that's making Ruby fast again.
