About

llm.rb is Ruby's most capable AI runtime.

It runs on Ruby's standard library by default. loads optional pieces
only when needed, and offers a single runtime for providers, agents,
tools, skills, MCP, A2A (Agent2Agent), RAG (vector stores & embeddings),
streaming, files, and persisted state. As a bonus, llm.rb is also
available for mruby.

It supports OpenAI, OpenAI-compatible endpoints, Anthropic, Google
Gemini, DeepSeek, xAI, Z.ai, AWS Bedrock, Ollama, and llama.cpp. It
also includes built-in ActiveRecord and Sequel support, plus concurrent
tool execution through threads, tasks (via async gem), fibers, ractors,
and fork (via xchan.rb gem).

Quick start

LLM::Context

The
LLM::Context
object is at the heart of the runtime. Almost all other features build
on top of it. It is a low-level interface to a model, and requires tool
execution to be managed manually. The
LLM::Agent
class is almost the same as
LLM::Context
but it manages tool execution for you - we'll cover agents next:

require "llm"

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)
ctx.talk "Hello world"

LLM::Agent

The
LLM::Agent
object is implemented on top of
LLM::Context.
It provides the same interface, but manages tool execution for you. It
also has builtin features such as a loop guard that detects repeated
tool call patterns, and another guard that detects infinite tool call
loops. Both guards advise the model to change course rather than raise
an error:

require "llm"

llm = LLM.openai(key: ENV["KEY"])
agent = LLM::Agent.new(llm, stream: $stdout)
agent.talk "Hello world"

Agents (Advanced)

An agent can be configured to require confirmation before a tool is
executed. When a matching tool is called, llm.rb runs
on_tool_confirmation. That callback must decide whether to cancel the
tool call or approve it and execute it by calling
fn.spawn(strategy).wait, and it must always return an instance of
LLM::Function::Return:

require "llm"

class Agent < LLM::Agent
  tools DeleteFile
  confirm "delete-file"

  def on_tool_confirmation(fn, strategy)
    path = fn.arguments["path"] || fn.arguments[:path]
    if path.start_with?("/tmp/")
      fn.spawn(strategy).wait
    else
      fn.cancel(reason: "Deletion requires approval")
    end
  end
end

llm = LLM.openai(key: ENV["KEY"])
Agent.new(llm, stream: $stdout).talk("Delete /tmp/example.txt.")

Tools

The
LLM::Tool
class can be subclassed to implement your own tools that can extend the
abilities of a model:

class ReadFile < LLM::Tool
  name "read-file"
  description "Read a file"
  parameter :path, String, "The filename or path"
  required %i[path]

  def call(path:)
    {contents: File.read(path)}
  end
end

MCP

The
LLM::MCP
object lets llm.rb use tools provided by an MCP server. Those tools are
exposed through the same runtime as local tools, so you can pass them
to either
LLM::Context
or
LLM::Agent.
In this example, the MCP server runs over stdio and
LLM::Context
uses the same tool loop as local tools:

require "llm"

llm = LLM.openai(key: ENV["KEY"])
mcp = LLM::MCP.stdio(argv: ["ruby", "server.rb"])

mcp.run do
  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
  ctx.talk "Use the available tools to inspect the environment."
  ctx.talk(ctx.wait(:call)) while ctx.functions?
end

Use persistent HTTP connections with remote MCP servers:

require "llm"

mcp = LLM::MCP.http(
  url: "https://remote-mcp.example.com",
  transport: LLM::Transport.net_http_persistent
)

A2A (Agent 2 Agent)

The
LLM::A2A
object lets llm.rb use skills provided by a remote A2A agent. Those
skills are exposed through the same runtime as local tools, so you can
pass them to either
LLM::Context
or
LLM::Agent.

Use remote skills as local tools:

require "llm"

a2a = LLM::A2A.rest(
  url: "https://remote-agent.example.com",
  headers: {"Authorization" => "Bearer token"}
)
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, tools: a2a.skills)
ctx.talk "Analyze this CSV and summarize the trends."
ctx.talk(ctx.wait(:call)) while ctx.functions?

Use persistent HTTP connections:

require "llm"

a2a = LLM::A2A.rest(
  url: "https://remote-agent.example.com",
  transport: LLM::Transport.net_http_persistent
)

For more on direct messaging, task operations, push notification
configs, and JSON-RPC, see the
LLM::A2A API docs.

Skills

Skills are reusable instructions loaded from a SKILL.md directory. They let
you package behavior and tool access together, and they plug into the
same runtime as tools, agents, MCP, and A2A. When a skill runs, llm.rb
spawns a subagent with the skill instructions, access to only the tools
listed in the skill, and recent conversation context:

---
name: release
description: Prepare a release
tools: ["search-docs", "git"]
---

## Task

Review the release state, summarize what changed, and prepare the release.

require "llm"

class ReleaseAgent < LLM::Agent
  model "gpt-5.4-mini"
  skills "./skills/release"
end

llm = LLM.openai(key: ENV["KEY"])
ReleaseAgent.new(llm, stream: $stdout).talk("Prepare the next release.")

LLM::Stream

The
LLM::Stream
object lets you observe output and runtime events as they happen. You
can subclass it to handle streamed content in your own application:

require "llm"

class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end
end

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: Stream.new)
ctx.talk "Write a haiku about Ruby."

LLM::Stream (advanced)

The
LLM::Stream
object can also resolve tool calls while output is still streaming. In
on_tool_call, you can spawn the tool, push the work onto the stream
queue, and later drain it with wait:

require "llm"

class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end

  def on_tool_call(tool, error)
    return queue << error if error
    queue << ctx.spawn(tool, :thread)
  end
end

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: Stream.new, tools: [ReadFile])
ctx.talk "Read README.md and summarize the quick start."
ctx.talk(ctx.wait) while ctx.functions?

Concurrency

llm.rb can run tool work concurrently. This is useful when a model calls
multiple tools and you want to resolve them in parallel instead of one
at a time. On
LLM::Agent,
you can enable this with concurrency. Common options are :call for
sequential execution, :thread, or :task for concurrent IO-bound work, and
:ractor or :fork for more isolated CPU-bound work:

require "llm"

class Agent < LLM::Agent
  model "gpt-5.4-mini"
  tools ReadFile
  concurrency :thread
end

llm = LLM.openai(key: ENV["KEY"])
agent = Agent.new(llm, stream: $stdout)
agent.talk "Read README.md and CHANGELOG.md and compare them."

Serialization

The LLM::Context
object can be serialized to JSON, which makes it suitable for storing
in a file, a database column, or a Redis queue. The built-in
ActiveRecord and Sequel plugins are built on top of this feature:

require "llm"

llm = LLM.openai(key: ENV["KEY"])

# Serialize a context
ctx1 = LLM::Context.new(llm)
ctx1.talk "Remember that my favorite language is Ruby"
string = ctx1.to_json

# Restore a context (from JSON)
ctx2 = LLM::Context.new(llm, stream: $stdout)
ctx2.restore(string:)
ctx2.talk "What is my favorite language?"

Installation

gem install llm.rb

Examples

REPL

This example uses LLM::Context
directly for an interactive REPL.
See the
deepdive (web) or
deepdive (markdown) for more examples.

require "llm"

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)

loop do
  print "> "
  ctx.talk(STDIN.gets || break)
  puts
end

Multimodal: Local Files

In llm.rb, a prompt can be a string, an LLM::Prompt, or an array.
When you use an array, each element can be plain text or a tagged object such as
ctx.image_url(...),
ctx.local_file(...),
or ctx.remote_file(...).
Those tagged objects carry the metadata the provider adapter needs to turn one
Ruby prompt into the provider-specific multimodal request schema.

ctx.local_file(path) tags a local path as a :local_file object around
LLM.File(path). If the model understands that file type, you can include it
directly in the prompt array instead of uploading it first through a provider
Files API:

require "llm"

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm)
ctx.talk ["Summarize this document.", ctx.local_file("README.md")]

Context Compaction

This example uses LLM::Context,
LLM::Compactor, and
LLM::Stream together so
long-lived contexts can summarize older history and expose the lifecycle
through stream hooks. This approach is inspired by General Intelligence
Systems. The
compactor can also use its own model: if you want summarization to run on a
different model from the main context. token_threshold: accepts either a
fixed token count or a percentage string like "90%", which resolves
against the active model context window and triggers compaction once total
token usage goes over that percentage. See the
deepdive (web) or
deepdive (markdown) for more examples.

require "llm"

class Stream < LLM::Stream
  def on_compaction(ctx, compactor)
    puts "Compacting #{ctx.messages.size} messages..."
  end

  def on_compaction_finish(ctx, compactor)
    puts "Compacted to #{ctx.messages.size} messages."
  end
end

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(
  llm,
  stream: Stream.new,
  compactor: {
    token_threshold: "90%",
    retention_window: 8,
    model: "gpt-5.4-mini"
  }
)

Reasoning

This example uses LLM::Stream
with the OpenAI Responses API so reasoning output is streamed separately from
visible assistant output. See the
deepdive (web) or
deepdive (markdown) for more examples.

To use the Responses API (OpenAI-specific), initialize a
context or agent with mode: :responses and keep using
talk for turns.

require "llm"

class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end

  def on_reasoning_content(content)
    $stderr << content
  end
end

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(
  llm,
  model: "gpt-5.4-mini",
  mode: :responses,
  reasoning: {effort: "medium"},
  stream: Stream.new
)
ctx.talk("Solve 17 * 19 and show your work.")

Request Cancellation

Need to cancel a stream? llm.rb has you covered through
LLM::Context#interrupt!.

See the deepdive (web)
or deepdive (markdown) for more examples.

require "llm"
require "io/console"

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)
worker = Thread.new do
  ctx.talk("Write a very long essay about network protocols.")
rescue LLM::Interrupt
  puts "Request was interrupted!"
end

STDIN.getch
ctx.interrupt!
worker.join

Sequel (ORM)

The plugin :llm integration wraps
LLM::Context on a
Sequel::Model and keeps tool execution explicit. Like the ActiveRecord
wrappers, its built-in persistence contract is the serialized data column,
while provider: resolves a real LLM::Provider instance and context:
injects defaults such as model:.
See the
deepdive (web) or
deepdive (markdown) for more examples.

require "llm"
require "net/http/persistent"
require "sequel"
require "sequel/plugins/llm"

class Context < Sequel::Model
  plugin :llm, provider: :set_provider, context: :set_context

  private

  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
  end

  def set_context
    {model: "gpt-5.4-mini", mode: :responses, store: false}
  end
end

ctx = Context.create
ctx.talk("Remember that my favorite language is Ruby")
puts ctx.talk("What is my favorite language?").content

ActiveRecord (ORM): acts_as_llm

The acts_as_llm method wraps LLM::Context and
provides full control over tool execution. Its built-in persistence contract is
one serialized data column. If your app has provider, model, or usage
columns, provide them to llm.rb through provider: and context: instead of
relying on reserved wrapper columns.

See the deepdive (web)
or deepdive (markdown) for more examples.

require "llm"
require "active_record"
require "llm/active_record"

class Context < ApplicationRecord
  acts_as_llm provider: :set_provider, context: :set_context

  private

  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"])
  end

  def set_context
    {model: "gpt-5.4-mini", mode: :responses, store: false}
  end
end

ctx = Context.create!
ctx.talk("Remember that my favorite language is Ruby")
puts ctx.talk("What is my favorite language?").content

require "llm"
require "active_record"
require "llm/active_record"

class Context < ApplicationRecord
  acts_as_llm provider: :set_provider, context: :set_context

  # Optional application columns can still provide the provider and context.
  # For example, `provider_name` and `model_name` can be normal columns.

  private

  def set_provider
    LLM.public_send(provider_name, key: provider_key)
  end

  def set_context
    {model: model_name, mode: :responses, store: false}
  end
end

ActiveRecord (ORM): acts_as_agent

The acts_as_agent method wraps LLM::Agent and
manages tool execution for you. Like acts_as_llm, its built-in persistence
contract is one serialized data column. If your app has provider or model
columns, provide them to llm.rb through your hooks and agent DSL.