llm.rb
About
llm.rb is Ruby's most capable AI runtime.
It runs on Ruby's standard library by default, loads optional pieces
only when needed, and offers a single runtime for providers, agents,
tools, skills, MCP, A2A (Agent2Agent), RAG (vector stores & embeddings),
streaming, files, and persisted state. As a bonus, llm.rb is also
available for mruby.
It supports OpenAI, OpenAI-compatible endpoints, Anthropic, Google
Gemini, DeepSeek, xAI, Z.ai, AWS Bedrock, Ollama, and llama.cpp. It
also includes built-in ActiveRecord and Sequel support, plus concurrent
tool execution through threads, tasks (via async gem), fibers, ractors,
and fork (via xchan.rb gem).
Quick start
LLM::Context
The LLM::Context object is at the heart of the runtime. Almost all other
features build on top of it. It is a low-level interface to a model, and
requires tool execution to be managed manually. The LLM::Agent class is
almost the same as LLM::Context but it manages tool execution for you.
We'll cover agents next:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)
ctx.talk "Hello world"
LLM::Agent
The LLM::Agent object is implemented on top of LLM::Context. It provides
the same interface, but manages tool execution for you. It also has
built-in features such as a loop guard that detects repeated tool call
patterns, and another guard that detects infinite tool call loops. Both
guards advise the model to change course rather than raise an error:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
agent = LLM::Agent.new(llm, stream: $stdout)
agent.talk "Hello world"
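The repeated-pattern guard described above can be pictured with a small standalone sketch. This is not llm.rb's implementation: the LoopGuard class, its limit option, and the advice wording are all hypothetical, and the sketch only covers the simplest case of the same call repeated back to back. It returns advice text instead of raising, which mirrors the "advise rather than raise" behaviour described in the text.

```ruby
# Hypothetical sketch of a repeated-tool-call guard (not llm.rb's code).
class LoopGuard
  def initialize(limit: 3)
    @limit = limit  # identical consecutive calls tolerated before advising
    @recent = []    # the most recent [name, arguments] pairs
  end

  # Records a call. Returns advice text once the same call has been seen
  # `limit` times in a row, and nil otherwise.
  def check(name, arguments)
    @recent << [name, arguments]
    @recent.shift while @recent.size > @limit
    return nil unless @recent.size == @limit && @recent.uniq.size == 1
    "You have called #{name} with the same arguments #{@limit} times. " \
      "Try a different approach."
  end
end

guard = LoopGuard.new(limit: 3)
advice = 3.times.map { guard.check("read-file", {"path" => "README.md"}) }
puts advice.last
```

The first two checks return nil and only the third produces advice, so a model that varies its calls never sees the message.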
Agents (Advanced)
An agent can be configured to require confirmation before a tool is
executed. When a matching tool is called, llm.rb runs on_tool_confirmation.
That callback must decide whether to cancel the tool call or approve it and
execute it by calling fn.spawn(strategy).wait, and it must always return an
instance of LLM::Function::Return:
require "llm"
class Agent < LLM::Agent
  tools DeleteFile
  confirm "delete-file"
  def on_tool_confirmation(fn, strategy)
    path = fn.arguments["path"] || fn.arguments[:path]
    if path.start_with?("/tmp/")
      fn.spawn(strategy).wait
    else
      fn.cancel(reason: "Deletion requires approval")
    end
  end
end
llm = LLM.openai(key: ENV["KEY"])
Agent.new(llm, stream: $stdout).talk("Delete /tmp/example.txt.")
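A plain start_with?("/tmp/") check, as in the callback above, also accepts a path such as /tmp/../etc/passwd. If the confirmation decision is meant to be a safety boundary, the path can be normalized first. The helper below is a hypothetical addition, not part of llm.rb, and uses only File.expand_path from Ruby's core library.

```ruby
# Hypothetical helper: true only when the normalized path is inside /tmp.
# File.expand_path collapses ".." segments but does not resolve symlinks.
def inside_tmp?(path)
  File.expand_path(path.to_s, "/").start_with?("/tmp/")
end

puts inside_tmp?("/tmp/example.txt")    # true
puts inside_tmp?("/tmp/../etc/passwd")  # false
puts inside_tmp?(nil)                   # false
```

Inside on_tool_confirmation, the condition would then read inside_tmp?(path) instead of path.start_with?("/tmp/"). If symlinks matter in your environment, File.realpath is the stricter option, at the cost of requiring the file to exist.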
Tools
The LLM::Tool class can be subclassed to implement your own tools that
can extend the abilities of a model:
class ReadFile < LLM::Tool
  name "read-file"
  description "Read a file"
  parameter :path, String, "The filename or path"
  required %i[path]
  def call(path:)
    {contents: File.read(path)}
  end
end
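Because the body of call is ordinary Ruby, it can be exercised without a model or provider. The sketch below uses a plain method as a stand-in for ReadFile#call so it runs without the gem. Returning an :error key when the file cannot be read is an assumption of this sketch, not a documented llm.rb convention.

```ruby
require "tempfile"

# Stand-in for the body of ReadFile#call. The :error key is hypothetical.
def read_file(path:)
  {contents: File.read(path)}
rescue SystemCallError => e
  # Errno::ENOENT, Errno::EACCES and friends all inherit from SystemCallError.
  {error: e.message}
end

Tempfile.create("demo") do |f|
  f.write("hello")
  f.flush
  p read_file(path: f.path)
end
p read_file(path: "/nonexistent/file").key?(:error)
```

Handing the failure back as data lets the model see what went wrong and try another path, rather than aborting the turn with an exception.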
MCP
The LLM::MCP object lets llm.rb use tools provided by an MCP server.
Those tools are exposed through the same runtime as local tools, so you
can pass them to either LLM::Context or LLM::Agent. In this example, the
MCP server runs over stdio and LLM::Context uses the same tool loop as
local tools:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
mcp = LLM::MCP.stdio(argv: ["ruby", "server.rb"])
mcp.run do
  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
  ctx.talk "Use the available tools to inspect the environment."
  ctx.talk(ctx.wait(:call)) while ctx.functions?
end
Use persistent HTTP connections with remote MCP servers:
require "llm"
mcp = LLM::MCP.http(
  url: "https://remote-mcp.example.com",
  transport: LLM::Transport.net_http_persistent
)
A2A (Agent 2 Agent)
The LLM::A2A object lets llm.rb use skills provided by a remote A2A
agent. Those skills are exposed through the same runtime as local tools,
so you can pass them to either LLM::Context or LLM::Agent.
Use remote skills as local tools:
require "llm"
a2a = LLM::A2A.rest(
  url: "https://remote-agent.example.com",
  headers: {"Authorization" => "Bearer token"}
)
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, tools: a2a.skills)
ctx.talk "Analyze this CSV and summarize the trends."
ctx.talk(ctx.wait(:call)) while ctx.functions?
Use persistent HTTP connections:
require "llm"
a2a = LLM::A2A.rest(
  url: "https://remote-agent.example.com",
  transport: LLM::Transport.net_http_persistent
)
For more on direct messaging, task operations, push notification
configs, and JSON-RPC, see the LLM::A2A API docs.
Skills
Skills are reusable instructions loaded from a directory that contains a
SKILL.md file. They let
you package behavior and tool access together, and they plug into the
same runtime as tools, agents, MCP, and A2A. When a skill runs, llm.rb
spawns a subagent with the skill instructions, access to only the tools
listed in the skill, and recent conversation context:
---
name: release
description: Prepare a release
tools: ["search-docs", "git"]
---
## Task
Review the release state, summarize what changed, and prepare the release.
require "llm"
class ReleaseAgent < LLM::Agent
  model "gpt-5.4-mini"
  skills "./skills/release"
end
llm = LLM.openai(key: ENV["KEY"])
ReleaseAgent.new(llm, stream: $stdout).talk("Prepare the next release.")
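To see what a skill declares before handing it to an agent, the frontmatter of a SKILL.md file can be read with Ruby's bundled YAML library. This is only an inspection sketch over the example file shown above. It says nothing about how llm.rb itself parses skills, and the split on --- lines is an assumption about the file layout.

```ruby
require "yaml"

skill_md = <<~MD
  ---
  name: release
  description: Prepare a release
  tools: ["search-docs", "git"]
  ---
  ## Task
  Review the release state, summarize what changed, and prepare the release.
MD

# Split the document into YAML frontmatter and the markdown body.
# The first element is the empty string before the opening "---" line.
_, frontmatter, body = skill_md.split(/^---\n/, 3)
meta = YAML.safe_load(frontmatter)

puts meta["name"]
puts meta["tools"].inspect
puts body
```

A check like this is handy in a test suite, for example to assert that every tool name listed in a skill corresponds to a tool your application actually registers.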
LLM::Stream
The LLM::Stream object lets you observe output and runtime events as they
happen. You can subclass it to handle streamed content in your own
application:
require "llm"
class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end
end
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: Stream.new)
ctx.talk "Write a haiku about Ruby."
LLM::Stream (advanced)
The LLM::Stream object can also resolve tool calls while output is still
streaming. In on_tool_call, you can spawn the tool, push the work onto the
stream queue, and later drain it with wait:
require "llm"
class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end
  def on_tool_call(tool, error)
    return queue << error if error
    queue << ctx.spawn(tool, :thread)
  end
end
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: Stream.new, tools: [ReadFile])
ctx.talk "Read README.md and summarize the quick start."
ctx.talk(ctx.wait) while ctx.functions?
Concurrency
llm.rb can run tool work concurrently. This is useful when a model calls
multiple tools and you want to resolve them in parallel instead of one
at a time. On LLM::Agent, you can enable this with concurrency. Common
options are :call for sequential execution, :thread or :task for
concurrent IO-bound work, and :ractor or :fork for more isolated
CPU-bound work:
require "llm"
class Agent < LLM::Agent
  model "gpt-5.4-mini"
  tools ReadFile
  concurrency :thread
end
llm = LLM.openai(key: ENV["KEY"])
agent = Agent.new(llm, stream: $stdout)
agent.talk "Read README.md and CHANGELOG.md and compare them."
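The benefit of :thread for IO-bound tools can be seen with plain Ruby threads. The sketch below does not use llm.rb at all: slow_tool is a hypothetical stand-in for a tool call that spends its time waiting, and the three calls overlap instead of running back to back.

```ruby
# Stand-in for an IO-bound tool call; the sleep simulates waiting on IO.
def slow_tool(name)
  sleep 0.2
  "#{name}: done"
end

names = %w[README.md CHANGELOG.md LICENSE]

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# Start every call first, then collect the values in the original order.
results = names.map { |n| Thread.new { slow_tool(n) } }.map(&:value)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results
puts format("%.2fs elapsed", elapsed)
```

Run sequentially, the three calls would take about 0.6 seconds. With threads the waits overlap, which is why threads suit IO-bound tools even though they do not speed up CPU-bound Ruby code on the default interpreter.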
Serialization
The LLM::Context object can be serialized to JSON, which makes it suitable
for storing in a file, a database column, or a Redis queue. The built-in
ActiveRecord and Sequel plugins are built on top of this feature:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
# Serialize a context
ctx1 = LLM::Context.new(llm)
ctx1.talk "Remember that my favorite language is Ruby"
string = ctx1.to_json
# Restore a context (from JSON)
ctx2 = LLM::Context.new(llm, stream: $stdout)
ctx2.restore(string:)
ctx2.talk "What is my favorite language?"
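Since the serialized form is an ordinary JSON string, persisting it needs nothing beyond the standard library. The sketch below writes a string to a file and reads it back. The payload is a hypothetical stand-in for the value of ctx1.to_json, and the real schema is defined by llm.rb and may look different.

```ruby
require "json"
require "tmpdir"

# Hypothetical payload standing in for ctx1.to_json.
string = JSON.generate(
  "messages" => [
    {"role" => "user", "content" => "Remember that my favorite language is Ruby"}
  ]
)

restored = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "context.json")
  File.write(path, string)    # persist the serialized context
  restored = File.read(path)  # load it later, possibly in another process
end

puts restored == string
puts JSON.parse(restored)["messages"].size
```

In the real flow, the restored string is what would be passed to ctx2.restore(string:) as shown above.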
Installation
gem install llm.rb
Examples
REPL
This example uses LLM::Context directly for an interactive REPL.
See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)
loop do
  print "> "
  ctx.talk(STDIN.gets || break)
  puts
end
Multimodal: Local Files
In llm.rb, a prompt can be a string, an LLM::Prompt, or an array.
When you use an array, each element can be plain text or a tagged object
such as ctx.image_url(...), ctx.local_file(...), or ctx.remote_file(...).
Those tagged objects carry the metadata the provider adapter needs to turn one
Ruby prompt into the provider-specific multimodal request schema.
ctx.local_file(path) tags a local path as a :local_file object around
LLM.File(path). If the model understands that file type, you can include it
directly in the prompt array instead of uploading it first through a provider
Files API:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm)
ctx.talk ["Summarize this document.", ctx.local_file("README.md")]
Context Compaction
This example uses LLM::Context, LLM::Compactor, and LLM::Stream together so
long-lived contexts can summarize older history and expose the lifecycle
through stream hooks. This approach is inspired by General Intelligence
Systems. The compactor can also take its own model: option if you want
summarization to run on a different model from the main context.
token_threshold: accepts either a fixed token count or a percentage string
like "90%", which resolves against the active model context window and
triggers compaction once total token usage goes over that percentage. See
the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
class Stream < LLM::Stream
  def on_compaction(ctx, compactor)
    puts "Compacting #{ctx.messages.size} messages..."
  end
  def on_compaction_finish(ctx, compactor)
    puts "Compacted to #{ctx.messages.size} messages."
  end
end
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(
  llm,
  stream: Stream.new,
  compactor: {
    token_threshold: "90%",
    retention_window: 8,
    model: "gpt-5.4-mini"
  }
)
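The two forms accepted by token_threshold: come down to simple arithmetic. The helper below is a hypothetical illustration of how a percentage string could be resolved against a context window. It is not llm.rb's code, and the 128,000-token window is an assumed figure chosen only for the example.

```ruby
# Hypothetical helper: resolves a threshold to an absolute token count.
def resolve_threshold(threshold, context_window)
  case threshold
  when Integer
    threshold                                    # fixed token count
  when /\A(\d+)%\z/
    context_window * Integer($1, 10) / 100       # percentage of the window
  else
    raise ArgumentError, "unsupported threshold: #{threshold.inspect}"
  end
end

puts resolve_threshold("90%", 128_000)   # 115200
puts resolve_threshold(50_000, 128_000)  # 50000
```

Under these assumptions, a "90%" threshold means compaction would begin once total usage passes 115,200 tokens. The percentage form tracks the model in use, so switching to a model with a different window changes the trigger point without a config change.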
Reasoning
This example uses LLM::Stream with the OpenAI Responses API so reasoning
output is streamed separately from visible assistant output. See the
deepdive (web) or deepdive (markdown) for more examples.
To use the Responses API (OpenAI-specific), initialize a
context or agent with mode: :responses and keep using talk for turns.
require "llm"
class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end
  def on_reasoning_content(content)
    $stderr << content
  end
end
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(
  llm,
  model: "gpt-5.4-mini",
  mode: :responses,
  reasoning: {effort: "medium"},
  stream: Stream.new
)
ctx.talk("Solve 17 * 19 and show your work.")
Request Cancellation
Need to cancel a stream? llm.rb has you covered through
LLM::Context#interrupt!.
See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
require "io/console"
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)
worker = Thread.new do
  ctx.talk("Write a very long essay about network protocols.")
rescue LLM::Interrupt
  puts "Request was interrupted!"
end
STDIN.getch
ctx.interrupt!
worker.join
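The shape of the example above (a worker thread that rescues an interruption while the main thread decides when to cancel) can be reproduced with core Ruby alone. The sketch below is an analogy, not a description of how interrupt! works internally: the Interrupted class stands in for LLM::Interrupt, and Thread#raise stands in for whatever mechanism llm.rb uses.

```ruby
# Hypothetical error class standing in for LLM::Interrupt.
class Interrupted < StandardError; end

ready = Queue.new
worker = Thread.new do
  ready << true
  sleep        # stands in for a long-running streamed request
  :finished
rescue Interrupted
  :interrupted
end

ready.pop                  # wait until the worker is running inside its block
worker.raise(Interrupted)  # deliver the cancellation to the worker thread
p worker.value
```

Whichever mechanism is used, the pattern to keep is that the worker owns the rescue, so cleanup and user feedback happen in the thread that was doing the work.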
Sequel (ORM)
The plugin :llm integration wraps LLM::Context on a Sequel::Model and keeps
tool execution explicit. Like the ActiveRecord wrappers, its built-in
persistence contract is the serialized data column, while provider: resolves
a real LLM::Provider instance and context: injects defaults such as model:.
See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
require "net/http/persistent"
require "sequel"
require "sequel/plugins/llm"
class Context < Sequel::Model
  plugin :llm, provider: :set_provider, context: :set_context
  private
  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
  end
  def set_context
    {model: "gpt-5.4-mini", mode: :responses, store: false}
  end
end
ctx = Context.create
ctx.talk("Remember that my favorite language is Ruby")
puts ctx.talk("What is my favorite language?").content
ActiveRecord (ORM): acts_as_llm
The acts_as_llm method wraps LLM::Context and
provides full control over tool execution. Its built-in persistence contract is
one serialized data column. If your app has provider, model, or usage
columns, provide them to llm.rb through provider: and context: instead of
relying on reserved wrapper columns.
See the deepdive (web)
or deepdive (markdown) for more examples.
require "llm"
require "active_record"
require "llm/active_record"
class Context < ApplicationRecord
  acts_as_llm provider: :set_provider, context: :set_context
  private
  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"])
  end
  def set_context
    {model: "gpt-5.4-mini", mode: :responses, store: false}
  end
end
ctx = Context.create!
ctx.talk("Remember that my favorite language is Ruby")
puts ctx.talk("What is my favorite language?").content
require "llm"
require "active_record"
require "llm/active_record"
class Context < ApplicationRecord
  acts_as_llm provider: :set_provider, context: :set_context
  # Optional application columns can still provide the provider and context.
  # For example, `provider_name` and `model_name` can be normal columns.
  private
  def set_provider
    LLM.public_send(provider_name, key: provider_key)
  end
  def set_context
    {model: model_name, mode: :responses, store: false}
  end
end
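When provider_name comes from a database column, passing it straight to public_send means any public method name stored in that column will be invoked. A small allowlist keeps the dispatch predictable. The sketch below is a hypothetical hardening step, not part of llm.rb. FakeLLM stands in for the LLM module so the example runs without the gem, and the allowlist contents are an assumption to adapt to your app.

```ruby
# Stand-in for the LLM module so the sketch runs without the gem.
module FakeLLM
  def self.openai(key:)
    "openai provider (#{key.size}-char key)"
  end
end

# Hypothetical allowlist; list only the providers your app supports.
ALLOWED_PROVIDERS = %w[openai].freeze

# Dispatches to a provider constructor only if its name is allowlisted.
def build_provider(llm, provider_name, key)
  unless ALLOWED_PROVIDERS.include?(provider_name.to_s)
    raise ArgumentError, "unknown provider: #{provider_name.inspect}"
  end
  llm.public_send(provider_name, key: key)
end

puts build_provider(FakeLLM, "openai", "secret")
```

In set_provider, the call would become build_provider(LLM, provider_name, provider_key), so a bad column value fails loudly instead of calling an unintended method.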
ActiveRecord (ORM): acts_as_agent
The acts_as_agent method wraps LLM::Agent and
manages tool execution for you. Like acts_as_llm, its built-in persistence
contract is one serialized data column. If your app has provider or model
columns, provide them to llm.rb through your hooks and agent DSL.
See the deepdive (web)
or deepdive (markdown) for more examples.
require "llm"
require "active_record"
require "llm/active_record"
class Ticket < ApplicationRecord
  acts_as_agent provider: :set_provider, context: :set_context
  model "gpt-5.4-mini"
  instructions "You are a concise support assistant."
  tools SearchDocs, Escalate
  concurrency :thread
  private
  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"])
  end
  def set_context
    {mode: :responses, store: false}
  end
end
ticket = Ticket.create!
puts ticket.talk("How do I rotate my API key?").content
require "llm"
require "active_record"
require "llm/active_record"
class Ticket < ApplicationRecord
  acts_as_agent provider: :set_provider, context: :set_context
  model "gpt-5.4-mini"
  instructions "You are a concise support assistant."
  private
  def set_provider
    LLM.public_send(provider_name, key: provider_key)
  end
  def set_context
    {mode: :responses, store: false}
  end
end
MCP
This example uses LLM::MCP over HTTP so remote GitHub MCP tools run through
the same LLM::Context tool path as local tools. It expects a GitHub token in
ENV["GITHUB_PAT"]. See the deepdive (web) or deepdive (markdown) for more
examples.
require "llm"
require "net/http/persistent"
llm = LLM.openai(key: ENV["KEY"], persistent: true)
mcp = LLM::MCP.http(
  url: "https://api.githubcopilot.com/mcp/",
  headers: {"Authorization" => "Bearer #{ENV["GITHUB_PAT"]}"},
  persistent: true
)
mcp.start
ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
ctx.talk("Pull information about my GitHub account.")
ctx.talk(ctx.wait(:call)) while ctx.functions?
mcp.stop
For scoped work, mcp.run do ... end is shorter and handles cleanup for you:
mcp = LLM::MCP.http(
  url: "https://api.githubcopilot.com/mcp/",
  headers: {"Authorization" => "Bearer #{ENV["GITHUB_PAT"]}"},
  persistent: true
)
mcp.run do
  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
  ctx.talk("Pull information about my GitHub account.")
  ctx.talk(ctx.wait(:call)) while ctx.functions?
end
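The difference between the explicit start/stop form and the block form is that the block form can guarantee cleanup even when the work raises. The sketch below shows that general Ruby pattern with a stand-in class. Only the method names start, stop, and run come from the text above. FakeClient and its events log are hypothetical, and llm.rb's actual run may do more than this.

```ruby
# Stand-in client that records what happens to it.
class FakeClient
  attr_reader :events

  def initialize
    @events = []
  end

  def start
    @events << :start
  end

  def stop
    @events << :stop
  end

  # Start, yield, and always stop, even if the block raises.
  def run
    start
    yield self
  ensure
    stop
  end
end

client = FakeClient.new
begin
  client.run { raise "tool failed" }
rescue RuntimeError
  # The error still propagates, but stop has already run.
end
p client.events
```

With the explicit form, an exception between mcp.start and mcp.stop would skip the stop call unless you add your own ensure, which is the main reason to prefer the block form for scoped work.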
Resources
- deepdive (web) and deepdive (markdown) are the examples guide.
- relay shows a real application built on top of llm.rb.
- doc site has the API docs.
License