otoroshi-llm-extension

Cloud APIM - Otoroshi LLM Extension

Otoroshi LLM Extension introduction

Connect, setup, secure and seamlessly manage LLM models using an Universal/OpenAI compatible API

  • Unified interface: Simplify interactions and minimize integration hassles
  • Use multiple providers: More than 50 LLM providers supported today, with more on the way
  • Load balancing: Ensure optimal performance by distributing workloads across multiple providers
  • Fallbacks: Automatically switch to another LLM during failures to keep responses flowing without interruption
  • Automatic retries: Recover from transient LLM API failures with built-in automatic retries, rescuing a substantial share of failed requests
  • Semantic cache: Speed up repeated queries, enhance response times, and reduce costs
  • Custom quotas: Manage LLM token quotas per consumer and optimise costs
  • Key vault: Securely store your LLM API keys in the Otoroshi vault or any other secret vault supported by Otoroshi
  • Observability and reporting: Every LLM request is audited with details about the consumer, the LLM provider and usage. All these audit events can be exported in multiple ways for further reporting
  • Fine-grained authorizations: Use Otoroshi's advanced fine-grained authorization capabilities to constrain model usage based on anything you want: user identity, API key, consumer metadata, request details, etc
  • Guardrails: Validate prompts and responses to prevent leaking sensitive or personal information, irrelevant or unhelpful answers, gibberish content, etc
  • Prompt engineering: Enhance your prompts with contextual information, store them in a library for reuse, and use prompt templates for increased efficiency
  • Multi-modal: Audio (TTS, STT, translation), image and video models supported
  • Embeddings support: Compute embeddings from various providers and models through a unified API
  • Vector stores support: Search through vector stores to enrich LLM calls
  • Persistent memories: Automatically store conversation messages and re-inject them on subsequent calls
  • Agentic workflows: orchestrate LLM interactions using Otoroshi workflows

Otoroshi LLM Extension is a set of Otoroshi plugins and resources for interacting with LLMs. To learn more, see the documentation
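Since the extension exposes an OpenAI-compatible API, a standard chat completion request should work as-is. The sketch below builds one with only the Python standard library; the base URL, API key and model name are placeholders for illustration, not actual extension defaults:

```python
import json
import urllib.request

# Hypothetical values: replace with your own Otoroshi route and API key.
BASE_URL = "https://otoroshi.example.com/v1"
API_KEY = "xxxxx"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("gpt-4o-mini", "Hello!")
# response = urllib.request.urlopen(req)  # only works against a live gateway
```

Because the interface is OpenAI-compatible, the official OpenAI SDKs should also work when pointed at the Otoroshi route as their base URL.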

Supported LLM providers

All supported providers are available here

  • Anthropic
  • Azure OpenAI
  • Azure AI Foundry
  • Cloud Temple 🇫🇷 🇪🇺
  • Cloudflare
  • Cohere
  • Deepseek
  • Gemini
  • Groq
  • Huggingface 🇫🇷 🇪🇺
  • Mistral 🇫🇷 🇪🇺
  • Ollama (Local Models)
  • OpenAI
  • OVH AI Endpoints 🇫🇷 🇪🇺
  • Scaleway 🇫🇷 🇪🇺
  • X.ai

And 37 more including Abliteration, AI/ML API, Apertis, AssemblyAI, Cerebras, Chutes, CometAPI, CompactifAI, DeepInfra, Empower, Featherless AI, Fireworks AI, Friendli AI, Galadriel, GMI, Helicone, Hyperbolic, Lambda AI, LlamaGate, Meta Llama API, Minimax, Morph, Nano GPT, Nebius AI Studio, Novita AI, Nscale, Nvidia NIM, OpenRouter, Perplexity, Poe, SambaNova, Sarvam, Synthetic, Together AI, Venice AI, Xiaomi Mimo, Z.AI

Supported Moderation models

  • OpenAI
    • omni-moderation-latest

Supported Audio Text-to-Speech models

  • OpenAI
    • gpt-4o-mini-tts
    • tts-1
    • tts-1-hd
  • Groq
    • playai-tts
    • playai-tts-arabic
  • ElevenLabs
    • eleven_monolingual_v1
    • eleven_multilingual_v2

Supported Audio Speech-to-Text models

  • OpenAI
    • whisper-1
    • gpt-4o-mini-transcribe
  • Groq
    • whisper-large-v3
  • ElevenLabs
    • scribe_v1
  • Mistral
    • voxtral-mini-latest
    • voxtral-mini-2507

Supported LLM Embeddings models

  • All MiniLM L6 V2 (embedded)
  • Azure OpenAI
  • Azure AI Foundry
  • Cloud Temple 🇫🇷 🇪🇺
  • Cohere
  • Deepseek
  • Gemini
  • Huggingface 🇫🇷 🇪🇺
  • Mistral 🇫🇷 🇪🇺
    • mistral-embed
  • Nebius AI Studio
  • Ollama (Local Models)
  • OpenAI
    • text-embedding-3-small
    • text-embedding-3-large
    • text-embedding-ada-002
  • OpenRouter
  • SambaNova
  • Scaleway 🇫🇷 🇪🇺
  • X.ai
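Whichever provider from the list above is configured, embeddings go through the same OpenAI-style request shape. A minimal payload sketch (model name and inputs are illustrative):

```python
import json

def build_embeddings_payload(model: str, texts: list[str]) -> str:
    """Serialize an OpenAI-compatible /v1/embeddings request body."""
    return json.dumps({"model": model, "input": texts})

payload = build_embeddings_payload(
    "text-embedding-3-small",
    ["first sentence", "second sentence"],
)
```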

Supported Image generation models

  • OpenAI
    • dall-e-2
    • dall-e-3
    • gpt-image-1
  • Azure OpenAI
  • Cloud Temple 🇫🇷 🇪🇺
  • Grok
    • grok-2-image
  • Luma
    • photon-1 (default)
    • photon-flash-1
  • Gemini
    • google/nano-banana-pro
  • Hive
    • black-forest-labs/flux-schnell
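Image generation likewise follows the OpenAI-style request shape; a sketch of the request body, assuming the standard /v1/images/generations conventions (model, prompt and size here are illustrative):

```python
import json

def build_image_payload(model: str, prompt: str, size: str = "1024x1024") -> str:
    """Serialize an OpenAI-compatible /v1/images/generations request body."""
    return json.dumps({"model": model, "prompt": prompt, "size": size})

payload = build_image_payload("dall-e-3", "a lighthouse at dawn")
```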

Supported Video generation models

  • Luma
    • ray-flash-2

Requirements

Otoroshi LLM Extension requires JDK 17 or newer.
