multimodal-agent-workshop
agent
Pass
Health Pass
- License รขโฌโ License: MIT
- Description รขโฌโ Repository has a description
- Active repo รขโฌโ Last push 0 days ago
- Community trust รขโฌโ 16 GitHub stars
Code Pass
- Code scan รขโฌโ Scanned 1 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions รขโฌโ No dangerous permissions requested
No AI report is available for this listing yet.
๐ผ๏ธ Workshop: Build a multimodal AI agent with Haystack & GPT-4o โ featuring image understanding, document retrieval, conversational memory, and human-in-the-loop safety controls
README.md
๐ผ๏ธ Giving Eyes to Your AI: Engineering a Multimodal Agent
A hands-on workshop exploring multimodal AI agents with Haystack.
What You'll Build
- ๐ Multimodal indexing pipeline (PDFs + images) using CLIP embeddings
- ๐ค Vision-enabled agent powered by GPT-4o
- ๐ RAG tool for searching company policies
- ๐ฌ Conversational memory for context-aware interactions
- ๐ Human-in-the-loop controls for sensitive actions
Get Started
๐ See multimodal_agent_notebook.ipynb for the full interactive experience.
Deploy with Hayhooks
Want to deploy the agent as an API? Check out multimodal-agent/pipeline_wrapper.py โ a Python script version of the notebook with Hayhooks integration pre-configured for serving the conversational agent.
Files
The files/ directory contains the sample data used in the workshop:
receipt.jpegโ A sample receipt image for the expense reimbursement demosocial_budget_policy.mdโ Company policy document for retrieval
Requirements
- Python 3.10+
- OpenAI API key (or your preferred LLM provider)
- See the notebook for full package installation instructions
Reviews (0)
Sign in to leave a review.
Leave a reviewNo results found