multimodal-agent-workshop

agent
Security Audit
Pass
Health Pass
  • License – MIT
  • Description – Repository has a description
  • Active repo – Last push 0 days ago
  • Community trust – 16 GitHub stars
Code Pass
  • Code scan – Scanned 1 file during light audit; no dangerous patterns found
Permissions Pass
  • Permissions – No dangerous permissions requested

No AI report is available for this listing yet.

SUMMARY

๐Ÿ–ผ๏ธ Workshop: Build a multimodal AI agent with Haystack & GPT-4o โ€” featuring image understanding, document retrieval, conversational memory, and human-in-the-loop safety controls

README.md

๐Ÿ–ผ๏ธ Giving Eyes to Your AI: Engineering a Multimodal Agent

A hands-on workshop exploring multimodal AI agents with Haystack.

What You'll Build

  • 📄 Multimodal indexing pipeline (PDFs + images) using CLIP embeddings
  • 🤖 Vision-enabled agent powered by GPT-4o
  • 🔍 RAG tool for searching company policies
  • 💬 Conversational memory for context-aware interactions
  • 🔐 Human-in-the-loop controls for sensitive actions
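The conversational-memory idea above can be sketched in plain Python. This is an illustrative, stdlib-only sketch (the workshop's actual implementation lives in the notebook): a rolling buffer keeps only the last N turns so the prompt sent to the model stays within context limits.

```python
from collections import deque

class ConversationMemory:
    """Rolling conversation buffer (illustrative sketch, not the workshop code)."""

    def __init__(self, max_turns: int = 5):
        # Each turn is a (user message, assistant message) pair; the deque
        # drops the oldest turn automatically once max_turns is exceeded.
        self.turns = deque(maxlen=max_turns)

    def add_turn(self, user_msg: str, assistant_msg: str) -> None:
        self.turns.append((user_msg, assistant_msg))

    def as_messages(self) -> list[dict]:
        # Flatten the buffer into the role/content message list that
        # chat-completion APIs typically expect.
        messages = []
        for user_msg, assistant_msg in self.turns:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": assistant_msg})
        return messages

memory = ConversationMemory(max_turns=2)
memory.add_turn("What is our social budget?", "It is 50 EUR per person.")
memory.add_turn("Per month?", "Yes, per team member per month.")
memory.add_turn("Thanks!", "You're welcome!")
print(len(memory.as_messages()))  # oldest turn evicted, 4 messages remain
```

A framework-backed agent would wire a buffer like this into the prompt it builds on every turn; the capped length is what makes long conversations affordable.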

Get Started

👉 See multimodal_agent_notebook.ipynb for the full interactive experience.

Deploy with Hayhooks

Want to deploy the agent as an API? Check out multimodal-agent/pipeline_wrapper.py, a Python script version of the notebook with Hayhooks integration pre-configured for serving the conversational agent.

Files

The files/ directory contains the sample data used in the workshop:

  • receipt.jpeg – A sample receipt image for the expense reimbursement demo
  • social_budget_policy.md – Company policy document for retrieval
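The expense reimbursement demo is where the human-in-the-loop controls come in: before a sensitive tool call executes, a human gets to approve or block it. A minimal sketch of that gating pattern, with hypothetical tool names (`submit_reimbursement` is an assumption for illustration, not necessarily the workshop's tool):

```python
from typing import Callable

# Hypothetical set of tools that must not run without human sign-off.
SENSITIVE_TOOLS = {"submit_reimbursement"}

def run_tool(name: str, args: dict, approve: Callable[[str, dict], bool]) -> str:
    # Gate: sensitive tools require explicit human approval before executing.
    if name in SENSITIVE_TOOLS and not approve(name, args):
        return f"Tool '{name}' was blocked by the human reviewer."
    # Stand-in dispatch; a real agent would invoke the actual tool here.
    return f"Tool '{name}' executed with {args}."

# In the demo the reviewer would see the parsed receipt before approving;
# here a lambda stands in for the interactive confirmation prompt.
result = run_tool(
    "submit_reimbursement",
    {"amount_eur": 42.50, "receipt": "receipt.jpeg"},
    approve=lambda name, args: args["amount_eur"] <= 50,
)
print(result)
```

The key design point is that the approval callback sits between the model's decision and the side effect, so the agent can still propose actions freely while humans keep veto power over the risky ones.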

Requirements

  • Python 3.10+
  • OpenAI API key (or your preferred LLM provider)
  • See the notebook for full package installation instructions
