Video Keeper - AI-Powered Video Library with Multimodal Agentic Search via TwelveLabs API
Overview
Transform any video collection into an intelligent, searchable library using multimodal AI and agentic conversation. This solution leverages the Strands SDK (agentic framework), Amazon Nova, Anthropic Claude, Twelve Labs models, and Amazon Transcribe to extract rich insights from videos. It is a generic video search solution that works with any type of video.
Tags
- ai-agents
- video-to-video-search
- bedrock
- python
- demo
- strands
- mcp
Technologies
- Python 3.11+
- AWS SDK (boto3)
- Amazon Bedrock
- Amazon Nova
- Amazon OpenSearch Serverless
- AWS Step Functions
- Strands Agents SDK
- Model Context Protocol (MCP)
- FastAPI
- React
- Tailwind CSS
- TwelveLabs
Difficulty
Medium
🎯 What is Video Keeper?
Video Keeper is an agentic AI system that automatically analyzes, indexes, and makes any video collection searchable through natural conversation. Whether you have training videos, personal memories, gaming recordings, educational content, or professional documentation, Video Keeper creates an intelligent search experience powered by AWS and advanced AI models.
🌟 Key Capabilities
🎬 Universal Video Support
- Personal memories, family videos, vacation recordings
- Educational content, lectures, tutorials, how-to guides
- Gaming highlights, streams, gameplay recordings
- Professional content, meetings, presentations, training materials
- Entertainment videos, shows, documentaries
🔍 Advanced Search Methods
- Conversational AI Search - Chat naturally about your videos using AWS Strands SDK
- Video-to-Video Similarity - Upload a video to find visually similar content
- Semantic Search - "Find happy family moments" or "Show me Python tutorials"
- Entity Search - Find videos featuring specific people, brands, or objects
- Keyword Search - Traditional text-based search across all metadata
🧠 Multi-Modal AI Analysis
- Visual content understanding via Twelve Labs Marengo
- Speech-to-text transcription with Amazon Transcribe
- Entity extraction (people, brands, objects) using Amazon Nova
- Sentiment analysis and content insights via Twelve Labs Pegasus
- Smart thumbnails generated with FFmpeg
🔧 Robust Architecture
- Serverless AWS infrastructure (Step Functions, Lambda, OpenSearch)
- Real-time streaming responses via WebSocket
- Secure presigned URLs for video access (see the sketch after this list)
- Comprehensive error handling and monitoring
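For example, the video API can hand the frontend a time-limited link to a clip with an S3 presigned URL. A minimal sketch (bucket and key are illustrative placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Generate a time-limited link the frontend can use to stream a video
# directly from S3; bucket and key are illustrative placeholders.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "your-data-bucket-name", "Key": "videos/test-video.mp4"},
    ExpiresIn=3600,  # link validity in seconds
)
print(url)
```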
🏗️ Architecture Overview
┌─────────────┐      ┌──────────────┐      ┌─────────────────┐
│  S3 Video   │─────▶│ EventBridge  │─────▶│ Step Functions  │
│   Upload    │      │   Trigger    │      │    Workflow     │
└─────────────┘      └──────────────┘      └─────────────────┘
                                                    │
                                                    ▼
                                          ┌─────────────────┐
                                          │ Lambda: Initiate│
                                          │   Processing    │
                                          └─────────────────┘
                                                    │
                                                    ▼
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  Twelve Labs    │─────▶│ Lambda: Extract │─────▶│   OpenSearch    │
│  (Marengo +     │      │    Insights     │      │   Serverless    │
│   Pegasus)      │      └─────────────────┘      │ (Vector + Text) │
└─────────────────┘               │               └─────────────────┘
                                  ▼                    ▲        ▲
                         ┌─────────────────┐           │        │
                         │  Cohere Embed   │───────────┘        │
                         │ (Semantic Vec.) │                    │
                         └─────────────────┘                    │
                                  │                             │
                                  ▼                             │
                         ┌─────────────────┐                    │
                         │   Amazon Nova   │────────────────────┘
                         │ (Entity Extract)│
                         └─────────────────┘

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│ Frontend React  │─────▶│    AI Agent     │─────▶│   MCP Server    │
│   (Port 3000)   │      │  (Strands SDK)  │      │   (Port 8008)   │
│                 │      │   (Port 8080)   │      │                 │
└─────────────────┘      └─────────────────┘      └─────────────────┘
         │                        │                        │
         │                        ▼                        ▼
         │               ┌─────────────────┐      ┌─────────────────┐
         └──────────────▶│    Video API    │─────▶│   OpenSearch    │
                         │   (Port 8091)   │      │  Video Search   │
                         └─────────────────┘      └─────────────────┘
                                  │
                                  ▼
                         ┌─────────────────┐
                         │ Amazon Bedrock  │
                         │ (Claude 3.5 v2) │
                         └─────────────────┘
🚀 Quick Start
Prerequisites
- AWS Account with permissions for Bedrock, OpenSearch Serverless, Lambda, Step Functions, S3
- AWS CLI configured with appropriate credentials
- Python 3.11+ and Node.js 16+ installed
- SAM CLI installed (installation guide)
- Twelve Labs API key (subscribe and test Twelve Labs models using their daily free quota)
- S3 deployment bucket - Create an S3 bucket for SAM deployment artifacts before running deploy.sh
1. Deploy AWS Infrastructure
# Clone repository
git clone <repository-url>
cd intelligent-video-search-ai-agent
# Create deployment bucket for SAM artifacts (one-time setup)
aws s3 mb s3://my-sam-deployment-bucket-$(date +%s)
# Deploy using the deployment script
# IMPORTANT:
# -b: Deployment bucket (MUST already exist) - stores CloudFormation artifacts
# -d: Data bucket name (will be CREATED) - stores your videos
# -a: Your Twelve Labs API key - SAM will store your key in AWS Secrets Manager (encrypted)
# -p: Your IAM user/role ARN - grants OpenSearch access for local development
# --create-index: Create the OpenSearch index using the data_ingestion/1-create-opensearch-index.py script
./deploy.sh -b existing-deployment-bucket -d new-video-data-bucket -a your-twelve-labs-api-key -p your-iam-arn --create-index
# Example:
# ./deploy.sh -b my-sam-deployment-bucket-1736281200 -d my-unique-video-bucket-name -a tlk_XXXXXXXXXXXXXX -p "$(aws sts get-caller-identity --query Arn --output text)" --create-index
# Note outputs: OpenSearch endpoint, S3 bucket names
2. (Optional) Configure Twelve Labs API Key
This step is only required if you did not provide your Twelve Labs API key to deploy.sh via the -a flag.
# Store Twelve Labs API key in AWS Secrets Manager
aws secretsmanager create-secret \
--name twelve-labs-api-key \
--secret-string '{"api_key":"your_twelve_labs_api_key_here"}'
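At runtime the pipeline components read the key back from Secrets Manager. A minimal sketch of that lookup, using the secret name from the command above:

```python
import json
import boto3

def get_twelve_labs_api_key(secret_name: str = "twelve-labs-api-key") -> str:
    # Fetch the JSON secret created above and return its api_key field
    secret = boto3.client("secretsmanager").get_secret_value(SecretId=secret_name)
    return json.loads(secret["SecretString"])["api_key"]
```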
3. Set Up Environment Variables
Copy and configure environment files for each component:
# Copy environment files in each directory
cp MCP/.env.example MCP/.env
cp agent/.env.example agent/.env
cp video-api/.env.example video-api/.env
Then configure the main .env file:
# Core AWS Configuration
AWS_REGION=us-east-1
OPENSEARCH_ENDPOINT=your-collection-id.us-east-1.aoss.amazonaws.com
INDEX_NAME=video-insights-rag
# Twelve Labs Configuration
TWELVE_LABS_API_KEY_SECRET=twelve-labs-api-key
# Note: TWELVE_LABS_INDEX_ID is automatically managed by the system
# The video processing pipeline creates the index and stores the ID in AWS Secrets Manager
# AI Models
BEDROCK_MODEL_ID=us.anthropic.claude-3-5-sonnet-20241022-v2:0
COHERE_MODEL_ID=cohere.embed-english-v3
NOVA_MODEL_ID=amazon.nova-lite-v1:0
# Service Ports
MCP_PORT=8008
API_PORT=8080
VIDEO_API_PORT=8091
Important: Edit each component's .env file with your specific AWS endpoints. The Twelve Labs index ID is automatically managed; you only need to configure the API key and OpenSearch endpoint. Check each .env file for details about the required variables.
4. Start All Services
Start services in order (MCP Server must be running before AI Agent):
Terminal 1 - MCP Server:
pip install -r requirements.txt
cd MCP/
python 1-video-search-mcp.py
Note: The requirements.txt above covers both the agent and the MCP server.
Terminal 2 - AI Agent:
cd agent/
python 1-ai-agent-video-search-strands-sdk.py
Terminal 3 - Video API:
cd video-api/
pip install -r requirements.txt
python 1-video-api.py
Terminal 4 - Frontend:
cd frontend/video-insights-ui/
npm install
npm start
5. Test the System
# Upload a test video (use the data bucket name from -d parameter)
aws s3 cp test-video.mp4 s3://your-data-bucket-name/videos/
# Access the UI
open http://localhost:3000
# Try searches like:
# - "Find videos with people laughing"
# - "Show me tutorial content"
# - "What videos mention Python?"
🔧 Enhanced Features
OpenSearch Access Control
The deployment now supports adding your IAM user/role to OpenSearch permissions for local development:
- Use the -p flag to grant your IAM principal access to OpenSearch
- Prevents 403 errors when running the local APIs (video-api, MCP server)
- Get your ARN: aws sts get-caller-identity --query Arn --output text
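Once your principal is on the data access policy, the local services can sign OpenSearch requests with SigV4. A minimal sketch of such a client using opensearch-py (region and endpoint are placeholders from your .env):

```python
import boto3
from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

# Sign requests with your local AWS credentials; OpenSearch Serverless
# uses the service name "aoss". Host comes from OPENSEARCH_ENDPOINT in .env.
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, "us-east-1", "aoss")

client = OpenSearch(
    hosts=[{"host": "your-collection-id.us-east-1.aoss.amazonaws.com", "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)
# A 403 here usually means your ARN is missing from the data access policy
print(client.indices.exists(index="video-insights-rag"))
```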
Early Validation
Video processing now includes early validation:
- OpenSearch index check: Verifies index exists before using Twelve Labs API
- Prevents wasted API calls: Stops processing early if infrastructure isn't ready
- Clear error messages: Helpful debugging information for deployment issues
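A sketch of what that fail-fast check can look like, reusing the signed client from the snippet above (the error message is illustrative):

```python
def ensure_index_exists(client, index_name: str = "video-insights-rag") -> None:
    # Fail fast: abort before any Twelve Labs API call is made
    if not client.indices.exists(index=index_name):
        raise RuntimeError(
            f"OpenSearch index '{index_name}' not found - "
            "run deploy.sh with --create-index before uploading videos"
        )
```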
Automatic Index Management
The system now handles Twelve Labs index creation and management automatically:
- Auto-Creation: First video upload automatically creates the Twelve Labs index
- Secure Storage: Index ID is stored in AWS Secrets Manager for sharing between components
- Zero Configuration: No manual index ID management required
- Automatic Sync: All components (MCP server, Lambda functions) automatically retrieve the correct index ID
Optional: Use the Sample Dataset
If you need sample videos for testing, use the provided dataset downloader:
# Navigate to data directory
cd data/
# Install requirements and authenticate with HuggingFace
pip install huggingface_hub
huggingface-cli login # Enter your HF token
# Download sample videos (requires dataset access approval)
python download.py
Note: The sample dataset (HuggingFaceFV/finevideo) requires:
- HuggingFace account and access token
- Dataset access approval from the source
- Sufficient storage space (downloads 25 videos by default)
See data/README.md for complete licensing and usage information.
🔍 Search Capabilities
1. Conversational AI Search
Chat naturally with the AI agent powered by AWS Strands SDK and Claude 3.5 Sonnet:
- "Find videos where people are celebrating"
- "Show me all Python programming tutorials"
- "Which videos feature Nick?"
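Under the hood, the agent connects Claude to the MCP server's search tools. A rough sketch with the Strands SDK, using the model ID and MCP port from the configuration above (the /mcp endpoint path is an assumption):

```python
from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.tools.mcp import MCPClient

# Connect to the local MCP server (port 8008 per .env; the /mcp path is an assumption)
mcp_client = MCPClient(lambda: streamablehttp_client("http://localhost:8008/mcp"))

with mcp_client:
    # Expose the MCP search tools to Claude and let the agent pick the right one
    tools = mcp_client.list_tools_sync()
    agent = Agent(model="us.anthropic.claude-3-5-sonnet-20241022-v2:0", tools=tools)
    agent("Find videos where people are celebrating")
```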
2. Video-to-Video Similarity Search
Upload any video to find visually similar content in your library (MCP/.env defines the required similarity score):
- Compare visual composition, colors, scenes
- Find different angles of the same event
- Locate similar content types or styles
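A sketch of the underlying query: the uploaded clip is embedded, then matched with a k-NN search, with the threshold from MCP/.env applied as a minimum score. It reuses the signed client from the setup section; the helper, vector field, and environment variable names are assumptions:

```python
import os

# Vector returned by Marengo for the uploaded query clip
# (get_marengo_embedding is a hypothetical helper)
query_vector = get_marengo_embedding("query-clip.mp4")

search_body = {
    "size": 5,
    # Apply the similarity threshold from MCP/.env (variable name assumed)
    "min_score": float(os.environ.get("SIMILARITY_SCORE", "0.75")),
    # "embedding" as the k-NN vector field name is an assumption
    "query": {"knn": {"embedding": {"vector": query_vector, "k": 5}}},
}
for hit in client.search(index="video-insights-rag", body=search_body)["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("video_title"))
```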
3. Advanced Search Methods
- Semantic Search: Natural language understanding using Cohere embeddings
- Keyword Search: Traditional text search across titles, descriptions, transcripts
- Hybrid Search: Combines semantic and keyword for best results
- Entity Search: Find specific people, brands, objects, or locations
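A sketch of how a hybrid query can combine the two legs: embed the question with Cohere on Bedrock, then pair the vector match with a keyword match in one bool query (the signed client from the setup section is reused; field names are assumptions):

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed_query(text: str) -> list[float]:
    # Cohere Embed on Bedrock; "search_query" is the input type for queries
    response = bedrock.invoke_model(
        modelId="cohere.embed-english-v3",
        body=json.dumps({"texts": [text], "input_type": "search_query"}),
    )
    return json.loads(response["body"].read())["embeddings"][0]

question = "Show me Python tutorials"
hybrid_query = {
    "size": 10,
    "query": {
        "bool": {
            "should": [
                # Semantic leg: k-NN over the Cohere vector (field name assumed)
                {"knn": {"semantic_embedding": {"vector": embed_query(question), "k": 10}}},
                # Keyword leg: text match over metadata (field names assumed)
                {"multi_match": {"query": question, "fields": ["title", "description", "transcript"]}},
            ]
        }
    },
}
results = client.search(index="video-insights-rag", body=hybrid_query)
```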
4. Smart Filtering
- Sentiment analysis (positive, negative, neutral content)
- Temporal searches (date ranges, recent content)
- Content type classification
- Speaker/person identification with timestamps
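Filters like these rely on the entities and sentiment extracted at indexing time. A minimal sketch of such an extraction call with Nova Lite via the Bedrock Converse API (the prompt and output format are illustrative, not the pipeline's exact ones):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def extract_entities(video_summary: str) -> str:
    # Ask Nova Lite for structured entities and sentiment;
    # the prompt and output contract here are illustrative.
    prompt = (
        "Extract the people, brands, objects, and overall sentiment from this "
        f"video summary and return them as JSON:\n{video_summary}"
    )
    response = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```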
📋 Detailed Setup
Environment Configuration
Each component has its own .env.example file with required variables:
- Main .env.example - Core AWS and service configuration
- MCP/.env.example - OpenSearch and Twelve Labs settings
- agent/.env.example - AI agent and Bedrock configuration
- frontend/.env.example - React app API endpoints
AWS Permissions Required
Your AWS user/role needs access to:
- Amazon Bedrock: Claude 3.5 Sonnet, Cohere Embed, Nova Lite models
- Amazon OpenSearch Serverless: Collection creation, read/write access
- AWS Lambda: Function creation and execution
- AWS Step Functions: State machine creation and execution
- Amazon S3: Bucket access for video storage
- Amazon EventBridge: Rule creation for S3 events
- AWS Secrets Manager: Secret creation and access
- Amazon Transcribe: Video transcription services
Video Requirements
- Formats: Your video files must be encoded in the video and audio formats listed on the FFmpeg Formats Documentation
- Size: Up to 2GB per video
- Resolution: Must be at least 360x360 and must not exceed 3840x2160.
- Duration: For Twelve Labs Marengo (embeddings), between 4 seconds and 2 hours (7,200 seconds). For Pegasus, between 4 seconds and 60 minutes (3,600 seconds). In a future release, the maximum duration for Pegasus will be 2 hours (7,200 seconds).
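You can check a file against these limits with ffprobe before uploading. A minimal sketch (thresholds mirror the list above; orientation handling is simplified):

```python
import json
import subprocess

def check_video(path: str) -> None:
    # Read resolution and duration with ffprobe (ships with FFmpeg)
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height:format=duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    probe = json.loads(result.stdout)
    width = probe["streams"][0]["width"]
    height = probe["streams"][0]["height"]
    duration = float(probe["format"]["duration"])

    assert width >= 360 and height >= 360, "below the 360x360 minimum"
    assert width <= 3840 and height <= 2160, "above the 3840x2160 maximum"
    assert 4 <= duration <= 7200, "outside Marengo's 4-second to 2-hour range"
    if duration > 3600:
        print("warning: over 60 minutes - Pegasus will not process this video")
```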
🧪 Testing & Validation
Automated Test Suite
This script helps you evaluate the agent and troubleshoot issues.
# Test all agent endpoints and functionality
cd agent/
python 2-test_agent.py
The test suite validates:
- ✅ All API endpoints responding correctly
- ✅ MCP server connectivity and search functions
- ✅ WebSocket streaming for real-time responses
- ✅ Session management and context tracking
- ✅ Error handling for edge cases
Manual Testing Workflow
- Upload Test Videos: Use diverse content types (tutorials, personal videos, presentations)
- Test Search Variety: Try different search methods and query types
- Validate Results: Check that returned videos match search intent
- Test Video Upload Search: Upload new videos to find similar existing content
💰 Cost Considerations
AWS Usage Charges
- Amazon OpenSearch Serverless: The main cost driver of this solution; delete your collection when you are not using it to avoid charges ($100+/month)
- AWS Lambda: Pay per execution, typically $1-10/month for moderate use
- Amazon Bedrock: Pay per API call, varies by model and usage
- Amazon S3: Storage costs based on video collection size
- AWS Step Functions: Pay per state transition, minimal cost
Third-Party Services
- Twelve Labs: Usage-based pricing for video analysis
- Free tier available, then pay per minute of video processed
Cost Optimization Tips
- Implement video compression before upload to reduce storage costs
- Monitor Bedrock usage via Cost Explorer and implement caching for repeated queries
🚨 Important Disclaimers
Educational Purpose
This project is designed for educational and demonstration purposes. To harden this application for production, you may want to:
- Implement proper authentication and authorization
- Implement API rate limiting
- Host the APIs properly: they currently run as plain Python scripts to make testing simple; production deployments need proper API hosting
- Add data encryption at rest and in transit
- Set up comprehensive monitoring and alerting
- Review and implement security best practices
- Consider compliance requirements (GDPR, CCPA, etc.)
Data Privacy
- Videos and metadata are stored in your AWS account
- Twelve Labs processes videos according to their privacy policy
- Implement appropriate data retention and deletion policies
- Consider geographic data residency requirements
Scalability Considerations
- Current configuration suitable for personal to small team use
- For large-scale deployment, review OpenSearch sizing, Bedrock quotas, and Lambda limits
- Consider implementing video preprocessing pipelines for very large collections
🧹 Cleanup & Cost Management
Complete Resource Cleanup
# Empty and delete S3 buckets if required (you may need to do this before deleting the stack)
aws s3 rm s3://your-video-bucket --recursive
aws s3 rb s3://your-video-bucket
# Delete CloudFormation stack (removes most resources)
aws cloudformation delete-stack --stack-name YOUR_STACK_NAME
# (Optional) Delete OpenSearch collection manually if required
aws opensearchserverless delete-collection --id YOUR_COLLECTION_ID
# (Optional) Delete secrets
aws secretsmanager delete-secret --secret-id twelve-labs-api-key --force-delete-without-recovery
Cost Monitoring
- Monitor your AWS Billing Dashboard
- Set up billing alerts for unexpected charges
- Review OpenSearch Serverless usage regularly (primary cost driver)
📁 Project Structure
intelligent-video-search-ai-agent/
├── 📁 MCP/                            # Model Context Protocol server
├── 📁 agent/                          # AI agent (Strands SDK + Claude)
├── 📁 frontend/                       # React web interface
├── 📁 video-api/                      # Video metadata API service
├── 📁 lambdas/                        # AWS Lambda functions
├── 📁 data_ingestion/                 # OpenSearch index setup
├── 📁 data/                           # Sample datasets
├── 📄 infrastructure.yaml             # CloudFormation template
├── 📄 step-functions-definition.json  # Step Functions workflow
└── 📄 .env.example                    # Environment configuration template
Each directory contains its own README.md with component-specific setup instructions.
Security
See CONTRIBUTING for more information.
License
This library is licensed under the MIT-0 License. See the LICENSE file.