Video Keeper - AI-Powered Video Library with Multimodal Agentic Search via Amazon Bedrock

Overview

Transform any video collection into an intelligent, searchable library using multi-modal AI and agentic conversation. This solution combines TwelveLabs models served natively on Amazon Bedrock, the Strands SDK (an agentic framework), Amazon Nova, Cohere embeddings, Anthropic Claude, and Amazon Transcribe to extract rich insights from videos - all without external API keys or third-party SDKs. It is a general-purpose video search solution that works with any type of video.

🎯 Key Innovation: This implementation uses TwelveLabs’ cutting-edge video understanding models (Marengo and Pegasus) directly through Amazon Bedrock, providing enterprise-grade video AI capabilities with simplified deployment and billing through your AWS account.

Webserver UI

Difficulty: Medium

🎯 What is Video Keeper?

Video Keeper is an agentic AI system that automatically analyzes, indexes, and makes any video collection searchable through natural conversation. Whether you have training videos, personal memories, gaming recordings, educational content, or professional documentation, Video Keeper creates an intelligent search experience powered entirely by AWS services and advanced AI models available through Amazon Bedrock.

πŸš€ Key Capabilities

🎬 Universal Video Support

πŸ” Advanced Search Methods

🧠 Multi-Modal AI Analysis (Powered by Amazon Bedrock)

πŸ”§ Robust Architecture

πŸ—οΈ Architecture Overview

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   S3 Video  │───▢│ EventBridge  │───▢│ Step Functions  β”‚
β”‚   Upload    β”‚    β”‚   Trigger    β”‚    β”‚   Workflow      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚                             β–Ό                             β”‚
         β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚
         β”‚                    β”‚ Lambda: Initiateβ”‚                    β”‚
         β”‚                    β”‚   Processing    β”‚                    β”‚
         β”‚                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
         β”‚                             β”‚                             β”‚
         β”‚                             β–Ό                             β”‚
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚ Amazon Bedrock  │◀────────────▢│ Lambda: Extract │─────────────▢│ OpenSearch      β”‚
       β”‚ TwelveLabs      β”‚              β”‚   Insights      β”‚              β”‚ Serverless      β”‚
       β”‚ (Marengo +      β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β”‚ (Vector + Text) β”‚
       β”‚  Pegasus)       β”‚                       β”‚                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                       β–Ό                                   β–²   β–²
                              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                          β”‚   β”‚
                              β”‚ Cohere Embed    β”‚β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
                              β”‚ (Semantic Vec.) β”‚                              β”‚
                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                              β”‚
                                       β”‚                                       β”‚
                                       β–Ό                                       β”‚
                              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                              β”‚
                              β”‚ Amazon Nova     β”‚β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚ (Entity Extract)β”‚                              
                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                              
         β”‚
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚ Frontend React  │◀────────────▢│ AI Agent        │◀────────────▢│ MCP Server      β”‚
       β”‚ (Port 3000)     β”‚              β”‚ (Strands SDK)   β”‚              β”‚ (Port 8008)     β”‚
       β”‚                 β”‚              β”‚ (Port 8090)     β”‚              β”‚                 β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                             β”‚                                  β”‚
         β”‚                             β–Ό                                  β–Ό
         β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         └───────────────────▢│ Video API       β”‚              β”‚ OpenSearch      β”‚
                              β”‚ (Port 8091)     │─────────────▢│ Video Search    β”‚
                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚
                                       β–Ό
                              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                              β”‚ Amazon Bedrock  β”‚
                              β”‚ (Claude 3.5v2)  β”‚
                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Quick Start

Prerequisites

1. Deploy AWS Infrastructure

This deployment uses TwelveLabs models natively through Amazon Bedrock with automatic cross-region handling:

# Clone repository
git clone <repository-url>
cd intelligent-video-search-ai-agent-twelve-labs-via-bedrock

# Create deployment bucket for SAM artifacts (one-time setup)
aws s3 mb s3://my-sam-deployment-bucket-$(date +%s) --region us-east-1

# REQUIRED: Create S3 bucket in us-west-2 for Pegasus processing
aws s3 mb s3://my-videos-pegasus-bucket --region us-west-2

# Get your IAM ARN (REQUIRED for OpenSearch access)
aws sts get-caller-identity --query 'Arn' --output text

# Deploy using the deployment script
# IMPORTANT: 
# -b: Primary video bucket (will be CREATED in us-east-1)
# -d: Deployment bucket (MUST already exist) - stores CloudFormation artifacts
# -w: us-west-2 video bucket (MUST already exist) - for Pegasus processing
# -p: Your IAM user/role ARN (REQUIRED) - grants OpenSearch access
# --create-index: Create OpenSearch index automatically
./deploy.sh -b primary-video-bucket -d deployment-bucket -w pegasus-video-bucket -p your-iam-arn --create-index

# Example:
./deploy.sh -b video-ue1-bucket -d videos-deployment-ue1 -w videos-pegasus-uw2 -p arn:aws:iam::123456789012:user/admin --create-index

# Note outputs: OpenSearch endpoint, State Machine ARN, both S3 bucket names

⚠️ CRITICAL: If you don’t provide the -p parameter with your IAM ARN, OpenSearch index creation will fail with a 403 authorization error.
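
To confirm your ARN actually made it into the data access policy, here is a quick check sketch (assuming the opensearch-py package is installed; substitute the endpoint from the deploy outputs):

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

endpoint = "your-collection-id.us-east-1.aoss.amazonaws.com"  # from deploy outputs

# "aoss" is the signing service name for OpenSearch Serverless
auth = AWSV4SignerAuth(boto3.Session().get_credentials(), "us-east-1", "aoss")

client = OpenSearch(
    hosts=[{"host": endpoint, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)

# A 403 here means your ARN is missing from the data access policy -
# re-run deploy.sh with the -p parameter
print(client.indices.exists(index="video-insights-rag"))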

2. Set Up Environment Variables

Copy and configure the .env.example file:

cp .env.example .env

Then configure the .env file with your deployment outputs:

# ======================
# AWS Configuration
# ======================
AWS_REGION=us-east-1
PRIMARY_REGION=us-east-1  # For Marengo model and main resources
PEGASUS_REGION=us-west-2  # For Pegasus model

# ======================
# OpenSearch Configuration
# ======================
OPENSEARCH_ENDPOINT=your-collection-id.us-east-1.aoss.amazonaws.com
INDEX_NAME=video-insights-rag

# ======================
# S3 Buckets
# ======================
VIDEO_BUCKET=your-video-bucket-east      # Primary bucket from -b parameter
S3_BUCKET=your-video-bucket-east         # Alias for VIDEO_BUCKET
VIDEO_BUCKET_WEST=your-video-bucket-west # Secondary bucket from -w parameter

# ======================
# Bedrock Models Configuration
# ======================
# TwelveLabs models via Bedrock (no API key required!)
MARENGO_MODEL_ID=twelvelabs.marengo-embed-2-7-v1:0  # Video embeddings
PEGASUS_MODEL_ID=us.twelvelabs.pegasus-1-2-v1:0     # Video understanding

# Text and entity extraction models
COHERE_MODEL_ID=cohere.embed-english-v3    # Text embeddings
NOVA_MODEL_ID=amazon.nova-lite-v1:0        # Entity extraction
NOVA_MAX_CHARS=350000

# AI Agent model
BEDROCK_MODEL_ID=us.anthropic.claude-3-5-sonnet-20241022-v2:0
MODEL_TEMPERATURE=0.3

# ======================
# Service Ports
# ======================
MCP_HOST=localhost
MCP_PORT=8008
API_HOST=localhost
API_PORT=8090
VIDEO_API_HOST=localhost
VIDEO_API_PORT=8091

# ======================
# Frontend Configuration
# ======================
REACT_APP_API_URL=http://localhost:8090
REACT_APP_VIDEO_API_URL=http://localhost:8091

πŸ’‘ Key Advantage: Unlike the SDK version, this Bedrock-native implementation requires no external API keys - authentication is handled through your AWS credentials!
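
For example, the agent's Claude calls reduce to a single boto3 Converse request signed with your AWS credentials - a minimal sketch (the prompt and region are illustrative):

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Same model ID as BEDROCK_MODEL_ID in .env; no vendor API key anywhere
response = bedrock.converse(
    modelId="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": [{"text": "Say hello"}]}],
    inferenceConfig={"temperature": 0.3},
)
print(response["output"]["message"]["content"][0]["text"])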

3. Start All Services

Start services in order (MCP Server must be running before AI Agent):

Terminal 1 - MCP Server:

pip install -r requirements.txt
cd MCP/
python 1-video-search-mcp.py

Terminal 2 - AI Agent:

cd agent/
python 1-ai-agent-video-search-strands-sdk.py

Terminal 3 - Video API:

cd video-api/
python 1-video-api.py

Terminal 4 - Frontend:

cd frontend/video-insights-ui/
npm install
npm start
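
Optionally, confirm that all four services came up. The probed root paths are assumptions - adjust them to whatever routes the services actually expose:

import requests

# Ports match the .env defaults
services = {
    "MCP Server": "http://localhost:8008",
    "AI Agent": "http://localhost:8090",
    "Video API": "http://localhost:8091",
    "Frontend": "http://localhost:3000",
}

for name, url in services.items():
    try:
        print(f"{name}: HTTP {requests.get(url, timeout=3).status_code}")
    except requests.ConnectionError:
        print(f"{name}: not reachable")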

4. Test the System

# Upload a test video (use the primary bucket name from -b parameter)
aws s3 cp test-video.mp4 s3://your-primary-bucket-name/videos/

# The system will automatically:
# 1. Process with Marengo (us-east-1) for visual embeddings
# 2. Copy to us-west-2 bucket for Pegasus processing
# 3. Extract comprehensive insights using both models
# 4. Generate transcription with Amazon Transcribe
# 5. Extract entities with Amazon Nova
# 6. Index everything in OpenSearch

# Access the UI
open http://localhost:3000

# Try searches like:
# - "Find videos with people laughing"
# - "Show me tutorial content"  
# - "What videos mention Python?"
# - Upload a video to find similar content
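
Searches can also be scripted against the agent directly. Note that the /chat route and payload shape below are assumptions - check agent/1-ai-agent-video-search-strands-sdk.py for the actual contract:

import requests

resp = requests.post(
    "http://localhost:8090/chat",  # hypothetical route - verify in the agent code
    json={"message": "Find videos with people laughing"},
    timeout=60,
)
print(resp.json())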

πŸ”§ Enhanced Features

Native Amazon Bedrock Integration

The system now uses TwelveLabs models directly through Amazon Bedrock: Marengo for video embeddings and Pegasus for video understanding, both invoked with standard AWS credentials.
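
As a rough illustration, a Pegasus call is a plain Bedrock InvokeModel request. The body shape below follows published TwelveLabs-on-Bedrock examples but should be verified against the current model documentation:

import json
import boto3

# Pegasus runs in us-west-2, which is why the second bucket exists
bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

body = {
    "inputPrompt": "Describe what happens in this video.",
    "mediaSource": {
        "s3Location": {
            "uri": "s3://your-video-bucket-west/videos/test-video.mp4",
            "bucketOwner": "123456789012",  # your AWS account ID
        }
    },
}

resp = bedrock.invoke_model(
    modelId="us.twelvelabs.pegasus-1-2-v1:0",
    body=json.dumps(body),
)
print(json.loads(resp["body"].read()))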

OpenSearch Access Control

The deployment supports adding your IAM user/role to the OpenSearch data access policy via the -p parameter of deploy.sh; this is what grants your identity read/write access to the index.

Robust Video Processing Pipeline

The Step Functions workflow orchestrates the full pipeline: Marengo embeddings in us-east-1, a cross-region copy to us-west-2 for Pegasus analysis, transcription with Amazon Transcribe, entity extraction with Amazon Nova, and indexing into OpenSearch.

Video Upload Search

Upload any video to find similar content using Marengo embeddings:

# The system:
# 1. Uploads your video to S3
# 2. Generates embeddings using Bedrock Marengo
# 3. Searches OpenSearch for similar video embeddings
# 4. Returns ranked results with similarity scores
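
Sketched in code, with the caveat that the Marengo input shape and the index's vector field name ("embedding") are assumptions about this stack:

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Steps 1-2: Marengo on Bedrock runs asynchronously and writes the
# embedding JSON to S3
job = bedrock.start_async_invoke(
    modelId="twelvelabs.marengo-embed-2-7-v1:0",
    modelInput={
        "inputType": "video",
        "mediaSource": {
            "s3Location": {"uri": "s3://your-video-bucket-east/videos/query.mp4"}
        },
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://your-video-bucket-east/embeddings/"}
    },
)
print("Embedding job:", job["invocationArn"])

# Steps 3-4: once the output lands in S3, load the vector and run a k-NN
# query (OpenSearch client setup as in the access-check sketch above)
knn_query = {
    "size": 5,
    "query": {"knn": {"embedding": {"vector": [0.1] * 1024, "k": 5}}},  # placeholder vector
}
# results = client.search(index="video-insights-rag", body=knn_query)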

πŸ” Search Capabilities

1. Natural Language Conversations

Chat naturally with the AI agent, powered by the AWS Strands SDK and Claude 3.5 Sonnet.

2. Video Upload Search

Upload any video to find visually similar content using Marengo embeddings.

3. Advanced Search Methods

4. Smart Filtering

πŸ“‹ Detailed Setup

Environment Configuration

The main .env.example file contains all required variables with detailed descriptions.
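
A small sanity check can catch missing values before the services start (assuming python-dotenv is installed):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory

required = ["OPENSEARCH_ENDPOINT", "VIDEO_BUCKET", "VIDEO_BUCKET_WEST", "BEDROCK_MODEL_ID"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing required .env values: {missing}")
print("Environment looks complete.")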

AWS Permissions Required

Your AWS user/role needs access to:

  - Amazon Bedrock (TwelveLabs Marengo and Pegasus, Amazon Nova, Cohere Embed, Anthropic Claude)
  - Amazon S3 (both video buckets plus the deployment bucket)
  - Amazon OpenSearch Serverless
  - AWS Step Functions, AWS Lambda, and Amazon EventBridge
  - Amazon Transcribe
  - AWS CloudFormation (for stack deployment)

Video Requirements

πŸ§ͺ Testing & Validation

Automated Test Suite

# Test all agent endpoints and functionality
cd agent/
python 2-test_agent.py

The test suite validates the agent's endpoints and core search functionality end to end.

Manual Testing Workflow

  1. Upload Test Videos: Use diverse content types (tutorials, personal videos, presentations)
  2. Monitor Processing: Check Step Functions console for processing status (or use the script sketched after this list)
  3. Test Search Variety: Try different search methods and query types
  4. Validate Results: Ensure embeddings and insights are properly indexed
  5. Test Video Upload Search: Upload new videos to find similar existing content
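
For step 2, processing status can also be pulled from a script instead of the console (substitute the State Machine ARN from the deploy outputs):

import boto3

sfn = boto3.client("stepfunctions", region_name="us-east-1")

executions = sfn.list_executions(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:YourStateMachine",
    maxResults=10,
)
for ex in executions["executions"]:
    print(ex["name"], ex["status"], ex["startDate"])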

πŸ’° Cost Considerations

AWS Usage Charges

Cost Optimization Tips

🚨 Important Disclaimers

Educational Purpose

This project is designed for educational and demonstration purposes. Before any production use, review the data privacy and scalability considerations below.

Data Privacy

Scalability Considerations

🧹 Cleanup & Cost Management

Complete Resource Cleanup

# Empty and delete S3 buckets (both regions)
aws s3 rm s3://your-video-bucket-east --recursive
aws s3 rb s3://your-video-bucket-east
aws s3 rm s3://your-video-bucket-west --recursive
aws s3 rb s3://your-video-bucket-west

# Delete CloudFormation stack
aws cloudformation delete-stack --stack-name YOUR_STACK_NAME

# Delete OpenSearch collection (if not deleted by stack)
# This is the main cost driver - ensure it's deleted!
aws opensearchserverless delete-collection --id YOUR_COLLECTION_ID
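
To double-check that the main cost driver is really gone:

import boto3

aoss = boto3.client("opensearchserverless", region_name="us-east-1")

# An empty list means no collections remain - and no further OpenSearch charges
remaining = aoss.list_collections()["collectionSummaries"]
print(remaining or "No OpenSearch Serverless collections left.")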

Cost Monitoring

πŸ“š Project Structure

intelligent-video-search-ai-agent/
β”œβ”€β”€ πŸ“ MCP/                      # Model Context Protocol server
β”œβ”€β”€ πŸ“ agent/                    # AI agent (Strands SDK + Claude)
β”œβ”€β”€ πŸ“ frontend/                 # React web interface
β”œβ”€β”€ πŸ“ video-api/                # Video metadata API service
β”œβ”€β”€ πŸ“ lambdas/                  # AWS Lambda functions
β”‚   β”œβ”€β”€ InitiateVideoProcessing/ # Cross-region video setup
β”‚   └── ExtractInsightsFunction/ # Bedrock model orchestration
β”œβ”€β”€ πŸ“ data_ingestion/           # OpenSearch index setup
β”œβ”€β”€ πŸ“ data/                     # Sample datasets
β”œβ”€β”€ πŸ“„ infrastructure.yaml       # CloudFormation template
β”œβ”€β”€ πŸ“„ .env.example             # Environment configuration template
└── πŸ“„ README.md                # This file

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.