Video Keeper - AI-Powered Video Library with Multimodal Agentic Search via TwelveLabs API

Overview

Transform any video collection into an intelligent, searchable library using multimodal AI and agentic conversation. This solution leverages the Strands SDK (an agentic framework), Amazon Nova, Anthropic Claude, Twelve Labs models, and Amazon Transcribe to extract rich insights from videos. It is a generic video search solution that works with any type of video.

Webserver UI

Tags

Technologies

Difficulty

Medium

🎯 What is Video Keeper?

Video Keeper is an agentic AI system that automatically analyzes, indexes, and makes any video collection searchable through natural conversation. Whether you have training videos, personal memories, gaming recordings, educational content, or professional documentation, Video Keeper creates an intelligent search experience powered by AWS and advanced AI models.

🚀 Key Capabilities

🎬 Universal Video Support

πŸ” Advanced Search Methods

🧠 Multi-Modal AI Analysis

🔧 Robust Architecture

πŸ—οΈ Architecture Overview

Architecture

┌─────────────┐    ┌──────────────┐    ┌─────────────────┐
│   S3 Video  │───▶│ EventBridge  │───▶│ Step Functions  │
│   Upload    │    │   Trigger    │    │   Workflow      │
└─────────────┘    └──────────────┘    └─────────────────┘
                                          │
         ┌────────────────────────────────┼────────────────────────────────┐
         │                                ▼                                │
         │                       ┌─────────────────┐                       │
         │                       │ Lambda: Initiate│                       │
         │                       │   Processing    │                       │
         │                       └─────────────────┘                       │
         │                                │                                │
         │                                ▼                                │
┌─────────────────┐              ┌─────────────────┐              ┌─────────────────┐
│ Twelve Labs     │◀────────────▶│ Lambda: Extract │─────────────▶│ OpenSearch      │
│ (Marengo +      │              │   Insights      │              │ Serverless      │
│  Pegasus)       │              └─────────────────┘              │ (Vector + Text) │
└─────────────────┘                       │                       └─────────────────┘
                                          ▼                            ▲   ▲
                                 ┌─────────────────┐                   │   │
                                 │ Cohere Embed    │───────────────────┘   │
                                 │ (Semantic Vec.) │                       │
                                 └─────────────────┘                       │
                                          │                               │
                                          ▼                               │
                                 ┌─────────────────┐                      │
                                 │ Amazon Nova     │──────────────────────┘
                                 │ (Entity Extract)│
                                 └─────────────────┘

┌─────────────────┐              ┌─────────────────┐              ┌─────────────────┐
│ Frontend React  │◀────────────▶│ AI Agent        │◀────────────▶│ MCP Server      │
│ (Port 3000)     │              │ (Strands SDK)   │              │ (Port 8008)     │
│                 │              │ (Port 8080)     │              │                 │
└─────────────────┘              └─────────────────┘              └─────────────────┘
         │                                │                                │
         │                                ▼                                ▼
         │                       ┌─────────────────┐              ┌─────────────────┐
         └──────────────────────▶│ Video API       │              │ OpenSearch      │
                                 │ (Port 8091)     │─────────────▶│ Video Search    │
                                 └─────────────────┘              └─────────────────┘
                                          │
                                          ▼
                                 ┌─────────────────┐
                                 │ Amazon Bedrock  │
                                 │ (Claude 3.5 v2) │
                                 └─────────────────┘
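
For orientation, here is a minimal sketch of what the "Vector + Text" index behind this diagram might look like. The real mapping is created by data_ingestion/1-create-opensearch-index.py; the field names below are illustrative assumptions, and 1024 matches the output dimension of Cohere embed-english-v3.

# Hypothetical shape of the video-insights-rag index (field names are assumptions;
# the authoritative mapping lives in data_ingestion/1-create-opensearch-index.py).
index_mapping = {
    "settings": {"index": {"knn": True}},  # enable k-NN vector search
    "mappings": {
        "properties": {
            "video_id":   {"type": "keyword"},
            "summary":    {"type": "text"},     # Pegasus-generated description
            "transcript": {"type": "text"},     # Amazon Transcribe output
            "entities":   {"type": "keyword"},  # Nova-extracted entities
            "embedding":  {"type": "knn_vector", "dimension": 1024},  # Cohere embed-english-v3
        }
    },
}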

🚀 Quick Start

Prerequisites

1. Deploy AWS Infrastructure

# Clone repository
git clone <repository-url>
cd intelligent-video-search-ai-agent

# Create deployment bucket for SAM artifacts (one-time setup)
aws s3 mb s3://my-sam-deployment-bucket-$(date +%s)

# Deploy using the deployment script
# IMPORTANT: 
# -b: Deployment bucket (MUST already exist) - stores CloudFormation artifacts
# -d: Data bucket name (will be CREATED) - stores your videos
# -a: Your Twelve Labs API key - SAM will store your key in AWS Secrets Manager (encrypted)
# -p: Your IAM user/role ARN - grants OpenSearch access for local development
# --create-index: Create the OpenSearch index using the data_ingestion/1-create-opensearch-index.py script
./deploy.sh -b existing-deployment-bucket -d new-video-data-bucket -a your-twelve-labs-api-key -p your-iam-arn --create-index

# Example:
# ./deploy.sh -b my-sam-deployment-bucket-1736281200 -d my-unique-video-bucket-name -a tlk_XXXXXXXXXXXXXX -p "$(aws sts get-caller-identity --query Arn --output text)" --create-index

# Note outputs: OpenSearch endpoint, S3 bucket names
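
If you'd rather read those outputs programmatically than from the console, a small boto3 sketch works too (the stack name below is a placeholder; use whatever deploy.sh created):

# fetch_outputs.py - print the CloudFormation outputs (OpenSearch endpoint, bucket names)
import boto3

STACK_NAME = "video-keeper"  # placeholder: substitute the stack name from deploy.sh

cfn = boto3.client("cloudformation")
stack = cfn.describe_stacks(StackName=STACK_NAME)["Stacks"][0]
for output in stack.get("Outputs", []):
    print(f"{output['OutputKey']}: {output['OutputValue']}")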

2. (Optional) Configure Twelve Labs API Key

This step is only required if you did not provide your Twelve Labs API key to deploy.sh via the -a flag.

# Store Twelve Labs API key in AWS Secrets Manager
aws secretsmanager create-secret \
  --name twelve-labs-api-key \
  --secret-string '{"api_key":"your_twelve_labs_api_key_here"}'
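
For reference, this is roughly how the processing code can read the key back at runtime (a minimal boto3 sketch; the secret name matches TWELVE_LABS_API_KEY_SECRET below):

# Read the Twelve Labs API key from Secrets Manager.
import boto3
import json

secrets = boto3.client("secretsmanager")
response = secrets.get_secret_value(SecretId="twelve-labs-api-key")
api_key = json.loads(response["SecretString"])["api_key"]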

3. Set Up Environment Variables

Copy and configure environment files for each component:

# Copy environment files in each directory
cp MCP/.env.example MCP/.env
cp agent/.env.example agent/.env
cp video-api/.env.example video-api/.env

Then configure the main .env file:

# Core AWS Configuration
AWS_REGION=us-east-1
OPENSEARCH_ENDPOINT=your-collection-id.us-east-1.aoss.amazonaws.com
INDEX_NAME=video-insights-rag

# Twelve Labs Configuration  
TWELVE_LABS_API_KEY_SECRET=twelve-labs-api-key
# Note: TWELVE_LABS_INDEX_ID is automatically managed by the system
# The video processing pipeline creates the index and stores the ID in AWS Secrets Manager

# AI Models
BEDROCK_MODEL_ID=us.anthropic.claude-3-5-sonnet-20241022-v2:0
COHERE_MODEL_ID=cohere.embed-english-v3
NOVA_MODEL_ID=amazon.nova-lite-v1:0

# Service Ports
MCP_PORT=8008
API_PORT=8080
VIDEO_API_PORT=8091

Important: Edit each component's .env file with your specific AWS endpoints. The Twelve Labs index ID is now managed automatically; you only need to configure the API key and OpenSearch endpoint. See each component's .env.example for details on the required variables.
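
As a quick sanity check before starting the services, you can verify the required variables are present. A small sketch using python-dotenv (how each component actually loads its configuration may differ):

# check_env.py - fail fast if a required variable is missing from .env
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory

REQUIRED = ["AWS_REGION", "OPENSEARCH_ENDPOINT", "INDEX_NAME",
            "TWELVE_LABS_API_KEY_SECRET", "BEDROCK_MODEL_ID"]
missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing required environment variables: {missing}")
print("Environment looks complete.")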

4. Start All Services

Start services in order (MCP Server must be running before AI Agent):

Terminal 1 - MCP Server:

pip install -r requirements.txt
cd MCP/
python 1-video-search-mcp.py

Note: The requirements.txt above covers both the MCP server and the AI Agent.

Terminal 2 - AI Agent:

cd agent/
python 1-ai-agent-video-search-strands-sdk.py

Terminal 3 - Video API:

cd video-api/
pip install -r requirements.txt
python 1-video-api.py

Terminal 4 - Frontend:

cd frontend/video-insights-ui/
npm install
npm start
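
Before moving on, you can confirm all four services are listening on their expected ports (taken from the defaults above); a minimal check:

# check_ports.py - verify the local services are up
import socket

SERVICES = {"MCP Server": 8008, "AI Agent": 8080, "Video API": 8091, "Frontend": 3000}
for name, port in SERVICES.items():
    with socket.socket() as sock:
        sock.settimeout(1)
        status = "up" if sock.connect_ex(("localhost", port)) == 0 else "DOWN"
    print(f"{name} (:{port}): {status}")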

5. Test the System

# Upload a test video (use the data bucket name from -d parameter)
aws s3 cp test-video.mp4 s3://your-data-bucket-name/videos/

# Access the UI
open http://localhost:3000

# Try searches like:
# - "Find videos with people laughing"
# - "Show me tutorial content"  
# - "What videos mention Python?"

🔧 Enhanced Features

OpenSearch Access Control

The deployment now supports adding your IAM user/role to the OpenSearch permissions for local development.
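
Under the hood, the -p flag amounts to including your principal in the collection's data access policy. To inspect what was granted, a hedged boto3 sketch (the policy name is a placeholder; check the CloudFormation template for the real one):

# Inspect the OpenSearch Serverless data access policy that -p updates.
import boto3
import json

aoss = boto3.client("opensearchserverless")
policy = aoss.get_access_policy(name="video-keeper-access", type="data")  # placeholder name
print(json.dumps(policy["accessPolicyDetail"]["policy"], indent=2))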

Early Validation

Video processing now includes early validation.

Automatic Index Management

The system now handles Twelve Labs index creation and management automatically.
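
Concretely, the pipeline creates the Twelve Labs index on first use and stores its ID in Secrets Manager so later runs can reuse it. A sketch of the lookup side (the secret name and JSON key here are assumptions):

# Retrieve the Twelve Labs index ID the pipeline stored on first run.
import boto3
import json

secrets = boto3.client("secretsmanager")
try:
    value = secrets.get_secret_value(SecretId="twelve-labs-index-id")  # assumed name
    index_id = json.loads(value["SecretString"])["index_id"]           # assumed key
    print(f"Reusing Twelve Labs index: {index_id}")
except secrets.exceptions.ResourceNotFoundException:
    print("No stored index ID yet; the pipeline creates one on the first video.")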

Option B: Use Sample Dataset

If you need sample videos for testing, use the provided dataset downloader:

# Navigate to data directory
cd data/

# Install requirements and authenticate with HuggingFace
pip install huggingface_hub
huggingface-cli login  # Enter your HF token

# Download sample videos (requires dataset access approval)
python download.py

Note: The sample dataset (HuggingFaceFV/finevideo) is gated and requires access approval on Hugging Face. See data/README.md for complete licensing and usage information.

πŸ” Search Capabilities

1. Conversational AI Search

Chat naturally with the AI agent, powered by the AWS Strands SDK and Claude 3.5 Sonnet.

2. Video Similarity Search

Upload any video to find visually similar content in your library (MCP/.env defines the required similarity score).
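
Behind this feature the MCP server issues a k-NN query and drops results below the configured score. A hedged opensearch-py sketch of such a query (index and field names follow the mapping sketch above; the 0.7 cutoff is illustrative, the real one comes from MCP/.env):

# Hypothetical k-NN similarity query like the one the MCP server runs.
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, "us-east-1", "aoss")
client = OpenSearch(
    hosts=[{"host": "your-collection-id.us-east-1.aoss.amazonaws.com", "port": 443}],
    http_auth=auth, use_ssl=True, verify_certs=True,
    connection_class=RequestsHttpConnection,
)

query_vector = [0.0] * 1024  # in practice: the uploaded video's embedding
results = client.search(index="video-insights-rag", body={
    "size": 5,
    "min_score": 0.7,  # illustrative; MCP/.env defines the real similarity cutoff
    "query": {"knn": {"embedding": {"vector": query_vector, "k": 5}}},
})
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("video_id"))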

3. Advanced Search Methods

4. Smart Filtering

📋 Detailed Setup

Environment Configuration

Each component has its own .env.example file with required variables:

AWS Permissions Required

Your AWS user/role needs access to:

Video Requirements

🧪 Testing & Validation

Automated Test Suite

This script helps you evaluate and troubleshoot agent issues.

# Test all agent endpoints and functionality
cd agent/
python 2-test_agent.py

The test suite validates all agent endpoints and core search functionality.

Manual Testing Workflow

  1. Upload Test Videos: Use diverse content types (tutorials, personal videos, presentations)
  2. Test Search Variety: Try different search methods and query types
  3. Validate Results: Check that returned videos match search intent
  4. Test Video Upload Search: Upload new videos to find similar existing content

💰 Cost Considerations

AWS Usage Charges

Third-Party Services

Cost Optimization Tips

🚨 Important Disclaimers

Educational Purpose

This project is designed for educational and demonstration purposes. Before using it beyond a demo, you may want to implement additional security controls.

Data Privacy

Scalability Considerations

🧹 Cleanup & Cost Management

Complete Resource Cleanup

# Empty and delete S3 buckets if required (you may need to do this before deleting the stack)
aws s3 rm s3://your-video-bucket --recursive
aws s3 rb s3://your-video-bucket

# Delete CloudFormation stack (removes most resources)
aws cloudformation delete-stack --stack-name YOUR_STACK_NAME

# (Optional) Delete OpenSearch collection manually if required
aws opensearchserverless delete-collection --id YOUR_COLLECTION_ID

# (Optional) Delete secrets
aws secretsmanager delete-secret --secret-id twelve-labs-api-key --force-delete-without-recovery
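
If you are scripting the teardown, you can block until the stack is actually gone before checking your bill (the stack name is a placeholder):

# Wait for CloudFormation to finish deleting the stack.
import boto3

cfn = boto3.client("cloudformation")
cfn.get_waiter("stack_delete_complete").wait(StackName="YOUR_STACK_NAME")
print("Stack deleted.")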

Cost Monitoring

📚 Project Structure

intelligent-video-search-ai-agent/
├── 📁 MCP/                      # Model Context Protocol server
├── 📁 agent/                    # AI agent (Strands SDK + Claude)
├── 📁 frontend/                 # React web interface
├── 📁 video-api/                # Video metadata API service
├── 📁 lambdas/                  # AWS Lambda functions
├── 📁 data_ingestion/           # OpenSearch index setup
├── 📁 data/                     # Sample datasets
├── 📄 infrastructure.yaml       # CloudFormation template
├── 📄 step-functions-definition.json # Step Functions workflow
└── 📄 .env.example              # Environment configuration template

Each directory contains its own README.md with component-specific setup instructions.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.