Deployment Guide
This guide covers different deployment scenarios for the Universal Blockchain Node Runner, from development to production environments.
Table of Contents
- Prerequisites
- Quick Start
- Deployment Modes
- Deployment Scenarios
- Best Practices
- Post-Deployment
- Maintenance
- Destroying a Stack
- Troubleshooting
Prerequisites
Required Tools
- AWS CLI (v2.x or later)
  aws --version
  Install: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
- Node.js (v20.x or later)
  node --version
  Install: https://nodejs.org/
- Git
  git --version
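The version checks above can be scripted. This is a minimal sketch. It assumes the usual output formats: "aws --version" prints something like aws-cli/2.15.0 Python/..., and "node --version" prints v20.11.1. The names aws_cli_major, node_major and check_prereqs are hypothetical helpers and are not part of the project.

```shell
# Hypothetical helpers, not part of the project.

# Major version from "aws --version" output,
# e.g. "aws-cli/2.15.0 Python/3.11.6 Linux/..." -> "2"
aws_cli_major() {
  printf '%s\n' "$1" | sed -n 's|^aws-cli/\([0-9][0-9]*\)\..*|\1|p'
}

# Major version from "node --version" output, e.g. "v20.11.1" -> "20"
node_major() {
  printf '%s\n' "$1" | sed -n 's|^v\([0-9][0-9]*\)\..*|\1|p'
}

# Return non-zero with a message if aws (v2+), node (v20+) or git is missing or too old
check_prereqs() {
  for tool in aws node git; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
  aws_major=$(aws_cli_major "$(aws --version 2>&1)")
  node_maj=$(node_major "$(node --version)")
  if [ "${aws_major:-0}" -lt 2 ]; then
    echo "AWS CLI v2.x or later is required" >&2
    return 1
  fi
  if [ "${node_maj:-0}" -lt 20 ]; then
    echo "Node.js v20.x or later is required" >&2
    return 1
  fi
  echo "prerequisites OK"
}
```

Source the snippet and run check_prereqs. It prints "prerequisites OK" or reports the first problem it finds.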
AWS Account Setup
- AWS Account: Active AWS account with appropriate permissions
- IAM Permissions: To deploy, your IAM user/role needs:
  - CloudFormation full access
  - EC2 full access
  - IAM role creation
  - S3 bucket access
  - CloudWatch access
  - Auto Scaling (for HA deployments)
  - Elastic Load Balancing (for HA deployments)
- AWS CLI Configuration:
  aws configure
  Provide:
  - AWS Access Key ID
  - AWS Secret Access Key
  - Default region
  - Output format (json recommended)
- Verify Configuration:
  aws sts get-caller-identity
Quick Start
1. Clone and Install
# Clone repository
git clone <repository-url>
cd aws-blockchain-node-runners
# Install dependencies
npm install
2. Bootstrap CDK
First-time setup in each account/region:
npx cdk bootstrap aws://ACCOUNT-ID/REGION
Example:
npx cdk bootstrap aws://123456789012/us-east-1
3. Configure Environment
# Copy sample configuration
cp node_modules/aws-bnr-blueprint-dummy/samples/.env-testnet .env
# Edit with your details
nano .env
Minimum required changes:
AWS_ACCOUNT_ID="your-account-id"
AWS_REGION="your-region"
Tip: Run aws sts get-caller-identity to confirm your account ID. The deployment region is always taken from AWS_REGION in your .env. It overrides your AWS CLI profile default, so you can deploy to any region regardless of your profile configuration.
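Before bootstrapping or deploying, it helps to catch placeholder values left in .env. The sketch below is a minimal check. It uses hypothetical helper names (read_env_var, validate_env) and assumes simple KEY="value" lines like those in the sample file. It only checks the format of AWS_ACCOUNT_ID (12 digits) and AWS_REGION. It does not confirm that the account or region is actually usable.

```shell
# Hypothetical helper: print the value of KEY from an .env file
# (first match, surrounding quotes and trailing " # comment" removed).
read_env_var() {
  # $1 = file, $2 = key
  grep -E "^$2=" "$1" | head -n 1 | cut -d= -f2- | sed 's/[[:space:]]#.*$//' | tr -d '"'
}

# Hypothetical check: reject placeholder or malformed account/region values.
validate_env() {
  env_file=${1:-.env}
  account=$(read_env_var "$env_file" AWS_ACCOUNT_ID)
  region=$(read_env_var "$env_file" AWS_REGION)
  if ! printf '%s' "$account" | grep -Eq '^[0-9]{12}$'; then
    echo "AWS_ACCOUNT_ID must be a 12-digit account ID (got '$account')" >&2
    return 1
  fi
  if ! printf '%s' "$region" | grep -Eq '^[a-z]{2}(-[a-z]+)+-[0-9]+$'; then
    echo "AWS_REGION looks invalid (got '$region')" >&2
    return 1
  fi
  echo "OK: account $account, region $region"
}
```

For example, run validate_env .env. To confirm that the account matches your current credentials, compare it with the output of aws sts get-caller-identity --query Account --output text.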
Tip: If deployment fails because the instance type is not available in the default AZ, set AWS_AZ to a specific availability zone where your instance type is supported. For example, add AWS_AZ="us-east-1a" to your .env file. You can check which AZs support your instance type with:
aws ec2 describe-instance-type-offerings --location-type availability-zone --filters Name=instance-type,Values=<type> --region <region>
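The AZ lookup and the .env edit can be combined. The following is a sketch under stated assumptions:
- set_env_var and azs_for_instance_type are hypothetical helpers.
- set_env_var assumes simple KEY="value" lines.
- The --query expression collects the Location field of each offering that describe-instance-type-offerings returns.

```shell
# Hypothetical helper: set KEY="value" in an .env file, replacing an existing
# KEY= line or appending one if the key is not present yet.
set_env_var() {
  # $1 = file, $2 = key, $3 = value
  if grep -q "^$2=" "$1" 2>/dev/null; then
    tmp_file=$(mktemp)
    sed "s|^$2=.*|$2=\"$3\"|" "$1" > "$tmp_file" && mv "$tmp_file" "$1"
  else
    printf '%s="%s"\n' "$2" "$3" >> "$1"
  fi
}

# Hypothetical helper: list the AZs in a region that offer an instance type, one per line.
azs_for_instance_type() {
  # $1 = instance type, $2 = region
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$1" \
    --region "$2" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text | tr '\t' '\n'
}
```

For example: set_env_var .env AWS_AZ "$(azs_for_instance_type m6a.2xlarge us-east-1 | head -n 1)". An empty lookup result means the type is not offered in that region. That case would write AWS_AZ="", so check the zone list first.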
4. Deploy
# Synthesize the CloudFormation template (use npx cdk diff to preview changes against an already deployed stack)
npx cdk synth
# Backup .env file with stack name (for future reference)
STACK_NAME=$(npx cdk synth --quiet 2>&1 | grep "Stack created:" | awk '{print $3}')
cp .env .env-${STACK_NAME}
# Deploy stack
npx cdk deploy --json --outputs-file deploy-output-${STACK_NAME}.json
# Approve changes when prompted
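Suppose "npx cdk synth" does not print a "Stack created:" line, for example because synthesis failed. Then STACK_NAME ends up empty, and the commands above write files named .env- and deploy-output-.json. Below is a minimal guarded variant. The names backup_env_for_stack and deploy_stack are hypothetical.

```shell
# Hypothetical helper: back up .env under the stack name, refusing an empty name.
backup_env_for_stack() {
  # $1 = stack name
  if [ -z "$1" ]; then
    echo "stack name is empty; check the 'npx cdk synth' output" >&2
    return 1
  fi
  cp .env ".env-$1"
}

# Hypothetical wrapper around the Quick Start deploy steps.
deploy_stack() {
  stack_name=$(npx cdk synth --quiet 2>&1 | grep "Stack created:" | awk '{print $3}')
  backup_env_for_stack "$stack_name" || return 1
  npx cdk deploy --json --outputs-file "deploy-output-$stack_name.json"
}
```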
IMPORTANT: File Naming Convention
After deployment, you'll have two files per deployment:
- .env-{stack-name} - Configuration backup (for reference)
- deploy-output-{stack-name}.json - Deployment outputs (required for operations)
Examples:
.env-solana-mainnet-beta-agave-rpc-base
deploy-output-solana-mainnet-beta-agave-rpc-base.json
Why backup .env files:
- Reference for what was deployed
- Useful for redeployment or troubleshooting
- Documents configuration decisions
- Not required for healthcheck (info extracted from stack name and logs)
For multiple deployments:
# List all deployments
ls deploy-output-*.json
# List all configuration backups
ls .env-*
# Each pair corresponds to a unique deployment
Note: The stack name is automatically generated in the format ${protocol}-${network}-${clientConfig}. Version numbers, file extensions, and special characters are removed to reduce variability and allow version updates without changing the stack name.
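Both files share the {stack-name} suffix, so the pairing can be checked mechanically. The small sketch below uses list_deployments, a hypothetical helper. For each deploy-output-{stack-name}.json in the current directory, it reports whether the matching .env-{stack-name} backup exists.

```shell
# Hypothetical helper: pair each deploy-output-{stack-name}.json with its .env backup.
list_deployments() {
  for out in deploy-output-*.json; do
    [ -e "$out" ] || continue   # no matches: the glob stays literal
    stack=${out#deploy-output-}
    stack=${stack%.json}
    if [ -f ".env-$stack" ]; then
      echo "$stack: outputs + .env backup"
    else
      echo "$stack: outputs only (no .env backup)"
    fi
  done
}
```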
5. Verify Deployment
# Set the deployment file (replace {stack-name} with your actual stack name)
export DEPLOY_FILE="deploy-output-{stack-name}.json"
# Get stack outputs
jq . "$DEPLOY_FILE"
# Get instance ID (single-node)
export INSTANCE_ID=$(jq -r '..|.InstanceId? | select(. != null)' "$DEPLOY_FILE")
echo "INSTANCE_ID=$INSTANCE_ID"
# Connect to instance (single-node); AWS_REGION must be set in your shell, e.g. export AWS_REGION="us-east-1"
aws ssm start-session --target "$INSTANCE_ID" --region "$AWS_REGION"
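The jq filter above prints nothing if the outputs file has no InstanceId, for example when the stack is not a single-node deployment. INSTANCE_ID is then silently empty and the SSM command fails with a confusing error. Below is a sketch of a stricter lookup. get_instance_id is a hypothetical name, and the helper requires jq.

```shell
# Hypothetical helper: print the single InstanceId from a deploy-output file,
# using the same jq filter as above, and fail loudly instead of returning "".
get_instance_id() {
  # $1 = path to deploy-output-{stack-name}.json
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  id=$(jq -r '..|.InstanceId? | select(. != null)' "$1")
  if [ -z "$id" ]; then
    echo "no InstanceId found in $1 (not a single-node deployment?)" >&2
    return 1
  fi
  # A second line means more than one InstanceId output
  if [ -n "$(printf '%s\n' "$id" | sed -n '2p')" ]; then
    echo "multiple InstanceId values found in $1" >&2
    return 1
  fi
  echo "$id"
}
```

Usage: INSTANCE_ID=$(get_instance_id "$DEPLOY_FILE") && aws ssm start-session --target "$INSTANCE_ID" --region "$AWS_REGION"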
Deployment Modes
Single-Node Deployment
Use Cases:
- Development and testing
- Personal blockchain node
- Low-traffic applications
- Cost-sensitive deployments
Architecture:
┌───────────────────────────────────────┐
│ VPC (Default)                         │
│  ┌─────────────────────────────────┐  │
│  │ Public Subnet                   │  │
│  │  ┌───────────────────────────┐  │  │
│  │  │ EC2 Instance              │  │  │
│  │  │ - Blockchain Node         │  │  │
│  │  │ - EBS Volumes             │  │  │
│  │  │ - CloudWatch Agent        │  │  │
│  │  └───────────────────────────┘  │  │
│  └─────────────────────────────────┘  │
└───────────────────────────────────────┘
Configuration:
DEPLOYMENT_MODE="single-node"
INSTANCE_TYPE="m6a.2xlarge"
Characteristics:
- Single point of failure
- Lower cost
- Simpler management
- Includes CloudWatch dashboard
- Direct instance access
High Availability (HA) Deployment
Use Cases:
- Production workloads
- High-traffic applications
- Mission-critical services
- Redundancy requirements
Architecture:
┌───────────────────────────────────────────────────┐
│ VPC (Default)                                     │
│  ┌─────────────────────────────────────────────┐  │
│  │ Application Load Balancer                   │  │
│  └──────────────────────┬──────────────────────┘  │
│                         │                         │
│  ┌──────────────────────┴──────────────────────┐  │
│  │ Auto Scaling Group                          │  │
│  │   ┌──────────┐  ┌──────────┐  ┌──────────┐  │  │
│  │   │ Node 1   │  │ Node 2   │  │ Node N   │  │  │
│  │   │ (Primary)│  │ (Replica)│  │ (Replica)│  │  │
│  │   └──────────┘  └──────────┘  └──────────┘  │  │
│  └─────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────┘
Configuration:
DEPLOYMENT_MODE="ha-nodes"
HA_NUMBER_OF_NODES="3"
HA_ALB_HEALTHCHECK_PORT="8545"
HA_ALB_HEALTHCHECK_PATH="/health"
HA_ALB_HEALTHCHECK_GRACE_PERIOD_MIN="60"
HA_ALB_HEALTHCHECK_INTERVAL_SEC="30"
HA_ALB_HEALTHCHECK_TIMEOUT_SEC="5"
HA_ALB_HEALTHCHECK_HEALTHY_THRESHOLD="3"
HA_ALB_HEALTHCHECK_UNHEALTHY_THRESHOLD="2"
HA_NODES_HEARTBEAT_DELAY_MIN="10"
HA_ALB_DEREGISTRATION_DELAY_SEC="30"
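The health check values interact with each other. A failing node leaves rotation after roughly interval × unhealthy threshold = 30 s × 2 = 60 s. A new node starts receiving traffic after roughly 30 s × 3 = 90 s of passing checks, on top of the grace period. Elastic Load Balancing also expects the health check timeout to be shorter than the interval. The sketch below computes these approximations and flags that constraint. ha_healthcheck_summary is a hypothetical helper; it reads its values as arguments and does not parse .env.

```shell
# Hypothetical helper: approximate ALB health check timings from the HA_* settings.
ha_healthcheck_summary() {
  # $1 = interval (s), $2 = timeout (s), $3 = healthy threshold, $4 = unhealthy threshold
  interval=$1 timeout=$2 healthy=$3 unhealthy=$4
  if [ "$timeout" -ge "$interval" ]; then
    echo "HA_ALB_HEALTHCHECK_TIMEOUT_SEC should be lower than HA_ALB_HEALTHCHECK_INTERVAL_SEC" >&2
    return 1
  fi
  echo "marked unhealthy after ~$((interval * unhealthy))s of failed checks"
  echo "marked healthy after ~$((interval * healthy))s of passing checks"
}
```

With the configuration above: ha_healthcheck_summary 30 5 3 2.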
Characteristics:
- High availability
- Auto-scaling capability
- Load balancing
- Higher cost
- No default dashboard (create custom)
- Graceful node replacement
Deployment Scenarios
For specific deployment scenarios and configuration examples, refer to the protocol-specific documentation:
- Dummy Protocol: See blueprints/dummy/README.md for testing and development scenarios
- Future Protocols: Each protocol will include deployment scenarios in its README
Sample configurations for each protocol are available in the blueprint package's samples/ directory at node_modules/aws-bnr-blueprint-{protocol}/samples/.
Best Practices
Security
- Use IAM Roles: Never use long-term credentials on instances
  # Attach role to EC2 instances (done automatically)
  # Use AWS Systems Manager Session Manager for access
- Secrets Management: Store sensitive data in AWS Secrets Manager
  # Create secret
  aws secretsmanager create-secret \
    --name my-protocol-secret \
    --secret-string '{"key":"value"}'
  # Reference in .env
  PROTOCOL_SECRET_ARN="arn:aws:secretsmanager:..."
- Network Security: Minimize exposed ports
  - Only open required ports in security groups
  - Use private subnets for production (requires VPC configuration)
  - Enable VPC Flow Logs
Default network placement (by design): Single-node instances and HA Auto Scaling Group instances are deployed into public subnets of the default VPC and receive public IPs. This is intentional: blockchain nodes need direct inbound P2P connectivity, and a public-subnet layout avoids NAT gateway cost/complexity. The security posture relies on the security group:
- P2P ports are intentionally open to 0.0.0.0/0 (required for peer discovery).
- RPC / WebSocket / metrics ports are marked public: false and are restricted to the VPC CIDR, so they are not internet-reachable by default. In HA mode this is further governed by HA_ALB_INTERNET_FACING / HA_ALB_ALLOWED_CIDR, which default to internal/VPC-only.
- Egress is effectively unrestricted (all TCP/UDP to 0.0.0.0/0) because nodes must reach arbitrary peers across the internet.
If you require defense-in-depth beyond the security group (e.g. instances in private subnets with a NAT gateway for egress, P2P via an EIP/NAT), deploy into a custom VPC configured that way rather than the default VPC.
- Encryption: Enable encryption at rest
  - EBS volumes encrypted by default
  - Use KMS for additional control
Performance
- Right-Size Instances: Start with recommended types
  # Check the protocol's package.json for recommendations
  jq '."aws-blockchain-node-runner".defaultInstanceTypes' node_modules/aws-bnr-blueprint-{protocol}/package.json
- Optimize Storage:
  - Use gp3 for cost-effective performance
  - Use io2 for high performance when you also need the data to persist
  - Use Instance Store if you need high performance and can tolerate its ephemeral nature
  - Monitor IOPS and throughput metrics
- Enable Snapshots: Significantly reduces sync time
  SNAPSHOT_ENABLED="true"
  SNAPSHOT_DOWNLOAD_URL="https://..."
  Large Snapshots: If the compressed archive plus extracted data exceeds available disk space (common with multi-TB snapshots on instance-store volumes), configure a staging volume to hold the archive during download:
  SNAPSHOT_STAGING_VOL_SIZE="5000" # Size in GiB, ~1.1x compressed archive size
  This creates a temporary gp3 EBS volume that is automatically deleted after extraction. See Snapshot Staging Guide for volume sizing guidance and cost analysis.
- Enable Traffic Shaping (RPC nodes only): Reduces data transfer costs by up to 85%
  TRAFFIC_SHAPING_ENABLED="true"
  TRAFFIC_SHAPING_RATE_MBIT="40"
  TRAFFIC_SHAPING_CHECK_INTERVAL_SEC="60"
  TRAFFIC_SHAPING_MAX_BLOCKS_BEHIND="10"
  Important: Only use on RPC nodes. Do not use on validator/consensus nodes. See Traffic Shaping Guide for detailed information and cost analysis.
- Monitor Performance: Use CloudWatch metrics
  - CPU utilization
  - Disk I/O
  - Network throughput
  - Protocol-specific metrics
  - Traffic shaping metrics (if enabled): c1_blocks_behind
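Two of the settings above involve simple arithmetic, and the sketch below covers both. All function names in it are hypothetical.

The first helper rounds a snapshot archive size up to a staging volume size using the ~1.1x rule of thumb. archive_size_bytes assumes the snapshot server reports Content-Length on a HEAD request, which not every server does.

The second helper bounds the data a node could send in a 30-day month at the TRAFFIC_SHAPING_RATE_MBIT cap. 40 Mbit/s is 5 MB/s, so the maximum is 5 × 2,592,000 s = 12,960,000 MB, about 12,960 GB. That bound is arithmetic on the cap only. The actual savings depend on your traffic, as described in the Traffic Shaping Guide.

```shell
# Hypothetical sizing helpers for the settings above.

# Round a byte count up to whole GiB (1 GiB = 1073741824 bytes).
bytes_to_gib_ceil() {
  echo $(( ($1 + 1073741823) / 1073741824 ))
}

# Suggest SNAPSHOT_STAGING_VOL_SIZE: ~1.1x the compressed archive size in GiB, rounded up.
staging_vol_size_gib() {
  echo $(( ($1 * 11 + 9) / 10 ))
}

# Archive size in bytes from the Content-Length header of a HEAD request (follows redirects).
# Prints an empty line if the server does not report a length.
archive_size_bytes() {
  curl -sIL "$1" | tr -d '\r' | awk 'tolower($1) == "content-length:" { n = $2 } END { print n }'
}

# Upper bound on monthly transfer (decimal GB) at a sustained rate cap, 30-day month:
# Mbit/s / 8 = MB/s, x 2,592,000 s, / 1000 = GB.
max_monthly_transfer_gb() {
  echo $(( $1 * 2592000 / 8 / 1000 ))
}
```

For a 4,545 GiB archive, staging_vol_size_gib 4545 prints 5000, which matches the SNAPSHOT_STAGING_VOL_SIZE example above.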
Cost Optimization
- Use Appropriate Instance Types:
  - Development: t3.medium, t3.large
  - Production: m6a.2xlarge, m6a.4xlarge
  - High-performance: i4i.2xlarge, i4i.4xlarge
- Optimize Storage:
  - Use gp3 instead of io1/io2 when possible
  - Right-size IOPS (don't over-provision)
- Use ARM Instances: Often 20% cheaper (the protocol client must support ARM64)
  INSTANCE_TYPE="m6g.2xlarge"
  CPU_TYPE="ARM_64"
- Schedule Non-Production: Stop instances when not needed
  # Use AWS Instance Scheduler or Lambda
- Monitor Costs: Set up billing alerts
  aws budgets create-budget \
    --account-id 123456789012 \
    --budget file://budget.json
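Two cost items above can be checked from the CLI.
- Architecture: rather than guessing from the name whether an instance type is ARM-based, ask describe-instance-types, which reports the supported architectures. cpu_type_for_arch (hypothetical) maps that value to CPU_TYPE. ARM_64 is the value used in this guide. X86_64 is an assumed value for x86 instances, so confirm it in the Configuration Reference.
- Budget file: write_budget_json (also hypothetical) creates the budget.json file that the create-budget command reads. It uses the BudgetName, BudgetLimit, TimeUnit and BudgetType fields of the AWS Budgets budget object.

```shell
# Hypothetical cost helpers; the names are illustrative only.

# Print the architecture(s) EC2 reports for an instance type, e.g. "arm64".
instance_arch() {
  # $1 = instance type, $2 = region
  aws ec2 describe-instance-types \
    --instance-types "$1" \
    --region "$2" \
    --query 'InstanceTypes[0].ProcessorInfo.SupportedArchitectures' \
    --output text
}

# Map an EC2 architecture string to CPU_TYPE. "ARM_64" is the value used in this
# guide; "X86_64" is an assumption, so check the Configuration Reference.
cpu_type_for_arch() {
  case "$1" in
    *arm64*) echo "ARM_64" ;;
    *x86_64*) echo "X86_64" ;;
    *) echo "unknown architecture: $1" >&2; return 1 ;;
  esac
}

# Write a minimal monthly cost budget for: aws budgets create-budget --budget file://...
write_budget_json() {
  # $1 = budget name, $2 = monthly limit in USD, $3 = output file
  cat > "$3" <<EOF
{
  "BudgetName": "$1",
  "BudgetLimit": { "Amount": "$2", "Unit": "USD" },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST"
}
EOF
}
```

For example, cpu_type_for_arch "$(instance_arch m6g.2xlarge us-east-1)" should print ARM_64. A budget on its own does not alert anyone. Notifications are attached with the --notifications-with-subscribers option of aws budgets create-budget.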
Reliability
- Use HA Mode for Production:
  DEPLOYMENT_MODE="ha-nodes"
  HA_NUMBER_OF_NODES="3"
- Configure Health Checks Properly:
  - Appropriate grace period for node initialization
  - Reasonable interval and timeout
  - Correct health check endpoint
- Set Up Monitoring:
  - CloudWatch dashboards
  - CloudWatch alarms
  - SNS notifications
- Implement Backup Strategy:
  - Keep .env configuration files backed up
  - Document deployment settings
  - Use blockchain snapshot downloads for data recovery
- Plan for Updates:
  - Test updates on testnet first
  - Use rolling updates for HA deployments
  - Have a rollback plan
Post-Deployment
Verify Deployment
- Check Stack Status:
  aws cloudformation describe-stacks \
    --stack-name YourStackName \
    --query 'Stacks[0].StackStatus'
- Get Outputs:
  aws cloudformation describe-stacks \
    --stack-name YourStackName \
    --query 'Stacks[0].Outputs'
- Connect to Instance (single-node):
  # Get instance ID from outputs
  export INSTANCE_ID=$(jq -r '..|.InstanceId? | select(. != null)' "$DEPLOY_FILE")
  echo "INSTANCE_ID=$INSTANCE_ID"
  aws ssm start-session --target "$INSTANCE_ID" --region "$AWS_REGION"
- Check Node Status:
  Option 1: View logs in CloudWatch (recommended):
  # View node service logs
  aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --filter-pattern "node.service"
  # View for specific instance
  export INSTANCE_ID=$(jq -r '..|.InstanceId? | select(. != null)' "$DEPLOY_FILE")
  aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names "$INSTANCE_ID" --filter-pattern "node.service"
  Option 2: Connect via SSM and run inside the session:
  # Check service status
  sudo systemctl status node
  # View logs directly
  sudo journalctl -u node -f
- Test RPC Endpoint:
  Note: By default, security groups restrict RPC access to within the VPC IP range. To test the endpoint:
  a. From within the VPC (recommended, via SSM Session Manager):
  # Get instance ID from deploy outputs
  export INSTANCE_ID=$(jq -r '..|.InstanceId? | select(. != null)' "$DEPLOY_FILE")
  # Connect to instance
  aws ssm start-session --target "$INSTANCE_ID" --region "$AWS_REGION"
  # Test locally (inside the session)
  curl http://localhost:8545
  b. From outside the VPC (requires security group modification):
  # Temporarily add your IP to the security group
  aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxx \
    --protocol tcp \
    --port 8545 \
    --cidr your-ip/32
  # Test from your machine
  curl http://instance-ip:8545 # Single-node
  curl http://alb-dns-name:8545 # HA (the ALB is internal by default; see HA_ALB_INTERNET_FACING)
  # Remove the rule after testing
  aws ec2 revoke-security-group-ingress \
    --group-id sg-xxxxx \
    --protocol tcp \
    --port 8545 \
    --cidr your-ip/32
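Two small helpers for the checks above, both hypothetical:
- stack_status_class turns the StackStatus string into a coarse verdict.
- ip_to_cidr32 validates your public IPv4 address before it is used as the --cidr value, so a typo does not open the port to an unintended range.

```shell
# Hypothetical post-deployment helpers; the names are illustrative only.

# Classify a CloudFormation stack status (e.g. from --query 'Stacks[0].StackStatus').
stack_status_class() {
  case "$1" in
    *ROLLBACK_COMPLETE|*FAILED) echo "failed-or-rolled-back" ;;
    *_IN_PROGRESS) echo "in-progress" ;;
    CREATE_COMPLETE|UPDATE_COMPLETE) echo "ok" ;;
    *) echo "other" ;;
  esac
}

# Validate a dotted IPv4 address and print it as the /32 CIDR used with
# authorize-security-group-ingress / revoke-security-group-ingress.
ip_to_cidr32() {
  if ! printf '%s' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "invalid IPv4 address: $1" >&2
    return 1
  fi
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  for octet in "$@"; do
    if [ "$octet" -gt 255 ]; then
      echo "invalid IPv4 address: $1.$2.$3.$4" >&2
      return 1
    fi
  done
  echo "$1.$2.$3.$4/32"
}
```

For example, set MY_CIDR=$(ip_to_cidr32 "$(curl -s https://checkip.amazonaws.com)") and pass --cidr "$MY_CIDR" to both the authorize and revoke commands. To block until stack creation finishes before classifying the status, first run aws cloudformation wait stack-create-complete --stack-name YourStackName.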
Configure Monitoring
- View CloudWatch Logs:
  Cloud-init output (deployment logs):
  # View deployment logs
  aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output --follow
  # View for specific instance
  export INSTANCE_ID=$(jq -r '..|.InstanceId? | select(. != null)' "$DEPLOY_FILE")
  aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output --follow --log-stream-names "$INSTANCE_ID"
  Systemd service logs (node.service, syncchecker.service, net-rules.service):
  Note: Ubuntu's rsyslog automatically forwards all systemd service logs to /var/log/syslog, which is collected by the CloudWatch agent.
  # View all systemd service logs
  aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow
  # View specific service logs for specific instance (each --follow command keeps streaming; run one at a time)
  export INSTANCE_ID=$(jq -r '..|.InstanceId? | select(. != null)' "$DEPLOY_FILE")
  aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names "$INSTANCE_ID" --filter-pattern "node.service"
  aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names "$INSTANCE_ID" --filter-pattern "syncchecker.service"
  aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names "$INSTANCE_ID" --filter-pattern "net-rules.service"
  # View all logs for specific instance (no service filter)
  aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names "$INSTANCE_ID"
  Note: All systemd service logs are available in CloudWatch Logs. You can also connect via SSM and use journalctl if needed.
- Set Up Alarms:
  # Alarm on one instance's CPU; add --alarm-actions with an SNS topic ARN to be notified
  aws cloudwatch put-metric-alarm \
    --alarm-name high-cpu \
    --alarm-description "Alert when CPU exceeds 80%" \
    --metric-name CPUUtilization \
    --namespace AWS/EC2 \
    --dimensions Name=InstanceId,Value="$INSTANCE_ID" \
    --statistic Average \
    --period 300 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2
Maintenance
Updates
- Update Node Version (requires instance replacement):
  # Update .env
  CLIENT_VERSION="v1.15.0"
  # Single-node: destroy the existing stack
  npx cdk destroy
  # Deploy new stack with updated version
  npx cdk deploy --json --outputs-file deploy-output-${STACK_NAME}.json
  Note: Version updates require instance replacement. For single-node deployments, this causes downtime. For HA deployments, skip the destroy step and run npx cdk deploy directly, so that instances are replaced as a rolling update (see Rolling Updates below).
- Update Configuration (non-instance changes):
  # Modify .env (e.g., HA health check settings)
  # Deploy changes
  npx cdk deploy --json --outputs-file deploy-output-${STACK_NAME}.json
  Note: Some configuration changes (like health check settings) can be updated without destroying the stack. Instance-level changes require replacement.
- Rolling Updates (HA only):
  - For HA deployments, instance replacements happen automatically as rolling updates
  - New instances launched with updated configuration
  - Health checks verify new instances are healthy
  - Old instances terminated after deregistration delay
  - No downtime during the update process
Scaling
- Vertical Scaling (change instance type; requires instance replacement):
  # Update .env
  INSTANCE_TYPE="m6a.4xlarge"
  # Single-node: destroy the existing stack
  npx cdk destroy
  # Deploy with new instance type
  npx cdk deploy --json --outputs-file deploy-output-${STACK_NAME}.json
  Note: Changing instance type requires instance replacement. For single-node, this causes downtime. For HA, skip the destroy step; the redeploy replaces instances as a rolling update, which minimizes downtime.
- Horizontal Scaling (HA only, no downtime):
  # Update .env
  HA_NUMBER_OF_NODES="5"
  # Deploy (no destroy needed)
  npx cdk deploy --json --outputs-file deploy-output-${STACK_NAME}.json
  Note: Horizontal scaling in HA mode does not require stack destruction and causes no downtime.
- Storage Scaling (live volume expansion):
  # Increase volume size in GiB (can be done live)
  aws ec2 modify-volume --volume-id vol-xxxxx --size 8000
  # Wait until the modification reaches the "optimizing" or "completed" state
  aws ec2 describe-volumes-modifications --volume-ids vol-xxxxx
  # Connect to instance
  export INSTANCE_ID=$(jq -r '..|.InstanceId? | select(. != null)' "$DEPLOY_FILE")
  aws ssm start-session --target "$INSTANCE_ID" --region "$AWS_REGION"
  # Inside the session: identify the data volume's device (on Nitro instances, EBS volumes appear as /dev/nvme*n1)
  lsblk
  # Extend the filesystem
  sudo resize2fs /dev/xvdg # For ext4 (use the device name shown by lsblk)
  # OR
  sudo xfs_growfs /data # For xfs (takes the mount point)
  Note: Storage can be expanded without destroying the stack or replacing instances.
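You can start extending the filesystem as soon as the volume modification enters the optimizing state; there is no need to wait for completed. Below is a polling sketch. The names wait_for_volume_modification and modification_ready are hypothetical, and the 30-second interval is an arbitrary choice.

```shell
# Hypothetical helpers for the storage-scaling steps above.

# The filesystem can be extended once the modification is "optimizing" or "completed".
modification_ready() {
  case "$1" in
    optimizing|completed) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll the modification state every 30 seconds.
wait_for_volume_modification() {
  # $1 = volume ID (vol-...), $2 = region
  while :; do
    state=$(aws ec2 describe-volumes-modifications \
      --volume-ids "$1" \
      --region "$2" \
      --query 'VolumesModifications[0].ModificationState' \
      --output text)
    echo "modification state: $state"
    if modification_ready "$state"; then
      return 0
    fi
    # Stop on failure, or if no modification was found at all
    if [ "$state" = "failed" ] || [ -z "$state" ] || [ "$state" = "None" ]; then
      return 1
    fi
    sleep 30
  done
}
```

For example, run wait_for_volume_modification vol-xxxxx "$AWS_REGION" && echo "ready to extend the filesystem".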
Backup and Recovery
Note: EBS snapshots are not recommended for blockchain nodes due to the large data size and slow lazy-loading performance. Instead, use blockchain-specific snapshot downloads from external sources (configured via SNAPSHOT_DOWNLOAD_URL).
For disaster recovery:
- Re-deploy from Configuration: Keep your .env file backed up
- Use Blockchain Snapshots: Download fresh blockchain data from trusted snapshot providers
- Document Configuration: Maintain documentation of your deployment settings
Monitoring and Alerting
- Regular Health Checks:
  - Review CloudWatch dashboards daily
  - Check alarm status
  - Review logs for errors
- Performance Monitoring:
  - Track sync status
  - Monitor resource utilization
  - Identify bottlenecks
- Cost Monitoring:
  - Review AWS Cost Explorer
  - Check for unexpected charges
  - Optimize resource usage
Destroying a Stack
To remove a deployed node and all associated AWS resources:
npx cdk destroy <stack-name>
The AI-driven workflow (@deploy) covers teardown as part of the session. Use the command above if you've exited the AI session and want to clean up manually.
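After cdk destroy completes, the local .env-{stack-name} and deploy-output-{stack-name}.json files still exist. They will keep appearing in ls deploy-output-*.json. Below is a hypothetical cleanup helper, archive_deployment_files. It moves the files into an archive directory instead of deleting them, so the configuration stays available for reference. The default directory name, destroyed, is an arbitrary choice.

```shell
# Hypothetical helper: archive a destroyed stack's local files.
archive_deployment_files() {
  # $1 = stack name, $2 = archive directory (default: destroyed)
  archive_dir=${2:-destroyed}
  mkdir -p "$archive_dir"
  for f in ".env-$1" "deploy-output-$1.json"; do
    if [ -e "$f" ]; then
      mv "$f" "$archive_dir/"
    fi
  done
}
```

For example: npx cdk destroy <stack-name> && archive_deployment_files <stack-name>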
Troubleshooting
See Troubleshooting Guide for detailed troubleshooting steps.
See Also
- Configuration Reference - Complete configuration documentation
- Troubleshooting - Common issues and solutions
- Snapshot Staging - Staging volume for large snapshot downloads
- Testing - Testing guide
- Adding New Protocols - Protocol addition guide
- Design Document - System architecture and design decisions