Troubleshooting Guide

This guide helps you diagnose and resolve common issues with the Universal Blockchain Node Runner.

CRITICAL: Identify the Correct Deployment

Before troubleshooting, always identify which deployment you're working with:

# List all deployments
ls deploy-output-*.json

# Example output:
# deploy-output-solana-mainnet-beta-agave-rpc-base.json
# deploy-output-solana-mainnet-beta-agave-rpc-extended.json
# deploy-output-ethereum-mainnet-archive.json

For GenAI tools: Always ask the user which deployment to troubleshoot if multiple files exist. Confirm the stack name and instance ID before proceeding.

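Extract deployment information. Several later examples read deploy-output.json directly; when your output file carries a stack name, substitute $DEPLOY_FILE in those commands. The SSM examples also expect AWS_REGION to be set in your shell: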

# Replace {stack-name} with the actual stack name from the filename
export DEPLOY_FILE="deploy-output-{stack-name}.json"
export INSTANCE_ID=$(cat $DEPLOY_FILE | jq -r '..|.InstanceId? | select(. != null)')
export STACK_NAME=$(cat $DEPLOY_FILE | jq -r 'keys[0]')

echo "Troubleshooting deployment: $STACK_NAME"
echo "Instance ID: $INSTANCE_ID"
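When several output files exist, the stack name can be read off the filename instead of typed by hand. The sketch below is one way to do that. The helper names `stack_from_file` and `pick_deploy_file` are illustrative and not part of the Node Runner tooling, and the sketch assumes the `deploy-output-{stack-name}.json` naming shown above matches the deployed stack name.

```shell
# Hypothetical helpers (not part of the Node Runner tooling).

# Print the stack name encoded in a deploy-output-{stack-name}.json filename.
stack_from_file() {
  _name=$(basename "$1" .json)
  printf '%s\n' "${_name#deploy-output-}"
}

# Print the single deployment file in a directory, or fail when there are
# zero or several, so that nothing is troubleshot by accident.
pick_deploy_file() {
  _dir=${1:-.}
  _count=0
  _found=
  for _f in "$_dir"/deploy-output-*.json; do
    [ -e "$_f" ] || continue        # the glob matched nothing
    _count=$((_count + 1))
    _found=$_f
  done
  if [ "$_count" -ne 1 ]; then
    echo "Expected exactly one deployment file in $_dir, found $_count; set DEPLOY_FILE explicitly." >&2
    return 1
  fi
  printf '%s\n' "$_found"
}
```

For example, `export DEPLOY_FILE=$(pick_deploy_file .)` followed by `stack_from_file "$DEPLOY_FILE"` prints the name to confirm with the user before going further.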

Quick Checks

Start with these quick diagnostic commands for common issues:

Deployment Failed:

# Use the correct stack name from deploy-output-{stack-name}.json
aws cloudformation describe-stack-events \
  --stack-name $STACK_NAME \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED`]'

Node Not Starting:

# View node service logs in CloudWatch for specific instance
aws logs tail /aws/ec2/blockchain-nodes/systemd-services \
  --follow \
  --log-stream-names $INSTANCE_ID \
  --filter-pattern "node.service"

Health Check Failing (HA deployments):

# Extract target group ARN from deployment output
export TG_ARN=$(cat $DEPLOY_FILE | jq -r '..|.TargetGroupArn? | select(. != null)')
aws elbv2 describe-target-health --target-group-arn $TG_ARN

For detailed troubleshooting, see the sections below.

- Quick Checks
- Configuration Issues
- Deployment Issues
- Node Operation Issues
- Networking Issues
- Storage Issues
- Monitoring Issues
- Performance Issues
- Traffic Shaping Issues
- Security Issues

Configuration Issues

Protocol Not Found

Symptom: Error message "Protocol 'xyz' not found" or "No installed dependency declares protocol 'xyz'"

Cause: The specified protocol doesn't have an installed blueprint package, or is misspelled

Solution:

Check available protocols (installed blueprint packages):

# List installed blueprint packages
node -e "
  const pkg = require('./package.json');
  Object.entries(pkg.dependencies || {}).forEach(([name, ver]) => {
    try {
      const bp = require(name + '/package.json');
      if (bp['aws-blockchain-node-runner']) {
        console.log(bp['aws-blockchain-node-runner'].BLOCKCHAIN_PROTOCOL + ' -> ' + name);
      }
    } catch(e) {}
  });
"

Verify BLOCKCHAIN_PROTOCOL in .env matches a protocol declared by an installed blueprint
Ensure protocol name is lowercase
If using an external blueprint, ensure it is listed in root package.json dependencies and npm install has been run

Example:

# Wrong
BLOCKCHAIN_PROTOCOL="Ethereum"

# Correct
BLOCKCHAIN_PROTOCOL="ethereum"

Invalid package.json blueprint configuration

Symptom: Error parsing protocol configuration

Cause: Malformed JSON or missing "aws-blockchain-node-runner" field in the blueprint's package.json (resolved from node_modules/)

Solution:

Validate JSON syntax of the installed blueprint:

cat node_modules/aws-bnr-blueprint-mychain/package.json | jq .

Check for:
- Missing commas
- Trailing commas
- Unquoted strings
- Mismatched brackets
- Missing "aws-blockchain-node-runner" field

Missing Required Environment Variables

Symptom: "Required environment variable X is not set"

Cause: .env file is missing required variables

Solution:

Check which variables are required:

# Required for all deployments
AWS_ACCOUNT_ID
AWS_REGION
BLOCKCHAIN_PROTOCOL
DEPLOYMENT_MODE
INSTANCE_TYPE
CPU_TYPE
BC_NETWORK
CLIENT_CONFIG
DATA_VOLUMES_COUNT

Copy from sample configuration:

cp node_modules/aws-bnr-blueprint-{protocol}/samples/.env-mainnet .env

Fill in your values
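A quick pre-flight check can list every missing variable at once instead of surfacing them one deploy at a time. This is a minimal sketch, not part of the Node Runner: the function name `check_required_vars` is made up for illustration, and it assumes a simple `NAME="value"` layout in .env.

```shell
# Hypothetical pre-flight check. The variable list is copied from the
# "Required for all deployments" list above.
REQUIRED_VARS="AWS_ACCOUNT_ID AWS_REGION BLOCKCHAIN_PROTOCOL DEPLOYMENT_MODE INSTANCE_TYPE CPU_TYPE BC_NETWORK CLIENT_CONFIG DATA_VOLUMES_COUNT"

# Report each required variable that is missing or empty in the env file.
# Returns non-zero when at least one is missing.
check_required_vars() {
  _env_file=${1:-.env}
  _missing=0
  for _var in $REQUIRED_VARS; do
    # Match NAME=value or NAME="value" where value has at least one character.
    if ! grep -Eq "^${_var}=\"?[^\"[:space:]#]" "$_env_file"; then
      echo "Missing or empty: $_var" >&2
      _missing=$((_missing + 1))
    fi
  done
  [ "$_missing" -eq 0 ]
}
```

Run it as `check_required_vars .env` before `npx cdk deploy`. HA-specific variables are not covered here.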

Invalid Storage Configuration

Symptom: "Storage configuration validation failed"

Cause: IOPS or throughput exceeds limits for volume type

Solution:

Check volume type limits:
- gp3: 3,000-80,000 IOPS, 125-2,000 MB/s throughput
- io1: 100-64,000 IOPS
- io2: 100-64,000 IOPS

Adjust values in .env:

DATA_VOL_1_IOPS="80000"  # Within new gp3 limit
DATA_VOL_1_THROUGHPUT="2000"  # Within new gp3 limit
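The gp3 ranges above can be checked locally before deploying. The following is a sketch under the assumption that only the absolute ranges listed in this guide matter. EBS also enforces ratios between volume size, IOPS, and throughput that this sketch does not model, and `check_gp3` is an illustrative name, not a Node Runner command.

```shell
# Hypothetical validator using the gp3 ranges listed above.
# Usage: check_gp3 <iops> <throughput_mb_s>
check_gp3() {
  _iops=$1
  _tput=$2
  _status=0
  if [ "$_iops" -lt 3000 ] || [ "$_iops" -gt 80000 ]; then
    echo "IOPS $_iops is outside 3000-80000" >&2
    _status=1
  fi
  if [ "$_tput" -lt 125 ] || [ "$_tput" -gt 2000 ]; then
    echo "Throughput $_tput MB/s is outside 125-2000" >&2
    _status=1
  fi
  return "$_status"
}
```

For example, `check_gp3 "$DATA_VOL_1_IOPS" "$DATA_VOL_1_THROUGHPUT"` after sourcing .env reports both problems at once if both values are out of range.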

HA Configuration Incomplete

Symptom: "HA configuration is incomplete"

Cause: DEPLOYMENT_MODE="ha-nodes" but HA variables not set

Solution:

Add all required HA variables:

HA_NUMBER_OF_NODES="3"
HA_ALB_HEALTHCHECK_PORT="8545"  # Use protocol's RPC port (8545 for Ethereum, 8899 for Solana)
HA_ALB_HEALTHCHECK_PATH="/health"
HA_ALB_HEALTHCHECK_GRACE_PERIOD_MIN="60"
HA_ALB_HEALTHCHECK_INTERVAL_SEC="30"
HA_ALB_HEALTHCHECK_TIMEOUT_SEC="5"
HA_ALB_HEALTHCHECK_HEALTHY_THRESHOLD="3"
HA_ALB_HEALTHCHECK_UNHEALTHY_THRESHOLD="2"
HA_NODES_HEARTBEAT_DELAY_MIN="10"
HA_ALB_DEREGISTRATION_DELAY_SEC="30"

Or use a sample HA configuration:

cp node_modules/aws-bnr-blueprint-{protocol}/samples/.env-ha .env

Deployment Issues

CDK Bootstrap Required

Symptom: "This stack uses assets, so the toolkit stack must be deployed"

Cause: CDK not bootstrapped in the account/region

Solution:

npx cdk bootstrap aws://ACCOUNT-ID/REGION

Example:

npx cdk bootstrap aws://123456789012/us-east-1

Insufficient IAM Permissions

Symptom: "User is not authorized to perform: iam:CreateRole"

Cause: AWS credentials lack necessary permissions

Solution:

Ensure your IAM user/role has permissions for:
- CloudFormation
- EC2
- IAM
- S3
- CloudWatch
- Auto Scaling (for HA)
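- Elastic Load Balancing (for HA)

Use AdministratorAccess for initial testing only.

Create a custom policy for production. The example below grants full access to each listed service and is only a starting point, so narrow the actions and resources to what your deployment needs. It does not cover the CloudWatch Logs (logs:*) or Systems Manager (ssm:*) commands used elsewhere in this guide: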

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudformation:*",
        "ec2:*",
        "iam:*",
        "s3:*",
        "cloudwatch:*",
        "autoscaling:*",
        "elasticloadbalancing:*"
      ],
      "Resource": "*"
    }
  ]
}

Region Mismatch Between .env and AWS Profile

Symptom: You expect deployment to one region but resources appear in another.

Cause: Earlier versions let CDK use the AWS CLI profile region (CDK_DEFAULT_REGION) instead of AWS_REGION from your .env file. The app now enforces the .env region at startup, so this should no longer occur.

Current behavior: AWS_REGION in your .env always determines the deployment region. If it differs from your CLI profile default, a note is printed at synth time:

Note: deploying to us-east-1 (from .env), AWS CLI profile default is us-west-2

If you're still seeing unexpected regions, verify AWS_REGION is correctly set in your .env file:

grep AWS_REGION .env

Stack Already Exists

Symptom: "Stack [name] already exists"

Cause: Attempting to deploy with same stack name. Stack names are automatically generated from BLOCKCHAIN_PROTOCOL, BC_NETWORK, and CLIENT_CONFIG (e.g., ethereum-mainnet-geth-1-14-0-lighthouse-2-5-1-full).

Solution:

Update existing stack:

npx cdk deploy --json --outputs-file deploy-output.json

Or destroy and redeploy:

npx cdk destroy
npx cdk deploy --json --outputs-file deploy-output.json

To deploy a different configuration alongside the existing one, change BC_NETWORK or CLIENT_CONFIG in your .env file to generate a unique stack name.

Resource Limit Exceeded

Symptom: "You have exceeded the limit for X"

Cause: AWS service limits reached

Solution:

Check current limits:

# The full quota name includes the instance families, so match on its prefix
aws service-quotas list-service-quotas \
  --service-code ec2 \
  --query "Quotas[?starts_with(QuotaName, 'Running On-Demand Standard')]"

Request limit increase via AWS Support
Or use different instance type/region

Instance Type Not Available in Availability Zone

Symptom: Deployment fails with an error indicating the requested instance type is not available in the selected availability zone (e.g., "Your requested instance type is not supported in your requested Availability Zone")

Cause: The automatically selected availability zone does not support the configured EC2 instance type. Not all instance types are available in every AZ within a region.

Solution:

Check which AZs support your instance type:

aws ec2 describe-instance-type-offerings \
  --location-type availability-zone \
  --filters Name=instance-type,Values=<type> \
  --region <region>

Replace <type> with your instance type (e.g., m6a.2xlarge) and <region> with your AWS region.

Set AWS_AZ in your .env file to an AZ from the output above:
```
AWS_AZ="us-east-1a"
```

Redeploy:

npx cdk deploy --json --outputs-file deploy-output.json

Notes:

AWS_AZ is only used for single-node deployments. HA deployments use the Auto Scaling Group's multi-AZ placement and ignore this setting.
The AZ must belong to the configured AWS_REGION (e.g., us-east-1a for region us-east-1).
See Configuration Reference for full details on the AWS_AZ variable.
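The region/AZ relationship in the notes above is easy to check before deploying. This sketch assumes standard Availability Zone names, which are the region name followed by one lowercase letter. Local Zone names follow a different pattern and are not handled. `az_in_region` is an illustrative helper name.

```shell
# Hypothetical check: succeed only when the AZ name is the region name
# followed by a single letter (for example us-east-1a in us-east-1).
# Usage: az_in_region <az> <region>
az_in_region() {
  case $1 in
    "$2"[a-z]) return 0 ;;
    *)
      echo "AWS_AZ=$1 does not belong to AWS_REGION=$2" >&2
      return 1
      ;;
  esac
}
```

After sourcing .env, `az_in_region "$AWS_AZ" "$AWS_REGION"` catches a mismatched pair before CloudFormation does.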

CloudFormation Rollback

Symptom: Stack creation failed and rolled back

Cause: Various - check CloudFormation events

Solution:

View stack events:

aws cloudformation describe-stack-events \
  --stack-name YourStackName \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED`]'

Check specific error messages
Fix configuration and redeploy
Common causes:
- Invalid instance type for region
- Insufficient capacity
- Security group rule conflicts
- IAM permission issues

Node Operation Issues

Node Not Starting

Symptom: Instance launches but node service fails to start

Diagnosis:

View node service logs in CloudWatch:

# View recent node.service logs
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --filter-pattern "node.service"

# View for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "node.service"

Check service status via CloudWatch Logs Insights:

Navigate to CloudWatch Logs Insights and use this query to check for service failures:

fields @timestamp, @message
| filter @message like /node.service/ and @message like /error|failed|fatal/i
| sort @timestamp desc
| limit 50

Check user data execution:

export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output --follow --log-stream-names $INSTANCE_ID

If CloudWatch logs are not available, connect via SSM:

export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Check service status
sudo systemctl status node

# View service logs
sudo journalctl -u node -n 100 --no-pager

Common Causes:

Missing Dependencies:

# View dependency errors in CloudWatch for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "node.service" | grep -i "error\|failed"

# Or connect via SSM to check
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION
dpkg -l | grep {package-name}

Insufficient Disk Space:

# Connect via SSM to check disk space
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION
df -h

Port Already in Use:

# View port conflict errors in CloudWatch for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "node.service" | grep -i "port\|address"

# Or connect via SSM to check
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION
sudo netstat -tulpn | grep {port}

Solution: Fix the specific issue and restart service:

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Restart service
sudo systemctl restart node

# Verify service started
sudo systemctl status node

Then verify in CloudWatch:

# Check for "Started" message for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "node.service" | grep -i "started"

Node Syncing Slowly

Symptom: Block height increasing very slowly

Diagnosis:

Check sync status:

# Protocol-specific command (example for Ethereum)
curl http://localhost:8545 -X POST \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'

# For Solana (port 8899)
curl http://localhost:8899 -X POST \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"getHealth","params":[],"id":1}'
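The responses from these two calls need a little interpretation: `eth_syncing` returns `false` when the node is not syncing and a progress object otherwise, while `getHealth` returns `"ok"` when the node is healthy and an error object otherwise. The helpers below are a rough sketch for classifying a response string. The function names are illustrative, and the matching uses grep rather than a real JSON parser.

```shell
# Hypothetical helpers for classifying JSON-RPC responses passed as a string.

# eth_syncing: "result": false means not syncing; an object means syncing.
eth_sync_state() {
  if printf '%s' "$1" | grep -Eq '"result"[[:space:]]*:[[:space:]]*false'; then
    echo "not-syncing"
  elif printf '%s' "$1" | grep -Eq '"result"[[:space:]]*:[[:space:]]*\{'; then
    echo "syncing"
  else
    echo "unknown"
  fi
}

# getHealth: "result": "ok" means healthy; anything else is treated as unhealthy.
solana_health_state() {
  if printf '%s' "$1" | grep -Eq '"result"[[:space:]]*:[[:space:]]*"ok"'; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}
```

Note that `not-syncing` does not by itself prove the node is at chain head. A node that has not yet found peers can report the same value, so compare the block height against a trusted source as well.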

Check CloudWatch Dashboard (single-node deployments):
- Review the "Volume Read/Write latency (ms/op)" widgets
- High latency (>10ms for reads, >5ms for writes) indicates storage bottleneck
- Check "Volume Read/Write (IO/sec)" for IOPS saturation
- Review "Disk Used (%)" to ensure sufficient free space

Check network connectivity:
```
ping -c 5 8.8.8.8
```

Check peer count:

# Protocol-specific command (example for Ethereum)
curl http://localhost:8545 -X POST \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}'

# For Solana (port 8899)
curl http://localhost:8899 -X POST \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"getClusterNodes","params":[],"id":1}'
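`net_peerCount` returns the count as a hex string (for example `"0x19"`), which is easy to misread. The helper below is a small sketch that extracts the value and prints it in decimal. The name `peer_count_from_response` is illustrative, and the sed expression assumes the usual single-line JSON-RPC response.

```shell
# Hypothetical helper: print the decimal peer count from a net_peerCount
# response passed as a string. Fails when no hex result is present.
peer_count_from_response() {
  _hex=$(printf '%s' "$1" |
    sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
  if [ -z "$_hex" ]; then
    echo "no hex result found in response" >&2
    return 1
  fi
  # Shell arithmetic accepts 0x-prefixed hexadecimal constants.
  echo "$(( _hex ))"
}
```

To use it, capture the curl output from the Ethereum example above in a variable and pass it as the argument.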

Solutions:

Enable Blockchain Snapshot: Significantly reduces sync time

SNAPSHOT_ENABLED="true"
SNAPSHOT_DOWNLOAD_URL="https://snapshots.example.com/latest.tar.gz"

Optimize Storage (if high latency detected):

a. Switch to io2 volumes for lower latency:
```
DATA_VOL_1_TYPE="io2"
DATA_VOL_1_IOPS="64000"
```
b. Or use Instance Store for lowest latency (data is ephemeral):
```
DATA_VOL_1_TYPE="instance-store"
# Note: Data is lost on instance stop/termination
# Requires instance types with instance store (i3, i4i, i4g, etc.)
```
Note: Storage type changes require stack destruction and redeployment.

Increase Instance Size: More CPU/memory for faster processing
```
INSTANCE_TYPE="m6a.4xlarge"  # Upgrade from 2xlarge
```

Check Peers: Ensure sufficient peer connections
- Verify security group allows P2P ports
- Verify network connectivity

Node Crashed

Symptom: Node service stopped unexpectedly

Diagnosis:

View crash logs in CloudWatch:

# View recent node.service errors for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "node.service" | grep -i "error\|failed\|stopped"

Check for OOM (Out of Memory) events:

# Connect via SSM to check system logs
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

sudo dmesg | grep -i "out of memory"
sudo journalctl -xe | grep -i "oom"

Check disk space:

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION
df -h

View detailed service logs:

# View the last 200 lines of node.service logs for the specific instance
# Uses --since instead of --follow so the command exits and tail can print
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --since 1h --log-stream-names $INSTANCE_ID --filter-pattern "node.service" | tail -200

Solutions:

Out of Memory: Increase instance size

INSTANCE_TYPE="m6a.4xlarge"  # More memory

Disk Full: Increase volume size

# Modify volume size
aws ec2 modify-volume --volume-id vol-xxxxx --size 4000

# Connect via SSM to extend filesystem
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION
sudo resize2fs /dev/xvdg

Corrupted Data: Restore from blockchain snapshot or resync

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

sudo systemctl stop node
sudo rm -rf /data/blockchain/chaindata
# Re-download blockchain snapshot or resync

Verify service restarted:

# Check CloudWatch logs for "Started" message for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "node.service" | grep -i "started"

Networking Issues

Cannot Connect to RPC Endpoint

Symptom: Connection refused when accessing RPC endpoint

Diagnosis:

Check if service is listening:

# Check for RPC port (varies by protocol)
sudo netstat -tulpn | grep LISTEN
# Ethereum: 8545, Solana: 8899, Bitcoin: 8332

Check security group rules:

aws ec2 describe-security-groups \
  --group-ids sg-xxxxx \
  --query 'SecurityGroups[0].IpPermissions'

Test locally on instance:

# Use protocol-specific RPC port
curl http://localhost:{rpc-port}
# Ethereum: 8545, Solana: 8899, Bitcoin: 8332

Solutions:

Service Not Running: Start the service
```
sudo systemctl start node
```
Security Group: Verify port is open
- Check requiredPorts in protocol's package.json "aws-blockchain-node-runner" field
- Ensure security group includes the port
- For testing, temporarily allow from your IP

Binding Address: Ensure RPC service binds to internal IP address (security best practice)

# Check configuration
cat /data/blockchain/config/* | grep -i "listen\|bind\|rpc"

# RPC should bind to internal IP for security
# Correct: listen_addr = "172.31.x.x:8545" (internal IP)
# Incorrect: listen_addr = "0.0.0.0:8545" (all interfaces - security risk)

# Get internal IP
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
EC2_INTERNAL_IP=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/meta-data/local-ipv4)
echo "Internal IP: $EC2_INTERNAL_IP"

# Update configuration to use internal IP
# Example (dots escaped so they match literally):
# sudo sed -i "s/0\.0\.0\.0:8545/$EC2_INTERNAL_IP:8545/g" /data/blockchain/config/config.toml

Security Note:

RPC endpoints: Should bind to internal IP (e.g., 172.31.x.x:8545)
P2P endpoints: Can bind to 0.0.0.0 (needs external connectivity)
Access control is managed via Security Groups, not binding addresses
Binding to internal IP provides defense-in-depth security

Health Check Failing (HA)

Symptom: ALB marks targets as unhealthy

Diagnosis:

Check target health:

aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:...

Test health check endpoint:

# Use protocol-specific health check port and path
curl http://instance-ip:{health-port}{health-path}
# Example: curl http://instance-ip:8545/health (Ethereum)
# Example: curl http://instance-ip:8899/health (Solana)

Check ALB logs (if enabled)

Solutions:

Wrong Health Check Path: Update configuration

HA_ALB_HEALTHCHECK_PATH="/health"  # Correct path

Node Not Ready: Increase grace period

HA_ALB_HEALTHCHECK_GRACE_PERIOD_MIN="90"  # More time to initialize

Health Check Too Strict: Adjust thresholds

HA_ALB_HEALTHCHECK_HEALTHY_THRESHOLD="2"  # Reduce from 3
HA_ALB_HEALTHCHECK_INTERVAL_SEC="60"  # Increase interval

Port Mismatch: Verify health check port matches protocol

# Set to protocol's RPC port
HA_ALB_HEALTHCHECK_PORT="8545"  # Ethereum
HA_ALB_HEALTHCHECK_PORT="8899"  # Solana
HA_ALB_HEALTHCHECK_PORT="8332"  # Bitcoin

Peer Connection Issues

Symptom: Node has no peers or very few peers

Diagnosis:

Check peer count:
```
# Protocol-specific command; see "Check peer count" under Node Syncing Slowly for Ethereum and Solana examples
```

Check P2P port accessibility:

# Check for P2P ports (varies by protocol)
sudo netstat -tulpn | grep LISTEN
# Ethereum: 30303, Solana: 8001-8020 range, Bitcoin: 8333

Verify security group allows P2P ports

Solutions:

Security Group: Ensure P2P ports are open
- Check both TCP and UDP
- Allow from 0.0.0.0/0 for P2P ports

Network Configuration: Check node configuration
- Verify external IP is correct
- Check NAT traversal settings

Storage Issues

Disk Full

Symptom: "No space left on device"

Diagnosis:

df -h
du -sh /data/* | sort -h

Solutions:

Increase Volume Size:

# Modify volume
aws ec2 modify-volume --volume-id vol-xxxxx --size 4000

# Wait for modification to complete (repeat until the state is "optimizing" or "completed")
aws ec2 describe-volumes-modifications --volume-ids vol-xxxxx

# Extend filesystem
sudo resize2fs /dev/xvdg  # For ext4
# OR
sudo xfs_growfs /data  # For xfs
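Picking the wrong grow command for the filesystem is a common slip. The sketch below maps a filesystem type to the matching command from the block above. It only prints the command rather than running it, and `grow_command` is an illustrative name, not a Node Runner tool.

```shell
# Hypothetical helper: print the grow command for a filesystem type.
# Usage: grow_command <fstype> <device> <mount-point>
grow_command() {
  _fstype=$1
  _device=$2
  _mount=$3
  case $_fstype in
    ext2|ext3|ext4) echo "sudo resize2fs $_device" ;;   # ext* grows by device
    xfs)            echo "sudo xfs_growfs $_mount" ;;   # xfs grows by mount point
    *)
      echo "unsupported filesystem: $_fstype" >&2
      return 1
      ;;
  esac
}
```

On the instance, the inputs can be read with `findmnt -no FSTYPE /data` and `findmnt -no SOURCE /data`, which also covers the case where the device name differs from /dev/xvdg.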

Clean Up Old Data:

# Protocol-specific cleanup commands
# Be careful - may require resync

Add Additional Volume:
- Update .env with new volume
- Redeploy stack
- Migrate data if needed

Disk Fills During Snapshot Download

Symptom: Disk fills to 100% during snapshot download or extraction, node never starts. CloudWatch logs show download stopping at ~60-70% or extraction failing with "No space left on device".

Cause: The compressed snapshot archive and extracted data both reside on the same /data volume. Peak disk usage = compressed_archive_size + extracted_data_size, which exceeds available space for large snapshots.
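The peak-usage formula above can be checked with simple arithmetic before deploying. This is a sketch with an illustrative function name, and all three sizes are inputs you must look up for your protocol (in whole GB).

```shell
# Hypothetical sizing check: peak usage = archive size + extracted size.
# Usage: snapshot_fits <volume_gb> <archive_gb> <extracted_gb>
snapshot_fits() {
  _peak=$(( $2 + $3 ))
  if [ "$_peak" -le "$1" ]; then
    echo "ok: peak ${_peak} GB fits in ${1} GB"
  else
    echo "too small: peak ${_peak} GB exceeds ${1} GB by $(( _peak - $1 )) GB"
    return 1
  fi
}
```

A failing result means either the data volume must grow by the reported amount or the archive has to be kept off the data volume.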

Diagnosis:

# Connect via SSM
export INSTANCE_ID=$(cat $DEPLOY_FILE | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Check disk usage
df -h /data

# Check if snapshot archive exists alongside extracted data
ls -lh /data/snapshot-archive 2>/dev/null || ls -lh /data/snapshot.tar.zst 2>/dev/null

Solution: Configure a snapshot staging volume to hold the compressed archive on a separate temporary EBS volume:

Destroy the failed stack:
```
npx cdk destroy
```

Add staging volume to .env (set to ~1.1x the compressed archive size):

# Example for Base mainnet op-reth (~4.86 TB archive)
SNAPSHOT_STAGING_VOL_SIZE="5000"

# Example for BNB mainnet bsc-reth (~9.7 TB archive)
SNAPSHOT_STAGING_VOL_SIZE="10000"

Redeploy:

npx cdk deploy --json --outputs-file deploy-output-$STACK_NAME.json

The staging volume is a temporary gp3 EBS volume that is automatically deleted after successful extraction. Cost is minimal (~$29 for a 5 TB volume over 2 days) compared to the cost of a failed deployment.

See Snapshot Staging Guide for detailed volume sizing guidance per protocol.

Orphaned Snapshot Staging Volume

Symptom: A gp3 EBS volume tagged Purpose=snapshot-staging remains in the account (and keeps incurring cost) after a deployment, even though the snapshot finished downloading.

Cause: The in-instance cleanup could not confirm the staging volume was deleted. Possible reasons include a missing ec2:DetachVolume/ec2:DeleteVolume permission, a stalled detach, an unreachable metadata service, or the volume ID being lost after a mid-download reboot. Cleanup now logs this failure instead of silently ignoring it.

Diagnosis:

# Look for the cleanup error in cloud-init-output for the instance
export INSTANCE_ID=$(cat $DEPLOY_FILE | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output \
  --log-stream-names $INSTANCE_ID \
  --filter-pattern "staging cleanup"

# List any staging volumes still present in the region
aws ec2 describe-volumes \
  --filters "Name=tag:Purpose,Values=snapshot-staging" \
  --query 'Volumes[].{Id:VolumeId,State:State,AZ:AvailabilityZone}' \
  --output table

Solution:

If the stack is still deployed, npx cdk destroy removes the volume via CloudFormation (RemovalPolicy.DESTROY).

If the volume is orphaned (its instance/stack is gone), delete it manually after confirming it is not in use:

aws ec2 detach-volume --volume-id vol-xxxxxxxx 2>/dev/null || true
aws ec2 wait volume-available --volume-ids vol-xxxxxxxx
aws ec2 delete-volume --volume-id vol-xxxxxxxx

If cleanup failed due to missing permissions, confirm the instance role grants ec2:DetachVolume and ec2:DeleteVolume (single-node) or the HA self-management actions, then redeploy.

To validate the staging cleanup lifecycle cheaply, use the dummy debug path documented in Snapshot Staging Guide and look for the STAGING DEBUG: PASS line in cloud-init-output.

Volume Not Mounting

Symptom: Volume exists but not mounted

Diagnosis:

lsblk
sudo blkid
mount | grep /data

Solutions:

Check /etc/fstab:
```
cat /etc/fstab
```
Mount Manually:
```
sudo mount /dev/xvdg /data
```

Check setup-storage.sh Logs:

sudo cat /var/log/cloud-init-output.log | grep -A 20 "setup-storage"

Verify Device Name:

# Device names may differ
lsblk
# Update mount command accordingly

Poor I/O Performance

Symptom: High disk latency, slow read/write

Diagnosis:

Check I/O metrics:
```
iostat -x 5
```
Check CloudWatch metrics:
- VolumeReadOps
- VolumeWriteOps
- VolumeThroughputPercentage
- VolumeQueueLength

Solutions:

Increase IOPS:

DATA_VOL_1_IOPS="80000"  # New gp3 maximum

Increase Throughput (gp3 only):

DATA_VOL_1_THROUGHPUT="2000"  # New gp3 maximum

Use io2 Volumes:

DATA_VOL_1_TYPE="io2"
DATA_VOL_1_IOPS="64000"

Use Instance Store (if available):

DATA_VOL_1_TYPE="instance-store"
# Note: Data is ephemeral

Verify Instance Store Volume Selection:

# List all NVMe devices
lsblk | grep nvme

# Check which volumes are mounted
df -h | grep nvme

# View instance store setup logs
sudo cat /var/log/cloud-init-output.log | grep -A 30 "setup-storage"

Monitoring Issues

CloudWatch Log Groups

The CloudWatch agent is configured to send the following logs to CloudWatch Logs:

| Log Group | Description | Retention | Source |
|---|---|---|---|
| `/aws/ec2/blockchain-nodes/cloud-init-output` | Cloud-init output | 7 days | `/var/log/cloud-init-output.log` |
| `/aws/ec2/blockchain-nodes/systemd-services` | Systemd service logs | 7 days | `/var/log/syslog` |

Note: Ubuntu's rsyslog automatically forwards all systemd service logs to /var/log/syslog, which is then collected by the CloudWatch agent.

Viewing Logs:

# View cloud-init output (most useful for troubleshooting deployment)
aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output --follow

# View systemd service logs (node.service, syncchecker.service, net-rules.service)
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow

# View logs for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output --follow --log-stream-names $INSTANCE_ID
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID

# Filter logs by service name for specific instance
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "node.service"
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "syncchecker.service"
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "net-rules.service"

CloudWatch Logs Insights Queries:

Use CloudWatch Logs Insights for advanced log analysis. Example query to check node.service errors:

# View node.service errors
fields @timestamp, @message
| filter @message like /node.service/ and @message like /error|failed|fatal/i
| sort @timestamp desc
| limit 50

Accessing Logs via Console:

Open the CloudWatch Logs Console
Navigate to log group: /aws/ec2/blockchain-nodes/systemd-services
Select the log stream for your instance (instance ID)
Use the filter box to search for specific services:
- node.service - Main blockchain node service
- syncchecker.service - Sync checker and traffic shaping control
- net-rules.service - Traffic shaping network rules
Click "Actions" → "View in Logs Insights" for advanced queries

Metrics Not Appearing

Symptom: CloudWatch dashboard shows no data

Diagnosis:

Check CloudWatch agent status:

sudo systemctl status amazon-cloudwatch-agent

Check agent logs:

sudo cat /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

Check agent logs in CloudWatch (if available):

export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "cloudwatch-agent"

Or check agent logs directly on instance:

export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

sudo cat /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

Verify which IAM identity is in use (run on the instance to confirm the instance role is attached):
```
aws sts get-caller-identity
```

Solutions:

Restart CloudWatch Agent:

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

sudo systemctl restart amazon-cloudwatch-agent

Check IAM Role: Ensure instance has CloudWatch permissions
- CloudWatchAgentServerPolicy
- Custom metrics permissions

Verify Configuration:

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

sudo cat /opt/aws/amazon-cloudwatch-agent/etc/custom-amazon-cloudwatch-agent.json

Check Region: Ensure metrics are sent to the same region you are viewing in the CloudWatch console

Dashboard Not Created

Symptom: CloudWatch dashboard doesn't exist after deployment

Diagnosis:

Check CloudFormation stack outputs
Check CDK synthesis output

Solutions:

Single-Node Only: Dashboards are created only for single-node deployments
- HA deployments don't include default dashboard
- Create custom dashboard for HA

Performance Issues

High CPU Usage

Symptom: CPU consistently above 80%

Diagnosis:

top
htop  # If installed

Solutions:

Increase Instance Size:

INSTANCE_TYPE="m6a.4xlarge"  # More vCPUs

Optimize Node Configuration:
- Reduce cache size
- Adjust thread count
- Disable unnecessary features

Check for Runaway Processes:
```
ps aux --sort=-%cpu | head -10
```

High Memory Usage

Symptom: Memory consistently above 80%, potential OOM

Diagnosis:

free -h
sudo dmesg | grep -i "out of memory"

Solutions:

Increase Instance Size:

INSTANCE_TYPE="m6a.4xlarge"  # More memory

Optimize Node Configuration:
- Reduce cache size
- Adjust memory limits
- Enable swap (temporary solution)

Add Swap (temporary only as it puts more pressure on storage):

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Slow Deployment

Symptom: CDK deployment takes very long

Causes:

Slow instance initialization
Snapshot download

Solutions:

Faster Instance: Use larger instance type temporarily
Optimize Snapshot: Use compressed snapshots
Parallel Deployment: Deploy multiple stacks in parallel (only if stacks deploy different protocols)

Traffic Shaping Issues

Traffic Shaping Not Working

Symptom: Traffic shaping enabled but bandwidth not limited

Diagnosis:

Check net-rules service status in CloudWatch:

# View net-rules.service logs for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "net-rules.service"

# Check for service start/stop events
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "net-rules.service" | grep -i "started\|stopped"

Check sync checker status in CloudWatch:

# View syncchecker.service logs for specific instance
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "syncchecker.service"

Verify traffic shaping configuration:

# Connect via SSM to check configuration
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session:
grep TRAFFIC_SHAPING /etc/cdk_environment
sudo systemctl status net-rules.service
sudo systemctl status syncchecker.timer

Solutions:

Service Not Running: Start the service

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session:
sudo systemctl start net-rules.service
sudo systemctl status net-rules.service

Then verify in CloudWatch:

aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "net-rules.service" | grep -i "started"

Sync Checker Not Running: Start the timer

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session:
sudo systemctl start syncchecker.timer
sudo systemctl status syncchecker.timer

Then verify in CloudWatch:

aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "syncchecker.service"

Node Not Fully Synced: Traffic shaping activates only once the node is fully synchronized

Check node sync status using protocol-specific commands
Wait for the initial sync to complete
Check the c1_blocks_behind metric in CloudWatch

View sync status in CloudWatch logs:

export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "syncchecker.service" | grep -i "blocks behind\|slots behind"

Configuration Error: Verify environment variables

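# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session: read the values from /etc/cdk_environment,
# because a new SSM shell may not have these variables exported

# Should show a value of true
grep TRAFFIC_SHAPING_ENABLED /etc/cdk_environment

# Should show the configured rate
grep TRAFFIC_SHAPING_RATE_MBIT /etc/cdk_environment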
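When only the value is needed (for example, to compare it against what is set in `.env`), the lookup can be wrapped in a small function. This is a sketch that assumes `/etc/cdk_environment` contains plain `KEY=VALUE` lines, optionally quoted or prefixed with `export`; that file format and the helper name `env_value` are assumptions, not confirmed by the blueprint.

```shell
# env_value is a hypothetical helper, not part of the Node Runner.
# $1 = variable name, $2 = path to a file with KEY=VALUE lines.
# Prints the last value assigned to the variable, with quotes removed.
env_value() {
  grep -E "^(export[[:space:]]+)?$1=" "$2" | tail -n 1 | cut -d= -f2- | tr -d "\"'"
}

# Example (run on the instance):
# env_value TRAFFIC_SHAPING_ENABLED /etc/cdk_environment
# env_value TRAFFIC_SHAPING_RATE_MBIT /etc/cdk_environment
```

An empty result means the variable is not set in the file, which points to a configuration or deployment problem rather than a service problem.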

Traffic Shaping Causing Sync Issues

Symptom: Node falling behind after traffic shaping enabled

Diagnosis:

Check blocks behind metric in CloudWatch:

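export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
# Note: `date -d` below is GNU date syntax; on macOS/BSD use `date -u -v-1H +%Y-%m-%dT%H:%M:%S` for the start time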
aws cloudwatch get-metric-statistics \
    --namespace CWAgent \
    --metric-name c1_blocks_behind \
    --dimensions Name=InstanceId,Value=$INSTANCE_ID \
    --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
    --period 60 \
    --statistics Average \
    --region $AWS_REGION
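The raw datapoints can be hard to read at a glance. One option is to add `--query 'Datapoints[].Average' --output text` to the command above and pipe the result into a small summarizer. The sketch below assumes that text output (whitespace-separated numbers); `summarize_blocks_behind` is a hypothetical helper name, and the threshold of 10 in the example is only an illustration.

```shell
# summarize_blocks_behind is a hypothetical helper, not part of the Node Runner.
# $1 = threshold; stdin = whitespace-separated metric averages.
# Prints the sample count, the maximum, and how many samples exceed the threshold.
summarize_blocks_behind() {
  awk -v limit="$1" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i !~ /^[0-9]+(\.[0-9]+)?$/) continue   # skip "None" and other non-numeric tokens
        v = $i + 0
        n++
        if (n == 1 || v > max) max = v
        if (v > limit) over++
      }
    }
    END {
      if (n == 0) { print "no datapoints"; exit }
      printf "samples=%d max=%s over_limit=%d\n", n, max, over
    }'
}

# Example:
# aws cloudwatch get-metric-statistics \
#     --namespace CWAgent --metric-name c1_blocks_behind \
#     --dimensions Name=InstanceId,Value=$INSTANCE_ID \
#     --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
#     --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
#     --period 60 --statistics Average --region $AWS_REGION \
#     --query 'Datapoints[].Average' --output text | summarize_blocks_behind 10
```

A result of "no datapoints" suggests the metric is not being published at all, which is a different problem from the node falling behind.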

Check if traffic shaping is active in CloudWatch:

# View net-rules service status for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "net-rules.service"

Check sync checker logs in CloudWatch:

# View sync checker activity for specific instance
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "syncchecker.service"

Solutions:

Rate Too Low: Increase bandwidth limit

# Update .env
TRAFFIC_SHAPING_RATE_MBIT="50"  # Increase from 40

# Redeploy
npx cdk deploy --json --outputs-file deploy-output.json

Threshold Too High: Reduce max blocks behind threshold

# Update .env
TRAFFIC_SHAPING_MAX_BLOCKS_BEHIND="5"  # Reduce from 10

# Redeploy
npx cdk deploy --json --outputs-file deploy-output.json

Disable Traffic Shaping: If issues persist

# Update .env
TRAFFIC_SHAPING_ENABLED="false"

# Redeploy
npx cdk deploy --json --outputs-file deploy-output.json

Manual Override: Temporarily disable traffic shaping

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Stop traffic shaping
sudo systemctl stop net-rules.service

# Stop sync checker
sudo systemctl stop syncchecker.timer

Then verify in CloudWatch:

aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "net-rules.service" | grep -i "stopped"

Traffic Shaping Metrics Not Appearing

Symptom: c1_blocks_behind metric not showing in CloudWatch

Diagnosis:

Check sync checker logs in CloudWatch:

# View syncchecker.service logs for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "syncchecker.service"

Check CloudWatch agent status:

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session:
sudo systemctl status amazon-cloudwatch-agent

Verify IAM permissions:

# Run on the instance: shows which IAM role the instance is using.
# This confirms the identity only; the role itself must allow cloudwatch:PutMetricData.
aws sts get-caller-identity
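`aws sts get-caller-identity` returns an assumed-role ARN, while IAM policy checks expect the role ARN. The sketch below converts one into the other so that the result can be passed to `aws iam simulate-principal-policy`. `assumed_role_to_role_arn` is a hypothetical helper name; the conversion drops any IAM path, so it is only correct for roles created without a path, and the caller needs permission to run the simulation.

```shell
# assumed_role_to_role_arn is a hypothetical helper, not part of the Node Runner.
# Converts arn:<partition>:sts::<account>:assumed-role/<role>/<session>
# into     arn:<partition>:iam::<account>:role/<role>
assumed_role_to_role_arn() {
  printf '%s\n' "$1" |
    sed -E 's#^arn:([^:]+):sts::([0-9]+):assumed-role/([^/]+)/.*$#arn:\1:iam::\2:role/\3#'
}

# Example:
# STS_ARN=$(aws sts get-caller-identity --query Arn --output text)   # run on the instance
# aws iam simulate-principal-policy \
#     --policy-source-arn "$(assumed_role_to_role_arn "$STS_ARN")" \
#     --action-names cloudwatch:PutMetricData \
#     --query 'EvaluationResults[].EvalDecision' --output text
```

A decision of `allowed` indicates that the role's identity policies permit the action; any other value points to a missing or denied permission.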

Solutions:

Sync Checker Not Running: Start the service

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session:
sudo systemctl start syncchecker.timer
sudo systemctl status syncchecker.timer

Then verify in CloudWatch:

aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "syncchecker.service"

CloudWatch Agent Issue: Restart the agent

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session:
sudo systemctl restart amazon-cloudwatch-agent

Script Error: Check for errors in sync checker

# View errors in CloudWatch for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "syncchecker.service" | grep -i "error\|failed"

# Or run manually via SSM to see errors
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session:
sudo /opt/blueprints/user-data/syncchecker.sh

Node Not Ready: The sync checker runs only after the initial sync

Check for the /data/data/init-completed file
Wait for the node to complete initial synchronization

View initialization progress in CloudWatch:

export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output --follow --log-stream-names $INSTANCE_ID | grep -i "init-completed"

Traffic Shaping Scripts Missing

Symptom: Traffic shaping scripts not found on instance

Diagnosis:

Check if scripts exist:

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session:
ls -la /opt/network/
ls -la /opt/common/network/

Check asset download in CloudWatch:

# View cloud-init logs for asset download for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output --follow --log-stream-names $INSTANCE_ID | grep -i "traffic shaping\|network"

Solutions:

Assets Not Downloaded: Check asset download

# Check cloud-init logs in CloudWatch for specific instance
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output --follow --log-stream-names $INSTANCE_ID | grep -A 10 "traffic shaping"

Redeploy Stack: If assets missing

npx cdk destroy
npx cdk deploy --json --outputs-file deploy-output.json

Manual Copy: Temporarily copy scripts

# Connect via SSM
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# If common assets exist but not copied
sudo mkdir -p /opt/network
sudo cp /opt/common/network/*.sh /opt/network/
sudo chmod +x /opt/network/*.sh
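The manual copy above fails with a confusing message when the source directory is empty. A slightly more defensive variant can check for scripts first. This is a sketch only; `copy_network_scripts` is a hypothetical helper name, and on the instance it would need to run as root to write to /opt/network.

```shell
# copy_network_scripts is a hypothetical helper, not part of the Node Runner.
# $1 = source directory, $2 = destination directory.
# Copies *.sh from source to destination and marks them executable.
copy_network_scripts() {
  src=$1
  dst=$2
  set -- "$src"/*.sh                 # expand the glob into the positional parameters
  if [ ! -e "$1" ]; then             # an unmatched glob does not name an existing file
    echo "no .sh files found in $src" >&2
    return 1
  fi
  mkdir -p "$dst" && cp "$@" "$dst"/ && chmod +x "$dst"/*.sh
}

# Example (as root on the instance):
# copy_network_scripts /opt/common/network /opt/network
```

If the function reports that no scripts were found, the common assets were not downloaded either, and redeploying the stack is the more reliable fix.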

Security Issues

Cannot Access Instance

Symptom: Cannot connect via SSM Session Manager

Diagnosis:

Verify that the instance's IAM role has SSM permissions
Check VPC endpoints (if using private subnets)

Solutions:

Check IAM Role: Ensure the AmazonSSMManagedInstanceCore policy is attached to the instance role
VPC Endpoints: Create the SSM interface endpoints (ssm, ssmmessages, and typically ec2messages) for private subnets
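One way to narrow down the cause is to ask Systems Manager whether the instance has registered at all. The sketch below interprets the `PingStatus` value returned by `aws ssm describe-instance-information`; `explain_ping_status` is a hypothetical helper name, and the suggested next steps are general guidance rather than blueprint-specific behavior.

```shell
# explain_ping_status is a hypothetical helper, not part of the Node Runner.
# $1 = PingStatus from `aws ssm describe-instance-information`
#      ("None" when the instance is not registered and --output text is used).
explain_ping_status() {
  case "$1" in
    Online)
      echo "registered: check the caller's IAM permissions and the Session Manager plugin" ;;
    ConnectionLost|Inactive)
      echo "not reporting: check the network path, VPC endpoints, and SSM Agent status" ;;
    None|"")
      echo "not registered: check the instance role and VPC endpoints" ;;
    *)
      echo "unexpected status: $1" ;;
  esac
}

# Example:
# STATUS=$(aws ssm describe-instance-information \
#     --filters "Key=InstanceIds,Values=$INSTANCE_ID" \
#     --query 'InstanceInformationList[0].PingStatus' \
#     --output text --region $AWS_REGION)
# explain_ping_status "$STATUS"
```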

Secrets Not Accessible

Symptom: Cannot retrieve secrets from Secrets Manager

Diagnosis:

aws secretsmanager get-secret-value --secret-id my-secret

Solutions:

Verify Secret ARN: Ensure the ARN in the configuration is correct
Check Region: The secret must be in the same region as the deployment

Check Secret Exists: Verify the secret was created

aws secretsmanager describe-secret --secret-id my-secret

Test from Instance: Connect to instance and test access

export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION

# Inside the session (set AWS_REGION there if it is not already defined):
aws secretsmanager get-secret-value --secret-id my-secret --region $AWS_REGION

Note: The default IAM role includes secretsmanager:GetSecretValue and secretsmanager:DescribeSecret permissions for all secrets. If you need to restrict access to specific secrets, you can modify the IAM role after deployment.

Getting Additional Help

Collect Diagnostic Information

Before requesting help, collect:

Configuration:

grep -viE 'SECRET|PASSWORD' .env > .env-support  # Drop lines that look sensitive (case-insensitive)
# Inspect the installed blueprint's package.json
jq '."aws-blockchain-node-runner"' node_modules/aws-bnr-blueprint-{protocol}/package.json > protocol-config-support.json
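Dropping whole lines hides which sensitive variables were set. An alternative is to keep the variable names and mask only the values. This is a sketch; `redact_env` is a hypothetical helper name, and the extra `TOKEN` and `KEY` patterns are assumptions about what counts as sensitive, so review the output before sharing it.

```shell
# redact_env is a hypothetical helper, not part of the Node Runner.
# Reads KEY=VALUE lines on stdin and masks the value of any variable whose
# name contains SECRET, PASSWORD, TOKEN, or KEY (case-insensitive).
redact_env() {
  awk -F= '
    /^[[:space:]]*#/ || index($0, "=") == 0 { print; next }   # keep comments and non-assignments
    toupper($1) ~ /SECRET|PASSWORD|TOKEN|KEY/ {               # variable name looks sensitive
      print $1 "=<redacted>"; next
    }
    { print }'
}

# Example:
# redact_env < .env > .env-support
```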

Logs (run on the instance via SSM):

sudo cat /var/log/cloud-init-output.log
sudo journalctl -u node -n 200

System Info:

```
uname -a
df -h
free -h
```

CloudFormation Events:

aws cloudformation describe-stack-events --stack-name $STACK_NAME  # Stack name taken from deploy-output-{stack-name}.json

Support Channels

GitHub Issues: Report bugs and request features
Documentation: Check docs/ for guides
AWS Support: For AWS-specific issues

Useful Commands Reference

# CDK Commands
npx cdk synth                    # Synthesize CloudFormation template
npx cdk deploy --json --outputs-file deploy-output.json  # Deploy stack
npx cdk destroy                  # Destroy stack
npx cdk diff                     # Show differences

# Get Instance ID from deployment outputs
export INSTANCE_ID=$(cat deploy-output.json | jq -r '..|.InstanceId? | select(. != null)')
echo "INSTANCE_ID=$INSTANCE_ID"

# AWS CLI Commands
aws ssm start-session --target $INSTANCE_ID --region $AWS_REGION  # Connect to instance
aws logs tail /aws/ec2/blockchain-nodes/cloud-init-output --follow --log-stream-names $INSTANCE_ID  # View deployment logs
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID  # View systemd service logs
aws cloudformation describe-stacks --stack-name YourStack  # Stack info

# CloudWatch Logs Commands
# View specific service logs for specific instance
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "node.service"
# Check on Ethereum execution client like Geth, Reth, Erigon, or Hyperledger Besu
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "execution"
# Check on Ethereum consensus client like Lighthouse, Prysm, or Teku
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "consensus"
# Check on the sync checker
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "syncchecker.service"
# Check on traffic shaping (net-rules)
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID --filter-pattern "net-rules.service"

# View logs for specific instance (alternative without filter)
aws logs tail /aws/ec2/blockchain-nodes/systemd-services --follow --log-stream-names $INSTANCE_ID

# Instance Commands (via SSM)
sudo systemctl status node  # Check service status
sudo journalctl -u node -f  # Follow service logs (if CloudWatch not available)
df -h                           # Disk usage
free -h                         # Memory usage
top                             # Process monitor

Prevention Best Practices

Use Sample Configurations: Start with provided sample .env files
Monitor from Day One: Set up CloudWatch alarms immediately
Document Changes: Keep notes on configuration changes
Stay Updated: Keep protocol clients and CDK updated
Review Logs: Regularly check logs for warnings
Capacity Planning: Monitor growth and plan for scaling
