
This page is generated from skills/eks-best-practices/references/container-registry.md. Edit the source, not this page.

Container Registry Best Practices

Part of: eks-best-practices
Purpose: ECR architecture, operating models, image promotion, vulnerability scanning, base image curation, lifecycle policies, pull-through cache, managed signing, archival storage, and registry configuration for Amazon EKS


Table of Contents

  1. ECR Architecture
  2. Operating Models
  3. Image Promotion Pipeline
  4. Vulnerability Scanning
  5. Base Image Curation
  6. ECR Lifecycle Policies
  7. Pull-Through Cache
  8. Repository Creation Templates
  9. Managed Signing
  10. Archival Storage Class
  11. Registry Configuration
  12. Helm Chart Management

ECR Architecture

Private vs Public Repositories

| Type | Use Case | Access |
| --- | --- | --- |
| ECR Private | Internal application images, base images | IAM-authenticated, VPC endpoint supported |
| ECR Public | Open-source projects, shared libraries | Public read, authenticated write |

Repository Naming Conventions

Use a consistent naming pattern that encodes ownership and purpose:

| Pattern | Example | Use When |
| --- | --- | --- |
| <team>/<app> | platform/nginx-base, team-a/api-service | Multi-team, clear ownership |
| <env>/<app> | prod/api-service, dev/api-service | Environment-separated registries |
| <app> (flat) | api-service, web-frontend | Small team, few images |

Cross-Account Access

| Pattern | Mechanism | Use When |
| --- | --- | --- |
| Resource-based policy | ECR repository policy allows cross-account pull | Centralized registry, multiple consumer accounts |
| ECR replication | Automatic replication to target account/region | Each account needs its own copy |
| IAM role assumption | Consumer assumes role in registry account | Fine-grained access control |
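
As a rough illustration of the resource-based policy pattern, the sketch below builds a repository policy document that grants pull-only access to a list of consumer accounts. The function name and the account IDs are placeholders chosen for this example; the three actions are the read actions an image pull needs. Treat it as a starting point to adapt, not a vetted policy.

```python
import json


def cross_account_pull_policy(consumer_account_ids):
    """Build an ECR repository policy that lets other accounts pull images.

    consumer_account_ids: iterable of 12-digit AWS account ID strings
    (placeholders in this sketch).
    """
    principals = [f"arn:aws:iam::{acct}:root" for acct in consumer_account_ids]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPull",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                # Read-only actions used during an image pull; no push actions.
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account IDs, not real accounts.
    policy = cross_account_pull_policy(["111111111111", "222222222222"])
    print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document you would attach to the repository in the registry account; consumers still need their own IAM permissions to call ECR.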

VPC Endpoints for ECR

For private clusters or security-sensitive environments, configure VPC endpoints to avoid routing image pulls through the internet:

| Endpoint | Type | Required For |
| --- | --- | --- |
| com.amazonaws.<region>.ecr.api | Interface | ECR API calls (auth, describe) |
| com.amazonaws.<region>.ecr.dkr | Interface | Docker image pull/push |
| com.amazonaws.<region>.s3 | Gateway | Image layer storage (S3-backed) |
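
A small preflight check can catch a missing endpoint before a private cluster fails to pull. The sketch below expands the three service names from the table for a given region and reports which ones are absent from a list you supply; both function names are made up for this example, and how you obtain the list of existing endpoints is left out.

```python
def required_ecr_endpoints(region):
    """Return {service_name: endpoint_type} for the three endpoints in the table."""
    return {
        f"com.amazonaws.{region}.ecr.api": "Interface",
        f"com.amazonaws.{region}.ecr.dkr": "Interface",
        f"com.amazonaws.{region}.s3": "Gateway",
    }


def missing_endpoints(region, existing_service_names):
    """List required endpoint services that are not in the existing set."""
    existing = set(existing_service_names)
    return [name for name in required_ecr_endpoints(region) if name not in existing]


if __name__ == "__main__":
    # Example: only the API endpoint has been created so far.
    print(missing_endpoints("us-east-1", ["com.amazonaws.us-east-1.ecr.api"]))
```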

Operating Models

| Factor | Centralized ECR | Tenant-Managed ECR | Enterprise Registry (Artifactory/Harbor) |
| --- | --- | --- | --- |
| Registry location | Single shared AWS account | Each team's own account | Self-hosted or SaaS |
| Who manages | Platform team | Individual teams | Platform/security team |
| Access control | Repository policies + IAM | Per-account IAM | Registry-native RBAC |
| Image promotion | Cross-account replication or re-tag | Push to own registry | Promotion rules in registry |
| Scanning | Centralized Inspector config | Per-account Inspector | Registry-native scanning |
| Best for | Small-medium orgs, single account | Large orgs, strict isolation | Existing enterprise investment, multi-cloud |

When to Use Each

| Scenario | Recommendation |
| --- | --- |
| Single AWS account, <10 teams | Centralized ECR |
| Multi-account with Control Tower | Centralized ECR in shared services account + cross-account pull |
| Regulatory requirement for team isolation | Tenant-managed ECR |
| Multi-cloud or hybrid | Enterprise registry (Artifactory/Harbor) |
| Air-gapped environment | ECR with pull-through cache or Harbor |

Image Promotion Pipeline

Promotion Flow

| Stage | Registry/Tag | Gate | Who Promotes |
| --- | --- | --- | --- |
| Build | dev/<app>:git-sha | CI passes (unit tests, lint, scan) | CI pipeline (automatic) |
| Staging | staging/<app>:git-sha | Integration tests pass, scan clean | CI pipeline (automatic) |
| Production | prod/<app>:git-sha | Approval gate, load test pass | Release pipeline (manual approval) |

Tag Strategy

| Strategy | Example | Pros | Cons |
| --- | --- | --- | --- |
| Git SHA | api:a1b2c3d | Immutable, traceable to commit | Not human-readable |
| Semantic version | api:1.2.3 | Human-readable, follows convention | Must enforce immutability |
| Git SHA + semver | api:1.2.3-a1b2c3d | Best of both | Longer tag |
| latest | api:latest | Convenient | Mutable -- never use in production |

Promotion Methods

| Method | How It Works | Best For |
| --- | --- | --- |
| Re-tag | Add production tag to existing image digest | Same account, fastest |
| Cross-account replication | ECR replicates image to target account | Multi-account, automatic |
| CI pipeline copy | Pipeline pushes image to production registry | Full control, audit trail |

DO:

  • Use immutable tags (Git SHA or semver) -- never latest in production
  • Enable immutable tag setting on ECR repositories to prevent overwrites
  • Include image digest (@sha256:...) in production deployments for guaranteed immutability

DON'T:

  • Use latest tag in production -- it's mutable and non-deterministic
  • Rebuild images for promotion -- re-tag or replicate the exact same digest
  • Skip scanning between promotion stages
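
To make the tag and digest rules above concrete, here is a minimal sketch of a helper that a release pipeline might use to build the image reference written into a production manifest. It rejects the latest tag and requires a well-formed sha256 digest. The function name and the example registry host are hypothetical; the tag@digest reference form itself is standard for container runtimes, which resolve the image by digest.

```python
import re

# A sha256 digest is the literal prefix plus 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def production_image_ref(registry, repository, tag, digest):
    """Return a digest-pinned reference such as registry/repo:tag@sha256:..."""
    if tag == "latest":
        raise ValueError("'latest' is mutable and must not be promoted to production")
    if not _DIGEST_RE.match(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    # The tag stays for readability; the digest is what pins the content.
    return f"{registry}/{repository}:{tag}@{digest}"


if __name__ == "__main__":
    example_digest = "sha256:" + "ab" * 32  # placeholder digest for illustration
    print(production_image_ref("registry.example", "prod/api-service", "1.2.3", example_digest))
```

Because the same digest is carried from dev to staging to production, the helper also acts as a cheap check that promotion did not rebuild the image.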

Vulnerability Scanning

ECR Scanning Options

| Feature | Basic Scanning | Enhanced Scanning (Inspector) |
| --- | --- | --- |
| Engine | AWS-native scanner (formerly open-source Clair) | Amazon Inspector |
| Coverage | OS packages only | OS + programming language libraries |
| Trigger | On push or manual | On push and continuous (re-scans on new CVE disclosure) |
| Findings | ECR console and API | Security Hub + EventBridge |
| Cost | Free | Per-image pricing |
| Limitation | -- | Cannot scan archived images (must restore first) |
| Recommendation | Development only | Production |

Severity Gating

| Severity | CI Pipeline Action | Production Deploy |
| --- | --- | --- |
| Critical | Block build | Block deploy |
| High | Block build (configurable) | Block deploy |
| Medium | Warn | Allow with exception |
| Low | Log only | Allow |
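
The gating table can be expressed as a small lookup. The sketch below takes a mapping of severity to finding count (the shape is an assumption; adapt it to whatever your scanner emits) and returns the strictest action that applies for the CI or deploy stage. The action strings are labels invented for this example.

```python
# Ordered from most to least severe, mirroring the table rows.
GATES = {
    "ci": [
        ("CRITICAL", "block"),
        ("HIGH", "block"),  # configurable in the table; blocked by default here
        ("MEDIUM", "warn"),
        ("LOW", "log"),
    ],
    "deploy": [
        ("CRITICAL", "block"),
        ("HIGH", "block"),
        ("MEDIUM", "allow-with-exception"),
        ("LOW", "allow"),
    ],
}


def gate_action(severity_counts, stage):
    """Return the action for the most severe finding present, or 'allow' if none."""
    for severity, action in GATES[stage]:
        if severity_counts.get(severity, 0) > 0:
            return action
    return "allow"


if __name__ == "__main__":
    print(gate_action({"HIGH": 1, "LOW": 5}, "ci"))
    print(gate_action({"MEDIUM": 2}, "deploy"))
```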

Integration with Security Hub

Enhanced scanning findings are automatically sent to Security Hub, providing centralized visibility across all accounts. Configure Security Hub automations to:

  • Notify teams of critical findings via SNS
  • Create Jira/ServiceNow tickets for high findings
  • Track remediation SLAs

DO:

  • Enable enhanced scanning (Inspector) for production repositories
  • Set up continuous scanning -- new CVEs are disclosed daily
  • Gate CI/CD pipelines on scan results -- block critical/high before push
  • Integrate with Security Hub for centralized finding management

DON'T:

  • Rely on basic scanning for production -- it misses language-level vulnerabilities
  • Scan only at push time -- images become vulnerable as new CVEs are disclosed
  • Ignore medium-severity findings indefinitely -- track and remediate on a schedule

Base Image Curation

Why Curate Base Images

Using uncurated public images introduces risk: unknown vulnerabilities, unnecessary packages (shells, curl, build tools), and inconsistent patching. A curated base image pipeline provides a controlled, scanned, and patched foundation for all application images.

Minimal Base Image Options

| Image | Size | Shell | Package Manager | Best For |
| --- | --- | --- | --- | --- |
| Distroless (Google) | ~2-20 MB | No | No | Production -- minimal attack surface |
| Alpine | ~5 MB | Yes (ash) | apk | Small images, need shell for debugging |
| AL2023-minimal | ~30 MB | Yes (bash) | dnf | AWS-native, Graviton-optimized |
| Ubuntu minimal | ~30 MB | Yes (bash) | apt | Broad compatibility |
| Scratch | 0 MB | No | No | Static binaries (Go, Rust) |

Base Image Pipeline

| Step | Action | Tool |
| --- | --- | --- |
| 1 | Pull upstream base image | CI pipeline |
| 2 | Scan for vulnerabilities | Amazon Inspector / Trivy |
| 3 | Apply security patches | Dockerfile RUN dnf update |
| 4 | Re-scan patched image | Amazon Inspector / Trivy |
| 5 | Push to internal ECR | CI pipeline |
| 6 | Tag as approved base | Semantic version + approved tag |
| 7 | Notify teams of new base | EventBridge + SNS |

Multi-Architecture Images

For Graviton (arm64) support, build multi-arch images using Docker buildx or CI pipeline matrix builds:

| Architecture | Instance Types | Notes |
| --- | --- | --- |
| amd64 | m6i, c6i, r6i | Default, broadest compatibility |
| arm64 | m7g, c7g, r7g (Graviton) | 20-40% cost savings |
| Multi-arch manifest | Both | Single tag works on both architectures |

DO:

  • Maintain a curated set of approved base images in a dedicated ECR repository
  • Rebuild base images weekly to pick up security patches
  • Use multi-stage builds to exclude build tools from final images
  • Build multi-arch images if using Graviton

DON'T:

  • Pull base images directly from Docker Hub in production -- use pull-through cache or internal copies
  • Include shells, curl, or package managers in production images unless required
  • Skip scanning base images -- they're the foundation of your security posture

ECR Lifecycle Policies

Lifecycle policies automatically clean up old or untagged images, reducing storage costs and keeping repositories manageable.

| Rule | Scope | Action | Purpose |
| --- | --- | --- | --- |
| Remove untagged images | Untagged | Expire after 1 day | Clean up failed builds |
| Retain N recent tagged | Tagged | Keep last 30 images | Rollback capability |
| Expire old images | Tagged | Expire images older than 90 days | Cost optimization |
| Archive stale images | Tagged | Archive after 180 days | Long-term retention at lower cost |

Count Types

Lifecycle rules support different ways to measure image age:

| Count Type | Counts From | Use When |
| --- | --- | --- |
| sinceImagePushed | Image push date | Default -- expire images that haven't been updated |
| sinceImagePulled | Last pull date | Keep frequently-used images regardless of age |
| sinceImageTransitioned | When image was archived | Manage archived image retention |

Tag Filtering

Use tagPatternList with wildcards to target specific images:

{
  "rulePriority": 1,
  "selection": {
    "tagStatus": "tagged",
    "tagPatternList": ["release-*", "v*"],
    "countType": "sinceImagePushed",
    "countUnit": "days",
    "countNumber": 90
  },
  "action": { "type": "expire" }
}

This is more flexible than tagPrefixList -- patterns like *-rc or dev-* let you target release candidates, dev builds, or any naming convention.
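
For illustration, the first two rules from the table at the top of this section (expire untagged images after 1 day, keep the last 30 tagged images) could be generated like this. The function name, parameter names, and default patterns are assumptions for the sketch; the rule fields (rulePriority, selection, countType, countUnit, countNumber, action) follow the ECR lifecycle policy schema, and imageCountMoreThan is the count type that implements "keep the last N".

```python
import json


def build_lifecycle_policy(keep_tag_patterns, keep_count=30, untagged_days=1):
    """Return lifecycle policy JSON text with an untagged rule and a keep-last-N rule."""
    rules = [
        {
            "rulePriority": 1,
            "description": "Expire untagged images quickly",
            "selection": {
                "tagStatus": "untagged",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": untagged_days,
            },
            "action": {"type": "expire"},
        },
        {
            "rulePriority": 2,
            "description": "Keep only the most recent tagged images",
            "selection": {
                "tagStatus": "tagged",
                "tagPatternList": list(keep_tag_patterns),
                # Expires everything beyond the newest keep_count matches.
                "countType": "imageCountMoreThan",
                "countNumber": keep_count,
            },
            "action": {"type": "expire"},
        },
    ]
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    print(build_lifecycle_policy(["release-*", "v*"]))
```

The output is the policy text you would pass to the repository's lifecycle policy setting; previewing a policy before applying it is a sensible precaution.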

DO:

  • Apply lifecycle policies to every repository -- don't let images accumulate indefinitely
  • Keep at least 30 recent tagged images for rollback capability
  • Remove untagged images aggressively (1 day retention)
  • Use sinceImagePulled for shared base images to preserve actively-used versions

DON'T:

  • Delete all old images without considering rollback needs
  • Apply lifecycle policies that conflict with compliance retention requirements
  • Forget to set lifecycle policies on pull-through cache repositories -- they accumulate images quickly

Pull-Through Cache

ECR pull-through cache rules automatically cache images from upstream public registries in your private ECR. When a pod pulls an image through the cache, ECR fetches it from the upstream registry, stores it locally, and serves subsequent pulls from the cache.

Supported Upstream Registries

| Registry | Prefix | Auth Required |
| --- | --- | --- |
| Docker Hub | docker.io | Yes (Secrets Manager) |
| ECR Public | public.ecr.aws | No |
| GitHub Container Registry | ghcr.io | Yes (Secrets Manager) |
| Quay.io | quay.io | Yes (Secrets Manager) |
| Kubernetes Registry | registry.k8s.io | No |
| GitLab Container Registry | registry.gitlab.com | Yes (Secrets Manager) |
| Chainguard | cgr.dev | Yes (Secrets Manager) |
| Azure Container Registry | <name>.azurecr.io | Yes (Secrets Manager) |

How It Works

  1. Pod requests image via ECR pull-through cache URI (e.g., <acct>.dkr.ecr.<region>.amazonaws.com/docker-hub/library/nginx:1.25)
  2. ECR checks if image exists in cache
  3. If missing or stale (>24 hours since last check), ECR pulls from upstream -- the first pull of an image needs an outbound internet route (for example, a NAT gateway), even if the cluster otherwise reaches ECR through VPC endpoints
  4. ECR stores the image (including multi-arch manifests) and serves it locally
  5. Subsequent pulls come from cache with no upstream dependency
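
The URI in step 1 is the upstream image path placed under the cache rule's repository prefix. A minimal sketch of that rewrite follows; the function and parameter names are invented for the example, and the docker_hub flag handles the convention that official Docker Hub images live under the library/ namespace.

```python
def pull_through_uri(account_id, region, cache_prefix, upstream_image, docker_hub=False):
    """Map an upstream image reference to its ECR pull-through cache URI.

    cache_prefix is the repository prefix configured on the cache rule
    (for example "docker-hub"); upstream_image is the path without the
    upstream registry host (for example "nginx:1.25").
    """
    # Strip any digest or tag to inspect just the repository name.
    name = upstream_image.split("@")[0].split(":")[0]
    if docker_hub and "/" not in name:
        # Official Docker Hub images are namespaced under library/.
        upstream_image = f"library/{upstream_image}"
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{cache_prefix}/{upstream_image}"


if __name__ == "__main__":
    # Placeholder account ID and region.
    print(pull_through_uri("111111111111", "us-east-1", "docker-hub", "nginx:1.25", docker_hub=True))
```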

When to Use

| Scenario | Benefit |
| --- | --- |
| Docker Hub rate limiting | Avoid 100 pull/6hr anonymous limit |
| Air-gapped environments | Cache images locally, no internet needed after first pull |
| Compliance | All images flow through your ECR with scanning enabled |
| Performance | Faster pulls from regional ECR vs cross-internet |
| Cost | Reduce NAT gateway data transfer costs |

DO:

  • Enable pull-through cache for Docker Hub at minimum -- rate limiting is the most common issue
  • Store upstream credentials in Secrets Manager for registries that require authentication
  • Apply vulnerability scanning and lifecycle policies to cache repositories
  • Use repository creation templates to auto-configure cache repositories

DON'T:

  • Assume cached images are scanned automatically -- configure scanning rules for cache repositories
  • Use pull-through cache as a substitute for curated base images -- it caches everything, including vulnerable images
  • Forget that the first pull requires internet access -- air-gapped clusters need initial seeding

Repository Creation Templates

Repository creation templates automatically configure new repositories as they're created -- whether through pull-through cache, create-on-push, or replication. Without templates, new repositories get default settings and miss critical configurations like scanning, encryption, and lifecycle policies.

How Templates Work

Templates match repository names by prefix. When a new repository is created (by any mechanism), ECR checks for a matching template and applies its configuration:

| Setting | What It Configures |
| --- | --- |
| Encryption | KMS key or AES-256 for image layer encryption |
| Image scanning | Basic or enhanced scanning on push |
| Lifecycle policy | Automatic cleanup rules applied at creation |
| Immutability | Tag immutability setting |
| Resource tags | Cost allocation and ownership tags |
| Repository permissions | Cross-account access policies |

Template Matching

Templates use prefix matching with a priority order:

  1. Longest matching prefix wins
  2. If no prefix matches, the ROOT template applies (if configured)

Example: For repository docker-hub/library/nginx, a template with prefix docker-hub/library/ takes priority over one with prefix docker-hub/.
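
The matching rule is easy to model locally when planning template prefixes. This sketch is a stand-in for the behaviour described above, not ECR's implementation: it picks the longest configured prefix that the repository name starts with and falls back to ROOT when one is configured.

```python
ROOT = "ROOT"  # sentinel for the catch-all template, as named in the text above


def match_template(repository_name, template_prefixes):
    """Return the prefix of the template that would apply, or None if there is none."""
    candidates = [
        prefix
        for prefix in template_prefixes
        if prefix != ROOT and repository_name.startswith(prefix)
    ]
    if candidates:
        return max(candidates, key=len)  # longest matching prefix wins
    return ROOT if ROOT in template_prefixes else None


if __name__ == "__main__":
    prefixes = ["docker-hub/", "docker-hub/library/", ROOT]
    print(match_template("docker-hub/library/nginx", prefixes))
    print(match_template("team-a/api-service", prefixes))
```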

Create-on-Push

Create-on-push allows repositories to be created automatically when an image is pushed to a repository name that doesn't exist yet. Combined with templates, this means new services can push images without any pre-provisioning -- the repository is created with the correct configuration automatically.

Enable create-on-push either as a registry default or per-template.

DO:

  • Create a ROOT template as a catch-all to ensure every repository gets baseline configuration
  • Use specific prefix templates for pull-through cache registries (e.g., docker-hub/, ghcr/)
  • Include lifecycle policies in templates so cache repositories don't accumulate images endlessly
  • Enable create-on-push for development environments to reduce friction

DON'T:

  • Skip templates for pull-through cache -- without them, cached repos have no scanning or lifecycle policies
  • Enable create-on-push in production without templates -- you'll get misconfigured repositories

Managed Signing

ECR managed signing automatically signs container images on push using AWS Signer, providing cryptographic proof that an image was built and pushed through your pipeline. This supports verification at deploy time via admission controllers like Kyverno or OPA Gatekeeper.

How It Works

  1. Configure signing rules at the registry level (up to 10 rules per registry)
  2. Each rule specifies a repository filter (prefix match) and an AWS Signer signing profile
  3. When an image is pushed to a matching repository, ECR automatically creates a Notation-format signature
  4. The signature is stored alongside the image in the same repository
  5. Admission controllers verify the signature before allowing the image to run

Configuration

| Setting | Purpose |
| --- | --- |
| Signing profile | AWS Signer profile that holds the signing key |
| Repository filter | Prefix-based filter (e.g., prod/ signs only production images) |
| Cross-account | Signing profile can be in a different account from the registry |

Integration with Admission Control

Managed signing pairs with Kubernetes admission controllers for deploy-time verification:

| Tool | How It Verifies |
| --- | --- |
| Kyverno | verifyImages policy checks Notation signatures against trusted signing profiles |
| OPA Gatekeeper | Custom constraint template validates signature presence and signer identity |
| Ratify | External data provider for Gatekeeper, native Notation support |

DO:

  • Enable managed signing for production repositories to establish image provenance
  • Use repository prefix filters to sign only images that need verification (avoids signing dev/test images)
  • Combine with admission controllers to enforce signature verification at deploy time

DON'T:

  • Treat signing as a substitute for vulnerability scanning -- signing proves provenance, not safety
  • Use the same signing profile for all environments -- separate dev and prod signing identities

See also: Security -- Supply Chain for admission control patterns and image verification policies


Archival Storage Class

ECR archival storage provides a low-cost tier for images you need to retain but rarely access -- compliance snapshots, audit artifacts, or old release images. Archived images cost significantly less than standard storage but must be restored before they can be pulled.

How It Works

| Aspect | Detail |
| --- | --- |
| Transition | Via lifecycle policy archive action, or manual API call |
| Storage cost | Lower than standard ECR storage |
| Restore time | Up to 20 minutes |
| Restore duration | Restored copy available for a configurable number of days |
| Scanning | Archived images cannot be scanned -- restore first |

Lifecycle Policy Integration

Use lifecycle policies to automatically archive images after a retention period:

{
  "rules": [
    {
      "rulePriority": 1,
      "selection": {
        "tagStatus": "tagged",
        "tagPatternList": ["release-*"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 180
      },
      "action": { "type": "archive" }
    },
    {
      "rulePriority": 2,
      "selection": {
        "tagStatus": "tagged",
        "tagPatternList": ["release-*"],
        "countType": "sinceImageTransitioned",
        "countUnit": "days",
        "countNumber": 730
      },
      "action": { "type": "expire" }
    }
  ]
}

This archives release images after 180 days and permanently deletes them 2 years after archival -- a typical compliance lifecycle.
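
To sanity-check a retention schedule against a compliance requirement, the two-rule chain can be worked through with plain date arithmetic. The helper below is a hypothetical planning aid; it assumes rules take effect exactly on the threshold day, whereas actual lifecycle evaluation may lag.

```python
from datetime import date, timedelta


def release_image_timeline(pushed_on, archive_after_days=180, expire_after_archive_days=730):
    """Return (archived_on, expired_on) for an image pushed on the given date."""
    archived_on = pushed_on + timedelta(days=archive_after_days)
    # The expiry clock starts at the archive transition, not at the push.
    expired_on = archived_on + timedelta(days=expire_after_archive_days)
    return archived_on, expired_on


if __name__ == "__main__":
    archived, expired = release_image_timeline(date(2024, 1, 1))
    print("archived:", archived.isoformat())
    print("expired: ", expired.isoformat())
    print("total retention days:", (expired - date(2024, 1, 1)).days)
```

With the defaults, total retention from push to deletion is 910 days, which is the figure to compare against any minimum retention period.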

DO:

  • Use archival storage for images required by compliance but rarely pulled
  • Chain lifecycle rules: archive after N days, expire after M days from archival
  • Test restore times before relying on archived images for disaster recovery

DON'T:

  • Archive images you may need for rapid rollback -- 20-minute restore is too slow for incidents
  • Forget that archived images can't be scanned -- restore and scan if you need to assess vulnerabilities

Registry Configuration

ECR has registry-level settings that affect all repositories in the account/region. Two settings are particularly useful for large registries.

Blob Mounting

Blob mounting allows image layers that already exist in one repository to be referenced (mounted) when pushing to another repository in the same registry, instead of re-uploading them. This is significant when many images share common base layers.

| Setting | Effect |
| --- | --- |
| Enabled (default) | Push operations mount existing layers from other repos, saving bandwidth and time |
| Disabled | Every push uploads all layers, even if identical copies exist in the registry |

Keep blob mounting enabled unless you have a specific security requirement to isolate layer access between repositories.

Pull-Time Update Exclusions

When pull-through cache is enabled, ECR checks the upstream registry for updates every 24 hours. Pull-time update exclusions let you pin specific repositories so ECR never re-checks upstream -- the cached version is treated as authoritative.

Use this for:

  • Known-good images you've validated and don't want upstream changes to override
  • Air-gapped environments where you've seeded images and upstream is unreachable
  • Compliance scenarios where you need a frozen, auditable copy

Helm Chart Management

ECR OCI Support vs S3 Helm Repository

| Factor | ECR OCI Helm Charts | S3-Based Helm Repo (ChartMuseum) |
| --- | --- | --- |
| Protocol | OCI registry (standard) | HTTP(S) Helm repo |
| Authentication | ECR IAM (same as images) | S3 IAM + Helm repo plugin |
| Versioning | OCI tags + digests | Chart index.yaml |
| Replication | ECR cross-account/region replication | S3 replication |
| Scanning | Not applicable (charts are templates) | Not applicable |
| Recommendation | Preferred -- native, no extra infra | Legacy or non-AWS Helm consumers |

Pushing and Consuming Helm Charts via ECR

The workflow for Helm charts stored in ECR OCI follows three steps:

  1. Authenticate: Obtain an ECR authorization token and pass it to helm registry login. The same ECR IAM credentials used for container images work for Helm charts.
  2. Package and push: Package the chart directory into a .tgz archive, then push it to an OCI URI in ECR (e.g., oci://<account-id>.dkr.ecr.<region>.amazonaws.com/charts/).
  3. Install from ECR: Reference the OCI URI directly in helm install or in ArgoCD Application source configuration with a specific version tag.
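
The three steps map onto a handful of CLI invocations. The sketch below only assembles the argument lists (it does not execute anything), so it can be reviewed or fed to a runner of your choice. The function name and its parameters are illustrative; the sketch also assumes the chart archive is named <chart>-<version>.tgz, which is what helm package produces by default, and that the target repository already exists or create-on-push is enabled.

```python
def helm_ecr_commands(account_id, region, chart_dir, chart_name, version, release):
    """Return the CLI argument lists for login, package, push, and install."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return [
        # 1. Authenticate: stdout of the first command is piped into the second.
        ["aws", "ecr", "get-login-password", "--region", region],
        ["helm", "registry", "login", "--username", "AWS", "--password-stdin", registry],
        # 2. Package the chart directory and push the archive under charts/.
        ["helm", "package", chart_dir],
        ["helm", "push", f"{chart_name}-{version}.tgz", f"oci://{registry}/charts/"],
        # 3. Install from the OCI URI, pinned to an explicit version.
        ["helm", "install", release, f"oci://{registry}/charts/{chart_name}", "--version", version],
    ]


if __name__ == "__main__":
    # Placeholder account, region, and chart details.
    for argv in helm_ecr_commands("111111111111", "us-east-1", "./my-chart", "my-chart", "0.1.0", "my-release"):
        print(" ".join(argv))
```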

Design considerations:

  • Use a dedicated charts/ prefix in ECR to separate Helm charts from container images
  • Apply the same ECR lifecycle policies to chart repositories to clean up old versions
  • ECR cross-account replication works for Helm charts — spoke accounts get chart replicas automatically
  • ArgoCD natively supports OCI Helm sources — no extra configuration needed beyond ECR auth
