Microservices Architecture Assistant

Use Nova reasoning models to help design, debug, and optimize microservices architectures.

System Prompt Template

  You are an expert microservices architect who specializes in distributed system design and implementation.
  Help with service decomposition, API design, and inter-service communication patterns.

User Prompt Template

  ## Current Architecture ##
  {service definitions and interactions}

  ## Challenge ##
  {specific microservices problem}

  Ask clarifying questions if needed.
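The two placeholders in the template can be filled programmatically. The following is a minimal sketch, assuming the placeholders are renamed to `$architecture` and `$challenge`; the helper name `build_user_prompt` and the sample inputs are hypothetical and not part of the original template.

```python
from string import Template

# Same layout as the user prompt template above; the $-placeholders are
# an assumption made for this sketch.
USER_PROMPT_TEMPLATE = Template(
    "## Current Architecture ##\n"
    "$architecture\n\n"
    "## Challenge ##\n"
    "$challenge\n\n"
    "Ask clarifying questions if needed."
)


def build_user_prompt(architecture: str, challenge: str) -> str:
    """Fill both placeholders, trimming stray whitespace from the inputs."""
    return USER_PROMPT_TEMPLATE.substitute(
        architecture=architecture.strip(),
        challenge=challenge.strip(),
    )


prompt = build_user_prompt(
    "Order Service -> Product Service (synchronous REST)",
    "Order Service times out during inventory checks.",
)
print(prompt)
```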

Example

Amazon Nova 2 Lite System Prompt

  You are an expert microservices architect who specializes in distributed system design and implementation.
  Help with service decomposition, API design, inter-service communication patterns, and resolving distributed system challenges.

  **Important formatting requirements:**
  - Use proper markdown tables with | separators and header rows
  - Tables must NOT be indented - start at column 1
  - Always add blank line before and after tables
  - For mermaid diagrams, use correct syntax:
    - State diagrams: use `: label` not `[-- label]`
    - Flow charts: use proper node syntax
    - No `--` separators in mermaid code blocks
  - Use bullet points and numbered lists for clarity

Amazon Nova 2 Lite User Prompt

  ## Current Architecture ##
  We have an e-commerce platform with these services:

  **User Service** (Node.js/Express)
  - Manages user accounts, authentication
  - Database: PostgreSQL
  - Endpoints: /users, /auth/login, /auth/register

  **Product Service** (Java/Spring Boot)
  - Product catalog, inventory management
  - Database: MongoDB
  - Endpoints: /products, /inventory

  **Order Service** (Python/FastAPI)
  - Order processing, payment integration
  - Database: PostgreSQL
  - Endpoints: /orders, /payments

  **Notification Service** (Node.js)
  - Email/SMS notifications
  - Message Queue: RabbitMQ

  Current flow: User → Order Service → Product Service (inventory check) → Payment Gateway → Notification Service

  ## Challenge ##
  We're experiencing several issues:
  1. Order Service frequently times out when checking inventory during high traffic
  2. Failed payments leave inventory in inconsistent state
  3. Users don't get real-time updates on order status
  4. System becomes unavailable when Product Service goes down
  5. Notification delivery is unreliable during peak hours

  We need to redesign this to be more resilient, scalable, and provide better user experience. Consider patterns like circuit breakers, event sourcing, and real-time updates.

  Ask clarifying questions if needed.

Amazon Nova 2 Lite Sample Response


Clarifying Questions

  • What infrastructure are you currently running on (e.g., Kubernetes, VMs, serverless)?
  • Do you have an existing message broker beyond RabbitMQ (e.g., Apache Kafka, NATS) that could be used for high‑throughput events?
  • Are you able to introduce a service‑mesh (Istio/Linkerd) or a dedicated resilience library (Polly, Hystrix, Resilience4j)?
  • What are the latency/throughput targets for order placement and inventory checks during peak traffic?
  • Is a read‑only cache (Redis, Memcached) already in place, or can one be added?
  • Which front‑end framework will consume real‑time order updates (SPA, mobile app, etc.)?
  • Do you have existing observability stacks (Prometheus/Grafana, OpenTelemetry, ELK) or would you like to add them?
  • Are there any regulatory or compliance constraints that affect data consistency (e.g., financial reporting)?
  • Can the Product Service be split into separate catalog and inventory responsibilities, or must it stay monolithic?

Answering these will help fine‑tune the proposed redesign, but the following outline can be applied regardless.


Summary of Current Challenges

| # | Issue | Root Cause |
|---|-------|------------|
| 1 | Order Service times out checking inventory | Synchronous HTTP call to Product Service under load |
| 2 | Failed payments leave inventory in an inconsistent state | No compensating action / eventual consistency gap |
| 3 | Users don’t get real‑time order status | UI polls REST endpoints only |
| 4 | System unavailable when Product Service fails | Tight coupling & lack of fallback |
| 5 | Unreliable notification delivery during peaks | RabbitMQ consumer overload, no retries / DLQ |

High‑Level Redesign Overview

  1. Introduce an explicit Inventory Service separate from the Product Catalog.
     • Keeps stock data close to the reservation logic.
     • Allows caching, read‑replicas, and fast “reserve‑if‑available” checks.

  2. Adopt a Saga/Orchestration pattern for order placement.
     • Guarantees eventual consistency across Order, Inventory, Payment, and Notification.
     • Provides compensating actions when a step fails.

  3. Add an event bus (Kafka or scaled RabbitMQ topics) for asynchronous communication.
     • Decouples services, enables reliable retries, and supports idempotent consumers.

  4. Implement circuit‑breaker, bulkhead, and fallback patterns at the API‑Gateway / Order Service.
     • Prevents cascading failures when Inventory or Payment services are unhealthy.

  5. Expose real‑time order status via a dedicated WebSocket/Server‑Sent‑Events channel.
     • Push updates from the Notification Service (or a new Real‑Time Update Service) directly to the UI.

  6. Introduce observability (distributed tracing, metrics, health‑checks) and resilience testing (chaos engineering).
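The saga idea in the overview (run the steps in order, undo the completed ones when a later step fails) can be illustrated in a few lines. This is a simplified in-process sketch, not the orchestrator the response proposes: the `run_saga` helper, the step names, and the in-memory `stock` dictionary are all hypothetical, and a real orchestrator would persist saga state and exchange events over the bus.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action); all names are illustrative.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            break
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return log


stock = {"sku-1": 5}  # hypothetical in-memory inventory


def reserve() -> None:
    stock["sku-1"] -= 1


def release() -> None:
    stock["sku-1"] += 1


def charge() -> None:
    raise RuntimeError("payment declined")  # simulate a failed payment


def refund() -> None:
    pass  # nothing to refund because the charge never succeeded


saga_log = run_saga([
    ("reserve_inventory", reserve, release),
    ("charge_payment", charge, refund),
])
print(saga_log, stock)
```

In this run the failed payment triggers the inventory release, so the stock count returns to its starting value, which is the behavior issue 2 in the challenge list is missing.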


Service Decomposition & Responsibility Table

| Service | Current Responsibility | Proposed Responsibility | New Patterns |
|---------|------------------------|-------------------------|--------------|
| User Service | Account management, authentication | Same + JWT issuance for API‑Gateway | OAuth2/JWT, rate‑limiting |
| Product Catalog Service | Product CRUD, basic inventory view | Read‑only product data (descriptions, pricing) | Cached reads, read‑replicas |
| Inventory Service (new) | (none) | Stock levels, reservation logic, release compensation | Redis cache, circuit‑breaker fallback, idempotent reserve command |
| Order Service | Order creation, payment call, basic status | Saga orchestrator: reserve inventory → call payment → emit events | Saga orchestration, retry with exponential back‑off, timeout |
| Payment Service (external or internal) | Payment gateway integration | Same, but expose result as event (PaymentSucceeded/PaymentFailed) | Idempotent webhook handling, circuit‑breaker |
| Notification Service | Email/SMS via RabbitMQ | Publish order‑status events, retry with DLQ, fallback to email‑only | Dead‑Letter Queue, retry policy, bulk‑send |
| Real‑Time Update Service (new) | (none) | WebSocket server broadcasting OrderStatusChanged events | Persistent connections, back‑pressure handling |
| API Gateway | Routing, auth | Central entry point, rate‑limiting, circuit‑breaker per downstream service | Envoy/Istio, JWT validation, request throttling |
| Event Bus | RabbitMQ topic for notifications | Kafka topics (OrderCreated, InventoryReserved, PaymentResult, OrderCompleted) | Exactly‑once semantics (idempotent consumers), schema registry |
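The "retry with exponential back‑off" pattern listed for the Order Service could look roughly like the sketch below. The helper `retry_with_backoff`, its parameter names, and the default delays are assumptions chosen for illustration; the `sleep` and `rng` arguments are injectable only so the behavior can be checked without waiting.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,   # seconds before the first retry (assumed value)
    jitter: float = 0.2,       # up to 200 ms of random jitter per retry
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call fn up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller compensate
            sleep(base_delay * (2 ** attempt) + rng() * jitter)
    raise ValueError("attempts must be at least 1")


# Usage: a call that fails twice, then succeeds on the third attempt.
outcomes = [TimeoutError("slow"), TimeoutError("slow"), "reserved"]
delays: List[float] = []


def flaky_reserve() -> str:
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = retry_with_backoff(flaky_reserve, sleep=delays.append, rng=lambda: 0.0)
print(result, delays)
```

Retries like this are only safe when the downstream operation is idempotent, which is why the table pairs them with an idempotent reserve command.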

Communication Patterns Table

| Source | Target | Pattern | Message Type | Reliability |
|--------|--------|---------|--------------|-------------|
| User Service | API Gateway | Synchronous HTTPS (REST) | API request | Circuit‑breaker at GW |
| API Gateway | Order Service | Synchronous HTTPS (REST) | OrderCreateRequest | Timeout + retry |
| Order Service | Inventory Service | Asynchronous command (Kafka topic reserve-inventory) | ReserveInventoryCommand | At‑least‑once, idempotent consumer |
| Inventory Service | Order Service | Asynchronous event (Kafka topic inventory-reserved) | InventoryReservedEvent | Idempotent handler |
| Order Service | Payment Service | Synchronous HTTPS (REST) | PaymentRequest | Circuit‑breaker, fallback to “payment‑pending” |
| Payment Service | Order Service | Asynchronous event (Kafka topic payment-result) | PaymentResultEvent (succeeded/failed) | DLQ + retry |
| Order Service | Notification Service | Asynchronous event (Kafka topic order-completed) | OrderCompletedEvent | Retry + DLQ |
| Notification Service | Real‑Time Update Service | Asynchronous event (Kafka topic order-status-changed) | OrderStatusChangedEvent | Fan‑out to WebSocket clients |
| Real‑Time Update Service | UI (WebSocket) | WebSocket push | OrderStatusUpdate | Persistent connection, back‑pressure |
| API Gateway | Monitoring Stack | Metrics & health checks | Prometheus scrape / OpenTelemetry | Reliable (internal) |
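Because the command topic is delivered at least once, the Inventory Service may see the same ReserveInventoryCommand more than once. A minimal sketch of an idempotent handler follows, assuming each command carries a unique reservation ID; the class name, method name, and in-memory dictionaries are hypothetical stand-ins for a real service and its database.

```python
from typing import Dict


class InventoryReservations:
    """Deduplicates reserve commands by reservation ID (in-memory sketch)."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = dict(stock)
        self.processed: Dict[str, bool] = {}  # reservation_id -> outcome

    def handle_reserve(self, reservation_id: str, sku: str, qty: int) -> bool:
        # A redelivered command returns the stored outcome without
        # touching the stock a second time.
        if reservation_id in self.processed:
            return self.processed[reservation_id]
        ok = self.stock.get(sku, 0) >= qty
        if ok:
            self.stock[sku] -= qty
        self.processed[reservation_id] = ok
        return ok


inventory = InventoryReservations({"sku-1": 3})
first = inventory.handle_reserve("res-42", "sku-1", 2)
duplicate = inventory.handle_reserve("res-42", "sku-1", 2)  # redelivery
print(first, duplicate, inventory.stock)
```

In a real deployment the processed-ID record and the stock update would need to be written in the same transaction; the sketch skips that concern.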

Resilience Patterns per Service

| Service | Pattern | Implementation | Fallback |
|---------|---------|----------------|----------|
| Order Service | Circuit‑breaker for Inventory & Payment calls | Resilience4j / Hystrix, timeout = 2 s, failure‑ratio = 0.3 | Use cached inventory view + “limited stock” mode |
| Order Service | Bulkhead | Separate thread‑pool per downstream call | Graceful degradation – reject new orders if pool exhausted |
| Order Service | Retry with exponential back‑off | 3 attempts, jitter = 200 ms | Fail fast & emit compensating event |
| Inventory Service | Cache‑aside pattern (Redis) | Store current stock, invalidate on reservation | Serve from cache with “stale‑while‑revalidate” |
| Inventory Service | Read‑replica for catalog reads | MongoDB secondary | Continue serving product data even if primary is down |
| Inventory Service | Idempotent reserve command | Use reservation‑id to dedupe | Safe retry without double‑reservation |
| Notification Service | Dead‑Letter Queue (RabbitMQ) | Move failed messages to DLQ after 5 retries | Manual inspection / alternate email provider |
| Notification Service | Bulk‑send with rate‑limiting | Batch emails per minute | Switch to email‑only if SMS provider fails |
| API Gateway | Rate‑limiting & throttling | Envoy rate‑limit filter (RPS per client) | Return HTTP 429, UI shows “try again later” |
| API Gateway | Circuit‑breaker per downstream | Envoy outlier detection | Return cached fallback response (e.g., “service unavailable”) |
| Event Bus | Idempotent consumers | Track processed event IDs in DB | Guarantees no duplicate state changes |
| Event Bus | Schema registry | Confluent Schema Registry | Prevent breaking changes across services |
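To make the circuit‑breaker row concrete, here is a simplified, count‑based sketch written in Python rather than with the Resilience4j or Hystrix libraries named in the table. The class, its parameters (`min_calls`, `open_seconds`), and the injected `clock` are assumptions for illustration; the 2 s call timeout from the table is not modeled.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens when the observed failure ratio reaches a threshold."""

    def __init__(self, failure_ratio: float = 0.3, min_calls: int = 10,
                 open_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_ratio = failure_ratio
        self.min_calls = min_calls        # calls required before evaluating
        self.open_seconds = open_seconds  # cool-down before a trial call
        self.clock = clock
        self.calls = 0
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        half_open = False
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.open_seconds:
                return fallback()   # open: skip the downstream call
            half_open = True        # cool-down elapsed: allow one trial
        try:
            result = fn()
        except Exception:
            self._record(failed=True, half_open=half_open)
            return fallback()
        self._record(failed=False, half_open=half_open)
        return result

    def _record(self, failed: bool, half_open: bool) -> None:
        if half_open:
            if failed:
                self.opened_at = self.clock()  # trial failed: reopen
            else:
                self.opened_at = None          # trial passed: close, reset
                self.calls = 0
                self.failures = 0
            return
        self.calls += 1
        self.failures += int(failed)
        if (self.calls >= self.min_calls
                and self.failures / self.calls >= self.failure_ratio):
            self.opened_at = self.clock()


now = [0.0]  # fake clock so the example runs instantly
breaker = CircuitBreaker(failure_ratio=0.3, min_calls=4,
                         open_seconds=30.0, clock=lambda: now[0])


def inventory_down() -> str:
    raise ConnectionError("Inventory Service unreachable")


def cached_view() -> str:
    return "limited stock (cached)"


results = [breaker.call(inventory_down, cached_view) for _ in range(4)]
print(results[-1], breaker.opened_at)
```

While the breaker is open, callers receive the cached "limited stock" fallback immediately instead of waiting on a failing dependency, which is what keeps the failure from cascading.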

Real‑Time Order Status Updates

  • WebSocket channel hosted by the Real‑Time Update Service.
  • Upon receiving an OrderStatusChangedEvent from the Notification Service, the service pushes a minimal JSON payload to all subscribed UI connections.
  • UI subscribes per order‑id (or via user‑id for all active orders) using a secure WebSocket (wss://).
  • Back‑pressure: the service buffers up to 100 messages per client; older messages are dropped gracefully.
  • Fallback: if the WebSocket disconnects, the UI falls back to polling the Order Service every 5 s (graceful degradation).
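The back‑pressure rule above (buffer up to 100 messages per client and drop the oldest) maps naturally onto a bounded deque. This is a minimal sketch; the `ClientBuffer` class and its `dropped` counter are hypothetical, and a real service would attach one buffer to each WebSocket connection.

```python
from collections import deque
from typing import Deque, List


class ClientBuffer:
    """Per-client outbound buffer that discards the oldest message when full."""

    def __init__(self, max_messages: int = 100) -> None:
        self.queue: Deque[str] = deque(maxlen=max_messages)
        self.dropped = 0  # useful as a metric for slow consumers

    def push(self, message: str) -> None:
        if len(self.queue) == self.queue.maxlen:
            self.dropped += 1          # the append below evicts the oldest
        self.queue.append(message)

    def drain(self) -> List[str]:
        """Return all buffered messages in order and empty the buffer."""
        items = list(self.queue)
        self.queue.clear()
        return items


client_buffer = ClientBuffer(max_messages=100)
for i in range(105):
    client_buffer.push(f"status-update-{i}")

pending = client_buffer.drain()
print(len(pending), pending[0], client_buffer.dropped)
```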

Event‑Driven Flow Diagram

flowchart LR
    A[User] -->|POST /orders| B(Order Service)
    B -->|Reserve Inventory| C[Inventory Service]
    C -->|InventoryReserved| B
    B -->|Call Payment Gateway| D[Payment Service]
    D -->|PaymentSucceeded| B
    B -->|Publish OrderCompleted| E[Notification Service]
    E -->|Send Email/SMS| F[User]
    B -->|Publish OrderStatusChanged| G[Real‑Time Update Service]
    G -->|Push to UI| F

Failure paths (not drawn for brevity):
- If C cannot reserve, B publishes InventoryNotAvailable → UI shows “Out of stock”.
- If D fails, B triggers a compensating ReleaseInventory event and publishes OrderCancelled.


Order Status State Diagram

stateDiagram-v2
    [*] --> OrderCreated
    OrderCreated --> InventoryReserved
    InventoryReserved --> PaymentPending
    PaymentPending --> PaymentSucceeded
    PaymentSucceeded --> OrderCompleted
    PaymentPending --> PaymentFailed
    PaymentFailed --> InventoryReleased
    InventoryReleased --> OrderCancelled
    OrderCompleted --> [*]
    OrderCancelled --> [*]
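The state diagram can be enforced in code with a transition table. The sketch below copies the states and edges from the diagram; the `advance` helper and its error message are hypothetical.

```python
from typing import Dict, Set

# Allowed transitions, taken directly from the state diagram above.
TRANSITIONS: Dict[str, Set[str]] = {
    "OrderCreated": {"InventoryReserved"},
    "InventoryReserved": {"PaymentPending"},
    "PaymentPending": {"PaymentSucceeded", "PaymentFailed"},
    "PaymentSucceeded": {"OrderCompleted"},
    "PaymentFailed": {"InventoryReleased"},
    "InventoryReleased": {"OrderCancelled"},
    "OrderCompleted": set(),   # terminal
    "OrderCancelled": set(),   # terminal
}


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


# Walk the failure path: payment fails, inventory is released, order cancelled.
state = "OrderCreated"
for nxt in ["InventoryReserved", "PaymentPending", "PaymentFailed",
            "InventoryReleased", "OrderCancelled"]:
    state = advance(state, nxt)
print(state)
```

Rejecting undeclared transitions guards against out‑of‑order or duplicated events moving an order backwards, for example from OrderCompleted to PaymentPending.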

Implementation Roadmap (Phased Approach)

  1. Phase 1 – Observability & Baseline
     • Deploy Prometheus + Grafana, OpenTelemetry tracing.
     • Add health‑check endpoints and circuit‑breaker metrics.
     • Instrument current services with logging and distributed tracing.

  2. Phase 2 – Inventory Service & Cache
     • Extract inventory logic into a new microservice.
     • Add Redis cache for stock levels and implement an idempotent reserve command.
     • Introduce read‑replica for product catalog reads.

  3. Phase 3 – Event Bus & Saga Orchestration
     • Provision Kafka cluster (or upgrade RabbitMQ topics).
     • Redesign Order Service to emit/consume events (reserve-inventory, payment-result).
     • Implement saga state machine (orchestrator) with compensation on failure.

  4. Phase 4 – Resilience at the Edge
     • Configure API‑Gateway (Envoy/Istio) with circuit‑breakers, rate‑limiting, and retries.
     • Add bulkhead isolation per downstream call in Order Service.
     • Enable fallback to cached inventory when Inventory Service is down.

  5. Phase 5 – Real‑Time Updates
     • Build Real‑Time Update Service (Node.js + Socket.IO or Go + WS).
     • Wire it to receive order-status-changed events from Notification Service.
     • Update front‑end to subscribe via secure WebSocket and handle graceful fallback.

  6. Phase 6 – Chaos & Performance Testing
     • Run chaos experiments (netem latency, service outage, Kafka partition loss).
     • Validate circuit‑breaker trips, retry policies, and compensation actions.
     • Tune timeouts, back‑off, and cache TTL based on test results.

  7. Phase 7 – Monitoring & Alerting Refinement
     • Create alerts for circuit‑breaker trips, DLQ growth, and WebSocket disconnects.
     • Add dashboards for order latency, inventory reservation success rate, and notification delivery.

Additional Considerations

  • Security – Use mTLS between services (service‑mesh) and JWT issued by User Service for API‑Gateway validation.
  • Schema Evolution – Register Avro/Protobuf schemas in a schema registry; enforce backward compatibility.
  • Data Consistency – Leverage event sourcing for Order and Inventory if audit trails are required; otherwise, rely on saga compensation.
  • Testing Strategy
     • Contract tests for each service’s API (Pact).
     • End‑to‑end integration tests using testcontainers (Kafka, Redis, DBs).
     • Load testing with Locust/Gatling to verify circuit‑breaker thresholds.
  • Deployment – Kubernetes Deployments with Helm charts; use Helm values to toggle features (e.g., enable/disable circuit‑breaker).
  • Compliance – Ensure any personal data sent via notifications complies with GDPR/CCPA (prefer email over SMS for sensitive info).
  • Cost – Evaluate whether Kafka adds operational overhead vs. scaling RabbitMQ; consider managed offerings (Aiven, Confluent Cloud) if self‑hosted is a concern.

Next Steps

  1. Review the clarifying questions above and provide any constraints or preferences.
  2. Decide on the event‑bus technology (Kafka vs. scaled RabbitMQ).
  3. Choose a service‑mesh or resilience library for the API‑Gateway.
  4. Prioritize the phases based on your current pain points (e.g., start with Inventory Service if timeouts are critical).

With this foundation, the platform will be able to handle peak traffic, recover gracefully from failures, and give users live visibility into their orders.

API Request

import boto3
import json

# Initialize the Bedrock Runtime client
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')

# Define the system prompt
system_prompt = """You are an expert microservices architect who specializes in distributed system design and implementation.
  Help with service decomposition, API design, inter-service communication patterns, and resolving distributed system challenges."""

# Define the user prompt
user_prompt = """## Current Architecture ##
  We have an e-commerce platform with these services:

  **User Service** (Node.js/Express)
  - Manages user accounts, authentication
  - Database: PostgreSQL
  - Endpoints: /users, /auth/login, /auth/register

  **Product Service** (Java/Spring Boot)
  - Product catalog, inventory management
  - Database: MongoDB
  - Endpoints: /products, /inventory

  **Order Service** (Python/FastAPI)
  - Order processing, payment integration
  - Database: PostgreSQL
  - Endpoints: /orders, /payments

  **Notification Service** (Node.js)
  - Email/SMS notifications
  - Message Queue: RabbitMQ

  Current flow: User → Order Service → Product Service (inventory check) → Payment Gateway → Notification Service

  ## Challenge ##
  We're experiencing several issues:
  1. Order Service frequently times out when checking inventory during high traffic
  2. Failed payments leave inventory in inconsistent state
  3. Users don't get real-time updates on order status
  4. System becomes unavailable when Product Service goes down
  5. Notification delivery is unreliable during peak hours

  We need to redesign this to be more resilient, scalable, and provide better user experience. Consider patterns like circuit breakers, event sourcing, and real-time updates.

  Ask clarifying questions if needed."""

# Prepare the request
request_body = {
    "system": [
        {
            "text": system_prompt
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": user_prompt
                }
            ]
        }
    ],
    "toolConfig": {
        "tools": [
            {
                "systemTool": {
                    "name": "nova_grounding"
                }
            },
            {
                "systemTool": {
                    "name": "nova_code_interpreter"
                }
            }
        ]
    },
    "additionalModelRequestFields": {
        "reasoningConfig": {
            "type": "enabled",
            "maxReasoningEffort": "high"
        }
    }
}

# Make the API call
response = bedrock_runtime.converse(
    modelId="amazon.nova-2-lite-v1:0",
    **request_body
)

# Print the response
print(json.dumps(response, indent=2, default=str))

aws bedrock-runtime converse \
 --model-id amazon.nova-2-lite-v1:0 \
 --system "[{\"text\":\"You are an expert microservices architect who specializes in distributed system design and implementation.\nHelp with service decomposition, API design, inter-service communication patterns, and resolving distributed system challenges.\"}]" \
 --messages "[{\"role\":\"user\",\"content\":[{\"text\":\"## Current Architecture ##\nWe have an e-commerce platform with these services:\n\n**User Service** (Node.js/Express)\n- Manages user accounts, authentication\n- Database: PostgreSQL\n- Endpoints: /users, /auth/login, /auth/register\n\n**Product Service** (Java/Spring Boot)\n- Product catalog, inventory management\n- Database: MongoDB\n- Endpoints: /products, /inventory\n\n**Order Service** (Python/FastAPI)\n- Order processing, payment integration\n- Database: PostgreSQL\n- Endpoints: /orders, /payments\n\n**Notification Service** (Node.js)\n- Email/SMS notifications\n- Message Queue: RabbitMQ\n\nCurrent flow: User → Order Service → Product Service (inventory check) → Payment Gateway → Notification Service\n\n## Challenge ##\nWe're experiencing several issues:\n1. Order Service frequently times out when checking inventory during high traffic\n2. Failed payments leave inventory in inconsistent state\n3. Users don't get real-time updates on order status\n4. System becomes unavailable when Product Service goes down\n5. Notification delivery is unreliable during peak hours\n\nWe need to redesign this to be more resilient, scalable, and provide better user experience. Consider patterns like circuit breakers, event sourcing, CQRS, saga patterns, and real-time updates.\n\nAsk clarifying questions if needed.\"}]}]" \
 --additional-model-request-fields "{\"reasoningConfig\":{\"type\":\"enabled\",\"maxReasoningEffort\":\"medium\"}}" \
 --inference-config "{\"temperature\":0.3,\"topP\":0.9,\"maxTokens\":4000}" \
 --region us-west-2

{
 "system": "You are an expert microservices architect who specializes in distributed system design and implementation.\n  Help with service decomposition, API design, inter-service communication patterns, and resolving distributed system challenges.\n  \n  **Important formatting requirements:**\n  - Use proper markdown tables with | separators and header rows\n  - Tables must NOT be indented - start at column 1\n  - Always add blank line before and after tables\n  - For mermaid diagrams, use correct syntax:\n    - State diagrams: use `: label` not `[-- label]`\n    - Flow charts: use proper node syntax\n    - No `--` separators in mermaid code blocks\n  - Use bullet points and numbered lists for clarity",
 "messages": [
  {
   "role": "user",
   "content": [
    {
     "text": "## Current Architecture ##\n  We have an e-commerce platform with these services:\n  \n  **User Service** (Node.js/Express)\n  - Manages user accounts, authentication\n  - Database: PostgreSQL\n  - Endpoints: /users, /auth/login, /auth/register\n  \n  **Product Service** (Java/Spring Boot)\n  - Product catalog, inventory management\n  - Database: MongoDB\n  - Endpoints: /products, /inventory\n  \n  **Order Service** (Python/FastAPI)\n  - Order processing, payment integration\n  - Database: PostgreSQL\n  - Endpoints: /orders, /payments\n  \n  **Notification Service** (Node.js)\n  - Email/SMS notifications\n  - Message Queue: RabbitMQ\n  \n  Current flow: User \u2192 Order Service \u2192 Product Service (inventory check) \u2192 Payment Gateway \u2192 Notification Service\n  \n  ## Challenge ##\n  We're experiencing several issues:\n  1. Order Service frequently times out when checking inventory during high traffic\n  2. Failed payments leave inventory in inconsistent state\n  3. Users don't get real-time updates on order status\n  4. System becomes unavailable when Product Service goes down\n  5. Notification delivery is unreliable during peak hours\n  \n  We need to redesign this to be more resilient, scalable, and provide better user experience. Consider patterns like circuit breakers, event sourcing, and real-time updates.\n  \n  Ask clarifying questions if needed."
    }
   ]
  }
 ],
 "additionalModelRequestFields": {
  "reasoningConfig": {
   "type": "enabled",
   "maxReasoningEffort": "high"
  }
 }
}
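
Once the Converse call returns, the assistant's answer has to be pulled out of the response dictionary. The sketch below assumes the usual Converse response shape, where `output.message.content` is a list of blocks and text blocks carry a `text` key; the helper `extract_text` and the `sample_response` dictionary are hypothetical stand-ins so the example runs without an AWS call.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Join the text blocks of a Converse response, skipping other block types."""
    blocks = response.get("output", {}).get("message", {}).get("content", [])
    return "\n".join(block["text"] for block in blocks if "text" in block)


# Hand-written stand-in for a real response (not actual model output).
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"reasoningContent": {"reasoningText": {"text": "(reasoning omitted)"}}},
                {"text": "Clarifying Questions"},
                {"text": "Summary of Current Challenges"},
            ],
        }
    },
    "stopReason": "end_turn",
}

answer = extract_text(sample_response)
print(answer)
```

With the boto3 sample from this page, the same helper could be applied as `extract_text(response)` in place of dumping the whole response as JSON.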