Microservices Architecture Assistant
Use Nova reasoning models to help design, debug, and optimize microservices architectures.
System Prompt Template
You are an expert microservices architect who specializes in distributed system design and implementation.
Help with service decomposition, API design, and inter-service communication patterns.
User Prompt Template
## Current Architecture ##
{service definitions and interactions}
## Challenge ##
{specific microservices problem}
Ask clarifying questions if needed.
Example
Amazon Nova 2 Lite System Prompt
You are an expert microservices architect who specializes in distributed system design and implementation.
Help with service decomposition, API design, inter-service communication patterns, and resolving distributed system challenges.
**Important formatting requirements:**
- Use proper markdown tables with | separators and header rows
- Tables must NOT be indented - start at column 1
- Always add blank line before and after tables
- For mermaid diagrams, use correct syntax:
  - State diagrams: use `: label` not `[-- label]`
  - Flow charts: use proper node syntax
  - No `--` separators in mermaid code blocks
- Use bullet points and numbered lists for clarity
Amazon Nova 2 Lite User Prompt
## Current Architecture ##
We have an e-commerce platform with these services:
**User Service** (Node.js/Express)
- Manages user accounts, authentication
- Database: PostgreSQL
- Endpoints: /users, /auth/login, /auth/register
**Product Service** (Java/Spring Boot)
- Product catalog, inventory management
- Database: MongoDB
- Endpoints: /products, /inventory
**Order Service** (Python/FastAPI)
- Order processing, payment integration
- Database: PostgreSQL
- Endpoints: /orders, /payments
**Notification Service** (Node.js)
- Email/SMS notifications
- Message Queue: RabbitMQ
Current flow: User → Order Service → Product Service (inventory check) → Payment Gateway → Notification Service
## Challenge ##
We're experiencing several issues:
1. Order Service frequently times out when checking inventory during high traffic
2. Failed payments leave inventory in inconsistent state
3. Users don't get real-time updates on order status
4. System becomes unavailable when Product Service goes down
5. Notification delivery is unreliable during peak hours
We need to redesign this to be more resilient, scalable, and provide better user experience. Consider patterns like circuit breakers, event sourcing, and real-time updates.
Ask clarifying questions if needed.
Amazon Nova 2 Lite Sample Response
Clarifying Questions
- What infrastructure are you currently running on (e.g., Kubernetes, VMs, serverless)?
- Do you have an existing message broker beyond RabbitMQ (e.g., Apache Kafka, NATS) that could be used for high‑throughput events?
- Are you able to introduce a service‑mesh (Istio/Linkerd) or a dedicated resilience library (Polly, Hystrix, Resilience4j)?
- What are the latency/throughput targets for order placement and inventory checks during peak traffic?
- Is a read‑only cache (Redis, Memcached) already in place, or can one be added?
- Which front‑end framework will consume real‑time order updates (SPA, mobile app, etc.)?
- Do you have existing observability stacks (Prometheus/Grafana, OpenTelemetry, ELK) or would you like to add them?
- Are there any regulatory or compliance constraints that affect data consistency (e.g., financial reporting)?
- Can the Product Service be split into separate catalog and inventory responsibilities, or must it stay monolithic?
Answering these will help fine‑tune the proposed redesign, but the following outline can be applied regardless.
Summary of Current Challenges
| # | Issue | Root Cause |
|---|---|---|
| 1 | Order Service times out checking inventory | Synchronous HTTP call to Product Service under load |
| 2 | Failed payments leave inventory in an inconsistent state | No compensating action / eventual consistency gap |
| 3 | Users don’t get real‑time order status | UI polls REST endpoints only |
| 4 | System unavailable when Product Service fails | Tight coupling & lack of fallback |
| 5 | Unreliable notification delivery during peaks | RabbitMQ consumer overload, no retries / DLQ |
High‑Level Redesign Overview
- Introduce an explicit Inventory Service separate from the Product Catalog.
  - Keeps stock data close to the reservation logic.
  - Allows caching, read‑replicas, and fast “reserve‑if‑available” checks.
- Adopt a Saga/Orchestration pattern for order placement.
  - Guarantees eventual consistency across Order, Inventory, Payment, and Notification.
  - Provides compensating actions when a step fails.
- Add an event bus (Kafka or scaled RabbitMQ topic exchanges) for asynchronous communication.
  - Decouples services, enables reliable retries, and supports idempotent consumers.
- Implement circuit‑breaker, bulkhead, and fallback patterns at the API‑Gateway / Order Service.
  - Prevents cascading failures when Inventory or Payment services are unhealthy.
- Expose real‑time order status via a dedicated WebSocket/Server‑Sent‑Events channel.
  - Push updates from the Notification Service (or a new Real‑Time Update Service) directly to the UI.
- Introduce observability (distributed tracing, metrics, health‑checks) and resilience testing (chaos engineering).
Service Decomposition & Responsibility Table
| Service | Current Responsibility | Proposed Responsibility | New Patterns |
|---|---|---|---|
| User Service | Account management, authentication | Same + JWT issuance for API‑Gateway | OAuth2/JWT, rate‑limiting |
| Product Catalog Service | Product CRUD, basic inventory view | Read‑only product data (descriptions, pricing) | Cached reads, read‑replicas |
| Inventory Service (new) | – | Stock levels, reservation logic, release compensation | Redis cache, circuit‑breaker fallback, idempotent reserve command |
| Order Service | Order creation, payment call, basic status | Saga orchestrator: reserve inventory → call payment → emit events | Saga orchestration, retry with exponential back‑off, timeout |
| Payment Service (external or internal) | Payment gateway integration | Same, but expose result as event (PaymentSucceeded/PaymentFailed) | Idempotent webhook handling, circuit‑breaker |
| Notification Service | Email/SMS via RabbitMQ | Publish order‑status events, retry with DLQ, fallback to email‑only | Dead‑Letter Queue, retry policy, bulk‑send |
| Real‑Time Update Service (new) | – | WebSocket server broadcasting OrderStatusChanged events | Persistent connections, back‑pressure handling |
| API Gateway | Routing, auth | Central entry point, rate‑limiting, circuit‑breaker per downstream service | Envoy / Istio, JWT validation, request throttling |
| Event Bus | RabbitMQ topic for notifications | Kafka topics (OrderCreated, InventoryReserved, PaymentResult, OrderCompleted) | Effectively‑once processing via idempotent consumers, schema registry |
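The Inventory Service's idempotent reserve command can be illustrated with a minimal in-memory Python sketch. It assumes a hypothetical `InventoryStore` class and a caller-supplied `reservation_id` (for example, derived from the order ID); a production service would persist stock and reservation records in its database so that both change atomically.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class InventoryStore:
    """Hypothetical in-memory stand-in for the Inventory Service's database."""

    stock: Dict[str, int]
    # reservation_id -> (sku, quantity); used to dedupe retried commands
    reservations: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    # reservation IDs that have already been compensated
    released: Set[str] = field(default_factory=set)

    def reserve(self, reservation_id: str, sku: str, quantity: int) -> bool:
        """Hold stock at most once per reservation_id, so retried commands are harmless."""
        if reservation_id in self.released:
            return False  # a late duplicate must not re-hold stock after compensation
        if reservation_id in self.reservations:
            return True  # already applied: acknowledge without reserving again
        if self.stock.get(sku, 0) < quantity:
            return False  # insufficient stock: nothing is recorded
        self.stock[sku] -= quantity
        self.reservations[reservation_id] = (sku, quantity)
        return True

    def release(self, reservation_id: str) -> None:
        """Compensating action (ReleaseInventory); safe to call more than once."""
        self.released.add(reservation_id)
        entry = self.reservations.pop(reservation_id, None)
        if entry is not None:
            sku, quantity = entry
            self.stock[sku] += quantity
```

Tracking released IDs means a late duplicate reserve command that arrives after compensation cannot silently re-hold stock; whether such a command should be rejected or accepted is a design choice for your team.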
Communication Patterns Table
| Source | Target | Pattern | Message Type | Reliability |
|---|---|---|---|---|
| User Service | API Gateway | Synchronous HTTPS (REST) | API request | Circuit‑breaker at GW |
| API Gateway | Order Service | Synchronous HTTPS (REST) | OrderCreateRequest | Timeout + retry |
| Order Service | Inventory Service | Asynchronous command (Kafka topic reserve-inventory) | ReserveInventoryCommand | At‑least‑once, idempotent consumer |
| Inventory Service | Order Service | Asynchronous event (Kafka topic inventory-reserved) | InventoryReservedEvent | Idempotent handler |
| Order Service | Payment Service | Synchronous HTTPS (REST) | PaymentRequest | Circuit‑breaker, fallback to “payment‑pending” |
| Payment Service | Order Service | Asynchronous event (Kafka topic payment-result) | PaymentResultEvent (succeeded/failed) | DLQ + retry |
| Order Service | Notification Service | Asynchronous event (Kafka topic order-completed) | OrderCompletedEvent | Retry + DLQ |
| Notification Service | Real‑Time Update Service | Asynchronous event (Kafka topic order-status-changed) | OrderStatusChangedEvent | Fan‑out to WebSocket clients |
| Real‑Time Update Service | UI (WebSocket) | WebSocket push | OrderStatusUpdate | Persistent connection, back‑pressure |
| API Gateway | Monitoring Stack | Metrics & health checks | Prometheus scrape / OpenTelemetry | Reliable (internal) |
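The "Retry + DLQ" reliability entries describe a consumer that retries a message a bounded number of times and then parks it for inspection. The Python sketch below shows that loop with exponential back-off and jitter; `consume_with_retry`, `backoff_delay`, and the plain list standing in for a dead-letter queue are hypothetical illustrations, not Kafka or RabbitMQ client APIs.

```python
import random
import time
from typing import Callable, List


def backoff_delay(attempt: int, base: float = 0.1, jitter: float = 0.2) -> float:
    """Exponential back-off (base * 2**attempt seconds) plus up to `jitter` seconds of random jitter."""
    return base * (2 ** attempt) + random.uniform(0, jitter)


def consume_with_retry(
    message: dict,
    handler: Callable[[dict], None],
    dead_letters: List[dict],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run handler up to max_attempts times; on final failure append the message to dead_letters."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception as exc:  # a real consumer would retry only transient errors
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))  # no sleep after the final attempt
    # All attempts failed: park the message with its last error for manual inspection.
    dead_letters.append({"message": message, "error": repr(last_error)})
    return False
```

Injecting `sleep` keeps the function testable; jitter spreads retries from many consumers so they do not hit a recovering service in lockstep.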
Resilience Patterns per Service
| Service | Pattern | Implementation | Fallback |
|---|---|---|---|
| Order Service | Circuit‑breaker for Inventory & Payment calls | Resilience4j / Hystrix, timeout = 2 s, failure‑ratio = 0.3 | Use cached inventory view + “limited stock” mode |
| Order Service | Bulkhead | Separate thread‑pool per downstream call | Graceful degradation – reject new orders if pool exhausted |
| Order Service | Retry with exponential back‑off | 3 attempts, jitter = 200 ms | Fail fast & emit compensating event |
| Inventory Service | Cache‑aside pattern (Redis) | Store current stock, invalidate on reservation | Serve from cache with “stale‑while‑revalidate” |
| Inventory Service | Read‑replica for catalog reads | MongoDB secondary | Continue serving product data even if primary is down |
| Inventory Service | Idempotent reserve command | Use reservation‑id to dedupe | Safe retry without double‑reservation |
| Notification Service | Dead‑Letter Queue (RabbitMQ) | Move failed messages to DLQ after 5 retries | Manual inspection / alternate email provider |
| Notification Service | Bulk‑send with rate‑limiting | Batch emails per minute | Switch to email‑only if SMS provider fails |
| API Gateway | Rate‑limiting & throttling | Envoy rate‑limit filter (RPS per client) | Return HTTP 429, UI shows “try again later” |
| API Gateway | Circuit‑breaker per downstream | Envoy outlier detection | Return cached fallback response (e.g., “service unavailable”) |
| Event Bus | Idempotent consumers | Track processed event IDs in DB | Guarantees no duplicate state changes |
| Event Bus | Schema registry | Confluent Schema Registry | Prevent breaking changes across services |
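A count-based circuit breaker using the 0.3 failure ratio from the Order Service row might look like the following Python sketch. The `CircuitBreaker` class, its sliding window of 10 calls, and the 5-second open period are assumptions for illustration; it is not the Resilience4j API and it is not thread-safe. The 2 s timeout from the table would be configured on the HTTP client that the wrapped function uses.

```python
import time
from collections import deque
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without invoking the downstream service."""


class CircuitBreaker:
    """Count-based breaker: closed -> open -> half-open (hypothetical sketch, not thread-safe)."""

    def __init__(self, failure_ratio=0.3, window_size=10, open_seconds=5.0, clock=time.monotonic):
        self.failure_ratio = failure_ratio  # open when this share of recent calls failed
        self.open_seconds = open_seconds    # how long to reject calls before a trial call
        self.clock = clock
        self.outcomes = deque(maxlen=window_size)  # True = failure; oldest evicted first
        self.opened_at = None                      # None while the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.open_seconds:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise CircuitOpenError("circuit open: use the fallback path")
        try:
            result = fn()
        except Exception:
            self._record(failed=True, trial=state == "half-open")
            raise
        self._record(failed=False, trial=state == "half-open")
        return result

    def _record(self, failed: bool, trial: bool) -> None:
        if trial:
            # The half-open trial call decides whether to close or re-open the circuit.
            self.outcomes.clear()
            self.opened_at = self.clock() if failed else None
            return
        self.outcomes.append(failed)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if window_full and sum(self.outcomes) / len(self.outcomes) >= self.failure_ratio:
            self.opened_at = self.clock()
            self.outcomes.clear()
```

When `call` raises `CircuitOpenError`, the Order Service would switch to its fallback, such as the cached inventory view, instead of waiting on an unhealthy dependency.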
Real‑Time Order Status Updates
- WebSocket channel hosted by the Real‑Time Update Service.
- Upon receiving an `OrderStatusChangedEvent` from the Notification Service, the service pushes a minimal JSON payload to all subscribed UI connections.
- UI subscribes per order‑id (or via user‑id for all active orders) using a secure WebSocket (`wss://`).
- Back‑pressure: the service buffers up to 100 messages per client; once the buffer is full, the oldest messages are dropped.
- Fallback: if the WebSocket disconnects, the UI falls back to polling the Order Service every 5 s (graceful degradation).
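The back-pressure rule (keep at most 100 undelivered messages per client and drop the oldest) maps directly onto a bounded deque. This Python sketch assumes a hypothetical `ClientBuffer` held per WebSocket connection; the socket I/O itself is omitted.

```python
from collections import deque
from typing import List


class ClientBuffer:
    """Per-client outbound queue keeping at most `capacity` undelivered updates (hypothetical)."""

    def __init__(self, capacity: int = 100) -> None:
        self.queue = deque(maxlen=capacity)  # appending to a full deque evicts the oldest item
        self.dropped = 0                     # count of evicted updates, useful as a metric

    def push(self, update: dict) -> None:
        if len(self.queue) == self.queue.maxlen:
            self.dropped += 1  # the oldest update is about to be evicted
        self.queue.append(update)

    def drain(self, max_items: int) -> List[dict]:
        """Return up to max_items updates in arrival order, e.g. when the socket is writable."""
        items = []
        while self.queue and len(items) < max_items:
            items.append(self.queue.popleft())
        return items
```

Dropping the oldest entries suits order-status updates because the newest status supersedes earlier ones; exporting `dropped` as a metric shows when clients cannot keep up.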
Event‑Driven Flow Diagram
```mermaid
flowchart LR
    A[User] -->|POST /orders| B(Order Service)
    B -->|Reserve Inventory| C[Inventory Service]
    C -->|InventoryReserved| B
    B -->|Call Payment Gateway| D[Payment Service]
    D -->|PaymentSucceeded| B
    B -->|Publish OrderCompleted| E[Notification Service]
    E -->|Send Email/SMS| F[User]
    B -->|Publish OrderStatusChanged| G[Real‑Time Update Service]
    G -->|Push to UI| F
```
Failure paths (not drawn for brevity):
- If C cannot reserve, B publishes InventoryNotAvailable and the UI shows “Out of stock”.
- If D fails, B triggers a compensating ReleaseInventory event and publishes OrderCancelled.
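The compensation logic in these failure paths can be sketched as a small orchestrator that runs steps in order and, on the first failure, undoes the completed steps in reverse. This synchronous Python version is a simplification under stated assumptions: in the proposed design each step would be a Kafka command/event exchange, and `SagaStep` and `OrderSaga` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], bool]      # returns True if the step succeeded
    compensate: Callable[[dict], None]  # undoes the action after a later failure


@dataclass
class OrderSaga:
    steps: List[SagaStep]
    log: List[str] = field(default_factory=list)

    def run(self, order: dict) -> str:
        completed = []
        for step in self.steps:
            if step.action(order):
                completed.append(step)
                self.log.append(f"{step.name}: ok")
                continue
            self.log.append(f"{step.name}: failed")
            # Undo already-completed steps in reverse order (e.g., ReleaseInventory).
            for done in reversed(completed):
                done.compensate(order)
                self.log.append(f"{done.name}: compensated")
            return "OrderCancelled"
        return "OrderCompleted"
```

For example, a saga built from a ReserveInventory step (compensated by releasing stock) and a ChargePayment step returns "OrderCancelled" and restores the stock when the payment step fails.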
Order Status State Diagram
```mermaid
stateDiagram-v2
    [*] --> OrderCreated
    OrderCreated --> InventoryReserved
    InventoryReserved --> PaymentPending
    PaymentPending --> PaymentSucceeded
    PaymentSucceeded --> OrderCompleted
    PaymentPending --> PaymentFailed
    PaymentFailed --> InventoryReleased
    InventoryReleased --> OrderCancelled
    OrderCompleted --> [*]
    OrderCancelled --> [*]
```
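The state diagram can also be enforced in code so that the Order Service rejects out-of-order or duplicate status events. A minimal Python sketch, with a hypothetical `ALLOWED_TRANSITIONS` table copied from the diagram:

```python
from typing import Dict, Set

# Each state maps to the states it may move to next (taken from the diagram above).
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "OrderCreated": {"InventoryReserved"},
    "InventoryReserved": {"PaymentPending"},
    "PaymentPending": {"PaymentSucceeded", "PaymentFailed"},
    "PaymentSucceeded": {"OrderCompleted"},
    "PaymentFailed": {"InventoryReleased"},
    "InventoryReleased": {"OrderCancelled"},
    "OrderCompleted": set(),  # terminal
    "OrderCancelled": set(),  # terminal
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise if the diagram does not allow the move."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

Keeping the table as data makes it easy to extend, for example with an OrderCreated to OrderCancelled edge for the out-of-stock path.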
Implementation Roadmap (Phased Approach)
- Phase 1 – Observability & Baseline
  - Deploy Prometheus + Grafana, OpenTelemetry tracing.
  - Add health‑check endpoints and circuit‑breaker metrics.
  - Instrument current services with logging and distributed tracing.
- Phase 2 – Inventory Service & Cache
  - Extract inventory logic into a new microservice.
  - Add Redis cache for stock levels and implement an idempotent `reserve` command.
  - Introduce read‑replica for product catalog reads.
- Phase 3 – Event Bus & Saga Orchestration
  - Provision Kafka cluster (or upgrade RabbitMQ topics).
  - Redesign Order Service to emit/consume events (`reserve-inventory`, `payment-result`).
  - Implement saga state machine (orchestrator) with compensation on failure.
- Phase 4 – Resilience at the Edge
  - Configure API‑Gateway (Envoy/Istio) with circuit‑breakers, rate‑limiting, and retries.
  - Add bulkhead isolation per downstream call in Order Service.
  - Enable fallback to cached inventory when Inventory Service is down.
- Phase 5 – Real‑Time Updates
  - Build Real‑Time Update Service (Node.js + Socket.IO or Go + WS).
  - Wire it to receive `order-status-changed` events from Notification Service.
  - Update front‑end to subscribe via secure WebSocket and handle graceful fallback.
- Phase 6 – Chaos & Performance Testing
  - Run chaos experiments (netem latency, service outage, Kafka partition loss).
  - Validate circuit‑breaker trips, retry policies, and compensation actions.
  - Tune timeouts, back‑off, and cache TTL based on test results.
- Phase 7 – Monitoring & Alerting Refinement
  - Create alerts for circuit‑breaker trips, DLQ growth, and WebSocket disconnects.
  - Add dashboards for order latency, inventory reservation success rate, and notification delivery.
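Phases 2 and 4 rely on a cache of stock levels with a fallback when the Inventory Service's data source is unavailable. The Python sketch below uses a plain dict in place of Redis and a hypothetical `StockCache` class. It implements a serve-stale-on-error fallback, which is a simpler variant of the "stale-while-revalidate" behavior listed in the resilience table (true stale-while-revalidate would also refresh entries in the background).

```python
import time
from typing import Callable, Dict, Tuple


class StockCache:
    """Cache-aside for stock levels; a dict stands in for Redis (hypothetical sketch)."""

    def __init__(self, loader: Callable[[str], int], ttl_seconds: float = 5.0, clock=time.monotonic):
        self.loader = loader  # reads authoritative stock from the Inventory DB
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[int, float]] = {}  # sku -> (stock, cached_at)

    def get(self, sku: str) -> Tuple[int, bool]:
        """Return (stock, is_stale)."""
        now = self.clock()
        cached = self.entries.get(sku)
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0], False  # fresh hit: no call to the Inventory DB
        try:
            stock = self.loader(sku)  # miss or expired: read through and repopulate
        except Exception:
            if cached is not None:
                return cached[0], True  # degraded mode: last known value, flagged stale
            raise
        self.entries[sku] = (stock, now)
        return stock, False

    def invalidate(self, sku: str) -> None:
        """Drop the entry after a reservation changes stock so the next read reloads it."""
        self.entries.pop(sku, None)
```

Returning an `is_stale` flag lets the Order Service decide whether to accept an order in "limited stock" mode or ask the user to retry.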
Additional Considerations
- Security – Use mTLS between services (service‑mesh) and JWT issued by User Service for API‑Gateway validation.
- Schema Evolution – Register Avro/Protobuf schemas in a schema registry; enforce backward compatibility.
- Data Consistency – Leverage event sourcing for Order and Inventory if audit trails are required; otherwise, rely on saga compensation.
- Testing Strategy –
  - Contract tests for each service’s API (Pact).
  - End‑to‑end integration tests using testcontainers (Kafka, Redis, DBs).
  - Load testing with Locust/Gatling to verify circuit‑breaker thresholds.
- Deployment – Kubernetes Deployments with Helm charts; use Helm values to toggle features (e.g., enable/disable circuit‑breaker).
- Compliance – Ensure any personal data sent via notifications complies with GDPR/CCPA (prefer email over SMS for sensitive info).
- Cost – Evaluate whether Kafka adds operational overhead vs. scaling RabbitMQ; consider managed offerings (Aiven, Confluent Cloud) if self‑hosted is a concern.
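If you adopt event sourcing for Order, current state is derived by replaying stored events rather than read from a mutable row. A minimal Python sketch, assuming hypothetical event dicts whose `type` values match the state names in the order status diagram:

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class OrderState:
    order_id: str
    status: str = "Unknown"
    reserved: Dict[str, int] = field(default_factory=dict)  # sku -> quantity currently held
    version: int = 0  # number of events applied; usable for optimistic concurrency checks


def apply_event(state: OrderState, event: dict) -> None:
    """Apply one stored event to the state in place; stored events themselves are never edited."""
    kind = event["type"]
    if kind == "InventoryReserved":
        state.reserved[event["sku"]] = event["qty"]
    elif kind == "InventoryReleased":
        state.reserved.pop(event["sku"], None)
    state.status = kind  # event types double as order statuses in this sketch
    state.version += 1


def replay(order_id: str, events: Iterable[dict]) -> OrderState:
    """Rebuild one order's state from the full event log."""
    state = OrderState(order_id)
    for event in events:
        if event["orderId"] == order_id:
            apply_event(state, event)
    return state
```

The same log also serves as the audit trail mentioned under Data Consistency, since every change to an order is recorded as an immutable event.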
Next Steps
- Review the clarifying questions above and provide any constraints or preferences.
- Decide on the event‑bus technology (Kafka vs. scaled RabbitMQ).
- Choose a service‑mesh or resilience library for the API‑Gateway.
- Prioritize the phases based on your current pain points (e.g., start with Inventory Service if timeouts are critical).
With this foundation, the platform will be able to handle peak traffic, recover gracefully from failures, and give users live visibility into their orders.
API Request
```python
import boto3
import json

# Initialize the Bedrock Runtime client
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')

# Define the system prompt
system_prompt = """You are an expert microservices architect who specializes in distributed system design and implementation.
Help with service decomposition, API design, inter-service communication patterns, and resolving distributed system challenges."""

# Define the user prompt
user_prompt = """## Current Architecture ##
We have an e-commerce platform with these services:
**User Service** (Node.js/Express)
- Manages user accounts, authentication
- Database: PostgreSQL
- Endpoints: /users, /auth/login, /auth/register
**Product Service** (Java/Spring Boot)
- Product catalog, inventory management
- Database: MongoDB
- Endpoints: /products, /inventory
**Order Service** (Python/FastAPI)
- Order processing, payment integration
- Database: PostgreSQL
- Endpoints: /orders, /payments
**Notification Service** (Node.js)
- Email/SMS notifications
- Message Queue: RabbitMQ
Current flow: User → Order Service → Product Service (inventory check) → Payment Gateway → Notification Service
## Challenge ##
We're experiencing several issues:
1. Order Service frequently times out when checking inventory during high traffic
2. Failed payments leave inventory in inconsistent state
3. Users don't get real-time updates on order status
4. System becomes unavailable when Product Service goes down
5. Notification delivery is unreliable during peak hours
We need to redesign this to be more resilient, scalable, and provide better user experience. Consider patterns like circuit breakers, event sourcing, and real-time updates.
Ask clarifying questions if needed."""

# Prepare the request
request_body = {
    "system": [
        {
            "text": system_prompt
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": user_prompt
                }
            ]
        }
    ],
    "toolConfig": {
        "tools": [
            {
                "systemTool": {
                    "name": "nova_grounding"
                }
            },
            {
                "systemTool": {
                    "name": "nova_code_interpreter"
                }
            }
        ]
    },
    "additionalModelRequestFields": {
        "reasoningConfig": {
            "type": "enabled",
            "maxReasoningEffort": "high"
        }
    }
}

# Make the API call
response = bedrock_runtime.converse(
    modelId="amazon.nova-2-lite-v1:0",
    **request_body
)

# Print the response
print(json.dumps(response, indent=2, default=str))
```
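Printing the full response dumps metadata along with the answer. To show only the model's text, you can collect the `text` blocks from `response["output"]["message"]["content"]`, which is where the Converse API returns assistant content. The helper names below are hypothetical, and the sketch simply skips non-text blocks such as reasoning or tool-related content.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Join the text blocks of a Converse response, skipping reasoning and tool blocks."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    return "\n".join(block["text"] for block in content if "text" in block)


def summarize_usage(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the stop reason and token counts reported alongside the message."""
    usage = response.get("usage", {})
    return {
        "stopReason": response.get("stopReason"),
        "inputTokens": usage.get("inputTokens"),
        "outputTokens": usage.get("outputTokens"),
    }
```

For example, calling `print(extract_text(response))` after the `converse` call prints just the architect's answer.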
```bash
aws bedrock-runtime converse \
  --model-id amazon.nova-2-lite-v1:0 \
  --system "[{\"text\":\"You are an expert microservices architect who specializes in distributed system design and implementation. Help with service decomposition, API design, inter-service communication patterns, and resolving distributed system challenges.\"}]" \
  --messages "[{\"role\":\"user\",\"content\":[{\"text\":\"## Current Architecture ##\nWe have an e-commerce platform with these services:\n\n**User Service** (Node.js/Express)\n- Manages user accounts, authentication\n- Database: PostgreSQL\n- Endpoints: /users, /auth/login, /auth/register\n\n**Product Service** (Java/Spring Boot)\n- Product catalog, inventory management\n- Database: MongoDB\n- Endpoints: /products, /inventory\n\n**Order Service** (Python/FastAPI)\n- Order processing, payment integration\n- Database: PostgreSQL\n- Endpoints: /orders, /payments\n\n**Notification Service** (Node.js)\n- Email/SMS notifications\n- Message Queue: RabbitMQ\n\nCurrent flow: User → Order Service → Product Service (inventory check) → Payment Gateway → Notification Service\n\n## Challenge ##\nWe're experiencing several issues:\n1. Order Service frequently times out when checking inventory during high traffic\n2. Failed payments leave inventory in inconsistent state\n3. Users don't get real-time updates on order status\n4. System becomes unavailable when Product Service goes down\n5. Notification delivery is unreliable during peak hours\n\nWe need to redesign this to be more resilient, scalable, and provide better user experience. Consider patterns like circuit breakers, event sourcing, CQRS, saga patterns, and real-time updates.\n\nAsk clarifying questions if needed.\"}]}]" \
  --additional-model-request-fields "{\"reasoningConfig\":{\"type\":\"enabled\",\"maxReasoningEffort\":\"medium\"}}" \
  --inference-config "{\"temperature\":0.3,\"topP\":0.9,\"maxTokens\":4000}" \
  --region us-west-2
```
```json
{
  "system": [
    {
      "text": "You are an expert microservices architect who specializes in distributed system design and implementation.\n Help with service decomposition, API design, inter-service communication patterns, and resolving distributed system challenges.\n \n **Important formatting requirements:**\n - Use proper markdown tables with | separators and header rows\n - Tables must NOT be indented - start at column 1\n - Always add blank line before and after tables\n - For mermaid diagrams, use correct syntax:\n - State diagrams: use `: label` not `[-- label]`\n - Flow charts: use proper node syntax\n - No `--` separators in mermaid code blocks\n - Use bullet points and numbered lists for clarity"
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "## Current Architecture ##\n We have an e-commerce platform with these services:\n \n **User Service** (Node.js/Express)\n - Manages user accounts, authentication\n - Database: PostgreSQL\n - Endpoints: /users, /auth/login, /auth/register\n \n **Product Service** (Java/Spring Boot)\n - Product catalog, inventory management\n - Database: MongoDB\n - Endpoints: /products, /inventory\n \n **Order Service** (Python/FastAPI)\n - Order processing, payment integration\n - Database: PostgreSQL\n - Endpoints: /orders, /payments\n \n **Notification Service** (Node.js)\n - Email/SMS notifications\n - Message Queue: RabbitMQ\n \n Current flow: User \u2192 Order Service \u2192 Product Service (inventory check) \u2192 Payment Gateway \u2192 Notification Service\n \n ## Challenge ##\n We're experiencing several issues:\n 1. Order Service frequently times out when checking inventory during high traffic\n 2. Failed payments leave inventory in inconsistent state\n 3. Users don't get real-time updates on order status\n 4. System becomes unavailable when Product Service goes down\n 5. Notification delivery is unreliable during peak hours\n \n We need to redesign this to be more resilient, scalable, and provide better user experience. Consider patterns like circuit breakers, event sourcing, and real-time updates.\n \n Ask clarifying questions if needed."
        }
      ]
    }
  ],
  "additionalModelRequestFields": {
    "reasoningConfig": {
      "type": "enabled",
      "maxReasoningEffort": "high"
    }
  }
}
```