Engineering•March 29, 2025•By Atul Kumar
Request Deduplication & Processing System
LMS, Fenix, Stability, NonFunctional
Technical Requirements Document
Fenix
1. Problem Statement
1.1 Current Challenges
- Duplicate Requests: Clients may submit the same request multiple times because of retries or network issues. The resulting redundant processing exposes the lender system to financial risk.
- No handling for idempotency:
- No centralized system to guarantee idempotent handling.
- No support for externalRequestId or requestorRequestId in the Fenix system.
- The system is able to prevent the impact of a retry, but there is no recorded visibility of it.
- Missing client retry policy:
- Clients lack visibility into how the system handles retries.
1.2 Goals
- Idempotency: Ensure each request is processed exactly once.
- Consistency: Maintain a single source of truth (DynamoDB) for the idempotency check.
- Resilience: Handle failures via DLQ + alerts.
- Client UX: Give clients clear visibility by returning explicit responses about request state and retries, and support externalRequestId to guarantee consistency and idempotency.
2. Solution Overview
2.1 How We Solve It
- Duplicate Requests: DynamoDB as the source of truth; reject duplicates based on status.
- Race Conditions: Rely on DynamoDB’s strongly consistent reads.
- Async Processing: Use SQS + Lambda for asynchronous updates.
- Failure Handling: Implement DLQ for dead messages and CloudWatch alarms for monitoring.
- Client Retries: Return HTTP 400/500 based on the idempotency check, with an errorCode and message, and send webhook updates for async results (including externalRequestId).
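The duplicate and race-condition handling above can be sketched as a single conditional write. The source names strongly consistent reads as the mechanism; the conditional PutItem shown here is a commonly used complement and is an assumption, not something the source specifies. It is included because a read followed by a separate write leaves a window in which two concurrent requests can both pass the check. The function name `build_register_request` and the choice of `pending` as the initial status are also assumptions. The function only builds the call parameters, so it runs without AWS access.

```python
from datetime import datetime, timezone


def build_register_request(request_id: str, external_request_id: str) -> dict:
    """Build keyword arguments for a DynamoDB PutItem call (low-level client format).

    The ConditionExpression makes the write fail if an item with the same
    requestId already exists, so only one of two concurrent callers succeeds.
    """
    now = datetime.now(timezone.utc).isoformat()
    return {
        "TableName": "fenix_requests",
        "Item": {
            "requestId": {"S": request_id},
            "status": {"S": "pending"},  # assumed initial status
            "externalRequestId": {"S": external_request_id},
            "createdAt": {"S": now},
            "updatedAt": {"S": now},
        },
        # Reject the write when the key is already present (duplicate request).
        "ConditionExpression": "attribute_not_exists(requestId)",
    }
```

With boto3 these parameters would be passed as `client.put_item(**build_register_request(...))`. A failed condition surfaces as an error with code `ConditionalCheckFailedException`, which the caller can treat as "duplicate".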
2.2 Key Flows
- Request Submission:
- OrchestrationService checks Idempotency → responds immediately or forwards to CTMS.
- Request Processing:
- Processes the request and handles DDB updates for status (pending → success/failed) and fenixTxnId against externalRequestId.
- Async Updates:
- SQS + Lambda ensures eventual consistency (e.g., externalRequestId updates).
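The request submission flow can be sketched as a small decision function: given the status found in DDB (or none), either forward the request or respond immediately. The source only says that HTTP 400/500 is returned with an errorCode and message; the specific mapping, the error code strings, and the choice to forward a previously failed request again are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    forward: bool                # True: hand the request to the processing service
    http_status: Optional[int]   # set only when responding immediately
    error_code: Optional[str]    # hypothetical error code strings
    message: str


def decide(existing_status: Optional[str]) -> Decision:
    """Map the status stored in DDB (None if no record exists) to an action."""
    if existing_status is None:
        return Decision(True, None, None, "New request; forwarding for processing")
    if existing_status in ("pending", "processing"):
        return Decision(False, 400, "REQUEST_IN_PROGRESS", "Request is already being processed")
    if existing_status == "success":
        return Decision(False, 400, "DUPLICATE_REQUEST", "Request was already processed")
    if existing_status == "failed":
        # Assumption: a failed request may be retried, so it is forwarded again.
        return Decision(True, None, None, "Previous attempt failed; forwarding retry")
    # Unknown status: treat as a server-side error.
    return Decision(False, 500, "UNKNOWN_REQUEST_STATE", f"Unexpected status: {existing_status}")
```

Keeping the mapping in one pure function makes the retry policy easy to document for clients and to unit test independently of DynamoDB.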
3. Technical Design
3.1 Components
- OrchestrationService
- Technology: ECS
- Purpose: Validates idempotency and, based on the status received from DDB, either responds to the client or routes the request to the End service (the service that processes the request).
- End service (the service that processes the request)
- Technology: ECS
- Purpose: Processes requests and updates request status in DDB.
- DynamoDB
- Technology: AWS DynamoDB
- Purpose: Primary store for requestId, status, fenixTxnId, and externalRequestId. Source of truth for the idempotency check.
- SQS
- Technology: AWS SQS
- Purpose: Handles async updates, including externalRequestId updates.
- Lambda
- Technology: AWS Lambda
- Purpose: Consumes messages from SQS and handles writes to DDB.
- DLQ
- Technology: AWS SQS DLQ
- Purpose: Captures failed messages and triggers alerts.
- Monitoring
- Technology: CloudWatch
- Purpose: Tracks DDB throttles, DLQ depth, and Lambda errors.
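As a minimal sketch of how the SQS, Lambda, and DLQ components could fit together, the handler below processes each SQS record and reports only the failed ones back to SQS. The `write_update` callable is a hypothetical stand-in for the DDB write and is injected so the sketch runs without AWS; the message body format (JSON) is likewise an assumption.

```python
import json
from typing import Callable


def make_handler(write_update: Callable[[dict], None]):
    """Return a Lambda handler that applies each SQS message via write_update.

    write_update is a hypothetical function that performs the DDB write and
    raises on failure.
    """
    def handler(event: dict, context: object = None) -> dict:
        failures = []
        for record in event.get("Records", []):
            try:
                write_update(json.loads(record["body"]))
            except Exception:
                # Report only this message as failed so SQS redelivers it;
                # once the queue's maxReceiveCount is exceeded it moves to the DLQ.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Returning `batchItemFailures` only takes effect when the event source mapping has `ReportBatchItemFailures` enabled; without it, any raised exception causes the whole batch to be retried.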
3.2 Data Model (DynamoDB)
Table: fenix_requests
Primary Key: requestId (String)
Attributes:
- status (String): pending / processing / success / failed
- fenixTxnId (String): Null until the End service has successfully accepted the request for processing (after all validation and before starting workflow execution).
- externalRequestId (String): Client-provided ID.
- createdAt (ISO-8601): Initial request time.
- updatedAt (ISO-8601): Last status change.
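The data model above could be represented in application code roughly as follows. This is an illustrative sketch: the class name `RequestRecord`, the `set_status` helper, and the default initial status of `pending` are assumptions, while the field names and allowed status values come from the attribute list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

STATUSES = ("pending", "processing", "success", "failed")


def _now() -> str:
    """Current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class RequestRecord:
    requestId: str                      # primary key
    externalRequestId: str              # client-provided ID
    status: str = "pending"             # assumed initial status
    fenixTxnId: Optional[str] = None    # null until the End service accepts the request
    createdAt: str = field(default_factory=_now)
    updatedAt: str = field(default_factory=_now)

    def set_status(self, new_status: str) -> None:
        """Change the status and refresh updatedAt; reject unknown values."""
        if new_status not in STATUSES:
            raise ValueError(f"unknown status: {new_status}")
        self.status = new_status
        self.updatedAt = _now()
```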
4. Workflow and approach document link
5. Metrics for Success
5.1 Key Metrics
- Request Deduplication Rate
- Target: 100%
- Measurement: % of duplicate externalRequestIds rejected
- DDB Write Latency
- Target: <100ms (p99)
- Measurement: CloudWatch metrics
- SQS-DLQ Depth
- Target: 0 (alert if >5)
- Measurement: Automated alerts via SNS
- End-to-End Latency
- Target: <2s (p95)
- Measurement: From client POST to response
- Webhook Delivery Rate
- Target: 99.9% success
- Measurement: Tracked via logging
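To make the targets above concrete, the sketch below shows one way the deduplication rate and a latency percentile could be computed from raw counts and samples. The function names are hypothetical, and the nearest-rank percentile method is an assumption; CloudWatch computes its own percentile statistics and may differ slightly.

```python
import math
from typing import Sequence


def deduplication_rate(duplicates_received: int, duplicates_rejected: int) -> float:
    """Percentage of duplicate externalRequestIds that were rejected."""
    if duplicates_received == 0:
        return 100.0  # no duplicates arrived, so none slipped through
    return 100.0 * duplicates_rejected / duplicates_received


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of a non-empty sample."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]
```

For example, if 40 duplicates arrive and 38 are rejected, the rate is 95%, which misses the 100% target.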
5.2 Monitoring Setup
- CloudWatch Alarms:
- DDB Throttles: Threshold > 10/min.
- DLQ Depth: Threshold > 5 messages.
- Lambda Errors: Error rate > 1%.
- Dashboards:
- Real-time Request Status: Counts of pending/success/failed.
- SQS Backlog: Messages awaiting processing.
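Two of the alarms listed above can be sketched as parameter sets for the CloudWatch PutMetricAlarm API. The function names, alarm names, 60-second period, and single evaluation period are assumptions. The source does not say whether "DDB Throttles" refers to reads or writes; this sketch assumes write throttles. The Lambda error-rate alarm is omitted because a percentage threshold needs metric math over the Errors and Invocations metrics.

```python
def dlq_depth_alarm(queue_name: str, sns_topic_arn: str) -> dict:
    """Keyword arguments for PutMetricAlarm: DLQ depth above 5 messages."""
    return {
        "AlarmName": f"{queue_name}-depth",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,  # seconds; assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 5,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify via SNS
    }


def ddb_throttle_alarm(table_name: str, sns_topic_arn: str) -> dict:
    """Keyword arguments for PutMetricAlarm: more than 10 write throttles per minute."""
    return {
        "AlarmName": f"{table_name}-write-throttles",  # hypothetical naming scheme
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",  # assumes write throttles are meant
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 10,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

With boto3 each dictionary would be passed as `cloudwatch.put_metric_alarm(**params)`.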
6. Conclusion
This design ensures idempotent, consistent, and resilient request processing with:
- DynamoDB as the source of truth.
- SQS + Lambda for async workflows.
- DLQ + Alerts for failure recovery.