Engineering · March 29, 2025 · By Atul Kumar

Request Deduplication & Processing System

LMS · Fenix · Stability · NonFunctional

Technical Requirements Document

Fenix


1. Problem Statement

1.1 Current Challenges

  1. Duplicate Requests: Clients may submit the same request multiple times due to retries or network issues. This causes redundant processing, which creates financial risk for the lender system.
  2. No handling for idempotency:
    • No centralized system to guarantee idempotent handling.
    • No support for externalRequestId or requestorRequestId in the Fenix system.
    • The system can prevent the impact of a retry, but it keeps no record of the retry, so there is no visibility into it.
  3. Missing client retry policy:
    • Clients lack visibility into how the system handles retries.

1.2 Goals

  • Idempotency: Ensure each request is processed exactly once.
  • Consistency: Maintain a single source of truth (DynamoDB) for the idempotency check.
  • Resilience: Handle failures via DLQ + alerts.
  • Client UX: Provide clear responses about request state and retries, and support externalRequestId to guarantee consistency and idempotency, so that clients have clear visibility.

2. Solution Overview

2.1 How We Solve It

  • Duplicate Requests: DynamoDB as the source of truth; reject duplicates based on status.
  • Race Conditions: Rely on DynamoDB’s strongly consistent reads.
  • Async Processing: Use SQS + Lambda for asynchronous updates.
  • Failure Handling: Implement DLQ for dead messages and CloudWatch alarms for monitoring.
  • Client Retries: Return HTTP 400/500 based on the idempotency check, with an errorCode and message, and send webhook updates for async results (including externalRequestId).
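
One common way to reject duplicates atomically in DynamoDB is a conditional write that only succeeds when the key does not exist yet. The sketch below is an assumption, not the design's stated mechanism (the document itself names strongly consistent reads). It only builds the arguments for a PutItem call, using the table and attribute names from the data model in section 3.2; the helper name build_claim_request is hypothetical.

```python
from datetime import datetime, timezone

TABLE_NAME = "fenix_requests"  # table name taken from the data model in this document


def build_claim_request(request_id: str, external_request_id: str) -> dict:
    """Build PutItem arguments that only succeed if requestId is not stored yet."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "TableName": TABLE_NAME,
        "Item": {
            "requestId": {"S": request_id},
            "externalRequestId": {"S": external_request_id},
            "status": {"S": "pending"},
            "createdAt": {"S": now},
            "updatedAt": {"S": now},
        },
        # DynamoDB rejects the write with ConditionalCheckFailedException
        # when an item with this requestId already exists.
        "ConditionExpression": "attribute_not_exists(requestId)",
    }
```

Assuming boto3 is the client library, the result could be passed as `boto3.client("dynamodb").put_item(**build_claim_request(...))`, and a ConditionalCheckFailedException would then indicate a duplicate.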

2.2 Key Flows

  1. Request Submission:
    • OrchestrationService checks Idempotency → responds immediately or forwards to CTMS.
  2. Request Processing:
    • Processes the request and updates DDB with the status (pending/success/failed) and fenixTxnId against the externalRequestId.
  3. Async Updates:
    • SQS + Lambda ensures eventual consistency (e.g., externalRequestId updates).
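
The submission flow can be illustrated with a minimal, in-memory sketch. Everything here is a stand-in: InMemoryStore replaces the DynamoDB table, the forward callback replaces the call to the downstream service, and the 202 status for a newly accepted request and the DUPLICATE_REQUEST error code are assumptions that the document does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RequestRecord:
    request_id: str
    status: str  # pending / processing / success / failed
    fenix_txn_id: Optional[str] = None


class InMemoryStore:
    """Stand-in for the DynamoDB table, for illustration only."""

    def __init__(self) -> None:
        self._items: Dict[str, RequestRecord] = {}

    def get(self, request_id: str) -> Optional[RequestRecord]:
        return self._items.get(request_id)

    def put_if_absent(self, record: RequestRecord) -> bool:
        # Mirrors a conditional write: returns False when the key already exists.
        if record.request_id in self._items:
            return False
        self._items[record.request_id] = record
        return True


def submit(store: InMemoryStore, request_id: str,
           forward: Callable[[str], None]) -> dict:
    """Check idempotency, then either forward the request or reject it."""
    record = store.get(request_id)
    if record is None and store.put_if_absent(RequestRecord(request_id, "pending")):
        forward(request_id)  # first time this request is seen: hand it on
        return {"httpStatus": 202, "requestId": request_id, "status": "pending"}
    record = store.get(request_id)  # duplicate: report the stored state
    return {
        "httpStatus": 400,
        "errorCode": "DUPLICATE_REQUEST",  # hypothetical error code
        "message": f"Request {request_id} already exists with status {record.status}",
        "status": record.status,
    }
```

Calling submit twice with the same request ID forwards the request only once; the second call returns the stored status instead of triggering processing again.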

3. Technical Design

3.1 Components

  • OrchestrationService

    • Technology: ECS
    • Purpose: Validates idempotency, then either responds to the client or routes the request to the End service (the service that processes the request), based on the status received from DDB.
  • End service (Service which is processing request)

    • Technology: ECS
    • Purpose: Processes requests and updates request status in DDB.
  • DynamoDB

    • Technology: AWS DynamoDB
    • Purpose: Primary store for requestId, status, fenixTxnId, and externalRequestId. Source of truth for the idempotency check.
  • SQS

    • Technology: AWS SQS
    • Purpose: Handles async updates, including externalRequestId updates.
  • Lambda

    • Technology: AWS Lambda
    • Purpose: Consumes messages from SQS and handles the writes to DDB.
  • DLQ

    • Technology: AWS SQS DLQ
    • Purpose: Captures failed messages and triggers alerts.
  • Monitoring

    • Technology: CloudWatch
    • Purpose: Tracks DDB throttles, DLQ depth, and Lambda errors.
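
As a rough illustration of how the SQS, Lambda, and DLQ components could fit together, the sketch below shows a Lambda-style handler that applies each SQS message and reports per-message failures. It assumes the event source mapping has ReportBatchItemFailures enabled and that the queue's redrive policy sends repeatedly failing messages to the DLQ. The message body format and the write_update callback are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List


def make_handler(write_update: Callable[[Dict[str, Any]], None]):
    """Return a Lambda-style handler that applies each SQS message via write_update."""

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, List[Dict[str, str]]]:
        failures: List[Dict[str, str]] = []
        for record in event.get("Records", []):
            try:
                # Each SQS record carries the message payload as a string in "body".
                write_update(json.loads(record["body"]))
            except Exception:
                # Report only this message as failed so SQS redelivers it;
                # once maxReceiveCount is exceeded, SQS moves it to the DLQ.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Injecting write_update keeps the handler testable without AWS access; in a deployment it would wrap the actual DDB write.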

3.2 Data Model (DynamoDB)

Table: fenix_requests
Primary Key: requestId (String)
Attributes:

  • status (String): pending/processing/success/failed
  • fenixTxnId (String): Null until the End service has successfully accepted the request for processing (after all validation and before workflow execution starts).
  • externalRequestId (String): Client-provided ID.
  • createdAt (ISO-8601): Initial request time.
  • updatedAt (ISO-8601): Last status change.
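
A status change against this table could be expressed as an UpdateItem call. The sketch below only builds the call arguments and is one possible shape, not the document's prescribed implementation; the helper name build_status_update is hypothetical. Note that "status" is a DynamoDB reserved word, so it has to be aliased through ExpressionAttributeNames.

```python
from datetime import datetime, timezone
from typing import Optional

VALID_STATUSES = {"pending", "processing", "success", "failed"}


def build_status_update(request_id: str, new_status: str,
                        fenix_txn_id: Optional[str] = None) -> dict:
    """Build UpdateItem arguments that set status, updatedAt and, optionally, fenixTxnId."""
    if new_status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {new_status}")
    values = {
        ":s": {"S": new_status},
        ":u": {"S": datetime.now(timezone.utc).isoformat()},
    }
    expression = "SET #s = :s, updatedAt = :u"
    if fenix_txn_id is not None:
        # fenixTxnId stays unset until the End service accepts the request.
        expression += ", fenixTxnId = :t"
        values[":t"] = {"S": fenix_txn_id}
    return {
        "TableName": "fenix_requests",
        "Key": {"requestId": {"S": request_id}},
        "UpdateExpression": expression,
        "ExpressionAttributeNames": {"#s": "status"},  # alias for the reserved word
        "ExpressionAttributeValues": values,
    }
```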

4. Workflow and approach document link

5. Metrics for Success

5.1 Key Metrics

  • Request Deduplication Rate
    • Target: 100%
    • Measurement: Percentage of duplicate externalRequestIds rejected
  • DDB Write Latency
    • Target: <100ms (p99)
    • Measurement: CloudWatch metrics
  • SQS-DLQ Depth
    • Target: 0 (alert if >5)
    • Measurement: Automated alerts via SNS
  • End-to-End Latency
    • Target: <2s (p95)
    • Measurement: From client POST to response
  • Webhook Delivery Rate
    • Target: 99.9% success
    • Measurement: Tracked via logging
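
To make the targets above concrete, the following sketch shows how a latency percentile and the deduplication rate could be computed from raw samples. It uses the nearest-rank percentile definition, which is an assumption; CloudWatch computes its own percentile statistics and may differ slightly. The function names are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def deduplication_rate(duplicates_received: int, duplicates_rejected: int) -> float:
    """Percentage of duplicate requests that were rejected."""
    if duplicates_received == 0:
        return 100.0  # nothing to reject, so the target is trivially met
    return 100.0 * duplicates_rejected / duplicates_received
```

For example, the End-to-End Latency target would be met when `percentile(latencies_in_seconds, 95)` is below 2.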

5.2 Monitoring Setup

  1. CloudWatch Alarms:
    • DDB Throttles: Threshold > 10/min.
    • DLQ Depth: Threshold > 5 messages.
    • Lambda Errors: Error rate > 1%.
  2. Dashboards:
    • Real-time Request Status: Counts of pending/success/failed.
    • SQS Backlog: Messages awaiting processing.
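
The DLQ depth and DDB throttle alarms could be defined roughly as follows. The sketch builds argument dictionaries in the shape expected by CloudWatch's PutMetricAlarm. The alarm names, the one-minute period, and the choice of WriteThrottleEvents as the throttle metric are assumptions; the thresholds come from the list above. The Lambda error-rate alarm is left out because a percentage threshold would need metric math over the Errors and Invocations metrics.

```python
def dlq_depth_alarm(queue_name: str, sns_topic_arn: str) -> dict:
    """Alarm when more than 5 messages are visible in the DLQ."""
    return {
        "AlarmName": f"{queue_name}-depth",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 5,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def ddb_throttle_alarm(table_name: str, sns_topic_arn: str) -> dict:
    """Alarm when write throttle events exceed 10 per minute."""
    return {
        "AlarmName": f"{table_name}-write-throttles",  # hypothetical naming scheme
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 10,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

Assuming boto3, each dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`.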

9. Conclusion

This design ensures idempotent, consistent, and resilient request processing with:

  • DynamoDB as the source of truth.
  • SQS + Lambda for async workflows.
  • DLQ + Alerts for failure recovery.