Detailed Workflow Document

Request Deduplication & Processing System

Lodgement

1. Overview

This document describes the detailed workflow of the Request Deduplication and Processing System within the Fenix Platform, as used by the lodgement process. It covers request submission, processing, error handling, and failure recovery.

2. Workflow Steps

2.1 Step 1: Request Submission

Client Request:
- The client sends a request with a unique externalRequestId in the header.
Orchestration Service:
- The Orchestration Service checks DynamoDB (DDB) for the externalRequestId and its current status.

2.1.1 If Status is `Success`

The service responds with a 400 error indicating that a duplicate request was found in the success state. The response provides the fenixTxnId along with fenixErrorCode and metadata (empty on success).
The response message notes that this externalRequestId has already been processed.

2.1.2 If Status is `Failed`

The service responds with a 400 error indicating that a duplicate request was found in the failure state. The response provides fenixErrorCode and metadata (non-empty on failure); fenixTxnId is null in this case.
The response message notes that this externalRequestId has already been processed.
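
The duplicate responses in 2.1.1 and 2.1.2 could be sketched as a single builder. This is a minimal illustration, not the service's actual code: the function name, the `httpStatus` and `message` field names, the lowercase status strings, the note wording, and the use of `None` and `{}` to represent "empty" are all assumptions.

```python
from typing import Any, Dict

# Assumed wording; the document only says the message notes the duplicate.
DUPLICATE_NOTE = "This externalRequestId has already been processed."


def build_duplicate_response(record: Dict[str, Any]) -> Dict[str, Any]:
    """Build the 400 body for a request whose DDB status is success or failed."""
    status = record["status"]
    if status not in ("success", "failed"):
        raise ValueError(f"status {status!r} is not a terminal state")
    succeeded = status == "success"
    return {
        "httpStatus": 400,
        "message": DUPLICATE_NOTE,
        "externalRequestId": record["externalRequestId"],
        # fenixTxnId is only returned for a successful request.
        "fenixTxnId": record.get("fenixTxnId") if succeeded else None,
        # Error code and metadata are empty on success, populated on failure.
        "fenixErrorCode": None if succeeded else record.get("fenixErrorCode"),
        "metadata": {} if succeeded else record.get("metadata", {}),
    }
```

Keeping both terminal states in one function makes it harder for the two 400 responses to drift apart in shape.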

2.1.3 If Status is `Processing`

The service immediately calls CTMS to fetch the corresponding fenixTxnId.
If no record is found in CTMS:
- The externalRequestId is placed into a Dead Letter Queue (DLQ).
- A 500 error is returned to the client, instructing them to stop retries and wait for a webhook response.
- Fenix will investigate through on-call alerts. These alarms will improve our latency.

2.1.4 If Status is `Pending`

This state is handled in the same way as the Processing state:
- The service calls CTMS to fetch the fenixTxnId.
- If no record is found, the externalRequestId is placed in the DLQ so that the processing delay can be investigated.
- A 500 error is returned, asking the client to wait for the webhook.
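
The handling of the Processing and Pending states might look like the sketch below. It assumes a hypothetical `ctms_lookup` callable that returns `None` when CTMS has no record, and uses a plain list as a stand-in for the DLQ; the response field names and message text are illustrative only.

```python
from typing import Any, Callable, Dict, List, Optional


def handle_in_flight(
    external_request_id: str,
    ctms_lookup: Callable[[str], Optional[str]],
    dlq: List[str],
) -> Dict[str, Any]:
    """Handle a duplicate whose DDB status is processing or pending."""
    fenix_txn_id = ctms_lookup(external_request_id)
    if fenix_txn_id is None:
        # No CTMS record: park the id for investigation and tell the client to wait.
        dlq.append(external_request_id)
        return {
            "httpStatus": 500,
            "message": "Stop retrying and wait for the webhook response.",
            "fenixTxnId": None,
        }
    # The document does not specify the response when CTMS has a record,
    # so this sketch only hands the id back to the caller.
    return {"fenixTxnId": fenix_txn_id}
```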

2.1.5 If No Record is Found in DDB

This indicates a new request. The following steps occur:
1. Write { requestId, status: pending, metadata } to DDB.
2. Send the request to SQS for further processing.
3. Lambda consumes the SQS message and updates the DDB with the correct data.
4. The request is forwarded to CTMS for processing with requestId in the header.
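
The four steps above can be sketched with in-memory stand-ins: a dict for DDB and a `deque` for SQS. The function names and the `forward_to_ctms` callable are hypothetical, and because the document does not define the "correct data" written in step 3, the sketch assumes it is the message metadata.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict

Table = Dict[str, Dict[str, Any]]


def submit_new_request(
    request_id: str, metadata: Dict[str, Any], ddb: Table, sqs: Deque[Dict[str, Any]]
) -> bool:
    """Steps 1-2: record the request as pending, then enqueue it."""
    if request_id in ddb:
        return False  # Not a new request; the duplicate paths apply instead.
    ddb[request_id] = {"requestId": request_id, "status": "pending", "metadata": metadata}
    sqs.append({"requestId": request_id, "metadata": metadata})
    return True


def consume_one(
    ddb: Table,
    sqs: Deque[Dict[str, Any]],
    forward_to_ctms: Callable[[Dict[str, str], Dict[str, Any]], None],
) -> None:
    """Steps 3-4: consume one message, update DDB, forward to CTMS."""
    message = sqs.popleft()
    record = ddb[message["requestId"]]
    # Assumption: the "correct data" is the metadata carried by the message.
    record["metadata"] = message["metadata"]
    # The requestId travels in the header, the payload in the body.
    forward_to_ctms({"requestId": message["requestId"]}, message["metadata"])
```

Against a real DynamoDB table, the existence check and the write in `submit_new_request` would probably need to be a single conditional put so that two concurrent submissions cannot both succeed.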

2.2 Step 2: CTMS Processing

CTMS checks DDB for the requestId and status.

2.2.1 If Status is `Success`

CTMS responds with a 400 error to the Orchestration Service indicating the transaction is already processed.
It returns the requestId, fenixTxnId, and the transaction status.

2.2.2 If Status is `Failed`

CTMS responds with a 400 error indicating the transaction has failed.
It provides the requestId, fenixTxnId as null, and transaction status as failed.
Clients can retry with a new requestId.

2.2.3 If Status is `Pending`

CTMS immediately updates the status to Processing.
It then starts processing the request.
- On successful completion, it updates DDB with the fenixTxnId and status=success.
- On failure, it updates the DDB with status=failed.
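
The CTMS status handling in 2.2.1 to 2.2.3 amounts to a small state machine. The sketch below models DDB as a dict and takes a hypothetical `process` callable that returns a fenixTxnId or raises on failure. The return shape for the Pending path and the error raised for any other status are assumptions, since the document does not specify them.

```python
from typing import Any, Callable, Dict


def ctms_handle(
    request_id: str,
    ddb: Dict[str, Dict[str, Any]],
    process: Callable[[str], str],
) -> Dict[str, Any]:
    """Apply the CTMS rules for a request based on its DDB status."""
    record = ddb[request_id]
    status = record["status"]
    if status == "success":
        # Already processed: report the existing transaction.
        return {"httpStatus": 400, "requestId": request_id,
                "fenixTxnId": record["fenixTxnId"], "status": "success"}
    if status == "failed":
        # Already failed: the client may retry with a new requestId.
        return {"httpStatus": 400, "requestId": request_id,
                "fenixTxnId": None, "status": "failed"}
    if status != "pending":
        raise ValueError(f"unhandled status {status!r} for {request_id}")

    # Pending: claim the request first, then do the work.
    record["status"] = "processing"
    try:
        fenix_txn_id = process(request_id)
    except Exception:
        record["status"] = "failed"
    else:
        record["fenixTxnId"] = fenix_txn_id
        record["status"] = "success"
    return {"requestId": request_id,
            "fenixTxnId": record.get("fenixTxnId"),
            "status": record["status"]}
```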

3. Error Handling and Failure Recovery

3.1 SQS Message Update Failures

If updating a message in SQS fails, the message is sent to a DLQ.
Alerts are configured in CloudWatch to notify engineers when a message reaches the DLQ.
A failure while putting messages into SQS or the DLQ does not halt the overall process.
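
One way to read the non-halting behaviour is a send helper that falls back to the DLQ and never raises. This is a sketch under assumptions: `send_to_sqs` and `send_to_dlq` are hypothetical callables standing in for the real queue clients, and the returned labels are illustrative.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("dedup")

Sender = Callable[[Dict[str, Any]], None]


def enqueue_with_fallback(
    message: Dict[str, Any], send_to_sqs: Sender, send_to_dlq: Sender
) -> str:
    """Try SQS first, then the DLQ; log failures instead of raising."""
    try:
        send_to_sqs(message)
        return "sqs"
    except Exception:
        logger.exception("SQS send failed; routing message to the DLQ")
    try:
        send_to_dlq(message)
        return "dlq"
    except Exception:
        # Even a DLQ failure must not halt the overall process.
        logger.exception("DLQ send failed; continuing")
        return "dropped"
```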

3.2 Dead Letter Queue (DLQ)

The DLQ captures failed messages.
Alerts are configured to raise incidents whenever a request with Processing or Pending status is sent to the DLQ.
Engineers can investigate via on-call.

4. Webhook Notifications

Webhook responses will include the following fields:

{
  "requestId": "req_123",
  "fenixTxnId": "fenix_789",
  "externalTxnId": "ext_456",
  "status": "success"
}

A webhook is sent once the request is successfully processed.
In case of failure, the client is notified with the appropriate status and failure reason.
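
A payload builder for both outcomes could look like the following sketch. The four base fields come from the sample above; the `failureReason` field name is hypothetical, since the document does not name the field that carries the failure reason.

```python
import json
from typing import Optional


def build_webhook_payload(
    request_id: str,
    fenix_txn_id: Optional[str],
    external_txn_id: str,
    status: str,
    failure_reason: Optional[str] = None,
) -> str:
    """Serialize the webhook body for a processed or failed request."""
    payload = {
        "requestId": request_id,
        "fenixTxnId": fenix_txn_id,
        "externalTxnId": external_txn_id,
        "status": status,
    }
    if status != "success":
        # Hypothetical field name; the document only says a reason is included.
        payload["failureReason"] = failure_reason
    return json.dumps(payload)
```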

5. Conclusion

This system ensures deduplication and consistent state management using DDB.
SQS and Lambda handle asynchronous workflows.
DLQ and alerts enable effective failure recovery.
Clients are provided with real-time responses or webhooks for transparency.

Next Steps:

Implement the detailed design in the staging environment.
Perform load testing to validate the workflow.
Set up monitoring and alerting mechanisms.
Provide integration guidelines to clients.