How Event Mesh DLQ works
SAP Event Mesh uses a queue-based messaging model: producers publish messages to queues, consumers read and process them. When a consumer fails to process a message and the maximum delivery attempts are exhausted, the message is moved to a Dead Letter Queue (DLQ).
The DLQ is a regular queue that follows the naming convention <original-queue-name>/<dlq-suffix>; in Event Mesh the suffix is configurable per queue. The message lands there and stays – nothing processes it automatically.
A message ends up in DLQ when:
- The consumer returns a NACK (negative acknowledgement) on every retry
- The max delivery attempts threshold is hit (configurable; the Event Mesh default is 1, meaning a single NACK sends it to the DLQ immediately)
- The message TTL (time-to-live) expires before successful delivery
The default max delivery attempts in SAP Event Mesh is 1, so a single processing failure sends the message to the DLQ permanently. For production use, always set this to at least 3–5 to handle transient failures (network hiccups, target system restarts).
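The effect of maxDeliveryCount on a NACK can be sketched as a pure decision function. This is illustrative only – the broker applies this logic internally; the function is not part of any SDK:

```typescript
type DeliveryDecision = 'redeliver' | 'dead-letter';

// What the broker does after a consumer NACKs a message.
// deliveryCount = how many times the message has been delivered so far.
function afterNack(deliveryCount: number, maxDeliveryCount: number): DeliveryDecision {
  // With the Event Mesh default of maxDeliveryCount = 1, the first NACK
  // already exhausts the budget and the message is dead-lettered.
  return deliveryCount >= maxDeliveryCount ? 'dead-letter' : 'redeliver';
}

afterNack(1, 1); // default config: first failure goes straight to the DLQ
afterNack(1, 5); // with maxDeliveryCount 5: the message is redelivered
```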
Configuring DLQ correctly
Queue configuration in Event Mesh is done via the Event Mesh dashboard or the REST management API. The critical properties:
{
  "name": "order-processing-queue",
  "properties": {
    "maxMessageSizeInBytes": 1048576,
    "messageTimeToLiveInMs": 86400000,
    "maxQueueSizeInBytes": 10737418240,
    "requiresDuplicateFiltering": false,
    "deadMessageQueue": {
      "name": "order-processing-queue/dlq",
      "maxDeliveryCount": 5
    }
  }
}
Key settings:
- maxDeliveryCount: 5 – try five times before dead-lettering
- messageTimeToLiveInMs: 86400000 – 24 hours TTL (set this, or messages accumulate forever)
- DLQ name – explicit is better than relying on default naming
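Queue creation can also be scripted against the management REST API. The sketch below is hypothetical – `baseUrl`, the `/queues/{name}` path, and the token handling are placeholders (take the real management URI and OAuth details from your service key); the payload mirrors the configuration above:

```typescript
interface DlqConfig { name: string; maxDeliveryCount: number; }
interface QueueConfig { messageTimeToLiveInMs?: number; deadMessageQueue: DlqConfig; }

// Build the queue configuration body with the DLQ settings recommended above.
function buildQueuePayload(queueName: string, maxDeliveryCount: number): QueueConfig {
  return {
    messageTimeToLiveInMs: 86_400_000, // 24h TTL so messages don't pile up
    deadMessageQueue: {
      name: `${queueName}/dlq`,        // explicit DLQ name, not the default
      maxDeliveryCount,
    },
  };
}

// Hypothetical upsert call – adjust path and auth to your management API.
async function upsertQueue(baseUrl: string, token: string, queueName: string) {
  const res = await fetch(`${baseUrl}/queues/${encodeURIComponent(queueName)}`, {
    method: 'PUT',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(buildQueuePayload(queueName, 5)),
  });
  if (!res.ok) throw new Error(`Queue update failed: ${res.status}`);
}
```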
Operational patterns
Pattern 1: DLQ monitor iFlow – Build a CPI iFlow that polls the DLQ on a schedule (e.g. every 5 minutes), counts messages, and sends an alert if the count > 0. Don't let DLQ messages accumulate silently.
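Pattern 1 can also be sketched as a small Node process instead of an iFlow. The browse and notify callbacks here are assumed integration points (e.g. a non-destructive message peek and the Alert Notification service), not a specific SDK API:

```typescript
// No alert when the DLQ is empty – silence is the healthy state.
function dlqAlertText(queueName: string, count: number): string | null {
  return count > 0 ? `DLQ alert: ${count} message(s) stuck in ${queueName}/dlq` : null;
}

// One monitoring tick; wire it to a scheduler, e.g.
// setInterval(() => monitorDlq('order-processing-queue', browse, notify), 5 * 60 * 1000);
async function monitorDlq(
  queueName: string,
  browse: (queue: string) => Promise<unknown[]>, // non-destructive peek (assumed)
  notify: (text: string) => Promise<void>,       // e.g. Alert Notification service (assumed)
) {
  const messages = await browse(`${queueName}/dlq`);
  const alert = dlqAlertText(queueName, messages.length);
  if (alert) await notify(alert);
}
```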
Pattern 2: Structured DLQ consumer – Instead of manual replay, build a dedicated DLQ consumer application that:
- Reads from the DLQ
- Classifies the error (transient vs permanent)
- Routes transient failures back to the main queue (automatic replay)
- Routes permanent failures to a human review dashboard
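The classification step above can be sketched as a keyword-based check, assuming the consumer records the last error text on the message. The pattern lists are illustrative – tune them to the errors your consumers actually emit:

```typescript
type FailureClass = 'transient' | 'permanent';

// Errors worth an automatic replay (infrastructure hiccups).
const TRANSIENT_PATTERNS = [/timeout/i, /ECONNRESET/, /ECONNREFUSED/, /503/, /429/];
// Errors that will never succeed (bad data) – these need a human.
const PERMANENT_PATTERNS = [/schema/i, /validation/i, /malformed/i, /parse error/i];

function classifyError(lastError: string): FailureClass {
  if (PERMANENT_PATTERNS.some((p) => p.test(lastError))) return 'permanent';
  if (TRANSIENT_PATTERNS.some((p) => p.test(lastError))) return 'transient';
  // Unknown errors default to permanent so they reach a human instead of looping.
  return 'permanent';
}
```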
Pattern 3: Poison message quarantine – Some messages will never be processable (malformed data, invalid structure). Quarantine them permanently with enriched metadata:
// Assumes a CAP (Node.js) service: `db` is the connected database service and
// INSERT comes from cds.ql, e.g. `const { INSERT } = require('@sap/cds').ql`.
// EventMeshMessage is the application's own message wrapper type.
async function handleDlqMessage(message: EventMeshMessage) {
  const errorContext = {
    originalQueue: message.properties.replyTo,
    deliveryAttempts: message.properties.deliveryCount,
    firstFailedAt: message.properties.creationTime,
    lastError: message.applicationProperties?.lastError ?? 'Unknown',
    payload: message.body,
    messageId: message.properties.messageId,
  };
  // Write to HANA Cloud for human review
  await db.run(INSERT.into('POISONED_MESSAGES').entries(errorContext));
  // Acknowledge to remove from DLQ
  await message.ack();
}
Replaying from DLQ
For transient failures, replay is straightforward – forward the message back to the original queue:
async function replayFromDlq(queueName: string) {
  const dlqMessages = await eventMeshClient.receiveMessages(`${queueName}/dlq`);
  for (const msg of dlqMessages) {
    // Add replay metadata so the attempt count survives across replays
    msg.applicationProperties = {
      ...msg.applicationProperties,
      replayedAt: new Date().toISOString(),
      replayCount: ((msg.applicationProperties?.replayCount as number) ?? 0) + 1,
    };
    if ((msg.applicationProperties.replayCount as number) > 3) {
      // Permanent failure – quarantine instead of looping (see handleDlqMessage above)
      await quarantine(msg);
    } else {
      await eventMeshClient.publishMessage(queueName, msg);
    }
    await msg.ack(); // Remove from DLQ
  }
}
Track replay attempts in the message's application properties. A message that gets replayed 3 times and fails each time is a poisoned message, not a transient error – stop replaying it and escalate to human review. Automated infinite replay loops are how a single bad message takes down your entire consumer.