How Event Mesh DLQ works
SAP Event Mesh uses a queue-based messaging model: producers publish messages to queues, consumers read and process them. When a consumer fails to process a message and the maximum delivery attempts are exhausted, the message is moved to a Dead Letter Queue (DLQ).
The DLQ is a regular queue that follows the naming convention <original-queue-name>/<dlq-suffix>; in Event Mesh the suffix is configurable per queue. The message lands there and stays – nothing processes it automatically.
A message ends up in DLQ when:
- The consumer returns a NACK (negative acknowledgement) on every retry
- The max delivery attempts threshold is hit (configurable; the Event Mesh default is 1, meaning a single NACK sends it to the DLQ immediately)
- The message TTL (time-to-live) expires before successful delivery
The default max delivery attempts in SAP Event Mesh is 1, so a single processing failure sends the message to the DLQ permanently. For production use, always set this to at least 3–5 to handle transient failures (network hiccups, target system restarts).
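The effect of maxDeliveryCount on a NACK can be sketched as a pure decision function. This is illustrative only – the broker applies this logic internally; the function is not part of any SDK:

```typescript
type DeliveryDecision = 'redeliver' | 'dead-letter';

// What the broker does after a consumer NACKs a message.
// deliveryCount = how many times the message has been delivered so far.
function afterNack(deliveryCount: number, maxDeliveryCount: number): DeliveryDecision {
  // With the Event Mesh default of maxDeliveryCount = 1, the first NACK
  // already exhausts the budget and the message is dead-lettered.
  return deliveryCount >= maxDeliveryCount ? 'dead-letter' : 'redeliver';
}

afterNack(1, 1); // default config: first failure goes straight to the DLQ
afterNack(1, 5); // with maxDeliveryCount 5: the message is redelivered
```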
Configuring DLQ correctly
Queue configuration in Event Mesh is done via the Event Mesh dashboard or the REST management API. The critical properties:
{
  "name": "order-processing-queue",
  "properties": {
    "maxMessageSizeInBytes": 1048576,
    "messageTimeToLiveInMs": 86400000,
    "maxQueueSizeInBytes": 10737418240,
    "requiresDuplicateFiltering": false,
    "deadMessageQueue": {
      "name": "order-processing-queue/dlq",
      "maxDeliveryCount": 5
    }
  }
}
Key settings:
- maxDeliveryCount: 5 – try five times before dead-lettering
- messageTimeToLiveInMs: 86400000 – 24 hours TTL (set this, or messages accumulate forever)
- DLQ name – explicit is better than relying on default naming
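Queue creation can also be scripted against the management REST API. The sketch below is hypothetical – `baseUrl`, the `/queues/{name}` path, and the token handling are placeholders (take the real management URI and OAuth details from your service key); the payload mirrors the configuration above:

```typescript
interface DlqConfig { name: string; maxDeliveryCount: number; }
interface QueueConfig { messageTimeToLiveInMs?: number; deadMessageQueue: DlqConfig; }

// Build the queue configuration body with the DLQ settings recommended above.
function buildQueuePayload(queueName: string, maxDeliveryCount: number): QueueConfig {
  return {
    messageTimeToLiveInMs: 86_400_000, // 24h TTL so messages don't pile up
    deadMessageQueue: {
      name: `${queueName}/dlq`,        // explicit DLQ name, not the default
      maxDeliveryCount,
    },
  };
}

// Hypothetical upsert call – adjust path and auth to your management API.
async function upsertQueue(baseUrl: string, token: string, queueName: string) {
  const res = await fetch(`${baseUrl}/queues/${encodeURIComponent(queueName)}`, {
    method: 'PUT',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(buildQueuePayload(queueName, 5)),
  });
  if (!res.ok) throw new Error(`Queue update failed: ${res.status}`);
}
```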
Operational patterns
Pattern 1: DLQ monitor iFlow – Build a CPI iFlow that polls the DLQ on a schedule (e.g. every 5 minutes), counts messages, and sends an alert if the count > 0. Don't let DLQ messages accumulate silently.
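Pattern 1 can also be sketched as a small Node process instead of an iFlow. The browse and notify callbacks here are assumed integration points (e.g. a non-destructive message peek and the Alert Notification service), not a specific SDK API:

```typescript
// No alert when the DLQ is empty – silence is the healthy state.
function dlqAlertText(queueName: string, count: number): string | null {
  return count > 0 ? `DLQ alert: ${count} message(s) stuck in ${queueName}/dlq` : null;
}

// One monitoring tick; wire it to a scheduler, e.g.
// setInterval(() => monitorDlq('order-processing-queue', browse, notify), 5 * 60 * 1000);
async function monitorDlq(
  queueName: string,
  browse: (queue: string) => Promise<unknown[]>, // non-destructive peek (assumed)
  notify: (text: string) => Promise<void>,       // e.g. Alert Notification service (assumed)
) {
  const messages = await browse(`${queueName}/dlq`);
  const alert = dlqAlertText(queueName, messages.length);
  if (alert) await notify(alert);
}
```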
Pattern 2: Structured DLQ consumer – Instead of manual replay, build a dedicated DLQ consumer application that:
- Reads from the DLQ
- Classifies the error (transient vs permanent)
- Routes transient failures back to the main queue (automatic replay)
- Routes permanent failures to a human review dashboard
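The classification step above can be sketched as a keyword-based check, assuming the consumer records the last error text on the message. The pattern lists are illustrative – tune them to the errors your consumers actually emit:

```typescript
type FailureClass = 'transient' | 'permanent';

// Errors worth an automatic replay (infrastructure hiccups).
const TRANSIENT_PATTERNS = [/timeout/i, /ECONNRESET/, /ECONNREFUSED/, /503/, /429/];
// Errors that will never succeed (bad data) – these need a human.
const PERMANENT_PATTERNS = [/schema/i, /validation/i, /malformed/i, /parse error/i];

function classifyError(lastError: string): FailureClass {
  if (PERMANENT_PATTERNS.some((p) => p.test(lastError))) return 'permanent';
  if (TRANSIENT_PATTERNS.some((p) => p.test(lastError))) return 'transient';
  // Unknown errors default to permanent so they reach a human instead of looping.
  return 'permanent';
}
```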
Pattern 3: Poison message quarantine – Some messages will never be processable (malformed data, invalid structure). Quarantine them permanently with enriched metadata:
// Assumes a CAP (Node.js) service: `db` is the connected database service and
// INSERT comes from cds.ql, e.g. `const { INSERT } = require('@sap/cds').ql`.
// EventMeshMessage is the application's own message wrapper type.
async function handleDlqMessage(message: EventMeshMessage) {
  const errorContext = {
    originalQueue: message.properties.replyTo,
    deliveryAttempts: message.properties.deliveryCount,
    firstFailedAt: message.properties.creationTime,
    lastError: message.applicationProperties?.lastError ?? 'Unknown',
    payload: message.body,
    messageId: message.properties.messageId,
  };
  // Write to HANA Cloud for human review
  await db.run(INSERT.into('POISONED_MESSAGES').entries(errorContext));
  // Acknowledge to remove from DLQ
  await message.ack();
}
Replaying from DLQ
For transient failures, replay is straightforward – forward the message back to the original queue:
async function replayFromDlq(queueName: string) {
  const dlqMessages = await eventMeshClient.receiveMessages(`${queueName}/dlq`);
  for (const msg of dlqMessages) {
    // Add replay metadata so the attempt count survives across replays
    msg.applicationProperties = {
      ...msg.applicationProperties,
      replayedAt: new Date().toISOString(),
      replayCount: ((msg.applicationProperties?.replayCount as number) ?? 0) + 1,
    };
    if ((msg.applicationProperties.replayCount as number) > 3) {
      // Permanent failure – quarantine instead of looping (see handleDlqMessage above)
      await quarantine(msg);
    } else {
      await eventMeshClient.publishMessage(queueName, msg);
    }
    await msg.ack(); // Remove from DLQ
  }
}
Track replay attempts in the message's application properties. A message that gets replayed 3 times and fails each time is a poisoned message, not a transient error – stop replaying it and escalate to human review. Automated infinite replay loops are how a single bad message takes down your entire consumer.