Building Resilient Webhooks with AWS API Gateway Direct SQS Integration

Webhooks are a convenient asynchronous way to receive events from external systems over HTTP. You provide an endpoint, and the external system sends it a request (usually a POST) whenever an event of interest takes place.

External services such as Stripe and Checkout.com let you register a webhook URL for specific events that are often crucial to your workflows, such as the capture of a payment. You typically acknowledge that an event was received and processed by returning a 2xx status code to the external service. A common approach is to put a Lambda function behind the API Gateway path to handle these requests, but depending on your requirements and implementation this has several potential pitfalls:

  1. Throttling on Downstream: During peak hours the downstream service (Lambda) can be overloaded, with no way to exert back-pressure on incoming requests.

  2. Trust Issues: The external system's retries cannot be fully trusted. For example, the external system may make a configuration change that breaks the signature/hash check, so every request is rejected as invalid and is hard to replay.

  3. Lack of Retries: Once we acknowledge the webhook request there is no mechanism to retry it, so an error on our end after the acknowledgment means the event is lost.

  4. Timeout Challenges: If event processing takes longer than API Gateway's integration timeout (29 seconds by default), API Gateway terminates the request.

Over the past few years, AWS has steadily released service-to-service integrations that connect AWS services directly, without an intermediary Lambda function. Among the integrations available for API Gateway, SQS is particularly effective at addressing the challenges above: API Gateway writes each request to a queue and acknowledges it immediately, and processing happens asynchronously.

We start by creating the queue that API Gateway will send requests to, along with a dead-letter queue for messages that repeatedly fail processing:

Resources:
  WebhookQueue:
    Type: AWS::SQS::Queue
    Properties:
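      VisibilityTimeout: 90 # seconds a received message stays hidden from other consumers
      RedrivePolicy:
        deadLetterTargetArn:
          Fn::GetAtt:
            - WebhookDLQ
            - Arn
        maxReceiveCount: 3 # after 3 failed receives, the message moves to the DLQ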

  WebhookDLQ:
    Type: AWS::SQS::Queue

To route requests on certain API Gateway endpoints to SQS, a few resources are required:

  1. A role attached to the API Gateway method that gives it permission to send messages to the SQS queue.

  2. The API Gateway resource (path). In this example it is /webhook.

  3. The API Gateway method for the path defined in step 2.
    - Set IntegrationResponses to status code 200, which API Gateway returns after it has sent the message to the queue.
    - Set the Content-Type header in RequestParameters to application/x-www-form-urlencoded, because that is the content type the SQS service accepts for this request.
    - In RequestTemplates, define a template that sends a message to the queue containing both the body and the headers, in the following structure:

     {
         body: { ... },
         headers: { ... }
     }
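
On the consuming side, the envelope above could be modeled and validated roughly as follows. This is a minimal TypeScript sketch under assumptions: the `WebhookMessage` type and the `parseWebhookMessage` helper are illustrative names and not part of any AWS SDK.

```typescript
// Shape of the message that the request template writes to the queue.
// WebhookMessage and parseWebhookMessage are hypothetical names.
interface WebhookMessage {
  body: unknown; // the webhook's JSON payload, as sent by the external system
  headers: { [name: string]: string }; // the HTTP request headers
}

// Parse a raw SQS message body and check that it has the expected envelope.
function parseWebhookMessage(raw: string): WebhookMessage {
  const parsed = JSON.parse(raw) as Partial<WebhookMessage> | null;
  if (parsed === null || typeof parsed !== "object") {
    throw new Error("Webhook message is not a JSON object");
  }
  const { body, headers } = parsed;
  if (
    body === undefined ||
    headers === undefined ||
    headers === null ||
    typeof headers !== "object"
  ) {
    throw new Error("Webhook message is missing body or headers");
  }
  return { body, headers };
}
```

Validating the envelope up front means a malformed message fails fast and ends up in the dead-letter queue instead of causing a confusing error deeper in the processing logic.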
    

API Gateway CloudFormation Template

Resources:
  WebhookAPIGatewayToSQSRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - apigateway.amazonaws.com
        Version: '2012-10-17'
      Policies:
        - PolicyDocument:
            Statement:
              - Action: sqs:SendMessage
                Effect: Allow
                Resource:
                  Fn::GetAtt:
                    - WebhookQueue
                    - Arn
              - Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:DescribeLogGroups
                  - logs:DescribeLogStreams
                  - logs:PutLogEvents
                  - logs:GetLogEvents
                  - logs:FilterLogEvents
                Effect: Allow
                Resource: '*' # Too broad for production; scope this to the specific log group ARNs
            Version: '2012-10-17'
          PolicyName: apig-sqs-send-msg-policy
      RoleName: webhook-apig-sqs-send-msg-role

  ApiGatewayResourceWebhook:
    Type: AWS::ApiGateway::Resource
    Properties:
      ParentId:
        Fn::GetAtt:
          - ApiGatewayRestApi # This is a CFN resource that serverless framework automatically creates
          - RootResourceId
      PathPart: webhook
      RestApiId:
        Ref: ApiGatewayRestApi # This is a CFN resource that serverless framework automatically creates

  ApiGatewayMethodWebhookPost:
    Type: AWS::ApiGateway::Method
    Properties:
      AuthorizationType: NONE
      HttpMethod: POST
      Integration:
        Type: AWS
        Credentials:
          Fn::GetAtt:
            - WebhookAPIGatewayToSQSRole
            - Arn
        IntegrationHttpMethod: POST
        IntegrationResponses:
          - StatusCode: '200'
        PassthroughBehavior: WHEN_NO_TEMPLATES
        RequestParameters:
          integration.request.header.Content-Type: '''application/x-www-form-urlencoded'''
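        # Note: MessageBody is sent as form data and is not URL-encoded here, so payloads
        # containing characters such as & or + may need $util.urlEncode.
        RequestTemplates: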
          "application/json": "Action=SendMessage&MessageBody={
              \"body\": $input.json('$'),
              \"headers\": {
                  #foreach($param in $input.params().header.keySet())
                      \"$param\": \"$util.escapeJavaScript($input.params().header.get($param))\" #if($foreach.hasNext), #end
                  #end
              }
          }"
        Uri:
          Fn::Join:
            - ''
            - - 'arn:aws:apigateway:'
              - Ref: AWS::Region
              - :sqs:path/
              - Ref: AWS::AccountId
              - /
              - Fn::GetAtt:
                - WebhookQueue
                - QueueName
      MethodResponses:
        - ResponseModels:
            application/json: Empty
          StatusCode: '200'
      ResourceId:
        Ref: ApiGatewayResourceWebhook
      RestApiId:
        Ref: ApiGatewayRestApi # This is a CFN resource that serverless framework automatically creates

When a request hits the /webhook endpoint, API Gateway uses the request template to combine the body and headers into a single object and sends it as a message to the queue. This gives us control over how many messages are processed at any given moment: by configuring a maximum concurrency on the Lambda event source mapping, we can apply back-pressure when incoming requests surge. Just as importantly, messages that fail processing return to the queue and are retried, and after three failed receives they move to the dead-letter queue, where they can be inspected and replayed.
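
To make the retry behavior concrete, here is a minimal sketch of a consumer that reports only the failed messages back to SQS, so that they alone become visible again for another attempt. It assumes the event source mapping has the `ReportBatchItemFailures` response type enabled. The local types mirror the parts of the SQS event that the sketch needs, and `handleBatch` and `processEvent` are hypothetical names.

```typescript
// Minimal local types so the sketch is self-contained; in a real project
// these would usually come from the Lambda type definitions.
interface SqsRecord { messageId: string; body: string }
interface SqsEvent { Records: SqsRecord[] }
interface SqsBatchResponse { batchItemFailures: { itemIdentifier: string }[] }

// Hypothetical business logic that handles one webhook event and throws on failure.
type ProcessEvent = (payload: unknown, headers: { [name: string]: string }) => void;

function handleBatch(event: SqsEvent, processEvent: ProcessEvent): SqsBatchResponse {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      // The message body is the { body, headers } envelope built by the request template.
      const message = JSON.parse(record.body) as {
        body: unknown;
        headers: { [name: string]: string };
      };
      processEvent(message.body, message.headers);
    } catch (_err) {
      // Report only this message as failed; the others in the batch are deleted.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

In a Lambda function this would typically be wrapped in an exported async handler that calls `handleBatch` with the real processing logic and returns its result.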

[Image: Response of the webhook endpoint after successfully sending the message to the SQS queue]

I’d like to conclude with a crucial point: the downstream process must always be idempotent, because a message can be delivered more than once (SQS standard queues provide at-least-once delivery, and failed messages are retried). I recently wrote an article that explores various ways to incorporate idempotency into your workflows. If you’re using Powertools for AWS Lambda, it offers a dedicated utility for this purpose. You can find more information in the documentation: https://docs.powertools.aws.dev/lambda/typescript/latest/utilities/idempotency/
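
The core idea can be sketched in a few lines. This is not the Powertools API, only a minimal illustration. It assumes each webhook event carries a unique identifier that can serve as the key, and it uses a hypothetical in-memory store where a real system would need a durable one shared across invocations.

```typescript
// Hypothetical store interface; a production version would be backed by a
// durable store, not by process memory.
interface IdempotencyStore {
  has(key: string): boolean;
  add(key: string): void;
}

class InMemoryStore implements IdempotencyStore {
  private seen: { [key: string]: true } = {};

  has(key: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.seen, key);
  }

  add(key: string): void {
    this.seen[key] = true;
  }
}

// Runs `work` at most once per key. Returns true if it ran, false if the key
// had already been processed. The check and the write are not atomic here,
// which a durable implementation would need to address.
function processOnce(store: IdempotencyStore, key: string, work: () => void): boolean {
  if (store.has(key)) {
    return false;
  }
  work(); // if this throws, the key is not recorded, so a retry runs the work again
  store.add(key);
  return true;
}
```

With this shape, a redelivered message whose key has already been recorded is skipped, while a message whose processing failed is processed again on the next attempt.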