Lambda, Event Gateway and OpenFaaS — A CloudEvent Driven Story

Moshe Nadler
Jun 3, 2018 · 15 min read

Introduction

Event-driven architecture is a very common strategy in distributed systems, microservices and serverless alike. It works well within a closed platform where the event structure is well defined and understood. Complexities arise, however, when a number of different platforms exchange events (as in a multi-cloud deployment): each platform has its own way of generating and consuming events, with different event structures and metadata. A related problem appears when one cloud provider needs to be replaced with another.

CloudEvents is a great initiative to standardize events: an event exchanged between different platforms follows a common structure and is understood by all parties, which removes the need to handle events differently depending on their origin.

Exchanging events between the local platform and other platforms requires some kind of gateway that encapsulates the events and emits CloudEvents. An excellent solution is the Event Gateway; I first stumbled upon it watching a great talk given by Kelsey Hightower. Events consumed by the Event Gateway end up triggering containers or serverless functions, with the CloudEvent acting as the mediator passed to the container or function.

AWS Lambda functions are pretty much synonymous with FaaS (Function as a Service) in the cloud, and with all the other major cloud providers offering their own solutions (Google Cloud Functions, Microsoft Azure Functions, IBM Cloud Functions) it has become very common to utilize FaaS in the cloud. But you can also deploy and run your own FaaS platform inside a Kubernetes or Docker Swarm cluster. OpenFaaS is one of the solutions that make building and deploying serverless functions on your own infrastructure easy and effective.

In this post, I will try to demonstrate how AWS Lambda and OpenFaaS functions can work together using events. The Event Gateway will function as the hub through which all events flow. As a simple test of the system, I will use an AWS Lex bot to perform a ChatOps task: the bot will trigger a change in a Kubernetes deployment replica count by utilizing the serverless functions to execute the task. The flow is composed of 8 steps (Image 1.):

  1. The Lex bot accepts the name of the Kubernetes deployment and the desired number of replicas and passes this information to a Lambda function.
  2. The Lambda function passes the information from the Lex bot to an endpoint that points to an ALB. The ALB acts as a gateway to the Kubernetes cluster.
  3. The ALB passes the Lambda’s payload from step 2, containing the name of the deployment and replica count, to the Event Gateway via a Kubernetes Ingress.
  4. Using the endpoint name, passed by the Lambda function in step 2, the Event Gateway emits a CloudEvent which triggers an OpenFaaS function responsible for the replica count change. The payload containing the name of the Kubernetes deployment and the desired number of replicas, passed in step 2, is passed to the OpenFaaS function as part of the CloudEvent.
  5. The OpenFaaS function calls the Kubernetes API server and adjusts the replica count of the specified deployment.
  6. If the adjustment was successful, the OpenFaaS function calls an endpoint on the Event Gateway. Within the call, the OpenFaaS function passes the Kubernetes deployment name and the new replica count. It also passes the event ID and timestamp of the CloudEvent which triggered the OpenFaaS function in step 4.
  7. The endpoint called in step 6 will make the Event Gateway emit a CloudEvent which triggers a Lambda function. The payload within the CloudEvent will contain all the details passed from the OpenFaaS function in step 6.
  8. The Lambda function will use the payload within the CloudEvent to update a DynamoDB record with the new replica count for the specific Kubernetes deployment. It will also update the record with the event ID and timestamp of the CloudEvent which triggered the OpenFaaS function in step 4.
Image 1. Events Flow Overview

Deploying the Event Gateway Cluster Into Kubernetes

The deployment of the Event Gateway cluster is based on this guide. I have provisioned a three-node Kubernetes 1.8 cluster on AWS using Rancher 1.6. The cluster has an Internet-facing ALB in front of it, and the Rancher native Ingress load balancers are deployed on all the nodes. I will deploy all the Kubernetes components to a namespace called “test-infra”; replace the namespace with the one you are using.

Deploying an etcd Cluster Using the etcd Operator

The Event Gateway cluster requires a persistent store for its configuration and coordination. For this purpose, we will deploy a three-node etcd cluster. We will deploy the etcd cluster into our Kubernetes cluster using the etcd operator. Create the deployment of the etcd operator by modifying the etcd-operator-deployment.yaml file to fit your configuration and then run the following command (change the namespace to the one you are using):

$ kubectl -n test-infra create -f etcd-kubernetes/etcd-operator-deployment.yaml

When the deployment is finished you should have a Custom Resource Definition (CRD) named etcdclusters.etcd.database.coreos.com:

$ kubectl get customresourcedefinitions etcdclusters.etcd.database.coreos.com   
NAME AGE
etcdclusters.etcd.database.coreos.com 19m

Now we can go ahead and create the etcd cluster. Create the deployment of the etcd cluster by modifying the etcd-cluster-deployment.yaml file to fit your configuration and then run the following command (change the namespace to the one you are using):

$ kubectl -n test-infra create -f etcd-kubernetes/etcd-cluster-deployment.yaml

After the etcd cluster is deployed we can check the status of the cluster pods:

$ kubectl -n test-infra get pod | grep etcd-cluster 
etcd-cluster-czk6wvkz27 1/1 Running 0 17m
etcd-cluster-dwq86w86qv 1/1 Running 0 16m
etcd-cluster-wvqzpn2nf2 1/1 Running 0 17m

We can now check the cluster health by executing the following command inside one of the pods:

$ kubectl -n test-infra exec etcd-cluster-czk6wvkz27 -- /usr/local/bin/etcdctl cluster-health
member 33649bb359248576 is healthy: got healthy result from http://etcd-cluster-czk6wvkz27.etcd-cluster.test-infra.svc:2379
member c18476f065296cb9 is healthy: got healthy result from http://etcd-cluster-wvqzpn2nf2.etcd-cluster.test-infra.svc:2379
member df4e237003862191 is healthy: got healthy result from http://etcd-cluster-dwq86w86qv.etcd-cluster.test-infra.svc:2379
cluster is healthy

The etcd operator will create two services for the etcd cluster:

$ kubectl -n test-infra get service | grep etcd          
etcd-cluster ClusterIP None <none> 2379/TCP,2380/TCP
etcd-cluster-client ClusterIP 10.43.61.4 <none> 2379/TCP

The etcd-cluster service is a headless service used by the cluster. The etcd-cluster-client is the service the clients will use in order to interact with the etcd cluster.

Deploying the Event Gateway Cluster

Create the deployment of the Event Gateway cluster by modifying the event-gateway-deployment.yaml file to fit your configuration and then run the following command (change the namespace to the one you are using):

$ kubectl -n test-infra create -f event-gateway/event-gateway-deployment.yaml

This will create an Event Gateway cluster with two nodes and a service. The Event Gateway is exposed on two ports:

$ kubectl -n test-infra get service event-gateway 
event-gateway ClusterIP 10.43.98.3 <none> 4000/TCP,4001/TCP 19m

Port 4000/TCP is used to submit events to the Event Gateway while port 4001/TCP is used for sending configurations to the Event Gateway.

To be able to access the Event Gateway from outside the cluster I needed to add the following to my Ingress controller configuration:

- host: event-gateway-events-test-dev.moshen-app.net
  http:
    paths:
    - path: /
      backend:
        serviceName: event-gateway
        servicePort: 4000

- host: event-gateway-config-test-dev.moshen-app.net
  http:
    paths:
    - path: /
      backend:
        serviceName: event-gateway
        servicePort: 4001

I have created two DNS records, one named event-gateway-events-test-dev.moshen-app.net and the second named event-gateway-config-test-dev.moshen-app.net. Both DNS records point to the ALB. To test that the Event Gateway is reachable, we can curl the /v1/status API:

$ curl -I -X GET https://event-gateway-config-test-dev.moshen-app.net/v1/status  
HTTP/2 200
date: Mon, 21 May 2018 08:48:13 GMT
content-type: text/plain; charset=utf-8
content-length: 0

Deploying the DynamoDB Table and the Lambda Functions

DynamoDB Table

We will use a DynamoDB table to keep count of the number of replicas per deployment. The table will also keep the event ID and timestamp of the event which triggered the last change in the replica count. Create a DynamoDB table named kubernetes-deployment-replica-count and set its primary partition key to deployment_name (Image 2.)

Image 2. DynamoDB Table Creation
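
If you prefer to create the table from code rather than from the console, a minimal boto3 sketch along the following lines should do the same thing (the table name and partition key match the ones used above; the capacity values are arbitrary assumptions):

# Minimal sketch: create the DynamoDB table with boto3 instead of the console.
# Table name and partition key follow the article; capacity units are arbitrary.
import boto3

dynamodb = boto3.client("dynamodb", region_name="eu-west-1")

dynamodb.create_table(
    TableName="kubernetes-deployment-replica-count",
    AttributeDefinitions=[
        {"AttributeName": "deployment_name", "AttributeType": "S"}
    ],
    KeySchema=[
        {"AttributeName": "deployment_name", "KeyType": "HASH"}
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
)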

The Update DynamoDB Lambda Function

This Lambda function will update the kubernetes-deployment-replica-count DynamoDB table created previously. Before creating the Lambda function we will need to create an IAM role for it. The role grants the function permission to update the DynamoDB table and to persist its logs to CloudWatch Logs.

Create an IAM role named write-dynamodb-kubernetes-deployment-replica-count-table with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:BatchGetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:eu-west-1:AWS_ACCOUNT_NUMBER:table/kubernetes-deployment-replica-count"
    }
  ]
}

Next, create a Lambda function named update-dynamodb-kubernetes-deployment-replica-count-table. The function will use Python 2.7 as its runtime and the IAM role we created previously (Image 3.)

Image 3. Creating the update-dynamodb-kubernetes-deployment-replica-count-table Lambda

The function code can be found here. Set the Function Handler to update-dynamodb-table.main. As the function does not use any environment variables, leave the Environment variables empty. Leave the function memory at its 128MB default and set the function Timeout to 15 seconds. Also, leave the Network at its No VPC default. An overview of the function configuration can be seen in Image 4.

Image 4. update-dynamodb-kubernetes-deployment-replica-count-table Lambda Configuration
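
The real handler is in the linked code; purely as an illustration, a Python 2.7 handler of roughly this shape could do the job. It assumes the Event Gateway delivers a CloudEvent whose data.body carries the fields sent by the OpenFaaS function later in this post (deployment_name, replica_count, event_id, event_time); those field names, and the non-key attribute names in the table, are my assumptions, so adjust them to the actual code:

# Hedged sketch of update-dynamodb-table.main, not the exact linked code.
# Assumes the incoming CloudEvent carries the OpenFaaS payload under data["body"].
import json

import boto3

TABLE_NAME = "kubernetes-deployment-replica-count"
dynamodb = boto3.client("dynamodb")


def main(event, context):
    body = event.get("data", {}).get("body", {})
    if isinstance(body, basestring):  # Python 2.7: the body may arrive as a JSON string
        body = json.loads(body)

    # Upsert the record keyed by the deployment name.
    dynamodb.update_item(
        TableName=TABLE_NAME,
        Key={"deployment_name": {"S": body["deployment_name"]}},
        UpdateExpression="SET replica_count = :c, event_id = :i, event_time = :t",
        ExpressionAttributeValues={
            ":c": {"N": str(body["replica_count"])},
            ":i": {"S": body["event_id"]},
            ":t": {"S": body["event_time"]},
        },
    )
    return {"statusCode": 200}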

In order to trigger the Lambda function, we will need to register it with the Event Gateway and subscribe it to an event. This will provide us with an endpoint that we can call. To register the Lambda function with the Event Gateway run the following (change the AWS region and account number to the ones you use):

curl -X POST \
  https://event-gateway-config-test-dev.moshen-app.net/v1/spaces/aws-lambda/functions \
  -H 'content-type: application/json' \
  -d '{
    "functionId": "update-dynamodb-kubernetes-deployment-replica-count-table",
    "type": "awslambda",
    "provider": {
      "arn": "arn:aws:lambda:eu-west-1:AWS_ACCOUNT_NUMBER:function:update-dynamodb-kubernetes-deployment-replica-count-table",
      "region": "eu-west-1"
    }
  }'

Now we will need to subscribe the Lambda function. As we will want to create a synchronous subscription we will use the HTTP event type:

curl -X POST \
  https://event-gateway-config-test-dev.moshen-app.net/v1/spaces/aws-lambda/subscriptions \
  -H 'content-type: application/json' \
  -d '{
    "functionId": "update-dynamodb-kubernetes-deployment-replica-count-table",
    "event": "http",
    "path": "/aws-lambda/update-dynamodb-kubernetes-deployment-replica-count-table",
    "method": "POST"
  }'

The end result will be an endpoint (https://event-gateway-events-test-dev.moshen-app.net/aws-lambda/update-dynamodb-kubernetes-deployment-replica-count-table), which we can use to trigger the Lambda function via the Event Gateway.

The EC2 instances that host the Event Gateway (the Kubernetes nodes in our case) must have permission to invoke the Lambda function. Add the following statement to the IAM policy used by the instance role (change the AWS region and account number to the ones you use):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "arn:aws:lambda:eu-west-1:AWS_ACCOUNT_NUMBER:function:update-dynamodb-kubernetes-deployment-replica-count-table"
      ],
      "Effect": "Allow"
    }
  ]
}

The Lex Chat Bot Lambda

This Lambda function will be the fulfillment target of the Lex bot. It will call an endpoint on the Event Gateway which will trigger the OpenFaaS function responsible for the Kubernetes replica count change. Before creating the Lambda function we will need to create an IAM role for it. This role will only grant the function the ability to persist the function logs to CloudWatch Logs.

Create an IAM role named WriteToCloudWatchLogs with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}

Next, create a Lambda function named InitKubernetesReplicaCountChange. The function will use Python 2.7 as its runtime and the IAM role we created previously (WriteToCloudWatchLogs).

The function code can be found here. Set the Function Handler to init-kubernetes-replica-count-change.main. As the function does not use any environment variables, leave the Environment variables empty. Leave the function memory at its 128MB default and set the function Timeout to 15 seconds. Also, leave the Network at its No VPC default. An overview of the function configuration can be seen in Image 5.

Image 5. InitKubernetesReplicaCountChange Lambda Configuration
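
Again, the real code is in the linked repository; the sketch below only shows the general shape of such a handler in Python 2.7. It assumes the standard Lex V1 fulfillment event (slots under currentIntent.slots) and posts the two slot values to the Event Gateway endpoint created later in this post for the set-pod-replica OpenFaaS function; the payload field names are my assumptions:

# Hedged sketch of init-kubernetes-replica-count-change.main, not the exact linked code.
import json
import urllib2

# Event Gateway endpoint created later in this post for the set-pod-replica function.
EVENT_GATEWAY_URL = ("https://event-gateway-events-test-dev.moshen-app.net"
                     "/test-infra/set-pod-replica")


def main(event, context):
    slots = event["currentIntent"]["slots"]
    payload = {
        "deployment_name": slots["deploymentName"],
        "replica_count": int(slots["replicaCount"]),
    }

    # Fire the request that will end up triggering the OpenFaaS function.
    request = urllib2.Request(
        EVENT_GATEWAY_URL,
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    urllib2.urlopen(request, timeout=10)

    # Close the Lex intent with a short confirmation message.
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": "Requested %d replicas for %s" % (
                    payload["replica_count"], payload["deployment_name"]),
            },
        }
    }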

Deploying OpenFaaS inside Kubernetes

Deploying the OpenFaaS Infrastructure

The official deployment can be found here. As I already had the Prometheus and AlertManager services up and running, and also wanted to use my own namespace, I have modified the deployment a bit. I will not deploy the asynchronous processing components as I don’t need them for now. Also, I will not use RBAC and will use the Kubernetes Ingress.

Create the deployment of the OpenFaaS components by modifying the faasnetesd.yaml and faas-gateway.yaml files to fit your configuration and then run the following commands (change the namespace to the one you are using):

$ kubectl -n test-infra create -f open-faas/faasnetesd.yaml
$ kubectl -n test-infra create -f open-faas/faas-gateway.yaml

This should create the OpenFaaS daemon for Kubernetes integration and the OpenFaaS gateway:

$ kubectl -n test-infra get pod | grep faas 
faas-gateway-bfc7895c-f8k96 1/1 Running 0 2m
faas-netesd-5b9df8c887-w9tld 1/1 Running 0 3m

To be able to access the OpenFaaS Gateway from outside the cluster I needed to add the following to my Ingress controller configuration:

- host: faas-gateway-test-dev.moshen-app.net
  http:
    paths:
    - path: /
      backend:
        serviceName: faas-gateway
        servicePort: 8080

I have created a DNS record named faas-gateway-test-dev.moshen-app.net pointing to the ALB. Browsing to this address will expose the OpenFaaS UI which can be used to deploy and test the functions.

To test the OpenFaaS infrastructure, deploy the nslookup function from the OpenFaaS function store. You can use the gateway UI to trigger the function, or use curl:

$ curl https://faas-gateway-test-dev.moshen-app.net/function/nslookup -d "google.com" 

Name: google.com
Address 1: 216.58.198.78 dub08s02-in-f78.1e100.net
Address 2: 2a00:1450:400b:802::200e dub08s02-in-x0e.1e100.net

To scrape the metrics provided by the OpenFaaS gateway I have added the following to the Prometheus configuration file:

# OpenFaaS Metrics
- job_name: open_faas_metrics
  scrape_interval: 5s
  static_configs:
  - targets: ['faas-gateway:8080']

The gateway_function_invocation_total metric for the nslookup function can be seen in Image 6.

Image 6. gateway_function_invocation_total Prometheus Metric

Deploying the OpenFaaS Function

Now we need to deploy the OpenFaaS function that will change the number of replicas in a specific Kubernetes deployment. The function will also update the DynamoDB table holding the state of the deployment by invoking the Lambda function we created earlier (update-dynamodb-kubernetes-deployment-replica-count-table) via the Event Gateway. It will pass to the Lambda function the name of the deployment, the current replica count, and the time and ID of the event received from the Event Gateway which triggered the OpenFaaS function.

First, we need to download and install the OpenFaaS CLI tool:

$ curl -sL https://cli.openfaas.com | sudo sh

We will use the faas-cli to generate the function skeleton file structure (the function itself is written in Python 2.7):

$ faas-cli new --lang python set-pod-replica

We will need to update three files. The first file, named handler.py, will contain the function’s code itself. The second file, named requirements.txt, is a Python requirements file; the modules specified in it will be added to the function’s container during the build stage. The third file, named set-pod-replica.yml, contains the function configuration (change the file to meet your preferred settings). In my case, I have set the provider gateway to point to the OpenFaaS gateway deployed earlier inside the Kubernetes cluster. As I use AWS ECR, I have set the function image to point to a repository named set-pod-replica hosted in the ECR:

provider:
  name: faas
  gateway: https://faas-gateway-test-dev.moshen-app.net

functions:
  set-pod-replica:
    lang: python
    handler: ./set-pod-replica
    image: AWS_ACCOUNT_NUMBER.dkr.ecr.eu-west-1.amazonaws.com/set-pod-replica
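
The handler.py linked above does the actual work; as a rough sketch of what it might look like, the following Python 2.7 code patches the deployment through the Kubernetes API server and then reports back through the Event Gateway. It assumes the Event Gateway passes the whole CloudEvent as the request body, that the pod’s service account is allowed to patch deployments (RBAC is not used here), and that the requests module is listed in requirements.txt; the payload field names match the Lambda sketches above and are my assumptions:

# Hedged sketch of handler.py for set-pod-replica, not the author's exact code.
import json

import requests

K8S_API = "https://kubernetes.default.svc"
TOKEN_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CA_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
NAMESPACE = "test-infra"
# Event Gateway endpoint that fronts the update-dynamodb Lambda function.
UPDATE_DYNAMODB_URL = ("https://event-gateway-events-test-dev.moshen-app.net"
                       "/aws-lambda/update-dynamodb-kubernetes-deployment-replica-count-table")


def handle(req):
    cloud_event = json.loads(req)
    body = cloud_event.get("data", {}).get("body", {})
    if isinstance(body, basestring):  # the body may arrive as a JSON string
        body = json.loads(body)

    deployment = body["deployment_name"]
    replicas = int(body["replica_count"])

    # Patch the deployment's replica count (extensions/v1beta1 assumed for a 1.8 cluster).
    token = open(TOKEN_FILE).read().strip()
    resp = requests.patch(
        "%s/apis/extensions/v1beta1/namespaces/%s/deployments/%s"
        % (K8S_API, NAMESPACE, deployment),
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/merge-patch+json"},
        data=json.dumps({"spec": {"replicas": replicas}}),
        verify=CA_FILE,
    )
    resp.raise_for_status()

    # Report the change back through the Event Gateway so the DynamoDB table is updated.
    requests.post(UPDATE_DYNAMODB_URL, json={
        "deployment_name": deployment,
        "replica_count": replicas,
        "event_id": cloud_event.get("eventID"),
        "event_time": cloud_event.get("eventTime"),
    })
    return json.dumps({"deployment": deployment, "replicas": replicas})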

To build the function run the following:

$ faas-cli build -f ./set-pod-replica.yml

Basically, this builds a Docker image containing everything needed to execute the function when it is called.

We can use the OpenFaaS CLI to push the image to the Docker registry. As I am using ECR, I had to log in to the service before pushing (you can use Docker Hub or any other registry you prefer, just update the image key in the set-pod-replica.yml file):

$ $(aws ecr get-login --no-include-email --region eu-west-1)

With a successful login we can now push the container:

$ faas-cli push -f ./set-pod-replica.yml

With the container pushed we can finally deploy the function:

$ faas-cli deploy -f ./set-pod-replica.yml

Looking at the OpenFaaS function from the Kubernetes point of view, the function is deployed as a pod:

$ kubectl -n test-infra get pod set-pod-replica-5cfbf9dd5b-g5rld
NAME READY STATUS RESTARTS AGE
set-pod-replica-5cfbf9dd5b-g5rld 1/1 Running 0 1m

Creating an Endpoint for the OpenFaaS Function in the Event Gateway

We will need to register and subscribe the OpenFaaS function with the Event Gateway. This will provide us with an endpoint that we can call to trigger the function. To register the function with the Event Gateway run the following:

curl -X POST \
  https://event-gateway-config-test-dev.moshen-app.net/v1/spaces/test-infra/functions \
  -H 'Content-Type: application/json' \
  -d '{
    "space": "test-infra",
    "functionId": "set-pod-replica",
    "type": "http",
    "provider": {
      "url": "http://faas-gateway:8080/function/set-pod-replica"
    }
  }'

Notice that the provider URL we are using is the cluster internal DNS name of the faas-gateway service (http://faas-gateway:8080/function/set-pod-replica). We can do this because both the Event Gateway and the OpenFaaS gateway are deployed inside the Kubernetes cluster.

Now we will need to subscribe the function. As we will want to create a synchronous subscription we will use the HTTP event type:

curl -X POST \
  https://event-gateway-config-test-dev.moshen-app.net/v1/spaces/test-infra/subscriptions \
  -H 'Content-Type: application/json' \
  -d '{
    "space": "test-infra",
    "event": "http",
    "functionId": "set-pod-replica",
    "method": "POST",
    "path": "/test-infra/set-pod-replica"
  }'

Completing the stages above will result in a URL (https://event-gateway-events-test-dev.moshen-app.net/test-infra/set-pod-replica), which we can use as an endpoint for the OpenFaaS function in the Event Gateway. The InitKubernetesReplicaCountChange Lambda function we created earlier will use this endpoint.
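
Before wiring up the Lex bot, the endpoint can be sanity-checked with a short script (a sketch assuming the requests library and the same payload field names used in the handler sketch above):

# Quick sanity check of the set-pod-replica endpoint; field names are assumptions.
import requests

resp = requests.post(
    "https://event-gateway-events-test-dev.moshen-app.net/test-infra/set-pod-replica",
    json={"deployment_name": "logstash-x-pack-service", "replica_count": 2},
)
print(resp.status_code)
print(resp.text)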

Deploying the Lex Bot

We will use the Lex bot as an interface to the infrastructure we have deployed. When triggered, the bot will ask for the name of the Kubernetes deployment and the desired replica count. As a final precaution, the bot will confirm the deployment name and the desired replica count with the user.

Create a custom Lex bot named KubernetesBot (Image 7.). Set the Output voice to None. Leave the IAM role at its default.

Image 7. Creating the KubernetesBot Lex bot

In the bot configuration page (Image 8.) create an intent named ChangeReplicaCount with the utterance “Change replica count”. Add two slots: one named deploymentName with the prompt “Which Deployment?”, and a second named replicaCount with the prompt “How many replicas do you wish?”.

Check the Confirmation prompt checkbox and add a confirmation prompt: “Are you sure you want to change the number of replicas to {replicaCount} for the {deploymentName} deployment?”.

In the Fulfillment section choose AWS Lambda function and select the InitKubernetesReplicaCountChange Lambda function from the drop-down list.

Image 8. KubernetesBot Configuration Page

Testing the Flow

With all the pieces in place, we can now test our flow. I will try to increase the number of replicas of a deployment named logstash-x-pack-service. Currently, the deployment has two replicas:

$ kubectl -n test-infra get deployment logstash-x-pack-service 
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
logstash-x-pack-service 2 2 2 2 86d

And the DynamoDB table before the change is illustrated in Image 9.

Image 9. DynamoDB Table Before Replica Increment

Now, let us use the Lex bot to set the logstash-x-pack-service deployment replica count to 3 (Image 10.):

Image 10. Increment of Replica Count by the Lex Bot

Next, the Lex bot will confirm the change to the logstash-x-pack-service deployment replica count, and after approving it the bot will trigger the change (Image 11).

Image 11. Verification of Replica Count Change by the Lex Bot

Checking the deployment status, we can see that the replica count was increased to 3:

$ kubectl -n test-infra get deployment logstash-x-pack-service                                                                                   
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
logstash-x-pack-service 3 3 3 3 86d

In the DynamoDB table, we can see that the replica count was updated to 3. Also, the event ID and time were updated to the values of the new event triggered by the Lex bot (Image 12.).

Image 12. DynamoDB Table After Replica Increment

Conclusion

The Event Gateway helps in correlating the event flow between AWS Lambda and Kubernetes-based OpenFaaS functions. The ability to abstract the different FaaS solutions from one another behind simple endpoints in the Event Gateway is a very good approach in my opinion. For example, you can replace the Lambda function the OpenFaaS function calls with a function from a different FaaS provider just by performing the modification on the Event Gateway. In this case, the OpenFaaS function will be unaware of the change (as long as the endpoint in the Event Gateway is kept the same).

The combination of different FaaS solutions, both in the cloud and “on-prem”, together with the Event Gateway and CloudEvents opens a great range of possibilities for DevOps engineers and software architects alike.
