Add CloudWatch Alarms to a Lambda Function
Better monitoring for your app.
Jargon
What alarms do you want to set for your service?
We want to alarm on the Errors metric on Lambda.
Function errors include:
- exceptions that your code throws, e.g., exceptions for 4xx and 5xx responses
- exceptions that the Lambda runtime throws
Lambda itself doesn’t have 4xx and 5xx metrics.
We want to alarm on 4XX and 5XX errors on API Gateway
(Why do we want to set alarms on API Gateway even though we already have alarms on the Lambda side?)
- We want to set an alarm on 5XX errors, and that has to be done on API Gateway.
- Sometimes Lambda returns a response successfully but takes longer than API Gateway’s 29s limit. The request fails for the caller, yet nothing triggers the alarm on the Lambda side.
- We want to set up an alarm for 4xx responses too, because too many 4xx responses in a short period of time can indicate a DDoS attack.
Evaluating an alarm
see: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarm-evaluation
When you create an alarm, you specify three settings:
- Period is the length of time to evaluate the metric, in seconds.
- Evaluation Periods is the number of the most recent periods to evaluate.
- Datapoints to Alarm is the number of data points within the Evaluation Periods that must be breaching to cause the alarm to go to the ALARM state.
For the alarms defined below, set Statistic to Sum. Minimum won’t work because it always equals 0.
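The evaluation settings above can be sketched in plain Java. This is a simplified model and not CloudWatch's actual implementation: it ignores missing data, and it assumes each invocation contributes a sample of 1 (error) or 0 (success) to the period. The class and method names (AlarmEvaluationSketch, aggregate, isInAlarm) are made up for illustration.

```java
import java.util.List;

public class AlarmEvaluationSketch {

    enum Statistic { SUM, MINIMUM }

    // Collapses the raw samples of one period into a single data point.
    // Assumption: every invocation reports 1 for an error and 0 for a success.
    static double aggregate(List<Integer> samples, Statistic statistic) {
        if (statistic == Statistic.SUM) {
            return samples.stream().mapToInt(Integer::intValue).sum();
        }
        return samples.stream().mapToInt(Integer::intValue).min().orElse(0);
    }

    // Returns true when at least datapointsToAlarm of the most recent
    // evaluationPeriods data points are greater than or equal to the threshold.
    static boolean isInAlarm(List<Double> dataPoints, int evaluationPeriods,
                             int datapointsToAlarm, double threshold) {
        int from = Math.max(0, dataPoints.size() - evaluationPeriods);
        long breaching = dataPoints.subList(from, dataPoints.size()).stream()
                .filter(value -> value >= threshold)
                .count();
        return breaching >= datapointsToAlarm;
    }

    public static void main(String[] args) {
        // One 60-second period: five invocations, one of which failed.
        List<Integer> samples = List.of(0, 0, 1, 0, 0);
        double sum = aggregate(samples, Statistic.SUM);
        double minimum = aggregate(samples, Statistic.MINIMUM);
        System.out.println("Sum=" + sum + ", Minimum=" + minimum);

        // Evaluation Periods = 1 and Threshold = 1, as in the alarms in this post.
        System.out.println("Alarm on Sum: " + isInAlarm(List.of(sum), 1, 1, 1));
        System.out.println("Alarm on Minimum: " + isInAlarm(List.of(minimum), 1, 1, 1));

        // A "2 out of 3" alarm with only one breaching period stays quiet.
        System.out.println("2 out of 3: " + isInAlarm(List.of(0.0, 3.0, 0.0), 3, 2, 1));
    }
}
```

Under that assumption, a single successful invocation in the period is enough to pull Minimum down to 0, so a threshold of 1 is never reached, while Sum counts every error.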
With SAM
MyFunctionErrorsAlarm:
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    ActionsEnabled: true
    AlarmDescription: 'My Function Errors'
    ComparisonOperator: GreaterThanOrEqualToThreshold
    Dimensions:
      - Name: FunctionName
        Value: !Sub "${AWS::StackName}-v1"
    EvaluationPeriods: 1
    MetricName: Errors
    Namespace: AWS/Lambda
    Statistic: Sum
    Period: 60
    Threshold: 1
    TreatMissingData: notBreaching
    AlarmActions:
      - !Sub 'arn:aws:sns:${AWS::Region}:${AWS::AccountId}:MyCompany_Cloudwatch_Alarms_Topic'

MyApiGateway5XXErrorAlarm:
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    ActionsEnabled: true
    AlarmDescription: 'My-Blah-api-v1 ApiGateway 5XXError'
    ComparisonOperator: GreaterThanOrEqualToThreshold
    Dimensions:
      - Name: ApiName
        Value: !Sub "${AWS::StackName}-api-v1"
    EvaluationPeriods: 1
    MetricName: 5XXError
    Namespace: AWS/ApiGateway
    Statistic: Sum
    Period: 60
    Threshold: 1
    TreatMissingData: notBreaching
    AlarmActions:
      - !Sub 'arn:aws:sns:${AWS::Region}:${AWS::AccountId}:MyCompany_Cloudwatch_Alarms_Topic'

MyApiGateway4XXErrorAlarm:
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    ActionsEnabled: true
    AlarmDescription: 'My-Blah-api-v1 ApiGateway 4XXError'
    ComparisonOperator: GreaterThanOrEqualToThreshold
    Dimensions:
      - Name: ApiName
        Value: !Sub "${AWS::StackName}-api-v1"
    EvaluationPeriods: 1
    MetricName: 4XXError
    Namespace: AWS/ApiGateway
    Statistic: Sum
    Period: 60
    Threshold: 20
    TreatMissingData: notBreaching
    AlarmActions:
      - !Sub 'arn:aws:sns:${AWS::Region}:${AWS::AccountId}:MyCompany_Cloudwatch_Alarms_Topic'
About Dimensions
- The Lambda alarm above passes in the function name (FunctionName) of the function to monitor.
- If you want to monitor an API Gateway, pass in the API name instead:
  - Name: ApiName
    Value: !Sub "${AWS::StackName}-api-v1"
With CDK
private void buildCloudwatchAlarm(Function function, RestApi api) {
    // Custom metric: counts log lines containing "ERROR" in the function's log group.
    var internalServerErrorMetric = MetricFilter.Builder.create(this, "LambdaErrorLogMetric")
            .metricName("MyApiLambdaErrors")
            .metricNamespace("my-project/Lambda")
            .filterPattern(FilterPattern.anyTerm("ERROR"))
            .logGroup(function.getLogGroup())
            .build();

    var alarms = List.of(
            // Any Lambda function error.
            Alarm.Builder.create(this, "LambdaFailureAlarm")
                    .metric(function.metricErrors())
                    .threshold(1)
                    .comparisonOperator(ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD)
                    .alarmDescription("My Api lambda failure")
                    .evaluationPeriods(1)
                    .treatMissingData(TreatMissingData.NOT_BREACHING)
                    .build(),
            // A burst of client errors (4xx) on API Gateway.
            Alarm.Builder.create(this, "ApiGateway4XXAlarm")
                    .metric(api.metricClientError())
                    .threshold(20)
                    .comparisonOperator(ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD)
                    .alarmDescription("High number of Api Gateway 4xx failures")
                    .evaluationPeriods(1)
                    .treatMissingData(TreatMissingData.NOT_BREACHING)
                    .build(),
            // Any server error (5xx) on API Gateway.
            Alarm.Builder.create(this, "ApiGateway5XXAlarm")
                    .metric(api.metricServerError())
                    .threshold(1)
                    .comparisonOperator(ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD)
                    .alarmDescription("My Api Gateway 5xx failures")
                    .evaluationPeriods(1)
                    .treatMissingData(TreatMissingData.NOT_BREACHING)
                    .build(),
            // Lambda duration of 10 seconds or more (the metric is in milliseconds).
            Alarm.Builder.create(this, "LambdaDurationAlarm")
                    .metric(function.metricDuration())
                    .threshold(Duration.seconds(10).toMilliseconds())
                    .comparisonOperator(ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD)
                    .alarmDescription("High duration of My Api lambda")
                    .evaluationPeriods(1)
                    .treatMissingData(TreatMissingData.NOT_BREACHING)
                    .build(),
            // Any "ERROR" line logged by the function (the custom metric above).
            Alarm.Builder.create(this, "LambdaErrorLogAlarm")
                    .metric(internalServerErrorMetric.metric())
                    .threshold(1)
                    .comparisonOperator(ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD)
                    .alarmDescription("My Api lambda errors logged")
                    .evaluationPeriods(1)
                    .treatMissingData(TreatMissingData.NOT_BREACHING)
                    .build()
    );

    // Send alarm notifications to the SNS topic only when one is configured.
    if (DeployEnvironment.getAlarmArn() != null) {
        ITopic alarmTopic = Topic.fromTopicArn(this, "MyCompanyAlarmTopic", DeployEnvironment.getAlarmArn());
        alarms.forEach(alarm -> alarm.addAlarmAction(new SnsAction(alarmTopic)));
    }
}
Set your metric
- Start from your component/function, e.g., ApiGateway = RestApi or Lambda = Function.
- Each of them has its own metrics; let your IDE help.
evaluationPeriods + threshold
- Together they define how many occurrences during the evaluated period will trigger the alarm.
- E.g., for metricClientError, we don’t want to know about each user’s error. However, it’s a good idea to monitor for a high number of client errors in a short period of time, since it could be a DDoS attack. Currently the alarm triggers at 20 client errors per minute. We may raise this number later.
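To make the "20 client errors per minute" rule concrete, here is a minimal plain-Java sketch of how a Sum statistic over 60-second periods relates to the threshold. It only illustrates the idea and is not how CloudWatch is implemented; the class name, method names, and timestamps are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ClientErrorBuckets {

    // Groups error timestamps (epoch seconds) into fixed-length periods and
    // counts the errors in each one, which is what the Sum statistic reports.
    static Map<Long, Integer> sumPerPeriod(List<Long> errorTimestamps, long periodSeconds) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (long timestamp : errorTimestamps) {
            long periodStart = (timestamp / periodSeconds) * periodSeconds;
            sums.merge(periodStart, 1, Integer::sum);
        }
        return sums;
    }

    // Returns the start time of every period whose count is >= threshold.
    static List<Long> breachingPeriods(Map<Long, Integer> sums, int threshold) {
        return sums.entrySet().stream()
                .filter(entry -> entry.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical traffic: 25 client errors in the first minute,
        // then 3 scattered client errors in the second minute.
        List<Long> timestamps = new ArrayList<>();
        for (long second = 0; second < 25; second++) {
            timestamps.add(second);
        }
        timestamps.addAll(List.of(60L, 75L, 90L));

        Map<Long, Integer> sums = sumPerPeriod(timestamps, 60);
        System.out.println("Errors per period: " + sums);
        System.out.println("Breaching periods: " + breachingPeriods(sums, 20));
    }
}
```

Only the first minute reaches the threshold, so a few stray client errors stay quiet while a burst raises the alarm.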
Notes
- Don’t try to name your alarms; each alarm will pick up a name from AWS, so we don’t have to maintain a naming convention.
- An AWS tutorial on creating a CloudWatch alarm based on a static threshold: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ConsoleAlarms.html