Spring Boot 2.x Performance Monitoring with Actuator

Edited on 2022-07-04 In SpringBoot

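1. Preface

Spring Boot Actuator: service monitoring and management

Actuator covers a wide range of services, including commonly used ones such as AMQP, the JVM, and caches. The packages under the actuator package are listed below: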

amqp,audit,beans,cache,cassandra,context,couchbase,elasticsearch,endpoint,env,flyway,health,influx,info,integration,jdbc,jms,ldap,liquibase,logging,mail,management,metrics,mongo,neo4j,redis,scheduling,security,session,solr,system,http,web

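That is fairly comprehensive coverage.

Update 2021-04-20: added the differences between 2.4.2 and earlier versions.

2. Service Monitoring and Management

Maven dependency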


<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Main configuration

In Spring Boot 2.x, only the info and health endpoints are exposed over HTTP by default (from 2.5 onward, only health). Exposing other endpoints requires configuration:

# Expose all endpoints
management:
  endpoints: # note: "endpoints" (plural)
    web:
      # Default base path
      base-path: /actuator
      exposure:
        # Endpoint IDs that should be included or '*' for all.
        include: '*'
    jmx:
      # Endpoints JMX domain name.
      domain: org.springframework.boot
      exposure:
        # Endpoint IDs that should be included or '*' for all.
        include: '*'
  endpoint: # note: "endpoint" (singular)
    health:
      # Show detailed health information
      show-details: always
    # Enable the shutdown endpoint; a POST request to it shuts the application down
    shutdown:
      enabled: true

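Monitoring status

After startup, visit http://localhost:8062/boot/actuator/health to see the health status of the project.

Visit http://localhost:8062/boot/actuator to see which endpoints are available.

HealthIndicators are auto-configured by Spring Boot, so the health information shown here depends on the technology stack the project uses: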
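The URLs above can also be requested programmatically. The following is a minimal sketch using only the JDK's `java.net.http` API (Java 11+); the class and method names (`ActuatorRequests`, `endpointUri`, `get`) are hypothetical helpers, and the base URL is the one used in this article. It only builds the request, so it runs without a server.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

class ActuatorRequests {

    /** Joins the Actuator base path and an endpoint id into one URI. */
    static URI endpointUri(String actuatorBase, String endpointId) {
        // Tolerate a trailing slash on the base path.
        String base = actuatorBase.endsWith("/")
                ? actuatorBase.substring(0, actuatorBase.length() - 1)
                : actuatorBase;
        return URI.create(base + "/" + endpointId);
    }

    /** Builds (but does not send) a GET request for the given endpoint. */
    static HttpRequest get(String actuatorBase, String endpointId) {
        return HttpRequest.newBuilder(endpointUri(actuatorBase, endpointId))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Base URL taken from the article; adjust port and context path as needed.
        HttpRequest health = get("http://localhost:8062/boot/actuator", "health");
        System.out.println(health.method() + " " + health.uri());
    }
}
```

To actually send the request against a running application, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.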

Name	Description
CassandraHealthIndicator	Checks that a Cassandra database is up.
DiskSpaceHealthIndicator	Checks for low disk space.
DataSourceHealthIndicator	Checks that a connection to the DataSource can be obtained.
ElasticsearchHealthIndicator	Checks that an Elasticsearch cluster is up.
InfluxDbHealthIndicator	Checks that an InfluxDB server is up.
JmsHealthIndicator	Checks that a JMS broker is up.
MailHealthIndicator	Checks that a mail server is up.
MongoHealthIndicator	Checks that a Mongo database is up.
Neo4jHealthIndicator	Checks that a Neo4j server is up.
RabbitHealthIndicator	Checks that a Rabbit server is up.
RedisHealthIndicator	Checks that a Redis server is up.
SolrHealthIndicator	Checks that a Solr server is up.

Common endpoints

List the available endpoints:

http://localhost:8062/boot/actuator/

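env endpoint: returns the application's environment information, including environment variables, JVM properties, application configuration, and command-line arguments.
localhost:8080/actuator/env

mappings endpoint: shows how URLs map to controllers.
localhost:8080/actuator/mappings

metrics endpoint: the application metrics endpoint. It provides runtime information about the application, such as memory usage, HTTP request statistics, and external resource metrics.
List all metrics: localhost:8080/actuator/metrics
View the details of one metric: localhost:8080/actuator/metrics/jvm.gc.pause

loggers endpoint: lists the configurable loggers and their levels.
localhost:8080/actuator/loggers
View the details of a specific logger: localhost:8080/actuator/loggers/{name}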
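Besides reading logger levels, the loggers endpoint accepts a POST to `/actuator/loggers/{name}` with a JSON body containing `configuredLevel` to change a level at runtime. A minimal sketch using the JDK HTTP API follows; the class name `LoggerLevelRequest` and the logger name `com.example.demo` are hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class LoggerLevelRequest {

    /** Builds a POST request that sets configuredLevel for one logger. */
    static HttpRequest setLevel(String actuatorBase, String loggerName, String level) {
        // JSON body understood by the loggers endpoint.
        String body = "{\"configuredLevel\": \"" + level + "\"}";
        return HttpRequest.newBuilder(URI.create(actuatorBase + "/loggers/" + loggerName))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // "com.example.demo" is a placeholder logger name.
        HttpRequest request = setLevel("http://localhost:8080/actuator", "com.example.demo", "DEBUG");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Because this changes application behavior, the loggers endpoint should only be exposed to trusted callers.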

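Health checks

The health endpoint exposes the application's health status. The level of detail it exposes is controlled by management.endpoint.health.show-details, which accepts one of three values:

Name	Description
never	Details are never shown.
when-authorized	Details are shown only to authorized users. Authorized roles can be configured with management.endpoint.health.roles.
always	Details are shown to all users.

See org.springframework.boot.actuate.health.ShowDetails for details.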
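The three values can be read as a small decision rule. The sketch below is a simplified plain-Java model of that rule, not Spring Boot's own code; the class, enum, and method names are hypothetical. It assumes that with when-authorized and no roles configured, any authenticated user counts as authorized.

```java
import java.util.Set;

class ShowDetailsModel {

    enum ShowDetails { NEVER, WHEN_AUTHORIZED, ALWAYS }

    /** Decides whether health details are visible to the caller. */
    static boolean detailsVisible(ShowDetails mode, boolean authenticated,
                                  Set<String> userRoles, Set<String> configuredRoles) {
        switch (mode) {
            case ALWAYS:
                return true;
            case WHEN_AUTHORIZED:
                if (!authenticated) {
                    return false;
                }
                // Assumption: no configured roles means any authenticated user may see details.
                if (configuredRoles.isEmpty()) {
                    return true;
                }
                return userRoles.stream().anyMatch(configuredRoles::contains);
            default:
                return false; // NEVER
        }
    }

    public static void main(String[] args) {
        // An authenticated user with role "ADMIN" when only "ADMIN" is configured.
        System.out.println(detailsVisible(ShowDetails.WHEN_AUTHORIZED, true,
                Set.of("ADMIN"), Set.of("ADMIN")));
        // An anonymous caller under the same setting.
        System.out.println(detailsVisible(ShowDetails.WHEN_AUTHORIZED, false,
                Set.of(), Set.of("ADMIN")));
    }
}
```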

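Endpoint list

Name	Description
info	Displays basic application information.
health	Displays the application's health status.
metrics	Displays various application metrics.
loggers	Displays and modifies the configured loggers.
logfile	Returns the contents of the log file (if logging.file or logging.path is set; these are named logging.file.name and logging.file.path since 2.2).
httptrace	Displays HTTP trace information (the last 100 HTTP request/response exchanges).
env	Displays the current environment properties.
flyway	Shows details of the Flyway database migrations.
liquibase	Shows details of the Liquibase database migrations.
shutdown	Lets the application be shut down gracefully.
mappings	Displays all @RequestMapping paths.
scheduledtasks	Displays the scheduled tasks in the application.
threaddump	Performs a thread dump.
heapdump	Returns a GZip-compressed JVM heap dump.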

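3. Custom Health Checks

Add the following class next to the main application class, so that component scanning picks it up: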


import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CustomHealthIndicatorDemo {

    // Reports a custom status code ("FATAL") together with an error detail.
    @Bean
    HealthIndicator customHealthIndicator() {
        return () -> Health.status("FATAL")
                .withDetail("error code", "A specific health check failed").build();
    }

    // Reports UP.
    @Bean
    HealthIndicator customUpHealthIndicator() {
        return () -> Health.up().withDetail("success code", "Custom check: all OK, UP").build();
    }

    // Reports DOWN.
    @Bean
    HealthIndicator customDownHealthIndicator() {
        return () -> Health.down().withDetail("success code", "Custom check: all OK, DOWN").build();
    }
}

Visiting http://localhost:8062/boot/actuator/health returns the following.

Here Redis is enabled and the database is MySQL.

{
  "status": "DOWN",
  "details": {
    "custom": {
      "status": "FATAL",
      "details": {
        "error code": "A specific health check failed"
      }
    },
    "customUp": {
      "status": "UP",
      "details": {
        "success code": "Custom check: all OK, UP"
      }
    },
    "customDown": {
      "status": "DOWN",
      "details": {
        "success code": "Custom check: all OK, DOWN"
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": "471182741504",
        "free": "375580655616",
        "threshold": "10485760"
      }
    },
    "db": {
      "status": "UP",
      "details": {
        "database": "MySQL",
        "hello": "1"
      }
    },
    "redis": {
      "status": "UP",
      "details": {
        "version": "5.0.8"
      }
    }
  }
}

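When any check in details has the status DOWN, the overall health status is DOWN; otherwise it is UP.

If the first FATAL is changed to DOWN, the overall health result is likewise DOWN.

The following table shows the default HTTP status mappings for the built-in statuses: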

Status	Mapping
DOWN	SERVICE_UNAVAILABLE (503)
OUT_OF_SERVICE	SERVICE_UNAVAILABLE (503)
UP	No mapping by default, so http status is 200
UNKNOWN	No mapping by default, so http status is 200
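As a rough illustration of the aggregation rule and the mapping table above, here is a simplified plain-Java model. It is not Spring Boot's implementation, and the class and method names are hypothetical. It assumes the default severity order DOWN, OUT_OF_SERVICE, UP, UNKNOWN, and that status codes outside that order (such as FATAL) are ignored during aggregation.

```java
import java.util.Comparator;
import java.util.List;

class HealthStatusModel {

    // Assumed default severity order, most severe first.
    static final List<String> ORDER = List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    /** Picks the most severe known status; unknown codes are skipped. */
    static String aggregate(List<String> statuses) {
        return statuses.stream()
                .filter(ORDER::contains)
                .min(Comparator.comparingInt((String s) -> ORDER.indexOf(s)))
                .orElse("UNKNOWN");
    }

    /** Maps an overall status to the HTTP status code from the table. */
    static int httpStatus(String status) {
        boolean unavailable = status.equals("DOWN") || status.equals("OUT_OF_SERVICE");
        return unavailable ? 503 : 200;
    }

    public static void main(String[] args) {
        // Component statuses from the example response in this article.
        List<String> components = List.of("FATAL", "UP", "DOWN", "UP", "UP", "UP");
        String overall = aggregate(components);
        System.out.println(overall + " -> HTTP " + httpStatus(overall));
    }
}
```

With the component statuses from the example response, this model yields DOWN and therefore 503, which matches the overall status shown there.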

4. Custom Endpoints

Spring Boot supports defining custom endpoints with @Endpoint to expose additional information.


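import java.util.LinkedHashMap;
import java.util.Map;

import org.springframework.boot.actuate.endpoint.annotation.Endpoint;
import org.springframework.boot.actuate.endpoint.annotation.ReadOperation;
import org.springframework.stereotype.Component;

@Endpoint(id = "customEndPoint")
@Component
public class CustomEndPoint {

    // Handles GET /actuator/customEndPoint and returns a small JSON object.
    @ReadOperation
    public Map<String, Object> getInfo() {
        Map<String, Object> dataMap = new LinkedHashMap<>();
        dataMap.put("custom info", "custom endpoint ");
        return dataMap;
    }
}

Requesting http://localhost:8062/boot/actuator/customEndPoint returns:

{
  "custom info": "custom endpoint "
}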

Each operation annotation corresponds to an HTTP method:

Operation	HTTP method
@ReadOperation	GET
@WriteOperation	POST
@DeleteOperation	DELETE

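5. Monitoring Metrics

Metric	Description	Monitored	How it is monitored	Importance
–JVM–
jvm.memory.max	Maximum JVM memory
jvm.memory.committed	Memory committed to the JVM	Yes	Display and monitor heap memory and Metaspace	Important
jvm.memory.used	JVM memory in use	Yes	Display and monitor heap memory and Metaspace	Important
jvm.buffer.memory.used	Memory used by JVM buffer pools
jvm.buffer.count	Current number of buffers
jvm.threads.daemon	Number of JVM daemon threads	Yes	Shown on the monitoring page
jvm.threads.live	Number of live JVM threads	Yes	Shown on the monitoring page; alert when the threshold is reached	Important
jvm.threads.peak	Peak number of JVM threads	Yes	Shown on the monitoring page
jvm.classes.loaded	Number of classes loaded
jvm.classes.unloaded	Number of classes unloaded
jvm.gc.memory.allocated	Memory allocated in the young generation, tracked at GC time
jvm.gc.memory.promoted	Memory promoted to the old generation, tracked at GC time
jvm.gc.max.data.size	Maximum size of the old generation
jvm.gc.live.data.size	Size of the old generation after a full GC
jvm.gc.pause	Time spent in GC pauses	Yes	Shown on the monitoring page
–TOMCAT–
tomcat.sessions.created	Number of sessions created by Tomcat
tomcat.sessions.expired	Number of expired Tomcat sessions
tomcat.sessions.active.current	Current number of active Tomcat sessions
tomcat.sessions.active.max	Maximum number of active Tomcat sessions	Yes	Shown on the monitoring page; exceeding the threshold can trigger an alert or dynamic scale-out	Important
tomcat.sessions.alive.max.second	Longest time a Tomcat session stayed alive
tomcat.sessions.rejected	Number of sessions rejected after the configured session maximum was exceeded	Yes	Shown on the monitoring page to help diagnose problems
tomcat.global.error	Total number of errors	Yes	Shown on the monitoring page to help diagnose problems
tomcat.global.sent	Bytes sent
tomcat.global.request.max	Longest request time
tomcat.global.request	Global request count and time
tomcat.global.received	Bytes received
tomcat.servlet.request	Servlet request count and time
tomcat.servlet.error	Total number of servlet errors
tomcat.servlet.request.max	Longest servlet request time
tomcat.threads.busy	Number of busy Tomcat threads	Yes	Shown on the monitoring page; use it to check for stuck threads
tomcat.threads.current	Current number of Tomcat threads (including daemon threads)	Yes	Shown on the monitoring page	Important
tomcat.threads.config.max	Configured maximum number of Tomcat threads	Yes	Shown on the monitoring page	Important
tomcat.cache.access	Number of Tomcat cache reads
tomcat.cache.hit	Number of Tomcat cache hits
–CPU–
system.cpu.count	Number of CPUs
system.load.average.1m	Load average (1 minute)	Yes	Alert when the threshold is exceeded	Important
system.cpu.usage	System CPU usage
process.cpu.usage	CPU usage of the current process	Yes	Alert when the threshold is exceeded
http.server.requests	HTTP request statistics	Yes	Show the 10 URLs with the most requests and the longest latency; count non-200 requests	Important
process.uptime	Application uptime	Yes	Shown on the monitoring page
process.files.max	Maximum number of file handles allowed	Yes	Used together with the number of open file handles
process.start.time	Application start time	Yes	Shown on the monitoring page
process.files.open	Number of currently open file handles	Yes	Monitor file handle usage; alert when the threshold is exceeded	Important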
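The last rows suggest combining process.files.open with process.files.max into a usage ratio and alerting above a threshold. A minimal sketch of that calculation follows; the class name, the sample values, and the 0.8 threshold are all hypothetical, and the table does not prescribe a specific threshold.

```java
class FileHandleCheck {

    /** Ratio of open file handles to the allowed maximum. */
    static double usage(double open, double max) {
        if (max <= 0) {
            throw new IllegalArgumentException("max must be positive");
        }
        return open / max;
    }

    /** True when the usage ratio has reached the alert threshold. */
    static boolean shouldAlert(double open, double max, double threshold) {
        return usage(open, max) >= threshold;
    }

    public static void main(String[] args) {
        // Sample values standing in for process.files.open and process.files.max.
        double open = 820;
        double max = 1024;
        double threshold = 0.8; // hypothetical alert threshold
        System.out.println("usage=" + usage(open, max)
                + " alert=" + shouldAlert(open, max, threshold));
    }
}
```

In practice the two inputs would come from /actuator/metrics/process.files.open and /actuator/metrics/process.files.max.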

`Spring Boot 2.4.2`

`jvm.gc.pause`

{
  "name": "jvm.gc.pause",
  "description": "Time spent in GC pause",
  "baseUnit": "seconds",
  "measurements": [
    {
      "statistic": "COUNT",
      "value": 8
    },
    {
      "statistic": "TOTAL_TIME",
      "value": 0.255
    },
    {
      "statistic": "MAX",
      "value": 0
    }
  ],
  "availableTags": [
    {
      "tag": "cause",
      "values": [
        "Metadata GC Threshold",
        "Allocation Failure"
      ]
    },
    {
      "tag": "action",
      "values": [
        "end of minor GC",
        "end of major GC"
      ]
    }
  ]
}
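The COUNT and TOTAL_TIME measurements in this response can be combined into a mean pause time: 0.255 s / 8 = 0.031875 s, roughly 32 ms per pause. A small sketch of that arithmetic, with a hypothetical class name and the measurements held in a plain map rather than parsed from JSON:

```java
import java.util.Map;

class GcPauseStats {

    /** Mean pause in seconds: TOTAL_TIME divided by COUNT (0 when there were no pauses). */
    static double meanPauseSeconds(Map<String, Double> measurements) {
        double count = measurements.getOrDefault("COUNT", 0.0);
        if (count == 0) {
            return 0.0;
        }
        return measurements.getOrDefault("TOTAL_TIME", 0.0) / count;
    }

    public static void main(String[] args) {
        // Values copied from the jvm.gc.pause response above.
        Map<String, Double> measurements = Map.of(
                "COUNT", 8.0,
                "TOTAL_TIME", 0.255,
                "MAX", 0.0);
        System.out.println("mean GC pause (s): " + meanPauseSeconds(measurements));
    }
}
```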

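Fixing the `httptrace` 404

Spring Boot no longer recommends using httptrace.

According to the official notes, since Spring Boot 2.2 the endpoint is only available when an HttpTraceRepository bean is defined; registering an in-memory repository restores it: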

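import org.springframework.boot.actuate.trace.http.InMemoryHttpTraceRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBootAdminConfig {

    // Registers an in-memory trace repository so the httptrace endpoint is available.
    @Bean
    public InMemoryHttpTraceRepository getInMemoryHttpTrace() {
        return new InMemoryHttpTraceRepository();
    }

}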

Article URL: https://github.com/maxzhao-it/blog/post/12162/