Spring Cloud Netflix Eureka Parameter Tuning
The core Eureka parameters are summarized below in two groups: client side and server side.
Client-side parameters
Core client parameters
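Parameter | Default | Description |
---|---|---|
eureka.client.availability-zones | | Tells the client which regions and availability zones exist; changes take effect at runtime |
eureka.client.filter-only-up-instances | true | Whether to keep only instances whose InstanceStatus is UP |
eureka.client.region | us-east-1 | Region in which this application instance runs; applies to AWS datacenters |
eureka.client.register-with-eureka | true | Whether to register this application instance with Eureka Server |
eureka.client.prefer-same-zone-eureka | true | Whether to prefer a Eureka Server in the same zone as this instance |
eureka.client.on-demand-update-status-change | true | Whether local instance status changes are pushed to Eureka Server immediately through ApplicationInfoManager |
eureka.instance.metadata-map | | Metadata for the application instance |
eureka.instance.prefer-ip-address | false | Whether to use the IP address instead of the host name as the instance's hostName field |
eureka.instance.lease-expiration-duration-in-seconds | 90 | How long Eureka Server waits after the last heartbeat before it treats the instance's lease as expired and eligible for eviction |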
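Scheduled task parameters
Parameter | Default | Description |
---|---|---|
eureka.client.cache-refresh-executor-thread-pool-size | 2 | Thread pool size for the cache refresh thread (CacheRefreshThread) |
eureka.client.cache-refresh-executor-exponential-back-off-bound | 10 | (Cache refresh) Upper bound on the back-off multiplier applied to the delay before the next run after a scheduled execution times out |
eureka.client.heartbeat-executor-thread-pool-size | 2 | Thread pool size for the heartbeat thread (HeartBeatThread) |
eureka.client.heartbeat-executor-exponential-back-off-bound | 10 | (Heartbeat) Upper bound on the back-off multiplier applied to the delay before the next run after a scheduled execution times out |
eureka.client.registry-fetch-interval-seconds | 30 | Scheduling interval of CacheRefreshThread |
eureka.client.eureka-service-url-poll-interval-seconds | 5*60 | Interval at which AsyncResolver.updateTask refreshes the Eureka Server addresses |
eureka.client.initial-instance-info-replication-interval-seconds | 40 | Initial delay before InstanceInfoReplicator syncs instance info changes to Eureka Server |
eureka.client.instance-info-replication-interval-seconds | 30 | Interval at which InstanceInfoReplicator syncs instance info changes to Eureka Server |
eureka.instance.lease-renewal-interval-in-seconds | 30 | Interval at which Eureka Client sends heartbeats to Eureka Server |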
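To make the two `exponential-back-off-bound` parameters more concrete, here is a minimal sketch of one way such a bound can work: the delay doubles after each timed-out run, is capped at the base interval multiplied by the bound, and resets after a successful run. This is an illustration under that assumption, not a copy of Eureka's scheduler code; the class `BackoffSketch` and its method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a bounded exponential back-off.
final class BackoffSketch {
    private final long baseDelayMs;   // e.g. registry-fetch-interval-seconds
    private final long maxDelayMs;    // base delay * back-off bound
    private long currentDelayMs;

    BackoffSketch(int intervalSeconds, int backOffBound) {
        this.baseDelayMs = TimeUnit.SECONDS.toMillis(intervalSeconds);
        this.maxDelayMs = baseDelayMs * backOffBound;
        this.currentDelayMs = baseDelayMs;
    }

    // Called when a scheduled run times out: double the delay, up to the cap.
    long onTimeout() {
        currentDelayMs = Math.min(maxDelayMs, currentDelayMs * 2);
        return currentDelayMs;
    }

    // Called when a scheduled run succeeds: go back to the base delay.
    long onSuccess() {
        currentDelayMs = baseDelayMs;
        return currentDelayMs;
    }

    public static void main(String[] args) {
        BackoffSketch backoff = new BackoffSketch(30, 10);
        for (int i = 0; i < 5; i++) {
            System.out.println("delay after timeout: " + backoff.onTimeout() + " ms");
        }
        System.out.println("delay after success: " + backoff.onSuccess() + " ms");
    }
}
```

With a 30-second interval and a bound of 10, the delay in this sketch can never grow beyond 300 seconds, however many runs time out in a row.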
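HTTP parameters
Eureka Client communicates with Eureka Server through an underlying httpClient, which exposes the following parameters:
Parameter | Default | Description |
---|---|---|
eureka.client.eureka-server-connect-timeout-seconds | 5 | Connection timeout |
eureka.client.eureka-server-read-timeout-seconds | 8 | Read timeout |
eureka.client.eureka-server-total-connections | 200 | Maximum number of active connections in the connection pool |
eureka.client.eureka-server-total-connections-per-host | 50 | Maximum number of connections per host |
eureka.client.eureka-connection-idle-timeout-seconds | 30 | Idle time allowed for connections in the pool |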
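Server-side parameters
These fall into four groups: basic parameters, response cache parameters, peer-related parameters, and HTTP parameters.
Basic parameters
Parameter | Default | Description |
---|---|---|
eureka.server.enable-self-preservation | true | Whether self-preservation mode is enabled |
eureka.server.renewal-percent-threshold | 0.85 | Fraction of the expected renewals per minute that must be received; used to compute the renewal threshold |
eureka.instance.registry.expected-number-of-renews-per-min | 1 | Expected number of renewals per minute; in practice the code hard-codes this value to count*2, and it is also updated periodically |
eureka.server.renewal-threshold-update-interval-ms | 15 minutes | Scheduling interval of the updateRenewalThreshold task, which dynamically updates expectedNumberOfRenewsPerMin and numberOfRenewsPerMinThreshold |
eureka.server.eviction-interval-timer-in-ms | 60*1000 | Scheduling interval of the EvictionTask, which evicts expired instances |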
Response cache parameters
To improve the performance of its REST API, Eureka Server provides two caches: readOnlyCacheMap, based on ConcurrentMap, and readWriteCacheMap, based on Guava Cache. The related parameters are:
Parameter | Default | Description |
---|---|---|
eureka.server.use-read-only-response-cache | true | Whether to use the read-only response cache |
eureka.server.response-cache-update-interval-ms | 30*1000 | Scheduling interval of the CacheUpdateTask, which copies data from readWriteCacheMap to readOnlyCacheMap. Only takes effect when eureka.server.use-read-only-response-cache is true |
eureka.server.response-cache-auto-expiration-in-seconds | 180 | Sets the expireAfterWrite value of readWriteCacheMap, that is, how long after being written an entry expires |
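The interaction between the two caches is easier to see in a small model. The sketch below is a simplified, hypothetical stand-in for the two-level cache described above: plain `ConcurrentHashMap`s replace the Guava cache, expiry is left out, and all class and method names are invented for illustration. It only shows why a read can return stale data until the periodic sync runs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a read-only cache in front of a read-write cache.
final class TwoLevelCacheSketch {
    private final Map<String, String> registry = new ConcurrentHashMap<>();   // source of truth
    private final Map<String, String> readWrite = new ConcurrentHashMap<>();  // invalidated on change
    private final Map<String, String> readOnly = new ConcurrentHashMap<>();   // refreshed periodically

    void register(String app, String payload) {
        registry.put(app, payload);
        readWrite.remove(app);          // only the read-write level is invalidated
    }

    void cancel(String app) {
        registry.remove(app);
        readWrite.remove(app);
    }

    // Reads go to the read-only level first and fall back to the read-write level.
    String get(String app) {
        String cached = readOnly.get(app);
        if (cached != null) {
            return cached;
        }
        String fresh = load(app);
        readOnly.put(app, fresh);
        return fresh;
    }

    // Stand-in for the periodic task that copies read-write values into the read-only level.
    void syncReadOnly() {
        for (String key : readOnly.keySet()) {
            readOnly.put(key, load(key));
        }
    }

    // An empty string represents "no instances" because ConcurrentHashMap rejects null values.
    private String load(String app) {
        return readWrite.computeIfAbsent(app, k -> registry.getOrDefault(k, ""));
    }

    public static void main(String[] args) {
        TwoLevelCacheSketch cache = new TwoLevelCacheSketch();
        cache.register("order-service", "UP");
        System.out.println("first read:   " + cache.get("order-service"));
        cache.cancel("order-service");
        System.out.println("after cancel: " + cache.get("order-service"));
        cache.syncReadOnly();
        System.out.println("after sync:   '" + cache.get("order-service") + "'");
    }
}
```

In this model a cancelled instance remains visible to readers until `syncReadOnly()` runs, which mirrors the role of `response-cache-update-interval-ms` in the table above.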
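Peer-related parameters
Parameter | Default | Description |
---|---|---|
eureka.server.peer-eureka-nodes-update-interval-ms | 10 minutes | Scheduling interval of peersUpdateTask, which reloads the peerEurekaNodes configuration from the configuration file (the zone settings under eureka.client.serviceUrl) |
eureka.server.peer-eureka-status-refresh-time-interval-ms | 30*1000 | Interval at which peer node status information is refreshed |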
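HTTP parameters
Eureka Server has to communicate with its peer nodes to replicate instance information. It uses an underlying httpClient for this, which exposes the following parameters:
Parameter | Default | Description |
---|---|---|
eureka.server.peer-node-connect-timeout-ms | 200 | Connection timeout |
eureka.server.peer-node-read-timeout-ms | 200 | Read timeout |
eureka.server.peer-node-total-connections | 1000 | Maximum number of active connections in the connection pool |
eureka.server.peer-node-total-connections-per-host | 500 | Maximum number of connections per host |
eureka.server.peer-node-connection-idle-timeout-seconds | 30 | Idle time allowed for connections in the pool |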
Parameter tuning
Common questions
1. Why does the Eureka Server API still return a service after it has gone offline?
2. Why can't Eureka Client see a newly started service promptly?
3. Why does the following warning appear?
EMERGENCY! EUREKA MAY BE INCORRECTLY CLAIMING INSTANCES ARE UP WHEN THEY'RE NOT. RENEWALS ARE LESSER THAN THRESHOLD AND HENCE THE INSTANCES ARE NOT BEING EXPIRED JUST TO BE SAFE.
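Solutions:
1. Eureka Server is not strongly consistent, so the registry can retain information about expired instances. The reasons are:
- The application instance crashed and could not tell Eureka Server to take it offline before it died. Removing it then depends on Eureka Server's EvictionTask.
- The application instance did notify Eureka Server when it went offline, but the Eureka Server REST API has a response cache, so the change is only visible once the cache has been refreshed or has expired.
- Eureka Server has self-preservation enabled and has entered SELF PRESERVATION mode, so registry entries are not evicted on expiry until it leaves SELF PRESERVATION mode.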
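For clients that go offline without notifying Eureka Server, you can adjust the scheduling frequency of the EvictionTask, for example lowering the interval from the default 60s to 5s:
eureka:
  server:
    eviction-interval-timer-in-ms: 5000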
For the response cache issue, you can consider disabling readOnlyCacheMap, depending on your situation:
eureka:
  server:
    use-read-only-response-cache: false
Or adjust the expiration time of readWriteCacheMap:
eureka:
  server:
    response-cache-auto-expiration-in-seconds: 60
For the SELF PRESERVATION issue, you can set enable-self-preservation to false in a test environment:
eureka:
  server:
    enable-self-preservation: false
After it is turned off, the following message is shown:
THE SELF PRESERVATION MODE IS TURNED OFF. THIS MAY NOT PROTECT INSTANCE EXPIRY IN CASE OF NETWORK/OTHER PROBLEMS.
Or:
RENEWALS ARE LESSER THAN THE THRESHOLD. THE SELF PRESERVATION MODE IS TURNED OFF. THIS MAY NOT PROTECT INSTANCE EXPIRY IN CASE OF NETWORK/OTHER PROBLEMS.
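2. When Eureka Client does not pick up newly started services quickly enough, you can, in a test environment, make the client fetch registry information from the server more often. The example below changes the default 30s to 5s:
eureka:
  client:
    registry-fetch-interval-seconds: 5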
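As a rough way to reason about the settings discussed so far, the sketch below adds up the two delays that sit between a registry change and a client seeing it: the server-side read-only cache refresh and the client-side registry fetch. It is a back-of-the-envelope estimate under simplifying assumptions (no network latency, no further caches on the client side, no eviction delay), and the class and method names are hypothetical.

```java
// Hypothetical estimate of how long a registry change can take to reach a client.
final class PropagationDelaySketch {

    // Worst case: the change just missed the read-only cache refresh,
    // and the client just missed the refreshed value with its own fetch.
    static int worstCaseVisibilitySeconds(boolean useReadOnlyCache,
                                          int responseCacheUpdateSeconds,
                                          int registryFetchSeconds) {
        int serverSideDelay = useReadOnlyCache ? responseCacheUpdateSeconds : 0;
        return serverSideDelay + registryFetchSeconds;
    }

    public static void main(String[] args) {
        // Defaults from the tables above: 30s cache update, 30s registry fetch.
        System.out.println("defaults: up to "
                + worstCaseVisibilitySeconds(true, 30, 30) + "s");
        // Read-only cache disabled and registry fetched every 5s.
        System.out.println("tuned:    up to "
                + worstCaseVisibilitySeconds(false, 30, 5) + "s");
    }
}
```

The estimate makes the trade-off visible: shorter intervals reduce staleness but increase the request load on Eureka Server, which is why the text limits these changes to test environments.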
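3. In production, network jitter and similar problems often cause heartbeats between service instances and Eureka Server to be missed even though the instances themselves are healthy. Evicting them under the normal lease expiration mechanism would be a misjudgment, and if that happened on a large scale, most of the registry could be deleted, leaving no available services. To address this, Eureka introduced the SELF PRESERVATION mechanism: when the number of renewals received in the last minute is less than or equal to the configured threshold, lease expiration is switched off and the scheduled task is not allowed to evict expired instances, which protects the registry information.
In production, you can lower renewalPercentThreshold and leaseRenewalIntervalInSeconds a little, which raises the bar for triggering the SELF PRESERVATION mechanism. Note that renewal-percent-threshold is a server-side setting under eureka.server, while lease-renewal-interval-in-seconds is set on each client instance:
eureka:
  instance:
    lease-renewal-interval-in-seconds: 10 # default is 30
  server:
    renewal-percent-threshold: 0.49 # default is 0.85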
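The effect of renewal-percent-threshold can be checked with a little arithmetic. The sketch below assumes, as noted in the basic parameters table, that the expected renewals per minute are computed as count*2, and that the threshold is that value multiplied by the percentage and truncated to an integer. The class and method names are hypothetical, and the trigger condition is a simplification of the rule described above.

```java
// Hypothetical worked example of the self-preservation threshold arithmetic.
final class SelfPreservationSketch {

    // Expected renewals are assumed to be hard-coded as two per instance per minute.
    static int renewsThreshold(int registeredInstances, double renewalPercentThreshold) {
        int expectedRenewsPerMin = registeredInstances * 2;
        return (int) (expectedRenewsPerMin * renewalPercentThreshold);
    }

    // Simplified rule: self-preservation applies when renewals are at or below the threshold.
    static boolean selfPreservationTriggered(int renewsInLastMin, int threshold) {
        return renewsInLastMin <= threshold;
    }

    public static void main(String[] args) {
        int strict = renewsThreshold(7, 0.85);   // default percentage
        int relaxed = renewsThreshold(7, 0.49);  // tuned percentage
        int received = 4 * 2;                    // only 4 of 7 instances still send heartbeats

        System.out.println("threshold at 0.85: " + strict
                + ", triggered: " + selfPreservationTriggered(received, strict));
        System.out.println("threshold at 0.49: " + relaxed
                + ", triggered: " + selfPreservationTriggered(received, relaxed));
    }
}
```

With seven registered instances, losing the heartbeats of three of them triggers self-preservation at 0.85 but not at 0.49, so with the lower value the server keeps evicting expired instances in that situation.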
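Monitoring metrics
Eureka has built-in metrics based on Servo, defined in com.netflix.eureka.util.EurekaMonitors. Spring Boot 2.x switched to Micrometer and no longer supports Netflix Servo, supporting its replacement, Netflix Spectator, instead. For Servo, however, you can call DefaultMonitorRegistry.getInstance().getRegisteredMonitors to get all registered Monitors and then read their metric values.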
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//
package com.netflix.eureka.util;
import com.netflix.appinfo.AmazonInfo;
import com.netflix.appinfo.ApplicationInfoManager;
import com.netflix.appinfo.DataCenterInfo;
import com.netflix.appinfo.AmazonInfo.MetaDataKey;
import com.netflix.appinfo.DataCenterInfo.Name;
import com.netflix.servo.DefaultMonitorRegistry;
import com.netflix.servo.annotations.DataSourceType;
import com.netflix.servo.annotations.Monitor;
import com.netflix.servo.monitor.Monitors;
import java.util.concurrent.atomic.AtomicLong;
public enum EurekaMonitors {
    // Total number of renewals received since startup
    RENEW("renewCounter", "Number of total renews seen since startup"),
    // Total number of lease cancellations received since startup
    CANCEL("cancelCounter", "Number of total cancels seen since startup"),
    // Total number of registry queries since startup
    GET_ALL_CACHE_MISS("getAllCacheMissCounter", "Number of total registery queries seen since startup"),
    // Total number of delta registry queries since startup
    GET_ALL_CACHE_MISS_DELTA("getAllCacheMissDeltaCounter", "Number of total registery queries for delta seen since startup"),
    // Total number of registry queries using remote regions since startup
    GET_ALL_WITH_REMOTE_REGIONS_CACHE_MISS("getAllWithRemoteRegionCacheMissCounter", "Number of total registry with remote region queries seen since startup"),
    // Total number of registry queries using both remote regions and delta since startup
    GET_ALL_WITH_REMOTE_REGIONS_CACHE_MISS_DELTA("getAllWithRemoteRegionCacheMissDeltaCounter", "Number of total registry queries for delta with remote region seen since startup"),
    // Total number of delta queries since startup
    GET_ALL_DELTA("getAllDeltaCounter", "Number of total deltas since startup"),
    // Total number of delta queries that passed regions since startup
    GET_ALL_DELTA_WITH_REMOTE_REGIONS("getAllDeltaWithRemoteRegionCounter", "Number of total deltas with remote regions since startup"),
    // Number of queries to '/{version}/apps' since startup
    GET_ALL("getAllCounter", "Number of total registry queries seen since startup"),
    // Number of queries to '/{version}/apps' with the regions parameter since startup
    GET_ALL_WITH_REMOTE_REGIONS("getAllWithRemoteRegionCounter", "Number of total registry queries with remote regions, seen since startup"),
    // Total number of requests to '/{version}/apps/{appId}' since startup
    GET_APPLICATION("getApplicationCounter", "Number of total application queries seen since startup"),
    // Total number of registrations since startup
    REGISTER("registerCounter", "Number of total registers seen since startup"),
    // Total number of expired instances evicted since startup
    EXPIRED("expiredCounter", "Number of total expired leases since startup"),
    // Total number of statusUpdate calls since startup
    STATUS_UPDATE("statusUpdateCounter", "Number of total admin status updates since startup"),
    // Total number of deleteStatusOverride calls since startup
    STATUS_OVERRIDE_DELETE("statusOverrideDeleteCounter", "Number of status override removals"),
    // Number of cancel requests since startup for which the instance could not be found
    CANCEL_NOT_FOUND("cancelNotFoundCounter", "Number of total cancel requests on non-existing instance since startup"),
    // Number of renew requests since startup for which the instance could not be found
    RENEW_NOT_FOUND("renewNotFoundexpiredCounter", "Number of total renew on non-existing instance since startup"),
    // Number of replications rejected because the queue was full
    REJECTED_REPLICATIONS("numOfRejectedReplications", "Number of replications rejected because of full queue"),
    // Number of failed replications, most likely caused by timeouts
    FAILED_REPLICATIONS("numOfFailedReplications", "Number of failed replications - likely from timeouts"),
    // Number of requests discarded because the rate limiter is enabled
    RATE_LIMITED("numOfRateLimitedRequests", "Number of requests discarded by the rate limiter"),
    // Number of requests that would be discarded if the rate limiter were enabled
    RATE_LIMITED_CANDIDATES("numOfRateLimitedRequestCandidates", "Number of requests that would be discarded if the rate limiter's throttling is activated"),
    // Number of full registry fetch requests discarded while the rate limiter is enabled
    RATE_LIMITED_FULL_FETCH("numOfRateLimitedFullFetchRequests", "Number of full registry fetch requests discarded by the rate limiter"),
    // Number of full registry fetch requests that would be discarded if the rate limiter were enabled
    RATE_LIMITED_FULL_FETCH_CANDIDATES("numOfRateLimitedFullFetchRequestCandidates", "Number of full registry fetch requests that would be discarded if the rate limiter's throttling is activated");

    private final String name;
    private final String myZoneCounterName;
    private final String description;
    @Monitor(
        name = "count",
        type = DataSourceType.COUNTER
    )
    private final AtomicLong counter = new AtomicLong();
    @Monitor(
        name = "count-minus-replication",
        type = DataSourceType.COUNTER
    )
    private final AtomicLong myZoneCounter = new AtomicLong();

    private EurekaMonitors(String name, String description) {
        this.name = name;
        this.description = description;
        DataCenterInfo dcInfo = ApplicationInfoManager.getInstance().getInfo().getDataCenterInfo();
        if (dcInfo.getName() == Name.Amazon) {
            this.myZoneCounterName = ((AmazonInfo)dcInfo).get(MetaDataKey.availabilityZone) + "." + name;
        } else {
            this.myZoneCounterName = "dcmaster." + name;
        }
    }

    public void increment() {
        this.increment(false);
    }

    public void increment(boolean isReplication) {
        this.counter.incrementAndGet();
        if (!isReplication) {
            this.myZoneCounter.incrementAndGet();
        }
    }

    public String getName() {
        return this.name;
    }

    public String getZoneSpecificName() {
        return this.myZoneCounterName;
    }

    public String getDescription() {
        return this.description;
    }

    public long getCount() {
        return this.counter.get();
    }

    public long getZoneSpecificCount() {
        return this.myZoneCounter.get();
    }

    public static void registerAllStats() {
        EurekaMonitors[] var0 = values();
        int var1 = var0.length;
        for(int var2 = 0; var2 < var1; ++var2) {
            EurekaMonitors c = var0[var2];
            Monitors.registerObject(c.getName(), c);
        }
    }

    public static void shutdown() {
        EurekaMonitors[] var0 = values();
        int var1 = var0.length;
        for(int var2 = 0; var2 < var1; ++var2) {
            EurekaMonitors c = var0[var2];
            DefaultMonitorRegistry.getInstance().unregister(Monitors.newObjectMonitor(c.getName(), c));
        }
    }
}