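Cause
In the morning, Grafana sent an alert email warning that the system was throwing too many exceptions.
Logged in to Grafana to take a look.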
Investigation
Logged in to Kibana to check the related logs.
Found the first error:
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.UnknownHostException: redis.marathon.l4lb.thisdcos.directory
    at redis.clients.jedis.Connection.connect(Connection.java:207)
    at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:93)
    at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1767)
    at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:106)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:868)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
    at redis.clients.util.Pool.getResource(Pool.java:49)
    ... 119 common frames omitted
Caused by: java.net.UnknownHostException: redis.marathon.l4lb.thisdcos.directory
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at redis.clients.jedis.Connection.connect(Connection.java:184)
    ... 126 common frames omitted
The DC/OS VIP failed to map to an IP, so the services could not resolve the Redis host name.
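One quick way to confirm this kind of failure from inside an affected container is to ask the JVM to resolve the name directly. The following is a minimal sketch, not part of the original services: the class name DnsProbe and the method canResolve are hypothetical, and the default host is simply the one taken from the stack trace.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class DnsProbe {
    private DnsProbe() {}

    /** Returns true if the JVM can resolve the host name to at least one address. */
    static boolean canResolve(String host) {
        try {
            InetAddress.getAllByName(host);
            return true;
        } catch (UnknownHostException e) {
            // Same exception type that Jedis wrapped in JedisConnectionException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host name taken from the stack trace; pass another name as the first argument.
        String host = args.length > 0 ? args[0] : "redis.marathon.l4lb.thisdcos.directory";
        System.out.println(host + " resolvable: " + canResolve(host));
    }
}
```

If the probe reports false while other names resolve normally, the problem is likely in the VIP name mapping rather than in Redis itself.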
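Next, the Redis monitoring dashboard in Grafana showed that Redis I/O and the setnx command metrics were both abnormal, as shown in the figure below.
Checked the related services and found the following code: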
while (!flag && start <= System.currentTimeMillis() + time) {
    flag = redisTemplate.execute(new RedisCallback<Boolean>() {
        @Override
        public Boolean doInRedis(RedisConnection connection) throws DataAccessException {
            Jedis jedis = (Jedis) connection.getNativeConnection();
            if (jedis.setnx(key, UID) == 1L) {
                // Expire after 300 seconds to prevent deadlock.
                // If the JVM dies before this step, the lock is never released.
                jedis.expire(key, LOCK_DEATH_TIME);
                LOCK_MAP.put(RedisLock.this, 1);
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }
    });
}
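There is no sleep between setnx attempts, so a thread that fails to get the lock spins in a tight loop and hammers Redis with setnx calls.
Solution: add a sleep inside the while loop.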
while (!flag && start <= System.currentTimeMillis() + time) {
    flag = redisTemplate.execute(new RedisCallback<Boolean>() {
        @Override
        public Boolean doInRedis(RedisConnection connection) throws DataAccessException {
            Jedis jedis = (Jedis) connection.getNativeConnection();
            if (jedis.setnx(key, UID) == 1L) {
                // Expire after 300 seconds to prevent deadlock.
                // If the JVM dies before this step, the lock is never released.
                jedis.expire(key, LOCK_DEATH_TIME);
                LOCK_MAP.put(RedisLock.this, 1);
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            // Sleep before the next attempt while competing for the lock.
            try {
                TimeUnit.MILLISECONDS.sleep(20L);
            } catch (InterruptedException e) {
                log.error("lock error", e);
            }
            log.info("Competing for lock: " + key);
            return false;
        }
    });
}
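The retry logic can also be pulled out of the Redis callback, so that no connection is held while the thread sleeps. The following is a minimal sketch under stated assumptions: LockRetry and acquireWithRetry are hypothetical names, and a BooleanSupplier stands in for the setnx attempt. It also computes the deadline once up front. If start in the loop above holds the time the loop began, the condition start <= System.currentTimeMillis() + time would never become false, so an explicit deadline may be worth considering.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class LockRetry {
    private LockRetry() {}

    /**
     * Calls attempt until it returns true or timeoutMillis has elapsed,
     * sleeping up to sleepMillis between attempts. Returns false on timeout
     * or if the thread is interrupted while waiting.
     */
    static boolean acquireWithRetry(BooleanSupplier attempt, long timeoutMillis, long sleepMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false; // timed out
            }
            try {
                // Never sleep past the deadline.
                TimeUnit.NANOSECONDS.sleep(Math.min(remaining, TimeUnit.MILLISECONDS.toNanos(sleepMillis)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for the setnx call: succeeds on the third attempt.
        boolean locked = acquireWithRetry(() -> calls.incrementAndGet() >= 3, 1000, 20);
        System.out.println("locked=" + locked + " attempts=" + calls.get());
    }
}
```

In the real lock, the supplier would wrap the redisTemplate.execute call. Separately, the window between setnx and expire mentioned in the code comment can be closed with the Redis SET command and its NX and EX options, which set the value and the expiry in one atomic step.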