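Since the topic is master-slave switching, there are three main scenarios to consider:
We will walk through them in the order of the message flow.
First, look at the core method the Consumer uses to pull messages: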
public class PullAPIWrapper {
    protected PullResult pullKernelImpl(final MessageQueue mq, ...) throws ... {
        // Look up the Broker to pull from
        FindBrokerResult findBrokerResult =
            this.mQClientFactory.findBrokerAddressInSubscribe(mq.getBrokerName(), this.recalculatePullFromWhichNode(mq), false);
        ......
    }
}
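Before pulling from a given MessageQueue, the Consumer needs that queue's Broker information, such as brokerAddr and the broker name, so that it knows which address to send the request to.
The call to recalculatePullFromWhichNode does what the name says: it recalculates which node to pull from. Its code is: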
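/**
 * Recalculates the id of the Broker to pull from for a message queue.
 * If an earlier response carried a suggested BrokerId, that id is used.
 *
 * @param mq the message queue
 * @return the Broker id
 */
public long recalculatePullFromWhichNode(final MessageQueue mq) {
    // If the "connect to the user-specified Broker" switch is on, return defaultBrokerId
    // (the Master id, 0, unless it has been changed)
    if (this.isConnectBrokerByUser()) {
        return this.defaultBrokerId;
    }
    // If a suggested Broker is recorded for this queue, return its id
    AtomicLong suggest = this.pullFromWhichNodeTable.get(mq);
    if (suggest != null) {
        return suggest.get();
    }
    // Otherwise fall back to the Master id
    return MixAll.MASTER_ID;
}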
By default connectBrokerByUser = false, so the suggested BrokerId for the current MessageQueue is read from pullFromWhichNodeTable.
So when are the values in pullFromWhichNodeTable filled in? See the following code:
public class PullAPIWrapper {
    /**
     * Processes a pull result
     */
    public PullResult processPullResult(final MessageQueue mq, final PullResult pullResult, final SubscriptionData subscriptionData) {
        PullResultExt pullResultExt = (PullResultExt) pullResult;
        // Record which Broker to pull this queue from next time
        this.updatePullFromWhichNode(mq, pullResultExt.getSuggestWhichBrokerId());
        ......
    }

    /**
     * Updates the mapping from message queue to the Broker id to pull from
     *
     * @param mq the message queue
     * @param brokerId the Broker id
     */
    public void updatePullFromWhichNode(final MessageQueue mq, final long brokerId) {
        AtomicLong suggest = this.pullFromWhichNodeTable.get(mq);
        if (null == suggest) {
            this.pullFromWhichNodeTable.put(mq, new AtomicLong(brokerId));
        } else {
            suggest.set(brokerId);
        }
    }
}
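Each time the Consumer processes a pull result, it updates the suggested BrokerId for the next pull on that MessageQueue. The value comes from the pull response, which means it is chosen by the Broker. On the next request, the Consumer uses this suggested brokerId as its preferred Broker.
On the Consumer's first pull there is no previous suggestion, so MixAll.MASTER_ID is used as the BrokerId.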
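The round trip described above can be modelled in isolation. The following is a minimal sketch, not RocketMQ code: `SuggestionTable` and the plain `String` queue key are hypothetical stand-ins for `PullAPIWrapper` and `MessageQueue`, and the `connectBrokerByUser` branch is left out. Only the lookup and update logic follows the excerpts.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the bookkeeping in PullAPIWrapper; not RocketMQ code.
class SuggestionTable {
    // Same value as MixAll.MASTER_ID.
    static final long MASTER_ID = 0L;

    // Queue key -> broker id suggested by the most recent pull response.
    private final ConcurrentMap<String, AtomicLong> pullFromWhichNodeTable =
        new ConcurrentHashMap<>();

    // Returns the broker id to pull from next; the master if nothing was suggested yet.
    long recalculatePullFromWhichNode(String queueKey) {
        AtomicLong suggest = pullFromWhichNodeTable.get(queueKey);
        return suggest != null ? suggest.get() : MASTER_ID;
    }

    // Records the broker id carried in a pull response.
    void updatePullFromWhichNode(String queueKey, long brokerId) {
        pullFromWhichNodeTable
            .computeIfAbsent(queueKey, k -> new AtomicLong())
            .set(brokerId);
    }
}
```

With this model, a queue that has never been pulled resolves to the master, and a response suggesting broker 1 redirects only that queue's next pull; other queues are unaffected.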
Once the suggested BrokerId is known, the Consumer looks up the matching Broker information locally, using the findBrokerAddressInSubscribe method seen at the start:
public class MQClientInstance {
    /**
     * Looks up Broker information
     *
     * @param brokerName the broker name
     * @param brokerId the broker id
     * @param onlyThisBroker whether only this exact broker is acceptable
     * @return the Broker information
     */
    public FindBrokerResult findBrokerAddressInSubscribe(final String brokerName, final long brokerId, final boolean onlyThisBroker) {
        String brokerAddr = null; // broker address
        boolean slave = false;    // whether the broker is a slave
        boolean found = false;    // whether a broker was found
        // Get the addresses registered under this broker name
        HashMap<Long/* brokerId */, String/* address */> map = this.brokerAddrTable.get(brokerName);
        if (map != null && !map.isEmpty()) {
            brokerAddr = map.get(brokerId);
            slave = brokerId != MixAll.MASTER_ID;
            found = brokerAddr != null;
            // If an exact match is not required, pick any available Broker.
            // This is the key to master-slave switching: when the Master is down, the first
            // remaining Broker in iteration order is used. For small Long keys, HashMap
            // iterates in ascending key order (an implementation detail, not a guarantee).
            if (!found && !onlyThisBroker) {
                Entry<Long, String> entry = map.entrySet().iterator().next();
                brokerAddr = entry.getValue();
                slave = entry.getKey() != MixAll.MASTER_ID;
                found = true;
            }
        }
        // A broker was found: return its information
        if (found) {
            return new FindBrokerResult(brokerAddr, slave);
        }
        // Nothing found: return null
        return null;
    }
}
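The method looks up a Broker by brokerName and brokerId. Suppose no address is registered under that brokerId, for example because the Master was requested but has crashed. In that case the TopicRouteData fetched from Namesrv no longer lists the Master, the client has refreshed brokerAddrTable and removed the Master's brokerAddr, and only Slaves remain. The method therefore falls back to one of the remaining Slaves. For small Long keys, a HashMap iterates in ascending key order (the hash code of a small non-negative Long equals its value), so in practice the first entry is the Slave with brokerId = 1. Note that this relies on HashMap's implementation rather than on a documented ordering guarantee.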
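The fallback can be exercised without a running cluster. The sketch below is a simplified, hypothetical model of the lookup: `BrokerLookup` and `Result` are illustrative names, the addresses are made up, and the address map is passed in directly instead of being read from `brokerAddrTable`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of findBrokerAddressInSubscribe; not RocketMQ code.
class BrokerLookup {
    // Same value as MixAll.MASTER_ID.
    static final long MASTER_ID = 0L;

    // Result holder, loosely modelled on FindBrokerResult.
    static final class Result {
        final String brokerAddr;
        final boolean slave;

        Result(String brokerAddr, boolean slave) {
            this.brokerAddr = brokerAddr;
            this.slave = slave;
        }
    }

    // Returns the requested broker, or any remaining broker when onlyThisBroker is false.
    static Result find(Map<Long, String> addrsById, long brokerId, boolean onlyThisBroker) {
        if (addrsById == null || addrsById.isEmpty()) {
            return null;
        }
        String addr = addrsById.get(brokerId);
        if (addr != null) {
            return new Result(addr, brokerId != MASTER_ID);
        }
        if (onlyThisBroker) {
            return null;
        }
        // Fallback: take the first entry in iteration order.
        Map.Entry<Long, String> entry = addrsById.entrySet().iterator().next();
        return new Result(entry.getValue(), entry.getKey() != MASTER_ID);
    }

    // Builds a sample address table with one master and one slave (made-up addresses).
    static Map<Long, String> sampleTable() {
        Map<Long, String> table = new HashMap<>();
        table.put(0L, "10.0.0.1:10911");
        table.put(1L, "10.0.0.2:10911");
        return table;
    }
}
```

If the master entry is removed from `sampleTable()` and broker 0 is requested with `onlyThisBroker = false`, the lookup returns the slave's address with `slave = true`. With `onlyThisBroker = true` it returns `null` instead, which is why the pull path passes `false`.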
Next, let's look at how the Broker assigns the suggested BrokerId.
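public class DefaultMessageStore implements MessageStore {
    public GetMessageResult getMessage(......) {
        ...
        // Bytes of messages still waiting to be pulled
        long diff = maxOffsetPy - maxPhyOffsetPulling;
        // Amount of memory allowed to hold messages waiting to be pulled; 40% of total physical memory by default
        long memory = (long) (StoreUtil.TOTAL_PHYSICAL_MEMORY_SIZE * (this.messageStoreConfig.getAccessMessageInMemoryMaxRatio() / 100.0));
        // The pending messages exceed 40% of total memory, i.e. a large backlog has built up.
        // Many of those messages now live only in CommitLog files, so they may have to be read from disk, which is slow.
        // Pulling from the Master is too slow (possibly an IO fault or heavy IO pressure), so suggest pulling from a Slave.
        getResult.setSuggestPullingFromSlave(diff > memory);
        ...
    }
}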
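The threshold check in the excerpt above is plain arithmetic, so it can be isolated and tried with concrete numbers. This is a minimal sketch under stated assumptions: `SlaveSuggestion` and its parameters are hypothetical names, and total physical memory is passed in instead of being read from `StoreUtil`.

```java
// Hypothetical helper that isolates the threshold check from getMessage; not RocketMQ code.
class SlaveSuggestion {
    // Returns true when the unread backlog exceeds the in-memory budget.
    static boolean suggestPullingFromSlave(long maxOffsetPy,
                                           long maxPhyOffsetPulling,
                                           long totalPhysicalMemory,
                                           int accessMessageInMemoryMaxRatio) {
        // Bytes between the end of the CommitLog and the last message just read.
        long diff = maxOffsetPy - maxPhyOffsetPulling;
        // Memory budget, e.g. 40% of physical memory when the ratio is 40.
        long memory = (long) (totalPhysicalMemory * (accessMessageInMemoryMaxRatio / 100.0));
        return diff > memory;
    }
}
```

For example, on a machine with 16 GiB of memory and a ratio of 40, the budget is roughly 6.4 GiB. A backlog of 1 GiB keeps the consumer on the Master, while a backlog of 8 GiB makes the store suggest a Slave.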
When a pull is served, if the bytes still waiting to be pulled exceed 40% of total memory, the store sets suggestPullingFromSlave = true. Here is what happens next:
public class PullMessageProcessor implements NettyRequestProcessor {
    /**
     * Handles a pull request and returns the response
     */
    private RemotingCommand processRequest(final Channel channel, RemotingCommand request, boolean brokerAllowSuspend) throws RemotingCommandException {
        ......
        // The store suggested pulling from a Slave: point the consumer at the Slave with
        // BrokerId = 1 (whichBrokerWhenConsumeSlowly defaults to 1)
        if (getMessageResult.isSuggestPullingFromSlave()) {
            responseHeader.setSuggestWhichBrokerId(subscriptionGroupConfig.getWhichBrokerWhenConsumeSlowly());
        } else {
            responseHeader.setSuggestWhichBrokerId(MixAll.MASTER_ID);
        }
        switch (this.brokerController.getMessageStoreConfig().getBrokerRole()) {
            case ASYNC_MASTER:
            case SYNC_MASTER:
                break;
            case SLAVE: // This request hit a Slave; with the default slaveReadEnable = false, tell the consumer to retry on the Master
                if (!this.brokerController.getBrokerConfig().isSlaveReadEnable()) {
                    response.setCode(ResponseCode.PULL_RETRY_IMMEDIATELY);
                    responseHeader.setSuggestWhichBrokerId(MixAll.MASTER_ID);
                }
                break;
        }
        // When slaveReadEnable = true (it is false by default), apply the Slave suggestion again
        if (this.brokerController.getBrokerConfig().isSlaveReadEnable()) {
            // consume too slow, redirect to another machine
            if (getMessageResult.isSuggestPullingFromSlave()) {
                responseHeader.setSuggestWhichBrokerId(subscriptionGroupConfig.getWhichBrokerWhenConsumeSlowly());
            }
            // consume ok
            else {
                responseHeader.setSuggestWhichBrokerId(subscriptionGroupConfig.getBrokerId());
            }
        } else { // Slaves are not readable, so keep reading from the Master
            responseHeader.setSuggestWhichBrokerId(MixAll.MASTER_ID);
        }
    }
}
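In the excerpt above, the final if/else always overwrites the earlier assignments, so the suggested id that reaches the Consumer depends only on `slaveReadEnable` and the store's suggestion. The sketch below restates that as a pure function; `SuggestedBrokerId`, `decide`, and the parameter names are hypothetical, and the response code set in the SLAVE branch is deliberately not modelled.

```java
// Hypothetical pure-function restatement of the branches in processRequest; not RocketMQ code.
class SuggestedBrokerId {
    // Same value as MixAll.MASTER_ID.
    static final long MASTER_ID = 0L;

    // Returns the broker id the response header ends up carrying.
    static long decide(boolean slaveReadEnable,
                       boolean suggestPullingFromSlave,
                       long whichBrokerWhenConsumeSlowly,
                       long groupBrokerId) {
        if (!slaveReadEnable) {
            // Slaves are not readable: always send the consumer back to the master.
            return MASTER_ID;
        }
        // Slaves are readable: redirect slow consumers, otherwise use the group's broker id.
        return suggestPullingFromSlave ? whichBrokerWhenConsumeSlowly : groupBrokerId;
    }
}
```

Read this way, a backlog alone never moves a Consumer to a Slave. The Broker must also run with `slaveReadEnable = true`; otherwise every response points back at the Master, and a Slave is reached only through the client-side fallback when the Master's address is gone.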
To summarize the master-slave switching flow: