The previous post gave an overall introduction to h2o.ai and the source tree of its core project h2o.ai/h2o-3; this post analyzes the source code of the H2O startup flow. The sequence diagram of the startup process is shown below:
Below, the key interfaces in the sequence diagram are described in detail:
1. [Step 3] registerCoreExtensions() loads the extension classes
Using the Java ServiceLoader mechanism, this step loads the service classes listed in each module's resources/META-INF/services/water.AbstractH2OExtension provider file (all of these classes extend AbstractH2OExtension). The loaded extension classes are summarized below:
Project | File | Content |
---|---|---|
h2o-core | water.AbstractH2OExtension | water.FailedNodeWatchdogExtension |
h2o-ext-krbstandalone | water.AbstractH2OExtension | hex.security.KerberosExtension |
h2o-ext-xgboost | water.AbstractH2OExtension | hex.tree.xgboost.XGBoostExtension |
h2o-grpc | water.AbstractH2OExtension | ai.h2o.api.GrpcExtension |
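The loading step above follows the standard Java SPI pattern. A minimal stand-alone sketch, using a hypothetical `Extension` interface rather than H2O's actual classes:

```java
import java.util.ServiceLoader;

// Illustrative SPI pattern (names are hypothetical, not H2O's actual code).
// A provider-configuration file META-INF/services/<fully.qualified.Interface>
// lists concrete implementation classes, one per line; ServiceLoader
// instantiates each via its public no-arg constructor.
interface Extension {
    String name();   // stand-in for an extension's display name
    void init();     // stand-in for extension initialization
}

public class ExtensionLoaderSketch {
    public static void main(String[] args) {
        // Iterating the loader lazily discovers and instantiates providers.
        for (Extension ext : ServiceLoader.load(Extension.class)) {
            ext.init();
            System.out.println("registered extension: " + ext.name());
        }
    }
}
```

A provider jar ships a `META-INF/services/Extension` file (named after the fully qualified interface) listing its implementation classes; with no providers on the classpath the loop simply does nothing.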
2. [Steps 5, 6] startLocalNode() starts the local node and the local cloud, with the node as the cloud's only member
```java
/** Initializes the local node and the local cloud with itself as the only member. */
private static void startLocalNode() {
  // Figure self out; this is surprisingly hard
  NetworkInit.initializeNetworkSockets();
  // Do not forget to put SELF into the static configuration (to simulate
  // proper multicast behavior)
  if( !ARGS.client && STATIC_H2OS != null && !STATIC_H2OS.contains(SELF)) {
    Log.warn("Flatfile configuration does not include self: " + SELF + " but contains " + STATIC_H2OS);
    STATIC_H2OS.add(SELF);
  }
  // ...
}
```
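The flatfile check in this method can be reduced to the following sketch, with nodes modeled as plain "ip:port" strings instead of H2ONode objects (the helper name is hypothetical, not H2O's):

```java
import java.util.Set;

// Simplified version of the flatfile check in startLocalNode(): if a static
// node list (flatfile) was supplied and the local node is missing from it,
// add the local node so the cloud configuration includes SELF.
public class FlatfileCheck {
    /** Add the local node to the static node list if a flatfile was given but omits it. */
    public static void ensureSelf(Set<String> flatfile, String self, boolean client) {
        if (!client && flatfile != null && !flatfile.contains(self)) {
            // in H2O this also logs: "Flatfile configuration does not include self: ..."
            flatfile.add(self);
        }
    }
}
```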
startLocalNode() then calls [Step 6] initializeNetworkSockets(), which starts a Jetty server to serve the Web API (by default at ip:port localhost:54321):
```java
public static void initializeNetworkSockets( ) {
  // Assign initial ports
  H2O.API_PORT = H2O.ARGS.port == 0 ? H2O.ARGS.baseport : H2O.ARGS.port;
  // Late instantiation of Jetty object, if needed.
  if (H2O.getJetty() == null && !H2O.ARGS.disable_web) {
    H2O.setJetty(new JettyHTTPD());
  }
  // API socket is only used to find opened port on given ip.
  ServerSocket apiSocket = null;
  // At this point we would like to allocate 2 consecutive ports
  while (true) {
    H2O.H2O_PORT = H2O.API_PORT + 1;
    try {
      if (!H2O.ARGS.disable_web) {
        apiSocket = H2O.ARGS.web_ip == null // Listen to any interface
            ? new ServerSocket(H2O.API_PORT)
            : new ServerSocket(H2O.API_PORT, -1, getInetAddress(H2O.ARGS.web_ip));
        apiSocket.setReuseAddress(true);
      }
      // Bind to the UDP socket
      _udpSocket = DatagramChannel.open();
      _udpSocket.socket().setReuseAddress(true);
      InetSocketAddress isa = new InetSocketAddress(H2O.SELF_ADDRESS, H2O.H2O_PORT);
      _udpSocket.socket().bind(isa);
      // Bind to the TCP socket also
      _tcpSocket = ServerSocketChannel.open();
      _tcpSocket.socket().setReceiveBufferSize(water.AutoBuffer.TCP_BUF_SIZ);
      _tcpSocket.socket().bind(isa);
      // Warning: There is a ip:port race between socket close and starting Jetty
      if (!H2O.ARGS.disable_web) {
        apiSocket.close();
        H2O.getJetty().start(H2O.ARGS.web_ip, H2O.API_PORT);
      }
      break;
    } catch (Exception e) {
      // ...
    }
    // Try next available port to bound
    H2O.API_PORT += 2;
    // ...
  }
}
```
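The port-pair allocation loop above can be reduced to the following sketch: H2O needs two consecutive ports, API_PORT for the Jetty web API and API_PORT + 1 (H2O_PORT) for internal UDP/TCP traffic, and it advances by 2 on failure. The helper name here is hypothetical, and the real code additionally binds UDP and TCP channels and hands the web port to Jetty:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Simplified stand-alone illustration of the allocation loop, not H2O code:
// probe port pairs (p, p+1) until both can be bound, stepping by 2 on failure.
public class PortProbe {
    /** Find the first port p >= base such that both p and p+1 can be bound. */
    public static int findConsecutivePorts(int base) {
        for (int p = base; p < 65534; p += 2) {           // mirrors API_PORT += 2
            try (ServerSocket api = new ServerSocket(p);  // web API port
                 ServerSocket internal = new ServerSocket(p + 1)) { // H2O_PORT
                return p;                                 // both ports were free
            } catch (IOException e) {
                // at least one of the two ports is taken; try the next pair
            }
        }
        throw new IllegalStateException("no free consecutive port pair above " + base);
    }
}
```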
3. [Steps 8, 9] initializePersistence() initializes the persistence layer; the following four storage backends are currently supported:
Key | Description |
---|---|
ICE | Distributed local-disk store |
HDFS | Connects to a backend hadoop-hdfs cluster |
S3 | Amazon S3 object storage |
NFS | Standard file system |
```java
static void initializePersistence() {
  _PM = new PersistManager(ICE_ROOT);
}

public PersistManager(URI iceRoot) {
  I = new Persist[MAX_BACKENDS];
  stats = new PersistStatsEntry[MAX_BACKENDS];
  for (int i = 0; i < stats.length; i++) {
    stats[i] = new PersistStatsEntry();
  }
  // ...
  I[Value.ICE ] = ice;
  I[Value.NFS ] = new PersistNFS();
  try {
    Class klass = Class.forName("water.persist.PersistHdfs");
    java.lang.reflect.Constructor constructor = klass.getConstructor();
    I[Value.HDFS] = (Persist) constructor.newInstance();
    Log.info("HDFS subsystem successfully initialized");
  }
  catch (Throwable ignore) {
    Log.info("HDFS subsystem not available");
  }
  try {
    Class klass = Class.forName("water.persist.PersistS3");
    java.lang.reflect.Constructor constructor = klass.getConstructor();
    I[Value.S3] = (Persist) constructor.newInstance();
    Log.info("S3 subsystem successfully initialized");
  } catch (Throwable ignore) {
    Log.info("S3 subsystem not available");
  }
}
```
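The Class.forName pattern above is what keeps the HDFS and S3 backends optional: the class is looked up by name at runtime, so h2o-core runs even when the backend jars are absent. A generic sketch of that pattern (the helper name is ours, not H2O's):

```java
// Optional-dependency loading via reflection: look a class up by name and
// instantiate it through its no-arg constructor, returning null if the class
// (or anything it links against) is missing from the classpath.
public class OptionalBackend {
    /** Instantiate className if present on the classpath, else return null. */
    public static Object tryLoad(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (Throwable ignore) {
            // NoClassDefFoundError, ClassNotFoundException, etc.: not available
            return null;
        }
    }
}
```

Catching Throwable (rather than Exception) matters here: a missing transitive dependency surfaces as a NoClassDefFoundError, which is an Error, not an Exception.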
4. [Step 11] startNetworkServices() initializes the network services, starting service threads such as UDPReceiver, TCPReceiver, heartbeat, and the Cleaner (which flushes K/V-store data to persistent storage).
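The thread-startup pattern behind such background services can be sketched as follows; this is a generic daemon-thread helper whose name and shape are illustrative, not H2O's actual code:

```java
// Background services (heartbeat, cleaner, receivers) typically run on daemon
// threads so they keep working for the life of the process but never prevent
// the JVM from exiting.
public class ServiceStarter {
    /** Start a named daemon thread running the given service body. */
    public static Thread startDaemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);   // do not block JVM shutdown
        t.start();
        return t;
    }
}
```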
5. [Step 14] getAllProviderNames(true) loads the data-source parsers
Using ServiceLoader, this step loads the parser providers listed in each module's resources/META-INF/services/water.parser.ParserProvider file. The supported source types are:
Project | File | Content | Source |
---|---|---|---|
h2o-core | water.parser.ParserProvider | water.parser.DefaultParserProviders$ArffParserProvider<br>water.parser.DefaultParserProviders$XlsParserProvider<br>water.parser.DefaultParserProviders$SVMLightParserProvider<br>water.parser.DefaultParserProviders$CsvParserProvider<br>water.parser.DefaultParserProviders$GuessParserProvider | Built-in formats: ARFF, XLS, CSV, SVMLight (GUESS is not a data format) |
h2o-orc-parser | water.parser.ParserProvider | water.parser.orc.OrcParserProvider | Apache ORC |
h2o-parquet-parser | water.parser.ParserProvider | water.parser.parquet.ParquetParserProvider | Apache Parquet |
h2o-avro-parser | water.parser.ParserProvider | water.parser.avro.AvroParserProvider | Apache Avro |
Notes: (1) GuessParserProvider does not parse a "GUESS" format; its job is to detect the format of an unknown data source by trying the parsers one by one in priority order.
(2) OrcParserProvider is not loaded by default, because the gradle build does not package the h2o-orc-parser module by default (see the build and packaging logic in build.gradle). The corresponding switch is configured in the gradle.properties file.
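The trial-parsing behavior described in note (1) can be sketched as priority-ordered format sniffing; all names here are hypothetical, not H2O's actual ParserProvider API:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of "guess" parsing: try each concrete parser in priority order until
// one recognizes the head of the input.
public class GuessSketch {
    interface Parser {
        int priority();                 // lower value = tried first
        boolean canParse(String head);  // cheap format sniff on the first bytes
        String name();
    }

    /** Return the name of the first parser (by priority) that accepts the data. */
    public static String guessFormat(List<Parser> parsers, String head) {
        return parsers.stream()
                .sorted(Comparator.comparingInt(Parser::priority))
                .filter(p -> p.canParse(head))
                .map(Parser::name)
                .findFirst()
                .orElse("unknown format");
    }
}
```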
6. [Step 17] registerResourceRoot() loads static web resources
This loads the static web assets under the h2o-web/src/main/resources/www and h2o-core/src/main/resources/www directories.
7. [Step 18] registerRestApiExtensions() registers the REST API resources
(1) Using the Java ServiceLoader mechanism, this loads the REST API registration classes listed in each module's resources/META-INF/services/water.api.RestApiExtension file. These classes extend AbstractRegister and override registerEndPoints() to register a series of REST endpoints, each of which is essentially a mapping of HTTP method + URI to a handlerClass and handlerMethod.
Project | File | Content |
---|---|---|
h2o-ext-xgboost | water.api.RestApiExtension | hex.api.xgboost.RegisterRestApi |
h2o-core | water.api.RestApiExtension | water.api.RegisterV3Api, water.api.RegisterV4Api |
h2o-automl | water.api.RestApiExtension | water.automl.RegisterRestApi |
h2o-algos | water.api.RestApiExtension | hex.api.RegisterAlgos |
(2) Likewise via ServiceLoader, all Schema classes listed in the META-INF/services/water.api.Schema file are loaded (a Schema is the POJO consumed and produced by the REST API endpoints); the details are analogous and omitted here.
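A minimal sketch of the kind of method + URI → handler mapping that registerEndPoints() builds; all names are illustrative, not H2O's actual registration API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A toy endpoint registry: each entry maps "HTTP-method URI" to a handler
// class and handler method name, which is the shape of an H2O REST route.
public class RestRegistrySketch {
    private final Map<String, String> routes = new LinkedHashMap<>();

    /** Register one endpoint: e.g. GET /3/Frames -> FramesHandler#list. */
    public void register(String method, String uri, String handlerClass, String handlerMethod) {
        routes.put(method + " " + uri, handlerClass + "#" + handlerMethod);
    }

    /** Resolve a request to its registered handler, or null if unmapped. */
    public String lookup(String method, String uri) {
        return routes.get(method + " " + uri);
    }
}
```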