h2o.ai Source Code Analysis (2): Startup Flow

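The previous post gave an overview of h2o.ai and walked through the source tree of its core project, h2o.ai/h2o-3. This post analyzes the source code behind H2O's startup flow. The sequence diagram of the startup process is shown below:
[Figure 1: sequence diagram of the H2O startup flow]
The key calls in the sequence diagram are described in detail below: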

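1. [Step 3] registerCoreExtensions() loads the extension classes
[Figure 2]
This step relies on Java's ServiceLoader mechanism to load the service classes listed in the water.AbstractH2OExtension file under /resources/META-INF/services/ in each module of the project (all of these classes extend AbstractH2OExtension). The extension classes that get loaded are summarized below: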

| Project | File | Content |
| --- | --- | --- |
| h2o-core | water.AbstractH2OExtension | water.FailedNodeWatchdogExtension |
| h2o-ext-krbstandalone | water.AbstractH2OExtension | hex.security.KerberosExtension |
| h2o-ext-xgboost | water.AbstractH2OExtension | hex.tree.xgboost.XGBoostExtension |
| h2o-grpc | water.AbstractH2OExtension | ai.h2o.api.GrpcExtension |
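
To make the ServiceLoader mechanism concrete, here is a minimal, self-contained sketch. It is not H2O code: DemoExtension, WatchdogDemo and SecurityDemo are hypothetical stand-ins for AbstractH2OExtension and its subclasses, and the provider-configuration file is written to a temporary directory at run time instead of being packaged under META-INF/services/ in a module's resources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ExtensionDiscoverySketch {

  /** Hypothetical base class playing the role of an extension type. */
  public abstract static class DemoExtension {
    public abstract String getExtensionName();
  }

  /** Hypothetical provider #1 (needs a public no-arg constructor). */
  public static class WatchdogDemo extends DemoExtension {
    @Override public String getExtensionName() { return "watchdog"; }
  }

  /** Hypothetical provider #2. */
  public static class SecurityDemo extends DemoExtension {
    @Override public String getExtensionName() { return "security"; }
  }

  /** Writes a provider-configuration file, then lets ServiceLoader discover the providers. */
  public static List<String> discover() {
    try {
      // The file must be named after the service type and list one provider class per line.
      Path root = Files.createTempDirectory("ext-demo");
      Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
      String providers = WatchdogDemo.class.getName() + "\n" + SecurityDemo.class.getName() + "\n";
      Files.write(services.resolve(DemoExtension.class.getName()),
                  providers.getBytes(StandardCharsets.UTF_8));

      // A class loader that can see the temporary directory as a classpath root.
      URL[] classpath = { root.toUri().toURL() };
      try (URLClassLoader loader =
               new URLClassLoader(classpath, ExtensionDiscoverySketch.class.getClassLoader())) {
        List<String> names = new ArrayList<>();
        for (DemoExtension ext : ServiceLoader.load(DemoExtension.class, loader)) {
          names.add(ext.getExtensionName());
        }
        return names;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(discover());
  }
}
```

The point of the design is that the core module never names the provider classes in code: adding a module that ships its own provider-configuration file is enough for its extension to be picked up.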

2. [Steps 5, 6] startLocalNode() starts the local node and the local cloud, with the local node as the cloud's only member

/** Initializes the local node and the local cloud with itself as the only member. */
  private static void startLocalNode() {
    // Figure self out; this is surprisingly hard
    NetworkInit.initializeNetworkSockets();
    // Do not forget to put SELF into the static configuration (to simulate
    // proper multicast behavior)
    if( !ARGS.client && STATIC_H2OS != null && !STATIC_H2OS.contains(SELF)) {
      Log.warn("Flatfile configuration does not include self: " + SELF+ " but contains " + STATIC_H2OS);
      STATIC_H2OS.add(SELF);
    }
    ......
}

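This method calls initializeNetworkSockets() [Step 6], which starts a Jetty server to serve the Web API (the default ip:port is localhost:54321) and binds the UDP and TCP sockets used for node-to-node communication on the next port (API port + 1):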

 public static void initializeNetworkSockets( ) {
    // Assign initial ports
    H2O.API_PORT = H2O.ARGS.port == 0 ? H2O.ARGS.baseport : H2O.ARGS.port;

    // Late instantiation of Jetty object, if needed.
    if (H2O.getJetty() == null && !H2O.ARGS.disable_web) {
      H2O.setJetty(new JettyHTTPD());
    }

    // API socket is only used to find opened port on given ip.
    ServerSocket apiSocket = null;

    // At this point we would like to allocate 2 consecutive ports
    while (true) {
      H2O.H2O_PORT = H2O.API_PORT + 1;
      try {
        if (!H2O.ARGS.disable_web) {
          apiSocket = H2O.ARGS.web_ip == null // Listen to any interface
                      ? new ServerSocket(H2O.API_PORT)
                      : new ServerSocket(H2O.API_PORT, -1, getInetAddress(H2O.ARGS.web_ip));
          apiSocket.setReuseAddress(true);
        }
        // Bind to the UDP socket
        _udpSocket = DatagramChannel.open();
        _udpSocket.socket().setReuseAddress(true);
        InetSocketAddress isa = new InetSocketAddress(H2O.SELF_ADDRESS, H2O.H2O_PORT);
        _udpSocket.socket().bind(isa);
        // Bind to the TCP socket also
        _tcpSocket = ServerSocketChannel.open();
        _tcpSocket.socket().setReceiveBufferSize(water.AutoBuffer.TCP_BUF_SIZ);
        _tcpSocket.socket().bind(isa);

        // Warning: There is a ip:port race between socket close and starting Jetty
        if (!H2O.ARGS.disable_web) {
          apiSocket.close();
          H2O.getJetty().start(H2O.ARGS.web_ip, H2O.API_PORT);
        }
        break;
      } catch (Exception e) {
        ...
      }
      // Try next available port to bound
      H2O.API_PORT += 2;
      ...
  }
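
The port search in the loop above can be illustrated in isolation. The sketch below is a simplified illustration of the idea and not the H2O implementation: PortPairSketch and findPortPair are hypothetical names, it probes TCP only and only on the loopback interface, and, like the original, it leaves a race window between closing the probe sockets and a real server binding the ports.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class PortPairSketch {

  /**
   * Looks for two consecutive free ports (p for the API, p + 1 for internal traffic).
   * Starts at basePort and moves forward two ports at a time, as the loop above does.
   * Returns the API port, or -1 if no pair was found within maxAttempts.
   */
  public static int findPortPair(int basePort, int maxAttempts) {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    int apiPort = basePort;
    for (int i = 0; i < maxAttempts && apiPort + 1 <= 65535; i++) {
      // Both sockets are closed again when the try block exits;
      // they only prove that the two ports could be bound at this moment.
      try (ServerSocket api = new ServerSocket(apiPort, 50, loopback);
           ServerSocket internal = new ServerSocket(apiPort + 1, 50, loopback)) {
        return apiPort;
      } catch (IOException portInUse) {
        // At least one port of the pair is taken; try the next pair.
      }
      apiPort += 2;
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println("API port candidate: " + findPortPair(54321, 100));
  }
}
```

Stepping by two keeps the API port and the internal port adjacent, so knowing one of them is enough to derive the other.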

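3. [Steps 8, 9] initializePersistence() initializes the persistence layer. The following four persistence backends are currently supported:

| Key | Description |
| --- | --- |
| ICE | Distributed local disk storage |
| HDFS | Connects to a backend hadoop-hdfs cluster |
| S3 | Amazon S3 object storage |
| NFS | Standard file system |
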
static void initializePersistence() {
    _PM = new PersistManager(ICE_ROOT);
}
public PersistManager(URI iceRoot) {
    I = new Persist[MAX_BACKENDS];
    stats = new PersistStatsEntry[MAX_BACKENDS];
    for (int i = 0; i < stats.length; i++) {
      stats[i] = new PersistStatsEntry();
    }
    ...
    ...
    I[Value.ICE ] = ice;
    I[Value.NFS ] = new PersistNFS();

    try {
      Class klass = Class.forName("water.persist.PersistHdfs");
      java.lang.reflect.Constructor constructor = klass.getConstructor();
      I[Value.HDFS] = (Persist) constructor.newInstance();
      Log.info("HDFS subsystem successfully initialized");
    }
    catch (Throwable ignore) {
      Log.info("HDFS subsystem not available");
    }

    try {
      Class klass = Class.forName("water.persist.PersistS3");
      java.lang.reflect.Constructor constructor = klass.getConstructor();
      I[Value.S3] = (Persist) constructor.newInstance();
      Log.info("S3 subsystem successfully initialized");
    } catch (Throwable ignore) {
      Log.info("S3 subsystem not available");
    }
  }
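
The constructor above looks up the HDFS and S3 backends by class name, so the core can start even when those modules are not on the classpath. A minimal sketch of that pattern follows. Backend, LocalDiskBackend, tryLoad and loadAvailable are hypothetical names introduced for illustration, and the sketch catches a narrower set of exceptions than the catch (Throwable) used in the original.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionalBackendSketch {

  /** Hypothetical backend contract, standing in for a persistence interface. */
  public interface Backend {
    String scheme();
  }

  /** Hypothetical backend that is always present. */
  public static class LocalDiskBackend implements Backend {
    @Override public String scheme() { return "file"; }
  }

  /** Returns an instance of the named class, or null if it is missing or unusable. */
  public static Backend tryLoad(String className) {
    try {
      Class<?> klass = Class.forName(className);
      return (Backend) klass.getConstructor().newInstance();
    } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
      // Class not on the classpath, no public no-arg constructor, or not a Backend.
      return null;
    }
  }

  /** Registers every backend that could be loaded, keyed by its scheme. */
  public static Map<String, Backend> loadAvailable(String... classNames) {
    Map<String, Backend> byScheme = new LinkedHashMap<>();
    for (String name : classNames) {
      Backend backend = tryLoad(name);
      if (backend != null) {
        byScheme.put(backend.scheme(), backend);
      }
    }
    return byScheme;
  }

  public static void main(String[] args) {
    Map<String, Backend> found =
        loadAvailable(LocalDiskBackend.class.getName(), "no.such.Backend");
    System.out.println("Available schemes: " + found.keySet());
  }
}
```

Because the optional classes are referenced only as strings, there is no compile-time dependency on them; a missing backend simply is not registered.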

4. [Step 11] startNetworkServices() initializes the network services. It starts the network service threads, including UDPReceiver, TCPReceiver, heartbeat, and Cleaner (which flushes K/V store data to persistent storage).
[Figure 3]
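
As a rough illustration of what "starting a service thread" means here, the sketch below starts a named daemon thread that emits periodic heartbeats. It is an assumption-laden toy, not H2O's implementation: ServiceThreadsSketch, startDaemon and runHeartbeats are hypothetical names, and the heartbeat only counts down a latch instead of sending anything over the network.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ServiceThreadsSketch {

  /** Starts a named daemon thread so that it never blocks JVM shutdown. */
  public static Thread startDaemon(String name, Runnable body) {
    Thread thread = new Thread(body, name);
    thread.setDaemon(true);
    thread.start();
    return thread;
  }

  /**
   * Runs a toy heartbeat thread and waits until it has produced the requested
   * number of beats. Returns false if that did not happen within 5 seconds.
   */
  public static boolean runHeartbeats(int beats, long intervalMillis) {
    CountDownLatch remaining = new CountDownLatch(beats);
    Thread heartbeat = startDaemon("demo-heartbeat", () -> {
      try {
        while (remaining.getCount() > 0) {
          remaining.countDown();          // a real service would send a packet here
          Thread.sleep(intervalMillis);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    try {
      return remaining.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      heartbeat.interrupt();              // stop the background thread
    }
  }

  public static void main(String[] args) {
    System.out.println("heartbeats completed: " + runHeartbeats(3, 10));
  }
}
```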

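5. [Step 14] getAllProviderNames(true) loads the data source parsers
This step uses ServiceLoader to load the data source parsers listed in the water.parser.ParserProvider file under /resources/META-INF/services/ in each module of the project. The following data source types are supported: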

| Project | File | Content | Source |
| --- | --- | --- | --- |
| h2o-core | water.parser.ParserProvider | water.parser.DefaultParserProviders$ArffParserProvider | Formats supported by default: ARFF, XLS, CSV, SVMLight (GUESS is not a data format) |
| | | water.parser.DefaultParserProviders$XlsParserProvider | |
| | | water.parser.DefaultParserProviders$SVMLightParserProvider | |
| | | water.parser.DefaultParserProviders$CsvParserProvider | |
| | | water.parser.DefaultParserProviders$GuessParserProvider | |
| h2o-orc-parser | water.parser.ParserProvider | water.parser.orc.OrcParserProvider | Apache ORC |
| h2o-parquet-parser | water.parser.ParserProvider | water.parser.parquet.ParquetParserProvider | Apache Parquet |
| h2o-avro-parser | water.parser.ParserProvider | water.parser.avro.AvroParserProvider | Apache Avro |



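Notes: (1) GuessParserProvider does not parse source data in a format called GUESS. Its job is to handle the case where the format of the source data is unknown: it tries the available parsers one by one, in priority order, until one of them can parse the data.
(2) OrcParserProvider is not loaded by default, because the Gradle build does not package the h2o-orc-parser module by default (see the build and packaging logic in build.gradle).
[Figure 4: the build and packaging logic in build.gradle]
The gradle.properties file contains the following setting:
[Figure: the corresponding setting in gradle.properties]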
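
The guessing behaviour described in note (1) can be sketched as a priority-ordered trial of candidates. Everything in the sketch is an assumption made for illustration: Candidate, demoCandidates and guess are hypothetical names, the format checks are deliberately naive, and "a lower number is tried first" is this sketch's convention and not a statement about H2O's actual priority values.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class GuessParserSketch {

  /** A named format check with a priority (in this sketch, lower is tried first). */
  public static final class Candidate {
    final String name;
    final int priority;
    final Predicate<String> accepts;

    Candidate(String name, int priority, Predicate<String> accepts) {
      this.name = name;
      this.priority = priority;
      this.accepts = accepts;
    }
  }

  /** Toy candidates, deliberately listed out of priority order. */
  public static List<Candidate> demoCandidates() {
    return Arrays.asList(
        new Candidate("CSV", 2, s -> firstLine(s).contains(",")),
        new Candidate("ARFF", 0, s -> s.trim().toLowerCase().startsWith("@relation")),
        new Candidate("SVMLight", 1,
            s -> firstLine(s).matches("[-+]?\\d+(\\.\\d+)?(\\s+\\d+:\\S+)+\\s*")));
  }

  /** Returns the name of the first candidate, in priority order, that accepts the sample. */
  public static Optional<String> guess(List<Candidate> candidates, String sample) {
    return candidates.stream()
        .sorted(Comparator.comparingInt((Candidate c) -> c.priority))
        .filter(c -> c.accepts.test(sample))
        .map(c -> c.name)
        .findFirst();
  }

  private static String firstLine(String s) {
    return s.split("\n", 2)[0];
  }

  public static void main(String[] args) {
    System.out.println(guess(demoCandidates(), "a,b,c\n1,2,3").orElse("unknown"));
  }
}
```

Trying the more specific formats first matters because a permissive check, such as "the first line contains a comma", would otherwise claim data that a stricter parser handles better.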

6. [Step 17] registerResourceRoot() loads the static web resources
This step loads the static web resources under h2o-web/src/main/resources/www and h2o-core/src/main/resources/www.
[Figure 5]

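7. [Step 18] registerRestApiExtensions() registers the REST API resources
(1) This step uses Java's ServiceLoader mechanism to load the REST API registration classes listed in the water.api.RestApiExtension file under /resources/META-INF/services/ in each module of the project. These classes all extend AbstractRegister and override registerEndPoints, which registers a series of REST API endpoints. Each endpoint consists mainly of an HTTP method and URI, a handler class, and a handler method.
[Figure 6]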

| Project | File | Content |
| --- | --- | --- |
| h2o-ext-xgboost | water.api.RestApiExtension | hex.api.xgboost.RegisterRestApi |
| h2o-core | water.api.RestApiExtension | water.api.RegisterV3Api |
| | | water.api.RegisterV4Api |
| h2o-automl | water.api.RestApiExtension | water.automl.RegisterRestApi |
| h2o-algos | water.api.RestApiExtension | hex.api.RegisterAlgos |
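
The "HTTP method and URI, handler class, handler method" triple can be pictured as a routing table. The sketch below is a hypothetical miniature and not H2O's API: RouteRegistrySketch, Registry, register, dispatch and DemoCloudHandler are all invented for illustration, and the handler is invoked reflectively without any request parsing or schema handling.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

public class RouteRegistrySketch {

  /** Hypothetical handler class with one endpoint method. */
  public static class DemoCloudHandler {
    public String status() { return "cloud ok"; }
  }

  /** One routing-table entry: which method of which class serves the route. */
  static final class Route {
    final Class<?> handlerClass;
    final String handlerMethod;

    Route(Class<?> handlerClass, String handlerMethod) {
      this.handlerClass = handlerClass;
      this.handlerMethod = handlerMethod;
    }
  }

  /** Maps "<METHOD> <URI>" to a handler class and method name. */
  public static class Registry {
    private final Map<String, Route> routes = new LinkedHashMap<>();

    public void register(String methodAndUri, Class<?> handlerClass, String handlerMethod) {
      routes.put(normalize(methodAndUri), new Route(handlerClass, handlerMethod));
    }

    /** Instantiates the handler and calls the registered no-argument method. */
    public Object dispatch(String httpMethod, String uri) {
      Route route = routes.get(normalize(httpMethod + " " + uri));
      if (route == null) {
        throw new IllegalArgumentException("No route for " + httpMethod + " " + uri);
      }
      try {
        Object handler = route.handlerClass.getConstructor().newInstance();
        Method method = route.handlerClass.getMethod(route.handlerMethod);
        return method.invoke(handler);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Handler invocation failed", e);
      }
    }

    private static String normalize(String methodAndUri) {
      String[] parts = methodAndUri.trim().split("\\s+", 2);
      if (parts.length != 2) {
        throw new IllegalArgumentException("Expected '<METHOD> <URI>': " + methodAndUri);
      }
      return parts[0].toUpperCase() + " " + parts[1];
    }
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    registry.register("GET /demo/Cloud", DemoCloudHandler.class, "status");
    System.out.println(registry.dispatch("GET", "/demo/Cloud"));
  }
}
```

Storing the handler as a class plus a method name keeps registration declarative: each module can contribute its own entries without the web server knowing the handler types in advance.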

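(2) This step also uses Java's ServiceLoader mechanism to load all Schema classes listed in the water.api.Schema file under /resources/META-INF/services/ in each module of the project (a Schema is the POJO that a REST API endpoint needs). The mechanism is the same as above, so it is not repeated here.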

8. [Steps 20, 21] startServingRestApi() starts the H2O web service
[Figure 7]
