Python 64 Techniques, No. 27: Distributed Locks and Group Management, Part 2: tooz for Load Balancing

This is part of a series on distributed locks and group management in Python.
I have recently been dealing with problems related to distributed locks.
Based on the source code of the relevant OpenStack components, the official tooz documentation, and my own experience using these components, I want to organize this material.

The series is split into four parts:
Distributed locks and group management 1: introduction to tooz
Distributed locks and group management 2: tooz for load balancing
Distributed locks and group management 3: tooz for distributed locks
Distributed locks and group management 4: tooz source code analysis
This article covers part 2.

1 Introduction
In the ceilometer source code (Newton release), at least the compute service and the notification service can be configured to use something called coordination.
Coordination here essentially means a coordination group, and its main practical purpose is load balancing.
Judging from the ceilometer source, the load balancing means that message processing can be distributed evenly across multiple service instances.
This is different from haproxy's round-robin approach. The implementation here is based on consistent hashing: a hash value is computed from a certain attribute of the incoming message, that hash is taken modulo the length of a pre-initialized list of oslo_messaging.Notifier objects (notifiers) to obtain an index, and notifiers[index] is the notifier that actually sends the message. Different service instances then listen on different topics (assigned via consistent hashing), i.e. on different queues.

2 Coordination-group configuration in ceilometer
In ceilometer.conf the coordination group can be configured as follows. The backend_url under [coordination] is in fact the driver URL of the tooz library; redis, memcached and other backends are officially supported.

[compute]
workload_partitioning = true
[coordination]
backend_url = redis://redis.openstack.svc.cluster.local:6379/
[notification]
messaging_urls = rabbit://rabbitmq:[email protected]:5672/
workload_partitioning = true

Configure the backend_url according to your own environment.
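For reference, a minimal sketch (with a made-up member id) of how such a backend_url is consumed: tooz picks the driver from the URL scheme, so switching from redis to memcached only changes the URL.

from tooz import coordination

# The URL scheme selects the tooz driver; these URLs are examples only.
coord = coordination.get_coordinator(
    'redis://redis.openstack.svc.cluster.local:6379/', b'agent-on-this-host')
# coord = coordination.get_coordinator('memcached://127.0.0.1:11211',
#                                      b'agent-on-this-host')
coord.start()
# ... create/join groups, send heartbeats, etc. (see section 4.2) ...
coord.stop()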


3 Source code analysis of the coordination group in the ceilometer-compute service
Main entry point: the __init__ method in ceilometer/agent/manager.py
3.1 The __init__ method is as follows:
class AgentManager(service_base.PipelineBasedService):

    def __init__(self, namespaces=None, pollster_list=None, worker_id=0):
        namespaces = namespaces or ['compute', 'central']
        pollster_list = pollster_list or []
        group_prefix = cfg.CONF.polling.partitioning_group_prefix
        self._inspector = virt_inspector.get_hypervisor_inspector()
        self.nv = nova_client.Client()
        self.rpc_server = None

        # features of using coordination and pollster-list are exclusive, and
        # cannot be used at one moment to avoid both samples duplication and
        # samples being lost
        if pollster_list and cfg.CONF.coordination.backend_url:
            raise PollsterListForbidden()

        super(AgentManager, self).__init__(worker_id)

        def _match(pollster):
            """Find out if pollster name matches to one of the list."""
            return any(fnmatch.fnmatch(pollster.name, pattern) for
                       pattern in pollster_list)

        if type(namespaces) is not list:
            namespaces = [namespaces]

        # we'll have default ['compute', 'central'] here if no namespaces will
        # be passed
        extensions = (self._extensions('poll', namespace).extensions
                      for namespace in namespaces)
        # get the extensions from pollster builder
        extensions_fb = (self._extensions_from_builder('poll', namespace)
                         for namespace in namespaces)
        if pollster_list:
            extensions = (moves.filter(_match, exts)
                          for exts in extensions)
            extensions_fb = (moves.filter(_match, exts)
                             for exts in extensions_fb)

        self.extensions = list(itertools.chain(*list(extensions))) + list(
            itertools.chain(*list(extensions_fb)))

        if self.extensions == []:
            raise EmptyPollstersList()

        discoveries = (self._extensions('discover', namespace).extensions
                       for namespace in namespaces)
        self.discoveries = list(itertools.chain(*list(discoveries)))
        self.polling_periodics = None

        self.partition_coordinator = coordination.PartitionCoordinator()
        self.heartbeat_timer = utils.create_periodic(
            target=self.partition_coordinator.heartbeat,
            spacing=cfg.CONF.coordination.heartbeat,
            run_immediately=True)

        # Compose coordination group prefix.
        # We'll use namespaces as the basement for this partitioning.
        namespace_prefix = '-'.join(sorted(namespaces))
        self.group_prefix = ('%s-%s' % (namespace_prefix, group_prefix)
                             if group_prefix else namespace_prefix)

        self.notifier = oslo_messaging.Notifier(
            messaging.get_transport(),
            driver=cfg.CONF.publisher_notifier.telemetry_driver,
            publisher_id="ceilometer.polling")

        self._keystone = None
        self._keystone_last_exception = None

Analysis:
3.1.1) self.partition_coordinator = coordination.PartitionCoordinator() above initializes a partition coordinator.

3.1.2)
     self.heartbeat_timer = utils.create_periodic(
            target=self.partition_coordinator.heartbeat,
            spacing=cfg.CONF.coordination.heartbeat,
            run_immediately=True)
This periodically calls the coordinator's heartbeat, which is used to tell whether the service is still alive.
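A rough standalone sketch of such a periodic heartbeat, written directly against futurist (the periodics library this code also uses); partition_coordinator is assumed to be an already-created PartitionCoordinator:

import futurist
from futurist import periodics

# spacing corresponds to cfg.CONF.coordination.heartbeat (a number of seconds)
@periodics.periodic(spacing=10.0, run_immediately=True)
def heartbeat():
    partition_coordinator.heartbeat()

worker = periodics.PeriodicWorker.create(
    [], executor_factory=lambda: futurist.ThreadPoolExecutor(max_workers=1))
worker.add(heartbeat)
# worker.start() blocks, which is why ceilometer spawns it in a separate thread.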

3.2) Next, the run method of the AgentManager class
Its content is as follows:
    def run(self):
        """Start RPC server and handle realtime query."""
        super(AgentManager, self).run()
        self.polling_manager = pipeline.setup_polling()
        self.join_partitioning_groups()
        self.start_polling_tasks()
        self.init_pipeline_refresh()

Analysis:
3.2.1)
self.polling_manager = pipeline.setup_polling()
This calls, in ceilometer/pipeline.py:
def setup_polling():
    """Setup polling manager according to yaml config file."""
    cfg_file = cfg.CONF.pipeline_cfg_file
    return PollingManager(cfg_file)

3.2.2)
self.join_partitioning_groups()
The code is as follows:
    def join_partitioning_groups(self):
        self.groups = set([self.construct_group_id(d.obj.group_id)
                          for d in self.discoveries])
        # let each set of statically-defined resources have its own group
        static_resource_groups = set(
            [self.construct_group_id(utils.hash_of_set(p.resources))
             for p in self.polling_manager.sources
             if p.resources
             ])
        self.groups.update(static_resource_groups)

        if not self.groups and self.partition_coordinator.is_active():
            self.partition_coordinator.stop()
            self.heartbeat_timer.stop()

        if self.groups and not self.partition_coordinator.is_active():
            self.partition_coordinator.start()
            utils.spawn_thread(self.heartbeat_timer.start)

        for group in self.groups:
            self.partition_coordinator.join_group(group)

Analysis:
3.2.2.1) self.discoveries comes from
        discoveries = (self._extensions('discover', namespace).extensions
                       for namespace in namespaces)
        self.discoveries = list(itertools.chain(*list(discoveries)))
where:
       namespaces = namespaces or ['compute', 'central']
ceilometer/setup.cfg contains the following entry points (a stevedore loading sketch follows the listing):
ceilometer.discover.compute =
    local_instances = ceilometer.compute.discovery:InstanceDiscovery

ceilometer.discover.central =
    endpoint = ceilometer.agent.discovery.endpoint:EndpointDiscovery
    tenant = ceilometer.agent.discovery.tenant:TenantDiscovery
    lb_pools = ceilometer.network.services.discovery:LBPoolsDiscovery
    lb_vips = ceilometer.network.services.discovery:LBVipsDiscovery
    lb_members = ceilometer.network.services.discovery:LBMembersDiscovery
    lb_listeners = ceilometer.network.services.discovery:LBListenersDiscovery
    lb_loadbalancers = ceilometer.network.services.discovery:LBLoadBalancersDiscovery
    lb_health_probes = ceilometer.network.services.discovery:LBHealthMonitorsDiscovery
    _services    = ceilometer.network.services.discovery:VPNServicesDiscovery
    ipsec_connections  = ceilometer.network.services.discovery:IPSecConnectionsDiscovery
    fw_services = ceilometer.network.services.discovery:FirewallDiscovery
    fw_policy = ceilometer.network.services.discovery:FirewallPolicyDiscovery
    tripleo_overcloud_nodes = ceilometer.hardware.discovery:NodesDiscoveryTripleO
    fip_services = ceilometer.network.services.discovery:FloatingIPDiscovery
    images = ceilometer.image.discovery:ImagesDiscovery
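A minimal sketch (assumed usage, not the exact ceilometer wrapper) of how the entry points above are loaded through stevedore:

from stevedore import extension

mgr = extension.ExtensionManager(namespace='ceilometer.discover.compute',
                                 invoke_on_load=True)
for ext in mgr.extensions:
    # e.g. ext.name == 'local_instances', ext.obj is an InstanceDiscovery instance
    print(ext.name, ext.obj)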


3.2.3) Analysis of the start_polling_tasks method
The code is as follows:
    def start_polling_tasks(self):
        # allow time for coordination if necessary
        delay_start = self.partition_coordinator.is_active()

        # set shuffle time before polling task if necessary
        delay_polling_time = random.randint(
            0, cfg.CONF.shuffle_time_before_polling_task)

        data = self.setup_polling_tasks()

        # One thread per polling tasks is enough
        self.polling_periodics = periodics.PeriodicWorker.create(
            [], executor_factory=lambda:
            futures.ThreadPoolExecutor(max_workers=len(data)))

        for interval, polling_task in data.items():
            delay_time = (interval + delay_polling_time if delay_start
                          else delay_polling_time)

            @periodics.periodic(spacing=interval, run_immediately=False)
            def task(running_task):
                self.interval_task(running_task)

            utils.spawn_thread(utils.delayed, delay_time,
                               self.polling_periodics.add, task, polling_task)

        if data:
            # Don't start useless threads if no task will run
            utils.spawn_thread(self.polling_periodics.start, allow_empty=True)
Analysis:
3.2.3.1) setup_polling_tasks builds a dictionary of <polling interval, list of polling tasks>.
A periodic timer then executes the polling tasks at every interval.
It calls the interval_task method, whose content is as follows:
    def interval_task(self, task):
        # NOTE(sileht): remove the previous keystone client
        # and exception to get a new one in this polling cycle.
        self._keystone = None
        self._keystone_last_exception = None

        task.poll_and_notify()

3.2.3.2)
This calls task.poll_and_notify, whose content is as follows:
    def poll_and_notify(self):
        """Polling sample and notify."""
        cache = {}
        discovery_cache = {}
        poll_history = {}
        for source_name in self.pollster_matches:
            for pollster in self.pollster_matches[source_name]:
                key = Resources.key(source_name, pollster)
                candidate_res = list(
                    self.resources[key].get(discovery_cache))
                if not candidate_res and pollster.obj.default_discovery:
                    candidate_res = self.manager.discover(
                        [pollster.obj.default_discovery], discovery_cache)

                # Remove duplicated resources and black resources. Using
                # set() requires well defined __hash__ for each resource.
                # Since __eq__ is defined, 'not in' is safe here.
                polling_resources = []
                black_res = self.resources[key].blacklist
                history = poll_history.get(pollster.name, [])
                for x in candidate_res:
                    if x not in history:
                        history.append(x)
                        if x not in black_res:
                            polling_resources.append(x)
                poll_history[pollster.name] = history

                # If no resources, skip for this pollster
                if not polling_resources:
                    p_context = 'new ' if history else ''
                    LOG.info(_("Skip pollster %(name)s, no %(p_context)s"
                               "resources found this cycle"),
                             {'name': pollster.name, 'p_context': p_context})
                    continue

                LOG.info(_("Polling pollster %(poll)s in the context of "
                           "%(src)s"),
                         dict(poll=pollster.name, src=source_name))
                try:
                    polling_timestamp = timeutils.utcnow().isoformat()
                    samples = pollster.obj.get_samples(
                        manager=self.manager,
                        cache=cache,
                        resources=polling_resources
                    )
                    sample_batch = []

                    # filter None in samples
                    samples = [s for s in samples if s is not None]
                    # TODO(chao.ma), debug it
                    if samples:
                        metric = pollster.name

                    for sample in samples:
                        # Note(yuywz): Unify the timestamp of polled samples
                        sample.set_timestamp(polling_timestamp)
                        sample_dict = (
                            publisher_utils.meter_message_from_counter(
                                sample, self._telemetry_secret
                            ))
                        if self._batch:
                            sample_batch.append(sample_dict)
                        else:
                            self._send_notification([sample_dict])

                    if sample_batch:
                        self._send_notification(sample_batch)

                except plugin_base.PollsterPermanentError as err:
                    LOG.error(_(
                        'Prevent pollster %(name)s for '
                        'polling source %(source)s anymore!')
                        % ({'name': pollster.name, 'source': source_name}))
                    self.resources[key].blacklist.extend(err.fail_res_list)
                except Exception as err:
                    LOG.warning(_(
                        'Continue after error from %(name)s: %(error)s')
                        % ({'name': pollster.name, 'error': err}),
                        exc_info=True)

Analysis:
3.2.3.2.1)
                    candidate_res = self.manager.discover(
                        [pollster.obj.default_discovery], discovery_cache)
This calls the discover method.

3.2.3.2.2) The discover method is as follows:

    def discover(self, discovery=None, discovery_cache=None):
        resources = []
        discovery = discovery or []
        for url in discovery:
            if discovery_cache is not None and url in discovery_cache:
                resources.extend(discovery_cache[url])
                continue
            name, param = self._parse_discoverer(url)
            discoverer = self._discoverer(name)
            if discoverer:
                try:
                    if discoverer.KEYSTONE_REQUIRED_FOR_SERVICE:
                        service_type = getattr(
                            cfg.CONF.service_types,
                            discoverer.KEYSTONE_REQUIRED_FOR_SERVICE)
                        if not keystone_client.get_service_catalog(
                                self.keystone).get_endpoints(
                                    service_type=service_type):
                            LOG.warning(_LW('Skipping %(name)s, '
                                            '%(service_type)s service '
                                            'is not registered in keystone'),
                                        {'name': name,
                                         'service_type': service_type})
                            continue

                    discovered = discoverer.discover(self, param)
                    partitioned = self.partition_coordinator.extract_my_subset(
                        self.construct_group_id(discoverer.group_id),
                        discovered)
                    resources.extend(partitioned)
                    if discovery_cache is not None:
                        discovery_cache[url] = partitioned
                except ka_exceptions.ClientException as e:
                    LOG.error(_LE('Skipping %(name)s, keystone issue: '
                                  '%(exc)s'), {'name': name, 'exc': e})
                except Exception as err:
                    LOG.exception(_LE('Unable to discover resources: %s'), err)
            else:
                LOG.warning(_LW('Unknown discovery extension: %s'), name)
        return resources

Analysis:
1) The call arguments are:
self.manager.discover(
                        [pollster.obj.default_discovery], discovery_cache)
where the discovery parameter is [pollster.obj.default_discovery].
The key part is:
                    discovered = discoverer.discover(self, param)
                    partitioned = self.partition_coordinator.extract_my_subset(
                        self.construct_group_id(discoverer.group_id),
                        discovered)
                    resources.extend(partitioned)
Analysis: it is not obvious from reading the code alone what these contain; printing/debugging is needed.

2) In any case, all of the collected monitoring data is sent to the message queue by calling the following method of the PollingTask class in ceilometer/agent/manager.py:

    def _send_notification(self, samples):
        self.manager.notifier.sample(
            {},
            'telemetry.polling',
            {'samples': samples}
        )

So in principle the samples should still be sent to the notifications.sample queue.
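A rough sketch of what self.manager.notifier amounts to; the driver and topic values below are the defaults discussed elsewhere in this article, not something verified for every deployment:

import oslo_messaging
from oslo_config import cfg

transport = oslo_messaging.get_notification_transport(cfg.CONF)
notifier = oslo_messaging.Notifier(
    transport,
    driver='messagingv2',                 # cfg.CONF.publisher_notifier.telemetry_driver
    publisher_id='ceilometer.polling',
    topics=['notifications'])             # default oslo_messaging notification topic
# With the 'sample' priority the message lands in the notifications.sample queue.
notifier.sample({}, 'telemetry.polling', {'samples': []})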


3.2.4) Analysis of self.init_pipeline_refresh
The code lives in the PipelineBasedService(cotyledon.Service) class in ceilometer/service_base.py:

    def init_pipeline_refresh(self):
        """Initializes pipeline refresh state."""
        self.clear_pipeline_validation_status()

        if (cfg.CONF.refresh_pipeline_cfg or
                cfg.CONF.refresh_event_pipeline_cfg):
            self.refresh_pipeline_periodic = utils.create_periodic(
                target=self.refresh_pipeline,
                spacing=cfg.CONF.pipeline_polling_interval)
            utils.spawn_thread(self.refresh_pipeline_periodic.start)
Analysis:
This method normally does nothing further, because with the default configuration the pipeline is not refreshed.

3.3) Looking at a specific discovery plugin
See the InstanceDiscovery class in ceilometer/compute/discovery.py.
Its content is as follows:
class InstanceDiscovery(plugin_base.DiscoveryBase):
    def __init__(self):
        super(InstanceDiscovery, self).__init__()
        self.nova_cli = nova_client.Client()
        self.last_run = None
        self.instances = {}
        self.expiration_time = cfg.CONF.compute.resource_update_interval
        self.cache_expiry = cfg.CONF.compute.resource_cache_expiry
        self.last_cache_expire = None

    def discover(self, manager, param=None):
        """Discover resources to monitor."""
        secs_from_last_update = 0
        utc_now = timeutils.utcnow(True)
        secs_from_last_expire = 0
        if self.last_run:
            secs_from_last_update = timeutils.delta_seconds(
                self.last_run, utc_now)
        if self.last_cache_expire:
            secs_from_last_expire = timeutils.delta_seconds(
                self.last_cache_expire, utc_now)

        instances = []
        # NOTE(ityaptin) we update make a nova request only if
        # it's a first discovery or resources expired
        if not self.last_run or secs_from_last_update >= self.expiration_time:
            try:
                if secs_from_last_expire < self.cache_expiry and self.last_run:
                    # since = self.last_run.isoformat()
                    pass
                else:
                    # since = None
                    self.instances.clear()
                    self.last_cache_expire = utc_now

                # since = self.last_run.isoformat() if self.last_run else None
                # FIXME(ccz): Remove parameter last_run from nova_list query.
                # Using changes-since cannot list those instances which just
                # changes volume attachment and that will affect the discovery
                # of volumes under telemetry.
                # Original Code:
                # instances = self.nova_cli.instance_get_all_by_host(
                #     cfg.CONF.host, since)
                instances = self.nova_cli.instance_get_all_by_host(
                    cfg.CONF.host)
                self.last_run = utc_now
            except Exception:
                # NOTE(zqfan): instance_get_all_by_host is wrapped and will log
                # exception when there is any error. It is no need to raise it
                # again and print one more time.
                return []

        for instance in instances:
            if getattr(instance, 'OS-EXT-STS:vm_state', None) in ['deleted',
                                                                  'error']:
                self.instances.pop(instance.id, None)
            else:
                self.instances[instance.id] = instance

        return self.instances.values()

    @property
    def group_id(self):
        if cfg.CONF.compute.workload_partitioning:
            return cfg.CONF.host
        else:
            return None

Analysis:
Note the group_id property above.

Summary:
For the ceilometer-compute service:
when the members of a group are fetched, the group name comes from the group_id property of the InstanceDiscovery class in ceilometer/compute/discovery.py, and that property returns the name of the current compute node, e.g. compute-node-2.domain.tld.
In other words, every compute node has its own group, so no matter how the partitioning is done, this ceilometer-compute service always ends up handling all of the instances on its own compute node.
That is, no real load balancing is achieved for the ceilometer-compute service (see the quick check after the snippet below).

@property
def group_id(self):
    if cfg.CONF.compute.workload_partitioning:
        return cfg.CONF.host
    else:
        return None
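To see why there is no cross-node balancing here: with a per-host group the hash ring built in extract_my_subset (sections 4.3.4 and 4.3.5 below) contains exactly one member, so every resource maps back to this agent. A quick check, assuming the HashRing class analyzed later is importable:

from ceilometer.utils import HashRing   # the class analyzed in section 4.3.5

ring = HashRing(['compute-node-2.domain.tld'])   # the only member of this host's group
assert all(ring.get_node(str(i)) == 'compute-node-2.domain.tld'
           for i in range(100))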

References:
https://specs.openstack.org/openstack/ceilometer-specs/specs/kilo/notification-coordiation.html
https://github.com/openstack/ceilometer-specs/blob/master/specs/juno/central-agent-partitioning.rst


4 Source code analysis of the coordination group in the ceilometer-notification service
4.1 Main entry point
The run method in ceilometer/notification.py
The code is as follows:

class NotificationService(service_base.PipelineBasedService):
    """Notification service.

    When running multiple agents, additional queuing sequence is required for
    inter process communication. Each agent has two listeners: one to listen
    to the main OpenStack queue and another listener(and notifier) for IPC to
    divide pipeline sink endpoints. Coordination should be enabled to have
    proper active/active HA.
    """

    NOTIFICATION_NAMESPACE = 'ceilometer.notification'
    NOTIFICATION_IPC = 'ceilometer-pipe'
    def run(self):
        super(NotificationService, self).run()
        self.shutdown = False
        self.periodic = None
        self.partition_coordinator = None
        self.coord_lock = threading.Lock()

        self.listeners = []

        # NOTE(kbespalov): for the pipeline queues used a single amqp host
        # hence only one listener is required
        self.pipeline_listener = None

        self.pipeline_manager = pipeline.setup_pipeline()

        self.event_pipeline_manager = pipeline.setup_event_pipeline()

        self.transport = messaging.get_transport()

        if cfg.CONF.notification.workload_partitioning:
            self.group_id = self.NOTIFICATION_NAMESPACE
            self.partition_coordinator = coordination.PartitionCoordinator()
            self.partition_coordinator.start()
        else:
            # FIXME(sileht): endpoint uses the notification_topics option
            # and it should not because this is an oslo_messaging option
            # not a ceilometer. Until we have something to get the
            # notification_topics in another way, we must create a transport
            # to ensure the option has been registered by oslo_messaging.
            messaging.get_notifier(self.transport, '')
            self.group_id = None

        self.pipe_manager = self._get_pipe_manager(self.transport,
                                                   self.pipeline_manager)
        self.event_pipe_manager = self._get_event_pipeline_manager(
            self.transport)

        self._configure_main_queue_listeners(self.pipe_manager,
                                             self.event_pipe_manager)

        if cfg.CONF.notification.workload_partitioning:
            # join group after all manager set up is configured
            self.partition_coordinator.join_group(self.group_id)
            self.partition_coordinator.watch_group(self.group_id,
                                                   self._refresh_agent)

            @periodics.periodic(spacing=cfg.CONF.coordination.heartbeat,
                                run_immediately=True)
            def heartbeat():
                self.partition_coordinator.heartbeat()

            @periodics.periodic(spacing=cfg.CONF.coordination.check_watchers,
                                run_immediately=True)
            def run_watchers():
                self.partition_coordinator.run_watchers()

            self.periodic = periodics.PeriodicWorker.create(
                [], executor_factory=lambda:
                futures.ThreadPoolExecutor(max_workers=10))
            self.periodic.add(heartbeat)
            self.periodic.add(run_watchers)

            utils.spawn_thread(self.periodic.start)

            # configure pipelines after all coordination is configured.
            with self.coord_lock:
                self._configure_pipeline_listener()

        if not cfg.CONF.notification.disable_non_metric_meters:
            LOG.warning(_LW('Non-metric meters may be collected. It is highly '
                            'advisable to disable these meters using '
                            'ceilometer.conf or the pipeline.yaml'))

        self.init_pipeline_refresh()

Analysis:
4.1.1)
In the method above, if cfg.CONF.notification.workload_partitioning is enabled, then:
            self.group_id = self.NOTIFICATION_NAMESPACE
            self.partition_coordinator = coordination.PartitionCoordinator()
            self.partition_coordinator.start()
where:
    NOTIFICATION_NAMESPACE = 'ceilometer.notification'

4.2 Detailed analysis of PartitionCoordinator's start method
This method is located in ceilometer/coordination.py:
class PartitionCoordinator(object):

    """Workload partitioning coordinator.

    This class uses the `tooz` library to manage group membership.

    To ensure that the other agents know this agent is still alive,
    the `heartbeat` method should be called periodically.

    Coordination errors and reconnects are handled under the hood, so the
    service using the partition coordinator need not care whether the
    coordination backend is down. The `extract_my_subset` will simply return an
    empty iterable in this case.
    """

    def __init__(self, my_id=None):
        self._coordinator = None
        self._groups = set()
        self._my_id = my_id or str(uuid.uuid4())

    def start(self):
        backend_url = cfg.CONF.coordination.backend_url
        if backend_url:
            try:
                self._coordinator = tooz.coordination.get_coordinator(
                    backend_url, self._my_id)
                self._coordinator.start()
                LOG.info(_LI('Coordination backend started successfully.'))
            except tooz.coordination.ToozError:
                LOG.exception(_LE('Error connecting to coordination backend.'))

Explanation:
4.2.1) The PartitionCoordinator class is a workload-partitioning coordinator.
It uses the tooz library (a library that provides a distributed-coordination API for handling groups and group members in a distributed system; see https://blog.csdn.net/allenson1/article/details/44539709) to manage group membership.
To make sure the other agents know whether this agent is still alive, the heartbeat method should be called periodically.
Coordination errors and reconnects are handled under the hood, so a service using the partition coordinator does not need to care whether the coordination backend is down; in that case extract_my_subset simply returns an empty iterable.

4.2.2) Member variables:
_coordinator: the coordinator
_groups: the set of joined groups
_my_id: the id of this group member (a uuid by default)

4.2.3) The start method
Idea: first read the coordination backend URL; if it is non-empty, call tooz.coordination.get_coordinator with the backend URL and this member's id to obtain the _coordinator object, then call _coordinator.start().
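For comparison, a minimal standalone sketch of the tooz calls that PartitionCoordinator wraps; the backend URL, member id and group name are made-up values:

from tooz import coordination

coord = coordination.get_coordinator('redis://127.0.0.1:6379',
                                     b'notification-agent-1')
coord.start()

group = b'ceilometer.notification'
try:
    coord.create_group(group).get()
except coordination.GroupAlreadyExist:
    pass
coord.join_group(group).get()

print(coord.get_members(group).get())   # all live members of the group
coord.heartbeat()                        # must be called periodically
coord.leave_group(group).get()
coord.stop()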


4.3 Continuing with the run method of the NotificationService class in ceilometer/notification.py; the relevant code is:

        if cfg.CONF.notification.workload_partitioning:
            # join group after all manager set up is configured
            self.partition_coordinator.join_group(self.group_id)
            self.partition_coordinator.watch_group(self.group_id,
                                                   self._refresh_agent)

            @periodics.periodic(spacing=cfg.CONF.coordination.heartbeat,
                                run_immediately=True)
            def heartbeat():
                self.partition_coordinator.heartbeat()

            @periodics.periodic(spacing=cfg.CONF.coordination.check_watchers,
                                run_immediately=True)
            def run_watchers():
                self.partition_coordinator.run_watchers()

            self.periodic = periodics.PeriodicWorker.create(
                [], executor_factory=lambda:
                futures.ThreadPoolExecutor(max_workers=10))
            self.periodic.add(heartbeat)
            self.periodic.add(run_watchers)

            utils.spawn_thread(self.periodic.start)

            # configure pipelines after all coordination is configured.
            with self.coord_lock:
                self._configure_pipeline_listener()

Analysis:
4.3.1) If workload partitioning is enabled, then
self.partition_coordinator.join_group(self.group_id)
adds the current notification agent to the group self.group_id, which is in fact 'ceilometer.notification' (because self.group_id = self.NOTIFICATION_NAMESPACE and self.NOTIFICATION_NAMESPACE is 'ceilometer.notification').

Then
self.partition_coordinator.watch_group(self.group_id, self._refresh_agent)
registers self._refresh_agent as a callback that is invoked whenever the group membership changes (e.g. a member joins).

4.3.2) The self._refresh_agent method
The code is as follows:
    def _refresh_agent(self, event):
        with self.coord_lock:
            if self.shutdown:
                # NOTE(sileht): We are going to shutdown we everything will be
                # stopped, we should not restart them
                return
            self._configure_pipeline_listener()

Explanation:
self.coord_lock comes from the run method of the NotificationService class:
        self.coord_lock = threading.Lock()
so self.coord_lock is simply a thread lock.
self.shutdown also comes from the run method of the NotificationService class:
        self.shutdown = False
and it is only set to True in the terminate method of NotificationService.
So what self._refresh_agent does is: acquire the thread lock; if the service is shutting down, return immediately; otherwise call self._configure_pipeline_listener().

4.3.3) The self._configure_pipeline_listener method
Its content is as follows:
    def _configure_pipeline_listener(self):
        ev_pipes = self.event_pipeline_manager.pipelines
        pipelines = self.pipeline_manager.pipelines + ev_pipes
        transport = messaging.get_transport()
        partitioned = self.partition_coordinator.extract_my_subset(
            self.group_id,
            range(cfg.CONF.notification.pipeline_processing_queues))

        endpoints = []
        targets = []

        for pipe in pipelines:
            if isinstance(pipe, pipeline.EventPipeline):
                endpoints.append(pipeline.EventPipelineEndpoint(pipe))
            else:
                endpoints.append(pipeline.SamplePipelineEndpoint(pipe))

        for pipe_set, pipe in itertools.product(partitioned, pipelines):
            LOG.debug('Pipeline endpoint: %s from set: %s',
                      pipe.name, pipe_set)
            topic = '%s-%s-%s' % (self.NOTIFICATION_IPC,
                                  pipe.name, pipe_set)
            targets.append(oslo_messaging.Target(topic=topic))

        if self.pipeline_listener:
            self.pipeline_listener.stop()
            self.pipeline_listener.wait()

        self.pipeline_listener = messaging.get_batch_notification_listener(
            transport,
            targets,
            endpoints,
            batch_size=cfg.CONF.notification.batch_size,
            batch_timeout=cfg.CONF.notification.batch_timeout)
        # NOTE(gordc): set single thread to process data sequentially
        # if batching enabled.
        batch = (1 if cfg.CONF.notification.batch_size > 1 else None)
        self.pipeline_listener.start(override_pool_size=batch)

Analysis:
4.3.3.1) The code above does the following:
it collects the sample and event pipeline lists, obtains the transport, and, most importantly, runs:
        partitioned = self.partition_coordinator.extract_my_subset(
            self.group_id,
            range(cfg.CONF.notification.pipeline_processing_queues))


4.3.4) The extract_my_subset method of the PartitionCoordinator class in ceilometer/coordination.py
Its content is as follows:
    @retrying.retry(stop_max_attempt_number=5, wait_random_max=2000,
                    retry_on_exception=retry_on_member_not_in_group)
    def extract_my_subset(self, group_id, iterable, attempt=0):
        """Filters an iterable, returning only objects assigned to this agent.

        We have a list of objects and get a list of active group members from
        `tooz`. We then hash all the objects into buckets and return only
        the ones that hashed into *our* bucket.
        """
        if not group_id:
            return iterable
        if group_id not in self._groups:
            self.join_group(group_id)
        try:
            members = self._get_members(group_id)
            LOG.debug('Members of group %s are: %s, Me: %s',
                      group_id, members, self._my_id)
            if self._my_id not in members:
                LOG.warning(_LW('Cannot extract tasks because agent failed to '
                                'join group properly. Rejoining group.'))
                self.join_group(group_id)
                members = self._get_members(group_id)
                if self._my_id not in members:
                    raise MemberNotInGroupError(group_id, members, self._my_id)
                LOG.debug('Members of group %s are: %s, Me: %s',
                          group_id, members, self._my_id)
            hr = utils.HashRing(members)
            iterable = list(iterable)
            filtered = [v for v in iterable
                        if hr.get_node(six.text_type(v)) == self._my_id]
            LOG.debug('The universal set: %s, my subset: %s',
                      [six.text_type(f) for f in iterable],
                      [six.text_type(f) for f in filtered])
            return filtered
        except tooz.coordination.ToozError:
            LOG.exception(_LE('Error getting group membership info from '
                              'coordination backend.'))
            return []

Analysis:
4.3.4.1) The most important part here is:
            members = self._get_members(group_id)
            hr = utils.HashRing(members)
It first fetches all members of the group and then builds a consistent hash ring from them.
The hash ring is essentially a dictionary whose keys are the hash values of 'node-name-replica-index' strings and whose values are the corresponding node names.

4.3.5) The HashRing class in ceilometer/utils.py
The code is as follows:
class HashRing(object):

    def __init__(self, nodes, replicas=100):
        self._ring = dict()
        self._sorted_keys = []

        for node in nodes:
            for r in six.moves.range(replicas):
                hashed_key = self._hash('%s-%s' % (node, r))
                self._ring[hashed_key] = node
                self._sorted_keys.append(hashed_key)
        self._sorted_keys.sort()

    @staticmethod
    def _hash(key):
        return struct.unpack_from('>I',
                                  hashlib.md5(decode_unicode(six
                                              .text_type(key))).digest())[0]

    def _get_position_on_ring(self, key):
        hashed_key = self._hash(key)
        position = bisect.bisect(self._sorted_keys, hashed_key)
        return position if position < len(self._sorted_keys) else 0

    def get_node(self, key):
        if not self._ring:
            return None
        pos = self._get_position_on_ring(key)
        return self._ring[self._sorted_keys[pos]]

Analysis:
4.3.5.1) HashRing maintains a dictionary whose keys are the hash values of 'node-name-replica-index' strings and whose values are the corresponding node names.
4.3.5.2) The __init__ method:
    Purpose: build the mapping from hashed keys to nodes.
    Steps:
    Step 1: iterate over the list of member names in the group (which can be regarded as the node list); for each node:
            1.1 iterate over every replica index
            1.2 build the key string 'node-name-replica-index' and compute its hash value, as follows:
                1.2.1 convert the input to a unicode string and encode it as UTF-8 bytes
                1.2.2 run hashlib.md5 over the bytes and take the binary digest
                1.2.3 unpack the digest with struct.unpack_from in big-endian format into a Python integer; this integer is the hash value of the input
            1.3 store the mapping <hash value, node name> in the ring dictionary
            1.4 append the hash value to the list of keys
    Step 2: sort the list of hash values

4.3.5.3) The _hash method:
Purpose: compute the hash value of the input.
Process:
Step 1: convert the input to a unicode string and encode it as UTF-8 bytes
Step 2: take the binary md5 digest of those bytes
Step 3: unpack the digest with struct.unpack_from in big-endian format into a Python integer; that integer is the hash value of the input
Overall: input -> unicode string -> UTF-8 bytes -> md5 binary digest -> unpack the digest into a Python integer
Example hash value: 1984516612

4.3.5.4) The get_node method:
Purpose: find the node that owns a given key on the ring.
Idea: compute the hash value of the input (md5 digest of the bytes, unpacked by struct into a Python integer), then use binary search (bisect) on the sorted list of ring hash values to find the position of the input's hash, and return the node stored at that position.
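A quick usage sketch for the HashRing above (the member names are made up):

ring = HashRing(['agent-a', 'agent-b', 'agent-c'])

print(ring._hash('agent-a-0'))   # an integer such as 1984516612
# Each key is owned by exactly one member, and the assignment is stable as
# long as the membership does not change:
print(ring.get_node('7'))        # one of 'agent-a' / 'agent-b' / 'agent-c'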


4.3.6) Back to the extract_my_subset method of the PartitionCoordinator class in ceilometer/coordination.py
Its content is (repeated here for convenience):
    @retrying.retry(stop_max_attempt_number=5, wait_random_max=2000,
                    retry_on_exception=retry_on_member_not_in_group)
    def extract_my_subset(self, group_id, iterable, attempt=0):
        """Filters an iterable, returning only objects assigned to this agent.

        We have a list of objects and get a list of active group members from
        `tooz`. We then hash all the objects into buckets and return only
        the ones that hashed into *our* bucket.
        """
        if not group_id:
            return iterable
        if group_id not in self._groups:
            self.join_group(group_id)
        try:
            members = self._get_members(group_id)
            LOG.debug('Members of group %s are: %s, Me: %s',
                      group_id, members, self._my_id)
            if self._my_id not in members:
                LOG.warning(_LW('Cannot extract tasks because agent failed to '
                                'join group properly. Rejoining group.'))
                self.join_group(group_id)
                members = self._get_members(group_id)
                if self._my_id not in members:
                    raise MemberNotInGroupError(group_id, members, self._my_id)
                LOG.debug('Members of group %s are: %s, Me: %s',
                          group_id, members, self._my_id)
            hr = utils.HashRing(members)
            iterable = list(iterable)
            filtered = [v for v in iterable
                        if hr.get_node(six.text_type(v)) == self._my_id]
            LOG.debug('The universal set: %s, my subset: %s',
                      [six.text_type(f) for f in iterable],
                      [six.text_type(f) for f in filtered])
            return filtered
        except tooz.coordination.ToozError:
            LOG.exception(_LE('Error getting group membership info from '
                              'coordination backend.'))
            return []

Analysis:
4.3.6.1) As analyzed above,
            hr = utils.HashRing(members)
builds the consistent hash ring, which is essentially a dictionary.
4.3.6.2)
            iterable = list(iterable)
            filtered = [v for v in iterable
                        if hr.get_node(six.text_type(v)) == self._my_id]
By default iterable is range(cfg.CONF.notification.pipeline_processing_queues), i.e. [0, 1, ..., 9].
Note that the result is the list of the numbers from 0 to 9 that belong to this agent.
That is, for each queue number in the default list [0, ..., 9], the code computes which node the number falls on (by binary-searching the number's hash value in the previously recorded, sorted list of node hash values and taking the owning node). If the resulting node name equals this agent's member id, it means that the messages on all the topics/queues derived from this queue number should be processed by this agent. This is what actually implements the load balancing; a sketch of the filtering appears below.
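A sketch of that filtering, assuming this agent's member id is 'agent-a' and the group currently has three members:

members = ['agent-a', 'agent-b', 'agent-c']
hr = HashRing(members)                    # the class from section 4.3.5
my_id = 'agent-a'

# pipeline_processing_queues defaults to 10, hence range(10)
my_queues = [q for q in range(10) if hr.get_node(str(q)) == my_id]
print(my_queues)   # e.g. [0, 3, 7]: only these pipe-set numbers are listened to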


4.3.7 Back to the self._configure_pipeline_listener method
Its content is (repeated here):
    def _configure_pipeline_listener(self):
        ev_pipes = self.event_pipeline_manager.pipelines
        pipelines = self.pipeline_manager.pipelines + ev_pipes
        transport = messaging.get_transport()
        partitioned = self.partition_coordinator.extract_my_subset(
            self.group_id,
            range(cfg.CONF.notification.pipeline_processing_queues))

        endpoints = []
        targets = []

        for pipe in pipelines:
            if isinstance(pipe, pipeline.EventPipeline):
                endpoints.append(pipeline.EventPipelineEndpoint(pipe))
            else:
                endpoints.append(pipeline.SamplePipelineEndpoint(pipe))

        for pipe_set, pipe in itertools.product(partitioned, pipelines):
            LOG.debug('Pipeline endpoint: %s from set: %s',
                      pipe.name, pipe_set)
            topic = '%s-%s-%s' % (self.NOTIFICATION_IPC,
                                  pipe.name, pipe_set)
            targets.append(oslo_messaging.Target(topic=topic))

        if self.pipeline_listener:
            self.pipeline_listener.stop()
            self.pipeline_listener.wait()

        self.pipeline_listener = messaging.get_batch_notification_listener(
            transport,
            targets,
            endpoints,
            batch_size=cfg.CONF.notification.batch_size,
            batch_timeout=cfg.CONF.notification.batch_timeout)
        # NOTE(gordc): set single thread to process data sequentially
        # if batching enabled.
        batch = (1 if cfg.CONF.notification.batch_size > 1 else None)
        self.pipeline_listener.start(override_pool_size=batch)

Analysis:
4.3.7.1) As analyzed above,
        partitioned = self.partition_coordinator.extract_my_subset(
            self.group_id,
            range(cfg.CONF.notification.pipeline_processing_queues))
returns, by default, the list of numbers between 0 and 9 that are assigned to this agent.

4.3.7.2)
        for pipe_set, pipe in itertools.product(partitioned, pipelines):
            LOG.debug('Pipeline endpoint: %s from set: %s',
                      pipe.name, pipe_set)
            topic = '%s-%s-%s' % (self.NOTIFICATION_IPC,
                                  pipe.name, pipe_set)
            targets.append(oslo_messaging.Target(topic=topic))
Analysis:
This loop uses itertools.product(iterable1, iterable2), i.e. the Cartesian product: it is equivalent to a two-level loop with iterable1 as the outer loop and iterable2 as the inner loop.

self.NOTIFICATION_IPC is:
    NOTIFICATION_IPC = 'ceilometer-pipe'
pipe.name comes from pipeline.yaml, e.g. cpu_source:cpu_sink
pipe_set is just a number
So a complete example topic is:
ceilometer-pipe-cpu_source:cpu_sink-0
and the corresponding queue name is:
ceilometer-pipe-cpu_source:cpu_sink-0.sample
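A small sketch of the topic construction; the pipeline names are examples from pipeline.yaml, and the pipe sets [0, 3, 7] are assumed to be what extract_my_subset returned for this agent:

import itertools

NOTIFICATION_IPC = 'ceilometer-pipe'
partitioned = [0, 3, 7]
pipelines = ['cpu_source:cpu_sink', 'cpu_source:cpu_delta_sink']

topics = ['%s-%s-%s' % (NOTIFICATION_IPC, name, pipe_set)
          for pipe_set, name in itertools.product(partitioned, pipelines)]
print(topics)
# ['ceilometer-pipe-cpu_source:cpu_sink-0',
#  'ceilometer-pipe-cpu_source:cpu_delta_sink-0',
#  'ceilometer-pipe-cpu_source:cpu_sink-3', ...]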

A partial sample of pipeline.yaml:
---
sources:
    - name: cpu_source
      interval: 300
      meters:
          - "cpu"
      sinks:
          - cpu_sink
          - cpu_delta_sink
sinks:
    - name: cpu_sink
      transformers:
          - name: "rate_of_change"
            parameters:
                target:
                    name: "cpu_util"
                    unit: "%"
                    type: "gauge"
                    max: 100
                    scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
      publishers:
          - notifier://
    - name: cpu_delta_sink
      transformers:
          - name: "delta"
            parameters:
                target:
                    name: "cpu.delta"
                growth_only: True
      publishers:
          - notifier://

Querying the queues directly in the environment:
rabbitmqctl list_queues | grep ceilometer-pipe
Sample output:
ceilometer-pipe-meter_source:meter_sink-1.sample    0
ceilometer-pipe-instance_disks_source:instance_disks_sink-5.sample    0
ceilometer-pipe-disk_source:disk_sink-3.sample    17
ceilometer-pipe-volume_source:volume_sink-6.sample    4
ceilometer-pipe-event:event_source:event_sink-4.sample    0
ceilometer-pipe-event:event_source:event_sink-0.sample    0
ceilometer-pipe-network_source:network_sink-3.sample    8
ceilometer-pipe-network_source:network_sink-8.sample    0
ceilometer-pipe-volume_source:volume_sink-3.sample    0
ceilometer-pipe-instance_disks_source:instance_disks_sink-8.sample    0
ceilometer-pipe-disk_source:disk_sink-6.sample    0
ceilometer-pipe-cpu_source:cpu_delta_sink-2.sample    0
ceilometer-pipe-network_source:network_sink-5.sample    8
ceilometer-pipe-network_source:network_sink-6.sample    12
ceilometer-pipe-event:event_source:event_sink-2.sample    0
ceilometer-pipe-meter_source:meter_sink-7.sample    0
ceilometer-pipe-cpu_source:cpu_delta_sink-5.sample    2
ceilometer-pipe-meter_source:meter_sink-2.sample    0
ceilometer-pipe-volume_source:volume_sink-5.sample    0
ceilometer-pipe-notification_source:notification_sink-7.sample    0
ceilometer-pipe-notification_source:notification_sink-0.sample    0
ceilometer-pipe-volume_source:volume_sink-4.sample    0
ceilometer-pipe-disk_source:disk_sink-1.sample    10
ceilometer-pipe-meter_source:meter_sink-5.sample    0
ceilometer-pipe-disk_source:disk_sink-9.sample    0
ceilometer-pipe-meter_source:meter_sink-8.sample    0
ceilometer-pipe-instance_disks_source:instance_disks_sink-9.sample    0
ceilometer-pipe-instance_disks_source:instance_disks_sink-7.sample    0
ceilometer-pipe-cpu_source:cpu_sink-0.sample    3
ceilometer-pipe-volume_source:volume_sink-2.sample    0
ceilometer-pipe-meter_source:meter_sink-0.sample    0
ceilometer-pipe-event:event_source:event_sink-1.sample    0
ceilometer-pipe-cpu_source:cpu_sink-4.sample    0
ceilometer-pipe-cpu_source:cpu_sink-6.sample    0
ceilometer-pipe-volume_source:volume_sink-8.sample    0
ceilometer-pipe-instance_disks_source:instance_disks_sink-0.sample    0
ceilometer-pipe-meter_source:meter_sink-3.sample    0
ceilometer-pipe-notification_source:notification_sink-2.sample    0
ceilometer-pipe-cpu_source:cpu_sink-8.sample    1
ceilometer-pipe-cpu_source:cpu_sink-5.sample    2
ceilometer-pipe-cpu_source:cpu_delta_sink-0.sample    3
ceilometer-pipe-volume_source:volume_sink-7.sample    0
ceilometer-pipe-instance_disks_source:instance_disks_sink-6.sample    0
ceilometer-pipe-cpu_source:cpu_sink-1.sample    2
ceilometer-pipe-network_source:network_sink-2.sample    0
ceilometer-pipe-instance_disks_source:instance_disks_sink-2.sample    0
ceilometer-pipe-event:event_source:event_sink-8.sample    0
ceilometer-pipe-volume_source:volume_sink-0.sample    0
ceilometer-pipe-disk_source:disk_sink-4.sample    4
ceilometer-pipe-cpu_source:cpu_delta_sink-7.sample    0
ceilometer-pipe-cpu_source:cpu_delta_sink-8.sample    1
ceilometer-pipe-meter_source:meter_sink-9.sample    0
ceilometer-pipe-cpu_source:cpu_sink-3.sample    4
ceilometer-pipe-notification_source:notification_sink-9.sample    0
ceilometer-pipe-meter_source:meter_sink-4.sample    0
ceilometer-pipe-network_source:network_sink-9.sample    0
ceilometer-pipe-notification_source:notification_sink-8.sample    0
ceilometer-pipe-instance_disks_source:instance_disks_sink-3.sample    0

Another query:
root@rabbitmq-0:/# rabbitmqctl list_queues|grep cpu_source:cpu_sink
ceilometer-pipe-cpu_source:cpu_sink-9.sample    0
ceilometer-pipe-cpu_source:cpu_sink-7.sample    0
ceilometer-pipe-cpu_source:cpu_sink-8.sample    1
ceilometer-pipe-cpu_source:cpu_sink-2.sample    0
ceilometer-pipe-cpu_source:cpu_sink-0.sample    2
ceilometer-pipe-cpu_source:cpu_sink-4.sample    0
ceilometer-pipe-cpu_source:cpu_sink-6.sample    0
ceilometer-pipe-cpu_source:cpu_sink-5.sample    1
ceilometer-pipe-cpu_source:cpu_sink-1.sample    1
ceilometer-pipe-cpu_source:cpu_sink-3.sample    2

Analysis:
For each pipeline there are by default ten queues, numbered 0 through 9.

Reference:
https://blog.csdn.net/qq_33528613/article/details/79365291


4.3.7.3) 
        self.pipeline_listener = messaging.get_batch_notification_listener(
            transport,
            targets,
            endpoints,
            batch_size=cfg.CONF.notification.batch_size,
            batch_timeout=cfg.CONF.notification.batch_timeout)
Analysis:
4.3.7.3.1) The relevant code in ceilometer/messaging.py is:
def get_batch_notification_listener(transport, targets, endpoints,
                                    allow_requeue=False,
                                    batch_size=1, batch_timeout=None):
    """Return a configured oslo_messaging notification listener."""
    return oslo_messaging.get_batch_notification_listener(
        transport, targets, endpoints, executor='threading',
        batch_size=batch_size, batch_timeout=batch_timeout)

In the code above, endpoints comes from:
        endpoints = []
        for pipe in pipelines:
            if isinstance(pipe, pipeline.EventPipeline):
                endpoints.append(pipeline.EventPipelineEndpoint(pipe))
            else:
                endpoints.append(pipeline.SamplePipelineEndpoint(pipe))

4.3.7.3.2)
Analysis of pipeline.SamplePipelineEndpoint:
The source is in ceilometer/pipeline.py:

class SamplePipelineEndpoint(PipelineEndpoint):
    def sample(self, messages):
        samples = chain.from_iterable(m["payload"] for m in messages)
        samples = [
            sample_util.Sample(name=s['counter_name'],
                               type=s['counter_type'],
                               unit=s['counter_unit'],
                               volume=s['counter_volume'],
                               user_id=s['user_id'],
                               project_id=s['project_id'],
                               resource_id=s['resource_id'],
                               timestamp=s['timestamp'],
                               resource_metadata=s['resource_metadata'],
                               source=s.get('source'),
                               # NOTE(sileht): May come from an older node,
                               # Put None in this case.
                               monotonic_time=s.get('monotonic_time'))
            for s in samples if publisher_utils.verify_signature(
                s, cfg.CONF.publisher.telemetry_secret)
        ]
        with self.publish_context as p:
            p(sorted(samples, key=methodcaller('get_iso_timestamp')))

The parent class of SamplePipelineEndpoint is PipelineEndpoint. The sample method above receives the sample data that ceilometer-compute sends to ceilometer-notification, runs it through the pipeline, and then publishes the transformed samples on towards the ceilometer-collector service.

4.3.7.3.3) PipelineEndpoint is defined as follows:
@six.add_metaclass(abc.ABCMeta)
class PipelineEndpoint(object):

    def __init__(self, pipeline):
        self.filter_rule = oslo_messaging.NotificationFilter(
            publisher_id=pipeline.name)
        self.publish_context = PublishContext([pipeline])

    @abc.abstractmethod
    def sample(self, messages):
        pass

pipeline.name looks like:
cpu_source:cpu_sink

4.3.7.3.4) Analysis of pipeline.EventPipelineEndpoint
The source is in ceilometer/pipeline.py; the code is:
class EventPipelineEndpoint(PipelineEndpoint):
    def sample(self, messages):
        events = chain.from_iterable(m["payload"] for m in messages)
        events = [
            models.Event(
                message_id=ev['message_id'],
                event_type=ev['event_type'],
                generated=timeutils.normalize_time(
                    timeutils.parse_isotime(ev['generated'])),
                traits=[models.Trait(name, dtype,
                                     models.Trait.convert_value(dtype, value))
                        for name, dtype, value in ev['traits']],
                raw=ev.get('raw', {}))
            for ev in events if publisher_utils.verify_signature(
                ev, cfg.CONF.publisher.telemetry_secret)
        ]
        try:
            with self.publish_context as p:
                p(events)
        except Exception:
            if not cfg.CONF.notification.ack_on_event_error:
                return oslo_messaging.NotificationResult.REQUEUE
            raise
        return oslo_messaging.NotificationResult.HANDLED

Analysis:
The code above extracts the data from the payload, converts it into events, and then publishes those events.

The key question:
how are endpoints and topics associated? As far as I remember they are tied together through the topic, and if the sending side uses oslo.messaging's notifier.sample, the receiving side must implement a method with the same name (e.g. sample) in its endpoint.

ceilometer-compute sets its topic to notifications, so at that stage the sample messages sit in the notifications.sample queue. How the messages are placed now is explained further below.

As for the ceilometer-pipe-cpu_source:cpu_sink-0.sample queue, the sending code (seen from the ceilometer-compute side) has presumably changed accordingly as well. The sketch below illustrates the topic/endpoint association.
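To illustrate that association, a minimal self-contained oslo_messaging sketch (the topic and endpoint are made up): the listener subscribes to the target's topic, and an incoming notification whose priority is 'sample' is dispatched to the endpoint method named sample.

import oslo_messaging
from oslo_config import cfg

class MyEndpoint(object):
    # The method name matches the notification priority used by the sender
    # (notifier.sample -> endpoint.sample).
    def sample(self, ctxt, publisher_id, event_type, payload, metadata):
        print(publisher_id, event_type, payload)

transport = oslo_messaging.get_notification_transport(cfg.CONF)
targets = [oslo_messaging.Target(topic='ceilometer-pipe-cpu_source:cpu_sink-0')]
listener = oslo_messaging.get_notification_listener(
    transport, targets, [MyEndpoint()], executor='threading')
listener.start()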

4.4 Back again to the run method of the NotificationService class in ceilometer/notification.py
The code is as follows (repeated here):
    def run(self):
        super(NotificationService, self).run()
        self.shutdown = False
        self.periodic = None
        self.partition_coordinator = None
        self.coord_lock = threading.Lock()

        self.listeners = []

        # NOTE(kbespalov): for the pipeline queues used a single amqp host
        # hence only one listener is required
        self.pipeline_listener = None

        self.pipeline_manager = pipeline.setup_pipeline()

        self.event_pipeline_manager = pipeline.setup_event_pipeline()

        self.transport = messaging.get_transport()

        if cfg.CONF.notification.workload_partitioning:
            self.group_id = self.NOTIFICATION_NAMESPACE
            self.partition_coordinator = coordination.PartitionCoordinator()
            self.partition_coordinator.start()
        else:
            # FIXME(sileht): endpoint uses the notification_topics option
            # and it should not because this is an oslo_messaging option
            # not a ceilometer. Until we have something to get the
            # notification_topics in another way, we must create a transport
            # to ensure the option has been registered by oslo_messaging.
            messaging.get_notifier(self.transport, '')
            self.group_id = None

        self.pipe_manager = self._get_pipe_manager(self.transport,
                                                   self.pipeline_manager)
        self.event_pipe_manager = self._get_event_pipeline_manager(
            self.transport)

        self._configure_main_queue_listeners(self.pipe_manager,
                                             self.event_pipe_manager)

        if cfg.CONF.notification.workload_partitioning:
            # join group after all manager set up is configured
            self.partition_coordinator.join_group(self.group_id)
            self.partition_coordinator.watch_group(self.group_id,
                                                   self._refresh_agent)

            @periodics.periodic(spacing=cfg.CONF.coordination.heartbeat,
                                run_immediately=True)
            def heartbeat():
                self.partition_coordinator.heartbeat()

            @periodics.periodic(spacing=cfg.CONF.coordination.check_watchers,
                                run_immediately=True)
            def run_watchers():
                self.partition_coordinator.run_watchers()

            self.periodic = periodics.PeriodicWorker.create(
                [], executor_factory=lambda:
                futures.ThreadPoolExecutor(max_workers=10))
            self.periodic.add(heartbeat)
            self.periodic.add(run_watchers)

            utils.spawn_thread(self.periodic.start)

            # configure pipelines after all coordination is configured.
            with self.coord_lock:
                self._configure_pipeline_listener()

        if not cfg.CONF.notification.disable_non_metric_meters:
            LOG.warning(_LW('Non-metric meters may be collected. It is highly '
                            'advisable to disable these meters using '
                            'ceilometer.conf or the pipeline.yaml'))

        self.init_pipeline_refresh()

Analysis:
4.4.0) A line that is very easy to overlook:
self.pipeline_manager = pipeline.setup_pipeline()
The source is in ceilometer/pipeline.py:
def setup_pipeline(transformer_manager=None):
    """Setup pipeline manager according to yaml config file."""
    default = extension.ExtensionManager('ceilometer.transformer')
    cfg_file = cfg.CONF.pipeline_cfg_file
    return PipelineManager(cfg_file, transformer_manager or default,
                           SAMPLE_TYPE)

Analysis:
4.4.0.1)
setup_pipeline(transformer_manager=None):
returns an instantiated PipelineManager(ConfigManagerBase) object.
Purpose: the pipeline manager reads pipeline.yaml and builds the list of pipelines; each pipeline consists of one source and one sink, and each sink contains a list of transformers (looked up by name in the corresponding entry points) and a list of publishers.
An example sample-pipeline name:
cpu_source:cpu_sink
Member variable: pipelines

4.4.0.2)
    PipelineManager(ConfigManagerBase):
    Purpose: as above, it reads pipeline.yaml and builds the pipeline list; each pipeline is one source plus one sink, and each sink holds its transformers (looked up by name in the entry points) and its publishers.
    Member variable: pipelines

4.4.0.3) Back in the run method of ceilometer/notification.py, the following is called:

        self.pipe_manager = self._get_pipe_manager(self.transport,
                                                   self.pipeline_manager)
        self.event_pipe_manager = self._get_event_pipeline_manager(
            self.transport)
Analysis:
4.4.0.3.1) The _get_pipe_manager method is as follows:
    def _get_pipe_manager(self, transport, pipeline_manager):
        if cfg.CONF.notification.workload_partitioning:
            pipe_manager = pipeline.SamplePipelineTransportManager()
            for pipe in pipeline_manager.pipelines:
                key = pipeline.get_pipeline_grouping_key(pipe)
                pipe_manager.add_transporter(
                    (pipe.source.support_meter, key or ['resource_id'],
                     self._get_notifiers(transport, pipe)))
        else:
            pipe_manager = pipeline_manager
        return pipe_manager
Explanation:
1) _get_pipe_manager(self, transport, pipeline_manager):
    If the coordination group is enabled, a SamplePipelineTransportManager is created; then it iterates over every pipeline and obtains that pipeline's grouping key (basically ['resource_id']).
2) Debugging output:
                (Pdb) p pipe

(Pdb) p pipe.__dict__
{'source': , 'sink': , 'name': 'cpu_source:cpu_sink'}

(Pdb) p key
['resource_id']

3) Analysis of the _get_notifiers method; its content is:
    def _get_notifiers(self, transport, pipe):
        notifiers = []
        for x in range(cfg.CONF.notification.pipeline_processing_queues):
            notifiers.append(oslo_messaging.Notifier(
                transport,
                driver=cfg.CONF.publisher_notifier.telemetry_driver,
                publisher_id=pipe.name,
                topics=['%s-%s-%s' % (self.NOTIFICATION_IPC, pipe.name, x)]))
        return notifiers
Explanation:
A) _get_notifiers(self, transport, pipe):
                returns a list of oslo_messaging.Notifier objects whose topics are built by joining the 'ceilometer-pipe' prefix, the pipeline name, and each of the default queue numbers.
B) Debugging output (key values for cpu_util):
        (Pdb) p transport

(Pdb) p transport.__dict__
{'_driver': , 'conf': }
(Pdb) p pipe

(Pdb) p pipe.__dict__
{'source': , 'sink': , 'name': 'cpu_source:cpu_sink'}

(Pdb) p pipe.name
'cpu_source:cpu_sink'

(Pdb) p ['%s-%s-%s' % (self.NOTIFICATION_IPC, pipe.name, x)]
['ceilometer-pipe-cpu_source:cpu_sink-0']

(Pdb) p notifiers
[, , , , , , , , , ]
(Pdb) p notifiers[0].__dict__
{'_serializer': , '_driver_mgr': , 'retry': None, '_driver_names': ['messagingv2'], '_topics': ['ceilometer-pipe-cpu_source:cpu_sink-0'], 'publisher_id': 'cpu_source:cpu_sink', 'transport': }

cfg.CONF.notification.pipeline_processing_queues
defaults to 10

NOTIFICATION_IPC = 'ceilometer-pipe'

Analysis:
Topics of the form
['%s-%s-%s' % (self.NOTIFICATION_IPC, pipe.name, x)]
for example: ['ceilometer-pipe-cpu_source:cpu_sink-0']
are being assembled here; they are used for the subsequent load balancing.

4.4.0.4) Back once more in the run method of ceilometer/notification.py:
        self.pipe_manager = self._get_pipe_manager(self.transport,
                                                   self.pipeline_manager)
If the coordinator is enabled, then self.pipe_manager is the pipe manager actually used for coordination-group load balancing, and it creates many notifiers whose topics have the form
['%s-%s-%s' % (self.NOTIFICATION_IPC, pipe.name, x)]
for example: ['ceilometer-pipe-cpu_source:cpu_sink-0']
which are used for the subsequent load balancing.
If the coordinator is not enabled, the original self.pipeline_manager is simply assigned to self.pipe_manager.
This is very easy to overlook, yet it is the key to how the load balancing is implemented; a simplified sketch of the dispatch follows.
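A simplified sketch of the sender-side dispatch (not the exact SamplePipelineTransportManager code; the grouping key is assumed to be resource_id): the key of each sample is hashed and taken modulo len(notifiers), so samples for the same resource always land in the same ceilometer-pipe-<pipeline>-<n> queue.

import hashlib

def pick_notifier(notifiers, sample):
    """Pick one of the per-pipeline notifiers for this sample."""
    key = str(sample.get('resource_id', ''))
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()
    return notifiers[int(digest, 16) % len(notifiers)]

# Usage: notifiers is the list built by _get_notifiers() above.
# pick_notifier(notifiers, sample_dict).sample({}, 'telemetry.polling',
#                                              {'samples': [sample_dict]})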


4.4.0.5) Analysis of the _configure_main_queue_listeners method
        self._configure_main_queue_listeners(self.pipe_manager,
                                             self.event_pipe_manager)
The code is as follows:
    def _configure_main_queue_listeners(self, pipe_manager,
                                        event_pipe_manager):
        notification_manager = self._get_notifications_manager(pipe_manager)
        if not list(notification_manager):
            LOG.warning(_('Failed to load any notification handlers for %s'),
                        self.NOTIFICATION_NAMESPACE)

        ack_on_error = cfg.CONF.notification.ack_on_event_error

        endpoints = []
        endpoints.append(
            event_endpoint.EventsNotificationEndpoint(event_pipe_manager))

        targets = []
        for ext in notification_manager:
            handler = ext.obj
            if (cfg.CONF.notification.disable_non_metric_meters and
                    isinstance(handler, base.NonMetricNotificationBase)):
                continue
            LOG.debug('Event types from %(name)s: %(type)s'
                      ' (ack_on_error=%(error)s)',
                      {'name': ext.name,
                       'type': ', '.join(handler.event_types),
                       'error': ack_on_error})
            # NOTE(gordc): this could be a set check but oslo_messaging issue
            # https://bugs.launchpad.net/oslo.messaging/+bug/1398511
            # This ensures we don't create multiple duplicate consumers.
            for new_tar in handler.get_targets(cfg.CONF):
                if new_tar not in targets:
                    targets.append(new_tar)
            endpoints.append(handler)

        urls = cfg.CONF.notification.messaging_urls or [None]
        for url in urls:
            transport = messaging.get_transport(url)
            # NOTE(gordc): ignore batching as we want pull
            # to maintain sequencing as much as possible.
            listener = messaging.get_batch_notification_listener(
                transport, targets, endpoints)
            listener.start()
            self.listeners.append(listener)
Analysis:
4.4.0.5.1) 
    _configure_main_queue_listeners(self, pipe_manager, event_pipe_manager):
    Purpose: build the targets (each carrying at least a topic and an exchange) and the list of handler
    objects (endpoints), for both the coordinated and the uncoordinated case.
    Processing steps:
    Step 1: load all plugins under the ceilometer.notification namespace (these plugins are the
            endpoints that later process the messages).
    Step 2: append an EventsNotificationEndpoint to endpoints (it handles events from the other
            OpenStack components).
    Step 3: iterate over the plugins and, when disable_non_metric_meters is set, skip every plugin whose
            base class is base.NonMetricNotificationBase.
           A sample plugin looks like this:
           
        (Pdb) p handler.__dict__
        {'filter_rule': , 'manager': }
        (Pdb) p ext.name
        '_sample'
        (Pdb) p handler.event_types
        ['telemetry.api', 'telemetry.polling']
        (Pdb) p handler.get_targets(cfg.CONF)
        []
    Step 4: for each remaining plugin, obtain its targets. A target is built from the topic list in
            conf.oslo_messaging_notifications.topics plus that plugin's exchange, i.e. a target
            contains an exchange and a topic.
         conf.oslo_messaging_notifications.topics = ['notifications']
    Step 5: add the plugin's targets to the overall targets list (duplicates are skipped) and append the
            plugin itself to the endpoints list.
    Step 6: iterate over cfg.CONF.notification.messaging_urls, get a transport for each url, then call
        listener = messaging.get_batch_notification_listener(
                transport, targets, endpoints)
        to obtain a listener, start it, and append it to the listeners list. A sketch of how such a
        target can be built follows.
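
A hedged sketch of what step 4 amounts to for a single plugin (the 'nova' exchange and the
['notifications'] topic list are illustrative values taken from the debug output below; only
oslo_messaging.Target from oslo.messaging is used):

import oslo_messaging

topics = ['notifications']     # conf.oslo_messaging_notifications.topics
exchange = 'nova'              # exchange declared by the plugin (illustrative)

plugin_targets = [oslo_messaging.Target(topic=topic, exchange=exchange)
                  for topic in topics]

# Duplicates are skipped before appending, as in _configure_main_queue_listeners.
targets = []
for new_tar in plugin_targets:
    if new_tar not in targets:
        targets.append(new_tar)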

4.4.0.5.2) 
            _get_notifications_manager(cls, pm):
    Loads all plugins under the ceilometer.notification namespace and returns them as an
    extension.ExtensionManager object.

        (Pdb) p notification_manager

(Pdb) p notification_manager.__dict__
{'_extensions_by_name': None, 'extensions': [, , , , , , , , , , , , , , , , , , , , , , , , , ], '_on_load_failure_callback': None, 'namespace': 'ceilometer.notification', 'propagate_map_exceptions': False}


4.4.0.5.3) Debug output
        (Pdb) p cfg.CONF.notification.ack_on_event_error
True

        from ceilometer.event import endpoint as event_endpoint
        This endpoint handles the create/delete and other events sent by the other OpenStack components.

            (Pdb) p ext

(Pdb) p ext.__dict__
{'obj': , 'entry_point': EntryPoint.parse('hardware.ipmi.temperature = ceilometer.ipmi.notifications.ironic:TemperatureSensorNotification'), 'name': 'hardware.ipmi.temperature', 'plugin': }

            (Pdb) p cfg.CONF.notification.disable_non_metric_meters
True

Important:
In other words, every notification that does not carry metric data is filtered out here.

 
            (Pdb) p handler

(Pdb) p handler.__dict__
{'filter_rule': , 'manager': }
(Pdb) p handler.event_types
['hardware.ipmi.*']
(Pdb) p ack_on_error
True

            Important
            ceilometer collecting the instances' metric data:
            (Pdb) p handler

(Pdb) p handler.__dict__
{'filter_rule': , 'manager': }
(Pdb) p ext.name
'_sample'
(Pdb) p handler.event_types
['telemetry.api', 'telemetry.polling']
(Pdb) p handler.get_targets(cfg.CONF)
[]
(Pdb) p new_tar.__dict__
{'version': None, 'exchange': 'ceilometer', 'accepted_namespaces': [None], 'namespace': None, 'server': None, 'topic': 'notifications', 'fanout': None}


            (Pdb) p endpoints
[, ]
(Pdb) p endpoints[0].__dict__
{'event_converter': , 'manager': }


            Important
            (Pdb) p handler

(Pdb) p handler.__dict__
{'definitions': [, , , , , , , , , , , , , , , , , , , , , , , , ], 'manager': }

(Pdb) p ext.name
'meter'

(Pdb) p handler.event_types
[]

/usr/lib/python2.7/site-packages/ceilometer/meter/notifications.py(195)get_targets()
            
(Pdb) p targets
[, , , , , , , , , , , , ]
(Pdb) p len(targets)
13
(Pdb) p targets[0].__dict__
{'version': None, 'exchange': 'nova', 'accepted_namespaces': [None], 'namespace': None, 'server': None, 'topic': 'notifications', 'fanout': None}

During the iteration the following target is also visited:
(Pdb) p new_tar

(Pdb) p new_tar.__dict__
{'version': None, 'exchange': 'ceilometer', 'accepted_namespaces': [None], 'namespace': None, 'server': None, 'topic': 'notifications', 'fanout': None}

but it is skipped because an equal target is already in the list.

The final targets and endpoints:
(Pdb) p targets
[, , , , , , , , , , , , , ]
(Pdb) p endpoints
[, , , , , , ]

(Pdb) p cfg.CONF.notification.messaging_urls
['rabbit://rabbitmq:[email protected]:5672/']

            (Pdb) p listener
(The object reprs in this dump were lost. The recoverable fields: targets is the list of 14 targets
built above, _batch_size is 1, _allow_requeue is False, executor_type is 'threading',
_targets_priorities pairs every target with each priority (audit, debug, info, warn, error, critical,
sample), and _started is False because start() has not run yet.)

            (Pdb) p self.listeners
(A single listener, the same object as above; after listener.start() its _started field is True and its
listener and _work_executor fields are populated.)


4.4.1) As analysed above, when the coordination group is enabled the ceilometer-notification service
watches for members joining and leaving the group; on such an event it calls self._refresh_agent,
which in turn calls self._configure_pipeline_listener.
self._configure_pipeline_listener first uses the consistent-hash ring to compute which of the default
10 queue numbers (0-9) land on the current node. The detailed logic is:
s1: fetch all group members and compute a hash for each member name (the name is first digested with
md5, then the binary digest is unpacked with struct into a Python integer);
s2: build <hash of member name, node name> key/value pairs and update the hash-ring dict with them;
also store every member-name hash in an array of hashes and finally sort that array;
s3: for a new input item, e.g. a queue number, compute its hash, binary-search its position in the
sorted hash array from s2, map that position to a node, and return that node's name.
A minimal sketch of such a hash ring is given below.
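
The following is a minimal, self-contained sketch of the s1-s3 idea. It is a simplification of
ceilometer's hash ring (no replicas, and only part of the digest is used), and the member names are
made up for illustration:

import bisect
import hashlib
import struct

def hash_name(name):
    # md5 the name, then unpack part of the binary digest into a Python integer (s1)
    digest = hashlib.md5(name.encode('utf-8')).digest()
    return struct.unpack('>I', digest[:4])[0]

members = ['uuid-node-1', 'uuid-node-2', 'uuid-node-3']   # group members (illustrative)

ring = dict((hash_name(m), m) for m in members)   # <member-name hash, node name> (s2)
sorted_hashes = sorted(ring)                      # sorted array of member hashes (s2)

def get_node(item):
    # hash the input (e.g. a queue number) and binary-search the ring (s3)
    pos = bisect.bisect(sorted_hashes, hash_name(str(item))) % len(sorted_hashes)
    return ring[sorted_hashes[pos]]

# which of the 10 queue numbers land on 'uuid-node-1'?
print([q for q in range(10) if get_node(q) == 'uuid-node-1'])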


4.4.2) 
            @periodics.periodic(spacing=cfg.CONF.coordination.heartbeat,
                                run_immediately=True)
            def heartbeat():
                self.partition_coordinator.heartbeat()

Analysis:
This is how the tooz.coordination object is used: heartbeat must be called periodically on every
distributed coordinator so that the backend can tell whether the ceilometer-notification service on
this node is still alive.
cfg.CONF.coordination.heartbeat is 1.0, i.e. a heartbeat is sent every second by default.
A minimal tooz usage sketch is shown below.
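
A minimal sketch of the tooz calls behind this (the backend URL and the member id are placeholders;
get_coordinator, start, heartbeat and stop are the real tooz API):

import uuid

from tooz import coordination

coordinator = coordination.get_coordinator(
    'redis://redis.openstack.svc.cluster.local:6379/',   # placeholder backend URL
    uuid.uuid4().hex.encode('ascii'))                    # placeholder member id
coordinator.start()

# What the periodic heartbeat task does every cfg.CONF.coordination.heartbeat
# seconds (1.0 by default): tell the backend this member is still alive.
coordinator.heartbeat()

coordinator.stop()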

4.4.3)
            @periodics.periodic(spacing=cfg.CONF.coordination.check_watchers,
                                run_immediately=True)
            def run_watchers():
                self.partition_coordinator.run_watchers()
Analysis:
self.partition_coordinator.run_watchers()
must be executed periodically to detect changes in the group membership.
cfg.CONF.coordination.check_watchers defaults to 10.0.
A sketch of registering such group watchers with tooz follows.
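
A minimal sketch of how group watchers can be wired up with tooz (the group id and member id are
placeholders; watch_join_group, watch_leave_group and run_watchers are the real tooz API, and the
group is assumed to already exist on the backend):

from tooz import coordination

coordinator = coordination.get_coordinator(
    'redis://redis.openstack.svc.cluster.local:6379/', b'member-1')   # placeholders
coordinator.start()

group_id = b'ceilometer.notification'

def on_membership_change(event):
    # In ceilometer this is where _refresh_agent would be triggered.
    print('group %s changed, member %s' % (event.group_id, event.member_id))

coordinator.watch_join_group(group_id, on_membership_change)
coordinator.watch_leave_group(group_id, on_membership_change)

# What the periodic run_watchers task does every
# cfg.CONF.coordination.check_watchers seconds (10.0 by default).
coordinator.run_watchers()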

4.4.4) 
            self.periodic = periodics.PeriodicWorker.create(
                [], executor_factory=lambda:
                futures.ThreadPoolExecutor(max_workers=10))
            self.periodic.add(heartbeat)
            self.periodic.add(run_watchers)

            utils.spawn_thread(self.periodic.start)
Analysis:
A periodic worker is used here to keep running the heartbeat and run_watchers functions.
A self-contained sketch of the same futurist pattern follows.
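
A self-contained sketch of the futurist pattern used above (the two task bodies are stand-ins for
partition_coordinator.heartbeat and partition_coordinator.run_watchers; utils.spawn_thread is replaced
by a plain thread):

import threading
import time

from concurrent import futures
from futurist import periodics

@periodics.periodic(spacing=1.0, run_immediately=True)
def heartbeat():
    print('heartbeat')        # stand-in for self.partition_coordinator.heartbeat()

@periodics.periodic(spacing=10.0, run_immediately=True)
def run_watchers():
    print('run_watchers')     # stand-in for self.partition_coordinator.run_watchers()

worker = periodics.PeriodicWorker.create(
    [], executor_factory=lambda: futures.ThreadPoolExecutor(max_workers=10))
worker.add(heartbeat)
worker.add(run_watchers)

threading.Thread(target=worker.start).start()
time.sleep(3)
worker.stop()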

4.4.5)
            # configure pipelines after all coordination is configured.
            with self.coord_lock:
                self._configure_pipeline_listener()
Analysis:
self.coord_lock is defined in the run method as:
        self.coord_lock = threading.Lock()
It is a thread lock.
Once all the coordination plumbing has been configured,
the self._configure_pipeline_listener method is called.

4.4.6) Analysis of the self._configure_pipeline_listener method
Its content:
    def _configure_pipeline_listener(self):
        ev_pipes = self.event_pipeline_manager.pipelines
        pipelines = self.pipeline_manager.pipelines + ev_pipes
        transport = messaging.get_transport()
        partitioned = self.partition_coordinator.extract_my_subset(
            self.group_id,
            range(cfg.CONF.notification.pipeline_processing_queues))

        endpoints = []
        targets = []

        for pipe in pipelines:
            if isinstance(pipe, pipeline.EventPipeline):
                endpoints.append(pipeline.EventPipelineEndpoint(pipe))
            else:
                endpoints.append(pipeline.SamplePipelineEndpoint(pipe))

        for pipe_set, pipe in itertools.product(partitioned, pipelines):
            LOG.debug('Pipeline endpoint: %s from set: %s',
                      pipe.name, pipe_set)
            topic = '%s-%s-%s' % (self.NOTIFICATION_IPC,
                                  pipe.name, pipe_set)
            targets.append(oslo_messaging.Target(topic=topic))

        if self.pipeline_listener:
            self.pipeline_listener.stop()
            self.pipeline_listener.wait()

        self.pipeline_listener = messaging.get_batch_notification_listener(
            transport,
            targets,
            endpoints,
            batch_size=cfg.CONF.notification.batch_size,
            batch_timeout=cfg.CONF.notification.batch_timeout)
        # NOTE(gordc): set single thread to process data sequentially
        # if batching enabled.
        batch = (1 if cfg.CONF.notification.batch_size > 1 else None)
        self.pipeline_listener.start(override_pool_size=batch)

Analysis:
As already noted in 4.4.1), when the coordination group is enabled the ceilometer-notification service
watches for members joining and leaving the group and then calls self._refresh_agent, which in turn
calls self._configure_pipeline_listener.
self._configure_pipeline_listener first uses the consistent-hash ring to compute which of the default
10 queue numbers (0-9) land on the current node:
s1: fetch all group members and compute a hash for each member name (md5 digest of the name, unpacked
with struct into a Python integer);
s2: build <member-name hash, node name> pairs, update the hash-ring dict, store the hashes in an array
and sort it;
s3: for a new input item, e.g. a queue number, compute its hash, binary-search its position in the
sorted array from s2, map that position to a node and return that node's name.


4.4.7) Finally, self.init_pipeline_refresh is called. The corresponding code:
    def init_pipeline_refresh(self):
        """Initializes pipeline refresh state."""
        self.clear_pipeline_validation_status()

        if (cfg.CONF.refresh_pipeline_cfg or
                cfg.CONF.refresh_event_pipeline_cfg):
            self.refresh_pipeline_periodic = utils.create_periodic(
                target=self.refresh_pipeline,
                spacing=cfg.CONF.pipeline_polling_interval)
            utils.spawn_thread(self.refresh_pipeline_periodic.start)
Analysis:
The most important part is:
            self.refresh_pipeline_periodic = utils.create_periodic(
                target=self.refresh_pipeline,
                spacing=cfg.CONF.pipeline_polling_interval)
            utils.spawn_thread(self.refresh_pipeline_periodic.start)
Explanation:
cfg.CONF.pipeline_polling_interval defaults to 20, i.e. the pipeline configuration is re-read every
20 seconds. A hedged sketch of that refresh idea is shown below.
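
A hedged sketch of the idea only (it mimics the cfg_changed/cfg_hash comparison, it is not the actual
ceilometer implementation; the file path is a placeholder):

import hashlib
import time

PIPELINE_CFG = '/etc/ceilometer/pipeline.yaml'   # placeholder path
POLLING_INTERVAL = 20                            # cfg.CONF.pipeline_polling_interval default

def cfg_hash(path):
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

current_hash = cfg_hash(PIPELINE_CFG)
while True:
    time.sleep(POLLING_INTERVAL)
    new_hash = cfg_hash(PIPELINE_CFG)
    if new_hash != current_hash:
        print('pipeline config changed, reloading pipelines')
        current_hash = new_hash      # here setup_pipeline() would be called again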

The target is self.refresh_pipeline, defined in the PipelineBasedService(cotyledon.Service) class in
ceilometer/service_base.py. The method is:

    def refresh_pipeline(self):
        """Refreshes appropriate pipeline, then delegates to agent."""

        if cfg.CONF.refresh_pipeline_cfg:
            manager = None
            if hasattr(self, 'pipeline_manager'):
                manager = self.pipeline_manager
            elif hasattr(self, 'polling_manager'):
                manager = self.polling_manager
            pipeline_hash = manager.cfg_changed() if manager else None
            if pipeline_hash:
                try:
                    LOG.debug("Pipeline has been refreshed. "
                              "old hash: %(old)s, new hash: %(new)s",
                              {'old': manager.cfg_hash,
                               'new': pipeline_hash})
                    # Pipeline in the notification agent.
                    if hasattr(self, 'pipeline_manager'):
                        self.pipeline_manager = pipeline.setup_pipeline()
                    # Polling in the polling agent.
                    elif hasattr(self, 'polling_manager'):
                        self.polling_manager = pipeline.setup_polling()
                    self.pipeline_validated = True
                except Exception as err:
                    LOG.exception(_LE('Unable to load changed pipeline: %s')
                                  % err)

        if cfg.CONF.refresh_event_pipeline_cfg:
            # Pipeline in the notification agent.
            manager = (self.event_pipeline_manager
                       if hasattr(self, 'event_pipeline_manager') else None)
            ev_pipeline_hash = manager.cfg_changed()
            if ev_pipeline_hash:
                try:
                    LOG.debug("Event Pipeline has been refreshed. "
                              "old hash: %(old)s, new hash: %(new)s",
                              {'old': manager.cfg_hash,
                               'new': ev_pipeline_hash})
                    self.event_pipeline_manager = (pipeline.
                                                   setup_event_pipeline())
                    self.event_pipeline_validated = True
                except Exception as err:
                    LOG.exception(_LE('Unable to load changed event pipeline:'
                                      ' %s') % err)

        if self.pipeline_validated or self.event_pipeline_validated:
            self.reload_pipeline()
            self.clear_pipeline_validation_status()


Analysis:
cfg.CONF.refresh_pipeline_cfg defaults to False and cfg.CONF.refresh_event_pipeline_cfg defaults to
False, so this method almost never does anything.

4.5) How a message received by the notification service is re-published onto ceilometer's internal pipelines
4.5.1) Main entry point
The SamplePipelineTransportManager class in ceilometer/pipeline.py.
The code:
class SamplePipelineTransportManager(_PipelineTransportManager):
    filter_attr = 'counter_name'
    event_type = 'ceilometer.pipeline'

    @staticmethod
    def serializer(data):
        return publisher_utils.meter_message_from_counter(
            data, cfg.CONF.publisher.telemetry_secret)
Analysis:
Its parent class is _PipelineTransportManager, so let us look at that class next.

4.5.2) 
class _PipelineTransportManager(object):
    def __init__(self):
        self.transporters = []

    @staticmethod
    def hash_grouping(datapoint, grouping_keys):
        value = ''
        for key in grouping_keys or []:
            value += datapoint.get(key) if datapoint.get(key) else ''
        return hash(value)


    def add_transporter(self, transporter):
        self.transporters.append(transporter)

    def publisher(self):
        serializer = self.serializer
        hash_grouping = self.hash_grouping

        transporters = self.transporters
        filter_attr = self.filter_attr
        event_type = self.event_type

        class PipelinePublishContext(object):
            def __enter__(self):
                def p(data):
                    # TODO(gordc): cleanup so payload is always single
                    #              datapoint. we can't correctly bucketise
                    #              datapoints if batched.
                    data = [data] if not isinstance(data, list) else data
                    for datapoint in data:
                        serialized_data = serializer(datapoint)
                        for d_filter, grouping_keys, notifiers in transporters:
                            if d_filter(serialized_data[filter_attr]):
                                if meter != 'cpu':
                                    continue
                                ForkedPdb().set_trace()
                                hashResult = hash_grouping(serialized_data,grouping_keys)
                                key = hashResult % len(notifiers)
                                # key = (hash_grouping(serialized_data,
                                #                      grouping_keys)
                                #        % len(notifiers))
                                notifier = notifiers[key]
                                notifier.sample({},
                                                event_type=event_type,
                                                payload=[serialized_data])
                return p

            def __exit__(self, exc_type, exc_value, traceback):
                pass

        return PipelinePublishContext()

Analysis:
Note that the `if meter != 'cpu': continue` and `ForkedPdb().set_trace()` lines in the listing above
(and the hashResult temporary) are debug instrumentation added while tracing; `meter` is not defined in
the upstream code, whose pristine version is shown again in 4.6.2.1).
4.5.2.1)
_PipelineTransportManager(object):
Purpose: maintain the list of transporters.
Member variable: transporters
Member functions:
hash_grouping: given a datapoint and the grouping keys, return the hash of the concatenated values of
               those keys;
add_transporter: append a transporter to the transporters list.

4.5.2.2) Analysis of the publisher method
Debug output:
        (Pdb) p transporters
[(>, ['resource_id'], [, , , , , , , , , ]), (>, ['resource_id'], [, , , , , , , , , ]), (>, ['resource_id'], [, , , , , , , , , ]), (>, ['resource_id'], [, , , , , , , , , ]), (>, ['resource_id'], [, , , , , , , , , ]), (>, ['resource_id'], [, , , , , , , , , ]), (>, ['resource_id'], [, , , , , , , , , ]), (>, ['resource_id'], [, , , , , , , , , ])]
(Pdb) p len(transporters)
8
(Pdb) p transporters[0]
(Pdb) (>, ['resource_id'], [, , , , , , , , , ])

    filter_attr = 'counter_name'
    event_type = 'ceilometer.pipeline'


4.5.3) Analysis of the p function inside PipelinePublishContext
Debug output:
                        (Pdb) p d_filter
>
(Pdb) p grouping_keys
['resource_id']
(Pdb) p notifiers
[, , , , , , , , , ]
(Pdb) p notifiers[0].__dict__
{'_serializer': , '_driver_mgr': , 'retry': None, '_driver_names': ['messagingv2'], '_topics': ['ceilometer-pipe-notification_source:notification_sink-0'], 'publisher_id': 'notification_source:notification_sink', 'transport': }
(Pdb) p notifiers[3].__dict__
{'_serializer': , '_driver_mgr': , 'retry': None, '_driver_names': ['messagingv2'], '_topics': ['ceilometer-pipe-notification_source:notification_sink-3'], 'publisher_id': 'notification_source:notification_sink', 'transport': }
(Pdb) p notifiers[5].__dict__
{'_serializer': , '_driver_mgr': , 'retry': None, '_driver_names': ['messagingv2'], '_topics': ['ceilometer-pipe-notification_source:notification_sink-5'], 'publisher_id': 'notification_source:notification_sink', 'transport': }
(Pdb) p len(notifiers)
10
(Pdb) p serialized_data[filter_attr]
u'cpu'
(Pdb) p d_filter(serialized_data[filter_attr])
False


4.5.4) Code analysis
                        for d_filter, grouping_keys, notifiers in transporters:
                            if d_filter(serialized_data[filter_attr]):
                                if meter != 'cpu':
                                    continue
                                ForkedPdb().set_trace()
                                hashResult = hash_grouping(serialized_data,grouping_keys)
                                key = hashResult % len(notifiers)
                                # key = (hash_grouping(serialized_data,
                                #                      grouping_keys)
                                #        % len(notifiers))
                                notifier = notifiers[key]
                                notifier.sample({},
                                                event_type=event_type,
                                                payload=[serialized_data])

Analysis:
4.5.4.1)
                                (Pdb) p len(notifiers)
10
(Pdb) p hashResult
997602684701390314
(Pdb) p key
4

(Pdb) p notifier

(Pdb) p notifier.__dict__
{'_serializer': , '_driver_mgr': , 'retry': None, '_driver_names': ['messagingv2'], '_topics': ['ceilometer-pipe-cpu_source:cpu_delta_sink-4'], 'publisher_id': 'cpu_source:cpu_delta_sink', 'transport': }
(Pdb) p notifiers
[, , , , , , , , , ]


(Pdb) p notifiers[0].__dict__
{'_serializer': , '_driver_mgr': , 'retry': None, '_driver_names': ['messagingv2'], '_topics': ['ceilometer-pipe-cpu_source:cpu_delta_sink-0'], 'publisher_id': 'cpu_source:cpu_delta_sink', 'transport': }


(Pdb) p event_type
'ceilometer.pipeline'
(Pdb) p serialized_data
{'counter_name': u'cpu', 'resource_id': u'67efa5d0-c964-4352-a1b5-8e9e19d30e84', 'timestamp': u'2019-05-18T08:44:00.667058', 'counter_volume': 14811737685428, 'user_id': u'802351a3a54848038eb5227f9ef6ca5f', 'message_signature': '12268c62da9e401a2f2a898aa092691a57e85270e716259fa34c7598e16f70bd', 'resource_metadata': {u'instance_host': u'node-1.domain.tld', u'image': {u'id': u'a412bbcf-4cd1-438d-a095-60ac1123f25f', u'links': [{u'href': u'http://nova-api.openstack.svc.cluster.local:8774/7cf89220dea34a1c8e30f7a1de2bb8a9/images/a412bbcf-4cd1-438d-a095-60ac1123f25f', u'rel': u'bookmark'}], u'name': u'TestVM'}, u'ramdisk_id': None, u'flavor': {u'name': u'1-100-1', u'links': [{u'href': u'http://nova-api.openstack.svc.cluster.local:8774/7cf89220dea34a1c8e30f7a1de2bb8a9/flavors/e6d2867a-a5b0-4e41-933e-26e57c6d779b', u'rel': u'bookmark'}], u'ram': 100, u'ephemeral': 0, u'vcpus': 1, u'disk': 1, u'id': u'e6d2867a-a5b0-4e41-933e-26e57c6d779b'}, u'memory_mb': 100, u'display_name': u'myl_ins', u'state': u'active', u'OS-EXT-AZ:availability_zone': u'nova', u'status': u'active', u'ephemeral_gb': 0, u'disk_gb': 1, u'kernel_id': None, u'host': u'c607f14ba627cfb6019589e91d2d5aa2d9e386a5a2216e5dc7d3fab0', u'task_state': None, u'image_ref_url': u'http://nova-api.openstack.svc.cluster.local:8774/7cf89220dea34a1c8e30f7a1de2bb8a9/images/a412bbcf-4cd1-438d-a095-60ac1123f25f', u'cpu_number': 1, u'root_gb': 1, u'name': u'instance-000000dc', u'instance_id': u'67efa5d0-c964-4352-a1b5-8e9e19d30e84', u'instance_type': u'1-100-1', u'vcpus': 1, u'image_ref': u'a412bbcf-4cd1-438d-a095-60ac1123f25f'}, 'source': u'openstack', 'counter_unit': u'ns', 'project_id': u'6723b9f65cb24270bc0e5b71afadc48b', 'message_id': u'1532ace0-7949-11e9-b806-0ab9d9129f4b', 'monotonic_time': None, 'counter_type': u'cumulative'}

Explanation:
For example, a cpu sample is handled both by the cpu_delta notifiers and by the cpu_util notifiers.
For each sample the code takes the value of its resource_id (several attributes can be configured, in
which case their values are concatenated into one string), passes that string to Python's built-in
hash() to obtain an integer, takes that integer modulo the number of notifiers handling this sample
(10 by default) to get an index, picks notifiers[index], and publishes the sample to the message queue
via that notifier's sample method. A minimal sketch of this selection step follows.
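
A standalone sketch of the selection step (the topics and the resource_id are taken from the debug
output above; real Notifier objects are replaced by their topic strings):

notifier_topics = ['ceilometer-pipe-cpu_source:cpu_delta_sink-%d' % i
                   for i in range(10)]

sample = {'counter_name': u'cpu',
          'resource_id': u'67efa5d0-c964-4352-a1b5-8e9e19d30e84'}
grouping_keys = ['resource_id']

value = ''.join(sample.get(k) or '' for k in grouping_keys)
index = hash(value) % len(notifier_topics)

# A given resource_id always maps to the same queue, so ordering per resource is preserved.
print(notifier_topics[index])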

4.5.5) Analysis of the hash_grouping method
    @staticmethod
    def hash_grouping(datapoint, grouping_keys):
        value = ''
        for key in grouping_keys or []:
            value += datapoint.get(key) if datapoint.get(key) else ''
        return hash(value)
Analysis:
4.5.5.1) Input parameters
    (Pdb) p datapoint
{'counter_name': u'cpu', 'resource_id': u'67efa5d0-c964-4352-a1b5-8e9e19d30e84', 'timestamp': u'2019-05-18T08:44:00.667058', 'counter_volume': 14811737685428, 'user_id': u'802351a3a54848038eb5227f9ef6ca5f', 'message_signature': '12268c62da9e401a2f2a898aa092691a57e85270e716259fa34c7598e16f70bd', 'resource_metadata': {u'instance_host': u'node-1.domain.tld', u'image': {u'id': u'a412bbcf-4cd1-438d-a095-60ac1123f25f', u'links': [{u'href': u'http://nova-api.openstack.svc.cluster.local:8774/7cf89220dea34a1c8e30f7a1de2bb8a9/images/a412bbcf-4cd1-438d-a095-60ac1123f25f', u'rel': u'bookmark'}], u'name': u'TestVM'}, u'ramdisk_id': None, u'flavor': {u'name': u'1-100-1', u'links': [{u'href': u'http://nova-api.openstack.svc.cluster.local:8774/7cf89220dea34a1c8e30f7a1de2bb8a9/flavors/e6d2867a-a5b0-4e41-933e-26e57c6d779b', u'rel': u'bookmark'}], u'ram': 100, u'ephemeral': 0, u'vcpus': 1, u'disk': 1, u'id': u'e6d2867a-a5b0-4e41-933e-26e57c6d779b'}, u'memory_mb': 100, u'display_name': u'myl_ins', u'state': u'active', u'OS-EXT-AZ:availability_zone': u'nova', u'status': u'active', u'ephemeral_gb': 0, u'disk_gb': 1, u'kernel_id': None, u'host': u'c607f14ba627cfb6019589e91d2d5aa2d9e386a5a2216e5dc7d3fab0', u'task_state': None, u'image_ref_url': u'http://nova-api.openstack.svc.cluster.local:8774/7cf89220dea34a1c8e30f7a1de2bb8a9/images/a412bbcf-4cd1-438d-a095-60ac1123f25f', u'cpu_number': 1, u'root_gb': 1, u'name': u'instance-000000dc', u'instance_id': u'67efa5d0-c964-4352-a1b5-8e9e19d30e84', u'instance_type': u'1-100-1', u'vcpus': 1, u'image_ref': u'a412bbcf-4cd1-438d-a095-60ac1123f25f'}, 'source': u'openstack', 'counter_unit': u'ns', 'project_id': u'6723b9f65cb24270bc0e5b71afadc48b', 'message_id': u'1532ace0-7949-11e9-b806-0ab9d9129f4b', 'monotonic_time': None, 'counter_type': u'cumulative'}

(Pdb) p grouping_keys
['resource_id']


4.5.5.2) Results computed inside the method
            (Pdb) p datapoint.get(key)
u'67efa5d0-c964-4352-a1b5-8e9e19d30e84'
        (Pdb) p value
u'67efa5d0-c964-4352-a1b5-8e9e19d30e84'

(Pdb) p hash(value)
997602684701390314

Summary: after hash() the final result is an integer hash value. A small worked example is shown below.
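
A small worked example (the concrete integer is the one from the author's Python 2.7 Pdb session; hash()
results differ between interpreters and, on Python 3, between processes unless PYTHONHASHSEED is fixed):

value = u'67efa5d0-c964-4352-a1b5-8e9e19d30e84'

h = hash(value)     # 997602684701390314 in the Pdb session above (Python 2.7)
index = h % 10      # 4 in that session, so notifiers[4] is the chosen notifier

print(h, index)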


4.6) Analysis of the code that calls the publisher of _PipelineTransportManager
The source lives in ceilometer/agent/plugin_base.py:
@six.add_metaclass(abc.ABCMeta)
class NotificationBase(PluginBase):
    """Base class for plugins that support the notification API."""
    def __init__(self, manager):
        super(NotificationBase, self).__init__()
        # NOTE(gordc): this is filter rule used by oslo.messaging to dispatch
        # messages to an endpoint.
        if self.event_types:
            self.filter_rule = oslo_messaging.NotificationFilter(
                event_type='|'.join(self.event_types))
        self.manager = manager

    def sample(self, notifications):
        """RPC endpoint for notification messages at sample level

        When another service sends a notification over the message
        bus at sample priority, this method receives it.

        :param notifications: list of notifications
        """
        self._process_notifications('sample', notifications)


Analysis:
4.6.1) The sample method above calls _process_notifications, whose content is:
    def _process_notifications(self, priority, notifications):
        for notification in notifications:
            try:
                notification = messaging.convert_to_old_notification_format(
                    priority, notification)
                self.to_samples_and_publish(notification)
            except Exception:
                LOG.error(_LE('Fail to process notification'), exc_info=True)

4.6.2) _process_notifications in turn calls to_samples_and_publish, shown below:
    def to_samples_and_publish(self, notification):
        """Return samples produced by *process_notification*.

        Samples produced for the given notification.
        :param context: Execution context from the service or RPC call
        :param notification: The notification to process.
        """
        with self.manager.publisher() as p:
            p(list(self.process_notification(notification)))

Analysis:
4.6.2.1)
The key point is this: when the coordinator is enabled, self.manager inside to_samples_and_publish is
the _PipelineTransportManager object, so the publisher used here is
_PipelineTransportManager.publisher, whose content is:
    def publisher(self):
        serializer = self.serializer
        hash_grouping = self.hash_grouping
        transporters = self.transporters
        filter_attr = self.filter_attr
        event_type = self.event_type

        class PipelinePublishContext(object):
            def __enter__(self):
                def p(data):
                    # TODO(gordc): cleanup so payload is always single
                    #              datapoint. we can't correctly bucketise
                    #              datapoints if batched.
                    data = [data] if not isinstance(data, list) else data
                    for datapoint in data:
                        serialized_data = serializer(datapoint)
                        for d_filter, grouping_keys, notifiers in transporters:
                            if d_filter(serialized_data[filter_attr]):
                                key = (hash_grouping(serialized_data,
                                                     grouping_keys)
                                       % len(notifiers))
                                notifier = notifiers[key]
                                notifier.sample({},
                                                event_type=event_type,
                                                payload=[serialized_data])
                return p

            def __exit__(self, exc_type, exc_value, traceback):
                pass

        return PipelinePublishContext()
Analysis:
Here the hash-and-modulo operation selects the notifier (and therefore the topic) used to send the
message, which is what ultimately implements the load balancing.

4.7) Summary
In the ceilometer-notification service the run method of NotificationService is the main entry point.
It first checks whether the coordination group is enabled; if so, it starts the tooz.coordination
coordinator and then calls self._refresh_agent.
self._refresh_agent executes self._configure_pipeline_listener.
The crucial call inside self._configure_pipeline_listener is
self.partition_coordinator.extract_my_subset.
Inside self.partition_coordinator.extract_my_subset:
1) The consistent-hash ring is used to compute which of the default 10 queue numbers (0-9) fall on the
current group member.
The detailed logic is:
s1: fetch all group members and compute a hash for each member name (md5 digest of the name, unpacked
with struct into a Python integer);
s2: build <member-name hash, node name> pairs, update the hash-ring dict, store the hashes in an array
and sort it;
s3: for a new input item, e.g. a queue number, compute its hash, binary-search its position in the
sorted array from s2, map that position to a node and return that node's name.

Note: the group name is ceilometer.notification, and
each group member name (think of it as the node name) is a uuid.

2) The queue numbers that land on this ceilometer-notification node are then combined with the
pipelines as a Cartesian product to build all the topics this node must listen on:
            topic = '%s-%s-%s' % (self.NOTIFICATION_IPC,
                                  pipe.name, pipe_set)
An example topic after concatenation:
ceilometer-pipe-cpu_source:cpu_sink-0
and the corresponding generated queue name:
ceilometer-pipe-cpu_source:cpu_sink-0.sample
A sketch of this topic generation is given below.
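
A minimal sketch of step 2 (the pipeline names and the partitioned queue numbers are illustrative; the
'.sample' suffix on the queue name comes from the notification priority):

import itertools

NOTIFICATION_IPC = 'ceilometer-pipe'
pipeline_names = ['cpu_source:cpu_sink', 'cpu_source:cpu_delta_sink']   # illustrative
partitioned = [0, 3, 5]   # queue numbers the hash ring assigned to this node (illustrative)

topics = ['%s-%s-%s' % (NOTIFICATION_IPC, name, queue)
          for queue, name in itertools.product(partitioned, pipeline_names)]

print(topics[0])               # ceilometer-pipe-cpu_source:cpu_sink-0
print(topics[0] + '.sample')   # corresponding queue name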

3) Building the endpoints
Each pipeline is iterated over and either an
EventPipelineEndpoint or a SamplePipelineEndpoint is added for it.

4) Finally the pipeline listener is started to consume from the queues above:
        self.pipeline_listener = messaging.get_batch_notification_listener(
            transport,
            targets,
            endpoints,
            batch_size=cfg.CONF.notification.batch_size,
            batch_timeout=cfg.CONF.notification.batch_timeout)


References:
[1] ceilometer source code, newton release
[2]https://blog.csdn.net/allenson1/article/details/44539709
[3]https://julien.danjou.info/python-distributed-membership-lock-with-tooz/
