Kubernetes Events are rarely talked about, yet they are remarkably useful. Stored in etcd, they record the significant events that occur while a cluster is running. This series of articles lifts the veil on Kubernetes Events step by step.
When a node or pod in the cluster behaves abnormally, most users run kubectl to look at the corresponding events. So where do these events come from? Each Kubernetes component reports the events it generates at runtime to the apiserver, and for any describable resource, kubectl describe shows its related events. Which components, then, actually report events?
A brute-force search under the k8s.io/kubernetes/cmd directory is enough to show which components produce events:
$ grep -R -n -i "EventRecorder" .
As the results show, controller-manager, kube-proxy, kube-scheduler and kubelet all use an EventRecorder. This article covers only how the kubelet uses Events.
Kubernetes has a distributed architecture: the apiserver is the interaction hub of the whole cluster and is what clients mainly talk to, while the kubelet is the worker on each node that carries out concrete tasks. When users create a resource, besides seeing its final state (usually Running), they also want to see the process the resource went through and which steps it passed. This feedback is very important for debugging: some tasks fail or get stuck at a certain step, and with this information we can pinpoint the problem accurately.
The kubelet therefore sends events for the key steps it executes to the apiserver, so that clients can learn what happened during the whole process simply by querying, without logging in to the node where the kubelet runs to read logs or inspect container state.
Events are defined in k8s.io/api/core/v1/types.go; the struct looks like this:
// Event is a report of an event somewhere in the cluster.
type Event struct {
    metav1.TypeMeta `json:",inline"`
    // Standard object's metadata.
    // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
    metav1.ObjectMeta `json:"metadata" protobuf:"bytes,1,opt,name=metadata"`
    // The object that this event is about.
    InvolvedObject ObjectReference `json:"involvedObject" protobuf:"bytes,2,opt,name=involvedObject"`
    // This should be a short, machine understandable string that gives the reason
    // for the transition into the object's current status.
    // TODO: provide exact specification for format.
    // +optional
    Reason string `json:"reason,omitempty" protobuf:"bytes,3,opt,name=reason"`
    // A human-readable description of the status of this operation.
    // TODO: decide on maximum length.
    // +optional
    Message string `json:"message,omitempty" protobuf:"bytes,4,opt,name=message"`
    // The component reporting this event. Should be a short machine understandable string.
    // +optional
    Source EventSource `json:"source,omitempty" protobuf:"bytes,5,opt,name=source"`
    // The time at which the event was first recorded. (Time of server receipt is in TypeMeta.)
    // +optional
    FirstTimestamp metav1.Time `json:"firstTimestamp,omitempty" protobuf:"bytes,6,opt,name=firstTimestamp"`
    // The time at which the most recent occurrence of this event was recorded.
    // +optional
    LastTimestamp metav1.Time `json:"lastTimestamp,omitempty" protobuf:"bytes,7,opt,name=lastTimestamp"`
    // The number of times this event has occurred.
    // +optional
    Count int32 `json:"count,omitempty" protobuf:"varint,8,opt,name=count"`
    // Type of this event (Normal, Warning), new types could be added in the future
    // +optional
    Type string `json:"type,omitempty" protobuf:"bytes,9,opt,name=type"`
    // Time when this Event was first observed.
    // +optional
    EventTime metav1.MicroTime `json:"eventTime,omitempty" protobuf:"bytes,10,opt,name=eventTime"`
    // Data about the Event series this event represents or nil if it's a singleton Event.
    // +optional
    Series *EventSeries `json:"series,omitempty" protobuf:"bytes,11,opt,name=series"`
    // What action was taken/failed regarding to the Regarding object.
    // +optional
    Action string `json:"action,omitempty" protobuf:"bytes,12,opt,name=action"`
    // Optional secondary object for more complex actions.
    // +optional
    Related *ObjectReference `json:"related,omitempty" protobuf:"bytes,13,opt,name=related"`
    // Name of the controller that emitted this Event, e.g. `kubernetes.io/kubelet`.
    // +optional
    ReportingController string `json:"reportingComponent" protobuf:"bytes,14,opt,name=reportingComponent"`
    // ID of the controller instance, e.g. `kubelet-xyzf`.
    // +optional
    ReportingInstance string `json:"reportingInstance" protobuf:"bytes,15,opt,name=reportingInstance"`
}
Here InvolvedObject is the object the event is associated with and Source identifies the event source; the events shown by kubectl usually contain the fields Type, Reason, Age, From and Message.
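As a quick illustration of how these fields surface to users, the Events section printed by kubectl describe maps directly onto them (output abbreviated, all values invented for illustration):

$ kubectl describe pod PODNAME
...
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  2m    default-scheduler  Successfully assigned default/PODNAME to node-1
  Normal  Pulling    2m    kubelet, node-1    Pulling image "nginx:latest"
  Normal  Pulled     2m    kubelet, node-1    Successfully pulled image "nginx:latest"
  Normal  Started    2m    kubelet, node-1    Started container nginx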
Kubernetes currently has only two event types, "Normal" and "Warning":
// Valid values for event types (new types could be added in future)
const (
    // Information only and will not cause any problems
    EventTypeNormal string = "Normal"
    // These events are to warn that something might go wrong
    EventTypeWarning string = "Warning"
)
In this part we dive straight into the kubelet source code to understand how the event mechanism is implemented from end to end.
Kubernetes is built around the pod: whether you use a Deployment, StatefulSet or ReplicaSet, pods are what ultimately get created. The event mechanism therefore also revolves around pods, and events are emitted at the key steps of a pod's lifecycle. For example, the controller-manager records node registration and removal events and Deployment scaling and rollout events; the kubelet records events such as image garbage collection and volume mount failures; the scheduler records scheduling events. This article only looks at the kubelet, but the other components work the same way.
Looking at pkg/kubelet/kubelet.go, you will see code like the following:
kl.recorder.Eventf(kl.nodeRef, api.EventTypeWarning, events.ContainerGCFailed, err.Error())
This line runs when container GC fails; it emits an event telling the apiserver why container GC failed. Besides the kubelet itself, the kubelet's sub-components (such as imageManager, probeManager and so on) also hold this field and use it to record important events; you can grep the source to see everywhere the kubelet emits events.
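The call pattern is always the same. A hedged, illustrative example (the object, reason and message below are invented for illustration, not actual kubelet output):

// Attach a Warning event to a pod; the reason is a short machine-readable string,
// the message is free-form text with Printf-style formatting.
recorder.Eventf(pod, v1.EventTypeWarning, "FailedMount", "Unable to mount volume %q: %v", volumeName, err)

// Event() is the non-formatting variant:
recorder.Event(pod, v1.EventTypeNormal, "Pulled", "Container image already present on machine")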
recorder is a field of the Kubelet struct:
type Kubelet struct {
    ...
    // The EventRecorder to use
    recorder record.EventRecorder
    ...
}
Its type is record.EventRecorder, an interface defining three methods; the code lives in pkg/client/record/event.go:
type EventRecorder interface {
    Event(object runtime.Object, eventtype, reason, message string)
    Eventf(object runtime.Object, eventtype, reason, messageFmt string, args ...interface{})
    PastEventf(object runtime.Object, timestamp metav1.Time, eventtype, reason, messageFmt string, args ...interface{})
}
The concrete implementation of EventRecorder is recorderImpl:
type recorderImpl struct {
    scheme *runtime.Scheme
    source v1.EventSource
    *watch.Broadcaster
    clock clock.Clock
}
The methods of EventRecorder all produce events in a given format. Event() and Eventf() are analogous to fmt.Println() and fmt.Printf(), and the various kubelet modules call the EventRecorder to generate events. recorderImpl is the concrete object behind the interface; each EventRecorder method ends up calling generateEvent, which is where the event is initialized.
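As a rough sketch of how these interface methods funnel into generateEvent (simplified from the client-go version this code is taken from; exact signatures vary across releases):

func (recorder *recorderImpl) Event(object runtime.Object, eventtype, reason, message string) {
    // No annotations; the timestamp is simply "now".
    recorder.generateEvent(object, nil, metav1.Now(), eventtype, reason, message)
}

func (recorder *recorderImpl) Eventf(object runtime.Object, eventtype, reason, messageFmt string, args ...interface{}) {
    // Eventf is Event plus fmt.Sprintf formatting, mirroring Println vs Printf.
    recorder.Event(object, eventtype, reason, fmt.Sprintf(messageFmt, args...))
}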
Here are the functions that generate the event:
func (recorder *recorderImpl) generateEvent(object runtime.Object, annotations map[string]string, timestamp metav1.Time, eventtype, reason, message string) {
    ref, err := ref.GetReference(recorder.scheme, object)
    if err != nil {
        klog.Errorf("Could not construct reference to: '%#v' due to: '%v'. Will not report event: '%v' '%v' '%v'", object, err, eventtype, reason, message)
        return
    }
    if !validateEventType(eventtype) {
        klog.Errorf("Unsupported event type: '%v'", eventtype)
        return
    }
    event := recorder.makeEvent(ref, annotations, eventtype, reason, message)
    event.Source = recorder.source
    go func() {
        // NOTE: events should be a non-blocking operation
        defer utilruntime.HandleCrash()
        recorder.Action(watch.Added, event)
    }()
}
func (recorder *recorderImpl) makeEvent(ref *v1.ObjectReference, annotations map[string]string, eventtype, reason, message string) *v1.Event {
    t := metav1.Time{Time: recorder.clock.Now()}
    namespace := ref.Namespace
    if namespace == "" {
        namespace = metav1.NamespaceDefault
    }
    return &v1.Event{
        ObjectMeta: metav1.ObjectMeta{
            Name:        fmt.Sprintf("%v.%x", ref.Name, t.UnixNano()),
            Namespace:   namespace,
            Annotations: annotations,
        },
        InvolvedObject: *ref,
        Reason:         reason,
        Message:        message,
        FirstTimestamp: t,
        LastTimestamp:  t,
        Count:          1,
        Type:           eventtype,
    }
}
// Action distributes the given event among all watchers.
func (m *Broadcaster) Action(action EventType, obj runtime.Object) {
    m.incoming <- Event{action, obj}
}
Once the event has been constructed, recorder.Action() is called to push it onto the Broadcaster's incoming event queue; Action() is a method of the Broadcaster.
In short, the EventRecorder's job is to build an Event object and write it into a channel.
The entire lifecycle of events revolves around the EventBroadcaster. In the kubelet, the EventBroadcaster is initialized in k8s.io/kubernetes/cmd/kubelet/app/server.go:
func RunKubelet(kubeServer *options.KubeletServer, kubeDeps *kubelet.Dependencies, runOnce bool) error {
    ...
    // initialize the event recorder
    makeEventRecorder(kubeDeps, nodeName)
    ...
}
func makeEventRecorder(kubeDeps *kubelet.Dependencies, nodeName types.NodeName) {
    if kubeDeps.Recorder != nil {
        return
    }
    // Create the EventBroadcaster; its internal Broadcaster starts a loop goroutine that reads
    // events from the incoming channel and broadcasts them to all registered watchers.
    eventBroadcaster := record.NewBroadcaster()
    // Create the EventRecorder, which is what produces events.
    kubeDeps.Recorder = eventBroadcaster.NewRecorder(legacyscheme.Scheme, v1.EventSource{Component: componentKubelet, Host: string(nodeName)})
    // Write events to the local log.
    eventBroadcaster.StartLogging(glog.V(3).Infof)
    if kubeDeps.EventClient != nil {
        glog.V(4).Infof("Sending events to api server.")
        // Report events to the apiserver.
        eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: kubeDeps.EventClient.Events("")})
    } else {
        glog.Warning("No api server defined - no events will be sent to API server.")
    }
}
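The same wiring pattern is used outside the kubelet as well. As a rough sketch, a custom controller built on client-go could set up its own recorder like this (the import paths and the component name "my-controller" are assumptions; exact packages vary between client-go versions):

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/kubernetes/scheme"
    typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
    "k8s.io/client-go/tools/record"
    "k8s.io/klog"
)

// newRecorder wires up a Broadcaster the same way makeEventRecorder does:
// one watcher logs events locally, another reports them to the apiserver.
func newRecorder(clientset kubernetes.Interface) record.EventRecorder {
    broadcaster := record.NewBroadcaster()
    broadcaster.StartLogging(klog.V(4).Infof)
    broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: clientset.CoreV1().Events("")})
    return broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: "my-controller"})
}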
As mentioned above, initializing the EventBroadcaster also initializes a Broadcaster, whose job is to receive all events and broadcast them. The Broadcaster is implemented in k8s.io/apimachinery/pkg/watch/mux.go. Once created, it starts a goroutine in the background that receives every event sent by the EventRecorder. The Broadcaster keeps a map of all registered watchers and broadcasts each event to every one of them; each watcher has a channel for receiving messages and consumes it through its ResultChan() method.
Here is how the Broadcaster is created and how it broadcasts events:
// Where the eventBroadcaster instance is created:
...
eventBroadcaster := record.NewBroadcaster()
...

// Creates a new event broadcaster.
func NewBroadcaster() EventBroadcaster {
    return &eventBroadcasterImpl{watch.NewBroadcaster(maxQueuedEvents, watch.DropIfChannelFull), defaultSleepDuration}
}
func NewBroadcaster(queueLength int, fullChannelBehavior FullChannelBehavior) *Broadcaster {
    m := &Broadcaster{
        watchers:            map[int64]*broadcasterWatcher{},
        incoming:            make(chan Event, incomingQueueLength),
        watchQueueLength:    queueLength,
        fullChannelBehavior: fullChannelBehavior,
    }
    m.distributing.Add(1)
    go m.loop()
    return m
}
// loop receives from m.incoming and distributes to all watchers.
func (m *Broadcaster) loop() {
    // Deliberately not catching crashes here. Yes, bring down the process if there's a
    // bug in watch.Broadcaster.
    for event := range m.incoming {
        if event.Type == internalRunFunctionMarker {
            event.Object.(functionFakeRuntimeObject)()
            continue
        }
        m.distribute(event)
    }
    m.closeAll()
    m.distributing.Done()
}
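To make the consuming side concrete, here is a small, self-contained sketch of the watch.Broadcaster pattern on its own, matching the version shown above (newer apimachinery releases have slightly different signatures, so treat this as illustrative):

package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/watch"
)

func main() {
    // Anything pushed in via Action() is fanned out to every registered watcher.
    b := watch.NewBroadcaster(100, watch.DropIfChannelFull)
    w := b.Watch() // register a watcher; events arrive on w.ResultChan()

    b.Action(watch.Added, &corev1.Event{
        ObjectMeta: metav1.ObjectMeta{Name: "demo-event"},
        Reason:     "Demo",
        Message:    "hello from the broadcaster",
    })

    got := <-w.ResultChan()
    ev := got.Object.(*corev1.Event)
    fmt.Println(got.Type, ev.Reason, ev.Message) // ADDED Demo hello from the broadcaster

    b.Shutdown()
}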
Note: watchers are registered inside StartEventWatcher, via watcher := eventBroadcaster.Watch(). In other words, right after an eventBroadcaster instance is created, the watchers map in the Broadcaster struct is empty; watchers are only added to the Broadcaster later, when event handling is set up, namely when StartLogging and StartRecordingToSink are called (both methods are covered below).
So where do watchers come from? Every client that wants to process events needs to create one. The event-handling entry points are defined on the EventBroadcaster; here are the three functions the EventBroadcaster offers for processing events:
// StartEventWatcher starts sending events received from this EventBroadcaster to the given event handler function.
// The return value can be ignored or used to stop recording, if desired.
func (eventBroadcaster *eventBroadcasterImpl) StartEventWatcher(eventHandler func(*v1.Event)) watch.Interface {
    watcher := eventBroadcaster.Watch() // register a watcher in the Broadcaster's watchers map
    go func() {
        defer utilruntime.HandleCrash()
        for watchEvent := range watcher.ResultChan() {
            event, ok := watchEvent.Object.(*v1.Event)
            if !ok {
                // This is all local, so there's no reason this should
                // ever happen.
                continue
            }
            eventHandler(event)
        }
    }()
    return watcher
}
StartEventWatcher() first creates a watcher; every watcher is added to the Broadcaster's watcher list and reads events from the channel the Broadcaster provides, then hands them to eventHandler for processing. StartLogging() and StartRecordingToSink() are both thin wrappers around StartEventWatcher() that pass in their own handler functions.
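Because StartEventWatcher() takes an arbitrary handler, a component can also attach its own processing alongside the built-in ones. A hedged sketch (warningCounter is a hypothetical metric, not part of the kubelet):

// Count Warning events per reason, in addition to the default logging/sink handlers.
watcher := eventBroadcaster.StartEventWatcher(func(e *v1.Event) {
    if e.Type == v1.EventTypeWarning {
        warningCounter.WithLabelValues(e.Reason).Inc() // hypothetical Prometheus counter
    }
})
defer watcher.Stop() // stop receiving events when done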
// StartLogging starts sending events received from this EventBroadcaster to the given logging function.
// The return value can be ignored or used to stop recording, if desired.
func (eventBroadcaster *eventBroadcasterImpl) StartLogging(logf func(format string, args ...interface{})) watch.Interface {
    return eventBroadcaster.StartEventWatcher(
        func(e *v1.Event) {
            logf("Event(%#v): type: '%v' reason: '%v' %v", e.InvolvedObject, e.Type, e.Reason, e.Message)
        })
}
The eventHandler that StartLogging() passes in simply writes events to the log.
func (eventBroadcaster *eventBroadcasterImpl) StartRecordingToSink(sink EventSink) watch.Interface {
    // The default math/rand package functions aren't thread safe, so create a
    // new Rand object for each StartRecording call.
    randGen := rand.New(rand.NewSource(time.Now().UnixNano()))
    eventCorrelator := NewEventCorrelator(clock.RealClock{})
    return eventBroadcaster.StartEventWatcher( // essentially a wrapper around StartEventWatcher
        func(event *v1.Event) {
            recordToSink(sink, event, eventCorrelator, randGen, eventBroadcaster.sleepDuration) // the actual eventHandler()
        })
}
func recordToSink(sink EventSink, event *v1.Event, eventCorrelator *EventCorrelator, randGen *rand.Rand, sleepDuration time.Duration) {
    // Make a copy before modification, because there could be multiple listeners.
    // Events are safe to copy like this.
    eventCopy := *event
    event = &eventCopy
    result, err := eventCorrelator.EventCorrelate(event)
    if err != nil {
        utilruntime.HandleError(err)
    }
    if result.Skip {
        return
    }
    tries := 0
    for {
        if recordEvent(sink, result.Event, result.Patch, result.Event.Count > 1, eventCorrelator) {
            break
        }
        tries++
        if tries >= maxTriesPerEvent {
            klog.Errorf("Unable to write event '%#v' (retry limit exceeded!)", event)
            break
        }
        // Randomize the first sleep so that various clients won't all be
        // synced up if the master goes down.
        if tries == 1 {
            time.Sleep(time.Duration(float64(sleepDuration) * randGen.Float64()))
        } else {
            time.Sleep(sleepDuration)
        }
    }
}
StartRecordingToSink() first seeds a random number generator randGen with the current time; the randomness is used on retries so that, if the apiserver goes down, all clients do not resend their events at exactly the same moment. It then creates an EventCorrelator, which pre-processes events (filtering, aggregation, caching and so on; the details are not analyzed here). Finally it passes recordToSink() in as the handler, and recordToSink() sends the processed events to the apiserver. That is the whole workflow built on top of StartEventWatcher().
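From the way recordToSink() uses result above, the correlator's return value has roughly this shape (as in the client-go record package; the field comments here are mine):

// EventCorrelateResult tells recordToSink what to do with a correlated event.
type EventCorrelateResult struct {
    Event *v1.Event // the event to write, possibly an aggregated replacement
    Patch []byte    // patch to apply when updating an existing event (Count > 1)
    Skip  bool      // true if the event was filtered out and should not be sent
}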
func recordEvent(sink EventSink, event *v1.Event, patch []byte, updateExistingEvent bool, eventCorrelator *EventCorrelator) bool {
    var newEvent *v1.Event
    var err error
    if updateExistingEvent {
        newEvent, err = sink.Patch(event, patch)
    }
    // Update can fail because the event may have been removed and it no longer exists.
    if !updateExistingEvent || (updateExistingEvent && isKeyNotFoundError(err)) {
        // Making sure that ResourceVersion is empty on creation
        event.ResourceVersion = ""
        newEvent, err = sink.Create(event)
    }
    if err == nil {
        // we need to update our event correlator with the server returned state to handle name/resourceversion
        eventCorrelator.UpdateState(newEvent)
        return true
    }
    // If we can't contact the server, then hold everything while we keep trying.
    // Otherwise, something about the event is malformed and we should abandon it.
    switch err.(type) {
    case *restclient.RequestConstructionError:
        // We will construct the request the same next time, so don't keep trying.
        klog.Errorf("Unable to construct event '%#v': '%v' (will not retry!)", event, err)
        return true
    case *errors.StatusError:
        if errors.IsAlreadyExists(err) {
            klog.V(5).Infof("Server rejected event '%#v': '%v' (will not retry!)", event, err)
        } else {
            klog.Errorf("Server rejected event '%#v': '%v' (will not retry!)", event, err)
        }
        return true
    case *errors.UnexpectedObjectError:
        // We don't expect this; it implies the server's response didn't match a
        // known pattern. Go ahead and retry.
    default:
        // This case includes actual http transport errors. Go ahead and retry.
    }
    klog.Errorf("Unable to write event: '%v' (may retry after sleeping)", err)
    return false
}
sink.Create and sink.Patch are provided by the auto-generated apiserver client.
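For reference, the EventSink interface that recordEvent() talks to looks roughly like this in the client-go record package (check your vendored version for the exact definition):

// EventSink abstracts the events client used to persist events in the apiserver.
type EventSink interface {
    Create(event *v1.Event) (*v1.Event, error)
    Update(event *v1.Event) (*v1.Event, error)
    Patch(oldEvent *v1.Event, data []byte) (*v1.Event, error)
}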
This article has walked through the whole event mechanism from end to end. To wrap up, here is the complete path an event travels:

1. The recorder object's Event, Eventf and PastEventf methods produce a specific event.
2. The recorder builds an Event object from the arguments it was given and sends it to the EventBroadcaster's channel.
3. The goroutine running in the background of the EventBroadcaster reads events from the channel and broadcasts them to the previously registered handlers.
4. The handler that reports to the apiserver hands each event to the EventSink, which pre-processes it before sending it on.
5. That pre-processing is done by the EventCorrelator, which filters, aggregates and de-duplicates events and returns the processed event (which may be the original event or a newly created one); the result is then written to the apiserver.

That is how events are produced. What are they actually good for? Mostly debugging: users can fetch the events of the whole cluster or of a single pod with kubectl. kubectl get events lists all events, and kubectl describe pod PODNAME shows the events of a particular pod. The former is straightforward: kubectl reads the apiserver's event resource directly. For the latter, kubectl additionally filters by the pod's name, matching events whose InvolvedObject name equals the pod name.
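The same filtering can be done by hand with a field selector, for example (PODNAME is just a placeholder):

$ kubectl get events --field-selector involvedObject.name=PODNAME
$ kubectl get events --field-selector type=Warning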
Stepping back from the code, what design ideas in the event mechanism are worth borrowing? The most important one, I think, is that requirements drive the implementation.
Events differ from other Kubernetes resources in one important way: they are allowed to be lost. Losing an event does not affect the normal operation of the cluster; events matter far less than cluster stability, which is why any error along the event path simply causes the event to be dropped.
Another property of events is their sheer volume: compared with pods or deployments there are far more of them, and each one means a write to etcd. If the whole cluster wrote events to etcd without restraint it would put enormous pressure on etcd, and etcd's availability underpins the entire cluster. That is why each component aggregates and de-duplicates events before writing them, reducing the number of writes that actually happen.