Milvus Startup Failure Case Study: Goroutine Stack Analysis When Etcd Is Not Running

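Startup failures are fairly common when deploying the Milvus vector database. This article walks through a real case: we used goroutine stack analysis to narrow down the root cause of a failed startup and found that Etcd had not started properly, which left Milvus stuck in its initialization phase.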


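Background

When starting a Milvus cluster with Docker, we found that the milvus container never became ready to serve requests: the health check timed out, yet the console showed no obvious panic or error log. However, when we ran: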

docker exec -it milvus bash
go tool pprof -text http://localhost:6060/debug/pprof/goroutine

or looked directly at the container's stdout/stderr logs, we found a large number of goroutine stack traces similar to the following:

goroutine 3 gp=0xc000007500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0000b9750 sp=0xc0000b9730 pc=0x20352ce
runtime.gcBgMarkWorker()
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000b97e0 sp=0xc0000b9750 pc=0x2012fc5
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000b97e8 sp=0xc0000b97e0 pc=0x206f9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1234 +0x1c

goroutine 4 gp=0xc0000076c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0000b9f50 sp=0xc0000b9f30 pc=0x20352ce
runtime.gcBgMarkWorker()
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000b9fe0 sp=0xc0000b9f50 pc=0x2012fc5
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000b9fe8 sp=0xc0000b9fe0 pc=0x206f9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1234 +0x1c

goroutine 5 gp=0xc000007880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0000ba750 sp=0xc0000ba730 pc=0x20352ce
runtime.gcBgMarkWorker()
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000ba7e0 sp=0xc0000ba750 pc=0x2012fc5
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000ba7e8 sp=0xc0000ba7e0 pc=0x206f9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1234 +0x1c

goroutine 50 gp=0xc000700000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0006f2750 sp=0xc0006f2730 pc=0x20352ce
runtime.gcBgMarkWorker()
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1310 +0xe5 fp=0xc0006f27e0 sp=0xc0006f2750 pc=0x2012fc5
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0006f27e8 sp=0xc0006f27e0 pc=0x206f9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1234 +0x1c

goroutine 51 gp=0xc0007001c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0006f2f50 sp=0xc0006f2f30 pc=0x20352ce
runtime.gcBgMarkWorker()
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1310 +0xe5 fp=0xc0006f2fe0 sp=0xc0006f2f50 pc=0x2012fc5
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0006f2fe8 sp=0xc0006f2fe0 pc=0x206f9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1234 +0x1c

goroutine 52 gp=0xc000700380 m=nil [GC worker (idle)]:
runtime.gopark(0x109fbe1aec1b?, 0x0?, 0x0?, 0x0?, 0x0?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0006f3750 sp=0xc0006f3730 pc=0x20352ce
runtime.gcBgMarkWorker()
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1310 +0xe5 fp=0xc0006f37e0 sp=0xc0006f3750 pc=0x2012fc5
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0006f37e8 sp=0xc0006f37e0 pc=0x206f9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1234 +0x1c

goroutine 53 gp=0xc000700540 m=nil [GC worker (idle)]:
runtime.gopark(0xa2cc040?, 0x1?, 0x1b?, 0x35?, 0x0?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0006f3f50 sp=0xc0006f3f30 pc=0x20352ce
runtime.gcBgMarkWorker()
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1310 +0xe5 fp=0xc0006f3fe0 sp=0xc0006f3f50 pc=0x2012fc5
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0006f3fe8 sp=0xc0006f3fe0 pc=0x206f9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1234 +0x1c

goroutine 6 gp=0xc000c80a80 m=nil [select]:
runtime.gopark(0xc0006f8768?, 0x2?, 0x60?, 0x0?, 0xc0006f873c?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0006f85e0 sp=0xc0006f85c0 pc=0x20352ce
runtime.selectgo(0xc0006f8768, 0xc0006f8738, 0x0?, 0x0, 0x0?, 0x1)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/select.go:327 +0x725 fp=0xc0006f8700 sp=0xc0006f85e0 pc=0x2047685
github.com/panjf2000/ants/v2.(*Pool).purgeStaleWorkers(0xc000e2be00, {0x74e0160, 0xc000117630})
        /go/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:83 +0xfb fp=0xc0006f87b8 sp=0xc0006f8700 pc=0x48b599b
github.com/panjf2000/ants/v2.(*Pool).goPurge.gowrap1()
        /go/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:147 +0x28 fp=0xc0006f87e0 sp=0xc0006f87b8 pc=0x48b5f28
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0006f87e8 sp=0xc0006f87e0 pc=0x206f9a1
created by github.com/panjf2000/ants/v2.(*Pool).goPurge in goroutine 1
        /go/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:147 +0xcc

goroutine 7 gp=0xc000c80c40 m=nil [select]:
runtime.gopark(0xc0006f8f68?, 0x2?, 0x10?, 0x9f?, 0xc0006f8f3c?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0006f8de0 sp=0xc0006f8dc0 pc=0x20352ce
runtime.selectgo(0xc0006f8f68, 0xc0006f8f38, 0xc0017b8a68?, 0x0, 0x0?, 0x1)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/select.go:327 +0x725 fp=0xc0006f8f00 sp=0xc0006f8de0 pc=0x2047685
github.com/panjf2000/ants/v2.(*Pool).ticktock(0xc000e2be00, {0x74e0160, 0xc000117680})
        /go/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:125 +0x145 fp=0xc0006f8fb8 sp=0xc0006f8f00 pc=0x48b5d05
github.com/panjf2000/ants/v2.(*Pool).goTicktock.gowrap1()
        /go/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:154 +0x28 fp=0xc0006f8fe0 sp=0xc0006f8fb8 pc=0x48b60a8
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0006f8fe8 sp=0xc0006f8fe0 pc=0x206f9a1
created by github.com/panjf2000/ants/v2.(*Pool).goTicktock in goroutine 1
        /go/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:154 +0xfc

goroutine 8 gp=0xc000c80e00 m=nil [select]:
runtime.gopark(0xc0006f9778?, 0x3?, 0xb8?, 0x25?, 0xc0006f9772?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0006f9618 sp=0xc0006f95f8 pc=0x20352ce
runtime.selectgo(0xc0006f9778, 0xc0006f976c, 0xc000077880?, 0x0, 0x0?, 0x1)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/select.go:327 +0x725 fp=0xc0006f9738 sp=0xc0006f9618 pc=0x2047685
go.opencensus.io/stats/view.(*worker).start(0xc000077880)
        /go/pkg/mod/[email protected]/stats/view/worker.go:292 +0x9f fp=0xc0006f97c8 sp=0xc0006f9738 pc=0x3b6b4bf
go.opencensus.io/stats/view.init.0.gowrap1()
        /go/pkg/mod/[email protected]/stats/view/worker.go:34 +0x25 fp=0xc0006f97e0 sp=0xc0006f97c8 pc=0x3b6a6e5
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0006f97e8 sp=0xc0006f97e0 pc=0x206f9a1
created by go.opencensus.io/stats/view.init.0 in goroutine 1
        /go/pkg/mod/[email protected]/stats/view/worker.go:34 +0x8d

goroutine 169 gp=0xc001156000 m=nil [select]:
runtime.gopark(0xc00115e778?, 0x2?, 0x70?, 0xe6?, 0xc00115e764?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc00115e608 sp=0xc00115e5e8 pc=0x20352ce
runtime.selectgo(0xc00115e778, 0xc00115e760, 0xc0015ada98?, 0x0, 0xc001156000?, 0x1)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/select.go:327 +0x725 fp=0xc00115e728 sp=0xc00115e608 pc=0x2047685
github.com/hashicorp/golang-lru/v2/expirable.NewLRU[...].func1()
        /go/pkg/mod/github.com/hashicorp/golang-lru/[email protected]/expirable/expirable_lru.go:86 +0xfc fp=0xc00115e7c8 sp=0xc00115e728 pc=0x58cdedc
github.com/hashicorp/golang-lru/v2/expirable.NewLRU[...].gowrap1()
        /go/pkg/mod/github.com/hashicorp/golang-lru/[email protected]/expirable/expirable_lru.go:93 +0x24 fp=0xc00115e7e0 sp=0xc00115e7c8 pc=0x58cdda4
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00115e7e8 sp=0xc00115e7e0 pc=0x206f9a1
created by github.com/hashicorp/golang-lru/v2/expirable.NewLRU[...] in goroutine 1
        /go/pkg/mod/github.com/hashicorp/golang-lru/[email protected]/expirable/expirable_lru.go:82 +0x32a

goroutine 39 gp=0xc0015c2fc0 m=nil [select, locked to thread]:
runtime.gopark(0xc0006f7fa8?, 0x2?, 0x69?, 0x55?, 0xc0006f7f94?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0006f7e38 sp=0xc0006f7e18 pc=0x20352ce
runtime.selectgo(0xc0006f7fa8, 0xc0006f7f90, 0x0?, 0x0, 0x0?, 0x1)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/select.go:327 +0x725 fp=0xc0006f7f58 sp=0xc0006f7e38 pc=0x2047685
runtime.ensureSigM.func1()
        /go/pkg/mod/golang.org/[email protected]/src/runtime/signal_unix.go:1034 +0x19f fp=0xc0006f7fe0 sp=0xc0006f7f58 pc=0x206509f
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0006f7fe8 sp=0xc0006f7fe0 pc=0x206f9a1
created by runtime.ensureSigM in goroutine 1
        /go/pkg/mod/golang.org/[email protected]/src/runtime/signal_unix.go:1017 +0xc8

goroutine 40 gp=0xc0017b61c0 m=7 mp=0xc000101008 [syscall]:
runtime.notetsleepg(0xa2ca6a0, 0xffffffffffffffff)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/lock_futex.go:246 +0x29 fp=0xc00115afa0 sp=0xc00115af78 pc=0x2000ba9
os/signal.signal_recv()
        /go/pkg/mod/golang.org/[email protected]/src/runtime/sigqueue.go:152 +0x29 fp=0xc00115afc0 sp=0xc00115afa0 pc=0x206b869
os/signal.loop()
        /go/pkg/mod/golang.org/[email protected]/src/os/signal/signal_unix.go:23 +0x13 fp=0xc00115afe0 sp=0xc00115afc0 pc=0x2140313
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00115afe8 sp=0xc00115afe0 pc=0x206f9a1
created by os/signal.Notify.func1.1 in goroutine 1
        /go/pkg/mod/golang.org/[email protected]/src/os/signal/signal.go:151 +0x1f

goroutine 31 gp=0xc000c808c0 m=nil [runnable]:
reflect.Value.Elem({0x6151e20?, 0xc001c42cd0?, 0x16?})
        /go/pkg/mod/golang.org/[email protected]/src/reflect/value.go:1230 +0x1aa fp=0xc0000ef808 sp=0xc0000ef800 pc=0x20b2b0a
gopkg.in/yaml%2ev2.(*decoder).mapping(0xc0014d0060, 0xc001c1c0e0, {0x6348540?, 0xc001c42ca0?, 0x6348540?})
        /go/pkg/mod/gopkg.in/[email protected]/decode.go:675 +0x6a5 fp=0xc0000ef950 sp=0xc0000ef808 pc=0x2af3465
gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xc0014d0060, 0xc001c1c0e0, {0x6348540?, 0xc001c42ca0?, 0xc00135ef30?})
        /go/pkg/mod/gopkg.in/[email protected]/decode.go:372 +0x1a5 fp=0xc0000ef9c0 sp=0xc0000ef950 pc=0x2af0be5
gopkg.in/yaml%2ev2.(*decoder).mapping(0xc0014d0060, 0xc000c07f10, {0x63e6ca0?, 0xc000dc93a0?, 0xc000dc93a8?})
        /go/pkg/mod/gopkg.in/[email protected]/decode.go:676 +0x70e fp=0xc0000efb08 sp=0xc0000ef9c0 pc=0x2af34ce
gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xc0014d0060, 0xc000c07f10, {0x63e6ca0?, 0xc000dc93a0?, 0x2aef36b?})
        /go/pkg/mod/gopkg.in/[email protected]/decode.go:372 +0x1a5 fp=0xc0000efb78 sp=0xc0000efb08 pc=0x2af0be5
gopkg.in/yaml%2ev2.(*decoder).document(...)
        /go/pkg/mod/gopkg.in/[email protected]/decode.go:384
gopkg.in/yaml%2ev2.(*decoder).unmarshal(0x61ab260?, 0xc000dc93a0?, {0x63e6ca0?, 0xc000dc93a0?, 0x0?})
        /go/pkg/mod/gopkg.in/[email protected]/decode.go:360 +0x110 fp=0xc0000efbe8 sp=0xc0000efb78 pc=0x2af0b50
gopkg.in/yaml%2ev2.unmarshal({0xc001c00000, 0x1333f, 0x13340}, {0x61ab260, 0xc000dc93a0}, 0x0)
        /go/pkg/mod/gopkg.in/[email protected]/yaml.go:148 +0x389 fp=0xc0000efca0 sp=0xc0000efbe8 pc=0x2b0ca69
gopkg.in/yaml%2ev2.Unmarshal(...)
        /go/pkg/mod/gopkg.in/[email protected]/yaml.go:81
github.com/milvus-io/milvus/pkg/v2/config.(*FileSource).loadFromFile(0xc0017b42a0)
        /workspace/source/pkg/config/file_source.go:147 +0x314 fp=0xc0000efdb8 sp=0xc0000efca0 pc=0x2cfaf14
github.com/milvus-io/milvus/pkg/v2/config.(*FileSource).loadFromFile-fm()
        :1 +0x25 fp=0xc0000efdd0 sp=0xc0000efdb8 pc=0x2d02805
github.com/milvus-io/milvus/pkg/v2/config.(*refresher).refreshPeriodically(0xc000d64280, {0x6b52343, 0xa})
        /workspace/source/pkg/config/refresher.go:73 +0x25a fp=0xc0000effb8 sp=0xc0000efdd0 pc=0x2cff5fa
github.com/milvus-io/milvus/pkg/v2/config.(*refresher).start.func1.gowrap1()
        /workspace/source/pkg/config/refresher.go:52 +0x28 fp=0xc0000effe0 sp=0xc0000effb8 pc=0x2cff2a8
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000effe8 sp=0xc0000effe0 pc=0x206f9a1
created by github.com/milvus-io/milvus/pkg/v2/config.(*refresher).start.func1 in goroutine 1
        /workspace/source/pkg/config/refresher.go:52 +0x9c

goroutine 200 gp=0xc000db68c0 m=nil [select]:
runtime.gopark(0xc001175f58?, 0x2?, 0x0?, 0x61?, 0xc001175e14?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc001175cb0 sp=0xc001175c90 pc=0x20352ce
runtime.selectgo(0xc001175f58, 0xc001175e10, 0xc001159724?, 0x0, 0x1?, 0x1)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/select.go:327 +0x725 fp=0xc001175dd0 sp=0xc001175cb0 pc=0x2047685
github.com/milvus-io/milvus/pkg/v2/config.(*refresher).refreshPeriodically(0xc000c47bd0, {0x6b52343, 0xa})
        /workspace/source/pkg/config/refresher.go:71 +0x237 fp=0xc001175fb8 sp=0xc001175dd0 pc=0x2cff5d7
github.com/milvus-io/milvus/pkg/v2/config.(*refresher).start.func1.gowrap1()
        /workspace/source/pkg/config/refresher.go:52 +0x28 fp=0xc001175fe0 sp=0xc001175fb8 pc=0x2cff2a8
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc001175fe8 sp=0xc001175fe0 pc=0x206f9a1
created by github.com/milvus-io/milvus/pkg/v2/config.(*refresher).start.func1 in goroutine 1
        /workspace/source/pkg/config/refresher.go:52 +0x9c

goroutine 201 gp=0xc000db6e00 m=nil [select]:
runtime.gopark(0xc0000c9f30?, 0x3?, 0x10?, 0x18?, 0xc0000c9ef2?)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0000c9d88 sp=0xc0000c9d68 pc=0x20352ce
runtime.selectgo(0xc0000c9f30, 0xc0000c9eec, 0xc00160b040?, 0x0, 0x4c8e3bfe?, 0x1)
        /go/pkg/mod/golang.org/[email protected]/src/runtime/select.go:327 +0x725 fp=0xc0000c9ea8 sp=0xc0000c9d88 pc=0x2047685
go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).processQueue(0xc000bf9900)
        /go/pkg/mod/go.opentelemetry.io/otel/[email protected]/trace/batch_span_processor.go:301 +0x11d fp=0xc0000c9fa0 sp=0xc0000c9ea8 pc=0x2b240dd
go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor.func1()
        /go/pkg/mod/go.opentelemetry.io/otel/[email protected]/trace/batch_span_processor.go:117 +0x54 fp=0xc0000c9fe0 sp=0xc0000c9fa0 pc=0x2b23354
runtime.goexit({})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000c9fe8 sp=0xc0000c9fe0 pc=0x206f9a1
created by go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor in goroutine 1
        /go/pkg/mod/go.opentelemetry.io/otel/[email protected]/trace/batch_span_processor.go:115 +0x2e5

This output looks unremarkable at first glance, but it holds the clues to the real problem.


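Interpreting the symptoms

From the goroutine states we can draw some initial conclusions:

  • Many goroutines in GC worker (idle) → the process is idling and has not entered any compute-heavy work.

  • The config refresher goroutines are active → Milvus keeps reloading its configuration file periodically.

  • The ants goroutine pool's ticktock and purgeStaleWorkers loops are parked in select → the pool started successfully and is doing its background housekeeping.

  • The OpenCensus / OpenTelemetry workers are parked in select → the metrics and tracing subsystems are waiting for work rather than blocked on an error.

The most important clue is in these frames: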

github.com/milvus-io/milvus/pkg/v2/config.(*FileSource).loadFromFile()
github.com/milvus-io/milvus/pkg/v2/config.(*refresher).refreshPeriodically()

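These goroutines belong to the periodic config refresher, which re-reads the YAML configuration file (the file that holds, among other settings, the Etcd address). The absence of a panic indicates that the Milvus process itself has not crashed, yet the startup sequence is not moving forward either, which suggests it is waiting on an external dependency.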


Locating the root cause

Start from Milvus's startup dependencies:

Milvus
├── Etcd (metadata storage)
├── MinIO
├── Pulsar/Kafka
└── RocksMQ (default)

We then entered the container and ran the following commands:

ping etcd
curl http://etcd:2379/health

The result:

curl: (7) Failed to connect to etcd port 2379: Connection refused

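That settles it: Milvus is waiting to connect to Etcd, but the Etcd container either was never started or failed to start, so Milvus blocks while initializing its metadata service and cannot proceed.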
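
The same check can be scripted so that it distinguishes an unreachable Etcd from one that answers but reports itself unhealthy. This is a sketch under assumptions: `etcd_health` is a hypothetical helper, the endpoint is assumed to return a JSON object with a `health` field, and proxies are deliberately bypassed because the target is an in-network service.

```python
import http.client
import json
import urllib.error
import urllib.request


def etcd_health(base_url, timeout=2.0):
    """Return (healthy, detail) for an endpoint such as http://etcd:2379."""
    url = base_url.rstrip("/") + "/health"
    # An empty ProxyHandler ignores http_proxy/https_proxy from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Connection refused, DNS failure, or connect timeout.
        return False, f"unreachable: {exc.reason}"
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # Read timeout, malformed HTTP, or a body that is not valid JSON.
        return False, f"bad response: {exc}"
    if not isinstance(body, dict):
        return False, "bad response: not a JSON object"
    # Assumption: the field may be the string "true" or a JSON boolean.
    healthy = str(body.get("health")).lower() == "true"
    return healthy, json.dumps(body)
```

In the situation described above, `etcd_health("http://etcd:2379")` would be expected to return `False` with an `unreachable: ...` detail, matching the `Connection refused` reported by curl.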


Solution

✅ Step 1: Check the Etcd service status

docker ps -a | grep etcd
docker logs etcd

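Confirm whether Etcd failed to start. Common causes include port conflicts and incorrect permissions on the data directory.

✅ Step 2: Restart Etcd

Make sure etcd is started with the correct configuration: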

docker-compose restart etcd

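If the logs point to a permission problem, fix the ownership of the mounted data directory (adjust the UID/GID to the user your etcd container runs as):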

chown -R 1000:1000 /your/etcd/data/dir

✅ Step 3: Start Milvus again

docker-compose up -d milvus

After a successful start you should see:

Milvus is ready to use!

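Summary

This startup failure was not caused by a crash in Milvus itself. An external dependency (Etcd) had not started properly, leaving the service stuck in its initialization phase. Goroutine stack analysis showed a process that was alive but stalled around configuration loading and Etcd communication, which let us pinpoint and fix the problem quickly.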


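Recommendations and practices

  • Configure a startupProbe and a readinessProbe for Milvus so that, on Kubernetes, an Etcd that is not ready is detected earlier.

  • Add health checks for dependent services before deployment; for example, entrypoint.sh can wait for etcd's TCP port before starting Milvus.

  • Learn to analyze goroutines with pprof; it greatly speeds up the diagnosis of complex startup problems.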
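
The entrypoint wait mentioned in the list above could look like the following Python sketch. `wait_for_port` is a hypothetical helper; the host name `etcd`, port 2379, and the timeout values are assumptions to adapt to your deployment. It only checks that the TCP port accepts connections, which is weaker than a full health check.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Block until host:port accepts TCP connections.

    Returns True once a connection succeeds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections, DNS failures, and connect timeouts.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

An entrypoint could call `wait_for_port("etcd", 2379, timeout=120)` and exit with a non-zero status when it returns `False`, so that the orchestrator restarts the container instead of leaving Milvus hanging.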
