Troubleshooting "Detected Tx Unit Hang"

Goal:

Point skb->data at memory I have already allocated myself, instead of letting alloc_skb() allocate the data buffer.

Part of the code:

 1  /*
 2   * build a new sk_buff
 3   */
 4  //struct sk_buff *send_skb = kmem_cache_alloc_node(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA, NUMA_NO_NODE);
 5  struct sk_buff *send_skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA);
 6
 7  if (!send_skb) {
 8      //spin_unlock(&lock);
 9      return NF_DROP;
10  }
11
12  //printk("what2\n");
13  memset(send_skb, 0, offsetof(struct sk_buff, tail));
14  atomic_set(&send_skb->users, 2);
15  send_skb->cloned = 0;
16
17  send_skb->head = mmap_buf + 1024;
18  send_skb->data = mmap_buf + 1024;

On line 18, mmap_buf is memory that was allocated in advance.

The NIC driver then writes the following errors to /var/log/messages:

Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <13>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <15>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <1>, <1eb>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1eb>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <1>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <14>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <4>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <12>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <5>, <1ef>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ef>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <5>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <2>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <2>, <1ec>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ec>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <2>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
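
A note on reading these dumps: TDH and TDT are the NIC's Tx descriptor head and tail registers, printed in hex. The driver advances the tail as it queues descriptors and the NIC advances the head as it processes them, so a head stuck near 0 while the tail has reached 0x1ea suggests that almost nothing queued is actually being transmitted. The helper below is a user-space sketch, not driver code, that computes how many descriptors are outstanding. The name tx_pending and the ring size passed to it are assumptions made for illustration; the log does not show the ring size.

```c
#include <assert.h>
#include <stdint.h>

/* Number of Tx descriptors the driver has queued (tail) that the NIC
 * has not yet processed (head). ring_size is the descriptor count of
 * the ring; it is an assumption here, since the log does not show it. */
static uint32_t tx_pending(uint32_t tdh, uint32_t tdt, uint32_t ring_size)
{
    /* Both indices must lie inside the ring. */
    assert(ring_size > 0 && tdh < ring_size && tdt < ring_size);

    /* The ring wraps, so the tail may be numerically below the head. */
    return (tdt >= tdh) ? (tdt - tdh) : (ring_size - tdh + tdt);
}
```

Assuming a 512-entry ring, tx_pending(0x0, 0x1ea, 512) for queue 13 evaluates to 490, and queue 12 (TDH 0x5, TDT 0x1ef) gives the same backlog.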

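After ruling out other possible causes, I traced the problem to how mmap_buf was allocated. Allocating it with vmalloc() is incorrect; after switching to kmalloc() everything works.

Section 12.5 of "Linux Kernel Development" explains this. The most likely reason: the NIC reads the buffer by DMA and needs it to be physically contiguous, whereas vmalloc() only guarantees that the virtual addresses are contiguous.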
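
The difference can be illustrated with a small user-space model (this is not kernel code). The array frame_of is a hypothetical stand-in for a page table lookup, mapping each virtual page of a buffer to a physical frame number, and phys_contiguous is a name invented for this sketch. A kmalloc()-style buffer maps to consecutive frames, while a vmalloc()-style buffer may map to scattered ones, so a device that walks physical addresses from the start of the buffer would read memory that does not belong to it.

```c
#include <stdbool.h>
#include <stddef.h>

/* frame_of[i] is the physical frame backing virtual page i of a buffer
 * (hypothetical model of a page table). Returns true when the whole
 * buffer occupies one unbroken run of physical frames. */
static bool phys_contiguous(const unsigned long *frame_of, size_t npages)
{
    for (size_t i = 1; i < npages; i++) {
        /* Each page must sit in the frame right after the previous one. */
        if (frame_of[i] != frame_of[i - 1] + 1)
            return false;
    }
    return true;
}
```

In this model a mapping such as {100, 101, 102, 103} passes the check, while {100, 57, 912, 13} does not, even though both buffers look identical through their virtual addresses. A buffer that fits in a single page always passes.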

 
