原文链接:
https://github.com/tnaganawa/tungstenfabric-docs/blob/master/TungstenFabricKnowledgeBase.md

作者:Tatsuya Naganawa 译者:TF编译组

本系列为“Tungsten Fabric入门宝典”的姊妹篇,补充介绍有关Tungsten Fabric部署的各类主题。

vhost0设备

首次启动vRouter时,将创建vhost0接口,并将最初分配给物理接口的IP和MAC移至vhost0。

因此,自然假设的情况是,vhost0是vRouter本身,它对外部结Fabric进行ARP响应,流量首先通过vhost0,然后进入虚拟机。

transit traffic:
 vm - vhost0 - eth0
self traffic:
 vhost0 - eth0

实际上,事实并非如此。

作为说明,当在vhost0上对诸如VXLAN之类的overlay流量执行tcpdump时,它不会显示一些数据包,需要针对物理接口的tcpdump才能实现这个目的。
这个文档也有助于您理解:https://wiki.tungsten.io/display/TUN/Offloads?preview=%2F1409118%2F1409513%2FTungsten+Fabric+ParaVirt+Offload+Arch.DOCX

transit traffic:
 vm - (dp-core) - eth0
self traffic:
 vhost0 - (dp-core) - eth0

在由dp-core服务的某些桥接域(bridge-domain)中,vhost0与irb相似,而eth0是此桥接域中的L2接口之一。

  • 在vRouter术语中,此状态称为“xconnect (cross-connect)”,就我的理解来说,它类似于桥接:https://github.com/tungstenfabric/tf-vrouter/blob/master/dp-core/vr_interface.c
  • 桥接域(bridge-domain)是与Linux网桥类似的概念,它可以具有多个物理L2接口和一个内部L3接口。

因此,当eth0首次收到来自Fabric的ARP请求时,dp-core将基于最初分配给eth0的MAC地址返回ARP响应。

然后其它计算节点将向该vRouter节点发送一些流量,例如overlay流量或自流量(self-traffic)。

使用overlay流量时(基于udp端口或gre标头,它由dp-core标识),dp-core会剥离外部IP和标签,并进行VRF路由到标签所指示的特定VM。

  • 使用L3 VXLAN时,它将基于L3 VRF中的路由表进行路由查找
  • 使用MPLS时,标签本身会标识最终接口

当dp-core接收到自流量(self traffic)后,将在vhost_tx中使用hif_rx(后者又使用linux函数netif_rx,以skb作为参数)将流量发送到vRouter节点上的linux接口,即vhost0。

  • https://github.com/tungstenfabric/tf-vrouter/blob/master/dp-core/vr_interface.c#L813
  • https://github.com/tungstenfabric/tf-vrouter/blob/master/linux/vr_host_interface.c#L2380
  • https://github.com/tungstenfabric/tf-vrouter/blob/master/linux/vr_host_interface.c#L228

因此,对于用于自流量(self-traffic)的rx / tx,数据包始终通过dp-core,而对于传输流量(transit traffic),则不会通过vhost0。

skb to vr_packet

Linux网络堆栈使用sk_buff作为数据包的内存存储。

而在dp-core中,则使用vr_packet,因此它们之间如何转换是一个有趣的主题。

为此,使用vp_os_packet函数。

  • https://github.com/tungstenfabric/tf-vrouter/blob/master/include/vr_linux.h#L10
    static inline struct sk_buff *
    vp_os_packet(struct vr_packet *pkt)
    {
    return CONTAINER_OF(cb, struct sk_buff, pkt);
    }

因此,实际上vr_packet是在skb结构中的某个位置定义的(sk_buff->cb,它是某些应用程序使用的成员变量)。从而,skb和vr_packet可以通过指针操作进行转换。

请注意,由于cb最大为48字节,因此vr_packet不能大于该数值。这里有一些关于此问题的讨论。

https://github.com/tungstenfabric/tf-vrouter/blob/master/include/vr_packet.h#L195-L198

/*
 * NOTE: Please do not add any more fields without ensuring
 * that the size is <= 48 bytes in 64 bit systems.
 */

vRouter创建的Linux接口

首次启动vrouter-agent容器时会创建多个接口,即使vrouter-agent停止,实际上也不会删除该接口。

出于什么目的使用它,是一个有趣的主题。

综上所述,vrouter.ko中的vif接口始终与相应的linux netdevice绑定,因此使用vif --create等创建一些vRouter接口,同时也将创建linux netdevice,这可以从ip link或ls /sys/class/net中看到。

来自“ip tuntap list”的一个例证。

[root@ip-172-31-12-55 ~]# ip -o a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
2: ens3    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
16: vhost0    inet 172.31.12.55/20 brd 172.31.15.255 scope global dynamic vhost0\       valid_lft 3118sec preferred_lft 3118sec
16: vhost0    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
17: pkt0    inet6 fe80::5094:6cff:fefb:42f7/64 scope link \       valid_lft forever preferred_lft forever
[root@ip-172-31-12-55 ~]# ip tuntap list
pkt0: tap
[root@ip-172-31-12-55 ~]#

因此从Linux的角度来看,pkt0实际上是一个Tap设备。

从某种意义上说,vif命令将使vRouter与某些由nova-vif-driver创建的Linux网络设备(例如tapxxxx-xxxx)建立vRouter接口,以使通过该设备的数据包被dp-core接收 。

因此,当CNI等发现与容器连接的Tap设备时,它将发送内部创建vif的vrouter-api,其名称与Tap设备相同,以便将进入Tap设备的数据包转发到vRouter(dp-core)。

启动vrouter-agent时会创建一些特殊设备,即vhost0,pkt0,pkt1,pkt2,pkt3。

如前所述,vhost0与dp-core的irb接口相似,因此在dp-core路由完成后,它将接收到vRouter节点本身的数据包。

由于vrouter-agent容器在启动时会创建/etc/sysconfig/network-scripts/{ifup-vhost,ifdown-vhost},因此它可以由ifup / ifdown直接控制,其内部类型为vif --add vhost0,可以直接在命令行中创建和删除它。

https://github.com/tungstenfabric/tf-container-builder/blob/master/containers/vrouter/base/network-functions-vrouter-kernel#L41

这里,pkt1,pkt2,pkt3是在vrouter_linux_init中的linux_pkt_dev_alloc里定义的接口,其中vrouter_linux_init是vrouter.ko的module_init。

  • https://github.com/tungstenfabric/tf-vrouter/blob/master/linux/vr_host_interface.c#L2485
linux/vrouter_mod.c
 module_init(vrouter_linux_init);

static int
linux_pkt_dev_alloc(void)
{
    if (pkt_gro_dev == NULL) {
        pkt_gro_dev = linux_pkt_dev_init("pkt1", &pkt_gro_dev_setup,
                                         &pkt_gro_dev_rx_handler);
        if (pkt_gro_dev == NULL) {
            vr_module_error(-ENOMEM, __FUNCTION__, __LINE__, 0);
            return -ENOMEM;
        }
    }

    if (pkt_l2_gro_dev == NULL) {
        pkt_l2_gro_dev = linux_pkt_dev_init("pkt3", &pkt_l2_gro_dev_setup,
                                         &pkt_gro_dev_rx_handler);
        if (pkt_l2_gro_dev == NULL) {
            vr_module_error(-ENOMEM, __FUNCTION__, __LINE__, 0);
            return -ENOMEM;
        }
    }

    if (pkt_rps_dev == NULL) {
        pkt_rps_dev = linux_pkt_dev_init("pkt2", &pkt_rps_dev_setup,
                                        &pkt_rps_dev_rx_handler);
        if (pkt_rps_dev == NULL) {
            vr_module_error(-ENOMEM, __FUNCTION__, __LINE__, 0);
            return -ENOMEM;
        }
    }

    return 0;
}

它使用了一些GRO和RPS功能,这对于提高内核vRouter的性能很重要。

  • 它们被初始化为空的net_device_ops和随机的ethernet addr。
linux/vr_host_interface.c

/*
 * pkt_rps_dev_ops - netdevice operations on RPS packet device. Currently,
 * no operations are needed, but an empty structure is required to
 * register the device.
 *
 */
static struct net_device_ops pkt_rps_dev_ops;

(snip)

/*
 * pkt_rps_dev_setup - fill in the relevant fields of the RPS packet device
 */
static void
pkt_rps_dev_setup(struct net_device *dev)
{
    /*
     * Initializing the interfaces with basic parameters to setup address
     * families.
     */
    random_ether_addr(dev->dev_addr);
    dev->addr_len = ETH_ALEN;

    dev->hard_header_len = ETH_HLEN;

    dev->type = ARPHRD_VOID;
    dev->netdev_ops = &pkt_rps_dev_ops;
    dev->mtu = 65535;

    return;
}

这里pkt0稍有不同,它用于将数据包从dp-core发送到vrouter-agent。

实际上它是在vrouter-agent首次启动时,根据vrouter-agent的请求创建的,以创建与vrouter-agent进行通信的Tap设备。

  • https://github.com/Juniper/contrail-controller/blob/master/src/vnsw/agent/contrail/contrail_agent_init.cc#L89
  • https://github.com/Juniper/contrail-controller/blob/master/src/vnsw/agent/oper/interface.cc#L626
  • https://github.com/tungstenfabric/tf-vrouter/blob/master/dp-core/vr_interface.c#L669

因此,如果将数据包从dp-core发送到该接口,则vrouter-agent将接收该数据包,以在内部处理该数据包(arp、dhcp等都以这种方式处理)。

作为描述此行为的另一说明,当modprobe vrouter,ifup vhost0,vrouter-agent启动完成后,我将添加ip -o addr,ip link,vif –list的结果。

# docker-compose -f /etc/contrail/vrouter/docker-compose.yaml down
# ifdown vhost0
# modprobe vrouter

[root@ip-172-31-12-55 ~]# ip -o a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
2: ens3    inet 172.31.12.55/20 brd 172.31.15.255 scope global dynamic ens3\       valid_lft 3561sec preferred_lft 3561sec
2: ens3    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
[root@ip-172-31-12-55 ~]# 

[root@ip-172-31-12-55 ~]# ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3:  mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
3: docker0:  mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:34:e8:c3:14 brd ff:ff:ff:ff:ff:ff
9: pkt1: <> mtu 65535 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/void be:f9:01:0e:4d:38 brd 00:00:00:00:00:00
10: pkt3: <> mtu 65535 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/void 46:f8:5c:cb:79:8e brd 00:00:00:00:00:00
11: pkt2: <> mtu 65535 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/void a2:b0:40:5c:03:d4 brd 00:00:00:00:00:00
[root@ip-172-31-12-55 ~]#
[root@ip-172-31-12-55 ~]# vif --list
Vrouter Interface Table

Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
       Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
       D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
       Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
       Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows, L=MAC Learning Enabled
       Proxy=MAC Requests Proxied Always, Er=Etree Root, Mn=Mirror without Vlan Tag, HbsL=HBS Left Intf
       HbsR=HBS Right Intf, Ig=Igmp Trap Enabled

vif0/4350   OS: pkt3
            Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
            Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
            RX packets:0  bytes:0 errors:0
            TX packets:0  bytes:0 errors:0
            Drops:0

vif0/4351   OS: pkt1
            Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
            Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
            RX packets:0  bytes:0 errors:0
            TX packets:0  bytes:0 errors:0
            Drops:0

[root@ip-172-31-12-55 ~]# 

# ifup vhost0

[root@ip-172-31-12-55 ~]# ip -o a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
2: ens3    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
12: vhost0    inet 172.31.12.55/20 brd 172.31.15.255 scope global dynamic vhost0\       valid_lft 3594sec preferred_lft 3594sec
12: vhost0    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
[root@ip-172-31-12-55 ~]# 
[root@ip-172-31-12-55 ~]# 
[root@ip-172-31-12-55 ~]# ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3:  mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
3: docker0:  mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:34:e8:c3:14 brd ff:ff:ff:ff:ff:ff
9: pkt1:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/void be:f9:01:0e:4d:38 brd 00:00:00:00:00:00
10: pkt3:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/void 46:f8:5c:cb:79:8e brd 00:00:00:00:00:00
11: pkt2:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/void a2:b0:40:5c:03:d4 brd 00:00:00:00:00:00
12: vhost0:  mtu 9001 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
[root@ip-172-31-12-55 ~]# 

[root@ip-172-31-12-55 ~]# vif --list
Vrouter Interface Table

Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
       Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
       D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
       Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
       Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows, L=MAC Learning Enabled
       Proxy=MAC Requests Proxied Always, Er=Etree Root, Mn=Mirror without Vlan Tag, HbsL=HBS Left Intf
       HbsR=HBS Right Intf, Ig=Igmp Trap Enabled

vif0/2      OS: ens3 (Speed 10000, Duplex 1)
            Type:Physical HWaddr:06:6c:0b:c8:dd:64 IPaddr:0.0.0.0
            Vrf:0 Mcast Vrf:65535 Flags:XTcL3L2Vp QOS:0 Ref:1
            RX packets:54  bytes:13325 errors:0
            TX packets:39  bytes:4452 errors:0
            Drops:0

vif0/16     OS: vhost0
            Type:Host HWaddr:06:6c:0b:c8:dd:64 IPaddr:0.0.0.0
            Vrf:0 Mcast Vrf:65535 Flags:XL3L2 QOS:0 Ref:1
            RX packets:39  bytes:4452 errors:0
            TX packets:54  bytes:13325 errors:0
            Drops:0

vif0/4350   OS: pkt3
            Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
            Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
            RX packets:0  bytes:0 errors:0
            TX packets:0  bytes:0 errors:0
            Drops:0

vif0/4351   OS: pkt1
            Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
            Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
            RX packets:0  bytes:0 errors:0
            TX packets:0  bytes:0 errors:0
            Drops:0

[root@ip-172-31-12-55 ~]# 

# docker-compose -f /etc/contrail/vrouter/docker-compose.yaml up -d

[root@ip-172-31-12-55 ~]# ip -o a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
2: ens3    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
16: vhost0    inet 172.31.12.55/20 brd 172.31.15.255 scope global dynamic vhost0\       valid_lft 3552sec preferred_lft 3552sec
16: vhost0    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
17: pkt0    inet6 fe80::5094:6cff:fefb:42f7/64 scope link \       valid_lft forever preferred_lft forever
[root@ip-172-31-12-55 ~]# 
[root@ip-172-31-12-55 ~]# ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3:  mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
3: docker0:  mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:34:e8:c3:14 brd ff:ff:ff:ff:ff:ff
13: pkt1:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/void 36:72:98:97:9b:31 brd 00:00:00:00:00:00
14: pkt3:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/void 92:aa:52:e8:d5:c5 brd 00:00:00:00:00:00
15: pkt2:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/void 42:b2:46:73:3d:6c brd 00:00:00:00:00:00
16: vhost0:  mtu 9001 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
17: pkt0:  mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:94:6c:fb:42:f7 brd ff:ff:ff:ff:ff:ff
[root@ip-172-31-12-55 ~]# 
[root@ip-172-31-12-55 ~]# vif --list
Vrouter Interface Table

Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
       Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
       D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
       Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
       Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows, L=MAC Learning Enabled
       Proxy=MAC Requests Proxied Always, Er=Etree Root, Mn=Mirror without Vlan Tag, HbsL=HBS Left Intf
       HbsR=HBS Right Intf, Ig=Igmp Trap Enabled

vif0/0      OS: ens3 (Speed 10000, Duplex 1) NH: 4
            Type:Physical HWaddr:06:6c:0b:c8:dd:64 IPaddr:0.0.0.0
            Vrf:0 Mcast Vrf:65535 Flags:TcL3L2VpEr QOS:-1 Ref:7
            RX packets:165  bytes:97837 errors:0
            TX packets:156  bytes:124911 errors:0
            Drops:0

vif0/1      OS: vhost0 NH: 5
            Type:Host HWaddr:06:6c:0b:c8:dd:64 IPaddr:172.31.12.55
            Vrf:0 Mcast Vrf:65535 Flags:PL3DEr QOS:-1 Ref:8
            RX packets:159  bytes:125878 errors:0
            TX packets:192  bytes:98971 errors:0
            Drops:7

vif0/2      OS: pkt0
            Type:Agent HWaddr:00:00:5e:00:01:00 IPaddr:0.0.0.0
            Vrf:65535 Mcast Vrf:65535 Flags:L3Er QOS:-1 Ref:3
            RX packets:31  bytes:2666 errors:0
            TX packets:34  bytes:13535 errors:0
            Drops:0

vif0/4350   OS: pkt3
            Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
            Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
            RX packets:0  bytes:0 errors:0
            TX packets:0  bytes:0 errors:0
            Drops:0

vif0/4351   OS: pkt1
            Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
            Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
            RX packets:0  bytes:0 errors:0
            TX packets:0  bytes:0 errors:0
            Drops:0

[root@ip-172-31-12-55 ~]# 

Tungsten Fabric知识库丨vRouter内部运行探秘_第1张图片
Tungsten Fabric知识库丨vRouter内部运行探秘_第2张图片