vhost-user summary

vhost

virtio

e1000网卡：提供虚拟机与外界进行通信的能力。但是e1000网卡上也包含了复杂的io端口，寄存器，缓存配置，虚拟机每次收发包都会引起更多的io和mmio操作。

通过Virtio：通过共享队列共享数据，减少IO，提高网络性能
- 采用virtio，需要从用户态(guest)到内核态(KVM)，然后再进入用户态处理事件（qemu）。然后qemu对设备进行访问。
通过vhost：内核中加入了vhost-net.ko模块
- 采用vhost，只需要从用户态（Guest）陷出到内核态（KVM）。然后KVM中的vhost.net.ko模块可以直接访问宿主机上的设备。减少了一次从内核态（KVM）到用户态（qemu）的切换
- 设备在qemu中为vhost-net
区分Vhost-user：用于用户态到用户态进程之间的通信（vhost是用户态和内核态之间的通信）。如果将网络设备放到用户态，实现为vhost-backend，会更加灵活。
- 要求：所有的客户机内存分配为共享的巨页内存
- guest发出中断信号退出kvm，kvm直接和vhost-backend通信，然后网络数据将交由vhost-backend 进行处理
- 设备在qemu中对应的模块为vhost-user-net

vhost

vhost架构的一个惊人的特性是其并没有绑定在KVM上。其仅是一个用户空间接口并不依赖于KVM内核模块。

通信

Guest进程与vhost工作进程通信的方式与通用的virtio设备相同。例如内核中的vhost-net模块，接收Guest发出的IO时间通知之后，访问宿主机上的设备，并将响应放入virtioqueue中，以中断的方式通知Guest。

vhost-user

|------------------------|
|       vhost client	 |---------
|--------------------	 |		  |
|	shared memory	|----|		  |
|-------------------- 	 |		 socket
|						 |		  |
|		vhost backend    |---------
--------------------------

vhost-user 基于 C/S 的模式，采用 UNIX 域套接字（UNIX domain socket）来完成进程间的事件通知和数据交互，相比 vhost 中采用 ioctl 的方式，vhost-user 采用 socket 的方式大大简化了操作。
vhost-user 基于 vring 这套通用的共享内存通信方案，只要 client 和 server 按照 vring 提供的接口实现所需功能即可，常见的实现方案是 client 实现在 guest OS 中，一般是集成在 virtio 驱动上，server 端实现在 qemu 中，也可以实现在各种数据面中，如 OVS，Snabbswitch 等虚拟交换机。

如果使用 qemu 作为 vhost-user 的 server 端实现，在启动 qemu 时，我们需要指定 -mem-path 和 -netdev 参数，如：

$ qemu -m 1024 -mem-path /hugetlbfs,prealloc=on,share=on \
-netdev type=vhost-user,id=net0,file=/path/to/socket \
-device virtio-net-pci,netdev=net0

指定 -mem-path 意味着 qemu 会在 guest OS 的内存中创建一个文件，share=on 选项允许其他进程访问这个文件，也就意味着能访问 guest OS 内存，达到共享内存的目的。
-netdev type=vhost-user 指定通信方案，file=/path/to/socket 指定 socket 文件。

qemu在初始化vhost-user相关设备时，会通过socket建立Vring和事件通知机制。启动之后Guest和Host通信遵循virtio方式
- qemu中的vhost-user模块将与Guest的握手结果，通过socket通知到宿主机用户态的vhost-backend
  - 在guest启动后，加载virtio-net驱动，会写寄存器VIRTIO_PCI_GUEST_FEATURES，这个写操作会被kvm捕获传递给qemu。（qemu又会通知到vhost_backend）
  - 当guest中virtio-net加载完成后会写VIRTIO_PCI_STATUS寄存器，这个操作同样会被kvm捕获传递给qemu
- 虚拟机驱动之后，就是host用户态的vhost-backend和Guest中的用户态进程之间的通信了（通过virtio）
通过mmap的方式把ram映射到vhost-user app的进程空间实现内存的共享

支持的vhost消息有

VHOST_SET_MEM_TABLE
VHOST_SET_VRING_KICK  // guest通知后端的ioeventfd
VHOST_SET_VRING_CALL  // 设置host用掉buffer时触发的ioeventfd
VHOST_SET_LOG_FD
VHOST_GET_VRING_BASE  // 作为信号去数据平面删除vring的一个queue
VHOST_SET_VRING_ERR
...

分析

参考：

http://blog.chinaunix.net/uid-28541347-id-5786547.html

https://blog.csdn.net/qq_15437629/article/details/77899905

https://cloud.tencent.com/developer/article/1075606

https://www.cnblogs.com/ck1020/p/8341914.html

vsock

cloud-hypervisor中对virtio-vsock设备的描述

 The `VsockEpollHandler` implements the runtime logic of our vsock device:
 1. Respond to TX queue events by wrapping virtio buffers into `VsockPacket`s, then sending those
    packets to the `VsockBackend`;
 2. Forward backend FD event notifications to the `VsockBackend`;
 3. Fetch incoming packets from the `VsockBackend` and place them into the virtio RX queue;
 4. Whenever we have processed some virtio buffers (either TX or RX), let the driver know by
    raising our assigned IRQ.

 In a nutshell, the `VsockEpollHandler` logic looks like this:
 - on TX queue event:
   - fetch all packets from the TX queue and send them to the backend; then
   - if the backend has queued up any incoming packets, fetch them into any available RX buffers.
 - on RX queue event:
   - fetch any incoming packets, queued up by the backend, into newly available RX buffers.
 - on backend event:
   - forward the event to the backend; then
   - again, attempt to fetch any incoming packets queued by the backend into virtio RX buffers.

virtio-vsock设备有两个virtio queue

其中tx-queue负责接收前端的virtio buffer，并构造Vockpacket，发送到vsock-backend进行处理
rx-queue：vsock-backend产生的vsock-packet，通过rx-queue发送到guest中的设备驱动

局限：

前后端通信时仍然需要虚拟机陷出到kvm，然后进hypervisor处理，再通知vsock-backend

vhost-vsock

结构：

只需陷出到KVM（内核态），直接通知用户态的vsock-backend

###cloud-hypervisor

以block -device为例。在main()函数中就根据config，创建block-backend

    if let Some(backend_command) = cmd_arguments.value_of("net-backend") {
        start_net_backend(backend_command);
    } else if let Some(backend_command) = cmd_arguments.value_of("block-backend") {
        start_block_backend(backend_command);
    } else {
        start_vmm(cmd_arguments);
    }

###cloud-hypervisor

vhost-user-blk

vring等协商的过程直接通知到backend

runtime: config-change等消息由后端发送到qemu，Master进行处理

/// Type of requests sending from slaves to masters.
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum SlaveReq {
    /// Null operation.
    NOOP = 0,
    /// Send IOTLB messages with struct vhost_iotlb_msg as payload.
    IOTLB_MSG = 1,
    /// Notify that the virtio device's configuration space has changed.
    CONFIG_CHANGE_MSG = 2,
    /// Set host notifier for a specified queue.
    VRING_HOST_NOTIFIER_MSG = 3,
    /// Virtio-fs draft: map file content into the window.
    FS_MAP = 4,
    /// Virtio-fs draft: unmap file content from the window.
    FS_UNMAP = 5,
    /// Virtio-fs draft: sync file content.
    FS_SYNC = 6,
    /// Upper bound of valid commands.
    MAX_CMD = 7,
}

kill event 和pause event和config change 由后端处理
queue event由vhost-user-bachend处理

vhost

virtio

vhost

vhost-user

分析

vsock

vhost-vsock

CATALOG

FEATURED TAGS

FRIENDS