diff mbox

vhost-pci-net: add a new virtio device, vhost-pci-net, for network packet transmission between VMs

Message ID 1476448086-99588-1-git-send-email-wei.w.wang@intel.com (mailing list archive)
State New, archived

Commit Message

Wang, Wei W Oct. 14, 2016, 12:28 p.m. UTC
In addition to the data path established using vhost-pci-net, this
patch also adds support for establishing a notification path between
two virtio devices. New registers are added to the virtio device to
record everything its driver needs to inject interrupts, via
hypercalls, into the peer device on the other end (here, we treat a
virtio<---->virtio connection as peer<---->peer).

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
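The peer_rx_queue_select / peer_rx_queue_gsi pair added below follows the
same select-then-read pattern as queue_select. As a rough illustration, a
driver might collect the GSI of every peer RX virtqueue as sketched here.
The accessor table (peer_cfg_ops), peer_collect_gsis() and the fake device
are hypothetical names invented for this sketch; the patch only defines the
registers, not how a driver reaches them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL accessor table: each callback stands in for one access to a
 * register added by this patch. None of these names come from the patch.
 */
struct peer_cfg_ops {
        uint16_t (*read_connection)(void *dev);                /* peer_connection */
        uint16_t (*read_num_rx_queues)(void *dev);             /* peer_num_rx_queues */
        void     (*select_rx_queue)(void *dev, uint16_t idx);  /* peer_rx_queue_select */
        uint32_t (*read_rx_queue_gsi)(void *dev);              /* peer_rx_queue_gsi */
};

/*
 * Store the GSI of each peer RX virtqueue in gsi[0..max-1].
 * Returns the number of GSIs stored, or -1 if no peer is connected
 * (the driver SHOULD NOT read the peer registers in that case).
 */
int peer_collect_gsis(const struct peer_cfg_ops *ops, void *dev,
                      uint32_t *gsi, size_t max)
{
        assert(ops != NULL && gsi != NULL);

        if (ops->read_connection(dev) != 1)
                return -1;

        uint16_t n = ops->read_num_rx_queues(dev);
        size_t count = n < max ? n : max;

        for (size_t i = 0; i < count; i++) {
                ops->select_rx_queue(dev, (uint16_t)i);
                gsi[i] = ops->read_rx_queue_gsi(dev);
        }
        return (int)count;
}

/* Fake in-memory device, used only to exercise the sketch. */
struct fake_dev {
        uint16_t connected;
        uint16_t num_rx;
        uint16_t selected;
        uint32_t gsi[4];
};

static uint16_t fake_conn(void *d) { return ((struct fake_dev *)d)->connected; }
static uint16_t fake_num(void *d)  { return ((struct fake_dev *)d)->num_rx; }
static void fake_select(void *d, uint16_t i) { ((struct fake_dev *)d)->selected = i; }
static uint32_t fake_gsi(void *d)
{
        struct fake_dev *f = d;
        return f->gsi[f->selected];
}

const struct peer_cfg_ops fake_ops = { fake_conn, fake_num, fake_select, fake_gsi };
```

A real driver would replace the fake accessors with reads and writes of the
common configuration structure, and would run the loop only after being
notified that peer_connection has become 1.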
 content.tex | 227 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 222 insertions(+), 5 deletions(-)
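To make the control message format in the patch below more concrete, here is
a minimal C sketch of how a driver might decode a vhost_pci_ctrl header and a
VHOST_PCI_CTRL_PEER_FEATURE_BITS payload taken from the control receiveq. The
patch does not specify padding or byte order, so the sketch assumes a packed
12-byte header and little-endian fields, as elsewhere in virtio 1.0. The names
ctrl_view, ctrl_parse(), ctrl_peer_features() and ctrl_accept_features() are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Request code as defined in the patch. */
#define VHOST_PCI_CTRL_PEER_FEATURE_BITS 0

/* ASSUMPTION: packed layout, u32 request followed directly by u64 id. */
#define VHOST_PCI_CTRL_HDR_LEN 12u

/* Hypothetical decoded view of one control message. */
struct ctrl_view {
        uint32_t request;
        uint64_t vhost_pci_id;
        const uint8_t *payload;   /* points into the caller's buffer */
        size_t payload_len;
};

/* Load an n-byte little-endian integer (n <= 8). */
static uint64_t load_le(const uint8_t *p, size_t n)
{
        uint64_t v = 0;

        assert(n <= 8);
        for (size_t i = 0; i < n; i++)
                v |= (uint64_t)p[i] << (8 * i);
        return v;
}

/* Split a raw buffer into header fields and payload; -1 if too short. */
int ctrl_parse(const uint8_t *buf, size_t len, struct ctrl_view *out)
{
        if (len < VHOST_PCI_CTRL_HDR_LEN)
                return -1;
        out->request = (uint32_t)load_le(buf, 4);
        out->vhost_pci_id = load_le(buf + 4, 8);
        out->payload = buf + VHOST_PCI_CTRL_HDR_LEN;
        out->payload_len = len - VHOST_PCI_CTRL_HDR_LEN;
        return 0;
}

/* Extract the u64 feature_bits payload of a PEER_FEATURE_BITS message. */
int ctrl_peer_features(const struct ctrl_view *m, uint64_t *features)
{
        if (m->request != VHOST_PCI_CTRL_PEER_FEATURE_BITS ||
            m->payload_len < 8)
                return -1;
        *features = load_le(m->payload, 8);
        return 0;
}

/* The driver replies with the subset of offered bits it also supports. */
uint64_t ctrl_accept_features(uint64_t offered, uint64_t supported)
{
        return offered & supported;
}
```

The value returned by ctrl_accept_features() is what the driver would send
back on the control transmitq; if it is a strict subset of the offered bits,
the device has to re-negotiate with the peer as described in the patch.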
Patch

diff --git a/content.tex b/content.tex
index 4b45678..5f9bdae 100644
--- a/content.tex
+++ b/content.tex
@@ -1295,6 +1295,14 @@  struct virtio_pci_common_cfg {
         le64 queue_desc;                /* read-write */
         le64 queue_avail;               /* read-write */
         le64 queue_used;                /* read-write */
+
+        /* About a peer device */
+        le16 peer_connection;           /* read-write */
+        le16 peer_num_rx_queues;        /* read-only for driver */
+        le16 peer_rx_queue_select;      /* read-write */
+        le32 peer_rx_queue_gsi;         /* read-only for driver */
+        le64 peer_uuid_hi;              /* read-only for driver */
+        le64 peer_uuid_lo;              /* read-only for driver */
 };
 \end{lstlisting}
 
@@ -1361,6 +1369,25 @@  struct virtio_pci_common_cfg {
 
 \item[\field{queue_used}]
         The driver writes the physical address of Used Ring here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+
+\item[\field{peer_connection}]
+        Connection Control/Status. 1 - Connected; 0 - Disconnected.
+
+\item[\field{peer_num_rx_queues}]
+        The device uses this to report the number of RX virtqueues that the connected peer device uses.
+
+\item[\field{peer_rx_queue_select}]
+        The driver selects which RX virtqueue of the peer device the following fields refer to.
+
+\item[\field{peer_rx_queue_gsi}]
+        The device writes the GSI of an RX virtqueue of the peer device here.
+
+\item[\field{peer_uuid_hi}]
+        The device writes the high-order 64 bits of the peer UUID here.
+
+\item[\field{peer_uuid_lo}]
+        The device writes the low-order 64 bits of the peer UUID here.
+
 \end{description}
 
 \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
@@ -1405,9 +1432,15 @@  The device MUST present a 0 in \field{queue_enable} on reset.
 The device MUST present a 0 in \field{queue_size} if the virtqueue
 corresponding to the current \field{queue_select} is unavailable.
 
+The peer device related registers are used when the device is connected to another device (e.g. a vhost-pci device instance). The device SHOULD negotiate with the peer device, and configure \field{peer_num_rx_queues}, \field{peer_rx_queue_gsi}, \field{peer_uuid_hi}, and \field{peer_uuid_lo}.
+
+When the device finishes the necessary negotiation with the peer device to establish the connection, it MUST write a 1 to \field{peer_connection} and notify the driver.
+
+When the device notices that the driver has requested to write a 0 to \field{peer_connection}, it SHOULD first negotiate with the peer device to close the connection, and then write a 0 to \field{peer_connection} and notify the driver.
+
 \drivernormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
 
-The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation} or \field{queue_notify_off}.
+The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation}, \field{queue_notify_off}, \field{peer_num_rx_queues}, \field{peer_rx_queue_gsi}, \field{peer_uuid_hi}, or \field{peer_uuid_lo}.
 
 The driver MUST NOT write a value which is not a power of 2 to \field{queue_size}.
 
@@ -1419,6 +1452,12 @@  After writing 0 to \field{device_status}, the driver MUST wait for a read of
 
 The driver MUST NOT write a 0 to \field{queue_enable}.
 
+The driver MUST NOT write a 1 to \field{peer_connection}.
+
+The driver SHOULD NOT read the peer device related registers until it is notified that a 1 has been written to \field{peer_connection}.
+
+The driver MUST NOT unload until it reads a 0 from \field{peer_connection}.
+
 \subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
 
 The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
@@ -1476,11 +1515,11 @@  The \field{offset} for the \field{ISR status} has no alignment requirements.
 The ISR bits allow the device to distinguish between device-specific configuration
 change interrupts and normal virtqueue interrupts:
 
-\begin{tabular}{ |l||l|l|l| }
+\begin{tabular}{ |l||p{3.5cm}|p{3.5cm}|p{3.5cm}|l| }
 \hline
-Bits       & 0                               & 1               &  2 to 31 \\
+Bits    & 0               & 1                              & 2                            & 3 to 31 \\
 \hline
-Purpose    & Queue Interrupt  & Device Configuration Interrupt & Reserved \\
+Purpose & Queue Interrupt & Device Configuration Interrupt & Peer Device Status Interrupt & Reserved \\
 \hline
 \end{tabular}
 
@@ -5750,9 +5789,181 @@  descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\section{Vhost-pci Net Device}\label{sec:Device Types / Vhost-pci Net Device}
+
+The vhost-pci net device enables point-to-point transmission of network packets between two isolated address spaces (e.g. virtual machines). An instance of the vhost-pci net device transmits packets to, and receives packets from, its peer device, which is usually a virtio net device in another address space.
+
+\subsection{Device ID}\label{sec:Device Types / Vhost-pci Net Device / Device ID}
+  TBD
+
+\subsection{Virtqueues}\label{sec:Device Types / Vhost-pci Net Device / Virtqueues}
+
+\begin{description}
+\item[0] control receiveq
+\item[1] control transmitq
+\item[2] receiveq
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / Vhost-pci Net Device / Feature bits}
+
+\subsubsection{Device feature bits}\label{Device Types / Vhost-pci Net Device / Feature bits / Device feature bits}
+
+The device feature bits are the traditional feature bits, which are negotiated between the device and its driver.
+
+\begin{description}
+\item[VHOST_PCI_NET_F_MAC (0)] Device has given MAC address.
+
+\item[VHOST_PCI_NET_F_CTRL_MAC_ADDR (1)] Set MAC address through control channel.
+
+\item[VHOST_PCI_NET_F_MRG_RXBUF (2)] Driver can merge receive buffers.
+\end{description}
+
+\subsubsection{Peer feature bits}\label{Device Types / Vhost-pci Net Device / Feature bits / Peer feature bits}
+The peer feature bits need to be negotiated with the peer device. The feature bits that have been negotiated with the peer device are sent to the driver for a negotiation. If the driver only accepts a subset of the feature bits, the device needs to re-negotiate the subset of feature bits with the peer device, which may trigger a reset of the peer device.
+
+\begin{description}
+\item[VIRTIO_NET_F_GUEST_TSO4 (7)] Virtio-net can receive TSOv4.
+
+\item[VIRTIO_NET_F_GUEST_TSO6 (8)] Virtio-net can receive TSOv6.
+
+\item[VIRTIO_NET_F_GUEST_ECN (9)] Virtio-net can receive TSO with ECN.
+
+\item[VIRTIO_NET_F_GUEST_UFO (10)] Virtio-net can receive UFO.
+
+\item[VIRTIO_NET_F_HOST_TSO4 (11)] Vhost-pci-net supports TSOv4.
+
+\item[VIRTIO_NET_F_HOST_TSO6 (12)] Vhost-pci-net supports TSOv6.
+
+\item[VIRTIO_NET_F_HOST_ECN (13)] Vhost-pci-net supports TSO with ECN.
+
+\item[VIRTIO_NET_F_HOST_UFO (14)] Vhost-pci-net supports UFO.
+
+\item[VIRTIO_NET_F_MRG_RXBUF (15)] Virtio-net can merge receive buffers.
+
+\item[VHOST_F_LOG_ALL (27)] Vhost-pci-net supports dirty page logging.
+
+\end{description}
+
+\devicenormative{\paragraph}{Peer feature bits}{Device Types / Vhost-pci Net Device / Feature bits / Peer feature bits}
+The device SHOULD send the feature bits that have been accepted by the peer device to the driver through the control receiveq.
+
+\drivernormative{\paragraph}{Peer feature bits}{Device Types / Vhost-pci Net Device / Feature bits / Peer feature bits}
+Upon receiving the peer feature bits from the device, the driver SHOULD send its supported peer feature bits to the device via the control transmitq.
+
+\subsection{Device configuration layout}\label{sec:Device Types / Vhost-pci Net Device / Device configuration layout}
+  None currently defined.
+
+\subsection{Device Initialization}\label{sec:Device Types / Vhost-pci Net Device / Device Initialization}
+
+The driver would perform a typical initialization routine like so:
+
+\begin{enumerate}
+\item Identify and initialize the control receiveq, control transmitq, and receiveq.
+
+\item Fill the receiveq and control receiveq with buffers.
+
+\item If the VHOST_PCI_NET_F_MAC feature bit is set, the configuration
+  space \field{mac} entry indicates the ``physical'' address of the
+  network card, otherwise the driver would typically generate a random
+  local MAC address.
+\end{enumerate}
+
+\subsection{Device Operation}\label{sec:Device Types / Vhost-pci Net Device / Device Operation}
+
+\subsubsection{Control Virtqueue}\label{sec:Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue}
+
+The pair of control virtqueues is used to exchange configuration messages between the device and driver. All the configuration messages are constructed using the following structure:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl {
+        u32 request;
+        u64 vhost_pci_id;
+        u8  request_specific_payload[];
+};
+\end{lstlisting}
+
+The \field{vhost_pci_id} field stores the id of the vhost-pci device. It is usually assigned by the vhost-pci device management software.
+The requests follow the VHOST_PCI_CTRL format and are introduced below.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_FEATURE_BITS 0
+\end{lstlisting}
+
+The device sends the peer feature bits that have been negotiated with the peer device to the driver via the control receiveq. The driver sends back its accepted peer feature bits to the device via the control transmitq.
+The request payload is described using the following structure:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_driver_feature_bits {
+        u64 feature_bits;
+}
+\end{lstlisting}
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_MEM_INFO 1
+\end{lstlisting}
+
+The device sends the memory info obtained from the peer device to the driver. The payload is described using the structure below:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_peer_mem_info {
+#define VHOST_PCI_MEM_INFO_NEED_MAP_N 0
+#define VHOST_PCI_MEM_INFO_NEED_MAP_Y 1
+        u8 need_map;
+        u64 peer_mem;
+        u8 other_mem_info[];
+};
+\end{lstlisting}
+
+If \field{need_map} is set to VHOST_PCI_MEM_INFO_NEED_MAP_N, \field{peer_mem} stores the virtual address which already maps to the start of the peer memory. The driver can use it directly to access the peer memory.
+
+If \field{need_map} is set to VHOST_PCI_MEM_INFO_NEED_MAP_Y, the driver needs to map the peer memory via a device BAR, and \field{peer_mem} stores the BAR id. The driver sends back a message to the device with \field{peer_mem} set to the virtual address that maps to the peer memory.
+
+The \field{other_mem_info} stores other peer memory info for the driver to reference, and it is defined according to the implementation's need.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_VIRTQ_INFO 2
+\end{lstlisting}
+
+The device sends the virtqueue info obtained from the peer device to the driver. The payload is described using the structure below:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_peer_virtq_info {
+#define VHOST_PCI_PEER_VIRTQ_TX 0
+#define VHOST_PCI_PEER_VIRTQ_RX 1
+        u8 tx_or_rx;
+        u32 virtq_num;
+        struct virtq vq[];
+};
+\end{lstlisting}
+
+If \field{tx_or_rx} is set to VHOST_PCI_PEER_VIRTQ_TX, the driver initializes \field{virtq_num} virtqueues by sharing the TX virtqueues from the peer device, and uses them as its mirrored RX virtqueues. To receive packets from the peer device, the driver copies packets from the mirrored RX virtqueues to its own RX virtqueue (i.e. the defined receiveq).
+
+If \field{tx_or_rx} is set to VHOST_PCI_PEER_VIRTQ_RX, the driver initializes \field{virtq_num} virtqueues by sharing the RX virtqueues from the peer device, and uses them as its mirrored TX virtqueues. To transmit packets to the peer device, the driver copies packets to the mirrored TX virtqueues.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING 3
+\end{lstlisting}
+
+The device sends messages to turn the dirty page logging mode of the driver on or off. The payload is described using the structure below:
+\begin{lstlisting}
+struct vhost_pci_ctrl_dirty_page_logging {
+#define VHOST_PCI_DIRTY_PAGE_LOGGING_OFF 0
+#define VHOST_PCI_DIRTY_PAGE_LOGGING_ON  1
+        u8 off_or_on;
+};
+\end{lstlisting}
+
+Other types of vhost-pci devices (e.g. scsi, console) may use the same controlq messages above. The following messages are specific to vhost-pci net devices.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_MAC 0x10000
+\end{lstlisting}
+
+If \field{VHOST_PCI_NET_F_CTRL_MAC_ADDR} is negotiated, the driver sends a message via the control transmitq to set the MAC address of the device.
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
-Currently there are three device-independent feature bits defined:
+Currently there are four device-independent feature bits defined:
 
 \begin{description}
   \item[VIRTIO_F_RING_INDIRECT_DESC (28)] Negotiating this feature indicates
@@ -5764,6 +5975,10 @@  Currently there are three device-independent feature bits defined:
 
   \item[VIRTIO_F_VERSION_1(32)] This indicates compliance with this
     specification, giving a simple way to detect legacy devices or drivers.
+
+  \item[VIRTIO_F_PV_INTERRUPT(33)] Negotiating this feature indicates that the
+    driver can inject an interrupt to its peer device in a paravirtualized
+    way (e.g. hypercall).
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
@@ -5776,6 +5991,8 @@  MAY fail to operate further if VIRTIO_F_VERSION_1 is not offered.
 A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
 if VIRTIO_F_VERSION_1 is not accepted.
 
+A device MUST check whether the management environment (e.g. a virtual machine monitor) supports paravirtualized interrupts and configure the VIRTIO_F_PV_INTERRUPT feature bit accordingly.
+
 \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
 
 Transitional devices MAY offer the following: