[RFC,v2] vhost: introduce mdev based hardware vhost backend

Message ID 20190703091339.1847-1-tiwei.bie@intel.com (mailing list archive)
State New, archived
Series [RFC,v2] vhost: introduce mdev based hardware vhost backend

Commit Message

Tiwei Bie July 3, 2019, 9:13 a.m. UTC
Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==========================

A new VFIO device type is introduced - vfio-vhost. This addresses
some comments from here: https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions on this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to set up the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.
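
As a rough sketch (not part of this patch), the file offset of each
region can be queried with VFIO_DEVICE_GET_REGION_INFO; this is where
the `config_offset` and `notify_offset` values used in the examples
below would come from. Names are illustrative, and the usual headers
(<sys/ioctl.h>, <string.h>, <stdint.h>, <linux/vfio.h>) are assumed:

int vhost_vfio_get_region_offset(int device_fd, unsigned int index,
				 uint64_t *offset)
{
	struct vfio_region_info info;

	memset(&info, 0, sizeof(info));
	info.argsz = sizeof(info);
	info.index = index;

	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0)
		return -1;

	/* Unimplemented regions report a size of zero */
	if (info.size == 0)
		return -1;

	*offset = info.offset;
	return 0;
}

For example, passing VFIO_VHOST_CONFIG_REGION_INDEX here would yield
the offset used by vhost_vfio_write()/vhost_vfio_read() below.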

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in the message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
	/* Flag values: */
 #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
	} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in the above structure.

Each message will be written to or read from this region at offset 0:

int vhost_vfio_write(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
	int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
	struct vhost_vfio *vfio = dev->opaque;
	int ret;

	ret = pwrite64(vfio->device_fd, op, count, vfio->config_offset);
	if (ret != count)
		return -1;

	return 0;
}

int vhost_vfio_read(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
	int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
	struct vhost_vfio *vfio = dev->opaque;
	uint64_t request = op->request;
	int ret;

	ret = pread64(vfio->device_fd, op, count, vfio->config_offset);
	if (ret != count || request != op->request)
		return -1;

	return 0;
}

Setting things on the device is straightforward: just write the
message to the device directly:

int vhost_vfio_set_features(struct vhost_dev *dev, uint64_t features)
{
	struct vhost_vfio_op op;

	op.request = VHOST_SET_FEATURES;
	op.flags = 0;
	op.size = sizeof(features);
	op.payload.u64 = features;

	return vhost_vfio_write(dev, &op);
}

Getting things from the device takes two steps.
Take VHOST_GET_FEATURES as an example:

int vhost_vfio_get_features(struct vhost_dev *dev, uint64_t *features)
{
	struct vhost_vfio_op op;
	int ret;

	op.request = VHOST_GET_FEATURES;
	op.flags = VHOST_VFIO_NEED_REPLY;
	op.size = 0;

	/* Just need to write the header */
	ret = vhost_vfio_write(dev, &op);
	if (ret != 0)
		goto out;

	/* `op` wasn't changed during write */
	op.flags = 0;
	op.size = sizeof(*features);

	ret = vhost_vfio_read(dev, &op);
	if (ret != 0)
		goto out;

	*features = op.payload.u64;
out:
	return ret;
}

2. NOTIFY_REGION (mmap-able)

The region described by NOTIFY_REGION is used to notify
the device.

Each queue has a page for notification, which can be mapped into
the VM (if the hardware supports it) so that the virtio driver in
the VM can notify the device directly.
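
As a rough sketch (not part of this patch), mapping the notify page of
a queue in userspace could look like this, where `notify_offset` is the
NOTIFY_REGION offset reported by VFIO_DEVICE_GET_REGION_INFO (assuming
<sys/mman.h> and <unistd.h> are included; names are illustrative):

void *vhost_vfio_map_notify(int device_fd, uint64_t notify_offset,
			    int queue_idx)
{
	size_t page_size = getpagesize();
	off_t offset = notify_offset + (off_t)queue_idx * page_size;
	void *addr;

	addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    device_fd, offset);

	return addr == MAP_FAILED ? NULL : addr;
}

The resulting mapping can then be exposed to the VM so that the guest's
doorbell writes reach the hardware directly.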

The region described by NOTIFY_REGION is also writable. If
the accelerator's notification register(s) cannot be mapped into
the VM, write() can be used to notify the device instead. Something
like this:

void notify_relay(void *opaque)
{
	......
	offset = host_page_size * queue_idx;

	ret = pwrite64(vfio->device_fd, &queue_idx, sizeof(queue_idx),
			vfio->notify_offset + offset);
	......
}

3. VFIO interrupt ioctl API

The VFIO interrupt ioctl API is used to set up device interrupts.
IRQ bypass can also be supported.

Currently, the data path interrupt can be configured via
VFIO_VHOST_VQ_IRQ_INDEX with the virtqueue's callfd.
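
As a rough sketch (not part of this patch), wiring a virtqueue's callfd
(an eventfd) to this IRQ index via VFIO_DEVICE_SET_IRQS could look like
the following; names are illustrative and the usual headers are assumed:

int vhost_vfio_set_vq_callfd(int device_fd, unsigned int queue_idx, int callfd)
{
	char buf[sizeof(struct vfio_irq_set) + sizeof(int)];
	struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;

	irq_set->argsz = sizeof(buf);
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
			 VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = VFIO_VHOST_VQ_IRQ_INDEX;
	irq_set->start = queue_idx;
	irq_set->count = 1;
	memcpy(irq_set->data, &callfd, sizeof(int));

	return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
}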

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/vhost/Makefile     |   2 +
 drivers/vhost/vdpa.c       | 770 +++++++++++++++++++++++++++++++++++++
 include/linux/vdpa_mdev.h  |  72 ++++
 include/uapi/linux/vfio.h  |  19 +
 include/uapi/linux/vhost.h |  25 ++
 5 files changed, 888 insertions(+)
 create mode 100644 drivers/vhost/vdpa.c
 create mode 100644 include/linux/vdpa_mdev.h

Comments

Jason Wang July 3, 2019, 10:09 a.m. UTC | #1
On 2019/7/3 下午5:13, Tiwei Bie wrote:
> Details about this can be found here:
>
> https://lwn.net/Articles/750770/
>
> What's new in this version
> ==========================
>
> A new VFIO device type is introduced - vfio-vhost. This addressed
> some comments from here: https://patchwork.ozlabs.org/cover/984763/
>
> Below is the updated device interface:
>
> Currently, there are two regions of this device: 1) CONFIG_REGION
> (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> can be used to notify the device.
>
> 1. CONFIG_REGION
>
> The region described by CONFIG_REGION is the main control interface.
> Messages will be written to or read from this region.
>
> The message type is determined by the `request` field in message
> header. The message size is encoded in the message header too.
> The message format looks like this:
>
> struct vhost_vfio_op {
> 	__u64 request;
> 	__u32 flags;
> 	/* Flag values: */
>   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> 	__u32 size;
> 	union {
> 		__u64 u64;
> 		struct vhost_vring_state state;
> 		struct vhost_vring_addr addr;
> 	} payload;
> };
>
> The existing vhost-kernel ioctl cmds are reused as the message
> requests in above structure.


Still the same comment as for V1: what's the advantage of inventing a
new protocol? I believe either of the following would be better:

- using vhost ioctls, we can start from SET_VRING_KICK/SET_VRING_CALL
and extend them with e.g. a notify region. The advantage is that all
existing userspace programs could be reused without modification (or
with minimal modification). And the vhost API hides lots of details
that don't need to be understood by the application (e.g. in the case
of containers).

- using the PCI layout, you don't even need to re-invent a notify
region at all, and we can pass it through to the guest.

Personally, I prefer vhost ioctls.


>
> Each message will be written to or read from this region at offset 0:
>
> int vhost_vfio_write(struct vhost_dev *dev, struct vhost_vfio_op *op)
> {
> 	int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
> 	struct vhost_vfio *vfio = dev->opaque;
> 	int ret;
>
> 	ret = pwrite64(vfio->device_fd, op, count, vfio->config_offset);
> 	if (ret != count)
> 		return -1;
>
> 	return 0;
> }
>
> int vhost_vfio_read(struct vhost_dev *dev, struct vhost_vfio_op *op)
> {
> 	int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
> 	struct vhost_vfio *vfio = dev->opaque;
> 	uint64_t request = op->request;
> 	int ret;
>
> 	ret = pread64(vfio->device_fd, op, count, vfio->config_offset);
> 	if (ret != count || request != op->request)
> 		return -1;
>
> 	return 0;
> }
>
> It's quite straightforward to set things to the device. Just need to
> write the message to device directly:
>
> int vhost_vfio_set_features(struct vhost_dev *dev, uint64_t features)
> {
> 	struct vhost_vfio_op op;
>
> 	op.request = VHOST_SET_FEATURES;
> 	op.flags = 0;
> 	op.size = sizeof(features);
> 	op.payload.u64 = features;
>
> 	return vhost_vfio_write(dev, &op);
> }
>
> To get things from the device, two steps are needed.
> Take VHOST_GET_FEATURE as an example:
>
> int vhost_vfio_get_features(struct vhost_dev *dev, uint64_t *features)
> {
> 	struct vhost_vfio_op op;
> 	int ret;
>
> 	op.request = VHOST_GET_FEATURES;
> 	op.flags = VHOST_VFIO_NEED_REPLY;
> 	op.size = 0;
>
> 	/* Just need to write the header */
> 	ret = vhost_vfio_write(dev, &op);
> 	if (ret != 0)
> 		goto out;
>
> 	/* `op` wasn't changed during write */
> 	op.flags = 0;
> 	op.size = sizeof(*features);
>
> 	ret = vhost_vfio_read(dev, &op);
> 	if (ret != 0)
> 		goto out;
>
> 	*features = op.payload.u64;
> out:
> 	return ret;
> }
>
> 2. NOTIFIY_REGION (mmap-able)
>
> The region described by NOTIFY_REGION will be used to notify
> the device.
>
> Each queue will have a page for notification, and it can be mapped
> to VM (if hardware also supports), and the virtio driver in the VM
> will be able to notify the device directly.
>
> The region described by NOTIFY_REGION is also write-able. If
> the accelerator's notification register(s) cannot be mapped to
> the VM, write() can also be used to notify the device. Something
> like this:
>
> void notify_relay(void *opaque)
> {
> 	......
> 	offset = host_page_size * queue_idx;
>
> 	ret = pwrite64(vfio->device_fd, &queue_idx, sizeof(queue_idx),
> 			vfio->notify_offset + offset);
> 	......
> }
>
> 3. VFIO interrupt ioctl API
>
> VFIO interrupt ioctl API is used to setup device interrupts.
> IRQ-bypass can also be supported.
>
> Currently, the data path interrupt can be configured via the
> VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.


What about the DMA API? Do you expect to use the VFIO IOMMU API or
vhost SET_MEM_TABLE? The VFIO IOMMU API is more generic for sure, but
with SET_MEM_TABLE, DMA can be done at the level of the parent device,
which means it can work for e.g. a card with an on-chip IOMMU.
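
(For reference, the VFIO IOMMU API option here boils down to setting up
mappings on the container fd, roughly like the sketch below; this is
illustrative only and assumes a type1 IOMMU container:)

int dma_map_region(int container_fd, void *vaddr, uint64_t iova, uint64_t size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)vaddr,
		.iova  = iova,
		.size  = size,
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}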

And what's the plan for vIOMMU?


>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/vhost/Makefile     |   2 +
>   drivers/vhost/vdpa.c       | 770 +++++++++++++++++++++++++++++++++++++
>   include/linux/vdpa_mdev.h  |  72 ++++
>   include/uapi/linux/vfio.h  |  19 +
>   include/uapi/linux/vhost.h |  25 ++
>   5 files changed, 888 insertions(+)
>   create mode 100644 drivers/vhost/vdpa.c
>   create mode 100644 include/linux/vdpa_mdev.h


We probably need a sample parent device implementation. It could be a
software datapath: e.g. we can start from a virtio-net device in the
guest or a vhost/tap on the host.

Thanks


>
> diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
> index 6c6df24f770c..cabb71095940 100644
> --- a/drivers/vhost/Makefile
> +++ b/drivers/vhost/Makefile
> @@ -10,4 +10,6 @@ vhost_vsock-y := vsock.o
>   
>   obj-$(CONFIG_VHOST_RING) += vringh.o
>   
> +obj-$(CONFIG_VHOST_VFIO) += vdpa.o
> +
>   obj-$(CONFIG_VHOST)	+= vhost.o
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> new file mode 100644
> index 000000000000..5c9426e2a091
> --- /dev/null
> +++ b/drivers/vhost/vdpa.c
> @@ -0,0 +1,770 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2018-2019 Intel Corporation.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/vfio.h>
> +#include <linux/vhost.h>
> +#include <linux/mdev.h>
> +#include <linux/vdpa_mdev.h>
> +#include <asm/uaccess.h>
> +
> +#define VDPA_CONFIG_SIZE		0x1000000
> +
> +#define VDPA_VFIO_VHOST_OFFSET_SHIFT	40
> +#define VDPA_VFIO_VHOST_OFFSET_MASK \
> +		((1ULL << VDPA_VFIO_VHOST_OFFSET_SHIFT) - 1)
> +#define VDPA_VFIO_VHOST_OFFSET_TO_INDEX(offset) \
> +		((offset) >> VDPA_VFIO_VHOST_OFFSET_SHIFT)
> +#define VDPA_VFIO_VHOST_INDEX_TO_OFFSET(index) \
> +		((u64)(index) << VDPA_VFIO_VHOST_OFFSET_SHIFT)
> +#define VDPA_VFIO_VHOST_REGION_OFFSET(offset) \
> +		((offset) & VDPA_VFIO_VHOST_OFFSET_MASK)
> +
> +struct vdpa_dev *vdpa_alloc(struct mdev_device *mdev, void *private,
> +			    int max_vrings)
> +{
> +	struct vdpa_dev *vdpa;
> +	size_t size;
> +
> +	size = sizeof(struct vdpa_dev) + max_vrings *
> +			sizeof(struct vdpa_vring_info);
> +
> +	vdpa = kzalloc(size, GFP_KERNEL);
> +	if (vdpa == NULL)
> +		return NULL;
> +
> +	mutex_init(&vdpa->ops_lock);
> +
> +	vdpa->mdev = mdev;
> +	vdpa->private = private;
> +	vdpa->max_vrings = max_vrings;
> +
> +	return vdpa;
> +}
> +EXPORT_SYMBOL(vdpa_alloc);
> +
> +void vdpa_free(struct vdpa_dev *vdpa)
> +{
> +	struct mdev_device *mdev;
> +
> +	mdev = vdpa->mdev;
> +
> +	vdpa->ops->stop(vdpa);
> +	mdev_set_drvdata(mdev, NULL);
> +	mutex_destroy(&vdpa->ops_lock);
> +	kfree(vdpa);
> +}
> +EXPORT_SYMBOL(vdpa_free);
> +
> +static ssize_t vdpa_handle_config_read(struct mdev_device *mdev,
> +		char __user *buf, size_t count, loff_t *ppos)
> +{
> +	struct vdpa_dev *vdpa;
> +	struct vhost_vfio_op *op = NULL;
> +	loff_t pos = *ppos;
> +	loff_t offset;
> +	int ret;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa) {
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	offset = VDPA_VFIO_VHOST_REGION_OFFSET(pos);
> +	if (offset != 0) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (!vdpa->pending_reply) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	vdpa->pending_reply = false;
> +
> +	op = kzalloc(VHOST_VFIO_OP_HDR_SIZE + VHOST_VFIO_OP_PAYLOAD_MAX_SIZE,
> +		     GFP_KERNEL);
> +	if (op == NULL) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	op->request = vdpa->pending.request;
> +
> +	switch (op->request) {
> +	case VHOST_GET_VRING_BASE:
> +		op->payload.state = vdpa->pending.payload.state;
> +		op->size = sizeof(op->payload.state);
> +		break;
> +	case VHOST_GET_FEATURES:
> +		op->payload.u64 = vdpa->pending.payload.u64;
> +		op->size = sizeof(op->payload.u64);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto out_free;
> +	}
> +
> +	if (op->size + VHOST_VFIO_OP_HDR_SIZE != count) {
> +		ret = -EINVAL;
> +		goto out_free;
> +	}
> +
> +	if (copy_to_user(buf, op, count)) {
> +		ret = -EFAULT;
> +		goto out_free;
> +	}
> +
> +	ret = count;
> +
> +out_free:
> +	kfree(op);
> +out:
> +	return ret;
> +}
> +
> +ssize_t vdpa_read(struct mdev_device *mdev, char __user *buf,
> +		  size_t count, loff_t *ppos)
> +{
> +	int done = 0;
> +	unsigned int index;
> +	loff_t pos = *ppos;
> +	struct vdpa_dev *vdpa;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	mutex_lock(&vdpa->ops_lock);
> +
> +	index = VDPA_VFIO_VHOST_OFFSET_TO_INDEX(pos);
> +
> +	switch (index) {
> +	case VFIO_VHOST_CONFIG_REGION_INDEX:
> +		done = vdpa_handle_config_read(mdev, buf, count, ppos);
> +		break;
> +	}
> +
> +	if (done > 0)
> +		*ppos += done;
> +
> +	mutex_unlock(&vdpa->ops_lock);
> +
> +	return done;
> +}
> +EXPORT_SYMBOL(vdpa_read);
> +
> +static int vhost_set_vring_addr(struct mdev_device *mdev,
> +		struct vhost_vring_addr *addr)
> +{
> +	struct vdpa_dev *vdpa;
> +	int qid = addr->index;
> +	struct vdpa_vring_info *vring;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	if (qid >= vdpa->max_vrings)
> +		return -EINVAL;
> +
> +	if (qid >= vdpa->nr_vring)
> +		vdpa->nr_vring = qid + 1;
> +
> +	vring = &vdpa->vring_info[qid];
> +
> +	vring->desc_user_addr = addr->desc_user_addr;
> +	vring->used_user_addr = addr->used_user_addr;
> +	vring->avail_user_addr = addr->avail_user_addr;
> +	vring->log_guest_addr = addr->log_guest_addr;
> +
> +	return 0;
> +}
> +
> +static int vhost_set_vring_num(struct mdev_device *mdev,
> +		struct vhost_vring_state *num)
> +{
> +	struct vdpa_dev *vdpa;
> +	int qid = num->index;
> +	struct vdpa_vring_info *vring;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	if (qid >= vdpa->max_vrings)
> +		return -EINVAL;
> +
> +	vring = &vdpa->vring_info[qid];
> +
> +	vring->size = num->num;
> +
> +	return 0;
> +}
> +
> +static int vhost_set_vring_base(struct mdev_device *mdev,
> +		struct vhost_vring_state *base)
> +{
> +	struct vdpa_dev *vdpa;
> +	int qid = base->index;
> +	struct vdpa_vring_info *vring;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	if (qid >= vdpa->max_vrings)
> +		return -EINVAL;
> +
> +	vring = &vdpa->vring_info[qid];
> +
> +	vring->base = base->num;
> +
> +	return 0;
> +}
> +
> +static int vhost_get_vring_base(struct mdev_device *mdev,
> +		struct vhost_vring_state *base)
> +{
> +	struct vdpa_dev *vdpa;
> +	int qid = base->index;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	vdpa->pending_reply = true;
> +	vdpa->pending.request = VHOST_GET_VRING_BASE;
> +	vdpa->pending.payload.state.index = qid;
> +	vdpa->pending.payload.state.num = vdpa->vring_info[qid].base;
> +
> +	return 0;
> +}
> +
> +static int vhost_set_log_base(struct mdev_device *mdev, u64 *log_base)
> +{
> +	struct vdpa_dev *vdpa;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	vdpa->log_base = *log_base;
> +	return 0;
> +}
> +
> +static int vhost_set_features(struct mdev_device *mdev, u64 *features)
> +{
> +	struct vdpa_dev *vdpa;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	vdpa->features = *features;
> +	vdpa->ops->set_features(vdpa);
> +
> +	return 0;
> +}
> +
> +static int vhost_get_features(struct mdev_device *mdev, u64 *features)
> +{
> +	struct vdpa_dev *vdpa;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	vdpa->pending_reply = true;
> +	vdpa->pending.request = VHOST_GET_FEATURES;
> +	vdpa->pending.payload.u64 =
> +		vdpa->ops->supported_features(vdpa);
> +
> +	return 0;
> +}
> +
> +static int vhost_set_owner(struct mdev_device *mdev)
> +{
> +	// TODO
> +	return 0;
> +}
> +
> +static int vhost_reset_owner(struct mdev_device *mdev)
> +{
> +	// TODO
> +	return 0;
> +}
> +
> +static int vhost_set_state(struct mdev_device *mdev, u64 *state)
> +{
> +	struct vdpa_dev *vdpa;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	if (*state >= VHOST_DEVICE_S_MAX)
> +		return -EINVAL;
> +
> +	if (vdpa->state == *state)
> +		return 0;
> +
> +	vdpa->state = *state;
> +
> +	switch (vdpa->state) {
> +	case VHOST_DEVICE_S_RUNNING:
> +		vdpa->ops->start(vdpa);
> +		break;
> +	case VHOST_DEVICE_S_STOPPED:
> +		vdpa->ops->stop(vdpa);
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +static ssize_t vdpa_handle_config_write(struct mdev_device *mdev,
> +		const char __user *buf, size_t count, loff_t *ppos)
> +{
> +	struct vhost_vfio_op *op = NULL;
> +	loff_t pos = *ppos;
> +	loff_t offset;
> +	int ret;
> +
> +	offset = VDPA_VFIO_VHOST_REGION_OFFSET(pos);
> +	if (offset != 0) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (count < VHOST_VFIO_OP_HDR_SIZE) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	op = kzalloc(VHOST_VFIO_OP_HDR_SIZE + VHOST_VFIO_OP_PAYLOAD_MAX_SIZE,
> +		     GFP_KERNEL);
> +	if (op == NULL) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	if (copy_from_user(op, buf, VHOST_VFIO_OP_HDR_SIZE)) {
> +		ret = -EINVAL;
> +		goto out_free;
> +	}
> +
> +	if (op->size > VHOST_VFIO_OP_PAYLOAD_MAX_SIZE ||
> +	    op->size + VHOST_VFIO_OP_HDR_SIZE != count) {
> +		ret = -EINVAL;
> +		goto out_free;
> +	}
> +
> +	if (copy_from_user(&op->payload, buf + VHOST_VFIO_OP_HDR_SIZE,
> +			   op->size)) {
> +		ret = -EFAULT;
> +		goto out_free;
> +	}
> +
> +	switch (op->request) {
> +	case VHOST_SET_LOG_BASE:
> +		vhost_set_log_base(mdev, &op->payload.u64);
> +		break;
> +	case VHOST_SET_VRING_ADDR:
> +		vhost_set_vring_addr(mdev, &op->payload.addr);
> +		break;
> +	case VHOST_SET_VRING_NUM:
> +		vhost_set_vring_num(mdev, &op->payload.state);
> +		break;
> +	case VHOST_SET_VRING_BASE:
> +		vhost_set_vring_base(mdev, &op->payload.state);
> +		break;
> +	case VHOST_GET_VRING_BASE:
> +		vhost_get_vring_base(mdev, &op->payload.state);
> +		break;
> +	case VHOST_SET_FEATURES:
> +		vhost_set_features(mdev, &op->payload.u64);
> +		break;
> +	case VHOST_GET_FEATURES:
> +		vhost_get_features(mdev, &op->payload.u64);
> +		break;
> +	case VHOST_SET_OWNER:
> +		vhost_set_owner(mdev);
> +		break;
> +	case VHOST_RESET_OWNER:
> +		vhost_reset_owner(mdev);
> +		break;
> +	case VHOST_DEVICE_SET_STATE:
> +		vhost_set_state(mdev, &op->payload.u64);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	ret = count;
> +
> +out_free:
> +	kfree(op);
> +out:
> +	return ret;
> +}
> +
> +static ssize_t vdpa_handle_notify_write(struct mdev_device *mdev,
> +		const char __user *buf, size_t count, loff_t *ppos)
> +{
> +	struct vdpa_dev *vdpa;
> +	int qid;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	if (count < sizeof(qid))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&qid, buf, sizeof(qid)))
> +		return -EINVAL;
> +
> +	vdpa->ops->notify(vdpa, qid);
> +
> +	return count;
> +}
> +
> +ssize_t vdpa_write(struct mdev_device *mdev, const char __user *buf,
> +		   size_t count, loff_t *ppos)
> +{
> +	int done = 0;
> +	unsigned int index;
> +	loff_t pos = *ppos;
> +	struct vdpa_dev *vdpa;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	mutex_lock(&vdpa->ops_lock);
> +
> +	index = VDPA_VFIO_VHOST_OFFSET_TO_INDEX(pos);
> +
> +	switch (index) {
> +	case VFIO_VHOST_CONFIG_REGION_INDEX:
> +		done = vdpa_handle_config_write(mdev, buf, count, ppos);
> +		break;
> +	case VFIO_VHOST_NOTIFY_REGION_INDEX:
> +		done = vdpa_handle_notify_write(mdev, buf, count, ppos);
> +		break;
> +	}
> +
> +	if (done > 0)
> +		*ppos += done;
> +
> +	mutex_unlock(&vdpa->ops_lock);
> +
> +	return done;
> +}
> +EXPORT_SYMBOL(vdpa_write);
> +
> +static int vdpa_get_region_info(struct mdev_device *mdev,
> +				struct vfio_region_info *region_info,
> +				u16 *cap_type_id, void **cap_type)
> +{
> +	struct vdpa_dev *vdpa;
> +	u32 index, flags;
> +	u64 size = 0;
> +
> +	if (!mdev)
> +		return -EINVAL;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -EINVAL;
> +
> +	index = region_info->index;
> +	if (index >= VFIO_VHOST_NUM_REGIONS)
> +		return -EINVAL;
> +
> +	mutex_lock(&vdpa->ops_lock);
> +
> +	flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
> +
> +	switch (index) {
> +	case VFIO_VHOST_CONFIG_REGION_INDEX:
> +		size = VDPA_CONFIG_SIZE;
> +		break;
> +	case VFIO_VHOST_NOTIFY_REGION_INDEX:
> +		size = (u64)vdpa->max_vrings << PAGE_SHIFT;
> +		flags |= VFIO_REGION_INFO_FLAG_MMAP;
> +		break;
> +	default:
> +		size = 0;
> +		break;
> +	}
> +
> +	region_info->size = size;
> +	region_info->offset = VDPA_VFIO_VHOST_INDEX_TO_OFFSET(index);
> +	region_info->flags = flags;
> +	mutex_unlock(&vdpa->ops_lock);
> +	return 0;
> +}
> +
> +static int vdpa_reset(struct mdev_device *mdev)
> +{
> +	struct vdpa_dev *vdpa;
> +
> +	if (!mdev)
> +		return -EINVAL;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static int vdpa_get_device_info(struct mdev_device *mdev,
> +				struct vfio_device_info *dev_info)
> +{
> +	struct vdpa_dev *vdpa;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	dev_info->flags = VFIO_DEVICE_FLAGS_VHOST | VFIO_DEVICE_RESET;
> +	dev_info->num_regions = VFIO_VHOST_NUM_REGIONS;
> +	dev_info->num_irqs = VFIO_VHOST_NUM_IRQS;
> +
> +	return 0;
> +}
> +
> +static int vdpa_get_irq_info(struct mdev_device *mdev,
> +			     struct vfio_irq_info *info)
> +{
> +	struct vdpa_dev *vdpa;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	if (info->index != VFIO_VHOST_VQ_IRQ_INDEX)
> +		return -EINVAL;
> +
> +	info->flags = VFIO_IRQ_INFO_EVENTFD;
> +	info->count = vdpa->max_vrings;
> +
> +	return 0;
> +}
> +
> +static int vdpa_set_irqs(struct mdev_device *mdev, uint32_t flags,
> +			 unsigned int index, unsigned int start,
> +			 unsigned int count, void *data)
> +{
> +	struct vdpa_dev *vdpa;
> +	int *fd = data, i;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -EINVAL;
> +
> +	if (index != VFIO_VHOST_VQ_IRQ_INDEX)
> +		return -ENOTSUPP;
> +
> +	for (i = 0; i < count; i++)
> +		vdpa->ops->set_eventfd(vdpa, start + i,
> +			(flags & VFIO_IRQ_SET_DATA_EVENTFD) ? fd[i] : -1);
> +
> +	return 0;
> +}
> +
> +long vdpa_ioctl(struct mdev_device *mdev, unsigned int cmd, unsigned long arg)
> +{
> +	int ret = 0;
> +	unsigned long minsz;
> +	struct vdpa_dev *vdpa;
> +
> +	if (!mdev)
> +		return -EINVAL;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_INFO:
> +	{
> +		struct vfio_device_info info;
> +
> +		minsz = offsetofend(struct vfio_device_info, num_irqs);
> +
> +		if (copy_from_user(&info, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (info.argsz < minsz)
> +			return -EINVAL;
> +
> +		ret = vdpa_get_device_info(mdev, &info);
> +		if (ret)
> +			return ret;
> +
> +		if (copy_to_user((void __user *)arg, &info, minsz))
> +			return -EFAULT;
> +
> +		return 0;
> +	}
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +	{
> +		struct vfio_region_info info;
> +		u16 cap_type_id = 0;
> +		void *cap_type = NULL;
> +
> +		minsz = offsetofend(struct vfio_region_info, offset);
> +
> +		if (copy_from_user(&info, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (info.argsz < minsz)
> +			return -EINVAL;
> +
> +		ret = vdpa_get_region_info(mdev, &info, &cap_type_id,
> +					   &cap_type);
> +		if (ret)
> +			return ret;
> +
> +		if (copy_to_user((void __user *)arg, &info, minsz))
> +			return -EFAULT;
> +
> +		return 0;
> +	}
> +	case VFIO_DEVICE_GET_IRQ_INFO:
> +	{
> +		struct vfio_irq_info info;
> +
> +		minsz = offsetofend(struct vfio_irq_info, count);
> +
> +		if (copy_from_user(&info, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (info.argsz < minsz || info.index >= vdpa->max_vrings)
> +			return -EINVAL;
> +
> +		ret = vdpa_get_irq_info(mdev, &info);
> +		if (ret)
> +			return ret;
> +
> +		if (copy_to_user((void __user *)arg, &info, minsz))
> +			return -EFAULT;
> +
> +		return 0;
> +	}
> +	case VFIO_DEVICE_SET_IRQS:
> +	{
> +		struct vfio_irq_set hdr;
> +		size_t data_size = 0;
> +		u8 *data = NULL;
> +
> +		minsz = offsetofend(struct vfio_irq_set, count);
> +
> +		if (copy_from_user(&hdr, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		ret = vfio_set_irqs_validate_and_prepare(&hdr, vdpa->max_vrings,
> +							 VFIO_VHOST_NUM_IRQS,
> +							 &data_size);
> +		if (ret)
> +			return ret;
> +
> +		if (data_size) {
> +			data = memdup_user((void __user *)(arg + minsz),
> +					   data_size);
> +			if (IS_ERR(data))
> +				return PTR_ERR(data);
> +		}
> +
> +		ret = vdpa_set_irqs(mdev, hdr.flags, hdr.index, hdr.start,
> +				hdr.count, data);
> +
> +		kfree(data);
> +		return ret;
> +	}
> +	case VFIO_DEVICE_RESET:
> +		return vdpa_reset(mdev);
> +	}
> +	return -ENOTTY;
> +}
> +EXPORT_SYMBOL(vdpa_ioctl);
> +
> +static const struct vm_operations_struct vdpa_mm_ops = {
> +#ifdef CONFIG_HAVE_IOREMAP_PROT
> +	.access = generic_access_phys
> +#endif
> +};
> +
> +int vdpa_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
> +{
> +	struct vdpa_dev *vdpa;
> +	unsigned int index;
> +	loff_t pos;
> +	loff_t offset;
> +	int qid, ret;
> +
> +	vdpa = mdev_get_drvdata(mdev);
> +	if (!vdpa)
> +		return -ENODEV;
> +
> +	pos = vma->vm_pgoff << PAGE_SHIFT;
> +
> +	index = VDPA_VFIO_VHOST_OFFSET_TO_INDEX(pos);
> +	offset = VDPA_VFIO_VHOST_REGION_OFFSET(pos);
> +
> +	qid = offset >> PAGE_SHIFT;
> +
> +	if (vma->vm_end < vma->vm_start)
> +		return -EINVAL;
> +	if ((vma->vm_flags & VM_SHARED) == 0)
> +		return -EINVAL;
> +	if (index != VFIO_VHOST_NOTIFY_REGION_INDEX)
> +		return -EINVAL;
> +	if (qid < 0 || qid >= vdpa->max_vrings)
> +		return -EINVAL;
> +
> +	if (vma->vm_end - vma->vm_start > PAGE_SIZE)
> +		return -EINVAL;
> +
> +	if (vdpa->ops->get_notify_addr == NULL)
> +		return -ENOTSUPP;
> +
> +	mutex_lock(&vdpa->ops_lock);
> +
> +	vma->vm_ops = &vdpa_mm_ops;
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +
> +	vma->vm_pgoff = vdpa->ops->get_notify_addr(vdpa, qid) >> PAGE_SHIFT;
> +
> +	ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> +			vma->vm_end - vma->vm_start, vma->vm_page_prot);
> +
> +	mutex_unlock(&vdpa->ops_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(vdpa_mmap);
> +
> +int vdpa_open(struct mdev_device *mdev)
> +{
> +	return 0;
> +}
> +EXPORT_SYMBOL(vdpa_open);
> +
> +void vdpa_close(struct mdev_device *mdev)
> +{
> +}
> +EXPORT_SYMBOL(vdpa_close);
> +
> +MODULE_VERSION("0.0.0");
> +MODULE_LICENSE("GPL v2");
> +MODULE_DESCRIPTION("Hardware vhost accelerator abstraction");
> diff --git a/include/linux/vdpa_mdev.h b/include/linux/vdpa_mdev.h
> new file mode 100644
> index 000000000000..4bbdf7e2e712
> --- /dev/null
> +++ b/include/linux/vdpa_mdev.h
> @@ -0,0 +1,72 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2018-2019 Intel Corporation.
> + */
> +
> +#ifndef VDPA_MDEV_H
> +#define VDPA_MDEV_H
> +
> +struct mdev_device;
> +struct vdpa_dev;
> +
> +/*
> + * XXX: Any comments about the vDPA API design for drivers
> + *      would be appreciated!
> + */
> +
> +typedef int (*vdpa_start_device_t)(struct vdpa_dev *vdpa);
> +typedef int (*vdpa_stop_device_t)(struct vdpa_dev *vdpa);
> +typedef int (*vdpa_set_features_t)(struct vdpa_dev *vdpa);
> +typedef int (*vdpa_set_eventfd_t)(struct vdpa_dev *vdpa, int queue_idx, int fd);
> +typedef u64 (*vdpa_supported_features_t)(struct vdpa_dev *vdpa);
> +typedef void (*vdpa_notify_device_t)(struct vdpa_dev *vdpa, int queue_idx);
> +typedef u64 (*vdpa_get_notify_addr_t)(struct vdpa_dev *vdpa, int queue_idx);
> +
> +struct vdpa_device_ops {
> +	vdpa_start_device_t		start;
> +	vdpa_stop_device_t		stop;
> +	vdpa_set_eventfd_t		set_eventfd;
> +	vdpa_supported_features_t	supported_features;
> +	vdpa_notify_device_t		notify;
> +	vdpa_get_notify_addr_t		get_notify_addr;
> +	vdpa_set_features_t		set_features;
> +};
> +
> +struct vdpa_vring_info {
> +	u64 desc_user_addr;
> +	u64 used_user_addr;
> +	u64 avail_user_addr;
> +	u64 log_guest_addr;
> +	u16 size;
> +	u16 base;
> +};
> +
> +struct vdpa_dev {
> +	struct mdev_device *mdev;
> +	struct mutex ops_lock;
> +	int nr_vring;
> +	u64 features;
> +	u64 state;
> +	bool pending_reply;
> +	struct vhost_vfio_op pending;
> +	const struct vdpa_device_ops *ops;
> +	void *private;
> +	int max_vrings;
> +	uint64_t log_base;
> +	uint64_t log_size;
> +	struct vdpa_vring_info vring_info[0];
> +};
> +
> +struct vdpa_dev *vdpa_alloc(struct mdev_device *mdev, void *private,
> +			    int max_vrings);
> +void vdpa_free(struct vdpa_dev *vdpa);
> +ssize_t vdpa_read(struct mdev_device *mdev, char __user *buf,
> +		  size_t count, loff_t *ppos);
> +ssize_t vdpa_write(struct mdev_device *mdev, const char __user *buf,
> +		   size_t count, loff_t *ppos);
> +long vdpa_ioctl(struct mdev_device *mdev, unsigned int cmd, unsigned long arg);
> +int vdpa_mmap(struct mdev_device *mdev, struct vm_area_struct *vma);
> +int vdpa_open(struct mdev_device *mdev);
> +void vdpa_close(struct mdev_device *mdev);
> +
> +#endif /* VDPA_MDEV_H */
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8f10748dac79..6c5718ab7eeb 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -201,6 +201,7 @@ struct vfio_device_info {
>   #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)	/* vfio-amba device */
>   #define VFIO_DEVICE_FLAGS_CCW	(1 << 4)	/* vfio-ccw device */
>   #define VFIO_DEVICE_FLAGS_AP	(1 << 5)	/* vfio-ap device */
> +#define VFIO_DEVICE_FLAGS_VHOST	(1 << 6)	/* vfio-vhost device */
>   	__u32	num_regions;	/* Max region index + 1 */
>   	__u32	num_irqs;	/* Max IRQ index + 1 */
>   };
> @@ -217,6 +218,7 @@ struct vfio_device_info {
>   #define VFIO_DEVICE_API_AMBA_STRING		"vfio-amba"
>   #define VFIO_DEVICE_API_CCW_STRING		"vfio-ccw"
>   #define VFIO_DEVICE_API_AP_STRING		"vfio-ap"
> +#define VFIO_DEVICE_API_VHOST_STRING		"vfio-vhost"
>   
>   /**
>    * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
> @@ -573,6 +575,23 @@ enum {
>   	VFIO_CCW_NUM_IRQS
>   };
>   
> +/*
> + * The vfio-vhost bus driver makes use of the following fixed region and
> + * IRQ index mapping. Unimplemented regions return a size of zero.
> + * Unimplemented IRQ types return a count of zero.
> + */
> +
> +enum {
> +	VFIO_VHOST_CONFIG_REGION_INDEX,
> +	VFIO_VHOST_NOTIFY_REGION_INDEX,
> +	VFIO_VHOST_NUM_REGIONS
> +};
> +
> +enum {
> +	VFIO_VHOST_VQ_IRQ_INDEX,
> +	VFIO_VHOST_NUM_IRQS
> +};
> +
>   /**
>    * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
>    *					      struct vfio_pci_hot_reset_info)
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index 40d028eed645..ad95b90c5c05 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -116,4 +116,29 @@
>   #define VHOST_VSOCK_SET_GUEST_CID	_IOW(VHOST_VIRTIO, 0x60, __u64)
>   #define VHOST_VSOCK_SET_RUNNING		_IOW(VHOST_VIRTIO, 0x61, int)
>   
> +/* VHOST_DEVICE specific defines */
> +
> +#define VHOST_DEVICE_SET_STATE _IOW(VHOST_VIRTIO, 0x70, __u64)
> +
> +#define VHOST_DEVICE_S_STOPPED 0
> +#define VHOST_DEVICE_S_RUNNING 1
> +#define VHOST_DEVICE_S_MAX     2
> +
> +struct vhost_vfio_op {
> +	__u64 request;
> +	__u32 flags;
> +	/* Flag values: */
> +#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> +	__u32 size;
> +	union {
> +		__u64 u64;
> +		struct vhost_vring_state state;
> +		struct vhost_vring_addr addr;
> +	} payload;
> +};
> +
> +#define VHOST_VFIO_OP_HDR_SIZE \
> +		((unsigned long)&((struct vhost_vfio_op *)NULL)->payload)
> +#define VHOST_VFIO_OP_PAYLOAD_MAX_SIZE 1024 /* FIXME TBD */
> +
>   #endif
Tiwei Bie July 3, 2019, 11:52 a.m. UTC | #2
On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > Details about this can be found here:
> > 
> > https://lwn.net/Articles/750770/
> > 
> > What's new in this version
> > ==========================
> > 
> > A new VFIO device type is introduced - vfio-vhost. This addressed
> > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > 
> > Below is the updated device interface:
> > 
> > Currently, there are two regions of this device: 1) CONFIG_REGION
> > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > can be used to notify the device.
> > 
> > 1. CONFIG_REGION
> > 
> > The region described by CONFIG_REGION is the main control interface.
> > Messages will be written to or read from this region.
> > 
> > The message type is determined by the `request` field in message
> > header. The message size is encoded in the message header too.
> > The message format looks like this:
> > 
> > struct vhost_vfio_op {
> > 	__u64 request;
> > 	__u32 flags;
> > 	/* Flag values: */
> >   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > 	__u32 size;
> > 	union {
> > 		__u64 u64;
> > 		struct vhost_vring_state state;
> > 		struct vhost_vring_addr addr;
> > 	} payload;
> > };
> > 
> > The existing vhost-kernel ioctl cmds are reused as the message
> > requests in above structure.
> 
> 
> Still a comments like V1. What's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..

> I believe either of the following should be better:
> 
> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> extend it with e.g notify region. The advantages is that all exist userspace
> program could be reused without modification (or minimal modification). And
> vhost API hides lots of details that is not necessary to be understood by
> application (e.g in the case of container).

Do you mean reusing vhost's ioctls on the VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
the existing vfio_mdev) for the mdev device?

> 
> - using PCI layout, then you don't even need to re-invent notifiy region at
> all and we can pass-through them to guest.

As you said previously, virtio has transports other than PCI,
and it would look a bit odd when using transports other than PCI.

> 
> Personally, I prefer vhost ioctl.

+1

> 
> 
> > 
[...]
> > 
> > 3. VFIO interrupt ioctl API
> > 
> > VFIO interrupt ioctl API is used to setup device interrupts.
> > IRQ-bypass can also be supported.
> > 
> > Currently, the data path interrupt can be configured via the
> > VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
> 
> 
> How about DMA API? Do you expect to use VFIO IOMMU API or using vhost
> SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with
> SET_MEM_TABLE DMA can be done at the level of parent device which means it
> can work for e.g the card with on-chip IOMMU.

Agree. This RFC assumes userspace will use the VFIO IOMMU API
to do the DMA programming. But as you said, there could be
a problem when using cards with an on-chip IOMMU.

> 
> And what's the plan for vIOMMU?

As this RFC assumes userspace will use the VFIO IOMMU API, userspace
just needs to follow the same approach as the vfio-pci device
in QEMU to support vIOMMU.

> 
> 
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> >   drivers/vhost/Makefile     |   2 +
> >   drivers/vhost/vdpa.c       | 770 +++++++++++++++++++++++++++++++++++++
> >   include/linux/vdpa_mdev.h  |  72 ++++
> >   include/uapi/linux/vfio.h  |  19 +
> >   include/uapi/linux/vhost.h |  25 ++
> >   5 files changed, 888 insertions(+)
> >   create mode 100644 drivers/vhost/vdpa.c
> >   create mode 100644 include/linux/vdpa_mdev.h
> 
> 
> We probably need some sample parent device implementation. It could be a
> software datapath like e.g we can start from virtio-net device in guest or a
> vhost/tap on host.

Yeah, something like this would be interesting!

Thanks,
Tiwei

> 
> Thanks
> 
> 
> >
Jason Wang July 3, 2019, 12:16 p.m. UTC | #3
On 2019/7/3 下午7:52, Tiwei Bie wrote:
> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>> On 2019/7/3 下午5:13, Tiwei Bie wrote:
>>> Details about this can be found here:
>>>
>>> https://lwn.net/Articles/750770/
>>>
>>> What's new in this version
>>> ==========================
>>>
>>> A new VFIO device type is introduced - vfio-vhost. This addressed
>>> some comments from here: https://patchwork.ozlabs.org/cover/984763/
>>>
>>> Below is the updated device interface:
>>>
>>> Currently, there are two regions of this device: 1) CONFIG_REGION
>>> (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
>>> device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
>>> can be used to notify the device.
>>>
>>> 1. CONFIG_REGION
>>>
>>> The region described by CONFIG_REGION is the main control interface.
>>> Messages will be written to or read from this region.
>>>
>>> The message type is determined by the `request` field in message
>>> header. The message size is encoded in the message header too.
>>> The message format looks like this:
>>>
>>> struct vhost_vfio_op {
>>> 	__u64 request;
>>> 	__u32 flags;
>>> 	/* Flag values: */
>>>    #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
>>> 	__u32 size;
>>> 	union {
>>> 		__u64 u64;
>>> 		struct vhost_vring_state state;
>>> 		struct vhost_vring_addr addr;
>>> 	} payload;
>>> };
>>>
>>> The existing vhost-kernel ioctl cmds are reused as the message
>>> requests in above structure.
>>
>> Still a comments like V1. What's the advantage of inventing a new protocol?
> I'm trying to make it work in VFIO's way..
>
>> I believe either of the following should be better:
>>
>> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
>> extend it with e.g notify region. The advantages is that all exist userspace
>> program could be reused without modification (or minimal modification). And
>> vhost API hides lots of details that is not necessary to be understood by
>> application (e.g in the case of container).
> Do you mean reusing vhost's ioctl on VFIO device fd directly,
> or introducing another mdev driver (i.e. vhost_mdev instead of
> using the existing vfio_mdev) for mdev device?


Can we simply add them into ioctl of mdev_parent_ops?
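
(A rough sketch of that idea, illustrative only: the parent driver's
mdev ioctl hook could dispatch the vhost requests itself and fall back
to the existing VFIO handling otherwise; the helper name below is
hypothetical.)

static long my_parent_ioctl(struct mdev_device *mdev, unsigned int cmd,
			    unsigned long arg)
{
	switch (cmd) {
	case VHOST_SET_FEATURES:
	case VHOST_GET_FEATURES:
	case VHOST_SET_VRING_ADDR:
	case VHOST_SET_VRING_NUM:
	case VHOST_SET_VRING_BASE:
	case VHOST_GET_VRING_BASE:
		/* hypothetical helper handling the vhost requests */
		return my_parent_handle_vhost(mdev, cmd, arg);
	default:
		return vdpa_ioctl(mdev, cmd, arg);
	}
}

static const struct mdev_parent_ops my_parent_ops = {
	.owner = THIS_MODULE,
	/* .create, .remove, .read, .write, .mmap, ... */
	.ioctl = my_parent_ioctl,
};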


>
>> - using PCI layout, then you don't even need to re-invent notifiy region at
>> all and we can pass-through them to guest.
> Like what you said previously, virtio has transports other than PCI.
> And it will look a bit odd when using transports other than PCI..


Yes.


>
>> Personally, I prefer vhost ioctl.
> +1
>
>>
> [...]
>>> 3. VFIO interrupt ioctl API
>>>
>>> VFIO interrupt ioctl API is used to setup device interrupts.
>>> IRQ-bypass can also be supported.
>>>
>>> Currently, the data path interrupt can be configured via the
>>> VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
>>
>> How about DMA API? Do you expect to use VFIO IOMMU API or using vhost
>> SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with
>> SET_MEM_TABLE DMA can be done at the level of parent device which means it
>> can work for e.g the card with on-chip IOMMU.
> Agree. In this RFC, it assumes userspace will use VFIO IOMMU API
> to do the DMA programming. But like what you said, there could be
> a problem when using cards with on-chip IOMMU.


Yes, another issue is that SET_MEM_TABLE cannot be used to update just
a part of the table. This seems less flexible than the VFIO API, but it
could be extended.


>
>> And what's the plan for vIOMMU?
> As this RFC assumes userspace will use VFIO IOMMU API, userspace
> just needs to follow the same way like what vfio-pci device does
> in QEMU to support vIOMMU.


Right, this is more a question for the QEMU part. It means it needs to
go through the ordinary VFIO path to get all the notifier/listener
support from the vIOMMU.


>
>>
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>>    drivers/vhost/Makefile     |   2 +
>>>    drivers/vhost/vdpa.c       | 770 +++++++++++++++++++++++++++++++++++++
>>>    include/linux/vdpa_mdev.h  |  72 ++++
>>>    include/uapi/linux/vfio.h  |  19 +
>>>    include/uapi/linux/vhost.h |  25 ++
>>>    5 files changed, 888 insertions(+)
>>>    create mode 100644 drivers/vhost/vdpa.c
>>>    create mode 100644 include/linux/vdpa_mdev.h
>>
>> We probably need some sample parent device implementation. It could be a
>> software datapath like e.g we can start from virtio-net device in guest or a
>> vhost/tap on host.
> Yeah, something like this would be interesting!


Plan to do something like that :) ?

Thanks


>
> Thanks,
> Tiwei
>
>> Thanks
>>
>>
Tiwei Bie July 3, 2019, 1:08 p.m. UTC | #4
On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > Details about this can be found here:
> > > > 
> > > > https://lwn.net/Articles/750770/
> > > > 
> > > > What's new in this version
> > > > ==========================
> > > > 
> > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > > > 
> > > > Below is the updated device interface:
> > > > 
> > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > can be used to notify the device.
> > > > 
> > > > 1. CONFIG_REGION
> > > > 
> > > > The region described by CONFIG_REGION is the main control interface.
> > > > Messages will be written to or read from this region.
> > > > 
> > > > The message type is determined by the `request` field in message
> > > > header. The message size is encoded in the message header too.
> > > > The message format looks like this:
> > > > 
> > > > struct vhost_vfio_op {
> > > > 	__u64 request;
> > > > 	__u32 flags;
> > > > 	/* Flag values: */
> > > >    #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > 	__u32 size;
> > > > 	union {
> > > > 		__u64 u64;
> > > > 		struct vhost_vring_state state;
> > > > 		struct vhost_vring_addr addr;
> > > > 	} payload;
> > > > };
> > > > 
> > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > requests in above structure.
> > > 
> > > Still a comments like V1. What's the advantage of inventing a new protocol?
> > I'm trying to make it work in VFIO's way..
> > 
> > > I believe either of the following should be better:
> > > 
> > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > extend it with e.g notify region. The advantages is that all exist userspace
> > > program could be reused without modification (or minimal modification). And
> > > vhost API hides lots of details that is not necessary to be understood by
> > > application (e.g in the case of container).
> > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > or introducing another mdev driver (i.e. vhost_mdev instead of
> > using the existing vfio_mdev) for mdev device?
> 
> 
> Can we simply add them into ioctl of mdev_parent_ops?

Right, either way, these ioctls just need to be added to the ioctl
of the mdev_parent_ops. But another thing we also need to consider
is which file descriptor userspace will do the ioctl() on. So I'm
wondering, do you mean letting userspace do the ioctl() on the VFIO
device fd of the mdev device?

> 
> 
> > 
[...]
> > > > 3. VFIO interrupt ioctl API
> > > > 
> > > > VFIO interrupt ioctl API is used to setup device interrupts.
> > > > IRQ-bypass can also be supported.
> > > > 
> > > > Currently, the data path interrupt can be configured via the
> > > > VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
> > > 
> > > How about DMA API? Do you expect to use VFIO IOMMU API or using vhost
> > > SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with
> > > SET_MEM_TABLE DMA can be done at the level of parent device which means it
> > > can work for e.g the card with on-chip IOMMU.
> > Agree. In this RFC, it assumes userspace will use VFIO IOMMU API
> > to do the DMA programming. But like what you said, there could be
> > a problem when using cards with on-chip IOMMU.
> 
> 
> Yes, another issue is SET_MEM_TABLE can not be used to update just a part of
> the table. This seems less flexible than VFIO API but it could be extended.

Agree.

> 
> 
> > 
> > > And what's the plan for vIOMMU?
> > As this RFC assumes userspace will use VFIO IOMMU API, userspace
> > just needs to follow the same way like what vfio-pci device does
> > in QEMU to support vIOMMU.
> 
> 
> Right, this is more a question for the qemu part. It means it needs to go
> for ordinary VFIO path to get all notifiers/listeners support from vIOMMU.

Yeah.

> 
> 
> > 
> > > 
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > ---
> > > >    drivers/vhost/Makefile     |   2 +
> > > >    drivers/vhost/vdpa.c       | 770 +++++++++++++++++++++++++++++++++++++
> > > >    include/linux/vdpa_mdev.h  |  72 ++++
> > > >    include/uapi/linux/vfio.h  |  19 +
> > > >    include/uapi/linux/vhost.h |  25 ++
> > > >    5 files changed, 888 insertions(+)
> > > >    create mode 100644 drivers/vhost/vdpa.c
> > > >    create mode 100644 include/linux/vdpa_mdev.h
> > > 
> > > We probably need some sample parent device implementation. It could be a
> > > software datapath like e.g we can start from virtio-net device in guest or a
> > > vhost/tap on host.
> > Yeah, something like this would be interesting!
> 
> 
> Plan to do something like that :) ?

I don't have a plan yet, but it's something that can be done, I think.

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > 
> > Thanks,
> > Tiwei
> > 
> > > Thanks
> > > 
> > >
Alex Williamson July 3, 2019, 6:31 p.m. UTC | #5
On Wed,  3 Jul 2019 17:13:39 +0800
Tiwei Bie <tiwei.bie@intel.com> wrote:
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8f10748dac79..6c5718ab7eeb 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -201,6 +201,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)	/* vfio-amba device */
>  #define VFIO_DEVICE_FLAGS_CCW	(1 << 4)	/* vfio-ccw device */
>  #define VFIO_DEVICE_FLAGS_AP	(1 << 5)	/* vfio-ap device */
> +#define VFIO_DEVICE_FLAGS_VHOST	(1 << 6)	/* vfio-vhost device */
>  	__u32	num_regions;	/* Max region index + 1 */
>  	__u32	num_irqs;	/* Max IRQ index + 1 */
>  };
> @@ -217,6 +218,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_API_AMBA_STRING		"vfio-amba"
>  #define VFIO_DEVICE_API_CCW_STRING		"vfio-ccw"
>  #define VFIO_DEVICE_API_AP_STRING		"vfio-ap"
> +#define VFIO_DEVICE_API_VHOST_STRING		"vfio-vhost"
>  
>  /**
>   * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
> @@ -573,6 +575,23 @@ enum {
>  	VFIO_CCW_NUM_IRQS
>  };
>  
> +/*
> + * The vfio-vhost bus driver makes use of the following fixed region and
> + * IRQ index mapping. Unimplemented regions return a size of zero.
> + * Unimplemented IRQ types return a count of zero.
> + */
> +
> +enum {
> +	VFIO_VHOST_CONFIG_REGION_INDEX,
> +	VFIO_VHOST_NOTIFY_REGION_INDEX,
> +	VFIO_VHOST_NUM_REGIONS
> +};
> +
> +enum {
> +	VFIO_VHOST_VQ_IRQ_INDEX,
> +	VFIO_VHOST_NUM_IRQS
> +};
> +

Note that the vfio API has evolved a bit since vfio-pci started this
way, with fixed indexes for pre-defined region types.  We now support
device specific regions which can be identified by a capability within
the REGION_INFO ioctl return data.  This allows a bit more flexibility,
at the cost of complexity, but the infrastructure already exists in
kernel and QEMU to make it relatively easy.  I think we'll have the
same support for interrupts soon too.  If you continue to pursue the
vfio-vhost direction you might want to consider these before committing
to fixed indexes.  Thanks,

Alex
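
(To illustrate the capability lookup described above: a minimal
userspace sketch could walk the REGION_INFO capability chain like this;
the buffer size and the type/subtype values are hypothetical, and error
handling is elided.)

int find_region_by_cap_type(int device_fd, unsigned int index,
			    __u32 type, __u32 subtype)
{
	char buf[4096] = { 0 };	/* room for the capability chain */
	struct vfio_region_info *info = (struct vfio_region_info *)buf;
	__u32 off;

	info->argsz = sizeof(buf);
	info->index = index;

	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info) < 0)
		return 0;

	if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS) || !info->cap_offset)
		return 0;

	/* Walk the capability chain; offsets are relative to `info` */
	for (off = info->cap_offset; off; ) {
		struct vfio_info_cap_header *hdr =
			(struct vfio_info_cap_header *)(buf + off);

		if (hdr->id == VFIO_REGION_INFO_CAP_TYPE) {
			struct vfio_region_info_cap_type *cap =
				(struct vfio_region_info_cap_type *)hdr;

			return cap->type == type && cap->subtype == subtype;
		}
		off = hdr->next;
	}

	return 0;
}
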
Tiwei Bie July 4, 2019, 1:36 a.m. UTC | #6
On Wed, Jul 03, 2019 at 12:31:57PM -0600, Alex Williamson wrote:
> On Wed,  3 Jul 2019 17:13:39 +0800
> Tiwei Bie <tiwei.bie@intel.com> wrote:
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 8f10748dac79..6c5718ab7eeb 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -201,6 +201,7 @@ struct vfio_device_info {
> >  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)	/* vfio-amba device */
> >  #define VFIO_DEVICE_FLAGS_CCW	(1 << 4)	/* vfio-ccw device */
> >  #define VFIO_DEVICE_FLAGS_AP	(1 << 5)	/* vfio-ap device */
> > +#define VFIO_DEVICE_FLAGS_VHOST	(1 << 6)	/* vfio-vhost device */
> >  	__u32	num_regions;	/* Max region index + 1 */
> >  	__u32	num_irqs;	/* Max IRQ index + 1 */
> >  };
> > @@ -217,6 +218,7 @@ struct vfio_device_info {
> >  #define VFIO_DEVICE_API_AMBA_STRING		"vfio-amba"
> >  #define VFIO_DEVICE_API_CCW_STRING		"vfio-ccw"
> >  #define VFIO_DEVICE_API_AP_STRING		"vfio-ap"
> > +#define VFIO_DEVICE_API_VHOST_STRING		"vfio-vhost"
> >  
> >  /**
> >   * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
> > @@ -573,6 +575,23 @@ enum {
> >  	VFIO_CCW_NUM_IRQS
> >  };
> >  
> > +/*
> > + * The vfio-vhost bus driver makes use of the following fixed region and
> > + * IRQ index mapping. Unimplemented regions return a size of zero.
> > + * Unimplemented IRQ types return a count of zero.
> > + */
> > +
> > +enum {
> > +	VFIO_VHOST_CONFIG_REGION_INDEX,
> > +	VFIO_VHOST_NOTIFY_REGION_INDEX,
> > +	VFIO_VHOST_NUM_REGIONS
> > +};
> > +
> > +enum {
> > +	VFIO_VHOST_VQ_IRQ_INDEX,
> > +	VFIO_VHOST_NUM_IRQS
> > +};
> > +
> 
> Note that the vfio API has evolved a bit since vfio-pci started this
> way, with fixed indexes for pre-defined region types.  We now support
> device specific regions which can be identified by a capability within
> the REGION_INFO ioctl return data.  This allows a bit more flexibility,
> at the cost of complexity, but the infrastructure already exists in
> kernel and QEMU to make it relatively easy.  I think we'll have the
> same support for interrupts soon too.  If you continue to pursue the
> vfio-vhost direction you might want to consider these before committing
> to fixed indexes.  Thanks,

Thanks for the details! Will give it a try!

Thanks,
Tiwei
Jason Wang July 4, 2019, 4:31 a.m. UTC | #7
On 2019/7/3 下午9:08, Tiwei Bie wrote:
> On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
>> On 2019/7/3 下午7:52, Tiwei Bie wrote:
>>> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>>>> On 2019/7/3 下午5:13, Tiwei Bie wrote:
>>>>> Details about this can be found here:
>>>>>
>>>>> https://lwn.net/Articles/750770/
>>>>>
>>>>> What's new in this version
>>>>> ==========================
>>>>>
>>>>> A new VFIO device type is introduced - vfio-vhost. This addressed
>>>>> some comments from here:https://patchwork.ozlabs.org/cover/984763/
>>>>>
>>>>> Below is the updated device interface:
>>>>>
>>>>> Currently, there are two regions of this device: 1) CONFIG_REGION
>>>>> (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
>>>>> device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
>>>>> can be used to notify the device.
>>>>>
>>>>> 1. CONFIG_REGION
>>>>>
>>>>> The region described by CONFIG_REGION is the main control interface.
>>>>> Messages will be written to or read from this region.
>>>>>
>>>>> The message type is determined by the `request` field in message
>>>>> header. The message size is encoded in the message header too.
>>>>> The message format looks like this:
>>>>>
>>>>> struct vhost_vfio_op {
>>>>> 	__u64 request;
>>>>> 	__u32 flags;
>>>>> 	/* Flag values: */
>>>>>     #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
>>>>> 	__u32 size;
>>>>> 	union {
>>>>> 		__u64 u64;
>>>>> 		struct vhost_vring_state state;
>>>>> 		struct vhost_vring_addr addr;
>>>>> 	} payload;
>>>>> };
>>>>>
>>>>> The existing vhost-kernel ioctl cmds are reused as the message
>>>>> requests in above structure.
>>>> Still a comments like V1. What's the advantage of inventing a new protocol?
>>> I'm trying to make it work in VFIO's way..
>>>
>>>> I believe either of the following should be better:
>>>>
>>>> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
>>>> extend it with e.g notify region. The advantages is that all exist userspace
>>>> program could be reused without modification (or minimal modification). And
>>>> vhost API hides lots of details that is not necessary to be understood by
>>>> application (e.g in the case of container).
>>> Do you mean reusing vhost's ioctl on VFIO device fd directly,
>>> or introducing another mdev driver (i.e. vhost_mdev instead of
>>> using the existing vfio_mdev) for mdev device?
>> Can we simply add them into ioctl of mdev_parent_ops?
> Right, either way, these ioctls have to be and just need to be
> added in the ioctl of the mdev_parent_ops. But another thing we
> also need to consider is that which file descriptor the userspace
> will do the ioctl() on. So I'm wondering do you mean let the
> userspace do the ioctl() on the VFIO device fd of the mdev
> device?
>

Yes. Is there any other way btw?

Thanks
Tiwei Bie July 4, 2019, 6:21 a.m. UTC | #8
On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > Details about this can be found here:
> > > > > > 
> > > > > > https://lwn.net/Articles/750770/
> > > > > > 
> > > > > > What's new in this version
> > > > > > ==========================
> > > > > > 
> > > > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > > > some comments from here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > 
> > > > > > Below is the updated device interface:
> > > > > > 
> > > > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > can be used to notify the device.
> > > > > > 
> > > > > > 1. CONFIG_REGION
> > > > > > 
> > > > > > The region described by CONFIG_REGION is the main control interface.
> > > > > > Messages will be written to or read from this region.
> > > > > > 
> > > > > > The message type is determined by the `request` field in message
> > > > > > header. The message size is encoded in the message header too.
> > > > > > The message format looks like this:
> > > > > > 
> > > > > > struct vhost_vfio_op {
> > > > > > 	__u64 request;
> > > > > > 	__u32 flags;
> > > > > > 	/* Flag values: */
> > > > > >     #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > 	__u32 size;
> > > > > > 	union {
> > > > > > 		__u64 u64;
> > > > > > 		struct vhost_vring_state state;
> > > > > > 		struct vhost_vring_addr addr;
> > > > > > 	} payload;
> > > > > > };
> > > > > > 
> > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > requests in above structure.
> > > > > Still a comments like V1. What's the advantage of inventing a new protocol?
> > > > I'm trying to make it work in VFIO's way..
> > > > 
> > > > > I believe either of the following should be better:
> > > > > 
> > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > extend it with e.g notify region. The advantages is that all exist userspace
> > > > > program could be reused without modification (or minimal modification). And
> > > > > vhost API hides lots of details that is not necessary to be understood by
> > > > > application (e.g in the case of container).
> > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > using the existing vfio_mdev) for mdev device?
> > > Can we simply add them into ioctl of mdev_parent_ops?
> > Right, either way, these ioctls have to be and just need to be
> > added in the ioctl of the mdev_parent_ops. But another thing we
> > also need to consider is that which file descriptor the userspace
> > will do the ioctl() on. So I'm wondering do you mean let the
> > userspace do the ioctl() on the VFIO device fd of the mdev
> > device?
> > 
> 
> Yes.

Got it! I'm not sure what Alex's opinion is on this. If we all
agree on this, I can do it this way.

> Is there any other way btw?

Just a quick thought.. Maybe a totally bad idea. I was thinking
it might be odd to do non-VFIO ioctls on a VFIO device fd. So I
was wondering whether it's possible to allow binding another mdev
driver (e.g. vhost_mdev) to the supported mdev devices. The new
mdev driver, vhost_mdev, can provide a similar way to let userspace
open the mdev device and do the vhost ioctls on it. To distinguish
them from the vfio_mdev compatible mdev devices, the device API of
the new vhost_mdev compatible mdev devices might be e.g. "vhost-net"
for net?

So in the VFIO case, the device will be used for passthrough
directly. And in the VHOST case, the device can be used to
accelerate the existing virtualized devices.
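
Just to make this a bit more concrete: on the parent driver side this
could be as simple as reporting a new string from the standard mdev
"device_api" type attribute, modelled on the existing mdev samples.
The "vhost-net" string below is only a strawman:

static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
			       char *buf)
{
	/* Strawman: advertise a vhost_mdev compatible device API,
	 * so vhost_mdev (not vfio_mdev) is expected to drive it. */
	return sprintf(buf, "%s\n", "vhost-net");
}
static MDEV_TYPE_ATTR_RO(device_api);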

What do you think?

Thanks,
Tiwei
> 
> Thanks
>
Jason Wang July 4, 2019, 6:35 a.m. UTC | #9
On 2019/7/4 下午2:21, Tiwei Bie wrote:
> On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
>> On 2019/7/3 下午9:08, Tiwei Bie wrote:
>>> On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
>>>> On 2019/7/3 下午7:52, Tiwei Bie wrote:
>>>>> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>>>>>> On 2019/7/3 下午5:13, Tiwei Bie wrote:
>>>>>>> [cover letter quote snipped]
>>>>>> Still a comments like V1. What's the advantage of inventing a new protocol?
>>>>> I'm trying to make it work in VFIO's way..
>>>>>
>>>>>> I believe either of the following should be better:
>>>>>>
>>>>>> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
>>>>>> extend it with e.g notify region. The advantages is that all exist userspace
>>>>>> program could be reused without modification (or minimal modification). And
>>>>>> vhost API hides lots of details that is not necessary to be understood by
>>>>>> application (e.g in the case of container).
>>>>> Do you mean reusing vhost's ioctl on VFIO device fd directly,
>>>>> or introducing another mdev driver (i.e. vhost_mdev instead of
>>>>> using the existing vfio_mdev) for mdev device?
>>>> Can we simply add them into ioctl of mdev_parent_ops?
>>> Right, either way, these ioctls have to be and just need to be
>>> added in the ioctl of the mdev_parent_ops. But another thing we
>>> also need to consider is that which file descriptor the userspace
>>> will do the ioctl() on. So I'm wondering do you mean let the
>>> userspace do the ioctl() on the VFIO device fd of the mdev
>>> device?
>>>
>> Yes.
> Got it! I'm not sure what's Alex opinion on this. If we all
> agree with this, I can do it in this way.
>
>> Is there any other way btw?
> Just a quick thought.. Maybe totally a bad idea.


It's not for sure :)


>   I was thinking
> whether it would be odd to do non-VFIO's ioctls on VFIO's device
> fd. So I was wondering whether it's possible to allow binding
> another mdev driver (e.g. vhost_mdev) to the supported mdev
> devices. The new mdev driver, vhost_mdev, can provide similar
> ways to let userspace open the mdev device and do the vhost ioctls
> on it. To distinguish with the vfio_mdev compatible mdev devices,
> the device API of the new vhost_mdev compatible mdev devices
> might be e.g. "vhost-net" for net?
>
> So in VFIO case, the device will be for passthru directly. And
> in VHOST case, the device can be used to accelerate the existing
> virtualized devices.
>
> How do you think?


If my understanding is correct, there will be no VFIO ioctl if we go for 
vhost_mdev?

Thanks


>
> Thanks,
> Tiwei
>> Thanks
>>
Tiwei Bie July 4, 2019, 7:02 a.m. UTC | #10
On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:
> On 2019/7/4 下午2:21, Tiwei Bie wrote:
> > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > [cover letter quote snipped]
> > > > > > > Still a comments like V1. What's the advantage of inventing a new protocol?
> > > > > > I'm trying to make it work in VFIO's way..
> > > > > > 
> > > > > > > I believe either of the following should be better:
> > > > > > > 
> > > > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > extend it with e.g notify region. The advantages is that all exist userspace
> > > > > > > program could be reused without modification (or minimal modification). And
> > > > > > > vhost API hides lots of details that is not necessary to be understood by
> > > > > > > application (e.g in the case of container).
> > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > using the existing vfio_mdev) for mdev device?
> > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > > Right, either way, these ioctls have to be and just need to be
> > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > also need to consider is that which file descriptor the userspace
> > > > will do the ioctl() on. So I'm wondering do you mean let the
> > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > device?
> > > > 
> > > Yes.
> > Got it! I'm not sure what's Alex opinion on this. If we all
> > agree with this, I can do it in this way.
> > 
> > > Is there any other way btw?
> > Just a quick thought.. Maybe totally a bad idea.
> 
> 
> It's not for sure :)

Thanks!

> 
> 
> >   I was thinking
> > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > fd. So I was wondering whether it's possible to allow binding
> > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > devices. The new mdev driver, vhost_mdev, can provide similar
> > ways to let userspace open the mdev device and do the vhost ioctls
> > on it. To distinguish with the vfio_mdev compatible mdev devices,
> > the device API of the new vhost_mdev compatible mdev devices
> > might be e.g. "vhost-net" for net?
> > 
> > So in VFIO case, the device will be for passthru directly. And
> > in VHOST case, the device can be used to accelerate the existing
> > virtualized devices.
> > 
> > How do you think?
> 
> 
> If my understanding is correct, there will be no VFIO ioctl if we go for
> vhost_mdev?

Yeah, exactly. If we go for vhost_mdev, we may have some vhost nodes
in /dev, similar to what /dev/vfio/* does, to handle the $UUID and open
the device (e.g. similar to VFIO_GROUP_GET_DEVICE_FD in VFIO). And
to set up the device, we can try to reuse the ioctls of the existing
kernel vhost as much as possible.
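
To illustrate, the userspace flow could look roughly like the sketch
below. The /dev/vhost-mdev/$UUID path and the helper name are made up
for illustration; only the ioctls are the existing ones from
<linux/vhost.h>:

#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Hypothetical per-device node created by vhost_mdev for each $UUID. */
int vhost_mdev_open(const char *uuid, uint64_t *features)
{
	char path[128];
	int fd;

	snprintf(path, sizeof(path), "/dev/vhost-mdev/%s", uuid);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	/* From here on, reuse the existing kernel vhost ioctls. */
	if (ioctl(fd, VHOST_SET_OWNER, NULL) ||
	    ioctl(fd, VHOST_GET_FEATURES, features)) {
		close(fd);
		return -1;
	}

	return fd;
}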

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > 
> > Thanks,
> > Tiwei
> > > Thanks
> > >
Jason Wang July 5, 2019, 12:30 a.m. UTC | #11
On 2019/7/4 下午3:02, Tiwei Bie wrote:
> On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:
>> On 2019/7/4 下午2:21, Tiwei Bie wrote:
>>> On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
>>>> On 2019/7/3 下午9:08, Tiwei Bie wrote:
>>>>> On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
>>>>>> On 2019/7/3 下午7:52, Tiwei Bie wrote:
>>>>>>> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>>>>>>>> On 2019/7/3 下午5:13, Tiwei Bie wrote:
>>>>>>>>> [cover letter quote snipped]
>>>>>>>> Still a comments like V1. What's the advantage of inventing a new protocol?
>>>>>>> I'm trying to make it work in VFIO's way..
>>>>>>>
>>>>>>>> I believe either of the following should be better:
>>>>>>>>
>>>>>>>> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
>>>>>>>> extend it with e.g notify region. The advantages is that all exist userspace
>>>>>>>> program could be reused without modification (or minimal modification). And
>>>>>>>> vhost API hides lots of details that is not necessary to be understood by
>>>>>>>> application (e.g in the case of container).
>>>>>>> Do you mean reusing vhost's ioctl on VFIO device fd directly,
>>>>>>> or introducing another mdev driver (i.e. vhost_mdev instead of
>>>>>>> using the existing vfio_mdev) for mdev device?
>>>>>> Can we simply add them into ioctl of mdev_parent_ops?
>>>>> Right, either way, these ioctls have to be and just need to be
>>>>> added in the ioctl of the mdev_parent_ops. But another thing we
>>>>> also need to consider is that which file descriptor the userspace
>>>>> will do the ioctl() on. So I'm wondering do you mean let the
>>>>> userspace do the ioctl() on the VFIO device fd of the mdev
>>>>> device?
>>>>>
>>>> Yes.
>>> Got it! I'm not sure what's Alex opinion on this. If we all
>>> agree with this, I can do it in this way.
>>>
>>>> Is there any other way btw?
>>> Just a quick thought.. Maybe totally a bad idea.
>>
>> It's not for sure :)
> Thanks!
>
>>
>>>    I was thinking
>>> whether it would be odd to do non-VFIO's ioctls on VFIO's device
>>> fd. So I was wondering whether it's possible to allow binding
>>> another mdev driver (e.g. vhost_mdev) to the supported mdev
>>> devices. The new mdev driver, vhost_mdev, can provide similar
>>> ways to let userspace open the mdev device and do the vhost ioctls
>>> on it. To distinguish with the vfio_mdev compatible mdev devices,
>>> the device API of the new vhost_mdev compatible mdev devices
>>> might be e.g. "vhost-net" for net?
>>>
>>> So in VFIO case, the device will be for passthru directly. And
>>> in VHOST case, the device can be used to accelerate the existing
>>> virtualized devices.
>>>
>>> How do you think?
>>
>> If my understanding is correct, there will be no VFIO ioctl if we go for
>> vhost_mdev?
> Yeah, exactly. If we go for vhost_mdev, we may have some vhost nodes
> in /dev similar to what /dev/vfio/* does to handle the $UUID and open
> the device (e.g. similar to VFIO_GROUP_GET_DEVICE_FD in VFIO). And
> to setup the device, we can try to reuse the ioctls of the existing
> kernel vhost as much as possible.


Interesting, actually, I've considered something similar. I think there 
should be no issues other than DMA:

- Need to invent a new API for DMA mapping other than SET_MEM_TABLE?
(Which is too heavyweight; see the sketch below.)

- Need to consider a way to work with both the on-chip IOMMU (your
proposal should be fine) and Scalable IOV.
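
To show what I mean by heavyweight: with the current uapi, any change
to the guest memory layout means rebuilding and resubmitting the whole
table. This is the existing interface from <linux/vhost.h>, shown only
for comparison:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* The whole table is replaced on every update; there is no way to
 * add or remove a single region incrementally. */
int vhost_update_mem_table(int vhost_fd, struct vhost_memory_region *regions,
			   uint32_t nregions)
{
	struct vhost_memory *mem;
	int ret;

	mem = calloc(1, sizeof(*mem) + nregions * sizeof(*regions));
	if (!mem)
		return -1;

	mem->nregions = nregions;
	memcpy(mem->regions, regions, nregions * sizeof(*regions));

	ret = ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);
	free(mem);
	return ret;
}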

Thanks


>
> Thanks,
> Tiwei
>
>> Thanks
>>
>>
>>> Thanks,
>>> Tiwei
>>>> Thanks
>>>>
Tiwei Bie July 5, 2019, 2:23 a.m. UTC | #12
On Fri, Jul 05, 2019 at 08:30:00AM +0800, Jason Wang wrote:
> On 2019/7/4 下午3:02, Tiwei Bie wrote:
> > On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:
> > > On 2019/7/4 下午2:21, Tiwei Bie wrote:
> > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > > > [cover letter quote snipped]
> > > > > > > > > Still a comments like V1. What's the advantage of inventing a new protocol?
> > > > > > > > I'm trying to make it work in VFIO's way..
> > > > > > > > 
> > > > > > > > > I believe either of the following should be better:
> > > > > > > > > 
> > > > > > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > > > extend it with e.g notify region. The advantages is that all exist userspace
> > > > > > > > > program could be reused without modification (or minimal modification). And
> > > > > > > > > vhost API hides lots of details that is not necessary to be understood by
> > > > > > > > > application (e.g in the case of container).
> > > > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > > > using the existing vfio_mdev) for mdev device?
> > > > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > > > > Right, either way, these ioctls have to be and just need to be
> > > > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > > > also need to consider is that which file descriptor the userspace
> > > > > > will do the ioctl() on. So I'm wondering do you mean let the
> > > > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > > > device?
> > > > > > 
> > > > > Yes.
> > > > Got it! I'm not sure what's Alex opinion on this. If we all
> > > > agree with this, I can do it in this way.
> > > > 
> > > > > Is there any other way btw?
> > > > Just a quick thought.. Maybe totally a bad idea.
> > > 
> > > It's not for sure :)
> > Thanks!
> > 
> > > 
> > > >    I was thinking
> > > > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > > > fd. So I was wondering whether it's possible to allow binding
> > > > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > > > devices. The new mdev driver, vhost_mdev, can provide similar
> > > > ways to let userspace open the mdev device and do the vhost ioctls
> > > > on it. To distinguish with the vfio_mdev compatible mdev devices,
> > > > the device API of the new vhost_mdev compatible mdev devices
> > > > might be e.g. "vhost-net" for net?
> > > > 
> > > > So in VFIO case, the device will be for passthru directly. And
> > > > in VHOST case, the device can be used to accelerate the existing
> > > > virtualized devices.
> > > > 
> > > > How do you think?
> > > 
> > > If my understanding is correct, there will be no VFIO ioctl if we go for
> > > vhost_mdev?
> > Yeah, exactly. If we go for vhost_mdev, we may have some vhost nodes
> > in /dev similar to what /dev/vfio/* does to handle the $UUID and open
> > the device (e.g. similar to VFIO_GROUP_GET_DEVICE_FD in VFIO). And
> > to setup the device, we can try to reuse the ioctls of the existing
> > kernel vhost as much as possible.
> 
> 
> Interesting, actually, I've considered something similar. I think there
> should be no issues other than DMA:

Yeah, that's something we need to optimize to make it more
lightweight and efficient. How about allowing userspace to
do map/unmap operations like what VFIO provides?
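
For reference, the kind of incremental map/unmap I have in mind is what
VFIO's type1 backend already exposes. These are the existing VFIO ioctls,
shown only as an example of the granularity; a vhost flavour of this
would be something new:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Map one range at a time, instead of replacing a whole table. */
int dma_map_one(int container_fd, void *vaddr, uint64_t iova, uint64_t size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)vaddr,
		.iova  = iova,
		.size  = size,
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}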

> 
> - Need to invent new API for DMA mapping other than SET_MEM_TABLE? (Which is
> too heavyweight).
> 
> - Need to consider a way to co-work with both on chip IOMMU (your proposal
> should be fine) and scalable IOV.

Maybe we can make it possible to let the parent device know about
the mappings (mapping events) if it needs them (it would be helpful
for software-based devices as well).
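
Something like the callbacks below is what I have in mind for the
parent side. This is purely hypothetical, nothing like it exists
today:

/* Hypothetical callbacks a parent driver could register with
 * vhost_mdev to be told about every mapping that userspace
 * sets up or tears down. */
struct vhost_mdev_dma_ops {
	int (*dma_map)(struct mdev_device *mdev, u64 iova, u64 size,
		       u64 vaddr, u32 perm);
	int (*dma_unmap)(struct mdev_device *mdev, u64 iova, u64 size);
};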

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > 
> > Thanks,
> > Tiwei
> > 
> > > Thanks
> > > 
> > > 
> > > > Thanks,
> > > > Tiwei
> > > > > Thanks
> > > > >
Alex Williamson July 5, 2019, 2:49 p.m. UTC | #13
On Thu, 4 Jul 2019 14:21:34 +0800
Tiwei Bie <tiwei.bie@intel.com> wrote:

> On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > On 2019/7/3 下午9:08, Tiwei Bie wrote:  
> > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:  
> > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:  
> > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:  
> > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:  
> > > > > > > [cover letter quote snipped]
> > > > > > Still a comments like V1. What's the advantage of inventing a new protocol?  
> > > > > I'm trying to make it work in VFIO's way..
> > > > >   
> > > > > > I believe either of the following should be better:
> > > > > > 
> > > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > > extend it with e.g notify region. The advantages is that all exist userspace
> > > > > > program could be reused without modification (or minimal modification). And
> > > > > > vhost API hides lots of details that is not necessary to be understood by
> > > > > > application (e.g in the case of container).  
> > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > using the existing vfio_mdev) for mdev device?  
> > > > Can we simply add them into ioctl of mdev_parent_ops?  
> > > Right, either way, these ioctls have to be and just need to be
> > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > also need to consider is that which file descriptor the userspace
> > > will do the ioctl() on. So I'm wondering do you mean let the
> > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > device?
> > >   
> > 
> > Yes.  
> 
> Got it! I'm not sure what's Alex opinion on this. If we all
> agree with this, I can do it in this way.
> 
> > Is there any other way btw?  
> 
> Just a quick thought.. Maybe totally a bad idea. I was thinking
> whether it would be odd to do non-VFIO's ioctls on VFIO's device
> fd. So I was wondering whether it's possible to allow binding
> another mdev driver (e.g. vhost_mdev) to the supported mdev
> devices. The new mdev driver, vhost_mdev, can provide similar
> ways to let userspace open the mdev device and do the vhost ioctls
> on it. To distinguish with the vfio_mdev compatible mdev devices,
> the device API of the new vhost_mdev compatible mdev devices
> might be e.g. "vhost-net" for net?
> 
> So in VFIO case, the device will be for passthru directly. And
> in VHOST case, the device can be used to accelerate the existing
> virtualized devices.
> 
> How do you think?

VFIO really can't prevent vendor specific ioctls on the device file
descriptor for mdevs, but a) we'd want to be sure the ioctl address
space can't collide with ioctls we'd use for vfio defined purposes and
b) maybe the VFIO user API isn't what you want in the first place if
you intend to mostly/entirely ignore the defined ioctl set and replace
them with your own.  In the case of the latter, you're also not getting
the advantages of the existing VFIO userspace code, so why expose a
VFIO device at all.

The mdev interface does provide a general interface for creating and
managing virtual devices, vfio-mdev is just one driver on the mdev
bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
out vfio-isms from the interface, aiui, with the intent of implementing
another mdev bus driver for using the devices within the kernel.  It
seems like this vhost-mdev driver might be similar, using mdev but not
necessarily vfio-mdev to expose devices.  Thanks,

Alex
Tiwei Bie July 8, 2019, 6:16 a.m. UTC | #14
On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
> On Thu, 4 Jul 2019 14:21:34 +0800
> Tiwei Bie <tiwei.bie@intel.com> wrote:
> > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午9:08, Tiwei Bie wrote:  
> > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:  
> > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:  
> > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:  
> > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:  
> > > > > > > > [cover letter quote snipped]
> > > > > > > Still a comments like V1. What's the advantage of inventing a new protocol?  
> > > > > > I'm trying to make it work in VFIO's way..
> > > > > >   
> > > > > > > I believe either of the following should be better:
> > > > > > > 
> > > > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > extend it with e.g notify region. The advantages is that all exist userspace
> > > > > > > program could be reused without modification (or minimal modification). And
> > > > > > > vhost API hides lots of details that is not necessary to be understood by
> > > > > > > application (e.g in the case of container).  
> > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > using the existing vfio_mdev) for mdev device?  
> > > > > Can we simply add them into ioctl of mdev_parent_ops?  
> > > > Right, either way, these ioctls have to be and just need to be
> > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > also need to consider is that which file descriptor the userspace
> > > > will do the ioctl() on. So I'm wondering do you mean let the
> > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > device?
> > > >   
> > > 
> > > Yes.  
> > 
> > Got it! I'm not sure what's Alex opinion on this. If we all
> > agree with this, I can do it in this way.
> > 
> > > Is there any other way btw?  
> > 
> > Just a quick thought.. Maybe totally a bad idea. I was thinking
> > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > fd. So I was wondering whether it's possible to allow binding
> > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > devices. The new mdev driver, vhost_mdev, can provide similar
> > ways to let userspace open the mdev device and do the vhost ioctls
> > on it. To distinguish with the vfio_mdev compatible mdev devices,
> > the device API of the new vhost_mdev compatible mdev devices
> > might be e.g. "vhost-net" for net?
> > 
> > So in VFIO case, the device will be for passthru directly. And
> > in VHOST case, the device can be used to accelerate the existing
> > virtualized devices.
> > 
> > How do you think?
> 
> VFIO really can't prevent vendor specific ioctls on the device file
> descriptor for mdevs, but a) we'd want to be sure the ioctl address
> space can't collide with ioctls we'd use for vfio defined purposes and
> b) maybe the VFIO user API isn't what you want in the first place if
> you intend to mostly/entirely ignore the defined ioctl set and replace
> them with your own.  In the case of the latter, you're also not getting
> the advantages of the existing VFIO userspace code, so why expose a
> VFIO device at all.

Yeah, I totally agree.

> 
> The mdev interface does provide a general interface for creating and
> managing virtual devices, vfio-mdev is just one driver on the mdev
> bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
> out vfio-isms from the interface, aiui, with the intent of implementing
> another mdev bus driver for using the devices within the kernel.

Great to know this! I found the below series after some searching:

https://lkml.org/lkml/2019/3/8/821

In the above series, the new mlx5_core mdev driver does the probe
by calling mlx5_get_core_dev() first on the parent device of the
mdev device. In vhost_mdev, maybe we can also keep track of all
the compatible mdev devices and use this info to do the probe.
But we also need a way to allow the vfio_mdev driver to distinguish
and reject the incompatible mdev devices.
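
One simple way to distinguish could be to match on the device API
string of the mdev type before binding. The helper below is completely
hypothetical and only shows the intent:

/* Hypothetical check in each mdev bus driver's probe(): vfio_mdev
 * would only bind "vfio-*" devices, vhost_mdev only "vhost-*" ones. */
static bool mdev_api_compatible(const char *device_api, const char *prefix)
{
	return strncmp(device_api, prefix, strlen(prefix)) == 0;
}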

> It
> seems like this vhost-mdev driver might be similar, using mdev but not
> necessarily vfio-mdev to expose devices.  Thanks,

Yeah, I also think so!

Thanks!
Tiwei

> 
> Alex
Jason Wang July 9, 2019, 2:50 a.m. UTC | #15
On 2019/7/8 下午2:16, Tiwei Bie wrote:
> On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
>> On Thu, 4 Jul 2019 14:21:34 +0800
>> Tiwei Bie <tiwei.bie@intel.com> wrote:
>>> On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
>>>> On 2019/7/3 下午9:08, Tiwei Bie wrote:
>>>>> On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
>>>>>> On 2019/7/3 下午7:52, Tiwei Bie wrote:
>>>>>>> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>>>>>>>> On 2019/7/3 下午5:13, Tiwei Bie wrote:
>>>>>>>>> [cover letter quote snipped]
>>>>>>>> Still a comments like V1. What's the advantage of inventing a new protocol?
>>>>>>> I'm trying to make it work in VFIO's way..
>>>>>>>    
>>>>>>>> I believe either of the following should be better:
>>>>>>>>
>>>>>>>> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
>>>>>>>> extend it with e.g notify region. The advantages is that all exist userspace
>>>>>>>> program could be reused without modification (or minimal modification). And
>>>>>>>> vhost API hides lots of details that is not necessary to be understood by
>>>>>>>> application (e.g in the case of container).
>>>>>>> Do you mean reusing vhost's ioctl on VFIO device fd directly,
>>>>>>> or introducing another mdev driver (i.e. vhost_mdev instead of
>>>>>>> using the existing vfio_mdev) for mdev device?
>>>>>> Can we simply add them into ioctl of mdev_parent_ops?
>>>>> Right, either way, these ioctls have to be and just need to be
>>>>> added in the ioctl of the mdev_parent_ops. But another thing we
>>>>> also need to consider is that which file descriptor the userspace
>>>>> will do the ioctl() on. So I'm wondering do you mean let the
>>>>> userspace do the ioctl() on the VFIO device fd of the mdev
>>>>> device?
>>>>>    
>>>> Yes.
>>> Got it! I'm not sure what's Alex opinion on this. If we all
>>> agree with this, I can do it in this way.
>>>
>>>> Is there any other way btw?
>>> Just a quick thought.. Maybe totally a bad idea. I was thinking
>>> whether it would be odd to do non-VFIO's ioctls on VFIO's device
>>> fd. So I was wondering whether it's possible to allow binding
>>> another mdev driver (e.g. vhost_mdev) to the supported mdev
>>> devices. The new mdev driver, vhost_mdev, can provide similar
>>> ways to let userspace open the mdev device and do the vhost ioctls
>>> on it. To distinguish with the vfio_mdev compatible mdev devices,
>>> the device API of the new vhost_mdev compatible mdev devices
>>> might be e.g. "vhost-net" for net?
>>>
>>> So in VFIO case, the device will be for passthru directly. And
>>> in VHOST case, the device can be used to accelerate the existing
>>> virtualized devices.
>>>
>>> How do you think?
>> VFIO really can't prevent vendor specific ioctls on the device file
>> descriptor for mdevs, but a) we'd want to be sure the ioctl address
>> space can't collide with ioctls we'd use for vfio defined purposes and
>> b) maybe the VFIO user API isn't what you want in the first place if
>> you intend to mostly/entirely ignore the defined ioctl set and replace
>> them with your own.  In the case of the latter, you're also not getting
>> the advantages of the existing VFIO userspace code, so why expose a
>> VFIO device at all.
> Yeah, I totally agree.


I guess the original idea is to reuse the VFIO DMA/IOMMU API for this.
Then we have the chance to reuse the vfio code in QEMU for dealing with
e.g. vIOMMU.


>
>> The mdev interface does provide a general interface for creating and
>> managing virtual devices, vfio-mdev is just one driver on the mdev
>> bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
>> out vfio-isms from the interface, aiui, with the intent of implementing
>> another mdev bus driver for using the devices within the kernel.
> Great to know this! I found below series after some searching:
>
> https://lkml.org/lkml/2019/3/8/821
>
> In above series, the new mlx5_core mdev driver will do the probe
> by calling mlx5_get_core_dev() first on the parent device of the
> mdev device. In vhost_mdev, maybe we can also keep track of all
> the compatible mdev devices and use this info to do the probe.


I don't get why this is needed. My understanding is that if we want to
go this way, there are actually two parts: 1) vhost mdev, which implements
the device management and vhost ioctls; 2) vhost itself, which can accept
an mdev fd as its backend through VHOST_NET_SET_BACKEND.
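
For 2), the userspace side would just be the existing backend ioctl
with an mdev fd plugged in. The ioctl and the structure below already
exist; accepting an mdev fd there is exactly the new part:

#include <sys/ioctl.h>
#include <linux/vhost.h>

int vhost_net_set_mdev_backend(int vhost_net_fd, unsigned int vq_index,
			       int mdev_fd)
{
	struct vhost_vring_file backend = {
		.index = vq_index,
		.fd = mdev_fd,	/* today this is a tap or socket fd */
	};

	return ioctl(vhost_net_fd, VHOST_NET_SET_BACKEND, &backend);
}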


> But we also need a way to allow vfio_mdev driver to distinguish
> and reject the incompatible mdev devices.


One issue for this series is that it doesn't consider DMA isolation at all.


>
>> It
>> seems like this vhost-mdev driver might be similar, using mdev but not
>> necessarily vfio-mdev to expose devices.  Thanks,
> Yeah, I also think so!


I've cc'ed some driver developers for their input. I think we need a
sample parent driver in the next version for us to understand the full
picture.


Thanks


>
> Thanks!
> Tiwei
>
>> Alex
Tiwei Bie July 9, 2019, 6:33 a.m. UTC | #16
On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:
> On 2019/7/8 下午2:16, Tiwei Bie wrote:
> > On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
> > > On Thu, 4 Jul 2019 14:21:34 +0800
> > > Tiwei Bie <tiwei.bie@intel.com> wrote:
> > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > > > [cover letter quote snipped]
> > > > > > > > > Still a comments like V1. What's the advantage of inventing a new protocol?
> > > > > > > > I'm trying to make it work in VFIO's way..
> > > > > > > > > I believe either of the following should be better:
> > > > > > > > > 
> > > > > > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > > > extend it with e.g notify region. The advantages is that all exist userspace
> > > > > > > > > program could be reused without modification (or minimal modification). And
> > > > > > > > > vhost API hides lots of details that is not necessary to be understood by
> > > > > > > > > application (e.g in the case of container).
> > > > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > > > using the existing vfio_mdev) for mdev device?
> > > > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > > > > Right, either way, these ioctls have to be and just need to be
> > > > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > > > also need to consider is that which file descriptor the userspace
> > > > > > will do the ioctl() on. So I'm wondering do you mean let the
> > > > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > > > device?
> > > > > Yes.
> > > > Got it! I'm not sure what's Alex opinion on this. If we all
> > > > agree with this, I can do it in this way.
> > > > 
> > > > > Is there any other way btw?
> > > > Just a quick thought.. Maybe totally a bad idea. I was thinking
> > > > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > > > fd. So I was wondering whether it's possible to allow binding
> > > > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > > > devices. The new mdev driver, vhost_mdev, can provide similar
> > > > ways to let userspace open the mdev device and do the vhost ioctls
> > > > on it. To distinguish with the vfio_mdev compatible mdev devices,
> > > > the device API of the new vhost_mdev compatible mdev devices
> > > > might be e.g. "vhost-net" for net?
> > > > 
> > > > So in VFIO case, the device will be for passthru directly. And
> > > > in VHOST case, the device can be used to accelerate the existing
> > > > virtualized devices.
> > > > 
> > > > How do you think?
> > > VFIO really can't prevent vendor specific ioctls on the device file
> > > descriptor for mdevs, but a) we'd want to be sure the ioctl address
> > > space can't collide with ioctls we'd use for vfio defined purposes and
> > > b) maybe the VFIO user API isn't what you want in the first place if
> > > you intend to mostly/entirely ignore the defined ioctl set and replace
> > > them with your own.  In the case of the latter, you're also not getting
> > > the advantages of the existing VFIO userspace code, so why expose a
> > > VFIO device at all.
> > Yeah, I totally agree.
> 
> 
> I guess the original idea is to reuse the VFIO DMA/IOMMU API for this. Then
> we have the chance to reuse vfio codes in qemu for dealing with e.g vIOMMU.

Yeah, you are right. We have several choices here:

#1. We expose a VFIO device, so we can reuse the VFIO container/group
    based DMA API and potentially reuse a lot of VFIO code in QEMU.

    But in this case, we have two choices for the VFIO device interface
    (i.e. the interface on top of VFIO device fd):

    A) we may invent a new vhost protocol (as demonstrated by the code
       in this RFC) on VFIO device fd to make it work in VFIO's way,
       i.e. regions and irqs.

    B) Or as you proposed, instead of inventing a new vhost protocol,
       we can reuse most existing vhost ioctls on the VFIO device fd
       directly. There should be no conflicts between the VFIO ioctls
       (type is 0x3B) and VHOST ioctls (type is 0xAF) currently.

#2. Instead of exposing a VFIO device, we may expose a VHOST device.
    And we will introduce a new mdev driver vhost-mdev to do this.
    It would be natural to reuse the existing kernel vhost interface
    (ioctls) on it as much as possible. But we will need to invent
    some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
    choice, but it's too heavy and doesn't support vIOMMU by itself).

I'm not sure which one is the best choice we all want..
Which one (#1/A, #1/B, or #2) would you prefer?
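
For clarity, #1/B would look roughly like the sketch below from
userspace: get the device fd as usual with VFIO_GROUP_GET_DEVICE_FD,
then issue the existing vhost ioctls on it. All the ioctls shown exist
today and their type bytes don't collide (VFIO_TYPE ';' is 0x3B,
VHOST_VIRTIO is 0xAF); accepting vhost ioctls on a VFIO device fd is
the part that would be new:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>
#include <linux/vhost.h>

int vhost_vfio_device_init(int group_fd, const char *mdev_uuid,
			   uint64_t *features)
{
	int device_fd;

	device_fd = ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, mdev_uuid);
	if (device_fd < 0)
		return -1;

	/* Proposed: these would be handled by the parent device driver. */
	if (ioctl(device_fd, VHOST_SET_OWNER, NULL) ||
	    ioctl(device_fd, VHOST_GET_FEATURES, features))
		return -1;

	return device_fd;
}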

> 
> 
> > 
> > > The mdev interface does provide a general interface for creating and
> > > managing virtual devices, vfio-mdev is just one driver on the mdev
> > > bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
> > > out vfio-isms from the interface, aiui, with the intent of implementing
> > > another mdev bus driver for using the devices within the kernel.
> > Great to know this! I found below series after some searching:
> > 
> > https://lkml.org/lkml/2019/3/8/821
> > 
> > In above series, the new mlx5_core mdev driver will do the probe
> > by calling mlx5_get_core_dev() first on the parent device of the
> > mdev device. In vhost_mdev, maybe we can also keep track of all
> > the compatible mdev devices and use this info to do the probe.
> 
> 
> I don't get why this is needed. My understanding is if we want to go this
> way, there're actually two parts. 1) Vhost mdev that implements the device
> managements and vhost ioctl. 2) Vhost it self, which can accept mdev fd as
> it backend through VHOST_NET_SET_BACKEND.

I think with vhost-mdev (or with vfio-mdev if we agree to do vhost
ioctls on vfio device fd directly), we don't need to open /dev/vhost-net
(and there is no VHOST_NET_SET_BACKEND needed) at all. Either way,
after getting the fd of the mdev, we just need to do vhost ioctls
on it directly.

> 
> 
> > But we also need a way to allow vfio_mdev driver to distinguish
> > and reject the incompatible mdev devices.
> 
> 
> One issue for this series is that it doesn't consider DMA isolation at all.
> 
> 
> > 
> > > It
> > > seems like this vhost-mdev driver might be similar, using mdev but not
> > > necessarily vfio-mdev to expose devices.  Thanks,
> > Yeah, I also think so!
> 
> 
> I've cced some driver developers for their inputs. I think we need a sample
> parent drivers in the next version for us to understand the full picture.
> 
> 
> Thanks
> 
> 
> > 
> > Thanks!
> > Tiwei
> > 
> > > Alex
Jason Wang July 10, 2019, 2:26 a.m. UTC | #17
On 2019/7/9 下午2:33, Tiwei Bie wrote:
> On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:
>> On 2019/7/8 下午2:16, Tiwei Bie wrote:
>>> On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
>>>> On Thu, 4 Jul 2019 14:21:34 +0800
>>>> Tiwei Bie <tiwei.bie@intel.com> wrote:
>>>>> On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
>>>>>> On 2019/7/3 下午9:08, Tiwei Bie wrote:
>>>>>>> On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
>>>>>>>> On 2019/7/3 下午7:52, Tiwei Bie wrote:
>>>>>>>>> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>>>>>>>>>> On 2019/7/3 下午5:13, Tiwei Bie wrote:
>>>>>>>>>>> [cover letter quote snipped]
>>>>>>>>>> Still a comments like V1. What's the advantage of inventing a new protocol?
>>>>>>>>> I'm trying to make it work in VFIO's way..
>>>>>>>>>> I believe either of the following should be better:
>>>>>>>>>>
>>>>>>>>>> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
>>>>>>>>>> extend it with e.g notify region. The advantages is that all exist userspace
>>>>>>>>>> program could be reused without modification (or minimal modification). And
>>>>>>>>>> vhost API hides lots of details that is not necessary to be understood by
>>>>>>>>>> application (e.g in the case of container).
>>>>>>>>> Do you mean reusing vhost's ioctl on VFIO device fd directly,
>>>>>>>>> or introducing another mdev driver (i.e. vhost_mdev instead of
>>>>>>>>> using the existing vfio_mdev) for mdev device?
>>>>>>>> Can we simply add them into ioctl of mdev_parent_ops?
>>>>>>> Right, either way, these ioctls have to be and just need to be
>>>>>>> added in the ioctl of the mdev_parent_ops. But another thing we
>>>>>>> also need to consider is that which file descriptor the userspace
>>>>>>> will do the ioctl() on. So I'm wondering do you mean let the
>>>>>>> userspace do the ioctl() on the VFIO device fd of the mdev
>>>>>>> device?
>>>>>> Yes.
>>>>> Got it! I'm not sure what's Alex opinion on this. If we all
>>>>> agree with this, I can do it in this way.
>>>>>
>>>>>> Is there any other way btw?
>>>>> Just a quick thought.. Maybe totally a bad idea. I was thinking
>>>>> whether it would be odd to do non-VFIO's ioctls on VFIO's device
>>>>> fd. So I was wondering whether it's possible to allow binding
>>>>> another mdev driver (e.g. vhost_mdev) to the supported mdev
>>>>> devices. The new mdev driver, vhost_mdev, can provide similar
>>>>> ways to let userspace open the mdev device and do the vhost ioctls
>>>>> on it. To distinguish with the vfio_mdev compatible mdev devices,
>>>>> the device API of the new vhost_mdev compatible mdev devices
>>>>> might be e.g. "vhost-net" for net?
>>>>>
>>>>> So in VFIO case, the device will be for passthru directly. And
>>>>> in VHOST case, the device can be used to accelerate the existing
>>>>> virtualized devices.
>>>>>
>>>>> How do you think?
>>>> VFIO really can't prevent vendor specific ioctls on the device file
>>>> descriptor for mdevs, but a) we'd want to be sure the ioctl address
>>>> space can't collide with ioctls we'd use for vfio defined purposes and
>>>> b) maybe the VFIO user API isn't what you want in the first place if
>>>> you intend to mostly/entirely ignore the defined ioctl set and replace
>>>> them with your own.  In the case of the latter, you're also not getting
>>>> the advantages of the existing VFIO userspace code, so why expose a
>>>> VFIO device at all.
>>> Yeah, I totally agree.
>>
>> I guess the original idea is to reuse the VFIO DMA/IOMMU API for this. Then
>> we have the chance to reuse vfio codes in qemu for dealing with e.g vIOMMU.
> Yeah, you are right. We have several choices here:
>
> #1. We expose a VFIO device, so we can reuse the VFIO container/group
>      based DMA API and potentially reuse a lot of VFIO code in QEMU.
>
>      But in this case, we have two choices for the VFIO device interface
>      (i.e. the interface on top of VFIO device fd):
>
>      A) we may invent a new vhost protocol (as demonstrated by the code
>         in this RFC) on VFIO device fd to make it work in VFIO's way,
>         i.e. regions and irqs.
>
>      B) Or as you proposed, instead of inventing a new vhost protocol,
>         we can reuse most existing vhost ioctls on the VFIO device fd
>         directly. There should be no conflicts between the VFIO ioctls
>         (type is 0x3B) and VHOST ioctls (type is 0xAF) currently.
>
> #2. Instead of exposing a VFIO device, we may expose a VHOST device.
>      And we will introduce a new mdev driver vhost-mdev to do this.
>      It would be natural to reuse the existing kernel vhost interface
>      (ioctls) on it as much as possible. But we will need to invent
>      some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
>      choice, but it's too heavy and doesn't support vIOMMU by itself).
>
> I'm not sure which one is the best choice we all want..
> Which one (#1/A, #1/B, or #2) would you prefer?


#2 looks better. One concern is that we may end up with an API similar
to what VFIO does. And I do see some new RFCs for VFIO to add more DMA APIs.

Considering this is still at the RFC stage, does it make sense to try
this way with some sample parents?


>
>>
>>>> The mdev interface does provide a general interface for creating and
>>>> managing virtual devices, vfio-mdev is just one driver on the mdev
>>>> bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
>>>> out vfio-isms from the interface, aiui, with the intent of implementing
>>>> another mdev bus driver for using the devices within the kernel.
>>> Great to know this! I found below series after some searching:
>>>
>>> https://lkml.org/lkml/2019/3/8/821
>>>
>>> In above series, the new mlx5_core mdev driver will do the probe
>>> by calling mlx5_get_core_dev() first on the parent device of the
>>> mdev device. In vhost_mdev, maybe we can also keep track of all
>>> the compatible mdev devices and use this info to do the probe.
>>
>> I don't get why this is needed. My understanding is if we want to go this
>> way, there're actually two parts. 1) Vhost mdev that implements the device
>> managements and vhost ioctl. 2) Vhost it self, which can accept mdev fd as
>> it backend through VHOST_NET_SET_BACKEND.
> I think with vhost-mdev (or with vfio-mdev if we agree to do vhost
> ioctls on vfio device fd directly), we don't need to open /dev/vhost-net
> (and there is no VHOST_NET_SET_BACKEND needed) at all. Either way,
> after getting the fd of the mdev, we just need to do vhost ioctls
> on it directly.


The reason I ask is that vhost-net is designed not to be tied to any
particular kind of backend, so it's better to have a single place to
deal with the ioctls. But it's not a must.
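
For reference, here is roughly how a backend gets attached in the current
vhost-net model (a minimal userspace sketch; error handling and the memory
table/vring setup are omitted, and the tap fd is assumed to be opened
elsewhere):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

int vhost_net_attach_backend(int tap_fd, unsigned int queue_idx)
{
	struct vhost_vring_file backend = {
		.index = queue_idx,
		.fd = tap_fd,	/* passing -1 here detaches the backend */
	};
	int vhost_fd = open("/dev/vhost-net", O_RDWR);

	if (vhost_fd < 0)
		return -1;

	/* Take ownership of the device before any other setup ioctl. */
	if (ioctl(vhost_fd, VHOST_SET_OWNER) < 0 ||
	    ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend) < 0) {
		close(vhost_fd);
		return -1;
	}

	return vhost_fd;
}

With vhost-mdev, the same vhost ioctls would be issued on the mdev fd
directly, so there would be no separate backend-attach step like this.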

Thanks


>
>>
>>> But we also need a way to allow vfio_mdev driver to distinguish
>>> and reject the incompatible mdev devices.
>>
>> One issue for this series is that it doesn't consider DMA isolation at all.
>>
>>
>>>> It
>>>> seems like this vhost-mdev driver might be similar, using mdev but not
>>>> necessarily vfio-mdev to expose devices.  Thanks,
>>> Yeah, I also think so!
>>
>> I've cced some driver developers for their inputs. I think we need a sample
>> parent drivers in the next version for us to understand the full picture.
>>
>>
>> Thanks
>>
>>
>>> Thanks!
>>> Tiwei
>>>
>>>> Alex
Tiwei Bie July 10, 2019, 6:22 a.m. UTC | #18
On Wed, Jul 10, 2019 at 10:26:10AM +0800, Jason Wang wrote:
> On 2019/7/9 下午2:33, Tiwei Bie wrote:
> > On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:
> > > On 2019/7/8 下午2:16, Tiwei Bie wrote:
> > > > On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
> > > > > On Thu, 4 Jul 2019 14:21:34 +0800
> > > > > Tiwei Bie <tiwei.bie@intel.com> wrote:
> > > > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > > > > > Details about this can be found here:
> > > > > > > > > > > > 
> > > > > > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > > > > > 
> > > > > > > > > > > > What's new in this version
> > > > > > > > > > > > ==========================
> > > > > > > > > > > > 
> > > > > > > > > > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > > > > > > > > > some comments from here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > > > > > 
> > > > > > > > > > > > Below is the updated device interface:
> > > > > > > > > > > > 
> > > > > > > > > > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > > > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > > > > > > > can be used to notify the device.
> > > > > > > > > > > > 
> > > > > > > > > > > > 1. CONFIG_REGION
> > > > > > > > > > > > 
> > > > > > > > > > > > The region described by CONFIG_REGION is the main control interface.
> > > > > > > > > > > > Messages will be written to or read from this region.
> > > > > > > > > > > > 
> > > > > > > > > > > > The message type is determined by the `request` field in message
> > > > > > > > > > > > header. The message size is encoded in the message header too.
> > > > > > > > > > > > The message format looks like this:
> > > > > > > > > > > > 
> > > > > > > > > > > > struct vhost_vfio_op {
> > > > > > > > > > > > 	__u64 request;
> > > > > > > > > > > > 	__u32 flags;
> > > > > > > > > > > > 	/* Flag values: */
> > > > > > > > > > > >       #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > > > > > > > 	__u32 size;
> > > > > > > > > > > > 	union {
> > > > > > > > > > > > 		__u64 u64;
> > > > > > > > > > > > 		struct vhost_vring_state state;
> > > > > > > > > > > > 		struct vhost_vring_addr addr;
> > > > > > > > > > > > 	} payload;
> > > > > > > > > > > > };
> > > > > > > > > > > > 
> > > > > > > > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > > > > > > > requests in above structure.
> > > > > > > > > > > Still a comments like V1. What's the advantage of inventing a new protocol?
> > > > > > > > > > I'm trying to make it work in VFIO's way..
> > > > > > > > > > > I believe either of the following should be better:
> > > > > > > > > > > 
> > > > > > > > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > > > > > extend it with e.g notify region. The advantages is that all exist userspace
> > > > > > > > > > > program could be reused without modification (or minimal modification). And
> > > > > > > > > > > vhost API hides lots of details that is not necessary to be understood by
> > > > > > > > > > > application (e.g in the case of container).
> > > > > > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > > > > > using the existing vfio_mdev) for mdev device?
> > > > > > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > > > > > > Right, either way, these ioctls have to be and just need to be
> > > > > > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > > > > > also need to consider is that which file descriptor the userspace
> > > > > > > > will do the ioctl() on. So I'm wondering do you mean let the
> > > > > > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > > > > > device?
> > > > > > > Yes.
> > > > > > Got it! I'm not sure what's Alex opinion on this. If we all
> > > > > > agree with this, I can do it in this way.
> > > > > > 
> > > > > > > Is there any other way btw?
> > > > > > Just a quick thought.. Maybe totally a bad idea. I was thinking
> > > > > > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > > > > > fd. So I was wondering whether it's possible to allow binding
> > > > > > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > > > > > devices. The new mdev driver, vhost_mdev, can provide similar
> > > > > > ways to let userspace open the mdev device and do the vhost ioctls
> > > > > > on it. To distinguish with the vfio_mdev compatible mdev devices,
> > > > > > the device API of the new vhost_mdev compatible mdev devices
> > > > > > might be e.g. "vhost-net" for net?
> > > > > > 
> > > > > > So in VFIO case, the device will be for passthru directly. And
> > > > > > in VHOST case, the device can be used to accelerate the existing
> > > > > > virtualized devices.
> > > > > > 
> > > > > > How do you think?
> > > > > VFIO really can't prevent vendor specific ioctls on the device file
> > > > > descriptor for mdevs, but a) we'd want to be sure the ioctl address
> > > > > space can't collide with ioctls we'd use for vfio defined purposes and
> > > > > b) maybe the VFIO user API isn't what you want in the first place if
> > > > > you intend to mostly/entirely ignore the defined ioctl set and replace
> > > > > them with your own.  In the case of the latter, you're also not getting
> > > > > the advantages of the existing VFIO userspace code, so why expose a
> > > > > VFIO device at all.
> > > > Yeah, I totally agree.
> > > 
> > > I guess the original idea is to reuse the VFIO DMA/IOMMU API for this. Then
> > > we have the chance to reuse vfio codes in qemu for dealing with e.g vIOMMU.
> > Yeah, you are right. We have several choices here:
> > 
> > #1. We expose a VFIO device, so we can reuse the VFIO container/group
> >      based DMA API and potentially reuse a lot of VFIO code in QEMU.
> > 
> >      But in this case, we have two choices for the VFIO device interface
> >      (i.e. the interface on top of VFIO device fd):
> > 
> >      A) we may invent a new vhost protocol (as demonstrated by the code
> >         in this RFC) on VFIO device fd to make it work in VFIO's way,
> >         i.e. regions and irqs.
> > 
> >      B) Or as you proposed, instead of inventing a new vhost protocol,
> >         we can reuse most existing vhost ioctls on the VFIO device fd
> >         directly. There should be no conflicts between the VFIO ioctls
> >         (type is 0x3B) and VHOST ioctls (type is 0xAF) currently.
> > 
> > #2. Instead of exposing a VFIO device, we may expose a VHOST device.
> >      And we will introduce a new mdev driver vhost-mdev to do this.
> >      It would be natural to reuse the existing kernel vhost interface
> >      (ioctls) on it as much as possible. But we will need to invent
> >      some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
> >      choice, but it's too heavy and doesn't support vIOMMU by itself).
> > 
> > I'm not sure which one is the best choice we all want..
> > Which one (#1/A, #1/B, or #2) would you prefer?
> 
> 
> #2 looks better. One concern is that we may end up with a similar API to
> what VFIO does.

Yeah, that's a major concern. If it's true, is it something
that's not acceptable?

> And I do see some new RFCs for VFIO that add more DMA APIs.

Are there any pointers?

> 
> Considering this is still at the RFC stage, does it make sense to try this
> way with some sample parents?

I think it makes sense.

> 
> 
> > 
> > > 
> > > > > The mdev interface does provide a general interface for creating and
> > > > > managing virtual devices, vfio-mdev is just one driver on the mdev
> > > > > bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
> > > > > out vfio-isms from the interface, aiui, with the intent of implementing
> > > > > another mdev bus driver for using the devices within the kernel.
> > > > Great to know this! I found below series after some searching:
> > > > 
> > > > https://lkml.org/lkml/2019/3/8/821
> > > > 
> > > > In above series, the new mlx5_core mdev driver will do the probe
> > > > by calling mlx5_get_core_dev() first on the parent device of the
> > > > mdev device. In vhost_mdev, maybe we can also keep track of all
> > > > the compatible mdev devices and use this info to do the probe.
> > > 
> > > I don't get why this is needed. My understanding is if we want to go this
> > > way, there're actually two parts. 1) Vhost mdev that implements the device
> > > managements and vhost ioctl. 2) Vhost it self, which can accept mdev fd as
> > > it backend through VHOST_NET_SET_BACKEND.
> > I think with vhost-mdev (or with vfio-mdev if we agree to do vhost
> > ioctls on vfio device fd directly), we don't need to open /dev/vhost-net
> > (and there is no VHOST_NET_SET_BACKEND needed) at all. Either way,
> > after getting the fd of the mdev, we just need to do vhost ioctls
> > on it directly.
> 
> 
> The reason I ask is that vhost-net is designed not to be tied to any
> particular kind of backend, so it's better to have a single place to
> deal with the ioctls. But it's not a must.

I think in vhost-mdev, there is a chance for us to have a
unified interface in /dev for all vhost mediated devices
(not limited to net) in the system (similar to the case of
/dev/vfio/) instead of making it a backend of vhost-net.

For the code organization, it's possible for us to refactor
drivers/vhost/ and let it provide some APIs for parent devices
to handle generic vhost ioctls.
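
Just to make this a bit more concrete, below is a very rough sketch of
what such helper APIs could look like. All of the vhost_mdev_* names are
made up purely for illustration and are not an actual proposal:

#include <linux/mdev.h>
#include <linux/vhost.h>

/* Hardware-specific callbacks a parent device would provide. */
struct vhost_mdev_dev;

struct vhost_mdev_ops {
	u64 (*get_features)(struct vhost_mdev_dev *vdev);
	int (*set_features)(struct vhost_mdev_dev *vdev, u64 features);
	int (*set_vring_addr)(struct vhost_mdev_dev *vdev,
			      struct vhost_vring_addr *addr);
	int (*set_vring_num)(struct vhost_mdev_dev *vdev,
			     struct vhost_vring_state *num);
	int (*start)(struct vhost_mdev_dev *vdev);
	int (*stop)(struct vhost_mdev_dev *vdev);
};

/* Parent drivers register their mdev device and callbacks with the core. */
struct vhost_mdev_dev *vhost_mdev_register(struct mdev_device *mdev,
					   const struct vhost_mdev_ops *ops);
void vhost_mdev_unregister(struct vhost_mdev_dev *vdev);

/*
 * The refactored vhost core would own the fd and translate the generic
 * vhost ioctls into the callbacks above, so each parent only implements
 * the hardware-specific parts.
 */
long vhost_mdev_ioctl(struct vhost_mdev_dev *vdev, unsigned int cmd,
		      unsigned long arg);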

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > 
> > > 
> > > > But we also need a way to allow vfio_mdev driver to distinguish
> > > > and reject the incompatible mdev devices.
> > > 
> > > One issue for this series is that it doesn't consider DMA isolation at all.
> > > 
> > > 
> > > > > It
> > > > > seems like this vhost-mdev driver might be similar, using mdev but not
> > > > > necessarily vfio-mdev to expose devices.  Thanks,
> > > > Yeah, I also think so!
> > > 
> > > I've cced some driver developers for their inputs. I think we need a sample
> > > parent drivers in the next version for us to understand the full picture.
> > > 
> > > 
> > > Thanks
> > > 
> > > 
> > > > Thanks!
> > > > Tiwei
> > > > 
> > > > > Alex
Jason Wang July 10, 2019, 7:22 a.m. UTC | #19
On 2019/7/10 下午2:22, Tiwei Bie wrote:
> On Wed, Jul 10, 2019 at 10:26:10AM +0800, Jason Wang wrote:
>> On 2019/7/9 下午2:33, Tiwei Bie wrote:
>>> On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:
>>>> On 2019/7/8 下午2:16, Tiwei Bie wrote:
>>>>> On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
>>>>>> On Thu, 4 Jul 2019 14:21:34 +0800
>>>>>> Tiwei Bie <tiwei.bie@intel.com> wrote:
>>>>>>> On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
>>>>>>>> On 2019/7/3 下午9:08, Tiwei Bie wrote:
>>>>>>>>> On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
>>>>>>>>>> On 2019/7/3 下午7:52, Tiwei Bie wrote:
>>>>>>>>>>> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>>>>>>>>>>>> On 2019/7/3 下午5:13, Tiwei Bie wrote:
>>>>>>>>>>>>> Details about this can be found here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://lwn.net/Articles/750770/
>>>>>>>>>>>>>
>>>>>>>>>>>>> What's new in this version
>>>>>>>>>>>>> ==========================
>>>>>>>>>>>>>
>>>>>>>>>>>>> A new VFIO device type is introduced - vfio-vhost. This addressed
>>>>>>>>>>>>> some comments from here:https://patchwork.ozlabs.org/cover/984763/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Below is the updated device interface:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently, there are two regions of this device: 1) CONFIG_REGION
>>>>>>>>>>>>> (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
>>>>>>>>>>>>> device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
>>>>>>>>>>>>> can be used to notify the device.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. CONFIG_REGION
>>>>>>>>>>>>>
>>>>>>>>>>>>> The region described by CONFIG_REGION is the main control interface.
>>>>>>>>>>>>> Messages will be written to or read from this region.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The message type is determined by the `request` field in message
>>>>>>>>>>>>> header. The message size is encoded in the message header too.
>>>>>>>>>>>>> The message format looks like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> struct vhost_vfio_op {
>>>>>>>>>>>>> 	__u64 request;
>>>>>>>>>>>>> 	__u32 flags;
>>>>>>>>>>>>> 	/* Flag values: */
>>>>>>>>>>>>>        #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
>>>>>>>>>>>>> 	__u32 size;
>>>>>>>>>>>>> 	union {
>>>>>>>>>>>>> 		__u64 u64;
>>>>>>>>>>>>> 		struct vhost_vring_state state;
>>>>>>>>>>>>> 		struct vhost_vring_addr addr;
>>>>>>>>>>>>> 	} payload;
>>>>>>>>>>>>> };
>>>>>>>>>>>>>
>>>>>>>>>>>>> The existing vhost-kernel ioctl cmds are reused as the message
>>>>>>>>>>>>> requests in above structure.
>>>>>>>>>>>> Still a comments like V1. What's the advantage of inventing a new protocol?
>>>>>>>>>>> I'm trying to make it work in VFIO's way..
>>>>>>>>>>>> I believe either of the following should be better:
>>>>>>>>>>>>
>>>>>>>>>>>> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
>>>>>>>>>>>> extend it with e.g notify region. The advantages is that all exist userspace
>>>>>>>>>>>> program could be reused without modification (or minimal modification). And
>>>>>>>>>>>> vhost API hides lots of details that is not necessary to be understood by
>>>>>>>>>>>> application (e.g in the case of container).
>>>>>>>>>>> Do you mean reusing vhost's ioctl on VFIO device fd directly,
>>>>>>>>>>> or introducing another mdev driver (i.e. vhost_mdev instead of
>>>>>>>>>>> using the existing vfio_mdev) for mdev device?
>>>>>>>>>> Can we simply add them into ioctl of mdev_parent_ops?
>>>>>>>>> Right, either way, these ioctls have to be and just need to be
>>>>>>>>> added in the ioctl of the mdev_parent_ops. But another thing we
>>>>>>>>> also need to consider is that which file descriptor the userspace
>>>>>>>>> will do the ioctl() on. So I'm wondering do you mean let the
>>>>>>>>> userspace do the ioctl() on the VFIO device fd of the mdev
>>>>>>>>> device?
>>>>>>>> Yes.
>>>>>>> Got it! I'm not sure what's Alex opinion on this. If we all
>>>>>>> agree with this, I can do it in this way.
>>>>>>>
>>>>>>>> Is there any other way btw?
>>>>>>> Just a quick thought.. Maybe totally a bad idea. I was thinking
>>>>>>> whether it would be odd to do non-VFIO's ioctls on VFIO's device
>>>>>>> fd. So I was wondering whether it's possible to allow binding
>>>>>>> another mdev driver (e.g. vhost_mdev) to the supported mdev
>>>>>>> devices. The new mdev driver, vhost_mdev, can provide similar
>>>>>>> ways to let userspace open the mdev device and do the vhost ioctls
>>>>>>> on it. To distinguish with the vfio_mdev compatible mdev devices,
>>>>>>> the device API of the new vhost_mdev compatible mdev devices
>>>>>>> might be e.g. "vhost-net" for net?
>>>>>>>
>>>>>>> So in VFIO case, the device will be for passthru directly. And
>>>>>>> in VHOST case, the device can be used to accelerate the existing
>>>>>>> virtualized devices.
>>>>>>>
>>>>>>> How do you think?
>>>>>> VFIO really can't prevent vendor specific ioctls on the device file
>>>>>> descriptor for mdevs, but a) we'd want to be sure the ioctl address
>>>>>> space can't collide with ioctls we'd use for vfio defined purposes and
>>>>>> b) maybe the VFIO user API isn't what you want in the first place if
>>>>>> you intend to mostly/entirely ignore the defined ioctl set and replace
>>>>>> them with your own.  In the case of the latter, you're also not getting
>>>>>> the advantages of the existing VFIO userspace code, so why expose a
>>>>>> VFIO device at all.
>>>>> Yeah, I totally agree.
>>>> I guess the original idea is to reuse the VFIO DMA/IOMMU API for this. Then
>>>> we have the chance to reuse vfio codes in qemu for dealing with e.g vIOMMU.
>>> Yeah, you are right. We have several choices here:
>>>
>>> #1. We expose a VFIO device, so we can reuse the VFIO container/group
>>>       based DMA API and potentially reuse a lot of VFIO code in QEMU.
>>>
>>>       But in this case, we have two choices for the VFIO device interface
>>>       (i.e. the interface on top of VFIO device fd):
>>>
>>>       A) we may invent a new vhost protocol (as demonstrated by the code
>>>          in this RFC) on VFIO device fd to make it work in VFIO's way,
>>>          i.e. regions and irqs.
>>>
>>>       B) Or as you proposed, instead of inventing a new vhost protocol,
>>>          we can reuse most existing vhost ioctls on the VFIO device fd
>>>          directly. There should be no conflicts between the VFIO ioctls
>>>          (type is 0x3B) and VHOST ioctls (type is 0xAF) currently.
>>>
>>> #2. Instead of exposing a VFIO device, we may expose a VHOST device.
>>>       And we will introduce a new mdev driver vhost-mdev to do this.
>>>       It would be natural to reuse the existing kernel vhost interface
>>>       (ioctls) on it as much as possible. But we will need to invent
>>>       some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
>>>       choice, but it's too heavy and doesn't support vIOMMU by itself).
>>>
>>> I'm not sure which one is the best choice we all want..
>>> Which one (#1/A, #1/B, or #2) would you prefer?
>>
>> #2 looks better. One concern is that we may end up with a similar API to
>> what VFIO does.
> Yeah, that's a major concern. If it's true, is it something
> that's not acceptable?


I think not, but I don't know whether anyone else cares about this.


>
>> And I do see some new RFCs for VFIO that add more DMA APIs.
> Are there any pointers?


I don't remember the details, but it should be something related to SVA
support in recent Intel IOMMUs.


>
>> Considering this is still at the RFC stage, does it make sense to try this
>> way with some sample parents?
> I think it makes sense.


Just one more thought: for sample parents, vhost-net should be much
easier in both implementation and testing.


>
>>
>>>>>> The mdev interface does provide a general interface for creating and
>>>>>> managing virtual devices, vfio-mdev is just one driver on the mdev
>>>>>> bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
>>>>>> out vfio-isms from the interface, aiui, with the intent of implementing
>>>>>> another mdev bus driver for using the devices within the kernel.
>>>>> Great to know this! I found below series after some searching:
>>>>>
>>>>> https://lkml.org/lkml/2019/3/8/821
>>>>>
>>>>> In above series, the new mlx5_core mdev driver will do the probe
>>>>> by calling mlx5_get_core_dev() first on the parent device of the
>>>>> mdev device. In vhost_mdev, maybe we can also keep track of all
>>>>> the compatible mdev devices and use this info to do the probe.
>>>> I don't get why this is needed. My understanding is if we want to go this
>>>> way, there're actually two parts. 1) Vhost mdev that implements the device
>>>> managements and vhost ioctl. 2) Vhost it self, which can accept mdev fd as
>>>> it backend through VHOST_NET_SET_BACKEND.
>>> I think with vhost-mdev (or with vfio-mdev if we agree to do vhost
>>> ioctls on vfio device fd directly), we don't need to open /dev/vhost-net
>>> (and there is no VHOST_NET_SET_BACKEND needed) at all. Either way,
>>> after getting the fd of the mdev, we just need to do vhost ioctls
>>> on it directly.
>>
>> The reason I ask is that vhost-net is designed not to be tied to any
>> particular kind of backend, so it's better to have a single place to
>> deal with the ioctls. But it's not a must.
> I think in vhost-mdev, there is a chance for us to have a
> unified interface in /dev for all vhost mediated devices
> (not limited to net) in the system (similar to the case of
> /dev/vfio/) instead of making it a backend of vhost-net.
>
> For the code organization, it's possible for us to refactor
> drivers/vhost/ and let it provide some APIs for parent devices
> to handle generic vhost ioctls.


Yes, and separate the current kthread-based software dataplane out of
the core APIs.

Thanks


>
> Thanks,
> Tiwei
>
>> Thanks
>>
>>
>>>>> But we also need a way to allow vfio_mdev driver to distinguish
>>>>> and reject the incompatible mdev devices.
>>>> One issue for this series is that it doesn't consider DMA isolation at all.
>>>>
>>>>
>>>>>> It
>>>>>> seems like this vhost-mdev driver might be similar, using mdev but not
>>>>>> necessarily vfio-mdev to expose devices.  Thanks,
>>>>> Yeah, I also think so!
>>>> I've cced some driver developers for their inputs. I think we need a sample
>>>> parent drivers in the next version for us to understand the full picture.
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks!
>>>>> Tiwei
>>>>>
>>>>>> Alex
Jason Wang July 18, 2019, 10:31 a.m. UTC | #20
On 2019/7/10 下午3:22, Jason Wang wrote:
>> Yeah, that's a major concern. If it's true, is it something
>> that's not acceptable?
>
>
> I think not, but I don't know whether anyone else cares about this.
>
>
>>
>>> And I do see some new RFCs for VFIO that add more DMA APIs.
>> Are there any pointers?
>
>
> I don't remember the details, but it should be something related to SVA
> support in recent Intel IOMMUs.


E.g. this series:

https://www.spinics.net/lists/iommu/msg37146.html

Thanks
diff mbox series

Patch

diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..cabb71095940 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,4 +10,6 @@  vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_VFIO) += vdpa.o
+
 obj-$(CONFIG_VHOST)	+= vhost.o
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
new file mode 100644
index 000000000000..5c9426e2a091
--- /dev/null
+++ b/drivers/vhost/vdpa.c
@@ -0,0 +1,770 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018-2019 Intel Corporation.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/vfio.h>
+#include <linux/vhost.h>
+#include <linux/mdev.h>
+#include <linux/vdpa_mdev.h>
+#include <asm/uaccess.h>
+
+#define VDPA_CONFIG_SIZE		0x1000000
+
+#define VDPA_VFIO_VHOST_OFFSET_SHIFT	40
+#define VDPA_VFIO_VHOST_OFFSET_MASK \
+		((1ULL << VDPA_VFIO_VHOST_OFFSET_SHIFT) - 1)
+#define VDPA_VFIO_VHOST_OFFSET_TO_INDEX(offset) \
+		((offset) >> VDPA_VFIO_VHOST_OFFSET_SHIFT)
+#define VDPA_VFIO_VHOST_INDEX_TO_OFFSET(index) \
+		((u64)(index) << VDPA_VFIO_VHOST_OFFSET_SHIFT)
+#define VDPA_VFIO_VHOST_REGION_OFFSET(offset) \
+		((offset) & VDPA_VFIO_VHOST_OFFSET_MASK)
+
+struct vdpa_dev *vdpa_alloc(struct mdev_device *mdev, void *private,
+			    int max_vrings)
+{
+	struct vdpa_dev *vdpa;
+	size_t size;
+
+	size = sizeof(struct vdpa_dev) + max_vrings *
+			sizeof(struct vdpa_vring_info);
+
+	vdpa = kzalloc(size, GFP_KERNEL);
+	if (vdpa == NULL)
+		return NULL;
+
+	mutex_init(&vdpa->ops_lock);
+
+	vdpa->mdev = mdev;
+	vdpa->private = private;
+	vdpa->max_vrings = max_vrings;
+
+	return vdpa;
+}
+EXPORT_SYMBOL(vdpa_alloc);
+
+void vdpa_free(struct vdpa_dev *vdpa)
+{
+	struct mdev_device *mdev;
+
+	mdev = vdpa->mdev;
+
+	vdpa->ops->stop(vdpa);
+	mdev_set_drvdata(mdev, NULL);
+	mutex_destroy(&vdpa->ops_lock);
+	kfree(vdpa);
+}
+EXPORT_SYMBOL(vdpa_free);
+
+static ssize_t vdpa_handle_config_read(struct mdev_device *mdev,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct vdpa_dev *vdpa;
+	struct vhost_vfio_op *op = NULL;
+	loff_t pos = *ppos;
+	loff_t offset;
+	int ret;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	offset = VDPA_VFIO_VHOST_REGION_OFFSET(pos);
+	if (offset != 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!vdpa->pending_reply) {
+		ret = 0;
+		goto out;
+	}
+
+	vdpa->pending_reply = false;
+
+	op = kzalloc(VHOST_VFIO_OP_HDR_SIZE + VHOST_VFIO_OP_PAYLOAD_MAX_SIZE,
+		     GFP_KERNEL);
+	if (op == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	op->request = vdpa->pending.request;
+
+	switch (op->request) {
+	case VHOST_GET_VRING_BASE:
+		op->payload.state = vdpa->pending.payload.state;
+		op->size = sizeof(op->payload.state);
+		break;
+	case VHOST_GET_FEATURES:
+		op->payload.u64 = vdpa->pending.payload.u64;
+		op->size = sizeof(op->payload.u64);
+		break;
+	default:
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	if (op->size + VHOST_VFIO_OP_HDR_SIZE != count) {
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	if (copy_to_user(buf, op, count)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	ret = count;
+
+out_free:
+	kfree(op);
+out:
+	return ret;
+}
+
+ssize_t vdpa_read(struct mdev_device *mdev, char __user *buf,
+		  size_t count, loff_t *ppos)
+{
+	int done = 0;
+	unsigned int index;
+	loff_t pos = *ppos;
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	mutex_lock(&vdpa->ops_lock);
+
+	index = VDPA_VFIO_VHOST_OFFSET_TO_INDEX(pos);
+
+	switch (index) {
+	case VFIO_VHOST_CONFIG_REGION_INDEX:
+		done = vdpa_handle_config_read(mdev, buf, count, ppos);
+		break;
+	}
+
+	if (done > 0)
+		*ppos += done;
+
+	mutex_unlock(&vdpa->ops_lock);
+
+	return done;
+}
+EXPORT_SYMBOL(vdpa_read);
+
+static int vhost_set_vring_addr(struct mdev_device *mdev,
+		struct vhost_vring_addr *addr)
+{
+	struct vdpa_dev *vdpa;
+	int qid = addr->index;
+	struct vdpa_vring_info *vring;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (qid >= vdpa->max_vrings)
+		return -EINVAL;
+
+	if (qid >= vdpa->nr_vring)
+		vdpa->nr_vring = qid + 1;
+
+	vring = &vdpa->vring_info[qid];
+
+	vring->desc_user_addr = addr->desc_user_addr;
+	vring->used_user_addr = addr->used_user_addr;
+	vring->avail_user_addr = addr->avail_user_addr;
+	vring->log_guest_addr = addr->log_guest_addr;
+
+	return 0;
+}
+
+static int vhost_set_vring_num(struct mdev_device *mdev,
+		struct vhost_vring_state *num)
+{
+	struct vdpa_dev *vdpa;
+	int qid = num->index;
+	struct vdpa_vring_info *vring;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (qid >= vdpa->max_vrings)
+		return -EINVAL;
+
+	vring = &vdpa->vring_info[qid];
+
+	vring->size = num->num;
+
+	return 0;
+}
+
+static int vhost_set_vring_base(struct mdev_device *mdev,
+		struct vhost_vring_state *base)
+{
+	struct vdpa_dev *vdpa;
+	int qid = base->index;
+	struct vdpa_vring_info *vring;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (qid >= vdpa->max_vrings)
+		return -EINVAL;
+
+	vring = &vdpa->vring_info[qid];
+
+	vring->base = base->num;
+
+	return 0;
+}
+
+static int vhost_get_vring_base(struct mdev_device *mdev,
+		struct vhost_vring_state *base)
+{
+	struct vdpa_dev *vdpa;
+	int qid = base->index;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	vdpa->pending_reply = true;
+	vdpa->pending.request = VHOST_GET_VRING_BASE;
+	vdpa->pending.payload.state.index = qid;
+	vdpa->pending.payload.state.num = vdpa->vring_info[qid].base;
+
+	return 0;
+}
+
+static int vhost_set_log_base(struct mdev_device *mdev, u64 *log_base)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	vdpa->log_base = *log_base;
+	return 0;
+}
+
+static int vhost_set_features(struct mdev_device *mdev, u64 *features)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	vdpa->features = *features;
+	vdpa->ops->set_features(vdpa);
+
+	return 0;
+}
+
+static int vhost_get_features(struct mdev_device *mdev, u64 *features)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	vdpa->pending_reply = true;
+	vdpa->pending.request = VHOST_GET_FEATURES;
+	vdpa->pending.payload.u64 =
+		vdpa->ops->supported_features(vdpa);
+
+	return 0;
+}
+
+static int vhost_set_owner(struct mdev_device *mdev)
+{
+	// TODO
+	return 0;
+}
+
+static int vhost_reset_owner(struct mdev_device *mdev)
+{
+	// TODO
+	return 0;
+}
+
+static int vhost_set_state(struct mdev_device *mdev, u64 *state)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (*state >= VHOST_DEVICE_S_MAX)
+		return -EINVAL;
+
+	if (vdpa->state == *state)
+		return 0;
+
+	vdpa->state = *state;
+
+	switch (vdpa->state) {
+	case VHOST_DEVICE_S_RUNNING:
+		vdpa->ops->start(vdpa);
+		break;
+	case VHOST_DEVICE_S_STOPPED:
+		vdpa->ops->stop(vdpa);
+		break;
+	}
+
+	return 0;
+}
+
+static ssize_t vdpa_handle_config_write(struct mdev_device *mdev,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct vhost_vfio_op *op = NULL;
+	loff_t pos = *ppos;
+	loff_t offset;
+	int ret;
+
+	offset = VDPA_VFIO_VHOST_REGION_OFFSET(pos);
+	if (offset != 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (count < VHOST_VFIO_OP_HDR_SIZE) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	op = kzalloc(VHOST_VFIO_OP_HDR_SIZE + VHOST_VFIO_OP_PAYLOAD_MAX_SIZE,
+		     GFP_KERNEL);
+	if (op == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(op, buf, VHOST_VFIO_OP_HDR_SIZE)) {
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	if (op->size > VHOST_VFIO_OP_PAYLOAD_MAX_SIZE ||
+	    op->size + VHOST_VFIO_OP_HDR_SIZE != count) {
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	if (copy_from_user(&op->payload, buf + VHOST_VFIO_OP_HDR_SIZE,
+			   op->size)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	switch (op->request) {
+	case VHOST_SET_LOG_BASE:
+		vhost_set_log_base(mdev, &op->payload.u64);
+		break;
+	case VHOST_SET_VRING_ADDR:
+		vhost_set_vring_addr(mdev, &op->payload.addr);
+		break;
+	case VHOST_SET_VRING_NUM:
+		vhost_set_vring_num(mdev, &op->payload.state);
+		break;
+	case VHOST_SET_VRING_BASE:
+		vhost_set_vring_base(mdev, &op->payload.state);
+		break;
+	case VHOST_GET_VRING_BASE:
+		vhost_get_vring_base(mdev, &op->payload.state);
+		break;
+	case VHOST_SET_FEATURES:
+		vhost_set_features(mdev, &op->payload.u64);
+		break;
+	case VHOST_GET_FEATURES:
+		vhost_get_features(mdev, &op->payload.u64);
+		break;
+	case VHOST_SET_OWNER:
+		vhost_set_owner(mdev);
+		break;
+	case VHOST_RESET_OWNER:
+		vhost_reset_owner(mdev);
+		break;
+	case VHOST_DEVICE_SET_STATE:
+		vhost_set_state(mdev, &op->payload.u64);
+		break;
+	default:
+		break;
+	}
+
+	ret = count;
+
+out_free:
+	kfree(op);
+out:
+	return ret;
+}
+
+static ssize_t vdpa_handle_notify_write(struct mdev_device *mdev,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct vdpa_dev *vdpa;
+	int qid;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (count < sizeof(qid))
+		return -EINVAL;
+
+	if (copy_from_user(&qid, buf, sizeof(qid)))
+		return -EINVAL;
+
+	vdpa->ops->notify(vdpa, qid);
+
+	return count;
+}
+
+ssize_t vdpa_write(struct mdev_device *mdev, const char __user *buf,
+		   size_t count, loff_t *ppos)
+{
+	int done = 0;
+	unsigned int index;
+	loff_t pos = *ppos;
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	mutex_lock(&vdpa->ops_lock);
+
+	index = VDPA_VFIO_VHOST_OFFSET_TO_INDEX(pos);
+
+	switch (index) {
+	case VFIO_VHOST_CONFIG_REGION_INDEX:
+		done = vdpa_handle_config_write(mdev, buf, count, ppos);
+		break;
+	case VFIO_VHOST_NOTIFY_REGION_INDEX:
+		done = vdpa_handle_notify_write(mdev, buf, count, ppos);
+		break;
+	}
+
+	if (done > 0)
+		*ppos += done;
+
+	mutex_unlock(&vdpa->ops_lock);
+
+	return done;
+}
+EXPORT_SYMBOL(vdpa_write);
+
+static int vdpa_get_region_info(struct mdev_device *mdev,
+				struct vfio_region_info *region_info,
+				u16 *cap_type_id, void **cap_type)
+{
+	struct vdpa_dev *vdpa;
+	u32 index, flags;
+	u64 size = 0;
+
+	if (!mdev)
+		return -EINVAL;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -EINVAL;
+
+	index = region_info->index;
+	if (index >= VFIO_VHOST_NUM_REGIONS)
+		return -EINVAL;
+
+	mutex_lock(&vdpa->ops_lock);
+
+	flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+
+	switch (index) {
+	case VFIO_VHOST_CONFIG_REGION_INDEX:
+		size = VDPA_CONFIG_SIZE;
+		break;
+	case VFIO_VHOST_NOTIFY_REGION_INDEX:
+		size = (u64)vdpa->max_vrings << PAGE_SHIFT;
+		flags |= VFIO_REGION_INFO_FLAG_MMAP;
+		break;
+	default:
+		size = 0;
+		break;
+	}
+
+	region_info->size = size;
+	region_info->offset = VDPA_VFIO_VHOST_INDEX_TO_OFFSET(index);
+	region_info->flags = flags;
+	mutex_unlock(&vdpa->ops_lock);
+	return 0;
+}
+
+static int vdpa_reset(struct mdev_device *mdev)
+{
+	struct vdpa_dev *vdpa;
+
+	if (!mdev)
+		return -EINVAL;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int vdpa_get_device_info(struct mdev_device *mdev,
+				struct vfio_device_info *dev_info)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	dev_info->flags = VFIO_DEVICE_FLAGS_VHOST | VFIO_DEVICE_RESET;
+	dev_info->num_regions = VFIO_VHOST_NUM_REGIONS;
+	dev_info->num_irqs = VFIO_VHOST_NUM_IRQS;
+
+	return 0;
+}
+
+static int vdpa_get_irq_info(struct mdev_device *mdev,
+			     struct vfio_irq_info *info)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (info->index != VFIO_VHOST_VQ_IRQ_INDEX)
+		return -EINVAL;
+
+	info->flags = VFIO_IRQ_INFO_EVENTFD;
+	info->count = vdpa->max_vrings;
+
+	return 0;
+}
+
+static int vdpa_set_irqs(struct mdev_device *mdev, uint32_t flags,
+			 unsigned int index, unsigned int start,
+			 unsigned int count, void *data)
+{
+	struct vdpa_dev *vdpa;
+	int *fd = data, i;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -EINVAL;
+
+	if (index != VFIO_VHOST_VQ_IRQ_INDEX)
+		return -ENOTSUPP;
+
+	for (i = 0; i < count; i++)
+		vdpa->ops->set_eventfd(vdpa, start + i,
+			(flags & VFIO_IRQ_SET_DATA_EVENTFD) ? fd[i] : -1);
+
+	return 0;
+}
+
+long vdpa_ioctl(struct mdev_device *mdev, unsigned int cmd, unsigned long arg)
+{
+	int ret = 0;
+	unsigned long minsz;
+	struct vdpa_dev *vdpa;
+
+	if (!mdev)
+		return -EINVAL;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	switch (cmd) {
+	case VFIO_DEVICE_GET_INFO:
+	{
+		struct vfio_device_info info;
+
+		minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		ret = vdpa_get_device_info(mdev, &info);
+		if (ret)
+			return ret;
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			return -EFAULT;
+
+		return 0;
+	}
+	case VFIO_DEVICE_GET_REGION_INFO:
+	{
+		struct vfio_region_info info;
+		u16 cap_type_id = 0;
+		void *cap_type = NULL;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		ret = vdpa_get_region_info(mdev, &info, &cap_type_id,
+					   &cap_type);
+		if (ret)
+			return ret;
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			return -EFAULT;
+
+		return 0;
+	}
+	case VFIO_DEVICE_GET_IRQ_INFO:
+	{
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz || info.index >= vdpa->max_vrings)
+			return -EINVAL;
+
+		ret = vdpa_get_irq_info(mdev, &info);
+		if (ret)
+			return ret;
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			return -EFAULT;
+
+		return 0;
+	}
+	case VFIO_DEVICE_SET_IRQS:
+	{
+		struct vfio_irq_set hdr;
+		size_t data_size = 0;
+		u8 *data = NULL;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		ret = vfio_set_irqs_validate_and_prepare(&hdr, vdpa->max_vrings,
+							 VFIO_VHOST_NUM_IRQS,
+							 &data_size);
+		if (ret)
+			return ret;
+
+		if (data_size) {
+			data = memdup_user((void __user *)(arg + minsz),
+					   data_size);
+			if (IS_ERR(data))
+				return PTR_ERR(data);
+		}
+
+		ret = vdpa_set_irqs(mdev, hdr.flags, hdr.index, hdr.start,
+				hdr.count, data);
+
+		kfree(data);
+		return ret;
+	}
+	case VFIO_DEVICE_RESET:
+		return vdpa_reset(mdev);
+	}
+	return -ENOTTY;
+}
+EXPORT_SYMBOL(vdpa_ioctl);
+
+static const struct vm_operations_struct vdpa_mm_ops = {
+#ifdef CONFIG_HAVE_IOREMAP_PROT
+	.access = generic_access_phys
+#endif
+};
+
+int vdpa_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
+{
+	struct vdpa_dev *vdpa;
+	unsigned int index;
+	loff_t pos;
+	loff_t offset;
+	int qid, ret;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	pos = vma->vm_pgoff << PAGE_SHIFT;
+
+	index = VDPA_VFIO_VHOST_OFFSET_TO_INDEX(pos);
+	offset = VDPA_VFIO_VHOST_REGION_OFFSET(pos);
+
+	qid = offset >> PAGE_SHIFT;
+
+	if (vma->vm_end < vma->vm_start)
+		return -EINVAL;
+	if ((vma->vm_flags & VM_SHARED) == 0)
+		return -EINVAL;
+	if (index != VFIO_VHOST_NOTIFY_REGION_INDEX)
+		return -EINVAL;
+	if (qid < 0 || qid >= vdpa->max_vrings)
+		return -EINVAL;
+
+	if (vma->vm_end - vma->vm_start > PAGE_SIZE)
+		return -EINVAL;
+
+	if (vdpa->ops->get_notify_addr == NULL)
+		return -ENOTSUPP;
+
+	mutex_lock(&vdpa->ops_lock);
+
+	vma->vm_ops = &vdpa_mm_ops;
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+
+	vma->vm_pgoff = vdpa->ops->get_notify_addr(vdpa, qid) >> PAGE_SHIFT;
+
+	ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+			vma->vm_end - vma->vm_start, vma->vm_page_prot);
+
+	mutex_unlock(&vdpa->ops_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL(vdpa_mmap);
+
+int vdpa_open(struct mdev_device *mdev)
+{
+	return 0;
+}
+EXPORT_SYMBOL(vdpa_open);
+
+void vdpa_close(struct mdev_device *mdev)
+{
+}
+EXPORT_SYMBOL(vdpa_close);
+
+MODULE_VERSION("0.0.0");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Hardware vhost accelerator abstraction");
diff --git a/include/linux/vdpa_mdev.h b/include/linux/vdpa_mdev.h
new file mode 100644
index 000000000000..4bbdf7e2e712
--- /dev/null
+++ b/include/linux/vdpa_mdev.h
@@ -0,0 +1,72 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018-2019 Intel Corporation.
+ */
+
+#ifndef VDPA_MDEV_H
+#define VDPA_MDEV_H
+
+struct mdev_device;
+struct vdpa_dev;
+
+/*
+ * XXX: Any comments about the vDPA API design for drivers
+ *      would be appreciated!
+ */
+
+typedef int (*vdpa_start_device_t)(struct vdpa_dev *vdpa);
+typedef int (*vdpa_stop_device_t)(struct vdpa_dev *vdpa);
+typedef int (*vdpa_set_features_t)(struct vdpa_dev *vdpa);
+typedef int (*vdpa_set_eventfd_t)(struct vdpa_dev *vdpa, int queue_idx, int fd);
+typedef u64 (*vdpa_supported_features_t)(struct vdpa_dev *vdpa);
+typedef void (*vdpa_notify_device_t)(struct vdpa_dev *vdpa, int queue_idx);
+typedef u64 (*vdpa_get_notify_addr_t)(struct vdpa_dev *vdpa, int queue_idx);
+
+struct vdpa_device_ops {
+	vdpa_start_device_t		start;
+	vdpa_stop_device_t		stop;
+	vdpa_set_eventfd_t		set_eventfd;
+	vdpa_supported_features_t	supported_features;
+	vdpa_notify_device_t		notify;
+	vdpa_get_notify_addr_t		get_notify_addr;
+	vdpa_set_features_t		set_features;
+};
+
+struct vdpa_vring_info {
+	u64 desc_user_addr;
+	u64 used_user_addr;
+	u64 avail_user_addr;
+	u64 log_guest_addr;
+	u16 size;
+	u16 base;
+};
+
+struct vdpa_dev {
+	struct mdev_device *mdev;
+	struct mutex ops_lock;
+	int nr_vring;
+	u64 features;
+	u64 state;
+	bool pending_reply;
+	struct vhost_vfio_op pending;
+	const struct vdpa_device_ops *ops;
+	void *private;
+	int max_vrings;
+	uint64_t log_base;
+	uint64_t log_size;
+	struct vdpa_vring_info vring_info[0];
+};
+
+struct vdpa_dev *vdpa_alloc(struct mdev_device *mdev, void *private,
+			    int max_vrings);
+void vdpa_free(struct vdpa_dev *vdpa);
+ssize_t vdpa_read(struct mdev_device *mdev, char __user *buf,
+		  size_t count, loff_t *ppos);
+ssize_t vdpa_write(struct mdev_device *mdev, const char __user *buf,
+		   size_t count, loff_t *ppos);
+long vdpa_ioctl(struct mdev_device *mdev, unsigned int cmd, unsigned long arg);
+int vdpa_mmap(struct mdev_device *mdev, struct vm_area_struct *vma);
+int vdpa_open(struct mdev_device *mdev);
+void vdpa_close(struct mdev_device *mdev);
+
+#endif /* VDPA_MDEV_H */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8f10748dac79..6c5718ab7eeb 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -201,6 +201,7 @@  struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)	/* vfio-amba device */
 #define VFIO_DEVICE_FLAGS_CCW	(1 << 4)	/* vfio-ccw device */
 #define VFIO_DEVICE_FLAGS_AP	(1 << 5)	/* vfio-ap device */
+#define VFIO_DEVICE_FLAGS_VHOST	(1 << 6)	/* vfio-vhost device */
 	__u32	num_regions;	/* Max region index + 1 */
 	__u32	num_irqs;	/* Max IRQ index + 1 */
 };
@@ -217,6 +218,7 @@  struct vfio_device_info {
 #define VFIO_DEVICE_API_AMBA_STRING		"vfio-amba"
 #define VFIO_DEVICE_API_CCW_STRING		"vfio-ccw"
 #define VFIO_DEVICE_API_AP_STRING		"vfio-ap"
+#define VFIO_DEVICE_API_VHOST_STRING		"vfio-vhost"
 
 /**
  * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
@@ -573,6 +575,23 @@  enum {
 	VFIO_CCW_NUM_IRQS
 };
 
+/*
+ * The vfio-vhost bus driver makes use of the following fixed region and
+ * IRQ index mapping. Unimplemented regions return a size of zero.
+ * Unimplemented IRQ types return a count of zero.
+ */
+
+enum {
+	VFIO_VHOST_CONFIG_REGION_INDEX,
+	VFIO_VHOST_NOTIFY_REGION_INDEX,
+	VFIO_VHOST_NUM_REGIONS
+};
+
+enum {
+	VFIO_VHOST_VQ_IRQ_INDEX,
+	VFIO_VHOST_NUM_IRQS
+};
+
 /**
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
  *					      struct vfio_pci_hot_reset_info)
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 40d028eed645..ad95b90c5c05 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -116,4 +116,29 @@ 
 #define VHOST_VSOCK_SET_GUEST_CID	_IOW(VHOST_VIRTIO, 0x60, __u64)
 #define VHOST_VSOCK_SET_RUNNING		_IOW(VHOST_VIRTIO, 0x61, int)
 
+/* VHOST_DEVICE specific defines */
+
+#define VHOST_DEVICE_SET_STATE _IOW(VHOST_VIRTIO, 0x70, __u64)
+
+#define VHOST_DEVICE_S_STOPPED 0
+#define VHOST_DEVICE_S_RUNNING 1
+#define VHOST_DEVICE_S_MAX     2
+
+struct vhost_vfio_op {
+	__u64 request;
+	__u32 flags;
+	/* Flag values: */
+#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
+	__u32 size;
+	union {
+		__u64 u64;
+		struct vhost_vring_state state;
+		struct vhost_vring_addr addr;
+	} payload;
+};
+
+#define VHOST_VFIO_OP_HDR_SIZE \
+		((unsigned long)&((struct vhost_vfio_op *)NULL)->payload)
+#define VHOST_VFIO_OP_PAYLOAD_MAX_SIZE 1024 /* FIXME TBD */
+
 #endif
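
To make the "sample parent" discussion above a bit more concrete, here is
a rough, untested sketch of how a parent driver could plug into the vdpa_*
helpers added by this patch. Everything below (the sample_* names, the
exact mdev_parent_ops wiring, and the omitted type-group/registration
code) is an illustrative assumption, not part of the series:

/* sample_vdpa_parent.c: illustrative sketch only */

#include <linux/module.h>
#include <linux/mdev.h>
#include <linux/vhost.h>
#include <linux/vdpa_mdev.h>

#define SAMPLE_MAX_VRINGS	2

/* Hardware-specific callbacks, stubbed out here. */
static int sample_start(struct vdpa_dev *vdpa) { return 0; }
static int sample_stop(struct vdpa_dev *vdpa) { return 0; }
static int sample_set_features(struct vdpa_dev *vdpa) { return 0; }
static int sample_set_eventfd(struct vdpa_dev *vdpa, int qid, int fd) { return 0; }
static u64 sample_supported_features(struct vdpa_dev *vdpa) { return 0; }
static void sample_notify(struct vdpa_dev *vdpa, int qid) { }
static u64 sample_get_notify_addr(struct vdpa_dev *vdpa, int qid) { return 0; }

static const struct vdpa_device_ops sample_vdpa_ops = {
	.start			= sample_start,
	.stop			= sample_stop,
	.set_features		= sample_set_features,
	.set_eventfd		= sample_set_eventfd,
	.supported_features	= sample_supported_features,
	.notify			= sample_notify,
	.get_notify_addr	= sample_get_notify_addr,
};

static int sample_create(struct kobject *kobj, struct mdev_device *mdev)
{
	struct vdpa_dev *vdpa;

	vdpa = vdpa_alloc(mdev, NULL /* driver private data */,
			  SAMPLE_MAX_VRINGS);
	if (!vdpa)
		return -ENOMEM;

	vdpa->ops = &sample_vdpa_ops;
	mdev_set_drvdata(mdev, vdpa);
	return 0;
}

static int sample_remove(struct mdev_device *mdev)
{
	vdpa_free(mdev_get_drvdata(mdev));
	return 0;
}

/*
 * The rest of the mdev plumbing simply forwards to the vdpa_* helpers
 * exported by drivers/vhost/vdpa.c. Type-group setup and the
 * mdev_register_device() call in the parent's probe path are omitted.
 */
static const struct mdev_parent_ops sample_parent_ops = {
	.owner		= THIS_MODULE,
	.create		= sample_create,
	.remove		= sample_remove,
	.open		= vdpa_open,
	.release	= vdpa_close,
	.read		= vdpa_read,
	.write		= vdpa_write,
	.ioctl		= vdpa_ioctl,
	.mmap		= vdpa_mmap,
};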