From patchwork Fri May 29 08:02:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11578061 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5E830139A for ; Fri, 29 May 2020 08:03:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3E85F208E4 for ; Fri, 29 May 2020 08:03:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="JX1bPUyp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726575AbgE2IDi (ORCPT ); Fri, 29 May 2020 04:03:38 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:38106 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726310AbgE2IDd (ORCPT ); Fri, 29 May 2020 04:03:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1590739410; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=16IIq+gtyyOuTA/+k/mFaqIaYYtPCzydoYwI/b/+K1s=; b=JX1bPUyptFB31/1RA0pVqyYlU4Q7hn67MKt0kQpTuZ+59pVONP8WXlyMqSCaF627b9O1O8 E/3aKcH5fVqf9UtywLG+IfPg+drIDySZFSLANjY9+9c73HASSWcs4ji29xNn3HOFYfqebn p9Rpx8zzi0BEKxgjmWtvUbfs63m3x+A= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-487-m3IrS965MNWPb0j_hZqinw-1; Fri, 29 May 2020 04:03:26 -0400 X-MC-Unique: m3IrS965MNWPb0j_hZqinw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client 
certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9F6E8107ACCA; Fri, 29 May 2020 08:03:24 +0000 (UTC) Received: from jason-ThinkPad-X1-Carbon-6th.redhat.com (ovpn-13-231.pek2.redhat.com [10.72.13.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id 06E6199DE6; Fri, 29 May 2020 08:03:15 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, rob.miller@broadcom.com, lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com, shahafs@mellanox.com, hanand@xilinx.com, mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com, vmireyno@marvell.com, zhangweining@ruijie.com.cn, eli@mellanox.com Subject: [PATCH 1/6] vhost: allow device that does not depend on vhost worker Date: Fri, 29 May 2020 16:02:58 +0800 Message-Id: <20200529080303.15449-2-jasowang@redhat.com> In-Reply-To: <20200529080303.15449-1-jasowang@redhat.com> References: <20200529080303.15449-1-jasowang@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org A vDPA device currently relays the eventfd via the vhost worker. This is inefficient due to the latency of wakeup and scheduling, so this patch introduces a use_worker attribute for the vhost device. When use_worker is not set with vhost_dev_init(), vhost won't try to allocate a worker thread and the vhost_poll will be processed directly in the wakeup function. This helps vDPA since it removes the latency caused by the vhost worker. In my testing, it saves 0.2 ms in pings between VMs on the same host.
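To make the dispatch rule above concrete, here is a minimal user-space sketch of it. It is not kernel code: the model_* structures and functions are hypothetical stand-ins invented for this illustration, and the "worker" is reduced to a counter. Only the branch on use_worker mirrors what the patch adds to vhost_poll_wakeup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for vhost_work, vhost_dev and vhost_poll. */
struct model_work {
	void (*fn)(struct model_work *work);
	int runs;		/* how many times fn has run */
};

struct model_dev {
	bool use_worker;
	int queued;		/* work items handed to the imaginary worker */
};

struct model_poll {
	struct model_dev *dev;
	struct model_work work;
	unsigned int mask;	/* events this poll entry cares about */
};

/* Stand-in for a handle_kick style callback: just count the invocation. */
static void model_handle_kick(struct model_work *work)
{
	work->runs++;
}

static void model_poll_init(struct model_poll *poll, struct model_dev *dev,
			    unsigned int mask)
{
	poll->dev = dev;
	poll->mask = mask;
	poll->work.fn = model_handle_kick;
	poll->work.runs = 0;
}

/* Mirrors the branch added to vhost_poll_wakeup(). */
static int model_poll_wakeup(struct model_poll *poll, unsigned int events)
{
	assert(poll->work.fn != NULL);

	if (!(events & poll->mask))
		return 0;

	if (!poll->dev->use_worker)
		poll->work.fn(&poll->work);	/* run inline, no thread hop */
	else
		poll->dev->queued++;		/* defer to the worker */

	return 0;
}
```

With use_worker cleared, the callback runs in the context of whoever signalled the eventfd, so the callback presumably has to be safe to run there; the sketch does not model that constraint.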
Signed-off-by: Zhu Lingshan Signed-off-by: Jason Wang --- drivers/vhost/net.c | 2 +- drivers/vhost/scsi.c | 2 +- drivers/vhost/vdpa.c | 2 +- drivers/vhost/vhost.c | 38 +++++++++++++++++++++++++------------- drivers/vhost/vhost.h | 2 ++ drivers/vhost/vsock.c | 2 +- 6 files changed, 31 insertions(+), 17 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2927f02cc7e1..bf5e1d81ae25 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1326,7 +1326,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) } vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, UIO_MAXIOV + VHOST_NET_BATCH, - VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, + VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true, NULL); vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev); diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index c39952243fd3..0cbaa0b3893d 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1628,7 +1628,7 @@ static int vhost_scsi_open(struct inode *inode, struct file *f) vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick; } vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV, - VHOST_SCSI_WEIGHT, 0, NULL); + VHOST_SCSI_WEIGHT, 0, true, NULL); vhost_scsi_init_inflight(vs, NULL); diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 20476b505d99..6ff72289f488 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -696,7 +696,7 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep) vqs[i] = &v->vqs[i]; vqs[i]->handle_kick = handle_vq_kick; } - vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, + vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false, vhost_vdpa_process_iotlb_msg); dev->iotlb = vhost_iotlb_alloc(0, 0); diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d450e16c5c25..70105e045768 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -166,11 +166,16 @@ static int vhost_poll_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key) { struct vhost_poll 
*poll = container_of(wait, struct vhost_poll, wait); + struct vhost_work *work = &poll->work; if (!(key_to_poll(key) & poll->mask)) return 0; - vhost_poll_queue(poll); + if (!poll->dev->use_worker) + work->fn(work); + else + vhost_poll_queue(poll); + return 0; } @@ -454,6 +459,7 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq, void vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue **vqs, int nvqs, int iov_limit, int weight, int byte_weight, + bool use_worker, int (*msg_handler)(struct vhost_dev *dev, struct vhost_iotlb_msg *msg)) { @@ -471,6 +477,7 @@ void vhost_dev_init(struct vhost_dev *dev, dev->iov_limit = iov_limit; dev->weight = weight; dev->byte_weight = byte_weight; + dev->use_worker = use_worker; dev->msg_handler = msg_handler; init_llist_head(&dev->work_list); init_waitqueue_head(&dev->wait); @@ -549,18 +556,21 @@ long vhost_dev_set_owner(struct vhost_dev *dev) /* No owner, become one */ dev->mm = get_task_mm(current); dev->kcov_handle = kcov_common_handle(); - worker = kthread_create(vhost_worker, dev, "vhost-%d", current->pid); - if (IS_ERR(worker)) { - err = PTR_ERR(worker); - goto err_worker; - } + if (dev->use_worker) { + worker = kthread_create(vhost_worker, dev, + "vhost-%d", current->pid); + if (IS_ERR(worker)) { + err = PTR_ERR(worker); + goto err_worker; + } - dev->worker = worker; - wake_up_process(worker); /* avoid contributing to loadavg */ + dev->worker = worker; + wake_up_process(worker); /* avoid contributing to loadavg */ - err = vhost_attach_cgroups(dev); - if (err) - goto err_cgroup; + err = vhost_attach_cgroups(dev); + if (err) + goto err_cgroup; + } err = vhost_dev_alloc_iovecs(dev); if (err) @@ -568,8 +578,10 @@ long vhost_dev_set_owner(struct vhost_dev *dev) return 0; err_cgroup: - kthread_stop(worker); - dev->worker = NULL; + if (dev->worker) { + kthread_stop(dev->worker); + dev->worker = NULL; + } err_worker: if (dev->mm) mmput(dev->mm); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 
f8403bd46b85..0feb6701e273 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -154,6 +154,7 @@ struct vhost_dev { int weight; int byte_weight; u64 kcov_handle; + bool use_worker; int (*msg_handler)(struct vhost_dev *dev, struct vhost_iotlb_msg *msg); }; @@ -161,6 +162,7 @@ struct vhost_dev { bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len); void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs, int iov_limit, int weight, int byte_weight, + bool use_worker, int (*msg_handler)(struct vhost_dev *dev, struct vhost_iotlb_msg *msg)); long vhost_dev_set_owner(struct vhost_dev *dev); diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index e36aaf9ba7bd..2eb85c42bac4 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -621,7 +621,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs), UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT, - VHOST_VSOCK_WEIGHT, NULL); + VHOST_VSOCK_WEIGHT, true, NULL); file->private_data = vsock; spin_lock_init(&vsock->send_pkt_list_lock); From patchwork Fri May 29 08:02:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11578063 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A24992A for ; Fri, 29 May 2020 08:03:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 12820207F5 for ; Fri, 29 May 2020 08:03:44 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="U72m7Wsp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726627AbgE2IDm (ORCPT ); Fri, 29 May 2020 04:03:42 -0400 Received: from us-smtp-1.mimecast.com 
([207.211.31.81]:52547 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725928AbgE2IDi (ORCPT ); Fri, 29 May 2020 04:03:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1590739417; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qAIlsWcWE5/JkQImrrDajyjydtkLArbDnm909F5raoM=; b=U72m7Wsp41utWh1eyAgASujZdV2oc3EwNKSTcdOSHoLXIBl6He+1W5vzZh0mqD47RM509o ggDl+gdgJayYZmLAeLLjGtaZD/jllKRBffyF9Q/jn62M0jY7D1qQOaEnWfmnkMNr5gYXMm JkcPgyk1VOkalRVfWn2ltwsca+q0weI= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-153-mJ9VqgnuPhSH-R82-jFFpA-1; Fri, 29 May 2020 04:03:36 -0400 X-MC-Unique: mJ9VqgnuPhSH-R82-jFFpA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EDDD081CBEF; Fri, 29 May 2020 08:03:33 +0000 (UTC) Received: from jason-ThinkPad-X1-Carbon-6th.redhat.com (ovpn-13-231.pek2.redhat.com [10.72.13.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id CEA6999DE6; Fri, 29 May 2020 08:03:24 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, rob.miller@broadcom.com, lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com, shahafs@mellanox.com, hanand@xilinx.com, mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com, vmireyno@marvell.com, zhangweining@ruijie.com.cn, eli@mellanox.com Subject: [PATCH 2/6] vhost: use mmgrab() instead of mmget() for non worker device Date: Fri, 29 May 2020 16:02:59 
+0800 Message-Id: <20200529080303.15449-3-jasowang@redhat.com> In-Reply-To: <20200529080303.15449-1-jasowang@redhat.com> References: <20200529080303.15449-1-jasowang@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org For a device that does not use the vhost worker and use_mm(), mmget() is too heavyweight and may cause trouble when implementing mmap() support for vDPA devices. This is because a reference to the address space is held via mmget() (through get_task_mm()) in vhost_dev_set_owner() and a reference to the file is held in mmap(). This means that when the process exits, the mm cannot be released and thus we cannot release the file. This patch uses mmgrab() instead of mmget(), which allows the address space to be destroyed on process exit without releasing the mm structure itself. This is sufficient for vDPA devices, which pin user pages and do not depend on the address space to work. Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 42 ++++++++++++++++++++++++++++++++++-------- 1 file changed, 34 insertions(+), 8 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 70105e045768..9642938a7e7c 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -541,6 +541,36 @@ bool vhost_dev_has_owner(struct vhost_dev *dev) } EXPORT_SYMBOL_GPL(vhost_dev_has_owner); +static void vhost_attach_mm(struct vhost_dev *dev) +{ + /* No owner, become one */ + if (dev->use_worker) { + dev->mm = get_task_mm(current); + } else { + /* vDPA device does not use worker thread, so there's + * no need to hold the address space for mm. This helps + * to avoid deadlock in the case of mmap() which may + * hold the refcnt of the file and depends on release + * method to remove vma. 
+ */ + dev->mm = current->mm; + mmgrab(dev->mm); + } +} + +static void vhost_detach_mm(struct vhost_dev *dev) +{ + if (!dev->mm) + return; + + if (dev->use_worker) + mmput(dev->mm); + else + mmdrop(dev->mm); + + dev->mm = NULL; +} + /* Caller should have device mutex */ long vhost_dev_set_owner(struct vhost_dev *dev) { @@ -553,8 +583,8 @@ long vhost_dev_set_owner(struct vhost_dev *dev) goto err_mm; } - /* No owner, become one */ - dev->mm = get_task_mm(current); + vhost_attach_mm(dev); + dev->kcov_handle = kcov_common_handle(); if (dev->use_worker) { worker = kthread_create(vhost_worker, dev, @@ -583,9 +613,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev) dev->worker = NULL; } err_worker: - if (dev->mm) - mmput(dev->mm); - dev->mm = NULL; + vhost_detach_mm(dev); dev->kcov_handle = 0; err_mm: return err; @@ -682,9 +710,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev) dev->worker = NULL; dev->kcov_handle = 0; } - if (dev->mm) - mmput(dev->mm); - dev->mm = NULL; + vhost_detach_mm(dev); } EXPORT_SYMBOL_GPL(vhost_dev_cleanup); From patchwork Fri May 29 08:03:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11578069 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 90590139A for ; Fri, 29 May 2020 08:04:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 77ACA208E4 for ; Fri, 29 May 2020 08:04:10 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PfrgXSMT" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726799AbgE2IEF (ORCPT ); Fri, 29 May 2020 04:04:05 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:49071 "EHLO us-smtp-delivery-1.mimecast.com" 
rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726775AbgE2IDr (ORCPT ); Fri, 29 May 2020 04:03:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1590739426; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HcTPYCfr5sCNE0h8qLNyyGd6WCD8Ts7vuwtEmnD7m9Y=; b=PfrgXSMTdpRgu60rWRDgTUYmXix6ymJv9+CSTGrElOsj/aA/V2Dxv1c894GliClroQeKBU 7M8I0dnz85VrJ6U8aNxEq6CQghN63DjWHX21BSNQnbI1Bfp+UkjB44NfTM/Uf+4AHpQx7r 6MptN9ElBFirQi58Qlv7EXufJvgdu64= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-485-gUMe24RiOTij-A2w3OfVrg-1; Fri, 29 May 2020 04:03:42 -0400 X-MC-Unique: gUMe24RiOTij-A2w3OfVrg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 81F34464; Fri, 29 May 2020 08:03:40 +0000 (UTC) Received: from jason-ThinkPad-X1-Carbon-6th.redhat.com (ovpn-13-231.pek2.redhat.com [10.72.13.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7F4BD99DE6; Fri, 29 May 2020 08:03:34 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, rob.miller@broadcom.com, lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com, shahafs@mellanox.com, hanand@xilinx.com, mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com, vmireyno@marvell.com, zhangweining@ruijie.com.cn, eli@mellanox.com Subject: [PATCH 3/6] vdpa: introduce get_vq_notification method Date: Fri, 29 May 2020 16:03:00 +0800 Message-Id: <20200529080303.15449-4-jasowang@redhat.com> In-Reply-To: 
<20200529080303.15449-1-jasowang@redhat.com> References: <20200529080303.15449-1-jasowang@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch introduces a new method in vdpa_config_ops that reports the physical address and the size of the doorbell for a specific virtqueue. It will be used by future patches that map the doorbell to userspace. Signed-off-by: Jason Wang --- include/linux/vdpa.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 5453af87a33e..239db794357c 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -17,6 +17,16 @@ struct vdpa_callback { void *private; }; +/** + * vDPA notification area + * @addr: base address of the notification area + * @size: size of the notification area + */ +struct vdpa_notification_area { + resource_size_t addr; + resource_size_t size; +}; + /** * vDPA device - representation of a vDPA device * @dev: underlying device @@ -73,6 +83,10 @@ struct vdpa_device { * @vdev: vdpa device * @idx: virtqueue index * Returns virtqueue state (last_avail_idx) + * @get_vq_notification: Get the notification area for a virtqueue + * @vdev: vdpa device + * @idx: virtqueue index + * Returns the notification area * @get_vq_align: Get the virtqueue align requirement * for the device * @vdev: vdpa device @@ -162,6 +176,8 @@ struct vdpa_config_ops { bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx); int (*set_vq_state)(struct vdpa_device *vdev, u16 idx, u64 state); u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx); + struct vdpa_notification_area + (*get_vq_notification)(struct vdpa_device *vdev, u16 idx); /* Device ops */ u32 (*get_vq_align)(struct vdpa_device *vdev); From patchwork Fri May 29 08:03:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: 
Jason Wang X-Patchwork-Id: 11578067 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9CE8F92A for ; Fri, 29 May 2020 08:04:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 84193208E4 for ; Fri, 29 May 2020 08:04:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="YxyrmXsa" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726767AbgE2ID7 (ORCPT ); Fri, 29 May 2020 04:03:59 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:29709 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726862AbgE2IDy (ORCPT ); Fri, 29 May 2020 04:03:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1590739432; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8h+GPNLosU896j4lmpcz+nuuQdF3yl/18U4jyRqflms=; b=YxyrmXsafHi38LdBpFIG0B/s74ptS5IHPs74L9XvrSJbRAVKCYk12i3x1hXYH36+x09qtC emwxQbyA0J3g0hBYnaTUuxTTMOBGrhgR+br2mcgfqa2WjSVVHrLBlZMnmibwqicT+03Y0Z xzxYKx6nmi09mnhcNclRg2um/vULPuA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-235-t9xoE6D0NPuqiutyuhmQCg-1; Fri, 29 May 2020 04:03:49 -0400 X-MC-Unique: t9xoE6D0NPuqiutyuhmQCg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id F0A0780183C; Fri, 29 May 2020 08:03:46 +0000 (UTC) Received: from 
jason-ThinkPad-X1-Carbon-6th.redhat.com (ovpn-13-231.pek2.redhat.com [10.72.13.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1D8DBA1038; Fri, 29 May 2020 08:03:40 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, rob.miller@broadcom.com, lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com, shahafs@mellanox.com, hanand@xilinx.com, mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com, vmireyno@marvell.com, zhangweining@ruijie.com.cn, eli@mellanox.com Subject: [PATCH 4/6] vhost_vdpa: support doorbell mapping via mmap Date: Fri, 29 May 2020 16:03:01 +0800 Message-Id: <20200529080303.15449-5-jasowang@redhat.com> In-Reply-To: <20200529080303.15449-1-jasowang@redhat.com> References: <20200529080303.15449-1-jasowang@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Currently the doorbell is relayed via eventfd, which may have significant overhead because of the cost of vmexits or syscalls. This patch introduces mmap() based doorbell mapping, which can eliminate the overhead caused by vmexits or syscalls. To ease the userspace modeling of the doorbell layout (usually virtio-pci), this patch starts from a doorbell-per-page model. Vhost-vdpa only supports a hardware doorbell that sits at the boundary of a page and does not share the page with other registers. The doorbell of each virtqueue must be mapped separately; pgoff is the index of the virtqueue. This allows userspace to map a subset of the doorbells, which may be useful for implementing a software-assisted virtqueue (control vq) in the future.
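From the userspace side, the rules above could be exercised roughly as in the sketch below. It is an illustration under assumptions, not part of the patch: doorbell_offset() and map_vq_doorbell() are hypothetical helper names, and vdpa_fd is assumed to be an already opened vhost-vdpa character device. The mapping is one page long, shared and write-only, and the file offset encodes the virtqueue index as a page offset, which is how pgoff ends up being the index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: pgoff is the virtqueue index, so the byte offset
 * passed to mmap() is the index multiplied by the page size.
 */
static off_t doorbell_offset(uint16_t vq_index, long page_size)
{
	return (off_t)vq_index * page_size;
}

/* Hypothetical helper: map the doorbell page of one virtqueue.
 * vdpa_fd is assumed to be an open vhost-vdpa device file descriptor.
 * Returns NULL if the mapping is refused.
 */
static volatile uint16_t *map_vq_doorbell(int vdpa_fd, uint16_t vq_index)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *addr;

	assert(page_size > 0);

	/* Exactly one page, shared, and write-only: a mapping that asks
	 * for read access or a different length is rejected.
	 */
	addr = mmap(NULL, (size_t)page_size, PROT_WRITE, MAP_SHARED,
		    vdpa_fd, doorbell_offset(vq_index, page_size));
	if (addr == MAP_FAILED)
		return NULL;

	return (volatile uint16_t *)addr;
}
```

A caller would map each virtqueue it wants to kick directly and fall back to the eventfd path whenever map_vq_doorbell() returns NULL. What value to write into the mapped page is device specific; for a virtio-pci style doorbell it is typically the 16-bit queue index, but that is an assumption of this note rather than something this patch defines.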
Signed-off-by: Jason Wang Reported-by: kbuild test robot --- drivers/vhost/vdpa.c | 59 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 6ff72289f488..bbe23cea139a 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -741,12 +742,70 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep) return 0; } +static vm_fault_t vhost_vdpa_fault(struct vm_fault *vmf) +{ + struct vhost_vdpa *v = vmf->vma->vm_file->private_data; + struct vdpa_device *vdpa = v->vdpa; + const struct vdpa_config_ops *ops = vdpa->config; + struct vdpa_notification_area notify; + struct vm_area_struct *vma = vmf->vma; + u16 index = vma->vm_pgoff; + + notify = ops->get_vq_notification(vdpa, index); + + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + if (remap_pfn_range(vma, vmf->address & PAGE_MASK, + notify.addr >> PAGE_SHIFT, PAGE_SIZE, + vma->vm_page_prot)) + return VM_FAULT_SIGBUS; + + return VM_FAULT_NOPAGE; +} + +static const struct vm_operations_struct vhost_vdpa_vm_ops = { + .fault = vhost_vdpa_fault, +}; + +static int vhost_vdpa_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct vhost_vdpa *v = vma->vm_file->private_data; + struct vdpa_device *vdpa = v->vdpa; + const struct vdpa_config_ops *ops = vdpa->config; + struct vdpa_notification_area notify; + int index = vma->vm_pgoff; + + if (vma->vm_end - vma->vm_start != PAGE_SIZE) + return -EINVAL; + if ((vma->vm_flags & VM_SHARED) == 0) + return -EINVAL; + if (vma->vm_flags & VM_READ) + return -EINVAL; + if (index > 65535) + return -EINVAL; + if (!ops->get_vq_notification) + return -ENOTSUPP; + + /* To be safe and easily modelled by userspace, We only + * support the doorbell which sits on the page boundary and + * does not share the page with other registers. 
+ */ + notify = ops->get_vq_notification(vdpa, index); + if (notify.addr & (PAGE_SIZE - 1)) + return -EINVAL; + if (vma->vm_end - vma->vm_start != notify.size) + return -ENOTSUPP; + + vma->vm_ops = &vhost_vdpa_vm_ops; + return 0; +} + static const struct file_operations vhost_vdpa_fops = { .owner = THIS_MODULE, .open = vhost_vdpa_open, .release = vhost_vdpa_release, .write_iter = vhost_vdpa_chr_write_iter, .unlocked_ioctl = vhost_vdpa_unlocked_ioctl, + .mmap = vhost_vdpa_mmap, .compat_ioctl = compat_ptr_ioctl, }; From patchwork Fri May 29 08:03:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11578071 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F33E92A for ; Fri, 29 May 2020 08:04:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3637F208A7 for ; Fri, 29 May 2020 08:04:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="IBboptKc" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726901AbgE2IEL (ORCPT ); Fri, 29 May 2020 04:04:11 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:21018 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726878AbgE2IEH (ORCPT ); Fri, 29 May 2020 04:04:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1590739444; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4n31HvUsSFkIbDDQwLElleJUWe581hw1vDZ+++YCvBE=; 
b=IBboptKc5PqBGUowOLY4PChOdttXL2IPBjRtLTCQzex3W3Xq8ehjPWpmv7W7NNXE/3OaZ1 iqJSl4iSChEoXh9uuX7tzx+kQULkPZcxRv6n5ndii+lE/nAD0SGvmxmHQiOIi77n6+vBGZ tS+k3mFO6RC++H9X3VhbtUWInbeF0f4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-459-6RhjyjSWNCKcbd2dPWROcw-1; Fri, 29 May 2020 04:04:00 -0400 X-MC-Unique: 6RhjyjSWNCKcbd2dPWROcw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 991B680183C; Fri, 29 May 2020 08:03:58 +0000 (UTC) Received: from jason-ThinkPad-X1-Carbon-6th.redhat.com (ovpn-13-231.pek2.redhat.com [10.72.13.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id 15C1E99DE6; Fri, 29 May 2020 08:03:47 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, rob.miller@broadcom.com, lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com, shahafs@mellanox.com, hanand@xilinx.com, mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com, vmireyno@marvell.com, zhangweining@ruijie.com.cn, eli@mellanox.com Subject: [PATCH 5/6] vdpa: introduce virtio pci driver Date: Fri, 29 May 2020 16:03:02 +0800 Message-Id: <20200529080303.15449-6-jasowang@redhat.com> In-Reply-To: <20200529080303.15449-1-jasowang@redhat.com> References: <20200529080303.15449-1-jasowang@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch introduces a vDPA driver for virtio-pci devices. It bridges virtio-pci control commands to the vDPA bus. It will be used for developing new features for both the software vDPA framework and hardware vDPA features.
Compared to vdpa_sim, it has several advantages: - it is a real device driver, which allows us to play with real hardware features - it is type independent instead of networking specific Note that since the virtio specification does not support getting/restoring virtqueue state, we cannot use this driver for VMs. This can be addressed by extending the virtio specification. Signed-off-by: Jason Wang --- drivers/vdpa/Kconfig | 6 + drivers/vdpa/Makefile | 1 + drivers/vdpa/vp_vdpa/Makefile | 2 + drivers/vdpa/vp_vdpa/vp_vdpa.c | 583 +++++++++++++++++++++++++++++++++ 4 files changed, 592 insertions(+) create mode 100644 drivers/vdpa/vp_vdpa/Makefile create mode 100644 drivers/vdpa/vp_vdpa/vp_vdpa.c diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index e8140065c8a5..3f1bf8b6723d 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -28,4 +28,10 @@ config IFCVF To compile this driver as a module, choose M here: the module will be called ifcvf. +config VP_VDPA + tristate "Virtio PCI bridge vDPA driver" + depends on PCI_MSI + help + This kernel module bridges virtio PCI devices to the vDPA bus. 
+ endif # VDPA diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index 8bbb686ca7a2..37d00f49b3bf 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -2,3 +2,4 @@ obj-$(CONFIG_VDPA) += vdpa.o obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ obj-$(CONFIG_IFCVF) += ifcvf/ +obj-$(CONFIG_VP_VDPA) += vp_vdpa/ diff --git a/drivers/vdpa/vp_vdpa/Makefile b/drivers/vdpa/vp_vdpa/Makefile new file mode 100644 index 000000000000..231088d3af7d --- /dev/null +++ b/drivers/vdpa/vp_vdpa/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_VP_VDPA) += vp_vdpa.o diff --git a/drivers/vdpa/vp_vdpa/vp_vdpa.c b/drivers/vdpa/vp_vdpa/vp_vdpa.c new file mode 100644 index 000000000000..e59c310e2156 --- /dev/null +++ b/drivers/vdpa/vp_vdpa/vp_vdpa.c @@ -0,0 +1,583 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * vDPA bridge driver for modern virtio-pci device + * + * Copyright (c) 2020, Red Hat Inc. All rights reserved. + * Author: Jason Wang + * + * Based on virtio_pci_modern.c. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +/* TBD: read from config space */ +#define VP_VDPA_MAX_QUEUE 2 +#define VP_VDPA_DRIVER_NAME "vp_vdpa" + +#define VP_VDPA_FEATURES \ + (1ULL << VIRTIO_F_ANY_LAYOUT) | \ + (1ULL << VIRTIO_F_VERSION_1) | \ + (1ULL << VIRTIO_F_ORDER_PLATFORM) | \ + (1ULL << VIRTIO_F_IOMMU_PLATFORM) + +struct vp_vring { + void __iomem *notify; + char msix_name[256]; + struct vdpa_callback cb; + int irq; +}; + +struct vp_vdpa { + struct vdpa_device vdpa; + struct pci_dev *pdev; + + struct virtio_device_id id; + + struct vp_vring vring[VP_VDPA_MAX_QUEUE]; + + /* The IO mapping for the PCI config space */ + void __iomem * const *base; + struct virtio_pci_common_cfg __iomem *common; + void __iomem *device; + /* Base of vq notifications */ + void __iomem *notify; + + /* Multiplier for queue_notify_off. 
*/ + u32 notify_off_multiplier; + + int modern_bars; + int vectors; +}; + +static struct vp_vdpa *vdpa_to_vp(struct vdpa_device *vdpa) +{ + return container_of(vdpa, struct vp_vdpa, vdpa); +} + +/* + * Type-safe wrappers for io accesses. + * Use these to enforce at compile time the following spec requirement: + * + * The driver MUST access each field using the “natural” access + * method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses + * for 16-bit fields and 8-bit accesses for 8-bit fields. + */ +static inline u8 vp_ioread8(u8 __iomem *addr) +{ + return ioread8(addr); +} +static inline u16 vp_ioread16 (__le16 __iomem *addr) +{ + return ioread16(addr); +} + +static inline u32 vp_ioread32(__le32 __iomem *addr) +{ + return ioread32(addr); +} + +static inline void vp_iowrite8(u8 value, u8 __iomem *addr) +{ + iowrite8(value, addr); +} + +static inline void vp_iowrite16(u16 value, __le16 __iomem *addr) +{ + iowrite16(value, addr); +} + +static inline void vp_iowrite32(u32 value, __le32 __iomem *addr) +{ + iowrite32(value, addr); +} + +static void vp_iowrite64_twopart(u64 val, + __le32 __iomem *lo, __le32 __iomem *hi) +{ + vp_iowrite32((u32)val, lo); + vp_iowrite32(val >> 32, hi); +} + +static int find_capability(struct pci_dev *dev, u8 cfg_type, + u32 ioresource_types, int *bars) +{ + int pos; + + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR); + pos > 0; + pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) { + u8 type, bar; + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap, + cfg_type), + &type); + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap, + bar), + &bar); + + /* Ignore structures with reserved BAR values */ + if (bar > 0x5) + continue; + + if (type == cfg_type) { + if (pci_resource_len(dev, bar) && + pci_resource_flags(dev, bar) & ioresource_types) { + *bars |= (1 << bar); + return pos; + } + } + } + return 0; +} + +static void __iomem *map_capability(struct vp_vdpa *vp_vdpa, int off) +{ + struct pci_dev 
*pdev = vp_vdpa->pdev; + u32 offset; + u8 bar; + + pci_read_config_byte(pdev, + off + offsetof(struct virtio_pci_cap, bar), + &bar); + pci_read_config_dword(pdev, + off + offsetof(struct virtio_pci_cap, offset), + &offset); + + return vp_vdpa->base[bar] + offset; +} + +static u64 vp_vdpa_get_features(struct vdpa_device *vdpa) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + u64 features; + + vp_iowrite32(0, &vp_vdpa->common->device_feature_select); + features = vp_ioread32(&vp_vdpa->common->device_feature); + vp_iowrite32(1, &vp_vdpa->common->device_feature_select); + features |= ((u64)vp_ioread32(&vp_vdpa->common->device_feature) << 32); + features &= VP_VDPA_FEATURES; + + return features; +} + +static int vp_vdpa_set_features(struct vdpa_device *vdpa, u64 features) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + vp_iowrite32(0, &vp_vdpa->common->guest_feature_select); + vp_iowrite32((u32)features, &vp_vdpa->common->guest_feature); + vp_iowrite32(1, &vp_vdpa->common->guest_feature_select); + vp_iowrite32(features >> 32, &vp_vdpa->common->guest_feature); + + return 0; +} + +static u8 vp_vdpa_get_status(struct vdpa_device *vdpa) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + return vp_ioread8(&vp_vdpa->common->device_status); +} + +static void vp_vdpa_free_irq(struct vp_vdpa *vp_vdpa) +{ + struct pci_dev *pdev = vp_vdpa->pdev; + int i; + + for (i = 0; i < VP_VDPA_MAX_QUEUE; i++) { + if (vp_vdpa->vring[i].irq != -1) { + vp_iowrite16(i, &vp_vdpa->common->queue_select); + vp_iowrite16(VIRTIO_MSI_NO_VECTOR, + &vp_vdpa->common->queue_msix_vector); + devm_free_irq(&pdev->dev, vp_vdpa->vring[i].irq, + &vp_vdpa->vring[i]); + vp_vdpa->vring[i].irq = -1; + } + } + + if (vp_vdpa->vectors) { + pci_free_irq_vectors(pdev); + vp_vdpa->vectors = 0; + } +} + +static irqreturn_t vp_vdpa_intr_handler(int irq, void *arg) +{ + struct vp_vring *vring = arg; + + if (vring->cb.callback) + return vring->cb.callback(vring->cb.private); + + return IRQ_HANDLED; +} + +static int 
vp_vdpa_request_irq(struct vp_vdpa *vp_vdpa) +{ + struct pci_dev *pdev = vp_vdpa->pdev; + int i, ret, irq; + + ret = pci_alloc_irq_vectors(pdev, VP_VDPA_MAX_QUEUE, + VP_VDPA_MAX_QUEUE, PCI_IRQ_MSIX); + if (ret != VP_VDPA_MAX_QUEUE) { + dev_err(&pdev->dev, "vp_vdpa: fail to allocate irq vectors\n"); + return ret; + } + + vp_vdpa->vectors = VP_VDPA_MAX_QUEUE; + + for (i = 0; i < VP_VDPA_MAX_QUEUE; i++) { + snprintf(vp_vdpa->vring[i].msix_name, 256, + "vp-vdpa[%s]-%d\n", pci_name(pdev), i); + irq = pci_irq_vector(pdev, i); + ret = devm_request_irq(&pdev->dev, irq, + vp_vdpa_intr_handler, + 0, vp_vdpa->vring[i].msix_name, + &vp_vdpa->vring[i]); + if (ret) { + dev_err(&pdev->dev, "vp_vdpa: fail to request irq for vq %d\n", i); + goto err; + } + vp_iowrite16(i, &vp_vdpa->common->queue_select); + vp_iowrite16(i, &vp_vdpa->common->queue_msix_vector); + vp_vdpa->vring[i].irq = irq; + } + + return 0; +err: + vp_vdpa_free_irq(vp_vdpa); + return ret; +} + +static void vp_vdpa_set_status(struct vdpa_device *vdpa, u8 status) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + u8 s = vp_vdpa_get_status(vdpa); + + if (status & VIRTIO_CONFIG_S_DRIVER_OK && + !(s & VIRTIO_CONFIG_S_DRIVER_OK)) { + vp_vdpa_request_irq(vp_vdpa); + } + + vp_iowrite8(status, &vp_vdpa->common->device_status); + + if (!(status & VIRTIO_CONFIG_S_DRIVER_OK) && + (s & VIRTIO_CONFIG_S_DRIVER_OK)) + vp_vdpa_free_irq(vp_vdpa); +} + +static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + return vp_ioread16(&vp_vdpa->common->queue_size); +} + +static u64 vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid) +{ + return 0; +} + +static int vp_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 qid, + u64 num) +{ + /* Note that this is not supported by virtio specification, so + * we return -ENOTSUPP here. This means we can't support live + * migration, vhost device start/stop. 
+ */ + + return -ENOTSUPP; +} + +static void vp_vdpa_set_vq_cb(struct vdpa_device *vdpa, u16 qid, + struct vdpa_callback *cb) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + vp_vdpa->vring[qid].cb = *cb; +} + +static void vp_vdpa_set_vq_ready(struct vdpa_device *vdpa, + u16 qid, bool ready) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + vp_iowrite16(qid, &vp_vdpa->common->queue_select); + vp_iowrite16(ready, &vp_vdpa->common->queue_enable); +} + +static bool vp_vdpa_get_vq_ready(struct vdpa_device *vdpa, u16 qid) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + vp_iowrite16(qid, &vp_vdpa->common->queue_select); + + return vp_ioread16(&vp_vdpa->common->queue_enable); +} + +static void vp_vdpa_set_vq_num(struct vdpa_device *vdpa, u16 qid, + u32 num) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + vp_iowrite16(num, &vp_vdpa->common->queue_size); +} + +static int vp_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 qid, + u64 desc_area, u64 driver_area, + u64 device_area) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + struct virtio_pci_common_cfg __iomem *cfg = vp_vdpa->common; + + vp_iowrite16(qid, &cfg->queue_select); + vp_iowrite64_twopart(desc_area, + &cfg->queue_desc_lo, &cfg->queue_desc_hi); + vp_iowrite64_twopart(driver_area, + &cfg->queue_avail_lo, &cfg->queue_avail_hi); + vp_iowrite64_twopart(device_area, + &cfg->queue_used_lo, &cfg->queue_used_hi); + + return 0; +} + +static void vp_vdpa_kick_vq(struct vdpa_device *vdpa, u16 qid) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + vp_iowrite16(qid, vp_vdpa->vring[qid].notify); +} + +static u32 vp_vdpa_get_generation(struct vdpa_device *vdpa) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + return vp_ioread8(&vp_vdpa->common->config_generation); +} + +static u32 vp_vdpa_get_device_id(struct vdpa_device *vdpa) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + return vp_vdpa->id.device; +} + +static u32 vp_vdpa_get_vendor_id(struct vdpa_device *vdpa) +{ + struct vp_vdpa *vp_vdpa = 
vdpa_to_vp(vdpa); + + return vp_vdpa->id.vendor; +} + +static u32 vp_vdpa_get_vq_align(struct vdpa_device *vdpa) +{ + return PAGE_SIZE; +} + +static void vp_vdpa_get_config(struct vdpa_device *vdpa, + unsigned int offset, + void *buf, unsigned int len) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + u8 old, new; + u8 *p; + int i; + + do { + old = vp_ioread8(&vp_vdpa->common->config_generation); + p = buf; + for (i = 0; i < len; i++) + *p++ = vp_ioread8(vp_vdpa->device + offset + i); + + new = vp_ioread8(&vp_vdpa->common->config_generation); + } while (old != new); +} + +static void vp_vdpa_set_config(struct vdpa_device *vdpa, + unsigned int offset, const void *buf, + unsigned int len) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + const u8 *p = buf; + int i; + + for (i = 0; i < len; i++) + vp_iowrite8(*p++, vp_vdpa->device + offset + i); +} + +static void vp_vdpa_set_config_cb(struct vdpa_device *vdpa, + struct vdpa_callback *cb) +{ + /* We don't support config interrupt */ +} + +static const struct vdpa_config_ops vp_vdpa_ops = { + .get_features = vp_vdpa_get_features, + .set_features = vp_vdpa_set_features, + .get_status = vp_vdpa_get_status, + .set_status = vp_vdpa_set_status, + .get_vq_num_max = vp_vdpa_get_vq_num_max, + .get_vq_state = vp_vdpa_get_vq_state, + .set_vq_state = vp_vdpa_set_vq_state, + .set_vq_cb = vp_vdpa_set_vq_cb, + .set_vq_ready = vp_vdpa_set_vq_ready, + .get_vq_ready = vp_vdpa_get_vq_ready, + .set_vq_num = vp_vdpa_set_vq_num, + .set_vq_address = vp_vdpa_set_vq_address, + .kick_vq = vp_vdpa_kick_vq, + .get_generation = vp_vdpa_get_generation, + .get_device_id = vp_vdpa_get_device_id, + .get_vendor_id = vp_vdpa_get_vendor_id, + .get_vq_align = vp_vdpa_get_vq_align, + .get_config = vp_vdpa_get_config, + .set_config = vp_vdpa_set_config, + .set_config_cb = vp_vdpa_set_config_cb, +}; + +static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct device *dev = &pdev->dev; + struct vp_vdpa *vp_vdpa; + int 
common, notify, device, ret, i; + struct virtio_device_id virtio_id; + u16 notify_off; + + /* We only own devices >= 0x1000 and <= 0x107f: leave the rest. */ + if (pdev->device < 0x1000 || pdev->device > 0x107f) + return -ENODEV; + + if (pdev->device < 0x1040) { + /* Transitional devices: use the PCI subsystem device id as + * virtio device id, same as legacy driver always did. + */ + virtio_id.device = pdev->subsystem_device; + } else { + /* Modern devices: simply use PCI device id, but start from 0x1040. */ + virtio_id.device = pdev->device - 0x1040; + } + virtio_id.vendor = pdev->subsystem_vendor; + + ret = pcim_enable_device(pdev); + if (ret) { + dev_err(dev, "vp_vdpa: Fail to enable PCI device\n"); + return ret; + } + + vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa, + dev, &vp_vdpa_ops); + if (vp_vdpa == NULL) { + dev_err(dev, "vp_vdpa: Failed to allocate vDPA structure\n"); + return -ENOMEM; + } + + pci_set_master(pdev); + pci_set_drvdata(pdev, vp_vdpa); + + vp_vdpa->pdev = pdev; + vp_vdpa->vdpa.dma_dev = &pdev->dev; + + common = find_capability(pdev, VIRTIO_PCI_CAP_COMMON_CFG, + IORESOURCE_IO | IORESOURCE_MEM, + &vp_vdpa->modern_bars); + if (!common) { + dev_err(&pdev->dev, + "vp_vdpa: legacy device is not supported\n"); + ret = -ENODEV; + goto err; + } + + notify = find_capability(pdev, VIRTIO_PCI_CAP_NOTIFY_CFG, + IORESOURCE_IO | IORESOURCE_MEM, + &vp_vdpa->modern_bars); + if (!notify) { + dev_err(&pdev->dev, + "vp_vdpa: missing notification capabilities\n"); + ret = -EINVAL; + goto err; + } + + device = find_capability(pdev, VIRTIO_PCI_CAP_DEVICE_CFG, + IORESOURCE_IO | IORESOURCE_MEM, + &vp_vdpa->modern_bars); + if (!device) { + dev_err(&pdev->dev, + "vp_vdpa: missing device capabilities\n"); + ret = -EINVAL; + goto err; + } + + ret = pcim_iomap_regions(pdev, vp_vdpa->modern_bars, + VP_VDPA_DRIVER_NAME); + if (ret) + goto err; + + vp_vdpa->base = pcim_iomap_table(pdev); + + ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)); + if (ret) + ret 
= dma_set_mask_and_coherent(&pdev->dev, + DMA_BIT_MASK(32)); + if (ret) + dev_warn(&pdev->dev, "Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.\n"); + + vp_vdpa->device = map_capability(vp_vdpa, device); + vp_vdpa->notify = map_capability(vp_vdpa, notify); + vp_vdpa->common = map_capability(vp_vdpa, common); + vp_vdpa->id = virtio_id; + + ret = vdpa_register_device(&vp_vdpa->vdpa); + if (ret) { + dev_err(&pdev->dev, "Failed to register to vdpa bus\n"); + goto err; + } + + pci_read_config_dword(pdev, notify + sizeof(struct virtio_pci_cap), + &vp_vdpa->notify_off_multiplier); + + for (i = 0; i < VP_VDPA_MAX_QUEUE; i++) { + vp_iowrite16(i, &vp_vdpa->common->queue_select); + notify_off = vp_ioread16(&vp_vdpa->common->queue_notify_off); + vp_vdpa->vring[i].irq = -1; + vp_vdpa->vring[i].notify = vp_vdpa->notify + + notify_off * vp_vdpa->notify_off_multiplier; + } + + return 0; + +err: + put_device(&vp_vdpa->vdpa.dev); + return ret; +} + +static void vp_vdpa_remove(struct pci_dev *pdev) +{ + struct vp_vdpa *vp_vdpa = pci_get_drvdata(pdev); + + vdpa_unregister_device(&vp_vdpa->vdpa); +} + +static const struct pci_device_id vp_vdpa_id_table[] = { + { PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) }, + { 0 } +}; + +MODULE_DEVICE_TABLE(pci, vp_vdpa_id_table); + +static struct pci_driver vp_vdpa_driver = { + .name = "vp-vdpa", + .id_table = vp_vdpa_id_table, + .probe = vp_vdpa_probe, + .remove = vp_vdpa_remove, +}; + +module_pci_driver(vp_vdpa_driver); + +MODULE_AUTHOR("Jason Wang "); +MODULE_DESCRIPTION("vp-vdpa"); +MODULE_LICENSE("GPL"); +MODULE_VERSION("1"); From patchwork Fri May 29 08:03:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11578073 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D0385139A for ; Fri, 29 
May 2020 08:04:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B72EA207F5 for ; Fri, 29 May 2020 08:04:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="L9aYJ3nQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726887AbgE2IEV (ORCPT ); Fri, 29 May 2020 04:04:21 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:28212 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726877AbgE2IEQ (ORCPT ); Fri, 29 May 2020 04:04:16 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1590739454; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=J1yHh7jHFO9pOe6j7yGn/AkmRQQ4RAajACCa4HRcZAQ=; b=L9aYJ3nQLzVS/QMn39PYlulnmmrSfYvWFaRCAxB7ky6qeVAgTGhPsT7irxYbpDfflyzUFh sWMKEIxRBdG8yvt53Kkl5Si8piv2BmbnyKTDp8liZbNT7jcF4U8VmRXb/3hUGGko0xFkzb U4xWv5vRKkUyyyZj5jX+v/az2MZ1wSY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-371-NMly3ZE9PAeAOMdpgn-tkg-1; Fri, 29 May 2020 04:04:13 -0400 X-MC-Unique: NMly3ZE9PAeAOMdpgn-tkg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3723B1009443; Fri, 29 May 2020 08:04:11 +0000 (UTC) Received: from jason-ThinkPad-X1-Carbon-6th.redhat.com (ovpn-13-231.pek2.redhat.com [10.72.13.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4717499DE6; Fri, 29 May 2020 08:04:01 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: 
kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, rob.miller@broadcom.com, lingshan.zhu@intel.com, eperezma@redhat.com, lulu@redhat.com, shahafs@mellanox.com, hanand@xilinx.com, mhabets@solarflare.com, gdawar@xilinx.com, saugatm@xilinx.com, vmireyno@marvell.com, zhangweining@ruijie.com.cn, eli@mellanox.com Subject: [PATCH 6/6] vdpa: vp_vdpa: report doorbell location Date: Fri, 29 May 2020 16:03:03 +0800 Message-Id: <20200529080303.15449-7-jasowang@redhat.com> In-Reply-To: <20200529080303.15449-1-jasowang@redhat.com> References: <20200529080303.15449-1-jasowang@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This patch adds support for reporting the doorbell location in the vp_vdpa driver. The doorbell mapping can then be enabled by, e.g., launching qemu's virtio-net-pci device with page-per-vq enabled.

Signed-off-by: Jason Wang
--- drivers/vdpa/vp_vdpa/vp_vdpa.c | 29 +++++++++++++++++++++++++---- 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vp_vdpa/vp_vdpa.c b/drivers/vdpa/vp_vdpa/vp_vdpa.c index e59c310e2156..a5ca49533138 100644 --- a/drivers/vdpa/vp_vdpa/vp_vdpa.c +++ b/drivers/vdpa/vp_vdpa/vp_vdpa.c @@ -30,6 +30,7 @@ struct vp_vring { void __iomem *notify; char msix_name[256]; + resource_size_t notify_pa; struct vdpa_callback cb; int irq; }; @@ -136,7 +137,8 @@ static int find_capability(struct pci_dev *dev, u8 cfg_type, return 0; } -static void __iomem *map_capability(struct vp_vdpa *vp_vdpa, int off) +static void __iomem *map_capability(struct vp_vdpa *vp_vdpa, int off, + resource_size_t *pa) { struct pci_dev *pdev = vp_vdpa->pdev; u32 offset; @@ -149,6 +151,9 @@ static void __iomem *map_capability(struct vp_vdpa *vp_vdpa, int off) off + offsetof(struct virtio_pci_cap, offset), &offset); + if (pa) + *pa = pci_resource_start(pdev, bar) + offset; + return
vp_vdpa->base[bar] + offset; } @@ -283,6 +288,18 @@ static u64 vp_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 qid) return 0; } +static struct vdpa_notification_area +vp_vdpa_get_vq_notification(struct vdpa_device *vdpa, u16 qid) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + struct vdpa_notification_area notify; + + notify.addr = vp_vdpa->vring[qid].notify_pa; + notify.size = vp_vdpa->notify_off_multiplier; + + return notify; +} + static int vp_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 qid, u64 num) { @@ -423,6 +440,7 @@ static const struct vdpa_config_ops vp_vdpa_ops = { .set_status = vp_vdpa_set_status, .get_vq_num_max = vp_vdpa_get_vq_num_max, .get_vq_state = vp_vdpa_get_vq_state, + .get_vq_notification = vp_vdpa_get_vq_notification, .set_vq_state = vp_vdpa_set_vq_state, .set_vq_cb = vp_vdpa_set_vq_cb, .set_vq_ready = vp_vdpa_set_vq_ready, @@ -445,6 +463,7 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) struct vp_vdpa *vp_vdpa; int common, notify, device, ret, i; struct virtio_device_id virtio_id; + resource_size_t notify_pa; u16 notify_off; /* We only own devices >= 0x1000 and <= 0x107f: leave the rest. */ @@ -525,9 +544,9 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (ret) dev_warn(&pdev->dev, "Failed to enable 64-bit or 32-bit DMA. 
Trying to continue, but this might not work.\n"); - vp_vdpa->device = map_capability(vp_vdpa, device); - vp_vdpa->notify = map_capability(vp_vdpa, notify); - vp_vdpa->common = map_capability(vp_vdpa, common); + vp_vdpa->device = map_capability(vp_vdpa, device, NULL); + vp_vdpa->notify = map_capability(vp_vdpa, notify, &notify_pa); + vp_vdpa->common = map_capability(vp_vdpa, common, NULL); vp_vdpa->id = virtio_id; ret = vdpa_register_device(&vp_vdpa->vdpa); @@ -545,6 +564,8 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) vp_vdpa->vring[i].irq = -1; vp_vdpa->vring[i].notify = vp_vdpa->notify + notify_off * vp_vdpa->notify_off_multiplier; + vp_vdpa->vring[i].notify_pa = notify_pa + + notify_off * vp_vdpa->notify_off_multiplier; } return 0;
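
For readers who want to check the doorbell arithmetic outside the kernel, the computation in the last hunk can be sketched as a small userspace function. This is only an illustration, not part of the patch: struct doorbell_area and doorbell_location() are hypothetical names, and the example values (BAR start 0xfe000000, capability offset 0x3000, multiplier 0x1000) are made up, assuming a device that gives each queue its own 4 KiB doorbell page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-queue doorbell description. */
struct doorbell_area {
	uint64_t addr;	/* physical address of the queue's doorbell */
	uint32_t size;	/* bytes reserved for that doorbell */
};

/*
 * bar_start:  physical start of the BAR that holds the notify capability
 * cap_offset: offset of the notify region inside that BAR
 * notify_off: per-queue queue_notify_off from the common config
 * multiplier: notify_off_multiplier from the notify capability
 */
static struct doorbell_area doorbell_location(uint64_t bar_start,
					      uint32_t cap_offset,
					      uint16_t notify_off,
					      uint32_t multiplier)
{
	struct doorbell_area area;

	/* Widen before multiplying so the product cannot wrap at 32 bits. */
	area.addr = bar_start + cap_offset + (uint64_t)notify_off * multiplier;
	area.size = multiplier;

	return area;
}

/* Example with made-up values: queue 2 lands two pages into the region. */
static void doorbell_example(void)
{
	struct doorbell_area a = doorbell_location(0xfe000000ULL, 0x3000,
						   2, 0x1000);

	assert(a.addr == 0xfe005000ULL);
	assert(a.size == 0x1000);
}
```

The sketch widens notify_off to 64 bits before the multiplication, which is a choice made here for the illustration; the patch itself multiplies a u16 by a u32 and adds the result to notify_pa.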