From patchwork Tue Dec 22 14:52:09 2020
X-Patchwork-Submitter: Yongji Xie
X-Patchwork-Id: 11986883
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com,
    sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org,
    rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk,
    axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [RFC v2 01/13] mm: export zap_page_range() for driver use
Date: Tue, 22 Dec 2020 22:52:09 +0800
Message-Id: <20201222145221.711-2-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

Export zap_page_range() for use in VDUSE.
Signed-off-by: Xie Yongji
---
 mm/memory.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memory.c b/mm/memory.c
index 7d608765932b..edd2d6497bb3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1542,6 +1542,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
         mmu_notifier_invalidate_range_end(&range);
         tlb_finish_mmu(&tlb, start, range.end);
 }
+EXPORT_SYMBOL(zap_page_range);

 /**
  * zap_page_range_single - remove user pages in a given range

From patchwork Tue Dec 22 14:52:10 2020
X-Patchwork-Submitter: Yongji Xie
X-Patchwork-Id: 11986885
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com,
    sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org,
    rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk,
    axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [RFC v2 02/13] eventfd: track eventfd_signal() recursion depth separately in different cases
Date: Tue, 22 Dec 2020 22:52:10 +0800
Message-Id: <20201222145221.711-3-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

Now we have a global percpu counter to limit the recursion depth of
eventfd_signal(). This can avoid deadlock or stack overflow. But in the
stack overflow case, it should be OK to increase the recursion depth if
needed.
So we add a percpu counter in eventfd_ctx to limit the recursion depth
for the deadlock case. Then it could be fine to increase the global percpu
counter later.

Signed-off-by: Xie Yongji
---
 fs/aio.c                |  3 ++-
 fs/eventfd.c            | 20 +++++++++++++++++++-
 include/linux/eventfd.h |  5 +----
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 1f32da13d39e..5d82903161f5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1698,7 +1698,8 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
         list_del(&iocb->ki_list);
         iocb->ki_res.res = mangle_poll(mask);
         req->done = true;
-        if (iocb->ki_eventfd && eventfd_signal_count()) {
+        if (iocb->ki_eventfd &&
+            eventfd_signal_count(iocb->ki_eventfd)) {
                 iocb = NULL;
                 INIT_WORK(&req->work, aio_poll_put_work);
                 schedule_work(&req->work);
diff --git a/fs/eventfd.c b/fs/eventfd.c
index e265b6dd4f34..2df24f9bada3 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -25,6 +25,8 @@
 #include
 #include

+#define EVENTFD_WAKE_DEPTH 0
+
 DEFINE_PER_CPU(int, eventfd_wake_count);

 static DEFINE_IDA(eventfd_ida);
@@ -42,9 +44,17 @@ struct eventfd_ctx {
          */
         __u64 count;
         unsigned int flags;
+        int __percpu *wake_count;
         int id;
 };

+bool eventfd_signal_count(struct eventfd_ctx *ctx)
+{
+        return (this_cpu_read(*ctx->wake_count) ||
+                this_cpu_read(eventfd_wake_count) > EVENTFD_WAKE_DEPTH);
+}
+EXPORT_SYMBOL_GPL(eventfd_signal_count);
+
 /**
  * eventfd_signal - Adds @n to the eventfd counter.
  * @ctx: [in] Pointer to the eventfd context.
@@ -71,17 +81,19 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
          * it returns true, the eventfd_signal() call should be deferred to a
          * safe context.
          */
-        if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
+        if (WARN_ON_ONCE(eventfd_signal_count(ctx)))
                 return 0;

         spin_lock_irqsave(&ctx->wqh.lock, flags);
         this_cpu_inc(eventfd_wake_count);
+        this_cpu_inc(*ctx->wake_count);
         if (ULLONG_MAX - ctx->count < n)
                 n = ULLONG_MAX - ctx->count;
         ctx->count += n;
         if (waitqueue_active(&ctx->wqh))
                 wake_up_locked_poll(&ctx->wqh, EPOLLIN);
         this_cpu_dec(eventfd_wake_count);
+        this_cpu_dec(*ctx->wake_count);
         spin_unlock_irqrestore(&ctx->wqh.lock, flags);

         return n;
@@ -92,6 +104,7 @@ static void eventfd_free_ctx(struct eventfd_ctx *ctx)
 {
         if (ctx->id >= 0)
                 ida_simple_remove(&eventfd_ida, ctx->id);
+        free_percpu(ctx->wake_count);
         kfree(ctx);
 }

@@ -423,6 +436,11 @@ static int do_eventfd(unsigned int count, int flags)

         kref_init(&ctx->kref);
         init_waitqueue_head(&ctx->wqh);
+        ctx->wake_count = alloc_percpu(int);
+        if (!ctx->wake_count) {
+                kfree(ctx);
+                return -ENOMEM;
+        }
         ctx->count = count;
         ctx->flags = flags;
         ctx->id = ida_simple_get(&eventfd_ida, 0, 0, GFP_KERNEL);
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index fa0a524baed0..1a11ebbd74a9 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -45,10 +45,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt);

 DECLARE_PER_CPU(int, eventfd_wake_count);

-static inline bool eventfd_signal_count(void)
-{
-        return this_cpu_read(eventfd_wake_count);
-}
+bool eventfd_signal_count(struct eventfd_ctx *ctx);

 #else /* CONFIG_EVENTFD */
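As background for these eventfd patches, the counter semantics that eventfd_signal() maintains are also visible from userspace: each write adds to the 64-bit counter, and a read of a non-semaphore eventfd returns the accumulated total and resets it. A minimal Linux demo of those semantics (an illustration, not the kernel-internal API):

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Signal an eventfd twice and read back the accumulated counter. */
static uint64_t eventfd_counter_demo(void)
{
    uint64_t n = 3, count = 0;
    int fd = eventfd(0, 0);     /* initial count 0, non-semaphore mode */

    if (fd < 0)
        return 0;
    if (write(fd, &n, sizeof(n)) == sizeof(n)) {
        n = 2;                  /* counter accumulates: 3 + 2 */
        if (write(fd, &n, sizeof(n)) == sizeof(n))
            /* the read returns the total and resets it to 0 */
            if (read(fd, &count, sizeof(count)) != sizeof(count))
                count = 0;
    }
    close(fd);
    return count;
}
```

Each write() here plays the role of one eventfd_signal(ctx, n) call.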
From patchwork Tue Dec 22 14:52:11 2020
X-Patchwork-Submitter: Yongji Xie
X-Patchwork-Id: 11986887
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com,
    sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org,
    rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk,
    axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [RFC v2 03/13] eventfd: Increase the recursion depth of eventfd_signal()
Date: Tue, 22 Dec 2020 22:52:11 +0800
Message-Id: <20201222145221.711-4-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

Increase the recursion depth of eventfd_signal() to 1. This will be used
in the VDUSE case later.

Signed-off-by: Xie Yongji
---
 fs/eventfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2df24f9bada3..478cdc175949 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -25,7 +25,7 @@
 #include
 #include

-#define EVENTFD_WAKE_DEPTH 0
+#define EVENTFD_WAKE_DEPTH 1

 DEFINE_PER_CPU(int, eventfd_wake_count);
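The effect of bumping EVENTFD_WAKE_DEPTH from 0 to 1 can be sketched with an ordinary counter checked before each nested signal, which is how eventfd_signal() uses its percpu counter. A toy single-threaded model (not the kernel code; the percpu counter is just a plain int here):

```c
#include <assert.h>

#define WAKE_DEPTH_0 0          /* the old limit */
#define WAKE_DEPTH_1 1          /* the limit after this patch */

static int wake_count;          /* stands in for the percpu counter */

/* Returns 1 if all (nest + 1) chained signals were accepted, 0 if one
 * was rejected; the check happens before the increment, as in the patch,
 * so a limit of N permits N levels of nesting beyond the first call. */
static int model_signal(int nest, int limit)
{
    int ok = 1;

    if (wake_count > limit)
        return 0;               /* kernel would WARN_ON_ONCE and bail out */
    wake_count++;
    if (nest > 0)
        ok = model_signal(nest - 1, limit);
    wake_count--;
    return ok;
}
```

With the old limit a signal issued from inside a signal is rejected; with the new limit one nested level is allowed and deeper nesting is still refused.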
From patchwork Tue Dec 22 14:52:12 2020
X-Patchwork-Submitter: Yongji Xie
X-Patchwork-Id: 11986891
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com,
    sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org,
    rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk,
    axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [RFC v2 04/13] vdpa: Remove the restriction that only supports virtio-net devices
Date: Tue, 22 Dec 2020 22:52:12 +0800
Message-Id: <20201222145221.711-5-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

With VDUSE, we should be able to support all kinds of virtio devices.

Signed-off-by: Xie Yongji
---
 drivers/vhost/vdpa.c | 29 +++--------------------------
 1 file changed, 3 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 29ed4173f04e..448be7875b6d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include

 #include "vhost.h"

@@ -185,26 +186,6 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
         return 0;
 }

-static int vhost_vdpa_config_validate(struct vhost_vdpa *v,
-                                      struct vhost_vdpa_config *c)
-{
-        long size = 0;
-
-        switch (v->virtio_id) {
-        case VIRTIO_ID_NET:
-                size = sizeof(struct virtio_net_config);
-                break;
-        }
-
-        if (c->len == 0)
-                return -EINVAL;
-
-        if (c->len > size - c->off)
-                return -E2BIG;
-
-        return 0;
-}
-
 static long vhost_vdpa_get_config(struct vhost_vdpa *v,
                                   struct vhost_vdpa_config __user *c)
 {
@@ -215,7 +196,7 @@ static long vhost_vdpa_get_config(struct vhost_vdpa *v,
         if (copy_from_user(&config, c, size))
                 return -EFAULT;
-        if (vhost_vdpa_config_validate(v, &config))
+        if (config.len == 0)
                 return -EINVAL;
         buf = kvzalloc(config.len, GFP_KERNEL);
         if (!buf)
@@ -243,7 +224,7 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v,
         if (copy_from_user(&config, c, size))
                 return -EFAULT;
-        if (vhost_vdpa_config_validate(v, &config))
+        if (config.len == 0)
                 return -EINVAL;
         buf = kvzalloc(config.len, GFP_KERNEL);
         if (!buf)
@@ -1025,10 +1006,6 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
         int minor;
         int r;

-        /* Currently, we only accept the network devices. */
-        if (ops->get_device_id(vdpa) != VIRTIO_ID_NET)
-                return -ENOTSUPP;
-
         v = kzalloc(sizeof(*v), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
         if (!v)
                 return -ENOMEM;
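For reference, the bounds check that the deleted vhost_vdpa_config_validate() enforced can be written as a standalone predicate. This is a hypothetical userspace model for illustration: in the original, size was sizeof(struct virtio_net_config) and off/len came from the user's struct vhost_vdpa_config. After this patch only the len == 0 case is still rejected:

```c
#include <assert.h>
#include <stdint.h>

/* 1 = accept, 0 = reject; mirrors the -EINVAL and -E2BIG
 * branches of the removed helper. */
static int config_range_ok(long size, uint32_t off, uint32_t len)
{
    if (len == 0)
        return 0;           /* -EINVAL: nothing to copy */
    if (len > size - off)
        return 0;           /* -E2BIG: access past the config space */
    return 1;
}
```

For a hypothetical 12-byte config space, a 12-byte read at offset 0 passes while an 8-byte read at offset 8 would run past the end and is rejected.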
From patchwork Tue Dec 22 14:52:13 2020
X-Patchwork-Submitter: Yongji Xie
X-Patchwork-Id: 11986889
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com,
    sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org,
    rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk,
    axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [RFC v2 05/13] vdpa: Pass the netlink attributes to ops.dev_add()
Date: Tue, 22 Dec 2020 22:52:13 +0800
Message-Id: <20201222145221.711-6-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

Pass the netlink attributes to ops.dev_add() so that we could get some
device specific attributes when creating a vdpa device.
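The struct nlattr **attrs argument follows the usual generic-netlink convention: the core parses the message into a table indexed by attribute type, and ops.dev_add() can then look up device-specific attributes directly. A userspace model of that lookup pattern (the attribute names here are hypothetical, not real vdpa attributes):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a parsed netlink attribute table indexed by type. */
enum { ATTR_UNSPEC, ATTR_DEV_NAME, ATTR_QUEUE_SIZE, ATTR_MAX };

struct attr {
    const void *data;   /* NULL when the attribute was not sent */
};

/* Fetch a u32 attribute, falling back to a default when absent. */
static uint32_t attr_get_u32(const struct attr *attrs, int type, uint32_t def)
{
    return attrs[type].data ? *(const uint32_t *)attrs[type].data : def;
}
```

A dev_add() implementation given such a table can pick out only the attributes it cares about, which is what passing info->attrs through enables.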
Signed-off-by: Xie Yongji
---
 drivers/vdpa/vdpa.c              | 2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 ++-
 include/linux/vdpa.h             | 4 +++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 32bd48baffab..f6ff81927694 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -440,7 +440,7 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *i
 		goto err;
 	}
 
-	vdev = pdev->ops->dev_add(pdev, name, device_id);
+	vdev = pdev->ops->dev_add(pdev, name, device_id, info->attrs);
 	if (IS_ERR(vdev))
 		goto err;
 
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 85776e4e6749..cfc314f5403a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -726,7 +726,8 @@ static const struct vdpa_config_ops vdpasim_net_batch_config_ops = {
 };
 
 static struct vdpa_device *
-vdpa_dev_add(struct vdpa_parent_dev *pdev, const char *name, u32 device_id)
+vdpa_dev_add(struct vdpa_parent_dev *pdev, const char *name,
+	     u32 device_id, struct nlattr **attrs)
 {
 	struct vdpasim *simdev;
 
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index cb5a3d847af3..656fe264234e 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 
 /**
  * vDPA callback definition.
@@ -349,6 +350,7 @@ static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
  * @pdev: parent device to use for device addition
  * @name: name of the new vdpa device
  * @device_id: device id of the new vdpa device
+ * @attrs: device specific attributes
  * Driver need to add a new device using vdpa_register_device() after
  * fully initializing the vdpa device. On successful addition driver
  * must return a valid pointer of vdpa device or ERR_PTR for the error.
@@ -359,7 +361,7 @@ static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
  */
 struct vdpa_dev_ops {
 	struct vdpa_device* (*dev_add)(struct vdpa_parent_dev *pdev, const char *name,
-				       u32 device_id);
+				       u32 device_id, struct nlattr **attrs);
 	void (*dev_del)(struct vdpa_parent_dev *pdev, struct vdpa_device *dev);
 };

From patchwork Tue Dec 22 14:52:14 2020
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 06/13] vduse: Introduce VDUSE - vDPA Device in Userspace
Date: Tue, 22 Dec 2020 22:52:14 +0800
Message-Id: <20201222145221.711-7-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

This VDUSE driver enables implementing vDPA devices in userspace. Both
the control path and the data path of a vDPA device can be handled in
userspace.

In the control path, the VDUSE driver uses a message mechanism to
forward config operations from the vdpa bus driver to userspace.
Userspace can use read()/write() to receive and reply to those control
messages.
In the data path, the VDUSE driver implements an MMU-based on-chip IOMMU
driver which supports dynamically mapping kernel dma buffers into a
userspace iova region. Userspace can access those iova regions via
mmap(). Besides, the eventfd mechanism is used to trigger interrupt
callbacks and receive virtqueue kicks in userspace.

For now, only the virtio-vdpa bus driver is supported with this patch
applied.

Signed-off-by: Xie Yongji
---
 Documentation/driver-api/vduse.rst                 |   74 ++
 Documentation/userspace-api/ioctl/ioctl-number.rst |    1 +
 drivers/vdpa/Kconfig                               |    8 +
 drivers/vdpa/Makefile                              |    1 +
 drivers/vdpa/vdpa_user/Makefile                    |    5 +
 drivers/vdpa/vdpa_user/eventfd.c                   |  221 ++++
 drivers/vdpa/vdpa_user/eventfd.h                   |   48 +
 drivers/vdpa/vdpa_user/iova_domain.c               |  442 ++++++++
 drivers/vdpa/vdpa_user/iova_domain.h               |   93 ++
 drivers/vdpa/vdpa_user/vduse.h                     |   59 ++
 drivers/vdpa/vdpa_user/vduse_dev.c                 | 1121 ++++++++++++++++++++
 include/uapi/linux/vdpa.h                          |    1 +
 include/uapi/linux/vduse.h                         |   99 ++
 13 files changed, 2173 insertions(+)
 create mode 100644 Documentation/driver-api/vduse.rst
 create mode 100644 drivers/vdpa/vdpa_user/Makefile
 create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
 create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
 create mode 100644 drivers/vdpa/vdpa_user/vduse.h
 create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
 create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/driver-api/vduse.rst b/Documentation/driver-api/vduse.rst
new file mode 100644
index 000000000000..da9b3040f20a
--- /dev/null
+++ b/Documentation/driver-api/vduse.rst
@@ -0,0 +1,74 @@
+==================================
+VDUSE - "vDPA Device in Userspace"
+==================================
+
+A vDPA (virtio data path acceleration) device is a device that uses a
+datapath complying with the virtio specifications together with a vendor
+specific control path. vDPA devices can be either physically located on
+the hardware or emulated by software. VDUSE is a framework that makes it
+possible to implement software-emulated vDPA devices in userspace.
+
+How VDUSE works
+---------------
+Each userspace vDPA device is created by the VDUSE_CREATE_DEV ioctl on
+the VDUSE character device (/dev/vduse). Then a file descriptor pointing
+to the new resources will be returned, which can be used to implement the
+userspace vDPA device's control path and data path.
+
+To implement the control path, read/write operations on the file
+descriptor are used to receive/reply to control messages from/to the
+VDUSE driver. Those control messages are based on the vdpa_config_ops
+which defines a unified interface to control different types of vDPA
+device.
+
+The following types of messages are provided by the VDUSE framework now:
+
+- VDUSE_SET_VQ_ADDR: Set the addresses of the different aspects of virtqueue.
+
+- VDUSE_SET_VQ_NUM: Set the size of virtqueue
+
+- VDUSE_SET_VQ_READY: Set ready status of virtqueue
+
+- VDUSE_GET_VQ_READY: Get ready status of virtqueue
+
+- VDUSE_SET_FEATURES: Set virtio features supported by the driver
+
+- VDUSE_GET_FEATURES: Get virtio features supported by the device
+
+- VDUSE_SET_STATUS: Set the device status
+
+- VDUSE_GET_STATUS: Get the device status
+
+- VDUSE_SET_CONFIG: Write to device specific configuration space
+
+- VDUSE_GET_CONFIG: Read from device specific configuration space
+
+Please see include/linux/vdpa.h for details.
+
+In the data path, the VDUSE framework implements an MMU-based on-chip
+IOMMU driver which supports dynamically mapping kernel dma buffers into
+a userspace iova region. The userspace iova region can be created by
+passing the userspace vDPA device fd to mmap(2).
+
+Besides, the eventfd mechanism is used to trigger interrupt callbacks and
+receive virtqueue kicks in userspace.
+The following ioctls on the userspace
+vDPA device fd are provided to support that:
+
+- VDUSE_VQ_SETUP_KICKFD: set the kickfd for virtqueue, this eventfd is used
+  by VDUSE driver to notify userspace to consume the vring.
+
+- VDUSE_VQ_SETUP_IRQFD: set the irqfd for virtqueue, this eventfd is used
+  by userspace to notify VDUSE driver to trigger interrupt callbacks.
+
+MMU-based IOMMU Driver
+----------------------
+The basic idea behind the IOMMU driver is treating the MMU (VA->PA) as an
+IOMMU (IOVA->PA). This driver will set up an MMU mapping instead of an IOMMU
+mapping for the DMA transfer so that the userspace process is able to use its
+virtual address to access the dma buffer in the kernel.
+
+To avoid security issues, a bounce-buffering mechanism is introduced to
+prevent userspace from directly accessing the original buffer, which may
+contain other kernel data. During mapping and unmapping, the driver will copy
+the data from the original buffer to the bounce buffer and back, depending on
+the direction of the transfer. And the bounce-buffer addresses will be mapped
+into the user address space instead of the original ones.
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index a4c75a28c839..71722e6f8f23 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#    Include File                                           Comments
 'z'   10-4F  drivers/s390/crypto/zcrypt_api.h                        conflict!
 '|'   00-7F  linux/media.h
 0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
 0x89  00-06  arch/x86/include/asm/sockios.h
 0x89  0B-DF  linux/sockios.h
 0x89  E0-EF  linux/sockios.h                                         SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 4be7be39be26..211cc449cbd3 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -21,6 +21,14 @@ config VDPA_SIM
 	  to RX. This device is used for testing, prototyping and
 	  development of vDPA.
 
+config VDPA_USER
+	tristate "VDUSE (vDPA Device in Userspace) support"
+	depends on EVENTFD && MMU && HAS_DMA
+	default n
+	help
+	  With VDUSE it is possible to emulate a vDPA Device
+	  in a userspace program.
+
 config IFCVF
 	tristate "Intel IFC VF vDPA driver"
 	depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index d160e9b63a66..66e97778ad03 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VDPA) += vdpa.o
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
 obj-$(CONFIG_IFCVF) += ifcvf/
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index 000000000000..b7645e36992b
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o eventfd.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/eventfd.c b/drivers/vdpa/vdpa_user/eventfd.c
new file mode 100644
index 000000000000..dbffddb08908
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/eventfd.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Eventfd support for VDUSE
+ *
+ * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved.
+ * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include + +#include "eventfd.h" + +static struct workqueue_struct *vduse_irqfd_cleanup_wq; + +static void vduse_virqfd_shutdown(struct work_struct *work) +{ + u64 cnt; + struct vduse_virqfd *virqfd = container_of(work, + struct vduse_virqfd, shutdown); + + eventfd_ctx_remove_wait_queue(virqfd->ctx, &virqfd->wait, &cnt); + flush_work(&virqfd->inject); + eventfd_ctx_put(virqfd->ctx); + kfree(virqfd); +} + +static void vduse_virqfd_inject(struct work_struct *work) +{ + struct vduse_virqfd *virqfd = container_of(work, + struct vduse_virqfd, inject); + struct vduse_virtqueue *vq = virqfd->vq; + + spin_lock_irq(&vq->irq_lock); + if (vq->ready && vq->cb) + vq->cb(vq->private); + spin_unlock_irq(&vq->irq_lock); +} + +static void virqfd_deactivate(struct vduse_virqfd *virqfd) +{ + queue_work(vduse_irqfd_cleanup_wq, &virqfd->shutdown); +} + +static int vduse_virqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode, + int sync, void *key) +{ + struct vduse_virqfd *virqfd = container_of(wait, struct vduse_virqfd, wait); + struct vduse_virtqueue *vq = virqfd->vq; + + __poll_t flags = key_to_poll(key); + + if (flags & EPOLLIN) + schedule_work(&virqfd->inject); + + if (flags & EPOLLHUP) { + spin_lock(&vq->irq_lock); + if (vq->virqfd == virqfd) { + vq->virqfd = NULL; + virqfd_deactivate(virqfd); + } + spin_unlock(&vq->irq_lock); + } + + return 0; +} + +static void vduse_virqfd_ptable_queue_proc(struct file *file, + wait_queue_head_t *wqh, poll_table *pt) +{ + struct vduse_virqfd *virqfd = container_of(pt, struct vduse_virqfd, pt); + + add_wait_queue(wqh, &virqfd->wait); +} + +int vduse_virqfd_setup(struct vduse_dev *dev, + struct vduse_vq_eventfd *eventfd) +{ + struct vduse_virqfd *virqfd; + struct fd irqfd; + struct eventfd_ctx *ctx; + struct vduse_virtqueue *vq; + __poll_t events; + int ret; + + if (eventfd->index >= dev->vq_num) + return -EINVAL; + + vq = &dev->vqs[eventfd->index]; + 
virqfd = kzalloc(sizeof(*virqfd), GFP_KERNEL); + if (!virqfd) + return -ENOMEM; + + INIT_WORK(&virqfd->shutdown, vduse_virqfd_shutdown); + INIT_WORK(&virqfd->inject, vduse_virqfd_inject); + + ret = -EBADF; + irqfd = fdget(eventfd->fd); + if (!irqfd.file) + goto err_fd; + + ctx = eventfd_ctx_fileget(irqfd.file); + if (IS_ERR(ctx)) { + ret = PTR_ERR(ctx); + goto err_ctx; + } + + virqfd->vq = vq; + virqfd->ctx = ctx; + spin_lock(&vq->irq_lock); + if (vq->virqfd) + virqfd_deactivate(virqfd); + vq->virqfd = virqfd; + spin_unlock(&vq->irq_lock); + + init_waitqueue_func_entry(&virqfd->wait, vduse_virqfd_wakeup); + init_poll_funcptr(&virqfd->pt, vduse_virqfd_ptable_queue_proc); + + events = vfs_poll(irqfd.file, &virqfd->pt); + + /* + * Check if there was an event already pending on the eventfd + * before we registered and trigger it as if we didn't miss it. + */ + if (events & EPOLLIN) + schedule_work(&virqfd->inject); + + fdput(irqfd); + + return 0; +err_ctx: + fdput(irqfd); +err_fd: + kfree(virqfd); + return ret; +} + +void vduse_virqfd_release(struct vduse_dev *dev) +{ + int i; + + for (i = 0; i < dev->vq_num; i++) { + struct vduse_virtqueue *vq = &dev->vqs[i]; + + spin_lock(&vq->irq_lock); + if (vq->virqfd) { + virqfd_deactivate(vq->virqfd); + vq->virqfd = NULL; + } + spin_unlock(&vq->irq_lock); + } + flush_workqueue(vduse_irqfd_cleanup_wq); +} + +int vduse_virqfd_init(void) +{ + vduse_irqfd_cleanup_wq = alloc_workqueue("vduse-irqfd-cleanup", + WQ_UNBOUND, 0); + if (!vduse_irqfd_cleanup_wq) + return -ENOMEM; + + return 0; +} + +void vduse_virqfd_exit(void) +{ + destroy_workqueue(vduse_irqfd_cleanup_wq); +} + +void vduse_vq_kick(struct vduse_virtqueue *vq) +{ + spin_lock(&vq->kick_lock); + if (vq->ready && vq->kickfd) + eventfd_signal(vq->kickfd, 1); + spin_unlock(&vq->kick_lock); +} + +int vduse_kickfd_setup(struct vduse_dev *dev, + struct vduse_vq_eventfd *eventfd) +{ + struct eventfd_ctx *ctx; + struct vduse_virtqueue *vq; + + if (eventfd->index >= dev->vq_num) + 
return -EINVAL; + + vq = &dev->vqs[eventfd->index]; + ctx = eventfd_ctx_fdget(eventfd->fd); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + spin_lock(&vq->kick_lock); + if (vq->kickfd) + eventfd_ctx_put(vq->kickfd); + vq->kickfd = ctx; + spin_unlock(&vq->kick_lock); + + return 0; +} + +void vduse_kickfd_release(struct vduse_dev *dev) +{ + int i; + + for (i = 0; i < dev->vq_num; i++) { + struct vduse_virtqueue *vq = &dev->vqs[i]; + + spin_lock(&vq->kick_lock); + if (vq->kickfd) { + eventfd_ctx_put(vq->kickfd); + vq->kickfd = NULL; + } + spin_unlock(&vq->kick_lock); + } +} diff --git a/drivers/vdpa/vdpa_user/eventfd.h b/drivers/vdpa/vdpa_user/eventfd.h new file mode 100644 index 000000000000..14269ff27f47 --- /dev/null +++ b/drivers/vdpa/vdpa_user/eventfd.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Eventfd support for VDUSE + * + * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#ifndef _VDUSE_EVENTFD_H +#define _VDUSE_EVENTFD_H + +#include +#include +#include +#include + +#include "vduse.h" + +struct vduse_dev; + +struct vduse_virqfd { + struct eventfd_ctx *ctx; + struct vduse_virtqueue *vq; + struct work_struct inject; + struct work_struct shutdown; + wait_queue_entry_t wait; + poll_table pt; +}; + +int vduse_virqfd_setup(struct vduse_dev *dev, + struct vduse_vq_eventfd *eventfd); + +void vduse_virqfd_release(struct vduse_dev *dev); + +int vduse_virqfd_init(void); + +void vduse_virqfd_exit(void); + +void vduse_vq_kick(struct vduse_virtqueue *vq); + +int vduse_kickfd_setup(struct vduse_dev *dev, + struct vduse_vq_eventfd *eventfd); + +void vduse_kickfd_release(struct vduse_dev *dev); + +#endif /* _VDUSE_EVENTFD_H */ diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c new file mode 100644 index 000000000000..27022157abc6 --- /dev/null +++ b/drivers/vdpa/vdpa_user/iova_domain.c @@ -0,0 +1,442 @@ +// SPDX-License-Identifier: GPL-2.0-only 
+/* + * MMU-based IOMMU implementation + * + * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include + +#include "iova_domain.h" + +#define IOVA_CHUNK_SHIFT 26 +#define IOVA_CHUNK_SIZE (_AC(1, UL) << IOVA_CHUNK_SHIFT) +#define IOVA_CHUNK_MASK (~(IOVA_CHUNK_SIZE - 1)) + +#define IOVA_MIN_SIZE (IOVA_CHUNK_SIZE << 1) + +#define IOVA_ALLOC_ORDER 12 +#define IOVA_ALLOC_SIZE (1 << IOVA_ALLOC_ORDER) + +struct vduse_mmap_vma { + struct vm_area_struct *vma; + struct list_head list; +}; + +static inline struct page * +vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, + unsigned long iova) +{ + unsigned long index = iova >> IOVA_CHUNK_SHIFT; + unsigned long chunkoff = iova & ~IOVA_CHUNK_MASK; + unsigned long pgindex = chunkoff >> PAGE_SHIFT; + + return domain->chunks[index].bounce_pages[pgindex]; +} + +static inline void +vduse_domain_set_bounce_page(struct vduse_iova_domain *domain, + unsigned long iova, struct page *page) +{ + unsigned long index = iova >> IOVA_CHUNK_SHIFT; + unsigned long chunkoff = iova & ~IOVA_CHUNK_MASK; + unsigned long pgindex = chunkoff >> PAGE_SHIFT; + + domain->chunks[index].bounce_pages[pgindex] = page; +} + +static inline struct vduse_iova_map * +vduse_domain_get_iova_map(struct vduse_iova_domain *domain, + unsigned long iova) +{ + unsigned long index = iova >> IOVA_CHUNK_SHIFT; + unsigned long chunkoff = iova & ~IOVA_CHUNK_MASK; + unsigned long mapindex = chunkoff >> IOVA_ALLOC_ORDER; + + return domain->chunks[index].iova_map[mapindex]; +} + +static inline void +vduse_domain_set_iova_map(struct vduse_iova_domain *domain, + unsigned long iova, struct vduse_iova_map *map) +{ + unsigned long index = iova >> IOVA_CHUNK_SHIFT; + unsigned long chunkoff = iova & ~IOVA_CHUNK_MASK; + unsigned long mapindex = chunkoff >> IOVA_ALLOC_ORDER; + + domain->chunks[index].iova_map[mapindex] = map; +} + +static int 
+vduse_domain_free_bounce_pages(struct vduse_iova_domain *domain, + unsigned long iova, size_t size) +{ + struct page *page; + size_t walk_sz = 0; + int frees = 0; + + while (walk_sz < size) { + page = vduse_domain_get_bounce_page(domain, iova); + if (page) { + vduse_domain_set_bounce_page(domain, iova, NULL); + put_page(page); + frees++; + } + iova += PAGE_SIZE; + walk_sz += PAGE_SIZE; + } + + return frees; +} + +int vduse_domain_add_vma(struct vduse_iova_domain *domain, + struct vm_area_struct *vma) +{ + unsigned long size = vma->vm_end - vma->vm_start; + struct vduse_mmap_vma *mmap_vma; + + if (WARN_ON(size != domain->size)) + return -EINVAL; + + mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL); + if (!mmap_vma) + return -ENOMEM; + + mmap_vma->vma = vma; + mutex_lock(&domain->vma_lock); + list_add(&mmap_vma->list, &domain->vma_list); + mutex_unlock(&domain->vma_lock); + + return 0; +} + +void vduse_domain_remove_vma(struct vduse_iova_domain *domain, + struct vm_area_struct *vma) +{ + struct vduse_mmap_vma *mmap_vma; + + mutex_lock(&domain->vma_lock); + list_for_each_entry(mmap_vma, &domain->vma_list, list) { + if (mmap_vma->vma == vma) { + list_del(&mmap_vma->list); + kfree(mmap_vma); + break; + } + } + mutex_unlock(&domain->vma_lock); +} + +int vduse_domain_add_mapping(struct vduse_iova_domain *domain, + unsigned long iova, unsigned long orig, + size_t size, enum dma_data_direction dir) +{ + struct vduse_iova_map *map; + unsigned long last = iova + size; + + map = kzalloc(sizeof(struct vduse_iova_map), GFP_ATOMIC); + if (!map) + return -ENOMEM; + + map->iova = iova; + map->orig = orig; + map->size = size; + map->dir = dir; + + while (iova < last) { + vduse_domain_set_iova_map(domain, iova, map); + iova += IOVA_ALLOC_SIZE; + } + + return 0; +} + +struct vduse_iova_map * +vduse_domain_get_mapping(struct vduse_iova_domain *domain, + unsigned long iova) +{ + return vduse_domain_get_iova_map(domain, iova); +} + +void vduse_domain_remove_mapping(struct 
vduse_iova_domain *domain, + struct vduse_iova_map *map) +{ + unsigned long iova = map->iova; + unsigned long last = iova + map->size; + + while (iova < last) { + vduse_domain_set_iova_map(domain, iova, NULL); + iova += IOVA_ALLOC_SIZE; + } +} + +void vduse_domain_unmap(struct vduse_iova_domain *domain, + unsigned long iova, size_t size) +{ + struct vduse_mmap_vma *mmap_vma; + unsigned long uaddr; + + mutex_lock(&domain->vma_lock); + list_for_each_entry(mmap_vma, &domain->vma_list, list) { + mmap_read_lock(mmap_vma->vma->vm_mm); + uaddr = iova + mmap_vma->vma->vm_start; + zap_page_range(mmap_vma->vma, uaddr, size); + mmap_read_unlock(mmap_vma->vma->vm_mm); + } + mutex_unlock(&domain->vma_lock); +} + +int vduse_domain_direct_map(struct vduse_iova_domain *domain, + struct vm_area_struct *vma, unsigned long iova) +{ + unsigned long uaddr = iova + vma->vm_start; + unsigned long start = iova & PAGE_MASK; + unsigned long last = start + PAGE_SIZE - 1; + unsigned long offset; + struct vduse_iova_map *map; + struct page *page = NULL; + + map = vduse_domain_get_iova_map(domain, iova); + if (map) { + offset = last - map->iova; + page = virt_to_page(map->orig + offset); + } + + return page ? 
vm_insert_page(vma, uaddr, page) : -EFAULT; +} + +void vduse_domain_bounce(struct vduse_iova_domain *domain, + unsigned long iova, unsigned long orig, + size_t size, enum dma_data_direction dir) +{ + unsigned int offset = offset_in_page(iova); + + while (size) { + struct page *p = vduse_domain_get_bounce_page(domain, iova); + size_t copy_len = min_t(size_t, PAGE_SIZE - offset, size); + void *addr; + + if (p) { + addr = page_address(p) + offset; + if (dir == DMA_TO_DEVICE) + memcpy(addr, (void *)orig, copy_len); + else if (dir == DMA_FROM_DEVICE) + memcpy((void *)orig, addr, copy_len); + } + size -= copy_len; + orig += copy_len; + iova += copy_len; + offset = 0; + } +} + +int vduse_domain_bounce_map(struct vduse_iova_domain *domain, + struct vm_area_struct *vma, unsigned long iova) +{ + unsigned long uaddr = iova + vma->vm_start; + unsigned long start = iova & PAGE_MASK; + unsigned long offset = 0; + bool found = false; + struct vduse_iova_map *map; + struct page *page; + + mutex_lock(&domain->map_lock); + + page = vduse_domain_get_bounce_page(domain, iova); + if (page) + goto unlock; + + page = alloc_page(GFP_KERNEL); + if (!page) + goto unlock; + + while (offset < PAGE_SIZE) { + unsigned int src_offset = 0, dst_offset = 0; + void *src, *dst; + size_t copy_len; + + map = vduse_domain_get_iova_map(domain, start + offset); + if (!map) { + offset += IOVA_ALLOC_SIZE; + continue; + } + + found = true; + offset += map->size; + if (map->dir == DMA_FROM_DEVICE) + continue; + + if (start > map->iova) + src_offset = start - map->iova; + else + dst_offset = map->iova - start; + + src = (void *)(map->orig + src_offset); + dst = page_address(page) + dst_offset; + copy_len = min_t(size_t, map->size - src_offset, + PAGE_SIZE - dst_offset); + memcpy(dst, src, copy_len); + } + if (!found) { + put_page(page); + page = NULL; + } + vduse_domain_set_bounce_page(domain, iova, page); +unlock: + mutex_unlock(&domain->map_lock); + + return page ? 
vm_insert_page(vma, uaddr, page) : -EFAULT; +} + +bool vduse_domain_is_direct_map(struct vduse_iova_domain *domain, + unsigned long iova) +{ + unsigned long index = iova >> IOVA_CHUNK_SHIFT; + struct vduse_iova_chunk *chunk = &domain->chunks[index]; + + return atomic_read(&chunk->map_type) == TYPE_DIRECT_MAP; +} + +unsigned long vduse_domain_alloc_iova(struct vduse_iova_domain *domain, + size_t size, enum iova_map_type type) +{ + struct vduse_iova_chunk *chunk; + unsigned long iova = 0; + int align = (type == TYPE_DIRECT_MAP) ? PAGE_SIZE : IOVA_ALLOC_SIZE; + struct genpool_data_align data = { .align = align }; + int i; + + for (i = 0; i < domain->chunk_num; i++) { + chunk = &domain->chunks[i]; + if (unlikely(atomic_read(&chunk->map_type) == TYPE_NONE)) + atomic_cmpxchg(&chunk->map_type, TYPE_NONE, type); + + if (atomic_read(&chunk->map_type) != type) + continue; + + iova = gen_pool_alloc_algo(chunk->pool, size, + gen_pool_first_fit_align, &data); + if (iova) + break; + } + + return iova; +} + +void vduse_domain_free_iova(struct vduse_iova_domain *domain, + unsigned long iova, size_t size) +{ + unsigned long index = iova >> IOVA_CHUNK_SHIFT; + struct vduse_iova_chunk *chunk = &domain->chunks[index]; + + gen_pool_free(chunk->pool, iova, size); +} + +static void vduse_iova_chunk_cleanup(struct vduse_iova_chunk *chunk) +{ + vfree(chunk->bounce_pages); + vfree(chunk->iova_map); + gen_pool_destroy(chunk->pool); +} + +void vduse_iova_domain_destroy(struct vduse_iova_domain *domain) +{ + struct vduse_iova_chunk *chunk; + int i; + + for (i = 0; i < domain->chunk_num; i++) { + chunk = &domain->chunks[i]; + vduse_domain_free_bounce_pages(domain, + chunk->start, IOVA_CHUNK_SIZE); + vduse_iova_chunk_cleanup(chunk); + } + + mutex_destroy(&domain->map_lock); + mutex_destroy(&domain->vma_lock); + kfree(domain->chunks); + kfree(domain); +} + +static int vduse_iova_chunk_init(struct vduse_iova_chunk *chunk, + unsigned long addr, size_t size) +{ + int ret; + int pages = size >> 
PAGE_SHIFT; + + chunk->pool = gen_pool_create(IOVA_ALLOC_ORDER, -1); + if (!chunk->pool) + return -ENOMEM; + + /* addr 0 is used in allocation failure case */ + if (addr == 0) + addr += IOVA_ALLOC_SIZE; + + ret = gen_pool_add(chunk->pool, addr, size, -1); + if (ret) + goto err; + + ret = -ENOMEM; + chunk->bounce_pages = vzalloc(pages * sizeof(struct page *)); + if (!chunk->bounce_pages) + goto err; + + chunk->iova_map = vzalloc((size >> IOVA_ALLOC_ORDER) * + sizeof(struct vduse_iova_map *)); + if (!chunk->iova_map) + goto err; + + chunk->start = addr; + atomic_set(&chunk->map_type, TYPE_NONE); + + return 0; +err: + if (chunk->bounce_pages) { + vfree(chunk->bounce_pages); + chunk->bounce_pages = NULL; + } + gen_pool_destroy(chunk->pool); + return ret; +} + +struct vduse_iova_domain *vduse_iova_domain_create(size_t size) +{ + int j, i = 0; + struct vduse_iova_domain *domain; + unsigned long num = size >> IOVA_CHUNK_SHIFT; + unsigned long addr = 0; + + if (size < IOVA_MIN_SIZE || size & ~IOVA_CHUNK_MASK) + return NULL; + + domain = kzalloc(sizeof(*domain), GFP_KERNEL); + if (!domain) + return NULL; + + domain->chunks = kcalloc(num, sizeof(struct vduse_iova_chunk), GFP_KERNEL); + if (!domain->chunks) + goto err; + + for (i = 0; i < num; i++, addr += IOVA_CHUNK_SIZE) + if (vduse_iova_chunk_init(&domain->chunks[i], addr, + IOVA_CHUNK_SIZE)) + goto err; + + domain->chunk_num = num; + domain->size = size; + INIT_LIST_HEAD(&domain->vma_list); + mutex_init(&domain->vma_lock); + mutex_init(&domain->map_lock); + + return domain; +err: + for (j = 0; j < i; j++) + vduse_iova_chunk_cleanup(&domain->chunks[j]); + kfree(domain); + + return NULL; +} diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h new file mode 100644 index 000000000000..fe1816287f5f --- /dev/null +++ b/drivers/vdpa/vdpa_user/iova_domain.h @@ -0,0 +1,93 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * MMU-based IOMMU implementation + * + * Copyright (C) 2020 Bytedance 
Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#ifndef _VDUSE_IOVA_DOMAIN_H +#define _VDUSE_IOVA_DOMAIN_H + +#include +#include + +enum iova_map_type { + TYPE_NONE, + TYPE_DIRECT_MAP, + TYPE_BOUNCE_MAP, +}; + +struct vduse_iova_map { + unsigned long iova; + unsigned long orig; + size_t size; + enum dma_data_direction dir; +}; + +struct vduse_iova_chunk { + struct gen_pool *pool; + struct page **bounce_pages; + struct vduse_iova_map **iova_map; + unsigned long start; + atomic_t map_type; +}; + +struct vduse_iova_domain { + struct vduse_iova_chunk *chunks; + int chunk_num; + size_t size; + struct mutex map_lock; + struct mutex vma_lock; + struct list_head vma_list; +}; + +int vduse_domain_add_vma(struct vduse_iova_domain *domain, + struct vm_area_struct *vma); + +void vduse_domain_remove_vma(struct vduse_iova_domain *domain, + struct vm_area_struct *vma); + +int vduse_domain_add_mapping(struct vduse_iova_domain *domain, + unsigned long iova, unsigned long orig, + size_t size, enum dma_data_direction dir); + +struct vduse_iova_map * +vduse_domain_get_mapping(struct vduse_iova_domain *domain, + unsigned long iova); + +void vduse_domain_remove_mapping(struct vduse_iova_domain *domain, + struct vduse_iova_map *map); + +void vduse_domain_unmap(struct vduse_iova_domain *domain, + unsigned long iova, size_t size); + +int vduse_domain_direct_map(struct vduse_iova_domain *domain, + struct vm_area_struct *vma, unsigned long iova); + +void vduse_domain_bounce(struct vduse_iova_domain *domain, + unsigned long iova, unsigned long orig, + size_t size, enum dma_data_direction dir); + +int vduse_domain_bounce_map(struct vduse_iova_domain *domain, + struct vm_area_struct *vma, unsigned long iova); + +bool vduse_domain_is_direct_map(struct vduse_iova_domain *domain, + unsigned long iova); + +unsigned long vduse_domain_alloc_iova(struct vduse_iova_domain *domain, + size_t size, enum iova_map_type type); + +void vduse_domain_free_iova(struct 
vduse_iova_domain *domain,
+			    unsigned long iova, size_t size);
+
+void vduse_iova_domain_destroy(struct vduse_iova_domain *domain);
+
+struct vduse_iova_domain *vduse_iova_domain_create(size_t size);
+
+#endif /* _VDUSE_IOVA_DOMAIN_H */
diff --git a/drivers/vdpa/vdpa_user/vduse.h b/drivers/vdpa/vdpa_user/vduse.h
new file mode 100644
index 000000000000..1041ce7bddc4
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *
+ * Author: Xie Yongji
+ *
+ */
+
+#ifndef _VDUSE_H
+#define _VDUSE_H
+
+#include
+#include
+#include
+
+#include "iova_domain.h"
+#include "eventfd.h"
+
+struct vduse_virtqueue {
+	u16 index;
+	bool ready;
+	spinlock_t kick_lock;
+	spinlock_t irq_lock;
+	struct eventfd_ctx *kickfd;
+	struct vduse_virqfd *virqfd;
+	void *private;
+	irqreturn_t (*cb)(void *data);
+};
+
+struct vduse_dev;
+
+struct vduse_vdpa {
+	struct vdpa_device vdpa;
+	struct vduse_dev *dev;
+};
+
+struct vduse_dev {
+	struct vduse_vdpa *vdev;
+	struct vduse_virtqueue *vqs;
+	struct vduse_iova_domain *domain;
+	struct mutex lock;
+	spinlock_t msg_lock;
+	atomic64_t msg_unique;
+	wait_queue_head_t waitq;
+	struct list_head send_list;
+	struct list_head recv_list;
+	struct list_head list;
+	refcount_t refcnt;
+	u32 id;
+	u16 vq_size_max;
+	u16 vq_num;
+	u32 vq_align;
+	u32 device_id;
+	u32 vendor_id;
+};
+
+#endif /* _VDUSE_H */
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index 000000000000..4a869b9698ef
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1121 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved.
+ * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "vduse.h" + +#define DRV_VERSION "1.0" +#define DRV_AUTHOR "Yongji Xie " +#define DRV_DESC "vDPA Device in Userspace" +#define DRV_LICENSE "GPL v2" + +struct vduse_dev_msg { + struct vduse_dev_request req; + struct vduse_dev_response resp; + struct list_head list; + wait_queue_head_t waitq; + bool completed; + refcount_t refcnt; +}; + +static struct workqueue_struct *vduse_vdpa_wq; +static DEFINE_MUTEX(vduse_lock); +static LIST_HEAD(vduse_devs); + +static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) +{ + struct vduse_vdpa *vdev = container_of(vdpa, struct vduse_vdpa, vdpa); + + return vdev->dev; +} + +static inline struct vduse_dev *dev_to_vduse(struct device *dev) +{ + struct vdpa_device *vdpa = dev_to_vdpa(dev); + + return vdpa_to_vduse(vdpa); +} + +static struct vduse_dev_msg *vduse_dev_new_msg(struct vduse_dev *dev, int type) +{ + struct vduse_dev_msg *msg = kzalloc(sizeof(*msg), + GFP_KERNEL | __GFP_NOFAIL); + + msg->req.type = type; + msg->req.unique = atomic64_fetch_inc(&dev->msg_unique); + init_waitqueue_head(&msg->waitq); + refcount_set(&msg->refcnt, 1); + + return msg; +} + +static void vduse_dev_msg_get(struct vduse_dev_msg *msg) +{ + refcount_inc(&msg->refcnt); +} + +static void vduse_dev_msg_put(struct vduse_dev_msg *msg) +{ + if (refcount_dec_and_test(&msg->refcnt)) + kfree(msg); +} + +static struct vduse_dev_msg *vduse_dev_find_msg(struct vduse_dev *dev, + struct list_head *head, + uint32_t unique) +{ + struct vduse_dev_msg *tmp, *msg = NULL; + + spin_lock(&dev->msg_lock); + list_for_each_entry(tmp, head, list) { + if (tmp->req.unique == unique) { + msg = tmp; + list_del(&tmp->list); + break; + } + } + spin_unlock(&dev->msg_lock); + + return msg; +} + +static struct vduse_dev_msg *vduse_dev_dequeue_msg(struct vduse_dev *dev, 
+ struct list_head *head) +{ + struct vduse_dev_msg *msg = NULL; + + spin_lock(&dev->msg_lock); + if (!list_empty(head)) { + msg = list_first_entry(head, struct vduse_dev_msg, list); + list_del(&msg->list); + } + spin_unlock(&dev->msg_lock); + + return msg; +} + +static void vduse_dev_enqueue_msg(struct vduse_dev *dev, + struct vduse_dev_msg *msg, struct list_head *head) +{ + spin_lock(&dev->msg_lock); + list_add_tail(&msg->list, head); + spin_unlock(&dev->msg_lock); +} + +static int vduse_dev_msg_sync(struct vduse_dev *dev, struct vduse_dev_msg *msg) +{ + int ret; + + vduse_dev_enqueue_msg(dev, msg, &dev->send_list); + wake_up(&dev->waitq); + wait_event(msg->waitq, msg->completed); + /* coupled with smp_wmb() in vduse_dev_msg_complete() */ + smp_rmb(); + ret = msg->resp.result; + + return ret; +} + +static void vduse_dev_msg_complete(struct vduse_dev_msg *msg, + struct vduse_dev_response *resp) +{ + vduse_dev_msg_get(msg); + memcpy(&msg->resp, resp, sizeof(*resp)); + /* coupled with smp_rmb() in vduse_dev_msg_sync() */ + smp_wmb(); + msg->completed = 1; + wake_up(&msg->waitq); + vduse_dev_msg_put(msg); +} + +static u64 vduse_dev_get_features(struct vduse_dev *dev) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_GET_FEATURES); + u64 features; + + vduse_dev_msg_sync(dev, msg); + features = msg->resp.features; + vduse_dev_msg_put(msg); + + return features; +} + +static int vduse_dev_set_features(struct vduse_dev *dev, u64 features) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_SET_FEATURES); + int ret; + + msg->req.size = sizeof(features); + msg->req.features = features; + + ret = vduse_dev_msg_sync(dev, msg); + vduse_dev_msg_put(msg); + + return ret; +} + +static u8 vduse_dev_get_status(struct vduse_dev *dev) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_GET_STATUS); + u8 status; + + vduse_dev_msg_sync(dev, msg); + status = msg->resp.status; + vduse_dev_msg_put(msg); + + return status; +} + +static void 
vduse_dev_set_status(struct vduse_dev *dev, u8 status) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_SET_STATUS); + + msg->req.size = sizeof(status); + msg->req.status = status; + + vduse_dev_msg_sync(dev, msg); + vduse_dev_msg_put(msg); +} + +static void vduse_dev_get_config(struct vduse_dev *dev, unsigned int offset, + void *buf, unsigned int len) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_GET_CONFIG); + + WARN_ON(len > sizeof(msg->req.config.data)); + + msg->req.size = sizeof(struct vduse_dev_config_data); + msg->req.config.offset = offset; + msg->req.config.len = len; + vduse_dev_msg_sync(dev, msg); + memcpy(buf, msg->resp.config.data, len); + vduse_dev_msg_put(msg); +} + +static void vduse_dev_set_config(struct vduse_dev *dev, unsigned int offset, + const void *buf, unsigned int len) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_SET_CONFIG); + + WARN_ON(len > sizeof(msg->req.config.data)); + + msg->req.size = sizeof(struct vduse_dev_config_data); + msg->req.config.offset = offset; + msg->req.config.len = len; + memcpy(msg->req.config.data, buf, len); + vduse_dev_msg_sync(dev, msg); + vduse_dev_msg_put(msg); +} + +static void vduse_dev_set_vq_num(struct vduse_dev *dev, + struct vduse_virtqueue *vq, u32 num) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_SET_VQ_NUM); + + msg->req.size = sizeof(struct vduse_vq_num); + msg->req.vq_num.index = vq->index; + msg->req.vq_num.num = num; + + vduse_dev_msg_sync(dev, msg); + vduse_dev_msg_put(msg); +} + +static int vduse_dev_set_vq_addr(struct vduse_dev *dev, + struct vduse_virtqueue *vq, u64 desc_addr, + u64 driver_addr, u64 device_addr) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_SET_VQ_ADDR); + int ret; + + msg->req.size = sizeof(struct vduse_vq_addr); + msg->req.vq_addr.index = vq->index; + msg->req.vq_addr.desc_addr = desc_addr; + msg->req.vq_addr.driver_addr = driver_addr; + msg->req.vq_addr.device_addr = device_addr; + + ret = 
vduse_dev_msg_sync(dev, msg); + vduse_dev_msg_put(msg); + + return ret; +} + +static void vduse_dev_set_vq_ready(struct vduse_dev *dev, + struct vduse_virtqueue *vq, bool ready) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_SET_VQ_READY); + + msg->req.size = sizeof(struct vduse_vq_ready); + msg->req.vq_ready.index = vq->index; + msg->req.vq_ready.ready = ready; + + vduse_dev_msg_sync(dev, msg); + vduse_dev_msg_put(msg); +} + +static bool vduse_dev_get_vq_ready(struct vduse_dev *dev, + struct vduse_virtqueue *vq) +{ + struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_GET_VQ_READY); + bool ready; + + msg->req.size = sizeof(struct vduse_vq_ready); + msg->req.vq_ready.index = vq->index; + + vduse_dev_msg_sync(dev, msg); + ready = msg->resp.vq_ready.ready; + vduse_dev_msg_put(msg); + + return ready; +} + +static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct file *file = iocb->ki_filp; + struct vduse_dev *dev = file->private_data; + struct vduse_dev_msg *msg; + int size = sizeof(struct vduse_dev_request); + ssize_t ret = 0; + + if (iov_iter_count(to) < size) + return 0; + + while (1) { + msg = vduse_dev_dequeue_msg(dev, &dev->send_list); + if (msg) + break; + + if (file->f_flags & O_NONBLOCK) + return -EAGAIN; + + ret = wait_event_interruptible_exclusive(dev->waitq, + !list_empty(&dev->send_list)); + if (ret) + return ret; + } + ret = copy_to_iter(&msg->req, size, to); + if (ret != size) { + vduse_dev_enqueue_msg(dev, msg, &dev->send_list); + return -EFAULT; + } + vduse_dev_enqueue_msg(dev, msg, &dev->recv_list); + + return ret; +} + +static ssize_t vduse_dev_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct vduse_dev *dev = file->private_data; + struct vduse_dev_response resp; + struct vduse_dev_msg *msg; + size_t ret; + + ret = copy_from_iter(&resp, sizeof(resp), from); + if (ret != sizeof(resp)) + return -EINVAL; + + msg = vduse_dev_find_msg(dev, 
&dev->recv_list, resp.unique); + if (!msg) + return -EINVAL; + + vduse_dev_msg_complete(msg, &resp); + + return ret; +} + +static __poll_t vduse_dev_poll(struct file *file, poll_table *wait) +{ + struct vduse_dev *dev = file->private_data; + __poll_t mask = 0; + + poll_wait(file, &dev->waitq, wait); + + if (!list_empty(&dev->send_list)) + mask |= EPOLLIN | EPOLLRDNORM; + + return mask; +} + +static void vduse_dev_reset(struct vduse_dev *dev) +{ + int i; + + for (i = 0; i < dev->vq_num; i++) { + struct vduse_virtqueue *vq = &dev->vqs[i]; + + spin_lock(&vq->irq_lock); + vq->ready = false; + vq->cb = NULL; + vq->private = NULL; + spin_unlock(&vq->irq_lock); + } +} + +static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx, + u64 desc_area, u64 driver_area, + u64 device_area) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + return vduse_dev_set_vq_addr(dev, vq, desc_area, + driver_area, device_area); +} + +static void vduse_vdpa_kick_vq(struct vdpa_device *vdpa, u16 idx) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vduse_vq_kick(vq); +} + +static void vduse_vdpa_set_vq_cb(struct vdpa_device *vdpa, u16 idx, + struct vdpa_callback *cb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vq->cb = cb->callback; + vq->private = cb->private; +} + +static void vduse_vdpa_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vduse_dev_set_vq_num(dev, vq, num); +} + +static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa, + u16 idx, bool ready) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vduse_dev_set_vq_ready(dev, vq, ready); + vq->ready = ready; +} + +static bool vduse_vdpa_get_vq_ready(struct vdpa_device *vdpa, u16 idx) +{ + struct vduse_dev 
*dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vq->ready = vduse_dev_get_vq_ready(dev, vq); + + return vq->ready; +} + +static u32 vduse_vdpa_get_vq_align(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vq_align; +} + +static u64 vduse_vdpa_get_features(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + u64 fixed = (1ULL << VIRTIO_F_ACCESS_PLATFORM); + + return (vduse_dev_get_features(dev) | fixed); +} + +static int vduse_vdpa_set_features(struct vdpa_device *vdpa, u64 features) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return vduse_dev_set_features(dev, features); +} + +static void vduse_vdpa_set_config_cb(struct vdpa_device *vdpa, + struct vdpa_callback *cb) +{ + /* We don't support config interrupt */ +} + +static u16 vduse_vdpa_get_vq_num_max(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vq_size_max; +} + +static u32 vduse_vdpa_get_device_id(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->device_id; +} + +static u32 vduse_vdpa_get_vendor_id(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vendor_id; +} + +static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return vduse_dev_get_status(dev); +} + +static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + if (status == 0) + vduse_dev_reset(dev); + + vduse_dev_set_status(dev, status); +} + +static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset, + void *buf, unsigned int len) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + vduse_dev_get_config(dev, offset, buf, len); +} + +static void vduse_vdpa_set_config(struct vdpa_device *vdpa, unsigned int offset, + const void *buf, unsigned int len) +{ + struct vduse_dev *dev = 
vdpa_to_vduse(vdpa); + + vduse_dev_set_config(dev, offset, buf, len); +} + +static void vduse_vdpa_free(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + vduse_kickfd_release(dev); + vduse_virqfd_release(dev); + + WARN_ON(!list_empty(&dev->send_list)); + WARN_ON(!list_empty(&dev->recv_list)); + dev->vdev = NULL; +} + +static const struct vdpa_config_ops vduse_vdpa_config_ops = { + .set_vq_address = vduse_vdpa_set_vq_address, + .kick_vq = vduse_vdpa_kick_vq, + .set_vq_cb = vduse_vdpa_set_vq_cb, + .set_vq_num = vduse_vdpa_set_vq_num, + .set_vq_ready = vduse_vdpa_set_vq_ready, + .get_vq_ready = vduse_vdpa_get_vq_ready, + .get_vq_align = vduse_vdpa_get_vq_align, + .get_features = vduse_vdpa_get_features, + .set_features = vduse_vdpa_set_features, + .set_config_cb = vduse_vdpa_set_config_cb, + .get_vq_num_max = vduse_vdpa_get_vq_num_max, + .get_device_id = vduse_vdpa_get_device_id, + .get_vendor_id = vduse_vdpa_get_vendor_id, + .get_status = vduse_vdpa_get_status, + .set_status = vduse_vdpa_set_status, + .get_config = vduse_vdpa_get_config, + .set_config = vduse_vdpa_set_config, + .free = vduse_vdpa_free, +}; + +static dma_addr_t vduse_dev_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long iova = vduse_domain_alloc_iova(domain, size, + TYPE_BOUNCE_MAP); + unsigned long orig = (unsigned long)page_address(page) + offset; + + if (!iova) + return DMA_MAPPING_ERROR; + + if (vduse_domain_add_mapping(domain, iova, orig, size, dir)) { + vduse_domain_free_iova(domain, iova, size); + return DMA_MAPPING_ERROR; + } + + if (dir == DMA_TO_DEVICE) + vduse_domain_bounce(domain, iova, orig, size, dir); + + return (dma_addr_t)iova; +} + +static void vduse_dev_unmap_page(struct device *dev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction dir, + 
unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long iova = (unsigned long)dma_addr; + struct vduse_iova_map *map = vduse_domain_get_mapping(domain, iova); + + if (WARN_ON(!map)) + return; + + if (dir == DMA_FROM_DEVICE) + vduse_domain_bounce(domain, iova, map->orig, size, dir); + vduse_domain_remove_mapping(domain, map); + vduse_domain_free_iova(domain, iova, size); + kfree(map); +} + +static void *vduse_dev_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_addr, gfp_t flag, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long iova = vduse_domain_alloc_iova(domain, size, + TYPE_DIRECT_MAP); + void *orig = alloc_pages_exact(size, flag); + + if (!iova || !orig) + goto err; + + if (vduse_domain_add_mapping(domain, iova, + (unsigned long)orig, size, DMA_BIDIRECTIONAL)) + goto err; + + *dma_addr = (dma_addr_t)iova; + + return orig; +err: + *dma_addr = DMA_MAPPING_ERROR; + if (orig) + free_pages_exact(orig, size); + if (iova) + vduse_domain_free_iova(domain, iova, size); + + return NULL; +} + +static void vduse_dev_free_coherent(struct device *dev, size_t size, + void *vaddr, dma_addr_t dma_addr, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long iova = (unsigned long)dma_addr; + struct vduse_iova_map *map = vduse_domain_get_mapping(domain, iova); + + if (WARN_ON(!map)) + return; + + vduse_domain_remove_mapping(domain, map); + vduse_domain_unmap(domain, map->iova, PAGE_ALIGN(map->size)); + free_pages_exact((void *)map->orig, map->size); + vduse_domain_free_iova(domain, map->iova, map->size); + kfree(map); +} + +static const struct dma_map_ops vduse_dev_dma_ops = { + .map_page = vduse_dev_map_page, + .unmap_page = vduse_dev_unmap_page, + .alloc = vduse_dev_alloc_coherent, + .free = 
vduse_dev_free_coherent, +}; + +static void vduse_dev_mmap_open(struct vm_area_struct *vma) +{ + struct vduse_iova_domain *domain = vma->vm_private_data; + + if (!vduse_domain_add_vma(domain, vma)) + return; + + vma->vm_private_data = NULL; +} + +static void vduse_dev_mmap_close(struct vm_area_struct *vma) +{ + struct vduse_iova_domain *domain = vma->vm_private_data; + + if (!domain) + return; + + vduse_domain_remove_vma(domain, vma); +} + +static int vduse_dev_mmap_split(struct vm_area_struct *vma, unsigned long addr) +{ + return -EPERM; +} + +static vm_fault_t vduse_dev_mmap_fault(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + struct vduse_iova_domain *domain = vma->vm_private_data; + unsigned long iova = vmf->address - vma->vm_start; + int ret; + + if (!domain) + return VM_FAULT_SIGBUS; + + if (vduse_domain_is_direct_map(domain, iova)) + ret = vduse_domain_direct_map(domain, vma, iova); + else + ret = vduse_domain_bounce_map(domain, vma, iova); + + if (ret == -ENOMEM) + return VM_FAULT_OOM; + if (ret < 0 && ret != -EBUSY) + return VM_FAULT_SIGBUS; + + return VM_FAULT_NOPAGE; +} + +static const struct vm_operations_struct vduse_dev_mmap_ops = { + .open = vduse_dev_mmap_open, + .close = vduse_dev_mmap_close, + .may_split = vduse_dev_mmap_split, + .fault = vduse_dev_mmap_fault, +}; + +static int vduse_dev_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct vduse_dev *dev = file->private_data; + struct vduse_iova_domain *domain = dev->domain; + unsigned long size = vma->vm_end - vma->vm_start; + int ret; + + if (domain->size != size || vma->vm_pgoff) + return -EINVAL; + + ret = vduse_domain_add_vma(domain, vma); + if (ret) + return ret; + + vma->vm_flags |= VM_MIXEDMAP | VM_DONTCOPY | + VM_DONTDUMP | VM_DONTEXPAND; + vma->vm_private_data = domain; + vma->vm_ops = &vduse_dev_mmap_ops; + + return 0; +} + +static long vduse_dev_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + struct vduse_dev *dev = 
file->private_data;
+	void __user *argp = (void __user *)arg;
+	int ret;
+
+	mutex_lock(&dev->lock);
+	switch (cmd) {
+	case VDUSE_VQ_SETUP_KICKFD: {
+		struct vduse_vq_eventfd eventfd;
+
+		ret = -EFAULT;
+		if (copy_from_user(&eventfd, argp, sizeof(eventfd)))
+			break;
+
+		ret = vduse_kickfd_setup(dev, &eventfd);
+		break;
+	}
+	case VDUSE_VQ_SETUP_IRQFD: {
+		struct vduse_vq_eventfd eventfd;
+
+		ret = -EFAULT;
+		if (copy_from_user(&eventfd, argp, sizeof(eventfd)))
+			break;
+
+		ret = vduse_virqfd_setup(dev, &eventfd);
+		break;
+	}
+	default:
+		ret = -ENOIOCTLCMD;
+		break;
+	}
+	mutex_unlock(&dev->lock);
+
+	return ret;
+}
+
+static int vduse_dev_release(struct inode *inode, struct file *file)
+{
+	struct vduse_dev *dev = file->private_data;
+	struct vduse_dev_msg *msg;
+
+	while ((msg = vduse_dev_dequeue_msg(dev, &dev->recv_list)))
+		vduse_dev_enqueue_msg(dev, msg, &dev->send_list);
+
+	refcount_dec(&dev->refcnt);
+
+	return 0;
+}
+
+static const struct file_operations vduse_dev_fops = {
+	.owner = THIS_MODULE,
+	.release = vduse_dev_release,
+	.read_iter = vduse_dev_read_iter,
+	.write_iter = vduse_dev_write_iter,
+	.poll = vduse_dev_poll,
+	.mmap = vduse_dev_mmap,
+	.unlocked_ioctl = vduse_dev_ioctl,
+	.compat_ioctl = compat_ptr_ioctl,
+	.llseek = noop_llseek,
+};
+
+static struct vduse_dev *vduse_dev_create(void)
+{
+	struct vduse_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+
+	if (!dev)
+		return NULL;
+
+	mutex_init(&dev->lock);
+	spin_lock_init(&dev->msg_lock);
+	INIT_LIST_HEAD(&dev->send_list);
+	INIT_LIST_HEAD(&dev->recv_list);
+	atomic64_set(&dev->msg_unique, 0);
+	init_waitqueue_head(&dev->waitq);
+	refcount_set(&dev->refcnt, 1);
+
+	return dev;
+}
+
+static void vduse_dev_destroy(struct vduse_dev *dev)
+{
+	mutex_destroy(&dev->lock);
+	kfree(dev);
+}
+
+static struct vduse_dev *vduse_find_dev(u32 id)
+{
+	struct vduse_dev *tmp, *dev = NULL;
+
+	list_for_each_entry(tmp, &vduse_devs, list) {
+		if (tmp->id == id) {
+			dev = tmp;
+			break;
+		}
+	}
+	return dev;
+}
+
+static int vduse_get_dev(u32 id)
+{
int fd; + char name[64]; + struct vduse_dev *dev = vduse_find_dev(id); + + if (!dev) + return -EINVAL; + + snprintf(name, sizeof(name), "vduse-dev:%u", dev->id); + fd = anon_inode_getfd(name, &vduse_dev_fops, dev, O_RDWR | O_CLOEXEC); + if (fd < 0) + return fd; + + refcount_inc(&dev->refcnt); + + return fd; +} + +static int vduse_destroy_dev(u32 id) +{ + struct vduse_dev *dev = vduse_find_dev(id); + + if (!dev) + return -EINVAL; + + if (dev->vdev || refcount_read(&dev->refcnt) > 1) + return -EBUSY; + + list_del(&dev->list); + kfree(dev->vqs); + vduse_iova_domain_destroy(dev->domain); + vduse_dev_destroy(dev); + + return 0; +} + +static int vduse_create_dev(struct vduse_dev_config *config) +{ + int i, fd; + struct vduse_dev *dev; + char name[64]; + + if (vduse_find_dev(config->id)) + return -EEXIST; + + dev = vduse_dev_create(); + if (!dev) + return -ENOMEM; + + dev->id = config->id; + dev->device_id = config->device_id; + dev->vendor_id = config->vendor_id; + dev->domain = vduse_iova_domain_create(config->iova_size); + if (!dev->domain) + goto err_domain; + + dev->vq_align = config->vq_align; + dev->vq_size_max = config->vq_size_max; + dev->vq_num = config->vq_num; + dev->vqs = kcalloc(dev->vq_num, sizeof(*dev->vqs), GFP_KERNEL); + if (!dev->vqs) + goto err_vqs; + + for (i = 0; i < dev->vq_num; i++) { + dev->vqs[i].index = i; + spin_lock_init(&dev->vqs[i].kick_lock); + spin_lock_init(&dev->vqs[i].irq_lock); + } + + snprintf(name, sizeof(name), "vduse-dev:%u", config->id); + fd = anon_inode_getfd(name, &vduse_dev_fops, dev, O_RDWR | O_CLOEXEC); + if (fd < 0) + goto err_fd; + + refcount_inc(&dev->refcnt); + list_add(&dev->list, &vduse_devs); + + return fd; +err_fd: + kfree(dev->vqs); +err_vqs: + vduse_iova_domain_destroy(dev->domain); +err_domain: + vduse_dev_destroy(dev); + return fd; +} + +static long vduse_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + int ret; + void __user *argp = (void __user *)arg; + + mutex_lock(&vduse_lock); + switch 
(cmd) { + case VDUSE_CREATE_DEV: { + struct vduse_dev_config config; + + ret = -EFAULT; + if (copy_from_user(&config, argp, sizeof(config))) + break; + + ret = vduse_create_dev(&config); + break; + } + case VDUSE_GET_DEV: + ret = vduse_get_dev(arg); + break; + case VDUSE_DESTROY_DEV: + ret = vduse_destroy_dev(arg); + break; + default: + ret = -EINVAL; + break; + } + mutex_unlock(&vduse_lock); + + return ret; +} + +static const struct file_operations vduse_fops = { + .owner = THIS_MODULE, + .unlocked_ioctl = vduse_ioctl, + .compat_ioctl = compat_ptr_ioctl, + .llseek = noop_llseek, +}; + +static struct miscdevice vduse_misc = { + .fops = &vduse_fops, + .minor = MISC_DYNAMIC_MINOR, + .name = "vduse", +}; + +static void vduse_parent_release(struct device *dev) +{ +} + +static struct device vduse_parent = { + .init_name = "vduse", + .release = vduse_parent_release, +}; + +static struct vdpa_parent_dev parent_dev; + +static int vduse_dev_add_vdpa(struct vduse_dev *dev, const char *name) +{ + struct vduse_vdpa *vdev = dev->vdev; + int ret; + + if (vdev) + return -EEXIST; + + vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, NULL, + &vduse_vdpa_config_ops, dev->vq_num, name); + if (!vdev) + return -ENOMEM; + + vdev->dev = dev; + vdev->vdpa.dev.dma_mask = &vdev->vdpa.dev.coherent_dma_mask; + ret = dma_set_mask_and_coherent(&vdev->vdpa.dev, DMA_BIT_MASK(64)); + if (ret) + goto err; + + set_dma_ops(&vdev->vdpa.dev, &vduse_dev_dma_ops); + vdev->vdpa.dma_dev = &vdev->vdpa.dev; + vdev->vdpa.pdev = &parent_dev; + + ret = _vdpa_register_device(&vdev->vdpa); + if (ret) + goto err; + + dev->vdev = vdev; + + return 0; +err: + put_device(&vdev->vdpa.dev); + return ret; +} + +static struct vdpa_device *vdpa_dev_add(struct vdpa_parent_dev *pdev, + const char *name, u32 device_id, + struct nlattr **attrs) +{ + u32 vduse_id; + struct vduse_dev *dev; + int ret = -EINVAL; + + if (!attrs[VDPA_ATTR_BACKEND_ID]) + return ERR_PTR(-EINVAL); + + mutex_lock(&vduse_lock); + vduse_id = 
nla_get_u32(attrs[VDPA_ATTR_BACKEND_ID]); + dev = vduse_find_dev(vduse_id); + if (!dev) + goto unlock; + + if (dev->device_id != device_id) + goto unlock; + + ret = vduse_dev_add_vdpa(dev, name); +unlock: + mutex_unlock(&vduse_lock); + if (ret) + return ERR_PTR(ret); + + return &dev->vdev->vdpa; +} + +static void vdpa_dev_del(struct vdpa_parent_dev *pdev, struct vdpa_device *dev) +{ + _vdpa_unregister_device(dev); +} + +static const struct vdpa_dev_ops vdpa_dev_parent_ops = { + .dev_add = vdpa_dev_add, + .dev_del = vdpa_dev_del +}; + +static struct virtio_device_id id_table[] = { + { VIRTIO_DEV_ANY_ID, VIRTIO_DEV_ANY_ID }, + { 0 }, +}; + +static struct vdpa_parent_dev parent_dev = { + .device = &vduse_parent, + .id_table = id_table, + .ops = &vdpa_dev_parent_ops, +}; + +static int vduse_parentdev_init(void) +{ + int ret; + + ret = device_register(&vduse_parent); + if (ret) + return ret; + + ret = vdpa_parentdev_register(&parent_dev); + if (ret) + goto err; + + return 0; +err: + device_unregister(&vduse_parent); + return ret; +} + +static void vduse_parentdev_exit(void) +{ + vdpa_parentdev_unregister(&parent_dev); + device_unregister(&vduse_parent); +} + +static int vduse_init(void) +{ + int ret; + + ret = misc_register(&vduse_misc); + if (ret) + return ret; + + ret = -ENOMEM; + vduse_vdpa_wq = alloc_workqueue("vduse-vdpa", WQ_UNBOUND, 1); + if (!vduse_vdpa_wq) + goto err_vdpa_wq; + + ret = vduse_virqfd_init(); + if (ret) + goto err_irqfd; + + ret = vduse_parentdev_init(); + if (ret) + goto err_parentdev; + + return 0; +err_parentdev: + vduse_virqfd_exit(); +err_irqfd: + destroy_workqueue(vduse_vdpa_wq); +err_vdpa_wq: + misc_deregister(&vduse_misc); + return ret; +} +module_init(vduse_init); + +static void vduse_exit(void) +{ + misc_deregister(&vduse_misc); + destroy_workqueue(vduse_vdpa_wq); + vduse_virqfd_exit(); + vduse_parentdev_exit(); +} +module_exit(vduse_exit); + +MODULE_VERSION(DRV_VERSION); +MODULE_LICENSE(DRV_LICENSE); +MODULE_AUTHOR(DRV_AUTHOR); 
+MODULE_DESCRIPTION(DRV_DESC); diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h index bba8b83a94b5..a7a841e5ffc7 100644 --- a/include/uapi/linux/vdpa.h +++ b/include/uapi/linux/vdpa.h @@ -33,6 +33,7 @@ enum vdpa_attr { VDPA_ATTR_DEV_VENDOR_ID, /* u32 */ VDPA_ATTR_DEV_MAX_VQS, /* u32 */ VDPA_ATTR_DEV_MAX_VQ_SIZE, /* u16 */ + VDPA_ATTR_BACKEND_ID, /* u32 */ /* new attributes must be added above here */ VDPA_ATTR_MAX, diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h new file mode 100644 index 000000000000..f8579abdaa3b --- /dev/null +++ b/include/uapi/linux/vduse.h @@ -0,0 +1,99 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI_VDUSE_H_ +#define _UAPI_VDUSE_H_ + +#include + +/* the control messages definition for read/write */ + +#define VDUSE_CONFIG_DATA_LEN 256 + +enum vduse_req_type { + VDUSE_SET_VQ_NUM, + VDUSE_SET_VQ_ADDR, + VDUSE_SET_VQ_READY, + VDUSE_GET_VQ_READY, + VDUSE_SET_FEATURES, + VDUSE_GET_FEATURES, + VDUSE_SET_STATUS, + VDUSE_GET_STATUS, + VDUSE_SET_CONFIG, + VDUSE_GET_CONFIG, +}; + +struct vduse_vq_num { + __u32 index; + __u32 num; +}; + +struct vduse_vq_addr { + __u32 index; + __u64 desc_addr; + __u64 driver_addr; + __u64 device_addr; +}; + +struct vduse_vq_ready { + __u32 index; + __u8 ready; +}; + +struct vduse_dev_config_data { + __u32 offset; + __u32 len; + __u8 data[VDUSE_CONFIG_DATA_LEN]; +}; + +struct vduse_dev_request { + __u32 type; /* request type */ + __u32 unique; /* request id */ + __u32 flags; /* request flags */ + __u32 size; /* the payload size */ + union { + struct vduse_vq_num vq_num; /* virtqueue num */ + struct vduse_vq_addr vq_addr; /* virtqueue address */ + struct vduse_vq_ready vq_ready; /* virtqueue ready status */ + struct vduse_dev_config_data config; /* virtio device config space */ + __u64 features; /* virtio features */ + __u8 status; /* device status */ + }; +}; + +struct vduse_dev_response { + __u32 unique; /* corresponding request id */ + __s32 
result; /* the result of request */
+	union {
+		struct vduse_vq_ready vq_ready; /* virtqueue ready status */
+		struct vduse_dev_config_data config; /* virtio device config space */
+		__u64 features; /* virtio features */
+		__u8 status; /* device status */
+	};
+};
+
+/* ioctls */
+
+struct vduse_dev_config {
+	__u32 id; /* vduse device id */
+	__u32 vendor_id; /* virtio vendor id */
+	__u32 device_id; /* virtio device id */
+	__u64 iova_size; /* iova space size, used for mmap(2) */
+	__u16 vq_num; /* the number of virtqueues */
+	__u16 vq_size_max; /* the max size of virtqueue */
+	__u32 vq_align; /* the allocation alignment of virtqueue's metadata */
+};
+
+struct vduse_vq_eventfd {
+	__u32 index; /* virtqueue index */
+	__u32 fd; /* eventfd */
+};
+
+#define VDUSE_BASE	0x81
+
+#define VDUSE_CREATE_DEV	_IOW(VDUSE_BASE, 0x01, struct vduse_dev_config)
+#define VDUSE_GET_DEV		_IO(VDUSE_BASE, 0x02)
+#define VDUSE_DESTROY_DEV	_IO(VDUSE_BASE, 0x03)
+
+#define VDUSE_VQ_SETUP_KICKFD	_IOW(VDUSE_BASE, 0x04, struct vduse_vq_eventfd)
+#define VDUSE_VQ_SETUP_IRQFD	_IOW(VDUSE_BASE, 0x05, struct vduse_vq_eventfd)
+
+#endif /* _UAPI_VDUSE_H_ */

From patchwork Tue Dec 22 14:52:15 2020
X-Patchwork-Submitter: Yongji Xie
X-Patchwork-Id: 11986895
b=OmIRUy+rIhe49YYTypJqWrWUDFVliVfMzFwBNejj8/ATpVRBmzn49pWWFjfMhEWatu aAG9YzI7NKx0JHeowGhEpmWy+wIs+tE0IeOxCaNQxHI0wkhLVPguC4XYrZHOFhajcYJM Pik6ODfmxn4JGlYErcl1qGjAn20nV0o+woNjr16axEcllzptRlBdeSok6KqsHS0l5Xn2 jXr5Pcu1njxAnR2uE5dN/ei2ZRykH21L5aJaNNKjVvAQhXvZwLUTf7zb1PEAzJ4QSW9W +GeOPURhko8s7Tap9AS+oweCf/uy3mBXGl1zrTv/o27yQdCAVn5BYZBkCEZ0KuzUxbxo VYag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=WVpiMFISEK5bmULf4ejx2zyXWwfYVm8DL2A3XwWC6mc=; b=KKy8fr9mGmLz9VP+zGbdgYJ9Uv0hikDUANsVoLSfUgVioqq1zVHMaJwWyfcRrA8oRZ U380o3yeBKp/MZItrDwyAkJCUSbG4EDBfPgbfth4qhnoIbvrCREasGpShlUa3v70lYGm rrJr/lqVoxqZ7DNodX31CEtPcPUcvVHsc9HAs9DwLHcqhVqQbJedFrR835ubAfRebVik Tj2fzz9rg2FOBnYzTK8iVdJQ9sIWINL2LMsuvSTsFIs0SXGmr+1NGPOV2uErFbWIYnZa 0TgH0qOpFctz6RoGzGBq5Kfg5O/FvDKl2SAM9S1LqdMzD3B2qhTx3raxVOIVDIcQpUuI F+xA== X-Gm-Message-State: AOAM533RpsBDtMXKBowZiS1qt76cVwNZostYfZwAJPdufJiT0ULlBk2C 7CyIlo4ajN1QvROEKOj26/XZ X-Google-Smtp-Source: ABdhPJyUXGBubCUCYwRou9orBsnPOEmQF6msO4OENePqz427fxTAlf17KzWGs2+ZUPyEhT+cwqjVSg== X-Received: by 2002:a17:90b:14d3:: with SMTP id jz19mr22693411pjb.196.1608648832012; Tue, 22 Dec 2020 06:53:52 -0800 (PST) Received: from localhost ([139.177.225.248]) by smtp.gmail.com with ESMTPSA id z125sm19528369pfz.121.2020.12.22.06.53.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Dec 2020 06:53:51 -0800 (PST) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 07/13] vduse: support get/set virtqueue state 
Date: Tue, 22 Dec 2020 22:52:15 +0800
Message-Id: <20201222145221.711-8-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

This patch allows the vhost-vdpa bus driver to get/set virtqueue state
from the userspace VDUSE process.

Signed-off-by: Xie Yongji
---
 Documentation/driver-api/vduse.rst |  4 +++
 drivers/vdpa/vdpa_user/vduse_dev.c | 54 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vduse.h         |  9 +++++++
 3 files changed, 67 insertions(+)

diff --git a/Documentation/driver-api/vduse.rst b/Documentation/driver-api/vduse.rst
index da9b3040f20a..623f7b040ccf 100644
--- a/Documentation/driver-api/vduse.rst
+++ b/Documentation/driver-api/vduse.rst
@@ -30,6 +30,10 @@ The following types of messages are provided by the VDUSE framework now:

 - VDUSE_GET_VQ_READY: Get ready status of virtqueue

+- VDUSE_SET_VQ_STATE: Set the state (last_avail_idx) for virtqueue
+
+- VDUSE_GET_VQ_STATE: Get the state (last_avail_idx) for virtqueue
+
 - VDUSE_SET_FEATURES: Set virtio features supported by the driver

 - VDUSE_GET_FEATURES: Get virtio features supported by the device

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 4a869b9698ef..b974333ed4e9 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -291,6 +291,40 @@ static bool vduse_dev_get_vq_ready(struct vduse_dev *dev,
 	return ready;
 }

+static int vduse_dev_get_vq_state(struct vduse_dev *dev,
+				  struct vduse_virtqueue *vq,
+				  struct vdpa_vq_state *state)
+{
+	struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_GET_VQ_STATE);
+	int ret;
+
+	msg->req.size = sizeof(struct vduse_vq_state);
+	msg->req.vq_state.index = vq->index;
+
+	ret = vduse_dev_msg_sync(dev, msg);
+	state->avail_index = msg->resp.vq_state.avail_idx;
+	vduse_dev_msg_put(msg);
+
+	return ret;
+}
+
+static int vduse_dev_set_vq_state(struct vduse_dev *dev,
+				  struct vduse_virtqueue *vq,
+				  const struct vdpa_vq_state *state)
+{
+	struct vduse_dev_msg *msg = vduse_dev_new_msg(dev, VDUSE_SET_VQ_STATE);
+	int ret;
+
+	msg->req.size = sizeof(struct vduse_vq_state);
+	msg->req.vq_state.index = vq->index;
+	msg->req.vq_state.avail_idx = state->avail_index;
+
+	ret = vduse_dev_msg_sync(dev, msg);
+	vduse_dev_msg_put(msg);
+
+	return ret;
+}
+
 static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
@@ -431,6 +465,24 @@ static bool vduse_vdpa_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
 	return vq->ready;
 }

+static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
+				   const struct vdpa_vq_state *state)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+	struct vduse_virtqueue *vq = &dev->vqs[idx];
+
+	return vduse_dev_set_vq_state(dev, vq, state);
+}
+
+static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
+				   struct vdpa_vq_state *state)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+	struct vduse_virtqueue *vq = &dev->vqs[idx];
+
+	return vduse_dev_get_vq_state(dev, vq, state);
+}
+
 static u32 vduse_vdpa_get_vq_align(struct vdpa_device *vdpa)
 {
 	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
@@ -532,6 +584,8 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
 	.set_vq_num = vduse_vdpa_set_vq_num,
 	.set_vq_ready = vduse_vdpa_set_vq_ready,
 	.get_vq_ready = vduse_vdpa_get_vq_ready,
+	.set_vq_state = vduse_vdpa_set_vq_state,
+	.get_vq_state = vduse_vdpa_get_vq_state,
 	.get_vq_align = vduse_vdpa_get_vq_align,
 	.get_features = vduse_vdpa_get_features,
 	.set_features = vduse_vdpa_set_features,
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index f8579abdaa3b..873305dfd93f 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -13,6 +13,8 @@ enum vduse_req_type {
 	VDUSE_SET_VQ_ADDR,
 	VDUSE_SET_VQ_READY,
 	VDUSE_GET_VQ_READY,
+	VDUSE_SET_VQ_STATE,
+	VDUSE_GET_VQ_STATE,
 	VDUSE_SET_FEATURES,
 	VDUSE_GET_FEATURES,
 	VDUSE_SET_STATUS,
@@ -38,6 +40,11 @@ struct vduse_vq_ready {
 	__u8 ready;
 };

+struct vduse_vq_state {
+	__u32 index;
+	__u16 avail_idx;
+};
+
 struct vduse_dev_config_data {
 	__u32 offset;
 	__u32 len;
@@ -53,6 +60,7 @@ struct vduse_dev_request {
 		struct vduse_vq_num vq_num; /* virtqueue num */
 		struct vduse_vq_addr vq_addr; /* virtqueue address */
 		struct vduse_vq_ready vq_ready; /* virtqueue ready status */
+		struct vduse_vq_state vq_state; /* virtqueue state */
 		struct vduse_dev_config_data config; /* virtio device config space */
 		__u64 features; /* virtio features */
 		__u8 status; /* device status */
@@ -64,6 +72,7 @@ struct vduse_dev_response {
 	__s32 result; /* the result of request */
 	union {
 		struct vduse_vq_ready vq_ready; /* virtqueue ready status */
+		struct vduse_vq_state vq_state; /* virtqueue state */
 		struct vduse_dev_config_data config; /* virtio device config space */
 		__u64 features; /* virtio features */
 		__u8 status; /* device status */

From patchwork Tue Dec 22 14:52:16 2020
X-Patchwork-Submitter: Yongji Xie
X-Patchwork-Id: 11986897
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 08/13] vdpa: Introduce process_iotlb_msg() in 
vdpa_config_ops
Date: Tue, 22 Dec 2020 22:52:16 +0800
Message-Id: <20201222145221.711-9-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

This patch introduces a new method in the vdpa_config_ops to support
processing the raw vhost memory mapping message in the vDPA device
driver.

Signed-off-by: Xie Yongji
---
 drivers/vhost/vdpa.c | 5 ++++-
 include/linux/vdpa.h | 7 +++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 448be7875b6d..ccbb391e38be 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -728,6 +728,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
 	if (r)
 		return r;

+	if (ops->process_iotlb_msg)
+		return ops->process_iotlb_msg(vdpa, msg);
+
 	switch (msg->type) {
 	case VHOST_IOTLB_UPDATE:
 		r = vhost_vdpa_process_iotlb_update(v, msg);
@@ -770,7 +773,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
 	int ret;

 	/* Device want to do DMA by itself */
-	if (ops->set_map || ops->dma_map)
+	if (ops->set_map || ops->dma_map || ops->process_iotlb_msg)
 		return 0;

 	bus = dma_dev->bus;
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 656fe264234e..7bccedf22f4b 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
@@ -172,6 +173,10 @@ struct vdpa_iova_range {
 *				@vdev: vdpa device
 *				Returns the iova range supported by
 *				the device.
+ * @process_iotlb_msg:		Process vhost memory mapping message (optional)
+ *				Only used for VDUSE device now
+ *				@vdev: vdpa device
+ *				@msg: vhost memory mapping message
 * @set_map:			Set device memory mapping (optional)
 *				Needed for device that using device
 *				specific DMA translation (on-chip IOMMU)
@@ -240,6 +245,8 @@ struct vdpa_config_ops {
	struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);

	/* DMA ops */
+	int (*process_iotlb_msg)(struct vdpa_device *vdev,
+				 struct vhost_iotlb_msg *msg);
	int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
	int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
		       u64 pa, u32 perm);

From patchwork Tue Dec 22 14:52:17 2020
X-Patchwork-Submitter: Yongji Xie
X-Patchwork-Id: 11986899
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 09/13] vduse: Add support for processing vhost iotlb message
Date: Tue, 22 Dec 2020 22:52:17 +0800
Message-Id: <20201222145221.711-10-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

To support the vhost-vdpa bus driver, we need a way to share the vhost-vdpa 
backend process's memory with the userspace VDUSE process. This patch
tries to make use of the vhost iotlb message to achieve that. We will
get the shm file from the iotlb message and pass it to the userspace
VDUSE process.

Signed-off-by: Xie Yongji
---
 Documentation/driver-api/vduse.rst |  15 +++-
 drivers/vdpa/vdpa_user/vduse_dev.c | 147 ++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/vduse.h         |  11 +++
 3 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-api/vduse.rst b/Documentation/driver-api/vduse.rst
index 623f7b040ccf..48e4b1ba353f 100644
--- a/Documentation/driver-api/vduse.rst
+++ b/Documentation/driver-api/vduse.rst
@@ -46,13 +46,26 @@ The following types of messages are provided by the VDUSE framework now:

 - VDUSE_GET_CONFIG: Read from device specific configuration space

+- VDUSE_UPDATE_IOTLB: Update the memory mapping in device IOTLB
+
+- VDUSE_INVALIDATE_IOTLB: Invalidate the memory mapping in device IOTLB
+
 Please see include/linux/vdpa.h for details.

-In the data path, VDUSE framework implements a MMU-based on-chip IOMMU
+The data path of the userspace vDPA device is implemented in different ways
+depending on the vdpa bus to which it is attached.
+
+In the virtio-vdpa case, the VDUSE framework implements an MMU-based on-chip IOMMU
 driver which supports mapping the kernel dma buffer to a userspace iova
 region dynamically. The userspace iova region can be created by passing
 the userspace vDPA device fd to mmap(2).

+In the vhost-vdpa case, the dma buffer resides in a userspace memory region
+which will be shared with the VDUSE userspace process via the file
+descriptor in the VDUSE_UPDATE_IOTLB message. The corresponding address
+mapping (IOVA of dma buffer <-> VA of the memory region) is also included
+in this message.
+
 Besides, the eventfd mechanism is used to trigger interrupt callbacks
 and receive virtqueue kicks in userspace. 
The following ioctls on the userspace vDPA device fd are provided to
support that:

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index b974333ed4e9..d24aaacb6008 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -34,6 +34,7 @@
 struct vduse_dev_msg {
 	struct vduse_dev_request req;
+	struct file *iotlb_file;
 	struct vduse_dev_response resp;
 	struct list_head list;
 	wait_queue_head_t waitq;
@@ -325,12 +326,80 @@ static int vduse_dev_set_vq_state(struct vduse_dev *dev,
 	return ret;
 }

+static int vduse_dev_update_iotlb(struct vduse_dev *dev, struct file *file,
+				  u64 offset, u64 iova, u64 size, u8 perm)
+{
+	struct vduse_dev_msg *msg;
+	int ret;
+
+	if (!size)
+		return -EINVAL;
+
+	msg = vduse_dev_new_msg(dev, VDUSE_UPDATE_IOTLB);
+	msg->req.size = sizeof(struct vduse_iotlb);
+	msg->req.iotlb.offset = offset;
+	msg->req.iotlb.iova = iova;
+	msg->req.iotlb.size = size;
+	msg->req.iotlb.perm = perm;
+	msg->req.iotlb.fd = -1;
+	msg->iotlb_file = get_file(file);
+
+	ret = vduse_dev_msg_sync(dev, msg);
+	vduse_dev_msg_put(msg);
+	fput(file);
+
+	return ret;
+}
+
+static int vduse_dev_invalidate_iotlb(struct vduse_dev *dev,
+				      u64 iova, u64 size)
+{
+	struct vduse_dev_msg *msg;
+	int ret;
+
+	if (!size)
+		return -EINVAL;
+
+	msg = vduse_dev_new_msg(dev, VDUSE_INVALIDATE_IOTLB);
+	msg->req.size = sizeof(struct vduse_iotlb);
+	msg->req.iotlb.iova = iova;
+	msg->req.iotlb.size = size;
+
+	ret = vduse_dev_msg_sync(dev, msg);
+	vduse_dev_msg_put(msg);
+
+	return ret;
+}
+
+static unsigned int perm_to_file_flags(u8 perm)
+{
+	unsigned int flags = 0;
+
+	switch (perm) {
+	case VHOST_ACCESS_WO:
+		flags |= O_WRONLY;
+		break;
+	case VHOST_ACCESS_RO:
+		flags |= O_RDONLY;
+		break;
+	case VHOST_ACCESS_RW:
+		flags |= O_RDWR;
+		break;
+	default:
+		WARN(1, "invalid vhost IOTLB permission\n");
+		break;
+	}
+
+	return flags;
+}
+
 static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
 	struct vduse_dev *dev = file->private_data;
 	struct vduse_dev_msg *msg;
-	int size = sizeof(struct vduse_dev_request);
+	unsigned int flags;
+	int fd, size = sizeof(struct vduse_dev_request);
 	ssize_t ret = 0;

 	if (iov_iter_count(to) < size)
@@ -349,6 +418,18 @@ static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		if (ret)
 			return ret;
 	}
+
+	if (msg->req.type == VDUSE_UPDATE_IOTLB && msg->req.iotlb.fd == -1) {
+		flags = perm_to_file_flags(msg->req.iotlb.perm);
+		fd = get_unused_fd_flags(flags);
+		if (fd < 0) {
+			vduse_dev_enqueue_msg(dev, msg, &dev->send_list);
+			return fd;
+		}
+		fd_install(fd, get_file(msg->iotlb_file));
+		msg->req.iotlb.fd = fd;
+	}
+
 	ret = copy_to_iter(&msg->req, size, to);
 	if (ret != size) {
 		vduse_dev_enqueue_msg(dev, msg, &dev->send_list);
@@ -565,6 +646,69 @@ static void vduse_vdpa_set_config(struct vdpa_device *vdpa, unsigned int offset,
 	vduse_dev_set_config(dev, offset, buf, len);
 }

+static void vduse_vdpa_invalidate_iotlb(struct vduse_dev *dev,
+					struct vhost_iotlb_msg *msg)
+{
+	vduse_dev_invalidate_iotlb(dev, msg->iova, msg->size);
+}
+
+static int vduse_vdpa_update_iotlb(struct vduse_dev *dev,
+				   struct vhost_iotlb_msg *msg)
+{
+	u64 uaddr = msg->uaddr;
+	u64 iova = msg->iova;
+	u64 size = msg->size;
+	u64 offset;
+	struct vm_area_struct *vma;
+	int ret;
+
+	while (uaddr < msg->uaddr + msg->size) {
+		vma = find_vma(current->mm, uaddr);
+		ret = -EINVAL;
+		if (!vma)
+			goto err;
+
+		size = min(msg->size, vma->vm_end - uaddr);
+		offset = (vma->vm_pgoff << PAGE_SHIFT) + uaddr - vma->vm_start;
+		if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+			ret = vduse_dev_update_iotlb(dev, vma->vm_file, offset,
+						     iova, size, msg->perm);
+			if (ret)
+				goto err;
+		}
+		iova += size;
+		uaddr += size;
+	}
+	return 0;
+err:
+	vduse_dev_invalidate_iotlb(dev, msg->iova, iova - msg->iova);
+	return ret;
+}
+
+static int vduse_vdpa_process_iotlb_msg(struct vdpa_device *vdpa,
+					struct vhost_iotlb_msg *msg)
+{
+	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+	int ret = 0;
+
+	switch (msg->type) {
+	case VHOST_IOTLB_UPDATE:
+		ret = vduse_vdpa_update_iotlb(dev, msg);
+		break;
+	case VHOST_IOTLB_INVALIDATE:
+		vduse_vdpa_invalidate_iotlb(dev, msg);
+		break;
+	case VHOST_IOTLB_BATCH_BEGIN:
+	case VHOST_IOTLB_BATCH_END:
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
 static void vduse_vdpa_free(struct vdpa_device *vdpa)
 {
 	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
@@ -597,6 +741,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
 	.set_status = vduse_vdpa_set_status,
 	.get_config = vduse_vdpa_get_config,
 	.set_config = vduse_vdpa_set_config,
+	.process_iotlb_msg = vduse_vdpa_process_iotlb_msg,
 	.free = vduse_vdpa_free,
 };

diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index 873305dfd93f..c5080851f140 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -21,6 +21,8 @@ enum vduse_req_type {
 	VDUSE_GET_STATUS,
 	VDUSE_SET_CONFIG,
 	VDUSE_GET_CONFIG,
+	VDUSE_UPDATE_IOTLB,
+	VDUSE_INVALIDATE_IOTLB,
 };

 struct vduse_vq_num {
@@ -51,6 +53,14 @@ struct vduse_dev_config_data {
 	__u8 data[VDUSE_CONFIG_DATA_LEN];
 };

+struct vduse_iotlb {
+	__u32 fd;
+	__u64 offset;
+	__u64 iova;
+	__u64 size;
+	__u8 perm;
+};
+
 struct vduse_dev_request {
 	__u32 type; /* request type */
 	__u32 unique; /* request id */
@@ -62,6 +72,7 @@ struct vduse_dev_request {
 		struct vduse_vq_ready vq_ready; /* virtqueue ready status */
 		struct vduse_vq_state vq_state; /* virtqueue state */
 		struct vduse_dev_config_data config; /* virtio device config space */
+		struct vduse_iotlb iotlb; /* iotlb message */
 		__u64 features; /* virtio features */
 		__u8 status; /* device status */
 	};

From patchwork Tue Dec 22 14:52:18 2020
X-Patchwork-Submitter: Yongji Xie
X-Patchwork-Id: 11986901
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, akpm@linux-foundation.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 10/13] vduse: grab the module's references until there is no vduse device
Date: Tue, 22 Dec 2020 22:52:18 +0800
Message-Id: <20201222145221.711-11-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>
References: <20201222145221.711-1-xieyongji@bytedance.com>

The module should not be unloaded if any vduse device exists. So
increase the module's reference count when creating a vduse device,
and keep the reference until the device is destroyed. 
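The lifetime rule in this patch (each live vduse device pins the module, and unload is only legal once the count drops back to zero) can be sketched with a userspace analogue. The atomic counter below stands in for the kernel's `__module_get()`/`module_put()` pair; the `sketch_*` names are illustrative, not kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogue of the per-device module reference count. */
static atomic_int vduse_dev_refs;

/* Mirrors __module_get(THIS_MODULE) in vduse_create_dev(). */
static void sketch_create_dev(void)
{
	atomic_fetch_add(&vduse_dev_refs, 1);
}

/* Mirrors module_put(THIS_MODULE) in vduse_destroy_dev(). */
static void sketch_destroy_dev(void)
{
	atomic_fetch_sub(&vduse_dev_refs, 1);
}

/* The module may only be unloaded once no device holds a reference. */
static bool sketch_can_unload(void)
{
	return atomic_load(&vduse_dev_refs) == 0;
}
```

The invariant is the same as in the patch: create takes a reference, destroy drops it, and unload is gated on the count being zero.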
Signed-off-by: Xie Yongji
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index d24aaacb6008..c29b24a7e7e9 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1052,6 +1052,7 @@ static int vduse_destroy_dev(u32 id)
 	kfree(dev->vqs);
 	vduse_iova_domain_destroy(dev->domain);
 	vduse_dev_destroy(dev);
+	module_put(THIS_MODULE);
 	return 0;
 }
@@ -1096,6 +1097,7 @@ static int vduse_create_dev(struct vduse_dev_config *config)
 	refcount_inc(&dev->refcnt);
 	list_add(&dev->list, &vduse_devs);
+	__module_get(THIS_MODULE);
 	return fd;
 err_fd:
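The lifetime rule of this patch — the module may only be unloaded once no vduse device holds a reference — can be sketched as a small userspace model. This is illustrative only: the names are mine, and the atomic counter stands in for the module refcount that `__module_get()`/`module_put()` manipulate in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model: module_refs stands in for THIS_MODULE's refcount. */
static atomic_int module_refs;
static atomic_int device_count;

static void model_create_dev(void)
{
	atomic_fetch_add(&device_count, 1);
	atomic_fetch_add(&module_refs, 1);	/* like __module_get(THIS_MODULE) */
}

static void model_destroy_dev(void)
{
	atomic_fetch_sub(&device_count, 1);
	atomic_fetch_sub(&module_refs, 1);	/* like module_put(THIS_MODULE) */
}

/* Unload is only permitted when no device pins the module. */
static bool model_try_unload(void)
{
	return atomic_load(&module_refs) == 0;
}
```

With one device created, `model_try_unload()` fails; after the device is destroyed it succeeds — which is exactly the invariant the two added lines in the diff enforce.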
From: Xie Yongji
Subject: [RFC v2 11/13] vduse/iova_domain: Support reclaiming bounce pages
Date: Tue, 22 Dec 2020 22:52:19 +0800
Message-Id: <20201222145221.711-12-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>

Introduce vduse_domain_reclaim() to reclaim bounce pages when necessary.
Reclaiming is done chunk by chunk, and only iova chunks with no in-flight
users are reclaimed.

Signed-off-by: Xie Yongji
---
 drivers/vdpa/vdpa_user/iova_domain.c | 83 ++++++++++++++++++++++++++++++++++--
 drivers/vdpa/vdpa_user/iova_domain.h | 10 +++++
 2 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index 27022157abc6..c438cc85d33d 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -29,6 +29,8 @@ struct vduse_mmap_vma {
 	struct list_head list;
 };
+struct percpu_counter vduse_total_bounce_pages;
+
 static inline struct page *
 vduse_domain_get_bounce_page(struct vduse_iova_domain *domain,
 			     unsigned long iova)
@@ -48,6 +50,13 @@ vduse_domain_set_bounce_page(struct vduse_iova_domain *domain,
 	unsigned long chunkoff = iova & ~IOVA_CHUNK_MASK;
 	unsigned long pgindex = chunkoff >> PAGE_SHIFT;
+	if (page) {
+		domain->chunks[index].used_bounce_pages++;
+		percpu_counter_inc(&vduse_total_bounce_pages);
+	} else {
+		domain->chunks[index].used_bounce_pages--;
+		percpu_counter_dec(&vduse_total_bounce_pages);
+	}
 	domain->chunks[index].bounce_pages[pgindex] = page;
 }
@@ -175,6 +184,29 @@ void vduse_domain_remove_mapping(struct vduse_iova_domain *domain,
 	}
 }
+static bool vduse_domain_try_unmap(struct vduse_iova_domain *domain,
+				   unsigned long iova, size_t size)
+{
+	struct vduse_mmap_vma *mmap_vma;
+	unsigned long uaddr;
+	bool unmap = true;
+
+	mutex_lock(&domain->vma_lock);
+	list_for_each_entry(mmap_vma, &domain->vma_list, list) {
+		if (!mmap_read_trylock(mmap_vma->vma->vm_mm)) {
+			unmap = false;
+			break;
+		}
+
+		uaddr = iova + mmap_vma->vma->vm_start;
+		zap_page_range(mmap_vma->vma, uaddr, size);
+		mmap_read_unlock(mmap_vma->vma->vm_mm);
+	}
+	mutex_unlock(&domain->vma_lock);
+
+	return unmap;
+}
+
 void vduse_domain_unmap(struct vduse_iova_domain *domain,
 			unsigned long iova, size_t size)
 {
@@ -302,6 +334,32 @@ bool vduse_domain_is_direct_map(struct vduse_iova_domain *domain,
 	return atomic_read(&chunk->map_type) == TYPE_DIRECT_MAP;
 }
+int vduse_domain_reclaim(struct vduse_iova_domain *domain)
+{
+	struct vduse_iova_chunk *chunk;
+	int i, freed = 0;
+
+	for (i = domain->chunk_num - 1; i >= 0; i--) {
+		chunk = &domain->chunks[i];
+		if (!chunk->used_bounce_pages)
+			continue;
+
+		if (atomic_cmpxchg(&chunk->state, 0, INT_MIN) != 0)
+			continue;
+
+		if (!vduse_domain_try_unmap(domain,
+				chunk->start, IOVA_CHUNK_SIZE)) {
+			atomic_sub(INT_MIN, &chunk->state);
+			break;
+		}
+		freed += vduse_domain_free_bounce_pages(domain,
+				chunk->start, IOVA_CHUNK_SIZE);
+		atomic_sub(INT_MIN, &chunk->state);
+	}
+
+	return freed;
+}
+
 unsigned long vduse_domain_alloc_iova(struct vduse_iova_domain *domain,
 				      size_t size, enum iova_map_type type)
 {
@@ -319,10 +377,13 @@ unsigned long vduse_domain_alloc_iova(struct vduse_iova_domain *domain,
 		if (atomic_read(&chunk->map_type) != type)
 			continue;
-		iova = gen_pool_alloc_algo(chunk->pool, size,
+		if (atomic_fetch_inc(&chunk->state) >= 0) {
+			iova = gen_pool_alloc_algo(chunk->pool, size,
 					gen_pool_first_fit_align, &data);
-		if (iova)
-			break;
+			if (iova)
+				break;
+		}
+		atomic_dec(&chunk->state);
 	}
 	return iova;
@@ -335,6 +396,7 @@ void vduse_domain_free_iova(struct vduse_iova_domain *domain,
 	struct vduse_iova_chunk *chunk = &domain->chunks[index];
 	gen_pool_free(chunk->pool, iova, size);
+	atomic_dec(&chunk->state);
 }
 static void vduse_iova_chunk_cleanup(struct vduse_iova_chunk *chunk)
@@ -351,7 +413,8 @@ void vduse_iova_domain_destroy(struct vduse_iova_domain *domain)
 	for (i = 0; i < domain->chunk_num; i++) {
 		chunk = &domain->chunks[i];
-		vduse_domain_free_bounce_pages(domain,
+		if (chunk->used_bounce_pages)
+			vduse_domain_free_bounce_pages(domain,
 				chunk->start, IOVA_CHUNK_SIZE);
 		vduse_iova_chunk_cleanup(chunk);
 	}
@@ -390,8 +453,10 @@ static int vduse_iova_chunk_init(struct vduse_iova_chunk *chunk,
 	if (!chunk->iova_map)
 		goto err;
+	chunk->used_bounce_pages = 0;
 	chunk->start = addr;
 	atomic_set(&chunk->map_type, TYPE_NONE);
+	atomic_set(&chunk->state, 0);
 	return 0;
 err:
@@ -440,3 +505,13 @@ struct vduse_iova_domain *vduse_iova_domain_create(size_t size)
 	return NULL;
 }
+
+int vduse_domain_init(void)
+{
+	return percpu_counter_init(&vduse_total_bounce_pages, 0, GFP_KERNEL);
+}
+
+void vduse_domain_exit(void)
+{
+	percpu_counter_destroy(&vduse_total_bounce_pages);
+}
diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
index fe1816287f5f..6815b00629d2 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.h
+++ b/drivers/vdpa/vdpa_user/iova_domain.h
@@ -31,8 +31,10 @@ struct vduse_iova_chunk {
 	struct gen_pool *pool;
 	struct page **bounce_pages;
 	struct vduse_iova_map **iova_map;
+	int used_bounce_pages;
 	unsigned long start;
 	atomic_t map_type;
+	atomic_t state;
 };
@@ -44,6 +46,8 @@ struct vduse_iova_domain {
 	struct list_head vma_list;
 };
+extern struct percpu_counter vduse_total_bounce_pages;
+
 int vduse_domain_add_vma(struct vduse_iova_domain *domain,
 			 struct vm_area_struct *vma);
@@ -77,6 +81,8 @@ int vduse_domain_bounce_map(struct vduse_iova_domain *domain,
 bool vduse_domain_is_direct_map(struct vduse_iova_domain *domain,
 				unsigned long iova);
+int vduse_domain_reclaim(struct vduse_iova_domain *domain);
+
 unsigned long vduse_domain_alloc_iova(struct vduse_iova_domain *domain,
 				      size_t size, enum iova_map_type type);
@@ -90,4 +96,8 @@ void vduse_iova_domain_destroy(struct vduse_iova_domain *domain);
 struct vduse_iova_domain *vduse_iova_domain_create(size_t size);
+int vduse_domain_init(void);
+
+void vduse_domain_exit(void);
+
 #endif /* _VDUSE_IOVA_DOMAIN_H */
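The `chunk->state` protocol in this patch encodes two things in one counter: positive values count in-flight users of the chunk, while the reclaimer claims exclusive access by cmpxchg-ing 0 to INT_MIN, which locks out new allocators. The protocol can be modeled in userspace C11 atomics (function names are mine; they correspond to the alloc/free/reclaim paths in the diff):

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of chunk->state: >0 = in-flight users, INT_MIN = reclaim lock. */
static atomic_int chunk_state;

/* Allocator side, as in vduse_domain_alloc_iova(): pin the chunk unless
 * reclaim is in progress; the pin is held until the iova is freed. */
static bool chunk_get(void)
{
	if (atomic_fetch_add(&chunk_state, 1) >= 0)
		return true;			/* chunk usable, stays pinned */
	atomic_fetch_sub(&chunk_state, 1);	/* reclaim running, back off */
	return false;
}

/* As in vduse_domain_free_iova(): drop the pin. */
static void chunk_put(void)
{
	atomic_fetch_sub(&chunk_state, 1);
}

/* Reclaimer side, as in vduse_domain_reclaim(): only succeeds when the
 * chunk has no users at all (state == 0). */
static bool chunk_try_lock_for_reclaim(void)
{
	int expected = 0;
	return atomic_compare_exchange_strong(&chunk_state, &expected, INT_MIN);
}

/* Mirrors atomic_sub(INT_MIN, &chunk->state): drop the lock bias while
 * preserving any increments from allocators that raced and backed off. */
static void chunk_unlock_after_reclaim(void)
{
	atomic_fetch_sub(&chunk_state, INT_MIN);
}
```

The key property is mutual exclusion without a lock: while any user pins the chunk the cmpxchg fails, and while the reclaimer holds INT_MIN every `chunk_get()` observes a negative value and backs off.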
From: Xie Yongji
Subject: [RFC v2 12/13] vduse: Add memory shrinker to reclaim bounce pages
Date: Tue, 22 Dec 2020 22:52:20 +0800
Message-Id: <20201222145221.711-13-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>

Add a memory shrinker that reclaims pages used by the bounce buffers, so
that they can be returned to the system under memory pressure.

Signed-off-by: Xie Yongji
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 51 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index c29b24a7e7e9..1bc2e627c476 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1142,6 +1142,43 @@ static long vduse_ioctl(struct file *file, unsigned int cmd,
 	return ret;
 }
+static unsigned long vduse_shrink_scan(struct shrinker *shrinker,
+				       struct shrink_control *sc)
+{
+	unsigned long freed = 0;
+	struct vduse_dev *dev;
+
+	if (!mutex_trylock(&vduse_lock))
+		return SHRINK_STOP;
+
+	list_for_each_entry(dev, &vduse_devs, list) {
+		if (!dev->domain)
+			continue;
+
+		freed = vduse_domain_reclaim(dev->domain);
+		if (!freed)
+			continue;
+
+		list_move_tail(&dev->list, &vduse_devs);
+		break;
+	}
+	mutex_unlock(&vduse_lock);
+
+	return freed ? freed : SHRINK_STOP;
+}
+
+static unsigned long vduse_shrink_count(struct shrinker *shrink,
+					struct shrink_control *sc)
+{
+	return percpu_counter_read_positive(&vduse_total_bounce_pages);
+}
+
+static struct shrinker vduse_bounce_pages_shrinker = {
+	.count_objects = vduse_shrink_count,
+	.scan_objects = vduse_shrink_scan,
+	.seeks = DEFAULT_SEEKS,
+};
+
 static const struct file_operations vduse_fops = {
 	.owner = THIS_MODULE,
 	.unlocked_ioctl = vduse_ioctl,
@@ -1292,12 +1329,24 @@ static int vduse_init(void)
 	if (ret)
 		goto err_irqfd;
+	ret = vduse_domain_init();
+	if (ret)
+		goto err_domain;
+
+	ret = register_shrinker(&vduse_bounce_pages_shrinker);
+	if (ret)
+		goto err_shrinker;
+
 	ret = vduse_parentdev_init();
 	if (ret)
 		goto err_parentdev;
 	return 0;
 err_parentdev:
+	unregister_shrinker(&vduse_bounce_pages_shrinker);
+err_shrinker:
+	vduse_domain_exit();
+err_domain:
 	vduse_virqfd_exit();
 err_irqfd:
 	destroy_workqueue(vduse_vdpa_wq);
@@ -1309,8 +1358,10 @@ module_init(vduse_init);
 static void vduse_exit(void)
 {
+	unregister_shrinker(&vduse_bounce_pages_shrinker);
 	misc_deregister(&vduse_misc);
 	destroy_workqueue(vduse_vdpa_wq);
+	vduse_domain_exit();
 	vduse_virqfd_exit();
 	vduse_parentdev_exit();
 }
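The scan policy in `vduse_shrink_scan()` — reclaim from the first device that yields pages, then rotate it to the tail of the list so subsequent scans start with a different device — can be sketched as a userspace model (the array-based list and names are mine; the kernel code uses `list_move_tail()` on the real device list):

```c
#include <assert.h>

/* Userspace model of the round-robin reclaim policy over three devices. */
#define NDEV 3
static int pages[NDEV];		/* reclaimable bounce pages per device */
static int order[NDEV];		/* device list; index 0 is the list head */

static int model_shrink_scan(void)
{
	int freed = 0;

	for (int i = 0; i < NDEV; i++) {
		int d = order[i];

		if (!pages[d])
			continue;
		freed = pages[d];	/* like vduse_domain_reclaim(dev->domain) */
		pages[d] = 0;
		/* like list_move_tail(): rotate d to the end of the list */
		for (int j = i; j < NDEV - 1; j++)
			order[j] = order[j + 1];
		order[NDEV - 1] = d;
		break;
	}
	return freed;
}
```

Rotating the reclaimed device to the tail spreads reclaim pressure across devices instead of repeatedly draining whichever happens to sit at the head of the list.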
From: Xie Yongji
Subject: [RFC v2 13/13] vduse: Introduce a workqueue for irq injection
Date: Tue, 22 Dec 2020 22:52:21 +0800
Message-Id: <20201222145221.711-14-xieyongji@bytedance.com>
In-Reply-To: <20201222145221.711-1-xieyongji@bytedance.com>

Introduce a dedicated workqueue for irq injection so that its behavior
can be tuned for performance independently of the system workqueue.

Signed-off-by: Xie Yongji
---
 drivers/vdpa/vdpa_user/eventfd.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_user/eventfd.c b/drivers/vdpa/vdpa_user/eventfd.c
index dbffddb08908..caf7d8d68ac0 100644
--- a/drivers/vdpa/vdpa_user/eventfd.c
+++ b/drivers/vdpa/vdpa_user/eventfd.c
@@ -18,6 +18,7 @@
 #include "eventfd.h"
 static struct workqueue_struct *vduse_irqfd_cleanup_wq;
+static struct workqueue_struct *vduse_irq_wq;
 static void vduse_virqfd_shutdown(struct work_struct *work)
 {
@@ -57,7 +58,7 @@ static int vduse_virqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode,
 	__poll_t flags = key_to_poll(key);
 	if (flags & EPOLLIN)
-		schedule_work(&virqfd->inject);
+		queue_work(vduse_irq_wq, &virqfd->inject);
 	if (flags & EPOLLHUP) {
 		spin_lock(&vq->irq_lock);
@@ -165,11 +166,18 @@ int vduse_virqfd_init(void)
 	if (!vduse_irqfd_cleanup_wq)
 		return -ENOMEM;
+	vduse_irq_wq = alloc_workqueue("vduse-irq", WQ_SYSFS | WQ_UNBOUND, 0);
+	if (!vduse_irq_wq) {
+		destroy_workqueue(vduse_irqfd_cleanup_wq);
+		return -ENOMEM;
+	}
+
 	return 0;
 }
 void vduse_virqfd_exit(void)
 {
+	destroy_workqueue(vduse_irq_wq);
 	destroy_workqueue(vduse_irqfd_cleanup_wq);
 }
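The semantics the patch relies on — `queue_work()` hands injection work to a dedicated worker, and `destroy_workqueue()` drains pending work before tearing the queue down — can be sketched as a pthread-based userspace model. All names here are mine; this is a single-worker analogue of a kernel workqueue, not the kernel API.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace sketch of a dedicated work queue: one worker thread drains
 * "irq injection" jobs; destroy flushes pending work before exiting. */
struct model_wq {
	pthread_t worker;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int pending;		/* queued, not yet processed */
	bool stopping;
	int injected;		/* stands in for delivered irqs */
};

static void *model_worker(void *arg)
{
	struct model_wq *wq = arg;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (!wq->pending && !wq->stopping)
			pthread_cond_wait(&wq->cond, &wq->lock);
		if (wq->pending) {
			wq->pending--;
			wq->injected++;	/* the "irq injection" work item */
		} else {
			break;		/* stopping and fully drained */
		}
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static void model_wq_init(struct model_wq *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	wq->pending = 0;
	wq->stopping = false;
	wq->injected = 0;
	pthread_create(&wq->worker, NULL, model_worker, wq);
}

/* like queue_work(vduse_irq_wq, ...) */
static void model_queue_work(struct model_wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->pending++;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/* like destroy_workqueue(): drains queued work, then stops the worker */
static void model_wq_destroy(struct model_wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stopping = true;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	pthread_join(wq->worker, NULL);
}
```

Because the worker only exits once `pending` reaches zero, every queued injection is delivered before destroy returns — the same guarantee the patch depends on when `vduse_virqfd_exit()` destroys `vduse_irq_wq`.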