From patchwork Tue Oct 27 14:39:43 2020
X-Patchwork-Id: 11860641
From: David Woodhouse
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Daniel Bristot de Oliveira, Paolo Bonzini, kvm@vger.kernel.org
Subject: [PATCH v2 1/2] sched/wait: Add add_wait_queue_priority()
Date: Tue, 27 Oct 2020 14:39:43 +0000
Message-Id: <20201027143944.648769-2-dwmw2@infradead.org>
In-Reply-To: <20201027143944.648769-1-dwmw2@infradead.org>
References: <20201026175325.585623-1-dwmw2@infradead.org>
 <20201027143944.648769-1-dwmw2@infradead.org>

From: David Woodhouse

This allows an exclusive wait_queue_entry to be added at the head of the
queue, instead of the tail as normal. Thus, it gets to consume events first
without allowing non-exclusive waiters to be woken at all.
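As an illustration of the resulting ordering (a sketch, not part of the
change itself; the entries are hypothetical and wqh is assumed to be an
initialized wait_queue_head):

	struct wait_queue_head wqh;		/* assume initialized */
	struct wait_queue_entry p1, p2, n1, n2, x1;

	add_wait_queue_priority(&wqh, &p1);	/* queue: p1 */
	add_wait_queue_priority(&wqh, &p2);	/* queue: p1, p2 */
	add_wait_queue(&wqh, &n1);		/* queue: p1, p2, n1 */
	add_wait_queue(&wqh, &n2);		/* queue: p1, p2, n2, n1 */
	add_wait_queue_exclusive(&wqh, &x1);	/* queue: p1, p2, n2, n1, x1 */

Priority entries stay at the head in FIFO order among themselves, plain
(non-exclusive) entries land just behind the last priority entry, and
exclusive entries still go to the tail.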
The (first) intended use is for KVM IRQFD, which currently has
inconsistent behaviour depending on whether posted interrupts are
available or not. If they are, KVM will bypass the eventfd completely
and deliver interrupts directly to the appropriate vCPU. If not, events
are delivered through the eventfd and userspace will receive them when
polling on the eventfd.

By using add_wait_queue_priority(), KVM will be able to consistently
consume events within the kernel without accidentally exposing them
to userspace when they're supposed to be bypassed. This, in turn, means
that userspace doesn't have to jump through hoops to avoid listening on
the erroneously noisy eventfd and injecting duplicate interrupts.

Signed-off-by: David Woodhouse
Acked-by: Peter Zijlstra (Intel)
---
 include/linux/wait.h | 12 +++++++++++-
 kernel/sched/wait.c  | 17 ++++++++++++++++-
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 27fb99cfeb02..fe10e8570a52 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -22,6 +22,7 @@ int default_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int
 #define WQ_FLAG_BOOKMARK	0x04
 #define WQ_FLAG_CUSTOM		0x08
 #define WQ_FLAG_DONE		0x10
+#define WQ_FLAG_PRIORITY	0x20
 
 /*
  * A single wait-queue entry structure:
@@ -164,11 +165,20 @@ static inline bool wq_has_sleeper(struct wait_queue_head *wq_head)
 
 extern void add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
 extern void add_wait_queue_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
+extern void add_wait_queue_priority(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
 extern void remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
 
 static inline void __add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
 {
-	list_add(&wq_entry->entry, &wq_head->head);
+	struct list_head *head = &wq_head->head;
+	struct wait_queue_entry *wq;
+
+	list_for_each_entry(wq, &wq_head->head, entry) {
+		if (!(wq->flags & WQ_FLAG_PRIORITY))
+			break;
+		head = &wq->entry;
+	}
+	list_add(&wq_entry->entry, head);
 }
 
 /*
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 01f5d3020589..183cc6ae68a6 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -37,6 +37,17 @@ void add_wait_queue_exclusive(struct wait_queue_head *wq_head, struct wait_queue
 }
 EXPORT_SYMBOL(add_wait_queue_exclusive);
 
+void add_wait_queue_priority(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
+{
+	unsigned long flags;
+
+	wq_entry->flags |= WQ_FLAG_EXCLUSIVE | WQ_FLAG_PRIORITY;
+	spin_lock_irqsave(&wq_head->lock, flags);
+	__add_wait_queue(wq_head, wq_entry);
+	spin_unlock_irqrestore(&wq_head->lock, flags);
+}
+EXPORT_SYMBOL_GPL(add_wait_queue_priority);
+
 void remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
 {
 	unsigned long flags;
@@ -57,7 +68,11 @@ EXPORT_SYMBOL(remove_wait_queue);
 /*
  * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
  * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
- * number) then we wake all the non-exclusive tasks and one exclusive task.
+ * number) then we wake that number of exclusive tasks, and potentially all
+ * the non-exclusive tasks. Normally, exclusive tasks will be at the end of
+ * the list and any non-exclusive tasks will be woken first. A priority task
+ * may be at the head of the list, and can consume the event without any other
+ * tasks being woken.
  *
  * There are circumstances in which we can try to wake a task which has already
  * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
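As a minimal usage sketch (hypothetical names, not taken from either patch;
the second patch below does the equivalent for KVM irqfds), a consumer
supplies a wake callback and registers at the head of the queue:

	static struct wait_queue_head wqh;	/* assume initialized */
	static struct wait_queue_entry my_wait;

	static int my_wake(wait_queue_entry_t *wait, unsigned mode,
			   int sync, void *key)
	{
		/* Consume the event in-kernel. A nonzero return tells the
		 * wake loop that an exclusive waiter handled it. */
		return 1;
	}

	init_waitqueue_func_entry(&my_wait, my_wake);
	add_wait_queue_priority(&wqh, &my_wait);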
From patchwork Tue Oct 27 14:39:44 2020
X-Patchwork-Id: 11860639
From: David Woodhouse
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Daniel Bristot de Oliveira, Paolo Bonzini, kvm@vger.kernel.org
Subject: [PATCH v2 2/2] kvm/eventfd: Use priority waitqueue to catch events
 before userspace
Date: Tue, 27 Oct 2020 14:39:44 +0000
Message-Id: <20201027143944.648769-3-dwmw2@infradead.org>
In-Reply-To: <20201027143944.648769-1-dwmw2@infradead.org>
References: <20201026175325.585623-1-dwmw2@infradead.org>
 <20201027143944.648769-1-dwmw2@infradead.org>

From: David Woodhouse

When posted interrupts are available, the IRTE is modified to deliver
interrupts directly to the vCPU and nothing ever reaches userspace, if
it's listening on the same eventfd that feeds the irqfd.

I like that behaviour. Let's do it all the time, even without posted
interrupts. It makes it much easier to handle IRQ remapping invalidation
without having to constantly add/remove the fd from the userspace poll
set. We can just leave userspace polling on it, and the bypass will...
well... bypass it.

Signed-off-by: David Woodhouse
---
 virt/kvm/eventfd.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 87fe94355350..09cbdf2ded70 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -191,6 +191,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 	struct kvm *kvm = irqfd->kvm;
 	unsigned seq;
 	int idx;
+	int ret = 0;
 
 	if (flags & EPOLLIN) {
 		u64 cnt;
@@ -207,6 +208,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 			    false) == -EWOULDBLOCK)
 			schedule_work(&irqfd->inject);
 		srcu_read_unlock(&kvm->irq_srcu, idx);
+		ret = 1;
 	}
 
 	if (flags & EPOLLHUP) {
@@ -230,7 +232,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 		spin_unlock_irqrestore(&kvm->irqfds.lock, iflags);
 	}
 
-	return 0;
+	return ret;
 }
 
 static void
@@ -239,7 +241,7 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 {
 	struct kvm_kernel_irqfd *irqfd =
 		container_of(pt, struct kvm_kernel_irqfd, pt);
 
-	add_wait_queue(wqh, &irqfd->wait);
+	add_wait_queue_priority(wqh, &irqfd->wait);
 }
 
 /* Must be called under irqfds.lock */
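The switch from "return 0" to "return ret" is what lets the priority
waiter actually consume the event: eventfd wakeups pass nr_exclusive == 1,
and the wake loop stops scanning once an exclusive waiter's callback
returns nonzero and that count reaches zero, so userspace pollers further
down the queue are never woken. Roughly, paraphrasing the core scan in
__wake_up_common() with bookmark handling elided (a sketch, not verbatim
kernel source):

	list_for_each_entry_safe(curr, next, &wq_head->head, entry) {
		int ret = curr->func(curr, mode, wake_flags, key);

		if (ret < 0)
			break;
		if (ret && (curr->flags & WQ_FLAG_EXCLUSIVE) &&
		    !--nr_exclusive)
			break;
	}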