From patchwork Wed Mar 18 20:43:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11446047 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5FC68159A for ; Wed, 18 Mar 2020 20:48:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4954620777 for ; Wed, 18 Mar 2020 20:48:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727116AbgCRUrX (ORCPT ); Wed, 18 Mar 2020 16:47:23 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58402 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727065AbgCRUrV (ORCPT ); Wed, 18 Mar 2020 16:47:21 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaH-00065i-6k; Wed, 18 Mar 2020 21:46:37 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id 4402C1040C5; Wed, 18 Mar 2020 21:46:35 +0100 (CET) Message-Id: <20200318204407.497942274@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:03 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Sebastian Andrzej Siewior , Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso , Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait() References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org From: Logan Gunthorpe The call to init_completion() in mrpc_queue_cmd() can theoretically race with the call to poll_wait() in switchtec_dev_poll(). poll() write() switchtec_dev_poll() switchtec_dev_write() poll_wait(&s->comp.wait); mrpc_queue_cmd() init_completion(&s->comp) init_waitqueue_head(&s->comp.wait) To my knowledge, no one has hit this bug. Fix this by using reinit_completion() instead of init_completion() in mrpc_queue_cmd(). Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver") Reported-by: Sebastian Andrzej Siewior Signed-off-by: Logan Gunthorpe Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20200313183608.2646-1-logang@deltatee.com Acked-by: Bjorn Helgaas --- drivers/pci/switch/switchtec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c index a823b4b8ef8a..81dc7ac01381 100644 --- a/drivers/pci/switch/switchtec.c +++ b/drivers/pci/switch/switchtec.c @@ -175,7 +175,7 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser) kref_get(&stuser->kref); stuser->read_len = sizeof(stuser->data); stuser_set_state(stuser, MRPC_QUEUED); - init_completion(&stuser->comp); + reinit_completion(&stuser->comp); list_add_tail(&stuser->list, &stdev->mrpc_queue); mrpc_cmd_submit(stdev); From patchwork Wed Mar 18 20:43:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11446051 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B15FD13B1 for ; Wed, 18 Mar 2020 20:48:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9B29220775 for ; Wed, 18 Mar 2020 20:48:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727027AbgCRUrS (ORCPT ); Wed, 18 Mar 2020 16:47:18 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58397 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726631AbgCRUrR (ORCPT ); Wed, 18 Mar 2020 16:47:17 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaH-00065h-5J; Wed, 18 Mar 2020 21:46:37 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id 79FA31040C6; Wed, 18 Mar 2020 21:46:35 +0100 (CET) Message-Id: <20200318204407.607241357@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:04 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Sebastian Andrzej Siewior , Kurt Schwemmer , Logan Gunthorpe , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso , Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org From: Sebastian Andrzej Siewior The poll callback is using the completion wait queue and sticks it into poll_wait() to wake up pollers after a command has completed. This works to some extent, but cannot provide EPOLLEXCLUSIVE support because the waker side uses complete_all() which unconditionally wakes up all waiters. complete_all() is required because completions internally use exclusive wait and complete() only wakes up one waiter by default. This mixes conceptually different mechanisms and relies on internal implementation details of completions, which in turn puts contraints on changing the internal implementation of completions. Replace it with a regular wait queue and store the state in struct switchtec_user. Signed-off-by: Sebastian Andrzej Siewior Acked-by: Peter Zijlstra (Intel) Cc: Kurt Schwemmer Cc: Logan Gunthorpe Cc: Bjorn Helgaas Cc: linux-pci@vger.kernel.org Acked-by: Bjorn Helgaas Reviewed-by: Logan Gunthorpe --- V2: Reworded changelog. --- drivers/pci/switch/switchtec.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) --- a/drivers/pci/switch/switchtec.c +++ b/drivers/pci/switch/switchtec.c @@ -52,10 +52,11 @@ struct switchtec_user { enum mrpc_state state; - struct completion comp; + wait_queue_head_t cmd_comp; struct kref kref; struct list_head list; + bool cmd_done; u32 cmd; u32 status; u32 return_code; @@ -77,7 +78,7 @@ static struct switchtec_user *stuser_cre stuser->stdev = stdev; kref_init(&stuser->kref); INIT_LIST_HEAD(&stuser->list); - init_completion(&stuser->comp); + init_waitqueue_head(&stuser->cmd_comp); stuser->event_cnt = atomic_read(&stdev->event_cnt); dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser); @@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switcht kref_get(&stuser->kref); stuser->read_len = sizeof(stuser->data); stuser_set_state(stuser, MRPC_QUEUED); - reinit_completion(&stuser->comp); + stuser->cmd_done = false; list_add_tail(&stuser->list, &stdev->mrpc_queue); mrpc_cmd_submit(stdev); @@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct swi memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data, stuser->read_len); out: - complete_all(&stuser->comp); + stuser->cmd_done = true; + wake_up_interruptible(&stuser->cmd_comp); list_del_init(&stuser->list); stuser_put(stuser); stdev->mrpc_busy = 0; @@ -529,10 +531,11 @@ static ssize_t switchtec_dev_read(struct mutex_unlock(&stdev->mrpc_mutex); if (filp->f_flags & O_NONBLOCK) { - if (!try_wait_for_completion(&stuser->comp)) + if (!stuser->cmd_done) return -EAGAIN; } else { - rc = wait_for_completion_interruptible(&stuser->comp); + rc = wait_event_interruptible(stuser->cmd_comp, + stuser->cmd_done); if (rc < 0) return rc; } @@ -580,7 +583,7 @@ static __poll_t switchtec_dev_poll(struc struct switchtec_dev *stdev = stuser->stdev; __poll_t ret = 0; - poll_wait(filp, &stuser->comp.wait, wait); + poll_wait(filp, &stuser->cmd_comp, wait); poll_wait(filp, &stdev->event_wq, wait); if (lock_mutex_and_test_alive(stdev)) @@ -588,7 +591,7 @@ static __poll_t switchtec_dev_poll(struc mutex_unlock(&stdev->mrpc_mutex); - if (try_wait_for_completion(&stuser->comp)) + if (stuser->cmd_done) ret |= EPOLLIN | EPOLLRDNORM; if (stuser->event_cnt != atomic_read(&stdev->event_cnt)) @@ -1272,7 +1275,8 @@ static void stdev_kill(struct switchtec_ /* Wake up and kill any users waiting on an MRPC request */ list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) { - complete_all(&stuser->comp); + stuser->cmd_done = true; + wake_up_interruptible(&stuser->cmd_comp); list_del_init(&stuser->list); stuser_put(stuser); } From patchwork Wed Mar 18 20:43:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11446029 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0752813B1 for ; Wed, 18 Mar 2020 20:48:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E50BA208CA for ; Wed, 18 Mar 2020 20:48:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727391AbgCRUsS (ORCPT ); Wed, 18 Mar 2020 16:48:18 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58410 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726631AbgCRUr3 (ORCPT ); Wed, 18 Mar 2020 16:47:29 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaH-00065m-Ie; Wed, 18 Mar 2020 21:46:37 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id B08301040C7; Wed, 18 Mar 2020 21:46:35 +0100 (CET) Message-Id: <20200318204407.700914073@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:05 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Sebastian Andrzej Siewior , Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso , Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 03/15] usb: gadget: Use completion interface instead of open coding it References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org ep_io() uses a completion on stack and open codes the waiting with: wait_event_interruptible (done.wait, done.done); and wait_event (done.wait, done.done); This waits in non-exclusive mode for complete(), but there is no reason to do so because the completion can only be waited for by the task itself and complete() wakes exactly one exlusive waiter. Replace the open coded implementation with the corresponding wait_for_completion*() functions. No functional change. Reported-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Cc: Felipe Balbi Cc: Greg Kroah-Hartman Cc: linux-usb@vger.kernel.org Reviewed-by: Greg Kroah-Hartman --- V2: New patch to avoid the conversion to swait interfaces later --- drivers/usb/gadget/legacy/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/usb/gadget/legacy/inode.c +++ b/drivers/usb/gadget/legacy/inode.c @@ -344,7 +344,7 @@ ep_io (struct ep_data *epdata, void *buf spin_unlock_irq (&epdata->dev->lock); if (likely (value == 0)) { - value = wait_event_interruptible (done.wait, done.done); + value = wait_for_completion_interruptible(&done); if (value != 0) { spin_lock_irq (&epdata->dev->lock); if (likely (epdata->ep != NULL)) { @@ -353,7 +353,7 @@ ep_io (struct ep_data *epdata, void *buf usb_ep_dequeue (epdata->ep, epdata->req); spin_unlock_irq (&epdata->dev->lock); - wait_event (done.wait, done.done); + wait_for_completion(&done); if (epdata->status == -ECONNRESET) epdata->status = -EINTR; } else { From patchwork Wed Mar 18 20:43:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11445969 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B660114DD for ; Wed, 18 Mar 2020 20:47:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9FE7E2071C for ; Wed, 18 Mar 2020 20:47:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726975AbgCRUrS (ORCPT ); Wed, 18 Mar 2020 16:47:18 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58399 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726663AbgCRUrS (ORCPT ); Wed, 18 Mar 2020 16:47:18 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaJ-000666-4B; Wed, 18 Mar 2020 21:46:39 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id E74691040C9; Wed, 18 Mar 2020 21:46:35 +0100 (CET) Message-Id: <20200318204407.793899611@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:06 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Sebastian Andrzej Siewior , Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso , Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 04/15] orinoco_usb: Use the regular completion interfaces References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org From: Thomas Gleixner The completion usage in this driver is interesting: - it uses a magic complete function which according to the comment was implemented by invoking complete() four times in a row because complete_all() was not exported at that time. - it uses an open coded wait/poll which checks completion:done. Only one wait side (device removal) uses the regular wait_for_completion() interface. The rationale behind this is to prevent that wait_for_completion() consumes completion::done which would prevent that all waiters are woken. This is not necessary with complete_all() as that sets completion::done to UINT_MAX which is left unmodified by the woken waiters. Replace the magic complete function with complete_all() and convert the open coded wait/poll to regular completion interfaces. This changes the wait to exclusive wait mode. But that does not make any difference because the wakers use complete_all() which ignores the exclusive mode. Reported-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Cc: Kalle Valo Cc: "David S. Miller" Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Reviewed-by: Greg Kroah-Hartman --- V2: New patch to avoid conversion to swait functions later. --- drivers/net/wireless/intersil/orinoco/orinoco_usb.c | 21 ++++---------------- 1 file changed, 5 insertions(+), 16 deletions(-) --- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c +++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c @@ -365,17 +365,6 @@ static struct request_context *ezusb_all return ctx; } - -/* Hopefully the real complete_all will soon be exported, in the mean - * while this should work. */ -static inline void ezusb_complete_all(struct completion *comp) -{ - complete(comp); - complete(comp); - complete(comp); - complete(comp); -} - static void ezusb_ctx_complete(struct request_context *ctx) { struct ezusb_priv *upriv = ctx->upriv; @@ -409,7 +398,7 @@ static void ezusb_ctx_complete(struct re netif_wake_queue(dev); } - ezusb_complete_all(&ctx->done); + complete_all(&ctx->done); ezusb_request_context_put(ctx); break; @@ -419,7 +408,7 @@ static void ezusb_ctx_complete(struct re /* This is normal, as all request contexts get flushed * when the device is disconnected */ err("Called, CTX not terminating, but device gone"); - ezusb_complete_all(&ctx->done); + complete_all(&ctx->done); ezusb_request_context_put(ctx); break; } @@ -690,11 +679,11 @@ static void ezusb_req_ctx_wait(struct ez * get the chance to run themselves. So we make sure * that we don't sleep for ever */ int msecs = DEF_TIMEOUT * (1000 / HZ); - while (!ctx->done.done && msecs--) + + while (!try_wait_for_completion(&ctx->done) && msecs--) udelay(1000); } else { - wait_event_interruptible(ctx->done.wait, - ctx->done.done); + wait_for_completion(&ctx->done); } break; default: From patchwork Wed Mar 18 20:43:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11445975 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD9AC14DD for ; Wed, 18 Mar 2020 20:47:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A67F6208E0 for ; Wed, 18 Mar 2020 20:47:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727183AbgCRUre (ORCPT ); Wed, 18 Mar 2020 16:47:34 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58413 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727227AbgCRUrd (ORCPT ); Wed, 18 Mar 2020 16:47:33 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaO-0006CC-Ol; Wed, 18 Mar 2020 21:46:45 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id 2A8F31040CB; Wed, 18 Mar 2020 21:46:36 +0100 (CET) Message-Id: <20200318204407.901266791@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:07 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Sebastian Andrzej Siewior , Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso , Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 05/15] acpi: Remove header dependency References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org In order to avoid future header hell, remove the inclusion of proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a struct. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner --- drivers/platform/x86/dell-smo8800.c | 1 + drivers/platform/x86/wmi.c | 1 + drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c | 1 + include/acpi/acpi_bus.h | 2 +- 4 files changed, 4 insertions(+), 1 deletion(-) --- a/drivers/platform/x86/dell-smo8800.c +++ b/drivers/platform/x86/dell-smo8800.c @@ -16,6 +16,7 @@ #include #include #include +#include struct smo8800_device { u32 irq; /* acpi device irq */ --- a/drivers/platform/x86/wmi.c +++ b/drivers/platform/x86/wmi.c @@ -29,6 +29,7 @@ #include #include #include +#include #include ACPI_MODULE_NAME("wmi"); --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "acpi_thermal_rel.h" static acpi_handle acpi_thermal_rel_handle; --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, c #ifdef CONFIG_ACPI -#include +struct proc_dir_entry; #define ACPI_BUS_FILE_ROOT "acpi" extern struct proc_dir_entry *acpi_root_dir; From patchwork Wed Mar 18 20:43:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11446035 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0A3B414DD for ; Wed, 18 Mar 2020 20:48:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E82DA20777 for ; Wed, 18 Mar 2020 20:48:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727207AbgCRUr3 (ORCPT ); Wed, 18 Mar 2020 16:47:29 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58408 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727168AbgCRUr1 (ORCPT ); Wed, 18 Mar 2020 16:47:27 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaO-0006CB-Vy; Wed, 18 Mar 2020 21:46:45 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id 61F721040CC; Wed, 18 Mar 2020 21:46:36 +0100 (CET) Message-Id: <20200318204408.010461877@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:08 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Oleg Nesterov , Davidlohr Bueso , Sebastian Andrzej Siewior , Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event() References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org Extend rcuwait_wait_event() with a state variable so that it is not restricted to UNINTERRUPTIBLE waits. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Oleg Nesterov Cc: Davidlohr Bueso --- include/linux/rcuwait.h | 12 ++++++++++-- kernel/locking/percpu-rwsem.c | 2 +- 2 files changed, 11 insertions(+), 3 deletions(-) --- a/include/linux/rcuwait.h +++ b/include/linux/rcuwait.h @@ -3,6 +3,7 @@ #define _LINUX_RCUWAIT_H_ #include +#include /* * rcuwait provides a way of blocking and waking up a single @@ -30,23 +31,30 @@ extern void rcuwait_wake_up(struct rcuwa * The caller is responsible for locking around rcuwait_wait_event(), * such that writes to @task are properly serialized. */ -#define rcuwait_wait_event(w, condition) \ +#define rcuwait_wait_event(w, condition, state) \ ({ \ + int __ret = 0; \ rcu_assign_pointer((w)->task, current); \ for (;;) { \ /* \ * Implicit barrier (A) pairs with (B) in \ * rcuwait_wake_up(). \ */ \ - set_current_state(TASK_UNINTERRUPTIBLE); \ + set_current_state(state); \ if (condition) \ break; \ \ + if (signal_pending_state(state, current)) { \ + __ret = -EINTR; \ + break; \ + } \ + \ schedule(); \ } \ \ WRITE_ONCE((w)->task, NULL); \ __set_current_state(TASK_RUNNING); \ + __ret; \ }) #endif /* _LINUX_RCUWAIT_H_ */ --- a/kernel/locking/percpu-rwsem.c +++ b/kernel/locking/percpu-rwsem.c @@ -162,7 +162,7 @@ void percpu_down_write(struct percpu_rw_ */ /* Wait for all now active readers to complete. */ - rcuwait_wait_event(&sem->writer, readers_active_check(sem)); + rcuwait_wait_event(&sem->writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE); } EXPORT_SYMBOL_GPL(percpu_down_write); From patchwork Wed Mar 18 20:43:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11445999 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 765CA13B1 for ; Wed, 18 Mar 2020 20:48:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5F0C820775 for ; Wed, 18 Mar 2020 20:48:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727356AbgCRUrz (ORCPT ); Wed, 18 Mar 2020 16:47:55 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58412 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727225AbgCRUrc (ORCPT ); Wed, 18 Mar 2020 16:47:32 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaK-000669-59; Wed, 18 Mar 2020 21:46:40 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id 9A0B91040CD; Wed, 18 Mar 2020 21:46:36 +0100 (CET) Message-Id: <20200318204408.102694393@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:09 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org, Sebastian Andrzej Siewior , Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso Subject: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org The PS3 notification interrupt and kthread use a hacked up completion to communicate. Since we're wanting to change the completion implementation and this is abuse anyway, replace it with a simple rcuwait since there is only ever the one waiter. AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg, kthreads cannot receive signals by default and this one doesn't look different. Use TASK_IDLE instead. Signed-off-by: Peter Zijlstra (Intel) Cc: Michael Ellerman Cc: Arnd Bergmann Cc: linuxppc-dev@lists.ozlabs.org --- V2: New patch to avoid the magic completion wait variant --- arch/powerpc/platforms/ps3/device-init.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) --- a/arch/powerpc/platforms/ps3/device-init.c +++ b/arch/powerpc/platforms/ps3/device-init.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include @@ -670,7 +671,8 @@ struct ps3_notification_device { spinlock_t lock; u64 tag; u64 lv1_status; - struct completion done; + struct rcuwait wait; + bool done; }; enum ps3_notify_type { @@ -712,7 +714,8 @@ static irqreturn_t ps3_notification_inte pr_debug("%s:%u: completed, status 0x%llx\n", __func__, __LINE__, status); dev->lv1_status = status; - complete(&dev->done); + dev->done = true; + rcuwait_wake_up(&dev->wait); } spin_unlock(&dev->lock); return IRQ_HANDLED; @@ -725,12 +728,12 @@ static int ps3_notification_read_write(s unsigned long flags; int res; - init_completion(&dev->done); spin_lock_irqsave(&dev->lock, flags); res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar, &dev->tag) : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar, &dev->tag); + dev->done = false; spin_unlock_irqrestore(&dev->lock, flags); if (res) { pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res); @@ -738,14 +741,10 @@ static int ps3_notification_read_write(s } pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op); - res = wait_event_interruptible(dev->done.wait, - dev->done.done || kthread_should_stop()); + rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE); + if (kthread_should_stop()) res = -EINTR; - if (res) { - pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op); - return res; - } if (dev->lv1_status) { pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__, From patchwork Wed Mar 18 20:43:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11446019 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E6E1613B1 for ; Wed, 18 Mar 2020 20:48:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BEF5A208D6 for ; Wed, 18 Mar 2020 20:48:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727217AbgCRUra (ORCPT ); Wed, 18 Mar 2020 16:47:30 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58409 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727183AbgCRUr2 (ORCPT ); Wed, 18 Mar 2020 16:47:28 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaM-00066P-DH; Wed, 18 Mar 2020 21:46:42 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id D1C191040D0; Wed, 18 Mar 2020 21:46:36 +0100 (CET) Message-Id: <20200318204408.211530902@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:10 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Sebastian Andrzej Siewior , Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso , Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org From: Thomas Gleixner The kernel provides a variety of locking primitives. The nesting of these lock types and the implications of them on RT enabled kernels is nowhere documented. Add initial documentation. Signed-off-by: Thomas Gleixner Reviewed-by: Joel Fernandes (Google) --- V2: Addressed review comments from Randy --- Documentation/locking/index.rst | 1 Documentation/locking/locktypes.rst | 298 ++++++++++++++++++++++++++++++++++++ 2 files changed, 299 insertions(+) create mode 100644 Documentation/locking/locktypes.rst --- a/Documentation/locking/index.rst +++ b/Documentation/locking/index.rst @@ -7,6 +7,7 @@ locking .. toctree:: :maxdepth: 1 + locktypes lockdep-design lockstat locktorture --- /dev/null +++ b/Documentation/locking/locktypes.rst @@ -0,0 +1,298 @@ +.. _kernel_hacking_locktypes: + +========================== +Lock types and their rules +========================== + +Introduction +============ + +The kernel provides a variety of locking primitives which can be divided +into two categories: + + - Sleeping locks + - Spinning locks + +This document describes the lock types at least at the conceptual level and +provides rules for nesting of lock types also under the aspect of PREEMPT_RT. + +Lock categories +=============== + +Sleeping locks +-------------- + +Sleeping locks can only be acquired in preemptible task context. + +Some of the implementations allow try_lock() attempts from other contexts, +but that has to be really evaluated carefully including the question +whether the unlock can be done from that context safely as well. + +Note that some lock types change their implementation details when +debugging is enabled, so this should be really only considered if there is +no other option. + +Sleeping lock types: + + - mutex + - rt_mutex + - semaphore + - rw_semaphore + - ww_mutex + - percpu_rw_semaphore + +On a PREEMPT_RT enabled kernel the following lock types are converted to +sleeping locks: + + - spinlock_t + - rwlock_t + +Spinning locks +-------------- + + - raw_spinlock_t + - bit spinlocks + +On a non PREEMPT_RT enabled kernel the following lock types are spinning +locks as well: + + - spinlock_t + - rwlock_t + +Spinning locks implicitly disable preemption and the lock / unlock functions +can have suffixes which apply further protections: + + =================== ==================================================== + _bh() Disable / enable bottom halves (soft interrupts) + _irq() Disable / enable interrupts + _irqsave/restore() Save and disable / restore interrupt disabled state + =================== ==================================================== + + +rtmutex +======= + +RT-mutexes are mutexes with support for priority inheritance (PI). + +PI has limitations on non PREEMPT_RT enabled kernels due to preemption and +interrupt disabled sections. + +On a PREEMPT_RT enabled kernel most of these sections are fully +preemptible. This is possible because PREEMPT_RT forces most executions +into task context, especially interrupt handlers and soft interrupts, which +allows to substitute spinlock_t and rwlock_t with RT-mutex based +implementations. + + +raw_spinlock_t and spinlock_t +============================= + +raw_spinlock_t +-------------- + +raw_spinlock_t is a strict spinning lock implementation regardless of the +kernel configuration including PREEMPT_RT enabled kernels. + +raw_spinlock_t is to be used only in real critical core code, low level +interrupt handling and places where protecting (hardware) state is required +to be safe against preemption and eventually interrupts. + +Another reason to use raw_spinlock_t is when the critical section is tiny +to avoid the overhead of spinlock_t on a PREEMPT_RT enabled kernel in the +contended case. + +spinlock_t +---------- + +The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT. + +On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t +and has exactly the same semantics. + +spinlock_t and PREEMPT_RT +------------------------- + +On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate +implementation based on rt_mutex which changes the semantics: + + - Preemption is not disabled + + - The hard interrupt related suffixes for spin_lock / spin_unlock + operations (_irq, _irqsave / _irqrestore) do not affect the CPUs + interrupt disabled state + + - The soft interrupt related suffix (_bh()) is still disabling the + execution of soft interrupts, but contrary to a non PREEMPT_RT enabled + kernel, which utilizes the preemption count, this is achieved by a per + CPU bottom half locking mechanism. + +All other semantics of spinlock_t are preserved: + + - Migration of tasks which hold a spinlock_t is prevented. On a non + PREEMPT_RT enabled kernel this is implicit due to preemption disable. + PREEMPT_RT has a separate mechanism to achieve this. This ensures that + pointers to per CPU variables stay valid even if the task is preempted. + + - Task state preservation. The task state is not affected when a lock is + contended and the task has to schedule out and wait for the lock to + become available. The lock wake up restores the task state unless there + was a regular (not lock related) wake up on the task. This ensures that + the task state rules are always correct independent of the kernel + configuration. + +rwlock_t +======== + +rwlock_t is a multiple readers and single writer lock mechanism. + +On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning +lock and the suffix rules of spinlock_t apply accordingly. The +implementation is fair and prevents writer starvation. + +rwlock_t and PREEMPT_RT +----------------------- + +On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate +implementation based on rt_mutex which changes the semantics: + + - Same changes as for spinlock_t + + - The implementation is not fair and can cause writer starvation under + certain circumstances. The reason for this is that a writer cannot grant + its priority to multiple readers. Readers which are blocked on a writer + fully support the priority inheritance protocol. + + +PREEMPT_RT caveats +================== + +spinlock_t and rwlock_t +----------------------- + +The substitution of spinlock_t and rwlock_t on PREEMPT_RT enabled kernels +with RT-mutex based implementations has a few implications. + +On a non PREEMPT_RT enabled kernel the following code construct is +perfectly fine:: + + local_irq_disable(); + spin_lock(&lock); + +and fully equivalent to:: + + spin_lock_irq(&lock); + +Same applies to rwlock_t and the _irqsave() suffix variant. + +On a PREEMPT_RT enabled kernel this breaks because the RT-mutex +substitution expects a fully preemptible context. + +The preferred solution is to use :c:func:`spin_lock_irq()` or +:c:func:`spin_lock_irqsave()` and their unlock counterparts. + +PREEMPT_RT also offers a local_lock mechanism to substitute the +local_irq_disable/save() constructs in cases where a separation of the +interrupt disabling and the locking is really unavoidable. This should be +restricted to very rare cases. + + +raw_spinlock_t +-------------- + +Locking of a raw_spinlock_t disables preemption and eventually interrupts. +Therefore code inside the critical region has to be careful to avoid calls +into code which takes a regular spinlock_t or rwlock_t. A prime example is +memory allocation. + +On a non PREEMPT_RT enabled kernel the following code construct is +perfectly fine code:: + + raw_spin_lock(&lock); + p = kmalloc(sizeof(*p), GFP_ATOMIC); + +On a PREEMPT_RT enabled kernel this breaks because the memory allocator is +fully preemptible and therefore does not support allocations from truly +atomic contexts. + +Contrary to that the following code construct is perfectly fine on +PREEMPT_RT as spin_lock() does not disable preemption:: + + spin_lock(&lock); + p = kmalloc(sizeof(*p), GFP_ATOMIC); + +Most places which use GFP_ATOMIC allocations are safe on PREEMPT_RT as the +execution is forced into thread context and the lock substitution is +ensuring preemptibility. + + +bit spinlocks +------------- + +Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily +substituted by an RT-mutex based implementation for obvious reasons. + +The semantics of bit spinlocks are preserved on a PREEMPT_RT enabled kernel +and the caveats vs. raw_spinlock_t apply. + +Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but +this requires conditional (#ifdef'ed) code changes at the usage side while +the spinlock_t substitution is simply done by the compiler and the +conditionals are restricted to header files and core implementation of the +locking primitives and the usage sites do not require any changes. + + +Lock type nesting rules +======================= + +The most basic rules are: + + - Lock types of the same lock category (sleeping, spinning) can nest + arbitrarily as long as they respect the general lock ordering rules to + prevent deadlocks. + + - Sleeping lock types cannot nest inside spinning lock types. + + - Spinning lock types can nest inside sleeping lock types. + +These rules apply in general independent of CONFIG_PREEMPT_RT. + +As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from +spinning to sleeping this has obviously restrictions how they can nest with +raw_spinlock_t. + +This results in the following nest ordering: + + 1) Sleeping locks + 2) spinlock_t and rwlock_t + 3) raw_spinlock_t and bit spinlocks + +Lockdep is aware of these constraints to ensure that they are respected. + + +Owner semantics +=============== + +Most lock types in the Linux kernel have strict owner semantics, i.e. the +context (task) which acquires a lock has to release it. + +There are two exceptions: + + - semaphores + - rwsems + +semaphores have no strict owner semantics for historical reasons. They are +often used for both serialization and waiting purposes. That's generally +discouraged and should be replaced by separate serialization and wait +mechanisms. + +rwsems have grown interfaces which allow non owner release for special +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT +substitutes all locking primitives except semaphores with RT-mutex based +implementations to provide priority inheritance for all lock types except +the truly spinning ones. Priority inheritance on ownerless locks is +obviously impossible. + +For now the rwsem non-owner release excludes code which utilizes it from +being used on PREEMPT_RT enabled kernels. In same cases this can be +mitigated by disabling portions of the code, in other cases the complete +functionality has to be disabled until a workable solution has been found. From patchwork Wed Mar 18 20:43:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11445979 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7ABE0159A for ; Wed, 18 Mar 2020 20:47:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 59917208D6 for ; Wed, 18 Mar 2020 20:47:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727314AbgCRUri (ORCPT ); Wed, 18 Mar 2020 16:47:38 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58414 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727290AbgCRUrh (ORCPT ); Wed, 18 Mar 2020 16:47:37 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaO-0006BC-A3; Wed, 18 Mar 2020 21:46:44 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id 13D7C1040D1; Wed, 18 Mar 2020 21:46:37 +0100 (CET) Message-Id: <20200318204408.319891308@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:11 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Sebastian Andrzej Siewior , Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso , Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 09/15] timekeeping: Split jiffies seqlock References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org From: Thomas Gleixner seqlock consists of a sequence counter and a spinlock_t which is used to serialize the writers. spinlock_t is substituted by a "sleeping" spinlock on PREEMPT_RT enabled kernels which breaks the usage in the timekeeping code as the writers are executed in hard interrupt and therefore non-preemptible context even on PREEMPT_RT. The spinlock in seqlock cannot be unconditionally replaced by a raw_spinlock_t as many seqlock users have nesting spinlock sections or other code which is not suitable to run in truly atomic context on RT. Instead of providing a raw_seqlock API for a single use case, open code the seqlock for the jiffies use case and implement it with a raw_spinlock_t and a sequence counter. Signed-off-by: Thomas Gleixner --- kernel/time/jiffies.c | 7 ++++--- kernel/time/tick-common.c | 10 ++++++---- kernel/time/tick-sched.c | 19 ++++++++++++------- kernel/time/timekeeping.c | 6 ++++-- kernel/time/timekeeping.h | 3 ++- 5 files changed, 28 insertions(+), 17 deletions(-) --- a/kernel/time/jiffies.c +++ b/kernel/time/jiffies.c @@ -58,7 +58,8 @@ static struct clocksource clocksource_ji .max_cycles = 10, }; -__cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock); +__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock); +__cacheline_aligned_in_smp seqcount_t jiffies_seq; #if (BITS_PER_LONG < 64) u64 get_jiffies_64(void) @@ -67,9 +68,9 @@ u64 get_jiffies_64(void) u64 ret; do { - seq = read_seqbegin(&jiffies_lock); + seq = read_seqcount_begin(&jiffies_seq); ret = jiffies_64; - } while (read_seqretry(&jiffies_lock, seq)); + } while (read_seqcount_retry(&jiffies_seq, seq)); return ret; } EXPORT_SYMBOL(get_jiffies_64); --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -84,13 +84,15 @@ int tick_is_oneshot_available(void) static void tick_periodic(int cpu) { if (tick_do_timer_cpu == cpu) { - write_seqlock(&jiffies_lock); + raw_spin_lock(&jiffies_lock); + write_seqcount_begin(&jiffies_seq); /* Keep track of the next tick event */ tick_next_period = ktime_add(tick_next_period, tick_period); do_timer(1); - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); update_wall_time(); } @@ -162,9 +164,9 @@ void tick_setup_periodic(struct clock_ev ktime_t next; do { - seq = read_seqbegin(&jiffies_lock); + seq = read_seqcount_begin(&jiffies_seq); next = tick_next_period; - } while (read_seqretry(&jiffies_lock, seq)); + } while (read_seqcount_retry(&jiffies_seq, seq)); clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT); --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -65,7 +65,8 @@ static void tick_do_update_jiffies64(kti return; /* Reevaluate with jiffies_lock held */ - write_seqlock(&jiffies_lock); + raw_spin_lock(&jiffies_lock); + write_seqcount_begin(&jiffies_seq); delta = ktime_sub(now, last_jiffies_update); if (delta >= tick_period) { @@ -91,10 +92,12 @@ static void tick_do_update_jiffies64(kti /* Keep the tick_next_period variable up to date */ tick_next_period = ktime_add(last_jiffies_update, tick_period); } else { - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); return; } - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); update_wall_time(); } @@ -105,12 +108,14 @@ static ktime_t tick_init_jiffy_update(vo { ktime_t period; - write_seqlock(&jiffies_lock); + raw_spin_lock(&jiffies_lock); + write_seqcount_begin(&jiffies_seq); /* Did we start the jiffies update yet ? */ if (last_jiffies_update == 0) last_jiffies_update = tick_next_period; period = last_jiffies_update; - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); return period; } @@ -676,10 +681,10 @@ static ktime_t tick_nohz_next_event(stru /* Read jiffies and the time when jiffies were updated last */ do { - seq = read_seqbegin(&jiffies_lock); + seq = read_seqcount_begin(&jiffies_seq); basemono = last_jiffies_update; basejiff = jiffies; - } while (read_seqretry(&jiffies_lock, seq)); + } while (read_seqcount_retry(&jiffies_seq, seq)); ts->last_jiffies = basejiff; ts->timer_expires_base = basemono; --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -2397,8 +2397,10 @@ EXPORT_SYMBOL(hardpps); */ void xtime_update(unsigned long ticks) { - write_seqlock(&jiffies_lock); + raw_spin_lock(&jiffies_lock); + write_seqcount_begin(&jiffies_seq); do_timer(ticks); - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); update_wall_time(); } --- a/kernel/time/timekeeping.h +++ b/kernel/time/timekeeping.h @@ -25,7 +25,8 @@ static inline void sched_clock_resume(vo extern void do_timer(unsigned long ticks); extern void update_wall_time(void); -extern seqlock_t jiffies_lock; +extern raw_spinlock_t jiffies_lock; +extern seqcount_t jiffies_seq; #define CS_NAME_LEN 32 From patchwork Wed Mar 18 20:43:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11446041 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 803D114DD for ; Wed, 18 Mar 2020 20:48:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6AA5F2071C for ; Wed, 18 Mar 2020 20:48:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727160AbgCRUrZ (ORCPT ); Wed, 18 Mar 2020 16:47:25 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58406 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726631AbgCRUrY (ORCPT ); Wed, 18 Mar 2020 16:47:24 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaR-0006Fi-0w; Wed, 18 Mar 2020 21:46:47 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id 49D23101161; Wed, 18 Mar 2020 21:46:37 +0100 (CET) Message-Id: <20200318204408.428468767@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:12 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Sebastian Andrzej Siewior , Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso , Michael Ellerman , Arnd Bergmann , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 10/15] sched/swait: Prepare usage in completions References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org From: Thomas Gleixner As a preparation to use simple wait queues for completions: - Provide swake_up_all_locked() to support complete_all() - Make __prepare_to_swait() public available This is done to enable the usage of complete() within truly atomic contexts on a PREEMPT_RT enabled kernel. Signed-off-by: Thomas Gleixner --- V2: Add comment to swake_up_all_locked() --- kernel/sched/sched.h | 3 +++ kernel/sched/swait.c | 15 ++++++++++++++- 2 files changed, 17 insertions(+), 1 deletion(-) --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2492,3 +2492,6 @@ static inline bool is_per_cpu_kthread(st return true; } #endif + +void swake_up_all_locked(struct swait_queue_head *q); +void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); --- a/kernel/sched/swait.c +++ b/kernel/sched/swait.c @@ -32,6 +32,19 @@ void swake_up_locked(struct swait_queue_ } EXPORT_SYMBOL(swake_up_locked); +/* + * Wake up all waiters. This is an interface which is solely exposed for + * completions and not for general usage. + * + * It is intentionally different from swake_up_all() to allow usage from + * hard interrupt context and interrupt disabled regions. + */ +void swake_up_all_locked(struct swait_queue_head *q) +{ + while (!list_empty(&q->task_list)) + swake_up_locked(q); +} + void swake_up_one(struct swait_queue_head *q) { unsigned long flags; @@ -69,7 +82,7 @@ void swake_up_all(struct swait_queue_hea } EXPORT_SYMBOL(swake_up_all); -static void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) +void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) { wait->task = current; if (list_empty(&wait->task_list)) From patchwork Wed Mar 18 20:43:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11446009 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EEF14159A for ; Wed, 18 Mar 2020 20:48:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CFE21208D5 for ; Wed, 18 Mar 2020 20:48:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727249AbgCRUrc (ORCPT ); Wed, 18 Mar 2020 16:47:32 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:58411 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727191AbgCRUrb (ORCPT ); Wed, 18 Mar 2020 16:47:31 -0400 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jEfaZ-0006IW-8z; Wed, 18 Mar 2020 21:46:55 +0100 Received: from nanos.tec.linutronix.de (localhost [IPv6:::1]) by nanos.tec.linutronix.de (Postfix) with ESMTP id 800E91040C6; Wed, 18 Mar 2020 21:46:37 +0100 (CET) Message-Id: <20200318204408.521507446@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 18 Mar 2020 21:43:13 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Linus Torvalds , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Randy Dunlap , Arnd Bergmann , Sebastian Andrzej Siewior , Logan Gunthorpe , Kurt Schwemmer , Bjorn Helgaas , linux-pci@vger.kernel.org, Felipe Balbi , Greg Kroah-Hartman , linux-usb@vger.kernel.org, Kalle Valo , "David S. Miller" , linux-wireless@vger.kernel.org, netdev@vger.kernel.org, Oleg Nesterov , Davidlohr Bueso , Michael Ellerman , linuxppc-dev@lists.ozlabs.org Subject: [patch V2 11/15] completion: Use simple wait queues References: <20200318204302.693307984@linutronix.de> MIME-Version: 1.0 X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org From: Thomas Gleixner completion uses a wait_queue_head_t to enqueue waiters. wait_queue_head_t contains a spinlock_t to protect the list of waiters which excludes it from being used in truly atomic context on a PREEMPT_RT enabled kernel. The spinlock in the wait queue head cannot be replaced by a raw_spinlock because: - wait queues can have custom wakeup callbacks, which acquire other spinlock_t locks and have potentially long execution times - wake_up() walks an unbounded number of list entries during the wake up and may wake an unbounded number of waiters. For simplicity and performance reasons complete() should be usable on PREEMPT_RT enabled kernels. completions do not use custom wakeup callbacks and are usually single waiter, except for a few corner cases. Replace the wait queue in the completion with a simple wait queue (swait), which uses a raw_spinlock_t for protecting the waiter list and therefore is safe to use inside truly atomic regions on PREEMPT_RT. There is no semantical or functional change: - completions use the exclusive wait mode which is what swait provides - complete() wakes one exclusive waiter - complete_all() wakes all waiters while holding the lock which protects the wait queue against newly incoming waiters. The conversion to swait preserves this behaviour. complete_all() might cause unbound latencies with a large number of waiters being woken at once, but most complete_all() usage sites are either in testing or initialization code or have only a really small number of concurrent waiters which for now does not cause a latency problem. Keep it simple for now. The fixup of the warning check in the USB gadget driver is just a straight forward conversion of the lockless waiter check from one waitqueue type to the other. Signed-off-by: Thomas Gleixner Cc: Arnd Bergmann Reviewed-by: Joel Fernandes (Google) Reviewed-by: Greg Kroah-Hartman Reviewed-by: Davidlohr Bueso --- V2: Split out the orinoco and usb gadget parts and amended change log --- drivers/usb/gadget/function/f_fs.c | 2 +- include/linux/completion.h | 8 ++++---- kernel/sched/completion.c | 36 +++++++++++++++++++----------------- 3 files changed, 24 insertions(+), 22 deletions(-) --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -1703,7 +1703,7 @@ static void ffs_data_put(struct ffs_data pr_info("%s(): freeing\n", __func__); ffs_data_clear(ffs); BUG_ON(waitqueue_active(&ffs->ev.waitq) || - waitqueue_active(&ffs->ep0req_completion.wait) || + swait_active(&ffs->ep0req_completion.wait) || waitqueue_active(&ffs->wait)); destroy_workqueue(ffs->io_completion_wq); kfree(ffs->dev_name); --- a/include/linux/completion.h +++ b/include/linux/completion.h @@ -9,7 +9,7 @@ * See kernel/sched/completion.c for details. */ -#include +#include /* * struct completion - structure used to maintain state for a "completion" @@ -25,7 +25,7 @@ */ struct completion { unsigned int done; - wait_queue_head_t wait; + struct swait_queue_head wait; }; #define init_completion_map(x, m) __init_completion(x) @@ -34,7 +34,7 @@ static inline void complete_acquire(stru static inline void complete_release(struct completion *x) {} #define COMPLETION_INITIALIZER(work) \ - { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) } + { 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) } #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \ (*({ init_completion_map(&(work), &(map)); &(work); })) @@ -85,7 +85,7 @@ static inline void complete_release(stru static inline void __init_completion(struct completion *x) { x->done = 0; - init_waitqueue_head(&x->wait); + init_swait_queue_head(&x->wait); } /** --- a/kernel/sched/completion.c +++ b/kernel/sched/completion.c @@ -29,12 +29,12 @@ void complete(struct completion *x) { unsigned long flags; - spin_lock_irqsave(&x->wait.lock, flags); + raw_spin_lock_irqsave(&x->wait.lock, flags); if (x->done != UINT_MAX) x->done++; - __wake_up_locked(&x->wait, TASK_NORMAL, 1); - spin_unlock_irqrestore(&x->wait.lock, flags); + swake_up_locked(&x->wait); + raw_spin_unlock_irqrestore(&x->wait.lock, flags); } EXPORT_SYMBOL(complete); @@ -58,10 +58,12 @@ void complete_all(struct completion *x) { unsigned long flags; - spin_lock_irqsave(&x->wait.lock, flags); + WARN_ON(irqs_disabled()); + + raw_spin_lock_irqsave(&x->wait.lock, flags); x->done = UINT_MAX; - __wake_up_locked(&x->wait, TASK_NORMAL, 0); - spin_unlock_irqrestore(&x->wait.lock, flags); + swake_up_all_locked(&x->wait); + raw_spin_unlock_irqrestore(&x->wait.lock, flags); } EXPORT_SYMBOL(complete_all); @@ -70,20 +72,20 @@ do_wait_for_common(struct completion *x, long (*action)(long), long timeout, int state) { if (!x->done) { - DECLARE_WAITQUEUE(wait, current); + DECLARE_SWAITQUEUE(wait); - __add_wait_queue_entry_tail_exclusive(&x->wait, &wait); do { if (signal_pending_state(state, current)) { timeout = -ERESTARTSYS; break; } + __prepare_to_swait(&x->wait, &wait); __set_current_state(state); - spin_unlock_irq(&x->wait.lock); + raw_spin_unlock_irq(&x->wait.lock); timeout = action(timeout); - spin_lock_irq(&x->wait.lock); + raw_spin_lock_irq(&x->wait.lock); } while (!x->done && timeout); - __remove_wait_queue(&x->wait, &wait); + __finish_swait(&x->wait, &wait); if (!x->done) return timeout; } @@ -100,9 +102,9 @@ static inline long __sched complete_acquire(x); - spin_lock_irq(&x->wait.lock); + raw_spin_lock_irq(&x->wait.lock); timeout = do_wait_for_common(x, action, timeout, state); - spin_unlock_irq(&x->wait.lock); + raw_spin_unlock_irq(&x->wait.lock); complete_release(x); @@ -291,12 +293,12 @@ bool try_wait_for_completion(struct comp if (!READ_ONCE(x->done)) return false; - spin_lock_irqsave(&x->wait.lock, flags); + raw_spin_lock_irqsave(&x->wait.lock, flags); if (!x->done) ret = false; else if (x->done != UINT_MAX) x->done--; - spin_unlock_irqrestore(&x->wait.lock, flags); + raw_spin_unlock_irqrestore(&x->wait.lock, flags); return ret; } EXPORT_SYMBOL(try_wait_for_completion); @@ -322,8 +324,8 @@ bool completion_done(struct completion * * otherwise we can end up freeing the completion before complete() * is done referencing it. */ - spin_lock_irqsave(&x->wait.lock, flags); - spin_unlock_irqrestore(&x->wait.lock, flags); + raw_spin_lock_irqsave(&x->wait.lock, flags); + raw_spin_unlock_irqrestore(&x->wait.lock, flags); return true; } EXPORT_SYMBOL(completion_done); From patchwork Fri Mar 20 08:55:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 11448613 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 60AC7139A for ; Fri, 20 Mar 2020 08:56:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3F98420773 for ; Fri, 20 Mar 2020 08:56:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726820AbgCTI4g (ORCPT ); Fri, 20 Mar 2020 04:56:36 -0400 Received: from mx2.suse.de ([195.135.220.15]:43204 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726232AbgCTI4f (ORCPT ); Fri, 20 Mar 2020 04:56:35 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id E9FE4AE5C; Fri, 20 Mar 2020 08:56:33 +0000 (UTC) From: Davidlohr Bueso To: tglx@linutronix.de Cc: arnd@arndb.de, balbi@kernel.org, bhelgaas@google.com, bigeasy@linutronix.de, dave@stgolabs.net, davem@davemloft.net, gregkh@linuxfoundation.org, joel@joelfernandes.org, kurt.schwemmer@microsemi.com, kvalo@codeaurora.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-usb@vger.kernel.org, linux-wireless@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, logang@deltatee.com, mingo@kernel.org, mpe@ellerman.id.au, netdev@vger.kernel.org, oleg@redhat.com, paulmck@kernel.org, peterz@infradead.org, rdunlap@infradead.org, rostedt@goodmis.org, torvalds@linux-foundation.org, will@kernel.org, Davidlohr Bueso Subject: [PATCH 16/15] rcuwait: Get rid of stale name comment Date: Fri, 20 Mar 2020 01:55:24 -0700 Message-Id: <20200320085527.23861-1-dave@stgolabs.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200318204302.693307984@linutronix.de> References: <20200318204302.693307984@linutronix.de> Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org The 'trywake' name was renamed to simply 'wake', update the comment. Signed-off-by: Davidlohr Bueso --- kernel/exit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/exit.c b/kernel/exit.c index 0b81b26a872a..6cc6cc485d07 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -243,7 +243,7 @@ void rcuwait_wake_up(struct rcuwait *w) /* * Order condition vs @task, such that everything prior to the load * of @task is visible. This is the condition as to why the user called - * rcuwait_trywake() in the first place. Pairs with set_current_state() + * rcuwait_wake() in the first place. Pairs with set_current_state() * barrier (A) in rcuwait_wait_event(). * * WAIT WAKE From patchwork Fri Mar 20 08:55:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 11448617 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84E6F1392 for ; Fri, 20 Mar 2020 08:56:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5FCF220767 for ; Fri, 20 Mar 2020 08:56:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727002AbgCTI4l (ORCPT ); Fri, 20 Mar 2020 04:56:41 -0400 Received: from mx2.suse.de ([195.135.220.15]:43272 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726232AbgCTI4k (ORCPT ); Fri, 20 Mar 2020 04:56:40 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id E7528AB76; Fri, 20 Mar 2020 08:56:38 +0000 (UTC) From: Davidlohr Bueso To: tglx@linutronix.de Cc: arnd@arndb.de, balbi@kernel.org, bhelgaas@google.com, bigeasy@linutronix.de, dave@stgolabs.net, davem@davemloft.net, gregkh@linuxfoundation.org, joel@joelfernandes.org, kurt.schwemmer@microsemi.com, kvalo@codeaurora.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-usb@vger.kernel.org, linux-wireless@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, logang@deltatee.com, mingo@kernel.org, mpe@ellerman.id.au, netdev@vger.kernel.org, oleg@redhat.com, paulmck@kernel.org, peterz@infradead.org, rdunlap@infradead.org, rostedt@goodmis.org, torvalds@linux-foundation.org, will@kernel.org, Davidlohr Bueso Subject: [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted Date: Fri, 20 Mar 2020 01:55:25 -0700 Message-Id: <20200320085527.23861-2-dave@stgolabs.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200320085527.23861-1-dave@stgolabs.net> References: <20200318204302.693307984@linutronix.de> <20200320085527.23861-1-dave@stgolabs.net> Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org Let the caller know if wake_up_process() was actually called or not; some users can use this information for ad-hoc. Of course returning true does not guarantee that wake_up_process() actually woke anything up. Signed-off-by: Davidlohr Bueso --- include/linux/rcuwait.h | 2 +- kernel/exit.c | 10 ++++++++-- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h index 6e8798458091..3f83b9a12ad3 100644 --- a/include/linux/rcuwait.h +++ b/include/linux/rcuwait.h @@ -24,7 +24,7 @@ static inline void rcuwait_init(struct rcuwait *w) w->task = NULL; } -extern void rcuwait_wake_up(struct rcuwait *w); +extern bool rcuwait_wake_up(struct rcuwait *w); /* * The caller is responsible for locking around rcuwait_wait_event(), diff --git a/kernel/exit.c b/kernel/exit.c index 6cc6cc485d07..b0bb0a8ec4b1 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -234,9 +234,10 @@ void release_task(struct task_struct *p) goto repeat; } -void rcuwait_wake_up(struct rcuwait *w) +bool rcuwait_wake_up(struct rcuwait *w) { struct task_struct *task; + bool ret = false; rcu_read_lock(); @@ -254,10 +255,15 @@ void rcuwait_wake_up(struct rcuwait *w) smp_mb(); /* (B) */ task = rcu_dereference(w->task); - if (task) + if (task) { wake_up_process(task); + ret = true; + } rcu_read_unlock(); + + return ret; } +EXPORT_SYMBOL_GPL(rcuwait_wake_up); /* * Determine if a process group is "orphaned", according to the POSIX From patchwork Fri Mar 20 08:55:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 11448627 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7747E1392 for ; Fri, 20 Mar 2020 08:56:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 53C9D20767 for ; Fri, 20 Mar 2020 08:56:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727052AbgCTI4r (ORCPT ); Fri, 20 Mar 2020 04:56:47 -0400 Received: from mx2.suse.de ([195.135.220.15]:43382 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726232AbgCTI4q (ORCPT ); Fri, 20 Mar 2020 04:56:46 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 21A8FAEE2; Fri, 20 Mar 2020 08:56:44 +0000 (UTC) From: Davidlohr Bueso To: tglx@linutronix.de Cc: arnd@arndb.de, balbi@kernel.org, bhelgaas@google.com, bigeasy@linutronix.de, dave@stgolabs.net, davem@davemloft.net, gregkh@linuxfoundation.org, joel@joelfernandes.org, kurt.schwemmer@microsemi.com, kvalo@codeaurora.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-usb@vger.kernel.org, linux-wireless@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, logang@deltatee.com, mingo@kernel.org, mpe@ellerman.id.au, netdev@vger.kernel.org, oleg@redhat.com, paulmck@kernel.org, peterz@infradead.org, rdunlap@infradead.org, rostedt@goodmis.org, torvalds@linux-foundation.org, will@kernel.org, Paolo Bonzini , Davidlohr Bueso Subject: [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait Date: Fri, 20 Mar 2020 01:55:26 -0700 Message-Id: <20200320085527.23861-3-dave@stgolabs.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200320085527.23861-1-dave@stgolabs.net> References: <20200318204302.693307984@linutronix.de> <20200320085527.23861-1-dave@stgolabs.net> Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org The use of any sort of waitqueue (simple or regular) for wait/waking vcpus has always been an overkill and semantically wrong. Because this is per-vcpu (which is blocked) there is only ever a single waiting vcpu, thus no need for any sort of queue. As such, make use of the rcuwait primitive, with the following considerations: - rcuwait already provides the proper barriers that serialize concurrent waiter and waker. - Task wakeup is done in rcu read critical region, with a stable task pointer. - Because there is no concurrency among waiters, we need not worry about rcuwait_wait_event() calls corrupting the wait->task. As a consequence, this saves the locking done in swait when adding to the queue. The x86-tscdeadline_latency test mentioned in 8577370fb0cb ("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg, latency is reduced by around 15% with this change. Cc: Paolo Bonzini Signed-off-by: Davidlohr Bueso --- Only compiled and tested on x86. arch/powerpc/include/asm/kvm_host.h | 2 +- arch/powerpc/kvm/book3s_hv.c | 10 ++++------ arch/x86/kvm/lapic.c | 2 +- include/linux/kvm_host.h | 10 +++++----- virt/kvm/arm/arch_timer.c | 2 +- virt/kvm/arm/arm.c | 9 +++++---- virt/kvm/async_pf.c | 3 +-- virt/kvm/kvm_main.c | 33 +++++++++++++-------------------- 8 files changed, 31 insertions(+), 40 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 6e8b8ffd06ad..e2b4a1e3fb7d 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -752,7 +752,7 @@ struct kvm_vcpu_arch { u8 irq_pending; /* Used by XIVE to signal pending guest irqs */ u32 last_inst; - struct swait_queue_head *wqp; + struct rcuwait *waitp; struct kvmppc_vcore *vcore; int ret; int trap; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 2cefd071b848..c7cbc4bd06e9 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -231,13 +231,11 @@ static bool kvmppc_ipi_thread(int cpu) static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) { int cpu; - struct swait_queue_head *wqp; + struct rcuwait *wait; - wqp = kvm_arch_vcpu_wq(vcpu); - if (swq_has_sleeper(wqp)) { - swake_up_one(wqp); + wait = kvm_arch_vcpu_get_wait(vcpu); + if (rcuwait_wake_up(wait)) ++vcpu->stat.halt_wakeup; - } cpu = READ_ONCE(vcpu->arch.thread_cpu); if (cpu >= 0 && kvmppc_ipi_thread(cpu)) @@ -4274,7 +4272,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu) } user_vrsave = mfspr(SPRN_VRSAVE); - vcpu->arch.wqp = &vcpu->arch.vcore->wq; + vcpu->arch.waitp = &vcpu->arch.vcore->wait; vcpu->arch.pgdir = kvm->mm->pgd; vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST; diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e3099c642fec..a4420c26dfbc 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1815,7 +1815,7 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu) /* If the preempt notifier has already run, it also called apic_timer_expired */ if (!apic->lapic_timer.hv_timer_in_use) goto out; - WARN_ON(swait_active(&vcpu->wq)); + WARN_ON(rcu_dereference(vcpu->wait.task)); cancel_hv_timer(apic); apic_timer_expired(apic); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bcb9b2ac0791..b5694429aede 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -23,7 +23,7 @@ #include #include #include -#include +#include #include #include #include @@ -277,7 +277,7 @@ struct kvm_vcpu { struct mutex mutex; struct kvm_run *run; - struct swait_queue_head wq; + struct rcuwait wait; struct pid __rcu *pid; int sigset_active; sigset_t sigset; @@ -952,12 +952,12 @@ static inline bool kvm_arch_has_assigned_device(struct kvm *kvm) } #endif -static inline struct swait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) +static inline struct rcuwait *kvm_arch_vcpu_get_wait(struct kvm_vcpu *vcpu) { #ifdef __KVM_HAVE_ARCH_WQP - return vcpu->arch.wqp; + return vcpu->arch.wait; #else - return &vcpu->wq; + return &vcpu->wait; #endif } diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c index 0d9438e9de2a..4be71cb58691 100644 --- a/virt/kvm/arm/arch_timer.c +++ b/virt/kvm/arm/arch_timer.c @@ -593,7 +593,7 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu) if (map.emul_ptimer) soft_timer_cancel(&map.emul_ptimer->hrtimer); - if (swait_active(kvm_arch_vcpu_wq(vcpu))) + if (rcu_dereference(kvm_arch_vpu_get_wait(vcpu)) != NULL) kvm_timer_blocking(vcpu); /* diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index eda7b624eab8..4a704866e9b6 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -579,16 +579,17 @@ void kvm_arm_resume_guest(struct kvm *kvm) kvm_for_each_vcpu(i, vcpu, kvm) { vcpu->arch.pause = false; - swake_up_one(kvm_arch_vcpu_wq(vcpu)); + rcuwait_wake_up(kvm_arch_vcpu_get_wait(vcpu)); } } static void vcpu_req_sleep(struct kvm_vcpu *vcpu) { - struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu); + struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu); - swait_event_interruptible_exclusive(*wq, ((!vcpu->arch.power_off) && - (!vcpu->arch.pause))); + rcuwait_wait_event(*wait, + (!vcpu->arch.power_off) && (!vcpu->arch.pause), + TASK_INTERRUPTIBLE); if (vcpu->arch.power_off || vcpu->arch.pause) { /* Awaken to handle a signal, request we sleep again later. */ diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c index 15e5b037f92d..10b533f641a6 100644 --- a/virt/kvm/async_pf.c +++ b/virt/kvm/async_pf.c @@ -80,8 +80,7 @@ static void async_pf_execute(struct work_struct *work) trace_kvm_async_pf_completed(addr, cr2_or_gpa); - if (swq_has_sleeper(&vcpu->wq)) - swake_up_one(&vcpu->wq); + rcuwait_wake_up(&vcpu->wait); mmput(mm); kvm_put_kvm(vcpu->kvm); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 70f03ce0e5c1..6b49dcb321e2 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -343,7 +343,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) vcpu->kvm = kvm; vcpu->vcpu_id = id; vcpu->pid = NULL; - init_swait_queue_head(&vcpu->wq); + rcuwait_init(&vcpu->wait); kvm_async_pf_vcpu_init(vcpu); vcpu->pre_pcpu = -1; @@ -2465,9 +2465,8 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu) void kvm_vcpu_block(struct kvm_vcpu *vcpu) { ktime_t start, cur; - DECLARE_SWAITQUEUE(wait); - bool waited = false; u64 block_ns; + int block_check = -EINTR; kvm_arch_vcpu_blocking(vcpu); @@ -2487,21 +2486,14 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) ++vcpu->stat.halt_poll_invalid; goto out; } + cur = ktime_get(); } while (single_task_running() && ktime_before(cur, stop)); } - for (;;) { - prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); - - if (kvm_vcpu_check_block(vcpu) < 0) - break; - - waited = true; - schedule(); - } - - finish_swait(&vcpu->wq, &wait); + rcuwait_wait_event(&vcpu->wait, + (block_check = kvm_vcpu_check_block(vcpu)) < 0, + TASK_INTERRUPTIBLE); cur = ktime_get(); out: kvm_arch_vcpu_unblocking(vcpu); @@ -2525,18 +2517,18 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) } } - trace_kvm_vcpu_wakeup(block_ns, waited, vcpu_valid_wakeup(vcpu)); + trace_kvm_vcpu_wakeup(block_ns, block_check < 0 ? false : true, + vcpu_valid_wakeup(vcpu)); kvm_arch_vcpu_block_finish(vcpu); } EXPORT_SYMBOL_GPL(kvm_vcpu_block); bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu) { - struct swait_queue_head *wqp; + struct rcuwait *wait; - wqp = kvm_arch_vcpu_wq(vcpu); - if (swq_has_sleeper(wqp)) { - swake_up_one(wqp); + wait = kvm_arch_vcpu_get_wait(vcpu); + if (rcuwait_wake_up(wait)) { WRITE_ONCE(vcpu->ready, true); ++vcpu->stat.halt_wakeup; return true; @@ -2678,7 +2670,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode) continue; if (vcpu == me) continue; - if (swait_active(&vcpu->wq) && !vcpu_dy_runnable(vcpu)) + if (rcu_dereference(vcpu->wait.task) && + !vcpu_dy_runnable(vcpu)) continue; if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode && !kvm_arch_vcpu_in_kernel(vcpu)) From patchwork Fri Mar 20 08:55:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 11448623 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7135C139A for ; Fri, 20 Mar 2020 08:56:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5204E20767 for ; Fri, 20 Mar 2020 08:56:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727120AbgCTI4v (ORCPT ); Fri, 20 Mar 2020 04:56:51 -0400 Received: from mx2.suse.de ([195.135.220.15]:43474 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726232AbgCTI4u (ORCPT ); Fri, 20 Mar 2020 04:56:50 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 18286AE9A; Fri, 20 Mar 2020 08:56:49 +0000 (UTC) From: Davidlohr Bueso To: tglx@linutronix.de Cc: arnd@arndb.de, balbi@kernel.org, bhelgaas@google.com, bigeasy@linutronix.de, dave@stgolabs.net, davem@davemloft.net, gregkh@linuxfoundation.org, joel@joelfernandes.org, kurt.schwemmer@microsemi.com, kvalo@codeaurora.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-usb@vger.kernel.org, linux-wireless@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, logang@deltatee.com, mingo@kernel.org, mpe@ellerman.id.au, netdev@vger.kernel.org, oleg@redhat.com, paulmck@kernel.org, peterz@infradead.org, rdunlap@infradead.org, rostedt@goodmis.org, torvalds@linux-foundation.org, will@kernel.org, Davidlohr Bueso Subject: [PATCH 19/15] sched/swait: Reword some of the main description Date: Fri, 20 Mar 2020 01:55:27 -0700 Message-Id: <20200320085527.23861-4-dave@stgolabs.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200320085527.23861-1-dave@stgolabs.net> References: <20200318204302.693307984@linutronix.de> <20200320085527.23861-1-dave@stgolabs.net> Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org With both the increased use of swait and kvm no longer using it, we can reword some of the comments. While removing Linus' valid rant, I've also cared to explicitly mention that swait is very different than regular wait. In addition it is mentioned against using swait in favor of the regular flavor. Signed-off-by: Davidlohr Bueso --- include/linux/swait.h | 23 +++++------------------ 1 file changed, 5 insertions(+), 18 deletions(-) diff --git a/include/linux/swait.h b/include/linux/swait.h index 73e06e9986d4..6e5b5d0e64fd 100644 --- a/include/linux/swait.h +++ b/include/linux/swait.h @@ -9,23 +9,10 @@ #include /* - * BROKEN wait-queues. - * - * These "simple" wait-queues are broken garbage, and should never be - * used. The comments below claim that they are "similar" to regular - * wait-queues, but the semantics are actually completely different, and - * every single user we have ever had has been buggy (or pointless). - * - * A "swake_up_one()" only wakes up _one_ waiter, which is not at all what - * "wake_up()" does, and has led to problems. In other cases, it has - * been fine, because there's only ever one waiter (kvm), but in that - * case gthe whole "simple" wait-queue is just pointless to begin with, - * since there is no "queue". Use "wake_up_process()" with a direct - * pointer instead. - * - * While these are very similar to regular wait queues (wait.h) the most - * important difference is that the simple waitqueue allows for deterministic - * behaviour -- IOW it has strictly bounded IRQ and lock hold times. + * Simple waitqueues are semantically very different to regular wait queues + * (wait.h). The most important difference is that the simple waitqueue allows + * for deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold + * times. * * Mainly, this is accomplished by two things. Firstly not allowing swake_up_all * from IRQ disabled, and dropping the lock upon every wakeup, giving a higher @@ -39,7 +26,7 @@ * sleeper state. * * - the !exclusive mode; because that leads to O(n) wakeups, everything is - * exclusive. + * exclusive. As such swait_wake_up_one will only ever awake _one_ waiter. * * - custom wake callback functions; because you cannot give any guarantees * about random code. This also allows swait to be used in RT, such that