From patchwork Mon Jan 15 08:38:12 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Neeraj Upadhyay
X-Patchwork-Id: 10163265
X-Patchwork-Delegate: agross@codeaurora.org
From: Neeraj Upadhyay
To: tj@kernel.org, jiangshanlai@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 prsood@codeaurora.org, sramana@codeaurora.org, Neeraj Upadhyay
Subject: [PATCH] workqueue: Handle race between wake up and rebind
Date: Mon, 15 Jan 2018 14:08:12 +0530
Message-Id: <1516005492-4994-1-git-send-email-neeraju@codeaurora.org>
X-Mailer: git-send-email 1.9.1
Sender: linux-arm-msm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-arm-msm@vger.kernel.org

There is a potential race between rebind_workers() and the wakeup of a
worker thread, which can result in a workqueue lockup for a bound
worker pool.

The potential race is as follows:

- The cpu0 pool is a bound worker pool that has been unbound from its
  CPU. A new work item is queued on this pool, which causes its worker
  (kworker/0:0) to be woken up on a CPU other than cpu0, say cpu1.
  workqueue_queue_work
  workqueue_activate_work

- The cpu0 rebind happens:

  rebind_workers()
    Clears POOL_DISASSOCIATED and binds the cpumask of all workers.

- kworker/0:0 gets a chance to run on cpu1; while processing a work
  item, it goes to sleep. However, it does not decrement
  pool->nr_running. This is because the WORKER_REBOUND (NOT_RUNNING)
  flag was cleared when the worker entered worker_thread().

  Worker 0 runs on cpu1
    worker_thread()
      process_one_work()
        wq_worker_sleeping()
          if (worker->flags & WORKER_NOT_RUNNING)
            return NULL;
          if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))

- After this, when kworker/0:0 wakes up, this time on its bound CPU
  (cpu0), it increments pool->nr_running again, so pool->nr_running
  becomes 2.

- When kworker/0:0 enters idle, it decrements pool->nr_running by 1.
  This leaves pool->nr_running = 1 with no workers in the runnable
  state.

- Now no new workers will be woken up, because pool->nr_running is
  non-zero. This results in an indefinite lockup for this pool.

Fix this by deferring the work to another idle worker if the current
worker is not running on its pool's CPU.

Signed-off-by: Neeraj Upadhyay
---
 kernel/workqueue.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 43d18cb..71c0023 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2218,6 +2218,17 @@ static int worker_thread(void *__worker)
 	if (unlikely(!may_start_working(pool)) && manage_workers(worker))
 		goto recheck;
 
+	/* handle the case where, while a bounded pool is unbound,
+	 * its worker is woken up on a target CPU, which is different
+	 * from pool->cpu, but pool is rebound before this worker gets
+	 * chance to run on the target CPU.
+	 */
+	if (WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
+			 raw_smp_processor_id() != pool->cpu)) {
+		wake_up_worker(pool);
+		goto sleep;
+	}
+
 	/*
 	 * ->scheduled list can only be filled while a worker is
 	 * preparing to process a work or actually processing it.