From patchwork Mon Mar 19 00:36:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Lyle X-Patchwork-Id: 10291457 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 669D0600F6 for ; Mon, 19 Mar 2018 00:36:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 58BCD28EDF for ; Mon, 19 Mar 2018 00:36:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4D52228F56; Mon, 19 Mar 2018 00:36:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A90E928EDF for ; Mon, 19 Mar 2018 00:36:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754596AbeCSAgp (ORCPT ); Sun, 18 Mar 2018 20:36:45 -0400 Received: from mail-pg0-f52.google.com ([74.125.83.52]:37526 "EHLO mail-pg0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754651AbeCSAgm (ORCPT ); Sun, 18 Mar 2018 20:36:42 -0400 Received: by mail-pg0-f52.google.com with SMTP id n11so1538427pgp.4 for ; Sun, 18 Mar 2018 17:36:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lyle-org.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=Cx6ev5v1i8FApZh1B4XfjlatSxFbKtW6O2X1TxgD/Po=; b=ux2Oxp7VhYTJ7Dp6j5+4WtsmEJRit/i5WZrazbaXhw0YTTWFo46D+BAwvwRYghe083 w8aP7hjMOA8LanomaUbVXsmv9ktSKJsw7Ld5FNsGq1Ilw1UVsgODJ9Y0SW+Hun/SLsG6 QLfyVw5U2APZ72jVYn8T5VoZebha/YAVNVPdoQ8zNsTfEtwMFyjdmxz4sgCDCiCP+uZ9 g14mJmf6/bOCWLbflkXCs8qUx/JOmowzdOBZIkcIv624XJ5OPucMNLqdaX169p/7WDzb wlfVDIj3x8RAIpG4D0T+KUmuYbMBowuzVDHqMxAr7G1rDig5zOdmrFRsHduxBVr1STsn 6JOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=Cx6ev5v1i8FApZh1B4XfjlatSxFbKtW6O2X1TxgD/Po=; b=NDDqVfPqyFut5nz3oaszFHgSyQ0cyHzeDc16hGJnHcMT+Vi11gqqjeTcNy08383U/1 GGBc6Th1V8JYNMpFoteOIg09kzld8IzNMX55V14RgpmF5cLkl2ENjsLuJCkFF4+l5eCs DCB/GOFjX/HlQjBseAtb5Tfd+ivdfzjDofpnhOD4YHE7f0Me3OydbxnbfXR+tsGCpvlM YmcD8CI1iE/mpqtOCOT1x+xG/KCFPtyyBUNnL/uCqcLijB+Qy0S1QbNP1+NWMBP5QKso F4EzKLR1SFmvYSXT5+7uuhWnDRA54nVFHOT+u4GkJd4XtokPnYVFVHW4hezBkzqUJ3Dt kxzg== X-Gm-Message-State: AElRT7G4j1M2gBS1eBn8/UxDz2nflnb5xmDRCF8uUX4uaPZZwOfsOU0w 4tRAn0FMSU+3qqTynsZWdaf4zQ== X-Google-Smtp-Source: AG47ELtk0t9Y4clUilcw5xS+cZHI/KEEsNCU//pNS+CgbgHGapclbj+h2XIG0tAMrWzpTLvoG3dOtA== X-Received: by 10.98.73.89 with SMTP id w86mr8563646pfa.227.1521419801801; Sun, 18 Mar 2018 17:36:41 -0700 (PDT) Received: from midnight.lan (2600-6c52-6200-09b7-0000-0000-0000-0d66.dhcp6.chtrptr.net. [2600:6c52:6200:9b7::d66]) by smtp.gmail.com with ESMTPSA id a17sm27674857pfc.122.2018.03.18.17.36.40 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 18 Mar 2018 17:36:41 -0700 (PDT) From: Michael Lyle To: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org Cc: axboe@fb.com, Coly Li , Hannes Reinecke , Huijun Tang Subject: [for-4.17 02/20] bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set Date: Sun, 18 Mar 2018 17:36:15 -0700 Message-Id: <20180319003633.27225-3-mlyle@lyle.org> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20180319003633.27225-1-mlyle@lyle.org> References: <20180319003633.27225-1-mlyle@lyle.org> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Coly Li In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()", cached_dev_get() is called when creating dc->writeback_thread, and cached_dev_put() is called when exiting dc->writeback_thread. This modification works well unless people detach the bcache device manually by 'echo 1 > /sys/block/bcache/bcache/detach' Because this sysfs interface only calls bch_cached_dev_detach() which wakes up dc->writeback_thread but does not stop it. The reason is, before patch "bcache: fix cached_dev->count usage for bch_cache_set_error()", inside bch_writeback_thread(), if cache is not dirty after writeback, cached_dev_put() will be called here. And in cached_dev_make_request() when a new write request makes cache from clean to dirty, cached_dev_get() will be called there. Since we don't operate dc->count in these locations, refcount d->count cannot be dropped after cache becomes clean, and cached_dev_detach_finish() won't be called to detach bcache device. This patch fixes the issue by checking whether BCACHE_DEV_DETACHING is set inside bch_writeback_thread(). If this bit is set and cache is clean (no existing writeback_keys), break the while-loop, call cached_dev_put() and quit the writeback thread. Please note if cache is still dirty, even BCACHE_DEV_DETACHING is set the writeback thread should continue to perform writeback, this is the original design of manually detach. It is safe to do the following check without locking, let me explain why, + if (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) && + (!atomic_read(&dc->has_dirty) || !dc->writeback_running)) { If the kenrel thread does not sleep and continue to run due to conditions are not updated in time on the running CPU core, it just consumes more CPU cycles and has no hurt. This should-sleep-but-run is safe here. We just focus on the should-run-but-sleep condition, which means the writeback thread goes to sleep in mistake while it should continue to run. 1, First of all, no matter the writeback thread is hung or not, kthread_stop() from cached_dev_detach_finish() will wake up it and terminate by making kthread_should_stop() return true. And in normal run time, bit on index BCACHE_DEV_DETACHING is always cleared, the condition !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) is always true and can be ignored as constant value. 2, If one of the following conditions is true, the writeback thread should go to sleep, "!atomic_read(&dc->has_dirty)" or "!dc->writeback_running)" each of them independently controls the writeback thread should sleep or not, let's analyse them one by one. 2.1 condition "!atomic_read(&dc->has_dirty)" If dc->has_dirty is set from 0 to 1 on another CPU core, bcache will call bch_writeback_queue() immediately or call bch_writeback_add() which indirectly calls bch_writeback_queue() too. In bch_writeback_queue(), wake_up_process(dc->writeback_thread) is called. It sets writeback thread's task state to TASK_RUNNING and following an implicit memory barrier, then tries to wake up the writeback thread. In writeback thread, its task state is set to TASK_INTERRUPTIBLE before doing the condition check. If other CPU core sets the TASK_RUNNING state after writeback thread setting TASK_INTERRUPTIBLE, the writeback thread will be scheduled to run very soon because its state is not TASK_INTERRUPTIBLE. If other CPU core sets the TASK_RUNNING state before writeback thread setting TASK_INTERRUPTIBLE, the implict memory barrier of wake_up_process() will make sure modification of dc->has_dirty on other CPU core is updated and observed on the CPU core of writeback thread. Therefore the condition check will correctly be false, and continue writeback code without sleeping. 2.2 condition "!dc->writeback_running)" dc->writeback_running can be changed via sysfs file, every time it is modified, a following bch_writeback_queue() is alwasy called. So the change is always observed on the CPU core of writeback thread. If dc->writeback_running is changed from 0 to 1 on other CPU core, this condition check will observe the modification and allow writeback thread to continue to run without sleeping. Now we can see, even without a locking protection, multiple conditions check is safe here, no deadlock or process hang up will happen. I compose a separte patch because that patch "bcache: fix cached_dev->count usage for bch_cache_set_error()" already gets a "Reviewed-by:" from Hannes Reinecke. Also this fix is not trivial and good for a separate patch. Signed-off-by: Coly Li Reviewed-by: Michael Lyle Cc: Hannes Reinecke Cc: Huijun Tang --- drivers/md/bcache/writeback.c | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index b280c134dd4d..4dbeaaa575bf 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -565,9 +565,15 @@ static int bch_writeback_thread(void *arg) while (!kthread_should_stop()) { down_write(&dc->writeback_lock); set_current_state(TASK_INTERRUPTIBLE); - if (!atomic_read(&dc->has_dirty) || - (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) && - !dc->writeback_running)) { + /* + * If the bache device is detaching, skip here and continue + * to perform writeback. Otherwise, if no dirty data on cache, + * or there is dirty data on cache but writeback is disabled, + * the writeback thread should sleep here and wait for others + * to wake up it. + */ + if (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) && + (!atomic_read(&dc->has_dirty) || !dc->writeback_running)) { up_write(&dc->writeback_lock); if (kthread_should_stop()) { @@ -587,6 +593,14 @@ static int bch_writeback_thread(void *arg) atomic_set(&dc->has_dirty, 0); SET_BDEV_STATE(&dc->sb, BDEV_STATE_CLEAN); bch_write_bdev_super(dc, NULL); + /* + * If bcache device is detaching via sysfs interface, + * writeback thread should stop after there is no dirty + * data on cache. BCACHE_DEV_DETACHING flag is set in + * bch_cached_dev_detach(). + */ + if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) + break; } up_write(&dc->writeback_lock);