[1/1] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()

Message ID 20220528061949.28519-2-colyli@suse.de
State New, archived
Series bcache fix for Linux v5.19 (3rd wave)

Commit Message

Coly Li May 28, 2022, 6:19 a.m. UTC
The kworker routine update_writeback_rate() is scheduled to update the
writeback rate every 5 seconds by default. Before calling
__update_writeback_rate() to do the real job, the kworker routine
should hold the semaphore dc->writeback_lock.

At the same time, the bcache writeback thread routine
bch_writeback_thread() also needs to hold dc->writeback_lock before
flushing dirty data back into the backing device. If the dirty data set
is large, it might take a very long time for bch_writeback_thread() to
scan all dirty buckets and release dc->writeback_lock. In such a case
update_writeback_rate() can be starved long enough that the kernel
reports a soft lockup warning like:
  watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713]

Such a soft lockup is unnecessary: after the writeback thread finishes
its job and releases dc->writeback_lock, the kworker
update_writeback_rate() can simply continue its work and everything is
fine.

This patch avoids the unnecessary soft lockup with the following method
(a standalone sketch of the resulting pattern follows this list):
- Add a new member to struct cached_dev
  - dc->rate_update_retry (0 by default)
- In update_writeback_rate(), call down_read_trylock(&dc->writeback_lock)
  first; if it fails, there is lock contention.
- If dc->rate_update_retry <= BCH_WBRATE_UPDATE_RETRY_MAX (15), don't
  acquire the lock and reschedule the kworker for the next try.
- If dc->rate_update_retry > BCH_WBRATE_UPDATE_RETRY_MAX, stop retrying
  and call down_read(&dc->writeback_lock) to wait for the lock.

With this method, in the worst case update_writeback_rate() may retry
for a bit more than a minute (15 skips at the default 5-second interval
is 75 seconds) before blocking on dc->writeback_lock by calling
down_read(). For a 4TB cache device with 1TB of dirty data, 90%+ of the
unnecessary soft lockup warnings can be avoided.

While update_writeback_rate() is retrying to acquire dc->writeback_lock,
the writeback rate of course cannot be updated. That is fair: when the
kworker is blocked on the lock contention, the writeback rate cannot be
updated either.

This change follows Jens Axboe's suggestion for a clearer and simpler
version.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/bcache.h    |  7 +++++++
 drivers/md/bcache/writeback.c | 31 +++++++++++++++++++++----------
 2 files changed, 28 insertions(+), 10 deletions(-)

Comments

Jens Axboe May 28, 2022, 12:20 p.m. UTC | #1
On 5/28/22 12:19 AM, Coly Li wrote:
> [...]

This looks fine, but it doesn't apply to my current for-5.19/drivers
branch, which the previous ones did. Did you spin this one without the
other patches, perhaps?

One minor thing we might want to change if you're respinning it -
BCH_WBRATE_UPDATE_RETRY_MAX isn't really named for what it does, since
it doesn't retry anything; it simply allows updates to be skipped. Why
not call it BCH_WBRATE_UPDATE_MAX_SKIPS instead? I think that'd better
convey what it does.
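
For reference, that rename would be a one-line change in the bcache.h
hunk below, plus the matching rename at its use site in writeback.c:

-#define BCH_WBRATE_UPDATE_RETRY_MAX	15
+#define BCH_WBRATE_UPDATE_MAX_SKIPS	15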
Coly Li May 28, 2022, 12:22 p.m. UTC | #2
> On May 28, 2022, at 20:20, Jens Axboe <axboe@kernel.dk> wrote:
> 
> On 5/28/22 12:19 AM, Coly Li wrote:
>> [...]
> 
> This looks fine, but it doesn't apply to my current for-5.19/drivers
> branch, which the previous ones did. Did you spin this one without the
> other patches, perhaps?
> 
> One minor thing we might want to change if you're respinning it -
> BCH_WBRATE_UPDATE_RETRY_MAX isn't really named for what it does, since
> it doesn't retry anything; it simply allows updates to be skipped. Why
> not call it BCH_WBRATE_UPDATE_MAX_SKIPS instead? I think that'd better
> convey what it does.

Naming is often a challenge for me. Sure, _MAX_SKIPS is better. I will post another modified version.

Thanks for the suggestion.

Coly Li
Jens Axboe May 28, 2022, 12:23 p.m. UTC | #3
On 5/28/22 6:22 AM, Coly Li wrote:
> 
> 
>> On May 28, 2022, at 20:20, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 5/28/22 12:19 AM, Coly Li wrote:
>>> [...]
>>
>> [...]
>>
>> One minor thing we might want to change if you're respinning it -
>> BCH_WBRATE_UPDATE_RETRY_MAX isn't really named for what it does, since
>> it doesn't retry anything; it simply allows updates to be skipped. Why
>> not call it BCH_WBRATE_UPDATE_MAX_SKIPS instead? I think that'd better
>> convey what it does.
> 
> Naming is often a challenge for me. Sure, _MAX_SKIPS is better. I will
> post another modified version.

It's hard for everyone :-)

Sounds good, I'll get it applied when it shows up.

Patch

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 9ed9c955add7..24735a5e839f 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -395,6 +395,13 @@  struct cached_dev {
 	atomic_t		io_errors;
 	unsigned int		error_limit;
 	unsigned int		offline_seconds;
+
+	/*
+	 * Retry to update writeback_rate if contention happens for
+	 * down_read(dc->writeback_lock) in update_writeback_rate()
+	 */
+#define BCH_WBRATE_UPDATE_RETRY_MAX	15
+	unsigned int		rate_update_retry;
 };
 
 enum alloc_reserve {
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index d138a2d73240..e8df6e5fe012 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -235,19 +235,27 @@  static void update_writeback_rate(struct work_struct *work)
 		return;
 	}
 
-	if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
-		/*
-		 * If the whole cache set is idle, set_at_max_writeback_rate()
-		 * will set writeback rate to a max number. Then it is
-		 * unnecessary to update writeback rate for an idle cache set
-		 * in maximum writeback rate number(s).
-		 */
-		if (!set_at_max_writeback_rate(c, dc)) {
-			down_read(&dc->writeback_lock);
+	/*
+	 * If the whole cache set is idle, set_at_max_writeback_rate()
+	 * will set writeback rate to a max number. Then it is
+	 * unnecessary to update writeback rate for an idle cache set
+	 * in maximum writeback rate number(s).
+	 */
+	if (atomic_read(&dc->has_dirty) && dc->writeback_percent &&
+	    !set_at_max_writeback_rate(c, dc)) {
+		do {
+			if (!down_read_trylock((&dc->writeback_lock))) {
+				dc->rate_update_retry++;
+				if (dc->rate_update_retry <=
+				    BCH_WBRATE_UPDATE_RETRY_MAX)
+					break;
+				down_read(&dc->writeback_lock);
+				dc->rate_update_retry = 0;
+			}
 			__update_writeback_rate(dc);
 			update_gc_after_writeback(c);
 			up_read(&dc->writeback_lock);
-		}
+		} while (0);
 	}
 
 
@@ -1006,6 +1014,9 @@  void bch_cached_dev_writeback_init(struct cached_dev *dc)
 	dc->writeback_rate_fp_term_high = 1000;
 	dc->writeback_rate_i_term_inverse = 10000;
 
+	/* For dc->writeback_lock contention in update_writeback_rate() */
+	dc->rate_update_retry = 0;
+
 	WARN_ON(test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags));
 	INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate);
 }