[PATCHv2,0/6] block: fix lock order and remove redundant locking

Message ID: 20250218082908.265283-1-nilay@linux.ibm.com

Message

Nilay Shroff Feb. 18, 2025, 8:28 a.m. UTC
Hi,

After we modeled the queue freeze & enter as a lock for lockdep support
in commit f1be1788a32e ("block: model freeze & enter queue as lock
for supporting lockdep"), we received numerous lockdep splats. One of
those splats[1] reported a potential deadlock due to incorrect lock
ordering between q->sysfs_lock and q->q_usage_counter. Some of the
patches in this series therefore aim to cut the dependency between
q->sysfs_lock and q->q_usage_counter.
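
To illustrate the kind of inversion such a splat describes, here is a
minimal userspace sketch (not kernel code, and not how lockdep is
implemented). It records the order in which two lock classes are taken
and flags the moment the opposite order shows up. The enum values and
record_order() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes; the names are illustrative only. */
enum lock_class { LC_SYSFS_LOCK, LC_Q_USAGE_COUNTER, LC_MAX };

/* seen[a][b] becomes true once b was acquired while a was held. */
static bool seen[LC_MAX][LC_MAX];

/*
 * Record the dependency "held -> next" and report whether the opposite
 * order was observed earlier, i.e. whether two code paths take the
 * same pair of locks in conflicting (ABBA) order.
 */
static bool record_order(enum lock_class held, enum lock_class next)
{
	assert(held != next);	/* same-class nesting is out of scope here */
	seen[held][next] = true;
	return seen[next][held];
}
```

The first path that takes one class while holding the other records the
dependency quietly; a later path that nests them the other way round
makes record_order() return true, which is the situation this series
tries to avoid.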

This series contains six patches.

The 1st patch removes q->sysfs_lock for all sysfs attributes which
don't need it. We identified all sysfs attributes which don't need any
locking, and all such attributes are now grouped in
queue_attr_show/queue_attr_store under the entry->show_nolock/
entry->store_nolock methods.

The 2nd patch acquires q->limits_lock instead of q->sysfs_lock while
reading the set of attributes whose write method is protected by the
atomic limit update APIs, or whose updates could occur under the atomic
limit update APIs, namely queue_limits_start_update() and
queue_limits_commit_update(). All such attributes are now grouped in
queue_attr_show under the entry->show_limit method.

Subsequent patches address the remaining attributes individually. These
attributes require some form of locking other than q->limits_lock or
q->sysfs_lock, and they are grouped in queue_attr_show/queue_attr_store
under the entry->show/entry->store methods.
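
The grouping described above can be pictured with a small userspace
sketch of the show-side dispatch. This is only an illustration under
simplifying assumptions, not the code from the patches: struct toy_queue,
struct toy_attr, the toy lock and the two example show functions are all
hypothetical, and the lock merely tracks whether it is held in a
single-threaded program.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy single-threaded lock: it only tracks whether it is held. */
struct toy_lock { bool held; };
static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

/* Simplified stand-in for the request queue (illustrative fields). */
struct toy_queue {
	struct toy_lock limits_lock;
	unsigned int fixed_value;	/* never changes, needs no lock */
	unsigned int limit_value;	/* updated under limits_lock */
};

/* Simplified stand-in for a queue sysfs entry with three show methods. */
struct toy_attr {
	int (*show_nolock)(struct toy_queue *q, char *buf, size_t len);
	int (*show_limit)(struct toy_queue *q, char *buf, size_t len);
	int (*show)(struct toy_queue *q, char *buf, size_t len);
};

/* Pick the locking that matches the method the attribute provides. */
static int toy_attr_show(struct toy_queue *q, const struct toy_attr *a,
			 char *buf, size_t len)
{
	int ret;

	if (a->show_nolock)		/* group 1: no locking at all */
		return a->show_nolock(q, buf, len);

	if (a->show_limit) {		/* group 2: dispatcher takes limits_lock */
		toy_lock_acquire(&q->limits_lock);
		ret = a->show_limit(q, buf, len);
		toy_lock_release(&q->limits_lock);
		return ret;
	}

	if (a->show)			/* group 3: method does its own locking */
		return a->show(q, buf, len);

	return -1;			/* no method provided */
}

static int fixed_show(struct toy_queue *q, char *buf, size_t len)
{
	return snprintf(buf, len, "%u\n", q->fixed_value);
}

static int limit_show(struct toy_queue *q, char *buf, size_t len)
{
	if (!q->limits_lock.held)	/* dispatcher must hold the lock */
		return -1;
	return snprintf(buf, len, "%u\n", q->limit_value);
}
```

An attribute declared as { .show_nolock = fixed_show } is read without
any lock, while one declared as { .show_limit = limit_show } is read
with limits_lock held by the dispatcher.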

The 3rd patch introduces a new dedicated lock for elevator switch/update
and thus eliminates the dependency of sched update on q->sysfs_lock.

The 4th patch protects sysfs attribute nr_requests using q->elevator_lock
instead of q->sysfs_lock, as the update to q->nr_requests now happens under
q->elevator_lock.

Similarly, the 5th patch protects sysfs attribute wbt_lat_usec using
q->elevator_lock instead of q->sysfs_lock, as updates to the wbt state and
latency now happen under q->elevator_lock.

The 6th patch protects read_ahead_kb using q->limits_lock instead of
q->sysfs_lock, as updates to bdi->ra_pages could happen through the atomic
limit update APIs. Ideally we should have grouped this attribute in
queue_attr_show/queue_attr_store under the entry->show_limit/
entry->store_limit methods. However, we don't use the atomic update helper
APIs queue_limits_start_update() and queue_limits_commit_update() here
because blk_apply_bdi_limits(), which is invoked from
queue_limits_commit_update(), can overwrite the bdi->ra_pages value which
the user actually wants to store using this attribute.
blk_apply_bdi_limits() sets the value of bdi->ra_pages based on the optimal
I/O size (io_opt). So we choose instead to update this attribute value
without using the atomic limit update APIs.
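
The overwrite problem can be shown with a small userspace sketch. All
toy_* names, the page size, the floor value and the exact formula are
assumptions made for this example; the only property taken from the
description above is that the commit step recomputes ra_pages from
io_opt. Locking is left out of the sketch.

```c
/* Assumed constants for this sketch only. */
#define TOY_PAGE_SIZE		4096UL
#define TOY_MIN_RA_PAGES	32UL	/* hypothetical lower bound */

struct toy_limits { unsigned long io_opt; };	/* optimal I/O size, bytes */
struct toy_bdi { unsigned long ra_pages; };

/*
 * Stand-in for the recompute that runs on commit: derive ra_pages
 * from io_opt (formula assumed), ignoring whatever was stored before.
 */
static void toy_apply_bdi_limits(struct toy_bdi *bdi,
				 const struct toy_limits *lim)
{
	unsigned long pages = lim->io_opt * 2 / TOY_PAGE_SIZE;

	bdi->ra_pages = pages > TOY_MIN_RA_PAGES ? pages : TOY_MIN_RA_PAGES;
}

/* Store path that ends in a commit: the user's value does not survive. */
static void toy_store_ra_via_commit(struct toy_bdi *bdi,
				    const struct toy_limits *lim,
				    unsigned long ra_kb)
{
	bdi->ra_pages = ra_kb * 1024 / TOY_PAGE_SIZE;
	toy_apply_bdi_limits(bdi, lim);	/* overwrites the line above */
}

/* Store path that writes the value directly, with no commit step. */
static void toy_store_ra_direct(struct toy_bdi *bdi, unsigned long ra_kb)
{
	bdi->ra_pages = ra_kb * 1024 / TOY_PAGE_SIZE;
}
```

With io_opt set to 1 MiB and a user request of 128 KB, the commit-based
path leaves ra_pages at the value derived from io_opt, while the direct
path keeps the 32 pages the user asked for.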

Please note that the above changes were unit tested against blktests and
quick xfstests with lockdep enabled.

[1] https://lore.kernel.org/all/67637e70.050a0220.3157ee.000c.GAE@google.com/

Nilay Shroff (6):
  blk-sysfs: remove q->sysfs_lock for attributes which don't need it
  blk-sysfs: acquire q->limits_lock while reading attributes
  block: Introduce a dedicated lock for protecting queue elevator
    updates
  blk-sysfs: protect nr_requests update using q->elevator_lock
  blk-sysfs: protect wbt_lat_usec using q->elevator_lock
  blk-sysfs: protect read_ahead_kb using q->limits_lock

---
Changes from v1:
  - Audit all sysfs attributes in block layer and find attributes which
    don't need any locking as well as attributes which need some form of
    locking; then remove locking from queue_attr_store/queue_attr_show and
    move it into the attributes that still need it in some form, followed
    by replacing it with the more suitable locks (hch)

  - Use dedicated lock for elevator switch/update (Ming Lei)

  - Re-arrange patchset to first segregate and group together all
    attributes which don't need locking followed by grouping attributes
    which need some form of locking.

Link to v1: https://lore.kernel.org/all/20250205144506.663819-1-nilay@linux.ibm.com/
---

 block/blk-core.c       |   1 +
 block/blk-mq.c         |  12 +-
 block/blk-settings.c   |   2 +-
 block/blk-sysfs.c      | 324 ++++++++++++++++++++++++++++-------------
 block/elevator.c       |  18 ++-
 block/genhd.c          |   9 +-
 include/linux/blkdev.h |   1 +
 7 files changed, 254 insertions(+), 113 deletions(-)

--
2.47.1

Comments

Christoph Hellwig Feb. 18, 2025, 9:21 a.m. UTC | #1
The mix of blk-sysfs and block in the subject lines is a bit odd.
Maybe just use the block prefix everywhere?

Also q->sysfs_lock is almost unused now and we should probably look
into killing it entirely.

blk_mq_hw_sysfs_show takes it around the ->show methods which
looks pretty useless.  The debugfs code takes it for a few undocumented
things, which are worth digging into and if needed split into a separate
lock.

The concurrent ranges code takes it - I think that is because it does
register a complex sysfs hierarchy from something that could race with
add_disk / del_gendisk.  Damien, can you help with your thoughts?
(sd.c also has a comment referencing it and the removed sysfs_dir_lock
which needs fixing anyway).

blk_register_queue still takes it around a pretty random range of code
including nesting with other locks.  I can't see what it protects
against, but it could use a careful look.

blk_unregister_queue takes it just to clear QUEUE_FLAG_REGISTERED,
which by definition can't really protect against anything.

Also the sysfs_lock in the elevator_queue should probably go away or
be replaced with the new elevator_lock for the non-show/store path
for the same reasons as outlined in this series.

Nilay Shroff Feb. 18, 2025, 12:09 p.m. UTC | #2
On 2/18/25 2:51 PM, Christoph Hellwig wrote:
> The mix of blk-sysfs and block in the subject lines is a bit odd.
> Maybe just use the block prefix everywhere?
> 
Okay, I will update the subject line of each patch to use the "block" prefix.
 
> Also q->sysfs_lock is almost unused now and we should probably look
> into killing it entirely.
> 
Yes, that's the eventual goal and I'll work towards it.

> blk_mq_hw_sysfs_show takes it around the ->show methods which
> looks pretty useless.  The debugfs code takes it for a few undocumented
> things, which are worth digging into and if needed split into a separate
> lock.
> 
> The concurrent ranges code takes it - I think that is because it does
> register a complex sysfs hierarchy from something that could race with
> add_disk / del_gendisk.  Damien, can you help with your thoughts?
> (sd.c also has a comment referencing it and the removed sysfs_dir_lock
> which needs fixing anyway).
> 
> blk_register_queue still takes it around a pretty random range of code
> including nesting with other locks.  I can't see what it protects
> against, but it could use a careful look.
> 
> blk_unregister_queue takes it just to clear QUEUE_FLAG_REGISTERED,
> which by definition can't really protect against anything.
Yes, and since clearing QUEUE_FLAG_REGISTERED uses atomic bitops,
we don't need to acquire q->sysfs_lock here.
> 
> Also the sysfs_lock in the elevator_queue should probably go away or
> be replaced with the new elevator_lock for the non-show/store path
> for the same reasons as outlined in this series.
Yes, agreed.

Once this patch series is approved, I'll work further to eliminate the
remaining uses of q->sysfs_lock in the block layer. In some places we may
be able to simply remove it, and in other places we may replace it with
a more appropriate lock.

Thanks,
--Nilay