Message ID | 1505760831-7747-2-git-send-email-longman@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Taking a look at this it seems like using a lock in struct block_device isn't the right thing to do anyway - all the action is on fields in struct blk_trace, so having a lock inside that would make a lot more sense. It would also help to document what exactly we're actually protecting.
On 09/18/2017 08:01 PM, Christoph Hellwig wrote: > Taking a look at this it seems like using a lock in struct block_device > isn't the right thing to do anyway - all the action is on fields in > struct blk_trace, so having a lock inside that would make a lot more > sense. > > It would also help to document what exactly we're actually protecting. I think I documented in the patch that the lock has to protect changes in the blktrace structure as well as the allocation and destruction of it. Because of that, it can't be put inside the blktrace structure. The original code use the bd_mutex of the block_device structure. I just change the code to use another bd_fsfreeze_mutex in the same structure. In an earlier patch version, I used a global blktrace mutex. This was deemed to be not scalable enough and so I now use the bd_fsfreeze_mutex instead. Cheers, Longman
On Tue, Sep 19, 2017 at 08:49:12AM -0400, Waiman Long wrote: > On 09/18/2017 08:01 PM, Christoph Hellwig wrote: > > Taking a look at this it seems like using a lock in struct block_device > > isn't the right thing to do anyway - all the action is on fields in > > struct blk_trace, so having a lock inside that would make a lot more > > sense. > > > > It would also help to document what exactly we're actually protecting. > > I think I documented in the patch that the lock has to protect changes > in the blktrace structure as well as the allocation and destruction of > it. Because of that, it can't be put inside the blktrace structure. The > original code use the bd_mutex of the block_device structure. I just > change the code to use another bd_fsfreeze_mutex in the same structure. Either way it has absolutely nothing to do with struct block_device, given that struct blk_trace hangs off the request_queue. Reusing a mutex just because it happens to live in a structure also generally is a bad idea if the protected data is in no way related.
On 09/19/2017 10:38 AM, Christoph Hellwig wrote: > On Tue, Sep 19, 2017 at 08:49:12AM -0400, Waiman Long wrote: >> On 09/18/2017 08:01 PM, Christoph Hellwig wrote: >>> Taking a look at this it seems like using a lock in struct block_device >>> isn't the right thing to do anyway - all the action is on fields in >>> struct blk_trace, so having a lock inside that would make a lot more >>> sense. >>> >>> It would also help to document what exactly we're actually protecting. >> I think I documented in the patch that the lock has to protect changes >> in the blktrace structure as well as the allocation and destruction of >> it. Because of that, it can't be put inside the blktrace structure. The >> original code use the bd_mutex of the block_device structure. I just >> change the code to use another bd_fsfreeze_mutex in the same structure. > Either way it has absolutely nothing to do with struct block_device, > given that struct blk_trace hangs off the request_queue. > > Reusing a mutex just because it happens to live in a structure also > generally is a bad idea if the protected data is in no way related. I was trying not to add a new mutex to a structure just for blktrace as it is an optional feature that is enabled only if the CONFIG_BLK_DEV_IO_TRACE config option is defined and it will only need to be taken occasionally. As filesystem freeze looks orthogonal to blktrace and the mutex also looks to be used sparingly, I think it is a good match to overload it to control blktrace as well. I could modify the patch to use a mutex in the request_queue structure. The current sysfs_lock mutex has about 74 references. So I am not totally sure if it is safe to reuse. So the only option is to add something like #ifdef CONFIG_BLK_DEV_IO_TRACE struct mutex blktrace_mutex; #endif to the request_queue structure. That structure is large enough that adding a mutex won't increase the size much percentage-wise. I would like to keep the current patch as is as I don't see any problem with it. However, I can revise the patch as discussed above if you guys prefer that alternative. Cheers, Longman
On Tue, Sep 19, 2017 at 11:58:34AM -0400, Waiman Long wrote: > I was trying not to add a new mutex to a structure just for blktrace as > it is an optional feature that is enabled only if the > CONFIG_BLK_DEV_IO_TRACE config option is defined and it will only need > to be taken occasionally. So? Make the lock optional, too. > As filesystem freeze looks orthogonal to blktrace and the mutex also > looks to be used sparingly, I think it is a good match to overload it to > control blktrace as well. If the functionally is orthogonal that is a reason not to share a lock, not to the contrary. > I could modify the patch to use a mutex in the request_queue structure. > The current sysfs_lock mutex has about 74 references. So I am not > totally sure if it is safe to reuse. So the only option is to add > something like > > #ifdef CONFIG_BLK_DEV_IO_TRACE > struct mutex blktrace_mutex; > #endif > > to the request_queue structure. That structure is large enough that > adding a mutex won't increase the size much percentage-wise. Call it blk_trace mutex and move it right next to the blk_trace structure: ifdef CONFIG_BLK_DEV_IO_TRACE struct blk_trace *blk_trace; struct mutex blk_trace_mutex; #endif which makes it completely obvious to any read what you are protecting with it.
On Tue, 19 Sep 2017 13:41:35 -0700 Christoph Hellwig <hch@infradead.org> wrote: > Call it blk_trace mutex and move it right next to the blk_trace > structure: > > ifdef CONFIG_BLK_DEV_IO_TRACE > struct blk_trace *blk_trace; > struct mutex blk_trace_mutex; > #endif > > which makes it completely obvious to any read what you are protecting > with it. As a 1000ft away bystander, this appears to be the most logical solution. -- Steve
diff --git a/include/linux/fs.h b/include/linux/fs.h index 339e737..330b572 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -448,7 +448,7 @@ struct block_device { /* The counter of freeze processes */ int bd_fsfreeze_count; - /* Mutex for freeze */ + /* Mutex for freeze and blktrace */ struct mutex bd_fsfreeze_mutex; } __randomize_layout; diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 2a685b4..7cd5d1d 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -648,6 +648,20 @@ int blk_trace_startstop(struct request_queue *q, int start) } EXPORT_SYMBOL_GPL(blk_trace_startstop); +/* + * When reading or writing the blktrace sysfs files, the references to the + * opened sysfs or device files should prevent the underlying block device + * from being removed. So no further delete protection is really needed. + * + * Protection from multiple readers and writers accessing blktrace data + * concurrently is still required. The bd_mutex was used for this purpose. + * That could lead to deadlock with concurrent block device deletion and + * sysfs access. Instead, the block device bd_fsfreeze_mutex is now + * overloaded for blktrace data protection. Like freeze/thaw, blktrace + * functionality is not frequently used. There is no point in adding + * one more mutex to the block_device structure just for blktrace. + */ + /** * blk_trace_ioctl: - handle the ioctls associated with tracing * @bdev: the block device @@ -665,7 +679,7 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg) if (!q) return -ENXIO; - mutex_lock(&bdev->bd_mutex); + mutex_lock(&bdev->bd_fsfreeze_mutex); switch (cmd) { case BLKTRACESETUP: @@ -691,7 +705,7 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg) break; } - mutex_unlock(&bdev->bd_mutex); + mutex_unlock(&bdev->bd_fsfreeze_mutex); return ret; } @@ -1727,7 +1741,7 @@ static ssize_t sysfs_blk_trace_attr_show(struct device *dev, if (q == NULL) goto out_bdput; - mutex_lock(&bdev->bd_mutex); + mutex_lock(&bdev->bd_fsfreeze_mutex); if (attr == &dev_attr_enable) { ret = sprintf(buf, "%u\n", !!q->blk_trace); @@ -1746,7 +1760,7 @@ static ssize_t sysfs_blk_trace_attr_show(struct device *dev, ret = sprintf(buf, "%llu\n", q->blk_trace->end_lba); out_unlock_bdev: - mutex_unlock(&bdev->bd_mutex); + mutex_unlock(&bdev->bd_fsfreeze_mutex); out_bdput: bdput(bdev); out: @@ -1788,7 +1802,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, if (q == NULL) goto out_bdput; - mutex_lock(&bdev->bd_mutex); + mutex_lock(&bdev->bd_fsfreeze_mutex); if (attr == &dev_attr_enable) { if (value) @@ -1814,7 +1828,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, } out_unlock_bdev: - mutex_unlock(&bdev->bd_mutex); + mutex_unlock(&bdev->bd_fsfreeze_mutex); out_bdput: bdput(bdev); out:
The lockdep code had reported the following unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#228); lock(&bdev->bd_mutex/1); lock(s_active#228); lock(&bdev->bd_mutex); *** DEADLOCK *** The deadlock may happen when one task (CPU1) is trying to delete a partition in a block device and another task (CPU0) is accessing tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that partition. The s_active isn't an actual lock. It is a reference count (kn->count) on the sysfs (kernfs) file. Removal of a sysfs file, however, require a wait until all the references are gone. The reference count is treated like a rwsem using lockdep instrumentation code. The fact that a thread is in the sysfs callback method or in the ioctl call means there is a reference to the opended sysfs or device file. That should prevent the underlying block structure from being removed. Instead of using bd_mutex in the block_device structure, the other bd_fsfreeze_mutex mutex in the block_device structure is now overloaded to protect against concurrent blktrace data access in the blktrace.c file. There is no point in adding one more mutex to the block_device structure just for blktrace. Signed-off-by: Waiman Long <longman@redhat.com> --- include/linux/fs.h | 2 +- kernel/trace/blktrace.c | 26 ++++++++++++++++++++------ 2 files changed, 21 insertions(+), 7 deletions(-)