diff mbox

Fix a bdi reregistration race, v2

Message ID 564F9AFF.3050605@sandisk.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche Nov. 20, 2015, 10:13 p.m. UTC
Unregister and reregister BDI devices in the proper order. This patch
avoids that the following kernel warning can be triggered during
SCSI device reregistration:

WARNING: CPU: 7 PID: 203 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x68/0x80()
sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:32'
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[<ffffffff814ff5a4>] dump_stack+0x4c/0x65
[<ffffffff810746ba>] warn_slowpath_common+0x8a/0xc0
[<ffffffff81074736>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81237ca8>] sysfs_warn_dup+0x68/0x80
[<ffffffff81237d8e>] sysfs_create_dir_ns+0x7e/0x90
[<ffffffff81291f58>] kobject_add_internal+0xa8/0x320
[<ffffffff812923a0>] kobject_add+0x60/0xb0
[<ffffffff8138c937>] device_add+0x107/0x5e0
[<ffffffff8138d018>] device_create_groups_vargs+0xd8/0x100
[<ffffffff8138d05c>] device_create_vargs+0x1c/0x20
[<ffffffff8117f233>] bdi_register+0x63/0x2a0
[<ffffffff8117f497>] bdi_register_dev+0x27/0x30
[<ffffffff81281549>] add_disk+0x1a9/0x4e0
[<ffffffffa00c5739>] sd_probe_async+0x119/0x1d0 [sd_mod]
[<ffffffff8109a81a>] async_run_entry_fn+0x4a/0x140
[<ffffffff81091078>] process_one_work+0x1d8/0x7c0
[<ffffffff81091774>] worker_thread+0x114/0x460
[<ffffffff81097878>] kthread+0xf8/0x110
[<ffffffff8150801f>] ret_from_fork+0x3f/0x70

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi_sysfs.c        |  2 ++
 include/linux/backing-dev-defs.h |  1 +
 include/linux/backing-dev.h      |  1 +
 mm/backing-dev.c                 | 28 ++++++++++++++++++++++++++--
 4 files changed, 30 insertions(+), 2 deletions(-)

Comments

Aaro Koskinen Nov. 20, 2015, 10:43 p.m. UTC | #1
Hi,

I think you should squash the revert of v1 into this patch, and then
document the crash the original patch caused and how this new patch is
fixing that.

A.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Nov. 20, 2015, 10:55 p.m. UTC | #2
On 11/20/2015 02:44 PM, Aaro Koskinen wrote:
> I think you should squash the revert of v1 into this patch, and then
> document the crash the original patch caused and how this new patch is
> fixing that.

Hello Aaro,

I'd like to know the opinion of the SCSI maintainers about this. It's 
not impossible that they would prefer to submit the revert to Linus 
quickly and only send the reworked fix to Linus during the v4.5 merge 
window.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Nov. 24, 2015, 11:13 p.m. UTC | #3
What sort of re-registration is this? Seems like we should only
release the minor number once the bdi is released.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Nov. 24, 2015, 11:23 p.m. UTC | #4
On 11/24/2015 03:13 PM, Christoph Hellwig wrote:
> What sort of re-registration is this? Seems like we should only
> release the minor number once the bdi is released.

Hello Christoph,

As you most likely know the BDI device name for disks is based on the 
device major and minor number:

$ ls -l /dev/sda
brw-rw---- 1 root disk 8, 0 Nov 24 14:53 /dev/sda
$ ls -l /sys/block/sda/bdi
lrwxrwxrwx 1 root root 0 Nov 24 15:17 /sys/block/sda/bdi -> 
../../../../../../../../virtual/bdi/8:0

So if a driver stops using a (major, minor) number pair and the same 
device number is reused before the bdi device has been released the 
warning mentioned in the patch description at the start of this thread 
is triggered. This patch fixes that race by removing the bdi device from 
sysfs during the __scsi_remove_device() call instead of when the bdi 
device is released.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Nov. 25, 2015, 8:47 a.m. UTC | #5
Hi Bart,

On Tue, Nov 24, 2015 at 03:23:21PM -0800, Bart Van Assche wrote:
> So if a driver stops using a (major, minor) number pair and the same device
> number is reused before the bdi device has been released the warning
> mentioned in the patch description at the start of this thread is triggered.
> This patch fixes that race by removing the bdi device from sysfs during the
> __scsi_remove_device() call instead of when the bdi device is released.

that's why I suggested only releasing the minor number (or rather dev_t)
once we release the BDI, similar to what MD and DM do.

But what I really wanted to ask for is what your reproducer looks like.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Nov. 25, 2015, 2:59 p.m. UTC | #6
On 11/25/15 00:47, Christoph Hellwig wrote:
> But what I really wanted to ask for is what your reproducer looks like.

Hello Christoph,

This race is hard to trigger. I can trigger it by repeatedly removing 
and re-adding SRP SCSI devices. Enabling debug options like SLUB 
debugging and kmemleak helps. I think that is because these debug 
options slow down the SCSI device removal code and thereby increase the 
chance that this race is triggered.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen Dec. 1, 2015, 12:57 a.m. UTC | #7
>>>>> "Bart" == Bart Van Assche <bart.vanassche@sandisk.com> writes:

Bart,

Bart> This race is hard to trigger. I can trigger it by repeatedly
Bart> removing and re-adding SRP SCSI devices. Enabling debug options
Bart> like SLUB debugging and kmemleak helps. I think that is because
Bart> these debug options slow down the SCSI device removal code and
Bart> thereby increase the chance that this race is triggered.

Any updates on this? Your updated patch has no reviews.

Should I just revert the original patch for 4.4?
Bart Van Assche Dec. 1, 2015, 1:18 a.m. UTC | #8
On 11/30/2015 04:57 PM, Martin K. Petersen wrote:
>>>>>> "Bart" == Bart Van Assche <bart.vanassche@sandisk.com> writes:
>
> Bart> This race is hard to trigger. I can trigger it by repeatedly
> Bart> removing and re-adding SRP SCSI devices. Enabling debug options
> Bart> like SLUB debugging and kmemleak helps. I think that is because
> Bart> these debug options slow down the SCSI device removal code and
> Bart> thereby increase the chance that this race is triggered.
>
> Any updates on this? Your updated patch has no reviews.
>
> Should I just revert the original patch for 4.4?

Hello Martin,

Since the original patch caused a regression, please proceed with 
reverting the original patch.

Regarding this patch: is there anyone on the CC-list of this e-mail who 
can review it ?

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Dec. 1, 2015, 7:23 a.m. UTC | #9
On Mon, Nov 30, 2015 at 05:18:50PM -0800, Bart Van Assche wrote:
> Since the original patch caused a regression, please proceed with reverting
> the original patch.
> 
> Regarding this patch: is there anyone on the CC-list of this e-mail who can
> review it ?

I'm not too fond of the approach.  I'd much prefer if SCSI would just
release the dev_t later, similar to DM or MD.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche March 28, 2016, 9:26 p.m. UTC | #10
On 11/30/2015 11:23 PM, Christoph Hellwig wrote:
> On Mon, Nov 30, 2015 at 05:18:50PM -0800, Bart Van Assche wrote:
>> Since the original patch caused a regression, please proceed with reverting
>> the original patch.
>>
>> Regarding this patch: is there anyone on the CC-list of this e-mail who can
>> review it ?
>
> I'm not too fond of the approach.  I'd much prefer if SCSI would just
> release the dev_t later, similar to DM or MD.

(replying to an e-mail of four months ago)

Hello Christoph,

Sorry that it took so long but last week I finally found the time to 
look further into this. I will post a third version of this patch.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index f5ace2b..8d64518 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -12,6 +12,7 @@ 
 #include <linux/blkdev.h>
 #include <linux/device.h>
 #include <linux/pm_runtime.h>
+#include <linux/backing-dev.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
@@ -1110,6 +1111,7 @@  void __scsi_remove_device(struct scsi_device *sdev)
 		device_unregister(&sdev->sdev_dev);
 		transport_remove_device(dev);
 		scsi_dh_remove_device(sdev);
+		bdi_sysfs_del(&sdev->request_queue->backing_dev_info);
 		device_del(dev);
 	} else
 		put_device(&sdev->sdev_dev);
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 1b4d69f..1a42ecb 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -135,6 +135,7 @@  struct bdi_writeback {
 
 struct backing_dev_info {
 	struct list_head bdi_list;
+	bool is_visible;
 	unsigned long ra_pages;	/* max readahead in PAGE_CACHE_SIZE units */
 	unsigned int capabilities; /* Device capabilities */
 	congested_fn *congested_fn; /* Function pointer if device is md/dm */
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index c82794f..9004d90 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -24,6 +24,7 @@  __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
+void bdi_sysfs_del(struct backing_dev_info *bdi);
 void bdi_unregister(struct backing_dev_info *bdi);
 
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8ed2ffd..b56971f 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -774,6 +774,7 @@  int bdi_init(struct backing_dev_info *bdi)
 	int ret;
 
 	bdi->dev = NULL;
+	bdi->is_visible = false;
 
 	bdi->min_ratio = 0;
 	bdi->max_ratio = 100;
@@ -806,6 +807,7 @@  int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		return PTR_ERR(dev);
 
 	bdi->dev = dev;
+	bdi->is_visible = true;
 
 	bdi_debug_register(bdi, dev_name(dev));
 	set_bit(WB_registered, &bdi->wb.state);
@@ -837,6 +839,28 @@  static void bdi_remove_from_list(struct backing_dev_info *bdi)
 	synchronize_rcu_expedited();
 }
 
+/**
+ * bdi_sysfs_del - remove a BDI device from sysfs
+ * @bdi: BDI device pointer.
+ *
+ * It is safe to call this function more than once.
+ */
+void bdi_sysfs_del(struct backing_dev_info *bdi)
+{
+	bool is_visible = false;
+
+	spin_lock_bh(&bdi_lock);
+	swap(bdi->is_visible, is_visible);
+	spin_unlock_bh(&bdi_lock);
+
+	if (!is_visible)
+		return;
+
+	bdi_debug_unregister(bdi);
+	device_del(bdi->dev);
+}
+EXPORT_SYMBOL(bdi_sysfs_del);
+
 void bdi_unregister(struct backing_dev_info *bdi)
 {
 	/* make sure nobody finds us on the bdi_list anymore */
@@ -845,8 +869,8 @@  void bdi_unregister(struct backing_dev_info *bdi)
 	cgwb_bdi_destroy(bdi);
 
 	if (bdi->dev) {
-		bdi_debug_unregister(bdi);
-		device_unregister(bdi->dev);
+		bdi_sysfs_del(bdi);
+		put_device(bdi->dev);
 		bdi->dev = NULL;
 	}
 }