[BUG] Oops when SCSI device under multipath is removed

Message ID 4E4345F1.9040501@ce.jp.nec.com (mailing list archive)
State Deferred, archived
Commit Message

Junichi Nomura Aug. 11, 2011, 3:01 a.m. UTC
Hi James,

On 08/11/11 09:24, Jun'ichi Nomura wrote:
> On 08/11/11 04:52, James Bottomley wrote:
>> On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
>>>   2) SCSI to call blk_cleanup_queue() from device's ->release() callback
>>>      (before 2.6.39, it used to work like this)
>>>      https://lkml.org/lkml/2011/7/2/106
>>
>> Well, they both have documented objections.  I asked why we destroy the
>> elevator in the del case and didn't get any traction, so let me show the
>> actual patch which should fix all of these issues.
>>
>> Is there a good reason for not doing this as a bug fix now?
...
> I think it doesn't work because elevator_exit() and
> blk_throtl_exit() take &q->queue_lock, which may be freed
> by LLD after blk_cleanup_queue, before blk_release_queue.

If the reason you moved scsi_free_queue into scsi_remove_device
is marking the queue dead, how about the following patch?
Do you think it's acceptable?

Jun'ichi Nomura, NEC Corporation


Add blk_kill_queue() for drivers that want to mark the queue dead early.

blk_cleanup_queue() is the interface an LLD uses to notify the block
layer that it no longer needs the queue.
Since q->queue_lock may point to a structure in the LLD that is freed
after blk_cleanup_queue() returns, blk_cleanup_queue() frees subordinate
structures such as the elevator, which use q->queue_lock, to avoid
invalid references.

On the other hand, an LLD such as SCSI just wants to mark the queue dead
earlier in the teardown phase.

So this patch factors out the early part of blk_cleanup_queue() into
blk_kill_queue() for such drivers.
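
As an illustration only, the split described above can be modelled in
user space. This is a minimal sketch, not kernel code: sketch_queue,
sketch_elevator, sketch_kill_queue() and sketch_cleanup_queue() are
hypothetical names, and a plain bool stands in for QUEUE_FLAG_DEAD. It
shows only the ordering the patch aims for: the early phase marks the
queue dead, and the full teardown reuses that phase before releasing the
subordinate structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct sketch_elevator {
	bool exited;                      /* set once the elevator is torn down */
};

struct sketch_queue {
	bool dead;                        /* models QUEUE_FLAG_DEAD */
	struct sketch_elevator *elevator; /* subordinate structure that uses the lock */
};

/* Early phase: only mark the queue dead. Calling it twice is harmless. */
static void sketch_kill_queue(struct sketch_queue *q)
{
	assert(q != NULL);
	q->dead = true;
}

/* Full teardown: run the early phase, then release subordinate structures. */
static void sketch_cleanup_queue(struct sketch_queue *q)
{
	sketch_kill_queue(q);
	if (q->elevator) {
		q->elevator->exited = true;
		q->elevator = NULL;
	}
}
```

In this model a driver could call sketch_kill_queue() at device removal
and defer sketch_cleanup_queue() to its release callback, which mirrors
the change to scsi_sysfs.c in the patch.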


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

James Bottomley Aug. 11, 2011, 2:33 p.m. UTC | #1
On Thu, 2011-08-11 at 12:01 +0900, Jun'ichi Nomura wrote:
> Hi James,
> 
> On 08/11/11 09:24, Jun'ichi Nomura wrote:
> > On 08/11/11 04:52, James Bottomley wrote:
> >> On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
> >>>   2) SCSI to call blk_cleanup_queue() from device's ->release() callback
> >>>      (before 2.6.39, it used to work like this)
> >>>      https://lkml.org/lkml/2011/7/2/106
> >>
> >> Well, they both have documented objections.  I asked why we destroy the
> >> elevator in the del case and didn't get any traction, so let me show the
> >> actual patch which should fix all of these issues.
> >>
> >> Is there a good reason for not doing this as a bug fix now?
> ...
> > I think it doesn't work because elevator_exit() and
> > blk_throtl_exit() take &q->queue_lock, which may be freed
> > by LLD after blk_cleanup_queue, before blk_release_queue.
> 
> If the reason you moved scsi_free_queue into scsi_remove_device
> is marking the queue dead, how about the following patch?
> Do you think it's acceptable?

Well, it's just hiding the problem.  The essential problem is that only
the block layer has the correctly refcounted knowledge of when the last
queue reference is released.  Until that time, the holder of a reference
can use the queue regardless of whether blk_cleanup_queue() has been
called.  This is the race you complain about, since use of the queue
involves the lock, which should be guarded by QUEUE_DEAD checks.

This is essentially unfixable with function calls.  The only way to fix
it is to have a callback model for freeing the external lock.
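
For illustration, one possible shape of such a callback model, as a
single-threaded user-space sketch. Everything here is an assumption made
for the example: model_queue, release_lock, model_queue_get(),
model_queue_kill(), model_queue_put() and driver_free_lock() are
hypothetical names, not existing block layer API, and a plain int stands
in for the kernel's refcounting. The point it shows is that the
externally owned lock is freed only from the final put, never directly
by the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct model_queue {
	int refcount;                 /* stands in for the queue's refcount */
	bool dead;                    /* stands in for QUEUE_FLAG_DEAD */
	void *ext_lock;               /* lock storage owned by the driver */
	void (*release_lock)(void *); /* called from the final put only */
};

static int locks_freed;               /* counts calls to the driver callback */

/* Driver-supplied callback that frees the external lock storage. */
static void driver_free_lock(void *lock)
{
	free(lock);
	locks_freed++;
}

static struct model_queue *model_queue_alloc(void (*release_lock)(void *))
{
	struct model_queue *q = calloc(1, sizeof(*q));

	if (!q)
		return NULL;
	q->ext_lock = malloc(sizeof(int));
	if (!q->ext_lock) {
		free(q);
		return NULL;
	}
	q->refcount = 1;
	q->release_lock = release_lock;
	return q;
}

static void model_queue_get(struct model_queue *q)
{
	q->refcount++;
}

/* The driver marks the queue dead early but leaves the lock alone. */
static void model_queue_kill(struct model_queue *q)
{
	q->dead = true;
}

/* Drops one reference; returns true if this was the last one. */
static bool model_queue_put(struct model_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount > 0)
		return false;
	if (q->release_lock)
		q->release_lock(q->ext_lock);
	free(q);
	return true;
}
```

With this shape, a holder that took a reference before the driver called
model_queue_kill() can still touch ext_lock safely until its own put,
because the lock outlives every reference. A real implementation would
need atomic refcounting and the actual spinlock type.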


Patch

--- linux-3.1-rc1/include/linux/blkdev.h.orig	2011-08-11 11:19:52.585280223 +0900
+++ linux-3.1-rc1/include/linux/blkdev.h	2011-08-11 11:20:09.482279763 +0900
@@ -804,6 +804,7 @@  extern struct request_queue *blk_init_al
 extern struct request_queue *blk_init_queue(request_fn_proc *, spinlock_t *);
 extern struct request_queue *blk_init_allocated_queue(struct request_queue *,
 						      request_fn_proc *, spinlock_t *);
+extern void blk_kill_queue(struct request_queue *);
 extern void blk_cleanup_queue(struct request_queue *);
 extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
 extern void blk_queue_bounce_limit(struct request_queue *, u64);
--- linux-3.1-rc1/block/blk-core.c.orig	2011-08-10 09:46:06.014043123 +0900
+++ linux-3.1-rc1/block/blk-core.c	2011-08-11 11:19:34.551280697 +0900
@@ -347,6 +347,17 @@  void blk_put_queue(struct request_queue 
 }
 EXPORT_SYMBOL(blk_put_queue);
 
+void blk_kill_queue(struct request_queue *q)
+{
+	blk_sync_queue(q);
+
+	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
+	mutex_lock(&q->sysfs_lock);
+	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
+	mutex_unlock(&q->sysfs_lock);
+}
+EXPORT_SYMBOL(blk_kill_queue);
+
 /*
  * Note: If a driver supplied the queue lock, it should not zap that lock
  * unexpectedly as some queue cleanup components like elevator_exit() and
@@ -360,12 +371,7 @@  void blk_cleanup_queue(struct request_qu
 	 * are done before moving on. Going into this function, we should
 	 * not have processes doing IO to this device.
 	 */
-	blk_sync_queue(q);
-
-	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
-	mutex_lock(&q->sysfs_lock);
-	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
-	mutex_unlock(&q->sysfs_lock);
+	blk_kill_queue(q);
 
 	if (q->elevator)
 		elevator_exit(q->elevator);
--- linux-3.1-rc1/drivers/scsi/scsi_sysfs.c.orig	2011-08-09 18:48:13.676485115 +0900
+++ linux-3.1-rc1/drivers/scsi/scsi_sysfs.c	2011-08-11 11:21:07.923277456 +0900
@@ -322,6 +322,7 @@  static void scsi_device_dev_release_user
 		kfree(evt);
 	}
 
+	scsi_free_queue(sdev->request_queue);
 	blk_put_queue(sdev->request_queue);
 	/* NULL queue means the device can't be used */
 	sdev->request_queue = NULL;
@@ -937,7 +938,7 @@  void __scsi_remove_device(struct scsi_de
 	sdev->request_queue->queuedata = NULL;
 
 	/* Freeing the queue signals to block that we're done */
-	scsi_free_queue(sdev->request_queue);
+	blk_kill_queue(sdev->request_queue);
 	put_device(dev);
 }