diff mbox

[REGRESSION,v4.3] scsi_dh: use-after-free when removing scsi device

Message ID 20151001043844.GA8397@xzibit.linux.bs1.fc.nec.co.jp (mailing list archive)
State New, archived
Headers show

Commit Message

Junichi Nomura Oct. 1, 2015, 4:38 a.m. UTC
On 10/01/15 09:56, Junichi Nomura wrote:
> With 4.2 kernel, scsi_dh->detach() was not called until the last
> reference has gone.  With 4.3-rc3, scsi_dh->detach() is directly called
> from the context of scsi_remove_device(). That's the point.

For my test script in the original post, I can't reproduce the crash
by just swapping the order of scsi_dh_handler_detach() and
device_remove_file() like this:

  void scsi_dh_remove_device(struct scsi_device *sdev)
  {
        device_remove_file(&sdev->sdev_gendev, &scsi_dh_state_attr);
        if (sdev->handler)
                scsi_dh_handler_detach(sdev);
  }

But another workload using dm-multipath still caues the same crash.
I think a patch like the following is needed.
What do you think?


The commit 1bab0de0274f ("dm-mpath, scsi_dh: don't let dm detach device
handlers") removed reference counting of attached scsi device handler.
As a result scsi_dh->detach() is directly called in the context of
scsi_remove_device() where activation request can be still in flight.

This patch moves scsi_dh_handler_detach() to
scsi_device_dev_release_usercontext(), at that point the device
is already in quiesced state.

To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christoph Hellwig Oct. 1, 2015, 5:21 a.m. UTC | #1
On Thu, Oct 01, 2015 at 04:38:45AM +0000, Junichi Nomura wrote:
> But another workload using dm-multipath still caues the same crash.
> I think a patch like the following is needed.
> What do you think?

Thanks Junichi,

I suspected that we should defer the release until we tear down
the sdev instead of unregister time.  Give me a little more time
to review your patch and actually reproduce your issue before a formal
ACK, though.

Any chance you could share all your multipath tests in a git repository
somewhere?  It seems like you're the only one actually having a good
set of reproducable but minimalistic tests.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Junichi Nomura Oct. 1, 2015, 11:40 a.m. UTC | #2
On 10/01/15 14:21, Christoph Hellwig wrote:
> Any chance you could share all your multipath tests in a git repository
> somewhere?  It seems like you're the only one actually having a good
> set of reproducable but minimalistic tests.

Hmm, sorry I don't have a public git repository...

I'm using a pair of KVM guest, one for iSCSI target and the other
for testing.

I could reproduce the crash using this loop, within a few minutes:

   service multipathd start
   while true; do
     multipath -F
     iscsiadm -m node --logout
     iscsiadm -m node --login
   done

It might implicitly depend on udev to do some small amount of I/O
after device uevent though.

/etc/multipath.conf is like this:

blacklist {
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|vd)[0-9]*"
}
defaults {
      user_friendly_names no
      find_multipaths yes
      path_grouping_policy    group_by_prio
      path_selector           "queue-length 0"
      path_checker            "tur"
      failback immediate
      prio alua
      retain_attached_hw_handler yes
}
Christoph Hellwig Oct. 4, 2015, 7:42 a.m. UTC | #3
On Thu, Oct 01, 2015 at 11:40:23AM +0000, Junichi Nomura wrote:
> On 10/01/15 14:21, Christoph Hellwig wrote:
> > Any chance you could share all your multipath tests in a git repository
> > somewhere?  It seems like you're the only one actually having a good
> > set of reproducable but minimalistic tests.
> 
> Hmm, sorry I don't have a public git repository...
> 
> I'm using a pair of KVM guest, one for iSCSI target and the other
> for testing.

Any chance you could share your various scripts in some way to that
people doing multipath changes can run them to verify those changes?

> 
> I could reproduce the crash using this loop, within a few minutes:
> 
>    service multipathd start
>    while true; do
>      multipath -F
>      iscsiadm -m node --logout
>      iscsiadm -m node --login
>    done
> 
> It might implicitly depend on udev to do some small amount of I/O
> after device uevent though.

I can't reproduce this unfortunately.  I suspect udev doesn't do enough
stupid things on my old Debian test system.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 4, 2015, 7:43 a.m. UTC | #4
This patch looks fine to me:

Acked-by: Christoph Hellwig <hch@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Junichi Nomura Oct. 7, 2015, 5:55 a.m. UTC | #5
On 10/04/15 16:42, Christoph Hellwig wrote:
> Any chance you could share your various scripts in some way to that
> people doing multipath changes can run them to verify those changes?

Hi Christoph,

I've refined some pieces of test scripts which could be useful for
people and posted here:
  http://marc.info/?l=linux-scsi&m=144419652913948&w=2
  https://www.redhat.com/archives/dm-devel/2015-October/msg00034.html

The scripts are stress testing and expected to trigger crash/stall/
data corruption if you run long enough. Though it's not perfect I
hope this helps people do regression testing for dm-mpath related
code changes.
diff mbox

Patch

diff --git a/drivers/scsi/scsi_dh.c b/drivers/scsi/scsi_dh.c
index edb044a..19bf9db 100644
--- a/drivers/scsi/scsi_dh.c
+++ b/drivers/scsi/scsi_dh.c
@@ -232,10 +232,14 @@  int scsi_dh_add_device(struct scsi_device *sdev)
 	return err;
 }
 
-void scsi_dh_remove_device(struct scsi_device *sdev)
+void scsi_dh_release_device(struct scsi_device *sdev)
 {
 	if (sdev->handler)
 		scsi_dh_handler_detach(sdev);
+}
+
+void scsi_dh_remove_device(struct scsi_device *sdev)
+{
 	device_remove_file(&sdev->sdev_gendev, &scsi_dh_state_attr);
 }
 
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 644bb73..4d01cdb 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -173,9 +173,11 @@  extern struct async_domain scsi_sd_probe_domain;
 /* scsi_dh.c */
 #ifdef CONFIG_SCSI_DH
 int scsi_dh_add_device(struct scsi_device *sdev);
+void scsi_dh_release_device(struct scsi_device *sdev);
 void scsi_dh_remove_device(struct scsi_device *sdev);
 #else
 static inline int scsi_dh_add_device(struct scsi_device *sdev) { return 0; }
+static inline void scsi_dh_release_device(struct scsi_device *sdev) { }
 static inline void scsi_dh_remove_device(struct scsi_device *sdev) { }
 #endif
 
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index b333389..dff8faf 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -399,6 +399,8 @@  static void scsi_device_dev_release_usercontext(struct work_struct *work)
 
 	sdev = container_of(work, struct scsi_device, ew.work);
 
+	scsi_dh_release_device(sdev);
+
 	parent = sdev->sdev_gendev.parent;
 
 	spin_lock_irqsave(sdev->host->host_lock, flags);--