[0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

Message ID 1440919326.2104.111.camel@haakon3.risingtidesystems.com (mailing list archive)
State New, archived
Commit Message

Nicholas A. Bellinger Aug. 30, 2015, 7:22 a.m. UTC
On Fri, 2015-08-28 at 13:25 -0700, James Bottomley wrote:
> On Thu, 2015-08-27 at 12:15 -0700, Nicholas A. Bellinger wrote:
> > On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> > > On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > > > Hi Nicholas & Calvin,
> > > > 
> > > > Thanks for the patchset. We will review it and do some unit
> > > > testing on this patch series. Currently my bandwidth is occupied with
> > > > some internal activity, so by the end of next week I will acknowledge
> > > > this series if everything is fine with it.
> > > 
> > > Calvin responded to your review feedback and that series has been
> > > outstanding for a while, so I'm not going to drop it from the misc tree.
> > > However, I will reorder to make it ready for the second push. You have
> > > until Friday week to find a problem with it.
> > > 
> > 
> > James, as mentioned this series is functionally identical to Calvin's
> > mpt2sas series.
> > 
> > Please consider merging it to scsi.git/for-next, so both series are
> > together and in-sync.
> 
> Unfortunately, the driver isn't, thanks to drift between v2 and v3 of
> the mpt_sas code bases.  This patch is also dangerous: the early
> versions left unremoved objects lying around, so getting some stress
> testing from Avago is very useful.  At this point in the cycle, the risk
> vs reward of doing a blind upport to mpt3sas is just too great and the
> time for review and stress testing too limited within the merge window.

To clarify, this series is Calvin's latest -v4 mpt2sas changes that
you've already merged into for-next, and that have been applied (by
hand) to v4.2-rc1 mpt3sas code.

If you look closer, this series is an obvious bug-fix for a class of
long-standing bugs within mpt*sas, and I don't see how keeping the
broken list_head dereferences in one LLD, but not the other makes any
sense at this point. 

Unfortunately, the mpt3sas patches you've merged this week add yet more
bogus mpt3sas_scsih_sas_device_find_by_sas_address() usage.  Really,
adding more broken code to mpt3sas can't possibly be better than just
merging this bug-fix series.

Here are two cases that required fixing to apply this series atop the
latest scsi.git/for-next:


Also, I'm currently using the -v1 series on v3.14.47 atop 40 nodes with
12 HDDs per HBA (480 total), and the number of HBAs using this series
will double over the next week.  The specific hardware setup is:

  LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)

Thus far, it has resolved the original oops that would occasionally
appear during boot with a failing HDD, and no new regressions have
appeared.

That said, I'll be posting the updated -v2 atop current scsi/for-next
shortly, and will push to target-pending/for-next-merge for now to be
picked up for 0-day + linux-next.

Please consider picking it up for v4.3-rc1, otherwise I'll plan to push
to Linus with Sreekanth's ACK, barring any new regressions or other
specific -v2 code comments.

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

James Bottomley Aug. 30, 2015, 4:14 p.m. UTC | #1
On Sun, 2015-08-30 at 00:22 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2015-08-28 at 13:25 -0700, James Bottomley wrote:
> > On Thu, 2015-08-27 at 12:15 -0700, Nicholas A. Bellinger wrote:
> > > On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> > > > On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > > > > Hi Nicholas & Calvin,
> > > > > 
> > > > > Thanks for the patchset. We will review it and do some unit
> > > > > testing on this patch series. Currently my bandwidth is occupied with
> > > > > some internal activity, so by the end of next week I will acknowledge
> > > > > this series if everything is fine with it.
> > > > 
> > > > Calvin responded to your review feedback and that series has been
> > > > outstanding for a while, so I'm not going to drop it from the misc tree.
> > > > However, I will reorder to make it ready for the second push. You have
> > > > until Friday week to find a problem with it.
> > > > 
> > > 
> > > James, as mentioned this series is functionally identical to Calvin's
> > > mpt2sas series.
> > > 
> > > Please consider merging it to scsi.git/for-next, so both series are
> > > together and in-sync.
> > 
> > Unfortunately, the driver isn't, thanks to drift between v2 and v3 of
> > the mpt_sas code bases.  This patch is also dangerous: the early
> > versions left unremoved objects lying around, so getting some stress
> > testing from Avago is very useful.  At this point in the cycle, the risk
> > vs reward of doing a blind upport to mpt3sas is just too great and the
> > time for review and stress testing too limited within the merge window.
> 
> To clarify, this series is Calvin's latest -v4 mpt2sas changes that
> you've already merged into for-next, and that have been applied (by
> hand) to v4.2-rc1 mpt3sas code.
> 
> If you look closer, this series is an obvious bug-fix for a class of
> long-standing bugs within mpt*sas, and I don't see how keeping the
> broken list_head dereferences in one LLD, but not the other makes any
> sense at this point. 

Look, Nic, the original patches went through four reviews with issues
uncovered and fixed at most of them.  They're now sitting at the head of
the misc tree while they get stress tested so they can easily be ejected
without affecting the rest of the tree should anything fail.

They're hardly "an obvious bug fix"; they actually introduce a separated
lifetime for _sas_device and fw_event_work.

Separated lifetime patches should always be a last resort because they
introduce a whole new set of lifetime handling rules for the objects.
Get one of them wrong and either you free it too early (oops) or pin it
forever.  The general rule of thumb is never do separated lifetime
objects unless you really really have to.  Ideally pin them to existing
core objects.  The only reason I'm considering this is because no one
can see how to pin _sas_device to the natural core object (scsi_device).
The firmware event one doesn't make sense to me at all so I'm relying on
the reviewers.  Ideally a fw event is allocated in the event setup and
freed in the event fire (or cancel).  I really don't see why we need a
separated lifetime object here.  You have a callback to mediate exactly
whether the event fired or not in the cancel, so there's no ambiguity
about ownership.

Simply because of this, your upport to mpt3sas can't go in without the
same level of review and testing.
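
The lifetime rules described above can be illustrated with a minimal
userspace sketch.  It is not the mpt3sas code: the demo_device type and
the demo_device_alloc/get/put helpers are hypothetical names chosen for
illustration, and the counter is a plain int, so it only models a single
thread.  The point is that every get must be paired with exactly one put;
one put too many frees the object early, and one too few pins it forever.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver object that has its own lifetime. */
struct demo_device {
	int refcount;     /* plain int: single-threaded sketch only */
	int *freed_flag;  /* set to 1 when the object is released */
};

/* Allocate an object holding one reference, owned by the caller. */
static struct demo_device *demo_device_alloc(int *freed_flag)
{
	struct demo_device *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;
	dev->freed_flag = freed_flag;
	*freed_flag = 0;
	return dev;
}

/* Take an additional reference on an object that is still alive. */
static void demo_device_get(struct demo_device *dev)
{
	assert(dev->refcount > 0);
	dev->refcount++;
}

/* Drop one reference; returns 1 if this call freed the object. */
static int demo_device_put(struct demo_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount > 0)
		return 0;
	*dev->freed_flag = 1;
	free(dev);
	return 1;
}
```

A caller that takes a second reference with demo_device_get() has to call
demo_device_put() twice before the memory is released.  In kernel code the
counter would typically be a struct kref or an atomic type rather than a
plain int.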

> Unfortunately, the mpt3sas patches you've merged this week add yet more
> bogus mpt3sas_scsih_sas_device_find_by_sas_address() usage.  Really,
> adding more broken code to mpt3sas can't possibly be better than just
> merging this bug-fix series.

Well, it had two reviewers per patch, one of whom was Martin Petersen,
who doesn't give a Reviewed-by lightly.  Funnily enough, it seems the list
lost your review feedback for this patch set.

James


Patch

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 85ff0dd..897153b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2866,7 +2874,7 @@  _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
        struct scsi_device *sdev;
        struct _sas_device *sas_device;
 
-       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
        if (!sas_device)
                return;
 
@@ -2882,6 +2890,8 @@  _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
                        continue;
                _scsih_internal_device_block(sdev, sas_device_priv_data);
        }
+
+       sas_device_put(sas_device);
 }
 
 /**
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index 18f1de5..6074b11 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -734,7 +734,7 @@  mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
        rphy->identify = mpt3sas_port->remote_identify;
 
        if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
-               sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+               sas_device = __mpt3sas_get_sdev_by_addr(ioc,
                                    mpt3sas_port->remote_identify.sas_address);
                if (!sas_device) {
                        dfailprintk(ioc, printk(MPT3SAS_FMT
@@ -750,8 +750,10 @@  mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
                    ioc->name, __FILE__, __LINE__, __func__);
        }
 
-       if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE)
+       if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
                sas_device->pend_sas_rphy_add = 0;
+               sas_device_put(sas_device);
+       }
 
        if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
                dev_printk(KERN_INFO, &rphy->dev,
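
The hunks above replace plain find calls with lookups that return a
referenced object, each paired with a sas_device_put().  The following
userspace sketch shows that pattern under stated assumptions: all demo_*
names are hypothetical, the list is a simple singly linked list, and the
locking a real driver needs around list access is only marked in
comments.  It is an illustration of the idea, not the driver's
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device entry; the list itself holds one reference. */
struct demo_sas_device {
	unsigned short handle;
	int refcount;
	struct demo_sas_device *next;
};

/* Hypothetical adapter owning the device list. */
struct demo_adapter {
	struct demo_sas_device *head;
	int live_objects;  /* allocated entries not yet freed */
};

/* Drop one reference and free the entry when the last one goes away. */
static void demo_put(struct demo_adapter *ioc, struct demo_sas_device *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		ioc->live_objects--;
		free(d);
	}
}

/* Add an entry; its initial reference belongs to the list. */
static int demo_add(struct demo_adapter *ioc, unsigned short handle)
{
	struct demo_sas_device *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->handle = handle;
	d->refcount = 1;
	d->next = ioc->head;
	ioc->head = d;
	ioc->live_objects++;
	return 0;
}

/* Look up by handle and return the entry with an extra reference. */
static struct demo_sas_device *
demo_get_by_handle(struct demo_adapter *ioc, unsigned short handle)
{
	struct demo_sas_device *d;

	/* A real driver would hold the list lock across this loop. */
	for (d = ioc->head; d; d = d->next) {
		if (d->handle == handle) {
			d->refcount++;  /* caller must demo_put() later */
			return d;
		}
	}
	return NULL;
}

/* Unlink an entry and drop the list's reference to it. */
static void demo_remove(struct demo_adapter *ioc, unsigned short handle)
{
	struct demo_sas_device **pp;

	/* A real driver would hold the list lock here as well. */
	for (pp = &ioc->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->handle == handle) {
			struct demo_sas_device *d = *pp;

			*pp = d->next;
			d->next = NULL;
			demo_put(ioc, d);
			return;
		}
	}
}
```

With this scheme, a caller that obtained an entry through
demo_get_by_handle() can keep using it after demo_remove() has unlinked
it; the memory is released only when that caller does its own demo_put().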