Message ID | 20160815130829.GA14829@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Mon, Aug 15 2016 at 9:08P -0400, Mike Snitzer <snitzer@redhat.com> wrote: > Not sure how Hannes' original patch was overlooked but... It wasn't overlooked. It was very much unresolved. The original thread unraveled to all sorts of PR edge case concerns (and doubt about whether anything relies on the current multipath handling of reservation conflicts). See patchwork thread below. Obviously you have found a problematic case which requires Hannes' patch. So there is definitely increased pressure to fix this. > One issue I see with the patch is it will return -EBADE regardless of > whether 'queue_if_no_path' is set. That's fine (since path isn't being > failed for this case any more). But why not just return error > immediately? > > But taking a step back, shouldn't all paths be tried once before > returning an error? Obviously that'd impose the use of a new > 'conflict_seen' (or whatever) flag at the end of 'struct pgpath'. And > then only return error if the flag is set. > > I threw together the following RFC patch to illustrate what I'm > thinking, but thinking about this further it is tough to know all paths > have seen the reservation conflict (my patch assumes if 'conflict_seen' > is set then the conflict iterated through all paths.. but if paths > aren't being failed there isn't a guarantee that the path selector > didn't just hand us back the same path that just experienced the > conflict). Seems we still need a more sophisticated approach. But I'm left wondering: if we didn't do it would anything notice? Sadly, the same big question from the original thread from a year ago: https://patchwork.kernel.org/patch/6797111/ > So this is throw-away for now (and I'll get Hannes' patch applied for > 4.8-rc3, with the tweak of returning -EBADE immediately): Unfortunately, I'm _not_ staging Hannes' patch until I have James Bottomley's Ack (given his original issues with the patch haven't been explained away AFAICT). Mike -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Getting back to this after Hannes recovered from his vacation and I had a chat with him.. On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote: > Seems we still need a more sophisticated approach. But I'm left > wondering: if we didn't do it would anything notice? Sadly, the same > big question from the original thread from a year ago: Yes. I have a customer looking to push the pNFS SCSI layout into a product, and the major show stopper right now is that we can trivially get into failver loops without this (or and equivalent) fix. A year ago SCSI layout was still work in progress in the IETF, people use the similar block layout instead that doesn't use PRs and we also didn't have the in-kernel PR API, so you effectively couldn't use PRs with multipathing. > https://patchwork.kernel.org/patch/6797111/ > > > So this is throw-away for now (and I'll get Hannes' patch applied for > > 4.8-rc3, with the tweak of returning -EBADE immediately): > > Unfortunately, I'm _not_ staging Hannes' patch until I have James > Bottomley's Ack (given his original issues with the patch haven't been > explained away AFAICT). I've added James to the Cc. His argument was that the old behavior could be implemented to use some non-standard use of reservations without a specific example. I don't really think his example even is practical - once we use dm-mpath it exclusively claims the underling block devices, so any sort of selective reservations would have had to happen before even starting dm-multipath. So a dynamic SAN controller would have to tear down and rebuild the dm-multipath setup at all the time. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote: > Getting back to this after Hannes recovered from his vacation > and I had a chat with him.. > > On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote: > > Seems we still need a more sophisticated approach. But I'm left > > wondering: if we didn't do it would anything notice? Sadly, the > > same > > big question from the original thread from a year ago: > > Yes. I have a customer looking to push the pNFS SCSI layout into > a product, and the major show stopper right now is that we can > trivially get into failver loops without this (or and equivalent) > fix. > > A year ago SCSI layout was still work in progress in the IETF, > people use the similar block layout instead that doesn't use > PRs and we also didn't have the in-kernel PR API, so you effectively > couldn't use PRs with multipathing. > > > https://patchwork.kernel.org/patch/6797111/ > > > > > So this is throw-away for now (and I'll get Hannes' patch applied > > > for > > > 4.8-rc3, with the tweak of returning -EBADE immediately): > > > > Unfortunately, I'm _not_ staging Hannes' patch until I have James > > Bottomley's Ack (given his original issues with the patch haven't > > been > > explained away AFAICT). > > I've added James to the Cc. His argument was that the old behavior > could be implemented to use some non-standard use of reservations > without a specific example. I don't really think his example even > is practical - once we use dm-mpath it exclusively claims the > underling block devices, so any sort of selective reservations would > have had to happen before even starting dm-multipath. Well, now that you've made me reread the thread from 14 months ago that wasn't quite my objection. The objection hinged on the fact that anything that uses path specific reservations would now fail instead of retrying on a different path. I thought the IBM SVC did this and Hannes implied he'd be able to check this ... did anyone check? If we've checked and there's no issue with the SVC, then I don't have any other objections. > So a dynamic SAN controller would have to tear down and rebuild the > dm-multipath setup at all the time. That was the job of the SVC: it sat in the middle of the SAN and controlled which node saw what storage. https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.storage.svc.console.720.doc/svc_svcovr_1bcfiq.html The SVC can issue its own reservations in those circumstances. What I'm not at all clear on is whether they'll interact badly with the dm -mp reservations. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/26/2016 09:06 PM, James Bottomley wrote: > On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote: >> Getting back to this after Hannes recovered from his vacation >> and I had a chat with him.. >> >> On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote: >>> Seems we still need a more sophisticated approach. But I'm left >>> wondering: if we didn't do it would anything notice? Sadly, the >>> same >>> big question from the original thread from a year ago: >> >> Yes. I have a customer looking to push the pNFS SCSI layout into >> a product, and the major show stopper right now is that we can >> trivially get into failver loops without this (or and equivalent) >> fix. >> >> A year ago SCSI layout was still work in progress in the IETF, >> people use the similar block layout instead that doesn't use >> PRs and we also didn't have the in-kernel PR API, so you effectively >> couldn't use PRs with multipathing. >> >>> https://patchwork.kernel.org/patch/6797111/ >>> >>>> So this is throw-away for now (and I'll get Hannes' patch applied >>>> for >>>> 4.8-rc3, with the tweak of returning -EBADE immediately): >>> >>> Unfortunately, I'm _not_ staging Hannes' patch until I have James >>> Bottomley's Ack (given his original issues with the patch haven't >>> been >>> explained away AFAICT). >> >> I've added James to the Cc. His argument was that the old behavior >> could be implemented to use some non-standard use of reservations >> without a specific example. I don't really think his example even >> is practical - once we use dm-mpath it exclusively claims the >> underling block devices, so any sort of selective reservations would >> have had to happen before even starting dm-multipath. > > Well, now that you've made me reread the thread from 14 months ago that > wasn't quite my objection. The objection hinged on the fact that > anything that uses path specific reservations would now fail instead of > retrying on a different path. I thought the IBM SVC did this and > Hannes implied he'd be able to check this ... did anyone check? If > we've checked and there's no issue with the SVC, then I don't have any > other objections. > >> So a dynamic SAN controller would have to tear down and rebuild the >> dm-multipath setup at all the time. > > That was the job of the SVC: it sat in the middle of the SAN and > controlled which node saw what storage. > > https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.storage.svc.console.720.doc/svc_svcovr_1bcfiq.html > > The SVC can issue its own reservations in those circumstances. What > I'm not at all clear on is whether they'll interact badly with the dm > -mp reservations. > In the end SVC is (for us) just another storage array. If and what SVC does in the background is of no interest to us. OTOH I'd be very surprised if the SVC would be allowing us to see remnants of its internal working (like persistent reservation errors); in doing so third-party applications would be able to see and possibly modify these persistent reservations and the SVC would find itself in a very fragile operating scenario. Also interactions with GPFS (which uses it's own set of reservations) will become very tricky. So I sincerely doubt we'll ever see SVC-originated persistent reservations errors. And as a side-note, this particular patch is included in SLES since 2011. With no noticeable side-effect. Cheers, Hannes
On Tue, 2016-09-27 at 08:34 +0200, Hannes Reinecke wrote: > On 09/26/2016 09:06 PM, James Bottomley wrote: > > On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote: > > > Getting back to this after Hannes recovered from his vacation > > > and I had a chat with him.. > > > > > > On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote: > > > > Seems we still need a more sophisticated approach. But I'm > > > > left wondering: if we didn't do it would anything notice? > > > > Sadly, the same big question from the original thread from a > > > > year ago: > > > > > > Yes. I have a customer looking to push the pNFS SCSI layout into > > > a product, and the major show stopper right now is that we can > > > trivially get into failver loops without this (or and equivalent) > > > fix. > > > > > > A year ago SCSI layout was still work in progress in the IETF, > > > people use the similar block layout instead that doesn't use > > > PRs and we also didn't have the in-kernel PR API, so you > > > effectively couldn't use PRs with multipathing. > > > > > > > https://patchwork.kernel.org/patch/6797111/ > > > > > > > > > So this is throw-away for now (and I'll get Hannes' patch > > > > > applied for 4.8-rc3, with the tweak of returning -EBADE > > > > > immediately): > > > > > > > > Unfortunately, I'm _not_ staging Hannes' patch until I have > > > > James Bottomley's Ack (given his original issues with the patch > > > > haven't been explained away AFAICT). > > > > > > I've added James to the Cc. His argument was that the old > > > behavior could be implemented to use some non-standard use of > > > reservations without a specific example. I don't really think > > > his example even is practical - once we use dm-mpath it > > > exclusively claims the underling block devices, so any sort of > > > selective reservations would have had to happen before even > > > starting dm-multipath. > > > > Well, now that you've made me reread the thread from 14 months ago > > that wasn't quite my objection. The objection hinged on the fact > > that anything that uses path specific reservations would now fail > > instead of retrying on a different path. I thought the IBM SVC did > > this and Hannes implied he'd be able to check this ... did anyone > > check? If we've checked and there's no issue with the SVC, then I > > don't have any other objections. > > > > > So a dynamic SAN controller would have to tear down and rebuild > > > the dm-multipath setup at all the time. > > > > That was the job of the SVC: it sat in the middle of the SAN and > > controlled which node saw what storage. > > > > https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.storage. > > svc.console.720.doc/svc_svcovr_1bcfiq.html > > > > The SVC can issue its own reservations in those circumstances. > > What I'm not at all clear on is whether they'll interact badly > > with the dm-mp reservations. > > > In the end SVC is (for us) just another storage array. > If and what SVC does in the background is of no interest to us. How can that be true? It sits *on* the san and manages devices, it doesn't sit between the initators and the devices. It applies reservations to devices under management, but every node usually sees everything else, so devices under SVC management are visible to all initators unless you zone them off. The last SVC manual I saw included a procedure for manually releasing stuck SVC reservations from an initator, which illustrates the expectation. > OTOH I'd be very surprised if the SVC would be allowing us to see > remnants of its internal working (like persistent reservation > errors); in doing so third-party applications would be able to see > and possibly modify these persistent reservations and the SVC would > find itself in a very fragile operating scenario. Because unless you zone the fibre, that's precisely what you do see. > Also interactions with GPFS (which uses it's own set of reservations) > will become very tricky. > > So I sincerely doubt we'll ever see SVC-originated persistent > reservations errors. > > And as a side-note, this particular patch is included in SLES since > 2011. With no noticeable side-effect. OK, so can you actually say that someone has tested this scenario? If not, do you have the capacity to test it? James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 27 2016 at 2:50pm -0400, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > On Tue, 2016-09-27 at 08:34 +0200, Hannes Reinecke wrote: > > On 09/26/2016 09:06 PM, James Bottomley wrote: > > > On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote: > > > > Getting back to this after Hannes recovered from his vacation > > > > and I had a chat with him.. > > > > > > > > On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote: > > > > > Seems we still need a more sophisticated approach. But I'm > > > > > left wondering: if we didn't do it would anything notice? > > > > > Sadly, the same big question from the original thread from a > > > > > year ago: > > > > > > > > Yes. I have a customer looking to push the pNFS SCSI layout into > > > > a product, and the major show stopper right now is that we can > > > > trivially get into failver loops without this (or and equivalent) > > > > fix. > > > > > > > > A year ago SCSI layout was still work in progress in the IETF, > > > > people use the similar block layout instead that doesn't use > > > > PRs and we also didn't have the in-kernel PR API, so you > > > > effectively couldn't use PRs with multipathing. > > > > > > > > > https://patchwork.kernel.org/patch/6797111/ > > > > > > > > > > > So this is throw-away for now (and I'll get Hannes' patch > > > > > > applied for 4.8-rc3, with the tweak of returning -EBADE > > > > > > immediately): > > > > > > > > > > Unfortunately, I'm _not_ staging Hannes' patch until I have > > > > > James Bottomley's Ack (given his original issues with the patch > > > > > haven't been explained away AFAICT). > > > > > > > > I've added James to the Cc. His argument was that the old > > > > behavior could be implemented to use some non-standard use of > > > > reservations without a specific example. I don't really think > > > > his example even is practical - once we use dm-mpath it > > > > exclusively claims the underling block devices, so any sort of > > > > selective reservations would have had to happen before even > > > > starting dm-multipath. > > > > > > Well, now that you've made me reread the thread from 14 months ago > > > that wasn't quite my objection. The objection hinged on the fact > > > that anything that uses path specific reservations would now fail > > > instead of retrying on a different path. I thought the IBM SVC did > > > this and Hannes implied he'd be able to check this ... did anyone > > > check? If we've checked and there's no issue with the SVC, then I > > > don't have any other objections. > > > > > > > So a dynamic SAN controller would have to tear down and rebuild > > > > the dm-multipath setup at all the time. > > > > > > That was the job of the SVC: it sat in the middle of the SAN and > > > controlled which node saw what storage. > > > > > > https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.storage. > > > svc.console.720.doc/svc_svcovr_1bcfiq.html > > > > > > The SVC can issue its own reservations in those circumstances. > > > What I'm not at all clear on is whether they'll interact badly > > > with the dm-mp reservations. > > > > > In the end SVC is (for us) just another storage array. > > If and what SVC does in the background is of no interest to us. > > How can that be true? It sits *on* the san and manages devices, it > doesn't sit between the initators and the devices. It applies > reservations to devices under management, but every node usually sees > everything else, so devices under SVC management are visible to all > initators unless you zone them off. > > The last SVC manual I saw included a procedure for manually releasing > stuck SVC reservations from an initator, which illustrates the > expectation. > > > OTOH I'd be very surprised if the SVC would be allowing us to see > > remnants of its internal working (like persistent reservation > > errors); in doing so third-party applications would be able to see > > and possibly modify these persistent reservations and the SVC would > > find itself in a very fragile operating scenario. > > Because unless you zone the fibre, that's precisely what you do see. > > > Also interactions with GPFS (which uses it's own set of reservations) > > will become very tricky. > > > > So I sincerely doubt we'll ever see SVC-originated persistent > > reservations errors. > > > > And as a side-note, this particular patch is included in SLES since > > 2011. With no noticeable side-effect. > > OK, so can you actually say that someone has tested this scenario? If > not, do you have the capacity to test it? I've elected to just take this change for 4.9. Please see: https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.9&id=8ff232c1a819c2e98d85974a3bff0b7b8e2970ed -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 29, 2016 at 11:01:33AM -0400, Mike Snitzer wrote: > I've elected to just take this change for 4.9. Please see: > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.9&id=8ff232c1a819c2e98d85974a3bff0b7b8e2970ed Thanks Mike. If any problems show up I will send you an incremental patch that limits this behavior to devices where we created reservations using the kernel pr_ops API. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-09-29 at 11:01 -0400, Mike Snitzer wrote: > On Tue, Sep 27 2016 at 2:50pm -0400, > James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > > On Tue, 2016-09-27 at 08:34 +0200, Hannes Reinecke wrote: > > > On 09/26/2016 09:06 PM, James Bottomley wrote: > > > > On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote: > > > > > Getting back to this after Hannes recovered from his vacation > > > > > and I had a chat with him.. > > > > > > > > > > On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote: > > > > > > Seems we still need a more sophisticated approach. But I'm > > > > > > left wondering: if we didn't do it would anything notice? > > > > > > Sadly, the same big question from the original thread from > > > > > > a > > > > > > year ago: > > > > > > > > > > Yes. I have a customer looking to push the pNFS SCSI layout > > > > > into > > > > > a product, and the major show stopper right now is that we > > > > > can > > > > > trivially get into failver loops without this (or and > > > > > equivalent) > > > > > fix. > > > > > > > > > > A year ago SCSI layout was still work in progress in the > > > > > IETF, > > > > > people use the similar block layout instead that doesn't use > > > > > PRs and we also didn't have the in-kernel PR API, so you > > > > > effectively couldn't use PRs with multipathing. > > > > > > > > > > > https://patchwork.kernel.org/patch/6797111/ > > > > > > > > > > > > > So this is throw-away for now (and I'll get Hannes' patch > > > > > > > applied for 4.8-rc3, with the tweak of returning -EBADE > > > > > > > immediately): > > > > > > > > > > > > Unfortunately, I'm _not_ staging Hannes' patch until I have > > > > > > James Bottomley's Ack (given his original issues with the > > > > > > patch > > > > > > haven't been explained away AFAICT). > > > > > > > > > > I've added James to the Cc. His argument was that the old > > > > > behavior could be implemented to use some non-standard use of > > > > > reservations without a specific example. I don't really > > > > > think > > > > > his example even is practical - once we use dm-mpath it > > > > > exclusively claims the underling block devices, so any sort > > > > > of > > > > > selective reservations would have had to happen before even > > > > > starting dm-multipath. > > > > > > > > Well, now that you've made me reread the thread from 14 months > > > > ago > > > > that wasn't quite my objection. The objection hinged on the > > > > fact > > > > that anything that uses path specific reservations would now > > > > fail > > > > instead of retrying on a different path. I thought the IBM SVC > > > > did > > > > this and Hannes implied he'd be able to check this ... did > > > > anyone > > > > check? If we've checked and there's no issue with the SVC, > > > > then I > > > > don't have any other objections. > > > > > > > > > So a dynamic SAN controller would have to tear down and > > > > > rebuild > > > > > the dm-multipath setup at all the time. > > > > > > > > That was the job of the SVC: it sat in the middle of the SAN > > > > and > > > > controlled which node saw what storage. > > > > > > > > https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.stor > > > > age. > > > > svc.console.720.doc/svc_svcovr_1bcfiq.html > > > > > > > > The SVC can issue its own reservations in those circumstances. > > > > What I'm not at all clear on is whether they'll interact badly > > > > with the dm-mp reservations. > > > > > > > In the end SVC is (for us) just another storage array. > > > If and what SVC does in the background is of no interest to us. > > > > How can that be true? It sits *on* the san and manages devices, it > > doesn't sit between the initators and the devices. It applies > > reservations to devices under management, but every node usually > > sees > > everything else, so devices under SVC management are visible to all > > initators unless you zone them off. > > > > The last SVC manual I saw included a procedure for manually > > releasing > > stuck SVC reservations from an initator, which illustrates the > > expectation. > > > > > OTOH I'd be very surprised if the SVC would be allowing us to see > > > remnants of its internal working (like persistent reservation > > > errors); in doing so third-party applications would be able to > > > see > > > and possibly modify these persistent reservations and the SVC > > > would > > > find itself in a very fragile operating scenario. > > > > Because unless you zone the fibre, that's precisely what you do > > see. > > > > > Also interactions with GPFS (which uses it's own set of > > > reservations) > > > will become very tricky. > > > > > > So I sincerely doubt we'll ever see SVC-originated persistent > > > reservations errors. > > > > > > And as a side-note, this particular patch is included in SLES > > > since > > > 2011. With no noticeable side-effect. > > > > OK, so can you actually say that someone has tested this scenario? > > If not, do you have the capacity to test it? > > I've elected to just take this change for 4.9. Please see: > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.g > it/commit/?h=dm-4.9&id=8ff232c1a819c2e98d85974a3bff0b7b8e2970ed That's fine. I think the answer is that SVC technology is not around much so it can't be tested, so I was going to dump it on you to make the call anyway ... James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index ac734e5..c3d92db 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -41,6 +41,7 @@ struct pgpath { struct delayed_work activate_path; bool is_active:1; /* Path status */ + bool conflict_seen:1; }; #define path_to_pgpath(__pgp) container_of((__pgp), struct pgpath, path) @@ -1569,6 +1570,33 @@ static int do_end_io(struct multipath *m, struct request *clone, if (noretry_error(error)) return error; + if (error == -EBADE) { + /* + * EBADE signals a reservation conflict. + * We shouldn't fail the path here as we can communicate with + * the target. We should failover to the next path, but in + * doing so we might be causing a ping-pong between paths. + * Avoid this by only returning the reservation conflict error + * if a conflict has been seen on all paths. + */ + if (!mpio->pgpath || mpio->pgpath->conflict_seen) { + struct priority_group *pg; + struct pgpath *p; + + /* clear 'conflict_seen' for all pgpaths */ + list_for_each_entry(pg, &m->priority_groups, list) { + list_for_each_entry(p, &pg->pgpaths, list) { + p->conflict_seen = false; + } + } + return error; + } + else if (mpio->pgpath) { + mpio->pgpath->conflict_seen = true; + return r; + } + } + if (mpio->pgpath) fail_path(mpio->pgpath); @@ -1576,9 +1604,6 @@ static int do_end_io(struct multipath *m, struct request *clone, if (!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) { if (!must_push_back_rq(m)) r = -EIO; - } else { - if (error == -EBADE) - r = error; } }