Message ID | 51483B19.1070201@profitbricks.com
---|---
State | Not Applicable, archived
On 19/03/2013 12:16, Sebastian Riemer wrote:
> Hi Bart,
>
> now I've got my priority on SRP again.

Hi Sebastian,

Are these patches targeted at upstream, or are they backports to some
OS/kernel? If the former, can you please send them inline so we can have a
proper review?

Or.

> I've also noticed that your ib_srp-backport doesn't fail the IO fast
> enough. The fast_io_fail_tmo only comes into play after the QP is
> already in timeout, and the "terminate_rport_io" function is missing.
>
> My idea is to use the QP retry count directly for fast IO failing. It is
> at 7 by default and the QP timeout is at approx. 2 s. The overall QP
> timeout is already at approx. 35 s (1 + 7 tries * 2 s * 2, I guess). Using
> only 3 retries I'm at approx. 18 s.
>
> My patches introduce that parameter as a module parameter, as it is quite
> difficult to set the QP from RTS back to RTR; only there can the QP
> timeout parameters be set.
>
> My patch series isn't complete yet, as paths aren't reconnected - they
> are only failed fast, bound to the overall QP timeout. But it should give
> you an idea of what I'm trying to do here.
>
> What are your thoughts regarding this?
>
> Attached patches:
>   ib_srp: register srp_fail_rport_io as terminate_rport_io
>   ib_srp: be quiet when failing SCSI commands
>   scsi_transport_srp: disable the fast_io_fail_tmo parameter
>   ib_srp: show the QP timeout and retry count in srp_host sysfs files
>   ib_srp: introduce qp_retry_cnt module parameter
>
> Cheers,
> Sebastian
>
> Btw.: Before this, I hacked MD RAID-1 for high-performance replication, as
> DRBD is crap for our purposes. But that's worthless without a reliably
> working transport.
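The "approx. 35 s" and "approx. 18 s" figures above can be reproduced from the
rough estimate given in the mail, i.e. worst case ~ (1 + retries) * QP timeout
* 2. A minimal sketch of that arithmetic, assuming a 2 s local ACK timeout and
the factor of 2 from that estimate (neither value is taken from the driver
itself):

/* Hedged sketch: reproduce the failover estimates quoted above.
 * The "(1 + retries) * timeout * 2" formula and the 2 s timeout come from
 * the estimate in the mail, not from ib_srp. */
#include <stdio.h>

int main(void)
{
	const double qp_timeout_s = 2.0;	/* approx. local ACK timeout */
	const int retry_counts[] = { 7, 3 };	/* default vs. proposed qp_retry_cnt */
	unsigned int i;

	for (i = 0; i < sizeof(retry_counts) / sizeof(retry_counts[0]); i++)
		printf("qp_retry_cnt=%d -> approx. %.0f s until IO is failed\n",
		       retry_counts[i],
		       (1 + retry_counts[i]) * qp_timeout_s * 2);

	return 0;
}

This prints roughly 32 s for 7 retries and 16 s for 3 retries, which is in the
same ballpark as the 35 s and 18 s measured above.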
On 19.03.2013 12:22, Or Gerlitz wrote:
> On 19/03/2013 12:16, Sebastian Riemer wrote:
>> Hi Bart,
>>
>> now I've got my priority on SRP again.
>
> Hi Sebastian,
>
> Are these patches targeted at upstream, or are they backports to some
> OS/kernel? If the former, can you please send them inline so we can have
> a proper review?
>
> Or.

Hi Or,

the patches are targeted at the work Bart is doing on GitHub:

https://github.com/bvanassche/ib_srp-backport

If I've seen that right, fast IO failing hasn't been accepted into mainline
yet, so I didn't want to spam you all with multiple mails of patches that
don't apply to upstream. I want to introduce the idea first. The patches are
not a final solution to the problem; they should only show what I'm trying to
do here.

Cheers,
Sebastian
On 03/19/13 11:16, Sebastian Riemer wrote:
> Hi Bart,
>
> now I've got my priority on SRP again.
>
> I've also noticed that your ib_srp-backport doesn't fail the IO fast
> enough. The fast_io_fail_tmo only comes into play after the QP is
> already in timeout, and the "terminate_rport_io" function is missing.
>
> My idea is to use the QP retry count directly for fast IO failing. It is
> at 7 by default and the QP timeout is at approx. 2 s. The overall QP
> timeout is already at approx. 35 s (1 + 7 tries * 2 s * 2, I guess). Using
> only 3 retries I'm at approx. 18 s.
>
> My patches introduce that parameter as a module parameter, as it is quite
> difficult to set the QP from RTS back to RTR; only there can the QP
> timeout parameters be set.
>
> My patch series isn't complete yet, as paths aren't reconnected - they
> are only failed fast, bound to the overall QP timeout. But it should give
> you an idea of what I'm trying to do here.
>
> What are your thoughts regarding this?
>
> Attached patches:
>   ib_srp: register srp_fail_rport_io as terminate_rport_io
>   ib_srp: be quiet when failing SCSI commands
>   scsi_transport_srp: disable the fast_io_fail_tmo parameter
>   ib_srp: show the QP timeout and retry count in srp_host sysfs files
>   ib_srp: introduce qp_retry_cnt module parameter

Hello Sebastian,

Patches 1 and 2 make sense to me. Patch 3 makes it impossible to disable
fast_io_fail_tmo and also disables the fast_io_fail_tmo timer - was that
intended?

Regarding patches 4 and 5: I'm not sure whether reducing the QP retry count
will work well in large fabrics. The iSCSI initiator follows another approach
to realize quick failover, namely by periodically checking the transport
layer and by triggering the fast_io_fail timer if that check fails.
Unfortunately the SRP spec does not define an operation suited as a transport
layer test, but maybe a zero-length RDMA write can be used to verify the
transport layer? I think the IB specification allows such operations. A quote
from page 439:

  C9-88: For an HCA responder using Reliable Connection service, for
  each zero-length RDMA READ or WRITE request, the R_Key shall not be
  validated, even if the request includes Immediate data.

Note: I'm still working on transforming the patches in the ib_srp-backport
repository so that they become acceptable for upstream inclusion.

Bart.
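A minimal sketch of what such a zero-length RDMA WRITE "ping" could look like
with the kernel verbs API of that era, intended to live in ib_srp.c; the
function name, the SRP_PING_WR_ID tag and the completion handling are
assumptions for illustration, not code from ib_srp or from the attached
patches:

#include <rdma/ib_verbs.h>

/* Hypothetical helper: post a zero-length RDMA WRITE on the target's RC QP
 * as a transport-layer check.  Per IB spec C9-88 (quoted above) the R_Key of
 * a zero-length RDMA WRITE is not validated, so remote_addr/rkey can stay 0.
 * SRP_PING_WR_ID and the use of target->qp are assumptions. */
#define SRP_PING_WR_ID	0xfffffffffffffffeULL	/* assumed tag for the CQ handler */

static int srp_post_zero_length_write(struct srp_target_port *target)
{
	struct ib_send_wr wr, *bad_wr;

	memset(&wr, 0, sizeof(wr));
	wr.wr_id	= SRP_PING_WR_ID;
	wr.opcode	= IB_WR_RDMA_WRITE;
	wr.send_flags	= IB_SEND_SIGNALED;	/* request a completion (or an error) */
	wr.sg_list	= NULL;			/* zero length: no scatter/gather list */
	wr.num_sge	= 0;
	wr.wr.rdma.remote_addr	= 0;		/* ignored for a zero-length transfer */
	wr.wr.rdma.rkey		= 0;		/* not validated, per C9-88 */

	return ib_post_send(target->qp, &wr, &bad_wr);
}

A successful completion for this work request would indicate the path is
alive; an error completion would only arrive after the QP's retry/timeout
budget is exhausted, which is exactly the concern raised in the reply below.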
On 19.03.2013 12:45, Bart Van Assche wrote:
> On 03/19/13 11:16, Sebastian Riemer wrote:
>>
>> What are your thoughts regarding this?
>>
>> Attached patches:
>>   ib_srp: register srp_fail_rport_io as terminate_rport_io
>>   ib_srp: be quiet when failing SCSI commands
>>   scsi_transport_srp: disable the fast_io_fail_tmo parameter
>>   ib_srp: show the QP timeout and retry count in srp_host sysfs files
>>   ib_srp: introduce qp_retry_cnt module parameter
>
> Hello Sebastian,
>
> Patches 1 and 2 make sense to me. Patch 3 makes it impossible to disable
> fast_io_fail_tmo and also disables the fast_io_fail_tmo timer - was that
> intended?

I had a patch that completely removed the fast_io_fail_tmo parameter for
ib_srp v1.2, because in my tests with dm-multipath it didn't help - it only
made the wait until IO could be failed even longer. If there is a connection
issue, then all SCSI disks from that target are affected, not only a single
SCSI device. Today I saw that you are already at v1.3 and that the patch no
longer applied, so I thought disabling only the functionality would show what
I'm trying to do here.

Can you please explain to me what your intention was with fast_io_fail_tmo?
What I want is a calculable timeout for failing IO. If the QP retries are at
7, I can't get any lower than 35 seconds.

> Regarding patches 4 and 5: I'm not sure whether reducing the
> QP retry count will work well in large fabrics.

For me it is already a mystery why I measure 35 seconds with a 2 s QP timeout
and 7 retries. If the maximum is 2 s * 7 retries * 4, then I'm at roughly 60
seconds. That's plainly too long, and the fast_io_fail_tmo comes on top of
that. How else should I reduce the overall timeout until I see in iostat that
the other path is taken?

> The iSCSI initiator follows another approach to realize quick failover,
> namely by periodically checking the transport layer and by triggering the
> fast_io_fail timer if that check fails. Unfortunately the SRP spec does
> not define an operation suited as a transport layer test, but maybe a
> zero-length RDMA write can be used to verify the transport layer?

Hmmm, how do you want to implement that? This write would run into the
(overall) QP timeout as well, I guess. dm-multipath checks paths with
directio reads, polling every 5 seconds by default. IMHO that does exactly
this already.

> I think the IB specification allows such operations. A quote from page 439:
>
>   C9-88: For an HCA responder using Reliable Connection service, for
>   each zero-length RDMA READ or WRITE request, the R_Key shall not be
>   validated, even if the request includes Immediate data.

And this isn't bound to the (overall) QP timeout? Can you send me a proof of
concept for this?

> Note: I'm still working on transforming the patches in the ib_srp-backport
> repository so that they become acceptable for upstream inclusion.

I know that, and I appreciate it. But I'm running out of time. Perhaps we can
combine efforts to implement something working first - it doesn't have to be
clean and shiny; for me, hacky is also okay as long as it works in the data
center. Yes, I have to admit that patches 4 and 5 are hacky. Perhaps I can
soon report how reducing the retry count behaves in a large setup. ;-)

Cheers,
Sebastian
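One hypothetical way to wire such a periodic check into the driver, mirroring
multipathd's 5-second polling interval mentioned above; srp_post_zero_length_write()
is the sketch from earlier in the thread, and the transport_check member,
SRP_TRANSPORT_CHECK_INTERVAL and the error handling are assumptions, not part
of ib_srp:

/* Hypothetical periodic transport check.  Assumes a
 * "struct delayed_work transport_check;" member added to struct
 * srp_target_port and initialized with INIT_DELAYED_WORK() during target
 * setup; every identifier not present in upstream ib_srp is an assumption. */
#define SRP_TRANSPORT_CHECK_INTERVAL	(5 * HZ)	/* mirrors multipathd's 5 s default */

static void srp_transport_check_work(struct work_struct *work)
{
	struct srp_target_port *target =
		container_of(work, struct srp_target_port, transport_check.work);

	if (srp_post_zero_length_write(target)) {
		/* Posting failed outright: treat the remote port as unreachable. */
		srp_fail_rport_io(target->rport);
		return;
	}

	/* Whether the posted write succeeded is decided in the completion
	 * handler; reschedule the next check here either way. */
	queue_delayed_work(system_long_wq, &target->transport_check,
			   SRP_TRANSPORT_CHECK_INTERVAL);
}

Whether this fails over faster than the QP retry budget depends on how quickly
the HCA reports an error completion for the ping, which is the open question
raised above.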
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index dc49dc8..64644c5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -756,6 +756,29 @@ static void srp_reset_req(struct srp_target_port *target, struct srp_request *re
 	}
 }
 
+static void srp_fail_req(struct srp_target_port *target, struct srp_request *req)
+{
+	struct scsi_cmnd *scmnd = srp_claim_req(target, req, NULL);
+
+	if (scmnd) {
+		srp_free_req(target, req, scmnd, 0);
+		scmnd->result = DID_TRANSPORT_FAILFAST << 16;
+		scmnd->scsi_done(scmnd);
+	}
+}
+
+static void srp_fail_rport_io(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+	int i;
+
+	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+		struct srp_request *req = &target->req_ring[i];
+		if (req->scmnd)
+			srp_fail_req(target, req);
+	}
+}
+
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct Scsi_Host *shost = target->scsi_host;
@@ -2700,6 +2723,7 @@ static void srp_remove_one(struct ib_device *device)
 
 static struct srp_function_template ib_srp_transport_functions = {
 	.rport_delete		 = srp_rport_delete,
+	.terminate_rport_io	 = srp_fail_rport_io,
 };
 
 static int __init srp_init_module(void)