| Message ID | 1393252218-30638-4-git-send-email-sagig@mellanox.com (mailing list archive) |
|---|---|
| State | Rejected |
On 02/24/14 15:30, Sagi Grimberg wrote:
> From: Vu Pham <vuhuong@mellanox.com>
>
> srp_reconnect_rport() serializes calls of srp_rport_reconnect()
> with srp_queuecommand(), srp_abort(), srp_reset_device() and
> srp_reset_host() via rport->mutex and also blocks srp_queuecommand();
> however, it cannot block SCSI error handler commands (STU, TUR).
> This may introduce corruption in the free_tx IU list and in the IUs
> themselves.
>
> Signed-off-by: Vu Pham <vuhuong@mellanox.com>
> Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
> ---
>  drivers/infiniband/ulp/srp/ib_srp.c | 3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index b615135..656602b 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -859,6 +859,7 @@ static int srp_rport_reconnect(struct srp_rport *rport)
>  {
>  	struct srp_target_port *target = rport->lld_data;
>  	int i, ret;
> +	unsigned long flags;
>
>  	srp_disconnect_target(target);
>  	/*
> @@ -882,9 +883,11 @@ static int srp_rport_reconnect(struct srp_rport *rport)
>  		srp_finish_req(target, req, DID_RESET << 16);
>  	}
>
> +	spin_lock_irqsave(&target->lock, flags);
>  	INIT_LIST_HEAD(&target->free_tx);
>  	for (i = 0; i < target->queue_size; ++i)
>  		list_add(&target->tx_ring[i]->list, &target->free_tx);
> +	spin_unlock_irqrestore(&target->lock, flags);
>
>  	if (ret == 0)
>  		ret = srp_connect_target(target);

Hello Sagi and Vu,

srp_rport_reconnect() should never get invoked concurrently with
srp_queuecommand() - see e.g. the "in_scsi_eh" variable in
srp_queuecommand(). Is the list corruption reproducible with the patch
mentioned in my reply to patch 1/3?

Thanks,

Bart.
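For context on why the lock matters here: the other paths that touch target->free_tx, such as the TX-IU allocation called from srp_queuecommand(), already run under target->lock. Below is a simplified sketch of that consumer side, reconstructed from the upstream driver of that era; names and details may differ slightly from the actual ib_srp.c.

```c
/* Simplified sketch of the free_tx consumer path in ib_srp.c; the real
 * function is documented as requiring target->lock to be held by the
 * caller. Reconstructed for illustration, not copied verbatim.
 */
static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target)
{
	struct srp_iu *iu;

	if (list_empty(&target->free_tx))
		return NULL;	/* no TX IU available right now */

	iu = list_first_entry(&target->free_tx, struct srp_iu, list);
	list_del(&iu->list);	/* unsafe if another path reinitializes
				 * free_tx concurrently without the lock */
	return iu;
}
```

If srp_rport_reconnect() reinitializes free_tx without holding target->lock while such a pop is in flight, the two paths can unlink and relink the same list nodes concurrently, which matches the list corruption the patch description reports.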
On 2/24/2014 5:38 PM, Bart Van Assche wrote:
> On 02/24/14 15:30, Sagi Grimberg wrote:
>> From: Vu Pham <vuhuong@mellanox.com>
>>
>> srp_reconnect_rport() serializes calls of srp_rport_reconnect()
>> with srp_queuecommand(), srp_abort(), srp_reset_device() and
>> srp_reset_host() via rport->mutex and also blocks srp_queuecommand();
>> however, it cannot block SCSI error handler commands (STU, TUR).
>> This may introduce corruption in the free_tx IU list and in the IUs
>> themselves.
>>
>> Signed-off-by: Vu Pham <vuhuong@mellanox.com>
>> Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
>> ---
>>  drivers/infiniband/ulp/srp/ib_srp.c | 3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
>> index b615135..656602b 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -859,6 +859,7 @@ static int srp_rport_reconnect(struct srp_rport *rport)
>>  {
>>  	struct srp_target_port *target = rport->lld_data;
>>  	int i, ret;
>> +	unsigned long flags;
>>
>>  	srp_disconnect_target(target);
>>  	/*
>> @@ -882,9 +883,11 @@ static int srp_rport_reconnect(struct srp_rport *rport)
>>  		srp_finish_req(target, req, DID_RESET << 16);
>>  	}
>>
>> +	spin_lock_irqsave(&target->lock, flags);
>>  	INIT_LIST_HEAD(&target->free_tx);
>>  	for (i = 0; i < target->queue_size; ++i)
>>  		list_add(&target->tx_ring[i]->list, &target->free_tx);
>> +	spin_unlock_irqrestore(&target->lock, flags);
>>
>>  	if (ret == 0)
>>  		ret = srp_connect_target(target);
> Hello Sagi and Vu,
>
> srp_rport_reconnect() should never get invoked concurrently with
> srp_queuecommand() - see e.g. the "in_scsi_eh" variable in
> srp_queuecommand(). Is the list corruption reproducible with the patch
> mentioned in my reply to patch 1/3?
>
> Thanks,
>
> Bart.

I need to re-test this.

Regarding in_scsi_eh: can a send still end up being posted from
interrupt context? We are hitting a *very* rare case (not easy to
reproduce) on RHEL 6.5 where we end up posting on a just-destroyed QP
(a race right between destroy_qp and the assignment of the new QP in
srp_create_target_ib). We tested with the in_scsi_eh patch applied and
it still happened.

As I see it, the SRP problem arises during the distinct period in which
the rport is in the BLOCKED state: on one hand all request processing
is still allowed (commands are not failed), and on the other the
reconnect flow may be running concurrently. Would it be acceptable to
take rport->mutex in queuecommand while the rport is BLOCKED?

Thoughts?

Sagi.
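To make the proposal concrete, here is a rough illustrative sketch of what "take rport->mutex in queuecommand while BLOCKED" could look like. This is not code from the thread, and it carries an obvious caveat that Sagi's own question about interrupt context hints at: the SCSI request function may run in atomic context, where sleeping on a mutex is not allowed.

```c
/* Illustrative sketch only, not code from the thread: serialize
 * srp_queuecommand() against srp_rport_reconnect() only while the
 * rport is blocked. Caveat: queuecommand may be invoked from atomic
 * context, where mutex_lock() (which can sleep) would be invalid.
 */
static int srp_queuecommand_sketch(struct Scsi_Host *shost,
				   struct scsi_cmnd *scmnd)
{
	struct srp_target_port *target = host_to_target(shost);
	struct srp_rport *rport = target->rport;

	if (rport->state == SRP_RPORT_BLOCKED) {
		mutex_lock(&rport->mutex);  /* excludes the reconnect flow */
		/* ... issue or requeue the command while excluded ... */
		mutex_unlock(&rport->mutex);
		return 0;
	}

	/* ... normal fast path, no mutex taken ... */
	return 0;
}
```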
On 02/27/14 12:51, Sagi Grimberg wrote:
> Regarding in_scsi_eh: can a send still end up being posted from
> interrupt context? We are hitting a *very* rare case (not easy to
> reproduce) on RHEL 6.5 where we end up posting on a just-destroyed QP
> (a race right between destroy_qp and the assignment of the new QP in
> srp_create_target_ib). We tested with the in_scsi_eh patch applied
> and it still happened.
>
> As I see it, the SRP problem arises during the distinct period in
> which the rport is in the BLOCKED state: on one hand all request
> processing is still allowed (commands are not failed), and on the
> other the reconnect flow may be running concurrently. Would it be
> acceptable to take rport->mutex in queuecommand while the rport is
> BLOCKED?

Hello Sagi,

The issue you described is probably specific to the RHEL backport of
the SRP initiator and does not affect the upstream SRP initiator. The
function scsi_request_fn_active() in drivers/scsi/scsi_transport_srp.c
is used by srp_reconnect_rport() to wait until ongoing
srp_queuecommand() calls have finished after the SCSI host state has
been changed into BLOCKED. That function relies on a member variable of
struct request_queue that was introduced upstream in kernel 3.7.0. In
other words, that function needs attention when porting the SRP
initiator to RHEL 6. A small delay is probably sufficient to wait until
ongoing srp_queuecommand() calls have finished.

Bart.
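For reference, the upstream helper Bart refers to looks roughly as follows, reconstructed from drivers/scsi/scsi_transport_srp.c of that era (treat the details as approximate). It sums q->request_fn_active, the struct request_queue counter added in 3.7, over all SCSI devices of the host to detect in-flight request-function calls.

```c
/* Sketch of the upstream helper (drivers/scsi/scsi_transport_srp.c):
 * returns how many scsi_request_fn() invocations are in progress for
 * @shost. Depends on q->request_fn_active, which only exists in
 * kernels >= 3.7; context: <scsi/scsi_device.h>, <linux/blkdev.h>.
 */
static int scsi_request_fn_active(struct Scsi_Host *shost)
{
	struct scsi_device *sdev;
	struct request_queue *q;
	int request_fn_active = 0;

	shost_for_each_device(sdev, shost) {
		q = sdev->request_queue;

		spin_lock_irq(q->queue_lock);
		request_fn_active += q->request_fn_active;
		spin_unlock_irq(q->queue_lock);
	}

	return request_fn_active;
}
```

After blocking the SCSI host, srp_reconnect_rport() polls this count until it drops to zero before proceeding; that is exactly the guarantee a RHEL 6 backport loses without the counter, and what the small delay Bart mentions would stand in for.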
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index b615135..656602b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -859,6 +859,7 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 {
 	struct srp_target_port *target = rport->lld_data;
 	int i, ret;
+	unsigned long flags;
 
 	srp_disconnect_target(target);
 	/*
@@ -882,9 +883,11 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 		srp_finish_req(target, req, DID_RESET << 16);
 	}
 
+	spin_lock_irqsave(&target->lock, flags);
 	INIT_LIST_HEAD(&target->free_tx);
 	for (i = 0; i < target->queue_size; ++i)
 		list_add(&target->tx_ring[i]->list, &target->free_tx);
+	spin_unlock_irqrestore(&target->lock, flags);
 
 	if (ret == 0)
 		ret = srp_connect_target(target);