diff mbox

[09/47] zfcp: open-code fc_block_scsi_eh() for host reset

Message ID 1498638793-44672-10-git-send-email-hare@suse.de (mailing list archive)
State Changes Requested, archived
Headers show

Commit Message

Hannes Reinecke June 28, 2017, 8:32 a.m. UTC
When issuing a host reset we should be waiting for all
ports to become unblocked; just waiting for one might
be resulting in host reset to return too early.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/s390/scsi/zfcp_scsi.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

Comments

Steffen Maier July 24, 2017, 4:18 p.m. UTC | #1
Hi Hannes,

unfortunately I only realized now by accident that there's stuff to 
review. Would be nice to send it also explicitly to driver maintainers 
in addition to the list.

Since you've asked for this multiple times, I happened to just now code 
a patch series of 6 patches in order to decouple zfcp from scsi_cmnd for 
device, target, and host reset.

While I provide some review comments below, I think it might be clearer 
and easier to review if you would rebase your series on top of my 
decoupling.

Let me know how urgent you'd like to see my code. I planned to send it 
as RFC soon anyway. However, it hasn't seen any function testing yet. If 
you don't care, let me know and I can send it.

On 06/28/2017 10:32 AM, Hannes Reinecke wrote:
> When issuing a host reset we should be waiting for all
> ports to become unblocked; just waiting for one might
> be resulting in host reset to return too early.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>   drivers/s390/scsi/zfcp_scsi.c | 27 +++++++++++++++++++++++----
>   1 file changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
> index 0678cf7..3d18659 100644
> --- a/drivers/s390/scsi/zfcp_scsi.c
> +++ b/drivers/s390/scsi/zfcp_scsi.c
> @@ -311,13 +311,32 @@ static int zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt)
> 
>   static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>   {
> -	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
> -	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
> -	int ret;
> +	struct Scsi_Host *host = scpnt->device->host;
> +	struct zfcp_adapter *adapter = (struct zfcp_adapter *)host->hostdata[0];
> +	int ret = 0;
> +	unsigned long flags;
> +	struct zfcp_port *port;
> 
>   	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
>   	zfcp_erp_wait(adapter);
> -	ret = fc_block_scsi_eh(scpnt);
> +retry_rport_blocked:
> +	spin_lock_irqsave(host->host_lock, flags);

missing read_lock(&adapter->port_list_lock);

Hm, well, I have to think about lock ordering, because my patch has the 
port_list as outer loop and inside it calls fc_block_scsi_eh (also 
modified with fc_rport as argument).
If there's any other location taking both locks we better get them in 
the same order.

> +	list_for_each_entry(port, &adapter->port_list, list) {
> +		struct fc_rport *rport = port->rport;

port->rport can be NULL, so need to check

> +
> +		if (rport->port_state == FC_PORTSTATE_BLOCKED) {
> +			if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
> +				ret = FAST_IO_FAIL;

Hm, doesn't this get lost if a next iteration hits ret=NEEDS_RETRY?

I was pondering in my own patch version what to return of just a subset 
of ports ran into fast_io_fail. And for now I thought just fast_io_fail 
would be good to let I/O bubble up for path failover, even if this would 
include of rport which meanwhile unblocked properly and would not need 
bubbling up pending requests because they could service them with a 
simple retry.

> +			else
> +				ret = NEEDS_RETRY;
> +			break;
> +		}

Why do you open code fc_block_scsi_eh() instead of calling it with 
port->rport (if it's !=NULL)?

Typically all rports would be blocked after adapter recovery, until they 
become unblocked (via zfcp's async rport_work). So we can wait for each 
in turn which should still only wait in summary for the last one to 
unblock. E.g. if the first rport takes longest we wait for it, and the 
rest of the loop will be fast since the others happen to be unblocked 
(or fast_io_fail) already?

> +	}

missing read_unlock(&adapter->port_list_lock);

> +	spin_unlock_irqrestore(host->host_lock, flags);
> +	if (ret == NEEDS_RETRY) {
> +		msleep(1000);
> +		goto retry_rport_blocked;
> +	}
>   	if (ret)
>   		return ret;
>
Steffen Maier July 24, 2017, 4:24 p.m. UTC | #2
On 06/28/2017 10:32 AM, Hannes Reinecke wrote:
> When issuing a host reset we should be waiting for all
> ports to become unblocked; just waiting for one might
> be resulting in host reset to return too early.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>   drivers/s390/scsi/zfcp_scsi.c | 27 +++++++++++++++++++++++----
>   1 file changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
> index 0678cf7..3d18659 100644
> --- a/drivers/s390/scsi/zfcp_scsi.c
> +++ b/drivers/s390/scsi/zfcp_scsi.c
> @@ -311,13 +311,32 @@ static int zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt)
> 
>   static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>   {
> -	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
> -	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
> -	int ret;
> +	struct Scsi_Host *host = scpnt->device->host;
> +	struct zfcp_adapter *adapter = (struct zfcp_adapter *)host->hostdata[0];

Oh, only realized when reading the later "[PATCH 14/47] scsi: Use 
Scsi_Host as argument for eh_host_reset_handler" that this part already 
anticipates parts of that later patch. Seems not fully topically 
separated patches to me, if this one is only about the fc_block_scsi_eh 
aspect in zfcp_scsi_eh_host_reset_handler().
Hannes Reinecke July 24, 2017, 6:01 p.m. UTC | #3
On 07/24/2017 06:18 PM, Steffen Maier wrote:
> Hi Hannes,
> 
> unfortunately I only realized now by accident that there's stuff to
> review. Would be nice to send it also explicitly to driver maintainers
> in addition to the list.
> 
Well, the entire patchset got send as an RFC anyway, and as such I
didn't include all the individual driver maintainers.
But will be doing so for the actual patchset.

> Since you've asked for this multiple times, I happened to just now code
> a patch series of 6 patches in order to decouple zfcp from scsi_cmnd for
> device, target, and host reset.
> 
Oh, cool. That's precisely what I need.

> While I provide some review comments below, I think it might be clearer
> and easier to review if you would rebase your series on top of my
> decoupling.
> 
Sure. Get me the patches and I'll be doing it :-)

> Let me know how urgent you'd like to see my code. I planned to send it
> as RFC soon anyway. However, it hasn't seen any function testing yet. If
> you don't care, let me know and I can send it.
> 
At this stage I don't really care; the idea is to get the preliminary
patches in (preferably before the next merge window), so that the actual
patch to modify SCSI EH syntax doesn't have to modify the drivers
themselves, only the calling sequence.

> On 06/28/2017 10:32 AM, Hannes Reinecke wrote:
>> When issuing a host reset we should be waiting for all
>> ports to become unblocked; just waiting for one might
>> be resulting in host reset to return too early.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>> ---
>>   drivers/s390/scsi/zfcp_scsi.c | 27 +++++++++++++++++++++++----
>>   1 file changed, 23 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/s390/scsi/zfcp_scsi.c
>> b/drivers/s390/scsi/zfcp_scsi.c
>> index 0678cf7..3d18659 100644
>> --- a/drivers/s390/scsi/zfcp_scsi.c
>> +++ b/drivers/s390/scsi/zfcp_scsi.c
>> @@ -311,13 +311,32 @@ static int
>> zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt)
>>
>>   static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>>   {
>> -    struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
>> -    struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
>> -    int ret;
>> +    struct Scsi_Host *host = scpnt->device->host;
>> +    struct zfcp_adapter *adapter = (struct zfcp_adapter
>> *)host->hostdata[0];
>> +    int ret = 0;
>> +    unsigned long flags;
>> +    struct zfcp_port *port;
>>
>>       zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
>>       zfcp_erp_wait(adapter);
>> -    ret = fc_block_scsi_eh(scpnt);
>> +retry_rport_blocked:
>> +    spin_lock_irqsave(host->host_lock, flags);
> 
> missing read_lock(&adapter->port_list_lock);
> 
> Hm, well, I have to think about lock ordering, because my patch has the
> port_list as outer loop and inside it calls fc_block_scsi_eh (also
> modified with fc_rport as argument).
> If there's any other location taking both locks we better get them in
> the same order.
> 
In general I'm not terribly happy with this; after all, there is no
guarantee that the rport list after host reset is identical to the list
prior to it; in extremis we could even end up with no rports whatsoever.
In which case we wouldn't have anything to synchronize upon, leaving
host reset in a somewhat dubious state.
I'd be far happier if we could have a synchronisation point independent
on the rport states; then this problem wouldn't occur.
(And rports would be handled separately via the dev_loss_tmo mechanism
anyway, so it wouldn't matter if the rports are still recovering after
host reset returned.)

Cheers,

Hannes
diff mbox

Patch

diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index 0678cf7..3d18659 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -311,13 +311,32 @@  static int zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt)
 
 static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
 {
-	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
-	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
-	int ret;
+	struct Scsi_Host *host = scpnt->device->host;
+	struct zfcp_adapter *adapter = (struct zfcp_adapter *)host->hostdata[0];
+	int ret = 0;
+	unsigned long flags;
+	struct zfcp_port *port;
 
 	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
 	zfcp_erp_wait(adapter);
-	ret = fc_block_scsi_eh(scpnt);
+retry_rport_blocked:
+	spin_lock_irqsave(host->host_lock, flags);
+	list_for_each_entry(port, &adapter->port_list, list) {
+		struct fc_rport *rport = port->rport;
+
+		if (rport->port_state == FC_PORTSTATE_BLOCKED) {
+			if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
+				ret = FAST_IO_FAIL;
+			else
+				ret = NEEDS_RETRY;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(host->host_lock, flags);
+	if (ret == NEEDS_RETRY) {
+		msleep(1000);
+		goto retry_rport_blocked;
+	}
 	if (ret)
 		return ret;