[0/3] libfc state machine fixes

Message ID 20181007083537.89131-1-hare@suse.de (mailing list archive)

Message

Hannes Reinecke Oct. 7, 2018, 8:35 a.m. UTC
Hi all,

here are some patches for PRLI issues in libfc we've come
across recently.
The libfc ones are pretty straightforward, but the scsi state
machine one probably warrants some discussion.
What happened was that in some fabrics the RSCN might get lost
or be incompletely received. This then causes SCSI EH to be
triggered for the lost rports, setting the devices to offline.
Later on we do get an RSCN, which would reinstate the rports,
but unfortunately the devices remain in OFFLINE as we
cannot transition back to running.
The solution I've come up with is to allow transitions from
OFFLINE to BLOCKED: during RSCN processing the devices
are set to blocked, so I found it only reasonable
to allow this transition.
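
To illustrate the flow described above, here is a minimal userspace
sketch. It is not the actual patch and not the kernel's state machine:
the enum, struct and function names are hypothetical, and it assumes
that unblocking only acts on a device that is currently blocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified device states (not the kernel's enum). */
enum dev_state { DEV_RUNNING, DEV_BLOCKED, DEV_OFFLINE };

struct dev {
	enum dev_state state;
};

/*
 * Try to block a device. 'allow_offline_to_blocked' models the
 * proposed change: without it, an offline device cannot be blocked.
 */
bool dev_block(struct dev *d, bool allow_offline_to_blocked)
{
	assert(d);
	if (d->state == DEV_RUNNING ||
	    (allow_offline_to_blocked && d->state == DEV_OFFLINE)) {
		d->state = DEV_BLOCKED;
		return true;
	}
	return false;
}

/* Assumption: unblocking only moves a blocked device back to running. */
void dev_unblock(struct dev *d)
{
	assert(d);
	if (d->state == DEV_BLOCKED)
		d->state = DEV_RUNNING;
}

/*
 * Model of a late RSCN reinstating the rport: the device is blocked
 * during processing and unblocked afterwards. Returns the final state.
 */
enum dev_state rport_reinstated(enum dev_state start,
				bool allow_offline_to_blocked)
{
	struct dev d = { start };

	dev_block(&d, allow_offline_to_blocked);
	dev_unblock(&d);
	return d.state;
}
```

In this model a device that starts in DEV_OFFLINE stays there unless
the OFFLINE to BLOCKED transition is allowed, in which case it ends up
in DEV_RUNNING again.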

But as usual, comments and reviews are welcome.

Hannes Reinecke (2):
  scsi: Allow state transitions from OFFLINE to BLOCKED
  libfc: retry PRLI if we cannot analyse the payload

Thomas Abraham (1):
  libfc: check fc_frame_payload_get() return value for null

 drivers/scsi/libfc/fc_rport.c | 22 ++++++++++++++++------
 drivers/scsi/scsi_lib.c       |  1 +
 2 files changed, 17 insertions(+), 6 deletions(-)

Comments

Martin K. Petersen Oct. 16, 2018, 3:58 a.m. UTC | #1
Hannes,

> here are some patches for PRLI issues in libfc we've come across
> recently.  The libfc ones are pretty straightforward, but the scsi
> state machine one probably warrants some discussion.  What happened
> was that in some fabrics the RSCN might get lost or incompletely
> received. This will then cause SCSI EH to be triggered for the lost
> rports, setting the devices to offline.  But later on we do get an
> RSCN, which would reinstate the rports, but unfortunately the devices
> will remain in OFFLINE as we cannot transition back to running.  The
> solution I've come up with was to allow transitions from OFFLINE to
> BLOCKED, as during RSCN processing the devices will be set to blocked,
> and so I found it only reasonable to allow this transition.

I queued this up for now since I think it is the lesser of two evils.

However, I do think we'll have to have the ability to distinguish
between offlined-by-user-action, offlined-by-device-error, and
offlined-by-transport-event as Ewan pointed out.
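
One possible shape for such a distinction, purely as a sketch: none of
these names exist in the kernel, and the policy shown (only a
transport-event offline is eligible for automatic recovery) is an
assumption made for illustration, not something proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of why a device went offline. */
enum offline_reason {
	OFFLINE_NONE,
	OFFLINE_BY_USER_ACTION,
	OFFLINE_BY_DEVICE_ERROR,
	OFFLINE_BY_TRANSPORT_EVENT,
};

/*
 * Assumed policy: only a device offlined by a transport event may be
 * brought back automatically when the transport recovers.
 */
bool may_recover_on_transport_event(enum offline_reason why)
{
	assert(why >= OFFLINE_NONE && why <= OFFLINE_BY_TRANSPORT_EVENT);
	return why == OFFLINE_BY_TRANSPORT_EVENT;
}
```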