scsi: allow state transitions BLOCK -> BLOCK

Message ID 20200702142436.98336-1-hare@suse.de (mailing list archive)
State Changes Requested

Commit Message

Hannes Reinecke July 2, 2020, 2:24 p.m. UTC
scsi_transport_srp.c will call scsi_target_block() repeatedly
without calling scsi_target_unblock() first.
So allow the idempotent state transition BLOCK -> BLOCK to avoid
a warning here.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_lib.c | 1 +
 1 file changed, 1 insertion(+)
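
For context: the warning that the commit message refers to comes from the
device state machine in scsi_device_set_state(). Below is a minimal,
compilable model of the check that the one-liner relaxes; the enum subset
and the helper name are illustrative only, not the actual kernel code.

#include <errno.h>

/* Illustrative subset of the SCSI device states involved here. */
enum sdev_state { SDEV_RUNNING, SDEV_CREATED_BLOCK, SDEV_BLOCK,
		  SDEV_QUIESCE, SDEV_OFFLINE };

/* Models the SDEV_BLOCK arm of scsi_device_set_state(): which old
 * states are allowed to transition into SDEV_BLOCK. */
static int may_enter_block(enum sdev_state oldstate)
{
	switch (oldstate) {
	case SDEV_RUNNING:
	case SDEV_CREATED_BLOCK:
	case SDEV_BLOCK:	/* the case this patch adds: BLOCK -> BLOCK */
	case SDEV_QUIESCE:
	case SDEV_OFFLINE:
		return 0;	/* transition allowed */
	default:
		return -EINVAL;	/* rejected; the kernel logs an
				 * "Illegal state transition" warning */
	}
}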

Comments

James Bottomley July 2, 2020, 2:34 p.m. UTC | #1
On Thu, 2020-07-02 at 16:24 +0200, Hannes Reinecke wrote:
> scsi_transport_srp.c will call scsi_target_block() repeatedly
> without calling scsi_target_unblock() first.
> So allow the idempotent state transition BLOCK -> BLOCK to avoid
> a warning here.

That really doesn't sound like a good idea.  Block and unblock should
be matched pairs and since you don't have a running->running transition
allowed this implies that srp calls block many times but unblock only
once.  It really sounds like srp needs fixing.

James
Hannes Reinecke July 2, 2020, 3:14 p.m. UTC | #2
On 7/2/20 4:34 PM, James Bottomley wrote:
> On Thu, 2020-07-02 at 16:24 +0200, Hannes Reinecke wrote:
>> scsi_transport_srp.c will call scsi_target_block() repeatedly
>> without calling scsi_target_unblock() first.
>> So allow the idempotent state transition BLOCK -> BLOCK to avoid
>> a warning here.
> 
> That really doesn't sound like a good idea.  Block and unblock should
> be matched pairs and since you don't have a running->running transition
> allowed this implies that srp calls block many times but unblock only
> once.  It really sounds like srp needs fixing.
> 
That was what I was planning to do first, but SRP has a weird mix of 
states, calling scsi_target_block()/scsi_target_unblock() in all sorts 
of places.
Bart?

Cheers,

Hannes
Bart Van Assche July 2, 2020, 5:23 p.m. UTC | #3
On 2020-07-02 08:14, Hannes Reinecke wrote:
> On 7/2/20 4:34 PM, James Bottomley wrote:
>> On Thu, 2020-07-02 at 16:24 +0200, Hannes Reinecke wrote:
>>> scsi_transport_srp.c will call scsi_target_block() repeatedly
>>> without calling scsi_target_unblock() first.
>>> So allow the idempotent state transition BLOCK -> BLOCK to avoid
>>> a warning here.
>>
>> That really doesn't sound like a good idea.  Block and unblock should
>> be matched pairs and since you don't have a running->running transition
>> allowed this implies that srp calls block many times but unblock only
>> once.  It really sounds like srp needs fixing.
>>
> That was what I was planning to do first, but SRP has a weird mix of states, calling scsi_target_block()/scsi_target_unblock() in all sorts of places.

It is not clear to me how the SRP transport code could call
scsi_target_block() twice without calling scsi_target_unblock() in
between. All these calls are serialized by the rport mutex.
scsi_target_block() is called when the port state is changed to
SRP_RPORT_BLOCKED. scsi_target_unblock() is called when the port
state is changed into another state than SRP_RPORT_BLOCKED.
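
The invariant described above could be written as a single state setter;
the helper below is hypothetical (it does not exist in
scsi_transport_srp.c) and only illustrates the intended pairing:

/* Hypothetical helper: block exactly when entering SRP_RPORT_BLOCKED,
 * unblock exactly when leaving it. All callers hold rport->mutex,
 * which serializes the transitions. */
static void srp_rport_move_to(struct srp_rport *rport,
			      enum srp_rport_state new_state)
{
	lockdep_assert_held(&rport->mutex);

	if (new_state == SRP_RPORT_BLOCKED &&
	    rport->state != SRP_RPORT_BLOCKED)
		scsi_target_block(rport->dev.parent);
	else if (new_state != SRP_RPORT_BLOCKED &&
		 rport->state == SRP_RPORT_BLOCKED)
		scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);

	rport->state = new_state;
}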

Bart.
Hannes Reinecke July 3, 2020, 5:30 a.m. UTC | #4
On 7/2/20 7:23 PM, Bart Van Assche wrote:
> On 2020-07-02 08:14, Hannes Reinecke wrote:
>> On 7/2/20 4:34 PM, James Bottomley wrote:
>>> On Thu, 2020-07-02 at 16:24 +0200, Hannes Reinecke wrote:
>>>> scsi_transport_srp.c will call scsi_target_block() repeatedly
>>>> without calling scsi_target_unblock() first.
>>>> So allow the idempotent state transition BLOCK -> BLOCK to avoid
>>>> a warning here.
>>>
>>> That really doesn't sound like a good idea.  Block and unblock should
>>> be matched pairs and since you don't have a running->running transition
>>> allowed this implies that srp calls block many times but unblock only
>>> once.  It really sounds like srp needs fixing.
>>>
>> That was what I was planning to do first, but SRP has a weird mix of states, calling scsi_target_block()/scsi_target_unblock() in all sorts of places.
> 
> It is not clear to me how the SRP transport code could call
> scsi_target_block() twice without calling scsi_target_unblock() in
> between. All these calls are serialized by the rport mutex.
> scsi_target_block() is called when the port state is changed to
> SRP_RPORT_BLOCKED. scsi_target_unblock() is called when the port
> state is changed into another state than SRP_RPORT_BLOCKED.
> 

And it's called from srp_reconnect_rport() and __rport_fail_io_fast(), 
so we have this call chain:

srp_reconnect_rport()
   - scsi_target_block()
   -> __rport_fail_io_fast()
        - scsi_target_block()
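
To make the effect of that nesting concrete, here is a small userspace
model (simplified names, not the kernel code) of what the second, nested
block attempt does per device:

#include <stdio.h>

/* Simplified stand-ins for SDEV_RUNNING / SDEV_BLOCK. */
enum state { RUNNING, BLOCKED };

/* Models scsi_device_set_state(sdev, SDEV_BLOCK) before this patch:
 * BLOCK -> BLOCK is not on the allow-list. */
static int set_state_block(enum state *s)
{
	if (*s == BLOCKED)
		return -1;	/* would log "Illegal state transition" */
	*s = BLOCKED;
	return 0;
}

int main(void)
{
	enum state sdev = RUNNING;

	/* srp_reconnect_rport(): the first scsi_target_block() succeeds. */
	printf("first block:  %d\n", set_state_block(&sdev));	/* prints 0 */
	/* __rport_fail_io_fast(): the nested call is rejected. */
	printf("second block: %d\n", set_state_block(&sdev));	/* prints -1 */
	return 0;
}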


Cheers,

Hannes
Bart Van Assche July 6, 2020, 2:30 a.m. UTC | #5
On 2020-07-02 22:30, Hannes Reinecke wrote:
> And it's called from srp_reconnect_rport() and __rport_fail_io_fast(),
> so we have this call chain:
> 
> srp_reconnect_rport()
>   - scsi_target_block()
>   -> __rport_fail_io_fast()
>        - scsi_target_block()

How about the (untested) patch below?


diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index d4d1104fac99..bfb240675f06 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -402,13 +402,9 @@ static void __rport_fail_io_fast(struct srp_rport *rport)

 	lockdep_assert_held(&rport->mutex);

+	WARN_ON_ONCE(rport->state != SRP_RPORT_BLOCKED);
 	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
 		return;
-	/*
-	 * Call scsi_target_block() to wait for ongoing shost->queuecommand()
-	 * calls before invoking i->f->terminate_rport_io().
-	 */
-	scsi_target_block(rport->dev.parent);
 	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);

 	/* Involve the LLD if possible to terminate all I/O on the rport. */
@@ -569,9 +565,9 @@ int srp_reconnect_rport(struct srp_rport *rport)
 		 * and dev_loss off. Mark the port as failed and start the TL
 		 * failure timers if these had not yet been started.
 		 */
+		WARN_ON_ONCE(srp_rport_set_state(rport, SRP_RPORT_BLOCKED));
+		scsi_target_block(rport->dev.parent);
 		__rport_fail_io_fast(rport);
-		scsi_target_unblock(&shost->shost_gendev,
-				    SDEV_TRANSPORT_OFFLINE);
 		__srp_start_tl_fail_timers(rport);
 	} else if (rport->state != SRP_RPORT_BLOCKED) {
 		scsi_target_unblock(&shost->shost_gendev,
Hannes Reinecke July 6, 2020, 6:22 a.m. UTC | #6
On 7/6/20 4:30 AM, Bart Van Assche wrote:
> On 2020-07-02 22:30, Hannes Reinecke wrote:
>> And it's called from srp_reconnect_rport() and __rport_fail_io_fast(),
>> so we have this call chain:
>>
>> srp_reconnect_rport()
>>    - scsi_target_block()
>>    -> __rport_fail_io_fast()
>>         - scsi_target_block()
> 
> How about the (untested) patch below?
> 
> 
> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index d4d1104fac99..bfb240675f06 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -402,13 +402,9 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
> 
>   	lockdep_assert_held(&rport->mutex);
> 
> +	WARN_ON_ONCE(rport->state != SRP_RPORT_BLOCKED);
>   	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
>   		return;
> -	/*
> -	 * Call scsi_target_block() to wait for ongoing shost->queuecommand()
> -	 * calls before invoking i->f->terminate_rport_io().
> -	 */
> -	scsi_target_block(rport->dev.parent);
>   	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
> 
>   	/* Involve the LLD if possible to terminate all I/O on the rport. */
> @@ -569,9 +565,9 @@ int srp_reconnect_rport(struct srp_rport *rport)
>   		 * and dev_loss off. Mark the port as failed and start the TL
>   		 * failure timers if these had not yet been started.
>   		 */
> +		WARN_ON_ONCE(srp_rport_set_state(rport, SRP_RPORT_BLOCKED));
> +		scsi_target_block(rport->dev.parent);
>   		__rport_fail_io_fast(rport);
> -		scsi_target_unblock(&shost->shost_gendev,
> -				    SDEV_TRANSPORT_OFFLINE);
>   		__srp_start_tl_fail_timers(rport);
>   	} else if (rport->state != SRP_RPORT_BLOCKED) {
>   		scsi_target_unblock(&shost->shost_gendev,
> 
That will still have a duplicate call as scsi_target_block() has been 
called already (cf scsi_target_block() in line 539 right at the start of 
srp_reconnect_rport()).

(And I don't think we need the WARN_ON_ONCE() here; we are checking for 
rport->state == SRP_RPORT_BLOCKED just before that line...)

Cheers,

Hannes
Bart Van Assche July 12, 2020, 4:13 a.m. UTC | #7
On 2020-07-05 23:22, Hannes Reinecke wrote:
> That will still have a duplicate call as scsi_target_block() has been called already (cf scsi_target_block() in line 539 right at the start of srp_reconnect_rport()).
> 
> (And I don't think we need the WARN_ON_ONCE() here; we are checking for rport->state == SRP_RPORT_BLOCKED just before that line...)

How about the patch below? The approach implemented by that
patch is to call scsi_target_block() only after a successful
transition to the SRP_RPORT_BLOCKED state, and to call
scsi_target_unblock() only when leaving the SRP_RPORT_BLOCKED state.

Thanks,

Bart.

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index d4d1104fac99..0334f86f0879 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -402,13 +402,9 @@ static void __rport_fail_io_fast(struct srp_rport *rport)

 	lockdep_assert_held(&rport->mutex);

+	WARN_ON_ONCE(rport->state != SRP_RPORT_BLOCKED);
 	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
 		return;
-	/*
-	 * Call scsi_target_block() to wait for ongoing shost->queuecommand()
-	 * calls before invoking i->f->terminate_rport_io().
-	 */
-	scsi_target_block(rport->dev.parent);
 	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);

 	/* Involve the LLD if possible to terminate all I/O on the rport. */
@@ -541,7 +537,8 @@ int srp_reconnect_rport(struct srp_rport *rport)
 	res = mutex_lock_interruptible(&rport->mutex);
 	if (res)
 		goto out;
-	scsi_target_block(&shost->shost_gendev);
+	if (srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0)
+		scsi_target_block(&shost->shost_gendev);
 	res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
 	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
 		 dev_name(&shost->shost_gendev), rport->state, res);
@@ -569,9 +566,9 @@ int srp_reconnect_rport(struct srp_rport *rport)
 		 * and dev_loss off. Mark the port as failed and start the TL
 		 * failure timers if these had not yet been started.
 		 */
+		WARN_ON_ONCE(srp_rport_set_state(rport, SRP_RPORT_BLOCKED));
+		scsi_target_block(rport->dev.parent);
 		__rport_fail_io_fast(rport);
-		scsi_target_unblock(&shost->shost_gendev,
-				    SDEV_TRANSPORT_OFFLINE);
 		__srp_start_tl_fail_timers(rport);
 	} else if (rport->state != SRP_RPORT_BLOCKED) {
 		scsi_target_unblock(&shost->shost_gendev,
Hannes Reinecke July 13, 2020, 6:19 a.m. UTC | #8
On 7/12/20 6:13 AM, Bart Van Assche wrote:
> On 2020-07-05 23:22, Hannes Reinecke wrote:
>> That will still have a duplicate call as scsi_target_block() has been called already (cf scsi_target_block() in line 539 right at the start of srp_reconnect_rport()).
>>
>> (And I don't think we need the WARN_ON_ONCE() here; we are checking for rport->state == SRP_RPORT_BLOCKED just before that line...)
> 
> How about the patch below? The approach implemented by that
> patch is to call scsi_target_block() only after a successful
> transition to the SRP_RPORT_BLOCKED state, and to call
> scsi_target_unblock() only when leaving the SRP_RPORT_BLOCKED state.
> 
> Thanks,
> 
> Bart.
> 
> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index d4d1104fac99..0334f86f0879 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -402,13 +402,9 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
> 
>   	lockdep_assert_held(&rport->mutex);
> 
> +	WARN_ON_ONCE(rport->state != SRP_RPORT_BLOCKED);
>   	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
>   		return;
> -	/*
> -	 * Call scsi_target_block() to wait for ongoing shost->queuecommand()
> -	 * calls before invoking i->f->terminate_rport_io().
> -	 */
> -	scsi_target_block(rport->dev.parent);
>   	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
> 
>   	/* Involve the LLD if possible to terminate all I/O on the rport. */
> @@ -541,7 +537,8 @@ int srp_reconnect_rport(struct srp_rport *rport)
>   	res = mutex_lock_interruptible(&rport->mutex);
>   	if (res)
>   		goto out;
> -	scsi_target_block(&shost->shost_gendev);
> +	if (srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0)
> +		scsi_target_block(&shost->shost_gendev);
>   	res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
>   	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
>   		 dev_name(&shost->shost_gendev), rport->state, res);
> @@ -569,9 +566,9 @@ int srp_reconnect_rport(struct srp_rport *rport)
>   		 * and dev_loss off. Mark the port as failed and start the TL
>   		 * failure timers if these had not yet been started.
>   		 */
> +		WARN_ON_ONCE(srp_rport_set_state(rport, SRP_RPORT_BLOCKED));
> +		scsi_target_block(rport->dev.parent);
>   		__rport_fail_io_fast(rport);
> -		scsi_target_unblock(&shost->shost_gendev,
> -				    SDEV_TRANSPORT_OFFLINE);
>   		__srp_start_tl_fail_timers(rport);
>   	} else if (rport->state != SRP_RPORT_BLOCKED) {
>   		scsi_target_unblock(&shost->shost_gendev,
> 

That doesn't look correct.
We just set the port state to 'blocked' in the first hunk.
So the only way for this bit to make any sense would be if the port
state were _not_ blocked, _and_ we had a valid state transition to
'blocked'. But this cannot happen: the state can't change in between
those two calls, and if the first state change didn't succeed, this one
won't succeed either, so the WARN_ON will always trigger here.

Plus this whole hunk is reached from an if condition:

	} else if (rport->state == SRP_RPORT_RUNNING) {

which (after the first hunk) is never viable, as the transition RUNNING
-> BLOCKED is allowed. Hence the first hunk will always transition to
BLOCKED, and this whole block can never be reached.
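
For reference, a condensed sketch of the srp_rport_set_state() allow-list
that this argument relies on (simplified; the real function in
scsi_transport_srp.c checks more states):

/* Condensed model: SRP_RPORT_BLOCKED is only reachable from
 * SRP_RPORT_RUNNING. Once the first hunk has moved the rport to
 * BLOCKED, a second set_state(BLOCKED) returns -EINVAL, so the
 * proposed WARN_ON_ONCE() would fire every time this path runs. */
static int srp_rport_set_state_sketch(struct srp_rport *rport,
				      enum srp_rport_state new_state)
{
	lockdep_assert_held(&rport->mutex);

	if (new_state == SRP_RPORT_BLOCKED &&
	    rport->state != SRP_RPORT_RUNNING)
		return -EINVAL;	/* e.g. BLOCKED -> BLOCKED is rejected */

	rport->state = new_state;
	return 0;
}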

I think this should be sufficient:

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index d4d1104fac99..180b323f46b8 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -404,11 +404,6 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
 
 	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
 		return;
-	/*
-	 * Call scsi_target_block() to wait for ongoing shost->queuecommand()
-	 * calls before invoking i->f->terminate_rport_io().
-	 */
-	scsi_target_block(rport->dev.parent);
 	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
 
 	/* Involve the LLD if possible to terminate all I/O on the rport. */
@@ -570,8 +565,6 @@ int srp_reconnect_rport(struct srp_rport *rport)
 		 * failure timers if these had not yet been started.
 		 */
 		__rport_fail_io_fast(rport);
-		scsi_target_unblock(&shost->shost_gendev,
-				    SDEV_TRANSPORT_OFFLINE);
 		__srp_start_tl_fail_timers(rport);
 	} else if (rport->state != SRP_RPORT_BLOCKED) {
 		scsi_target_unblock(&shost->shost_gendev,


Cheers,

Hannes
Bart Van Assche July 18, 2020, 2:51 a.m. UTC | #9
On 2020-07-12 23:19, Hannes Reinecke wrote:
> I think this should be sufficient:
> 
> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index d4d1104fac99..180b323f46b8 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -404,11 +404,6 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
> 
>  	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
>  		return;
> -	/*
> -	 * Call scsi_target_block() to wait for ongoing shost->queuecommand()
> -	 * calls before invoking i->f->terminate_rport_io().
> -	 */
> -	scsi_target_block(rport->dev.parent);
>  	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
> 
>  	/* Involve the LLD if possible to terminate all I/O on the rport. */
> @@ -570,8 +565,6 @@ int srp_reconnect_rport(struct srp_rport *rport)
>  		 * failure timers if these had not yet been started.
>  		 */
>  		__rport_fail_io_fast(rport);
> -		scsi_target_unblock(&shost->shost_gendev,
> -				    SDEV_TRANSPORT_OFFLINE);
>  		__srp_start_tl_fail_timers(rport);
>  	} else if (rport->state != SRP_RPORT_BLOCKED) {
>  		scsi_target_unblock(&shost->shost_gendev,

Adding a comment like this above __rport_fail_io_fast() would be welcome:

/*
 * scsi_target_block() must have been called before this function is
 * called to guarantee that no .queuecommand() calls are in progress.
 */

Otherwise the above patch looks fine to me.

Bart.

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1362f4f17dfd..55666e5ef151 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2322,6 +2322,7 @@  scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		switch (oldstate) {
 		case SDEV_RUNNING:
 		case SDEV_CREATED_BLOCK:
+		case SDEV_BLOCK:
 		case SDEV_QUIESCE:
 		case SDEV_OFFLINE:
 			break;