
[RFC,v5,3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion

Message ID 20210510205646.1845844-4-farman@linux.ibm.com (mailing list archive)
State New, archived
Series vfio-ccw: Fix interrupt handling for HALT/CLEAR

Commit Message

Eric Farman May 10, 2021, 8:56 p.m. UTC
Today, the stacked call to vfio_ccw_sch_io_todo() does three things:

  1) Update a solicited IRB with CP information, and release the CP
     if the interrupt was the end of a START operation.
  2) Copy the IRB data into the io_region, under the protection of
     the io_mutex.
  3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
     vfio-ccw can accept more work.

The trouble is that step 3 is (A) invoked for both solicited and
unsolicited interrupts, and (B) sitting after the mutex for step 2.
Point (B) becomes a problem if the routine processes an interrupt
for a CLEAR SUBCHANNEL while another thread initiates a START,
thus allowing the CP and FSM states to get out of sync. That is:

    CPU 1                           CPU 2
    fsm_do_clear()
    fsm_irq()
                                    fsm_io_request()
    vfio_ccw_sch_io_todo()
                                    fsm_io_helper()

Since the FSM state and CP should be kept in sync, let's make a
note when the CP is released, and rely on that as an indication
that the FSM should also be reset at the end of this routine and
open up the device for more work.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
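
The change in the reset condition described in the commit message can be
modelled outside the kernel. The following is only a sketch under stated
assumptions: the enum, struct, and function names (model_state,
model_private, model_io_todo_old, model_io_todo_new) are hypothetical
stand-ins and not the driver's own definitions, and the cp_initialized
flag merely represents "a channel program is currently owned by
private->cp". It compares the pre-patch and post-patch decision for a
final interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the driver's FSM states (illustrative names). */
enum model_state {
	MODEL_STATE_IDLE,
	MODEL_STATE_CP_PROCESSING,
	MODEL_STATE_CP_PENDING,
};

/* Hypothetical reduction of struct vfio_ccw_private to the relevant fields. */
struct model_private {
	enum model_state state;
	bool cp_initialized;	/* assumption: models ownership of private->cp */
	bool has_mdev;		/* models the private->mdev check */
};

/* Pre-patch behaviour: any final interrupt resets the FSM to IDLE. */
static void model_io_todo_old(struct model_private *p, bool solicited,
			      bool is_final)
{
	assert(p != NULL);

	if (solicited && is_final && p->state == MODEL_STATE_CP_PENDING)
		p->cp_initialized = false;	/* models cp_free() */

	if (p->has_mdev && is_final)
		p->state = MODEL_STATE_IDLE;
}

/* Post-patch behaviour: reset to IDLE only if this call released the CP. */
static void model_io_todo_new(struct model_private *p, bool solicited,
			      bool is_final)
{
	bool is_finished = false;

	assert(p != NULL);

	if (solicited && is_final && p->state == MODEL_STATE_CP_PENDING) {
		p->cp_initialized = false;	/* models cp_free() */
		is_finished = true;
	}

	if (p->has_mdev && is_finished)
		p->state = MODEL_STATE_IDLE;
}
```

Starting from MODEL_STATE_CP_PROCESSING (another thread has begun a
START) and delivering a final solicited interrupt for a CLEAR, the old
variant moves the state to IDLE while the channel program is still
held, whereas the new variant leaves the state alone. When the state is
MODEL_STATE_CP_PENDING, both variants release the channel program and
return to IDLE.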

Comments

Cornelia Huck May 11, 2021, 11:31 a.m. UTC | #1
On Mon, 10 May 2021 22:56:46 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
> 
>   1) Update a solicited IRB with CP information, and release the CP
>      if the interrupt was the end of a START operation.
>   2) Copy the IRB data into the io_region, under the protection of
>      the io_mutex
>   3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
>      vfio-ccw can accept more work.
> 
> The trouble is that step 3 is (A) invoked for both solicited and
> unsolicited interrupts, and (B) sitting after the mutex for step 2.
> This second piece becomes a problem if it processes an interrupt
> for a CLEAR SUBCHANNEL while another thread initiates a START,
> thus allowing the CP and FSM states to get out of sync. That is:
> 
>     CPU 1                           CPU 2
>     fsm_do_clear()
>     fsm_irq()
>                                     fsm_io_request()
>     vfio_ccw_sch_io_todo()
>                                     fsm_io_helper()
> 
> Since the FSM state and CP should be kept in sync, let's make a
> note when the CP is released, and rely on that as an indication
> that the FSM should also be reset at the end of this routine and
> open up the device for more work.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 8c625b530035..ef39182edab5 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -85,7 +85,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  {
>  	struct vfio_ccw_private *private;
>  	struct irb *irb;
> -	bool is_final;
> +	bool is_final, is_finished = false;

<bikeshed>
"is_finished" does not really say what is finished; maybe call it
"cp_is_finished"?
</bikeshed>

>  
>  	private = container_of(work, struct vfio_ccw_private, io_work);
>  	irb = &private->irb;
> @@ -94,14 +94,16 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>  	if (scsw_is_solicited(&irb->scsw)) {
>  		cp_update_scsw(&private->cp, &irb->scsw);
> -		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
>  			cp_free(&private->cp);
> +			is_finished = true;
> +		}
>  	}
>  	mutex_lock(&private->io_mutex);
>  	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
>  	mutex_unlock(&private->io_mutex);
>  
> -	if (private->mdev && is_final)
> +	if (private->mdev && is_finished)

Maybe add a comment?

/*
 * Reset to idle if processing of a channel program
 * has finished; but do not overwrite a possible
 * processing state if we got a final interrupt for hsch
 * or csch.
 */

Otherwise, I see us scratching our heads again in a few months :)

>  		private->state = VFIO_CCW_STATE_IDLE;
>  
>  	if (private->io_trigger)

Patch looks good to me.

Eric Farman May 11, 2021, 6:02 p.m. UTC | #2
On Tue, 2021-05-11 at 13:31 +0200, Cornelia Huck wrote:
> On Mon, 10 May 2021 22:56:46 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
> 
> > Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
> > 
> >   1) Update a solicited IRB with CP information, and release the CP
> >      if the interrupt was the end of a START operation.
> >   2) Copy the IRB data into the io_region, under the protection of
> >      the io_mutex
> >   3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
> >      vfio-ccw can accept more work.
> > 
> > The trouble is that step 3 is (A) invoked for both solicited and
> > unsolicited interrupts, and (B) sitting after the mutex for step 2.
> > This second piece becomes a problem if it processes an interrupt
> > for a CLEAR SUBCHANNEL while another thread initiates a START,
> > thus allowing the CP and FSM states to get out of sync. That is:
> > 
> >     CPU 1                           CPU 2
> >     fsm_do_clear()
> >     fsm_irq()
> >                                     fsm_io_request()
> >     vfio_ccw_sch_io_todo()
> >                                     fsm_io_helper()
> > 
> > Since the FSM state and CP should be kept in sync, let's make a
> > note when the CP is released, and rely on that as an indication
> > that the FSM should also be reset at the end of this routine and
> > open up the device for more work.
> > 
> > Signed-off-by: Eric Farman <farman@linux.ibm.com>
> > ---
> >  drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> > index 8c625b530035..ef39182edab5 100644
> > --- a/drivers/s390/cio/vfio_ccw_drv.c
> > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > @@ -85,7 +85,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
> >  {
> >  	struct vfio_ccw_private *private;
> >  	struct irb *irb;
> > -	bool is_final;
> > +	bool is_final, is_finished = false;
> 
> <bikeshed>
> "is_finished" does not really say what is finished; maybe call it
> "cp_is_finished"?
> </bikeshed>

Sure, that's a bit clearer.

> 
> >  
> >  	private = container_of(work, struct vfio_ccw_private, io_work);
> >  	irb = &private->irb;
> > @@ -94,14 +94,16 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
> >  		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >  	if (scsw_is_solicited(&irb->scsw)) {
> >  		cp_update_scsw(&private->cp, &irb->scsw);
> > -		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> > +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
> >  			cp_free(&private->cp);
> > +			is_finished = true;
> > +		}
> >  	}
> >  	mutex_lock(&private->io_mutex);
> >  	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
> >  	mutex_unlock(&private->io_mutex);
> >  
> > -	if (private->mdev && is_final)
> > +	if (private->mdev && is_finished)
> 
> Maybe add a comment?
> 
> /*
>  * Reset to idle if processing of a channel program
>  * has finished; but do not overwrite a possible
>  * processing state if we got a final interrupt for hsch
>  * or csch.
>  */
> 
> Otherwise, I see us scratching our heads again in a few months :)

Almost certainly. :)

> 
> >  		private->state = VFIO_CCW_STATE_IDLE;
> >  
> >  	if (private->io_trigger)
> 
> Patch looks good to me.
> 

Thanks. Will make the above improvements and send as non-RFC.

Patch

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 8c625b530035..ef39182edab5 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -85,7 +85,7 @@  static void vfio_ccw_sch_io_todo(struct work_struct *work)
 {
 	struct vfio_ccw_private *private;
 	struct irb *irb;
-	bool is_final;
+	bool is_final, is_finished = false;
 
 	private = container_of(work, struct vfio_ccw_private, io_work);
 	irb = &private->irb;
@@ -94,14 +94,16 @@  static void vfio_ccw_sch_io_todo(struct work_struct *work)
 		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
 	if (scsw_is_solicited(&irb->scsw)) {
 		cp_update_scsw(&private->cp, &irb->scsw);
-		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
+		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
 			cp_free(&private->cp);
+			is_finished = true;
+		}
 	}
 	mutex_lock(&private->io_mutex);
 	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
 	mutex_unlock(&private->io_mutex);
 
-	if (private->mdev && is_final)
+	if (private->mdev && is_finished)
 		private->state = VFIO_CCW_STATE_IDLE;
 
 	if (private->io_trigger)