diff mbox series

[v6,3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion

Message ID 20210511195631.3995081-4-farman@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series vfio-ccw: Fix interrupt handling for HALT/CLEAR | expand

Commit Message

Eric Farman May 11, 2021, 7:56 p.m. UTC
Today, the stacked call to vfio_ccw_sch_io_todo() does three things:

  1) Update a solicited IRB with CP information, and release the CP
     if the interrupt was the end of a START operation.
  2) Copy the IRB data into the io_region, under the protection of
     the io_mutex
  3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
     vfio-ccw can accept more work.

The trouble is that step 3 is (A) invoked for both solicited and
unsolicited interrupts, and (B) sitting after the mutex for step 2.
This second piece becomes a problem if it processes an interrupt
for a CLEAR SUBCHANNEL while another thread initiates a START,
thus allowing the CP and FSM states to get out of sync. That is:

    CPU 1                           CPU 2
    fsm_do_clear()
    fsm_irq()
                                    fsm_io_request()
    vfio_ccw_sch_io_todo()
                                    fsm_io_helper()

Since the FSM state and CP should be kept in sync, let's make a
note when the CP is released, and rely on that as an indication
that the FSM should also be reset at the end of this routine and
open up the device for more work.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Matthew Rosato May 11, 2021, 8:45 p.m. UTC | #1
On 5/11/21 3:56 PM, Eric Farman wrote:
> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
> 
>    1) Update a solicited IRB with CP information, and release the CP
>       if the interrupt was the end of a START operation.
>    2) Copy the IRB data into the io_region, under the protection of
>       the io_mutex
>    3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
>       vfio-ccw can accept more work.
> 
> The trouble is that step 3 is (A) invoked for both solicited and
> unsolicited interrupts, and (B) sitting after the mutex for step 2.
> This second piece becomes a problem if it processes an interrupt
> for a CLEAR SUBCHANNEL while another thread initiates a START,
> thus allowing the CP and FSM states to get out of sync. That is:
> 
>      CPU 1                           CPU 2
>      fsm_do_clear()
>      fsm_irq()
>                                      fsm_io_request()
>      vfio_ccw_sch_io_todo()
>                                      fsm_io_helper()
> 
> Since the FSM state and CP should be kept in sync, let's make a
> note when the CP is released, and rely on that as an indication
> that the FSM should also be reset at the end of this routine and
> open up the device for more work.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>

Thanks for the detailed commit message and comment block -- this makes 
sense to me.

Acked-by: Matthew Rosato <mjrosato@linux.ibm.com>

> ---
>   drivers/s390/cio/vfio_ccw_drv.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 8c625b530035..9b61e9b131ad 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -86,6 +86,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>   	struct vfio_ccw_private *private;
>   	struct irb *irb;
>   	bool is_final;
> +	bool cp_is_finished = false;
>   
>   	private = container_of(work, struct vfio_ccw_private, io_work);
>   	irb = &private->irb;
> @@ -94,14 +95,21 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>   		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>   	if (scsw_is_solicited(&irb->scsw)) {
>   		cp_update_scsw(&private->cp, &irb->scsw);
> -		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
>   			cp_free(&private->cp);
> +			cp_is_finished = true;
> +		}
>   	}
>   	mutex_lock(&private->io_mutex);
>   	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
>   	mutex_unlock(&private->io_mutex);
>   
> -	if (private->mdev && is_final)
> +	/*
> +	 * Reset to IDLE only if processing of a channel program
> +	 * has finished. Do not overwrite a possible processing
> +	 * state if the final interrupt was for HSCH or CSCH.
> +	 */
> +	if (private->mdev && cp_is_finished)
>   		private->state = VFIO_CCW_STATE_IDLE;
>   
>   	if (private->io_trigger)
>
Cornelia Huck May 12, 2021, 10:49 a.m. UTC | #2
On Tue, 11 May 2021 21:56:31 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
> 
>   1) Update a solicited IRB with CP information, and release the CP
>      if the interrupt was the end of a START operation.
>   2) Copy the IRB data into the io_region, under the protection of
>      the io_mutex
>   3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
>      vfio-ccw can accept more work.
> 
> The trouble is that step 3 is (A) invoked for both solicited and
> unsolicited interrupts, and (B) sitting after the mutex for step 2.
> This second piece becomes a problem if it processes an interrupt
> for a CLEAR SUBCHANNEL while another thread initiates a START,
> thus allowing the CP and FSM states to get out of sync. That is:
> 
>     CPU 1                           CPU 2
>     fsm_do_clear()
>     fsm_irq()
>                                     fsm_io_request()
>     vfio_ccw_sch_io_todo()
>                                     fsm_io_helper()
> 
> Since the FSM state and CP should be kept in sync, let's make a
> note when the CP is released, and rely on that as an indication
> that the FSM should also be reset at the end of this routine and
> open up the device for more work.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_drv.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)

Looking good :)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
diff mbox series

Patch

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 8c625b530035..9b61e9b131ad 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -86,6 +86,7 @@  static void vfio_ccw_sch_io_todo(struct work_struct *work)
 	struct vfio_ccw_private *private;
 	struct irb *irb;
 	bool is_final;
+	bool cp_is_finished = false;
 
 	private = container_of(work, struct vfio_ccw_private, io_work);
 	irb = &private->irb;
@@ -94,14 +95,21 @@  static void vfio_ccw_sch_io_todo(struct work_struct *work)
 		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
 	if (scsw_is_solicited(&irb->scsw)) {
 		cp_update_scsw(&private->cp, &irb->scsw);
-		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
+		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
 			cp_free(&private->cp);
+			cp_is_finished = true;
+		}
 	}
 	mutex_lock(&private->io_mutex);
 	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
 	mutex_unlock(&private->io_mutex);
 
-	if (private->mdev && is_final)
+	/*
+	 * Reset to IDLE only if processing of a channel program
+	 * has finished. Do not overwrite a possible processing
+	 * state if the final interrupt was for HSCH or CSCH.
+	 */
+	if (private->mdev && cp_is_finished)
 		private->state = VFIO_CCW_STATE_IDLE;
 
 	if (private->io_trigger)