Message ID | 20210511195631.3995081-4-farman@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | vfio-ccw: Fix interrupt handling for HALT/CLEAR |
On 5/11/21 3:56 PM, Eric Farman wrote:
> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
> [...]

Thanks for the detailed commit message and comment block -- this makes sense to me.

Acked-by: Matthew Rosato <mjrosato@linux.ibm.com>
On Tue, 11 May 2021 21:56:31 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
> [...]

Looking good :)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 8c625b530035..9b61e9b131ad 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -86,6 +86,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 	struct vfio_ccw_private *private;
 	struct irb *irb;
 	bool is_final;
+	bool cp_is_finished = false;
 
 	private = container_of(work, struct vfio_ccw_private, io_work);
 	irb = &private->irb;
@@ -94,14 +95,21 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
 	if (scsw_is_solicited(&irb->scsw)) {
 		cp_update_scsw(&private->cp, &irb->scsw);
-		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
+		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
 			cp_free(&private->cp);
+			cp_is_finished = true;
+		}
 	}
 	mutex_lock(&private->io_mutex);
 	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
 	mutex_unlock(&private->io_mutex);
 
-	if (private->mdev && is_final)
+	/*
+	 * Reset to IDLE only if processing of a channel program
+	 * has finished. Do not overwrite a possible processing
+	 * state if the final interrupt was for HSCH or CSCH.
+	 */
+	if (private->mdev && cp_is_finished)
 		private->state = VFIO_CCW_STATE_IDLE;
 
 	if (private->io_trigger)
Today, the stacked call to vfio_ccw_sch_io_todo() does three things:

  1) Update a solicited IRB with CP information, and release the CP
     if the interrupt was the end of a START operation.
  2) Copy the IRB data into the io_region, under the protection of
     the io_mutex
  3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
     vfio-ccw can accept more work.

The trouble is that step 3 is (A) invoked for both solicited and
unsolicited interrupts, and (B) sitting after the mutex for step 2.
This second piece becomes a problem if it processes an interrupt
for a CLEAR SUBCHANNEL while another thread initiates a START,
thus allowing the CP and FSM states to get out of sync. That is:

    CPU 1                           CPU 2
    fsm_do_clear()
    fsm_irq()
                                    fsm_io_request()
    vfio_ccw_sch_io_todo()
                                    fsm_io_helper()

Since the FSM state and CP should be kept in sync, let's make a
note when the CP is released, and rely on that as an indication
that the FSM should also be reset at the end of this routine and
open up the device for more work.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)