Message ID | 20210510205646.1845844-4-farman@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | vfio-ccw: Fix interrupt handling for HALT/CLEAR |
On Mon, 10 May 2021 22:56:46 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
>
> 1) Update a solicited IRB with CP information, and release the CP
>    if the interrupt was the end of a START operation.
> 2) Copy the IRB data into the io_region, under the protection of
>    the io_mutex
> 3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
>    vfio-ccw can accept more work.
>
> The trouble is that step 3 is (A) invoked for both solicited and
> unsolicited interrupts, and (B) sitting after the mutex for step 2.
> This second piece becomes a problem if it processes an interrupt
> for a CLEAR SUBCHANNEL while another thread initiates a START,
> thus allowing the CP and FSM states to get out of sync. That is:
>
>     CPU 1                           CPU 2
>     fsm_do_clear()
>     fsm_irq()
>                                     fsm_io_request()
>     vfio_ccw_sch_io_todo()
>                                     fsm_io_helper()
>
> Since the FSM state and CP should be kept in sync, let's make a
> note when the CP is released, and rely on that as an indication
> that the FSM should also be reset at the end of this routine and
> open up the device for more work.
>
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 8c625b530035..ef39182edab5 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -85,7 +85,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  {
>  	struct vfio_ccw_private *private;
>  	struct irb *irb;
> -	bool is_final;
> +	bool is_final, is_finished = false;

<bikeshed>
"is_finished" does not really say what is finished; maybe call it
"cp_is_finished"?
</bikeshed>

>
>  	private = container_of(work, struct vfio_ccw_private, io_work);
>  	irb = &private->irb;
> @@ -94,14 +94,16 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  			 (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>  	if (scsw_is_solicited(&irb->scsw)) {
>  		cp_update_scsw(&private->cp, &irb->scsw);
> -		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
>  			cp_free(&private->cp);
> +			is_finished = true;
> +		}
>  	}
>  	mutex_lock(&private->io_mutex);
>  	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
>  	mutex_unlock(&private->io_mutex);
>
> -	if (private->mdev && is_final)
> +	if (private->mdev && is_finished)

Maybe add a comment?

/*
 * Reset to idle if processing of a channel program
 * has finished; but do not overwrite a possible
 * processing state if we got a final interrupt for hsch
 * or csch.
 */

Otherwise, I see us scratching our heads again in a few months :)

>  		private->state = VFIO_CCW_STATE_IDLE;
>
>  	if (private->io_trigger)

Patch looks good to me.
On Tue, 2021-05-11 at 13:31 +0200, Cornelia Huck wrote:
> On Mon, 10 May 2021 22:56:46 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
>
> > Today, the stacked call to vfio_ccw_sch_io_todo() does three
> > things:
> >
> > 1) Update a solicited IRB with CP information, and release the CP
> >    if the interrupt was the end of a START operation.
> > 2) Copy the IRB data into the io_region, under the protection of
> >    the io_mutex
> > 3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
> >    vfio-ccw can accept more work.
> >
> > The trouble is that step 3 is (A) invoked for both solicited and
> > unsolicited interrupts, and (B) sitting after the mutex for step 2.
> > This second piece becomes a problem if it processes an interrupt
> > for a CLEAR SUBCHANNEL while another thread initiates a START,
> > thus allowing the CP and FSM states to get out of sync. That is:
> >
> >     CPU 1                           CPU 2
> >     fsm_do_clear()
> >     fsm_irq()
> >                                     fsm_io_request()
> >     vfio_ccw_sch_io_todo()
> >                                     fsm_io_helper()
> >
> > Since the FSM state and CP should be kept in sync, let's make a
> > note when the CP is released, and rely on that as an indication
> > that the FSM should also be reset at the end of this routine and
> > open up the device for more work.
> >
> > Signed-off-by: Eric Farman <farman@linux.ibm.com>
> > ---
> >  drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/s390/cio/vfio_ccw_drv.c
> > b/drivers/s390/cio/vfio_ccw_drv.c
> > index 8c625b530035..ef39182edab5 100644
> > --- a/drivers/s390/cio/vfio_ccw_drv.c
> > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > @@ -85,7 +85,7 @@ static void vfio_ccw_sch_io_todo(struct
> > work_struct *work)
> >  {
> >  	struct vfio_ccw_private *private;
> >  	struct irb *irb;
> > -	bool is_final;
> > +	bool is_final, is_finished = false;
>
> <bikeshed>
> "is_finished" does not really say what is finished; maybe call it
> "cp_is_finished"?
> </bikeshed>

Sure, that's a bit clearer.

>
> >
> >  	private = container_of(work, struct vfio_ccw_private, io_work);
> >  	irb = &private->irb;
> > @@ -94,14 +94,16 @@ static void vfio_ccw_sch_io_todo(struct
> > work_struct *work)
> >  			 (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >  	if (scsw_is_solicited(&irb->scsw)) {
> >  		cp_update_scsw(&private->cp, &irb->scsw);
> > -		if (is_final && private->state ==
> > VFIO_CCW_STATE_CP_PENDING)
> > +		if (is_final && private->state ==
> > VFIO_CCW_STATE_CP_PENDING) {
> >  			cp_free(&private->cp);
> > +			is_finished = true;
> > +		}
> >  	}
> >  	mutex_lock(&private->io_mutex);
> >  	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
> >  	mutex_unlock(&private->io_mutex);
> >
> > -	if (private->mdev && is_final)
> > +	if (private->mdev && is_finished)
>
> Maybe add a comment?
>
> /*
>  * Reset to idle if processing of a channel program
>  * has finished; but do not overwrite a possible
>  * processing state if we got a final interrupt for hsch
>  * or csch.
>  */
>
> Otherwise, I see us scratching our heads again in a few months :)

Almost certainly. :)

>
> >  		private->state = VFIO_CCW_STATE_IDLE;
> >
> >  	if (private->io_trigger)
>
> Patch looks good to me.
>

Thanks. Will make the above improvements and send as non-RFC.
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 8c625b530035..ef39182edab5 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -85,7 +85,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 {
 	struct vfio_ccw_private *private;
 	struct irb *irb;
-	bool is_final;
+	bool is_final, is_finished = false;
 
 	private = container_of(work, struct vfio_ccw_private, io_work);
 	irb = &private->irb;
@@ -94,14 +94,16 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 			 (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
 	if (scsw_is_solicited(&irb->scsw)) {
 		cp_update_scsw(&private->cp, &irb->scsw);
-		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
+		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
 			cp_free(&private->cp);
+			is_finished = true;
+		}
 	}
 	mutex_lock(&private->io_mutex);
 	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
 	mutex_unlock(&private->io_mutex);
 
-	if (private->mdev && is_final)
+	if (private->mdev && is_finished)
 		private->state = VFIO_CCW_STATE_IDLE;
 
 	if (private->io_trigger)
Today, the stacked call to vfio_ccw_sch_io_todo() does three things:

1) Update a solicited IRB with CP information, and release the CP
   if the interrupt was the end of a START operation.
2) Copy the IRB data into the io_region, under the protection of
   the io_mutex
3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
   vfio-ccw can accept more work.

The trouble is that step 3 is (A) invoked for both solicited and
unsolicited interrupts, and (B) sitting after the mutex for step 2.
This second piece becomes a problem if it processes an interrupt
for a CLEAR SUBCHANNEL while another thread initiates a START,
thus allowing the CP and FSM states to get out of sync. That is:

    CPU 1                           CPU 2
    fsm_do_clear()
    fsm_irq()
                                    fsm_io_request()
    vfio_ccw_sch_io_todo()
                                    fsm_io_helper()

Since the FSM state and CP should be kept in sync, let's make a
note when the CP is released, and rely on that as an indication
that the FSM should also be reset at the end of this routine and
open up the device for more work.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
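
For readers who want to step through the interleaving described in the commit message, here is a minimal userspace sketch of it. It is not driver code: struct model_dev, the MODEL_* states and every model_* function are hypothetical stand-ins invented for illustration, the private->mdev check is left out, and the two CPUs are replayed sequentially in the order shown in the diagram rather than with real threads. The "new" handler uses the cp_is_finished name and the comment proposed in review.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the vfio-ccw FSM states. */
enum model_state { MODEL_IDLE, MODEL_CP_PROCESSING, MODEL_CP_PENDING };

struct model_dev {
	enum model_state state;
	bool cp_in_use;		/* models "a channel program is allocated" */
};

/* Models fsm_io_request(): claim the FSM and build the CP for a START. */
static bool model_io_request(struct model_dev *d)
{
	if (d->state != MODEL_IDLE)
		return false;	/* device busy, request rejected */
	d->state = MODEL_CP_PROCESSING;
	d->cp_in_use = true;
	return true;
}

/* Models the handler before the patch: any final interrupt resets the FSM. */
static void model_io_todo_old(struct model_dev *d, bool is_final, bool solicited)
{
	if (solicited && is_final && d->state == MODEL_CP_PENDING)
		d->cp_in_use = false;	/* stands in for cp_free() */
	if (is_final)
		d->state = MODEL_IDLE;
}

/* Models the handler after the patch: reset only when the CP was released. */
static void model_io_todo_new(struct model_dev *d, bool is_final, bool solicited)
{
	bool cp_is_finished = false;

	if (solicited && is_final && d->state == MODEL_CP_PENDING) {
		d->cp_in_use = false;	/* stands in for cp_free() */
		cp_is_finished = true;
	}
	/*
	 * Reset to idle if processing of a channel program has finished;
	 * do not overwrite a processing state on a final interrupt for
	 * hsch or csch.
	 */
	if (cp_is_finished)
		d->state = MODEL_IDLE;
}

/* The invariant this sketch checks: the FSM is idle exactly when no CP is in use. */
static bool model_in_sync(const struct model_dev *d)
{
	return (d->state == MODEL_IDLE) == !d->cp_in_use;
}

/*
 * Replays the diagram: CPU 1 has issued a CLEAR and its final, solicited
 * interrupt is queued; CPU 2 starts a new request; CPU 1 then runs the
 * interrupt handler. The result is sampled before CPU 2 would reach
 * fsm_io_helper().
 */
static bool model_replay_race(bool patched)
{
	struct model_dev d = { MODEL_IDLE, false };
	bool accepted = model_io_request(&d);	/* CPU 2: fsm_io_request() */

	assert(accepted);
	if (patched)				/* CPU 1: vfio_ccw_sch_io_todo() */
		model_io_todo_new(&d, true, true);
	else
		model_io_todo_old(&d, true, true);
	return model_in_sync(&d);
}
```

In this model, model_replay_race(false) returns false because the old handler puts the FSM back to idle while the CP built by CPU 2 is still in use, and model_replay_race(true) returns true because the state is left alone when no CP was released.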