mbox series

[0/2] Fix vfio-ccw handling of TIC recursion

Message ID 20190222183941.29596-1-farman@linux.ibm.com (mailing list archive)
Headers show
Series Fix vfio-ccw handling of TIC recursion | expand

Message

Eric Farman Feb. 22, 2019, 6:39 p.m. UTC
Per the discussion [1] about a problem with how vfio-ccw calculates
the length of a channel program (specifically when using the
forthcoming QEMU BIOS code for DASD IPL), I present this fix.

Patch 1 fixes the problem, and is over-engineered
for readability sake.

Patch 2 takes the functions from Patch 1, and refactors the
existing code to make other areas a little easier to understand.
(I hope.)

I've been running fio for over 24 hours now, and have seen
zero hours.  Previously, I would have probably seen "a few"
errors by now, where prior to the original fix I would've seen
"many" errors.  Further tests are still ongoing.

[1] https://marc.info/?l=linux-s390&m=155063096321940&w=2

Eric Farman (2):
  s390/cio: Fix vfio-ccw handling of recursive TICs
  s390/cio: Use cpa range elsewhere within vfio-ccw

 drivers/s390/cio/vfio_ccw_cp.c | 55 ++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 13 deletions(-)

Comments

Cornelia Huck Feb. 25, 2019, 9:11 a.m. UTC | #1
On Fri, 22 Feb 2019 19:39:39 +0100
Eric Farman <farman@linux.ibm.com> wrote:

> Per the discussion [1] about a problem with how vfio-ccw calculates
> the length of a channel program (specifically when using the
> forthcoming QEMU BIOS code for DASD IPL), I present this fix.
> 
> Patch 1 fixes the problem, and is over-engineered
> for readability sake.

:)

> 
> Patch 2 takes the functions from Patch 1, and refactors the
> existing code to make other areas a little easier to understand.
> (I hope.)
> 
> I've been running fio for over 24 hours now, and have seen
> zero hours.  Previously, I would have probably seen "a few"
> errors by now, where prior to the original fix I would've seen
> "many" errors.  Further tests are still ongoing.

Awesome, thanks!

> 
> [1] https://marc.info/?l=linux-s390&m=155063096321940&w=2
> 
> Eric Farman (2):
>   s390/cio: Fix vfio-ccw handling of recursive TICs
>   s390/cio: Use cpa range elsewhere within vfio-ccw
> 
>  drivers/s390/cio/vfio_ccw_cp.c | 55 ++++++++++++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 13 deletions(-)
> 

I hope we can queue the patches soon, reviewing now.
Eric Farman Feb. 25, 2019, 4:30 p.m. UTC | #2
On 02/25/2019 04:11 AM, Cornelia Huck wrote:
> On Fri, 22 Feb 2019 19:39:39 +0100
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> Per the discussion [1] about a problem with how vfio-ccw calculates
>> the length of a channel program (specifically when using the
>> forthcoming QEMU BIOS code for DASD IPL), I present this fix.
>>
>> Patch 1 fixes the problem, and is over-engineered
>> for readability sake.
> 
> :)
> 
>>
>> Patch 2 takes the functions from Patch 1, and refactors the
>> existing code to make other areas a little easier to understand.
>> (I hope.)
>>
>> I've been running fio for over 24 hours now, and have seen
>> zero hours.  Previously, I would have probably seen "a few"
>> errors by now, where prior to the original fix I would've seen
>> "many" errors.  Further tests are still ongoing.
> 
> Awesome, thanks!

I left fio running over the weekend, with newly-randomized parameters 
every hour or two...  Had one error yesterday morning, in the 
NOP+TIC-to-redrive-I/O case.  I didn't leave any tracing on because I 
didn't expect I'd be able to get anything before they wrapped, and 
didn't have time to figure out a way to cleanly filter errors.

Though I did leave a counter in place for the number of times we 
processed a TIC that goes back into the current chain, and it hit about 
1900 times since Friday.  More than three quarters of them occurred 
during the error yesterday morning, so something was being dramatic at 
the time.  I guess there's one obscure corner to track down, but it 
otherwise seems to run quite a bit better than before.

  - Eric

> 
>>
>> [1] https://marc.info/?l=linux-s390&m=155063096321940&w=2
>>
>> Eric Farman (2):
>>    s390/cio: Fix vfio-ccw handling of recursive TICs
>>    s390/cio: Use cpa range elsewhere within vfio-ccw
>>
>>   drivers/s390/cio/vfio_ccw_cp.c | 55 ++++++++++++++++++++++++++++++++----------
>>   1 file changed, 42 insertions(+), 13 deletions(-)
>>
> 
> I hope we can queue the patches soon, reviewing now.
>
Cornelia Huck Feb. 25, 2019, 4:43 p.m. UTC | #3
On Mon, 25 Feb 2019 11:30:47 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 02/25/2019 04:11 AM, Cornelia Huck wrote:
> > On Fri, 22 Feb 2019 19:39:39 +0100
> > Eric Farman <farman@linux.ibm.com> wrote:
> >   
> >> Per the discussion [1] about a problem with how vfio-ccw calculates
> >> the length of a channel program (specifically when using the
> >> forthcoming QEMU BIOS code for DASD IPL), I present this fix.
> >>
> >> Patch 1 fixes the problem, and is over-engineered
> >> for readability sake.  
> > 
> > :)
> >   
> >>
> >> Patch 2 takes the functions from Patch 1, and refactors the
> >> existing code to make other areas a little easier to understand.
> >> (I hope.)
> >>
> >> I've been running fio for over 24 hours now, and have seen
> >> zero hours.  Previously, I would have probably seen "a few"
> >> errors by now, where prior to the original fix I would've seen
> >> "many" errors.  Further tests are still ongoing.  
> > 
> > Awesome, thanks!  
> 
> I left fio running over the weekend, with newly-randomized parameters 
> every hour or two...  Had one error yesterday morning, in the 
> NOP+TIC-to-redrive-I/O case.  I didn't leave any tracing on because I 
> didn't expect I'd be able to get anything before they wrapped, and 
> didn't have time to figure out a way to cleanly filter errors.
> 
> Though I did leave a counter in place for the number of times we 
> processed a TIC that goes back into the current chain, and it hit about 
> 1900 times since Friday.  More than three quarters of them occurred 
> during the error yesterday morning, so something was being dramatic at 
> the time.  I guess there's one obscure corner to track down, but it 
> otherwise seems to run quite a bit better than before.

Agreed, the patches as they are now are a real improvement.
Farhan Ali Feb. 25, 2019, 7:23 p.m. UTC | #4
On 02/22/2019 01:39 PM, Eric Farman wrote:
> Per the discussion [1] about a problem with how vfio-ccw calculates
> the length of a channel program (specifically when using the
> forthcoming QEMU BIOS code for DASD IPL), I present this fix.
> 
> Patch 1 fixes the problem, and is over-engineered
> for readability sake.
> 
> Patch 2 takes the functions from Patch 1, and refactors the
> existing code to make other areas a little easier to understand.
> (I hope.)
> 
> I've been running fio for over 24 hours now, and have seen
> zero hours.  Previously, I would have probably seen "a few"
> errors by now, where prior to the original fix I would've seen
> "many" errors.  Further tests are still ongoing.
> 
> [1] https://marc.info/?l=linux-s390&m=155063096321940&w=2
> 
> Eric Farman (2):
>    s390/cio: Fix vfio-ccw handling of recursive TICs
>    s390/cio: Use cpa range elsewhere within vfio-ccw
> 
>   drivers/s390/cio/vfio_ccw_cp.c | 55 ++++++++++++++++++++++++++++++++----------
>   1 file changed, 42 insertions(+), 13 deletions(-)
> 


Reviewed-by: Farhan Ali <alifm@linux.ibm.com> for the series.
Cornelia Huck Feb. 26, 2019, 6:13 p.m. UTC | #5
On Fri, 22 Feb 2019 19:39:39 +0100
Eric Farman <farman@linux.ibm.com> wrote:

> Per the discussion [1] about a problem with how vfio-ccw calculates
> the length of a channel program (specifically when using the
> forthcoming QEMU BIOS code for DASD IPL), I present this fix.
> 
> Patch 1 fixes the problem, and is over-engineered
> for readability sake.
> 
> Patch 2 takes the functions from Patch 1, and refactors the
> existing code to make other areas a little easier to understand.
> (I hope.)
> 
> I've been running fio for over 24 hours now, and have seen
> zero hours.  Previously, I would have probably seen "a few"
> errors by now, where prior to the original fix I would've seen
> "many" errors.  Further tests are still ongoing.
> 
> [1] https://marc.info/?l=linux-s390&m=155063096321940&w=2
> 
> Eric Farman (2):
>   s390/cio: Fix vfio-ccw handling of recursive TICs
>   s390/cio: Use cpa range elsewhere within vfio-ccw
> 
>  drivers/s390/cio/vfio_ccw_cp.c | 55 ++++++++++++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 13 deletions(-)
> 

Thanks, applied.