[RFC,v1,2/2] vfio-ccw: Don't exit early if state of the vfio-ccw subchannel is not idle
diff mbox series

Message ID 61b7644d447debefcd36e030bce7df979b335405.1548082107.git.alifm@linux.ibm.com
State New
Headers show
Series
  • Small vfio-ccw fixes
Related show

Commit Message

Farhan Ali Jan. 21, 2019, 2:54 p.m. UTC
This check is unecessary as we already have the vfio state machine
to handle I/O requests.

On the other hand, this check returns incorrect information to
userspace if the state of the subchannel is not idle. For example
if the state is busy and new I/O request comes in, this will return
an EACCES, whereas we should return EBUSY.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_ops.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Cornelia Huck Jan. 21, 2019, 3:18 p.m. UTC | #1
On Mon, 21 Jan 2019 09:54:09 -0500
Farhan Ali <alifm@linux.ibm.com> wrote:

> This check is unecessary as we already have the vfio state machine
> to handle I/O requests.
> 
> On the other hand, this check returns incorrect information to
> userspace if the state of the subchannel is not idle. For example
> if the state is busy and new I/O request comes in, this will return
> an EACCES, whereas we should return EBUSY.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_ops.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index f673e10..3fdcc6d 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>  		return -EINVAL;
>  
>  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> -	if (private->state != VFIO_CCW_STATE_IDLE)
> -		return -EACCES;
>  
>  	region = private->io_region;
>  	if (copy_from_user((void *)region + *ppos, buf, count))

Hm, the patchset for halt/clear handling I recently posted changes this
to a check for NOT_OPER || STANDBY. What do you think of that option?
Farhan Ali Jan. 21, 2019, 3:48 p.m. UTC | #2
On 01/21/2019 10:18 AM, Cornelia Huck wrote:
> On Mon, 21 Jan 2019 09:54:09 -0500
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> This check is unecessary as we already have the vfio state machine
>> to handle I/O requests.
>>
>> On the other hand, this check returns incorrect information to
>> userspace if the state of the subchannel is not idle. For example
>> if the state is busy and new I/O request comes in, this will return
>> an EACCES, whereas we should return EBUSY.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_ops.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
>> index f673e10..3fdcc6d 100644
>> --- a/drivers/s390/cio/vfio_ccw_ops.c
>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>>   		return -EINVAL;
>>   
>>   	private = dev_get_drvdata(mdev_parent_dev(mdev));
>> -	if (private->state != VFIO_CCW_STATE_IDLE)
>> -		return -EACCES;
>>   
>>   	region = private->io_region;
>>   	if (copy_from_user((void *)region + *ppos, buf, count))
> 
> Hm, the patchset for halt/clear handling I recently posted changes this
> to a check for NOT_OPER || STANDBY. What do you think of that option?
> 
> 
I am concerned with the return code that we send userspace. With the 
state machines we return an EIO for NOT_OPER or STANDBY, but we return 
EACCES in the early check. QEMU on an EACCES returns a 'not_oper' to the 
guest and for EIO will inject an interrupt.

I believe we should try to keep it consistent to make debugging errors 
easier :)
Cornelia Huck Jan. 21, 2019, 3:52 p.m. UTC | #3
On Mon, 21 Jan 2019 10:48:32 -0500
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 01/21/2019 10:18 AM, Cornelia Huck wrote:
> > On Mon, 21 Jan 2019 09:54:09 -0500
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >   
> >> This check is unecessary as we already have the vfio state machine
> >> to handle I/O requests.
> >>
> >> On the other hand, this check returns incorrect information to
> >> userspace if the state of the subchannel is not idle. For example
> >> if the state is busy and new I/O request comes in, this will return
> >> an EACCES, whereas we should return EBUSY.
> >>
> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >> ---
> >>   drivers/s390/cio/vfio_ccw_ops.c | 2 --
> >>   1 file changed, 2 deletions(-)
> >>
> >> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> >> index f673e10..3fdcc6d 100644
> >> --- a/drivers/s390/cio/vfio_ccw_ops.c
> >> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> >> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> >>   		return -EINVAL;
> >>   
> >>   	private = dev_get_drvdata(mdev_parent_dev(mdev));
> >> -	if (private->state != VFIO_CCW_STATE_IDLE)
> >> -		return -EACCES;
> >>   
> >>   	region = private->io_region;
> >>   	if (copy_from_user((void *)region + *ppos, buf, count))  
> > 
> > Hm, the patchset for halt/clear handling I recently posted changes this
> > to a check for NOT_OPER || STANDBY. What do you think of that option?
> > 
> >   
> I am concerned with the return code that we send userspace. With the 
> state machines we return an EIO for NOT_OPER or STANDBY, but we return 
> EACCES in the early check. QEMU on an EACCES returns a 'not_oper' to the 
> guest and for EIO will inject an interrupt.

I actually think that not_oper is the saner option here: the device is
in a state on the vfio side where it is not usable...

> 
> I believe we should try to keep it consistent to make debugging errors 
> easier :)

I certainly agree with that :)
Farhan Ali Jan. 21, 2019, 4:14 p.m. UTC | #4
On 01/21/2019 10:52 AM, Cornelia Huck wrote:
> On Mon, 21 Jan 2019 10:48:32 -0500
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> On 01/21/2019 10:18 AM, Cornelia Huck wrote:
>>> On Mon, 21 Jan 2019 09:54:09 -0500
>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>    
>>>> This check is unecessary as we already have the vfio state machine
>>>> to handle I/O requests.
>>>>
>>>> On the other hand, this check returns incorrect information to
>>>> userspace if the state of the subchannel is not idle. For example
>>>> if the state is busy and new I/O request comes in, this will return
>>>> an EACCES, whereas we should return EBUSY.
>>>>
>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>> ---
>>>>    drivers/s390/cio/vfio_ccw_ops.c | 2 --
>>>>    1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
>>>> index f673e10..3fdcc6d 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_ops.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
>>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>>>>    		return -EINVAL;
>>>>    
>>>>    	private = dev_get_drvdata(mdev_parent_dev(mdev));
>>>> -	if (private->state != VFIO_CCW_STATE_IDLE)
>>>> -		return -EACCES;
>>>>    
>>>>    	region = private->io_region;
>>>>    	if (copy_from_user((void *)region + *ppos, buf, count))
>>>
>>> Hm, the patchset for halt/clear handling I recently posted changes this
>>> to a check for NOT_OPER || STANDBY. What do you think of that option?
>>>
>>>    
>> I am concerned with the return code that we send userspace. With the
>> state machines we return an EIO for NOT_OPER or STANDBY, but we return
>> EACCES in the early check. QEMU on an EACCES returns a 'not_oper' to the
>> guest and for EIO will inject an interrupt.
> 
> I actually think that not_oper is the saner option here: the device is
> in a state on the vfio side where it is not usable...

Agreed. Then we should the change the return codes with the state 
machine to EACCES as well, that could be a follow up patch to your series.

> 
>>
>> I believe we should try to keep it consistent to make debugging errors
>> easier :)
> 
> I certainly agree with that :)
> 
>
Cornelia Huck Jan. 21, 2019, 4:55 p.m. UTC | #5
On Mon, 21 Jan 2019 11:14:16 -0500
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 01/21/2019 10:52 AM, Cornelia Huck wrote:
> > On Mon, 21 Jan 2019 10:48:32 -0500
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >   
> >> On 01/21/2019 10:18 AM, Cornelia Huck wrote:  
> >>> On Mon, 21 Jan 2019 09:54:09 -0500
> >>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>      
> >>>> This check is unecessary as we already have the vfio state machine
> >>>> to handle I/O requests.
> >>>>
> >>>> On the other hand, this check returns incorrect information to
> >>>> userspace if the state of the subchannel is not idle. For example
> >>>> if the state is busy and new I/O request comes in, this will return
> >>>> an EACCES, whereas we should return EBUSY.
> >>>>
> >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >>>> ---
> >>>>    drivers/s390/cio/vfio_ccw_ops.c | 2 --
> >>>>    1 file changed, 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> >>>> index f673e10..3fdcc6d 100644
> >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c
> >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> >>>>    		return -EINVAL;
> >>>>    
> >>>>    	private = dev_get_drvdata(mdev_parent_dev(mdev));
> >>>> -	if (private->state != VFIO_CCW_STATE_IDLE)
> >>>> -		return -EACCES;
> >>>>    
> >>>>    	region = private->io_region;
> >>>>    	if (copy_from_user((void *)region + *ppos, buf, count))  
> >>>
> >>> Hm, the patchset for halt/clear handling I recently posted changes this
> >>> to a check for NOT_OPER || STANDBY. What do you think of that option?
> >>>
> >>>      
> >> I am concerned with the return code that we send userspace. With the
> >> state machines we return an EIO for NOT_OPER or STANDBY, but we return
> >> EACCES in the early check. QEMU on an EACCES returns a 'not_oper' to the
> >> guest and for EIO will inject an interrupt.  
> > 
> > I actually think that not_oper is the saner option here: the device is
> > in a state on the vfio side where it is not usable...  
> 
> Agreed. Then we should the change the return codes with the state 
> machine to EACCES as well, that could be a follow up patch to your series.

Yes, let's do that.

As you seem to be able to hit this problem with your workload, maybe
post it as a standalone fix? I can easily rebase my series on top of
that (and I assume that it will need more review than a simple fix.)
Halil Pasic Jan. 21, 2019, 8:10 p.m. UTC | #6
On Mon, 21 Jan 2019 17:55:39 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 21 Jan 2019 11:14:16 -0500
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
> > On 01/21/2019 10:52 AM, Cornelia Huck wrote:
> > > On Mon, 21 Jan 2019 10:48:32 -0500
> > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > >   
> > >> On 01/21/2019 10:18 AM, Cornelia Huck wrote:  
> > >>> On Mon, 21 Jan 2019 09:54:09 -0500
> > >>> Farhan Ali <alifm@linux.ibm.com> wrote:
> > >>>      
> > >>>> This check is unecessary as we already have the vfio state machine
> > >>>> to handle I/O requests.
> > >>>>
> > >>>> On the other hand, this check returns incorrect information to
> > >>>> userspace if the state of the subchannel is not idle. For example
> > >>>> if the state is busy and new I/O request comes in, this will return
> > >>>> an EACCES, whereas we should return EBUSY.
> > >>>>
> > >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > >>>> ---
> > >>>>    drivers/s390/cio/vfio_ccw_ops.c | 2 --
> > >>>>    1 file changed, 2 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> > >>>> index f673e10..3fdcc6d 100644
> > >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c
> > >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> > >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > >>>>    		return -EINVAL;
> > >>>>    
> > >>>>    	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > >>>> -	if (private->state != VFIO_CCW_STATE_IDLE)
> > >>>> -		return -EACCES;
> > >>>>    
> > >>>>    	region = private->io_region;
> > >>>>    	if (copy_from_user((void *)region + *ppos, buf,
> > >>>> count))  
> > >>>
> > >>> Hm, the patchset for halt/clear handling I recently posted
> > >>> changes this to a check for NOT_OPER || STANDBY. What do you
> > >>> think of that option?
> > >>>
> > >>>      
> > >> I am concerned with the return code that we send userspace. With
> > >> the state machines we return an EIO for NOT_OPER or STANDBY, but
> > >> we return EACCES in the early check. QEMU on an EACCES returns a
> > >> 'not_oper' to the guest and for EIO will inject an interrupt.  
> > > 
> > > I actually think that not_oper is the saner option here: the
> > > device is in a state on the vfio side where it is not usable...  
> > 
> > Agreed. Then we should the change the return codes with the state 
> > machine to EACCES as well, that could be a follow up patch to your
> > series.
> 
> Yes, let's do that.
> 
> As you seem to be able to hit this problem with your workload, maybe
> post it as a standalone fix? I can easily rebase my series on top of
> that (and I assume that it will need more review than a simple fix.)
> 

I had an issue following the discussion so I chatted up Farhan. Turns
out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I
guess, a should normally not happen. I mean QEMU should not accept a new
SSCH before the START_PENDING for the old one is cleared. And that would
normally happen after the IO corresponding to the old SSCH is done. Well
we do have our wonderful emulated CSCH/SSCH at the moment (which do clear
the SCSW in QEMU but don't bother to terminate IO at the device). And we
may or may not have bugs in all the SCSW handling/updating code as well.

The problem with this patch is that we poke private->io_region before
calling the state machine stuff. That could race with the interrupt
handler touching the same region. That could have been OK, if IRB in the
region was read-only. But it is not, and QEMU zeroes it out on each
write AFAICT. Changing the condition to NOT_OPER || STANDBY or
omitting the check (at the moment) does not seem to be what I would call
a fix.

Regards,
Halil
Cornelia Huck Jan. 22, 2019, 10:20 a.m. UTC | #7
On Mon, 21 Jan 2019 21:10:32 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 21 Jan 2019 17:55:39 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Mon, 21 Jan 2019 11:14:16 -0500
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >   
> > > On 01/21/2019 10:52 AM, Cornelia Huck wrote:  
> > > > On Mon, 21 Jan 2019 10:48:32 -0500
> > > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > > >     
> > > >> On 01/21/2019 10:18 AM, Cornelia Huck wrote:    
> > > >>> On Mon, 21 Jan 2019 09:54:09 -0500
> > > >>> Farhan Ali <alifm@linux.ibm.com> wrote:
> > > >>>        
> > > >>>> This check is unecessary as we already have the vfio state machine
> > > >>>> to handle I/O requests.
> > > >>>>
> > > >>>> On the other hand, this check returns incorrect information to
> > > >>>> userspace if the state of the subchannel is not idle. For example
> > > >>>> if the state is busy and new I/O request comes in, this will return
> > > >>>> an EACCES, whereas we should return EBUSY.
> > > >>>>
> > > >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > > >>>> ---
> > > >>>>    drivers/s390/cio/vfio_ccw_ops.c | 2 --
> > > >>>>    1 file changed, 2 deletions(-)
> > > >>>>
> > > >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> > > >>>> index f673e10..3fdcc6d 100644
> > > >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c
> > > >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> > > >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > >>>>    		return -EINVAL;
> > > >>>>    
> > > >>>>    	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > >>>> -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > >>>> -		return -EACCES;
> > > >>>>    
> > > >>>>    	region = private->io_region;
> > > >>>>    	if (copy_from_user((void *)region + *ppos, buf,
> > > >>>> count))    
> > > >>>
> > > >>> Hm, the patchset for halt/clear handling I recently posted
> > > >>> changes this to a check for NOT_OPER || STANDBY. What do you
> > > >>> think of that option?
> > > >>>
> > > >>>        
> > > >> I am concerned with the return code that we send userspace. With
> > > >> the state machines we return an EIO for NOT_OPER or STANDBY, but
> > > >> we return EACCES in the early check. QEMU on an EACCES returns a
> > > >> 'not_oper' to the guest and for EIO will inject an interrupt.    
> > > > 
> > > > I actually think that not_oper is the saner option here: the
> > > > device is in a state on the vfio side where it is not usable...    
> > > 
> > > Agreed. Then we should the change the return codes with the state 
> > > machine to EACCES as well, that could be a follow up patch to your
> > > series.  
> > 
> > Yes, let's do that.
> > 
> > As you seem to be able to hit this problem with your workload, maybe
> > post it as a standalone fix? I can easily rebase my series on top of
> > that (and I assume that it will need more review than a simple fix.)
> >   
> 
> I had an issue following the discussion so I chatted up Farhan. Turns
> out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I
> guess, a should normally not happen. I mean QEMU should not accept a new
> SSCH before the START_PENDING for the old one is cleared. And that would
> normally happen after the IO corresponding to the old SSCH is done. Well
> we do have our wonderful emulated CSCH/SSCH at the moment (which do clear
> the SCSW in QEMU but don't bother to terminate IO at the device). And we
> may or may not have bugs in all the SCSW handling/updating code as well.

Well, that hopefully becomes better once we have proper halt/clear
passthrough. Any kind of emulation will continue to be problematic in
one way or the other.

> The problem with this patch is that we poke private->io_region before
> calling the state machine stuff. That could race with the interrupt
> handler touching the same region. That could have been OK, if IRB in the
> region was read-only. But it is not, and QEMU zeroes it out on each
> write AFAICT. Changing the condition to NOT_OPER || STANDBY or
> omitting the check (at the moment) does not seem to be what I would call
> a fix.

So, should this wait until after we rewrite the state machine and I/O
region concurrency?

I'm not sure what the test setup exercises, but making the return code
(-EACCES vs. -EIO) consistent looks like a win to me.
Halil Pasic Jan. 22, 2019, 11:06 a.m. UTC | #8
On Tue, 22 Jan 2019 11:20:41 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 21 Jan 2019 21:10:32 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 21 Jan 2019 17:55:39 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > On Mon, 21 Jan 2019 11:14:16 -0500
> > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > >   
> > > > On 01/21/2019 10:52 AM, Cornelia Huck wrote:  
> > > > > On Mon, 21 Jan 2019 10:48:32 -0500
> > > > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > > > >     
> > > > >> On 01/21/2019 10:18 AM, Cornelia Huck wrote:    
> > > > >>> On Mon, 21 Jan 2019 09:54:09 -0500
> > > > >>> Farhan Ali <alifm@linux.ibm.com> wrote:
> > > > >>>        
> > > > >>>> This check is unecessary as we already have the vfio state machine
> > > > >>>> to handle I/O requests.
> > > > >>>>
> > > > >>>> On the other hand, this check returns incorrect information to
> > > > >>>> userspace if the state of the subchannel is not idle. For example
> > > > >>>> if the state is busy and new I/O request comes in, this will return
> > > > >>>> an EACCES, whereas we should return EBUSY.
> > > > >>>>
> > > > >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > > > >>>> ---
> > > > >>>>    drivers/s390/cio/vfio_ccw_ops.c | 2 --
> > > > >>>>    1 file changed, 2 deletions(-)
> > > > >>>>
> > > > >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> > > > >>>> index f673e10..3fdcc6d 100644
> > > > >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c
> > > > >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> > > > >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > >>>>    		return -EINVAL;
> > > > >>>>    
> > > > >>>>    	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > > >>>> -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > > >>>> -		return -EACCES;
> > > > >>>>    
> > > > >>>>    	region = private->io_region;
> > > > >>>>    	if (copy_from_user((void *)region + *ppos, buf,
> > > > >>>> count))    
> > > > >>>
> > > > >>> Hm, the patchset for halt/clear handling I recently posted
> > > > >>> changes this to a check for NOT_OPER || STANDBY. What do you
> > > > >>> think of that option?
> > > > >>>
> > > > >>>        
> > > > >> I am concerned with the return code that we send userspace. With
> > > > >> the state machines we return an EIO for NOT_OPER or STANDBY, but
> > > > >> we return EACCES in the early check. QEMU on an EACCES returns a
> > > > >> 'not_oper' to the guest and for EIO will inject an interrupt.    
> > > > > 
> > > > > I actually think that not_oper is the saner option here: the
> > > > > device is in a state on the vfio side where it is not usable...    
> > > > 
> > > > Agreed. Then we should the change the return codes with the state 
> > > > machine to EACCES as well, that could be a follow up patch to your
> > > > series.  
> > > 
> > > Yes, let's do that.
> > > 
> > > As you seem to be able to hit this problem with your workload, maybe
> > > post it as a standalone fix? I can easily rebase my series on top of
> > > that (and I assume that it will need more review than a simple fix.)
> > >   
> > 
> > I had an issue following the discussion so I chatted up Farhan. Turns
> > out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I
> > guess, a should normally not happen. I mean QEMU should not accept a new
> > SSCH before the START_PENDING for the old one is cleared. And that would
> > normally happen after the IO corresponding to the old SSCH is done. Well
> > we do have our wonderful emulated CSCH/SSCH at the moment (which do clear
> > the SCSW in QEMU but don't bother to terminate IO at the device). And we
> > may or may not have bugs in all the SCSW handling/updating code as well.
> 
> Well, that hopefully becomes better once we have proper halt/clear
> passthrough. Any kind of emulation will continue to be problematic in
> one way or the other.
> 
> > The problem with this patch is that we poke private->io_region before
> > calling the state machine stuff. That could race with the interrupt
> > handler touching the same region. That could have been OK, if IRB in the
> > region was read-only. But it is not, and QEMU zeroes it out on each
> > write AFAICT. Changing the condition to NOT_OPER || STANDBY or
> > omitting the check (at the moment) does not seem to be what I would call
> > a fix.
> 
> So, should this wait until after we rewrite the state machine and I/O
> region concurrency?
> 
> I'm not sure what the test setup exercises, but making the return code
> (-EACCES vs. -EIO) consistent looks like a win to me.
> 

Regarding the test setup Farhan is likely to be able to provide more
info. And my intuition says, yes it should wait. I would also like to
see the error codes, which are:
* obviously a substantial part of the interface, and need to be mapped
  to channel IO conditions, and thus
* beyond usual file IO stuff (i.e. man 3 write won't give you the right
  explanation for the error codes
properly documented. @Farhan: Would you like to tackle this one?

Given what QEMU does with EIO and EACCESS, I agree, if NOT_OPER ||
STANDBY EACCESS (that is cc 3 not operational) is more appropriate.

Regards,
Halil
Farhan Ali Jan. 22, 2019, 3:19 p.m. UTC | #9
On 01/22/2019 06:06 AM, Halil Pasic wrote:
> On Tue, 22 Jan 2019 11:20:41 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Mon, 21 Jan 2019 21:10:32 +0100
>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>
>>> On Mon, 21 Jan 2019 17:55:39 +0100
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>
>>>> On Mon, 21 Jan 2019 11:14:16 -0500
>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>    
>>>>> On 01/21/2019 10:52 AM, Cornelia Huck wrote:
>>>>>> On Mon, 21 Jan 2019 10:48:32 -0500
>>>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>>>      
>>>>>>> On 01/21/2019 10:18 AM, Cornelia Huck wrote:
>>>>>>>> On Mon, 21 Jan 2019 09:54:09 -0500
>>>>>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>>>>>         
>>>>>>>>> This check is unecessary as we already have the vfio state machine
>>>>>>>>> to handle I/O requests.
>>>>>>>>>
>>>>>>>>> On the other hand, this check returns incorrect information to
>>>>>>>>> userspace if the state of the subchannel is not idle. For example
>>>>>>>>> if the state is busy and new I/O request comes in, this will return
>>>>>>>>> an EACCES, whereas we should return EBUSY.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/s390/cio/vfio_ccw_ops.c | 2 --
>>>>>>>>>     1 file changed, 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
>>>>>>>>> index f673e10..3fdcc6d 100644
>>>>>>>>> --- a/drivers/s390/cio/vfio_ccw_ops.c
>>>>>>>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
>>>>>>>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>>>>>>>>>     		return -EINVAL;
>>>>>>>>>     
>>>>>>>>>     	private = dev_get_drvdata(mdev_parent_dev(mdev));
>>>>>>>>> -	if (private->state != VFIO_CCW_STATE_IDLE)
>>>>>>>>> -		return -EACCES;
>>>>>>>>>     
>>>>>>>>>     	region = private->io_region;
>>>>>>>>>     	if (copy_from_user((void *)region + *ppos, buf,
>>>>>>>>> count))
>>>>>>>>
>>>>>>>> Hm, the patchset for halt/clear handling I recently posted
>>>>>>>> changes this to a check for NOT_OPER || STANDBY. What do you
>>>>>>>> think of that option?
>>>>>>>>
>>>>>>>>         
>>>>>>> I am concerned with the return code that we send userspace. With
>>>>>>> the state machines we return an EIO for NOT_OPER or STANDBY, but
>>>>>>> we return EACCES in the early check. QEMU on an EACCES returns a
>>>>>>> 'not_oper' to the guest and for EIO will inject an interrupt.
>>>>>>
>>>>>> I actually think that not_oper is the saner option here: the
>>>>>> device is in a state on the vfio side where it is not usable...
>>>>>
>>>>> Agreed. Then we should the change the return codes with the state
>>>>> machine to EACCES as well, that could be a follow up patch to your
>>>>> series.
>>>>
>>>> Yes, let's do that.
>>>>
>>>> As you seem to be able to hit this problem with your workload, maybe
>>>> post it as a standalone fix? I can easily rebase my series on top of
>>>> that (and I assume that it will need more review than a simple fix.)
>>>>    
>>>
>>> I had an issue following the discussion so I chatted up Farhan. Turns
>>> out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I
>>> guess, a should normally not happen. I mean QEMU should not accept a new
>>> SSCH before the START_PENDING for the old one is cleared. And that would
>>> normally happen after the IO corresponding to the old SSCH is done. Well
>>> we do have our wonderful emulated CSCH/SSCH at the moment (which do clear
>>> the SCSW in QEMU but don't bother to terminate IO at the device). And we
>>> may or may not have bugs in all the SCSW handling/updating code as well.
>>
>> Well, that hopefully becomes better once we have proper halt/clear
>> passthrough. Any kind of emulation will continue to be problematic in
>> one way or the other.
>>
>>> The problem with this patch is that we poke private->io_region before
>>> calling the state machine stuff. That could race with the interrupt
>>> handler touching the same region. That could have been OK, if IRB in the
>>> region was read-only. But it is not, and QEMU zeroes it out on each
>>> write AFAICT. Changing the condition to NOT_OPER || STANDBY or
>>> omitting the check (at the moment) does not seem to be what I would call
>>> a fix.
>>
>> So, should this wait until after we rewrite the state machine and I/O
>> region concurrency?
>>
>> I'm not sure what the test setup exercises, but making the return code
>> (-EACCES vs. -EIO) consistent looks like a win to me.
>>

The test setup basically runs fio (sequential, random) read and write 
workloads for a long period of time. I run each workload at least for 5 
minutes and keep running them in a continuous loop for few hours.

Also I am testing with Hyper PAV, so I run the workloads with different 
number of HPAVs for a base device.

I agree, we can wait till we have a proper solution for concurrency.


> 
> Regarding the test setup Farhan is likely to be able to provide more
> info. And my intuition says, yes it should wait. I would also like to
> see the error codes, which are:
> * obviously a substantial part of the interface, and need to be mapped
>    to channel IO conditions, and thus
> * beyond usual file IO stuff (i.e. man 3 write won't give you the right
>    explanation for the error codes
> properly documented. @Farhan: Would you like to tackle this one?
>

Are you suggesting documenting the mapping of the error codes in the 
vfio-ccw doc? I could do that, but I fear the doc wouldn't be updated as 
often.

> Given what QEMU does with EIO and EACCESS, I agree, if NOT_OPER ||
> STANDBY EACCESS (that is cc 3 not operational) is more appropriate.
>

> Regards,
> Halil
> 
>
Halil Pasic Jan. 22, 2019, 4:35 p.m. UTC | #10
On Tue, 22 Jan 2019 10:19:29 -0500
Farhan Ali <alifm@linux.ibm.com> wrote:

> > I would also like to see the error codes, which are:
> > * obviously a substantial part of the interface, and need to be mapped
> >    to channel IO conditions, and thus
> > * beyond usual file IO stuff (i.e. man 3 write won't give you the right
> >    explanation for the error codes
> > properly documented. @Farhan: Would you like to tackle this one?
> >  
> 
> Are you suggesting documenting the mapping of the error codes in the 
> vfio-ccw doc? I could do that, but I fear the doc wouldn't be updated as 
> often.

The other option would be the uapi header file. We talk about an uapi
interface, which I hope we don't want to change/extend too often. Thus
my hope is that the documentation of the error codes won't need any
updating.

@Connie do you have an opinion?

Regards,
Halil
Cornelia Huck Jan. 22, 2019, 4:43 p.m. UTC | #11
On Tue, 22 Jan 2019 17:35:36 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 22 Jan 2019 10:19:29 -0500
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
> > > I would also like to see the error codes, which are:
> > > * obviously a substantial part of the interface, and need to be mapped
> > >    to channel IO conditions, and thus
> > > * beyond usual file IO stuff (i.e. man 3 write won't give you the right
> > >    explanation for the error codes
> > > properly documented. @Farhan: Would you like to tackle this one?
> > >    
> > 
> > Are you suggesting documenting the mapping of the error codes in the 
> > vfio-ccw doc? I could do that, but I fear the doc wouldn't be updated as 
> > often.  
> 
> The other option would be the uapi header file. We talk about an uapi
> interface, which I hope we don't want to change/extend too often. Thus
> my hope is that the documentation of the error codes won't need any
> updating.
> 
> @Connie do you have an opinion?

If I were to implement a userspace exploiter of this interface, I'd
probably look into the uapi header files.
Cornelia Huck Jan. 22, 2019, 4:46 p.m. UTC | #12
On Tue, 22 Jan 2019 10:19:29 -0500
Farhan Ali <alifm@linux.ibm.com> wrote:

> The test setup basically runs fio (sequential, random) read and write 
> workloads for a long period of time. I run each workload at least for 5 
> minutes and keep running them in a continuous loop for few hours.

Ok, this is more intensive testing than what I'm doing :) (basically
some dd and some tunedasd calls)

> 
> Also I am testing with Hyper PAV, so I run the workloads with different 
> number of HPAVs for a base device.

Ok, I don't have PAV in my setup.

Patch
diff mbox series

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index f673e10..3fdcc6d 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -193,8 +193,6 @@  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 		return -EINVAL;
 
 	private = dev_get_drvdata(mdev_parent_dev(mdev));
-	if (private->state != VFIO_CCW_STATE_IDLE)
-		return -EACCES;
 
 	region = private->io_region;
 	if (copy_from_user((void *)region + *ppos, buf, count))