Message ID | 61b7644d447debefcd36e030bce7df979b335405.1548082107.git.alifm@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Small vfio-ccw fixes | expand |
On Mon, 21 Jan 2019 09:54:09 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > This check is unecessary as we already have the vfio state machine > to handle I/O requests. > > On the other hand, this check returns incorrect information to > userspace if the state of the subchannel is not idle. For example > if the state is busy and new I/O request comes in, this will return > an EACCES, whereas we should return EBUSY. > > Signed-off-by: Farhan Ali <alifm@linux.ibm.com> > --- > drivers/s390/cio/vfio_ccw_ops.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c > index f673e10..3fdcc6d 100644 > --- a/drivers/s390/cio/vfio_ccw_ops.c > +++ b/drivers/s390/cio/vfio_ccw_ops.c > @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, > return -EINVAL; > > private = dev_get_drvdata(mdev_parent_dev(mdev)); > - if (private->state != VFIO_CCW_STATE_IDLE) > - return -EACCES; > > region = private->io_region; > if (copy_from_user((void *)region + *ppos, buf, count)) Hm, the patchset for halt/clear handling I recently posted changes this to a check for NOT_OPER || STANDBY. What do you think of that option?
On 01/21/2019 10:18 AM, Cornelia Huck wrote: > On Mon, 21 Jan 2019 09:54:09 -0500 > Farhan Ali <alifm@linux.ibm.com> wrote: > >> This check is unecessary as we already have the vfio state machine >> to handle I/O requests. >> >> On the other hand, this check returns incorrect information to >> userspace if the state of the subchannel is not idle. For example >> if the state is busy and new I/O request comes in, this will return >> an EACCES, whereas we should return EBUSY. >> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> >> --- >> drivers/s390/cio/vfio_ccw_ops.c | 2 -- >> 1 file changed, 2 deletions(-) >> >> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c >> index f673e10..3fdcc6d 100644 >> --- a/drivers/s390/cio/vfio_ccw_ops.c >> +++ b/drivers/s390/cio/vfio_ccw_ops.c >> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, >> return -EINVAL; >> >> private = dev_get_drvdata(mdev_parent_dev(mdev)); >> - if (private->state != VFIO_CCW_STATE_IDLE) >> - return -EACCES; >> >> region = private->io_region; >> if (copy_from_user((void *)region + *ppos, buf, count)) > > Hm, the patchset for halt/clear handling I recently posted changes this > to a check for NOT_OPER || STANDBY. What do you think of that option? > > I am concerned with the return code that we send userspace. With the state machines we return an EIO for NOT_OPER or STANDBY, but we return EACCES in the early check. QEMU on an EACCES returns a 'not_oper' to the guest and for EIO will inject an interrupt. I believe we should try to keep it consistent to make debugging errors easier :)
On Mon, 21 Jan 2019 10:48:32 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > On 01/21/2019 10:18 AM, Cornelia Huck wrote: > > On Mon, 21 Jan 2019 09:54:09 -0500 > > Farhan Ali <alifm@linux.ibm.com> wrote: > > > >> This check is unecessary as we already have the vfio state machine > >> to handle I/O requests. > >> > >> On the other hand, this check returns incorrect information to > >> userspace if the state of the subchannel is not idle. For example > >> if the state is busy and new I/O request comes in, this will return > >> an EACCES, whereas we should return EBUSY. > >> > >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> > >> --- > >> drivers/s390/cio/vfio_ccw_ops.c | 2 -- > >> 1 file changed, 2 deletions(-) > >> > >> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c > >> index f673e10..3fdcc6d 100644 > >> --- a/drivers/s390/cio/vfio_ccw_ops.c > >> +++ b/drivers/s390/cio/vfio_ccw_ops.c > >> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, > >> return -EINVAL; > >> > >> private = dev_get_drvdata(mdev_parent_dev(mdev)); > >> - if (private->state != VFIO_CCW_STATE_IDLE) > >> - return -EACCES; > >> > >> region = private->io_region; > >> if (copy_from_user((void *)region + *ppos, buf, count)) > > > > Hm, the patchset for halt/clear handling I recently posted changes this > > to a check for NOT_OPER || STANDBY. What do you think of that option? > > > > > I am concerned with the return code that we send userspace. With the > state machines we return an EIO for NOT_OPER or STANDBY, but we return > EACCES in the early check. QEMU on an EACCES returns a 'not_oper' to the > guest and for EIO will inject an interrupt. I actually think that not_oper is the saner option here: the device is in a state on the vfio side where it is not usable... > > I believe we should try to keep it consistent to make debugging errors > easier :) I certainly agree with that :)
On 01/21/2019 10:52 AM, Cornelia Huck wrote: > On Mon, 21 Jan 2019 10:48:32 -0500 > Farhan Ali <alifm@linux.ibm.com> wrote: > >> On 01/21/2019 10:18 AM, Cornelia Huck wrote: >>> On Mon, 21 Jan 2019 09:54:09 -0500 >>> Farhan Ali <alifm@linux.ibm.com> wrote: >>> >>>> This check is unecessary as we already have the vfio state machine >>>> to handle I/O requests. >>>> >>>> On the other hand, this check returns incorrect information to >>>> userspace if the state of the subchannel is not idle. For example >>>> if the state is busy and new I/O request comes in, this will return >>>> an EACCES, whereas we should return EBUSY. >>>> >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> >>>> --- >>>> drivers/s390/cio/vfio_ccw_ops.c | 2 -- >>>> 1 file changed, 2 deletions(-) >>>> >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c >>>> index f673e10..3fdcc6d 100644 >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, >>>> return -EINVAL; >>>> >>>> private = dev_get_drvdata(mdev_parent_dev(mdev)); >>>> - if (private->state != VFIO_CCW_STATE_IDLE) >>>> - return -EACCES; >>>> >>>> region = private->io_region; >>>> if (copy_from_user((void *)region + *ppos, buf, count)) >>> >>> Hm, the patchset for halt/clear handling I recently posted changes this >>> to a check for NOT_OPER || STANDBY. What do you think of that option? >>> >>> >> I am concerned with the return code that we send userspace. With the >> state machines we return an EIO for NOT_OPER or STANDBY, but we return >> EACCES in the early check. QEMU on an EACCES returns a 'not_oper' to the >> guest and for EIO will inject an interrupt. > > I actually think that not_oper is the saner option here: the device is > in a state on the vfio side where it is not usable... Agreed. Then we should the change the return codes with the state machine to EACCES as well, that could be a follow up patch to your series. > >> >> I believe we should try to keep it consistent to make debugging errors >> easier :) > > I certainly agree with that :) > >
On Mon, 21 Jan 2019 11:14:16 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > On 01/21/2019 10:52 AM, Cornelia Huck wrote: > > On Mon, 21 Jan 2019 10:48:32 -0500 > > Farhan Ali <alifm@linux.ibm.com> wrote: > > > >> On 01/21/2019 10:18 AM, Cornelia Huck wrote: > >>> On Mon, 21 Jan 2019 09:54:09 -0500 > >>> Farhan Ali <alifm@linux.ibm.com> wrote: > >>> > >>>> This check is unecessary as we already have the vfio state machine > >>>> to handle I/O requests. > >>>> > >>>> On the other hand, this check returns incorrect information to > >>>> userspace if the state of the subchannel is not idle. For example > >>>> if the state is busy and new I/O request comes in, this will return > >>>> an EACCES, whereas we should return EBUSY. > >>>> > >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> > >>>> --- > >>>> drivers/s390/cio/vfio_ccw_ops.c | 2 -- > >>>> 1 file changed, 2 deletions(-) > >>>> > >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c > >>>> index f673e10..3fdcc6d 100644 > >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c > >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c > >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, > >>>> return -EINVAL; > >>>> > >>>> private = dev_get_drvdata(mdev_parent_dev(mdev)); > >>>> - if (private->state != VFIO_CCW_STATE_IDLE) > >>>> - return -EACCES; > >>>> > >>>> region = private->io_region; > >>>> if (copy_from_user((void *)region + *ppos, buf, count)) > >>> > >>> Hm, the patchset for halt/clear handling I recently posted changes this > >>> to a check for NOT_OPER || STANDBY. What do you think of that option? > >>> > >>> > >> I am concerned with the return code that we send userspace. With the > >> state machines we return an EIO for NOT_OPER or STANDBY, but we return > >> EACCES in the early check. QEMU on an EACCES returns a 'not_oper' to the > >> guest and for EIO will inject an interrupt. > > > > I actually think that not_oper is the saner option here: the device is > > in a state on the vfio side where it is not usable... > > Agreed. Then we should the change the return codes with the state > machine to EACCES as well, that could be a follow up patch to your series. Yes, let's do that. As you seem to be able to hit this problem with your workload, maybe post it as a standalone fix? I can easily rebase my series on top of that (and I assume that it will need more review than a simple fix.)
On Mon, 21 Jan 2019 17:55:39 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > On Mon, 21 Jan 2019 11:14:16 -0500 > Farhan Ali <alifm@linux.ibm.com> wrote: > > > On 01/21/2019 10:52 AM, Cornelia Huck wrote: > > > On Mon, 21 Jan 2019 10:48:32 -0500 > > > Farhan Ali <alifm@linux.ibm.com> wrote: > > > > > >> On 01/21/2019 10:18 AM, Cornelia Huck wrote: > > >>> On Mon, 21 Jan 2019 09:54:09 -0500 > > >>> Farhan Ali <alifm@linux.ibm.com> wrote: > > >>> > > >>>> This check is unecessary as we already have the vfio state machine > > >>>> to handle I/O requests. > > >>>> > > >>>> On the other hand, this check returns incorrect information to > > >>>> userspace if the state of the subchannel is not idle. For example > > >>>> if the state is busy and new I/O request comes in, this will return > > >>>> an EACCES, whereas we should return EBUSY. > > >>>> > > >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> > > >>>> --- > > >>>> drivers/s390/cio/vfio_ccw_ops.c | 2 -- > > >>>> 1 file changed, 2 deletions(-) > > >>>> > > >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c > > >>>> index f673e10..3fdcc6d 100644 > > >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c > > >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c > > >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, > > >>>> return -EINVAL; > > >>>> > > >>>> private = dev_get_drvdata(mdev_parent_dev(mdev)); > > >>>> - if (private->state != VFIO_CCW_STATE_IDLE) > > >>>> - return -EACCES; > > >>>> > > >>>> region = private->io_region; > > >>>> if (copy_from_user((void *)region + *ppos, buf, > > >>>> count)) > > >>> > > >>> Hm, the patchset for halt/clear handling I recently posted > > >>> changes this to a check for NOT_OPER || STANDBY. What do you > > >>> think of that option? > > >>> > > >>> > > >> I am concerned with the return code that we send userspace. With > > >> the state machines we return an EIO for NOT_OPER or STANDBY, but > > >> we return EACCES in the early check. QEMU on an EACCES returns a > > >> 'not_oper' to the guest and for EIO will inject an interrupt. > > > > > > I actually think that not_oper is the saner option here: the > > > device is in a state on the vfio side where it is not usable... > > > > Agreed. Then we should the change the return codes with the state > > machine to EACCES as well, that could be a follow up patch to your > > series. > > Yes, let's do that. > > As you seem to be able to hit this problem with your workload, maybe > post it as a standalone fix? I can easily rebase my series on top of > that (and I assume that it will need more review than a simple fix.) > I had an issue following the discussion so I chatted up Farhan. Turns out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I guess, a should normally not happen. I mean QEMU should not accept a new SSCH before the START_PENDING for the old one is cleared. And that would normally happen after the IO corresponding to the old SSCH is done. Well we do have our wonderful emulated CSCH/SSCH at the moment (which do clear the SCSW in QEMU but don't bother to terminate IO at the device). And we may or may not have bugs in all the SCSW handling/updating code as well. The problem with this patch is that we poke private->io_region before calling the state machine stuff. That could race with the interrupt handler touching the same region. That could have been OK, if IRB in the region was read-only. But it is not, and QEMU zeroes it out on each write AFAICT. Changing the condition to NOT_OPER || STANDBY or omitting the check (at the moment) does not seem to be what I would call a fix. Regards, Halil
On Mon, 21 Jan 2019 21:10:32 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Mon, 21 Jan 2019 17:55:39 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > On Mon, 21 Jan 2019 11:14:16 -0500 > > Farhan Ali <alifm@linux.ibm.com> wrote: > > > > > On 01/21/2019 10:52 AM, Cornelia Huck wrote: > > > > On Mon, 21 Jan 2019 10:48:32 -0500 > > > > Farhan Ali <alifm@linux.ibm.com> wrote: > > > > > > > >> On 01/21/2019 10:18 AM, Cornelia Huck wrote: > > > >>> On Mon, 21 Jan 2019 09:54:09 -0500 > > > >>> Farhan Ali <alifm@linux.ibm.com> wrote: > > > >>> > > > >>>> This check is unecessary as we already have the vfio state machine > > > >>>> to handle I/O requests. > > > >>>> > > > >>>> On the other hand, this check returns incorrect information to > > > >>>> userspace if the state of the subchannel is not idle. For example > > > >>>> if the state is busy and new I/O request comes in, this will return > > > >>>> an EACCES, whereas we should return EBUSY. > > > >>>> > > > >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> > > > >>>> --- > > > >>>> drivers/s390/cio/vfio_ccw_ops.c | 2 -- > > > >>>> 1 file changed, 2 deletions(-) > > > >>>> > > > >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c > > > >>>> index f673e10..3fdcc6d 100644 > > > >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c > > > >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c > > > >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, > > > >>>> return -EINVAL; > > > >>>> > > > >>>> private = dev_get_drvdata(mdev_parent_dev(mdev)); > > > >>>> - if (private->state != VFIO_CCW_STATE_IDLE) > > > >>>> - return -EACCES; > > > >>>> > > > >>>> region = private->io_region; > > > >>>> if (copy_from_user((void *)region + *ppos, buf, > > > >>>> count)) > > > >>> > > > >>> Hm, the patchset for halt/clear handling I recently posted > > > >>> changes this to a check for NOT_OPER || STANDBY. What do you > > > >>> think of that option? > > > >>> > > > >>> > > > >> I am concerned with the return code that we send userspace. With > > > >> the state machines we return an EIO for NOT_OPER or STANDBY, but > > > >> we return EACCES in the early check. QEMU on an EACCES returns a > > > >> 'not_oper' to the guest and for EIO will inject an interrupt. > > > > > > > > I actually think that not_oper is the saner option here: the > > > > device is in a state on the vfio side where it is not usable... > > > > > > Agreed. Then we should the change the return codes with the state > > > machine to EACCES as well, that could be a follow up patch to your > > > series. > > > > Yes, let's do that. > > > > As you seem to be able to hit this problem with your workload, maybe > > post it as a standalone fix? I can easily rebase my series on top of > > that (and I assume that it will need more review than a simple fix.) > > > > I had an issue following the discussion so I chatted up Farhan. Turns > out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I > guess, a should normally not happen. I mean QEMU should not accept a new > SSCH before the START_PENDING for the old one is cleared. And that would > normally happen after the IO corresponding to the old SSCH is done. Well > we do have our wonderful emulated CSCH/SSCH at the moment (which do clear > the SCSW in QEMU but don't bother to terminate IO at the device). And we > may or may not have bugs in all the SCSW handling/updating code as well. Well, that hopefully becomes better once we have proper halt/clear passthrough. Any kind of emulation will continue to be problematic in one way or the other. > The problem with this patch is that we poke private->io_region before > calling the state machine stuff. That could race with the interrupt > handler touching the same region. That could have been OK, if IRB in the > region was read-only. But it is not, and QEMU zeroes it out on each > write AFAICT. Changing the condition to NOT_OPER || STANDBY or > omitting the check (at the moment) does not seem to be what I would call > a fix. So, should this wait until after we rewrite the state machine and I/O region concurrency? I'm not sure what the test setup exercises, but making the return code (-EACCES vs. -EIO) consistent looks like a win to me.
On Tue, 22 Jan 2019 11:20:41 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > On Mon, 21 Jan 2019 21:10:32 +0100 > Halil Pasic <pasic@linux.ibm.com> wrote: > > > On Mon, 21 Jan 2019 17:55:39 +0100 > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > On Mon, 21 Jan 2019 11:14:16 -0500 > > > Farhan Ali <alifm@linux.ibm.com> wrote: > > > > > > > On 01/21/2019 10:52 AM, Cornelia Huck wrote: > > > > > On Mon, 21 Jan 2019 10:48:32 -0500 > > > > > Farhan Ali <alifm@linux.ibm.com> wrote: > > > > > > > > > >> On 01/21/2019 10:18 AM, Cornelia Huck wrote: > > > > >>> On Mon, 21 Jan 2019 09:54:09 -0500 > > > > >>> Farhan Ali <alifm@linux.ibm.com> wrote: > > > > >>> > > > > >>>> This check is unecessary as we already have the vfio state machine > > > > >>>> to handle I/O requests. > > > > >>>> > > > > >>>> On the other hand, this check returns incorrect information to > > > > >>>> userspace if the state of the subchannel is not idle. For example > > > > >>>> if the state is busy and new I/O request comes in, this will return > > > > >>>> an EACCES, whereas we should return EBUSY. > > > > >>>> > > > > >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> > > > > >>>> --- > > > > >>>> drivers/s390/cio/vfio_ccw_ops.c | 2 -- > > > > >>>> 1 file changed, 2 deletions(-) > > > > >>>> > > > > >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c > > > > >>>> index f673e10..3fdcc6d 100644 > > > > >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c > > > > >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c > > > > >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, > > > > >>>> return -EINVAL; > > > > >>>> > > > > >>>> private = dev_get_drvdata(mdev_parent_dev(mdev)); > > > > >>>> - if (private->state != VFIO_CCW_STATE_IDLE) > > > > >>>> - return -EACCES; > > > > >>>> > > > > >>>> region = private->io_region; > > > > >>>> if (copy_from_user((void *)region + *ppos, buf, > > > > >>>> count)) > > > > >>> > > > > >>> Hm, the patchset for halt/clear handling I recently posted > > > > >>> changes this to a check for NOT_OPER || STANDBY. What do you > > > > >>> think of that option? > > > > >>> > > > > >>> > > > > >> I am concerned with the return code that we send userspace. With > > > > >> the state machines we return an EIO for NOT_OPER or STANDBY, but > > > > >> we return EACCES in the early check. QEMU on an EACCES returns a > > > > >> 'not_oper' to the guest and for EIO will inject an interrupt. > > > > > > > > > > I actually think that not_oper is the saner option here: the > > > > > device is in a state on the vfio side where it is not usable... > > > > > > > > Agreed. Then we should the change the return codes with the state > > > > machine to EACCES as well, that could be a follow up patch to your > > > > series. > > > > > > Yes, let's do that. > > > > > > As you seem to be able to hit this problem with your workload, maybe > > > post it as a standalone fix? I can easily rebase my series on top of > > > that (and I assume that it will need more review than a simple fix.) > > > > > > > I had an issue following the discussion so I chatted up Farhan. Turns > > out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I > > guess, a should normally not happen. I mean QEMU should not accept a new > > SSCH before the START_PENDING for the old one is cleared. And that would > > normally happen after the IO corresponding to the old SSCH is done. Well > > we do have our wonderful emulated CSCH/SSCH at the moment (which do clear > > the SCSW in QEMU but don't bother to terminate IO at the device). And we > > may or may not have bugs in all the SCSW handling/updating code as well. > > Well, that hopefully becomes better once we have proper halt/clear > passthrough. Any kind of emulation will continue to be problematic in > one way or the other. > > > The problem with this patch is that we poke private->io_region before > > calling the state machine stuff. That could race with the interrupt > > handler touching the same region. That could have been OK, if IRB in the > > region was read-only. But it is not, and QEMU zeroes it out on each > > write AFAICT. Changing the condition to NOT_OPER || STANDBY or > > omitting the check (at the moment) does not seem to be what I would call > > a fix. > > So, should this wait until after we rewrite the state machine and I/O > region concurrency? > > I'm not sure what the test setup exercises, but making the return code > (-EACCES vs. -EIO) consistent looks like a win to me. > Regarding the test setup Farhan is likely to be able to provide more info. And my intuition says, yes it should wait. I would also like to see the error codes, which are: * obviously a substantial part of the interface, and need to be mapped to channel IO conditions, and thus * beyond usual file IO stuff (i.e. man 3 write won't give you the right explanation for the error codes properly documented. @Farhan: Would you like to tackle this one? Given what QEMU does with EIO and EACCESS, I agree, if NOT_OPER || STANDBY EACCESS (that is cc 3 not operational) is more appropriate. Regards, Halil
On 01/22/2019 06:06 AM, Halil Pasic wrote: > On Tue, 22 Jan 2019 11:20:41 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > >> On Mon, 21 Jan 2019 21:10:32 +0100 >> Halil Pasic <pasic@linux.ibm.com> wrote: >> >>> On Mon, 21 Jan 2019 17:55:39 +0100 >>> Cornelia Huck <cohuck@redhat.com> wrote: >>> >>>> On Mon, 21 Jan 2019 11:14:16 -0500 >>>> Farhan Ali <alifm@linux.ibm.com> wrote: >>>> >>>>> On 01/21/2019 10:52 AM, Cornelia Huck wrote: >>>>>> On Mon, 21 Jan 2019 10:48:32 -0500 >>>>>> Farhan Ali <alifm@linux.ibm.com> wrote: >>>>>> >>>>>>> On 01/21/2019 10:18 AM, Cornelia Huck wrote: >>>>>>>> On Mon, 21 Jan 2019 09:54:09 -0500 >>>>>>>> Farhan Ali <alifm@linux.ibm.com> wrote: >>>>>>>> >>>>>>>>> This check is unecessary as we already have the vfio state machine >>>>>>>>> to handle I/O requests. >>>>>>>>> >>>>>>>>> On the other hand, this check returns incorrect information to >>>>>>>>> userspace if the state of the subchannel is not idle. For example >>>>>>>>> if the state is busy and new I/O request comes in, this will return >>>>>>>>> an EACCES, whereas we should return EBUSY. >>>>>>>>> >>>>>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> >>>>>>>>> --- >>>>>>>>> drivers/s390/cio/vfio_ccw_ops.c | 2 -- >>>>>>>>> 1 file changed, 2 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c >>>>>>>>> index f673e10..3fdcc6d 100644 >>>>>>>>> --- a/drivers/s390/cio/vfio_ccw_ops.c >>>>>>>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c >>>>>>>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, >>>>>>>>> return -EINVAL; >>>>>>>>> >>>>>>>>> private = dev_get_drvdata(mdev_parent_dev(mdev)); >>>>>>>>> - if (private->state != VFIO_CCW_STATE_IDLE) >>>>>>>>> - return -EACCES; >>>>>>>>> >>>>>>>>> region = private->io_region; >>>>>>>>> if (copy_from_user((void *)region + *ppos, buf, >>>>>>>>> count)) >>>>>>>> >>>>>>>> Hm, the patchset for halt/clear handling I recently posted >>>>>>>> changes this to a check for NOT_OPER || STANDBY. What do you >>>>>>>> think of that option? >>>>>>>> >>>>>>>> >>>>>>> I am concerned with the return code that we send userspace. With >>>>>>> the state machines we return an EIO for NOT_OPER or STANDBY, but >>>>>>> we return EACCES in the early check. QEMU on an EACCES returns a >>>>>>> 'not_oper' to the guest and for EIO will inject an interrupt. >>>>>> >>>>>> I actually think that not_oper is the saner option here: the >>>>>> device is in a state on the vfio side where it is not usable... >>>>> >>>>> Agreed. Then we should the change the return codes with the state >>>>> machine to EACCES as well, that could be a follow up patch to your >>>>> series. >>>> >>>> Yes, let's do that. >>>> >>>> As you seem to be able to hit this problem with your workload, maybe >>>> post it as a standalone fix? I can easily rebase my series on top of >>>> that (and I assume that it will need more review than a simple fix.) >>>> >>> >>> I had an issue following the discussion so I chatted up Farhan. Turns >>> out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I >>> guess, a should normally not happen. I mean QEMU should not accept a new >>> SSCH before the START_PENDING for the old one is cleared. And that would >>> normally happen after the IO corresponding to the old SSCH is done. Well >>> we do have our wonderful emulated CSCH/SSCH at the moment (which do clear >>> the SCSW in QEMU but don't bother to terminate IO at the device). And we >>> may or may not have bugs in all the SCSW handling/updating code as well. >> >> Well, that hopefully becomes better once we have proper halt/clear >> passthrough. Any kind of emulation will continue to be problematic in >> one way or the other. >> >>> The problem with this patch is that we poke private->io_region before >>> calling the state machine stuff. That could race with the interrupt >>> handler touching the same region. That could have been OK, if IRB in the >>> region was read-only. But it is not, and QEMU zeroes it out on each >>> write AFAICT. Changing the condition to NOT_OPER || STANDBY or >>> omitting the check (at the moment) does not seem to be what I would call >>> a fix. >> >> So, should this wait until after we rewrite the state machine and I/O >> region concurrency? >> >> I'm not sure what the test setup exercises, but making the return code >> (-EACCES vs. -EIO) consistent looks like a win to me. >> The test setup basically runs fio (sequential, random) read and write workloads for a long period of time. I run each workload at least for 5 minutes and keep running them in a continuous loop for few hours. Also I am testing with Hyper PAV, so I run the workloads with different number of HPAVs for a base device. I agree, we can wait till we have a proper solution for concurrency. > > Regarding the test setup Farhan is likely to be able to provide more > info. And my intuition says, yes it should wait. I would also like to > see the error codes, which are: > * obviously a substantial part of the interface, and need to be mapped > to channel IO conditions, and thus > * beyond usual file IO stuff (i.e. man 3 write won't give you the right > explanation for the error codes > properly documented. @Farhan: Would you like to tackle this one? > Are you suggesting documenting the mapping of the error codes in the vfio-ccw doc? I could do that, but I fear the doc wouldn't be updated as often. > Given what QEMU does with EIO and EACCESS, I agree, if NOT_OPER || > STANDBY EACCESS (that is cc 3 not operational) is more appropriate. > > Regards, > Halil > >
On Tue, 22 Jan 2019 10:19:29 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > > I would also like to see the error codes, which are: > > * obviously a substantial part of the interface, and need to be mapped > > to channel IO conditions, and thus > > * beyond usual file IO stuff (i.e. man 3 write won't give you the right > > explanation for the error codes > > properly documented. @Farhan: Would you like to tackle this one? > > > > Are you suggesting documenting the mapping of the error codes in the > vfio-ccw doc? I could do that, but I fear the doc wouldn't be updated as > often. The other option would be the uapi header file. We talk about an uapi interface, which I hope we don't want to change/extend too often. Thus my hope is that the documentation of the error codes won't need any updating. @Connie do you have an opinion? Regards, Halil
On Tue, 22 Jan 2019 17:35:36 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Tue, 22 Jan 2019 10:19:29 -0500 > Farhan Ali <alifm@linux.ibm.com> wrote: > > > > I would also like to see the error codes, which are: > > > * obviously a substantial part of the interface, and need to be mapped > > > to channel IO conditions, and thus > > > * beyond usual file IO stuff (i.e. man 3 write won't give you the right > > > explanation for the error codes > > > properly documented. @Farhan: Would you like to tackle this one? > > > > > > > Are you suggesting documenting the mapping of the error codes in the > > vfio-ccw doc? I could do that, but I fear the doc wouldn't be updated as > > often. > > The other option would be the uapi header file. We talk about an uapi > interface, which I hope we don't want to change/extend too often. Thus > my hope is that the documentation of the error codes won't need any > updating. > > @Connie do you have an opinion? If I were to implement a userspace exploiter of this interface, I'd probably look into the uapi header files.
On Tue, 22 Jan 2019 10:19:29 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > The test setup basically runs fio (sequential, random) read and write > workloads for a long period of time. I run each workload at least for 5 > minutes and keep running them in a continuous loop for few hours. Ok, this is more intensive testing than what I'm doing :) (basically some dd and some tunedasd calls) > > Also I am testing with Hyper PAV, so I run the workloads with different > number of HPAVs for a base device. Ok, I don't have PAV in my setup.
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c index f673e10..3fdcc6d 100644 --- a/drivers/s390/cio/vfio_ccw_ops.c +++ b/drivers/s390/cio/vfio_ccw_ops.c @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, return -EINVAL; private = dev_get_drvdata(mdev_parent_dev(mdev)); - if (private->state != VFIO_CCW_STATE_IDLE) - return -EACCES; region = private->io_region; if (copy_from_user((void *)region + *ppos, buf, count))
This check is unecessary as we already have the vfio state machine to handle I/O requests. On the other hand, this check returns incorrect information to userspace if the state of the subchannel is not idle. For example if the state is busy and new I/O request comes in, this will return an EACCES, whereas we should return EBUSY. Signed-off-by: Farhan Ali <alifm@linux.ibm.com> --- drivers/s390/cio/vfio_ccw_ops.c | 2 -- 1 file changed, 2 deletions(-)