diff mbox

[04/10] vfio: ccw: replace IO_REQ event with SSCH_REQ event

Message ID 1524149293-12658-5-git-send-email-pmorel@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Pierre Morel April 19, 2018, 2:48 p.m. UTC
This patch simplifies the IO request handling to handle the only
implemented request: SSCH.
Other request are invalid and get a return value of -EINVAL.

This patch change the event name to VFIO_CCW_EVENT_SSCH_REQ to reflect
what is done and prepare for future implementation of other requests.

Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_fsm.c     | 63 +++++++++++++------------------------
 drivers/s390/cio/vfio_ccw_ops.c     |  9 ++++--
 drivers/s390/cio/vfio_ccw_private.h |  2 +-
 3 files changed, 29 insertions(+), 45 deletions(-)

Comments

Cornelia Huck April 25, 2018, 8:41 a.m. UTC | #1
On Thu, 19 Apr 2018 16:48:07 +0200
Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:

> This patch simplifies the IO request handling to handle the only
> implemented request: SSCH.

I *really* need to post my halt/clear patches soon, I think.

> Other request are invalid and get a return value of -EINVAL.

This is an user api change: We got -EOPNOTSUPP in the region's return
code before.

> 
> This patch change the event name to VFIO_CCW_EVENT_SSCH_REQ to reflect
> what is done and prepare for future implementation of other requests.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_fsm.c     | 63 +++++++++++++------------------------
>  drivers/s390/cio/vfio_ccw_ops.c     |  9 ++++--
>  drivers/s390/cio/vfio_ccw_private.h |  2 +-
>  3 files changed, 29 insertions(+), 45 deletions(-)

> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index 41eeb57..4da7b61 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -188,19 +188,22 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>  {
>  	struct vfio_ccw_private *private;
>  	struct ccw_io_region *region;
> +	union scsw *scsw;
>  
>  	if (*ppos + count > sizeof(*region))
>  		return -EINVAL;
>  
>  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> -	if (private->state != VFIO_CCW_STATE_IDLE)
> -		return -EACCES;
>  
>  	region = &private->io_region;
>  	if (copy_from_user((void *)region + *ppos, buf, count))
>  		return -EFAULT;
>  
> -	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
> +	scsw = (union scsw *) &region->scsw_area;
> +	if ((scsw->cmd.fctl & SCSW_FCTL_START_FUNC) != SCSW_FCTL_START_FUNC)

You should not allow the halt/clear functions to be specified, if you
go that route. The precedence order needs to be clear -> halt -> start.

> +		return -EINVAL;

As said, that's a user api change. Previously, user space could detect
whether halt/clear are supported or not by simply specifying the
halt/clear function and checking for -EOPNOTSUPP in the region's return
code. Now they get -EINVAL (which I think is not a good return code,
even if we did not have the api breakage).

> +
> +	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_SSCH_REQ);
>  	if (region->ret_code != 0) {
>  		private->state = VFIO_CCW_STATE_IDLE;
>  		return region->ret_code;
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index 3284e64..93aab87 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -76,7 +76,7 @@ enum vfio_ccw_state {
>   */
>  enum vfio_ccw_event {
>  	VFIO_CCW_EVENT_NOT_OPER,
> -	VFIO_CCW_EVENT_IO_REQ,
> +	VFIO_CCW_EVENT_SSCH_REQ,
>  	VFIO_CCW_EVENT_INTERRUPT,
>  	VFIO_CCW_EVENT_SCH_EVENT,
>  	/* last element! */

I don't think we should separate the ssch handling. The major
difference to halt/clear is that it needs channel program translation.
Everything else (issuing the instruction and processing the interrupt)
are basically the same. If we just throw everything at the hardware and
let the host's channel subsystem figure it out, we already should be
fine with regard to most of the races.
Cornelia Huck April 30, 2018, 3:30 p.m. UTC | #2
On Wed, 25 Apr 2018 15:52:19 +0200
Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:

> On 25/04/2018 10:41, Cornelia Huck wrote:
> > On Thu, 19 Apr 2018 16:48:07 +0200
> > Pierre Morel<pmorel@linux.vnet.ibm.com>  wrote:

> >> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> >> index 3284e64..93aab87 100644
> >> --- a/drivers/s390/cio/vfio_ccw_private.h
> >> +++ b/drivers/s390/cio/vfio_ccw_private.h
> >> @@ -76,7 +76,7 @@ enum vfio_ccw_state {
> >>    */
> >>   enum vfio_ccw_event {
> >>   	VFIO_CCW_EVENT_NOT_OPER,
> >> -	VFIO_CCW_EVENT_IO_REQ,
> >> +	VFIO_CCW_EVENT_SSCH_REQ,
> >>   	VFIO_CCW_EVENT_INTERRUPT,
> >>   	VFIO_CCW_EVENT_SCH_EVENT,
> >>   	/* last element! */  
> > I don't think we should separate the ssch handling. The major 
> > difference to halt/clear is that it needs channel program translation. 
> > Everything else (issuing the instruction and processing the interrupt) 
> > are basically the same. If we just throw everything at the hardware 
> > and let the host's channel subsystem figure it out, we already should 
> > be fine with regard to most of the races.  
> 
> We must test at a moment or another the kind of request we do,
> cancel, halt and clear only need the subchannel id in register 1 and as 
> you said are much more direct to implement.
> 
> If we do not separate them here, we need a switch in the "do_io_request" 
> function.
> Is it what you mean?

Yes. Most of the handling should be the same for any function.
Cornelia Huck April 30, 2018, 3:33 p.m. UTC | #3
On Thu, 26 Apr 2018 15:48:06 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2018-04-26 15:30:54 +0800]:
> 
> [...]
> 
> > > @@ -179,7 +160,7 @@ static int fsm_irq(struct vfio_ccw_private *private,
> > >  	if (private->io_trigger)
> > >  		eventfd_signal(private->io_trigger, 1);
> > > 
> > > -	return private->state;
> > > +	return VFIO_CCW_STATE_IDLE;  
> > This is not right. For example, if we are in STANDBY state (subch driver
> > is probed, but mdev device is not created), we can not jump to IDLE
> > state.
> >   
> I see my problem, for STANDBY state, we should introduce another event
> callback for VFIO_CCW_EVENT_INTERRUPT. It doesn't make sense to call
> fsm_irq() which tries to signal userspace with interrupt notification
> when mdev is not created yet... So we'd need a separated fix for this
> issue too.

But how do we even get into that situation when we don't have an mdev
yet?
Cornelia Huck May 2, 2018, 8:22 a.m. UTC | #4
On Wed, 2 May 2018 15:46:22 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Cornelia Huck <cohuck@redhat.com> [2018-04-30 17:33:05 +0200]:
> 
> > On Thu, 26 Apr 2018 15:48:06 +0800
> > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> >   
> > > * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2018-04-26 15:30:54 +0800]:
> > > 
> > > [...]
> > >   
> > > > > @@ -179,7 +160,7 @@ static int fsm_irq(struct vfio_ccw_private *private,
> > > > >  	if (private->io_trigger)
> > > > >  		eventfd_signal(private->io_trigger, 1);
> > > > > 
> > > > > -	return private->state;
> > > > > +	return VFIO_CCW_STATE_IDLE;    
> > > > This is not right. For example, if we are in STANDBY state (subch driver
> > > > is probed, but mdev device is not created), we can not jump to IDLE
> > > > state.
> > > >     
> > > I see my problem, for STANDBY state, we should introduce another event
> > > callback for VFIO_CCW_EVENT_INTERRUPT. It doesn't make sense to call
> > > fsm_irq() which tries to signal userspace with interrupt notification
> > > when mdev is not created yet... So we'd need a separated fix for this
> > > issue too.  
> > 
> > But how do we even get into that situation when we don't have an mdev
> > yet?
> >   
> We cann't... So let's assign fsm_nop() as the interrupt callback for
> STANDBY state?

Either that, or have a special fsm_should_not_happen() function that
can pop out a trace message and then continue to do nothing.
Pierre Morel May 3, 2018, 12:06 p.m. UTC | #5
On 30/04/2018 17:30, Cornelia Huck wrote:
> On Wed, 25 Apr 2018 15:52:19 +0200
> Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:
>
>> On 25/04/2018 10:41, Cornelia Huck wrote:
>>> On Thu, 19 Apr 2018 16:48:07 +0200
>>> Pierre Morel<pmorel@linux.vnet.ibm.com>  wrote:
>>>> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
>>>> index 3284e64..93aab87 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_private.h
>>>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>>>> @@ -76,7 +76,7 @@ enum vfio_ccw_state {
>>>>     */
>>>>    enum vfio_ccw_event {
>>>>    	VFIO_CCW_EVENT_NOT_OPER,
>>>> -	VFIO_CCW_EVENT_IO_REQ,
>>>> +	VFIO_CCW_EVENT_SSCH_REQ,
>>>>    	VFIO_CCW_EVENT_INTERRUPT,
>>>>    	VFIO_CCW_EVENT_SCH_EVENT,
>>>>    	/* last element! */
>>> I don't think we should separate the ssch handling. The major
>>> difference to halt/clear is that it needs channel program translation.
>>> Everything else (issuing the instruction and processing the interrupt)
>>> are basically the same. If we just throw everything at the hardware
>>> and let the host's channel subsystem figure it out, we already should
>>> be fine with regard to most of the races.
>> We must test at a moment or another the kind of request we do,
>> cancel, halt and clear only need the subchannel id in register 1 and as
>> you said are much more direct to implement.
>>
>> If we do not separate them here, we need a switch in the "do_io_request"
>> function.
>> Is it what you mean?
> Yes. Most of the handling should be the same for any function.

I really don't know, the 4 functions are quite different.

- SSCH uses an ORB, and has a quite long kernel execution time for VFIO
- there is a race between SSCH and the others instructions
- XSCH makes subchannel no longer start pending, also reset the busy 
indications
- CSCH cancels both SSCH and HSCH instruction, and perform path management
- HSCH has different busy (entry) conditions

But since they are not implemented today, I can keep the old name.
Pierre Morel May 3, 2018, 2:26 p.m. UTC | #6
On 02/05/2018 09:46, Dong Jia Shi wrote:
> * Cornelia Huck <cohuck@redhat.com> [2018-04-30 17:33:05 +0200]:
>
>> On Thu, 26 Apr 2018 15:48:06 +0800
>> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
>>
>>> * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2018-04-26 15:30:54 +0800]:
>>>
>>> [...]
>>>
>>>>> @@ -179,7 +160,7 @@ static int fsm_irq(struct vfio_ccw_private *private,
>>>>>   	if (private->io_trigger)
>>>>>   		eventfd_signal(private->io_trigger, 1);
>>>>>
>>>>> -	return private->state;
>>>>> +	return VFIO_CCW_STATE_IDLE;
>>>> This is not right. For example, if we are in STANDBY state (subch driver
>>>> is probed, but mdev device is not created), we can not jump to IDLE
>>>> state.
>>>>    
>>> I see my problem, for STANDBY state, we should introduce another event
>>> callback for VFIO_CCW_EVENT_INTERRUPT. It doesn't make sense to call
>>> fsm_irq() which tries to signal userspace with interrupt notification
>>> when mdev is not created yet... So we'd need a separated fix for this
>>> issue too.
>> But how do we even get into that situation when we don't have an mdev
>> yet?
>>
> We cann't... So let's assign fsm_nop() as the interrupt callback for
> STANDBY state?
>

:) Isn't it exactly what my patch series handle?
Pierre Morel May 4, 2018, 11:02 a.m. UTC | #7
On 04/05/2018 03:19, Dong Jia Shi wrote:
> * Pierre Morel <pmorel@linux.vnet.ibm.com> [2018-05-03 16:26:29 +0200]:
>
>> On 02/05/2018 09:46, Dong Jia Shi wrote:
>>> * Cornelia Huck <cohuck@redhat.com> [2018-04-30 17:33:05 +0200]:
>>>
>>>> On Thu, 26 Apr 2018 15:48:06 +0800
>>>> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
>>>>
>>>>> * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2018-04-26 15:30:54 +0800]:
>>>>>
>>>>> [...]
>>>>>
>>>>>>> @@ -179,7 +160,7 @@ static int fsm_irq(struct vfio_ccw_private *private,
>>>>>>>   	if (private->io_trigger)
>>>>>>>   		eventfd_signal(private->io_trigger, 1);
>>>>>>>
>>>>>>> -	return private->state;
>>>>>>> +	return VFIO_CCW_STATE_IDLE;
>>>>>> This is not right. For example, if we are in STANDBY state (subch driver
>>>>>> is probed, but mdev device is not created), we can not jump to IDLE
>>>>>> state.
>>>>> I see my problem, for STANDBY state, we should introduce another event
>>>>> callback for VFIO_CCW_EVENT_INTERRUPT. It doesn't make sense to call
>>>>> fsm_irq() which tries to signal userspace with interrupt notification
>>>>> when mdev is not created yet... So we'd need a separated fix for this
>>>>> issue too.
>>>> But how do we even get into that situation when we don't have an mdev
>>>> yet?
>>>>
>>> We cann't... So let's assign fsm_nop() as the interrupt callback for
>>> STANDBY state?
>>>
>> :) Isn't it exactly what my patch series handle?
> As far as I see, that's not true. ;)
>
> After this series applied,
> vfio_ccw_jumptable[VFIO_CCW_STATE_STANDBY][VFIO_CCW_EVENT_INTERRUPT] is
> still fsm_irq().
>


What I mean is, this code tries to handle design problems
without changing too much of the original code at first.

The problem here is not that the fsm_irq function is called on interrupt,
if we have an interrupt it must be signaled to user land.
The problem is that this state is entered at the wrong moment.

STANDBY should be entered, during the mdev_open when we realize the QEMU 
device,
and not during the probe, in which we should stay in NOT_OPER until we 
get the QEMU device.

The probe() and mdev_open() function should be modified, not the state 
table.

Regards,

Pierre
Cornelia Huck May 22, 2018, 3:38 p.m. UTC | #8
[still backlog processing...]

On Thu, 3 May 2018 14:06:51 +0200
Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:

> On 30/04/2018 17:30, Cornelia Huck wrote:
> > On Wed, 25 Apr 2018 15:52:19 +0200
> > Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:
> >  
> >> On 25/04/2018 10:41, Cornelia Huck wrote:  
> >>> On Thu, 19 Apr 2018 16:48:07 +0200
> >>> Pierre Morel<pmorel@linux.vnet.ibm.com>  wrote:  
> >>>> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> >>>> index 3284e64..93aab87 100644
> >>>> --- a/drivers/s390/cio/vfio_ccw_private.h
> >>>> +++ b/drivers/s390/cio/vfio_ccw_private.h
> >>>> @@ -76,7 +76,7 @@ enum vfio_ccw_state {
> >>>>     */
> >>>>    enum vfio_ccw_event {
> >>>>    	VFIO_CCW_EVENT_NOT_OPER,
> >>>> -	VFIO_CCW_EVENT_IO_REQ,
> >>>> +	VFIO_CCW_EVENT_SSCH_REQ,
> >>>>    	VFIO_CCW_EVENT_INTERRUPT,
> >>>>    	VFIO_CCW_EVENT_SCH_EVENT,
> >>>>    	/* last element! */  
> >>> I don't think we should separate the ssch handling. The major
> >>> difference to halt/clear is that it needs channel program translation.
> >>> Everything else (issuing the instruction and processing the interrupt)
> >>> are basically the same. If we just throw everything at the hardware
> >>> and let the host's channel subsystem figure it out, we already should
> >>> be fine with regard to most of the races.  
> >> We must test at a moment or another the kind of request we do,
> >> cancel, halt and clear only need the subchannel id in register 1 and as
> >> you said are much more direct to implement.
> >>
> >> If we do not separate them here, we need a switch in the "do_io_request"
> >> function.
> >> Is it what you mean?  
> > Yes. Most of the handling should be the same for any function.  
> 
> I really don't know, the 4 functions are quite different.
> 
> - SSCH uses an ORB, and has a quite long kernel execution time for VFIO
> - there is a race between SSCH and the others instructions
> - XSCH makes subchannel no longer start pending, also reset the busy 
> indications
> - CSCH cancels both SSCH and HSCH instruction, and perform path management
> - HSCH has different busy (entry) conditions

Roughly speaking, we have two categories: An asynchronous function is
performed (SSCH, HSCH, CSCH) or not (XSCH). So I would split out XSCH
in any case.

SSCH, HSCH, CSCH all perform path management. I see them as kind of
escalating (i.e. CSCH 'beats' HSCH which 'beats' SSCH). I think they
are all similar enough, though, as we can call through to the real
hardware and have it sorted out there.

Looking through the channel I/O instructions:
- RSCH should be handled with SSCH (as a special case).
- MSCH should also be handled in the long run, STSCH as well.
- SCHM is interesting, as it's not per-subchannel. We have some basic
  handling of the instruction in QEMU, but it only emulates some ssch
  counters and completely lacks support for the other fields.
- IIRC, there's also a CHSC command dealing with channel monitoring. We
  currently fence off any CHSC that is not needed for Linux to run, but
  there are some that might be useful for the guest (path handling
  etc.) Hard to come to a conclusion here without access to the
  documentation.
- I don't think we need to care about TSCH (other than keeping the
  schib up to date, which we also need to do for STSCH).
- Likewise, TPI should be handled via emulation.

Coming back to the original issue, I think we can easily handle SSCH
(and RSCH), HSCH and CSCH together (with the actual hardware doing the
heavy lifting anyway). For other instructions, we need separate
states/processing.
Cornelia Huck May 22, 2018, 3:41 p.m. UTC | #9
On Fri, 4 May 2018 13:02:36 +0200
Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:

> On 04/05/2018 03:19, Dong Jia Shi wrote:
> > * Pierre Morel <pmorel@linux.vnet.ibm.com> [2018-05-03 16:26:29 +0200]:
> >  
> >> On 02/05/2018 09:46, Dong Jia Shi wrote:  
> >>> * Cornelia Huck <cohuck@redhat.com> [2018-04-30 17:33:05 +0200]:
> >>>  
> >>>> On Thu, 26 Apr 2018 15:48:06 +0800
> >>>> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> >>>>  
> >>>>> * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2018-04-26 15:30:54 +0800]:
> >>>>>
> >>>>> [...]
> >>>>>  
> >>>>>>> @@ -179,7 +160,7 @@ static int fsm_irq(struct vfio_ccw_private *private,
> >>>>>>>   	if (private->io_trigger)
> >>>>>>>   		eventfd_signal(private->io_trigger, 1);
> >>>>>>>
> >>>>>>> -	return private->state;
> >>>>>>> +	return VFIO_CCW_STATE_IDLE;  
> >>>>>> This is not right. For example, if we are in STANDBY state (subch driver
> >>>>>> is probed, but mdev device is not created), we can not jump to IDLE
> >>>>>> state.  
> >>>>> I see my problem, for STANDBY state, we should introduce another event
> >>>>> callback for VFIO_CCW_EVENT_INTERRUPT. It doesn't make sense to call
> >>>>> fsm_irq() which tries to signal userspace with interrupt notification
> >>>>> when mdev is not created yet... So we'd need a separated fix for this
> >>>>> issue too.  
> >>>> But how do we even get into that situation when we don't have an mdev
> >>>> yet?
> >>>>  
> >>> We cann't... So let's assign fsm_nop() as the interrupt callback for
> >>> STANDBY state?
> >>>  
> >> :) Isn't it exactly what my patch series handle?  
> > As far as I see, that's not true. ;)
> >
> > After this series applied,
> > vfio_ccw_jumptable[VFIO_CCW_STATE_STANDBY][VFIO_CCW_EVENT_INTERRUPT] is
> > still fsm_irq().
> >  
> 
> 
> What I mean is, this code tries to handle design problems
> without changing too much of the original code at first.
> 
> The problem here is not that the fsm_irq function is called on interrupt,
> if we have an interrupt it must be signaled to user land.
> The problem is that this state is entered at the wrong moment.
> 
> STANDBY should be entered, during the mdev_open when we realize the QEMU 
> device,
> and not during the probe, in which we should stay in NOT_OPER until we 
> get the QEMU device.
> 
> The probe() and mdev_open() function should be modified, not the state 
> table.

So, the takeaway is that we should handle starting via the init
callbacks and not via the state machine?
Pierre Morel May 23, 2018, 7:50 a.m. UTC | #10
On 22/05/2018 17:41, Cornelia Huck wrote:
> On Fri, 4 May 2018 13:02:36 +0200
> Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:
>
>> On 04/05/2018 03:19, Dong Jia Shi wrote:
>>> * Pierre Morel <pmorel@linux.vnet.ibm.com> [2018-05-03 16:26:29 +0200]:
>>>   
>>>> On 02/05/2018 09:46, Dong Jia Shi wrote:
>>>>> * Cornelia Huck <cohuck@redhat.com> [2018-04-30 17:33:05 +0200]:
>>>>>   
>>>>>> On Thu, 26 Apr 2018 15:48:06 +0800
>>>>>> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
>>>>>>   
>>>>>>> * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2018-04-26 15:30:54 +0800]:
>>>>>>>
>>>>>>> [...]
>>>>>>>   
>>>>>>>>> @@ -179,7 +160,7 @@ static int fsm_irq(struct vfio_ccw_private *private,
>>>>>>>>>    	if (private->io_trigger)
>>>>>>>>>    		eventfd_signal(private->io_trigger, 1);
>>>>>>>>>
>>>>>>>>> -	return private->state;
>>>>>>>>> +	return VFIO_CCW_STATE_IDLE;
>>>>>>>> This is not right. For example, if we are in STANDBY state (subch driver
>>>>>>>> is probed, but mdev device is not created), we can not jump to IDLE
>>>>>>>> state.
>>>>>>> I see my problem, for STANDBY state, we should introduce another event
>>>>>>> callback for VFIO_CCW_EVENT_INTERRUPT. It doesn't make sense to call
>>>>>>> fsm_irq() which tries to signal userspace with interrupt notification
>>>>>>> when mdev is not created yet... So we'd need a separated fix for this
>>>>>>> issue too.
>>>>>> But how do we even get into that situation when we don't have an mdev
>>>>>> yet?
>>>>>>   
>>>>> We cann't... So let's assign fsm_nop() as the interrupt callback for
>>>>> STANDBY state?
>>>>>   
>>>> :) Isn't it exactly what my patch series handle?
>>> As far as I see, that's not true. ;)
>>>
>>> After this series applied,
>>> vfio_ccw_jumptable[VFIO_CCW_STATE_STANDBY][VFIO_CCW_EVENT_INTERRUPT] is
>>> still fsm_irq().
>>>   
>>
>> What I mean is, this code tries to handle design problems
>> without changing too much of the original code at first.
>>
>> The problem here is not that the fsm_irq function is called on interrupt,
>> if we have an interrupt it must be signaled to user land.
>> The problem is that this state is entered at the wrong moment.
>>
>> STANDBY should be entered, during the mdev_open when we realize the QEMU
>> device,
>> and not during the probe, in which we should stay in NOT_OPER until we
>> get the QEMU device.
>>
>> The probe() and mdev_open() function should be modified, not the state
>> table.
> So, the takeaway is that we should handle starting via the init
> callbacks and not via the state machine?
>
hum, sorry, I think that my previous answer was not completely right,
and did not really answer to Dong Jia comment, yes fsm_irq was not
at its place, thinking again about the comments of both of you
I think that we can suppress the INIT event.

I would like to rebase the patch to include the comments you both did.
Cornelia Huck May 23, 2018, 8:10 a.m. UTC | #11
On Wed, 23 May 2018 09:50:00 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 22/05/2018 17:41, Cornelia Huck wrote:
> > On Fri, 4 May 2018 13:02:36 +0200
> > Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:
> >  
> >> On 04/05/2018 03:19, Dong Jia Shi wrote:  
> >>> * Pierre Morel <pmorel@linux.vnet.ibm.com> [2018-05-03 16:26:29 +0200]:
> >>>     
> >>>> On 02/05/2018 09:46, Dong Jia Shi wrote:  
> >>>>> * Cornelia Huck <cohuck@redhat.com> [2018-04-30 17:33:05 +0200]:
> >>>>>     
> >>>>>> On Thu, 26 Apr 2018 15:48:06 +0800
> >>>>>> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> >>>>>>     
> >>>>>>> * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2018-04-26 15:30:54 +0800]:
> >>>>>>>
> >>>>>>> [...]
> >>>>>>>     
> >>>>>>>>> @@ -179,7 +160,7 @@ static int fsm_irq(struct vfio_ccw_private *private,
> >>>>>>>>>    	if (private->io_trigger)
> >>>>>>>>>    		eventfd_signal(private->io_trigger, 1);
> >>>>>>>>>
> >>>>>>>>> -	return private->state;
> >>>>>>>>> +	return VFIO_CCW_STATE_IDLE;  
> >>>>>>>> This is not right. For example, if we are in STANDBY state (subch driver
> >>>>>>>> is probed, but mdev device is not created), we can not jump to IDLE
> >>>>>>>> state.  
> >>>>>>> I see my problem, for STANDBY state, we should introduce another event
> >>>>>>> callback for VFIO_CCW_EVENT_INTERRUPT. It doesn't make sense to call
> >>>>>>> fsm_irq() which tries to signal userspace with interrupt notification
> >>>>>>> when mdev is not created yet... So we'd need a separated fix for this
> >>>>>>> issue too.  
> >>>>>> But how do we even get into that situation when we don't have an mdev
> >>>>>> yet?
> >>>>>>     
> >>>>> We cann't... So let's assign fsm_nop() as the interrupt callback for
> >>>>> STANDBY state?
> >>>>>     
> >>>> :) Isn't it exactly what my patch series handle?  
> >>> As far as I see, that's not true. ;)
> >>>
> >>> After this series applied,
> >>> vfio_ccw_jumptable[VFIO_CCW_STATE_STANDBY][VFIO_CCW_EVENT_INTERRUPT] is
> >>> still fsm_irq().
> >>>     
> >>
> >> What I mean is, this code tries to handle design problems
> >> without changing too much of the original code at first.
> >>
> >> The problem here is not that the fsm_irq function is called on interrupt,
> >> if we have an interrupt it must be signaled to user land.
> >> The problem is that this state is entered at the wrong moment.
> >>
> >> STANDBY should be entered, during the mdev_open when we realize the QEMU
> >> device,
> >> and not during the probe, in which we should stay in NOT_OPER until we
> >> get the QEMU device.
> >>
> >> The probe() and mdev_open() function should be modified, not the state
> >> table.  
> > So, the takeaway is that we should handle starting via the init
> > callbacks and not via the state machine?
> >  
> hum, sorry, I think that my previous answer was not completely right,
> and did not really answer to Dong Jia comment, yes fsm_irq was not
> at its place, thinking again about the comments of both of you
> I think that we can suppress the INIT event.
> 
> I would like to rebase the patch to include the comments you both did.
> 
> 

Yes, a respin is probably best before we get confused even more :)
Pierre Morel May 23, 2018, 8:19 a.m. UTC | #12
On 22/05/2018 17:38, Cornelia Huck wrote:
> [still backlog processing...]
>
> On Thu, 3 May 2018 14:06:51 +0200
> Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:
>
>> On 30/04/2018 17:30, Cornelia Huck wrote:
>>> On Wed, 25 Apr 2018 15:52:19 +0200
>>> Pierre Morel <pmorel@linux.vnet.ibm.com> wrote:
>>>   
>>>> On 25/04/2018 10:41, Cornelia Huck wrote:
>>>>> On Thu, 19 Apr 2018 16:48:07 +0200
>>>>> Pierre Morel<pmorel@linux.vnet.ibm.com>  wrote:
>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
>>>>>> index 3284e64..93aab87 100644
>>>>>> --- a/drivers/s390/cio/vfio_ccw_private.h
>>>>>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>>>>>> @@ -76,7 +76,7 @@ enum vfio_ccw_state {
>>>>>>      */
>>>>>>     enum vfio_ccw_event {
>>>>>>     	VFIO_CCW_EVENT_NOT_OPER,
>>>>>> -	VFIO_CCW_EVENT_IO_REQ,
>>>>>> +	VFIO_CCW_EVENT_SSCH_REQ,
>>>>>>     	VFIO_CCW_EVENT_INTERRUPT,
>>>>>>     	VFIO_CCW_EVENT_SCH_EVENT,
>>>>>>     	/* last element! */
>>>>> I don't think we should separate the ssch handling. The major
>>>>> difference to halt/clear is that it needs channel program translation.
>>>>> Everything else (issuing the instruction and processing the interrupt)
>>>>> are basically the same. If we just throw everything at the hardware
>>>>> and let the host's channel subsystem figure it out, we already should
>>>>> be fine with regard to most of the races.
>>>> We must test at a moment or another the kind of request we do,
>>>> cancel, halt and clear only need the subchannel id in register 1 and as
>>>> you said are much more direct to implement.
>>>>
>>>> If we do not separate them here, we need a switch in the "do_io_request"
>>>> function.
>>>> Is it what you mean?
>>> Yes. Most of the handling should be the same for any function.
>> I really don't know, the 4 functions are quite different.
>>
>> - SSCH uses an ORB, and has a quite long kernel execution time for VFIO
>> - there is a race between SSCH and the others instructions
>> - XSCH makes subchannel no longer start pending, also reset the busy
>> indications
>> - CSCH cancels both SSCH and HSCH instruction, and perform path management
>> - HSCH has different busy (entry) conditions
> Roughly speaking, we have two categories: An asynchronous function is
> performed (SSCH, HSCH, CSCH) or not (XSCH). So I would split out XSCH
> in any case.
>
> SSCH, HSCH, CSCH all perform path management. I see them as kind of
> escalating (i.e. CSCH 'beats' HSCH which 'beats' SSCH). I think they
> are all similar enough, though, as we can call through to the real
> hardware and have it sorted out there.
>
> Looking through the channel I/O instructions:
> - RSCH should be handled with SSCH (as a special case).
> - MSCH should also be handled in the long run, STSCH as well.
> - SCHM is interesting, as it's not per-subchannel. We have some basic
>    handling of the instruction in QEMU, but it only emulates some ssch
>    counters and completely lacks support for the other fields.
> - IIRC, there's also a CHSC command dealing with channel monitoring. We
>    currently fence off any CHSC that is not needed for Linux to run, but
>    there are some that might be useful for the guest (path handling
>    etc.) Hard to come to a conclusion here without access to the
>    documentation.
> - I don't think we need to care about TSCH (other than keeping the
>    schib up to date, which we also need to do for STSCH).
> - Likewise, TPI should be handled via emulation.
>
> Coming back to the original issue, I think we can easily handle SSCH
> (and RSCH), HSCH and CSCH together (with the actual hardware doing the
> heavy lifting anyway). For other instructions, we need separate
> states/processing.
>

OK, I make the next version with this in mind.

Thanks

Pierre
diff mbox

Patch

diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index 13751b4..f9855cd 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -19,14 +19,9 @@  static int fsm_io_helper(struct vfio_ccw_private *private)
 	union orb *orb;
 	int ccode;
 	__u8 lpm;
-	unsigned long flags;
 
 	sch = private->sch;
 
-	spin_lock_irqsave(sch->lock, flags);
-	private->state = VFIO_CCW_STATE_BUSY;
-	spin_unlock_irqrestore(sch->lock, flags);
-
 	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
 
 	/* Issue "Start Subchannel" */
@@ -118,44 +113,30 @@  static int fsm_io_request(struct vfio_ccw_private *private,
 			   enum vfio_ccw_event event)
 {
 	union orb *orb;
-	union scsw *scsw = &private->scsw;
 	struct ccw_io_region *io_region = &private->io_region;
 	struct mdev_device *mdev = private->mdev;
 
 	private->state = VFIO_CCW_STATE_BOXED;
 
-	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
-
-	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
-		orb = (union orb *)io_region->orb_area;
-
-		io_region->ret_code = cp_init(&private->cp, mdev_dev(mdev),
-					      orb);
-		if (io_region->ret_code)
-			goto err_out;
-
-		io_region->ret_code = cp_prefetch(&private->cp);
-		if (io_region->ret_code) {
-			cp_free(&private->cp);
-			goto err_out;
-		}
-
-		/* Start channel program and wait for I/O interrupt. */
-		io_region->ret_code = fsm_io_helper(private);
-		if (io_region->ret_code) {
-			cp_free(&private->cp);
-			goto err_out;
-		}
-		return private->state;
-	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
-		/* XXX: Handle halt. */
-		io_region->ret_code = -EOPNOTSUPP;
+	orb = (union orb *)io_region->orb_area;
+
+	io_region->ret_code = cp_init(&private->cp, mdev_dev(mdev), orb);
+	if (io_region->ret_code)
+		goto err_out;
+
+	io_region->ret_code = cp_prefetch(&private->cp);
+	if (io_region->ret_code) {
+		cp_free(&private->cp);
 		goto err_out;
-	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
-		/* XXX: Handle clear. */
-		io_region->ret_code = -EOPNOTSUPP;
+	}
+
+	/* Start channel program and wait for I/O interrupt. */
+	io_region->ret_code = fsm_io_helper(private);
+	if (io_region->ret_code) {
+		cp_free(&private->cp);
 		goto err_out;
 	}
+	return VFIO_CCW_STATE_BUSY;
 
 err_out:
 	private->state = VFIO_CCW_STATE_IDLE;
@@ -179,7 +160,7 @@  static int fsm_irq(struct vfio_ccw_private *private,
 	if (private->io_trigger)
 		eventfd_signal(private->io_trigger, 1);
 
-	return private->state;
+	return VFIO_CCW_STATE_IDLE;
 }
 
 /*
@@ -206,31 +187,31 @@  static int fsm_sch_event(struct vfio_ccw_private *private,
 fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = {
 	[VFIO_CCW_STATE_NOT_OPER] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_nop,
-		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_SSCH_REQ]	= fsm_io_error,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_disabled_irq,
 		[VFIO_CCW_EVENT_SCH_EVENT]	= fsm_nop,
 	},
 	[VFIO_CCW_STATE_STANDBY] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
-		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_SSCH_REQ]	= fsm_io_error,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 		[VFIO_CCW_EVENT_SCH_EVENT]	= fsm_sch_event,
 	},
 	[VFIO_CCW_STATE_IDLE] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
-		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_request,
+		[VFIO_CCW_EVENT_SSCH_REQ]	= fsm_io_request,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 		[VFIO_CCW_EVENT_SCH_EVENT]	= fsm_sch_event,
 	},
 	[VFIO_CCW_STATE_BOXED] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
-		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
+		[VFIO_CCW_EVENT_SSCH_REQ]	= fsm_io_busy,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 		[VFIO_CCW_EVENT_SCH_EVENT]	= fsm_sch_event,
 	},
 	[VFIO_CCW_STATE_BUSY] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
-		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
+		[VFIO_CCW_EVENT_SSCH_REQ]	= fsm_io_busy,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 		[VFIO_CCW_EVENT_SCH_EVENT]	= fsm_sch_event,
 	},
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 41eeb57..4da7b61 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -188,19 +188,22 @@  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 {
 	struct vfio_ccw_private *private;
 	struct ccw_io_region *region;
+	union scsw *scsw;
 
 	if (*ppos + count > sizeof(*region))
 		return -EINVAL;
 
 	private = dev_get_drvdata(mdev_parent_dev(mdev));
-	if (private->state != VFIO_CCW_STATE_IDLE)
-		return -EACCES;
 
 	region = &private->io_region;
 	if (copy_from_user((void *)region + *ppos, buf, count))
 		return -EFAULT;
 
-	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
+	scsw = (union scsw *) &region->scsw_area;
+	if ((scsw->cmd.fctl & SCSW_FCTL_START_FUNC) != SCSW_FCTL_START_FUNC)
+		return -EINVAL;
+
+	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_SSCH_REQ);
 	if (region->ret_code != 0) {
 		private->state = VFIO_CCW_STATE_IDLE;
 		return region->ret_code;
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 3284e64..93aab87 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -76,7 +76,7 @@  enum vfio_ccw_state {
  */
 enum vfio_ccw_event {
 	VFIO_CCW_EVENT_NOT_OPER,
-	VFIO_CCW_EVENT_IO_REQ,
+	VFIO_CCW_EVENT_SSCH_REQ,
 	VFIO_CCW_EVENT_INTERRUPT,
 	VFIO_CCW_EVENT_SCH_EVENT,
 	/* last element! */