[v2,0/9] Move vfio_ccw to the new mdev API

Message ID	0-v2-7d3a384024cf+2060-ccw_mdev_jgg@nvidia.com (mailing list archive)
Headers	show Return-Path: <SRS0=8Ejf=N7=lists.freedesktop.org=intel-gfx-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org E46656103E From: Jason Gunthorpe <jgg@nvidia.com> To: David Airlie <airlied@linux.ie>, Tony Krowiak <akrowiak@linux.ibm.com>, Alex Williamson <alex.williamson@redhat.com>, Christian Borntraeger <borntraeger@de.ibm.com>, Cornelia Huck <cohuck@redhat.com>, Daniel Vetter <daniel@ffwll.ch>, dri-devel@lists.freedesktop.org, Eric Farman <farman@linux.ibm.com>, Harald Freudenberger <freude@linux.ibm.com>, Vasily Gorbik <gor@linux.ibm.com>, Heiko Carstens <hca@linux.ibm.com>, intel-gfx@lists.freedesktop.org, intel-gvt-dev@lists.freedesktop.org, Jani Nikula <jani.nikula@linux.intel.com>, Jason Herne <jjherne@linux.ibm.com>, Joonas Lahtinen <joonas.lahtinen@linux.intel.com>, kvm@vger.kernel.org, Kirti Wankhede <kwankhede@nvidia.com>, linux-s390@vger.kernel.org, Matthew Rosato <mjrosato@linux.ibm.com>, Peter Oberparleiter <oberpar@linux.ibm.com>, Halil Pasic <pasic@linux.ibm.com>, Rodrigo Vivi <rodrigo.vivi@intel.com>, Vineeth Vijayan <vneethv@linux.ibm.com>, Zhenyu Wang <zhenyuw@linux.intel.com>, Zhi Wang <zhi.a.wang@intel.com> Cc: Christoph Hellwig <hch@lst.de> Date: Thu, 9 Sep 2021 16:38:40 -0300 Message-Id: <0-v2-7d3a384024cf+2060-ccw_mdev_jgg@nvidia.com> Content-Transfer-Encoding: 8bit Content-Type: text/plain MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v2 0/9] Move vfio_ccw to the new mdev API Precedence: list Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	Move vfio_ccw to the new mdev API \| expand [v2,0/9] Move vfio_ccw to the new mdev API [v2,1/9] vfio/ccw: Use functions for alloc/free of the vfio_ccw_private [v2,2/9] vfio/ccw: Pass vfio_ccw_private not mdev_device to various functions [v2,3/9] vfio/ccw: Convert to use vfio_register_group_dev() [v2,4/9] vfio/ccw: Make the FSM complete and synchronize it to the mdev [v2,5/9] vfio/mdev: Consolidate all the device_api sysfs into the core code [v2,6/9] vfio/mdev: Add mdev available instance checking to the core [v2,7/9] vfio/ccw: Remove private->mdev [v2,8/9] vfio: Export vfio_device_try_get() [v2,9/9] vfio/ccw: Move the lifecycle of the struct vfio_ccw_private to the mdev

Jason Gunthorpe Sept. 9, 2021, 7:38 p.m. UTC

This addresses Cornelia's remark on the earlier patch that ccw has a
confusing lifecycle. While it doesn't seem like the original attempt was
functionally wrong, the result can be made better with a lot of further
work.

Reorganize the driver so that the mdev owns the private memory and
controls the lifecycle, not the css_driver. The memory associated with the
css_driver lifecycle is only the mdev_parent/mdev_type registration.

Along the way we change when the sch is quiescent or not to be linked to
the open/close_device lifetime of the vfio_device, which is sort of what
it was tring to do already, just not completely.

The troublesome racey lifecycle of the css_driver callbacks is made clear
with simple vfio_device refcounting so a callback is only delivered into a
registered vfio_device and has obvious correctness.

Move the only per-css_driver state, the "available instance" counter, into
the core code and share that logic with many of the other drivers. The
value is kept in the mdev_type memory.

v2:
 - Clean up the lifecycle in ccw with 7 new patches
 - Rebase
v1: https://lore.kernel.org/all/7-v2-7667f42c9bad+935-vfio3_jgg@nvidia.com

Jason Gunthorpe (9):
  vfio/ccw: Use functions for alloc/free of the vfio_ccw_private
  vfio/ccw: Pass vfio_ccw_private not mdev_device to various functions
  vfio/ccw: Convert to use vfio_register_group_dev()
  vfio/ccw: Make the FSM complete and synchronize it to the mdev
  vfio/mdev: Consolidate all the device_api sysfs into the core code
  vfio/mdev: Add mdev available instance checking to the core
  vfio/ccw: Remove private->mdev
  vfio: Export vfio_device_try_get()
  vfio/ccw: Move the lifecycle of the struct vfio_ccw_private to the
    mdev

 drivers/gpu/drm/i915/gvt/kvmgt.c      |   9 +-
 drivers/s390/cio/vfio_ccw_drv.c       | 282 +++++++++++---------------
 drivers/s390/cio/vfio_ccw_fsm.c       | 152 ++++++++++----
 drivers/s390/cio/vfio_ccw_ops.c       | 240 ++++++++++------------
 drivers/s390/cio/vfio_ccw_private.h   |  42 +++-
 drivers/s390/crypto/vfio_ap_ops.c     |  41 +---
 drivers/s390/crypto/vfio_ap_private.h |   2 -
 drivers/vfio/mdev/mdev_core.c         |  13 +-
 drivers/vfio/mdev/mdev_private.h      |   2 +
 drivers/vfio/mdev/mdev_sysfs.c        |  64 +++++-
 drivers/vfio/vfio.c                   |   3 +-
 include/linux/mdev.h                  |  13 +-
 include/linux/vfio.h                  |   1 +
 samples/vfio-mdev/mbochs.c            |   9 +-
 samples/vfio-mdev/mdpy.c              |  31 +--
 samples/vfio-mdev/mtty.c              |  10 +-
 16 files changed, 470 insertions(+), 444 deletions(-)

Eric Farman Sept. 13, 2021, 5:40 p.m. UTC | #1

On Thu, 2021-09-09 at 16:38 -0300, Jason Gunthorpe wrote:
> This addresses Cornelia's remark on the earlier patch that ccw has a
> confusing lifecycle. While it doesn't seem like the original attempt
> was
> functionally wrong, the result can be made better with a lot of
> further
> work.

I thought I'd take a stab at seeing how this works with the hardware
before looking at the code much. git couldn't apply patches 1, 6, or 9
to 5.15-rc1, but I was able to hand-fit them into place. Shutting down
the guest and de-configuring a device ends up bringing my whole system
down. I haven't looked at this any further; hopefully something jumps
to mind for you:

[   64.585347] vfio_ccw 0.0.08fe: MDEV: Unregistering
[   64.585357] illegal operation: 0001 ilc:1 [#1] SMP 
[   64.585362] Modules linked in: vhost_vsock
vmw_vsock_virtio_transport_common vsock vhost
[   64.585364] vfio_ccw_mdev b50bbd4b-eab8-4f8c-9f0c-3cf636f936b9:
Relaying device request to user (#0)
[   64.585364]  vhost_iotlb lcs ctcm fsm kvm xt_MASQUERADE xt_conntrack
ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_mangle iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp
llc dm_multipath dm_mod s390_trng eadm_sch zcrypt_cex4 qeth_l2 vfio_ccw
mdev vfio_iommu_type1 vfio configfs zram zsmalloc ip_tables x_tables
mlx5_core ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390
sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt
rng_core autofs4
[   64.585392] CPU: 14 PID: 4487 Comm: qemu-system-s39 Kdump: loaded
Not tainted 5.15.0-rc1 #1
[   64.585395] Hardware name: IBM 3906 M05 780 (LPAR)
[   64.585396] Krnl PSW : 0704c00180000000 0000000000000002 (0x2)
[   64.585404]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0
PM:0 RI:0 EA:3
[   64.585407] Krnl GPRS: 0000000000000001 0000000000000000
00000000005f4800 0000000000000004
[   64.585410]            0000000000000000 0000000000000002
0000000000000000 000002aa3e65085e
[   64.585412]            000000008de09100 0000000000003b6f
000003ff8017fa08 00000000005f4800
[   64.585413]            0000000081450000 000003ff7c032310
000003ff80179db0 000003800bf53da0
[   64.585418] Krnl Code:#0000000000000000:
0000                illegal 
          >0000000000000002: 0000               illegal 
           0000000000000004: 0000               illegal 
           0000000000000006: 0000               illegal 
           0000000000000008: 0000               illegal 
           000000000000000a: 0000               illegal 
           000000000000000c: 0000               illegal 
           000000000000000e: 0000               illegal 
[   64.585462] Call Trace:
[   64.585464]  [<0000000000000002>] 0x2 
[   64.585467] ([<000003ff80179d74>] vfio_ccw_mdev_ioctl+0x84/0x318
[vfio_ccw])
[   64.585476]  [<00000000bb7adda6>] __s390x_sys_ioctl+0xbe/0x100 
[   64.585481]  [<00000000bbfbf5e4>] __do_syscall+0x1bc/0x1e8 
[   64.585488]  [<00000000bbfcc8d8>] system_call+0x78/0xa0 

Eric

> 
> Reorganize the driver so that the mdev owns the private memory and
> controls the lifecycle, not the css_driver. The memory associated
> with the
> css_driver lifecycle is only the mdev_parent/mdev_type registration.
> 
> Along the way we change when the sch is quiescent or not to be linked
> to
> the open/close_device lifetime of the vfio_device, which is sort of
> what
> it was tring to do already, just not completely.
> 
> The troublesome racey lifecycle of the css_driver callbacks is made
> clear
> with simple vfio_device refcounting so a callback is only delivered
> into a
> registered vfio_device and has obvious correctness.
> 
> Move the only per-css_driver state, the "available instance" counter,
> into
> the core code and share that logic with many of the other drivers.
> The
> value is kept in the mdev_type memory.
> 
> v2:
>  - Clean up the lifecycle in ccw with 7 new patches
>  - Rebase
> v1: 
> https://lore.kernel.org/all/7-v2-7667f42c9bad+935-vfio3_jgg@nvidia.com
> 
> Jason Gunthorpe (9):
>   vfio/ccw: Use functions for alloc/free of the vfio_ccw_private
>   vfio/ccw: Pass vfio_ccw_private not mdev_device to various
> functions
>   vfio/ccw: Convert to use vfio_register_group_dev()
>   vfio/ccw: Make the FSM complete and synchronize it to the mdev
>   vfio/mdev: Consolidate all the device_api sysfs into the core code
>   vfio/mdev: Add mdev available instance checking to the core
>   vfio/ccw: Remove private->mdev
>   vfio: Export vfio_device_try_get()
>   vfio/ccw: Move the lifecycle of the struct vfio_ccw_private to the
>     mdev
> 
>  drivers/gpu/drm/i915/gvt/kvmgt.c      |   9 +-
>  drivers/s390/cio/vfio_ccw_drv.c       | 282 +++++++++++-------------
> --
>  drivers/s390/cio/vfio_ccw_fsm.c       | 152 ++++++++++----
>  drivers/s390/cio/vfio_ccw_ops.c       | 240 ++++++++++------------
>  drivers/s390/cio/vfio_ccw_private.h   |  42 +++-
>  drivers/s390/crypto/vfio_ap_ops.c     |  41 +---
>  drivers/s390/crypto/vfio_ap_private.h |   2 -
>  drivers/vfio/mdev/mdev_core.c         |  13 +-
>  drivers/vfio/mdev/mdev_private.h      |   2 +
>  drivers/vfio/mdev/mdev_sysfs.c        |  64 +++++-
>  drivers/vfio/vfio.c                   |   3 +-
>  include/linux/mdev.h                  |  13 +-
>  include/linux/vfio.h                  |   1 +
>  samples/vfio-mdev/mbochs.c            |   9 +-
>  samples/vfio-mdev/mdpy.c              |  31 +--
>  samples/vfio-mdev/mtty.c              |  10 +-
>  16 files changed, 470 insertions(+), 444 deletions(-)
>

Jason Gunthorpe Sept. 13, 2021, 7:24 p.m. UTC | #2

On Mon, Sep 13, 2021 at 01:40:34PM -0400, Eric Farman wrote:
> On Thu, 2021-09-09 at 16:38 -0300, Jason Gunthorpe wrote:
> > This addresses Cornelia's remark on the earlier patch that ccw has a
> > confusing lifecycle. While it doesn't seem like the original attempt
> > was
> > functionally wrong, the result can be made better with a lot of
> > further
> > work.
> 
> I thought I'd take a stab at seeing how this works with the hardware
> before looking at the code much. git couldn't apply patches 1, 6, or 9
> to 5.15-rc1, but I was able to hand-fit them into place. 

Oh? Thats odd, I had no remarks from git when rebasing onto
v5.15-rc1..

Maybe this is a situation where you need "b4 am --prep-3way" ...

> [   64.585462] Call Trace:
> [   64.585464]  [<0000000000000002>] 0x2 
> [   64.585467] ([<000003ff80179d74>] vfio_ccw_mdev_ioctl+0x84/0x318
> [vfio_ccw])
> [   64.585476]  [<00000000bb7adda6>] __s390x_sys_ioctl+0xbe/0x100 
> [   64.585481]  [<00000000bbfbf5e4>] __do_syscall+0x1bc/0x1e8 
> [   64.585488]  [<00000000bbfcc8d8>] system_call+0x78/0xa0 

I think it is this:

diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index df1490943b20ec..5ea392959c0711 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -441,6 +441,7 @@ fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = {
 		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
 		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_disabled_irq,
+		[VFIO_CCW_EVENT_OPEN]		= fsm_nop,
 		[VFIO_CCW_EVENT_CLOSE]		= fsm_nop,
 	},
 	[VFIO_CCW_STATE_CLOSED] = {

I rebased it and fixed it up here:

https://github.com/jgunthorpe/linux/tree/vfio_ccw

Can you try again?

Thanks,
Jason

Eric Farman Sept. 13, 2021, 8:31 p.m. UTC | #3

On Mon, 2021-09-13 at 16:24 -0300, Jason Gunthorpe wrote:
> On Mon, Sep 13, 2021 at 01:40:34PM -0400, Eric Farman wrote:
> > On Thu, 2021-09-09 at 16:38 -0300, Jason Gunthorpe wrote:
> > > This addresses Cornelia's remark on the earlier patch that ccw
> > > has a
> > > confusing lifecycle. While it doesn't seem like the original
> > > attempt
> > > was
> > > functionally wrong, the result can be made better with a lot of
> > > further
> > > work.
> > 
> > I thought I'd take a stab at seeing how this works with the
> > hardware
> > before looking at the code much. git couldn't apply patches 1, 6,
> > or 9
> > to 5.15-rc1, but I was able to hand-fit them into place. 
> 
> Oh? Thats odd, I had no remarks from git when rebasing onto
> v5.15-rc1..
> 
> Maybe this is a situation where you need "b4 am --prep-3way" ...

Ah, that does indeed help, thanks. Still missing the vfio-ap patch
that's elsewhere on the list, but I'm not caring about that at the
moment.

> 
> > [   64.585462] Call Trace:
> > [   64.585464]  [<0000000000000002>] 0x2 
> > [   64.585467] ([<000003ff80179d74>] vfio_ccw_mdev_ioctl+0x84/0x318
> > [vfio_ccw])
> > [   64.585476]  [<00000000bb7adda6>] __s390x_sys_ioctl+0xbe/0x100 
> > [   64.585481]  [<00000000bbfbf5e4>] __do_syscall+0x1bc/0x1e8 
> > [   64.585488]  [<00000000bbfcc8d8>] system_call+0x78/0xa0 
> 
> I think it is this:
> 
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c
> b/drivers/s390/cio/vfio_ccw_fsm.c
> index df1490943b20ec..5ea392959c0711 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -441,6 +441,7 @@ fsm_func_t
> *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = {
>  		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
>  		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
>  		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_disabled_irq,
> +		[VFIO_CCW_EVENT_OPEN]		= fsm_nop,
>  		[VFIO_CCW_EVENT_CLOSE]		= fsm_nop,
>  	},
>  	[VFIO_CCW_STATE_CLOSED] = {
> 
> I rebased it and fixed it up here:
> 
> https://github.com/jgunthorpe/linux/tree/vfio_ccw
> 
> Can you try again?

That does address the crash, but then why is it processing a BROKEN
event? Seems problematic. All the configuration works fine, but the
devices get ripped away once a guest is started that wants to open/use
them.

So, there's more problems to figure out.

Eric

> 
> Thanks,
> Jason

Jason Gunthorpe Sept. 14, 2021, 1:36 p.m. UTC | #4

On Mon, Sep 13, 2021 at 04:31:54PM -0400, Eric Farman wrote:
> > I rebased it and fixed it up here:
> > 
> > https://github.com/jgunthorpe/linux/tree/vfio_ccw
> > 
> > Can you try again?
> 
> That does address the crash, but then why is it processing a BROKEN
> event? Seems problematic. 

The stuff related to the NOT_OPER looked really wonky to me. I'm
guessing this is the issue - not sure about the pmcw.ena either..

diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index 5ea392959c0711..0d4d4f425befac 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -380,29 +380,19 @@ static void fsm_open(struct vfio_ccw_private *private,
 	spin_unlock_irq(sch->lock);
 }
 
-static void fsm_close(struct vfio_ccw_private *private,
-		      enum vfio_ccw_event event)
+static int flush_sch(struct vfio_ccw_private *private)
 {
 	struct subchannel *sch = private->sch;
 	DECLARE_COMPLETION_ONSTACK(completion);
 	int iretry, ret = 0;
 
-	spin_lock_irq(sch->lock);
-	if (!sch->schib.pmcw.ena)
-		goto err_unlock;
-	ret = cio_disable_subchannel(sch);
-	if (ret != -EBUSY)
-		goto err_unlock;
-
 	iretry = 255;
 	do {
-
 		ret = cio_cancel_halt_clear(sch, &iretry);
-
 		if (ret == -EIO) {
 			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
 			       sch->schid.ssid, sch->schid.sch_no);
-			break;
+			return ret;
 		}
 
 		/*
@@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private,
 		spin_unlock_irq(sch->lock);
 
 		if (ret == -EBUSY)
-			wait_for_completion_timeout(&completion, 3*HZ);
+			wait_for_completion_timeout(&completion, 3 * HZ);
 
 		private->completion = NULL;
 		flush_workqueue(vfio_ccw_work_q);
 		spin_lock_irq(sch->lock);
 		ret = cio_disable_subchannel(sch);
 	} while (ret == -EBUSY);
+	return ret;
+}
+
+static void fsm_close(struct vfio_ccw_private *private,
+		      enum vfio_ccw_event event)
+{
+	struct subchannel *sch = private->sch;
+	int ret;
+
+	spin_lock_irq(sch->lock);
+	if (!sch->schib.pmcw.ena)
+		goto err_unlock;
+	ret = cio_disable_subchannel(sch);
+	if (ret == -EBUSY)
+		ret = flush_sch(private);
 	if (ret)
 		goto err_unlock;
 	private->state = VFIO_CCW_STATE_CLOSED;

Cornelia Huck Sept. 17, 2021, 11:59 a.m. UTC | #5

On Tue, Sep 14 2021, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Sep 13, 2021 at 04:31:54PM -0400, Eric Farman wrote:
>> > I rebased it and fixed it up here:
>> > 
>> > https://github.com/jgunthorpe/linux/tree/vfio_ccw
>> > 
>> > Can you try again?
>> 
>> That does address the crash, but then why is it processing a BROKEN
>> event? Seems problematic. 
>
> The stuff related to the NOT_OPER looked really wonky to me. I'm
> guessing this is the issue - not sure about the pmcw.ena either..

[I have still not been able to digest the whole series, sorry.]

>
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
> index 5ea392959c0711..0d4d4f425befac 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -380,29 +380,19 @@ static void fsm_open(struct vfio_ccw_private *private,
>  	spin_unlock_irq(sch->lock);
>  }
>  
> -static void fsm_close(struct vfio_ccw_private *private,
> -		      enum vfio_ccw_event event)
> +static int flush_sch(struct vfio_ccw_private *private)
>  {
>  	struct subchannel *sch = private->sch;
>  	DECLARE_COMPLETION_ONSTACK(completion);
>  	int iretry, ret = 0;
>  
> -	spin_lock_irq(sch->lock);
> -	if (!sch->schib.pmcw.ena)
> -		goto err_unlock;
> -	ret = cio_disable_subchannel(sch);
> -	if (ret != -EBUSY)
> -		goto err_unlock;
> -
>  	iretry = 255;
>  	do {
> -
>  		ret = cio_cancel_halt_clear(sch, &iretry);
> -
>  		if (ret == -EIO) {
>  			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
>  			       sch->schid.ssid, sch->schid.sch_no);
> -			break;
> +			return ret;

Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV
we should be done as well, as then the device is dead and we do not need
to disable it.

>  		}
>  
>  		/*
> @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private,
>  		spin_unlock_irq(sch->lock);
>  
>  		if (ret == -EBUSY)
> -			wait_for_completion_timeout(&completion, 3*HZ);
> +			wait_for_completion_timeout(&completion, 3 * HZ);
>  
>  		private->completion = NULL;
>  		flush_workqueue(vfio_ccw_work_q);
>  		spin_lock_irq(sch->lock);
>  		ret = cio_disable_subchannel(sch);
>  	} while (ret == -EBUSY);
> +	return ret;
> +}
> +
> +static void fsm_close(struct vfio_ccw_private *private,
> +		      enum vfio_ccw_event event)
> +{
> +	struct subchannel *sch = private->sch;
> +	int ret;
> +
> +	spin_lock_irq(sch->lock);
> +	if (!sch->schib.pmcw.ena)
> +		goto err_unlock;
> +	ret = cio_disable_subchannel(sch);

cio_disable_subchannel() should be happy to disable an already disabled
subchannel, so I guess we can just walk through this and end up in
CLOSED state... unless entering with !ena actually indicates that we
messed up somewhere else in the state machine. I still need to find time
to read the patches.

> +	if (ret == -EBUSY)
> +		ret = flush_sch(private);
>  	if (ret)
>  		goto err_unlock;
>  	private->state = VFIO_CCW_STATE_CLOSED;

Jason Gunthorpe Sept. 17, 2021, 12:51 p.m. UTC | #6

On Fri, Sep 17, 2021 at 01:59:16PM +0200, Cornelia Huck wrote:
> >  		ret = cio_cancel_halt_clear(sch, &iretry);
> > -
> >  		if (ret == -EIO) {
> >  			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
> >  			       sch->schid.ssid, sch->schid.sch_no);
> > -			break;
> > +			return ret;
> 
> Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV
> we should be done as well, as then the device is dead and we do not need
> to disable it.

cio_cancel_halt_clear() should probably succeed in that case.

> > @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private,
> >  		spin_unlock_irq(sch->lock);
> >  
> >  		if (ret == -EBUSY)
> > -			wait_for_completion_timeout(&completion, 3*HZ);
> > +			wait_for_completion_timeout(&completion, 3 * HZ);
> >  
> >  		private->completion = NULL;
> >  		flush_workqueue(vfio_ccw_work_q);
> >  		spin_lock_irq(sch->lock);
> >  		ret = cio_disable_subchannel(sch);
> >  	} while (ret == -EBUSY);
> > +	return ret;
> > +}
> > +
> > +static void fsm_close(struct vfio_ccw_private *private,
> > +		      enum vfio_ccw_event event)
> > +{
> > +	struct subchannel *sch = private->sch;
> > +	int ret;
> > +
> > +	spin_lock_irq(sch->lock);
> > +	if (!sch->schib.pmcw.ena)
> > +		goto err_unlock;
> > +	ret = cio_disable_subchannel(sch);
> 
> cio_disable_subchannel() should be happy to disable an already disabled
> subchannel, so I guess we can just walk through this and end up in
> CLOSED state... unless entering with !ena actually indicates that we
> messed up somewhere else in the state machine. I still need to find time
> to read the patches.

I don't know, I looked at that ena stuff for a bit and couldn't guess
what it is trying to do.

Arguably the channel should not be ripped away from vfio while the FSM
is in the open states, so I'm not sure what a lot of this is for.

Jason

Cornelia Huck Sept. 17, 2021, 2:37 p.m. UTC | #7

On Fri, Sep 17 2021, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Fri, Sep 17, 2021 at 01:59:16PM +0200, Cornelia Huck wrote:
>> >  		ret = cio_cancel_halt_clear(sch, &iretry);
>> > -
>> >  		if (ret == -EIO) {
>> >  			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
>> >  			       sch->schid.ssid, sch->schid.sch_no);
>> > -			break;
>> > +			return ret;
>> 
>> Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV
>> we should be done as well, as then the device is dead and we do not need
>> to disable it.
>
> cio_cancel_halt_clear() should probably succeed in that case.

It will actually give us -ENODEV, as the very first call in that
function will already fail.

>
>> > @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private,
>> >  		spin_unlock_irq(sch->lock);
>> >  
>> >  		if (ret == -EBUSY)
>> > -			wait_for_completion_timeout(&completion, 3*HZ);
>> > +			wait_for_completion_timeout(&completion, 3 * HZ);
>> >  
>> >  		private->completion = NULL;
>> >  		flush_workqueue(vfio_ccw_work_q);
>> >  		spin_lock_irq(sch->lock);
>> >  		ret = cio_disable_subchannel(sch);
>> >  	} while (ret == -EBUSY);
>> > +	return ret;
>> > +}
>> > +
>> > +static void fsm_close(struct vfio_ccw_private *private,
>> > +		      enum vfio_ccw_event event)
>> > +{
>> > +	struct subchannel *sch = private->sch;
>> > +	int ret;
>> > +
>> > +	spin_lock_irq(sch->lock);
>> > +	if (!sch->schib.pmcw.ena)
>> > +		goto err_unlock;
>> > +	ret = cio_disable_subchannel(sch);
>> 
>> cio_disable_subchannel() should be happy to disable an already disabled
>> subchannel, so I guess we can just walk through this and end up in
>> CLOSED state... unless entering with !ena actually indicates that we
>> messed up somewhere else in the state machine. I still need to find time
>> to read the patches.
>
> I don't know, I looked at that ena stuff for a bit and couldn't guess
> what it is trying to do.

It is one of the bits in the pmcw control block that can be modified; if
it is 1, the subchannel is enabled and can be used for I/O, if it is 0,
the subchannel is disabled and all instructions that initiate or stop
I/O will fail. Basically, you enable the subchannel if you actually want
to access the device associated with it. Online/offline for (normal
usage) ccw devices maps (among other things) to associated subchannel
enabled/disabled; for a subchannel that is supposed to be passed via
vfio-ccw, we want to have it enabled so that it is actually usable.

I think the ena checking had been inspired from what the ccw bus
does. We could probably just forge ahead in any case and the called
functions in the css bus would be able to handle it just fine, but I
have not double checked.

> Arguably the channel should not be ripped away from vfio while the FSM
> is in the open states, so I'm not sure what a lot of this is for.

We could have surprise removal (i.e. a subchannel in active use being
ripped out), as that's what happens on real hardware as well. E.g. doing
a device_del in QEMU.

[v2,0/9] Move vfio_ccw to the new mdev API

Message

Comments