Message ID | 0-v3-57c1502c62fd+2190-ccw_mdev_jgg@nvidia.com (mailing list archive) |
---|---|
Series | Move vfio_ccw to the new mdev API |
On Fri, Oct 01, 2021 at 02:52:41PM -0300, Jason Gunthorpe wrote:
> This addresses Cornelia's remark on the earlier patch that ccw has a
> confusing lifecycle. While it doesn't seem like the original attempt was
> functionally wrong, the result can be made better with a lot of further
> work.
>
> Reorganize the driver so that the mdev owns the private memory and
> controls the lifecycle, not the css_driver. The memory associated with the
> css_driver lifecycle is only the mdev_parent/mdev_type registration.
>
> Along the way we change when the sch is quiescent or not to be linked to
> the open/close_device lifetime of the vfio_device, which is sort of what
> it was trying to do already, just not completely.
>
> The troublesome racy lifecycle of the css_driver callbacks is made clear
> with simple vfio_device refcounting so a callback is only delivered into a
> registered vfio_device and has obvious correctness.
>
> Move the only per-css_driver state, the "available instance" counter, into
> the core code and share that logic with many of the other drivers. The
> value is kept in the mdev_type memory.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_ccw
>
> v3:
>  - Rebase to Christoph's group work & rc3; use
>    vfio_register_emulated_iommu_dev()
>  - Remove GFP_DMA
>  - Order mdev_unregister_driver() symmetrically with init
>  - Rework what is considered a BROKEN event in fsm_close()
>  - NOP both CCW_EVENT_OPEN/CLOSE
>  - Documentation updates
>  - Rename goto label to err_init in vfio_ccw_mdev_probe()
>  - Fix NULL pointer deref in mdev_device_create()
> v2: https://lore.kernel.org/r/0-v2-7d3a384024cf+2060-ccw_mdev_jgg@nvidia.com
>  - Clean up the lifecycle in ccw with 7 new patches
>  - Rebase
> v1: https://lore.kernel.org/all/7-v2-7667f42c9bad+935-vfio3_jgg@nvidia.com
>
> Jason Gunthorpe (10):
>   vfio/ccw: Remove unneeded GFP_DMA
>   vfio/ccw: Use functions for alloc/free of the vfio_ccw_private
>   vfio/ccw: Pass vfio_ccw_private not mdev_device to various functions
>   vfio/ccw: Convert to use vfio_register_emulated_iommu_dev()

IBM folks, what do you want to do with this? I would like to go ahead
with these patches so we can get closer to unblocking some of the VFIO
core work.

These patches:

>   vfio/ccw: Make the FSM complete and synchronize it to the mdev
>   vfio/mdev: Consolidate all the device_api sysfs into the core code
>   vfio/mdev: Add mdev available instance checking to the core
>   vfio/ccw: Remove private->mdev
>   vfio: Export vfio_device_try_get()
>   vfio/ccw: Move the lifecycle of the struct vfio_ccw_private to the
>     mdev

were made to show how to structure this more cleanly, as Cornelia asked,
but they are not essential, and IBMers could test and fix them to get this
cleanup when time permits.

Thoughts?

Thanks,
Jason

On Wed, 2021-10-20 at 19:48 -0300, Jason Gunthorpe wrote:
> On Fri, Oct 01, 2021 at 02:52:41PM -0300, Jason Gunthorpe wrote:
> > This addresses Cornelia's remark on the earlier patch that ccw has a
> > confusing lifecycle. While it doesn't seem like the original attempt was
> > functionally wrong, the result can be made better with a lot of further
> > work.
> >
> > Reorganize the driver so that the mdev owns the private memory and
> > controls the lifecycle, not the css_driver. The memory associated with the
> > css_driver lifecycle is only the mdev_parent/mdev_type registration.
> >
> > Along the way we change when the sch is quiescent or not to be linked to
> > the open/close_device lifetime of the vfio_device, which is sort of what
> > it was trying to do already, just not completely.
> >
> > The troublesome racy lifecycle of the css_driver callbacks is made clear
> > with simple vfio_device refcounting so a callback is only delivered into a
> > registered vfio_device and has obvious correctness.
> >
> > Move the only per-css_driver state, the "available instance" counter, into
> > the core code and share that logic with many of the other drivers. The
> > value is kept in the mdev_type memory.
> >
> > This is on github: https://github.com/jgunthorpe/linux/commits/vfio_ccw
> >
> > v3:
> >  - Rebase to Christoph's group work & rc3; use
> >    vfio_register_emulated_iommu_dev()
> >  - Remove GFP_DMA
> >  - Order mdev_unregister_driver() symmetrically with init
> >  - Rework what is considered a BROKEN event in fsm_close()
> >  - NOP both CCW_EVENT_OPEN/CLOSE
> >  - Documentation updates
> >  - Rename goto label to err_init in vfio_ccw_mdev_probe()
> >  - Fix NULL pointer deref in mdev_device_create()
> > v2: https://lore.kernel.org/r/0-v2-7d3a384024cf+2060-ccw_mdev_jgg@nvidia.com
> >  - Clean up the lifecycle in ccw with 7 new patches
> >  - Rebase
> > v1: https://lore.kernel.org/all/7-v2-7667f42c9bad+935-vfio3_jgg@nvidia.com
> >
> > Jason Gunthorpe (10):
> >   vfio/ccw: Remove unneeded GFP_DMA
> >   vfio/ccw: Use functions for alloc/free of the vfio_ccw_private
> >   vfio/ccw: Pass vfio_ccw_private not mdev_device to various functions
> >   vfio/ccw: Convert to use vfio_register_emulated_iommu_dev()
>
> IBM folks, what do you want to do with this? I would like to go ahead
> with these patches so we can get closer to unblocking some of the VFIO
> core work.

I'll try to look at these today. (I'm presuming I'm still fine with 2
and 3 :)

> These patches:
>
> >   vfio/ccw: Make the FSM complete and synchronize it to the mdev
> >   vfio/mdev: Consolidate all the device_api sysfs into the core code
> >   vfio/mdev: Add mdev available instance checking to the core
> >   vfio/ccw: Remove private->mdev
> >   vfio: Export vfio_device_try_get()
> >   vfio/ccw: Move the lifecycle of the struct vfio_ccw_private to the
> >     mdev
>
> were made to show how to structure this more cleanly, as Cornelia asked,
> but they are not essential, and IBMers could test and fix them to get this
> cleanup when time permits.

Sadly, these ones dragged the whole series down my todo list, because
of the scope of rework they entailed. Will keep it on the list, but agree
it doesn't need to be bound to the first group.

Eric

> Thoughts?
>
> Thanks,
> Jason