Message ID: 162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com (mailing list archive)
Series: Add VFIO mediated device support and DEV-MSI support for the idxd driver
On Fri, May 21, 2021 at 05:19:05PM -0700, Dave Jiang wrote:
> The code has a dependency on the DEV_MSI/IMS enabling code:
> https://lore.kernel.org/lkml/1614370277-23235-1-git-send-email-megha.dey@intel.com/
>
> The code has a dependency on the idxd driver sub-driver cleanup series:
> https://lore.kernel.org/dmaengine/162163546245.260470.18336189072934823712.stgit@djiang5-desk3.ch.intel.com/T/#t
>
> The code has a dependency on Jason's VFIO refactoring:
> https://lore.kernel.org/kvm/0-v2-7667f42c9bad+935-vfio3_jgg@nvidia.com/

That is quite an inter-tangled list, do you have a submission plan??

> Introducing the "1dwq-v1" mdev type. This mdev type allows allocation
> of a single dedicated wq from the available dedicated wqs. After a
> workqueue (wq) is enabled, the user generates a UUID. On mdev
> creation, the mdev driver code finds a dwq matching the mdev type.
> When the create operation succeeds, the user-generated UUID can be
> passed to qemu. When the guest boots, it should discover a DSA device
> during PCI discovery.
>
> Example for the "1dwq-v1" type:
> 1. Enable a wq with the "mdev" wq type.
> 2. Generate a UUID.
> 3. Write the UUID to the mdev class sysfs path:
>    echo $UUID > /sys/class/mdev_bus/0000\:00\:0a.0/mdev_supported_types/idxd-1dwq-v1/create
> 4. Pass the following parameter to qemu:
>    "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:0a.0/$UUID"

So the idxd core driver knows to create a "vfio" wq with its own
machinery but you still want to involve the horrible mdev guid stuff?

Why??

Jason
On 5/23/2021 4:22 PM, Jason Gunthorpe wrote:
> On Fri, May 21, 2021 at 05:19:05PM -0700, Dave Jiang wrote:
>> Introducing the "1dwq-v1" mdev type. This mdev type allows allocation
>> of a single dedicated wq from the available dedicated wqs. After a
>> workqueue (wq) is enabled, the user generates a UUID. On mdev
>> creation, the mdev driver code finds a dwq matching the mdev type.
>> When the create operation succeeds, the user-generated UUID can be
>> passed to qemu. When the guest boots, it should discover a DSA device
>> during PCI discovery.
>>
>> Example for the "1dwq-v1" type:
>> 1. Enable a wq with the "mdev" wq type.
>> 2. Generate a UUID.
>> 3. Write the UUID to the mdev class sysfs path:
>>    echo $UUID > /sys/class/mdev_bus/0000\:00\:0a.0/mdev_supported_types/idxd-1dwq-v1/create
>> 4. Pass the following parameter to qemu:
>>    "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:0a.0/$UUID"
>
> So the idxd core driver knows to create a "vfio" wq with its own
> machinery but you still want to involve the horrible mdev guid stuff?
>
> Why??

Are you referring to calling mdev_device_create() directly in the mdev
idxd_driver probe? I think this would work with our dedicated wq, where
a single mdev can be assigned to a wq. However, later on, when we need
to support the shared wq, where we can create multiple mdevs per wq,
we'll need an entry point to do so. In the name of keeping things
consistent from the user's perspective, going through sysfs seems the
way to do it.
On Wed, Jun 02, 2021 at 08:40:51AM -0700, Dave Jiang wrote:
>
> On 5/23/2021 4:22 PM, Jason Gunthorpe wrote:
> > On Fri, May 21, 2021 at 05:19:05PM -0700, Dave Jiang wrote:
> > > Introducing the "1dwq-v1" mdev type. This mdev type allows
> > > allocation of a single dedicated wq from the available dedicated
> > > wqs. After a workqueue (wq) is enabled, the user generates a UUID.
> > > On mdev creation, the mdev driver code finds a dwq matching the
> > > mdev type. When the create operation succeeds, the user-generated
> > > UUID can be passed to qemu. When the guest boots, it should
> > > discover a DSA device during PCI discovery.
> > >
> > > Example for the "1dwq-v1" type:
> > > 1. Enable a wq with the "mdev" wq type.
> > > 2. Generate a UUID.
> > > 3. Write the UUID to the mdev class sysfs path:
> > >    echo $UUID > /sys/class/mdev_bus/0000\:00\:0a.0/mdev_supported_types/idxd-1dwq-v1/create
> > > 4. Pass the following parameter to qemu:
> > >    "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:0a.0/$UUID"
> >
> > So the idxd core driver knows to create a "vfio" wq with its own
> > machinery but you still want to involve the horrible mdev guid stuff?
> >
> > Why??
>
> Are you referring to calling mdev_device_create() directly in the mdev
> idxd_driver probe?

No, just call vfio_register_group_dev and forget about mdev.

> I think this would work with our dedicated wq, where a single mdev
> can be assigned to a wq.

Ok, sounds great

> However, later on, when we need to support the shared wq, where we can
> create multiple mdevs per wq, we'll need an entry point to do so. In
> the name of keeping things consistent from the user's perspective,
> going through sysfs seems the way to do it.

Why not use your already very complicated idxd sysfs to do this?

Jason
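[For reference, a minimal sketch of what "just call vfio_register_group_dev" could look like for a dedicated wq, written against the v5.13-era vfio core API. Everything prefixed idxd_ below is hypothetical illustration, not code from the posted series:

        /*
         * Hypothetical sketch: register a dedicated idxd wq directly as a
         * vfio device and skip mdev entirely. The idxd_* types are made up;
         * vfio_init_group_dev()/vfio_register_group_dev() are the v5.13-era
         * vfio core entry points.
         */
        #include <linux/slab.h>
        #include <linux/vfio.h>

        struct idxd_wq;                         /* opaque here */

        struct idxd_wq_vdev {
                struct vfio_device vdev;
                struct idxd_wq *wq;             /* dedicated wq backing this device */
        };

        static const struct vfio_device_ops idxd_vfio_wq_ops = {
                .name = "idxd-wq",
                /* .open/.release/.read/.write/.ioctl/.mmap would carry the
                 * actual wq mediation; omitted in this sketch */
        };

        static int idxd_wq_vfio_register(struct idxd_wq *wq, struct device *dev)
        {
                struct idxd_wq_vdev *wqdev;
                int rc;

                wqdev = kzalloc(sizeof(*wqdev), GFP_KERNEL);
                if (!wqdev)
                        return -ENOMEM;

                wqdev->wq = wq;
                vfio_init_group_dev(&wqdev->vdev, dev, &idxd_vfio_wq_ops);

                rc = vfio_register_group_dev(&wqdev->vdev);
                if (rc)
                        kfree(wqdev);
                return rc;
        }

Once registered, the device shows up to userspace through the normal vfio group nodes; no GUID or mdev_supported_types step is involved.]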
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, June 3, 2021 7:18 AM
>
> On Wed, Jun 02, 2021 at 08:40:51AM -0700, Dave Jiang wrote:
> >
> > On 5/23/2021 4:22 PM, Jason Gunthorpe wrote:
> > > On Fri, May 21, 2021 at 05:19:05PM -0700, Dave Jiang wrote:
> > > > Introducing the "1dwq-v1" mdev type. This mdev type allows
> > > > allocation of a single dedicated wq from the available dedicated
> > > > wqs. After a workqueue (wq) is enabled, the user generates a
> > > > UUID. On mdev creation, the mdev driver code finds a dwq matching
> > > > the mdev type. When the create operation succeeds, the
> > > > user-generated UUID can be passed to qemu. When the guest boots,
> > > > it should discover a DSA device during PCI discovery.
> > > >
> > > > Example for the "1dwq-v1" type:
> > > > 1. Enable a wq with the "mdev" wq type.
> > > > 2. Generate a UUID.
> > > > 3. Write the UUID to the mdev class sysfs path:
> > > >    echo $UUID > /sys/class/mdev_bus/0000\:00\:0a.0/mdev_supported_types/idxd-1dwq-v1/create
> > > > 4. Pass the following parameter to qemu:
> > > >    "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:0a.0/$UUID"
> > >
> > > So the idxd core driver knows to create a "vfio" wq with its own
> > > machinery but you still want to involve the horrible mdev guid stuff?
> > >
> > > Why??
> >
> > Are you referring to calling mdev_device_create() directly in the
> > mdev idxd_driver probe?
>
> No, just call vfio_register_group_dev and forget about mdev.
>
> > I think this would work with our dedicated wq, where a single mdev
> > can be assigned to a wq.
>
> Ok, sounds great
>
> > However, later on, when we need to support the shared wq, where we
> > can create multiple mdevs per wq, we'll need an entry point to do so.
> > In the name of keeping things consistent from the user's perspective,
> > going through sysfs seems the way to do it.
>
> Why not use your already very complicated idxd sysfs to do this?

Jason, can you clarify your attitude on the mdev guid stuff? Are you
completely against it, or is it case-by-case? If the former, this is a
big decision, so it's better to have consensus with Alex/Kirti. If the
latter, I would like to hear your criteria for when it can be used and
when not...

Thanks
Kevin
On Thu, Jun 03, 2021 at 01:11:37AM +0000, Tian, Kevin wrote:
> Jason, can you clarify your attitude on the mdev guid stuff? Are you
> completely against it, or is it case-by-case? If the former, this is a
> big decision, so it's better to have consensus with Alex/Kirti. If the
> latter, I would like to hear your criteria for when it can be used and
> when not...

I dislike it generally, but it exists so <shrug>. I know others feel
more strongly about it being un-kernely and the wrong way to use sysfs.

Here I was remarking how the example in the cover letter made the mdev
part seem totally pointless. If it is pointless then don't do it.

Remember, we have stripped away the actual special need to use mdev.
You don't *have* to use mdev anymore to use vfio. That is a significant
ideology change even from a few months ago.

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, June 3, 2021 9:50 AM
>
> On Thu, Jun 03, 2021 at 01:11:37AM +0000, Tian, Kevin wrote:
>
> > Jason, can you clarify your attitude on the mdev guid stuff? Are you
> > completely against it, or is it case-by-case? If the former, this is
> > a big decision, so it's better to have consensus with Alex/Kirti. If
> > the latter, I would like to hear your criteria for when it can be
> > used and when not...
>
> I dislike it generally, but it exists so <shrug>. I know others feel
> more strongly about it being un-kernely and the wrong way to use sysfs.
>
> Here I was remarking how the example in the cover letter made the mdev
> part seem totally pointless. If it is pointless then don't do it.

Is your point that, as long as an mdev requires pre-config through
driver-specific sysfs, it doesn't make sense to use the mdev guid
interface anymore?

The value of the mdev guid interface is providing a vendor-agnostic
interface for mdev life-cycle management, which allows a
one-size-fits-all enablement in the upper management stack. Requiring
vendor-specific pre-config does blur the boundary here.

Alex/Kirti/Cornelia, what is your opinion here? It's better if we can
have a consensus on when and where the existing mdev sysfs could be
used, as this will affect every new mdev implementation from now on.

> Remember, we have stripped away the actual special need to use mdev.
> You don't *have* to use mdev anymore to use vfio. That is a
> significant ideology change even from a few months ago.

Yes, "don't have to", but if there is value in doing so, is it
necessary to block it? One point in my mind is whether we should
minimize vendor-specific contracts for the user to manage an mdev or
subdevice...

Thanks
Kevin
On Thu, Jun 03, 2021 at 05:52:58AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, June 3, 2021 9:50 AM
> >
> > On Thu, Jun 03, 2021 at 01:11:37AM +0000, Tian, Kevin wrote:
> >
> > > Jason, can you clarify your attitude on the mdev guid stuff? Are
> > > you completely against it, or is it case-by-case? If the former,
> > > this is a big decision, so it's better to have consensus with
> > > Alex/Kirti. If the latter, I would like to hear your criteria for
> > > when it can be used and when not...
> >
> > I dislike it generally, but it exists so <shrug>. I know others feel
> > more strongly about it being un-kernely and the wrong way to use
> > sysfs.
> >
> > Here I was remarking how the example in the cover letter made the
> > mdev part seem totally pointless. If it is pointless then don't do
> > it.
>
> Is your point that, as long as an mdev requires pre-config through
> driver-specific sysfs, it doesn't make sense to use the mdev guid
> interface anymore?

Yes

> The value of the mdev guid interface is providing a vendor-agnostic
> interface for mdev life-cycle management, which allows a
> one-size-fits-all enablement in the upper management stack. Requiring
> vendor-specific pre-config does blur the boundary here.

It isn't even vendor-agnostic - understanding the mdev_type
configuration stuff is still vendor specific.

Jason
On Thu, 3 Jun 2021 05:52:58 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, June 3, 2021 9:50 AM
> >
> > On Thu, Jun 03, 2021 at 01:11:37AM +0000, Tian, Kevin wrote:
> >
> > > Jason, can you clarify your attitude on the mdev guid stuff? Are
> > > you completely against it, or is it case-by-case? If the former,
> > > this is a big decision, so it's better to have consensus with
> > > Alex/Kirti. If the latter, I would like to hear your criteria for
> > > when it can be used and when not...
> >
> > I dislike it generally, but it exists so <shrug>. I know others feel
> > more strongly about it being un-kernely and the wrong way to use
> > sysfs.
> >
> > Here I was remarking how the example in the cover letter made the
> > mdev part seem totally pointless. If it is pointless then don't do
> > it.
>
> Is your point that, as long as an mdev requires pre-config through
> driver-specific sysfs, it doesn't make sense to use the mdev guid
> interface anymore?

Can you describe exactly what step 1. is doing in this case from the
original cover letter ("Enable wq with "mdev" wq type")? That does
sound a bit like configuring something to use mdev and then separately
going to the trouble of creating the mdev. As Jason suggests, if a wq
is tagged for mdev/vfio, it could just register itself as a vfio bus
driver.

But if we want to use mdev, why doesn't available_instances for your
mdev type simply report all unassigned wqs, and the `echo $UUID >
create` grabs a wq for mdev? That would remove this pre-config
contention, right?

> The value of the mdev guid interface is providing a vendor-agnostic
> interface for mdev life-cycle management, which allows a
> one-size-fits-all enablement in the upper management stack. Requiring
> vendor-specific pre-config does blur the boundary here.

We need to be careful about using work-avoidance in the upper
management stack as a primary use case for an interface though.

> Alex/Kirti/Cornelia, what is your opinion here? It's better if we can
> have a consensus on when and where the existing mdev sysfs could be
> used, as this will affect every new mdev implementation from now on.

I have a hard time defining some fixed criteria for using mdev. It's
essentially always been true that vendors could write their own vfio
"bus driver", like vfio-pci or vfio-platform, specific to their device.
Mdevs were meant to be a way for the (non-vfio) driver of a device to
expose portions of the device through mediation for use with vfio. It
seems like that's largely being done here.

What I think has changed recently is this desire to make it easier to
create those vendor drivers and some promise of making module binding
work to avoid the messiness around picking a driver for the device. In
the auxiliary bus case that I think Jason is working on, it sounds like
the main device driver exposes portions of itself on an auxiliary bus
where drivers on that bus can integrate into the vfio subsystem. It
starts to get pretty fuzzy with what mdev already does, but it's also a
more versatile interface. Is it right for everyone? Probably not.

Is the pre-configuration issue here really a push vs. pull problem? I
can see the requirement in step 1. is dedicating some resources to an
mdev use case, so at that point it seems like the argument is whether
we should just create aux bus devices that get automatically bound to a
vendor vfio-pci variant and we avoid the mdev lifecycle, which is both
convenient and ugly. On the other hand, mdev has more of a pull
interface, ie. here are a bunch of device types and how many of each we
can support; use create to pull what you need.

> > Remember, we have stripped away the actual special need to use
> > mdev. You don't *have* to use mdev anymore to use vfio. That is a
> > significant ideology change even from a few months ago.
>
> Yes, "don't have to", but if there is value in doing so, is it
> necessary to block it? One point in my mind is whether we should
> minimize vendor-specific contracts for the user to manage an mdev or
> subdevice...

Again, this in itself is not a great justification for using mdev;
we're creating vendor-specific device types with vendor-specific
additional features, and that could all be done via some sort of
netlink interface too. The thing that pushes this more towards mdev for
me is that I don't think each of these wqs appears as a device to the
host; they're internal resources of the parent device and we want to
compose them in ways that are slightly more amenable to traditional
mdevs... I think. Thanks,

Alex
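[To make the pull model concrete, here is a rough sketch against the v5.13-era mdev callback signatures, where available_instances counts the enabled-but-unassigned dedicated wqs and create() claims one. The idxd_count_unassigned_dwqs()/idxd_claim_dwq() helpers and the drvdata wiring are hypothetical:

        static ssize_t available_instances_show(struct mdev_type *mtype,
                                                struct mdev_type_attribute *attr,
                                                char *buf)
        {
                struct idxd_device *idxd =
                        dev_get_drvdata(mtype_get_parent_dev(mtype));

                /* every enabled-but-unassigned dwq is a free instance */
                return sysfs_emit(buf, "%d\n", idxd_count_unassigned_dwqs(idxd));
        }
        static MDEV_TYPE_ATTR_RO(available_instances);

        static int idxd_mdev_create(struct mdev_device *mdev)
        {
                struct idxd_device *idxd =
                        dev_get_drvdata(mdev_parent_dev(mdev));
                struct idxd_wq *wq;

                wq = idxd_claim_dwq(idxd);      /* grab any unassigned dwq */
                if (IS_ERR(wq))
                        return PTR_ERR(wq);

                dev_set_drvdata(&mdev->dev, wq);
                return 0;
        }

With that shape the wq still has to be enabled, but there is no separate "tag this wq for mdev" step; `echo $UUID > create` simply pulls from the free pool.]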
Hi, Alex,

Thanks for sharing your thoughts.

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, June 4, 2021 11:40 AM
>
> On Thu, 3 Jun 2021 05:52:58 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, June 3, 2021 9:50 AM
> > >
> > > On Thu, Jun 03, 2021 at 01:11:37AM +0000, Tian, Kevin wrote:
> > >
> > > > Jason, can you clarify your attitude on the mdev guid stuff? Are
> > > > you completely against it, or is it case-by-case? If the former,
> > > > this is a big decision, so it's better to have consensus with
> > > > Alex/Kirti. If the latter, I would like to hear your criteria for
> > > > when it can be used and when not...
> > >
> > > I dislike it generally, but it exists so <shrug>. I know others
> > > feel more strongly about it being un-kernely and the wrong way to
> > > use sysfs.
> > >
> > > Here I was remarking how the example in the cover letter made the
> > > mdev part seem totally pointless. If it is pointless then don't do
> > > it.
> >
> > Is your point that, as long as an mdev requires pre-config through
> > driver-specific sysfs, it doesn't make sense to use the mdev guid
> > interface anymore?
>
> Can you describe exactly what step 1. is doing in this case from the
> original cover letter ("Enable wq with "mdev" wq type")? That does
> sound a bit like configuring something to use mdev and then separately
> going to the trouble of creating the mdev. As Jason suggests, if a wq
> is tagged for mdev/vfio, it could just register itself as a vfio bus
> driver.

I'll leave it to Dave to explain the exact detail in step 1.

> But if we want to use mdev, why doesn't available_instances for your
> mdev type simply report all unassigned wqs, and the `echo $UUID >
> create` grabs a wq for mdev? That would remove this pre-config
> contention, right?

This way could also work. It sort of changes pre-config to post-config,
i.e. after an unassigned wq is grabbed for mdev, the admin then
configures additional vendor-specific parameters (not initialized by
the parent driver) before this mdev is assigned to a VM. This also
looks like what NVIDIA is doing for their vGPU, with a cmdline tool
(nvidia-smi) and an nvidia sysfs node for setting plugin parameters:

https://docs.nvidia.com/grid/latest/pdf/grid-vgpu-user-guide.pdf

But I'll leave it to Dave again, as there must be a reason why they
chose pre-config in the first place.

> > The value of the mdev guid interface is providing a vendor-agnostic
> > interface for mdev life-cycle management, which allows a
> > one-size-fits-all enablement in the upper management stack.
> > Requiring vendor-specific pre-config does blur the boundary here.
>
> We need to be careful about using work-avoidance in the upper
> management stack as a primary use case for an interface though.

ok

> > Alex/Kirti/Cornelia, what is your opinion here? It's better if we
> > can have a consensus on when and where the existing mdev sysfs could
> > be used, as this will affect every new mdev implementation from now
> > on.
>
> I have a hard time defining some fixed criteria for using mdev. It's
> essentially always been true that vendors could write their own vfio
> "bus driver", like vfio-pci or vfio-platform, specific to their
> device. Mdevs were meant to be a way for the (non-vfio) driver of a
> device to expose portions of the device through mediation for use with
> vfio. It seems like that's largely being done here.
>
> What I think has changed recently is this desire to make it easier to
> create those vendor drivers and some promise of making module binding
> work to avoid the messiness around picking a driver for the device. In
> the auxiliary bus case that I think Jason is working on, it sounds
> like the main device driver exposes portions of itself on an auxiliary
> bus where drivers on that bus can integrate into the vfio subsystem.
> It starts to get pretty fuzzy with what mdev already does, but it's
> also a more versatile interface. Is it right for everyone? Probably
> not.

idxd is also moving toward this model per Jason's suggestion. Although
the auxiliary bus is not directly used, the idxd driver has its own bus
for exposing portions of its resources. From this angle, all the
motivation around the mdev bus does get fuzzy...

> Is the pre-configuration issue here really a push vs. pull problem? I
> can see the requirement in step 1. is dedicating some resources to an
> mdev use case, so at that point it seems like the argument is whether
> we should just create aux bus devices that get automatically bound to
> a vendor vfio-pci variant and we avoid the mdev lifecycle, which is
> both convenient and ugly. On the other hand, mdev has more of a pull
> interface, ie. here are a bunch of device types and how many of each
> we can support; use create to pull what you need.

I see your point. Looks like what idxd is moving toward now is a mixed
model. The parent driver uses a push interface to initialize a pool of
instances which are then managed through mdev in a pull mode.

> > > Remember, we have stripped away the actual special need to use
> > > mdev. You don't *have* to use mdev anymore to use vfio. That is a
> > > significant ideology change even from a few months ago.
> >
> > Yes, "don't have to", but if there is value in doing so, is it
> > necessary to block it? One point in my mind is whether we should
> > minimize vendor-specific contracts for the user to manage an mdev or
> > subdevice...
>
> Again, this in itself is not a great justification for using mdev;
> we're creating vendor-specific device types with vendor-specific
> additional features, and that could all be done via some sort of
> netlink interface too. The thing that pushes this more towards mdev
> for me is that I don't think each of these wqs appears as a device to
> the host; they're internal resources of the parent device and we want
> to compose them in ways that are slightly more amenable to traditional
> mdevs... I think. Thanks,

Yes, this is one reason for going toward mdev.

btw I'm not clear what the netlink interface will finally be,
especially about whether any generic cmd should be defined across
devices, given that subdevice management still has large generality.
Jason, do you have an example somewhere which we can take a look at
regarding the mlx netlink design?

Thanks
Kevin
On 6/6/2021 11:22 PM, Tian, Kevin wrote:
> Hi, Alex,
>
> Thanks for sharing your thoughts.
>
>> From: Alex Williamson <alex.williamson@redhat.com>
>> Sent: Friday, June 4, 2021 11:40 AM
>>
>> On Thu, 3 Jun 2021 05:52:58 +0000
>> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>
>>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>>> Sent: Thursday, June 3, 2021 9:50 AM
>>>>
>>>> On Thu, Jun 03, 2021 at 01:11:37AM +0000, Tian, Kevin wrote:
>>>>
>>>>> Jason, can you clarify your attitude on the mdev guid stuff? Are
>>>>> you completely against it, or is it case-by-case? If the former,
>>>>> this is a big decision, so it's better to have consensus with
>>>>> Alex/Kirti. If the latter, I would like to hear your criteria for
>>>>> when it can be used and when not...
>>>> I dislike it generally, but it exists so <shrug>. I know others
>>>> feel more strongly about it being un-kernely and the wrong way to
>>>> use sysfs.
>>>>
>>>> Here I was remarking how the example in the cover letter made the
>>>> mdev part seem totally pointless. If it is pointless then don't do
>>>> it.
>>> Is your point that, as long as an mdev requires pre-config through
>>> driver-specific sysfs, it doesn't make sense to use the mdev guid
>>> interface anymore?
>> Can you describe exactly what step 1. is doing in this case from the
>> original cover letter ("Enable wq with "mdev" wq type")? That does
>> sound a bit like configuring something to use mdev and then
>> separately going to the trouble of creating the mdev. As Jason
>> suggests, if a wq is tagged for mdev/vfio, it could just register
>> itself as a vfio bus driver.
> I'll leave it to Dave to explain the exact detail in step 1.

So in step 1, we 'tag' the wq to be dedicated to guest usage and put
the hardware wq into the enabled state. For a dedicated-mode wq, we can
definitely just register directly and skip the mdev step. For a
shared-mode wq, we can have multiple mdevs running on top of a single
wq, so we need some way to create more mdevs. We can either go with the
established creation path of mdev, or invent something custom for the
driver, as Jason suggested, to accommodate additional virtual devices
for guests. We implemented the mdev path originally with the
consideration that mdev is established and already has a known
interface.

>> But if we want to use mdev, why doesn't available_instances for your
>> mdev type simply report all unassigned wqs, and the `echo $UUID >
>> create` grabs a wq for mdev? That would remove this pre-config
>> contention, right?
> This way could also work. It sort of changes pre-config to
> post-config, i.e. after an unassigned wq is grabbed for mdev, the
> admin then configures additional vendor-specific parameters (not
> initialized by the parent driver) before this mdev is assigned to a
> VM. This also looks like what NVIDIA is doing for their vGPU, with a
> cmdline tool (nvidia-smi) and an nvidia sysfs node for setting plugin
> parameters:
>
> https://docs.nvidia.com/grid/latest/pdf/grid-vgpu-user-guide.pdf
>
> But I'll leave it to Dave again, as there must be a reason why they
> chose pre-config in the first place.

I think things become more complicated when we go from a dedicated wq
to a shared wq, where the wq : mdev relationship goes from 1 : 1 to
1 : N. Also, keeping a consistent user config experience is desired,
especially since we have had such behavior since kernel 5.6 for host
usages. So we really need to try to avoid doing wq configuration
differently just for "mdev" wqs. In the case suggested above, we
basically just flipped the configuration steps: the mdev is first
created through the mdev sysfs interface, and then the device
parameters are configured. For us, we configure the device parameters
first and then create the mdev. But in the end, it's still the hybrid
mdev setup, right?

>>> The value of the mdev guid interface is providing a vendor-agnostic
>>> interface for mdev life-cycle management, which allows a
>>> one-size-fits-all enablement in the upper management stack.
>>> Requiring vendor-specific pre-config does blur the boundary here.
>> We need to be careful about using work-avoidance in the upper
>> management stack as a primary use case for an interface though.
> ok
>
>>> Alex/Kirti/Cornelia, what is your opinion here? It's better if we
>>> can have a consensus on when and where the existing mdev sysfs could
>>> be used, as this will affect every new mdev implementation from now
>>> on.
>> I have a hard time defining some fixed criteria for using mdev. It's
>> essentially always been true that vendors could write their own vfio
>> "bus driver", like vfio-pci or vfio-platform, specific to their
>> device. Mdevs were meant to be a way for the (non-vfio) driver of a
>> device to expose portions of the device through mediation for use
>> with vfio. It seems like that's largely being done here.
>>
>> What I think has changed recently is this desire to make it easier to
>> create those vendor drivers and some promise of making module binding
>> work to avoid the messiness around picking a driver for the device.
>> In the auxiliary bus case that I think Jason is working on, it sounds
>> like the main device driver exposes portions of itself on an
>> auxiliary bus where drivers on that bus can integrate into the vfio
>> subsystem. It starts to get pretty fuzzy with what mdev already does,
>> but it's also a more versatile interface. Is it right for everyone?
>> Probably not.
> idxd is also moving toward this model per Jason's suggestion. Although
> the auxiliary bus is not directly used, the idxd driver has its own
> bus for exposing portions of its resources. From this angle, all the
> motivation around the mdev bus does get fuzzy...
>
>> Is the pre-configuration issue here really a push vs. pull problem? I
>> can see the requirement in step 1. is dedicating some resources to an
>> mdev use case, so at that point it seems like the argument is whether
>> we should just create aux bus devices that get automatically bound to
>> a vendor vfio-pci variant and we avoid the mdev lifecycle, which is
>> both convenient and ugly. On the other hand, mdev has more of a pull
>> interface, ie. here are a bunch of device types and how many of each
>> we can support; use create to pull what you need.
> I see your point. Looks like what idxd is moving toward now is a mixed
> model. The parent driver uses a push interface to initialize a pool of
> instances which are then managed through mdev in a pull mode.
>
>>>> Remember, we have stripped away the actual special need to use
>>>> mdev. You don't *have* to use mdev anymore to use vfio. That is a
>>>> significant ideology change even from a few months ago.
>>>>
>>> Yes, "don't have to", but if there is value in doing so, is it
>>> necessary to block it? One point in my mind is whether we should
>>> minimize vendor-specific contracts for the user to manage an mdev or
>>> subdevice...
>> Again, this in itself is not a great justification for using mdev;
>> we're creating vendor-specific device types with vendor-specific
>> additional features, and that could all be done via some sort of
>> netlink interface too. The thing that pushes this more towards mdev
>> for me is that I don't think each of these wqs appears as a device to
>> the host; they're internal resources of the parent device and we want
>> to compose them in ways that are slightly more amenable to
>> traditional mdevs... I think. Thanks,
>>
> Yes, this is one reason for going toward mdev.
>
> btw I'm not clear what the netlink interface will finally be,
> especially about whether any generic cmd should be defined across
> devices, given that subdevice management still has large generality.
> Jason, do you have an example somewhere which we can take a look at
> regarding the mlx netlink design?
>
> Thanks
> Kevin
On Mon, Jun 07, 2021 at 06:22:08AM +0000, Tian, Kevin wrote:
> btw I'm not clear what the netlink interface will finally be,
> especially about whether any generic cmd should be defined across
> devices, given that subdevice management still has large generality.
> Jason, do you have an example somewhere which we can take a look at
> regarding the mlx netlink design?

Start here:

https://lore.kernel.org/netdev/20210120090531.49553-1-saeed@kernel.org/

devlink is a more generic way to control PCI/etc devices as a side band
to their usual uAPI interfaces like netlink/rdma/etc.

Jason
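[For context, the driver-side entry points behind that side band are small. Below is a minimal sketch against the v5.13-era devlink API (before devlink_alloc() took the struct device); the ops contents are omitted and the idxd_ naming is hypothetical:

        #include <net/devlink.h>

        static const struct devlink_ops idxd_devlink_ops = {
                /* reload/param/port callbacks would go here, per driver */
        };

        static int idxd_devlink_init(struct pci_dev *pdev)
        {
                struct devlink *dl;
                int rc;

                dl = devlink_alloc(&idxd_devlink_ops, 0);
                if (!dl)
                        return -ENOMEM;

                rc = devlink_register(dl, &pdev->dev);
                if (rc)
                        devlink_free(dl);
                return rc;
        }

Userspace then drives the device with generic `devlink` commands rather than per-vendor sysfs nodes.]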
On Mon, Jun 07, 2021 at 11:13:04AM -0700, Dave Jiang wrote:
> So in step 1, we 'tag' the wq to be dedicated to guest usage and put
> the hardware wq into the enabled state. For a dedicated-mode wq, we
> can definitely just register directly and skip the mdev step. For a
> shared-mode wq, we can have multiple mdevs running on top of a single
> wq, so we need some way to create more mdevs. We can either go with
> the established creation path of mdev, or invent something custom for
> the driver, as Jason suggested, to accommodate additional virtual
> devices for guests. We implemented the mdev path originally with the
> consideration that mdev is established and already has a known
> interface.

It sounds like you could just as easily have a 'create new vfio' file
under the idxd sysfs.. Especially since you already have a bus and
dynamic vfio-specific things being created on this bus.

Have you gone over this with Dan?

> I think things become more complicated when we go from a dedicated wq
> to a shared wq, where the wq : mdev relationship goes from 1 : 1 to
> 1 : N. Also, keeping a consistent user config experience is desired,
> especially since we have had such behavior since kernel 5.6 for host
> usages. So we really need to try to avoid doing wq configuration
> differently just for "mdev" wqs. In the case suggested above, we
> basically just flipped the configuration steps: the mdev is first
> created through the mdev sysfs interface, and then the device
> parameters are configured. For us, we configure the device parameters
> first and then create the mdev. But in the end, it's still the hybrid
> mdev setup, right?

So you don't even use mdev to configure anything? Yuk.

Jason
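[The 'create new vfio' file Jason describes is a few lines of standard sysfs plumbing. A hypothetical sketch, with dev_to_idxd_wq() and idxd_wq_create_vfio_dev() made up to show the shape; DEVICE_ATTR_WO() is the standard sysfs macro:

        static ssize_t create_vfio_store(struct device *dev,
                                         struct device_attribute *attr,
                                         const char *buf, size_t count)
        {
                struct idxd_wq *wq = dev_to_idxd_wq(dev);       /* hypothetical */
                int rc;

                /* allocate and vfio_register_group_dev() one more instance
                 * on this (possibly shared) wq */
                rc = idxd_wq_create_vfio_dev(wq);               /* hypothetical */
                if (rc)
                        return rc;
                return count;
        }
        static DEVICE_ATTR_WO(create_vfio);

Usage would stay inside the existing idxd hierarchy, e.g. something like `echo 1 > /sys/bus/dsa/devices/wq0.0/create_vfio` (path illustrative), instead of the mdev GUID dance.]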
On 6/7/2021 12:11 PM, Jason Gunthorpe wrote:
> On Mon, Jun 07, 2021 at 11:13:04AM -0700, Dave Jiang wrote:
>
>> So in step 1, we 'tag' the wq to be dedicated to guest usage and put
>> the hardware wq into the enabled state. For a dedicated-mode wq, we
>> can definitely just register directly and skip the mdev step. For a
>> shared-mode wq, we can have multiple mdevs running on top of a single
>> wq, so we need some way to create more mdevs. We can either go with
>> the established creation path of mdev, or invent something custom for
>> the driver, as Jason suggested, to accommodate additional virtual
>> devices for guests. We implemented the mdev path originally with the
>> consideration that mdev is established and already has a known
>> interface.
> It sounds like you could just as easily have a 'create new vfio' file
> under the idxd sysfs.. Especially since you already have a bus and
> dynamic vfio-specific things being created on this bus.

Will explore this, and using 'struct vfio_device' without mdev.

> Have you gone over this with Dan?
>
>> I think things become more complicated when we go from a dedicated wq
>> to a shared wq, where the wq : mdev relationship goes from 1 : 1 to
>> 1 : N. Also, keeping a consistent user config experience is desired,
>> especially since we have had such behavior since kernel 5.6 for host
>> usages. So we really need to try to avoid doing wq configuration
>> differently just for "mdev" wqs. In the case suggested above, we
>> basically just flipped the configuration steps: the mdev is first
>> created through the mdev sysfs interface, and then the device
>> parameters are configured. For us, we configure the device parameters
>> first and then create the mdev. But in the end, it's still the hybrid
>> mdev setup, right?
> So you don't even use mdev to configure anything? Yuk.
>
> Jason
On 6/8/2021 9:02 AM, Dave Jiang wrote:
>
> On 6/7/2021 12:11 PM, Jason Gunthorpe wrote:
>> On Mon, Jun 07, 2021 at 11:13:04AM -0700, Dave Jiang wrote:
>>
>>> So in step 1, we 'tag' the wq to be dedicated to guest usage and put
>>> the hardware wq into the enabled state. For a dedicated-mode wq, we
>>> can definitely just register directly and skip the mdev step. For a
>>> shared-mode wq, we can have multiple mdevs running on top of a
>>> single wq, so we need some way to create more mdevs. We can either
>>> go with the established creation path of mdev, or invent something
>>> custom for the driver, as Jason suggested, to accommodate additional
>>> virtual devices for guests. We implemented the mdev path originally
>>> with the consideration that mdev is established and already has a
>>> known interface.
>> It sounds like you could just as easily have a 'create new vfio' file
>> under the idxd sysfs.. Especially since you already have a bus and
>> dynamic vfio-specific things being created on this bus.
>
> Will explore this, and using 'struct vfio_device' without mdev.
>

Hi Jason. I hacked the idxd driver to remove the mdev association and
use vfio_device directly, and ran into some issues. Specifically, mdev
gets some special handling when it comes to the iommu domain. When we
hit vfio_iommu_type1_attach_group(), there's a branch in there for
mdev_bus_type. It sets the group's mdev_group flag, which later has the
effect of special handling for iommu_attach_group(). In addition, it
ends up switching the bus to pci_bus_type before iommu_domain_alloc()
is called. Do we need to provide similar handling for vfio_devices that
are not backed by an entire PCI device the way vfio_pci is? Not sure
it's the right thing to do to attach these devices to pci_bus_type
directly.
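[The branch Dave is describing looks roughly like the following. This is a simplified paraphrase of the v5.13-era drivers/vfio/vfio_iommu_type1.c, not a verbatim copy:

        /* Inside vfio_iommu_type1_attach_group(), simplified: */
        if (vfio_bus_is_mdev(bus)) {
                struct device *iommu_device = NULL;

                group->mdev_group = true;

                /* Is some mdev in this group backed by a real IOMMU device? */
                ret = iommu_group_for_each_dev(iommu_group, &iommu_device,
                                               vfio_mdev_iommu_device);
                if (ret || !iommu_device) {
                        /* Purely SW-mediated: becomes the "external domain";
                         * pages get pinned on demand, no HW IOMMU domain. */
                        return 0;       /* (simplified) */
                }

                /* HW-backed mdev: switch to the backing device's bus,
                 * e.g. pci_bus_type, before allocating the domain */
                bus = iommu_device->bus;
        }

        domain->domain = iommu_domain_alloc(bus);]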
On Fri, Jun 11, 2021 at 11:21:42AM -0700, Dave Jiang wrote:
>
> On 6/8/2021 9:02 AM, Dave Jiang wrote:
> >
> > On 6/7/2021 12:11 PM, Jason Gunthorpe wrote:
> > > On Mon, Jun 07, 2021 at 11:13:04AM -0700, Dave Jiang wrote:
> > >
> > > > So in step 1, we 'tag' the wq to be dedicated to guest usage and
> > > > put the hardware wq into the enabled state. For a dedicated-mode
> > > > wq, we can definitely just register directly and skip the mdev
> > > > step. For a shared-mode wq, we can have multiple mdevs running
> > > > on top of a single wq, so we need some way to create more mdevs.
> > > > We can either go with the established creation path of mdev, or
> > > > invent something custom for the driver, as Jason suggested, to
> > > > accommodate additional virtual devices for guests. We
> > > > implemented the mdev path originally with the consideration that
> > > > mdev is established and already has a known interface.
> > >
> > > It sounds like you could just as easily have a 'create new vfio'
> > > file under the idxd sysfs.. Especially since you already have a
> > > bus and dynamic vfio-specific things being created on this bus.
> >
> > Will explore this, and using 'struct vfio_device' without mdev.
> >
> Hi Jason. I hacked the idxd driver to remove the mdev association and
> use vfio_device directly, and ran into some issues. Specifically,
> mdev gets some special handling when it comes to the iommu domain.

Yes, I know of this; it needs fixing.

> It sets the group's mdev_group flag, which later has the effect of
> special handling for iommu_attach_group(). In addition, it ends up
> switching the bus to pci_bus_type before iommu_domain_alloc() is
> called. Do we need to provide similar handling for vfio_devices that
> are not backed by an entire PCI device the way vfio_pci is? Not sure
> it's the right thing to do to attach these devices to pci_bus_type
> directly.

Yes, type1 needs to be changed so it somehow knows that the struct
device is 'sw only', for instance because it has no direct HW IOMMU
connection; then all the mdev hackery should be deleted from type1. I
haven't investigated closely exactly how to do this.

Jason
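[One hypothetical shape for the fix Jason outlines; neither the flag nor the helpers below exist, this only sketches type1 keying off a per-device property instead of mdev_bus_type:

        struct vfio_device {
                /* ... existing fields ... */
                bool sw_iommu_only;     /* hypothetical: no direct HW IOMMU
                                         * connection behind this device */
        };

        /* In vfio_iommu_type1_attach_group(), instead of the
         * vfio_bus_is_mdev() test (both helpers below hypothetical): */
        if (vfio_group_is_sw_iommu_only(group)) {
                /* take the "external domain" path: pin pages on demand,
                 * never allocate a HW IOMMU domain for this group */
                return vfio_iommu_setup_external_domain(iommu, domain);
        }

        domain->domain = iommu_domain_alloc(bus);]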