Message ID | 20230217225558.19837-1-shannon.nelson@amd.com (mailing list archive) |
---|---|
Headers | show |
Series | pds_core driver | expand |
On Fri, Feb 17, 2023 at 02:55:44PM -0800, Shannon Nelson wrote: > Summary: > -------- > This patchset implements new driver for use with the AMD/Pensando > Distributed Services Card (DSC), intended to provide core configuration > services through the auxiliary_bus for VFio and vDPA feature specific > drivers. Hi, I didn't look very deeply to this series, but three things caught my attention and IMHO they need to be changed/redesinged before someone can consider to merge it. 1. Use of bus_register_notifier to communicate between auxiliary devices. This whole concept makes aux logic in this driver looks horrid. The idea of auxiliary bus is separate existing device to sub-devices, while every such sub-device is controlled through relevant subsystem. Current implementation challenges this assumption by inventing own logic. 2. devm_* interfaces. It is bad. Please don't use them in such a complex driver. 3. Listen to PCI BOUND_DRIVER event can't be correct either. Thanks
On 2/19/23 2:11 AM, Leon Romanovsky wrote: > On Fri, Feb 17, 2023 at 02:55:44PM -0800, Shannon Nelson wrote: >> Summary: >> -------- >> This patchset implements new driver for use with the AMD/Pensando >> Distributed Services Card (DSC), intended to provide core configuration >> services through the auxiliary_bus for VFio and vDPA feature specific >> drivers. > > Hi, > > I didn't look very deeply to this series, but three things caught my > attention and IMHO they need to be changed/redesinged before someone > can consider to merge it. > > 1. Use of bus_register_notifier to communicate between auxiliary devices. > This whole concept makes aux logic in this driver looks horrid. The idea > of auxiliary bus is separate existing device to sub-devices, while every > such sub-device is controlled through relevant subsystem. Current > implementation challenges this assumption by inventing own logic. > 2. devm_* interfaces. It is bad. Please don't use them in such a complex > driver. > 3. Listen to PCI BOUND_DRIVER event can't be correct either. > > Thanks Hi Leon, Thanks for your comments. I’ll respond to 1 and 3 together. > 1. Use of bus_register_notifier to communicate between auxiliary devices. > 3. Listen to PCI BOUND_DRIVER event can't be correct either We’re only using the notifier for the core driver to know when to create and destroy auxiliary devices, not for communicate between auxiliary devices or drivers – I agree, that would be ugly. We want to create the auxiliary device after a VF PCI device is bound to a driver (BUS_NOTIFY_BOUND_DRIVER), and remove that auxiliary device just before a VF is unbound from its PCI driver (BUS_NOTIFY_UNBIND_DRIVER); bus_notify_register gives us access to these messages. I believe this is not too far off from other examples that can be seen in vfio/pci/vfio_pci_core, net/ethernet/ibm/emac, and net/mctp. Early on we were creating and deleting the auxiliary devices at the same time as calling sriov_enable() and sriov_disable() but found that when the user was doing unbind/bind operations we could end up with in-operative clients and occasional kernel panics because the devices and drivers were out-of-sync. Listening for the bind/unbind events allows the pds_core driver to make sure the auxiliary devices serving each of the VF drivers are kept in sync with the state of the VF PCI drivers. > 2. devm_* interfaces Can you elaborate a bit on why you see using the devm interfaces as bad? I don’t see the code complexity being any different than using the non-devm interfaces, and these give the added protection of making sure to not leak resources on driver removals, whether clean or on error. We are using these allocations for longer term driver structures, not for ephemeral short-term use buffers or fast-path operations, so the overhead should remain minimal. It seems to me this is the advertised use as described in the devres notes under Documentation. Thanks, sln
On Mon, Feb 20, 2023 at 03:53:02PM -0800, Shannon Nelson wrote: > On 2/19/23 2:11 AM, Leon Romanovsky wrote: > > On Fri, Feb 17, 2023 at 02:55:44PM -0800, Shannon Nelson wrote: > > > Summary: > > > -------- > > > This patchset implements new driver for use with the AMD/Pensando > > > Distributed Services Card (DSC), intended to provide core configuration > > > services through the auxiliary_bus for VFio and vDPA feature specific > > > drivers. > > > > Hi, > > > > I didn't look very deeply to this series, but three things caught my > > attention and IMHO they need to be changed/redesinged before someone > > can consider to merge it. > > > > 1. Use of bus_register_notifier to communicate between auxiliary devices. > > This whole concept makes aux logic in this driver looks horrid. The idea > > of auxiliary bus is separate existing device to sub-devices, while every > > such sub-device is controlled through relevant subsystem. Current > > implementation challenges this assumption by inventing own logic. > > 2. devm_* interfaces. It is bad. Please don't use them in such a complex > > driver. > > 3. Listen to PCI BOUND_DRIVER event can't be correct either. > > > > Thanks > > Hi Leon, > > Thanks for your comments. I’ll respond to 1 and 3 together. > > > 1. Use of bus_register_notifier to communicate between auxiliary devices. > > 3. Listen to PCI BOUND_DRIVER event can't be correct either > > We’re only using the notifier for the core driver to know when to create and > destroy auxiliary devices, not for communicate between auxiliary devices or > drivers – I agree, that would be ugly. > > We want to create the auxiliary device after a VF PCI device is bound to a > driver (BUS_NOTIFY_BOUND_DRIVER), and remove that auxiliary device just > before a VF is unbound from its PCI driver (BUS_NOTIFY_UNBIND_DRIVER); > bus_notify_register gives us access to these messages. I believe this is > not too far off from other examples that can be seen in > vfio/pci/vfio_pci_core, net/ethernet/ibm/emac, and net/mctp. > > Early on we were creating and deleting the auxiliary devices at the same > time as calling sriov_enable() and sriov_disable() but found that when the > user was doing unbind/bind operations we could end up with in-operative > clients and occasional kernel panics because the devices and drivers were > out-of-sync. Listening for the bind/unbind events allows the pds_core > driver to make sure the auxiliary devices serving each of the VF drivers are > kept in sync with the state of the VF PCI drivers. Auxiliary devices are intended to statically re-partition existing physical devices. You are not supposed to create such devices on the fly. So once you create VF physical device, it should already have aux device created too. This is traditional driver device model which you should follow and not reinvent your own schema, just to overcome bugs. Additionally, it is unclear how much we should invest in this review given the fact that pds VFIO series was NACKed. > > > > 2. devm_* interfaces > > Can you elaborate a bit on why you see using the devm interfaces as bad? I > don’t see the code complexity being any different than using the non-devm > interfaces, and these give the added protection of making sure to not leak > resources on driver removals, whether clean or on error. We are using these > allocations for longer term driver structures, not for ephemeral short-term > use buffers or fast-path operations, so the overhead should remain minimal. > It seems to me this is the advertised use as described in the devres notes > under Documentation. We had a very interesting talk in last LPC and overall agreement across various subsystem maintainers was to stay away from devm_* interfaces. https://lpc.events/event/16/contributions/1227/ We prefer explicit nature of kzalloc/kfree instead of implicit approach of devm_kzalloc. Thanks > > Thanks, > sln > >
On 2/20/23 10:53 PM, Leon Romanovsky wrote: > On Mon, Feb 20, 2023 at 03:53:02PM -0800, Shannon Nelson wrote: >> On 2/19/23 2:11 AM, Leon Romanovsky wrote: >>> On Fri, Feb 17, 2023 at 02:55:44PM -0800, Shannon Nelson wrote: >>>> Summary: >>>> -------- >>>> This patchset implements new driver for use with the AMD/Pensando >>>> Distributed Services Card (DSC), intended to provide core configuration >>>> services through the auxiliary_bus for VFio and vDPA feature specific >>>> drivers. >>> >>> Hi, >>> >>> I didn't look very deeply to this series, but three things caught my >>> attention and IMHO they need to be changed/redesinged before someone >>> can consider to merge it. >>> >>> 1. Use of bus_register_notifier to communicate between auxiliary devices. >>> This whole concept makes aux logic in this driver looks horrid. The idea >>> of auxiliary bus is separate existing device to sub-devices, while every >>> such sub-device is controlled through relevant subsystem. Current >>> implementation challenges this assumption by inventing own logic. >>> 2. devm_* interfaces. It is bad. Please don't use them in such a complex >>> driver. >>> 3. Listen to PCI BOUND_DRIVER event can't be correct either. >>> >>> Thanks >> >> Hi Leon, >> >> Thanks for your comments. I’ll respond to 1 and 3 together. >> >>> 1. Use of bus_register_notifier to communicate between auxiliary devices. >>> 3. Listen to PCI BOUND_DRIVER event can't be correct either >> >> We’re only using the notifier for the core driver to know when to create and >> destroy auxiliary devices, not for communicate between auxiliary devices or >> drivers – I agree, that would be ugly. >> >> We want to create the auxiliary device after a VF PCI device is bound to a >> driver (BUS_NOTIFY_BOUND_DRIVER), and remove that auxiliary device just >> before a VF is unbound from its PCI driver (BUS_NOTIFY_UNBIND_DRIVER); >> bus_notify_register gives us access to these messages. I believe this is >> not too far off from other examples that can be seen in >> vfio/pci/vfio_pci_core, net/ethernet/ibm/emac, and net/mctp. >> >> Early on we were creating and deleting the auxiliary devices at the same >> time as calling sriov_enable() and sriov_disable() but found that when the >> user was doing unbind/bind operations we could end up with in-operative >> clients and occasional kernel panics because the devices and drivers were >> out-of-sync. Listening for the bind/unbind events allows the pds_core >> driver to make sure the auxiliary devices serving each of the VF drivers are >> kept in sync with the state of the VF PCI drivers. > > Auxiliary devices are intended to statically re-partition existing physical devices. > You are not supposed to create such devices on the fly. So once you create VF physical > device, it should already have aux device created too. > > This is traditional driver device model which you should follow and not > reinvent your own schema, just to overcome bugs. Not so much a “bug”, I think, as an incomplete model which we addressed by using a hotplug model with the aux-devices. We felt we were following a part of the device model, but perhaps from the wrong angle: hot-plugging the aux-devices, which worked nicely to simplify the aux-driver/pci-driver relationship. However, I think the hole we fell into was expecting the client to be tied to a pci device on the other end, and this shouldn’t be a factor in the core’s management of the aux device. So, the aux-device built by the core needs to remain the constant, which is what we originally had: the aux devices created at pci_enable_sriov() time and torn down at pci_disable_sriov(). It’s easy enough to go back to that model, and that makes sense. Meanwhile, for those clients that do rely on the existence of a pci device, would you see that as a proper use of a bus listener to have the aux-driver subscribe and listen for its bind/unbind events? > > Additionally, it is unclear how much we should invest in this review > given the fact that pds VFIO series was NACKed. We haven’t given up on that one yet, and we have a vDPA client in mind as well possibly one or two others. > >> >> >>> 2. devm_* interfaces >> >> Can you elaborate a bit on why you see using the devm interfaces as bad? I >> don’t see the code complexity being any different than using the non-devm >> interfaces, and these give the added protection of making sure to not leak >> resources on driver removals, whether clean or on error. We are using these >> allocations for longer term driver structures, not for ephemeral short-term >> use buffers or fast-path operations, so the overhead should remain minimal. >> It seems to me this is the advertised use as described in the devres notes >> under Documentation. > > We had a very interesting talk in last LPC and overall agreement across > various subsystem maintainers was to stay away from devm_* interfaces. > https://lpc.events/event/16/contributions/1227/ > > We prefer explicit nature of kzalloc/kfree instead of implicit approach > of devm_kzalloc. Interesting – thanks, I missed that presentation. Sure, we can pull back on this. Thanks for your comments. sln
On Tue, Feb 21, 2023 at 02:03:24PM -0800, Shannon Nelson wrote: > On 2/20/23 10:53 PM, Leon Romanovsky wrote: > > On Mon, Feb 20, 2023 at 03:53:02PM -0800, Shannon Nelson wrote: > > > On 2/19/23 2:11 AM, Leon Romanovsky wrote: > > > > On Fri, Feb 17, 2023 at 02:55:44PM -0800, Shannon Nelson wrote: > > > > > Summary: > > > > > -------- > > > > > This patchset implements new driver for use with the AMD/Pensando > > > > > Distributed Services Card (DSC), intended to provide core configuration > > > > > services through the auxiliary_bus for VFio and vDPA feature specific > > > > > drivers. > > > > > > > > Hi, > > > > > > > > I didn't look very deeply to this series, but three things caught my > > > > attention and IMHO they need to be changed/redesinged before someone > > > > can consider to merge it. > > > > > > > > 1. Use of bus_register_notifier to communicate between auxiliary devices. > > > > This whole concept makes aux logic in this driver looks horrid. The idea > > > > of auxiliary bus is separate existing device to sub-devices, while every > > > > such sub-device is controlled through relevant subsystem. Current > > > > implementation challenges this assumption by inventing own logic. > > > > 2. devm_* interfaces. It is bad. Please don't use them in such a complex > > > > driver. > > > > 3. Listen to PCI BOUND_DRIVER event can't be correct either. > > > > > > > > Thanks > > > > > > Hi Leon, > > > > > > Thanks for your comments. I’ll respond to 1 and 3 together. > > > > > > > 1. Use of bus_register_notifier to communicate between auxiliary devices. > > > > 3. Listen to PCI BOUND_DRIVER event can't be correct either > > > > > > We’re only using the notifier for the core driver to know when to create and > > > destroy auxiliary devices, not for communicate between auxiliary devices or > > > drivers – I agree, that would be ugly. > > > > > > We want to create the auxiliary device after a VF PCI device is bound to a > > > driver (BUS_NOTIFY_BOUND_DRIVER), and remove that auxiliary device just > > > before a VF is unbound from its PCI driver (BUS_NOTIFY_UNBIND_DRIVER); > > > bus_notify_register gives us access to these messages. I believe this is > > > not too far off from other examples that can be seen in > > > vfio/pci/vfio_pci_core, net/ethernet/ibm/emac, and net/mctp. > > > > > > Early on we were creating and deleting the auxiliary devices at the same > > > time as calling sriov_enable() and sriov_disable() but found that when the > > > user was doing unbind/bind operations we could end up with in-operative > > > clients and occasional kernel panics because the devices and drivers were > > > out-of-sync. Listening for the bind/unbind events allows the pds_core > > > driver to make sure the auxiliary devices serving each of the VF drivers are > > > kept in sync with the state of the VF PCI drivers. > > > > Auxiliary devices are intended to statically re-partition existing physical devices. > > You are not supposed to create such devices on the fly. So once you create VF physical > > device, it should already have aux device created too. > > > > This is traditional driver device model which you should follow and not > > reinvent your own schema, just to overcome bugs. > > Not so much a “bug”, I think, as an incomplete model which we addressed by > using a hotplug model with the aux-devices. We felt we were following a > part of the device model, but perhaps from the wrong angle: hot-plugging the > aux-devices, which worked nicely to simplify the aux-driver/pci-driver > relationship. By looking (very briefly) on your pds_core and pds_vfio code, your notification scheme worked by chance as it lacks proper PCI and driver locks. You probably didn't stress enough PCI PF teardown. > However, I think the hole we fell into was expecting the > client to be tied to a pci device on the other end, and this shouldn’t be a > factor in the core’s management of the aux device. > > So, the aux-device built by the core needs to remain the constant, which is > what we originally had: the aux devices created at pci_enable_sriov() time > and torn down at pci_disable_sriov(). It’s easy enough to go back to that > model, and that makes sense. Not really, aux device creation is supposed to be in .probe() call of your PCI function, whenever it is PF or VF. After call to pci_enable_sriov(), the HW will add VFs, which will cause to PCI core to attempt and load pds_core .probe() callback. > > Meanwhile, for those clients that do rely on the existence of a pci device, > would you see that as a proper use of a bus listener to have the aux-driver > subscribe and listen for its bind/unbind events? No, these events must go. > > > > > > Additionally, it is unclear how much we should invest in this review > > given the fact that pds VFIO series was NACKed. > > We haven’t given up on that one yet, and we have a vDPA client in mind as > well possibly one or two others. I don't know about your plans and judge by what I saw in ML, VFIO was NACKed for second time already on the same ground. Thanks > > > > > > > > > > > > > > 2. devm_* interfaces > > > > > > Can you elaborate a bit on why you see using the devm interfaces as bad? I > > > don’t see the code complexity being any different than using the non-devm > > > interfaces, and these give the added protection of making sure to not leak > > > resources on driver removals, whether clean or on error. We are using these > > > allocations for longer term driver structures, not for ephemeral short-term > > > use buffers or fast-path operations, so the overhead should remain minimal. > > > It seems to me this is the advertised use as described in the devres notes > > > under Documentation. > > > > We had a very interesting talk in last LPC and overall agreement across > > various subsystem maintainers was to stay away from devm_* interfaces. > > https://lpc.events/event/16/contributions/1227/ > > > > We prefer explicit nature of kzalloc/kfree instead of implicit approach > > of devm_kzalloc. > > Interesting – thanks, I missed that presentation. Sure, we can pull back on > this. > > Thanks for your comments. > > sln >
On 2/17/23 2:55 PM, Shannon Nelson wrote: > Summary: > -------- > This patchset implements new driver for use with the AMD/Pensando > Distributed Services Card (DSC), intended to provide core configuration > services through the auxiliary_bus for VFio and vDPA feature specific > drivers. Thank you all for your responses and commentary on this initial review of our PDS drivers, we appreciate the constructive input. Obviously, the organization of our driver parts, PCI vs auxiliary and PF vs VF, doesn’t match what most folks were expecting to see. The primary issue seems to be that the modules responsible for the VF’s functions (e.g. the VFio or vDPA modules) are set up to be both PCI and auxiliary drivers, which doesn’t match other designs, and presents potential coordination/locking concerns. The expectation seems to be that the one Core module should handle both the VF and PF PCI driver work and from there set up the needed auxiliary devices, and things like vDPA modules should just be auxiliary drivers on one end and supply their respective features on the other end. Meanwhile, because the nature of VFio is to be a PCI driver, pds_vfio needs to have its aux stuff removed and needs to access the pds_core adminq directly by getting a pointer to the PF’s data, as done in other similar drivers. We think this reorganization of parts will resolve the complexity, bind/unbind, and most other points of discussion. I expect to get a first pass at this re-org done over the next few days and have an RFC for viewing sometime in the next week. Cheers, sln