Message ID | 20230307220553.631069-1-jaz@semihalf.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio/pci: Propagate ACPI notifications to the user-space | expand |
On Tue, 7 Mar 2023 22:05:53 +0000 Grzegorz Jaszczyk <jaz@semihalf.com> wrote: > From: Dominik Behr <dbehr@chromium.org> > > Hitherto there was no support for propagating ACPI notifications to the > guest drivers. In order to provide such support, install a handler for > notifications on an ACPI device during vfio-pci device registration. The > handler role is to propagate such ACPI notifications to the user-space > via acpi netlink events, which allows VMM to receive and propagate them > further to the VMs. > > Thanks to the above, the actual driver for the pass-through device, > which belongs to the guest, can receive and react to device specific > notifications. What consumes these events? Has this been proposed to any VM management tools like libvirt? What sort of ACPI events are we expecting to see here and what does userspace do with them? > Signed-off-by: Dominik Behr <dbehr@chromium.org> > Co-developed-by: Grzegorz Jaszczyk <jaz@semihalf.com> > Signed-off-by: Grzegorz Jaszczyk <jaz@semihalf.com> > --- > drivers/vfio/pci/vfio_pci_core.c | 33 ++++++++++++++++++++++++++++++++ > 1 file changed, 33 insertions(+) > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > index a5ab416cf476..92b8ed8d087c 100644 > --- a/drivers/vfio/pci/vfio_pci_core.c > +++ b/drivers/vfio/pci/vfio_pci_core.c > @@ -10,6 +10,7 @@ > > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > +#include <linux/acpi.h> > #include <linux/aperture.h> > #include <linux/device.h> > #include <linux/eventfd.h> > @@ -2120,10 +2121,20 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev) > } > EXPORT_SYMBOL_GPL(vfio_pci_core_release_dev); > > +static void vfio_pci_core_acpi_notify(acpi_handle handle, u32 event, void *data) > +{ > + struct vfio_pci_core_device *vdev = (struct vfio_pci_core_device *)data; > + struct device *dev = &vdev->pdev->dev; > + > + acpi_bus_generate_netlink_event("vfio_pci", dev_name(dev), event, 0); Who listens to this? Should there be an in-band means to provide notifies related to the device? How does a userspace driver know to look for netlink events for a particular device? > +} > + > int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) > { > + acpi_status status; > struct pci_dev *pdev = vdev->pdev; > struct device *dev = &pdev->dev; > + struct acpi_device *adev = ACPI_COMPANION(&pdev->dev); > int ret; > > /* Drivers must set the vfio_pci_core_device to their drvdata */ > @@ -2201,8 +2212,24 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) > ret = vfio_register_group_dev(&vdev->vdev); > if (ret) > goto out_power; > + > + if (!adev) { > + pci_info(pdev, "No ACPI companion"); This would be a log message generated for 99.99% of devices. > + return 0; > + } > + > + status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, > + vfio_pci_core_acpi_notify, (void *)vdev); vfio-pci supports non-ACPI platforms, I don't see any !CONFIG_ACPI prototypes for this function. Thanks, Alex > + > + if (ACPI_FAILURE(status)) { > + pci_err(pdev, "Failed to install notify handler"); > + goto out_group_register; > + } > + > return 0; > > +out_group_register: > + vfio_unregister_group_dev(&vdev->vdev); > out_power: > if (!disable_idle_d3) > pm_runtime_get_noresume(dev); > @@ -2216,6 +2243,12 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_register_device); > > void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev) > { > + struct acpi_device *adev = ACPI_COMPANION(&vdev->pdev->dev); > + > + if (adev) > + acpi_remove_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, > + vfio_pci_core_acpi_notify); > + > vfio_pci_core_sriov_configure(vdev, 0); > > vfio_unregister_group_dev(&vdev->vdev);
> From: Alex Williamson <alex.williamson@redhat.com> > Sent: Wednesday, March 8, 2023 7:42 AM > > On Tue, 7 Mar 2023 22:05:53 +0000 > Grzegorz Jaszczyk <jaz@semihalf.com> wrote: > > > From: Dominik Behr <dbehr@chromium.org> > > > > Hitherto there was no support for propagating ACPI notifications to the > > guest drivers. In order to provide such support, install a handler for > > notifications on an ACPI device during vfio-pci device registration. The > > handler role is to propagate such ACPI notifications to the user-space > > via acpi netlink events, which allows VMM to receive and propagate them > > further to the VMs. > > > > Thanks to the above, the actual driver for the pass-through device, > > which belongs to the guest, can receive and react to device specific > > notifications. > > What consumes these events? Has this been proposed to any VM > management tools like libvirt? What sort of ACPI events are we > expecting to see here and what does userspace do with them? > and the VM sees a virtual platform and virtual ACPI. Usually an ACPI event triggers parsing-executing certain ACPI function which needs to further access platform resource. Presumably someone should copy the related ACPI function into virtual ACPI table and then in concept we should allow an ACPI event routed to userspace only if the related platform resource has been assigned to the user. What would the mechanism to audit it? and if we have a way to do it probably the ACPI event would be translated into a more generic event mechanism in the vfio-platform driver covering related resource.
śr., 8 mar 2023 o 00:42 Alex Williamson <alex.williamson@redhat.com> napisał(a): > > On Tue, 7 Mar 2023 22:05:53 +0000 > Grzegorz Jaszczyk <jaz@semihalf.com> wrote: > > > From: Dominik Behr <dbehr@chromium.org> > > > > Hitherto there was no support for propagating ACPI notifications to the > > guest drivers. In order to provide such support, install a handler for > > notifications on an ACPI device during vfio-pci device registration. The > > handler role is to propagate such ACPI notifications to the user-space > > via acpi netlink events, which allows VMM to receive and propagate them > > further to the VMs. > > > > Thanks to the above, the actual driver for the pass-through device, > > which belongs to the guest, can receive and react to device specific > > notifications. > What consumes these events? Those events are consumed by the VMM, which can have a built-in ACPI event listener. > Has this been proposed to any VM management tools like libvirt? This patch was evaluated and tested with crosvm VMM (but since the kernel part is not in the tree the implementation is marked as WIP). > What sort of ACPI events are we expecting to see here and what does user space do with them? With this patch we are expecting to see and propagate any device specific notifications, which are aimed to notify the proper device (driver) which belongs to the guest. Here is the description how propagating such notification could be implemented by VMM: 1) VMM could upfront generate proper virtual ACPI description for guest per vfio-pci device (more precisely it could be e.g. ACPI GPE handler, which aim is only to notify relevant device): Scope (_GPE) { Method (_E00, 0, NotSerialized) // _Exx: Edge-Triggered GPE, xx=0x00-0xFF { Local0 = \_SB.PC00.PE08.NOTY Notify (\_SB.PC00.PE08, Local0) } } 2) Now, when the VMM receives ACPI netlink event (thanks to VMM builtin ACPI event listener, which is able to receive any event generated through acpi_bus_generate_netlink_event) VMM classifies it based on device_class ("vfio_pci" in this case) and parses it further to get device name and the notification value for it. This notification value is stored in a virtual register and VMM triggers GPE associated with the pci-vfio device. 3) Guest kernel upon handling GPE, thanks to generated AML (ad 1.), triggers Notify on required pass-through device and therefore replicates the ACPI Notification on the guest side (Accessing \_SB.PC00.PE08.NOTY from above example, result with trap to VMM, which returns previously stored notify value). With above the ACPI notifications are actually replicated on the guest side and from a guest driver perspective they don't differ from native ones. > > > Signed-off-by: Dominik Behr <dbehr@chromium.org> > > Co-developed-by: Grzegorz Jaszczyk <jaz@semihalf.com> > > Signed-off-by: Grzegorz Jaszczyk <jaz@semihalf.com> > > --- > > drivers/vfio/pci/vfio_pci_core.c | 33 ++++++++++++++++++++++++++++++++ > > 1 file changed, 33 insertions(+) > > > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > > index a5ab416cf476..92b8ed8d087c 100644 > > --- a/drivers/vfio/pci/vfio_pci_core.c > > +++ b/drivers/vfio/pci/vfio_pci_core.c > > @@ -10,6 +10,7 @@ > > > > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > > > +#include <linux/acpi.h> > > #include <linux/aperture.h> > > #include <linux/device.h> > > #include <linux/eventfd.h> > > @@ -2120,10 +2121,20 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev) > > } > > EXPORT_SYMBOL_GPL(vfio_pci_core_release_dev); > > > > +static void vfio_pci_core_acpi_notify(acpi_handle handle, u32 event, void *data) > > +{ > > + struct vfio_pci_core_device *vdev = (struct vfio_pci_core_device *)data; > > + struct device *dev = &vdev->pdev->dev; > > + > > + acpi_bus_generate_netlink_event("vfio_pci", dev_name(dev), event, 0); > > Who listens to this? Should there be an in-band means to provide > notifies related to the device? How does a userspace driver know to > look for netlink events for a particular device? VMM which has implemented logic responsible for listening on acpi netlink events. This netlink message already passes the device name so VMM will associate it with a particular device. I've elaborated a bit more in my previous answer. > > > +} > > + > > int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) > > { > > + acpi_status status; > > struct pci_dev *pdev = vdev->pdev; > > struct device *dev = &pdev->dev; > > + struct acpi_device *adev = ACPI_COMPANION(&pdev->dev); > > int ret; > > > > /* Drivers must set the vfio_pci_core_device to their drvdata */ > > @@ -2201,8 +2212,24 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) > > ret = vfio_register_group_dev(&vdev->vdev); > > if (ret) > > goto out_power; > > + > > + if (!adev) { > > + pci_info(pdev, "No ACPI companion"); > > This would be a log message generated for 99.99% of devices. Sure - I will remove that. > > > + return 0; > > + } > > + > > + status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, > > + vfio_pci_core_acpi_notify, (void *)vdev); > > vfio-pci supports non-ACPI platforms, I don't see any !CONFIG_ACPI > prototypes for this function. Thanks, Good point, I will address this in the next version. Thank you, Grzegorz > > Alex > > > + > > + if (ACPI_FAILURE(status)) { > > + pci_err(pdev, "Failed to install notify handler"); > > + goto out_group_register; > > + } > > + > > return 0; > > > > +out_group_register: > > + vfio_unregister_group_dev(&vdev->vdev); > > out_power: > > if (!disable_idle_d3) > > pm_runtime_get_noresume(dev); > > @@ -2216,6 +2243,12 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_register_device); > > > > void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev) > > { > > + struct acpi_device *adev = ACPI_COMPANION(&vdev->pdev->dev); > > + > > + if (adev) > > + acpi_remove_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, > > + vfio_pci_core_acpi_notify); > > + > > vfio_pci_core_sriov_configure(vdev, 0); > > > > vfio_unregister_group_dev(&vdev->vdev); >
Hi Grzegorz, Thank you for the patch! Yet something to improve: [auto build test ERROR on awilliam-vfio/for-linus] [also build test ERROR on linus/master v6.3-rc1 next-20230308] [cannot apply to awilliam-vfio/next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Grzegorz-Jaszczyk/vfio-pci-Propagate-ACPI-notifications-to-the-user-space/20230308-060849 base: https://github.com/awilliam/linux-vfio.git for-linus patch link: https://lore.kernel.org/r/20230307220553.631069-1-jaz%40semihalf.com patch subject: [PATCH] vfio/pci: Propagate ACPI notifications to the user-space config: openrisc-randconfig-r004-20230305 (https://download.01.org/0day-ci/archive/20230308/202303082215.ido9E51S-lkp@intel.com/config) compiler: or1k-linux-gcc (GCC) 12.1.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/1f21a10b45b22629acc63ecf2d4c2e4573c44283 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Grzegorz-Jaszczyk/vfio-pci-Propagate-ACPI-notifications-to-the-user-space/20230308-060849 git checkout 1f21a10b45b22629acc63ecf2d4c2e4573c44283 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=openrisc olddefconfig COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=openrisc SHELL=/bin/bash drivers/vfio/pci/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> | Link: https://lore.kernel.org/oe-kbuild-all/202303082215.ido9E51S-lkp@intel.com/ All errors (new ones prefixed by >>): drivers/vfio/pci/vfio_pci_core.c: In function 'vfio_pci_core_acpi_notify': >> drivers/vfio/pci/vfio_pci_core.c:2128:9: error: implicit declaration of function 'acpi_bus_generate_netlink_event' [-Werror=implicit-function-declaration] 2128 | acpi_bus_generate_netlink_event("vfio_pci", dev_name(dev), event, 0); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/vfio/pci/vfio_pci_core.c: In function 'vfio_pci_core_register_device': >> drivers/vfio/pci/vfio_pci_core.c:2220:50: error: invalid use of undefined type 'struct acpi_device' 2220 | status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, | ^~ drivers/vfio/pci/vfio_pci_core.c: In function 'vfio_pci_core_unregister_device': drivers/vfio/pci/vfio_pci_core.c:2248:48: error: invalid use of undefined type 'struct acpi_device' 2248 | acpi_remove_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, | ^~ cc1: some warnings being treated as errors vim +/acpi_bus_generate_netlink_event +2128 drivers/vfio/pci/vfio_pci_core.c 2122 2123 static void vfio_pci_core_acpi_notify(acpi_handle handle, u32 event, void *data) 2124 { 2125 struct vfio_pci_core_device *vdev = (struct vfio_pci_core_device *)data; 2126 struct device *dev = &vdev->pdev->dev; 2127 > 2128 acpi_bus_generate_netlink_event("vfio_pci", dev_name(dev), event, 0); 2129 } 2130 2131 int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) 2132 { 2133 acpi_status status; 2134 struct pci_dev *pdev = vdev->pdev; 2135 struct device *dev = &pdev->dev; 2136 struct acpi_device *adev = ACPI_COMPANION(&pdev->dev); 2137 int ret; 2138 2139 /* Drivers must set the vfio_pci_core_device to their drvdata */ 2140 if (WARN_ON(vdev != dev_get_drvdata(dev))) 2141 return -EINVAL; 2142 2143 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) 2144 return -EINVAL; 2145 2146 if (vdev->vdev.mig_ops) { 2147 if (!(vdev->vdev.mig_ops->migration_get_state && 2148 vdev->vdev.mig_ops->migration_set_state && 2149 vdev->vdev.mig_ops->migration_get_data_size) || 2150 !(vdev->vdev.migration_flags & VFIO_MIGRATION_STOP_COPY)) 2151 return -EINVAL; 2152 } 2153 2154 if (vdev->vdev.log_ops && !(vdev->vdev.log_ops->log_start && 2155 vdev->vdev.log_ops->log_stop && 2156 vdev->vdev.log_ops->log_read_and_clear)) 2157 return -EINVAL; 2158 2159 /* 2160 * Prevent binding to PFs with VFs enabled, the VFs might be in use 2161 * by the host or other users. We cannot capture the VFs if they 2162 * already exist, nor can we track VF users. Disabling SR-IOV here 2163 * would initiate removing the VFs, which would unbind the driver, 2164 * which is prone to blocking if that VF is also in use by vfio-pci. 2165 * Just reject these PFs and let the user sort it out. 2166 */ 2167 if (pci_num_vf(pdev)) { 2168 pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n"); 2169 return -EBUSY; 2170 } 2171 2172 if (pci_is_root_bus(pdev->bus)) { 2173 ret = vfio_assign_device_set(&vdev->vdev, vdev); 2174 } else if (!pci_probe_reset_slot(pdev->slot)) { 2175 ret = vfio_assign_device_set(&vdev->vdev, pdev->slot); 2176 } else { 2177 /* 2178 * If there is no slot reset support for this device, the whole 2179 * bus needs to be grouped together to support bus-wide resets. 2180 */ 2181 ret = vfio_assign_device_set(&vdev->vdev, pdev->bus); 2182 } 2183 2184 if (ret) 2185 return ret; 2186 ret = vfio_pci_vf_init(vdev); 2187 if (ret) 2188 return ret; 2189 ret = vfio_pci_vga_init(vdev); 2190 if (ret) 2191 goto out_vf; 2192 2193 vfio_pci_probe_power_state(vdev); 2194 2195 /* 2196 * pci-core sets the device power state to an unknown value at 2197 * bootup and after being removed from a driver. The only 2198 * transition it allows from this unknown state is to D0, which 2199 * typically happens when a driver calls pci_enable_device(). 2200 * We're not ready to enable the device yet, but we do want to 2201 * be able to get to D3. Therefore first do a D0 transition 2202 * before enabling runtime PM. 2203 */ 2204 vfio_pci_set_power_state(vdev, PCI_D0); 2205 2206 dev->driver->pm = &vfio_pci_core_pm_ops; 2207 pm_runtime_allow(dev); 2208 if (!disable_idle_d3) 2209 pm_runtime_put(dev); 2210 2211 ret = vfio_register_group_dev(&vdev->vdev); 2212 if (ret) 2213 goto out_power; 2214 2215 if (!adev) { 2216 pci_info(pdev, "No ACPI companion"); 2217 return 0; 2218 } 2219 > 2220 status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, 2221 vfio_pci_core_acpi_notify, (void *)vdev); 2222 2223 if (ACPI_FAILURE(status)) { 2224 pci_err(pdev, "Failed to install notify handler"); 2225 goto out_group_register; 2226 } 2227 2228 return 0; 2229 2230 out_group_register: 2231 vfio_unregister_group_dev(&vdev->vdev); 2232 out_power: 2233 if (!disable_idle_d3) 2234 pm_runtime_get_noresume(dev); 2235 2236 pm_runtime_forbid(dev); 2237 out_vf: 2238 vfio_pci_vf_uninit(vdev); 2239 return ret; 2240 } 2241 EXPORT_SYMBOL_GPL(vfio_pci_core_register_device); 2242
[Cc +libvir-list] On Wed, 8 Mar 2023 12:41:24 +0100 Grzegorz Jaszczyk <jaz@semihalf.com> wrote: > śr., 8 mar 2023 o 00:42 Alex Williamson <alex.williamson@redhat.com> napisał(a): > > > > On Tue, 7 Mar 2023 22:05:53 +0000 > > Grzegorz Jaszczyk <jaz@semihalf.com> wrote: > > > > > From: Dominik Behr <dbehr@chromium.org> > > > > > > Hitherto there was no support for propagating ACPI notifications to the > > > guest drivers. In order to provide such support, install a handler for > > > notifications on an ACPI device during vfio-pci device registration. The > > > handler role is to propagate such ACPI notifications to the user-space > > > via acpi netlink events, which allows VMM to receive and propagate them > > > further to the VMs. > > > > > > Thanks to the above, the actual driver for the pass-through device, > > > which belongs to the guest, can receive and react to device specific > > > notifications. > > > What consumes these events? > > Those events are consumed by the VMM, which can have a built-in ACPI > event listener. > > > Has this been proposed to any VM management tools like libvirt? > > This patch was evaluated and tested with crosvm VMM (but since the > kernel part is not in the tree the implementation is marked as WIP). Adding libvirt folks. This intentionally designs the interface in a way that requires a privileged intermediary to monitor netlink on the host, associate messages to VMs based on an attached device, and re-inject the event to the VMM. Why wouldn't we use a channel associated with the device for such events, such that the VMM has direct access? The netlink path seems like it has more moving pieces, possibly scalability issues, and maybe security issues? > > What sort of ACPI events are we expecting to see here and what does user space do with them? > > With this patch we are expecting to see and propagate any device > specific notifications, which are aimed to notify the proper device > (driver) which belongs to the guest. > > Here is the description how propagating such notification could be > implemented by VMM: > > 1) VMM could upfront generate proper virtual ACPI description for > guest per vfio-pci device (more precisely it could be e.g. ACPI GPE > handler, which aim is only to notify relevant device): The proposed interface really has no introspection, how does the VMM know which devices need ACPI tables added "upfront"? How do these events factor into hotplug device support, where we may not be able to dynamically inject ACPI code into the VM? > > Scope (_GPE) > { > Method (_E00, 0, NotSerialized) // _Exx: Edge-Triggered > GPE, xx=0x00-0xFF > { > Local0 = \_SB.PC00.PE08.NOTY > Notify (\_SB.PC00.PE08, Local0) > } > } > > 2) Now, when the VMM receives ACPI netlink event (thanks to VMM > builtin ACPI event listener, which is able to receive any event > generated through acpi_bus_generate_netlink_event) VMM classifies it > based on device_class ("vfio_pci" in this case) and parses it further > to get device name and the notification value for it. This > notification value is stored in a virtual register and VMM triggers > GPE associated with the pci-vfio device. Each VMM is listening for netlink events and sees all the netlink traffic from the host, including events destined for other VMMs? This doesn't seem terribly acceptable from a security perspective. > 3) Guest kernel upon handling GPE, thanks to generated AML (ad 1.), > triggers Notify on required pass-through device and therefore > replicates the ACPI Notification on the guest side (Accessing > \_SB.PC00.PE08.NOTY from above example, result with trap to VMM, which > returns previously stored notify value). The acpi_bus_generate_netlink_event() below really only seems to form a u8 event type from the u32 event. Is this something that could be provided directly from the vfio device uAPI with an ioeventfd, thus providing introspection that a device supports ACPI event notifications and the ability for the VMM to exclusively monitor those events, and only those events for the device, without additional privileges? Thanks, Alex > With above the ACPI notifications are actually replicated on the guest > side and from a guest driver perspective they don't differ from native > ones. > > > > > > Signed-off-by: Dominik Behr <dbehr@chromium.org> > > > Co-developed-by: Grzegorz Jaszczyk <jaz@semihalf.com> > > > Signed-off-by: Grzegorz Jaszczyk <jaz@semihalf.com> > > > --- > > > drivers/vfio/pci/vfio_pci_core.c | 33 ++++++++++++++++++++++++++++++++ > > > 1 file changed, 33 insertions(+) > > > > > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > > > index a5ab416cf476..92b8ed8d087c 100644 > > > --- a/drivers/vfio/pci/vfio_pci_core.c > > > +++ b/drivers/vfio/pci/vfio_pci_core.c > > > @@ -10,6 +10,7 @@ > > > > > > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > > > > > +#include <linux/acpi.h> > > > #include <linux/aperture.h> > > > #include <linux/device.h> > > > #include <linux/eventfd.h> > > > @@ -2120,10 +2121,20 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev) > > > } > > > EXPORT_SYMBOL_GPL(vfio_pci_core_release_dev); > > > > > > +static void vfio_pci_core_acpi_notify(acpi_handle handle, u32 event, void *data) > > > +{ > > > + struct vfio_pci_core_device *vdev = (struct vfio_pci_core_device *)data; > > > + struct device *dev = &vdev->pdev->dev; > > > + > > > + acpi_bus_generate_netlink_event("vfio_pci", dev_name(dev), event, 0); > > > > Who listens to this? Should there be an in-band means to provide > > notifies related to the device? How does a userspace driver know to > > look for netlink events for a particular device? > > VMM which has implemented logic responsible for listening on acpi > netlink events. This netlink message already passes the device name so > VMM will associate it with a particular device. I've elaborated a bit > more in my previous answer. > > > > > > +} > > > + > > > int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) > > > { > > > + acpi_status status; > > > struct pci_dev *pdev = vdev->pdev; > > > struct device *dev = &pdev->dev; > > > + struct acpi_device *adev = ACPI_COMPANION(&pdev->dev); > > > int ret; > > > > > > /* Drivers must set the vfio_pci_core_device to their drvdata */ > > > @@ -2201,8 +2212,24 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) > > > ret = vfio_register_group_dev(&vdev->vdev); > > > if (ret) > > > goto out_power; > > > + > > > + if (!adev) { > > > + pci_info(pdev, "No ACPI companion"); > > > > This would be a log message generated for 99.99% of devices. > > Sure - I will remove that. > > > > > > + return 0; > > > + } > > > + > > > + status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, > > > + vfio_pci_core_acpi_notify, (void *)vdev); > > > > vfio-pci supports non-ACPI platforms, I don't see any !CONFIG_ACPI > > prototypes for this function. Thanks, > > Good point, I will address this in the next version. > > Thank you, > Grzegorz > > > > > Alex > > > > > + > > > + if (ACPI_FAILURE(status)) { > > > + pci_err(pdev, "Failed to install notify handler"); > > > + goto out_group_register; > > > + } > > > + > > > return 0; > > > > > > +out_group_register: > > > + vfio_unregister_group_dev(&vdev->vdev); > > > out_power: > > > if (!disable_idle_d3) > > > pm_runtime_get_noresume(dev); > > > @@ -2216,6 +2243,12 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_register_device); > > > > > > void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev) > > > { > > > + struct acpi_device *adev = ACPI_COMPANION(&vdev->pdev->dev); > > > + > > > + if (adev) > > > + acpi_remove_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, > > > + vfio_pci_core_acpi_notify); > > > + > > > vfio_pci_core_sriov_configure(vdev, 0); > > > > > > vfio_unregister_group_dev(&vdev->vdev); > > >
On Wed, Mar 8, 2023 at 9:49 AM Alex Williamson <alex.williamson@redhat.com> wrote: > Adding libvirt folks. This intentionally designs the interface in a > way that requires a privileged intermediary to monitor netlink on the > host, associate messages to VMs based on an attached device, and > re-inject the event to the VMM. Why wouldn't we use a channel > associated with the device for such events, such that the VMM has > direct access? The netlink path seems like it has more moving pieces, > possibly scalability issues, and maybe security issues? It is the same interface as other ACPI events like AC adapter LID etc are forwarded to user-space. ACPI events are not particularly high frequency like interrupts. > > > What sort of ACPI events are we expecting to see here and what does user space do with them? The use we are looking at right now are D-notifier events about the GPU power available to mobile discrete GPUs. The firmware notifies the GPU driver and resource daemon to dynamically adjust the amount of power that can be used by the GPU. > The proposed interface really has no introspection, how does the VMM > know which devices need ACPI tables added "upfront"? How do these > events factor into hotplug device support, where we may not be able to > dynamically inject ACPI code into the VM? The VMM can examine PCI IDs and the associated firmware node of the PCI device to figure out what events to expect and what ACPI table to generate to support it but that should not be necessary. A generic GPE based ACPI event forwarder as Grzegorz proposed can be injected at VM init time and handle any notification that comes later, even from hotplug devices. > The acpi_bus_generate_netlink_event() below really only seems to form a > u8 event type from the u32 event. Is this something that could be > provided directly from the vfio device uAPI with an ioeventfd, thus > providing introspection that a device supports ACPI event notifications > and the ability for the VMM to exclusively monitor those events, and > only those events for the device, without additional privileges? From what I can see these events are 8 bit as they come from ACPI. They also do not carry any payload and it is up to the receiving driver to query any additional context/state from the device. This will work the same in the VM where driver can query the same information from the passed through PCI device. There are multiple other netflink based ACPI events forwarders which do exactly the same thing for other devices like AC adapter, lid/power button, ACPI thermal notifications, etc. They all use the same mechanism and can be received by user-space programs whether VMMs or others.
On Wed, 8 Mar 2023 10:45:51 -0800 Dominik Behr <dbehr@chromium.org> wrote: > On Wed, Mar 8, 2023 at 9:49 AM Alex Williamson > <alex.williamson@redhat.com> wrote: > > > Adding libvirt folks. This intentionally designs the interface in a > > way that requires a privileged intermediary to monitor netlink on the > > host, associate messages to VMs based on an attached device, and > > re-inject the event to the VMM. Why wouldn't we use a channel > > associated with the device for such events, such that the VMM has > > direct access? The netlink path seems like it has more moving pieces, > > possibly scalability issues, and maybe security issues? > > It is the same interface as other ACPI events like AC adapter LID etc > are forwarded to user-space. > ACPI events are not particularly high frequency like interrupts. I'm not sure that's relevant, these interfaces don't proclaim to provide isolation among host processes which manage behavior relative to accessories. These are effectively system level services. It's only a very, very specialized use case that places a VMM as peers among these processes. Generally we don't want to grant a VMM any privileges beyond what it absolutely needs, so letting a VMM managing an assigned NIC really ought not to be able to snoop host events related to anything other than the NIC. > > > > What sort of ACPI events are we expecting to see here and what does user space do with them? > The use we are looking at right now are D-notifier events about the > GPU power available to mobile discrete GPUs. > The firmware notifies the GPU driver and resource daemon to > dynamically adjust the amount of power that can be used by the GPU. > > > The proposed interface really has no introspection, how does the VMM > > know which devices need ACPI tables added "upfront"? How do these > > events factor into hotplug device support, where we may not be able to > > dynamically inject ACPI code into the VM? > > The VMM can examine PCI IDs and the associated firmware node of the > PCI device to figure out what events to expect and what ACPI table to > generate to support it but that should not be necessary. I'm not entirely sure where your VMM is drawing the line between the VM and management tools, but I think this is another case where the hypervisor itself should not have privileges to examine the host firmware tables to build its own. Something like libvirt would be responsible for that. > A generic GPE based ACPI event forwarder as Grzegorz proposed can be > injected at VM init time and handle any notification that comes later, > even from hotplug devices. It appears that forwarder is sending the notify to a specific ACPI device node, so it's unclear to me how that becomes boilerplate AML added to all VMs. We'll need to notify different devices based on different events, right? > > The acpi_bus_generate_netlink_event() below really only seems to form a > > u8 event type from the u32 event. Is this something that could be > > provided directly from the vfio device uAPI with an ioeventfd, thus > > providing introspection that a device supports ACPI event notifications > > and the ability for the VMM to exclusively monitor those events, and > > only those events for the device, without additional privileges? > > From what I can see these events are 8 bit as they come from ACPI. > They also do not carry any payload and it is up to the receiving > driver to query any additional context/state from the device. > This will work the same in the VM where driver can query the same > information from the passed through PCI device. > There are multiple other netflink based ACPI events forwarders which > do exactly the same thing for other devices like AC adapter, lid/power > button, ACPI thermal notifications, etc. > They all use the same mechanism and can be received by user-space > programs whether VMMs or others. But again, those other receivers are potentially system services, not an isolated VM instance operating in a limited privilege environment. IMO, it's very different if the host display server has access to lid or power events than it is to allow some arbitrary VM that happens to have an unrelated assigned device that same privilege. On my laptop, I see multiple _GPE scopes, each apparently very unique to the devices: Scope (_GPE) { Method (_L0C, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF { Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change } Method (_L0D, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF { Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change } Method (_L0F, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF { Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change } } Scope (_GPE) { Method (_L19, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF { Notify (\_SB.PCI0.GP17, 0x02) // Device Wake Notify (\_SB.PCI0.GP17.XHC0, 0x02) // Device Wake Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake Notify (\_SB.PWRB, 0x02) // Device Wake } Method (_L08, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF { Notify (\_SB.PCI0.GP18, 0x02) // Device Wake Notify (\_SB.PCI0.GPP0, 0x02) // Device Wake Notify (\_SB.PCI0.GPP1, 0x02) // Device Wake Notify (\_SB.PCI0.GPP5, 0x02) // Device Wake } } At least one more even significantly more extensive, calling methods that interact with OpRegions. So how does a simple stub of a GPE block replicate this sort of behavior in the host AML? Thanks, Alex
On Wed, Mar 8, 2023 at 12:06 PM Alex Williamson <alex.williamson@redhat.com> wrote: > > On Wed, 8 Mar 2023 10:45:51 -0800 > Dominik Behr <dbehr@chromium.org> wrote: > > > It is the same interface as other ACPI events like AC adapter LID etc > > are forwarded to user-space. > > ACPI events are not particularly high frequency like interrupts. > > I'm not sure that's relevant, these interfaces don't proclaim to > provide isolation among host processes which manage behavior relative > to accessories. These are effectively system level services. It's only > a very, very specialized use case that places a VMM as peers among these > processes. Generally we don't want to grant a VMM any privileges beyond > what it absolutely needs, so letting a VMM managing an assigned NIC > really ought not to be able to snoop host events related to anything > other than the NIC. How is that related to the fact that we are forwarding VFIO-PCI events to netlink? Kernel does not grant any privileges to VMM. There are already other ACPI events on netlink. The implementer of the VMM can choose to allow VMM to snoop them or not. In our case our VMM (crosvm) does already snoop LID, battery and AC adapter events so the guest can adjust its behavior accordingly. This change just adds another class of ACPI events that are forwarded to netlink. > > > > > > What sort of ACPI events are we expecting to see here and what does user space do with them? > > The use we are looking at right now are D-notifier events about the > > GPU power available to mobile discrete GPUs. > > The firmware notifies the GPU driver and resource daemon to > > dynamically adjust the amount of power that can be used by the GPU. > > > > > The proposed interface really has no introspection, how does the VMM > > > know which devices need ACPI tables added "upfront"? How do these > > > events factor into hotplug device support, where we may not be able to > > > dynamically inject ACPI code into the VM? > > > > The VMM can examine PCI IDs and the associated firmware node of the > > PCI device to figure out what events to expect and what ACPI table to > > generate to support it but that should not be necessary. > > I'm not entirely sure where your VMM is drawing the line between the VM > and management tools, but I think this is another case where the > hypervisor itself should not have privileges to examine the host > firmware tables to build its own. Something like libvirt would be > responsible for that. Yes, but that depends on the design of hypervisor and VMM and is not related to this patch. > > > A generic GPE based ACPI event forwarder as Grzegorz proposed can be > > injected at VM init time and handle any notification that comes later, > > even from hotplug devices. > > It appears that forwarder is sending the notify to a specific ACPI > device node, so it's unclear to me how that becomes boilerplate AML > added to all VMs. We'll need to notify different devices based on > different events, right? Valid point. The notifications have a "scope" ACPI path. In my experience these events are consumed without looking where they came from but I believe the patch can be extended to provide ACPI path, in your example "_SB.PCI0.GPP0.PEGP" instead of generic vfio_pci which VMM could use to translate an equivalent ACPI path in the guest and pass it to a generic ACPI GPE based notifier via shared memory. Grzegorz could you chime in whether that would be possible? > > > > The acpi_bus_generate_netlink_event() below really only seems to form a > > > u8 event type from the u32 event. Is this something that could be > > > provided directly from the vfio device uAPI with an ioeventfd, thus > > > providing introspection that a device supports ACPI event notifications > > > and the ability for the VMM to exclusively monitor those events, and > > > only those events for the device, without additional privileges? > > > > From what I can see these events are 8 bit as they come from ACPI. > > They also do not carry any payload and it is up to the receiving > > driver to query any additional context/state from the device. > > This will work the same in the VM where driver can query the same > > information from the passed through PCI device. > > There are multiple other netflink based ACPI events forwarders which > > do exactly the same thing for other devices like AC adapter, lid/power > > button, ACPI thermal notifications, etc. > > They all use the same mechanism and can be received by user-space > > programs whether VMMs or others. > > But again, those other receivers are potentially system services, not > an isolated VM instance operating in a limited privilege environment. > IMO, it's very different if the host display server has access to lid > or power events than it is to allow some arbitrary VM that happens to > have an unrelated assigned device that same privilege. Therefore these VFIO related ACPI events could be received by a system service via this netlink event and selectively forwarded to VMM if such is a desire of whoever implements the userspace. This is outside the scope of this patch. In our case our VMM does receive these LID, AC or battery events. -- Dominik
On Wed, 8 Mar 2023 14:44:28 -0800 Dominik Behr <dbehr@google.com> wrote: > On Wed, Mar 8, 2023 at 12:06 PM Alex Williamson > <alex.williamson@redhat.com> wrote: > > > > On Wed, 8 Mar 2023 10:45:51 -0800 > > Dominik Behr <dbehr@chromium.org> wrote: > > > > > It is the same interface as other ACPI events like AC adapter LID etc > > > are forwarded to user-space. > > > ACPI events are not particularly high frequency like interrupts. > > > > I'm not sure that's relevant, these interfaces don't proclaim to > > provide isolation among host processes which manage behavior relative > > to accessories. These are effectively system level services. It's only > > a very, very specialized use case that places a VMM as peers among these > > processes. Generally we don't want to grant a VMM any privileges beyond > > what it absolutely needs, so letting a VMM managing an assigned NIC > > really ought not to be able to snoop host events related to anything > > other than the NIC. > How is that related to the fact that we are forwarding VFIO-PCI events > to netlink? Kernel does not grant any privileges to VMM. > There are already other ACPI events on netlink. The implementer of the > VMM can choose to allow VMM to snoop them or not. > In our case our VMM (crosvm) does already snoop LID, battery and AC > adapter events so the guest can adjust its behavior accordingly. > This change just adds another class of ACPI events that are forwarded > to netlink. That's true, it is the VMM choice whether to allow snooping netlink, but this is being proposed as THE solution to allow VMMs to receive ACPI events related to vfio assigned devices. If the solution inherently requires escalating the VMM privileges to see all netlink events, that's a weakness in the proposal. As noted previously, there's also no introspection here, the VMM can't know whether it should listen to netlink for ACPI events or include AML related to a GPE for the device. It cannot determine if either the kernel supports this feature or if the device has an ACPI companion that can generate these events. > > > > > > > > What sort of ACPI events are we expecting to see here and what does user space do with them? > > > The use we are looking at right now are D-notifier events about the > > > GPU power available to mobile discrete GPUs. > > > The firmware notifies the GPU driver and resource daemon to > > > dynamically adjust the amount of power that can be used by the GPU. > > > > > > > The proposed interface really has no introspection, how does the VMM > > > > know which devices need ACPI tables added "upfront"? How do these > > > > events factor into hotplug device support, where we may not be able to > > > > dynamically inject ACPI code into the VM? > > > > > > The VMM can examine PCI IDs and the associated firmware node of the > > > PCI device to figure out what events to expect and what ACPI table to > > > generate to support it but that should not be necessary. > > > > I'm not entirely sure where your VMM is drawing the line between the VM > > and management tools, but I think this is another case where the > > hypervisor itself should not have privileges to examine the host > > firmware tables to build its own. Something like libvirt would be > > responsible for that. > Yes, but that depends on the design of hypervisor and VMM and is not > related to this patch. It is very much related to this patch if it proposes an interface to solve a problem which is likely not compatible with the security model of other VMMs. We need a single solution to support all VMMs. > > > > > A generic GPE based ACPI event forwarder as Grzegorz proposed can be > > > injected at VM init time and handle any notification that comes later, > > > even from hotplug devices. > > > > It appears that forwarder is sending the notify to a specific ACPI > > device node, so it's unclear to me how that becomes boilerplate AML > > added to all VMs. We'll need to notify different devices based on > > different events, right? > Valid point. The notifications have a "scope" ACPI path. > In my experience these events are consumed without looking where they > came from but I believe the patch can be extended to > provide ACPI path, in your example "_SB.PCI0.GPP0.PEGP" instead of > generic vfio_pci which VMM could use to translate an equivalent ACPI > path in the guest and pass it to a generic ACPI GPE based notifier via > shared memory. Grzegorz could you chime in whether that would be > possible? So effectively we're imposing the host ACPI namespace on the VM, or at least a mapping between the host and VM namespace? The generality of this is not improving. > > > > The acpi_bus_generate_netlink_event() below really only seems to form a > > > > u8 event type from the u32 event. Is this something that could be > > > > provided directly from the vfio device uAPI with an ioeventfd, thus > > > > providing introspection that a device supports ACPI event notifications > > > > and the ability for the VMM to exclusively monitor those events, and > > > > only those events for the device, without additional privileges? > > > > > > From what I can see these events are 8 bit as they come from ACPI. > > > They also do not carry any payload and it is up to the receiving > > > driver to query any additional context/state from the device. > > > This will work the same in the VM where driver can query the same > > > information from the passed through PCI device. > > > There are multiple other netflink based ACPI events forwarders which > > > do exactly the same thing for other devices like AC adapter, lid/power > > > button, ACPI thermal notifications, etc. > > > They all use the same mechanism and can be received by user-space > > > programs whether VMMs or others. > > > > But again, those other receivers are potentially system services, not > > an isolated VM instance operating in a limited privilege environment. > > IMO, it's very different if the host display server has access to lid > > or power events than it is to allow some arbitrary VM that happens to > > have an unrelated assigned device that same privilege. > Therefore these VFIO related ACPI events could be received by a system > service via this netlink event and selectively forwarded to VMM if > such is a desire of whoever implements the userspace. > This is outside the scope of this patch. In our case our VMM does > receive these LID, AC or battery events. But this is backwards, we're presupposing the choice to use netlink based on the convenience of one VMM, which potentially creates obstacles, maybe even security isolation issues for other VMMs. The method of delivering ACPI events to a VMM is very much within the scope of this proposal. Thanks, Alex
All other ACPI events that are available to userspace are on netlink already. As for translation of ACPI paths. It is sort of a requirement for VMM to translate the PCI path from host to guest because the PCI device tree in the guest is totally different already. The same follows for ACPI paths. What would you propose instead of netlink? Sysfs entry for VFIO PCI device that accepts eventfd and signals the events via eventfd? Or moving it into ACPI layer entirely and adding eventfd sysfs interface for all ACPI devices? -- Dominik On Wed, Mar 8, 2023 at 3:38 PM Alex Williamson <alex.williamson@redhat.com> wrote: > > On Wed, 8 Mar 2023 14:44:28 -0800 > Dominik Behr <dbehr@google.com> wrote: > > > On Wed, Mar 8, 2023 at 12:06 PM Alex Williamson > > <alex.williamson@redhat.com> wrote: > > > > > > On Wed, 8 Mar 2023 10:45:51 -0800 > > > Dominik Behr <dbehr@chromium.org> wrote: > > > > > > > It is the same interface as other ACPI events like AC adapter LID etc > > > > are forwarded to user-space. > > > > ACPI events are not particularly high frequency like interrupts. > > > > > > I'm not sure that's relevant, these interfaces don't proclaim to > > > provide isolation among host processes which manage behavior relative > > > to accessories. These are effectively system level services. It's only > > > a very, very specialized use case that places a VMM as peers among these > > > processes. Generally we don't want to grant a VMM any privileges beyond > > > what it absolutely needs, so letting a VMM managing an assigned NIC > > > really ought not to be able to snoop host events related to anything > > > other than the NIC. > > How is that related to the fact that we are forwarding VFIO-PCI events > > to netlink? Kernel does not grant any privileges to VMM. > > There are already other ACPI events on netlink. The implementer of the > > VMM can choose to allow VMM to snoop them or not. > > In our case our VMM (crosvm) does already snoop LID, battery and AC > > adapter events so the guest can adjust its behavior accordingly. > > This change just adds another class of ACPI events that are forwarded > > to netlink. > > That's true, it is the VMM choice whether to allow snooping netlink, > but this is being proposed as THE solution to allow VMMs to receive > ACPI events related to vfio assigned devices. If the solution > inherently requires escalating the VMM privileges to see all netlink > events, that's a weakness in the proposal. As noted previously, > there's also no introspection here, the VMM can't know whether it > should listen to netlink for ACPI events or include AML related to a > GPE for the device. It cannot determine if either the kernel supports > this feature or if the device has an ACPI companion that can generate > these events. > > > > > > > > > > > What sort of ACPI events are we expecting to see here and what does user space do with them? > > > > The use we are looking at right now are D-notifier events about the > > > > GPU power available to mobile discrete GPUs. > > > > The firmware notifies the GPU driver and resource daemon to > > > > dynamically adjust the amount of power that can be used by the GPU. > > > > > > > > > The proposed interface really has no introspection, how does the VMM > > > > > know which devices need ACPI tables added "upfront"? How do these > > > > > events factor into hotplug device support, where we may not be able to > > > > > dynamically inject ACPI code into the VM? > > > > > > > > The VMM can examine PCI IDs and the associated firmware node of the > > > > PCI device to figure out what events to expect and what ACPI table to > > > > generate to support it but that should not be necessary. > > > > > > I'm not entirely sure where your VMM is drawing the line between the VM > > > and management tools, but I think this is another case where the > > > hypervisor itself should not have privileges to examine the host > > > firmware tables to build its own. Something like libvirt would be > > > responsible for that. > > Yes, but that depends on the design of hypervisor and VMM and is not > > related to this patch. > > It is very much related to this patch if it proposes an interface to > solve a problem which is likely not compatible with the security model > of other VMMs. We need a single solution to support all VMMs. > > > > > > > > A generic GPE based ACPI event forwarder as Grzegorz proposed can be > > > > injected at VM init time and handle any notification that comes later, > > > > even from hotplug devices. > > > > > > It appears that forwarder is sending the notify to a specific ACPI > > > device node, so it's unclear to me how that becomes boilerplate AML > > > added to all VMs. We'll need to notify different devices based on > > > different events, right? > > Valid point. The notifications have a "scope" ACPI path. > > In my experience these events are consumed without looking where they > > came from but I believe the patch can be extended to > > provide ACPI path, in your example "_SB.PCI0.GPP0.PEGP" instead of > > generic vfio_pci which VMM could use to translate an equivalent ACPI > > path in the guest and pass it to a generic ACPI GPE based notifier via > > shared memory. Grzegorz could you chime in whether that would be > > possible? > > So effectively we're imposing the host ACPI namespace on the VM, or at > least a mapping between the host and VM namespace? The generality of > this is not improving. > > > > > > The acpi_bus_generate_netlink_event() below really only seems to form a > > > > > u8 event type from the u32 event. Is this something that could be > > > > > provided directly from the vfio device uAPI with an ioeventfd, thus > > > > > providing introspection that a device supports ACPI event notifications > > > > > and the ability for the VMM to exclusively monitor those events, and > > > > > only those events for the device, without additional privileges? > > > > > > > > From what I can see these events are 8 bit as they come from ACPI. > > > > They also do not carry any payload and it is up to the receiving > > > > driver to query any additional context/state from the device. > > > > This will work the same in the VM where driver can query the same > > > > information from the passed through PCI device. > > > > There are multiple other netflink based ACPI events forwarders which > > > > do exactly the same thing for other devices like AC adapter, lid/power > > > > button, ACPI thermal notifications, etc. > > > > They all use the same mechanism and can be received by user-space > > > > programs whether VMMs or others. > > > > > > But again, those other receivers are potentially system services, not > > > an isolated VM instance operating in a limited privilege environment. > > > IMO, it's very different if the host display server has access to lid > > > or power events than it is to allow some arbitrary VM that happens to > > > have an unrelated assigned device that same privilege. > > Therefore these VFIO related ACPI events could be received by a system > > service via this netlink event and selectively forwarded to VMM if > > such is a desire of whoever implements the userspace. > > This is outside the scope of this patch. In our case our VMM does > > receive these LID, AC or battery events. > > But this is backwards, we're presupposing the choice to use netlink > based on the convenience of one VMM, which potentially creates > obstacles, maybe even security isolation issues for other VMMs. The > method of delivering ACPI events to a VMM is very much within the scope > of this proposal. Thanks, > > Alex >
czw., 9 mar 2023 o 00:38 Alex Williamson <alex.williamson@redhat.com> napisał(a): > > On Wed, 8 Mar 2023 14:44:28 -0800 > Dominik Behr <dbehr@google.com> wrote: > > > On Wed, Mar 8, 2023 at 12:06 PM Alex Williamson > > <alex.williamson@redhat.com> wrote: > > > > > > On Wed, 8 Mar 2023 10:45:51 -0800 > > > Dominik Behr <dbehr@chromium.org> wrote: > > > > > > > It is the same interface as other ACPI events like AC adapter LID etc > > > > are forwarded to user-space. > > > > ACPI events are not particularly high frequency like interrupts. > > > > > > I'm not sure that's relevant, these interfaces don't proclaim to > > > provide isolation among host processes which manage behavior relative > > > to accessories. These are effectively system level services. It's only > > > a very, very specialized use case that places a VMM as peers among these > > > processes. Generally we don't want to grant a VMM any privileges beyond > > > what it absolutely needs, so letting a VMM managing an assigned NIC > > > really ought not to be able to snoop host events related to anything > > > other than the NIC. > > How is that related to the fact that we are forwarding VFIO-PCI events > > to netlink? Kernel does not grant any privileges to VMM. > > There are already other ACPI events on netlink. The implementer of the > > VMM can choose to allow VMM to snoop them or not. > > In our case our VMM (crosvm) does already snoop LID, battery and AC > > adapter events so the guest can adjust its behavior accordingly. > > This change just adds another class of ACPI events that are forwarded > > to netlink. > > That's true, it is the VMM choice whether to allow snooping netlink, > but this is being proposed as THE solution to allow VMMs to receive > ACPI events related to vfio assigned devices. If the solution > inherently requires escalating the VMM privileges to see all netlink > events, that's a weakness in the proposal. As noted previously, > there's also no introspection here, the VMM can't know whether it > should listen to netlink for ACPI events or include AML related to a > GPE for the device. It cannot determine if either the kernel supports > this feature or if the device has an ACPI companion that can generate > these events. To be precise the VMM doesn't listen to all netlink events: it listens only to "acpi_event" family and acpi related multicast group, which means it listens to all events generated through acpi_bus_generate_netlink_event. Before sending this patch I thought about using eventfd instead netalink which will actually provide a channel associated with a given device and therefore such notifications will be received only by the VMM associated with such a device. Nevertheless, it seems like eventfd will allow to signalize events happening (notify on a given device) but is not capable of sending any payload so in our case there is no room for propagating notification value via eventfd. Maybe there is other mechanism eventfd-like which will allow to achieve above? If there is no such mechanism, maybe instead of using existing acpi netlink events, which are associated with "acpi_event" netlink family and acpi multicast group, we could create per vfio-pci a different netlink family or probably reuse "acpi_event" family but use different multicast group, so each device will have dedicated netlink family. Does it seem reasonable? > > > > > > > > > > > What sort of ACPI events are we expecting to see here and what does user space do with them? > > > > The use we are looking at right now are D-notifier events about the > > > > GPU power available to mobile discrete GPUs. > > > > The firmware notifies the GPU driver and resource daemon to > > > > dynamically adjust the amount of power that can be used by the GPU. > > > > > > > > > The proposed interface really has no introspection, how does the VMM > > > > > know which devices need ACPI tables added "upfront"? How do these > > > > > events factor into hotplug device support, where we may not be able to > > > > > dynamically inject ACPI code into the VM? > > > > > > > > The VMM can examine PCI IDs and the associated firmware node of the > > > > PCI device to figure out what events to expect and what ACPI table to > > > > generate to support it but that should not be necessary. > > > > > > I'm not entirely sure where your VMM is drawing the line between the VM > > > and management tools, but I think this is another case where the > > > hypervisor itself should not have privileges to examine the host > > > firmware tables to build its own. Something like libvirt would be > > > responsible for that. > > Yes, but that depends on the design of hypervisor and VMM and is not > > related to this patch. > > It is very much related to this patch if it proposes an interface to > solve a problem which is likely not compatible with the security model > of other VMMs. We need a single solution to support all VMMs. > > > > > > > > A generic GPE based ACPI event forwarder as Grzegorz proposed can be > > > > injected at VM init time and handle any notification that comes later, > > > > even from hotplug devices. > > > > > > It appears that forwarder is sending the notify to a specific ACPI > > > device node, so it's unclear to me how that becomes boilerplate AML > > > added to all VMs. We'll need to notify different devices based on > > > different events, right? > > Valid point. The notifications have a "scope" ACPI path. > > In my experience these events are consumed without looking where they > > came from but I believe the patch can be extended to > > provide ACPI path, in your example "_SB.PCI0.GPP0.PEGP" instead of > > generic vfio_pci which VMM could use to translate an equivalent ACPI > > path in the guest and pass it to a generic ACPI GPE based notifier via > > shared memory. Grzegorz could you chime in whether that would be > > possible? > > So effectively we're imposing the host ACPI namespace on the VM, or at > least a mapping between the host and VM namespace? The generality of > this is not improving. Yes, in the example VMM implementation we have mapping between the host pci device address and guest pci device. Therefore VMM knows, based on device name (BDF) sent via netlink, to which guest device this notification should be propagated. The boilerplate AML is added to each vfio-pci device which belongs to VMM and each vfio-pci device has associated pre-allocated GPE so the VMM knows which GPE should be triggered to replicate notification for a given device. BTW this is only current WIP VMM implementation - this could probably be optimized if needed. Handling hotplug devices is more problematic. I see that the kernel provides some runtime ACPI patching mechanism: https://www.kernel.org/doc/html/latest/firmware-guide/acpi/method-customizing.html (which I never tried) but not even sure how VMM could take advantage of it. BTW this realized me that the same problem with hotplug applies to other vfio-pci use-cases e.g. runtime PM, which relies on guest virtual ACPI method: https://patchwork.kernel.org/project/linux-pm/patch/20220829114850.4341-5-abhsahu@nvidia.com/. Generating virtual ACPI content for hotplug devices seems like a more generic issue. > > > > > > The acpi_bus_generate_netlink_event() below really only seems to form a > > > > > u8 event type from the u32 event. Is this something that could be > > > > > provided directly from the vfio device uAPI with an ioeventfd, thus > > > > > providing introspection that a device supports ACPI event notifications > > > > > and the ability for the VMM to exclusively monitor those events, and > > > > > only those events for the device, without additional privileges? > > > > > > > > From what I can see these events are 8 bit as they come from ACPI. > > > > They also do not carry any payload and it is up to the receiving > > > > driver to query any additional context/state from the device. > > > > This will work the same in the VM where driver can query the same > > > > information from the passed through PCI device. > > > > There are multiple other netflink based ACPI events forwarders which > > > > do exactly the same thing for other devices like AC adapter, lid/power > > > > button, ACPI thermal notifications, etc. > > > > They all use the same mechanism and can be received by user-space > > > > programs whether VMMs or others. > > > > > > But again, those other receivers are potentially system services, not > > > an isolated VM instance operating in a limited privilege environment. > > > IMO, it's very different if the host display server has access to lid > > > or power events than it is to allow some arbitrary VM that happens to > > > have an unrelated assigned device that same privilege. > > Therefore these VFIO related ACPI events could be received by a system > > service via this netlink event and selectively forwarded to VMM if > > such is a desire of whoever implements the userspace. > > This is outside the scope of this patch. In our case our VMM does > > receive these LID, AC or battery events. > > But this is backwards, we're presupposing the choice to use netlink > based on the convenience of one VMM, which potentially creates > obstacles, maybe even security isolation issues for other VMMs. The > method of delivering ACPI events to a VMM is very much within the scope > of this proposal. Thanks, > > Alex > regarding: > > > On my laptop, I see multiple _GPE scopes, each apparently very unique > > > to the devices: > > > > > > Scope (_GPE) > > > { > > > Method (_L0C, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > { > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > } > > > > > > Method (_L0D, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > { > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > } > > > > > > Method (_L0F, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > { > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > } > > > } > > > > > > Scope (_GPE) > > > { > > > Method (_L19, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > { > > > Notify (\_SB.PCI0.GP17, 0x02) // Device Wake > > > Notify (\_SB.PCI0.GP17.XHC0, 0x02) // Device Wake > > > Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake > > > Notify (\_SB.PWRB, 0x02) // Device Wake > > > } > > > > > > Method (_L08, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > { > > > Notify (\_SB.PCI0.GP18, 0x02) // Device Wake > > > Notify (\_SB.PCI0.GPP0, 0x02) // Device Wake > > > Notify (\_SB.PCI0.GPP1, 0x02) // Device Wake > > > Notify (\_SB.PCI0.GPP5, 0x02) // Device Wake > > > } > > > } > > > > > > At least one more even significantly more extensive, calling methods > > > that interact with OpRegions. So how does a simple stub of a > > > GPE block replicate this sort of behavior in the host AML? Thanks, The simple stub of GPE block will work to replicate the ACPI notification only: as mentioned earlier GPE handler will be generated per-vfio device so in your example if let assume that only: - \_SB.PCI0.GPP0.PEGP - \_SB.PCI0.GP17.XHC1 will be pass-through to the guest, the generated AML code for VM will look more-less like below: Scope (_GPE) { Method (_E00, 0, NotSerialized) { Local0 = \_SB.PCI0.GPP0.PEGP.NOTY Notify ( \_SB.PCI0.GPP0.PEGP, Local0) } } Scope (_GPE) { Method (_E01, 0, NotSerialized) { Local0 =\_SB.PCI0.GP17.XHC1.NOTY Notify (\_SB.PCI0.GP17.XHC1, Local0) } } So each pass-through device will have associated GPE (0 for \_SB.PCI0.GPP0.PEGP and 1 for \_SB.PCI0.GP17.XHC1). The path in Notify could actually be different and related to guest pci hierarchy (but associated to those host devices). Please also note that in this case we use GPE in order to "inform" guest about notification coming and we do not try to replicate host GPE scope description. Above we assumed that other devices (like \_SB.PCI0.GPP0/1) are not pass-through to the guest and notification are handled in host as usual (they are not binded to pci-vfio) and there is no need to generate AML code, allocate GPE for them and so on. Thank you, Grzegorz
On Wed, Mar 08, 2023 at 05:51:32PM -0800, Dominik Behr wrote: > All other ACPI events that are available to userspace are on netlink already. > As for translation of ACPI paths. It is sort of a requirement for VMM > to translate the PCI path from host to guest because the PCI device > tree in the guest is totally different already. The same follows for > ACPI paths. > > What would you propose instead of netlink? > Sysfs entry for VFIO PCI device that accepts eventfd and signals the > events via eventfd? Or moving it into ACPI layer entirely and adding > eventfd sysfs interface for all ACPI devices? I think Alex is asking why wouldn't you push it through the vfio device FD? There is an unambiguous relationship between the QEMU vPCI identity and the VFIO device, and we already have a good security model for VMM access to the device FD. Jason
Hi Alex Could you please refer to my previous message? Thank you in advance, Grzegorz czw., 9 mar 2023 o 14:41 Grzegorz Jaszczyk <jaz@semihalf.com> napisał(a): > > czw., 9 mar 2023 o 00:38 Alex Williamson <alex.williamson@redhat.com> > napisał(a): > > > > On Wed, 8 Mar 2023 14:44:28 -0800 > > Dominik Behr <dbehr@google.com> wrote: > > > > > On Wed, Mar 8, 2023 at 12:06 PM Alex Williamson > > > <alex.williamson@redhat.com> wrote: > > > > > > > > On Wed, 8 Mar 2023 10:45:51 -0800 > > > > Dominik Behr <dbehr@chromium.org> wrote: > > > > > > > > > It is the same interface as other ACPI events like AC adapter LID etc > > > > > are forwarded to user-space. > > > > > ACPI events are not particularly high frequency like interrupts. > > > > > > > > I'm not sure that's relevant, these interfaces don't proclaim to > > > > provide isolation among host processes which manage behavior relative > > > > to accessories. These are effectively system level services. It's only > > > > a very, very specialized use case that places a VMM as peers among these > > > > processes. Generally we don't want to grant a VMM any privileges beyond > > > > what it absolutely needs, so letting a VMM managing an assigned NIC > > > > really ought not to be able to snoop host events related to anything > > > > other than the NIC. > > > How is that related to the fact that we are forwarding VFIO-PCI events > > > to netlink? Kernel does not grant any privileges to VMM. > > > There are already other ACPI events on netlink. The implementer of the > > > VMM can choose to allow VMM to snoop them or not. > > > In our case our VMM (crosvm) does already snoop LID, battery and AC > > > adapter events so the guest can adjust its behavior accordingly. > > > This change just adds another class of ACPI events that are forwarded > > > to netlink. > > > > That's true, it is the VMM choice whether to allow snooping netlink, > > but this is being proposed as THE solution to allow VMMs to receive > > ACPI events related to vfio assigned devices. If the solution > > inherently requires escalating the VMM privileges to see all netlink > > events, that's a weakness in the proposal. As noted previously, > > there's also no introspection here, the VMM can't know whether it > > should listen to netlink for ACPI events or include AML related to a > > GPE for the device. It cannot determine if either the kernel supports > > this feature or if the device has an ACPI companion that can generate > > these events. > > To be precise the VMM doesn't listen to all netlink events: it listens > only to "acpi_event" family and acpi related multicast group, which > means it listens to all events generated through > acpi_bus_generate_netlink_event. > > Before sending this patch I thought about using eventfd instead > netalink which will actually provide a channel associated with a given > device and therefore such notifications will be received only by the > VMM associated with such a device. Nevertheless, it seems like eventfd > will allow to signalize events happening (notify on a given device) > but is not capable of sending any payload so in our case there is no > room for propagating notification value via eventfd. Maybe there is > other mechanism eventfd-like which will allow to achieve above? > > If there is no such mechanism, maybe instead of using existing acpi > netlink events, which are associated with "acpi_event" netlink family > and acpi multicast group, we could create per vfio-pci a different > netlink family or probably reuse "acpi_event" family but use different > multicast group, so each device will have dedicated netlink family. > Does it seem reasonable? > > > > > > > > > > > > > > > What sort of ACPI events are we expecting to see here and what does user space do with them? > > > > > The use we are looking at right now are D-notifier events about the > > > > > GPU power available to mobile discrete GPUs. > > > > > The firmware notifies the GPU driver and resource daemon to > > > > > dynamically adjust the amount of power that can be used by the GPU. > > > > > > > > > > > The proposed interface really has no introspection, how does the VMM > > > > > > know which devices need ACPI tables added "upfront"? How do these > > > > > > events factor into hotplug device support, where we may not be able to > > > > > > dynamically inject ACPI code into the VM? > > > > > > > > > > The VMM can examine PCI IDs and the associated firmware node of the > > > > > PCI device to figure out what events to expect and what ACPI table to > > > > > generate to support it but that should not be necessary. > > > > > > > > I'm not entirely sure where your VMM is drawing the line between the VM > > > > and management tools, but I think this is another case where the > > > > hypervisor itself should not have privileges to examine the host > > > > firmware tables to build its own. Something like libvirt would be > > > > responsible for that. > > > Yes, but that depends on the design of hypervisor and VMM and is not > > > related to this patch. > > > > It is very much related to this patch if it proposes an interface to > > solve a problem which is likely not compatible with the security model > > of other VMMs. We need a single solution to support all VMMs. > > > > > > > > > > > A generic GPE based ACPI event forwarder as Grzegorz proposed can be > > > > > injected at VM init time and handle any notification that comes later, > > > > > even from hotplug devices. > > > > > > > > It appears that forwarder is sending the notify to a specific ACPI > > > > device node, so it's unclear to me how that becomes boilerplate AML > > > > added to all VMs. We'll need to notify different devices based on > > > > different events, right? > > > Valid point. The notifications have a "scope" ACPI path. > > > In my experience these events are consumed without looking where they > > > came from but I believe the patch can be extended to > > > provide ACPI path, in your example "_SB.PCI0.GPP0.PEGP" instead of > > > generic vfio_pci which VMM could use to translate an equivalent ACPI > > > path in the guest and pass it to a generic ACPI GPE based notifier via > > > shared memory. Grzegorz could you chime in whether that would be > > > possible? > > > > So effectively we're imposing the host ACPI namespace on the VM, or at > > least a mapping between the host and VM namespace? The generality of > > this is not improving. > > Yes, in the example VMM implementation we have mapping between the > host pci device address and guest pci device. Therefore VMM knows, > based on device name (BDF) sent via netlink, to which guest device > this notification should be propagated. The boilerplate AML is added > to each vfio-pci device which belongs to VMM and each vfio-pci device > has associated pre-allocated GPE so the VMM knows which GPE should be > triggered to replicate notification for a given device. BTW this is > only current WIP VMM implementation - this could probably be optimized > if needed. > > Handling hotplug devices is more problematic. I see that the kernel > provides some runtime ACPI patching mechanism: > https://www.kernel.org/doc/html/latest/firmware-guide/acpi/method-customizing.html > (which I never tried) but not even sure how VMM could take advantage > of it. BTW this realized me that the same problem with hotplug applies > to other vfio-pci use-cases e.g. runtime PM, which relies on guest > virtual ACPI method: > https://patchwork.kernel.org/project/linux-pm/patch/20220829114850.4341-5-abhsahu@nvidia.com/. > Generating virtual ACPI content for hotplug devices seems like a more > generic issue. > > > > > > > > > The acpi_bus_generate_netlink_event() below really only seems to form a > > > > > > u8 event type from the u32 event. Is this something that could be > > > > > > provided directly from the vfio device uAPI with an ioeventfd, thus > > > > > > providing introspection that a device supports ACPI event notifications > > > > > > and the ability for the VMM to exclusively monitor those events, and > > > > > > only those events for the device, without additional privileges? > > > > > > > > > > From what I can see these events are 8 bit as they come from ACPI. > > > > > They also do not carry any payload and it is up to the receiving > > > > > driver to query any additional context/state from the device. > > > > > This will work the same in the VM where driver can query the same > > > > > information from the passed through PCI device. > > > > > There are multiple other netflink based ACPI events forwarders which > > > > > do exactly the same thing for other devices like AC adapter, lid/power > > > > > button, ACPI thermal notifications, etc. > > > > > They all use the same mechanism and can be received by user-space > > > > > programs whether VMMs or others. > > > > > > > > But again, those other receivers are potentially system services, not > > > > an isolated VM instance operating in a limited privilege environment. > > > > IMO, it's very different if the host display server has access to lid > > > > or power events than it is to allow some arbitrary VM that happens to > > > > have an unrelated assigned device that same privilege. > > > Therefore these VFIO related ACPI events could be received by a system > > > service via this netlink event and selectively forwarded to VMM if > > > such is a desire of whoever implements the userspace. > > > This is outside the scope of this patch. In our case our VMM does > > > receive these LID, AC or battery events. > > > > But this is backwards, we're presupposing the choice to use netlink > > based on the convenience of one VMM, which potentially creates > > obstacles, maybe even security isolation issues for other VMMs. The > > method of delivering ACPI events to a VMM is very much within the scope > > of this proposal. Thanks, > > > > Alex > > > > regarding: > > > > On my laptop, I see multiple _GPE scopes, each apparently very unique > > > > to the devices: > > > > > > > > Scope (_GPE) > > > > { > > > > Method (_L0C, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > > } > > > > > > > > Method (_L0D, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > > } > > > > > > > > Method (_L0F, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > > } > > > > } > > > > > > > > Scope (_GPE) > > > > { > > > > Method (_L19, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GP17, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GP17.XHC0, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake > > > > Notify (\_SB.PWRB, 0x02) // Device Wake > > > > } > > > > > > > > Method (_L08, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GP18, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GPP0, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GPP1, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GPP5, 0x02) // Device Wake > > > > } > > > > } > > > > > > > > At least one more even significantly more extensive, calling methods > > > > that interact with OpRegions. So how does a simple stub of a > > > > GPE block replicate this sort of behavior in the host AML? Thanks, > > The simple stub of GPE block will work to replicate the ACPI > notification only: as mentioned earlier GPE handler will be generated > per-vfio device so in your example if let assume that only: > - \_SB.PCI0.GPP0.PEGP > - \_SB.PCI0.GP17.XHC1 > will be pass-through to the guest, the generated AML code for VM will > look more-less like below: > > Scope (_GPE) > { > Method (_E00, 0, NotSerialized) > { > Local0 = \_SB.PCI0.GPP0.PEGP.NOTY > Notify ( \_SB.PCI0.GPP0.PEGP, Local0) > } > } > Scope (_GPE) > { > Method (_E01, 0, NotSerialized) > { > Local0 =\_SB.PCI0.GP17.XHC1.NOTY > Notify (\_SB.PCI0.GP17.XHC1, Local0) > } > } > > So each pass-through device will have associated GPE (0 for > \_SB.PCI0.GPP0.PEGP and 1 for \_SB.PCI0.GP17.XHC1). The path in Notify > could actually be different and related to guest pci hierarchy (but > associated to those host devices). Please also note that in this case > we use GPE in order to "inform" guest about notification coming and we > do not try to replicate host GPE scope description. > > Above we assumed that other devices (like \_SB.PCI0.GPP0/1) are not > pass-through to the guest and notification are handled in host as > usual (they are not binded to pci-vfio) and there is no need to > generate AML code, allocate GPE for them and so on. > > Thank you, > Grzegorz
On Thu, 9 Mar 2023 14:41:23 +0100 Grzegorz Jaszczyk <jaz@semihalf.com> wrote: > czw., 9 mar 2023 o 00:38 Alex Williamson <alex.williamson@redhat.com> > napisał(a): > > > > On Wed, 8 Mar 2023 14:44:28 -0800 > > Dominik Behr <dbehr@google.com> wrote: > > > > > On Wed, Mar 8, 2023 at 12:06 PM Alex Williamson > > > <alex.williamson@redhat.com> wrote: > > > > > > > > On Wed, 8 Mar 2023 10:45:51 -0800 > > > > Dominik Behr <dbehr@chromium.org> wrote: > > > > > > > > > It is the same interface as other ACPI events like AC adapter LID etc > > > > > are forwarded to user-space. > > > > > ACPI events are not particularly high frequency like interrupts. > > > > > > > > I'm not sure that's relevant, these interfaces don't proclaim to > > > > provide isolation among host processes which manage behavior relative > > > > to accessories. These are effectively system level services. It's only > > > > a very, very specialized use case that places a VMM as peers among these > > > > processes. Generally we don't want to grant a VMM any privileges beyond > > > > what it absolutely needs, so letting a VMM managing an assigned NIC > > > > really ought not to be able to snoop host events related to anything > > > > other than the NIC. > > > How is that related to the fact that we are forwarding VFIO-PCI events > > > to netlink? Kernel does not grant any privileges to VMM. > > > There are already other ACPI events on netlink. The implementer of the > > > VMM can choose to allow VMM to snoop them or not. > > > In our case our VMM (crosvm) does already snoop LID, battery and AC > > > adapter events so the guest can adjust its behavior accordingly. > > > This change just adds another class of ACPI events that are forwarded > > > to netlink. > > > > That's true, it is the VMM choice whether to allow snooping netlink, > > but this is being proposed as THE solution to allow VMMs to receive > > ACPI events related to vfio assigned devices. If the solution > > inherently requires escalating the VMM privileges to see all netlink > > events, that's a weakness in the proposal. As noted previously, > > there's also no introspection here, the VMM can't know whether it > > should listen to netlink for ACPI events or include AML related to a > > GPE for the device. It cannot determine if either the kernel supports > > this feature or if the device has an ACPI companion that can generate > > these events. > > To be precise the VMM doesn't listen to all netlink events: it listens > only to "acpi_event" family and acpi related multicast group, which > means it listens to all events generated through > acpi_bus_generate_netlink_event. > > Before sending this patch I thought about using eventfd instead > netalink which will actually provide a channel associated with a given > device and therefore such notifications will be received only by the > VMM associated with such a device. Nevertheless, it seems like eventfd > will allow to signalize events happening (notify on a given device) > but is not capable of sending any payload so in our case there is no > room for propagating notification value via eventfd. Maybe there is > other mechanism eventfd-like which will allow to achieve above? Reading an eventfd returns an 8-byte value, we generally only use it as a counter, but it's been discussed previously and IIRC, it's possible to use that value as a notification value. > If there is no such mechanism, maybe instead of using existing acpi > netlink events, which are associated with "acpi_event" netlink family > and acpi multicast group, we could create per vfio-pci a different > netlink family or probably reuse "acpi_event" family but use different > multicast group, so each device will have dedicated netlink family. > Does it seem reasonable? > > > > > > > > > > > > > > > What sort of ACPI events are we expecting to see here and what does user space do with them? > > > > > The use we are looking at right now are D-notifier events about the > > > > > GPU power available to mobile discrete GPUs. > > > > > The firmware notifies the GPU driver and resource daemon to > > > > > dynamically adjust the amount of power that can be used by the GPU. > > > > > > > > > > > The proposed interface really has no introspection, how does the VMM > > > > > > know which devices need ACPI tables added "upfront"? How do these > > > > > > events factor into hotplug device support, where we may not be able to > > > > > > dynamically inject ACPI code into the VM? > > > > > > > > > > The VMM can examine PCI IDs and the associated firmware node of the > > > > > PCI device to figure out what events to expect and what ACPI table to > > > > > generate to support it but that should not be necessary. > > > > > > > > I'm not entirely sure where your VMM is drawing the line between the VM > > > > and management tools, but I think this is another case where the > > > > hypervisor itself should not have privileges to examine the host > > > > firmware tables to build its own. Something like libvirt would be > > > > responsible for that. > > > Yes, but that depends on the design of hypervisor and VMM and is not > > > related to this patch. > > > > It is very much related to this patch if it proposes an interface to > > solve a problem which is likely not compatible with the security model > > of other VMMs. We need a single solution to support all VMMs. > > > > > > > > > > > A generic GPE based ACPI event forwarder as Grzegorz proposed can be > > > > > injected at VM init time and handle any notification that comes later, > > > > > even from hotplug devices. > > > > > > > > It appears that forwarder is sending the notify to a specific ACPI > > > > device node, so it's unclear to me how that becomes boilerplate AML > > > > added to all VMs. We'll need to notify different devices based on > > > > different events, right? > > > Valid point. The notifications have a "scope" ACPI path. > > > In my experience these events are consumed without looking where they > > > came from but I believe the patch can be extended to > > > provide ACPI path, in your example "_SB.PCI0.GPP0.PEGP" instead of > > > generic vfio_pci which VMM could use to translate an equivalent ACPI > > > path in the guest and pass it to a generic ACPI GPE based notifier via > > > shared memory. Grzegorz could you chime in whether that would be > > > possible? > > > > So effectively we're imposing the host ACPI namespace on the VM, or at > > least a mapping between the host and VM namespace? The generality of > > this is not improving. > > Yes, in the example VMM implementation we have mapping between the > host pci device address and guest pci device. Therefore VMM knows, > based on device name (BDF) sent via netlink, to which guest device > this notification should be propagated. The boilerplate AML is added > to each vfio-pci device which belongs to VMM and each vfio-pci device > has associated pre-allocated GPE so the VMM knows which GPE should be > triggered to replicate notification for a given device. BTW this is > only current WIP VMM implementation - this could probably be optimized > if needed. > > Handling hotplug devices is more problematic. I see that the kernel > provides some runtime ACPI patching mechanism: > https://www.kernel.org/doc/html/latest/firmware-guide/acpi/method-customizing.html > (which I never tried) but not even sure how VMM could take advantage > of it. BTW this realized me that the same problem with hotplug applies > to other vfio-pci use-cases e.g. runtime PM, which relies on guest > virtual ACPI method: > https://patchwork.kernel.org/project/linux-pm/patch/20220829114850.4341-5-abhsahu@nvidia.com/. > Generating virtual ACPI content for hotplug devices seems like a more > generic issue. I don't think this is an equivalent case, the AML object is at the slot, not the device and the direction is reversed. The VMM can implement PCI slot power control regardless of the host capabilities. This is more like ACPI eject behavior, the guest triggers an event which is processed by the VMM to perform an action. The VMM doesn't need to dynamically add slot power control capabilities based on the features of the plugged device. > > > > > > The acpi_bus_generate_netlink_event() below really only seems to form a > > > > > > u8 event type from the u32 event. Is this something that could be > > > > > > provided directly from the vfio device uAPI with an ioeventfd, thus > > > > > > providing introspection that a device supports ACPI event notifications > > > > > > and the ability for the VMM to exclusively monitor those events, and > > > > > > only those events for the device, without additional privileges? > > > > > > > > > > From what I can see these events are 8 bit as they come from ACPI. > > > > > They also do not carry any payload and it is up to the receiving > > > > > driver to query any additional context/state from the device. > > > > > This will work the same in the VM where driver can query the same > > > > > information from the passed through PCI device. > > > > > There are multiple other netflink based ACPI events forwarders which > > > > > do exactly the same thing for other devices like AC adapter, lid/power > > > > > button, ACPI thermal notifications, etc. > > > > > They all use the same mechanism and can be received by user-space > > > > > programs whether VMMs or others. > > > > > > > > But again, those other receivers are potentially system services, not > > > > an isolated VM instance operating in a limited privilege environment. > > > > IMO, it's very different if the host display server has access to lid > > > > or power events than it is to allow some arbitrary VM that happens to > > > > have an unrelated assigned device that same privilege. > > > Therefore these VFIO related ACPI events could be received by a system > > > service via this netlink event and selectively forwarded to VMM if > > > such is a desire of whoever implements the userspace. > > > This is outside the scope of this patch. In our case our VMM does > > > receive these LID, AC or battery events. > > > > But this is backwards, we're presupposing the choice to use netlink > > based on the convenience of one VMM, which potentially creates > > obstacles, maybe even security isolation issues for other VMMs. The > > method of delivering ACPI events to a VMM is very much within the scope > > of this proposal. Thanks, > > > > Alex > > > > regarding: > > > > On my laptop, I see multiple _GPE scopes, each apparently very unique > > > > to the devices: > > > > > > > > Scope (_GPE) > > > > { > > > > Method (_L0C, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > > } > > > > > > > > Method (_L0D, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > > } > > > > > > > > Method (_L0F, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > > } > > > > } > > > > > > > > Scope (_GPE) > > > > { > > > > Method (_L19, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GP17, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GP17.XHC0, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake > > > > Notify (\_SB.PWRB, 0x02) // Device Wake > > > > } > > > > > > > > Method (_L08, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > { > > > > Notify (\_SB.PCI0.GP18, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GPP0, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GPP1, 0x02) // Device Wake > > > > Notify (\_SB.PCI0.GPP5, 0x02) // Device Wake > > > > } > > > > } > > > > > > > > At least one more even significantly more extensive, calling methods > > > > that interact with OpRegions. So how does a simple stub of a > > > > GPE block replicate this sort of behavior in the host AML? Thanks, > > The simple stub of GPE block will work to replicate the ACPI > notification only: as mentioned earlier GPE handler will be generated > per-vfio device so in your example if let assume that only: > - \_SB.PCI0.GPP0.PEGP > - \_SB.PCI0.GP17.XHC1 > will be pass-through to the guest, the generated AML code for VM will > look more-less like below: > > Scope (_GPE) > { > Method (_E00, 0, NotSerialized) > { > Local0 = \_SB.PCI0.GPP0.PEGP.NOTY > Notify ( \_SB.PCI0.GPP0.PEGP, Local0) > } > } > Scope (_GPE) > { > Method (_E01, 0, NotSerialized) > { > Local0 =\_SB.PCI0.GP17.XHC1.NOTY > Notify (\_SB.PCI0.GP17.XHC1, Local0) > } > } > > So each pass-through device will have associated GPE (0 for > \_SB.PCI0.GPP0.PEGP and 1 for \_SB.PCI0.GP17.XHC1). The path in Notify > could actually be different and related to guest pci hierarchy (but > associated to those host devices). Please also note that in this case > we use GPE in order to "inform" guest about notification coming and we > do not try to replicate host GPE scope description. > > Above we assumed that other devices (like \_SB.PCI0.GPP0/1) are not > pass-through to the guest and notification are handled in host as > usual (they are not binded to pci-vfio) and there is no need to > generate AML code, allocate GPE for them and so on. I'm pretty lost here. The GPE code to read the notify value and relay it to another AML object is relatively trivial, but that other AML object needs to do something of some significance with that notify. Minimally, it seems like the AML would need to establish the companion relationship with the device so that a driver in the guest receives that notify. What does that look like, and can it be pre-seeded in the AML regardless of whether the device is cold- or hot-plugged into the VM? Some specific examples would be useful. It's not clear to me that there isn't significant out-of-band effort required to understand and replicate AML from host to guest to make this useful, so the generality of this feature is hard to grasp. Thanks, Alex
czw., 23 mar 2023 o 18:07 Alex Williamson <alex.williamson@redhat.com> napisał(a): > > On Thu, 9 Mar 2023 14:41:23 +0100 > Grzegorz Jaszczyk <jaz@semihalf.com> wrote: > > > czw., 9 mar 2023 o 00:38 Alex Williamson <alex.williamson@redhat.com> > > napisał(a): > > > > > > On Wed, 8 Mar 2023 14:44:28 -0800 > > > Dominik Behr <dbehr@google.com> wrote: > > > > > > > On Wed, Mar 8, 2023 at 12:06 PM Alex Williamson > > > > <alex.williamson@redhat.com> wrote: > > > > > > > > > > On Wed, 8 Mar 2023 10:45:51 -0800 > > > > > Dominik Behr <dbehr@chromium.org> wrote: > > > > > > > > > > > It is the same interface as other ACPI events like AC adapter LID etc > > > > > > are forwarded to user-space. > > > > > > ACPI events are not particularly high frequency like interrupts. > > > > > > > > > > I'm not sure that's relevant, these interfaces don't proclaim to > > > > > provide isolation among host processes which manage behavior relative > > > > > to accessories. These are effectively system level services. It's only > > > > > a very, very specialized use case that places a VMM as peers among these > > > > > processes. Generally we don't want to grant a VMM any privileges beyond > > > > > what it absolutely needs, so letting a VMM managing an assigned NIC > > > > > really ought not to be able to snoop host events related to anything > > > > > other than the NIC. > > > > How is that related to the fact that we are forwarding VFIO-PCI events > > > > to netlink? Kernel does not grant any privileges to VMM. > > > > There are already other ACPI events on netlink. The implementer of the > > > > VMM can choose to allow VMM to snoop them or not. > > > > In our case our VMM (crosvm) does already snoop LID, battery and AC > > > > adapter events so the guest can adjust its behavior accordingly. > > > > This change just adds another class of ACPI events that are forwarded > > > > to netlink. > > > > > > That's true, it is the VMM choice whether to allow snooping netlink, > > > but this is being proposed as THE solution to allow VMMs to receive > > > ACPI events related to vfio assigned devices. If the solution > > > inherently requires escalating the VMM privileges to see all netlink > > > events, that's a weakness in the proposal. As noted previously, > > > there's also no introspection here, the VMM can't know whether it > > > should listen to netlink for ACPI events or include AML related to a > > > GPE for the device. It cannot determine if either the kernel supports > > > this feature or if the device has an ACPI companion that can generate > > > these events. > > > > To be precise the VMM doesn't listen to all netlink events: it listens > > only to "acpi_event" family and acpi related multicast group, which > > means it listens to all events generated through > > acpi_bus_generate_netlink_event. > > > > Before sending this patch I thought about using eventfd instead > > netalink which will actually provide a channel associated with a given > > device and therefore such notifications will be received only by the > > VMM associated with such a device. Nevertheless, it seems like eventfd > > will allow to signalize events happening (notify on a given device) > > but is not capable of sending any payload so in our case there is no > > room for propagating notification value via eventfd. Maybe there is > > other mechanism eventfd-like which will allow to achieve above? > > Reading an eventfd returns an 8-byte value, we generally only use it > as a counter, but it's been discussed previously and IIRC, it's possible > to use that value as a notification value. It seems possible to re-use the eventfd counter to pass the notification value but it will require some synchronization: the kernel's eventfd_signal() simply adds value to the eventfd internal counter. So if two or more ACPI notifications (for a single device) are received and propagated one by one by the host kernel, VMM could read one coalesced value and will be unable to retrieve the notification values. We could probably register two eventfd: one for host kernel to VMM and the second from the VMM to host kernel: so the second one will be used in a similar way as resample enventfd for level triggered interrupts. Or some other serialization mechanism will have to be applied. > > > If there is no such mechanism, maybe instead of using existing acpi > > netlink events, which are associated with "acpi_event" netlink family > > and acpi multicast group, we could create per vfio-pci a different > > netlink family or probably reuse "acpi_event" family but use different > > multicast group, so each device will have dedicated netlink family. > > Does it seem reasonable? > > > > > > > > > > > > > > > > > > > What sort of ACPI events are we expecting to see here and what does user space do with them? > > > > > > The use we are looking at right now are D-notifier events about the > > > > > > GPU power available to mobile discrete GPUs. > > > > > > The firmware notifies the GPU driver and resource daemon to > > > > > > dynamically adjust the amount of power that can be used by the GPU. > > > > > > > > > > > > > The proposed interface really has no introspection, how does the VMM > > > > > > > know which devices need ACPI tables added "upfront"? How do these > > > > > > > events factor into hotplug device support, where we may not be able to > > > > > > > dynamically inject ACPI code into the VM? > > > > > > > > > > > > The VMM can examine PCI IDs and the associated firmware node of the > > > > > > PCI device to figure out what events to expect and what ACPI table to > > > > > > generate to support it but that should not be necessary. > > > > > > > > > > I'm not entirely sure where your VMM is drawing the line between the VM > > > > > and management tools, but I think this is another case where the > > > > > hypervisor itself should not have privileges to examine the host > > > > > firmware tables to build its own. Something like libvirt would be > > > > > responsible for that. > > > > Yes, but that depends on the design of hypervisor and VMM and is not > > > > related to this patch. > > > > > > It is very much related to this patch if it proposes an interface to > > > solve a problem which is likely not compatible with the security model > > > of other VMMs. We need a single solution to support all VMMs. > > > > > > > > > > > > > > A generic GPE based ACPI event forwarder as Grzegorz proposed can be > > > > > > injected at VM init time and handle any notification that comes later, > > > > > > even from hotplug devices. > > > > > > > > > > It appears that forwarder is sending the notify to a specific ACPI > > > > > device node, so it's unclear to me how that becomes boilerplate AML > > > > > added to all VMs. We'll need to notify different devices based on > > > > > different events, right? > > > > Valid point. The notifications have a "scope" ACPI path. > > > > In my experience these events are consumed without looking where they > > > > came from but I believe the patch can be extended to > > > > provide ACPI path, in your example "_SB.PCI0.GPP0.PEGP" instead of > > > > generic vfio_pci which VMM could use to translate an equivalent ACPI > > > > path in the guest and pass it to a generic ACPI GPE based notifier via > > > > shared memory. Grzegorz could you chime in whether that would be > > > > possible? > > > > > > So effectively we're imposing the host ACPI namespace on the VM, or at > > > least a mapping between the host and VM namespace? The generality of > > > this is not improving. > > > > Yes, in the example VMM implementation we have mapping between the > > host pci device address and guest pci device. Therefore VMM knows, > > based on device name (BDF) sent via netlink, to which guest device > > this notification should be propagated. The boilerplate AML is added > > to each vfio-pci device which belongs to VMM and each vfio-pci device > > has associated pre-allocated GPE so the VMM knows which GPE should be > > triggered to replicate notification for a given device. BTW this is > > only current WIP VMM implementation - this could probably be optimized > > if needed. > > > > Handling hotplug devices is more problematic. I see that the kernel > > provides some runtime ACPI patching mechanism: > > https://www.kernel.org/doc/html/latest/firmware-guide/acpi/method-customizing.html > > (which I never tried) but not even sure how VMM could take advantage > > of it. BTW this realized me that the same problem with hotplug applies > > to other vfio-pci use-cases e.g. runtime PM, which relies on guest > > virtual ACPI method: > > https://patchwork.kernel.org/project/linux-pm/patch/20220829114850.4341-5-abhsahu@nvidia.com/. > > Generating virtual ACPI content for hotplug devices seems like a more > > generic issue. > > I don't think this is an equivalent case, the AML object is at the > slot, not the device and the direction is reversed. The VMM can > implement PCI slot power control regardless of the host capabilities. > This is more like ACPI eject behavior, the guest triggers an event > which is processed by the VMM to perform an action. The VMM doesn't > need to dynamically add slot power control capabilities based on the > features of the plugged device. IIUC the only concern with hotplug devices is how to generate the AML code for them. If we theoretically could prepare an ACPI guest table upfront for PCI PM even for hot pluggable devices (by providing the kind of slot as you've described), why can't we do the same for ACPI notifications? For both cases we have to associate such a generated AML device object/slot with a specific PCI device anyway: at least VMM will have to allocate some memory region for each of them upfront and use this address during AML generation, which will allow to associate address that guest access with real device on which we want to perform operations: either perform PM operation or get ACPI notification value. > > > > > > > > The acpi_bus_generate_netlink_event() below really only seems to form a > > > > > > > u8 event type from the u32 event. Is this something that could be > > > > > > > provided directly from the vfio device uAPI with an ioeventfd, thus > > > > > > > providing introspection that a device supports ACPI event notifications > > > > > > > and the ability for the VMM to exclusively monitor those events, and > > > > > > > only those events for the device, without additional privileges? > > > > > > > > > > > > From what I can see these events are 8 bit as they come from ACPI. > > > > > > They also do not carry any payload and it is up to the receiving > > > > > > driver to query any additional context/state from the device. > > > > > > This will work the same in the VM where driver can query the same > > > > > > information from the passed through PCI device. > > > > > > There are multiple other netflink based ACPI events forwarders which > > > > > > do exactly the same thing for other devices like AC adapter, lid/power > > > > > > button, ACPI thermal notifications, etc. > > > > > > They all use the same mechanism and can be received by user-space > > > > > > programs whether VMMs or others. > > > > > > > > > > But again, those other receivers are potentially system services, not > > > > > an isolated VM instance operating in a limited privilege environment. > > > > > IMO, it's very different if the host display server has access to lid > > > > > or power events than it is to allow some arbitrary VM that happens to > > > > > have an unrelated assigned device that same privilege. > > > > Therefore these VFIO related ACPI events could be received by a system > > > > service via this netlink event and selectively forwarded to VMM if > > > > such is a desire of whoever implements the userspace. > > > > This is outside the scope of this patch. In our case our VMM does > > > > receive these LID, AC or battery events. > > > > > > But this is backwards, we're presupposing the choice to use netlink > > > based on the convenience of one VMM, which potentially creates > > > obstacles, maybe even security isolation issues for other VMMs. The > > > method of delivering ACPI events to a VMM is very much within the scope > > > of this proposal. Thanks, > > > > > > Alex > > > > > > > regarding: > > > > > On my laptop, I see multiple _GPE scopes, each apparently very unique > > > > > to the devices: > > > > > > > > > > Scope (_GPE) > > > > > { > > > > > Method (_L0C, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > > { > > > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > > > } > > > > > > > > > > Method (_L0D, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > > { > > > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > > > } > > > > > > > > > > Method (_L0F, 0, Serialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > > { > > > > > Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change > > > > > } > > > > > } > > > > > > > > > > Scope (_GPE) > > > > > { > > > > > Method (_L19, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > > { > > > > > Notify (\_SB.PCI0.GP17, 0x02) // Device Wake > > > > > Notify (\_SB.PCI0.GP17.XHC0, 0x02) // Device Wake > > > > > Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake > > > > > Notify (\_SB.PWRB, 0x02) // Device Wake > > > > > } > > > > > > > > > > Method (_L08, 0, NotSerialized) // _Lxx: Level-Triggered GPE, xx=0x00-0xFF > > > > > { > > > > > Notify (\_SB.PCI0.GP18, 0x02) // Device Wake > > > > > Notify (\_SB.PCI0.GPP0, 0x02) // Device Wake > > > > > Notify (\_SB.PCI0.GPP1, 0x02) // Device Wake > > > > > Notify (\_SB.PCI0.GPP5, 0x02) // Device Wake > > > > > } > > > > > } > > > > > > > > > > At least one more even significantly more extensive, calling methods > > > > > that interact with OpRegions. So how does a simple stub of a > > > > > GPE block replicate this sort of behavior in the host AML? Thanks, > > > > The simple stub of GPE block will work to replicate the ACPI > > notification only: as mentioned earlier GPE handler will be generated > > per-vfio device so in your example if let assume that only: > > - \_SB.PCI0.GPP0.PEGP > > - \_SB.PCI0.GP17.XHC1 > > will be pass-through to the guest, the generated AML code for VM will > > look more-less like below: > > > > Scope (_GPE) > > { > > Method (_E00, 0, NotSerialized) > > { > > Local0 = \_SB.PCI0.GPP0.PEGP.NOTY > > Notify ( \_SB.PCI0.GPP0.PEGP, Local0) > > } > > } > > Scope (_GPE) > > { > > Method (_E01, 0, NotSerialized) > > { > > Local0 =\_SB.PCI0.GP17.XHC1.NOTY > > Notify (\_SB.PCI0.GP17.XHC1, Local0) > > } > > } > > > > So each pass-through device will have associated GPE (0 for > > \_SB.PCI0.GPP0.PEGP and 1 for \_SB.PCI0.GP17.XHC1). The path in Notify > > could actually be different and related to guest pci hierarchy (but > > associated to those host devices). Please also note that in this case > > we use GPE in order to "inform" guest about notification coming and we > > do not try to replicate host GPE scope description. > > > > Above we assumed that other devices (like \_SB.PCI0.GPP0/1) are not > > pass-through to the guest and notification are handled in host as > > usual (they are not binded to pci-vfio) and there is no need to > > generate AML code, allocate GPE for them and so on. > > I'm pretty lost here. The GPE code to read the notify value and relay > it to another AML object is relatively trivial, but that other AML > object needs to do something of some significance with that notify. It will do exactly the same thing that it would do on the host side. Since device is pass-through to the guest and runs native driver inside the guest, the notify handler will be registered by such driver (e.g. via acpi_install_notify_handler). Now when a guest handles the GPExx, it evaluates the ACPI handler for it: Method(_Exx..) [Method (_E00, 0, NotSerialized) for GPE00, Method (_E01, 0, NotSerialized) for GPE01 and so one]. Such a handler will call Notify()* which will trigger the notify handler registered by the guest in the same way as it would on the host. *https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/19_ASL_Reference/ACPI_Source_Language_Reference.html?highlight=notify#notify-notify-object-of-event > Minimally, it seems like the AML would need to establish the companion > relationship with the device so that a driver in the guest receives > that notify. VMM establishes that connection by allocating per vfio-pci device a unique GPE and generating AML for it. Once it is done and the VMM wants to replicate ACPI notification on specific device it just triggers specific, associated GPExx/SCI which in consequence evaluates guest ACPI _Exx method Scope (_GPE) { Method (_E00, 0, NotSerialized) { Local0 = \_SB.PCI0.GPP0.PEGP.NOTY Notify ( \_SB.PCI0.GPP0.PEGP, Local0) } } which is generated per device. > What does that look like, and can it be pre-seeded in the > AML regardless of whether the device is cold- or hot-plugged into the > VM? Some specific examples would be useful. It's not clear to me that > there isn't significant out-of-band effort required to understand and > replicate AML from host to guest to make this useful, so the generality > of this feature is hard to grasp. Thanks, The use case we are looking at right now, as Dominick wrote, are D-notifier events about the GPU power available to mobile discrete GPUs. Natively the firmware notifies the GPU driver to dynamically adjust the amount of power that can be used by the GPU. It does it by utilizing ACPI notification on a PCI device object and we want to replicate this behaviour on the guest side. Other common use case (although not related to pci-vfio) when ACPI notification are used is AC adapter status change: firmware can signalize ac adapter status change and as consequence evaluates: Notify (AC, 0x80) // Status Change which finally triggers registered ACPI notify handler (acpi_ac_notify) for ac adapter. There are many different notifications value: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#device-object-notifications and as described in linked chapter: notification values above 0x80 are device or hardware specific. The purpose of this patch is to replicate such notifications to pass-through devices, so the guest device driver can handle them in its specific ACPI notification handler way, nothing more than that. Thank you, Grzegorz
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index a5ab416cf476..92b8ed8d087c 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -10,6 +10,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include <linux/acpi.h> #include <linux/aperture.h> #include <linux/device.h> #include <linux/eventfd.h> @@ -2120,10 +2121,20 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev) } EXPORT_SYMBOL_GPL(vfio_pci_core_release_dev); +static void vfio_pci_core_acpi_notify(acpi_handle handle, u32 event, void *data) +{ + struct vfio_pci_core_device *vdev = (struct vfio_pci_core_device *)data; + struct device *dev = &vdev->pdev->dev; + + acpi_bus_generate_netlink_event("vfio_pci", dev_name(dev), event, 0); +} + int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) { + acpi_status status; struct pci_dev *pdev = vdev->pdev; struct device *dev = &pdev->dev; + struct acpi_device *adev = ACPI_COMPANION(&pdev->dev); int ret; /* Drivers must set the vfio_pci_core_device to their drvdata */ @@ -2201,8 +2212,24 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) ret = vfio_register_group_dev(&vdev->vdev); if (ret) goto out_power; + + if (!adev) { + pci_info(pdev, "No ACPI companion"); + return 0; + } + + status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, + vfio_pci_core_acpi_notify, (void *)vdev); + + if (ACPI_FAILURE(status)) { + pci_err(pdev, "Failed to install notify handler"); + goto out_group_register; + } + return 0; +out_group_register: + vfio_unregister_group_dev(&vdev->vdev); out_power: if (!disable_idle_d3) pm_runtime_get_noresume(dev); @@ -2216,6 +2243,12 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_register_device); void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev) { + struct acpi_device *adev = ACPI_COMPANION(&vdev->pdev->dev); + + if (adev) + acpi_remove_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, + vfio_pci_core_acpi_notify); + vfio_pci_core_sriov_configure(vdev, 0); vfio_unregister_group_dev(&vdev->vdev);