Message ID | 20200622140516.10830-1-ppaalanen@gmail.com |
---|---|
State | New, archived |
Series | [v4] drm/doc: device hot-unplug for userspace |
On Mon, Jun 22, 2020 at 10:06 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> From: Pekka Paalanen <pekka.paalanen@collabora.com>
>
> Set up the expectations on how hot-unplugging a DRM device should look like to
> userspace.
>
> Written by Daniel Vetter's request and largely based on his comments in IRC and
> from https://lists.freedesktop.org/archives/dri-devel/2020-May/265484.html .
>
> A related Wayland protocol change proposal is at
> https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/35
>
> Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Sean Paul <sean@poorly.run>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: Noralf Trønnes <noralf@tronnes.org>
> Cc: Ben Skeggs <skeggsb@gmail.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Harry Wentland <hwentlan@amd.com>
> Cc: Karol Herbst <kherbst@redhat.com>
>
> ---
>
> Harry and Christian, could one of you ack this on behalf of AMD
> drivers?
>
> Ben or Karol, could you ack on behalf of Nouveau?
>
> Noralf, would this work for the tiny drivers etc.?
>
> This is only about laying out plans for the future, not about what
> drivers do today. We'd just like to be sure the goals are reasonable and
> everyone is aware of the idea.
>
> Thanks,
> pq
>
> v4:
> - two typo fixes (Daniel)
>
> v3:
> - update ENODEV doc (Daniel)
> - clarify existing vs. new mmaps (Andrey)
> - split into KMS and render/cross sections (Andrey, Daniel)
> - open() returns ENXIO (open(2) man page)
> - ioctls may return ENODEV (Andrey, Daniel)
> - new wayland-protocols MR
>
> v2:
> - mmap reads/writes undefined (Daniel)
> - make render ioctl behaviour driver-specific (Daniel)
> - restructure the mmap paragraphs (Daniel)
> - chardev minor notes (Simon)
> - open behaviour (Daniel)
> - DRM leasing behaviour (Daniel)
> - added links
>
> Disclaimer: I am a userspace developer writing for other userspace developers.
> I took some liberties in defining what should happen without knowing what is
> actually possible or what existing drivers already implement.
> ---
>  Documentation/gpu/drm-uapi.rst | 114 ++++++++++++++++++++++++++++++++-
>  1 file changed, 113 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index 56fec6ed1ad8..b2585ea6a83e 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -1,3 +1,5 @@
> +.. Copyright 2020 DisplayLink (UK) Ltd.
> +
>  ===================
>  Userland interfaces
>  ===================
> @@ -162,6 +164,116 @@ other hand, a driver requires shared state between clients which is
>  visible to user-space and accessible beyond open-file boundaries, they
>  cannot support render nodes.
>
> +Device Hot-Unplug
> +=================
> +
> +.. note::
> +   The following is the plan. Implementation is not there yet
> +   (2020 May).
> +
> +Graphics devices (display and/or render) may be connected via USB (e.g.
> +display adapters or docking stations) or Thunderbolt (e.g. eGPU). An end
> +user is able to hot-unplug this kind of devices while they are being
> +used, and expects that the very least the machine does not crash. Any
> +damage from hot-unplugging a DRM device needs to be limited as much as
> +possible and userspace must be given the chance to handle it if it wants
> +to. Ideally, unplugging a DRM device still lets a desktop continue to
> +run, but that is going to need explicit support throughout the whole
> +graphics stack: from kernel and userspace drivers, through display
> +servers, via window system protocols, and in applications and libraries.
> +
> +Other scenarios that should lead to the same are: unrecoverable GPU
> +crash, PCI device disappearing off the bus, or forced unbind of a driver
> +from the physical device.
> +
> +In other words, from userspace perspective everything needs to keep on
> +working more or less, until userspace stops using the disappeared DRM
> +device and closes it completely. Userspace will learn of the device
> +disappearance from the device removed uevent, ioctls returning ENODEV
> +(or driver-specific ioctls returning driver-specific things), or open()
> +returning ENXIO.
> +
> +Only after userspace has closed all relevant DRM device and dmabuf file
> +descriptors and removed all mmaps, the DRM driver can tear down its
> +instance for the device that no longer exists. If the same physical
> +device somehow comes back in the mean time, it shall be a new DRM
> +device.
> +
> +Similar to PIDs, chardev minor numbers are not recycled immediately. A
> +new DRM device always picks the next free minor number compared to the
> +previous one allocated, and wraps around when minor numbers are
> +exhausted.
> +
> +The goal raises at least the following requirements for the kernel and
> +drivers.
> +
> +Requirements for KMS UAPI
> +-------------------------
> +
> +- KMS connectors must change their status to disconnected.
> +
> +- Legacy modesets and pageflips, and atomic commits, both real and
> +  TEST_ONLY, and any other ioctls either fail with ENODEV or fake
> +  success.
> +
> +- Pending non-blocking KMS operations deliver the DRM events userspace
> +  is expecting. This applies also to ioctls that faked success.
> +
> +- open() on a device node whose underlying device has disappeared will
> +  fail with ENXIO.
> +
> +- Attempting to create a DRM lease on a disappeared DRM device will
> +  fail with ENODEV. Existing DRM leases remain and work as listed
> +  above.
> +
> +Requirements for Render and Cross-Device UAPI
> +---------------------------------------------
> +
> +- All GPU jobs that can no longer run must have their fences
> +  force-signalled to avoid inflicting hangs on userspace.
> +  The associated error code is ENODEV.
> +
> +- Some userspace APIs already define what should happen when the device
> +  disappears (OpenGL, GL ES: `GL_KHR_robustness`_; `Vulkan`_:
> +  VK_ERROR_DEVICE_LOST; etc.). DRM drivers are free to implement this
> +  behaviour the way they see best, e.g. returning failures in
> +  driver-specific ioctls and handling those in userspace drivers, or
> +  rely on uevents, and so on.
> +
> +- dmabuf which point to memory that has disappeared will either fail to
> +  import with ENODEV or continue to be successfully imported if it would
> +  have succeeded before the disappearance. See also about memory maps
> +  below for already imported dmabufs.
> +
> +- Attempting to import a dmabuf to a disappeared device will either fail
> +  with ENODEV or succeed if it would have succeeded without the
> +  disappearance.
> +
> +- open() on a device node whose underlying device has disappeared will
> +  fail with ENXIO.
> +
> +.. _GL_KHR_robustness: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_robustness.txt
> +.. _Vulkan: https://www.khronos.org/vulkan/
> +
> +Requirements for Memory Maps
> +----------------------------
> +
> +Memory maps have further requirements that apply to both existing maps
> +and maps created after the device has disappeared. If the underlying
> +memory disappeared, the map is created or modified such that reads and

disappeared -> disappears

> +writes will still complete successfully but the result is undefined.
> +This applies to both userspace mmap()'d memory and memory pointed to by
> +dmabuf which might be mapped to other devices (cross-device dmabuf
> +imports).
> +
> +Raising SIGBUS is not an option, because userspace cannot realistically
> +handle it. Signal handlers are global, which makes them extremely
> +difficult to use correctly from libraries like those that Mesa produces.
> +Signal handlers are not composable, you can't have different handlers
> +for GPU1 and GPU2 from different vendors, and a third handler for
> +mmapped regular files. Threads cause additional pain with signal
> +handling as well.
> +
>  .. _drm_driver_ioctl:
>
>  IOCTL Support on Device Nodes
> @@ -199,7 +311,7 @@ EPERM/EACCES:
>          difference between EACCES and EPERM.
>
>  ENODEV:
> -        The device is not (yet) present or fully initialized.
> +        The device is not anymore present or is not yet fully initialized.

The ordering of this sentence should be fixed up like so:
The device is not present anymore or is not yet fully initialized.

With those fixed the patch is:
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

Alex

>
>  EOPNOTSUPP:
>          Feature (like PRIME, modesetting, GEM) is not supported by the driver.
> --
> 2.20.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

On 22.06.2020 16.05, Pekka Paalanen wrote:
> From: Pekka Paalanen <pekka.paalanen@collabora.com>
>
> Set up the expectations on how hot-unplugging a DRM device should look like to
> userspace.
>
> Written by Daniel Vetter's request and largely based on his comments in IRC and
> from https://lists.freedesktop.org/archives/dri-devel/2020-May/265484.html .
>
> A related Wayland protocol change proposal is at
> https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/35
>
> Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Sean Paul <sean@poorly.run>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: Noralf Trønnes <noralf@tronnes.org>
> Cc: Ben Skeggs <skeggsb@gmail.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Harry Wentland <hwentlan@amd.com>
> Cc: Karol Herbst <kherbst@redhat.com>
>
> ---
>
> Harry and Christian, could one of you ack this on behalf of AMD
> drivers?
>
> Ben or Karol, could you ack on behalf of Nouveau?
>
> Noralf, would this work for the tiny drivers etc.?
>

Looks good to me:

Acked-by: Noralf Trønnes <noralf@tronnes.org>

> This is only about laying out plans for the future, not about what
> drivers do today. We'd just like to be sure the goals are reasonable and
> everyone is aware of the idea.
>
> Thanks,
> pq

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 56fec6ed1ad8..b2585ea6a83e 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -1,3 +1,5 @@
+.. Copyright 2020 DisplayLink (UK) Ltd.
+
 ===================
 Userland interfaces
 ===================
@@ -162,6 +164,116 @@ other hand, a driver requires shared state between clients which is
 visible to user-space and accessible beyond open-file boundaries, they
 cannot support render nodes.
 
+Device Hot-Unplug
+=================
+
+.. note::
+   The following is the plan. Implementation is not there yet
+   (2020 May).
+
+Graphics devices (display and/or render) may be connected via USB (e.g.
+display adapters or docking stations) or Thunderbolt (e.g. eGPU). An end
+user is able to hot-unplug this kind of devices while they are being
+used, and expects that the very least the machine does not crash. Any
+damage from hot-unplugging a DRM device needs to be limited as much as
+possible and userspace must be given the chance to handle it if it wants
+to. Ideally, unplugging a DRM device still lets a desktop continue to
+run, but that is going to need explicit support throughout the whole
+graphics stack: from kernel and userspace drivers, through display
+servers, via window system protocols, and in applications and libraries.
+
+Other scenarios that should lead to the same are: unrecoverable GPU
+crash, PCI device disappearing off the bus, or forced unbind of a driver
+from the physical device.
+
+In other words, from userspace perspective everything needs to keep on
+working more or less, until userspace stops using the disappeared DRM
+device and closes it completely. Userspace will learn of the device
+disappearance from the device removed uevent, ioctls returning ENODEV
+(or driver-specific ioctls returning driver-specific things), or open()
+returning ENXIO.
+
+Only after userspace has closed all relevant DRM device and dmabuf file
+descriptors and removed all mmaps, the DRM driver can tear down its
+instance for the device that no longer exists. If the same physical
+device somehow comes back in the mean time, it shall be a new DRM
+device.
+
+Similar to PIDs, chardev minor numbers are not recycled immediately. A
+new DRM device always picks the next free minor number compared to the
+previous one allocated, and wraps around when minor numbers are
+exhausted.
+
+The goal raises at least the following requirements for the kernel and
+drivers.
+
+Requirements for KMS UAPI
+-------------------------
+
+- KMS connectors must change their status to disconnected.
+
+- Legacy modesets and pageflips, and atomic commits, both real and
+  TEST_ONLY, and any other ioctls either fail with ENODEV or fake
+  success.
+
+- Pending non-blocking KMS operations deliver the DRM events userspace
+  is expecting. This applies also to ioctls that faked success.
+
+- open() on a device node whose underlying device has disappeared will
+  fail with ENXIO.
+
+- Attempting to create a DRM lease on a disappeared DRM device will
+  fail with ENODEV. Existing DRM leases remain and work as listed
+  above.
+
+Requirements for Render and Cross-Device UAPI
+---------------------------------------------
+
+- All GPU jobs that can no longer run must have their fences
+  force-signalled to avoid inflicting hangs on userspace.
+  The associated error code is ENODEV.
+
+- Some userspace APIs already define what should happen when the device
+  disappears (OpenGL, GL ES: `GL_KHR_robustness`_; `Vulkan`_:
+  VK_ERROR_DEVICE_LOST; etc.). DRM drivers are free to implement this
+  behaviour the way they see best, e.g. returning failures in
+  driver-specific ioctls and handling those in userspace drivers, or
+  rely on uevents, and so on.
+
+- dmabuf which point to memory that has disappeared will either fail to
+  import with ENODEV or continue to be successfully imported if it would
+  have succeeded before the disappearance. See also about memory maps
+  below for already imported dmabufs.
+
+- Attempting to import a dmabuf to a disappeared device will either fail
+  with ENODEV or succeed if it would have succeeded without the
+  disappearance.
+
+- open() on a device node whose underlying device has disappeared will
+  fail with ENXIO.
+
+.. _GL_KHR_robustness: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_robustness.txt
+.. _Vulkan: https://www.khronos.org/vulkan/
+
+Requirements for Memory Maps
+----------------------------
+
+Memory maps have further requirements that apply to both existing maps
+and maps created after the device has disappeared. If the underlying
+memory disappeared, the map is created or modified such that reads and
+writes will still complete successfully but the result is undefined.
+This applies to both userspace mmap()'d memory and memory pointed to by
+dmabuf which might be mapped to other devices (cross-device dmabuf
+imports).
+
+Raising SIGBUS is not an option, because userspace cannot realistically
+handle it. Signal handlers are global, which makes them extremely
+difficult to use correctly from libraries like those that Mesa produces.
+Signal handlers are not composable, you can't have different handlers
+for GPU1 and GPU2 from different vendors, and a third handler for
+mmapped regular files. Threads cause additional pain with signal
+handling as well.
+
 .. _drm_driver_ioctl:
 
 IOCTL Support on Device Nodes
@@ -199,7 +311,7 @@ EPERM/EACCES:
         difference between EACCES and EPERM.
 
 ENODEV:
-        The device is not (yet) present or fully initialized.
+        The device is not anymore present or is not yet fully initialized.
 
 EOPNOTSUPP:
         Feature (like PRIME, modesetting, GEM) is not supported by the driver.
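
As a rough illustration of the error signalling the patch describes (open() failing with ENXIO, ioctls failing with ENODEV), a userspace client might classify errors along these lines. This is only a sketch: `device_is_gone` and `open_drm_node` are hypothetical helper names, not part of libdrm or any kernel interface, and the patch itself says the behaviour is a plan rather than what drivers do today. ENODEV from an ioctl can also mean the device is not yet fully initialized, so a real client would likely cross-check against the device-removed uevent.

```python
import errno
import os


def device_is_gone(call: str, err: int) -> bool:
    """Hypothetical helper: does this errno match the planned hot-unplug signalling?

    Assumes the plan in the patch: open() fails with ENXIO and ioctls fail
    with ENODEV once the underlying DRM device has disappeared.
    """
    if call == "open":
        return err == errno.ENXIO
    if call == "ioctl":
        return err == errno.ENODEV
    raise ValueError(f"unknown call type: {call!r}")


def open_drm_node(path: str):
    """Hypothetical helper: open a DRM device node.

    Returns a file descriptor, or None if the errno suggests the device
    has disappeared. Any other error is re-raised unchanged.
    """
    try:
        return os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError as exc:
        if device_is_gone("open", exc.errno):
            return None
        raise
```

Keeping the errno classification separate from the system call makes it easy to reuse for both the open() and ioctl paths, and to adjust if a driver reports disappearance through driver-specific ioctl results instead.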