Message ID | 20190527081741.14235-1-emil.l.velikov@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround | expand |
Am 27.05.19 um 10:17 schrieb Emil Velikov: > From: Emil Velikov <emil.velikov@collabora.com> > > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > render node. A seemingly deliberate design decision. > > Hence we can drop the DRM_AUTH all together (details in follow-up patch) > yet not all userspace checks if it's authenticated, but instead uses > uncommon assumptions. > > After days of digging through git log and testing, only a single (ab)use > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > assuming that failure implies lack of authentication. > > Affected versions are: > - the whole 18.2.x series, which is EOL > - the whole 18.3.x series, which is EOL > - the 19.0.x series, prior to 19.0.4 Well you could have saved your time, cause this is still a NAK. For the record: I strongly think that we don't want to expose any render functionality on the primary node. To even go a step further I would say that at least on AMD hardware we should completely disable DRI2 for one of the next generations. As a follow up I would then completely disallow the DRM authentication for amdgpu, so that the command submission interface on the primary node can only be used by the display server. Regards, Christian. > > Add a special quirk for that case, thus we can drop DRM_AUTH bits as > mentioned earlier. > > Since all the affected userspace is EOL, we also add a kconfig option > to disable this quirk. > > The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT > > Cc: Alex Deucher <alexander.deucher@amd.com> > Cc: Christian König <christian.koenig@amd.com> > Cc: amd-gfx@lists.freedesktop.org > Cc: David Airlie <airlied@linux.ie> > Cc: Daniel Vetter <daniel@ffwll.ch> > Signed-off-by: Emil Velikov <emil.velikov@collabora.com> > --- > drivers/gpu/drm/amd/amdgpu/Kconfig | 16 ++++++++++++++++ > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 +++++++++++- > drivers/gpu/drm/drm_ioctl.c | 5 +++++ > include/drm/drm_ioctl.h | 17 +++++++++++++++++ > 4 files changed, 49 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig > index 9221e5489069..da415f445187 100644 > --- a/drivers/gpu/drm/amd/amdgpu/Kconfig > +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig > @@ -40,6 +40,22 @@ config DRM_AMDGPU_GART_DEBUGFS > Selecting this option creates a debugfs file to inspect the mapped > pages. Uses more memory for housekeeping, enable only for debugging. > > +config DRM_AMDGPU_FORCE_AUTH > + bool "Force authentication check on AMDGPU_INFO ioctl" > + default y > + help > + There were some version of the Mesa RADV drivers, which relied on > + the ioctl failing, if the client is not authenticated. > + > + Namely, the following versions are affected: > + - the whole 18.2.x series, which is EOL > + - the whole 18.3.x series, which is EOL > + - the 19.0.x series, prior to 19.0.4 > + > + Modern distributions, should disable this. That will allow various > + other clients to work, that would otherwise require root privileges. > + > + > source "drivers/gpu/drm/amd/acp/Kconfig" > source "drivers/gpu/drm/amd/display/Kconfig" > source "drivers/gpu/drm/amd/amdkfd/Kconfig" > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index b17d0545728e..b8076929440b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -1214,7 +1214,17 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_MMAP, amdgpu_gem_mmap_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_WAIT_IDLE, amdgpu_gem_wait_idle_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_CS, amdgpu_cs_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > - DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > + /* The DRM_FORCE_AUTH is effectively a workaround for the RADV Mesa driver. > + * This is required for Mesa: > + * - the whole 18.2.x series, which is EOL > + * - the whole 18.3.x series, which is EOL > + * - the 19.0.x series, prior to 19.0.4 > + */ > + DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, > +#if defined(DRM_AMDGPU_FORCE_AUTH) > + DRM_FORCE_AUTH| > +#endif > + DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_CS, amdgpu_cs_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_FENCES, amdgpu_cs_wait_fences_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_METADATA, amdgpu_gem_metadata_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c > index 2263e3ddd822..9841c0076f02 100644 > --- a/drivers/gpu/drm/drm_ioctl.c > +++ b/drivers/gpu/drm/drm_ioctl.c > @@ -544,6 +544,11 @@ int drm_ioctl_permit(u32 flags, struct drm_file *file_priv) > drm_is_render_client(file_priv))) > return -EACCES; > > + /* FORCE_AUTH is only for authenticated or render client */ > + if (unlikely((flags & DRM_FORCE_AUTH) && !drm_is_render_client(file_priv) && > + !file_priv->authenticated)) > + return -EACCES; > + > return 0; > } > EXPORT_SYMBOL(drm_ioctl_permit); > diff --git a/include/drm/drm_ioctl.h b/include/drm/drm_ioctl.h > index fafb6f592c4b..6084ee32043d 100644 > --- a/include/drm/drm_ioctl.h > +++ b/include/drm/drm_ioctl.h > @@ -126,6 +126,23 @@ enum drm_ioctl_flags { > * not set DRM_AUTH because they do not require authentication. > */ > DRM_RENDER_ALLOW = BIT(5), > + /** > + * @DRM_FORCE_AUTH: > + * > + * Authentication of the primary node is mandatory. Regardless that the > + * user can usually circumvent that by using the render node with exact > + * same ioctl. > + * > + * Note: this is effectively a workaround for AMDGPU AMDGPU_INFO ioctl > + * and the RADV Mesa driver. This is required for Mesa: > + * - the whole 18.2.x series, which is EOL > + * - the whole 18.3.x series, which is EOL > + * - the 19.0.x series, prior to 19.0.4 > + * > + * Note: later patch will effectively drop the DRM_AUTH for ioctls > + * annotated as DRM_AUTH | DRM_RENDER_ALLOW. > + */ > + DRM_FORCE_AUTH = BIT(6), > }; > > /**
On 2019/05/27, Koenig, Christian wrote: > Am 27.05.19 um 10:17 schrieb Emil Velikov: > > From: Emil Velikov <emil.velikov@collabora.com> > > > > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > > render node. A seemingly deliberate design decision. > > > > Hence we can drop the DRM_AUTH all together (details in follow-up patch) > > yet not all userspace checks if it's authenticated, but instead uses > > uncommon assumptions. > > > > After days of digging through git log and testing, only a single (ab)use > > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > > assuming that failure implies lack of authentication. > > > > Affected versions are: > > - the whole 18.2.x series, which is EOL > > - the whole 18.3.x series, which is EOL > > - the 19.0.x series, prior to 19.0.4 > > Well you could have saved your time, cause this is still a NAK. > Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed a bug while I was there :-) > For the record: I strongly think that we don't want to expose any render > functionality on the primary node. > > To even go a step further I would say that at least on AMD hardware we > should completely disable DRI2 for one of the next generations. > It's doable and overall pretty neat idea. There is a consern that this approach may cause far more regressions that this series. We are talking about years worth of Mesa drivers (et al) that depend on render functionality exposed via the primary node. I'm OK with writing the patches, but it'll be up-to the AMDGPU team to follow-up with any regressions. Are you ok with that? Fwiw I could also move the FORCE_AUTH hack to core drm, if you prefer. Thanks Emil
Am 27.05.19 um 14:05 schrieb Emil Velikov: > On 2019/05/27, Koenig, Christian wrote: >> Am 27.05.19 um 10:17 schrieb Emil Velikov: >>> From: Emil Velikov <emil.velikov@collabora.com> >>> >>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the >>> render node. A seemingly deliberate design decision. >>> >>> Hence we can drop the DRM_AUTH all together (details in follow-up patch) >>> yet not all userspace checks if it's authenticated, but instead uses >>> uncommon assumptions. >>> >>> After days of digging through git log and testing, only a single (ab)use >>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and >>> assuming that failure implies lack of authentication. >>> >>> Affected versions are: >>> - the whole 18.2.x series, which is EOL >>> - the whole 18.3.x series, which is EOL >>> - the 19.0.x series, prior to 19.0.4 >> Well you could have saved your time, cause this is still a NAK. >> > Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed > a bug while I was there :-) Yeah, thanks for doing this. But we have done so much stuff with customers which can't be audited this way, that I still have a really bad feeling about this :/ >> For the record: I strongly think that we don't want to expose any render >> functionality on the primary node. >> >> To even go a step further I would say that at least on AMD hardware we >> should completely disable DRI2 for one of the next generations. >> > It's doable and overall pretty neat idea. > > There is a consern that this approach may cause far more regressions > that this series. We are talking about years worth of Mesa drivers (et > al) that depend on render functionality exposed via the primary node. Yeah, that's I'm perfectly aware of. It's the reason why I said we should only do it for new hardware generations. But at some point I think we should do this and get rid of authentication/flink/DRI2/.... > I'm OK with writing the patches, but it'll be up-to the AMDGPU team to > follow-up with any regressions. Are you ok with that? As long as we have a check like adev->family_id > WHATEVER_IS_THE_CURRENT_LATEST_UPSTREAM_HW, I think we are fine with that. If I understood Michel correctly xf86-video-amdgpu should work, but modeset might break and needs a patch. > Fwiw I could also move the FORCE_AUTH hack to core drm, if you prefer. Well, the hack is the least of my concerns. Thanks, Christian. > > Thanks > Emil
On 2019/05/27, Koenig, Christian wrote: > Am 27.05.19 um 14:05 schrieb Emil Velikov: > > On 2019/05/27, Koenig, Christian wrote: > >> Am 27.05.19 um 10:17 schrieb Emil Velikov: > >>> From: Emil Velikov <emil.velikov@collabora.com> > >>> > >>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > >>> render node. A seemingly deliberate design decision. > >>> > >>> Hence we can drop the DRM_AUTH all together (details in follow-up patch) > >>> yet not all userspace checks if it's authenticated, but instead uses > >>> uncommon assumptions. > >>> > >>> After days of digging through git log and testing, only a eingle (ab)use > >>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > >>> assuming that failure implies lack of authentication. > >>> > >>> Affected versions are: > >>> - the whole 18.2.x series, which is EOL > >>> - the whole 18.3.x series, which is EOL > >>> - the 19.0.x series, prior to 19.0.4 > >> Well you could have saved your time, cause this is still a NAK. > >> > > Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed > > a bug while I was there :-) > > Yeah, thanks for doing this. > > But we have done so much stuff with customers which can't be audited > this way, that I still have a really bad feeling about this :/ > Fair point, I've already thought about those. Example A: customer X, using closed source/private software Y. Ioctls A, B and C require the old behaviour - simply add FORCE_AUTH to the A B C and carry on happily. Example B: the team, as a whole thinks that this will be problematic for customer X - add FORCE_AUTH to all ioctls and carry on. I do see and understand why anyone can be hesitant about the series. IMHO the above examples, illustrate quite reasonable compromise between open-source and closed-source/private solutions. Don't you agree? > >> For the record: I strongly think that we don't want to expose any render > >> functionality on the primary node. > >> > >> To even go a step further I would say that at least on AMD hardware we > >> should completely disable DRI2 for one of the next generations. > >> > > It's doable and overall pretty neat idea. > > > > There is a consern that this approach may cause far more regressions > > that this series. We are talking about years worth of Mesa drivers (et > > al) that depend on render functionality exposed via the primary node. > > Yeah, that's I'm perfectly aware of. It's the reason why I said we > should only do it for new hardware generations. > > But at some point I think we should do this and get rid of > authentication/flink/DRI2/.... > Fwiw I share a similar sentiment. If you've looked through the series, this is effectively step 1, towards nuking DRM_AUTH :-) > > I'm OK with writing the patches, but it'll be up-to the AMDGPU team to > > follow-up with any regressions. Are you ok with that? > > As long as we have a check like adev->family_id > > WHATEVER_IS_THE_CURRENT_LATEST_UPSTREAM_HW, I think we are fine with that. > > If I understood Michel correctly xf86-video-amdgpu should work, but > modeset might break and needs a patch. > Unless I have concrete WHATEVER_IS_THE... I cannot do much here :-( Thanks Emil
On Mon, May 27, 2019 at 09:17:29AM +0100, Emil Velikov wrote: > From: Emil Velikov <emil.velikov@collabora.com> > > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > render node. A seemingly deliberate design decision. > > Hence we can drop the DRM_AUTH all together (details in follow-up patch) > yet not all userspace checks if it's authenticated, but instead uses > uncommon assumptions. > > After days of digging through git log and testing, only a single (ab)use > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > assuming that failure implies lack of authentication. Maybe correction here: Id does not care about authentication at all. It wants to figure out whether it has access to modeset resources, which is something entirely different, but happened to match in amdgpu's case. > Affected versions are: > - the whole 18.2.x series, which is EOL > - the whole 18.3.x series, which is EOL > - the 19.0.x series, prior to 19.0.4 Hm I think I'm blind, I'm not seeing the fix merged to mesa master. Still there on gitlab: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/vulkan/radv_device.c#L291 What am I missing? > Add a special quirk for that case, thus we can drop DRM_AUTH bits as > mentioned earlier. > > Since all the affected userspace is EOL, we also add a kconfig option > to disable this quirk. > > The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT > > Cc: Alex Deucher <alexander.deucher@amd.com> > Cc: Christian König <christian.koenig@amd.com> > Cc: amd-gfx@lists.freedesktop.org > Cc: David Airlie <airlied@linux.ie> > Cc: Daniel Vetter <daniel@ffwll.ch> > Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Aside from the nits I think reasonable approach. -Daniel > --- > drivers/gpu/drm/amd/amdgpu/Kconfig | 16 ++++++++++++++++ > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 +++++++++++- > drivers/gpu/drm/drm_ioctl.c | 5 +++++ > include/drm/drm_ioctl.h | 17 +++++++++++++++++ > 4 files changed, 49 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig > index 9221e5489069..da415f445187 100644 > --- a/drivers/gpu/drm/amd/amdgpu/Kconfig > +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig > @@ -40,6 +40,22 @@ config DRM_AMDGPU_GART_DEBUGFS > Selecting this option creates a debugfs file to inspect the mapped > pages. Uses more memory for housekeeping, enable only for debugging. > > +config DRM_AMDGPU_FORCE_AUTH > + bool "Force authentication check on AMDGPU_INFO ioctl" > + default y > + help > + There were some version of the Mesa RADV drivers, which relied on > + the ioctl failing, if the client is not authenticated. > + > + Namely, the following versions are affected: > + - the whole 18.2.x series, which is EOL > + - the whole 18.3.x series, which is EOL > + - the 19.0.x series, prior to 19.0.4 > + > + Modern distributions, should disable this. That will allow various > + other clients to work, that would otherwise require root privileges. > + > + > source "drivers/gpu/drm/amd/acp/Kconfig" > source "drivers/gpu/drm/amd/display/Kconfig" > source "drivers/gpu/drm/amd/amdkfd/Kconfig" > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index b17d0545728e..b8076929440b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -1214,7 +1214,17 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_MMAP, amdgpu_gem_mmap_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_WAIT_IDLE, amdgpu_gem_wait_idle_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_CS, amdgpu_cs_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > - DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > + /* The DRM_FORCE_AUTH is effectively a workaround for the RADV Mesa driver. > + * This is required for Mesa: > + * - the whole 18.2.x series, which is EOL > + * - the whole 18.3.x series, which is EOL > + * - the 19.0.x series, prior to 19.0.4 > + */ > + DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, > +#if defined(DRM_AMDGPU_FORCE_AUTH) > + DRM_FORCE_AUTH| > +#endif > + DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_CS, amdgpu_cs_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_FENCES, amdgpu_cs_wait_fences_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_METADATA, amdgpu_gem_metadata_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c > index 2263e3ddd822..9841c0076f02 100644 > --- a/drivers/gpu/drm/drm_ioctl.c > +++ b/drivers/gpu/drm/drm_ioctl.c > @@ -544,6 +544,11 @@ int drm_ioctl_permit(u32 flags, struct drm_file *file_priv) > drm_is_render_client(file_priv))) > return -EACCES; > > + /* FORCE_AUTH is only for authenticated or render client */ > + if (unlikely((flags & DRM_FORCE_AUTH) && !drm_is_render_client(file_priv) && > + !file_priv->authenticated)) > + return -EACCES; > + > return 0; > } > EXPORT_SYMBOL(drm_ioctl_permit); > diff --git a/include/drm/drm_ioctl.h b/include/drm/drm_ioctl.h > index fafb6f592c4b..6084ee32043d 100644 > --- a/include/drm/drm_ioctl.h > +++ b/include/drm/drm_ioctl.h > @@ -126,6 +126,23 @@ enum drm_ioctl_flags { > * not set DRM_AUTH because they do not require authentication. > */ > DRM_RENDER_ALLOW = BIT(5), > + /** > + * @DRM_FORCE_AUTH: > + * > + * Authentication of the primary node is mandatory. Regardless that the > + * user can usually circumvent that by using the render node with exact > + * same ioctl. > + * > + * Note: this is effectively a workaround for AMDGPU AMDGPU_INFO ioctl > + * and the RADV Mesa driver. This is required for Mesa: > + * - the whole 18.2.x series, which is EOL > + * - the whole 18.3.x series, which is EOL > + * - the 19.0.x series, prior to 19.0.4 > + * > + * Note: later patch will effectively drop the DRM_AUTH for ioctls > + * annotated as DRM_AUTH | DRM_RENDER_ALLOW. > + */ > + DRM_FORCE_AUTH = BIT(6), > }; > > /** > -- > 2.21.0 >
On Mon, May 27, 2019 at 10:47:39AM +0000, Koenig, Christian wrote: > Am 27.05.19 um 10:17 schrieb Emil Velikov: > > From: Emil Velikov <emil.velikov@collabora.com> > > > > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > > render node. A seemingly deliberate design decision. > > > > Hence we can drop the DRM_AUTH all together (details in follow-up patch) > > yet not all userspace checks if it's authenticated, but instead uses > > uncommon assumptions. > > > > After days of digging through git log and testing, only a single (ab)use > > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > > assuming that failure implies lack of authentication. > > > > Affected versions are: > > - the whole 18.2.x series, which is EOL > > - the whole 18.3.x series, which is EOL > > - the 19.0.x series, prior to 19.0.4 > > Well you could have saved your time, cause this is still a NAK. > > For the record: I strongly think that we don't want to expose any render > functionality on the primary node. > > To even go a step further I would say that at least on AMD hardware we > should completely disable DRI2 for one of the next generations. > > As a follow up I would then completely disallow the DRM authentication > for amdgpu, so that the command submission interface on the primary node > can only be used by the display server. So amdgpu is running in one direction, while everyone else is running in the other direction? Please note that your patch to change i915 landed already, so that ship is sailing (but we could ofc revert that back again). Imo really not a great idea if we do a amdgpu vs. everyone else split here. If we want to deprecate dri2/flink/rendering on primary nodes across the stack, then that should be a stack wide decision, not an amdgpu one. Same for the other one, i.e. this stuff here. -Daniel > > Regards, > Christian. > > > > > Add a special quirk for that case, thus we can drop DRM_AUTH bits as > > mentioned earlier. > > > > Since all the affected userspace is EOL, we also add a kconfig option > > to disable this quirk. > > > > The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT > > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > Cc: Christian König <christian.koenig@amd.com> > > Cc: amd-gfx@lists.freedesktop.org > > Cc: David Airlie <airlied@linux.ie> > > Cc: Daniel Vetter <daniel@ffwll.ch> > > Signed-off-by: Emil Velikov <emil.velikov@collabora.com> > > --- > > drivers/gpu/drm/amd/amdgpu/Kconfig | 16 ++++++++++++++++ > > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 +++++++++++- > > drivers/gpu/drm/drm_ioctl.c | 5 +++++ > > include/drm/drm_ioctl.h | 17 +++++++++++++++++ > > 4 files changed, 49 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig > > index 9221e5489069..da415f445187 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/Kconfig > > +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig > > @@ -40,6 +40,22 @@ config DRM_AMDGPU_GART_DEBUGFS > > Selecting this option creates a debugfs file to inspect the mapped > > pages. Uses more memory for housekeeping, enable only for debugging. > > > > +config DRM_AMDGPU_FORCE_AUTH > > + bool "Force authentication check on AMDGPU_INFO ioctl" > > + default y > > + help > > + There were some version of the Mesa RADV drivers, which relied on > > + the ioctl failing, if the client is not authenticated. > > + > > + Namely, the following versions are affected: > > + - the whole 18.2.x series, which is EOL > > + - the whole 18.3.x series, which is EOL > > + - the 19.0.x series, prior to 19.0.4 > > + > > + Modern distributions, should disable this. That will allow various > > + other clients to work, that would otherwise require root privileges. > > + > > + > > source "drivers/gpu/drm/amd/acp/Kconfig" > > source "drivers/gpu/drm/amd/display/Kconfig" > > source "drivers/gpu/drm/amd/amdkfd/Kconfig" > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > index b17d0545728e..b8076929440b 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > @@ -1214,7 +1214,17 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_MMAP, amdgpu_gem_mmap_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_WAIT_IDLE, amdgpu_gem_wait_idle_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > DRM_IOCTL_DEF_DRV(AMDGPU_CS, amdgpu_cs_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > - DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > + /* The DRM_FORCE_AUTH is effectively a workaround for the RADV Mesa driver. > > + * This is required for Mesa: > > + * - the whole 18.2.x series, which is EOL > > + * - the whole 18.3.x series, which is EOL > > + * - the 19.0.x series, prior to 19.0.4 > > + */ > > + DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, > > +#if defined(DRM_AMDGPU_FORCE_AUTH) > > + DRM_FORCE_AUTH| > > +#endif > > + DRM_AUTH|DRM_RENDER_ALLOW), > > DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_CS, amdgpu_cs_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_FENCES, amdgpu_cs_wait_fences_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_METADATA, amdgpu_gem_metadata_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c > > index 2263e3ddd822..9841c0076f02 100644 > > --- a/drivers/gpu/drm/drm_ioctl.c > > +++ b/drivers/gpu/drm/drm_ioctl.c > > @@ -544,6 +544,11 @@ int drm_ioctl_permit(u32 flags, struct drm_file *file_priv) > > drm_is_render_client(file_priv))) > > return -EACCES; > > > > + /* FORCE_AUTH is only for authenticated or render client */ > > + if (unlikely((flags & DRM_FORCE_AUTH) && !drm_is_render_client(file_priv) && > > + !file_priv->authenticated)) > > + return -EACCES; > > + > > return 0; > > } > > EXPORT_SYMBOL(drm_ioctl_permit); > > diff --git a/include/drm/drm_ioctl.h b/include/drm/drm_ioctl.h > > index fafb6f592c4b..6084ee32043d 100644 > > --- a/include/drm/drm_ioctl.h > > +++ b/include/drm/drm_ioctl.h > > @@ -126,6 +126,23 @@ enum drm_ioctl_flags { > > * not set DRM_AUTH because they do not require authentication. > > */ > > DRM_RENDER_ALLOW = BIT(5), > > + /** > > + * @DRM_FORCE_AUTH: > > + * > > + * Authentication of the primary node is mandatory. Regardless that the > > + * user can usually circumvent that by using the render node with exact > > + * same ioctl. > > + * > > + * Note: this is effectively a workaround for AMDGPU AMDGPU_INFO ioctl > > + * and the RADV Mesa driver. This is required for Mesa: > > + * - the whole 18.2.x series, which is EOL > > + * - the whole 18.3.x series, which is EOL > > + * - the 19.0.x series, prior to 19.0.4 > > + * > > + * Note: later patch will effectively drop the DRM_AUTH for ioctls > > + * annotated as DRM_AUTH | DRM_RENDER_ALLOW. > > + */ > > + DRM_FORCE_AUTH = BIT(6), > > }; > > > > /** >
On 2019/05/27, Daniel Vetter wrote: > On Mon, May 27, 2019 at 10:47:39AM +0000, Koenig, Christian wrote: > > Am 27.05.19 um 10:17 schrieb Emil Velikov: > > > From: Emil Velikov <emil.velikov@collabora.com> > > > > > > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > > > render node. A seemingly deliberate design decision. > > > > > > Hence we can drop the DRM_AUTH all together (details in follow-up patch) > > > yet not all userspace checks if it's authenticated, but instead uses > > > uncommon assumptions. > > > > > > After days of digging through git log and testing, only a single (ab)use > > > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > > > assuming that failure implies lack of authentication. > > > > > > Affected versions are: > > > - the whole 18.2.x series, which is EOL > > > - the whole 18.3.x series, which is EOL > > > - the 19.0.x series, prior to 19.0.4 > > > > Well you could have saved your time, cause this is still a NAK. > > > > For the record: I strongly think that we don't want to expose any render > > functionality on the primary node. > > > > To even go a step further I would say that at least on AMD hardware we > > should completely disable DRI2 for one of the next generations. > > > > As a follow up I would then completely disallow the DRM authentication > > for amdgpu, so that the command submission interface on the primary node > > can only be used by the display server. > > So amdgpu is running in one direction, while everyone else is running in > the other direction? Please note that your patch to change i915 landed > already, so that ship is sailing (but we could ofc revert that back > again). > > Imo really not a great idea if we do a amdgpu vs. everyone else split > here. If we want to deprecate dri2/flink/rendering on primary nodes across > the stack, then that should be a stack wide decision, not an amdgpu one. > Cannot agree more - I would love to see drivers stay consistent. Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-) Thanks Emil
On Mon, May 27, 2019 at 01:52:05PM +0100, Emil Velikov wrote: > On 2019/05/27, Koenig, Christian wrote: > > Am 27.05.19 um 14:05 schrieb Emil Velikov: > > > On 2019/05/27, Koenig, Christian wrote: > > >> Am 27.05.19 um 10:17 schrieb Emil Velikov: > > >>> From: Emil Velikov <emil.velikov@collabora.com> > > >>> > > >>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > > >>> render node. A seemingly deliberate design decision. > > >>> > > >>> Hence we can drop the DRM_AUTH all together (details in follow-up patch) > > >>> yet not all userspace checks if it's authenticated, but instead uses > > >>> uncommon assumptions. > > >>> > > >>> After days of digging through git log and testing, only a eingle (ab)use > > >>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > > >>> assuming that failure implies lack of authentication. > > >>> > > >>> Affected versions are: > > >>> - the whole 18.2.x series, which is EOL > > >>> - the whole 18.3.x series, which is EOL > > >>> - the 19.0.x series, prior to 19.0.4 > > >> Well you could have saved your time, cause this is still a NAK. > > >> > > > Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed > > > a bug while I was there :-) > > > > Yeah, thanks for doing this. > > > > But we have done so much stuff with customers which can't be audited > > this way, that I still have a really bad feeling about this :/ > > > Fair point, I've already thought about those. > > Example A: customer X, using closed source/private software Y. > Ioctls A, B and C require the old behaviour - simply add FORCE_AUTH to > the A B C and carry on happily. Hm, if the entire concern here is all the bazillion different versions of blobs shipped in the past few years, why can't we just have a revert of this in the amdgpu DKMS? Not like one more patch among the hundres will matter. I'd suspect that the overlap of people wanting to run the blobs and people who don't run the DKMS or similar is roughly 0. Always been the case here at Intel at least. If there's other stuff that we need to audit (like rocm or whatever), then we should look into those ofc. -Daniel > > Example B: the team, as a whole thinks that this will be problematic for > customer X - add FORCE_AUTH to all ioctls and carry on. > > I do see and understand why anyone can be hesitant about the series. > > IMHO the above examples, illustrate quite reasonable compromise between > open-source and closed-source/private solutions. > > Don't you agree? > > > >> For the record: I strongly think that we don't want to expose any render > > >> functionality on the primary node. > > >> > > >> To even go a step further I would say that at least on AMD hardware we > > >> should completely disable DRI2 for one of the next generations. > > >> > > > It's doable and overall pretty neat idea. > > > > > > There is a consern that this approach may cause far more regressions > > > that this series. We are talking about years worth of Mesa drivers (et > > > al) that depend on render functionality exposed via the primary node. > > > > Yeah, that's I'm perfectly aware of. It's the reason why I said we > > should only do it for new hardware generations. > > > > But at some point I think we should do this and get rid of > > authentication/flink/DRI2/.... > > > Fwiw I share a similar sentiment. If you've looked through the series, > this is effectively step 1, towards nuking DRM_AUTH :-) > > > > I'm OK with writing the patches, but it'll be up-to the AMDGPU team to > > > follow-up with any regressions. Are you ok with that? > > > > As long as we have a check like adev->family_id > > > WHATEVER_IS_THE_CURRENT_LATEST_UPSTREAM_HW, I think we are fine with that. > > > > If I understood Michel correctly xf86-video-amdgpu should work, but > > modeset might break and needs a patch. > > > Unless I have concrete WHATEVER_IS_THE... I cannot do much here :-( > > > Thanks > Emil
On Mon, May 27, 2019 at 3:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Mon, May 27, 2019 at 01:52:05PM +0100, Emil Velikov wrote: > > On 2019/05/27, Koenig, Christian wrote: > > > Am 27.05.19 um 14:05 schrieb Emil Velikov: > > > > On 2019/05/27, Koenig, Christian wrote: > > > >> Am 27.05.19 um 10:17 schrieb Emil Velikov: > > > >>> From: Emil Velikov <emil.velikov@collabora.com> > > > >>> > > > >>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > > > >>> render node. A seemingly deliberate design decision. > > > >>> > > > >>> Hence we can drop the DRM_AUTH all together (details in follow-up patch) > > > >>> yet not all userspace checks if it's authenticated, but instead uses > > > >>> uncommon assumptions. > > > >>> > > > >>> After days of digging through git log and testing, only a eingle (ab)use > > > >>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > > > >>> assuming that failure implies lack of authentication. > > > >>> > > > >>> Affected versions are: > > > >>> - the whole 18.2.x series, which is EOL > > > >>> - the whole 18.3.x series, which is EOL > > > >>> - the 19.0.x series, prior to 19.0.4 > > > >> Well you could have saved your time, cause this is still a NAK. > > > >> > > > > Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed > > > > a bug while I was there :-) > > > > > > Yeah, thanks for doing this. > > > > > > But we have done so much stuff with customers which can't be audited > > > this way, that I still have a really bad feeling about this :/ > > > > > Fair point, I've already thought about those. > > > > Example A: customer X, using closed source/private software Y. > > Ioctls A, B and C require the old behaviour - simply add FORCE_AUTH to > > the A B C and carry on happily. > > Hm, if the entire concern here is all the bazillion different versions of > blobs shipped in the past few years, why can't we just have a revert of > this in the amdgpu DKMS? Not like one more patch among the hundres will > matter. I'd suspect that the overlap of people wanting to run the blobs > and people who don't run the DKMS or similar is roughly 0. Always been the > case here at Intel at least. > > If there's other stuff that we need to audit (like rocm or whatever), then > we should look into those ofc. Also note that Emil's patch actually doesn't break anything, since default y. So you don't even need the revert (except maybe in 10 years or so when we throw that option out). -Daniel > > Example B: the team, as a whole thinks that this will be problematic for > > customer X - add FORCE_AUTH to all ioctls and carry on. > > > > I do see and understand why anyone can be hesitant about the series. > > > > IMHO the above examples, illustrate quite reasonable compromise between > > open-source and closed-source/private solutions. > > > > Don't you agree? > > > > > >> For the record: I strongly think that we don't want to expose any render > > > >> functionality on the primary node. > > > >> > > > >> To even go a step further I would say that at least on AMD hardware we > > > >> should completely disable DRI2 for one of the next generations. > > > >> > > > > It's doable and overall pretty neat idea. > > > > > > > > There is a consern that this approach may cause far more regressions > > > > that this series. We are talking about years worth of Mesa drivers (et > > > > al) that depend on render functionality exposed via the primary node. > > > > > > Yeah, that's I'm perfectly aware of. It's the reason why I said we > > > should only do it for new hardware generations. > > > > > > But at some point I think we should do this and get rid of > > > authentication/flink/DRI2/.... > > > > > Fwiw I share a similar sentiment. If you've looked through the series, > > this is effectively step 1, towards nuking DRM_AUTH :-) > > > > > > I'm OK with writing the patches, but it'll be up-to the AMDGPU team to > > > > follow-up with any regressions. Are you ok with that? > > > > > > As long as we have a check like adev->family_id > > > > WHATEVER_IS_THE_CURRENT_LATEST_UPSTREAM_HW, I think we are fine with that. > > > > > > If I understood Michel correctly xf86-video-amdgpu should work, but > > > modeset might break and needs a patch. > > > > > Unless I have concrete WHATEVER_IS_THE... I cannot do much here :-( > > > > > > Thanks > > Emil > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
Am 27.05.19 um 15:26 schrieb Emil Velikov: > On 2019/05/27, Daniel Vetter wrote: >> On Mon, May 27, 2019 at 10:47:39AM +0000, Koenig, Christian wrote: >>> Am 27.05.19 um 10:17 schrieb Emil Velikov: >>>> From: Emil Velikov <emil.velikov@collabora.com> >>>> >>>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the >>>> render node. A seemingly deliberate design decision. >>>> >>>> Hence we can drop the DRM_AUTH all together (details in follow-up patch) >>>> yet not all userspace checks if it's authenticated, but instead uses >>>> uncommon assumptions. >>>> >>>> After days of digging through git log and testing, only a single (ab)use >>>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and >>>> assuming that failure implies lack of authentication. >>>> >>>> Affected versions are: >>>> - the whole 18.2.x series, which is EOL >>>> - the whole 18.3.x series, which is EOL >>>> - the 19.0.x series, prior to 19.0.4 >>> Well you could have saved your time, cause this is still a NAK. >>> >>> For the record: I strongly think that we don't want to expose any render >>> functionality on the primary node. >>> >>> To even go a step further I would say that at least on AMD hardware we >>> should completely disable DRI2 for one of the next generations. >>> >>> As a follow up I would then completely disallow the DRM authentication >>> for amdgpu, so that the command submission interface on the primary node >>> can only be used by the display server. >> So amdgpu is running in one direction, while everyone else is running in >> the other direction? Please note that your patch to change i915 landed >> already, so that ship is sailing (but we could ofc revert that back >> again). >> >> Imo really not a great idea if we do a amdgpu vs. everyone else split >> here. If we want to deprecate dri2/flink/rendering on primary nodes across >> the stack, then that should be a stack wide decision, not an amdgpu one. >> > Cannot agree more - I would love to see drivers stay consistent. Yeah, completely agree to that. That's why I think we should not do this at all and just let Intel fix it's userspace bugs :P Anyway my concern is really that we should stop extending functionality on the primary node. E.g. the render node is for use by the clients and the primary node for mode setting and use by the display server only. Regards, Christian. > Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-) > > Thanks > Emil
On 2019/05/27, Daniel Vetter wrote: > On Mon, May 27, 2019 at 09:17:29AM +0100, Emil Velikov wrote: > > From: Emil Velikov <emil.velikov@collabora.com> > > > > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > > render node. A seemingly deliberate design decision. > > > > Hence we can drop the DRM_AUTH all together (details in follow-up patch) > > yet not all userspace checks if it's authenticated, but instead uses > > uncommon assumptions. > > > > After days of digging through git log and testing, only a single (ab)use > > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > > assuming that failure implies lack of authentication. > > Maybe correction here: Id does not care about authentication at all. It > wants to figure out whether it has access to modeset resources, which is > something entirely different, but happened to match in amdgpu's case. > It feels like we have conflated the two (perhaps due to historical reasons) and I'm not 100% sure which one the AMDGPU code inspects. How about: Hence we can drop the DRM_AUTH all together (details in follow-up patch) yet that cause a regression in some userspace, when it conflates the authentication status with access to modeset resources. After days of digging through git log and testing, only a single (ab)use was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl. > > Affected versions are: > > - the whole 18.2.x series, which is EOL > > - the whole 18.3.x series, which is EOL > > - the 19.0.x series, prior to 19.0.4 > > Hm I think I'm blind, I'm not seeing the fix merged to mesa master. Still > there on gitlab: > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/vulkan/radv_device.c#L291 > > What am I missing? > Opted for a simple, more generic solution see commit c962a78f18284e2971301bf68c9c60500ca398e4 This way, the same check is: - enforced where it's used - present for all Mesa Vulkan drivers Will include the sha + commit title for v2. > > Add a special quirk for that case, thus we can drop DRM_AUTH bits as > > mentioned earlier. > > > > Since all the affected userspace is EOL, we also add a kconfig option > > to disable this quirk. > > > > The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT > > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > Cc: Christian König <christian.koenig@amd.com> > > Cc: amd-gfx@lists.freedesktop.org > > Cc: David Airlie <airlied@linux.ie> > > Cc: Daniel Vetter <daniel@ffwll.ch> > > Signed-off-by: Emil Velikov <emil.velikov@collabora.com> > > Aside from the nits I think reasonable approach. Great, glad to hear. Now all we need is to bribe^Wconvince Christian. Thanks Emil
On Mon, May 27, 2019 at 3:42 PM Koenig, Christian <Christian.Koenig@amd.com> wrote: > > Am 27.05.19 um 15:26 schrieb Emil Velikov: > > On 2019/05/27, Daniel Vetter wrote: > >> On Mon, May 27, 2019 at 10:47:39AM +0000, Koenig, Christian wrote: > >>> Am 27.05.19 um 10:17 schrieb Emil Velikov: > >>>> From: Emil Velikov <emil.velikov@collabora.com> > >>>> > >>>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > >>>> render node. A seemingly deliberate design decision. > >>>> > >>>> Hence we can drop the DRM_AUTH all together (details in follow-up patch) > >>>> yet not all userspace checks if it's authenticated, but instead uses > >>>> uncommon assumptions. > >>>> > >>>> After days of digging through git log and testing, only a single (ab)use > >>>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > >>>> assuming that failure implies lack of authentication. > >>>> > >>>> Affected versions are: > >>>> - the whole 18.2.x series, which is EOL > >>>> - the whole 18.3.x series, which is EOL > >>>> - the 19.0.x series, prior to 19.0.4 > >>> Well you could have saved your time, cause this is still a NAK. > >>> > >>> For the record: I strongly think that we don't want to expose any render > >>> functionality on the primary node. > >>> > >>> To even go a step further I would say that at least on AMD hardware we > >>> should completely disable DRI2 for one of the next generations. > >>> > >>> As a follow up I would then completely disallow the DRM authentication > >>> for amdgpu, so that the command submission interface on the primary node > >>> can only be used by the display server. > >> So amdgpu is running in one direction, while everyone else is running in > >> the other direction? Please note that your patch to change i915 landed > >> already, so that ship is sailing (but we could ofc revert that back > >> again). > >> > >> Imo really not a great idea if we do a amdgpu vs. everyone else split > >> here. If we want to deprecate dri2/flink/rendering on primary nodes across > >> the stack, then that should be a stack wide decision, not an amdgpu one. > >> > > Cannot agree more - I would love to see drivers stay consistent. > > Yeah, completely agree to that. That's why I think we should not do this > at all and just let Intel fix it's userspace bugs :P So you're planning to submit that revert? You did jump the gun with sending out that patch after all too ... (aside from it got merged because of some other mixup with r-b tags and what not). -Daniel > Anyway my concern is really that we should stop extending functionality > on the primary node. > > E.g. the render node is for use by the clients and the primary node for > mode setting and use by the display server only. > > Regards, > Christian. > > > Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-) > > > > Thanks > > Emil >
On 2019/05/27, Koenig, Christian wrote: > Am 27.05.19 um 15:26 schrieb Emil Velikov: > > On 2019/05/27, Daniel Vetter wrote: > >> On Mon, May 27, 2019 at 10:47:39AM +0000, Koenig, Christian wrote: > >>> Am 27.05.19 um 10:17 schrieb Emil Velikov: > >>>> From: Emil Velikov <emil.velikov@collabora.com> > >>>> > >>>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > >>>> render node. A seemingly deliberate design decision. > >>>> > >>>> Hence we can drop the DRM_AUTH all together (details in follow-up patch) > >>>> yet not all userspace checks if it's authenticated, but instead uses > >>>> uncommon assumptions. > >>>> > >>>> After days of digging through git log and testing, only a single (ab)use > >>>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > >>>> assuming that failure implies lack of authentication. > >>>> > >>>> Affected versions are: > >>>> - the whole 18.2.x series, which is EOL > >>>> - the whole 18.3.x series, which is EOL > >>>> - the 19.0.x series, prior to 19.0.4 > >>> Well you could have saved your time, cause this is still a NAK. > >>> > >>> For the record: I strongly think that we don't want to expose any render > >>> functionality on the primary node. > >>> > >>> To even go a step further I would say that at least on AMD hardware we > >>> should completely disable DRI2 for one of the next generations. > >>> > >>> As a follow up I would then completely disallow the DRM authentication > >>> for amdgpu, so that the command submission interface on the primary node > >>> can only be used by the display server. > >> So amdgpu is running in one direction, while everyone else is running in > >> the other direction? Please note that your patch to change i915 landed > >> already, so that ship is sailing (but we could ofc revert that back > >> again). > >> > >> Imo really not a great idea if we do a amdgpu vs. everyone else split > >> here. If we want to deprecate dri2/flink/rendering on primary nodes across > >> the stack, then that should be a stack wide decision, not an amdgpu one. > >> > > Cannot agree more - I would love to see drivers stay consistent. > > Yeah, completely agree to that. That's why I think we should not do this > at all and just let Intel fix it's userspace bugs :P > Pretty sure I mentioned it before - might have been too subtle: The problem is _neither_ Intel nor libva specific. > Anyway my concern is really that we should stop extending functionality > on the primary node. > > E.g. the render node is for use by the clients and the primary node for > mode setting and use by the display server only. > Fully agreed. I'm not extending anything really. If anything - code is removed from drivers and core :-) Thanks Emil
Am 27.05.19 um 17:26 schrieb Daniel Vetter: > On Mon, May 27, 2019 at 3:42 PM Koenig, Christian > <Christian.Koenig@amd.com> wrote: >> Am 27.05.19 um 15:26 schrieb Emil Velikov: >>> On 2019/05/27, Daniel Vetter wrote: >>>> On Mon, May 27, 2019 at 10:47:39AM +0000, Koenig, Christian wrote: >>>>> Am 27.05.19 um 10:17 schrieb Emil Velikov: >>>>>> From: Emil Velikov <emil.velikov@collabora.com> >>>>>> >>>>>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the >>>>>> render node. A seemingly deliberate design decision. >>>>>> >>>>>> Hence we can drop the DRM_AUTH all together (details in follow-up patch) >>>>>> yet not all userspace checks if it's authenticated, but instead uses >>>>>> uncommon assumptions. >>>>>> >>>>>> After days of digging through git log and testing, only a single (ab)use >>>>>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and >>>>>> assuming that failure implies lack of authentication. >>>>>> >>>>>> Affected versions are: >>>>>> - the whole 18.2.x series, which is EOL >>>>>> - the whole 18.3.x series, which is EOL >>>>>> - the 19.0.x series, prior to 19.0.4 >>>>> Well you could have saved your time, cause this is still a NAK. >>>>> >>>>> For the record: I strongly think that we don't want to expose any render >>>>> functionality on the primary node. >>>>> >>>>> To even go a step further I would say that at least on AMD hardware we >>>>> should completely disable DRI2 for one of the next generations. >>>>> >>>>> As a follow up I would then completely disallow the DRM authentication >>>>> for amdgpu, so that the command submission interface on the primary node >>>>> can only be used by the display server. >>>> So amdgpu is running in one direction, while everyone else is running in >>>> the other direction? Please note that your patch to change i915 landed >>>> already, so that ship is sailing (but we could ofc revert that back >>>> again). >>>> >>>> Imo really not a great idea if we do a amdgpu vs. everyone else split >>>> here. If we want to deprecate dri2/flink/rendering on primary nodes across >>>> the stack, then that should be a stack wide decision, not an amdgpu one. >>>> >>> Cannot agree more - I would love to see drivers stay consistent. >> Yeah, completely agree to that. That's why I think we should not do this >> at all and just let Intel fix it's userspace bugs :P > So you're planning to submit that revert? You did jump the gun with > sending out that patch after all too ... (aside from it got merged > because of some other mixup with r-b tags and what not). Well already regretting submitting that. On the other hand what is the minimum IOCTLs we need to get working to fix the vainfo issue? Might be a good idea looking into reverting it partially, so that at least command submission and buffer allocation is still blocked. Christian. > -Daniel > >> Anyway my concern is really that we should stop extending functionality >> on the primary node. >> >> E.g. the render node is for use by the clients and the primary node for >> mode setting and use by the display server only. >> >> Regards, >> Christian. >> >>> Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-) >>> >>> Thanks >>> Emil >
On Tue, May 28, 2019 at 8:58 AM Koenig, Christian <Christian.Koenig@amd.com> wrote: > > Am 27.05.19 um 17:26 schrieb Daniel Vetter: > > On Mon, May 27, 2019 at 3:42 PM Koenig, Christian > > <Christian.Koenig@amd.com> wrote: > >> Am 27.05.19 um 15:26 schrieb Emil Velikov: > >>> On 2019/05/27, Daniel Vetter wrote: > >>>> On Mon, May 27, 2019 at 10:47:39AM +0000, Koenig, Christian wrote: > >>>>> Am 27.05.19 um 10:17 schrieb Emil Velikov: > >>>>>> From: Emil Velikov <emil.velikov@collabora.com> > >>>>>> > >>>>>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > >>>>>> render node. A seemingly deliberate design decision. > >>>>>> > >>>>>> Hence we can drop the DRM_AUTH all together (details in follow-up patch) > >>>>>> yet not all userspace checks if it's authenticated, but instead uses > >>>>>> uncommon assumptions. > >>>>>> > >>>>>> After days of digging through git log and testing, only a single (ab)use > >>>>>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > >>>>>> assuming that failure implies lack of authentication. > >>>>>> > >>>>>> Affected versions are: > >>>>>> - the whole 18.2.x series, which is EOL > >>>>>> - the whole 18.3.x series, which is EOL > >>>>>> - the 19.0.x series, prior to 19.0.4 > >>>>> Well you could have saved your time, cause this is still a NAK. > >>>>> > >>>>> For the record: I strongly think that we don't want to expose any render > >>>>> functionality on the primary node. > >>>>> > >>>>> To even go a step further I would say that at least on AMD hardware we > >>>>> should completely disable DRI2 for one of the next generations. > >>>>> > >>>>> As a follow up I would then completely disallow the DRM authentication > >>>>> for amdgpu, so that the command submission interface on the primary node > >>>>> can only be used by the display server. > >>>> So amdgpu is running in one direction, while everyone else is running in > >>>> the other direction? Please note that your patch to change i915 landed > >>>> already, so that ship is sailing (but we could ofc revert that back > >>>> again). > >>>> > >>>> Imo really not a great idea if we do a amdgpu vs. everyone else split > >>>> here. If we want to deprecate dri2/flink/rendering on primary nodes across > >>>> the stack, then that should be a stack wide decision, not an amdgpu one. > >>>> > >>> Cannot agree more - I would love to see drivers stay consistent. > >> Yeah, completely agree to that. That's why I think we should not do this > >> at all and just let Intel fix it's userspace bugs :P > > So you're planning to submit that revert? You did jump the gun with > > sending out that patch after all too ... (aside from it got merged > > because of some other mixup with r-b tags and what not). > > Well already regretting submitting that. On the other hand what is the > minimum IOCTLs we need to get working to fix the vainfo issue? We have a bit more time, it's only going to be in 5.3. But yeah if we don't bottom out on any of the options here I think revert it has to be. > Might be a good idea looking into reverting it partially, so that at > least command submission and buffer allocation is still blocked. I thought the issue is a lot more than vainfo, it's pretty much every hacked up compositor under the sun getting this wrong one way or another. Thinking about this some more, I also have no idea how you'd want to deprecate rendering on primary nodes in general. Apparently that breaks -modesetting already, and probably lots more compositors. And it looks like we're finally achieve the goal kms set out to 10 years ago, and new compositors are sprouting up all the time. I guess we could just break them all (on new hardware) and tell them to all suck it up. But I don't think that's a great option. And just deprecating this on amdgpu is going to be even harder, since then everywhere else it'll keep working, and it's just amdgpu.ko that looks broken. Aside: I'm not supporting Emil's idea here because it fixes any issues Intel has - Intel doesn't care. I support it because reality sucks, people get this render vs. primary vs. multi-gpu prime wrong all the time (that's also why we have hardcoded display+gpu pairs in mesa for the various soc combinations out there), and this looks like a pragmatic solution. It'd be nice if every compositor and everything else would perfectly support multi gpu and only use render nodes for rendering, and only primary nodes for display. But reality is that people hack on stuff until gears on screen and then move on to more interesting things (to them). So I don't think we'll ever win this :-/ -Daniel > Christian. > > > -Daniel > > > >> Anyway my concern is really that we should stop extending functionality > >> on the primary node. > >> > >> E.g. the render node is for use by the clients and the primary node for > >> mode setting and use by the display server only. > >> > >> Regards, > >> Christian. > >> > >>> Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-) > >>> > >>> Thanks > >>> Emil > > >
Am 28.05.19 um 09:38 schrieb Daniel Vetter: > [SNIP] >> Might be a good idea looking into reverting it partially, so that at >> least command submission and buffer allocation is still blocked. > I thought the issue is a lot more than vainfo, it's pretty much every > hacked up compositor under the sun getting this wrong one way or > another. Thinking about this some more, I also have no idea how you'd > want to deprecate rendering on primary nodes in general. Apparently > that breaks -modesetting already, and probably lots more compositors. > And it looks like we're finally achieve the goal kms set out to 10 > years ago, and new compositors are sprouting up all the time. I guess > we could just break them all (on new hardware) and tell them to all > suck it up. But I don't think that's a great option. And just > deprecating this on amdgpu is going to be even harder, since then > everywhere else it'll keep working, and it's just amdgpu.ko that looks > broken. > > Aside: I'm not supporting Emil's idea here because it fixes any issues > Intel has - Intel doesn't care. I support it because reality sucks, > people get this render vs. primary vs. multi-gpu prime wrong all the > time (that's also why we have hardcoded display+gpu pairs in mesa for > the various soc combinations out there), and this looks like a > pragmatic solution. It'd be nice if every compositor and everything > else would perfectly support multi gpu and only use render nodes for > rendering, and only primary nodes for display. But reality is that > people hack on stuff until gears on screen and then move on to more > interesting things (to them). So I don't think we'll ever win this :-/ Yeah, but this is a classic case of working around user space issues by making kernel changes instead of fixing user space. Having privileged (output control) and unprivileged (rendering control) functionality behind the same node is a mistake we have made a long time ago and render nodes finally seemed to be a way to fix that. I mean why are compositors using the primary node in the first place? Because they want to have access to privileged resources I think and in this case it is perfectly ok to do so. Now extending unprivileged access to the primary node actually sounds like a step into the wrong direction to me. I rather think that we should go down the route of completely dropping command submission and buffer allocation through the primary node for non master clients. And then as next step at some point drop support for authentication/flink. I mean we have done this with UMS as well and I don't see much other way to move forward and get rid of those ancient interface in the long term. Regards, Christian. > -Daniel
On Tue, May 28, 2019 at 10:03 AM Koenig, Christian <Christian.Koenig@amd.com> wrote: > > Am 28.05.19 um 09:38 schrieb Daniel Vetter: > > [SNIP] > >> Might be a good idea looking into reverting it partially, so that at > >> least command submission and buffer allocation is still blocked. > > I thought the issue is a lot more than vainfo, it's pretty much every > > hacked up compositor under the sun getting this wrong one way or > > another. Thinking about this some more, I also have no idea how you'd > > want to deprecate rendering on primary nodes in general. Apparently > > that breaks -modesetting already, and probably lots more compositors. > > And it looks like we're finally achieve the goal kms set out to 10 > > years ago, and new compositors are sprouting up all the time. I guess > > we could just break them all (on new hardware) and tell them to all > > suck it up. But I don't think that's a great option. And just > > deprecating this on amdgpu is going to be even harder, since then > > everywhere else it'll keep working, and it's just amdgpu.ko that looks > > broken. > > > > Aside: I'm not supporting Emil's idea here because it fixes any issues > > Intel has - Intel doesn't care. I support it because reality sucks, > > people get this render vs. primary vs. multi-gpu prime wrong all the > > time (that's also why we have hardcoded display+gpu pairs in mesa for > > the various soc combinations out there), and this looks like a > > pragmatic solution. It'd be nice if every compositor and everything > > else would perfectly support multi gpu and only use render nodes for > > rendering, and only primary nodes for display. But reality is that > > people hack on stuff until gears on screen and then move on to more > > interesting things (to them). So I don't think we'll ever win this :-/ > > Yeah, but this is a classic case of working around user space issues by > making kernel changes instead of fixing user space. > > Having privileged (output control) and unprivileged (rendering control) > functionality behind the same node is a mistake we have made a long time > ago and render nodes finally seemed to be a way to fix that. > > I mean why are compositors using the primary node in the first place? > Because they want to have access to privileged resources I think and in > this case it is perfectly ok to do so. > > Now extending unprivileged access to the primary node actually sounds > like a step into the wrong direction to me. > > I rather think that we should go down the route of completely dropping > command submission and buffer allocation through the primary node for > non master clients. And then as next step at some point drop support for > authentication/flink. > > I mean we have done this with UMS as well and I don't see much other way > to move forward and get rid of those ancient interface in the long term. Well kms had some really good benefits that drove quick adoption, like "suspend/resume actually has a chance of working" or "comes with buffer management so you can run multiple gears". The render node thing is a lot more niche use case (prime, better priv separation), plus "it's cleaner design". And the "cleaner design" part is something that empirically doesn't seem to matter :-/ Just two examples: - KHR_display/leases just iterated display resources on the fd needed for rendering (and iirc there was even a patch to expose that for render nodes too so it works with DRI3), because implementing protocols is too hard. Barely managed to stop that one before it happened. - Various video players use the vblank ioctl on directly to schedule frames, without telling the compositor. I discovered that when I wanted to limite the vblank ioctl to master clients only. Again, apparently too hard to use the existing extensions, or fix the bugs in there, or whatever. One userspace got fixed last year, but it'll probably get copypasted around forever :-/ So I don't think we'll ever manage to roll a clean split out, and best we can do is give in and just hand userspace what it wants. As much as that's misguided and unclean and all that. Maybe it'll result in a least fewer stuff getting run as root to hack around this, because fixing properly seems not to be on the table. The beauty of kms is that we've achieved the mission, everyone's writing their own thing. Which is also terrible, and I don't think it'll get better. -Daniel
On 2019/05/28, Daniel Vetter wrote: > On Tue, May 28, 2019 at 10:03 AM Koenig, Christian > <Christian.Koenig@amd.com> wrote: > > > > Am 28.05.19 um 09:38 schrieb Daniel Vetter: > > > [SNIP] > > >> Might be a good idea looking into reverting it partially, so that at > > >> least command submission and buffer allocation is still blocked. > > > I thought the issue is a lot more than vainfo, it's pretty much every > > > hacked up compositor under the sun getting this wrong one way or > > > another. Thinking about this some more, I also have no idea how you'd > > > want to deprecate rendering on primary nodes in general. Apparently > > > that breaks -modesetting already, and probably lots more compositors. > > > And it looks like we're finally achieve the goal kms set out to 10 > > > years ago, and new compositors are sprouting up all the time. I guess > > > we could just break them all (on new hardware) and tell them to all > > > suck it up. But I don't think that's a great option. And just > > > deprecating this on amdgpu is going to be even harder, since then > > > everywhere else it'll keep working, and it's just amdgpu.ko that looks > > > broken. > > > > > > Aside: I'm not supporting Emil's idea here because it fixes any issues > > > Intel has - Intel doesn't care. I support it because reality sucks, > > > people get this render vs. primary vs. multi-gpu prime wrong all the > > > time (that's also why we have hardcoded display+gpu pairs in mesa for > > > the various soc combinations out there), and this looks like a > > > pragmatic solution. It'd be nice if every compositor and everything > > > else would perfectly support multi gpu and only use render nodes for > > > rendering, and only primary nodes for display. But reality is that > > > people hack on stuff until gears on screen and then move on to more > > > interesting things (to them). So I don't think we'll ever win this :-/ > > > > Yeah, but this is a classic case of working around user space issues by > > making kernel changes instead of fixing user space. > > > > Having privileged (output control) and unprivileged (rendering control) > > functionality behind the same node is a mistake we have made a long time > > ago and render nodes finally seemed to be a way to fix that. > > > > I mean why are compositors using the primary node in the first place? > > Because they want to have access to privileged resources I think and in > > this case it is perfectly ok to do so. > > > > Now extending unprivileged access to the primary node actually sounds > > like a step into the wrong direction to me. > > > > I rather think that we should go down the route of completely dropping > > command submission and buffer allocation through the primary node for > > non master clients. And then as next step at some point drop support for > > authentication/flink. > > > > I mean we have done this with UMS as well and I don't see much other way > > to move forward and get rid of those ancient interface in the long term. > > Well kms had some really good benefits that drove quick adoption, like > "suspend/resume actually has a chance of working" or "comes with > buffer management so you can run multiple gears". > > The render node thing is a lot more niche use case (prime, better priv > separation), plus "it's cleaner design". And the "cleaner design" part > is something that empirically doesn't seem to matter :-/ Just two > examples: > - KHR_display/leases just iterated display resources on the fd needed > for rendering (and iirc there was even a patch to expose that for > render nodes too so it works with DRI3), because implementing > protocols is too hard. Barely managed to stop that one before it > happened. > - Various video players use the vblank ioctl on directly to schedule > frames, without telling the compositor. I discovered that when I > wanted to limite the vblank ioctl to master clients only. Again, > apparently too hard to use the existing extensions, or fix the bugs in > there, or whatever. One userspace got fixed last year, but it'll > probably get copypasted around forever :-/ > > So I don't think we'll ever manage to roll a clean split out, and best > we can do is give in and just hand userspace what it wants. As much as > that's misguided and unclean and all that. Maybe it'll result in a > least fewer stuff getting run as root to hack around this, because > fixing properly seems not to be on the table. > > The beauty of kms is that we've achieved the mission, everyone's > writing their own thing. Which is also terrible, and I don't think > it'll get better. With the risk of coming rude I will repeat my earlier comment: The problem is _neither_ Intel nor libva specific. That said, let's step back for a moment and consider: - the "block everything but KMS via the primary node" idea is great but orthogonal - the series does address issues that are vendor-agnostic - by default this series does _not_ cause any regression be that for new or old userspace - there are two trivial solutions, if the AMD team has concerns about closed-source/private stack depending on the old behaviour If they want I can even write the patches ;-) That said, the notable comments received so far are: - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from handle. I'm OK but this will change the return code - from EACCES to ENOSYS - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is planning to drop nearly all DRM_AUTH instances in their driver. Christian, as mentioned before - this series does _not_ add functionality to render nodes. It effectively paves a way towards removing DRM_AUTH. I understand the series may feel a bit dirty. Yet I would gladly address any technical concerns you have. Thanks Emil
Am 28.05.19 um 18:10 schrieb Emil Velikov: > On 2019/05/28, Daniel Vetter wrote: >> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian >> <Christian.Koenig@amd.com> wrote: >>> Am 28.05.19 um 09:38 schrieb Daniel Vetter: >>>> [SNIP] >>>>> Might be a good idea looking into reverting it partially, so that at >>>>> least command submission and buffer allocation is still blocked. >>>> I thought the issue is a lot more than vainfo, it's pretty much every >>>> hacked up compositor under the sun getting this wrong one way or >>>> another. Thinking about this some more, I also have no idea how you'd >>>> want to deprecate rendering on primary nodes in general. Apparently >>>> that breaks -modesetting already, and probably lots more compositors. >>>> And it looks like we're finally achieve the goal kms set out to 10 >>>> years ago, and new compositors are sprouting up all the time. I guess >>>> we could just break them all (on new hardware) and tell them to all >>>> suck it up. But I don't think that's a great option. And just >>>> deprecating this on amdgpu is going to be even harder, since then >>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks >>>> broken. >>>> >>>> Aside: I'm not supporting Emil's idea here because it fixes any issues >>>> Intel has - Intel doesn't care. I support it because reality sucks, >>>> people get this render vs. primary vs. multi-gpu prime wrong all the >>>> time (that's also why we have hardcoded display+gpu pairs in mesa for >>>> the various soc combinations out there), and this looks like a >>>> pragmatic solution. It'd be nice if every compositor and everything >>>> else would perfectly support multi gpu and only use render nodes for >>>> rendering, and only primary nodes for display. But reality is that >>>> people hack on stuff until gears on screen and then move on to more >>>> interesting things (to them). So I don't think we'll ever win this :-/ >>> Yeah, but this is a classic case of working around user space issues by >>> making kernel changes instead of fixing user space. >>> >>> Having privileged (output control) and unprivileged (rendering control) >>> functionality behind the same node is a mistake we have made a long time >>> ago and render nodes finally seemed to be a way to fix that. >>> >>> I mean why are compositors using the primary node in the first place? >>> Because they want to have access to privileged resources I think and in >>> this case it is perfectly ok to do so. >>> >>> Now extending unprivileged access to the primary node actually sounds >>> like a step into the wrong direction to me. >>> >>> I rather think that we should go down the route of completely dropping >>> command submission and buffer allocation through the primary node for >>> non master clients. And then as next step at some point drop support for >>> authentication/flink. >>> >>> I mean we have done this with UMS as well and I don't see much other way >>> to move forward and get rid of those ancient interface in the long term. >> Well kms had some really good benefits that drove quick adoption, like >> "suspend/resume actually has a chance of working" or "comes with >> buffer management so you can run multiple gears". >> >> The render node thing is a lot more niche use case (prime, better priv >> separation), plus "it's cleaner design". And the "cleaner design" part >> is something that empirically doesn't seem to matter :-/ Just two >> examples: >> - KHR_display/leases just iterated display resources on the fd needed >> for rendering (and iirc there was even a patch to expose that for >> render nodes too so it works with DRI3), because implementing >> protocols is too hard. Barely managed to stop that one before it >> happened. >> - Various video players use the vblank ioctl on directly to schedule >> frames, without telling the compositor. I discovered that when I >> wanted to limite the vblank ioctl to master clients only. Again, >> apparently too hard to use the existing extensions, or fix the bugs in >> there, or whatever. One userspace got fixed last year, but it'll >> probably get copypasted around forever :-/ >> >> So I don't think we'll ever manage to roll a clean split out, and best >> we can do is give in and just hand userspace what it wants. As much as >> that's misguided and unclean and all that. Maybe it'll result in a >> least fewer stuff getting run as root to hack around this, because >> fixing properly seems not to be on the table. >> >> The beauty of kms is that we've achieved the mission, everyone's >> writing their own thing. Which is also terrible, and I don't think >> it'll get better. > > With the risk of coming rude I will repeat my earlier comment: > > The problem is _neither_ Intel nor libva specific. > > > > That said, let's step back for a moment and consider: > > - the "block everything but KMS via the primary node" idea is great but > orthogonal > > - the series does address issues that are vendor-agnostic > > - by default this series does _not_ cause any regression be that for > new or old userspace > > - there are two trivial solutions, if the AMD team has concerns about > closed-source/private stack depending on the old behaviour > If they want I can even write the patches ;-) > > > That said, the notable comments received so far are: > - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from > handle. I'm OK but this will change the return code - from EACCES to > ENOSYS > > - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is > planning to drop nearly all DRM_AUTH instances in their driver. > > > Christian, as mentioned before - this series does _not_ add > functionality to render nodes. It effectively paves a way towards > removing DRM_AUTH. But it adds functionality to the primary node. > I understand the series may feel a bit dirty. Yet I would gladly address > any technical concerns you have. Well putting compatibility issues aside my concern is that this is simply a bad design decision which we can't revert later on. Because of this I will still reject any changes trying to remove DRM_AUTH from AMD drivers. If that means that AMD drivers will have a behavior difference to other drivers I think we are going to live with that. Regards, Christian. > > Thanks > Emil
On 2019/05/28, Koenig, Christian wrote: > Am 28.05.19 um 18:10 schrieb Emil Velikov: > > On 2019/05/28, Daniel Vetter wrote: > >> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian > >> <Christian.Koenig@amd.com> wrote: > >>> Am 28.05.19 um 09:38 schrieb Daniel Vetter: > >>>> [SNIP] > >>>>> Might be a good idea looking into reverting it partially, so that at > >>>>> least command submission and buffer allocation is still blocked. > >>>> I thought the issue is a lot more than vainfo, it's pretty much every > >>>> hacked up compositor under the sun getting this wrong one way or > >>>> another. Thinking about this some more, I also have no idea how you'd > >>>> want to deprecate rendering on primary nodes in general. Apparently > >>>> that breaks -modesetting already, and probably lots more compositors. > >>>> And it looks like we're finally achieve the goal kms set out to 10 > >>>> years ago, and new compositors are sprouting up all the time. I guess > >>>> we could just break them all (on new hardware) and tell them to all > >>>> suck it up. But I don't think that's a great option. And just > >>>> deprecating this on amdgpu is going to be even harder, since then > >>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks > >>>> broken. > >>>> > >>>> Aside: I'm not supporting Emil's idea here because it fixes any issues > >>>> Intel has - Intel doesn't care. I support it because reality sucks, > >>>> people get this render vs. primary vs. multi-gpu prime wrong all the > >>>> time (that's also why we have hardcoded display+gpu pairs in mesa for > >>>> the various soc combinations out there), and this looks like a > >>>> pragmatic solution. It'd be nice if every compositor and everything > >>>> else would perfectly support multi gpu and only use render nodes for > >>>> rendering, and only primary nodes for display. But reality is that > >>>> people hack on stuff until gears on screen and then move on to more > >>>> interesting things (to them). So I don't think we'll ever win this :-/ > >>> Yeah, but this is a classic case of working around user space issues by > >>> making kernel changes instead of fixing user space. > >>> > >>> Having privileged (output control) and unprivileged (rendering control) > >>> functionality behind the same node is a mistake we have made a long time > >>> ago and render nodes finally seemed to be a way to fix that. > >>> > >>> I mean why are compositors using the primary node in the first place? > >>> Because they want to have access to privileged resources I think and in > >>> this case it is perfectly ok to do so. > >>> > >>> Now extending unprivileged access to the primary node actually sounds > >>> like a step into the wrong direction to me. > >>> > >>> I rather think that we should go down the route of completely dropping > >>> command submission and buffer allocation through the primary node for > >>> non master clients. And then as next step at some point drop support for > >>> authentication/flink. > >>> > >>> I mean we have done this with UMS as well and I don't see much other way > >>> to move forward and get rid of those ancient interface in the long term. > >> Well kms had some really good benefits that drove quick adoption, like > >> "suspend/resume actually has a chance of working" or "comes with > >> buffer management so you can run multiple gears". > >> > >> The render node thing is a lot more niche use case (prime, better priv > >> separation), plus "it's cleaner design". And the "cleaner design" part > >> is something that empirically doesn't seem to matter :-/ Just two > >> examples: > >> - KHR_display/leases just iterated display resources on the fd needed > >> for rendering (and iirc there was even a patch to expose that for > >> render nodes too so it works with DRI3), because implementing > >> protocols is too hard. Barely managed to stop that one before it > >> happened. > >> - Various video players use the vblank ioctl on directly to schedule > >> frames, without telling the compositor. I discovered that when I > >> wanted to limite the vblank ioctl to master clients only. Again, > >> apparently too hard to use the existing extensions, or fix the bugs in > >> there, or whatever. One userspace got fixed last year, but it'll > >> probably get copypasted around forever :-/ > >> > >> So I don't think we'll ever manage to roll a clean split out, and best > >> we can do is give in and just hand userspace what it wants. As much as > >> that's misguided and unclean and all that. Maybe it'll result in a > >> least fewer stuff getting run as root to hack around this, because > >> fixing properly seems not to be on the table. > >> > >> The beauty of kms is that we've achieved the mission, everyone's > >> writing their own thing. Which is also terrible, and I don't think > >> it'll get better. > > > > With the risk of coming rude I will repeat my earlier comment: > > > > The problem is _neither_ Intel nor libva specific. > > > > > > > > That said, let's step back for a moment and consider: > > > > - the "block everything but KMS via the primary node" idea is great but > > orthogonal > > > > - the series does address issues that are vendor-agnostic > > > > - by default this series does _not_ cause any regression be that for > > new or old userspace > > > > - there are two trivial solutions, if the AMD team has concerns about > > closed-source/private stack depending on the old behaviour > > If they want I can even write the patches ;-) > > > > > > That said, the notable comments received so far are: > > - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from > > handle. I'm OK but this will change the return code - from EACCES to > > ENOSYS > > > > - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is > > planning to drop nearly all DRM_AUTH instances in their driver. > > > > > > Christian, as mentioned before - this series does _not_ add > > functionality to render nodes. It effectively paves a way towards > > removing DRM_AUTH. > > But it adds functionality to the primary node. > Behaviour is adjusted - functionality was there since day 1. > > I understand the series may feel a bit dirty. Yet I would gladly address > > any technical concerns you have. > > Well putting compatibility issues aside my concern is that this is > simply a bad design decision which we can't revert later on. > As sad above - any concerns (theoretical or actual regressions) can be trivially fixed _without_ reverting any of this. I am more than happy to step up and address any regressions in timely manner. As a reminder without this series, some of your customers are forced to run their applications as root. Thanks Emil
On Wed, 29 May 2019 at 02:47, Emil Velikov <emil.l.velikov@gmail.com> wrote: > > On 2019/05/28, Koenig, Christian wrote: > > Am 28.05.19 um 18:10 schrieb Emil Velikov: > > > On 2019/05/28, Daniel Vetter wrote: > > >> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian > > >> <Christian.Koenig@amd.com> wrote: > > >>> Am 28.05.19 um 09:38 schrieb Daniel Vetter: > > >>>> [SNIP] > > >>>>> Might be a good idea looking into reverting it partially, so that at > > >>>>> least command submission and buffer allocation is still blocked. > > >>>> I thought the issue is a lot more than vainfo, it's pretty much every > > >>>> hacked up compositor under the sun getting this wrong one way or > > >>>> another. Thinking about this some more, I also have no idea how you'd > > >>>> want to deprecate rendering on primary nodes in general. Apparently > > >>>> that breaks -modesetting already, and probably lots more compositors. > > >>>> And it looks like we're finally achieve the goal kms set out to 10 > > >>>> years ago, and new compositors are sprouting up all the time. I guess > > >>>> we could just break them all (on new hardware) and tell them to all > > >>>> suck it up. But I don't think that's a great option. And just > > >>>> deprecating this on amdgpu is going to be even harder, since then > > >>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks > > >>>> broken. > > >>>> > > >>>> Aside: I'm not supporting Emil's idea here because it fixes any issues > > >>>> Intel has - Intel doesn't care. I support it because reality sucks, > > >>>> people get this render vs. primary vs. multi-gpu prime wrong all the > > >>>> time (that's also why we have hardcoded display+gpu pairs in mesa for > > >>>> the various soc combinations out there), and this looks like a > > >>>> pragmatic solution. It'd be nice if every compositor and everything > > >>>> else would perfectly support multi gpu and only use render nodes for > > >>>> rendering, and only primary nodes for display. But reality is that > > >>>> people hack on stuff until gears on screen and then move on to more > > >>>> interesting things (to them). So I don't think we'll ever win this :-/ > > >>> Yeah, but this is a classic case of working around user space issues by > > >>> making kernel changes instead of fixing user space. > > >>> > > >>> Having privileged (output control) and unprivileged (rendering control) > > >>> functionality behind the same node is a mistake we have made a long time > > >>> ago and render nodes finally seemed to be a way to fix that. > > >>> > > >>> I mean why are compositors using the primary node in the first place? > > >>> Because they want to have access to privileged resources I think and in > > >>> this case it is perfectly ok to do so. > > >>> > > >>> Now extending unprivileged access to the primary node actually sounds > > >>> like a step into the wrong direction to me. > > >>> > > >>> I rather think that we should go down the route of completely dropping > > >>> command submission and buffer allocation through the primary node for > > >>> non master clients. And then as next step at some point drop support for > > >>> authentication/flink. > > >>> > > >>> I mean we have done this with UMS as well and I don't see much other way > > >>> to move forward and get rid of those ancient interface in the long term. > > >> Well kms had some really good benefits that drove quick adoption, like > > >> "suspend/resume actually has a chance of working" or "comes with > > >> buffer management so you can run multiple gears". > > >> > > >> The render node thing is a lot more niche use case (prime, better priv > > >> separation), plus "it's cleaner design". And the "cleaner design" part > > >> is something that empirically doesn't seem to matter :-/ Just two > > >> examples: > > >> - KHR_display/leases just iterated display resources on the fd needed > > >> for rendering (and iirc there was even a patch to expose that for > > >> render nodes too so it works with DRI3), because implementing > > >> protocols is too hard. Barely managed to stop that one before it > > >> happened. > > >> - Various video players use the vblank ioctl on directly to schedule > > >> frames, without telling the compositor. I discovered that when I > > >> wanted to limite the vblank ioctl to master clients only. Again, > > >> apparently too hard to use the existing extensions, or fix the bugs in > > >> there, or whatever. One userspace got fixed last year, but it'll > > >> probably get copypasted around forever :-/ > > >> > > >> So I don't think we'll ever manage to roll a clean split out, and best > > >> we can do is give in and just hand userspace what it wants. As much as > > >> that's misguided and unclean and all that. Maybe it'll result in a > > >> least fewer stuff getting run as root to hack around this, because > > >> fixing properly seems not to be on the table. > > >> > > >> The beauty of kms is that we've achieved the mission, everyone's > > >> writing their own thing. Which is also terrible, and I don't think > > >> it'll get better. > > > > > > With the risk of coming rude I will repeat my earlier comment: > > > > > > The problem is _neither_ Intel nor libva specific. > > > > > > > > > > > > That said, let's step back for a moment and consider: > > > > > > - the "block everything but KMS via the primary node" idea is great but > > > orthogonal > > > > > > - the series does address issues that are vendor-agnostic > > > > > > - by default this series does _not_ cause any regression be that for > > > new or old userspace > > > > > > - there are two trivial solutions, if the AMD team has concerns about > > > closed-source/private stack depending on the old behaviour > > > If they want I can even write the patches ;-) > > > > > > > > > That said, the notable comments received so far are: > > > - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from > > > handle. I'm OK but this will change the return code - from EACCES to > > > ENOSYS > > > > > > - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is > > > planning to drop nearly all DRM_AUTH instances in their driver. > > > > > > > > > Christian, as mentioned before - this series does _not_ add > > > functionality to render nodes. It effectively paves a way towards > > > removing DRM_AUTH. > > > > But it adds functionality to the primary node. > > > Behaviour is adjusted - functionality was there since day 1. > > > > I understand the series may feel a bit dirty. Yet I would gladly address > > > any technical concerns you have. > > > > Well putting compatibility issues aside my concern is that this is > > simply a bad design decision which we can't revert later on. > > > As sad above - any concerns (theoretical or actual regressions) can be > trivially fixed _without_ reverting any of this. > > I am more than happy to step up and address any regressions in timely > manner. > > > As a reminder without this series, some of your customers are forced to > run their applications as root. I'm torn here on whether this is worth it. Have we got more use cases to justify it? I'm wary of opening this up just because we can. Dave.
On 2019/05/29, Dave Airlie wrote: > On Wed, 29 May 2019 at 02:47, Emil Velikov <emil.l.velikov@gmail.com> wrote: > > > > On 2019/05/28, Koenig, Christian wrote: > > > Am 28.05.19 um 18:10 schrieb Emil Velikov: > > > > On 2019/05/28, Daniel Vetter wrote: > > > >> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian > > > >> <Christian.Koenig@amd.com> wrote: > > > >>> Am 28.05.19 um 09:38 schrieb Daniel Vetter: > > > >>>> [SNIP] > > > >>>>> Might be a good idea looking into reverting it partially, so that at > > > >>>>> least command submission and buffer allocation is still blocked. > > > >>>> I thought the issue is a lot more than vainfo, it's pretty much every > > > >>>> hacked up compositor under the sun getting this wrong one way or > > > >>>> another. Thinking about this some more, I also have no idea how you'd > > > >>>> want to deprecate rendering on primary nodes in general. Apparently > > > >>>> that breaks -modesetting already, and probably lots more compositors. > > > >>>> And it looks like we're finally achieve the goal kms set out to 10 > > > >>>> years ago, and new compositors are sprouting up all the time. I guess > > > >>>> we could just break them all (on new hardware) and tell them to all > > > >>>> suck it up. But I don't think that's a great option. And just > > > >>>> deprecating this on amdgpu is going to be even harder, since then > > > >>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks > > > >>>> broken. > > > >>>> > > > >>>> Aside: I'm not supporting Emil's idea here because it fixes any issues > > > >>>> Intel has - Intel doesn't care. I support it because reality sucks, > > > >>>> people get this render vs. primary vs. multi-gpu prime wrong all the > > > >>>> time (that's also why we have hardcoded display+gpu pairs in mesa for > > > >>>> the various soc combinations out there), and this looks like a > > > >>>> pragmatic solution. It'd be nice if every compositor and everything > > > >>>> else would perfectly support multi gpu and only use render nodes for > > > >>>> rendering, and only primary nodes for display. But reality is that > > > >>>> people hack on stuff until gears on screen and then move on to more > > > >>>> interesting things (to them). So I don't think we'll ever win this :-/ > > > >>> Yeah, but this is a classic case of working around user space issues by > > > >>> making kernel changes instead of fixing user space. > > > >>> > > > >>> Having privileged (output control) and unprivileged (rendering control) > > > >>> functionality behind the same node is a mistake we have made a long time > > > >>> ago and render nodes finally seemed to be a way to fix that. > > > >>> > > > >>> I mean why are compositors using the primary node in the first place? > > > >>> Because they want to have access to privileged resources I think and in > > > >>> this case it is perfectly ok to do so. > > > >>> > > > >>> Now extending unprivileged access to the primary node actually sounds > > > >>> like a step into the wrong direction to me. > > > >>> > > > >>> I rather think that we should go down the route of completely dropping > > > >>> command submission and buffer allocation through the primary node for > > > >>> non master clients. And then as next step at some point drop support for > > > >>> authentication/flink. > > > >>> > > > >>> I mean we have done this with UMS as well and I don't see much other way > > > >>> to move forward and get rid of those ancient interface in the long term. > > > >> Well kms had some really good benefits that drove quick adoption, like > > > >> "suspend/resume actually has a chance of working" or "comes with > > > >> buffer management so you can run multiple gears". > > > >> > > > >> The render node thing is a lot more niche use case (prime, better priv > > > >> separation), plus "it's cleaner design". And the "cleaner design" part > > > >> is something that empirically doesn't seem to matter :-/ Just two > > > >> examples: > > > >> - KHR_display/leases just iterated display resources on the fd needed > > > >> for rendering (and iirc there was even a patch to expose that for > > > >> render nodes too so it works with DRI3), because implementing > > > >> protocols is too hard. Barely managed to stop that one before it > > > >> happened. > > > >> - Various video players use the vblank ioctl on directly to schedule > > > >> frames, without telling the compositor. I discovered that when I > > > >> wanted to limite the vblank ioctl to master clients only. Again, > > > >> apparently too hard to use the existing extensions, or fix the bugs in > > > >> there, or whatever. One userspace got fixed last year, but it'll > > > >> probably get copypasted around forever :-/ > > > >> > > > >> So I don't think we'll ever manage to roll a clean split out, and best > > > >> we can do is give in and just hand userspace what it wants. As much as > > > >> that's misguided and unclean and all that. Maybe it'll result in a > > > >> least fewer stuff getting run as root to hack around this, because > > > >> fixing properly seems not to be on the table. > > > >> > > > >> The beauty of kms is that we've achieved the mission, everyone's > > > >> writing their own thing. Which is also terrible, and I don't think > > > >> it'll get better. > > > > > > > > With the risk of coming rude I will repeat my earlier comment: > > > > > > > > The problem is _neither_ Intel nor libva specific. > > > > > > > > > > > > > > > > That said, let's step back for a moment and consider: > > > > > > > > - the "block everything but KMS via the primary node" idea is great but > > > > orthogonal > > > > > > > > - the series does address issues that are vendor-agnostic > > > > > > > > - by default this series does _not_ cause any regression be that for > > > > new or old userspace > > > > > > > > - there are two trivial solutions, if the AMD team has concerns about > > > > closed-source/private stack depending on the old behaviour > > > > If they want I can even write the patches ;-) > > > > > > > > > > > > That said, the notable comments received so far are: > > > > - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from > > > > handle. I'm OK but this will change the return code - from EACCES to > > > > ENOSYS > > > > > > > > - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is > > > > planning to drop nearly all DRM_AUTH instances in their driver. > > > > > > > > > > > > Christian, as mentioned before - this series does _not_ add > > > > functionality to render nodes. It effectively paves a way towards > > > > removing DRM_AUTH. > > > > > > But it adds functionality to the primary node. > > > > > Behaviour is adjusted - functionality was there since day 1. > > > > > > I understand the series may feel a bit dirty. Yet I would gladly address > > > > any technical concerns you have. > > > > > > Well putting compatibility issues aside my concern is that this is > > > simply a bad design decision which we can't revert later on. > > > > > As sad above - any concerns (theoretical or actual regressions) can be > > trivially fixed _without_ reverting any of this. > > > > I am more than happy to step up and address any regressions in timely > > manner. > > > > > > As a reminder without this series, some of your customers are forced to > > run their applications as root. > > I'm torn here on whether this is worth it. Have we got more use cases > to justify it? > Should have mentioned: three DRM drivers (not counting i915) have dropped DRM_AUTH, assumingly for the same reasons I'm bringing here. Apart from the libva, kmscube + gst and mesa, I'm expecting other projects to make the same mistake. Since the former three define the norm of using DRM. The "fix" for all of these being "run as root" :-\ > I'm wary of opening this up just because we can. > What can I do to alleviate that worry? I have spent over a week auditing code and designed so that we can reinstate the authentication only where needed. Thanks Emil
Am 29.05.19 um 15:03 schrieb Emil Velikov: > On 2019/05/29, Dave Airlie wrote: >> On Wed, 29 May 2019 at 02:47, Emil Velikov <emil.l.velikov@gmail.com> wrote: >>> On 2019/05/28, Koenig, Christian wrote: >>>> Am 28.05.19 um 18:10 schrieb Emil Velikov: >>>>> On 2019/05/28, Daniel Vetter wrote: >>>>>> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian >>>>>> <Christian.Koenig@amd.com> wrote: >>>>>>> Am 28.05.19 um 09:38 schrieb Daniel Vetter: >>>>>>>> [SNIP] >>>>>>>>> Might be a good idea looking into reverting it partially, so that at >>>>>>>>> least command submission and buffer allocation is still blocked. >>>>>>>> I thought the issue is a lot more than vainfo, it's pretty much every >>>>>>>> hacked up compositor under the sun getting this wrong one way or >>>>>>>> another. Thinking about this some more, I also have no idea how you'd >>>>>>>> want to deprecate rendering on primary nodes in general. Apparently >>>>>>>> that breaks -modesetting already, and probably lots more compositors. >>>>>>>> And it looks like we're finally achieve the goal kms set out to 10 >>>>>>>> years ago, and new compositors are sprouting up all the time. I guess >>>>>>>> we could just break them all (on new hardware) and tell them to all >>>>>>>> suck it up. But I don't think that's a great option. And just >>>>>>>> deprecating this on amdgpu is going to be even harder, since then >>>>>>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks >>>>>>>> broken. >>>>>>>> >>>>>>>> Aside: I'm not supporting Emil's idea here because it fixes any issues >>>>>>>> Intel has - Intel doesn't care. I support it because reality sucks, >>>>>>>> people get this render vs. primary vs. multi-gpu prime wrong all the >>>>>>>> time (that's also why we have hardcoded display+gpu pairs in mesa for >>>>>>>> the various soc combinations out there), and this looks like a >>>>>>>> pragmatic solution. It'd be nice if every compositor and everything >>>>>>>> else would perfectly support multi gpu and only use render nodes for >>>>>>>> rendering, and only primary nodes for display. But reality is that >>>>>>>> people hack on stuff until gears on screen and then move on to more >>>>>>>> interesting things (to them). So I don't think we'll ever win this :-/ >>>>>>> Yeah, but this is a classic case of working around user space issues by >>>>>>> making kernel changes instead of fixing user space. >>>>>>> >>>>>>> Having privileged (output control) and unprivileged (rendering control) >>>>>>> functionality behind the same node is a mistake we have made a long time >>>>>>> ago and render nodes finally seemed to be a way to fix that. >>>>>>> >>>>>>> I mean why are compositors using the primary node in the first place? >>>>>>> Because they want to have access to privileged resources I think and in >>>>>>> this case it is perfectly ok to do so. >>>>>>> >>>>>>> Now extending unprivileged access to the primary node actually sounds >>>>>>> like a step into the wrong direction to me. >>>>>>> >>>>>>> I rather think that we should go down the route of completely dropping >>>>>>> command submission and buffer allocation through the primary node for >>>>>>> non master clients. And then as next step at some point drop support for >>>>>>> authentication/flink. >>>>>>> >>>>>>> I mean we have done this with UMS as well and I don't see much other way >>>>>>> to move forward and get rid of those ancient interface in the long term. >>>>>> Well kms had some really good benefits that drove quick adoption, like >>>>>> "suspend/resume actually has a chance of working" or "comes with >>>>>> buffer management so you can run multiple gears". >>>>>> >>>>>> The render node thing is a lot more niche use case (prime, better priv >>>>>> separation), plus "it's cleaner design". And the "cleaner design" part >>>>>> is something that empirically doesn't seem to matter :-/ Just two >>>>>> examples: >>>>>> - KHR_display/leases just iterated display resources on the fd needed >>>>>> for rendering (and iirc there was even a patch to expose that for >>>>>> render nodes too so it works with DRI3), because implementing >>>>>> protocols is too hard. Barely managed to stop that one before it >>>>>> happened. >>>>>> - Various video players use the vblank ioctl on directly to schedule >>>>>> frames, without telling the compositor. I discovered that when I >>>>>> wanted to limite the vblank ioctl to master clients only. Again, >>>>>> apparently too hard to use the existing extensions, or fix the bugs in >>>>>> there, or whatever. One userspace got fixed last year, but it'll >>>>>> probably get copypasted around forever :-/ >>>>>> >>>>>> So I don't think we'll ever manage to roll a clean split out, and best >>>>>> we can do is give in and just hand userspace what it wants. As much as >>>>>> that's misguided and unclean and all that. Maybe it'll result in a >>>>>> least fewer stuff getting run as root to hack around this, because >>>>>> fixing properly seems not to be on the table. >>>>>> >>>>>> The beauty of kms is that we've achieved the mission, everyone's >>>>>> writing their own thing. Which is also terrible, and I don't think >>>>>> it'll get better. >>>>> With the risk of coming rude I will repeat my earlier comment: >>>>> >>>>> The problem is _neither_ Intel nor libva specific. >>>>> >>>>> >>>>> >>>>> That said, let's step back for a moment and consider: >>>>> >>>>> - the "block everything but KMS via the primary node" idea is great but >>>>> orthogonal >>>>> >>>>> - the series does address issues that are vendor-agnostic >>>>> >>>>> - by default this series does _not_ cause any regression be that for >>>>> new or old userspace >>>>> >>>>> - there are two trivial solutions, if the AMD team has concerns about >>>>> closed-source/private stack depending on the old behaviour >>>>> If they want I can even write the patches ;-) >>>>> >>>>> >>>>> That said, the notable comments received so far are: >>>>> - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from >>>>> handle. I'm OK but this will change the return code - from EACCES to >>>>> ENOSYS >>>>> >>>>> - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is >>>>> planning to drop nearly all DRM_AUTH instances in their driver. >>>>> >>>>> >>>>> Christian, as mentioned before - this series does _not_ add >>>>> functionality to render nodes. It effectively paves a way towards >>>>> removing DRM_AUTH. >>>> But it adds functionality to the primary node. >>>> >>> Behaviour is adjusted - functionality was there since day 1. >>> >>>>> I understand the series may feel a bit dirty. Yet I would gladly address >>>>> any technical concerns you have. >>>> Well putting compatibility issues aside my concern is that this is >>>> simply a bad design decision which we can't revert later on. >>>> >>> As sad above - any concerns (theoretical or actual regressions) can be >>> trivially fixed _without_ reverting any of this. >>> >>> I am more than happy to step up and address any regressions in timely >>> manner. >>> >>> >>> As a reminder without this series, some of your customers are forced to >>> run their applications as root. >> I'm torn here on whether this is worth it. Have we got more use cases >> to justify it? >> > Should have mentioned: three DRM drivers (not counting i915) have > dropped DRM_AUTH, assumingly for the same reasons I'm bringing here. > > Apart from the libva, kmscube + gst and mesa, I'm expecting other > projects to make the same mistake. Since the former three define the > norm of using DRM. > > The "fix" for all of these being "run as root" :-\ > >> I'm wary of opening this up just because we can. >> > What can I do to alleviate that worry? I have spent over a week auditing > code and designed so that we can reinstate the authentication only where > needed. Well I don't think the worry here is about regressions, but rather about a design decision we will never be able to revert. So the question we have to ask is rather if it's a good design decision to resurrect the primary node with all its related compability burdens to work around an issue which is essentially an userspace coding error. And from my point of view the answer is relatively clear that this is not going to worth it. Regards, Christian. > > Thanks > Emil
On 2019/05/29, Koenig, Christian wrote: > Am 29.05.19 um 15:03 schrieb Emil Velikov: > > On 2019/05/29, Dave Airlie wrote: > >> On Wed, 29 May 2019 at 02:47, Emil Velikov <emil.l.velikov@gmail.com> wrote: > >>> On 2019/05/28, Koenig, Christian wrote: > >>>> Am 28.05.19 um 18:10 schrieb Emil Velikov: > >>>>> On 2019/05/28, Daniel Vetter wrote: > >>>>>> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian > >>>>>> <Christian.Koenig@amd.com> wrote: > >>>>>>> Am 28.05.19 um 09:38 schrieb Daniel Vetter: > >>>>>>>> [SNIP] > >>>>>>>>> Might be a good idea looking into reverting it partially, so that at > >>>>>>>>> least command submission and buffer allocation is still blocked. > >>>>>>>> I thought the issue is a lot more than vainfo, it's pretty much every > >>>>>>>> hacked up compositor under the sun getting this wrong one way or > >>>>>>>> another. Thinking about this some more, I also have no idea how you'd > >>>>>>>> want to deprecate rendering on primary nodes in general. Apparently > >>>>>>>> that breaks -modesetting already, and probably lots more compositors. > >>>>>>>> And it looks like we're finally achieve the goal kms set out to 10 > >>>>>>>> years ago, and new compositors are sprouting up all the time. I guess > >>>>>>>> we could just break them all (on new hardware) and tell them to all > >>>>>>>> suck it up. But I don't think that's a great option. And just > >>>>>>>> deprecating this on amdgpu is going to be even harder, since then > >>>>>>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks > >>>>>>>> broken. > >>>>>>>> > >>>>>>>> Aside: I'm not supporting Emil's idea here because it fixes any issues > >>>>>>>> Intel has - Intel doesn't care. I support it because reality sucks, > >>>>>>>> people get this render vs. primary vs. multi-gpu prime wrong all the > >>>>>>>> time (that's also why we have hardcoded display+gpu pairs in mesa for > >>>>>>>> the various soc combinations out there), and this looks like a > >>>>>>>> pragmatic solution. It'd be nice if every compositor and everything > >>>>>>>> else would perfectly support multi gpu and only use render nodes for > >>>>>>>> rendering, and only primary nodes for display. But reality is that > >>>>>>>> people hack on stuff until gears on screen and then move on to more > >>>>>>>> interesting things (to them). So I don't think we'll ever win this :-/ > >>>>>>> Yeah, but this is a classic case of working around user space issues by > >>>>>>> making kernel changes instead of fixing user space. > >>>>>>> > >>>>>>> Having privileged (output control) and unprivileged (rendering control) > >>>>>>> functionality behind the same node is a mistake we have made a long time > >>>>>>> ago and render nodes finally seemed to be a way to fix that. > >>>>>>> > >>>>>>> I mean why are compositors using the primary node in the first place? > >>>>>>> Because they want to have access to privileged resources I think and in > >>>>>>> this case it is perfectly ok to do so. > >>>>>>> > >>>>>>> Now extending unprivileged access to the primary node actually sounds > >>>>>>> like a step into the wrong direction to me. > >>>>>>> > >>>>>>> I rather think that we should go down the route of completely dropping > >>>>>>> command submission and buffer allocation through the primary node for > >>>>>>> non master clients. And then as next step at some point drop support for > >>>>>>> authentication/flink. > >>>>>>> > >>>>>>> I mean we have done this with UMS as well and I don't see much other way > >>>>>>> to move forward and get rid of those ancient interface in the long term. > >>>>>> Well kms had some really good benefits that drove quick adoption, like > >>>>>> "suspend/resume actually has a chance of working" or "comes with > >>>>>> buffer management so you can run multiple gears". > >>>>>> > >>>>>> The render node thing is a lot more niche use case (prime, better priv > >>>>>> separation), plus "it's cleaner design". And the "cleaner design" part > >>>>>> is something that empirically doesn't seem to matter :-/ Just two > >>>>>> examples: > >>>>>> - KHR_display/leases just iterated display resources on the fd needed > >>>>>> for rendering (and iirc there was even a patch to expose that for > >>>>>> render nodes too so it works with DRI3), because implementing > >>>>>> protocols is too hard. Barely managed to stop that one before it > >>>>>> happened. > >>>>>> - Various video players use the vblank ioctl on directly to schedule > >>>>>> frames, without telling the compositor. I discovered that when I > >>>>>> wanted to limite the vblank ioctl to master clients only. Again, > >>>>>> apparently too hard to use the existing extensions, or fix the bugs in > >>>>>> there, or whatever. One userspace got fixed last year, but it'll > >>>>>> probably get copypasted around forever :-/ > >>>>>> > >>>>>> So I don't think we'll ever manage to roll a clean split out, and best > >>>>>> we can do is give in and just hand userspace what it wants. As much as > >>>>>> that's misguided and unclean and all that. Maybe it'll result in a > >>>>>> least fewer stuff getting run as root to hack around this, because > >>>>>> fixing properly seems not to be on the table. > >>>>>> > >>>>>> The beauty of kms is that we've achieved the mission, everyone's > >>>>>> writing their own thing. Which is also terrible, and I don't think > >>>>>> it'll get better. > >>>>> With the risk of coming rude I will repeat my earlier comment: > >>>>> > >>>>> The problem is _neither_ Intel nor libva specific. > >>>>> > >>>>> > >>>>> > >>>>> That said, let's step back for a moment and consider: > >>>>> > >>>>> - the "block everything but KMS via the primary node" idea is great but > >>>>> orthogonal > >>>>> > >>>>> - the series does address issues that are vendor-agnostic > >>>>> > >>>>> - by default this series does _not_ cause any regression be that for > >>>>> new or old userspace > >>>>> > >>>>> - there are two trivial solutions, if the AMD team has concerns about > >>>>> closed-source/private stack depending on the old behaviour > >>>>> If they want I can even write the patches ;-) > >>>>> > >>>>> > >>>>> That said, the notable comments received so far are: > >>>>> - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from > >>>>> handle. I'm OK but this will change the return code - from EACCES to > >>>>> ENOSYS > >>>>> > >>>>> - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is > >>>>> planning to drop nearly all DRM_AUTH instances in their driver. > >>>>> > >>>>> > >>>>> Christian, as mentioned before - this series does _not_ add > >>>>> functionality to render nodes. It effectively paves a way towards > >>>>> removing DRM_AUTH. > >>>> But it adds functionality to the primary node. > >>>> > >>> Behaviour is adjusted - functionality was there since day 1. > >>> > >>>>> I understand the series may feel a bit dirty. Yet I would gladly address > >>>>> any technical concerns you have. > >>>> Well putting compatibility issues aside my concern is that this is > >>>> simply a bad design decision which we can't revert later on. > >>>> > >>> As sad above - any concerns (theoretical or actual regressions) can be > >>> trivially fixed _without_ reverting any of this. > >>> > >>> I am more than happy to step up and address any regressions in timely > >>> manner. > >>> > >>> > >>> As a reminder without this series, some of your customers are forced to > >>> run their applications as root. > >> I'm torn here on whether this is worth it. Have we got more use cases > >> to justify it? > >> > > Should have mentioned: three DRM drivers (not counting i915) have > > dropped DRM_AUTH, assumingly for the same reasons I'm bringing here. > > > > Apart from the libva, kmscube + gst and mesa, I'm expecting other > > projects to make the same mistake. Since the former three define the > > norm of using DRM. > > > > The "fix" for all of these being "run as root" :-\ > > > >> I'm wary of opening this up just because we can. > >> > > What can I do to alleviate that worry? I have spent over a week auditing > > code and designed so that we can reinstate the authentication only where > > needed. > > Well I don't think the worry here is about regressions, Glad to hear. > but rather about > a design decision we will never be able to revert. > Can you think of any reason/issue why we would want to revert this? I will gladly spend some thing exploring how to address it. > So the question we have to ask is rather if it's a good design decision > to resurrect the primary node with all its related compability burdens > to work around an issue which is essentially an userspace coding error. > Can see you're not happy on the topic - I'm not too excited either. The truth to the matter is - DRM drivers have dropped DRM_AUTH regardless of my work. It's very unfortunate, if AMDGPU stands out. Perhaps after some time and unhappy users you'll reconsider. I believe that Linus has pointed out a number of times that kernel developers should care about our users. Even when it's an userspace error. HTH Emil
Am 29.05.19 um 18:29 schrieb Emil Velikov: > On 2019/05/29, Koenig, Christian wrote: >> Am 29.05.19 um 15:03 schrieb Emil Velikov: >>> On 2019/05/29, Dave Airlie wrote: >>>> On Wed, 29 May 2019 at 02:47, Emil Velikov <emil.l.velikov@gmail.com> wrote: >>>>> On 2019/05/28, Koenig, Christian wrote: >>>>>> Am 28.05.19 um 18:10 schrieb Emil Velikov: >>>>>>> On 2019/05/28, Daniel Vetter wrote: >>>>>>>> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian >>>>>>>> <Christian.Koenig@amd.com> wrote: >>>>>>>>> Am 28.05.19 um 09:38 schrieb Daniel Vetter: >>>>>>>>>> [SNIP] >>>>>>>>>>> Might be a good idea looking into reverting it partially, so that at >>>>>>>>>>> least command submission and buffer allocation is still blocked. >>>>>>>>>> I thought the issue is a lot more than vainfo, it's pretty much every >>>>>>>>>> hacked up compositor under the sun getting this wrong one way or >>>>>>>>>> another. Thinking about this some more, I also have no idea how you'd >>>>>>>>>> want to deprecate rendering on primary nodes in general. Apparently >>>>>>>>>> that breaks -modesetting already, and probably lots more compositors. >>>>>>>>>> And it looks like we're finally achieve the goal kms set out to 10 >>>>>>>>>> years ago, and new compositors are sprouting up all the time. I guess >>>>>>>>>> we could just break them all (on new hardware) and tell them to all >>>>>>>>>> suck it up. But I don't think that's a great option. And just >>>>>>>>>> deprecating this on amdgpu is going to be even harder, since then >>>>>>>>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks >>>>>>>>>> broken. >>>>>>>>>> >>>>>>>>>> Aside: I'm not supporting Emil's idea here because it fixes any issues >>>>>>>>>> Intel has - Intel doesn't care. I support it because reality sucks, >>>>>>>>>> people get this render vs. primary vs. multi-gpu prime wrong all the >>>>>>>>>> time (that's also why we have hardcoded display+gpu pairs in mesa for >>>>>>>>>> the various soc combinations out there), and this looks like a >>>>>>>>>> pragmatic solution. It'd be nice if every compositor and everything >>>>>>>>>> else would perfectly support multi gpu and only use render nodes for >>>>>>>>>> rendering, and only primary nodes for display. But reality is that >>>>>>>>>> people hack on stuff until gears on screen and then move on to more >>>>>>>>>> interesting things (to them). So I don't think we'll ever win this :-/ >>>>>>>>> Yeah, but this is a classic case of working around user space issues by >>>>>>>>> making kernel changes instead of fixing user space. >>>>>>>>> >>>>>>>>> Having privileged (output control) and unprivileged (rendering control) >>>>>>>>> functionality behind the same node is a mistake we have made a long time >>>>>>>>> ago and render nodes finally seemed to be a way to fix that. >>>>>>>>> >>>>>>>>> I mean why are compositors using the primary node in the first place? >>>>>>>>> Because they want to have access to privileged resources I think and in >>>>>>>>> this case it is perfectly ok to do so. >>>>>>>>> >>>>>>>>> Now extending unprivileged access to the primary node actually sounds >>>>>>>>> like a step into the wrong direction to me. >>>>>>>>> >>>>>>>>> I rather think that we should go down the route of completely dropping >>>>>>>>> command submission and buffer allocation through the primary node for >>>>>>>>> non master clients. And then as next step at some point drop support for >>>>>>>>> authentication/flink. >>>>>>>>> >>>>>>>>> I mean we have done this with UMS as well and I don't see much other way >>>>>>>>> to move forward and get rid of those ancient interface in the long term. >>>>>>>> Well kms had some really good benefits that drove quick adoption, like >>>>>>>> "suspend/resume actually has a chance of working" or "comes with >>>>>>>> buffer management so you can run multiple gears". >>>>>>>> >>>>>>>> The render node thing is a lot more niche use case (prime, better priv >>>>>>>> separation), plus "it's cleaner design". And the "cleaner design" part >>>>>>>> is something that empirically doesn't seem to matter :-/ Just two >>>>>>>> examples: >>>>>>>> - KHR_display/leases just iterated display resources on the fd needed >>>>>>>> for rendering (and iirc there was even a patch to expose that for >>>>>>>> render nodes too so it works with DRI3), because implementing >>>>>>>> protocols is too hard. Barely managed to stop that one before it >>>>>>>> happened. >>>>>>>> - Various video players use the vblank ioctl on directly to schedule >>>>>>>> frames, without telling the compositor. I discovered that when I >>>>>>>> wanted to limite the vblank ioctl to master clients only. Again, >>>>>>>> apparently too hard to use the existing extensions, or fix the bugs in >>>>>>>> there, or whatever. One userspace got fixed last year, but it'll >>>>>>>> probably get copypasted around forever :-/ >>>>>>>> >>>>>>>> So I don't think we'll ever manage to roll a clean split out, and best >>>>>>>> we can do is give in and just hand userspace what it wants. As much as >>>>>>>> that's misguided and unclean and all that. Maybe it'll result in a >>>>>>>> least fewer stuff getting run as root to hack around this, because >>>>>>>> fixing properly seems not to be on the table. >>>>>>>> >>>>>>>> The beauty of kms is that we've achieved the mission, everyone's >>>>>>>> writing their own thing. Which is also terrible, and I don't think >>>>>>>> it'll get better. >>>>>>> With the risk of coming rude I will repeat my earlier comment: >>>>>>> >>>>>>> The problem is _neither_ Intel nor libva specific. >>>>>>> >>>>>>> >>>>>>> >>>>>>> That said, let's step back for a moment and consider: >>>>>>> >>>>>>> - the "block everything but KMS via the primary node" idea is great but >>>>>>> orthogonal >>>>>>> >>>>>>> - the series does address issues that are vendor-agnostic >>>>>>> >>>>>>> - by default this series does _not_ cause any regression be that for >>>>>>> new or old userspace >>>>>>> >>>>>>> - there are two trivial solutions, if the AMD team has concerns about >>>>>>> closed-source/private stack depending on the old behaviour >>>>>>> If they want I can even write the patches ;-) >>>>>>> >>>>>>> >>>>>>> That said, the notable comments received so far are: >>>>>>> - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from >>>>>>> handle. I'm OK but this will change the return code - from EACCES to >>>>>>> ENOSYS >>>>>>> >>>>>>> - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is >>>>>>> planning to drop nearly all DRM_AUTH instances in their driver. >>>>>>> >>>>>>> >>>>>>> Christian, as mentioned before - this series does _not_ add >>>>>>> functionality to render nodes. It effectively paves a way towards >>>>>>> removing DRM_AUTH. >>>>>> But it adds functionality to the primary node. >>>>>> >>>>> Behaviour is adjusted - functionality was there since day 1. >>>>> >>>>>>> I understand the series may feel a bit dirty. Yet I would gladly address >>>>>>> any technical concerns you have. >>>>>> Well putting compatibility issues aside my concern is that this is >>>>>> simply a bad design decision which we can't revert later on. >>>>>> >>>>> As sad above - any concerns (theoretical or actual regressions) can be >>>>> trivially fixed _without_ reverting any of this. >>>>> >>>>> I am more than happy to step up and address any regressions in timely >>>>> manner. >>>>> >>>>> >>>>> As a reminder without this series, some of your customers are forced to >>>>> run their applications as root. >>>> I'm torn here on whether this is worth it. Have we got more use cases >>>> to justify it? >>>> >>> Should have mentioned: three DRM drivers (not counting i915) have >>> dropped DRM_AUTH, assumingly for the same reasons I'm bringing here. >>> >>> Apart from the libva, kmscube + gst and mesa, I'm expecting other >>> projects to make the same mistake. Since the former three define the >>> norm of using DRM. >>> >>> The "fix" for all of these being "run as root" :-\ >>> >>>> I'm wary of opening this up just because we can. >>>> >>> What can I do to alleviate that worry? I have spent over a week auditing >>> code and designed so that we can reinstate the authentication only where >>> needed. >> Well I don't think the worry here is about regressions, > Glad to hear. > >> but rather about >> a design decision we will never be able to revert. >> > Can you think of any reason/issue why we would want to revert this? I > will gladly spend some thing exploring how to address it. Well, to finally get rid of the primary node for non display hardware. And in general to have a clean separation between display and rendering. >> So the question we have to ask is rather if it's a good design decision >> to resurrect the primary node with all its related compability burdens >> to work around an issue which is essentially an userspace coding error. >> > Can see you're not happy on the topic - I'm not too excited either. The > truth to the matter is - DRM drivers have dropped DRM_AUTH regardless of > my work. Then we should probably consider stopping doing this and enforce that the primary node is not used that widely any more. Regards, Christian. > > It's very unfortunate, if AMDGPU stands out. Perhaps after some time and > unhappy users you'll reconsider. > > I believe that Linus has pointed out a number of times that kernel > developers should care about our users. Even when it's an userspace > error. > > > HTH > Emil
On 2019-05-28 10:03 a.m., Koenig, Christian wrote: > > I rather think that we should go down the route of completely dropping > command submission and buffer allocation through the primary node for > non master clients. And then as next step at some point drop support for > authentication/flink. Keep in mind that even display servers aren't DRM master while their VT isn't active, so this might be problematic if a display server needs to do some command submission / buffer allocation during that time.
Am 04.06.19 um 12:50 schrieb Michel Dänzer: > On 2019-05-28 10:03 a.m., Koenig, Christian wrote: >> I rather think that we should go down the route of completely dropping >> command submission and buffer allocation through the primary node for >> non master clients. And then as next step at some point drop support for >> authentication/flink. > Keep in mind that even display servers aren't DRM master while their VT > isn't active, so this might be problematic if a display server needs to > do some command submission / buffer allocation during that time. If I understand it correctly the DRM fd stays master even when the VT is switched away, it's just not the current master any more. So in this case fpriv->is_master stays true, but drm_is_current_master(fpriv) returns false. And yes we mixed that up in amdgpu, i915 and vmwgfx. Somebody should probably write patches to fix this. Regards, Christian.
On Tue, Jun 4, 2019 at 1:24 PM Koenig, Christian <Christian.Koenig@amd.com> wrote: > > Am 04.06.19 um 12:50 schrieb Michel Dänzer: > > On 2019-05-28 10:03 a.m., Koenig, Christian wrote: > >> I rather think that we should go down the route of completely dropping > >> command submission and buffer allocation through the primary node for > >> non master clients. And then as next step at some point drop support for > >> authentication/flink. > > Keep in mind that even display servers aren't DRM master while their VT > > isn't active, so this might be problematic if a display server needs to > > do some command submission / buffer allocation during that time. > > If I understand it correctly the DRM fd stays master even when the VT is > switched away, it's just not the current master any more. > > So in this case fpriv->is_master stays true, but > drm_is_current_master(fpriv) returns false. > > And yes we mixed that up in amdgpu, i915 and vmwgfx. Somebody should > probably write patches to fix this. master should always be authenticated, so should be able to continue rendering. Well ... except on drivers who do take isolation somewhat serious, but don't have full per-client isolation. Those actually refuse rendering/gpu access for any authenticated client if their corresponding master isn't the current master. But I think only vmwgfx does that. Per-client isolation has taken over anyway, so all a bit moot, at least on modern hw. -Daniel
On 2019/05/31, Koenig, Christian wrote: > Am 29.05.19 um 18:29 schrieb Emil Velikov: > > On 2019/05/29, Koenig, Christian wrote: > >> Am 29.05.19 um 15:03 schrieb Emil Velikov: > >>> On 2019/05/29, Dave Airlie wrote: > >>>> On Wed, 29 May 2019 at 02:47, Emil Velikov <emil.l.velikov@gmail.com> wrote: > >>>>> On 2019/05/28, Koenig, Christian wrote: > >>>>>> Am 28.05.19 um 18:10 schrieb Emil Velikov: > >>>>>>> On 2019/05/28, Daniel Vetter wrote: > >>>>>>>> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian > >>>>>>>> <Christian.Koenig@amd.com> wrote: > >>>>>>>>> Am 28.05.19 um 09:38 schrieb Daniel Vetter: > >>>>>>>>>> [SNIP] > >>>>>>>>>>> Might be a good idea looking into reverting it partially, so that at > >>>>>>>>>>> least command submission and buffer allocation is still blocked. > >>>>>>>>>> I thought the issue is a lot more than vainfo, it's pretty much every > >>>>>>>>>> hacked up compositor under the sun getting this wrong one way or > >>>>>>>>>> another. Thinking about this some more, I also have no idea how you'd > >>>>>>>>>> want to deprecate rendering on primary nodes in general. Apparently > >>>>>>>>>> that breaks -modesetting already, and probably lots more compositors. > >>>>>>>>>> And it looks like we're finally achieve the goal kms set out to 10 > >>>>>>>>>> years ago, and new compositors are sprouting up all the time. I guess > >>>>>>>>>> we could just break them all (on new hardware) and tell them to all > >>>>>>>>>> suck it up. But I don't think that's a great option. And just > >>>>>>>>>> deprecating this on amdgpu is going to be even harder, since then > >>>>>>>>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks > >>>>>>>>>> broken. > >>>>>>>>>> > >>>>>>>>>> Aside: I'm not supporting Emil's idea here because it fixes any issues > >>>>>>>>>> Intel has - Intel doesn't care. I support it because reality sucks, > >>>>>>>>>> people get this render vs. primary vs. multi-gpu prime wrong all the > >>>>>>>>>> time (that's also why we have hardcoded display+gpu pairs in mesa for > >>>>>>>>>> the various soc combinations out there), and this looks like a > >>>>>>>>>> pragmatic solution. It'd be nice if every compositor and everything > >>>>>>>>>> else would perfectly support multi gpu and only use render nodes for > >>>>>>>>>> rendering, and only primary nodes for display. But reality is that > >>>>>>>>>> people hack on stuff until gears on screen and then move on to more > >>>>>>>>>> interesting things (to them). So I don't think we'll ever win this :-/ > >>>>>>>>> Yeah, but this is a classic case of working around user space issues by > >>>>>>>>> making kernel changes instead of fixing user space. > >>>>>>>>> > >>>>>>>>> Having privileged (output control) and unprivileged (rendering control) > >>>>>>>>> functionality behind the same node is a mistake we have made a long time > >>>>>>>>> ago and render nodes finally seemed to be a way to fix that. > >>>>>>>>> > >>>>>>>>> I mean why are compositors using the primary node in the first place? > >>>>>>>>> Because they want to have access to privileged resources I think and in > >>>>>>>>> this case it is perfectly ok to do so. > >>>>>>>>> > >>>>>>>>> Now extending unprivileged access to the primary node actually sounds > >>>>>>>>> like a step into the wrong direction to me. > >>>>>>>>> > >>>>>>>>> I rather think that we should go down the route of completely dropping > >>>>>>>>> command submission and buffer allocation through the primary node for > >>>>>>>>> non master clients. And then as next step at some point drop support for > >>>>>>>>> authentication/flink. > >>>>>>>>> > >>>>>>>>> I mean we have done this with UMS as well and I don't see much other way > >>>>>>>>> to move forward and get rid of those ancient interface in the long term. > >>>>>>>> Well kms had some really good benefits that drove quick adoption, like > >>>>>>>> "suspend/resume actually has a chance of working" or "comes with > >>>>>>>> buffer management so you can run multiple gears". > >>>>>>>> > >>>>>>>> The render node thing is a lot more niche use case (prime, better priv > >>>>>>>> separation), plus "it's cleaner design". And the "cleaner design" part > >>>>>>>> is something that empirically doesn't seem to matter :-/ Just two > >>>>>>>> examples: > >>>>>>>> - KHR_display/leases just iterated display resources on the fd needed > >>>>>>>> for rendering (and iirc there was even a patch to expose that for > >>>>>>>> render nodes too so it works with DRI3), because implementing > >>>>>>>> protocols is too hard. Barely managed to stop that one before it > >>>>>>>> happened. > >>>>>>>> - Various video players use the vblank ioctl on directly to schedule > >>>>>>>> frames, without telling the compositor. I discovered that when I > >>>>>>>> wanted to limite the vblank ioctl to master clients only. Again, > >>>>>>>> apparently too hard to use the existing extensions, or fix the bugs in > >>>>>>>> there, or whatever. One userspace got fixed last year, but it'll > >>>>>>>> probably get copypasted around forever :-/ > >>>>>>>> > >>>>>>>> So I don't think we'll ever manage to roll a clean split out, and best > >>>>>>>> we can do is give in and just hand userspace what it wants. As much as > >>>>>>>> that's misguided and unclean and all that. Maybe it'll result in a > >>>>>>>> least fewer stuff getting run as root to hack around this, because > >>>>>>>> fixing properly seems not to be on the table. > >>>>>>>> > >>>>>>>> The beauty of kms is that we've achieved the mission, everyone's > >>>>>>>> writing their own thing. Which is also terrible, and I don't think > >>>>>>>> it'll get better. > >>>>>>> With the risk of coming rude I will repeat my earlier comment: > >>>>>>> > >>>>>>> The problem is _neither_ Intel nor libva specific. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> That said, let's step back for a moment and consider: > >>>>>>> > >>>>>>> - the "block everything but KMS via the primary node" idea is great but > >>>>>>> orthogonal > >>>>>>> > >>>>>>> - the series does address issues that are vendor-agnostic > >>>>>>> > >>>>>>> - by default this series does _not_ cause any regression be that for > >>>>>>> new or old userspace > >>>>>>> > >>>>>>> - there are two trivial solutions, if the AMD team has concerns about > >>>>>>> closed-source/private stack depending on the old behaviour > >>>>>>> If they want I can even write the patches ;-) > >>>>>>> > >>>>>>> > >>>>>>> That said, the notable comments received so far are: > >>>>>>> - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from > >>>>>>> handle. I'm OK but this will change the return code - from EACCES to > >>>>>>> ENOSYS > >>>>>>> > >>>>>>> - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is > >>>>>>> planning to drop nearly all DRM_AUTH instances in their driver. > >>>>>>> > >>>>>>> > >>>>>>> Christian, as mentioned before - this series does _not_ add > >>>>>>> functionality to render nodes. It effectively paves a way towards > >>>>>>> removing DRM_AUTH. > >>>>>> But it adds functionality to the primary node. > >>>>>> > >>>>> Behaviour is adjusted - functionality was there since day 1. > >>>>> > >>>>>>> I understand the series may feel a bit dirty. Yet I would gladly address > >>>>>>> any technical concerns you have. > >>>>>> Well putting compatibility issues aside my concern is that this is > >>>>>> simply a bad design decision which we can't revert later on. > >>>>>> > >>>>> As sad above - any concerns (theoretical or actual regressions) can be > >>>>> trivially fixed _without_ reverting any of this. > >>>>> > >>>>> I am more than happy to step up and address any regressions in timely > >>>>> manner. > >>>>> > >>>>> > >>>>> As a reminder without this series, some of your customers are forced to > >>>>> run their applications as root. > >>>> I'm torn here on whether this is worth it. Have we got more use cases > >>>> to justify it? > >>>> > >>> Should have mentioned: three DRM drivers (not counting i915) have > >>> dropped DRM_AUTH, assumingly for the same reasons I'm bringing here. > >>> > >>> Apart from the libva, kmscube + gst and mesa, I'm expecting other > >>> projects to make the same mistake. Since the former three define the > >>> norm of using DRM. > >>> > >>> The "fix" for all of these being "run as root" :-\ > >>> > >>>> I'm wary of opening this up just because we can. > >>>> > >>> What can I do to alleviate that worry? I have spent over a week auditing > >>> code and designed so that we can reinstate the authentication only where > >>> needed. > >> Well I don't think the worry here is about regressions, > > Glad to hear. > > > >> but rather about > >> a design decision we will never be able to revert. > >> > > Can you think of any reason/issue why we would want to revert this? I > > will gladly spend some thing exploring how to address it. > > Well, to finally get rid of the primary node for non display hardware. > > And in general to have a clean separation between display and rendering. > Clean separation of display and rendering is a good end goal to have, but is going to break the current userspace. In DRM we try to enforce the Linux guarantee of not breaking userspace. If we want to maintain this guarantee we are effectively stuck with providing render functionality in the primary node, so we might as well do it properly (i.e., relax the auth restrictions). But most importantly, if we are to pursue KMS-only primary nodes: - we should really have a wider discussion about it, and - this series does NOT impede any of if Thanks Emil
On 2019/05/27, Emil Velikov wrote: > From: Emil Velikov <emil.velikov@collabora.com> > > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > render node. A seemingly deliberate design decision. > > Hence we can drop the DRM_AUTH all together (details in follow-up patch) > yet not all userspace checks if it's authenticated, but instead uses > uncommon assumptions. > > After days of digging through git log and testing, only a single (ab)use > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and > assuming that failure implies lack of authentication. > > Affected versions are: > - the whole 18.2.x series, which is EOL > - the whole 18.3.x series, which is EOL > - the 19.0.x series, prior to 19.0.4 > > Add a special quirk for that case, thus we can drop DRM_AUTH bits as > mentioned earlier. > > Since all the affected userspace is EOL, we also add a kconfig option > to disable this quirk. > > The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT > > Cc: Alex Deucher <alexander.deucher@amd.com> > Cc: Christian König <christian.koenig@amd.com> > Cc: amd-gfx@lists.freedesktop.org > Cc: David Airlie <airlied@linux.ie> > Cc: Daniel Vetter <daniel@ffwll.ch> > Signed-off-by: Emil Velikov <emil.velikov@collabora.com> > --- > drivers/gpu/drm/amd/amdgpu/Kconfig | 16 ++++++++++++++++ > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 +++++++++++- > drivers/gpu/drm/drm_ioctl.c | 5 +++++ > include/drm/drm_ioctl.h | 17 +++++++++++++++++ > 4 files changed, 49 insertions(+), 1 deletion(-) > Hi Christian, In the following, I would like to summarise and emphasize the need for DRM_AUTH removal. I would kindly ask you to spend a couple of minutes extra reading it. Today DRM drivers* do not make any distinction between primary and render node clients. Thus for a render capable driver, any premise of separation, security or otherwise imposed via DRM_AUTH is a fallacy. Considering the examples of flaky or outright missing drmAuth in prime open-source projects (mesa, kmscube, libva) we can reasonably assume other projects exbibit the same problem. For these and/or other reasons, two DRM drivers have dropped DRM_AUTH since day one. Since we are interested in providing consistent and predictable behaviour to user-space, dropping DRM_AUTH seems to be the most effective way forward. Of course, I'd be more than happy to hear for any other way to achieve the goal outlined. Based on the series, other maintainers agree with the idea/goal here. Amdgpu not following the same pattern would compromise predictability across drivers and complicate userspace, so I would kindly ask you to reconsider. Alternatively, I see two ways to special case: - New flag annotating amdgpu/radeon - akin to the one proposed earlier - Check for amdgpu/radeon in core DRM Now on your pain point - not allowing render iocts via the primary node, and how this patch could make it harder to achieve that goal. While I love the idea, there are number of obstacles that prevent us from doing so at this time: - Ensuring both old and new userspace does not regress - Drivers (QXL, others?) do not expose a render node - We want to avoid driver specific behaviour The only way forward that I can see is: - Address QXL/others to expose render nodes - Add a Kconfig toggle to disable !KMS ioctls via the primary node - Jump-start ^^ with distribution X - Fix user-space fallouts - X months down the line, flip the Kconfig - In case of regressions - revert the KConfig, goto Fix user-space... That said, the proposal will not conflict with the DRM_AUTH removal. If anything it is step 0.5 of the grand master plan. Tl;Dr: Removing DRM_AUTH is orthogonal to not allowing render iocts via the primary node. Here's an idea how to achieve the latter. Thanks Emil
Am 14.06.19 um 14:09 schrieb Emil Velikov: > On 2019/05/27, Emil Velikov wrote: > [SNIP] > Hi Christian, > > > In the following, I would like to summarise and emphasize the need for > DRM_AUTH removal. I would kindly ask you to spend a couple of minutes > extra reading it. > > > Today DRM drivers* do not make any distinction between primary and > render node clients. That is actually not 100% correct. We have a special case where a DRM master is allowed to change the priority of render node clients. > Thus for a render capable driver, any premise of > separation, security or otherwise imposed via DRM_AUTH is a fallacy. Yeah, that's what I agree on. I just don't think that removing DRM_AUTH now is the right direction to take. > Considering the examples of flaky or outright missing drmAuth in prime > open-source projects (mesa, kmscube, libva) we can reasonably assume > other projects exbibit the same problem. > > For these and/or other reasons, two DRM drivers have dropped DRM_AUTH > since day one. > > Since we are interested in providing consistent and predictable > behaviour to user-space, dropping DRM_AUTH seems to be the most > effective way forward. Well and what do you do with drivers which doesn't implement render nodes? > Of course, I'd be more than happy to hear for any other way to achieve > the goal outlined. > > Based on the series, other maintainers agree with the idea/goal here. > Amdgpu not following the same pattern would compromise predictability > across drivers and complicate userspace, so I would kindly ask you to > reconsider. > > Alternatively, I see two ways to special case: > - New flag annotating amdgpu/radeon - akin to the one proposed earlier > - Check for amdgpu/radeon in core DRM I perfectly agree that I don't want any special handling for amdgpu/radeon. My key point is that with DRM_AUTH we created an authorization and authentication mess in DRM which is functionality which doesn't belong into the DRM subsystem in the first place. > Now on your pain point - not allowing render iocts via the primary node, > and how this patch could make it harder to achieve that goal. > > While I love the idea, there are number of obstacles that prevent us > from doing so at this time: > - Ensuring both old and new userspace does not regress Yeah, agree totally. That's why I said we should probably start doing this for only for upcoming hardware generations. > - Drivers (QXL, others?) do not expose a render node Well isn't that is a rather big problem for the removal of DRM_AUTH in general? E.g. at least QXL would then expose functionality on the primary node without authentication. > - We want to avoid driver specific behaviour > > The only way forward that I can see is: > - Address QXL/others to expose render nodes > - Add a Kconfig toggle to disable !KMS ioctls via the primary node > - Jump-start ^^ with distribution X > - Fix user-space fallouts > - X months down the line, flip the Kconfig > - In case of regressions - revert the KConfig, goto Fix user-space... Well that at least sounds like a plan :) Let's to this! > That said, the proposal will not conflict with the DRM_AUTH removal. If > anything it is step 0.5 of the grand master plan. That's the point I strongly disagree on. By lowering the DRM_AUTH restriction you are encouraging userspace to use the shortcut and use the primary node for rendering instead of fixing the code and using the render node. So at the end of the day userspace will most likely completely drop support for the render node, simply for the reason that it became superfluous. You can just open up the primary node and get the same functionality. I absolutely can't understand how you can argue that this won't make it harder to cleanly separate rendering and primary node functionality. Regards, Christian. > > > Tl;Dr: Removing DRM_AUTH is orthogonal to not allowing render iocts via > the primary node. Here's an idea how to achieve the latter. > > > Thanks > Emil
On 2019-06-14 2:55 p.m., Koenig, Christian wrote: > Am 14.06.19 um 14:09 schrieb Emil Velikov: > >> That said, the proposal will not conflict with the DRM_AUTH removal. If >> anything it is step 0.5 of the grand master plan. > > That's the point I strongly disagree on. > > By lowering the DRM_AUTH restriction you are encouraging userspace to > use the shortcut and use the primary node for rendering instead of > fixing the code and using the render node. > > So at the end of the day userspace will most likely completely drop > support for the render node, simply for the reason that it became > superfluous. You can just open up the primary node and get the same > functionality. > > I absolutely can't understand how you can argue that this won't make it > harder to cleanly separate rendering and primary node functionality. FWIW, I agree with Christian on this. Also FWIW, the solution I'm working on for https://bugs.freedesktop.org/110903 should allow making libdrm_amdgpu always use a render node for rendering. For backwards compatibility it'll probably require adding a new variant of amdgpu_device_initialize though, since existing users may rely on the first amdgpu_device for a GPU using the DRM file description passed to amdgpu_device_initialize for rendering.
On 2019/06/14, Koenig, Christian wrote: > Am 14.06.19 um 14:09 schrieb Emil Velikov: > > On 2019/05/27, Emil Velikov wrote: > > [SNIP] > > Hi Christian, > > > > > > In the following, I would like to summarise and emphasize the need for > > DRM_AUTH removal. I would kindly ask you to spend a couple of minutes > > extra reading it. > > > > > > Today DRM drivers* do not make any distinction between primary and > > render node clients. > > That is actually not 100% correct. We have a special case where a DRM > master is allowed to change the priority of render node clients. > Can you provide a link? I cannot find that code. > > Thus for a render capable driver, any premise of > > separation, security or otherwise imposed via DRM_AUTH is a fallacy. > > Yeah, that's what I agree on. I just don't think that removing DRM_AUTH > now is the right direction to take. > Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW ioctls. That aside, can you propose an alternative solution that addresses this and the second point just below? > > Considering the examples of flaky or outright missing drmAuth in prime > > open-source projects (mesa, kmscube, libva) we can reasonably assume > > other projects exbibit the same problem. > > > > For these and/or other reasons, two DRM drivers have dropped DRM_AUTH > > since day one. > > > > Since we are interested in providing consistent and predictable > > behaviour to user-space, dropping DRM_AUTH seems to be the most > > effective way forward. > > Well and what do you do with drivers which doesn't implement render nodes? > AFAICT there is a single non-DC driver which does not expose - QXL. I would expect it runs on a rather customised stack. Would be great to fix that, but ETIME and it's the only exception to the rule. > > Of course, I'd be more than happy to hear for any other way to achieve > > the goal outlined. > > > > Based on the series, other maintainers agree with the idea/goal here. > > Amdgpu not following the same pattern would compromise predictability > > across drivers and complicate userspace, so I would kindly ask you to > > reconsider. > > > > Alternatively, I see two ways to special case: > > - New flag annotating amdgpu/radeon - akin to the one proposed earlier > > - Check for amdgpu/radeon in core DRM > > I perfectly agree that I don't want any special handling for amdgpu/radeon. > > My key point is that with DRM_AUTH we created an authorization and > authentication mess in DRM which is functionality which doesn't belong > into the DRM subsystem in the first place. > Precisely and I've outlined below how to resolve this in the long run. > > Now on your pain point - not allowing render iocts via the primary node, > > and how this patch could make it harder to achieve that goal. > > > > While I love the idea, there are number of obstacles that prevent us > > from doing so at this time: > > - Ensuring both old and new userspace does not regress > > Yeah, agree totally. That's why I said we should probably start doing > this for only for upcoming hardware generations. > That will side-step the regression issue, but will enforce driver specific behaviour outlined before. > > - Drivers (QXL, others?) do not expose a render node > > Well isn't that is a rather big problem for the removal of DRM_AUTH in > general? > > E.g. at least QXL would then expose functionality on the primary node > without authentication. > With this series QXL remains functionally unchanged. I would love to fix that as well yet ETIME as mentioned above. > > - We want to avoid driver specific behaviour > > > > The only way forward that I can see is: > > - Address QXL/others to expose render nodes > > - Add a Kconfig toggle to disable !KMS ioctls via the primary node > > - Jump-start ^^ with distribution X > > - Fix user-space fallouts > > - X months down the line, flip the Kconfig > > - In case of regressions - revert the KConfig, goto Fix user-space... > > Well that at least sounds like a plan :) Let's to this! > We're talking about months if not years until it comes to fruition. We need something quicker. That said, I'm quite happy to take the first stab, yet this is not a replacement for this series. > > That said, the proposal will not conflict with the DRM_AUTH removal. If > > anything it is step 0.5 of the grand master plan. > > That's the point I strongly disagree on. > > By lowering the DRM_AUTH restriction you are encouraging userspace to > use the shortcut and use the primary node for rendering instead of > fixing the code and using the render node. > Have to disagree here. After working on the user-space side and fixing issues in various projects I can honestly say that most user-space is sane and developers _care_ about doing things correctly. > So at the end of the day userspace will most likely completely drop > support for the render node, simply for the reason that it became > superfluous. You can just open up the primary node and get the same > functionality. > I think you're being overly pessimistic here. This is coming from a person who is often closer to the pessimistic side of things. As a middle ground how about we do the following: - Get this series as-is, alongside - picking the first three items from the grand master plan - happy to start a discussion about QXL - a patch for the Kconfig toggle, is coming in a few minutes - happy to prod my distribution of choice (Arch) and work with them What do you think? Thanks Emil
Am 14.06.19 um 17:53 schrieb Emil Velikov: > On 2019/06/14, Koenig, Christian wrote: >> Am 14.06.19 um 14:09 schrieb Emil Velikov: >>> On 2019/05/27, Emil Velikov wrote: >>> [SNIP] >>> Hi Christian, >>> >>> >>> In the following, I would like to summarise and emphasize the need for >>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes >>> extra reading it. >>> >>> >>> Today DRM drivers* do not make any distinction between primary and >>> render node clients. >> That is actually not 100% correct. We have a special case where a DRM >> master is allowed to change the priority of render node clients. >> > Can you provide a link? I cannot find that code. See amdgpu_sched_ioctl(). >>> Thus for a render capable driver, any premise of >>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. >> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH >> now is the right direction to take. >> > Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW > ioctls. > > That aside, can you propose an alternative solution that addresses this > and the second point just below? Give me a few days to work on this, it's already Friday 6pm here. Christian. > >>> Considering the examples of flaky or outright missing drmAuth in prime >>> open-source projects (mesa, kmscube, libva) we can reasonably assume >>> other projects exbibit the same problem. >>> >>> For these and/or other reasons, two DRM drivers have dropped DRM_AUTH >>> since day one. >>> >>> Since we are interested in providing consistent and predictable >>> behaviour to user-space, dropping DRM_AUTH seems to be the most >>> effective way forward. >> Well and what do you do with drivers which doesn't implement render nodes? >> > AFAICT there is a single non-DC driver which does not expose - QXL. > I would expect it runs on a rather customised stack. > > Would be great to fix that, but ETIME and it's the only exception to the > rule. > > >>> Of course, I'd be more than happy to hear for any other way to achieve >>> the goal outlined. >>> >>> Based on the series, other maintainers agree with the idea/goal here. >>> Amdgpu not following the same pattern would compromise predictability >>> across drivers and complicate userspace, so I would kindly ask you to >>> reconsider. >>> >>> Alternatively, I see two ways to special case: >>> - New flag annotating amdgpu/radeon - akin to the one proposed earlier >>> - Check for amdgpu/radeon in core DRM >> I perfectly agree that I don't want any special handling for amdgpu/radeon. >> >> My key point is that with DRM_AUTH we created an authorization and >> authentication mess in DRM which is functionality which doesn't belong >> into the DRM subsystem in the first place. >> > Precisely and I've outlined below how to resolve this in the long run. > >>> Now on your pain point - not allowing render iocts via the primary node, >>> and how this patch could make it harder to achieve that goal. >>> >>> While I love the idea, there are number of obstacles that prevent us >>> from doing so at this time: >>> - Ensuring both old and new userspace does not regress >> Yeah, agree totally. That's why I said we should probably start doing >> this for only for upcoming hardware generations. >> > That will side-step the regression issue, but will enforce driver > specific behaviour outlined before. > >>> - Drivers (QXL, others?) do not expose a render node >> Well isn't that is a rather big problem for the removal of DRM_AUTH in >> general? >> >> E.g. at least QXL would then expose functionality on the primary node >> without authentication. >> > With this series QXL remains functionally unchanged. I would love to fix > that as well yet ETIME as mentioned above. > >>> - We want to avoid driver specific behaviour >>> >>> The only way forward that I can see is: >>> - Address QXL/others to expose render nodes >>> - Add a Kconfig toggle to disable !KMS ioctls via the primary node >>> - Jump-start ^^ with distribution X >>> - Fix user-space fallouts >>> - X months down the line, flip the Kconfig >>> - In case of regressions - revert the KConfig, goto Fix user-space... >> Well that at least sounds like a plan :) Let's to this! >> > We're talking about months if not years until it comes to fruition. We > need something quicker. > > That said, I'm quite happy to take the first stab, yet this is not a > replacement for this series. > >>> That said, the proposal will not conflict with the DRM_AUTH removal. If >>> anything it is step 0.5 of the grand master plan. >> That's the point I strongly disagree on. >> >> By lowering the DRM_AUTH restriction you are encouraging userspace to >> use the shortcut and use the primary node for rendering instead of >> fixing the code and using the render node. >> > Have to disagree here. After working on the user-space side and fixing > issues in various projects I can honestly say that most user-space is > sane and developers _care_ about doing things correctly. > >> So at the end of the day userspace will most likely completely drop >> support for the render node, simply for the reason that it became >> superfluous. You can just open up the primary node and get the same >> functionality. >> > I think you're being overly pessimistic here. This is coming from a > person who is often closer to the pessimistic side of things. > > As a middle ground how about we do the following: > - Get this series as-is, alongside > - picking the first three items from the grand master plan > - happy to start a discussion about QXL > - a patch for the Kconfig toggle, is coming in a few minutes > - happy to prod my distribution of choice (Arch) and work with them > > What do you think? > > Thanks > Emil >
On 2019/06/14, Koenig, Christian wrote: > Am 14.06.19 um 17:53 schrieb Emil Velikov: > > On 2019/06/14, Koenig, Christian wrote: > >> Am 14.06.19 um 14:09 schrieb Emil Velikov: > >>> On 2019/05/27, Emil Velikov wrote: > >>> [SNIP] > >>> Hi Christian, > >>> > >>> > >>> In the following, I would like to summarise and emphasize the need for > >>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes > >>> extra reading it. > >>> > >>> > >>> Today DRM drivers* do not make any distinction between primary and > >>> render node clients. > >> That is actually not 100% correct. We have a special case where a DRM > >> master is allowed to change the priority of render node clients. > >> > > Can you provide a link? I cannot find that code. > > See amdgpu_sched_ioctl(). > > >>> Thus for a render capable driver, any premise of > >>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. > >> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH > >> now is the right direction to take. > >> > > Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW > > ioctls. > > > > That aside, can you propose an alternative solution that addresses this > > and the second point just below? > > Give me a few days to work on this, it's already Friday 6pm here. > Great thanks. Fwiw I was asking for a ideas/proposal, was not expecting you to write the patches ;-) Emil
On 2019/06/14, Koenig, Christian wrote: > Am 14.06.19 um 17:53 schrieb Emil Velikov: > > On 2019/06/14, Koenig, Christian wrote: > >> Am 14.06.19 um 14:09 schrieb Emil Velikov: > >>> On 2019/05/27, Emil Velikov wrote: > >>> [SNIP] > >>> Hi Christian, > >>> > >>> > >>> In the following, I would like to summarise and emphasize the need for > >>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes > >>> extra reading it. > >>> > >>> > >>> Today DRM drivers* do not make any distinction between primary and > >>> render node clients. > >> That is actually not 100% correct. We have a special case where a DRM > >> master is allowed to change the priority of render node clients. > >> > > Can you provide a link? I cannot find that code. > > See amdgpu_sched_ioctl(). > > >>> Thus for a render capable driver, any premise of > >>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. > >> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH > >> now is the right direction to take. > >> > > Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW > > ioctls. > > > > That aside, can you propose an alternative solution that addresses this > > and the second point just below? > > Give me a few days to work on this, it's already Friday 6pm here. > Hi Christian, Any progress? As mentioned earlier, I'm OK with writing the patches although I would love to hear your plan. Thanks Emil
Am 20.06.19 um 18:30 schrieb Emil Velikov: > On 2019/06/14, Koenig, Christian wrote: >> Am 14.06.19 um 17:53 schrieb Emil Velikov: >>> On 2019/06/14, Koenig, Christian wrote: >>>> Am 14.06.19 um 14:09 schrieb Emil Velikov: >>>>> On 2019/05/27, Emil Velikov wrote: >>>>> [SNIP] >>>>> Hi Christian, >>>>> >>>>> >>>>> In the following, I would like to summarise and emphasize the need for >>>>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes >>>>> extra reading it. >>>>> >>>>> >>>>> Today DRM drivers* do not make any distinction between primary and >>>>> render node clients. >>>> That is actually not 100% correct. We have a special case where a DRM >>>> master is allowed to change the priority of render node clients. >>>> >>> Can you provide a link? I cannot find that code. >> See amdgpu_sched_ioctl(). >> >>>>> Thus for a render capable driver, any premise of >>>>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. >>>> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH >>>> now is the right direction to take. >>>> >>> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW >>> ioctls. >>> >>> That aside, can you propose an alternative solution that addresses this >>> and the second point just below? >> Give me a few days to work on this, it's already Friday 6pm here. >> > Hi Christian, > > Any progress? As mentioned earlier, I'm OK with writing the patches although > I would love to hear your plan. First of all I tried to disable DRM authentication completely with a kernel config option. Surprisingly that actually works out of the box at least on the AMDGPU stack. This effectively allows us to get rid of DRI2 and the related security problems. Only thing left for that is that I'm just not sure how to signal this to userspace so that the DDX wouldn't advertise DRI2 at all any more. As a next step I looked into if we can disable the command submission for DRM master. Turned out that this is relatively easy as well. All we have to do is to fix the bug Michel pointed out about KMS handles for display and let the DDX use a render node instead of the DRM master for Glamor. Still need to sync up with Michel and/or Marek whats the best way of doing this. The last step (and that's what I haven't tried yet) would be to disable DRM authentication for Navi even when it is still compiled into the kernel. But that is more or less just a typing exercise. Regards, Christian. > > Thanks > Emil
On 2019-06-21 9:12 a.m., Koenig, Christian wrote: > > First of all I tried to disable DRM authentication completely with a > kernel config option. Surprisingly that actually works out of the box at > least on the AMDGPU stack. > > This effectively allows us to get rid of DRI2 and the related security > problems. Only thing left for that is that I'm just not sure how to > signal this to userspace so that the DDX wouldn't advertise DRI2 at all > any more. FWIW, getting rid of DRI2 also needs to be discussed with amdgpu-pro OpenGL driver folks. > As a next step I looked into if we can disable the command submission > for DRM master. Turned out that this is relatively easy as well. > > All we have to do is to fix the bug Michel pointed out about KMS handles > for display I'm working on that, consider it fixed. > and let the DDX use a render node instead of the DRM master for Glamor. > Still need to sync up with Michel and/or Marek whats the best way of > doing this. My suggestion was to add a new variant of amdgpu_device_initialize. When the new variant is called, libdrm_amdgpu internally uses a render node for command submission etc. whenever possible.
Am 21.06.19 um 09:41 schrieb Michel Dänzer: > On 2019-06-21 9:12 a.m., Koenig, Christian wrote: >> First of all I tried to disable DRM authentication completely with a >> kernel config option. Surprisingly that actually works out of the box at >> least on the AMDGPU stack. >> >> This effectively allows us to get rid of DRI2 and the related security >> problems. Only thing left for that is that I'm just not sure how to >> signal this to userspace so that the DDX wouldn't advertise DRI2 at all >> any more. > FWIW, getting rid of DRI2 also needs to be discussed with amdgpu-pro > OpenGL driver folks. Good point, but I don't expect much problems from that direction. IIRC they where quite happy to not have to support DRI2 except for a very very old X server version. >> As a next step I looked into if we can disable the command submission >> for DRM master. Turned out that this is relatively easy as well. >> >> All we have to do is to fix the bug Michel pointed out about KMS handles >> for display > I'm working on that, consider it fixed. > > >> and let the DDX use a render node instead of the DRM master for Glamor. >> Still need to sync up with Michel and/or Marek whats the best way of >> doing this. > My suggestion was to add a new variant of amdgpu_device_initialize. When > the new variant is called, libdrm_amdgpu internally uses a render node > for command submission etc. whenever possible. Yeah, sounds like a plan to me. Christian.
On Fri, Jun 21, 2019 at 07:12:14AM +0000, Koenig, Christian wrote: > Am 20.06.19 um 18:30 schrieb Emil Velikov: > > On 2019/06/14, Koenig, Christian wrote: > >> Am 14.06.19 um 17:53 schrieb Emil Velikov: > >>> On 2019/06/14, Koenig, Christian wrote: > >>>> Am 14.06.19 um 14:09 schrieb Emil Velikov: > >>>>> On 2019/05/27, Emil Velikov wrote: > >>>>> [SNIP] > >>>>> Hi Christian, > >>>>> > >>>>> > >>>>> In the following, I would like to summarise and emphasize the need for > >>>>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes > >>>>> extra reading it. > >>>>> > >>>>> > >>>>> Today DRM drivers* do not make any distinction between primary and > >>>>> render node clients. > >>>> That is actually not 100% correct. We have a special case where a DRM > >>>> master is allowed to change the priority of render node clients. > >>>> > >>> Can you provide a link? I cannot find that code. > >> See amdgpu_sched_ioctl(). > >> > >>>>> Thus for a render capable driver, any premise of > >>>>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. > >>>> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH > >>>> now is the right direction to take. > >>>> > >>> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW > >>> ioctls. > >>> > >>> That aside, can you propose an alternative solution that addresses this > >>> and the second point just below? > >> Give me a few days to work on this, it's already Friday 6pm here. > >> > > Hi Christian, > > > > Any progress? As mentioned earlier, I'm OK with writing the patches although > > I would love to hear your plan. > > First of all I tried to disable DRM authentication completely with a > kernel config option. Surprisingly that actually works out of the box at > least on the AMDGPU stack. > > This effectively allows us to get rid of DRI2 and the related security > problems. Only thing left for that is that I'm just not sure how to > signal this to userspace so that the DDX wouldn't advertise DRI2 at all > any more. > > > As a next step I looked into if we can disable the command submission > for DRM master. Turned out that this is relatively easy as well. > > All we have to do is to fix the bug Michel pointed out about KMS handles > for display and let the DDX use a render node instead of the DRM master > for Glamor. Still need to sync up with Michel and/or Marek whats the > best way of doing this. > > > The last step (and that's what I haven't tried yet) would be to disable > DRM authentication for Navi even when it is still compiled into the > kernel. But that is more or less just a typing exercise. So just to get this straight, we're now full on embracing a subsystem split here: - amdgpu plans to ditch dri2/rendering on primary nodes - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH removal - others are just hanging in there somehow "best of both^W worlds" ftw? -Daniel
Am 21.06.19 um 11:09 schrieb Daniel Vetter: > On Fri, Jun 21, 2019 at 07:12:14AM +0000, Koenig, Christian wrote: >> Am 20.06.19 um 18:30 schrieb Emil Velikov: >>> On 2019/06/14, Koenig, Christian wrote: >>>> Am 14.06.19 um 17:53 schrieb Emil Velikov: >>>>> On 2019/06/14, Koenig, Christian wrote: >>>>>> Am 14.06.19 um 14:09 schrieb Emil Velikov: >>>>>>> On 2019/05/27, Emil Velikov wrote: >>>>>>> [SNIP] >>>>>>> Hi Christian, >>>>>>> >>>>>>> >>>>>>> In the following, I would like to summarise and emphasize the need for >>>>>>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes >>>>>>> extra reading it. >>>>>>> >>>>>>> >>>>>>> Today DRM drivers* do not make any distinction between primary and >>>>>>> render node clients. >>>>>> That is actually not 100% correct. We have a special case where a DRM >>>>>> master is allowed to change the priority of render node clients. >>>>>> >>>>> Can you provide a link? I cannot find that code. >>>> See amdgpu_sched_ioctl(). >>>> >>>>>>> Thus for a render capable driver, any premise of >>>>>>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. >>>>>> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH >>>>>> now is the right direction to take. >>>>>> >>>>> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW >>>>> ioctls. >>>>> >>>>> That aside, can you propose an alternative solution that addresses this >>>>> and the second point just below? >>>> Give me a few days to work on this, it's already Friday 6pm here. >>>> >>> Hi Christian, >>> >>> Any progress? As mentioned earlier, I'm OK with writing the patches although >>> I would love to hear your plan. >> First of all I tried to disable DRM authentication completely with a >> kernel config option. Surprisingly that actually works out of the box at >> least on the AMDGPU stack. >> >> This effectively allows us to get rid of DRI2 and the related security >> problems. Only thing left for that is that I'm just not sure how to >> signal this to userspace so that the DDX wouldn't advertise DRI2 at all >> any more. >> >> >> As a next step I looked into if we can disable the command submission >> for DRM master. Turned out that this is relatively easy as well. >> >> All we have to do is to fix the bug Michel pointed out about KMS handles >> for display and let the DDX use a render node instead of the DRM master >> for Glamor. Still need to sync up with Michel and/or Marek whats the >> best way of doing this. >> >> >> The last step (and that's what I haven't tried yet) would be to disable >> DRM authentication for Navi even when it is still compiled into the >> kernel. But that is more or less just a typing exercise. > So just to get this straight, we're now full on embracing a subsystem > split here: > - amdgpu plans to ditch dri2/rendering on primary nodes > - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH > removal > - others are just hanging in there somehow > > "best of both^W worlds" ftw? Yes, exactly! Think a step further, when this is upstream we can apply the DRM_AUTH removal to amdgpu as well. Because we simply made sure that userspace is using the render node for command submission and not the primary node. So nobody can go ahead and remove render node support any more :) Regards, Christian. > -Daniel
On Fri, Jun 21, 2019 at 11:25 AM Koenig, Christian <Christian.Koenig@amd.com> wrote: > > Am 21.06.19 um 11:09 schrieb Daniel Vetter: > > On Fri, Jun 21, 2019 at 07:12:14AM +0000, Koenig, Christian wrote: > >> Am 20.06.19 um 18:30 schrieb Emil Velikov: > >>> On 2019/06/14, Koenig, Christian wrote: > >>>> Am 14.06.19 um 17:53 schrieb Emil Velikov: > >>>>> On 2019/06/14, Koenig, Christian wrote: > >>>>>> Am 14.06.19 um 14:09 schrieb Emil Velikov: > >>>>>>> On 2019/05/27, Emil Velikov wrote: > >>>>>>> [SNIP] > >>>>>>> Hi Christian, > >>>>>>> > >>>>>>> > >>>>>>> In the following, I would like to summarise and emphasize the need for > >>>>>>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes > >>>>>>> extra reading it. > >>>>>>> > >>>>>>> > >>>>>>> Today DRM drivers* do not make any distinction between primary and > >>>>>>> render node clients. > >>>>>> That is actually not 100% correct. We have a special case where a DRM > >>>>>> master is allowed to change the priority of render node clients. > >>>>>> > >>>>> Can you provide a link? I cannot find that code. > >>>> See amdgpu_sched_ioctl(). > >>>> > >>>>>>> Thus for a render capable driver, any premise of > >>>>>>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. > >>>>>> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH > >>>>>> now is the right direction to take. > >>>>>> > >>>>> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW > >>>>> ioctls. > >>>>> > >>>>> That aside, can you propose an alternative solution that addresses this > >>>>> and the second point just below? > >>>> Give me a few days to work on this, it's already Friday 6pm here. > >>>> > >>> Hi Christian, > >>> > >>> Any progress? As mentioned earlier, I'm OK with writing the patches although > >>> I would love to hear your plan. > >> First of all I tried to disable DRM authentication completely with a > >> kernel config option. Surprisingly that actually works out of the box at > >> least on the AMDGPU stack. > >> > >> This effectively allows us to get rid of DRI2 and the related security > >> problems. Only thing left for that is that I'm just not sure how to > >> signal this to userspace so that the DDX wouldn't advertise DRI2 at all > >> any more. > >> > >> > >> As a next step I looked into if we can disable the command submission > >> for DRM master. Turned out that this is relatively easy as well. > >> > >> All we have to do is to fix the bug Michel pointed out about KMS handles > >> for display and let the DDX use a render node instead of the DRM master > >> for Glamor. Still need to sync up with Michel and/or Marek whats the > >> best way of doing this. > >> > >> > >> The last step (and that's what I haven't tried yet) would be to disable > >> DRM authentication for Navi even when it is still compiled into the > >> kernel. But that is more or less just a typing exercise. > > So just to get this straight, we're now full on embracing a subsystem > > split here: > > - amdgpu plans to ditch dri2/rendering on primary nodes > > - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH > > removal > > - others are just hanging in there somehow > > > > "best of both^W worlds" ftw? > > Yes, exactly! > > Think a step further, when this is upstream we can apply the DRM_AUTH > removal to amdgpu as well. > > Because we simply made sure that userspace is using the render node for > command submission and not the primary node. > > So nobody can go ahead and remove render node support any more :) How does this work? I thought the entire problem of DRM_AUTH removal for you was that it broke stuff, and you didn't want to deal with that. We still have that exact problem afaics ... old userspace doesn't simply vanish, even if you entirely nuke it going forward. Also, testing on the amdgpu stack isn't really testing a hole lot here, it's all the various silly compositors out there I'd expect to fall over. Will gbm/radeonsi/whatever just internally do all the necessary dma-buf import/export between the primary node and display node to keep this all working? -Daniel
Am 21.06.19 um 11:35 schrieb Daniel Vetter: > On Fri, Jun 21, 2019 at 11:25 AM Koenig, Christian > <Christian.Koenig@amd.com> wrote: >> Am 21.06.19 um 11:09 schrieb Daniel Vetter: >>> On Fri, Jun 21, 2019 at 07:12:14AM +0000, Koenig, Christian wrote: >>>> Am 20.06.19 um 18:30 schrieb Emil Velikov: >>>>> On 2019/06/14, Koenig, Christian wrote: >>>>>> Am 14.06.19 um 17:53 schrieb Emil Velikov: >>>>>>> On 2019/06/14, Koenig, Christian wrote: >>>>>>>> Am 14.06.19 um 14:09 schrieb Emil Velikov: >>>>>>>>> On 2019/05/27, Emil Velikov wrote: >>>>>>>>> [SNIP] >>>>>>>>> Hi Christian, >>>>>>>>> >>>>>>>>> >>>>>>>>> In the following, I would like to summarise and emphasize the need for >>>>>>>>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes >>>>>>>>> extra reading it. >>>>>>>>> >>>>>>>>> >>>>>>>>> Today DRM drivers* do not make any distinction between primary and >>>>>>>>> render node clients. >>>>>>>> That is actually not 100% correct. We have a special case where a DRM >>>>>>>> master is allowed to change the priority of render node clients. >>>>>>>> >>>>>>> Can you provide a link? I cannot find that code. >>>>>> See amdgpu_sched_ioctl(). >>>>>> >>>>>>>>> Thus for a render capable driver, any premise of >>>>>>>>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. >>>>>>>> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH >>>>>>>> now is the right direction to take. >>>>>>>> >>>>>>> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW >>>>>>> ioctls. >>>>>>> >>>>>>> That aside, can you propose an alternative solution that addresses this >>>>>>> and the second point just below? >>>>>> Give me a few days to work on this, it's already Friday 6pm here. >>>>>> >>>>> Hi Christian, >>>>> >>>>> Any progress? As mentioned earlier, I'm OK with writing the patches although >>>>> I would love to hear your plan. >>>> First of all I tried to disable DRM authentication completely with a >>>> kernel config option. Surprisingly that actually works out of the box at >>>> least on the AMDGPU stack. >>>> >>>> This effectively allows us to get rid of DRI2 and the related security >>>> problems. Only thing left for that is that I'm just not sure how to >>>> signal this to userspace so that the DDX wouldn't advertise DRI2 at all >>>> any more. >>>> >>>> >>>> As a next step I looked into if we can disable the command submission >>>> for DRM master. Turned out that this is relatively easy as well. >>>> >>>> All we have to do is to fix the bug Michel pointed out about KMS handles >>>> for display and let the DDX use a render node instead of the DRM master >>>> for Glamor. Still need to sync up with Michel and/or Marek whats the >>>> best way of doing this. >>>> >>>> >>>> The last step (and that's what I haven't tried yet) would be to disable >>>> DRM authentication for Navi even when it is still compiled into the >>>> kernel. But that is more or less just a typing exercise. >>> So just to get this straight, we're now full on embracing a subsystem >>> split here: >>> - amdgpu plans to ditch dri2/rendering on primary nodes >>> - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH >>> removal >>> - others are just hanging in there somehow >>> >>> "best of both^W worlds" ftw? >> Yes, exactly! >> >> Think a step further, when this is upstream we can apply the DRM_AUTH >> removal to amdgpu as well. >> >> Because we simply made sure that userspace is using the render node for >> command submission and not the primary node. >> >> So nobody can go ahead and remove render node support any more :) > How does this work? I thought the entire problem of DRM_AUTH removal > for you was that it broke stuff, and you didn't want to deal with > that. Yeah, that is indeed still true. It's just that we have done way to many projects with radeon/amdgpu and customized userspace stuff. > We still have that exact problem afaics ... old userspace > doesn't simply vanish, even if you entirely nuke it going forward. Well old userspace doesn't work with new hardware either. So the idea is to keep old userspace for old hardware working, but only disable old stuff for new hardware. > Also, testing on the amdgpu stack isn't really testing a hole lot > here, it's all the various silly compositors out there I'd expect to > fall over. Will gbm/radeonsi/whatever just internally do all the > necessary dma-buf import/export between the primary node and display > node to keep this all working? Yes, at least that's how I understand Michel's idea. We keep both file descriptors for primary and render node around at the same time anyway. So the change is actually not that much. This also solves the problem that people are running things as root, since we now always use the render node and never the primary node for everything except KMS. Christian. > -Daniel
On 2019/06/21, Daniel Vetter wrote: > On Fri, Jun 21, 2019 at 11:25 AM Koenig, Christian > <Christian.Koenig@amd.com> wrote: > > > > Am 21.06.19 um 11:09 schrieb Daniel Vetter: > > > On Fri, Jun 21, 2019 at 07:12:14AM +0000, Koenig, Christian wrote: > > >> Am 20.06.19 um 18:30 schrieb Emil Velikov: > > >>> On 2019/06/14, Koenig, Christian wrote: > > >>>> Am 14.06.19 um 17:53 schrieb Emil Velikov: > > >>>>> On 2019/06/14, Koenig, Christian wrote: > > >>>>>> Am 14.06.19 um 14:09 schrieb Emil Velikov: > > >>>>>>> On 2019/05/27, Emil Velikov wrote: > > >>>>>>> [SNIP] > > >>>>>>> Hi Christian, > > >>>>>>> > > >>>>>>> > > >>>>>>> In the following, I would like to summarise and emphasize the need for > > >>>>>>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes > > >>>>>>> extra reading it. > > >>>>>>> > > >>>>>>> > > >>>>>>> Today DRM drivers* do not make any distinction between primary and > > >>>>>>> render node clients. > > >>>>>> That is actually not 100% correct. We have a special case where a DRM > > >>>>>> master is allowed to change the priority of render node clients. > > >>>>>> > > >>>>> Can you provide a link? I cannot find that code. > > >>>> See amdgpu_sched_ioctl(). > > >>>> > > >>>>>>> Thus for a render capable driver, any premise of > > >>>>>>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. > > >>>>>> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH > > >>>>>> now is the right direction to take. > > >>>>>> > > >>>>> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW > > >>>>> ioctls. > > >>>>> > > >>>>> That aside, can you propose an alternative solution that addresses this > > >>>>> and the second point just below? > > >>>> Give me a few days to work on this, it's already Friday 6pm here. > > >>>> > > >>> Hi Christian, > > >>> > > >>> Any progress? As mentioned earlier, I'm OK with writing the patches although > > >>> I would love to hear your plan. > > >> First of all I tried to disable DRM authentication completely with a > > >> kernel config option. Surprisingly that actually works out of the box at > > >> least on the AMDGPU stack. > > >> > > >> This effectively allows us to get rid of DRI2 and the related security > > >> problems. Only thing left for that is that I'm just not sure how to > > >> signal this to userspace so that the DDX wouldn't advertise DRI2 at all > > >> any more. > > >> > > >> > > >> As a next step I looked into if we can disable the command submission > > >> for DRM master. Turned out that this is relatively easy as well. > > >> > > >> All we have to do is to fix the bug Michel pointed out about KMS handles > > >> for display and let the DDX use a render node instead of the DRM master > > >> for Glamor. Still need to sync up with Michel and/or Marek whats the > > >> best way of doing this. > > >> > > >> > > >> The last step (and that's what I haven't tried yet) would be to disable > > >> DRM authentication for Navi even when it is still compiled into the > > >> kernel. But that is more or less just a typing exercise. > > > So just to get this straight, we're now full on embracing a subsystem > > > split here: > > > - amdgpu plans to ditch dri2/rendering on primary nodes > > > - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH > > > removal > > > - others are just hanging in there somehow > > > > > > "best of both^W worlds" ftw? > > > > Yes, exactly! > > > > Think a step further, when this is upstream we can apply the DRM_AUTH > > removal to amdgpu as well. > > So this is effectively what I suggested/planned with a couple of caveats: - making amdgpu behave differently from the rest of drm ;-( - having the KConfig amdgpu only and merged before this DRM_AUTH While I suggested: - keeping amdgpu consistent with the rest - exposing the KConfig option for the whole of the kernel, and - merging it alongside this work So I effectively waited for weeks, instead of simply going ahead and writing/submitting patches. That's a bit unfortunate... > > Because we simply made sure that userspace is using the render node for > > command submission and not the primary node. > > > > So nobody can go ahead and remove render node support any more :) > > How does this work? I thought the entire problem of DRM_AUTH removal > for you was that it broke stuff, and you didn't want to deal with > that. We still have that exact problem afaics ... old userspace > doesn't simply vanish, even if you entirely nuke it going forward. > > Also, testing on the amdgpu stack isn't really testing a hole lot > here, it's all the various silly compositors out there I'd expect to > fall over. Will gbm/radeonsi/whatever just internally do all the > necessary dma-buf import/export between the primary node and display > node to keep this all working? If I understood Christian, feel free to correct me, the fact that my earlier patch broke RADV was not the argument. The problem was was the patch does. In particular, that user-space will "remove" render nodes. I'm really sad that amdgpu insists on standing out, hope one day it will converge. Yet since all other maintainers are ok with the series, I'll start merging patches in a few hours. I'm really sad that amdgpu wants to stand out, hope it will converge sooner rather than later. Christian, how would you like amdgpu handled - with a separate flag in the driver or shall we special case amdgpu within core drm? Thanks Emil
Am 21.06.19 um 12:20 schrieb Emil Velikov: > On 2019/06/21, Daniel Vetter wrote: >> On Fri, Jun 21, 2019 at 11:25 AM Koenig, Christian >> <Christian.Koenig@amd.com> wrote: >>> Am 21.06.19 um 11:09 schrieb Daniel Vetter: >>>> On Fri, Jun 21, 2019 at 07:12:14AM +0000, Koenig, Christian wrote: >>>>> Am 20.06.19 um 18:30 schrieb Emil Velikov: >>>>>> On 2019/06/14, Koenig, Christian wrote: >>>>>>> Am 14.06.19 um 17:53 schrieb Emil Velikov: >>>>>>>> On 2019/06/14, Koenig, Christian wrote: >>>>>>>>> Am 14.06.19 um 14:09 schrieb Emil Velikov: >>>>>>>>>> On 2019/05/27, Emil Velikov wrote: >>>>>>>>>> [SNIP] >>>>>>>>>> Hi Christian, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> In the following, I would like to summarise and emphasize the need for >>>>>>>>>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes >>>>>>>>>> extra reading it. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Today DRM drivers* do not make any distinction between primary and >>>>>>>>>> render node clients. >>>>>>>>> That is actually not 100% correct. We have a special case where a DRM >>>>>>>>> master is allowed to change the priority of render node clients. >>>>>>>>> >>>>>>>> Can you provide a link? I cannot find that code. >>>>>>> See amdgpu_sched_ioctl(). >>>>>>> >>>>>>>>>> Thus for a render capable driver, any premise of >>>>>>>>>> separation, security or otherwise imposed via DRM_AUTH is a fallacy. >>>>>>>>> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH >>>>>>>>> now is the right direction to take. >>>>>>>>> >>>>>>>> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW >>>>>>>> ioctls. >>>>>>>> >>>>>>>> That aside, can you propose an alternative solution that addresses this >>>>>>>> and the second point just below? >>>>>>> Give me a few days to work on this, it's already Friday 6pm here. >>>>>>> >>>>>> Hi Christian, >>>>>> >>>>>> Any progress? As mentioned earlier, I'm OK with writing the patches although >>>>>> I would love to hear your plan. >>>>> First of all I tried to disable DRM authentication completely with a >>>>> kernel config option. Surprisingly that actually works out of the box at >>>>> least on the AMDGPU stack. >>>>> >>>>> This effectively allows us to get rid of DRI2 and the related security >>>>> problems. Only thing left for that is that I'm just not sure how to >>>>> signal this to userspace so that the DDX wouldn't advertise DRI2 at all >>>>> any more. >>>>> >>>>> >>>>> As a next step I looked into if we can disable the command submission >>>>> for DRM master. Turned out that this is relatively easy as well. >>>>> >>>>> All we have to do is to fix the bug Michel pointed out about KMS handles >>>>> for display and let the DDX use a render node instead of the DRM master >>>>> for Glamor. Still need to sync up with Michel and/or Marek whats the >>>>> best way of doing this. >>>>> >>>>> >>>>> The last step (and that's what I haven't tried yet) would be to disable >>>>> DRM authentication for Navi even when it is still compiled into the >>>>> kernel. But that is more or less just a typing exercise. >>>> So just to get this straight, we're now full on embracing a subsystem >>>> split here: >>>> - amdgpu plans to ditch dri2/rendering on primary nodes >>>> - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH >>>> removal >>>> - others are just hanging in there somehow >>>> >>>> "best of both^W worlds" ftw? >>> Yes, exactly! >>> >>> Think a step further, when this is upstream we can apply the DRM_AUTH >>> removal to amdgpu as well. >>> > So this is effectively what I suggested/planned with a couple of caveats: > - making amdgpu behave differently from the rest of drm ;-( > - having the KConfig amdgpu only and merged before this DRM_AUTH No this should apply to all drivers, not just amdgpu. > While I suggested: > - keeping amdgpu consistent with the rest > - exposing the KConfig option for the whole of the kernel, and > - merging it alongside this work > > So I effectively waited for weeks, instead of simply going ahead and > writing/submitting patches. That's a bit unfortunate... > >>> Because we simply made sure that userspace is using the render node for >>> command submission and not the primary node. >>> >>> So nobody can go ahead and remove render node support any more :) >> How does this work? I thought the entire problem of DRM_AUTH removal >> for you was that it broke stuff, and you didn't want to deal with >> that. We still have that exact problem afaics ... old userspace >> doesn't simply vanish, even if you entirely nuke it going forward. >> >> Also, testing on the amdgpu stack isn't really testing a hole lot >> here, it's all the various silly compositors out there I'd expect to >> fall over. Will gbm/radeonsi/whatever just internally do all the >> necessary dma-buf import/export between the primary node and display >> node to keep this all working? > If I understood Christian, feel free to correct me, the fact that my > earlier patch broke RADV was not the argument. The problem was was the > patch does. Well partially. That RADV broke is unfortunate, but we have done so many customized userspace stuff both based on Mesa/DDX as well as closed source components that it is just highly likely that we would break something else as well. > In particular, that user-space will "remove" render nodes. Yes, that is my main concern here. You basically make render nodes superfluously. That gives userspace all legitimization to also remove support for them. That is not stupid or evil or whatever, userspace would just follow the kernel design. > I'm really sad that amdgpu insists on standing out, hope one day it will > converge. Yet since all other maintainers are ok with the series, I'll > start merging patches in a few hours. I'm really sad that amdgpu wants > to stand out, hope it will converge sooner rather than later. > > Christian, how would you like amdgpu handled - with a separate flag in > the driver or shall we special case amdgpu within core drm? No, I ask you to please stick around DRM_AUTH for at least amdgpu and radeon. And NOT enable any render node functionality for them on the primary node. Thanks, Christian. > > Thanks > Emil
On 2019/06/21, Koenig, Christian wrote: > No this should apply to all drivers, not just amdgpu. > > > While I suggested: > > - keeping amdgpu consistent with the rest > > - exposing the KConfig option for the whole of the kernel, and > > - merging it alongside this work > > > > So I effectively waited for weeks, instead of simply going ahead and > > writing/submitting patches. That's a bit unfortunate... > > > >>> Because we simply made sure that userspace is using the render node for > >>> command submission and not the primary node. > >>> > >>> So nobody can go ahead and remove render node support any more :) > >> How does this work? I thought the entire problem of DRM_AUTH removal > >> for you was that it broke stuff, and you didn't want to deal with > >> that. We still have that exact problem afaics ... old userspace > >> doesn't simply vanish, even if you entirely nuke it going forward. > >> > >> Also, testing on the amdgpu stack isn't really testing a hole lot > >> here, it's all the various silly compositors out there I'd expect to > >> fall over. Will gbm/radeonsi/whatever just internally do all the > >> necessary dma-buf import/export between the primary node and display > >> node to keep this all working? > > If I understood Christian, feel free to correct me, the fact that my > > earlier patch broke RADV was not the argument. The problem was was the > > patch does. > > Well partially. That RADV broke is unfortunate, but we have done so many > customized userspace stuff both based on Mesa/DDX as well as closed > source components that it is just highly likely that we would break > something else as well. > As an engineer I like to work with tangibles. The highly likely part is probable, but as mentioned multiple times the series allows for a _dead_ trivial way to address any such problems. > > In particular, that user-space will "remove" render nodes. > > Yes, that is my main concern here. You basically make render nodes > superfluously. That gives userspace all legitimization to also remove > support for them. That is not stupid or evil or whatever, userspace > would just follow the kernel design. > Do you have an example of userspace doing such an extremely silly thing? It does seem like suspect you're a couple of steps beyond overcautious, perhaps rightfully so. Maybe you've seen some closed-source user-space going crazy? Or any other projects? In other words, being cautious is great, but without references of misuse it's very hard for others to draw the full picture. > > I'm really sad that amdgpu insists on standing out, hope one day it will > > converge. Yet since all other maintainers are ok with the series, I'll > > start merging patches in a few hours. I'm really sad that amdgpu wants > > to stand out, hope it will converge sooner rather than later. > > > > Christian, how would you like amdgpu handled - with a separate flag in > > the driver or shall we special case amdgpu within core drm? > > No, I ask you to please stick around DRM_AUTH for at least amdgpu and > radeon. And NOT enable any render node functionality for them on the > primary node. > My question is how do you want this handled: - DRM_PLEASE_FORCE_AUTH - added to AMDGPU/RADEON, or - driver_name == amdgpu, in core DRM. Thanks Emil
On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian <Christian.Koenig@amd.com> wrote: > Am 21.06.19 um 12:20 schrieb Emil Velikov: > > In particular, that user-space will "remove" render nodes. > > Yes, that is my main concern here. You basically make render nodes > superfluously. That gives userspace all legitimization to also remove > support for them. That is not stupid or evil or whatever, userspace > would just follow the kernel design. This already happened. At least for kms-only display socs we had to hide the separate render node (and there you really have to deal with the separate render node, because it's a distinct driver) entirely within gbm/mesa. So if you want to depracate render functionality on primary nodes, you _have_ to do that hiding in userspace. Or you'll break a lot of compositors everywhere. Just testing -amdgpu doesn't really prove anything here. So you won't move the larger ecosystem with this at all, that ship sailed. At that point this all feels like a bikeshed, and for a bikeshed I don't care what the color is we pick, as long as they're all painted the same. Once we picked a color it's a simple technical matter of how to roll it out, using Kconfig options, or only enabling on new hw, or "merge and fix the regressions as they come" or any of the other well proven paths forward. So if you want to do this, please start with the mesa side work (as the biggest userspace, not all of it) with patches to redirect all primary node rendering to render nodes. The code is there already for socs, just needs to be rolled out more. Or we go with the DRM_AUTH horrors. Or a 3rd option, I really don't care which it is, as long as its consistent. tldr; consistent color choice on this bikeshed please. Thanks, Daniel
Am 21.06.19 um 12:53 schrieb Emil Velikov: > On 2019/06/21, Koenig, Christian wrote: >> [SNIP] >> Well partially. That RADV broke is unfortunate, but we have done so many >> customized userspace stuff both based on Mesa/DDX as well as closed >> source components that it is just highly likely that we would break >> something else as well. >> > As an engineer I like to work with tangibles. The highly likely part is > probable, but as mentioned multiple times the series allows for a _dead_ > trivial way to address any such problems. Well to clarify my concern is that this won't be so trivial. You implemented on workaround for one specific case and it is perfectly possible that for other cases we would have to completely revert the removal of DRM_AUTH. >>> In particular, that user-space will "remove" render nodes. >> Yes, that is my main concern here. You basically make render nodes >> superfluously. That gives userspace all legitimization to also remove >> support for them. That is not stupid or evil or whatever, userspace >> would just follow the kernel design. >> > Do you have an example of userspace doing such an extremely silly thing? > It does seem like suspect you're a couple of steps beyond overcautious, > perhaps rightfully so. Maybe you've seen some closed-source user-space > going crazy? Or any other projects? The key point is that I don't think this is silly or strange or crazy at all. See the kernel defines the interface userspace can and should use. When the kernel defines that everything will work with the primary node it is perfectly valid for userspace to drop support for the render node. I mean why should they keep this? Just because we tell them not to do this? > In other words, being cautious is great, but without references of > misuse it's very hard for others to draw the full picture. > >>> I'm really sad that amdgpu insists on standing out, hope one day it will >>> converge. Yet since all other maintainers are ok with the series, I'll >>> start merging patches in a few hours. I'm really sad that amdgpu wants >>> to stand out, hope it will converge sooner rather than later. >>> >>> Christian, how would you like amdgpu handled - with a separate flag in >>> the driver or shall we special case amdgpu within core drm? >> No, I ask you to please stick around DRM_AUTH for at least amdgpu and >> radeon. And NOT enable any render node functionality for them on the >> primary node. >> > My question is how do you want this handled: > - DRM_PLEASE_FORCE_AUTH - added to AMDGPU/RADEON, or > - driver_name == amdgpu, in core DRM. I want to keep DRM_AUTH in amdgpu and radeon for at least the next 5-10 years. At least until we have established that nobody is using the primary node for command submission any more. Plus some grace time to make sure that this has been spread enough. Regards, Christian. > > > Thanks > Emil
Am 21.06.19 um 13:03 schrieb Daniel Vetter: > On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian > <Christian.Koenig@amd.com> wrote: >> Am 21.06.19 um 12:20 schrieb Emil Velikov: >>> In particular, that user-space will "remove" render nodes. >> Yes, that is my main concern here. You basically make render nodes >> superfluously. That gives userspace all legitimization to also remove >> support for them. That is not stupid or evil or whatever, userspace >> would just follow the kernel design. > This already happened. At least for kms-only display socs we had to > hide the separate render node (and there you really have to deal with > the separate render node, because it's a distinct driver) entirely > within gbm/mesa. Ok, that is something I didn't knew before and is rather interesting. > So if you want to depracate render functionality on primary nodes, you > _have_ to do that hiding in userspace. Or you'll break a lot of > compositors everywhere. Just testing -amdgpu doesn't really prove > anything here. So you won't move the larger ecosystem with this at > all, that ship sailed. So what else can we do? That sounds like you suggest we should completely drop render node functionality. I mean it's not my preferred option, but certainly something that everybody could do. My primary concern is that we somehow need to get rid of thinks like GEM flink and all that other crufty stuff we still have around on the primary node (we should probably make a list of that). Switching everything over to render nodes just sounded like the best alternative so far to archive that. > At that point this all feels like a bikeshed, and for a bikeshed I > don't care what the color is we pick, as long as they're all painted > the same. > > Once we picked a color it's a simple technical matter of how to roll > it out, using Kconfig options, or only enabling on new hw, or "merge > and fix the regressions as they come" or any of the other well proven > paths forward. Yeah, the problem is I don't see an option which currently works for everyone. I absolutely need a grace time of multiple years until we can apply this to amdgpu/radeon. And that under the prerequisite to have a plan to somehow enable that functionality now to make it at least painful for userspace to rely on hack around that. > So if you want to do this, please start with the mesa side work (as > the biggest userspace, not all of it) with patches to redirect all > primary node rendering to render nodes. The code is there already for > socs, just needs to be rolled out more. Or we go with the DRM_AUTH > horrors. Or a 3rd option, I really don't care which it is, as long as > its consistent. How about this: 1. We keep DRM_AUTH around for amdgpu/radeon for now. 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N. This also works as a workaround for old RADV. 3. We start to work on further removing old cruft from the primary node. Possible the Kconfig option I suggested to disable GEM flink. Regards, Christian. > > tldr; consistent color choice on this bikeshed please. > > Thanks, Daniel
On Fri, Jun 21, 2019 at 1:37 PM Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > > Am 21.06.19 um 13:03 schrieb Daniel Vetter: > > On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian > > <Christian.Koenig@amd.com> wrote: > >> Am 21.06.19 um 12:20 schrieb Emil Velikov: > >>> In particular, that user-space will "remove" render nodes. > >> Yes, that is my main concern here. You basically make render nodes > >> superfluously. That gives userspace all legitimization to also remove > >> support for them. That is not stupid or evil or whatever, userspace > >> would just follow the kernel design. > > This already happened. At least for kms-only display socs we had to > > hide the separate render node (and there you really have to deal with > > the separate render node, because it's a distinct driver) entirely > > within gbm/mesa. > > Ok, that is something I didn't knew before and is rather interesting. > > > So if you want to depracate render functionality on primary nodes, you > > _have_ to do that hiding in userspace. Or you'll break a lot of > > compositors everywhere. Just testing -amdgpu doesn't really prove > > anything here. So you won't move the larger ecosystem with this at > > all, that ship sailed. > > So what else can we do? That sounds like you suggest we should > completely drop render node functionality. > > I mean it's not my preferred option, but certainly something that > everybody could do. > > My primary concern is that we somehow need to get rid of thinks like GEM > flink and all that other crufty stuff we still have around on the > primary node (we should probably make a list of that). > > Switching everything over to render nodes just sounded like the best > alternative so far to archive that. tbh I do like that plan too, but it's a lot more work. And I think to have any push whatsoever we probably need to roll it out in gbm as a hack to keep things going. But probably not enough. I also think that at least some compositors will bother to do the right thing, and actually bother with render nodes and all that correctly. It's just that there's way more which dont. Also for server rendering it'll be render nodes all the way down I hope (or we need to seriously educate cloud people about the permissions they set on their default images, and distros on how this cloud stuff should work. > > At that point this all feels like a bikeshed, and for a bikeshed I > > don't care what the color is we pick, as long as they're all painted > > the same. > > > > Once we picked a color it's a simple technical matter of how to roll > > it out, using Kconfig options, or only enabling on new hw, or "merge > > and fix the regressions as they come" or any of the other well proven > > paths forward. > > Yeah, the problem is I don't see an option which currently works for > everyone. > > I absolutely need a grace time of multiple years until we can apply this > to amdgpu/radeon. Yeah that's what I meant with "pick a color, pick a way to roll it out". "enable for new hw, roll out years and years later" is one of the options for roll out. > And that under the prerequisite to have a plan to somehow enable that > functionality now to make it at least painful for userspace to rely on > hack around that. > > > So if you want to do this, please start with the mesa side work (as > > the biggest userspace, not all of it) with patches to redirect all > > primary node rendering to render nodes. The code is there already for > > socs, just needs to be rolled out more. Or we go with the DRM_AUTH > > horrors. Or a 3rd option, I really don't care which it is, as long as > > its consistent. > > How about this: > 1. We keep DRM_AUTH around for amdgpu/radeon for now. > 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N. > This also works as a workaround for old RADV. > 3. We start to work on further removing old cruft from the primary node. > Possible the Kconfig option I suggested to disable GEM flink. Hm I'm not worried about flink really. It's gem_open which is the security gap, and that one has to stay master-only forever. I guess we could try to disable that so people have to deal with dma-buf (and once you have that render nodes become a lot easier to use). But then I still think we have drivers which don't do dma-buf self-import, so again we're stuck. So maybe first step is to just roll out a default self-import dma-buf support for every gem driver. Then start ditching flink/gem_open. Then start ditching render support on primary nodes. Every step in the way taking a few years of prodding userspace plus even more years to wait until it's all gone. Another option would be to extract the kms stuff from primary nodes, but we've tried that with control nodes. Until I realized a few years back that with control nodes it's impossible to get any rendered buffer in there at all, so useless, and I removed it. No one ever complained. So yeah would be really nice if we could fix this, but the universe conspires against us too much it seems. Hence the fallback of "please at least lets aim for a consistent color for this mess, whatever it is". Cheers, Daniel
On 2019/06/21, Koenig, Christian wrote: > Am 21.06.19 um 12:53 schrieb Emil Velikov: > > On 2019/06/21, Koenig, Christian wrote: > >> [SNIP] > >> Well partially. That RADV broke is unfortunate, but we have done so many > >> customized userspace stuff both based on Mesa/DDX as well as closed > >> source components that it is just highly likely that we would break > >> something else as well. > >> > > As an engineer I like to work with tangibles. The highly likely part is > > probable, but as mentioned multiple times the series allows for a _dead_ > > trivial way to address any such problems. > > Well to clarify my concern is that this won't be so trivial. > > You implemented on workaround for one specific case and it is perfectly > possible that for other cases we would have to completely revert the > removal of DRM_AUTH. > I would encourage you to take a closer look at my patch and point out how parcicular cases cannot be handled. > >>> In particular, that user-space will "remove" render nodes. > >> Yes, that is my main concern here. You basically make render nodes > >> superfluously. That gives userspace all legitimization to also remove > >> support for them. That is not stupid or evil or whatever, userspace > >> would just follow the kernel design. > >> > > Do you have an example of userspace doing such an extremely silly thing? > > It does seem like suspect you're a couple of steps beyond overcautious, > > perhaps rightfully so. Maybe you've seen some closed-source user-space > > going crazy? Or any other projects? > > The key point is that I don't think this is silly or strange or crazy at > all. See the kernel defines the interface userspace can and should use. > > When the kernel defines that everything will work with the primary node > it is perfectly valid for userspace to drop support for the render node. > > I mean why should they keep this? Just because we tell them not to do this? > From your experiense, do user-space developers care about doing (or generally do) the right thing? In either case, as pointed already the cat is out of the bag - has been for years, and if developers did behave as you describe them they would have "removed" render nodes already. > > In other words, being cautious is great, but without references of > > misuse it's very hard for others to draw the full picture. > > > >>> I'm really sad that amdgpu insists on standing out, hope one day it will > >>> converge. Yet since all other maintainers are ok with the series, I'll > >>> start merging patches in a few hours. I'm really sad that amdgpu wants > >>> to stand out, hope it will converge sooner rather than later. > >>> > >>> Christian, how would you like amdgpu handled - with a separate flag in > >>> the driver or shall we special case amdgpu within core drm? > >> No, I ask you to please stick around DRM_AUTH for at least amdgpu and > >> radeon. And NOT enable any render node functionality for them on the > >> primary node. > >> > > My question is how do you want this handled: > > - DRM_PLEASE_FORCE_AUTH - added to AMDGPU/RADEON, or > > - driver_name == amdgpu, in core DRM. > > I want to keep DRM_AUTH in amdgpu and radeon for at least the next 5-10 > years. > Believe we're all fully aware of that fact. I'm asking which _approach_ you prefer. That said, I'll send a new patch/series and we'll nitpick it there. Thanks -Emil
On Fri, Jun 21, 2019 at 1:50 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Fri, Jun 21, 2019 at 1:37 PM Christian König > <ckoenig.leichtzumerken@gmail.com> wrote: > > > > Am 21.06.19 um 13:03 schrieb Daniel Vetter: > > > On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian > > > <Christian.Koenig@amd.com> wrote: > > >> Am 21.06.19 um 12:20 schrieb Emil Velikov: > > >>> In particular, that user-space will "remove" render nodes. > > >> Yes, that is my main concern here. You basically make render nodes > > >> superfluously. That gives userspace all legitimization to also remove > > >> support for them. That is not stupid or evil or whatever, userspace > > >> would just follow the kernel design. > > > This already happened. At least for kms-only display socs we had to > > > hide the separate render node (and there you really have to deal with > > > the separate render node, because it's a distinct driver) entirely > > > within gbm/mesa. > > > > Ok, that is something I didn't knew before and is rather interesting. > > > > > So if you want to depracate render functionality on primary nodes, you > > > _have_ to do that hiding in userspace. Or you'll break a lot of > > > compositors everywhere. Just testing -amdgpu doesn't really prove > > > anything here. So you won't move the larger ecosystem with this at > > > all, that ship sailed. > > > > So what else can we do? That sounds like you suggest we should > > completely drop render node functionality. > > > > I mean it's not my preferred option, but certainly something that > > everybody could do. > > > > My primary concern is that we somehow need to get rid of thinks like GEM > > flink and all that other crufty stuff we still have around on the > > primary node (we should probably make a list of that). > > > > Switching everything over to render nodes just sounded like the best > > alternative so far to archive that. > > tbh I do like that plan too, but it's a lot more work. And I think to > have any push whatsoever we probably need to roll it out in gbm as a > hack to keep things going. But probably not enough. > > I also think that at least some compositors will bother to do the > right thing, and actually bother with render nodes and all that > correctly. It's just that there's way more which dont. > > Also for server rendering it'll be render nodes all the way down I > hope (or we need to seriously educate cloud people about the > permissions they set on their default images, and distros on how this > cloud stuff should work. > > > > At that point this all feels like a bikeshed, and for a bikeshed I > > > don't care what the color is we pick, as long as they're all painted > > > the same. > > > > > > Once we picked a color it's a simple technical matter of how to roll > > > it out, using Kconfig options, or only enabling on new hw, or "merge > > > and fix the regressions as they come" or any of the other well proven > > > paths forward. > > > > Yeah, the problem is I don't see an option which currently works for > > everyone. > > > > I absolutely need a grace time of multiple years until we can apply this > > to amdgpu/radeon. > > Yeah that's what I meant with "pick a color, pick a way to roll it > out". "enable for new hw, roll out years and years later" is one of > the options for roll out. > > > And that under the prerequisite to have a plan to somehow enable that > > functionality now to make it at least painful for userspace to rely on > > hack around that. > > > > > So if you want to do this, please start with the mesa side work (as > > > the biggest userspace, not all of it) with patches to redirect all > > > primary node rendering to render nodes. The code is there already for > > > socs, just needs to be rolled out more. Or we go with the DRM_AUTH > > > horrors. Or a 3rd option, I really don't care which it is, as long as > > > its consistent. > > > > How about this: > > 1. We keep DRM_AUTH around for amdgpu/radeon for now. > > 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N. > > This also works as a workaround for old RADV. > > 3. We start to work on further removing old cruft from the primary node. > > Possible the Kconfig option I suggested to disable GEM flink. > > Hm I'm not worried about flink really. It's gem_open which is the > security gap, and that one has to stay master-only forever. I guess we > could try to disable that so people have to deal with dma-buf (and > once you have that render nodes become a lot easier to use). But then > I still think we have drivers which don't do dma-buf self-import, so > again we're stuck. So maybe first step is to just roll out a default > self-import dma-buf support for every gem driver. Then start ditching > flink/gem_open. Then start ditching render support on primary nodes. > Every step in the way taking a few years of prodding userspace plus > even more years to wait until it's all gone. I forgot one step here: I think we even still have drivers without render node support. As long as those exists (and are still relevant) then userspace needs primary node rendering + flink code anyway. And they're not going to be happy about us telling them to add more. So I think that would need to be fixed first. Hence rough plan: - Make sure all gem drivers with rendering have render nodes. - Roll out self-import of dma-buf for all gem drivers (we can do that with 0 driver support, it's like flink). - Roll out mesa gbm for everyone to ignore primary nodes for rendering as much as possible. Maybe needs more gbm work so that compositors can ask for the display drmfd and the render drmfd. - wait. like really long time :-/ - slowly deprecate flink for new hw as additional forcing function to get people to move to dma-buf and render nodes - wait even longer - ditch rendering on primary nodes. Lots of work, and I really mean _lots_, but I think this has a chance of actually working. -Daniel > Another option would be to extract the kms stuff from primary nodes, > but we've tried that with control nodes. Until I realized a few years > back that with control nodes it's impossible to get any rendered > buffer in there at all, so useless, and I removed it. No one ever > complained. > > So yeah would be really nice if we could fix this, but the universe > conspires against us too much it seems. Hence the fallback of "please > at least lets aim for a consistent color for this mess, whatever it > is". > > Cheers, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On 2019/06/21, Daniel Vetter wrote: > On Fri, Jun 21, 2019 at 1:50 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > On Fri, Jun 21, 2019 at 1:37 PM Christian König > > <ckoenig.leichtzumerken@gmail.com> wrote: > > > > > > Am 21.06.19 um 13:03 schrieb Daniel Vetter: > > > > On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian > > > > <Christian.Koenig@amd.com> wrote: > > > >> Am 21.06.19 um 12:20 schrieb Emil Velikov: > > > >>> In particular, that user-space will "remove" render nodes. > > > >> Yes, that is my main concern here. You basically make render nodes > > > >> superfluously. That gives userspace all legitimization to also remove > > > >> support for them. That is not stupid or evil or whatever, userspace > > > >> would just follow the kernel design. > > > > This already happened. At least for kms-only display socs we had to > > > > hide the separate render node (and there you really have to deal with > > > > the separate render node, because it's a distinct driver) entirely > > > > within gbm/mesa. > > > > > > Ok, that is something I didn't knew before and is rather interesting. > > > > > > > So if you want to depracate render functionality on primary nodes, you > > > > _have_ to do that hiding in userspace. Or you'll break a lot of > > > > compositors everywhere. Just testing -amdgpu doesn't really prove > > > > anything here. So you won't move the larger ecosystem with this at > > > > all, that ship sailed. > > > > > > So what else can we do? That sounds like you suggest we should > > > completely drop render node functionality. > > > > > > I mean it's not my preferred option, but certainly something that > > > everybody could do. > > > > > > My primary concern is that we somehow need to get rid of thinks like GEM > > > flink and all that other crufty stuff we still have around on the > > > primary node (we should probably make a list of that). > > > > > > Switching everything over to render nodes just sounded like the best > > > alternative so far to archive that. > > > > tbh I do like that plan too, but it's a lot more work. And I think to > > have any push whatsoever we probably need to roll it out in gbm as a > > hack to keep things going. But probably not enough. > > > > I also think that at least some compositors will bother to do the > > right thing, and actually bother with render nodes and all that > > correctly. It's just that there's way more which dont. > > > > Also for server rendering it'll be render nodes all the way down I > > hope (or we need to seriously educate cloud people about the > > permissions they set on their default images, and distros on how this > > cloud stuff should work. > > > > > > At that point this all feels like a bikeshed, and for a bikeshed I > > > > don't care what the color is we pick, as long as they're all painted > > > > the same. > > > > > > > > Once we picked a color it's a simple technical matter of how to roll > > > > it out, using Kconfig options, or only enabling on new hw, or "merge > > > > and fix the regressions as they come" or any of the other well proven > > > > paths forward. > > > > > > Yeah, the problem is I don't see an option which currently works for > > > everyone. > > > > > > I absolutely need a grace time of multiple years until we can apply this > > > to amdgpu/radeon. > > > > Yeah that's what I meant with "pick a color, pick a way to roll it > > out". "enable for new hw, roll out years and years later" is one of > > the options for roll out. > > > > > And that under the prerequisite to have a plan to somehow enable that > > > functionality now to make it at least painful for userspace to rely on > > > hack around that. > > > > > > > So if you want to do this, please start with the mesa side work (as > > > > the biggest userspace, not all of it) with patches to redirect all > > > > primary node rendering to render nodes. The code is there already for > > > > socs, just needs to be rolled out more. Or we go with the DRM_AUTH > > > > horrors. Or a 3rd option, I really don't care which it is, as long as > > > > its consistent. > > > > > > How about this: > > > 1. We keep DRM_AUTH around for amdgpu/radeon for now. > > > 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N. > > > This also works as a workaround for old RADV. > > > 3. We start to work on further removing old cruft from the primary node. > > > Possible the Kconfig option I suggested to disable GEM flink. > > > > Hm I'm not worried about flink really. It's gem_open which is the > > security gap, and that one has to stay master-only forever. I guess we > > could try to disable that so people have to deal with dma-buf (and > > once you have that render nodes become a lot easier to use). But then > > I still think we have drivers which don't do dma-buf self-import, so > > again we're stuck. So maybe first step is to just roll out a default > > self-import dma-buf support for every gem driver. Then start ditching > > flink/gem_open. Then start ditching render support on primary nodes. > > Every step in the way taking a few years of prodding userspace plus > > even more years to wait until it's all gone. > > I forgot one step here: I think we even still have drivers without > render node support. As long as those exists (and are still relevant) > then userspace needs primary node rendering + flink code anyway. And > they're not going to be happy about us telling them to add more. So I > think that would need to be fixed first. Hence rough plan: > > - Make sure all gem drivers with rendering have render nodes. > - Roll out self-import of dma-buf for all gem drivers (we can do that > with 0 driver support, it's like flink). > - Roll out mesa gbm for everyone to ignore primary nodes for rendering > as much as possible. Maybe needs more gbm work so that compositors can > ask for the display drmfd and the render drmfd. > - wait. like really long time :-/ > - slowly deprecate flink for new hw as additional forcing function to > get people to move to dma-buf and render nodes > - wait even longer > - ditch rendering on primary nodes. > > Lots of work, and I really mean _lots_, but I think this has a chance > of actually working. Thanks for the extensive proposal/list Daniel. I mentioned a, perhaps too short, version of this a while back. Above sounds perfectly reasonable IMHO. -Emil
Am 21.06.19 um 13:58 schrieb Emil Velikov: > On 2019/06/21, Koenig, Christian wrote: >> Am 21.06.19 um 12:53 schrieb Emil Velikov: >>> On 2019/06/21, Koenig, Christian wrote: >>>> [SNIP] >>>> Well partially. That RADV broke is unfortunate, but we have done so many >>>> customized userspace stuff both based on Mesa/DDX as well as closed >>>> source components that it is just highly likely that we would break >>>> something else as well. >>>> >>> As an engineer I like to work with tangibles. The highly likely part is >>> probable, but as mentioned multiple times the series allows for a _dead_ >>> trivial way to address any such problems. >> Well to clarify my concern is that this won't be so trivial. >> >> You implemented on workaround for one specific case and it is perfectly >> possible that for other cases we would have to completely revert the >> removal of DRM_AUTH. >> > I would encourage you to take a closer look at my patch and point out > how parcicular cases cannot be handled. Well the last time I looked it only blocked the first call to the IOCTL. If that is no longer the case then what is the actual difference to DRM_AUTH+DRM_ALLOW_RENDER? >>>>> In particular, that user-space will "remove" render nodes. >>>> Yes, that is my main concern here. You basically make render nodes >>>> superfluously. That gives userspace all legitimization to also remove >>>> support for them. That is not stupid or evil or whatever, userspace >>>> would just follow the kernel design. >>>> >>> Do you have an example of userspace doing such an extremely silly thing? >>> It does seem like suspect you're a couple of steps beyond overcautious, >>> perhaps rightfully so. Maybe you've seen some closed-source user-space >>> going crazy? Or any other projects? >> The key point is that I don't think this is silly or strange or crazy at >> all. See the kernel defines the interface userspace can and should use. >> >> When the kernel defines that everything will work with the primary node >> it is perfectly valid for userspace to drop support for the render node. >> >> I mean why should they keep this? Just because we tell them not to do this? >> > From your experiense, do user-space developers care about doing (or > generally do) the right thing? No, not at all. Except for Marek and Michel they just take what works and even Marek is often short-cutting on this. > In either case, as pointed already the cat is out of the bag - has been > for years, and if developers did behave as you describe them they would > have "removed" render nodes already. Currently render nodes are mandatory because they are needed for headless operation. E.g. we have a whole bunch of customers which do transcoding with that on GPUs which don't even have a CRTC or even X running. Regards, Christian.
On 2019/06/21, Koenig, Christian wrote: > Am 21.06.19 um 13:58 schrieb Emil Velikov: > > On 2019/06/21, Koenig, Christian wrote: > >> Am 21.06.19 um 12:53 schrieb Emil Velikov: > >>> On 2019/06/21, Koenig, Christian wrote: > >>>> [SNIP] > >>>> Well partially. That RADV broke is unfortunate, but we have done so many > >>>> customized userspace stuff both based on Mesa/DDX as well as closed > >>>> source components that it is just highly likely that we would break > >>>> something else as well. > >>>> > >>> As an engineer I like to work with tangibles. The highly likely part is > >>> probable, but as mentioned multiple times the series allows for a _dead_ > >>> trivial way to address any such problems. > >> Well to clarify my concern is that this won't be so trivial. > >> > >> You implemented on workaround for one specific case and it is perfectly > >> possible that for other cases we would have to completely revert the > >> removal of DRM_AUTH. > >> > > I would encourage you to take a closer look at my patch and point out > > how parcicular cases cannot be handled. > > Well the last time I looked it only blocked the first call to the IOCTL. > Hmm interesting, you're replied to my patch without even reading it :'-( Can you please give it a close look and reply inline? [1] https://lists.freedesktop.org/archives/dri-devel/2019-May/219238.html > > From your experiense, do user-space developers care about doing (or > > generally do) the right thing? > > No, not at all. Except for Marek and Michel they just take what works > and even Marek is often short-cutting on this. > Interesting, guess I should have asked this question from the start. Thanks Emil
Am 21.06.19 um 14:47 schrieb Emil Velikov: > On 2019/06/21, Koenig, Christian wrote: >> Am 21.06.19 um 13:58 schrieb Emil Velikov: >>> On 2019/06/21, Koenig, Christian wrote: >>>> Am 21.06.19 um 12:53 schrieb Emil Velikov: >>>>> On 2019/06/21, Koenig, Christian wrote: >>>>>> [SNIP] >>>>>> Well partially. That RADV broke is unfortunate, but we have done so many >>>>>> customized userspace stuff both based on Mesa/DDX as well as closed >>>>>> source components that it is just highly likely that we would break >>>>>> something else as well. >>>>>> >>>>> As an engineer I like to work with tangibles. The highly likely part is >>>>> probable, but as mentioned multiple times the series allows for a _dead_ >>>>> trivial way to address any such problems. >>>> Well to clarify my concern is that this won't be so trivial. >>>> >>>> You implemented on workaround for one specific case and it is perfectly >>>> possible that for other cases we would have to completely revert the >>>> removal of DRM_AUTH. >>>> >>> I would encourage you to take a closer look at my patch and point out >>> how parcicular cases cannot be handled. >> Well the last time I looked it only blocked the first call to the IOCTL. >> > Hmm interesting, you're replied to my patch without even reading it :'-( Well I've NAKed the series because of the exposure of the render node functionality > Can you please give it a close look and reply inline? > > [1] https://lists.freedesktop.org/archives/dri-devel/2019-May/219238.html So the workaround no longer just blocks the first IOCTL. But then the question is why don't you just keep the DRM_AUTH flag instead of adding the same functionality with a new one? >>> From your experiense, do user-space developers care about doing (or >>> generally do) the right thing? >> No, not at all. Except for Marek and Michel they just take what works >> and even Marek is often short-cutting on this. >> > Interesting, guess I should have asked this question from the start. Well sounds like you don't have to work with a closed source driver team. But as I said I honestly would do the same as user space developer. I mean it's really a bunch of more code to maintain, and getting rid of code is always less work in the long term then keeping it. That kernel developers scream: No no, please don't do that we want to keep using it is completely irrelevant for this. Christian. > > Thanks > Emil
On 2019-06-21 1:50 p.m., Daniel Vetter wrote: > On Fri, Jun 21, 2019 at 1:37 PM Christian König > <ckoenig.leichtzumerken@gmail.com> wrote: >> Am 21.06.19 um 13:03 schrieb Daniel Vetter: >>> So if you want to depracate render functionality on primary nodes, you >>> _have_ to do that hiding in userspace. Or you'll break a lot of >>> compositors everywhere. Just testing -amdgpu doesn't really prove >>> anything here. So you won't move the larger ecosystem with this at >>> all, that ship sailed. >> >> So what else can we do? That sounds like you suggest we should >> completely drop render node functionality. >> >> I mean it's not my preferred option, but certainly something that >> everybody could do. >> >> My primary concern is that we somehow need to get rid of thinks like GEM >> flink and all that other crufty stuff we still have around on the >> primary node (we should probably make a list of that). >> >> Switching everything over to render nodes just sounded like the best >> alternative so far to archive that. > > tbh I do like that plan too, but it's a lot more work. And I think to > have any push whatsoever we probably need to roll it out in gbm as a > hack to keep things going. But probably not enough. > > I also think that at least some compositors will bother to do the > right thing, and actually bother with render nodes and all that > correctly. It's just that there's way more which dont. With amdgpu, we can make userspace always use render nodes for rendering with changes to libdrm_amdgpu/radeonsi/xf86-video-amdgpu (and maybe some amdgpu-pro components) only. No GBM/compositor changes needed. >>> At that point this all feels like a bikeshed, and for a bikeshed I >>> don't care what the color is we pick, as long as they're all painted >>> the same. Then some sheds shouldn't have been re-painted without DRM_AUTH already... >>> Once we picked a color it's a simple technical matter of how to roll >>> it out, using Kconfig options, or only enabling on new hw, or "merge >>> and fix the regressions as they come" or any of the other well proven >>> paths forward. >> >> Yeah, the problem is I don't see an option which currently works for >> everyone. >> >> I absolutely need a grace time of multiple years until we can apply this >> to amdgpu/radeon. > > Yeah that's what I meant with "pick a color, pick a way to roll it > out". "enable for new hw, roll out years and years later" is one of > the options for roll out. One gotcha being that generic userspace such as the Xorg modesetting driver may still try to use phased out functionality such as DRI2 with new hardware. >> How about this: >> 1. We keep DRM_AUTH around for amdgpu/radeon for now. >> 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N. >> This also works as a workaround for old RADV. >> 3. We start to work on further removing old cruft from the primary node. >> Possible the Kconfig option I suggested to disable GEM flink. > > Hm I'm not worried about flink really. It's gem_open which is the > security gap, and that one has to stay master-only forever. GEM_OPEN is used by DRI2 clients, it can't be master-only. It's probably one of the main reasons for the existence of DRM_AUTH.
On 2019-06-21 1:58 p.m., Emil Velikov wrote: > On 2019/06/21, Koenig, Christian wrote: >> Am 21.06.19 um 12:53 schrieb Emil Velikov: >>> On 2019/06/21, Koenig, Christian wrote: > >>>>> In particular, that user-space will "remove" render nodes. >>>> Yes, that is my main concern here. You basically make render nodes >>>> superfluously. That gives userspace all legitimization to also remove >>>> support for them. That is not stupid or evil or whatever, userspace >>>> would just follow the kernel design. >>>> >>> Do you have an example of userspace doing such an extremely silly thing? >>> It does seem like suspect you're a couple of steps beyond overcautious, >>> perhaps rightfully so. Maybe you've seen some closed-source user-space >>> going crazy? Or any other projects? >> >> The key point is that I don't think this is silly or strange or crazy at >> all. See the kernel defines the interface userspace can and should use. >> >> When the kernel defines that everything will work with the primary node >> it is perfectly valid for userspace to drop support for the render node. >> >> I mean why should they keep this? Just because we tell them not to do this? >> > From your experiense, do user-space developers care about doing (or > generally do) the right thing? > > In either case, as pointed already the cat is out of the bag - has been > for years, and if developers did behave as you describe them they would > have "removed" render nodes already. That may be the case with userspace specific to DRM_AUTH-less kernel drivers, but such userspace couldn't work with all the other drivers. So I'd argue that while the bag may be open and the cat's tail may even be sticking out already, the cat is still firmly in the bag. :)
On Fri, Jun 21, 2019 at 01:00:17PM +0000, Koenig, Christian wrote: > Am 21.06.19 um 14:47 schrieb Emil Velikov: > > On 2019/06/21, Koenig, Christian wrote: > >> Am 21.06.19 um 13:58 schrieb Emil Velikov: > >>> On 2019/06/21, Koenig, Christian wrote: > >>>> Am 21.06.19 um 12:53 schrieb Emil Velikov: > >>>>> On 2019/06/21, Koenig, Christian wrote: > >>>>>> [SNIP] > >>>>>> Well partially. That RADV broke is unfortunate, but we have done so many > >>>>>> customized userspace stuff both based on Mesa/DDX as well as closed > >>>>>> source components that it is just highly likely that we would break > >>>>>> something else as well. > >>>>>> > >>>>> As an engineer I like to work with tangibles. The highly likely part is > >>>>> probable, but as mentioned multiple times the series allows for a _dead_ > >>>>> trivial way to address any such problems. > >>>> Well to clarify my concern is that this won't be so trivial. > >>>> > >>>> You implemented on workaround for one specific case and it is perfectly > >>>> possible that for other cases we would have to completely revert the > >>>> removal of DRM_AUTH. > >>>> > >>> I would encourage you to take a closer look at my patch and point out > >>> how parcicular cases cannot be handled. > >> Well the last time I looked it only blocked the first call to the IOCTL. > >> > > Hmm interesting, you're replied to my patch without even reading it :'-( > > Well I've NAKed the series because of the exposure of the render node > functionality > > > Can you please give it a close look and reply inline? > > > > [1] https://lists.freedesktop.org/archives/dri-devel/2019-May/219238.html > > So the workaround no longer just blocks the first IOCTL. > > But then the question is why don't you just keep the DRM_AUTH flag > instead of adding the same functionality with a new one? > > >>> From your experiense, do user-space developers care about doing (or > >>> generally do) the right thing? > >> No, not at all. Except for Marek and Michel they just take what works > >> and even Marek is often short-cutting on this. > >> > > Interesting, guess I should have asked this question from the start. > > Well sounds like you don't have to work with a closed source driver > team. But as I said I honestly would do the same as user space developer. From an upstream kernel pov I've never cared about the blobs. I don't upstream should terribly start caring about blobs - if they ship hacks that don't reflect what the open stack does, their problem. Speaking as someone who's pissed off closed source driver teams to no end. I'm happy to be of service here too :-) > I mean it's really a bunch of more code to maintain, and getting rid of > code is always less work in the long term then keeping it. > > That kernel developers scream: No no, please don't do that we want to > keep using it is completely irrelevant for this. Jokes aside, I think we should look at what's the right thing to do looking at open source only, and if that gets the blobby folks shafted, well so be it. Really not my problem, and I'm pretty sure Dave doesn't care one hair of an inch more. -Daniel
On Fri, Jun 21, 2019 at 05:15:22PM +0200, Michel Dänzer wrote: > On 2019-06-21 1:50 p.m., Daniel Vetter wrote: > > On Fri, Jun 21, 2019 at 1:37 PM Christian König > > <ckoenig.leichtzumerken@gmail.com> wrote: > >> Am 21.06.19 um 13:03 schrieb Daniel Vetter: > >>> So if you want to depracate render functionality on primary nodes, you > >>> _have_ to do that hiding in userspace. Or you'll break a lot of > >>> compositors everywhere. Just testing -amdgpu doesn't really prove > >>> anything here. So you won't move the larger ecosystem with this at > >>> all, that ship sailed. > >> > >> So what else can we do? That sounds like you suggest we should > >> completely drop render node functionality. > >> > >> I mean it's not my preferred option, but certainly something that > >> everybody could do. > >> > >> My primary concern is that we somehow need to get rid of thinks like GEM > >> flink and all that other crufty stuff we still have around on the > >> primary node (we should probably make a list of that). > >> > >> Switching everything over to render nodes just sounded like the best > >> alternative so far to archive that. > > > > tbh I do like that plan too, but it's a lot more work. And I think to > > have any push whatsoever we probably need to roll it out in gbm as a > > hack to keep things going. But probably not enough. > > > > I also think that at least some compositors will bother to do the > > right thing, and actually bother with render nodes and all that > > correctly. It's just that there's way more which dont. > > With amdgpu, we can make userspace always use render nodes for rendering > with changes to libdrm_amdgpu/radeonsi/xf86-video-amdgpu (and maybe some > amdgpu-pro components) only. No GBM/compositor changes needed. This is a very non-exhaustive list of userspace that runs on your driver ... This wayland compositor thing, actually shipping now, and there's many :-) > >>> At that point this all feels like a bikeshed, and for a bikeshed I > >>> don't care what the color is we pick, as long as they're all painted > >>> the same. > > Then some sheds shouldn't have been re-painted without DRM_AUTH already... Christian proposed that and then didn't feel like reverting it, plus vc4, and tegra never bothered with it. There's quite a bit more than the tail out of this particular bag already. > >>> Once we picked a color it's a simple technical matter of how to roll > >>> it out, using Kconfig options, or only enabling on new hw, or "merge > >>> and fix the regressions as they come" or any of the other well proven > >>> paths forward. > >> > >> Yeah, the problem is I don't see an option which currently works for > >> everyone. > >> > >> I absolutely need a grace time of multiple years until we can apply this > >> to amdgpu/radeon. > > > > Yeah that's what I meant with "pick a color, pick a way to roll it > > out". "enable for new hw, roll out years and years later" is one of > > the options for roll out. > > One gotcha being that generic userspace such as the Xorg modesetting > driver may still try to use phased out functionality such as DRI2 with > new hardware. There's a lot more generic userspace than -modesetting. That was the entire point of kms, and it succeed really well. That's why I don't think amdgpu doing their own flavour pick is going to help anyone here, except maybe if all you care about is the amd stand-alone stack only. But then why do you bother with this upstream stuff at all ... > >> How about this: > >> 1. We keep DRM_AUTH around for amdgpu/radeon for now. > >> 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N. > >> This also works as a workaround for old RADV. > >> 3. We start to work on further removing old cruft from the primary node. > >> Possible the Kconfig option I suggested to disable GEM flink. > > > > Hm I'm not worried about flink really. It's gem_open which is the > > security gap, and that one has to stay master-only forever. > > GEM_OPEN is used by DRI2 clients, it can't be master-only. It's probably > one of the main reasons for the existence of DRM_AUTH. Oh sweet I forgot we're allowing this both ways :-/ Well doesn't change that much really, once we break one of these the other isn't useful anymore either. -Daniel
On 2019-06-21 5:44 p.m., Daniel Vetter wrote: > On Fri, Jun 21, 2019 at 05:15:22PM +0200, Michel Dänzer wrote: >> On 2019-06-21 1:50 p.m., Daniel Vetter wrote: >>> On Fri, Jun 21, 2019 at 1:37 PM Christian König >>> <ckoenig.leichtzumerken@gmail.com> wrote: >>>> Am 21.06.19 um 13:03 schrieb Daniel Vetter: >>>>> So if you want to depracate render functionality on primary nodes, you >>>>> _have_ to do that hiding in userspace. Or you'll break a lot of >>>>> compositors everywhere. Just testing -amdgpu doesn't really prove >>>>> anything here. So you won't move the larger ecosystem with this at >>>>> all, that ship sailed. >>>> >>>> So what else can we do? That sounds like you suggest we should >>>> completely drop render node functionality. >>>> >>>> I mean it's not my preferred option, but certainly something that >>>> everybody could do. >>>> >>>> My primary concern is that we somehow need to get rid of thinks like GEM >>>> flink and all that other crufty stuff we still have around on the >>>> primary node (we should probably make a list of that). >>>> >>>> Switching everything over to render nodes just sounded like the best >>>> alternative so far to archive that. >>> >>> tbh I do like that plan too, but it's a lot more work. And I think to >>> have any push whatsoever we probably need to roll it out in gbm as a >>> hack to keep things going. But probably not enough. >>> >>> I also think that at least some compositors will bother to do the >>> right thing, and actually bother with render nodes and all that >>> correctly. It's just that there's way more which dont. >> >> With amdgpu, we can make userspace always use render nodes for rendering >> with changes to libdrm_amdgpu/radeonsi/xf86-video-amdgpu (and maybe some >> amdgpu-pro components) only. No GBM/compositor changes needed. > > This is a very non-exhaustive list of userspace that runs on your driver > ... This wayland compositor thing, actually shipping now, and there's many :-) You don't seem to understand what I wrote. Everything will work at least as well as now, without any other changes. >>>>> Once we picked a color it's a simple technical matter of how to roll >>>>> it out, using Kconfig options, or only enabling on new hw, or "merge >>>>> and fix the regressions as they come" or any of the other well proven >>>>> paths forward. >>>> >>>> Yeah, the problem is I don't see an option which currently works for >>>> everyone. >>>> >>>> I absolutely need a grace time of multiple years until we can apply this >>>> to amdgpu/radeon. >>> >>> Yeah that's what I meant with "pick a color, pick a way to roll it >>> out". "enable for new hw, roll out years and years later" is one of >>> the options for roll out. >> >> One gotcha being that generic userspace such as the Xorg modesetting >> driver may still try to use phased out functionality such as DRI2 with >> new hardware. > > There's a lot more generic userspace than -modesetting. That is affected by phasing out DRI2 related functionality? (I thought that was the context of this sub-sub-thread) > That was the entire point of kms, and it succeed really well. That's > why I don't think amdgpu doing their own flavour pick is going to help > anyone here, Other drivers are welcome to emulate amdgpu's design making the above possible. :) > except maybe if all you care about is the amd stand-alone> stack only. > But then why do you bother with this upstream stuff at all> ... I'm afraid you've lost me here.
On 2019-06-21 5:52 p.m., Michel Dänzer wrote: > On 2019-06-21 5:44 p.m., Daniel Vetter wrote: >> On Fri, Jun 21, 2019 at 05:15:22PM +0200, Michel Dänzer wrote: >>> On 2019-06-21 1:50 p.m., Daniel Vetter wrote: >>>> On Fri, Jun 21, 2019 at 1:37 PM Christian König >>>> <ckoenig.leichtzumerken@gmail.com> wrote: >>>>> Am 21.06.19 um 13:03 schrieb Daniel Vetter: >>>>>> So if you want to depracate render functionality on primary nodes, you >>>>>> _have_ to do that hiding in userspace. Or you'll break a lot of >>>>>> compositors everywhere. Just testing -amdgpu doesn't really prove >>>>>> anything here. So you won't move the larger ecosystem with this at >>>>>> all, that ship sailed. >>>>> >>>>> So what else can we do? That sounds like you suggest we should >>>>> completely drop render node functionality. >>>>> >>>>> I mean it's not my preferred option, but certainly something that >>>>> everybody could do. >>>>> >>>>> My primary concern is that we somehow need to get rid of thinks like GEM >>>>> flink and all that other crufty stuff we still have around on the >>>>> primary node (we should probably make a list of that). >>>>> >>>>> Switching everything over to render nodes just sounded like the best >>>>> alternative so far to archive that. >>>> >>>> tbh I do like that plan too, but it's a lot more work. And I think to >>>> have any push whatsoever we probably need to roll it out in gbm as a >>>> hack to keep things going. But probably not enough. >>>> >>>> I also think that at least some compositors will bother to do the >>>> right thing, and actually bother with render nodes and all that >>>> correctly. It's just that there's way more which dont. >>> >>> With amdgpu, we can make userspace always use render nodes for rendering >>> with changes to libdrm_amdgpu/radeonsi/xf86-video-amdgpu (and maybe some >>> amdgpu-pro components) only. No GBM/compositor changes needed. >> >> This is a very non-exhaustive list of userspace that runs on your driver >> ... This wayland compositor thing, actually shipping now, and there's many :-) > > You don't seem to understand what I wrote. Everything will work at least > as well as now, without any other changes. > > [...] > >> That was the entire point of kms, and it succeed really well. That's >> why I don't think amdgpu doing their own flavour pick is going to help >> anyone here, > Other drivers are welcome to emulate amdgpu's design making the above > possible. :) Seriously though, I don't think any changes outside of libdrm/Mesa should be needed with other drivers either. Fundamentally there's nothing magic about it, just sharing BOs via PRIME between the render node file description and that passed in via the EGL/GBM/... API.
On Mon, Jun 24, 2019 at 11:37 AM Michel Dänzer <michel@daenzer.net> wrote: > On 2019-06-21 5:52 p.m., Michel Dänzer wrote: > >> That was the entire point of kms, and it succeed really well. That's > >> why I don't think amdgpu doing their own flavour pick is going to help > >> anyone here, > > Other drivers are welcome to emulate amdgpu's design making the above > > possible. :) > > Seriously though, I don't think any changes outside of libdrm/Mesa > should be needed with other drivers either. Fundamentally there's > nothing magic about it, just sharing BOs via PRIME between the render > node file description and that passed in via the EGL/GBM/... API. The libdrm/mesa design isn't the hard part, we have that even as a helper for etnaviv and all those drivers. Being inconsistent here is a think more a nuisance for integrators, who might want to sandbox gpu clients real hard and really want to know what access rights they need to give out. But then we have much, much bigger fish to fry from a cross-driver consistency pov, so *shrug*. Just feels somewhat silly we can't even get agreement or some kind of consistent plan on something fairly simple like this. -Daniel
diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 9221e5489069..da415f445187 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -40,6 +40,22 @@ config DRM_AMDGPU_GART_DEBUGFS Selecting this option creates a debugfs file to inspect the mapped pages. Uses more memory for housekeeping, enable only for debugging. +config DRM_AMDGPU_FORCE_AUTH + bool "Force authentication check on AMDGPU_INFO ioctl" + default y + help + There were some version of the Mesa RADV drivers, which relied on + the ioctl failing, if the client is not authenticated. + + Namely, the following versions are affected: + - the whole 18.2.x series, which is EOL + - the whole 18.3.x series, which is EOL + - the 19.0.x series, prior to 19.0.4 + + Modern distributions, should disable this. That will allow various + other clients to work, that would otherwise require root privileges. + + source "drivers/gpu/drm/amd/acp/Kconfig" source "drivers/gpu/drm/amd/display/Kconfig" source "drivers/gpu/drm/amd/amdkfd/Kconfig" diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index b17d0545728e..b8076929440b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -1214,7 +1214,17 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { DRM_IOCTL_DEF_DRV(AMDGPU_GEM_MMAP, amdgpu_gem_mmap_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(AMDGPU_GEM_WAIT_IDLE, amdgpu_gem_wait_idle_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(AMDGPU_CS, amdgpu_cs_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), - DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), + /* The DRM_FORCE_AUTH is effectively a workaround for the RADV Mesa driver. + * This is required for Mesa: + * - the whole 18.2.x series, which is EOL + * - the whole 18.3.x series, which is EOL + * - the 19.0.x series, prior to 19.0.4 + */ + DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, +#if defined(DRM_AMDGPU_FORCE_AUTH) + DRM_FORCE_AUTH| +#endif + DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_CS, amdgpu_cs_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_FENCES, amdgpu_cs_wait_fences_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(AMDGPU_GEM_METADATA, amdgpu_gem_metadata_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index 2263e3ddd822..9841c0076f02 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -544,6 +544,11 @@ int drm_ioctl_permit(u32 flags, struct drm_file *file_priv) drm_is_render_client(file_priv))) return -EACCES; + /* FORCE_AUTH is only for authenticated or render client */ + if (unlikely((flags & DRM_FORCE_AUTH) && !drm_is_render_client(file_priv) && + !file_priv->authenticated)) + return -EACCES; + return 0; } EXPORT_SYMBOL(drm_ioctl_permit); diff --git a/include/drm/drm_ioctl.h b/include/drm/drm_ioctl.h index fafb6f592c4b..6084ee32043d 100644 --- a/include/drm/drm_ioctl.h +++ b/include/drm/drm_ioctl.h @@ -126,6 +126,23 @@ enum drm_ioctl_flags { * not set DRM_AUTH because they do not require authentication. */ DRM_RENDER_ALLOW = BIT(5), + /** + * @DRM_FORCE_AUTH: + * + * Authentication of the primary node is mandatory. Regardless that the + * user can usually circumvent that by using the render node with exact + * same ioctl. + * + * Note: this is effectively a workaround for AMDGPU AMDGPU_INFO ioctl + * and the RADV Mesa driver. This is required for Mesa: + * - the whole 18.2.x series, which is EOL + * - the whole 18.3.x series, which is EOL + * - the 19.0.x series, prior to 19.0.4 + * + * Note: later patch will effectively drop the DRM_AUTH for ioctls + * annotated as DRM_AUTH | DRM_RENDER_ALLOW. + */ + DRM_FORCE_AUTH = BIT(6), }; /**