Message ID | 20230630115651.354849-1-James.Zhu@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm: support up to 128 drm devices | expand |
(cc Daniel Vetter and Pekka because this change has uAPI repercussions) On Friday, June 30th, 2023 at 13:56, James Zhu <James.Zhu@amd.com> wrote: > From: Christian König <ckoenig.leichtzumerken@gmail.com> > > This makes room for up to 128 DRM devices. > > Signed-off-by: Christian König <christian.koenig@amd.com> > Signed-off-by: James Zhu <James.Zhu@amd.com> > --- > drivers/gpu/drm/drm_drv.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > index 73b845a75d52..0d55c64444f5 100644 > --- a/drivers/gpu/drm/drm_drv.c > +++ b/drivers/gpu/drm/drm_drv.c > @@ -137,7 +137,7 @@ static int drm_minor_alloc(struct drm_device *dev, unsigned int type) > r = idr_alloc(&drm_minors_idr, > NULL, > 64 * type, > - 64 * (type + 1), > + 64 * (type + 2), The type comes from this enum: enum drm_minor_type { DRM_MINOR_PRIMARY, DRM_MINOR_CONTROL, DRM_MINOR_RENDER, DRM_MINOR_ACCEL = 32, }; Before this patch, 0..63 are for primary, 64..127 are for control (never exposed by the kernel), 128..191 are for render, 2048..2112 are for accel. After this patch, 0..127 are for primary, 64..191 are for control (never exposed by the kernel), 128..255 are for render, 2048..2176 are for accel. I'm worried what might happen with old user-space, especially old libdrm. When there are a lot of primary nodes, some would get detected as control nodes, even if they aren't. Is this fine? This would at least be confusing for information-gathering tools like drm_info. I'm not sure about other consequences. Do we want to forever forbid the 64..127 range instead, so that we have the guarantee that old libdrm never sees it? I'm also worried about someone adding a new entry to the enum after DRM_MINOR_RENDER (DRM_MINOR_ACCEL was specifically set to 32 so that new node types could be added before). drm_minor_alloc() would blow up in this case, because it'd use the 192..319 range, which overlaps with render. I think a switch with explicit ranges (and WARN when an unknown type is passed in) would be both easier to read and less risky. > GFP_NOWAIT); > spin_unlock_irqrestore(&drm_minor_lock, flags); > } > -- > 2.34.1
On Friday, July 14th, 2023 at 12:31, Simon Ser <contact@emersion.fr> wrote: > Before this patch, 0..63 are for primary, 64..127 are for control (never > exposed by the kernel), 128..191 are for render, 2048..2112 are for accel. > After this patch, 0..127 are for primary, 64..191 are for control (never > exposed by the kernel), 128..255 are for render, 2048..2176 are for accel. Correction: reading the code, accel is handled separately. Additional find: the kernel creates sysfs control nodes because old user-space reads them to figure out whether a device is DRIVER_MODESET.
On Fri, 14 Jul 2023 at 11:32, Simon Ser <contact@emersion.fr> wrote: > > (cc Daniel Vetter and Pekka because this change has uAPI repercussions) > > On Friday, June 30th, 2023 at 13:56, James Zhu <James.Zhu@amd.com> wrote: > > > From: Christian König <ckoenig.leichtzumerken@gmail.com> > > > > This makes room for up to 128 DRM devices. > > > > Signed-off-by: Christian König <christian.koenig@amd.com> > > Signed-off-by: James Zhu <James.Zhu@amd.com> > > --- > > drivers/gpu/drm/drm_drv.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > > index 73b845a75d52..0d55c64444f5 100644 > > --- a/drivers/gpu/drm/drm_drv.c > > +++ b/drivers/gpu/drm/drm_drv.c > > @@ -137,7 +137,7 @@ static int drm_minor_alloc(struct drm_device *dev, unsigned int type) > > r = idr_alloc(&drm_minors_idr, > > NULL, > > 64 * type, > > - 64 * (type + 1), > > + 64 * (type + 2), > > The type comes from this enum: > > enum drm_minor_type { > DRM_MINOR_PRIMARY, > DRM_MINOR_CONTROL, > DRM_MINOR_RENDER, > DRM_MINOR_ACCEL = 32, > }; > > Before this patch, 0..63 are for primary, 64..127 are for control (never > exposed by the kernel), 128..191 are for render, 2048..2112 are for accel. > After this patch, 0..127 are for primary, 64..191 are for control (never > exposed by the kernel), 128..255 are for render, 2048..2176 are for accel. > > I'm worried what might happen with old user-space, especially old libdrm. I also share the same concern. Although the bigger issue is not libdrm - since we can update it and prod distributions to update it. The biggest concern is software that cannot be rebuild/updated - closed source and various open-source that has been abandoned. As mentioned in the gitlab ticket - the current style of embedding the uABI/uAPI in each and every application is not great IMHO. That is why I've introduced the `drmGetDevices2` API, to consolidate most of the chaos in a single place. For going forward, here is one way we can shave this yak: - update libdrm to max 64 nodes - roll libdrm release, nag distributions to update to it // could be folded with the next release below - update libdrm to use name driven type detection - roll libdrm release, nag distributions to update to it - once ^^ release hits most distributions, merge the above proposed kernel patch - the commit message should explain the caveats and fixed libdrm version - we should be prepared to revert the commit, if it causes user space regression - fix (see below) and re-introduce the kernel patch 1-2 releases later In parallel to all the above, as a community we should collectively audit and update open-source applications to the `drmDevices2` API. As with other legacy (DRI1), this one will take some time but we can get there. HTH Emil
On Monday, July 17th, 2023 at 09:30, Emil Velikov <emil.l.velikov@gmail.com> wrote: > > I'm worried what might happen with old user-space, especially old libdrm. > > I also share the same concern. Although the bigger issue is not libdrm > - since we can update it and prod distributions to update it. > The biggest concern is software that cannot be rebuild/updated - > closed source and various open-source that has been abandoned. Well. Now that we have Flatpak and AppImage and friends, we're really likely to have old libdrm copies vendored all over the place, and these will stick around essentially forever. > For going forward, here is one way we can shave this yak: > - update libdrm to max 64 nodes > - roll libdrm release, nag distributions to update to it // could be > folded with the next release below > - update libdrm to use name driven type detection > - roll libdrm release, nag distributions to update to it > - once ^^ release hits most distributions, merge the above proposed > kernel patch > - the commit message should explain the caveats and fixed libdrm version > - we should be prepared to revert the commit, if it causes user > space regression - fix (see below) and re-introduce the kernel patch > 1-2 releases later That sounds really scary to me. I'd really prefer to try not to break the kernel uAPI here. The kernel rule is "do not break user-space".
On Mon, 17 Jul 2023 at 10:45, Simon Ser <contact@emersion.fr> wrote: > > On Monday, July 17th, 2023 at 09:30, Emil Velikov <emil.l.velikov@gmail.com> wrote: > > > > I'm worried what might happen with old user-space, especially old libdrm. > > > > I also share the same concern. Although the bigger issue is not libdrm > > - since we can update it and prod distributions to update it. > > The biggest concern is software that cannot be rebuild/updated - > > closed source and various open-source that has been abandoned. > > Well. Now that we have Flatpak and AppImage and friends, we're really likely > to have old libdrm copies vendored all over the place, and these will stick > around essentially forever. > The flatpak devs have been very helpful. So I'm pretty sure we can get those updated - even for older flatpaks. For AppImage, I have no experience. > > For going forward, here is one way we can shave this yak: > > - update libdrm to max 64 nodes > > - roll libdrm release, nag distributions to update to it // could be > > folded with the next release below > > - update libdrm to use name driven type detection > > - roll libdrm release, nag distributions to update to it > > - once ^^ release hits most distributions, merge the above proposed > > kernel patch > > - the commit message should explain the caveats and fixed libdrm version > > - we should be prepared to revert the commit, if it causes user > > space regression - fix (see below) and re-introduce the kernel patch > > 1-2 releases later > > That sounds really scary to me. I'd really prefer to try not to break the > kernel uAPI here. > With part in particular? Mind you I'm not particularly happy either, since in essence it's like a controlled explosion. > The kernel rule is "do not break user-space". Yes, in a perfect world. In practice, there have been multiple kernel changes breaking user-space. Some got reverted, some remained. AFAICT the above will get us out of the sticky situation we're in with the least amount of explosion. If there is a concrete proposal, please go ahead and sorry if I've missed it. I'm supposed to be off, having fun with family when I saw this whole thing explode. Small note: literally all the users I've seen will stop on a missing node (card or render) aka if the kernel creates card0...63 and then card200... then (hard wavy estimate) 80% of the apps will be broken. HTH Emil
On Monday, July 17th, 2023 at 15:24, Emil Velikov <emil.l.velikov@gmail.com> wrote: > > > For going forward, here is one way we can shave this yak: > > > - update libdrm to max 64 nodes > > > - roll libdrm release, nag distributions to update to it // could be > > > folded with the next release below > > > - update libdrm to use name driven type detection > > > - roll libdrm release, nag distributions to update to it > > > - once ^^ release hits most distributions, merge the above proposed > > > kernel patch > > > - the commit message should explain the caveats and fixed libdrm version > > > - we should be prepared to revert the commit, if it causes user > > > space regression - fix (see below) and re-introduce the kernel patch > > > 1-2 releases later > > > > That sounds really scary to me. I'd really prefer to try not to break the > > kernel uAPI here. > > > > With part in particular? Mind you I'm not particularly happy either, > since in essence it's like a controlled explosion. I believe there are ways to extend the uAPI to support more devices without breaking the uAPI. Michał Winiarski's patch for instance tried something to this effect. > > The kernel rule is "do not break user-space". > > Yes, in a perfect world. In practice, there have been multiple kernel > changes breaking user-space. Some got reverted, some remained. > AFAICT the above will get us out of the sticky situation we're in with > the least amount of explosion. > > If there is a concrete proposal, please go ahead and sorry if I've > missed it. I'm supposed to be off, having fun with family when I saw > this whole thing explode. > > Small note: literally all the users I've seen will stop on a missing > node (card or render) aka if the kernel creates card0...63 and then > card200... then (hard wavy estimate) 80% of the apps will be broken. That's fine, because that's not a kernel regression. Supporting more than 64 devices is a new kernel feature, and if some user-space ignores the new nodes, that's not a kernel regression. A regression only happens when a use-case which works with an older kernel is broken by a newer kernel.
On Mon, 17 Jul 2023 at 14:54, Simon Ser <contact@emersion.fr> wrote: > > On Monday, July 17th, 2023 at 15:24, Emil Velikov <emil.l.velikov@gmail.com> wrote: > > > > > For going forward, here is one way we can shave this yak: > > > > - update libdrm to max 64 nodes > > > > - roll libdrm release, nag distributions to update to it // could be > > > > folded with the next release below > > > > - update libdrm to use name driven type detection > > > > - roll libdrm release, nag distributions to update to it > > > > - once ^^ release hits most distributions, merge the above proposed > > > > kernel patch > > > > - the commit message should explain the caveats and fixed libdrm version > > > > - we should be prepared to revert the commit, if it causes user > > > > space regression - fix (see below) and re-introduce the kernel patch > > > > 1-2 releases later > > > > > > That sounds really scary to me. I'd really prefer to try not to break the > > > kernel uAPI here. > > > > > > > With part in particular? Mind you I'm not particularly happy either, > > since in essence it's like a controlled explosion. > > I believe there are ways to extend the uAPI to support more devices without > breaking the uAPI. Michał Winiarski's patch for instance tried something to > this effect. > > > > The kernel rule is "do not break user-space". > > > > Yes, in a perfect world. In practice, there have been multiple kernel > > changes breaking user-space. Some got reverted, some remained. > > AFAICT the above will get us out of the sticky situation we're in with > > the least amount of explosion. > > > > If there is a concrete proposal, please go ahead and sorry if I've > > missed it. I'm supposed to be off, having fun with family when I saw > > this whole thing explode. > > > > Small note: literally all the users I've seen will stop on a missing > > node (card or render) aka if the kernel creates card0...63 and then > > card200... then (hard wavy estimate) 80% of the apps will be broken. > > That's fine, because that's not a kernel regression. Supporting more than 64 > devices is a new kernel feature, and if some user-space ignores the new nodes, > that's not a kernel regression. A regression only happens when a use-case which > works with an older kernel is broken by a newer kernel. Won't this approach effectively hard-code/leak even more kernel uABI into dozens of not hundreds of userspace projects? This does not sound like a scalable solution IMHO. I am 100% behind the "don't break userspace rule", alas very few things in life are as black/white as your comments seem to suggest. Thus I would suggest doing a bit of both or a compromise if you will. Namely: - try the initial route outlined above - if there are (m)any fires, revert the kernel patch and opt for the work by Michał This has the benefit of fixing a bunch of the uABI abuses out there, and leaking more uABI only on as-needed basis. Side note: KDE folks have their own flatpak runtime and have been quite open to backport libdrm/other fixes. HTH Emil
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 73b845a75d52..0d55c64444f5 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -137,7 +137,7 @@ static int drm_minor_alloc(struct drm_device *dev, unsigned int type) r = idr_alloc(&drm_minors_idr, NULL, 64 * type, - 64 * (type + 1), + 64 * (type + 2), GFP_NOWAIT); spin_unlock_irqrestore(&drm_minor_lock, flags); }