Message ID | 20221022214622.18042-1-ogabbay@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | new subsystem for compute accelerator devices | expand |
On Sun, Oct 23, 2022 at 12:46:19AM +0300, Oded Gabbay wrote: > In the last couple of months we had a discussion [1] about creating a new > subsystem for compute accelerator devices in the kernel. > > After an analysis that was done by DRM maintainers and myself, and following > a BOF session at the Linux Plumbers conference a few weeks ago [2], we > decided to create a new subsystem that will use the DRM subsystem's code and > functionality. i.e. the accel core code will be part of the DRM subsystem. > > This will allow us to leverage the extensive DRM code-base and > collaborate with DRM developers that have experience with this type of > devices. In addition, new features that will be added for the accelerator > drivers can be of use to GPU drivers as well (e.g. RAS). > > As agreed in the BOF session, the accelerator devices will be exposed to > user-space with a new, dedicated device char files and a dedicated major > number (261), to clearly separate them from graphic cards and the graphic > user-space s/w stack. Furthermore, the drivers will be located in a separate > place in the kernel tree (drivers/accel/). > > This series of patches is the first step in this direction as it adds the > necessary infrastructure for accelerator devices to DRM. The new devices will > be exposed with the following convention: > > device char files - /dev/accel/accel* > sysfs - /sys/class/accel/accel*/ > debugfs - /sys/kernel/debug/accel/accel*/ > > I tried to reuse the existing DRM code as much as possible, while keeping it > readable and maintainable. > > One thing that is missing from this series is defining a namespace for the > new accel subsystem, while I'll add in the next iteration of this patch-set, > after I will receive feedback from the community. > > As for drivers, once this series will be accepted (after adding the namespace), > I will start working on migrating the habanalabs driver to the new accel > subsystem. I have talked about it with Dave and we agreed that it will be > a good start to simply move the driver as-is with minimal changes, and then > start working on the driver's individual features that will be either added > to the accel core code (with or without changes), or will be removed and > instead the driver will use existing DRM code. > > In addition, I know of at least 3 or 4 drivers that were submitted for review > and are good candidates to be included in this new subsystem, instead of being > a drm render node driver or a misc driver. > > [1] https://lkml.org/lkml/2022/7/31/83 > [2] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html > Since this is new subsystem, it should have its own git tree where you collected accelerator-related patches. By convention, there should be "next" branch targeting for next kernel release and "fixes" branch for bugfixes pending for current release. Both branches should be included into linux-next. The names don't necessarily be that, though. Also, it had been great if you write short, descriptive documentation about the subsystem (maintainers handbook). Cc'ing linux-doc folks.
On Sun, Oct 23, 2022 at 09:02:49PM +0700, Bagas Sanjaya wrote: > On Sun, Oct 23, 2022 at 12:46:19AM +0300, Oded Gabbay wrote: > > In the last couple of months we had a discussion [1] about creating a new > > subsystem for compute accelerator devices in the kernel. > > > > After an analysis that was done by DRM maintainers and myself, and following > > a BOF session at the Linux Plumbers conference a few weeks ago [2], we > > decided to create a new subsystem that will use the DRM subsystem's code and > > functionality. i.e. the accel core code will be part of the DRM subsystem. > > > > This will allow us to leverage the extensive DRM code-base and > > collaborate with DRM developers that have experience with this type of > > devices. In addition, new features that will be added for the accelerator > > drivers can be of use to GPU drivers as well (e.g. RAS). > > > > As agreed in the BOF session, the accelerator devices will be exposed to > > user-space with a new, dedicated device char files and a dedicated major > > number (261), to clearly separate them from graphic cards and the graphic > > user-space s/w stack. Furthermore, the drivers will be located in a separate > > place in the kernel tree (drivers/accel/). > > > > This series of patches is the first step in this direction as it adds the > > necessary infrastructure for accelerator devices to DRM. The new devices will > > be exposed with the following convention: > > > > device char files - /dev/accel/accel* > > sysfs - /sys/class/accel/accel*/ > > debugfs - /sys/kernel/debug/accel/accel*/ > > > > I tried to reuse the existing DRM code as much as possible, while keeping it > > readable and maintainable. > > > > One thing that is missing from this series is defining a namespace for the > > new accel subsystem, while I'll add in the next iteration of this patch-set, > > after I will receive feedback from the community. > > > > As for drivers, once this series will be accepted (after adding the namespace), > > I will start working on migrating the habanalabs driver to the new accel > > subsystem. I have talked about it with Dave and we agreed that it will be > > a good start to simply move the driver as-is with minimal changes, and then > > start working on the driver's individual features that will be either added > > to the accel core code (with or without changes), or will be removed and > > instead the driver will use existing DRM code. > > > > In addition, I know of at least 3 or 4 drivers that were submitted for review > > and are good candidates to be included in this new subsystem, instead of being > > a drm render node driver or a misc driver. > > > > [1] https://lkml.org/lkml/2022/7/31/83 > > [2] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html > > > > Since this is new subsystem, it should have its own git tree where you > collected accelerator-related patches. No, that is never a requirement, where did you get that idea? > By convention, there should be > "next" branch targeting for next kernel release and "fixes" branch for > bugfixes pending for current release. Both branches should be included > into linux-next. The names don't necessarily be that, though. Again, no, that has never been a requirement. > Also, it had been great if you write short, descriptive documentation > about the subsystem (maintainers handbook). Also no, this is fine, it's an RFC sent to all of the people involved in the discussions about this new subsystem. The changelog here is totally sufficient. Please do not confuse people and ask them to do things that are not requirements, that's not helpful at all. greg k-h
Hi Am 22.10.22 um 23:46 schrieb Oded Gabbay: > In the last couple of months we had a discussion [1] about creating a new > subsystem for compute accelerator devices in the kernel. > > After an analysis that was done by DRM maintainers and myself, and following > a BOF session at the Linux Plumbers conference a few weeks ago [2], we > decided to create a new subsystem that will use the DRM subsystem's code and > functionality. i.e. the accel core code will be part of the DRM subsystem. > > This will allow us to leverage the extensive DRM code-base and > collaborate with DRM developers that have experience with this type of > devices. In addition, new features that will be added for the accelerator > drivers can be of use to GPU drivers as well (e.g. RAS). > > As agreed in the BOF session, the accelerator devices will be exposed to > user-space with a new, dedicated device char files and a dedicated major > number (261), to clearly separate them from graphic cards and the graphic > user-space s/w stack. Furthermore, the drivers will be located in a separate > place in the kernel tree (drivers/accel/). > > This series of patches is the first step in this direction as it adds the > necessary infrastructure for accelerator devices to DRM. The new devices will > be exposed with the following convention: > > device char files - /dev/accel/accel* > sysfs - /sys/class/accel/accel*/ > debugfs - /sys/kernel/debug/accel/accel*/ I know I'm really late to this discussion, but wouldn't 'compute' be a better name? (I agree that skynet would also be nice :) > > I tried to reuse the existing DRM code as much as possible, while keeping it > readable and maintainable. > > One thing that is missing from this series is defining a namespace for the > new accel subsystem, while I'll add in the next iteration of this patch-set, > after I will receive feedback from the community. > > As for drivers, once this series will be accepted (after adding the namespace), > I will start working on migrating the habanalabs driver to the new accel > subsystem. I have talked about it with Dave and we agreed that it will be > a good start to simply move the driver as-is with minimal changes, and then > start working on the driver's individual features that will be either added > to the accel core code (with or without changes), or will be removed and > instead the driver will use existing DRM code. What's your opinion on the long-term prospect of DRM vs accel? I assume that over time, DRM helpers will move into accel and some DRM drivers will start depending on accel? After reading the provided links, I wondered if we shouldn't rename drivers/gpu to drivers/accel and put the new subsystem into drivers/accel/compute. We'd have DRM and compute devices next to each other and shared helpers could be located in other subdirectories within accel/ Best regards Thomas > > In addition, I know of at least 3 or 4 drivers that were submitted for review > and are good candidates to be included in this new subsystem, instead of being > a drm render node driver or a misc driver. > > [1] https://lkml.org/lkml/2022/7/31/83 > [2] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html > > Thanks, > Oded > > Oded Gabbay (3): > drivers/accel: add new kconfig and update MAINTAINERS > drm: define new accel major and register it > drm: add dedicated minor for accelerator devices > > Documentation/admin-guide/devices.txt | 5 + > MAINTAINERS | 8 + > drivers/Kconfig | 2 + > drivers/accel/Kconfig | 24 +++ > drivers/gpu/drm/drm_drv.c | 214 +++++++++++++++++++++----- > drivers/gpu/drm/drm_file.c | 69 ++++++--- > drivers/gpu/drm/drm_internal.h | 5 +- > drivers/gpu/drm/drm_sysfs.c | 81 +++++++++- > include/drm/drm_device.h | 3 + > include/drm/drm_drv.h | 8 + > include/drm/drm_file.h | 21 ++- > include/drm/drm_ioctl.h | 1 + > 12 files changed, 374 insertions(+), 67 deletions(-) > create mode 100644 drivers/accel/Kconfig > > -- > 2.34.1 >
On Mon, Oct 24, 2022 at 01:55:56PM +0200, Thomas Zimmermann wrote: > Hi > > Am 22.10.22 um 23:46 schrieb Oded Gabbay: > > In the last couple of months we had a discussion [1] about creating a new > > subsystem for compute accelerator devices in the kernel. > > > > After an analysis that was done by DRM maintainers and myself, and following > > a BOF session at the Linux Plumbers conference a few weeks ago [2], we > > decided to create a new subsystem that will use the DRM subsystem's code and > > functionality. i.e. the accel core code will be part of the DRM subsystem. > > > > This will allow us to leverage the extensive DRM code-base and > > collaborate with DRM developers that have experience with this type of > > devices. In addition, new features that will be added for the accelerator > > drivers can be of use to GPU drivers as well (e.g. RAS). > > > > As agreed in the BOF session, the accelerator devices will be exposed to > > user-space with a new, dedicated device char files and a dedicated major > > number (261), to clearly separate them from graphic cards and the graphic > > user-space s/w stack. Furthermore, the drivers will be located in a separate > > place in the kernel tree (drivers/accel/). > > > > This series of patches is the first step in this direction as it adds the > > necessary infrastructure for accelerator devices to DRM. The new devices will > > be exposed with the following convention: > > > > device char files - /dev/accel/accel* > > sysfs - /sys/class/accel/accel*/ > > debugfs - /sys/kernel/debug/accel/accel*/ > > I know I'm really late to this discussion, but wouldn't 'compute' be a > better name? > > (I agree that skynet would also be nice :) See the summary of the meeting we all held at the Plumbers conference about this. "accel" was the "least hated" of all of the options, so I think we'll stick with that for now. thanks, greg k-h
On Mon, Oct 24, 2022 at 2:56 PM Thomas Zimmermann <tzimmermann@suse.de> wrote: > > Hi > > Am 22.10.22 um 23:46 schrieb Oded Gabbay: > > In the last couple of months we had a discussion [1] about creating a new > > subsystem for compute accelerator devices in the kernel. > > > > After an analysis that was done by DRM maintainers and myself, and following > > a BOF session at the Linux Plumbers conference a few weeks ago [2], we > > decided to create a new subsystem that will use the DRM subsystem's code and > > functionality. i.e. the accel core code will be part of the DRM subsystem. > > > > This will allow us to leverage the extensive DRM code-base and > > collaborate with DRM developers that have experience with this type of > > devices. In addition, new features that will be added for the accelerator > > drivers can be of use to GPU drivers as well (e.g. RAS). > > > > As agreed in the BOF session, the accelerator devices will be exposed to > > user-space with a new, dedicated device char files and a dedicated major > > number (261), to clearly separate them from graphic cards and the graphic > > user-space s/w stack. Furthermore, the drivers will be located in a separate > > place in the kernel tree (drivers/accel/). > > > > This series of patches is the first step in this direction as it adds the > > necessary infrastructure for accelerator devices to DRM. The new devices will > > be exposed with the following convention: > > > > device char files - /dev/accel/accel* > > sysfs - /sys/class/accel/accel*/ > > debugfs - /sys/kernel/debug/accel/accel*/ > > I know I'm really late to this discussion, but wouldn't 'compute' be a > better name? I also thought like you :) But I consulted with Dave while writing these patches and he suggested accel/accel. I'm fine either way... > > (I agree that skynet would also be nice :) > > > > > I tried to reuse the existing DRM code as much as possible, while keeping it > > readable and maintainable. > > > > One thing that is missing from this series is defining a namespace for the > > new accel subsystem, while I'll add in the next iteration of this patch-set, > > after I will receive feedback from the community. > > > > As for drivers, once this series will be accepted (after adding the namespace), > > I will start working on migrating the habanalabs driver to the new accel > > subsystem. I have talked about it with Dave and we agreed that it will be > > a good start to simply move the driver as-is with minimal changes, and then > > start working on the driver's individual features that will be either added > > to the accel core code (with or without changes), or will be removed and > > instead the driver will use existing DRM code. > > What's your opinion on the long-term prospect of DRM vs accel? I assume > that over time, DRM helpers will move into accel and some DRM drivers > will start depending on accel? I don't think that is what I had in mind. What I had in mind is that accel helpers are only relevant for accel drivers, and any code that might also be relevant for DRM drivers will be placed in DRM core code. e.g. GEM enhancements, RAS netlink support. btw, I suspect this will be the majority of the code. In addition, DRM drivers should never set the new DRIVER_COMPUTE_ACCEL driver feature in their structure so they should have zero dependency on the accel core code. > > After reading the provided links, I wondered if we shouldn't rename > drivers/gpu to drivers/accel and put the new subsystem into > drivers/accel/compute. We'd have DRM and compute devices next to each > other and shared helpers could be located in other subdirectories within > accel/ I think this idea was brought up at the BOF session and Dave and others said it will be too big of a burden (due to backports) to do it. From Dave's blogpost: "Moving things around now for current drivers is too hard to deal with for backports etc. Adding a new directory for accel drivers would be a good plan, even if they used the drm framework." Thanks, Oded > > Best regards > Thomas > > > > > In addition, I know of at least 3 or 4 drivers that were submitted for review > > and are good candidates to be included in this new subsystem, instead of being > > a drm render node driver or a misc driver. > > > > [1] https://lkml.org/lkml/2022/7/31/83 > > [2] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html > > > > Thanks, > > Oded > > > > Oded Gabbay (3): > > drivers/accel: add new kconfig and update MAINTAINERS > > drm: define new accel major and register it > > drm: add dedicated minor for accelerator devices > > > > Documentation/admin-guide/devices.txt | 5 + > > MAINTAINERS | 8 + > > drivers/Kconfig | 2 + > > drivers/accel/Kconfig | 24 +++ > > drivers/gpu/drm/drm_drv.c | 214 +++++++++++++++++++++----- > > drivers/gpu/drm/drm_file.c | 69 ++++++--- > > drivers/gpu/drm/drm_internal.h | 5 +- > > drivers/gpu/drm/drm_sysfs.c | 81 +++++++++- > > include/drm/drm_device.h | 3 + > > include/drm/drm_drv.h | 8 + > > include/drm/drm_file.h | 21 ++- > > include/drm/drm_ioctl.h | 1 + > > 12 files changed, 374 insertions(+), 67 deletions(-) > > create mode 100644 drivers/accel/Kconfig > > > > -- > > 2.34.1 > > > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Maxfeldstr. 5, 90409 Nürnberg, Germany > (HRB 36809, AG Nürnberg) > Geschäftsführer: Ivo Totev
On Sat, Oct 22, 2022 at 5:46 PM Oded Gabbay <ogabbay@kernel.org> wrote: > > In the last couple of months we had a discussion [1] about creating a new > subsystem for compute accelerator devices in the kernel. > > After an analysis that was done by DRM maintainers and myself, and following > a BOF session at the Linux Plumbers conference a few weeks ago [2], we > decided to create a new subsystem that will use the DRM subsystem's code and > functionality. i.e. the accel core code will be part of the DRM subsystem. > > This will allow us to leverage the extensive DRM code-base and > collaborate with DRM developers that have experience with this type of > devices. In addition, new features that will be added for the accelerator > drivers can be of use to GPU drivers as well (e.g. RAS). > > As agreed in the BOF session, the accelerator devices will be exposed to > user-space with a new, dedicated device char files and a dedicated major > number (261), to clearly separate them from graphic cards and the graphic > user-space s/w stack. Furthermore, the drivers will be located in a separate > place in the kernel tree (drivers/accel/). > > This series of patches is the first step in this direction as it adds the > necessary infrastructure for accelerator devices to DRM. The new devices will > be exposed with the following convention: > > device char files - /dev/accel/accel* > sysfs - /sys/class/accel/accel*/ > debugfs - /sys/kernel/debug/accel/accel*/ > > I tried to reuse the existing DRM code as much as possible, while keeping it > readable and maintainable. Wouldn't something like this: https://patchwork.freedesktop.org/series/109575/ Be simpler and provide better backwards compatibility for existing non-gfx devices in the drm subsystem as well as newer devices? Alex > > One thing that is missing from this series is defining a namespace for the > new accel subsystem, while I'll add in the next iteration of this patch-set, > after I will receive feedback from the community. > > As for drivers, once this series will be accepted (after adding the namespace), > I will start working on migrating the habanalabs driver to the new accel > subsystem. I have talked about it with Dave and we agreed that it will be > a good start to simply move the driver as-is with minimal changes, and then > start working on the driver's individual features that will be either added > to the accel core code (with or without changes), or will be removed and > instead the driver will use existing DRM code. > > In addition, I know of at least 3 or 4 drivers that were submitted for review > and are good candidates to be included in this new subsystem, instead of being > a drm render node driver or a misc driver. > > [1] https://lkml.org/lkml/2022/7/31/83 > [2] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html > > Thanks, > Oded > > Oded Gabbay (3): > drivers/accel: add new kconfig and update MAINTAINERS > drm: define new accel major and register it > drm: add dedicated minor for accelerator devices > > Documentation/admin-guide/devices.txt | 5 + > MAINTAINERS | 8 + > drivers/Kconfig | 2 + > drivers/accel/Kconfig | 24 +++ > drivers/gpu/drm/drm_drv.c | 214 +++++++++++++++++++++----- > drivers/gpu/drm/drm_file.c | 69 ++++++--- > drivers/gpu/drm/drm_internal.h | 5 +- > drivers/gpu/drm/drm_sysfs.c | 81 +++++++++- > include/drm/drm_device.h | 3 + > include/drm/drm_drv.h | 8 + > include/drm/drm_file.h | 21 ++- > include/drm/drm_ioctl.h | 1 + > 12 files changed, 374 insertions(+), 67 deletions(-) > create mode 100644 drivers/accel/Kconfig > > -- > 2.34.1 >
On Mon, Oct 24, 2022 at 4:55 PM Alex Deucher <alexdeucher@gmail.com> wrote: > > On Sat, Oct 22, 2022 at 5:46 PM Oded Gabbay <ogabbay@kernel.org> wrote: > > > > In the last couple of months we had a discussion [1] about creating a new > > subsystem for compute accelerator devices in the kernel. > > > > After an analysis that was done by DRM maintainers and myself, and following > > a BOF session at the Linux Plumbers conference a few weeks ago [2], we > > decided to create a new subsystem that will use the DRM subsystem's code and > > functionality. i.e. the accel core code will be part of the DRM subsystem. > > > > This will allow us to leverage the extensive DRM code-base and > > collaborate with DRM developers that have experience with this type of > > devices. In addition, new features that will be added for the accelerator > > drivers can be of use to GPU drivers as well (e.g. RAS). > > > > As agreed in the BOF session, the accelerator devices will be exposed to > > user-space with a new, dedicated device char files and a dedicated major > > number (261), to clearly separate them from graphic cards and the graphic > > user-space s/w stack. Furthermore, the drivers will be located in a separate > > place in the kernel tree (drivers/accel/). > > > > This series of patches is the first step in this direction as it adds the > > necessary infrastructure for accelerator devices to DRM. The new devices will > > be exposed with the following convention: > > > > device char files - /dev/accel/accel* > > sysfs - /sys/class/accel/accel*/ > > debugfs - /sys/kernel/debug/accel/accel*/ > > > > I tried to reuse the existing DRM code as much as possible, while keeping it > > readable and maintainable. > > Wouldn't something like this: > https://patchwork.freedesktop.org/series/109575/ > Be simpler and provide better backwards compatibility for existing > non-gfx devices in the drm subsystem as well as newer devices? As Greg said, see the summary. The consensus in the LPC session was that we need to clearly separate accel devices from existing gpu devices (whether they use primary and/or render nodes). That is the main guideline according to which I wrote the patches. I don't think I want to change this decision. Also, there was never any intention to provide backward compatibility for existing non-gfx devices. Why would we want that ? We are mainly talking about drivers that are currently trying to get upstream, and the habana driver. Oded > > Alex > > > > > One thing that is missing from this series is defining a namespace for the > > new accel subsystem, while I'll add in the next iteration of this patch-set, > > after I will receive feedback from the community. > > > > As for drivers, once this series will be accepted (after adding the namespace), > > I will start working on migrating the habanalabs driver to the new accel > > subsystem. I have talked about it with Dave and we agreed that it will be > > a good start to simply move the driver as-is with minimal changes, and then > > start working on the driver's individual features that will be either added > > to the accel core code (with or without changes), or will be removed and > > instead the driver will use existing DRM code. > > > > In addition, I know of at least 3 or 4 drivers that were submitted for review > > and are good candidates to be included in this new subsystem, instead of being > > a drm render node driver or a misc driver. > > > > [1] https://lkml.org/lkml/2022/7/31/83 > > [2] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html > > > > Thanks, > > Oded > > > > Oded Gabbay (3): > > drivers/accel: add new kconfig and update MAINTAINERS > > drm: define new accel major and register it > > drm: add dedicated minor for accelerator devices > > > > Documentation/admin-guide/devices.txt | 5 + > > MAINTAINERS | 8 + > > drivers/Kconfig | 2 + > > drivers/accel/Kconfig | 24 +++ > > drivers/gpu/drm/drm_drv.c | 214 +++++++++++++++++++++----- > > drivers/gpu/drm/drm_file.c | 69 ++++++--- > > drivers/gpu/drm/drm_internal.h | 5 +- > > drivers/gpu/drm/drm_sysfs.c | 81 +++++++++- > > include/drm/drm_device.h | 3 + > > include/drm/drm_drv.h | 8 + > > include/drm/drm_file.h | 21 ++- > > include/drm/drm_ioctl.h | 1 + > > 12 files changed, 374 insertions(+), 67 deletions(-) > > create mode 100644 drivers/accel/Kconfig > > > > -- > > 2.34.1 > >
On Mon, Oct 24, 2022 at 10:41 AM Oded Gabbay <ogabbay@kernel.org> wrote: > > On Mon, Oct 24, 2022 at 4:55 PM Alex Deucher <alexdeucher@gmail.com> wrote: > > > > On Sat, Oct 22, 2022 at 5:46 PM Oded Gabbay <ogabbay@kernel.org> wrote: > > > > > > In the last couple of months we had a discussion [1] about creating a new > > > subsystem for compute accelerator devices in the kernel. > > > > > > After an analysis that was done by DRM maintainers and myself, and following > > > a BOF session at the Linux Plumbers conference a few weeks ago [2], we > > > decided to create a new subsystem that will use the DRM subsystem's code and > > > functionality. i.e. the accel core code will be part of the DRM subsystem. > > > > > > This will allow us to leverage the extensive DRM code-base and > > > collaborate with DRM developers that have experience with this type of > > > devices. In addition, new features that will be added for the accelerator > > > drivers can be of use to GPU drivers as well (e.g. RAS). > > > > > > As agreed in the BOF session, the accelerator devices will be exposed to > > > user-space with a new, dedicated device char files and a dedicated major > > > number (261), to clearly separate them from graphic cards and the graphic > > > user-space s/w stack. Furthermore, the drivers will be located in a separate > > > place in the kernel tree (drivers/accel/). > > > > > > This series of patches is the first step in this direction as it adds the > > > necessary infrastructure for accelerator devices to DRM. The new devices will > > > be exposed with the following convention: > > > > > > device char files - /dev/accel/accel* > > > sysfs - /sys/class/accel/accel*/ > > > debugfs - /sys/kernel/debug/accel/accel*/ > > > > > > I tried to reuse the existing DRM code as much as possible, while keeping it > > > readable and maintainable. > > > > Wouldn't something like this: > > https://patchwork.freedesktop.org/series/109575/ > > Be simpler and provide better backwards compatibility for existing > > non-gfx devices in the drm subsystem as well as newer devices? > > As Greg said, see the summary. The consensus in the LPC session was > that we need to clearly separate accel devices from existing gpu > devices (whether they use primary and/or render nodes). That is the > main guideline according to which I wrote the patches. I don't think I > want to change this decision. > > Also, there was never any intention to provide backward compatibility > for existing non-gfx devices. Why would we want that ? We are mainly > talking about drivers that are currently trying to get upstream, and > the habana driver. If someone already has a non-gfx device which uses the drm subsystem, should they be converted to the new accel stuff? What about new devices that utilize the same driver? SHould they use accel or continue to use drm? For the sake of the rest of the stack drm would make more sense, but if accel grows a bunch of stuff that all accel drivers should be using what do we do? Also using render nodes also makes the devices compatible with all of the existing user space tools that use the existing drm device nodes like libdrm, etc. I'm failing to see what advantage accel brings other than requiring userspace to support two very similar device nodes. Alex > > Oded > > > > Alex > > > > > > > > One thing that is missing from this series is defining a namespace for the > > > new accel subsystem, while I'll add in the next iteration of this patch-set, > > > after I will receive feedback from the community. > > > > > > As for drivers, once this series will be accepted (after adding the namespace), > > > I will start working on migrating the habanalabs driver to the new accel > > > subsystem. I have talked about it with Dave and we agreed that it will be > > > a good start to simply move the driver as-is with minimal changes, and then > > > start working on the driver's individual features that will be either added > > > to the accel core code (with or without changes), or will be removed and > > > instead the driver will use existing DRM code. > > > > > > In addition, I know of at least 3 or 4 drivers that were submitted for review > > > and are good candidates to be included in this new subsystem, instead of being > > > a drm render node driver or a misc driver. > > > > > > [1] https://lkml.org/lkml/2022/7/31/83 > > > [2] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html > > > > > > Thanks, > > > Oded > > > > > > Oded Gabbay (3): > > > drivers/accel: add new kconfig and update MAINTAINERS > > > drm: define new accel major and register it > > > drm: add dedicated minor for accelerator devices > > > > > > Documentation/admin-guide/devices.txt | 5 + > > > MAINTAINERS | 8 + > > > drivers/Kconfig | 2 + > > > drivers/accel/Kconfig | 24 +++ > > > drivers/gpu/drm/drm_drv.c | 214 +++++++++++++++++++++----- > > > drivers/gpu/drm/drm_file.c | 69 ++++++--- > > > drivers/gpu/drm/drm_internal.h | 5 +- > > > drivers/gpu/drm/drm_sysfs.c | 81 +++++++++- > > > include/drm/drm_device.h | 3 + > > > include/drm/drm_drv.h | 8 + > > > include/drm/drm_file.h | 21 ++- > > > include/drm/drm_ioctl.h | 1 + > > > 12 files changed, 374 insertions(+), 67 deletions(-) > > > create mode 100644 drivers/accel/Kconfig > > > > > > -- > > > 2.34.1 > > >
On 10/24/22 05:43, Oded Gabbay wrote: Hi Oded, The patches make sense to me. I'm still just reading through and looking for minor issues, but at a high level it seems to match what the LPC discussions pointed to. >> What's your opinion on the long-term prospect of DRM vs accel? I assume >> that over time, DRM helpers will move into accel and some DRM drivers >> will start depending on accel? > I don't think that is what I had in mind. > What I had in mind is that accel helpers are only relevant for accel > drivers, and any code that might also be relevant for DRM drivers will > be placed in DRM core code. e.g. GEM enhancements, RAS netlink Yes. That is how I understood it ("it" being both the LPC discussions, and this patchset) as well: * accel-only code goes in drivers/accel, thus allowing for smaller, simpler drivers (as compared to full drm) for that case. * graphics and display code still goes in drivers/gpu/drm, because it is much too hard to rename or move that directory. * code common to both also goes in drivers/gpu/drm. Looking ahead a bit more: For full-featured GPUs that do both Graphics and Compute, I expect that a *lot* of the code will end up in drivers/gpu/drm. Because so much of setting up for Compute is also really just setting up for Graphics--that's how it evolved, after all! And as things are structured now, it looks like those full featured GPU stacks will also need an aux bus (which I only just now learned about, but it looks quite helpful here). And also, user space will need to open both /dev/dri/* and /dev/accel/* nodes, if it needs access to anything live objects that drivers/accel owns. thanks,
On Tue, 25 Oct 2022 at 12:21, John Hubbard <jhubbard@nvidia.com> wrote: > > On 10/24/22 05:43, Oded Gabbay wrote: > > Hi Oded, > > The patches make sense to me. I'm still just reading through and looking > for minor issues, but at a high level it seems to match what the LPC > discussions pointed to. > > >> What's your opinion on the long-term prospect of DRM vs accel? I assume > >> that over time, DRM helpers will move into accel and some DRM drivers > >> will start depending on accel? > > I don't think that is what I had in mind. > > What I had in mind is that accel helpers are only relevant for accel > > drivers, and any code that might also be relevant for DRM drivers will > > be placed in DRM core code. e.g. GEM enhancements, RAS netlink > > Yes. That is how I understood it ("it" being both the LPC discussions, > and this patchset) as well: > > * accel-only code goes in drivers/accel, thus allowing for > smaller, simpler drivers (as compared to full drm) for that case. > > * graphics and display code still goes in drivers/gpu/drm, because > it is much too hard to rename or move that directory. > > * code common to both also goes in drivers/gpu/drm. > > Looking ahead a bit more: > > For full-featured GPUs that do both Graphics and Compute, I expect > that a *lot* of the code will end up in drivers/gpu/drm. Because so > much of setting up for Compute is also really just setting up for > Graphics--that's how it evolved, after all! > > And as things are structured now, it looks like those full featured > GPU stacks will also need an aux bus (which I only just now learned > about, but it looks quite helpful here). And also, user space will > need to open both /dev/dri/* and /dev/accel/* nodes, if it needs > access to anything live objects that drivers/accel owns. > I actually don't know if we really need to worry about compute nodes for fully featured devices. The userspace for those is normally bespoke like ROCm, which uses amdkfd, and amdkfd doesn't operate like most device files from what I know, so I'm not sure we'd want it to operate as an accel device. Or the userspace is OpenCL like where we have stacks that already bind using the drm interfaces so again not sure if there's any value there. For anything which already has a userspace and stuff I don't think this adds any value, for nvidia type cards I doubt there is much use in using an accel node for the GPU related things at all. Dave.
On Tue, Oct 25, 2022 at 12:27:11PM +1000, Dave Airlie wrote: > The userspace for those is normally bespoke like ROCm, which uses > amdkfd, and amdkfd doesn't operate like most device files from what I > know, so I'm not sure we'd want it to operate as an accel device. I intensely dislike this direction that drivers will create their own char devs buried inside their device driver with no support or supervision. We've been here before with RDMA and it is just a complete mess. Whatever special non-drm stuff amdkfd need to do should be supported through the new subsystem, in a proper maintainable way. Jason
On Tue, Oct 25, 2022 at 7:15 AM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Tue, Oct 25, 2022 at 12:27:11PM +1000, Dave Airlie wrote: > > > The userspace for those is normally bespoke like ROCm, which uses > > amdkfd, and amdkfd doesn't operate like most device files from what I > > know, so I'm not sure we'd want it to operate as an accel device. > > I intensely dislike this direction that drivers will create their own > char devs buried inside their device driver with no support or > supervision. > > We've been here before with RDMA and it is just a complete mess. > > Whatever special non-drm stuff amdkfd need to do should be supported > through the new subsystem, in a proper maintainable way. We plan to eventually move ROCm over the drm interfaces once we get user mode queues working on non-compute queues which is already in progress. ROCm already uses the existing drm nodes and libdrm for a number of things today (buffer sharing, media and compute command submission in certain cases, etc.). I don't see much value in the accel nodes for AMD products at this time. Even when we transition, there are still a bunch of things that we'd need to think about, so the current kfd node may stick around until we figure out a plan for those areas. E.g., the kfd node provides platform level compute topology information; e.g., the NUMA details for connected GPUs and CPUs, non-GPU compute node information, cache level topologies, etc. Alex > > Jason
On Tue, Oct 25, 2022 at 10:21:34AM -0400, Alex Deucher wrote: > E.g., the kfd node provides platform level compute > topology information; e.g., the NUMA details for connected GPUs and > CPUs, non-GPU compute node information, cache level topologies, etc. See, this is exactly what I'm talking about. What on earth does any of this have to do with DRM? We alread have places in the kernel that own and expose these kinds of information, drivers need to use them. Not re-invent them. Jason
On Tue, Oct 25, 2022 at 10:34 AM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Tue, Oct 25, 2022 at 10:21:34AM -0400, Alex Deucher wrote: > > > E.g., the kfd node provides platform level compute > > topology information; e.g., the NUMA details for connected GPUs and > > CPUs, non-GPU compute node information, cache level topologies, etc. > > See, this is exactly what I'm talking about. What on earth does any of > this have to do with DRM? At least for the GPU information it seems relevant. What value are acceleration device cache topologies outside of the subsytsem that uses them? > > We alread have places in the kernel that own and expose these kinds of > information, drivers need to use them. Not re-invent them. I don't disagree, but I'm not sure where the best place for these should be. Probably a lack of knowledge of where this should actually live and indifference from the maintainers of those areas since this use case doesn't match existing ones. Alex
On Mon, Oct 24, 2022 at 6:10 PM Alex Deucher <alexdeucher@gmail.com> wrote: > > On Mon, Oct 24, 2022 at 10:41 AM Oded Gabbay <ogabbay@kernel.org> wrote: > > > > On Mon, Oct 24, 2022 at 4:55 PM Alex Deucher <alexdeucher@gmail.com> wrote: > > > > > > On Sat, Oct 22, 2022 at 5:46 PM Oded Gabbay <ogabbay@kernel.org> wrote: > > > > > > > > In the last couple of months we had a discussion [1] about creating a new > > > > subsystem for compute accelerator devices in the kernel. > > > > > > > > After an analysis that was done by DRM maintainers and myself, and following > > > > a BOF session at the Linux Plumbers conference a few weeks ago [2], we > > > > decided to create a new subsystem that will use the DRM subsystem's code and > > > > functionality. i.e. the accel core code will be part of the DRM subsystem. > > > > > > > > This will allow us to leverage the extensive DRM code-base and > > > > collaborate with DRM developers that have experience with this type of > > > > devices. In addition, new features that will be added for the accelerator > > > > drivers can be of use to GPU drivers as well (e.g. RAS). > > > > > > > > As agreed in the BOF session, the accelerator devices will be exposed to > > > > user-space with a new, dedicated device char files and a dedicated major > > > > number (261), to clearly separate them from graphic cards and the graphic > > > > user-space s/w stack. Furthermore, the drivers will be located in a separate > > > > place in the kernel tree (drivers/accel/). > > > > > > > > This series of patches is the first step in this direction as it adds the > > > > necessary infrastructure for accelerator devices to DRM. The new devices will > > > > be exposed with the following convention: > > > > > > > > device char files - /dev/accel/accel* > > > > sysfs - /sys/class/accel/accel*/ > > > > debugfs - /sys/kernel/debug/accel/accel*/ > > > > > > > > I tried to reuse the existing DRM code as much as possible, while keeping it > > > > readable and maintainable. > > > > > > Wouldn't something like this: > > > https://patchwork.freedesktop.org/series/109575/ > > > Be simpler and provide better backwards compatibility for existing > > > non-gfx devices in the drm subsystem as well as newer devices? > > > > As Greg said, see the summary. The consensus in the LPC session was > > that we need to clearly separate accel devices from existing gpu > > devices (whether they use primary and/or render nodes). That is the > > main guideline according to which I wrote the patches. I don't think I > > want to change this decision. > > > > Also, there was never any intention to provide backward compatibility > > for existing non-gfx devices. Why would we want that ? We are mainly > > talking about drivers that are currently trying to get upstream, and > > the habana driver. > > If someone already has a non-gfx device which uses the drm subsystem, > should they be converted to the new accel stuff? What about new > devices that utilize the same driver? SHould they use accel or > continue to use drm? My baseline assumption was that this subsystem is mainly (but not solely) for new drivers that are now trying to get upstreamed and for the habana driver. imo we should not force existing drivers to convert their entire driver just because we created a new subsystem. If they want to do it, they are more than welcomed. But that's only my opinion and other maintainers might think otherwise. > For the sake of the rest of the stack drm would > make more sense, but if accel grows a bunch of stuff that all accel > drivers should be using what do we do? First of all, as I wrote in another email, I don't think accel core code will be very large. Otherwise, I probably would have tried to convince people that the accel stuff should be totally independent of drm. You can see I tried to make the code tightly-coupled with drm (too much according to the reviews) and I did that because I believe most core code will be common to drm and accel. So I'm not worried about this aspect. Second, yes, if for some reason there will be accel-only features that devices want to use, they will need to create an accel device that will have this functionality and be connected via auxiliary bus to their main driver (which can be drm or other subsystem, e.g. nvme). For example, to utilize Ethernet and RDMA features, habana is now writing Ethernet and RDMA drivers that will be upstreamed and they will be connected to the main/compute driver via auxiliary bus. > Also using render nodes also > makes the devices compatible with all of the existing user space tools > that use the existing drm device nodes like libdrm, etc. I'm failing > to see what advantage accel brings other than requiring userspace to > support two very similar device nodes. This is exactly what we are trying to avoid here :) We want to make sure that all existing user space tools that use drm devices will NOT work with the accel devices. Accel devices are not GPUs. The h/w ip might be a part of a GPU ASIC, but the specific functionality is not related to the drm/mesa/x-server/wayland/opengl/vulkan stack. I don't want them to expose render nodes that Chrome or some other application tries to open because it thinks it is a GPU... So it was the majority opinion of the people in LPC that we should make a clear separation. If there is no separation, then I don't see the point in doing an accel subsystem, let's just continue to do drm. Thanks, Oded > > Alex > > > > > Oded > > > > > > Alex > > > > > > > > > > > One thing that is missing from this series is defining a namespace for the > > > > new accel subsystem, while I'll add in the next iteration of this patch-set, > > > > after I will receive feedback from the community. > > > > > > > > As for drivers, once this series will be accepted (after adding the namespace), > > > > I will start working on migrating the habanalabs driver to the new accel > > > > subsystem. I have talked about it with Dave and we agreed that it will be > > > > a good start to simply move the driver as-is with minimal changes, and then > > > > start working on the driver's individual features that will be either added > > > > to the accel core code (with or without changes), or will be removed and > > > > instead the driver will use existing DRM code. > > > > > > > > In addition, I know of at least 3 or 4 drivers that were submitted for review > > > > and are good candidates to be included in this new subsystem, instead of being > > > > a drm render node driver or a misc driver. > > > > > > > > [1] https://lkml.org/lkml/2022/7/31/83 > > > > [2] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html > > > > > > > > Thanks, > > > > Oded > > > > > > > > Oded Gabbay (3): > > > > drivers/accel: add new kconfig and update MAINTAINERS > > > > drm: define new accel major and register it > > > > drm: add dedicated minor for accelerator devices > > > > > > > > Documentation/admin-guide/devices.txt | 5 + > > > > MAINTAINERS | 8 + > > > > drivers/Kconfig | 2 + > > > > drivers/accel/Kconfig | 24 +++ > > > > drivers/gpu/drm/drm_drv.c | 214 +++++++++++++++++++++----- > > > > drivers/gpu/drm/drm_file.c | 69 ++++++--- > > > > drivers/gpu/drm/drm_internal.h | 5 +- > > > > drivers/gpu/drm/drm_sysfs.c | 81 +++++++++- > > > > include/drm/drm_device.h | 3 + > > > > include/drm/drm_drv.h | 8 + > > > > include/drm/drm_file.h | 21 ++- > > > > include/drm/drm_ioctl.h | 1 + > > > > 12 files changed, 374 insertions(+), 67 deletions(-) > > > > create mode 100644 drivers/accel/Kconfig > > > > > > > > -- > > > > 2.34.1 > > > >