Message ID | 20210705130314.11519-1-ogabbay@kernel.org (mailing list archive) |
---|---|
Series | Add p2p via dmabuf to habanalabs |
On Mon, Jul 05, 2021 at 04:03:12PM +0300, Oded Gabbay wrote:
> Hi,
> I'm sending v4 of this patch-set following the long email thread.
> I want to thank Jason for reviewing v3 and pointing out the errors, saving
> us time later to debug it :)
>
> I consulted with Christian on how to fix patch 2 (the implementation) and
> at the end of the day I shamelessly copied the relevant content from
> amdgpu_vram_mgr_alloc_sgt() and amdgpu_dma_buf_attach(), regarding the
> usage of dma_map_resource() and pci_p2pdma_distance_many(), respectively.
>
> I also made a few improvements after looking at the relevant code in amdgpu.
> The details are in the changelog of patch 2.
>
> I took the time to write an import code into the driver, allowing me to
> check real P2P with two Gaudi devices, one as exporter and the other as
> importer. I'm not going to include the import code in the product, it was
> just for testing purposes (although I can share it if anyone wants).
>
> I run it on a bare-metal environment with IOMMU enabled, on a sky-lake CPU
> with a white-listed PCIe bridge (to make the pci_p2pdma_distance_many happy).
>
> Greg, I hope this will be good enough for you to merge this code.

So we're officially going to use dri-devel for technical details review
and then Greg for merging so we don't have to deal with other merge
criteria dri-devel folks have?

I don't expect anything less by now, but it does make the original claim
that drivers/misc will not step all over accelerators folks a complete
farce under the totally-not-a-gpu banner.

This essentially means that for any other accelerator stack that doesn't
fit the dri-devel merge criteria, even if it's acting like a gpu and uses
other gpu driver stuff, you can just send it to Greg and it's good to go.

There's quite a lot of these floating around actually (and many do have
semi-open runtimes, like habanalabs have now too, just not open enough to
be actually useful). It's going to be absolutely lovely having to explain
to these companies in background chats why habanalabs gets away with their
stack and they don't.

Or maybe we should just merge them all and give up on the idea of having
open cross-vendor driver stacks for these accelerators.

Thanks, Daniel

>
> Thanks,
> Oded
>
> Oded Gabbay (1):
>   habanalabs: define uAPI to export FD for DMA-BUF
>
> Tomer Tayar (1):
>   habanalabs: add support for dma-buf exporter
>
>  drivers/misc/habanalabs/Kconfig             |   1 +
>  drivers/misc/habanalabs/common/habanalabs.h |  26 ++
>  drivers/misc/habanalabs/common/memory.c     | 480 +++++++++++++++++++-
>  drivers/misc/habanalabs/gaudi/gaudi.c       |   1 +
>  drivers/misc/habanalabs/goya/goya.c         |   1 +
>  include/uapi/misc/habanalabs.h              |  28 +-
>  6 files changed, 532 insertions(+), 5 deletions(-)
>
> --
> 2.25.1
>
On Tue, Jul 6, 2021 at 11:40 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Jul 05, 2021 at 04:03:12PM +0300, Oded Gabbay wrote:
> > Greg, I hope this will be good enough for you to merge this code.
>
> So we're officially going to use dri-devel for technical details review
> and then Greg for merging so we don't have to deal with other merge
> criteria dri-devel folks have?

I'm glad to receive any help or review, regardless of the subsystem
the person giving that help belongs to.

>
> I don't expect anything less by now, but it does make the original claim
> that drivers/misc will not step all over accelerators folks a complete
> farce under the totally-not-a-gpu banner.
>
> This essentially means that for any other accelerator stack that doesn't
> fit the dri-devel merge criteria, even if it's acting like a gpu and uses
> other gpu driver stuff, you can just send it to Greg and it's good to go.

What's wrong with Greg ??? ;)

On a more serious note, yes, I do think the dri-devel merge criteria
is very extreme, and effectively drives out many AI accelerator
companies that want to contribute to the kernel but can't/won't open
their software IP and patents.

I think the expectation from AI startups (who are 90% of the deep
learning field) to cooperate outside of company boundaries is not
realistic, especially on the user side, where the real IP of the
company resides.

Personally I don't think there is a real justification for that at
this point of time, but if it will make you (and other people here)
happy I really don't mind creating a non-gpu accelerator subsystem
that will contain all the totally-not-a-gpu accelerators, and will
have a more relaxed criteria for upstreaming. Something along an
"rdma-core" style library looks like the correct amount of user-level
open source that should be enough.

The question is, what will happen later ? Will it be sufficient to
"allow" us to use dmabuf and maybe other gpu stuff in the future (e.g.
hmm) ?

If the community and dri-devel maintainers (and you among them) will
assure me it is good enough, then I'll happily contribute my work and
personal time to organize this effort and implement it.

Thanks,
oded
On Tue, Jul 6, 2021 at 12:03 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> On a more serious note, yes, I do think the dri-devel merge criteria
> is very extreme, and effectively drives out many AI accelerator
> companies that want to contribute to the kernel but can't/won't open
> their software IP and patents.
>
> I think the expectation from AI startups (who are 90% of the deep
> learning field) to cooperate outside of company boundaries is not
> realistic, especially on the user side, where the real IP of the
> company resides.
>
> Personally I don't think there is a real justification for that at
> this point of time, but if it will make you (and other people here)
> happy I really don't mind creating a non-gpu accelerator subsystem
> that will contain all the totally-not-a-gpu accelerators, and will
> have a more relaxed criteria for upstreaming. Something along an
> "rdma-core" style library looks like the correct amount of user-level
> open source that should be enough.
>
> The question is, what will happen later ? Will it be sufficient to
> "allow" us to use dmabuf and maybe other gpu stuff in the future (e.g.
> hmm) ?
>
> If the community and dri-devel maintainers (and you among them) will
> assure me it is good enough, then I'll happily contribute my work and
> personal time to organize this effort and implement it.

I think the dri-devel stance is pretty clear and well known: We want the
userspace to be open, because that's where most of the driver stack
is. Without an open driver stack there's no way to ever have anything
cross-vendor.

And that includes the compiler and anything else you need to drive the hardware.

Afaik linux cpu arch ports are also not accepted if there's no open
gcc or llvm port around, because without that the overall stack just
becomes useless.

If that means AI companies don't want to open up their hw specs
enough to allow that, so be it - all you get in that case is
offloading the kernel side of the stack for convenience, with zero
long term prospects to ever make this into a cross vendor subsystem
stack that does something useful. If the business case says you can't
open up your hw enough for that, I really don't see the point in
merging such a driver; it'll be an unmaintainable stack for anyone else
who doesn't have access to those NDA covered specs and patents and
everything.

If the stack is actually cross vendor to begin with that's just a bonus,
but generally that doesn't happen voluntarily and needs a few years to
decades to get there. So that's not really something we require.

tldr; just a runtime isn't enough for dri-devel.

Now Greg seems to be happy to merge kernel drivers that aren't useful
with the open bits provided, so *shrug*.

Cheers, Daniel

PS: If requiring an actually useful open driver stack is somehow
*extreme* I have no idea why we even bother with merging device
drivers to upstream. Just make a stable driver api and done, vendors
can then do whatever they feel like and protect their "valuable IP and
patents" or whatever it is.
On Tue, Jul 6, 2021 at 12:36 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> PS: If requiring an actually useful open driver stack is somehow
> *extreme* I have no idea why we even bother with merging device
> drivers to upstream. Just make a stable driver api and done, vendors
> can then do whatever they feel like and protect their "valuable IP and
> patents" or whatever it is.

Perhaps this isn't clear, so let's explain it differently.

The deal when having a driver in upstream is that both the vendor and
upstream benefits:
- vendor gets their driver carried and adjusted in upstream, because
there's no stable uapi, and the benefit of being included everywhere
by default
- upstream gets the benefit to be able to hack around in more drivers,
which generally leads to a more robust subsystem and driver
architecture

Now what you want is to have the benefits for you, without giving the
wider community the benefit of actually being able to hack on your
driver stack. Because you prefer to keep critical pieces of it
protected and closed, which makes sure no one can create a new
cross-vendor stack without your permission. Or without investing a lot
of time into reverse-engineering the hardware. That's not extreme,
that's just preferring to have your cake and eat it too.

And frankly on dri-devel we don't take such a lopsided deal. Greg
otoh seems to be totally fine, or doesn't really understand what it takes
to build an accelerator stack, or I dunno what, but he's happy merging
them.

Cheers, Daniel
On Tue, Jul 6, 2021 at 12:47 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Tue, Jul 6, 2021 at 12:36 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Tue, Jul 6, 2021 at 12:03 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> > > Personally I don't think there is a real justification for that at
> > > this point of time, but if it will make you (and other people here)
> > > happy I really don't mind creating a non-gpu accelerator subsystem
> > > that will contain all the totally-not-a-gpu accelerators, and will
> > > have a more relaxed criteria for upstreaming. Something along an
> > > "rdma-core" style library looks like the correct amount of user-level
> > > open source that should be enough.

On the "rdma-core" idea, afaik RDMA NICs do not have fully programmable
cores in their hw, for which you'd need some kind of compiler to make
use of the hardware and the interfaces the kernel provides? So not
really comparable, but also my understanding is that rdma-core does
actually allow you to reasonably use and drive all the hw features and
kernel interfaces fully.

So we actually want less on dri-devel, because for compute/accel chips
we're currently happy with a vendor userspace. It just needs to be
functional and complete, and open in its entirety.

Now if there's going to be an AI/NN/spatial compute core runtime with
all the things included that's cross-vendor, that's obviously going to
be great, but that's strictly a bonus. And it's eventually the long-term
goal, once we have a few open stacks from various vendors. But atm we
have 0 open stacks, so one thing at a time.

>
> The deal when having a driver in upstream is that both the vendor and
> upstream benefits:
> - vendor gets their driver carried and adjusted in upstream, because
> there's no stable uapi, and the benefit of being included everywhere

s/uapi/kernel driver api/ ofc, but I got it right in the first reply at least.
-Daniel
On Tue, Jul 06, 2021 at 10:40:37AM +0200, Daniel Vetter wrote:
> > Greg, I hope this will be good enough for you to merge this code.
>
> So we're officially going to use dri-devel for technical details review
> and then Greg for merging so we don't have to deal with other merge
> criteria dri-devel folks have?
>
> I don't expect anything less by now, but it does make the original claim
> that drivers/misc will not step all over accelerators folks a complete
> farce under the totally-not-a-gpu banner.
>
> This essentially means that for any other accelerator stack that doesn't
> fit the dri-devel merge criteria, even if it's acting like a gpu and uses
> other gpu driver stuff, you can just send it to Greg and it's good to go.
>
> There's quite a lot of these floating around actually (and many do have
> semi-open runtimes, like habanalabs have now too, just not open enough to
> be actually useful). It's going to be absolutely lovely having to explain
> to these companies in background chats why habanalabs gets away with their
> stack and they don't.

FYI, I fully agree with Daniel here. Habanalabs needs to open up their
runtime if they want to push any additional feature in the kernel.
The current situation is not sustainable.
On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote: > Afaik linux cpu arch ports are also not accepted if there's no open > gcc or llvm port around, because without that the overall stack just > becomes useless. Yes. And the one architecture that has an open but not upstream compiler already is more than enough of a pain to not repeat that mistake ever again.
On Tue, Jul 06, 2021 at 02:21:10PM +0200, Christoph Hellwig wrote: > On Tue, Jul 06, 2021 at 10:40:37AM +0200, Daniel Vetter wrote: > > > Greg, I hope this will be good enough for you to merge this code. > > > > So we're officially going to use dri-devel for technical details review > > and then Greg for merging so we don't have to deal with other merge > > criteria dri-devel folks have? > > > > I don't expect anything less by now, but it does make the original claim > > that drivers/misc will not step all over accelerators folks a complete > > farce under the totally-not-a-gpu banner. > > > > This essentially means that for any other accelerator stack that doesn't > > fit the dri-devel merge criteria, even if it's acting like a gpu and uses > > other gpu driver stuff, you can just send it to Greg and it's good to go. > > > > There's quite a lot of these floating around actually (and many do have > > semi-open runtimes, like habanalabs have now too, just not open enough to > > be actually useful). It's going to be absolutely lovely having to explain > > to these companies in background chats why habanalabs gets away with their > > stack and they don't. > > FYI, I fully agree with Daniel here. Habanlabs needs to open up their > runtime if they want to push any additional feature in the kernel. > The current situation is not sustainable. Before anyone replies: The runtime is open, the compiler is still closed. This has become the new default for accel driver submissions, I think mostly because all the interesting bits for non-3d accelerators are in the accel ISA, and no longer in the runtime. So vendors are fairly happy to throw in the runtime as a freebie. It's still incomplete, and it's still useless if you want to actually hack on the driver stack. -Daniel
On Tue, Jul 6, 2021 at 3:23 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Jul 06, 2021 at 02:21:10PM +0200, Christoph Hellwig wrote: > > On Tue, Jul 06, 2021 at 10:40:37AM +0200, Daniel Vetter wrote: > > > > Greg, I hope this will be good enough for you to merge this code. > > > > > > So we're officially going to use dri-devel for technical details review > > > and then Greg for merging so we don't have to deal with other merge > > > criteria dri-devel folks have? > > > > > > I don't expect anything less by now, but it does make the original claim > > > that drivers/misc will not step all over accelerators folks a complete > > > farce under the totally-not-a-gpu banner. > > > > > > This essentially means that for any other accelerator stack that doesn't > > > fit the dri-devel merge criteria, even if it's acting like a gpu and uses > > > other gpu driver stuff, you can just send it to Greg and it's good to go. > > > > > > There's quite a lot of these floating around actually (and many do have > > > semi-open runtimes, like habanalabs have now too, just not open enough to > > > be actually useful). It's going to be absolutely lovely having to explain > > > to these companies in background chats why habanalabs gets away with their > > > stack and they don't. > > > > FYI, I fully agree with Daniel here. Habanlabs needs to open up their > > runtime if they want to push any additional feature in the kernel. > > The current situation is not sustainable. Well, that's like, your opinion... > > Before anyone replies: The runtime is open, the compiler is still closed. > This has become the new default for accel driver submissions, I think > mostly because all the interesting bits for non-3d accelerators are in the > accel ISA, and no longer in the runtime. So vendors are fairly happy to > throw in the runtime as a freebie. > > It's still incomplete, and it's still useless if you want to actually hack > on the driver stack. 
> -Daniel > -- I don't understand what's not sustainable here. There is zero code inside the driver that communicates or interacts with our TPC code (TPC is the Tensor Processing Core). Even submitting work to the TPC is done via a generic queue interface. And that queue IP is common to all our engines (TPC/DMA/NIC). The driver provides all the specs of that queue IP, because the driver's code handles that queue. But why is the TPC compiler code even relevant here? btw, you can already see our TPC code today at https://github.com/HabanaAI/Habana_Custom_Kernel There is a link there to the TPC user guide and a link to download the LLVM compiler. Oded
On Tue, Jul 6, 2021 at 2:46 PM Oded Gabbay <oded.gabbay@gmail.com> wrote: > > On Tue, Jul 6, 2021 at 3:23 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > On Tue, Jul 06, 2021 at 02:21:10PM +0200, Christoph Hellwig wrote: > > > On Tue, Jul 06, 2021 at 10:40:37AM +0200, Daniel Vetter wrote: > > > > > Greg, I hope this will be good enough for you to merge this code. > > > > > > > > So we're officially going to use dri-devel for technical details review > > > > and then Greg for merging so we don't have to deal with other merge > > > > criteria dri-devel folks have? > > > > > > > > I don't expect anything less by now, but it does make the original claim > > > > that drivers/misc will not step all over accelerators folks a complete > > > > farce under the totally-not-a-gpu banner. > > > > > > > > This essentially means that for any other accelerator stack that doesn't > > > > fit the dri-devel merge criteria, even if it's acting like a gpu and uses > > > > other gpu driver stuff, you can just send it to Greg and it's good to go. > > > > > > > > There's quite a lot of these floating around actually (and many do have > > > > semi-open runtimes, like habanalabs have now too, just not open enough to > > > > be actually useful). It's going to be absolutely lovely having to explain > > > > to these companies in background chats why habanalabs gets away with their > > > > stack and they don't. > > > > > > FYI, I fully agree with Daniel here. Habanlabs needs to open up their > > > runtime if they want to push any additional feature in the kernel. > > > The current situation is not sustainable. > Well, that's like, your opinion... > > > > > Before anyone replies: The runtime is open, the compiler is still closed. > > This has become the new default for accel driver submissions, I think > > mostly because all the interesting bits for non-3d accelerators are in the > > accel ISA, and no longer in the runtime. So vendors are fairly happy to > > throw in the runtime as a freebie. 
> > > > It's still incomplete, and it's still useless if you want to actually hack > > on the driver stack. > > -Daniel > > -- > I don't understand what's not sustainable here. > > There is zero code inside the driver that communicates or interacts > with our TPC code (TPC is the Tensor Processing Core). > Even submitting works to the TPC is done via a generic queue > interface. And that queue IP is common between all our engines > (TPC/DMA/NIC). The driver provides all the specs of that queue IP, > because the driver's code is handling that queue. But why is the TPC > compiler code even relevant here ? Can I use the hw the way it's intended to be used without it? If the answer is no, then essentially what you're doing with your upstream driver is getting all the benefits of an upstream driver, while upstream gets nothing. We can't use your stack, not as-is. Sure, we can use the queue, but we can't actually submit anything interesting. And I'm pretty sure the point of your hw is to do more than submit no-op packets to a queue. This is an "I want to have my cake and eat it too" approach to upstreaming, and it's a totally fine attitude to have, but if you don't see why there might be a different side to it, then I don't get what you're arguing. Upstream isn't a free lunch. Frankly I'm starting to assume you're arguing this all in bad faith just because habanalabs doesn't want to actually have an open driver stack, so any attack is good, no matter what. Which is also what everyone else does who submits their accel driver to upstream, and which gets us back to the starting point of this sub-thread: me really appreciating how this will improve background discussions going forward for everyone. Like, if the requirement for accel drivers truly is that you can submit a dummy command to the queues, then I have at least 5-10 drivers I could merge instantly.
For something like the intel gpu driver it would be about 50 lines of code (including all the structure boilerplate the ioctls require) in userspace to submit a dummy queue command. GPU and accel vendors would really love that, because it would allow them to freeload on upstream and do essentially nothing in return. And we'd end up with an unmaintainable disaster of a gpu (or, well, accelerator) subsystem, because there's nothing you can change or improve when all the really useful bits of the stack are closed. And ofc that's not any company's problem anymore, so ofc you with the habanalabs hat on don't care and call this *extreme*. > btw, you can today see our TPC code at > https://github.com/HabanaAI/Habana_Custom_Kernel > There is a link there to the TPC user guide and link to download the > LLVM compiler. I got stuck clicking links before I found the source for that llvm compiler. Can you give me a direct link to the repo with the source code instead, please? Thanks, Daniel
On Tue, Jul 06, 2021 at 02:07:16PM +0200, Daniel Vetter wrote: > On the "rdma-core" idea, afaik rdma NIC do not have fully programmable > cores in their hw, for which you'd need some kind of compiler to make > use of the hardware and the interfaces the kernel provides? So not > really compareable, but also my understanding is that rdma-core does > actually allow you to reasonable use&drive all the hw features and > kernel interfaces fully. The whole HPC stack has speciality compilers of course. OpenMP, PGAS, etc. These compilers map onto library primitives that eventually boil down into rdma-core calls. Even the HW devices have various kinds of programmability that are being targeted by compilers now. People are making NIC devices with ARM cores/etc - P4 is emerging for some packet processing tasks. rdma-core can drive all the kernel interfaces with at least an ioctl wrapper, and it has a test suite that tries to cover this. It does not exercise the full HW capability, programmability, etc. of every single device. I actually don't entirely know what everyone has built on top of rdma-core, or how I'd try to map it to the DRI ideas you are trying to explain. Should we ban all Intel RDMA drivers because they are shipping proprietary Intel HPC compilers and proprietary Intel MPI which drives their RDMA HW? Or is that OK because there are open analogs for some of that stuff? And yes, the open versions are inferior in various metrics. Pragmatically, what I want to see is enough RDMA common/open user space to understand the uAPI and thus more about how the kernel driver works. Forcing everyone into rdma-core has already prevented a number of uAPI mistakes in drivers that would have been bad - so at least this level really is valuable. > So we actually want less on dri-devel, because for compute/accel chips > we're currently happy with a vendor userspace. It just needs to be > functional and complete, and open in its entirety.
In a sense yes: DRI doesn't insist on a single code base to act as the kernel interface, but that is actually the thing that has brought the most value to RDMA, IMHO. We've certainly had some interesting successes because of this. The first submission for AWS's EFA driver proposed to skip the rdma-core step, which was rejected. However since EFA has been in that ecosystem it has benefited greatly, I think. However, in another sense no: RDMA hasn't been blocking, say Intel, just because they have built proprietary stuff on top of our open stack. Honestly, I think GPU is approaching this backwards. Wayland should have been designed to prevent proprietary userspace stacks. Jason
On Tue, Jul 6, 2021 at 4:17 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Jul 6, 2021 at 2:46 PM Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > On Tue, Jul 6, 2021 at 3:23 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > > On Tue, Jul 06, 2021 at 02:21:10PM +0200, Christoph Hellwig wrote: > > > > On Tue, Jul 06, 2021 at 10:40:37AM +0200, Daniel Vetter wrote: > > > > > > Greg, I hope this will be good enough for you to merge this code. > > > > > > > > > > So we're officially going to use dri-devel for technical details review > > > > > and then Greg for merging so we don't have to deal with other merge > > > > > criteria dri-devel folks have? > > > > > > > > > > I don't expect anything less by now, but it does make the original claim > > > > > that drivers/misc will not step all over accelerators folks a complete > > > > > farce under the totally-not-a-gpu banner. > > > > > > > > > > This essentially means that for any other accelerator stack that doesn't > > > > > fit the dri-devel merge criteria, even if it's acting like a gpu and uses > > > > > other gpu driver stuff, you can just send it to Greg and it's good to go. > > > > > > > > > > There's quite a lot of these floating around actually (and many do have > > > > > semi-open runtimes, like habanalabs have now too, just not open enough to > > > > > be actually useful). It's going to be absolutely lovely having to explain > > > > > to these companies in background chats why habanalabs gets away with their > > > > > stack and they don't. > > > > > > > > FYI, I fully agree with Daniel here. Habanlabs needs to open up their > > > > runtime if they want to push any additional feature in the kernel. > > > > The current situation is not sustainable. > > Well, that's like, your opinion... > > > > > > > > Before anyone replies: The runtime is open, the compiler is still closed. 
> > > This has become the new default for accel driver submissions, I think > > > mostly because all the interesting bits for non-3d accelerators are in the > > > accel ISA, and no longer in the runtime. So vendors are fairly happy to > > > throw in the runtime as a freebie. > > > > > > It's still incomplete, and it's still useless if you want to actually hack > > > on the driver stack. > > > -Daniel > > > -- > > I don't understand what's not sustainable here. > > > > There is zero code inside the driver that communicates or interacts > > with our TPC code (TPC is the Tensor Processing Core). > > Even submitting works to the TPC is done via a generic queue > > interface. And that queue IP is common between all our engines > > (TPC/DMA/NIC). The driver provides all the specs of that queue IP, > > because the driver's code is handling that queue. But why is the TPC > > compiler code even relevant here ? > > Can I use the hw how it's intended to be used without it? You can use the h/w with the userspace stack we are providing in our github repos + website. Part of the userspace stack is open sourced, part is closed source. And I'm actively working on opening up more stuff as we go along. > > If the answer is no, then essentially what you're doing with your > upstream driver is getting all the benefits of an upstream driver, > while upstream gets nothing. We can't use your stack, not as-is. Sure > we can use the queue, but we can't actually submit anything > interesting. And I'm pretty sure the point of your hw is to do more > than submit no-op packets to a queue. > > This is all "I want my cake and eat it too" approach to upstreaming, > and it's totally fine attitude to have, but if you don't see why > there's maybe an different side to it then I don't get what you're > arguing. Upstream isn't free lunch for nothing. 
> > Frankly I'm starting to assume you're arguing this all in bad faith > just because habanalabds doesn't want to actually have an open driver > stack, so any attack is good, no matter what. Which is also what > everyone else does who submits their accel driver to upstream, and > which gets us back to the starting point of this sub-thread of me > really appreciation how this will improve background discussions going > forward for everyone. > > Like if the requirement for accel drivers truly is that you can submit > a dummy command to the queues then I have about 5-10 drivers at least > I could merge instantly. For something like the intel gpu driver it > would be about 50 lines of code (including all the structure boiler > plate the ioctls require)in userspace to submit a dummy queue command. > GPU and accel vendors would really love that, because it would allow > them to freeload on upstream and do essentially nothing in return. > > And we'd end up with an unmaintainable disaster of a gpu or well > accelerator subsystem because there's nothing you can change or > improve because all the really useful bits of the stack are closed. > And ofc that's not any companies problem anymore, so ofc you with the > habanalabs hat on don't care and call this *extreme*. > > > btw, you can today see our TPC code at > > https://github.com/HabanaAI/Habana_Custom_Kernel > > There is a link there to the TPC user guide and link to download the > > LLVM compiler. > > I got stuck clicking links before I found the source for that llvm > compiler. Can you give me a direct link to the repo with sourcecode > instead please? The source code for the LLVM compiler is not available yet. That's one of the parts I'm working on getting out in the open. Having said that, I don't think (and I'm not alone in this) that this should be a prerequisite for upstreaming kernel drivers of any type. We had this discussion in the past, and I'm sure we are both tired of repeating ourselves.
> > Thanks, Daniel
On Tue, Jul 6, 2021 at 3:44 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Tue, Jul 06, 2021 at 02:07:16PM +0200, Daniel Vetter wrote: > > > On the "rdma-core" idea, afaik rdma NIC do not have fully programmable > > cores in their hw, for which you'd need some kind of compiler to make > > use of the hardware and the interfaces the kernel provides? So not > > really compareable, but also my understanding is that rdma-core does > > actually allow you to reasonable use&drive all the hw features and > > kernel interfaces fully. > > The whole HPC stack has speciality compilers of course. OpenMP, PGAS, > etc. These compilers map onto library primitives that eventually boil > down into rdma-core calls. Even the HW devices have various > programmability that are being targetted with compilers now. People > are making NIC devices with ARM cores/etc - P4 is emerging for some > packet processing tasks. Well, it depends on which compilers we're talking about here, and what kind of features. Higher-level compilers that break down some fancy language like OpenMP into what it should actually do on given hardware (a gpu, an rdma-connected cluster, or whatever) we really don't care about. You don't need that to drive the hardware. Usually that stuff works by breaking some of the code down into cpu compiler IR (most of this is built on top of LLVM IR nowadays), interspersed with library calls to the runtime. Now the thing I care about here is when things don't get compiled down to cpu code, but to some other IR (SPIR-V is starting to win, but very often it's still a hacked-up version of LLVM IR), which a hw-specific backend then compiles down to instructions that run on the hw. I had no idea that rdma NICs can do that, but it sounds like they can? I guess maybe some openmpi operations could be done directly on the rdma chip, but I'm not sure why you'd want a backend compiler here.
Anyway, for anything that works like a gpu accelerator (3d accel, parallel compute accel aka gpgpu, spatial compute accel aka NN/AI, or maybe even fpga accel), most of the magic needed to use the hardware is in this backend compiler, which translates from an IR into whatever your accelerator consumes. That's the part we really care about for modern accelerators, because without it the hardware is de facto useless. Generally these chips have a full-blown, if special-purpose, ISA with register files, spilling, branches, loops and other control flow (sometimes only execution masks on simpler hw). > rdma-core can drive all the kernel interfaces with at least an ioctl > wrapper, and it has a test suite that tries to cover this. It does not > exercise the full HW capability, programmability, etc of every single > device. > > I actually don't entirely know what everyone has built on top of > rdma-core, or how I'd try to map it the DRI ideas you are trying to > explain. > > Should we ban all Intel RDMA drivers because they are shipping > proprietary Intel HPC compilers and proprietary Intel MPI which drives > their RDMA HW? Or is that OK because there are open analogs for some > of that stuff? And yes, the open versions are inferior in various > metrics. > > Pragmatically what I want to see is enough RDMA common/open user space > to understand the uAPI and thus more about how the kernel driver > works. Forcing everyone into rdma-core has already prevented a number > of uAPI mistakes in drivers that would have been bad - so at least > this level really is valuable. > > > So we actually want less on dri-devel, because for compute/accel chips > > we're currently happy with a vendor userspace. It just needs to be > > functional and complete, and open in its entirety. > > In a sense yes: DRI doesn't insist on a single code base to act as the > kernel interface, but that is actually the thing that has brought the > most value to RDMA, IMHO.
So in practice we're not that different in DRI wrt userspace - if there is an established cross-vendor project in the given area, we do expect the userspace side to be merged there. And nowadays most of the feature work is done that way; it's just that we don't have a single project like rdma-core for this. We do still allow per-driver submit interfaces because hw is just not standardized enough there; the standards are at a higher level. Which is why it just doesn't make sense to talk about a kernel driver as something that's useful stand-alone at all. > We've certainly had some interesting successes because of this. The > first submission for AWS's EFA driver proposed to skip the rdma-core > step, which was rejected. However since EFA has been in that ecosystem > it has benefited greatly, I think. > > However, in another sense no: RDMA hasn't been blocking, say Intel, > just because they have built proprietary stuff on top of our open > stack. Oh, we allow this too. We only block the initial submission if the proprietary stuff is the only thing out there. > Honestly, I think GPU is approaching this backwards. Wayland should > have been designed to prevent proprietary userspace stacks. That's not possible without opening some serious cans of worms though. Wayland is a protocol, and you can't forbid people from implementing it. Otherwise all the compatible open implementations of closed protocols wouldn't be possible either. Now the implementation is a different thing, and there, a few compositors have succumbed to market pressure and enabled the nvidia stack, as a mostly separate piece from supporting the open stack. And that's largely because nvidia managed to completely kill the open source r/e effort through firmware licensing and crypto-key-based verified loading, so unless you install the proprietary stack you actually can't make use of the hardware at all - well, display works without the firmware, but 3d/compute just doesn't.
So you just can't use nvidia hw without accepting their proprietary driver licenses and all that entails for the latest hardware. So I'm not clear what you're suggesting here we should do different. -Daniel
On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote: > If that means AI companies don't want to open our their hw specs > enough to allow that, so be it - all you get in that case is > offloading the kernel side of the stack for convenience, with zero > long term prospects to ever make this into a cross vendor subsystem > stack that does something useful. I don't think this is true at all - nouveau is probably the best example. nouveau reverse engineered a userspace stack for one of these devices. How much further ahead would they have been by now if they had a vendor supported, fully featured, open kernel driver to build the userspace upon? > open up your hw enough for that, I really don't see the point in > merging such a driver, it'll be an unmaintainable stack by anyone else > who's not having access to those NDA covered specs and patents and > everything. My perspective from RDMA is that the drivers are black boxes. I can hack around the interface layers but there is a lot of wild stuff in there that can't be understood without access to the HW documentation. I think only HW that has open specs, like say NVMe, can really be properly community oriented. Otherwise we have to work in a community partnership with the vendor. Jason
On Tue, Jul 6, 2021 at 4:23 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote: > > > If that means AI companies don't want to open our their hw specs > > enough to allow that, so be it - all you get in that case is > > offloading the kernel side of the stack for convenience, with zero > > long term prospects to ever make this into a cross vendor subsystem > > stack that does something useful. > > I don't think this is true at all - nouveau is probably the best > example. > > nouveau reverse engineered a userspace stack for one of these devices. > > How much further ahead would they have been by now if they had a > vendor supported, fully featured, open kernel driver to build the > userspace upon? There are actually tons of examples here; most of the arm socs have fully open kernel drivers, supported by the vendor (out of tree). The hard part is the userspace driver and all the things you're submitting to it. We've had open kernel drivers for mali/qualcomm/... for years before any believable open implementation started existing. Typing up the memory manager and hw submission queue handling is comparatively trivial. Generally the kernel driver is also done last: you bring up the userspace first, often by just directly programming the hw from userspace. The kernel driver only gets in the way with this stuff (nouveau is entirely developed as a userspace driver, as the most extreme example). This is a bit different for the display side, but nowadays those drivers are fully in-kernel, so they're all open. Well, except the nvidia one, and I've not heard of nvidia working on even an out-of-tree open display driver, so that won't help the in-tree effort at all. Where it would have helped is if this open driver would come with redistributable firmware, because that is right now the thing making nouveau reverse-engineering painful enough to be infeasible.
Well, not the reverse-engineering, but the "shipping the result as a working driver stack" part. I don't think the facts on the ground support your claim here, aside from the practical problem that nvidia is unwilling to even create an open driver to begin with. So there isn't anything to merge. > > open up your hw enough for that, I really don't see the point in > > merging such a driver, it'll be an unmaintainable stack by anyone else > > who's not having access to those NDA covered specs and patents and > > everything. > > My perspective from RDMA is that the drivers are black boxes. I can > hack around the interface layers but there is a lot of wild stuff in > there that can't be understood without access to the HW documentation. There are shipping gpu drivers with entirely reverse-engineered stacks. And I don't mean "shipping in fedora" but "shipping in Chrome tablets sold by OEM partners of Google". So it's very much possible, even if the vendor is maximally stubborn about things. > I think only HW that has open specs, like say NVMe, can really be > properly community oriented. Otherwise we have to work in a community > partnership with the vendor. Well sure, that's the ideal case, but most vendors in the accel space aren't interested in actual partnership with the wider community. It's "merge this kernel driver and have no further demands about anything else". Well, there are some who are on board, but it does take pretty enormous amounts of coercion. -Daniel
On Tue, Jul 06, 2021 at 04:09:25PM +0200, Daniel Vetter wrote: > Anyway, for anything that works like a gpu accelerator, like 3d accel, > or parallel compute accel (aka gpgpu) or spatial compute accel (aka > NN/AI) or maybe even fpga accel most of the magic to use the hardware > is in this backend compiler, which translates from an IR into whatever > your accelerator consumes. That's the part we really care about for > modern accelerators because without that defacto the hardware is > useless. Generally these chips have full-blown, if special purpose > ISA, with register files, spilling, branches, loops and other control > flow (sometimes only execution masks on simpler hw). I don't know if I see it as clearly as you do - at the end of the day the user keys in the program in some proprietary (or open!) language and a whack of proprietary magic transforms it to "make it work". There are many barriers that prevent someone without the secret knowledge from duplicating the end result of a working program. An accelerator ISA is certainly one example, but I wouldn't overly focus on it as the only blocker. Like you said below, the NVIDIA GPU ISA seems to be known, but the HW is still not really useful for other reasons. Habana seems to have gone the other way: the HW is fully useful, but we don't have the ISA transformation and other details. Both cases seem to have ended up with something useless, and I have a hard time saying nouveau has more right to be in the kernel tree than Habana does. > > Honestly, I think GPU is approaching this backwards. Wayland should > > have been designed to prevent proprietary userspace stacks. > > That's not possible without some serious cans of worms though. Wayland > is a protocol, and you can't forbid people from implementing it. > Otherwise all the compatible open implementations of closed protocols > wouldn't be possible either. Well, in many ways so is Linux, but nobody would seriously re-implement Linux just to produce a driver.
> So I'm not clear what you're suggesting here we should do different. Not enabling proprietary stacks as above would be a good start. Jason
On Tue, Jul 06, 2021 at 04:39:19PM +0200, Daniel Vetter wrote: > On Tue, Jul 6, 2021 at 4:23 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote: > > > > > If that means AI companies don't want to open our their hw specs > > > enough to allow that, so be it - all you get in that case is > > > offloading the kernel side of the stack for convenience, with zero > > > long term prospects to ever make this into a cross vendor subsystem > > > stack that does something useful. > > > > I don't think this is true at all - nouveau is probably the best > > example. > > > > nouveau reverse engineered a userspace stack for one of these devices. > > > > How much further ahead would they have been by now if they had a > > vendor supported, fully featured, open kernel driver to build the > > userspace upon? > > There is actually tons of example here, most of the arm socs have > fully open kernel drivers, supported by the vendor (out of tree). I chose nouveau because of this: $ git ls-files drivers/gpu/drm/arm/ | xargs wc -l 15039 total $ git ls-files drivers/gpu/drm/nouveau/ | xargs wc -l 204198 total At 13x the size of mali, this is not just some easy-to-wire-up memory manager and command submission. And after all that typing it still isn't very good. The fully supported AMD vendor driver is over 3 million lines, so nouveau probably needs to grow several times. My argument is that an in-tree open kernel driver is a big help to reverse engineering an open userspace. Having the vendor's collaboration to build that monstrous thing can only help the end goal of an end-to-end open stack. For instance, a vendor with an in-tree driver has a strong incentive to sort out their FW licensing issues so it can be redistributed. I'm not sure about this all-or-nothing approach. AFAIK DRM has the worst problems with out-of-tree drivers right now.
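The line counts above come from `git ls-files <dir> | xargs wc -l`. A minimal Python sketch of the same measurement follows, assuming it is run from the root of a kernel git checkout; the helper names `tracked_files`, `count_lines` and `size_ratio` are hypothetical, introduced only for illustration. Like `wc -l`, it counts newline characters, so a final line with no trailing newline is not counted. The `__main__` part only recomputes the ratio from the two totals quoted in the email (it prints `nouveau / drm/arm: 13.6x`); it does not re-measure anything.

```python
import subprocess
from pathlib import Path


def tracked_files(directory: str) -> list[str]:
    """Hypothetical helper: paths git tracks under `directory`, like `git ls-files DIR`.

    Assumes the current working directory is inside a git checkout.
    """
    result = subprocess.run(
        ["git", "ls-files", directory],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def count_lines(paths: list[str]) -> int:
    """Hypothetical helper: total newline count across `paths`, as `wc -l` reports it."""
    return sum(Path(p).read_bytes().count(b"\n") for p in paths)


def size_ratio(bigger: int, smaller: int) -> float:
    """How many times larger `bigger` is than `smaller`."""
    return bigger / smaller


if __name__ == "__main__":
    # Totals as quoted in the email; not re-measured here.
    arm_total, nouveau_total = 15039, 204198
    print(f"nouveau / drm/arm: {size_ratio(nouveau_total, arm_total):.1f}x")
    # To measure a checkout yourself (from the kernel source root):
    # print(count_lines(tracked_files("drivers/gpu/drm/nouveau/")))
```

Raw totals like these do not separate display code from command submission, or say how many hardware generations a driver covers, so the ratio is only a first-order size comparison.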
> Where it would have helped is if this open driver would come with > redistributable firmware, because that is right now the thing making > nouveau reverse-engineering painful enough to be non-feasible. Well > not the reverse-engineering, but the "shipping the result as a working > driver stack". I don't think much of the out of tree but open drivers. The goal must be to get vendors in tree. I would applaud Habana for getting an in-tree driver at least, even if the userspace is not what we'd all want to see. > I don't think the facts on the ground support your claim here, aside > from the practical problem that nvidia is unwilling to even create an > open driver to begin with. So there isn't anything to merge. The internet tells me there is nvgpu, it doesn't seem to have helped. Jason
On Tue, Jul 6, 2021 at 5:25 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > On Tue, Jul 06, 2021 at 04:39:19PM +0200, Daniel Vetter wrote: > > On Tue, Jul 6, 2021 at 4:23 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > > > On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote: > > > > > > > If that means AI companies don't want to open up their hw specs > > > > enough to allow that, so be it - all you get in that case is > > > > offloading the kernel side of the stack for convenience, with zero > > > > long term prospects to ever make this into a cross vendor subsystem > > > > stack that does something useful. > > > > > > I don't think this is true at all - nouveau is probably the best > > > example. > > > > > > nouveau reverse engineered a userspace stack for one of these devices. > > > > > > How much further ahead would they have been by now if they had a > > > vendor supported, fully featured, open kernel driver to build the > > > userspace upon? > > > > There are actually tons of examples here, most of the arm socs have > > fully open kernel drivers, supported by the vendor (out of tree). > > I chose nouveau because of this:
>
> $ git ls-files drivers/gpu/drm/arm/ | xargs wc -l
> 15039 total
> $ git ls-files drivers/gpu/drm/nouveau/ | xargs wc -l
> 204198 total

drm/arm is the arm display driver, which isn't actually shipping anywhere afaik. Also it doesn't include the hdmi/dp output drivers; those are generally external on socs, but integrated in discrete gpus. The other thing to keep in mind is that one of these drivers supports 25 years of product generations, and the other one doesn't. So adding it all up, I think it's not that much different. Last time I looked, if you count just command submission and rendering/compute and exclude display (which heavily skews the stats), it's about 10% kernel and 90% userspace driver parts.
That's not counting anything that's shared, which is most of it (largely the compiler frontend, intermediate optimizer, entire runtime/state tracker, and all the integration and glue pieces). > At 13x the size of mali this is not just some easy to wire up memory > manager and command submission. And after all that typing it still > isn't very good. The fully supported AMD vendor driver is over 3 > million lines, so nouveau probably needs to grow several times. AMD is 3 million lines because it includes per-generation generated header files. And of course once you throw an entire vendor team at a driver, all those engineers will produce something, and there's the usual problem that the last 10% of features produce about 90% of the complexity and code. E.g. the kbase driver for arm mali gpu is 20x the size of the in-tree panfrost driver - they need to keep typing to justify their continued employment, or something like that. Usually it's because they reinvent the world. > My argument is that an in-tree open kernel driver is a big help to > reverse engineering an open userspace. Having the vendor's > collaboration to build that monstrous thing can only help the end goal > of an end to end open stack. Not sure where this got lost, but we're totally fine with vendors using the upstream driver together with their closed stack. And most of the drivers we do have in upstream are actually, at least in parts, supported by the vendor. E.g. if you'd looked, the drm/arm driver you picked is actually 100% written by ARM engineers. So kind of an unfitting example. > For instance a vendor with an in-tree driver has a strong incentive to > sort out their FW licensing issues so it can be redistributed. Nvidia has been claiming to try and sort out the FW problem for years. They even managed to release a few things, but I think the last one is 2-3 years late now.
Partially the reason is that they don't have a stable api between the firmware and driver; it's all internal from the same source tree, and they don't really want to change that. > I'm not sure about this all or nothing approach. AFAIK DRM has the > worst problems with out of tree drivers right now. Well I guess someone could stand up a drivers/totally-not-gpu and just let the flood in. Even duplicated drivers and everything included, because the vendor drivers are better. Worth a shot, we've practically started this already, I'm just not going to help with the cleanup. > > Where it would have helped is if this open driver would come with > > redistributable firmware, because that is right now the thing making > > nouveau reverse-engineering painful enough to be non-feasible. Well > > not the reverse-engineering, but the "shipping the result as a working > > driver stack". > > I don't think much of the out of tree but open drivers. The goal must > be to get vendors in tree. Agreed. We've actually largely got them in-tree. Nvidia even contributes the oddball thing, and I think the tegra line is still fully supported in upstream with the upstream driver. I'm not sure the bleak picture you're drawing is reality, aside from Nvidia's discrete gpu drivers being a disaster, with no redistributable firmware, no open kernel driver that works, and nothing else really either. > I would applaud Habana for getting an in-tree driver at least, even if > the userspace is not what we'd all want to see. > > > I don't think the facts on the ground support your claim here, aside > > from the practical problem that nvidia is unwilling to even create an > > open driver to begin with. So there isn't anything to merge. > > The internet tells me there is nvgpu, it doesn't seem to have helped.
Not sure which one you mean, but every once in a while they open up a few headers, or a few programming specs, or a small driver somewhere for a very specific thing, and then it dies again or gets obfuscated for the next platform, or just never gets updated. I've never seen anything that comes remotely close to something complete, aside from tegra socs, which are fully supported in upstream afaik. -Daniel
On Tue, Jul 6, 2021 at 4:56 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > On Tue, Jul 06, 2021 at 04:09:25PM +0200, Daniel Vetter wrote: > > Anyway, for anything that works like a gpu accelerator, like 3d accel, > > or parallel compute accel (aka gpgpu) or spatial compute accel (aka > > NN/AI) or maybe even fpga accel most of the magic to use the hardware > > is in this backend compiler, which translates from an IR into whatever > > your accelerator consumes. That's the part we really care about for > > modern accelerators because without that defacto the hardware is > > useless. Generally these chips have full-blown, if special purpose > > ISA, with register files, spilling, branches, loops and other control > > flow (sometimes only execution masks on simpler hw). > > I don't know if I see it so clearly as you do - at the end of the day > the user keys in the program in some proprietary (or open!) language > and a whack of proprietary magic transforms it to "make it work". > > There are many barriers that prevent someone without the secret > knowledge from duplicating the end result of a working program. An > accelerator ISA is certainly one example, but I wouldn't overly focus > on it as the only blocker. Well, we don't; we just ask for the full driver stack to make the hw work. It's just that in the past most vendors chose to leave out the compiler/ISA from their open stack/specs. Well, except nvidia, which still chooses to leave out everything aside from some very, very minimal thing around documenting display functionality. > Like you said below the NVIDIA GPU ISA seems known but the HW is still > not really useful for other reasons. > > Habana seems to have gone the other way, the HW is fully useful but we > don't have the ISA transformation and other details. You can actually use nvidia gpus, they're fully functional, if you install the blobby stack. Which is exactly the same thing as with habanalabs, plus/minus a few things at the fringes.
In the end it's about drawing the line somewhere, so maybe we should merge the nvidia glue code that makes their blobby stack work better with upstream? There are quite a few pieces there, e.g. their display driver is by design a userspace driver, whereas with kernel modesetting it needs to be in the kernel to expose the common kms ioctl interfaces, so they've built up a glue layer to forward everything to userspace and back. On Windows it works because kernel code there can have growing stacks and fun stuff like that, at least that's my understanding. It's not really an option to just run that code in linux. I'm pretty sure nvidia would appreciate that, and maybe every once in a while they open up a header for a generation or two of products like they've done in the past. > Both cases seem to have ended up with something useless, and I have a > hard time saying nouveau has more right to be in the kernel tree than > Habana does. > > > > Honestly, I think GPU is approaching this backwards. Wayland should > > > have been designed to prevent proprietary userspace stacks. > > > > That's not possible without some serious cans of worms though. Wayland > > is a protocol, and you can't forbid people from implementing it. > > Otherwise all the compatible open implementations of closed protocols > > wouldn't be possible either. > > Well, in many ways so is Linux, but nobody would seriously > re-implement Linux just to produce a driver. Well, in the gpu space nvidia has been setting the standard for 2+ decades, and the open stack has been trying to catch up by reimplementing the entire thing. It took a fair while. > > So I'm not clear what you're suggesting here we should do different. > > Not enabling proprietary stacks as above would be a good start. I'm still not sure what exactly you mean here. Like on the 3d side there's opengl and vulkan, and nvidia just has an entirely different implementation of that compared to any of the open drivers.
That is a bit less code than linux, but it's not small, and reimplementing over decades is pretty much what happened. And if that weren't allowed we'd actually not have an open 3d gpu stack at all, because only very recently did we get an agreement around the trademark/licensing issues of that stuff with Khronos. Recently compared to the history of opengl, at least. So I'm still not clear what exactly it is you're suggesting we should do. Not implement the industry standards for 3d (and accept we stay irrelevant forever)? Reject nvidia blobs harder than we do already? Distros, at least some, will continue to ship an auto-installer for that stack, so we're pretty much maxed out already. Like in what way do you think the upstream stack does enable the proprietary nvidia stack? Should we permanently ban any contributions from anyone with an @nvidia.com address, even if it helps the open stack improve? Like I'm not seeing something concrete that could be done, which would actually prevent nvidia from having their completely independent stack, with the exact same functionality and not a line of code shared. Which is where we are right now. The only thing where we could be more strict is to reject any contributions from them at all, just because we don't like them. That seems a bit too extreme. -Daniel
On Tue, Jul 6, 2021 at 5:49 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > On Tue, Jul 6, 2021 at 5:25 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > I'm not sure about this all or nothing approach. AFAIK DRM has the > > worst problems with out of tree drivers right now. > > Well I guess someone could stand up a drivers/totally-not-gpu and just > let the flood in. Even duplicated drivers and everything included, > because the vendor drivers are better. Worth a shot, we've practically > started this already, I'm just not going to help with the cleanup. tbh I think at this point someone should just do that. Ideally with some boundary like "please don't use dma-fence or dma-buf and stuff like that", so drivers/gpu doesn't ever have to deal with the fallout. But way too many people think that somehow you magically get the other 90% of an open accel stack if you're just friendly enough and merge the kernel driver, so we really should just run that experiment in upstream and watch how it pans out in reality. At minimum it would be some great entertainment :-) Also on your claim that drivers/gpu is a non-upstream disaster: I've also learned that for drivers/rdma there's the upstream driver, and then there's the out-of-tree hackjob the vendor actually supports. So it seems to be about the same level of screwed up: if you ask the vendor, they tell you the upstream driver isn't a thing they care about and it's just done for a bit of goodwill. Except if you have enormous amounts of volume, then suddenly it's an option ... Minus the fw issue for nvidia, upstream does support all the gpus you can buy right now and that can run on linux with some vendor driver (aka excluding apple M1 and ofc upcoming products from most vendors). drivers/accel otoh is mostly out-of-tree, because aside from Greg merging habanalabs no one is bold enough anymore to just merge them all. There are lots of those going around that would be ready for picking.
And they've been continuously submitted to upstream over the years, even before the entire habanalabs thing. -Daniel
On Tue, Jul 06, 2021 at 05:49:01PM +0200, Daniel Vetter wrote: > The other thing to keep in mind is that one of these drivers supports > 25 years of product generations, and the other one doesn't. Sure, but that is the point, isn't it? To have an actually useful thing you need all of this mess > > My argument is that an in-tree open kernel driver is a big help to > > reverse engineering an open userspace. Having the vendors > > collaboration to build that monstrous thing can only help the end goal > > of an end to end open stack. > > Not sure where this got lost, but we're totally fine with vendors > using the upstream driver together with their closed stack. And most > of the drivers we do have in upstream are actually, at least in parts, > supported by the vendor. E.g. if you'd have looked the drm/arm driver > you picked is actually 100% written by ARM engineers. So kinda > unfitting example. So the argument with Habana really boils down to how much do they need to show in the open source space to get a kernel driver? You want to see the ISA or compiler at least? That at least doesn't seem "extreme" to me. > > For instance a vendor with an in-tree driver has a strong incentive to > > sort out their FW licensing issues so it can be redistributed. > > Nvidia has been claiming to try and sort out the FW problem for years. > They even managed to release a few things, but I think the last one is > 2-3 years late now. Partially the reason is that there don't have a > stable api between the firmware and driver, it's all internal from the > same source tree, and they don't really want to change that. Right, companies have no incentive to work in a sane way if they have their own parallel world. I think drawing them part by part into the standard open workflows and expectations is actually helpful to everyone. 
> > > I don't think the facts on the ground support your claim here, aside > > > from the practical problem that nvidia is unwilling to even create an > > > open driver to begin with. So there isn't anything to merge. > > > > The internet tells me there is nvgpu, it doesn't seem to have helped. > > Not sure which one you mean, but every once in a while they open up a > few headers, or a few programming specs, or a small driver somewhere > for a very specific thing, and then it dies again or gets obfuscated > for the next platform, or just never updated. I've never seen anything > that comes remotely to something complete, aside from tegra socs, > which are fully supported in upstream afaik. I understand nvgpu is the tegra driver that people actually use. nouveau may have good tegra support, but is it used in any actual commercial product? Jason
On Tue, Jul 06, 2021 at 06:07:17PM +0200, Daniel Vetter wrote: > Also on your claim that drivers/gpu is a non-upstream disaster: I've > also learned that for drivers/rdma there's the upstream driver, > and then there's the out-of-tree hackjob the vendor actually > supports. In the enterprise world everyone has their out of tree backport drivers. It varies by vendor how much deviation there is from the upstream driver and what commercial support relationship the vendor has with the enterprise distros. > So seems to be about the same level of screwed up, if you ask the > vendor they tell you the upstream driver isn't a thing they care about > and it's just done for a bit of goodwill. Sounds like you should get a new RDMA supplier :) To be fair, Intel is getting better: they got their new RDMA HW support merged into v5.14 after about 2 years in the out of tree world. Though it is still incomplete compared to their out of tree driver, the gap is much smaller now. > amounts of volume, then suddenly it's an option ... Minus the fw issue > for nvidia, upstream does support all the gpus you can buy right now > and that can run on linux with some vendor driver (aka excluding apple > M1 and ofc upcoming products from most vendors). I would look at how many actual commercial systems are running the upstream/inbox stack. I personally know of quite a few sites with big HPC RDMA deployments running pure inbox kernels, no add-on kernel modules, with full commercial support. If you can say that kind of arrangement is also commonplace in the GPU world then I will happily be wrong. Jason
On Tue, Jul 06, 2021 at 02:28:28PM -0300, Jason Gunthorpe wrote: > > Also on your claim that drivers/gpu is a non-upstream disaster: I've > > also learned that for drivers/rdma there's the upstream driver, > > and then there's the out-of-tree hackjob the vendor actually > > supports. > > In the enterprise world everyone has their out of tree backport > drivers. It varies by vendor how much deviation there is from the > upstream driver and what commercial support relationship the vendor > has with the enterprise distros. I think he means the Mellanox OFED stack, which is a complete and utter mess and which gets force fed by Mellanox/Nvidia on unsuspecting customers. I know many big HPC sites that ignore it, but a lot of enterprise customers are dumb enough to deploy it.
On Tue, Jul 6, 2021 at 6:29 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Tue, Jul 06, 2021 at 05:49:01PM +0200, Daniel Vetter wrote: > > > The other thing to keep in mind is that one of these drivers supports > > 25 years of product generations, and the other one doesn't. > > Sure, but that is the point, isn't it? To have an actually useful > thing you need all of this mess > > > > My argument is that an in-tree open kernel driver is a big help to > > > reverse engineering an open userspace. Having the vendors > > > collaboration to build that monstrous thing can only help the end goal > > > of an end to end open stack. > > > > Not sure where this got lost, but we're totally fine with vendors > > using the upstream driver together with their closed stack. And most > > of the drivers we do have in upstream are actually, at least in parts, > > supported by the vendor. E.g. if you'd have looked the drm/arm driver > > you picked is actually 100% written by ARM engineers. So kinda > > unfitting example. > > So the argument with Habana really boils down to how much do they need > to show in the open source space to get a kernel driver? You want to > see the ISA or compiler at least? Yup. We dont care about any of the fancy pieces you build on top, nor does the compiler need to be the optimizing one. Just something that's good enough to drive the hw in some demons to see how it works and all that. Generally that's also not that hard to reverse engineer, if someone is bored enough, the real fancy stuff tends to be in how you optimize the generated code. And make it fit into the higher levels properly. > That at least doesn't seem "extreme" to me. > > > > For instance a vendor with an in-tree driver has a strong incentive to > > > sort out their FW licensing issues so it can be redistributed. > > > > Nvidia has been claiming to try and sort out the FW problem for years. > > They even managed to release a few things, but I think the last one is > > 2-3 years late now. 
Partially the reason is that there don't have a > > stable api between the firmware and driver, it's all internal from the > > same source tree, and they don't really want to change that. > > Right, companies have no incentive to work in a sane way if they have > their own parallel world. I think drawing them part by part into the > standard open workflows and expectations is actually helpful to > everyone. Well we do try to get them on board part-by-part generally starting with the kernel and ending with a proper compiler instead of the usual llvm hack job, but for whatever reasons they really like their in-house stuff, see below for what I mean. > > > > I don't think the facts on the ground support your claim here, aside > > > > from the practical problem that nvidia is unwilling to even create an > > > > open driver to begin with. So there isn't anything to merge. > > > > > > The internet tells me there is nvgpu, it doesn't seem to have helped. > > > > Not sure which one you mean, but every once in a while they open up a > > few headers, or a few programming specs, or a small driver somewhere > > for a very specific thing, and then it dies again or gets obfuscated > > for the next platform, or just never updated. I've never seen anything > > that comes remotely to something complete, aside from tegra socs, > > which are fully supported in upstream afaik. > > I understand nvgpu is the tegra driver that people actualy > use. nouveau may have good tegra support but is it used in any actual > commercial product? I think it was almost the case. Afaik they still have their internal userspace stack working on top of nvidia, at least last year someone fixed up a bunch of issues in the tegra+nouveau combo to enable format modifiers properly across the board. But also nvidia is never going to sell you that as the officially supported thing, unless your ask comes back with enormous amounts of sold hardware. And it's not just nvidia, it's pretty much everyone. 
Like a soc company I don't want to know started collaborating with upstream and the reverse-engineered mesa team on a kernel driver, and it seems to work pretty well for current hardware. But for the next generation they decided it's going to be only their in-house tree again, which completely ignores drivers/gpu/drm and also tosses all the foundational work they helped build on the userspace side. And this is consistent across all companies: over the last 20 years I know of (often non-public) stories from every single company where they decided that all the time invested into community/upstream collaboration isn't useful anymore, and they go all vendor solo for the next one. Luckily you don't hear about most of those anymore; all it results in is the upstream driver being 1-2 years late or so. But even the good ones where we collaborate well can't seem to help themselves and want to throw it all away every few years. -Daniel
On Tue, Jul 06, 2021 at 07:31:37PM +0200, Christoph Hellwig wrote: > On Tue, Jul 06, 2021 at 02:28:28PM -0300, Jason Gunthorpe wrote: > > > Also on your claim that drivers/gpu is a non-upstream disaster: I've > > > also learned that for drivers/rdma there's the upstream driver, > > > and then there's the out-of-tree hackjob the vendor actually > > > supports. > > > > In the enterprise world everyone has their out of tree backport > > drivers. It varies by vendor how much deviation there is from the > > upstream driver and what commercial support relationship the vendor > > has with the enterprise distros. > > I think he means the Mellanox OFED stack, which is a complete and utter > mess and which gets force fed by Mellanox/Nvidia on unsuspecting > customers. I know many big HPC sites that ignore it, but a lot of > enterprise customers are dumb enough to deploy it. No, I don't think so. While MOFED is indeed a giant mess, the mlx5 upstream driver is not some token effort to generate goodwill, and Mellanox certainly does provide full commercial support for the mlx5 drivers shipped inside various enterprise distros. MOFED also doesn't have a big functional divergence from RDMA upstream, and it is not mandatory just to use the hardware. I cannot say the same about other companies' RDMA driver distributions; Daniel's description of "minimal effort to get goodwill" would match others much better. You are right that there are a lot of enterprise customers who deploy the MOFED. I can't agree with their choices, but they are not forced into using it anymore. Jason
I should stop typing and prep dinner, but I found some too hilarious typos below. On Tue, Jul 6, 2021 at 7:35 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > On Tue, Jul 6, 2021 at 6:29 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Tue, Jul 06, 2021 at 05:49:01PM +0200, Daniel Vetter wrote: > > > > > The other thing to keep in mind is that one of these drivers supports > > > 25 years of product generations, and the other one doesn't. > > > > Sure, but that is the point, isn't it? To have an actually useful > > thing you need all of this mess > > > > > > My argument is that an in-tree open kernel driver is a big help to > > > > reverse engineering an open userspace. Having the vendors > > > > collaboration to build that monstrous thing can only help the end goal > > > > of an end to end open stack. > > > > > > Not sure where this got lost, but we're totally fine with vendors > > > using the upstream driver together with their closed stack. And most > > > of the drivers we do have in upstream are actually, at least in parts, > > > supported by the vendor. E.g. if you'd have looked the drm/arm driver > > > you picked is actually 100% written by ARM engineers. So kinda > > > unfitting example. > > > > So the argument with Habana really boils down to how much do they need > > to show in the open source space to get a kernel driver? You want to > > see the ISA or compiler at least? > > Yup. We dont care about any of the fancy pieces you build on top, nor > does the compiler need to be the optimizing one. Just something that's > good enough to drive the hw in some demons to see how it works and all s/demons/demos/ but hw tends to be funky enough that either fits :-) > that. Generally that's also not that hard to reverse engineer, if > someone is bored enough, the real fancy stuff tends to be in how you > optimize the generated code. And make it fit into the higher levels > properly. > > > That at least doesn't seem "extreme" to me. 
> > > > > > For instance a vendor with an in-tree driver has a strong incentive to > > > > sort out their FW licensing issues so it can be redistributed. > > > > > > Nvidia has been claiming to try and sort out the FW problem for years. > > > They even managed to release a few things, but I think the last one is > > > 2-3 years late now. Partially the reason is that there don't have a > > > stable api between the firmware and driver, it's all internal from the > > > same source tree, and they don't really want to change that. > > > > Right, companies have no incentive to work in a sane way if they have > > their own parallel world. I think drawing them part by part into the > > standard open workflows and expectations is actually helpful to > > everyone. > > Well we do try to get them on board part-by-part generally starting > with the kernel and ending with a proper compiler instead of the usual > llvm hack job, but for whatever reasons they really like their > in-house stuff, see below for what I mean. > > > > > > I don't think the facts on the ground support your claim here, aside > > > > > from the practical problem that nvidia is unwilling to even create an > > > > > open driver to begin with. So there isn't anything to merge. > > > > > > > > The internet tells me there is nvgpu, it doesn't seem to have helped. > > > > > > Not sure which one you mean, but every once in a while they open up a > > > few headers, or a few programming specs, or a small driver somewhere > > > for a very specific thing, and then it dies again or gets obfuscated > > > for the next platform, or just never updated. I've never seen anything > > > that comes remotely to something complete, aside from tegra socs, > > > which are fully supported in upstream afaik. > > > > I understand nvgpu is the tegra driver that people actualy > > use. nouveau may have good tegra support but is it used in any actual > > commercial product? > > I think it was almost the case. 
Afaik they still have their internal > userspace stack working on top of nvidia, at least last year someone > fixed up a bunch of issues in the tegra+nouveau combo to enable format > modifiers properly across the board. But also nvidia is never going to > sell you that as the officially supported thing, unless your ask comes > back with enormous amounts of sold hardware. > > And it's not just nvidia, it's pretty much everyone. Like a soc > company I don't want to know started collaborating with upstream and s/know/name/ I do know them unfortunately quite well ... Cheers, Daniel > the reverse-engineered mesa team on a kernel driver, seems to work > pretty well for current hardware. But for the next generation they > decided it's going to be again only their in-house tree that > completele ignores drivers/gpu/drm, and also tosses all the > foundational work they helped build on the userspace side. And this is > consistent across all companies, over the last 20 years I know of > (often non-public) stories across every single company where they > decided that all the time invested into community/upstream > collaboration isn't useful anymore, we go all vendor solo for the next > one. > > Most of those you luckily don't hear about anymore, all it results in > the upstream driver being 1-2 years late or so. But even the good ones > where we collaborate well can't seem to help themselves and want to > throw it all away every few years. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On Tue, Jul 06, 2021 at 07:35:55PM +0200, Daniel Vetter wrote: > Yup. We dont care about any of the fancy pieces you build on top, nor > does the compiler need to be the optimizing one. Just something that's > good enough to drive the hw in some demons to see how it works and all > that. Generally that's also not that hard to reverse engineer, if > someone is bored enough, the real fancy stuff tends to be in how you > optimize the generated code. And make it fit into the higher levels > properly. Seems reasonable to me. > And it's not just nvidia, it's pretty much everyone. Like a soc > company I don't want to know started collaborating with upstream and > the reverse-engineered mesa team on a kernel driver, seems to work > pretty well for current hardware. What I've seen is that this only works with customer demand. Companies need to hear from their customers that upstream is what is needed, and companies cannot properly hear that until they are at least already partially invested in the upstream process and have the right customers that are sophisticated enough to care. Embedded makes everything 10x worse because too many customers just don't care about upstream: you can hack your way through everything and indulge in single-generation thinking. Fork the whole kernel for 3 years, EOL, no problem! It is the enterprise world, particularly with an opinionated company like RH stuck in the middle saying NO, that really seems to drive things toward upstream. Yes, vendors can work around Red Hat's No (and NVIDIA GPU is such an example), but it is incredibly time-consuming, expensive, and becoming more and more difficult every year. The big point is this: > But also nvidia is never going to sell you that as the officially > supported thing, unless your ask comes back with enormous amounts of > sold hardware. I think this is at the core of Linux's success in the enterprise world: big customers who care demanding open source.
Any vendor, even nvidia, will want to meet customer demands. IMHO upstream success is found by motivating the customer to demand it and making it "easy" for the vendor to supply it. Jason
On Tue, Jul 6, 2021 at 8:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > On Tue, Jul 06, 2021 at 07:35:55PM +0200, Daniel Vetter wrote: > > Yup. We dont care about any of the fancy pieces you build on top, nor > > does the compiler need to be the optimizing one. Just something that's > > good enough to drive the hw in some demons to see how it works and all > > that. Generally that's also not that hard to reverse engineer, if > > someone is bored enough, the real fancy stuff tends to be in how you > > optimize the generated code. And make it fit into the higher levels > > properly. > > Seems reasonable to me > > > And it's not just nvidia, it's pretty much everyone. Like a soc > > company I don't want to know started collaborating with upstream and > > the reverse-engineered mesa team on a kernel driver, seems to work > > pretty well for current hardware. > > What I've seen is that this only works with customer demand. Companies > need to hear from their customers that upstream is what is needed, and > companies cannot properly hear that until they are at least already > partially invested in the upstream process and have the right > customers that are sophisticated enough to care. > > Embedded makes everything 10x worse because too many customers just > don't care about upstream, you can hack your way through everything, > and indulge in single generation thinking. Fork the whole kernel for 3 > years, EOL, no problem! It's not entirely hopeless in embedded either. Sure, there's the giant pile of sell&forget abandonware, but there are lots of embedded things where multi-year to multi-decade support is required. And an upstream gfx stack beats anything the vendor has to offer on that, easily. And on the server side it's actually pretty hard to convince customers of the upstream driver benefits, because they don't want to or can't abandon nvidia and have just learned to accept the pain.
They either build a few abstraction layers on top (and demand the vendor support those), or they flat out demand you support the nvidia proprietary interfaces. And AMD has been trying to move the needle here for years, with not that much success. > It is the enterprise world, particularly with an opinionated company > like RH saying NO stuck in the middle that really seems to drive > things toward upstream. > > Yes, vendors can work around Red Hat's No (and NVIDIA GPU is such an > example) but it is incredibly time consuming, expensive and becoming > more and more difficult every year. > > The big point is this: > > > But also nvidia is never going to sell you that as the officially > > supported thing, unless your ask comes back with enormous amounts of > > sold hardware. > > I think this is at the core of Linux's success in the enterprise > world. Big customers who care demanding open source. Any vendor, even > nvidia will want to meet customer demands. > > IHMO upstream success is found by motivating the customer to demand > and make it "easy" for the vendor to supply it. Yup, exactly the same situation here. The problem seems to be that gpu vendor stubbornness is even higher than established customer demand, or they just don't care, and so in the last few years that customer demand has resulted in payments to consulting shops and hiring of engineers to reverse-engineer a full driver, instead of customer and vendor splitting the difference and the vendor upstreaming their stack. And that's for companies who've done it in the past, or at least collaborated on parts like the kernel driver, so I really have no clue why they don't just continue. We have well-established customers who do want it all open and upstream, across kernel and userspace pieces. And it looks like it's going to repeat itself a few more times, unfortunately. I'm not sure when exactly the lesson will sink in.
Maybe I missed some, but looking at current render/compute drivers I think (but not even sure on that) only drm/lima is a hobbyist project and perhaps you want to include drm/nouveau as not paid by customers and more something redhat does out of principle. All the others are paid for by customers, with vendor involvement ranging from "just helping out with the kernel driver" to "pays for pretty much all of the development". And still apparently that's not enough demand for an upstream driver stack. -Daniel
On Tue, Jul 6, 2021 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Tue, Jul 06, 2021 at 07:35:55PM +0200, Daniel Vetter wrote: > > > Yup. We dont care about any of the fancy pieces you build on top, nor > > does the compiler need to be the optimizing one. Just something that's > > good enough to drive the hw in some demons to see how it works and all > > that. Generally that's also not that hard to reverse engineer, if > > someone is bored enough, the real fancy stuff tends to be in how you > > optimize the generated code. And make it fit into the higher levels > > properly. > > Seems reasonable to me > > > And it's not just nvidia, it's pretty much everyone. Like a soc > > company I don't want to know started collaborating with upstream and > > the reverse-engineered mesa team on a kernel driver, seems to work > > pretty well for current hardware. > > What I've seen is that this only works with customer demand. Companies > need to hear from their customers that upstream is what is needed, and > companies cannot properly hear that until they are at least already > partially invested in the upstream process and have the right > customers that are sophisticated enough to care. > > Embedded makes everything 10x worse because too many customers just > don't care about upstream, you can hack your way through everything, > and indulge in single generation thinking. Fork the whole kernel for 3 > years, EOL, no problem! > > It is the enterprise world, particularly with an opinionated company > like RH saying NO stuck in the middle that really seems to drive > things toward upstream. > > Yes, vendors can work around Red Hat's No (and NVIDIA GPU is such an > example) but it is incredibly time consuming, expensive and becoming > more and more difficult every year. > > The big point is this: > > > But also nvidia is never going to sell you that as the officially > > supported thing, unless your ask comes back with enormous amounts of > > sold hardware. 
> > I think this is at the core of Linux's success in the enterprise > world. Big customers who care demanding open source. Any vendor, even > nvidia will want to meet customer demands. > > IHMO upstream success is found by motivating the customer to demand > and make it "easy" for the vendor to supply it. I think this is one of the last big challenges on Linux. It's REALLY hard to align new products with Linux kernel releases and distro kernels. Hardware cycles are too short and drivers (at least for GPUs) are too big to really fit well with the current Linux release model. In many cases enterprise distros have locked down on a kernel version around the same time we are doing new chip bring up. You are almost always off by one when it comes to kernel version alignment. Even if you can get the initial code upstream in the right kernel version, it tends to be aligned to such early silicon that you end up needing a pile of additional patches to make production cards work. Those changes are often deemed "too big" for stable kernel fixes. The only real way to deal with that effectively is with vendor provided packaged drivers using something like dkms to cover launch. Thus you need to do your bring up against latest upstream and then backport, or do your bring up against some older kernel and forward port for upstream. You end up doing everything twice. Things get better with sustaining support in subsequent distro releases, but it doesn't help at product launch. I don't know what the right solution for this is. Alex
On 06.07.21 at 14:23, Daniel Vetter wrote: > On Tue, Jul 06, 2021 at 02:21:10PM +0200, Christoph Hellwig wrote: >> On Tue, Jul 06, 2021 at 10:40:37AM +0200, Daniel Vetter wrote: >>>> Greg, I hope this will be good enough for you to merge this code. >>> So we're officially going to use dri-devel for technical details review >>> and then Greg for merging so we don't have to deal with other merge >>> criteria dri-devel folks have? >>> >>> I don't expect anything less by now, but it does make the original claim >>> that drivers/misc will not step all over accelerators folks a complete >>> farce under the totally-not-a-gpu banner. >>> >>> This essentially means that for any other accelerator stack that doesn't >>> fit the dri-devel merge criteria, even if it's acting like a gpu and uses >>> other gpu driver stuff, you can just send it to Greg and it's good to go. >>> >>> There's quite a lot of these floating around actually (and many do have >>> semi-open runtimes, like habanalabs have now too, just not open enough to >>> be actually useful). It's going to be absolutely lovely having to explain >>> to these companies in background chats why habanalabs gets away with their >>> stack and they don't. >> FYI, I fully agree with Daniel here. Habanlabs needs to open up their >> runtime if they want to push any additional feature in the kernel. >> The current situation is not sustainable. > Before anyone replies: The runtime is open, the compiler is still closed. > This has become the new default for accel driver submissions, I think > mostly because all the interesting bits for non-3d accelerators are in the > accel ISA, and no longer in the runtime. So vendors are fairly happy to > throw in the runtime as a freebie. Well, a compiler and runtime make things easier, but the real question is whether they are really required for upstreaming a kernel driver. I mean, what we need is to be able to exercise the functionality. So wouldn't (for example) an assembler be sufficient?
> It's still incomplete, and it's still useless if you want to actually hack > on the driver stack. Yeah, when you want to hack on it in the sense of extending it, then this requirement is certainly true. But as far as I can see, userspace doesn't need to be extendable to justify a kernel driver. It just needs to have enough glue to thoroughly exercise the relevant kernel interfaces. Applying that to GPUs, I think what you need to be able to do is write shaders, but that doesn't need to be in a higher-level language requiring a compiler and runtime. Released opcodes and a low-level assembler should be sufficient. Regards, Christian. > -Daniel
On Wed, Jul 7, 2021 at 2:17 PM Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > On 06.07.21 at 14:23, Daniel Vetter wrote: > > On Tue, Jul 06, 2021 at 02:21:10PM +0200, Christoph Hellwig wrote: > >> On Tue, Jul 06, 2021 at 10:40:37AM +0200, Daniel Vetter wrote: > >>>> Greg, I hope this will be good enough for you to merge this code. > >>> So we're officially going to use dri-devel for technical details review > >>> and then Greg for merging so we don't have to deal with other merge > >>> criteria dri-devel folks have? > >>> > >>> I don't expect anything less by now, but it does make the original claim > >>> that drivers/misc will not step all over accelerators folks a complete > >>> farce under the totally-not-a-gpu banner. > >>> > >>> This essentially means that for any other accelerator stack that doesn't > >>> fit the dri-devel merge criteria, even if it's acting like a gpu and uses > >>> other gpu driver stuff, you can just send it to Greg and it's good to go. > >>> > >>> There's quite a lot of these floating around actually (and many do have > >>> semi-open runtimes, like habanalabs have now too, just not open enough to > >>> be actually useful). It's going to be absolutely lovely having to explain > >>> to these companies in background chats why habanalabs gets away with their > >>> stack and they don't. > >> FYI, I fully agree with Daniel here. Habanlabs needs to open up their > >> runtime if they want to push any additional feature in the kernel. > >> The current situation is not sustainable. > > Before anyone replies: The runtime is open, the compiler is still closed. > > This has become the new default for accel driver submissions, I think > > mostly because all the interesting bits for non-3d accelerators are in the > > accel ISA, and no longer in the runtime. So vendors are fairly happy to > > throw in the runtime as a freebie.
> > Well a compiler and runtime makes things easier, but the real question > is if they are really required for upstreaming a kernel driver? > > I mean what we need is to be able to exercise the functionality. So > wouldn't (for example) an assembler be sufficient? So no one has tried this yet, but I think an assembler, or maybe even just the full PRM for the ISA, is also good enough. I guess in practice everyone just comes with the compiler for a few reasons: - AMD and Intel are great and release full PRMs for the gpu, but preparing those takes a lot of time. Often that's done as part of bring-up, to make sure everything is annotated properly, so that all the necessary bits are included, but none of the future stuff or silicon bring-up pieces. So in reality you have the compiler before you have the ISA docs. - reverse-engineered drivers also tend to have demo compilers before anything like full ISA docs show up :-) But the docs tooling they have is also great. - then there's the case of developing a driver with NDA'd docs. Again, you'll have a compiler as the only real output; there's not going to be any docs or anything like that. > > It's still incomplete, and it's still useless if you want to actually hack > > on the driver stack. > > Yeah, when you want to hack on it in the sense of extending it then this > requirement is certainly true. > > But as far as I can see userspace don't need to be extendable to justify > a kernel driver. It just needs to have enough glue to thoughtfully > exercise the relevant kernel interfaces. > > Applying that to GPUs I think what you need to be able to is to write > shaders, but that doesn't need to be in a higher language requiring a > compiler and runtime. Released opcodes and a low level assembler should > be sufficient. Yeah, I think in theory ISA docs + an assembler testcase or whatever is perfectly fine.
In reality anyone who cares enough to do this properly gets to the demo-quality compiler stage first, and so that's what we take for merging a new stack. I do disagree that we're only ever asking for this and not more, e.g. if you come with a new 3d accelerator and it's not coming with a userspace driver as a mesa MR, you have to do some very serious explaining about wtf you're doing: mesa3d won, pretty much across the board, as a common project for both vulkan and opengl, and the justifications for reinventing wheels had better be really good here. Also, by the time you've written enough scaffolding to show it integrates in non-stupid ways into mesa, you practically have a demo-quality driver stack anyway. Similarly on the display side of things, over the past year the consensus merge criteria have gone up quite a bit, e.g. there's a patch floating around to make that clearer: https://lore.kernel.org/dri-devel/20210706161244.1038592-1-maxime@cerno.tech/ Of course this doesn't include anything grandfathered in (*cough* amdvlk *cough*), and also outside of 3d there's clearly no cross-vendor project that's established enough: media, compute and AI/NN stuff are all very badly fragmented. That's maybe lamentable, but like you said, not really a reason to reject a kernel driver. -Daniel
On 7/6/21 1:59 PM, Jason Gunthorpe wrote: > I can not say the same about other company's RDMA driver > distributions, Daniel's description of "minimal effort to get > goodwill" would match others much better. Not sure what other RDMA driver you are talking about but as for Cornelis Networks, we do have a packaged up version of our software. However it is meant to make things easier on end users to bridge the gap between the distro kernel drivers and the upstream kernel. It's definitely not a requirement and plenty of folks do use distro kernels/drivers. I'm not sure how many large sites are using something straight off kernel.org but the upstream hfi1 driver is 100% the real deal. We continually develop on and test the upstream kernel. Our goal is always to upstream patches first. We learned that lesson the hard way when we first tried to upstream hfi1. -Denny