Message ID | 0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com |
---|---|
Series | Introduce fwctl subystem |
On Mon, 3 Jun 2024 12:53:16 -0300 Jason Gunthorpe wrote: > fwctl is a new subsystem intended to bring some common rules and order to > the growing pattern of exposing a secure FW interface directly to > userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are > exposing a device for datapath operations fwctl is focused on debugging, > configuration and provisioning of the device. It will not have the > necessary features like interrupt delivery to support a datapath. If you have debug problems in your subsystem, put the APIs in your subsystem. Don't force your choices on all the subsystems your device interacts with: Nacked-by: Jakub Kicinski <kuba@kernel.org> Somewhat related, I saw nVidia sells various interesting features in its DOCA stack. Is that Open Source?
On 6/3/24 12:42 PM, Jakub Kicinski wrote: > Somewhat related, I saw nVidia sells various interesting features in its > DOCA stack. Is that Open Source? Seriously, Jakub, how is that in any way related to this patch set? You are basically suggesting that if any vendor ever has an out of tree option for its hardware every patch it sends should be considered a ruse to enable or simplify proprietary options.
On Mon, 3 Jun 2024 21:01:58 -0600 David Ahern wrote: > On 6/3/24 12:42 PM, Jakub Kicinski wrote: > > Somewhat related, I saw nVidia sells various interesting features in its > > DOCA stack. Is that Open Source? > > Seriously, Jakub, how is that in any way related to this patch set? Whether they admit it or not, DOCA is a major reason nVidia wants this to be standalone rather than part of RDMA. > You are basically suggesting that if any vendor ever has an out of tree > option for its hardware every patch it sends should be considered a ruse > to enable or simplify proprietary options. Ooo, is that a sore spot? I don't begrudge anyone building proprietary options, but leave upstream out of it.
On 04 Jun 07:04, Jakub Kicinski wrote: >On Mon, 3 Jun 2024 21:01:58 -0600 David Ahern wrote: >> On 6/3/24 12:42 PM, Jakub Kicinski wrote: >> > Somewhat related, I saw nVidia sells various interesting features in its >> > DOCA stack. Is that Open Source? >> >> Seriously, Jakub, how is that in any way related to this patch set? > >Whether they admit it or not, DOCA is a major reason nVidia wants >this to be standalone rather than part of RDMA. > No, DOCA isn't on the agenda for this new interface. But what is the point in arguing? Apparently the vendor is not credible enough in your opinion. Which is an absolute outrageous grounds for a NAK. Anyway I don't see your point in bringing up DOCA here, but obviously once this interface is accepted, all developers are welcome to use it, including DOCA developers of course.. That being said, the why we need this is crystal clear in the cover-letter and previous submission discussions, bringing random SDKs into this discussion is not objective and counter productive to the technical discussion. >> You are basically suggesting that if any vendor ever has an out of tree >> option for its hardware every patch it sends should be considered a ruse >> to enable or simplify proprietary options. > It's apparent that you're attributing sinister agendas to patchsets when you fail to offer valid technical opinions regarding the NAK nature. Let's address this outside of this patchset, as this isn't the first occurrence. Consistency in evaluating patches is crucial; some, like the fbnic and idpf, seem to go unquestioned, while others face scrutiny. >Ooo, is that a sore spot? > >I don't begrudge anyone building proprietary options, but leave >upstream out of it. >
On Tue, 4 Jun 2024 14:28:05 -0700 Saeed Mahameed wrote: > On 04 Jun 07:04, Jakub Kicinski wrote: > >On Mon, 3 Jun 2024 21:01:58 -0600 David Ahern wrote: > >> Seriously, Jakub, how is that in any way related to this patch set? > > > >Whether they admit it or not, DOCA is a major reason nVidia wants > >this to be standalone rather than part of RDMA. > > No, DOCA isn't on the agenda for this new interface. But what is the point > in arguing? I'm not arguing any point, we argued enough. But you failed to disclose that DOCA is very likely user of this interface. So whoever you're planning to submit it to should know. DOCA was top of mind for me because I noticed it has PSP support, and I wanted to take a look at the implementation. > Apparently the vendor is not credible enough in your opinion. You're creating an interface where you depend on a pinky promise from a black box that the RPC is not a write. I trust you personally not to write a patch which abuses this interface. But this cannot possibly extend to all developers, most of who just want to ship features. > Which is an absolute outrageous grounds for a NAK. > > Anyway I don't see your point in bringing up DOCA here, but obviously once > this interface is accepted, all developers are welcome to use it, > including DOCA developers of course.. Of course. > That being said, the why we need this is crystal clear in the > cover-letter and previous submission discussions, bringing random SDKs > into this discussion is not objective and counter productive to the > technical discussion. > > >> You are basically suggesting that if any vendor ever has an out of tree > >> option for its hardware every patch it sends should be considered a ruse > >> to enable or simplify proprietary options. > > It's apparent that you're attributing sinister agendas to patchsets when > you fail to offer valid technical opinions regarding the NAK nature. Let's > address this outside of this patchset, as this isn't the first occurrence. 
> Consistency in evaluating patches is crucial; Exactly :| Netdev people, including multiple prominent developers from Mellanox/nVidia have been nacking SDK interfaces in Linux networking for 20 years. How are we going to look to all the companies which have been doing IPUs for over a decade if we change the rules for nVidia? > some, like the fbnic and idpf, seem to go unquestioned, while others > face scrutiny. fbnic got a nack for any core changes or uAPI not used by other drivers. idpf got a nack for pretending to be a standard. You keep saying that I'm nacking your interface because I have some hatred and distrust for you or nVidia. I really, really don't. Any vendor posting this would get exactly the same nack from me. If by "let's address this outside of this patchset" you mean that we should have a discussion about maintainer favoritism, and subsystem capture by vendors - you have my full support!
Jakub Kicinski wrote: [..] > I don't begrudge anyone building proprietary options, but leave > upstream out of it. So I am of 2 minds here. In general, how is upstream benefited by requiring every vendor command to be wrapped by a Linux command? Mind you, I am coming at this from the perspective of being a maintainer of a subsystem that does *not* allow unrestricted vendor commands. Since day one, the CXL subsystem has matched netdev's general sentiment and been more restrictive than NVMe. It places all vendor commands and even all yet-to-be-Linux-wrapped-standard-commands behind a CONFIG_CXL_MEM_RAW_COMMANDS option. That default-off option, when enabled, allows any command to be sent but it taints the kernel with a WARN(). CXL devices theoretically allow direct manipulation of system memory without IOMMU protection which is in contrast to NVMe which would need to work harder to violate kernel-lockdown protections. The expectation that I laid out here [1] is based on the observation that a significant portion of the vendor commands these devices support are for pre-release hardware qualification and debug flows. The recommendation to device vendors was "if you need wide distribution of kernels that allow unrestricted vendor passthrough, work with Linux distributions to enable this option in debug kernels, run those debug kernels for your pre-release hardware flows, ignore the warnings". 3 years on from that recommendation it seems no vendor has even needed that level of distribution help. I.e. checking a few distro kernels (Fedora, openSUSE) shows no uptake for CONFIG_CXL_MEM_RAW_COMMANDS=y in their debug builds. I can only assume that locally compiled custom kernel binaries are filling the need. 
So all seems quiet with the current restriction for CXL endpoint vendor commands, but this stance was recently challenged in this thread [2] by CXL switch vendors with an assertion that fabric switch configuration has need for more and varied vendor flows than endpoint configuration. While I am not clear on the veracity of that claim, it at least challenged me to do the thought experiment of "what would it look like to relax the CXL command restriction?". Maybe we can come up with a community answer to the "so you want to build a userspace-to-device-firmware tunnel?" question to at least get all the various concerns documented in one place, and provide guidance for how device vendors should navigate this space across subsystems. Between NVMe "allow all the things", CXL "allow all the things only after tainting the kernel", and the "never allow vendor passthrough" position (I am sure there are other nuanced positions) it at least seems useful to document the concerns. Here is a start for that guidance from the CXL perspective: * Integrity: Subsystem has a responsibility to meet kernel-lockdown expectations: Distros and system owners need to be assured that root's ability to modify the running kernel image is mitigated. For CXL there are 2 ways to do this: require Linux wrapper commands for all the low level commands (status quo), or a new option: trust the device to publish which commands have user data effects in something CXL calls the "Command Effects Log". In that "trust Command Effects" scenario the kernel still has no idea what the command is actually doing, but it can at least assert that the device does not claim that the command changes the contents of system-memory. Now, you might say, "the device can just lie", but that betrays a conceit of the kernel restriction. A device could lie that a Linux wrapped command when passed certain payloads does not in turn proxy to a restricted command.
So at some point there is almost always an out-of-tree way to get around the kernel restriction, so the question is are we better off giving a blessed path or force vendors into ugly out-of-tree workarounds? * Introspection / validation: Subsystem community needs to be able to audit behavior after the fact. To me this means even if the kernel is letting a command through based on the stated Command Effect of "Configuration Change after Cold Reset" upstream community has a need to be able to read the vendor specification for that command. I.e. commands might be vendor-specific, but never vendor-private. I see this as similar to the requirement for open source userspace for sophisticated accelerators. * Collaboration: open standards support open driver maintenance. Without standards we end up with awkward situations like Confidential Computing where every vendor races to implement the same functionality in arbitrarily different and vendor specific ways. For CXL devices, and I believe the devices fwctl is targeting, there are a whole class of commands for vendor specific configuration and debug. Commands that the kernel really need not worry about. Some subsystems may want to allow high-performance science experiments like what NVMe allows, but it seems worth asking the question if standardizing device configuration and debug is really the best use of upstream's limited time? One of the release valves in the CXL space is openly specified commands with opaque payloads, like "Read Vendor Debug Log". That is clear what it does, likely a payload the kernel need never worry about, and the "Command Effects" is empty. However, going forward there is a new class of commands called "Set/Get Feature" that allow a wide range of vendor toggles to be deployed which will need an upstream response for the driver policy to vendor-specific "Features". 
So if fwctl, or something like it, can strike a balance of enforcing integrity and introspection while encouraging collaboration on the aspects that are worth upstream collaboration, I think that is a conversation worth having. [1]: http://lore.kernel.org/r/CAPcyv4gDShAYih5iWabKg_eTHhuHm54vEAei8ZkcmHnPp3B0cw@mail.gmail.com/ [2]: http://lore.kernel.org/r/20240321174423.00007e0d@Huawei.com
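As a rough illustration of the "trust Command Effects" idea in the mail above, the sketch below models a passthrough filter that forwards a vendor command only when the effects the device itself advertises stay within an allowed set. Everything in it is hypothetical: the flag names loosely follow the Command Effects wording quoted in the thread, but the bit positions, the opcodes used in the example, and the `allow_passthrough` helper are assumptions made for illustration, not the CXL or fwctl implementation.

```python
# Illustrative effect bits; the positions are assumptions, not taken from a spec.
CONFIG_CHANGE_COLD_RESET = 1 << 0
IMMEDIATE_CONFIG_CHANGE = 1 << 1
IMMEDIATE_DATA_CHANGE = 1 << 2

# Hypothetical policy: only effects inside this mask may pass without a
# Linux wrapper command. An empty effects field (0) always fits.
ALLOWED_EFFECTS = CONFIG_CHANGE_COLD_RESET


def allow_passthrough(opcode, effects_log, allowed_mask=ALLOWED_EFFECTS):
    """Return True if the device-advertised effects for opcode fit allowed_mask.

    effects_log maps opcode -> effects bitmask as published by the device.
    Opcodes missing from the log are rejected, since the kernel then has
    nothing on which to base a decision.
    """
    advertised = effects_log.get(opcode)
    if advertised is None:
        return False
    # Reject if any advertised effect bit falls outside the allowed mask.
    return (advertised & ~allowed_mask) == 0


if __name__ == "__main__":
    # Made-up opcodes standing in for vendor commands.
    log = {0xC000: 0, 0xC001: CONFIG_CHANGE_COLD_RESET, 0xC002: IMMEDIATE_DATA_CHANGE}
    for op in (0xC000, 0xC001, 0xC002, 0xC003):
        print(hex(op), allow_passthrough(op, log))
```

As the thread notes, a check like this only constrains what the device claims about a command; it cannot detect a device that misreports its effects.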
On Tue, 4 Jun 2024 16:56:57 -0700 Dan Williams wrote: > Jakub Kicinski wrote: > [..] > > I don't begrudge anyone building proprietary options, but leave > > upstream out of it. > > So I am of 2 minds here. In general, how is upstream benefited by > requiring every vendor command to be wrapped by a Linux command? > [...] Thanks for sharing the CXL experience and your perspective. Also for trying to frame the discussion in a useful way, although I have little faith that it will help :( Fingers crossed? > * Integrity: Subsystem has a responsibility to meet kernel-lockdown > expectations: > > Distros and system owners need to be assured that root's ability to > modify the running kernel image is mitigated. For CXL there are 2 ways > to do this: require Linux wrapper commands for all the low level > commands (status quo), or a new option: trust the device to publish which > commands have user data effects in something CXL calls the "Command > Effects Log". In that "trust Command Effects" scenario the kernel still > has no idea what the command is actually doing, but it can at least > assert that the device does not claim that the command changes the > contents of system-memory. Now, you might say, "the device can just > lie", but that betrays a conceit of the kernel restriction. A device > could lie that a Linux wrapped command when passed certain payloads does > not in turn proxy to a restricted command. So at some point there is > almost always an out-of-tree way to get around the kernel restriction, > so the question is are we better off giving a blessed path or force > vendors into ugly out-of-tree workarounds? The integrity thing is a double-edged sword, so I don't have much to say here. If we take a few wrong turns we'll wrap the vendor commands with crypto and then the vendor can control which commands you get to run ;) Obviously I'm joking, and not saying that is the intent of the current series!
But it's about as realistic as "this will only be used for truly vendor specific things". > * Introspection / validation: Subsystem community needs to be able to > audit behavior after the fact. > > To me this means even if the kernel is letting a command through based > on the stated Command Effect of "Configuration Change after Cold Reset" > upstream community has a need to be able to read the vendor > specification for that command. I.e. commands might be vendor-specific, > but never vendor-private. I see this as similar to the requirement for > open source userspace for sophisticated accelerators. That sounds pretty CXL specific, and IIUC unrealistic. You assume you have some specification to consult, while this discussion has been going on for over a year now, and I can't get the vendors to share what those tunables they so desperately need to tweak are. > * Collaboration: open standards support open driver maintenance. > > Without standards we end up with awkward situations like Confidential > Computing where every vendor races to implement the same functionality > in arbitrarily different and vendor specific ways. > > For CXL devices, and I believe the devices fwctl is targeting, there > are a whole class of commands for vendor specific configuration and > debug. Commands that the kernel really need not worry about. > > Some subsystems may want to allow high-performance science experiments > like what NVMe allows, but it seems worth asking the question if > standardizing device configuration and debug is really the best use of > upstream's limited time? No, but it's not about science experiments, really. It's about production features. The effort of implementing something properly upstream is high. It costs time and money to get the right caliber of people and let them go thru the revisions. I lack confidence that merging fwctl will not negatively impact motivation for companies to pay off our accrued technical debt.
While all they need is "this simple little feature". And before competition wins the customer. It's a race to the bottom. > One of the release valves in the CXL space is openly specified > commands with opaque payloads, like "Read Vendor Debug Log". That is > clear what it does, likely a payload the kernel need never worry > about, and the "Command Effects" is empty. However, going forward there > is a new class of commands called "Set/Get Feature" that allow a wide > range of vendor toggles to be deployed which will need an upstream > response for the driver policy to vendor-specific "Features". > > So if fwctl, or something like it, can strike a balance of enforcing > integrity and introspection while encouraging collaboration on the > aspects that are worth upstream collaboration, I think that is a > conversation worth having. I presume you were trying to underscore that the decision is unavoidably a trade off, which is true. But I don't follow the exact formulation. Is fwctl helping integrity or collaboration? If we assume use of vendor tools is unavoidable, then I guess integrity? I really can't see how it helps collaboration when everyone ships their custom tool set. Back to the tradeoff. For networking, which is a _very_ mature subsystem with a ton of standards the need to do "vendor specific things" is marginal. The downside of the loss of an "upstream advantage" is obvious. We need to take such decisions on subsystem by subsystem basis. You should be able to draw the lines differently for CXL than how we draw them for TCP/IP. On the technical level the discussion can't go very far, because I'd like to hear actual user problems. But I can't even get a list of those infamous thousands of knobs :|
On Mon, 3 Jun 2024 12:53:16 -0300 Jason Gunthorpe wrote:
> Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
Please double check with Broadcom if they are still supportive,
in the current form.
Please include lore links to previous postings.
Please carry my nack on future version. At least as long as
the write access checks are.. good-faith-based.
> One of the release valves in the CXL space is openly specified > commands with opaque payloads, like "Read Vendor Debug Log". That is > clear what it does, likely a payload the kernel need never worry > about, and the "Command Effects" is empty. However, going forward there > is a new class of commands called "Set/Get Feature" that allow a wide > range of vendor toggles to be deployed which will need an upstream > response for the driver policy to vendor-specific "Features". Irrelevant rat hole time ;) I don't see those Set / Get feature as any different from other commands. I see them as a convenience mostly there to cut down on spec duplication and enforce some consistency across multiple similar commands, but they are just commands like any other, validation is just one step further into the payload. There are already a bunch of them in the main CXL spec and like you mention above if someone brings a well documented vendor feature (or feature from another standard etc), then if appropriate we could let that through the filter as well. Same will be true of tunneled commands (I think we can ignore the cross host security aspect of those). Ultimately we can sanity check the payload much like a top level command. So I mostly agree with rest of what you've said, but think this detail doesn't matter. > > So if fwctl, or something like it, can strike a balance of enforcing > integrity and introspection while encouraging collaboration on the > aspects that are worth upstream collaboration, I think that is a > conversation worth having. > > [1]: http://lore.kernel.org/r/CAPcyv4gDShAYih5iWabKg_eTHhuHm54vEAei8ZkcmHnPp3B0cw@mail.gmail.com/ > [2]: http://lore.kernel.org/r/20240321174423.00007e0d@Huawei.com >
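The point made above, that Set/Get Feature commands are filtered like any other command with validation simply reaching one step further into the payload, can be sketched as a two-level check. This is only an illustration under stated assumptions: the opcode values, the idea that the payload starts with a 16-byte feature UUID, the allowlist contents, and the `command_allowed` helper are all placeholders, not taken from the CXL specification or any driver.

```python
import uuid

# Placeholder opcodes for illustration only.
OP_READ_VENDOR_DEBUG_LOG = 0x1000
OP_SET_FEATURE = 0x2000

# First level: which top-level commands the filter lets through at all.
ALLOWED_OPCODES = {OP_READ_VENDOR_DEBUG_LOG, OP_SET_FEATURE}

# Second level: hypothetical allowlist of well-documented features.
ALLOWED_FEATURES = {uuid.UUID("00000000-0000-0000-0000-000000000001")}


def command_allowed(opcode, payload):
    """Return True if the command, and for Set Feature its target feature, is allowed."""
    if opcode not in ALLOWED_OPCODES:
        return False
    if opcode != OP_SET_FEATURE:
        # Ordinary command: the opcode check is the whole decision.
        return True
    # Set Feature: look one step further into the payload (assumed layout:
    # the first 16 bytes identify the feature).
    if len(payload) < 16:
        return False
    feature = uuid.UUID(bytes=bytes(payload[:16]))
    return feature in ALLOWED_FEATURES
```

For example, `command_allowed(OP_SET_FEATURE, bytes(16))` is rejected because the all-zero UUID is not on the allowlist, while the same opcode carrying an allowlisted UUID passes.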
On Tue, Jun 04, 2024 at 08:11:03PM -0700, Jakub Kicinski wrote: > On Mon, 3 Jun 2024 12:53:16 -0300 Jason Gunthorpe wrote: > > Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG > > Please double check with Broadcom if they are still supportive, > in the current form. They are free to comment. > Please include lore links to previous postings. The link to mlx5ctl is already in the cover letter and Saeed linked from there to enough of the prior stuff. > Please carry my nack on future version. At least as long as > the write access checks are.. good-faith-based. I will include the acks and nacks related to the general concept on the documentation patch 6 along with a links and mention in the PR when we get there. Jason
On Tue, Jun 04, 2024 at 04:56:57PM -0700, Dan Williams wrote: > Jakub Kicinski wrote: > [..] > > I don't begrudge anyone building proprietary options, but leave > > upstream out of it. > > So I am of 2 minds here. In general, how is upstream benefited by > requiring every vendor command to be wrapped by a Linux command? People actually can use upstream :) Amazingly there is inherent benefit to people being able to use the software we produce. > 3 years on from that recommendation it seems no vendor has even needed > that level of distribution help. I.e. checking a few distro kernels > (Fedora, openSUSE) shows no uptake for CONFIG_CXL_MEM_RAW_COMMANDS=y in > their debug builds. I can only assume that locally compiled custom > kernel binaries are filling the need. My strong advice would be to be careful about this. Android-ism where nobody runs the upstream kernel is a real thing. For something emerging like CXL there is a real risk that the hyperscale folks will go off and do their own OOT stuff and in-tree CXL will be something usable but inferior. I've seen this happen enough times.. If people come and say we need X and the maintainer says no, they don't just give up and stop doing X, they go and do X anyhow out of tree. This has become especially true now that the center of business activity in server-Linux is driven by the hyperscale crowd that don't care much about upstream. Linux maintainers don't actually have the power to force the industry to do things, though people do keep trying.. Maintainers can only lead, and productive leading is not done with a NO. You will start to see this pain in maybe 5-10 years if CXL starts to be something deployed in an enterprise RedHat/Dell/etc sort of environment. Then that missing X becomes a critical issue because it turns out the hyperscale folks long since figured out it is really important but didn't do anything to enable it upstream.
There is merit in upstream being something people can and do actually use, not just an ivory tower of architectural perfection. There is merit in bringing code into the community instead of forcing things to be OOT. For instance the thread you linked where there was talk of needing the signal integrity data is a great example. Sure some of that is manufacturing time, but also if you deploy a million interfaces in a datacenter, then yes, there will be need to collect SI information from live systems and do some analysis on it. You wouldn't believe how much physically broken HW leaks out into data centers and needs manufacturing level debugging techniques to properly root cause :( > userspace-to-device-firmware tunnel?" to at least get all the various > concerns documented in one place, and provide guidance for how device > vendors should navigate this space across subsystems. This is my effort here. If we document the expectations there is a much better chance that a standards body or device manufacturer can implement their interfaces in a way that works with the OS. There is a much higher chance they will attract CVEs and be forced to fix it if the security expectations are clearly laid out. You had a good observation in one of those links about how they are not OS people. Let's help them do better. Shunt the less robust stuff to fwctl and then people can also make their own security choices, don't enable or load the fwctl modules and you get more protection. It is closer to your CONFIG_CXL_MEM_RAW_COMMANDS=y but at runtime. I think I captured most of your commentary below here in patch 6. > Effects Log". In that "trust Command Effects" scenario the kernel still > has no idea what the command is actually doing, but it can at least > assert that the device does not claim that the command changes the > contents of system-memory. Now, you might say, "the device can just > lie", but that betrays a conceit of the kernel restriction.
A device > could lie that a Linux wrapped command when passed certain payloads does > not in turn proxy to a restricted command. Yeah, we have to trust the device. If the device is hostile toward the OS then there are already big problems. We need to allow for unintentional defects in the devices, but we don't need to be paranoid. IMHO a command effects report, in conjunction with a robust OS centric definition is something we can trust in. > * Introspection / validation: Subsystem community needs to be able to > audit behavior after the fact. > > To me this means even if the kernel is letting a command through based > on the stated Command Effect of "Configuration Change after Cold Reset" > upstream community has a need to be able to read the vendor > specification for that command. I.e. commands might be vendor-specific, > but never vendor-private. I see this as similar to the requirement for > open source userspace for sophisticated accelerators. I'm less hard on this. As long as reasonable open userspace exists I think it is fine to let other stuff through too. I can appreciate the DRM stance on this, but IMHO, there is meaningfully more value for open source in trying to get an open Vulkan implementation vs blocking users from reading their vendor'd diagnostic SI values. I don't think we should get into some kind of extremism and insist that every single bit must be documented/standardized or Linux won't support it. This is why I envision fwctl as not being suitable for actual datapath/performance stuff. > * Collaboration: open standards support open driver maintenance. > > Without standards we end up with awkward situations like Confidential > Computing where every vendor races to implement the same functionality > in arbitrarily different and vendor specific ways. Standards are important. Linux is not a standards body. Linux maintainers can only advise, not force, the industry to make standards.
At a certain point Linux's job is to implement software to support what people have built. CC is a sad example where the industry did not get together enough, but still Linux will support the CC mess. > For CXL devices, and I believe the devices fwctl is targeting, there > are a whole class of commands for vendor specific configuration and > debug. Commands that the kernel really need not worry about. Right. > Some subsystems may want to allow high-performance science experiments > like what NVMe allows, but it seems worth asking the question if > standardizing device configuration and debug is really the best use of > upstream's limited time? From what I've been seeing it looks like a significant waste of time. For example there is minimal industry value in standardizing values stored in a device's boot time flash configuration. If some common software wants to access really generic configuration (like SRIOV enable) then sure there is merit, but that is really the minority. Jason
On Tue, Jun 04, 2024 at 03:32:16PM -0700, Jakub Kicinski wrote: > On Tue, 4 Jun 2024 14:28:05 -0700 Saeed Mahameed wrote: > > On 04 Jun 07:04, Jakub Kicinski wrote: > > >On Mon, 3 Jun 2024 21:01:58 -0600 David Ahern wrote: > > >> Seriously, Jakub, how is that in any way related to this patch set? > > > > > >Whether they admit it or not, DOCA is a major reason nVidia wants > > >this to be standalone rather than part of RDMA. > > > > No, DOCA isn't on the agenda for this new interface. But what is the point > > in arguing? > > I'm not arguing any point, we argued enough. But you failed to disclose > that DOCA is very likely user of this interface. So whoever you're > planning to submit it to should know. This is getting ridiculous. Did you disclose in your PSP cover letter that all that work and new kernel uAPI is to support Meta's proprietary user space, even to the point that NO open source implementation even exists yet? Let me check. Nope. So why this made-up double standard for Saeed? Especially after he already said DOCA isn't on the agenda for mlx5's fwctl? > > >> You are basically suggesting that if any vendor ever has an out of tree > > >> option for its hardware every patch it sends should be considered a ruse > > >> to enable or simplify proprietary options. > > > > It's apparent that you're attributing sinister agendas to patchsets when > > you fail to offer valid technical opinions regarding the NAK nature. Let's > > address this outside of this patchset, as this isn't the first occurrence. > > Consistency in evaluating patches is crucial; > > Exactly :| Netdev people, including multiple prominent developers from > Mellanox/nVidia have been nacking SDK interfaces in Linux networking > for 20 years. How are we going to look to all the companies which have > been doing IPUs for over a decade if we change the rules for nVidia? That is a bleak way of painting things. fwctl is a developing consensus on how to solve this class of problems.
We get to have a consensus that is different than the past because Linux does actually evolve. All your long suffering IPU companies are welcome to use fwctl with their products going forward just as equally to nvidia/etc. Amazingly, "rules" are not set in stone in Linux! > If by "let's address this outside of this patchset" you mean that we > should have a discussion about maintainer favoritism, and subsystem > capture by vendors - you have my full support! This vendor bashing needs to stop. You could have easily used the word companies and been much more accurate. At this point the hyperscale companies - your so-called "users" - are much more guilty of "subsystem capture" than any vendor is, and it certainly has changed the culture of Linux. There are many legitimate complaints all around of maintainers being capricious - it doesn't matter who employs them. Jason
On Wed, 5 Jun 2024 11:50:39 -0300 Jason Gunthorpe wrote: > On Tue, Jun 04, 2024 at 03:32:16PM -0700, Jakub Kicinski wrote: > > On Tue, 4 Jun 2024 14:28:05 -0700 Saeed Mahameed wrote: > > > No, DOCA isn't on the agenda for this new interface. But what is the point > > > in arguing? > > > > I'm not arguing any point, we argued enough. But you failed to disclose > > that DOCA is very likely user of this interface. So whoever you're > > planning to submit it to should know. > > This is getting ridiculous. Did you disclose in your PSP cover letter > that all that work and new kernel uAPI is to support Meta's proprietary > user space, even to the point that NO open source implementation even > exists yet? Let me check. Nope. There is no Meta proprietary implementation. Some Meta folks who are on the CC of the submission are working on extending Fizz, but it's not ready. Fizz is here: https://github.com/facebookincubator/fizz
On 6/4/24 8:04 AM, Jakub Kicinski wrote:
> Ooo, is that a sore spot?
Maintainer overreach? Absolutely.
The sky is not falling with this proposed subsystem; engineers are
merely trying to solve real, customer problems.
On 6/5/24 7:59 AM, Jason Gunthorpe wrote: > On Tue, Jun 04, 2024 at 04:56:57PM -0700, Dan Williams wrote: >> Jakub Kicinski wrote: >> [..] >>> I don't begrudge anyone building proprietary options, but leave >>> upstream out of it. >> >> So I am of 2 minds here. In general, how is upstream benefited by >> requiring every vendor command to be wrapped by a Linux command? > > People actually can use upstream :) > > Amazingly there is inherent benefit to people being able to use the > software we produce. There is. There is a clear preference for open source kernels and drivers. Until a feature is standardized and/or commoditized, it does not make sense to create a uapi for every H/W vendor whim. All of them are attempting to solve real problems; some of them will stick. We know which features are valuable when customers use them, ask for them and other vendors copy them. Until then it is a 1-off by a vendor basically proposing a solution. Not all ideas are good ideas, and we do not need the burden of a uapi or the burden of out of tree drivers. > >> 3 years on from that recommendation it seems no vendor has even needed >> that level of distribution help. I.e. checking a few distro kernels >> (Fedora, openSUSE) shows no uptake for CONFIG_CXL_MEM_RAW_COMMANDS=y in >> their debug builds. I can only assume that locally compiled custom >> kernel binaries are filling the need. > > My strong advice would be to be careful about this. Android-ism where > nobody runs the upstream kernel is a real thing. For something > emerging like CXL there is a real risk that the hyperscale folks will > go off and do their own OOT stuff and in-tree CXL will be something > usable but inferior. I've seen this happen enough times.. > > If people come and say we need X and the maintainer says no, they > don't just give up and stop doing X, they go and do X anyhow out of > tree.
This has become especially true now that the center of business > activity in server-Linux is driven by the hyperscale crowd that don't > care much about upstream. Linux maintainers don't actually have the > power to force the industry to do things, though people do keep > trying.. Maintainers can only lead, and productive leading is not done > with a NO. +1
Jason Gunthorpe wrote: [..] > > 3 years on from that recommendation it seems no vendor has even needed > > that level of distribution help. I.e. checking a few distro kernels > > (Fedora, openSUSE) shows no uptake for CONFIG_CXL_MEM_RAW_COMMANDS=y in > > their debug builds. I can only assume that locally compiled custom > > kernel binaries are filling the need. > > My strong advice would be to be careful about this. Android-ism where > nobody runs the upstream kernel is a real thing. For something > emerging like CXL there is a real risk that the hyperscale folks will > go off and do their own OOT stuff and in-tree CXL will be something > usuable but inferior. I've seen this happen enough times.. Hence my openness to considering fwctl... > If people come and say we need X and the maintainer says no, they > don't just give up and stop doing X, the go and do X anyhow out of > tree. This has become especially true now that the center of business > activity in server-Linux is driven by the hyperscale crowd that don't > care much about upstream. "...don't care much about upstream...". This could be a whole separate thread unto itself. > Linux maintainer's don't actually have the power to force the industry > to do things, though people do keep trying.. Maintainers can only > lead, and productive leading is not done with a NO. > > You will start to see this pain in maybe 5-10 years if CXL starts to > be something deployed in an enterprise RedHat/Dell/etc sort of > environment. Then that missing X becomes a critical issue because it > turns out the hyperscale folks long since figured out it is really > important but didn't do anything to enable it upstream. This matches other feedback I have heard recently. Yes, distros hate contending with every vendor's userspace toolkit, that was the original distro feedback motivating CONFIG_CXL_MEM_RAW_COMMANDS to have a poison pill of WARN() on use. 
However, allowing more vendor commands is more preferable than contending with vendor out-of-tree drivers that likely help keep the enterprise-distro-kernel stable-ABI train rolling. In other words, legalize it in order to centrally regulate it. [..] > This is my effort here. If we document the expectations there is a > much better chance that a standard body or device manufacturer can > implement their interfaces in a way that works with the OS. There is a > much higher chance they will attract CVEs and be forced to fix it if > the security expectations are clearly laid out. You had a good > observation in one of those links about how they are not OS > people. Let's help them do better. > > Shunt the less robust stuff to fwctl and then people can also make > their own security choices, don't enable or load the fwctl modules and > you get more protection. It is closer to your > CONFIG_CXL_MEM_RAW_COMMANDS=y but at runtime. > > I think I captured most of your commentary below here in patch 6. I will take a look... > > Effects Log". In that "trust Command Effects" scenario the kernel still > > has no idea what the command is actually doing, but it can at least > > assert that the device does not claim that the command changes the > > contents of system-memory. Now, you might say, "the device can just > > lie", but that betrays a conceit of the kernel restriction. A device > > could lie that a Linux wrapped command when passed certain payloads does > > not in turn proxy to a restricted command. > > Yeah, we have to trust the device. If the device is hostile toward the > OS then there are already big problems. We need to allow for > unintentional defects in the devices, but we don't need to be > paranoid. > > IMHO a command effects report, in conjunction with a robust OS centric > defintion is something we can trust in. So this is where I want to start and see if we can bridge the trust gap. 
I am warming to your assertion that there is a wide array of vendor-specific configuration and debug that are not an efficient use of upstream's time to wrap in a shared Linux ABI. I want to explore fwctl for CXL for that use case, I personally don't want to marshal a Linux command to each vendor's slightly different backend CXL toggles. At the same time, I also agree with the contention that a "do anything you want and get away with it" tunnel invites shenanigans from folks that may not care about the long term health of the Linux kernel vs their short term interests. That it is difficult to unring the bell once a tunnel is in place. While subsystems will rightly take different stances to fwctl policy, that lack of one-size-fits all seems not sufficient reason to keep the concept out of the kernel entirely. I appreciate that you crafted this interface with an eye towards making it unsuitable for data-path operations. So my questions to try to understand the specific sticking points more are: 1/ Can you think of a Command Effect that the device could enumerate to address the specific shenanigan's that netdev is worried about? In other words if every command a device enables has the stated effect of "Configuration Change after Reset" does that cut out a significant portion of the concern? Make this a debate on finer grained effects not coarse grained binary decision on whether fwctl should move forward at all. 2/ About the "what if the device lies?" question. We can't revert code that used to work, but we can definitely work with enterprise distros to turn off fwctl where there is concern it may lead or is leading to shenanigans. So, document what each subsystem's stance towards fwctl is, like maybe a distro only wants fwctl to front publicly documented vendor commands, or maybe private vendor commands ok, but only with a constrained set of Command Effects (I potentially see CXL here). 
A distro should know what it is opting into for each fwctl instance; that will likely always need to be subsystem-specific policy. A distro can also decide lockdown policy based on Command Effects above and beyond the ones that clearly state they allow the device to modify the running kernel.
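As an illustration of the "constrained set of Command Effects" policy discussed above, here is a minimal sketch in C. It assumes the device reports its effects for each command as a bitmask; the flag names, bit positions, `struct effects_policy`, and `command_permitted()` are all hypothetical and are not taken from the CXL specification, the fwctl patches, or any kernel header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative effect flags. Names and bit positions are assumptions
 * made for this sketch only.
 */
enum cmd_effect {
	EFFECT_CONFIG_CHANGE_AFTER_RESET = 1u << 0,
	EFFECT_IMMEDIATE_CONFIG_CHANGE   = 1u << 1,
	EFFECT_IMMEDIATE_DATA_CHANGE     = 1u << 2,
	EFFECT_SECURITY_STATE_CHANGE     = 1u << 3,
};

/* Hypothetical policy: the set of effects a distro is willing to tolerate. */
struct effects_policy {
	uint32_t allowed;
};

/*
 * Permit a command only if every effect the device reports for it is in
 * the allowed set. The device's report is still trusted here; the check
 * narrows what is let through, it does not verify device behaviour.
 */
static bool command_permitted(const struct effects_policy *policy,
			      uint32_t reported_effects)
{
	return (reported_effects & ~policy->allowed) == 0;
}
```

With `allowed` set to only `EFFECT_CONFIG_CHANGE_AFTER_RESET`, a command that also reports an immediate change would be refused, which is the finer-grained debate point 1/ asks for.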
On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote: > Jason Gunthorpe wrote: <...> > So my questions to try to understand the specific sticking points more > are: > > 1/ Can you think of a Command Effect that the device could enumerate to > address the specific shenanigans that netdev is worried about? In other > words if every command a device enables has the stated effect of > "Configuration Change after Reset" does that cut out a significant > portion of the concern? It will prevent SR-IOV devices (or, more accurately, their VFs) from being configured through fwctl, as they are destroyed in HW during reboot. Thanks
On Wed, 5 Jun 2024 20:35:49 -0600 David Ahern wrote: > Until a feature is standardized and/or commoditized, it does not make > sense to create a uapi for every H/W vendor whim. This is not about non-standard features. I work with multiple vendors as my day job. I ask them how to set basic link configuration and the support person gives me a link to the vendor tools! I wish I could show you the emails. > All of them are attempting to solve real problems; some of them will > stick. We know which features are valuable when customers use them, Yes, once customers deploy a feature implemented via a vendor API they will definitely migrate to a different API. Customers like risk and wasting their engineering resources reimplementing and redeploying things? And we have so much success moving users to new APIs in Linux! > ask for them and other vendors copy them. Until then it is a 1-off by > a vendor basically proposing a solution. Certainly. Because... who exactly will ask the second vendor to implement the common API? And the second vendor will most certainly not mind the extra delay and inconvenience of having their product shipped via the publicly reviewed and slow-to-deploy kernel, while the first one is happily selling the same feature already. > Not all ideas are good ideas, and we do not need the burden of a uapi > or the burden of out of tree drivers. This API gives user space SDKs a trivial way of implementing all switching, routing, filtering, QoS offloads etc. An argument can be made that given somewhat mixed switchdev experience we should just stay out of the way and let that happen. But just make that argument then, instead of pretending the use of this API will be limited to custom very vendor specific things. Again, if someone needs this to ship their custom CXL/Infiniband AI fabric magic, which is un-interoperable by design -- none of my concern. But keep TCP/IP networking out of this :|
On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote: > > If people come and say we need X and the maintainer says no, they > > don't just give up and stop doing X, the go and do X anyhow out of > > tree. This has become especially true now that the center of business > > activity in server-Linux is driven by the hyperscale crowd that don't > > care much about upstream. > > "...don't care much about upstream...". This could be a whole separate > thread unto itself. Heh, it is a topic, but perhaps not one for polite company :) > > Linux maintainer's don't actually have the power to force the industry > > to do things, though people do keep trying.. Maintainers can only > > lead, and productive leading is not done with a NO. > > > > You will start to see this pain in maybe 5-10 years if CXL starts to > > be something deployed in an enterprise RedHat/Dell/etc sort of > > environment. Then that missing X becomes a critical issue because it > > turns out the hyperscale folks long since figured out it is really > > important but didn't do anything to enable it upstream. > > This matches other feedback I have heard recently. Yes, distros hate > contending with every vendor's userspace toolkit, that was the > original I'm not sure that is 100% true. Sure nobody likes that you have to type 'abc X' and 'def Y' to do a similar thing, but from a distro perpective if abc and def are both open sourced and packaged in the distro it is still a far better outcome than users doing OOT drivers and binary-only tools. eg one of the long standing main Mellanox tools that is being ported to fwctl is open source and in all distros: https://rpmfind.net/linux/rpm2html/search.php?query=mstflint Projects have already experimented building tooling on top of it to make a more cross-vendor experience in some areas. 
In my view it is wrong to think the kernel is the only place we can make generic things or that allowing userspace to see the raw device interface immediately means fragmentation and chaos. The industry is more robust than that. Giving people working in userspace room to invent their own solutions is actually helpful to driving some commonality. There are already soft targets in the K8S that people need to fit into, if the first few steps are with abc/def tools and that brings us to an eventual true commonality, then great. > distro feedback motivating CONFIG_CXL_MEM_RAW_COMMANDS to have a poison > pill of WARN() on use. However, allowing more vendor commands is more > preferable than contending with vendor out-of-tree drivers that likely > help keep the enterprise-distro-kernel stable-ABI train rolling. In > other words, legalize it in order to centrally regulate it. I also liked Jakub's idea of putting a taint in for things that were likely to have an impact on support and debug, I included that concept in fwctl. > > > Effects Log". In that "trust Command Effects" scenario the kernel still > > > has no idea what the command is actually doing, but it can at least > > > assert that the device does not claim that the command changes the > > > contents of system-memory. Now, you might say, "the device can just > > > lie", but that betrays a conceit of the kernel restriction. A device > > > could lie that a Linux wrapped command when passed certain payloads does > > > not in turn proxy to a restricted command. > > > > Yeah, we have to trust the device. If the device is hostile toward the > > OS then there are already big problems. We need to allow for > > unintentional defects in the devices, but we don't need to be > > paranoid. > > > > IMHO a command effects report, in conjunction with a robust OS centric > > defintion is something we can trust in. > > So this is where I want to start and see if we can bridge the trust gap. 
> > I am warming to your assertion that there is a wide array of > vendor-specific configuration and debug that are not an efficient use of > upstream's time to wrap in a shared Linux ABI. I want to explore fwctl > for CXL for that use case, I personally don't want to marshal a Linux > command to each vendor's slightly different backend CXL toggles. Personally I think this idea to marshal/unmarshal everything in the kernel is often misguided. If it is truely obvious and actually shared multi-vendor capability then by all means go and do it. But if you are spending weeks/months fighting about uAPI because all the vendors are so different, it isn't obvious what is "generic" then you've probably already lost. The very worst outcome is a per-device uAPI masquerading as an obfuscated "generic" uAPI that wasted ages of peoples time to argue out. > At the same time, I also agree with the contention that a "do anything > you want and get away with it" tunnel invites shenanigans from folks > that may not care about the long term health of the Linux kernel vs > their short term interests. IMHO this is disproven by history. The above mstflint I linked to is as old as as mlx5 HW, it runs today over PCI config space and an OOT driver. Where is real the damage to the long term health of Linux or the ecosystem? Like I said before I view there is a difference between DRM wanting a Vulkan stack and doing some device specific configuration/debugging. One has vastly more open source value than the other. > So my questions to try to understand the specific sticking points more > are: > > 1/ Can you think of a Command Effect that the device could enumerate to > address the specific shenanigan's that netdev is worried about? Nothing comes to mind.. > In other words if every command a device enables has the stated > effect of "Configuration Change after Reset" does that cut out a > significant portion of the concern? 
Related to configuration - one of Saeed's original ideas was to implement a devlink command to set the configurables in the flash in a way that mlx5 could implement all of its options, ideally with configurables discovered dynamically from the running device. This LPC presentation was so aggressively rejected by Jakub that Saeed abandoned it. In the discussion it was clear Jakub is requesting to review and possibly reject every configurable. On this topic, unfortunately, I don't see any technical middle ground between "netdev is the gatekeeper for all FLASH configurables" and "devices can be fully configured regardless of their design". > 2/ About the "what if the device lies?" question. We can't revert code > that used to work, but we can definitely work with enterprise distros to > turn off fwctl where there is concern it may lead or is leading to > shenanigans. Security is the one place where Linus has tolerated userspace regressions. In this specific case I documented (or at least that was the intent) that there would be regression consequences to breaking the security rules. Commands can be retroactively restricted to higher CAP levels and rejected from lockdown if the device attracts a CVE. IMHO the ecosystem is strongly motivated to do security seriously these days; I am not so worried. > So, document what each subsystem's stance towards fwctl is, > like maybe a distro only wants fwctl to front publicly documented vendor > commands, or maybe private vendor commands ok, but only with a > constrained set of Command Effects (I potentially see CXL here). I wouldn't say subsystem here, but technology. I think it is reasonable that a CXL fwctl driver have some kconfig tunables like you already have. This idea works a lot better if the underlying thing is already standards based. Linux subsystem isn't a meaningful concept for a multi-function device like mlx5 and others. Thanks, Jason
On Thu, Jun 06, 2024 at 07:18:11AM -0700, Jakub Kicinski wrote: > An argument can be made that given somewhat mixed switchdev experience > we should just stay out of the way and let that happen. But just make > that argument then, instead of pretending the use of this API will be > limited to custom very vendor specific things. Huh? At least mlx5 already has a very robust userspace competition to switchdev using RDMA APIs, available in DPDK. This has long since been done and is widely deployed. I have no idea where you get this made up idea that fwctl is somehow about dataplane SDKs. The accelerated networking industry long ago moved past netdev in upstream, it is well known to everyone. There is no trick here. fwctl is not some scheme to sneak dataplane SDKs into the kernel, you are just making stuff up. Jason
On Thu, 6 Jun 2024 11:41:02 -0300 Jason Gunthorpe wrote: > In my view it is wrong to think the kernel is the only place we can > make generic things or that allowing userspace to see the raw device > interface immediately means fragmentation and chaos. The industry is > more robust than that. Giving people working in userspace room to > invent their own solutions is actually helpful to driving some > commonality. There are already soft targets in the K8S that people > need to fit into, if the first few steps are with abc/def tools and > that brings us to an eventual true commonality, then great. Yes, this is the core of our disagreement. And one which is quite hard to resolve with technical arguments. I believe kernel may not be a great place to keep all the controls, but it is in my opinion the most healthy open source project among the available options. You mention K8S, but I'd give SoNiC (the NOS) as a more relevant example. A hyperscaler or another trillion dollar company can certainly have a swing at creating other open layers of commonality. Together with its other trillion dollar friends. Removing the minor inconvenience of having to ship an out of tree module for out of tree tools is not worth the loss.
On Thu, 6 Jun 2024 11:48:18 -0300 Jason Gunthorpe wrote: > > An argument can be made that given somewhat mixed switchdev experience > > we should just stay out of the way and let that happen. But just make > > that argument then, instead of pretending the use of this API will be > > limited to custom very vendor specific things. > > Huh? I'm sorry, David has been working in netdev for a long time. I have a tendency to address the person I'm replying to, assuming their level of understanding of the problem space. Which makes it harder to understand for bystanders. > At least mlx5 already has a very robust userspace competition to > switchdev using RDMA APIs, available in DPDK. This has long since been > done and is widely deployed. Yeah, we had this discussion multiple times. > I have no idea where you get this made up idea that fwctl is somehow > about dataplane SDKs. The accelerated networking industry long ago > moved past netdev in upstream, it is well known to everyone. There > is no trick here. > > fwctl is not some scheme to sneak dataplane SDKs into the kernel, you > are just making stuff up. By dataplane SDK you mean DOCA? I don't even want to go there. I just meant forwarding offload _which I said_. You didn't understand and now you're accusing me of "making stuff up". This whole conversation is such a damn waste of time.
Jason Gunthorpe wrote: [..] > > I am warming to your assertion that there is a wide array of > > vendor-specific configuration and debug that are not an efficient use of > > upstream's time to wrap in a shared Linux ABI. I want to explore fwctl > > for CXL for that use case, I personally don't want to marshal a Linux > > command to each vendor's slightly different backend CXL toggles. > > Personally I think this idea to marshal/unmarshal everything in the > kernel is often misguided. If it is truely obvious and actually shared > multi-vendor capability then by all means go and do it. > > But if you are spending weeks/months fighting about uAPI because all > the vendors are so different, it isn't obvious what is "generic" then > you've probably already lost. The very worst outcome is a per-device > uAPI masquerading as an obfuscated "generic" uAPI that wasted ages of > peoples time to argue out. Certainly once you have gotten to the "months of arguing" point it begs the question "was there really any generic benefit to reap in the first place?" That said, *some* grappling, especially when muliple vendors hit the list with the similar feature at the same time, has yielded collaboration in the past. So I might be a few rungs back on the spectrum from where you are, but I concede that yes, there is a point of diminishing to negative returns. > > At the same time, I also agree with the contention that a "do anything > > you want and get away with it" tunnel invites shenanigans from folks > > that may not care about the long term health of the Linux kernel vs > > their short term interests. > > IMHO this is disproven by history. The above mstflint I linked to is > as old as as mlx5 HW, it runs today over PCI config space and an OOT > driver. Where is real the damage to the long term health of Linux or > the ecosystem? > > Like I said before I view there is a difference between DRM wanting a > Vulkan stack and doing some device specific > configuration/debugging. 
One has vastly more open source value than > the other. Fair. > > So my questions to try to understand the specific sticking points more > > are: > > > > 1/ Can you think of a Command Effect that the device could enumerate to > > address the specific shenanigans that netdev is worried about? > > Nothing comes to mind.. Ugh, that indeed seems too severe. > > In other words if every command a device enables has the stated > > effect of "Configuration Change after Reset" does that cut out a > > significant portion of the concern? > > Related to configuration - one of Saeed's original ideas was to > implement a devlink command to set the configurables in the flash in a > way that mlx5 could implement all of its options, ideally with > configurables discovered dynamically from the running device. This LPC > presentation was so aggressively rejected by Jakub that Saeed abandoned > it. In the discussion it was clear Jakub is requesting to review and > possibly reject every configurable. > > On this topic, unfortunately, I don't see any technical middle ground > between "netdev is the gatekeeper for all FLASH configurables" and > "devices can be fully configured regardless of their design". This gets back to the unspoken conceit of the kernel restriction that I mentioned earlier. At some point the kernel restriction begets a cynical in-tree workaround or an out-of-tree workaround which either way means upstream Linux loses. > > 2/ About the "what if the device lies?" question. We can't revert code > > that used to work, but we can definitely work with enterprise distros to > > turn off fwctl where there is concern it may lead or is leading to > > shenanigans. > > Security is the one place where Linus has tolerated userspace > regressions. In this specific case I documented (or at least that was > the intent) there would be regression consequences to breaking the > security rules.
Commands can be retroactively restricted to higher CAP > levels and rejected from lockdown if the device attracts a CVE. > > IMHO the ecosystem is strongly motived to do security seriously these > days, I am not so worried. That is a good point, if a Command Effect gets tied to a CVE, or a cynical workaround gets tied to a CVE, both of those demand an upstream and distro response. > > So, document what each subsystem's stance towards fwctl is, > > like maybe a distro only wants fwctl to front publicly documented vendor > > commands, or maybe private vendor commands ok, but only with a > > constrained set of Command Effects (I potentially see CXL here). > > I wouldn't say subsystem here, but techonology. I think it is > reasonable that a CXL fwctl driver have some kconfig tunables like you > already have. This idea works alot better if the underlying thing is > already standards based. True, I worry about these technologies that cross upstream maintainer boundaries. When you have a composable switch that enables net, block, and/or mem use cases, which upstream maintainer policy applies to the fwctl posture of that thing?
On 6/6/24 9:05 AM, Jakub Kicinski wrote: > On Thu, 6 Jun 2024 11:48:18 -0300 Jason Gunthorpe wrote: >>> An argument can be made that given somewhat mixed switchdev experience >>> we should just stay out of the way and let that happen. But just make >>> that argument then, instead of pretending the use of this API will be >>> limited to custom very vendor specific things. >> >> Huh? > > I'm sorry, David has been working in netdev for a long time. And I will continue working on the Linux networking stack (netdev) while I also work with the IB S/W stack, fwctl, and any other part of Linux relevant to my job. I am not going to pick a silo (and should not be required to). > I have a tendency to address the person I'm replying to, > assuming their level of understanding of the problem space. > Which makes it harder to understand for bystanders. > >> At least mlx5 already has a very robust userspace competition to >> switchdev using RDMA APIs, available in DPDK. This has long since been >> done and is widely deployed. > > Yeah, we had this discussion multiple times. The switchdev / sonic comparison came to mind as well during this thread. The existence of a kernel way (switchdev) has not stopped sonic (userspace SDK) from gaining traction. In some cases the SDK is required for device features that do not have a kernel uapi or vendors refuse to offer a kernel way, so it is the only option. The bottom line to me is that these hardline, dogmatic approaches - resisting the recognition of reality - are only harming users. There is a middle ground, open source drivers and tools that offer more flexibility.
Leon Romanovsky wrote: > On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote: > > Jason Gunthorpe wrote: > > <...> > > > So my questions to try to understand the specific sticking points more > > are: > > > > 1/ Can you think of a Command Effect that the device could enumerate to > > address the specific shenanigans that netdev is worried about? In other > > words if every command a device enables has the stated effect of > > "Configuration Change after Reset" does that cut out a significant > > portion of the concern? > > It will prevent SR-IOV devices (or, more accurately, their VFs) > from being configured through fwctl, as they are destroyed in HW > during reboot. Right, but between zero configurability and losing live SR-IOV configurability is there still value? Note, this is just a thought experiment on what, if any, Command Effects Linux can comfortably tolerate vs those that start to be more spicy and dip into removing stimulus / focus on the commons, or otherwise injuring collaboration.
On Thu, Jun 06, 2024 at 03:11:21PM -0700, Dan Williams wrote: > Leon Romanovsky wrote: > > On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote: > > > Jason Gunthorpe wrote: > > > > <...> > > > > > So my questions to try to understand the specific sticking points more > > > are: > > > > > > 1/ Can you think of a Command Effect that the device could enumerate to > > > address the specific shenanigans that netdev is worried about? In other > > > words if every command a device enables has the stated effect of > > > "Configuration Change after Reset" does that cut out a significant > > > portion of the concern? > > > > It will prevent SR-IOV devices (or, more accurately, their VFs) > > from being configured through fwctl, as they are destroyed in HW > > during reboot. > > Right, but between zero configurability and losing live SR-IOV > configurability is there still value? Note, this is just a thought > experiment on what, if any, Command Effects Linux can comfortably tolerate > vs those that start to be more spicy and dip into removing stimulus / > focus on the commons, or otherwise injuring collaboration. I like the idea of "takes effect on _function_ reset". VFs and PFs both often have configuration that can become current once the function is reset. A VF is usually reset by something like VFIO while a PF is usually reset by a power cycle. The fact that configuration doesn't change until reset is, IMHO, a very strong barrier against making some backdoor into a subsystem driver. Jason
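The "takes effect on function reset" behaviour described above can be pictured with a small C model. This is only a sketch under stated assumptions: `struct fn_config`, `fw_set_config()`, and `function_reset()` are hypothetical names invented for illustration, and real devices keep this state in firmware or flash rather than in a host structure.

```c
#include <stdint.h>

/* Hypothetical model of one PCI function (PF or VF); not a real kernel type. */
struct fn_config {
	uint32_t active;   /* what the running function and its driver observe */
	uint32_t pending;  /* value staged through the firmware interface */
};

/* A firmware command with this effect may only touch the staged copy. */
static void fw_set_config(struct fn_config *fn, uint32_t value)
{
	fn->pending = value;
}

/*
 * Only a function reset commits the staged value (per the thread: a VF is
 * usually reset by something like VFIO, a PF by a power cycle).
 */
static void function_reset(struct fn_config *fn)
{
	fn->active = fn->pending;
}
```

In this model a write through the firmware interface cannot alter `active` while a subsystem driver is bound and running, which is the barrier the message above argues for.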
On Thu, Jun 06, 2024 at 10:24:46AM -0700, Dan Williams wrote: > Jason Gunthorpe wrote: > [..] > > > I am warming to your assertion that there is a wide array of > > > vendor-specific configuration and debug that are not an efficient use of > > > upstream's time to wrap in a shared Linux ABI. I want to explore fwctl > > > for CXL for that use case, I personally don't want to marshal a Linux > > > command to each vendor's slightly different backend CXL toggles. > > > > Personally I think this idea to marshal/unmarshal everything in the > > kernel is often misguided. If it is truely obvious and actually shared > > multi-vendor capability then by all means go and do it. > > > > But if you are spending weeks/months fighting about uAPI because all > > the vendors are so different, it isn't obvious what is "generic" then > > you've probably already lost. The very worst outcome is a per-device > > uAPI masquerading as an obfuscated "generic" uAPI that wasted ages of > > peoples time to argue out. > > Certainly once you have gotten to the "months of arguing" point it begs the > question "was there really any generic benefit to reap in the first > place?" Indeed, but I've seen, and participated, in these things many times :) > That said, *some* grappling, especially when muliple vendors hit the > list with the similar feature at the same time, has yielded > collaboration in the past. Absolutely! But we have also frequently done that retroactively, like see three examples and then consolidate the common APIs. The challenge is uAPI. Since we can't change uAPI people like to rush to make it future proof without examples. Broadly I lean towards waiting until we have several examples to build a standard uAPI and let the examples evolve on their own. If there is value in the commonality then people will change over. > This gets back to the unspoken conceit of the kernel restriction that I > mentioned earlier. 
At some point the kernel restriction begets a cynical > in-tree workaround or an out-of-tree workaround which either way means > upstream Linux loses. Right.. The kernel just doesn't have the power to say no to the industry. Things will just go OOT and it is really our community that suffers in the long run. As I said, you can't lead with NO. IMHO there has to be a really high quality reason to keep support for HW that people have built out of the kernel. Especially startups and other more vulnerable companies. I don't think Linux maintainers should be choosing industry winners and losers. I sometimes feel I have a minority opinion here though :( > > > So, document what each subsystem's stance towards fwctl is, > > > like maybe a distro only wants fwctl to front publicly documented vendor > > > commands, or maybe private vendor commands ok, but only with a > > > constrained set of Command Effects (I potentially see CXL here). > > > > I wouldn't say subsystem here, but technology. I think it is > > reasonable that a CXL fwctl driver have some kconfig tunables like you > > already have. This idea works a lot better if the underlying thing is > > already standards based. > > True, I worry about these technologies that cross upstream maintainer > boundaries. When you have a composable switch that enables net, block, > and/or mem use cases, which upstream maintainer policy applies to the > fwctl posture of that thing? fwctl is intended to sit on its own. I think it is even a bad architecture direction that Linux has N different ways to flash FW on devices, N different ways to read diagnostics, etc all because each subsystem went on its own. With fwctl I'd like to see a greater consolidation of not re-inventing the low level of fw interaction differently in each and every subsystem. Like you mentioned CXL has its own way to program flash. How many ways does Linux have to update device flash now?
:( So, if you have a real multi-function device fwctl should be the central place to operate the shared PCI function and the FW interface. There may be some duplication in subsystems, but that is a side effect of our sub-system siloed development model (software architecture tends to follow org chart, after all) Jason
Thu, Jun 06, 2024 at 07:47:20PM CEST, dsahern@kernel.org wrote: >On 6/6/24 9:05 AM, Jakub Kicinski wrote: >> On Thu, 6 Jun 2024 11:48:18 -0300 Jason Gunthorpe wrote: >>>> An argument can be made that given somewhat mixed switchdev experience >>>> we should just stay out of the way and let that happen. But just make >>>> that argument then, instead of pretending the use of this API will be >>>> limited to custom very vendor specific things. >>> >>> Huh? >> >> I'm sorry, David has been working in netdev for a long time. > >And I will continue working on Linux networking stack (netdev) while I >also work with the IB S/W stack, fwctl, and any other part of Linux >relevant to my job. I am not going to pick a silo (and should not be >required to). > >> I have a tendency to address the person I'm replying to, >> assuming their level of understanding of the problem space. >> Which makes it harder to understand for bystanders. >> >>> At least mlx5 already has a very robust userspace competition to >>> switchdev using RDMA APIs, available in DPDK. This has long since been >>> done and is widely deployed. >> >> Yeah, we had this discussion multiple times > >The switchdev / sonic comparison came to mind as well during this >thread. The existence of a kernel way (switchdev) has not stopped sonic >(userspace SDK) from gaining traction. In some cases the SDK is required Is this discussion technical or political? I'm asking because it makes a huge difference. There is no technical reason why sonic does not use a proper in-kernel solution, from what I see. Yes, they chose technically the wrong way, a shortcut, requiring kernel bypass. Honestly for reasons that are beyond my understanding :/ >for device features that do not have a kernel uapi or vendors refuse to >offer a kernel way, so it is the only option. Political reasons. > >The bottom line to me is that these hardline, dogmatic approaches - >resisting the recognition of reality - are only harming users. 
There is a >middle ground, open source drivers and tools that offer more flexibility. >
Thu, Jun 06, 2024 at 04:18:11PM CEST, kuba@kernel.org wrote: >On Wed, 5 Jun 2024 20:35:49 -0600 David Ahern wrote: >> Until a feature is standardized and/or commoditized, it does not make >> sense to create a uapi for every H/W vendor whim. > >This is not about non-standard features. I work with multiple vendors >as my day job. I ask them how to set basic link configuration and the >support person gives me a link to the vendor tools! I wish I could show >you the emails. Even without seeing the emails, I believe you. Well, isn't it just natural? I mean, it always takes a bigger (sometimes much bigger) effort to implement things properly by introducing/extending APIs/uAPIs. Implementing things in a vendor tool is easy, low-hanging fruit; people naturally pick it. I've been around in netdev for the better part of a second decade. I think, for the sake of discussion, it is worth mentioning that a big part of netdev's success despite its complexity is that in the past, any attempt at kernel bypass (I recall a few) was promptly rejected. There was always a big push for a proper abstracted solution. And I believe it helped a lot all over the place. Is this approach depleted? I don't know, maybe. (And yes, I'm aware not everything could be done this way). I understand the reason and motivation for this patchset and what it will solve, don't get me wrong. I kind of like it, it will help to remove all the painful detours we currently have. My concern is, it opens a Pandora's box for netdev *for sure*. Is that desired and anticipated? Do the gains outweigh the potential losses? Will it help the ecosystem? What is the motivation for a vendor to take the hard way of using a proper API (even existing ones) afterwards? Moreover, wouldn't this let vendors go off the leash and start to introduce even more H/W vendor whims? I think these are serious questions we need to ask before this is merged. > >> All of them are attempting to solve real problems; some of them will >> stick. 
We know which features are valuable when customers use them, > >Yes, once customers deploy a feature implemented via a vendor API >they will definitely migrate to a different API. Customers like risk >and wasting their engineering resources reimplementing and redeploying >things? And we have so much success moving users to new APIs in Linux! > >> ask for them and other vendors copy them. Until then it is a 1-off by >> a vendor basically proposing a solution. > >Certainly. Because... who exactly will ask the second vendor to >implement the common API? > >And the second vendor will most certainly not mind the extra delay and >inconvenience having their product shipped via the publicly reviewed, >and slow to deploy kernel, while the first one is happily selling >the same feature already. > >> Not all ideas are good ideas, and we do not need the burden of a uapi >> or the burden of out of tree drivers. > >This API gives user space SDKs a trivial way of implementing all >switching, routing, filtering, QoS offloads etc. >An argument can be made that given somewhat mixed switchdev experience Can you elaborate a bit more on what you mean by "mixed switchdev experience", please? >we should just stay out of the way and let that happen. But just make >that argument then, instead of pretending the use of this API will be >limited to custom very vendor specific things. > >Again, if someone needs this to ship their custom CXL/Infiniband >AI fabric magic, which is un-interoperable by design -- none of >my concern. But keep TCP/IP networking out of this :| >
On 6/7/24 02:25, Jason Gunthorpe wrote: > On Thu, Jun 06, 2024 at 10:24:46AM -0700, Dan Williams wrote: >> Jason Gunthorpe wrote: >> [..] >>>> I am warming to your assertion that there is a wide array of >>>> vendor-specific configuration and debug that are not an efficient use of >>>> upstream's time to wrap in a shared Linux ABI. I want to explore fwctl >>>> for CXL for that use case, I personally don't want to marshal a Linux >>>> command to each vendor's slightly different backend CXL toggles. >>> >>> Personally I think this idea to marshal/unmarshal everything in the >>> kernel is often misguided. If it is truly obvious and actually shared >>> multi-vendor capability then by all means go and do it. >>> >>> But if you are spending weeks/months fighting about uAPI because all >>> the vendors are so different, it isn't obvious what is "generic" then >>> you've probably already lost. The very worst outcome is a per-device >>> uAPI masquerading as an obfuscated "generic" uAPI that wasted ages of >>> people's time to argue out. >> >> Certainly once you have gotten to the "months of arguing" point it begs the >> question "was there really any generic benefit to reap in the first >> place?" > > Indeed, but I've seen, and participated in, these things many times :) > >> That said, *some* grappling, especially when multiple vendors hit the >> list with a similar feature at the same time, has yielded >> collaboration in the past. > > Absolutely! But we have also frequently done that retroactively, like > see three examples and then consolidate the common APIs. The challenge > is uAPI. Since we can't change uAPI people like to rush to make it > future proof without examples. Broadly I lean towards waiting until we > have several examples to build a standard uAPI and let the examples > evolve on their own. > > If there is value in the commonality then people will change over. 
What has changed over the decades is that Linux now has many more users than implementations of a given tool. I would love to see the uAPI barrier move closer to the user: we would be free to refactor kernel APIs, given that "the system tool" would be updated at the same time. Obviously, for a new uAPI that would (re)move the promise from the very beginning.
> >This API gives user space SDKs a trivial way of implementing all > >switching, routing, filtering, QoS offloads etc. > >An argument can be made that given somewhat mixed switchdev experience > > Can you elaborate a bit more on what you mean by "mixed switchdev > experience", please? I don't want to put words in Jakub's mouth but, in my opinion, switchdev has been great for SoHo switches. We have over 100 supported, mostly implemented by the community, but some vendors also supporting their own hardware. We have two enterprise switch families supported, each by its own vendor. And we have one TOR switch family supported by the vendor. So I would say switchdev has worked out great for SoHo, but kernel bypass is still the norm for most things bigger than SoHo. Why? My guess is that a product with a SoHo switch is not actually a switch. It is a wifi box, with a switch. It is a cable modem, with a switch. It is an inflight entertainment system, with a switch, etc. It is much easier to build such multi-purpose systems when everything is nicely integrated into the kernel, you don't have to fight with multiple vendors supplying SDKs which only work on a disjoint set of kernels, etc. For bigger, single purpose devices, it is just a switch, there is less inconvenience of using just one vendor SDK, on top of the vendor prescribed kernel. Andrew
On Thu, Jun 06, 2024 at 03:11:21PM -0700, Dan Williams wrote: > Leon Romanovsky wrote: > > On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote: > > > Jason Gunthorpe wrote: > > > > <...> > > > > > So my questions to try to understand the specific sticking points more > > > are: > > > > > > 1/ Can you think of a Command Effect that the device could enumerate to > > > address the specific shenanigans that netdev is worried about? In other > > > words if every command a device enables has the stated effect of > > > "Configuration Change after Reset" does that cut out a significant > > > portion of the concern? > > > > It will prevent SR-IOV devices (or more accurately their VFs) > > to be configured through the fwctl, as they are destroyed in HW > > during reboot. > > Right, but between zero configurability and losing live SR-IOV > configurability is there still value? For the users that are using SR-IOV, it is a big loss. It will require them to use two tools instead of one. My point is that we need to try to find the best solution for the users, and not a "compromise variant" that will make everyone unhappy. Thanks
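For readers following the Command Effect exchange above, here is a minimal sketch of what such a filter could look like, assuming each device command declares a bitmask of effects and a policy lists the effects it tolerates. Every name here (`cmd_effect`, `effect_policy`, `cmd_allowed`) and the bit positions are hypothetical; none of it is taken from the fwctl patches or from any specification.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical effect bits. The names and bit positions are assumptions
 * made for illustration only; they do not come from fwctl or any spec.
 */
enum cmd_effect {
	EFFECT_CONFIG_AFTER_RESET = 1u << 0, /* takes effect only after a reset */
	EFFECT_IMMEDIATE_CONFIG   = 1u << 1, /* changes live configuration */
	EFFECT_IMMEDIATE_DATA     = 1u << 2, /* changes live data */
};

/* Hypothetical policy: the set of effects the policy owner will allow. */
struct effect_policy {
	uint32_t allowed_effects;
};

/*
 * A command passes only if every effect it declares is in the allowed set.
 * A command that declares no effects (a pure query) always passes.
 */
static bool cmd_allowed(const struct effect_policy *policy, uint32_t declared)
{
	return (declared & ~policy->allowed_effects) == 0;
}
```

Under a policy that allows only `EFFECT_CONFIG_AFTER_RESET`, any command that also declares an immediate configuration change is rejected. That is the trade-off raised above: live SR-IOV VF configuration would fall outside such a policy.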
Fri, Jun 07, 2024 at 02:49:19PM CEST, andrew@lunn.ch wrote: >> >This API gives user space SDKs a trivial way of implementing all >> >switching, routing, filtering, QoS offloads etc. >> >An argument can be made that given somewhat mixed switchdev experience >> >> Can you elaborate a bit more on what you mean by "mixed switchdev >> experience", please? > >I don't want to put words in Jakub's mouth but, in my opinion, >switchdev has been great for SoHo switches. We have over 100 >supported, mostly implemented by the community, but some vendors also >supporting their own hardware. > >We have two enterprise switch families supported, each by its own >vendor. And we have one TOR switch family supported by the vendor. > >So I would say switchdev has worked out great for SoHo, but kernel >bypass is still the norm for most things bigger than SoHo. > >Why? My guess is that a product with a SoHo switch is not actually a >switch. It is a wifi box, with a switch. It is a cable modem, with a >switch. It is an inflight entertainment system, with a switch, etc. >It is much easier to build such multi-purpose systems when everything >is nicely integrated into the kernel, you don't have to fight with >multiple vendors supplying SDKs which only work on a disjoint set of >kernels, etc. > >For bigger, single purpose devices, it is just a switch, there is less >inconvenience of using just one vendor SDK, on top of the vendor >prescribed kernel. I'm aware of what you wrote and understand it. I just thought Jakub's mixed experience was about the APIs more than the politics behind the vendors' adoption process. > > Andrew >
On 6/7/24 12:48 AM, Jiri Pirko wrote: >> The switchdev / sonic comparison came to mind as well during this >> thread. The existence of a kernel way (switchdev) has not stopped sonic >> (userspace SDK) from gaining traction. In some cases the SDK is required > > Is this discussion technical or political? I'm asking because it makes > a huge difference. There is no technical reason why sonic does not use > a proper in-kernel solution, from what I see. > Yes, they chose technically the wrong way, a shortcut, requiring kernel > bypass. Honestly for reasons that are beyond my understanding :/ > > >> for device features that do not have a kernel uapi or vendors refuse to >> offer a kernel way, so it is the only option. > > Political reasons. > You meant financial reasons, not political. The dominant player in switches has zero interest in switchdev, zero interest in open sourcing their SDK. Nothing has changed on that front in the 9 years of switchdev's existence and no amount of 'NO' by maintainers is ever going to pressure said vendor to do that. Mellanox offers both with the Spectrum line and should have a pretty good understanding of how many customers deploy with the SDK vs switchdev. Why is that? There are those who think in logical, simple designs (switchdev), and those who prefer complex, all userspace designs with ping-ponging messages across processes (sonic). The latter uses all kinds of what I call silly rationalizations, from "userspace allows more flexibility", to "dealing with the kernel is too rigid", or "getting changes in is too hard", or my favorite - "Linux does not scale". The bottom line is that the SDK model is not going away. Period. The networking stack has accepted kernel bypass compromises (xdp, xdp sockets, OVS, a lot of the ebpf hooks, ... just examples) with the rationale that more is brought into the Linux way. fwctl is a similar effort - an attempt at bringing more into an open source driver and tooling.
On Fri, Jun 07, 2024 at 08:50:17AM -0600, David Ahern wrote: > Mellanox offers both with the Spectrum line and should have a pretty > good understanding of how many customers deploy with the SDK vs > switchdev. Why is that? We offer lots of options with mlx5 switching too, and switchdev is not being selected by customers principally for performance reasons, in my view. The OVS space wants to operate the switch much like a firewall and this creates a high rate of database updates and exception packets. DPDK can operate all the same offload HW from userspace and avoid all the system call and other kernel overhead. It is much more purpose built to what OVS wants to do. In the >50Gbps space this matters a lot and overall DPDK performance notably wins over switchdev for many OVS workloads - even though the high speed path is near-identical. In this role DPDK is effectively a switch SDK, an open source one at least. Sadly I'm seeing signs that proprietary OVS focused SDKs (think various P4 offerings and others) are out competing open DPDK on merit :( For whatever reason the market for switching is not strongly motivated toward open SDKs, and the available open solutions are struggling a bit to compete. But to repeat again, fwctl is not for dataplane, it is not for implementing a switch SDK (go use RDMA if you want to do that). I will write here a commitment to accept patches blocking such usages if drivers try to abuse the purpose of the subsystem. Jason
Fri, Jun 07, 2024 at 05:14:51PM CEST, jgg@nvidia.com wrote: >On Fri, Jun 07, 2024 at 08:50:17AM -0600, David Ahern wrote: > >> Mellanox offers both with the Spectrum line and should have a pretty >> good understanding of how many customers deploy with the SDK vs >> switchdev. Why is that? > >We offer lots of options with mlx5 switching too, and switchdev is not >being selected by customers principally for performance reasons, in my >view. > >The OVS space wants to operate the switch much like a firewall and >this creates a high rate of database updates and exception >packets. DPDK can operate all the same offload HW from userspace and >avoid all the system call and other kernel overhead. It is much more >purpose built to what OVS wants to do. In the >50Gbps space this >matters a lot and overall DPDK performance notably wins over switchdev >for many OVS workloads - even though the high speed path is >near-identical. > >In this role DPDK is effectively a switch SDK, an open source one at >least. > >Sadly I'm seeing signs that proprietary OVS focused SDKs (think >various P4 offerings and others) are out competing open DPDK on >merit :( > >For whatever reason the market for switching is not strongly motivated >toward open SDKs, and the available open solutions are struggling a >bit to compete. > >But to repeat again, fwctl is not for dataplane, it is not for >implementing a switch SDK (go use RDMA if you want to do that). I will switch sdk is all about control plane. >write here a commitment to accept patches blocking such usages if >drivers try to abuse the purpose of the subsystem. > >Jason
On Fri, Jun 07, 2024 at 05:50:41PM +0200, Jiri Pirko wrote: > >But to repeat again, fwctl is not for dataplane, it is not for > >implementing a switch SDK (go use RDMA if you want to do that). I will > > switch sdk is all about control plane. Ah, a poor term. I meant any involvement in the data flow of the device, including reaching into the so-called control plane of a switch to manipulate data flow. Jason
On Fri, 7 Jun 2024 15:34:48 +0200 Jiri Pirko wrote: > >For bigger, single purpose devices, it is just a switch, there is less > >inconvenience of using just one vendor SDK, on top of the vendor > >prescribed kernel. > > I'm aware of what you wrote and understand it. I just thought Jakub's > mixed experience was about the APIs more than the politics behind the > vendors' adoption process. Not the API / implementation, just that the adoption is limited. The benefits of using a standard Linux approach are outweighed by the large pool of talent with experience programming using the SDK of *the* vendor.
On Wed, Jun 05, 2024 at 10:59:11AM -0300, Jason Gunthorpe wrote: > On Tue, Jun 04, 2024 at 04:56:57PM -0700, Dan Williams wrote: > > * Introspection / validation: Subsystem community needs to be able to > > audit behavior after the fact. > > > > To me this means even if the kernel is letting a command through based > > on the stated Command Effect of "Configuration Change after Cold Reset" > > upstream community has a need to be able to read the vendor > > specification for that command. I.e. commands might be vendor-specific, > > but never vendor-private. I see this as similar to the requirement for > > open source userspace for sophisticated accelerators. > > I'm less hard on this. As long as reasonable open userspace exists I > think it is fine to let other stuff through too. I can appreciate the > DRM stance on this, but IMHO, there is meaningfully more value for open > source in trying to get an open Vulkan implementation vs blocking users > from reading their vendor'd diagnostic SI values. > > I don't think we should get into some kind of extremism and insist > that every single bit must be documented/standardized or Linux won't > support it. I figured it might be useful to paint what we do in DRM with a bit more nuance. In the principles, we're indeed fairly radical in what we require, but in practice we aim for a much more pragmatic approach in what we merge. There are two major axes here: 1. One is ecosystem maturity. One end is 3d, with vulkan as the clear industry standard, and an upstream full-featured userspace driver in mesa3d is the only technically reasonable choice. And all gpu vendors agree and by this year even nvidia started hiring an upstream team. But this didn't happen magically overnight, it took 1-2 decades of background discussions and tactical push and pull to get there. The other end is currently AI accelerators. 
It's a complete mess, where across the platform (client, edge, cloud), customer and vendor dimension every point has a different stack. And the problem is so obvious that everyone is working to fix this, which means currently https://xkcd.com/927/ is happening in parallel. Just to get things going we're accepting pretty much anything that's a notch above total garbage for userspace and for merging into the kernel. 2. The other part is how much it impacts applications. If you can't run the same application across different vendors, the case for an upstream stack becomes a lot weaker. At the other end is infrastructure enabling like device configuration, error handling and recovery, hw debugging and reliability/health reporting. That's a lot more vendor specific in nature and needs to be customized anyway per deployment. And only much higher in the stack, maybe in k8s, can a technically reasonable unification even happen. So again we're much more lenient about infrastructure enabling and uapi than stuff applications will use directly. Currently that's enough of a mess in drm that I feel like enforcing something like fwctl is still too much. But maybe once fwctl is established with other subsystems/devices we can start the conversations with vendors to get this going a few years down the road. Both together mean we land a lot of code that's questionable at best, clear garbage at worst. But since we've been in the merging garbage business just to get things going for decades, we've become pretty good at dealing with the kernel internal and uapi fallout, some say too good. But personally I don't think there's a path to where we are with 3d/vulkan that doesn't go through years of this kind of suck, and very much merged into upstream kind of suck. 
For all the concerns about trusting vendors/devices to not abuse very broad uapi interfaces: Modern accelerator command submission boils down to "run this context at this $addr", and the kernel never ever directly sees anything more fly by. That's the same interface you need for a no-op job as a full blown AI workload, so in theory maximal abuse potential. In practice, it doesn't seem to be an issue, at least not beyond the intentionally pragmatic choices where we merge kernel code with known sub-par/incomplete userspace. I'm not sure why, but to my knowledge all attempts to break the spirit of our userspace rules while following the letter die in vendor-internal discussions, at least for all the established upstream driver teams. And for new ones it takes years of private chats to get them going and fully established in upstream anyway. Maybe one reason we have a bit of an extremist reputation is that all the public takes are the radical principled requirements, while the actual pragmatic discussions mostly happen in private. tldr; fwctl as I understand it feels like a bridge too far for drm today, but I'd very much like someone else to make this happen so we could eventually push towards adoption too. Cheers, Sima
On Tue, Jun 11, 2024 at 05:36:17PM +0200, Daniel Vetter wrote: > reliability/health reporting. That's a lot more vendor specific in nature > and needs to be customized anyway per deployment. And only much higher in > the stack, maybe in k8s, can a technically reasonable unification even > happen. So again we're much more lenient about infrastructure enabling > and uapi than stuff applications will use directly. To be clear, this is the specific niche fwctl is for. It is not for GPU command submission or something like that, and as I said to Jiri I would agree to aggressively block such abuses. > Currently that's enough of a mess in drm that I feel like enforcing > something like fwctl is still too much. But maybe once fwctl is > established with other subsystems/devices we can start the conversations > with vendors to get this going a few years down the road. I wouldn't say enforcing, but instead of having every GPU driver build their own weird vendor'd way to access their debug/diagnostic stuff steer them into fwctl. These data center GPUs with FW at least have lots of appropriate stuff and all the vendor OOT stuff has tooling to inspect the GPUs far more than DRM has code for (i.e. rocm-smi/nvidia-smi have some features that are potentially good candidates for fwctl) > In practice, it doesn't seem to be an issue, at least not beyond the > intentionally pragmatic choices where we merge kernel code with known > sub-par/incomplete userspace. I'm not sure why, but to my knowledge all > attempts to break the spirit of our userspace rules while following the > letter die in vendor-internal discussions, at least for all the > established upstream driver teams. I think the same is broadly true of RDMA as well, except we don't bother with the kernel trying to police the command stream - direct submission from userspace. I can't say it has been much of an issue. 
> tldr; fwctl as I understand it feels like a bridge too far for drm today, > but I'd very much like someone else to make this happen so we could > eventually push towards adoption too. Hahah, okay, well, I'm pushing :) Jason
On Tue, Jun 11, 2024 at 01:17:02PM -0300, Jason Gunthorpe wrote: > On Tue, Jun 11, 2024 at 05:36:17PM +0200, Daniel Vetter wrote: > > reliability/health reporting. That's a lot more vendor specific in nature > > and needs to be customized anyway per deployment. And only much higher in > > the stack, maybe in k8s, can a technically reasonable unification even > > happen. So again we're much more lenient about infrastructure enabling > > and uapi than stuff applications will use directly. > > To be clear, this is the specific niche fwctl is for. It is not for > GPU command submission or something like that, and as I said to Jiri I > would agree to aggressively block such abuses. > > > Currently that's enough of a mess in drm that I feel like enforcing > > something like fwctl is still too much. But maybe once fwctl is > > established with other subsystems/devices we can start the conversations > > with vendors to get this going a few years down the road. > > I wouldn't say enforcing, but instead of having every GPU driver build > their own weird vendor'd way to access their debug/diagnostic stuff > steer them into fwctl. These data center GPUs with FW at least have > lots of appropriate stuff and all the vendor OOT stuff has tooling to > inspect the GPUs far more than DRM has code for (i.e. > rocm-smi/nvidia-smi have some features that are potentially good > candidates for fwctl) Yeah "enforcing" to the level we do with 3d/vulkan would be years down the road, if ever. Very unlikely imo for debug/diagnostics/tuning stuff. > > In practice, it doesn't seem to be an issue, at least not beyond the > > intentionally pragmatic choices where we merge kernel code with known > > sub-par/incomplete userspace. I'm not sure why, but to my knowledge all > > attempts to break the spirit of our userspace rules while following the > > letter die in vendor-internal discussions, at least for all the > > established upstream driver teams. 
> > I think the same is broadly true of RDMA as well, except we don't > bother with the kernel trying to police the command stream - direct > submission from userspace. I can't say it has been much of an issue. Maybe just a bit of confusion, but all modern-ish drm drivers stopped parsing the command stream a while ago. We only ever did that to fill security gaps, never to enforce any rules about what userspace is allowed to do beyond that. The rule that the open userspace needs to be complete, for some reasonably pragmatic definition of "complete", is entirely a social contract. And I'm not aware of any real issues with enforcing that beyond just trusting the established vendor teams. So yeah no real issues with uabi that allows maximal abuse because it's entirely unchecked by the kernel code. Or put differently, I think we're trying to make the same point. > > tldr; fwctl as I understand it feels like a bridge too far for drm today, > > but I'd very much like someone else to make this happen so we could > > eventually push towards adoption too. > > Hahah, okay, well, I'm pushing :) Thanks :-) -Sima