Message ID | 20180711053122.30773-2-andrew@aj.id.au (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to > provide remote management of (primarily) server platforms. BMCs are > often tightly coupled to the platform in terms of behaviour and provide > many hardware features integral to booting and running the host system. > > Some of these hardware features are simple, for example scratch > registers provided by the BMC that are exposed to both the host and the > BMC. In other cases there's a single bit switch to enable or disable > some of the provided functionality. > > The documentation defines bindings for fields in registers that do not > integrate well into other driver models yet must be described to allow > the BMC kernel to assume control of these features. So we'll get a new binding when that happens? That will break compatibility. > > Signed-off-by: Andrew Jeffery <andrew@aj.id.au> > --- > > Since RFC v1: > > * Add a commit message > * Minor changes to documented labels > > .../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++++++++++ > MAINTAINERS | 6 + > 2 files changed, 258 insertions(+) > create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > new file mode 100644 > index 000000000000..2c869fcc7ef2 > --- /dev/null > +++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > @@ -0,0 +1,252 @@ > +BMC Miscellaneous Control Interfaces > +==================================== > + > +Baseboard Management Controllers (BMCs) often have an array of hardware > +features that need to be described but are awkward to sensibly expose. > + > +This bindings document provides a generic mechanism for describing such > +features, covering read-only (RO), read-modify-write (RMW) and > +write-1-set/write-1-clear (W1SC) semantics. If we wanted a generic mechanism for single register bits/fields in DT, we'd have one already. A node per register bit doesn't scale. Maybe this should be modelled using GPIO binding? There's a line there too as whether the signals are "general purpose" or not. Rob
On Wed, 2018-07-11 at 14:04 -0600, Rob Herring wrote: > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to > > provide remote management of (primarily) server platforms. BMCs are > > often tightly coupled to the platform in terms of behaviour and provide > > many hardware features integral to booting and running the host system. > > > > Some of these hardware features are simple, for example scratch > > registers provided by the BMC that are exposed to both the host and the > > BMC. In other cases there's a single bit switch to enable or disable > > some of the provided functionality. > > > > The documentation defines bindings for fields in registers that do not > > integrate well into other driver models yet must be described to allow > > the BMC kernel to assume control of these features. > > So we'll get a new binding when that happens? That will break > compatibility. What do you mean ? The binding provides a way to describe arbitrary register fields and expose them to userspace. I'm not sure what you mean by backward compatibility. There is simply no way these things can be "abstracted" via some kind of nice layered Linux subsystem of some sort. It's just a whole bunch of knobs that control various things integral to the operation of the host system. Andrew can provide a more precise list if you need to. > > > > > Signed-off-by: Andrew Jeffery <andrew@aj.id.au> > > --- > > > > Since RFC v1: > > > > * Add a commit message > > * Minor changes to documented labels > > > > .../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++++++++++ > > MAINTAINERS | 6 + > > 2 files changed, 258 insertions(+) > > create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > > > diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > new file mode 100644 > > index 000000000000..2c869fcc7ef2 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > @@ -0,0 +1,252 @@ > > +BMC Miscellaneous Control Interfaces > > +==================================== > > + > > +Baseboard Management Controllers (BMCs) often have an array of hardware > > +features that need to be described but are awkward to sensibly expose. > > + > > +This bindings document provides a generic mechanism for describing such > > +features, covering read-only (RO), read-modify-write (RMW) and > > +write-1-set/write-1-clear (W1SC) semantics. > > If we wanted a generic mechanism for single register bits/fields in DT, > we'd have one already. A node per register bit doesn't scale. Register *field*. I think you are looking at this from the wrong angle. This isn't about exposing 236821643 SoC registers in an embedded product. This is about exposing a dozen or so tunables that don't control the SoC from the perspective of the OS running on it, but controls how various features of the SoC are exposed to the *host* system. Examples could be which of the SoC internal PCIe devices exposed to the host is enabled (the SoC can appear as a GPU, a BMC misc device or both or neither, it can have a DMA controller or not, etc...). Other examples are scratch registers that need to be set to system specific values by userspace, which the BIOS of the host will read to determine some configuration information. That sort of thing. There isn't that many, scalability isn't a problem. Both the list of registers of interest and what needs to go in them is very much system/vendor specific. This is a way for the kernel to act as a "conduit" that doesn't need to know the specifics of a given system/vendor/BIOS combination. Your response smells like yet another case of trying to apply one of those general "blanket" rules to something where it's completely unapplicable. > Maybe this should be modelled using GPIO binding? There's a line there > too as whether the signals are "general purpose" or not. GPIOs are binary values right ? These are arbitrary fields. We want arbitrary fields. This is really needed Rob, otherwise we'll NEVER have BMC support upstream. The other option will be a dozen random ad-hoc char-devs that would have to be updated every time a new bit needs to be added or changed. Ben.
Hi Rob, Thanks for the response. On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to > > provide remote management of (primarily) server platforms. BMCs are > > often tightly coupled to the platform in terms of behaviour and provide > > many hardware features integral to booting and running the host system. > > > > Some of these hardware features are simple, for example scratch > > registers provided by the BMC that are exposed to both the host and the > > BMC. In other cases there's a single bit switch to enable or disable > > some of the provided functionality. > > > > The documentation defines bindings for fields in registers that do not > > integrate well into other driver models yet must be described to allow > > the BMC kernel to assume control of these features. > > So we'll get a new binding when that happens? That will break > compatibility. Can you please expand on this? I'm not following. > > > > > Signed-off-by: Andrew Jeffery <andrew@aj.id.au> > > --- > > > > Since RFC v1: > > > > * Add a commit message > > * Minor changes to documented labels > > > > .../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++++++++++ > > MAINTAINERS | 6 + > > 2 files changed, 258 insertions(+) > > create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > > > diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > new file mode 100644 > > index 000000000000..2c869fcc7ef2 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > @@ -0,0 +1,252 @@ > > +BMC Miscellaneous Control Interfaces > > +==================================== > > + > > +Baseboard Management Controllers (BMCs) often have an array of hardware > > +features that need to be described but are awkward to sensibly expose. > > + > > +This bindings document provides a generic mechanism for describing such > > +features, covering read-only (RO), read-modify-write (RMW) and > > +write-1-set/write-1-clear (W1SC) semantics. > > If we wanted a generic mechanism for single register bits/fields in DT, > we'd have one already. I feel like this is an argument of tradition. Maybe people have been dissuaded from doing so when they don't have a reasonable use-case? I'm not saying that what I'm proposing is unquestionably reasonable, but I don't want to dismiss it out of hand. > A node per register bit doesn't scale. It isn't meant to scale in terms of a single system. Using it extensively is very likely wrong. Separately, register-bit-led does pretty much the same thing. Doesn't the scale argument apply there? Who is to stop me from attaching an insane number of LEDs to a system? Obviously if there are lots of systems using it sparingly and legitimately then maybe there's a scale issue, but isn't that just a reality of different hardware designs? Whoever is implementing support for the system is going to have to describe the hardware one way or another. > > Maybe this should be modelled using GPIO binding? There's a line there > too as whether the signals are "general purpose" or not. I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between: 0. VGA device exposed on the host PCIe bus 1. The "Graphics CRT" controller 2. VGA port A 3. VGA port B Maybe this could be modelled by pinmux, but then we still need some way to expose the mux functions to userspace for selection (userspace needs to transition arbitrarily between at least options 0 and 1 at runtime), at which point we haven't achieved much beyond adding a whole heap of infrastructure in the chain. Given 0 and 1, maybe exposing attributes in relevant drivers would be reasonable, except 0 isn't exposed on the SoC's internal bus so there is no driver on the BMC-side to do so. Taking into account 2 and 3 are also purely hardware paths further dashes the idea, as the configuration doesn't really "belong" to the Graphics CRT device more than it belongs anywhere else, except for the fact that there isn't anywhere else to expose it. Further, the BMC's kernel can't make the decision as to when to switch the mux as it knows nothing of the host's state. The BMC userspace is controlling the host's boot state and so *does* know when to flip the switch. Finally, the mux is in separate IP to the CRT or VGA blocks: It lives in the System Control Unit. My current point of view is the DAC mux field is effectively its own device, and we need to control it from userspace, so we need some way to describe it (i.e. not ignore it) in order for its capability to be exposed. I'm fully aware what I'm proposing isn't awesome as it's not providing any real abstraction, but the problem(s) at hand also seem to defy abstraction, and in order to avoid a plethora of bespoke bindings I thought it was reasonable to define something generic. All-in-all I appreciate the suggestion, but assuming you agree with my reasoning above do you have thoughts on other alternatives? Cheers, Andrew
On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote: > > Hi Rob, > > Thanks for the response. > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to > > > provide remote management of (primarily) server platforms. BMCs are > > > often tightly coupled to the platform in terms of behaviour and provide > > > many hardware features integral to booting and running the host system. > > > > > > Some of these hardware features are simple, for example scratch > > > registers provided by the BMC that are exposed to both the host and the > > > BMC. In other cases there's a single bit switch to enable or disable > > > some of the provided functionality. > > > > > > The documentation defines bindings for fields in registers that do not > > > integrate well into other driver models yet must be described to allow > > > the BMC kernel to assume control of these features. > > > > So we'll get a new binding when that happens? That will break > > compatibility. > > Can you please expand on this? I'm not following. If we have a subsystem in the future, then there would likely be an associated binding which would be different. So if you update the DT, then old kernels won't work with it. > > > Signed-off-by: Andrew Jeffery <andrew@aj.id.au> > > > --- > > > > > > Since RFC v1: > > > > > > * Add a commit message > > > * Minor changes to documented labels > > > > > > .../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++++++++++ > > > MAINTAINERS | 6 + > > > 2 files changed, 258 insertions(+) > > > create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > > > > > diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > > new file mode 100644 > > > index 000000000000..2c869fcc7ef2 > > > --- /dev/null > > > +++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt > > > @@ -0,0 +1,252 @@ > > > +BMC Miscellaneous Control Interfaces > > > +==================================== > > > + > > > +Baseboard Management Controllers (BMCs) often have an array of hardware > > > +features that need to be described but are awkward to sensibly expose. > > > + > > > +This bindings document provides a generic mechanism for describing such > > > +features, covering read-only (RO), read-modify-write (RMW) and > > > +write-1-set/write-1-clear (W1SC) semantics. > > > > If we wanted a generic mechanism for single register bits/fields in DT, > > we'd have one already. > > I feel like this is an argument of tradition. Maybe people have been dissuaded from doing so when they don't have a reasonable use-case? I'm not saying that what I'm proposing is unquestionably reasonable, but I don't want to dismiss it out of hand. One of experience. The one that stands out is clock bindings. Initially we were doing a node per clock modelling which could end up being 100s of nodes and is difficult to get right (with DT being an ABI). It comes up with system controller type blocks too that just have a bunch of random registers. Those change in every SoC and not in any controlled or ordered way that would make describing the individual sub-functions in DT worthwhile. > > > A node per register bit doesn't scale. > > It isn't meant to scale in terms of a single system. Using it extensively is very likely wrong. Separately, register-bit-led does pretty much the same thing. Doesn't the scale argument apply there? Who is to stop me from attaching an insane number of LEDs to a system? Review. If you look, register-bit-led is rarely used outside of some ARM, Ltd. boards. It's simply quite rare to have MMIO register bits that have a fixed function of LED control. > Obviously if there are lots of systems using it sparingly and legitimately then maybe there's a scale issue, but isn't that just a reality of different hardware designs? Whoever is implementing support for the system is going to have to describe the hardware one way or another. > > > > > Maybe this should be modelled using GPIO binding? There's a line there > > too as whether the signals are "general purpose" or not. > > I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between: > > 0. VGA device exposed on the host PCIe bus > 1. The "Graphics CRT" controller > 2. VGA port A > 3. VGA port B And this mux control is fixed in the SoC design? > > Maybe this could be modelled by pinmux, but then we still need some way to expose the mux functions to userspace for selection (userspace needs to transition arbitrarily between at least options 0 and 1 at runtime), at which point we haven't achieved much beyond adding a whole heap of infrastructure in the chain. > > Given 0 and 1, maybe exposing attributes in relevant drivers would be reasonable, except 0 isn't exposed on the SoC's internal bus so there is no driver on the BMC-side to do so. Taking into account 2 and 3 are also purely hardware paths further dashes the idea, as the configuration doesn't really "belong" to the Graphics CRT device more than it belongs anywhere else, except for the fact that there isn't anywhere else to expose it. > > Further, the BMC's kernel can't make the decision as to when to switch the mux as it knows nothing of the host's state. The BMC userspace is controlling the host's boot state and so *does* know when to flip the switch. Finally, the mux is in separate IP to the CRT or VGA blocks: It lives in the System Control Unit. > > My current point of view is the DAC mux field is effectively its own device, and we need to control it from userspace, so we need some way to describe it (i.e. not ignore it) in order for its capability to be exposed. > > I'm fully aware what I'm proposing isn't awesome as it's not providing any real abstraction, but the problem(s) at hand also seem to defy abstraction, and in order to avoid a plethora of bespoke bindings I thought it was reasonable to define something generic. > > All-in-all I appreciate the suggestion, but assuming you agree with my reasoning above do you have thoughts on other alternatives? Seems the controls are more fixed than I first thought. All the data you have here could simply be within a driver. Help me understand what functions are fixed (in the SoC) and which ones vary by board. Only what's changing per board really needs to go into DT. Rob
On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote: > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote: > > > > Hi Rob, > > > > Thanks for the response. > > > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to > > > > provide remote management of (primarily) server platforms. BMCs are > > > > often tightly coupled to the platform in terms of behaviour and provide > > > > many hardware features integral to booting and running the host system. > > > > > > > > Some of these hardware features are simple, for example scratch > > > > registers provided by the BMC that are exposed to both the host and the > > > > BMC. In other cases there's a single bit switch to enable or disable > > > > some of the provided functionality. > > > > > > > > The documentation defines bindings for fields in registers that do not > > > > integrate well into other driver models yet must be described to allow > > > > the BMC kernel to assume control of these features. > > > > > > So we'll get a new binding when that happens? That will break > > > compatibility. > > > > Can you please expand on this? I'm not following. > > If we have a subsystem in the future, then there would likely be an > associated binding which would be different. So if you update the DT, > then old kernels won't work with it. What kind of "subsystem" ? There is almost no way there could be one for that sort of BMC tunables. We've look at several BMC chips out there and requirements from several vendors, BIOS and system manufacturers and it's all over the place. > > I feel like this is an argument of tradition. Maybe people have > > been dissuaded from doing so when they don't have a reasonable use- > > case? I'm not saying that what I'm proposing is unquestionably > > reasonable, but I don't want to dismiss it out of hand. > > One of experience. The one that stands out is clock bindings. > Initially we were doing a node per clock modelling which could end up > being 100s of nodes and is difficult to get right (with DT being an > ABI). > > It comes up with system controller type blocks too that just have a > bunch of random registers. Those change in every SoC and not in any > controlled or ordered way that would make describing the individual > sub-functions in DT worthwhile. So what's the alternative ? Because without something like what we propose, what's going to happen is /dev/mem ... that's what people do today. > > > A node per register bit doesn't scale. > > > > It isn't meant to scale in terms of a single system. Using it > > extensively is very likely wrong. Separately, register-bit-led does > > pretty much the same thing. Doesn't the scale argument apply there? > > Who is to stop me from attaching an insane number of LEDs to a > > system? > > Review. > > If you look, register-bit-led is rarely used outside of some ARM, Ltd. > boards. It's simply quite rare to have MMIO register bits that have a > fixed function of LED control. Well, same here, we hope to review what goes upstream to make it reasonable. Otherwise it doens't matter. If a random vendor, let's say IBM, chose to chip a system where they put an insane amount of cruft in there, it will only affect those systems's BMC and the userspace stack on it. Thankfully that stack is OpenBMC and IBM is aiming at having their device-tree's upstream, thus reviewed, thus it won't happen. *Anything* can be abused. The point here is that we have a number, thankfully rather small, maybe a dozen or two, of tunables that are quite specific to a combination (system vendor, bmc vendor, system model) which control a few HW features that essentially do *NOT* fit in a subsystem. For everything that does, we have created proper drivers (and are doing more). > > Obviously if there are lots of systems using it sparingly and > > legitimately then maybe there's a scale issue, but isn't that just > > a reality of different hardware designs? Whoever is implementing > > support for the system is going to have to describe the hardware > > one way or another. > > > > > > > > Maybe this should be modelled using GPIO binding? There's a line there > > > too as whether the signals are "general purpose" or not. > > > > I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between: > > > > 0. VGA device exposed on the host PCIe bus > > 1. The "Graphics CRT" controller > > 2. VGA port A > > 3. VGA port B > > And this mux control is fixed in the SoC design? This specific family of SoC (Aspeed) support those 4 configurations. How they need to be configured at runtime depends on the combination of system vendor and system model, along with in some cases the need to switch it at runtime. This is just one example. Another one is the handful of scratch registers that need to be populated with the "right" values for the host system BIOS, VGA BIOS and VGA driver. (The host bits access them via LPC IO space). The host system BIOS will read some basic config info there before its IPMI stack is up (and some BIOSes already rely on that). The VGA BIOS will get some strapping info and panel info. The VGA driver (which is already upstream, has been for a long time) will look for other things in some of these guys, such as connector configuration. Andrew, if it helps, we could put together a list of what we typically need on an OpenPower system today. That would give people like Rob a better idea of what this is all about. > > > > Maybe this could be modelled by pinmux, but then we still need some > > way to expose the mux functions to userspace for selection > > (userspace needs to transition arbitrarily between at least options > > 0 and 1 at runtime), at which point we haven't achieved much beyond > > adding a whole heap of infrastructure in the chain. > > > > Given 0 and 1, maybe exposing attributes in relevant drivers would > > be reasonable, except 0 isn't exposed on the SoC's internal bus so > > there is no driver on the BMC-side to do so. Taking into account 2 > > and 3 are also purely hardware paths further dashes the idea, as > > the configuration doesn't really "belong" to the Graphics CRT > > device more than it belongs anywhere else, except for the fact that > > there isn't anywhere else to expose it. > > > > Further, the BMC's kernel can't make the decision as to when to > > switch the mux as it knows nothing of the host's state. The BMC > > userspace is controlling the host's boot state and so *does* know > > when to flip the switch. Finally, the mux is in separate IP to the > > CRT or VGA blocks: It lives in the System Control Unit. > > > > My current point of view is the DAC mux field is effectively its > > own device, and we need to control it from userspace, so we need > > some way to describe it (i.e. not ignore it) in order for its > > capability to be exposed. > > > > I'm fully aware what I'm proposing isn't awesome as it's not > > providing any real abstraction, but the problem(s) at hand also > > seem to defy abstraction, and in order to avoid a plethora of > > bespoke bindings I thought it was reasonable to define something > > generic. > > > > All-in-all I appreciate the suggestion, but assuming you agree with > > my reasoning above do you have thoughts on other alternatives? > > Seems the controls are more fixed than I first thought. All the data > you have here could simply be within a driver. Help me understand what > functions are fixed (in the SoC) and which ones vary by board. Only > what's changing per board really needs to go into DT. Most of these things is specific to a given board or may even need to be changed at runtime. For example the VGA mux is system specific, *and* will change at runtime on some systems. For example, on IBM systems, the BMC will route its internal CRT display controller to the VGA port in order to display early boot progress information when the host hasn't initialized its graphics driver yet, and route the host VGA to the VGA port when the host has. (To clarify, the BMC has 2 display controllers: one for use by the SoC ARM itself, and one exposed to the host via PCIe, this routes which one gets to output to the VGA port). The scratch registers are similar. Their content tend to be specific to a specific BIOS vendor/system manufacturer. It might need to change from boot to boot based configuration changes the user might do via the BMC IPMI or web interface. Another example is that the SoC can expose a couple of PCIe devices to the host, a given vendor will want to control whether to do that and which ones to expose. This can be fixed or tunable. Some vendors want the user to be able to control (via IPMI or web UI or some other mechanism) whether the SoC internal VGA shows up or not depending on whether they chose to use a separate discrete GPU, as some OSes get confused when in the presence of multiple different graphic adapters. Talking of which: Andrew, did you put "default values" in your binding ? That would be a nice way to deal with system specific immutables, so that userspace doesn't even have to care. So to clarify once and for all, *anything* that fits in a subsystem, we're putting in one. All the random board control is all GPIOs and that's fine as well. For some things that require a bit of fiddly usage like the "MBOX" logic between BIOS and BMC we are also doing a dedicated driver. But there's a few stragglers here, and they tend to be so board/system/BIOS specific that it's not sustainable to create/change random drivers all the time just for exposing those few tunables. Cheers, Ben.
Hi Rob, Ben, I've replied to you both inline below, hopefully it's clear enough from the context. On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote: > On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote: > > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote: > > > > > > Hi Rob, > > > > > > Thanks for the response. > > > > > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: > > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to > > > > > provide remote management of (primarily) server platforms. BMCs are > > > > > often tightly coupled to the platform in terms of behaviour and provide > > > > > many hardware features integral to booting and running the host system. > > > > > > > > > > Some of these hardware features are simple, for example scratch > > > > > registers provided by the BMC that are exposed to both the host and the > > > > > BMC. In other cases there's a single bit switch to enable or disable > > > > > some of the provided functionality. > > > > > > > > > > The documentation defines bindings for fields in registers that do not > > > > > integrate well into other driver models yet must be described to allow > > > > > the BMC kernel to assume control of these features. > > > > > > > > So we'll get a new binding when that happens? That will break > > > > compatibility. > > > > > > Can you please expand on this? I'm not following. > > > > If we have a subsystem in the future, then there would likely be an > > associated binding which would be different. So if you update the DT, > > then old kernels won't work with it. > > What kind of "subsystem" ? There is almost no way there could be one > for that sort of BMC tunables. We've look at several BMC chips out > there and requirements from several vendors, BIOS and system > manufacturers and it's all over the place. Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings. > > > > I feel like this is an argument of tradition. Maybe people have > > > been dissuaded from doing so when they don't have a reasonable use- > > > case? I'm not saying that what I'm proposing is unquestionably > > > reasonable, but I don't want to dismiss it out of hand. > > ... > > > > It comes up with system controller type blocks too that just have a > > bunch of random registers. This matches the situation at hand. > > Those change in every SoC and not in any > > controlled or ordered way that would make describing the individual > > sub-functions in DT worthwhile. "Not worthwhile" is what I'm pushing back against for our use cases. I think they are narrow and limited enough to make it worthwhile. Obviously we want to avoid describing these things *badly* - you mentioned the clock bindings - so I'm happy to hash out what the right representation should be. But I struggle to think the solution is not describing some of our hardware features at all. > > So what's the alternative ? Because without something like what we > propose, what's going to happen is /dev/mem ... that's what people do > today. Yep. And I've outlined in the cover letter what I think are the advantages of what I'm proposing over /dev/mem. It's not an incredible gain, but has several of nice-to-have properties. > > > > > A node per register bit doesn't scale. > > > > > > It isn't meant to scale in terms of a single system. Using it > > > extensively is very likely wrong. Separately, register-bit-led does > > > pretty much the same thing. Doesn't the scale argument apply there? > > > Who is to stop me from attaching an insane number of LEDs to a > > > system? > > > > Review. > > > > If you look, register-bit-led is rarely used outside of some ARM, Ltd. > > boards. It's simply quite rare to have MMIO register bits that have a > > fixed function of LED control. > > Well, same here, we hope to review what goes upstream to make it > reasonable. Otherwise it doens't matter. If a random vendor, let's say > IBM, chose to chip a system where they put an insane amount of cruft in > there, it will only affect those systems's BMC and the userspace stack > on it. > > Thankfully that stack is OpenBMC and IBM is aiming at having their > device-tree's upstream, thus reviewed, thus it won't happen. > > *Anything* can be abused. The point here is that we have a number, > thankfully rather small, maybe a dozen or two, of tunables that are > quite specific to a combination (system vendor, bmc vendor, system > model) which control a few HW features that essentially do *NOT* fit in > a subsystem. Exactly. I tried to head off the abuse vector by requiring that uses be listed in the bindings document, and thus enforce some level of review. It might not be the most effective approach at the end of the day, but at least it is something. > > For everything that does, we have created proper drivers (and are doing > more). > > > > > Obviously if there are lots of systems using it sparingly and > > > legitimately then maybe there's a scale issue, but isn't that just > > > a reality of different hardware designs? Whoever is implementing > > > support for the system is going to have to describe the hardware > > > one way or another. > > > > > > > > > > > Maybe this should be modelled using GPIO binding? There's a line there > > > > too as whether the signals are "general purpose" or not. > > > > > > I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between: > > > > > > 0. VGA device exposed on the host PCIe bus > > > 1. The "Graphics CRT" controller > > > 2. VGA port A > > > 3. VGA port B > > > > And this mux control is fixed in the SoC design? > > This specific family of SoC (Aspeed) support those 4 configurations. > How they need to be configured at runtime depends on the combination of > system vendor and system model, along with in some cases the need to > switch it at runtime. > > This is just one example. Another one is the handful of scratch > registers that need to be populated with the "right" values for the > host system BIOS, VGA BIOS and VGA driver. (The host bits access them > via LPC IO space). > > The host system BIOS will read some basic config info there before its > IPMI stack is up (and some BIOSes already rely on that). The VGA BIOS > will get some strapping info and panel info. The VGA driver (which is > already upstream, has been for a long time) will look for other things > in some of these guys, such as connector configuration. > > Andrew, if it helps, we could put together a list of what we typically > need on an OpenPower system today. That would give people like Rob a > better idea of what this is all about. It's primarily what I've outlined at the bottom of the bindings document, though the use cases aren't provided there as they are a bit out-of-scope. So the SuperIO and VGA scratch registers, plus the DAC mux. A bunch of tunable things. OpenPOWER platforms make use of the SuperIO scratch registers to convey configuration information from the BMC to the host. Information provided includes low-level control of the host firmware initialisation process, UART and logging configuration, and the strategy for handling errors (crash vs log). This is all an "arbitrary" contract between the BMC userspace and the host firmware, i.e. different platforms/firmware could lay out the same information in different ways or communicate entirely different information altogether. The BMC kernel shouldn't care about any of it, other than provide sensible access to the hardware. Again on OpenPOWER systems using the ASPEED BMC SoCs running OpenBMC, the BMC uses the VGA scratch registers to sense initialisation of the host graphics driver in the host's boot process. When the BMC userspace detects the host VGA driver is up we switch the DAC mux from the BMC CRT device to the host VGA device so that the host is now driving the VGA output. Non-OpenPOWER OpenBMC configurations may do something entirely different, or not do anything at all with the hardware, so as above, it's not really the job of the BMC kernel to be involved in any of this, other than to provide sensible access to userspace. There are a number of other switches that control the availability of ASPEED BMC hardware features to the host system that also don't fit any particular subsystem and so will use these bindings, but our (OpenPOWER/OpenBMC) current uses are what's described above. Dell also suggested they had some use-cases that aligned with the intent of the bindings, but I don't know what they had in mind. Eugene (on Cc) can elaborate. > > > > > > > Maybe this could be modelled by pinmux, but then we still need some > > > way to expose the mux functions to userspace for selection > > > (userspace needs to transition arbitrarily between at least options > > > 0 and 1 at runtime), at which point we haven't achieved much beyond > > > adding a whole heap of infrastructure in the chain. > > > > > > Given 0 and 1, maybe exposing attributes in relevant drivers would > > > be reasonable, except 0 isn't exposed on the SoC's internal bus so > > > there is no driver on the BMC-side to do so. Taking into account 2 > > > and 3 are also purely hardware paths further dashes the idea, as > > > the configuration doesn't really "belong" to the Graphics CRT > > > device more than it belongs anywhere else, except for the fact that > > > there isn't anywhere else to expose it. > > > > > > Further, the BMC's kernel can't make the decision as to when to > > > switch the mux as it knows nothing of the host's state. The BMC > > > userspace is controlling the host's boot state and so *does* know > > > when to flip the switch. Finally, the mux is in separate IP to the > > > CRT or VGA blocks: It lives in the System Control Unit. > > > > > > My current point of view is the DAC mux field is effectively its > > > own device, and we need to control it from userspace, so we need > > > some way to describe it (i.e. not ignore it) in order for its > > > capability to be exposed. > > > > > > I'm fully aware what I'm proposing isn't awesome as it's not > > > providing any real abstraction, but the problem(s) at hand also > > > seem to defy abstraction, and in order to avoid a plethora of > > > bespoke bindings I thought it was reasonable to define something > > > generic. > > > > > > All-in-all I appreciate the suggestion, but assuming you agree with > > > my reasoning above do you have thoughts on other alternatives? > > > > Seems the controls are more fixed than I first thought. All the data > > you have here could simply be within a driver. Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases. > > Help me understand what > > functions are fixed (in the SoC) and which ones vary by board. Only > > what's changing per board really needs to go into DT. I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately? And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right? It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it? The driver that implements the behaviour of the bindings described here turns out quite focused (even if the first attempt was a bit of a basket case, hopefully the second is better (sorry Greg)). > > Most of these things is specific to a given board or may even need to > be changed at runtime. *snip*... > > Talking of which: Andrew, did you put "default values" in your binding > ? That would be a nice way to deal with system specific immutables, so > that userspace doesn't even have to care. Yes, I described a `default-value`property for RW fields, and `default-set` and `default-clear`properties for write-1-set/write-1-clear fields for exactly this purpose. > > So to clarify once and for all, *anything* that fits in a subsystem, > we're putting in one. All the random board control is all GPIOs and > that's fine as well. For some things that require a bit of fiddly usage > like the "MBOX" logic between BIOS and BMC we are also doing a > dedicated driver. (As an aside, the "MBOX" functionality is slightly different from the scratch registers in that it has configurable interrupts each way (BMC-to-Host and Host-to-BMC) - as such it can be used to implement a dynamic protocol and so deserves its own driver. This is in contrast to the dumb scratch registers we're describing with these bindings which have no such interrupts.) > > But there's a few stragglers here, and they tend to be so > board/system/BIOS specific that it's not sustainable to create/change > random drivers all the time just for exposing those few tunables. > Yes, this is my feeling too. Cheers, Andrew
Andrew, Ben, first of all let me thank you for bringing in this set of patches. From the discussion it looks to me like Rob is not familiar with specifics of BMC-managed servers and tries to apply to them the rules that have proven to be good for workstations and laptops. As someone using /dev/mem these days to configure those registers in BMCs, I'm all for this patch set as it will make BMC configuration less obscure. Writing 1 or 0 to a named node is way clearer than writing some magic value at some magic offset in /dev/mem. I like the idea of having it all configurable via DT as it allows for only having exported the nodes that are actually needed, thus reducing, as you have said, the foot-gun. So far I do not have any objections or constructive comments to the architecture of the proposed patches. So I'm writing this to support your position in this discussion and to let Rob and other reviewers know that this feature is actually needed. With best regards, Alexander Amelkin 13.07.2018 09:31, Andrew Jeffery wrote: > Hi Rob, Ben, > > I've replied to you both inline below, hopefully it's clear enough from the context. > > On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote: >> On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote: >>> On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote: >>>> Hi Rob, >>>> >>>> Thanks for the response. >>>> >>>> On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: >>>>> On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: >>>>>> Baseboard Management Controllers (BMCs) are embedded SoCs that exist to >>>>>> provide remote management of (primarily) server platforms. BMCs are >>>>>> often tightly coupled to the platform in terms of behaviour and provide >>>>>> many hardware features integral to booting and running the host system. >>>>>> >>>>>> Some of these hardware features are simple, for example scratch >>>>>> registers provided by the BMC that are exposed to both the host and the >>>>>> BMC. In other cases there's a single bit switch to enable or disable >>>>>> some of the provided functionality. >>>>>> >>>>>> The documentation defines bindings for fields in registers that do not >>>>>> integrate well into other driver models yet must be described to allow >>>>>> the BMC kernel to assume control of these features. >>>>> So we'll get a new binding when that happens? That will break >>>>> compatibility. >>>> Can you please expand on this? I'm not following. >>> If we have a subsystem in the future, then there would likely be an >>> associated binding which would be different. So if you update the DT, >>> then old kernels won't work with it. >> What kind of "subsystem" ? There is almost no way there could be one >> for that sort of BMC tunables. We've look at several BMC chips out >> there and requirements from several vendors, BIOS and system >> manufacturers and it's all over the place. > Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings. > >>>> I feel like this is an argument of tradition. Maybe people have >>>> been dissuaded from doing so when they don't have a reasonable use- >>>> case? I'm not saying that what I'm proposing is unquestionably >>>> reasonable, but I don't want to dismiss it out of hand. > ... >>> It comes up with system controller type blocks too that just have a >>> bunch of random registers. > This matches the situation at hand. > >>> Those change in every SoC and not in any >>> controlled or ordered way that would make describing the individual >>> sub-functions in DT worthwhile. > "Not worthwhile" is what I'm pushing back against for our use cases. I think they are narrow and limited enough to make it worthwhile. > > Obviously we want to avoid describing these things *badly* - you mentioned the clock bindings - so I'm happy to hash out what the right representation should be. But I struggle to think the solution is not describing some of our hardware features at all. > >> So what's the alternative ? Because without something like what we >> propose, what's going to happen is /dev/mem ... that's what people do >> today. > Yep. And I've outlined in the cover letter what I think are the advantages of what I'm proposing over /dev/mem. It's not an incredible gain, but has several of nice-to-have properties. > >>>>> A node per register bit doesn't scale. >>>> It isn't meant to scale in terms of a single system. Using it >>>> extensively is very likely wrong. Separately, register-bit-led does >>>> pretty much the same thing. Doesn't the scale argument apply there? >>>> Who is to stop me from attaching an insane number of LEDs to a >>>> system? >>> Review. >>> >>> If you look, register-bit-led is rarely used outside of some ARM, Ltd. >>> boards. It's simply quite rare to have MMIO register bits that have a >>> fixed function of LED control. >> Well, same here, we hope to review what goes upstream to make it >> reasonable. Otherwise it doens't matter. If a random vendor, let's say >> IBM, chose to chip a system where they put an insane amount of cruft in >> there, it will only affect those systems's BMC and the userspace stack >> on it. >> >> Thankfully that stack is OpenBMC and IBM is aiming at having their >> device-tree's upstream, thus reviewed, thus it won't happen. >> >> *Anything* can be abused. The point here is that we have a number, >> thankfully rather small, maybe a dozen or two, of tunables that are >> quite specific to a combination (system vendor, bmc vendor, system >> model) which control a few HW features that essentially do *NOT* fit in >> a subsystem. > Exactly. I tried to head off the abuse vector by requiring that uses be listed in the bindings document, and thus enforce some level of review. It might not be the most effective approach at the end of the day, but at least it is something. > >> For everything that does, we have created proper drivers (and are doing >> more). >> >> >>>> Obviously if there are lots of systems using it sparingly and >>>> legitimately then maybe there's a scale issue, but isn't that just >>>> a reality of different hardware designs? Whoever is implementing >>>> support for the system is going to have to describe the hardware >>>> one way or another. >>>> >>>>> Maybe this should be modelled using GPIO binding? There's a line there >>>>> too as whether the signals are "general purpose" or not. >>>> I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between: >>>> >>>> 0. VGA device exposed on the host PCIe bus >>>> 1. The "Graphics CRT" controller >>>> 2. VGA port A >>>> 3. VGA port B >>> And this mux control is fixed in the SoC design? >> This specific family of SoC (Aspeed) support those 4 configurations. >> How they need to be configured at runtime depends on the combination of >> system vendor and system model, along with in some cases the need to >> switch it at runtime. >> >> This is just one example. Another one is the handful of scratch >> registers that need to be populated with the "right" values for the >> host system BIOS, VGA BIOS and VGA driver. (The host bits access them >> via LPC IO space). >> >> The host system BIOS will read some basic config info there before its >> IPMI stack is up (and some BIOSes already rely on that). The VGA BIOS >> will get some strapping info and panel info. The VGA driver (which is >> already upstream, has been for a long time) will look for other things >> in some of these guys, such as connector configuration. >> >> Andrew, if it helps, we could put together a list of what we typically >> need on an OpenPower system today. That would give people like Rob a >> better idea of what this is all about. > It's primarily what I've outlined at the bottom of the bindings document, though the use cases aren't provided there as they are a bit out-of-scope. So the SuperIO and VGA scratch registers, plus the DAC mux. A bunch of tunable things. > > OpenPOWER platforms make use of the SuperIO scratch registers to convey configuration information from the BMC to the host. Information provided includes low-level control of the host firmware initialisation process, UART and logging configuration, and the strategy for handling errors (crash vs log). This is all an "arbitrary" contract between the BMC userspace and the host firmware, i.e. different platforms/firmware could lay out the same information in different ways or communicate entirely different information altogether. The BMC kernel shouldn't care about any of it, other than provide sensible access to the hardware. > > Again on OpenPOWER systems using the ASPEED BMC SoCs running OpenBMC, the BMC uses the VGA scratch registers to sense initialisation of the host graphics driver in the host's boot process. When the BMC userspace detects the host VGA driver is up we switch the DAC mux from the BMC CRT device to the host VGA device so that the host is now driving the VGA output. Non-OpenPOWER OpenBMC configurations may do something entirely different, or not do anything at all with the hardware, so as above, it's not really the job of the BMC kernel to be involved in any of this, other than to provide sensible access to userspace. > > There are a number of other switches that control the availability of ASPEED BMC hardware features to the host system that also don't fit any particular subsystem and so will use these bindings, but our (OpenPOWER/OpenBMC) current uses are what's described above. > > Dell also suggested they had some use-cases that aligned with the intent of the bindings, but I don't know what they had in mind. Eugene (on Cc) can elaborate. > >>>> Maybe this could be modelled by pinmux, but then we still need some >>>> way to expose the mux functions to userspace for selection >>>> (userspace needs to transition arbitrarily between at least options >>>> 0 and 1 at runtime), at which point we haven't achieved much beyond >>>> adding a whole heap of infrastructure in the chain. >>>> >>>> Given 0 and 1, maybe exposing attributes in relevant drivers would >>>> be reasonable, except 0 isn't exposed on the SoC's internal bus so >>>> there is no driver on the BMC-side to do so. Taking into account 2 >>>> and 3 are also purely hardware paths further dashes the idea, as >>>> the configuration doesn't really "belong" to the Graphics CRT >>>> device more than it belongs anywhere else, except for the fact that >>>> there isn't anywhere else to expose it. >>>> >>>> Further, the BMC's kernel can't make the decision as to when to >>>> switch the mux as it knows nothing of the host's state. The BMC >>>> userspace is controlling the host's boot state and so *does* know >>>> when to flip the switch. Finally, the mux is in separate IP to the >>>> CRT or VGA blocks: It lives in the System Control Unit. >>>> >>>> My current point of view is the DAC mux field is effectively its >>>> own device, and we need to control it from userspace, so we need >>>> some way to describe it (i.e. not ignore it) in order for its >>>> capability to be exposed. >>>> >>>> I'm fully aware what I'm proposing isn't awesome as it's not >>>> providing any real abstraction, but the problem(s) at hand also >>>> seem to defy abstraction, and in order to avoid a plethora of >>>> bespoke bindings I thought it was reasonable to define something >>>> generic. >>>> >>>> All-in-all I appreciate the suggestion, but assuming you agree with >>>> my reasoning above do you have thoughts on other alternatives? >>> Seems the controls are more fixed than I first thought. All the data >>> you have here could simply be within a driver. > Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases. > >>> Help me understand what >>> functions are fixed (in the SoC) and which ones vary by board. Only >>> what's changing per board really needs to go into DT. > I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately? And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right? > > It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it? The driver that implements the behaviour of the bindings described here turns out quite focused (even if the first attempt was a bit of a basket case, hopefully the second is better (sorry Greg)). > >> Most of these things is specific to a given board or may even need to >> be changed at runtime. > *snip*... > >> Talking of which: Andrew, did you put "default values" in your binding >> ? That would be a nice way to deal with system specific immutables, so >> that userspace doesn't even have to care. > Yes, I described a `default-value`property for RW fields, and `default-set` and `default-clear`properties for write-1-set/write-1-clear fields for exactly this purpose. > >> So to clarify once and for all, *anything* that fits in a subsystem, >> we're putting in one. All the random board control is all GPIOs and >> that's fine as well. For some things that require a bit of fiddly usage >> like the "MBOX" logic between BIOS and BMC we are also doing a >> dedicated driver. > (As an aside, the "MBOX" functionality is slightly different from the scratch registers in that it has configurable interrupts each way (BMC-to-Host and Host-to-BMC) - as such it can be used to implement a dynamic protocol and so deserves its own driver. This is in contrast to the dumb scratch registers we're describing with these bindings which have no such interrupts.) > >> But there's a few stragglers here, and they tend to be so >> board/system/BIOS specific that it's not sustainable to create/change >> random drivers all the time just for exposing those few tunables. >> > Yes, this is my feeling too. > > Cheers, > > Andrew
Dell - Internal Use - Confidential +1 from someone using Nuvoton's BMC SoC -----Original Message----- From: Alexander Amelkin [mailto:a.amelkin@yadro.com] Sent: Friday, July 13, 2018 10:14 AM To: Andrew Jeffery; Benjamin Herrenschmidt; Rob Herring Cc: Mark Rutland; devicetree@vger.kernel.org; Greg Kroah-Hartman; Cho, Eugene; linux-kernel@vger.kernel.org; Joel Stanley; stewart@linux.ibm.com; OpenBMC Maillist; moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE Subject: Re: [RFC PATCH v2 1/4] dt-bindings: misc: Add bindings for misc. BMC control fields Andrew, Ben, first of all let me thank you for bringing in this set of patches. From the discussion it looks to me like Rob is not familiar with specifics of BMC-managed servers and tries to apply to them the rules that have proven to be good for workstations and laptops. As someone using /dev/mem these days to configure those registers in BMCs, I'm all for this patch set as it will make BMC configuration less obscure. Writing 1 or 0 to a named node is way clearer than writing some magic value at some magic offset in /dev/mem. I like the idea of having it all configurable via DT as it allows for only having exported the nodes that are actually needed, thus reducing, as you have said, the foot-gun. So far I do not have any objections or constructive comments to the architecture of the proposed patches. So I'm writing this to support your position in this discussion and to let Rob and other reviewers know that this feature is actually needed. With best regards, Alexander Amelkin 13.07.2018 09:31, Andrew Jeffery wrote: > Hi Rob, Ben, > > I've replied to you both inline below, hopefully it's clear enough from the context. > > On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote: >> On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote: >>> On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote: >>>> Hi Rob, >>>> >>>> Thanks for the response. >>>> >>>> On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: >>>>> On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: >>>>>> Baseboard Management Controllers (BMCs) are embedded SoCs that >>>>>> exist to provide remote management of (primarily) server >>>>>> platforms. BMCs are often tightly coupled to the platform in >>>>>> terms of behaviour and provide many hardware features integral to booting and running the host system. >>>>>> >>>>>> Some of these hardware features are simple, for example scratch >>>>>> registers provided by the BMC that are exposed to both the host >>>>>> and the BMC. In other cases there's a single bit switch to enable >>>>>> or disable some of the provided functionality. >>>>>> >>>>>> The documentation defines bindings for fields in registers that >>>>>> do not integrate well into other driver models yet must be >>>>>> described to allow the BMC kernel to assume control of these features. >>>>> So we'll get a new binding when that happens? That will break >>>>> compatibility. >>>> Can you please expand on this? I'm not following. >>> If we have a subsystem in the future, then there would likely be an >>> associated binding which would be different. So if you update the >>> DT, then old kernels won't work with it. >> What kind of "subsystem" ? There is almost no way there could be one >> for that sort of BMC tunables. We've look at several BMC chips out >> there and requirements from several vendors, BIOS and system >> manufacturers and it's all over the place. > Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings. > >>>> I feel like this is an argument of tradition. Maybe people have >>>> been dissuaded from doing so when they don't have a reasonable use- >>>> case? I'm not saying that what I'm proposing is unquestionably >>>> reasonable, but I don't want to dismiss it out of hand. > ... >>> It comes up with system controller type blocks too that just have a >>> bunch of random registers. > This matches the situation at hand. > >>> Those change in every SoC and not in any controlled or ordered way >>> that would make describing the individual sub-functions in DT >>> worthwhile. > "Not worthwhile" is what I'm pushing back against for our use cases. I think they are narrow and limited enough to make it worthwhile. > > Obviously we want to avoid describing these things *badly* - you mentioned the clock bindings - so I'm happy to hash out what the right representation should be. But I struggle to think the solution is not describing some of our hardware features at all. > >> So what's the alternative ? Because without something like what we >> propose, what's going to happen is /dev/mem ... that's what people do >> today. > Yep. And I've outlined in the cover letter what I think are the advantages of what I'm proposing over /dev/mem. It's not an incredible gain, but has several of nice-to-have properties. > >>>>> A node per register bit doesn't scale. >>>> It isn't meant to scale in terms of a single system. Using it >>>> extensively is very likely wrong. Separately, register-bit-led does >>>> pretty much the same thing. Doesn't the scale argument apply there? >>>> Who is to stop me from attaching an insane number of LEDs to a >>>> system? >>> Review. >>> >>> If you look, register-bit-led is rarely used outside of some ARM, Ltd. >>> boards. It's simply quite rare to have MMIO register bits that have >>> a fixed function of LED control. >> Well, same here, we hope to review what goes upstream to make it >> reasonable. Otherwise it doens't matter. If a random vendor, let's >> say IBM, chose to chip a system where they put an insane amount of >> cruft in there, it will only affect those systems's BMC and the >> userspace stack on it. >> >> Thankfully that stack is OpenBMC and IBM is aiming at having their >> device-tree's upstream, thus reviewed, thus it won't happen. >> >> *Anything* can be abused. The point here is that we have a number, >> thankfully rather small, maybe a dozen or two, of tunables that are >> quite specific to a combination (system vendor, bmc vendor, system >> model) which control a few HW features that essentially do *NOT* fit >> in a subsystem. > Exactly. I tried to head off the abuse vector by requiring that uses be listed in the bindings document, and thus enforce some level of review. It might not be the most effective approach at the end of the day, but at least it is something. > >> For everything that does, we have created proper drivers (and are >> doing more). >> >> >>>> Obviously if there are lots of systems using it sparingly and >>>> legitimately then maybe there's a scale issue, but isn't that just >>>> a reality of different hardware designs? Whoever is implementing >>>> support for the system is going to have to describe the hardware >>>> one way or another. >>>> >>>>> Maybe this should be modelled using GPIO binding? There's a line >>>>> there too as whether the signals are "general purpose" or not. >>>> I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between: >>>> >>>> 0. VGA device exposed on the host PCIe bus 1. The "Graphics CRT" >>>> controller 2. VGA port A 3. VGA port B >>> And this mux control is fixed in the SoC design? >> This specific family of SoC (Aspeed) support those 4 configurations. >> How they need to be configured at runtime depends on the combination >> of system vendor and system model, along with in some cases the need >> to switch it at runtime. >> >> This is just one example. Another one is the handful of scratch >> registers that need to be populated with the "right" values for the >> host system BIOS, VGA BIOS and VGA driver. (The host bits access them >> via LPC IO space). >> >> The host system BIOS will read some basic config info there before >> its IPMI stack is up (and some BIOSes already rely on that). The VGA >> BIOS will get some strapping info and panel info. The VGA driver >> (which is already upstream, has been for a long time) will look for >> other things in some of these guys, such as connector configuration. >> >> Andrew, if it helps, we could put together a list of what we >> typically need on an OpenPower system today. That would give people >> like Rob a better idea of what this is all about. > It's primarily what I've outlined at the bottom of the bindings document, though the use cases aren't provided there as they are a bit out-of-scope. So the SuperIO and VGA scratch registers, plus the DAC mux. A bunch of tunable things. > > OpenPOWER platforms make use of the SuperIO scratch registers to convey configuration information from the BMC to the host. Information provided includes low-level control of the host firmware initialisation process, UART and logging configuration, and the strategy for handling errors (crash vs log). This is all an "arbitrary" contract between the BMC userspace and the host firmware, i.e. different platforms/firmware could lay out the same information in different ways or communicate entirely different information altogether. The BMC kernel shouldn't care about any of it, other than provide sensible access to the hardware. > > Again on OpenPOWER systems using the ASPEED BMC SoCs running OpenBMC, the BMC uses the VGA scratch registers to sense initialisation of the host graphics driver in the host's boot process. When the BMC userspace detects the host VGA driver is up we switch the DAC mux from the BMC CRT device to the host VGA device so that the host is now driving the VGA output. Non-OpenPOWER OpenBMC configurations may do something entirely different, or not do anything at all with the hardware, so as above, it's not really the job of the BMC kernel to be involved in any of this, other than to provide sensible access to userspace. > > There are a number of other switches that control the availability of ASPEED BMC hardware features to the host system that also don't fit any particular subsystem and so will use these bindings, but our (OpenPOWER/OpenBMC) current uses are what's described above. > > Dell also suggested they had some use-cases that aligned with the intent of the bindings, but I don't know what they had in mind. Eugene (on Cc) can elaborate. > >>>> Maybe this could be modelled by pinmux, but then we still need some >>>> way to expose the mux functions to userspace for selection >>>> (userspace needs to transition arbitrarily between at least options >>>> 0 and 1 at runtime), at which point we haven't achieved much beyond >>>> adding a whole heap of infrastructure in the chain. >>>> >>>> Given 0 and 1, maybe exposing attributes in relevant drivers would >>>> be reasonable, except 0 isn't exposed on the SoC's internal bus so >>>> there is no driver on the BMC-side to do so. Taking into account 2 >>>> and 3 are also purely hardware paths further dashes the idea, as >>>> the configuration doesn't really "belong" to the Graphics CRT >>>> device more than it belongs anywhere else, except for the fact that >>>> there isn't anywhere else to expose it. >>>> >>>> Further, the BMC's kernel can't make the decision as to when to >>>> switch the mux as it knows nothing of the host's state. The BMC >>>> userspace is controlling the host's boot state and so *does* know >>>> when to flip the switch. Finally, the mux is in separate IP to the >>>> CRT or VGA blocks: It lives in the System Control Unit. >>>> >>>> My current point of view is the DAC mux field is effectively its >>>> own device, and we need to control it from userspace, so we need >>>> some way to describe it (i.e. not ignore it) in order for its >>>> capability to be exposed. >>>> >>>> I'm fully aware what I'm proposing isn't awesome as it's not >>>> providing any real abstraction, but the problem(s) at hand also >>>> seem to defy abstraction, and in order to avoid a plethora of >>>> bespoke bindings I thought it was reasonable to define something >>>> generic. >>>> >>>> All-in-all I appreciate the suggestion, but assuming you agree with >>>> my reasoning above do you have thoughts on other alternatives? >>> Seems the controls are more fixed than I first thought. All the data >>> you have here could simply be within a driver. > Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases. > >>> Help me understand what >>> functions are fixed (in the SoC) and which ones vary by board. Only >>> what's changing per board really needs to go into DT. > I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately? And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right? > > It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it? The driver that implements the behaviour of the bindings described here turns out quite focused (even if the first attempt was a bit of a basket case, hopefully the second is better (sorry Greg)). > >> Most of these things is specific to a given board or may even need to >> be changed at runtime. > *snip*... > >> Talking of which: Andrew, did you put "default values" in your >> binding ? That would be a nice way to deal with system specific >> immutables, so that userspace doesn't even have to care. > Yes, I described a `default-value`property for RW fields, and `default-set` and `default-clear`properties for write-1-set/write-1-clear fields for exactly this purpose. > >> So to clarify once and for all, *anything* that fits in a subsystem, >> we're putting in one. All the random board control is all GPIOs and >> that's fine as well. For some things that require a bit of fiddly >> usage like the "MBOX" logic between BIOS and BMC we are also doing a >> dedicated driver. > (As an aside, the "MBOX" functionality is slightly different from the > scratch registers in that it has configurable interrupts each way > (BMC-to-Host and Host-to-BMC) - as such it can be used to implement a > dynamic protocol and so deserves its own driver. This is in contrast > to the dumb scratch registers we're describing with these bindings > which have no such interrupts.) > >> But there's a few stragglers here, and they tend to be so >> board/system/BIOS specific that it's not sustainable to create/change >> random drivers all the time just for exposing those few tunables. >> > Yes, this is my feeling too. > > Cheers, > > Andrew
On Fri, Jul 13, 2018 at 06:49:04PM +0000, Eugene.Cho@dell.com wrote:
> Dell - Internal Use - Confidential
Really? To a public mailing list? odd...
>> Dell - Internal Use - Confidential >Really? To a public mailing list? odd... Ha yea sorry - I was so excited to show my support, that I jumped the gun :)
+ my vote as Nuvoton's BMC FW provider. On Fri, Jul 13, 2018 at 10:07 PM <Eugene.Cho@dell.com> wrote: > > >> Dell - Internal Use - Confidential > >Really? To a public mailing list? odd... > > Ha yea sorry - I was so excited to show my support, that I jumped the gun :) >
Hi Alexander, I've rearranged your reply slightly :) On Sat, 14 Jul 2018, at 00:44, Alexander Amelkin wrote: > > So I'm writing this to support your position in this discussion and to > let Rob and other reviewers know that this feature is actually needed. Thanks. So to summarise some other replies so far, YADRO (Alexander) and Dell (Eugene) - platform vendors integrating BMCs - are interested in a solution, and the patches are seen useful for ASPEED and Nuvoton (Avi) BMCs. > > From the discussion it looks to me like Rob is not familiar with > specifics of BMC-managed servers and tries to apply to them the rules > that have proven to be good for workstations and laptops. > Whatever Rob's familiarity with BMCs or other systems, I'm keen to listen to, explore and gain from his experience. I was certainly expecting push back on these patches and in different circumstances I'd probably say no to them as well. The argument for them needs to stand up to scrutiny, and if that scrutiny leads to alternative solutions (that, ideally, are not /dev/mem) then that's fine. Currently I think we need what I've proposed despite it looking like a violation of abstractions, because the hardware itself doesn't have a neat, abstract design. But we could be thinking about it wrong, e.g. maybe what we need instead are no devicetree bindings and simply some in-kernel helpers that allow arbitrary drivers to instantiate the sysfs interface I've proposed for these fields under their control. That removes the need for explicit description in the devicetree, though may lead to the creation of a number of unique drivers (and bindings) that each cover only the functions we're trying to generalise over here. It would be good to have some feedback on the proposed sysfs interface as well - as explored above we could feasibly live without the bindings in their current form but taking away the sysfs interface would nuke the series. Cheers, Andrew
On Fri, Jul 13, 2018 at 12:31 AM Andrew Jeffery <andrew@aj.id.au> wrote: > > Hi Rob, Ben, > > I've replied to you both inline below, hopefully it's clear enough from the context. > > On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote: > > On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote: > > > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote: > > > > > > > > Hi Rob, > > > > > > > > Thanks for the response. > > > > > > > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: > > > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > > > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to > > > > > > provide remote management of (primarily) server platforms. BMCs are > > > > > > often tightly coupled to the platform in terms of behaviour and provide > > > > > > many hardware features integral to booting and running the host system. > > > > > > > > > > > > Some of these hardware features are simple, for example scratch > > > > > > registers provided by the BMC that are exposed to both the host and the > > > > > > BMC. In other cases there's a single bit switch to enable or disable > > > > > > some of the provided functionality. > > > > > > > > > > > > The documentation defines bindings for fields in registers that do not > > > > > > integrate well into other driver models yet must be described to allow > > > > > > the BMC kernel to assume control of these features. > > > > > > > > > > So we'll get a new binding when that happens? That will break > > > > > compatibility. > > > > > > > > Can you please expand on this? I'm not following. > > > > > > If we have a subsystem in the future, then there would likely be an > > > associated binding which would be different. So if you update the DT, > > > then old kernels won't work with it. > > > > What kind of "subsystem" ? There is almost no way there could be one > > for that sort of BMC tunables. We've look at several BMC chips out > > there and requirements from several vendors, BIOS and system > > manufacturers and it's all over the place. > > Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings. I never said they would. Saying "do not integrate well into other driver models YET" implies at some point they will. No point in beating this any further, just remove "yet"... > > > > I feel like this is an argument of tradition. Maybe people have > > > > been dissuaded from doing so when they don't have a reasonable use- > > > > case? I'm not saying that what I'm proposing is unquestionably > > > > reasonable, but I don't want to dismiss it out of hand. > > > > ... > > > > > > It comes up with system controller type blocks too that just have a > > > bunch of random registers. > > This matches the situation at hand. > > > > Those change in every SoC and not in any > > > controlled or ordered way that would make describing the individual > > > sub-functions in DT worthwhile. > > "Not worthwhile" is what I'm pushing back against for our use cases. I think they are narrow and limited enough to make it worthwhile. > > Obviously we want to avoid describing these things *badly* - you mentioned the clock bindings - so I'm happy to hash out what the right representation should be. But I struggle to think the solution is not describing some of our hardware features at all. > > > > > So what's the alternative ? Because without something like what we > > propose, what's going to happen is /dev/mem ... that's what people do > > today. > > Yep. And I've outlined in the cover letter what I think are the advantages of what I'm proposing over /dev/mem. It's not an incredible gain, but has several of nice-to-have properties. > > > > > > > > A node per register bit doesn't scale. > > > > > > > > It isn't meant to scale in terms of a single system. Using it > > > > extensively is very likely wrong. Separately, register-bit-led does > > > > pretty much the same thing. Doesn't the scale argument apply there? > > > > Who is to stop me from attaching an insane number of LEDs to a > > > > system? > > > > > > Review. > > > > > > If you look, register-bit-led is rarely used outside of some ARM, Ltd. > > > boards. It's simply quite rare to have MMIO register bits that have a > > > fixed function of LED control. > > > > Well, same here, we hope to review what goes upstream to make it > > reasonable. Otherwise it doens't matter. If a random vendor, let's say > > IBM, chose to chip a system where they put an insane amount of cruft in > > there, it will only affect those systems's BMC and the userspace stack > > on it. > > > > Thankfully that stack is OpenBMC and IBM is aiming at having their > > device-tree's upstream, thus reviewed, thus it won't happen. > > > > *Anything* can be abused. The point here is that we have a number, > > thankfully rather small, maybe a dozen or two, of tunables that are > > quite specific to a combination (system vendor, bmc vendor, system > > model) which control a few HW features that essentially do *NOT* fit in > > a subsystem. > > Exactly. I tried to head off the abuse vector by requiring that uses be listed in the bindings document, and thus enforce some level of review. It might not be the most effective approach at the end of the day, but at least it is something. > > > > > For everything that does, we have created proper drivers (and are doing > > more). > > > > > > > > Obviously if there are lots of systems using it sparingly and > > > > legitimately then maybe there's a scale issue, but isn't that just > > > > a reality of different hardware designs? Whoever is implementing > > > > support for the system is going to have to describe the hardware > > > > one way or another. > > > > > > > > > > > > > > Maybe this should be modelled using GPIO binding? There's a line there > > > > > too as whether the signals are "general purpose" or not. > > > > > > > > I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between: > > > > > > > > 0. VGA device exposed on the host PCIe bus > > > > 1. The "Graphics CRT" controller > > > > 2. VGA port A > > > > 3. VGA port B > > > > > > And this mux control is fixed in the SoC design? > > > > This specific family of SoC (Aspeed) support those 4 configurations. > > How they need to be configured at runtime depends on the combination of > > system vendor and system model, along with in some cases the need to > > switch it at runtime. > > > > This is just one example. Another one is the handful of scratch > > registers that need to be populated with the "right" values for the > > host system BIOS, VGA BIOS and VGA driver. (The host bits access them > > via LPC IO space). > > > > The host system BIOS will read some basic config info there before its > > IPMI stack is up (and some BIOSes already rely on that). The VGA BIOS > > will get some strapping info and panel info. The VGA driver (which is > > already upstream, has been for a long time) will look for other things > > in some of these guys, such as connector configuration. > > > > Andrew, if it helps, we could put together a list of what we typically > > need on an OpenPower system today. That would give people like Rob a > > better idea of what this is all about. > > It's primarily what I've outlined at the bottom of the bindings document, though the use cases aren't provided there as they are a bit out-of-scope. So the SuperIO and VGA scratch registers, plus the DAC mux. A bunch of tunable things. > > OpenPOWER platforms make use of the SuperIO scratch registers to convey configuration information from the BMC to the host. Information provided includes low-level control of the host firmware initialisation process, UART and logging configuration, and the strategy for handling errors (crash vs log). This is all an "arbitrary" contract between the BMC userspace and the host firmware, i.e. different platforms/firmware could lay out the same information in different ways or communicate entirely different information altogether. The BMC kernel shouldn't care about any of it, other than provide sensible access to the hardware. > > Again on OpenPOWER systems using the ASPEED BMC SoCs running OpenBMC, the BMC uses the VGA scratch registers to sense initialisation of the host graphics driver in the host's boot process. When the BMC userspace detects the host VGA driver is up we switch the DAC mux from the BMC CRT device to the host VGA device so that the host is now driving the VGA output. Non-OpenPOWER OpenBMC configurations may do something entirely different, or not do anything at all with the hardware, so as above, it's not really the job of the BMC kernel to be involved in any of this, other than to provide sensible access to userspace. > > There are a number of other switches that control the availability of ASPEED BMC hardware features to the host system that also don't fit any particular subsystem and so will use these bindings, but our (OpenPOWER/OpenBMC) current uses are what's described above. > > Dell also suggested they had some use-cases that aligned with the intent of the bindings, but I don't know what they had in mind. Eugene (on Cc) can elaborate. > > > > > > > > > > > Maybe this could be modelled by pinmux, but then we still need some > > > > way to expose the mux functions to userspace for selection > > > > (userspace needs to transition arbitrarily between at least options > > > > 0 and 1 at runtime), at which point we haven't achieved much beyond > > > > adding a whole heap of infrastructure in the chain. > > > > > > > > Given 0 and 1, maybe exposing attributes in relevant drivers would > > > > be reasonable, except 0 isn't exposed on the SoC's internal bus so > > > > there is no driver on the BMC-side to do so. Taking into account 2 > > > > and 3 are also purely hardware paths further dashes the idea, as > > > > the configuration doesn't really "belong" to the Graphics CRT > > > > device more than it belongs anywhere else, except for the fact that > > > > there isn't anywhere else to expose it. > > > > > > > > Further, the BMC's kernel can't make the decision as to when to > > > > switch the mux as it knows nothing of the host's state. The BMC > > > > userspace is controlling the host's boot state and so *does* know > > > > when to flip the switch. Finally, the mux is in separate IP to the > > > > CRT or VGA blocks: It lives in the System Control Unit. > > > > > > > > My current point of view is the DAC mux field is effectively its > > > > own device, and we need to control it from userspace, so we need > > > > some way to describe it (i.e. not ignore it) in order for its > > > > capability to be exposed. > > > > > > > > I'm fully aware what I'm proposing isn't awesome as it's not > > > > providing any real abstraction, but the problem(s) at hand also > > > > seem to defy abstraction, and in order to avoid a plethora of > > > > bespoke bindings I thought it was reasonable to define something > > > > generic. > > > > > > > > All-in-all I appreciate the suggestion, but assuming you agree with > > > > my reasoning above do you have thoughts on other alternatives? > > > > > > Seems the controls are more fixed than I first thought. All the data > > > you have here could simply be within a driver. > > Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases. Not unique. Just instead of populating the structs you have in the driver from DT, define them in the driver and attach them to match->data ptr. > > > Help me understand what > > > functions are fixed (in the SoC) and which ones vary by board. Only > > > what's changing per board really needs to go into DT. > > I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately? Yes, if the blocks have well defined boundaries and functions. Blocks like a UART for example do. Various pieces dumped into system controllers generally don't IME. > And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right? Yes. It's a judgement call as to how much we try to describe in DT. To use clocks again, a clock divider, mux, or gate all seem like well defined functions which could be (and were) described in DT, but we learned that doesn't really work. We're still converting platforms that did it that way... > It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it? If that data is one set per SoC, then i'm not that concerned having platform-specific data in the driver. That doesn't mean the driver is not "generic". It's still not clear to me in this thread, how much of this is board specific, but given that you've placed all the data in an SoC dtsi file it seems to be all per SoC. Rob
On Mon, 16 Jul 2018, at 23:25, Rob Herring wrote: > On Fri, Jul 13, 2018 at 12:31 AM Andrew Jeffery <andrew@aj.id.au> wrote: > > > > Hi Rob, Ben, > > > > I've replied to you both inline below, hopefully it's clear enough from the context. > > > > On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote: > > > On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote: > > > > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote: > > > > > > > > > > Hi Rob, > > > > > > > > > > Thanks for the response. > > > > > > > > > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: > > > > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > > > > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to > > > > > > > provide remote management of (primarily) server platforms. BMCs are > > > > > > > often tightly coupled to the platform in terms of behaviour and provide > > > > > > > many hardware features integral to booting and running the host system. > > > > > > > > > > > > > > Some of these hardware features are simple, for example scratch > > > > > > > registers provided by the BMC that are exposed to both the host and the > > > > > > > BMC. In other cases there's a single bit switch to enable or disable > > > > > > > some of the provided functionality. > > > > > > > > > > > > > > The documentation defines bindings for fields in registers that do not > > > > > > > integrate well into other driver models yet must be described to allow > > > > > > > the BMC kernel to assume control of these features. > > > > > > > > > > > > So we'll get a new binding when that happens? That will break > > > > > > compatibility. > > > > > > > > > > Can you please expand on this? I'm not following. > > > > > > > > If we have a subsystem in the future, then there would likely be an > > > > associated binding which would be different. So if you update the DT, > > > > then old kernels won't work with it. > > > > > > What kind of "subsystem" ? There is almost no way there could be one > > > for that sort of BMC tunables. We've look at several BMC chips out > > > there and requirements from several vendors, BIOS and system > > > manufacturers and it's all over the place. > > > > Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings. > > I never said they would. Saying "do not integrate well into other > driver models YET" implies at some point they will. No point in > beating this any further, just remove "yet"... Right, there should have been a comma before 'yet'. Sorry for the confusion. *snip* > > > > > > > > > > Maybe this could be modelled by pinmux, but then we still need some > > > > > way to expose the mux functions to userspace for selection > > > > > (userspace needs to transition arbitrarily between at least options > > > > > 0 and 1 at runtime), at which point we haven't achieved much beyond > > > > > adding a whole heap of infrastructure in the chain. > > > > > > > > > > Given 0 and 1, maybe exposing attributes in relevant drivers would > > > > > be reasonable, except 0 isn't exposed on the SoC's internal bus so > > > > > there is no driver on the BMC-side to do so. Taking into account 2 > > > > > and 3 are also purely hardware paths further dashes the idea, as > > > > > the configuration doesn't really "belong" to the Graphics CRT > > > > > device more than it belongs anywhere else, except for the fact that > > > > > there isn't anywhere else to expose it. > > > > > > > > > > Further, the BMC's kernel can't make the decision as to when to > > > > > switch the mux as it knows nothing of the host's state. The BMC > > > > > userspace is controlling the host's boot state and so *does* know > > > > > when to flip the switch. Finally, the mux is in separate IP to the > > > > > CRT or VGA blocks: It lives in the System Control Unit. > > > > > > > > > > My current point of view is the DAC mux field is effectively its > > > > > own device, and we need to control it from userspace, so we need > > > > > some way to describe it (i.e. not ignore it) in order for its > > > > > capability to be exposed. > > > > > > > > > > I'm fully aware what I'm proposing isn't awesome as it's not > > > > > providing any real abstraction, but the problem(s) at hand also > > > > > seem to defy abstraction, and in order to avoid a plethora of > > > > > bespoke bindings I thought it was reasonable to define something > > > > > generic. > > > > > > > > > > All-in-all I appreciate the suggestion, but assuming you agree with > > > > > my reasoning above do you have thoughts on other alternatives? > > > > > > > > Seems the controls are more fixed than I first thought. All the data > > > > you have here could simply be within a driver. > > > > Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases. > > Not unique. Just instead of populating the structs you have in the > driver from DT, define them in the driver and attach them to > match->data ptr. Okay, I'll prototype it. > > > > > Help me understand what > > > > functions are fixed (in the SoC) and which ones vary by board. Only > > > > what's changing per board really needs to go into DT. > > > > I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately? > > Yes, if the blocks have well defined boundaries and functions. Blocks > like a UART for example do. Various pieces dumped into system > controllers generally don't IME. > > > And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right? > > Yes. It's a judgement call as to how much we try to describe in DT. To > use clocks again, a clock divider, mux, or gate all seem like well > defined functions which could be (and were) described in DT, but we > learned that doesn't really work. We're still converting platforms > that did it that way... > > > It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it? > > If that data is one set per SoC, then i'm not that concerned having > platform-specific data in the driver. That doesn't mean the driver is > not "generic". It's still not clear to me in this thread, how much of > this is board specific, but given that you've placed all the data in > an SoC dtsi file it seems to be all per SoC. Right, the features we want to expose are part of the SoC, not the board it sits on. The distinction was that different boards (or even different software payloads on the same board) could use them in vastly different ways, though that mainly affects how they're treated inside the kernel and at the kernel/userspace interface rather than at the devicetree level. Cheers, Andrew
On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote: > If that data is one set per SoC, then i'm not that concerned having > platform-specific data in the driver. That doesn't mean the driver is > not "generic". It's still not clear to me in this thread, how much of > this is board specific, but given that you've placed all the data in > an SoC dtsi file it seems to be all per SoC. So Rob, I think that's precisely where the disconnect is. I think we all (well hopefully) agree that those few tunables don't fit in any existing subystem and aren't likely to ever do (famous last words...). Where we disagree is we want to make this parametrized via the DT, and you want us to hard wire the list in some kind of SoC driver for a given SoC family/version. The reason I think hard wiring the list in the driver is not a great solution is that that list in itself is prone to variations, possibly fairly often, between boards, vendors, versions of boards, etc... We can't know for sure every SoC tunable (out of the gazillions in those chips) are going to be needed for a given system. We know which ones we do use for ours, and that's a couple of handfuls, but it could be that Dell need a slightly different set, and so might Yadro, or so might our next board revision for that matter. Now, updating the device-tree in the board flash with whatever vendor specific information is needed is a LOT easier than getting the kernel driver constantly updated. The device-tree after all is there to reflect among other things system specific ways in which the SoC is wired and configured. This is rather close... Cheers, Ben.
On Tue, 17 Jul 2018, at 14:26, Benjamin Herrenschmidt wrote: > On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote: > > If that data is one set per SoC, then i'm not that concerned having > > platform-specific data in the driver. That doesn't mean the driver is > > not "generic". It's still not clear to me in this thread, how much of > > this is board specific, but given that you've placed all the data in > > an SoC dtsi file it seems to be all per SoC. > > So Rob, I think that's precisely where the disconnect is. > > I think we all (well hopefully) agree that those few tunables don't fit > in any existing subystem and aren't likely to ever do (famous last > words...). > > Where we disagree is we want to make this parametrized via the DT, and > you want us to hard wire the list in some kind of SoC driver for a > given SoC family/version. > > The reason I think hard wiring the list in the driver is not a great > solution is that that list in itself is prone to variations, possibly > fairly often, between boards, vendors, versions of boards, etc... > > We can't know for sure every SoC tunable (out of the gazillions in > those chips) are going to be needed for a given system. We know which > ones we do use for ours, and that's a couple of handfuls, but it could > be that Dell need a slightly different set, and so might Yadro, or so > might our next board revision for that matter. > > Now, updating the device-tree in the board flash with whatever vendor > specific information is needed is a LOT easier than getting the kernel > driver constantly updated. The device-tree after all is there to > reflect among other things system specific ways in which the SoC is > wired and configured. This is rather close... Not sure this helps, but I feel that the proposal pretty closely matches what's described in Documentation/devicetree/bindings/mfd/mfd.txt. It's intended that nodes using the bindings I'm proposing are children of a 'compatible = "syscon", "simple-mfd"' node (this is the case with the features we're hoping to describe for our SoC). I should explicitly call that out. But to go on, "simple-mfd" is effectively an alias of "simple-bus", which means its intended to match child node compatibles to drivers provided by the kernel. If we shouldn't be describing minor features of a SoC in the devicetree, doesn't this invalidate the case for simple-mfd? What is the *correct* use of simple-mfd? When is it not used to expose minor features in set of "miscellaneous system registers"? Why doesn't this proposed case fit? Cheers, Andrew
On Tue, Jul 17, 2018 at 5:28 PM Andrew Jeffery <andrew@aj.id.au> wrote: > > On Tue, 17 Jul 2018, at 14:26, Benjamin Herrenschmidt wrote: > > On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote: > > > If that data is one set per SoC, then i'm not that concerned having > > > platform-specific data in the driver. That doesn't mean the driver is > > > not "generic". It's still not clear to me in this thread, how much of > > > this is board specific, but given that you've placed all the data in > > > an SoC dtsi file it seems to be all per SoC. > > > > So Rob, I think that's precisely where the disconnect is. > > > > I think we all (well hopefully) agree that those few tunables don't fit > > in any existing subystem and aren't likely to ever do (famous last > > words...). > > > > Where we disagree is we want to make this parametrized via the DT, and > > you want us to hard wire the list in some kind of SoC driver for a > > given SoC family/version. > > > > The reason I think hard wiring the list in the driver is not a great > > solution is that that list in itself is prone to variations, possibly > > fairly often, between boards, vendors, versions of boards, etc... > > > > We can't know for sure every SoC tunable (out of the gazillions in > > those chips) are going to be needed for a given system. We know which > > ones we do use for ours, and that's a couple of handfuls, but it could > > be that Dell need a slightly different set, and so might Yadro, or so > > might our next board revision for that matter. > > > > Now, updating the device-tree in the board flash with whatever vendor > > specific information is needed is a LOT easier than getting the kernel > > driver constantly updated. The device-tree after all is there to > > reflect among other things system specific ways in which the SoC is > > wired and configured. This is rather close... > > Not sure this helps, but I feel that the proposal pretty closely matches what's described in Documentation/devicetree/bindings/mfd/mfd.txt. It's intended that nodes using the bindings I'm proposing are children of a 'compatible = "syscon", "simple-mfd"' node (this is the case with the features we're hoping to describe for our SoC). I should explicitly call that out. IMO, any binding that has only those compatibles is not correct and a specific compatible is needed. We should be able identify a specific h/w block. > But to go on, "simple-mfd" is effectively an alias of "simple-bus", which means its intended to match child node compatibles to drivers provided by the kernel. If we shouldn't be describing minor features of a SoC in the devicetree, doesn't this invalidate the case for simple-mfd? What is the *correct* use of simple-mfd? When is it not used to expose minor features in set of "miscellaneous system registers"? Why doesn't this proposed case fit? I'm no fan of simple-mfd either. I think it is abused and often a sign of bad binding design. The general problem with MFD bindings is people define child nodes based on what drivers they happen to have for some OS. DT is not the only way to instantiate drivers. Child nodes are really only needed when you have resources per child that need to be defined. For example, if the MFD has an interrupt controller and interrupts to sub-blocks or sub-blocks have their own clocks. "simple-mfd" was for when the parent node has no driver or the child nodes have no dependency on the parent. Rob
On Mon, Jul 16, 2018 at 10:56 PM Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote: > > If that data is one set per SoC, then i'm not that concerned having > > platform-specific data in the driver. That doesn't mean the driver is > > not "generic". It's still not clear to me in this thread, how much of > > this is board specific, but given that you've placed all the data in > > an SoC dtsi file it seems to be all per SoC. > > So Rob, I think that's precisely where the disconnect is. > > I think we all (well hopefully) agree that those few tunables don't fit > in any existing subystem and aren't likely to ever do (famous last > words...). > > Where we disagree is we want to make this parametrized via the DT, and > you want us to hard wire the list in some kind of SoC driver for a > given SoC family/version. > > The reason I think hard wiring the list in the driver is not a great > solution is that that list in itself is prone to variations, possibly > fairly often, between boards, vendors, versions of boards, etc... Can we separate the list of what's enabled from the details of the registers? Even if we put all this into DT, we may still want to have some separation for dts file structure. Some portion of this has to be SoC specific because you are simply exposing SoC registers. > We can't know for sure every SoC tunable (out of the gazillions in > those chips) are going to be needed for a given system. We know which > ones we do use for ours, and that's a couple of handfuls, but it could > be that Dell need a slightly different set, and so might Yadro, or so > might our next board revision for that matter. That's very hand wavy. Do you have some concrete examples (i.e. dts files) showing the differences. One problem I'm having with this is you are still going to need per board specifics in userspace. You may be moving some of details out, but moving to DT is not going to completely eliminate that. I agree that not using /dev/mem is a good thing, but there are several ways you could do that independent of any DT binding. > Now, updating the device-tree in the board flash with whatever vendor > specific information is needed is a LOT easier than getting the kernel > driver constantly updated. The device-tree after all is there to > reflect among other things system specific ways in which the SoC is > wired and configured. This is rather close... Sadly, updating my kernel is easier than updating my PC firmware (though packaged firmware on my current laptop changes that). I can assure you that ARM boards are generally much worse in that regard. BTW, you may want to pay attention to EBBR[1][2]. Not sure how much you care for BMCs, but there may be some interest. Rob [1] https://github.com/ARM-software/ebbr [2] https://lists.linaro.org/pipermail/boot-architecture/
On Wed, 2018-07-18 at 13:50 -0600, Rob Herring wrote: > > > So Rob, I think that's precisely where the disconnect is. > > > > I think we all (well hopefully) agree that those few tunables don't fit > > in any existing subystem and aren't likely to ever do (famous last > > words...). > > > > Where we disagree is we want to make this parametrized via the DT, and > > you want us to hard wire the list in some kind of SoC driver for a > > given SoC family/version. > > > > The reason I think hard wiring the list in the driver is not a great > > solution is that that list in itself is prone to variations, possibly > > fairly often, between boards, vendors, versions of boards, etc... > > Can we separate the list of what's enabled from the details of the > registers? Even if we put all this into DT, we may still want to have > some separation for dts file structure. Some portion of this has to be > SoC specific because you are simply exposing SoC registers. Not sure I understand what you mean by "what's enabled". The goal here is to expose register "fields" (nor raw values, some registers need locking for RMW etc... the kernel would handle that via syscon locking I expect), so basically named "tunables" to userspace. Userspace is the one that knows the values for a given system. > > We can't know for sure every SoC tunable (out of the gazillions in > > those chips) are going to be needed for a given system. We know which > > ones we do use for ours, and that's a couple of handfuls, but it could > > be that Dell need a slightly different set, and so might Yadro, or so > > might our next board revision for that matter. > > That's very hand wavy. Do you have some concrete examples (i.e. dts > files) showing the differences. We could list what we have on the pile for some of our IBM systems today, but we would need Dell and Yadro to chime in with what they need. I still, from experience with that stuff and gut feeling, am pretty sure it's going to be a moving list, and updating the kernel driver constantly isn't going to fly. > One problem I'm having with this is you are still going to need per > board specifics in userspace. Yes. Userspace is ultimately the one that knows what needs to be done on a given machine. > You may be moving some of details out, > but moving to DT is not going to completely eliminate that. We aren't trying to either. We are trying to make sure we don't need to change the *kernel* all the time, in part bcs we are pushing hard for OpenBMC vendors to use upstream with, if possible, no vendor changes. So userspace has to know the board specific tunables anyways. Today in many systems, it does that by whacking /dev/mem. I guess you can understand why that is bad :-) > I agree > that not using /dev/mem is a good thing, but there are several ways > you could do that independent of any DT binding. Such as ? The only other option is to have one or more kernel drivers that have built-in the precise and complete list of those tunables that need to be exposed. It's not impossible, but I worry that it's not going to scale terribly well, and that vendors will constantly "fork" that driver to add different things to that list. I might be wrong here, so I'd like for Eugene (Dell) and Alexander (Yadro) to chime in, but experience with BMCs has shown that we regularily , as we add a feature or rewrite something, need to find another new magic SoC tunable the HW manufacturer hid somewhere that will make our stuff work. > > Now, updating the device-tree in the board flash with whatever vendor > > specific information is needed is a LOT easier than getting the kernel > > driver constantly updated. The device-tree after all is there to > > reflect among other things system specific ways in which the SoC is > > wired and configured. This is rather close... > > Sadly, updating my kernel is easier than updating my PC firmware > (though packaged firmware on my current laptop changes that). I can > assure you that ARM boards are generally much worse in that regard. > BTW, you may want to pay attention to EBBR[1][2]. Not sure how much > you care for BMCs, but there may be some interest. You are conflating your host kernel and your BMC here. The BMC kernel is part of the "firmware", as is its DTS and the BMC userspace. (Again this isn't the host DTS, this is the BMC DTS). They get all updated together. My point isn't about the ease or difficulty for a *user* to udpate their BMC, in that case the solutions above are equivalent. The point is from a system vendor perspective. A system vendor using OpenBMC will *customize* their BMC build in various ways. Typically they *will* have a custom DT since this is what represents their specific system and they will have some specific userspace bits. However, we are trying very hard for them *not* to fork the kernel, and if possible move OpenBMC towards a fully upstream kernel (still working on getting all the SoC drivers cleaned up and pushed up but it's moving in the right direction). So from a vendor perspective, such as us IBM, we *alread* have custom DTs and customized userspace, that's part of the normal flow of deploying a BMC. However, we are trying *NOT* to have a custom kernel (and we don't, at the moment there is an OpenBMC kernel accross vendors, though it's not *yet* fully upstream). So if the solution proposed is prone to requiring frequent changes to a kernel driver, that solution makes the above a lot more difficult and will encourage vendors to keep forking kernels. This is what we are trying to avoid. Cheers, Ben. > Rob > > [1] https://github.com/ARM-software/ebbr > [2] https://lists.linaro.org/pipermail/boot-architecture/
On Thu, 19 Jul 2018, at 04:37, Rob Herring wrote: > On Tue, Jul 17, 2018 at 5:28 PM Andrew Jeffery <andrew@aj.id.au> wrote: > > > > On Tue, 17 Jul 2018, at 14:26, Benjamin Herrenschmidt wrote: > > > On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote: > > > > If that data is one set per SoC, then i'm not that concerned having > > > > platform-specific data in the driver. That doesn't mean the driver is > > > > not "generic". It's still not clear to me in this thread, how much of > > > > this is board specific, but given that you've placed all the data in > > > > an SoC dtsi file it seems to be all per SoC. > > > > > > So Rob, I think that's precisely where the disconnect is. > > > > > > I think we all (well hopefully) agree that those few tunables don't fit > > > in any existing subystem and aren't likely to ever do (famous last > > > words...). > > > > > > Where we disagree is we want to make this parametrized via the DT, and > > > you want us to hard wire the list in some kind of SoC driver for a > > > given SoC family/version. > > > > > > The reason I think hard wiring the list in the driver is not a great > > > solution is that that list in itself is prone to variations, possibly > > > fairly often, between boards, vendors, versions of boards, etc... > > > > > > We can't know for sure every SoC tunable (out of the gazillions in > > > those chips) are going to be needed for a given system. We know which > > > ones we do use for ours, and that's a couple of handfuls, but it could > > > be that Dell need a slightly different set, and so might Yadro, or so > > > might our next board revision for that matter. > > > > > > Now, updating the device-tree in the board flash with whatever vendor > > > specific information is needed is a LOT easier than getting the kernel > > > driver constantly updated. The device-tree after all is there to > > > reflect among other things system specific ways in which the SoC is > > > wired and configured. This is rather close... > > > > Not sure this helps, but I feel that the proposal pretty closely matches what's described in Documentation/devicetree/bindings/mfd/mfd.txt. It's intended that nodes using the bindings I'm proposing are children of a 'compatible = "syscon", "simple-mfd"' node (this is the case with the features we're hoping to describe for our SoC). I should explicitly call that out. > > IMO, any binding that has only those compatibles is not correct and a > specific compatible is needed. We should be able identify a specific > h/w block. I didn't intend for that to get interpreted quite as literally, so apologies for that. We do have h/w-block-specific compatibles in there too. The point was to demonstrate that we're dealing (at this point, only) with mfds/syscons. > > > But to go on, "simple-mfd" is effectively an alias of "simple-bus", which means its intended to match child node compatibles to drivers provided by the kernel. If we shouldn't be describing minor features of a SoC in the devicetree, doesn't this invalidate the case for simple-mfd? What is the *correct* use of simple-mfd? When is it not used to expose minor features in set of "miscellaneous system registers"? Why doesn't this proposed case fit? > > I'm no fan of simple-mfd either. I think it is abused and often a sign > of bad binding design. Ah, yes, this is a familiar feeling when reflecting on things I've done in the past. Hence trying to understand how to use it right. > The general problem with MFD bindings is people > define child nodes based on what drivers they happen to have for some > OS. DT is not the only way to instantiate drivers. Child nodes are > really only needed when you have resources per child that need to be > defined. For example, if the MFD has an interrupt controller and > interrupts to sub-blocks or sub-blocks have their own clocks. > "simple-mfd" was for when the parent node has no driver or the child > nodes have no dependency on the parent. Thanks for the clarification, I'll keep that in mind going forward. Andrew
> > I agree > > that not using /dev/mem is a good thing, but there are several ways > > you could do that independent of any DT binding. > > Such as ? The only other option is to have one or more kernel drivers > that have built-in the precise and complete list of those tunables that > need to be exposed. > > It's not impossible, but I worry that it's not going to scale terribly > well, and that vendors will constantly "fork" that driver to add > different things to that list. Picking on that last point, "different things" doesn't necessarily mean "different fields in the SoC" (though it may). We could just need to use different initial values for the fields already described. So taking that into account, the field descriptions could vary wildly from platform to platform - where "platform" here is the combination of the BMC, its host system, and the host system's required configuration - not just referring to the BMC in isolation. That in turn may cause a fork of the driver (changes that are incompatible with other platforms), or not scale terribly well as Ben suggests. The initial value concept can help reduce the impact on userspace as userpace may no-longer need to care about it, so I think it's a desirable property. With respect to devicetree, the field definitions will stay in the SoC dtsi, but the initial values would be described on a per-platform basis in the dts. > > I might be wrong here, so I'd like for Eugene (Dell) and Alexander > (Yadro) to chime in, but experience with BMCs has shown that we > regularily , as we add a feature or rewrite something, need to find > another new magic SoC tunable the HW manufacturer hid somewhere that > will make our stuff work. > > > > Now, updating the device-tree in the board flash with whatever vendor > > > specific information is needed is a LOT easier than getting the kernel > > > driver constantly updated. The device-tree after all is there to > > > reflect among other things system specific ways in which the SoC is > > > wired and configured. This is rather close... > > > > Sadly, updating my kernel is easier than updating my PC firmware > > (though packaged firmware on my current laptop changes that). I can > > assure you that ARM boards are generally much worse in that regard. > > BTW, you may want to pay attention to EBBR[1][2]. Not sure how much > > you care for BMCs, but there may be some interest. > > You are conflating your host kernel and your BMC here. The BMC kernel > is part of the "firmware", as is its DTS and the BMC userspace. > > (Again this isn't the host DTS, this is the BMC DTS). > > They get all updated together. My point isn't about the ease or > difficulty for a *user* to udpate their BMC, in that case the solutions > above are equivalent. > > The point is from a system vendor perspective. A system vendor using > OpenBMC will *customize* their BMC build in various ways. Typically > they *will* have a custom DT since this is what represents their > specific system and they will have some specific userspace bits. > > However, we are trying very hard for them *not* to fork the kernel, and > if possible move OpenBMC towards a fully upstream kernel (still working > on getting all the SoC drivers cleaned up and pushed up but it's moving > in the right direction). > > So from a vendor perspective, such as us IBM, we *alread* have custom > DTs and customized userspace, that's part of the normal flow of > deploying a BMC. > > However, we are trying *NOT* to have a custom kernel (and we don't, at > the moment there is an OpenBMC kernel accross vendors, though it's not > *yet* fully upstream). > > So if the solution proposed is prone to requiring frequent changes to a > kernel driver, that solution makes the above a lot more difficult and > will encourage vendors to keep forking kernels. > > This is what we are trying to avoid. Well described. Thanks Ben. Andrew
On Thu, 2018-07-19 at 11:58 +0930, Andrew Jeffery wrote: > > > I agree > > > that not using /dev/mem is a good thing, but there are several ways > > > you could do that independent of any DT binding. > > > > Such as ? The only other option is to have one or more kernel drivers > > that have built-in the precise and complete list of those tunables that > > need to be exposed. > > > > It's not impossible, but I worry that it's not going to scale terribly > > well, and that vendors will constantly "fork" that driver to add > > different things to that list. > > Picking on that last point, "different things" doesn't necessarily > mean "different fields in the SoC" (though it may). We could just > need to use different initial values for the fields already > described. I don't worry about initial values too much. Those could be specified by userspace. It would be trivial to have something akin to /etc/sysctl.conf that contains the initial values and a script blasting them early. In fact I prefer this being in userspace to be frank. It could be in the initramfs if we want it done early enough, maybe with a usermode helper. Unless we have some demonstrable reasons why some of these must be set before the various kernel drivers initialize. > So taking that into account, the field descriptions could vary wildly > from platform to platform - where "platform" here is the combination > of the BMC, its host system, and the host system's required > configuration - not just referring to the BMC in isolation. That in > turn may cause a fork of the driver (changes that are incompatible > with other platforms), or not scale terribly well as Ben suggests. I really only worry about the actual set of registers/fields. I think folding in initial values goes a bit too far. > The initial value concept can help reduce the impact on userspace as > userpace may no-longer need to care about it, so I think it's a > desirable property. I don't worry too much about userspace containing system specific bits and pieces such as a config file with the appropriate initial values for the platform. Userspace will have to contain platform specific stuff regardless (if anything, stuff like thermal control is intrinsicly different from one platform to the next). > With respect to devicetree, the field definitions will stay in the > SoC dtsi, but the initial values would be described on a per-platform > basis in the dts. If the fields are in the SoC dtsi, then Rob has a reasonable argument that the list of fields is rather fixed for a given SoC and thus could be hard wired in the driver.... I think at this stage, we need to create more clarity. In order to do that, I think we need to come up with a list, as exhaustive as we can, of what are the fields/register we need for our platforms, and find any reason why userspace wouldn't be a good enough location to stick the initial values. Then we need Dell and Yadro (and maybe others) to help with their variant of that list for the same SoCs to begin with. With that, we'll get an idea of whether we think we can get a reasonable stable long term list that's appropriate for most vendors. If that's the case, then my objections to have it in the kernel instead of the DT fade away a bit. If one or two of those fields absolutely must have their defaults early, then we can consider a specific set of one or two properties in the SoC node for that SoC family that provides those specific defaults. But unless we have that list, I think we'll be throwing too many hypotheticals around to convince Rob and others. Andrew, can you start with a list that shows what you expect us to need on our systems ? Cheers, Ben
> > Andrew, can you start with a list that shows what you expect us to need > on our systems ? > Okay, our Witherspoon and Romulus platforms containing the ASPEED AST2500 currently need the following tuneables exposed: From the SCU: - Debug UART enable - VGA DAC mux - VGA scratch registers 0-7 - LPC SuperIO decode enable - VGA MMIO decode enable From the LPC controller: - iLPC2AHB enable - SuperIO scratch registers 0x20-0x2f (The LPC controller is just as much of a collection of random bits as the SCU) Lastly, our Palmetto platform uses an AST2400 which has fewer features compared to the AST2500. Its tuneable list is the same as the above with the exception of "Debug UART enable". Tuneables that we may need to expose in the future include: From the SCU: - PCI VID/DID for the BMC PCIe device - VGA device enable (may need to be disabled if the platform contains a discrete graphics processor) From the LPC controller: - UART mux Alexander, Eugene, can you chime in with your platforms' needs? Cheers, Andrew
On Fri, 2018-07-20 at 09:37 +0930, Andrew Jeffery wrote: > > > > Andrew, can you start with a list that shows what you expect us to need > > on our systems ? > > > > Okay, our Witherspoon and Romulus platforms containing the ASPEED AST2500 currently need the following tuneables exposed: > > > From the SCU: > > - Debug UART enable > - VGA DAC mux > - VGA scratch registers 0-7 > - LPC SuperIO decode enable > - VGA MMIO decode enable > > > From the LPC controller: > > - iLPC2AHB enable > - SuperIO scratch registers 0x20-0x2f > > (The LPC controller is just as much of a collection of random bits as the SCU) > > Lastly, our Palmetto platform uses an AST2400 which has fewer features compared to the AST2500. Its tuneable list is the same as the above with the exception of "Debug UART enable". > > Tuneables that we may need to expose in the future include: > > > From the SCU: > > - PCI VID/DID for the BMC PCIe device > - VGA device enable (may need to be disabled if the platform contains a discrete graphics processor) Additionally there's a bunch of resigters controlling the mapping of various MMIO regions of the BMC PCIe device to portions of the BMC address space. I'm not sure what's the best way to handle that. This specific set might require a dedicated device as a subnode of the SCU in the DT that contains all the mappings as properties... That or we consider them static enough and just whack it in u-boot. > > From the LPC controller: > > - UART mux > > Alexander, Eugene, can you chime in with your platforms' needs? > > Cheers, > > Andrew
Andrew, Benjamin, Rob, Thanks for bringing up the set of patches and a great discussion. After going through the thread I figured that I'd like to share a few things we needed to hack when programming several BMC boards: - Debug UART enable/mux - Disable GPIO D/E passthrough (I think this is supported by the current pinctrl driver) - RMII/RGMII strapping - iLPC2AHB control - SPI master mux select - Various SuperIO configurations As for the discussion whether these belong to a platform driver or device tree nodes, I think in an ideal world all these configurations could be nicely grouped and abstracted in a platform kernel driver (or drivers). However in reality think this as an "M * N" problem: there are M variants of BMCs and N different platforms built with these BMCs. Each platform-BMC combination is going to have its own quirks and slightly different requirements in BMC "tunables". Were there a kernel driver for the M BMC variants, it would inevitably have a lot of churn due to the different needs of the platforms. What I like about the device tree approach is the expressiveness of the format and the ability to specify non-conflicting initial values easily. Sometimes we need initial values for these parameters set before running userspace, and setting such values in device tree is easier than using #defines or kernel parameters. On Thu, Jul 19, 2018 at 9:57 PM Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > On Fri, 2018-07-20 at 09:37 +0930, Andrew Jeffery wrote: > > > > > > Andrew, can you start with a list that shows what you expect us to need > > > on our systems ? > > > > > > > Okay, our Witherspoon and Romulus platforms containing the ASPEED AST2500 currently need the following tuneables exposed: > > > > > From the SCU: > > > > - Debug UART enable > > - VGA DAC mux > > - VGA scratch registers 0-7 > > - LPC SuperIO decode enable > > - VGA MMIO decode enable > > > > > From the LPC controller: > > > > - iLPC2AHB enable > > - SuperIO scratch registers 0x20-0x2f > > > > (The LPC controller is just as much of a collection of random bits as the SCU) > > > > Lastly, our Palmetto platform uses an AST2400 which has fewer features compared to the AST2500. Its tuneable list is the same as the above with the exception of "Debug UART enable". > > > > Tuneables that we may need to expose in the future include: > > > > > From the SCU: > > > > - PCI VID/DID for the BMC PCIe device > > - VGA device enable (may need to be disabled if the platform contains a discrete graphics processor) > > Additionally there's a bunch of resigters controlling the mapping of > various MMIO regions of the BMC PCIe device to portions of the BMC > address space. I'm not sure what's the best way to handle that. > > This specific set might require a dedicated device as a subnode of > the SCU in the DT that contains all the mappings as properties... > > That or we consider them static enough and just whack it in u-boot. > > > > From the LPC controller: > > > > - UART mux > > > > Alexander, Eugene, can you chime in with your platforms' needs? > > > > Cheers, > > > > Andrew -- Regards, Kun
20.07.2018 03:07, Andrew Jeffery wrote: >> Andrew, can you start with a list that shows what you expect us to need >> on our systems ? >> > Okay, our Witherspoon and Romulus platforms containing the ASPEED AST2500 currently need the following tuneables exposed: > > From the SCU: > - Debug UART enable > - VGA DAC mux > - VGA scratch registers 0-7 > - LPC SuperIO decode enable > - VGA MMIO decode enable > > From the LPC controller: > - iLPC2AHB enable > - SuperIO scratch registers 0x20-0x2f > > (The LPC controller is just as much of a collection of random bits as the SCU) > > Lastly, our Palmetto platform uses an AST2400 which has fewer features compared to the AST2500. Its tuneable list is the same as the above with the exception of "Debug UART enable". > > Tuneables that we may need to expose in the future include: > > From the SCU: > - PCI VID/DID for the BMC PCIe device > - VGA device enable (may need to be disabled if the platform contains a discrete graphics processor) > > From the LPC controller: > - UART mux > > Alexander, Eugene, can you chime in with your platforms' needs? In addition to what you've mentioned, we also need (and I believe you need it too) the SCU3C register (reset control/status) to determine the BMC reset reason and only perform certain actions on cold boots (see https://github.com/openbmc/openbmc/issues/3031). Although, this could probably be handled by a separate driver, don't know which though. Alexander.
diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt new file mode 100644 index 000000000000..2c869fcc7ef2 --- /dev/null +++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt @@ -0,0 +1,252 @@ +BMC Miscellaneous Control Interfaces +==================================== + +Baseboard Management Controllers (BMCs) often have an array of hardware +features that need to be described but are awkward to sensibly expose. + +This bindings document provides a generic mechanism for describing such +features, covering read-only (RO), read-modify-write (RMW) and +write-1-set/write-1-clear (W1SC) semantics. + +All uses of bmc-misc-ctrl must be documented under Valid Uses below. + +The bindings are similar in nature to register-bit-led. + +Required Properties +------------------- + +compatible: Must be "bmc-misc-ctrl" +offset: A one or three cell property describing the registers + associated with the field. + + If the optional property 'set-clear' is not present then the + node describes a register with read-modify-write semantics. The + offset property has one cell describing the register of + interest. + + If the optional property 'set-clear' is present then the node + describes a register set that together implement read, + write-1-set and write-1-clear semantics. The offset property + must be three cells, the first is the address of the register + to read from, the second the write-1-set register and the third + write-1-clear. + +mask: A mask whose set bits represent the bits of the field. +label: The name of the field + +Optional Properties +------------------- + +read-only: Define a read-only field (RMW/W1SC irrelevant). +set-clear: Define whether the field exists in a RMW or W1SC register set +default-value: Single cell applicable to RMW. The field will be updated to the + cell's value. +default-set: For W1SC, set all bits in the field +default-clear: For W1SC, clear all bits in the field + +Valid Uses +---------- + +Description: Control bit for switching the video display DAC mux between + host VGA and BMC CRT mode +Machines: aspeed,ast2500 +Parent: compatible = "aspeed,ast2500-scu", "syscon", "simple-mfd"; +Node: + field@2c.16 { + compatible = "bmc-misc-ctrl"; + offset = <0x2c>; + mask = <0x00030000>; + label = "dac-mux"; + }; + +Description: Host VGA scratch registers +Machines: aspeed,ast2500 +Parent: compatible = "aspeed,ast2500-scu", "syscon", "simple-mfd"; +Node: + field@50.0 { + compatible = "bmc-misc-ctrl"; + offset = <0x50>; + mask = <0xffffffff>; + label = "vga0"; + read-only; + }; + + field@54.0 { + compatible = "bmc-misc-ctrl"; + offset = <0x54>; + mask = <0xffffffff>; + label = "vga1"; + read-only; + }; + + field@58.0 { + compatible = "bmc-misc-ctrl"; + offset = <0x58>; + mask = <0xffffffff>; + label = "vga2"; + read-only; + }; + + field@5c.0 { + compatible = "bmc-misc-ctrl"; + offset = <0x5c>; + mask = <0xffffffff>; + label = "vga3"; + read-only; + }; + + field@60.0 { + compatible = "bmc-misc-ctrl"; + offset = <0x60>; + mask = <0xffffffff>; + label = "vga4"; + read-only; + }; + + field@64.0 { + compatible = "bmc-misc-ctrl"; + offset = <0x64>; + mask = <0xffffffff>; + label = "vga5"; + read-only; + }; + + field@68.0 { + compatible = "bmc-misc-ctrl"; + offset = <0x68>; + mask = <0xffffffff>; + label = "vga6"; + read-only; + }; + + field@6c.0 { + compatible = "bmc-misc-ctrl"; + offset = <0x6c>; + mask = <0xffffffff>; + label = "vga7"; + read-only; + }; + +Description: Super I/O device scratch registers for host/BMC communication +Machines: aspeed,ast2500 +Parent: compatible = "aspeed,ast2500-lpc-host", "simple-mfd", "syscon"; + field@f0.24 { + compatible = "bmc-misc-ctrl"; + offset = <0xf0 24 8>; + mask = <0xff000000>; + label = "sio2b"; + }; + + field@f0.16 { + compatible = "bmc-misc-ctrl"; + offset = <0xf0>; + mask = <0x00ff0000>; + label = "sio2a"; + }; + + field@f0.8 { + compatible = "bmc-misc-ctrl"; + offset = <0xf0>; + mask = <0x0000ff00>; + bit-shift = <8>; + label = "sio29"; + }; + + field@f0.0 { + compatible = "bmc-misc-ctrl"; + offset = <0xf0>; + mask = <0x000000ff>; + label = "sio28"; + }; + + field@f4.24 { + compatible = "bmc-misc-ctrl"; + offset = <0xf4>; + mask = <0xff000000>; + label = "sio2f"; + }; + + field@f4.16 { + compatible = "bmc-misc-ctrl"; + offset = <0xf4>; + mask = <0x00ff0000>; + label = "sio2e"; + }; + + field@f4.8 { + compatible = "bmc-misc-ctrl"; + offset = <0xf4>; + mask = <0x0000ff00>; + label = "sio2d"; + }; + + field@f4.0 { + compatible = "bmc-misc-ctrl"; + offset = <0xf4>; + mask = <0x000000ff>; + label = "sio2c"; + }; + + field@f8.24 { + compatible = "bmc-misc-ctrl"; + offset = <0xf8>; + mask = <0xff000000>; + read-only; + label = "sio23"; + }; + + field@f8.16 { + compatible = "bmc-misc-ctrl"; + offset = <0xf8>; + mask = <0x00ff0000>; + read-only; + label = "sio22"; + }; + + field@f8.8 { + compatible = "bmc-misc-ctrl"; + offset = <0xf8>; + mask = <0x0000ff00>; + read-only; + label = "sio21"; + }; + + field@f8.0 { + compatible = "bmc-misc-ctrl"; + offset = <0xf8>; + mask = <0x000000ff>; + read-only; + label = "sio20"; + }; + + field@fc.24 { + compatible = "bmc-misc-ctrl"; + offset = <0xfc>; + mask = <0xff000000>; + read-only; + label = "sio27"; + }; + + field@fc.16 { + compatible = "bmc-misc-ctrl"; + offset = <0xfc>; + mask = <0x00ff0000>; + read-only; + label = "sio26"; + }; + + field@fc.8 { + compatible = "bmc-misc-ctrl"; + offset = <0xfc>; + mask = <0x0000ff00>; + read-only; + label = "sio25"; + }; + + field@fc.0 { + compatible = "bmc-misc-ctrl"; + offset = <0xfc>; + mask = <0x000000ff>; + read-only; + label = "sio24"; + }; diff --git a/MAINTAINERS b/MAINTAINERS index 07d1576fc766..fa2033a522f2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2736,6 +2736,12 @@ S: Supported F: drivers/net/bonding/ F: include/uapi/linux/if_bonding.h +BMC MISCELLANEOUS CONTROL FIELDS +R: Andrew Jeffery <andrew@aj.id.au> +L: openbmc@lists.ozlabs.org (moderated for non-subscribers) +S: Supported +F: Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt + BPF (Safe dynamic programs and tools) M: Alexei Starovoitov <ast@kernel.org> M: Daniel Borkmann <daniel@iogearbox.net>
Baseboard Management Controllers (BMCs) are embedded SoCs that exist to provide remote management of (primarily) server platforms. BMCs are often tightly coupled to the platform in terms of behaviour and provide many hardware features integral to booting and running the host system. Some of these hardware features are simple, for example scratch registers provided by the BMC that are exposed to both the host and the BMC. In other cases there's a single bit switch to enable or disable some of the provided functionality. The documentation defines bindings for fields in registers that do not integrate well into other driver models yet must be described to allow the BMC kernel to assume control of these features. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> --- Since RFC v1: * Add a commit message * Minor changes to documented labels .../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++++++++++ MAINTAINERS | 6 + 2 files changed, 258 insertions(+) create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt