mbox series

[0/3] Apple M1 clock gate driver

Message ID 20210524182745.22923-1-sven@svenpeter.dev (mailing list archive)
Headers show
Series Apple M1 clock gate driver | expand

Message

Sven Peter May 24, 2021, 6:27 p.m. UTC
Hi,

This series adds support for the clock gates present in Apple's SoCs such as
the M1.

These SoCs have various clock gates for their peripherals usually located in
one or two MMIO regions. Each clock gate is a single 32 bit register which
contains bits for the requested and the actual mode. The mode is represented by
four bits. We however only care about "everything enabled" (0b1111) and
"everything disabled" (0b000) for now. The other modes are probably different
low-power states which don't even seem to be used by MacOS. The actual power
management is located in a different region (and probably handled by a separate
processor on the M1).

Additionally, these clocks have a topology that is usually specified in Apple's
Device Tree. As far as I can tell most Linux drivers seem to encode this topology
directly in the source code though. Despite this, we would still like to encode
the topology in the device tree itself:

Apple seems to only change HW blocks when they have a very good reason and even
then they seem to keep the changes minimal. This specific clock gate MMIO block
has already been used in the Apple A7 released in 2013. The only differences
since then are the available clocks (which can be identified by a simple offset)
and their topology.

It's likely that Apple will still use this block in future SoCs as well. By
encoding the clock gate topology inside the device tree as well we can use a
single driver and only need updates to the device tree once they are released.
This might allow end users to already run their favorite distribution by just
updating the bootloader with a new device tree instead of having to wait until
the new topology is integrated into their distro kernel.

Additionally, the driver itself can be kept very simple and not much code needs
to be duplicated and then updated for each new SoC between various consumers of
these device tree nodes (Linux, u-boot, and OpenBSD for now).


This series therefore creates a single device tree node for each clock gate.
This unfortunately maps a whole page out of which only a single register will
be used for each node.

The other alternative I considered was to create a syscon/simple-mfd node
and have the clock nodes as children. The gates would then use a regmap which
would only require a single entry in the pagetable for all clocks. I decided
against this option since the clock gate MMIO region actually isn't a
multi-function device.

I'll also gladly implement other solutions - especially if there is a way to
keep the clock toplogy inside the device tree without having to create a
pagetable entry for every single clock node.

Best,


Sven

Sven Peter (3):
  dt-bindings: clock: add DT bindings for apple,gate-clock
  clk: add support for gate clocks on Apple SoCs
  arm64: apple: add uart gate clocks

 .../bindings/clock/apple,gate-clock.yaml      |  60 +++++++
 MAINTAINERS                                   |   2 +
 arch/arm64/boot/dts/apple/t8103.dtsi          |  36 ++++-
 drivers/clk/Kconfig                           |  12 ++
 drivers/clk/Makefile                          |   1 +
 drivers/clk/clk-apple-gate.c                  | 152 ++++++++++++++++++
 6 files changed, 262 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/clock/apple,gate-clock.yaml
 create mode 100644 drivers/clk/clk-apple-gate.c

Comments

Rob Herring May 25, 2021, 5:41 p.m. UTC | #1
On Mon, May 24, 2021 at 1:28 PM Sven Peter <sven@svenpeter.dev> wrote:
>
>
> Hi,
>
> This series adds support for the clock gates present in Apple's SoCs such as
> the M1.
>
> These SoCs have various clock gates for their peripherals usually located in
> one or two MMIO regions. Each clock gate is a single 32 bit register which
> contains bits for the requested and the actual mode. The mode is represented by
> four bits. We however only care about "everything enabled" (0b1111) and
> "everything disabled" (0b000) for now. The other modes are probably different
> low-power states which don't even seem to be used by MacOS. The actual power
> management is located in a different region (and probably handled by a separate
> processor on the M1).
>
> Additionally, these clocks have a topology that is usually specified in Apple's
> Device Tree. As far as I can tell most Linux drivers seem to encode this topology
> directly in the source code though. Despite this, we would still like to encode
> the topology in the device tree itself:

We only define leaf clocks primarily. There's some exceptions such as
if PLLs are a separate h/w block. The reason for this is because
typical SoCs have 100s of just leaf clocks. If we tried to model
everything, it would never be complete nor correct. Actually, we did
try that at first.

> Apple seems to only change HW blocks when they have a very good reason and even
> then they seem to keep the changes minimal. This specific clock gate MMIO block
> has already been used in the Apple A7 released in 2013. The only differences
> since then are the available clocks (which can be identified by a simple offset)
> and their topology.

Clock gates are easy. What about muxes, dividers, etc.?

> It's likely that Apple will still use this block in future SoCs as well. By
> encoding the clock gate topology inside the device tree as well we can use a
> single driver and only need updates to the device tree once they are released.
> This might allow end users to already run their favorite distribution by just
> updating the bootloader with a new device tree instead of having to wait until
> the new topology is integrated into their distro kernel.
>
> Additionally, the driver itself can be kept very simple and not much code needs
> to be duplicated and then updated for each new SoC between various consumers of
> these device tree nodes (Linux, u-boot, and OpenBSD for now).
>
>
> This series therefore creates a single device tree node for each clock gate.
> This unfortunately maps a whole page out of which only a single register will
> be used for each node.
>
> The other alternative I considered was to create a syscon/simple-mfd node
> and have the clock nodes as children. The gates would then use a regmap which
> would only require a single entry in the pagetable for all clocks. I decided
> against this option since the clock gate MMIO region actually isn't a
> multi-function device.

I would do a single node per mmio region with the register offset (or
offset / 4) being the clock id. This can still support new SoCs easily
if you have a fallback compatible. If you want/need to get all the
clocks, just walk the DT 'clocks' properties and extract all the IDs.

Rob
Tony Lindgren May 26, 2021, 7:18 a.m. UTC | #2
Hi,

* Rob Herring <robh+dt@kernel.org> [210525 18:09]:
> I would do a single node per mmio region with the register offset (or
> offset / 4) being the clock id. This can still support new SoCs easily
> if you have a fallback compatible. If you want/need to get all the
> clocks, just walk the DT 'clocks' properties and extract all the IDs.

I mostly agree.. Except I'd also leave out the artificial clock ID and
just use real register offsets from the clock controller base instead.

So a single clock controller node for each MMIO range, then set
#clock=cells = <1>. Then the binding follows what we have for the
interrupts-extended binding for example.

If the clock controller optionally needs some data in the dts,
that can be added to the clock controller node. Or it can be driver
internal built-in data. If the data for dts can be described in a
generic way, even better :)

This would make the consumer interface look like below with a
clock controller node and register offset from it:

clocks = <&clock_controller1 0x1234>;

Regards,

Tony
Sven Peter May 30, 2021, 11:05 a.m. UTC | #3
On Tue, May 25, 2021, at 19:41, Rob Herring wrote:
> On Mon, May 24, 2021 at 1:28 PM Sven Peter <sven@svenpeter.dev> wrote:
> >
> > Additionally, these clocks have a topology that is usually specified in Apple's
> > Device Tree. As far as I can tell most Linux drivers seem to encode this topology
> > directly in the source code though. Despite this, we would still like to encode
> > the topology in the device tree itself:
> 
> We only define leaf clocks primarily. There's some exceptions such as
> if PLLs are a separate h/w block. The reason for this is because
> typical SoCs have 100s of just leaf clocks. If we tried to model
> everything, it would never be complete nor correct. Actually, we did
> try that at first.

Makes sense.

For Apple SoCs specifically most topology seems to documented (with names!)
in the arm-io/pmgr device node of their ADT.

> 
> > Apple seems to only change HW blocks when they have a very good reason and even
> > then they seem to keep the changes minimal. This specific clock gate MMIO block
> > has already been used in the Apple A7 released in 2013. The only differences
> > since then are the available clocks (which can be identified by a simple offset)
> > and their topology.
> 
> Clock gates are easy. What about muxes, dividers, etc.?

These are preconfigured by iBoot long before we get control and I'm not
sure we will be able to figure out how they work in detail. All we know
are their frequencies and we will probably have to just treat them as
fixed frequency clocks.

Some details:

In Apple's device tree, each node can have a "clock-gates" or "power-gates"
property which refer to clock gates required to enable the peripheral.
These are implemented with this small driver and required to make most of
the hardware work. Most of these are turned off by the time we get control.

Additionally, there is a "clocks" property that either refers to one of
eight or so static clocks (starting at index 0) which are not backed by any
configurable PLL, mux or divider. On top of that there is a list of what I
believe to be mux/divider clocks (index >0x100) specified by the
clock-frequencies and clock-frequencies-regs properties of the arm-io node.
These registers are in a separate MMIO region and XNU only seems to use
clock-frequencies to find the clock rate. It never even reads
clock-frequencies-regs.

While I can guess some of the bits of these registers (I believe I know the
enable bit and the divisor field and possibly the input select but I do not
know the inputs and believe they are actually different for each clock) I
don't see a need to implement them right now. If we need them in the future
they'd probably be a separate node anyway since they are all also located in
an entirely different MMIO region and not related to these gates at all.


> 
> > It's likely that Apple will still use this block in future SoCs as well. By
> > encoding the clock gate topology inside the device tree as well we can use a
> > single driver and only need updates to the device tree once they are released.
> > This might allow end users to already run their favorite distribution by just
> > updating the bootloader with a new device tree instead of having to wait until
> > the new topology is integrated into their distro kernel.
> >
> > Additionally, the driver itself can be kept very simple and not much code needs
> > to be duplicated and then updated for each new SoC between various consumers of
> > these device tree nodes (Linux, u-boot, and OpenBSD for now).
> >
> >
> > This series therefore creates a single device tree node for each clock gate.
> > This unfortunately maps a whole page out of which only a single register will
> > be used for each node.
> >
> > The other alternative I considered was to create a syscon/simple-mfd node
> > and have the clock nodes as children. The gates would then use a regmap which
> > would only require a single entry in the pagetable for all clocks. I decided
> > against this option since the clock gate MMIO region actually isn't a
> > multi-function device.
> 
> I would do a single node per mmio region with the register offset (or
> offset / 4) being the clock id. This can still support new SoCs easily
> if you have a fallback compatible. If you want/need to get all the
> clocks, just walk the DT 'clocks' properties and extract all the IDs.
> 

The problem with that approach is that to enable e.g. UART_0 we actually need
to enable its parents as well, e.g. the Apple Device Tree for the M1 has the
following clock topology:

    UART0  (0x23b700270), parent: UART_P
    UART_P (0x23b700220), parent: SIO
    SIO    (0x23b7001c0), parent: n/a

The offsets and the parent/child relationship for all of these three clocks
change between SoCs. If I now use the offset as the clock id I still need
to specify that if e.g. UART uses <&clk_controller 0x270> I first need
to enable 0x1c0 and then 0x220 and only then 0x270.

I can hardcode this in the driver itself by just selecting the appropriate table
depending on the compatible but I think we're then back to requiring a code
change for new SoCs. Just using the fallback compatible won't be enough unless
I introduce dummy nodes that enable UART_P and SIO as well.

If I do it like this I'll also need two compatibles for the two MMIO regions
since they have different child/parent clocks in there. Any suggestions for
how to call those two then? Just t8103-gate-clock-0/1? 

I'd still prefer to somehow put this information into the device tree since
then only a single compatible that will likely work across many SoC generations
is required. I'm not sure how to do this though.


Best,


Sven
Sven Peter May 30, 2021, 11:08 a.m. UTC | #4
Hi,

On Wed, May 26, 2021, at 09:18, Tony Lindgren wrote:
> Hi,
> 
> * Rob Herring <robh+dt@kernel.org> [210525 18:09]:
> > I would do a single node per mmio region with the register offset (or
> > offset / 4) being the clock id. This can still support new SoCs easily
> > if you have a fallback compatible. If you want/need to get all the
> > clocks, just walk the DT 'clocks' properties and extract all the IDs.
> 
> I mostly agree.. Except I'd also leave out the artificial clock ID and
> just use real register offsets from the clock controller base instead.

Sure, I'll do that.

> 
> So a single clock controller node for each MMIO range, then set
> #clock=cells = <1>. Then the binding follows what we have for the
> interrupts-extended binding for example.
> 
> If the clock controller optionally needs some data in the dts,
> that can be added to the clock controller node. Or it can be driver
> internal built-in data. If the data for dts can be described in a
> generic way, even better :)

Now the big question is *how* to describe this additional data in the
dts. Essentially I need to specify that e.g. to enable clock 0x270
I first need to enable the (internal) clocks 0x1c0 and then 0x220.
Are you aware of any generic way to describe this? I'm not even sure
how a sane non-generic way would look like when I just have a single
clock controller node.



Best,

Sven
Tony Lindgren June 2, 2021, 9:26 a.m. UTC | #5
* Sven Peter <sven@svenpeter.dev> [210530 11:09]:
> Hi,
> 
> On Wed, May 26, 2021, at 09:18, Tony Lindgren wrote:
> > Hi,
> > 
> > * Rob Herring <robh+dt@kernel.org> [210525 18:09]:
> > > I would do a single node per mmio region with the register offset (or
> > > offset / 4) being the clock id. This can still support new SoCs easily
> > > if you have a fallback compatible. If you want/need to get all the
> > > clocks, just walk the DT 'clocks' properties and extract all the IDs.
> > 
> > I mostly agree.. Except I'd also leave out the artificial clock ID and
> > just use real register offsets from the clock controller base instead.
> 
> Sure, I'll do that.
> 
> > 
> > So a single clock controller node for each MMIO range, then set
> > #clock=cells = <1>. Then the binding follows what we have for the
> > interrupts-extended binding for example.
> > 
> > If the clock controller optionally needs some data in the dts,
> > that can be added to the clock controller node. Or it can be driver
> > internal built-in data. If the data for dts can be described in a
> > generic way, even better :)
> 
> Now the big question is *how* to describe this additional data in the
> dts. Essentially I need to specify that e.g. to enable clock 0x270
> I first need to enable the (internal) clocks 0x1c0 and then 0x220.
> Are you aware of any generic way to describe this? I'm not even sure
> how a sane non-generic way would look like when I just have a single
> clock controller node.

To me it seems you might be able to recycle the assigned-clocks and
assigned-clock-parents etc properties in the clock controller node.

Sure the assigned-clocks property will point to clocks in the
clock controller itself, and will have tens of entries, but should
work :)

And sounds like you can generate that list with some script from the
Apple dtb.

Regards,

Tony
Tony Lindgren June 2, 2021, 9:28 a.m. UTC | #6
* Sven Peter <sven@svenpeter.dev> [210530 11:11]:
> The problem with that approach is that to enable e.g. UART_0 we actually need
> to enable its parents as well, e.g. the Apple Device Tree for the M1 has the
> following clock topology:
> 
>     UART0  (0x23b700270), parent: UART_P
>     UART_P (0x23b700220), parent: SIO
>     SIO    (0x23b7001c0), parent: n/a
> 
> The offsets and the parent/child relationship for all of these three clocks
> change between SoCs. If I now use the offset as the clock id I still need
> to specify that if e.g. UART uses <&clk_controller 0x270> I first need
> to enable 0x1c0 and then 0x220 and only then 0x270.

Maybe take a look what I suggested on using assigned-clocks and related
properties in the clock controller node. That might solve the issue in
a generic way for other SoCs too.

Regards,

Tony
Sven Peter June 3, 2021, 12:55 p.m. UTC | #7
Hi Tony,

On Wed, Jun 2, 2021, at 11:26, Tony Lindgren wrote:
> * Sven Peter <sven@svenpeter.dev> [210530 11:09]:
> > 
> > Now the big question is *how* to describe this additional data in the
> > dts. Essentially I need to specify that e.g. to enable clock 0x270
> > I first need to enable the (internal) clocks 0x1c0 and then 0x220.
> > Are you aware of any generic way to describe this? I'm not even sure
> > how a sane non-generic way would look like when I just have a single
> > clock controller node.
> 
> To me it seems you might be able to recycle the assigned-clocks and
> assigned-clock-parents etc properties in the clock controller node.
> 
> Sure the assigned-clocks property will point to clocks in the
> clock controller itself, and will have tens of entries, but should
> work :)

Thanks for the suggestion, I really like that idea!


If I understand the assigned-clocks-parent property correctly it's meant to select
one of the possible parents of e.g. a mux clock at startup. Internally it just
uses clk_set_parent which, unless I'm mistaken, would require to assign each
clock as a possible parent of every other clock.

Now the good news is that Apple seems to have sorted the clocks topologically
on the bus, i.e. there never is a clock at address X that requires a parent
with an address bigger than X. 
The bad news is that I'd probably still have to setup clocks at 0x0, 0x4,
0x8, ..., X-0x4 as possible parents of the clock at X.

Another possibility this made me think of is to instead just use the clocks
property the way it's usually used and simply refer to the controller itself, e.g.

#define APPLE_CLK_UART0  0x270
#define APPLE_CLK_UART_P 0x220
#define APPLE_CLK_SIO    0x1c0

pmgr0: clock-controller@23b700000 {
        compatible = "apple,t8103-gate-clock";
        #clock-cells = <1>;
        reg = <0x2 0x3b700000 0x0 0x4000>;
        clock-indices = <APPLE_CLK_SIO>, <APPLE_CLK_UART_P>, <APPLE_CLK_UART0>;
        clock-output-names = "clock-sio", "clock-uart-", "clock-uart0";
        clocks = <&some_dummy_root_clock>, <&pmgr0 APPLE_CLK_SIO>,
                 <&pmgr0 APPLE_CLK_UART_P>;
};

to represent the UART clocks

    UART0  (0x23b700270), parent: UART_P
    UART_P (0x23b700220), parent: SIO
    SIO    (0x23b7001c0), parent: n/a

and then have the consumer as

serial0: serial@235200000 {
    // ...
    clocks = <&pmgr0 APPLE_CLK_UART0>, <&clk24>;
    clock-names = "uart", "clk_uart_baud0";
    // ...
};


I'd have to see if it's possible to implement this sanely though if this binding
was acceptable. The self-reference to a node that hasn't been fully initialized
yet may turn out to be ugly.


> 
> And sounds like you can generate that list with some script from the
> Apple dtb.

Yup, absolutely. I don't want to write this all out by hand if I can avoid
that :-)

> 
> Regards,
> 
> Tony
> 

Thanks,


Sven
Tony Lindgren June 4, 2021, 7:43 a.m. UTC | #8
Hi,

* Sven Peter <sven@svenpeter.dev> [210603 12:56]:
> Another possibility this made me think of is to instead just use the clocks
> property the way it's usually used and simply refer to the controller itself, e.g.
> 
> #define APPLE_CLK_UART0  0x270
> #define APPLE_CLK_UART_P 0x220
> #define APPLE_CLK_SIO    0x1c0
> 
> pmgr0: clock-controller@23b700000 {
>         compatible = "apple,t8103-gate-clock";
>         #clock-cells = <1>;
>         reg = <0x2 0x3b700000 0x0 0x4000>;
>         clock-indices = <APPLE_CLK_SIO>, <APPLE_CLK_UART_P>, <APPLE_CLK_UART0>;
>         clock-output-names = "clock-sio", "clock-uart-", "clock-uart0";
>         clocks = <&some_dummy_root_clock>, <&pmgr0 APPLE_CLK_SIO>,
>                  <&pmgr0 APPLE_CLK_UART_P>;
> };

How about the following where you set up the gate clocks as separate child nodes:

pmgr0: clock-controller@23b700000 {
	compatible = "apple,foo-clock-controller";
	#clock-cells = <1>;
	reg = <0x2 0x3b700000 0x0 0x4000>;

	clk_uart0: clock@270 {
		compatible = "apple,t8103-gate-clock";
		#clock-cells = <0>;
		assigned-clock-parents = <&pmgr0 APPLE_CLK_SIO>,
					 <&pmgr0 APPLE_CLK_UART_P>;
		// ...
	};

};

Keep the clock controller still addressable by offset from base as discussed,
and additionally have the driver parse and set up the child node clocks.

Then I think the consumer driver can just do:

serial0: serial@235200000 {
	// ...
	clocks = <&clk_uart0>, <&clk24>;
	clock-names = "uart", "clk_uart_baud0";
	// ...
};

Regards,

Tony
Sven Peter June 5, 2021, 12:12 p.m. UTC | #9
Hi Tony,

On Fri, Jun 4, 2021, at 09:43, Tony Lindgren wrote:
> Hi,
> 
> How about the following where you set up the gate clocks as separate 
> child nodes:
> 
> pmgr0: clock-controller@23b700000 {
> 	compatible = "apple,foo-clock-controller";
> 	#clock-cells = <1>;
> 	reg = <0x2 0x3b700000 0x0 0x4000>;
> 
> 	clk_uart0: clock@270 {
> 		compatible = "apple,t8103-gate-clock";
> 		#clock-cells = <0>;
> 		assigned-clock-parents = <&pmgr0 APPLE_CLK_SIO>,
> 					 <&pmgr0 APPLE_CLK_UART_P>;
> 		// ...
> 	};
> 
> };
> 
> Keep the clock controller still addressable by offset from base as discussed,
> and additionally have the driver parse and set up the child node clocks.

Nice, I like that one even better! We can keep the internal clocks "hidden"
inside the parent node and only need to model the actual consumer clocks
as separate nodes.

Are you aware of any clock driver that implements something similar?
I'd like to avoid reinventing the wheel if it's already been done before.

> 
> Then I think the consumer driver can just do:
> 
> serial0: serial@235200000 {
> 	// ...
> 	clocks = <&clk_uart0>, <&clk24>;
> 	clock-names = "uart", "clk_uart_baud0";
> 	// ...
> };
> 
> Regards,
> 
> Tony
> 

Best,

Sven
Tony Lindgren June 6, 2021, 5:59 a.m. UTC | #10
* Sven Peter <sven@svenpeter.dev> [210605 12:13]:
> Hi Tony,
> 
> On Fri, Jun 4, 2021, at 09:43, Tony Lindgren wrote:
> > Hi,
> > 
> > How about the following where you set up the gate clocks as separate 
> > child nodes:
> > 
> > pmgr0: clock-controller@23b700000 {
> > 	compatible = "apple,foo-clock-controller";
> > 	#clock-cells = <1>;
> > 	reg = <0x2 0x3b700000 0x0 0x4000>;
> > 
> > 	clk_uart0: clock@270 {
> > 		compatible = "apple,t8103-gate-clock";
> > 		#clock-cells = <0>;
> > 		assigned-clock-parents = <&pmgr0 APPLE_CLK_SIO>,
> > 					 <&pmgr0 APPLE_CLK_UART_P>;
> > 		// ...
> > 	};
> > 
> > };
> > 
> > Keep the clock controller still addressable by offset from base as discussed,
> > and additionally have the driver parse and set up the child node clocks.
>
> Nice, I like that one even better! We can keep the internal clocks "hidden"
> inside the parent node and only need to model the actual consumer clocks
> as separate nodes.

I guess the child nodes could also use just a clocks property above
instead of assigned-clock related properties if there are no configurable
source clock mux registers.

> Are you aware of any clock driver that implements something similar?
> I'd like to avoid reinventing the wheel if it's already been done before.

I'm only aware of a partial implementation so far :) The "offset from
clock controller base" approach has worked well for the TI clkctrl
clocks. The clkctrl gate clocks still have the SoC specific source
clock data build into the clock driver(s).

That's the Documentation/devicetree/bindings/clock/ti-clkctrl.txt
binding. For the clkctrl clocks, the SoC specific source clock data
is in drivers/clk/ti/clk-*.c files.

With the Apple dtb describing the gate clock parents, you might be
able to leave out most of the SoC specific built-in driver data
sounds like.

Regards,

Tony