diff mbox

[04/10] bus: introduce an Marvell EBU MBus driver

Message ID 1362577426-12804-5-git-send-email-thomas.petazzoni@free-electrons.com (mailing list archive)
State New, archived
Headers show

Commit Message

Thomas Petazzoni March 6, 2013, 1:43 p.m. UTC
The Marvell EBU SoCs have a configurable physical address space
layout: the physical ranges of memory used to address PCI(e)
interfaces, NOR flashes, SRAM and various other types of memory are
configurable by software, through a mechanism of so-called 'address
decoding windows'.

This new driver mvebu-mbus consolidates the existing code to address
the configuration of these memory ranges, which is spread into
mach-mvebu, mach-orion5x, mach-mv78xx0, mach-dove and mach-kirkwood.

Following patches convert each Marvell EBU SoC family to use this
driver, therefore removing the old code that was configuring the
address decoding windows.

It is worth mentioning that the MVEBU_MBUS Kconfig option is
intentionally added as a blind option. The new driver implements and
exports the mv_mbus_dram_info() function, which is used by various
Marvell drivers throughout the tree to get access to window
configuration parameters that they require. This function is also
implemented in arch/arm/plat-orion/addr-map.c, which ultimately gets
removed at the end of this patch series. So, in order to preserve
bisectability, we want to ensure that *either* this new driver, *or*
the legacy code in plat-orion/addr-map.c gets compiled in.

By making MVEBU_MBUS a blind option, we are sure that only a platform
that does 'select MVEBU_MBUS' will get this new driver compiled
in. Therefore, throughout the next patches that convert the Marvell
sub-architectures one after the other to this new driver, we add the
'select MVEBU_MBUS' and also ensure to remove plat-orion/addr-map.c
from the build for this specific sub-architecture. This ensures that
bisectability is preserved.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 .../devicetree/bindings/arm/mvebu-mbus.txt         |   39 +
 drivers/bus/Kconfig                                |    7 +
 drivers/bus/Makefile                               |    1 +
 drivers/bus/mvebu-mbus.c                           |  821 ++++++++++++++++++++
 include/linux/mbus.h                               |   25 +-
 5 files changed, 892 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/arm/mvebu-mbus.txt
 create mode 100644 drivers/bus/mvebu-mbus.c

Comments

Jason Gunthorpe March 6, 2013, 7:08 p.m. UTC | #1
On Wed, Mar 06, 2013 at 02:43:40PM +0100, Thomas Petazzoni wrote:

> The Marvell EBU SoCs have a configurable physical address space
> layout: the physical ranges of memory used to address PCI(e)
> interfaces, NOR flashes, SRAM and various other types of memory are
> configurable by software, through a mechanism of so-called 'address
> decoding windows'.

This is a nice forward step improvement, but I think it would be nicer
to read the HW specific information from the DT rather than encoding
it in tables in the driver. It would make the driver smaller and
simpler..

> +static const struct mvebu_mbus_mapping armada_370_map[] = {
> +	MAPDEF("bootrom",     1, 0xe0, MAPDEF_NOMASK),
> +	MAPDEF("devbus-boot", 1, 0x2f, MAPDEF_NOMASK),
> +	MAPDEF("devbus-cs0",  1, 0x3e, MAPDEF_NOMASK),
> +	MAPDEF("devbus-cs1",  1, 0x3d, MAPDEF_NOMASK),
> +	MAPDEF("devbus-cs2",  1, 0x3b, MAPDEF_NOMASK),
> +	MAPDEF("devbus-cs3",  1, 0x37, MAPDEF_NOMASK),
> +	MAPDEF("pcie0.0",     4, 0xe0, MAPDEF_PCIMASK),
> +	MAPDEF("pcie1.0",     8, 0xe0, MAPDEF_PCIMASK),
> +	{},
> +};

These tables could be encoded using the approach Arnd described:

/* 2 adress cell value with MAPDEF value (target attributes) in the
   first cell and 0 in the 2nd. */
#define MAPDEF(t,a,m) ((t<<16) | (a<<8) | m) 0

mbus {
   compatible = "marvell,mbus";
   #address-cells = <2>;
   #size-cells = <1>;
   regs = <...>

   windows = <0 1 2 3 4>;
   remappable-windows = <7 8 9>;
   internal-window = 5;

   // Names and numbers made up for illistration
   // The 2nd column is where to place the MBUS target in the CPU address map
   ranges < MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000 // boot rom
            MAPDEF(1, 0x3f, MAPDEF_NOMASK)  0xFD000000  0x10000 // devbus-boot
            MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0xFF000000  0x10000 // internal-regs
             [..]

   // MBUS target for internal registers
   internal_regs {
       compatible = "simple-bus";
       ranges <MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0  0x10000>;
       #address-cells = <1>;
       #size-cells = <1>;

       // Serial controller at offset 0x3000 from the start of internal registers
       serial@0x3000 {
           reg = <0x3000 0x100>;
       }
   }

   // MBUS target for boot rom
   bootrom {
       compatible = "simple-bus";
       ranges <MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0  0x10000>;
       #address-cells = <1>;
       #size-cells = <1>;
   }
}

Where:
  - The top ranges is the table 'mvebu_mbus_mapping', combined with
    the board specific code that sets the target CPU address. The
    mbus driver can directly setup mappings without requiring board
    support.
  - Something like the 'bootrom' bus is where devices under that
    window would be placed.

This also replaces code like this from the board files:

       mvebu_mbus_add_window("nand", KIRKWOOD_NAND_MEM_PHYS_BASE,
                             KIRKWOOD_NAND_MEM_SIZE);
       mvebu_mbus_add_window("sram", KIRKWOOD_SRAM_PHYS_BASE,
                             KIRKWOOD_SRAM_SIZE);

And lets us make the DT self-consistent with the value of
KIRKWOOD_SRAM_PHYS_BASE and others

> + * Some variants of Orion5x have 4 remappable windows, some other have
> + * only two of them.
> + */
> +static const struct mvebu_mbus_soc_data orion5x_4win_mbus_data = {
> +	.num_wins            = 8,
> +	.num_remappable_wins = 4,

These values also seem like something worth reading from the DT as
well.. See above for an idea.

Jason
Thomas Petazzoni March 6, 2013, 7:27 p.m. UTC | #2
Dear Jason Gunthorpe,

On Wed, 6 Mar 2013 12:08:21 -0700, Jason Gunthorpe wrote:

> This is a nice forward step improvement, but I think it would be nicer
> to read the HW specific information from the DT rather than encoding
> it in tables in the driver. It would make the driver smaller and
> simpler..
> 
> > +static const struct mvebu_mbus_mapping armada_370_map[] = {
> > +	MAPDEF("bootrom",     1, 0xe0, MAPDEF_NOMASK),
> > +	MAPDEF("devbus-boot", 1, 0x2f, MAPDEF_NOMASK),
> > +	MAPDEF("devbus-cs0",  1, 0x3e, MAPDEF_NOMASK),
> > +	MAPDEF("devbus-cs1",  1, 0x3d, MAPDEF_NOMASK),
> > +	MAPDEF("devbus-cs2",  1, 0x3b, MAPDEF_NOMASK),
> > +	MAPDEF("devbus-cs3",  1, 0x37, MAPDEF_NOMASK),
> > +	MAPDEF("pcie0.0",     4, 0xe0, MAPDEF_PCIMASK),
> > +	MAPDEF("pcie1.0",     8, 0xe0, MAPDEF_PCIMASK),
> > +	{},
> > +};
> 
> These tables could be encoded using the approach Arnd described:
> 
> /* 2 adress cell value with MAPDEF value (target attributes) in the
>    first cell and 0 in the 2nd. */
> #define MAPDEF(t,a,m) ((t<<16) | (a<<8) | m) 0
> 
> mbus {
>    compatible = "marvell,mbus";
>    #address-cells = <2>;
>    #size-cells = <1>;
>    regs = <...>
> 
>    windows = <0 1 2 3 4>;
>    remappable-windows = <7 8 9>;
>    internal-window = 5;
> 
>    // Names and numbers made up for illistration
>    // The 2nd column is where to place the MBUS target in the CPU address map
>    ranges < MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000 // boot rom
>             MAPDEF(1, 0x3f, MAPDEF_NOMASK)  0xFD000000  0x10000 // devbus-boot
>             MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0xFF000000  0x10000 // internal-regs
>              [..]
> 
>    // MBUS target for internal registers
>    internal_regs {
>        compatible = "simple-bus";
>        ranges <MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0  0x10000>;
>        #address-cells = <1>;
>        #size-cells = <1>;
> 
>        // Serial controller at offset 0x3000 from the start of internal registers
>        serial@0x3000 {
>            reg = <0x3000 0x100>;
>        }
>    }
> 
>    // MBUS target for boot rom
>    bootrom {
>        compatible = "simple-bus";
>        ranges <MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0  0x10000>;
>        #address-cells = <1>;
>        #size-cells = <1>;
>    }
> }
> 
> Where:
>   - The top ranges is the table 'mvebu_mbus_mapping', combined with
>     the board specific code that sets the target CPU address. The
>     mbus driver can directly setup mappings without requiring board
>     support.
>   - Something like the 'bootrom' bus is where devices under that
>     window would be placed.
> 
> This also replaces code like this from the board files:
> 
>        mvebu_mbus_add_window("nand", KIRKWOOD_NAND_MEM_PHYS_BASE,
>                              KIRKWOOD_NAND_MEM_SIZE);
>        mvebu_mbus_add_window("sram", KIRKWOOD_SRAM_PHYS_BASE,
>                              KIRKWOOD_SRAM_SIZE);
> 
> And lets us make the DT self-consistent with the value of
> KIRKWOOD_SRAM_PHYS_BASE and others

I personally dislike this proposal quite a lot. I believe it puts way
too many things into the Device Tree. Seems like these days, people
want to write thousands of lines of data in Device Trees rather than
having drivers properly abstract the differences between SoC.

Also, I don't believe that windows should be described in the Device
Tree. My long term plan is rather to make the allocation of the base
address of each window entirely dynamic, because there is absolutely no
reason for those addresses to be fixed in stone in the Device Tree.

> 
> > + * Some variants of Orion5x have 4 remappable windows, some other have
> > + * only two of them.
> > + */
> > +static const struct mvebu_mbus_soc_data orion5x_4win_mbus_data = {
> > +	.num_wins            = 8,
> > +	.num_remappable_wins = 4,
> 
> These values also seem like something worth reading from the DT as
> well.. See above for an idea.

No: there will anyway be different functions to be called depending on
the SoC family. So the driver will anyway have to have different
compatible strings for the different SoC families, because we can't put
code into the Device Tree. So rather than having some of the
SoC-specific data encoded in the DT and some of the SoC-specific code
in the driver, I'd rather have all the SoC-specific things (data *and*
code) into the driver, and leave the DT only the responsibility of
telling which SoC variant we're running on, and where the window
registers are.

Another reason is that the DT bindings become part of the kernel ABI,
so they can't be changed once they are defined and have started to be
used seriously on production devices. This leads to a very simple
choice: limit the amount of things we put in the DT, and instead put
more of the internal logic into the driver.

So this was for the fundamental disagreement.

No, I also have a temporary disagreement: I think what you're asking is
way too far away from the current code. I've already been asked to make
gazillions of cleanups and changes to get the PCIe driver in, and I
don't think it's honest and reasonable to ask me to redo the entire
Marvell world just to get the PCIe driver in. I've already refactored
this address decoding stuff, which should make future evolutions much
easier. I've also taken care of migrating all the legacy SoC families.
So now asking me to rework the entire thing in one step, as a
requirement to get the PCIe stuff in, it's really going beyond what's
reasonable, I feel.

Best regards,

Thomas
Jason Gunthorpe March 6, 2013, 8:24 p.m. UTC | #3
On Wed, Mar 06, 2013 at 08:27:10PM +0100, Thomas Petazzoni wrote:

> I personally dislike this proposal quite a lot. I believe it puts way
> too many things into the Device Tree. Seems like these days, people
> want to write thousands of lines of data in Device Trees rather than
> having drivers properly abstract the differences between SoC.

Well, in this case I think it is fairly reasonable because it means
the mbus driver could work with all current and future Marvell chips
that are based on this architecture, rather than having to fiddle with
the driver every time.. 

Part of the point of all this DT stuff is to give HW designers some
reasonable flexability in their HW design without requiring code
changes to Linux to boot it. So if Marvell makes a new Armada varient
that re-uses all the same pre existing IP, just with different mbus
targets, they should be able to make Linux run on it simply by writing
a SOC specific firmware and DT.

> Also, I don't believe that windows should be described in the Device
> Tree. My long term plan is rather to make the allocation of the base
> address of each window entirely dynamic, because there is absolutely no
> reason for those addresses to be fixed in stone in the Device Tree.

How will that work? Device tree necessarily must have CPU addresses
for the things it describes. Today we have the problem that those
addresses must match the values hard coded into Linux, ie DT is
describing what Linux expects, not what the HW provides..

OF was targetted at the case where the firmware would set the windows
and then the DT is simply the way to communicate what the window
settings are to the consumer.

This idea that Linux should setup windows on its own has moved away
from what OF was intended for, I'm not sure there is an overarching
plan on how to handle those cases - but hardcoding addresses in Linux
and then requiring the DT to match them exactly certainly seems wrong.

As for dynamic allocation, they way to do it via DT is through the
ranges property, similar to how I noted. The DT can be modified
in-memory, so a dyanmic mbus driver would ignore the CPU target
address in the ranges, select its own CPU address and then update the
in-memory DT with the corrected value. Everything else would then work
properly.

> > > + * Some variants of Orion5x have 4 remappable windows, some other have
> > > + * only two of them.
> > > + */
> > > +static const struct mvebu_mbus_soc_data orion5x_4win_mbus_data = {
> > > +	.num_wins            = 8,
> > > +	.num_remappable_wins = 4,
> > 
> > These values also seem like something worth reading from the DT as
> > well.. See above for an idea.
> 
> No: there will anyway be different functions to be called depending on
> the SoC family. So the driver will anyway have to have different
> compatible strings for the different SoC families, because we can't
> put

Well I wrote the 'mbus_win_offset' function concept with DT lists:

>    windows = <0 1 2 3 4>;
>    remappable-windows = <7 8 9>;
>    internal-window = 5;

And the two varients of the SDRAM function still have to exist, but
would be best handled by separating the SDRAM description from the
MBUS description in DT - they are orthogonal concepts.

> Another reason is that the DT bindings become part of the kernel ABI,
> so they can't be changed once they are defined and have started to be
> used seriously on production devices. This leads to a very simple
> choice: limit the amount of things we put in the DT, and instead put
> more of the internal logic into the driver.

Well, this is certainly a fair notion - but things like the MAPDEF
values, and window offsets are properties of the SOC and are never,
ever going to change. So selecting a sane way to communicate those
values through DT makes lots of sense to me.

> No, I also have a temporary disagreement: I think what you're asking is
> way too far away from the current code. I've already been asked to
> make

I certainly agree with this. If it wasn't for the idea that the DT is
a stable ABI and thus must be perfect from the start, I wouldn't have
mentioned any of this right now, making fine tuning adjustments as a
follow on seems better to me.

It is absolutely way too much work to try and fix everything in one
go! I would be happy to see your patch go in as-is, but mildly unhappy
to see us stuck with this DT layout forever...

Especially since I really want to see your PCI stuff go in ....

Jason
Thomas Petazzoni March 6, 2013, 8:40 p.m. UTC | #4
Dear Jason Gunthorpe,

On Wed, 6 Mar 2013 13:24:47 -0700, Jason Gunthorpe wrote:

> Well, in this case I think it is fairly reasonable because it means
> the mbus driver could work with all current and future Marvell chips
> that are based on this architecture, rather than having to fiddle with
> the driver every time.. 
> 
> Part of the point of all this DT stuff is to give HW designers some
> reasonable flexability in their HW design without requiring code
> changes to Linux to boot it. So if Marvell makes a new Armada varient
> that re-uses all the same pre existing IP, just with different mbus
> targets, they should be able to make Linux run on it simply by writing
> a SOC specific firmware and DT.

That simply doesn't work for real. HW designers often introduce a
subtle change in the registers, and bing, your driver no longer works,
and you have to fix it anyway.

The idea that making changes only to the DT will allow porting the
kernel to a new family of SoC is just a dream. I know a lot of people
are dreaming about this, but it doesn't make sense: you can't imagine,
when designing a DT binding, what kind of crazy bizarre change the HW
designers will come up with in a newer revision of a given IP. And the
good way to handle that, IMO, is to put as little logic in the DT as
possible (because DT bindings have to be stable), and handle the crazy
little differences between SoCs in the driver.

> > Also, I don't believe that windows should be described in the Device
> > Tree. My long term plan is rather to make the allocation of the base
> > address of each window entirely dynamic, because there is absolutely no
> > reason for those addresses to be fixed in stone in the Device Tree.
> 
> How will that work? Device tree necessarily must have CPU addresses
> for the things it describes. Today we have the problem that those
> addresses must match the values hard coded into Linux, ie DT is
> describing what Linux expects, not what the HW provides..

I don't follow you. For the moment we have to say in the DT, I have a
NOR of 16 MB, and it's mapped at 0xCD000000. This 0xCD000000 is
completely useless: the NOR driver or whatever driver sets up the NOR
can do an allocate_resource(), which returns an available fragment of
the physical address space, and create the address decoding window at
this address.

The base address of address decoding windows is not a description of
the hardware. Those base addresses can be dynamically allocated during
the probing of devices.

> OF was targetted at the case where the firmware would set the windows
> and then the DT is simply the way to communicate what the window
> settings are to the consumer.
> 
> This idea that Linux should setup windows on its own has moved away
> from what OF was intended for, I'm not sure there is an overarching
> plan on how to handle those cases - but hardcoding addresses in Linux
> and then requiring the DT to match them exactly certainly seems wrong.

Where have you seen that I want to hardcode addresses in Linux? That's
what is done for now, because it has been done this way since a long
time in the Linux support for Marvell SoCs, and I don't want to move
away from that in one step.


> > No: there will anyway be different functions to be called depending on
> > the SoC family. So the driver will anyway have to have different
> > compatible strings for the different SoC families, because we can't
> > put
> 
> Well I wrote the 'mbus_win_offset' function concept with DT lists:
> 
> >    windows = <0 1 2 3 4>;
> >    remappable-windows = <7 8 9>;
> >    internal-window = 5;
> 
> And the two varients of the SDRAM function still have to exist, but
> would be best handled by separating the SDRAM description from the
> MBUS description in DT - they are orthogonal concepts.

You've solved the problem for those two particular cases, and then the
next SoC shows up, and has a little difference in the register layout,
or something... and your DT binding no longer works.

Really, I don't see any point or any useful feature brought by adding
this enormous additional amount of information in the DT.

> > Another reason is that the DT bindings become part of the kernel ABI,
> > so they can't be changed once they are defined and have started to be
> > used seriously on production devices. This leads to a very simple
> > choice: limit the amount of things we put in the DT, and instead put
> > more of the internal logic into the driver.
> 
> Well, this is certainly a fair notion - but things like the MAPDEF
> values, and window offsets are properties of the SOC and are never,
> ever going to change. So selecting a sane way to communicate those
> values through DT makes lots of sense to me.

Never going to change? They change from one SoC family to another...

> > No, I also have a temporary disagreement: I think what you're asking is
> > way too far away from the current code. I've already been asked to
> > make
> 
> I certainly agree with this. If it wasn't for the idea that the DT is
> a stable ABI and thus must be perfect from the start, I wouldn't have
> mentioned any of this right now, making fine tuning adjustments as a
> follow on seems better to me.
> 
> It is absolutely way too much work to try and fix everything in one
> go! I would be happy to see your patch go in as-is, but mildly unhappy
> to see us stuck with this DT layout forever...

Again, I don't see the point of making the DT binding more complex than
the one being proposed here, and you haven't given any really
compelling arguments except that it is not to your taste. I believe
we're reaching a point where things start to be a matter of taste, and
so unless you're writing the code and doing the work yourself, you also
have to accept that other people might have different visions of how
things should be handled, and therefore have different tastes.

How can we make progress on this, in a *reasonable* way?

Best regards,

Thomas
Jason Gunthorpe March 6, 2013, 9:50 p.m. UTC | #5
On Wed, Mar 06, 2013 at 09:40:36PM +0100, Thomas Petazzoni wrote:
> > Part of the point of all this DT stuff is to give HW designers some
> > reasonable flexability in their HW design without requiring code
> > changes to Linux to boot it. So if Marvell makes a new Armada varient
> > that re-uses all the same pre existing IP, just with different mbus
> > targets, they should be able to make Linux run on it simply by writing
> > a SOC specific firmware and DT.
> 
> That simply doesn't work for real. HW designers often introduce a
> subtle change in the registers, and bing, your driver no longer works,
> and you have to fix it anyway.

The goal would be to have HW architects cognisant of what capabilites
are available in the DT and Kernel and stay within those
limits. Fundamentally this is how x86 maintains its compatibility and
to achive this dream on ARM the same must be true.

This process must go from both sides, the DT must allow enough
flexibility and the HW architect must care to minimize the software
impact. Today there is no real incentive for HW to say the same
because they have to make Linux patches for every new SOC anyhow. The
introduction of DT will be the first time there is an actual tangible
economic advantage to developing/considering DT bindings for your IP
during the HW architecture phase. I hope that will change things for
the better :)

> > > Also, I don't believe that windows should be described in the Device
> > > Tree. My long term plan is rather to make the allocation of the base
> > > address of each window entirely dynamic, because there is absolutely no
> > > reason for those addresses to be fixed in stone in the Device Tree.
> > 
> > How will that work? Device tree necessarily must have CPU addresses
> > for the things it describes. Today we have the problem that those
> > addresses must match the values hard coded into Linux, ie DT is
> > describing what Linux expects, not what the HW provides..
> 
> I don't follow you. For the moment we have to say in the DT, I have a
> NOR of 16 MB, and it's mapped at 0xCD000000. This 0xCD000000 is
> completely useless: the NOR driver or whatever driver sets up the NOR
> can do an allocate_resource(), which returns an available fragment of
> the physical address space, and create the address decoding window at
> this address.

As I understand it (from other conversations), the preferred way to do
this in a DT world is with the layout I described. The mbus driver
would allocate the regions and update the DT in memory with the
results of allocate_resource(). There are a couple ways to make a DT
layout that allows this, but the basic idea is the same.

This keeps the window allocation encapsulated within the mbus driver,
the NOR driver doesn't know or care about these details, and all the
OF machinery still works because the window base address in DT is
valid.

> > This idea that Linux should setup windows on its own has moved away
> > from what OF was intended for, I'm not sure there is an overarching
> > plan on how to handle those cases - but hardcoding addresses in Linux
> > and then requiring the DT to match them exactly certainly seems wrong.
> 
> Where have you seen that I want to hardcode addresses in Linux? That's
> what is done for now, because it has been done this way since a long
> time in the Linux support for Marvell SoCs, and I don't want to move
> away from that in one step.

I say this because I don't see how to go from this DT binding to a
future that has full dynamic window address assignment. What do you
imagine a NOR DT stanza would look like for dynamic assignment? How do
you tell which DT stanza is associated with which target attributes?
How do you keep mbus specific window setup code out of the
NOR/NAND/SPI/MTD/etc drivers?

> You've solved the problem for those two particular cases, and then the
> next SoC shows up, and has a little difference in the register layout,
> or something... and your DT binding no longer works.

There are now about 5 SOCs with the same layout for the MBUS window
registers, and the idea to specify the base offset of each of the 3
types of window register types works for all of them. 

So to me, this is a prime thing to store in DT. If a future HW uses a
totally different register layout then should this driver even work on
it?

But, this is orthogonal to using 'ranges' and target id's in the DT
to associate DT stanzas with MBUS targets.

> > Well, this is certainly a fair notion - but things like the MAPDEF
> > values, and window offsets are properties of the SOC and are never,
> > ever going to change. So selecting a sane way to communicate those
> > values through DT makes lots of sense to me.
> 
> Never going to change? They change from one SoC family to another...

I ment never change within the SOC. If they were the same on every SOC
then they could live in the driver just fine. It is the fact that
every SOC that uses the MBUS driver uses different target attributes
that makes them a good candidate to live in DT.

> > It is absolutely way too much work to try and fix everything in one
> > go! I would be happy to see your patch go in as-is, but mildly unhappy
> > to see us stuck with this DT layout forever...
> 
> Again, I don't see the point of making the DT binding more complex than
> the one being proposed here, and you haven't given any really
> compelling arguments except that it is not to your taste. I believe
> we're reaching a point where things start to be a matter of taste, and
> so unless you're writing the code and doing the work yourself, you also
> have to accept that other people might have different visions of how
> things should be handled, and therefore have different tastes.
> 
> How can we make progress on this, in a *reasonable* way?

Well, as you say, we've pretty much completed presenting arguments for
each view on this subject, so it is really up to the gatekeepers of
the stable DT 'ABI' to decide on how they want this all to look in the
end, IMHO.

Jason C/Andrew L/Arnd?

Regards,
Jason
Jason Cooper March 6, 2013, 10:27 p.m. UTC | #6
On Wed, Mar 06, 2013 at 02:50:31PM -0700, Jason Gunthorpe wrote:
> On Wed, Mar 06, 2013 at 09:40:36PM +0100, Thomas Petazzoni wrote:
...
> > Again, I don't see the point of making the DT binding more complex than
> > the one being proposed here, and you haven't given any really
> > compelling arguments except that it is not to your taste. I believe
> > we're reaching a point where things start to be a matter of taste, and
> > so unless you're writing the code and doing the work yourself, you also
> > have to accept that other people might have different visions of how
> > things should be handled, and therefore have different tastes.
> > 
> > How can we make progress on this, in a *reasonable* way?
> 
> Well, as you say, we've pretty much completed presenting arguments for
> each view on this subject, so it is really up to the gatekeepers of
> the stable DT 'ABI' to decide on how they want this all to look in the
> end, IMHO.
> 
> Jason C/Andrew L/Arnd?

I'm inclined to agree with Thomas.  Since the the first Kirkwood DT
board was added to the tree, it's been pounded into me that DT
*describes hardware*.  Not how random bootloader X happened to configure
it.

However, we also have the example of local-mac-address.  Which is
describing how the hardware is *configured* to present a consistent id
to a network (from bootloader to kernel to userspace).  This is
certainly a valid deviation from the rule.

If a bootloader setup some windows before handing off control to the
kernel (keep in mind, this could be *BSD, windows, BeOS, etc), is there
any /need/ to keep that configuration?

If we are to include Jason's suggestion, I think it should be listed as
optional properties typically inserted at boot by the bootloader as an
FYI.  And, for Thomas' sanity, presented as a future patch on top of
current work. ;-)

At this point, we have at least a few release cycles to shake out a
standard DT binding for Marvell SoCs.  Currently, not a single product
ships with a DT-enabled bootloader.  Not even the development boards
created by Marvell.  So I'm fine with Thomas' bindings as they stand
*and* fine with adjusting them over the next few releases as needed.

It'll be a different story once products start being released with
DT-enabled bootloaders.

thx,

Jason.
Jason Gunthorpe March 6, 2013, 11:04 p.m. UTC | #7
On Wed, Mar 06, 2013 at 05:27:12PM -0500, Jason Cooper wrote:
> On Wed, Mar 06, 2013 at 02:50:31PM -0700, Jason Gunthorpe wrote:
> > On Wed, Mar 06, 2013 at 09:40:36PM +0100, Thomas Petazzoni wrote:
> ...
> > > Again, I don't see the point of making the DT binding more complex than
> > > the one being proposed here, and you haven't given any really
> > > compelling arguments except that it is not to your taste. I believe
> > > we're reaching a point where things start to be a matter of taste, and
> > > so unless you're writing the code and doing the work yourself, you also
> > > have to accept that other people might have different visions of how
> > > things should be handled, and therefore have different tastes.
> > > 
> > > How can we make progress on this, in a *reasonable* way?
> > 
> > Well, as you say, we've pretty much completed presenting arguments for
> > each view on this subject, so it is really up to the gatekeepers of
> > the stable DT 'ABI' to decide on how they want this all to look in the
> > end, IMHO.
> > 
> > Jason C/Andrew L/Arnd?
> 
> I'm inclined to agree with Thomas.  Since the the first Kirkwood DT
> board was added to the tree, it's been pounded into me that DT
> *describes hardware*.  Not how random bootloader X happened to configure
> it.

I'm not sure this is what we are discussing, certainly it isn't what I
was talking about...

The discussion was around the SOC specific tables in the driver and if
they should be in DT or C code. These tables very much describe the
hardware, and are copied from similar tables in the Marvell manuals.

For instance I repeated this idea:

   ranges =<MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000 // boot rom
            MAPDEF(1, 0x3f, MAPDEF_NOMASK)  0xFD000000  0x10000 // devbus-boot
            MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0xFF000000  0x10000 // internal-regs
             [..]

At the stanza of MBUS driver's bus. Each line represents a MBUS target
(this is a physical HW IP block). The MAPDEF is a SOC specific 'target
id' that is currently living in tables in the driver. This value is
directly compatible with the mbus window register and is defined by
Marvell. The 2nd number is the CPU base address of that window, and
the last is the size.

Per OF conventions the base address should be the value the bootloader
left it at - but that is not important, the value could be 0. A
dynamic MBUS driver would go through each item in the ranges, find
an address range and program a window with the MAPDEF from the DT. It
would then write the base address back into the DT (at runtime!) so
that the entire DT remains conistent with the current state of the
hardware.

An initial implementation without dynamic address allocation would
just have the mbus driver directly use the 2nd value as the base
address.

The attraction to this model is that the association of later DT
stanza's to MBUS targets is extremely clear:

       nor@0 {
          compatible = 'linux mtd nor driver';
          regs = <MAPDEF(1, 0xab, MAPDEF_NOMASK) 0x10000>;
      }

We can now tell directly that 'linux mtd nor driver' is behind MBUS
target 0xab, and the OF code already knows the base for this MBUS
target from the ranges property, as modified by the MBUS driver. The
MBUS driver also knows which targets to configure windows for
immediately at boot without having to consult internal tables. Plus we
don't need to modify the NOR driver to call out to MBUS window
functions to fetch an address.

There are a few other variants on how to do this in DT, but they all
come down to modeling the MBUS target using a ranges property and
having the mbus driver adjust the range mapping at runtime.

> At this point, we have at least a few release cycles to shake out a
> standard DT binding for Marvell SoCs.  Currently, not a single product
> ships with a DT-enabled bootloader.  Not even the development boards
> created by Marvell.  So I'm fine with Thomas' bindings as they stand
> *and* fine with adjusting them over the next few releases as needed.

I think this is a fine idea. Thomas's driver is a much better starting
place to do stuff like the above!

Regards,
Jason
Arnd Bergmann March 7, 2013, 3:37 a.m. UTC | #8
On Wednesday 06 March 2013, Jason Gunthorpe wrote:
> On Wed, Mar 06, 2013 at 08:27:10PM +0100, Thomas Petazzoni wrote:

> Part of the point of all this DT stuff is to give HW designers some
> reasonable flexability in their HW design without requiring code
> changes to Linux to boot it. So if Marvell makes a new Armada varient
> that re-uses all the same pre existing IP, just with different mbus
> targets, they should be able to make Linux run on it simply by writing
> a SOC specific firmware and DT.

Agreed. Further, the device tree should accurately describe the mapping
of addresses, and we have a lot of infrastructure to do exactly that.

> > Also, I don't believe that windows should be described in the Device
> > Tree. My long term plan is rather to make the allocation of the base
> > address of each window entirely dynamic, because there is absolutely no
> > reason for those addresses to be fixed in stone in the Device Tree.
> 
> How will that work? Device tree necessarily must have CPU addresses
> for the things it describes. Today we have the problem that those
> addresses must match the values hard coded into Linux, ie DT is
> describing what Linux expects, not what the HW provides..

Exactly.

> This idea that Linux should setup windows on its own has moved away
> from what OF was intended for, I'm not sure there is an overarching
> plan on how to handle those cases - but hardcoding addresses in Linux
> and then requiring the DT to match them exactly certainly seems wrong.
> 
> As for dynamic allocation, they way to do it via DT is through the
> ranges property, similar to how I noted. The DT can be modified
> in-memory, so a dyanmic mbus driver would ignore the CPU target
> address in the ranges, select its own CPU address and then update the
> in-memory DT with the corrected value. Everything else would then work
> properly.

Yes. We can even use the index of the "ranges" property to refer directly
to the number of the window, so ranges replaces the kernel internal
struct mvebu_mbus_mapping.

I would also suggest not making the mbus compatible with "simple_bus", but
instead probing /only/ the mbus device at bootup, which can then set
up the hardware based on the ranges property and then call
of_platform_populate on its children.

> > > > + * Some variants of Orion5x have 4 remappable windows, some other have
> > > > + * only two of them.
> > > > + */
> > > > +static const struct mvebu_mbus_soc_data orion5x_4win_mbus_data = {
> > > > +	.num_wins            = 8,
> > > > +	.num_remappable_wins = 4,
> > > 
> > > These values also seem like something worth reading from the DT as
> > > well.. See above for an idea.
> > 
> > No: there will anyway be different functions to be called depending on
> > the SoC family. So the driver will anyway have to have different
> > compatible strings for the different SoC families, because we can't
> > put
> 
> Well I wrote the 'mbus_win_offset' function concept with DT lists:
> 
> >    windows = <0 1 2 3 4>;
> >    remappable-windows = <7 8 9>;
> >    internal-window = 5;
> 
> And the two varients of the SDRAM function still have to exist, but
> would be best handled by separating the SDRAM description from the
> MBUS description in DT - they are orthogonal concepts.

*nod*
 
> > No, I also have a temporary disagreement: I think what you're asking is
> > way too far away from the current code. I've already been asked to
> > make
> 
> I certainly agree with this. If it wasn't for the idea that the DT is
> a stable ABI and thus must be perfect from the start, I wouldn't have
> mentioned any of this right now, making fine tuning adjustments as a
> follow on seems better to me.
> 
> It is absolutely way too much work to try and fix everything in one
> go! I would be happy to see your patch go in as-is, but mildly unhappy
> to see us stuck with this DT layout forever...
> 
> Especially since I really want to see your PCI stuff go in ....

I basically agree with everything you write in this mail, Jason, and
I was about to make exactly the same comments myself.

Getting the binding right is certainly a priority here, but I also think
that we don't need to rush the code to use that binding (although having
the code work would make me more comfortable thinking that the binding
actually works).

	Arnd
Arnd Bergmann March 7, 2013, 3:50 a.m. UTC | #9
On Wednesday 06 March 2013, Thomas Petazzoni wrote:
> I don't follow you. For the moment we have to say in the DT, I have a
> NOR of 16 MB, and it's mapped at 0xCD000000. This 0xCD000000 is
> completely useless: the NOR driver or whatever driver sets up the NOR
> can do an allocate_resource(), which returns an available fragment of
> the physical address space, and create the address decoding window at
> this address.
> 
> The base address of address decoding windows is not a description of
> the hardware. Those base addresses can be dynamically allocated during
> the probing of devices.

I think you just described precisely why the binding that Jason has suggested
is able to describe what the hardware does, while the binding you proposed
cannot do that.

If we want to dynamically assign the address of the NOR flash, we need a
way to describe the hardware address that we want to remap to a CPU
visible logical address using MBUS.

When each device behind MBUS has its own bus space in hardware, a ranges
property is exactly what we should use to describe how that bus space
is mapped into the parent bus that is visible to the CPU. The ranges
property that is passed by the boot loader can describe how the address
is currently configured at boot time, or how you would like the kernel
to configure it to get a sensible system view. But if you want to
dynamically remap it in MBUS, that should /not/ require changing the
"reg" property of the device, because the device itself does not change,
only the mapping changes.

There is a little deficiency here in the way that Linux probes the
platform devices: We allocate a "struct resource" at the time the
platform_device is created, and that resource will be incorrect if
we remap the window. However, I think it's easy enough to work around
that by using 'of_iomap' in drivers that are children of the MBUS and
can get remapped, rather than using 'devm_ioremap_resource'. Platform
devices are Linux specific implementation detail that should not
impact the DT binding in this case.

	Arnd
Jason Gunthorpe March 7, 2013, 6:35 a.m. UTC | #10
On Thu, Mar 07, 2013 at 03:37:43AM +0000, Arnd Bergmann wrote:

> Yes. We can even use the index of the "ranges" property to refer directly
> to the number of the window, so ranges replaces the kernel internal
> struct mvebu_mbus_mapping.

I think we are thinking the same thing, but, s/index/bus address/ and
s/number of the window/target id/ ?

> I would also suggest not making the mbus compatible with "simple_bus", but
> instead probing /only/ the mbus device at bootup, which can then set
> up the hardware based on the ranges property and then call
> of_platform_populate on its children.

Yeah, something is needed to make startup ordering work. Things like
the timer and main interrupt controller live behind the special
internal regs window, so in a perfect world finding the timer would
automatically setup the enclosing bus structures ...

> > I certainly agree with this. If it wasn't for the idea that the DT is
> > a stable ABI and thus must be perfect from the start, I wouldn't have
> > mentioned any of this right now, making fine tuning adjustments as a
> > follow on seems better to me.
> > 
> > It is absolutely way too much work to try and fix everything in one
> > go! I would be happy to see your patch go in as-is, but mildly unhappy
> > to see us stuck with this DT layout forever...
> > 
> > Especially since I really want to see your PCI stuff go in ....
> 
> I basically agree with everything you write in this mail, Jason, and
> I was about to make exactly the same comments myself.
> 
> Getting the binding right is certainly a priority here, but I also think
> that we don't need to rush the code to use that binding (although having
> the code work would make me more comfortable thinking that the binding
> actually works).

Do you think Jason C's notion to take it as is and then be OK with
altering the binding later pending agreement is acceptable for now? I
agree with Thomas that this is too much to ask just to get
the already huge Marvell PCI-E patchset into mainline.

Jason
Thomas Petazzoni March 7, 2013, 10:39 a.m. UTC | #11
Dear Jason Gunthorpe,

On Wed, 6 Mar 2013 16:04:12 -0700, Jason Gunthorpe wrote:

> I'm not sure this is what we are discussing, certainly it isn't what I
> was talking about...
> 
> The discussion was around the SOC specific tables in the driver and if
> they should be in DT or C code. These tables very much describe the
> hardware, and are copied from similar tables in the Marvell manuals.
> 
> For instance I repeated this idea:
> 
>    ranges =<MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000 // boot rom
>             MAPDEF(1, 0x3f, MAPDEF_NOMASK)  0xFD000000  0x10000 // devbus-boot
>             MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0xFF000000  0x10000 // internal-regs
>              [..]

This doesn't work for PCIe interfaces. The PCIe windows cannot be
described in the DT, because they are inherently dynamic, and depend on
which PCIe devices are plugged into the PCIe slots, and how much I/O
and memory space each of those devices require.

This means that regardless of what you think, *some* windows will have
to be created dynamically during the system initialization, and we will
*not* be able to describe all windows in the Device Tree.

And to me, it seems completely stupid to have some windows defined
statically in the Device Tree, and some other windows set up
dynamically when the system initializes.

> At the stanza of MBUS driver's bus. Each line represents a MBUS target
> (this is a physical HW IP block). The MAPDEF is a SOC specific 'target
> id' that is currently living in tables in the driver. This value is
> directly compatible with the mbus window register and is defined by
> Marvell. The 2nd number is the CPU base address of that window, and
> the last is the size.

Please explain how you handle PCIe windows.

> Per OF conventions the base address should be the value the bootloader
> left it at - but that is not important, the value could be 0. A
> dynamic MBUS driver would go through each item in the ranges, find
> an address range and program a window with the MAPDEF from the DT. It
> would then write the base address back into the DT (at runtime!) so
> that the entire DT remains conistent with the current state of the
> hardware.

Why the heck would the kernel need to rewrite the DT ?

Just like a driver does an ioremap() to create a virtual -> physical
mapping, the driver can just as well do mvebu_mbus_add_window() to
create a physical -> device mapping. It doesn't have to be hardcoded
into the Device Tree.

> An initial implementation without dynamic address allocation would
> just have the mbus driver directly use the 2nd value as the base
> address.
> 
> The attraction to this model is that the association of later DT
> stanza's to MBUS targets is extremely clear:
> 
>        nor@0 {
>           compatible = 'linux mtd nor driver';
>           regs = <MAPDEF(1, 0xab, MAPDEF_NOMASK) 0x10000>;
>       }
> 
> We can now tell directly that 'linux mtd nor driver' is behind MBUS
> target 0xab, and the OF code already knows the base for this MBUS
> target from the ranges property, as modified by the MBUS driver. The
> MBUS driver also knows which targets to configure windows for
> immediately at boot without having to consult internal tables. Plus we
> don't need to modify the NOR driver to call out to MBUS window
> functions to fetch an address.

Please take the PCIe example instead. Thanks.

Best regards,

Thomas
Thomas Petazzoni March 7, 2013, 10:58 a.m. UTC | #12
Dear Arnd Bergmann,

On Thu, 7 Mar 2013 03:37:43 +0000, Arnd Bergmann wrote:
> On Wednesday 06 March 2013, Jason Gunthorpe wrote:
> > On Wed, Mar 06, 2013 at 08:27:10PM +0100, Thomas Petazzoni wrote:
> 
> > Part of the point of all this DT stuff is to give HW designers some
> > reasonable flexability in their HW design without requiring code
> > changes to Linux to boot it. So if Marvell makes a new Armada varient
> > that re-uses all the same pre existing IP, just with different mbus
> > targets, they should be able to make Linux run on it simply by writing
> > a SOC specific firmware and DT.
> 
> Agreed. Further, the device tree should accurately describe the mapping
> of addresses, and we have a lot of infrastructure to do exactly that.

Why should the Device Tree describe something that isn't a description
of the hardware? The mapping of addresses used for address decoding
windows is inherently dynamic.

Do I have to, again, repeat the PCIe discussion? We *cannot* describe
the PCIe address decoding windows in the Device Tree. It is NOT
possible.

We have too many PCIe interfaces (up to 10), too few windows (only 20),
and not enough physical address space, to cover, statically, all
possible combinations of PCIe devices.

In addition, the PCIe bus is very nice because it is dynamic: you
discover devices, how much memory and I/O space they require, and you
set up the windows accordingly.

Please always think about this PCIe use case when proposing any sort of
solution about address decoding windows. Any solution that doesn't take
into account the dynamic nature of PCIe windows is simply useless.

> 
> > > Also, I don't believe that windows should be described in the Device
> > > Tree. My long term plan is rather to make the allocation of the base
> > > address of each window entirely dynamic, because there is absolutely no
> > > reason for those addresses to be fixed in stone in the Device Tree.
> > 
> > How will that work? Device tree necessarily must have CPU addresses
> > for the things it describes. Today we have the problem that those
> > addresses must match the values hard coded into Linux, ie DT is
> > describing what Linux expects, not what the HW provides..
> 
> Exactly.

No, the Device Tree does not need to have CPU addresses.

The hardware description of a NOR is :

 * It is connected to chip-select X
 * Its size is 16 MB
 * Its model/description/characteristics are <foo>

So the NOR would be:

	reg = <0 0x1000000>;
	cs = <3>;

We encode this information in the Device Tree, because they describe
the hardware. Then, in the driver, we do:

	struct resource nor_resource;

	/* Get the size of the NOR from DT */
	of_address_to_resource(np, 0, &nor_resource);

	/* Find an available physical range to map the NOR */
	allocate_resource(&iomem_res, &nor_resource,
			  resource_size(&nor_resource),
			  0, 0, 0, NULL, NULL);

	/* Create the address decoding window */
	mvebu_mbus_add_window(nor, nor_resource.start,
			      resource_size(&nor_resource));

And that's it. The physical address at which the NOR is mapped is *NOT*
a description of the hardware.

We are not hardcoding virtual addresses in the DT, right?

So for the same reason, we should not hardcode the address decoding
windows and their base address into the DT.

Please explain why you think hardcoding in the Device Tree things that
is not a hardware description is a good idea.

> > This idea that Linux should setup windows on its own has moved away
> > from what OF was intended for, I'm not sure there is an overarching
> > plan on how to handle those cases - but hardcoding addresses in Linux
> > and then requiring the DT to match them exactly certainly seems wrong.
> > 
> > As for dynamic allocation, they way to do it via DT is through the
> > ranges property, similar to how I noted. The DT can be modified
> > in-memory, so a dyanmic mbus driver would ignore the CPU target
> > address in the ranges, select its own CPU address and then update the
> > in-memory DT with the corrected value. Everything else would then work
> > properly.
> 
> Yes. We can even use the index of the "ranges" property to refer directly
> to the number of the window, so ranges replaces the kernel internal
> struct mvebu_mbus_mapping.

Doesn't work with PCIe windows that have to be dynamically created.

It really strikes me that in the name of the Device Tree, we would have
to hardcode in stone things that are inherently dynamic.

> Getting the binding right is certainly a priority here, but I also think
> that we don't need to rush the code to use that binding (although having
> the code work would make me more comfortable thinking that the binding
> actually works).

The current binding is just a compatible string, not more. So it can
easily be extended in the future to have more properties below it
giving more configuration details. Of course, I continue to
fundamentally disagree with the proposal you and Jason are making, but
if you want to implement something that does that, it will always be
possible to extend the DT binding in the future.

Also remember that the mvebu-mbus driver has to be compatible with
non-DT platforms. We haven't migrated yet all the SoC families to the
DT, and it will take a bit more time to convert all of them to the
Device Tree.

Would it be possible to agree that this driver is a first step that
consolidates the existing address decoding code, which doesn't prevent
further improvements, and get the driver code merged in this current
state (of course, after fixing all the bugs, issues, that will be
discovered during review and testing) ?

Thanks,

Thomas
Jason Gunthorpe March 7, 2013, 5:37 p.m. UTC | #13
On Thu, Mar 07, 2013 at 11:58:17AM +0100, Thomas Petazzoni wrote:

> Please always think about this PCIe use case when proposing any sort of
> solution about address decoding windows. Any solution that doesn't take
> into account the dynamic nature of PCIe windows is simply useless.

PCIe is not a problem, it continues to use the dynamic interface!

This is all about how do you connect the DT stanzas that are already
in the DT to the dynamic window mbus system - and you do that via
ranges. The ranges only need to cover the needs of the static elements
of the DT - they do not need to extend beyond to fully dynamic
elements like PCI-E - those cases are *totally* different and must be
handled in code.

The reason for this is quite simply to avoid this ugly boiler plate in
every mbus driver:

> 	struct resource nor_resource;
> 
> 	/* Get the size of the NOR from DT */
> 	of_address_to_resource(np, 0, &nor_resource);
> 
> 	/* Find an available physical range to map the NOR */
> 	allocate_resource(&iomem_res, &nor_resource,
> 			  resource_size(&nor_resource),
> 			  0, 0, 0, NULL, NULL);
> 
> 	/* Create the address decoding window */
> 	mvebu_mbus_add_window(nor, nor_resource.start,
> 			      resource_size(&nor_resource));
> 

Not only does this violate encapsulate principles, it breaks the DT
modeling because there is no place in the DT that has the base
address!!

The PCI system allocates and assigns BAR prior to probing the driver -
that is the Linux convention. The mbus driver needs to do the same
thing. The window must be setup prior to calling a child's probe, and
the DT must be correct at that time.

> It really strikes me that in the name of the Device Tree, we would have
> to hardcode in stone things that are inherently dynamic.

What do you mean? The suggestion is to describe all in-DT elements
with a ranges like this:

>    ranges =<MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000

Which directly models the HW MBUS system. The target-id is set in
stone by the SOC, the window size is based on what is described in the
DT - so there is no need for it to be smaller/larger, and the base
address can be run-time altered to the true window base by the mbus
driver.

There is nothing inherently dynamic that is set in stone. The base
address *is not set in stone* - it is just refactored out of the
individual stanzas and placed into a single common place that is easy
to find pragmatically.

This is not about PCI-E - PCI-E continues to use the full dynamic
interface, nothing major changes there.

> Would it be possible to agree that this driver is a first step that
> consolidates the existing address decoding code, which doesn't prevent
> further improvements, and get the driver code merged in this current
> state (of course, after fixing all the bugs, issues, that will be
> discovered during review and testing) ?

I continue to agree with this.

However, I think Ezequiel's driver should be held up pending a
decision if mvebu_mbus_add_window is appropriate for it to call.

Regards,
Jason
Thomas Petazzoni March 7, 2013, 7:11 p.m. UTC | #14
Dear Jason Gunthorpe,

On Thu, 7 Mar 2013 10:37:36 -0700, Jason Gunthorpe wrote:
> On Thu, Mar 07, 2013 at 11:58:17AM +0100, Thomas Petazzoni wrote:
> 
> > Please always think about this PCIe use case when proposing any sort of
> > solution about address decoding windows. Any solution that doesn't take
> > into account the dynamic nature of PCIe windows is simply useless.
> 
> PCIe is not a problem, it continues to use the dynamic interface!
> 
> This is all about how do you connect the DT stanzas that are already
> in the DT to the dynamic window mbus system - and you do that via
> ranges. The ranges only need to cover the needs of the static elements
> of the DT - they do not need to extend beyond to fully dynamic
> elements like PCI-E - those cases are *totally* different and must be
> handled in code.

I still don't see why devices like NOR and PCIe shouldn't be handled in
the same way.

> The reason for this is quite simply to avoid this ugly boiler plate in
> every mbus driver:

Come on "ugly" ? It's three line of basic set up code, and it avoids
hard-coding base addresses in the Device Tree. Not having hard-coded
base addresses is very nice for users porting Linux to new boards using
Marvell SoCs: they don't have to worry about deciding which base
address to use. A given board may have a huge NOR, some other board may
have gazillions of PCIe devices with large memory space requirements,
some other board might have a FPGA requiring a large physical memory
mapping.

With your idea, the guy writing the board .dts file will have to think
about what is the layout of the physical address space to find where
to map his FPGA or its big NOR.

With the really dynamic idea I'm proposing, the guy doesn't have to
worry at all about the base address.

I know you're proposing to make things dynamic by having the mbus
driver patch the Device Tree once the driver figures out an available
range to map a given device. But that doesn't make any sense: why write
in the DT something that will anyway be overwritten later on?

> > 	struct resource nor_resource;
> > 
> > 	/* Get the size of the NOR from DT */
> > 	of_address_to_resource(np, 0, &nor_resource);
> > 
> > 	/* Find an available physical range to map the NOR */
> > 	allocate_resource(&iomem_res, &nor_resource,
> > 			  resource_size(&nor_resource),
> > 			  0, 0, 0, NULL, NULL);
> > 
> > 	/* Create the address decoding window */
> > 	mvebu_mbus_add_window(nor, nor_resource.start,
> > 			      resource_size(&nor_resource));
> > 
> 
> Not only does this violate encapsulate principles, it breaks the DT
> modeling because there is no place in the DT that has the base
> address!!

But why the *heck* do we care about having the base address in the DT.
We don't care *at all*.

The base address is *NOT* a description of the hardware, because it is
completely configurable. Therefore, there is no need to have it in the
Device Tree.

In my opinion, the Device Tree should describe the informations that
the software cannot "guess" by looking at the hardware.

> The PCI system allocates and assigns BAR prior to probing the driver -
> that is the Linux convention. The mbus driver needs to do the same
> thing. The window must be setup prior to calling a child's probe, and
> the DT must be correct at that time.
> 
> > It really strikes me that in the name of the Device Tree, we would have
> > to hardcode in stone things that are inherently dynamic.
> 
> What do you mean? The suggestion is to describe all in-DT elements
> with a ranges like this:
> 
> >    ranges =<MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000
> 
> Which directly models the HW MBUS system. The target-id is set in
> stone by the SOC, the window size is based on what is described in the
> DT - so there is no need for it to be smaller/larger, and the base
> address can be run-time altered to the true window base by the mbus
> driver.
> 
> There is nothing inherently dynamic that is set in stone. The base
> address *is not set in stone* - it is just refactored out of the
> individual stanzas and placed into a single common place that is easy
> to find pragmatically.
> 
> This is not about PCI-E - PCI-E continues to use the full dynamic
> interface, nothing major changes there.

I still continue to believe that it's a matter of taste, and where you
put the boundary between what should be described in the Device Tree
and what should not.

> > Would it be possible to agree that this driver is a first step that
> > consolidates the existing address decoding code, which doesn't prevent
> > further improvements, and get the driver code merged in this current
> > state (of course, after fixing all the bugs, issues, that will be
> > discovered during review and testing) ?
> 
> I continue to agree with this.

Thanks.

> However, I think Ezequiel's driver should be held up pending a
> decision if mvebu_mbus_add_window is appropriate for it to call.

It is appropriate for it to call, because it's the only and correct way
to set up an address decoding window with the mvebu-mbus driver, just
like it is the only and correct way that will be used by the PCIe
driver.

Adding description of windows in the Device Tree can be added later on,
in a completely DT backward compatible way, and the mvebu-devbus driver
can be adjusted accordingly.

Thanks,

Thomas
Jason Gunthorpe March 7, 2013, 7:55 p.m. UTC | #15
On Thu, Mar 07, 2013 at 08:11:33PM +0100, Thomas Petazzoni wrote:

>> The reason for this is quite simply to avoid this ugly boiler plate in
>> every mbus driver:

> Come on "ugly" ? It's three line of basic set up code, and it avoids
> hard-coding base addresses in the Device Tree. Not having hard-coded

It is ugly because it doesn't belong in a device driver, it is bus
specific setup code and leaking it out of the bus abstraction is bad.

> With your idea, the guy writing the board .dts file will have to think
> about what is the layout of the physical address space to find where
> to map his FPGA or its big NOR.

Bear in mind that an optimal allocator that doesn't leave any address
space holes and considers the power-of-two size restriction is midly
complex.

Building an optimal allocator is actually impossible if each device
driver does its own allocate_resources - you have to call
allocate_resources in the right order to get an optimal placement
(usually sorted from largest to smallest alignment)

Address allocation/assignment is bus specific nonsense and does not
belong in device drivers.

> I know you're proposing to make things dynamic by having the mbus
> driver patch the Device Tree once the driver figures out an available
> range to map a given device. But that doesn't make any sense: why write
> in the DT something that will anyway be overwritten later on?

Simply because the DT is supposed to have the address map for the
elements it describes. An OF node in the DT is expected to resolve
back to the the correct CPU address,

Stuff in the kernel depends on it, eg the sysfs names are constructed
using the address map:

$ ls -l /sys/bus/platform/devices/
lrwxrwxrwx    1 root     root            0 Mar  7 19:43 f1010078.thermal -> ../../../devices/platform/internal.0/f1010078.thermal
lrwxrwxrwx    1 root     root            0 Mar  7 19:43 f1010100.gpio -> ../../../devices/platform/internal.0/f1010100.gpio
lrwxrwxrwx    1 root     root            0 Mar  7 19:43 f1010140.gpio -> ../../../devices/platform/internal.0/f1010140.gpio
[..]

So if you don't have correct address mapping through the DT then
addressing used by the core OF code is wrong, and the mechanism it
uses to avoid name conflicts is broken. Subtle Badness!

The 'DT must describe the hardware' litany is 90% correct but misses
that a critical function of DT is to describe the *address map*.

So you need to leave a placeholder in the DT for the right value to
get stuffed in. Ranges is a logical place to do that. You can't do it in
the device driver because it that is too late. It must be done before
calling of_populate... - ie by the mbus driver, and the mbus driver
cannot be a simple-bus, like Arnd alluded to.

As far as the placeholder goes, the driver could could ignore it
completely, or try to use it as a default - falling back to dynamic,
and maybe interpret 0/-1 as 'do dynamic always'.

> Adding description of windows in the Device Tree can be added later on,
> in a completely DT backward compatible way, and the mvebu-devbus driver
> can be adjusted accordingly.

Hmm.. You could make a new DT that is compatible with an older kernel,
but once the mvebu_mbus_add_window is moved into the mbus driver it
would require an updated DT.

Jason
Thomas Petazzoni March 7, 2013, 8:28 p.m. UTC | #16
Dear Jason Gunthorpe,

On Thu, 7 Mar 2013 12:55:41 -0700, Jason Gunthorpe wrote:

> > Adding description of windows in the Device Tree can be added later on,
> > in a completely DT backward compatible way, and the mvebu-devbus driver
> > can be adjusted accordingly.
> 
> Hmm.. You could make a new DT that is compatible with an older kernel,
> but once the mvebu_mbus_add_window is moved into the mbus driver it
> would require an updated DT.

So what is your proposal that preserves non-DT compatibility? As I
already said multiple times, we still have non-DT platforms around, and
we can leave them on the side on this topic, so we have to have a
solution that keeps non-DT compatibility for the moment.

Thanks,

Thomas
Jason Gunthorpe March 7, 2013, 8:47 p.m. UTC | #17
On Thu, Mar 07, 2013 at 09:28:54PM +0100, Thomas Petazzoni wrote:
> Dear Jason Gunthorpe,
> 
> On Thu, 7 Mar 2013 12:55:41 -0700, Jason Gunthorpe wrote:
> 
> > > Adding description of windows in the Device Tree can be added later on,
> > > in a completely DT backward compatible way, and the mvebu-devbus driver
> > > can be adjusted accordingly.
> > 
> > Hmm.. You could make a new DT that is compatible with an older kernel,
> > but once the mvebu_mbus_add_window is moved into the mbus driver it
> > would require an updated DT.
> 
> So what is your proposal that preserves non-DT compatibility? As I
> already said multiple times, we still have non-DT platforms around, and
> we can leave them on the side on this topic, so we have to have a
> solution that keeps non-DT compatibility for the moment.

Well, I'd keep it basically the same as we have now..

Non-DT platforms call into the mbus driver via code to create windows
before creating the platform device that will use the window. Use a
fixed address, or dynamic allocation, it doesn't matter. Pass the
resulting address in through the struct resource during platform
device register. This would be done in the same place as today, in the
non-DT board specific file.

Critically, the window is setup and address selected prior to the
platform device add and driver probe. In both the DT and non-DT case
the struct resource associated with the platform device is correct.

Something like this randomly choosen example.

static struct platform_device kirkwood_nand_flash = {
        .name           = "orion_nand",
        .id             = -1,
        .dev            = {
                .platform_data  = &kirkwood_nand_data,
        },
        .resource       = &kirkwood_nand_resource,
        .num_resources  = 1,
};

void __init kirkwood_nand_init(struct mtd_partition *parts, int nr_parts,
                               int chip_delay)
{
        kirkwood_nand_data.parts = parts;
        kirkwood_nand_data.nr_parts = nr_parts;
        kirkwood_nand_data.chip_delay = chip_delay;
+	mbus_assign_resource_window(&kirkwood_nand_flash.resource,
+  	                            MAPDEF(...));
        platform_device_register(&kirkwood_nand_flash);
}

That should work for all the non-DT support cases in the kernel today?

Regards,
Jason
Ezequiel Garcia March 7, 2013, 10:20 p.m. UTC | #18
Hi Jason,

On Wed, Mar 06, 2013 at 04:04:12PM -0700, Jason Gunthorpe wrote:
> 
>        nor@0 {
>           compatible = 'linux mtd nor driver';
>           regs = <MAPDEF(1, 0xab, MAPDEF_NOMASK) 0x10000>;
>       }
> 
> We can now tell directly that 'linux mtd nor driver' is behind MBUS
> target 0xab, and the OF code already knows the base for this MBUS
> target from the ranges property, as modified by the MBUS driver. The
> MBUS driver also knows which targets to configure windows for
> immediately at boot without having to consult internal tables. Plus we
> don't need to modify the NOR driver to call out to MBUS window
> functions to fetch an address.
> 
> There are a few other variants on how to do this in DT, but they all
> come down to modeling the MBUS target using a ranges property and
> having the mbus driver adjust the range mapping at runtime.
> 

At least in the flash (NOR and friends) case, this is not necessarily true.

Now that the Device Bus driver is posted I think we have a nice example
on how address decoding windows would be allocated through the mbus
driver, *without* touching the child driver (i.e. NOR driver).

The windows are allocated as part of the "preparing" tasks the Device
Bus does as the parent device and it seems like a very clean solution.
The windows are specifically related to him, so I think this driver is
the most appropriate to perform the window management.

In a certain way, this is similar to other memory-controller devices
(e.g. OMAP's GPMC) that need to request a chip select before the child
device can be accessed.

Moreover -as I already stated in the patchset- I believe we can make the
Device Bus implement dynamic window allocation, if he can't find a 'ranges'
property for a given child.

In conclusion: there is absolutely no need to change the NOR driver,
and there is no need to explictly add an explicit description of the
NOR window in the device tree.

Thanks,
Jason Gunthorpe March 7, 2013, 11:05 p.m. UTC | #19
On Thu, Mar 07, 2013 at 07:20:05PM -0300, Ezequiel Garcia wrote:
> Now that the Device Bus driver is posted I think we have a nice example
> on how address decoding windows would be allocated through the mbus
> driver, *without* touching the child driver (i.e. NOR driver).

No, that isn't the case. Your device bus driver *is* the child driver,
and you have touched it by adding mbus window code :)

In fact, looking at what you've done - with the DTs you've proposed
your driver does *nothing* but make a call to the mbus driver to setup
the windows for its children. Without timing parameters from the DT the
driver does nothing else!

So what is the point of that? Get rid of your driver, have the mbus
driver set up its own windows and put cfi-flash directly under
mbus. Smaller, simpler, clearer, better.

What you've done is *exactly* the sort of thing I was warning against
:)

BTW, I've made the same basic choice on our Kirkwood systems. The NAND
timing pameters are set by the bootloader and the kernel doesn't touch
them. The bootloader knows which flash chip is placed on the board so
only it knows the correct timing parameters. There is not any reason
to communicate them to the kernel.

I do see a fine use for your driver for non-boot devices, but the
window stuff needs to be removed so it isn't mandatory to use your
driver in the NOP case.

> In conclusion: there is absolutely no need to change the NOR driver,
> and there is no need to explictly add an explicit description of the
> NOR window in the device tree.

As I've said, a DT that has child nodes without an associated address
map is broken.

Jason
Ezequiel Garcia March 8, 2013, 1:02 a.m. UTC | #20
Hi Jason,

On Thu, Mar 07, 2013 at 04:05:16PM -0700, Jason Gunthorpe wrote:
> On Thu, Mar 07, 2013 at 07:20:05PM -0300, Ezequiel Garcia wrote:
> > Now that the Device Bus driver is posted I think we have a nice example
> > on how address decoding windows would be allocated through the mbus
> > driver, *without* touching the child driver (i.e. NOR driver).
> 
> No, that isn't the case. Your device bus driver *is* the child driver,
> and you have touched it by adding mbus window code :)
> 
> In fact, looking at what you've done - with the DTs you've proposed
> your driver does *nothing* but make a call to the mbus driver to setup
> the windows for its children. Without timing parameters from the DT the
> driver does nothing else!
> 

Yes, setting aside the timings parameters, the Device Bus driver is only
charge of allocating the windows; perhaps we can say nothing more, nothing less :)

> > In conclusion: there is absolutely no need to change the NOR driver,
> > and there is no need to explictly add an explicit description of the
> > NOR window in the device tree.
> 
> As I've said, a DT that has child nodes without an associated address
> map is broken.
> 

Mmm... could explain me better, why you think it's *broken*?

Perhaps I'm slower than most people around here, but I still
don't fully understand this brokenness.

Thanks,
Thomas Petazzoni March 8, 2013, 8:10 a.m. UTC | #21
Dear Jason Gunthorpe,

On Thu, 7 Mar 2013 16:05:16 -0700, Jason Gunthorpe wrote:

> So what is the point of that? Get rid of your driver, have the mbus
> driver set up its own windows and put cfi-flash directly under
> mbus. Smaller, simpler, clearer, better.

No, we will still need the Device Bus driver to set up the timings.

> What you've done is *exactly* the sort of thing I was warning against
> :)
> 
> BTW, I've made the same basic choice on our Kirkwood systems. The NAND
> timing pameters are set by the bootloader and the kernel doesn't touch
> them. The bootloader knows which flash chip is placed on the board so
> only it knows the correct timing parameters. There is not any reason
> to communicate them to the kernel.

That's _your_ choice, and it's entirely inconsistent with most kernel
practices. For example, we have the pinctrl subsystem, which precisely
aims at empowering the kernel in terms of pin muxing, while previously,
it was a mix of custom kernel APIs, or reliance on the bootloader for
having setting up the pin muxing.

Sorry, but we _do_ want the kernel to be able to set those timings
parameters, and therefore, a Device Bus driver will be needed,
regardless of whether it creates the address window or not.

> > In conclusion: there is absolutely no need to change the NOR driver,
> > and there is no need to explictly add an explicit description of the
> > NOR window in the device tree.
> 
> As I've said, a DT that has child nodes without an associated address
> map is broken.

That's a statement without any arguments. How can we have a serious
technical discussion if what you bring is just "this is broken", with no
arguments?

Best regards,

Thomas
Thomas Petazzoni March 8, 2013, 8:14 a.m. UTC | #22
Dear Jason Gunthorpe,

On Wed, 6 Mar 2013 16:04:12 -0700, Jason Gunthorpe wrote:

> The discussion was around the SOC specific tables in the driver and if
> they should be in DT or C code. These tables very much describe the
> hardware, and are copied from similar tables in the Marvell manuals.
> 
> For instance I repeated this idea:
> 
>    ranges =<MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000 // boot rom
>             MAPDEF(1, 0x3f, MAPDEF_NOMASK)  0xFD000000  0x10000 // devbus-boot
>             MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0xFF000000  0x10000 // internal-regs
>              [..]

A few other issues with this:

 * It forces us to repeat the base address of the NOR twice in the
   Device Tree. Once in the ranges = <...> property of the MBus DT node,
   and once in the reg = <...> property of the NOR DT node. Duplication
   of identical information, isn't that something that sounds broken?

 * If we do dynamic allocation of the base address, it means that the
   DT is no longer an accurate representation of what happens for real.
   The user will write 0xDEADBEEF in the DT, and will get its NOR
   mapped at 0xBEEFDEAD. Isn't that uselessly confusing?

Sorry, but your proposal still has many, many disadvantages over the
currently proposed solution, and no compelling argument other than your
own perception that the DT should describing the address mapping.

With MBus, the DT cannot describe the address mapping because it is
fundamentally dynamic. Why the heck would we describe NOR windows in
the DT and not the PCIe ones? They are both dynamic in the exact same
way.

Best regards,

Thomas
Arnd Bergmann March 8, 2013, 1:50 p.m. UTC | #23
On Thursday 07 March 2013, Thomas Petazzoni wrote:
> This doesn't work for PCIe interfaces. The PCIe windows cannot be
> described in the DT, because they are inherently dynamic, and depend on
> which PCIe devices are plugged into the PCIe slots, and how much I/O
> and memory space each of those devices require.
> 
> This means that regardless of what you think, some windows will have
> to be created dynamically during the system initialization, and we will
> not be able to describe all windows in the Device Tree.
> 
> And to me, it seems completely stupid to have some windows defined
> statically in the Device Tree, and some other windows set up
> dynamically when the system initializes.

You could have no "ranges" property in the boot time DT if you are
looking for consistency here.

Ideally, the ranges would contain exactly the setup of the MBUS windows
that was configured with at boot time, and then get updated by the
kernel when the windows are dynamically changed. An skipping the
ranges at boot time would imply that none of the devices underneath
MBUS are currently mapped to physical addresses. Putting some of
the ranges is there is a way to handle two cases though:

* Overriding the boot loader (or hardware) defaults, telling the
  kernel that we really want to set up the windows differently. In
  a perfect world, this would not be necessary, because the boot
  loader should be able to make sure the two always match.
* Windows that we don't want to be set up at boot time yet because
  we know that we want dynamic allocation.

> > At the stanza of MBUS driver's bus. Each line represents a MBUS target
> > (this is a physical HW IP block). The MAPDEF is a SOC specific 'target
> > id' that is currently living in tables in the driver. This value is
> > directly compatible with the mbus window register and is defined by
> > Marvell. The 2nd number is the CPU base address of that window, and
> > the last is the size.
> 
> Please explain how you handle PCIe windows.

Any window that we don't want to hardcode, like the PCIe windows, can
be absent from the "ranges", which in DT syntax means exactly the right
thing: There are bus addresses on the child bus that are wired to one
device but not currently mapped into the parent bus. This is exactly
what we do for instance on a PCI bus ranges property that does not map
its I/O space window into the parent bus as a linear map. This is the
normal case on x86, but also done on a few other platforms.

> > Per OF conventions the base address should be the value the bootloader
> > left it at - but that is not important, the value could be 0. A
> > dynamic MBUS driver would go through each item in the ranges, find
> > an address range and program a window with the MAPDEF from the DT. It
> > would then write the base address back into the DT (at runtime!) so
> > that the entire DT remains conistent with the current state of the
> > hardware.
> 
> Why the heck would the kernel need to rewrite the DT ?
> 
> Just like a driver does an ioremap() to create a virtual -> physical
> mapping, the driver can just as well do mvebu_mbus_add_window() to
> create a physical -> device mapping. It doesn't have to be hardcoded
> into the Device Tree.

The important difference is that the DT describes the physical address
space as seen from the CPU bus interface, but does not care at all about
the virtual addresses that the kernel puts into page tables.

Describing the mapping of addresses from one bus address space to another,
down to the CPU's view is the core of all DT bindings, and we should do
that properly and consistently. If you change the mapping at run-time,
updating the properties is the natural thing to do.

	Arnd
Thomas Petazzoni March 8, 2013, 2:06 p.m. UTC | #24
Dear Arnd Bergmann,

On Fri, 8 Mar 2013 13:50:15 +0000, Arnd Bergmann wrote:

> > And to me, it seems completely stupid to have some windows defined
> > statically in the Device Tree, and some other windows set up
> > dynamically when the system initializes.
> 
> You could have no "ranges" property in the boot time DT if you are
> looking for consistency here.
> 
> Ideally, the ranges would contain exactly the setup of the MBUS windows
> that was configured with at boot time, and then get updated by the
> kernel when the windows are dynamically changed. An skipping the
> ranges at boot time would imply that none of the devices underneath
> MBUS are currently mapped to physical addresses. Putting some of
> the ranges is there is a way to handle two cases though:
> 
> * Overriding the boot loader (or hardware) defaults, telling the
>   kernel that we really want to set up the windows differently. In
>   a perfect world, this would not be necessary, because the boot
>   loader should be able to make sure the two always match.
> * Windows that we don't want to be set up at boot time yet because
>   we know that we want dynamic allocation.

When the kernel starts, it resets all MBus windows that could have
potentially been configured by the bootloader. So we start from a clean
world.

So you're saying that the 'ranges' property should be empty, but the
entire proposal from Jason Gunthorpe is centered around this 'ranges'
property. Quoting Jason:

   // Names and numbers made up for illistration
   // The 2nd column is where to place the MBUS target in the CPU address map
   ranges < MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000 // boot rom
            MAPDEF(1, 0x3f, MAPDEF_NOMASK)  0xFD000000  0x10000 // devbus-boot
            MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0xFF000000  0x10000 // internal-regs
             [..]

I must confess I'm lost between your opinion and Jason's opinion.

> > > At the stanza of MBUS driver's bus. Each line represents a MBUS target
> > > (this is a physical HW IP block). The MAPDEF is a SOC specific 'target
> > > id' that is currently living in tables in the driver. This value is
> > > directly compatible with the mbus window register and is defined by
> > > Marvell. The 2nd number is the CPU base address of that window, and
> > > the last is the size.
> > 
> > Please explain how you handle PCIe windows.
> 
> Any window that we don't want to hardcode, like the PCIe windows, can

We don't want to hardcode any window, because it doesn't make sense.

> be absent from the "ranges", which in DT syntax means exactly the right
> thing: There are bus addresses on the child bus that are wired to one
> device but not currently mapped into the parent bus. This is exactly
> what we do for instance on a PCI bus ranges property that does not map
> its I/O space window into the parent bus as a linear map. This is the
> normal case on x86, but also done on a few other platforms.

It is also the case for all other devices: when the kernel starts, a
NOR is not mapped into the parent bus until we create the address
decoding window for it. Absolutely no difference compared to PCIe.

> > > Per OF conventions the base address should be the value the bootloader
> > > left it at - but that is not important, the value could be 0. A
> > > dynamic MBUS driver would go through each item in the ranges, find
> > > an address range and program a window with the MAPDEF from the DT. It
> > > would then write the base address back into the DT (at runtime!) so
> > > that the entire DT remains conistent with the current state of the
> > > hardware.
> > 
> > Why the heck would the kernel need to rewrite the DT ?
> > 
> > Just like a driver does an ioremap() to create a virtual -> physical
> > mapping, the driver can just as well do mvebu_mbus_add_window() to
> > create a physical -> device mapping. It doesn't have to be hardcoded
> > into the Device Tree.
> 
> The important difference is that the DT describes the physical address
> space as seen from the CPU bus interface, but does not care at all about
> the virtual addresses that the kernel puts into page tables.

It doesn't describe the physical address of everything, because certain
things are dynamic. In some systems, the only peripheral addresses that
are dynamic are the PCI ones, and they are not hardcoded in the Device
Tree.

In the special case of Marvell systems, *many* peripherals have dynamic
addresses. It's just a specificity of Marvell SoCs, but it is in the
end very similar to the dynamic nature of PCI devices.

Would it be possible to understand that this mechanism of address
decoding windows is a bit special in Marvell SoCs, and that therefore
the classical reasoning of what the DT encodes needs to be thought a
little bit differently?

> Describing the mapping of addresses from one bus address space to another,
> down to the CPU's view is the core of all DT bindings, and we should do
> that properly and consistently. If you change the mapping at run-time,
> updating the properties is the natural thing to do.

Again, you don't do that for PCI devices because their address is
dynamic, so I don't see why we would have to do it for other devices
whose address is also dynamic.

Really, I've sent this PCIe driver first on December, 7th, and it was
working. 99% percent of the problems have been around the Device Tree,
and continue to be around this. Wasn't DT supposed to make things
easier? I am really surprised by the amount of nitpicking that this
driver receives, when I see how incoherent the pinctrl DT bindings for
the various SoC are for example...

At this point, I have absolutely no idea what direction to take to
bring this further. I'm basically in a dead-end.

Best regards,

Thomas
Arnd Bergmann March 8, 2013, 3:41 p.m. UTC | #25
On Friday 08 March 2013, Thomas Petazzoni wrote:
> When the kernel starts, it resets all MBus windows that could have
> potentially been configured by the bootloader. So we start from a clean
> world.
> 
> So you're saying that the 'ranges' property should be empty, but the
> entire proposal from Jason Gunthorpe is centered around this 'ranges'
> property. Quoting Jason:
>
>    // Names and numbers made up for illistration
>    // The 2nd column is where to place the MBUS target in the CPU address map
>    ranges < MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000 // boot rom
>             MAPDEF(1, 0x3f, MAPDEF_NOMASK)  0xFD000000  0x10000 // devbus-boot
>             MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0xFF000000  0x10000 // internal-regs
>              [..]
> 
> I must confess I'm lost between your opinion and Jason's opinion.

There is not actually all that much disconnect. As a side note, an empty
"ranges" property means an identity map, which would be wrong here, while
an absent property means that there is currently no mapping.

If you want the boot loader to tell the kernel to autoconfigure all the
mappings, an absent ranges property would be the right thing. This makes
a lot of sense if you want to go fully dynamic, but I thought that you
were not attempting to go that far yet.

I think we really want the ranges to be populated at run-time though,
and have it match exactly the windows that we decide to put in, so that
of_address_translate works.

Even if you want to ignore all the mappings that the boot loader sets,
there are two different values that would make sense to put into the
ranges property anyway rather than leaving it out (as I mentioned earlier):

a) Put the mapping in there that is configured by hardware or boot loader,
to make any device work when you do not have a driver for MBUS. Any OS
could then just translate the addresses from DT and get working devices,
but an OS with an MBUS driver can choose to reconfigure the mappings
and update the ranges property. This is similar to e.g. the baud rate is
configured for a serial port: the firmware puts into DT what the UART
is configured to at boot time, and we can use that to output data,
or override it if we have a reason to want a different baud rate. We don't
normally bother updating the DT properties if we do that, because nothing
else depends on it.

b) Put a useful default mapping in the DT, and ask Linux to set up the
MBUS device so we don't need to duplicate the code in Linux and the boot
loader. An OS can still ignore the values and override them with something
else to get a fully dynamic mapping. This is similar e.g. to what we do
for the ethernet MAC addresses: If the kernel wants to pick some value,
it can find a reasonable one in the DT, which it will normally use, but
it can just as well use something elese.

I don't care much which of those options you choose (including leaving
the ranges property out of the initial DT), but I strongly believe that
the format of the in-memory ranges property should be what Jason Gunthorpe
suggested or at least very similar. I certainly don't want to see the
mapping hardwired in the driver. If you want the mapping to be completely
boot time created, an empty ranges or the default from the boot loader
makes the most sense. If you want some mappings to be fixed, I would
prefer them to be in the DT rather than in the driver.

> > > > At the stanza of MBUS driver's bus. Each line represents a MBUS target
> > > > (this is a physical HW IP block). The MAPDEF is a SOC specific 'target
> > > > id' that is currently living in tables in the driver. This value is
> > > > directly compatible with the mbus window register and is defined by
> > > > Marvell. The 2nd number is the CPU base address of that window, and
> > > > the last is the size.
> > > 
> > > Please explain how you handle PCIe windows.
> > 
> > Any window that we don't want to hardcode, like the PCIe windows, can
> 
> We don't want to hardcode any window, because it doesn't make sense.

I thought you were doing exact that with a table like

+static const struct mvebu_mbus_mapping armada_370_map[] = {
+       MAPDEF("bootrom",     1, 0xe0, MAPDEF_NOMASK),
+       MAPDEF("devbus-boot", 1, 0x2f, MAPDEF_NOMASK),
+       MAPDEF("devbus-cs0",  1, 0x3e, MAPDEF_NOMASK),
+       MAPDEF("devbus-cs1",  1, 0x3d, MAPDEF_NOMASK),
+       MAPDEF("devbus-cs2",  1, 0x3b, MAPDEF_NOMASK),
+       MAPDEF("devbus-cs3",  1, 0x37, MAPDEF_NOMASK),
+       MAPDEF("pcie0.0",     4, 0xe0, MAPDEF_PCIMASK),
+       MAPDEF("pcie1.0",     8, 0xe0, MAPDEF_PCIMASK),
+       {},
+};
+

> > be absent from the "ranges", which in DT syntax means exactly the right
> > thing: There are bus addresses on the child bus that are wired to one
> > device but not currently mapped into the parent bus. This is exactly
> > what we do for instance on a PCI bus ranges property that does not map
> > its I/O space window into the parent bus as a linear map. This is the
> > normal case on x86, but also done on a few other platforms.
> 
> It is also the case for all other devices: when the kernel starts, a
> NOR is not mapped into the parent bus until we create the address
> decoding window for it. Absolutely no difference compared to PCIe.

Yes, as I said: if you don't want the mapping for the device to be fixed,
it should not be in the ranges property at boot time, but only get added
at the point where you pick a mapping. This may concern all devices
or just a subset (PCIe, flash, ...).
 
> > The important difference is that the DT describes the physical address
> > space as seen from the CPU bus interface, but does not care at all about
> > the virtual addresses that the kernel puts into page tables.
> 
> It doesn't describe the physical address of everything, because certain
> things are dynamic. In some systems, the only peripheral addresses that
> are dynamic are the PCI ones, and they are not hardcoded in the Device
> Tree.
>
> In the special case of Marvell systems, many peripherals have dynamic
> addresses. It's just a specificity of Marvell SoCs, but it is in the
> end very similar to the dynamic nature of PCI devices.
> 
> Would it be possible to understand that this mechanism of address
> decoding windows is a bit special in Marvell SoCs, and that therefore
> the classical reasoning of what the DT encodes needs to be thought a
> little bit differently?

I don't think we should make a special case here when the common device
tree format can perfectly describe what these chips are doing.

> > Describing the mapping of addresses from one bus address space to another,
> > down to the CPU's view is the core of all DT bindings, and we should do
> > that properly and consistently. If you change the mapping at run-time,
> > updating the properties is the natural thing to do.
> 
> Again, you don't do that for PCI devices because their address is
> dynamic, so I don't see why we would have to do it for other devices
> whose address is also dynamic.

For PCI devices we are often taking a shortcut with the flattened
device tree format because we have to probe them all from the kernel
anyway, in particular for hotplugged devices. In traditional OF
based system, the firmware actually assigned addresses to all PCI
devices and showed them in the device tree. We get away with leaving
the PCI devices out of the DT because PCI is smart enough to assign
the addresses dynamically. On the MBUS, this is not possible without
knowing exactly what devices are behind it, and you can only describe
those devices if you either list them in the source code (which we
are trying to move away from with DT) or if you list them all in the
DT, but that requires describing them in a format that has the
sufficient information.

	Arnd
Maxime Bizon March 8, 2013, 3:59 p.m. UTC | #26
On Thu, 2013-03-07 at 16:05 -0700, Jason Gunthorpe wrote:
> 
> BTW, I've made the same basic choice on our Kirkwood systems. The NAND
> timing pameters are set by the bootloader and the kernel doesn't touch
> them. The bootloader knows which flash chip is placed on the board so
> only it knows the correct timing parameters. There is not any reason
> to communicate them to the kernel. 

since you can probe the chip ONFI timings, why are you doing that ?

on a general standpoint, why would you ever want to run with whatever
the bootloader left in registers if you have full knowledge to reprogram
them in the kernel ?

from a testability point this is horrible,

users get the classic answer to bug reports, "the problem must be coming
from your bootloader, try upgrading it"

the 'B' in BIOS stands for buggy
Jason Cooper March 8, 2013, 4:48 p.m. UTC | #27
On Wed, Mar 06, 2013 at 11:35:13PM -0700, Jason Gunthorpe wrote:
> On Thu, Mar 07, 2013 at 03:37:43AM +0000, Arnd Bergmann wrote:
> 
> > Yes. We can even use the index of the "ranges" property to refer directly
> > to the number of the window, so ranges replaces the kernel internal
> > struct mvebu_mbus_mapping.
> 
> I think we are thinking the same thing, but, s/index/bus address/ and
> s/number of the window/target id/ ?
> 
> > I would also suggest not making the mbus compatible with "simple_bus", but
> > instead probing /only/ the mbus device at bootup, which can then set
> > up the hardware based on the ranges property and then call
> > of_platform_populate on its children.
> 
> Yeah, something is needed to make startup ordering work. Things like
> the timer and main interrupt controller live behind the special
> internal regs window, so in a perfect world finding the timer would
> automatically setup the enclosing bus structures ...
> 
> > > I certainly agree with this. If it wasn't for the idea that the DT is
> > > a stable ABI and thus must be perfect from the start, I wouldn't have
> > > mentioned any of this right now, making fine tuning adjustments as a
> > > follow on seems better to me.
> > > 
> > > It is absolutely way too much work to try and fix everything in one
> > > go! I would be happy to see your patch go in as-is, but mildly unhappy
> > > to see us stuck with this DT layout forever...
> > > 
> > > Especially since I really want to see your PCI stuff go in ....
> > 
> > I basically agree with everything you write in this mail, Jason, and
> > I was about to make exactly the same comments myself.
> > 
> > Getting the binding right is certainly a priority here, but I also think
> > that we don't need to rush the code to use that binding (although having
> > the code work would make me more comfortable thinking that the binding
> > actually works).
> 
> Do you think Jason C's notion to take it as is and then be OK with
> altering the binding later pending agreement is acceptable for now? I
> agree with Thomas that this is too much to ask just to get
> the already huge Marvell PCI-E patchset into mainline.

Since these bindings are mvebu-specific, and since there are no DT
based boards on the market for mvebu, I think this is an acceptable
compromise.

Thanks to JasonG and Arnd for clearing up my misunderstandings.  The
only thing I would really like to see at this point is a clear,
thought-through migration path from Thomas' patches to JasonG's
proposal.  Obviously, it would be nice if the proposer took a crack at
this. ;-)

I'd really like to avoid a churn/spaghetti nightmare down the road if we
can avoid it with a simple tweak now.

thx,

Jason.
Jason Gunthorpe March 8, 2013, 4:48 p.m. UTC | #28
On Fri, Mar 08, 2013 at 04:59:50PM +0100, Maxime Bizon wrote:
> On Thu, 2013-03-07 at 16:05 -0700, Jason Gunthorpe wrote:
> > 
> > BTW, I've made the same basic choice on our Kirkwood systems. The NAND
> > timing pameters are set by the bootloader and the kernel doesn't touch
> > them. The bootloader knows which flash chip is placed on the board so
> > only it knows the correct timing parameters. There is not any reason
> > to communicate them to the kernel. 
> 
> since you can probe the chip ONFI timings, why are you doing that ?

Hmm, very interesting idea, but there is no implementation of this for
the Marvell NAND driver today.

Ezequiel - how would you forsee fitting something like this in? The
process for NAND would be to set the slowest timings (which have to be
computed based on the TClk frequency), do READID, determine the
fastest supported timings, and then update the timings again.

This seems like something that belongs in the NAND interface driver,
not the device bus driver??

> on a general standpoint, why would you ever want to run with whatever
> the bootloader left in registers if you have full knowledge to reprogram
> them in the kernel ?

Well, because the kernel doesn't support those registers. There are
quite a few of them on our Kirkwood systems, including NAND timing.

Jason
Ezequiel Garcia March 8, 2013, 5:12 p.m. UTC | #29
On Fri, Mar 08, 2013 at 09:48:32AM -0700, Jason Gunthorpe wrote:
> > since you can probe the chip ONFI timings, why are you doing that ?
> 
> Hmm, very interesting idea, but there is no implementation of this for
> the Marvell NAND driver today.
> 
> Ezequiel - how would you forsee fitting something like this in? The
> process for NAND would be to set the slowest timings (which have to be
> computed based on the TClk frequency), do READID, determine the
> fastest supported timings, and then update the timings again.
> 
> This seems like something that belongs in the NAND interface driver,
> not the device bus driver??
> 

Mmm... adding NAND support is in my TODO list. But I'm not quite there yet,
so I'm not sure what's the most appropriate way to handle this.
Jason Gunthorpe March 8, 2013, 5:29 p.m. UTC | #30
On Fri, Mar 08, 2013 at 09:10:52AM +0100, Thomas Petazzoni wrote:
> Dear Jason Gunthorpe,
> 
> On Thu, 7 Mar 2013 16:05:16 -0700, Jason Gunthorpe wrote:
> 
> > So what is the point of that? Get rid of your driver, have the mbus
> > driver set up its own windows and put cfi-flash directly under
> > mbus. Smaller, simpler, clearer, better.
> 
> No, we will still need the Device Bus driver to set up the timings.

I looked through Ezequiel's patch and saw a driver that provided those
properties and no user of them at all. Are the patches incomplete? Is
there some plan to use these values in the future?

> That's _your_ choice, and it's entirely inconsistent with most kernel
> practices. For example, we have the pinctrl subsystem, which
> precisely

Again, Ezequiel's patches do exactly the same thing.

pinctrl is a bit different, it depends on the PCB design which is
fixed. NAND/NOR timings depend on the flash chip in use, which can
vary from manufacturing run to manufacturing run.

> Sorry, but we _do_ want the kernel to be able to set those timings
> parameters, and therefore, a Device Bus driver will be needed,
> regardless of whether it creates the address window or not.

See the very good idea from Maxime Bizon about autoprobing the ONFI
timings directly from the NAND chip.

> > > In conclusion: there is absolutely no need to change the NOR driver,
> > > and there is no need to explictly add an explicit description of the
> > > NOR window in the device tree.
> > 
> > As I've said, a DT that has child nodes without an associated address
> > map is broken.
> 
> That's a statement without any arguments. How can we have a serious
> technical discussion if what you bring is just "this is broken", with no
> arguments?

I've brought many things up and you have ignored them all. Here is a
list of a few:
 - You can't actually do optmial dynamic allocation a device at a
   time, that will make holes in the address space due to the alignment
   constraints.
 - Having an invalid address map in DT breaks the naming of platform
   devices and breaks the collision avoidance mechanism in the OF core
 - Breaking the address map means any downstream driver that tries to
   use standard OF addressing functions on its of node gets broken.
 - Breaking the address map means the standard OF platform device
   creation functions don't work because they can't create the
   resource ranges from OF. This means things like standard Linux
   resource accounting is broken.
 - Adding bus specific boiler plate to drivers is not the Linux
   convention, the bus abstraction is ment to do that.
Arnd also mentioned:
 - Breaking the address map in DT breaks OS's without a mbus driver

What more do you want to hear?

Jason
Jason Gunthorpe March 8, 2013, 5:43 p.m. UTC | #31
On Fri, Mar 08, 2013 at 03:06:55PM +0100, Thomas Petazzoni wrote:

> Really, I've sent this PCIe driver first on December, 7th, and it was
> working. 99% percent of the problems have been around the Device Tree,
> and continue to be around this. Wasn't DT supposed to make things
> easier? I am really surprised by the amount of nitpicking that this
> driver receives, when I see how incoherent the pinctrl DT bindings for
> the various SoC are for example...

Well, please consider that the push to make the DT a stable ABI means
it should be nearly as hard to change as the userspace ABI. If you've
watched the kernel you know how rough it is to do that. :(

Further, the PCI DT binding needs to implement the documented OF
specifications. It is not nitpicking to say that X does not conform to
the OF document.

pinctrl doesn't have an OF spec so people are doing SOC unique things.

I think a reasonable take away from some of this is that it may be
worthwhile to propose a DT binding before writing code for it,
especially if it is a new or complex concept.

> At this point, I have absolutely no idea what direction to take to
> bring this further. I'm basically in a dead-end.

I think Jason C. has said he is fine with bringing in the MBUS driver as
is.

The PCI-E driver's DT binding looks great to me, I have no further
comments at all, and I believe everyone else's comments related to DT
are delt with as well.

The huge discussion about the SW PCI-E bridge seems to be concluded
and everyone is agreeing on that point too.

Soo, I think you are not at all at a dead end, but nearly finished :)

Regards,
Jason
Jason Gunthorpe March 8, 2013, 5:50 p.m. UTC | #32
On Fri, Mar 08, 2013 at 03:41:29PM +0000, Arnd Bergmann wrote:

> If you want the boot loader to tell the kernel to autoconfigure all the
> mappings, an absent ranges property would be the right thing. This makes
> a lot of sense if you want to go fully dynamic, but I thought that you
> were not attempting to go that far yet.
> 
> I think we really want the ranges to be populated at run-time though,
> and have it match exactly the windows that we decide to put in, so that
> of_address_translate works.

Right, so taking these two paragraphs together - if you have an absent
ranges you are imagining that the mbus driver would write out a new
fully populated ranges to the DTB? That would be necessary to keep
of_address_translate working.

If so we are talking about exactly the same thing - the in memory DTB
would still have working address translation.

An absent ranges makes some sense from a spec perspective - however do we
have the kernel infrastructure to grow an in memory DTB node?

Jason
Ezequiel Garcia March 8, 2013, 5:59 p.m. UTC | #33
Jason,

On Fri, Mar 08, 2013 at 10:29:50AM -0700, Jason Gunthorpe wrote:
> 
> I looked through Ezequiel's patch and saw a driver that provided those
> properties and no user of them at all. Are the patches incomplete? Is
> there some plan to use these values in the future?
> 

Leaving aside the review comments that still I haven't addressed yet,
the patchset is complete.

The timings parameters were not set in device tree files,
not because I expected the bootloader to set them, but instead because
SoC default values worked fine for the NOR devices I tested.

If this is suboptimal, I can fix the device tree files in v2
to set proper timings parameter values.

Thanks,
Jason Gunthorpe March 8, 2013, 6:26 p.m. UTC | #34
On Fri, Mar 08, 2013 at 09:14:39AM +0100, Thomas Petazzoni wrote:
> > The discussion was around the SOC specific tables in the driver and if
> > they should be in DT or C code. These tables very much describe the
> > hardware, and are copied from similar tables in the Marvell manuals.
> > 
> > For instance I repeated this idea:
> > 
> >    ranges =<MAPDEF(1, 0xe0, MAPDEF_NOMASK)  0xFE000000  0x10000 // boot rom
> >             MAPDEF(1, 0x3f, MAPDEF_NOMASK)  0xFD000000  0x10000 // devbus-boot
> >             MAPDEF(1, 0xxx, MAPDEF_NOMASK)  0xFF000000  0x10000 // internal-regs
> >              [..]
> 
> A few other issues with this:
> 
>  * It forces us to repeat the base address of the NOR twice in the
>    Device Tree. Once in the ranges = <...> property of the MBus DT node,
>    and once in the reg = <...> property of the NOR DT node. Duplication
>    of identical information, isn't that something that sounds broken?

Uhm. That is how DT works. DT is not ment to be human readable, it is
a machine parsible format. Look at a DT that comes out of a full
featured firmware and there will be lots of duplicate information,
especially in ranges.

Downstream stanzas will *always* duplicate the information in upstream
ranges. The format absolutely requires it.
 
>  * If we do dynamic allocation of the base address, it means that the
>    DT is no longer an accurate representation of what happens for real.
>    The user will write 0xDEADBEEF in the DT, and will get its NOR
>    mapped at 0xBEEFDEAD. Isn't that uselessly confusing?

The *input* DTB may not match the *runtime* DTB, but the in-kernel
runtime DTB can be accurate.

> Sorry, but your proposal still has many, many disadvantages over the
> currently proposed solution, and no compelling argument other than your
> own perception that the DT should describing the address mapping.

Hold on here. Your current solution fails on the significant point
that the kernel hard codes CPU addresses and sets up windows on those
addresses. The DT is then required to exactly match those hard coded
addresess.

You dismissed this complaint by saying that would go away with dynamic
assignment - but, there isn't a complete proposal for this. For
instance, take this from Ezequiel:

>> If we want to upgrade this behavior and support dynamically
>> allocated windows, we can simply fix the device bus driver to find a
>> suitable address when there is no 'ranges' property for that chip
>> select.

Okay. So you drop the ranges property and dynamically assign. Go
through these questions:
 - What will the source DTS look like for this case?
 - What happens in the kernel when the child cfi-nor device is created?
 - Will you rely on the usual OF populate code?
 - If no, you write a custom OF populate code path, is that good for
   maintainability?
 - If yes, how do you make of_translate_address return the correct value
   so the cfi-nor platform device's resources and name are correct?
 - Are you going to patch the child's 'reg' value in the dtb?
 - How is patching 'reg' in a child different from patching 'ranges'
   in the parent? Is this any less confusing?
 - What value do you put in the source DTS for the child regs?
 - What value do you put in the runtime DTB for the child's reg?
 - How do you deal with other 'ranges' translation in enclosing
   stanzas?
 - What if the child doesn't have a 'reg' but has a 'ranges', for
   some reason?

Go through the same thinking process with my proposal. Now constrast
the two.

> With MBus, the DT cannot describe the address mapping because it is
> fundamentally dynamic. Why the heck would we describe NOR windows in
> the DT and not the PCIe ones? They are both dynamic in the exact same
> way.

They are not the same. The NOR flash has a DT stanza associated with
it. The dynamic PCIe windows do not. The need to describe the NOR
window int DT arises *directly* from the need to describe the NOR
device in DT.

Jason
Jason Gunthorpe March 8, 2013, 6:31 p.m. UTC | #35
On Fri, Mar 08, 2013 at 02:59:27PM -0300, Ezequiel Garcia wrote:

> > I looked through Ezequiel's patch and saw a driver that provided those
> > properties and no user of them at all. Are the patches incomplete? Is
> > there some plan to use these values in the future?

> Leaving aside the review comments that still I haven't addressed yet,
> the patchset is complete.
 
> The timings parameters were not set in device tree files,
> not because I expected the bootloader to set them, but instead because
> SoC default values worked fine for the NOR devices I tested.

? Isn't that the same thing? Were the timing registers the SOC reset
default or were they leftover from the bootloader?

> If this is suboptimal, I can fix the device tree files in v2
> to set proper timings parameter values.

I have no opinion on this for your boards, it depends entirely on what
flash chips you have to be compatible with, and how you feel about
your bootloader.

Also, when looking at the idea from Maxime it occured to me that your
DT binding might be better using ns or ps for the timings, instead of
tclk cycles? That way NAND/NOR datasheet values can be included in the
DT directly and are correct no matter what the tclk frequency is set to.

Cheers,
Jason
Ezequiel Garcia March 8, 2013, 6:53 p.m. UTC | #36
On Fri, Mar 08, 2013 at 11:31:20AM -0700, Jason Gunthorpe wrote:
> On Fri, Mar 08, 2013 at 02:59:27PM -0300, Ezequiel Garcia wrote:
> 
> > > I looked through Ezequiel's patch and saw a driver that provided those
> > > properties and no user of them at all. Are the patches incomplete? Is
> > > there some plan to use these values in the future?
> 
> > Leaving aside the review comments that still I haven't addressed yet,
> > the patchset is complete.
>  
> > The timings parameters were not set in device tree files,
> > not because I expected the bootloader to set them, but instead because
> > SoC default values worked fine for the NOR devices I tested.
> 
> ? Isn't that the same thing? Were the timing registers the SOC reset
> default or were they leftover from the bootloader?
> 

IMHO, SoC reset default value is not the same as leftover from bootloader,
right?

But this doesn't mean the driver is not useful to set proper parameter
values in case default/bootloader are wrong or suboptimal.

> > If this is suboptimal, I can fix the device tree files in v2
> > to set proper timings parameter values.
> 
> I have no opinion on this for your boards, it depends entirely on what
> flash chips you have to be compatible with, and how you feel about
> your bootloader.
> 
> Also, when looking at the idea from Maxime it occured to me that your
> DT binding might be better using ns or ps for the timings, instead of
> tclk cycles? That way NAND/NOR datasheet values can be included in the
> DT directly and are correct no matter what the tclk frequency is set to.
> 

Mmm... that sounds like a better aproach. I'll think about that.
Arnd Bergmann March 8, 2013, 7:42 p.m. UTC | #37
On Friday 08 March 2013, Jason Gunthorpe wrote:
> Right, so taking these two paragraphs together - if you have an absent
> ranges you are imagining that the mbus driver would write out a new
> fully populated ranges to the DTB? That would be necessary to keep
> of_address_translate working.
> 
> If so we are talking about exactly the same thing - the in memory DTB
> would still have working address translation.
> 
> An absent ranges makes some sense from a spec perspective - however do we
> have the kernel infrastructure to grow an in memory DTB node?

I don't think it's necessary to amend the flattened DT in the DTB form,
although that could be done if necessary. Once we have the live DT data
structures, after calling unflatten_device_tree(), we can trivially call
of_add_property() to add the ranges.

I think the nicest boot sequence here would be to have an early driver
bind of the MBUS, which assigns all the windows, adds the ranges property
and then calls of_platform_populate to create all the child platform
devices from the tree. If we need any of the MBUS children before
initcalls, we can also move that stage earlier and set up the mbus
windows from the machine_init callback.

	Arnd
Jason Gunthorpe March 8, 2013, 7:47 p.m. UTC | #38
On Fri, Mar 08, 2013 at 11:48:04AM -0500, Jason Cooper wrote:
 
> Thanks to JasonG and Arnd for clearing up my misunderstandings.  The
> only thing I would really like to see at this point is a clear,
> thought-through migration path from Thomas' patches to JasonG's
> proposal.  Obviously, it would be nice if the proposer took a crack at
> this. ;-)

The patch to add the top level ranges is a bit major because it
restructures the DT to be organized by mbus target. It also would
depend on Steven's preprocesor patchset (for our own sanity).

To my mind this is best done after Thomas's work goes in, because it
is a complex change to the DT and has dependencies.

Eg kirkwood.dtsi would change to place, as best as possible, mbus
targets directly below the mbus top level. Here is a sample that shows
most of the main points:

/* MAPDEF is a 2 dw value, top DW encodes the target id, bottom dword is
   usally 0. The special target id of '0' is no target/no window. */
#define MAPDEF(x,y,z) ((x << 16) | (y << 8) | z) 0
#define MAPDEF_INTERNAL MAPDEF(...)
[..]
#define PCI_MMIO_APERTURE 0xe0000000
#define PCI_IO_APERTURE 0xe8000000

mbus {
  ranges = <MAPDEF_INTERNAL 0xf1000000 0x100000
            MAPDEF_NAND     0xf4000000 0x10000
            MAPDEF_CRYPTO   0xf5000000 0x10000

 	    /* These two are *not* mbus windows but are required to
   	       model the PCI aperture abstraction. Windows are
	       allocated within these regions dynamically
               as neeed. */
            0 PCI_MMIO_APERTURE PCI_MMIO_APERTURE 0x08000000
            0 PCI_IO_APERTURE PCI_MMIO_APERTURE 0x00100000>
  #address-cells = <2>;
  [..]

  // Internal regs special target
  internal_regs@f1000000 {
       compatible = "simple-bus";
       ranges = <0x00000000 MAPDEF_INTERNAL 0x100000>;
       #address-cells = <1>;

       serial@12000 {
              compatible = "ns16550a";
              reg = <0x12000 0x100>;
              reg-shift = <2>;
              interrupts = <33>;
        };
       [..]
  }

  // All the PCI-E targets
  pcie-controller {
     compatible = "marvell,armada-370-pcie";
     reg = <MAPDEF_INTERNAL + 0x40000 0x2000>,
           <MAPDEF_INTERNAL + 0x80000 0x2000>;
     ranges = <0x82000000 0 PCI_MMIO_APERTURE PCI_MMIO_APERTURE 0 0x08000000   /* non-prefetchable memory */
               0x81000000 0 0                 PCI_IO_APERTURE 0  0x00100000>; /* downstream I/O */
     [..]
     pcie@0,0 {
          reg = <0x0800 0 0 0 0>;
          marvell,mbus-targets = <MAPDEF_PCIE0_MEM MAPDEF_PCIE0_IO>;
     }
  }

  // Target id 1,0x2f
  nand@0xf4000000 {
          #address-cells = <1>;
          #size-cells = <1>;
          cle = <0>;
          ale = <1>;
          bank-width = <1>;
          compatible = "marvell,orion-nand";
          reg = <MAPDEF_NAND 0x400>;
  };

  // Target id 0x3,0x1
  crypto@f5000000 {
          reg = <MAPDEF_INTERNAL + 0x30000 0x10000>,
                <MAPDEF_CRYPTO 0x800>;
  }
}

Which then results in mirroring changes to all the including dts:

mbus {
        internal_regs@f1000000 {
                timer@20300 {
                        clock-frequency = <166666667>;
                };
}

A word on the PCI aperture: It is possible to directly show the mbus
windows for PCI in the DT, and rely on runtime updating to keep
everything in sync with the dynamic choices made by the PCI driver,
but there is no point in doing all this work since we have no use for
this information within the DT. A simple CPU address range 'aperture'
to indicate where we want to place the PCI region is all that is
needed. If the region is someday selected dynamically then the two
ranges would be updated at runtime.

I think I can prepare a patch for this if we reach agreement, but I
can only test it on Kirkwood today.. Also, I'm guessing there is some
opinions on what should be after the @.. ?

From there..

Future patches would have the mbus driver read the top level ranges at
boot and program those windows. Then we remove the code in the board
specific files that does the same.

For kirwood, move the PCI specific window setup code into the non-dt
pcie driver, and then we are ready to use Thomas's DT driver on
kirkwood.

When the non-DT code is removed many of the the tables in the mbus
driver can then be purged.

Future patches could support fully dyanmic window address assignment,
but I'm honestly having a hard time seeing a use case for this..

Jason
Jason Gunthorpe March 8, 2013, 8:05 p.m. UTC | #39
On Fri, Mar 08, 2013 at 07:42:27PM +0000, Arnd Bergmann wrote:
> > An absent ranges makes some sense from a spec perspective - however do we
> > have the kernel infrastructure to grow an in memory DTB node?
> 
> I don't think it's necessary to amend the flattened DT in the DTB form,
> although that could be done if necessary. Once we have the live DT data
> structures, after calling unflatten_device_tree(), we can trivially call
> of_add_property() to add the ranges.

Oh, I've never looked at those APIs, that makes it pretty nice,
indeed. With those it would be easy to keep the in-memory DT in sync.

The only other problem with absent ranges is how do you learn the
target id and size to do the assignment?

> devices from the tree. If we need any of the MBUS children before
> initcalls, we can also move that stage earlier and set up the mbus
> windows from the machine_init callback.

Yah, technically the interrupt controller and timer should be below
the mbus block.

Cheers,
Jason
Arnd Bergmann March 8, 2013, 10:46 p.m. UTC | #40
On Friday 08 March 2013, Jason Gunthorpe wrote:
> 
> On Fri, Mar 08, 2013 at 07:42:27PM +0000, Arnd Bergmann wrote:
> > > An absent ranges makes some sense from a spec perspective - however do we
> > > have the kernel infrastructure to grow an in memory DTB node?
> > 
> > I don't think it's necessary to amend the flattened DT in the DTB form,
> > although that could be done if necessary. Once we have the live DT data
> > structures, after calling unflatten_device_tree(), we can trivially call
> > of_add_property() to add the ranges.
> 
> Oh, I've never looked at those APIs, that makes it pretty nice,
> indeed. With those it would be easy to keep the in-memory DT in sync.
> 
> The only other problem with absent ranges is how do you learn the
> target id and size to do the assignment?

You just need to look at all "reg" properties of direct children
that are not status="disabled". Those properties are already in 
the 64-bit address space co ntaining the target ID, and they have
the size in them, at least for those where the size is known at
boot time, probably all but PCIe, which can get the rest.

The algorithm should probably be similar to what we do for PCI
resource assignment: collect all resources we need, sort them
by size, and fill the logical bus address space starting with
the largest one so we don't leave holes.

	Arnd
Arnd Bergmann March 8, 2013, 10:54 p.m. UTC | #41
On Friday 08 March 2013, Jason Gunthorpe wrote:
> Future patches could support fully dyanmic window address assignment,
> but I'm honestly having a hard time seeing a use case for this..

The main advantage I can see for going fully dynamic is that we can use
a larger aperture and more windows for PCIe on systems that don't use
all of the other windows from the ${SOC}.dtsi file. If we don't have
a NAND chip connected, the ${BOARD}.dts file can easily mark the nand
controller node as status="disabled" and we don't have to give it a
physical address.

It's not a strong reason, but I think it's nice and clean to assign
the windows automatically and exactly for the hardware that we are
going to use.

	Arnd
Arnd Bergmann March 8, 2013, 10:58 p.m. UTC | #42
On Friday 08 March 2013, Jason Gunthorpe wrote:
> > At this point, I have absolutely no idea what direction to take to
> > bring this further. I'm basically in a dead-end.
> 
> I think Jason C. has said he is fine with bringing in the MBUS driver as
> is.
> 
> The PCI-E driver's DT binding looks great to me, I have no further
> comments at all, and I believe everyone else's comments related to DT
> are delt with as well.
> 
> The huge discussion about the SW PCI-E bridge seems to be concluded
> and everyone is agreeing on that point too.
> 
> Soo, I think you are not at all at a dead end, but nearly finished :)

I agree, I think the has been going really well and we are just fighting
over a few details here. I was giving you all a hard time about the PCIe
driver, but that was actually less about the binding than about the
question whether to emulate P2P bridges or not. I still don't think
it's a good solution, but that is because of the hardware shortcomings
and all the alternatives are worse.

In case of the MBUS driver, I don't think it's a hard problem at all
(compared to the complexity of the PCIe driver and binding), we just
have to agree to do the right thing here.

	Arnd
Thomas Petazzoni March 18, 2013, 4:27 p.m. UTC | #43
Dear Jason Gunthorpe,

On Fri, 8 Mar 2013 10:29:50 -0700, Jason Gunthorpe wrote:

> > Sorry, but we _do_ want the kernel to be able to set those timings
> > parameters, and therefore, a Device Bus driver will be needed,
> > regardless of whether it creates the address window or not.
> 
> See the very good idea from Maxime Bizon about autoprobing the ONFI
> timings directly from the NAND chip.

ONFI (Open NAND Firmware Interface) is about NAND, while the Device Bus
driver submitted by Ezequiel is about NOR. I don't think NORs have a
standard way of exposing their memory timings requirements. And the
Device Bus driver is not only for NORs, but also for example for FPGAs
connected on the memory bus.

Ezequiel followed the work done by Jon Hunter on the GPMC driver for
the memory bus of OMAP SoCs, which includes the definition of timings
in the Device Tree. To me, it seems like defining the timings in the
Device Tree do make sense, and "ONFI" is not a good enough answer to
get rid of the Device Bus driver submitted by Ezequiel.

Best regards,

Thomas
Jason Gunthorpe March 18, 2013, 5:18 p.m. UTC | #44
On Mon, Mar 18, 2013 at 05:27:18PM +0100, Thomas Petazzoni wrote:
> Dear Jason Gunthorpe,
> 
> On Fri, 8 Mar 2013 10:29:50 -0700, Jason Gunthorpe wrote:
> 
> > > Sorry, but we _do_ want the kernel to be able to set those timings
> > > parameters, and therefore, a Device Bus driver will be needed,
> > > regardless of whether it creates the address window or not.
> > 
> > See the very good idea from Maxime Bizon about autoprobing the ONFI
> > timings directly from the NAND chip.
> 
> ONFI (Open NAND Firmware Interface) is about NAND, while the Device Bus
> driver submitted by Ezequiel is about NOR. I don't think NORs have a
> standard way of exposing their memory timings requirements. And the
> Device Bus driver is not only for NORs, but also for example for FPGAs
> connected on the memory bus.

Right, AFAIK there is no CFI property for device timings, and there is
a lot of variation of optimal timings between pin-compatible
parts.

Some kind of out-of-band indication of the flash chip and speed grade
is necessary to get optimal timings, unfortunately - no idea how you'd
fit that sensibly into a DT framework.

> Ezequiel followed the work done by Jon Hunter on the GPMC driver for
> the memory bus of OMAP SoCs, which includes the definition of timings
> in the Device Tree. To me, it seems like defining the timings in the
> Device Tree do make sense, and "ONFI" is not a good enough answer to
> get rid of the Device Bus driver submitted by Ezequiel.

As I said, I think that is fine - there is nothing wrong with a driver
to set the bus timings. But, if there are no bus timings in the DT
then the driver should not be involved at all..

Similarly the driver should require all the timings, re-using timings
left in the register with timings from the DT seems very strange..

Regards,
Jason
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/arm/mvebu-mbus.txt b/Documentation/devicetree/bindings/arm/mvebu-mbus.txt
new file mode 100644
index 0000000..9f4220f
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/mvebu-mbus.txt
@@ -0,0 +1,39 @@ 
+MVEBU MBUS driver
+-----------------
+
+This is the Device Tree binding documentation for the mvebu-mbus
+driver, that takes care of configuration the CPU-side address decoding
+windows that allows to access the SDRAM and the devices. This driver
+is used on all Marvell EBU SoCs (Armada 370/XP, Dove, Kirkwood,
+Orion5x and MV78xx0).
+
+Required properties:
+
+- compatible: one of:
+  - marvell,armada370-mbus
+  - marvell,armadaxp-mbus
+  - marvell,kirkwood-mbus
+  - marvell,dove-mbus
+  - marvell,orion5x-88f5281-mbus
+  - marvell,orion5x-88f5182-mbus
+  - marvell,orion5x-88f5181-mbus
+  - marvell,orion5x-88f6183-mbus
+  - marvell,mv78xx0-mbus
+
+- reg: should contain two sets of register ranges.
+  - first range describing the registers used to configure the MBus
+    address decoding windows.
+  - second range describing the registers used to configure the SDRAM
+    address decoding windows.
+
+Example:
+
+ 	soc@f1000000 {
+		compatible = "marvell,dove-mbus", "simple-bus";
+		reg = <0xf10020000 0x80>, <0xf1800100 0x8>;
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		[... devices on mbus ...]
+
+	};
diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
index 0f51ed6..b05ecab 100644
--- a/drivers/bus/Kconfig
+++ b/drivers/bus/Kconfig
@@ -4,6 +4,13 @@ 
 
 menu "Bus devices"
 
+config MVEBU_MBUS
+	bool
+	depends on PLAT_ORION
+	help
+	  Driver needed for the MBus configuration on Marvell EBU SoCs
+	  (Kirkwood, Dove, Orion5x, MV78XX0 and Armada 370/XP).
+
 config OMAP_OCP2SCP
 	tristate "OMAP OCP2SCP DRIVER"
 	depends on ARCH_OMAP2PLUS
diff --git a/drivers/bus/Makefile b/drivers/bus/Makefile
index 45d997c..3c7b53c 100644
--- a/drivers/bus/Makefile
+++ b/drivers/bus/Makefile
@@ -2,6 +2,7 @@ 
 # Makefile for the bus drivers.
 #
 
+obj-$(CONFIG_MVEBU_MBUS) += mvebu-mbus.o
 obj-$(CONFIG_OMAP_OCP2SCP)	+= omap-ocp2scp.o
 
 # Interconnect bus driver for OMAP SoCs.
diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
new file mode 100644
index 0000000..1c6fbf4
--- /dev/null
+++ b/drivers/bus/mvebu-mbus.c
@@ -0,0 +1,821 @@ 
+/*
+ * Address map functions for Marvell EBU SoCs (Kirkwood, Armada
+ * 370/XP, Dove, Orion5x and MV78xx0)
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2.  This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/mbus.h>
+#include <linux/io.h>
+#include <linux/ioport.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/debugfs.h>
+
+/*
+ * DDR target is the same on all platforms.
+ */
+#define TARGET_DDR		0
+
+/*
+ * CPU Address Decode Windows registers
+ */
+#define WIN_CTRL_OFF		0x0000
+#define   WIN_CTRL_ENABLE       BIT(0)
+#define   WIN_CTRL_TGT_MASK     0xf0
+#define   WIN_CTRL_TGT_SHIFT    4
+#define   WIN_CTRL_ATTR_MASK    0xff00
+#define   WIN_CTRL_ATTR_SHIFT   8
+#define   WIN_CTRL_SIZE_MASK    0xffff0000
+#define   WIN_CTRL_SIZE_SHIFT   16
+#define WIN_BASE_OFF		0x0004
+#define   WIN_BASE_LOW          0xffff0000
+#define   WIN_BASE_HIGH         0xf
+#define WIN_REMAP_LO_OFF	0x0008
+#define   WIN_REMAP_LOW         0xffff0000
+#define WIN_REMAP_HI_OFF	0x000c
+
+#define ATTR_HW_COHERENCY	(0x1 << 4)
+
+#define DDR_BASE_CS_OFF(n)	(0x0000 + ((n) << 3))
+#define  DDR_BASE_CS_HIGH_MASK  0xf
+#define  DDR_BASE_CS_LOW_MASK   0xff000000
+#define DDR_SIZE_CS_OFF(n)	(0x0004 + ((n) << 3))
+#define  DDR_SIZE_ENABLED       BIT(0)
+#define  DDR_SIZE_CS_MASK       0x1c
+#define  DDR_SIZE_CS_SHIFT      2
+#define  DDR_SIZE_MASK          0xff000000
+
+#define DOVE_DDR_BASE_CS_OFF(n) ((n) << 4)
+
+struct mvebu_mbus_mapping {
+	const char *name;
+	u8 target;
+	u8 attr;
+	u8 attrmask;
+};
+
+/*
+ * Masks used for the 'attrmask' field of mvebu_mbus_mapping. They
+ * allow to get the real attribute value, discarding the special bits
+ * used to select a PCI MEM region or a PCI WA region. This allows the
+ * debugfs code to reverse-match the name of a device from its
+ * target/attr values.
+ *
+ * For all devices except PCI, all bits of 'attr' must be
+ * considered. For most SoCs, only bit 3 should be ignored (it allows
+ * to select between PCI MEM and PCI I/O). On Orion5x however, there
+ * is the special bit 5 to select a PCI WA region.
+ */
+#define MAPDEF_NOMASK       0xff
+#define MAPDEF_PCIMASK      0xf7
+#define MAPDEF_ORIONPCIMASK 0xd7
+
+/* Macro used to define one mvebu_mbus_mapping entry */
+#define MAPDEF(__n, __t, __a, __m) \
+	{ .name = __n, .target = __t, .attr = __a, .attrmask = __m }
+
+struct mvebu_mbus_state;
+
+struct mvebu_mbus_soc_data {
+	unsigned int num_wins;
+	unsigned int num_remappable_wins;
+	unsigned int (*win_cfg_offset)(const int win);
+	void (*setup_cpu_target)(struct mvebu_mbus_state *s);
+	const struct mvebu_mbus_mapping *map;
+};
+
+struct mvebu_mbus_state {
+	void __iomem *mbuswins_base;
+	void __iomem *sdramwins_base;
+	struct dentry *debugfs_root;
+	struct dentry *debugfs_sdram;
+	struct dentry *debugfs_devs;
+	const struct mvebu_mbus_soc_data *soc;
+	int hw_io_coherency;
+};
+
+static struct mvebu_mbus_state mbus_state;
+
+static struct mbus_dram_target_info mvebu_mbus_dram_info;
+const struct mbus_dram_target_info *mv_mbus_dram_info(void)
+{
+	return &mvebu_mbus_dram_info;
+}
+EXPORT_SYMBOL_GPL(mv_mbus_dram_info);
+
+/*
+ * Functions to manipulate the address decoding windows
+ */
+
+static void mvebu_mbus_read_window(struct mvebu_mbus_state *mbus,
+				   int win, int *enabled, u64 *base,
+				   u32 *size, u8 *target, u8 *attr,
+				   u64 *remap)
+{
+	void __iomem *addr = mbus->mbuswins_base +
+		mbus->soc->win_cfg_offset(win);
+	u32 basereg = readl(addr + WIN_BASE_OFF);
+	u32 ctrlreg = readl(addr + WIN_CTRL_OFF);
+
+	if (!(ctrlreg & WIN_CTRL_ENABLE)) {
+		*enabled = 0;
+		return;
+	}
+
+	*enabled = 1;
+	*base = ((u64)basereg & WIN_BASE_HIGH) << 32;
+	*base |= (basereg & WIN_BASE_LOW);
+	*size = (ctrlreg | ~WIN_CTRL_SIZE_MASK) + 1;
+
+	if (target)
+		*target = (ctrlreg & WIN_CTRL_TGT_MASK) >> WIN_CTRL_TGT_SHIFT;
+
+	if (attr)
+		*attr = (ctrlreg & WIN_CTRL_ATTR_MASK) >> WIN_CTRL_ATTR_SHIFT;
+
+	if (remap) {
+		if (win < mbus->soc->num_remappable_wins) {
+			u32 remap_low = readl(addr + WIN_REMAP_LO_OFF);
+			u32 remap_hi  = readl(addr + WIN_REMAP_HI_OFF);
+			*remap = ((u64)remap_hi << 32) | remap_low;
+		} else
+			*remap = 0;
+	}
+}
+
+static void mvebu_mbus_disable_window(struct mvebu_mbus_state *mbus,
+				      int win)
+{
+	void __iomem *addr;
+
+	addr = mbus->mbuswins_base + mbus->soc->win_cfg_offset(win);
+
+	writel(0, addr + WIN_BASE_OFF);
+	writel(0, addr + WIN_CTRL_OFF);
+	if (win < mbus->soc->num_remappable_wins) {
+		writel(0, addr + WIN_REMAP_LO_OFF);
+		writel(0, addr + WIN_REMAP_HI_OFF);
+	}
+}
+
+/* Checks whether the given window number is available */
+static int mvebu_mbus_window_is_free(struct mvebu_mbus_state *mbus,
+				     const int win)
+{
+	void __iomem *addr = mbus->mbuswins_base +
+		mbus->soc->win_cfg_offset(win);
+	u32 ctrl = readl(addr + WIN_CTRL_OFF);
+	return !(ctrl & WIN_CTRL_ENABLE);
+}
+
+/*
+ * Checks whether the given (base, base+size) area doesn't overlap an
+ * existing region
+ */
+static int mvebu_mbus_window_conflicts(struct mvebu_mbus_state *mbus,
+				       phys_addr_t base, size_t size,
+				       u8 target, u8 attr)
+{
+	u64 end = (u64)base + size;
+	int win;
+
+	for (win = 0; win < mbus->soc->num_wins; win++) {
+		u64 wbase, wend;
+		u32 wsize;
+		u8 wtarget, wattr;
+		int enabled;
+
+		mvebu_mbus_read_window(mbus, win,
+				       &enabled, &wbase, &wsize,
+				       &wtarget, &wattr, NULL);
+
+		if (!enabled)
+			continue;
+
+		wend = wbase + wsize;
+
+		/*
+		 * Check if the current window overlaps with the
+		 * proposed physical range
+		 */
+		if ((u64)base < wend && end > wbase)
+			return 0;
+
+		/*
+		 * Check if target/attribute conflicts
+		 */
+		if (target == wtarget && attr == wattr)
+			return 0;
+	}
+
+	return 1;
+}
+
+static int mvebu_mbus_find_window(struct mvebu_mbus_state *mbus,
+				  phys_addr_t base, size_t size)
+{
+	int win;
+
+	for (win = 0; win < mbus->soc->num_wins; win++) {
+		u64 wbase;
+		u32 wsize;
+		int enabled;
+
+		mvebu_mbus_read_window(mbus, win,
+				       &enabled, &wbase, &wsize,
+				       NULL, NULL, NULL);
+
+		if (!enabled)
+			continue;
+
+		if (base == wbase && size == wsize)
+			return win;
+	}
+
+	return -ENODEV;
+}
+
+static int mvebu_mbus_setup_window(struct mvebu_mbus_state *mbus,
+				   int win, phys_addr_t base, size_t size,
+				   phys_addr_t remap, u8 target,
+				   u8 attr)
+{
+	void __iomem *addr = mbus->mbuswins_base +
+		mbus->soc->win_cfg_offset(win);
+	u32 ctrl, remap_addr;
+
+	ctrl = ((size - 1) & WIN_CTRL_SIZE_MASK) |
+		(attr << WIN_CTRL_ATTR_SHIFT)    |
+		(target << WIN_CTRL_TGT_SHIFT)   |
+		WIN_CTRL_ENABLE;
+
+	writel(base & WIN_BASE_LOW, addr + WIN_BASE_OFF);
+	writel(ctrl, addr + WIN_CTRL_OFF);
+	if (win < mbus->soc->num_remappable_wins) {
+		if (remap == MVEBU_MBUS_NO_REMAP)
+			remap_addr = base;
+		else
+			remap_addr = remap;
+		writel(remap_addr & WIN_REMAP_LOW, addr + WIN_REMAP_LO_OFF);
+		writel(0, addr + WIN_REMAP_HI_OFF);
+	}
+
+	return 0;
+}
+
+static int mvebu_mbus_alloc_window(struct mvebu_mbus_state *mbus,
+				   phys_addr_t base, size_t size,
+				   phys_addr_t remap, u8 target,
+				   u8 attr)
+{
+	int win;
+
+	if (remap == MVEBU_MBUS_NO_REMAP) {
+		for (win = mbus->soc->num_remappable_wins;
+		     win < mbus->soc->num_wins; win++)
+			if (mvebu_mbus_window_is_free(mbus, win))
+				return mvebu_mbus_setup_window(mbus, win, base,
+							       size, remap,
+							       target, attr);
+	}
+
+
+	for (win = 0; win < mbus->soc->num_wins; win++)
+		if (mvebu_mbus_window_is_free(mbus, win))
+			return mvebu_mbus_setup_window(mbus, win, base, size,
+						       remap, target, attr);
+
+	return -ENOMEM;
+}
+
+/*
+ * Debugfs debugging
+ */
+
+static int sdram_debug_show(struct seq_file *seq, void *v)
+{
+	struct mvebu_mbus_state *mbus = &mbus_state;
+	int i;
+
+	for (i = 0; i < 4; i++) {
+		u32 basereg = readl(mbus->sdramwins_base + DDR_BASE_CS_OFF(i));
+		u32 sizereg = readl(mbus->sdramwins_base + DDR_SIZE_CS_OFF(i));
+		u64 base;
+		u32 size;
+
+		if (!(sizereg & DDR_SIZE_ENABLED)) {
+			seq_printf(seq, "[%d] disabled\n", i);
+			continue;
+		}
+
+		base = ((u64)basereg & DDR_BASE_CS_HIGH_MASK) << 32;
+		base |= basereg & DDR_BASE_CS_LOW_MASK;
+		size = sizereg & DDR_SIZE_MASK;
+
+		seq_printf(seq, "[%d] %016llx - 0x%016llx : cs%d\n",
+			   i, (unsigned long long)base,
+			   (unsigned long long)base + size,
+			   (size & DDR_SIZE_CS_MASK) >> DDR_SIZE_CS_SHIFT);
+	}
+
+	return 0;
+}
+
+static int sdram_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, sdram_debug_show, inode->i_private);
+}
+
+static const struct file_operations sdram_debug_fops = {
+	.open = sdram_debug_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static int devs_debug_show(struct seq_file *seq, void *v)
+{
+	struct mvebu_mbus_state *mbus = &mbus_state;
+	int win;
+
+	for (win = 0; win < mbus->soc->num_wins; win++) {
+		u64 wbase, wremap;
+		u32 wsize;
+		u8 wtarget, wattr;
+		int enabled, i;
+		const char *name;
+
+		mvebu_mbus_read_window(mbus, win,
+				       &enabled, &wbase, &wsize,
+				       &wtarget, &wattr, &wremap);
+
+		if (!enabled) {
+			seq_printf(seq, "[%02d] disabled\n", win);
+			continue;
+		}
+
+
+		for (i = 0; mbus->soc->map[i].name; i++)
+			if (mbus->soc->map[i].target == wtarget &&
+			    mbus->soc->map[i].attr ==
+			    (wattr & mbus->soc->map[i].attrmask))
+				break;
+
+		name = mbus->soc->map[i].name ?: "unknown";
+
+		seq_printf(seq, "[%02d] %016llx - %016llx : %s",
+			   win, (unsigned long long)wbase,
+			   (unsigned long long)(wbase + wsize), name);
+
+		if (win < mbus->soc->num_remappable_wins) {
+			seq_printf(seq, " (remap %016llx)\n",
+				   (unsigned long long)wremap);
+		} else
+			seq_printf(seq, "\n");
+	}
+
+	return 0;
+}
+
+static int devs_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, devs_debug_show, inode->i_private);
+}
+
+static const struct file_operations devs_debug_fops = {
+	.open = devs_debug_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+/*
+ * SoC-specific functions and definitions
+ */
+
+static unsigned int orion_mbus_win_offset(int win)
+{
+	return win << 4;
+}
+
+static unsigned int armada_370_xp_mbus_win_offset(int win)
+{
+	/* The register layout is a bit annoying and the below code
+	 * tries to cope with it.
+	 * - At offset 0x0, there are the registers for the first 8
+	 *   windows, with 4 registers of 32 bits per window (ctrl,
+	 *   base, remap low, remap high)
+	 * - Then at offset 0x80, there is a hole of 0x10 bytes for
+	 *   the internal registers base address and internal units
+	 *   sync barrier register.
+	 * - Then at offset 0x90, there the registers for 12
+	 *   windows, with only 2 registers of 32 bits per window
+	 *   (ctrl, base).
+	 */
+	if (win < 8)
+		return win << 4;
+	else
+		return 0x90 + ((win - 8) << 3);
+}
+
+static unsigned int mv78xx0_mbus_win_offset(int win)
+{
+	if (win < 8)
+		return win << 4;
+	else
+		return 0x900 + ((win - 8) << 4);
+}
+
+static void __init
+mvebu_mbus_default_setup_cpu_target(struct mvebu_mbus_state *mbus)
+{
+	int i;
+	int cs;
+
+	mvebu_mbus_dram_info.mbus_dram_target_id = TARGET_DDR;
+
+	for (i = 0, cs = 0; i < 4; i++) {
+		u32 base = readl(mbus->sdramwins_base + DDR_BASE_CS_OFF(i));
+		u32 size = readl(mbus->sdramwins_base + DDR_SIZE_CS_OFF(i));
+
+		/*
+		 * We only take care of entries for which the chip
+		 * select is enabled, and that don't have high base
+		 * address bits set (devices can only access the first
+		 * 32 bits of the memory).
+		 */
+		if ((size & DDR_SIZE_ENABLED) &&
+		    !(base & DDR_BASE_CS_HIGH_MASK)) {
+			struct mbus_dram_window *w;
+
+			w = &mvebu_mbus_dram_info.cs[cs++];
+			w->cs_index = i;
+			w->mbus_attr = 0xf & ~(1 << i);
+			if (mbus->hw_io_coherency)
+				w->mbus_attr |= ATTR_HW_COHERENCY;
+			w->base = base & DDR_BASE_CS_LOW_MASK;
+			w->size = (size | ~DDR_SIZE_MASK) + 1;
+		}
+	}
+	mvebu_mbus_dram_info.num_cs = cs;
+}
+
+static void __init
+mvebu_mbus_dove_setup_cpu_target(struct mvebu_mbus_state *mbus)
+{
+	int i;
+	int cs;
+
+	mvebu_mbus_dram_info.mbus_dram_target_id = TARGET_DDR;
+
+	for (i = 0, cs = 0; i < 2; i++) {
+		u32 map = readl(mbus->sdramwins_base + DOVE_DDR_BASE_CS_OFF(i));
+
+		/*
+		 * Chip select enabled?
+		 */
+		if (map & 1) {
+			struct mbus_dram_window *w;
+
+			w = &mvebu_mbus_dram_info.cs[cs++];
+			w->cs_index = i;
+			w->mbus_attr = 0; /* CS address decoding done inside */
+					  /* the DDR controller, no need to  */
+					  /* provide attributes */
+			w->base = map & 0xff800000;
+			w->size = 0x100000 << (((map & 0x000f0000) >> 16) - 4);
+		}
+	}
+
+	mvebu_mbus_dram_info.num_cs = cs;
+}
+
+static const struct mvebu_mbus_mapping armada_370_map[] = {
+	MAPDEF("bootrom",     1, 0xe0, MAPDEF_NOMASK),
+	MAPDEF("devbus-boot", 1, 0x2f, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs0",  1, 0x3e, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs1",  1, 0x3d, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs2",  1, 0x3b, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs3",  1, 0x37, MAPDEF_NOMASK),
+	MAPDEF("pcie0.0",     4, 0xe0, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.0",     8, 0xe0, MAPDEF_PCIMASK),
+	{},
+};
+
+static const struct mvebu_mbus_soc_data armada_370_mbus_data = {
+	.num_wins            = 20,
+	.num_remappable_wins = 8,
+	.win_cfg_offset      = armada_370_xp_mbus_win_offset,
+	.setup_cpu_target    = mvebu_mbus_default_setup_cpu_target,
+	.map                 = armada_370_map,
+};
+
+static const struct mvebu_mbus_mapping armada_xp_map[] = {
+	MAPDEF("bootrom",     1, 0x1d, MAPDEF_NOMASK),
+	MAPDEF("devbus-boot", 1, 0x2f, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs0",  1, 0x3e, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs1",  1, 0x3d, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs2",  1, 0x3b, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs3",  1, 0x37, MAPDEF_NOMASK),
+	MAPDEF("pcie0.0",     4, 0xe0, MAPDEF_PCIMASK),
+	MAPDEF("pcie0.1",     4, 0xd0, MAPDEF_PCIMASK),
+	MAPDEF("pcie0.2",     4, 0xb0, MAPDEF_PCIMASK),
+	MAPDEF("pcie0.3",     4, 0x70, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.0",     8, 0xe0, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.1",     8, 0xd0, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.2",     8, 0xb0, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.3",     8, 0x70, MAPDEF_PCIMASK),
+	MAPDEF("pcie2.0",     4, 0xf0, MAPDEF_PCIMASK),
+	MAPDEF("pcie3.0",     8, 0xf0, MAPDEF_PCIMASK),
+	{},
+};
+
+static const struct mvebu_mbus_soc_data armada_xp_mbus_data = {
+	.num_wins            = 20,
+	.num_remappable_wins = 8,
+	.win_cfg_offset      = armada_370_xp_mbus_win_offset,
+	.setup_cpu_target    = mvebu_mbus_default_setup_cpu_target,
+	.map                 = armada_xp_map,
+};
+
+static const struct mvebu_mbus_mapping kirkwood_map[] = {
+	MAPDEF("pcie0.0", 4, 0xe0, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.0", 8, 0xe0, MAPDEF_PCIMASK),
+	MAPDEF("sram",    3, 0x01, MAPDEF_NOMASK),
+	MAPDEF("nand",    1, 0x2f, MAPDEF_NOMASK),
+	{},
+};
+
+static const struct mvebu_mbus_soc_data kirkwood_mbus_data = {
+	.num_wins            = 8,
+	.num_remappable_wins = 4,
+	.win_cfg_offset      = orion_mbus_win_offset,
+	.setup_cpu_target    = mvebu_mbus_default_setup_cpu_target,
+	.map                 = kirkwood_map,
+};
+
+static const struct mvebu_mbus_mapping dove_map[] = {
+	MAPDEF("pcie0.0",    0x4, 0xe0, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.0",    0x8, 0xe0, MAPDEF_PCIMASK),
+	MAPDEF("cesa",       0x3, 0x01, MAPDEF_NOMASK),
+	MAPDEF("bootrom",    0x1, 0xfd, MAPDEF_NOMASK),
+	MAPDEF("scratchpad", 0xd, 0x0, MAPDEF_NOMASK),
+	{},
+};
+
+static const struct mvebu_mbus_soc_data dove_mbus_data = {
+	.num_wins            = 8,
+	.num_remappable_wins = 4,
+	.win_cfg_offset      = orion_mbus_win_offset,
+	.setup_cpu_target    = mvebu_mbus_dove_setup_cpu_target,
+	.map                 = dove_map,
+};
+
+static const struct mvebu_mbus_mapping orion5x_map[] = {
+	MAPDEF("pcie0.0",     4, 0x51, MAPDEF_ORIONPCIMASK),
+	MAPDEF("pci0.0",      3, 0x51, MAPDEF_ORIONPCIMASK),
+	MAPDEF("devbus-boot", 1, 0x0f, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs0",  1, 0x1e, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs1",  1, 0x1d, MAPDEF_NOMASK),
+	MAPDEF("devbus-cs2",  1, 0x1b, MAPDEF_NOMASK),
+	MAPDEF("sram",        0, 0x00, MAPDEF_NOMASK),
+	{},
+};
+
+/*
+ * Some variants of Orion5x have 4 remappable windows, some other have
+ * only two of them.
+ */
+static const struct mvebu_mbus_soc_data orion5x_4win_mbus_data = {
+	.num_wins            = 8,
+	.num_remappable_wins = 4,
+	.win_cfg_offset      = orion_mbus_win_offset,
+	.setup_cpu_target    = mvebu_mbus_default_setup_cpu_target,
+	.map                 = orion5x_map,
+};
+
+static const struct mvebu_mbus_soc_data orion5x_2win_mbus_data = {
+	.num_wins            = 8,
+	.num_remappable_wins = 2,
+	.win_cfg_offset      = orion_mbus_win_offset,
+	.setup_cpu_target    = mvebu_mbus_default_setup_cpu_target,
+	.map                 = orion5x_map,
+};
+
+static const struct mvebu_mbus_mapping mv78xx0_map[] = {
+	MAPDEF("pcie0.0", 4, 0xe0, MAPDEF_PCIMASK),
+	MAPDEF("pcie0.1", 4, 0xd0, MAPDEF_PCIMASK),
+	MAPDEF("pcie0.2", 4, 0xb0, MAPDEF_PCIMASK),
+	MAPDEF("pcie0.3", 4, 0x70, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.0", 8, 0xe0, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.1", 8, 0xd0, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.2", 8, 0xb0, MAPDEF_PCIMASK),
+	MAPDEF("pcie1.3", 8, 0x70, MAPDEF_PCIMASK),
+	MAPDEF("pcie2.0", 4, 0xf0, MAPDEF_PCIMASK),
+	MAPDEF("pcie3.0", 8, 0xf0, MAPDEF_PCIMASK),
+	{},
+};
+
+static const struct mvebu_mbus_soc_data mv78xx0_mbus_data = {
+	.num_wins            = 14,
+	.num_remappable_wins = 8,
+	.win_cfg_offset      = mv78xx0_mbus_win_offset,
+	.setup_cpu_target    = mvebu_mbus_default_setup_cpu_target,
+	.map                 = mv78xx0_map,
+};
+
+static const struct of_device_id of_mvebu_mbus_ids[] = {
+	{ .compatible = "marvell,armada370-mbus",
+	  .data = &armada_370_mbus_data, },
+	{ .compatible = "marvell,armadaxp-mbus",
+	  .data = &armada_xp_mbus_data, },
+	{ .compatible = "marvell,kirkwood-mbus",
+	  .data = &kirkwood_mbus_data, },
+	{ .compatible = "marvell,dove-mbus",
+	  .data = &dove_mbus_data, },
+	{ .compatible = "marvell,orion5x-88f5281-mbus",
+	  .data = &orion5x_4win_mbus_data, },
+	{ .compatible = "marvell,orion5x-88f5182-mbus",
+	  .data = &orion5x_2win_mbus_data, },
+	{ .compatible = "marvell,orion5x-88f5181-mbus",
+	  .data = &orion5x_2win_mbus_data, },
+	{ .compatible = "marvell,orion5x-88f6183-mbus",
+	  .data = &orion5x_4win_mbus_data, },
+	{ .compatible = "marvell,mv78xx0-mbus",
+	  .data = &mv78xx0_mbus_data, },
+	{ },
+};
+
+/*
+ * Public API of the driver
+ */
+int mvebu_mbus_add_window_remap_flags(const char *devname, phys_addr_t base,
+				      size_t size, phys_addr_t remap,
+				      unsigned int flags)
+{
+	struct mvebu_mbus_state *s = &mbus_state;
+	u8 target, attr;
+	int i;
+
+	if (!s->soc->map)
+		return -ENODEV;
+
+	for (i = 0; s->soc->map[i].name; i++)
+		if (!strcmp(s->soc->map[i].name, devname))
+			break;
+
+	if (!s->soc->map[i].name) {
+		pr_err("mvebu-mbus: unknown device '%s'\n", devname);
+		return -ENODEV;
+	}
+
+	target = s->soc->map[i].target;
+	attr   = s->soc->map[i].attr;
+
+	if (flags == MVEBU_MBUS_PCI_MEM)
+		attr |= 0x8;
+	else if (flags == MVEBU_MBUS_PCI_WA)
+		attr |= 0x28;
+
+	if (!mvebu_mbus_window_conflicts(s, base, size, target, attr)) {
+		pr_err("mvebu-mbus: cannot add window '%s', conflicts with another window\n",
+		       devname);
+		return -EINVAL;
+	}
+
+	return mvebu_mbus_alloc_window(s, base, size, remap, target, attr);
+
+}
+
+int mvebu_mbus_add_window(const char *devname, phys_addr_t base, size_t size)
+{
+	return mvebu_mbus_add_window_remap_flags(devname, base, size,
+						 MVEBU_MBUS_NO_REMAP, 0);
+}
+
+int mvebu_mbus_del_window(phys_addr_t base, size_t size)
+{
+	int win;
+
+	win = mvebu_mbus_find_window(&mbus_state, base, size);
+	if (win < 0)
+		return win;
+
+	mvebu_mbus_disable_window(&mbus_state, win);
+	return 0;
+}
+
+static __init int mvebu_mbus_debugfs_init(void)
+{
+	struct mvebu_mbus_state *s = &mbus_state;
+
+	/*
+	 * If no base has been initialized, doesn't make sense to
+	 * register the debugfs entries. We may be on a multiplatform
+	 * kernel that isn't running a Marvell EBU SoC.
+	 */
+	if (!s->mbuswins_base)
+		return 0;
+
+	s->debugfs_root = debugfs_create_dir("mvebu-mbus", NULL);
+	if (s->debugfs_root) {
+		s->debugfs_sdram = debugfs_create_file("sdram", S_IRUGO,
+						       s->debugfs_root, NULL,
+						       &sdram_debug_fops);
+		s->debugfs_devs = debugfs_create_file("devices", S_IRUGO,
+						      s->debugfs_root, NULL,
+						      &devs_debug_fops);
+	}
+
+	return 0;
+}
+fs_initcall(mvebu_mbus_debugfs_init);
+
+static int __init mvebu_mbus_common_init(struct mvebu_mbus_state *mbus,
+					 phys_addr_t mbuswins_base,
+					 size_t mbuswins_size,
+					 phys_addr_t sdramwins_base,
+					 size_t sdramwins_size)
+{
+	int win;
+
+	mbus->mbuswins_base = ioremap(mbuswins_base, mbuswins_size);
+	if (!mbus->mbuswins_base)
+		return -ENOMEM;
+
+	mbus->sdramwins_base = ioremap(sdramwins_base, sdramwins_size);
+	if (!mbus->sdramwins_base) {
+		iounmap(mbus_state.mbuswins_base);
+		return -ENOMEM;
+	}
+
+	for (win = 0; win < mbus->soc->num_wins; win++)
+		mvebu_mbus_disable_window(mbus, win);
+
+	mbus_state.soc->setup_cpu_target(mbus);
+
+	return 0;
+}
+
+int __init mvebu_mbus_init(const char *soc, phys_addr_t mbuswins_phys_base,
+			   size_t mbuswins_size,
+			   phys_addr_t sdramwins_phys_base,
+			   size_t sdramwins_size)
+{
+	const struct of_device_id *of_id;
+
+	for (of_id = of_mvebu_mbus_ids; of_id->compatible; of_id++)
+		if (!strcmp(of_id->compatible, soc))
+			break;
+
+	if (!of_id->compatible) {
+		pr_err("mvebu-mbus: could not find a matching SoC family\n");
+		return -ENODEV;
+	}
+
+	mbus_state.soc = of_id->data;
+
+	return mvebu_mbus_common_init(&mbus_state,
+				      mbuswins_phys_base, mbuswins_size,
+				      sdramwins_phys_base, sdramwins_size);
+}
+
+#ifdef CONFIG_OF
+int __init mvebu_mbus_dt_init(void)
+{
+	struct resource mbuswins_res, sdramwins_res;
+	struct device_node *np;
+	const struct of_device_id *of_id;
+
+	np = of_find_matching_node(NULL, of_mvebu_mbus_ids);
+	if (!np) {
+		pr_err("mvebu-mbus: could not find a matching SoC family\n");
+		return -ENODEV;
+	}
+
+	of_id = of_match_node(of_mvebu_mbus_ids, np);
+	mbus_state.soc = of_id->data;
+
+	if (of_address_to_resource(np, 0, &mbuswins_res)) {
+		pr_err("%s: cannot get MBUS register address\n", __func__);
+		return -EINVAL;
+	}
+
+	if (of_address_to_resource(np, 1, &sdramwins_res)) {
+		pr_err("%s: cannot get SDRAM register address\n", __func__);
+		return -EINVAL;
+	}
+
+	return mvebu_mbus_common_init(&mbus_state,
+				      mbuswins_res.start,
+				      resource_size(&mbuswins_res),
+				      sdramwins_res.start,
+				      resource_size(&sdramwins_res));
+}
+#endif
diff --git a/include/linux/mbus.h b/include/linux/mbus.h
index efa1a6d..1f80425 100644
--- a/include/linux/mbus.h
+++ b/include/linux/mbus.h
@@ -32,6 +32,17 @@  struct mbus_dram_target_info
 	} cs[4];
 };
 
+/* Flags for PCI/PCIe address decoding regions */
+#define MVEBU_MBUS_PCI_IO  0x1
+#define MVEBU_MBUS_PCI_MEM 0x2
+#define MVEBU_MBUS_PCI_WA  0x3
+
+/*
+ * Magic value that explicits that we don't need a remapping-capable
+ * address decoding window.
+ */
+#define MVEBU_MBUS_NO_REMAP (0xffffffff)
+
 /*
  * The Marvell mbus is to be found only on SOCs from the Orion family
  * at the moment.  Provide a dummy stub for other architectures.
@@ -44,4 +55,16 @@  static inline const struct mbus_dram_target_info *mv_mbus_dram_info(void)
 	return NULL;
 }
 #endif
-#endif
+
+int mvebu_mbus_add_window_remap_flags(const char *devname, phys_addr_t base,
+				      size_t size, phys_addr_t remap,
+				      unsigned int flags);
+int mvebu_mbus_add_window(const char *devname, phys_addr_t base,
+			  size_t size);
+int mvebu_mbus_del_window(phys_addr_t base, size_t size);
+int mvebu_mbus_init(const char *soc, phys_addr_t mbus_phys_base,
+		    size_t mbus_size, phys_addr_t sdram_phys_base,
+		    size_t sdram_size);
+int mvebu_mbus_dt_init(void);
+
+#endif /* __LINUX_MBUS_H */