[v3,3/7] hw/riscv/microchip_pfsoc.c: add an Icicle Kit fdt address function

Message ID 20230119191728.622081-4-dbarboza@ventanamicro.com (mailing list archive)
State New, archived
Series riscv: fdt related cleanups

Commit Message

Daniel Henrique Barboza Jan. 19, 2023, 7:17 p.m. UTC
The Icicle Kit board works with 2 distinct RAM banks that are separated
by a gap. We have a lower bank with 1GiB size, a gap follows,
then at 64GiB the high memory starts.

MachineClass::default_ram_size is set to 1.5 GiB and machine_init() is
enforcing it as minimal RAM size, meaning that we'll always have
at least 512 MiB in the Hi RAM area, and that the FDT will be located
there all the time.

riscv_compute_fdt_addr() can't handle this setup because it assumes that
the RAM is always contiguous. It's also returning a uint32_t because
it's enforcing that the fdt address is sitting in an area that is addressable
by 32-bit CPUs, but 32 bits won't be enough to point to the Hi area of
the Icicle Kit RAM (and to its FDT itself).

Create a new function called microchip_compute_fdt_addr() that is able
to deal with all these details that are particular to the Icicle Kit.
Ditch riscv_compute_fdt_addr() and use it instead.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 hw/riscv/microchip_pfsoc.c | 46 +++++++++++++++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 3 deletions(-)

Comments

Conor Dooley Jan. 19, 2023, 7:56 p.m. UTC | #1
Hey!

On Thu, Jan 19, 2023 at 04:17:24PM -0300, Daniel Henrique Barboza wrote:
> The Icicle Kit board works with 2 distinct RAM banks that are separated
> by a gap. We have a lower bank with 1GiB size, a gap follows,
> then at 64GiB the high memory starts.
> 
> MachineClass::default_ram_size is set to 1.5Gb and machine_init() is
> enforcing it as minimal RAM size, meaning that there we'll always have
> at least 512 MiB in the Hi RAM area, and that the FDT will be located
> there all the time.
> 
> riscv_compute_fdt_addr() can't handle this setup because it assumes that
> the RAM is always contiguous. It's also returning an uint32_t because
> it's enforcing that fdt address is sitting on an area that is addressable
> to 32 bit CPUs, but 32 bits won't be enough to point to the Hi area of
> the Icicle Kit RAM (and to its FDT itself).
> 
> Create a new function called microchip_compute_fdt_addr() that is able
> to deal with all these details that are particular to the Icicle Kit.
> Ditch riscv_compute_fdt_addr() and use it instead.

Hmm, this breaks boot for me in what is a valid configuration for
Icicle/PolarFire SoC which was previously functional in QEMU.

I'll try and write another email explaining things in more detail, but
in case I do not have time to get that done in the next day or two I
figured I should let you know.

Thanks,
Conor.
Daniel Henrique Barboza Jan. 19, 2023, 8:17 p.m. UTC | #2
On 1/19/23 16:56, Conor Dooley wrote:
> Hey!
>
> On Thu, Jan 19, 2023 at 04:17:24PM -0300, Daniel Henrique Barboza wrote:
>> The Icicle Kit board works with 2 distinct RAM banks that are separated
>> by a gap. We have a lower bank with 1GiB size, a gap follows,
>> then at 64GiB the high memory starts.
>>
>> MachineClass::default_ram_size is set to 1.5Gb and machine_init() is
>> enforcing it as minimal RAM size, meaning that there we'll always have
>> at least 512 MiB in the Hi RAM area, and that the FDT will be located
>> there all the time.
>>
>> riscv_compute_fdt_addr() can't handle this setup because it assumes that
>> the RAM is always contiguous. It's also returning an uint32_t because
>> it's enforcing that fdt address is sitting on an area that is addressable
>> to 32 bit CPUs, but 32 bits won't be enough to point to the Hi area of
>> the Icicle Kit RAM (and to its FDT itself).
>>
>> Create a new function called microchip_compute_fdt_addr() that is able
>> to deal with all these details that are particular to the Icicle Kit.
>> Ditch riscv_compute_fdt_addr() and use it instead.
> Hmm, this breaks boot for me in what is a valid configuration for
> Icicle/PolarFire SoC which was previously functional in QEMU.

Thanks for letting me know.  Are you testing it by using the command line
you mentioned in the "qemu icicle kit es" thread?

$(QEMU)/qemu-system-riscv64 \
	-M microchip-icicle-kit \
	-m 2G -smp 5 \
	-kernel $(vmlinux_bin) \
	-dtb $(devkit).dtb \
	-initrd $(initramfs) \
	-display none \
	-serial null \
	-serial stdio




Thanks,

Daniel

Conor Dooley Jan. 20, 2023, 12:15 a.m. UTC | #3
Hey Daniel,

Got through the stuff I wanted to get done tonight faster than
expected...

On Thu, Jan 19, 2023 at 05:17:33PM -0300, Daniel Henrique Barboza wrote:
> Are you testing it by using the command line
> you mentioned in the "qemu icicle kit es" thread?
> 
> $(QEMU)/qemu-system-riscv64 \
> 	-M microchip-icicle-kit \
> 	-m 2G -smp 5 \
> 	-kernel $(vmlinux_bin) \
> 	-dtb $(devkit).dtb \
> 	-initrd $(initramfs) \
> 	-display none \
> 	-serial null \
> 	-serial stdio

Yah, effectively. It's not quite that, but near enough as makes no real
difference:
qemu-icicle:
	$(QEMU)/qemu-system-riscv64 -M microchip-icicle-kit \
		-m 2G -smp 5 \
		-kernel $(vmlinux_bin) \
		-dtb $(wrkdir)/riscvpc.dtb \
		-initrd $(initramfs) \
		-display none -serial null \
		-serial stdio \
		-D qemu.log -d unimp

I just tried to make things somewhat more intelligible for that thread.

Also in case it is not obvious, I do work for Microchip. As I mentioned
to Alistair at LPC, I/we don't have the cycles at the moment to do
anything with QEMU, so the bits of fixes I have sent are things I fixed
while debugging other issues etc, mostly in the evenings.

Anyway, I'll attempt to explain what the craic is here.

On Thu, Jan 19, 2023 at 04:17:24PM -0300, Daniel Henrique Barboza wrote:
> The Icicle Kit board works with 2 distinct RAM banks that are separated

Ehh, 2 isn't really true. There are 6 possible "windows" into the DDR on
MPFS, listed here with their start addresses:

32-bit cached     0x0080000000
64-bit cached     0x1000000000
32-bit non-cached 0x00c0000000
64-bit non-cached 0x1400000000
32-bit WCB        0x00d0000000
64-bit WCB        0x1800000000

These are the "bus" addresses, where the harts think the memory is, but
the memory is not actually connected there. There are some runtime
configurable registers which determine what addresses these correspond
to in the DDR itself.

When the QEMU port for MPFS was written, only two of these were in use,
the 32-bit and 64-bit cached regions. The config (seg) registers
were set up so that the 32-bit cached region pointed to 0x0 in DDR and
the 64-bit cached region pointed to 0x3000_0000 in DDR.
⢰⠒⠒⠒⠒⡖⠒⠒⠒⣶⠒0x80000000
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸⡖⠒⠒⢲⡇   ⡇ 0x40000000
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⠓⠒⠒⠒⠃ ⡇ <-- 64-bit starts here
⢸⡇  ⢸      ⡇         
⢸⡇  ⢸      ⡇         
⢸⡇  ⢸      ⡇         
⢸⡇  ⢸      ⡇         
⢸⡇  ⢸      ⡇ <-- 32-bit starts at 0x0
⠘⠓⠒0⠚⠒⠒1⠒⠒⠒0x00000000

(These diagrams are a bit crap, I'm copy pasting them from a TUI tool
for visualising these I made for myself. The ~s can be ignored.
https://github.com/ConchuOD/memory-aperature-configurator)

> by a gap. We have a lower bank with 1GiB size, a gap follows,
> then at 64GiB the high memory starts.

As you correctly pointed out, that lower region is in fact 1 GiB & hence
there is actually an overlapping region of 256 MiB.

The Devicetree at this point in time looked like:
	ddrc_cache_lo: memory@80000000 {
		device_type = "memory";
		reg = <0x0 0x80000000 0x0 0x30000000>;
		clocks = <&clkcfg CLK_DDRC>;
		status = "okay";
	};

	ddrc_cache_hi: memory@1000000000 {
		device_type = "memory";
		reg = <0x10 0x0 0x0 0x40000000>;
		clocks = <&clkcfg CLK_DDRC>;
		status = "okay";
	};
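
A minimal sketch of the window arithmetic for this first configuration,
written as standalone C rather than QEMU code. The struct and function names
are hypothetical. The bus bases and DDR offsets are the ones given above, the
32-bit window is taken as 1 GiB as stated, and treating the 64-bit window as
1 GiB (the reg size of ddrc_cache_hi) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One "window" into DDR: a bus base plus the DDR offset it is pointed at. */
typedef struct {
    uint64_t bus_base;   /* where the harts see the window */
    uint64_t ddr_offset; /* where the window starts in DDR (seg setting) */
    uint64_t size;       /* number of bytes considered part of the window */
} ddr_window;

/* Translate a bus address inside a window to the DDR address behind it. */
uint64_t bus_to_ddr(const ddr_window *w, uint64_t bus_addr)
{
    assert(bus_addr >= w->bus_base && bus_addr - w->bus_base < w->size);
    return w->ddr_offset + (bus_addr - w->bus_base);
}

/* Number of DDR bytes that are reachable through both windows. */
uint64_t ddr_overlap(const ddr_window *a, const ddr_window *b)
{
    uint64_t start = a->ddr_offset > b->ddr_offset ? a->ddr_offset
                                                   : b->ddr_offset;
    uint64_t a_end = a->ddr_offset + a->size;
    uint64_t b_end = b->ddr_offset + b->size;
    uint64_t end = a_end < b_end ? a_end : b_end;

    return end > start ? end - start : 0;
}

/* First configuration: 32-bit cached at DDR 0x0, 64-bit cached at 0x3000_0000. */
const ddr_window first_cfg_lo = { 0x0080000000ULL, 0x00000000ULL, 0x40000000ULL };
const ddr_window first_cfg_hi = { 0x1000000000ULL, 0x30000000ULL, 0x40000000ULL };
```

Under these assumptions ddr_overlap(&first_cfg_lo, &first_cfg_hi) evaluates to
0x10000000 (256 MiB), which is consistent with the low node above declaring
only 0x30000000 bytes.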

At some point, it was decided that instead we would use a configuration
with ~no memory at 32-bit addresses. I think it was this one here:

⢰⡖⠒⠒⢲⡖⠒⠒⠒⣶⠒0x80000000
⢸⡇  ⢸⡇   ⣿ ⡇         
⢸⠓⠒⠒⠚⡇   ⡟ ⡇ <-- 32-bit starts here
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ 0x40000000
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇ <-- 64-bit starts at 0x0
⠘⠒⠒0⠒⠓⠒1⠒⠓⠒0x00000000

Because of how these windows work, the 32-bit cached region was always
there, just not used, as the Devicetree became:
	ddrc_cache: memory@1000000000 {
		device_type = "memory";
		reg = <0x10 0x0 0x0 0x76000000>;
		status = "okay";
	};

The remaining bit of memory is being used for some WCB buffers etc &
not for the OS itself. This was never upstreamed anywhere AFAIK as it
was a workaround.

The current Devicetree in Linux & U-Boot corresponds to a configuration
like:
⢰⠒⠒⠒⠒⡖⠒⠒⠒⣶⠒0x80000000
⢸    ⡇   ⣿ ⡇         
⢸    ⡇   ⡟ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸    ⡇   ⡇ ⡇         
⢸⡖⠒⠒⢲⡇   ⡇ 0x40000000
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇         
⢸⡇  ⢸⡇   ⡇ ⡇ <-- 32- & 64-bit start at 0x0
⠘⠓⠒0⠚⠓⠒1⠒⠓⠒0x00000000

That DT looks like:
	ddrc_cache_lo: memory@80000000 {
		device_type = "memory";
		reg = <0x0 0x80000000 0x0 0x40000000>;
		status = "okay";
	};

	ddrc_cache_hi: memory@1040000000 {
		device_type = "memory";
		reg = <0x10 0x40000000 0x0 0x40000000>;
		status = "okay";
	};
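
As a sketch of how these reg values read: each entry is four 32-bit cells,
which is consistent with two address cells followed by two size cells (an
assumption, since the #address-cells/#size-cells properties are not shown
here). The helper names below are hypothetical, and the cells are taken as
host-order integers rather than the big-endian bytes stored in a real dtb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A decoded memory node: 64-bit base address and 64-bit size. */
typedef struct {
    uint64_t base;
    uint64_t size;
} mem_region;

/* Combine two 32-bit cells (most significant first) into one 64-bit value. */
uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Decode <addr_hi addr_lo size_hi size_lo> into a base/size pair. */
mem_region decode_reg(const uint32_t *cells)
{
    mem_region r;

    assert(cells != NULL);
    r.base = cells_to_u64(cells[0], cells[1]);
    r.size = cells_to_u64(cells[2], cells[3]);
    return r;
}
```

Decoding <0x10 0x40000000 0x0 0x40000000> this way gives a base of
0x1040000000 and a size of 0x40000000, i.e. 1 GiB that starts 1 GiB into the
64-bit cached window.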

Each of these changes came as part of an FPGA reference design change &
a corresponding compatible change. I believe rtlv2203 was the second
configuration & rtlv2210 the third.

I can't boot the current configuration in QEMU, probably due to some of
the things you point out below.
To get it working, I remove the ddrc_cache_hi from my DT and boot with
the 32-bit cached memory only.
This is what the current changes have broken for me.

IMO it is a perfectly valid thing to boot a system using less than the
memory it *can* use.

I guess you read the other thread in which I stated that the HSS boot
that is documented doesn't work with recent HSSes. Ideally, and I am
most certainly _not_ expecting anyone to do this, when the HSS writes
the "seg" registers during boot to configure the memory layout as per
the FPGA bitstream, QEMU would configure the memory layout it is
emulating to match.
Since direct kernel boot is a thing too, I was thinking that for that
mode, the config in the dtb should probably be used.
I don't know enough about QEMU to know if this is even possible!

The other possibility I was thinking of was just relaxing the DDR limit
entirely (and ignoring the overlaying) so that QEMU thinks there is 1
GiB at 0x8000_0000 and 16 GiB at 0x10_0000_0000.
Again, I've not had the cycles to look into any of this at all nor am I
expecting anyone else to - just while I am already typing about this
stuff there's no harm in broadcasting the other thoughts I had.

> MachineClass::default_ram_size is set to 1.5Gb and machine_init() is
> enforcing it as minimal RAM size, meaning that there we'll always have

I don't think that this is 

> at least 512 MiB in the Hi RAM area, and that the FDT will be located
> there all the time.

All the time? That's odd.
I suppose my kernel then remaps the dtb into the memory range it can
access, and therefore things keep ticking.

I don't think that machine_init() should be enforcing a minimum ram size
of 1.5 GiB - although maybe Bin Meng has a reason for that that I don't
understand.

> riscv_compute_fdt_addr() can't handle this setup because it assumes that
> the RAM is always contiguous. It's also returning an uint32_t because
> it's enforcing that fdt address is sitting on an area that is addressable
> to 32 bit CPUs, but 32 bits won't be enough to point to the Hi area of
> the Icicle Kit RAM (and to its FDT itself).
> 
> Create a new function called microchip_compute_fdt_addr() that is able
> to deal with all these details that are particular to the Icicle Kit.
> Ditch riscv_compute_fdt_addr() and use it instead.
> 
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/microchip_pfsoc.c | 46 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
> index dcdbc2cac3..9b829e4d1a 100644
> --- a/hw/riscv/microchip_pfsoc.c
> +++ b/hw/riscv/microchip_pfsoc.c
> @@ -54,6 +54,8 @@
>  #include "sysemu/device_tree.h"
>  #include "sysemu/sysemu.h"
>  
> +#include <libfdt.h>
> +
>  /*
>   * The BIOS image used by this machine is called Hart Software Services (HSS).
>   * See https://github.com/polarfire-soc/hart-software-services
> @@ -513,6 +515,46 @@ static void microchip_pfsoc_soc_register_types(void)
>  
>  type_init(microchip_pfsoc_soc_register_types)
>  
> +static hwaddr microchip_compute_fdt_addr(MachineState *ms)
> +{
> +    const MemMapEntry *memmap = microchip_pfsoc_memmap;
> +    hwaddr mem_low_size = memmap[MICROCHIP_PFSOC_DRAM_LO].size;
> +    hwaddr mem_high_size, fdt_base;
> +    int ret = fdt_pack(ms->fdt);
> +    int fdtsize;
> +
> +    /* Should only fail if we've built a corrupted tree */
> +    g_assert(ret == 0);
> +
> +    fdtsize = fdt_totalsize(ms->fdt);
> +    if (fdtsize <= 0) {
> +        error_report("invalid device-tree");
> +        exit(1);
> +    }
> +
> +    /*
> +     * microchip_icicle_kit_machine_init() does a validation
> +     * that guarantees that ms->ram_size is always greater
> +     * than mem_low_size and that mem_high_size will be
> +     * at least 512MiB.

Again, I don't think it should be doing this at all. I see that the comment
about that size refers to DDR training, but given the overlaying of
memory it's entirely possible to train against 64-bit addresses but then
boot a kernel using only low memory addresses.
Perhaps that makes sense by default & for booting via the bootloader, but I
don't think enforcing this makes sense when the bootloader is not involved.

If a dtb is used as the source for the memory layout, requiring memory
at high addresses doesn't make sense to me. I have no idea if there is a
mechanism for figuring that out though nor am I au fait with how these
memory sizes are calculated.
It is getting kinda late here, so I am sending this without having
investigated any of the detail, sorry.

Hopefully that wasn't too deranged and you can at least understand why I
have been doing what I have...

Thanks,
Conor.

> +     *
> +     * This also means that our fdt_addr will be based
> +     * on the starting address of the HI DRAM block.
> +     */
> +    mem_high_size = ms->ram_size - mem_low_size;
> +    fdt_base = memmap[MICROCHIP_PFSOC_DRAM_HI].base;
> +
> +    /*
> +     * In theory we could copy riscv_compute_fdt_addr()
> +     * and put the FDT capped at maximum 3Gb from fdt_base,
> +     * but fdt_base is set at 0x1000000000 (64GiB). We
> +     * make the assumption here that the OS is ready to
> +     * handle the FDT, 2MB aligned, at the very end of
> +     * the available RAM.
> +     */
> +    return QEMU_ALIGN_DOWN(fdt_base + mem_high_size - fdtsize, 2 * MiB);
> +}
> +
>  static void microchip_icicle_kit_machine_init(MachineState *machine)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
> @@ -640,9 +682,7 @@ static void microchip_icicle_kit_machine_init(MachineState *machine)
>                                      "bootargs", machine->kernel_cmdline);
>          }
>  
> -        /* Compute the fdt load address in dram */
> -        fdt_load_addr = riscv_compute_fdt_addr(memmap[MICROCHIP_PFSOC_DRAM_LO].base,
> -                                              machine->ram_size, machine->fdt);
> +        fdt_load_addr = microchip_compute_fdt_addr(machine);
>          riscv_load_fdt(fdt_load_addr, machine->fdt);
>  
>          /* Load the reset vector */
> -- 
> 2.39.0
> 
> 
>
Daniel Henrique Barboza Jan. 21, 2023, 5:58 p.m. UTC | #4
Conor,

Thanks for the Icicle-kit walk-through! I'll not claim that I fully understood it,
but I understood enough to handle the situation ATM.

Without this change, this is where the FDT is being installed in the board when
I start it with 8 GiB of RAM (retrieved via 'info roms'):

addr=00000000bfe00000 size=0x00a720 mem=ram name="fdt"

Which surprised me at first, because this is almost at the end of the LO area, which has
1 GiB, and I figured it would be in the middle of another RAM area. I took another look
at what we're doing in riscv_load_fdt():

-----------
temp = (dram_base < 3072 * MiB) ?  MIN(dram_end, 3072 * MiB) : dram_end;
fdt_addr = QEMU_ALIGN_DOWN(temp - fdtsize, 2 * MiB);
-----------

This code can be read as "if the starting address of the RAM is lower than 3 GiB, put
the FDT no further than 3 GiB (0xc0000000). Otherwise, put it at the end of dram",
where "dram_base" is the starting address of the RAM block that the function
receives.

For icicle-kit, this is being passed as memmap[MICROCHIP_PFSOC_DRAM_LO].base,
0x80000000, which is 2 GiB.

So, regardless of how much RAM we have (dram_end), the FDT will always be capped at
3 GiB. At this moment, this fits exactly at the end of the LO area for the Icicle Kit.
Which is funny, because this 3 GiB restriction was added by commit 1a475d39ef54 to fix
32-bit guest boot and it happened to also work for the Microchip SoC.
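
The two lines quoted above can be worked through with the numbers from this
thread. What follows is a standalone sketch, not the QEMU source: align_down()
stands in for QEMU_ALIGN_DOWN, the function names are hypothetical, and the
second function mirrors the formula in this patch under the assumption that
the LO bank is 1 GiB and the HI bank starts at 0x1000000000.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Round addr down to a multiple of align (align must be a power of two). */
uint64_t align_down(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Existing behaviour: cap the FDT at 3 GiB when RAM starts below 3 GiB. */
uint64_t capped_fdt_addr(uint64_t dram_base, uint64_t ram_size,
                         uint64_t fdtsize)
{
    uint64_t limit = 3072 * MiB;
    uint64_t dram_end = dram_base + ram_size;
    uint64_t temp = dram_end;

    if (dram_base < limit && dram_end > limit) {
        temp = limit;
    }
    return align_down(temp - fdtsize, 2 * MiB);
}

/* Formula from this patch: FDT at the very end of the HI bank. */
uint64_t high_ram_fdt_addr(uint64_t ram_size, uint64_t fdtsize)
{
    const uint64_t mem_low_size = 1024 * MiB;     /* assumed LO bank size */
    const uint64_t fdt_base = 0x1000000000ULL;    /* assumed HI bank base */
    uint64_t mem_high_size;

    assert(ram_size > mem_low_size);
    mem_high_size = ram_size - mem_low_size;
    return align_down(fdt_base + mem_high_size - fdtsize, 2 * MiB);
}
```

With dram_base 0x80000000, 8 GiB of RAM and an fdt size of 0xa720,
capped_fdt_addr() gives 0xbfe00000, matching the 'info roms' output above. For
the same RAM and fdt size, high_ram_fdt_addr() gives 0x11bfe00000, which is
far outside the LO area.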

So yeah, I thought that I was fixing a bug and in the end I caused one. This patch
needs to go.


Alistair, I believe I should re-send v2, this time explaining why the existing function
will not break the Microchip board because we'll never put the FDT out of the LO area
of the board. Does this work for you?


Conor, one more thing:


On 1/19/23 21:15, Conor Dooley wrote:
> Hey Daniel,
> 
> Got through the stuff I wanted to get done tonight faster than
> expected...
> 
> On Thu, Jan 19, 2023 at 05:17:33PM -0300, Daniel Henrique Barboza wrote:
>> Are you testing it by using the command line
>> you mentioned in the "qemu icicle kit es" thread?
>>
>> $(QEMU)/qemu-system-riscv64 \
>> 	-M microchip-icicle-kit \
>> 	-m 2G -smp 5 \
>> 	-kernel $(vmlinux_bin) \
>> 	-dtb $(devkit).dtb \
>> 	-initrd $(initramfs) \
>> 	-display none \
>> 	-serial null \
>> 	-serial stdio
> 
> Yah, effectively. It's not quite that, but near enough as makes no real
> difference:
> qemu-icicle:
> 	$(QEMU)/qemu-system-riscv64 -M microchip-icicle-kit \
> 		-m 2G -smp 5 \
> 		-kernel $(vmlinux_bin) \
> 		-dtb $(wrkdir)/riscvpc.dtb \
> 		-initrd $(initramfs) \
> 		-display none -serial null \
> 		-serial stdio \
> 		-D qemu.log -d unimp
> 
> I just tried to make things somewhat more intelligible for that thread.

I tried it out with kernel v6.0.0 (I saw you mentioning in the other thread that
this was the latest kernel you were able to boot this way)  and it booted up until
the kernel complained about missing initramfs. Any tips on how I can build an
initrd disk for the board?


Thanks,


Daniel

Conor Dooley Jan. 21, 2023, 7:51 p.m. UTC | #5
On Sat, Jan 21, 2023 at 02:58:19PM -0300, Daniel Henrique Barboza wrote:
> Conor,
> 
> Thanks for the Icicle-kit walk-through!

nw chief

> I'll not claim that I fully understood it,
> but I understood enough to handle the situation ATM.

tbf, I struggle to explain/visualise that stuff with the "windows" etc
well. I wrote myself a program to visualise it for a good reason!
Well it was done in Rust, so there were two good reasons ;)

> Without this change, this is where the FDT is being installed in the board when
> I start it with 8Gb of RAM (retrieved via 'info roms'):
> 
> addr=00000000bfe00000 size=0x00a720 mem=ram name="fdt"
> 
> Which surprised me at first because this is almost at the end of the LO area which has
> 1Gb and I figured it would be in the middle of another RAM area. I took another read
> at what we're doing in riscv_load_fdt():
> 
> -----------
> temp = (dram_base < 3072 * MiB) ?  MIN(dram_end, 3072 * MiB) : dram_end;
> fdt_addr = QEMU_ALIGN_DOWN(temp - fdtsize, 2 * MiB);
> -----------
> 
> This code can be read as "if the starting address of the RAM is lower than 3Gb, put
> the FDT no further than 3Gb (0xc0000000). Otherwise, put it at the end of dram",
> where "dram_base" is the starting address of the RAM block that the function
> receives.
> 
> For icicle-kit, this is being passed as  memmap[MICROCHIP_PFSOC_DRAM_LO].base,
> 0x80000000, which is 2Gb.
> 
> So, regardless of how much RAM we have (dram_end), the FDT will always be capped at
> 3Gb. At this moment, this fits exactly at the end of the LO area for the Icicle Kit.
> Which is funny because this 3Gb restriction was added by commit 1a475d39ef54 to fix
> 32 bit guest boot and it happened to also work for the Microchip SoC.

That's hilariously convenient hahah

> So yeah, I thought that I was fixing a bug and in the end I caused one. This patch
> needs to go.
> 
> Alistair, I believe I should re-send v2, this time explaining why the existing function
> will not break the Microchip board because we'll never put the FDT out of the LO area
> of the board. Does this work for you?
> Conor, one more thing:
> 
> 
> On 1/19/23 21:15, Conor Dooley wrote:
> > Hey Daniel,
> > 
> > Got through the stuff I wanted to get done tonight faster than
> > expected...
> > 
> > On Thu, Jan 19, 2023 at 05:17:33PM -0300, Daniel Henrique Barboza wrote:
> > > Are you testing it by using the command line
> > > you mentioned in the "qemu icicle kit es" thread?
> > > 
> > > $(QEMU)/qemu-system-riscv64 \
> > > 	-M microchip-icicle-kit \
> > > 	-m 2G -smp 5 \
> > > 	-kernel $(vmlinux_bin) \
> > > 	-dtb $(devkit).dtb \
> > > 	-initrd $(initramfs) \
> > > 	-display none \
> > > 	-serial null \
> > > 	-serial stdio
> > 
> > Yah, effectively. It's not quite that, but near enough as makes no real
> > difference:
> > qemu-icicle:
> > 	$(QEMU)/qemu-system-riscv64 -M microchip-icicle-kit \
> > 		-m 2G -smp 5 \
> > 		-kernel $(vmlinux_bin) \
> > 		-dtb $(wrkdir)/riscvpc.dtb \
> > 		-initrd $(initramfs) \
> > 		-display none -serial null \
> > 		-serial stdio \
> > 		-D qemu.log -d unimp
> > 
> > I just tried to make things somewhat more intelligible for that thread.
> 
> I tried it out with kernel v6.0.0 (I saw you mentioning in the other thread that
> this was the latest kernel you were able to boot this way)

Yah, I said that because I didn't want them to have to mess with DT.
Later kernels do work, but need DT modifications as things are now
configured for the below case.

> > The current Devicetree in Linux & U-Boot corresponds to a configuration
> > like:
> > ⢰⠒⠒⠒⠒⡖⠒⠒⠒⣶⠒0x80000000
> > ⢸    ⡇   ⣿ ⡇
> > ⢸    ⡇   ⡟ ⡇
> > ⢸    ⡇   ⡇ ⡇
> > ⢸    ⡇   ⡇ ⡇
> > ⢸    ⡇   ⡇ ⡇
> > ⢸    ⡇   ⡇ ⡇
> > ⢸    ⡇   ⡇ ⡇
> > ⢸⡖⠒⠒⢲⡇   ⡇ 0x40000000
> > ⢸⡇  ⢸⡇   ⡇ ⡇
> > ⢸⡇  ⢸⡇   ⡇ ⡇
> > ⢸⡇  ⢸⡇   ⡇ ⡇
> > ⢸⡇  ⢸⡇   ⡇ ⡇
> > ⢸⡇  ⢸⡇   ⡇ ⡇
> > ⢸⡇  ⢸⡇   ⡇ ⡇
> > ⢸⡇  ⢸⡇   ⡇ ⡇ <-- 32- & 64-bit start at 0x0
> > ⠘⠓⠒0⠚⠓⠒1⠒⠓⠒0x00000000
> > 
> > That DT looks like:
> > 	ddrc_cache_lo: memory@80000000 {
> > 		device_type = "memory";
> > 		reg = <0x0 0x80000000 0x0 0x40000000>;
> > 		status = "okay";
> > 	};
> > 
> > 	ddrc_cache_hi: memory@1040000000 {
> > 		device_type = "memory";
> > 		reg = <0x10 0x40000000 0x0 0x40000000>;
> > 		status = "okay";
> > 	};

This one doesn't work in QEMU, so for those kernels I just delete the
ddrc_cache_hi node, and v6.2-rcN works in that way.

> and it booted up until
> the kernel complained about missing initramfs. Any tips on how I can build an
> initrd disk for the board?

Ehh, any old initramfs for RISC-V should work, right? I suppose passing
a normal rootfs works too - I just mostly work w/ hardware & use NFS
there, so have nothing scripted to build a rootfs for me, which is why
I've been using initramfs.
I build one using buildroot, with a config like:
https://raw.githubusercontent.com/ConchuOD/riscv-env/dev/conf/lowmem/buildroot_initramfs_config

I then do (ripped from my makefile rule):
		$(linux_srcdir)/usr/gen_initramfs.sh \
		-o initramfs.cpio -u $(shell id -u) -g $(shell id -g) \
		initramfs.txt \
		$(path_to_buildroot_sysroot)

I'm lazy and CBA finding somewhere else to host this, so I put one here:
https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/initramfs.cpio.gz

Thanks,
Conor.
Alistair Francis Jan. 22, 2023, 10:53 p.m. UTC | #6
On Sun, Jan 22, 2023 at 5:16 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> Conor,
>
> Thanks for the Icicle-kit walk-through! I'll not claim that I fully understood it,
> but I understood enough to handle the situation ATM.
>
> Without this change, this is where the FDT is being installed in the board when
> I start it with 8Gb of RAM (retrieved via 'info roms'):
>
> addr=00000000bfe00000 size=0x00a720 mem=ram name="fdt"
>
> Which surprised me at first because this is almost at the end of the LO area which has
> 1Gb and I figured it would be in the middle of another RAM area. I took another read
> at what we're doing in riscv_load_fdt():
>
> -----------
> temp = (dram_base < 3072 * MiB) ?  MIN(dram_end, 3072 * MiB) : dram_end;
> fdt_addr = QEMU_ALIGN_DOWN(temp - fdtsize, 2 * MiB);
> -----------
>
> This code can be read as "if the starting address of the RAM is lower than 3Gb, put
> the FDT no further than 3Gb (0xc0000000). Otherwise, put it at the end of dram",
> where "dram_base" is the starting address of the RAM block that the function
> receives.
>
> For icicle-kit, this is being passed as  memmap[MICROCHIP_PFSOC_DRAM_LO].base,
> 0x80000000, which is 2Gb.
>
> So, regardless of how much RAM we have (dram_end), the FDT will always be capped at
> 3Gb. At this moment, this fits exactly at the end of the LO area for the Icicle Kit.
> Which is funny because this 3Gb restriction was added by commit 1a475d39ef54 to fix
> 32 bit guest boot and it happened to also work for the Microchip SoC.
>
> So yeah, I thought that I was fixing a bug and in the end I caused one. This patch
> needs to go.
>
>
> Alistair, I believe I should re-send v2, this time explaining why the existing function
> will not break the Microchip board because we'll never put the FDT out of the LO area
> of the board. Does this work for you?

I think that's fine. My only worry is that we are losing some
flexibility that some future board might want.

Alistair
Daniel Henrique Barboza Jan. 23, 2023, 10:19 a.m. UTC | #7
On 1/22/23 19:53, Alistair Francis wrote:
> On Sun, Jan 22, 2023 at 5:16 AM Daniel Henrique Barboza
> <dbarboza@ventanamicro.com> wrote:
>>
>> Conor,
>>
>> Thanks for the Icicle-kit walk-through! I'll not claim that I fully understood it,
>> but I understood enough to handle the situation ATM.
>>
>> Without this change, this is where the FDT is being installed in the board when
>> I start it with 8Gb of RAM (retrieved via 'info roms'):
>>
>> addr=00000000bfe00000 size=0x00a720 mem=ram name="fdt"
>>
>> Which surprised me at first because this is almost at the end of the LO area which has
>> 1Gb and I figured it would be in the middle of another RAM area. I took another read
>> at what we're doing in riscv_load_fdt():
>>
>> -----------
>> temp = (dram_base < 3072 * MiB) ?  MIN(dram_end, 3072 * MiB) : dram_end;
>> fdt_addr = QEMU_ALIGN_DOWN(temp - fdtsize, 2 * MiB);
>> -----------
>>
>> This code can be read as "if the starting address of the RAM is lower than 3Gb, put
>> the FDT no further than 3Gb (0xc0000000). Otherwise, put it at the end of dram",
>> where "dram_base" is the starting address of the RAM block that the function
>> receives.
>>
>> For icicle-kit, this is being passed as  memmap[MICROCHIP_PFSOC_DRAM_LO].base,
>> 0x80000000, which is 2Gb.
>>
>> So, regardless of how much RAM we have (dram_end), the FDT will always be capped at
>> 3Gb. At this moment, this fits exactly at the end of the LO area for the Icicle Kit.
>> Which is funny because this 3Gb restriction was added by commit 1a475d39ef54 to fix
>> 32 bit guest boot and it happened to also work for the Microchip SoC.
>>
>> So yeah, I thought that I was fixing a bug and in the end I caused one. This patch
>> needs to go.
>>
>>
>> Alistair, I believe I should re-send v2, this time explaining why the existing function
>> will not break the Microchip board because we'll never put the FDT out of the LO area
>> of the board. Does this work for you?
> 
> I think that's fine. My only worry is that we are losing some
> flexibility that some future board might want.

What if we change riscv_load_fdt() parameters to pass a MemoryRegion/MemMapEntry
instead of just dram_base?

Instead of this:

uint64_t riscv_load_fdt(hwaddr dram_base, uint64_t mem_size, void *fdt)

We would have this:

uint64_t riscv_load_fdt(MemMapEntry mem, uint64_t mem_size, void *fdt)

Or even this:

uint64_t riscv_load_fdt(hwaddr dram_base, hwaddr dram_size,
                         uint64_t mem_size, void *fdt)


And then we can make assumptions based on the actual memory region that the fdt
is going to fit into, instead of having a starting address and a total memory
size and having to deal with issues such as sparse memory.

We can keep all the assumptions already made today (e.g. the 3Gb maximum addr)
while also having a guarantee that the fdt isn't going to be put in the wrong
memory region/spot if we decide to change the assumptions later on.


Thanks,

Daniel



> 
> Alistair
Alistair Francis Jan. 23, 2023, 11:49 a.m. UTC | #8
On Mon, Jan 23, 2023 at 8:19 PM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
>
>
> On 1/22/23 19:53, Alistair Francis wrote:
> > On Sun, Jan 22, 2023 at 5:16 AM Daniel Henrique Barboza
> > <dbarboza@ventanamicro.com> wrote:
> >>
> >> Conor,
> >>
> >> Thanks for the Icicle-kit walk-through! I'll not claim that I fully understood it,
> >> but I understood enough to handle the situation ATM.
> >>
> >> Without this change, this is where the FDT is being installed in the board when
> >> I start it with 8Gb of RAM (retrieved via 'info roms'):
> >>
> >> addr=00000000bfe00000 size=0x00a720 mem=ram name="fdt"
> >>
> >> Which surprised me at first because this is almost at the end of the LO area which has
> >> 1Gb and I figured it would be in the middle of another RAM area. I took another read
> >> at what we're doing in riscv_load_fdt():
> >>
> >> -----------
> >> temp = (dram_base < 3072 * MiB) ?  MIN(dram_end, 3072 * MiB) : dram_end;
> >> fdt_addr = QEMU_ALIGN_DOWN(temp - fdtsize, 2 * MiB);
> >> -----------
> >>
> >> This code can be read as "if the starting address of the RAM is lower than 3Gb, put
> >> the FDT no further than 3Gb (0xc0000000). Otherwise, put it at the end of dram",
> >> where "dram_base" is the starting address of the RAM block that the function
> >> receives.
> >>
> >> For icicle-kit, this is being passed as  memmap[MICROCHIP_PFSOC_DRAM_LO].base,
> >> 0x80000000, which is 2Gb.
> >>
> >> So, regardless of how much RAM we have (dram_end), the FDT will always be capped at
> >> 3Gb. At this moment, this fits exactly at the end of the LO area for the Icicle Kit.
> >> Which is funny because this 3Gb restriction was added by commit 1a475d39ef54 to fix
> >> 32 bit guest boot and it happened to also work for the Microchip SoC.
> >>
> >> So yeah, I thought that I was fixing a bug and in the end I caused one. This patch
> >> needs to go.
> >>
> >>
> >> Alistair, I believe I should re-send v2, this time explaining why the existing function
> >> will not break the Microchip board because we'll never put the FDT out of the LO area
> >> of the board. Does this work for you?
> >
> > I think that's fine. My only worry is that we are losing some
> > flexibility that some future board might want.
>
> What if we change riscv_load_fdt() parameters to pass a MemoryRegion/MemMapEntry
> instead of just dram_base?
>
> Instead of this:
>
> uint64_t riscv_load_fdt(hwaddr dram_base, uint64_t mem_size, void *fdt)
>
> We would have this:
>
> uint64_t riscv_load_fdt(MemMapEntry mem, uint64_t mem_size, void *fdt)
>
> Or even this:
>
> uint64_t riscv_load_fdt(hwaddr dram_base, hwaddr dram_size,
>                          uint64_t mem_size, void *fdt)
>
>
> And then we can make assumptions based on the actual memory region that the fdt
> is going to fit into, instead of having a starting address and a total memory
> size and having to deal with issues such as sparse memory.
>
> We can keep all the assumptions already made today (e.g. the 3Gb maximum addr)
> while also having a guarantee that the fdt isn't going to be put in the wrong
> memory region/spot if we decide to change the assumptions later on.

That seems like a good direction. We currently don't need this though,
so don't feel like it needs to be done today.

Alistair

>
>
> Thanks,
>
> Daniel
>
>
>
> >
> > Alistair

Patch

diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
index dcdbc2cac3..9b829e4d1a 100644
--- a/hw/riscv/microchip_pfsoc.c
+++ b/hw/riscv/microchip_pfsoc.c
@@ -54,6 +54,8 @@ 
 #include "sysemu/device_tree.h"
 #include "sysemu/sysemu.h"
 
+#include <libfdt.h>
+
 /*
  * The BIOS image used by this machine is called Hart Software Services (HSS).
  * See https://github.com/polarfire-soc/hart-software-services
@@ -513,6 +515,46 @@  static void microchip_pfsoc_soc_register_types(void)
 
 type_init(microchip_pfsoc_soc_register_types)
 
+static hwaddr microchip_compute_fdt_addr(MachineState *ms)
+{
+    const MemMapEntry *memmap = microchip_pfsoc_memmap;
+    hwaddr mem_low_size = memmap[MICROCHIP_PFSOC_DRAM_LO].size;
+    hwaddr mem_high_size, fdt_base;
+    int ret = fdt_pack(ms->fdt);
+    int fdtsize;
+
+    /* Should only fail if we've built a corrupted tree */
+    g_assert(ret == 0);
+
+    fdtsize = fdt_totalsize(ms->fdt);
+    if (fdtsize <= 0) {
+        error_report("invalid device-tree");
+        exit(1);
+    }
+
+    /*
+     * microchip_icicle_kit_machine_init() does a validation
+     * that guarantees that ms->ram_size is always greater
+     * than mem_low_size and that mem_high_size will be
+     * at least 512MiB.
+     *
+     * This also means that our fdt_addr will be based
+     * on the starting address of the HI DRAM block.
+     */
+    mem_high_size = ms->ram_size - mem_low_size;
+    fdt_base = memmap[MICROCHIP_PFSOC_DRAM_HI].base;
+
+    /*
+     * In theory we could copy riscv_compute_fdt_addr()
+     * and put the FDT capped at maximum 3Gb from fdt_base,
+     * but fdt_base is set at 0x1000000000 (64GiB). We
+     * make the assumption here that the OS is ready to
+     * handle the FDT, 2MB aligned, at the very end of
+     * the available RAM.
+     */
+    return QEMU_ALIGN_DOWN(fdt_base + mem_high_size - fdtsize, 2 * MiB);
+}
+
 static void microchip_icicle_kit_machine_init(MachineState *machine)
 {
     MachineClass *mc = MACHINE_GET_CLASS(machine);
@@ -640,9 +682,7 @@  static void microchip_icicle_kit_machine_init(MachineState *machine)
                                     "bootargs", machine->kernel_cmdline);
         }
 
-        /* Compute the fdt load address in dram */
-        fdt_load_addr = riscv_compute_fdt_addr(memmap[MICROCHIP_PFSOC_DRAM_LO].base,
-                                              machine->ram_size, machine->fdt);
+        fdt_load_addr = microchip_compute_fdt_addr(machine);
         riscv_load_fdt(fdt_load_addr, machine->fdt);
 
         /* Load the reset vector */