diff mbox

[1/2] arm: mm: Ignore memory banks which are in front of the kernel when HIGHMEM is ON

Message ID 1360745925-20952-1-git-send-email-michal.simek@xilinx.com (mailing list archive)
State New, archived
Headers show

Commit Message

Michal Simek Feb. 13, 2013, 8:58 a.m. UTC
Arm kernel with HIGHMEM ON will fail when dts memory node is described
by memory banks and the starting address is not inside the first 16MB
of the first memory bank. If HIGHMEM is OFF and the configuration is the same
then the first memory bank is ignored and the kernel boots.

Here is the example of behavior:
dts memory reg = <0x0 0x10000000 0x10000000 0x30000000>
kernel load address = 0x10000000 (respectively 0x1000800)

Current:
"Machine: Xilinx Zynq Platform, model: Zynq ZC702 Development Board
bootconsole [earlycon0] enabled
Memory policy: ECC disabled, Data cache writeback
Kernel panic - not syncing: ERROR: Failed to allocate 0x1000 bytes below 0x0."

After:
The kernel ignore ram 0x0-0x0fffffff because is lower than the kernel starting
address and the kernel bootlog contains.
"Ignoring RAM at 00000000-0fffffff (CONFIG_HIGHMEM)."

Also using mem=768M on the command line will overwrite dts memory
map and kernel will boot with HIGHMEM ON too.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
---
 arch/arm/mm/mmu.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

Comments

Russell King - ARM Linux Feb. 13, 2013, 9:03 a.m. UTC | #1
On Wed, Feb 13, 2013 at 09:58:44AM +0100, Michal Simek wrote:
> Arm kernel with HIGHMEM ON will fail when dts memory node is described
> by memory banks and the starting address is not inside the first 16MB
> of the first memory bank. If HIGHMEM is OFF and the configuration is the same
> then the first memory bank is ignored and the kernel boots.
> 
> Here is the example of behavior:
> dts memory reg = <0x0 0x10000000 0x10000000 0x30000000>
> kernel load address = 0x10000000 (respectively 0x1000800)
> 
> Current:
> "Machine: Xilinx Zynq Platform, model: Zynq ZC702 Development Board
> bootconsole [earlycon0] enabled
> Memory policy: ECC disabled, Data cache writeback
> Kernel panic - not syncing: ERROR: Failed to allocate 0x1000 bytes below 0x0."
> 
> After:
> The kernel ignore ram 0x0-0x0fffffff because is lower than the kernel starting
> address and the kernel bootlog contains.
> "Ignoring RAM at 00000000-0fffffff (CONFIG_HIGHMEM)."
> 
> Also using mem=768M on the command line will overwrite dts memory
> map and kernel will boot with HIGHMEM ON too.
> 
> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
> ---
>  arch/arm/mm/mmu.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index ce328c7..e60f370 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -937,6 +937,15 @@ void __init sanity_check_meminfo(void)
>  			highmem = 1;
>  
>  #ifdef CONFIG_HIGHMEM
> +		if (__va(bank->start + bank->size - 1) < (void *)PAGE_OFFSET) {
> +			pr_notice("Ignoring RAM at %.8llx-%.8llx ",
> +				(unsigned long long)bank->start,
> +				(unsigned long long)bank->start
> +							+ bank->size - 1);
> +			pr_cont("(CONFIG_HIGHMEM).\n");
> +			continue;
> +		}

NAK.  __va() is only valid for memory within the kernel's direct mapping
(iow, lowmem).  Physical addresses outside of that range are free to
return any other value.

So, it's entirely possible that a valid highmem setup will have memory
in highmem where __va(start + size - 1) _is_ below PAGE_OFFSET, and this
memory should _not_ be discarded.
Michal Simek Feb. 13, 2013, 2:39 p.m. UTC | #2
2013/2/13 Russell King - ARM Linux <linux@arm.linux.org.uk>:
> On Wed, Feb 13, 2013 at 09:58:44AM +0100, Michal Simek wrote:
>> Arm kernel with HIGHMEM ON will fail when dts memory node is described
>> by memory banks and the starting address is not inside the first 16MB
>> of the first memory bank. If HIGHMEM is OFF and the configuration is the same
>> then the first memory bank is ignored and the kernel boots.
>>
>> Here is the example of behavior:
>> dts memory reg = <0x0 0x10000000 0x10000000 0x30000000>
>> kernel load address = 0x10000000 (respectively 0x1000800)
>>
>> Current:
>> "Machine: Xilinx Zynq Platform, model: Zynq ZC702 Development Board
>> bootconsole [earlycon0] enabled
>> Memory policy: ECC disabled, Data cache writeback
>> Kernel panic - not syncing: ERROR: Failed to allocate 0x1000 bytes below 0x0."
>>
>> After:
>> The kernel ignore ram 0x0-0x0fffffff because is lower than the kernel starting
>> address and the kernel bootlog contains.
>> "Ignoring RAM at 00000000-0fffffff (CONFIG_HIGHMEM)."
>>
>> Also using mem=768M on the command line will overwrite dts memory
>> map and kernel will boot with HIGHMEM ON too.
>>
>> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
>> ---
>>  arch/arm/mm/mmu.c |    9 +++++++++
>>  1 files changed, 9 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
>> index ce328c7..e60f370 100644
>> --- a/arch/arm/mm/mmu.c
>> +++ b/arch/arm/mm/mmu.c
>> @@ -937,6 +937,15 @@ void __init sanity_check_meminfo(void)
>>                       highmem = 1;
>>
>>  #ifdef CONFIG_HIGHMEM
>> +             if (__va(bank->start + bank->size - 1) < (void *)PAGE_OFFSET) {
>> +                     pr_notice("Ignoring RAM at %.8llx-%.8llx ",
>> +                             (unsigned long long)bank->start,
>> +                             (unsigned long long)bank->start
>> +                                                     + bank->size - 1);
>> +                     pr_cont("(CONFIG_HIGHMEM).\n");
>> +                     continue;
>> +             }
>
> NAK.  __va() is only valid for memory within the kernel's direct mapping
> (iow, lowmem).  Physical addresses outside of that range are free to
> return any other value.
>
> So, it's entirely possible that a valid highmem setup will have memory
> in highmem where __va(start + size - 1) _is_ below PAGE_OFFSET, and this
> memory should _not_ be discarded.

ok. Can you describe me this configuration? Enough to tell me dts memory
fragment and kernel load addr which match this case.
My configuration is above and it doesn't work that's why I believe you
have tested this case
and it will be easy for me to retest it.
Maybe we have any problem with zynq that's why I would like to be aware of it.

What about
"arm: mm: Add support for booting the kernel out of the first 16MB of memory"?

Thanks,
Michal
Russell King - ARM Linux Feb. 13, 2013, 2:52 p.m. UTC | #3
On Wed, Feb 13, 2013 at 03:39:21PM +0100, Michal Simek wrote:
> ok. Can you describe me this configuration? Enough to tell me dts memory
> fragment and kernel load addr which match this case.

Anything which uses more than 1GB of memory and has PAGE_OFFSET set at
0xc0000000 (3GB).  Simple maths will tell you that, and why it fails.

Look:
- if __va(bank->start) = PAGE_OFFSET, and PAGE_OFFSET is at 3GB.
- if bank->size = 1GB (which we _do_ have), then
     __va(bank->start + bank->size) = 4GB.  4GB represented as a 32-bit
  pointer is NULL.  NULL < (void *)PAGE_OFFSET.

Therefore, your patch will cause systems with 1GB or more of memory in
one bank to ignore _all_ the memory passed in.

And if you look at the code:

                if (__va(bank->start) >= vmalloc_min ||
                    __va(bank->start) < (void *)PAGE_OFFSET)
                        highmem = 1;

notice the facy that we're marking all memory starting with an apparant
virtual address outside of PAGE_OFFSET...vmalloc_min as highmem.  This is
to catch cases exactly like this.

Memory for which __va(bank->start) < (void *)PAGE_OFFSET will also have
__va(bank->start + bank->size - 1) (in your patch) also below PAGE_OFFSET,
and your modification will cause the kernel to ignore this memory -
which is not acceptable.

I don't think there's much option for solutions to this; not with a common
kernel designed to run on multiple platforms.  If a platform doesn't
conform to the Linux requirements for a common kernel, then it doesn't
conform and it can't use it.

In much the same way that we ended up saying "no" to people who wanted
to place two physical banks of memory in reverse order in the virtual
mapping, I think this is another case of "no, we can't permit this with
common cross-subarch kernels".
Michal Simek Feb. 13, 2013, 3:52 p.m. UTC | #4
2013/2/13 Russell King - ARM Linux <linux@arm.linux.org.uk>:
> On Wed, Feb 13, 2013 at 03:39:21PM +0100, Michal Simek wrote:
>> ok. Can you describe me this configuration? Enough to tell me dts memory
>> fragment and kernel load addr which match this case.
>
> Anything which uses more than 1GB of memory and has PAGE_OFFSET set at
> 0xc0000000 (3GB).  Simple maths will tell you that, and why it fails.
>
> Look:
> - if __va(bank->start) = PAGE_OFFSET, and PAGE_OFFSET is at 3GB.
> - if bank->size = 1GB (which we _do_ have), then
>      __va(bank->start + bank->size) = 4GB.  4GB represented as a 32-bit
>   pointer is NULL.  NULL < (void *)PAGE_OFFSET.
>
> Therefore, your patch will cause systems with 1GB or more of memory in
> one bank to ignore _all_ the memory passed in.
>
> And if you look at the code:
>
>                 if (__va(bank->start) >= vmalloc_min ||
>                     __va(bank->start) < (void *)PAGE_OFFSET)
>                         highmem = 1;
>
> notice the facy that we're marking all memory starting with an apparant
> virtual address outside of PAGE_OFFSET...vmalloc_min as highmem.  This is
> to catch cases exactly like this.
>
> Memory for which __va(bank->start) < (void *)PAGE_OFFSET will also have
> __va(bank->start + bank->size - 1) (in your patch) also below PAGE_OFFSET,
> and your modification will cause the kernel to ignore this memory -
> which is not acceptable.

ok - that's fair.


> I don't think there's much option for solutions to this; not with a common
> kernel designed to run on multiple platforms.  If a platform doesn't
> conform to the Linux requirements for a common kernel, then it doesn't
> conform and it can't use it.

I have no problem to admit that this patch is wrong in this implementation
but I tend to disagree with this part.
The Linux kernel has some requirements, limitations, etc,  that's all truth.

Placing the kernel to the main memory to any location you want which end
up with kernel panic is from my point of view fault.
Saying that the kernel is always loaded withing the 16MB of memory is limitation
but it doesn't mean that we shouldn't remove it.

Ignoring the memory before the kernel can be one solution which I
would like to discuss.
What about the second patch used for !HIGHMEM case?

> In much the same way that we ended up saying "no" to people who wanted
> to place two physical banks of memory in reverse order in the virtual
> mapping, I think this is another case of "no, we can't permit this with
> common cross-subarch kernels".

This case should probably end up on device-tree rule (+dtc fault).
Not sure if memory banks should be ordered or not. Rob and Grant any thought?

Thanks,
Michal
Russell King - ARM Linux Feb. 13, 2013, 3:56 p.m. UTC | #5
On Wed, Feb 13, 2013 at 04:52:50PM +0100, Michal Simek wrote:
> This case should probably end up on device-tree rule (+dtc fault).
> Not sure if memory banks should be ordered or not. Rob and Grant any thought?

They _are_ ordered by memblock, because memblock always arranges the
banks in ascending physical address order - and it will merge
contiguous banks together irrespective of what order they appear as.

So you can't do something like bank 0: 0x60000000-0x70000000, bank 1:
0x50000000-0x60000000.  You'll end up with a single memblock entry for
0x50000000-0x70000000.
Nicolas Pitre Feb. 14, 2013, 5:01 a.m. UTC | #6
On Wed, 13 Feb 2013, Michal Simek wrote:

> The Linux kernel has some requirements, limitations, etc,  that's all truth.
> 
> Placing the kernel to the main memory to any location you want which end
> up with kernel panic is from my point of view fault.

I may agree with that, up to a point.

> Saying that the kernel is always loaded withing the 16MB of memory is limitation
> but it doesn't mean that we shouldn't remove it.

The actual limitation should be 128MB for zImage.  It will relocate the 
final kernel binary within the first 16MB by itself.

> Ignoring the memory before the kernel can be one solution which I
> would like to discuss.
> What about the second patch used for !HIGHMEM case?

Highmem is irrelevant.

If you want to ignore memory that way, you must look at the _physical_ 
addresses for each bank, and simply truncate anything that is below 
PHYS_OFFSET.  The comparison should be trivial with physical addresses 
as they're not limited to 32 bits.


Nicolas
diff mbox

Patch

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index ce328c7..e60f370 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -937,6 +937,15 @@  void __init sanity_check_meminfo(void)
 			highmem = 1;
 
 #ifdef CONFIG_HIGHMEM
+		if (__va(bank->start + bank->size - 1) < (void *)PAGE_OFFSET) {
+			pr_notice("Ignoring RAM at %.8llx-%.8llx ",
+				(unsigned long long)bank->start,
+				(unsigned long long)bank->start
+							+ bank->size - 1);
+			pr_cont("(CONFIG_HIGHMEM).\n");
+			continue;
+		}
+
 		if (__va(bank->start) >= vmalloc_min ||
 		    __va(bank->start) < (void *)PAGE_OFFSET)
 			highmem = 1;