
[BUG,2.6.31-rc1] HIGHMEM64G causes hang in PCI init on 32-bit x86

Message ID 4A496D4B.3040608@kernel.org (mailing list archive)
State Not Applicable, archived

Commit Message

Yinghai Lu June 30, 2009, 1:41 a.m. UTC
Linus Torvalds wrote:
> 
> On Mon, 29 Jun 2009, Yinghai Lu wrote:
>> +		end = round_up(start, ram_alignment(start)) - 1;
>> +		if (start > (resource_size_t)end)
>>  			continue;
>> -		reserve_region_with_split(&iomem_resource, start,
>> -						  end - 1, "RAM buffer");
>> +		reserve_region_with_split(&iomem_resource, (resource_size_t)start,
>> +					  (resource_size_t)end, "RAM buffer");
> 
> Hmm. You shouldn't need the casts with reserve_region_with_split(), and 
> they just make things uglier.
> 
> Also, I wonder if we should do something like this instead
> 
> 	#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
> 
> 	...
> 	end = round_up(start, ram_alignment(start)) - 1;
> 	if (end > MAX_RESOURCE_SIZE)
> 		end = MAX_RESOURCE_SIZE;
> 	if (start > end)
> 		continue;
> 
> Because otherwise we'll just be ignoring resources that cross the resource 
> size boundary, which sounds wrong.
> 
> We _could_ have a RAM resource that crosses the 4GB boundary, after all.
> 
> Yeah, it doesn't happen much in practice, because usually the 3G-4G range 
> is left for PCI mappings etc, so we might never hit this in practice, but 
> still, this sounds like a more correct thing to do.
> 
> It also avoids the cast. We simply cap the end to the max that 
> 'resource_size_t' can hold.

Mikael, please try this on your system, and send out /proc/iomem

Thanks

Yinghai

[PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment

fix hang with HIGHMEM_64G and 32bit resource.

according to hpa and Linus, use (resource_size_t)-1 to fend off big ranges.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/e820.c |   20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Mikael Pettersson June 30, 2009, 8:45 a.m. UTC | #1
Yinghai Lu writes:
 > Linus Torvalds wrote:
 > > 
 > > On Mon, 29 Jun 2009, Yinghai Lu wrote:
 > >> +		end = round_up(start, ram_alignment(start)) - 1;
 > >> +		if (start > (resource_size_t)end)
 > >>  			continue;
 > >> -		reserve_region_with_split(&iomem_resource, start,
 > >> -						  end - 1, "RAM buffer");
 > >> +		reserve_region_with_split(&iomem_resource, (resource_size_t)start,
 > >> +					  (resource_size_t)end, "RAM buffer");
 > > 
 > > Hmm. You shouldn't need the casts with reserve_region_with_split(), and 
 > > they just make things uglier.
 > > 
 > > Also, I wonder if we should do something like this instead
 > > 
 > > 	#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
 > > 
 > > 	...
 > > 	end = round_up(start, ram_alignment(start)) - 1;
 > > 	if (end > MAX_RESOURCE_SIZE)
 > > 		end = MAX_RESOURCE_SIZE;
 > > 	if (start > end)
 > > 		continue;
 > > 
 > > Because otherwise we'll just be ignoring resources that cross the resource 
 > > size boundary, which sounds wrong.
 > > 
 > > We _could_ have a RAM resource that crosses the 4GB boundary, after all.
 > > 
 > > Yeah, it doesn't happen much in practice, because usually the 3G-4G range 
 > > is left for PCI mappings etc, so we might never hit this in practice, but 
 > > still, this sounds like a more correct thing to do.
 > > 
 > > It also avoids the cast. We simply cap the end to the max that 
 > > 'resource_size_t' can hold.
 > 
 > Mikael, please try this on your system, and send out /proc/iomem
 > 
 > Thanks
 > 
 > Yinghai
 > 
 > [PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment
 > 
 > fix hang with HIGHMEM_64G and 32bit resource.
 > 
 > according to hpa and Linus, use (resource_size_t)-1 to fend off big ranges.
 > 
 > Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
/proc/iomem now looks as follows:

00000000-0009ebff : System RAM
0009ec00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000ccfff : Video ROM
000e4000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-7ff8ffff : System RAM
  00100000-002e022e : Kernel code
  002e022f-0038aaf7 : Kernel data
  003d8000-003fc9f3 : Kernel bss
7ff90000-7ff9dfff : ACPI Tables
7ff9e000-7ffdffff : ACPI Non-volatile Storage
7ffe0000-7fffffff : reserved
80000000-800000ff : 0000:00:1f.3
bff00000-dfefffff : PCI Bus 0000:01
  c0000000-cfffffff : 0000:01:00.0
e0000000-efffffff : PCI MMCONFIG 0 [00-ff]
  e0000000-efffffff : pnp 00:0e
febfe000-febfec00 : pnp 00:09
fec00000-fec00fff : IOAPIC 0
  fec00000-fec00fff : pnp 00:0b
fed00000-fed003ff : HPET 0
fed14000-fed19fff : pnp 00:01
fed1c000-fed1ffff : pnp 00:09
fed20000-fed8ffff : pnp 00:09
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
    fee00000-fee00fff : pnp 00:0b
ff800000-ff8fffff : PCI Bus 0000:01
  ff8c0000-ff8dffff : 0000:01:00.0
  ff8e0000-ff8effff : 0000:01:00.1
  ff8f0000-ff8fffff : 0000:01:00.0
ff900000-ff9fffff : PCI Bus 0000:02
  ff9ffc00-ff9ffcff : 0000:02:02.0
    ff9ffc00-ff9ffcff : 8139too
ffaf8000-ffafbfff : 0000:00:1b.0
  ffaf8000-ffafbfff : ICH HD audio
ffaff000-ffaff3ff : 0000:00:1d.7
  ffaff000-ffaff3ff : ehci_hcd
ffaff400-ffaff7ff : 0000:00:1a.7
  ffaff400-ffaff7ff : ehci_hcd
ffaff800-ffafffff : 0000:00:1f.2
  ffaff800-ffafffff : ahci
ffb00000-ffffffff : reserved
  ffb00000-ffbfffff : pnp 00:09
  fff00000-fffffffe : pnp 00:09
100000000-1ffffffff : System RAM
H. Peter Anvin June 30, 2009, 2:48 p.m. UTC | #2
Mikael Pettersson wrote:
> 
> Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
> /proc/iomem now looks as follows:
> 

... as it should.  So far so good, and this is a real problem.

However, there is something that really bothers me: *why does this help
on Mikael's system, which is PAE and therefore has a 64-bit
resource_size_t*?  This whole patch should be a no-op!  There is still
something that doesn't make sense.

The use of "unsigned long" in ram_alignment() will overflow after 2^52
bytes, but again, that's not the issue here, since the highest "start"
value we have is (0x2 << 32).

By process of elimination, the culprit must be round_up(), which reveals
that the macro definition of round_up() has a *very* subtle behavior
with mixed types:

#define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))

ram_alignment() returns unsigned long, which becomes (y).  This means
that the mask word on the right hand of the & gets truncated to 32 bits
*before* the masking happens -- since ((y) - 1) is still unsigned long,
inverting it will not set bits [63..32] to on.

I think this macro is actively dangerous.  Better would be:

({ __typeof__(x) __mask = (y)-1;  ((x)+__mask) & ~__mask; })

... which is also multiple-evaluation-free at the cost of using gcc
({...}) constructs.
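
The truncation described above can be reproduced in user space. The following is a minimal sketch, not kernel code: it assumes uint32_t as a stand-in for "unsigned long" on 32-bit x86, and the names ulong32, round_up_typed, end_with_old_macro and end_with_typed_macro are made up for illustration. It relies on the GNU C statement-expression and __typeof__ extensions.

```c
#include <assert.h>
#include <stdint.h>

/* The macro exactly as quoted in the thread. */
#define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))

/* The statement-expression form proposed in the thread (GNU C extension). */
#define round_up_typed(x, y) \
    ({ __typeof__(x) __mask = (y) - 1; ((x) + __mask) & ~__mask; })

/* Assumption: uint32_t models "unsigned long" on 32-bit x86. */
typedef uint32_t ulong32;

uint64_t end_with_old_macro(uint64_t start, ulong32 align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    /* ~(align - 1) is computed in 32 bits, then zero-extended for the &. */
    return round_up(start, align);
}

uint64_t end_with_typed_macro(uint64_t start, ulong32 align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    /* __mask has the type of start (64 bits), so ~__mask keeps bits 63..32. */
    return round_up_typed(start, align);
}
```

With start = 0x200000000 and a 32 MB alignment, the old macro yields 0 because the zero-extended mask clears every bit above bit 31, while the typed form returns 0x200000000 unchanged. Below 4 GB the two agree, which is presumably why the problem only showed up with RAM above that boundary.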

The deep irony in this is perhaps that, in our particular case,
align_up(x,y)-1 is the same thing as x | (y-1), which would have avoided
the problem...
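
As a rough illustration of the OR form, here is a small sketch with hypothetical helper names (round_up64, last_byte_or). It assumes y is a power of two. Note one caveat that the sketch makes visible: the two expressions agree when x is not already aligned to y, but differ when it is.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-based round-up with the mask computed in 64 bits (no truncation). */
uint64_t round_up64(uint64_t x, uint64_t y)
{
    assert(y != 0 && (y & (y - 1)) == 0); /* y must be a power of two */
    uint64_t mask = y - 1;
    return (x + mask) & ~mask;
}

/* Last byte of the aligned block containing x, via the OR form. */
uint64_t last_byte_or(uint64_t x, uint64_t y)
{
    assert(y != 0 && (y & (y - 1)) == 0); /* y must be a power of two */
    return x | (y - 1);
}
```

For x = 0x7ff90000 and y = 32 MB both round_up64(x, y) - 1 and last_byte_or(x, y) give 0x7fffffff. For an already aligned x such as 0x200000000, the first gives x - 1 and the second gives x + y - 1, so a caller switching to the OR form would need to decide how to treat the aligned case. The OR form sidesteps the mixed-type issue because a zero-extended (y - 1) is exactly the value wanted; no inversion is involved.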

	-hpa
Rolf Eike Beer June 30, 2009, 3 p.m. UTC | #3
H. Peter Anvin wrote:
> Mikael Pettersson wrote:
> > Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
> > /proc/iomem now looks as follows:
>
> ... as it should.  So far so good, and this is a real problem.
>
> However, there is something that really bothers me: *why does this help
> on Mikael's system, which is PAE and therefore has a 64-bit
> resource_size_t*?  This whole patch should be a no-op!  There is still
> something that doesn't make sense.
>
> The use of "unsigned long" in ram_alignment() will overflow after 2^52
> bytes, but again, that's not the issue here, since the highest "start"
> value we have is (0x2 << 32).

I assume you meant "2^32" and (0x1 << 32)?

Eike
H. Peter Anvin June 30, 2009, 6:52 p.m. UTC | #4
Rolf Eike Beer wrote:
> H. Peter Anvin wrote:
>> Mikael Pettersson wrote:
>>> Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
>>> /proc/iomem now looks as follows:
>> ... as it should.  So far so good, and this is a real problem.
>>
>> However, there is something that really bothers me: *why does this help
>> on Mikael's system, which is PAE and therefore has a 64-bit
>> resource_size_t*?  This whole patch should be a no-op!  There is still
>> something that doesn't make sense.
>>
>> The use of "unsigned long" in ram_alignment() will overflow after 2^52
>> bytes, but again, that's not the issue here, since the highest "start"
>> value we have is (0x2 << 32).
> 
> I assume you meant "2^32" and (0x1 << 32)?
> 

No, I meant 2^52 and (0x2 << 32) [== 2^33.]
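
The two numbers can be checked with a few lines of arithmetic. This is a sketch under the assumption that uint32_t models the 32-bit "unsigned long" used for mb in the pre-patch ram_alignment(); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Megabyte index held in a 32-bit variable, as before the patch (assumed model). */
uint32_t mb_index32(uint64_t pos)
{
    return (uint32_t)(pos >> 20);
}

/* Megabyte index held in 64 bits, as after the patch. */
uint64_t mb_index64(uint64_t pos)
{
    return pos >> 20;
}

/* Reproduces the figures discussed in the thread. */
void check_thread_numbers(void)
{
    assert((0x2ULL << 32) == (1ULL << 33));        /* 0x2 << 32 is 2^33 */
    assert(mb_index32(1ULL << 33) == 8192);        /* far from overflowing */
    assert(mb_index32(1ULL << 52) == 0);           /* wraps exactly at 2^52 */
    assert(mb_index64(1ULL << 52) == (1ULL << 32));
}
```

Shifting right by 20 leaves 32 significant bits only once pos reaches 2^52, so a start of 2^33 is nowhere near that limit.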

	-hpa

Yinghai Lu June 30, 2009, 7:33 p.m. UTC | #5
H. Peter Anvin wrote:
> Mikael Pettersson wrote:
>> Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
>> /proc/iomem now looks as follows:
>>
> 
> ... as it should.  So far so good, and this is a real problem.
> 
> However, there is something that really bothers me: *why does this help
> on Mikael's system, which is PAE and therefore has a 64-bit
> resource_size_t*?  This whole patch should be a no-op!  There is still
> something that doesn't make sense.
> 
> The use of "unsigned long" in ram_alignment() will overflow after 2^52
> bytes, but again, that's not the issue here, since the highest "start"
> value we have is (0x2 << 32).
> 
> By process of elimination, the culprit must be round_up(), which reveals
> that the macro definition of round_up() has a *very* subtle behavior
> with mixed types:
> 
> #define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))
> 
> ram_alignment() returns unsigned long, which becomes (y).  This means
> that the mask word on the right hand of the & gets truncated to 32 bits
> *before* the masking happens -- since ((y) - 1) is still unsigned long,
> inverting it will not set bits [63..32] to on.
> 
> I think this macro is actively dangerous.  Better would be:
> 
> ({ __typeof__(x) __mask = (y)-1;  ((x)+__mask) & ~__mask; })
> 
> ... which is also multiple-evaluation-free at the cost of using gcc
> ({...}) constructs.
> 
> The deep irony in this is perhaps that, in our particular case,
> align_up(x,y)-1 is the same thing as x | (y-1), which would have avoided
> the problem...

agreed, that is why we change round_up to take u64.

wonder if we should kill round_up and use roundup instead.

in include/linux/kernel.h
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
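
For comparison, a small user-space sketch of how the division-based roundup() behaves with the same mixed types. The wrapper name roundup_mixed is hypothetical, and uint32_t is again assumed to model a 32-bit unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* The macro as quoted above from include/linux/kernel.h. */
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* 64-bit value rounded up to a 32-bit step; no bitwise inversion is involved. */
uint64_t roundup_mixed(uint64_t start, uint32_t step)
{
    assert(step != 0); /* division by zero otherwise */
    return roundup(start, step);
}
```

Because the narrower operand is only added, divided by and multiplied, it is widened to 64 bits without losing information, so roundup_mixed(0x200000000, 0x2000000) stays at 0x200000000. It also works for steps that are not powers of two. The trade-off is a 64-bit division where the mask form needs only an AND, and y is still evaluated more than once.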

YH
H. Peter Anvin June 30, 2009, 7:44 p.m. UTC | #6
Yinghai Lu wrote:
> 
> agreed, that is why we change round_up to take u64.
>

round_up() is a macro, it doesn't "take" anything per se...

> wonder if we should kill round_up and use roundup instead.
> 
> in include/linux/kernel.h
> #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

Either that or we should change it to the form I specified...

	-hpa

Linus Torvalds June 30, 2009, 8:10 p.m. UTC | #7
On Tue, 30 Jun 2009, H. Peter Anvin wrote:
> 
> By process of elimination, the culprit must be round_up(), which reveals
> that the macro definition of round_up() has a *very* subtle behavior
> with mixed types:
> 
> #define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))
> 
> ram_alignment() returns unsigned long, which becomes (y).  This means
> that the mask word on the right hand of the & gets truncated to 32 bits
> *before* the masking happens -- since ((y) - 1) is still unsigned long,
> inverting it will not set bits [63..32] to on.

Good catch.

Also, this shows another bug in the #define: it evaluates 'y' twice, which 
is a no-no for something that _looks_ like a function.
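
The double evaluation is easy to observe with an argument that has a side effect. A minimal sketch follows; the counter and the helper names (next_align, evals_plain, evals_typed, round_up_typed) are invented for the demonstration, and the second macro uses GNU C extensions.

```c
#include <assert.h>
#include <stdint.h>

/* The macro as quoted in the thread. */
#define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))

/* The statement-expression form proposed in the thread (GNU C extension). */
#define round_up_typed(x, y) \
    ({ __typeof__(x) __mask = (y) - 1; ((x) + __mask) & ~__mask; })

static int calls;

/* Alignment source with a visible side effect. */
static uint64_t next_align(void)
{
    calls++;
    return 4096;
}

/* Number of times the plain macro evaluates its second argument. */
int evals_plain(void)
{
    calls = 0;
    (void)round_up((uint64_t)5000, next_align());
    return calls;
}

/* Number of times the statement-expression form evaluates it. */
int evals_typed(void)
{
    calls = 0;
    (void)round_up_typed((uint64_t)5000, next_align());
    return calls;
}

void check_evaluation_counts(void)
{
    assert(evals_plain() == 2);
    assert(evals_typed() == 1);
}
```

The plain macro expands y textually in two places, so next_align() runs twice. The typed form stores (y) - 1 in a local once and reuses it.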

> I think this macro is actively dangerous.  Better would be:
> 
> ({ __typeof__(x) __mask = (y)-1;  ((x)+__mask) & ~__mask; })

Yes. Please make it so.

> The deep irony in this is perhaps that, in our particular case,
> align_up(x,y)-1 is the same thing as x | (y-1), which would have avoided
> the problem...

I don't know how deep that irony is, but I do agree that maybe we should 
do that simplification too. In addition to fixing round_up() to not bite 
future generations in the ass.

		Linus

Patch

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1367,9 +1367,9 @@  void __init e820_reserve_resources(void)
 }
 
 /* How much should we pad RAM ending depending on where it is? */
-static unsigned long ram_alignment(resource_size_t pos)
+static u64 ram_alignment(u64 pos)
 {
-	unsigned long mb = pos >> 20;
+	u64 mb = pos >> 20;
 
 	/* To 64kB in the first megabyte */
 	if (!mb)
@@ -1383,6 +1383,8 @@  static unsigned long ram_alignment(resou
 	return 32*1024*1024;
 }
 
+#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
+
 void __init e820_reserve_resources_late(void)
 {
 	int i;
@@ -1400,17 +1402,19 @@  void __init e820_reserve_resources_late(
 	 * avoid stolen RAM:
 	 */
 	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *entry = &e820_saved.map[i];
-		resource_size_t start, end;
+		struct e820entry *entry = &e820.map[i];
+		u64 start, end;
 
 		if (entry->type != E820_RAM)
 			continue;
 		start = entry->addr + entry->size;
-		end = round_up(start, ram_alignment(start));
-		if (start == end)
+		end = round_up(start, ram_alignment(start)) - 1;
+		if (end > MAX_RESOURCE_SIZE)
+			end = MAX_RESOURCE_SIZE;
+		if (start > end)
 			continue;
-		reserve_region_with_split(&iomem_resource, start,
-						  end - 1, "RAM buffer");
+		reserve_region_with_split(&iomem_resource, start, end,
+					  "RAM buffer");
 	}
 }
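
The range computation in the patched loop can be sketched as a stand-alone helper for experimentation outside the kernel. This is an illustration only: ram_buffer_range is a hypothetical name, the alignment is passed in rather than derived from ram_alignment(), and max_res stands in for MAX_RESOURCE_SIZE so that both the 32-bit and 64-bit resource_size_t cases can be tried.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Computes the inclusive "RAM buffer" range [start, *end] the way the
 * patched loop does. Returns 1 if a range should be reserved, 0 if the
 * entry is skipped (the "continue" in the kernel loop).
 */
int ram_buffer_range(uint64_t start, uint64_t align, uint64_t max_res,
                     uint64_t *end)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */

    uint64_t mask = align - 1;
    uint64_t e = ((start + mask) & ~mask) - 1; /* all arithmetic in 64 bits */

    if (e > max_res)   /* cap to what resource_size_t can hold */
        e = max_res;
    if (start > e)     /* already aligned, or entirely above the cap */
        return 0;

    *end = e;
    return 1;
}
```

With the values from the /proc/iomem listing above, RAM ending at 0x7ff8ffff gives start = 0x7ff90000 and a range up to 0x7fffffff, while RAM ending at 0x1ffffffff gives an already aligned start of 0x200000000 and is skipped. With max_res set to 0xffffffff, any start above 4 GB is capped and then skipped.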