Message ID | 4A496D4B.3040608@kernel.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Yinghai Lu writes:
> Linus Torvalds wrote:
> >
> > On Mon, 29 Jun 2009, Yinghai Lu wrote:
> >> +        end = round_up(start, ram_alignment(start)) - 1;
> >> +        if (start > (resource_size_t)end)
> >>              continue;
> >> -        reserve_region_with_split(&iomem_resource, start,
> >> -                          end - 1, "RAM buffer");
> >> +        reserve_region_with_split(&iomem_resource, (resource_size_t)start,
> >> +                          (resource_size_t)end, "RAM buffer");
> >
> > Hmm. You shouldn't need the casts with reserve_region_with_split(), and
> > they just make things uglier.
> >
> > Also, I wonder if we should do something like this instead
> >
> >     #define MAX_RESOURCE_SIZE ((resource_size_t)-1)
> >
> >     ...
> >     end = round_up(start, ram_alignment(start)) - 1;
> >     if (end > MAX_RESOURCE_SIZE)
> >         end = MAX_RESOURCE_SIZE;
> >     if (start > end)
> >         continue;
> >
> > Because otherwise we'll just be ignoring resources that cross the resource
> > size boundary, which sounds wrong.
> >
> > We _could_ have a RAM resource that crosses the 4GB boundary, after all.
> >
> > Yeah, it doesn't happen much in practice, because usually the 3G-4G range
> > is left for PCI mappings etc, so we might never hit this in practice, but
> > still, this sounds like a more correct thing to do.
> >
> > It also avoids the cast. We simply cap the end to the max that
> > 'resource_size_t' can hold.
>
> Mikael, please try this on your system, and send out /proc/iomem
>
> Thanks
>
> Yinghai
>
> [PATCH] x86: add boundary check for 32bit res before expand e820 resource to alignment
>
> fix hang with HIGHMEM_64G and 32bit resource.
>
> according to hpa and Linus, use (resource_size_t)-1 to fend off big ranges.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
/proc/iomem now looks as follows:

00000000-0009ebff : System RAM
0009ec00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000ccfff : Video ROM
000e4000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-7ff8ffff : System RAM
  00100000-002e022e : Kernel code
  002e022f-0038aaf7 : Kernel data
  003d8000-003fc9f3 : Kernel bss
7ff90000-7ff9dfff : ACPI Tables
7ff9e000-7ffdffff : ACPI Non-volatile Storage
7ffe0000-7fffffff : reserved
80000000-800000ff : 0000:00:1f.3
bff00000-dfefffff : PCI Bus 0000:01
  c0000000-cfffffff : 0000:01:00.0
e0000000-efffffff : PCI MMCONFIG 0 [00-ff]
  e0000000-efffffff : pnp 00:0e
febfe000-febfec00 : pnp 00:09
fec00000-fec00fff : IOAPIC 0
  fec00000-fec00fff : pnp 00:0b
fed00000-fed003ff : HPET 0
fed14000-fed19fff : pnp 00:01
fed1c000-fed1ffff : pnp 00:09
fed20000-fed8ffff : pnp 00:09
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
    fee00000-fee00fff : pnp 00:0b
ff800000-ff8fffff : PCI Bus 0000:01
  ff8c0000-ff8dffff : 0000:01:00.0
  ff8e0000-ff8effff : 0000:01:00.1
  ff8f0000-ff8fffff : 0000:01:00.0
ff900000-ff9fffff : PCI Bus 0000:02
  ff9ffc00-ff9ffcff : 0000:02:02.0
    ff9ffc00-ff9ffcff : 8139too
ffaf8000-ffafbfff : 0000:00:1b.0
  ffaf8000-ffafbfff : ICH HD audio
ffaff000-ffaff3ff : 0000:00:1d.7
  ffaff000-ffaff3ff : ehci_hcd
ffaff400-ffaff7ff : 0000:00:1a.7
  ffaff400-ffaff7ff : ehci_hcd
ffaff800-ffafffff : 0000:00:1f.2
  ffaff800-ffafffff : ahci
ffb00000-ffffffff : reserved
  ffb00000-ffbfffff : pnp 00:09
  fff00000-fffffffe : pnp 00:09
100000000-1ffffffff : System RAM
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Mikael Pettersson wrote:
>
> Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
> /proc/iomem now looks as follows:
>

... as it should.  So far so good, and this is a real problem.

However, there is something that really bothers me: *why does this help
on Mikael's system, which is PAE and therefore has a 64-bit
resource_size_t*?  This whole patch should be a no-op!  There is still
something that doesn't make sense.

The use of "unsigned long" in ram_alignment() will overflow after 2^52
bytes, but again, that's not the issue here, since the highest "start"
value we have is (0x2 << 32).

By process of elimination, the culprit must be round_up(), which reveals
that the macro definition of round_up() has a *very* subtle behavior
with mixed types:

#define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))

ram_alignment() returns unsigned long, which becomes (y).  This means
that the mask word on the right hand of the & gets truncated to 32 bits
*before* the masking happens -- since ((y) - 1) is still unsigned long,
inverting it will not set bits [63..32] to on.

I think this macro is actively dangerous.  Better would be:

({ __typeof__(x) __mask = (y)-1; ((x)+__mask) & ~__mask; })

... which is also multiple-evaluation-free at the cost of using gcc
({...}) constructs.

The deep irony in this is perhaps that, in our particular case,
align_up(x,y)-1 is the same thing as x | (y-1), which would have avoided
the problem...

	-hpa

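The truncation described above can be reproduced in user space. The sketch below is only an illustration, not kernel code: it assumes `uint32_t` as a stand-in for a 32-bit kernel's `unsigned long` and `uint64_t` for the PAE `resource_size_t`, and the helper names `round_up_mixed` and `round_up_wide` are hypothetical. The macro body is the one quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Macro body as quoted in the message above. */
#define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))

/* Assumption: a 32-bit kernel's unsigned long, modelled with uint32_t. */
typedef uint32_t ulong32;

/* Mixed widths: 64-bit x, 32-bit y, as on the PAE system in the thread. */
static uint64_t round_up_mixed(uint64_t start, ulong32 align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    /* ~((align) - 1) is evaluated in 32 bits and only then zero-extended,
     * so bits 63..32 of the mask are 0 and the high half of the sum is lost. */
    return round_up(start, align);
}

/* Same macro with both operands 64 bits wide: the mask keeps its high bits. */
static uint64_t round_up_wide(uint64_t start, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return round_up(start, align);
}
```

With start = 0x200000000 (the byte after the last System RAM range in the /proc/iomem listing) and a 32MB alignment, the mixed-width version yields 0, while the all-64-bit version returns the start unchanged because it is already aligned.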
H. Peter Anvin wrote:
> Mikael Pettersson wrote:
> > Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
> > /proc/iomem now looks as follows:
>
> ... as it should.  So far so good, and this is a real problem.
>
> However, there is something that really bothers me: *why does this help
> on Mikael's system, which is PAE and therefore has a 64-bit
> resource_size_t*?  This whole patch should be a no-op!  There is still
> something that doesn't make sense.
>
> The use of "unsigned long" in ram_alignment() will overflow after 2^52
> bytes, but again, that's not the issue here, since the highest "start"
> value we have is (0x2 << 32).

I assume you meant "2^32" and (0x1 << 32)?

Eike

Rolf Eike Beer wrote:
> H. Peter Anvin wrote:
>> Mikael Pettersson wrote:
>>> Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
>>> /proc/iomem now looks as follows:
>> ... as it should.  So far so good, and this is a real problem.
>>
>> However, there is something that really bothers me: *why does this help
>> on Mikael's system, which is PAE and therefore has a 64-bit
>> resource_size_t*?  This whole patch should be a no-op!  There is still
>> something that doesn't make sense.
>>
>> The use of "unsigned long" in ram_alignment() will overflow after 2^52
>> bytes, but again, that's not the issue here, since the highest "start"
>> value we have is (0x2 << 32).
>
> I assume you meant "2^32" and (0x1 << 32)?
>

No, I meant 2^52 and (0x2 << 32) [== 2^33.]

	-hpa

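For reference, the two figures can be checked with a small calculation. This is a sketch under the assumption that `unsigned long` is 32 bits wide, modelled here with `uint32_t`; `mb_index_32` and `check_figures` are hypothetical helpers that mirror only the `mb = pos >> 20` step of ram_alignment().

```c
#include <assert.h>
#include <stdint.h>

/* Megabyte index as a 32-bit unsigned long would hold it (assumption:
 * the conversion simply keeps the low 32 bits of pos >> 20). */
static uint32_t mb_index_32(uint64_t pos)
{
    return (uint32_t)(pos >> 20);
}

/* Worked check of the numbers discussed above. */
static void check_figures(void)
{
    /* (0x2 << 32) is 2^33; its megabyte index is 2^13 and fits easily. */
    assert(mb_index_32(UINT64_C(2) << 32) == 8192);
    /* Just below 2^52 the index is the largest 32-bit value... */
    assert(mb_index_32((UINT64_C(1) << 52) - 1) == UINT32_MAX);
    /* ...and at 2^52 it wraps to 0, which would look like "first megabyte". */
    assert(mb_index_32(UINT64_C(1) << 52) == 0);
}
```

So the shift by 20 bits is what moves the overflow point from 2^32 to 2^52 bytes, and the highest start on the reported system, 2^33, is far below it.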
H. Peter Anvin wrote:
> Mikael Pettersson wrote:
>> Thanks, 2.6.31-rc1 vanilla (which didn't boot) plus this one does boot.
>> /proc/iomem now looks as follows:
>>
>
> ... as it should.  So far so good, and this is a real problem.
>
> However, there is something that really bothers me: *why does this help
> on Mikael's system, which is PAE and therefore has a 64-bit
> resource_size_t*?  This whole patch should be a no-op!  There is still
> something that doesn't make sense.
>
> The use of "unsigned long" in ram_alignment() will overflow after 2^52
> bytes, but again, that's not the issue here, since the highest "start"
> value we have is (0x2 << 32).
>
> By process of elimination, the culprit must be round_up(), which reveals
> that the macro definition of round_up() has a *very* subtle behavior
> with mixed types:
>
> #define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))
>
> ram_alignment() returns unsigned long, which becomes (y).  This means
> that the mask word on the right hand of the & gets truncated to 32 bits
> *before* the masking happens -- since ((y) - 1) is still unsigned long,
> inverting it will not set bits [63..32] to on.
>
> I think this macro is actively dangerous.  Better would be:
>
> ({ __typeof__(x) __mask = (y)-1; ((x)+__mask) & ~__mask; })
>
> ... which is also multiple-evaluation-free at the cost of using gcc
> ({...}) constructs.
>
> The deep irony in this is perhaps that, in our particular case,
> align_up(x,y)-1 is the same thing as x | (y-1), which would have avoided
> the problem...

agreed, that is why we change round_up to take u64.

wonder if we should kill round_up and use roundup instead.

in include/linux/kernel.h
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

YH

Yinghai Lu wrote:
>
> agreed, that is why we change round_up to take u64.
>

round_up() is a macro, it doesn't "take" anything per se...

> wonder if we should kill round_up and use roundup instead.
>
> in include/linux/kernel.h
> #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

Either that or we should change it to the form I specified...

	-hpa

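To compare the two options, here is a small sketch of the division-based form with the same mixed operand widths. It assumes `uint32_t` stands in for a 32-bit `unsigned long`; the macro body is the roundup() quoted above, placed under the hypothetical name `roundup_div` so it cannot clash with a roundup() that a libc header may define, and `roundup_mixed` is a hypothetical wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* Body of roundup() as quoted above, renamed for this sketch. */
#define roundup_div(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* 64-bit x with a 32-bit y: nothing is inverted, so the narrower y is
 * simply widened by the usual arithmetic conversions and no high bits
 * are lost. */
static uint64_t roundup_mixed(uint64_t start, uint32_t align)
{
    assert(align != 0); /* division form does not need a power of two */
    return roundup_div(start, align);
}
```

The division form is not affected by the mask truncation and also handles alignments that are not powers of two, but it still evaluates `y` more than once, and on a 32-bit build it implies a 64-bit division, which is usually more expensive than a mask.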
On Tue, 30 Jun 2009, H. Peter Anvin wrote:
>
> By process of elimination, the culprit must be round_up(), which reveals
> that the macro definition of round_up() has a *very* subtle behavior
> with mixed types:
>
> #define round_up(x, y) (((x) + (y) - 1) & ~((y) - 1))
>
> ram_alignment() returns unsigned long, which becomes (y).  This means
> that the mask word on the right hand of the & gets truncated to 32 bits
> *before* the masking happens -- since ((y) - 1) is still unsigned long,
> inverting it will not set bits [63..32] to on.

Good catch.

Also, this shows another bug in the #define: it evaluates 'y' twice, which
is a no-no for something that _looks_ like a function.

> I think this macro is actively dangerous.  Better would be:
>
> ({ __typeof__(x) __mask = (y)-1; ((x)+__mask) & ~__mask; })

Yes. Please make it so.

> The deep irony in this is perhaps that, in our particular case,
> align_up(x,y)-1 is the same thing as x | (y-1), which would have avoided
> the problem...

I don't know how deep that irony is, but I do agree that maybe we should
do that simplification too. In addition to fixing round_up() to not bite
future generations in the ass.

		Linus

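Both points, the single evaluation of `y` and the `x | (y-1)` simplification, can be tried out in user space. The following is a sketch only: it relies on the GNU C statement-expression and `__typeof__` extensions, uses `uint32_t` as a stand-in for a 32-bit `unsigned long`, and the names `round_up_typed`, `next_align`, `count_align_evaluations`, `last_byte_round` and `last_byte_or` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The statement-expression form proposed above (GNU C extension). */
#define round_up_typed(x, y) \
    ({ __typeof__(x) __mask = (y) - 1; ((x) + __mask) & ~__mask; })

static int align_calls; /* counts evaluations of the y argument */

static uint32_t next_align(void)
{
    align_calls++;
    return UINT32_C(32) * 1024 * 1024; /* 32MB */
}

/* Returns how many times the macro evaluated its second argument. */
static int count_align_evaluations(void)
{
    align_calls = 0;
    uint64_t end = round_up_typed(UINT64_C(0x200000001), next_align());
    assert(end == UINT64_C(0x202000000)); /* mask now has the type of x */
    return align_calls;
}

/* Last byte of the padded region, computed by rounding up and subtracting 1. */
static uint64_t last_byte_round(uint64_t start, uint32_t align)
{
    return round_up_typed(start, align) - 1;
}

/* The OR form: no inversion, so a narrower align is harmless here. */
static uint64_t last_byte_or(uint64_t start, uint32_t align)
{
    return start | (align - 1);
}
```

For an unaligned start the two helpers agree. For a start that is already aligned they differ: the round-up form gives start - 1, while the OR form gives the end of the next full alignment block, so the simplification would need separate handling of the aligned case.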
Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1367,9 +1367,9 @@ void __init e820_reserve_resources(void)
 }

 /* How much should we pad RAM ending depending on where it is? */
-static unsigned long ram_alignment(resource_size_t pos)
+static u64 ram_alignment(u64 pos)
 {
-	unsigned long mb = pos >> 20;
+	u64 mb = pos >> 20;

 	/* To 64kB in the first megabyte */
 	if (!mb)
@@ -1383,6 +1383,8 @@ static unsigned long ram_alignment(resou
 	return 32*1024*1024;
 }

+#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
+
 void __init e820_reserve_resources_late(void)
 {
 	int i;
@@ -1400,17 +1402,19 @@ void __init e820_reserve_resources_late(
 	 * avoid stolen RAM:
 	 */
 	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *entry = &e820_saved.map[i];
-		resource_size_t start, end;
+		struct e820entry *entry = &e820.map[i];
+		u64 start, end;

 		if (entry->type != E820_RAM)
 			continue;
 		start = entry->addr + entry->size;
-		end = round_up(start, ram_alignment(start));
-		if (start == end)
+		end = round_up(start, ram_alignment(start)) - 1;
+		if (end > MAX_RESOURCE_SIZE)
+			end = MAX_RESOURCE_SIZE;
+		if (start > end)
 			continue;
-		reserve_region_with_split(&iomem_resource, start,
-				  end - 1, "RAM buffer");
+		reserve_region_with_split(&iomem_resource, start, end,
+					  "RAM buffer");
 	}
 }
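The loop body in the last hunk can be exercised outside the kernel. The sketch below is a user-space approximation, not the patch itself: it assumes a 32-bit `resource_size_t` (the non-PAE case, modelled with `uint32_t`), takes the alignment as a parameter instead of calling ram_alignment(), and `clamp_ram_buffer` is a hypothetical helper that returns false where the patch would `continue`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: non-PAE 32-bit build, so resources are 32 bits wide. */
typedef uint32_t resource_size_t;
#define MAX_RESOURCE_SIZE ((resource_size_t)-1)

/* Computes the inclusive end of the "RAM buffer" after start, capped to
 * what resource_size_t can hold. Returns false when nothing is reserved. */
static bool clamp_ram_buffer(uint64_t start, uint64_t align, uint64_t *end_out)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */

    uint64_t mask = align - 1;
    uint64_t end = ((start + mask) & ~mask) - 1; /* all 64-bit, no truncation */

    if (end > MAX_RESOURCE_SIZE)
        end = MAX_RESOURCE_SIZE;
    if (start > end)
        return false; /* already aligned, or start lies above the cap */

    *end_out = end;
    return true;
}
```

With the 32MB alignment from the patch, a start of 0x7ff90000 gives an end of 0x7fffffff, an already aligned start is skipped because end becomes start - 1, and any start above 4GB is skipped because the capped end falls below it.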