Message ID | 4DA60C30.4060606@kernel.org (mailing list archive)
---|---
State | New, archived
On Wed, Apr 13, 2011 at 1:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> can you try the following change? it will push gart to 0x80000000
>
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index 86d1ad4..3b6a9d5 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -83,7 +83,7 @@ static u32 __init allocate_aperture(void)
>  	 * so don't use 512M below as gart iommu, leave the space for kernel
>  	 * code for safe
>  	 */
> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);

What are all the magic numbers, and why would 0x80000000 be special?

Why don't we write code that just works?

Or absent a "just works" set of patches, why don't we revert to code
that has years of testing?

This kind of "I broke things, so now I will jiggle things randomly
until they unbreak" is not acceptable.

Either explain why that fixes a real BUG (and why the magic constants
need to be what they are), or just revert the patch that caused the
problem, and go back to the allocation patterns that have years of
experience.

Guys, we've had this discussion before, in PCI allocation. We don't do
this. We tried switching the PCI region allocations to top-down, and
IT WAS A FAILURE. We reverted it to what we had years of testing with.

Don't just make random changes. There really are only two acceptable
models of development: "think and analyze" or "years and years of
testing on thousands of machines". Those two really do work.

	Linus
On 04/13/2011 01:54 PM, Linus Torvalds wrote:
> On Wed, Apr 13, 2011 at 1:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> can you try the following change? it will push gart to 0x80000000
>>
>> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
>> index 86d1ad4..3b6a9d5 100644
>> --- a/arch/x86/kernel/aperture_64.c
>> +++ b/arch/x86/kernel/aperture_64.c
>> @@ -83,7 +83,7 @@ static u32 __init allocate_aperture(void)
>>  	 * so don't use 512M below as gart iommu, leave the space for kernel
>>  	 * code for safe
>>  	 */
>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>
> What are all the magic numbers, and why would 0x80000000 be special?

that is the old value from when the kernel was doing bottom-up bootmem
allocation.

> Why don't we write code that just works?
>
> Or absent a "just works" set of patches, why don't we revert to code
> that has years of testing?
>
> This kind of "I broke things, so now I will jiggle things randomly
> until they unbreak" is not acceptable.
>
> Either explain why that fixes a real BUG (and why the magic constants
> need to be what they are), or just revert the patch that caused the
> problem, and go back to the allocation patterns that have years of
> experience.
>
> Guys, we've had this discussion before, in PCI allocation. We don't do
> this. We tried switching the PCI region allocations to top-down, and
> IT WAS A FAILURE. We reverted it to what we had years of testing with.
>
> Don't just make random changes. There really are only two acceptable
> models of development: "think and analyze" or "years and years of
> testing on thousands of machines". Those two really do work.

We did do the analyzing, and the only difference seems to be:

the good one is using 0x80000000
and the bad one is using 0xa0000000.

We tried to figure out if it needs a low address; it happened to work
before because the kernel was doing bottom-up allocation.

Thanks

Yinghai
On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);

Btw, while looking at this code I wondered why the 512M goal is enforced
by the alignment. Start could be set to 512M instead, and the alignment
could be aper_size, as it should be. Any reason for such a big alignment?

	Joerg

P.S.: The box is still in the office, I will try this debug-patch
tomorrow.
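To make Joerg's question concrete: the four-argument call in the patch
(start, end, size, align) can enforce a 512M floor in two different ways.
A sketch of both, in the style of the surrounding kernel code (illustrative
only, not compile-tested):

    /*
     * Current code: the 512M floor is a side effect of the huge 512M
     * alignment; every candidate address the finder can return is a
     * multiple of 512M, so nothing between 0 and 512M is ever chosen.
     */
    addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);

    /*
     * Joerg's alternative: state the floor directly via the start
     * argument, and align only to aper_size, which is all the GART
     * aperture itself requires.
     */
    addr = memblock_find_in_range(512ULL<<20, 1ULL<<32, aper_size, aper_size);

Under top-down allocation the two behave visibly differently: a 512M
alignment only ever yields multiples of 512M (0x80000000, 0xa0000000, and
so on), while aligning to the common 64M fallback aperture Joerg mentions
below would allow any 64M-aligned address in the range.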
On 04/13/2011 02:50 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>
> Btw, while looking at this code I wondered why the 512M goal is enforced
> by the alignment. Start could be set to 512M instead, and the alignment
> could be aper_size, as it should be. Any reason for such a big alignment?

when using bootmem, we tried to use a big alignment (512M) so we could
avoid taking a RAM range below 512M:

commit 7677b2ef6c0c4fddc84f6473f3863f40eb71821b
Author: Yinghai Lu <yhlu.kernel.send@gmail.com>
Date:   Mon Apr 14 20:40:37 2008 -0700

    x86_64: allocate gart aperture from 512M

    Because we try to reserve dma32 early, we have a chance to get the
    aperture from 64M. With some sequences the aperture gets allocated
    from RAM that could become E820_RESERVED, and then a kexec with a
    big kernel whose uncompressed size is above 64M could have a range
    conflict with the still-in-use gart. So allocate the gart aperture
    from 512M instead.

    Also change the fallback_aper_order to 5, because we don't have a
    chance to get a 2G or 4G aperture. We can change it back to 32M or
    make it equal to size.

> P.S.: The box is still in the office, I will try this debug-patch
> tomorrow.

Alexandre's system works at 0xa4000000 with 2.6.38.2, so it is not a
low-address problem. It could be some other reason, like some other
code needing a lower address.

Thanks

Yinghai
On 04/13/2011 02:50 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>
> Btw, while looking at this code I wondered why the 512M goal is enforced
> by the alignment. Start could be set to 512M instead, and the alignment
> could be aper_size, as it should be. Any reason for such a big alignment?
>
> 	Joerg
>
> P.S.: The box is still in the office, I will try this debug-patch
> tomorrow.

The only reason that I can think of is that the aperture itself can be
huge, and perhaps 512 MiB is the biggest such size known. 512ULL<<21 is
of course a particularly moronic way to write 1 GiB, but it was a debug
patch.

The value 512 MiB apparently comes from
7677b2ef6c0c4fddc84f6473f3863f40eb71821b, which is apparently totally ad
hoc; effectively it tries to prevent a collision with kexec by hardcoding
into the GART assignment rules the kdump allocation as it sat at that
point in time.

Yeah.  Brilliant.

	-hpa
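For reference, the two shift constants hpa is decoding here, shown by a
small standalone C program (illustrative only; the values themselves are
plain arithmetic):

    #include <stdio.h>

    int main(void)
    {
        /* The mainline alignment: 512 * 2^20 = 2^29 = 0x20000000 = 512 MiB */
        printf("512ULL<<20 = %#llx (%llu MiB)\n",
               512ULL << 20, (512ULL << 20) >> 20);

        /* The debug patch's value: 512 * 2^21 = 2^30 = 0x40000000 = 1 GiB */
        printf("512ULL<<21 = %#llx (%llu MiB)\n",
               512ULL << 21, (512ULL << 21) >> 20);

        return 0;
    }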
On 04/13/2011 02:59 PM, Yinghai Lu wrote:
> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>
>> Btw, while looking at this code I wondered why the 512M goal is enforced
>> by the alignment. Start could be set to 512M instead, and the alignment
>> could be aper_size, as it should be. Any reason for such a big alignment?
>
> when using bootmem, we tried to use a big alignment (512M) so we could
> avoid taking a RAM range below 512M.

Yes, his question was why on Earth are you using 0 as start if that is
the purpose.

On top of that, where the hell does the magic 512 MiB come from? It
looks like it is either completely ad hoc, or it has something to do
with where the kexec kernel was allocated once upon a time.

	-hpa
On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>
>> Btw, while looking at this code I wondered why the 512M goal is enforced
>> by the alignment. Start could be set to 512M instead, and the alignment
>> could be aper_size, as it should be. Any reason for such a big alignment?
>>
>> 	Joerg
>>
>> P.S.: The box is still in the office, I will try this debug-patch
>> tomorrow.
>
> The only reason that I can think of is that the aperture itself can be
> huge, and perhaps 512 MiB is the biggest such size known.

Well, that would work just as well by using aper_size as the alignment;
the aperture needs to be aligned on its size anyway. This code only runs
when Linux allocates the aperture itself, and if I am not mistaken it
always uses 64MB when doing this.

	Joerg
On 04/13/2011 03:22 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
>> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>>
>>> Btw, while looking at this code I wondered why the 512M goal is enforced
>>> by the alignment. Start could be set to 512M instead, and the alignment
>>> could be aper_size, as it should be. Any reason for such a big alignment?
>>>
>>> 	Joerg
>>>
>>> P.S.: The box is still in the office, I will try this debug-patch
>>> tomorrow.
>>
>> The only reason that I can think of is that the aperture itself can be
>> huge, and perhaps 512 MiB is the biggest such size known.
>
> Well, that would work just as well by using aper_size as the alignment;
> the aperture needs to be aligned on its size anyway. This code only runs
> when Linux allocates the aperture itself, and if I am not mistaken it
> always uses 64MB when doing this.

Yes, I would agree with that.  The sane thing would be to set the base
to whatever address needs to be guarded against (WHICH SHOULD BE
MOTIVATED), and use aper_size as alignment, *unless* we are only using
the initial portion of a much larger hardware structure that needs
natural alignment (which isn't clear to me; I do know we sometimes use
only a fraction of the GART, but that doesn't mean we need to
naturally-align the entire thing, nor that 512 MiB is sufficient to do
so.)

	-hpa
On Wed, Apr 13, 2011 at 2:23 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> What are all the magic numbers, and why would 0x80000000 be special?
>
> that is the old value from when the kernel was doing bottom-up bootmem
> allocation.

I understand, BUT THAT IS STILL A TOTALLY MAGIC NUMBER!

It makes it come out the same ON THAT ONE MACHINE. So no, it's not
"the old value". It's a random value that gets the old value in one
specific case.

>> Why don't we write code that just works?
>>
>> Or absent a "just works" set of patches, why don't we revert to code
>> that has years of testing?
>>
>> This kind of "I broke things, so now I will jiggle things randomly
>> until they unbreak" is not acceptable.
>>
>> Either explain why that fixes a real BUG (and why the magic constants
>> need to be what they are), or just revert the patch that caused the
>> problem, and go back to the allocation patterns that have years of
>> experience.
>>
>> Guys, we've had this discussion before, in PCI allocation. We don't do
>> this. We tried switching the PCI region allocations to top-down, and
>> IT WAS A FAILURE. We reverted it to what we had years of testing with.
>>
>> Don't just make random changes. There really are only two acceptable
>> models of development: "think and analyze" or "years and years of
>> testing on thousands of machines". Those two really do work.
>
> We did do the analyzing, and the only difference seems to be:

No.

Yinghai, we have had this discussion before, and dammit, you need to
understand the difference between "understanding the problem" and "put
in random values until it works on one machine".

There was absolutely _zero_ analysis done. You do not actually
understand WHY the numbers matter. You just look at two random
numbers, and one works, the other does not. That's not "analyzing".
That's just "random number games".

If you cannot see and understand the difference between an actual
analytical solution where you _understand_ what the code is doing and
why, and "random numbers that happen to work on one machine", I don't
know what to tell you.

> the good one is using 0x80000000
> and the bad one is using 0xa0000000.
>
> We tried to figure out if it needs a low address; it happened to work
> before because the kernel was doing bottom-up allocation.

No.

Let me repeat my point one more time.

You have TWO choices. Not more, not less:

 - choice #1: go back to the old allocation model. It's tested. It
doesn't regress. Admittedly we may not know exactly _why_ it works,
and it might not work on all machines, but it doesn't cause
regressions (ie the machines it doesn't work on it _never_ worked on).

   And this doesn't mean "old value for that _one_ machine". It means
"old value for _every_ machine". So it means we revert the whole
top-down thing entirely. Not just "change one random number so that
the totally different allocation pattern happens to give the same
result on one particular machine".

   Quite frankly, I don't see the point of doing top-to-bottom anyway,
so I think we should do this regardless. Just revert the whole
"allocate from top". It didn't work for PCI, it's not working for this
case either. Stop doing it.

 - Choice #2: understand exactly _what_ goes wrong, and fix it
analytically (ie by _understanding_ the problem, and being able to
solve it exactly, and in a way you can argue about without having to
resort to "magic happens").

Now, the whole analytic approach (aka "computer sciency" approach),
where you can actually think about the problem without having any
pesky "reality" impact the solution, is obviously the one we tend to
prefer. Sadly, it's seldom the one we can use in reality when it comes
to things like resource allocation, since we end up starting off with
often buggy approximations of what the actual hardware is all about
(ie broken firmware tables).

So I'd love to know exactly why one random number works, and why
another one doesn't. But as long as we do _not_ know the "Why" of it,
we will have to revert.

It really is that simple. It's _always_ that simple.

So the numbers shouldn't be "magic", they should have real
explanations. And in the absence of a real explanation, the model that
works is "this is what we've always done". Including, very much, the
whole allocation order. Not just one random number on one random
machine.

	Linus
On 04/13/2011 04:39 PM, Linus Torvalds wrote:
> On Wed, Apr 13, 2011 at 2:23 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>
>>> What are all the magic numbers, and why would 0x80000000 be special?
>>
>> that is the old value from when the kernel was doing bottom-up bootmem
>> allocation.
>
> I understand, BUT THAT IS STILL A TOTALLY MAGIC NUMBER!
>
> It makes it come out the same ON THAT ONE MACHINE. So no, it's not
> "the old value". It's a random value that gets the old value in one
> specific case.

Alexandre's system works on 2.6.38.2, where the kernel allocates from
0xa4000000.

Joerg's system works on 2.6.39-rc3 with the top-down bootmem patch
1a4a678b12c84db9ae5dce424e0e97f0559bb57c reverted, where the kernel
allocates at 0x80000000.

Alexandre's system also works when the alignment is increased to 1G,
which makes the kernel allocate the gart at 0x80000000. Both systems
fail when the kernel allocates from 0xa0000000.

0xa0000000 looks like the same value as the radeon GTT base:

[    4.250159] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
[    4.258830] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
[    4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
[    4.271549] [drm] RAM width 32bits DDR
[    4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
[    4.282066] [TTM] Initializing pool allocator.
[    4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
[    4.293076] [drm] radeon: 320M of VRAM memory ready
[    4.298277] [drm] radeon: 512M of GTT memory ready.
[    4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    4.309854] [drm] Driver supports precise vblank timestamp query.
[    4.315970] [drm] radeon: irq initialized.
[    4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072

Alex said that 0xa0000000 is ok and is from the GPU's own address space:

---
The VRAM and GTT addresses in the dmesg are internal GPU addresses, not
system addresses. The GPU has its own internal address space for
on-chip memory clients (texture samplers, render buffers, display
controllers, etc.). The GPU sets up two apertures in its internal
address space, and on-chip client requests are forwarded to the
appropriate place by the GPU's memory controller. Addresses in the
GPU's VRAM aperture go to local vram on discrete cards, or to the
stolen memory at the top of system memory for IGP cards. Addresses in
the GPU's GTT aperture hit a page table and get forwarded to the
appropriate dma pages.
---

>>> Why don't we write code that just works?
>>>
>>> Or absent a "just works" set of patches, why don't we revert to code
>>> that has years of testing?
>>>
>>> This kind of "I broke things, so now I will jiggle things randomly
>>> until they unbreak" is not acceptable.
>>>
>>> Either explain why that fixes a real BUG (and why the magic constants
>>> need to be what they are), or just revert the patch that caused the
>>> problem, and go back to the allocation patterns that have years of
>>> experience.
>>>
>>> Guys, we've had this discussion before, in PCI allocation. We don't do
>>> this. We tried switching the PCI region allocations to top-down, and
>>> IT WAS A FAILURE. We reverted it to what we had years of testing with.
>>>
>>> Don't just make random changes. There really are only two acceptable
>>> models of development: "think and analyze" or "years and years of
>>> testing on thousands of machines". Those two really do work.
>>
>> We did do the analyzing, and the only difference seems to be:
>
> No.
>
> Yinghai, we have had this discussion before, and dammit, you need to
> understand the difference between "understanding the problem" and "put
> in random values until it works on one machine".
>
> There was absolutely _zero_ analysis done. You do not actually
> understand WHY the numbers matter. You just look at two random
> numbers, and one works, the other does not. That's not "analyzing".
> That's just "random number games".
>
> If you cannot see and understand the difference between an actual
> analytical solution where you _understand_ what the code is doing and
> why, and "random numbers that happen to work on one machine", I don't
> know what to tell you.
>
>> the good one is using 0x80000000
>> and the bad one is using 0xa0000000.
>>
>> We tried to figure out if it needs a low address; it happened to work
>> before because the kernel was doing bottom-up allocation.
>
> No.
>
> Let me repeat my point one more time.
>
> You have TWO choices. Not more, not less:
>
>  - choice #1: go back to the old allocation model. It's tested. It
> doesn't regress. Admittedly we may not know exactly _why_ it works,
> and it might not work on all machines, but it doesn't cause
> regressions (ie the machines it doesn't work on it _never_ worked on).
>
>    And this doesn't mean "old value for that _one_ machine". It means
> "old value for _every_ machine". So it means we revert the whole
> top-down thing entirely. Not just "change one random number so that
> the totally different allocation pattern happens to give the same
> result on one particular machine".
>
>    Quite frankly, I don't see the point of doing top-to-bottom anyway,
> so I think we should do this regardless. Just revert the whole
> "allocate from top". It didn't work for PCI, it's not working for this
> case either. Stop doing it.

We added some code to prevent bootmem from using the low range.

>  - Choice #2: understand exactly _what_ goes wrong, and fix it
> analytically (ie by _understanding_ the problem, and being able to
> solve it exactly, and in a way you can argue about without having to
> resort to "magic happens").
>
> Now, the whole analytic approach (aka "computer sciency" approach),
> where you can actually think about the problem without having any
> pesky "reality" impact the solution, is obviously the one we tend to
> prefer. Sadly, it's seldom the one we can use in reality when it comes
> to things like resource allocation, since we end up starting off with
> often buggy approximations of what the actual hardware is all about
> (ie broken firmware tables).
>
> So I'd love to know exactly why one random number works, and why
> another one doesn't. But as long as we do _not_ know the "Why" of it,
> we will have to revert.
>
> It really is that simple. It's _always_ that simple.
>
> So the numbers shouldn't be "magic", they should have real
> explanations. And in the absence of a real explanation, the model that
> works is "this is what we've always done". Including, very much, the
> whole allocation order. Not just one random number on one random
> machine.

Ok, let's try to figure out why 0xa0000000 cannot be used. If we cannot
figure it out, we can revert 1a4a678b12c84db9ae5dce424e0e97f0559bb57c.

thanks

Yinghai
On 04/13/2011 04:39 PM, Linus Torvalds wrote:
>
>  - Choice #2: understand exactly _what_ goes wrong, and fix it
> analytically (ie by _understanding_ the problem, and being able to
> solve it exactly, and in a way you can argue about without having to
> resort to "magic happens").
>
> Now, the whole analytic approach (aka "computer sciency" approach),
> where you can actually think about the problem without having any
> pesky "reality" impact the solution, is obviously the one we tend to
> prefer. Sadly, it's seldom the one we can use in reality when it comes
> to things like resource allocation, since we end up starting off with
> often buggy approximations of what the actual hardware is all about
> (ie broken firmware tables).
>
> So I'd love to know exactly why one random number works, and why
> another one doesn't. But as long as we do _not_ know the "Why" of it,
> we will have to revert.

Yes.  However, even if we *do* revert (and the time is running short on
not reverting) I would like to understand this particular one, simply
because I think it may very well be a problem that is manifesting
itself in other ways on other systems.

The other thing that this has uncovered is that we already have a bunch
of complete b*llsh*t magic numbers in this path, some of which are
trivially shown to be wrong or at least completely arbitrary, so there
are more issues here :(

	-hpa
On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Yes.  However, even if we *do* revert (and the time is running short on
> not reverting) I would like to understand this particular one, simply
> because I think it may very well be a problem that is manifesting itself
> in other ways on other systems.
>
> The other thing that this has uncovered is that we already have a bunch
> of complete b*llsh*t magic numbers in this
On Wednesday, April 13, 2011, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> Yes.  However, even if we *do* revert (and the time is running short on
>> not reverting) I would like to understand this particular one, simply
>> because I think it may very well be a problem that is manifesting itself
>> in other ways on other systems.

sorry, fingerfart. Anyway, I agree 100%.

we definitely want to also understand the reason for things not
working, even if we do revert..

	Linus

>> of complete b*llsh*t magic numbers in this
>
Hello,

On Wed, Apr 13, 2011 at 07:33:40PM -0700, Linus Torvalds wrote:
> On Wednesday, April 13, 2011, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> Yes.  However, even if we *do* revert (and the time is running short on
>>> not reverting) I would like to understand this particular one, simply
>>> because I think it may very well be a problem that is manifesting itself
>>> in other ways on other systems.
>
> sorry, fingerfart. Anyway, I agree 100%.
>
> we definitely want to also understand the reason for things not
> working, even if we do revert..

There were (and still are) places where memblock callers implemented
ad-hoc top-down allocation by stepping down the start limit until the
allocation succeeds. Several of them have been removed since top-down
became the default behavior, so simply reverting the commit is likely
to cause subtle issues. Maybe the best approach is introducing a
@topdown parameter and using it selectively for pure memory
allocations.

Thanks.
On Wed, 13 Apr 2011 19:33:40 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wednesday, April 13, 2011, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> Yes.  However, even if we *do* revert (and the time is running short on
>>> not reverting) I would like to understand this particular one, simply
>>> because I think it may very well be a problem that is manifesting itself
>>> in other ways on other systems.
>
> sorry, fingerfart. Anyway, I agree 100%.
>
> we definitely want to also understand the reason for things not
> working, even if we do revert..

Definitely, because if it fails when the "magic" involves the GART
base, it starts to sound like something may be hitting the wrong
address space or not flushing properly.
On Wed, Apr 13, 2011 at 03:31:09PM -0700, H. Peter Anvin wrote:
> On 04/13/2011 03:22 PM, Joerg Roedel wrote:
>> On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
>>> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>>>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>>>> -	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>>>> +	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>>>
>>>> Btw, while looking at this code I wondered why the 512M goal is enforced
>>>> by the alignment. Start could be set to 512M instead, and the alignment
>>>> could be aper_size, as it should be. Any reason for such a big alignment?
>>>>
>>>> 	Joerg
>>>>
>>>> P.S.: The box is still in the office, I will try this debug-patch
>>>> tomorrow.
>>>
>>> The only reason that I can think of is that the aperture itself can be
>>> huge, and perhaps 512 MiB is the biggest such size known.
>>
>> Well, that would work just as well by using aper_size as the alignment;
>> the aperture needs to be aligned on its size anyway. This code only runs
>> when Linux allocates the aperture itself, and if I am not mistaken it
>> always uses 64MB when doing this.
>
> Yes, I would agree with that.  The sane thing would be to set the base
> to whatever address needs to be guarded against (WHICH SHOULD BE
> MOTIVATED), and use aper_size as alignment, *unless* we are only using
> the initial portion of a much larger hardware structure that needs
> natural alignment (which isn't clear to me; I do know we sometimes use
> only a fraction of the GART, but that doesn't mean we need to
> naturally-align the entire thing, nor that 512 MiB is sufficient to do
> so.)

What's allocated here is the address space for the aperture. The code
actually allocates the memory, but all it needs is the physical address
range. This range is later programmed into hardware as the GART
aperture (the area the GART remaps).

The Linux code can split the aperture if necessary between DMA-API
usage and AGP usage. In that case both users get half of the aperture
and manage it themselves.

	Joerg
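A rough sketch of the split Joerg describes; the 50/50 division matches his
description, but the type, names, and helper are invented for illustration
and are not the kernel's actual GART code:

    typedef unsigned long long u64;    /* stand-in for the kernel type */

    struct aper_half {
        u64 base;
        u64 size;
    };

    /* Hypothetical illustration: hand each consumer half of the range. */
    static void split_aperture(u64 aper_base, u64 aper_size,
                               struct aper_half *dma, struct aper_half *agp)
    {
        dma->base = aper_base;                 /* low half: DMA-API remapping */
        dma->size = aper_size / 2;
        agp->base = aper_base + aper_size / 2; /* high half: AGP */
        agp->size = aper_size / 2;
    }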
On Thu, Apr 14, 2011 at 01:03:37PM +0900, Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 13, 2011 at 07:33:40PM -0700, Linus Torvalds wrote:
>> On Wednesday, April 13, 2011, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> On Wednesday, April 13, 2011, H. Peter Anvin <hpa@zytor.com> wrote:
>>>>
>>>> Yes.  However, even if we *do* revert (and the time is running short on
>>>> not reverting) I would like to understand this particular one, simply
>>>> because I think it may very well be a problem that is manifesting itself
>>>> in other ways on other systems.
>>
>> sorry, fingerfart. Anyway, I agree 100%.
>>
>> we definitely want to also understand the reason for things not
>> working, even if we do revert..
>
> There were (and still are) places where memblock callers implemented
> ad-hoc top-down allocation by stepping down the start limit until the
> allocation succeeds. Several of them have been removed since top-down
> became the default behavior, so simply reverting the commit is likely
> to cause subtle issues. Maybe the best approach is introducing a
> @topdown parameter and using it selectively for pure memory
> allocations.

Wouldn't it be better to provide a separate memblock allocation
function which operates top-down and use it in the places that need it?
This way it wouldn't break code that relies on bottom-up.

	Joerg
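To make the two proposals concrete, here are hypothetical declarations
only; neither exists in the memblock API in this form, they just sketch
the shapes Tejun and Joerg are suggesting:

    #include <stdbool.h>
    typedef unsigned long long u64;    /* stand-in for the kernel type */

    /* Tejun's idea: an explicit direction parameter on the finder. */
    u64 memblock_find_in_range(u64 start, u64 end, u64 size, u64 align,
                               bool topdown);

    /* Joerg's idea: leave the existing bottom-up call untouched and add
     * a dedicated top-down variant for the callers that want it. */
    u64 memblock_find_in_range_topdown(u64 start, u64 end, u64 size,
                                       u64 align);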
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 86d1ad4..3b6a9d5 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -83,7 +83,7 @@ static u32 __init allocate_aperture(void)
 	 * so don't use 512M below as gart iommu, leave the space for kernel
 	 * code for safe
 	 */
-	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
+	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
 	if (addr == MEMBLOCK_ERROR || addr + aper_size > 0xffffffff) {
 		printk(KERN_ERR
 			"Cannot allocate aperture memory hole (%lx,%uK)\n",