Message ID | 1352925811-2598-1-git-send-email-jbarnes@virtuousgeek.org (mailing list archive) |
---|---|
State | New, archived |
On Wed, 14 Nov 2012 20:43:31 +0000
Jesse Barnes <jbarnes@virtuousgeek.org> wrote:

> SNB graphics devices have a bug that prevent them from accessing certain
> memory ranges, namely anything below 1M and in the pages listed in the
> table.  So reserve those at boot if set detect a SNB gfx device on the
> CPU to avoid GPU hangs.

What happens if the other addresses map to an external memory object - eg
a PCI device which is a legitimate DMA source for video overlay etc ?

I assume this is just for GPU fetches from main memory ?

Alan

On Wed, 14 Nov 2012 21:19:05 +0000
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> On Wed, 14 Nov 2012 20:43:31 +0000
> Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
>
> > SNB graphics devices have a bug that prevent them from accessing certain
> > memory ranges, namely anything below 1M and in the pages listed in the
> > table.  So reserve those at boot if set detect a SNB gfx device on the
> > CPU to avoid GPU hangs.
>
> What happens if the other addresses map to an external memory object - eg
> a PCI device which is a legitimate DMA source for video overlay etc ?

Other addresses as in the 5 pages high in the address space?  I'm not
sure how to do what I want with memblock, doesn't it just allocate RAM
not I/O space?... /me looks at the memblock API

Or do you mean if we map GTT pages to point at some non-RAM region will
SNB gfx be able to decode them?  If that's the question, then I think
the answer is no, but I don't have enough detail on the hw bug to be
certain.

> I assume this is just for GPU fetches from main memory ?

AIUI, it's an address decoder bug, so it would affect any fetch by the
GPU through its memory interface glue.

On Wed, 14 Nov 2012 20:43:31 +0000
Jesse Barnes <jbarnes@virtuousgeek.org> wrote:

> SNB graphics devices have a bug that prevent them from accessing certain
> memory ranges, namely anything below 1M and in the pages listed in the
> table.  So reserve those at boot if set detect a SNB gfx device on the
> CPU to avoid GPU hangs.
>
> Stephane Marchesin had a similar patch to the page allocator awhile
> back, but rather than reserving pages up front, it leaked them at
> allocation time.

So if people are seeing seemingly random hangs with SNB, please give
this patch a try.  These issues aren't easy to hit, and they're
frustratingly hard to debug, but with some long runtime maybe it'll
give people some confidence.

On Wed, 14 Nov 2012 13:55:34 -0800
Jesse Barnes <jbarnes@virtuousgeek.org> wrote:

> On Wed, 14 Nov 2012 21:19:05 +0000
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
> > On Wed, 14 Nov 2012 20:43:31 +0000
> > Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> >
> > > SNB graphics devices have a bug that prevent them from accessing certain
> > > memory ranges, namely anything below 1M and in the pages listed in the
> > > table.  So reserve those at boot if set detect a SNB gfx device on the
> > > CPU to avoid GPU hangs.
> >
> > What happens if the other addresses map to an external memory object - eg
> > a PCI device which is a legitimate DMA source for video overlay etc ?
>
> Other addresses as in the 5 pages high in the address space?  I'm not
> sure how to do what I want with memblock, doesn't it just allocate RAM
> not I/O space?... /me looks at the memblock API
>
> Or do you mean if we map GTT pages to point at some non-RAM region will
> SNB gfx be able to decode them?  If that's the question, then I think
> the answer is no, but I don't have enough detail on the hw bug to be
> certain.
>
> > I assume this is just for GPU fetches from main memory ?
>
> AIUI, it's an address decoder bug, so it would affect any fetch by the
> GPU through its memory interface glue.

Well the extreme case (and I suspect one we don't care about too much in
reality) is a box with a PCI/E or similar MMIO graphics device which is
taking part in Dave Airlie's wonderous new graphics architecture so being
rendered into or fetched by the Intel GPU and whose PCI/E space is mapped
crossing one of those addresses.

The other case of concern would be if the Intel IOMMU had mappings there
that were then touched in some way by the GPU ?

Alan

On Wed, 14 Nov 2012, Jesse Barnes wrote:

> +		unsigned long bad_ranges[] = {
> +			0x20050000,
> +			0x20110000,
> +			0x20130000,
> +			0x20138000,
> +			0x40004000,

Yikes.  Can this be fixed through a microcode update?  The kernel would
still need the workaround anyway, but at least you would be able to be quite
heavy-handed on it, and users could be warned of a better fix...

On Thu, 15 Nov 2012 10:47:11 -0200
Henrique de Moraes Holschuh <hmh@hmh.eng.br> wrote:

> On Wed, 14 Nov 2012, Jesse Barnes wrote:
> > +		unsigned long bad_ranges[] = {
> > +			0x20050000,
> > +			0x20110000,
> > +			0x20130000,
> > +			0x20138000,
> > +			0x40004000,
>
> Yikes.  Can this be fixed through a microcode update?  The kernel would
> still need the workaround anyway, but at least you would be able to be quite
> heavy-handed on it, and users could be warned of a better fix...

No, this affects the logic on the GPU side, so there's no microcode way
to fix it.

On Thu, 15 Nov 2012 10:47:11 -0200
Henrique de Moraes Holschuh <hmh@hmh.eng.br> wrote:

> On Wed, 14 Nov 2012, Jesse Barnes wrote:
> > +		unsigned long bad_ranges[] = {
> > +			0x20050000,
> > +			0x20110000,
> > +			0x20130000,
> > +			0x20138000,
> > +			0x40004000,
>
> Yikes.  Can this be fixed through a microcode update?  The kernel would
> still need the workaround anyway, but at least you would be able to be quite
> heavy-handed on it, and users could be warned of a better fix...

Note the array should be called "bad pages" since it's just each page
at that offset that we can't use.  (In case you were worried we'd need
to exclude everything between each pair of addresses or something.)

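To make the "bad pages, not bad ranges" point concrete, here is a minimal
userspace sketch, not part of the patch. It assumes 4 KiB pages (the usual
x86 PAGE_SIZE); the names SKETCH_PAGE_SIZE, snb_bad_pages and
snb_addr_in_bad_page are made up for illustration. Only the single page
starting at each table entry is treated as unusable, and addresses between
two entries are left alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, matching the default x86 PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Page-aligned addresses copied from the patch's bad_ranges[] table. */
static const uint64_t snb_bad_pages[] = {
	0x20050000,
	0x20110000,
	0x20130000,
	0x20138000,
	0x40004000,
};

#define SNB_BAD_PAGE_COUNT (sizeof(snb_bad_pages) / sizeof(snb_bad_pages[0]))

/*
 * Hypothetical helper: returns 1 if phys lies inside one of the listed
 * pages, 0 otherwise.  Each entry covers exactly one page.
 */
static int snb_addr_in_bad_page(uint64_t phys)
{
	size_t i;

	for (i = 0; i < SNB_BAD_PAGE_COUNT; i++) {
		if (phys >= snb_bad_pages[i] &&
		    phys < snb_bad_pages[i] + SKETCH_PAGE_SIZE)
			return 1;
	}
	return 0;
}
```

With this helper, 0x20050fff is inside a bad page while 0x20051000 and
0x20100000 (which sits between the first two entries) are not.
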
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 468e98d..bb9fabe 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -614,6 +614,75 @@ static __init void reserve_ibft_region(void)
 
 static unsigned reserve_low = CONFIG_X86_RESERVE_LOW << 10;
 
+static bool __init snb_gfx_workaround_needed(void)
+{
+	int i;
+	u16 vendor, devid;
+	u16 snb_ids[] = {
+		0x0102,
+		0x0112,
+		0x0122,
+		0x0106,
+		0x0116,
+		0x0126,
+		0x010a,
+	};
+
+	/* Assume no if something weird is going on with PCI */
+	if (!early_pci_allowed())
+		return false;
+
+	vendor = read_pci_config_16(0, 2, 0, PCI_VENDOR_ID);
+	if (vendor != 0x8086)
+		return false;
+
+	devid = read_pci_config_16(0, 2, 0, PCI_DEVICE_ID);
+	for (i = 0; i < ARRAY_SIZE(snb_ids); i++)
+		if (devid == snb_ids[i])
+			return true;
+
+	return false;
+}
+
+static void __init trim_snb_ranges(void)
+{
+	/*
+	 * Sandy Bridge graphics has trouble with certain ranges, exclude
+	 * them from allocation.
+	 */
+	if (snb_gfx_workaround_needed()) {
+		unsigned long bad_ranges[] = {
+			0x20050000,
+			0x20110000,
+			0x20130000,
+			0x20138000,
+			0x40004000,
+		};
+		phys_addr_t mem;
+		int i;
+
+		printk(KERN_DEBUG "reserving inaccessible SNB gfx pages\n");
+
+		for (;;) {
+			mem = memblock_find_in_range(0, 1<<20, PAGE_SIZE,
+						     PAGE_SIZE);
+			if (!mem)
+				break;
+			memblock_reserve(mem, PAGE_SIZE);
+			printk(KERN_DEBUG "reserved 0x%08lx\n", mem);
+		}
+
+		for (i = 0; i < ARRAY_SIZE(bad_ranges); i++) {
+			if (!memblock_reserve(bad_ranges[i], PAGE_SIZE))
+				printk(KERN_DEBUG "reserved 0x%08lx\n",
+				       bad_ranges[i]);
+			else
+				printk(KERN_DEBUG "failed to reserve 0x%08lx\n",
+				       bad_ranges[i]);
+		}
+	}
+}
+
 static void __init trim_bios_range(void)
 {
 	/*
@@ -634,6 +703,7 @@ static void __init trim_bios_range(void)
 	 * take them out.
 	 */
 	e820_remove_range(BIOS_BEGIN, BIOS_END - BIOS_BEGIN, E820_RAM, 1);
+
 	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
 }
 
@@ -912,6 +982,8 @@ void __init setup_arch(char **cmdline_p)
 
 	setup_real_mode();
 
+	trim_snb_ranges();
+
 	init_gbpages();
 
 	/* max_pfn_mapped is updated here */

SNB graphics devices have a bug that prevents them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table.  So reserve those at boot if we detect a SNB gfx device on the
CPU to avoid GPU hangs.

Stephane Marchesin had a similar patch to the page allocator a while
back, but rather than reserving pages up front, it leaked them at
allocation time.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 arch/x86/kernel/setup.c |   72 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
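
As a rough gauge of what the workaround costs in memory, the sketch below
(userspace C, not part of the patch) computes an upper bound on how much the
reservation can take: every page below 1M plus one page per table entry. The
helper names and SKETCH_LOW_LIMIT are hypothetical. The real figure will
likely be lower, since parts of the low 1M (the BIOS range, for example) are
not usable RAM to begin with.

```c
#include <stddef.h>
#include <stdint.h>

/* The "anything below 1M" limit from the commit message. */
#define SKETCH_LOW_LIMIT (1u << 20)

/* Hypothetical helper: upper bound on the number of pages reserved. */
static uint64_t snb_reserved_pages_upper_bound(uint64_t page_size,
					       size_t table_entries)
{
	return SKETCH_LOW_LIMIT / page_size + table_entries;
}

/* Hypothetical helper: the same bound expressed in bytes. */
static uint64_t snb_reserved_bytes_upper_bound(uint64_t page_size,
					       size_t table_entries)
{
	return snb_reserved_pages_upper_bound(page_size, table_entries) *
	       page_size;
}
```

Assuming 4 KiB pages and the five table entries, this gives at most 261
pages, or 1069056 bytes, a little over 1 MiB.
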