Message ID | 20171110010907.qfkqhrbtdkt5y3hy@smitten
---|---
State | New, archived
On 11/09/2017 05:09 PM, Tycho Andersen wrote:
> which I guess is from the additional flags in grow_dev_page() somewhere down
> the stack. Anyway... it seems this is a kernel allocation that's using
> MIGRATE_MOVABLE, so perhaps we need some more fine tuned heuristic than just
> all MOVABLE allocations are un-mapped via xpfo, and all the others are mapped.
>
> Do you have any ideas?

It still has to do a kmap() or kmap_atomic() to be able to access it. I
thought you hooked into that. Why isn't that path getting hit for these?
On 11/13/2017 02:20 PM, Dave Hansen wrote:
> On 11/09/2017 05:09 PM, Tycho Andersen wrote:
>> which I guess is from the additional flags in grow_dev_page() somewhere down
>> the stack. Anyway... it seems this is a kernel allocation that's using
>> MIGRATE_MOVABLE, so perhaps we need some more fine tuned heuristic than just
>> all MOVABLE allocations are un-mapped via xpfo, and all the others are mapped.
>>
>> Do you have any ideas?
>
> It still has to do a kmap() or kmap_atomic() to be able to access it. I
> thought you hooked into that. Why isn't that path getting hit for these?

Oh, this looks to be accessing data mapped by a buffer_head. It (rudely)
accesses data via:

	void set_bh_page(struct buffer_head *bh,
	...
		bh->b_data = page_address(page) + offset;
Hi Dave,

On Mon, Nov 13, 2017 at 02:46:25PM -0800, Dave Hansen wrote:
> On 11/13/2017 02:20 PM, Dave Hansen wrote:
> > On 11/09/2017 05:09 PM, Tycho Andersen wrote:
> >> which I guess is from the additional flags in grow_dev_page() somewhere down
> >> the stack. Anyway... it seems this is a kernel allocation that's using
> >> MIGRATE_MOVABLE, so perhaps we need some more fine tuned heuristic than just
> >> all MOVABLE allocations are un-mapped via xpfo, and all the others are mapped.
> >>
> >> Do you have any ideas?
> >
> > It still has to do a kmap() or kmap_atomic() to be able to access it. I
> > thought you hooked into that. Why isn't that path getting hit for these?
>
> Oh, this looks to be accessing data mapped by a buffer_head. It
> (rudely) accesses data via:
>
> void set_bh_page(struct buffer_head *bh,
> ...
> bh->b_data = page_address(page) + offset;

Ah, yes. I guess there will be many bugs like this :). Anyway, I'll
try to cook up a patch.

Thanks!

Tycho
On 11/14/2017 04:33 PM, Tycho Andersen wrote:
>> void set_bh_page(struct buffer_head *bh,
>> ...
>> bh->b_data = page_address(page) + offset;
>
> Ah, yes. I guess there will be many bugs like this :). Anyway, I'll
> try to cook up a patch.

It won't catch all the bugs, but it might be handy to have a debugging
mode that records the location of the last user of page_address() and
friends. That way, when we trip over an unmapped page, we have an easier
time finding the offender.
On Tue, Nov 14, 2017 at 04:37:34PM -0800, Dave Hansen wrote:
> On 11/14/2017 04:33 PM, Tycho Andersen wrote:
> >> void set_bh_page(struct buffer_head *bh,
> >> ...
> >> bh->b_data = page_address(page) + offset;
> > Ah, yes. I guess there will be many bugs like this :). Anyway, I'll
> > try to cook up a patch.
>
> It won't catch all the bugs, but it might be handy to have a debugging
> mode that records the location of the last user of page_address() and
> friends. That way, when we trip over an unmapped page, we have an
> easier time finding the offender.

Ok, what I've been doing now is saving the stack frame of the code that
allocated the page, which also seems useful. I'll see about adding a
DEBUG_XPFO config option for the next series with both of these things,
though.

Cheers,

Tycho
On Mon, Nov 13, 2017 at 02:46:25PM -0800, Dave Hansen wrote:
> On 11/13/2017 02:20 PM, Dave Hansen wrote:
> > On 11/09/2017 05:09 PM, Tycho Andersen wrote:
> >> which I guess is from the additional flags in grow_dev_page() somewhere down
> >> the stack. Anyway... it seems this is a kernel allocation that's using
> >> MIGRATE_MOVABLE, so perhaps we need some more fine tuned heuristic than just
> >> all MOVABLE allocations are un-mapped via xpfo, and all the others are mapped.
> >>
> >> Do you have any ideas?
> >
> > It still has to do a kmap() or kmap_atomic() to be able to access it. I
> > thought you hooked into that. Why isn't that path getting hit for these?
>
> Oh, this looks to be accessing data mapped by a buffer_head. It
> (rudely) accesses data via:
>
> void set_bh_page(struct buffer_head *bh,
> ...
> bh->b_data = page_address(page) + offset;

We don't need to kmap in order to access MOVABLE allocations. kmap is
only needed for HIGHMEM allocations. So there's nothing wrong with ext4
or set_bh_page().
On 11/14/2017 07:44 PM, Matthew Wilcox wrote:
> On Mon, Nov 13, 2017 at 02:46:25PM -0800, Dave Hansen wrote:
>> On 11/13/2017 02:20 PM, Dave Hansen wrote:
>>> On 11/09/2017 05:09 PM, Tycho Andersen wrote:
>>>> which I guess is from the additional flags in grow_dev_page() somewhere down
>>>> the stack. Anyway... it seems this is a kernel allocation that's using
>>>> MIGRATE_MOVABLE, so perhaps we need some more fine tuned heuristic than just
>>>> all MOVABLE allocations are un-mapped via xpfo, and all the others are mapped.
>>>>
>>>> Do you have any ideas?
>>>
>>> It still has to do a kmap() or kmap_atomic() to be able to access it. I
>>> thought you hooked into that. Why isn't that path getting hit for these?
>>
>> Oh, this looks to be accessing data mapped by a buffer_head. It
>> (rudely) accesses data via:
>>
>> void set_bh_page(struct buffer_head *bh,
>> ...
>> bh->b_data = page_address(page) + offset;
>
> We don't need to kmap in order to access MOVABLE allocations. kmap is
> only needed for HIGHMEM allocations. So there's nothing wrong with ext4
> or set_bh_page().

Yeah, it's definitely not _buggy_.

Although, I do wonder what we should do about these for XPFO. Should we
just stick a kmap() in there and comment it? What we really need is a
mechanism to say "use this as a kernel page" and "stop using this as a
kernel page". kmap() does that... kinda. It's not a perfect fit, but
it's pretty close.
On Tue, Nov 14, 2017 at 11:00:20PM -0800, Dave Hansen wrote:
> On 11/14/2017 07:44 PM, Matthew Wilcox wrote:
> > We don't need to kmap in order to access MOVABLE allocations. kmap is
> > only needed for HIGHMEM allocations. So there's nothing wrong with ext4
> > or set_bh_page().
>
> Yeah, it's definitely not _buggy_.
>
> Although, I do wonder what we should do about these for XPFO. Should we
> just stick a kmap() in there and comment it? What we really need is a
> mechanism to say "use this as a kernel page" and "stop using this as a
> kernel page". kmap() does that... kinda. It's not a perfect fit, but
> it's pretty close.

It'd be kind of funny if getting XPFO working better means improving
how well Linux runs on 32-bit machines with HIGHMEM. I think there's
always going to be interest in those -- ARM developed 36 bit physmem
before biting the bullet and going to arm64. Maybe OpenRISC will do
that next ;-)
On Wed, Nov 15, 2017 at 06:58:35AM -0800, Matthew Wilcox wrote:
> On Tue, Nov 14, 2017 at 11:00:20PM -0800, Dave Hansen wrote:
> > On 11/14/2017 07:44 PM, Matthew Wilcox wrote:
> > > We don't need to kmap in order to access MOVABLE allocations. kmap is
> > > only needed for HIGHMEM allocations. So there's nothing wrong with ext4
> > > or set_bh_page().
> >
> > Yeah, it's definitely not _buggy_.
> >
> > Although, I do wonder what we should do about these for XPFO. Should we
> > just stick a kmap() in there and comment it? What we really need is a
> > mechanism to say "use this as a kernel page" and "stop using this as a
> > kernel page". kmap() does that... kinda. It's not a perfect fit, but
> > it's pretty close.
>
> It'd be kind of funny if getting XPFO working better means improving
> how well Linux runs on 32-bit machines with HIGHMEM. I think there's
> always going to be interest in those -- ARM developed 36 bit physmem
> before biting the bullet and going to arm64. Maybe OpenRISC will do
> that next ;-)

Oh, sorry, I didn't realize that this wasn't a bug. In any case, this
seems like sort of an uphill battle -- lots of places are going to do
stuff like this since it's legal, and adding code to work around it just
for XPFO seems like a lot of burden on the kernel. (Of course, I'm open
to convincing :)

How common are these MOVABLE allocations that the kernel does? What if
we did some hybrid approach, where we re-map the lists based on
MOVABLE/UNMOVABLE, but then check the actual GFP flags on allocation to
see if they match what we set when populating the free list, and re-map
accordingly if they don't. Or is there some other way?

Cheers,

Tycho
On Wed, Nov 15, 2017 at 08:20:57AM -0800, Tycho Andersen wrote:
> On Wed, Nov 15, 2017 at 06:58:35AM -0800, Matthew Wilcox wrote:
> > On Tue, Nov 14, 2017 at 11:00:20PM -0800, Dave Hansen wrote:
> > > On 11/14/2017 07:44 PM, Matthew Wilcox wrote:
> > > > We don't need to kmap in order to access MOVABLE allocations. kmap is
> > > > only needed for HIGHMEM allocations. So there's nothing wrong with ext4
> > > > or set_bh_page().
> > >
> > > Yeah, it's definitely not _buggy_.
> > >
> > > Although, I do wonder what we should do about these for XPFO. Should we
> > > just stick a kmap() in there and comment it? What we really need is a
> > > mechanism to say "use this as a kernel page" and "stop using this as a
> > > kernel page". kmap() does that... kinda. It's not a perfect fit, but
> > > it's pretty close.
> >
> > It'd be kind of funny if getting XPFO working better means improving
> > how well Linux runs on 32-bit machines with HIGHMEM. I think there's
> > always going to be interest in those -- ARM developed 36 bit physmem
> > before biting the bullet and going to arm64. Maybe OpenRISC will do
> > that next ;-)
>
> Oh, sorry, I didn't realize that this wasn't a bug. In any case, this
> seems like sort of an uphill battle -- lots of places are going to do
> stuff like this since it's legal, adding code to work around it just
> for XPFO seems like a lot of burden on the kernel. (Of course, I'm
> open to convincing :)
>
> How common are these MOVABLE allocations that the kernel does? What if
> we did some hybrid approach, where we re-map the lists based on
> MOVABLE/UNMOVABLE, but then check the actual GFP flags on allocation
> to see if they match what we set when populating the free list, and
> re-map accordingly if they don't.

The assumption is that HIGHMEM allocations aren't mapped (on 32-bit
systems) and so we always use kmap/kmap_atomic to access them.

The emphasis has been on moving the largest consumers of memory over to
HIGHMEM; we were trying to manage 64GB of memory in 1GB of kernel
address space, so the page cache was the first and obvious biggest
consumer to get booted out of the permanent mapping. I know page tables
were also pushed into HIGHMEM. So large chunks of the kernel use kmap()
because they don't know whether they'll be operating on highmem or
lowmem, and it'll do the right thing for either kind of memory.

I didn't think MOVABLE allocations were particularly common. But I
haven't been paying close attention to MM development.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3d9c1b486e1f..47b46ff1148a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2348,6 +2348,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		if (is_migrate_cma(get_pcppage_migratetype(page)))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
 					      -(1 << order));
+		xpfo_pcp_refill(page, migratetype, order);
 	}
 
 	/*
diff --git a/mm/xpfo.c b/mm/xpfo.c
index 080235a2f129..b381d83c6e78 100644
--- a/mm/xpfo.c
+++ b/mm/xpfo.c
@@ -260,3 +265,85 @@ void xpfo_temp_unmap(const void *addr, size_t size, void **mapping,
 		kunmap_atomic(mapping[i]);
 }
 EXPORT_SYMBOL(xpfo_temp_unmap);
+
+void xpfo_pcp_refill(struct page *page, enum migratetype migratetype, int order)
+{
+	int i;
+	bool flush_tlb = false;
+
+	if (!static_branch_unlikely(&xpfo_initialized))
+		return;
+
+	for (i = 0; i < 1 << order; i++) {
+		struct xpfo *xpfo;
+
+		xpfo = lookup_xpfo(page + i);
+		if (!xpfo)
+			continue;
+
+		if (unlikely(!xpfo->initialized)) {
+			spin_lock_init(&xpfo->maplock);
+			atomic_set(&xpfo->mapcount, 0);
+			xpfo->initialized = true;
+		}
+
+		xpfo->trace.max_entries = 20;
+		xpfo->trace.skip = 1;
+		xpfo->trace.entries = xpfo->entries;
+		xpfo->trace.nr_entries = 0;
+		xpfo->trace2.max_entries = 20;
+		xpfo->trace2.skip = 1;
+		xpfo->trace2.entries = xpfo->entries2;
+		xpfo->trace2.nr_entries = 0;
+
+		xpfo->migratetype = migratetype;
+
+		save_stack_trace(&xpfo->trace);
+
+		if (migratetype == MIGRATE_MOVABLE) {
+			/* GFP_HIGHUSER */
+			set_kpte(page_address(page + i), page + i, __pgprot(0));
+			if (!test_and_set_bit(XPFO_PAGE_UNMAPPED, &xpfo->flags))
+				flush_tlb = true;
+			set_bit(XPFO_PAGE_USER, &xpfo->flags);
+		} else {
+			/*
+			 * GFP_KERNEL and everything else; for now we just
+			 * leave it mapped
+			 */
+			set_kpte(page_address(page + i), page + i, PAGE_KERNEL);
+			if (test_and_clear_bit(XPFO_PAGE_UNMAPPED, &xpfo->flags))
+				flush_tlb = true;
+			clear_bit(XPFO_PAGE_USER, &xpfo->flags);
+		}
+	}
+
+	if (flush_tlb)
+		xpfo_flush_kernel_tlb(page, order);
+}
But I'm getting some faults:

[ 1.897311] BUG: unable to handle kernel paging request at ffff880139b75012
[ 1.898244] IP: ext4_fill_super+0x2f3b/0x33c0
[ 1.898827] PGD 1ea6067
[ 1.898828] P4D 1ea6067
[ 1.899170] PUD 1ea9067
[ 1.899508] PMD 119478063
[ 1.899850] PTE 139b75000
[ 1.900211]
[ 1.900760] Oops: 0000 [#1] SMP
[ 1.901160] Modules linked in:
[ 1.901565] CPU: 3 PID: 990 Comm: exe Not tainted 4.13.0+ #85
[ 1.902348] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 1.903420] task: ffff88011ae7cb00 task.stack: ffffc9001a338000
[ 1.904108] RIP: 0010:ext4_fill_super+0x2f3b/0x33c0
[ 1.904649] RSP: 0018:ffffc9001a33bce0 EFLAGS: 00010246
[ 1.905240] RAX: 00000000000000f0 RBX: ffff880139b75000 RCX: ffffffff81c456b8
[ 1.906047] RDX: 0000000000000001 RSI: 0000000000000082 RDI: 0000000000000246
[ 1.906874] RBP: ffffc9001a33bda8 R08: 0000000000000000 R09: 0000000000000183
[ 1.908053] R10: ffff88011a9e0800 R11: ffffffff818493e0 R12: ffff88011a9e0800
[ 1.908920] R13: ffff88011a9e6800 R14: 000000000077fefa R15: 0000000000000000
[ 1.909775] FS: 00007f8169747700(0000) GS:ffff880139d80000(0000) knlGS:0000000000000000
[ 1.910667] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033