Message ID: CAGM2reb2Zk6t=QJtJZPRGwovKKR9bdm+fzgmA_7CDVfDTjSgKA@mail.gmail.com (mailing list archive)
On Sat, Jul 14, 2018 at 6:40 AM Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
>
> I attached a temporary fix, which I could not test, as I was unable to
> reproduce the problem, but it should fix the issue.

Am building and will test. If this fixes it for me, I won't do the revert.

Thanks,

    Linus
On Sat, Jul 14, 2018 at 10:11 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Am building and will test. If this fixes it for me, I won't do the revert.

Looks good so far. It's past the 5-minute mark, at least. I'll leave it
running for a while, but at least preliminarily this looks like it works.

I guess it should be marked for stable, because it appears that this
problem got back-ported to stable (I find that Laura reports it for
4.17.4, but not 4.17.3).

    Linus
On Sat 14-07-18 09:39:29, Pavel Tatashin wrote:
[...]
> From 95259841ef79cc17c734a994affa3714479753e3 Mon Sep 17 00:00:00 2001
> From: Pavel Tatashin <pasha.tatashin@oracle.com>
> Date: Sat, 14 Jul 2018 09:15:07 -0400
> Subject: [PATCH] mm: zero unavailable pages before memmap init
>
> We must zero struct pages for memory that is not backed by physical memory,
> or that the kernel does not have access to.
>
> Recently, there was a change which zeroed all memmap for all holes in e820.
> Unfortunately, it introduced a bug that is discussed here:
>
> https://www.spinics.net/lists/linux-mm/msg156764.html
>
> Linus also saw this bug on his machine, and confirmed that reverting
> commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved")
> fixes the issue.
>
> The problem is that we incorrectly zero some struct pages after they were
> set up.

I am sorry but I simply do not see it. zero_resv_unavail should be
touching only reserved memory ranges, and those are not initialized
anywhere. So who has reused them and put them into normal available
memory to be initialized by free_area_init_node[s]?

The patch itself should be safe, because reserved and available memory
ranges should be disjoint, so the ordering shouldn't matter. The fact
that it matters is the crucial thing to understand and document. So the
change looks good to me, but I do not understand _why_ it makes any
difference. There must be somebody making (memblock) reserved memory
available to the page allocator behind our backs.
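To make the ordering point concrete, here is a minimal userspace model of
the two passes. This is a sketch only: init_pass() and zero_pass() are
hypothetical stand-ins for memmap_init_zone() and zero_resv_unavail(), and
the pfn range 0x9f..0x100 is taken from the printk output that appears
later in this thread.

#include <stdio.h>
#include <string.h>

/* Stand-in for struct page; one word is enough for the model. */
struct page { unsigned long flags; };

/* Fake memmap covering pfns 0x000..0x1ff. */
static struct page memmap[0x200];

/* Models memmap_init_zone(): marks each struct page as initialized. */
static void init_pass(unsigned long start, unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn++)
		memmap[pfn].flags = 1;
}

/* Models zero_resv_unavail(): zeroes struct pages of reserved pfns. */
static void zero_pass(unsigned long start, unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn++)
		memset(&memmap[pfn], 0, sizeof(struct page));
}

int main(void)
{
	/*
	 * If the two ranges were disjoint, the order of the passes would
	 * not matter. Because they overlap, running the zero pass second
	 * destroys the init pass's work.
	 */
	init_pass(0x9f, 0x100);
	zero_pass(0x9f, 0x100);

	printf("pfn 0x9f flags: %lu\n", memmap[0x9f].flags); /* prints 0 */
	return 0;
}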
On 07/16/2018 08:06 AM, Michal Hocko wrote:
> On Sat 14-07-18 09:39:29, Pavel Tatashin wrote:
> [...]
>> From 95259841ef79cc17c734a994affa3714479753e3 Mon Sep 17 00:00:00 2001
>> From: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Date: Sat, 14 Jul 2018 09:15:07 -0400
>> Subject: [PATCH] mm: zero unavailable pages before memmap init
>>
>> We must zero struct pages for memory that is not backed by physical memory,
>> or that the kernel does not have access to.
>>
>> Recently, there was a change which zeroed all memmap for all holes in e820.
>> Unfortunately, it introduced a bug that is discussed here:
>>
>> https://www.spinics.net/lists/linux-mm/msg156764.html
>>
>> Linus also saw this bug on his machine, and confirmed that reverting
>> commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved")
>> fixes the issue.
>>
>> The problem is that we incorrectly zero some struct pages after they were
>> set up.
>
> I am sorry but I simply do not see it. zero_resv_unavail should be
> touching only reserved memory ranges, and those are not initialized
> anywhere. So who has reused them and put them into normal available
> memory to be initialized by free_area_init_node[s]?
>
> The patch itself should be safe, because reserved and available memory
> ranges should be disjoint, so the ordering shouldn't matter. The fact
> that it matters is the crucial thing to understand and document. So the
> change looks good to me, but I do not understand _why_ it makes any
> difference. There must be somebody making (memblock) reserved memory
> available to the page allocator behind our backs.

That's exactly right, and I am also not sure why this is happening;
there must be some overlap happening that just should not. I will
study it later. Now, I need to figure out what is happening with the
x86-32 failure that is caused by my fix.

Pavel
On Mon 16-07-18 08:09:19, Pavel Tatashin wrote:
> On 07/16/2018 08:06 AM, Michal Hocko wrote:
> > On Sat 14-07-18 09:39:29, Pavel Tatashin wrote:
> > [...]
> >> From 95259841ef79cc17c734a994affa3714479753e3 Mon Sep 17 00:00:00 2001
> >> From: Pavel Tatashin <pasha.tatashin@oracle.com>
> >> Date: Sat, 14 Jul 2018 09:15:07 -0400
> >> Subject: [PATCH] mm: zero unavailable pages before memmap init
> >>
> >> We must zero struct pages for memory that is not backed by physical memory,
> >> or that the kernel does not have access to.
> >>
> >> Recently, there was a change which zeroed all memmap for all holes in e820.
> >> Unfortunately, it introduced a bug that is discussed here:
> >>
> >> https://www.spinics.net/lists/linux-mm/msg156764.html
> >>
> >> Linus also saw this bug on his machine, and confirmed that reverting
> >> commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved")
> >> fixes the issue.
> >>
> >> The problem is that we incorrectly zero some struct pages after they were
> >> set up.
> >
> > I am sorry but I simply do not see it. zero_resv_unavail should be
> > touching only reserved memory ranges, and those are not initialized
> > anywhere. So who has reused them and put them into normal available
> > memory to be initialized by free_area_init_node[s]?
> >
> > The patch itself should be safe, because reserved and available memory
> > ranges should be disjoint, so the ordering shouldn't matter. The fact
> > that it matters is the crucial thing to understand and document. So the
> > change looks good to me, but I do not understand _why_ it makes any
> > difference. There must be somebody making (memblock) reserved memory
> > available to the page allocator behind our backs.
>
> That's exactly right, and I am also not sure why this is happening;
> there must be some overlap happening that just should not. I will
> study it later.

Maybe a stupid question, but I do not see it from the code (this init
code is just too complex to keep cached in my head, so I always have to
study the code again and again, sigh). So what exactly prevents
memmap_init_zone from stumbling over reserved regions? We do play some
ugly games to find the first !reserved pfn in the node, but I do not
really see anything in the init path that properly skips over reserved
holes inside the node.
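For reference, the per-pfn loop being discussed has roughly this shape. It
is a simplified sketch of the 4.18-era memmap_init_zone(), with deferred
init, mirrored-memory handling, and the hotplug context checks elided; it
is not the verbatim kernel code.

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		/*
		 * The only per-pfn filters are the early_pfn_* checks;
		 * nothing here consults memblock.reserved, so reserved
		 * holes whose memmap exists are initialized like any
		 * other pfn.
		 */
		if (!early_pfn_valid(pfn))
			continue;
		if (!early_pfn_in_nid(pfn, nid))
			continue;
		__init_single_page(pfn_to_page(pfn), pfn, zone, nid);
	}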
> Maybe a stupid question, but I do not see it from the code (this init
> code is just too complex to keep cached in my head, so I always have to
> study the code again and again, sigh). So what exactly prevents
> memmap_init_zone from stumbling over reserved regions? We do play some
> ugly games to find the first !reserved pfn in the node, but I do not
> really see anything in the init path that properly skips over reserved
> holes inside the node.

Hi Michal,

This is not a stupid question. I figured out how this whole thing
became broken: the revert of "mm: page_alloc: skip over regions of
invalid pfns where possible" caused it.

Before that commit was reverted, memmap_init_zone() would use
memblock.memory to check that only pages that have physical backing
are initialized. But now that it has been reverted, the
zero_resv_unavail() scheme has become totally broken.

The concept is quite simple: zero all the allocated memmap memory that
has not been initialized by memmap_init_zone(). So, I think I will
modify memmap_init_zone() to zero the skipped pfns that have memmap
backing. But that requires more thinking.

Thank you,
Pavel
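For context, the since-reverted "mm: page_alloc: skip over regions of
invalid pfns where possible" added a skip of roughly this shape to the
memmap_init_zone() loop. This is a sketch of that hunk from memory, so
treat the details as approximate; memblock_next_valid_pfn() is the real
helper it introduced.

		if (!early_pfn_valid(pfn)) {
			/*
			 * Skip to the pfn preceding the next valid one (or
			 * end_pfn), so that we hit a valid pfn (or end_pfn)
			 * on the next loop iteration. memblock_next_valid_pfn()
			 * walks memblock.memory, which is how physical
			 * backing got checked before the revert.
			 */
			pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
			continue;
		}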
On Mon, Jul 16, 2018 at 02:29:18PM +0200, Michal Hocko wrote:
> On Mon 16-07-18 08:09:19, Pavel Tatashin wrote:
> > On 07/16/2018 08:06 AM, Michal Hocko wrote:
> > > On Sat 14-07-18 09:39:29, Pavel Tatashin wrote:
> > > [...]
> > >> From 95259841ef79cc17c734a994affa3714479753e3 Mon Sep 17 00:00:00 2001
> > >> From: Pavel Tatashin <pasha.tatashin@oracle.com>
> > >> Date: Sat, 14 Jul 2018 09:15:07 -0400
> > >> Subject: [PATCH] mm: zero unavailable pages before memmap init
> > >>
> > >> We must zero struct pages for memory that is not backed by physical memory,
> > >> or that the kernel does not have access to.
> > >>
> > >> Recently, there was a change which zeroed all memmap for all holes in e820.
> > >> Unfortunately, it introduced a bug that is discussed here:
> > >>
> > >> https://www.spinics.net/lists/linux-mm/msg156764.html
> > >>
> > >> Linus also saw this bug on his machine, and confirmed that reverting
> > >> commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved")
> > >> fixes the issue.
> > >>
> > >> The problem is that we incorrectly zero some struct pages after they were
> > >> set up.
> > >
> > > I am sorry but I simply do not see it. zero_resv_unavail should be
> > > touching only reserved memory ranges, and those are not initialized
> > > anywhere. So who has reused them and put them into normal available
> > > memory to be initialized by free_area_init_node[s]?
> > >
> > > The patch itself should be safe, because reserved and available memory
> > > ranges should be disjoint, so the ordering shouldn't matter. The fact
> > > that it matters is the crucial thing to understand and document. So the
> > > change looks good to me, but I do not understand _why_ it makes any
> > > difference. There must be somebody making (memblock) reserved memory
> > > available to the page allocator behind our backs.
> >
> > That's exactly right, and I am also not sure why this is happening;
> > there must be some overlap happening that just should not. I will
> > study it later.
>
> Maybe a stupid question, but I do not see it from the code (this init
> code is just too complex to keep cached in my head, so I always have to
> study the code again and again, sigh). So what exactly prevents
> memmap_init_zone from stumbling over reserved regions? We do play some
> ugly games to find the first !reserved pfn in the node, but I do not
> really see anything in the init path that properly skips over reserved
> holes inside the node.

I think we are not really skipping reserved regions in
memmap_init_zone(). memmap_init_zone() just gets called with a size of
zone_end_pfn - zone_start_pfn, and I do not see that we are checking
whether those pfns fall within reserved regions.

To get better insight, I just put in a couple of printk's:

kernel: zero_resv_unavail: start-end: 0x9f000-0x100000
kernel: zero_resv_unavail: pfn: 0x9f
kernel: zero_resv_unavail: pfn: 0xa0
kernel: zero_resv_unavail: pfn: 0xa1
kernel: zero_resv_unavail: pfn: 0xa2
kernel: zero_resv_unavail: pfn: 0xa3
kernel: zero_resv_unavail: pfn: 0xa4
kernel: zero_resv_unavail: pfn: 0xa5
kernel: zero_resv_unavail: pfn: 0xa6
kernel: zero_resv_unavail: pfn: 0xa7
kernel: zero_resv_unavail: pfn: 0xa8
kernel: zero_resv_unavail: pfn: 0xa9
...

kernel: memmap_init_zone: pfn: 9f
kernel: memmap_init_zone: pfn: a0
kernel: memmap_init_zone: pfn: a1
kernel: memmap_init_zone: pfn: a2
kernel: memmap_init_zone: pfn: a3
kernel: memmap_init_zone: pfn: a4
kernel: memmap_init_zone: pfn: a5
kernel: memmap_init_zone: pfn: a6
kernel: memmap_init_zone: pfn: a7
kernel: memmap_init_zone: pfn: a8
kernel: memmap_init_zone: pfn: a9
kernel: memmap_init_zone: pfn: aa
kernel: memmap_init_zone: pfn: ab
kernel: memmap_init_zone: pfn: ac
kernel: memmap_init_zone: pfn: ad
kernel: memmap_init_zone: pfn: ae
kernel: memmap_init_zone: pfn: af
kernel: memmap_init_zone: pfn: b0
kernel: memmap_init_zone: pfn: b1
kernel: memmap_init_zone: pfn: b2

The printk from memmap_init_zone() has already passed the early_pfn_*
checks. So, reverting Pavel's fix would bring the old ordering back, and
we would again end up zeroing pages that were already set up in
memmap_init_zone() (as we did before).
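The exact debug patch was not posted; the instrumentation was presumably
along these lines (a hypothetical reconstruction matching the log format
above, using the kernel's standard pr_info() helper):

	/* In zero_resv_unavail(): once per reserved range, once per pfn. */
	pr_info("zero_resv_unavail: start-end: %#llx-%#llx\n",
		(unsigned long long)start, (unsigned long long)end);
	pr_info("zero_resv_unavail: pfn: %#lx\n", pfn);

	/* In memmap_init_zone(), after the early_pfn_* checks pass: */
	pr_info("memmap_init_zone: pfn: %lx\n", pfn);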
On Mon 16-07-18 09:26:41, Pavel Tatashin wrote:
> > Maybe a stupid question, but I do not see it from the code (this init
> > code is just too complex to keep cached in my head, so I always have to
> > study the code again and again, sigh). So what exactly prevents
> > memmap_init_zone from stumbling over reserved regions? We do play some
> > ugly games to find the first !reserved pfn in the node, but I do not
> > really see anything in the init path that properly skips over reserved
> > holes inside the node.
>
> Hi Michal,
>
> This is not a stupid question. I figured out how this whole thing
> became broken: the revert of "mm: page_alloc: skip over regions of
> invalid pfns where possible" caused it.
>
> Before that commit was reverted, memmap_init_zone() would use
> memblock.memory to check that only pages that have physical backing
> are initialized. But now that it has been reverted, the
> zero_resv_unavail() scheme has become totally broken.
>
> The concept is quite simple: zero all the allocated memmap memory that
> has not been initialized by memmap_init_zone(). So, I think I will
> modify memmap_init_zone() to zero the skipped pfns that have memmap
> backing. But that requires more thinking.

I would just go with iterating over valid (unreserved) memory ranges
in memmap_init_zone.
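A rough sketch of what that suggestion could look like, as an illustration
only rather than a posted patch. It assumes memmap_init_zone() has its
zone_start_pfn/zone_end_pfn bounds and nid available, and it uses the
existing for_each_mem_pfn_range() memblock iterator over registered
memory ranges.

	unsigned long range_start_pfn, range_end_pfn, pfn;
	int i, range_nid;

	/*
	 * Walk only the memblock.memory ranges that intersect the zone,
	 * instead of every pfn in [zone_start_pfn, zone_end_pfn): holes
	 * are then never visited by the init loop, and zero_resv_unavail()
	 * remains the only writer of their struct pages.
	 */
	for_each_mem_pfn_range(i, nid, &range_start_pfn, &range_end_pfn,
			       &range_nid) {
		range_start_pfn = max(range_start_pfn, zone_start_pfn);
		range_end_pfn = min(range_end_pfn, zone_end_pfn);

		for (pfn = range_start_pfn; pfn < range_end_pfn; pfn++)
			__init_single_page(pfn_to_page(pfn), pfn, zone, nid);
	}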
From 95259841ef79cc17c734a994affa3714479753e3 Mon Sep 17 00:00:00 2001
From: Pavel Tatashin <pasha.tatashin@oracle.com>
Date: Sat, 14 Jul 2018 09:15:07 -0400
Subject: [PATCH] mm: zero unavailable pages before memmap init

We must zero struct pages for memory that is not backed by physical memory,
or that the kernel does not have access to.

Recently, there was a change which zeroed all memmap for all holes in e820.
Unfortunately, it introduced a bug that is discussed here:

https://www.spinics.net/lists/linux-mm/msg156764.html

Linus also saw this bug on his machine, and confirmed that reverting
commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved")
fixes the issue.

The problem is that we incorrectly zero some struct pages after they were
set up.

The fix is to zero unavailable struct pages prior to initializing struct
pages. A more detailed fix should come later, one that would avoid the
double-zeroing cases: one in __init_single_page(), the other one in
zero_resv_unavail().

Fixes: 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5d800d61ddb7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6847,6 +6847,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();
+	zero_resv_unavail();
 	for_each_online_node(nid) {
 		pg_data_t *pgdat = NODE_DATA(nid);
 		free_area_init_node(nid, NULL,
@@ -6857,7 +6858,6 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 		node_set_state(nid, N_MEMORY);
 		check_for_memory(pgdat, nid);
 	}
-	zero_resv_unavail();
 }
 
 static int __init cmdline_parse_core(char *p, unsigned long *core,
@@ -7033,9 +7033,9 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
 
 void __init free_area_init(unsigned long *zones_size)
 {
+	zero_resv_unavail();
 	free_area_init_node(0, zones_size,
 			__pa(PAGE_OFFSET) >> PAGE_SHIFT, NULL);
-	zero_resv_unavail();
 }
 
 static int page_alloc_cpu_dead(unsigned int cpu)
-- 
2.18.0