Message ID | 20210617194657.0A99CB22@viggo.jf.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | x86/mm: avoid truncating memblocks for SGX memory |
>-----Original Message-----
>From: Dave Hansen <dave.hansen@linux.intel.com>
>Sent: Friday, June 18, 2021 3:47 AM
>To: linux-mm@kvack.org
>Cc: linux-kernel@vger.kernel.org; Dave Hansen
><dave.hansen@linux.intel.com>; Du, Fan <fan.du@intel.com>; Chatre,
>Reinette <reinette.chatre@intel.com>; jarkko@kernel.org; Williams, Dan J
><dan.j.williams@intel.com>; Hansen, Dave <dave.hansen@intel.com>;
>x86@kernel.org; linux-sgx@vger.kernel.org; luto@kernel.org;
>peterz@infradead.org
>Subject: [PATCH] x86/mm: avoid truncating memblocks for SGX memory
>
>
>From: Fan Du <fan.du@intel.com>
>
>tl;dr:
>
>Several SGX users reported seeing the following message on NUMA systems:
>
>  sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback
>to the NUMA node 0.
>
>This turned out to be the 'memblock' code mistakenly throwing away
>SGX memory.
>
>=== Full Changelog ===
>
>The 'max_pfn' variable represents the highest known RAM address. It can
>be used, for instance, to quickly determine for which physical addresses
>there is mem_map[] space allocated. The numa_meminfo code makes an
>effort to throw out ("trim") all memory blocks which are above 'max_pfn'.
>
>SGX memory is not considered RAM (it is marked as "Reserved" in the
>e820) and is not taken into account by max_pfn. Despite this, SGX
>memory areas have NUMA affinity and are enumerated in the ACPI SRAT.
>The existing SGX code uses the numa_meminfo mechanism to look up the
>NUMA affinity for its memory areas.
>
>In cases where SGX memory was above max_pfn (usually just the one EPC
>section in the last highest NUMA node), the numa_memblock is truncated
>at 'max_pfn', which is below the SGX memory. When the SGX code tries to
>look up the affinity of this memory, it fails and produces an error message:
>
>  sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback
>to the NUMA node 0.
>
>and assigns the memory to NUMA node 0.
>
>Instead of silently truncating the memory block at 'max_pfn' and
>dropping the SGX memory, add the truncated portion to
>'numa_reserved_meminfo'. This allows the SGX code to later determine
>the NUMA affinity of its 'Reserved' area.
>
>Without this patch, numa_meminfo looks like this (from 'crash'):
>
>  blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
>        { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
>
>numa_reserved_meminfo is empty.
>
>After the patch, numa_meminfo looks like this:
>
>  blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
>        { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
>
>and numa_reserved_meminfo has an entry for node 1's SGX memory:
>
>  blk = { start = 0x4000000000, end = 0x4080000000, nid = 0x1 }
>
> [ daveh: completely rewrote/reworked changelog ]

Really what's your PROBLEM?!
Neither did I ask you to send my patch, nor do I agree to change it.
Who granted you the right to do this?!
It's disgraceful to do this w/o my notice.

If you have comments, please DO align with the other two maintainers
Jarkko and Dan first, who already reviewed the patch in this format.
https://lkml.org/lkml/2021/6/17/1151

>Signed-off-by: Fan Du <fan.du@intel.com>
>Reported-by: Reinette Chatre <reinette.chatre@intel.com>
>Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
>Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>Reviewed-by: Dave Hansen <dave.hansen@intel.com>
>Fixes: 5d30f92e7631 ("x86/NUMA: Provide a range-to-target_node lookup
>facility")
>Cc: x86@kernel.org
>Cc: linux-sgx@vger.kernel.org
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>---
>
> b/arch/x86/mm/numa.c |    8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
>diff -puN arch/x86/mm/numa.c~sgx-srat arch/x86/mm/numa.c
>--- a/arch/x86/mm/numa.c~sgx-srat	2021-06-17 11:23:05.116159990 -0700
>+++ b/arch/x86/mm/numa.c	2021-06-17 11:55:46.117155100 -0700
>@@ -254,7 +254,13 @@ int __init numa_cleanup_meminfo(struct n
>
> 		/* make sure all non-reserved blocks are inside the limits */
> 		bi->start = max(bi->start, low);
>-		bi->end = min(bi->end, high);
>+
>+		/* preserve info for non-RAM areas above 'max_pfn': */
>+		if (bi->end > high) {
>+			numa_add_memblk_to(bi->nid, high, bi->end,
>+					   &numa_reserved_meminfo);
>+			bi->end = high;
>+		}
>
> 		/* and there's no empty block */
> 		if (bi->start >= bi->end)
>_
On 6/17/21 12:46 PM, Dave Hansen wrote:
> Signed-off-by: Fan Du <fan.du@intel.com>
> Reported-by: Reinette Chatre <reinette.chatre@intel.com>
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> Fixes: 5d30f92e7631 ("x86/NUMA: Provide a range-to-target_node lookup facility")
> Cc: x86@kernel.org
> Cc: linux-sgx@vger.kernel.org
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>

Forgot to add:

Signed-off-by: Dave Hansen <dave.hansen@intel.com>
diff -puN arch/x86/mm/numa.c~sgx-srat arch/x86/mm/numa.c
--- a/arch/x86/mm/numa.c~sgx-srat	2021-06-17 11:23:05.116159990 -0700
+++ b/arch/x86/mm/numa.c	2021-06-17 11:55:46.117155100 -0700
@@ -254,7 +254,13 @@ int __init numa_cleanup_meminfo(struct n
 
 		/* make sure all non-reserved blocks are inside the limits */
 		bi->start = max(bi->start, low);
-		bi->end = min(bi->end, high);
+
+		/* preserve info for non-RAM areas above 'max_pfn': */
+		if (bi->end > high) {
+			numa_add_memblk_to(bi->nid, high, bi->end,
+					   &numa_reserved_meminfo);
+			bi->end = high;
+		}
 
 		/* and there's no empty block */
 		if (bi->start >= bi->end)
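
The effect of the hunk can be modelled in a few lines of userspace C. This is only a sketch, not the kernel code: `struct blk`, `struct meminfo`, `add_blk()`, `trim_to_limit()`, `phys_to_node()` and `build_changelog_example()` are hypothetical stand-ins for the kernel's `numa_memblk`/`numa_meminfo` machinery. It also assumes that node 1's block ran from 0x2080000000 to 0x4080000000 before trimming, which is inferred from the 'crash' output in the changelog rather than stated there. Unlike the patch, the sketch clamps the start of the reserved range so that a block lying entirely above the limit is not widened.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLKS 8

/* Hypothetical stand-ins for the kernel's numa_memblk / numa_meminfo. */
struct blk {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	int nid;
};

struct meminfo {
	int nr;
	struct blk blk[MAX_BLKS];
};

/* Append [start, end) for node 'nid'; returns 0 on success, -1 otherwise. */
static int add_blk(struct meminfo *mi, int nid, uint64_t start, uint64_t end)
{
	if (start >= end || mi->nr >= MAX_BLKS)
		return -1;
	mi->blk[mi->nr++] = (struct blk){ start, end, nid };
	return 0;
}

/* Remove entry 'idx' and keep the remaining entries in order. */
static void remove_blk(struct meminfo *mi, int idx)
{
	mi->nr--;
	memmove(&mi->blk[idx], &mi->blk[idx + 1],
		(size_t)(mi->nr - idx) * sizeof(mi->blk[0]));
}

/*
 * Trim every block to end at 'high'. The part above 'high' is recorded
 * in 'reserved' instead of being dropped, which is the idea of the patch.
 */
static void trim_to_limit(struct meminfo *mi, struct meminfo *reserved,
			  uint64_t high)
{
	for (int i = 0; i < mi->nr; i++) {
		struct blk *bi = &mi->blk[i];

		if (bi->end > high) {
			/* Sketch-only clamp; the patch passes 'high' directly. */
			uint64_t start = bi->start > high ? bi->start : high;

			add_blk(reserved, bi->nid, start, bi->end);
			bi->end = high;
		}

		/* Drop blocks that became empty. */
		if (bi->start >= bi->end)
			remove_blk(mi, i--);
	}
}

/* Return the node owning 'addr' in one table, or -1 if none does. */
static int lookup_nid(const struct meminfo *mi, uint64_t addr)
{
	for (int i = 0; i < mi->nr; i++)
		if (addr >= mi->blk[i].start && addr < mi->blk[i].end)
			return mi->blk[i].nid;
	return -1;
}

/* Look in the RAM table first, then fall back to the reserved table. */
static int phys_to_node(const struct meminfo *mi,
			const struct meminfo *reserved, uint64_t addr)
{
	int nid = lookup_nid(mi, addr);

	return nid >= 0 ? nid : lookup_nid(reserved, addr);
}

/* Rebuild the changelog's layout; the pre-trim node 1 range is an assumption. */
static void build_changelog_example(struct meminfo *mi,
				    struct meminfo *reserved)
{
	const uint64_t high = 0x4000000000ULL;	/* first address above RAM */
	int ret;

	ret = add_blk(mi, 0, 0x0ULL, 0x2080000000ULL);
	assert(ret == 0);
	ret = add_blk(mi, 1, 0x2080000000ULL, 0x4080000000ULL);
	assert(ret == 0);
	(void)ret;

	trim_to_limit(mi, reserved, high);
}
```

Calling `build_changelog_example()` on two zero-initialised tables and then `phys_to_node()` with an address inside the trimmed-off range shows why the fallback lookup matters: the RAM table alone no longer covers that address, while the reserved table still carries its node.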