Message ID: 20170920201714.19817-10-pasha.tatashin@oracle.com (mailing list archive)
State: New, archived
Hi Pavel,

On Wed, Sep 20, 2017 at 04:17:11PM -0400, Pavel Tatashin wrote:
> During early boot, kasan uses vmemmap_populate() to establish its shadow
> memory. But, that interface is intended for struct pages use.
>
> Because of the current project, vmemmap won't be zeroed during allocation,
> but kasan expects that memory to be zeroed. We are adding a new
> kasan_map_populate() function to resolve this difference.

Thanks for putting this together.

I've given this a spin on arm64, and can confirm that it works.

Given that this involves redundant walking of page tables, I still think
it'd be preferable to have some common *_populate() helper that took a
gfp argument, but I guess it's not the end of the world.

I'll leave it to Will and Catalin to say whether they're happy with the
page table walking and the new p{u,m}d_large() helpers added to arm64.

Thanks,
Mark.

> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> ---
>  arch/arm64/include/asm/pgtable.h |  3 ++
>  include/linux/kasan.h            |  2 ++
>  mm/kasan/kasan_init.c            | 67 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 72 insertions(+)
> [...]
Hi Mark,

I considered using a new *populate() function for shadow without using
vmemmap_populate(), but that makes things unnecessarily complicated,
because vmemmap_populate() has built in:

1. large page support
2. device memory support
3. node locality support
4. several config-based variants on different platforms

All of that would simply have to be duplicated on each platform if we
wanted to support it in kasan. We could limit ourselves to only
supporting base pages by using something like
vmemmap_populate_basepages(), but that is a step backward: kasan
benefits from using large pages now, so why remove that?

So, the solution I provide is walking the page table right after memory
is mapped. Since we are using the actual page table, it is guaranteed
that we are not going to miss any mapped memory, and it is also in
common code, which makes things smaller and nicer.

Thank you,
Pasha

On 10/03/2017 10:48 AM, Mark Rutland wrote:
>
> I've given this a spin on arm64, and can confirm that it works.
>
> Given that this involves redundant walking of page tables, I still think
> it'd be preferable to have some common *_populate() helper that took a
> gfp argument, but I guess it's not the end of the world.
>
> I'll leave it to Will and Catalin to say whether they're happy with the
> page table walking and the new p{u,m}d_large() helpers added to arm64.
>
On Tue, Oct 03, 2017 at 03:48:46PM +0100, Mark Rutland wrote:
> On Wed, Sep 20, 2017 at 04:17:11PM -0400, Pavel Tatashin wrote:
> > During early boot, kasan uses vmemmap_populate() to establish its shadow
> > memory. But, that interface is intended for struct pages use.
> >
> > Because of the current project, vmemmap won't be zeroed during allocation,
> > but kasan expects that memory to be zeroed. We are adding a new
> > kasan_map_populate() function to resolve this difference.
>
> Thanks for putting this together.
>
> I've given this a spin on arm64, and can confirm that it works.
>
> Given that this involves redundant walking of page tables, I still think
> it'd be preferable to have some common *_populate() helper that took a
> gfp argument, but I guess it's not the end of the world.
>
> I'll leave it to Will and Catalin to say whether they're happy with the
> page table walking and the new p{u,m}d_large() helpers added to arm64.

To be honest, it just looks completely backwards to me; we're walking the
page tables we created earlier on so that we can figure out what needs to
be zeroed for KASAN. We already had that information before, hence my
preference to allow propagation of GFP flags to vmemmap_alloc_block when
it's needed. I know that's not popular for some reason, but is walking the
page tables really better?

Will
Hi Will,

I can go back to that approach, if Michal is OK with it. But, that would
mean that I would need to touch every single architecture that
implements vmemmap_populate(), and also pass flags at least through
these functions on every architecture (some have more than one, decided
by configs):

vmemmap_populate()
vmemmap_populate_basepages()
vmemmap_populate_hugepages()
vmemmap_pte_populate()
__vmemmap_alloc_block_buf()
alloc_block_buf()
vmemmap_alloc_block()

IMO, while I understand that it looks strange that we must walk the page
table after creating it, it is a better approach: more enclosed, as it
affects kasan only, and more universal, as it is in common code.

We are also somewhat late in the review process, which means we would
again need to get ACKs from the maintainers of other arches.

Pavel

On Mon, Oct 9, 2017 at 1:13 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Oct 03, 2017 at 03:48:46PM +0100, Mark Rutland wrote:
>> On Wed, Sep 20, 2017 at 04:17:11PM -0400, Pavel Tatashin wrote:
>> > During early boot, kasan uses vmemmap_populate() to establish its shadow
>> > memory. But, that interface is intended for struct pages use.
>> >
>> > Because of the current project, vmemmap won't be zeroed during allocation,
>> > but kasan expects that memory to be zeroed. We are adding a new
>> > kasan_map_populate() function to resolve this difference.
>>
>> Thanks for putting this together.
>>
>> I've given this a spin on arm64, and can confirm that it works.
>>
>> Given that this involves redundant walking of page tables, I still think
>> it'd be preferable to have some common *_populate() helper that took a
>> gfp argument, but I guess it's not the end of the world.
>>
>> I'll leave it to Will and Catalin to say whether they're happy with the
>> page table walking and the new p{u,m}d_large() helpers added to arm64.
>
> To be honest, it just looks completely backwards to me; we're walking the
> page tables we created earlier on so that we can figure out what needs to
> be zeroed for KASAN. We already had that information before, hence my
> preference to allow propagation of GFP flags to vmemmap_alloc_block when
> it's needed. I know that's not popular for some reason, but is walking the
> page tables really better?
>
> Will
On Mon 09-10-17 13:51:47, Pavel Tatashin wrote:
> Hi Will,
>
> I can go back to that approach, if Michal is OK with it. But, that would
> mean that I would need to touch every single architecture that
> implements vmemmap_populate(), and also pass flags at least through
> these functions on every architecture (some have more than one, decided
> by configs):
>
> vmemmap_populate()
> vmemmap_populate_basepages()
> vmemmap_populate_hugepages()
> vmemmap_pte_populate()
> __vmemmap_alloc_block_buf()
> alloc_block_buf()
> vmemmap_alloc_block()
>
> IMO, while I understand that it looks strange that we must walk the page
> table after creating it, it is a better approach: more enclosed, as it
> affects kasan only, and more universal, as it is in common code.

While I understand that the gfp mask approach might look better at first
sight, this is by no means a general purpose API, so I would rather be
pragmatic and have a smaller code footprint than a more general
interface. Kasan is pretty much a special case, and doing a one-time,
two-pass initialization is imho acceptable. If this turns out to be
impractical in future then let's fix it up, but right now I would rather
go the simpler path.
Hi Pavel,

On Mon, Oct 09, 2017 at 01:51:47PM -0400, Pavel Tatashin wrote:
> I can go back to that approach, if Michal is OK with it. But, that would
> mean that I would need to touch every single architecture that
> implements vmemmap_populate(), and also pass flags at least through
> these functions on every architecture (some have more than one, decided
> by configs):
>
> vmemmap_populate()
> vmemmap_populate_basepages()
> vmemmap_populate_hugepages()
> vmemmap_pte_populate()
> __vmemmap_alloc_block_buf()
> alloc_block_buf()
> vmemmap_alloc_block()

As an interim step, why not introduce something like
vmemmap_alloc_block_flags and make the page-table walking opt-out for
architectures that don't want it? Then we can just pass __GFP_ZERO from
our vmemmap_populate where necessary and other architectures can do the
page-table walking dance if they prefer.

> IMO, while I understand that it looks strange that we must walk the page
> table after creating it, it is a better approach: more enclosed, as it
> affects kasan only, and more universal, as it is in common code.

I don't buy the more universal aspect, but I appreciate it's subjective.
Frankly, I'd just sooner not have core code walking early page tables if
it can be avoided, and it doesn't look hard to avoid it in this case.
The fact that you're having to add pmd_large and pud_large, which are
otherwise unused in mm/, is an indication that this isn't quite right imo.

Will
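As an illustration of the kind of helper being suggested here, a minimal sketch might look like the following; the name vmemmap_alloc_block_flags(), its callers, and the early-boot zeroing detail are assumptions made for this sketch, not code from the posted series:

/* mm/sparse-vmemmap.c (sketch only; name and call chain are assumptions) */
void * __meminit vmemmap_alloc_block_flags(unsigned long size, int node,
					   gfp_t gfp_mask)
{
	void *ptr;

	if (slab_is_available()) {
		struct page *page;

		page = alloc_pages_node(node, gfp_mask | __GFP_NOWARN,
					get_order(size));
		return page ? page_address(page) : NULL;
	}

	/*
	 * Early boot: with this series the memblock allocation is no longer
	 * zeroed, so honour __GFP_ZERO explicitly for callers (like kasan)
	 * that ask for it.
	 */
	ptr = __earlyonly_bootmem_alloc(node, size, size,
					__pa(MAX_DMA_ADDRESS));
	if (ptr && (gfp_mask & __GFP_ZERO))
		memset(ptr, 0, size);
	return ptr;
}

/* Existing callers would keep today's behaviour via a thin wrapper. */
void * __meminit vmemmap_alloc_block(unsigned long size, int node)
{
	return vmemmap_alloc_block_flags(size, node, GFP_KERNEL);
}

With such a helper, the arm64 vmemmap_populate() path used for the kasan shadow could request __GFP_ZERO instead of relying on a later page-table walk.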
Hi Will,

In addition to what Michal wrote:

> As an interim step, why not introduce something like
> vmemmap_alloc_block_flags and make the page-table walking opt-out for
> architectures that don't want it? Then we can just pass __GFP_ZERO from
> our vmemmap_populate where necessary and other architectures can do the
> page-table walking dance if they prefer.

I do not see the benefit: implementing this approach means that we
would need to implement two table walks instead of one, one for x86 and
another for ARM, as these two architectures support kasan. Also, this
would become a requirement for any future architecture that wants to
add kasan support.

>> IMO, while I understand that it looks strange that we must walk the page
>> table after creating it, it is a better approach: more enclosed, as it
>> affects kasan only, and more universal, as it is in common code.
>
> I don't buy the more universal aspect, but I appreciate it's subjective.
> Frankly, I'd just sooner not have core code walking early page tables if
> it can be avoided, and it doesn't look hard to avoid it in this case.
> The fact that you're having to add pmd_large and pud_large, which are
> otherwise unused in mm/, is an indication that this isn't quite right imo.

+#define pmd_large(pmd)	pmd_sect(pmd)
+#define pud_large(pud)	pud_sect(pud)

This is just a naming difference: arm64 calls them pmd_sect(), while
common mm code and other arches call them pmd_large()/pud_large(). Even
arm has these defines, in:

arm/include/asm/pgtable-3level.h
arm/include/asm/pgtable-2level.h

Pavel
On Mon, Oct 09, 2017 at 08:14:33PM +0200, Michal Hocko wrote:
> On Mon 09-10-17 13:51:47, Pavel Tatashin wrote:
> > I can go back to that approach, if Michal is OK with it. But, that would
> > mean that I would need to touch every single architecture that
> > implements vmemmap_populate(), and also pass flags at least through
> > these functions on every architecture (some have more than one, decided
> > by configs):
> >
> > vmemmap_populate()
> > vmemmap_populate_basepages()
> > vmemmap_populate_hugepages()
> > vmemmap_pte_populate()
> > __vmemmap_alloc_block_buf()
> > alloc_block_buf()
> > vmemmap_alloc_block()
> >
> > IMO, while I understand that it looks strange that we must walk the page
> > table after creating it, it is a better approach: more enclosed, as it
> > affects kasan only, and more universal, as it is in common code.
>
> While I understand that the gfp mask approach might look better at first
> sight, this is by no means a general purpose API, so I would rather be
> pragmatic and have a smaller code footprint than a more general
> interface. Kasan is pretty much a special case, and doing a one-time,
> two-pass initialization is imho acceptable. If this turns out to be
> impractical in future then let's fix it up, but right now I would rather
> go the simpler path.

I think the simpler path for arm64 is really to say when we want the
memory zeroing, as opposed to exposing pmd_large/pud_large macros. Those
are likely to grow more users too, but are difficult to use correctly, as
we have things like contiguous ptes that map to a granule smaller than a
pmd.

I proposed an alternative solution to Pavel already, but it could be made
less general purpose by marking the function __meminit and only having it
do anything if KASAN is compiled in.

Will
On Mon, Oct 09, 2017 at 02:42:32PM -0400, Pavel Tatashin wrote:
> Hi Will,
>
> In addition to what Michal wrote:
>
> > As an interim step, why not introduce something like
> > vmemmap_alloc_block_flags and make the page-table walking opt-out for
> > architectures that don't want it? Then we can just pass __GFP_ZERO from
> > our vmemmap_populate where necessary and other architectures can do the
> > page-table walking dance if they prefer.
>
> I do not see the benefit: implementing this approach means that we
> would need to implement two table walks instead of one, one for x86 and
> another for ARM, as these two architectures support kasan. Also, this
> would become a requirement for any future architecture that wants to
> add kasan support.

We have two table walks even with your patch series applied afaict: one in
our definition of vmemmap_populate (arch/arm64/mm/mmu.c) and this one
in the core code.

> >> IMO, while I understand that it looks strange that we must walk the page
> >> table after creating it, it is a better approach: more enclosed, as it
> >> affects kasan only, and more universal, as it is in common code.
> >
> > I don't buy the more universal aspect, but I appreciate it's subjective.
> > Frankly, I'd just sooner not have core code walking early page tables if
> > it can be avoided, and it doesn't look hard to avoid it in this case.
> > The fact that you're having to add pmd_large and pud_large, which are
> > otherwise unused in mm/, is an indication that this isn't quite right imo.
>
> +#define pmd_large(pmd)	pmd_sect(pmd)
> +#define pud_large(pud)	pud_sect(pud)
>
> This is just a naming difference: arm64 calls them pmd_sect(), while
> common mm code and other arches call them pmd_large()/pud_large(). Even
> arm has these defines, in:
>
> arm/include/asm/pgtable-3level.h
> arm/include/asm/pgtable-2level.h

My worry is that these are actually highly arch-specific, but will likely
grow more users in mm/ that assume things for all architectures that aren't
necessarily valid.

Will
Hi Will,

> We have two table walks even with your patch series applied afaict: one in
> our definition of vmemmap_populate (arch/arm64/mm/mmu.c) and this one
> in the core code.

I meant to say implementing two new page table walkers, not two walks at
runtime.

> My worry is that these are actually highly arch-specific, but will likely
> grow more users in mm/ that assume things for all architectures that aren't
> necessarily valid.

I see. How about moving the new kasan_map_populate() implementation into
arch-dependent code:

arch/x86/mm/kasan_init_64.c
arch/arm64/mm/kasan_init.c

This way we won't need to add pmd_large()/pud_large() macros for arm64.

Pavel
Hi Pavel,

On Mon, Oct 09, 2017 at 02:59:09PM -0400, Pavel Tatashin wrote:
> > We have two table walks even with your patch series applied afaict: one in
> > our definition of vmemmap_populate (arch/arm64/mm/mmu.c) and this one
> > in the core code.
>
> I meant to say implementing two new page table walkers, not two walks at
> runtime.

Ok, but I'm still missing why you think that is needed. What would be the
second page table walker that needs implementing?

> > My worry is that these are actually highly arch-specific, but will likely
> > grow more users in mm/ that assume things for all architectures that aren't
> > necessarily valid.
>
> I see. How about moving the new kasan_map_populate() implementation into
> arch-dependent code:
>
> arch/x86/mm/kasan_init_64.c
> arch/arm64/mm/kasan_init.c
>
> This way we won't need to add pmd_large()/pud_large() macros for arm64.

I guess we could implement that on arm64 using our current vmemmap_populate
logic and an explicit memset.

Will
> Ok, but I'm still missing why you think that is needed. What would be the
> second page table walker that needs implementing?
>
> I guess we could implement that on arm64 using our current vmemmap_populate
> logic and an explicit memset.

Hi Will,

What do you mean by explicit memset()? We can't simply memset() from
start to end without doing the page table walk, because at the time kasan
is calling vmemmap_populate() we have a tmp_pg_dir instead of
swapper_pg_dir.

We could do the explicit memset() after
cpu_replace_ttbr1(lm_alias(swapper_pg_dir)); but again, this was in one of
my previous implementations, and I was asked to replace that.

Pavel
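For readers without the arm64 code in front of them, a heavily abridged sketch of the ordering Pavel is describing; this is reconstructed from the arm64 kasan_init() of this era and is only an illustration, with declarations and most of the shadow setup omitted:

/* Abridged sketch of arm64 kasan_init() ordering; not complete code. */
void __init kasan_init(void)
{
	u64 kimg_shadow_start, kimg_shadow_end;

	/* (computation of kimg_shadow_start/end omitted) */

	/* Keep the early shadow usable while swapper's shadow is rebuilt. */
	memcpy(tmp_pg_dir, swapper_pg_dir, sizeof(tmp_pg_dir));
	dsb(ishst);
	cpu_replace_ttbr1(lm_alias(tmp_pg_dir));

	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);

	/*
	 * The new shadow is created in swapper_pg_dir while the CPU is
	 * still running on tmp_pg_dir, so a memset() of [start, end)
	 * through the shadow virtual addresses would still hit the early
	 * shadow mappings rather than the pages populated here; zeroing
	 * has to go through the linear map (as kasan_map_populate() does)
	 * or wait until after the switch back below.
	 */
	vmemmap_populate(kimg_shadow_start, kimg_shadow_end,
			 pfn_to_nid(virt_to_pfn(lm_alias(_text))));

	/* (population of the rest of the shadow omitted) */

	cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
}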
>> I guess we could implement that on arm64 using our current vmemmap_populate
>> logic and an explicit memset.

Hi Will,

I will send out a new patch series with x86/arm64 versions of
kasan_map_populate(), so you could take a look if this is something that
is acceptable.

Thank you,
Pavel
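Ahead of that follow-up series, a rough sketch of what an arch-local arm64 kasan_map_populate() could look like; it mirrors the generic version in the patch below but drops the p4d level and uses arm64's existing pmd_sect()/pud_sect() helpers, so no new pmd_large()/pud_large() macros are needed. This is an illustration, not the patch that was eventually posted:

/* arch/arm64/mm/kasan_init.c (illustrative sketch only) */
static int __meminit kasan_map_populate(unsigned long start, unsigned long end,
					int node)
{
	unsigned long addr, pfn, next;
	unsigned long long size;
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;
	int ret;

	ret = vmemmap_populate(start, end, node);
	/*
	 * The range may be only partially populated; zero just the pieces
	 * that actually have entries, at whatever granularity they were
	 * mapped (section or base page).
	 */
	for (addr = start; addr < end; addr = next) {
		pgd = pgd_offset_k(addr);
		if (pgd_none(*pgd)) {
			next = pgd_addr_end(addr, end);
			continue;
		}

		pud = pud_offset(pgd, addr);
		if (pud_none(*pud)) {
			next = pud_addr_end(addr, end);
			continue;
		}
		if (pud_sect(*pud)) {
			/* PUD-sized section mapping */
			next = pud_addr_end(addr, end);
			size = PUD_SIZE;
			pfn = pud_pfn(*pud);
		} else {
			pmd = pmd_offset(pud, addr);
			if (pmd_none(*pmd)) {
				next = pmd_addr_end(addr, end);
				continue;
			}
			if (pmd_sect(*pmd)) {
				/* PMD-sized section mapping */
				next = pmd_addr_end(addr, end);
				size = PMD_SIZE;
				pfn = pmd_pfn(*pmd);
			} else {
				pte = pte_offset_kernel(pmd, addr);
				next = addr + PAGE_SIZE;
				if (pte_none(*pte))
					continue;
				/* base page */
				size = PAGE_SIZE;
				pfn = pte_pfn(*pte);
			}
		}
		/* Zero through the linear map, not the shadow addresses. */
		memset(phys_to_virt(PFN_PHYS(pfn)), 0, size);
	}
	return ret;
}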
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index bc4e92337d16..d89713f04354 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -381,6 +381,9 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 				 PUD_TYPE_TABLE)
 #endif
 
+#define pmd_large(pmd)	pmd_sect(pmd)
+#define pud_large(pud)	pud_sect(pud)
+
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
 	*pmdp = pmd;
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index a5c7046f26b4..7e13df1722c2 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -78,6 +78,8 @@ size_t kasan_metadata_size(struct kmem_cache *cache);
 
 bool kasan_save_enable_multi_shot(void);
 void kasan_restore_multi_shot(bool enabled);
+int __meminit kasan_map_populate(unsigned long start, unsigned long end,
+				 int node);
 
 #else /* CONFIG_KASAN */
 
diff --git a/mm/kasan/kasan_init.c b/mm/kasan/kasan_init.c
index 554e4c0f23a2..57a973f05f63 100644
--- a/mm/kasan/kasan_init.c
+++ b/mm/kasan/kasan_init.c
@@ -197,3 +197,70 @@ void __init kasan_populate_zero_shadow(const void *shadow_start,
 		zero_p4d_populate(pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
+
+/* Creates mappings for kasan during early boot. The mapped memory is zeroed */
+int __meminit kasan_map_populate(unsigned long start, unsigned long end,
+				 int node)
+{
+	unsigned long addr, pfn, next;
+	unsigned long long size;
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	int ret;
+
+	ret = vmemmap_populate(start, end, node);
+	/*
+	 * We might have partially populated memory, so check for no entries,
+	 * and zero only those that actually exist.
+	 */
+	for (addr = start; addr < end; addr = next) {
+		pgd = pgd_offset_k(addr);
+		if (pgd_none(*pgd)) {
+			next = pgd_addr_end(addr, end);
+			continue;
+		}
+
+		p4d = p4d_offset(pgd, addr);
+		if (p4d_none(*p4d)) {
+			next = p4d_addr_end(addr, end);
+			continue;
+		}
+
+		pud = pud_offset(p4d, addr);
+		if (pud_none(*pud)) {
+			next = pud_addr_end(addr, end);
+			continue;
+		}
+		if (pud_large(*pud)) {
+			/* This is PUD size page */
+			next = pud_addr_end(addr, end);
+			size = PUD_SIZE;
+			pfn = pud_pfn(*pud);
+		} else {
+			pmd = pmd_offset(pud, addr);
+			if (pmd_none(*pmd)) {
+				next = pmd_addr_end(addr, end);
+				continue;
+			}
+			if (pmd_large(*pmd)) {
+				/* This is PMD size page */
+				next = pmd_addr_end(addr, end);
+				size = PMD_SIZE;
+				pfn = pmd_pfn(*pmd);
+			} else {
+				pte = pte_offset_kernel(pmd, addr);
+				next = addr + PAGE_SIZE;
+				if (pte_none(*pte))
+					continue;
+				/* This is base size page */
+				size = PAGE_SIZE;
+				pfn = pte_pfn(*pte);
+			}
+		}
+		memset(phys_to_virt(PFN_PHYS(pfn)), 0, size);
+	}
+	return ret;
+}
During early boot, kasan uses vmemmap_populate() to establish its shadow
memory. But, that interface is intended for struct pages use.

Because of the current project, vmemmap won't be zeroed during allocation,
but kasan expects that memory to be zeroed. We are adding a new
kasan_map_populate() function to resolve this difference.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 arch/arm64/include/asm/pgtable.h |  3 ++
 include/linux/kasan.h            |  2 ++
 mm/kasan/kasan_init.c            | 67 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+)
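The companion call-site change is not part of this patch. As a sketch of how the arm64 shadow setup would switch over (the file and the kimg_shadow_* arguments are assumed from the arm64 kasan_init() of this era, not taken from this series):

--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ ... @@ void __init kasan_init(void)
-	vmemmap_populate(kimg_shadow_start, kimg_shadow_end,
-			 pfn_to_nid(virt_to_pfn(lm_alias(_text))));
+	/* illustrative: kasan_map_populate() maps the shadow and zeroes it */
+	kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
+			   pfn_to_nid(virt_to_pfn(lm_alias(_text))));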