Message ID | 20241030134912.515725-2-roypat@amazon.co.uk |
---|---|
State | New |
Series | Direct Map Removal for guest_memfd |
On 30.10.24 14:49, Patrick Roy wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> From: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
> Add an API that will allow updates of the direct/linear map for a set of
> physically contiguous pages.
>
> It will be used in the following patches.
>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

[...]

> #ifdef CONFIG_DEBUG_PAGEALLOC
> void __kernel_map_pages(struct page *page, int numpages, int enable)
> {
> diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
> index e7aec20fb44f1..3030d9245f5ac 100644
> --- a/include/linux/set_memory.h
> +++ b/include/linux/set_memory.h
> @@ -34,6 +34,12 @@ static inline int set_direct_map_default_noflush(struct page *page)
> 	return 0;
> }
>
> +static inline int set_direct_map_valid_noflush(struct page *page,
> +					       unsigned nr, bool valid)

I recall that "unsigned" is frowned upon; "unsigned int".

> +{
> +	return 0;
> +}

Can we add some kernel doc for this?

In particular

(a) What does it mean when we return 0? That it worked? Then, this
    dummy function looks wrong. Or does it return the
    number of processed entries? Then we'd have a possible "int" vs.
    "unsigned int" inconsistency.

(b) What are the semantics when we fail halfway through the operation
    when processing nr > 1? Is it "all or nothing"?
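One way to answer the request for kernel-doc is to put a comment on the generic stub. The sketch below assumes the semantics given in the replies further down the thread: it returns 0 on success, and also returns 0 when the architecture cannot change the direct map; it returns a negative errno on failure. It also uses `unsigned int`, as suggested. The comment wording is hypothetical, not the text that was eventually merged. `struct page` is a placeholder so the snippet compiles outside the kernel.

```c
#include <stdbool.h>

/* Placeholder so the sketch compiles outside the kernel (assumption). */
struct page {
	int unused;
};

/* Hypothetical kernel-doc; the semantics are assumed from the review thread. */
/**
 * set_direct_map_valid_noflush - change direct map validity for a page range
 * @page: first page of a physically contiguous range
 * @nr: number of pages in the range
 * @valid: true to make the direct map entries valid, false to invalidate them
 *
 * Updates the direct (linear) map entries covering @nr pages starting at
 * @page. The TLB is not flushed; the caller must flush the affected kernel
 * address range before relying on the new state.
 *
 * Return: 0 on success, including when the architecture cannot change the
 * direct map; a negative errno on failure.
 */
static inline int set_direct_map_valid_noflush(struct page *page,
					       unsigned int nr, bool valid)
{
	/* !CONFIG_ARCH_HAS_SET_DIRECT_MAP: nothing to change, report success. */
	(void)page;
	(void)nr;
	(void)valid;
	return 0;
}
```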
On 10/31/24 10:57, David Hildenbrand wrote:
> On 30.10.24 14:49, Patrick Roy wrote:
>> [...]
>> +static inline int set_direct_map_valid_noflush(struct page *page,
>> +					       unsigned nr, bool valid)
>
> I recall that "unsigned" is frowned upon; "unsigned int".
>
>> +{
>> +	return 0;
>> +}
>
> Can we add some kernel doc for this?
>
> In particular
>
> (a) What does it mean when we return 0? That it worked? Then, this

Seems so.

>     dummy function looks wrong. Or does it return the

That's the !CONFIG_ARCH_HAS_SET_DIRECT_MAP case, and the other functions
around it do it the same way. Looks like the current callers can only exist
with the CONFIG_ enabled in the first place.

>     number of processed entries? Then we'd have a possible "int" vs.
>     "unsigned int" inconsistency.
>
> (b) What are the semantics when we fail halfway through the operation
>     when processing nr > 1? Is it "all or nothing"?

Looking at the x86 implementation it seems like it can just bail out in the
middle, but then I'm not sure if it can really fail in the middle, hmm...
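Suppose an implementation can indeed stop partway, as suspected above for x86. A caller that needs all-or-nothing behaviour could then restore the whole range on error. The sketch below models the direct map as an array of per-page valid flags. `model_set_valid()`, `invalidate_range_or_restore()` and the failure injection are hypothetical, not kernel APIs, and the model assumes that re-validating a range cannot fail.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_PAGES 16

/* Hypothetical model of the direct map: one valid flag per page. */
static bool model_valid[MODEL_NR_PAGES];

/* Hypothetical failure injection: invalidation fails at this page (-1: never). */
static int model_fail_at = -1;

/*
 * Hypothetical stand-in for an implementation that can bail out in the
 * middle: pages before the failing one have already been updated.
 * Failures are injected only on invalidation; re-validation always succeeds.
 */
static int model_set_valid(unsigned int first, unsigned int nr, bool valid)
{
	if (first > MODEL_NR_PAGES || nr > MODEL_NR_PAGES - first)
		return -EINVAL;

	for (unsigned int i = first; i < first + nr; i++) {
		if (!valid && (int)i == model_fail_at)
			return -ENOMEM;
		model_valid[i] = valid;
	}
	return 0;
}

/*
 * Hypothetical caller: invalidate a range, or leave it fully valid on error.
 * Re-validating pages that were never invalidated is harmless, so the whole
 * range is restored without tracking how far the first call got.
 */
static int invalidate_range_or_restore(unsigned int first, unsigned int nr)
{
	int ret = model_set_valid(first, nr, false);

	if (ret)
		model_set_valid(first, nr, true);
	return ret;
}
```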
On Mon, 2024-11-11 at 12:12 +0000, Vlastimil Babka wrote:
> On 10/31/24 10:57, David Hildenbrand wrote:
>> On 30.10.24 14:49, Patrick Roy wrote:
>>> [...]
>>> +static inline int set_direct_map_valid_noflush(struct page *page,
>>> +					       unsigned nr, bool valid)
>>
>> I recall that "unsigned" is frowned upon; "unsigned int".
>>
>>> +{
>>> +	return 0;
>>> +}
>>
>> Can we add some kernel doc for this?
>>
>> In particular
>>
>> (a) What does it mean when we return 0? That it worked? Then, this
>
> Seems so.
>
>>     dummy function looks wrong. Or does it return the
>
> That's the !CONFIG_ARCH_HAS_SET_DIRECT_MAP case, and the other functions
> around it do it the same way. Looks like the current callers can only exist
> with the CONFIG_ enabled in the first place.

Yeah, it looks a bit weird, but these functions seem to generally return 0
if the operation is not supported. arm64 specifically has

	if (!can_set_direct_map())
		return 0;

inside `set_direct_map_{invalid,default}_noflush`. Documenting this
definitely cannot hurt, I'll keep it on my todo list for the next
iteration :)

>>     number of processed entries? Then we'd have a possible "int" vs.
>>     "unsigned int" inconsistency.
>>
>> (b) What are the semantics when we fail halfway through the operation
>>     when processing nr > 1? Is it "all or nothing"?
>
> Looking at the x86 implementation it seems like it can just bail out in the
> middle, but then I'm not sure if it can really fail in the middle, hmm...

If I understood Mike correctly when talking about this at LPC, then it can
only fail if, while breaking up huge mappings, it fails to allocate the page
tables that hold the lower-granularity mappings (which happens before any
present bits are modified).

Best,
Patrick
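The explanation above implies an ordering that makes a failure harmless: every allocation that can fail happens before any present bit changes. The model below enforces that ordering across the whole range. It first splits every huge mapping the range touches, and only then flips any bits. The block size, the page-table budget and all names are hypothetical. The thread does not settle whether each architecture orders its work this way for ranges that span several huge mappings.

```c
#include <errno.h>
#include <stdbool.h>

#define PAGES_PER_BLOCK 4 /* hypothetical huge mapping size, in pages */
#define NR_BLOCKS 4
#define NR_PAGES (PAGES_PER_BLOCK * NR_BLOCKS)

static bool block_split[NR_BLOCKS];  /* false: block mapped by one huge entry */
static bool page_present[NR_PAGES];  /* present bit of each page's mapping */
static int pt_pages_available;       /* hypothetical page-table allocation budget */

/* Split one huge mapping; allocating the page table is the only failure point. */
static int split_block(unsigned int b)
{
	if (block_split[b])
		return 0;
	if (pt_pages_available == 0)
		return -ENOMEM;
	pt_pages_available--;
	block_split[b] = true;
	return 0;
}

static int model_set_direct_map_valid(unsigned int first, unsigned int nr,
				      bool valid)
{
	unsigned int last;

	if (nr == 0)
		return 0;
	if (first >= NR_PAGES || nr > NR_PAGES - first)
		return -EINVAL;
	last = first + nr - 1;

	/* Phase 1: split every huge mapping the range touches. */
	for (unsigned int b = first / PAGES_PER_BLOCK; b <= last / PAGES_PER_BLOCK; b++) {
		int ret = split_block(b);

		if (ret)
			return ret; /* no present bit has been modified yet */
	}

	/* Phase 2: flip the present bits; nothing here can fail. */
	for (unsigned int i = first; i <= last; i++)
		page_present[i] = valid;
	return 0;
}
```

On failure, blocks that were already split stay split. That changes the page-table granularity but not which pages are present.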
diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
index 917761feeffdd..98088c043606a 100644
--- a/arch/arm64/include/asm/set_memory.h
+++ b/arch/arm64/include/asm/set_memory.h
@@ -13,6 +13,7 @@ int set_memory_valid(unsigned long addr, int numpages, int enable);
 
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 bool kernel_page_present(struct page *page);
 
 #endif /* _ASM_ARM64_SET_MEMORY_H */
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 0e270a1c51e64..01225900293ac 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -192,6 +192,16 @@ int set_direct_map_default_noflush(struct page *page)
 				   PAGE_SIZE, change_page_range, &data);
 }
 
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	unsigned long addr = (unsigned long)page_address(page);
+
+	if (!can_set_direct_map())
+		return 0;
+
+	return set_memory_valid(addr, nr, valid);
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
diff --git a/arch/loongarch/include/asm/set_memory.h b/arch/loongarch/include/asm/set_memory.h
index d70505b6676cb..55dfaefd02c8a 100644
--- a/arch/loongarch/include/asm/set_memory.h
+++ b/arch/loongarch/include/asm/set_memory.h
@@ -17,5 +17,6 @@ int set_memory_rw(unsigned long addr, int numpages);
 bool kernel_page_present(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
 int set_direct_map_invalid_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 
 #endif /* _ASM_LOONGARCH_SET_MEMORY_H */
diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c
index ffd8d76021d47..f14b40c968b48 100644
--- a/arch/loongarch/mm/pageattr.c
+++ b/arch/loongarch/mm/pageattr.c
@@ -216,3 +216,22 @@ int set_direct_map_invalid_noflush(struct page *page)
 
 	return __set_memory(addr, 1, __pgprot(0), __pgprot(_PAGE_PRESENT | _PAGE_VALID));
 }
+
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	unsigned long addr = (unsigned long)page_address(page);
+	pgprot_t set, clear;
+
+	if (addr < vm_map_base)
+		return 0;
+
+	if (valid) {
+		set = PAGE_KERNEL;
+		clear = __pgprot(0);
+	} else {
+		set = __pgprot(0);
+		clear = __pgprot(_PAGE_PRESENT | _PAGE_VALID);
+	}
+
+	return __set_memory(addr, nr, set, clear);
+}
diff --git a/arch/riscv/include/asm/set_memory.h b/arch/riscv/include/asm/set_memory.h
index ab92fc84e1fc9..ea263d3683ef6 100644
--- a/arch/riscv/include/asm/set_memory.h
+++ b/arch/riscv/include/asm/set_memory.h
@@ -42,6 +42,7 @@ static inline int set_kernel_memory(char *startp, char *endp,
 
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 bool kernel_page_present(struct page *page);
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 271d01a5ba4da..d815448758a19 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -386,6 +386,21 @@ int set_direct_map_default_noflush(struct page *page)
 			    PAGE_KERNEL, __pgprot(_PAGE_EXEC));
 }
 
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	pgprot_t set, clear;
+
+	if (valid) {
+		set = PAGE_KERNEL;
+		clear = __pgprot(_PAGE_EXEC);
+	} else {
+		set = __pgprot(0);
+		clear = __pgprot(_PAGE_PRESENT);
+	}
+
+	return __set_memory((unsigned long)page_address(page), nr, set, clear);
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static int debug_pagealloc_set_page(pte_t *pte, unsigned long addr, void *data)
 {
diff --git a/arch/s390/include/asm/set_memory.h b/arch/s390/include/asm/set_memory.h
index 06fbabe2f66c9..240bcfbdcdcec 100644
--- a/arch/s390/include/asm/set_memory.h
+++ b/arch/s390/include/asm/set_memory.h
@@ -62,5 +62,6 @@ __SET_MEMORY_FUNC(set_memory_4k, SET_MEMORY_4K)
 
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 
 #endif
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 5f805ad42d4c3..4c7ee74aa130d 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -406,6 +406,17 @@ int set_direct_map_default_noflush(struct page *page)
 	return __set_memory((unsigned long)page_to_virt(page), 1, SET_MEMORY_DEF);
 }
 
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	unsigned long flags;
+
+	if (valid)
+		flags = SET_MEMORY_DEF;
+	else
+		flags = SET_MEMORY_INV;
+
+	return __set_memory((unsigned long)page_to_virt(page), nr, flags);
+}
 #if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
 
 static void ipte_range(pte_t *pte, unsigned long address, int nr)
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 4b2abce2e3e7d..cc62ef70ccc0a 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -89,6 +89,7 @@ int set_pages_rw(struct page *page, int numpages);
 
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 bool kernel_page_present(struct page *page);
 
 extern int kernel_set_to_readonly;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 44f7b2ea6a073..069e421c22474 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2444,6 +2444,14 @@ int set_direct_map_default_noflush(struct page *page)
 	return __set_pages_p(page, 1);
 }
 
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	if (valid)
+		return __set_pages_p(page, nr);
+
+	return __set_pages_np(page, nr);
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index e7aec20fb44f1..3030d9245f5ac 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -34,6 +34,12 @@ static inline int set_direct_map_default_noflush(struct page *page)
 	return 0;
 }
 
+static inline int set_direct_map_valid_noflush(struct page *page,
+					       unsigned nr, bool valid)
+{
+	return 0;
+}
+
 static inline bool kernel_page_present(struct page *page)
 {
 	return true;
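The new helper does not flush the TLB, so a caller is expected to flush the affected kernel range itself, as existing `_noflush` users such as secretmem do. The sketch below shows that call pattern. It uses userspace stand-ins for `page_address()`, `set_direct_map_valid_noflush()` and `flush_tlb_kernel_range()`. The stubs and the caller `remove_from_direct_map()` are hypothetical, and `PAGE_SIZE` is assumed to be 4 KiB.

```c
#include <stdbool.h>

#define PAGE_SIZE 4096UL /* assumed 4 KiB pages */

/* Userspace stand-ins for kernel types and functions (hypothetical stubs). */
struct page {
	unsigned long vaddr;
};

static unsigned long flushed_start, flushed_end;
static int flush_calls;

static void *page_address(struct page *page)
{
	return (void *)page->vaddr;
}

static int set_direct_map_valid_noflush(struct page *page, unsigned int nr,
					bool valid)
{
	(void)page;
	(void)nr;
	(void)valid;
	return 0; /* pretend the update succeeded */
}

static void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	flushed_start = start;
	flushed_end = end;
	flush_calls++;
}

/*
 * Hypothetical caller: drop nr contiguous pages from the direct map with a
 * single call, then flush the whole range once, because the _noflush helper
 * may leave stale TLB entries behind.
 */
static int remove_from_direct_map(struct page *page, unsigned int nr)
{
	unsigned long addr = (unsigned long)page_address(page);
	int ret = set_direct_map_valid_noflush(page, nr, false);

	if (ret)
		return ret;

	flush_tlb_kernel_range(addr, addr + nr * PAGE_SIZE);
	return 0;
}
```

Batching the range into one call and one flush avoids issuing a separate flush per page.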