[RFC,v3,1/6] arch: introduce set_direct_map_valid_noflush()

Message ID 20241030134912.515725-2-roypat@amazon.co.uk (mailing list archive)
State New
Series Direct Map Removal for guest_memfd

Commit Message

Patrick Roy Oct. 30, 2024, 1:49 p.m. UTC
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Add an API that will allow updates of the direct/linear map for a set of
physically contiguous pages.

It will be used in the following patches.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 arch/arm64/include/asm/set_memory.h     |  1 +
 arch/arm64/mm/pageattr.c                | 10 ++++++++++
 arch/loongarch/include/asm/set_memory.h |  1 +
 arch/loongarch/mm/pageattr.c            | 21 +++++++++++++++++++++
 arch/riscv/include/asm/set_memory.h     |  1 +
 arch/riscv/mm/pageattr.c                | 15 +++++++++++++++
 arch/s390/include/asm/set_memory.h      |  1 +
 arch/s390/mm/pageattr.c                 | 11 +++++++++++
 arch/x86/include/asm/set_memory.h       |  1 +
 arch/x86/mm/pat/set_memory.c            |  8 ++++++++
 include/linux/set_memory.h              |  6 ++++++
 11 files changed, 76 insertions(+)


base-commit: 5cb1659f412041e4780f2e8ee49b2e03728a2ba6

Comments

David Hildenbrand Oct. 31, 2024, 9:57 a.m. UTC | #1
On 30.10.24 14:49, Patrick Roy wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> From: Mike Rapoport (Microsoft) <rppt@kernel.org>
> 
> Add an API that will allow updates of the direct/linear map for a set of
> physically contiguous pages.
> 
> It will be used in the following patches.
> 
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>


[...]

>   #ifdef CONFIG_DEBUG_PAGEALLOC
>   void __kernel_map_pages(struct page *page, int numpages, int enable)
>   {
> diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
> index e7aec20fb44f1..3030d9245f5ac 100644
> --- a/include/linux/set_memory.h
> +++ b/include/linux/set_memory.h
> @@ -34,6 +34,12 @@ static inline int set_direct_map_default_noflush(struct page *page)
>   	return 0;
>   }
>   
> +static inline int set_direct_map_valid_noflush(struct page *page,
> +					       unsigned nr, bool valid)

I recall that "unsigned" is frowned upon; "unsigned int".

> +{
> +	return 0;
> +}

Can we add some kernel doc for this?

In particular

(a) What does it mean when we return 0? That it worked? Then, this
     dummy function looks wrong. Or does it return the
     number of processed entries? Then we'd have a possible "int" vs.
     "unsigned int" inconsistency.

(b) What are the semantics when we fail halfway through the operation
     when processing nr > 1? Is it "all or nothing"?

Vlastimil Babka Nov. 11, 2024, 12:12 p.m. UTC | #2
On 10/31/24 10:57, David Hildenbrand wrote:
> On 30.10.24 14:49, Patrick Roy wrote:
>> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>> 
>> From: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> 
>> Add an API that will allow updates of the direct/linear map for a set of
>> physically contiguous pages.
>> 
>> It will be used in the following patches.
>> 
>> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> 
> 
> [...]
> 
>>   #ifdef CONFIG_DEBUG_PAGEALLOC
>>   void __kernel_map_pages(struct page *page, int numpages, int enable)
>>   {
>> diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
>> index e7aec20fb44f1..3030d9245f5ac 100644
>> --- a/include/linux/set_memory.h
>> +++ b/include/linux/set_memory.h
>> @@ -34,6 +34,12 @@ static inline int set_direct_map_default_noflush(struct page *page)
>>   	return 0;
>>   }
>>   
>> +static inline int set_direct_map_valid_noflush(struct page *page,
>> +					       unsigned nr, bool valid)
> 
> I recall that "unsigned" is frowned upon; "unsigned int".
> 
>> +{
>> +	return 0;
>> +}
> 
> Can we add some kernel doc for this?
> 
> In particular
> 
> (a) What does it mean when we return 0? That it worked? Then, this

Seems so.

>      dummy function looks wrong. Or this it return the

That's !CONFIG_ARCH_HAS_SET_DIRECT_MAP and other functions around do it the
same way. Looks like the current callers can only exist with the CONFIG_
enabled in the first place.

>      number of processed entries? Then we'd have a possible "int" vs.
>      "unsigned int" inconsistency.
> 
> (b) What are the semantics when we fail halfway through the operation
>      when processing nr > 1? Is it "all or nothing"?

Looking at x86 implementation it seems like it can just bail out in the
middle, but then I'm not sure if it can really fail in the middle, hmm...

Patrick Roy Nov. 12, 2024, 2:48 p.m. UTC | #3
On Mon, 2024-11-11 at 12:12 +0000, Vlastimil Babka wrote:
> On 10/31/24 10:57, David Hildenbrand wrote:
>> On 30.10.24 14:49, Patrick Roy wrote:
>>> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>>>
>>> From: Mike Rapoport (Microsoft) <rppt@kernel.org>
>>>
>>> Add an API that will allow updates of the direct/linear map for a set of
>>> physically contiguous pages.
>>>
>>> It will be used in the following patches.
>>>
>>> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>>
>>
>> [...]
>>
>>>   #ifdef CONFIG_DEBUG_PAGEALLOC
>>>   void __kernel_map_pages(struct page *page, int numpages, int enable)
>>>   {
>>> diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
>>> index e7aec20fb44f1..3030d9245f5ac 100644
>>> --- a/include/linux/set_memory.h
>>> +++ b/include/linux/set_memory.h
>>> @@ -34,6 +34,12 @@ static inline int set_direct_map_default_noflush(struct page *page)
>>>      return 0;
>>>   }
>>>
>>> +static inline int set_direct_map_valid_noflush(struct page *page,
>>> +                                           unsigned nr, bool valid)
>>
>> I recall that "unsigned" is frowned upon; "unsigned int".
>>
>>> +{
>>> +    return 0;
>>> +}
>>
>> Can we add some kernel doc for this?
>>
>> In particular
>>
>> (a) What does it mean when we return 0? That it worked? Then, this
> 
> Seems so.
> 
>>      dummy function looks wrong. Or this it return the
> 
> That's !CONFIG_ARCH_HAS_SET_DIRECT_MAP and other functions around do it the
> same way. Looks like the current callers can only exist with the CONFIG_
> enabled in the first place.

Yeah, it looks a bit weird, but these functions seem to generally return
0 if the operation is not supported. arm64 specifically has

	if (!can_set_direct_map())
		return 0;

inside `set_direct_map_{invalid,default}_noflush`. Documenting this
definitely cannot hurt, I'll keep it on my todo list for the next
iteration :)
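
For reference, the return convention described so far could be written down as kernel-doc roughly as follows. This is only a sketch: the wording is an assumption pieced together from this thread, not documentation taken from the series. The "negative errno" part and the TLB remark (inferred from the `_noflush` suffix) are assumptions as well, and `struct page` is a userspace stand-in so that the fragment compiles on its own. It uses `unsigned int`, as suggested in the review.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct page (assumption, illustration only). */
struct page {
	int unused;
};

/**
 * set_direct_map_valid_noflush - change direct map validity for a page range
 * @page:  first page of a physically contiguous range
 * @nr:    number of pages in the range
 * @valid: true to make the pages valid in the direct map, false to make
 *         them invalid
 *
 * Hypothetical wording. Going by the _noflush suffix, TLB flushing is
 * assumed to be left to the caller.
 *
 * Return: 0 on success, and also 0 when the direct map cannot be changed
 * (as in this !CONFIG_ARCH_HAS_SET_DIRECT_MAP style stub). A negative
 * errno on failure is assumed here; it is not confirmed in the thread.
 */
static inline int set_direct_map_valid_noflush(struct page *page,
					       unsigned int nr, bool valid)
{
	/* Stub: nothing to update, so report success like the other stubs. */
	(void)page;
	(void)nr;
	(void)valid;
	return 0;
}
```

With this convention a caller cannot tell "updated" apart from "not supported" by the return value alone, which is the ambiguity raised in (a).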

>>      number of processed entries? Then we'd have a possible "int" vs.
>>      "unsigned int" inconsistency.
>>
>> (b) What are the semantics when we fail halfway through the operation
>>      when processing nr > 1? Is it "all or nothing"?
> 
> Looking at x86 implementation it seems like it can just bail out in the
> middle, but then I'm not sure if it can really fail in the middle, hmm...

If I understood Mike correctly when talking about this at LPC, then it
can only fail if during break-up of huge mappings, it fails to allocate
page tables to hold the lower-granularity mappings (which happens before
any present bits are modified).
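
The ordering described above (allocate and split first, modify present bits afterwards) can be illustrated with a small userspace model. It is a toy sketch of that description only, not the x86 code: every name below (`model_map`, `model_split`, `model_set_valid`, `MODEL_PAGES_PER_HUGE`) is hypothetical, and it covers a single huge mapping.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_PAGES_PER_HUGE 4	/* toy value, not a real huge page size */

/* Toy model of one huge direct-map mapping (hypothetical, userspace only). */
struct model_map {
	bool split;				/* huge mapping already broken up? */
	bool present[MODEL_PAGES_PER_HUGE];	/* per-page present bits */
	bool alloc_fails;			/* simulate page table allocation failure */
};

/* Phase 1: break up the huge mapping. This is the only step that can fail. */
static int model_split(struct model_map *m)
{
	if (m->split)
		return 0;
	if (m->alloc_fails)
		return -ENOMEM;
	m->split = true;
	return 0;
}

/* Phase 2: flip present bits. Runs only after the split has succeeded. */
static int model_set_valid(struct model_map *m, unsigned int first,
			   unsigned int nr, bool valid)
{
	unsigned int i;
	int ret;

	if (first > MODEL_PAGES_PER_HUGE || nr > MODEL_PAGES_PER_HUGE - first)
		return -EINVAL;

	ret = model_split(m);
	if (ret)
		return ret;	/* no present bit has been touched yet */

	for (i = 0; i < nr; i++)
		m->present[first + i] = valid;

	return 0;
}
```

In this model a failed call leaves every present bit as it was, which would correspond to "all or nothing" for question (b). Whether that holds for ranges spanning several huge mappings is not settled in this thread.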

Best,
Patrick

Patch

diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
index 917761feeffdd..98088c043606a 100644
--- a/arch/arm64/include/asm/set_memory.h
+++ b/arch/arm64/include/asm/set_memory.h
@@ -13,6 +13,7 @@  int set_memory_valid(unsigned long addr, int numpages, int enable);
 
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 bool kernel_page_present(struct page *page);
 
 #endif /* _ASM_ARM64_SET_MEMORY_H */
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 0e270a1c51e64..01225900293ac 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -192,6 +192,16 @@  int set_direct_map_default_noflush(struct page *page)
 				   PAGE_SIZE, change_page_range, &data);
 }
 
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	unsigned long addr = (unsigned long)page_address(page);
+
+	if (!can_set_direct_map())
+		return 0;
+
+	return set_memory_valid(addr, nr, valid);
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
diff --git a/arch/loongarch/include/asm/set_memory.h b/arch/loongarch/include/asm/set_memory.h
index d70505b6676cb..55dfaefd02c8a 100644
--- a/arch/loongarch/include/asm/set_memory.h
+++ b/arch/loongarch/include/asm/set_memory.h
@@ -17,5 +17,6 @@  int set_memory_rw(unsigned long addr, int numpages);
 bool kernel_page_present(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
 int set_direct_map_invalid_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 
 #endif /* _ASM_LOONGARCH_SET_MEMORY_H */
diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c
index ffd8d76021d47..f14b40c968b48 100644
--- a/arch/loongarch/mm/pageattr.c
+++ b/arch/loongarch/mm/pageattr.c
@@ -216,3 +216,24 @@  int set_direct_map_invalid_noflush(struct page *page)
 
 	return __set_memory(addr, 1, __pgprot(0), __pgprot(_PAGE_PRESENT | _PAGE_VALID));
 }
+
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	unsigned long addr = (unsigned long)page_address(page);
+	pgprot_t set, clear;
+
+	/* Addresses below vm_map_base are left untouched. */
+	if (addr < vm_map_base)
+		return 0;
+
+	if (valid) {
+		set = PAGE_KERNEL;
+		clear = __pgprot(0);
+	} else {
+		set = __pgprot(0);
+		clear = __pgprot(_PAGE_PRESENT | _PAGE_VALID);
+	}
+
+	/* Update all nr pages, not only the first one. */
+	return __set_memory(addr, nr, set, clear);
+}
diff --git a/arch/riscv/include/asm/set_memory.h b/arch/riscv/include/asm/set_memory.h
index ab92fc84e1fc9..ea263d3683ef6 100644
--- a/arch/riscv/include/asm/set_memory.h
+++ b/arch/riscv/include/asm/set_memory.h
@@ -42,6 +42,7 @@  static inline int set_kernel_memory(char *startp, char *endp,
 
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 bool kernel_page_present(struct page *page);
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 271d01a5ba4da..d815448758a19 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -386,6 +386,21 @@  int set_direct_map_default_noflush(struct page *page)
 			    PAGE_KERNEL, __pgprot(_PAGE_EXEC));
 }
 
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	pgprot_t set, clear;
+
+	if (valid) {
+		set = PAGE_KERNEL;
+		clear = __pgprot(_PAGE_EXEC);
+	} else {
+		set = __pgprot(0);
+		clear = __pgprot(_PAGE_PRESENT);
+	}
+
+	return __set_memory((unsigned long)page_address(page), nr, set, clear);
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static int debug_pagealloc_set_page(pte_t *pte, unsigned long addr, void *data)
 {
diff --git a/arch/s390/include/asm/set_memory.h b/arch/s390/include/asm/set_memory.h
index 06fbabe2f66c9..240bcfbdcdcec 100644
--- a/arch/s390/include/asm/set_memory.h
+++ b/arch/s390/include/asm/set_memory.h
@@ -62,5 +62,6 @@  __SET_MEMORY_FUNC(set_memory_4k, SET_MEMORY_4K)
 
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 
 #endif
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 5f805ad42d4c3..4c7ee74aa130d 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -406,6 +406,17 @@  int set_direct_map_default_noflush(struct page *page)
 	return __set_memory((unsigned long)page_to_virt(page), 1, SET_MEMORY_DEF);
 }
 
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	unsigned long flags;
+
+	if (valid)
+		flags = SET_MEMORY_DEF;
+	else
+		flags = SET_MEMORY_INV;
+
+	return __set_memory((unsigned long)page_to_virt(page), nr, flags);
+}
 #if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
 
 static void ipte_range(pte_t *pte, unsigned long address, int nr)
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 4b2abce2e3e7d..cc62ef70ccc0a 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -89,6 +89,7 @@  int set_pages_rw(struct page *page, int numpages);
 
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 bool kernel_page_present(struct page *page);
 
 extern int kernel_set_to_readonly;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 44f7b2ea6a073..069e421c22474 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2444,6 +2444,14 @@  int set_direct_map_default_noflush(struct page *page)
 	return __set_pages_p(page, 1);
 }
 
+int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+{
+	if (valid)
+		return __set_pages_p(page, nr);
+
+	return __set_pages_np(page, nr);
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index e7aec20fb44f1..3030d9245f5ac 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -34,6 +34,12 @@  static inline int set_direct_map_default_noflush(struct page *page)
 	return 0;
 }
 
+static inline int set_direct_map_valid_noflush(struct page *page,
+					       unsigned nr, bool valid)
+{
+	return 0;
+}
+
 static inline bool kernel_page_present(struct page *page)
 {
 	return true;