[RFC,04/12] x86: add support of memory protection for NUMA replicas

Message ID 20231228131056.602411-5-artem.kuzin@huawei.com (mailing list archive)
State New
Series x86 NUMA-aware kernel replication

Commit Message

Artem Kuzin Dec. 28, 2023, 1:10 p.m. UTC
From: Artem Kuzin <artem.kuzin@huawei.com>

Co-developed-by: Nikita Panov <nikita.panov@huawei-partners.com>
Signed-off-by: Nikita Panov <nikita.panov@huawei-partners.com>
Co-developed-by: Alexander Grubnikov <alexander.grubnikov@huawei.com>
Signed-off-by: Alexander Grubnikov <alexander.grubnikov@huawei.com>
Signed-off-by: Artem Kuzin <artem.kuzin@huawei.com>
---
 arch/x86/include/asm/set_memory.h |  14 +++
 arch/x86/mm/pat/set_memory.c      | 150 +++++++++++++++++++++++++++++-
 include/asm-generic/set_memory.h  |  12 +++
 include/linux/set_memory.h        |  10 ++
 4 files changed, 185 insertions(+), 1 deletion(-)

Comments

Garg, Shivank Jan. 9, 2024, 6:46 a.m. UTC | #1
Hi Artem,

I hope this message finds you well.
I've encountered a compilation issue when KERNEL_REPLICATION is disabled in the config.

ld: vmlinux.o: in function `alloc_insn_page':
/home/amd/linux_mainline/arch/x86/kernel/kprobes/core.c:425: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o: in function `alloc_new_pack':
/home/amd/linux_mainline/kernel/bpf/core.c:873: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o: in function `bpf_prog_pack_alloc':
/home/amd/linux_mainline/kernel/bpf/core.c:891: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o: in function `bpf_trampoline_update':
/home/amd/linux_mainline/kernel/bpf/trampoline.c:447: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o: in function `bpf_struct_ops_map_update_elem':
/home/amd/linux_mainline/kernel/bpf/bpf_struct_ops.c:515: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o:/home/amd/linux_mainline/kernel/bpf/bpf_struct_ops.c:524: more undefined references to `numa_set_memory_rox' follow


After some investigation, I've put together a patch that resolves these compilation issues for me.

--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2268,6 +2268,15 @@ int numa_set_memory_nonglobal(unsigned long addr, int numpages)

        return ret;
 }
+
+#else
+
+int numa_set_memory_rox(unsigned long addr, int numpages)
+{
+       return set_memory_rox(addr, numpages);
+
+}
+
 #endif

Additionally, I'm interested in evaluating the performance impact of this patchset on AMD processors.
Could you please point me to the benchmarks that you used in the cover letter?

Best Regards,
Shivank
Artem Kuzin Jan. 9, 2024, 3:53 p.m. UTC | #2
Hi Shivank,
Thanks a lot for the comments and findings. I've fixed the build and plan to update the patch set soon.

On 1/9/2024 9:46 AM, Garg, Shivank wrote:
> [...]
>
> Additionally, I'm interested in evaluating the performance impact of this patchset on AMD processors.
> Could you please point me to the benchmarks that you used in the cover letter?
>
> Best Regards,
> Shivank
>
Regarding the benchmarks, for now we used a self-implemented test that exercises system-call load,
following the approach taken by RedHawk Linux as described in the article
"An Overview of Kernel Text Page Replication in RedHawk™ Linux® 6.3":
https://concurrent-rt.com/wp-content/uploads/2020/12/kernel-page-replication.pdf

The test is very simple: all measured system calls were invoked using the syscall wrapper from glibc, e.g.

#include <sys/syscall.h>      /* Definition of SYS_* constants */
#include <unistd.h>
 
long syscall(long number, ...);

fork/1
    The system call is invoked once; the time between entering and exiting the system call is measured.
fork/1024
    The system call is invoked in a loop 1024 times; the time between entering and exiting the loop is measured.
mmap/munmap
    A set of 1024 pages (of PAGE_SIZE bytes each, or 4096 if PAGE_SIZE is not defined) is mapped using the mmap
    syscall and unmapped using munmap. One page is mapped and unmapped per loop iteration.
mmap/lock
    The same as above, but with the MAP_LOCKED flag added.
open/close
    The /dev/null pseudo-file is opened and closed in a loop 1024 times, once per iteration.
mount
    The procfs pseudo-filesystem is mounted once on a temporary directory inside /tmp.
    The time between entering and exiting the system call is measured.
kill
    A signal handler for SIGUSR1 is set up, and the signal is sent to a child process created using glibc's fork
    wrapper. The time between sending and receiving the SIGUSR1 signal is measured.

Testing environment:
    Intel(R) Xeon(R) CPU E5-2690 processor,
    2 nodes with 12 CPU cores each.

Best Regards,
Artem
Garg, Shivank Jan. 10, 2024, 6:19 a.m. UTC | #3
Hi Artem,

> Regarding the benchmarks, we used self-implemented test with system calls load for now.
> We used RedHawk Linux approach as a reference.
> [...]


Thank you for the information on the tests. I will try them and get back with numbers.

Best Regards,
Shivank

Patch

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index a5e89641bd2d..1efa15a08ef0 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -7,7 +7,9 @@ 
 #include <asm-generic/set_memory.h>
 
 #define set_memory_rox set_memory_rox
+#define numa_set_memory_rox numa_set_memory_rox
 int set_memory_rox(unsigned long addr, int numpages);
+int numa_set_memory_rox(unsigned long addr, int numpages);
 
 /*
  * The set_memory_* API can be used to change various attributes of a virtual
@@ -58,6 +60,18 @@  int set_pages_array_uc(struct page **pages, int addrinarray);
 int set_pages_array_wc(struct page **pages, int addrinarray);
 int set_pages_array_wb(struct page **pages, int addrinarray);
 
+#ifdef CONFIG_KERNEL_REPLICATION
+int numa_set_memory_np(unsigned long addr, int numpages);
+int numa_set_memory_np_noalias(unsigned long addr, int numpages);
+int numa_set_memory_global(unsigned long addr, int numpages);
+int numa_set_memory_nonglobal(unsigned long addr, int numpages);
+#else
+#define numa_set_memory_np set_memory_np
+#define numa_set_memory_np_noalias set_memory_np_noalias
+#define numa_set_memory_global set_memory_global
+#define numa_set_memory_nonglobal set_memory_nonglobal
+#endif /* CONFIG_KERNEL_REPLICATION */
+
 /*
  * For legacy compatibility with the old APIs, a few functions
  * are provided that work on a "struct page".
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index df4182b6449f..ceba209ee653 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -22,6 +22,7 @@ 
 #include <linux/cc_platform.h>
 #include <linux/set_memory.h>
 #include <linux/memregion.h>
+#include <linux/numa_replication.h>
 
 #include <asm/e820/api.h>
 #include <asm/processor.h>
@@ -1790,7 +1791,7 @@  static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
 	return ret;
 }
 
-static int change_page_attr_set_clr(unsigned long *addr, int numpages,
+static int change_page_attr_set_clr_pgd(pgd_t *pgd, unsigned long *addr, int numpages,
 				    pgprot_t mask_set, pgprot_t mask_clr,
 				    int force_split, int in_flag,
 				    struct page **pages)
@@ -1845,6 +1846,7 @@  static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 	cpa.flags = in_flag;
 	cpa.curpage = 0;
 	cpa.force_split = force_split;
+	cpa.pgd = pgd;
 
 	ret = __change_page_attr_set_clr(&cpa, 1);
 
@@ -1873,6 +1875,15 @@  static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 	return ret;
 }
 
+static int change_page_attr_set_clr(unsigned long *addr, int numpages,
+				    pgprot_t mask_set, pgprot_t mask_clr,
+				    int force_split, int in_flag,
+				    struct page **pages)
+{
+	return change_page_attr_set_clr_pgd(NULL, addr, numpages, mask_set,
+					    mask_clr, force_split, in_flag, pages);
+}
+
 static inline int change_page_attr_set(unsigned long *addr, int numpages,
 				       pgprot_t mask, int array)
 {
@@ -1880,6 +1891,13 @@  static inline int change_page_attr_set(unsigned long *addr, int numpages,
 		(array ? CPA_ARRAY : 0), NULL);
 }
 
+static inline int change_page_attr_set_pgd(pgd_t *pgd, unsigned long *addr, int numpages,
+				       pgprot_t mask, int array)
+{
+	return change_page_attr_set_clr_pgd(pgd, addr, numpages, mask, __pgprot(0), 0,
+		(array ? CPA_ARRAY : 0), NULL);
+}
+
 static inline int change_page_attr_clear(unsigned long *addr, int numpages,
 					 pgprot_t mask, int array)
 {
@@ -1887,6 +1905,13 @@  static inline int change_page_attr_clear(unsigned long *addr, int numpages,
 		(array ? CPA_ARRAY : 0), NULL);
 }
 
+static inline int change_page_attr_clear_pgd(pgd_t *pgd, unsigned long *addr, int numpages,
+					 pgprot_t mask, int array)
+{
+	return change_page_attr_set_clr_pgd(pgd, addr, numpages, __pgprot(0), mask, 0,
+		(array ? CPA_ARRAY : 0), NULL);
+}
+
 static inline int cpa_set_pages_array(struct page **pages, int numpages,
 				       pgprot_t mask)
 {
@@ -2122,6 +2147,129 @@  int set_memory_global(unsigned long addr, int numpages)
 				    __pgprot(_PAGE_GLOBAL), 0);
 }
 
+#ifdef CONFIG_KERNEL_REPLICATION
+int numa_set_memory_x(unsigned long addr, int numpages)
+{
+	int ret = 0;
+	int nid;
+
+	if (!(__supported_pte_mask & _PAGE_NX))
+		return 0;
+	for_each_replica(nid)
+		ret |= change_page_attr_clear_pgd(init_mm.pgd_numa[nid], &addr, numpages,
+						 __pgprot(_PAGE_NX), 0);
+
+	return ret;
+}
+
+int numa_set_memory_nx(unsigned long addr, int numpages)
+{
+	int ret = 0;
+	int nid;
+
+	if (!(__supported_pte_mask & _PAGE_NX))
+		return 0;
+	for_each_replica(nid)
+		ret |= change_page_attr_set_pgd(init_mm.pgd_numa[nid], &addr, numpages,
+						__pgprot(_PAGE_NX), 0);
+
+	return ret;
+}
+
+int numa_set_memory_ro(unsigned long addr, int numpages)
+{
+	int ret = 0;
+	int nid;
+
+	for_each_replica(nid)
+		ret |= change_page_attr_clear_pgd(init_mm.pgd_numa[nid], &addr, numpages,
+						  __pgprot(_PAGE_RW), 0);
+
+	return ret;
+}
+
+int numa_set_memory_rox(unsigned long addr, int numpages)
+{
+	int nid;
+
+	int ret = 0;
+	pgprot_t clr = __pgprot(_PAGE_RW);
+
+	if (__supported_pte_mask & _PAGE_NX)
+		clr.pgprot |= _PAGE_NX;
+
+	for_each_online_node(nid) {
+		ret |= change_page_attr_clear_pgd(init_mm.pgd_numa[nid], &addr, numpages, clr, 0);
+		if (!is_text_replicated())
+			break;
+	}
+	return ret;
+}
+
+int numa_set_memory_rw(unsigned long addr, int numpages)
+{
+	int ret = 0;
+	int nid;
+
+	for_each_replica(nid)
+		ret |= change_page_attr_set_pgd(init_mm.pgd_numa[nid], &addr, numpages,
+						__pgprot(_PAGE_RW), 0);
+
+	return ret;
+}
+
+int numa_set_memory_np(unsigned long addr, int numpages)
+{
+	int ret = 0;
+	int nid;
+
+	for_each_replica(nid)
+		ret |= change_page_attr_clear_pgd(init_mm.pgd_numa[nid], &addr, numpages,
+						  __pgprot(_PAGE_PRESENT), 0);
+
+	return ret;
+}
+
+int numa_set_memory_np_noalias(unsigned long addr, int numpages)
+{
+	int ret = 0;
+	int nid;
+	int cpa_flags = CPA_NO_CHECK_ALIAS;
+
+	for_each_replica(nid)
+		ret |= change_page_attr_set_clr_pgd(init_mm.pgd_numa[nid], &addr, numpages,
+						    __pgprot(0),
+						    __pgprot(_PAGE_PRESENT), 0,
+						    cpa_flags, NULL);
+
+	return ret;
+}
+
+int numa_set_memory_global(unsigned long addr, int numpages)
+{
+	int ret = 0;
+	int nid;
+
+	for_each_replica(nid)
+		ret |= change_page_attr_set_pgd(init_mm.pgd_numa[nid], &addr, numpages,
+						__pgprot(_PAGE_GLOBAL), 0);
+
+	return ret;
+}
+
+int numa_set_memory_nonglobal(unsigned long addr, int numpages)
+{
+	int ret = 0;
+	int nid;
+
+	for_each_replica(nid)
+		ret |= change_page_attr_clear_pgd(init_mm.pgd_numa[nid], &addr, numpages,
+						  __pgprot(_PAGE_GLOBAL), 0);
+
+	return ret;
+}
+#endif
+
 /*
  * __set_memory_enc_pgtable() is used for the hypervisors that get
  * informed about "encryption" status via page tables.
diff --git a/include/asm-generic/set_memory.h b/include/asm-generic/set_memory.h
index c86abf6bc7ba..886639600e64 100644
--- a/include/asm-generic/set_memory.h
+++ b/include/asm-generic/set_memory.h
@@ -10,4 +10,16 @@  int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_x(unsigned long addr, int numpages);
 int set_memory_nx(unsigned long addr, int numpages);
 
+#ifdef CONFIG_KERNEL_REPLICATION
+int numa_set_memory_ro(unsigned long addr, int numpages);
+int numa_set_memory_rw(unsigned long addr, int numpages);
+int numa_set_memory_x(unsigned long addr, int numpages);
+int numa_set_memory_nx(unsigned long addr, int numpages);
+#else
+#define numa_set_memory_ro set_memory_ro
+#define numa_set_memory_rw set_memory_rw
+#define numa_set_memory_x  set_memory_x
+#define numa_set_memory_nx set_memory_nx
+#endif /* CONFIG_KERNEL_REPLICATION */
+
 #endif
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 95ac8398ee72..3213bfd335dd 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -24,6 +24,16 @@  static inline int set_memory_rox(unsigned long addr, int numpages)
 }
 #endif
 
+#ifndef numa_set_memory_rox
+static inline int numa_set_memory_rox(unsigned long addr, int numpages)
+{
+	int ret = numa_set_memory_ro(addr, numpages);
+	if (ret)
+		return ret;
+	return numa_set_memory_x(addr, numpages);
+}
+#endif
+
 #ifndef CONFIG_ARCH_HAS_SET_DIRECT_MAP
 static inline int set_direct_map_invalid_noflush(struct page *page)
 {