Message ID | 20240917073117.1531207-8-anshuman.khandual@arm.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: Use pxdp_get() for accessing page table entries | expand |
Hi Anshuman, kernel test robot noticed the following build errors: [auto build test ERROR on char-misc/char-misc-testing] [also build test ERROR on char-misc/char-misc-next char-misc/char-misc-linus brauner-vfs/vfs.all dennis-percpu/for-next linus/master v6.11] [cannot apply to akpm-mm/mm-everything next-20240918] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/m68k-mm-Change-pmd_val/20240917-153331 base: char-misc/char-misc-testing patch link: https://lore.kernel.org/r/20240917073117.1531207-8-anshuman.khandual%40arm.com patch subject: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries config: arm-footbridge_defconfig (https://download.01.org/0day-ci/archive/20240919/202409190310.ViHBRe12-lkp@intel.com/config) compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 8663a75fa2f31299ab8d1d90288d9df92aadee88) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409190310.ViHBRe12-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202409190310.ViHBRe12-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:30: >> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 1245 | pgd_t old_pgd = pgdp_get(pgd); | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/pgtable.h:1243:48: note: 'pgd' declared here 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd) | ^ >> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 1245 | pgd_t old_pgd = pgdp_get(pgd); | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/pgtable.h:1243:48: note: 'pgd' declared here 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd) | ^ >> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 1245 | pgd_t old_pgd = pgdp_get(pgd); | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/pgtable.h:1243:48: note: 'pgd' declared here 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd) | ^ >> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 1245 | pgd_t old_pgd = pgdp_get(pgd); | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/pgtable.h:1243:48: note: 'pgd' declared here 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd) | ^ >> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 1245 | pgd_t old_pgd = pgdp_get(pgd); | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/pgtable.h:1243:48: note: 'pgd' declared here 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd) | ^ >> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 1245 | pgd_t old_pgd = pgdp_get(pgd); | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/pgtable.h:1243:48: note: 'pgd' declared here 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd) | ^ >> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 1245 | pgd_t old_pgd = pgdp_get(pgd); | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/pgtable.h:1243:48: note: 'pgd' declared here 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd) | ^ >> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 1245 | pgd_t old_pgd = pgdp_get(pgd); | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/pgtable.h:1243:48: note: 'pgd' declared here 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd) | ^ >> include/linux/pgtable.h:1245:8: error: array initializer must be an initializer list or wide string literal 1245 | pgd_t old_pgd = pgdp_get(pgd); | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:98:11: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds] 98 | return (set->sig[3] | set->sig[2] | | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:98:25: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds] 98 | return (set->sig[3] | set->sig[2] | | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:114:11: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds] 114 | return (set1->sig[3] == set2->sig[3]) && | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:114:27: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds] 114 | return (set1->sig[3] == set2->sig[3]) && | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:115:5: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds] 115 | (set1->sig[2] == set2->sig[2]) && | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:115:21: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds] 115 | (set1->sig[2] == set2->sig[2]) && | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:157:1: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds] 157 | _SIG_SET_BINOP(sigorsets, _sig_or) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/signal.h:138:8: note: expanded from macro '_SIG_SET_BINOP' 138 | a3 = a->sig[3]; a2 = a->sig[2]; \ | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: -- 163 | _SIG_SET_BINOP(sigandnsets, _sig_andn) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/signal.h:140:3: note: expanded from macro '_SIG_SET_BINOP' 140 | r->sig[3] = op(a3, b3); \ | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:163:1: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds] 163 | _SIG_SET_BINOP(sigandnsets, _sig_andn) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/signal.h:141:3: note: expanded from macro '_SIG_SET_BINOP' 141 | r->sig[2] = op(a2, b2); \ | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:187:1: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds] 187 | _SIG_SET_OP(signotset, _sig_not) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/signal.h:174:27: note: expanded from macro '_SIG_SET_OP' 174 | case 4: set->sig[3] = op(set->sig[3]); \ | ^ ~ include/linux/signal.h:186:24: note: expanded from macro '_sig_not' 186 | #define _sig_not(x) (~(x)) | ^ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:187:1: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds] 187 | _SIG_SET_OP(signotset, _sig_not) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/signal.h:174:10: note: expanded from macro '_SIG_SET_OP' 174 | case 4: set->sig[3] = op(set->sig[3]); \ | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:187:1: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds] 187 | _SIG_SET_OP(signotset, _sig_not) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/signal.h:175:20: note: expanded from macro '_SIG_SET_OP' 175 | set->sig[2] = op(set->sig[2]); \ | ^ ~ include/linux/signal.h:186:24: note: expanded from macro '_sig_not' 186 | #define _sig_not(x) (~(x)) | ^ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:1131: In file included from include/linux/huge_mm.h:8: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:187:1: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds] 187 | _SIG_SET_OP(signotset, _sig_not) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/signal.h:175:3: note: expanded from macro '_SIG_SET_OP' 175 | set->sig[2] = op(set->sig[2]); \ | ^ ~ arch/arm/include/asm/signal.h:17:2: note: array 'sig' declared here 17 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from arch/arm/kernel/asm-offsets.c:12: In file included from include/linux/mm.h:2232: include/linux/vmstat.h:517:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 517 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ In file included from arch/arm/kernel/asm-offsets.c:12: >> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/mm.h:2819:61: note: 'pgd' declared here 2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd, | ^ >> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/mm.h:2819:61: note: 'pgd' declared here 2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd, | ^ >> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/mm.h:2819:61: note: 'pgd' declared here 2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd, | ^ >> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/mm.h:2819:61: note: 'pgd' declared here 2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd, | ^ >> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/mm.h:2819:61: note: 'pgd' declared here 2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd, | ^ >> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/mm.h:2819:61: note: 'pgd' declared here 2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd, | ^ >> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/mm.h:2819:61: note: 'pgd' declared here 2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd, | ^ >> include/linux/mm.h:2822:28: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? 2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? | ^ arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/linux/mm.h:2819:61: note: 'pgd' declared here 2819 | static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd, | ^ >> include/linux/mm.h:2822:28: error: passing 'const volatile pmdval_t *' (aka 'const volatile unsigned int *') to parameter of type 'pmdval_t *' (aka 'unsigned int *') discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers] 2822 | return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? | ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ arch/arm/include/asm/pgtable.h:154:25: note: expanded from macro 'pgdp_get' 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) | ^ include/asm-generic/rwonce.h:47:28: note: expanded from macro 'READ_ONCE' 47 | #define READ_ONCE(x) \ | ^ include/linux/compiler.h:77:42: note: expanded from macro 'unlikely' 77 | # define unlikely(x) __builtin_expect(!!(x), 0) | ^ include/asm-generic/pgtable-nop4d.h:21:34: note: passing argument to parameter 'pgd' here 21 | static inline int pgd_none(pgd_t pgd) { return 0; } | ^ 29 warnings and 18 errors generated. make[3]: *** [scripts/Makefile.build:117: arch/arm/kernel/asm-offsets.s] Error 1 make[3]: Target 'prepare' not remade because of errors. make[2]: *** [Makefile:1194: prepare0] Error 2 make[2]: Target 'prepare' not remade because of errors. make[1]: *** [Makefile:224: __sub-make] Error 2 make[1]: Target 'prepare' not remade because of errors. make: *** [Makefile:224: __sub-make] Error 2 make: Target 'prepare' not remade because of errors. vim +1245 include/linux/pgtable.h 1242 1243 static inline int pgd_none_or_clear_bad(pgd_t *pgd) 1244 { > 1245 pgd_t old_pgd = pgdp_get(pgd); 1246 1247 if (pgd_none(old_pgd)) 1248 return 1; 1249 if (unlikely(pgd_bad(old_pgd))) { 1250 pgd_clear_bad(pgd); 1251 return 1; 1252 } 1253 return 0; 1254 } 1255
On 9/19/24 02:00, kernel test robot wrote: > Hi Anshuman, > > kernel test robot noticed the following build errors: > > [auto build test ERROR on char-misc/char-misc-testing] > [also build test ERROR on char-misc/char-misc-next char-misc/char-misc-linus brauner-vfs/vfs.all dennis-percpu/for-next linus/master v6.11] > [cannot apply to akpm-mm/mm-everything next-20240918] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch#_base_tree_information] > > url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/m68k-mm-Change-pmd_val/20240917-153331 > base: char-misc/char-misc-testing > patch link: https://lore.kernel.org/r/20240917073117.1531207-8-anshuman.khandual%40arm.com > patch subject: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries > config: arm-footbridge_defconfig (https://download.01.org/0day-ci/archive/20240919/202409190310.ViHBRe12-lkp@intel.com/config) > compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 8663a75fa2f31299ab8d1d90288d9df92aadee88) > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240919/202409190310.ViHBRe12-lkp@intel.com/reproduce) > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <lkp@intel.com> > | Closes: https://lore.kernel.org/oe-kbuild-all/202409190310.ViHBRe12-lkp@intel.com/ > > All errors (new ones prefixed by >>): > > In file included from arch/arm/kernel/asm-offsets.c:12: > In file included from include/linux/mm.h:30: >>> include/linux/pgtable.h:1245:18: error: use of undeclared identifier 'pgdp'; did you mean 'pgd'? > 1245 | pgd_t old_pgd = pgdp_get(pgd); > | ^ > arch/arm/include/asm/pgtable.h:154:36: note: expanded from macro 'pgdp_get' > 154 | #define pgdp_get(pgpd) READ_ONCE(*pgdp) > | ^ > include/linux/pgtable.h:1243:48: note: 'pgd' declared here > 1243 | static inline int pgd_none_or_clear_bad(pgd_t *pgd) > | ^ arm (32) platform currently overrides pgdp_get() helper in the platform but defines that like the exact same version as the generic one, albeit with a typo which can be fixed with something like this. diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index be91e376df79..aedb32d49c2a 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -151,7 +151,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, extern pgd_t swapper_pg_dir[PTRS_PER_PGD]; -#define pgdp_get(pgpd) READ_ONCE(*pgdp) +#define pgdp_get(pgdp) READ_ONCE(*pgdp) #define pud_page(pud) pmd_page(__pmd(pud_val(pud))) #define pud_write(pud) pmd_write(__pmd(pud_val(pud))) Regardless there is another problem here. On arm platform there are multiple pgd_t definitions available depending on various configs but some are arrays instead of a single data element, although platform pgdp_get() helper remains the same for all. arch/arm/include/asm/page-nommu.h:typedef unsigned long pgd_t[2]; arch/arm/include/asm/pgtable-2level-types.h:typedef struct { pmdval_t pgd[2]; } pgd_t; arch/arm/include/asm/pgtable-2level-types.h:typedef pmdval_t pgd_t[2]; arch/arm/include/asm/pgtable-3level-types.h:typedef struct { pgdval_t pgd; } pgd_t; arch/arm/include/asm/pgtable-3level-types.h:typedef pgdval_t pgd_t; I guess it might need different pgdp_get() variants depending applicable pgd_t definition. Will continue looking into this further but meanwhile copied Russel King in case he might be able to give some direction.
On Thu, Sep 19, 2024 at 01:25:08PM +0530, Anshuman Khandual wrote: > arm (32) platform currently overrides pgdp_get() helper in the platform but > defines that like the exact same version as the generic one, albeit with a > typo which can be fixed with something like this. pgdp_get() was added to arm in eba2591d99d1 ("mm: Introduce pudp/p4dp/pgdp_get() functions") with the typo you've spotted. It seems it was added with no users, otherwise the error would have been spotted earlier. I'm not a fan of adding dead code to the kernel for this reason. > Regardless there is another problem here. On arm platform there are multiple > pgd_t definitions available depending on various configs but some are arrays > instead of a single data element, although platform pgdp_get() helper remains > the same for all. > > arch/arm/include/asm/page-nommu.h:typedef unsigned long pgd_t[2]; > arch/arm/include/asm/pgtable-2level-types.h:typedef struct { pmdval_t pgd[2]; } pgd_t; > arch/arm/include/asm/pgtable-2level-types.h:typedef pmdval_t pgd_t[2]; > arch/arm/include/asm/pgtable-3level-types.h:typedef struct { pgdval_t pgd; } pgd_t; > arch/arm/include/asm/pgtable-3level-types.h:typedef pgdval_t pgd_t; > > I guess it might need different pgdp_get() variants depending applicable pgd_t > definition. Will continue looking into this further but meanwhile copied Russel > King in case he might be able to give some direction. That's Russel*L*, thanks. 32-bit arm uses, in some circumstances, an array because each level 1 page table entry is actually two descriptors. It needs to be this way because each level 2 table pointed to by each level 1 entry has 256 entries, meaning it only occupies 1024 bytes in a 4096 byte page. In order to cut down on the wastage, treat the level 1 page table as groups of two entries, which point to two consecutive 1024 byte tables in the level 2 page. The level 2 entry isn't suitable for the kernel's use cases (there are no bits to represent accessed/dirty and other important stuff that the Linux MM wants) so we maintain the hardware page tables and a separate set that Linux uses in the same page. Again, the software tables are consecutive, so from Linux's perspective, the level 2 page tables have 512 entries in them and occupy one full page. This is documented in arch/arm/include/asm/pgtable-2level.h However, what this means is that from the software perspective, the level 1 page table descriptors are an array of two entries, both of which need to be setup when creating a level 2 page table, but only the first one should ever be dereferenced when walking the tables, otherwise the code that walks the second level of page table entries will walk off the end of the software table into the actual hardware descriptors. I've no idea what the idea is behind introducing pgd_get() and what it's semantics are, so I can't comment further.
On 19/09/2024 10:11, Russell King (Oracle) wrote: > On Thu, Sep 19, 2024 at 01:25:08PM +0530, Anshuman Khandual wrote: >> arm (32) platform currently overrides pgdp_get() helper in the platform but >> defines that like the exact same version as the generic one, albeit with a >> typo which can be fixed with something like this. > > pgdp_get() was added to arm in eba2591d99d1 ("mm: Introduce > pudp/p4dp/pgdp_get() functions") with the typo you've spotted. It seems > it was added with no users, otherwise the error would have been spotted > earlier. I'm not a fan of adding dead code to the kernel for this > reason. > >> Regardless there is another problem here. On arm platform there are multiple >> pgd_t definitions available depending on various configs but some are arrays >> instead of a single data element, although platform pgdp_get() helper remains >> the same for all. >> >> arch/arm/include/asm/page-nommu.h:typedef unsigned long pgd_t[2]; >> arch/arm/include/asm/pgtable-2level-types.h:typedef struct { pmdval_t pgd[2]; } pgd_t; >> arch/arm/include/asm/pgtable-2level-types.h:typedef pmdval_t pgd_t[2]; >> arch/arm/include/asm/pgtable-3level-types.h:typedef struct { pgdval_t pgd; } pgd_t; >> arch/arm/include/asm/pgtable-3level-types.h:typedef pgdval_t pgd_t; >> >> I guess it might need different pgdp_get() variants depending applicable pgd_t >> definition. Will continue looking into this further but meanwhile copied Russel >> King in case he might be able to give some direction. > > That's Russel*L*, thanks. > > 32-bit arm uses, in some circumstances, an array because each level 1 > page table entry is actually two descriptors. It needs to be this way > because each level 2 table pointed to by each level 1 entry has 256 > entries, meaning it only occupies 1024 bytes in a 4096 byte page. > > In order to cut down on the wastage, treat the level 1 page table as > groups of two entries, which point to two consecutive 1024 byte tables > in the level 2 page. > > The level 2 entry isn't suitable for the kernel's use cases (there are > no bits to represent accessed/dirty and other important stuff that the > Linux MM wants) so we maintain the hardware page tables and a separate > set that Linux uses in the same page. Again, the software tables are > consecutive, so from Linux's perspective, the level 2 page tables > have 512 entries in them and occupy one full page. > > This is documented in arch/arm/include/asm/pgtable-2level.h > > However, what this means is that from the software perspective, the > level 1 page table descriptors are an array of two entries, both of > which need to be setup when creating a level 2 page table, but only > the first one should ever be dereferenced when walking the tables, > otherwise the code that walks the second level of page table entries > will walk off the end of the software table into the actual hardware > descriptors. > > I've no idea what the idea is behind introducing pgd_get() and what > it's semantics are, so I can't comment further. The helper is intended to read the value of the entry pointed to by the passed in pointer. And it shoiuld be read in a "single copy atomic" manner, meaning no tearing. Further, the PTL is expected to be held when calling the getter. If the HW can write to the entry such that its racing with the lock holder (i.e. HW update of access/dirty) then READ_ONCE() should be suitable for most architectures. If there is no possibility of racing (because HW doesn't write to the entry), then a simple dereference would be sufficient, I think (which is what the core code was already doing in most cases). There is additional benefit that the architecture can hook this function if it has exotic use cases (see contpte feature on arm64 as an example, which hooks ptep_get()). It sounds to me like the arm (32) implementation of pgdp_get() could just continue to do a direct dereference and this should be safe? I don't think it supports HW update of access/dirty? Thanks, Ryan
On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote: > > 32-bit arm uses, in some circumstances, an array because each level 1 > > page table entry is actually two descriptors. It needs to be this way > > because each level 2 table pointed to by each level 1 entry has 256 > > entries, meaning it only occupies 1024 bytes in a 4096 byte page. > > > > In order to cut down on the wastage, treat the level 1 page table as > > groups of two entries, which point to two consecutive 1024 byte tables > > in the level 2 page. > > > > The level 2 entry isn't suitable for the kernel's use cases (there are > > no bits to represent accessed/dirty and other important stuff that the > > Linux MM wants) so we maintain the hardware page tables and a separate > > set that Linux uses in the same page. Again, the software tables are > > consecutive, so from Linux's perspective, the level 2 page tables > > have 512 entries in them and occupy one full page. > > > > This is documented in arch/arm/include/asm/pgtable-2level.h > > > > However, what this means is that from the software perspective, the > > level 1 page table descriptors are an array of two entries, both of > > which need to be setup when creating a level 2 page table, but only > > the first one should ever be dereferenced when walking the tables, > > otherwise the code that walks the second level of page table entries > > will walk off the end of the software table into the actual hardware > > descriptors. > > > > I've no idea what the idea is behind introducing pgd_get() and what > > it's semantics are, so I can't comment further. > > The helper is intended to read the value of the entry pointed to by the passed > in pointer. And it shoiuld be read in a "single copy atomic" manner, meaning no > tearing. Further, the PTL is expected to be held when calling the getter. If the > HW can write to the entry such that its racing with the lock holder (i.e. HW > update of access/dirty) then READ_ONCE() should be suitable for most > architectures. If there is no possibility of racing (because HW doesn't write to > the entry), then a simple dereference would be sufficient, I think (which is > what the core code was already doing in most cases). The core code should be making no access to the PGD entries on 32-bit ARM since the PGD level does not exist. Writes are done at PMD level in arch code. Reads are done by core code at PMD level. It feels to me like pgd_get() just doesn't fit the model to which 32-bit ARM was designed to use decades ago, so I want full details about what pgd_get() is going to be used for and how it is going to be used, because I feel completely in the dark over this new development. I fear that someone hasn't understood the Linux page table model if they're wanting to access stuff at levels that effectively "aren't implemented" in the architecture specific kernel model of the page tables. Essentially, on 32-bit 2-level ARM, the PGD is merely indexed by the virtual address. As far as the kernel is concerned, each entry is 64-bit, and the generic kernel code has no business accessing that through the pgd pointer. The pgd pointer is passed through the PUD and PMD levels, where it is typecast down through the kernel layers to a pmd_t pointer, where it becomes a 32-bit quantity. This results in only the _first_ level 1 pointer being dereferenced by kernel code to a 32-bit pmd_t quantity. pmd_page_vaddr() converts this pmd_t quantity to a pte pointer (which points at the software level 2 page tables, not the hardware page tables.) So, as I'm now being told that the kernel wants to dereference the pgd level despite the model I describe above, alarm bells are ringing. I want full information please.
On 19/09/2024 18:06, Russell King (Oracle) wrote: > On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote: >>> 32-bit arm uses, in some circumstances, an array because each level 1 >>> page table entry is actually two descriptors. It needs to be this way >>> because each level 2 table pointed to by each level 1 entry has 256 >>> entries, meaning it only occupies 1024 bytes in a 4096 byte page. >>> >>> In order to cut down on the wastage, treat the level 1 page table as >>> groups of two entries, which point to two consecutive 1024 byte tables >>> in the level 2 page. >>> >>> The level 2 entry isn't suitable for the kernel's use cases (there are >>> no bits to represent accessed/dirty and other important stuff that the >>> Linux MM wants) so we maintain the hardware page tables and a separate >>> set that Linux uses in the same page. Again, the software tables are >>> consecutive, so from Linux's perspective, the level 2 page tables >>> have 512 entries in them and occupy one full page. >>> >>> This is documented in arch/arm/include/asm/pgtable-2level.h >>> >>> However, what this means is that from the software perspective, the >>> level 1 page table descriptors are an array of two entries, both of >>> which need to be setup when creating a level 2 page table, but only >>> the first one should ever be dereferenced when walking the tables, >>> otherwise the code that walks the second level of page table entries >>> will walk off the end of the software table into the actual hardware >>> descriptors. >>> >>> I've no idea what the idea is behind introducing pgd_get() and what >>> it's semantics are, so I can't comment further. >> >> The helper is intended to read the value of the entry pointed to by the passed >> in pointer. And it shoiuld be read in a "single copy atomic" manner, meaning no >> tearing. Further, the PTL is expected to be held when calling the getter. If the >> HW can write to the entry such that its racing with the lock holder (i.e. HW >> update of access/dirty) then READ_ONCE() should be suitable for most >> architectures. If there is no possibility of racing (because HW doesn't write to >> the entry), then a simple dereference would be sufficient, I think (which is >> what the core code was already doing in most cases). > > The core code should be making no access to the PGD entries on 32-bit > ARM since the PGD level does not exist. Writes are done at PMD level > in arch code. Reads are done by core code at PMD level. > > It feels to me like pgd_get() just doesn't fit the model to which 32-bit > ARM was designed to use decades ago, so I want full details about what > pgd_get() is going to be used for and how it is going to be used, > because I feel completely in the dark over this new development. I fear > that someone hasn't understood the Linux page table model if they're > wanting to access stuff at levels that effectively "aren't implemented" > in the architecture specific kernel model of the page tables. This change isn't as big and scary as I think you fear. The core-mm today dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See follow_pfnmap_start(), gup_fast_pgd_leaf(), and many other sites. These changes aim to abstract those dereferences into an inline function that the architecture can override and implement if it so wishes. The core-mm implements default versions of these helper functions which do READ_ONCE(), but does not currently use them consistently. From Anshuman's comments earlier in this thread, it looked to me like the arm pgd_t type is too big to read with READ_ONCE() - it can't be atomically read on that arch. So my proposal was to implement the override for arm to do exactly what the core-mm used to do, which is a pointer dereference. So that would result in exact same behaviour for the arm arch. > > Essentially, on 32-bit 2-level ARM, the PGD is merely indexed by the > virtual address. As far as the kernel is concerned, each entry is > 64-bit, and the generic kernel code has no business accessing that > through the pgd pointer. > > The pgd pointer is passed through the PUD and PMD levels, where it is > typecast down through the kernel layers to a pmd_t pointer, where it > becomes a 32-bit quantity. This results in only the _first_ level 1 > pointer being dereferenced by kernel code to a 32-bit pmd_t quantity. > pmd_page_vaddr() converts this pmd_t quantity to a pte pointer (which > points at the software level 2 page tables, not the hardware page > tables.) As an aside, my understanding of Linux's pgtable model differs from what you describe. As I understand it, Linux's logical page table model has 5 levels (pgd, p4d, pud, pmd, pte). If an arch doesn't support all 5 levels, then the middle levels can be folded away (p4d first, then pud, then pmd). But the core-mm still logically walks all 5 levels. So if the HW supports 2 levels, those levels are (pgd, pte). But you are suggesting that arm exposes pmd and pte, which is not what Linux expects? (Perhaps you call it the pmd in the arch, but that is being folded and accessed through the pgd helpers in core code, I believe? > > So, as I'm now being told that the kernel wants to dereference the > pgd level despite the model I describe above, alarm bells are ringing. > I want full information please. > This is not new; the kernel already dereferences the pgd pointers. Thanks, Ryan
On Thu, Sep 19, 2024 at 07:49:09PM +0200, Ryan Roberts wrote: > On 19/09/2024 18:06, Russell King (Oracle) wrote: > > On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote: > >>> 32-bit arm uses, in some circumstances, an array because each level 1 > >>> page table entry is actually two descriptors. It needs to be this way > >>> because each level 2 table pointed to by each level 1 entry has 256 > >>> entries, meaning it only occupies 1024 bytes in a 4096 byte page. > >>> > >>> In order to cut down on the wastage, treat the level 1 page table as > >>> groups of two entries, which point to two consecutive 1024 byte tables > >>> in the level 2 page. > >>> > >>> The level 2 entry isn't suitable for the kernel's use cases (there are > >>> no bits to represent accessed/dirty and other important stuff that the > >>> Linux MM wants) so we maintain the hardware page tables and a separate > >>> set that Linux uses in the same page. Again, the software tables are > >>> consecutive, so from Linux's perspective, the level 2 page tables > >>> have 512 entries in them and occupy one full page. > >>> > >>> This is documented in arch/arm/include/asm/pgtable-2level.h > >>> > >>> However, what this means is that from the software perspective, the > >>> level 1 page table descriptors are an array of two entries, both of > >>> which need to be setup when creating a level 2 page table, but only > >>> the first one should ever be dereferenced when walking the tables, > >>> otherwise the code that walks the second level of page table entries > >>> will walk off the end of the software table into the actual hardware > >>> descriptors. > >>> > >>> I've no idea what the idea is behind introducing pgd_get() and what > >>> it's semantics are, so I can't comment further. > >> > >> The helper is intended to read the value of the entry pointed to by the passed > >> in pointer. And it shoiuld be read in a "single copy atomic" manner, meaning no > >> tearing. Further, the PTL is expected to be held when calling the getter. If the > >> HW can write to the entry such that its racing with the lock holder (i.e. HW > >> update of access/dirty) then READ_ONCE() should be suitable for most > >> architectures. If there is no possibility of racing (because HW doesn't write to > >> the entry), then a simple dereference would be sufficient, I think (which is > >> what the core code was already doing in most cases). > > > > The core code should be making no access to the PGD entries on 32-bit > > ARM since the PGD level does not exist. Writes are done at PMD level > > in arch code. Reads are done by core code at PMD level. > > > > It feels to me like pgd_get() just doesn't fit the model to which 32-bit > > ARM was designed to use decades ago, so I want full details about what > > pgd_get() is going to be used for and how it is going to be used, > > because I feel completely in the dark over this new development. I fear > > that someone hasn't understood the Linux page table model if they're > > wanting to access stuff at levels that effectively "aren't implemented" > > in the architecture specific kernel model of the page tables. > > This change isn't as big and scary as I think you fear. The situation is as I state above. Core code must _not_ dereference pgd pointers on 32-bit ARM. > The core-mm today > dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See > follow_pfnmap_start(), Doesn't seem to exist at least not in 6.11. > gup_fast_pgd_leaf(), and many other sites. Only built when CONFIG_HAVE_GUP_FAST is set, which 32-bit ARM doesn't set because its meaningless there, except when LPAE is in use (which is basically the situation I'm discussing.) > These changes > aim to abstract those dereferences into an inline function that the architecture > can override and implement if it so wishes. > > The core-mm implements default versions of these helper functions which do > READ_ONCE(), but does not currently use them consistently. > > From Anshuman's comments earlier in this thread, it looked to me like the arm > pgd_t type is too big to read with READ_ONCE() - it can't be atomically read on > that arch. So my proposal was to implement the override for arm to do exactly > what the core-mm used to do, which is a pointer dereference. So that would > result in exact same behaviour for the arm arch. Let me say this again: core code must NOT dereference pgds on 32-bit non-LPAE ARM. They are meaningless to core code. A pgd_t does not reference a single entry in hardware. It references two entries. > > Essentially, on 32-bit 2-level ARM, the PGD is merely indexed by the > > virtual address. As far as the kernel is concerned, each entry is > > 64-bit, and the generic kernel code has no business accessing that > > through the pgd pointer. > > > > The pgd pointer is passed through the PUD and PMD levels, where it is > > typecast down through the kernel layers to a pmd_t pointer, where it > > becomes a 32-bit quantity. This results in only the _first_ level 1 > > pointer being dereferenced by kernel code to a 32-bit pmd_t quantity. > > pmd_page_vaddr() converts this pmd_t quantity to a pte pointer (which > > points at the software level 2 page tables, not the hardware page > > tables.) > > As an aside, my understanding of Linux's pgtable model differs from what you > describe. As I understand it, Linux's logical page table model has 5 levels > (pgd, p4d, pud, pmd, pte). If an arch doesn't support all 5 levels, then the > middle levels can be folded away (p4d first, then pud, then pmd). But the > core-mm still logically walks all 5 levels. So if the HW supports 2 levels, > those levels are (pgd, pte). But you are suggesting that arm exposes pmd and > pte, which is not what Linux expects? (Perhaps you call it the pmd in the arch, > but that is being folded and accessed through the pgd helpers in core code, I > believe? What ARM does dates from before the Linux MM invented the current "folding" method when we had three page table levels - pgd, pmd and pte. The current folding techniques were invented well after 32-bit ARM was implemented, which was using the original idea of how to fold the page tables. The new folding came up with a totally different way of doing it, and I looked into converting 32-bit ARM over to it, but it wasn't possible to do so with the need for two level-1 entries to be managed for each level-2 page table. > > So, as I'm now being told that the kernel wants to dereference the > > pgd level despite the model I describe above, alarm bells are ringing. > > I want full information please. > > > > This is not new; the kernel already dereferences the pgd pointers. Consider that 32-bit ARM has been this way for decades (Linux was ported to 32-bit ARM by me back in the 1990s - so it's about 30 years old.) Compare that to what you're stating is "not new"... I beg to differ with your opinion on what is new and what isn't. It's all about the relative time. This is how the page tables are walked: static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address) { return (pgd + pgd_index(address)); } #define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address)) This returns a pointer to the pgd. This is then used with p4d_offset() when walking the next level, and this is defined on 32-bit ARM from include/asm-generic/pgtable-nop4d.h: static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address) { return (p4d_t *)pgd; } Then from include/asm-generic/pgtable-nopud.h: static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address) { return (pud_t *)p4d; } Then from arch/arm/include/asm/pgtable-2level.h: static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) { return (pmd_t *)pud; } All of the above casts result in the pgd_t pointer being cast down to a pmd_t pointer. Now, looking at stuff in mm/memory.c such as unmap_page_range(). pgd = pgd_offset(vma->vm_mm, addr); This gets the pgd pointer into the level 1 page tables associated with addr, and passes it down to zap_p4d_range(). That passes it to p4d_offset() without dereferencing it, which on 32-bit ARM, merely casts the pgd_t pointer to a p4d_t pointer. Since a p4d_t is defined to be a struct of a pgd_t, this also points at an array of two 32-bit quantities. This pointer is passed down to zap_pud_range(). zap_pud_range() passes this pointer to pud_offset(), again without dereferencing it, and we end up with a pud_t pointer. Since pud_t is defined to be a struct of p4d_t, this also points to an array of two 32-bit quantities. We then have: if (pud_trans_huge(*pud) || pud_devmap(*pud)) { These is an implicit memory copy/access between the memory pointed to by pud, and their destination (which might be a register). However, these are optimised away because 32-bit ARM doesn't set HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD nor ARCH_HAS_PTE_DEVMAP (as neither inline function make use of their argument.) NOTE: If making these use READ_ONCE results in an access that can not be optimised away, that is a bug that needs to be addressed. zap_pud_range() then passes the pud pointer to zap_pmd_range(). zap_pmd_range() passes this pointer to pud_offset() with no further dereferences, and this gets cast to a pmd_t pointer, which is a pointer to the first 32-bit quantity pointed to by the pgd_t pointer. All the dereferences from this point on are 32-bit which can be done as single-copy atomic accesses. This will be the first real access to the level-1 page tables in this code path as the code stands today, and from this point on, accesses to the page tables are as the architecture intends them to be. Now, realise that for all of the accesses above that have all been optimised away, none of that code even existed when 32-bit ARM was using this method. The addition of these features not intefering with the way 32-bit non-LPAE ARM works relies on all of those accesses being optimised away, and they need to continue to be so going forward. Maybe that means that this new (and I mean new in relative terms compared to the age of the 32-bit ARM code) pgdp_get() accessor needs to be a non-dereferencing operation, so something like: #define pgdp_get(pgdp) ((pgd_t){ }) in arch/arm/include/asm/pgtable-2level.h (note the corrected spelling of pgdp), and the existing pgdp_get() moved to arch/arm/include/asm/pgtable-3level.h. This isn't tested. However, let me say this again... without knowing exactly how and where pgdp_get() is intended to be used, I'm clutching at straws here. Even looking at Linus' tree, there's very little in evidence there to suggest how pgdp_get() is intended to be used. For example, there's no references to it in mm/. Please realise that I have _no_ _clue_ what "[PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries" is proposing. I wasn't on its Cc list. I haven't seen the patch. The first I knew anything about this was with the email that Anshuman Khandual sent in response to the kernel build bot's build error. I'm afraid that the kernel build bot's build error means that this patch: commit eba2591d99d1f14a04c8a8a845ab0795b93f5646 Author: Alexandre Ghiti <alexghiti@rivosinc.com> Date: Wed Dec 13 21:29:59 2023 +0100 mm: Introduce pudp/p4dp/pgdp_get() functions is actually broken. I'm sorry that I didn't review that, but how the series looked when it landed in my mailbox, it looked like it was specific to RISC-V and of no interest to me, so I didn't bother reading it (I get _lots_ of email, I can't read everything.) This is how it looks like in my mailbox (and note that they're marked as new to this day): 3218 N T Dec 13 Alexandre Ghiti ( 0) [PATCH v2 0/4] riscv: Use READ_ONCE()/WRI 3219 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 1/4] riscv: Use WRITE_ONCE() 3220 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 2/4] mm: Introduce pudp/p4dp 3221 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 3/4] riscv: mm: Only compile 3222 N T Dec 13 Alexandre Ghiti ( 0) └─>[PATCH v2 4/4] riscv: Use accessors to 3223 N C Dec 14 Anup Patel ( 0) └─> Sorry, but I'm not even going to look at something like that when it looks like it's for RISC-V and nothing else. One final point... because I'm sure someone's going to say "but you were in the To: header". I've long since given up using "am I in the Cc/To header" to carry any useful or meaningful information to indicate whether it's something I should read. I'm afraid that the kernel community has long since taught me that is of no value what so ever, so I merely go by "does this look of any interest". If not, I don't bother even _opening_ the email.
On 19/09/2024 21:25, Russell King (Oracle) wrote: > On Thu, Sep 19, 2024 at 07:49:09PM +0200, Ryan Roberts wrote: >> On 19/09/2024 18:06, Russell King (Oracle) wrote: >>> On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote: >>>>> 32-bit arm uses, in some circumstances, an array because each level 1 >>>>> page table entry is actually two descriptors. It needs to be this way >>>>> because each level 2 table pointed to by each level 1 entry has 256 >>>>> entries, meaning it only occupies 1024 bytes in a 4096 byte page. >>>>> >>>>> In order to cut down on the wastage, treat the level 1 page table as >>>>> groups of two entries, which point to two consecutive 1024 byte tables >>>>> in the level 2 page. >>>>> >>>>> The level 2 entry isn't suitable for the kernel's use cases (there are >>>>> no bits to represent accessed/dirty and other important stuff that the >>>>> Linux MM wants) so we maintain the hardware page tables and a separate >>>>> set that Linux uses in the same page. Again, the software tables are >>>>> consecutive, so from Linux's perspective, the level 2 page tables >>>>> have 512 entries in them and occupy one full page. >>>>> >>>>> This is documented in arch/arm/include/asm/pgtable-2level.h >>>>> >>>>> However, what this means is that from the software perspective, the >>>>> level 1 page table descriptors are an array of two entries, both of >>>>> which need to be setup when creating a level 2 page table, but only >>>>> the first one should ever be dereferenced when walking the tables, >>>>> otherwise the code that walks the second level of page table entries >>>>> will walk off the end of the software table into the actual hardware >>>>> descriptors. >>>>> >>>>> I've no idea what the idea is behind introducing pgd_get() and what >>>>> it's semantics are, so I can't comment further. >>>> >>>> The helper is intended to read the value of the entry pointed to by the passed >>>> in pointer. And it shoiuld be read in a "single copy atomic" manner, meaning no >>>> tearing. Further, the PTL is expected to be held when calling the getter. If the >>>> HW can write to the entry such that its racing with the lock holder (i.e. HW >>>> update of access/dirty) then READ_ONCE() should be suitable for most >>>> architectures. If there is no possibility of racing (because HW doesn't write to >>>> the entry), then a simple dereference would be sufficient, I think (which is >>>> what the core code was already doing in most cases). >>> >>> The core code should be making no access to the PGD entries on 32-bit >>> ARM since the PGD level does not exist. Writes are done at PMD level >>> in arch code. Reads are done by core code at PMD level. >>> >>> It feels to me like pgd_get() just doesn't fit the model to which 32-bit >>> ARM was designed to use decades ago, so I want full details about what >>> pgd_get() is going to be used for and how it is going to be used, >>> because I feel completely in the dark over this new development. I fear >>> that someone hasn't understood the Linux page table model if they're >>> wanting to access stuff at levels that effectively "aren't implemented" >>> in the architecture specific kernel model of the page tables. >> >> This change isn't as big and scary as I think you fear. > > The situation is as I state above. Core code must _not_ dereference pgd > pointers on 32-bit ARM. Let's just rewind a bit. This thread exists because the kernel test robot failed to compile pgd_none_or_clear_bad() (a core-mm function) for the arm architecture after Anshuman changed the direct pgd dereference to pgdp_get(). The reason compilation failed is because arm defines its own pgdp_get() override, but it is broken (there is a typo). Code before Anshuman's change: static inline int pgd_none_or_clear_bad(pgd_t *pgd) { if (pgd_none(*pgd)) return 1; if (unlikely(pgd_bad(*pgd))) { pgd_clear_bad(pgd); return 1; } return 0; } Code after Anshuman's change: static inline int pgd_none_or_clear_bad(pgd_t *pgd) { pgd_t old_pgd = pgdp_get(pgd); if (pgd_none(old_pgd)) return 1; if (unlikely(pgd_bad(old_pgd))) { pgd_clear_bad(pgd); return 1; } return 0; } So the kernel _is_ alreday dereferencing pgd pointers for the arm arch, and has been since the beginning of (git) time. Note that pgd_none_or_clear_bad() is called from core code and from arm arch code. As an aside, the kernel also dereferences p4d, pud, pmd and pte pointers in various circumstances. And other changes in this series are also replacing those direct dereferences with calls to similar helpers. The fact that these are all folded (by a custom arm implementation if I've understood the below correctly) just means that each dereference is returning what you would call the pmd from the HW perspective, I think? > >> The core-mm today >> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See >> follow_pfnmap_start(), > > Doesn't seem to exist at least not in 6.11. Appologies, I'm on mm-unstable and that isn't upstream yet. See follow_pte() in v6.11 or __apply_to_page_range(), or pgd_none_or_clear_bad() as per above. > >> gup_fast_pgd_leaf(), and many other sites. > > Only built when CONFIG_HAVE_GUP_FAST is set, which 32-bit ARM doesn't > set because its meaningless there, except when LPAE is in use (which is > basically the situation I'm discussing.) > >> These changes >> aim to abstract those dereferences into an inline function that the architecture >> can override and implement if it so wishes. >> >> The core-mm implements default versions of these helper functions which do >> READ_ONCE(), but does not currently use them consistently. >> >> From Anshuman's comments earlier in this thread, it looked to me like the arm >> pgd_t type is too big to read with READ_ONCE() - it can't be atomically read on >> that arch. So my proposal was to implement the override for arm to do exactly >> what the core-mm used to do, which is a pointer dereference. So that would >> result in exact same behaviour for the arm arch. > > Let me say this again: core code must NOT dereference pgds on 32-bit > non-LPAE ARM. They are meaningless to core code. A pgd_t does not > reference a single entry in hardware. It references two entries. OK, so there are 3 options; either I have misunderstood what the core code is doing (because as per above, I'm asserting that core code _is_ dereferencing pgd pointers), or the core code is dereferencing and that is buggy, or the core code is derefencing and its working as designed. I believe its the latter, but am willing to be proved wrong. > >>> Essentially, on 32-bit 2-level ARM, the PGD is merely indexed by the >>> virtual address. As far as the kernel is concerned, each entry is >>> 64-bit, and the generic kernel code has no business accessing that >>> through the pgd pointer. >>> >>> The pgd pointer is passed through the PUD and PMD levels, where it is >>> typecast down through the kernel layers to a pmd_t pointer, where it >>> becomes a 32-bit quantity. This results in only the _first_ level 1 >>> pointer being dereferenced by kernel code to a 32-bit pmd_t quantity. >>> pmd_page_vaddr() converts this pmd_t quantity to a pte pointer (which >>> points at the software level 2 page tables, not the hardware page >>> tables.) >> >> As an aside, my understanding of Linux's pgtable model differs from what you >> describe. As I understand it, Linux's logical page table model has 5 levels >> (pgd, p4d, pud, pmd, pte). If an arch doesn't support all 5 levels, then the >> middle levels can be folded away (p4d first, then pud, then pmd). But the >> core-mm still logically walks all 5 levels. So if the HW supports 2 levels, >> those levels are (pgd, pte). But you are suggesting that arm exposes pmd and >> pte, which is not what Linux expects? (Perhaps you call it the pmd in the arch, >> but that is being folded and accessed through the pgd helpers in core code, I >> believe? > > What ARM does dates from before the Linux MM invented the current > "folding" method when we had three page table levels - pgd, pmd > and pte. The current folding techniques were invented well after > 32-bit ARM was implemented, which was using the original idea of > how to fold the page tables. > > The new folding came up with a totally different way of doing it, > and I looked into converting 32-bit ARM over to it, but it wasn't > possible to do so with the need for two level-1 entries to be > managed for each level-2 page table. > >>> So, as I'm now being told that the kernel wants to dereference the >>> pgd level despite the model I describe above, alarm bells are ringing. >>> I want full information please. >>> >> >> This is not new; the kernel already dereferences the pgd pointers. > > Consider that 32-bit ARM has been this way for decades (Linux was ported > to 32-bit ARM by me back in the 1990s - so it's about 30 years old.) > Compare that to what you're stating is "not new"... I beg to differ with > your opinion on what is new and what isn't. It's all about the relative > time. By "not new" I meant that it's not introduced by this series. The kernel's dereferencing of pgd pointers was present before this series came along. > > This is how the page tables are walked: > > static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address) > { > return (pgd + pgd_index(address)); > } > > #define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address)) > > This returns a pointer to the pgd. This is then used with p4d_offset() > when walking the next level, and this is defined on 32-bit ARM from > include/asm-generic/pgtable-nop4d.h: > > static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address) > { > return (p4d_t *)pgd; > } > > Then from include/asm-generic/pgtable-nopud.h: > > static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address) > { > return (pud_t *)p4d; > } > > Then from arch/arm/include/asm/pgtable-2level.h: > > static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) > { > return (pmd_t *)pud; > } > > All of the above casts result in the pgd_t pointer being cast down > to a pmd_t pointer. > > Now, looking at stuff in mm/memory.c such as unmap_page_range(). > > pgd = pgd_offset(vma->vm_mm, addr); > > This gets the pgd pointer into the level 1 page tables associated > with addr, and passes it down to zap_p4d_range(). > > That passes it to p4d_offset() without dereferencing it, which on > 32-bit ARM, merely casts the pgd_t pointer to a p4d_t pointer. Since > a p4d_t is defined to be a struct of a pgd_t, this also points at an > array of two 32-bit quantities. This pointer is passed down to > zap_pud_range(). > > zap_pud_range() passes this pointer to pud_offset(), again without > dereferencing it, and we end up with a pud_t pointer. Since pud_t is > defined to be a struct of p4d_t, this also points to an array of two > 32-bit quantities. > > We then have: > > if (pud_trans_huge(*pud) || pud_devmap(*pud)) { > > These is an implicit memory copy/access between the memory pointed to > by pud, and their destination (which might be a register). However, > these are optimised away because 32-bit ARM doesn't set > HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD nor ARCH_HAS_PTE_DEVMAP (as > neither inline function make use of their argument.) > > NOTE: If making these use READ_ONCE results in an access that can not > be optimised away, that is a bug that needs to be addressed. > > zap_pud_range() then passes the pud pointer to zap_pmd_range(). > > zap_pmd_range() passes this pointer to pud_offset() with no further > dereferences, and this gets cast to a pmd_t pointer, which is a > pointer to the first 32-bit quantity pointed to by the pgd_t pointer. > > All the dereferences from this point on are 32-bit which can be done > as single-copy atomic accesses. This will be the first real access > to the level-1 page tables in this code path as the code stands today, > and from this point on, accesses to the page tables are as the > architecture intends them to be. > > > Now, realise that for all of the accesses above that have all been > optimised away, none of that code even existed when 32-bit ARM was > using this method. The addition of these features not intefering > with the way 32-bit non-LPAE ARM works relies on all of those > accesses being optimised away, and they need to continue to be so > going forward. > > > Maybe that means that this new (and I mean new in relative terms > compared to the age of the 32-bit ARM code) pgdp_get() accessor > needs to be a non-dereferencing operation, so something like: > > #define pgdp_get(pgdp) ((pgd_t){ }) I'm afraid I haven't digested all these arm-specific details. But if I'm right that the core kernel does and is correct to dereference pgd pointers for these non-LPAE arm builds, then I think you at least need arm's implementation to be: #define pgdp_get(pgdp) (*pgdp) Thanks, Ryan > > in arch/arm/include/asm/pgtable-2level.h (note the corrected > spelling of pgdp), and the existing pgdp_get() moved to > arch/arm/include/asm/pgtable-3level.h. This isn't tested. > > However, let me say this again... without knowing exactly how > and where pgdp_get() is intended to be used, I'm clutching at > straws here. Even looking at Linus' tree, there's very little in > evidence there to suggest how pgdp_get() is intended to be used. > For example, there's no references to it in mm/. > > > Please realise that I have _no_ _clue_ what "[PATCH V2 7/7] mm: Use > pgdp_get() for accessing PGD entries" is proposing. I wasn't on its > Cc list. I haven't seen the patch. The first I knew anything about > this was with the email that Anshuman Khandual sent in response to > the kernel build bot's build error. Here is the full series for context: https://lore.kernel.org/linux-mm/20240917073117.1531207-1-anshuman.khandual@arm.com/ > > I'm afraid that the kernel build bot's build error means that this > patch: > > commit eba2591d99d1f14a04c8a8a845ab0795b93f5646 > Author: Alexandre Ghiti <alexghiti@rivosinc.com> > Date: Wed Dec 13 21:29:59 2023 +0100 > > mm: Introduce pudp/p4dp/pgdp_get() functions > > is actually broken. I'm sorry that I didn't review that, but how the > series looked when it landed in my mailbox, it looked like it was > specific to RISC-V and of no interest to me, so I didn't bother > reading it (I get _lots_ of email, I can't read everything.) This > is how it looks like in my mailbox (and note that they're marked > as new to this day): > > 3218 N T Dec 13 Alexandre Ghiti ( 0) [PATCH v2 0/4] riscv: Use READ_ONCE()/WRI > 3219 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 1/4] riscv: Use WRITE_ONCE() > 3220 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 2/4] mm: Introduce pudp/p4dp > 3221 N T Dec 13 Alexandre Ghiti ( 0) ├─>[PATCH v2 3/4] riscv: mm: Only compile > 3222 N T Dec 13 Alexandre Ghiti ( 0) └─>[PATCH v2 4/4] riscv: Use accessors to > 3223 N C Dec 14 Anup Patel ( 0) └─> > > Sorry, but I'm not even going to look at something like that when it > looks like it's for RISC-V and nothing else. > > One final point... because I'm sure someone's going to say "but you > were in the To: header". I've long since given up using "am I in the > Cc/To header" to carry any useful or meaningful information to > indicate whether it's something I should read. I'm afraid that the > kernel community has long since taught me that is of no value what > so ever, so I merely go by "does this look of any interest". If not, > I don't bother even _opening_ the email. >
On Fri, Sep 20, 2024 at 08:57:23AM +0200, Ryan Roberts wrote: > On 19/09/2024 21:25, Russell King (Oracle) wrote: > > On Thu, Sep 19, 2024 at 07:49:09PM +0200, Ryan Roberts wrote: > >> On 19/09/2024 18:06, Russell King (Oracle) wrote: > >>> On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote: > >>>>> 32-bit arm uses, in some circumstances, an array because each level 1 > >>>>> page table entry is actually two descriptors. It needs to be this way > >>>>> because each level 2 table pointed to by each level 1 entry has 256 > >>>>> entries, meaning it only occupies 1024 bytes in a 4096 byte page. > >>>>> > >>>>> In order to cut down on the wastage, treat the level 1 page table as > >>>>> groups of two entries, which point to two consecutive 1024 byte tables > >>>>> in the level 2 page. > >>>>> > >>>>> The level 2 entry isn't suitable for the kernel's use cases (there are > >>>>> no bits to represent accessed/dirty and other important stuff that the > >>>>> Linux MM wants) so we maintain the hardware page tables and a separate > >>>>> set that Linux uses in the same page. Again, the software tables are > >>>>> consecutive, so from Linux's perspective, the level 2 page tables > >>>>> have 512 entries in them and occupy one full page. > >>>>> > >>>>> This is documented in arch/arm/include/asm/pgtable-2level.h > >>>>> > >>>>> However, what this means is that from the software perspective, the > >>>>> level 1 page table descriptors are an array of two entries, both of > >>>>> which need to be setup when creating a level 2 page table, but only > >>>>> the first one should ever be dereferenced when walking the tables, > >>>>> otherwise the code that walks the second level of page table entries > >>>>> will walk off the end of the software table into the actual hardware > >>>>> descriptors. > >>>>> > >>>>> I've no idea what the idea is behind introducing pgd_get() and what > >>>>> it's semantics are, so I can't comment further. > >>>> > >>>> The helper is intended to read the value of the entry pointed to by the passed > >>>> in pointer. And it shoiuld be read in a "single copy atomic" manner, meaning no > >>>> tearing. Further, the PTL is expected to be held when calling the getter. If the > >>>> HW can write to the entry such that its racing with the lock holder (i.e. HW > >>>> update of access/dirty) then READ_ONCE() should be suitable for most > >>>> architectures. If there is no possibility of racing (because HW doesn't write to > >>>> the entry), then a simple dereference would be sufficient, I think (which is > >>>> what the core code was already doing in most cases). > >>> > >>> The core code should be making no access to the PGD entries on 32-bit > >>> ARM since the PGD level does not exist. Writes are done at PMD level > >>> in arch code. Reads are done by core code at PMD level. > >>> > >>> It feels to me like pgd_get() just doesn't fit the model to which 32-bit > >>> ARM was designed to use decades ago, so I want full details about what > >>> pgd_get() is going to be used for and how it is going to be used, > >>> because I feel completely in the dark over this new development. I fear > >>> that someone hasn't understood the Linux page table model if they're > >>> wanting to access stuff at levels that effectively "aren't implemented" > >>> in the architecture specific kernel model of the page tables. > >> > >> This change isn't as big and scary as I think you fear. > > > > The situation is as I state above. Core code must _not_ dereference pgd > > pointers on 32-bit ARM. > > Let's just rewind a bit. This thread exists because the kernel test robot failed > to compile pgd_none_or_clear_bad() (a core-mm function) for the arm architecture > after Anshuman changed the direct pgd dereference to pgdp_get(). The reason > compilation failed is because arm defines its own pgdp_get() override, but it is > broken (there is a typo). Let's not rewind, because had you fully read and digested my reply, you would have seen why this isn't a problem... but let me spell it out. > > Code before Anshuman's change: > > static inline int pgd_none_or_clear_bad(pgd_t *pgd) > { > if (pgd_none(*pgd)) > return 1; > if (unlikely(pgd_bad(*pgd))) { > pgd_clear_bad(pgd); > return 1; > } > return 0; > } This isn't a problem as the code stands. While there is a dereference in C, that dereference is a simple struct copy, something that we use everywhere in the kernel. However, that is as far as it goes, because neither pgd_none() and pgd_bad() make use of their argument, and thus the compiler will optimise it away, resulting in no actual access to the page tables - _as_ _intended_. If these are going to be converted to pgd_get(), then we need pgd_get() to _also_ be optimised away, and if e.g. this is the only place that pgd_get() is going to be used, the suggestion I made in my previous email is entirely reasonable, since we know that the result of pgd_get() will not actually be used. > As an aside, the kernel also dereferences p4d, pud, pmd and pte pointers in > various circumstances. I already covered these in my previous reply. > And other changes in this series are also replacing those > direct dereferences with calls to similar helpers. The fact that these are all > folded (by a custom arm implementation if I've understood the below correctly) > just means that each dereference is returning what you would call the pmd from > the HW perspective, I think? It'll "return" the first of each pair of level-1 page table entries, which is pgd[0] or *p4d, *pud, *pmd - but all of these except *pmd need to be optimised away, so throwing lots of READ_ONCE() around this code without considering this is certainly the wrong approach. > >> The core-mm today > >> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See > >> follow_pfnmap_start(), > > > > Doesn't seem to exist at least not in 6.11. > > Appologies, I'm on mm-unstable and that isn't upstream yet. See follow_pte() in > v6.11 or __apply_to_page_range(), or pgd_none_or_clear_bad() as per above. Looking at follow_pte(), it's not a problem. I think we wouldn't be having this conversation before: commit a32618d28dbe6e9bf8ec508ccbc3561a7d7d32f0 Author: Russell King <rmk+kernel@arm.linux.org.uk> Date: Tue Nov 22 17:30:28 2011 +0000 ARM: pgtable: switch to use pgtable-nopud.h where: -#define pgd_none(pgd) (0) -#define pgd_bad(pgd) (0) existed before this commit - and thus the dereference in things like: pgd_none(*pgd) wouldn't even be visible to beyond the preprocessor step.
>> Let's just rewind a bit. This thread exists because the kernel test robot failed >> to compile pgd_none_or_clear_bad() (a core-mm function) for the arm architecture >> after Anshuman changed the direct pgd dereference to pgdp_get(). The reason >> compilation failed is because arm defines its own pgdp_get() override, but it is >> broken (there is a typo). > > Let's not rewind, because had you fully read and digested my reply, you > would have seen why this isn't a problem... but let me spell it out. > >> >> Code before Anshuman's change: >> >> static inline int pgd_none_or_clear_bad(pgd_t *pgd) >> { >> if (pgd_none(*pgd)) >> return 1; >> if (unlikely(pgd_bad(*pgd))) { >> pgd_clear_bad(pgd); >> return 1; >> } >> return 0; >> } > > This isn't a problem as the code stands. While there is a dereference > in C, that dereference is a simple struct copy, something that we use > everywhere in the kernel. However, that is as far as it goes, because > neither pgd_none() and pgd_bad() make use of their argument, and thus > the compiler will optimise it away, resulting in no actual access to > the page tables - _as_ _intended_. Right. Are you saying you depend upon those loads being optimized away for correctness or performance reasons? > > If these are going to be converted to pgd_get(), then we need pgd_get() > to _also_ be optimised away, OK, agreed. So perhaps the best approach is to modify the existing default pxdp_get() implementations to just do a C dereference. That will ensure that there are no intended consequences, unlike moving to READ_ONCE() by default. Then riscv (which I think is the only arch to actually use pxdp_get() currently?) will need its own pxdp_get() overrides, which use READ_ONCE(). arm64 would also define its own overrides in terms of READ_ONCE() to ensure single copy atomicity in the presence of HW updates. How does that sound to you? > and if e.g. this is the only place that > pgd_get() is going to be used, the suggestion I made in my previous > email is entirely reasonable, since we know that the result of pgd_get() > will not actually be used. I guess you could do that as an arm-specific override, but I don't think it adds anything over using my proposed reworked default? Your call. > >> As an aside, the kernel also dereferences p4d, pud, pmd and pte pointers in >> various circumstances. > > I already covered these in my previous reply. > >> And other changes in this series are also replacing those >> direct dereferences with calls to similar helpers. The fact that these are all >> folded (by a custom arm implementation if I've understood the below correctly) >> just means that each dereference is returning what you would call the pmd from >> the HW perspective, I think? > > It'll "return" the first of each pair of level-1 page table entries, > which is pgd[0] or *p4d, *pud, *pmd - but all of these except *pmd > need to be optimised away, so throwing lots of READ_ONCE() around > this code without considering this is certainly the wrong approach. Yep, got it. > >>>> The core-mm today >>>> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See >>>> follow_pfnmap_start(), >>> >>> Doesn't seem to exist at least not in 6.11. >> >> Appologies, I'm on mm-unstable and that isn't upstream yet. See follow_pte() in >> v6.11 or __apply_to_page_range(), or pgd_none_or_clear_bad() as per above. > > Looking at follow_pte(), it's not a problem. > > I think we wouldn't be having this conversation before: > > commit a32618d28dbe6e9bf8ec508ccbc3561a7d7d32f0 > Author: Russell King <rmk+kernel@arm.linux.org.uk> > Date: Tue Nov 22 17:30:28 2011 +0000 > > ARM: pgtable: switch to use pgtable-nopud.h > > where: > -#define pgd_none(pgd) (0) > -#define pgd_bad(pgd) (0) > > existed before this commit - and thus the dereference in things like: > > pgd_none(*pgd) > > wouldn't even be visible to beyond the preprocessor step. >
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c index fcaceac60659..6aeccbd440e7 100644 --- a/drivers/misc/sgi-gru/grufault.c +++ b/drivers/misc/sgi-gru/grufault.c @@ -212,7 +212,7 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr, pte_t pte; pgdp = pgd_offset(vma->vm_mm, vaddr); - if (unlikely(pgd_none(*pgdp))) + if (unlikely(pgd_none(pgdp_get(pgdp)))) goto err; p4dp = p4d_offset(pgdp, vaddr); diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 4044e15cdfd9..6d33c7a9eb01 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -304,7 +304,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, assert_fault_locked(vmf); pgd = pgd_offset(mm, address); - if (!pgd_present(*pgd)) + if (!pgd_present(pgdp_get(pgd))) goto out; p4d = p4d_offset(pgd, address); if (!p4d_present(p4dp_get(p4d))) diff --git a/include/linux/mm.h b/include/linux/mm.h index 1bb1599b5779..1978a4b1fcf5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2819,7 +2819,7 @@ int __pte_alloc_kernel(pmd_t *pmd); static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address) { - return (unlikely(pgd_none(*pgd)) && __p4d_alloc(mm, pgd, address)) ? + return (unlikely(pgd_none(pgdp_get(pgd))) && __p4d_alloc(mm, pgd, address)) ? NULL : p4d_offset(pgd, address); } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 689cd5a32157..6d12ae7e3982 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1088,7 +1088,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b) #define set_pgd_safe(pgdp, pgd) \ ({ \ - WARN_ON_ONCE(pgd_present(*pgdp) && !pgd_same(*pgdp, pgd)); \ + pgd_t __old = pgdp_get(pgdp); \ + WARN_ON_ONCE(pgd_present(__old) && !pgd_same(__old, pgd)); \ set_pgd(pgdp, pgd); \ }) @@ -1241,9 +1242,11 @@ void pmd_clear_bad(pmd_t *); static inline int pgd_none_or_clear_bad(pgd_t *pgd) { - if (pgd_none(*pgd)) + pgd_t old_pgd = pgdp_get(pgd); + + if (pgd_none(old_pgd)) return 1; - if (unlikely(pgd_bad(*pgd))) { + if (unlikely(pgd_bad(old_pgd))) { pgd_clear_bad(pgd); return 1; } diff --git a/kernel/events/core.c b/kernel/events/core.c index 4e56a276ed25..1e3142211cce 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7603,7 +7603,7 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm, unsigned long addr) pte_t *ptep, pte; pgdp = pgd_offset(mm, addr); - pgd = READ_ONCE(*pgdp); + pgd = pgdp_get(pgdp); if (pgd_none(pgd)) return 0; diff --git a/mm/gup.c b/mm/gup.c index 3a97d0263052..3aff3555ba19 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1051,7 +1051,7 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, unsigned long address, unsigned int flags, struct follow_page_context *ctx) { - pgd_t *pgd; + pgd_t *pgd, old_pgd; struct mm_struct *mm = vma->vm_mm; struct page *page; @@ -1060,7 +1060,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, ctx->page_mask = 0; pgd = pgd_offset(mm, address); - if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd))) + old_pgd = pgdp_get(pgd); + if (pgd_none(old_pgd) || unlikely(pgd_bad(old_pgd))) page = no_page_table(vma, flags, address); else page = follow_p4d_mask(vma, address, pgd, flags, ctx); @@ -1111,7 +1112,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, pgd = pgd_offset_k(address); else pgd = pgd_offset_gate(mm, address); - if (pgd_none(*pgd)) + if (pgd_none(pgdp_get(pgd))) return -EFAULT; p4d = p4d_offset(pgd, address); if (p4d_none(p4dp_get(p4d))) @@ -3158,7 +3159,7 @@ static int gup_fast_pgd_leaf(pgd_t orig, pgd_t *pgdp, unsigned long addr, if (!folio) return 0; - if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) { + if (unlikely(pgd_val(orig) != pgd_val(pgdp_get(pgdp)))) { gup_put_folio(folio, refs, flags); return 0; } @@ -3267,7 +3268,7 @@ static void gup_fast_pgd_range(unsigned long addr, unsigned long end, pgdp = pgd_offset(current->mm, addr); do { - pgd_t pgd = READ_ONCE(*pgdp); + pgd_t pgd = pgdp_get(pgdp); next = pgd_addr_end(addr, end); if (pgd_none(pgd)) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4fdb91c8cc2b..294d74b03d83 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -7451,7 +7451,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, pmd_t *pmd; pgd = pgd_offset(mm, addr); - if (!pgd_present(*pgd)) + if (!pgd_present(pgdp_get(pgd))) return NULL; p4d = p4d_offset(pgd, addr); if (!p4d_present(p4dp_get(p4d))) diff --git a/mm/kasan/init.c b/mm/kasan/init.c index 02af738fee5e..c2b307716551 100644 --- a/mm/kasan/init.c +++ b/mm/kasan/init.c @@ -271,7 +271,7 @@ int __ref kasan_populate_early_shadow(const void *shadow_start, continue; } - if (pgd_none(*pgd)) { + if (pgd_none(pgdp_get(pgd))) { p4d_t *p; if (slab_is_available()) { @@ -345,7 +345,7 @@ static void kasan_free_p4d(p4d_t *p4d_start, pgd_t *pgd) return; } - p4d_free(&init_mm, (p4d_t *)page_to_virt(pgd_page(*pgd))); + p4d_free(&init_mm, (p4d_t *)page_to_virt(pgd_page(pgdp_get(pgd)))); pgd_clear(pgd); } @@ -468,10 +468,10 @@ void kasan_remove_zero_shadow(void *start, unsigned long size) next = pgd_addr_end(addr, end); pgd = pgd_offset_k(addr); - if (!pgd_present(*pgd)) + if (!pgd_present(pgdp_get(pgd))) continue; - if (kasan_p4d_table(*pgd)) { + if (kasan_p4d_table(pgdp_get(pgd))) { if (IS_ALIGNED(addr, PGDIR_SIZE) && IS_ALIGNED(next, PGDIR_SIZE)) { pgd_clear(pgd); diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c index 52150cc5ae5f..7f3c46237816 100644 --- a/mm/kasan/shadow.c +++ b/mm/kasan/shadow.c @@ -191,7 +191,7 @@ static bool shadow_mapped(unsigned long addr) pmd_t *pmd; pte_t *pte; - if (pgd_none(*pgd)) + if (pgd_none(pgdp_get(pgd))) return false; p4d = p4d_offset(pgd, addr); if (p4d_none(p4dp_get(p4d))) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 3d900cc039b3..c9397eab52bd 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -411,7 +411,7 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma, VM_BUG_ON_VMA(address == -EFAULT, vma); pgd = pgd_offset(vma->vm_mm, address); - if (!pgd_present(*pgd)) + if (!pgd_present(pgdp_get(pgd))) return 0; p4d = p4d_offset(pgd, address); if (!p4d_present(p4dp_get(p4d))) diff --git a/mm/memory.c b/mm/memory.c index 5056f39f2c3b..b4845a84ceb5 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2942,7 +2942,7 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr, unsigned long size, pte_fn_t fn, void *data, bool create) { - pgd_t *pgd; + pgd_t *pgd, old_pgd; unsigned long start = addr, next; unsigned long end = addr + size; pgtbl_mod_mask mask = 0; @@ -2954,11 +2954,12 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr, pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); - if (pgd_none(*pgd) && !create) + old_pgd = pgdp_get(pgd); + if (pgd_none(old_pgd) && !create) continue; - if (WARN_ON_ONCE(pgd_leaf(*pgd))) + if (WARN_ON_ONCE(pgd_leaf(old_pgd))) return -EINVAL; - if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) { + if (!pgd_none(old_pgd) && WARN_ON_ONCE(pgd_bad(old_pgd))) { if (!create) continue; pgd_clear_bad(pgd); @@ -6053,7 +6054,7 @@ int __p4d_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address) return -ENOMEM; spin_lock(&mm->page_table_lock); - if (pgd_present(*pgd)) { /* Another has populated it */ + if (pgd_present(pgdp_get(pgd))) { /* Another has populated it */ p4d_free(mm, new); } else { smp_wmb(); /* See comment in pmd_install() */ @@ -6143,7 +6144,7 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address, pte_t **ptepp, spinlock_t **ptlp) { struct mm_struct *mm = vma->vm_mm; - pgd_t *pgd; + pgd_t *pgd, old_pgd; p4d_t *p4d, old_p4d; pud_t *pud; pmd_t *pmd; @@ -6157,7 +6158,8 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address, goto out; pgd = pgd_offset(mm, address); - if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd))) + old_pgd = pgdp_get(pgd); + if (pgd_none(old_pgd) || unlikely(pgd_bad(old_pgd))) goto out; p4d = p4d_offset(pgd, address); diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index a33f92db2666..fb8b610f7378 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -212,7 +212,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) restart: do { pgd = pgd_offset(mm, pvmw->address); - if (!pgd_present(*pgd)) { + if (!pgd_present(pgdp_get(pgd))) { step_forward(pvmw, PGDIR_SIZE); continue; } diff --git a/mm/percpu.c b/mm/percpu.c index 58660e8eb892..70e68ab002e9 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -3184,7 +3184,7 @@ void __init __weak pcpu_populate_pte(unsigned long addr) pud_t *pud; pmd_t *pmd; - if (pgd_none(*pgd)) { + if (pgd_none(pgdp_get(pgd))) { p4d = memblock_alloc(P4D_TABLE_SIZE, P4D_TABLE_SIZE); if (!p4d) goto err_alloc; diff --git a/mm/pgalloc-track.h b/mm/pgalloc-track.h index 3db8ccbcb141..644f632c7cba 100644 --- a/mm/pgalloc-track.h +++ b/mm/pgalloc-track.h @@ -7,7 +7,7 @@ static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd, unsigned long address, pgtbl_mod_mask *mod_mask) { - if (unlikely(pgd_none(*pgd))) { + if (unlikely(pgd_none(pgdp_get(pgd)))) { if (__p4d_alloc(mm, pgd, address)) return NULL; *mod_mask |= PGTBL_PGD_MODIFIED; diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index f5ab52beb536..16c1ed5b3d0b 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -24,7 +24,7 @@ void pgd_clear_bad(pgd_t *pgd) { - pgd_ERROR(*pgd); + pgd_ERROR(pgdp_get(pgd)); pgd_clear(pgd); } diff --git a/mm/rmap.c b/mm/rmap.c index a0ff325467eb..5f4c52f34192 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -809,7 +809,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) pmd_t *pmd = NULL; pgd = pgd_offset(mm, address); - if (!pgd_present(*pgd)) + if (!pgd_present(pgdp_get(pgd))) goto out; p4d = p4d_offset(pgd, address); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 2bd1c95f107a..ffc78329a130 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -233,7 +233,7 @@ p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node) pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node) { pgd_t *pgd = pgd_offset_k(addr); - if (pgd_none(*pgd)) { + if (pgd_none(pgdp_get(pgd))) { void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node); if (!p) return NULL; diff --git a/mm/vmalloc.c b/mm/vmalloc.c index f27ecac7bd6e..a40323a8c6ab 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -450,7 +450,7 @@ void __vunmap_range_noflush(unsigned long start, unsigned long end) pgd = pgd_offset_k(addr); do { next = pgd_addr_end(addr, end); - if (pgd_bad(*pgd)) + if (pgd_bad(pgdp_get(pgd))) mask |= PGTBL_PGD_MODIFIED; if (pgd_none_or_clear_bad(pgd)) continue; @@ -582,7 +582,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end, pgd = pgd_offset_k(addr); do { next = pgd_addr_end(addr, end); - if (pgd_bad(*pgd)) + if (pgd_bad(pgdp_get(pgd))) mask |= PGTBL_PGD_MODIFIED; err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask); if (err) @@ -740,7 +740,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr) { unsigned long addr = (unsigned long) vmalloc_addr; struct page *page = NULL; - pgd_t *pgd = pgd_offset_k(addr); + pgd_t *pgd = pgd_offset_k(addr), old_pgd; p4d_t *p4d, old_p4d; pud_t *pud, old_pud; pmd_t *pmd, old_pmd; @@ -752,11 +752,12 @@ struct page *vmalloc_to_page(const void *vmalloc_addr) */ VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr)); - if (pgd_none(*pgd)) + old_pgd = pgdp_get(pgd); + if (pgd_none(old_pgd)) return NULL; - if (WARN_ON_ONCE(pgd_leaf(*pgd))) + if (WARN_ON_ONCE(pgd_leaf(old_pgd))) return NULL; /* XXX: no allowance for huge pgd */ - if (WARN_ON_ONCE(pgd_bad(*pgd))) + if (WARN_ON_ONCE(pgd_bad(old_pgd))) return NULL; p4d = p4d_offset(pgd, addr);
Convert PGD accesses via pgdp_get() helper that defaults as READ_ONCE() but also provides the platform an opportunity to override when required. This stores read page table entry value in a local variable which can be used in multiple instances there after. This helps in avoiding multiple memory load operations as well possible race conditions. Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Muchun Song <muchun.song@linux.dev> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> cc: Christoph Lameter <cl@linux.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-perf-users@vger.kernel.org Cc: kasan-dev@googlegroups.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> --- drivers/misc/sgi-gru/grufault.c | 2 +- fs/userfaultfd.c | 2 +- include/linux/mm.h | 2 +- include/linux/pgtable.h | 9 ++++++--- kernel/events/core.c | 2 +- mm/gup.c | 11 ++++++----- mm/hugetlb.c | 2 +- mm/kasan/init.c | 8 ++++---- mm/kasan/shadow.c | 2 +- mm/memory-failure.c | 2 +- mm/memory.c | 16 +++++++++------- mm/page_vma_mapped.c | 2 +- mm/percpu.c | 2 +- mm/pgalloc-track.h | 2 +- mm/pgtable-generic.c | 2 +- mm/rmap.c | 2 +- mm/sparse-vmemmap.c | 2 +- mm/vmalloc.c | 13 +++++++------ 18 files changed, 45 insertions(+), 38 deletions(-)