Message ID: 20191126224446.15145-3-consult-mg@gstardust.com (mailing list archive)
State: New, archived
Series: riscv: Align shared mappings to avoid cache aliasing
On Tue, 26 Nov 2019 14:44:46 PST (-0800), consult-mg@gstardust.com wrote:
> Set SHMLBA to the maximum cache "span" (line size * number of sets) of
> all CPU L1 instruction and data caches (L2 and up are rarely VIPT).
> This avoids VIPT cache aliasing with minimal alignment constraints.
>
> If the device tree does not provide cache parameters, use a conservative
> default of 16 KB: only large enough to avoid aliasing in most VIPT caches.
>
> Signed-off-by: Marc Gauthier <consult-mg@gstardust.com>
> ---
>  arch/riscv/include/asm/Kbuild     |  1 -
>  arch/riscv/include/asm/shmparam.h | 12 +++++++
>  arch/riscv/kernel/cacheinfo.c     | 52 +++++++++++++++++++++++++++++++
>  3 files changed, 64 insertions(+), 1 deletion(-)
>  create mode 100644 arch/riscv/include/asm/shmparam.h
>
> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> index 16970f246860..3905765807af 100644
> --- a/arch/riscv/include/asm/Kbuild
> +++ b/arch/riscv/include/asm/Kbuild
> @@ -27,7 +27,6 @@ generic-y += percpu.h
>  generic-y += preempt.h
>  generic-y += sections.h
>  generic-y += serial.h
> -generic-y += shmparam.h
>  generic-y += topology.h
>  generic-y += trace_clock.h
>  generic-y += unaligned.h
> diff --git a/arch/riscv/include/asm/shmparam.h b/arch/riscv/include/asm/shmparam.h
> new file mode 100644
> index 000000000000..9b6a98153648
> --- /dev/null
> +++ b/arch/riscv/include/asm/shmparam.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_RISCV_SHMPARAM_H
> +#define _ASM_RISCV_SHMPARAM_H
> +
> +/*
> + * Minimum alignment of shared memory segments as a function of cache geometry.
> + */
> +#define SHMLBA arch_shmlba()

I'd prefer if we inline the memoization, which would avoid the cost of a
function call in the general case.  You can also avoid that 0 test by
initializing the variable to PAGE_SIZE and then filling it out in our early
init code -- maybe setup_vm()?  That's what SPARC is doing.

> +
> +long arch_shmlba(void);
> +
> +#endif /* _ASM_RISCV_SHMPARAM_H */
> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
> index 4c90c07d8c39..1bc7df8577d6 100644
> --- a/arch/riscv/kernel/cacheinfo.c
> +++ b/arch/riscv/kernel/cacheinfo.c
> @@ -1,12 +1,61 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /*
>   * Copyright (C) 2017 SiFive
> + * Copyright (C) 2019 Aril Inc
>   */
>
>  #include <linux/cacheinfo.h>
>  #include <linux/cpu.h>
>  #include <linux/of.h>
>  #include <linux/of_device.h>
> +#include <linux/mm.h>
> +
> +static long shmlba;
> +
> +
> +/*
> + * Assuming cache size = line size * #sets * N for N-way associative caches,
> + * return the max cache "span" == (line size * #sets) == (cache size / N)
> + * across all L1 caches, or 0 if cache parameters are not available.
> + * VIPT caches with span > min page size are susceptible to aliasing.
> + */
> +static long get_max_cache_span(void)
> +{
> +        struct cpu_cacheinfo *this_cpu_ci;
> +        struct cacheinfo *this_leaf;
> +        long span, max_span = 0;
> +        int cpu, leaf;
> +
> +        for_each_possible_cpu(cpu) {
> +                this_cpu_ci = get_cpu_cacheinfo(cpu);
> +                this_leaf = this_cpu_ci->info_list;
> +                for (leaf = 0; leaf < this_cpu_ci->num_leaves; leaf++) {
> +                        if (this_leaf->level > 1)
> +                                break;
> +                        span = this_leaf->coherency_line_size *
> +                               this_leaf->number_of_sets;
> +                        if (span > max_span)
> +                                max_span = span;
> +                        this_leaf++;
> +                }
> +        }
> +        return max_span;
> +}
> +
> +/*
> + * Align shared mappings to the maximum cache "span" to avoid aliasing
> + * in VIPT caches, for performance.
> + * The returned SHMLBA value is always a power-of-two multiple of PAGE_SIZE.
> + */
> +long arch_shmlba(void)
> +{
> +        if (shmlba == 0) {
> +                long max_span = get_max_cache_span();
> +
> +                shmlba = max_span ? PAGE_ALIGN(max_span) : 4 * PAGE_SIZE;

I'd prefer to avoid sneaking in a default 4*PAGE_SIZE here, just default to
PAGE_SIZE and rely on systems with this behavior specifying the correct
tuning value in the device tree.  This avoids changing the behavior for
existing systems, which is a slight regression as the alignment uses more
memory.  It's not a big deal, but on systems that don't require alignment
for high performance there's no reason to just throw away memory --
particularly as we have some RISC-V systems with pretty limited memory (I'm
thinking of the Kendryte boards, though I don't know how SHMLBA interacts
with NOMMU so it might not matter).

> +        }
> +        return shmlba;
> +}
>
>  static void ci_leaf_init(struct cacheinfo *this_leaf,
>                           struct device_node *node,
> @@ -93,6 +142,9 @@ static int __populate_cache_leaves(unsigned int cpu)
>          }
>          of_node_put(np);
>
> +        /* Force recalculating SHMLBA if cache parameters are updated. */
> +        shmlba = 0;
> +
>          return 0;
>  }
Palmer Dabbelt wrote on 2019-12-05 18:03:
> On Tue, 26 Nov 2019 14:44:46 PST (-0800), consult-mg@gstardust.com wrote:
>> Set SHMLBA to the maximum cache "span" (line size * number of sets) of
>> all CPU L1 instruction and data caches (L2 and up are rarely VIPT).
>> This avoids VIPT cache aliasing with minimal alignment constraints.
>>
>> If the device tree does not provide cache parameters, use a conservative
>> default of 16 KB: only large enough to avoid aliasing in most VIPT
>> caches.
>>
>> Signed-off-by: Marc Gauthier <consult-mg@gstardust.com>
>> ---
>>  arch/riscv/include/asm/Kbuild     |  1 -
>>  arch/riscv/include/asm/shmparam.h | 12 +++++++
>>  arch/riscv/kernel/cacheinfo.c     | 52 +++++++++++++++++++++++++++++++
>>  3 files changed, 64 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/riscv/include/asm/shmparam.h
>>
>> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
>> index 16970f246860..3905765807af 100644
>> --- a/arch/riscv/include/asm/Kbuild
>> +++ b/arch/riscv/include/asm/Kbuild
>> @@ -27,7 +27,6 @@ generic-y += percpu.h
>>  generic-y += preempt.h
>>  generic-y += sections.h
>>  generic-y += serial.h
>> -generic-y += shmparam.h
>>  generic-y += topology.h
>>  generic-y += trace_clock.h
>>  generic-y += unaligned.h
>> diff --git a/arch/riscv/include/asm/shmparam.h b/arch/riscv/include/asm/shmparam.h
>> new file mode 100644
>> index 000000000000..9b6a98153648
>> --- /dev/null
>> +++ b/arch/riscv/include/asm/shmparam.h
>> @@ -0,0 +1,12 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _ASM_RISCV_SHMPARAM_H
>> +#define _ASM_RISCV_SHMPARAM_H
>> +
>> +/*
>> + * Minimum alignment of shared memory segments as a function of cache geometry.
>> + */
>> +#define SHMLBA arch_shmlba()
>
> I'd prefer if we inline the memoization, which would avoid the cost of a
> function call in the general case.  You can also avoid that 0 test by
> initializing the variable to PAGE_SIZE and then filling it out in our
> early init code -- maybe setup_vm()?  That's what SPARC is doing.

Good point.
Unlike SPARC, this patch re-uses existing code in drivers/base/cacheinfo.c
to compute cache parameters.  To preserve that, it'll be more robust to
initialize shmlba at a point certain to have those parameters -- at the
comment far below, "Force recalculating SHMLBA if cache parameters are
updated."  That way it keeps working if that point in time changes.

>> +
>> +long arch_shmlba(void);
>> +
>> +#endif /* _ASM_RISCV_SHMPARAM_H */
>> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
>> index 4c90c07d8c39..1bc7df8577d6 100644
>> --- a/arch/riscv/kernel/cacheinfo.c
>> +++ b/arch/riscv/kernel/cacheinfo.c
>> @@ -1,12 +1,61 @@
>>  // SPDX-License-Identifier: GPL-2.0-only
>>  /*
>>   * Copyright (C) 2017 SiFive
>> + * Copyright (C) 2019 Aril Inc
>>   */
>>
>>  #include <linux/cacheinfo.h>
>>  #include <linux/cpu.h>
>>  #include <linux/of.h>
>>  #include <linux/of_device.h>
>> +#include <linux/mm.h>
>> +
>> +static long shmlba;
>> +
>> +
>> +/*
>> + * Assuming cache size = line size * #sets * N for N-way associative caches,
>> + * return the max cache "span" == (line size * #sets) == (cache size / N)
>> + * across all L1 caches, or 0 if cache parameters are not available.
>> + * VIPT caches with span > min page size are susceptible to aliasing.
>> + */
>> +static long get_max_cache_span(void)
>> +{
>> +        struct cpu_cacheinfo *this_cpu_ci;
>> +        struct cacheinfo *this_leaf;
>> +        long span, max_span = 0;
>> +        int cpu, leaf;
>> +
>> +        for_each_possible_cpu(cpu) {
>> +                this_cpu_ci = get_cpu_cacheinfo(cpu);
>> +                this_leaf = this_cpu_ci->info_list;
>> +                for (leaf = 0; leaf < this_cpu_ci->num_leaves; leaf++) {
>> +                        if (this_leaf->level > 1)
>> +                                break;
>> +                        span = this_leaf->coherency_line_size *
>> +                               this_leaf->number_of_sets;
>> +                        if (span > max_span)
>> +                                max_span = span;
>> +                        this_leaf++;
>> +                }
>> +        }
>> +        return max_span;
>> +}
>> +
>> +/*
>> + * Align shared mappings to the maximum cache "span" to avoid aliasing
>> + * in VIPT caches, for performance.
>> + * The returned SHMLBA value is always a power-of-two multiple of
>> PAGE_SIZE.
>> + */
>> +long arch_shmlba(void)
>> +{
>> +        if (shmlba == 0) {
>> +                long max_span = get_max_cache_span();
>> +
>> +                shmlba = max_span ? PAGE_ALIGN(max_span) : 4 * PAGE_SIZE;
>
> I'd prefer to avoid sneaking in a default 4*PAGE_SIZE here, just default
> to PAGE_SIZE and rely on systems with this behavior specifying the
> correct tuning value in the device tree.

Fair enough.

> This avoids changing the behavior for existing systems, which is a
> slight regression as the alignment uses more memory.  It's not a big
> deal, but on systems that don't require alignment for high performance
> there's no reason to just throw away memory -- particularly as we have
> some RISC-V systems with pretty limited memory

Greater alignment takes up more virtual memory, not more physical memory.

> (I'm thinking of the Kendryte boards, though I don't know how SHMLBA
> interacts with NOMMU so it might not matter).

There's no virtual memory in NOMMU, so indeed it doesn't matter.

M

>> +        }
>> +        return shmlba;
>> +}
>>
>>  static void ci_leaf_init(struct cacheinfo *this_leaf,
>>                           struct device_node *node,
>> @@ -93,6 +142,9 @@ static int __populate_cache_leaves(unsigned int cpu)
>>          }
>>          of_node_put(np);
>>
>> +        /* Force recalculating SHMLBA if cache parameters are updated. */
>> +        shmlba = 0;
>> +
>>          return 0;
>>  }
On Thu, 05 Dec 2019 15:58:25 PST (-0800), consult-mg@gstardust.com wrote:
> Palmer Dabbelt wrote on 2019-12-05 18:03:
>> On Tue, 26 Nov 2019 14:44:46 PST (-0800), consult-mg@gstardust.com wrote:
>>> Set SHMLBA to the maximum cache "span" (line size * number of sets) of
>>> all CPU L1 instruction and data caches (L2 and up are rarely VIPT).
>>> This avoids VIPT cache aliasing with minimal alignment constraints.
>>>
>>> If the device tree does not provide cache parameters, use a conservative
>>> default of 16 KB: only large enough to avoid aliasing in most VIPT
>>> caches.
>>>
>>> Signed-off-by: Marc Gauthier <consult-mg@gstardust.com>
>>> ---
>>>  arch/riscv/include/asm/Kbuild     |  1 -
>>>  arch/riscv/include/asm/shmparam.h | 12 +++++++
>>>  arch/riscv/kernel/cacheinfo.c     | 52 +++++++++++++++++++++++++++++++
>>>  3 files changed, 64 insertions(+), 1 deletion(-)
>>>  create mode 100644 arch/riscv/include/asm/shmparam.h
>>>
>>> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
>>> index 16970f246860..3905765807af 100644
>>> --- a/arch/riscv/include/asm/Kbuild
>>> +++ b/arch/riscv/include/asm/Kbuild
>>> @@ -27,7 +27,6 @@ generic-y += percpu.h
>>>  generic-y += preempt.h
>>>  generic-y += sections.h
>>>  generic-y += serial.h
>>> -generic-y += shmparam.h
>>>  generic-y += topology.h
>>>  generic-y += trace_clock.h
>>>  generic-y += unaligned.h
>>> diff --git a/arch/riscv/include/asm/shmparam.h b/arch/riscv/include/asm/shmparam.h
>>> new file mode 100644
>>> index 000000000000..9b6a98153648
>>> --- /dev/null
>>> +++ b/arch/riscv/include/asm/shmparam.h
>>> @@ -0,0 +1,12 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#ifndef _ASM_RISCV_SHMPARAM_H
>>> +#define _ASM_RISCV_SHMPARAM_H
>>> +
>>> +/*
>>> + * Minimum alignment of shared memory segments as a function of
>>> cache geometry.
>>> + */
>>> +#define SHMLBA arch_shmlba()
>>
>> I'd prefer if we inline the memoization, which would avoid the cost of a
>> function call in the general case.  You can also avoid that 0 test by
>> initializing the variable to PAGE_SIZE and then filling it out in our
>> early init code -- maybe setup_vm()?  That's what SPARC is doing.
>
> Good point.
> Unlike SPARC, this patch re-uses existing code in
> drivers/base/cacheinfo.c to compute cache parameters.  To preserve that,
> it'll be more robust to initialize shmlba at a point certain to have
> those parameters -- at the comment far below, "Force recalculating
> SHMLBA if cache parameters are updated."  That way it keeps working if
> that point in time changes.

Works for me.

>>> +
>>> +long arch_shmlba(void);
>>> +
>>> +#endif /* _ASM_RISCV_SHMPARAM_H */
>>> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
>>> index 4c90c07d8c39..1bc7df8577d6 100644
>>> --- a/arch/riscv/kernel/cacheinfo.c
>>> +++ b/arch/riscv/kernel/cacheinfo.c
>>> @@ -1,12 +1,61 @@
>>>  // SPDX-License-Identifier: GPL-2.0-only
>>>  /*
>>>   * Copyright (C) 2017 SiFive
>>> + * Copyright (C) 2019 Aril Inc
>>>   */
>>>
>>>  #include <linux/cacheinfo.h>
>>>  #include <linux/cpu.h>
>>>  #include <linux/of.h>
>>>  #include <linux/of_device.h>
>>> +#include <linux/mm.h>
>>> +
>>> +static long shmlba;
>>> +
>>> +
>>> +/*
>>> + * Assuming cache size = line size * #sets * N for N-way
>>> associative caches,
>>> + * return the max cache "span" == (line size * #sets) == (cache size
>>> / N)
>>> + * across all L1 caches, or 0 if cache parameters are not available.
>>> + * VIPT caches with span > min page size are susceptible to aliasing.
>>> + */
>>> +static long get_max_cache_span(void)
>>> +{
>>> +        struct cpu_cacheinfo *this_cpu_ci;
>>> +        struct cacheinfo *this_leaf;
>>> +        long span, max_span = 0;
>>> +        int cpu, leaf;
>>> +
>>> +        for_each_possible_cpu(cpu) {
>>> +                this_cpu_ci = get_cpu_cacheinfo(cpu);
>>> +                this_leaf = this_cpu_ci->info_list;
>>> +                for (leaf = 0; leaf < this_cpu_ci->num_leaves; leaf++) {
>>> +                        if (this_leaf->level > 1)
>>> +                                break;
>>> +                        span = this_leaf->coherency_line_size *
>>> +                               this_leaf->number_of_sets;
>>> +                        if (span > max_span)
>>> +                                max_span = span;
>>> +                        this_leaf++;
>>> +                }
>>> +        }
>>> +        return max_span;
>>> +}
>>> +
>>> +/*
>>> + * Align shared mappings to the maximum cache "span" to avoid aliasing
>>> + * in VIPT caches, for performance.
>>> + * The returned SHMLBA value is always a power-of-two multiple of
>>> PAGE_SIZE.
>>> + */
>>> +long arch_shmlba(void)
>>> +{
>>> +        if (shmlba == 0) {
>>> +                long max_span = get_max_cache_span();
>>> +
>>> +                shmlba = max_span ? PAGE_ALIGN(max_span) : 4 * PAGE_SIZE;
>>
>> I'd prefer to avoid sneaking in a default 4*PAGE_SIZE here, just
>> default to PAGE_SIZE and rely on systems with this behavior specifying
>> the correct tuning value in the device tree.
>
> Fair enough.
>
>> This avoids changing the behavior for existing systems, which is a
>> slight regression as the alignment uses more memory.  It's not a big
>> deal, but on systems that don't require alignment for high performance
>> there's no reason to just throw away memory -- particularly as we have
>> some RISC-V systems with pretty limited memory
>
> Greater alignment takes up more virtual memory, not more physical memory.
>
>> (I'm thinking of the Kendryte boards, though I don't know how SHMLBA
>> interacts with NOMMU so it might not matter).
>
> There's no virtual memory in NOMMU, so indeed it doesn't matter.

Of course :).  I'd still like to leave the default alone, if only to
prevent people from relying on an arbitrary default decision.

>
> M
>
>>> +        }
>>> +        return shmlba;
>>> +}
>>>
>>>  static void ci_leaf_init(struct cacheinfo *this_leaf,
>>>                           struct device_node *node,
>>> @@ -93,6 +142,9 @@ static int __populate_cache_leaves(unsigned int cpu)
>>>          }
>>>          of_node_put(np);
>>>
>>> +        /* Force recalculating SHMLBA if cache parameters are updated. */
>>> +        shmlba = 0;
>>> +
>>>          return 0;
>>>  }
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 16970f246860..3905765807af 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -27,7 +27,6 @@ generic-y += percpu.h
 generic-y += preempt.h
 generic-y += sections.h
 generic-y += serial.h
-generic-y += shmparam.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += unaligned.h
diff --git a/arch/riscv/include/asm/shmparam.h b/arch/riscv/include/asm/shmparam.h
new file mode 100644
index 000000000000..9b6a98153648
--- /dev/null
+++ b/arch/riscv/include/asm/shmparam.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_SHMPARAM_H
+#define _ASM_RISCV_SHMPARAM_H
+
+/*
+ * Minimum alignment of shared memory segments as a function of cache geometry.
+ */
+#define SHMLBA arch_shmlba()
+
+long arch_shmlba(void);
+
+#endif /* _ASM_RISCV_SHMPARAM_H */
diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 4c90c07d8c39..1bc7df8577d6 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -1,12 +1,61 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
  * Copyright (C) 2017 SiFive
+ * Copyright (C) 2019 Aril Inc
  */

 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/of.h>
 #include <linux/of_device.h>
+#include <linux/mm.h>
+
+static long shmlba;
+
+
+/*
+ * Assuming cache size = line size * #sets * N for N-way associative caches,
+ * return the max cache "span" == (line size * #sets) == (cache size / N)
+ * across all L1 caches, or 0 if cache parameters are not available.
+ * VIPT caches with span > min page size are susceptible to aliasing.
+ */
+static long get_max_cache_span(void)
+{
+        struct cpu_cacheinfo *this_cpu_ci;
+        struct cacheinfo *this_leaf;
+        long span, max_span = 0;
+        int cpu, leaf;
+
+        for_each_possible_cpu(cpu) {
+                this_cpu_ci = get_cpu_cacheinfo(cpu);
+                this_leaf = this_cpu_ci->info_list;
+                for (leaf = 0; leaf < this_cpu_ci->num_leaves; leaf++) {
+                        if (this_leaf->level > 1)
+                                break;
+                        span = this_leaf->coherency_line_size *
+                               this_leaf->number_of_sets;
+                        if (span > max_span)
+                                max_span = span;
+                        this_leaf++;
+                }
+        }
+        return max_span;
+}
+
+/*
+ * Align shared mappings to the maximum cache "span" to avoid aliasing
+ * in VIPT caches, for performance.
+ * The returned SHMLBA value is always a power-of-two multiple of PAGE_SIZE.
+ */
+long arch_shmlba(void)
+{
+        if (shmlba == 0) {
+                long max_span = get_max_cache_span();
+
+                shmlba = max_span ? PAGE_ALIGN(max_span) : 4 * PAGE_SIZE;
+        }
+        return shmlba;
+}

 static void ci_leaf_init(struct cacheinfo *this_leaf,
                          struct device_node *node,
@@ -93,6 +142,9 @@ static int __populate_cache_leaves(unsigned int cpu)
         }
         of_node_put(np);

+        /* Force recalculating SHMLBA if cache parameters are updated. */
+        shmlba = 0;
+
         return 0;
 }
Set SHMLBA to the maximum cache "span" (line size * number of sets) of
all CPU L1 instruction and data caches (L2 and up are rarely VIPT).
This avoids VIPT cache aliasing with minimal alignment constraints.

If the device tree does not provide cache parameters, use a conservative
default of 16 KB: only large enough to avoid aliasing in most VIPT caches.

Signed-off-by: Marc Gauthier <consult-mg@gstardust.com>
---
 arch/riscv/include/asm/Kbuild     |  1 -
 arch/riscv/include/asm/shmparam.h | 12 +++++++
 arch/riscv/kernel/cacheinfo.c     | 52 +++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/shmparam.h