Message ID | 19dc5a54-4f53-5f69-5ade-4c354f63a356@gentoo.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] MIPS: Split R10000 to allow for R12K+ optimizations | expand |
On Wed, 13 May 2020, Joshua Kinard wrote: > diff --git a/arch/mips/include/asm/mach-ip27/war.h b/arch/mips/include/asm/mach-ip27/war.h > index ef3efce0094a..845b8951d74f 100644 > --- a/arch/mips/include/asm/mach-ip27/war.h > +++ b/arch/mips/include/asm/mach-ip27/war.h > @@ -17,7 +17,12 @@ > #define MIPS_CACHE_SYNC_WAR 0 > #define TX49XX_ICACHE_INDEX_INV_WAR 0 > #define ICACHE_REFILLS_WORKAROUND_WAR 0 > -#define R10000_LLSC_WAR 1 > #define MIPS34K_MISSED_ITLB_WAR 0 > > +#ifdef CONFIG_CPU_R10000 > +#define R10000_LLSC_WAR 1 > +#else > +#define R10000_LLSC_WAR 0 > +#endif > + I think it would be good not to reorder the macros (even though there's preexisting breakage in <asm/mach-ip30/war.h>) so that all the files have them in the same order. Maciej
On 5/13/2020 21:52, Maciej W. Rozycki wrote: > On Wed, 13 May 2020, Joshua Kinard wrote: > >> diff --git a/arch/mips/include/asm/mach-ip27/war.h b/arch/mips/include/asm/mach-ip27/war.h >> index ef3efce0094a..845b8951d74f 100644 >> --- a/arch/mips/include/asm/mach-ip27/war.h >> +++ b/arch/mips/include/asm/mach-ip27/war.h >> @@ -17,7 +17,12 @@ >> #define MIPS_CACHE_SYNC_WAR 0 >> #define TX49XX_ICACHE_INDEX_INV_WAR 0 >> #define ICACHE_REFILLS_WORKAROUND_WAR 0 >> -#define R10000_LLSC_WAR 1 >> #define MIPS34K_MISSED_ITLB_WAR 0 >> >> +#ifdef CONFIG_CPU_R10000 >> +#define R10000_LLSC_WAR 1 >> +#else >> +#define R10000_LLSC_WAR 0 >> +#endif >> + > > I think it would be good not to reorder the macros (even though there's > preexisting breakage in <asm/mach-ip30/war.h>) so that all the files have > them in the same order. They don't appear to be in any logical order to begin with. That, and I wanted to keep conditional defines separate from the fixed defines, hence moving the first in those files down to its own block. And the one in IP30's war.h is an artifact of my patchset that Thomas must've based off of for mainlining IP30 support. That section actually should be in this patchset alongside the hunk for IP27's war.h. Is there some subtlety w/r to the existing ordering that I don't know about, or would it make sense to have two patches, one which reorders the defines to be alphabetical, then the second being the R10K split patch?
On Wed, 13 May 2020, Joshua Kinard wrote: > > I think it would be good not to reorder the macros (even though there's > > preexisting breakage in <asm/mach-ip30/war.h>) so that all the files have > > them in the same order. > > They don't appear to be in any logical order to begin with. That, and I > wanted to keep conditional defines separate from the fixed defines, hence > moving the first in those files down to its own block. I suppose they have just come up in the order they were added. Whatever the order is however as long as it is consistent you can `diff' a pair of files against each other to spot differences easily without the need to rely on your perceptiveness. And <asm/mach-sibyte/war.h> already has conditionals in the middle. > Is there some subtlety w/r to the existing ordering that I don't know about, > or would it make sense to have two patches, one which reorders the defines > to be alphabetical, then the second being the R10K split patch? I wouldn't strongly mind reordering alphabetically, but it would disturb `git blame' and would add little value I'm afraid. I'm fine if new ones keep being added at the end, though OTOH it's not the best way to avoid conflicts. Maciej
On 5/14/2020 17:42, Maciej W. Rozycki wrote: > On Wed, 13 May 2020, Joshua Kinard wrote: > >>> I think it would be good not to reorder the macros (even though there's >>> preexisting breakage in <asm/mach-ip30/war.h>) so that all the files have >>> them in the same order. >> >> They don't appear to be in any logical order to begin with. That, and I >> wanted to keep conditional defines separate from the fixed defines, hence >> moving the first in those files down to its own block. > > I suppose they have just come up in the order they were added. Whatever > the order is however as long as it is consistent you can `diff' a pair of > files against each other to spot differences easily without the need to > rely on your perceptiveness. And <asm/mach-sibyte/war.h> already has > conditionals in the middle. I think then putting the conditional block below the section of fixed defines achieves that, then. diff's (and git diff's) algorithm will show that as a clean addition and not require a lot of surrounding context lines. It'll also avoid the need to submit a separate patch to fix IP30's "war.h". I took a look at asm/mach-sibyte/war.h, but it looks kind of....messy? None of the other "war.h" files use conditionals outside of IP27's, IP30's, and SiByte's, so there's not a whole lot of precedent to base off of. >> Is there some subtlety w/r to the existing ordering that I don't know about, >> or would it make sense to have two patches, one which reorders the defines >> to be alphabetical, then the second being the R10K split patch? > > I wouldn't strongly mind reordering alphabetically, but it would disturb > `git blame' and would add little value I'm afraid. I'm fine if new ones > keep being added at the end, though OTOH it's not the best way to avoid > conflicts. Agreed on not re-sorting. Adding comments to delineate might help, but then all of the war.h files would need to be touched, and I want to limit my edits strictly to the platforms I have knowledge of and can actually test on. Nobody, including myself, noticed the bit in IP30's war.h getting mainlined, so I don't think a lot of harm would be done by doing the same to IP27.h.
On Thu, 14 May 2020, Joshua Kinard wrote: > > I suppose they have just come up in the order they were added. Whatever > > the order is however as long as it is consistent you can `diff' a pair of > > files against each other to spot differences easily without the need to > > rely on your perceptiveness. And <asm/mach-sibyte/war.h> already has > > conditionals in the middle. > > I think then putting the conditional block below the section of fixed > defines achieves that, then. diff's (and git diff's) algorithm will show > that as a clean addition and not require a lot of surrounding context lines. > It'll also avoid the need to submit a separate patch to fix IP30's "war.h". > > I took a look at asm/mach-sibyte/war.h, but it looks kind of....messy? None > of the other "war.h" files use conditionals outside of IP27's, IP30's, and > SiByte's, so there's not a whole lot of precedent to base off of. Well, there's <asm/mach-rc32434/war.h> too, to be exact. I have no further arguments nor other input, so I'll be leaving it for Thomas to decide. Maciej
On 5/15/2020 07:20, Maciej W. Rozycki wrote: > On Thu, 14 May 2020, Joshua Kinard wrote: > >>> I suppose they have just come up in the order they were added. Whatever >>> the order is however as long as it is consistent you can `diff' a pair of >>> files against each other to spot differences easily without the need to >>> rely on your perceptiveness. And <asm/mach-sibyte/war.h> already has >>> conditionals in the middle. >> >> I think then putting the conditional block below the section of fixed >> defines achieves that, then. diff's (and git diff's) algorithm will show >> that as a clean addition and not require a lot of surrounding context lines. >> It'll also avoid the need to submit a separate patch to fix IP30's "war.h". >> >> I took a look at asm/mach-sibyte/war.h, but it looks kind of....messy? None >> of the other "war.h" files use conditionals outside of IP27's, IP30's, and >> SiByte's, so there's not a whole lot of precedent to base off of. > > Well, there's <asm/mach-rc32434/war.h> too, to be exact. > > I have no further arguments nor other input, so I'll be leaving it for > Thomas to decide. > > Maciej Focusing on just one hunk for asm/mach-ip27/war.h, how does this look if I keep the conditional inside the block? diff --git a/arch/mips/include/asm/mach-ip27/war.h b/arch/mips/include/asm/mach-ip27/war.h index ef3efce0094a..f041e7357620 100644 --- a/arch/mips/include/asm/mach-ip27/war.h +++ b/arch/mips/include/asm/mach-ip27/war.h @@ -17,7 +17,11 @@ #define MIPS_CACHE_SYNC_WAR 0 #define TX49XX_ICACHE_INDEX_INV_WAR 0 #define ICACHE_REFILLS_WORKAROUND_WAR 0 +#ifdef CONFIG_CPU_R10000 #define R10000_LLSC_WAR 1 +#else +#define R10000_LLSC_WAR 0 +#endif #define MIPS34K_MISSED_ITLB_WAR 0 #endif /* __ASM_MIPS_MACH_IP27_WAR_H */ If this works for you, I'll spin a v3 later and also send along a separate patch to fix the IP30 case in war.h.
On Wed, May 13, 2020 at 08:55:27PM -0400, Joshua Kinard wrote: > From: Joshua Kinard <kumba@gentoo.org> > > The attached patch adds more-specific support for R12000 and higher > CPUs by splitting the R10000 logic at several places. This avoids > the workarounds enabled by R10000_LLSC_WAR and passes -mno-fix-r10000 > to gcc during the kernel compile. I've seen this patch multiple times already and I always think there is something wrong with it. So we want to allow a way to disable bug workarounds for old R10k (rev < 3.0), why not call it that way ? Something like CONFIG_R10000_WORKAROUNDS, which defaults to y. The only thing we would be missing is the -march=r12000 not sure, if this makes much difference. Do I miss something else ? Thomas.
On Sat, May 16, 2020 at 02:16:52AM -0400, Joshua Kinard wrote: > Focusing on just one hunk for asm/mach-ip27/war.h, how does this look if I > keep the conditional inside the block? > > diff --git a/arch/mips/include/asm/mach-ip27/war.h b/arch/mips/include/asm/mach-ip27/war.h > index ef3efce0094a..f041e7357620 100644 > --- a/arch/mips/include/asm/mach-ip27/war.h > +++ b/arch/mips/include/asm/mach-ip27/war.h > @@ -17,7 +17,11 @@ > #define MIPS_CACHE_SYNC_WAR 0 > #define TX49XX_ICACHE_INDEX_INV_WAR 0 > #define ICACHE_REFILLS_WORKAROUND_WAR 0 > +#ifdef CONFIG_CPU_R10000 > #define R10000_LLSC_WAR 1 > +#else > +#define R10000_LLSC_WAR 0 > +#endif #define R10000_LLSC_WAR IS_ENABLED(CPU_R10000_WORKAROUNDS) looks nice and clean ;-) Thomas.
On Sun, May 17, 2020 at 12:59:55PM +0200, Thomas Bogendoerfer wrote: > On Wed, May 13, 2020 at 08:55:27PM -0400, Joshua Kinard wrote: > > From: Joshua Kinard <kumba@gentoo.org> > > > > The attached patch adds more-specific support for R12000 and higher > > CPUs by splitting the R10000 logic at several places. This avoids > > the workarounds enabled by R10000_LLSC_WAR and passes -mno-fix-r10000 > > to gcc during the kernel compile. > > I've seen this patch multiple times already and I always think there > is something wrong with it. So we want to allow a way to disable > bug workarounds for old R10k (rev < 3.0), why not call it that way ? > Something like CONFIG_R10000_WORKAROUNDS, which defaults to y. > The only thing we would be missing is the -march=r12000 not sure, > if this makes much difference. Do I miss something else ? and there should be a BUG_ON/WARN_ON, if a kernel is compiled without workarounds running on a buggy CPU. Thomas.
On 5/17/2020 06:59, Thomas Bogendoerfer wrote: > On Wed, May 13, 2020 at 08:55:27PM -0400, Joshua Kinard wrote: >> From: Joshua Kinard <kumba@gentoo.org> >> >> The attached patch adds more-specific support for R12000 and higher >> CPUs by splitting the R10000 logic at several places. This avoids >> the workarounds enabled by R10000_LLSC_WAR and passes -mno-fix-r10000 >> to gcc during the kernel compile. > > I've seen this patch multiple times already and I always think there > is something wrong with it. So we want to allow a way to disable > bug workarounds for old R10k (rev < 3.0), why not call it that way ? > Something like CONFIG_R10000_WORKAROUNDS, which defaults to y. > The only thing we would be missing is the -march=r12000 not sure, > if this makes much difference. Do I miss something else ? I've submitted it a few times before, but I think most of the times, it simply got missed due to Ralf just being busy. The one time he did look at it, I don't think he saw too much of a benefit as the only SGI system that could run R12K+ at the time was IP27, which had its own issues. I figured I'd give it another go-around now that IP30 is mainlined, as R12K is more common on that platform. Adding framework to further optimize based on CPU type seems like a good thing, even if those aren't used yet? Internally to gcc, there is no functional difference, yet, between 'r10000' and 'r12000' or later. I *think* on -march=r12000, -mno-fix-r10000 might be implied, but I would have to go look at gcc's source again. Specific to the R12000, there is a bit in the CONFIG register (I think) called Delay Speculative Dirty (DSD) that can be used on the IP32 to mitigate the speculative execution issues on that platform, just no one's coded for it yet. My hope was also that at some point, documentation for the R14000 and R16000 might find its way to the internet and maybe there were additional optimizations that could be added to gcc for those, to make -march=r14000 or -march=r16000 viable. Thus far, that hasn't happened. Both of those CPUs really seem to just be clock speed increases and die shrinkages. Though I do know that the R14000 uses DDR for the L2 cache, but I don't think that is something that can be taken advantage of by the compiler or kernel in terms of instruction scheduling. R16000 also seems to have an additional undocumented bit in the STATUS register, based on IRIX headers, but no other documentation exists on that. Really, as far as R14000/R16000 go, they're black boxes, as neither SGI, HP, nor NEC has ever released documentation or errata for those two CPUs.
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index bfa9cd962b06..719434bf4f3a 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -673,6 +673,7 @@ config SGI_IP27 select PCI_DRIVERS_GENERIC select PCI_XTALK_BRIDGE select SYS_HAS_CPU_R10000 + select SYS_HAS_CPU_R12K_R14K_R16K select SYS_SUPPORTS_64BIT_KERNEL select SYS_SUPPORTS_BIG_ENDIAN select SYS_SUPPORTS_NUMA @@ -734,6 +735,7 @@ config SGI_IP30 select PCI_XTALK_BRIDGE select SYS_HAS_EARLY_PRINTK select SYS_HAS_CPU_R10000 + select SYS_HAS_CPU_R12K_R14K_R16K select SYS_SUPPORTS_64BIT_KERNEL select SYS_SUPPORTS_BIG_ENDIAN select SYS_SUPPORTS_SMP @@ -760,6 +762,7 @@ config SGI_IP32 select RM7000_CPU_SCACHE select SYS_HAS_CPU_R5000 select SYS_HAS_CPU_R10000 if BROKEN + select SYS_HAS_CPU_R12K_R14K_R16K if BROKEN select SYS_HAS_CPU_RM7000 select SYS_HAS_CPU_NEVADA select SYS_SUPPORTS_64BIT_KERNEL @@ -1676,7 +1679,18 @@ config CPU_R10000 select CPU_SUPPORTS_HIGHMEM select CPU_SUPPORTS_HUGEPAGES help - MIPS Technologies R10000-series processors. + MIPS Technologies R10000 processor. + +config CPU_R12K_R14K_R16K + bool "R12k/R14k/R16k" + depends on SYS_HAS_CPU_R12K_R14K_R16K + select CPU_HAS_PREFETCH + select CPU_SUPPORTS_32BIT_KERNEL + select CPU_SUPPORTS_64BIT_KERNEL + select CPU_SUPPORTS_HIGHMEM + select CPU_SUPPORTS_HUGEPAGES + help + MIPS Technologies R12000, R14000, & R16000 processors. config CPU_RM7000 bool "RM7000" @@ -1968,6 +1982,9 @@ config SYS_HAS_CPU_R10000 bool select ARCH_HAS_SYNC_DMA_FOR_CPU if DMA_NONCOHERENT +config SYS_HAS_CPU_R12K_R14K_R16K + bool + config SYS_HAS_CPU_RM7000 bool @@ -2706,7 +2723,7 @@ config NODES_SHIFT config HW_PERF_EVENTS bool "Enable hardware performance counter support for perf events" - depends on PERF_EVENTS && !OPROFILE && (CPU_MIPS32 || CPU_MIPS64 || CPU_R10000 || CPU_SB1 || CPU_CAVIUM_OCTEON || CPU_XLP || CPU_LOONGSON64) + depends on PERF_EVENTS && !OPROFILE && (CPU_MIPS32 || CPU_MIPS64 || CPU_R10000 || CPU_R12K_R14K_R16K || CPU_SB1 || CPU_CAVIUM_OCTEON || CPU_XLP || CPU_LOONGSON64) default y help Enable hardware performance counter support for perf events. If diff --git a/arch/mips/Makefile b/arch/mips/Makefile index b50377ec3ab5..33572d347f47 100644 --- a/arch/mips/Makefile +++ b/arch/mips/Makefile @@ -163,6 +163,9 @@ cflags-$(CONFIG_CPU_SB1) += $(call cc-option,-mno-mdmx) cflags-$(CONFIG_CPU_SB1) += $(call cc-option,-mno-mips3d) cflags-$(CONFIG_CPU_R10000) += $(call cc-option,-march=r10000,-march=r8000) \ -Wa,--trap +cflags-$(CONFIG_CPU_R12K_R14K_R16K) += $(call cc-option,-march=r12000,-march=r10000) \ + $(call cc-option,-mno-fix-r10000,) \ + -Wa,--trap cflags-$(CONFIG_CPU_CAVIUM_OCTEON) += $(call cc-option,-march=octeon) -Wa,--trap ifeq (,$(findstring march=octeon, $(cflags-$(CONFIG_CPU_CAVIUM_OCTEON)))) cflags-$(CONFIG_CPU_CAVIUM_OCTEON) += -Wa,-march=octeon diff --git a/arch/mips/include/asm/cpu-type.h b/arch/mips/include/asm/cpu-type.h index 49f0061a6051..833694782d05 100644 --- a/arch/mips/include/asm/cpu-type.h +++ b/arch/mips/include/asm/cpu-type.h @@ -148,6 +148,8 @@ static inline int __pure __get_cpu_type(const int cpu_type) #ifdef CONFIG_SYS_HAS_CPU_R10000 case CPU_R10000: +#endif +#ifdef CONFIG_SYS_HAS_CPU_R12K_R14K_R16K case CPU_R12000: case CPU_R14000: case CPU_R16000: diff --git a/arch/mips/include/asm/hazards.h b/arch/mips/include/asm/hazards.h index a0b92205f933..190ad53091ea 100644 --- a/arch/mips/include/asm/hazards.h +++ b/arch/mips/include/asm/hazards.h @@ -159,7 +159,8 @@ do { \ #elif defined(CONFIG_MIPS_ALCHEMY) || defined(CONFIG_CPU_CAVIUM_OCTEON) || \ defined(CONFIG_CPU_LOONGSON2EF) || defined(CONFIG_CPU_LOONGSON64) || \ - defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR) + defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_R12K_R14K_R16K) || \ + defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR) /* * R10000 rocks - all hazards handled in hardware, so this becomes a nobrainer. diff --git a/arch/mips/include/asm/mach-ip27/war.h b/arch/mips/include/asm/mach-ip27/war.h index ef3efce0094a..845b8951d74f 100644 --- a/arch/mips/include/asm/mach-ip27/war.h +++ b/arch/mips/include/asm/mach-ip27/war.h @@ -17,7 +17,12 @@ #define MIPS_CACHE_SYNC_WAR 0 #define TX49XX_ICACHE_INDEX_INV_WAR 0 #define ICACHE_REFILLS_WORKAROUND_WAR 0 -#define R10000_LLSC_WAR 1 #define MIPS34K_MISSED_ITLB_WAR 0 +#ifdef CONFIG_CPU_R10000 +#define R10000_LLSC_WAR 1 +#else +#define R10000_LLSC_WAR 0 +#endif + #endif /* __ASM_MIPS_MACH_IP27_WAR_H */ diff --git a/arch/mips/include/asm/module.h b/arch/mips/include/asm/module.h index 9846047b3d3d..9697414af51c 100644 --- a/arch/mips/include/asm/module.h +++ b/arch/mips/include/asm/module.h @@ -115,6 +115,8 @@ search_module_dbetables(unsigned long addr) #define MODULE_PROC_FAMILY "NEVADA " #elif defined CONFIG_CPU_R10000 #define MODULE_PROC_FAMILY "R10000 " +#elif defined CONFIG_CPU_R12K_R14K_R16K +#define MODULE_PROC_FAMILY "R12K/R14K/R16K " #elif defined CONFIG_CPU_RM7000 #define MODULE_PROC_FAMILY "RM7000 " #elif defined CONFIG_CPU_SB1 diff --git a/arch/mips/oprofile/Makefile b/arch/mips/oprofile/Makefile index e10f216d0422..60dfffdb32a0 100644 --- a/arch/mips/oprofile/Makefile +++ b/arch/mips/oprofile/Makefile @@ -12,6 +12,7 @@ oprofile-y := $(DRIVER_OBJS) common.o backtrace.o oprofile-$(CONFIG_CPU_MIPS32) += op_model_mipsxx.o oprofile-$(CONFIG_CPU_MIPS64) += op_model_mipsxx.o oprofile-$(CONFIG_CPU_R10000) += op_model_mipsxx.o +oprofile-$(CONFIG_CPU_R12K_R14K_R16K) += op_model_mipsxx.o oprofile-$(CONFIG_CPU_SB1) += op_model_mipsxx.o oprofile-$(CONFIG_CPU_XLR) += op_model_mipsxx.o oprofile-$(CONFIG_CPU_LOONGSON2EF) += op_model_loongson2.o diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c index 31270a8986e8..9d62246b0b42 100644 --- a/drivers/video/fbdev/gbefb.c +++ b/drivers/video/fbdev/gbefb.c @@ -43,7 +43,7 @@ struct gbefb_par { /* macro for fastest write-though access to the framebuffer */ #ifdef CONFIG_MIPS -#ifdef CONFIG_CPU_R10000 +#if defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_R12K_R14K_R16K) #define pgprot_fb(_prot) (((_prot) & (~_CACHE_MASK)) | _CACHE_UNCACHED_ACCELERATED) #else #define pgprot_fb(_prot) (((_prot) & (~_CACHE_MASK)) | _CACHE_CACHABLE_NO_WA)