Message ID | 20180923152033.GA15162@mx3210.localdomain (mailing list archive) |
---|---|
State | Superseded |
Series | parisc: Reorder TLB flush timing calculation |
On 2018-09-23 11:20 AM, John David Anglin wrote:
> I added a couple of information messages which I have left to help with
> diagnosis if the problem should appear on another machine.

Looking at the calculated TLB flush thresholds, I see there is a large range of numbers. The most surprising numbers are for panama:

    [ 1.246425] Whole TLB flush 5982 cycles, Range flush 18874368 bytes 16515332 cycles
    [ 1.246680] Calculated TLB flush threshold 8 KiB
    [ 1.247120] TLB flush threshold set to 512 KiB

I don't know whether to believe the numbers or not. But the whole TLB flush seems to be amazingly fast compared to 4-way machines. On the other hand, the range flush takes 4.25 times more cycles per byte than phantom.

Here are the numbers for phantom:

    [ 6.512604] Whole TLB flush 61861 cycles, Range flush 18874368 bytes 3876885 cycles
    [ 6.616030] Calculated TLB flush threshold 1180 KiB
    [ 6.680018] TLB flush threshold set to 1180 KiB

Both machines are PA8900 (Shortfin). Panama is 800 MHz and phantom 1000 MHz. Both have:

    [ 0.000000] Kernel default page size is 4 KB. Huge pages enabled with 1 MB physical and 2 MB virtual size.

It would appear the minimum TLB flush threshold needs to be reduced for machines like panama.

Dave
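As a rough cross-check of the logged values, the threshold expression from the patch can be reproduced in userspace. This is only a sketch: the helper names are hypothetical, 4 KiB pages are taken from the boot message quoted above, 64-bit arithmetic is assumed, and the CPU counts used in the note below (one for panama, four for phantom) are inferred from the logged results rather than stated in the thread.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, per the "Kernel default page size is 4 KB" boot line. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round x up to the next page boundary (userspace stand-in for PAGE_ALIGN). */
static uint64_t sketch_page_align(uint64_t x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical helper mirroring the patched expression:
 *   PAGE_ALIGN((num_online_cpus() * size * alltime) / rangetime)
 * 64-bit arithmetic is assumed so the intermediate product does not overflow.
 */
static uint64_t sketch_tlb_threshold(uint64_t ncpus, uint64_t size,
				     uint64_t alltime, uint64_t rangetime)
{
	return sketch_page_align((ncpus * size * alltime) / rangetime);
}
```

With the logged values, `sketch_tlb_threshold(1, 18874368, 5982, 16515332)` gives 8192 bytes (8 KiB) and `sketch_tlb_threshold(4, 18874368, 61861, 3876885)` gives 1208320 bytes (1180 KiB), which matches the "Calculated TLB flush threshold" lines under the assumed CPU counts.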
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index e3b45546d589..3031c21f7c35 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -363,7 +375,7 @@ EXPORT_SYMBOL(flush_kernel_icache_range_asm);
 #define FLUSH_THRESHOLD 0x80000 /* 0.5MB */
 static unsigned long parisc_cache_flush_threshold __read_mostly = FLUSH_THRESHOLD;
 
-#define FLUSH_TLB_THRESHOLD (2*1024*1024) /* 2MB initial TLB threshold */
+#define FLUSH_TLB_THRESHOLD (512*1024) /* 0.5MB minimum TLB threshold */
 static unsigned long parisc_tlb_flush_threshold __read_mostly = FLUSH_TLB_THRESHOLD;
 
 void __init parisc_setup_cache_timing(void)
@@ -403,10 +415,6 @@ void __init parisc_setup_cache_timing(void)
 		goto set_tlb_threshold;
 	}
 
-	alltime = mfctl(16);
-	flush_tlb_all();
-	alltime = mfctl(16) - alltime;
-
 	size = 0;
 	start = (unsigned long) _text;
 	rangetime = mfctl(16);
@@ -417,13 +425,19 @@ void __init parisc_setup_cache_timing(void)
 	}
 	rangetime = mfctl(16) - rangetime;
 
-	printk(KERN_DEBUG "Whole TLB flush %lu cycles, flushing %lu bytes %lu cycles\n",
+	alltime = mfctl(16);
+	flush_tlb_all();
+	alltime = mfctl(16) - alltime;
+
+	printk(KERN_INFO "Whole TLB flush %lu cycles, Range flush %lu bytes %lu cycles\n",
 		alltime, size, rangetime);
 
-	threshold = PAGE_ALIGN(num_online_cpus() * size * alltime / rangetime);
+	threshold = PAGE_ALIGN((num_online_cpus() * size * alltime) / rangetime);
+	printk(KERN_INFO "Calculated TLB flush threshold %lu KiB\n",
+		threshold/1024);
 
 set_tlb_threshold:
-	if (threshold)
+	if (threshold > parisc_tlb_flush_threshold)
 		parisc_tlb_flush_threshold = threshold;
 	printk(KERN_INFO "TLB flush threshold set to %lu KiB\n",
 		parisc_tlb_flush_threshold/1024);
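Besides reordering the timing tests, the diff changes how the calculated value is applied: `if (threshold)` becomes `if (threshold > parisc_tlb_flush_threshold)`, so the compile-time constant acts as a minimum rather than an initial value. The sketch below isolates that selection logic in plain C; the function names are hypothetical and the kernel variables are replaced by ordinary parameters.

```c
#include <stdint.h>

/* Old behaviour: any nonzero calculated value replaces the initial threshold. */
static uint64_t sketch_old_select(uint64_t calculated, uint64_t initial)
{
	return calculated ? calculated : initial;
}

/* Patched behaviour: the calculated value is used only if it exceeds the minimum. */
static uint64_t sketch_new_select(uint64_t calculated, uint64_t minimum)
{
	return calculated > minimum ? calculated : minimum;
}
```

For a calculated value of 8192 bytes, the old selection would return 8192 regardless of the 2 MB initial value, while the patched selection with a 512 KiB minimum returns 524288.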
On boot (mostly reboot), my c8000 sometimes crashes after it prints the TLB flush threshold. The lockup is hard. The front LED flashes red and the box must be unplugged to reset the error.

I noticed that when the crash occurs the TLB flush threshold is about one quarter what it is on a successful boot. If I disabled the calculation, the crash didn't occur. There also seemed to be a timing dependency affecting the crash.

I finally realized that the flush_tlb_all() timing test runs just after the secondary CPUs are started. There seems to be a problem with running flush_tlb_all() too soon after the CPUs are started. The timing for the range test always seemed okay. So, I reversed the order of the two timing tests and I haven't had a crash at this point so far.

I added a couple of information messages which I have left to help with diagnosis if the problem should appear on another machine.

Signed-off-by: John David Anglin <dave.anglin@bell.net>