From patchwork Sun Oct 14 18:34:24 2018
X-Patchwork-Submitter: Helge Deller
X-Patchwork-Id: 10640773
Date: Sun, 14 Oct 2018 20:34:24 +0200
From: Helge Deller
To: linux-parisc@vger.kernel.org, James Bottomley, John David Anglin
Subject: [RFC][PATCH v2] parisc: Add alternative coding when running UP
Message-ID: <20181014183424.GA20783@ls3530.fritz.box>
X-Mailing-List: linux-parisc@vger.kernel.org

This patch adds the necessary code to patch a running SMP kernel at
runtime to improve performance when running on a single CPU.

The current implementation offers two patching variants:

- Unwanted assembler statements like locking functions are overwritten
  with NOPs. When multiple instructions need to be skipped, one branch
  instruction is used instead of multiple NOP instructions.

- Some pdtlb and pitlb instructions are patched to become pdtlb,l and
  pitlb,l, which only flush the CPU-local TLB entries instead of
  broadcasting the flush to other CPUs in the system, and thus may
  improve performance.

Live-patching is done early in the boot process, just after the system
inventory has been run. No drivers are running at that point, so no
external interrupts should arrive. The hope is therefore that no TLB
exceptions will occur during the patching. If this turns out to be
wrong, we will probably need to do the patching in real mode.

Signed-off-by: Helge Deller
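
For reference, here is a rough sketch of what the C-side annotation
expands to; the register %r26 merely stands in for whatever the
compiler picks for the "r" constraint. A pdtlb_alt() use emits the
pdtlb into .text and, via the "!" statement separator, one entry into
the .altinstructions table:

	pdtlb	0(%sr1,%r26)
0:
	.section .altinstructions, "aw"
	.word	(0b - 4 - .), 1, 0x02	/* offset to the pdtlb, 1 insn, INSN_PxTLB */
	.previous

Because the local label 0: sits directly behind the pdtlb, "0b - 4" is
the address of the instruction itself, and subtracting "." stores it as
an offset relative to the table entry; apply_alternatives_all() below
simply reverses that calculation to find the word to patch. A condensed
example of the assembler-side annotation pattern follows the diff.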
diff --git a/arch/parisc/include/asm/alternative.h b/arch/parisc/include/asm/alternative.h
new file mode 100644
index 000000000000..e4835bd376bf
--- /dev/null
+++ b/arch/parisc/include/asm/alternative.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_PARISC_ALTERNATIVE_H
+#define __ASM_PARISC_ALTERNATIVE_H
+
+#define INSN_PxTLB	0x02		/* modify pdtlb, pitlb */
+#define INSN_NOP	0x8000240	/* nop */
+
+
+#ifndef __ASSEMBLY__
+
+#include
+#include
+#include
+#include
+
+struct alt_instr {
+	s32 orig_offset;	/* offset to original instructions */
+	u32 len;		/* end of original instructions */
+	u32 replacement;	/* replacement instruction or code */
+};
+
+void set_kernel_text_rw(int enable_read_write);
+// int __init apply_alternatives_all(void);
+
+/* Alternative SMP implementation. */
+#define ALTERNATIVE(replacement)	"!0:"			\
+	".section .altinstructions, \"aw\"	!"		\
+	".word (0b-4-.), 1, " __stringify(replacement) "	!"	\
+	".previous"
+
+#else
+
+#define ALTERNATIVE(from, to, replacement)	\
+	.section .altinstructions, "aw"	!	\
+	.word (from - .), (to - from)/4	!	\
+	.word replacement		!	\
+	.previous
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_PARISC_ALTERNATIVE_H */
diff --git a/arch/parisc/include/asm/cache.h b/arch/parisc/include/asm/cache.h
index 150b7f30ea90..3e50a6e52fbd 100644
--- a/arch/parisc/include/asm/cache.h
+++ b/arch/parisc/include/asm/cache.h
@@ -43,6 +43,10 @@ void parisc_setup_cache_timing(void);
 #define pdtlb(addr)	asm volatile("pdtlb 0(%%sr1,%0)" : : "r" (addr));
 #define pitlb(addr)	asm volatile("pitlb 0(%%sr1,%0)" : : "r" (addr));
+#define pdtlb_alt(addr)	asm volatile("pdtlb 0(%%sr1,%0)" \
+				ALTERNATIVE(INSN_PxTLB) : : "r" (addr))
+#define pitlb_alt(addr)	asm volatile("pitlb 0(%%sr1,%0)" \
+				ALTERNATIVE(INSN_PxTLB) : : "r" (addr))
 #define pdtlb_kernel(addr)  asm volatile("pdtlb 0(%0)" : : "r" (addr));
 
 #endif /* ! __ASSEMBLY__ */
diff --git a/arch/parisc/include/asm/sections.h b/arch/parisc/include/asm/sections.h
index 5a40b51df80c..bb52aea0cb21 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -5,6 +5,8 @@
 /* nothing to see, move along */
 #include
 
+extern char __alt_instructions[], __alt_instructions_end[];
+
 #ifdef CONFIG_64BIT
 #define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 4209b74ce63c..576aff095ec8 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 
 int split_tlb __read_mostly;
 int dcache_stride __read_mostly;
@@ -483,7 +484,7 @@ int __flush_tlb_range(unsigned long sid, unsigned long start,
 	while (start < end) {
 		purge_tlb_start(flags);
 		mtsp(sid, 1);
-		pdtlb(start);
+		pdtlb_alt(start);
 		purge_tlb_end(flags);
 		start += PAGE_SIZE;
 	}
@@ -494,8 +495,8 @@ int __flush_tlb_range(unsigned long sid, unsigned long start,
 	while (start < end) {
 		purge_tlb_start(flags);
 		mtsp(sid, 1);
-		pdtlb(start);
-		pitlb(start);
+		pdtlb_alt(start);
+		pitlb_alt(start);
 		purge_tlb_end(flags);
 		start += PAGE_SIZE;
 	}
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 0d662f0e7b70..66a82f69776c 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include
 
 #include
@@ -464,7 +465,7 @@
 	/* Acquire pa_tlb_lock lock and check page is present. */
 	.macro	tlb_lock	spc,ptp,pte,tmp,tmp1,fault
 #ifdef CONFIG_SMP
-	cmpib,COND(=),n	0,\spc,2f
+98:	cmpib,COND(=),n	0,\spc,2f
 	load_pa_tlb_lock \tmp
 1:	LDCW	0(\tmp),\tmp1
 	cmpib,COND(=)	0,\tmp1,1b
@@ -473,6 +474,7 @@
 	bb,<,n	\pte,_PAGE_PRESENT_BIT,3f
 	b	\fault
 	stw,ma	\spc,0(\tmp)
+99:	ALTERNATIVE(98b, 99b, INSN_NOP)
 #endif
 2:	LDREG	0(\ptp),\pte
 	bb,>=,n	\pte,_PAGE_PRESENT_BIT,\fault
@@ -482,15 +484,17 @@
 	/* Release pa_tlb_lock lock without reloading lock address. */
 	.macro	tlb_unlock0	spc,tmp
 #ifdef CONFIG_SMP
-	or,COND(=)	%r0,\spc,%r0
+98:	or,COND(=)	%r0,\spc,%r0
 	stw,ma	\spc,0(\tmp)
+99:	ALTERNATIVE(98b, 99b, INSN_NOP)
 #endif
 	.endm
 
 	/* Release pa_tlb_lock lock. */
 	.macro	tlb_unlock1	spc,tmp
 #ifdef CONFIG_SMP
-	load_pa_tlb_lock \tmp
+98:	load_pa_tlb_lock \tmp
+99:	ALTERNATIVE(98b, 99b, INSN_NOP)
 	tlb_unlock0	\spc,\tmp
 #endif
 	.endm
diff --git a/arch/parisc/kernel/pacache.S b/arch/parisc/kernel/pacache.S
index f33bf2d306d6..11801b502352 100644
--- a/arch/parisc/kernel/pacache.S
+++ b/arch/parisc/kernel/pacache.S
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -312,6 +313,7 @@ ENDPROC_CFI(flush_data_cache_local)
 	.macro	tlb_lock	la,flags,tmp
 #ifdef CONFIG_SMP
+98:
 #if __PA_LDCW_ALIGNMENT > 4
 	load32	pa_tlb_lock + __PA_LDCW_ALIGNMENT-1, \la
 	depi	0,31,__PA_LDCW_ALIGN_ORDER, \la
@@ -326,15 +328,17 @@ ENDPROC_CFI(flush_data_cache_local)
 	nop
 	b,n	2b
 3:
+99:	ALTERNATIVE(98b, 99b, INSN_NOP)
 #endif
 	.endm
 
 	.macro	tlb_unlock	la,flags,tmp
 #ifdef CONFIG_SMP
-	ldi	1,\tmp
+98:	ldi	1,\tmp
 	sync
 	stw	\tmp,0(\la)
 	mtsm	\flags
+99:	ALTERNATIVE(98b, 99b, INSN_NOP)
 #endif
 	.endm
@@ -596,9 +600,11 @@ ENTRY_CFI(copy_user_page_asm)
 	pdtlb,l		%r0(%r29)
 #else
 	tlb_lock	%r20,%r21,%r22
-	pdtlb		%r0(%r28)
-	pdtlb		%r0(%r29)
+0:	pdtlb		%r0(%r28)
+1:	pdtlb		%r0(%r29)
 	tlb_unlock	%r20,%r21,%r22
+	ALTERNATIVE(0b, 0b, INSN_PxTLB)
+	ALTERNATIVE(1b, 1b, INSN_PxTLB)
 #endif
 
 #ifdef CONFIG_64BIT
@@ -736,8 +742,9 @@ ENTRY_CFI(clear_user_page_asm)
 	pdtlb,l		%r0(%r28)
 #else
 	tlb_lock	%r20,%r21,%r22
-	pdtlb		%r0(%r28)
+0:	pdtlb		%r0(%r28)
 	tlb_unlock	%r20,%r21,%r22
+	ALTERNATIVE(0b, 0b, INSN_PxTLB)
 #endif
 
 #ifdef CONFIG_64BIT
@@ -813,8 +820,9 @@ ENTRY_CFI(flush_dcache_page_asm)
 	pdtlb,l		%r0(%r28)
 #else
 	tlb_lock	%r20,%r21,%r22
-	pdtlb		%r0(%r28)
+0:	pdtlb		%r0(%r28)
 	tlb_unlock	%r20,%r21,%r22
+	ALTERNATIVE(0b, 0b, INSN_PxTLB)
 #endif
 
 	ldil		L%dcache_stride, %r1
@@ -877,9 +885,11 @@ ENTRY_CFI(flush_icache_page_asm)
 	pitlb,l		%r0(%sr4,%r28)
 #else
 	tlb_lock	%r20,%r21,%r22
-	pdtlb		%r0(%r28)
-	pitlb		%r0(%sr4,%r28)
+0:	pdtlb		%r0(%r28)
+1:	pitlb		%r0(%sr4,%r28)
 	tlb_unlock	%r20,%r21,%r22
+	ALTERNATIVE(0b, 0b, INSN_PxTLB)
+	ALTERNATIVE(1b, 1b, INSN_PxTLB)
 #endif
 
 	ldil		L%icache_stride, %r1
diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
index 4e87c35c22b7..7fa151b5eb40 100644
--- a/arch/parisc/kernel/setup.c
+++ b/arch/parisc/kernel/setup.c
@@ -40,6 +40,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -305,6 +306,55 @@ static int __init parisc_init_resources(void)
 	return 0;
 }
 
+static int __init apply_alternatives_all(void)
+{
+	struct alt_instr *entry;
+	int *from, len;
+	int ret = 0, replacement;
+
+	/* replace only when not running SMP CPUs */
+	if (num_online_cpus() > 1)
+		return 0;
+
+	pr_info("Patch SMP kernel to run on a single CPU.\n");
+
+	set_kernel_text_rw(1);
+
+	entry = (struct alt_instr *) &__alt_instructions;
+	while (entry < (struct alt_instr *) &__alt_instructions_end) {
+		from = (int *)((ulong)&entry->orig_offset + entry->orig_offset);
+		len = entry->len;
+
+		replacement = entry->replacement;
+
+		/* Want to replace pdtlb by a pdtlb,l instruction? */
+		if (replacement == INSN_PxTLB) {
+			replacement = *from;
+			if (boot_cpu_data.cpu_type >= pcxu) /* >= pa2.0 ? */
+				replacement |= (1 << 10); /* set el bit */
+		}
+
+		/*
+		 * Replace instruction with NOPs?
+		 * For long distance insert a branch instruction instead.
+		 */
+		if (replacement == INSN_NOP && len > 1)
+			replacement = 0xe8000002 + (len-2)*8; /* "b,n .+8" */
+
+		pr_debug("Replace %02d instructions @ 0x%px with 0x%08x\n",
+			len, from, replacement);
+
+		/* Replace instructions */
+		*from = replacement;
+
+		entry++;
+	}
+
+	set_kernel_text_rw(0);
+
+	return ret;
+}
+
 extern void gsc_init(void);
 extern void processor_init(void);
 extern void ccio_init(void);
@@ -346,6 +396,7 @@ static int __init parisc_init(void)
 		boot_cpu_data.cpu_hz / 1000000,
 		boot_cpu_data.cpu_hz % 1000000 );
 
+	apply_alternatives_all();
 	parisc_setup_cache_timing();
 
 	/* These are in a non-obvious order, will fix when we have an iotree */
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index da2e31190efa..ef721fc3671b 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -25,7 +25,7 @@
 #include
 #include
 #include
-
+
 /* ld script to make hppa Linux kernel */
 #ifndef CONFIG_64BIT
 OUTPUT_FORMAT("elf32-hppa-linux")
@@ -61,6 +61,12 @@ SECTIONS
 		EXIT_DATA
 	}
 	PERCPU_SECTION(8)
+	. = ALIGN(4);
+	.altinstructions : {
+		__alt_instructions = .;
+		*(.altinstructions)
+		__alt_instructions_end = .;
+	}
 	. = ALIGN(HUGEPAGE_SIZE);
 	__init_end = .;
 	/* freed after init ends here */
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 74842d28a7a1..ff80ffdd09c7 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -515,6 +512,21 @@ static void __init map_pages(unsigned long start_vaddr,
 	}
 }
 
+void __init set_kernel_text_rw(int enable_read_write)
+{
+	unsigned long start = (unsigned long)_stext;
+	unsigned long end = (unsigned long)_etext;
+
+	map_pages(start, __pa(start), end-start,
+		PAGE_KERNEL_RWX, enable_read_write ? 1:0);
+
+	/* force the kernel to see the new TLB entries */
+	__flush_tlb_range(0, start, end);
+
+	/* dump old cached instructions */
+	flush_icache_range(start, end);
+}
+
 void __ref free_initmem(void)
 {
 	unsigned long init_begin = (unsigned long)__init_begin;
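
For completeness, here is the assembler-side annotation pattern used in
entry.S and pacache.S above, condensed into a minimal sketch; the code
between the labels is an abbreviated copy of the tlb_unlock macro, so
the instructions and labels are only illustrative. The 98:/99: labels
bracket the SMP-only instructions, and the table entry records their
start, their count and INSN_NOP, so that on a UP machine the whole
range is overwritten with NOPs (or with a single "b,n" branch when more
than one instruction has to be skipped):

#ifdef CONFIG_SMP
98:	ldi	1,\tmp			/* SMP-only unlock sequence */
	stw	\tmp,0(\la)
99:	ALTERNATIVE(98b, 99b, INSN_NOP)	/* records start, length and
					   INSN_NOP in .altinstructions */
#endif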