From patchwork Thu Aug 8 09:25:37 2013
X-Patchwork-Submitter: R Sricharan
X-Patchwork-Id: 2840890
Message-ID: <52036411.6020103@ti.com>
Date: Thu, 8 Aug 2013 14:55:37 +0530
From: Sricharan R
To: Nicolas Pitre
Cc: Russell King, Santosh Shilimkar, "linux-arm-kernel@lists.infradead.org"
Subject: Re: Fwd: [PATCH 05/10] ARM: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
References: <1375887551-8442-1-git-send-email-r.sricharan@ti.com> <52026637.5040005@ti.com> <52030425.2030900@ti.com>
In-Reply-To: <52030425.2030900@ti.com>

Hi Nicolas,

On Thursday 08 August 2013 08:06 AM, Sricharan R wrote:
> On Thursday 08 August 2013 06:54 AM, Nicolas Pitre wrote:
>> On Wed, 7 Aug 2013, Sricharan R wrote:
>>
>>> Hi Nicolas,
>>>
>>> On Monday 05 August 2013 08:08 PM, Santosh Shilimkar wrote:
>>>> On Sunday 04 August 2013 01:32 AM, Nicolas Pitre wrote:
>>>>> On Sat, 3 Aug 2013, Santosh Shilimkar wrote:
>>>>>
>>>>>> On Saturday 03 August 2013 10:01 AM, Nicolas Pitre wrote:
>>>>>>> On Sat, 3 Aug 2013, Sricharan R wrote:
>>>>>>>
>>>>>>>> On Saturday 03 August 2013 08:58 AM, Nicolas Pitre wrote:
>>>>>>>>> ... meaning that, instead of using 0x81 for the stub value on the mov
>>>>>>>>> instruction, it only has to be 0x83.  Bits 7 and 0 still act as anchors
>>>>>>>>> for the rotation field in the opcode, while bit 1 indicates which value
>>>>>>>>> to patch in.
>>>>>>>> I started with this kind of augmenting with the immediate operand
>>>>>>>> while starting V2. But the problem was, we do the runtime patching twice.
>>>>>>> Ahhh... Bummer.
>>>>>>>
>>>>>> Sorry if it wasn't clear but I thought we discussed why patching is
>>>>>> done twice.
>>>>> Yeah, I know the reasons.  I just had forgotten about the effects on the
>>>>> anchor bits.
>>>>>
>>>> I see.
>>>>
>>>>>> This was purely based on the discussion where RMK suggested to follow
>>>>>> that approach to minimize code changes.
>>>>>>
>>>>>> Looks like we need to revisit that now based on Russell's latest
>>>>>> comment.
>>>>> Note that my comments on this particular patch are still valid and
>>>>> independent from whatever approach is used globally to deal with the
>>>>> memory alias.  I do think that the value to patch should be selected
>>>>> depending on the opcode's rotation field which makes it compatible with
>>>>> a double patching approach as well.
>>>>>
>>>> Completely agree.
>>>>
>>>> Regards,
>>>> Santosh
>>>>
>>> So i did the below inlined patch to addresses your comments,
>>> as it was valid for both single/double patching approaches.
>>>
>>> [PATCH 05/10] ARM: LPAE: Correct virt_to_phys patching for 64 bit physical addresses
>>>
>>> The current phys_to_virt patching mechanism works only for 32 bit
>>> physical addresses and this patch extends the idea for 64bit physical
>>> addresses.
>>>
>>> The 64bit v2p patching mechanism patches the higher 8 bits of physical
>>> address with a constant using 'mov' instruction and lower 32bits are patched
>>> using 'add'. While this is correct, in those platforms where the lowmem addressable
>>> physical memory spawns across 4GB boundary, a carry bit can be produced as a
>>> result of addition of lower 32bits. This has to be taken in to account and added
>>> in to the upper. The patched __pv_offset and va are added in lower 32bits, where
>>> __pv_offset can be in two's complement form when PA_START < VA_START and that can
>>> result in a false carry bit.
>>>
>>> e.g
>>>    1) PA = 0x80000000; VA = 0xC0000000
>>>       __pv_offset = PA - VA = 0xC0000000 (2's complement)
>>>
>>>    2) PA = 0x2 80000000; VA = 0xC000000
>>>       __pv_offset = PA - VA = 0x1 C0000000
>>>
>>> So adding __pv_offset + VA should never result in a true overflow for (1).
>>> So in order to differentiate between a true carry, a __pv_offset is extended
>>> to 64bit and the upper 32bits will have 0xffffffff if __pv_offset is
>>> 2's complement. So 'mvn #0' is inserted instead of 'mov' while patching
>>> for the same reason. Since mov, add, sub instruction are to patched
>>> with different constants inside the same stub, the rotation field
>>> of the opcode is using to differentiate between them.
>>>
>>> So the above examples for v2p translation becomes for VA=0xC0000000,
>>>    1) PA[63:32] = 0xffffffff
>>>       PA[31:0] = VA + 0xC0000000 --> results in a carry
>>>       PA[63:32] = PA[63:32] + carry
>>>
>>>       PA[63:0] = 0x0 80000000
>>>
>>>    2) PA[63:32] = 0x1
>>>       PA[31:0] = VA + 0xC0000000 --> results in a carry
>>>       PA[63:32] = PA[63:32] + carry
>>>
>>>       PA[63:0] = 0x2 80000000
>>>
>>> The above ideas were suggested by Nicolas Pitre as
>>> part of the review of first and second versions of the subject patch.
>> Still can be improved.
>>
>> [...]
>>
>>>  1:	ldr	ip, [r7, r3]
>>> -	bic	ip, ip, #0x000000ff
>>> -	orr	ip, ip, r6	@ mask in offset bits 31-24
>>> -	str	ip, [r7, r3]
>>> -2:	cmp	r4, r5
>>> +	bic	ip, ip, #0xff
>>> +	tst	ip, #0xf00	@ check the rotation field
>>> +	orrne	ip, ip, r6	@ mask in offset bits 31-24
>>> +	bne	2f
>>> +	bic	ip, ip, #0x400000	@ clear bit 22
>> Why?
> Clearing was required, because if we patch 2 times then
> the first can be a mvn and we have to change it to a 'mov'
> in the second round.
>>> +	cmn	r0, #1
>>> +	orreq	ip, ip, #0x400000	@ set bit 22, mov to mvn instruction
>>> +	orrne	ip, ip, r0	@ mask in offset bits 7-0
>>> +2:	str	ip, [r7, r3]
>>> +3:	cmp	r4, r5
>> Instead of that "bne 2f", you should instead test r0 against 0xffffffff
>> outside this loop and add bit 22 to r0 only once.  No need to pre-clear
>> it from ip either.
>>
> ok, so adding bit 22 should be outside and clearing inside the loop.
>
> Regards,
> Sricharan
>
>> Nicolas

Ok, again reworked it like this.

[PATCH 05/11] ARM: LPAE: Correct virt_to_phys patching for 64 bit physical addresses

The current phys_to_virt patching mechanism works only for 32 bit
physical addresses and this patch extends the idea for 64bit physical
addresses.

The 64bit v2p patching mechanism patches the higher 8 bits of physical
address with a constant using 'mov' instruction and lower 32bits are patched
using 'add'. While this is correct, in those platforms where the lowmem
addressable physical memory spans the 4GB boundary, a carry bit can be
produced as a result of addition of lower 32bits. This has to be taken into
account and added into the upper 32bits. The patched __pv_offset and va are
added in lower 32bits, where __pv_offset can be in two's complement form when
PA_START < VA_START and that can result in a false carry bit.

e.g
   1) PA = 0x80000000; VA = 0xC0000000
      __pv_offset = PA - VA = 0xC0000000 (2's complement)

   2) PA = 0x2 80000000; VA = 0xC0000000
      __pv_offset = PA - VA = 0x1 C0000000

So adding __pv_offset + VA should never result in a true overflow for (1).
So in order to differentiate a true carry from a false one, __pv_offset is
extended to 64bit and the upper 32bits will have 0xffffffff if __pv_offset is
2's complement. So 'mvn #0' is inserted instead of 'mov' while patching
for the same reason. Since mov, add, sub instructions are to be patched
with different constants inside the same stub, the rotation field
of the opcode is used to differentiate between them.

So the above examples for v2p translation become, for VA=0xC0000000,
   1) PA[63:32] = 0xffffffff
      PA[31:0] = VA + 0xC0000000 --> results in a carry
      PA[63:32] = PA[63:32] + carry

      PA[63:0] = 0x0 80000000

   2) PA[63:32] = 0x1
      PA[31:0] = VA + 0xC0000000 --> results in a carry
      PA[63:32] = PA[63:32] + carry

      PA[63:0] = 0x2 80000000

The above ideas were suggested by Nicolas Pitre as
part of the review of first and second versions of the subject patch.

There is no corresponding change on the phys_to_virt() side, because
computations on the upper 32-bits would be discarded anyway.

Cc: Nicolas Pitre
Cc: Russell King

Signed-off-by: Sricharan R
Signed-off-by: Santosh Shilimkar
---
 arch/arm/include/asm/memory.h |   35 ++++++++++++++++++++--
 arch/arm/kernel/armksyms.c    |    1 +
 arch/arm/kernel/head.S        |   64 ++++++++++++++++++++++++++---------------
 arch/arm/kernel/patch.c       |    3 ++
 4 files changed, 77 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index ff76e12..e2fa269 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -172,9 +172,12 @@
  * so that all we need to do is modify the 8-bit constant field.
  */
 #define __PV_BITS_31_24	0x81000000
+#define __PV_BITS_7_0	0x81
 
 extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x);
-extern unsigned long __pv_phys_offset;
+extern u64 __pv_phys_offset;
+extern u64 __pv_offset;
+
 #define PHYS_OFFSET __pv_phys_offset
 
 #define __pv_stub(from,to,instr,type)			\
@@ -186,10 +189,36 @@ extern unsigned long __pv_phys_offset;
 	: "=r" (to)					\
 	: "r" (from), "I" (type))
 
+#define __pv_stub_mov_hi(t)				\
+	__asm__ volatile("@ __pv_stub_mov\n"		\
+	"1:	mov	%R0, %1\n"			\
+	"	.pushsection .pv_table,\"a\"\n"		\
+	"	.long	1b\n"				\
+	"	.popsection\n"				\
+	: "=r" (t)					\
+	: "I" (__PV_BITS_7_0))
+
+#define __pv_add_carry_stub(x, y)			\
+	__asm__ volatile("@ __pv_add_carry_stub\n"	\
+	"1:	adds	%Q0, %1, %2\n"			\
+	"	adc	%R0, %R0, #0\n"			\
+	"	.pushsection .pv_table,\"a\"\n"		\
+	"	.long	1b\n"				\
+	"	.popsection\n"				\
+	: "+r" (y)					\
+	: "r" (x), "I" (__PV_BITS_31_24)		\
+	: "cc")
+
 static inline phys_addr_t __virt_to_phys(unsigned long x)
 {
-	unsigned long t;
-	__pv_stub(x, t, "add", __PV_BITS_31_24);
+	phys_addr_t t;
+
+	if (sizeof(phys_addr_t) == 4) {
+		__pv_stub(x, t, "add", __PV_BITS_31_24);
+	} else {
+		__pv_stub_mov_hi(t);
+		__pv_add_carry_stub(x, t);
+	}
 	return t;
 }
 
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..1f031dd 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc);
 
 #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
 EXPORT_SYMBOL(__pv_phys_offset);
+EXPORT_SYMBOL(__pv_offset);
 #endif
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 45e8935..0c44d1e 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -536,6 +536,14 @@ ENTRY(fixup_smp)
 	ldmfd	sp!, {r4 - r6, pc}
 ENDPROC(fixup_smp)
 
+#ifdef __ARMEB__
+#define LOW_OFFSET	0x4
+#define HIGH_OFFSET	0x0
+#else
+#define LOW_OFFSET	0x0
+#define HIGH_OFFSET	0x4
+#endif
+
 #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
 
 /* __fixup_pv_table - patch the stub instructions with the delta between
@@ -546,17 +554,20 @@ ENDPROC(fixup_smp)
 	__HEAD
 __fixup_pv_table:
 	adr	r0, 1f
-	ldmia	r0, {r3-r5, r7}
-	sub	r3, r0, r3	@ PHYS_OFFSET - PAGE_OFFSET
+	ldmia	r0, {r3-r7}
+	mvn	ip, #0
+	subs	r3, r0, r3	@ PHYS_OFFSET - PAGE_OFFSET
 	add	r4, r4, r3	@ adjust table start address
 	add	r5, r5, r3	@ adjust table end address
-	add	r7, r7, r3	@ adjust __pv_phys_offset address
-	str	r8, [r7]	@ save computed PHYS_OFFSET to __pv_phys_offset
+	add	r6, r6, r3	@ adjust __pv_phys_offset address
+	add	r7, r7, r3	@ adjust __pv_offset address
+	str	r8, [r6, #LOW_OFFSET]	@ save computed PHYS_OFFSET to __pv_phys_offset
+	strcc	ip, [r7, #HIGH_OFFSET]	@ save to __pv_offset high bits
 	mov	r6, r3, lsr #24	@ constant for add/sub instructions
 	teq	r3, r6, lsl #24 @ must be 16MiB aligned
 THUMB(	it	ne		@ cross section branch )
 	bne	__error
-	str	r6, [r7, #4]	@ save to __pv_offset
+	str	r3, [r7, #LOW_OFFSET]	@ save to __pv_offset low bits
 	b	__fixup_a_pv_table
 ENDPROC(__fixup_pv_table)
 
@@ -565,9 +576,18 @@ ENDPROC(__fixup_pv_table)
 	.long	__pv_table_begin
 	.long	__pv_table_end
 2:	.long	__pv_phys_offset
+	.long	__pv_offset
 
 	.text
 __fixup_a_pv_table:
+	adr	r0, 3f
+	ldr	r6, [r0]
+	add	r6, r6, r3
+	ldr	r0, [r6, #HIGH_OFFSET]	@ pv_offset high word
+	ldr	r6, [r6, #LOW_OFFSET]	@ pv_offset low word
+	mov	r6, r6, lsr #24
+	cmn	r0, #1
+	moveq	r0, #0x400000	@ set bit 22, mov to mvn instruction
 #ifdef CONFIG_THUMB2_KERNEL
 	lsls	r6, #24
 	beq	2f
@@ -582,9 +602,15 @@ __fixup_a_pv_table:
 	b	2f
 1:	add	r7, r3
 	ldrh	ip, [r7, #2]
-	and	ip, 0x8f00
-	orr	ip, r6	@ mask in offset bits 31-24
+	tst	ip, #0x4000
+	and	ip, #0x8f00
+	orrne	ip, r6	@ mask in offset bits 31-24
+	orreq	ip, r0	@ mask in offset bits 7-0
 	strh	ip, [r7, #2]
+	ldrheq	ip, [r7]
+	biceq	ip, #0x20
+	orreq	ip, ip, r0, lsr #16
+	strheq	ip, [r7]
 2:	cmp	r4, r5
 	ldrcc	r7, [r4], #4	@ use branch for delay slot
 	bcc	1b
@@ -593,7 +619,10 @@ __fixup_a_pv_table:
 	b	2f
 1:	ldr	ip, [r7, r3]
 	bic	ip, ip, #0x000000ff
-	orr	ip, ip, r6	@ mask in offset bits 31-24
+	tst	ip, #0xf00	@ check the rotation field
+	orrne	ip, ip, r6	@ mask in offset bits 31-24
+	biceq	ip, ip, #0x400000	@ clear bit 22
+	orreq	ip, ip, r0	@ mask in offset bits 7-0
 	str	ip, [r7, r3]
 2:	cmp	r4, r5
 	ldrcc	r7, [r4], #4	@ use branch for delay slot
@@ -602,28 +631,17 @@ __fixup_a_pv_table:
 #endif
 ENDPROC(__fixup_a_pv_table)
 
+3:	.long __pv_offset
+
 ENTRY(fixup_pv_table)
-	stmfd	sp!, {r4 - r7, lr}
-	ldr	r2, 2f	@ get address of __pv_phys_offset
+	stmfd	sp!, {r0, r3 - r7, r12, lr}
 	mov	r3, #0	@ no offset
 	mov	r4, r0	@ r0 = table start
 	add	r5, r0, r1	@ r1 = table size
-	ldr	r6, [r2, #4]	@ get __pv_offset
 	bl	__fixup_a_pv_table
-	ldmfd	sp!, {r4 - r7, pc}
+	ldmfd	sp!, {r0, r3 - r7, r12, pc}
 ENDPROC(fixup_pv_table)
 
-	.align
-2:	.long	__pv_phys_offset
-
-	.data
-	.globl	__pv_phys_offset
-	.type	__pv_phys_offset, %object
-__pv_phys_offset:
-	.long	0
-	.size	__pv_phys_offset, . - __pv_phys_offset
-__pv_offset:
-	.long	0
 #endif
 
 #include "head-common.S"
diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index 07314af..8356312 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -8,6 +8,9 @@
 
 #include "patch.h"
 
+u64 __pv_phys_offset __attribute__((section(".data")));
+u64 __pv_offset __attribute__((section(".data")));
+
 struct patch {
 	void *addr;
 	unsigned int insn;
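
For reference, the carry handling described in the commit message can be
modelled in plain user-space C. This is only a sketch under assumptions, not
kernel code: model_pv_offset(), model_virt_to_phys() and
model_check_examples() are hypothetical names that do not appear in the
patch, and the model works on the full 32-bit halves of __pv_offset rather
than on the 8-bit immediates that the runtime patching actually rewrites.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: __pv_offset = PA_START - VA_START, kept as a 64-bit
 * two's complement value. When PA_START < VA_START the upper 32 bits come
 * out as 0xffffffff, which is the case where the patch turns 'mov' into
 * 'mvn #0'.
 */
static uint64_t model_pv_offset(uint64_t pa_start, uint32_t va_start)
{
	return pa_start - (uint64_t)va_start;
}

/*
 * Hypothetical model of the mov/adds/adc stub sequence from the patch,
 * written out with explicit 32-bit halves.
 */
static uint64_t model_virt_to_phys(uint32_t va, uint64_t pv_offset)
{
	uint32_t hi = (uint32_t)(pv_offset >> 32);	/* mov/mvn %R0, #imm */
	uint32_t lo = va + (uint32_t)pv_offset;		/* adds %Q0, %1, #imm */
	uint32_t carry = lo < va;			/* C flag set by adds */

	hi += carry;					/* adc %R0, %R0, #0 */
	return ((uint64_t)hi << 32) | lo;
}

/* Replays examples (1) and (2) from the commit message for VA = 0xC0000000. */
static void model_check_examples(void)
{
	uint64_t off1 = model_pv_offset(0x80000000ULL, 0xC0000000u);
	uint64_t off2 = model_pv_offset(0x280000000ULL, 0xC0000000u);

	assert(off1 == 0xFFFFFFFFC0000000ULL);	/* high word 0xffffffff */
	assert(off2 == 0x00000001C0000000ULL);	/* high word 0x1 */

	/* (1): the carry wraps the 0xffffffff high word back to 0 */
	assert(model_virt_to_phys(0xC0000000u, off1) == 0x0000000080000000ULL);
	/* (2): the carry is a true one and bumps the high word from 1 to 2 */
	assert(model_virt_to_phys(0xC0000000u, off2) == 0x0000000280000000ULL);
}
```

Calling model_check_examples() from a small test program exercises both
cases. In the model, as in the stub, the carry out of the low-word addition
is always added to the high word; seeding the high word with 0xffffffff is
what cancels the carry when __pv_offset is negative.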