From patchwork Fri Mar 13 16:39:23 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mason X-Patchwork-Id: 6007211 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 07E3CBF90F for ; Fri, 13 Mar 2015 16:42:04 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 1816920154 for ; Fri, 13 Mar 2015 16:42:03 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0336F20138 for ; Fri, 13 Mar 2015 16:42:02 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1YWScs-0006ZW-SI; Fri, 13 Mar 2015 16:39:54 +0000 Received: from smtp2-g21.free.fr ([212.27.42.2]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1YWScm-0006H5-QK for linux-arm-kernel@lists.infradead.org; Fri, 13 Mar 2015 16:39:49 +0000 Received: from [172.27.0.114] (unknown [83.142.147.193]) (Authenticated sender: shill) by smtp2-g21.free.fr (Postfix) with ESMTPSA id 7DD014B029A; Fri, 13 Mar 2015 17:38:21 +0100 (CET) Message-ID: <550312BB.8020902@free.fr> Date: Fri, 13 Mar 2015 17:39:23 +0100 From: Mason User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0 SeaMonkey/2.32.1 MIME-Version: 1.0 To: Russell King - ARM Linux Subject: Re: read_cpuid_id() in arch/arm/kernel/setup.c References: <55030A68.6070002@free.fr> <20150313161953.GS8656@n2100.arm.linux.org.uk> In-Reply-To: <20150313161953.GS8656@n2100.arm.linux.org.uk> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20150313_093949_015258_B7D4F73C X-CRM114-Status: GOOD ( 17.64 ) X-Spam-Score: -0.7 (/) Cc: Linux ARM X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,FREEMAIL_FROM, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 13/03/2015 17:19, Russell King - ARM Linux wrote: > On Fri, Mar 13, 2015 at 05:03:52PM +0100, Mason wrote: >> Hello everyone, >> >> As far as I can tell, read_cpuid_id() resolves to read_cpuid(CPUID_ID) >> which resolves to mrc 15, 0, rN, cr0, cr0, {0} >> >> Consider this: >> >> /* >> * The CPU ID never changes at run time, so we might as well tell the >> * compiler that it's constant. Use this function to read the CPU ID >> * rather than directly reading processor_id or read_cpuid() directly. >> */ >> static inline unsigned int __attribute_const__ read_cpuid_id(void) >> { >> return read_cpuid(CPUID_ID); >> } >> >> Despite the comment and attribute, my compiler(*) still reloads the >> value every time. >> >> (*) gcc version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11) >> >> e.g. >> >> static int __get_cpu_architecture(void) >> { >> int cpu_arch; >> >> unsigned int id = read_cpuid_id(); >> if ((id & 0x0008f000) == 0) { >> cpu_arch = CPU_ARCH_UNKNOWN; >> } else if ((id & 0x0008f000) == 0x00007000) { >> cpu_arch = (id & (1 << 23)) ? CPU_ARCH_ARMv4T : CPU_ARCH_ARMv3; >> } else if ((id & 0x00080000) == 0x00000000) { >> cpu_arch = (id >> 16) & 7; >> if (cpu_arch) >> cpu_arch += CPU_ARCH_ARMv3; >> } else if ((id & 0x000f0000) == 0x000f0000) { >> >> resolves to >> >> c01fec74: ee10cf10 mrc 15, 0, ip, cr0, cr0, {0} >> c01fec78: e21cca8f ands ip, ip, #585728 ; 0x8f000 >> c01fec7c: e34c3023 movt r3, #49187 ; 0xc023 >> c01fec80: e5837008 str r7, [r3, #8] >> c01fec84: e50b304c str r3, [fp, #-76] ; 0x4c >> c01fec88: 0a000022 beq c01fed18 >> c01fec8c: ee103f10 mrc 15, 0, r3, cr0, cr0, {0} >> c01fec90: e2033a8f and r3, r3, #585728 ; 0x8f000 >> c01fec94: e3530a07 cmp r3, #28672 ; 0x7000 >> c01fec98: 1a000004 bne c01fecb0 >> c01fec9c: ee103f10 mrc 15, 0, r3, cr0, cr0, {0} >> c01feca0: e3130502 tst r3, #8388608 ; 0x800000 >> c01feca4: 13a0c003 movne ip, #3 >> c01feca8: 03a0c001 moveq ip, #1 >> c01fecac: ea000019 b c01fed18 >> c01fecb0: ee103f10 mrc 15, 0, r3, cr0, cr0, {0} >> c01fecb4: e3130702 tst r3, #524288 ; 0x80000 >> >> >> So I thought it would be nice to give the poor compiler a break, >> and just stuff the result in a local variable: > > NAK. > > Your compiler behaviour is different to mine (stock gcc 4.7.4): > > 4f8: e1a0c00d mov ip, sp > 4fc: e92ddff0 push {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc} > 500: e24cb004 sub fp, ip, #4 > 504: e24dd00c sub sp, sp, #12 > 508: ee105f10 mrc 15, 0, r5, cr0, cr0, {0} <== load id > 50c: e1a07000 mov r7, r0 > 510: e1a00005 mov r0, r5 <== r5 = id > 514: ebfffffe bl 0 > 518: e2504000 subs r4, r0, #0 > 51c: 1a000003 bne 530 > 520: e59f063c ldr r0, [pc, #1596] ; b64 > 524: e1a01005 mov r1, r5 > 528: ebfffffe bl 0 > 52c: eafffffe b 52c > 530: e5948020 ldr r8, [r4, #32] > 534: e2153a8f ands r3, r5, #585728 ; 0x8f000 <== r5 = id > 538: e59f2628 ldr r2, [pc, #1576] ; b68 > 538: e59f2628 ldr r2, [pc, #1576] ; b68 > 53c: e5828000 str r8, [r2] > 540: 0a00001f beq 5c4 > 544: e3530a07 cmp r3, #28672 ; 0x7000 > 548: 1a000003 bne 55c > 54c: e3150502 tst r5, #8388608 ; 0x800000 <== r5 = id > 550: 03a03001 moveq r3, #1 > 554: 13a03003 movne r3, #3 > 558: ea000019 b 5c4 > 55c: e3150702 tst r5, #524288 ; 0x80000 <== r5 = id > 560: 1a000003 bne 574 > 564: e7e23855 ubfx r3, r5, #16, #3 <== r5 = id > 568: e3530000 cmp r3, #0 > 56c: 12833001 addne r3, r3, #1 > 570: ea000013 b 5c4 > 574: e205580f and r5, r5, #983040 ; 0xf0000 <== r5 = id > 578: e355080f cmp r5, #983040 ; 0xf0000 > 57c: 13a03000 movne r3, #0 > 580: 1a00000f bne 5c4 > ... > > The point here is that the compiler is free to optimise the code as it > sees fit - if it decides that the register pressure from having the > value cached in a register is too high, it can decide to spill the > cached value, and reload it from CP15 as and when it needs to. That > is an advantage. Good point. I hadn't thought of that. Do you know the latency of an mrc instruction? (compared to a mov) > It seems that GCC 4.7.4 optimises better than Linaro's 4.9.3. In fact, > it looks like Linaro's 4.9.3 is rather buggy as far as optimisation > goes. > > Later compilers aren't always better. I did NOT expect that. Compiler optimizations passes are so fragile. Anyway, here's another proposed nano-improvement ;-) (This one is code factorization) This one looks good, doesn't it? :-) Regards. --- setup.c 2015-03-03 18:04:59.000000000 +0100 +++ setup.bar.c 2015-03-13 17:29:23.800668429 +0100 @@ -246,12 +246,9 @@ if (cpu_arch) cpu_arch += CPU_ARCH_ARMv3; } else if ((read_cpuid_id() & 0x000f0000) == 0x000f0000) { - unsigned int mmfr0; - /* Revised CPUID format. Read the Memory Model Feature * Register 0 and check for VMSAv7 or PMSAv7 */ - asm("mrc p15, 0, %0, c0, c1, 4" - : "=r" (mmfr0)); + unsigned int mmfr0 = read_cpuid_ext(CPUID_EXT_MMFR0); if ((mmfr0 & 0x0000000f) >= 0x00000003 || (mmfr0 & 0x000000f0) >= 0x00000030) cpu_arch = CPU_ARCH_ARMv7;