Message ID: 20201123073634.6854-1-swpenim@gmail.com (mailing list archive)
State: New, archived
Series: [RESEND] ARM: fix __div64_32() error when compiling with clang
On Mon, Nov 23, 2020 at 03:36:32PM +0800, Antony Yu wrote:
> __do_div64 clobbers the input register r0 on little-endian systems.
> According to the inline assembly documentation, if an input operand is
> modified, it should be tied to an output operand. This patch prevents
> compilers from reusing the r0 register after asm statements.
>
> Signed-off-by: Antony Yu <swpenim@gmail.com>
> ---
>  arch/arm/include/asm/div64.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> index 898e9c78a7e7..809efc51e90f 100644
> --- a/arch/arm/include/asm/div64.h
> +++ b/arch/arm/include/asm/div64.h
> @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
>  	asm(	__asmeq("%0", __xh)
>  		__asmeq("%1", "r2")
>  		__asmeq("%2", "r0")
> -		__asmeq("%3", "r4")
> +		__asmeq("%3", "r0")
> +		__asmeq("%4", "r4")
>  		"bl	__do_div64"
> -		: "=r" (__rem), "=r" (__res)
> +		: "=r" (__rem), "=r" (__res), "=r" (__n)
>  		: "r" (__n), "r" (__base)
>  		: "ip", "lr", "cc");
>  	*n = __res;
> --
> 2.23.0
>

I am not sure that I am qualified to review this (my assembly knowledge
is not the best), but your commit title mentions an error when compiling
with clang. What is the exact error, what configuration generates it,
and what version of clang? We have done fairly decent testing for
32-bit ARM; I would like to know what we are missing.

Cheers,
Nathan
On Tue, Nov 24, 2020 at 3:43 PM, Antony Yu <swpenim@gmail.com> wrote:
>
> On Mon, Nov 23, 2020 at 11:16:02AM -0700, Nathan Chancellor wrote:
> > On Mon, Nov 23, 2020 at 03:36:32PM +0800, Antony Yu wrote:
> > > __do_div64 clobbers the input register r0 on little-endian systems.
> > > According to the inline assembly documentation, if an input operand is
> > > modified, it should be tied to an output operand. This patch prevents
> > > compilers from reusing the r0 register after asm statements.
> > >
> > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > ---
> > >  arch/arm/include/asm/div64.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7e7..809efc51e90f 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >  	asm(	__asmeq("%0", __xh)
> > >  		__asmeq("%1", "r2")
> > >  		__asmeq("%2", "r0")
> > > -		__asmeq("%3", "r4")
> > > +		__asmeq("%3", "r0")
> > > +		__asmeq("%4", "r4")
> > >  		"bl	__do_div64"
> > > -		: "=r" (__rem), "=r" (__res)
> > > +		: "=r" (__rem), "=r" (__res), "=r" (__n)
> > >  		: "r" (__n), "r" (__base)
> > >  		: "ip", "lr", "cc");
> > >  	*n = __res;
> > > --
> > > 2.23.0
> > >
> >
> > I am not sure that I am qualified to review this (my assembly knowledge
> > is not the best), but your commit title mentions an error when compiling
> > with clang. What is the exact error, what configuration generates it,
> > and what version of clang? We have done fairly decent testing for
> > 32-bit ARM; I would like to know what we are missing.
> >
> > Cheers,
> > Nathan
>
> We hit a failure in the Android R VTS vts_libsnapshot_test on kernel 4.14.
> The bug is triggered accidentally by a workaround patch in our code base.
> A clean 4.14 branch is fine, since __do_div64 may not be executed in
> skip_metadata.
>
> The attachments are the .i file and the generated .s file. The .s file
> can be reproduced with clang -target arm-linux-eabi -march=armv8.2-a -O2.
>
> In the function skip_metadata, the code loads a value into r0, calls
> __do_div64, adds 1 to r0 and stores it to [r5]. It gets a wrong value,
> since __do_div64 clobbers the r0 register.
>
> We have tried clang-10, clang-11 and the Android prebuilt clang-r383902b.
> All of them have the same problem.

Sorry for the large attachment.
I put the .i and .s files at
https://gist.github.com/penyung/274b0c697062a1c776994bb40243cfff

Antony Yu
Thanks for the report, it probably was not fun to debug. I'll take a
closer look at this after the Thanksgiving holiday.

On Tue, Nov 24, 2020 at 2:14 AM Antony Yu <swpenim@gmail.com> wrote:
>
> On Tue, Nov 24, 2020 at 3:43 PM, Antony Yu <swpenim@gmail.com> wrote:
> >
> > On Mon, Nov 23, 2020 at 11:16:02AM -0700, Nathan Chancellor wrote:
> > > On Mon, Nov 23, 2020 at 03:36:32PM +0800, Antony Yu wrote:
> > > > __do_div64 clobbers the input register r0 on little-endian systems.
> > > > According to the inline assembly documentation, if an input operand is
> > > > modified, it should be tied to an output operand. This patch prevents
> > > > compilers from reusing the r0 register after asm statements.
> > > >
> > > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > > ---
> > > >  arch/arm/include/asm/div64.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > > index 898e9c78a7e7..809efc51e90f 100644
> > > > --- a/arch/arm/include/asm/div64.h
> > > > +++ b/arch/arm/include/asm/div64.h
> > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > > >  	asm(	__asmeq("%0", __xh)
> > > >  		__asmeq("%1", "r2")
> > > >  		__asmeq("%2", "r0")
> > > > -		__asmeq("%3", "r4")
> > > > +		__asmeq("%3", "r0")
> > > > +		__asmeq("%4", "r4")
> > > >  		"bl	__do_div64"
> > > > -		: "=r" (__rem), "=r" (__res)
> > > > +		: "=r" (__rem), "=r" (__res), "=r" (__n)
> > > >  		: "r" (__n), "r" (__base)
> > > >  		: "ip", "lr", "cc");
> > > >  	*n = __res;
> > > > --
> > > > 2.23.0
> > > >
> > >
> > > I am not sure that I am qualified to review this (my assembly knowledge
> > > is not the best), but your commit title mentions an error when compiling
> > > with clang. What is the exact error, what configuration generates it,
> > > and what version of clang? We have done fairly decent testing for
> > > 32-bit ARM; I would like to know what we are missing.
> > >
> > > Cheers,
> > > Nathan
> >
> > We hit a failure in the Android R VTS vts_libsnapshot_test on kernel 4.14.
> > The bug is triggered accidentally by a workaround patch in our code base.
> > A clean 4.14 branch is fine, since __do_div64 may not be executed in
> > skip_metadata.
> >
> > The attachments are the .i file and the generated .s file. The .s file
> > can be reproduced with clang -target arm-linux-eabi -march=armv8.2-a -O2.
> >
> > In the function skip_metadata, the code loads a value into r0, calls
> > __do_div64, adds 1 to r0 and stores it to [r5]. It gets a wrong value,
> > since __do_div64 clobbers the r0 register.
> >
> > We have tried clang-10, clang-11 and the Android prebuilt clang-r383902b.
> > All of them have the same problem.
>
> Sorry for the large attachment.
> I put the .i and .s files at
> https://gist.github.com/penyung/274b0c697062a1c776994bb40243cfff
>
> Antony Yu
Hi Antony,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.10-rc5 next-20201124]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Antony-Yu/ARM-fix-__div64_32-error-when-compiling-with-clang/20201123-154454
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 418baf2c28f3473039f2f7377760bd8f6897ae18
config: arm-randconfig-r036-20201124 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project df9ae5992889560a8f3c6760b54d5051b47c7bf5)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/83df1f983ed4219556020bf30cc819e66bb7e69c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Antony-Yu/ARM-fix-__div64_32-error-when-compiling-with-clang/20201123-154454
        git checkout 83df1f983ed4219556020bf30cc819e66bb7e69c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/tty/serial/imx.c:359:19: warning: unused function 'imx_uart_is_imx21' [-Wunused-function]
   static inline int imx_uart_is_imx21(struct imx_port *sport)
                     ^
   drivers/tty/serial/imx.c:364:19: warning: unused function 'imx_uart_is_imx53' [-Wunused-function]
   static inline int imx_uart_is_imx53(struct imx_port *sport)
                     ^
   drivers/tty/serial/imx.c:369:19: warning: unused function 'imx_uart_is_imx6q' [-Wunused-function]
   static inline int imx_uart_is_imx6q(struct imx_port *sport)
                     ^
   In file included from drivers/tty/serial/imx.c:11:
   In file included from include/linux/module.h:12:
   In file included from include/linux/list.h:9:
   In file included from include/linux/kernel.h:19:
>> arch/arm/include/asm/div64.h:39:2: error: multiple outputs to hard register: r0
           asm(    __asmeq("%0", __xh)
           ^
   3 warnings and 1 error generated.
--
   In file included from drivers/gpu/drm/tegra/dc.c:7:
   In file included from include/linux/clk.h:13:
   In file included from include/linux/kernel.h:19:
>> arch/arm/include/asm/div64.h:39:2: error: multiple outputs to hard register: r0
           asm(    __asmeq("%0", __xh)
           ^
   1 error generated.


vim +39 arch/arm/include/asm/div64.h

^1da177e4c3f415 include/asm-arm/div64.h      Linus Torvalds 2005-04-16  32  
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  33  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  34  {
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  35  	register unsigned int __base asm("r4") = base;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  36  	register unsigned long long __n asm("r0") = *n;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  37  	register unsigned long long __res asm("r2");
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  38  	register unsigned int __rem asm(__xh);
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02 @39  	asm(	__asmeq("%0", __xh)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  40  		__asmeq("%1", "r2")
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  41  		__asmeq("%2", "r0")
83df1f983ed4219 arch/arm/include/asm/div64.h Antony Yu      2020-11-23  42  		__asmeq("%3", "r0")
83df1f983ed4219 arch/arm/include/asm/div64.h Antony Yu      2020-11-23  43  		__asmeq("%4", "r4")
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  44  		"bl	__do_div64"
83df1f983ed4219 arch/arm/include/asm/div64.h Antony Yu      2020-11-23  45  		: "=r" (__rem), "=r" (__res), "=r" (__n)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  46  		: "r" (__n), "r" (__base)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  47  		: "ip", "lr", "cc");
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  48  	*n = __res;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  49  	return __rem;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  50  }
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  51  #define __div64_32 __div64_32
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  52  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
>
> __do_div64 clobbers the input register r0 on little-endian systems.
> According to the inline assembly documentation, if an input operand is
> modified, it should be tied to an output operand. This patch prevents
> compilers from reusing the r0 register after asm statements.
>
> Signed-off-by: Antony Yu <swpenim@gmail.com>
> ---
>  arch/arm/include/asm/div64.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> index 898e9c78a7e7..809efc51e90f 100644
> --- a/arch/arm/include/asm/div64.h
> +++ b/arch/arm/include/asm/div64.h
> @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
>  	asm(	__asmeq("%0", __xh)
>  		__asmeq("%1", "r2")
>  		__asmeq("%2", "r0")
> -		__asmeq("%3", "r4")
> +		__asmeq("%3", "r0")
> +		__asmeq("%4", "r4")
>  		"bl	__do_div64"
> -		: "=r" (__rem), "=r" (__res)
> +		: "=r" (__rem), "=r" (__res), "=r" (__n)
>  		: "r" (__n), "r" (__base)
>  		: "ip", "lr", "cc");
>  	*n = __res;
> --
> 2.23.0
>

Agree that using r0 as an input operand only is incorrect, and not only
on Clang. The compiler might assume that r0 will retain its value across
the asm() block, which is obviously not the case.

However, your patch will likely break big-endian, since in that case
__xh == r0, and so it will appear twice.

Perhaps it would be better to change the type of __rem to unsigned long
long as well?
(+ Nico)

On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> >
> > __do_div64 clobbers the input register r0 on little-endian systems.
> > According to the inline assembly documentation, if an input operand is
> > modified, it should be tied to an output operand. This patch prevents
> > compilers from reusing the r0 register after asm statements.
> >
> > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > ---
> >  arch/arm/include/asm/div64.h | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > index 898e9c78a7e7..809efc51e90f 100644
> > --- a/arch/arm/include/asm/div64.h
> > +++ b/arch/arm/include/asm/div64.h
> > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> >  	asm(	__asmeq("%0", __xh)
> >  		__asmeq("%1", "r2")
> >  		__asmeq("%2", "r0")
> > -		__asmeq("%3", "r4")
> > +		__asmeq("%3", "r0")
> > +		__asmeq("%4", "r4")
> >  		"bl	__do_div64"
> > -		: "=r" (__rem), "=r" (__res)
> > +		: "=r" (__rem), "=r" (__res), "=r" (__n)
> >  		: "r" (__n), "r" (__base)
> >  		: "ip", "lr", "cc");
> >  	*n = __res;
> > --
> > 2.23.0
> >
>
> Agree that using r0 as an input operand only is incorrect, and not
> only on Clang. The compiler might assume that r0 will retain its value
> across the asm() block, which is obviously not the case.
>
> However, your patch will likely break big-endian, since in that case
> __xh == r0, and so it will appear twice.
>
> Perhaps it would be better to change the type of __rem to unsigned
> long long as well?
On Mon, Nov 30, 2020 at 11:12:33AM +0100, Ard Biesheuvel wrote:
> (+ Nico)
>
> On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> > >
> > > __do_div64 clobbers the input register r0 on little-endian systems.
> > > According to the inline assembly documentation, if an input operand is
> > > modified, it should be tied to an output operand. This patch prevents
> > > compilers from reusing the r0 register after asm statements.
> > >
> > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > ---
> > >  arch/arm/include/asm/div64.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7e7..809efc51e90f 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >  	asm(	__asmeq("%0", __xh)
> > >  		__asmeq("%1", "r2")
> > >  		__asmeq("%2", "r0")
> > > -		__asmeq("%3", "r4")
> > > +		__asmeq("%3", "r0")
> > > +		__asmeq("%4", "r4")
> > >  		"bl	__do_div64"
> > > -		: "=r" (__rem), "=r" (__res)
> > > +		: "=r" (__rem), "=r" (__res), "=r" (__n)
> > >  		: "r" (__n), "r" (__base)
> > >  		: "ip", "lr", "cc");
> > >  	*n = __res;
> > > --
> > > 2.23.0
> > >
> >
> > Agree that using r0 as an input operand only is incorrect, and not
> > only on Clang. The compiler might assume that r0 will retain its value
> > across the asm() block, which is obviously not the case.

However, you can _not_ have an asm block that names two outputs using
the same physical register - that's why both the original patch and the
posted v2 will fail.

You also can't mark r0 as clobbered, because it's used as an operand
and that is not allowed by gcc.

The fact is, we have two register variables occupying the same register,
which are __n and __rem. It doesn't matter which endianness __rem has:
r0 will be used for both __n (input) and __rem (output).

If the compiler can't work out that a physical register used as an
output operand will be written by the assembly, then the compiler is
quite simply buggy.

The code is correct as it stands; Clang is buggy.
On Mon, 30 Nov 2020 at 11:21, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Mon, Nov 30, 2020 at 11:12:33AM +0100, Ard Biesheuvel wrote:
> > (+ Nico)
> >
> > On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> > > >
> > > > __do_div64 clobbers the input register r0 on little-endian systems.
> > > > According to the inline assembly documentation, if an input operand is
> > > > modified, it should be tied to an output operand. This patch prevents
> > > > compilers from reusing the r0 register after asm statements.
> > > >
> > > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > > ---
> > > >  arch/arm/include/asm/div64.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > > index 898e9c78a7e7..809efc51e90f 100644
> > > > --- a/arch/arm/include/asm/div64.h
> > > > +++ b/arch/arm/include/asm/div64.h
> > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > > >  	asm(	__asmeq("%0", __xh)
> > > >  		__asmeq("%1", "r2")
> > > >  		__asmeq("%2", "r0")
> > > > -		__asmeq("%3", "r4")
> > > > +		__asmeq("%3", "r0")
> > > > +		__asmeq("%4", "r4")
> > > >  		"bl	__do_div64"
> > > > -		: "=r" (__rem), "=r" (__res)
> > > > +		: "=r" (__rem), "=r" (__res), "=r" (__n)
> > > >  		: "r" (__n), "r" (__base)
> > > >  		: "ip", "lr", "cc");
> > > >  	*n = __res;
> > > > --
> > > > 2.23.0
> > > >
> > >
> > > Agree that using r0 as an input operand only is incorrect, and not
> > > only on Clang. The compiler might assume that r0 will retain its value
> > > across the asm() block, which is obviously not the case.
>
> However, you can _not_ have an asm block that names two outputs using
> the same physical register - that's why both the original patch and
> the posted v2 will fail.
>
> You also can't mark r0 as clobbered, because it's used as an operand
> and that is not allowed by gcc.
>
> The fact is, we have two register variables occupying the same register,
> which are __n and __rem. It doesn't matter which endianness __rem has:
> r0 will be used for both __n (input) and __rem (output).

__rem is a 32-bit variable, so in LE mode only r1 is used for __rem,
not r0. So r0/r1 are used as an input operand pair, and r1 is used as
an output operand. I therefore don't think the compiler has to be buggy
in order for it to assume that r0 will still contain the low word of
the dividend afterwards.

And actually, the same applies on BE, but the other way around. So we
should mark __xl as an output register as well, as __xl will assume the
right value depending on the endianness. I suggest something like the
below:

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 898e9c78a7e7..85ff9109595e 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -36,12 +36,14 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
 	register unsigned long long __n asm("r0") = *n;
 	register unsigned long long __res asm("r2");
 	register unsigned int __rem asm(__xh);
+	register unsigned int __dummy asm(__xl);
 	asm(	__asmeq("%0", __xh)
 		__asmeq("%1", "r2")
-		__asmeq("%2", "r0")
-		__asmeq("%3", "r4")
+		__asmeq("%2", __xl)
+		__asmeq("%3", "r0")
+		__asmeq("%4", "r4")
 		"bl	__do_div64"
-		: "=r" (__rem), "=r" (__res)
+		: "=r" (__rem), "=r" (__res), "=r" (__dummy)
 		: "r" (__n), "r" (__base)
 		: "ip", "lr", "cc");
 	*n = __res;

> If the compiler can't work out that a physical register used as an
> output operand will be written by the assembly, then the compiler is
> quite simply buggy.
>
> The code is correct as it stands; Clang is buggy.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> And actually, the same applies on BE, but the other way around. So we
> should mark __xl as an output register as well, as __xl will assume
> the right value depending on the endianness.

Why not use "+r" to indicate that an 'output' parameter is also used as
an input?

Rather cleaner than specifying the same C variable as both input and
output.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Mon, Nov 30, 2020 at 01:58:27PM +0000, David Laight wrote:
> > And actually, the same applies on BE, but the other way around. So we
> > should mark __xl as an output register as well, as __xl will assume
> > the right value depending on the endianness.
>
> Why not use "+r" to indicate that an 'output' parameter is also
> used as an input?
>
> Rather cleaner than specifying the same C variable as both
> input and output.

You have an incorrect understanding. "__n" is the input operand in r0.
"__rem" is the output operand in r0/r1. No single C variable is used as
both an input and an output.
On Mon, 30 Nov 2020, Ard Biesheuvel wrote:

> (+ Nico)
>
> On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> > >
> > > __do_div64 clobbers the input register r0 on little-endian systems.
> > > According to the inline assembly documentation, if an input operand is
> > > modified, it should be tied to an output operand. This patch prevents
> > > compilers from reusing the r0 register after asm statements.
> > >
> > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > ---
> > >  arch/arm/include/asm/div64.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7e7..809efc51e90f 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >  	asm(	__asmeq("%0", __xh)
> > >  		__asmeq("%1", "r2")
> > >  		__asmeq("%2", "r0")
> > > -		__asmeq("%3", "r4")
> > > +		__asmeq("%3", "r0")
> > > +		__asmeq("%4", "r4")
> > >  		"bl	__do_div64"
> > > -		: "=r" (__rem), "=r" (__res)
> > > +		: "=r" (__rem), "=r" (__res), "=r" (__n)
> > >  		: "r" (__n), "r" (__base)
> > >  		: "ip", "lr", "cc");
> > >  	*n = __res;
> > > --
> > > 2.23.0
> > >
> >
> > Agree that using r0 as an input operand only is incorrect, and not
> > only on Clang. The compiler might assume that r0 will retain its value
> > across the asm() block, which is obviously not the case.

You're right.

This was most likely done like that to work around some stupid code
generation with "__n >> 32" when using gcc from about 20 years ago. IOW,
I don't exactly remember why I did it like that, but it is certainly
flawed.

Here's my version of the fix, which should be correct. Warning: this is
completely untested, but it should in theory produce the same code on
modern gcc.

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 898e9c78a7..595e538f5b 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -21,29 +21,20 @@
  * assembly implementation with completely non standard calling convention
  * for arguments and results (beware).
  */
-
-#ifdef __ARMEB__
-#define __xh "r0"
-#define __xl "r1"
-#else
-#define __xl "r0"
-#define __xh "r1"
-#endif
-
 static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
 {
 	register unsigned int __base asm("r4") = base;
 	register unsigned long long __n asm("r0") = *n;
 	register unsigned long long __res asm("r2");
-	register unsigned int __rem asm(__xh);
-	asm(	__asmeq("%0", __xh)
+	unsigned int __rem;
+	asm(	__asmeq("%0", "r0")
 		__asmeq("%1", "r2")
-		__asmeq("%2", "r0")
-		__asmeq("%3", "r4")
+		__asmeq("%2", "r4")
 		"bl	__do_div64"
-		: "=r" (__rem), "=r" (__res)
-		: "r" (__n), "r" (__base)
+		: "+r" (__n), "=r" (__res)
+		: "r" (__base)
 		: "ip", "lr", "cc");
+	__rem = __n >> 32;
 	*n = __res;
 	return __rem;
 }

I'll submit it if someone confirms it works.


Nicolas
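For readers following along, the contract being preserved here can be written in portable C. This is a behavioral sketch only; the kernel gets the same result from the out-of-line __do_div64 assembly routine rather than from the C division operators:

```c
#include <stdint.h>

/* Behavioral sketch of __div64_32(): divide *n in place by base and
 * return the 32-bit remainder. The kernel version performs the actual
 * division in assembly; this only documents the intended semantics. */
static uint32_t div64_32_ref(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}
```

Whatever register choreography the inline asm uses, any correct fix must leave the quotient in *n and return this remainder.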
On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Mon, 30 Nov 2020, Ard Biesheuvel wrote:
>
> > (+ Nico)
> >
> > On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> > > >
> > > > __do_div64 clobbers the input register r0 on little-endian systems.
> > > > According to the inline assembly documentation, if an input operand is
> > > > modified, it should be tied to an output operand. This patch prevents
> > > > compilers from reusing the r0 register after asm statements.
> > > >
> > > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > > ---
> > > >  arch/arm/include/asm/div64.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > > index 898e9c78a7e7..809efc51e90f 100644
> > > > --- a/arch/arm/include/asm/div64.h
> > > > +++ b/arch/arm/include/asm/div64.h
> > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > > >  	asm(	__asmeq("%0", __xh)
> > > >  		__asmeq("%1", "r2")
> > > >  		__asmeq("%2", "r0")
> > > > -		__asmeq("%3", "r4")
> > > > +		__asmeq("%3", "r0")
> > > > +		__asmeq("%4", "r4")
> > > >  		"bl	__do_div64"
> > > > -		: "=r" (__rem), "=r" (__res)
> > > > +		: "=r" (__rem), "=r" (__res), "=r" (__n)
> > > >  		: "r" (__n), "r" (__base)
> > > >  		: "ip", "lr", "cc");
> > > >  	*n = __res;
> > > > --
> > > > 2.23.0
> > > >
> > >
> > > Agree that using r0 as an input operand only is incorrect, and not
> > > only on Clang. The compiler might assume that r0 will retain its value
> > > across the asm() block, which is obviously not the case.
>
> You're right.
>
> This was most likely done like that to work around some stupid code
> generation with "__n >> 32" when using gcc from about 20 years ago. IOW,
> I don't exactly remember why I did it like that, but it is certainly
> flawed.
>
> Here's my version of the fix, which should be correct. Warning: this is
> completely untested, but it should in theory produce the same code on
> modern gcc.
>
> diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> index 898e9c78a7..595e538f5b 100644
> --- a/arch/arm/include/asm/div64.h
> +++ b/arch/arm/include/asm/div64.h
> @@ -21,29 +21,20 @@
>   * assembly implementation with completely non standard calling convention
>   * for arguments and results (beware).
>   */
> -
> -#ifdef __ARMEB__
> -#define __xh "r0"
> -#define __xl "r1"
> -#else
> -#define __xl "r0"
> -#define __xh "r1"
> -#endif
> -
>  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
>  {
>  	register unsigned int __base asm("r4") = base;
>  	register unsigned long long __n asm("r0") = *n;
>  	register unsigned long long __res asm("r2");
> -	register unsigned int __rem asm(__xh);
> -	asm(	__asmeq("%0", __xh)
> +	unsigned int __rem;
> +	asm(	__asmeq("%0", "r0")
>  		__asmeq("%1", "r2")
> -		__asmeq("%2", "r0")
> -		__asmeq("%3", "r4")
> +		__asmeq("%2", "r4")
>  		"bl	__do_div64"
> -		: "=r" (__rem), "=r" (__res)
> -		: "r" (__n), "r" (__base)
> +		: "+r" (__n), "=r" (__res)
> +		: "r" (__base)
>  		: "ip", "lr", "cc");
> +	__rem = __n >> 32;

This treats {r0, r1} as a {low, high} pair, regardless of endianness,
and so it puts the value of r0 into r1. Doesn't that mean the shift
should only be done on little endian?

> 	*n = __res;
> 	return __rem;
> }
>
> I'll submit it if someone confirms it works.
>
>
> Nicolas
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Mon, 30 Nov 2020, Ard Biesheuvel wrote:

> On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> > Here's my version of the fix, which should be correct. Warning: this is
> > completely untested, but it should in theory produce the same code on
> > modern gcc.
> >
> > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > index 898e9c78a7..595e538f5b 100644
> > --- a/arch/arm/include/asm/div64.h
> > +++ b/arch/arm/include/asm/div64.h
> > @@ -21,29 +21,20 @@
> >   * assembly implementation with completely non standard calling convention
> >   * for arguments and results (beware).
> >   */
> > -
> > -#ifdef __ARMEB__
> > -#define __xh "r0"
> > -#define __xl "r1"
> > -#else
> > -#define __xl "r0"
> > -#define __xh "r1"
> > -#endif
> > -
> >  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> >  {
> >  	register unsigned int __base asm("r4") = base;
> >  	register unsigned long long __n asm("r0") = *n;
> >  	register unsigned long long __res asm("r2");
> > -	register unsigned int __rem asm(__xh);
> > -	asm(	__asmeq("%0", __xh)
> > +	unsigned int __rem;
> > +	asm(	__asmeq("%0", "r0")
> >  		__asmeq("%1", "r2")
> > -		__asmeq("%2", "r0")
> > -		__asmeq("%3", "r4")
> > +		__asmeq("%2", "r4")
> >  		"bl	__do_div64"
> > -		: "=r" (__rem), "=r" (__res)
> > -		: "r" (__n), "r" (__base)
> > +		: "+r" (__n), "=r" (__res)
> > +		: "r" (__base)
> >  		: "ip", "lr", "cc");
> > +	__rem = __n >> 32;
>
> This treats {r0, r1} as a {low, high} pair, regardless of endianness,
> and so it puts the value of r0 into r1. Doesn't that mean the shift
> should only be done on little endian?

Not quite. r0-r1 = low-high is for little endian. There, "__n >> 32" is
actually translated into "mov r0, r1" to move the high word into __rem,
which is returned through r0.

On big endian it is r0-r1 = high-low. Here "__n >> 32" picks r0 and
moves it to __rem, which is returned through r0, so no extra instruction
is needed.

Of course the function is inlined, so r0 can be anything, or the move
can be optimized away if __rem is not used.


Nicolas
On Mon, 30 Nov 2020 at 18:52, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Mon, 30 Nov 2020, Ard Biesheuvel wrote:
>
> > On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre <nico@fluxnic.net> wrote:
> >
> > > Here's my version of the fix, which should be correct. Warning: this is
> > > completely untested, but it should in theory produce the same code on
> > > modern gcc.
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7..595e538f5b 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -21,29 +21,20 @@
> > >   * assembly implementation with completely non standard calling convention
> > >   * for arguments and results (beware).
> > >   */
> > > -
> > > -#ifdef __ARMEB__
> > > -#define __xh "r0"
> > > -#define __xl "r1"
> > > -#else
> > > -#define __xl "r0"
> > > -#define __xh "r1"
> > > -#endif
> > > -
> > >  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >  {
> > >  	register unsigned int __base asm("r4") = base;
> > >  	register unsigned long long __n asm("r0") = *n;
> > >  	register unsigned long long __res asm("r2");
> > > -	register unsigned int __rem asm(__xh);
> > > -	asm(	__asmeq("%0", __xh)
> > > +	unsigned int __rem;
> > > +	asm(	__asmeq("%0", "r0")
> > >  		__asmeq("%1", "r2")
> > > -		__asmeq("%2", "r0")
> > > -		__asmeq("%3", "r4")
> > > +		__asmeq("%2", "r4")
> > >  		"bl	__do_div64"
> > > -		: "=r" (__rem), "=r" (__res)
> > > -		: "r" (__n), "r" (__base)
> > > +		: "+r" (__n), "=r" (__res)
> > > +		: "r" (__base)
> > >  		: "ip", "lr", "cc");
> > > +	__rem = __n >> 32;
> >
> > This treats {r0, r1} as a {low, high} pair, regardless of endianness,
> > and so it puts the value of r0 into r1. Doesn't that mean the shift
> > should only be done on little endian?
>
> Not quite. r0-r1 = low-high is for little endian. There, "__n >> 32" is
> actually translated into "mov r0, r1" to move the high word into __rem,
> which is returned through r0.
>
> On big endian it is r0-r1 = high-low. Here "__n >> 32" picks r0 and
> moves it to __rem, which is returned through r0, so no extra instruction
> is needed.
>
> Of course the function is inlined, so r0 can be anything, or the move
> can be optimized away if __rem is not used.

OK, you're right. I got myself confused there, but a quick test with GCC
confirms your explanation:

  $ arm-linux-gnueabihf-gcc -mbig-endian -O2 -S -o - \
        -xc - <<<"long f(long long l) { return l >> 32; }"

just produces

	bx	lr

whereas removing the -mbig-endian gives

	mov	r0, r1
	bx	lr

I tested the change and it builds and runs fine (although I am not sure
how much coverage this code gets on an ordinary boot):

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 898e9c78a7e7..809efc51e90f 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
 	asm(	__asmeq("%0", __xh)
 		__asmeq("%1", "r2")
 		__asmeq("%2", "r0")
-		__asmeq("%3", "r4")
+		__asmeq("%3", "r0")
+		__asmeq("%4", "r4")
 		"bl	__do_div64"
-		: "=r" (__rem), "=r" (__res)
+		: "=r" (__rem), "=r" (__res), "=r" (__n)
 		: "r" (__n), "r" (__base)
 		: "ip", "lr", "cc");
 	*n = __res;
__do_div64 clobbers the input register r0 on little-endian systems.
According to the inline assembly documentation, if an input operand is
modified, it should be tied to an output operand. This patch prevents
compilers from reusing the r0 register after asm statements.

Signed-off-by: Antony Yu <swpenim@gmail.com>
---
 arch/arm/include/asm/div64.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)