diff mbox series

[RESEND] ARM: fix __div64_32() error when compiling with clang

Message ID 20201123073634.6854-1-swpenim@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RESEND] ARM: fix __div64_32() error when compiling with clang | expand

Commit Message

Antony Yu Nov. 23, 2020, 7:36 a.m. UTC
__do_div64 clobbers the input register r0 in little endian system.
According to the inline assembly document, if an input operand is
modified, it should be tied to a output operand. This patch can
prevent compilers from reusing r0 register after asm statements.

Signed-off-by: Antony Yu <swpenim@gmail.com>
---
 arch/arm/include/asm/div64.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Nathan Chancellor Nov. 23, 2020, 6:16 p.m. UTC | #1
On Mon, Nov 23, 2020 at 03:36:32PM +0800, Antony Yu wrote:
> __do_div64 clobbers the input register r0 in little endian system.
> According to the inline assembly document, if an input operand is
> modified, it should be tied to a output operand. This patch can
> prevent compilers from reusing r0 register after asm statements.
> 
> Signed-off-by: Antony Yu <swpenim@gmail.com>
> ---
>  arch/arm/include/asm/div64.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> index 898e9c78a7e7..809efc51e90f 100644
> --- a/arch/arm/include/asm/div64.h
> +++ b/arch/arm/include/asm/div64.h
> @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
>  	asm(	__asmeq("%0", __xh)
>  		__asmeq("%1", "r2")
>  		__asmeq("%2", "r0")
> -		__asmeq("%3", "r4")
> +		__asmeq("%3", "r0")
> +		__asmeq("%4", "r4")
>  		"bl	__do_div64"
> -		: "=r" (__rem), "=r" (__res)
> +		: "=r" (__rem), "=r" (__res), "=r" (__n)
>  		: "r" (__n), "r" (__base)
>  		: "ip", "lr", "cc");
>  	*n = __res;
> -- 
> 2.23.0
> 

I am not sure that I am qualified to review this (my assembly knowledge
is not the best) but your commit title mentions an error when compiling
with clang. What is the exact error, what configuration generates it,
and what version of clang? We have done fairly decent testing for
32-bit ARM, I would like to know what we are missing.

Cheers,
Nathan
Antony Yu Nov. 24, 2020, 10:14 a.m. UTC | #2
Antony Yu <swpenim@gmail.com> 於 2020年11月24日 週二 下午3:43寫道:
>
> On Mon, Nov 23, 2020 at 11:16:02AM -0700, Nathan Chancellor wrote:
> > On Mon, Nov 23, 2020 at 03:36:32PM +0800, Antony Yu wrote:
> > > __do_div64 clobbers the input register r0 in little endian system.
> > > According to the inline assembly document, if an input operand is
> > > modified, it should be tied to a output operand. This patch can
> > > prevent compilers from reusing r0 register after asm statements.
> > >
> > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > ---
> > >  arch/arm/include/asm/div64.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7e7..809efc51e90f 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >     asm(    __asmeq("%0", __xh)
> > >             __asmeq("%1", "r2")
> > >             __asmeq("%2", "r0")
> > > -           __asmeq("%3", "r4")
> > > +           __asmeq("%3", "r0")
> > > +           __asmeq("%4", "r4")
> > >             "bl     __do_div64"
> > > -           : "=r" (__rem), "=r" (__res)
> > > +           : "=r" (__rem), "=r" (__res), "=r" (__n)
> > >             : "r" (__n), "r" (__base)
> > >             : "ip", "lr", "cc");
> > >     *n = __res;
> > > --
> > > 2.23.0
> > >
> >
> > I am not sure that I am qualified to review this (my assembly knowledge
> > is not the best) but your commit title mentions an error when compiling
> > with clang. What is the exact error, what configuration generates it,
> > and what version of clang? We have done fairly decent testing for
> > 32-bit ARM, I would like to know what we are missing.
> >
> > Cheers,
> > Nathan
>
> We have run fail on android R vts vts_libsnapshot_test with kernel 4.14.
> This bug is triggered accidently by a workaround patch in our code base.
> It is fine on a pure clean 4.14 branch since __do_div64 may not be
> executed in skip_metadata.
>
> The attachment are .i and generated .s file. .s file can be reproduced
> with clang -target arm-linux-eabi -march=armv8.2-a -O2.
>
> In function skip_metadata, it loads some value to r0, calls __do_div64,
> adds 1 to r0 and stores it to [r5]. It gets wrong value since __do_div64
> clobbers r0 register.
>
> We have tried clang-10, clang-11 and android prebuilt clang-r383902b. All
> of them have the same problem.

Sorry for the large attachment.
I put .i and .s files on
https://gist.github.com/penyung/274b0c697062a1c776994bb40243cfff

Antony Yu
Nick Desaulniers Nov. 24, 2020, 9:06 p.m. UTC | #3
Thanks for the report, it probably was not fun to debug. I'll take a
closer look at this after the Thanksgiving holiday.

On Tue, Nov 24, 2020 at 2:14 AM Antony Yu <swpenim@gmail.com> wrote:
>
> Antony Yu <swpenim@gmail.com> 於 2020年11月24日 週二 下午3:43寫道:
> >
> > On Mon, Nov 23, 2020 at 11:16:02AM -0700, Nathan Chancellor wrote:
> > > On Mon, Nov 23, 2020 at 03:36:32PM +0800, Antony Yu wrote:
> > > > __do_div64 clobbers the input register r0 in little endian system.
> > > > According to the inline assembly document, if an input operand is
> > > > modified, it should be tied to a output operand. This patch can
> > > > prevent compilers from reusing r0 register after asm statements.
> > > >
> > > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > > ---
> > > >  arch/arm/include/asm/div64.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > > index 898e9c78a7e7..809efc51e90f 100644
> > > > --- a/arch/arm/include/asm/div64.h
> > > > +++ b/arch/arm/include/asm/div64.h
> > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > > >     asm(    __asmeq("%0", __xh)
> > > >             __asmeq("%1", "r2")
> > > >             __asmeq("%2", "r0")
> > > > -           __asmeq("%3", "r4")
> > > > +           __asmeq("%3", "r0")
> > > > +           __asmeq("%4", "r4")
> > > >             "bl     __do_div64"
> > > > -           : "=r" (__rem), "=r" (__res)
> > > > +           : "=r" (__rem), "=r" (__res), "=r" (__n)
> > > >             : "r" (__n), "r" (__base)
> > > >             : "ip", "lr", "cc");
> > > >     *n = __res;
> > > > --
> > > > 2.23.0
> > > >
> > >
> > > I am not sure that I am qualified to review this (my assembly knowledge
> > > is not the best) but your commit title mentions an error when compiling
> > > with clang. What is the exact error, what configuration generates it,
> > > and what version of clang? We have done fairly decent testing for
> > > 32-bit ARM, I would like to know what we are missing.
> > >
> > > Cheers,
> > > Nathan
> >
> > We have run fail on android R vts vts_libsnapshot_test with kernel 4.14.
> > This bug is triggered accidently by a workaround patch in our code base.
> > It is fine on a pure clean 4.14 branch since __do_div64 may not be
> > executed in skip_metadata.
> >
> > The attachment are .i and generated .s file. .s file can be reproduced
> > with clang -target arm-linux-eabi -march=armv8.2-a -O2.
> >
> > In function skip_metadata, it loads some value to r0, calls __do_div64,
> > adds 1 to r0 and stores it to [r5]. It gets wrong value since __do_div64
> > clobbers r0 register.
> >
> > We have tried clang-10, clang-11 and android prebuilt clang-r383902b. All
> > of them have the same problem.
>
> Sorry for the large attachment.
> I put .i and .s files on
> https://gist.github.com/penyung/274b0c697062a1c776994bb40243cfff
>
> Antony Yu
kernel test robot Nov. 24, 2020, 11:16 p.m. UTC | #4
Hi Antony,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.10-rc5 next-20201124]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Antony-Yu/ARM-fix-__div64_32-error-when-compiling-with-clang/20201123-154454
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 418baf2c28f3473039f2f7377760bd8f6897ae18
config: arm-randconfig-r036-20201124 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project df9ae5992889560a8f3c6760b54d5051b47c7bf5)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/83df1f983ed4219556020bf30cc819e66bb7e69c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Antony-Yu/ARM-fix-__div64_32-error-when-compiling-with-clang/20201123-154454
        git checkout 83df1f983ed4219556020bf30cc819e66bb7e69c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/tty/serial/imx.c:359:19: warning: unused function 'imx_uart_is_imx21' [-Wunused-function]
   static inline int imx_uart_is_imx21(struct imx_port *sport)
                     ^
   drivers/tty/serial/imx.c:364:19: warning: unused function 'imx_uart_is_imx53' [-Wunused-function]
   static inline int imx_uart_is_imx53(struct imx_port *sport)
                     ^
   drivers/tty/serial/imx.c:369:19: warning: unused function 'imx_uart_is_imx6q' [-Wunused-function]
   static inline int imx_uart_is_imx6q(struct imx_port *sport)
                     ^
   In file included from drivers/tty/serial/imx.c:11:
   In file included from include/linux/module.h:12:
   In file included from include/linux/list.h:9:
   In file included from include/linux/kernel.h:19:
>> arch/arm/include/asm/div64.h:39:2: error: multiple outputs to hard register: r0
           asm(    __asmeq("%0", __xh)
           ^
   3 warnings and 1 error generated.
--
   In file included from drivers/gpu/drm/tegra/dc.c:7:
   In file included from include/linux/clk.h:13:
   In file included from include/linux/kernel.h:19:
>> arch/arm/include/asm/div64.h:39:2: error: multiple outputs to hard register: r0
           asm(    __asmeq("%0", __xh)
           ^
   1 error generated.

vim +39 arch/arm/include/asm/div64.h

^1da177e4c3f415 include/asm-arm/div64.h      Linus Torvalds 2005-04-16  32  
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  33  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  34  {
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  35  	register unsigned int __base      asm("r4") = base;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  36  	register unsigned long long __n   asm("r0") = *n;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  37  	register unsigned long long __res asm("r2");
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  38  	register unsigned int __rem       asm(__xh);
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02 @39  	asm(	__asmeq("%0", __xh)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  40  		__asmeq("%1", "r2")
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  41  		__asmeq("%2", "r0")
83df1f983ed4219 arch/arm/include/asm/div64.h Antony Yu      2020-11-23  42  		__asmeq("%3", "r0")
83df1f983ed4219 arch/arm/include/asm/div64.h Antony Yu      2020-11-23  43  		__asmeq("%4", "r4")
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  44  		"bl	__do_div64"
83df1f983ed4219 arch/arm/include/asm/div64.h Antony Yu      2020-11-23  45  		: "=r" (__rem), "=r" (__res), "=r" (__n)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  46  		: "r" (__n), "r" (__base)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  47  		: "ip", "lr", "cc");
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  48  	*n = __res;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  49  	return __rem;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  50  }
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  51  #define __div64_32 __div64_32
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  52  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Ard Biesheuvel Nov. 30, 2020, 10:11 a.m. UTC | #5
On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
>
> __do_div64 clobbers the input register r0 in little endian system.
> According to the inline assembly document, if an input operand is
> modified, it should be tied to a output operand. This patch can
> prevent compilers from reusing r0 register after asm statements.
>
> Signed-off-by: Antony Yu <swpenim@gmail.com>
> ---
>  arch/arm/include/asm/div64.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> index 898e9c78a7e7..809efc51e90f 100644
> --- a/arch/arm/include/asm/div64.h
> +++ b/arch/arm/include/asm/div64.h
> @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
>         asm(    __asmeq("%0", __xh)
>                 __asmeq("%1", "r2")
>                 __asmeq("%2", "r0")
> -               __asmeq("%3", "r4")
> +               __asmeq("%3", "r0")
> +               __asmeq("%4", "r4")
>                 "bl     __do_div64"
> -               : "=r" (__rem), "=r" (__res)
> +               : "=r" (__rem), "=r" (__res), "=r" (__n)
>                 : "r" (__n), "r" (__base)
>                 : "ip", "lr", "cc");
>         *n = __res;
> --
> 2.23.0
>

Agree that using r0 as an input operand only is incorrect, and not
only on Clang. The compiler might assume that r0 will retain its value
across the asm() block, which is obviously not the case.

However, your patch will likely break big-endian, since in that case,
__xh == r0, and so it will appear twice.

Perhaps it would be better to change the type of __rem to unsigned
long long as well?
Ard Biesheuvel Nov. 30, 2020, 10:12 a.m. UTC | #6
(+ Nico)

On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> >
> > __do_div64 clobbers the input register r0 in little endian system.
> > According to the inline assembly document, if an input operand is
> > modified, it should be tied to a output operand. This patch can
> > prevent compilers from reusing r0 register after asm statements.
> >
> > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > ---
> >  arch/arm/include/asm/div64.h | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > index 898e9c78a7e7..809efc51e90f 100644
> > --- a/arch/arm/include/asm/div64.h
> > +++ b/arch/arm/include/asm/div64.h
> > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> >         asm(    __asmeq("%0", __xh)
> >                 __asmeq("%1", "r2")
> >                 __asmeq("%2", "r0")
> > -               __asmeq("%3", "r4")
> > +               __asmeq("%3", "r0")
> > +               __asmeq("%4", "r4")
> >                 "bl     __do_div64"
> > -               : "=r" (__rem), "=r" (__res)
> > +               : "=r" (__rem), "=r" (__res), "=r" (__n)
> >                 : "r" (__n), "r" (__base)
> >                 : "ip", "lr", "cc");
> >         *n = __res;
> > --
> > 2.23.0
> >
>
> Agree that using r0 as an input operand only is incorrect, and not
> only on Clang. The compiler might assume that r0 will retain its value
> across the asm() block, which is obviously not the case.
>
> However, your patch will likely break big-endian, since in that case,
> __xh == r0, and so it will appear twice.
>
> Perhaps it would be better to change the type of __rem to unsigned
> long long as well?
Russell King (Oracle) Nov. 30, 2020, 10:21 a.m. UTC | #7
On Mon, Nov 30, 2020 at 11:12:33AM +0100, Ard Biesheuvel wrote:
> (+ Nico)
> 
> On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> > >
> > > __do_div64 clobbers the input register r0 in little endian system.
> > > According to the inline assembly document, if an input operand is
> > > modified, it should be tied to a output operand. This patch can
> > > prevent compilers from reusing r0 register after asm statements.
> > >
> > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > ---
> > >  arch/arm/include/asm/div64.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7e7..809efc51e90f 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >         asm(    __asmeq("%0", __xh)
> > >                 __asmeq("%1", "r2")
> > >                 __asmeq("%2", "r0")
> > > -               __asmeq("%3", "r4")
> > > +               __asmeq("%3", "r0")
> > > +               __asmeq("%4", "r4")
> > >                 "bl     __do_div64"
> > > -               : "=r" (__rem), "=r" (__res)
> > > +               : "=r" (__rem), "=r" (__res), "=r" (__n)
> > >                 : "r" (__n), "r" (__base)
> > >                 : "ip", "lr", "cc");
> > >         *n = __res;
> > > --
> > > 2.23.0
> > >
> >
> > Agree that using r0 as an input operand only is incorrect, and not
> > only on Clang. The compiler might assume that r0 will retain its value
> > across the asm() block, which is obviously not the case.

However, you can _not_ have an asm block that names two outputs using
the same physical register - that's why both the original patch and
the posted v2 will fail.

You also can't mark r0 as clobbered because it's used as an operand
and that is not allowed by gcc.

The fact is, we have two register variables occupying the same register,
which are __n and __rem. It doesn't matter which endian-ness __rem is,
r0 will be used for both __n (input) and __rem (output).

If the compiler can't work out that if a physical register used as an
output operand will be written by the assembler, then the compiler is
quite simply buggy.

The code is correct as it stands; Clang is buggy.
Ard Biesheuvel Nov. 30, 2020, 10:40 a.m. UTC | #8
On Mon, 30 Nov 2020 at 11:21, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Mon, Nov 30, 2020 at 11:12:33AM +0100, Ard Biesheuvel wrote:
> > (+ Nico)
> >
> > On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> > > >
> > > > __do_div64 clobbers the input register r0 in little endian system.
> > > > According to the inline assembly document, if an input operand is
> > > > modified, it should be tied to a output operand. This patch can
> > > > prevent compilers from reusing r0 register after asm statements.
> > > >
> > > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > > ---
> > > >  arch/arm/include/asm/div64.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > > index 898e9c78a7e7..809efc51e90f 100644
> > > > --- a/arch/arm/include/asm/div64.h
> > > > +++ b/arch/arm/include/asm/div64.h
> > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > > >         asm(    __asmeq("%0", __xh)
> > > >                 __asmeq("%1", "r2")
> > > >                 __asmeq("%2", "r0")
> > > > -               __asmeq("%3", "r4")
> > > > +               __asmeq("%3", "r0")
> > > > +               __asmeq("%4", "r4")
> > > >                 "bl     __do_div64"
> > > > -               : "=r" (__rem), "=r" (__res)
> > > > +               : "=r" (__rem), "=r" (__res), "=r" (__n)
> > > >                 : "r" (__n), "r" (__base)
> > > >                 : "ip", "lr", "cc");
> > > >         *n = __res;
> > > > --
> > > > 2.23.0
> > > >
> > >
> > > Agree that using r0 as an input operand only is incorrect, and not
> > > only on Clang. The compiler might assume that r0 will retain its value
> > > across the asm() block, which is obviously not the case.
>
> However, you can _not_ have an asm block that names two outputs using
> the same physical register - that's why both the original patch and
> the posted v2 will fail.
>
> You also can't mark r0 as clobbered because it's used as an operand
> and that is not allowed by gcc.
>
> The fact is, we have two register variables occupying the same register,
> which are __n and __rem. It doesn't matter which endian-ness __rem is,
> r0 will be used for both __n (input) and __rem (output).
>

__rem is a 32-bit variable, so in LE mode, only r1 is used for __rem,
not r0. So r0/r1 are used as an input operand pair, and r1 is used as
an output operand.

So I don't think the compiler has to be buggy in order for it to
assume that r0 will still contain the low word of the dividend
afterwards.

And actually, the same applies on BE, but the other way around. So we
should mark __xl as an output register as well, as __xl will assume
the right value depending on the endianness.

I suggest something like the below,

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 898e9c78a7e7..85ff9109595e 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -36,12 +36,14 @@ static inline uint32_t __div64_32(uint64_t *n,
uint32_t base)
        register unsigned long long __n   asm("r0") = *n;
        register unsigned long long __res asm("r2");
        register unsigned int __rem       asm(__xh);
+       register unsigned int __dummy     asm(__xl);
        asm(    __asmeq("%0", __xh)
                __asmeq("%1", "r2")
-               __asmeq("%2", "r0")
-               __asmeq("%3", "r4")
+               __asmeq("%2", __xl)
+               __asmeq("%3", "r0")
+               __asmeq("%4", "r4")
                "bl     __do_div64"
-               : "=r" (__rem), "=r" (__res)
+               : "=r" (__rem), "=r" (__res), "=r"(__dummy)
                : "r" (__n), "r" (__base)
                : "ip", "lr", "cc");
        *n = __res;



> If the compiler can't work out that if a physical register used as an
> output operand will be written by the assembler, then the compiler is
> quite simply buggy.
>
> The code is correct as it stands; Clang is buggy.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
David Laight Nov. 30, 2020, 1:58 p.m. UTC | #9
> And actually, the same applies on BE, but the other way around. So we
> should mark __xl as an output register as well, as __xl will assume
> the right value depending on the endianness.

Why not use "+r" to indicate than an 'output' parameter is also
used as an input.

Rather cleaner than specifying the same C variable as both
input and output.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Russell King (Oracle) Nov. 30, 2020, 2:18 p.m. UTC | #10
On Mon, Nov 30, 2020 at 01:58:27PM +0000, David Laight wrote:
> > And actually, the same applies on BE, but the other way around. So we
> > should mark __xl as an output register as well, as __xl will assume
> > the right value depending on the endianness.
> 
> Why not use "+r" to indicate than an 'output' parameter is also
> used as an input.
> 
> Rather cleaner than specifying the same C variable as both
> input and output.

You have an incorrect understanding. "__n" is the input operand in r0.
"__rem" is the output operand in r0/r1.

No single C variable is used as both an input and an output.
Nicolas Pitre Nov. 30, 2020, 3:50 p.m. UTC | #11
On Mon, 30 Nov 2020, Ard Biesheuvel wrote:

> (+ Nico)
> 
> On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> > >
> > > __do_div64 clobbers the input register r0 in little endian system.
> > > According to the inline assembly document, if an input operand is
> > > modified, it should be tied to a output operand. This patch can
> > > prevent compilers from reusing r0 register after asm statements.
> > >
> > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > ---
> > >  arch/arm/include/asm/div64.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7e7..809efc51e90f 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >         asm(    __asmeq("%0", __xh)
> > >                 __asmeq("%1", "r2")
> > >                 __asmeq("%2", "r0")
> > > -               __asmeq("%3", "r4")
> > > +               __asmeq("%3", "r0")
> > > +               __asmeq("%4", "r4")
> > >                 "bl     __do_div64"
> > > -               : "=r" (__rem), "=r" (__res)
> > > +               : "=r" (__rem), "=r" (__res), "=r" (__n)
> > >                 : "r" (__n), "r" (__base)
> > >                 : "ip", "lr", "cc");
> > >         *n = __res;
> > > --
> > > 2.23.0
> > >
> >
> > Agree that using r0 as an input operand only is incorrect, and not
> > only on Clang. The compiler might assume that r0 will retain its value
> > across the asm() block, which is obviously not the case.

You're right.

This was done like that most likely to work around some stupid code 
generation with "__n >> 32" while using gcc from about 20 years ago. IOW 
I don't exactly remember why I did it like that, but it is certainly 
flawed.

Here's my version of the fix which should be correct. Warning: this 
is completely untested, but should in theory produce the same code on 
modern gcc.

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 898e9c78a7..595e538f5b 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -21,29 +21,20 @@
  * assembly implementation with completely non standard calling convention
  * for arguments and results (beware).
  */
-
-#ifdef __ARMEB__
-#define __xh "r0"
-#define __xl "r1"
-#else
-#define __xl "r0"
-#define __xh "r1"
-#endif
-
 static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
 {
 	register unsigned int __base      asm("r4") = base;
 	register unsigned long long __n   asm("r0") = *n;
 	register unsigned long long __res asm("r2");
-	register unsigned int __rem       asm(__xh);
-	asm(	__asmeq("%0", __xh)
+	unsigned int __rem;
+	asm(	__asmeq("%0", "r0")
 		__asmeq("%1", "r2")
-		__asmeq("%2", "r0")
-		__asmeq("%3", "r4")
+		__asmeq("%2", "r4")
 		"bl	__do_div64"
-		: "=r" (__rem), "=r" (__res)
-		: "r" (__n), "r" (__base)
+		: "+r" (__n), "=r" (__res)
+		: "r" (__base)
 		: "ip", "lr", "cc");
+	__rem = __n >> 32;
 	*n = __res;
 	return __rem;
 }

I'll submit it if someone confirms it works.


Nicolas
Ard Biesheuvel Nov. 30, 2020, 5:18 p.m. UTC | #12
On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Mon, 30 Nov 2020, Ard Biesheuvel wrote:
>
> > (+ Nico)
> >
> > On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpenim@gmail.com> wrote:
> > > >
> > > > __do_div64 clobbers the input register r0 in little endian system.
> > > > According to the inline assembly document, if an input operand is
> > > > modified, it should be tied to a output operand. This patch can
> > > > prevent compilers from reusing r0 register after asm statements.
> > > >
> > > > Signed-off-by: Antony Yu <swpenim@gmail.com>
> > > > ---
> > > >  arch/arm/include/asm/div64.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > > index 898e9c78a7e7..809efc51e90f 100644
> > > > --- a/arch/arm/include/asm/div64.h
> > > > +++ b/arch/arm/include/asm/div64.h
> > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > > >         asm(    __asmeq("%0", __xh)
> > > >                 __asmeq("%1", "r2")
> > > >                 __asmeq("%2", "r0")
> > > > -               __asmeq("%3", "r4")
> > > > +               __asmeq("%3", "r0")
> > > > +               __asmeq("%4", "r4")
> > > >                 "bl     __do_div64"
> > > > -               : "=r" (__rem), "=r" (__res)
> > > > +               : "=r" (__rem), "=r" (__res), "=r" (__n)
> > > >                 : "r" (__n), "r" (__base)
> > > >                 : "ip", "lr", "cc");
> > > >         *n = __res;
> > > > --
> > > > 2.23.0
> > > >
> > >
> > > Agree that using r0 as an input operand only is incorrect, and not
> > > only on Clang. The compiler might assume that r0 will retain its value
> > > across the asm() block, which is obviously not the case.
>
> You're right.
>
> This was done like that most likely to work around some stupid code
> generation with "__n >> 32" while using gcc from about 20 years ago. IOW
> I don't exactly remember why I did it like that, but it is certainly
> flawed.
>
> Here's my version of the fix which should be correct. Warning: this
> is completely untested, but should in theory produce the same code on
> modern gcc.
>
> diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> index 898e9c78a7..595e538f5b 100644
> --- a/arch/arm/include/asm/div64.h
> +++ b/arch/arm/include/asm/div64.h
> @@ -21,29 +21,20 @@
>   * assembly implementation with completely non standard calling convention
>   * for arguments and results (beware).
>   */
> -
> -#ifdef __ARMEB__
> -#define __xh "r0"
> -#define __xl "r1"
> -#else
> -#define __xl "r0"
> -#define __xh "r1"
> -#endif
> -
>  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
>  {
>         register unsigned int __base      asm("r4") = base;
>         register unsigned long long __n   asm("r0") = *n;
>         register unsigned long long __res asm("r2");
> -       register unsigned int __rem       asm(__xh);
> -       asm(    __asmeq("%0", __xh)
> +       unsigned int __rem;
> +       asm(    __asmeq("%0", "r0")
>                 __asmeq("%1", "r2")
> -               __asmeq("%2", "r0")
> -               __asmeq("%3", "r4")
> +               __asmeq("%2", "r4")
>                 "bl     __do_div64"
> -               : "=r" (__rem), "=r" (__res)
> -               : "r" (__n), "r" (__base)
> +               : "+r" (__n), "=r" (__res)
> +               : "r" (__base)
>                 : "ip", "lr", "cc");
> +       __rem = __n >> 32;

This treats {r0, r1} as a {low, high} pair, regardless of endianness,
and so it puts the value of r0 into r1. Doesn't that mean the shift
should only be done on little endian?


>         *n = __res;
>         return __rem;
>  }
>
> I'll submit it if someone confirms it works.
>
>
> Nicolas
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Nicolas Pitre Nov. 30, 2020, 5:52 p.m. UTC | #13
On Mon, 30 Nov 2020, Ard Biesheuvel wrote:

> On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre <nico@fluxnic.net> wrote:
> 
> > Here's my version of the fix which should be correct. Warning: this
> > is completely untested, but should in theory produce the same code on
> > modern gcc.
> >
> > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > index 898e9c78a7..595e538f5b 100644
> > --- a/arch/arm/include/asm/div64.h
> > +++ b/arch/arm/include/asm/div64.h
> > @@ -21,29 +21,20 @@
> >   * assembly implementation with completely non standard calling convention
> >   * for arguments and results (beware).
> >   */
> > -
> > -#ifdef __ARMEB__
> > -#define __xh "r0"
> > -#define __xl "r1"
> > -#else
> > -#define __xl "r0"
> > -#define __xh "r1"
> > -#endif
> > -
> >  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> >  {
> >         register unsigned int __base      asm("r4") = base;
> >         register unsigned long long __n   asm("r0") = *n;
> >         register unsigned long long __res asm("r2");
> > -       register unsigned int __rem       asm(__xh);
> > -       asm(    __asmeq("%0", __xh)
> > +       unsigned int __rem;
> > +       asm(    __asmeq("%0", "r0")
> >                 __asmeq("%1", "r2")
> > -               __asmeq("%2", "r0")
> > -               __asmeq("%3", "r4")
> > +               __asmeq("%2", "r4")
> >                 "bl     __do_div64"
> > -               : "=r" (__rem), "=r" (__res)
> > -               : "r" (__n), "r" (__base)
> > +               : "+r" (__n), "=r" (__res)
> > +               : "r" (__base)
> >                 : "ip", "lr", "cc");
> > +       __rem = __n >> 32;
> 
> This treats {r0, r1} as a {low, high} pair, regardless of endianness,
> and so it puts the value of r0 into r1. Doesn't that mean the shift
> should only be done on little endian?

Not quite. r0-r1 = low-high is for little endian. Then "__n >> 32" is 
actually translated into "mov r0, r1" to move it into __rem and returned 
through r0.

On big endial it is r0-r1 = high-low. Here "__n >> 32" picks r0 and 
moves it to __rem which is returned through r0 so no extra instruction 
needed.

Of course the function is inlined so r0 can be anything, or optimized 
away if__rem is not used.


Nicolas
Ard Biesheuvel Nov. 30, 2020, 6:08 p.m. UTC | #14
On Mon, 30 Nov 2020 at 18:52, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Mon, 30 Nov 2020, Ard Biesheuvel wrote:
>
> > On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre <nico@fluxnic.net> wrote:
> >
> > > Here's my version of the fix which should be correct. Warning: this
> > > is completely untested, but should in theory produce the same code on
> > > modern gcc.
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7..595e538f5b 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -21,29 +21,20 @@
> > >   * assembly implementation with completely non standard calling convention
> > >   * for arguments and results (beware).
> > >   */
> > > -
> > > -#ifdef __ARMEB__
> > > -#define __xh "r0"
> > > -#define __xl "r1"
> > > -#else
> > > -#define __xl "r0"
> > > -#define __xh "r1"
> > > -#endif
> > > -
> > >  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >  {
> > >         register unsigned int __base      asm("r4") = base;
> > >         register unsigned long long __n   asm("r0") = *n;
> > >         register unsigned long long __res asm("r2");
> > > -       register unsigned int __rem       asm(__xh);
> > > -       asm(    __asmeq("%0", __xh)
> > > +       unsigned int __rem;
> > > +       asm(    __asmeq("%0", "r0")
> > >                 __asmeq("%1", "r2")
> > > -               __asmeq("%2", "r0")
> > > -               __asmeq("%3", "r4")
> > > +               __asmeq("%2", "r4")
> > >                 "bl     __do_div64"
> > > -               : "=r" (__rem), "=r" (__res)
> > > -               : "r" (__n), "r" (__base)
> > > +               : "+r" (__n), "=r" (__res)
> > > +               : "r" (__base)
> > >                 : "ip", "lr", "cc");
> > > +       __rem = __n >> 32;
> >
> > This treats {r0, r1} as a {low, high} pair, regardless of endianness,
> > and so it puts the value of r0 into r1. Doesn't that mean the shift
> > should only be done on little endian?
>
> Not quite. r0-r1 = low-high is for little endian. Then "__n >> 32" is
> actually translated into "mov r0, r1" to move it into __rem and returned
> through r0.
>
> On big endial it is r0-r1 = high-low.  Here "__n >> 32" picks r0 and
> moves it to __rem which is returned through r0 so no extra instruction
> needed.
>
> Of course the function is inlined so r0 can be anything, or optimized
> away if__rem is not used.
>

OK, you're right. I got myself confused there, but a quick test with
GCC confirms your explanation:

$ arm-linux-gnueabihf-gcc -mbig-endian -O2 -S -o - \
   -xc - <<<"long f(long long l) { return l >> 32; }"

just produces

bx lr

whereas removing the -mbig-endian gives

mov r0, r1
bx lr


I tested the change and it builds and runs fine (although I am not
sure how much coverage this code gets on an ordinary boot):

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
diff mbox series

Patch

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 898e9c78a7e7..809efc51e90f 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -39,9 +39,10 @@  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
 	asm(	__asmeq("%0", __xh)
 		__asmeq("%1", "r2")
 		__asmeq("%2", "r0")
-		__asmeq("%3", "r4")
+		__asmeq("%3", "r0")
+		__asmeq("%4", "r4")
 		"bl	__do_div64"
-		: "=r" (__rem), "=r" (__res)
+		: "=r" (__rem), "=r" (__res), "=r" (__n)
 		: "r" (__n), "r" (__base)
 		: "ip", "lr", "cc");
 	*n = __res;