
[2/4] riscv: uaccess: use input constraints for ptr of __put_user

Message ID 20240625040500.1788-3-jszhang@kernel.org (mailing list archive)
State Changes Requested, archived
Series riscv: uaccess: optimizations

Checks

Context Check Description
conchuod/vmtest-for-next-PR fail PR summary
conchuod/patch-2-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-2-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-2-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-2-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-2-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-2-test-6 success .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-2-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-2-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-2-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-2-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-2-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-2-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Jisheng Zhang June 25, 2024, 4:04 a.m. UTC
I believe the output constraints "=m" is not necessary, because
the instruction itself is "write", we don't need the compiler
to "write" for us. So tell compiler we read from memory instead
of writing.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
 arch/riscv/include/asm/uaccess.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

Comments

Arnd Bergmann June 25, 2024, 5:54 a.m. UTC | #1
On Tue, Jun 25, 2024, at 06:04, Jisheng Zhang wrote:
> I believe the output constraints "=m" is not necessary, because
> the instruction itself is "write", we don't need the compiler
> to "write" for us. So tell compiler we read from memory instead
> of writing.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>

I think this is a bit too confusing: clearly there is no
read access from the __user pointer, so what you add in here
is not correct. There also needs to be a code comment about
why you do it this way, as it's not clear that this is
a workaround for old compilers without
CONFIG_CC_HAS_ASM_GOTO_OUTPUT.

> index 09d4ca37522c..84b084e388a7 100644
> --- a/arch/riscv/include/asm/uaccess.h
> +++ b/arch/riscv/include/asm/uaccess.h
> @@ -186,11 +186,11 @@ do {								\
>  	__typeof__(*(ptr)) __x = x;				\
>  	__asm__ __volatile__ (					\
>  		"1:\n"						\
> -		"	" insn " %z2, %1\n"			\
> +		"	" insn " %z1, %2\n"			\
>  		"2:\n"						\
>  		_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0)		\
> -		: "+r" (err), "=m" (*(ptr))			\
> -		: "rJ" (__x));					\
> +		: "+r" (err)			\
> +		: "rJ" (__x), "m"(*(ptr)));					\
>  } while (0)
> 

I suspect this could just be a "r" constraint instead of
"m", treating the __user pointer as a plain integer.

For kernel pointers, using "m" and "=m" constraints
correctly is necessary since gcc will often access the
same data from C code as well. For __user pointers, we
can probably get away without it since no C code is
ever allowed to just dereference them. If you do that,
you may want to have the same thing in the __get_user
side.

      Arnd
Jisheng Zhang June 26, 2024, 12:32 p.m. UTC | #2
On Tue, Jun 25, 2024 at 07:54:30AM +0200, Arnd Bergmann wrote:
> On Tue, Jun 25, 2024, at 06:04, Jisheng Zhang wrote:
> > I believe the output constraints "=m" is not necessary, because
> > the instruction itself is "write", we don't need the compiler
> > to "write" for us. So tell compiler we read from memory instead
> > of writing.
> >
> > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> 
> I think this is a bit too confusing: clearly there is no
> read access from the __user pointer, so what you add in here
> is not correct. There also needs to be a code comment about

Here is my understanding: __put_user is implemented with
sd (or a narrower variant such as sw). Ignoring the
ex_table, the previous code can be simplified as below:

__asm__ __volatile__ (
	"sw	%z2, %1\n"
	: "+r" (err), "=m" (*(ptr))
	: "rJ" (__x));

Here ptr is really an input: it just tells gcc where to store.
The store itself is done by the "sw" instruction; I don't
need gcc to generate a store instruction for me. So IMHO
there's no need for an output constraint here, and I changed
it to

__asm__ __volatile__ (
	"sw	%z1, %2\n"
	: "+r" (err)
	: "rJ" (__x), "m"(*(ptr)));

The key here: is this correct?
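In GCC extended asm, operands are numbered in declaration order, outputs first and then inputs. That is why moving `*(ptr)` from the output list to the input list forces the template change from `%z2, %1` to `%z1, %2`. The sketch below shows the pre-patch layout in userspace. It assumes an x86-64 host with GCC or Clang, uses `movl` as a stand-in for `sw`, and the helper name is hypothetical; it is not the kernel macro.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the pre-patch operand layout:
 *   %0 = err  ("+r": read-write register)
 *   %1 = *p   ("=m": memory written by the asm)
 *   %2 = v    ("r":  register input)
 * With *p moved to the inputs (err = %0, v = %1, *p = %2), the same
 * store would have to be written as "movl %1, %2".
 */
static int store_u32_prepatch(unsigned int *p, unsigned int v)
{
	int err = 0;
#if defined(__x86_64__)
	__asm__ __volatile__("movl %2, %1"
			     : "+r"(err), "=m"(*p)
			     : "r"(v));
#else
	*p = v;	/* plain C fallback on non-x86-64 hosts */
#endif
	return err;
}
```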


Here is the put_user code and comment from x86:

/*
 * Tell gcc we read from memory instead of writing: this is because
 * we do not write to any memory gcc knows about, so there are no
 * aliasing issues.
 */
#define __put_user_goto(x, addr, itype, ltype, label)                   \
        asm goto("\n"                                                   \
                "1:     mov"itype" %0,%1\n"                             \
                _ASM_EXTABLE_UA(1b, %l2)                                \
                : : ltype(x), "m" (__m(addr))                           \
                : : label)


As can be seen, x86 also doesn't put (addr) in the output constraints.
I thought the x86 version had made a similar change at some point, but when I
tried to search the git history, the comment has been there since the first git import.

Any hint or suggestion is appreciated!

> why you do it this way, as it's not clear that this is
> a workaround for old compilers without
> CONFIG_CC_HAS_ASM_GOTO_OUTPUT.
> 
> > index 09d4ca37522c..84b084e388a7 100644
> > --- a/arch/riscv/include/asm/uaccess.h
> > +++ b/arch/riscv/include/asm/uaccess.h
> > @@ -186,11 +186,11 @@ do {								\
> >  	__typeof__(*(ptr)) __x = x;				\
> >  	__asm__ __volatile__ (					\
> >  		"1:\n"						\
> > -		"	" insn " %z2, %1\n"			\
> > +		"	" insn " %z1, %2\n"			\
> >  		"2:\n"						\
> >  		_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0)		\
> > -		: "+r" (err), "=m" (*(ptr))			\
> > -		: "rJ" (__x));					\
> > +		: "+r" (err)			\
> > +		: "rJ" (__x), "m"(*(ptr)));					\
> >  } while (0)
> > 
> 
> I suspect this could just be a "r" constraint instead of
> "m", treating the __user pointer as a plain integer.

I tried "r"; the generated code is not as good as with "m".

for example
__put_user(0x12, &frame->uc.uc_flags);

with "m", the generated code will be

...
csrs    sstatus,a5
li      a4,18
sd      a4,128(s1)
csrc    sstatus,a5
...


with "r", the generated code will be

...
csrs    sstatus,a5
li      a4,18
addi    s1,s1,128
sd      a4,0(s1)
csrc    sstatus,a5
...

As can be seen, "m" lets the compiler use the offset field of
sd, which saves one instruction.
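The "m" versus "r" trade-off can be reproduced outside the kernel. A minimal sketch, assuming an x86-64 host with GCC or Clang; `struct frame_like` and both helpers are hypothetical, with the field placed at offset 128 to mirror the `sd a4,128(s1)` output above. Building with `-O2 -S` should show the "m" variant storing through `128(%reg)` directly, while the "r" variant first needs the full address in a register. Because the "r" variant no longer names the object it writes, this sketch adds a "memory" clobber; that is a conservative choice for userspace, not what arm64 does.

```c
#include <assert.h>

/* Hypothetical layout: 'flags' sits at byte offset 128. */
struct frame_like {
	unsigned char pad[128];
	unsigned long long flags;
};

/* "m" operand: the compiler may fold the offset into the store. */
static void put_flags_m(struct frame_like *f, unsigned long long v)
{
#if defined(__x86_64__)
	__asm__ __volatile__("movq %1, %0" : "=m"(f->flags) : "r"(v));
#else
	f->flags = v;	/* plain C fallback */
#endif
}

/* "r" operand: &f->flags must be materialised in a register first;
 * the "memory" clobber tells the compiler that some memory changed. */
static void put_flags_r(struct frame_like *f, unsigned long long v)
{
#if defined(__x86_64__)
	__asm__ __volatile__("movq %1, (%0)"
			     : : "r"(&f->flags), "r"(v) : "memory");
#else
	f->flags = v;	/* plain C fallback */
#endif
}
```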

> 
> For kernel pointers, using "m" and "=m" constraints
> correctly is necessary since gcc will often access the
> same data from C code as well. For __user pointers, we
> can probably get away without it since no C code is
> ever allowed to just dereference them. If you do that,
> you may want to have the same thing in the __get_user
> side.
> 
>       Arnd
Jisheng Zhang June 26, 2024, 12:49 p.m. UTC | #3
On Wed, Jun 26, 2024 at 08:32:38PM +0800, Jisheng Zhang wrote:
> On Tue, Jun 25, 2024 at 07:54:30AM +0200, Arnd Bergmann wrote:
> > On Tue, Jun 25, 2024, at 06:04, Jisheng Zhang wrote:
> > > I believe the output constraints "=m" is not necessary, because
> > > the instruction itself is "write", we don't need the compiler
> > > to "write" for us. So tell compiler we read from memory instead
> > > of writing.
> > >
> > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > 
> > I think this is a bit too confusing: clearly there is no
> > read access from the __user pointer, so what you add in here
> > is not correct. There also needs to be a code comment about
> 
> Here is my understanding: __put_user is implemented with
> sd (or a narrower variant such as sw). Ignoring the
> ex_table, the previous code can be simplified as below:
> 
> __asm__ __volatile__ (
> 	"sw	%z2, %1\n"
> 	: "+r" (err), "=m" (*(ptr))
> 	: "rJ" (__x));
> 
> Here ptr is really an input: it just tells gcc where to store.
> The store itself is done by the "sw" instruction; I don't
> need gcc to generate a store instruction for me. So IMHO
> there's no need for an output constraint here, and I changed
> it to
> 
> __asm__ __volatile__ (
> 	"sw	%z1, %2\n"
> 	: "+r" (err)
> 	: "rJ" (__x), "m"(*(ptr)));
> 
> The key here: is this correct?
> 
> 
> Here is the put_user code and comment from x86:
> 
> /*
>  * Tell gcc we read from memory instead of writing: this is because
>  * we do not write to any memory gcc knows about, so there are no
>  * aliasing issues.
>  */
> #define __put_user_goto(x, addr, itype, ltype, label)                   \
>         asm goto("\n"                                                   \
>                 "1:     mov"itype" %0,%1\n"                             \
>                 _ASM_EXTABLE_UA(1b, %l2)                                \
>                 : : ltype(x), "m" (__m(addr))                           \
>                 : : label)

Here is the simplified put_user code from arm64:

#define __put_mem_asm(store, reg, x, addr, err, type)                   \
        asm volatile(                                                   \
        "1:     " store "       " reg "1, [%2]\n"                       \
        "2:\n"                                                          \
        _ASM_EXTABLE_##type##ACCESS_ERR(1b, 2b, %w0)                    \
        : "+r" (err)                                                    \
        : "rZ" (x), "r" (addr))

no output constraints either. It just uses "r" input constraints to tell
gcc to read the store address into one proper GP reg.

> 
> 
> As can be seen, x86 also doesn't put (addr) in the output constraints.
> I thought the x86 version had made a similar change at some point, but when I
> tried to search the git history, the comment has been there since the first git import.
> 
> Any hint or suggestion is appreciated!
> 
> > why you do it this way, as it's not clear that this is
> > a workaround for old compilers without
> > CONFIG_CC_HAS_ASM_GOTO_OUTPUT.
> > 
> > > index 09d4ca37522c..84b084e388a7 100644
> > > --- a/arch/riscv/include/asm/uaccess.h
> > > +++ b/arch/riscv/include/asm/uaccess.h
> > > @@ -186,11 +186,11 @@ do {								\
> > >  	__typeof__(*(ptr)) __x = x;				\
> > >  	__asm__ __volatile__ (					\
> > >  		"1:\n"						\
> > > -		"	" insn " %z2, %1\n"			\
> > > +		"	" insn " %z1, %2\n"			\
> > >  		"2:\n"						\
> > >  		_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0)		\
> > > -		: "+r" (err), "=m" (*(ptr))			\
> > > -		: "rJ" (__x));					\
> > > +		: "+r" (err)			\
> > > +		: "rJ" (__x), "m"(*(ptr)));					\
> > >  } while (0)
> > > 
> > 
> > I suspect this could just be a "r" constraint instead of
> > "m", treating the __user pointer as a plain integer.
> 
> I tried "r"; the generated code is not as good as with "m".
> 
> for example
> __put_user(0x12, &frame->uc.uc_flags);
> 
> with "m", the generated code will be
> 
> ...
> csrs    sstatus,a5
> li      a4,18
> sd      a4,128(s1)
> csrc    sstatus,a5
> ...
> 
> 
> with "r", the generated code will be
> 
> ...
> csrs    sstatus,a5
> li      a4,18
> addi    s1,s1,128
> sd      a4,0(s1)
> csrc    sstatus,a5
> ...
> 
> As can be seen, "m" lets the compiler use the offset field of
> sd, which saves one instruction.
> 
> > 
> > For kernel pointers, using "m" and "=m" constraints
> > correctly is necessary since gcc will often access the
> > same data from C code as well. For __user pointers, we
> > can probably get away without it since no C code is
> > ever allowed to just dereference them. If you do that,
> > you may want to have the same thing in the __get_user
> > side.
> > 
> >       Arnd
Jisheng Zhang June 26, 2024, 1:12 p.m. UTC | #4
On Wed, Jun 26, 2024 at 03:12:50PM +0200, Andreas Schwab wrote:
> On Jun 25 2024, Jisheng Zhang wrote:
> 
> > I believe the output constraints "=m" is not necessary, because
> > the instruction itself is "write", we don't need the compiler
> > to "write" for us.
> 
> No, this is backwards.  Being an output operand means that the *asm* is
> writing to it, and the compiler can read the value from there afterwards
> (and the previous value is dead before the asm).

Hi Andreas,

I compared the generated code of dozens of __put_user() callers
between the original and patched versions, and it is identical.
Admittedly, that may not be enough.

But your explanation also applies to the x86 and arm64 __put_user()
implementations: their asm writes too, so why are there no output
constraints there (see the other two emails)? Could you please help
me understand the tricky points?

Thanks in advance
Andreas Schwab June 26, 2024, 1:12 p.m. UTC | #5
On Jun 25 2024, Jisheng Zhang wrote:

> I believe the output constraints "=m" is not necessary, because
> the instruction itself is "write", we don't need the compiler
> to "write" for us.

No, this is backwards.  Being an output operand means that the *asm* is
writing to it, and the compiler can read the value from there afterwards
(and the previous value is dead before the asm).
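Andreas's description of "=m" can be contrasted with "+m". In the minimal sketch below (x86-64 host with GCC or Clang assumed; helper names hypothetical), the pure store declares `*p` as "=m", so the compiler may treat the previous contents as dead, whereas a read-modify-write has to use "+m" because the asm also consumes the old value.

```c
#include <assert.h>

/* Write-only: "=m" says the asm overwrites *p and ignores its old value. */
static void store_u32(unsigned int *p, unsigned int v)
{
#if defined(__x86_64__)
	__asm__ __volatile__("movl %1, %0" : "=m"(*p) : "r"(v));
#else
	*p = v;	/* plain C fallback */
#endif
}

/* Read-modify-write: "+m" says the asm reads the old *p and writes a new
 * one. Declaring it "=m" instead would let the compiler treat earlier
 * stores to *p as dead and drop them. */
static void add_u32(unsigned int *p, unsigned int v)
{
#if defined(__x86_64__)
	__asm__ __volatile__("addl %1, %0" : "+m"(*p) : "r"(v) : "cc");
#else
	*p += v;	/* plain C fallback */
#endif
}
```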
Jisheng Zhang June 26, 2024, 1:18 p.m. UTC | #6
On Wed, Jun 26, 2024 at 08:49:59PM +0800, Jisheng Zhang wrote:
> On Wed, Jun 26, 2024 at 08:32:38PM +0800, Jisheng Zhang wrote:
> > On Tue, Jun 25, 2024 at 07:54:30AM +0200, Arnd Bergmann wrote:
> > > On Tue, Jun 25, 2024, at 06:04, Jisheng Zhang wrote:
> > > > I believe the output constraints "=m" is not necessary, because
> > > > the instruction itself is "write", we don't need the compiler
> > > > to "write" for us. So tell compiler we read from memory instead
> > > > of writing.
> > > >
> > > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > > 
> > > I think this is a bit too confusing: clearly there is no
> > > read access from the __user pointer, so what you add in here
> > > is not correct. There also needs to be a code comment about
> > 
> > Here is my understanding: __put_user is implemented with
> > sd (or a narrower variant such as sw). Ignoring the
> > ex_table, the previous code can be simplified as below:
> > 
> > __asm__ __volatile__ (
> > 	"sw	%z2, %1\n"
> > 	: "+r" (err), "=m" (*(ptr))
> > 	: "rJ" (__x));
> > 
> > Here ptr is really an input: it just tells gcc where to store.
> > The store itself is done by the "sw" instruction; I don't
> > need gcc to generate a store instruction for me. So IMHO
> > there's no need for an output constraint here, and I changed
> > it to
> > 
> > __asm__ __volatile__ (
> > 	"sw	%z1, %2\n"
> > 	: "+r" (err)
> > 	: "rJ" (__x), "m"(*(ptr)));
> > 
> > The key here: is this correct?
> > 
> > 
> > Here is the put_user code and comment from x86:
> > 
> > /*
> >  * Tell gcc we read from memory instead of writing: this is because
> >  * we do not write to any memory gcc knows about, so there are no
> >  * aliasing issues.
> >  */
> > #define __put_user_goto(x, addr, itype, ltype, label)                   \
> >         asm goto("\n"                                                   \
> >                 "1:     mov"itype" %0,%1\n"                             \
> >                 _ASM_EXTABLE_UA(1b, %l2)                                \
> >                 : : ltype(x), "m" (__m(addr))                           \
> >                 : : label)
> 
> Here is the simplified put_user code from arm64:
> 
> #define __put_mem_asm(store, reg, x, addr, err, type)                   \
>         asm volatile(                                                   \
>         "1:     " store "       " reg "1, [%2]\n"                       \
>         "2:\n"                                                          \
>         _ASM_EXTABLE_##type##ACCESS_ERR(1b, 2b, %w0)                    \
>         : "+r" (err)                                                    \
>         : "rZ" (x), "r" (addr))
> 
> no output constraints either. It just uses "r" input constraints to tell

To be precise: I mean that the "addr" of __put_user() isn't
in the output constraints.

> gcc to read the store address into one proper GP reg.
> 
> > 
> > 
> > As can be seen, x86 also doesn't put (addr) in the output constraints.
> > I thought the x86 version had made a similar change at some point, but when I
> > tried to search the git history, the comment has been there since the first git import.
> > 
> > Any hint or suggestion is appreciated!
> > 
> > > why you do it this way, as it's not clear that this is
> > > a workaround for old compilers without
> > > CONFIG_CC_HAS_ASM_GOTO_OUTPUT.
> > > 
> > > > index 09d4ca37522c..84b084e388a7 100644
> > > > --- a/arch/riscv/include/asm/uaccess.h
> > > > +++ b/arch/riscv/include/asm/uaccess.h
> > > > @@ -186,11 +186,11 @@ do {								\
> > > >  	__typeof__(*(ptr)) __x = x;				\
> > > >  	__asm__ __volatile__ (					\
> > > >  		"1:\n"						\
> > > > -		"	" insn " %z2, %1\n"			\
> > > > +		"	" insn " %z1, %2\n"			\
> > > >  		"2:\n"						\
> > > >  		_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0)		\
> > > > -		: "+r" (err), "=m" (*(ptr))			\
> > > > -		: "rJ" (__x));					\
> > > > +		: "+r" (err)			\
> > > > +		: "rJ" (__x), "m"(*(ptr)));					\
> > > >  } while (0)
> > > > 
> > > 
> > > I suspect this could just be a "r" constraint instead of
> > > "m", treating the __user pointer as a plain integer.
> > 
> > I tried "r"; the generated code is not as good as with "m".
> > 
> > for example
> > __put_user(0x12, &frame->uc.uc_flags);
> > 
> > with "m", the generated code will be
> > 
> > ...
> > csrs    sstatus,a5
> > li      a4,18
> > sd      a4,128(s1)
> > csrc    sstatus,a5
> > ...
> > 
> > 
> > with "r", the generated code will be
> > 
> > ...
> > csrs    sstatus,a5
> > li      a4,18
> > addi    s1,s1,128
> > sd      a4,0(s1)
> > csrc    sstatus,a5
> > ...
> > 
> > As can be seen, "m" lets the compiler use the offset field of
> > sd, which saves one instruction.
> > 
> > > 
> > > For kernel pointers, using "m" and "=m" constraints
> > > correctly is necessary since gcc will often access the
> > > same data from C code as well. For __user pointers, we
> > > can probably get away without it since no C code is
> > > ever allowed to just dereference them. If you do that,
> > > you may want to have the same thing in the __get_user
> > > side.
> > > 
> > >       Arnd
Andreas Schwab June 26, 2024, 1:35 p.m. UTC | #7
On Jun 26 2024, Jisheng Zhang wrote:

> no output constraints either. It just uses "r" input constraints to tell
> gcc to read the store address into one proper GP reg.

Again, this is backwards.  Being an input operand means the asm is using
this operand as an input to the instructions.  The compiler needs to
arrange to put the value in the allocated operand location according to
the constraint.
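The "rJ" constraint on the stored value shows the compiler choosing the operand location: on RISC-V, "J" accepts the integer constant zero, and the `%z` modifier then prints the `zero` register instead of a general-purpose one. A loose x86-64 analogue is "ri", which lets the compiler pass a constant as an immediate and any other value in a register. The sketch assumes GCC or Clang on x86-64, and the helper name is hypothetical.

```c
#include <assert.h>

/* "ri": the compiler decides whether v arrives as an immediate ($imm)
 * or in a register; the template "movl %1, %0" is valid for either. */
static inline void store_u32_ri(unsigned int *p, unsigned int v)
{
#if defined(__x86_64__)
	__asm__ __volatile__("movl %1, %0" : "=m"(*p) : "ri"(v));
#else
	*p = v;	/* plain C fallback */
#endif
}
```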
Jisheng Zhang June 26, 2024, 1:54 p.m. UTC | #8
On Wed, Jun 26, 2024 at 03:35:54PM +0200, Andreas Schwab wrote:
> On Jun 26 2024, Jisheng Zhang wrote:
> 
> > no output constraints either. It just uses "r" input constraints to tell
> > gcc to read the store address into one proper GP reg.
> 
> Again, this is backwards.  Being an input operand means the asm is using
> this operand as an input to the instructions.  The compiler needs to
> arrange to put the value in the allocated operand location according to
> the constraint.

Hi Andreas,

Your point is clearly received. What confuses me is:

why don't x86 and arm64 put the "addr" of __put_user into the output
constraints? And especially, given the following comment, why is this
a "read" from memory?

 * Tell gcc we read from memory instead of writing: this is because
 * we do not write to any memory gcc knows about, so there are no
 * aliasing issues.

Can you please help me understand the tricky points here?

thanks
> 
> -- 
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
Arnd Bergmann June 26, 2024, 2:25 p.m. UTC | #9
On Wed, Jun 26, 2024, at 15:12, Jisheng Zhang wrote:
> On Wed, Jun 26, 2024 at 03:12:50PM +0200, Andreas Schwab wrote:
>> On Jun 25 2024, Jisheng Zhang wrote:
>> 
>> > I believe the output constraints "=m" is not necessary, because
>> > the instruction itself is "write", we don't need the compiler
>> > to "write" for us.
>> 
>> No, this is backwards.  Being an output operand means that the *asm* is
>> writing to it, and the compiler can read the value from there afterwards
>> (and the previous value is dead before the asm).
>
> Hi Andreas,
>
> I compared the generated code of dozens of __put_user() callers
> between the original and patched versions, and it is identical.
> Admittedly, that may not be enough.
>
> But your explanation also applies to the x86 and arm64 __put_user()
> implementations: their asm writes too, so why are there no output
> constraints there (see the other two emails)? Could you please help
> me understand the tricky points?

I think part of the reason for the specific way the x86
user access is written is to work around bugs in old
compiler versions, as well as to take advantage of the
complex addressing modes in x86 assembler, see this bit
that dates back to the earliest version of the x86_64
codebase and is still left in place:

/* FIXME: this hack is definitely wrong -AK */
struct __large_struct { unsigned long buf[100]; };
#define __m(x) (*(struct __large_struct __user *)(x))

Using the memory input constraint means that x86 can use
a load from a pointer plus offset, but riscv doesn't
actually do this. The __large_struct I think was needed
either to prevent the compiler from reading the data
outside of the assembly, or to tell the compiler about
the fact that there is an actual memory access if
__put_user() was pointed at kernel memory.

If you just copy from the arm64 version that uses an
"r"(address) constraint instead of the "m"(*address)
version, it should be fine for any user space access.

The output constraint is technically still needed
if we have code like this one where we actually write to
something in kernel space:

int f(void)
{
     int a = 1;
     int b = 2;
     __put_kernel_nofault(&a, &b, int, error);
     return a;
error:
     return -EFAULT;
}

In this case, __put_kernel_nofault() writes the value
of b into a, but the compiler can safely assume that
a is not changed by the assembly because there is no
memory output, and would likely just return a constant '1'. 

For put_user(), this cannot happen because the compiler
doesn't know anything about the contents of the __user
pointer. For __put_kernel_nofault(), we rely on the
callers never using it on pointers they access, which
is probably a reasonable assumption, but not entirely
correct.

     Arnd
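Arnd's example can be mirrored in userspace with the output constraint in place. A minimal sketch, assuming an x86-64 host with GCC or Clang; `kernel_like_copy_int()` is a hypothetical stand-in for `__put_kernel_nofault()` without any fault handling. Because the destination is declared "=m", the compiler must reload `a` after the asm, so `f()` returns 2. The input-only variant is deliberately not shown, since whether it happens to return 1 depends on the optimizer.

```c
#include <assert.h>

/* Copies *src into *dst; "=m" tells the compiler that *dst is written. */
static inline void kernel_like_copy_int(int *dst, const int *src)
{
#if defined(__x86_64__)
	__asm__ __volatile__("movl %1, %0" : "=m"(*dst) : "r"(*src));
#else
	*dst = *src;	/* plain C fallback */
#endif
}

static int f(void)
{
	int a = 1;
	int b = 2;

	kernel_like_copy_int(&a, &b);
	return a;	/* the "=m" output forces a reload of a, giving 2 */
}
```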
Jisheng Zhang June 26, 2024, 4:02 p.m. UTC | #10
On Wed, Jun 26, 2024 at 04:25:26PM +0200, Arnd Bergmann wrote:
> On Wed, Jun 26, 2024, at 15:12, Jisheng Zhang wrote:
> > On Wed, Jun 26, 2024 at 03:12:50PM +0200, Andreas Schwab wrote:
> >> On Jun 25 2024, Jisheng Zhang wrote:
> >> 
> >> > I believe the output constraints "=m" is not necessary, because
> >> > the instruction itself is "write", we don't need the compiler
> >> > to "write" for us.
> >> 
> >> No, this is backwards.  Being an output operand means that the *asm* is
> >> writing to it, and the compiler can read the value from there afterwards
> >> (and the previous value is dead before the asm).
> >
> > Hi Andreas,
> >
> > I compared the generated code of dozens of __put_user() callers
> > between the original and patched versions, and it is identical.
> > Admittedly, that may not be enough.
> >
> > But your explanation also applies to the x86 and arm64 __put_user()
> > implementations: their asm writes too, so why are there no output
> > constraints there (see the other two emails)? Could you please help
> > me understand the tricky points?
> 
> I think part of the reason for the specific way the x86
> user access is written is to work around bugs in old
> compiler versions, as well as to take advantage of the
> complex addressing modes in x86 assembler, see this bit
> that dates back to the earliest version of the x86_64
> codebase and is still left in place:
> 
> /* FIXME: this hack is definitely wrong -AK */
> struct __large_struct { unsigned long buf[100]; };
> #define __m(x) (*(struct __large_struct __user *)(x))
> 
> Using the memory input constraint means that x86 can use
> a load from a pointer plus offset, but riscv doesn't
> actually do this. The __large_struct I think was needed
> either to prevent the compiler from reading the data
> outside of the assembly, or to tell the compiler about
> the fact that there is an actual memory access if
> __put_user() was pointed at kernel memory.

Thank you so much, Arnd!

> 
> If you just copy from the arm64 version that uses an
> "r"(address) constraint instead of the "m"(*address)

The "m" version is better than "r": it usually saves one
instruction.
I will try combining other constraints with "r" to
see whether we can still generate sd with an offset.
If we can't, sticking with "m" and keeping the
output constraints seems better.

> version, it should be fine for any user space access.

You only mention "user space access", so just curious: does the
arm64 version still work correctly with the __put_kernel_nofault()
example below?

> 
> The output constraint is technically still needed
> if we have code like this one where we actually write to
> something in kernel space:
> 
> int f(void)
> {
>      int a = 1;
>      int b = 2;
>      __put_kernel_nofault(&a, &b, int, error);
>      return a;
> error:
>      return -EFAULT;
> }
> 
> In this case, __put_kernel_nofault() writes the value
> of b into a, but the compiler can safely assume that
> a is not changed by the assembly because there is no
> memory output, and would likely just return a constant '1'. 
> 
> For put_user(), this cannot happen because the compiler
> doesn't know anything about the contents of the __user
> pointer. For __put_kernel_nofault(), we rely on the
> callers never using it on pointers they access, which
> is probably a reasonable assumption, but not entirely
> correct.
> 
>      Arnd

Well explained! Thanks a lot.
Arnd Bergmann June 27, 2024, 6:46 a.m. UTC | #11
On Wed, Jun 26, 2024, at 18:02, Jisheng Zhang wrote:
>
> The "m" version is better than "r": it usually saves one
> instruction.
> I will try combining other constraints with "r" to
> see whether we can still generate sd with an offset.
> If we can't, sticking with "m" and keeping the
> output constraints seems better.

Ah, I see.

> You only mention "user space access", so just curious: does the
> arm64 version still work correctly with the __put_kernel_nofault()
> example below?

No, I think the example I gave would break for both x86 and arm64
without adding an output constraint.

My main concern about using an input constraint was that it
doesn't match what the code does. Maybe there is a way to
make it use the correct "=m" output when CONFIG_CC_HAS_ASM_GOTO_OUTPUT
is set but use either "r" or "m" inputs on older gcc releases.

After gcc-11 becomes the minimum in a few years, the hack can
be removed.

     Arnd
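Arnd's suggestion could take roughly the shape below. This is a userspace sketch under several assumptions, not kernel code: `HAVE_ASM_GOTO_OUTPUT` is a hypothetical stand-in for `CONFIG_CC_HAS_ASM_GOTO_OUTPUT`, the host is assumed to be x86-64, and the `fault` label is never taken because nothing registers an exception-table entry. The fallback pairs an "m" input with a "memory" clobber, which is more conservative than the "r" or "m" inputs Arnd mentions.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for CONFIG_CC_HAS_ASM_GOTO_OUTPUT: GCC accepts
 * output operands on asm goto from version 11. Clang is left on the
 * fallback path here purely to keep the sketch conservative.
 */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
# define HAVE_ASM_GOTO_OUTPUT 1
#else
# define HAVE_ASM_GOTO_OUTPUT 0
#endif

/* Returns 0 after the store, or -1 if the fault label is taken (it never
 * is here, since there is no exception table in userspace). */
static int put_u32_goto(unsigned int *p, unsigned int v)
{
#if defined(__x86_64__)
# if HAVE_ASM_GOTO_OUTPUT
	/* Describe the store truthfully: *p is a memory output (%0). */
	__asm__ goto("movl %1, %0" : "=m"(*p) : "r"(v) : : fault);
# else
	/* No outputs allowed: pass *p as an input (%1) and add a "memory"
	 * clobber so the write stays visible to the compiler. */
	__asm__ goto("movl %0, %1" : : "r"(v), "m"(*p) : "memory" : fault);
# endif
	return 0;
fault:
	return -1;
#else
	*p = v;	/* plain C fallback on non-x86-64 hosts */
	return 0;
#endif
}
```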
David Laight June 28, 2024, 3:36 p.m. UTC | #12
From: Arnd Bergmann
> Sent: 26 June 2024 15:25
...
> If you just copy from the arm64 version that uses an
> "r"(address) constraint instead of the "m"(*address)
> version, it should be fine for any user space access.

Arm certainly has 'reg+offset' addressing and I'd have thought
the RISC-V would have it as well.

I'd guess that the compiler also knows when the offset is too big.

Probably noticeable when code is accessing structures in user memory.

OTOH I can't remember if "m" implies a memory clobber?
For user copies the memory clobber isn't needed and not having it
may well allow better instruction scheduling.

	David


Patch

diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index 09d4ca37522c..84b084e388a7 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -186,11 +186,11 @@  do {								\
 	__typeof__(*(ptr)) __x = x;				\
 	__asm__ __volatile__ (					\
 		"1:\n"						\
-		"	" insn " %z2, %1\n"			\
+		"	" insn " %z1, %2\n"			\
 		"2:\n"						\
 		_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0)		\
-		: "+r" (err), "=m" (*(ptr))			\
-		: "rJ" (__x));					\
+		: "+r" (err)			\
+		: "rJ" (__x), "m"(*(ptr)));					\
 } while (0)
 
 #ifdef CONFIG_64BIT
@@ -203,16 +203,16 @@  do {								\
 	u64 __x = (__typeof__((x)-(x)))(x);			\
 	__asm__ __volatile__ (					\
 		"1:\n"						\
-		"	sw %z3, %1\n"				\
+		"	sw %z1, %3\n"				\
 		"2:\n"						\
-		"	sw %z4, %2\n"				\
+		"	sw %z2, %4\n"				\
 		"3:\n"						\
 		_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %0)		\
 		_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %0)		\
-		: "+r" (err),					\
-			"=m" (__ptr[__LSW]),			\
-			"=m" (__ptr[__MSW])			\
-		: "rJ" (__x), "rJ" (__x >> 32));		\
+		: "+r" (err)					\
+		: "rJ" (__x), "rJ" (__x >> 32),			\
+			"m" (__ptr[__LSW]),			\
+			"m" (__ptr[__MSW]));			\
 } while (0)
 #endif /* CONFIG_64BIT */
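On rv32, the 64-bit case above stores the value as two 32-bit halves, the low word to `__ptr[__LSW]` and the high word to `__ptr[__MSW]`. A portable sketch of that data movement in plain C (no inline asm); the helper names are hypothetical, and the index choice assumes `__LSW`/`__MSW` denote the array index of the least and most significant word for the target's byte order, as their names suggest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Index of the least significant 32-bit word when a uint64_t is viewed
 * as uint32_t[2]: 0 on little-endian hosts, 1 on big-endian ones. */
static int lsw_index(void)
{
	const uint32_t one = 1;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1 ? 0 : 1;
}

/* Mirrors the two "sw" stores: low half to dst[LSW], high half to dst[MSW]. */
static void put_u64_as_words(uint32_t dst[2], uint64_t x)
{
	int lsw = lsw_index();
	int msw = 1 - lsw;

	dst[lsw] = (uint32_t)x;
	dst[msw] = (uint32_t)(x >> 32);
}
```

Reading the two words back as one `uint64_t` should reproduce the original value on either byte order.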