diff mbox series

x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments()

Message ID 20240708202202.work.477-kees@kernel.org (mailing list archive)
State Mainlined
Commit d19d638b1e6cf746263ef60b7d0dee0204d8216a
Headers show
Series x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments() | expand

Commit Message

Kees Cook July 8, 2024, 8:22 p.m. UTC
Modern (fortified) memcpy() prefers to avoid writing (or reading) beyond
the end of the addressed destination (or source) struct member:

In function ‘fortify_memcpy_chk’,
    inlined from ‘syscall_get_arguments’ at ./arch/x86/include/asm/syscall.h:85:2,
    inlined from ‘populate_seccomp_data’ at kernel/seccomp.c:258:2,
    inlined from ‘__seccomp_filter’ at kernel/seccomp.c:1231:3:
./include/linux/fortify-string.h:580:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
  580 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As already done for x86_64 and compat mode, do not use memcpy() to
extract syscall arguments from struct pt_regs but rather just perform
direct assignments. Binary output differences are negligible, and actually
ends up using less stack space:

-       sub    $0x84,%esp
+       sub    $0x6c,%esp

and less text size:

   text    data     bss     dec     hex filename
  10794     252       0   11046    2b26 gcc-32b/kernel/seccomp.o.stock
  10714     252       0   10966    2ad6 gcc-32b/kernel/seccomp.o.after

Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Closes: https://lore.kernel.org/lkml/9b69fb14-df89-4677-9c82-056ea9e706f5@gmail.com/
Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Peter Collingbourne <pcc@google.com>
---
 arch/x86/include/asm/syscall.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Gustavo A. R. Silva July 8, 2024, 11:44 p.m. UTC | #1
On 7/8/24 14:22, Kees Cook wrote:
> Modern (fortified) memcpy() prefers to avoid writing (or reading) beyond
> the end of the addressed destination (or source) struct member:
> 
> In function ‘fortify_memcpy_chk’,
>      inlined from ‘syscall_get_arguments’ at ./arch/x86/include/asm/syscall.h:85:2,
>      inlined from ‘populate_seccomp_data’ at kernel/seccomp.c:258:2,
>      inlined from ‘__seccomp_filter’ at kernel/seccomp.c:1231:3:
> ./include/linux/fortify-string.h:580:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
>    580 |                         __read_overflow2_field(q_size_field, size);
>        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> As already done for x86_64 and compat mode, do not use memcpy() to
> extract syscall arguments from struct pt_regs but rather just perform
> direct assignments. Binary output differences are negligible, and actually
> ends up using less stack space:
> 
> -       sub    $0x84,%esp
> +       sub    $0x6c,%esp
> 
> and less text size:
> 
>     text    data     bss     dec     hex filename
>    10794     252       0   11046    2b26 gcc-32b/kernel/seccomp.o.stock
>    10714     252       0   10966    2ad6 gcc-32b/kernel/seccomp.o.after
> 
> Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com>
> Closes: https://lore.kernel.org/lkml/9b69fb14-df89-4677-9c82-056ea9e706f5@gmail.com/
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: x86@kernel.org
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Josh Poimboeuf <jpoimboe@kernel.org>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Cc: Peter Collingbourne <pcc@google.com>

Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>

Thanks
--
Gustavo

> ---
>   arch/x86/include/asm/syscall.h | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index 2fc7bc3863ff..7c488ff0c764 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -82,7 +82,12 @@ static inline void syscall_get_arguments(struct task_struct *task,
>   					 struct pt_regs *regs,
>   					 unsigned long *args)
>   {
> -	memcpy(args, &regs->bx, 6 * sizeof(args[0]));
> +	args[0] = regs->bx;
> +	args[1] = regs->cx;
> +	args[2] = regs->dx;
> +	args[3] = regs->si;
> +	args[4] = regs->di;
> +	args[5] = regs->bp;
>   }
>   
>   static inline int syscall_get_arch(struct task_struct *task)
Mirsad Todorovac July 9, 2024, 6:20 p.m. UTC | #2
On 7/9/24 01:44, Gustavo A. R. Silva wrote:
> 
> 
> On 7/8/24 14:22, Kees Cook wrote:
>> Modern (fortified) memcpy() prefers to avoid writing (or reading) beyond
>> the end of the addressed destination (or source) struct member:
>>
>> In function ‘fortify_memcpy_chk’,
>>      inlined from ‘syscall_get_arguments’ at ./arch/x86/include/asm/syscall.h:85:2,
>>      inlined from ‘populate_seccomp_data’ at kernel/seccomp.c:258:2,
>>      inlined from ‘__seccomp_filter’ at kernel/seccomp.c:1231:3:
>> ./include/linux/fortify-string.h:580:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
>>    580 |                         __read_overflow2_field(q_size_field, size);
>>        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> As already done for x86_64 and compat mode, do not use memcpy() to
>> extract syscall arguments from struct pt_regs but rather just perform
>> direct assignments. Binary output differences are negligible, and actually
>> ends up using less stack space:
>>
>> -       sub    $0x84,%esp
>> +       sub    $0x6c,%esp
>>
>> and less text size:
>>
>>     text    data     bss     dec     hex filename
>>    10794     252       0   11046    2b26 gcc-32b/kernel/seccomp.o.stock
>>    10714     252       0   10966    2ad6 gcc-32b/kernel/seccomp.o.after
>>
>> Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com>
>> Closes: https://lore.kernel.org/lkml/9b69fb14-df89-4677-9c82-056ea9e706f5@gmail.com/
>> Signed-off-by: Kees Cook <kees@kernel.org>
>> ---
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: x86@kernel.org
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Brian Gerst <brgerst@gmail.com>
>> Cc: Josh Poimboeuf <jpoimboe@kernel.org>
>> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>> Cc: Peter Collingbourne <pcc@google.com>
> 
> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> 
> Thanks

I can confirm that the error was fixed after applying the patch, in the same build environment.

Tested-by: Mirsad Todorovac <mtodorovac69@gmail.com>

However, why memcpy() directly from struct pt_regs doesn't work is beyond my understanding :-/

FWIW, bulk memcpy() might be replaced by a single assembler instruction? Or am I thinking still
in 6502 mode? :-)

Best regards,
Mirsad Todorovac

> -- 
> Gustavo
> 
>> ---
>>   arch/x86/include/asm/syscall.h | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
>> index 2fc7bc3863ff..7c488ff0c764 100644
>> --- a/arch/x86/include/asm/syscall.h
>> +++ b/arch/x86/include/asm/syscall.h
>> @@ -82,7 +82,12 @@ static inline void syscall_get_arguments(struct task_struct *task,
>>                        struct pt_regs *regs,
>>                        unsigned long *args)
>>   {
>> -    memcpy(args, &regs->bx, 6 * sizeof(args[0]));
>> +    args[0] = regs->bx;
>> +    args[1] = regs->cx;
>> +    args[2] = regs->dx;
>> +    args[3] = regs->si;
>> +    args[4] = regs->di;
>> +    args[5] = regs->bp;
>>   }
>>     static inline int syscall_get_arch(struct task_struct *task)
Gustavo A. R. Silva July 9, 2024, 6:37 p.m. UTC | #3
On 09/07/24 12:20, Mirsad Todorovac wrote:
> 
> 
> On 7/9/24 01:44, Gustavo A. R. Silva wrote:
>>
>>
>> On 7/8/24 14:22, Kees Cook wrote:
>>> Modern (fortified) memcpy() prefers to avoid writing (or reading) beyond
>>> the end of the addressed destination (or source) struct member:
>>>
>>> In function ‘fortify_memcpy_chk’,
>>>       inlined from ‘syscall_get_arguments’ at ./arch/x86/include/asm/syscall.h:85:2,
>>>       inlined from ‘populate_seccomp_data’ at kernel/seccomp.c:258:2,
>>>       inlined from ‘__seccomp_filter’ at kernel/seccomp.c:1231:3:
>>> ./include/linux/fortify-string.h:580:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
>>>     580 |                         __read_overflow2_field(q_size_field, size);
>>>         |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> As already done for x86_64 and compat mode, do not use memcpy() to
>>> extract syscall arguments from struct pt_regs but rather just perform
>>> direct assignments. Binary output differences are negligible, and actually
>>> ends up using less stack space:
>>>
>>> -       sub    $0x84,%esp
>>> +       sub    $0x6c,%esp
>>>
>>> and less text size:
>>>
>>>      text    data     bss     dec     hex filename
>>>     10794     252       0   11046    2b26 gcc-32b/kernel/seccomp.o.stock
>>>     10714     252       0   10966    2ad6 gcc-32b/kernel/seccomp.o.after
>>>
>>> Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com>
>>> Closes: https://lore.kernel.org/lkml/9b69fb14-df89-4677-9c82-056ea9e706f5@gmail.com/
>>> Signed-off-by: Kees Cook <kees@kernel.org>
>>> ---
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: Borislav Petkov <bp@alien8.de>
>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>> Cc: x86@kernel.org
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>> Cc: Brian Gerst <brgerst@gmail.com>
>>> Cc: Josh Poimboeuf <jpoimboe@kernel.org>
>>> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>>> Cc: Peter Collingbourne <pcc@google.com>
>>
>> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
>>
>> Thanks
> 
> I can confirm that the error was fixed after applying the patch, in the same build environment.
> 
> Tested-by: Mirsad Todorovac <mtodorovac69@gmail.com>
> 
> However, why memcpy() directly from struct pt_regs doesn't work is beyond my understanding :-/
> 

This is because under CONFIG_FORTIFY_SOURCE=y, memcpy() prevents writing or reading beyond
the boundaries of dest/src objects.

--
Gustavo

> FWIW, bulk memcpy() might be replaced by a single assembler instruction? Or am I thinking still
> in 6502 mode? :-)
> 
> Best regards,
> Mirsad Todorovac
> 
>> -- 
>> Gustavo
>>
>>> ---
>>>    arch/x86/include/asm/syscall.h | 7 ++++++-
>>>    1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
>>> index 2fc7bc3863ff..7c488ff0c764 100644
>>> --- a/arch/x86/include/asm/syscall.h
>>> +++ b/arch/x86/include/asm/syscall.h
>>> @@ -82,7 +82,12 @@ static inline void syscall_get_arguments(struct task_struct *task,
>>>                         struct pt_regs *regs,
>>>                         unsigned long *args)
>>>    {
>>> -    memcpy(args, &regs->bx, 6 * sizeof(args[0]));
>>> +    args[0] = regs->bx;
>>> +    args[1] = regs->cx;
>>> +    args[2] = regs->dx;
>>> +    args[3] = regs->si;
>>> +    args[4] = regs->di;
>>> +    args[5] = regs->bp;
>>>    }
>>>      static inline int syscall_get_arch(struct task_struct *task)
Dave Hansen July 11, 2024, 9:01 p.m. UTC | #4
On 7/8/24 13:22, Kees Cook wrote:
...
> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index 2fc7bc3863ff..7c488ff0c764 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -82,7 +82,12 @@ static inline void syscall_get_arguments(struct task_struct *task,
>  					 struct pt_regs *regs,
>  					 unsigned long *args)
>  {
> -	memcpy(args, &regs->bx, 6 * sizeof(args[0]));
> +	args[0] = regs->bx;
> +	args[1] = regs->cx;
> +	args[2] = regs->dx;
> +	args[3] = regs->si;
> +	args[4] = regs->di;
> +	args[5] = regs->bp;
>  }
>  

Yeah, that's much less magic.  I'll stick this in the queue to go in to
the tree in a few weeks.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Peter Zijlstra July 11, 2024, 9:44 p.m. UTC | #5
On Mon, Jul 08, 2024 at 01:22:06PM -0700, Kees Cook wrote:
> Modern (fortified) memcpy() prefers to avoid writing (or reading) beyond
> the end of the addressed destination (or source) struct member:
> 
> In function ‘fortify_memcpy_chk’,
>     inlined from ‘syscall_get_arguments’ at ./arch/x86/include/asm/syscall.h:85:2,
>     inlined from ‘populate_seccomp_data’ at kernel/seccomp.c:258:2,
>     inlined from ‘__seccomp_filter’ at kernel/seccomp.c:1231:3:
> ./include/linux/fortify-string.h:580:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
>   580 |                         __read_overflow2_field(q_size_field, size);
>       |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> As already done for x86_64 and compat mode, do not use memcpy() to
> extract syscall arguments from struct pt_regs but rather just perform
> direct assignments. Binary output differences are negligible, and actually
> ends up using less stack space:
> 
> -       sub    $0x84,%esp
> +       sub    $0x6c,%esp
> 
> and less text size:
> 
>    text    data     bss     dec     hex filename
>   10794     252       0   11046    2b26 gcc-32b/kernel/seccomp.o.stock
>   10714     252       0   10966    2ad6 gcc-32b/kernel/seccomp.o.after
> 
> Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com>
> Closes: https://lore.kernel.org/lkml/9b69fb14-df89-4677-9c82-056ea9e706f5@gmail.com/
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: x86@kernel.org
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Josh Poimboeuf <jpoimboe@kernel.org>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Cc: Peter Collingbourne <pcc@google.com>
> ---
>  arch/x86/include/asm/syscall.h | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index 2fc7bc3863ff..7c488ff0c764 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -82,7 +82,12 @@ static inline void syscall_get_arguments(struct task_struct *task,
>  					 struct pt_regs *regs,
>  					 unsigned long *args)
>  {
> -	memcpy(args, &regs->bx, 6 * sizeof(args[0]));
> +	args[0] = regs->bx;
> +	args[1] = regs->cx;
> +	args[2] = regs->dx;
> +	args[3] = regs->si;
> +	args[4] = regs->di;
> +	args[5] = regs->bp;
>  }

Just for my education on things foritfy; would something like:

void syscall_get_arguments(struct pt_regs *regs, unsigned long args[6])
{
        memcpy(args, (typeof(args))&regs->bx, 6*sizeof(args[0]));
}

work?
Kees Cook July 11, 2024, 11:10 p.m. UTC | #6
On Thu, Jul 11, 2024 at 11:44:50PM +0200, Peter Zijlstra wrote:
> Just for my education on things foritfy; would something like:
> 
> void syscall_get_arguments(struct pt_regs *regs, unsigned long args[6])
> {
>         memcpy(args, (typeof(args))&regs->bx, 6*sizeof(args[0]));
> }

Short answer: no.

The long answer is long, and comes in two halves: the language half and
the fortify half.

First, the C standard requires that all function argument arrays be
decayed to pointers, so your prototype is semantically handled as if
it were:

void syscall_get_arguments(struct pt_regs *regs, unsigned long *args)

The "6" is just totally thrown away by the language. :(

*However* the compilers _will_ do things with it while generating
diagnostics, but only from the caller's perspective (code _inside_
has no awareness of the "6" unless the function has been inlined, sort
of). For example:

	unsigned long toosmall[4];
	...
	syscall_get_arguments(regs, toosmall);

Produces:

<source>:60:5: warning: 'syscall_get_arguments' accessing 48 bytes in a region of size 32 [-Wstringop-overflow=]
   60 |     syscall_get_arguments(regs, toosmall);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:60:5: note: referencing argument 2 of type 'long unsigned int[6]'
<source>:49:6: note: in a call to function 'syscall_get_arguments'
   49 | void syscall_get_arguments(struct pt_regs *regs, unsigned long args[6])
      |      ^~~~~~~~~~~~~~~~~~~~~
In function 'syscall_get_arguments',
    inlined from 'passthru' at <source>:60:5:
<source>:51:10: warning: 'memcpy' forming offset [32, 47] is out of the bounds [0, 32] of object 'toosmall' with type 'long unsigned int[4]' [-Warray-bounds=]
   51 |          memcpy(args, (typeof(args))&regs->bx, 6*sizeof(args[0]));
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>: In function 'passthru':
<source>:58:19: note: 'toosmall' declared here
   58 |     unsigned long toosmall[4];
      |                   ^~~~~~~~

But syscall_get_arguments() internally has no idea what size "args" is:

void noinline syscall_get_arguments(struct pt_regs *regs, unsigned long args[6])
{
	report(sizeof(args));
	report(__builtin_object_size(args, 1));
	report(__builtin_dynamic_object_size(args, 1));
        memcpy(args, (typeof(args))&regs->bx, 6*sizeof(args[0]));
}

Which reports 8, SIZE_MAX (unknown), and SIZE_MAX (unknown) respectively.

And the language is so busted about this that there is actually a
diagnostic for "don't do that" that shows up with this code:

<source>: In function 'syscall_get_arguments':
<source>:53:22: warning: 'sizeof' on array function parameter 'args' will return size of 'long unsigned int *' [-Wsizeof-array-argument]
   53 |         report(sizeof(args));
      |                      ^

_However_, if we _inline_ the function, suddenly _bos and _bdos know
what's going on since they have visibility into the actual objection
definition:

void inline syscall_get_arguments(struct pt_regs *regs, unsigned long args[6])

Now it reports 8, 32 (8 * the "toosmall" array size 4), 32 (same: _bdos
falls back to _bos if there is no dynamic component). Note this is _not_
6, but 4, the actual object size.

Using the newer arg-sized array prototypes using a named argument _does_
inform the internals, but that requires changing the calling convention
for what is supposed to be a fixed size, and only behaves at runtime,
which is just silly for compile-time fixed sizes. For example:

void noinline syscall_get_arguments(struct pt_regs *regs, int n, unsigned long args[n])
...
	syscall_get_arguments(regs, 6, toosmall);

Does report the expected things for _bdos internally (48), but not for
sizeof (8) nor _bos (SIZE_MAX). Of course if we inline it, _bos starts
working and, along with _bdos, realizes it was lied to, and reports
32 again.

The internals of fortify use _bdos, so how _bdos acts is how fortify
will determine object sizes. With _bos/_bdos, there are two cases
fortify examines: "whole object size" (type 0) and "subobject size"
(type 1), where "type" is the second argument to _bos/_bdos:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler_types.h?h=v6.9#n371

The compiler's internal string API diagnostics (i.e. -Wstringop-overflow,
-Wstringop-overread) effectively perform bounds checks with type 0,
which is in line with the traditional way of treating everything as a
raw pointer and expecting to clobber everything following it. This is
terrible for mitigating security flaws, as we can't evaluate the intent
of memcpy destination bounds unambiguously if we don't know what the
destination boundaries are.

So, fortify's memcpy moved from type 0 to type 1 (over several years
now), and when doing that, we excluded doing type 1 checking on
_source_ objects because we already had so much to clean up just for
destinations. Unchecked destination object size overflows is where the
real-world linear overflow security flaws come from most often, so it
was the best use of our efforts.

But to avoid revisiting the same code twice, fortify will examine source
object sizes when it has already found a _destination_ object size
overflow (so that they can be fixed simultaneously), or when W=1 has
been enabled (so we there is always a log of it for the more sensitive
CI systems):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/fortify-string.h?h=v6.9#n554

For the patch in this thread, the W=1 case was reported (overread of
"bx"), because normally fortify would just ignore the overread of
the source.

Finally to answer your question about working around this case, _bos/_bdos
will see right through the casting because it operates on the actual
object behind it. (And casts to an array type are illegal too.)

    unsigned long args[6];

    report(__builtin_object_size((typeof(args))&regs->bx, 1));

<source>: In function 'show':
<source>:76:34: error: cast specifies array type
   76 |     report(__builtin_object_size((typeof(args))&regs->bx, 1));
      |                                  ^

And a (char *) cast doesn't work: _bos(1) reports 8, the size of bx. Using
locals doesn't help either:

    void *ptr = (void *)&regs->bx;

    report(__builtin_object_size(ptr, 1));

Still 8. And ultimately this is good, since fortify will see through to
the actual object that could get overflowed, etc. It's the behavior we
want for the overflow defense.

For cases where we really really absolutely cannot represent things in
a way that fortify can be happy about, there is unsafe_memcpy(). Right
now, only really wild stuff uses it (some network driver protocol
layout shenanigans, bcachefs, etc). Virtually all kernel objects that
are a destination for memcpy() should be able to be represented in a
simple and unambiguous way. (And we've successfully done so, with some
fun tangents along the way, like needing to have compilers implement
-fstrict-flex-arrays=3, but that is a whole other topic.)

-Kees
Peter Zijlstra July 12, 2024, 9 a.m. UTC | #7
On Thu, Jul 11, 2024 at 04:10:43PM -0700, Kees Cook wrote:

> The long answer is long, and comes in two halves: the language half and
> the fortify half.
> 
> First, the C standard requires that all function argument arrays be
> decayed to pointers, so your prototype is semantically handled as if
> it were:
> 
> void syscall_get_arguments(struct pt_regs *regs, unsigned long *args)
> 
> The "6" is just totally thrown away by the language. :(

Bah. I mean, I understand why, they *are* pointers after all. But to
then also discard the object size is just silly. I mean, traditionally C
doesn't give a crap about object size -- it is irrelevant.

But with alias analysis (which we switch off) and doubly so with this
fortify stuff, you do care about it.

So why throw it out?

> *However* the compilers _will_ do things with it while generating
> diagnostics, but only from the caller's perspective (code _inside_
> has no awareness of the "6" unless the function has been inlined, sort
> of). For example:
> 
> 	unsigned long toosmall[4];
> 	...
> 	syscall_get_arguments(regs, toosmall);
> 
> Produces:
> 
> <source>:60:5: warning: 'syscall_get_arguments' accessing 48 bytes in a region of size 32 [-Wstringop-overflow=]
>    60 |     syscall_get_arguments(regs, toosmall);
>       |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Makes sense.

> But syscall_get_arguments() internally has no idea what size "args" is:
> 
> void noinline syscall_get_arguments(struct pt_regs *regs, unsigned long args[6])
> {
> 	report(sizeof(args));
> 	report(__builtin_object_size(args, 1));
> 	report(__builtin_dynamic_object_size(args, 1));
>         memcpy(args, (typeof(args))&regs->bx, 6*sizeof(args[0]));
> }
> 
> Which reports 8, SIZE_MAX (unknown), and SIZE_MAX (unknown) respectively.

And that's just daft :/

> And the language is so busted about this that there is actually a
> diagnostic for "don't do that" that shows up with this code:
> 
> <source>: In function 'syscall_get_arguments':
> <source>:53:22: warning: 'sizeof' on array function parameter 'args' will return size of 'long unsigned int *' [-Wsizeof-array-argument]
>    53 |         report(sizeof(args));
>       |                      ^

:-( This just doesn't make sense. How can you be so inconsistent about
things.

What will actually break if you 'fix' this? Given that inlining (see
below) changes the rules willy nilly, I feel we can (and should!) just
fix this.

> _However_, if we _inline_ the function, suddenly _bos and _bdos know
> what's going on since they have visibility into the actual objection
> definition:
> 
> void inline syscall_get_arguments(struct pt_regs *regs, unsigned long args[6])

*sigh* yes ofcourse, inlineing in C is a mess, it really does change the
meaning of what you've written.

Combine this with the fact that 'inline' doesn't actually mean anything
much these days because the compiler will do whatever it feels like
anyway, and you've got a language that's like *really* hard to use.

Hence our noinline and __always_inline attributes to force it where we
need to.

> Now it reports 8, 32 (8 * the "toosmall" array size 4), 32 (same: _bdos
> falls back to _bos if there is no dynamic component). Note this is _not_
> 6, but 4, the actual object size.
> 
> Using the newer arg-sized array prototypes using a named argument _does_
> inform the internals, but that requires changing the calling convention
> for what is supposed to be a fixed size, and only behaves at runtime,
> which is just silly for compile-time fixed sizes. For example:
> 
> void noinline syscall_get_arguments(struct pt_regs *regs, int n, unsigned long args[n])
> ...
> 	syscall_get_arguments(regs, 6, toosmall);
> 
> Does report the expected things for _bdos internally (48), but not for
> sizeof (8) nor _bos (SIZE_MAX). Of course if we inline it, _bos starts
> working and, along with _bdos, realizes it was lied to, and reports
> 32 again.

WTF ?!?! How can all this be so inconsistent and why are people okay
with that?

> For the patch in this thread, the W=1 case was reported (overread of
> "bx"), because normally fortify would just ignore the overread of
> the source.

So I'm not entirely sure I agree with that argument. Yes, &regs->bx is
'unsigned long *' and sizeof(unsigned long) is 8 (if we assume 64bit).
However, you can also read it as the point of pt_regs where bx sits --
which is a far more sensible interpretation IMO.

Because then we're looking at struct pt_regs and an offset therein.

> Finally to answer your question about working around this case, _bos/_bdos
> will see right through the casting because it operates on the actual
> object behind it. (And casts to an array type are illegal too.)
> 
>     unsigned long args[6];
> 
>     report(__builtin_object_size((typeof(args))&regs->bx, 1));
> 
> <source>: In function 'show':
> <source>:76:34: error: cast specifies array type
>    76 |     report(__builtin_object_size((typeof(args))&regs->bx, 1));
>       |                                  ^
> 
> And a (char *) cast doesn't work: _bos(1) reports 8, the size of bx. Using
> locals doesn't help either:
> 
>     void *ptr = (void *)&regs->bx;
> 
>     report(__builtin_object_size(ptr, 1));
> 
> Still 8. And ultimately this is good, since fortify will see through to
> the actual object that could get overflowed, etc. It's the behavior we
> want for the overflow defense.

So really pt_regs *is* an array of unsigned long, and I feel it is
really unfortunate we cannot express this in a way that is more concise.
Kees Cook July 12, 2024, 5:55 p.m. UTC | #8
On Fri, Jul 12, 2024 at 11:00:08AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 11, 2024 at 04:10:43PM -0700, Kees Cook wrote:
> 
> > The long answer is long, and comes in two halves: the language half and
> > the fortify half.
> > 
> > First, the C standard requires that all function argument arrays be
> > decayed to pointers, so your prototype is semantically handled as if
> > it were:
> > 
> > void syscall_get_arguments(struct pt_regs *regs, unsigned long *args)
> > 
> > The "6" is just totally thrown away by the language. :(
> 
> Bah. I mean, I understand why, they *are* pointers after all. But to
> then also discard the object size is just silly. I mean, traditionally C
> doesn't give a crap about object size -- it is irrelevant.
> 
> But with alias analysis (which we switch off) and doubly so with this
> fortify stuff, you do care about it.
> 
> So why throw it out?

Yes, it boggles the mind. And for a language that doesn't care about
size, it sure cares about size in weird places.

> > And the language is so busted about this that there is actually a
> > diagnostic for "don't do that" that shows up with this code:
> > 
> > <source>: In function 'syscall_get_arguments':
> > <source>:53:22: warning: 'sizeof' on array function parameter 'args' will return size of 'long unsigned int *' [-Wsizeof-array-argument]
> >    53 |         report(sizeof(args));
> >       |                      ^
> 
> :-( This just doesn't make sense. How can you be so inconsistent about
> things.
> 
> What will actually break if you 'fix' this? Given that inlining (see
> below) changes the rules willy nilly, I feel we can (and should!) just
> fix this.

I'm not sure -- I have kind of given up on "standard" C helping with any
of this. I look to consistent language extensions now, and where there
isn't any, we've been trying to create them. :P And we're not alone:
Apple's -fbounds-safety stuff[1] looks good too, and overlaps with what
we were already designing with the "counted_by" attribute:
https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/
(We borrowed the "counted_by" name, which is better than what we were
calling it: "element_count".)

> > Does report the expected things for _bdos internally (48), but not for
> > sizeof (8) nor _bos (SIZE_MAX). Of course if we inline it, _bos starts
> > working and, along with _bdos, realizes it was lied to, and reports
> > 32 again.
> 
> WTF ?!?! How can all this be so inconsistent and why are people okay
> with that?

This. A thousands times, this. I'm really not okay with it, and we've
been working to get rid of every ambiguity we trip over. It's made sane
bounds checking in Linux extremely hard to get right.

For more fun with array bounds, the one that absolutely floored me was
the exception over trailing arrays:

struct middle_t {
	u8 array[6];
	int foo;
} *middle;

__builtin_object_size(middle->array, 1)  ==  6

struct trailing_t {
	int foo;
	u8 array[6];
} *trailing;

__builtin_object_size(trailing->array, 1)  ==  SIZE_MAX ("unknown")

Because of all the fake flexible array abuses over decades, _bos/_bdos
were forced to _ignore_ the size of an array if it was the last member
of a struct. But after C99 flexible arrays were introduced (24 years
ago), nobody cleaned this up! We had to go get the compilers to create
-fstrict-flex-arrays=3 so that all those weird behaviors would go away:
https://git.kernel.org/linus/df8fc4e934c12b906d08050d7779f292b9c5c6b5

> > For the patch in this thread, the W=1 case was reported (overread of
> > "bx"), because normally fortify would just ignore the overread of
> > the source.
> 
> So I'm not entirely sure I agree with that argument. Yes, &regs->bx is
> 'unsigned long *' and sizeof(unsigned long) is 8 (if we assume 64bit).
> However, you can also read it as the point of pt_regs where bx sits --
> which is a far more sensible interpretation IMO.
> 
> Because then we're looking at struct pt_regs and an offset therein.

Right -- the way to make this unambiguous has been to make sure there
is an addressable object which contains the elements in question. For
the least disruption, the best we were able to do is introduce the
struct_group() helper. It's internally ugly, but it works.

> So really pt_regs *is* an array of unsigned long, and I feel it is
> really unfortunate we cannot express this in a way that is more concise.

A way to do this would be:

struct pt_regs {
	union {
		struct {
			unsigned long bx;
			unsigned long cx;
			unsigned long dx;
			unsigned long si;
			unsigned long di;
			unsigned long bp;
		};
		unsigned long syscall_regs[6];
	};
	unsigned long ax;
	...
};

Now regs->syscall_regs is addressable, sized, etc. "bx" means just bx,
and "syscall_regs" means all 6.

I wrote up a bunch of notes about much of this horror last year here:
https://people.kernel.org/kees/bounded-flexible-arrays-in-c

-Kees
Peter Zijlstra July 15, 2024, 8:37 a.m. UTC | #9
On Fri, Jul 12, 2024 at 10:55:16AM -0700, Kees Cook wrote:

> > What will actually break if you 'fix' this? Given that inlining (see
> > below) changes the rules willy nilly, I feel we can (and should!) just
> > fix this.
> 
> I'm not sure -- I have kind of given up on "standard" C helping with any
> of this. I look to consistent language extensions now, and where there
> isn't any, we've been trying to create them. :P 

Yeah, arguing a committee is mostly a waste of time, also, they
typically listen a lot more when you say, here these two compilers have
implemented it and this Linux thing uses it.

So yeah, language extensions are it.

> And we're not alone:
> Apple's -fbounds-safety stuff[1] looks good too, and overlaps with what
> we were already designing with the "counted_by" attribute:
> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/
> (We borrowed the "counted_by" name, which is better than what we were
> calling it: "element_count".)

Yep, I read that a while back. I think you referenced it in one of them
other threads where we disagreed over struct_size() :-)

> > > Does report the expected things for _bdos internally (48), but not for
> > > sizeof (8) nor _bos (SIZE_MAX). Of course if we inline it, _bos starts
> > > working and, along with _bdos, realizes it was lied to, and reports
> > > 32 again.
> > 
> > WTF ?!?! How can all this be so inconsistent and why are people okay
> > with that?
> 
> This. A thousands times, this. I'm really not okay with it, and we've
> been working to get rid of every ambiguity we trip over. It's made sane
> bounds checking in Linux extremely hard to get right.

Yeah, not just Linux I imagine. The rules are so insane it's near
useless. I'd say press onwards with the language extension, it's not
like Linux kernel is written in ANSI/ISO C anyway :-)

> For more fun with array bounds, the one that absolutely floored me was
> the exception over trailing arrays:
> 
> struct middle_t {
> 	u8 array[6];
> 	int foo;
> } *middle;
> 
> __builtin_object_size(middle->array, 1)  ==  6
> 
> struct trailing_t {
> 	int foo;
> 	u8 array[6];
> } *trailing;
> 
> __builtin_object_size(trailing->array, 1)  ==  SIZE_MAX ("unknown")

WTF :-)

> > So I'm not entirely sure I agree with that argument. Yes, &regs->bx is
> > 'unsigned long *' and sizeof(unsigned long) is 8 (if we assume 64bit).
> > However, you can also read it as the point of pt_regs where bx sits --
> > which is a far more sensible interpretation IMO.
> > 
> > Because then we're looking at struct pt_regs and an offset therein.
> 
> Right -- the way to make this unambiguous has been to make sure there
> is an addressable object which contains the elements in question. For
> the least disruption, the best we were able to do is introduce the
> struct_group() helper. It's internally ugly, but it works.

That macro is fairly trivial, nowhere near as ugly as struct_size() :-)
But urgh... can't we do something like:

void *memcpy_off(void *dst, const void *src, size_t off, size_t n)
{
	memcpu(dst, src+off, n);
	return dst;
}

And then you can write:

  memcpy_off(args, regs, offsetof(*regs, bx), 6);

I mean, that sucks, but possilby less than struct_group() does.

[ also, we should probably do:
  #defime offsetof(t, m) __builtin_offsetof(typeof(t), m) ]

> > So really pt_regs *is* an array of unsigned long, and I feel it is
> > really unfortunate we cannot express this in a way that is more concise.
> 
> A way to do this would be:
> 
> struct pt_regs {
> 	union {
> 		struct {
> 			unsigned long bx;
> 			unsigned long cx;
> 			unsigned long dx;
> 			unsigned long si;
> 			unsigned long di;
> 			unsigned long bp;
> 		};
> 		unsigned long syscall_regs[6];
> 	};
> 	unsigned long ax;
> 	...
> };
> 
> Now regs->syscall_regs is addressable, sized, etc. "bx" means just bx,
> and "syscall_regs" means all 6.

In this case I would just make all of pt_regs a union with one giant
array (much like some archs already have IIRC).

> I wrote up a bunch of notes about much of this horror last year here:
> https://people.kernel.org/kees/bounded-flexible-arrays-in-c

Oh, yeah, I think I saw that fly by on hackernews a while ago.
Kees Cook July 15, 2024, 5:01 p.m. UTC | #10
On Mon, Jul 15, 2024 at 10:37:13AM +0200, Peter Zijlstra wrote:
> Yeah, arguing a committee is mostly a waste of time, also, they
> typically listen a lot more when you say, here these two compilers have
> implemented it and this Linux thing uses it.

Precisely.

> So yeah, language extensions are it.

The one I may try to point out to the committee is flexible arrays in
unions. "array[1]" and "array[0]" are allowed in unions, but "array[]"
wasn't. This totally wrecks attempts to modernize a codebase that
depends on such union uses. We worked around it but finally got the
language extension, er, extended, recently:

https://github.com/llvm/llvm-project/commit/14ba782a87e16e9e15460a51f50e67e2744c26d9
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=adb1c8a0f167c3a1f7593d75f5a10eb07a5d741a

> Yeah, not just Linux I imagine. The rules are so insane it's near
> useless. I'd say press onwards with the language extension, it's not
> like Linux kernel is written in ANSI/ISO C anyway :-)

Yup. Between the above flex arrays in unions fix and -fstrict-flex-arrays=3,
a C codebase can actually get unambiguous array bounds handling. And now
with the "counted_by" attribute, we can cover _dynamic_ arrays too.

> > struct_group() helper. It's internally ugly, but it works.
> 
> That macro is fairly trivial, nowhere near as ugly as struct_size() :-)
> But urgh... can't we do something like:
> 
> void *memcpy_off(void *dst, const void *src, size_t off, size_t n)
> {
> 	memcpu(dst, src+off, n);
> 	return dst;
> }
> 
> And then you can write:
> 
>   memcpy_off(args, regs, offsetof(*regs, bx), 6);
> 
> I mean, that sucks, but possilby less than struct_group() does.
> 
> [ also, we should probably do:
>   #defime offsetof(t, m) __builtin_offsetof(typeof(t), m) ]

Yeah, that would be possible, but I wanted something that the compiler
could reason about for a given identifier since it's not just fortify
that cares about object bounds. Being able to declare the layouts so
that the bounds sanitizer instrumentation wouldn't get confused was
important too. That is more related to arrays than integral members, but
separating those quickly became confusing to declare easily/correctly. So
struct_group() ended up being the best direction in the general case.

> In this case I would just make all of pt_regs a union with one giant
> array (much like some archs already have IIRC).

Yup, that works too. (Though pt_regs is relatively unique in this "the
whole thing is expected to be an array" characteristic.)

-Kees
Kees Cook Aug. 23, 2024, 12:12 a.m. UTC | #11
On Thu, Jul 11, 2024 at 02:01:53PM -0700, Dave Hansen wrote:
> On 7/8/24 13:22, Kees Cook wrote:
> ...
> > diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> > index 2fc7bc3863ff..7c488ff0c764 100644
> > --- a/arch/x86/include/asm/syscall.h
> > +++ b/arch/x86/include/asm/syscall.h
> > @@ -82,7 +82,12 @@ static inline void syscall_get_arguments(struct task_struct *task,
> >  					 struct pt_regs *regs,
> >  					 unsigned long *args)
> >  {
> > -	memcpy(args, &regs->bx, 6 * sizeof(args[0]));
> > +	args[0] = regs->bx;
> > +	args[1] = regs->cx;
> > +	args[2] = regs->dx;
> > +	args[3] = regs->si;
> > +	args[4] = regs->di;
> > +	args[5] = regs->bp;
> >  }
> >  
> 
> Yeah, that's much less magic.  I'll stick this in the queue to go in to
> the tree in a few weeks.
> 
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

Ping. I can take it via the hardening tree if you want, though?
diff mbox series

Patch

diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 2fc7bc3863ff..7c488ff0c764 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -82,7 +82,12 @@  static inline void syscall_get_arguments(struct task_struct *task,
 					 struct pt_regs *regs,
 					 unsigned long *args)
 {
-	memcpy(args, &regs->bx, 6 * sizeof(args[0]));
+	args[0] = regs->bx;
+	args[1] = regs->cx;
+	args[2] = regs->dx;
+	args[3] = regs->si;
+	args[4] = regs->di;
+	args[5] = regs->bp;
 }
 
 static inline int syscall_get_arch(struct task_struct *task)