diff mbox series

[v10,3/6] arm64: add uaccess to machine check safe

Message ID 20240129134652.4004931-4-tongtiangen@huawei.com (mailing list archive)
State New
Headers show
Series arm64: add machine check safe support | expand

Commit Message

Tong Tiangen Jan. 29, 2024, 1:46 p.m. UTC
If user process access memory fails due to hardware memory error, only the
relevant processes are affected, so it is more reasonable to kill the user
process and isolate the corrupt page than to panic the kernel.

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/arm64/lib/copy_from_user.S | 10 +++++-----
 arch/arm64/lib/copy_to_user.S   | 10 +++++-----
 arch/arm64/mm/extable.c         |  8 ++++----
 3 files changed, 14 insertions(+), 14 deletions(-)

Comments

Mark Rutland Jan. 29, 2024, 5:43 p.m. UTC | #1
On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote:
> If user process access memory fails due to hardware memory error, only the
> relevant processes are affected, so it is more reasonable to kill the user
> process and isolate the corrupt page than to panic the kernel.
> 
> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
> ---
>  arch/arm64/lib/copy_from_user.S | 10 +++++-----
>  arch/arm64/lib/copy_to_user.S   | 10 +++++-----
>  arch/arm64/mm/extable.c         |  8 ++++----
>  3 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
> index 34e317907524..1bf676e9201d 100644
> --- a/arch/arm64/lib/copy_from_user.S
> +++ b/arch/arm64/lib/copy_from_user.S
> @@ -25,7 +25,7 @@
>  	.endm
>  
>  	.macro strb1 reg, ptr, val
> -	strb \reg, [\ptr], \val
> +	USER(9998f, strb \reg, [\ptr], \val)
>  	.endm

This is a store to *kernel* memory, not user memory. It should not be marked
with USER().

I understand that you *might* want to handle memory errors on these stores, but
the commit message doesn't describe that and the associated trade-off. For
example, consider that when a copy_form_user fails we'll try to zero the
remaining buffer via memset(); so if a STR* instruction in copy_to_user
faulted, upon handling the fault we'll immediately try to fix that up with some
more stores which will also fault, but won't get fixed up, leading to a panic()
anyway...

Further, this change will also silently fixup unexpected kernel faults if we
pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs.

So NAK to this change as-is; likewise for the addition of USER() to other ldr*
macros in copy_from_user.S and the addition of USER() str* macros in
copy_to_user.S.

If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_*
separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory
errors, but treat other faults as fatal". That should come with a rationale and
explanation of why it's actually useful.

[...]

> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index 478e639f8680..28ec35e3d210 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs)
>  	if (!ex)
>  		return false;
>  
> -	/*
> -	 * This is not complete, More Machine check safe extable type can
> -	 * be processed here.
> -	 */
> +	switch (ex->type) {
> +	case EX_TYPE_UACCESS_ERR_ZERO:
> +		return ex_handler_uaccess_err_zero(ex, regs);
> +	}

Please fold this part into the prior patch, and start ogf with *only* handling
errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that
change would be relatively uncontroversial, and it would be much easier to
build atop that.

Thanks,
Mark.
Tong Tiangen Jan. 30, 2024, 11:14 a.m. UTC | #2
在 2024/1/30 1:43, Mark Rutland 写道:
> On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote:
>> If user process access memory fails due to hardware memory error, only the
>> relevant processes are affected, so it is more reasonable to kill the user
>> process and isolate the corrupt page than to panic the kernel.
>>
>> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
>> ---
>>   arch/arm64/lib/copy_from_user.S | 10 +++++-----
>>   arch/arm64/lib/copy_to_user.S   | 10 +++++-----
>>   arch/arm64/mm/extable.c         |  8 ++++----
>>   3 files changed, 14 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
>> index 34e317907524..1bf676e9201d 100644
>> --- a/arch/arm64/lib/copy_from_user.S
>> +++ b/arch/arm64/lib/copy_from_user.S
>> @@ -25,7 +25,7 @@
>>   	.endm
>>   
>>   	.macro strb1 reg, ptr, val
>> -	strb \reg, [\ptr], \val
>> +	USER(9998f, strb \reg, [\ptr], \val)
>>   	.endm
> 
> This is a store to *kernel* memory, not user memory. It should not be marked
> with USER().

This does cause some misconceptions, and my original idea was to reuse 
the fixup capability of USER().

> 
> I understand that you *might* want to handle memory errors on these stores, but
> the commit message doesn't describe that and the associated trade-off. For
> example, consider that when a copy_form_user fails we'll try to zero the
> remaining buffer via memset(); so if a STR* instruction in copy_to_user
> faulted, upon handling the fault we'll immediately try to fix that up with some
> more stores which will also fault, but won't get fixed up, leading to a panic()
> anyway...

When copy_from_user() triggers a memory error, there are two cases: ld
user memory error and st kernel memory error. The former can clear the
remaining kernel memory, and the latter cannot be cleared because the
page is poison.

The purpose of memset() is to keep the data consistency of the kernel
memory (or multiple subsequent pages) (the data that is not copied
should be set to 0). My consideration here is that since our ultimate
goal is to kill the owner thread of the kernel memory data, the
"consistency" of the kernel memory data is not so important, but
increases the processing complexity.

The trade-offs do need to be added to commit message after agreement
is reached :)
> 
> Further, this change will also silently fixup unexpected kernel faults if we
> pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs.

I think this is better than the panic kernel, because the real bugs
belongs to the user process. Even if the wrong pointer is
transferred, the page corresponding to the wrong pointer has a memroy
error. In addition, the panic information contains necessary information
for users to check.

> 
> So NAK to this change as-is; likewise for the addition of USER() to other ldr*
> macros in copy_from_user.S and the addition of USER() str* macros in
> copy_to_user.S.
> 
> If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_*
> separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory
> errors, but treat other faults as fatal". That should come with a rationale and
> explanation of why it's actually useful.

This makes sense. Add kaccess types that can be processed properly.

> 
> [...]
> 
>> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
>> index 478e639f8680..28ec35e3d210 100644
>> --- a/arch/arm64/mm/extable.c
>> +++ b/arch/arm64/mm/extable.c
>> @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs)
>>   	if (!ex)
>>   		return false;
>>   
>> -	/*
>> -	 * This is not complete, More Machine check safe extable type can
>> -	 * be processed here.
>> -	 */
>> +	switch (ex->type) {
>> +	case EX_TYPE_UACCESS_ERR_ZERO:
>> +		return ex_handler_uaccess_err_zero(ex, regs);
>> +	}
> 
> Please fold this part into the prior patch, and start ogf with *only* handling
> errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that
> change would be relatively uncontroversial, and it would be much easier to
> build atop that.

OK, the two patches will be merged in the next release.

Many thanks.
Tong.

> 
> Thanks,
> Mark.
> .
Mark Rutland Jan. 30, 2024, 12:01 p.m. UTC | #3
On Tue, Jan 30, 2024 at 07:14:35PM +0800, Tong Tiangen wrote:
> 在 2024/1/30 1:43, Mark Rutland 写道:
> > On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote:
> > Further, this change will also silently fixup unexpected kernel faults if we
> > pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs.
> 
> I think this is better than the panic kernel, because the real bugs
> belongs to the user process. Even if the wrong pointer is
> transferred, the page corresponding to the wrong pointer has a memroy
> error.

I think you have misunderstood my point; I'm talking about the case of a bad
kernel pointer *without* a memory error.

For example, consider some buggy code such as:

	void __user *uptr = some_valid_user_pointer;
	void *kptr = NULL; // or any other bad pointer

	ret = copy_to_user(uptr, kptr, size);
	if (ret)
		return -EFAULT;

Before this patch, when copy_to_user() attempted to load from NULL it would
fault, there would be no fixup handler for the LDR, and the kernel would die(),
reporting the bad kernel access.

After this patch (which adds fixup handlers to all the LDR*s in
copy_to_user()), the fault (which is *not* a memory error) would be handled by
the fixup handler, and copy_to_user() would return an error without *any*
indication of the horrible kernel bug.

This will hide kernel bugs, which will make those harder to identify and fix,
and will also potentially make it easier to exploit the kernel: if the user
somehow gains control of the kernel pointer, they can rely on the fixup handler
returning an error, and can scan through memory rather than dying as soon as
they pas a bad pointer.

> In addition, the panic information contains necessary information
> for users to check.

There is no panic() in the case I am describing.

> > So NAK to this change as-is; likewise for the addition of USER() to other ldr*
> > macros in copy_from_user.S and the addition of USER() str* macros in
> > copy_to_user.S.
> > 
> > If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_*
> > separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory
> > errors, but treat other faults as fatal". That should come with a rationale and
> > explanation of why it's actually useful.
> 
> This makes sense. Add kaccess types that can be processed properly.
> 
> > 
> > [...]
> > 
> > > diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> > > index 478e639f8680..28ec35e3d210 100644
> > > --- a/arch/arm64/mm/extable.c
> > > +++ b/arch/arm64/mm/extable.c
> > > @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs)
> > >   	if (!ex)
> > >   		return false;
> > > -	/*
> > > -	 * This is not complete, More Machine check safe extable type can
> > > -	 * be processed here.
> > > -	 */
> > > +	switch (ex->type) {
> > > +	case EX_TYPE_UACCESS_ERR_ZERO:
> > > +		return ex_handler_uaccess_err_zero(ex, regs);
> > > +	}
> > 
> > Please fold this part into the prior patch, and start ogf with *only* handling
> > errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that
> > change would be relatively uncontroversial, and it would be much easier to
> > build atop that.
> 
> OK, the two patches will be merged in the next release.

Thanks.

Mark.
Tong Tiangen Jan. 30, 2024, 1:41 p.m. UTC | #4
在 2024/1/30 20:01, Mark Rutland 写道:
> On Tue, Jan 30, 2024 at 07:14:35PM +0800, Tong Tiangen wrote:
>> 在 2024/1/30 1:43, Mark Rutland 写道:
>>> On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote:
>>> Further, this change will also silently fixup unexpected kernel faults if we
>>> pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs.
>>
>> I think this is better than the panic kernel, because the real bugs
>> belongs to the user process. Even if the wrong pointer is
>> transferred, the page corresponding to the wrong pointer has a memroy
>> error.
> 
> I think you have misunderstood my point; I'm talking about the case of a bad
> kernel pointer *without* a memory error.
> 
> For example, consider some buggy code such as:
> 
> 	void __user *uptr = some_valid_user_pointer;
> 	void *kptr = NULL; // or any other bad pointer
> 
> 	ret = copy_to_user(uptr, kptr, size);
> 	if (ret)
> 		return -EFAULT;
> 
> Before this patch, when copy_to_user() attempted to load from NULL it would
> fault, there would be no fixup handler for the LDR, and the kernel would die(),
> reporting the bad kernel access.
> 
> After this patch (which adds fixup handlers to all the LDR*s in
> copy_to_user()), the fault (which is *not* a memory error) would be handled by
> the fixup handler, and copy_to_user() would return an error without *any*
> indication of the horrible kernel bug.
> 
> This will hide kernel bugs, which will make those harder to identify and fix,
> and will also potentially make it easier to exploit the kernel: if the user
> somehow gains control of the kernel pointer, they can rely on the fixup handler
> returning an error, and can scan through memory rather than dying as soon as
> they pas a bad pointer.

I should understand what you mean. I'll think about this and reply.

Many thanks.
Tong.

> 
>> In addition, the panic information contains necessary information
>> for users to check.
> 
> There is no panic() in the case I am describing.
> 
>>> So NAK to this change as-is; likewise for the addition of USER() to other ldr*
>>> macros in copy_from_user.S and the addition of USER() str* macros in
>>> copy_to_user.S.
>>>
>>> If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_*
>>> separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory
>>> errors, but treat other faults as fatal". That should come with a rationale and
>>> explanation of why it's actually useful.
>>
>> This makes sense. Add kaccess types that can be processed properly.
>>
>>>
>>> [...]
>>>
>>>> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
>>>> index 478e639f8680..28ec35e3d210 100644
>>>> --- a/arch/arm64/mm/extable.c
>>>> +++ b/arch/arm64/mm/extable.c
>>>> @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs)
>>>>    	if (!ex)
>>>>    		return false;
>>>> -	/*
>>>> -	 * This is not complete, More Machine check safe extable type can
>>>> -	 * be processed here.
>>>> -	 */
>>>> +	switch (ex->type) {
>>>> +	case EX_TYPE_UACCESS_ERR_ZERO:
>>>> +		return ex_handler_uaccess_err_zero(ex, regs);
>>>> +	}
>>>
>>> Please fold this part into the prior patch, and start ogf with *only* handling
>>> errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that
>>> change would be relatively uncontroversial, and it would be much easier to
>>> build atop that.
>>
>> OK, the two patches will be merged in the next release.
> 
> Thanks.
> 
> Mark.
> .
diff mbox series

Patch

diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 34e317907524..1bf676e9201d 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -25,7 +25,7 @@ 
 	.endm
 
 	.macro strb1 reg, ptr, val
-	strb \reg, [\ptr], \val
+	USER(9998f, strb \reg, [\ptr], \val)
 	.endm
 
 	.macro ldrh1 reg, ptr, val
@@ -33,7 +33,7 @@ 
 	.endm
 
 	.macro strh1 reg, ptr, val
-	strh \reg, [\ptr], \val
+	USER(9998f, strh \reg, [\ptr], \val)
 	.endm
 
 	.macro ldr1 reg, ptr, val
@@ -41,7 +41,7 @@ 
 	.endm
 
 	.macro str1 reg, ptr, val
-	str \reg, [\ptr], \val
+	USER(9998f, str \reg, [\ptr], \val)
 	.endm
 
 	.macro ldp1 reg1, reg2, ptr, val
@@ -49,7 +49,7 @@ 
 	.endm
 
 	.macro stp1 reg1, reg2, ptr, val
-	stp \reg1, \reg2, [\ptr], \val
+	USER(9998f, stp \reg1, \reg2, [\ptr], \val)
 	.endm
 
 end	.req	x5
@@ -66,7 +66,7 @@  SYM_FUNC_START(__arch_copy_from_user)
 	b.ne	9998f
 	// Before being absolutely sure we couldn't copy anything, try harder
 USER(9998f, ldtrb tmp1w, [srcin])
-	strb	tmp1w, [dst], #1
+USER(9998f, strb	tmp1w, [dst], #1)
 9998:	sub	x0, end, dst			// bytes not copied
 	ret
 SYM_FUNC_END(__arch_copy_from_user)
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 802231772608..cc031bd87455 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -20,7 +20,7 @@ 
  *	x0 - bytes not copied
  */
 	.macro ldrb1 reg, ptr, val
-	ldrb  \reg, [\ptr], \val
+	USER(9998f, ldrb  \reg, [\ptr], \val)
 	.endm
 
 	.macro strb1 reg, ptr, val
@@ -28,7 +28,7 @@ 
 	.endm
 
 	.macro ldrh1 reg, ptr, val
-	ldrh  \reg, [\ptr], \val
+	USER(9998f, ldrh  \reg, [\ptr], \val)
 	.endm
 
 	.macro strh1 reg, ptr, val
@@ -36,7 +36,7 @@ 
 	.endm
 
 	.macro ldr1 reg, ptr, val
-	ldr \reg, [\ptr], \val
+	USER(9998f, ldr \reg, [\ptr], \val)
 	.endm
 
 	.macro str1 reg, ptr, val
@@ -44,7 +44,7 @@ 
 	.endm
 
 	.macro ldp1 reg1, reg2, ptr, val
-	ldp \reg1, \reg2, [\ptr], \val
+	USER(9998f, ldp \reg1, \reg2, [\ptr], \val)
 	.endm
 
 	.macro stp1 reg1, reg2, ptr, val
@@ -64,7 +64,7 @@  SYM_FUNC_START(__arch_copy_to_user)
 9997:	cmp	dst, dstin
 	b.ne	9998f
 	// Before being absolutely sure we couldn't copy anything, try harder
-	ldrb	tmp1w, [srcin]
+USER(9998f, ldrb	tmp1w, [srcin])
 USER(9998f, sttrb tmp1w, [dst])
 	add	dst, dst, #1
 9998:	sub	x0, end, dst			// bytes not copied
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 478e639f8680..28ec35e3d210 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -85,10 +85,10 @@  bool fixup_exception_mc(struct pt_regs *regs)
 	if (!ex)
 		return false;
 
-	/*
-	 * This is not complete, More Machine check safe extable type can
-	 * be processed here.
-	 */
+	switch (ex->type) {
+	case EX_TYPE_UACCESS_ERR_ZERO:
+		return ex_handler_uaccess_err_zero(ex, regs);
+	}
 
 	return false;
 }