Message ID | 20240129134652.4004931-4-tongtiangen@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: add machine check safe support | expand |
On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote: > If user process access memory fails due to hardware memory error, only the > relevant processes are affected, so it is more reasonable to kill the user > process and isolate the corrupt page than to panic the kernel. > > Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> > --- > arch/arm64/lib/copy_from_user.S | 10 +++++----- > arch/arm64/lib/copy_to_user.S | 10 +++++----- > arch/arm64/mm/extable.c | 8 ++++---- > 3 files changed, 14 insertions(+), 14 deletions(-) > > diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S > index 34e317907524..1bf676e9201d 100644 > --- a/arch/arm64/lib/copy_from_user.S > +++ b/arch/arm64/lib/copy_from_user.S > @@ -25,7 +25,7 @@ > .endm > > .macro strb1 reg, ptr, val > - strb \reg, [\ptr], \val > + USER(9998f, strb \reg, [\ptr], \val) > .endm This is a store to *kernel* memory, not user memory. It should not be marked with USER(). I understand that you *might* want to handle memory errors on these stores, but the commit message doesn't describe that and the associated trade-off. For example, consider that when a copy_form_user fails we'll try to zero the remaining buffer via memset(); so if a STR* instruction in copy_to_user faulted, upon handling the fault we'll immediately try to fix that up with some more stores which will also fault, but won't get fixed up, leading to a panic() anyway... Further, this change will also silently fixup unexpected kernel faults if we pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs. So NAK to this change as-is; likewise for the addition of USER() to other ldr* macros in copy_from_user.S and the addition of USER() str* macros in copy_to_user.S. If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_* separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory errors, but treat other faults as fatal". That should come with a rationale and explanation of why it's actually useful. [...] > diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c > index 478e639f8680..28ec35e3d210 100644 > --- a/arch/arm64/mm/extable.c > +++ b/arch/arm64/mm/extable.c > @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs) > if (!ex) > return false; > > - /* > - * This is not complete, More Machine check safe extable type can > - * be processed here. > - */ > + switch (ex->type) { > + case EX_TYPE_UACCESS_ERR_ZERO: > + return ex_handler_uaccess_err_zero(ex, regs); > + } Please fold this part into the prior patch, and start ogf with *only* handling errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that change would be relatively uncontroversial, and it would be much easier to build atop that. Thanks, Mark.
在 2024/1/30 1:43, Mark Rutland 写道: > On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote: >> If user process access memory fails due to hardware memory error, only the >> relevant processes are affected, so it is more reasonable to kill the user >> process and isolate the corrupt page than to panic the kernel. >> >> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> >> --- >> arch/arm64/lib/copy_from_user.S | 10 +++++----- >> arch/arm64/lib/copy_to_user.S | 10 +++++----- >> arch/arm64/mm/extable.c | 8 ++++---- >> 3 files changed, 14 insertions(+), 14 deletions(-) >> >> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S >> index 34e317907524..1bf676e9201d 100644 >> --- a/arch/arm64/lib/copy_from_user.S >> +++ b/arch/arm64/lib/copy_from_user.S >> @@ -25,7 +25,7 @@ >> .endm >> >> .macro strb1 reg, ptr, val >> - strb \reg, [\ptr], \val >> + USER(9998f, strb \reg, [\ptr], \val) >> .endm > > This is a store to *kernel* memory, not user memory. It should not be marked > with USER(). This does cause some misconceptions, and my original idea was to reuse the fixup capability of USER(). > > I understand that you *might* want to handle memory errors on these stores, but > the commit message doesn't describe that and the associated trade-off. For > example, consider that when a copy_form_user fails we'll try to zero the > remaining buffer via memset(); so if a STR* instruction in copy_to_user > faulted, upon handling the fault we'll immediately try to fix that up with some > more stores which will also fault, but won't get fixed up, leading to a panic() > anyway... When copy_from_user() triggers a memory error, there are two cases: ld user memory error and st kernel memory error. The former can clear the remaining kernel memory, and the latter cannot be cleared because the page is poison. The purpose of memset() is to keep the data consistency of the kernel memory (or multiple subsequent pages) (the data that is not copied should be set to 0). My consideration here is that since our ultimate goal is to kill the owner thread of the kernel memory data, the "consistency" of the kernel memory data is not so important, but increases the processing complexity. The trade-offs do need to be added to commit message after agreement is reached :) > > Further, this change will also silently fixup unexpected kernel faults if we > pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs. I think this is better than the panic kernel, because the real bugs belongs to the user process. Even if the wrong pointer is transferred, the page corresponding to the wrong pointer has a memroy error. In addition, the panic information contains necessary information for users to check. > > So NAK to this change as-is; likewise for the addition of USER() to other ldr* > macros in copy_from_user.S and the addition of USER() str* macros in > copy_to_user.S. > > If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_* > separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory > errors, but treat other faults as fatal". That should come with a rationale and > explanation of why it's actually useful. This makes sense. Add kaccess types that can be processed properly. > > [...] > >> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c >> index 478e639f8680..28ec35e3d210 100644 >> --- a/arch/arm64/mm/extable.c >> +++ b/arch/arm64/mm/extable.c >> @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs) >> if (!ex) >> return false; >> >> - /* >> - * This is not complete, More Machine check safe extable type can >> - * be processed here. >> - */ >> + switch (ex->type) { >> + case EX_TYPE_UACCESS_ERR_ZERO: >> + return ex_handler_uaccess_err_zero(ex, regs); >> + } > > Please fold this part into the prior patch, and start ogf with *only* handling > errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that > change would be relatively uncontroversial, and it would be much easier to > build atop that. OK, the two patches will be merged in the next release. Many thanks. Tong. > > Thanks, > Mark. > .
On Tue, Jan 30, 2024 at 07:14:35PM +0800, Tong Tiangen wrote: > 在 2024/1/30 1:43, Mark Rutland 写道: > > On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote: > > Further, this change will also silently fixup unexpected kernel faults if we > > pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs. > > I think this is better than the panic kernel, because the real bugs > belongs to the user process. Even if the wrong pointer is > transferred, the page corresponding to the wrong pointer has a memroy > error. I think you have misunderstood my point; I'm talking about the case of a bad kernel pointer *without* a memory error. For example, consider some buggy code such as: void __user *uptr = some_valid_user_pointer; void *kptr = NULL; // or any other bad pointer ret = copy_to_user(uptr, kptr, size); if (ret) return -EFAULT; Before this patch, when copy_to_user() attempted to load from NULL it would fault, there would be no fixup handler for the LDR, and the kernel would die(), reporting the bad kernel access. After this patch (which adds fixup handlers to all the LDR*s in copy_to_user()), the fault (which is *not* a memory error) would be handled by the fixup handler, and copy_to_user() would return an error without *any* indication of the horrible kernel bug. This will hide kernel bugs, which will make those harder to identify and fix, and will also potentially make it easier to exploit the kernel: if the user somehow gains control of the kernel pointer, they can rely on the fixup handler returning an error, and can scan through memory rather than dying as soon as they pas a bad pointer. > In addition, the panic information contains necessary information > for users to check. There is no panic() in the case I am describing. > > So NAK to this change as-is; likewise for the addition of USER() to other ldr* > > macros in copy_from_user.S and the addition of USER() str* macros in > > copy_to_user.S. > > > > If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_* > > separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory > > errors, but treat other faults as fatal". That should come with a rationale and > > explanation of why it's actually useful. > > This makes sense. Add kaccess types that can be processed properly. > > > > > [...] > > > > > diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c > > > index 478e639f8680..28ec35e3d210 100644 > > > --- a/arch/arm64/mm/extable.c > > > +++ b/arch/arm64/mm/extable.c > > > @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs) > > > if (!ex) > > > return false; > > > - /* > > > - * This is not complete, More Machine check safe extable type can > > > - * be processed here. > > > - */ > > > + switch (ex->type) { > > > + case EX_TYPE_UACCESS_ERR_ZERO: > > > + return ex_handler_uaccess_err_zero(ex, regs); > > > + } > > > > Please fold this part into the prior patch, and start ogf with *only* handling > > errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that > > change would be relatively uncontroversial, and it would be much easier to > > build atop that. > > OK, the two patches will be merged in the next release. Thanks. Mark.
在 2024/1/30 20:01, Mark Rutland 写道: > On Tue, Jan 30, 2024 at 07:14:35PM +0800, Tong Tiangen wrote: >> 在 2024/1/30 1:43, Mark Rutland 写道: >>> On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote: >>> Further, this change will also silently fixup unexpected kernel faults if we >>> pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs. >> >> I think this is better than the panic kernel, because the real bugs >> belongs to the user process. Even if the wrong pointer is >> transferred, the page corresponding to the wrong pointer has a memroy >> error. > > I think you have misunderstood my point; I'm talking about the case of a bad > kernel pointer *without* a memory error. > > For example, consider some buggy code such as: > > void __user *uptr = some_valid_user_pointer; > void *kptr = NULL; // or any other bad pointer > > ret = copy_to_user(uptr, kptr, size); > if (ret) > return -EFAULT; > > Before this patch, when copy_to_user() attempted to load from NULL it would > fault, there would be no fixup handler for the LDR, and the kernel would die(), > reporting the bad kernel access. > > After this patch (which adds fixup handlers to all the LDR*s in > copy_to_user()), the fault (which is *not* a memory error) would be handled by > the fixup handler, and copy_to_user() would return an error without *any* > indication of the horrible kernel bug. > > This will hide kernel bugs, which will make those harder to identify and fix, > and will also potentially make it easier to exploit the kernel: if the user > somehow gains control of the kernel pointer, they can rely on the fixup handler > returning an error, and can scan through memory rather than dying as soon as > they pas a bad pointer. I should understand what you mean. I'll think about this and reply. Many thanks. Tong. > >> In addition, the panic information contains necessary information >> for users to check. > > There is no panic() in the case I am describing. > >>> So NAK to this change as-is; likewise for the addition of USER() to other ldr* >>> macros in copy_from_user.S and the addition of USER() str* macros in >>> copy_to_user.S. >>> >>> If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_* >>> separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory >>> errors, but treat other faults as fatal". That should come with a rationale and >>> explanation of why it's actually useful. >> >> This makes sense. Add kaccess types that can be processed properly. >> >>> >>> [...] >>> >>>> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c >>>> index 478e639f8680..28ec35e3d210 100644 >>>> --- a/arch/arm64/mm/extable.c >>>> +++ b/arch/arm64/mm/extable.c >>>> @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs) >>>> if (!ex) >>>> return false; >>>> - /* >>>> - * This is not complete, More Machine check safe extable type can >>>> - * be processed here. >>>> - */ >>>> + switch (ex->type) { >>>> + case EX_TYPE_UACCESS_ERR_ZERO: >>>> + return ex_handler_uaccess_err_zero(ex, regs); >>>> + } >>> >>> Please fold this part into the prior patch, and start ogf with *only* handling >>> errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that >>> change would be relatively uncontroversial, and it would be much easier to >>> build atop that. >> >> OK, the two patches will be merged in the next release. > > Thanks. > > Mark. > .
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S index 34e317907524..1bf676e9201d 100644 --- a/arch/arm64/lib/copy_from_user.S +++ b/arch/arm64/lib/copy_from_user.S @@ -25,7 +25,7 @@ .endm .macro strb1 reg, ptr, val - strb \reg, [\ptr], \val + USER(9998f, strb \reg, [\ptr], \val) .endm .macro ldrh1 reg, ptr, val @@ -33,7 +33,7 @@ .endm .macro strh1 reg, ptr, val - strh \reg, [\ptr], \val + USER(9998f, strh \reg, [\ptr], \val) .endm .macro ldr1 reg, ptr, val @@ -41,7 +41,7 @@ .endm .macro str1 reg, ptr, val - str \reg, [\ptr], \val + USER(9998f, str \reg, [\ptr], \val) .endm .macro ldp1 reg1, reg2, ptr, val @@ -49,7 +49,7 @@ .endm .macro stp1 reg1, reg2, ptr, val - stp \reg1, \reg2, [\ptr], \val + USER(9998f, stp \reg1, \reg2, [\ptr], \val) .endm end .req x5 @@ -66,7 +66,7 @@ SYM_FUNC_START(__arch_copy_from_user) b.ne 9998f // Before being absolutely sure we couldn't copy anything, try harder USER(9998f, ldtrb tmp1w, [srcin]) - strb tmp1w, [dst], #1 +USER(9998f, strb tmp1w, [dst], #1) 9998: sub x0, end, dst // bytes not copied ret SYM_FUNC_END(__arch_copy_from_user) diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S index 802231772608..cc031bd87455 100644 --- a/arch/arm64/lib/copy_to_user.S +++ b/arch/arm64/lib/copy_to_user.S @@ -20,7 +20,7 @@ * x0 - bytes not copied */ .macro ldrb1 reg, ptr, val - ldrb \reg, [\ptr], \val + USER(9998f, ldrb \reg, [\ptr], \val) .endm .macro strb1 reg, ptr, val @@ -28,7 +28,7 @@ .endm .macro ldrh1 reg, ptr, val - ldrh \reg, [\ptr], \val + USER(9998f, ldrh \reg, [\ptr], \val) .endm .macro strh1 reg, ptr, val @@ -36,7 +36,7 @@ .endm .macro ldr1 reg, ptr, val - ldr \reg, [\ptr], \val + USER(9998f, ldr \reg, [\ptr], \val) .endm .macro str1 reg, ptr, val @@ -44,7 +44,7 @@ .endm .macro ldp1 reg1, reg2, ptr, val - ldp \reg1, \reg2, [\ptr], \val + USER(9998f, ldp \reg1, \reg2, [\ptr], \val) .endm .macro stp1 reg1, reg2, ptr, val @@ -64,7 +64,7 @@ SYM_FUNC_START(__arch_copy_to_user) 9997: cmp dst, dstin b.ne 9998f // Before being absolutely sure we couldn't copy anything, try harder - ldrb tmp1w, [srcin] +USER(9998f, ldrb tmp1w, [srcin]) USER(9998f, sttrb tmp1w, [dst]) add dst, dst, #1 9998: sub x0, end, dst // bytes not copied diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c index 478e639f8680..28ec35e3d210 100644 --- a/arch/arm64/mm/extable.c +++ b/arch/arm64/mm/extable.c @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs) if (!ex) return false; - /* - * This is not complete, More Machine check safe extable type can - * be processed here. - */ + switch (ex->type) { + case EX_TYPE_UACCESS_ERR_ZERO: + return ex_handler_uaccess_err_zero(ex, regs); + } return false; }
If user process access memory fails due to hardware memory error, only the relevant processes are affected, so it is more reasonable to kill the user process and isolate the corrupt page than to panic the kernel. Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- arch/arm64/lib/copy_from_user.S | 10 +++++----- arch/arm64/lib/copy_to_user.S | 10 +++++----- arch/arm64/mm/extable.c | 8 ++++---- 3 files changed, 14 insertions(+), 14 deletions(-)