Message ID | e559e60c43f679195bfe4c7b0a301431c6f02c7a.1607157766.git.christophe.leroy@csgroup.eu (mailing list archive) |
---|---|
State | New, archived |
Series | powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed() |
On Sat, Dec 05, 2020 at 08:43:06AM +0000, Christophe Leroy wrote:
> Since commit c33165253492 ("powerpc: use non-set_fs based maccess
> routines"), userspace access is not granted anymore when using
> copy_from_kernel_nofault().
>
> However, kthread_probe_data() uses copy_from_kernel_nofault()
> to check validity of pointers. When the pointer is NULL,
> it points to userspace, leading to a KUAP fault and triggering
> the following big hammer warning many times when you request
> a sysrq "show task":

> To avoid that, copy_from_kernel_nofault_allowed() is used to check
> whether the address is a valid kernel address. But the default
> version of it returns true for any address.
>
> Provide a powerpc version of copy_from_kernel_nofault_allowed()
> that returns false when the address is below TASK_SIZE_MAX,
> so that copy_from_kernel_nofault() will return -ERANGE.

Looks good. I wonder if we should just default to the TASK_SIZE_MAX
check in copy_from_kernel_nofault_allowed for architectures that select
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE?

>
> Reported-by: Qian Cai <qcai@redhat.com>
> Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> This issue was introduced in 5.10. I didn't mark it for stable, hoping it will go into 5.10-rc7
> ---
>  arch/powerpc/mm/Makefile  | 2 +-
>  arch/powerpc/mm/maccess.c | 9 +++++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/mm/maccess.c
>
> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> index 5e147986400d..55b4a8bd408a 100644
> --- a/arch/powerpc/mm/Makefile
> +++ b/arch/powerpc/mm/Makefile
> @@ -5,7 +5,7 @@
>
>  ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
>
> -obj-y := fault.o mem.o pgtable.o mmap.o \
> +obj-y := fault.o mem.o pgtable.o mmap.o maccess.o \
>           init_$(BITS).o pgtable_$(BITS).o \
>           pgtable-frag.o ioremap.o ioremap_$(BITS).o \
>           init-common.o mmu_context.o drmem.o
> diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
> new file mode 100644
> index 000000000000..56e97c0fb233
> --- /dev/null
> +++ b/arch/powerpc/mm/maccess.c
> @@ -0,0 +1,9 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/uaccess.h>
> +#include <linux/kernel.h>
> +
> +bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
> +{
> +	return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
> +}
> --
> 2.25.0
---end quoted text---
On 05/12/2020 at 09:48, Christoph Hellwig wrote:
> On Sat, Dec 05, 2020 at 08:43:06AM +0000, Christophe Leroy wrote:
>> Since commit c33165253492 ("powerpc: use non-set_fs based maccess
>> routines"), userspace access is not granted anymore when using
>> copy_from_kernel_nofault().
>>
>> However, kthread_probe_data() uses copy_from_kernel_nofault()
>> to check validity of pointers. When the pointer is NULL,
>> it points to userspace, leading to a KUAP fault and triggering
>> the following big hammer warning many times when you request
>> a sysrq "show task":
>
>> To avoid that, copy_from_kernel_nofault_allowed() is used to check
>> whether the address is a valid kernel address. But the default
>> version of it returns true for any address.
>>
>> Provide a powerpc version of copy_from_kernel_nofault_allowed()
>> that returns false when the address is below TASK_SIZE_MAX,
>> so that copy_from_kernel_nofault() will return -ERANGE.
>
> Looks good. I wonder if we should just default to the TASK_SIZE_MAX
> check in copy_from_kernel_nofault_allowed for architectures that select
> CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE?

Yes, maybe that would be better.

Can you cook up a patch and get it into 5.10?

Christophe

>
>>
>> Reported-by: Qian Cai <qcai@redhat.com>
>> Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>> This issue was introduced in 5.10. I didn't mark it for stable, hoping it will go into 5.10-rc7
>> ---
>>  arch/powerpc/mm/Makefile  | 2 +-
>>  arch/powerpc/mm/maccess.c | 9 +++++++++
>>  2 files changed, 10 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/powerpc/mm/maccess.c
>>
>> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
>> index 5e147986400d..55b4a8bd408a 100644
>> --- a/arch/powerpc/mm/Makefile
>> +++ b/arch/powerpc/mm/Makefile
>> @@ -5,7 +5,7 @@
>>
>>  ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
>>
>> -obj-y := fault.o mem.o pgtable.o mmap.o \
>> +obj-y := fault.o mem.o pgtable.o mmap.o maccess.o \
>>           init_$(BITS).o pgtable_$(BITS).o \
>>           pgtable-frag.o ioremap.o ioremap_$(BITS).o \
>>           init-common.o mmu_context.o drmem.o
>> diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
>> new file mode 100644
>> index 000000000000..56e97c0fb233
>> --- /dev/null
>> +++ b/arch/powerpc/mm/maccess.c
>> @@ -0,0 +1,9 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +
>> +#include <linux/uaccess.h>
>> +#include <linux/kernel.h>
>> +
>> +bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>> +{
>> +	return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
>> +}
>> --
>> 2.25.0
> ---end quoted text---
>
On 05/12/2020 at 10:56, Christophe Leroy wrote:
>
>
> On 05/12/2020 at 09:48, Christoph Hellwig wrote:
>> On Sat, Dec 05, 2020 at 08:43:06AM +0000, Christophe Leroy wrote:
>>> Since commit c33165253492 ("powerpc: use non-set_fs based maccess
>>> routines"), userspace access is not granted anymore when using
>>> copy_from_kernel_nofault().
>>>
>>> However, kthread_probe_data() uses copy_from_kernel_nofault()
>>> to check validity of pointers. When the pointer is NULL,
>>> it points to userspace, leading to a KUAP fault and triggering
>>> the following big hammer warning many times when you request
>>> a sysrq "show task":
>>
>>> To avoid that, copy_from_kernel_nofault_allowed() is used to check
>>> whether the address is a valid kernel address. But the default
>>> version of it returns true for any address.
>>>
>>> Provide a powerpc version of copy_from_kernel_nofault_allowed()
>>> that returns false when the address is below TASK_SIZE_MAX,
>>> so that copy_from_kernel_nofault() will return -ERANGE.
>>
>> Looks good. I wonder if we should just default to the TASK_SIZE_MAX
>> check in copy_from_kernel_nofault_allowed for architectures that select
>> CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE?
>
> Yes, maybe that would be better.
>
> Can you cook up a patch and get it into 5.10?
>

In fact it doesn't seem so easy, because only s390, powerpc and x86 define
TASK_SIZE_MAX, while CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE is
selected by arm, arm64, powerpc and x86.

So maybe for 5.10 we take the powerpc fix?

Christophe
Yes, I think at this point in the release cycle the specific powerpc fix is safer anyway. But this screams for an eventual general solution.
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Since commit c33165253492 ("powerpc: use non-set_fs based maccess
> routines"), userspace access is not granted anymore when using
> copy_from_kernel_nofault().
>
> However, kthread_probe_data() uses copy_from_kernel_nofault()
> to check validity of pointers. When the pointer is NULL,
> it points to userspace, leading to a KUAP fault and triggering
> the following big hammer warning many times when you request
> a sysrq "show task":
>
> [ 1117.202054] ------------[ cut here ]------------
> [ 1117.202102] Bug: fault blocked by AP register !
> [ 1117.202261] WARNING: CPU: 0 PID: 377 at arch/powerpc/include/asm/nohash/32/kup-8xx.h:66 do_page_fault+0x4a8/0x5ec
> [ 1117.202310] Modules linked in:
> [ 1117.202428] CPU: 0 PID: 377 Comm: sh Tainted: G W 5.10.0-rc5-01340-g83f53be2de31-dirty #4175
> [ 1117.202499] NIP: c0012048 LR: c0012048 CTR: 00000000
> [ 1117.202573] REGS: cacdbb88 TRAP: 0700 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
> [ 1117.202625] MSR: 00021032 <ME,IR,DR,RI> CR: 24082222 XER: 20000000
> [ 1117.202899]
> [ 1117.202899] GPR00: c0012048 cacdbc40 c2929290 00000023 c092e554 00000001 c09865e8 c092e640
> [ 1117.202899] GPR08: 00001032 00000000 00000000 00014efc 28082224 100d166a 100a0920 00000000
> [ 1117.202899] GPR16: 100cac0c 100b0000 1080c3fc 1080d685 100d0000 100d0000 00000000 100a0900
> [ 1117.202899] GPR24: 100d0000 c07892ec 00000000 c0921510 c21f4440 0000005c c0000000 cacdbc80
> [ 1117.204362] NIP [c0012048] do_page_fault+0x4a8/0x5ec
> [ 1117.204461] LR [c0012048] do_page_fault+0x4a8/0x5ec
> [ 1117.204509] Call Trace:
> [ 1117.204609] [cacdbc40] [c0012048] do_page_fault+0x4a8/0x5ec (unreliable)
> [ 1117.204771] [cacdbc70] [c00112f0] handle_page_fault+0x8/0x34
> [ 1117.204911] --- interrupt: 301 at copy_from_kernel_nofault+0x70/0x1c0
> [ 1117.204979] NIP: c010dbec LR: c010dbac CTR: 00000001
> [ 1117.205053] REGS: cacdbc80 TRAP: 0301 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
> [ 1117.205104] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28082224 XER: 00000000
> [ 1117.205416] DAR: 0000005c DSISR: c0000000
> [ 1117.205416] GPR00: c0045948 cacdbd38 c2929290 00000001 00000017 00000017 00000027 0000000f
> [ 1117.205416] GPR08: c09926ec 00000000 00000000 3ffff000 24082224
> [ 1117.206106] NIP [c010dbec] copy_from_kernel_nofault+0x70/0x1c0
> [ 1117.206202] LR [c010dbac] copy_from_kernel_nofault+0x30/0x1c0
> [ 1117.206258] --- interrupt: 301
> [ 1117.206372] [cacdbd38] [c004bbb0] kthread_probe_data+0x44/0x70 (unreliable)
> [ 1117.206561] [cacdbd58] [c0045948] print_worker_info+0xe0/0x194
> [ 1117.206717] [cacdbdb8] [c00548ac] sched_show_task+0x134/0x168
> [ 1117.206851] [cacdbdd8] [c005a268] show_state_filter+0x70/0x100
> [ 1117.206989] [cacdbe08] [c039baa0] sysrq_handle_showstate+0x14/0x24
> [ 1117.207122] [cacdbe18] [c039bf18] __handle_sysrq+0xac/0x1d0
> [ 1117.207257] [cacdbe48] [c039c0c0] write_sysrq_trigger+0x4c/0x74
> [ 1117.207407] [cacdbe68] [c01fba48] proc_reg_write+0xb4/0x114
> [ 1117.207550] [cacdbe88] [c0179968] vfs_write+0x12c/0x478
> [ 1117.207686] [cacdbf08] [c0179e60] ksys_write+0x78/0x128
> [ 1117.207826] [cacdbf38] [c00110d0] ret_from_syscall+0x0/0x34
> [ 1117.207938] --- interrupt: c01 at 0xfd4e784
> [ 1117.208008] NIP: 0fd4e784 LR: 0fe0f244 CTR: 10048d38
> [ 1117.208083] REGS: cacdbf48 TRAP: 0c01 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
> [ 1117.208134] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 44002222 XER: 00000000
> [ 1117.208470]
> [ 1117.208470] GPR00: 00000004 7fc34090 77bfb4e0 00000001 1080fa40 00000002 7400000f fefefeff
> [ 1117.208470] GPR08: 7f7f7f7f 10048d38 1080c414 7fc343c0 00000000
> [ 1117.209104] NIP [0fd4e784] 0xfd4e784
> [ 1117.209180] LR [0fe0f244] 0xfe0f244
> [ 1117.209236] --- interrupt: c01
> [ 1117.209274] Instruction dump:
> [ 1117.209353] 714a4000 418200f0 73ca0001 40820084 73ca0032 408200f8 73c90040 4082ff60
> [ 1117.209727] 0fe00000 3c60c082 386399f4 48013b65 <0fe00000> 80010034 3860000b 7c0803a6
> [ 1117.210102] ---[ end trace 1927c0323393af3e ]---
>
> To avoid that, copy_from_kernel_nofault_allowed() is used to check
> whether the address is a valid kernel address. But the default
> version of it returns true for any address.
>
> Provide a powerpc version of copy_from_kernel_nofault_allowed()
> that returns false when the address is below TASK_SIZE_MAX,
> so that copy_from_kernel_nofault() will return -ERANGE.
>
> Reported-by: Qian Cai <qcai@redhat.com>
> Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> This issue was introduced in 5.10. I didn't mark it for stable, hoping it will go into 5.10-rc7
> ---
>  arch/powerpc/mm/Makefile  | 2 +-
>  arch/powerpc/mm/maccess.c | 9 +++++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/mm/maccess.c
>
> diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
> new file mode 100644
> index 000000000000..56e97c0fb233
> --- /dev/null
> +++ b/arch/powerpc/mm/maccess.c
> @@ -0,0 +1,9 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/uaccess.h>
> +#include <linux/kernel.h>
> +
> +bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
> +{
> +	return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
> +}

Is there a reason we're using TASK_SIZE_MAX?

It's copy from *kernel* (nofault) allowed, so shouldn't we be checking
that the address plausibly points at kernel memory? Not at no-man's land
above TASK_SIZE_MAX but below the start of kernel memory?

We have is_kernel_addr() which already encapsulates some platform quirks
around that logic, it seems like it would be a better fit?

ie:

bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
	return is_kernel_addr((unsigned long)unsafe_src);
}

cheers
On 07/12/2020 at 01:24, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Since commit c33165253492 ("powerpc: use non-set_fs based maccess
>> routines"), userspace access is not granted anymore when using
>> copy_from_kernel_nofault().
>>
>> However, kthread_probe_data() uses copy_from_kernel_nofault()
>> to check validity of pointers. When the pointer is NULL,
>> it points to userspace, leading to a KUAP fault and triggering
>> the following big hammer warning many times when you request
>> a sysrq "show task":
>>
>> [...]
>>
>> To avoid that, copy_from_kernel_nofault_allowed() is used to check
>> whether the address is a valid kernel address. But the default
>> version of it returns true for any address.
>>
>> Provide a powerpc version of copy_from_kernel_nofault_allowed()
>> that returns false when the address is below TASK_SIZE_MAX,
>> so that copy_from_kernel_nofault() will return -ERANGE.
>>
>> Reported-by: Qian Cai <qcai@redhat.com>
>> Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>> This issue was introduced in 5.10. I didn't mark it for stable, hoping it will go into 5.10-rc7
>> ---
>>  arch/powerpc/mm/Makefile  | 2 +-
>>  arch/powerpc/mm/maccess.c | 9 +++++++++
>>  2 files changed, 10 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/powerpc/mm/maccess.c
>>
>> diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
>> new file mode 100644
>> index 000000000000..56e97c0fb233
>> --- /dev/null
>> +++ b/arch/powerpc/mm/maccess.c
>> @@ -0,0 +1,9 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +
>> +#include <linux/uaccess.h>
>> +#include <linux/kernel.h>
>> +
>> +bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>> +{
>> +	return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
>> +}
>
> Is there a reason we're using TASK_SIZE_MAX?

No special reason, that's just copied from x86.

>
> It's copy from *kernel* (nofault) allowed, so shouldn't we be checking
> that the address plausibly points at kernel memory? Not at no-man's land
> above TASK_SIZE_MAX but below the start of kernel memory?

Yes, on PPC64 that's right. On PPC32 the kernel memory starts where the userland stops.

>
> We have is_kernel_addr() which already encapsulates some platform quirks
> around that logic, it seems like it would be a better fit?

Yes, probably. I'll send a v2. For PPC32 that's a comparison with TASK_SIZE, though.

>
> ie:
>
> bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
> {
> 	return is_kernel_addr((unsigned long)unsafe_src);
> }
>
> cheers
>

Christophe
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> On 07/12/2020 at 01:24, Michael Ellerman wrote:
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> Since commit c33165253492 ("powerpc: use non-set_fs based maccess
>>> routines"), userspace access is not granted anymore when using
>>> copy_from_kernel_nofault().
>>>
>>> However, kthread_probe_data() uses copy_from_kernel_nofault()
>>> to check validity of pointers. When the pointer is NULL,
>>> it points to userspace, leading to a KUAP fault and triggering
>>> the following big hammer warning many times when you request
>>> a sysrq "show task":
>>>
>>> [...]
>>>
>>> To avoid that, copy_from_kernel_nofault_allowed() is used to check
>>> whether the address is a valid kernel address. But the default
>>> version of it returns true for any address.
>>>
>>> Provide a powerpc version of copy_from_kernel_nofault_allowed()
>>> that returns false when the address is below TASK_SIZE_MAX,
>>> so that copy_from_kernel_nofault() will return -ERANGE.
>>>
>>> Reported-by: Qian Cai <qcai@redhat.com>
>>> Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>> ---
>>> This issue was introduced in 5.10. I didn't mark it for stable, hoping it will go into 5.10-rc7
>>> ---
>>>  arch/powerpc/mm/Makefile  | 2 +-
>>>  arch/powerpc/mm/maccess.c | 9 +++++++++
>>>  2 files changed, 10 insertions(+), 1 deletion(-)
>>>  create mode 100644 arch/powerpc/mm/maccess.c
>>>
>>> diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
>>> new file mode 100644
>>> index 000000000000..56e97c0fb233
>>> --- /dev/null
>>> +++ b/arch/powerpc/mm/maccess.c
>>> @@ -0,0 +1,9 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +#include <linux/uaccess.h>
>>> +#include <linux/kernel.h>
>>> +
>>> +bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>>> +{
>>> +	return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
>>> +}
>>
>> Is there a reason we're using TASK_SIZE_MAX?
>
> No special reason, that's just copied from x86.
>
>> It's copy from *kernel* (nofault) allowed, so shouldn't we be checking
>> that the address plausibly points at kernel memory? Not at no-man's land
>> above TASK_SIZE_MAX but below the start of kernel memory?
>
> Yes, on PPC64 that's right. On PPC32 the kernel memory starts where the userland stops.

Yep sorry I was talking about 64-bit there.

>> We have is_kernel_addr() which already encapsulates some platform quirks
>> around that logic, it seems like it would be a better fit?
>
> Yes, probably. I'll send a v2. For PPC32 that's a comparison with TASK_SIZE, though.

Yeah, so it's the same test for PPC32 but I think is_kernel_addr() is
better on 64-bit.

I'll grab your v2.

cheers
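The difference between the two checks discussed above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the two boundary constants are hypothetical values chosen for illustration (a 64-bit layout with a gap between the top of user space and the start of kernel memory), and the `model_` functions are not kernel APIs. It shows that an address inside the gap passes a TASK_SIZE_MAX-style comparison but is rejected by an is_kernel_addr()-style comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit layout, for illustration only (not taken from the thread). */
#define MODEL64_TASK_SIZE_MAX 0x0010000000000000ULL /* assumed top of user space */
#define MODEL64_PAGE_OFFSET   0xc000000000000000ULL /* assumed start of kernel memory */

/* Check in the style of the v1 patch: anything at or above the user limit is allowed. */
static bool model_allowed_task_size_max(uint64_t addr)
{
	return addr >= MODEL64_TASK_SIZE_MAX;
}

/* Check in the style of is_kernel_addr(): only addresses in kernel memory are allowed. */
static bool model_allowed_kernel_addr(uint64_t addr)
{
	return addr >= MODEL64_PAGE_OFFSET;
}
```

With these assumed constants, 0x4000000000000000 lies in the gap: the first check accepts it and the second rejects it. On a layout where kernel memory starts exactly where user space stops, as described for PPC32, the two checks would agree.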
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 5e147986400d..55b4a8bd408a 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -5,7 +5,7 @@

 ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)

-obj-y := fault.o mem.o pgtable.o mmap.o \
+obj-y := fault.o mem.o pgtable.o mmap.o maccess.o \
          init_$(BITS).o pgtable_$(BITS).o \
          pgtable-frag.o ioremap.o ioremap_$(BITS).o \
          init-common.o mmu_context.o drmem.o
diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
new file mode 100644
index 000000000000..56e97c0fb233
--- /dev/null
+++ b/arch/powerpc/mm/maccess.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+
+bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
+{
+	return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
+}
Since commit c33165253492 ("powerpc: use non-set_fs based maccess
routines"), userspace access is not granted anymore when using
copy_from_kernel_nofault().

However, kthread_probe_data() uses copy_from_kernel_nofault()
to check validity of pointers. When the pointer is NULL,
it points to userspace, leading to a KUAP fault and triggering
the following big hammer warning many times when you request
a sysrq "show task":

[ 1117.202054] ------------[ cut here ]------------
[ 1117.202102] Bug: fault blocked by AP register !
[ 1117.202261] WARNING: CPU: 0 PID: 377 at arch/powerpc/include/asm/nohash/32/kup-8xx.h:66 do_page_fault+0x4a8/0x5ec
[ 1117.202310] Modules linked in:
[ 1117.202428] CPU: 0 PID: 377 Comm: sh Tainted: G W 5.10.0-rc5-01340-g83f53be2de31-dirty #4175
[ 1117.202499] NIP: c0012048 LR: c0012048 CTR: 00000000
[ 1117.202573] REGS: cacdbb88 TRAP: 0700 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.202625] MSR: 00021032 <ME,IR,DR,RI> CR: 24082222 XER: 20000000
[ 1117.202899]
[ 1117.202899] GPR00: c0012048 cacdbc40 c2929290 00000023 c092e554 00000001 c09865e8 c092e640
[ 1117.202899] GPR08: 00001032 00000000 00000000 00014efc 28082224 100d166a 100a0920 00000000
[ 1117.202899] GPR16: 100cac0c 100b0000 1080c3fc 1080d685 100d0000 100d0000 00000000 100a0900
[ 1117.202899] GPR24: 100d0000 c07892ec 00000000 c0921510 c21f4440 0000005c c0000000 cacdbc80
[ 1117.204362] NIP [c0012048] do_page_fault+0x4a8/0x5ec
[ 1117.204461] LR [c0012048] do_page_fault+0x4a8/0x5ec
[ 1117.204509] Call Trace:
[ 1117.204609] [cacdbc40] [c0012048] do_page_fault+0x4a8/0x5ec (unreliable)
[ 1117.204771] [cacdbc70] [c00112f0] handle_page_fault+0x8/0x34
[ 1117.204911] --- interrupt: 301 at copy_from_kernel_nofault+0x70/0x1c0
[ 1117.204979] NIP: c010dbec LR: c010dbac CTR: 00000001
[ 1117.205053] REGS: cacdbc80 TRAP: 0301 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.205104] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28082224 XER: 00000000
[ 1117.205416] DAR: 0000005c DSISR: c0000000
[ 1117.205416] GPR00: c0045948 cacdbd38 c2929290 00000001 00000017 00000017 00000027 0000000f
[ 1117.205416] GPR08: c09926ec 00000000 00000000 3ffff000 24082224
[ 1117.206106] NIP [c010dbec] copy_from_kernel_nofault+0x70/0x1c0
[ 1117.206202] LR [c010dbac] copy_from_kernel_nofault+0x30/0x1c0
[ 1117.206258] --- interrupt: 301
[ 1117.206372] [cacdbd38] [c004bbb0] kthread_probe_data+0x44/0x70 (unreliable)
[ 1117.206561] [cacdbd58] [c0045948] print_worker_info+0xe0/0x194
[ 1117.206717] [cacdbdb8] [c00548ac] sched_show_task+0x134/0x168
[ 1117.206851] [cacdbdd8] [c005a268] show_state_filter+0x70/0x100
[ 1117.206989] [cacdbe08] [c039baa0] sysrq_handle_showstate+0x14/0x24
[ 1117.207122] [cacdbe18] [c039bf18] __handle_sysrq+0xac/0x1d0
[ 1117.207257] [cacdbe48] [c039c0c0] write_sysrq_trigger+0x4c/0x74
[ 1117.207407] [cacdbe68] [c01fba48] proc_reg_write+0xb4/0x114
[ 1117.207550] [cacdbe88] [c0179968] vfs_write+0x12c/0x478
[ 1117.207686] [cacdbf08] [c0179e60] ksys_write+0x78/0x128
[ 1117.207826] [cacdbf38] [c00110d0] ret_from_syscall+0x0/0x34
[ 1117.207938] --- interrupt: c01 at 0xfd4e784
[ 1117.208008] NIP: 0fd4e784 LR: 0fe0f244 CTR: 10048d38
[ 1117.208083] REGS: cacdbf48 TRAP: 0c01 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.208134] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 44002222 XER: 00000000
[ 1117.208470]
[ 1117.208470] GPR00: 00000004 7fc34090 77bfb4e0 00000001 1080fa40 00000002 7400000f fefefeff
[ 1117.208470] GPR08: 7f7f7f7f 10048d38 1080c414 7fc343c0 00000000
[ 1117.209104] NIP [0fd4e784] 0xfd4e784
[ 1117.209180] LR [0fe0f244] 0xfe0f244
[ 1117.209236] --- interrupt: c01
[ 1117.209274] Instruction dump:
[ 1117.209353] 714a4000 418200f0 73ca0001 40820084 73ca0032 408200f8 73c90040 4082ff60
[ 1117.209727] 0fe00000 3c60c082 386399f4 48013b65 <0fe00000> 80010034 3860000b 7c0803a6
[ 1117.210102] ---[ end trace 1927c0323393af3e ]---

To avoid that, copy_from_kernel_nofault_allowed() is used to check
whether the address is a valid kernel address. But the default
version of it returns true for any address.

Provide a powerpc version of copy_from_kernel_nofault_allowed()
that returns false when the address is below TASK_SIZE_MAX,
so that copy_from_kernel_nofault() will return -ERANGE.

Reported-by: Qian Cai <qcai@redhat.com>
Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
This issue was introduced in 5.10. I didn't mark it for stable, hoping it will go into 5.10-rc7
---
 arch/powerpc/mm/Makefile  | 2 +-
 arch/powerpc/mm/maccess.c | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/mm/maccess.c
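The gating behaviour described in the commit message can be sketched as a user-space model. This is a minimal sketch and not the kernel implementation: the boundary constant and every `model_` name are hypothetical, addresses are handled as plain integers, and the guarded copy itself is left out. It only shows how an "allowed" predicate that rejects low addresses turns a NULL-based pointer, such as the 0x5c seen in the DAR field of the log, into an -ERANGE return before any access is attempted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit boundary for illustration: user space below, kernel at or above. */
#define MODEL32_TASK_SIZE_MAX 0xc0000000UL

/* Model of the generic default described above: every address is accepted. */
static bool model_default_allowed(uintptr_t src, size_t size)
{
	(void)src;
	(void)size;
	return true;
}

/* Model of the powerpc override in this patch: reject addresses below the user limit. */
static bool model_powerpc_allowed(uintptr_t src, size_t size)
{
	(void)size;
	return src >= MODEL32_TASK_SIZE_MAX;
}

/*
 * Model of the gate in copy_from_kernel_nofault(): a rejected address yields
 * -ERANGE before any access is attempted. The real copy is omitted here.
 */
static long model_copy_from_kernel_nofault(bool (*allowed)(uintptr_t, size_t),
					   uintptr_t src, size_t size)
{
	if (!allowed(src, size))
		return -ERANGE;
	return 0; /* the kernel would perform the guarded copy at this point */
}
```

Passing the predicate as a function pointer is a choice made for this sketch only, so that the default and the override can be compared side by side; per the commit message, the kernel instead replaces the default version with an architecture-specific one.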