Message ID | 20180112005940.23279-7-ebiederm@xmission.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote: > Setting si_code to 0 results in a userspace seeing an si_code of 0. > This is the same si_code as SI_USER. Posix and common sense requires > that SI_USER not be a signal specific si_code. As such this use of 0 > for the si_code is a pretty horribly broken ABI. I think this situation may have come about because 0 is used as a padding value for "impossible" cases -- i.e., things that can't happen unless the kernel is broken, or things that are too unrecoverable for clean error reporting to be helpful. In general, I think these values are not expected to reach userspace in practice. This is not an excuse though -- and not 100% true -- so it's certainly worthy of cleanup. It would be good to approach this similarly for arm and arm64, since the arm64 fault code is derived from arm. > Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a > value of __SI_KILL and now sees a value of SIL_KILL with the result > that uid and pid fields are copied and which might copying the si_addr > field by accident but certainly not by design. Making this a very > flakey implementation. > > Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return > SIL_FAULT and the appropriate fields will be reliably copied. > > But folks this is a new and unique kind of bad. This is massively > untested code bad. This is inventing new and unique was to get > siginfo wrong bad. This is don't even think about Posix or what > siginfo means bad. This is lots of eyeballs all missing the fact > that the code does the wrong thing bad. This is getting stuck > and keep making the same mistake bad. > > I really hope we can find a non userspace breaking fix for this on a > port as new as arm64. > Possible ABI fixes include: > - Send the signal without siginfo > - Don't generate a signal The above two sould like ABI breaks? > - Possibly assign and use an appropriate si_code > - Don't handle cases which can't happen I think a mixture of these two is the best approach. In any case, si_code == 0 here doesn't seem to have any explicit meaning. I think we can translate all of the arm64 faults to proper si_codes -- see my sketch below. Probably means a bit more thought though. The only counterargument would be if there is software relying on these bogus signal cases getting si_code == 0 for a useful purpose. The main reason I see to check for SI_USER is to allow a process to filter out spurious signals (say, an asynchronous I/O signal for which si_value would be garbage), and to print out diagnostics before (in the case of a well-behaved program) resetting the signal to SIG_DFL and killing itself to report the signal to the waiter. Daemons may be more discerning about who is allowed to signal them, but overloading SIGBUS (say) as an IPC channel sounds like a very odd thing to do. The same probably applies to any signal that has nontrivial metadata. Have you found software that is impacted by this in practice? [...] > +++ b/arch/arm64/kernel/fpsimd.c > @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs) > asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) > { > siginfo_t info; > - unsigned int si_code = 0; > + unsigned int si_code = FPE_FIXME; > > if (esr & FPEXC_IOF) > si_code = FPE_FLTINV; This 0 can happen for vector operations where the implementation may not be able to report exactly what happened, for example where the implementer didn't want to pay the cost of tracking exactly what went wrong in each lane. However, the FPEXC_* bits can be garbage in such a case rather than being all zero: we should be checking the TFV bit in the ESR here. This may be a bug. Perhaps FPE_FLTINV should be returned in si_code for such cases: it's not otherwise used on arm64 -- invalid instructions would be reported as SIGILL/ILL_ILLOPC instead). Otherwise, we might want to define a new code or arbitrarily pick one of the existing FLT_* since this is really a more benign condition than executing an illegal instruction. Alternatively, treat the fault as spurious and suppress it, but that doesn't feel right either. > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > index 9b7f89df49db..abe200587334 100644 > --- a/arch/arm64/mm/fault.c > +++ b/arch/arm64/mm/fault.c > @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) > > info.si_signo = SIGBUS; > info.si_errno = 0; > - info.si_code = 0; > + info.si_code = BUS_FIXME; Probably BUS_OBJERR. > if (esr & ESR_ELx_FnV) > info.si_addr = NULL; > else > @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) > } > > static const struct fault_info fault_info[] = { > - { do_bad, SIGBUS, 0, "ttbr address size fault" }, > - { do_bad, SIGBUS, 0, "level 1 address size fault" }, > - { do_bad, SIGBUS, 0, "level 2 address size fault" }, > - { do_bad, SIGBUS, 0, "level 3 address size fault" }, > + { do_bad, SIGBUS, BUS_FIXME, "ttbr address size fault" }, > + { do_bad, SIGBUS, BUS_FIXME, "level 1 address size fault" }, > + { do_bad, SIGBUS, BUS_FIXME, "level 2 address size fault" }, > + { do_bad, SIGBUS, BUS_FIXME, "level 3 address size fault" }, Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic(). [...] > - { do_bad, SIGBUS, 0, "unknown 8" }, > + { do_bad, SIGBUS, BUS_FIXME, "unknown 8" }, [...] > + { do_bad, SIGBUS, BUS_FIXME, "unknown 12" }, Not architected, so they could mean absolutely anything. If they can happen at all, they are probably unsafe to ignore. -> SIGKILL, or panic(). Similary for all the "unknown" codes in the table, which I omit for brevity. > + { do_sea, SIGBUS, BUS_FIXME, "synchronous external abort" }, This si_code seems to be a fallback for if ACPI is absent or doesn't know what to do with this error. -> SIGBUS/BUS_OBJERR? Can probably legitimately happen for userspace for suitable MMIO mappings. Perhaps it's more serious though in the presence of ACPI. Do we expect that ACPI can diagnose all localisable errors? > + { do_sea, SIGBUS, BUS_FIXME, "level 0 (translation table walk)" }, > + { do_sea, SIGBUS, BUS_FIXME, "level 1 (translation table walk)" }, > + { do_sea, SIGBUS, BUS_FIXME, "level 2 (translation table walk)" }, > + { do_sea, SIGBUS, BUS_FIXME, "level 3 (translation table walk)" }, Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic(). > + { do_sea, SIGBUS, BUS_FIXME, "synchronous parity or ECC error" }, // Reserved when RAS is implemented Possibly SIGBUS/BUS_MCEERR_AR (though I don't know exactly what userspace is supposed to do with this or whether this implies the existence or certain kernel features for managing the error that may not be present on arm64...) Otherwise, SIGKILL. > + { do_sea, SIGBUS, BUS_FIXME, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented > + { do_sea, SIGBUS, BUS_FIXME, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented > + { do_sea, SIGBUS, BUS_FIXME, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented > + { do_sea, SIGBUS, BUS_FIXME, "level 3 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented Process page tables corrupt: if the kernel couldn't fix this, the process can't reasonably fix it -> SIGKILL Since this is a RAS-type error it could be triggered by a cosmic ray rather than requiring a kernel or system bug or other major failure, so we probably shouldn't panic the system if the error is localisable to a particular process. > { do_alignment_fault, SIGBUS, BUS_ADRALN, "alignment fault" }, > + { do_bad, SIGBUS, BUS_FIXME, "TLB conflict abort" }, Broken kernel, kernel memory corruption, CPU/system bug etc.: SIGKILL or panic(). > + { do_bad, SIGBUS, BUS_FIXME, "Unsupported atomic hardware update fault" }, Broken kernel, kernel memory corruption, CPU/system bug etc.: SIGKILL or panic(). > + { do_bad, SIGBUS, BUS_FIXME, "implementation fault (lockdown abort)" }, Userspace shouldn't have access to lockdown: kernel/system bug -> SIGKILL or panic(). > + { do_bad, SIGBUS, BUS_FIXME, "implementation fault (unsupported exclusive)" }, If running on an implementation where this fault can happen in response to an exclusive load/store issued by userspace may fail somewhere in the memory system, this should probably be SIGBUS/BUS_OBJERR (or possibly a new BUS_* code). This one may need to be hardware-dependent, if this fault can mean something different depending on the hardware (I'm gussing this possibility from "implementation" -- I've not checked the docs.) > + { do_bad, SIGBUS, BUS_FIXME, "section domain fault" }, > + { do_bad, SIGBUS, BUS_FIXME, "page domain fault" }, Broken kernel, kernel memory corruption, CPU/system bug etc.: SIGKILL or panic(). > }; > > int handle_guest_sea(phys_addr_t addr, unsigned int esr) > @@ -739,11 +739,11 @@ static struct fault_info __refdata debug_fault_info[] = { > + { do_bad, SIGBUS, BUS_FIXME, "unknown 3" }, > + { do_bad, SIGTRAP, TRAP_FIXME, "aarch32 vector catch" }, > + { do_bad, SIGBUS, BUS_FIXME, "unknown 7" }, > }; Impossible (?), or meaning unknown. SIGKILL/panic() for these? Or possibly (since these are probably well localised errors) SIGILL/ILL_ILLOPC. Cheers ---Dave
Dave Martin <Dave.Martin@arm.com> writes: > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote: >> Setting si_code to 0 results in a userspace seeing an si_code of 0. >> This is the same si_code as SI_USER. Posix and common sense requires >> that SI_USER not be a signal specific si_code. As such this use of 0 >> for the si_code is a pretty horribly broken ABI. > > I think this situation may have come about because 0 is used as a > padding value for "impossible" cases -- i.e., things that can't happen > unless the kernel is broken, or things that are too unrecoverable for > clean error reporting to be helpful. > > In general, I think these values are not expected to reach userspace in > practice. > > This is not an excuse though -- and not 100% true -- so it's certainly > worthy of cleanup. > > > It would be good to approach this similarly for arm and arm64, since > the arm64 fault code is derived from arm. In this case the fault_info is something I have only seen on arm64. I have been approaching all architectures the same way. If there is insufficient information without architecture expertise to fix this class of error I have been ading FPE_FIXME to them. >> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a >> value of __SI_KILL and now sees a value of SIL_KILL with the result >> that uid and pid fields are copied and which might copying the si_addr >> field by accident but certainly not by design. Making this a very >> flakey implementation. >> >> Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return >> SIL_FAULT and the appropriate fields will be reliably copied. >> >> But folks this is a new and unique kind of bad. This is massively >> untested code bad. This is inventing new and unique was to get >> siginfo wrong bad. This is don't even think about Posix or what >> siginfo means bad. This is lots of eyeballs all missing the fact >> that the code does the wrong thing bad. This is getting stuck >> and keep making the same mistake bad. >> >> I really hope we can find a non userspace breaking fix for this on a >> port as new as arm64. > >> Possible ABI fixes include: >> - Send the signal without siginfo >> - Don't generate a signal > > The above two sould like ABI breaks? They are ways I have seen code on other platforms deal with not information to generate siginfo. Sending the signal without siginfo is roughly equivalent to your send SIGKILL suggestion below. A good example of that is code that calls force_sigsegv. Calling "force_sig(SIGBUS, current);" is perfectly valid. And then the parent when it reaped the process would have a little more information to go on when guessing what happened to the process. >> - Possibly assign and use an appropriate si_code >> - Don't handle cases which can't happen > > I think a mixture of these two is the best approach. > > In any case, si_code == 0 here doesn't seem to have any explicit meaning. > I think we can translate all of the arm64 faults to proper si_codes -- > see my sketch below. Probably means a bit more thought though. Yes I would be very happy to see that. > The only counterargument would be if there is software relying on > these bogus signal cases getting si_code == 0 for a useful purpose. > > The main reason I see to check for SI_USER is to allow a process to > filter out spurious signals (say, an asynchronous I/O signal for > which si_value would be garbage), and to print out diagnostics > before (in the case of a well-behaved program) resetting the signal > to SIG_DFL and killing itself to report the signal to the waiter. > > Daemons may be more discerning about who is allowed to signal them, > but overloading SIGBUS (say) as an IPC channel sounds like a very odd > thing to do. The same probably applies to any signal that has > nontrivial metadata. Agreed. Although I have seen ltp test cases that do crazy things like that. > Have you found software that is impacted by this in practice? No. I don't expect many userspace applications look at siginfo and everything I have found is some rare hard to trigger non-x86 case which limits the exposure to userspace applications tremendously. The angle I am coming at all of this from is that the linux kernel code that filled out out struct siginfo was not comprehensible or correct. Internal to the kernel it was using a magic value (not exportable to userspace) in the upper bits of si_code. That was causing problems for signal injection and converting signals from 32bit to 64bit, and from 64bit to 32bit. So I wrote kernel/signal.c:siginfo_layout() to figure out which fields of struct siginfo should be sent to userspace. In doing so I discovered that using 0 in si_code (aka SI_USER) is ambiguous, and problematic. Unfortuantely in most of the cases I have spotted using 0 in the si_code requires architectural knowledge that I don't currently have to sort out. So the best I can do is change si_code from 0 to FPE_FIXME/BUS_FIXME/TRAP_FIXME and bring the architecture maintainers attention to this area. One of the problems that results from all of this is that we copy unitialized data to userspace. I am slowly unifying and cleaning the code up so that the code is simple enough we can be certain we are not copying unitialized data to userspace. With si_coes of FPE_FIXME/BUS_FIXME/TRAP_FIXME I can at least attempt to keep the craziness from happening. My next step is to unify struct siginfo and struct compat_siginfo and the functions that copy them to userspace because there are very siginficant problems there. All of that said I like the way you are thinking about fixing these issues. > [...] > >> +++ b/arch/arm64/kernel/fpsimd.c >> @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs) >> asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) >> { >> siginfo_t info; >> - unsigned int si_code = 0; >> + unsigned int si_code = FPE_FIXME; >> >> if (esr & FPEXC_IOF) >> si_code = FPE_FLTINV; > > This 0 can happen for vector operations where the implementation may > not be able to report exactly what happened, for example where > the implementer didn't want to pay the cost of tracking exactly > what went wrong in each lane. > > However, the FPEXC_* bits can be garbage in such a case rather > than being all zero: we should be checking the TFV bit in the ESR here. > This may be a bug. > > Perhaps FPE_FLTINV should be returned in si_code for such cases: it's > not otherwise used on arm64 -- invalid instructions would be reported as > SIGILL/ILL_ILLOPC instead). > > Otherwise, we might want to define a new code or arbitrarily pick > one of the existing FLT_* since this is really a more benign condition > than executing an illegal instruction. Alternatively, treat the > fault as spurious and suppress it, but that doesn't feel right either. I would love to see this sorted out. There is a very similar pattern on several different architectures. I suspect if we have a clean solution on one architecture the other architectures will be able to use that solution as well. >> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c >> index 9b7f89df49db..abe200587334 100644 >> --- a/arch/arm64/mm/fault.c >> +++ b/arch/arm64/mm/fault.c >> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) >> >> info.si_signo = SIGBUS; >> info.si_errno = 0; >> - info.si_code = 0; >> + info.si_code = BUS_FIXME; > > Probably BUS_OBJERR. > >> if (esr & ESR_ELx_FnV) >> info.si_addr = NULL; >> else >> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) >> } >> >> static const struct fault_info fault_info[] = { >> - { do_bad, SIGBUS, 0, "ttbr address size fault" }, >> - { do_bad, SIGBUS, 0, "level 1 address size fault" }, >> - { do_bad, SIGBUS, 0, "level 2 address size fault" }, >> - { do_bad, SIGBUS, 0, "level 3 address size fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "ttbr address size fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "level 1 address size fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "level 2 address size fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "level 3 address size fault" }, > > Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic(). > > [...] > >> - { do_bad, SIGBUS, 0, "unknown 8" }, >> + { do_bad, SIGBUS, BUS_FIXME, "unknown 8" }, > > [...] > >> + { do_bad, SIGBUS, BUS_FIXME, "unknown 12" }, > > Not architected, so they could mean absolutely anything. If they > can happen at all, they are probably unsafe to ignore. > > -> SIGKILL, or panic(). > > Similary for all the "unknown" codes in the table, which I omit for > brevity. > >> + { do_sea, SIGBUS, BUS_FIXME, "synchronous external abort" }, > > This si_code seems to be a fallback for if ACPI is absent or doesn't > know what to do with this error. > > -> SIGBUS/BUS_OBJERR? > > Can probably legitimately happen for userspace for suitable MMIO mappings. > > Perhaps it's more serious though in the presence of ACPI. Do we expect > that ACPI can diagnose all localisable errors? > >> + { do_sea, SIGBUS, BUS_FIXME, "level 0 (translation table walk)" }, >> + { do_sea, SIGBUS, BUS_FIXME, "level 1 (translation table walk)" }, >> + { do_sea, SIGBUS, BUS_FIXME, "level 2 (translation table walk)" }, >> + { do_sea, SIGBUS, BUS_FIXME, "level 3 (translation table walk)" }, > > Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic(). > >> + { do_sea, SIGBUS, BUS_FIXME, "synchronous parity or ECC error" }, // Reserved when RAS is implemented > > Possibly SIGBUS/BUS_MCEERR_AR (though I don't know exactly what > userspace is supposed to do with this or whether this implies the > existence or certain kernel features for managing the error that > may not be present on arm64...) > > Otherwise, SIGKILL. Yes. The AR Action Required and AO Action optional bits I don't quite understand. But BUS_MCEERR_AR does sound like a good fit. >> + { do_sea, SIGBUS, BUS_FIXME, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented >> + { do_sea, SIGBUS, BUS_FIXME, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented >> + { do_sea, SIGBUS, BUS_FIXME, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented >> + { do_sea, SIGBUS, BUS_FIXME, "level 3 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented > > Process page tables corrupt: if the kernel couldn't fix this, the > process can't reasonably fix it -> SIGKILL > > Since this is a RAS-type error it could be triggered by a cosmic ray > rather than requiring a kernel or system bug or other major failure, so > we probably shouldn't panic the system if the error is localisable to a > particular process. > >> { do_alignment_fault, SIGBUS, BUS_ADRALN, "alignment fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "TLB conflict abort" }, > > Broken kernel, kernel memory corruption, CPU/system bug etc.: > SIGKILL or panic(). > >> + { do_bad, SIGBUS, BUS_FIXME, "Unsupported atomic hardware update fault" }, > > Broken kernel, kernel memory corruption, CPU/system bug etc.: > SIGKILL or panic(). > >> + { do_bad, SIGBUS, BUS_FIXME, "implementation fault (lockdown abort)" }, > > Userspace shouldn't have access to lockdown: kernel/system bug > -> SIGKILL or panic(). > >> + { do_bad, SIGBUS, BUS_FIXME, "implementation fault (unsupported exclusive)" }, > > If running on an implementation where this fault can happen in response to an exclusive load/store issued by userspace may fail somewhere in the memory system, this should probably be SIGBUS/BUS_OBJERR (or possibly a new BUS_* code). > > This one may need to be hardware-dependent, if this fault can mean > something different depending on the hardware (I'm gussing this > possibility from "implementation" -- I've not checked the docs.) > >> + { do_bad, SIGBUS, BUS_FIXME, "section domain fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "page domain fault" }, > > Broken kernel, kernel memory corruption, CPU/system bug etc.: > SIGKILL or panic(). > >> }; >> >> int handle_guest_sea(phys_addr_t addr, unsigned int esr) >> @@ -739,11 +739,11 @@ static struct fault_info __refdata debug_fault_info[] = { >> + { do_bad, SIGBUS, BUS_FIXME, "unknown 3" }, >> + { do_bad, SIGTRAP, TRAP_FIXME, "aarch32 vector catch" }, >> + { do_bad, SIGBUS, BUS_FIXME, "unknown 7" }, >> }; > > Impossible (?), or meaning unknown. > SIGKILL/panic() for these? Or possibly (since these are probably well > localised errors) SIGILL/ILL_ILLOPC. I like the way you are thinking on these, and I'd love to see them fixed. Eric
Hi Dave, Thanks for going through all these, On 15/01/18 16:30, Dave Martin wrote: > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote: >> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c >> index 9b7f89df49db..abe200587334 100644 >> --- a/arch/arm64/mm/fault.c >> +++ b/arch/arm64/mm/fault.c >> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) >> + { do_sea, SIGBUS, BUS_FIXME, "synchronous external abort" }, > > This si_code seems to be a fallback for if ACPI is absent or doesn't > know what to do with this error. > > -> SIGBUS/BUS_OBJERR? > > Can probably legitimately happen for userspace for suitable MMIO mappings. It can happen for normal memory too, there are specific ESR values for parity/checksum errors when read/writing memory. I think this first one is 'other/unknown', and its up to the CPU how to classify them. > Perhaps it's more serious though in the presence of ACPI. Do we expect > that ACPI can diagnose all localisable errors? Its not just ACPI, the CPU's v8.2 RAS Extensions use this synchronous-external-abort as notification of a RAS error, (the other details are written to to memory-mapped nodes). With the v8.2 RAS Extensions the ESR tells us if the error was contained. For ACPI we rely on firmware to set an appropriate severity in the CPER records generated by firmware. The APEI helpers will call panic() if they find a fatal error. For systems with neither {firmware,kernel}-first RAS, BUS_OBJERR looks like a good choice. >> + { do_sea, SIGBUS, BUS_FIXME, "level 0 (translation table walk)" }, >> + { do_sea, SIGBUS, BUS_FIXME, "level 1 (translation table walk)" }, >> + { do_sea, SIGBUS, BUS_FIXME, "level 2 (translation table walk)" }, >> + { do_sea, SIGBUS, BUS_FIXME, "level 3 (translation table walk)" }, > > Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic(). (RAS mechanisms may claim this and send their own signals, if not:) SIGKILL is probably a better choice here, while we do have an address, there is nothing user-space can do about it. >> + { do_sea, SIGBUS, BUS_FIXME, "synchronous parity or ECC error" }, // Reserved when RAS is implemented > > Possibly SIGBUS/BUS_MCEERR_AR (though I don't know exactly what > userspace is supposed to do with this or whether this implies the > existence or certain kernel features for managing the error that > may not be present on arm64...) I'd like to keep the MCEERR signals to errors that we know are contained, the kernel has understood and handled. (These features do exist for arm64, enabling CONFIG_MEMORY_FAILURE and a few APEI options allows all this to work today with suitable firmware. My Seattle claims to support it). > Otherwise, SIGKILL. Sounds good, >> + { do_sea, SIGBUS, BUS_FIXME, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented >> + { do_sea, SIGBUS, BUS_FIXME, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented >> + { do_sea, SIGBUS, BUS_FIXME, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented >> + { do_sea, SIGBUS, BUS_FIXME, "level 3 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented > > Process page tables corrupt: if the kernel couldn't fix this, the > process can't reasonably fix it -> SIGKILL > > Since this is a RAS-type error it could be triggered by a cosmic ray > rather than requiring a kernel or system bug or other major failure, so > we probably shouldn't panic the system if the error is localisable to a > particular process. Without the RAS-Extensions severity to tell us the error is contained I'm not sure what we can expect. But given the page-tables are per-process, and we never swap them to disk etc, its probably a safe bet that it doesn't matter either way for these. Thanks, James
On Mon, Jan 15, 2018 at 11:23:03AM -0600, Eric W. Biederman wrote: > Dave Martin <Dave.Martin@arm.com> writes: > > > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote: > >> Setting si_code to 0 results in a userspace seeing an si_code of 0. > >> This is the same si_code as SI_USER. Posix and common sense requires > >> that SI_USER not be a signal specific si_code. As such this use of 0 > >> for the si_code is a pretty horribly broken ABI. > > > > I think this situation may have come about because 0 is used as a > > padding value for "impossible" cases -- i.e., things that can't happen > > unless the kernel is broken, or things that are too unrecoverable for > > clean error reporting to be helpful. > > > > In general, I think these values are not expected to reach userspace in > > practice. > > > > This is not an excuse though -- and not 100% true -- so it's certainly > > worthy of cleanup. > > > > > > It would be good to approach this similarly for arm and arm64, since > > the arm64 fault code is derived from arm. > > In this case the fault_info is something I have only seen on arm64. > I have been approaching all architectures the same way. Bad guess on my part; this table-driven approach seems to be new for arm64. > If there is insufficient information without architecture expertise > to fix this class of error I have been ading FPE_FIXME to them. Fair enough. > >> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a > >> value of __SI_KILL and now sees a value of SIL_KILL with the result > >> that uid and pid fields are copied and which might copying the si_addr > >> field by accident but certainly not by design. Making this a very > >> flakey implementation. > >> > >> Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return > >> SIL_FAULT and the appropriate fields will be reliably copied. > >> > >> But folks this is a new and unique kind of bad. This is massively > >> untested code bad. This is inventing new and unique was to get > >> siginfo wrong bad. This is don't even think about Posix or what > >> siginfo means bad. This is lots of eyeballs all missing the fact > >> that the code does the wrong thing bad. This is getting stuck > >> and keep making the same mistake bad. > >> > >> I really hope we can find a non userspace breaking fix for this on a > >> port as new as arm64. > > > >> Possible ABI fixes include: > >> - Send the signal without siginfo > >> - Don't generate a signal > > > > The above two sould like ABI breaks? > > They are ways I have seen code on other platforms deal with > not information to generate siginfo. Sending the signal without siginfo > is roughly equivalent to your send SIGKILL suggestion below. > > A good example of that is code that calls force_sigsegv. > > Calling "force_sig(SIGBUS, current);" is perfectly valid. > And then the parent when it reaped the process would have > a little more information to go on when guessing what happened > to the process. > > >> - Possibly assign and use an appropriate si_code > >> - Don't handle cases which can't happen > > > > I think a mixture of these two is the best approach. > > > > In any case, si_code == 0 here doesn't seem to have any explicit meaning. > > I think we can translate all of the arm64 faults to proper si_codes -- > > see my sketch below. Probably means a bit more thought though. > > Yes I would be very happy to see that. > > > The only counterargument would be if there is software relying on > > these bogus signal cases getting si_code == 0 for a useful purpose. > > > > The main reason I see to check for SI_USER is to allow a process to > > filter out spurious signals (say, an asynchronous I/O signal for > > which si_value would be garbage), and to print out diagnostics > > before (in the case of a well-behaved program) resetting the signal > > to SIG_DFL and killing itself to report the signal to the waiter. > > > > Daemons may be more discerning about who is allowed to signal them, > > but overloading SIGBUS (say) as an IPC channel sounds like a very odd > > thing to do. The same probably applies to any signal that has > > nontrivial metadata. > > Agreed. Although I have seen ltp test cases that do crazy things like > that. > > > Have you found software that is impacted by this in practice? > > No. Searching for si_code on codesearch.debian.net, I looked at a few random hits. Although this is far from exhaustive, I saw no instance of code that assumes some arch-specific meaning for SI_USER (or 0). Most code seems to check for signal-specific si_code values before assuming that signal-specific signifo fields are valid; or for SI_USER (or si_code <= 0) before assuming that si_uid and si_pid are valid. Returning proper values for si_code values in place of "0" would fix rather than break such cases. > I don't expect many userspace applications look at siginfo and > everything I have found is some rare hard to trigger non-x86 case which > limits the exposure to userspace applications tremendously. > > The angle I am coming at all of this from is that the linux kernel code > that filled out out struct siginfo was not comprehensible or correct. I think "not comprehensible or correct" is a pretty good summary of all signal-related code... > Internal to the kernel it was using a magic value (not exportable to > userspace) in the upper bits of si_code. That was causing problems for > signal injection and converting signals from 32bit to 64bit, and from > 64bit to 32bit. > > So I wrote kernel/signal.c:siginfo_layout() to figure out which fields > of struct siginfo should be sent to userspace. In doing so I discovered > that using 0 in si_code (aka SI_USER) is ambiguous, and problematic. > > Unfortuantely in most of the cases I have spotted using 0 in the si_code > requires architectural knowledge that I don't currently have to sort > out. So the best I can do is change si_code from 0 to > FPE_FIXME/BUS_FIXME/TRAP_FIXME and bring the architecture maintainers > attention to this area. > > One of the problems that results from all of this is that we copy > unitialized data to userspace. I am slowly unifying and cleaning the > code up so that the code is simple enough we can be certain we are > not copying unitialized data to userspace. > > With si_coes of FPE_FIXME/BUS_FIXME/TRAP_FIXME I can at least attempt to > keep the craziness from happening. > > My next step is to unify struct siginfo and struct compat_siginfo > and the functions that copy them to userspace because there are very > siginficant problems there. All of which sounds valuable. > All of that said I like the way you are thinking about fixing these > issues. Is it feasible to have a different internal constant for SI_USER? Then the generic could warn if it sees si_code == 0. If the special nonzero KERNEL_SI_USER is seen instead, it is silently translated to SI_USER (0) for userspace. This might help us track down cases where 0 is passed by accident. It may not be worth it though: if the affected cases are ones that happen never or almost never, a runtime BUG_ON() may not be helpful for tracking them down. Also, I'm making an assumption that si_code always flows through some generic code before reaching userspace (maybe untrue). > >> +++ b/arch/arm64/kernel/fpsimd.c > >> @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs) > >> asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) > >> { > >> siginfo_t info; > >> - unsigned int si_code = 0; > >> + unsigned int si_code = FPE_FIXME; > >> > >> if (esr & FPEXC_IOF) > >> si_code = FPE_FLTINV; > > > > This 0 can happen for vector operations where the implementation may > > not be able to report exactly what happened, for example where > > the implementer didn't want to pay the cost of tracking exactly > > what went wrong in each lane. > > > > However, the FPEXC_* bits can be garbage in such a case rather > > than being all zero: we should be checking the TFV bit in the ESR here. > > This may be a bug. > > > > Perhaps FPE_FLTINV should be returned in si_code for such cases: it's > > not otherwise used on arm64 -- invalid instructions would be reported as > > SIGILL/ILL_ILLOPC instead). > > > > Otherwise, we might want to define a new code or arbitrarily pick > > one of the existing FLT_* since this is really a more benign condition > > than executing an illegal instruction. Alternatively, treat the > > fault as spurious and suppress it, but that doesn't feel right either. > > I would love to see this sorted out. There is a very similar pattern > on several different architectures. I suspect if we have a clean > solution on one architecture the other architectures will be able to use > that solution as well. Since user code that relies on checking si_code for fp exceptions will probably already break in these cases, we can probably fudge things a bit here. I'll have a think. IEEE may also define some rules that are relevant here... For the proposed conversion of the si_code==0 cases for arm64, I'll draft an RFC for discussion (hopefully sometime this week). [...] Cheers ---Dave
Dave Martin <Dave.Martin@arm.com> writes: > On Mon, Jan 15, 2018 at 11:23:03AM -0600, Eric W. Biederman wrote: >> Dave Martin <Dave.Martin@arm.com> writes: >> >> > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote: >> >> Setting si_code to 0 results in a userspace seeing an si_code of 0. >> >> This is the same si_code as SI_USER. Posix and common sense requires >> >> that SI_USER not be a signal specific si_code. As such this use of 0 >> >> for the si_code is a pretty horribly broken ABI. >> > >> > I think this situation may have come about because 0 is used as a >> > padding value for "impossible" cases -- i.e., things that can't happen >> > unless the kernel is broken, or things that are too unrecoverable for >> > clean error reporting to be helpful. >> > >> > In general, I think these values are not expected to reach userspace in >> > practice. >> > >> > This is not an excuse though -- and not 100% true -- so it's certainly >> > worthy of cleanup. >> > >> > >> > It would be good to approach this similarly for arm and arm64, since >> > the arm64 fault code is derived from arm. >> >> In this case the fault_info is something I have only seen on arm64. >> I have been approaching all architectures the same way. > > Bad guess on my part; this table-driven approach seems to be new for > arm64. > >> If there is insufficient information without architecture expertise >> to fix this class of error I have been ading FPE_FIXME to them. > > Fair enough. > >> >> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a >> >> value of __SI_KILL and now sees a value of SIL_KILL with the result >> >> that uid and pid fields are copied and which might copying the si_addr >> >> field by accident but certainly not by design. Making this a very >> >> flakey implementation. >> >> >> >> Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return >> >> SIL_FAULT and the appropriate fields will be reliably copied. >> >> >> >> But folks this is a new and unique kind of bad. This is massively >> >> untested code bad. This is inventing new and unique was to get >> >> siginfo wrong bad. This is don't even think about Posix or what >> >> siginfo means bad. This is lots of eyeballs all missing the fact >> >> that the code does the wrong thing bad. This is getting stuck >> >> and keep making the same mistake bad. >> >> >> >> I really hope we can find a non userspace breaking fix for this on a >> >> port as new as arm64. >> > >> >> Possible ABI fixes include: >> >> - Send the signal without siginfo >> >> - Don't generate a signal >> > >> > The above two sould like ABI breaks? >> >> They are ways I have seen code on other platforms deal with >> not information to generate siginfo. Sending the signal without siginfo >> is roughly equivalent to your send SIGKILL suggestion below. >> >> A good example of that is code that calls force_sigsegv. >> >> Calling "force_sig(SIGBUS, current);" is perfectly valid. >> And then the parent when it reaped the process would have >> a little more information to go on when guessing what happened >> to the process. >> >> >> - Possibly assign and use an appropriate si_code >> >> - Don't handle cases which can't happen >> > >> > I think a mixture of these two is the best approach. >> > >> > In any case, si_code == 0 here doesn't seem to have any explicit meaning. >> > I think we can translate all of the arm64 faults to proper si_codes -- >> > see my sketch below. Probably means a bit more thought though. >> >> Yes I would be very happy to see that. >> >> > The only counterargument would be if there is software relying on >> > these bogus signal cases getting si_code == 0 for a useful purpose. >> > >> > The main reason I see to check for SI_USER is to allow a process to >> > filter out spurious signals (say, an asynchronous I/O signal for >> > which si_value would be garbage), and to print out diagnostics >> > before (in the case of a well-behaved program) resetting the signal >> > to SIG_DFL and killing itself to report the signal to the waiter. >> > >> > Daemons may be more discerning about who is allowed to signal them, >> > but overloading SIGBUS (say) as an IPC channel sounds like a very odd >> > thing to do. The same probably applies to any signal that has >> > nontrivial metadata. >> >> Agreed. Although I have seen ltp test cases that do crazy things like >> that. >> >> > Have you found software that is impacted by this in practice? >> >> No. > > Searching for si_code on codesearch.debian.net, I looked at a few > random hits. Although this is far from exhaustive, I saw no instance > of code that assumes some arch-specific meaning for SI_USER (or 0). > > Most code seems to check for signal-specific si_code values before > assuming that signal-specific signifo fields are valid; or for > SI_USER (or si_code <= 0) before assuming that si_uid and si_pid > are valid. > > Returning proper values for si_code values in place of "0" would fix > rather than break such cases. > > >> I don't expect many userspace applications look at siginfo and >> everything I have found is some rare hard to trigger non-x86 case which >> limits the exposure to userspace applications tremendously. >> >> The angle I am coming at all of this from is that the linux kernel code >> that filled out out struct siginfo was not comprehensible or correct. > > I think "not comprehensible or correct" is a pretty good summary of > all signal-related code... > >> Internal to the kernel it was using a magic value (not exportable to >> userspace) in the upper bits of si_code. That was causing problems for >> signal injection and converting signals from 32bit to 64bit, and from >> 64bit to 32bit. >> >> So I wrote kernel/signal.c:siginfo_layout() to figure out which fields >> of struct siginfo should be sent to userspace. In doing so I discovered >> that using 0 in si_code (aka SI_USER) is ambiguous, and problematic. >> >> Unfortuantely in most of the cases I have spotted using 0 in the si_code >> requires architectural knowledge that I don't currently have to sort >> out. So the best I can do is change si_code from 0 to >> FPE_FIXME/BUS_FIXME/TRAP_FIXME and bring the architecture maintainers >> attention to this area. >> >> One of the problems that results from all of this is that we copy >> unitialized data to userspace. I am slowly unifying and cleaning the >> code up so that the code is simple enough we can be certain we are >> not copying unitialized data to userspace. >> >> With si_coes of FPE_FIXME/BUS_FIXME/TRAP_FIXME I can at least attempt to >> keep the craziness from happening. >> >> My next step is to unify struct siginfo and struct compat_siginfo >> and the functions that copy them to userspace because there are very >> siginficant problems there. > > All of which sounds valuable. > >> All of that said I like the way you are thinking about fixing these >> issues. > > Is it feasible to have a different internal constant for SI_USER? > Then the generic could warn if it sees si_code == 0. If the > special nonzero KERNEL_SI_USER is seen instead, it is silently > translated to SI_USER (0) for userspace. This might help us > track down cases where 0 is passed by accident. > > It may not be worth it though: if the affected cases are ones > that happen never or almost never, a runtime BUG_ON() may not be > helpful for tracking them down. > > Also, I'm making an assumption that si_code always flows through > some generic code before reaching userspace (maybe untrue). The code does flow through a generic path, and I am in the middle of tightening that up right now. As filling out siginfo is error prone, and I need a guarantee that all of struct siginfo is initialized. Adding a warning if a arch fault hander uses si_code == 0 aka SI_USER would not be hard. Given what you have found. Given that it seems to match my experience we can almost certainly change the code to just warn when the 0 is passed in for the si_code for fault handlers. I want to ensure that all of the fields are filled in before I do that or else I risk passing unitialized values to userspace. But I like that as the long term goal for this code. >> >> +++ b/arch/arm64/kernel/fpsimd.c >> >> @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs) >> >> asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) >> >> { >> >> siginfo_t info; >> >> - unsigned int si_code = 0; >> >> + unsigned int si_code = FPE_FIXME; >> >> >> >> if (esr & FPEXC_IOF) >> >> si_code = FPE_FLTINV; >> > >> > This 0 can happen for vector operations where the implementation may >> > not be able to report exactly what happened, for example where >> > the implementer didn't want to pay the cost of tracking exactly >> > what went wrong in each lane. >> > >> > However, the FPEXC_* bits can be garbage in such a case rather >> > than being all zero: we should be checking the TFV bit in the ESR here. >> > This may be a bug. >> > >> > Perhaps FPE_FLTINV should be returned in si_code for such cases: it's >> > not otherwise used on arm64 -- invalid instructions would be reported as >> > SIGILL/ILL_ILLOPC instead). >> > >> > Otherwise, we might want to define a new code or arbitrarily pick >> > one of the existing FLT_* since this is really a more benign condition >> > than executing an illegal instruction. Alternatively, treat the >> > fault as spurious and suppress it, but that doesn't feel right either. >> >> I would love to see this sorted out. There is a very similar pattern >> on several different architectures. I suspect if we have a clean >> solution on one architecture the other architectures will be able to use >> that solution as well. > > Since user code that relies on checking si_code for fp exceptions will > probably already break in these cases, we can probably fudge things a > bit here. > > I'll have a think. IEEE may also define some rules that are relevant > here... > > For the proposed conversion of the si_code==0 cases for arm64, I'll > draft an RFC for discussion (hopefully sometime this week). Sounds good. I will keep FPE_FIXME as a place holder until this gets sorted out. There is a second issue I am looking at in this location, and maybe I don't have to address it now. But it looks like the code is calling send_sig_info instead of force_sig_info for a synchronous exception. Am I reading that correctly? Eric
On Tue, Jan 16, 2018 at 04:28:50PM -0600, Eric W. Biederman wrote: > I will keep FPE_FIXME as a place holder until this gets sorted out. > > There is a second issue I am looking at in this location, > and maybe I don't have to address it now. But it looks like the code is > calling send_sig_info instead of force_sig_info for a synchronous > exception. Am I reading that correctly? VFP used to use force_sig_info(), but it seems to be really the wrong call to use. force_sig_info() checks whether the program decided to ignore or block the signal, and if it did, replaces the signal handler with the default handler and unblocks the signal. Are you really suggesting that FP all FP signals should get this treatment?
On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote: > On Tue, Jan 16, 2018 at 04:28:50PM -0600, Eric W. Biederman wrote: > > I will keep FPE_FIXME as a place holder until this gets sorted out. > > > > There is a second issue I am looking at in this location, > > and maybe I don't have to address it now. But it looks like the code is > > calling send_sig_info instead of force_sig_info for a synchronous > > exception. Am I reading that correctly? > > VFP used to use force_sig_info(), but it seems to be really the wrong > call to use. force_sig_info() checks whether the program decided to > ignore or block the signal, and if it did, replaces the signal handler > with the default handler and unblocks the signal. > > Are you really suggesting that FP all FP signals should get this > treatment? feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past an fp overflow exception, please signal me instead". If the process also blocked SIGFPE, that could be taken to mean "I can't run safely past an fp overflow exception _and_ I can't take SIGFPE either" ... i.e., if an fp overflow happens there is no way to proceed and it's really fatal. What SIG_IGN ought to mean is rather more debatable, but again, the process could be asking for two opposite things: guarantee a SIGFPE is delivered instead of running past an fp exception, and also guarantee that SIGFPE is _not_ delivered. It looks like arm and arm64 are different from most other arches (including x86) here, but I'm not sure what is considered correct, and it looks like the answer is not standardised. There's a possibility that some software goes subtly wrong on arm/arm64 where on other arches it would get terminated with SIGKILL. Whether this matters depends on how harmless the fp exception is to the work of the program. I think if an exception is set to trap via feenableexcept() then that's a strong hint the programmer thinks that exception is not harmless. OTOH, trapping is not always available anyway... Was there some particular program being broken by the force_sig_info() here? Cheers ---Dave
On Wed, Jan 17, 2018 at 12:15:05PM +0000, Dave Martin wrote: > On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote: > > On Tue, Jan 16, 2018 at 04:28:50PM -0600, Eric W. Biederman wrote: > > > I will keep FPE_FIXME as a place holder until this gets sorted out. > > > > > > There is a second issue I am looking at in this location, > > > and maybe I don't have to address it now. But it looks like the code is > > > calling send_sig_info instead of force_sig_info for a synchronous > > > exception. Am I reading that correctly? > > > > VFP used to use force_sig_info(), but it seems to be really the wrong > > call to use. force_sig_info() checks whether the program decided to > > ignore or block the signal, and if it did, replaces the signal handler > > with the default handler and unblocks the signal. > > > > Are you really suggesting that FP all FP signals should get this > > treatment? > > feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past > an fp overflow exception, please signal me instead". > > If the process also blocked SIGFPE, that could be taken to mean > "I can't run safely past an fp overflow exception _and_ I can't > take SIGFPE either" ... i.e., if an fp overflow happens there is > no way to proceed and it's really fatal. > > What SIG_IGN ought to mean is rather more debatable, but again, > the process could be asking for two opposite things: guarantee a > SIGFPE is delivered instead of running past an fp exception, and > also guarantee that SIGFPE is _not_ delivered. > > It looks like arm and arm64 are different from most other arches > (including x86) here, but I'm not sure what is considered correct, and > it looks like the answer is not standardised. There's a possibility > that some software goes subtly wrong on arm/arm64 where on other arches > it would get terminated with SIGKILL. > > Whether this matters depends on how harmless the fp exception is to > the work of the program. I think if an exception is set to trap > via feenableexcept() then that's a strong hint the programmer thinks > that exception is not harmless. OTOH, trapping is not always > available anyway... Like many of these things, there is no clear answer. It's a set of conflicting requirements, and as you point out, even if you've called feenableexcept(), you are not guaranteed to get a trap. However, do remember that FP exceptions on ARM hardware are already asynchronous - they get reported by the _next_ FP operation to the one that caused them, which means they could be raised by a library function sometime after it occured (when the library function decides to save the FP registers to the stack before it makes use of them.) It's entirely possible that the library function has blocked FP signals temporarily (not explicitly, just decided to block all signals while it does something sensitive) and will unblock them again afterwards - at which point we get the SIGFPE, and it would be quite right to deliver that signal to the user SIGFPE handler, rather than forcing it onto the program mid-library function. It's also possible that SIGFPE could be blocked by another signal handler having been invoked, and it triggers the latent generation of the SIGFPE. I'd be more inclined to agree with you if VFP exceptions were synchronous but they aren't. > Was there some particular program being broken by the force_sig_info() > here? I don't recall.
On Wed, Jan 17, 2018 at 12:37:52PM +0000, Russell King - ARM Linux wrote: > On Wed, Jan 17, 2018 at 12:15:05PM +0000, Dave Martin wrote: > > On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote: [...] > > > VFP used to use force_sig_info(), but it seems to be really the wrong > > > call to use. force_sig_info() checks whether the program decided to > > > ignore or block the signal, and if it did, replaces the signal handler > > > with the default handler and unblocks the signal. > > > > > > Are you really suggesting that FP all FP signals should get this > > > treatment? > > > > feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past > > an fp overflow exception, please signal me instead". > > > > If the process also blocked SIGFPE, that could be taken to mean > > "I can't run safely past an fp overflow exception _and_ I can't > > take SIGFPE either" ... i.e., if an fp overflow happens there is > > no way to proceed and it's really fatal. > > > > What SIG_IGN ought to mean is rather more debatable, but again, > > the process could be asking for two opposite things: guarantee a > > SIGFPE is delivered instead of running past an fp exception, and > > also guarantee that SIGFPE is _not_ delivered. > > > > It looks like arm and arm64 are different from most other arches > > (including x86) here, but I'm not sure what is considered correct, and > > it looks like the answer is not standardised. There's a possibility > > that some software goes subtly wrong on arm/arm64 where on other arches > > it would get terminated with SIGKILL. > > > > Whether this matters depends on how harmless the fp exception is to > > the work of the program. I think if an exception is set to trap > > via feenableexcept() then that's a strong hint the programmer thinks > > that exception is not harmless. OTOH, trapping is not always > > available anyway... > > Like many of these things, there is no clear answer. It's a set of > conflicting requirements, and as you point out, even if you've called > feenableexcept(), you are not guaranteed to get a trap. > > However, do remember that FP exceptions on ARM hardware are already > asynchronous - they get reported by the _next_ FP operation to the one > that caused them, which means they could be raised by a library function > sometime after it occured (when the library function decides to save the > FP registers to the stack before it makes use of them.) It's entirely > possible that the library function has blocked FP signals temporarily > (not explicitly, just decided to block all signals while it does > something sensitive) and will unblock them again afterwards - at which > point we get the SIGFPE, and it would be quite right to deliver that > signal to the user SIGFPE handler, rather than forcing it onto the > program mid-library function. > > It's also possible that SIGFPE could be blocked by another signal handler > having been invoked, and it triggers the latent generation of the SIGFPE. > > I'd be more inclined to agree with you if VFP exceptions were synchronous > but they aren't. Hmmm, it looks like imprecise fp exception traps are disallowed from ARMv8 onwards. I guess they made more sense when the FPU really was a coprocessor, or at least semidetached from the integer core. I think force_sig_info() makes sense here if and only if the traps are guaranteed to be precise, so we probably should use this on arm64. Not arm though (alpha doesn't either, if I understand the code correctly.) Does that make sense? Apparently, few recent cores (at least ARM's own ones) support fp exception trapping anyway... 1176 may be the most recent. Cheers ---Dave
On Wed, Jan 17, 2018 at 03:37:31PM +0000, Dave Martin wrote: > On Wed, Jan 17, 2018 at 12:37:52PM +0000, Russell King - ARM Linux wrote: > > On Wed, Jan 17, 2018 at 12:15:05PM +0000, Dave Martin wrote: > > > On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote: > > [...] > > > > > VFP used to use force_sig_info(), but it seems to be really the wrong > > > > call to use. force_sig_info() checks whether the program decided to > > > > ignore or block the signal, and if it did, replaces the signal handler > > > > with the default handler and unblocks the signal. > > > > > > > > Are you really suggesting that FP all FP signals should get this > > > > treatment? > > > > > > feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past > > > an fp overflow exception, please signal me instead". > > > > > > If the process also blocked SIGFPE, that could be taken to mean > > > "I can't run safely past an fp overflow exception _and_ I can't > > > take SIGFPE either" ... i.e., if an fp overflow happens there is > > > no way to proceed and it's really fatal. > > > > > > What SIG_IGN ought to mean is rather more debatable, but again, > > > the process could be asking for two opposite things: guarantee a > > > SIGFPE is delivered instead of running past an fp exception, and > > > also guarantee that SIGFPE is _not_ delivered. > > > > > > It looks like arm and arm64 are different from most other arches > > > (including x86) here, but I'm not sure what is considered correct, and > > > it looks like the answer is not standardised. There's a possibility > > > that some software goes subtly wrong on arm/arm64 where on other arches > > > it would get terminated with SIGKILL. > > > > > > Whether this matters depends on how harmless the fp exception is to > > > the work of the program. I think if an exception is set to trap > > > via feenableexcept() then that's a strong hint the programmer thinks > > > that exception is not harmless. OTOH, trapping is not always > > > available anyway... > > > > Like many of these things, there is no clear answer. It's a set of > > conflicting requirements, and as you point out, even if you've called > > feenableexcept(), you are not guaranteed to get a trap. > > > > However, do remember that FP exceptions on ARM hardware are already > > asynchronous - they get reported by the _next_ FP operation to the one > > that caused them, which means they could be raised by a library function > > sometime after it occured (when the library function decides to save the > > FP registers to the stack before it makes use of them.) It's entirely > > possible that the library function has blocked FP signals temporarily > > (not explicitly, just decided to block all signals while it does > > something sensitive) and will unblock them again afterwards - at which > > point we get the SIGFPE, and it would be quite right to deliver that > > signal to the user SIGFPE handler, rather than forcing it onto the > > program mid-library function. > > > > It's also possible that SIGFPE could be blocked by another signal handler > > having been invoked, and it triggers the latent generation of the SIGFPE. > > > > I'd be more inclined to agree with you if VFP exceptions were synchronous > > but they aren't. > > Hmmm, it looks like imprecise fp exception traps are disallowed from > ARMv8 onwards. I guess they made more sense when the FPU really was a > coprocessor, or at least semidetached from the integer core. > > I think force_sig_info() makes sense here if and only if the traps > are guaranteed to be precise, so we probably should use this on arm64. > Not arm though (alpha doesn't either, if I understand the code > correctly.) > > Does that make sense? > > Apparently, few recent cores (at least ARM's own ones) support fp > exception trapping anyway... 1176 may be the most recent. ... and that makes the feenableexcept() argument about "A program really wants to know" moot. It can enable the exceptions in the FPSCR but its never going to receive a SIGFPE on CPUs that don't do exception trapping.
On Wed, Jan 17, 2018 at 03:49:59PM +0000, Russell King - ARM Linux wrote: > On Wed, Jan 17, 2018 at 03:37:31PM +0000, Dave Martin wrote: > > On Wed, Jan 17, 2018 at 12:37:52PM +0000, Russell King - ARM Linux wrote: [...] > > > I'd be more inclined to agree with you if VFP exceptions were synchronous > > > but they aren't. > > > > Hmmm, it looks like imprecise fp exception traps are disallowed from > > ARMv8 onwards. I guess they made more sense when the FPU really was a > > coprocessor, or at least semidetached from the integer core. > > > > I think force_sig_info() makes sense here if and only if the traps > > are guaranteed to be precise, so we probably should use this on arm64. > > Not arm though (alpha doesn't either, if I understand the code > > correctly.) > > > > Does that make sense? > > > > Apparently, few recent cores (at least ARM's own ones) support fp > > exception trapping anyway... 1176 may be the most recent. > > ... and that makes the feenableexcept() argument about "A program > really wants to know" moot. It can enable the exceptions in the > FPSCR but its never going to receive a SIGFPE on CPUs that don't > do exception trapping. Sort of. If the hardware doesn't support traps then those FP(S)CR bits can't be set. feenableexcept() returns -1 if the requested bits don't stick, but I'll bet there's software that doesn't bother to check... Relying on fp exception traps therefore isn't portable, but software that does so can at least portably fail safe if it checks for the -1 return. I'll cook up some RFC patches for arm64, then I could take a look at arm is nobody else is working on it. Cheers ---Dave
Russell King - ARM Linux <linux@armlinux.org.uk> writes: > On Wed, Jan 17, 2018 at 12:15:05PM +0000, Dave Martin wrote: >> On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote: >> > On Tue, Jan 16, 2018 at 04:28:50PM -0600, Eric W. Biederman wrote: >> > > I will keep FPE_FIXME as a place holder until this gets sorted out. >> > > >> > > There is a second issue I am looking at in this location, >> > > and maybe I don't have to address it now. But it looks like the code is >> > > calling send_sig_info instead of force_sig_info for a synchronous >> > > exception. Am I reading that correctly? >> > >> > VFP used to use force_sig_info(), but it seems to be really the wrong >> > call to use. force_sig_info() checks whether the program decided to >> > ignore or block the signal, and if it did, replaces the signal handler >> > with the default handler and unblocks the signal. That ignored and blocked behavior is a very weird implementation, but for a synchronous signal it amounts to enforcing coredump and then exit behavior. As the process does not continue past that point the behavior is not observable by userspace. >> > Are you really suggesting that FP all FP signals should get this >> > treatment? I am only suggesting that all synchronous signals, aka signals where it helps to point at the instruction from the signal information get that treatment. As the vast majority are synchronous I was asking about this one oddball case. >> feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past >> an fp overflow exception, please signal me instead". >> >> If the process also blocked SIGFPE, that could be taken to mean >> "I can't run safely past an fp overflow exception _and_ I can't >> take SIGFPE either" ... i.e., if an fp overflow happens there is >> no way to proceed and it's really fatal. >> >> What SIG_IGN ought to mean is rather more debatable, but again, >> the process could be asking for two opposite things: guarantee a >> SIGFPE is delivered instead of running past an fp exception, and >> also guarantee that SIGFPE is _not_ delivered. >> >> It looks like arm and arm64 are different from most other arches >> (including x86) here, but I'm not sure what is considered correct, and >> it looks like the answer is not standardised. There's a possibility >> that some software goes subtly wrong on arm/arm64 where on other arches >> it would get terminated with SIGKILL. I looked it up yesterday to be clear, and POSIX actually says the behavior is implemenation dependent/undefined if you try to ignore SIGFPE. >> Whether this matters depends on how harmless the fp exception is to >> the work of the program. I think if an exception is set to trap >> via feenableexcept() then that's a strong hint the programmer thinks >> that exception is not harmless. OTOH, trapping is not always >> available anyway... > > Like many of these things, there is no clear answer. It's a set of > conflicting requirements, and as you point out, even if you've called > feenableexcept(), you are not guaranteed to get a trap. > > However, do remember that FP exceptions on ARM hardware are already > asynchronous - they get reported by the _next_ FP operation to the one > that caused them, which means they could be raised by a library function > sometime after it occured (when the library function decides to save the > FP registers to the stack before it makes use of them.) It's entirely > possible that the library function has blocked FP signals temporarily > (not explicitly, just decided to block all signals while it does > something sensitive) and will unblock them again afterwards - at which > point we get the SIGFPE, and it would be quite right to deliver that > signal to the user SIGFPE handler, rather than forcing it onto the > program mid-library function. > > It's also possible that SIGFPE could be blocked by another signal handler > having been invoked, and it triggers the latent generation of the SIGFPE. > > I'd be more inclined to agree with you if VFP exceptions were synchronous > but they aren't. From your description there still seems to be an association with an instruction so I don't know if I would really call the signal asynchronous. It sounds like the exception is delayed and not asynchronous. >> Was there some particular program being broken by the force_sig_info() >> here? > > I don't recall. > commit da41119af78864d27ccbf505949df788d5e8aaf5 > Author: Russell King <rmk@dyn-67.arm.linux.org.uk> > Date: Wed Jun 29 23:02:02 2005 +0100 > > [PATCH] ARM: Don't force SIGFPE > > We were forcing SIGFPE on to a user program for no good reason. > Use send_sig_info() instead. > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> The commit looks like it was a case of the code not looking right and you just switching to send_sig_info. force_sig_info really out to be called something like synchronous_sig. I am looking at sorting that out as part of cleaning up the signal handling. There is currently a small bug where under the right circumstances these synchronous signals might be improperly delivered after a thread specific signal. To make that very unlikely there is a special case in dequeue_signal for synchronous signals but it is imperfect and slower than necessary. The function really ought to look something like: void synch_sig(struct siginfo *info) { struct task_struct *tsk = current; int sig = info->si_signo; struct ksignal ksig; struct k_sigaction *ka; bool blocked, ignored; unsigned long flags; WARN_ON(!siginmask(sig, SYNCHRONOUS_MASK)); copy_siginfo(info, &ksig.info); spin_lock_irqsave(&tsk->sighand->siglock, flags); ka = &tsk->sighand->action[sig - 1]; ignored = ka->sa.sa_handler == SIG_IGN; blocked = sigismember(&tsk->blocked, sig); ksig.ka = *ka; if (blocked || ignored) { ksig.ka.sa.sa_handler = SIG_DFL; } else if (ka->sa.sa_flags & SA_ONESHOT) { ka->sa.sa_handler = SIG_DFL; } spin_unlock_irqrestore(&tsk->sighand->siglock, flags); if (ksig.ka.sa.sa_handler == SIG_DFL) { do_coredump(&ksig->info); do_group_exit(sig); /* NOTREACHED */ } handle_signal(&ksig, signal_pt_regs()); } Which is both clearer and faster and less buggy than the current implementation as it delivers the signal immediately with no chance of the signal queue to reorder things. But it is going to take a little bit to get there as there are a number of implementations of handle_signal like the ones on arm and arm64 that perform needed register adjustments outside of handle_signal. Eric
On Wed, Jan 17, 2018 at 10:45:10AM -0600, Eric W. Biederman wrote: > Russell King - ARM Linux <linux@armlinux.org.uk> writes: > >From your description there still seems to be an association with an > instruction so I don't know if I would really call the signal > asynchronous. It sounds like the exception is delayed and not > asynchronous. Traps can only be passed from ARM coprocessors by a coprocessor refusing to execute an instruction. That's what happens in this case - the VFP gets offered an instruction to execute. It accepts it, and the CPU continues, leaving the VFP to execute its instruction independently. If this instruction generates an error, then nothing happens at this point. That error remains pending until the CPU offers the coprocessor the next VFP instruction, which it refuses. That causes an undefined instruction exception, and we trap into the kernel VFP code which reads the VFP status and works out what needs to be done. What this means is that if you execute a VFP instruction, wait 10 minutes and then execute another VFP instruction, if the first VFP instruction raised an exception, you'll get to hear about it 10 minutes later. You can use whatever weasel words you want to describe that situation, my choice is "asynchronous", your choice is "delayed". However, it is clearly not "synchronous", and arguing that we should report something synchronously that is not reported to _us_ synchronously (where synchronous means "at the same time") is IMHO daft. So, let's take an example: installs SIGFPE handler ..fp instructions.. one of which raises an exception returns to main loop main loop blocks all signals while it sets stuff up calls ppoll() In the synchronous SIGFPE delivery case, the SIGFPE handler will be called when the exception is generated in the FP code, and delivered at that time. The fact that the main loop blocks all signals happens later, so the users handler gets called as one expects. In the VFP case, however, the FP instructions towards the end may not end up causing the exception to be signalled until sometime later, and as I've already explained, that may be the result of a C library function accessing the VFP registers. This could well end up trying to deliver the SIGFPE while signals are blocked, and we get drastically different behaviour if force_sig_info() is used. In the VFP case, if force_sig_info() is used, the program gets killed at this point. In the non-VFP case, the program's signal handler was called. Using send_sig_info() results in the already delayed or asynchronous signal being held off until ppoll() drops the blocking, at which point the signal is delivered, the program handles it in its handler, and the program continues to run. So 1. non-VFP case, program doesn't get killed but gets the opportunity to handle the signal. 2. VFP case with send_sig_info, program doesn't get killed but gets the opportunity to handle the signal. 3. VFP case with force_sig_info, the program gets killed and dumps core. Which one of these results in a big change of behaviour in your opinion?
On Mon, Jan 15, 2018 at 11:23:03AM -0600, Eric W. Biederman wrote: > Dave Martin <Dave.Martin@arm.com> writes: > > > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote: [...] > >> Possible ABI fixes include: > >> - Send the signal without siginfo > >> - Don't generate a signal [...] > >> - Possibly assign and use an appropriate si_code > >> - Don't handle cases which can't happen > > > > I think a mixture of these two is the best approach. > > > > In any case, si_code == 0 here doesn't seem to have any explicit meaning. > > I think we can translate all of the arm64 faults to proper si_codes -- > > see my sketch below. Probably means a bit more thought though. [...] > >> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c [...] > >> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) > >> } > >> > >> static const struct fault_info fault_info[] = { > >> - { do_bad, SIGBUS, 0, "ttbr address size fault" }, > >> - { do_bad, SIGBUS, 0, "level 1 address size fault" }, > >> - { do_bad, SIGBUS, 0, "level 2 address size fault" }, > >> - { do_bad, SIGBUS, 0, "level 3 address size fault" }, If I convert this kind of thing to SIGKILL there really is nothing sensible to put in si_code, except possibly SI_KERNEL (indicating that the kill did not come from userspace). Even so, it hardly seems worth filling in fields like si_pid and si_uid just to make this "correct". In any case, if siginfo is never seen by userspace for SIGKILL this is moot. Obviously, siginfo is never copied to the user stack in that case, but is it also guaranteed not to be visible to userspace by other means? For ptrace I'm hoping not, since SIGKILL should nuke the tracee immediately instead of being reported to the tracer as a signal-delivery-stop -- so the tracer should get WIFSIGNALED() && WTERMSIG() == SIGKILL. A subsequent PTRACE_GETSIGINFO would fail with ESRCH. Does that match your understanding? If so, there is some merit in not pretending to pass a reall value for si_code. Should si_code simply be ignored for the SIGKILL case? Cheers ---Dave
Dave Martin <Dave.Martin@arm.com> writes: > On Mon, Jan 15, 2018 at 11:23:03AM -0600, Eric W. Biederman wrote: >> Dave Martin <Dave.Martin@arm.com> writes: >> >> > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote: > > [...] > >> >> Possible ABI fixes include: >> >> - Send the signal without siginfo >> >> - Don't generate a signal > > [...] > >> >> - Possibly assign and use an appropriate si_code >> >> - Don't handle cases which can't happen >> > >> > I think a mixture of these two is the best approach. >> > >> > In any case, si_code == 0 here doesn't seem to have any explicit meaning. >> > I think we can translate all of the arm64 faults to proper si_codes -- >> > see my sketch below. Probably means a bit more thought though. > > [...] > >> >> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > > [...] > >> >> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) >> >> } >> >> >> >> static const struct fault_info fault_info[] = { >> >> - { do_bad, SIGBUS, 0, "ttbr address size fault" }, >> >> - { do_bad, SIGBUS, 0, "level 1 address size fault" }, >> >> - { do_bad, SIGBUS, 0, "level 2 address size fault" }, >> >> - { do_bad, SIGBUS, 0, "level 3 address size fault" }, > > If I convert this kind of thing to SIGKILL there really is nothing > sensible to put in si_code, except possibly SI_KERNEL (indicating that > the kill did not come from userspace). Even so, it hardly seems worth > filling in fields like si_pid and si_uid just to make this "correct". > > In any case, if siginfo is never seen by userspace for SIGKILL this is > moot. > > Obviously, siginfo is never copied to the user stack in that case, but > is it also guaranteed not to be visible to userspace by other means? > For ptrace I'm hoping not, since SIGKILL should nuke the tracee > immediately instead of being reported to the tracer as a > signal-delivery-stop -- so the tracer should get WIFSIGNALED() && > WTERMSIG() == SIGKILL. A subsequent PTRACE_GETSIGINFO would fail with > ESRCH. > > Does that match your understanding? > > If so, there is some merit in not pretending to pass a reall value > for si_code. > > Should si_code simply be ignored for the SIGKILL case? I know what x86 does in a similar case is it uses force_sig instead of force_sig_info. Then the generic code gets to worry about If the appropriate paths generic paths get to worry about what siginfo to fill in in that case. Which for SI_KERNEL is zero for everything except the si_code and the si_signo. That seems perfectly reasonable. Eric
On Wed, Jan 17, 2018 at 11:24:06AM -0600, Eric W. Biederman wrote: > Dave Martin <Dave.Martin@arm.com> writes: [...] > > Should si_code simply be ignored for the SIGKILL case? > > I know what x86 does in a similar case is it uses force_sig instead of > force_sig_info. Then the generic code gets to worry about > > If the appropriate paths generic paths get to worry about what siginfo > to fill in in that case. Which for SI_KERNEL is zero for everything > except the si_code and the si_signo. > > That seems perfectly reasonable. OK, I'll go with SI_KERNEL then. Cheers ---Dave
Russell King - ARM Linux <linux@armlinux.org.uk> writes: > On Wed, Jan 17, 2018 at 10:45:10AM -0600, Eric W. Biederman wrote: >> Russell King - ARM Linux <linux@armlinux.org.uk> writes: >> >From your description there still seems to be an association with an >> instruction so I don't know if I would really call the signal >> asynchronous. It sounds like the exception is delayed and not >> asynchronous. > > Traps can only be passed from ARM coprocessors by a coprocessor refusing > to execute an instruction. That's what happens in this case - the VFP > gets offered an instruction to execute. It accepts it, and the CPU > continues, leaving the VFP to execute its instruction independently. If > this instruction generates an error, then nothing happens at this point. > > That error remains pending until the CPU offers the coprocessor the next > VFP instruction, which it refuses. That causes an undefined instruction > exception, and we trap into the kernel VFP code which reads the VFP > status and works out what needs to be done. > > What this means is that if you execute a VFP instruction, wait 10 minutes > and then execute another VFP instruction, if the first VFP instruction > raised an exception, you'll get to hear about it 10 minutes later. > > You can use whatever weasel words you want to describe that situation, > my choice is "asynchronous", your choice is "delayed". However, it is > clearly not "synchronous", and arguing that we should report something > synchronously that is not reported to _us_ synchronously (where > synchronous means "at the same time") is IMHO daft. > > So, let's take an example: > > installs SIGFPE handler > ..fp instructions.. one of which raises an exception > returns to main loop > main loop blocks all signals while it sets stuff up > calls ppoll() > > In the synchronous SIGFPE delivery case, the SIGFPE handler will be > called when the exception is generated in the FP code, and delivered > at that time. The fact that the main loop blocks all signals happens > later, so the users handler gets called as one expects. > > In the VFP case, however, the FP instructions towards the end may not > end up causing the exception to be signalled until sometime later, > and as I've already explained, that may be the result of a C library > function accessing the VFP registers. This could well end up trying > to deliver the SIGFPE while signals are blocked, and we get > drastically different behaviour if force_sig_info() is used. > > In the VFP case, if force_sig_info() is used, the program gets killed > at this point. In the non-VFP case, the program's signal handler was > called. > > Using send_sig_info() results in the already delayed or asynchronous > signal being held off until ppoll() drops the blocking, at which point > the signal is delivered, the program handles it in its handler, and > the program continues to run. > > So > 1. non-VFP case, program doesn't get killed but gets the opportunity > to handle the signal. > 2. VFP case with send_sig_info, program doesn't get killed but gets > the opportunity to handle the signal. > 3. VFP case with force_sig_info, the program gets killed and dumps > core. > > Which one of these results in a big change of behaviour in your > opinion? I want to apologize for the disagreement. In part of my due diligence for cleaning up the signal handling I am introducing some helpers for generating siginfo. I decided to ask which kind of helpers should I introduce. Very basic generic helpers that just wrap the current functionality today. Or some slightly smarter helpers that solve some other problems as well. After consideration I am shelving the smarter helpers for now, as the need to introduce the helpers universally is strong, so that I can guarantee struct siginfo is always fully initialized before being passed to userspace. Given the choice between force_sig_info and send_sig_info I agree that send_sig_info is the right choice for signals that can be ignored. The problem I was focusing on is the problem where force_sig_info and send_sig_info can be tricked into causing the instruction pointer to point to the wrong instruction (even when the signal is not blocked), due to the delivery of another signal. So I was wondering if in practice we could introduce a singal delivery function that would operation synchronously and would solve the instruction pointer problem. It looks to me like this location on arm where we are using send_sig_info is a clear candidate for such a function as long as it has a mode where you can say deliverly the signal like send_sig_info if the signal is blocked. Still like I said such a smarter helper is not the priority and I don't intend any semantic changes when I introduce helpers into the signal deliver path. Just fewer places initializing struct siginfo. Eric
diff --git a/arch/arm64/include/uapi/asm/siginfo.h b/arch/arm64/include/uapi/asm/siginfo.h index 574d12f86039..9b4d91277742 100644 --- a/arch/arm64/include/uapi/asm/siginfo.h +++ b/arch/arm64/include/uapi/asm/siginfo.h @@ -21,4 +21,25 @@ #include <asm-generic/siginfo.h> +/* + * SIGFPE si_codes + */ +#ifdef __KERNEL__ +#define FPE_FIXME 0 /* Broken dup of SI_USER */ +#endif /* __KERNEL__ */ + +/* + * SIGBUS si_codes + */ +#ifdef __KERNEL__ +#define BUS_FIXME 0 /* Broken dup of SI_USER */ +#endif /* __KERNEL__ */ + +/* + * SIGTRAP si_codes + */ +#ifdef __KERNEL__ +#define TRAP_FIXME 0 /* Broken dup of SI_USER */ +#endif /* __KERNEL__ */ + #endif diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index fae81f7964b4..ad0edf31e247 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs) asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) { siginfo_t info; - unsigned int si_code = 0; + unsigned int si_code = FPE_FIXME; if (esr & FPEXC_IOF) si_code = FPE_FLTINV; diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 9b7f89df49db..abe200587334 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) info.si_signo = SIGBUS; info.si_errno = 0; - info.si_code = 0; + info.si_code = BUS_FIXME; if (esr & ESR_ELx_FnV) info.si_addr = NULL; else @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) } static const struct fault_info fault_info[] = { - { do_bad, SIGBUS, 0, "ttbr address size fault" }, - { do_bad, SIGBUS, 0, "level 1 address size fault" }, - { do_bad, SIGBUS, 0, "level 2 address size fault" }, - { do_bad, SIGBUS, 0, "level 3 address size fault" }, + { do_bad, SIGBUS, BUS_FIXME, "ttbr address size fault" }, + { do_bad, SIGBUS, BUS_FIXME, "level 1 address size fault" }, + { do_bad, SIGBUS, BUS_FIXME, "level 2 address size fault" }, + { do_bad, SIGBUS, BUS_FIXME, "level 3 address size fault" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 0 translation fault" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 1 translation fault" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 2 translation fault" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" }, - { do_bad, SIGBUS, 0, "unknown 8" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 8" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" }, - { do_bad, SIGBUS, 0, "unknown 12" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 12" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" }, - { do_sea, SIGBUS, 0, "synchronous external abort" }, - { do_bad, SIGBUS, 0, "unknown 17" }, - { do_bad, SIGBUS, 0, "unknown 18" }, - { do_bad, SIGBUS, 0, "unknown 19" }, - { do_sea, SIGBUS, 0, "level 0 (translation table walk)" }, - { do_sea, SIGBUS, 0, "level 1 (translation table walk)" }, - { do_sea, SIGBUS, 0, "level 2 (translation table walk)" }, - { do_sea, SIGBUS, 0, "level 3 (translation table walk)" }, - { do_sea, SIGBUS, 0, "synchronous parity or ECC error" }, // Reserved when RAS is implemented - { do_bad, SIGBUS, 0, "unknown 25" }, - { do_bad, SIGBUS, 0, "unknown 26" }, - { do_bad, SIGBUS, 0, "unknown 27" }, - { do_sea, SIGBUS, 0, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented - { do_sea, SIGBUS, 0, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented - { do_sea, SIGBUS, 0, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented - { do_sea, SIGBUS, 0, "level 3 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented - { do_bad, SIGBUS, 0, "unknown 32" }, + { do_sea, SIGBUS, BUS_FIXME, "synchronous external abort" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 17" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 18" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 19" }, + { do_sea, SIGBUS, BUS_FIXME, "level 0 (translation table walk)" }, + { do_sea, SIGBUS, BUS_FIXME, "level 1 (translation table walk)" }, + { do_sea, SIGBUS, BUS_FIXME, "level 2 (translation table walk)" }, + { do_sea, SIGBUS, BUS_FIXME, "level 3 (translation table walk)" }, + { do_sea, SIGBUS, BUS_FIXME, "synchronous parity or ECC error" }, // Reserved when RAS is implemented + { do_bad, SIGBUS, BUS_FIXME, "unknown 25" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 26" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 27" }, + { do_sea, SIGBUS, BUS_FIXME, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented + { do_sea, SIGBUS, BUS_FIXME, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented + { do_sea, SIGBUS, BUS_FIXME, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented + { do_sea, SIGBUS, BUS_FIXME, "level 3 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented + { do_bad, SIGBUS, BUS_FIXME, "unknown 32" }, { do_alignment_fault, SIGBUS, BUS_ADRALN, "alignment fault" }, - { do_bad, SIGBUS, 0, "unknown 34" }, - { do_bad, SIGBUS, 0, "unknown 35" }, - { do_bad, SIGBUS, 0, "unknown 36" }, - { do_bad, SIGBUS, 0, "unknown 37" }, - { do_bad, SIGBUS, 0, "unknown 38" }, - { do_bad, SIGBUS, 0, "unknown 39" }, - { do_bad, SIGBUS, 0, "unknown 40" }, - { do_bad, SIGBUS, 0, "unknown 41" }, - { do_bad, SIGBUS, 0, "unknown 42" }, - { do_bad, SIGBUS, 0, "unknown 43" }, - { do_bad, SIGBUS, 0, "unknown 44" }, - { do_bad, SIGBUS, 0, "unknown 45" }, - { do_bad, SIGBUS, 0, "unknown 46" }, - { do_bad, SIGBUS, 0, "unknown 47" }, - { do_bad, SIGBUS, 0, "TLB conflict abort" }, - { do_bad, SIGBUS, 0, "Unsupported atomic hardware update fault" }, - { do_bad, SIGBUS, 0, "unknown 50" }, - { do_bad, SIGBUS, 0, "unknown 51" }, - { do_bad, SIGBUS, 0, "implementation fault (lockdown abort)" }, - { do_bad, SIGBUS, 0, "implementation fault (unsupported exclusive)" }, - { do_bad, SIGBUS, 0, "unknown 54" }, - { do_bad, SIGBUS, 0, "unknown 55" }, - { do_bad, SIGBUS, 0, "unknown 56" }, - { do_bad, SIGBUS, 0, "unknown 57" }, - { do_bad, SIGBUS, 0, "unknown 58" }, - { do_bad, SIGBUS, 0, "unknown 59" }, - { do_bad, SIGBUS, 0, "unknown 60" }, - { do_bad, SIGBUS, 0, "section domain fault" }, - { do_bad, SIGBUS, 0, "page domain fault" }, - { do_bad, SIGBUS, 0, "unknown 63" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 34" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 35" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 36" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 37" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 38" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 39" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 40" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 41" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 42" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 43" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 44" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 45" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 46" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 47" }, + { do_bad, SIGBUS, BUS_FIXME, "TLB conflict abort" }, + { do_bad, SIGBUS, BUS_FIXME, "Unsupported atomic hardware update fault" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 50" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 51" }, + { do_bad, SIGBUS, BUS_FIXME, "implementation fault (lockdown abort)" }, + { do_bad, SIGBUS, BUS_FIXME, "implementation fault (unsupported exclusive)" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 54" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 55" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 56" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 57" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 58" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 59" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 60" }, + { do_bad, SIGBUS, BUS_FIXME, "section domain fault" }, + { do_bad, SIGBUS, BUS_FIXME, "page domain fault" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 63" }, }; int handle_guest_sea(phys_addr_t addr, unsigned int esr) @@ -739,11 +739,11 @@ static struct fault_info __refdata debug_fault_info[] = { { do_bad, SIGTRAP, TRAP_HWBKPT, "hardware breakpoint" }, { do_bad, SIGTRAP, TRAP_HWBKPT, "hardware single-step" }, { do_bad, SIGTRAP, TRAP_HWBKPT, "hardware watchpoint" }, - { do_bad, SIGBUS, 0, "unknown 3" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 3" }, { do_bad, SIGTRAP, TRAP_BRKPT, "aarch32 BKPT" }, - { do_bad, SIGTRAP, 0, "aarch32 vector catch" }, + { do_bad, SIGTRAP, TRAP_FIXME, "aarch32 vector catch" }, { early_brk64, SIGTRAP, TRAP_BRKPT, "aarch64 BRK" }, - { do_bad, SIGBUS, 0, "unknown 7" }, + { do_bad, SIGBUS, BUS_FIXME, "unknown 7" }, }; void __init hook_debug_fault_code(int nr, diff --git a/kernel/signal.c b/kernel/signal.c index c894162ec96c..fd182a845490 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2711,6 +2711,10 @@ enum siginfo_layout siginfo_layout(int sig, int si_code) #ifdef FPE_FIXME if ((sig == SIGFPE) && (si_code == FPE_FIXME)) layout = SIL_FAULT; +#endif +#ifdef BUS_FIXME + if ((sig == SIGBUS) && (si_code == BUS_FIXME)) + layout = SIL_FAULT; #endif } return layout;
Setting si_code to 0 results in a userspace seeing an si_code of 0. This is the same si_code as SI_USER. Posix and common sense requires that SI_USER not be a signal specific si_code. As such this use of 0 for the si_code is a pretty horribly broken ABI. Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a value of __SI_KILL and now sees a value of SIL_KILL with the result that uid and pid fields are copied and which might copying the si_addr field by accident but certainly not by design. Making this a very flakey implementation. Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return SIL_FAULT and the appropriate fields will be reliably copied. But folks this is a new and unique kind of bad. This is massively untested code bad. This is inventing new and unique was to get siginfo wrong bad. This is don't even think about Posix or what siginfo means bad. This is lots of eyeballs all missing the fact that the code does the wrong thing bad. This is getting stuck and keep making the same mistake bad. I really hope we can find a non userspace breaking fix for this on a port as new as arm64. Possible ABI fixes include: - Send the signal without siginfo - Don't generate a signal - Possibly assign and use an appropriate si_code - Don't handle cases which can't happen Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tyler Baicar <tbaicar@codeaurora.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: Olof Johansson <olof@lixom.net> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-arm-kernel@lists.infradead.org Ref: 53631b54c870 ("arm64: Floating point and SIMD") Ref: 32015c235603 ("arm64: exception: handle Synchronous External Abort") Ref: 1d18c47c735e ("arm64: MMU fault handling and page table management") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- arch/arm64/include/uapi/asm/siginfo.h | 21 +++++++ arch/arm64/kernel/fpsimd.c | 2 +- arch/arm64/mm/fault.c | 114 +++++++++++++++++----------------- kernel/signal.c | 4 ++ 4 files changed, 83 insertions(+), 58 deletions(-)