diff mbox

sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

Message ID CA+55aFwoThxqUeAZSsMhf--ODyhGkmOENH5R6=4+CuaopFx9eA@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Linus Torvalds April 13, 2018, 4:33 p.m. UTC
On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
>
> Yes, it does solve the problem at hand with strace - the exact patch I
> tested against 4.16 is below.

Ok, good.

> However, FPE_FLTUNK is not defined in older kernels, so while we can
> fix it this way for the current merge window, that doesn't help 4.16.

I wonder if we should even bother with FPE_FLTUNK.

I suspect we might as well use FPE_FLTINV, I suspect, and not have
this complexity at all. That case is not worth worrying about, since
it's a "this shouldn't happen anyway" and the *real* reason will be in
the kernel logs due to vfs_panic().

So it's not like this is something that the user should ever care
about the si_code about.

> Given that the path we're talking about is unlikely to happen (as
> mentioned in my second paragraph) I still think reverting Eric's patch
> is the right way forward for older kernels.

I'd much rather get rid of that FPE_FIXME, and leave that whole mess behind.

So the attached patch seems simple and should work with 4.16 too.

Let's not leave this as some kind of nasty maintenance issue, and just
go for simple and stupid.

Hmm?

                Linus

Comments

Dave Martin April 13, 2018, 5:08 p.m. UTC | #1
On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote:
> On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> >
> > Yes, it does solve the problem at hand with strace - the exact patch I
> > tested against 4.16 is below.
> 
> Ok, good.
> 
> > However, FPE_FLTUNK is not defined in older kernels, so while we can
> > fix it this way for the current merge window, that doesn't help 4.16.
> 
> I wonder if we should even bother with FPE_FLTUNK.
> 
> I suspect we might as well use FPE_FLTINV, I suspect, and not have
> this complexity at all. That case is not worth worrying about, since
> it's a "this shouldn't happen anyway" and the *real* reason will be in
> the kernel logs due to vfs_panic().
> 
> So it's not like this is something that the user should ever care
> about the si_code about.

Ack, my intended meaning for FPE_FLTUNK is that the fp exception is
either spurious or we can't tell easily (or possibly at all) which
FPE_XXX should be returned.  It's up to userspace to figure it out
if it really cares.  Previously we were accidentally returning SI_USER
in si_code for arm64.

This case on arm looks like a more serious error for which FPE_FLTINV
may be more appropriate anyway.

[...]

Cheers
---Dave
Russell King (Oracle) April 13, 2018, 5:54 p.m. UTC | #2
On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote:
> On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote:
> > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux
> > <linux@armlinux.org.uk> wrote:
> > >
> > > Yes, it does solve the problem at hand with strace - the exact patch I
> > > tested against 4.16 is below.
> > 
> > Ok, good.
> > 
> > > However, FPE_FLTUNK is not defined in older kernels, so while we can
> > > fix it this way for the current merge window, that doesn't help 4.16.
> > 
> > I wonder if we should even bother with FPE_FLTUNK.
> > 
> > I suspect we might as well use FPE_FLTINV, I suspect, and not have
> > this complexity at all. That case is not worth worrying about, since
> > it's a "this shouldn't happen anyway" and the *real* reason will be in
> > the kernel logs due to vfs_panic().
> > 
> > So it's not like this is something that the user should ever care
> > about the si_code about.
> 
> Ack, my intended meaning for FPE_FLTUNK is that the fp exception is
> either spurious or we can't tell easily (or possibly at all) which
> FPE_XXX should be returned.  It's up to userspace to figure it out
> if it really cares.  Previously we were accidentally returning SI_USER
> in si_code for arm64.
> 
> This case on arm looks like a more serious error for which FPE_FLTINV
> may be more appropriate anyway.

No.  The cases where we get to this point are:

1. A trap concerning a coprocessor register transfer instruction (iow, move
   between a VFP register and ARM register.)
2. A trap concerning a coprocessor register load or save instruction.

(In both of these, "concerning" means that the VFP hardware provides
such an instruction as the reason for the fault, *not* that it is the
faulting instruction.)

3. A combination of the exception bits (EX and DEX) on certain VFP
   implementations.

All of these can be summarised as "the hardware went wrong in some way"
rather than "the user program did something wrong."

FPE_FLTINV means "floating point invalid operation".  Does it really
cover the case where hardware has failed, or is it intended to cover
the case where userspace did something wrong and asked for an invalid
operation from the FP hardware?
Linus Torvalds April 13, 2018, 6:23 p.m. UTC | #3
On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
>
> FPE_FLTINV means "floating point invalid operation".  Does it really
> cover the case where hardware has failed, or is it intended to cover
> the case where userspace did something wrong and asked for an invalid
> operation from the FP hardware?

Note that the number of people who actually look at the si_code is
approximately zero.

But the ones that _do_ check the si_code are certainly not going to
check it against a new code that they don't know about.

I suspect that if you start searching for FLT_xyz occurrences in code,
approximately 100% of them are from the kernel code that generates
them, not from any actual users.

So I'd be very surprised if you can find *anybody* who cares about
that exact value (with the possible exceptions of test-suites).

Sadly, google code-search is no more. It was useful for things like that.

                Linus
Dave Martin April 13, 2018, 6:35 p.m. UTC | #4
On Fri, Apr 13, 2018 at 06:54:08PM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote:
> > On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote:
> > > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux
> > > <linux@armlinux.org.uk> wrote:
> > > >
> > > > Yes, it does solve the problem at hand with strace - the exact patch I
> > > > tested against 4.16 is below.
> > > 
> > > Ok, good.
> > > 
> > > > However, FPE_FLTUNK is not defined in older kernels, so while we can
> > > > fix it this way for the current merge window, that doesn't help 4.16.
> > > 
> > > I wonder if we should even bother with FPE_FLTUNK.
> > > 
> > > I suspect we might as well use FPE_FLTINV, I suspect, and not have
> > > this complexity at all. That case is not worth worrying about, since
> > > it's a "this shouldn't happen anyway" and the *real* reason will be in
> > > the kernel logs due to vfs_panic().
> > > 
> > > So it's not like this is something that the user should ever care
> > > about the si_code about.
> > 
> > Ack, my intended meaning for FPE_FLTUNK is that the fp exception is
> > either spurious or we can't tell easily (or possibly at all) which
> > FPE_XXX should be returned.  It's up to userspace to figure it out
> > if it really cares.  Previously we were accidentally returning SI_USER
> > in si_code for arm64.
> > 
> > This case on arm looks like a more serious error for which FPE_FLTINV
> > may be more appropriate anyway.
> 
> No.  The cases where we get to this point are:
> 
> 1. A trap concerning a coprocessor register transfer instruction (iow, move
>    between a VFP register and ARM register.)
> 2. A trap concerning a coprocessor register load or save instruction.
> 
> (In both of these, "concerning" means that the VFP hardware provides
> such an instruction as the reason for the fault, *not* that it is the
> faulting instruction.)
> 
> 3. A combination of the exception bits (EX and DEX) on certain VFP
>    implementations.
> 
> All of these can be summarised as "the hardware went wrong in some way"
> rather than "the user program did something wrong."

Although my understanding of VFP bounces is a bit hazy, I think this is
broadly in line with my assumptions.

> FPE_FLTINV means "floating point invalid operation".  Does it really
> cover the case where hardware has failed, or is it intended to cover
> the case where userspace did something wrong and asked for an invalid
> operation from the FP hardware?

So, there's an argument that FPE_FLTINV is not really correct.  My
rationale was that there is nothing correct that we can return, and
FPE_FLTINV may be no worse than the alternatives.

If we can only hit this case as the result of a hardware failure
or kernel bug though, should this be delivered as SIGKILL instead?

That's the approach I eventually followed for various exceptions
on arm64 that were theoretically delivered to userspace with si_code==0,
but really should be impossible unless and kernel and/or hardware
is buggy.

If that's the case though, I don't see how a userspace testsuite is
hitting this code path.  Maybe I've misunderstood the context of this
thread.

Cheers
---Dave
Dave Martin April 13, 2018, 6:45 p.m. UTC | #5
On Fri, Apr 13, 2018 at 11:23:36AM -0700, Linus Torvalds wrote:
> On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> >
> > FPE_FLTINV means "floating point invalid operation".  Does it really
> > cover the case where hardware has failed, or is it intended to cover
> > the case where userspace did something wrong and asked for an invalid
> > operation from the FP hardware?
> 
> Note that the number of people who actually look at the si_code is
> approximately zero.
> 
> But the ones that _do_ check the si_code are certainly not going to
> check it against a new code that they don't know about.
> 
> I suspect that if you start searching for FLT_xyz occurrences in code,
> approximately 100% of them are from the kernel code that generates
> them, not from any actual users.
> 
> So I'd be very surprised if you can find *anybody* who cares about
> that exact value (with the possible exceptions of test-suites).
> 
> Sadly, google code-search is no more. It was useful for things like that.

I've found https://codesearch.debian.net/ useful for digging into this
kind of question, though it tends to throw up a lot of false positives.

Most uses I've seen do nothing more than use the FPE_xyz value to
format diagnostic messages while dying.  I struggled to find code that
made a meaningful functional decision based on the value, though that's
not proof...

Cheers
---Dave
Russell King (Oracle) April 13, 2018, 6:50 p.m. UTC | #6
On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote:
> If that's the case though, I don't see how a userspace testsuite is
> hitting this code path.  Maybe I've misunderstood the context of this
> thread.

It isn't hitting this exact case.

The userspace testsuite is hitting an entirely different case:

	kill(getpid(), SIGFPE);

As one expects, this generates a SIGFPE to the current process, which
then inspects the siginfo structure.  Being a userspace generated
signal, si_code is set to SI_USER, which has the value 0.

With FPE_FIXME defined to zero, as Eric has done:

enum siginfo_layout siginfo_layout(int sig, int si_code)
{
        enum siginfo_layout layout = SIL_KILL;
        if ((si_code > SI_USER) && (si_code < SI_KERNEL)) {
...
        } else {
...
#ifdef FPE_FIXME
                if ((sig == SIGFPE) && (si_code == FPE_FIXME))
                        layout = SIL_FAULT;
#endif
        }
        return layout;
}

This causes siginfo_layout() to return SIL_FAULT for this userspace
generated signal, rather than the correct SIL_KILL.

This affects which fields we copy to userspace.

SI_USER is defined to pass si_pid and si_uid to the userspace process,
which on ARM are the first two consecutive 32-bit quantities in the
union, which is done when siginfo_layout() returns SIL_KILL.  However,
when SIL_FAULT is returned, we only copy si_addr in the union, which
on ARM is just one 32-bit quantity.

Consequently, userspace sees a correct value for si_pid, and si_uid
remains set to whatever was there in userspace.  In the case of the
strace program, that's zero.  This means if you run the strace
testsuite as root, the problem doesn't appear, but if you run it as
a non-root user, it will.

So, the testsuite case has little to do with the behaviour of the VFP
handling - it's to do with the behaviour of the kernel's signal handling.
Dave Martin April 13, 2018, 6:56 p.m. UTC | #7
On Fri, Apr 13, 2018 at 07:50:17PM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote:
> > If that's the case though, I don't see how a userspace testsuite is
> > hitting this code path.  Maybe I've misunderstood the context of this
> > thread.
> 
> It isn't hitting this exact case.
> 
> The userspace testsuite is hitting an entirely different case:
> 
> 	kill(getpid(), SIGFPE);
> 
> As one expects, this generates a SIGFPE to the current process, which
> then inspects the siginfo structure.  Being a userspace generated
> signal, si_code is set to SI_USER, which has the value 0.
> 
> With FPE_FIXME defined to zero, as Eric has done:
> 
> enum siginfo_layout siginfo_layout(int sig, int si_code)
> {
>         enum siginfo_layout layout = SIL_KILL;
>         if ((si_code > SI_USER) && (si_code < SI_KERNEL)) {
> ...
>         } else {
> ...
> #ifdef FPE_FIXME
>                 if ((sig == SIGFPE) && (si_code == FPE_FIXME))
>                         layout = SIL_FAULT;
> #endif
>         }
>         return layout;
> }
> 
> This causes siginfo_layout() to return SIL_FAULT for this userspace
> generated signal, rather than the correct SIL_KILL.
> 
> This affects which fields we copy to userspace.
> 
> SI_USER is defined to pass si_pid and si_uid to the userspace process,
> which on ARM are the first two consecutive 32-bit quantities in the
> union, which is done when siginfo_layout() returns SIL_KILL.  However,
> when SIL_FAULT is returned, we only copy si_addr in the union, which
> on ARM is just one 32-bit quantity.
> 
> Consequently, userspace sees a correct value for si_pid, and si_uid
> remains set to whatever was there in userspace.  In the case of the
> strace program, that's zero.  This means if you run the strace
> testsuite as root, the problem doesn't appear, but if you run it as
> a non-root user, it will.
> 
> So, the testsuite case has little to do with the behaviour of the VFP
> handling - it's to do with the behaviour of the kernel's signal handling.

Oh, right.  So, going back to the unhandled VFP bounce question,
is it reasonable for that to be a SIGKILL?  That avoids the question
of what si_code userspace should see, because userspace doesn't get
to see siginfo at all in that case: it's dead.

Or do we hit this in real situations that we want userspace to bail out
of more gracefully?

Cheers
---Dave
Linus Torvalds April 13, 2018, 7:53 p.m. UTC | #8
On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin <Dave.Martin@arm.com> wrote:
>
> Most uses I've seen do nothing more than use the FPE_xyz value to
> format diagnostic messages while dying.  I struggled to find code that
> made a meaningful functional decision based on the value, though that's
> not proof...

Yeah. I've seen code that cares about SIGFPE deeply, but it's almost
invariably about some emulated environment (eg Java VM, or CPU
emulation).

And the siginfo data is basically never good enough for those
environments anyway on its own, so they will go and look at the actual
instruction that caused the fault and the register state instead,
because they need *all* the information.

The cases that use si_code are the ones that just trapped signals in
order to give a more helpful abort message.

So I could certainly imagine that si_code is actually used by somebody
who then decides to actuall act differently on it, but aside from
perhaps printing out a different message, it sounds far-fetched.

                Linus
Russell King (Oracle) April 15, 2018, 1:12 p.m. UTC | #9
On Fri, Apr 13, 2018 at 12:53:49PM -0700, Linus Torvalds wrote:
> On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin <Dave.Martin@arm.com> wrote:
> >
> > Most uses I've seen do nothing more than use the FPE_xyz value to
> > format diagnostic messages while dying.  I struggled to find code that
> > made a meaningful functional decision based on the value, though that's
> > not proof...
> 
> Yeah. I've seen code that cares about SIGFPE deeply, but it's almost
> invariably about some emulated environment (eg Java VM, or CPU
> emulation).
> 
> And the siginfo data is basically never good enough for those
> environments anyway on its own, so they will go and look at the actual
> instruction that caused the fault and the register state instead,
> because they need *all* the information.
> 
> The cases that use si_code are the ones that just trapped signals in
> order to give a more helpful abort message.
> 
> So I could certainly imagine that si_code is actually used by somebody
> who then decides to actuall act differently on it, but aside from
> perhaps printing out a different message, it sounds far-fetched.

Okay, in that case let's just use FPE_FLTINV.  That makes the patch
easily back-portable for stable kernels.
Eric W. Biederman April 15, 2018, 3:22 p.m. UTC | #10
Russell King - ARM Linux <linux@armlinux.org.uk> writes:

> On Fri, Apr 13, 2018 at 12:53:49PM -0700, Linus Torvalds wrote:
>> On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin <Dave.Martin@arm.com> wrote:
>> >
>> > Most uses I've seen do nothing more than use the FPE_xyz value to
>> > format diagnostic messages while dying.  I struggled to find code that
>> > made a meaningful functional decision based on the value, though that's
>> > not proof...
>> 
>> Yeah. I've seen code that cares about SIGFPE deeply, but it's almost
>> invariably about some emulated environment (eg Java VM, or CPU
>> emulation).
>> 
>> And the siginfo data is basically never good enough for those
>> environments anyway on its own, so they will go and look at the actual
>> instruction that caused the fault and the register state instead,
>> because they need *all* the information.
>> 
>> The cases that use si_code are the ones that just trapped signals in
>> order to give a more helpful abort message.
>> 
>> So I could certainly imagine that si_code is actually used by somebody
>> who then decides to actuall act differently on it, but aside from
>> perhaps printing out a different message, it sounds far-fetched.
>
> Okay, in that case let's just use FPE_FLTINV.  That makes the patch
> easily back-portable for stable kernels.

If we want to I don't think  backporting 266da65e9156 ("signal: Add
FPE_FLTUNK si_code for undiagnosable fp exceptions") would be at
all difficult.

What it is changing has been stable for quite a while.  The surroundings
might change and so it might require some trivial manual fixup but I
don't expect any problems.

Not that I want to derail the consensus but if we want to backport
similar fixes for arm64 or the other architectures that wind up using
FPE_FLTUNK for their fix we would need to backport 266da65e9156 anyway.

Eric
Eric W. Biederman April 15, 2018, 3:56 p.m. UTC | #11
Linus,

Would you consider the patchset below for -rc2?

Dealing with the aliases of SI_USER has been a challenge as we have had
a b0rked ABI in some cases since 2.5.

So far no one except myself has suggested that changing the si_code of
from 0 to something else for those problematic aliases of SI_USER is
going to be a problem.  So it looks like just fixing the issue is a real
possibility.

Fixing the cases that do kill(SIGFPE, ...) because at least test cases
care seems important.

The best fixes to backport appear to be the real architecture fixes that
remove the aliases for SI_USER as those are deep fixes that
fundamentally fix the problems, and are also very small changes.

I am not yet brave enough to merge architectural fixes like that without
arch maintainers buy-in.   Getting at least an ack if nothing else takes
a little bit of time.

Still we have a arm fix upthread and David Miller has given his nod
to a sparc fix that uses FPE_FLTUNK.  So it appears real architecture
fixes are progressing.  Further I have looked and that leaves only
powerpc, parisc, ia64, and alpha.   The new si_code FPE_FLTUNK appears
to address most of those, and there is an untested patch for parisc.

So real progress appears possible.

The generic code can do better, and that is what this rfc patchset is
about.  It ensures siginfo is fully initialized and uses copy_to_user
to copy siginfo to userspace.  This takes siginfo_layout out of
the picture and so for non-compat non-signalfd siginfos the status
quo returns to what it was before I introduced siginfo_layout (AKA
regressions go bye-bye).

I believe given the issues these changes are a candiate for -rc2.
Otherwise I will keep these changes for the next merge window.

Eric W. Biederman (3):
      signal: Ensure every siginfo we send has all bits initialized
      signal: Reduce copy_siginfo_to_user to just copy_to_user
      signal: Stop special casing TRAP_FIXME and FPE_FIXME in siginfo_layout

 arch/alpha/kernel/osf_sys.c               |  1 +
 arch/alpha/kernel/signal.c                |  2 +
 arch/alpha/kernel/traps.c                 |  5 ++
 arch/alpha/mm/fault.c                     |  2 +
 arch/arc/mm/fault.c                       |  2 +
 arch/arm/kernel/ptrace.c                  |  1 +
 arch/arm/kernel/swp_emulate.c             |  1 +
 arch/arm/kernel/traps.c                   |  5 ++
 arch/arm/mm/alignment.c                   |  1 +
 arch/arm/mm/fault.c                       |  5 ++
 arch/arm/vfp/vfpmodule.c                  |  3 +-
 arch/arm64/kernel/fpsimd.c                |  2 +-
 arch/arm64/kernel/sys_compat.c            |  1 +
 arch/arm64/kernel/traps.c                 |  1 +
 arch/arm64/mm/fault.c                     | 18 ++++--
 arch/c6x/kernel/traps.c                   |  1 +
 arch/hexagon/kernel/traps.c               |  1 +
 arch/hexagon/mm/vm_fault.c                |  1 +
 arch/ia64/kernel/brl_emu.c                |  1 +
 arch/ia64/kernel/signal.c                 |  2 +
 arch/ia64/kernel/traps.c                  | 27 ++++++++-
 arch/ia64/kernel/unaligned.c              |  1 +
 arch/ia64/mm/fault.c                      |  4 +-
 arch/m68k/kernel/traps.c                  |  2 +
 arch/microblaze/kernel/exceptions.c       |  1 +
 arch/microblaze/mm/fault.c                |  4 +-
 arch/mips/mm/fault.c                      |  1 +
 arch/nds32/kernel/traps.c                 |  6 +-
 arch/nds32/mm/fault.c                     |  1 +
 arch/nios2/kernel/traps.c                 |  1 +
 arch/openrisc/kernel/traps.c              |  5 +-
 arch/openrisc/mm/fault.c                  |  1 +
 arch/parisc/kernel/ptrace.c               |  1 +
 arch/parisc/kernel/traps.c                |  2 +
 arch/parisc/kernel/unaligned.c            |  1 +
 arch/parisc/math-emu/driver.c             |  1 +
 arch/parisc/mm/fault.c                    |  1 +
 arch/powerpc/kernel/process.c             |  1 +
 arch/powerpc/kernel/traps.c               |  3 +-
 arch/powerpc/mm/fault.c                   |  1 +
 arch/powerpc/platforms/cell/spufs/fault.c |  2 +-
 arch/riscv/kernel/traps.c                 |  1 +
 arch/s390/kernel/traps.c                  |  5 +-
 arch/s390/mm/fault.c                      |  2 +
 arch/sh/kernel/hw_breakpoint.c            |  1 +
 arch/sh/kernel/traps_32.c                 |  2 +
 arch/sh/math-emu/math.c                   |  1 +
 arch/sh/mm/fault.c                        |  1 +
 arch/sparc/kernel/process_64.c            |  1 +
 arch/sparc/kernel/sys_sparc_32.c          |  1 +
 arch/sparc/kernel/traps_32.c              | 10 ++++
 arch/sparc/kernel/traps_64.c              | 14 +++++
 arch/sparc/kernel/unaligned_32.c          |  1 +
 arch/sparc/mm/fault_32.c                  |  1 +
 arch/sparc/mm/fault_64.c                  |  1 +
 arch/um/kernel/trap.c                     |  2 +
 arch/unicore32/kernel/fpu-ucf64.c         |  2 +-
 arch/unicore32/mm/fault.c                 |  3 +
 arch/x86/entry/vsyscall/vsyscall_64.c     |  2 +-
 arch/x86/kernel/ptrace.c                  |  2 +-
 arch/x86/kernel/traps.c                   |  3 +
 arch/x86/kernel/umip.c                    |  1 +
 arch/x86/kvm/mmu.c                        |  1 +
 arch/x86/mm/fault.c                       |  1 +
 arch/xtensa/kernel/traps.c                |  1 +
 arch/xtensa/mm/fault.c                    |  1 +
 include/linux/ptrace.h                    |  1 -
 include/linux/tracehook.h                 |  1 +
 kernel/signal.c                           | 93 +------------------------------
 virt/kvm/arm/mmu.c                        |  1 +
 70 files changed, 165 insertions(+), 115 deletions(-)

Eric
Linus Torvalds April 15, 2018, 6:16 p.m. UTC | #12
(

On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Would you consider the patchset below for -rc2?

Ugh.

I have an irrational dislike of "clear_siginfo()". It's a nasty broken
interface, which just mis-spells "memset()", and makes it non-obvious
that you could clear it other ways too (eg "kzalloc()" or whatever).

And this series brings in a lot of new users.

Honestly, both "clear_siginfo()" and "copy_siginfo()" are crazy
interfaces. They may have made sense when converting code, but not so
much now.

If we want to have a function that initializes a siginfo, I think it
should _actually_ initialize it, and at least take the two fields that
a siginfo has to have to be valid: si_signo and si_code.

So I'd rather see a patch that does something like this:

  -             clear_siginfo(info);
  -             info->si_signo = sig;
  -             info->si_errno = 0;
  -             info->si_code = SI_USER;
  -             info->si_pid = 0;
  -             info->si_uid = 0;
  +             init_siginfo(info, sig, SI_USER);

which at least makes that function be *worth* something, not just a
bad spelling of memset. It's not just the removal of pointless "set to
zero", it's the whole concept of "this function actually makes a valid
siginfo", which is lacking in the existing function.

(Yeah, yeah, si_errno is "generic" too and always part of a siginfo,
but nobody cares. It's pretty much always set to zero, it would be
stupid to add that interface to the "init_siginfo()" function. So just
clearing it is fine, the one or two places that want to set it to some
silly value can do it themselves).

Then your series would incidentally also:

 (a) make for fewer lines overall, rather than add lines

 (b) make it clear that we always initialize si_code, which now *must*
be a valid value with all the recent siginfo changes.

Hmm?

The other thing we should do is to get rid of the stupid padding.
Right now "struct siginfo" is pointlessly padded to 128 bytes. That is
completely insane, when it's always just zero in the kernel.

So put that _pad[] thing inside #ifndef __KERNEL__, and make
copy_siginfo_to_user() write the padding zeroes when copying to user
space. The reason for the padding is "future expansion", so we do want
to tell the user space that it's maybe up to 128 bytes in size, but if
we don't fill it all, we shouldn't waste time and memory on clearing
the padding internally.

I'm certainly *hoping* nobody depends on the whole 128 bytes in
rt_sigqueueinfo(). In theory you can fill it all (as long as si_code
is negative), but the man-page only says "si_value", and the compat
function doesn't copy any more than that either, so any user that
tries to fill in more than si_value is already broken. In fact, it
might even be worth enforcing that in rt_sigqueueinfo(), just to see
if anybody plays any games..

On x86-64, without the pointless padding, the size of 'struct siginfo'
inside the kernel would be 48 bytes. That's quite a big difference for
something that is often allocated on the kernel stack.

So I'm certainly willing to make those kinds of changes, but let's
make them real *improvements* now, ok? Wasn't that the point of all
the cleanups in the end?

                       Linus
Eric W. Biederman April 16, 2018, 2:03 a.m. UTC | #13
Linus Torvalds <torvalds@linux-foundation.org> writes:

> (
>
> On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Would you consider the patchset below for -rc2?
>
> Ugh.

The point of this series is to squash the potential for regressions even
from the weird broken code that fills in fields for si_code 0 that do
not match SI_USER.

The copy_siginfo_to_user32 never handled that so we don't need to worry
about that.  All of the signals that abuse si_code 0 go through
force_sig_info so signalfd can not catch them.  Which means that only
copy_siginfo_to_user needs to be worried about.

Last time I was benchmarking I could not see a difference between
copying the entire siginfo and only a small part so I don't see
the point of optimizing now.

For an actual cleanup I intend to go farther than you are proposing and
convert everthing to the set of helper functions I have added to
kernel/signal.c force_sig_fault, force_sig_mceerr, force_sig_bnderr,
force_sig_pkuerr.

When I am done there will be maybe 5 instances of clear_siginfo outside
of those helper functions.  At which point I won't care if we remove
clear_siginfo and just use memset.  That is a real improvement
that looks something like: "105 files changed, 933 insertions(+), 1698 deletions(-)"
That is my end goal with all of this.

You complained to me about regressions and you are right with the
current handling of FPE_FIXME TRAP_FIXME the code is not bug compatible
with old versions of linux, and people have noticed.

What this patchset represents is the bare minimum needed to convert
copy_siginfo_to_user to a copy_to_user, and remove the possibility
of regressions.

To that end I need to ensure every struct siginfo in the entire
kernel is memset to 0.  A structure initializer is absolutely
not enough.  So I don't mind people thinking clear_siginfo is necessary.


To that end clear_siginfo where it does not take the size of
struct siginfo as a parameter is much less suceptible to typos,
and allows me to ensure every instance of struct siginfo is fully
initialized.

I fully intend to remove practically every instance of struct siginfo
from the architecture specific code, and use the aforementioned helpers.
The trouble is that some of the architecture code need refactoring
for that to happen.  Even your proposed init_siginfo can't be widely
used as a common pattern is to fill in siginfo partially in the
code with si_code and si_signo being filled in almost last.

So in short.  I intend to remove most of these clear_siginfo calls when
I remove the siginfos later.  This series is focused on making
copy_siginfo_to_user simple enough we don't need to use siginfo_layout,
and thus can be fully backwards compatibile with ourselves and we won't
need to worry about regressions.  I have aimed to keep this simple
enough we can merge this after -rc1 because we don't want regressions.

Eric

ps.  I intend to place this change first in my series wether or not it makes
it into -rc2 so that I can be certain we remove any possible regressions
in behavior on the buggy architectures.  Then I can take my time and
ensure the non-trivial changes of refactoring etc are done carefully so
I don't introduce other bugs.  I need that so I can sleep at night.

pps.  I can look at some of your other suggestions but cleverness leads
to regressions, and if you are going to complain at me harshly when I
have been being careful and taking things seriously I am not
particularly willing to take unnecessary chances.
Eric W. Biederman April 18, 2018, 5:58 p.m. UTC | #14
Linus Torvalds <torvalds@linux-foundation.org> writes:

> (
>
> On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:

[snip bit about wanting what is effectively force_sig_fault instead of
clear_siginfo everywhere]

> The other thing we should do is to get rid of the stupid padding.
> Right now "struct siginfo" is pointlessly padded to 128 bytes. That is
> completely insane, when it's always just zero in the kernel.
>
> So put that _pad[] thing inside #ifndef __KERNEL__, and make
> copy_siginfo_to_user() write the padding zeroes when copying to user
> space. The reason for the padding is "future expansion", so we do want
> to tell the user space that it's maybe up to 128 bytes in size, but if
> we don't fill it all, we shouldn't waste time and memory on clearing
> the padding internally.
>
> I'm certainly *hoping* nobody depends on the whole 128 bytes in
> rt_sigqueueinfo(). In theory you can fill it all (as long as si_code
> is negative), but the man-page only says "si_value", and the compat
> function doesn't copy any more than that either, so any user that
> tries to fill in more than si_value is already broken. In fact, it
> might even be worth enforcing that in rt_sigqueueinfo(), just to see
> if anybody plays any games..


From my earlier looking I think this is all doable except detecting
if

> On x86-64, without the pointless padding, the size of 'struct siginfo'
> inside the kernel would be 48 bytes. That's quite a big difference for
> something that is often allocated on the kernel stack.

From my earlier looking I can say that I know of no case where signals
are injected into the kernel that we need more bytes than the kernel
currently provides.

The two cases I know of are signal reinjection for checkpoint/restart
or something that just uses pid, uid and ptr.   Generally that is
enough.

If we just truncate siginfo to everything except the pad bytes in the
kernel there should be no problems.  What I don't see how to do is to
limit the size of struct siginfo the kernel accepts in a forward
compatible way.  Something that would fail if for some reason you used
more siginfo bytes.

Looking at userspace.  glibc always memsets siginfo_t to 0.
Criu just uses whatever it captured with PTRACE_PEEKSIGINFO,
and criu uses unitialized memory for the target of PTRACE_PEEKSIGINFO.

If we truncate siginfo in the kernel then we check for a known si_code
in which case we are safe to truncate siginfo.  If the si_code is
unknown then we should check to see if the extra bytes are 0.  That
should work with everything.  Plus it is a natural point to test to
see if userspace is using signals that the kernel does not currently
support.

I will put that in my queue.

> So I'm certainly willing to make those kinds of changes, but let's
> make them real *improvements* now, ok? Wasn't that the point of all
> the cleanups in the end?

Definitely.  With the strace test case causing people to talk about
regressions I was just looking to see if it would make sense to do
something minimal for -rc2 so reduce concerns about regressions.

Now I am going to focus on getting what I can ready for the next merge
window.

Eric
Dave Martin April 19, 2018, 9:28 a.m. UTC | #15
On Sun, Apr 15, 2018 at 11:16:04AM -0700, Linus Torvalds wrote:

[...]

> The other thing we should do is to get rid of the stupid padding.
> Right now "struct siginfo" is pointlessly padded to 128 bytes. That is
> completely insane, when it's always just zero in the kernel.

Agreed, inside the kernel the padding achieves nothing.

> So put that _pad[] thing inside #ifndef __KERNEL__, and make
> copy_siginfo_to_user() write the padding zeroes when copying to user
> space. The reason for the padding is "future expansion", so we do want
> to tell the user space that it's maybe up to 128 bytes in size, but if
> we don't fill it all, we shouldn't waste time and memory on clearing
> the padding internally.
> 
> I'm certainly *hoping* nobody depends on the whole 128 bytes in
> rt_sigqueueinfo(). In theory you can fill it all (as long as si_code
> is negative), but the man-page only says "si_value", and the compat
> function doesn't copy any more than that either, so any user that
> tries to fill in more than si_value is already broken. In fact, it
> might even be worth enforcing that in rt_sigqueueinfo(), just to see
> if anybody plays any games..

[...]

Digression:

Since we don't traditionally zero the tail-padding in the user sigframe,
is there a reliable way for userspace to detect newly-added fields in
siginfo other than by having an explicit sigaction sa_flags flag to
request them?  I imagine something like [1] below from the userspace
perspective.

On a separate thread, the issue of how to report syndrome information
for SIGSEGV came up [2] (such as whether the faulting instruction was a
read or a write).  This information is useful (and used) by things like
userspace sanitisers and qemu.  Currently, reporting this to userspace
relies on arch-specific cruft in the sigframe.

We're committed to maintaining what's already in each arch sigframe,
but it would be preferable to have a portable way of adding information
to siginfo in a generic way.  si_code doesn't really work for that,
since si_codes are mutually exclusive: I can't see a way of adding
supplementary information using si_code.

Anyway, that would be a separate RFC in the future (if ever).

Cheers
---Dave


[1]

static volatile int have_extflags = 0;

static void handler(int n, siginfo_t *si, void *uc)
{
	/* ... */

	if (have_extflags) {
		/* Check si->si_extflags */
	} else {
		/* fallback */
	}

	/* ... */
}

int main(void)
{
	/* ... */

	struct sigaction sa;

	/* ... */

	sa.sa_flags = SA_SIGINFO | SA_SIGINFO_EXTFLAGS;
	sa.sa_sigaction = handler;
	if (!sigaction(SIGSEGV, &sa, NULL)) {
		have_extflags = 1;
	} else {
		sa.sa_flags &= ~SA_SIGINFO_EXTFLAGS;
		if (sigaction(SIGSEGV, &sa, NULL))
			goto error;
	}

	/* ... */
}

[2] [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/571428.html
Eric W. Biederman April 19, 2018, 2:40 p.m. UTC | #16
Dave Martin <Dave.Martin@arm.com> writes:

> On Sun, Apr 15, 2018 at 11:16:04AM -0700, Linus Torvalds wrote:
>
> [...]
>
>> The other thing we should do is to get rid of the stupid padding.
>> Right now "struct siginfo" is pointlessly padded to 128 bytes. That is
>> completely insane, when it's always just zero in the kernel.
>
> Agreed, inside the kernel the padding achieves nothing.
>
>> So put that _pad[] thing inside #ifndef __KERNEL__, and make
>> copy_siginfo_to_user() write the padding zeroes when copying to user
>> space. The reason for the padding is "future expansion", so we do want
>> to tell the user space that it's maybe up to 128 bytes in size, but if
>> we don't fill it all, we shouldn't waste time and memory on clearing
>> the padding internally.
>> 
>> I'm certainly *hoping* nobody depends on the whole 128 bytes in
>> rt_sigqueueinfo(). In theory you can fill it all (as long as si_code
>> is negative), but the man-page only says "si_value", and the compat
>> function doesn't copy any more than that either, so any user that
>> tries to fill in more than si_value is already broken. In fact, it
>> might even be worth enforcing that in rt_sigqueueinfo(), just to see
>> if anybody plays any games..
>
> [...]
>
> Digression:
>
> Since we don't traditionally zero the tail-padding in the user sigframe,
> is there a reliable way for userspace to detect newly-added fields in
> siginfo other than by having an explicit sigaction sa_flags flag to
> request them?  I imagine something like [1] below from the userspace
> perspective.
>
> On a separate thread, the issue of how to report syndrome information
> for SIGSEGV came up [2] (such as whether the faulting instruction was a
> read or a write).  This information is useful (and used) by things like
> userspace sanitisers and qemu.  Currently, reporting this to userspace
> relies on arch-specific cruft in the sigframe.
>
> We're committed to maintaining what's already in each arch sigframe,
> but it would be preferable to have a portable way of adding information
> to siginfo in a generic way.  si_code doesn't really work for that,
> since si_codes are mutually exclusive: I can't see a way of adding
> supplementary information using si_code.
>
> Anyway, that would be a separate RFC in the future (if ever).

So far what I have seen done is ``registers'' added to sigcontext.
Which it looks like you have done with esr.

Scrubbing information from faults to where the addresses point
outside of the userspace mapping makes sense.

I think before I would pursue what you are talking about on a generic
level I would want to look at the fact that we handle unblockable faults
wrong.  While unlikely it is possible for someone to send a thread
specific signal at just the right time, and have that signal
delivered before the synchronous fault.

Then we could pass through additional arguments through that new
``generic'' path.  Especially what are arguments such as
tsk->thread.fault_address and tsk->thread.fault_code.

We can do anything we like with a new SA_flag as that allows us to
change the format of the sigframe.

If we are very careful we can add generic fields after that crazy union
anonymous union in the _sigfault case of struct siginfo.

The trick would be to find something that would be enough so that people
don't need to implement their own instruction decoder to see what is
going on.  Something that is applicable to every sigfault case not just
SIGSEGV.  Something that we can and want to implement on multiple
architectures.

The point being doing something generic can be a lot of work, even if it
is worth it in the end.


> [1]
>
> static volatile int have_extflags = 0;
>
> static void handler(int n, siginfo_t *si, void *uc)
> {
> 	/* ... */
>
> 	if (have_extflags) {
> 		/* Check si->si_extflags */
> 	} else {
> 		/* fallback */
> 	}
>
> 	/* ... */
> }
>
> int main(void)
> {
> 	/* ... */
>
> 	struct sigaction sa;
>
> 	/* ... */
>
> 	sa.sa_flags = SA_SIGINFO | SA_SIGINFO_EXTFLAGS;
> 	sa.sa_sigaction = handler;
> 	if (!sigaction(SIGSEGV, &sa, NULL)) {
> 		have_extflags = 1;
> 	} else {
> 		sa.sa_flags &= ~SA_SIGINFO_EXTFLAGS;
> 		if (sigaction(SIGSEGV, &sa, NULL))
> 			goto error;
> 	}
>
> 	/* ... */
> }
>
> [2] [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/571428.html

Eric
diff mbox

Patch

 arch/arm/include/uapi/asm/siginfo.h | 13 -------------
 arch/arm/vfp/vfpmodule.c            |  2 +-
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h
deleted file mode 100644
index d0513880be21..000000000000
--- a/arch/arm/include/uapi/asm/siginfo.h
+++ /dev/null
@@ -1,13 +0,0 @@ 
-#ifndef __ASM_SIGINFO_H
-#define __ASM_SIGINFO_H
-
-#include <asm-generic/siginfo.h>
-
-/*
- * SIGFPE si_codes
- */
-#ifdef __KERNEL__
-#define FPE_FIXME	0	/* Broken dup of SI_USER */
-#endif /* __KERNEL__ */
-
-#endif
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 4c375e11ae95..af4ee2cef2f9 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -257,7 +257,7 @@  static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_
 
 	if (exceptions == VFP_EXCEPTION_ERROR) {
 		vfp_panic("unhandled bounce", inst);
-		vfp_raise_sigfpe(FPE_FIXME, regs);
+		vfp_raise_sigfpe(FPE_FLTINV, regs);
 		return;
 	}