diff mbox

[07/11] signal/arm64: Document conflicts with SI_USER and SIGFPE, SIGTRAP, SIGBUS

Message ID 20180112005940.23279-7-ebiederm@xmission.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric W. Biederman Jan. 12, 2018, 12:59 a.m. UTC
Setting si_code to 0 results in a userspace seeing an si_code of 0.
This is the same si_code as SI_USER.  Posix and common sense requires
that SI_USER not be a signal specific si_code.  As such this use of 0
for the si_code is a pretty horribly broken ABI.

Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
value of __SI_KILL and now sees a value of SIL_KILL with the result
that uid and pid fields are copied and which might copying the si_addr
field by accident but certainly not by design.  Making this a very
flakey implementation.

Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return
SIL_FAULT and the appropriate fields will be reliably copied.

But folks this is a new and unique kind of bad.  This is massively
untested code bad.  This is inventing new and unique was to get
siginfo wrong bad.  This is don't even think about Posix or what
siginfo means bad.  This is lots of eyeballs all missing the fact
that the code does the wrong thing bad.  This is getting stuck
and keep making the same mistake bad.

I really hope we can find a non userspace breaking fix for this on a
port as new as arm64.

Possible ABI fixes include:
- Send the signal without siginfo
- Don't generate a signal
- Possibly assign and use an appropriate si_code
- Don't handle cases which can't happen

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arm-kernel@lists.infradead.org
Ref: 53631b54c870 ("arm64: Floating point and SIMD")
Ref: 32015c235603 ("arm64: exception: handle Synchronous External Abort")
Ref: 1d18c47c735e ("arm64: MMU fault handling and page table management")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/arm64/include/uapi/asm/siginfo.h |  21 +++++++
 arch/arm64/kernel/fpsimd.c            |   2 +-
 arch/arm64/mm/fault.c                 | 114 +++++++++++++++++-----------------
 kernel/signal.c                       |   4 ++
 4 files changed, 83 insertions(+), 58 deletions(-)

Comments

Dave Martin Jan. 15, 2018, 4:30 p.m. UTC | #1
On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote:
> Setting si_code to 0 results in a userspace seeing an si_code of 0.
> This is the same si_code as SI_USER.  Posix and common sense requires
> that SI_USER not be a signal specific si_code.  As such this use of 0
> for the si_code is a pretty horribly broken ABI.

I think this situation may have come about because 0 is used as a
padding value for "impossible" cases -- i.e., things that can't happen
unless the kernel is broken, or things that are too unrecoverable for
clean error reporting to be helpful.

In general, I think these values are not expected to reach userspace in
practice.

This is not an excuse though -- and not 100% true -- so it's certainly
worthy of cleanup.


It would be good to approach this similarly for arm and arm64, since
the arm64 fault code is derived from arm.


> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
> value of __SI_KILL and now sees a value of SIL_KILL with the result
> that uid and pid fields are copied and which might copying the si_addr
> field by accident but certainly not by design.  Making this a very
> flakey implementation.
> 
> Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return
> SIL_FAULT and the appropriate fields will be reliably copied.
> 
> But folks this is a new and unique kind of bad.  This is massively
> untested code bad.  This is inventing new and unique was to get
> siginfo wrong bad.  This is don't even think about Posix or what
> siginfo means bad.  This is lots of eyeballs all missing the fact
> that the code does the wrong thing bad.  This is getting stuck
> and keep making the same mistake bad.
> 
> I really hope we can find a non userspace breaking fix for this on a
> port as new as arm64.

> Possible ABI fixes include:
> - Send the signal without siginfo
> - Don't generate a signal

The above two sould like ABI breaks?

> - Possibly assign and use an appropriate si_code
> - Don't handle cases which can't happen

I think a mixture of these two is the best approach.

In any case, si_code == 0 here doesn't seem to have any explicit meaning.
I think we can translate all of the arm64 faults to proper si_codes --
see my sketch below.  Probably means a bit more thought though.



The only counterargument would be if there is software relying on
these bogus signal cases getting si_code == 0 for a useful purpose.

The main reason I see to check for SI_USER is to allow a process to
filter out spurious signals (say, an asynchronous I/O signal for
which si_value would be garbage), and to print out diagnostics
before (in the case of a well-behaved program) resetting the signal
to SIG_DFL and killing itself to report the signal to the waiter.

Daemons may be more discerning about who is allowed to signal them,
but overloading SIGBUS (say) as an IPC channel sounds like a very odd
thing to do.  The same probably applies to any signal that has
nontrivial metadata.


Have you found software that is impacted by this in practice?

[...]

> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
>  asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>  {
>  	siginfo_t info;
> -	unsigned int si_code = 0;
> +	unsigned int si_code = FPE_FIXME;
>  
>  	if (esr & FPEXC_IOF)
>  		si_code = FPE_FLTINV;

This 0 can happen for vector operations where the implementation may
not be able to report exactly what happened, for example where
the implementer didn't want to pay the cost of tracking exactly
what went wrong in each lane.

However, the FPEXC_* bits can be garbage in such a case rather
than being all zero: we should be checking the TFV bit in the ESR here.
This may be a bug.

Perhaps FPE_FLTINV should be returned in si_code for such cases:  it's
not otherwise used on arm64 -- invalid instructions would be reported as
SIGILL/ILL_ILLOPC instead).

Otherwise, we might want to define a new code or arbitrarily pick
one of the existing FLT_* since this is really a more benign condition
than executing an illegal instruction.  Alternatively, treat the
fault as spurious and suppress it, but that doesn't feel right either.


> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 9b7f89df49db..abe200587334 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  
>  	info.si_signo = SIGBUS;
>  	info.si_errno = 0;
> -	info.si_code  = 0;
> +	info.si_code  = BUS_FIXME;

Probably BUS_OBJERR.

>  	if (esr & ESR_ELx_FnV)
>  		info.si_addr = NULL;
>  	else
> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  }
>  
>  static const struct fault_info fault_info[] = {
> -	{ do_bad,		SIGBUS,  0,		"ttbr address size fault"	},
> -	{ do_bad,		SIGBUS,  0,		"level 1 address size fault"	},
> -	{ do_bad,		SIGBUS,  0,		"level 2 address size fault"	},
> -	{ do_bad,		SIGBUS,  0,		"level 3 address size fault"	},
> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"ttbr address size fault"	},
> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"level 1 address size fault"	},
> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"level 2 address size fault"	},
> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"level 3 address size fault"	},

Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic().

[...]

> -	{ do_bad,		SIGBUS,  0,		"unknown 8"			},
> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 8"			},

[...]

> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 12"			},

Not architected, so they could mean absolutely anything.  If they
can happen at all, they are probably unsafe to ignore.

 -> SIGKILL, or panic().

Similary for all the "unknown" codes in the table, which I omit for
brevity.

> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous external abort"	},

This si_code seems to be a fallback for if ACPI is absent or doesn't
know what to do with this error.

-> SIGBUS/BUS_OBJERR?

Can probably legitimately happen for userspace for suitable MMIO mappings.

Perhaps it's more serious though in the presence of ACPI.  Do we expect
that ACPI can diagnose all localisable errors?

> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 (translation table walk)"	},

Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic().

> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented

Possibly SIGBUS/BUS_MCEERR_AR (though I don't know exactly what
userspace is supposed to do with this or whether this implies the
existence or certain kernel features for managing the error that
may not be present on arm64...)

Otherwise, SIGKILL.

> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented

Process page tables corrupt: if the kernel couldn't fix this, the
process can't reasonably fix it -> SIGKILL

Since this is a RAS-type error it could be triggered by a cosmic ray
rather than requiring a kernel or system bug or other major failure, so
we probably shouldn't panic the system if the error is localisable to a
particular process.

>  	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"TLB conflict abort"		},

Broken kernel, kernel memory corruption, CPU/system bug etc.:
SIGKILL or panic().

> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"Unsupported atomic hardware update fault"	},

Broken kernel, kernel memory corruption, CPU/system bug etc.:
SIGKILL or panic().

> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"implementation fault (lockdown abort)" },

Userspace shouldn't have access to lockdown: kernel/system bug
-> SIGKILL or panic().

> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"implementation fault (unsupported exclusive)" },

If running on an implementation where this fault can happen in response to an exclusive load/store issued by userspace may fail somewhere in the memory system, this should probably be SIGBUS/BUS_OBJERR (or possibly a new BUS_* code).

This one may need to be hardware-dependent, if this fault can mean
something different depending on the hardware (I'm gussing this
possibility from "implementation" -- I've not checked the docs.)

> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"section domain fault"		},
> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"page domain fault"		},

Broken kernel, kernel memory corruption, CPU/system bug etc.:
SIGKILL or panic().

>  };
>  
>  int handle_guest_sea(phys_addr_t addr, unsigned int esr)
> @@ -739,11 +739,11 @@ static struct fault_info __refdata debug_fault_info[] = {
> +	{ do_bad,	SIGBUS,		BUS_FIXME,	"unknown 3"		},
> +	{ do_bad,	SIGTRAP,	TRAP_FIXME,	"aarch32 vector catch"	},
> +	{ do_bad,	SIGBUS,		BUS_FIXME,	"unknown 7"		},
>  };

Impossible (?), or meaning unknown.
SIGKILL/panic() for these?  Or possibly (since these are probably well
localised errors) SIGILL/ILL_ILLOPC.

Cheers
---Dave
Eric W. Biederman Jan. 15, 2018, 5:23 p.m. UTC | #2
Dave Martin <Dave.Martin@arm.com> writes:

> On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote:
>> Setting si_code to 0 results in a userspace seeing an si_code of 0.
>> This is the same si_code as SI_USER.  Posix and common sense requires
>> that SI_USER not be a signal specific si_code.  As such this use of 0
>> for the si_code is a pretty horribly broken ABI.
>
> I think this situation may have come about because 0 is used as a
> padding value for "impossible" cases -- i.e., things that can't happen
> unless the kernel is broken, or things that are too unrecoverable for
> clean error reporting to be helpful.
>
> In general, I think these values are not expected to reach userspace in
> practice.
>
> This is not an excuse though -- and not 100% true -- so it's certainly
> worthy of cleanup.
>
>
> It would be good to approach this similarly for arm and arm64, since
> the arm64 fault code is derived from arm.

In this case the fault_info is something I have only seen on arm64.
I have been approaching all architectures the same way.

If there is insufficient information without architecture expertise
to fix this class of error I have been ading FPE_FIXME to them.

>> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
>> value of __SI_KILL and now sees a value of SIL_KILL with the result
>> that uid and pid fields are copied and which might copying the si_addr
>> field by accident but certainly not by design.  Making this a very
>> flakey implementation.
>> 
>> Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return
>> SIL_FAULT and the appropriate fields will be reliably copied.
>> 
>> But folks this is a new and unique kind of bad.  This is massively
>> untested code bad.  This is inventing new and unique was to get
>> siginfo wrong bad.  This is don't even think about Posix or what
>> siginfo means bad.  This is lots of eyeballs all missing the fact
>> that the code does the wrong thing bad.  This is getting stuck
>> and keep making the same mistake bad.
>> 
>> I really hope we can find a non userspace breaking fix for this on a
>> port as new as arm64.
>
>> Possible ABI fixes include:
>> - Send the signal without siginfo
>> - Don't generate a signal
>
> The above two sould like ABI breaks?

They are ways I have seen code on other platforms deal with
not information to generate siginfo.  Sending the signal without siginfo
is roughly equivalent to your send SIGKILL suggestion below.

A good example of that is code that calls force_sigsegv.

Calling "force_sig(SIGBUS, current);" is perfectly valid.
And then the parent when it reaped the process would have
a little more information to go on when guessing what happened
to the process.

>> - Possibly assign and use an appropriate si_code
>> - Don't handle cases which can't happen
>
> I think a mixture of these two is the best approach.
>
> In any case, si_code == 0 here doesn't seem to have any explicit meaning.
> I think we can translate all of the arm64 faults to proper si_codes --
> see my sketch below.  Probably means a bit more thought though.

Yes I would be very happy to see that.

> The only counterargument would be if there is software relying on
> these bogus signal cases getting si_code == 0 for a useful purpose.
>
> The main reason I see to check for SI_USER is to allow a process to
> filter out spurious signals (say, an asynchronous I/O signal for
> which si_value would be garbage), and to print out diagnostics
> before (in the case of a well-behaved program) resetting the signal
> to SIG_DFL and killing itself to report the signal to the waiter.
>
> Daemons may be more discerning about who is allowed to signal them,
> but overloading SIGBUS (say) as an IPC channel sounds like a very odd
> thing to do.  The same probably applies to any signal that has
> nontrivial metadata.

Agreed.  Although I have seen ltp test cases that do crazy things like
that.

> Have you found software that is impacted by this in practice?

No.

I don't expect many userspace applications look at siginfo and
everything I have found is some rare hard to trigger non-x86 case which
limits the exposure to userspace applications tremendously.

The angle I am coming at all of this from is that the linux kernel code
that filled out out struct siginfo was not comprehensible or correct.
Internal to the kernel it was using a magic value (not exportable to
userspace) in the upper bits of si_code.  That was causing problems for
signal injection and converting signals from 32bit to 64bit, and from
64bit to 32bit.

So I wrote kernel/signal.c:siginfo_layout() to figure out which fields
of struct siginfo should be sent to userspace.  In doing so I discovered
that using 0 in si_code (aka SI_USER) is ambiguous, and problematic.

Unfortuantely in most of the cases I have spotted using 0 in the si_code
requires architectural knowledge that I don't currently have to sort
out.  So the best I can do is change si_code from 0 to
FPE_FIXME/BUS_FIXME/TRAP_FIXME and bring the architecture maintainers
attention to this area.

One of the problems that results from all of this is that we copy
unitialized data to userspace.   I am slowly unifying and cleaning the
code up so that the code is simple enough we can be certain we are
not copying unitialized data to userspace.

With si_coes of FPE_FIXME/BUS_FIXME/TRAP_FIXME I can at least attempt to
keep the craziness from happening.

My next step is to unify struct siginfo and struct compat_siginfo
and the functions that copy them to userspace because there are very
siginficant problems there.


All of that said I like the way you are thinking about fixing these
issues.

> [...]
>
>> +++ b/arch/arm64/kernel/fpsimd.c
>> @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
>>  asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>>  {
>>  	siginfo_t info;
>> -	unsigned int si_code = 0;
>> +	unsigned int si_code = FPE_FIXME;
>>  
>>  	if (esr & FPEXC_IOF)
>>  		si_code = FPE_FLTINV;
>
> This 0 can happen for vector operations where the implementation may
> not be able to report exactly what happened, for example where
> the implementer didn't want to pay the cost of tracking exactly
> what went wrong in each lane.
>
> However, the FPEXC_* bits can be garbage in such a case rather
> than being all zero: we should be checking the TFV bit in the ESR here.
> This may be a bug.
>
> Perhaps FPE_FLTINV should be returned in si_code for such cases:  it's
> not otherwise used on arm64 -- invalid instructions would be reported as
> SIGILL/ILL_ILLOPC instead).
>
> Otherwise, we might want to define a new code or arbitrarily pick
> one of the existing FLT_* since this is really a more benign condition
> than executing an illegal instruction.  Alternatively, treat the
> fault as spurious and suppress it, but that doesn't feel right either.

I would love to see this sorted out.  There is a very similar pattern
on several different architectures.  I suspect if we have a clean
solution on one architecture the other architectures will be able to use
that solution as well.

>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 9b7f89df49db..abe200587334 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>  
>>  	info.si_signo = SIGBUS;
>>  	info.si_errno = 0;
>> -	info.si_code  = 0;
>> +	info.si_code  = BUS_FIXME;
>
> Probably BUS_OBJERR.
>
>>  	if (esr & ESR_ELx_FnV)
>>  		info.si_addr = NULL;
>>  	else
>> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>  }
>>  
>>  static const struct fault_info fault_info[] = {
>> -	{ do_bad,		SIGBUS,  0,		"ttbr address size fault"	},
>> -	{ do_bad,		SIGBUS,  0,		"level 1 address size fault"	},
>> -	{ do_bad,		SIGBUS,  0,		"level 2 address size fault"	},
>> -	{ do_bad,		SIGBUS,  0,		"level 3 address size fault"	},
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"ttbr address size fault"	},
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"level 1 address size fault"	},
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"level 2 address size fault"	},
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"level 3 address size fault"	},
>
> Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic().
>
> [...]
>
>> -	{ do_bad,		SIGBUS,  0,		"unknown 8"			},
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 8"			},
>
> [...]
>
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 12"			},
>
> Not architected, so they could mean absolutely anything.  If they
> can happen at all, they are probably unsafe to ignore.
>
>  -> SIGKILL, or panic().
>
> Similary for all the "unknown" codes in the table, which I omit for
> brevity.
>
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous external abort"	},
>
> This si_code seems to be a fallback for if ACPI is absent or doesn't
> know what to do with this error.
>
> -> SIGBUS/BUS_OBJERR?
>
> Can probably legitimately happen for userspace for suitable MMIO mappings.
>
> Perhaps it's more serious though in the presence of ACPI.  Do we expect
> that ACPI can diagnose all localisable errors?
>
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 (translation table walk)"	},
>
> Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic().
>
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented
>
> Possibly SIGBUS/BUS_MCEERR_AR (though I don't know exactly what
> userspace is supposed to do with this or whether this implies the
> existence or certain kernel features for managing the error that
> may not be present on arm64...)
>
> Otherwise, SIGKILL.

Yes.   The AR Action Required and AO Action optional bits I don't quite
understand.  But BUS_MCEERR_AR does sound like a good fit.


>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>
> Process page tables corrupt: if the kernel couldn't fix this, the
> process can't reasonably fix it -> SIGKILL
>
> Since this is a RAS-type error it could be triggered by a cosmic ray
> rather than requiring a kernel or system bug or other major failure, so
> we probably shouldn't panic the system if the error is localisable to a
> particular process.
>
>>  	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"TLB conflict abort"		},
>
> Broken kernel, kernel memory corruption, CPU/system bug etc.:
> SIGKILL or panic().
>
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"Unsupported atomic hardware update fault"	},
>
> Broken kernel, kernel memory corruption, CPU/system bug etc.:
> SIGKILL or panic().
>
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"implementation fault (lockdown abort)" },
>
> Userspace shouldn't have access to lockdown: kernel/system bug
> -> SIGKILL or panic().
>
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"implementation fault (unsupported exclusive)" },
>
> If running on an implementation where this fault can happen in response to an exclusive load/store issued by userspace may fail somewhere in the memory system, this should probably be SIGBUS/BUS_OBJERR (or possibly a new BUS_* code).
>
> This one may need to be hardware-dependent, if this fault can mean
> something different depending on the hardware (I'm gussing this
> possibility from "implementation" -- I've not checked the docs.)
>
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"section domain fault"		},
>> +	{ do_bad,		SIGBUS,  BUS_FIXME,	"page domain fault"		},
>
> Broken kernel, kernel memory corruption, CPU/system bug etc.:
> SIGKILL or panic().
>
>>  };
>>  
>>  int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>> @@ -739,11 +739,11 @@ static struct fault_info __refdata debug_fault_info[] = {
>> +	{ do_bad,	SIGBUS,		BUS_FIXME,	"unknown 3"		},
>> +	{ do_bad,	SIGTRAP,	TRAP_FIXME,	"aarch32 vector catch"	},
>> +	{ do_bad,	SIGBUS,		BUS_FIXME,	"unknown 7"		},
>>  };
>
> Impossible (?), or meaning unknown.
> SIGKILL/panic() for these?  Or possibly (since these are probably well
> localised errors) SIGILL/ILL_ILLOPC.

I like the way you are thinking on these, and I'd love to see them
fixed.

Eric
James Morse Jan. 15, 2018, 7:30 p.m. UTC | #3
Hi Dave,

Thanks for going through all these,

On 15/01/18 16:30, Dave Martin wrote:
> On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote:
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 9b7f89df49db..abe200587334 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)

>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous external abort"	},
> 
> This si_code seems to be a fallback for if ACPI is absent or doesn't
> know what to do with this error.
> 
> -> SIGBUS/BUS_OBJERR?
> 
> Can probably legitimately happen for userspace for suitable MMIO mappings.

It can happen for normal memory too, there are specific ESR values for
parity/checksum errors when read/writing memory. I think this first one is
'other/unknown', and its up to the CPU how to classify them.


> Perhaps it's more serious though in the presence of ACPI.  Do we expect
> that ACPI can diagnose all localisable errors?

Its not just ACPI, the CPU's v8.2 RAS Extensions use this
synchronous-external-abort as notification of a RAS error, (the other details
are written to to memory-mapped nodes). With the v8.2 RAS Extensions the ESR
tells us if the error was contained.

For ACPI we rely on firmware to set an appropriate severity in the CPER records
generated by firmware. The APEI helpers will call panic() if they find a fatal
error.

For systems with neither {firmware,kernel}-first RAS, BUS_OBJERR looks like a
good choice.


>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 (translation table walk)"	},
> 
> Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic().

(RAS mechanisms may claim this and send their own signals, if not:)

SIGKILL is probably a better choice here, while we do have an address, there is
nothing user-space can do about it.


>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented
> 
> Possibly SIGBUS/BUS_MCEERR_AR (though I don't know exactly what
> userspace is supposed to do with this or whether this implies the
> existence or certain kernel features for managing the error that
> may not be present on arm64...)

I'd like to keep the MCEERR signals to errors that we know are contained, the
kernel has understood and handled.

(These features do exist for arm64, enabling CONFIG_MEMORY_FAILURE and a few
APEI options allows all this to work today with suitable firmware. My Seattle
claims to support it).


> Otherwise, SIGKILL.

Sounds good,


>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
> 
> Process page tables corrupt: if the kernel couldn't fix this, the
> process can't reasonably fix it -> SIGKILL
> 
> Since this is a RAS-type error it could be triggered by a cosmic ray
> rather than requiring a kernel or system bug or other major failure, so
> we probably shouldn't panic the system if the error is localisable to a
> particular process.

Without the RAS-Extensions severity to tell us the error is contained I'm not
sure what we can expect. But given the page-tables are per-process, and we never
swap them to disk etc, its probably a safe bet that it doesn't matter either way
for these.


Thanks,

James
Dave Martin Jan. 16, 2018, 5:24 p.m. UTC | #4
On Mon, Jan 15, 2018 at 11:23:03AM -0600, Eric W. Biederman wrote:
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote:
> >> Setting si_code to 0 results in a userspace seeing an si_code of 0.
> >> This is the same si_code as SI_USER.  Posix and common sense requires
> >> that SI_USER not be a signal specific si_code.  As such this use of 0
> >> for the si_code is a pretty horribly broken ABI.
> >
> > I think this situation may have come about because 0 is used as a
> > padding value for "impossible" cases -- i.e., things that can't happen
> > unless the kernel is broken, or things that are too unrecoverable for
> > clean error reporting to be helpful.
> >
> > In general, I think these values are not expected to reach userspace in
> > practice.
> >
> > This is not an excuse though -- and not 100% true -- so it's certainly
> > worthy of cleanup.
> >
> >
> > It would be good to approach this similarly for arm and arm64, since
> > the arm64 fault code is derived from arm.
> 
> In this case the fault_info is something I have only seen on arm64.
> I have been approaching all architectures the same way.

Bad guess on my part; this table-driven approach seems to be new for
arm64.

> If there is insufficient information without architecture expertise
> to fix this class of error I have been ading FPE_FIXME to them.

Fair enough.

> >> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
> >> value of __SI_KILL and now sees a value of SIL_KILL with the result
> >> that uid and pid fields are copied and which might copying the si_addr
> >> field by accident but certainly not by design.  Making this a very
> >> flakey implementation.
> >> 
> >> Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return
> >> SIL_FAULT and the appropriate fields will be reliably copied.
> >> 
> >> But folks this is a new and unique kind of bad.  This is massively
> >> untested code bad.  This is inventing new and unique was to get
> >> siginfo wrong bad.  This is don't even think about Posix or what
> >> siginfo means bad.  This is lots of eyeballs all missing the fact
> >> that the code does the wrong thing bad.  This is getting stuck
> >> and keep making the same mistake bad.
> >> 
> >> I really hope we can find a non userspace breaking fix for this on a
> >> port as new as arm64.
> >
> >> Possible ABI fixes include:
> >> - Send the signal without siginfo
> >> - Don't generate a signal
> >
> > The above two sould like ABI breaks?
> 
> They are ways I have seen code on other platforms deal with
> not information to generate siginfo.  Sending the signal without siginfo
> is roughly equivalent to your send SIGKILL suggestion below.
> 
> A good example of that is code that calls force_sigsegv.
> 
> Calling "force_sig(SIGBUS, current);" is perfectly valid.
> And then the parent when it reaped the process would have
> a little more information to go on when guessing what happened
> to the process.
> 
> >> - Possibly assign and use an appropriate si_code
> >> - Don't handle cases which can't happen
> >
> > I think a mixture of these two is the best approach.
> >
> > In any case, si_code == 0 here doesn't seem to have any explicit meaning.
> > I think we can translate all of the arm64 faults to proper si_codes --
> > see my sketch below.  Probably means a bit more thought though.
> 
> Yes I would be very happy to see that.
> 
> > The only counterargument would be if there is software relying on
> > these bogus signal cases getting si_code == 0 for a useful purpose.
> >
> > The main reason I see to check for SI_USER is to allow a process to
> > filter out spurious signals (say, an asynchronous I/O signal for
> > which si_value would be garbage), and to print out diagnostics
> > before (in the case of a well-behaved program) resetting the signal
> > to SIG_DFL and killing itself to report the signal to the waiter.
> >
> > Daemons may be more discerning about who is allowed to signal them,
> > but overloading SIGBUS (say) as an IPC channel sounds like a very odd
> > thing to do.  The same probably applies to any signal that has
> > nontrivial metadata.
> 
> Agreed.  Although I have seen ltp test cases that do crazy things like
> that.
> 
> > Have you found software that is impacted by this in practice?
> 
> No.

Searching for si_code on codesearch.debian.net, I looked at a few
random hits.  Although this is far from exhaustive, I saw no instance
of code that assumes some arch-specific meaning for SI_USER (or 0).

Most code seems to check for signal-specific si_code values before
assuming that signal-specific signifo fields are valid; or for
SI_USER (or si_code <= 0) before assuming that si_uid and si_pid
are valid.

Returning proper values for si_code values in place of "0" would fix
rather than break such cases.


> I don't expect many userspace applications look at siginfo and
> everything I have found is some rare hard to trigger non-x86 case which
> limits the exposure to userspace applications tremendously.
> 
> The angle I am coming at all of this from is that the linux kernel code
> that filled out out struct siginfo was not comprehensible or correct.

I think "not comprehensible or correct" is a pretty good summary of
all signal-related code...

> Internal to the kernel it was using a magic value (not exportable to
> userspace) in the upper bits of si_code.  That was causing problems for
> signal injection and converting signals from 32bit to 64bit, and from
> 64bit to 32bit.
> 
> So I wrote kernel/signal.c:siginfo_layout() to figure out which fields
> of struct siginfo should be sent to userspace.  In doing so I discovered
> that using 0 in si_code (aka SI_USER) is ambiguous, and problematic.
> 
> Unfortuantely in most of the cases I have spotted using 0 in the si_code
> requires architectural knowledge that I don't currently have to sort
> out.  So the best I can do is change si_code from 0 to
> FPE_FIXME/BUS_FIXME/TRAP_FIXME and bring the architecture maintainers
> attention to this area.
> 
> One of the problems that results from all of this is that we copy
> unitialized data to userspace.   I am slowly unifying and cleaning the
> code up so that the code is simple enough we can be certain we are
> not copying unitialized data to userspace.
> 
> With si_coes of FPE_FIXME/BUS_FIXME/TRAP_FIXME I can at least attempt to
> keep the craziness from happening.
> 
> My next step is to unify struct siginfo and struct compat_siginfo
> and the functions that copy them to userspace because there are very
> siginficant problems there.

All of which sounds valuable.

> All of that said I like the way you are thinking about fixing these
> issues.

Is it feasible to have a different internal constant for SI_USER?
Then the generic could warn if it sees si_code == 0.  If the
special nonzero KERNEL_SI_USER is seen instead, it is silently
translated to SI_USER (0) for userspace.  This might help us
track down cases where 0 is passed by accident.

It may not be worth it though: if the affected cases are ones
that happen never or almost never, a runtime BUG_ON() may not be
helpful for tracking them down.

Also, I'm making an assumption that si_code always flows through
some generic code before reaching userspace (maybe untrue).


> >> +++ b/arch/arm64/kernel/fpsimd.c
> >> @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
> >>  asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
> >>  {
> >>  	siginfo_t info;
> >> -	unsigned int si_code = 0;
> >> +	unsigned int si_code = FPE_FIXME;
> >>  
> >>  	if (esr & FPEXC_IOF)
> >>  		si_code = FPE_FLTINV;
> >
> > This 0 can happen for vector operations where the implementation may
> > not be able to report exactly what happened, for example where
> > the implementer didn't want to pay the cost of tracking exactly
> > what went wrong in each lane.
> >
> > However, the FPEXC_* bits can be garbage in such a case rather
> > than being all zero: we should be checking the TFV bit in the ESR here.
> > This may be a bug.
> >
> > Perhaps FPE_FLTINV should be returned in si_code for such cases:  it's
> > not otherwise used on arm64 -- invalid instructions would be reported as
> > SIGILL/ILL_ILLOPC instead).
> >
> > Otherwise, we might want to define a new code or arbitrarily pick
> > one of the existing FLT_* since this is really a more benign condition
> > than executing an illegal instruction.  Alternatively, treat the
> > fault as spurious and suppress it, but that doesn't feel right either.
> 
> I would love to see this sorted out.  There is a very similar pattern
> on several different architectures.  I suspect if we have a clean
> solution on one architecture the other architectures will be able to use
> that solution as well.

Since user code that relies on checking si_code for fp exceptions will
probably already break in these cases, we can probably fudge things a
bit here.

I'll have a think.  IEEE may also define some rules that are relevant
here...

For the proposed conversion of the si_code==0 cases for arm64, I'll
draft an RFC for discussion (hopefully sometime this week).

[...]

Cheers
---Dave
Eric W. Biederman Jan. 16, 2018, 10:28 p.m. UTC | #5
Dave Martin <Dave.Martin@arm.com> writes:

> On Mon, Jan 15, 2018 at 11:23:03AM -0600, Eric W. Biederman wrote:
>> Dave Martin <Dave.Martin@arm.com> writes:
>> 
>> > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote:
>> >> Setting si_code to 0 results in a userspace seeing an si_code of 0.
>> >> This is the same si_code as SI_USER.  Posix and common sense requires
>> >> that SI_USER not be a signal specific si_code.  As such this use of 0
>> >> for the si_code is a pretty horribly broken ABI.
>> >
>> > I think this situation may have come about because 0 is used as a
>> > padding value for "impossible" cases -- i.e., things that can't happen
>> > unless the kernel is broken, or things that are too unrecoverable for
>> > clean error reporting to be helpful.
>> >
>> > In general, I think these values are not expected to reach userspace in
>> > practice.
>> >
>> > This is not an excuse though -- and not 100% true -- so it's certainly
>> > worthy of cleanup.
>> >
>> >
>> > It would be good to approach this similarly for arm and arm64, since
>> > the arm64 fault code is derived from arm.
>> 
>> In this case the fault_info is something I have only seen on arm64.
>> I have been approaching all architectures the same way.
>
> Bad guess on my part; this table-driven approach seems to be new for
> arm64.
>
>> If there is insufficient information without architecture expertise
>> to fix this class of error I have been ading FPE_FIXME to them.
>
> Fair enough.
>
>> >> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
>> >> value of __SI_KILL and now sees a value of SIL_KILL with the result
>> >> that uid and pid fields are copied and which might copying the si_addr
>> >> field by accident but certainly not by design.  Making this a very
>> >> flakey implementation.
>> >> 
>> >> Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return
>> >> SIL_FAULT and the appropriate fields will be reliably copied.
>> >> 
>> >> But folks this is a new and unique kind of bad.  This is massively
>> >> untested code bad.  This is inventing new and unique was to get
>> >> siginfo wrong bad.  This is don't even think about Posix or what
>> >> siginfo means bad.  This is lots of eyeballs all missing the fact
>> >> that the code does the wrong thing bad.  This is getting stuck
>> >> and keep making the same mistake bad.
>> >> 
>> >> I really hope we can find a non userspace breaking fix for this on a
>> >> port as new as arm64.
>> >
>> >> Possible ABI fixes include:
>> >> - Send the signal without siginfo
>> >> - Don't generate a signal
>> >
>> > The above two sould like ABI breaks?
>> 
>> They are ways I have seen code on other platforms deal with
>> not information to generate siginfo.  Sending the signal without siginfo
>> is roughly equivalent to your send SIGKILL suggestion below.
>> 
>> A good example of that is code that calls force_sigsegv.
>> 
>> Calling "force_sig(SIGBUS, current);" is perfectly valid.
>> And then the parent when it reaped the process would have
>> a little more information to go on when guessing what happened
>> to the process.
>> 
>> >> - Possibly assign and use an appropriate si_code
>> >> - Don't handle cases which can't happen
>> >
>> > I think a mixture of these two is the best approach.
>> >
>> > In any case, si_code == 0 here doesn't seem to have any explicit meaning.
>> > I think we can translate all of the arm64 faults to proper si_codes --
>> > see my sketch below.  Probably means a bit more thought though.
>> 
>> Yes I would be very happy to see that.
>> 
>> > The only counterargument would be if there is software relying on
>> > these bogus signal cases getting si_code == 0 for a useful purpose.
>> >
>> > The main reason I see to check for SI_USER is to allow a process to
>> > filter out spurious signals (say, an asynchronous I/O signal for
>> > which si_value would be garbage), and to print out diagnostics
>> > before (in the case of a well-behaved program) resetting the signal
>> > to SIG_DFL and killing itself to report the signal to the waiter.
>> >
>> > Daemons may be more discerning about who is allowed to signal them,
>> > but overloading SIGBUS (say) as an IPC channel sounds like a very odd
>> > thing to do.  The same probably applies to any signal that has
>> > nontrivial metadata.
>> 
>> Agreed.  Although I have seen ltp test cases that do crazy things like
>> that.
>> 
>> > Have you found software that is impacted by this in practice?
>> 
>> No.
>
> Searching for si_code on codesearch.debian.net, I looked at a few
> random hits.  Although this is far from exhaustive, I saw no instance
> of code that assumes some arch-specific meaning for SI_USER (or 0).
>
> Most code seems to check for signal-specific si_code values before
> assuming that signal-specific signifo fields are valid; or for
> SI_USER (or si_code <= 0) before assuming that si_uid and si_pid
> are valid.
>
> Returning proper values for si_code values in place of "0" would fix
> rather than break such cases.
>
>
>> I don't expect many userspace applications look at siginfo and
>> everything I have found is some rare hard to trigger non-x86 case which
>> limits the exposure to userspace applications tremendously.
>> 
>> The angle I am coming at all of this from is that the linux kernel code
>> that filled out out struct siginfo was not comprehensible or correct.
>
> I think "not comprehensible or correct" is a pretty good summary of
> all signal-related code...
>
>> Internal to the kernel it was using a magic value (not exportable to
>> userspace) in the upper bits of si_code.  That was causing problems for
>> signal injection and converting signals from 32bit to 64bit, and from
>> 64bit to 32bit.
>> 
>> So I wrote kernel/signal.c:siginfo_layout() to figure out which fields
>> of struct siginfo should be sent to userspace.  In doing so I discovered
>> that using 0 in si_code (aka SI_USER) is ambiguous, and problematic.
>> 
>> Unfortuantely in most of the cases I have spotted using 0 in the si_code
>> requires architectural knowledge that I don't currently have to sort
>> out.  So the best I can do is change si_code from 0 to
>> FPE_FIXME/BUS_FIXME/TRAP_FIXME and bring the architecture maintainers
>> attention to this area.
>> 
>> One of the problems that results from all of this is that we copy
>> unitialized data to userspace.   I am slowly unifying and cleaning the
>> code up so that the code is simple enough we can be certain we are
>> not copying unitialized data to userspace.
>> 
>> With si_coes of FPE_FIXME/BUS_FIXME/TRAP_FIXME I can at least attempt to
>> keep the craziness from happening.
>> 
>> My next step is to unify struct siginfo and struct compat_siginfo
>> and the functions that copy them to userspace because there are very
>> siginficant problems there.
>
> All of which sounds valuable.
>
>> All of that said I like the way you are thinking about fixing these
>> issues.
>
> Is it feasible to have a different internal constant for SI_USER?
> Then the generic could warn if it sees si_code == 0.  If the
> special nonzero KERNEL_SI_USER is seen instead, it is silently
> translated to SI_USER (0) for userspace.  This might help us
> track down cases where 0 is passed by accident.
>
> It may not be worth it though: if the affected cases are ones
> that happen never or almost never, a runtime BUG_ON() may not be
> helpful for tracking them down.
>
> Also, I'm making an assumption that si_code always flows through
> some generic code before reaching userspace (maybe untrue).

The code does flow through a generic path, and I am in the middle of
tightening that up right now.  As filling out siginfo is error prone,
and I need a guarantee that all of struct siginfo is initialized.
Adding a warning if a arch fault hander uses si_code == 0 aka SI_USER
would not be hard.

Given what you have found.  Given that it seems to match my experience
we can almost certainly change the code to just warn when the 0 is
passed in for the si_code for fault handlers.

I want to ensure that all of the fields are filled in before I do that
or else I risk passing unitialized values to userspace.  But I like that
as the long term goal for this code.

>> >> +++ b/arch/arm64/kernel/fpsimd.c
>> >> @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
>> >>  asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>> >>  {
>> >>  	siginfo_t info;
>> >> -	unsigned int si_code = 0;
>> >> +	unsigned int si_code = FPE_FIXME;
>> >>  
>> >>  	if (esr & FPEXC_IOF)
>> >>  		si_code = FPE_FLTINV;
>> >
>> > This 0 can happen for vector operations where the implementation may
>> > not be able to report exactly what happened, for example where
>> > the implementer didn't want to pay the cost of tracking exactly
>> > what went wrong in each lane.
>> >
>> > However, the FPEXC_* bits can be garbage in such a case rather
>> > than being all zero: we should be checking the TFV bit in the ESR here.
>> > This may be a bug.
>> >
>> > Perhaps FPE_FLTINV should be returned in si_code for such cases:  it's
>> > not otherwise used on arm64 -- invalid instructions would be reported as
>> > SIGILL/ILL_ILLOPC instead).
>> >
>> > Otherwise, we might want to define a new code or arbitrarily pick
>> > one of the existing FLT_* since this is really a more benign condition
>> > than executing an illegal instruction.  Alternatively, treat the
>> > fault as spurious and suppress it, but that doesn't feel right either.
>> 
>> I would love to see this sorted out.  There is a very similar pattern
>> on several different architectures.  I suspect if we have a clean
>> solution on one architecture the other architectures will be able to use
>> that solution as well.
>
> Since user code that relies on checking si_code for fp exceptions will
> probably already break in these cases, we can probably fudge things a
> bit here.
>
> I'll have a think.  IEEE may also define some rules that are relevant
> here...
>
> For the proposed conversion of the si_code==0 cases for arm64, I'll
> draft an RFC for discussion (hopefully sometime this week).

Sounds good.

I will keep FPE_FIXME as a place holder until this gets sorted out.

There is a second issue I am looking at in this location,
and maybe I don't have to address it now.  But it looks like the code is
calling send_sig_info instead of force_sig_info for a synchronous
exception.  Am I reading that correctly?

Eric
Russell King (Oracle) Jan. 17, 2018, 11:57 a.m. UTC | #6
On Tue, Jan 16, 2018 at 04:28:50PM -0600, Eric W. Biederman wrote:
> I will keep FPE_FIXME as a place holder until this gets sorted out.
> 
> There is a second issue I am looking at in this location,
> and maybe I don't have to address it now.  But it looks like the code is
> calling send_sig_info instead of force_sig_info for a synchronous
> exception.  Am I reading that correctly?

VFP used to use force_sig_info(), but it seems to be really the wrong
call to use.  force_sig_info() checks whether the program decided to
ignore or block the signal, and if it did, replaces the signal handler
with the default handler and unblocks the signal.

Are you really suggesting that FP all FP signals should get this
treatment?
Dave Martin Jan. 17, 2018, 12:15 p.m. UTC | #7
On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote:
> On Tue, Jan 16, 2018 at 04:28:50PM -0600, Eric W. Biederman wrote:
> > I will keep FPE_FIXME as a place holder until this gets sorted out.
> > 
> > There is a second issue I am looking at in this location,
> > and maybe I don't have to address it now.  But it looks like the code is
> > calling send_sig_info instead of force_sig_info for a synchronous
> > exception.  Am I reading that correctly?
> 
> VFP used to use force_sig_info(), but it seems to be really the wrong
> call to use.  force_sig_info() checks whether the program decided to
> ignore or block the signal, and if it did, replaces the signal handler
> with the default handler and unblocks the signal.
> 
> Are you really suggesting that FP all FP signals should get this
> treatment?

feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past
an fp overflow exception, please signal me instead".

If the process also blocked SIGFPE, that could be taken to mean
"I can't run safely past an fp overflow exception _and_ I can't
take SIGFPE either" ... i.e., if an fp overflow happens there is
no way to proceed and it's really fatal.

What SIG_IGN ought to mean is rather more debatable, but again,
the process could be asking for two opposite things: guarantee a
SIGFPE is delivered instead of running past an fp exception, and
also guarantee that SIGFPE is _not_ delivered.

It looks like arm and arm64 are different from most other arches
(including x86) here, but I'm not sure what is considered correct, and
it looks like the answer is not standardised.  There's a possibility
that some software goes subtly wrong on arm/arm64 where on other arches
it would get terminated with SIGKILL.

Whether this matters depends on how harmless the fp exception is to
the work of the program.  I think if an exception is set to trap
via feenableexcept() then that's a strong hint the programmer thinks
that exception is not harmless.  OTOH, trapping is not always
available anyway...


Was there some particular program being broken by the force_sig_info()
here?

Cheers
---Dave
Russell King (Oracle) Jan. 17, 2018, 12:37 p.m. UTC | #8
On Wed, Jan 17, 2018 at 12:15:05PM +0000, Dave Martin wrote:
> On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote:
> > On Tue, Jan 16, 2018 at 04:28:50PM -0600, Eric W. Biederman wrote:
> > > I will keep FPE_FIXME as a place holder until this gets sorted out.
> > > 
> > > There is a second issue I am looking at in this location,
> > > and maybe I don't have to address it now.  But it looks like the code is
> > > calling send_sig_info instead of force_sig_info for a synchronous
> > > exception.  Am I reading that correctly?
> > 
> > VFP used to use force_sig_info(), but it seems to be really the wrong
> > call to use.  force_sig_info() checks whether the program decided to
> > ignore or block the signal, and if it did, replaces the signal handler
> > with the default handler and unblocks the signal.
> > 
> > Are you really suggesting that FP all FP signals should get this
> > treatment?
> 
> feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past
> an fp overflow exception, please signal me instead".
> 
> If the process also blocked SIGFPE, that could be taken to mean
> "I can't run safely past an fp overflow exception _and_ I can't
> take SIGFPE either" ... i.e., if an fp overflow happens there is
> no way to proceed and it's really fatal.
> 
> What SIG_IGN ought to mean is rather more debatable, but again,
> the process could be asking for two opposite things: guarantee a
> SIGFPE is delivered instead of running past an fp exception, and
> also guarantee that SIGFPE is _not_ delivered.
> 
> It looks like arm and arm64 are different from most other arches
> (including x86) here, but I'm not sure what is considered correct, and
> it looks like the answer is not standardised.  There's a possibility
> that some software goes subtly wrong on arm/arm64 where on other arches
> it would get terminated with SIGKILL.
> 
> Whether this matters depends on how harmless the fp exception is to
> the work of the program.  I think if an exception is set to trap
> via feenableexcept() then that's a strong hint the programmer thinks
> that exception is not harmless.  OTOH, trapping is not always
> available anyway...

Like many of these things, there is no clear answer.  It's a set of
conflicting requirements, and as you point out, even if you've called
feenableexcept(), you are not guaranteed to get a trap.

However, do remember that FP exceptions on ARM hardware are already
asynchronous - they get reported by the _next_ FP operation to the one
that caused them, which means they could be raised by a library function
sometime after it occured (when the library function decides to save the
FP registers to the stack before it makes use of them.)  It's entirely
possible that the library function has blocked FP signals temporarily
(not explicitly, just decided to block all signals while it does
something sensitive) and will unblock them again afterwards - at which
point we get the SIGFPE, and it would be quite right to deliver that
signal to the user SIGFPE handler, rather than forcing it onto the
program mid-library function.

It's also possible that SIGFPE could be blocked by another signal handler
having been invoked, and it triggers the latent generation of the SIGFPE.

I'd be more inclined to agree with you if VFP exceptions were synchronous
but they aren't.

> Was there some particular program being broken by the force_sig_info()
> here?

I don't recall.
Dave Martin Jan. 17, 2018, 3:37 p.m. UTC | #9
On Wed, Jan 17, 2018 at 12:37:52PM +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 17, 2018 at 12:15:05PM +0000, Dave Martin wrote:
> > On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote:

[...]

> > > VFP used to use force_sig_info(), but it seems to be really the wrong
> > > call to use.  force_sig_info() checks whether the program decided to
> > > ignore or block the signal, and if it did, replaces the signal handler
> > > with the default handler and unblocks the signal.
> > > 
> > > Are you really suggesting that FP all FP signals should get this
> > > treatment?
> > 
> > feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past
> > an fp overflow exception, please signal me instead".
> > 
> > If the process also blocked SIGFPE, that could be taken to mean
> > "I can't run safely past an fp overflow exception _and_ I can't
> > take SIGFPE either" ... i.e., if an fp overflow happens there is
> > no way to proceed and it's really fatal.
> > 
> > What SIG_IGN ought to mean is rather more debatable, but again,
> > the process could be asking for two opposite things: guarantee a
> > SIGFPE is delivered instead of running past an fp exception, and
> > also guarantee that SIGFPE is _not_ delivered.
> > 
> > It looks like arm and arm64 are different from most other arches
> > (including x86) here, but I'm not sure what is considered correct, and
> > it looks like the answer is not standardised.  There's a possibility
> > that some software goes subtly wrong on arm/arm64 where on other arches
> > it would get terminated with SIGKILL.
> > 
> > Whether this matters depends on how harmless the fp exception is to
> > the work of the program.  I think if an exception is set to trap
> > via feenableexcept() then that's a strong hint the programmer thinks
> > that exception is not harmless.  OTOH, trapping is not always
> > available anyway...
> 
> Like many of these things, there is no clear answer.  It's a set of
> conflicting requirements, and as you point out, even if you've called
> feenableexcept(), you are not guaranteed to get a trap.
> 
> However, do remember that FP exceptions on ARM hardware are already
> asynchronous - they get reported by the _next_ FP operation to the one
> that caused them, which means they could be raised by a library function
> sometime after it occured (when the library function decides to save the
> FP registers to the stack before it makes use of them.)  It's entirely
> possible that the library function has blocked FP signals temporarily
> (not explicitly, just decided to block all signals while it does
> something sensitive) and will unblock them again afterwards - at which
> point we get the SIGFPE, and it would be quite right to deliver that
> signal to the user SIGFPE handler, rather than forcing it onto the
> program mid-library function.
> 
> It's also possible that SIGFPE could be blocked by another signal handler
> having been invoked, and it triggers the latent generation of the SIGFPE.
> 
> I'd be more inclined to agree with you if VFP exceptions were synchronous
> but they aren't.

Hmmm, it looks like imprecise fp exception traps are disallowed from
ARMv8 onwards.  I guess they made more sense when the FPU really was a
coprocessor, or at least semidetached from the integer core.

I think force_sig_info() makes sense here if and only if the traps
are guaranteed to be precise, so we probably should use this on arm64.
Not arm though (alpha doesn't either, if I understand the code
correctly.)

Does that make sense?

Apparently, few recent cores (at least ARM's own ones) support fp
exception trapping anyway...  1176 may be the most recent.

Cheers
---Dave
Russell King (Oracle) Jan. 17, 2018, 3:49 p.m. UTC | #10
On Wed, Jan 17, 2018 at 03:37:31PM +0000, Dave Martin wrote:
> On Wed, Jan 17, 2018 at 12:37:52PM +0000, Russell King - ARM Linux wrote:
> > On Wed, Jan 17, 2018 at 12:15:05PM +0000, Dave Martin wrote:
> > > On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote:
> 
> [...]
> 
> > > > VFP used to use force_sig_info(), but it seems to be really the wrong
> > > > call to use.  force_sig_info() checks whether the program decided to
> > > > ignore or block the signal, and if it did, replaces the signal handler
> > > > with the default handler and unblocks the signal.
> > > > 
> > > > Are you really suggesting that FP all FP signals should get this
> > > > treatment?
> > > 
> > > feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past
> > > an fp overflow exception, please signal me instead".
> > > 
> > > If the process also blocked SIGFPE, that could be taken to mean
> > > "I can't run safely past an fp overflow exception _and_ I can't
> > > take SIGFPE either" ... i.e., if an fp overflow happens there is
> > > no way to proceed and it's really fatal.
> > > 
> > > What SIG_IGN ought to mean is rather more debatable, but again,
> > > the process could be asking for two opposite things: guarantee a
> > > SIGFPE is delivered instead of running past an fp exception, and
> > > also guarantee that SIGFPE is _not_ delivered.
> > > 
> > > It looks like arm and arm64 are different from most other arches
> > > (including x86) here, but I'm not sure what is considered correct, and
> > > it looks like the answer is not standardised.  There's a possibility
> > > that some software goes subtly wrong on arm/arm64 where on other arches
> > > it would get terminated with SIGKILL.
> > > 
> > > Whether this matters depends on how harmless the fp exception is to
> > > the work of the program.  I think if an exception is set to trap
> > > via feenableexcept() then that's a strong hint the programmer thinks
> > > that exception is not harmless.  OTOH, trapping is not always
> > > available anyway...
> > 
> > Like many of these things, there is no clear answer.  It's a set of
> > conflicting requirements, and as you point out, even if you've called
> > feenableexcept(), you are not guaranteed to get a trap.
> > 
> > However, do remember that FP exceptions on ARM hardware are already
> > asynchronous - they get reported by the _next_ FP operation to the one
> > that caused them, which means they could be raised by a library function
> > sometime after it occured (when the library function decides to save the
> > FP registers to the stack before it makes use of them.)  It's entirely
> > possible that the library function has blocked FP signals temporarily
> > (not explicitly, just decided to block all signals while it does
> > something sensitive) and will unblock them again afterwards - at which
> > point we get the SIGFPE, and it would be quite right to deliver that
> > signal to the user SIGFPE handler, rather than forcing it onto the
> > program mid-library function.
> > 
> > It's also possible that SIGFPE could be blocked by another signal handler
> > having been invoked, and it triggers the latent generation of the SIGFPE.
> > 
> > I'd be more inclined to agree with you if VFP exceptions were synchronous
> > but they aren't.
> 
> Hmmm, it looks like imprecise fp exception traps are disallowed from
> ARMv8 onwards.  I guess they made more sense when the FPU really was a
> coprocessor, or at least semidetached from the integer core.
> 
> I think force_sig_info() makes sense here if and only if the traps
> are guaranteed to be precise, so we probably should use this on arm64.
> Not arm though (alpha doesn't either, if I understand the code
> correctly.)
> 
> Does that make sense?
> 
> Apparently, few recent cores (at least ARM's own ones) support fp
> exception trapping anyway...  1176 may be the most recent.

... and that makes the feenableexcept() argument about "A program
really wants to know" moot.  It can enable the exceptions in the
FPSCR but its never going to receive a SIGFPE on CPUs that don't
do exception trapping.
Dave Martin Jan. 17, 2018, 4:11 p.m. UTC | #11
On Wed, Jan 17, 2018 at 03:49:59PM +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 17, 2018 at 03:37:31PM +0000, Dave Martin wrote:
> > On Wed, Jan 17, 2018 at 12:37:52PM +0000, Russell King - ARM Linux wrote:

[...]

> > > I'd be more inclined to agree with you if VFP exceptions were synchronous
> > > but they aren't.
> > 
> > Hmmm, it looks like imprecise fp exception traps are disallowed from
> > ARMv8 onwards.  I guess they made more sense when the FPU really was a
> > coprocessor, or at least semidetached from the integer core.
> > 
> > I think force_sig_info() makes sense here if and only if the traps
> > are guaranteed to be precise, so we probably should use this on arm64.
> > Not arm though (alpha doesn't either, if I understand the code
> > correctly.)
> > 
> > Does that make sense?
> > 
> > Apparently, few recent cores (at least ARM's own ones) support fp
> > exception trapping anyway...  1176 may be the most recent.
> 
> ... and that makes the feenableexcept() argument about "A program
> really wants to know" moot.  It can enable the exceptions in the
> FPSCR but its never going to receive a SIGFPE on CPUs that don't
> do exception trapping.

Sort of.  If the hardware doesn't support traps then those FP(S)CR
bits can't be set.  feenableexcept() returns -1 if the requested
bits don't stick, but I'll bet there's software that doesn't bother
to check...

Relying on fp exception traps therefore isn't portable, but software
that does so can at least portably fail safe if it checks for the -1
return.

I'll cook up some RFC patches for arm64, then I could take a look at
arm is nobody else is working on it.

Cheers
---Dave
Eric W. Biederman Jan. 17, 2018, 4:45 p.m. UTC | #12
Russell King - ARM Linux <linux@armlinux.org.uk> writes:

> On Wed, Jan 17, 2018 at 12:15:05PM +0000, Dave Martin wrote:
>> On Wed, Jan 17, 2018 at 11:57:09AM +0000, Russell King - ARM Linux wrote:
>> > On Tue, Jan 16, 2018 at 04:28:50PM -0600, Eric W. Biederman wrote:
>> > > I will keep FPE_FIXME as a place holder until this gets sorted out.
>> > > 
>> > > There is a second issue I am looking at in this location,
>> > > and maybe I don't have to address it now.  But it looks like the code is
>> > > calling send_sig_info instead of force_sig_info for a synchronous
>> > > exception.  Am I reading that correctly?
>> > 
>> > VFP used to use force_sig_info(), but it seems to be really the wrong
>> > call to use.  force_sig_info() checks whether the program decided to
>> > ignore or block the signal, and if it did, replaces the signal handler
>> > with the default handler and unblocks the signal.

That ignored and blocked behavior is a very weird implementation,
but for a synchronous signal it amounts to enforcing coredump and
then exit behavior.  As the process does not continue past that
point the behavior is not observable by userspace.

>> > Are you really suggesting that FP all FP signals should get this
>> > treatment?

I am only suggesting that all synchronous signals, aka signals where
it helps to point at the instruction from the signal information
get that treatment.  As the vast majority are synchronous I was
asking about this one oddball case.

>> feenableexcept(FE_OVERFLOW) kind of means "I can't run safely past
>> an fp overflow exception, please signal me instead".
>> 
>> If the process also blocked SIGFPE, that could be taken to mean
>> "I can't run safely past an fp overflow exception _and_ I can't
>> take SIGFPE either" ... i.e., if an fp overflow happens there is
>> no way to proceed and it's really fatal.
>> 
>> What SIG_IGN ought to mean is rather more debatable, but again,
>> the process could be asking for two opposite things: guarantee a
>> SIGFPE is delivered instead of running past an fp exception, and
>> also guarantee that SIGFPE is _not_ delivered.
>> 
>> It looks like arm and arm64 are different from most other arches
>> (including x86) here, but I'm not sure what is considered correct, and
>> it looks like the answer is not standardised.  There's a possibility
>> that some software goes subtly wrong on arm/arm64 where on other arches
>> it would get terminated with SIGKILL.

I looked it up yesterday to be clear, and POSIX actually says the
behavior is implemenation dependent/undefined if you try to ignore
SIGFPE.

>> Whether this matters depends on how harmless the fp exception is to
>> the work of the program.  I think if an exception is set to trap
>> via feenableexcept() then that's a strong hint the programmer thinks
>> that exception is not harmless.  OTOH, trapping is not always
>> available anyway...
>
> Like many of these things, there is no clear answer.  It's a set of
> conflicting requirements, and as you point out, even if you've called
> feenableexcept(), you are not guaranteed to get a trap.
>
> However, do remember that FP exceptions on ARM hardware are already
> asynchronous - they get reported by the _next_ FP operation to the one
> that caused them, which means they could be raised by a library function
> sometime after it occured (when the library function decides to save the
> FP registers to the stack before it makes use of them.)  It's entirely
> possible that the library function has blocked FP signals temporarily
> (not explicitly, just decided to block all signals while it does
> something sensitive) and will unblock them again afterwards - at which
> point we get the SIGFPE, and it would be quite right to deliver that
> signal to the user SIGFPE handler, rather than forcing it onto the
> program mid-library function.
>
> It's also possible that SIGFPE could be blocked by another signal handler
> having been invoked, and it triggers the latent generation of the SIGFPE.
>
> I'd be more inclined to agree with you if VFP exceptions were synchronous
> but they aren't.

From your description there still seems to be an association with an
instruction so I don't know if I would really call the signal
asynchronous.  It sounds like the exception is delayed and not
asynchronous.

>> Was there some particular program being broken by the force_sig_info()
>> here?
>
> I don't recall.

> commit da41119af78864d27ccbf505949df788d5e8aaf5
> Author: Russell King <rmk@dyn-67.arm.linux.org.uk>
> Date:   Wed Jun 29 23:02:02 2005 +0100
> 
>     [PATCH] ARM: Don't force SIGFPE
>     
>     We were forcing SIGFPE on to a user program for no good reason.
>     Use send_sig_info() instead.
>     
>     Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

The commit looks like it was a case of the code not looking right
and you just switching to send_sig_info.

force_sig_info really out to be called something like synchronous_sig.

I am looking at sorting that out as part of cleaning up the signal
handling.  There is currently a small bug where under the right
circumstances these synchronous signals might be improperly delivered
after a thread specific signal.  To make that very unlikely there is a
special case in dequeue_signal for synchronous signals but it is
imperfect and slower than necessary.

The function really ought to look something like:

void synch_sig(struct siginfo *info)
{
	struct task_struct *tsk = current;
        int sig = info->si_signo;
        struct ksignal ksig;
	struct k_sigaction *ka;
        bool blocked, ignored;
        unsigned long flags;

        WARN_ON(!siginmask(sig, SYNCHRONOUS_MASK));

	copy_siginfo(info, &ksig.info);
	spin_lock_irqsave(&tsk->sighand->siglock, flags);
	ka = &tsk->sighand->action[sig - 1];
	ignored = ka->sa.sa_handler == SIG_IGN;
	blocked = sigismember(&tsk->blocked, sig);
	ksig.ka = *ka;
	if (blocked || ignored) {
		ksig.ka.sa.sa_handler = SIG_DFL;
	} else if (ka->sa.sa_flags & SA_ONESHOT) {
		ka->sa.sa_handler = SIG_DFL;
	}
	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);

	if (ksig.ka.sa.sa_handler == SIG_DFL) {
		do_coredump(&ksig->info);
		do_group_exit(sig);
		/* NOTREACHED */
	}

	handle_signal(&ksig, signal_pt_regs());
}

Which is both clearer and faster and less buggy than the current
implementation as it delivers the signal immediately with no chance
of the signal queue to reorder things.

But it is going to take a little bit to get there as there are a number
of implementations of handle_signal like the ones on arm and arm64 that
perform needed register adjustments outside of handle_signal.

Eric
Russell King (Oracle) Jan. 17, 2018, 5:14 p.m. UTC | #13
On Wed, Jan 17, 2018 at 10:45:10AM -0600, Eric W. Biederman wrote:
> Russell King - ARM Linux <linux@armlinux.org.uk> writes:
> >From your description there still seems to be an association with an
> instruction so I don't know if I would really call the signal
> asynchronous.  It sounds like the exception is delayed and not
> asynchronous.

Traps can only be passed from ARM coprocessors by a coprocessor refusing
to execute an instruction.  That's what happens in this case - the VFP
gets offered an instruction to execute.  It accepts it, and the CPU
continues, leaving the VFP to execute its instruction independently.  If
this instruction generates an error, then nothing happens at this point.

That error remains pending until the CPU offers the coprocessor the next
VFP instruction, which it refuses.  That causes an undefined instruction
exception, and we trap into the kernel VFP code which reads the VFP
status and works out what needs to be done.

What this means is that if you execute a VFP instruction, wait 10 minutes
and then execute another VFP instruction, if the first VFP instruction
raised an exception, you'll get to hear about it 10 minutes later.

You can use whatever weasel words you want to describe that situation,
my choice is "asynchronous", your choice is "delayed".  However, it is
clearly not "synchronous", and arguing that we should report something
synchronously that is not reported to _us_ synchronously (where
synchronous means "at the same time") is IMHO daft.

So, let's take an example:

	installs SIGFPE handler
	..fp instructions.. one of which raises an exception
	returns to main loop
	main loop blocks all signals while it sets stuff up
	calls ppoll()

In the synchronous SIGFPE delivery case, the SIGFPE handler will be
called when the exception is generated in the FP code, and delivered
at that time.  The fact that the main loop blocks all signals happens
later, so the users handler gets called as one expects.

In the VFP case, however, the FP instructions towards the end may not
end up causing the exception to be signalled until sometime later,
and as I've already explained, that may be the result of a C library
function accessing the VFP registers.  This could well end up trying
to deliver the SIGFPE while signals are blocked, and we get
drastically different behaviour if force_sig_info() is used.

In the VFP case, if force_sig_info() is used, the program gets killed
at this point.  In the non-VFP case, the program's signal handler was
called.

Using send_sig_info() results in the already delayed or asynchronous
signal being held off until ppoll() drops the blocking, at which point
the signal is delivered, the program handles it in its handler, and
the program continues to run.

So
1. non-VFP case, program doesn't get killed but gets the opportunity
   to handle the signal.
2. VFP case with send_sig_info, program doesn't get killed but gets
   the opportunity to handle the signal.
3. VFP case with force_sig_info, the program gets killed and dumps
   core.

Which one of these results in a big change of behaviour in your
opinion?
Dave Martin Jan. 17, 2018, 5:17 p.m. UTC | #14
On Mon, Jan 15, 2018 at 11:23:03AM -0600, Eric W. Biederman wrote:
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote:

[...]

> >> Possible ABI fixes include:
> >> - Send the signal without siginfo
> >> - Don't generate a signal

[...]

> >> - Possibly assign and use an appropriate si_code
> >> - Don't handle cases which can't happen
> >
> > I think a mixture of these two is the best approach.
> >
> > In any case, si_code == 0 here doesn't seem to have any explicit meaning.
> > I think we can translate all of the arm64 faults to proper si_codes --
> > see my sketch below.  Probably means a bit more thought though.

[...]

> >> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c

[...]

> >> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
> >>  }
> >>  
> >>  static const struct fault_info fault_info[] = {
> >> -	{ do_bad,		SIGBUS,  0,		"ttbr address size fault"	},
> >> -	{ do_bad,		SIGBUS,  0,		"level 1 address size fault"	},
> >> -	{ do_bad,		SIGBUS,  0,		"level 2 address size fault"	},
> >> -	{ do_bad,		SIGBUS,  0,		"level 3 address size fault"	},

If I convert this kind of thing to SIGKILL there really is nothing
sensible to put in si_code, except possibly SI_KERNEL (indicating that
the kill did not come from userspace).  Even so, it hardly seems worth
filling in fields like si_pid and si_uid just to make this "correct".

In any case, if siginfo is never seen by userspace for SIGKILL this is
moot.

Obviously, siginfo is never copied to the user stack in that case, but
is it also guaranteed not to be visible to userspace by other means?
For ptrace I'm hoping not, since SIGKILL should nuke the tracee
immediately instead of being reported to the tracer as a
signal-delivery-stop -- so the tracer should get WIFSIGNALED() &&
WTERMSIG() == SIGKILL.  A subsequent PTRACE_GETSIGINFO would fail with
ESRCH.

Does that match your understanding?

If so, there is some merit in not pretending to pass a reall value
for si_code.

Should si_code simply be ignored for the SIGKILL case?

Cheers
---Dave
Eric W. Biederman Jan. 17, 2018, 5:24 p.m. UTC | #15
Dave Martin <Dave.Martin@arm.com> writes:

> On Mon, Jan 15, 2018 at 11:23:03AM -0600, Eric W. Biederman wrote:
>> Dave Martin <Dave.Martin@arm.com> writes:
>> 
>> > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote:
>
> [...]
>
>> >> Possible ABI fixes include:
>> >> - Send the signal without siginfo
>> >> - Don't generate a signal
>
> [...]
>
>> >> - Possibly assign and use an appropriate si_code
>> >> - Don't handle cases which can't happen
>> >
>> > I think a mixture of these two is the best approach.
>> >
>> > In any case, si_code == 0 here doesn't seem to have any explicit meaning.
>> > I think we can translate all of the arm64 faults to proper si_codes --
>> > see my sketch below.  Probably means a bit more thought though.
>
> [...]
>
>> >> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>
> [...]
>
>> >> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>> >>  }
>> >>  
>> >>  static const struct fault_info fault_info[] = {
>> >> -	{ do_bad,		SIGBUS,  0,		"ttbr address size fault"	},
>> >> -	{ do_bad,		SIGBUS,  0,		"level 1 address size fault"	},
>> >> -	{ do_bad,		SIGBUS,  0,		"level 2 address size fault"	},
>> >> -	{ do_bad,		SIGBUS,  0,		"level 3 address size fault"	},
>
> If I convert this kind of thing to SIGKILL there really is nothing
> sensible to put in si_code, except possibly SI_KERNEL (indicating that
> the kill did not come from userspace).  Even so, it hardly seems worth
> filling in fields like si_pid and si_uid just to make this "correct".
>
> In any case, if siginfo is never seen by userspace for SIGKILL this is
> moot.
>
> Obviously, siginfo is never copied to the user stack in that case, but
> is it also guaranteed not to be visible to userspace by other means?
> For ptrace I'm hoping not, since SIGKILL should nuke the tracee
> immediately instead of being reported to the tracer as a
> signal-delivery-stop -- so the tracer should get WIFSIGNALED() &&
> WTERMSIG() == SIGKILL.  A subsequent PTRACE_GETSIGINFO would fail with
> ESRCH.
>
> Does that match your understanding?
>
> If so, there is some merit in not pretending to pass a reall value
> for si_code.
>
> Should si_code simply be ignored for the SIGKILL case?

I know what x86 does in a similar case is it uses force_sig instead of
force_sig_info.  Then the generic code gets to worry about 

If the appropriate paths generic paths get to worry about what siginfo
to fill in in that case.  Which for SI_KERNEL is zero for everything
except the si_code and the si_signo.

That seems perfectly reasonable.

Eric
Dave Martin Jan. 17, 2018, 5:39 p.m. UTC | #16
On Wed, Jan 17, 2018 at 11:24:06AM -0600, Eric W. Biederman wrote:
> Dave Martin <Dave.Martin@arm.com> writes:

[...]

> > Should si_code simply be ignored for the SIGKILL case?
> 
> I know what x86 does in a similar case is it uses force_sig instead of
> force_sig_info.  Then the generic code gets to worry about 
> 
> If the appropriate paths generic paths get to worry about what siginfo
> to fill in in that case.  Which for SI_KERNEL is zero for everything
> except the si_code and the si_signo.
> 
> That seems perfectly reasonable.

OK, I'll go with SI_KERNEL then.

Cheers
---Dave
Eric W. Biederman Jan. 24, 2018, 9:28 p.m. UTC | #17
Russell King - ARM Linux <linux@armlinux.org.uk> writes:

> On Wed, Jan 17, 2018 at 10:45:10AM -0600, Eric W. Biederman wrote:
>> Russell King - ARM Linux <linux@armlinux.org.uk> writes:
>> >From your description there still seems to be an association with an
>> instruction so I don't know if I would really call the signal
>> asynchronous.  It sounds like the exception is delayed and not
>> asynchronous.
>
> Traps can only be passed from ARM coprocessors by a coprocessor refusing
> to execute an instruction.  That's what happens in this case - the VFP
> gets offered an instruction to execute.  It accepts it, and the CPU
> continues, leaving the VFP to execute its instruction independently.  If
> this instruction generates an error, then nothing happens at this point.
>
> That error remains pending until the CPU offers the coprocessor the next
> VFP instruction, which it refuses.  That causes an undefined instruction
> exception, and we trap into the kernel VFP code which reads the VFP
> status and works out what needs to be done.
>
> What this means is that if you execute a VFP instruction, wait 10 minutes
> and then execute another VFP instruction, if the first VFP instruction
> raised an exception, you'll get to hear about it 10 minutes later.
>
> You can use whatever weasel words you want to describe that situation,
> my choice is "asynchronous", your choice is "delayed".  However, it is
> clearly not "synchronous", and arguing that we should report something
> synchronously that is not reported to _us_ synchronously (where
> synchronous means "at the same time") is IMHO daft.
>
> So, let's take an example:
>
> 	installs SIGFPE handler
> 	..fp instructions.. one of which raises an exception
> 	returns to main loop
> 	main loop blocks all signals while it sets stuff up
> 	calls ppoll()
>
> In the synchronous SIGFPE delivery case, the SIGFPE handler will be
> called when the exception is generated in the FP code, and delivered
> at that time.  The fact that the main loop blocks all signals happens
> later, so the users handler gets called as one expects.
>
> In the VFP case, however, the FP instructions towards the end may not
> end up causing the exception to be signalled until sometime later,
> and as I've already explained, that may be the result of a C library
> function accessing the VFP registers.  This could well end up trying
> to deliver the SIGFPE while signals are blocked, and we get
> drastically different behaviour if force_sig_info() is used.
>
> In the VFP case, if force_sig_info() is used, the program gets killed
> at this point.  In the non-VFP case, the program's signal handler was
> called.
>
> Using send_sig_info() results in the already delayed or asynchronous
> signal being held off until ppoll() drops the blocking, at which point
> the signal is delivered, the program handles it in its handler, and
> the program continues to run.
>
> So
> 1. non-VFP case, program doesn't get killed but gets the opportunity
>    to handle the signal.
> 2. VFP case with send_sig_info, program doesn't get killed but gets
>    the opportunity to handle the signal.
> 3. VFP case with force_sig_info, the program gets killed and dumps
>    core.
>
> Which one of these results in a big change of behaviour in your
> opinion?

I want to apologize for the disagreement.  In part of my due diligence
for cleaning up the signal handling I am introducing some helpers for
generating siginfo.  I decided to ask which kind of helpers should I
introduce.

Very basic generic helpers that just wrap the current functionality
today.  Or some slightly smarter helpers that solve some other problems
as well.  After consideration I am shelving the smarter helpers for now,
as the need to introduce the helpers universally is strong, so that I
can guarantee struct siginfo is always fully initialized before being
passed to userspace.

Given the choice between force_sig_info and send_sig_info I agree that
send_sig_info is the right choice for signals that can be ignored.

The problem I was focusing on is the problem where force_sig_info and
send_sig_info can be tricked into causing the instruction pointer to
point to the wrong instruction (even when the signal is not blocked),
due to the delivery of another signal.

So I was wondering if in practice we could introduce a singal delivery
function that would operation synchronously and would solve the
instruction pointer problem.

It looks to me like this location on arm where we are using
send_sig_info is a clear candidate for such a function as long as it has
a mode where you can say deliverly the signal like send_sig_info if the
signal is blocked.

Still like I said such a smarter helper is not the priority and I don't
intend any semantic changes when I introduce helpers into the signal
deliver path.  Just fewer places initializing struct siginfo.

Eric
diff mbox

Patch

diff --git a/arch/arm64/include/uapi/asm/siginfo.h b/arch/arm64/include/uapi/asm/siginfo.h
index 574d12f86039..9b4d91277742 100644
--- a/arch/arm64/include/uapi/asm/siginfo.h
+++ b/arch/arm64/include/uapi/asm/siginfo.h
@@ -21,4 +21,25 @@ 
 
 #include <asm-generic/siginfo.h>
 
+/*
+ * SIGFPE si_codes
+ */
+#ifdef __KERNEL__
+#define FPE_FIXME	0	/* Broken dup of SI_USER */
+#endif /* __KERNEL__ */
+
+/*
+ * SIGBUS si_codes
+ */
+#ifdef __KERNEL__
+#define BUS_FIXME	0	/* Broken dup of SI_USER */
+#endif /* __KERNEL__ */
+
+/*
+ * SIGTRAP si_codes
+ */
+#ifdef __KERNEL__
+#define TRAP_FIXME	0	/* Broken dup of SI_USER */
+#endif /* __KERNEL__ */
+
 #endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index fae81f7964b4..ad0edf31e247 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -867,7 +867,7 @@  asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
 asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
 {
 	siginfo_t info;
-	unsigned int si_code = 0;
+	unsigned int si_code = FPE_FIXME;
 
 	if (esr & FPEXC_IOF)
 		si_code = FPE_FLTINV;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 9b7f89df49db..abe200587334 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -596,7 +596,7 @@  static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 
 	info.si_signo = SIGBUS;
 	info.si_errno = 0;
-	info.si_code  = 0;
+	info.si_code  = BUS_FIXME;
 	if (esr & ESR_ELx_FnV)
 		info.si_addr = NULL;
 	else
@@ -607,70 +607,70 @@  static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 }
 
 static const struct fault_info fault_info[] = {
-	{ do_bad,		SIGBUS,  0,		"ttbr address size fault"	},
-	{ do_bad,		SIGBUS,  0,		"level 1 address size fault"	},
-	{ do_bad,		SIGBUS,  0,		"level 2 address size fault"	},
-	{ do_bad,		SIGBUS,  0,		"level 3 address size fault"	},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"ttbr address size fault"	},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"level 1 address size fault"	},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"level 2 address size fault"	},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"level 3 address size fault"	},
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 0 translation fault"	},
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 1 translation fault"	},
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 2 translation fault"	},
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 3 translation fault"	},
-	{ do_bad,		SIGBUS,  0,		"unknown 8"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 8"			},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 access flag fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 access flag fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 access flag fault"	},
-	{ do_bad,		SIGBUS,  0,		"unknown 12"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 12"			},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
-	{ do_sea,		SIGBUS,  0,		"synchronous external abort"	},
-	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
-	{ do_sea,		SIGBUS,  0,		"level 0 (translation table walk)"	},
-	{ do_sea,		SIGBUS,  0,		"level 1 (translation table walk)"	},
-	{ do_sea,		SIGBUS,  0,		"level 2 (translation table walk)"	},
-	{ do_sea,		SIGBUS,  0,		"level 3 (translation table walk)"	},
-	{ do_sea,		SIGBUS,  0,		"synchronous parity or ECC error" },	// Reserved when RAS is implemented
-	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
-	{ do_sea,		SIGBUS,  0,		"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
-	{ do_sea,		SIGBUS,  0,		"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
-	{ do_sea,		SIGBUS,  0,		"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
-	{ do_sea,		SIGBUS,  0,		"level 3 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
-	{ do_bad,		SIGBUS,  0,		"unknown 32"			},
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous external abort"	},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 17"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 18"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 19"			},
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 25"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 26"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 27"			},
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
+	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 32"			},
 	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
-	{ do_bad,		SIGBUS,  0,		"unknown 34"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 35"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 36"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 37"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 38"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 39"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 40"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 41"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 42"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 43"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 44"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 45"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 46"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 47"			},
-	{ do_bad,		SIGBUS,  0,		"TLB conflict abort"		},
-	{ do_bad,		SIGBUS,  0,		"Unsupported atomic hardware update fault"	},
-	{ do_bad,		SIGBUS,  0,		"unknown 50"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 51"			},
-	{ do_bad,		SIGBUS,  0,		"implementation fault (lockdown abort)" },
-	{ do_bad,		SIGBUS,  0,		"implementation fault (unsupported exclusive)" },
-	{ do_bad,		SIGBUS,  0,		"unknown 54"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 55"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 56"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 57"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 58" 			},
-	{ do_bad,		SIGBUS,  0,		"unknown 59"			},
-	{ do_bad,		SIGBUS,  0,		"unknown 60"			},
-	{ do_bad,		SIGBUS,  0,		"section domain fault"		},
-	{ do_bad,		SIGBUS,  0,		"page domain fault"		},
-	{ do_bad,		SIGBUS,  0,		"unknown 63"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 34"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 35"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 36"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 37"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 38"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 39"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 40"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 41"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 42"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 43"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 44"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 45"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 46"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 47"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"TLB conflict abort"		},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"Unsupported atomic hardware update fault"	},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 50"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 51"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"implementation fault (lockdown abort)" },
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"implementation fault (unsupported exclusive)" },
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 54"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 55"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 56"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 57"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 58" 			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 59"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 60"			},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"section domain fault"		},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"page domain fault"		},
+	{ do_bad,		SIGBUS,  BUS_FIXME,	"unknown 63"			},
 };
 
 int handle_guest_sea(phys_addr_t addr, unsigned int esr)
@@ -739,11 +739,11 @@  static struct fault_info __refdata debug_fault_info[] = {
 	{ do_bad,	SIGTRAP,	TRAP_HWBKPT,	"hardware breakpoint"	},
 	{ do_bad,	SIGTRAP,	TRAP_HWBKPT,	"hardware single-step"	},
 	{ do_bad,	SIGTRAP,	TRAP_HWBKPT,	"hardware watchpoint"	},
-	{ do_bad,	SIGBUS,		0,		"unknown 3"		},
+	{ do_bad,	SIGBUS,		BUS_FIXME,	"unknown 3"		},
 	{ do_bad,	SIGTRAP,	TRAP_BRKPT,	"aarch32 BKPT"		},
-	{ do_bad,	SIGTRAP,	0,		"aarch32 vector catch"	},
+	{ do_bad,	SIGTRAP,	TRAP_FIXME,	"aarch32 vector catch"	},
 	{ early_brk64,	SIGTRAP,	TRAP_BRKPT,	"aarch64 BRK"		},
-	{ do_bad,	SIGBUS,		0,		"unknown 7"		},
+	{ do_bad,	SIGBUS,		BUS_FIXME,	"unknown 7"		},
 };
 
 void __init hook_debug_fault_code(int nr,
diff --git a/kernel/signal.c b/kernel/signal.c
index c894162ec96c..fd182a845490 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2711,6 +2711,10 @@  enum siginfo_layout siginfo_layout(int sig, int si_code)
 #ifdef FPE_FIXME
 		if ((sig == SIGFPE) && (si_code == FPE_FIXME))
 			layout = SIL_FAULT;
+#endif
+#ifdef BUS_FIXME
+		if ((sig == SIGBUS) && (si_code == BUS_FIXME))
+			layout = SIL_FAULT;
 #endif
 	}
 	return layout;