mbox series

[0/3] Stack checking on Arm

Message ID 20240729142421.137283-1-stewart.hildebrand@amd.com (mailing list archive)
Headers show
Series Stack checking on Arm | expand

Message

Stewart Hildebrand July 29, 2024, 2:24 p.m. UTC
This series introduces stack check instrumentation on Arm. This is
helpful for safety certification efforts. I'm sending this in an RFC
state because I wanted to gather opinions on the approach of using
-finstrument-functions.

Stewart Hildebrand (3):
  xen: add no_instrument_function attributes
  xen: silence maybe-unitialized warning
  xen/arm: stack check instrumentation

 xen/arch/arm/arch.mk                     |  1 +
 xen/arch/arm/arm64/head.S                |  4 +++
 xen/arch/arm/domain.c                    |  3 ++
 xen/arch/arm/include/asm/arm64/cmpxchg.h |  4 +++
 xen/arch/arm/include/asm/arm64/sve.h     |  1 +
 xen/arch/arm/include/asm/atomic.h        |  2 ++
 xen/arch/arm/include/asm/guest_atomics.h |  1 +
 xen/arch/arm/include/asm/page.h          |  2 ++
 xen/arch/arm/include/asm/traps.h         |  8 +++++
 xen/arch/arm/setup.c                     |  4 +++
 xen/arch/arm/smpboot.c                   |  3 ++
 xen/arch/arm/traps.c                     | 45 ++++++++++++++++++++++++
 xen/common/sched/cpupool.c               |  1 +
 xen/include/xsm/dummy.h                  |  1 +
 14 files changed, 80 insertions(+)


base-commit: b25b28ede1cba43eda1e0b84ad967683b8196847

Comments

Julien Grall July 29, 2024, 6:39 p.m. UTC | #1
Hi,

On 29/07/2024 15:24, Stewart Hildebrand wrote:
> This series introduces stack check instrumentation on Arm. This is
> helpful for safety certification efforts. I'm sending this in an RFC
> state because I wanted to gather opinions on the approach of using
> -finstrument-functions.

This looks ok for an initial approach. I wonder if longer term we want 
to implement stack guards on Arm. We would need to allocate an extra 
"virtual" page per stack that would be unmapped.

The advantage is this could be used also in production and doesn't rely 
on any support from the processor.

Any thoughts?

Cheers,
Julien Grall July 29, 2024, 6:39 p.m. UTC | #2
On 29/07/2024 19:39, Julien Grall wrote:
> Hi,
> 
> On 29/07/2024 15:24, Stewart Hildebrand wrote:
>> This series introduces stack check instrumentation on Arm. This is
>> helpful for safety certification efforts. I'm sending this in an RFC
>> state because I wanted to gather opinions on the approach of using
>> -finstrument-functions.
> 
> This looks ok for an initial approach. I wonder if longer term we want 
> to implement stack guards on Arm. We would need to allocate an extra 
> "virtual" page per stack that would be unmapped.
> 
> The advantage is this could be used also in production and doesn't rely 
> on any support from the processor.

s/processor/compiler/

Cheers,
Stefano Stabellini July 29, 2024, 9:37 p.m. UTC | #3
On Mon, 29 Jul 2024, Julien Grall wrote:
> Hi,
> 
> On 29/07/2024 15:24, Stewart Hildebrand wrote:
> > This series introduces stack check instrumentation on Arm. This is
> > helpful for safety certification efforts. I'm sending this in an RFC
> > state because I wanted to gather opinions on the approach of using
> > -finstrument-functions.
> 
> This looks ok for an initial approach. I wonder if longer term we want to
> implement stack guards on Arm. We would need to allocate an extra "virtual"
> page per stack that would be unmapped.
> 
> The advantage is this could be used also in production and doesn't rely on any
> support from the processor.
> 
> Any thoughts?

I think we need both. We should implement stack guards on Arm. In
addition, it is also beneficial to have -finstrument-functions for
profiling, debugging, and also so that we can retrieve detailed call
graphs from execution runs. As an example, -finstrument-functions can
help with offline analysis to prove that we don't have unbounded
recursion, on both arm and x86 too. On the other hand, stack guards help
with protecting the stack in production.
Andrew Cooper July 30, 2024, 8:40 p.m. UTC | #4
On 29/07/2024 10:37 pm, Stefano Stabellini wrote:
> On Mon, 29 Jul 2024, Julien Grall wrote:
>> Hi,
>>
>> On 29/07/2024 15:24, Stewart Hildebrand wrote:
>>> This series introduces stack check instrumentation on Arm. This is
>>> helpful for safety certification efforts. I'm sending this in an RFC
>>> state because I wanted to gather opinions on the approach of using
>>> -finstrument-functions.
>> This looks ok for an initial approach. I wonder if longer term we want to
>> implement stack guards on Arm. We would need to allocate an extra "virtual"
>> page per stack that would be unmapped.
>>
>> The advantage is this could be used also in production and doesn't rely on any
>> support from the processor.
>>
>> Any thoughts?
> I think we need both. We should implement stack guards on Arm. In
> addition, it is also beneficial to have -finstrument-functions for
> profiling, debugging, and also so that we can retrieve detailed call
> graphs from execution runs. As an example, -finstrument-functions can
> help with offline analysis to prove that we don't have unbounded
> recursion, on both arm and x86 too. On the other hand, stack guards help
> with protecting the stack in production.

x86 unconditionally has stack guards.  It was easier doing it this way
than to have conditional stack guards and conditional shadow stacks
(hardware CET stacks).

In most cases, the #DF handler will recognise hitting a guard page and
panic() with information about the stack overflow.

The one case where this wont happen is on AMD hardware in HVM(SVM)
context, because the SVM designers and the AMD64 designers didn't talk
to each other when developing the respective features[1]...

In this case, Xen will take a clean[2] triple fault and reset on stack
overflow.

~Andrew

[1] The Task Register isn't switched on VMRUN, and while this was fine
in 32bit CPUs, it wasn't fine when 64bit CPUs replaced Task Switches
with the Interrupt Stack Table mechanism.  It should be fixed when AMD
implement FRED support.  For current CPUs, we can in principle fix this,
but at added latency to vmentry/exit, which is why we've not done so. 
If someone feels like adding a debug mode for it, I'm sure that would be
acceptable.

[2] If you can call such a thing "clean", but it won't execute off into
the weeds.

[3] (tangent from another thread).  14k of stack!?!.  x86 has 8k and
we've only ever hit that in error cases...