diff mbox series

[v3,21/36] arm64/mm: Implement map_shadow_stack()

Message ID 20230731-arm64-gcs-v3-21-cddf9f980d98@kernel.org (mailing list archive)
State New
Headers show
Series arm64/gcs: Provide support for GCS in userspace | expand

Commit Message

Mark Brown July 31, 2023, 1:43 p.m. UTC
As discussed extensively in the changelog for the addition of this
syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the
existing mmap() and madvise() syscalls do not map entirely well onto the
security requirements for guarded control stacks since they lead to
windows where memory is allocated but not yet protected or stacks which
are not properly and safely initialised. Instead a new syscall
map_shadow_stack() has been defined which allocates and initialises a
shadow stack page.

Implement this for arm64, initialising memory allocated this way with
the top two entries in the stack being 0 (to allow detection of the end
of the GCS) and a GCS cap token (to allow switching to the newly
allocated GCS via the GCS switch instructions).

Since the x86 code has not yet been rebased to v6.5-rc1 this includes
the architecture neutral parts of Rick Edgecmbe's "x86/shstk: Introduce
map_shadow_stack syscall".

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/mm/gcs.c               | 50 ++++++++++++++++++++++++++++++++++++++-
 include/linux/syscalls.h          |  1 +
 include/uapi/asm-generic/unistd.h |  5 +++-
 kernel/sys_ni.c                   |  1 +
 4 files changed, 55 insertions(+), 2 deletions(-)

Comments

Rick Edgecombe July 31, 2023, 3:56 p.m. UTC | #1
On Mon, 2023-07-31 at 14:43 +0100, Mark Brown wrote:
> +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned
> long, size, unsigned int, flags)
> +{
> +       unsigned long alloc_size;
> +       unsigned long __user *cap_ptr;
> +       unsigned long cap_val;
> +       int ret;
> +
> +       if (!system_supports_gcs())
> +               return -EOPNOTSUPP;
> +
> +       if (flags)
> +               return -EINVAL;

Any thoughts on the questions at the end of this mail?

https://lore.kernel.org/lkml/7a4c97f68347d4188286c543cdccaa12577cdb9e.camel@intel.com/
Mark Brown July 31, 2023, 5:06 p.m. UTC | #2
On Mon, Jul 31, 2023 at 03:56:50PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2023-07-31 at 14:43 +0100, Mark Brown wrote:

> Any thoughts on the questions at the end of this mail?

> https://lore.kernel.org/lkml/7a4c97f68347d4188286c543cdccaa12577cdb9e.camel@intel.com/

Those are:

> Someday when the x86 side is finally upstream I have a manpage for
> map_shadow_stack. Any differences on the arm side would need to be
> documented, but I'm not sure why there should be any differences. Like,
> why not use the same flags? Or have a new flag for token+end marker
> that x86 can use as well?

Ah, it wasn't clear to me that this was a question rather than just
open decisions about the eventual manpage.  Looking again I think what
you're asking about is that I see that at some point in development I
lost the SHADOW_STACK_SET_TOKEN flag which x86 has.  I suspect that was
a rebasing issue as it wasn't a deliberate decision, there's no reason
we couldn't have that.  Other than that and the fact that we add both a
stack swap token and a top of stack marker I'm not aware of any
differences.
Rick Edgecombe July 31, 2023, 11:19 p.m. UTC | #3
On Mon, 2023-07-31 at 18:06 +0100, Mark Brown wrote:
> > Someday when the x86 side is finally upstream I have a manpage for
> > map_shadow_stack. Any differences on the arm side would need to be
> > documented, but I'm not sure why there should be any differences.
> > Like,
> > why not use the same flags? Or have a new flag for token+end marker
> > that x86 can use as well?
> 
> Ah, it wasn't clear to me that this was a question rather than just
> open decisions about the eventual manpage.  Looking again I think
> what
> you're asking about is that I see that at some point in development I
> lost the SHADOW_STACK_SET_TOKEN flag which x86 has.  I suspect that
> was
> a rebasing issue as it wasn't a deliberate decision, there's no
> reason
> we couldn't have that.  Other than that and the fact that we add both
> a
> stack swap token and a top of stack marker I'm not aware of any
> differences.

The thing I was trying to get at was, we have this shared syscall that
means create shadow stack memory and prepopulate it like this flag
says. On x86 we optionally support SHADOW_STACK_SET_TOKEN which means
put a token right at the end of size. So maybe arm should have a
different flag value that includes putting the marker and then the
token, and x86 could match it someday if we get markers too.

It could be a different flag, like SHADOW_STACK_SET_TOKEN_MARKER, or it
could be SHADOW_STACK_SET_MARKER, and callers could pass
(SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER) to get what you have
implemented here. What do you think?
Mark Brown Aug. 1, 2023, 2:01 p.m. UTC | #4
On Mon, Jul 31, 2023 at 11:19:34PM +0000, Edgecombe, Rick P wrote:

> The thing I was trying to get at was, we have this shared syscall that
> means create shadow stack memory and prepopulate it like this flag
> says. On x86 we optionally support SHADOW_STACK_SET_TOKEN which means
> put a token right at the end of size. So maybe arm should have a
> different flag value that includes putting the marker and then the
> token, and x86 could match it someday if we get markers too.

Oh, I see.  My mental model was that this was controlling the whole
thing we put at the top rather than treating the terminator and the cap
separately.

> It could be a different flag, like SHADOW_STACK_SET_TOKEN_MARKER, or it
> could be SHADOW_STACK_SET_MARKER, and callers could pass
> (SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER) to get what you have
> implemented here. What do you think?

For arm64 code this would mean that it would be possible (and fairly
easy) to create stacks which don't have a termination record which would
make life harder for unwinders to rely on.  I don't think this is
insurmountable, creating manually shouldn't be the standard and it'll
already be an issue on x86 anyway.

The other minor issue is that the current arm64 marker is all bits 0
so by itself for arm64 _MARKER would have no perceptible impact, it
would only serve to push the token down a slot in the stack (I'm
guessing that's the intended meaning?).  I'm not sure that's a
particularly big deal though.
Rick Edgecombe Aug. 1, 2023, 5:07 p.m. UTC | #5
On Tue, 2023-08-01 at 15:01 +0100, Mark Brown wrote:
> On Mon, Jul 31, 2023 at 11:19:34PM +0000, Edgecombe, Rick P wrote:
> 
> > The thing I was trying to get at was, we have this shared syscall
> > that
> > means create shadow stack memory and prepopulate it like this flag
> > says. On x86 we optionally support SHADOW_STACK_SET_TOKEN which
> > means
> > put a token right at the end of size. So maybe arm should have a
> > different flag value that includes putting the marker and then the
> > token, and x86 could match it someday if we get markers too.
> 
> Oh, I see.  My mental model was that this was controlling the whole
> thing we put at the top rather than treating the terminator and the
> cap
> separately.
> 
> > It could be a different flag, like SHADOW_STACK_SET_TOKEN_MARKER,
> > or it
> > could be SHADOW_STACK_SET_MARKER, and callers could pass
> > (SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER) to get what you
> > have
> > implemented here. What do you think?
> 
> For arm64 code this would mean that it would be possible (and fairly
> easy) to create stacks which don't have a termination record which
> would
> make life harder for unwinders to rely on.  I don't think this is
> insurmountable, creating manually shouldn't be the standard and it'll
> already be an issue on x86 anyway.

If you are going to support optionally writing to shadow stacks (which
x86 needed for CRIU, and also seems like a nice thing for several other
reasons), you are already at that point. Can't you also do a bunch of
gcspopm's to the top of the GCS stack, and have no marker to hit before
the end of the stack? (maybe not in GCS, I don't know...)

> 
> The other minor issue is that the current arm64 marker is all bits 0
> so by itself for arm64 _MARKER would have no perceptible impact, it
> would only serve to push the token down a slot in the stack (I'm
> guessing that's the intended meaning?).

Pushing the token down a frame is what flags==0 does in this patch,
right?

You don't have to support all the flags actually, you could just
support the one mode you already have and reject all other
combinations... Then it matches between arch's, and you still have the
guaranteed-ish end marker.

So the question is not what mode should arm support, but should we have
the flags match between x86 and ARM?

>   I'm not sure that's a
> particularly big deal though.

Yea, it's not a big problem either way.
Mike Rapoport Aug. 1, 2023, 5:28 p.m. UTC | #6
On Tue, Aug 01, 2023 at 05:07:00PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2023-08-01 at 15:01 +0100, Mark Brown wrote:
> > On Mon, Jul 31, 2023 at 11:19:34PM +0000, Edgecombe, Rick P wrote:
> > 
> > > The thing I was trying to get at was, we have this shared syscall
> > > that
> > > means create shadow stack memory and prepopulate it like this flag
> > > says. On x86 we optionally support SHADOW_STACK_SET_TOKEN which
> > > means
> > > put a token right at the end of size. So maybe arm should have a
> > > different flag value that includes putting the marker and then the
> > > token, and x86 could match it someday if we get markers too.
> > 
> > Oh, I see.  My mental model was that this was controlling the whole
> > thing we put at the top rather than treating the terminator and the
> > cap
> > separately.
> > 
> > > It could be a different flag, like SHADOW_STACK_SET_TOKEN_MARKER,
> > > or it
> > > could be SHADOW_STACK_SET_MARKER, and callers could pass
> > > (SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER) to get what you
> > > have
> > > implemented here. What do you think?
> > 
> > For arm64 code this would mean that it would be possible (and fairly
> > easy) to create stacks which don't have a termination record which
> > would
> > make life harder for unwinders to rely on.  I don't think this is
> > insurmountable, creating manually shouldn't be the standard and it'll
> > already be an issue on x86 anyway.
> 
> If you are going to support optionally writing to shadow stacks (which
> x86 needed for CRIU, and also seems like a nice thing for several other
> reasons), you are already at that point. Can't you also do a bunch of
> gcspopm's to the top of the GCS stack, and have no marker to hit before
> the end of the stack? (maybe not in GCS, I don't know...)
> 
> > 
> > The other minor issue is that the current arm64 marker is all bits 0
> > so by itself for arm64 _MARKER would have no perceptible impact, it
> > would only serve to push the token down a slot in the stack (I'm
> > guessing that's the intended meaning?).
> 
> Pushing the token down a frame is what flags==0 does in this patch,
> right?
> 
> You don't have to support all the flags actually, you could just
> support the one mode you already have and reject all other
> combinations... Then it matches between arch's, and you still have the
> guaranteed-ish end marker.
> 
> So the question is not what mode should arm support, but should we have
> the flags match between x86 and ARM?

What if the flag will be called, say, SHADOW_STACK_DEFAULT_INIT?
Then each arch can push whatever it likes to and from the userspace
perspective the shadow stack will have some basic init state, no matter
what architecture it is.
 
> >   I'm not sure that's a
> > particularly big deal though.
> 
> Yea, it's not a big problem either way.
Mark Brown Aug. 1, 2023, 5:57 p.m. UTC | #7
On Tue, Aug 01, 2023 at 05:07:00PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2023-08-01 at 15:01 +0100, Mark Brown wrote:

> > > It could be a different flag, like SHADOW_STACK_SET_TOKEN_MARKER,
> > > or it
> > > could be SHADOW_STACK_SET_MARKER, and callers could pass
> > > (SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER) to get what you
> > > have
> > > implemented here. What do you think?

> > For arm64 code this would mean that it would be possible (and fairly
> > easy) to create stacks which don't have a termination record which
> > would
> > make life harder for unwinders to rely on.  I don't think this is
> > insurmountable, creating manually shouldn't be the standard and it'll
> > already be an issue on x86 anyway.

> If you are going to support optionally writing to shadow stacks (which
> x86 needed for CRIU, and also seems like a nice thing for several other
> reasons), you are already at that point. Can't you also do a bunch of
> gcspopm's to the top of the GCS stack, and have no marker to hit before
> the end of the stack? (maybe not in GCS, I don't know...)

It's definitely possible to use writes or pops to achive the same
effect, it's just that it's less likely to be something that happens
through simple oversight than missing a flag off the initial map call
would be.

> > The other minor issue is that the current arm64 marker is all bits 0
> > so by itself for arm64 _MARKER would have no perceptible impact, it
> > would only serve to push the token down a slot in the stack (I'm
> > guessing that's the intended meaning?).

> Pushing the token down a frame is what flags==0 does in this patch,
> right?

Yes, exactly - if we make the top of stack record optional then if that
flag is omitted we'd not do that.

> You don't have to support all the flags actually, you could just
> support the one mode you already have and reject all other
> combinations... Then it matches between arch's, and you still have the
> guaranteed-ish end marker.

Sure, though if we're going to the trouble of checking for the flag we
probably may as well implement it.  I guess x86 is locked in at this
point by existing userspace.  I guess I'll implement it assuming nobody
from userspace complains, it's trivial for a kernel.

> So the question is not what mode should arm support, but should we have
> the flags match between x86 and ARM?

The flags should definitely match, no disagreement there.
Mark Brown Aug. 1, 2023, 6:03 p.m. UTC | #8
On Tue, Aug 01, 2023 at 08:28:14PM +0300, Mike Rapoport wrote:
> On Tue, Aug 01, 2023 at 05:07:00PM +0000, Edgecombe, Rick P wrote:

> > So the question is not what mode should arm support, but should we have
> > the flags match between x86 and ARM?

> What if the flag will be called, say, SHADOW_STACK_DEFAULT_INIT?
> Then each arch can push whatever it likes to and from the userspace
> perspective the shadow stack will have some basic init state, no matter
> what architecture it is.

x86 might have some fun with that and the existing userspace binaries...
Rick Edgecombe Aug. 1, 2023, 8:57 p.m. UTC | #9
On Tue, 2023-08-01 at 18:57 +0100, Mark Brown wrote:
> > You don't have to support all the flags actually, you could just
> > support the one mode you already have and reject all other
> > combinations... Then it matches between arch's, and you still have
> > the
> > guaranteed-ish end marker.
> 
> Sure, though if we're going to the trouble of checking for the flag
> we
> probably may as well implement it.  I guess x86 is locked in at this
> point by existing userspace.  I guess I'll implement it assuming
> nobody
> from userspace complains, it's trivial for a kernel.

To make sure we are on the same page: What I'm saying is say we do
something like add another flag SHADOW_STACK_SET_MARKER that means add
a marker at the end (making the token off by one frame). Then you can
just reject any flags != (SHADOW_STACK_SET_MARKER |
SHADOW_STACK_SET_TOKEN) value, and leave the rest of the code as is. So
not really implementing anything new. 

Then x86 could use the same flag meanings if/when it implements end
markers. If it doesn't seem worth it, it's not a big deal on my end.
Just seemed that they were needlessly diverging.
Mark Brown Aug. 2, 2023, 4:27 p.m. UTC | #10
On Tue, Aug 01, 2023 at 08:57:59PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2023-08-01 at 18:57 +0100, Mark Brown wrote:

> > Sure, though if we're going to the trouble of checking for the flag
> > we
> > probably may as well implement it.  I guess x86 is locked in at this
> > point by existing userspace.  I guess I'll implement it assuming
> > nobody
> > from userspace complains, it's trivial for a kernel.

> To make sure we are on the same page: What I'm saying is say we do
> something like add another flag SHADOW_STACK_SET_MARKER that means add
> a marker at the end (making the token off by one frame). Then you can
> just reject any flags != (SHADOW_STACK_SET_MARKER |
> SHADOW_STACK_SET_TOKEN) value, and leave the rest of the code as is. So
> not really implementing anything new. 

> Then x86 could use the same flag meanings if/when it implements end
> markers. If it doesn't seem worth it, it's not a big deal on my end.
> Just seemed that they were needlessly diverging.

Yes, my understanding of the flags is the same.  I'll definitely
implement omitting the cap since there's an actual use case for that
(extending an existing stack, it's marginally safer to not have any
opportunity to pivot into the newly allocated region).
Mark Brown Aug. 4, 2023, 1:38 p.m. UTC | #11
On Wed, Aug 02, 2023 at 05:27:54PM +0100, Mark Brown wrote:
> On Tue, Aug 01, 2023 at 08:57:59PM +0000, Edgecombe, Rick P wrote:

> > To make sure we are on the same page: What I'm saying is say we do
> > something like add another flag SHADOW_STACK_SET_MARKER that means add
> > a marker at the end (making the token off by one frame). Then you can
> > just reject any flags != (SHADOW_STACK_SET_MARKER |
> > SHADOW_STACK_SET_TOKEN) value, and leave the rest of the code as is. So
> > not really implementing anything new. 

> > Then x86 could use the same flag meanings if/when it implements end
> > markers. If it doesn't seem worth it, it's not a big deal on my end.
> > Just seemed that they were needlessly diverging.

> Yes, my understanding of the flags is the same.  I'll definitely
> implement omitting the cap since there's an actual use case for that
> (extending an existing stack, it's marginally safer to not have any
> opportunity to pivot into the newly allocated region).

BTW are you planning to repost the series for this release?  We're
almost at -rc5 which is pretty late and I didn't see anything yet.  It
looks like there's a branch in tip that's getting some updates but it's
not getting merged for -next.
Rick Edgecombe Aug. 4, 2023, 4:43 p.m. UTC | #12
On Fri, 2023-08-04 at 14:38 +0100, Mark Brown wrote:
> On Wed, Aug 02, 2023 at 05:27:54PM +0100, Mark Brown wrote:
> > On Tue, Aug 01, 2023 at 08:57:59PM +0000, Edgecombe, Rick P wrote:
> 
> > > To make sure we are on the same page: What I'm saying is say we
> > > do
> > > something like add another flag SHADOW_STACK_SET_MARKER that
> > > means add
> > > a marker at the end (making the token off by one frame). Then you
> > > can
> > > just reject any flags != (SHADOW_STACK_SET_MARKER |
> > > SHADOW_STACK_SET_TOKEN) value, and leave the rest of the code as
> > > is. So
> > > not really implementing anything new. 
> 
> > > Then x86 could use the same flag meanings if/when it implements
> > > end
> > > markers. If it doesn't seem worth it, it's not a big deal on my
> > > end.
> > > Just seemed that they were needlessly diverging.
> 
> > Yes, my understanding of the flags is the same.  I'll definitely
> > implement omitting the cap since there's an actual use case for
> > that
> > (extending an existing stack, it's marginally safer to not have any
> > opportunity to pivot into the newly allocated region).
> 
> BTW are you planning to repost the series for this release?  We're
> almost at -rc5 which is pretty late and I didn't see anything yet. 

There were a few patches I posted on top of the last series after your
comments, but I wasn't planning on reposting the whole thing. Why do
you ask? Just trying to figure out the best version to base off of?

> It
> looks like there's a branch in tip that's getting some updates but
> it's
> not getting merged for -next.

Hmm, not sure why it's not in -next anymore. I'll look into that.
Thanks for pointing it out.
Mark Brown Aug. 4, 2023, 5:10 p.m. UTC | #13
On Fri, Aug 04, 2023 at 04:43:45PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2023-08-04 at 14:38 +0100, Mark Brown wrote:

> > BTW are you planning to repost the series for this release?  We're
> > almost at -rc5 which is pretty late and I didn't see anything yet. 

> There were a few patches I posted on top of the last series after your
> comments, but I wasn't planning on reposting the whole thing. Why do
> you ask? Just trying to figure out the best version to base off of?

> > It
> > looks like there's a branch in tip that's getting some updates but
> > it's
> > not getting merged for -next.

> Hmm, not sure why it's not in -next anymore. I'll look into that.
> Thanks for pointing it out.

Mainly it was the inclusion in -next with a view to it getting merged
that prompted me to ask, for the last release cycle had seemed to be
gated on it being posted on the list (which is a standard workflow).

Due to issues in mainline I really need a -rc3, and ideally -rc4, base
but other than occasionally having to pull new bits out of your series
it's not causing me any serious issues and I don't anticipate the arm64
stuff getting in this time round.
Szabolcs Nagy Aug. 7, 2023, 10:20 a.m. UTC | #14
The 07/31/2023 14:43, Mark Brown wrote:
> +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
> +{
> +	unsigned long alloc_size;
> +	unsigned long __user *cap_ptr;
> +	unsigned long cap_val;
> +	int ret;
> +
> +	if (!system_supports_gcs())
> +		return -EOPNOTSUPP;
> +
> +	if (flags)
> +		return -EINVAL;
> +
> +	if (addr % 16)
> +		return -EINVAL;

mmap addr must be page aligned (and there is no align req on size).

i'd expect similar api here.

> +
> +	if (size == 16 || size % 16)
> +		return -EINVAL;

why %16 and not %8 ?
Mark Brown Aug. 7, 2023, 1 p.m. UTC | #15
On Mon, Aug 07, 2023 at 11:20:58AM +0100, Szabolcs Nagy wrote:
> The 07/31/2023 14:43, Mark Brown wrote:
> > +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
> > +{
> > +	unsigned long alloc_size;
> > +	unsigned long __user *cap_ptr;
> > +	unsigned long cap_val;
> > +	int ret;
> > +
> > +	if (!system_supports_gcs())
> > +		return -EOPNOTSUPP;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	if (addr % 16)
> > +		return -EINVAL;

> mmap addr must be page aligned (and there is no align req on size).

> i'd expect similar api here.

That's not what the manual page or a quick check of the code suggest
that mmap() does, they say that the kernel just takes it as a hint and
chooses a nearby page boundary, though I didn't test.  I'm not sure why
I have that alignment check at all TBH, and to the extent it's needed I
could just be 8 - this level of code doesn't really care.

> > +	if (size == 16 || size % 16)
> > +		return -EINVAL;

> why %16 and not %8 ?

I don't think that's needed any more - there was some stuff in an
earlier version of the code but no longer.
Szabolcs Nagy Aug. 8, 2023, 8:21 a.m. UTC | #16
The 08/07/2023 14:00, Mark Brown wrote:
> On Mon, Aug 07, 2023 at 11:20:58AM +0100, Szabolcs Nagy wrote:
> > The 07/31/2023 14:43, Mark Brown wrote:
> > > +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
> > > +{
> > > +	unsigned long alloc_size;
> > > +	unsigned long __user *cap_ptr;
> > > +	unsigned long cap_val;
> > > +	int ret;
> > > +
> > > +	if (!system_supports_gcs())
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	if (flags)
> > > +		return -EINVAL;
> > > +
> > > +	if (addr % 16)
> > > +		return -EINVAL;
> 
> > mmap addr must be page aligned (and there is no align req on size).
> 
> > i'd expect similar api here.
> 
> That's not what the manual page or a quick check of the code suggest
> that mmap() does, they say that the kernel just takes it as a hint and

i should have said that i expect MAP_FIXED_NOREPLACE semantics
(since the x86 code seemed to use that) and then the mapped
address must match exactly thus page aligned.

> chooses a nearby page boundary, though I didn't test.  I'm not sure why
> I have that alignment check at all TBH, and to the extent it's needed I
> could just be 8 - this level of code doesn't really care.
> 
> > > +	if (size == 16 || size % 16)
> > > +		return -EINVAL;
> 
> > why %16 and not %8 ?
> 
> I don't think that's needed any more - there was some stuff in an
> earlier version of the code but no longer.

it's kind of important to know the exact logic so the cap token
location can be computed in userspace for arbitrary size.

(this is why i wanted to see the map_shadow_stack man page first
but i was told that comes separately on linux..)
Mark Brown Aug. 8, 2023, 8:42 p.m. UTC | #17
On Tue, Aug 08, 2023 at 09:21:03AM +0100, Szabolcs Nagy wrote:
> The 08/07/2023 14:00, Mark Brown wrote:

> > That's not what the manual page or a quick check of the code suggest
> > that mmap() does, they say that the kernel just takes it as a hint and

> i should have said that i expect MAP_FIXED_NOREPLACE semantics
> (since the x86 code seemed to use that) and then the mapped
> address must match exactly thus page aligned.

Ah, I see.  We do pass MAP_FIXED_NOREPLACE when allocating the stack if
an address was specified but currently leave it up to the VM subsystem
to figure out what to do with the address.  I don't immediately see
where mmap() enforces this requirement, but I have to admit I didn't
look overly hard.  I don't see a problem with enforcing a PAGE_SIZE
constraint here.

> > > > +	if (size == 16 || size % 16)
> > > > +		return -EINVAL;

> > > why %16 and not %8 ?

> > I don't think that's needed any more - there was some stuff in an
> > earlier version of the code but no longer.

> it's kind of important to know the exact logic so the cap token
> location can be computed in userspace for arbitrary size.

> (this is why i wanted to see the map_shadow_stack man page first
> but i was told that comes separately on linux..)

Right, I'd already changed it to % 8 in the version I posted yesterday.
diff mbox series

Patch

diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 64c9f9a85925..c24fe367e15a 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -52,7 +52,6 @@  unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
 		return 0;
 
 	size = gcs_size(size);
-
 	addr = alloc_gcs(0, size, 0, 0);
 	if (IS_ERR_VALUE(addr))
 		return addr;
@@ -64,6 +63,55 @@  unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
 	return addr;
 }
 
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
+{
+	unsigned long alloc_size;
+	unsigned long __user *cap_ptr;
+	unsigned long cap_val;
+	int ret;
+
+	if (!system_supports_gcs())
+		return -EOPNOTSUPP;
+
+	if (flags)
+		return -EINVAL;
+
+	if (addr % 16)
+		return -EINVAL;
+
+	if (size == 16 || size % 16)
+		return -EINVAL;
+
+	/*
+	 * An overflow would result in attempting to write the restore token
+	 * to the wrong location. Not catastrophic, but just return the right
+	 * error code and block it.
+	 */
+	alloc_size = PAGE_ALIGN(size);
+	if (alloc_size < size)
+		return -EOVERFLOW;
+
+	addr = alloc_gcs(addr, alloc_size, 0, false);
+	if (IS_ERR_VALUE(addr))
+		return addr;
+
+	/*
+	 * Put a cap token at the end of the allocated region so it
+	 * can be switched to.
+	 */
+	cap_ptr = (unsigned long __user *)(addr + size -
+					   (2 * sizeof(unsigned long)));
+	cap_val = GCS_CAP(cap_ptr);
+
+	ret = copy_to_user_gcs(cap_ptr, &cap_val, 1);
+	if (ret != 0) {
+		vm_munmap(addr, size);
+		return -EFAULT;
+	}
+
+	return addr;
+}
+
 /*
  * Apply the GCS mode configured for the specified task to the
  * hardware.
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 03e3d0121d5e..7f6dc0988197 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -953,6 +953,7 @@  asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long l
 asmlinkage long sys_cachestat(unsigned int fd,
 		struct cachestat_range __user *cstat_range,
 		struct cachestat __user *cstat, unsigned int flags);
+asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index fd6c1cb585db..38885a795ea6 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -820,8 +820,11 @@  __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
 #define __NR_cachestat 451
 __SYSCALL(__NR_cachestat, sys_cachestat)
 
+#define __NR_map_shadow_stack 452
+__SYSCALL(__NR_map_shadow_stack, sys_map_shadow_stack)
+
 #undef __NR_syscalls
-#define __NR_syscalls 452
+#define __NR_syscalls 453
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 781de7cc6a4e..e137c1385c56 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -274,6 +274,7 @@  COND_SYSCALL(vm86old);
 COND_SYSCALL(modify_ldt);
 COND_SYSCALL(vm86);
 COND_SYSCALL(kexec_file_load);
+COND_SYSCALL(map_shadow_stack);
 
 /* s390 */
 COND_SYSCALL(s390_pci_mmio_read);