diff mbox

[RFC,04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

Message ID 1516476182-5153-5-git-send-email-karahmed@amazon.de (mailing list archive)
State New, archived
Headers show

Commit Message

KarimAllah Ahmed Jan. 20, 2018, 7:22 p.m. UTC
From: Tim Chen <tim.c.chen@linux.intel.com>

Flush indirect branches when switching into a process that marked
itself non dumpable.  This protects high value processes like gpg
better, without having too high performance overhead.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
---
 arch/x86/mm/tlb.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Woodhouse, David Jan. 20, 2018, 9:06 p.m. UTC | #1
On Sat, 2018-01-20 at 20:22 +0100, KarimAllah Ahmed wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>

I think this is probably From: Andi now rather than From: Tim?

We do need the series this far in order to have a full retpoline-based
mitigation, and I'd like to see that go in sooner rather than later.
There's a little more discussion to be had about the IBRS parts which
come later in the series (and the final one or two which weren't posted
yet).

I think this is the one patch of the "we want this now" IBPB set that
we expect serious debate on, which is why it's still a separate
"optimisation" patch on top of the previous one which just does IBPB
unconditionally.


> Flush indirect branches when switching into a process that marked
> itself non dumpable.  This protects high value processes like gpg
> better, without having too high performance overhead.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> ---
>  arch/x86/mm/tlb.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 304de7d..f64e80c 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -225,8 +225,19 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  		 * Avoid user/user BTB poisoning by flushing the branch predictor
>  		 * when switching between processes. This stops one process from
>  		 * doing Spectre-v2 attacks on another.
> +		 *
> +		 * As an optimization: Flush indirect branches only when
> +		 * switching into processes that disable dumping.
> +		 *
> +		 * This will not flush when switching into kernel threads.
> +		 * But it would flush when switching into idle and back
> +		 *
> +		 * It might be useful to have a one-off cache here
> +		 * to also not flush the idle case, but we would need some
> +		 * kind of stable sequence number to remember the previous mm.
>  		 */
> -		indirect_branch_prediction_barrier();
> +		if (tsk && tsk->mm && get_dumpable(tsk->mm) != SUID_DUMP_USER)
> +			indirect_branch_prediction_barrier();
>  
>  		if (IS_ENABLED(CONFIG_VMAP_STACK)) {
>  			/*
Peter Zijlstra Jan. 21, 2018, 11:22 a.m. UTC | #2
On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> Flush indirect branches when switching into a process that marked
> itself non dumpable.  This protects high value processes like gpg
> better, without having too high performance overhead.

So if I understand it right, this is only needed if the 'other'
executable itself is susceptible to spectre. If say someone audited gpg
for spectre-v1 and build it with retpoline, it would be safe to not
issue the IBPB, right?

So would it make sense to provide an ELF flag / personality thing such
that userspace can indicate its spectre-safe?

I realize that this is all future work, because so far auditing for v1
is a lot of pain (we need better tools), but would it be something that
makes sense in the longer term?
David Woodhouse Jan. 21, 2018, 12:04 p.m. UTC | #3
> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>
>> Flush indirect branches when switching into a process that marked
>> itself non dumpable.  This protects high value processes like gpg
>> better, without having too high performance overhead.
>
> So if I understand it right, this is only needed if the 'other'
> executable itself is susceptible to spectre. If say someone audited gpg
> for spectre-v1 and build it with retpoline, it would be safe to not
> issue the IBPB, right?


Spectre V2 not v1. V1 is separate.
For V2 retpoline is enough... as long as all the libraries have it too.

> So would it make sense to provide an ELF flag / personality thing such
> that userspace can indicate its spectre-safe?

Yes, Arjan and I were pondering that yesterday; it probably does make
sense. Also for allowing a return to userspace after vmexit, if the army
process itself is so marked.

> I realize that this is all future work, because so far auditing for v1
> is a lot of pain (we need better tools), but would it be something that
> makes sense in the longer term?

It's *only* retpoline so it isn't actually that much. Although I'm wary of
Cc'ing HJ on such thoughts because he seems to never sleep and always
respond promptly with "OK I did that... " :)

If we did systematically do this in userspace we'd probably want to do
external thunks there too, and a flag in the auxvec to tell it not to
bother (for IBRS_ALL etc.).
H.J. Lu Jan. 21, 2018, 2:07 p.m. UTC | #4
On Sun, Jan 21, 2018 at 4:04 AM, David Woodhouse <dwmw2@infradead.org> wrote:
>
>> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
>>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>>
>>> Flush indirect branches when switching into a process that marked
>>> itself non dumpable.  This protects high value processes like gpg
>>> better, without having too high performance overhead.
>>
>> So if I understand it right, this is only needed if the 'other'
>> executable itself is susceptible to spectre. If say someone audited gpg
>> for spectre-v1 and build it with retpoline, it would be safe to not
>> issue the IBPB, right?
>
>
> Spectre V2 not v1. V1 is separate.
> For V2 retpoline is enough... as long as all the libraries have it too.
>
>> So would it make sense to provide an ELF flag / personality thing such
>> that userspace can indicate its spectre-safe?
>
> Yes, Arjan and I were pondering that yesterday; it probably does make
> sense. Also for allowing a return to userspace after vmexit, if the army
> process itself is so marked.

Please take a look at how CET is handled in program property in
x86-64 psABI for CET:

https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-cet.pdf
Ingo Molnar Jan. 21, 2018, 4:21 p.m. UTC | #5
* Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
> > From: Tim Chen <tim.c.chen@linux.intel.com>
> > 
> > Flush indirect branches when switching into a process that marked
> > itself non dumpable.  This protects high value processes like gpg
> > better, without having too high performance overhead.
> 
> So if I understand it right, this is only needed if the 'other'
> executable itself is susceptible to spectre. If say someone audited gpg
> for spectre-v1 and build it with retpoline, it would be safe to not
> issue the IBPB, right?
> 
> So would it make sense to provide an ELF flag / personality thing such
> that userspace can indicate its spectre-safe?
> 
> I realize that this is all future work, because so far auditing for v1
> is a lot of pain (we need better tools), but would it be something that
> makes sense in the longer term?

So if it's only about the scheduler barrier, what cycle cost are we talking about 
here?

Because putting something like this into an ELF flag raises the question of who is 
allowed to set the flag - does a user-compiled binary count? If yes then it would 
be a trivial thing for local exploits to set the flag and turn off the barrier.

Yes, we could make a distinction based on the owner of the file, we could use 
security labels, etc. - but it gets somewhat awkward and fragile.

So unless we are talking about measurably high scheduler costs here, I'd prefer to 
err on the side of caution (and simplicity) and issue the barrier unconditionally.

Thanks,

	Ingo
Arjan van de Ven Jan. 21, 2018, 4:25 p.m. UTC | #6
On 1/21/2018 8:21 AM, Ingo Molnar wrote:
> 
> 
> So if it's only about the scheduler barrier, what cycle cost are we talking about
> here?
>

in the order of 5000 to 10000 cycles.
(depends a bit on the cpu generation but this range is a reasonable approximation)



> Because putting something like this into an ELF flag raises the question of who is
> allowed to set the flag - does a user-compiled binary count? If yes then it would
> be a trivial thing for local exploits to set the flag and turn off the barrier.

the barrier is about who you go TO, e.g. the thing under attack.
as you say, depending on the thing that would be the evil one does not work.
Woodhouse, David Jan. 21, 2018, 10:20 p.m. UTC | #7
On Sun, 2018-01-21 at 17:21 +0100, Ingo Molnar wrote:
> 
> Because putting something like this into an ELF flag raises the question of who is 
> allowed to set the flag - does a user-compiled binary count? If yes then it would 
> be a trivial thing for local exploits to set the flag and turn off the barrier.

You can only allow *yourself* to be exploited that way. The flag says,
"I'm OK, you don't need to protect me".
Peter Zijlstra Jan. 22, 2018, 10:19 a.m. UTC | #8
On Sun, Jan 21, 2018 at 12:04:03PM -0000, David Woodhouse wrote:
> > So if I understand it right, this is only needed if the 'other'
> > executable itself is susceptible to spectre. If say someone audited gpg
> > for spectre-v1 and build it with retpoline, it would be safe to not
> > issue the IBPB, right?
> 
> 
> Spectre V2 not v1. V1 is separate.
> For V2 retpoline is enough... as long as all the libraries have it too.

Ah, easy then. So we need this toolchain bit and then simply rebuild
works and everything is happy again, well except of course those people
running closed sores binaries, but meh.. :-)

> > I realize that this is all future work, because so far auditing for v1
> > is a lot of pain (we need better tools), but would it be something that
> > makes sense in the longer term?
> 
> It's *only* retpoline so it isn't actually that much. Although I'm wary of
> Cc'ing HJ on such thoughts because he seems to never sleep and always
> respond promptly with "OK I did that... " :)
> 
> If we did systematically do this in userspace we'd probably want to do
> external thunks there too, and a flag in the auxvec to tell it not to
> bother (for IBRS_ALL etc.).

Right, so if its v2/retpoline only, we really should do this asap and
then rebuild world on distros (or arch/gentoo people could read a book
or something).
David Woodhouse Jan. 22, 2018, 10:23 a.m. UTC | #9
On Mon, 2018-01-22 at 11:19 +0100, Peter Zijlstra wrote:
> Right, so if its v2/retpoline only, we really should do this asap and
> then rebuild world on distros (or arch/gentoo people could read a book
> or something).

By the time we manage to rebuild all the distros, I *seriously* hope
that someone would be shipping a fixed CPU.

And not just the half-way-there IBRS_ALL bit that still requires the
IBPB flushing on context switches that's discussed in this patch, but
an *actual* fix so we can forget about it all and go drinking.
Tim Chen Jan. 22, 2018, 6:29 p.m. UTC | #10
On 01/20/2018 01:06 PM, Woodhouse, David wrote:
> On Sat, 2018-01-20 at 20:22 +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> I think this is probably From: Andi now rather than From: Tim?

This change is from Andi.


>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index 304de7d..f64e80c 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -225,8 +225,19 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>>  		 * Avoid user/user BTB poisoning by flushing the branch predictor
>>  		 * when switching between processes. This stops one process from
>>  		 * doing Spectre-v2 attacks on another.
>> +		 *
>> +		 * As an optimization: Flush indirect branches only when
>> +		 * switching into processes that disable dumping.
>> +		 *
>> +		 * This will not flush when switching into kernel threads.
>> +		 * But it would flush when switching into idle and back
>> +		 *
>> +		 * It might be useful to have a one-off cache here
>> +		 * to also not flush the idle case, but we would need some
>> +		 * kind of stable sequence number to remember the previous mm.
>>  		 */
>> -		indirect_branch_prediction_barrier();
>> +		if (tsk && tsk->mm && get_dumpable(tsk->mm) != SUID_DUMP_USER)
>> +			indirect_branch_prediction_barrier();

We could move this close to the cr3 write. The cr3 write provides
barrier against unwanted speculation in the above if check.

Tim
Jon Masters Jan. 29, 2018, 6:35 a.m. UTC | #11
Hi Peter, David, all,

First a quick note on David's earlier comment, about this optimization
being still up for debate. The problem with this optimization as-is is
that it doesn't protect userspace-to-userspace unless applications are
rebuilt and we get the infrastructure to handle that (ELF, whatever).

But...

On 01/21/2018 06:22 AM, Peter Zijlstra wrote:
> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>
>> Flush indirect branches when switching into a process that marked
>> itself non dumpable.  This protects high value processes like gpg
>> better, without having too high performance overhead.
> 
> So if I understand it right, this is only needed if the 'other'
> executable itself is susceptible to spectre. If say someone audited gpg
> for spectre-v1 and build it with retpoline, it would be safe to not
> issue the IBPB, right?

More importantly, rebuilding the world introduces a lot of challenges
that need to be discussed heavily before they happen (I would like to
see someone run a session at one of the various upcoming events on
userspace, I've already prodded a few people to nudge that forward). In
particular, we don't have the infrastructure in gcc/glibc to dynamically
patch userspace call sites to enable/disable retpolines.

We discussed nasty hacks last year (I even suggested an ugly kernel
exported page similar to VDSO that could be implementation patched for
different uarches), but the bottom line is there isn't anything in place
to provide a similar userspace experience to what the kernel can do, and
that would need to be solved in addition to the ELF/ABI bits.

> So would it make sense to provide an ELF flag / personality thing such
> that userspace can indicate its spectre-safe?
> 
> I realize that this is all future work, because so far auditing for v1
> is a lot of pain (we need better tools), but would it be something that
> makes sense in the longer term?

So I would just caution that doing this isn't necessarily bad, but it's
far more than just ELF bits and rebuilding. Once userspace is rebuilt
with un-nopable retpolines, they're there whether you need them on
$future_hardware or not, and that fancy branch predictor is useless. So
we really need a way to allow for userspace patchable calls, or at least
some kind of plan before everyone runs away with rebuilding.

(unless they're embedded/Gentoo/whatever...have fun in that case)

Jon.

P.S. This is why for certain downstream distros you'll see IBPB use like
prior to this patch - it'll prevent certain attacks that can't be
otherwise mitigated without going and properly solving the tools issue.
Peter Zijlstra Jan. 29, 2018, 2:07 p.m. UTC | #12
On Mon, Jan 29, 2018 at 01:35:30AM -0500, Jon Masters wrote:
> > So if I understand it right, this is only needed if the 'other'
> > executable itself is susceptible to spectre. If say someone audited gpg
> > for spectre-v1 and build it with retpoline, it would be safe to not
> > issue the IBPB, right?
> 
> More importantly, rebuilding the world introduces a lot of challenges
> that need to be discussed heavily before they happen (I would like to
> see someone run a session at one of the various upcoming events on
> userspace, I've already prodded a few people to nudge that forward). In
> particular, we don't have the infrastructure in gcc/glibc to dynamically
> patch userspace call sites to enable/disable retpolines.

GCC/GLIBC do in fact have some infrastructure for this; see
target_clones/ifunc function attributes. We can at (dynamic) link time
select between alternative functions.

With this we could select different retpoline thunks for different
systems, much like what we end up doing for the kernel.

> We discussed nasty hacks last year (I even suggested an ugly kernel
> exported page similar to VDSO that could be implementation patched for
> different uarches), but the bottom line is there isn't anything in place
> to provide a similar userspace experience to what the kernel can do, and
> that would need to be solved in addition to the ELF/ABI bits.

Not sure where you discussed what, but I spoke with a bunch of the
facebook people at plumbers about kernel support for (runtime) userspace
patching a-la asm-goto/jump-labels.

And while that would be entirely fun, I don't see how we'd need this
here.

> > So would it make sense to provide an ELF flag / personality thing such
> > that userspace can indicate its spectre-safe?
> > 
> > I realize that this is all future work, because so far auditing for v1
> > is a lot of pain (we need better tools), but would it be something that
> > makes sense in the longer term?
> 
> So I would just caution that doing this isn't necessarily bad, but it's
> far more than just ELF bits and rebuilding. Once userspace is rebuilt
> with un-nopable retpolines, they're there whether you need them on
> $future_hardware or not, and that fancy branch predictor is useless. So
> we really need a way to allow for userspace patchable calls, or at least
> some kind of plan before everyone runs away with rebuilding.

Just rebuild world again; there's plenty distros where this is not in
fact a difficult thing to do :-) You just don't happen to work for one
... :-)
diff mbox

Patch

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 304de7d..f64e80c 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -225,8 +225,19 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * Avoid user/user BTB poisoning by flushing the branch predictor
 		 * when switching between processes. This stops one process from
 		 * doing Spectre-v2 attacks on another.
+		 *
+		 * As an optimization: Flush indirect branches only when
+		 * switching into processes that disable dumping.
+		 *
+		 * This will not flush when switching into kernel threads.
+		 * But it would flush when switching into idle and back
+		 *
+		 * It might be useful to have a one-off cache here
+		 * to also not flush the idle case, but we would need some
+		 * kind of stable sequence number to remember the previous mm.
 		 */
-		indirect_branch_prediction_barrier();
+		if (tsk && tsk->mm && get_dumpable(tsk->mm) != SUID_DUMP_USER)
+			indirect_branch_prediction_barrier();
 
 		if (IS_ENABLED(CONFIG_VMAP_STACK)) {
 			/*