[1/5] prctl: add PR_ISOLATE_BP process control

Message ID 1516712825-2917-2-git-send-email-schwidefsky@de.ibm.com (mailing list archive)
State New, archived

Commit Message

Martin Schwidefsky Jan. 23, 2018, 1:07 p.m. UTC
Add the PR_ISOLATE_BP operation to prctl. The effect of the process
control is to make all branch prediction entries created by the execution
of the user space code of this task not applicable to kernel code or the
code of any other task.

This can be achieved by the architecture specific implementation
in different ways, e.g. by limiting the branch prediction for the task,
or by clearing the branch prediction tables on each context switch, or
by tagging the branch prediction entries in a suitable way.

The architecture code needs to define the ISOLATE_BP macro to implement
the hardware specific details of the branch prediction isolation.

The control cannot be removed from a task once it is activated and it
is inherited by all children of the task.
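
The semantics described above (a generic fallback when the architecture
provides no ISOLATE_BP, a flag that is only ever set, and inheritance by
children) can be sketched as a small user space model. This is not kernel
code: every model_* name is hypothetical, the per-task flag is an assumed
representation, and only the value 52 and the #ifndef fallback are taken
from the patch itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-task state; not a kernel structure. */
struct model_task {
	bool isolate_bp;	/* only ever set, never cleared */
};

static struct model_task model_current;

/*
 * Hypothetical architecture hook. Drop this function and the #define
 * below to model an architecture without support.
 */
static int model_arch_isolate_bp(void)
{
	model_current.isolate_bp = true;
	return 0;
}
#define ISOLATE_BP()	model_arch_isolate_bp()

/* Generic fallback, mirroring the #ifndef added to kernel/sys.c. */
#ifndef ISOLATE_BP
# define ISOLATE_BP()	(-EINVAL)
#endif

#define PR_ISOLATE_BP	52	/* value proposed by this patch */

/* Model of the prctl dispatch: there is no operation to clear the flag. */
static int model_prctl(int option)
{
	switch (option) {
	case PR_ISOLATE_BP:
		return ISOLATE_BP();
	default:
		return -EINVAL;
	}
}

/* Model of fork: the child starts with a copy of the parent's flag. */
static struct model_task model_fork(const struct model_task *parent)
{
	return *parent;
}
```

In this model a call to model_prctl(PR_ISOLATE_BP) sets the flag on
model_current, and any task produced by model_fork() afterwards carries it
as well.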

A user space wrapper to start a program with isolated branch
prediction:

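#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>	/* PR_ISOLATE_BP comes from the patched <linux/prctl.h> */

int main(int argc, char *argv[], char *envp[])
{
	int rc;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s <file-to-exec> <arguments>\n",
			argv[0]);
		exit(EXIT_FAILURE);
	}

	/* One-way switch: isolation stays on and is inherited by children. */
	rc = prctl(PR_ISOLATE_BP);
	if (rc) {
		perror("PR_ISOLATE_BP");
		exit(EXIT_FAILURE);
	}
	execve(argv[1], argv + 1, envp);
	/* execve() only returns on failure. */
	perror("execve");
	exit(EXIT_FAILURE);
}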

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 include/uapi/linux/prctl.h | 8 ++++++++
 kernel/sys.c               | 6 ++++++
 2 files changed, 14 insertions(+)

Comments

Dominik Brodowski Jan. 23, 2018, 5:07 p.m. UTC | #1
On Tue, Jan 23, 2018 at 02:07:01PM +0100, Martin Schwidefsky wrote:
> Add the PR_ISOLATE_BP operation to prctl. The effect of the process
> control is to make all branch prediction entries created by the execution
> of the user space code of this task not applicable to kernel code or the
> code of any other task.

What is the rationale for requiring a per-process *opt-in* for this added
protection?

For KPTI on x86, the exact opposite approach is being discussed (see, e.g.
http://lkml.kernel.org/r/1515612500-14505-1-git-send-email-w@1wt.eu ): By
default, play it safe, with KPTI enabled. But for "trusted" processes, one
may opt out using prctl.

Thanks,
	Dominik
Martin Schwidefsky Jan. 24, 2018, 6:29 a.m. UTC | #2
On Tue, 23 Jan 2018 18:07:19 +0100
Dominik Brodowski <linux@dominikbrodowski.net> wrote:

> On Tue, Jan 23, 2018 at 02:07:01PM +0100, Martin Schwidefsky wrote:
> > Add the PR_ISOLATE_BP operation to prctl. The effect of the process
> > control is to make all branch prediction entries created by the execution
> > of the user space code of this task not applicable to kernel code or the
> > code of any other task.  
> 
> What is the rationale for requiring a per-process *opt-in* for this added
> protection?
> 
> For KPTI on x86, the exact opposite approach is being discussed (see, e.g.
> http://lkml.kernel.org/r/1515612500-14505-1-git-send-email-w@1wt.eu ): By
> default, play it safe, with KPTI enabled. But for "trusted" processes, one
> may opt out using prctl.

The rationale is that there are cases where you got code from *somewhere*
and want to run it in an isolated context. Think: a docker container that
runs under KVM. But with spectre this is still not really safe. So you
include a wrapper program in the docker container to use the trap door
prctl to start the potentially malicious program. Now you should be good, no?
Christian Borntraeger Jan. 24, 2018, 8:08 a.m. UTC | #3
On 01/23/2018 06:07 PM, Dominik Brodowski wrote:
> On Tue, Jan 23, 2018 at 02:07:01PM +0100, Martin Schwidefsky wrote:
>> Add the PR_ISOLATE_BP operation to prctl. The effect of the process
>> control is to make all branch prediction entries created by the execution
>> of the user space code of this task not applicable to kernel code or the
>> code of any other task.
> 
> What is the rationale for requiring a per-process *opt-in* for this added
> protection?
> 
> For KPTI on x86, the exact opposite approach is being discussed (see, e.g.
> http://lkml.kernel.org/r/1515612500-14505-1-git-send-email-w@1wt.eu ): By
> default, play it safe, with KPTI enabled. But for "trusted" processes, one
> may opt out using prctl.

FWIW, this is not about KPTI. s390 always has the kernel in a separate address
space. It's only about potential Spectre-like attacks.
The idea is to be able to isolate in controlled environments, e.g. if you have
only one thread with untrusted code (e.g. jitting remote code). The property of
the branch prediction mode on s390 is that it protects in two ways - against
being attacked but also against being able to attack via the BTB.
Dominik Brodowski Jan. 24, 2018, 8:37 a.m. UTC | #4
On Wed, Jan 24, 2018 at 07:29:53AM +0100, Martin Schwidefsky wrote:
> On Tue, 23 Jan 2018 18:07:19 +0100
> Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> 
> > On Tue, Jan 23, 2018 at 02:07:01PM +0100, Martin Schwidefsky wrote:
> > > Add the PR_ISOLATE_BP operation to prctl. The effect of the process
> > > control is to make all branch prediction entries created by the execution
> > > of the user space code of this task not applicable to kernel code or the
> > > code of any other task.  
> > 
> > What is the rationale for requiring a per-process *opt-in* for this added
> > protection?
> > 
> > For KPTI on x86, the exact opposite approach is being discussed (see, e.g.
> > http://lkml.kernel.org/r/1515612500-14505-1-git-send-email-w@1wt.eu ): By
> > default, play it safe, with KPTI enabled. But for "trusted" processes, one
> > may opt out using prctl.
> 
> The rationale is that there are cases where you got code from *somewhere*
> and want to run it in an isolated context. Think: a docker container that
> runs under KVM. But with spectre this is still not really safe. So you
> include a wrapper program in the docker container to use the trap door
> prctl to start the potential malicious program. Now you should be good, no?

Well, partly. It may be that s390 and its use cases are special -- but as I
understand it, the uapi question goes beyond that:

To my understanding, Linux traditionally tried to aim for the security goal
of avoiding information leaks *between* users[+], probably even between
processes of the same user. It wasn't a guarantee, and there always were
(and will be) information leaks -- and that is where additional safeguards
such as seccomp come into play, which reduce the attack surface against
unknown or unresolved security-related bugs. And everyone knew (or should
have known) that allowing "untrusted" code to be run (be it by a user, be
it JavaScript, etc.) is more risky. But still, avoiding information leaks
between users and between processes was (to my understanding) at least a
goal.[§]

In recent days however, the outlook on this issue seems to have shifted:

- Your proposal would mean trusting all userspace code, unless it is
  specifically marked as untrusted. As I understand it, this would mean that
  by default, spectre isn't fully mitigated cross-user and cross-process,
  though the kernel could do so. And rogue user-run code may make use of that,
  unless it is run with a special wrapper.

- Concerning x86 and IBPB, the current proposal is to limit the protection
  offered by IBPB to non-dumpable processes. As I understand it, this would
  mean that other processes are left hanging out to dry.[~]

- Concerning x86 and STIBP, David mentioned that "[t]here's an argument that
  there are so many other information leaks between HT siblings that we
  might not care"; in the last couple of hours, a proposal emerged to limit
  the protection offered by STIBP to non-dumpable processes as well. To my
  understanding, this would mean that many processes are left hanging out to
  dry again.

I am a bit worried whether this is a sign of a shift in the security goals.
I fully understand that there might be processes (e.g. some[?] kernel
threads) and users (root) which you need to trust anyway, as they can
already access anything. Disabling additional, costly safeguards for
those special cases then seems OK. Opting out of additional protections for
single-user or single-use systems (haproxy?) might make sense as well. But
the kernel[*] not offering full[#] spectre mitigation by default for regular
users and their processes? I'm not so sure.

Thanks,
	Dominik


[+] root is different.

[§] Whether such goals and their pursuit may have legal relevance -- e.g.
concerning the criminal law protection against unlawful access to data -- is
a related, fascinating topic.

[~] For example, I doubt that mutt sets the non-dumpable flag. But I
wouldn't want other users to be able to read my mail.

[#] Well, at least the best the kernel can currently and reasonably manage.

[*] Whether CPUs should enable full mitigation (IBRS_ALL) by default in future
has been discussed on this list as well.
David Woodhouse Jan. 24, 2018, 9:24 a.m. UTC | #5
On Wed, 2018-01-24 at 09:37 +0100, Dominik Brodowski wrote:
> On Wed, Jan 24, 2018 at 07:29:53AM +0100, Martin Schwidefsky wrote:
> > 
> > On Tue, 23 Jan 2018 18:07:19 +0100
> > Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> > 
> > > 
> > > On Tue, Jan 23, 2018 at 02:07:01PM +0100, Martin Schwidefsky wrote:
> > > > 
> > > > Add the PR_ISOLATE_BP operation to prctl. The effect of the process
> > > > control is to make all branch prediction entries created by the execution
> > > > of the user space code of this task not applicable to kernel code or the
> > > > code of any other task.  
> > >
> > > What is the rationale for requiring a per-process *opt-in* for this added
> > > protection?
> > > 
> > > For KPTI on x86, the exact opposite approach is being discussed (see, e.g.
> > > http://lkml.kernel.org/r/1515612500-14505-1-git-send-email-w@1wt.eu ): By
> > > default, play it safe, with KPTI enabled. But for "trusted" processes, one
> > > may opt out using prctl.
> >
> > The rationale is that there are cases where you got code from *somewhere*
> > and want to run it in an isolated context. Think: a docker container that
> > runs under KVM. But with spectre this is still not really safe. So you
> > include a wrapper program in the docker container to use the trap door
> > prctl to start the potential malicious program. Now you should be good, no?
>
> Well, partly. It may be that s390 and its use cases are special -- but as I
> understand it, this uapi question goes beyond this question:
> 
> To my understanding, Linux traditionally tried to aim for the security goal
> of avoiding information leaks *between* users[+], probably even between
> processes of the same user. It wasn't a guarantee, and there always were
> (and will be) information leaks -- and that is where additional safeguards
> such as seccomp come into play, which reduce the attack surface against
> unknown or unresolved security-related bugs. And everyone knew (or should
> have known) that allowing "untrusted" code to be run (be it by an user, be
> it JavaScript, etc.) is more risky. But still, avoiding information leaks
> between users and between processes was (to my understanding) at least a
> goal.[§]
> 
> In recent days however, the outlook on this issue seems to have shifted:
> 
> - Your proposal would mean to trust all userspace code, unless it is
>   specifically marked as untrusted. As I understand it, this would mean that
>   by default, spectre isn't fully mitigated cross-user and cross-process,
>   though the kernel could. And rogue user-run code may make use of that,
>   unless it is run with a special wrapper.
> 
> - Concerning x86 and IBPB, the current proposal is to limit the protection
>   offered by IBPB to non-dumpable processes. As I understand it, this would
>   mean that other processes are left hanging out to dry.[~]
> 
> - Concerning x86 and STIBP, David mentioned that "[t]here's an argument that
>   there are so many other information leaks between HT siblings that we
>   might not care"; in the last couple of hours, a proposal emerged to limit
>   the protection offered by STIBP to non-dumpable processes as well. To my
>   understanding, this would mean that many processes are left hanging out to
>   dry again.
> 
> I am a bit worried whether this is a sign for a shift in the security goals.
> I fully understand that there might be processes (e.g. some[?] kernel
> threads) and users (root) which you need to trust anyway, as they can
> already access anything. Disabling additional, costly safeguards for
> those special cases then seems OK. Opting out of additional protections for
> single-user or single-use systems (haproxy?) might make sense as well. But
> the kernel[*] not offering full[#] spectre mitigation by default for regular
> users and their processes? I'm not so sure.

Note that for STIBP/IBPB the operation of the flag is different in
another way. We're using it as a "protect this process from others"
flag, not a "protect others from this process" flag.

I'm not sure this is a fundamental shift in overall security goals;
more a recognition that on *current* hardware the cost of 100%
protection against an attack that was fairly unlikely in the first
place, is fairly prohibitive. For a process to make itself non-dumpable 
is a simple enough way to opt in. And *maybe* we could contemplate a
command line option for 'IBPB always' but I'm *really* wary of exposing
too much of that stuff, rather than simply trying to Do The Right
Thing.
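
The non-dumpable opt-in mentioned above corresponds to the existing
PR_SET_DUMPABLE prctl. A minimal sketch of a process clearing its own
dumpable flag follows; the helper name make_non_dumpable is illustrative,
and whether the kernel keys IBPB/STIBP off this flag is the proposal under
discussion here, not something this code controls.

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: clear the dumpable flag of the calling process.
 * Returns 0 on success and -1 on failure.
 */
static int make_non_dumpable(void)
{
	/* PR_SET_DUMPABLE with arg2 == 0 marks the process non-dumpable. */
	if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0) {
		perror("PR_SET_DUMPABLE");
		return -1;
	}
	/* PR_GET_DUMPABLE reads the flag back; 0 means non-dumpable. */
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 0 ? 0 : -1;
}
```

A process that handles secrets would call make_non_dumpable() early in its
startup, before it loads anything worth protecting.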

> [*] Whether CPUs should enable full mitigation (IBRS_ALL) by default
>     in future has been discussed on this list as well.

The kernel will do that; it's just not implemented yet because it's
slightly non-trivial and can't be fully tested yet. We *will* want to
ALTERNATIVE away the retpolines and just set IBRS_ALL because it'll be
faster to do so.

For IBRS_ALL, note that we still need the same IBPB flushes on context
switch; just not STIBP. That's because IBRS_ALL, as Linus so eloquently
reminded us, is *still* a stop-gap measure and not actually a fix.
Reading between the lines, I think tagging predictions with the ring
(and HT sibling?) they came from is the best they could slip into the
next generation without having to stop the fabs for two years while
they go back to the drawing board.

A real fix will *hopefully* come later, but unfortunately Intel haven't
even defined the bit in IA32_ARCH_CAPABILITIES which advertises "you
don't have to do any of this shit any more; we fixed it", analogous to
their RDCL_NO bit for "no more Meltdown". I'm *hoping* that's just an
oversight in preparing the doc and not looking far enough ahead, rather
than an actual *intent* to never fix it properly as Linus inferred.
Pavel Machek Jan. 24, 2018, 11:15 a.m. UTC | #6
Hi!

On Wed 2018-01-24 09:37:05, Dominik Brodowski wrote:
> On Wed, Jan 24, 2018 at 07:29:53AM +0100, Martin Schwidefsky wrote:
> > On Tue, 23 Jan 2018 18:07:19 +0100
> > Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> > 
> > > On Tue, Jan 23, 2018 at 02:07:01PM +0100, Martin Schwidefsky wrote:
> > > > Add the PR_ISOLATE_BP operation to prctl. The effect of the process
> > > > control is to make all branch prediction entries created by the execution
> > > > of the user space code of this task not applicable to kernel code or the
> > > > code of any other task.  
> > > 
> > > What is the rationale for requiring a per-process *opt-in* for this added
> > > protection?
> > > 
> > > For KPTI on x86, the exact opposite approach is being discussed (see, e.g.
> > > http://lkml.kernel.org/r/1515612500-14505-1-git-send-email-w@1wt.eu ): By
> > > default, play it safe, with KPTI enabled. But for "trusted" processes, one
> > > may opt out using prctl.
> > 
> > The rationale is that there are cases where you got code from *somewhere*
> > and want to run it in an isolated context. Think: a docker container that
> > runs under KVM. But with spectre this is still not really safe. So you
> > include a wrapper program in the docker container to use the trap door
> > prctl to start the potential malicious program. Now you should be good, no?
> 
> Well, partly. It may be that s390 and its use cases are special -- but as I
> understand it, this uapi question goes beyond this question:
> 
> To my understanding, Linux traditionally tried to aim for the security goal
> of avoiding information leaks *between* users[+], probably even between
> processes of the same user. It wasn't a guarantee, and there always

It used to be a guarantee. It still is, on non-buggy CPUs.

Leaks between users need to be prevented.

Leaks between processes of one user should be prevented, too. There are
various ways to restrict the user these days, and for example a sandboxed
chromium process should not be able to read my ~/.ssh.

can_ptrace() is closer to "can allow leaks between these two". Still
not quite there, as code might be running in a process that
can_ptrace(), but the code has been audited by a JIT or something not to
do syscalls.

> (and will be) information leaks -- and that is where additional safeguards
> such as seccomp come into play, which reduce the attack surface against
> unknown or unresolved security-related bugs. And everyone knew (or should
> have known) that allowing "untrusted" code to be run (be it by an user, be
> it JavaScript, etc.) is more risky. But still, avoiding information leaks
> between users and between processes was (to my understanding) at least a
> goal.[§]
> 
> In recent days however, the outlook on this issue seems to have shifted:
> 
> - Your proposal would mean to trust all userspace code, unless it is
>   specifically marked as untrusted. As I understand it, this would mean that
>   by default, spectre isn't fully mitigated cross-user and cross-process,
>   though the kernel could. And rogue user-run code may make use of that,
>   unless it is run with a special wrapper.

Yeah, well, that proposal does not fly, then.
Martin Schwidefsky Jan. 24, 2018, 12:48 p.m. UTC | #7
On Wed, 24 Jan 2018 12:15:53 +0100
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> On Wed 2018-01-24 09:37:05, Dominik Brodowski wrote:
> > On Wed, Jan 24, 2018 at 07:29:53AM +0100, Martin Schwidefsky wrote:  
> > > On Tue, 23 Jan 2018 18:07:19 +0100
> > > Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> > >   
> > > > On Tue, Jan 23, 2018 at 02:07:01PM +0100, Martin Schwidefsky wrote:  
> > > > > Add the PR_ISOLATE_BP operation to prctl. The effect of the process
> > > > > control is to make all branch prediction entries created by the execution
> > > > > of the user space code of this task not applicable to kernel code or the
> > > > > code of any other task.    
> > > > 
> > > > What is the rationale for requiring a per-process *opt-in* for this added
> > > > protection?
> > > > 
> > > > For KPTI on x86, the exact opposite approach is being discussed (see, e.g.
> > > > http://lkml.kernel.org/r/1515612500-14505-1-git-send-email-w@1wt.eu ): By
> > > > default, play it safe, with KPTI enabled. But for "trusted" processes, one
> > > > may opt out using prctl.
> > > 
> > > The rationale is that there are cases where you got code from *somewhere*
> > > and want to run it in an isolated context. Think: a docker container that
> > > runs under KVM. But with spectre this is still not really safe. So you
> > > include a wrapper program in the docker container to use the trap door
> > > prctl to start the potential malicious program. Now you should be good, no?  
> > 
> > Well, partly. It may be that s390 and its use cases are special -- but as I
> > understand it, this uapi question goes beyond this question:
> > 
> > To my understanding, Linux traditionally tried to aim for the security goal
> > of avoiding information leaks *between* users[+], probably even between
> > processes of the same user. It wasn't a guarantee, and there always  
> 
> It used to be guarantee. It still is, on non-buggy CPUs.

In a perfect world none of this would have ever happened.
But reality begs to differ.

> Leaks between users need to be prevented.
> 
> Leaks between one user should be prevented, too. There are various
> ways to restrict the user these days, and for example sandboxed
> chromium process should not be able to read my ~/.ssh.

Interesting that you mention the use case of a sandboxed browser process.
Why do you sandbox it in the first place? Because you do not trust it
as it might download malicious JavaScript code which uses some form of
attack to read the content of your ~/.ssh files. That is the use case for
the new prctl: limit this piece of code you *identified* as untrusted.

> can_ptrace() is closer to "can allow leaks between these two". Still
> not quite there, as code might be running in process that
> can_ptrace(), but the code has been audited by JIT or something not to
> do syscalls.
> 
> > (and will be) information leaks -- and that is where additional safeguards
> > such as seccomp come into play, which reduce the attack surface against
> > unknown or unresolved security-related bugs. And everyone knew (or should
> > have known) that allowing "untrusted" code to be run (be it by an user, be
> > it JavaScript, etc.) is more risky. But still, avoiding information leaks
> > between users and between processes was (to my understanding) at least a
> > goal.[§]
> > 
> > In recent days however, the outlook on this issue seems to have shifted:
> > 
> > - Your proposal would mean to trust all userspace code, unless it is
> >   specifically marked as untrusted. As I understand it, this would mean that
> >   by default, spectre isn't fully mitigated cross-user and cross-process,
> >   though the kernel could. And rogue user-run code may make use of that,
> >   unless it is run with a special wrapper.  
> 
> Yeah, well, that proposal does not fly, then.
 
It does not fly as a solution for the general case of cross-process attacks.
But for the special case where you can identify all of the potential untrusted
code in your setup it should work just fine, no?
Alan Cox Jan. 24, 2018, 3:42 p.m. UTC | #8
On Wed, 24 Jan 2018 09:37:05 +0100
> To my understanding, Linux traditionally tried to aim for the security goal
> of avoiding information leaks *between* users[+], probably even between
> processes of the same user. It wasn't a guarantee, and there always were

Not between processes of the same user in general (see ptrace or use gdb).

> (and will be) information leaks -- and that is where additional safeguards
> such as seccomp come into play, which reduce the attack surface against

seccomp is irrelevant on many processors (see the Armageddon paper). You
can (given willing partners) transfer data into and out of a seccomp
process at quite a respectable rate depending upon your hardware features.

> I am a bit worried whether this is a sign for a shift in the security goals.
> I fully understand that there might be processes (e.g. some[?] kernel
> threads) and users (root) which you need to trust anyway, as they can

dumpable is actually very useful but only in a specific way. The question
is this: if process A is dumpable by process B then there is no meaningful
protection between them and you don't need to do any work. Likewise if A
and B can dump each other and are both running on the same ht pair you
don't have to worry about them attacking one another. In all those cases
they can do it with ptrace already.
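
The rule above can be written down as a small decision function. The
sketch below is a user space model under a strong simplifying assumption:
"A can dump B" is reduced to "same uid and B is dumpable", whereas real
ptrace access checks consider more than that. The struct and both function
names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a task; not a kernel structure. */
struct peer_task {
	unsigned int uid;	/* owning user */
	bool dumpable;		/* false once the task made itself non-dumpable */
};

/* Assumption: a can dump b if both share a uid and b is dumpable. */
static bool can_dump(const struct peer_task *a, const struct peer_task *b)
{
	return a->uid == b->uid && b->dumpable;
}

/*
 * Branch predictor isolation between two tasks only buys something if
 * at least one of them cannot already inspect the other.
 */
static bool needs_bp_barrier(const struct peer_task *prev,
			     const struct peer_task *next)
{
	return !(can_dump(prev, next) && can_dump(next, prev));
}
```

Two dumpable tasks of the same user need no barrier in this model, while a
non-dumpable task or a task of another user always does.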

[There's a corner case here of using BPF filters to block ptrace]

Alan
Pavel Machek Jan. 24, 2018, 7:01 p.m. UTC | #9
Hi!
> > On Wed 2018-01-24 09:37:05, Dominik Brodowski wrote:
> > > On Wed, Jan 24, 2018 at 07:29:53AM +0100, Martin Schwidefsky wrote:  
> > > > On Tue, 23 Jan 2018 18:07:19 +0100
> > > > Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> > > >   
> > > > > On Tue, Jan 23, 2018 at 02:07:01PM +0100, Martin Schwidefsky wrote:  

> > > Well, partly. It may be that s390 and its use cases are special -- but as I
> > > understand it, this uapi question goes beyond this question:
> > > 
> > > To my understanding, Linux traditionally tried to aim for the security goal
> > > of avoiding information leaks *between* users[+], probably even between
> > > processes of the same user. It wasn't a guarantee, and there always  
> > 
> > It used to be guarantee. It still is, on non-buggy CPUs.
> 
> In a perfect world none of this would have ever happened.
> But reality begs to differ.

Ok, so: "Linux traditionally guarantees lack of information leaks
between PIDs". Yes, you can use ptrace, but that should be it.

> > Leaks between users need to be prevented.
> > 
> > Leaks between one user should be prevented, too. There are various
> > ways to restrict the user these days, and for example sandboxed
> > chromium process should not be able to read my ~/.ssh.
> 
> Interesting that you mention the use case of a sandboxed browser process.
> Why do you sandbox it in the first place? Because your do not trust it
> as it might download malicious java-script code which uses some form of
> attack to read the content of your ~/.ssh files. That is the use case for
> the new prctl, limit this piece of code you *identified* as
> untrusted.

See Alan Cox's replies.

Anyway. There's more than one way to mark a process as untrusted
(setuid nobody, seccomp, chroot nowhere, ptrace jail, ...). Do not
attempt to add prctl() to the list.

> > > In recent days however, the outlook on this issue seems to have shifted:
> > > 
> > > - Your proposal would mean to trust all userspace code, unless it is
> > >   specifically marked as untrusted. As I understand it, this would mean that
> > >   by default, spectre isn't fully mitigated cross-user and cross-process,
> > >   though the kernel could. And rogue user-run code may make use of that,
> > >   unless it is run with a special wrapper.  
> > 
> > Yeah, well, that proposal does not fly, then.
>  
> It does not fly as a solution for the general case if cross-process attacks.
> But for the special case where you can identify all of the potential untrusted
> code in your setup it should work just fine, no?

Well.. you can identify all of the untrusted code. Anything that does
not have CAP_HW_ACCESS is untrusted :-).

Anyway, no need to add prctl(), if A can ptrace B and B can ptrace A,
leaking info between them should not be a big deal. You can probably
find existing macros doing necessary checks.

									Pavel
Alan Cox Jan. 24, 2018, 8:46 p.m. UTC | #10
> Anyway, no need to add prctl(), if A can ptrace B and B can ptrace A,
> leaking info between them should not be a big deal. You can probably
> find existing macros doing necessary checks.

Until one of them is security managed so it shouldn't be able to ptrace
the other, or (and this is the nasty one) when a process is executing
code it wants to protect from the rest of the same process (e.g. an
untrusted jvm, javascript or probably nastiest of all webassembly).

We don't need a prctl for trusted/untrusted IMHO but we do eventually
need to think about APIs for "this lot is me but I don't trust
it" (flatpak, docker, etc) and for what JIT engines need to do.

Alan
Pavel Machek Jan. 29, 2018, 1:14 p.m. UTC | #11
On Wed 2018-01-24 20:46:22, Alan Cox wrote:
> > Anyway, no need to add prctl(), if A can ptrace B and B can ptrace A,
> > leaking info between them should not be a big deal. You can probably
> > find existing macros doing necessary checks.
> 
> Until one of them is security managed so it shouldn't be able to ptrace
> the other, or (and this is the nasty one) when a process is executing
> code it wants to protect from the rest of the same process (eg an
> untrusted jvm, javascript or probably nastiest of all webassembly)
> 
> We don't need a prctl for trusted/untrusted IMHO but we do eventually
> need to think about API's for "this lot is me but I don't trust
> it" (flatpack, docker, etc) and for what JIT engines need to do.

Agreed.

And yes, JITs are interesting, and given the latest
rowhammer/sidechannel attacks, something we may want to limit in
future...

It sounds nice on paper but is just risky.
									Pavel
Alan Cox Jan. 29, 2018, 8:12 p.m. UTC | #12
On Mon, 29 Jan 2018 14:14:46 +0100
Pavel Machek <pavel@ucw.cz> wrote:

> On Wed 2018-01-24 20:46:22, Alan Cox wrote:
> > > Anyway, no need to add prctl(), if A can ptrace B and B can ptrace A,
> > > leaking info between them should not be a big deal. You can probably
> > > find existing macros doing necessary checks.
> > 
> > Until one of them is security managed so it shouldn't be able to ptrace
> > the other, or (and this is the nasty one) when a process is executing
> > code it wants to protect from the rest of the same process (eg an
> > untrusted jvm, javascript or probably nastiest of all webassembly)
> > 
> > We don't need a prctl for trusted/untrusted IMHO but we do eventually
> > need to think about API's for "this lot is me but I don't trust
> > it" (flatpack, docker, etc) and for what JIT engines need to do.  
> 
> Agreed.
> 
> And yes, JITs are interesting, and given the latest
> rowhammer/sidechannel attacks, something we may want to limit in
> future...
> 
> It sounds nice on paper but is just risky.

I don't think java, javascript, webassembly, (and for some
implementations truetype, pdf, postscript, ... and more) are going away
in a hurry.

Alan

Patch

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index af5f8c2..e7b84c9 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -207,4 +207,12 @@  struct prctl_mm_map {
 # define PR_SVE_VL_LEN_MASK		0xffff
 # define PR_SVE_VL_INHERIT		(1 << 17) /* inherit across exec */
 
+/*
+ * Prevent branch prediction entries created by the execution of
+ * user space code of this task from being used in any other context.
+ * This makes it impossible for malicious user space code to train
+ * a branch in the kernel code or in another task to be mispredicted.
+ */
+#define PR_ISOLATE_BP			52
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 83ffd7d..e41cb2f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -117,6 +117,9 @@ 
 #ifndef SVE_GET_VL
 # define SVE_GET_VL()		(-EINVAL)
 #endif
+#ifndef ISOLATE_BP
+# define ISOLATE_BP()		(-EINVAL)
+#endif
 
 /*
  * this is where the system-wide overflow UID and GID are defined, for
@@ -2398,6 +2401,9 @@  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_SVE_GET_VL:
 		error = SVE_GET_VL();
 		break;
+	case PR_ISOLATE_BP:
+		error = ISOLATE_BP();
+		break;
 	default:
 		error = -EINVAL;
 		break;