diff mbox

[v8,5/9] seccomp: split mode set routines

Message ID 1403642893-23107-6-git-send-email-keescook@chromium.org (mailing list archive)
State New, archived
Headers show

Commit Message

Kees Cook June 24, 2014, 8:48 p.m. UTC
Extracts the common check/assign logic, and separates the two mode
setting paths to make things more readable with fewer #ifdefs within
function bodies.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/seccomp.c |  123 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 84 insertions(+), 39 deletions(-)

Comments

Oleg Nesterov June 25, 2014, 1:51 p.m. UTC | #1
On 06/24, Kees Cook wrote:
>
> +static inline void seccomp_assign_mode(struct task_struct *task,
> +				       unsigned long seccomp_mode)
> +{
> +	BUG_ON(!spin_is_locked(&task->sighand->siglock));
> +
> +	task->seccomp.mode = seccomp_mode;
> +	set_tsk_thread_flag(task, TIF_SECCOMP);
> +}

OK, but unless task == current this can race with secure_computing().
I think this needs smp_mb__before_atomic() and secure_computing() needs
rmb() after test_bit(TIF_SECCOMP).

Otherwise, can't __secure_computing() hit BUG() if it sees the old
mode == SECCOMP_MODE_DISABLED ?

Or seccomp_run_filters() can see ->filters == NULL and WARN(),
smp_load_acquire() only serializes that LOAD with the subsequent memory
operations.

Oleg.
Kees Cook June 25, 2014, 2:51 p.m. UTC | #2
On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/24, Kees Cook wrote:
>>
>> +static inline void seccomp_assign_mode(struct task_struct *task,
>> +                                    unsigned long seccomp_mode)
>> +{
>> +     BUG_ON(!spin_is_locked(&task->sighand->siglock));
>> +
>> +     task->seccomp.mode = seccomp_mode;
>> +     set_tsk_thread_flag(task, TIF_SECCOMP);
>> +}
>
> OK, but unless task == current this can race with secure_computing().
> I think this needs smp_mb__before_atomic() and secure_computing() needs
> rmb() after test_bit(TIF_SECCOMP).
>
> Otherwise, can't __secure_computing() hit BUG() if it sees the old
> mode == SECCOMP_MODE_DISABLED ?
>
> Or seccomp_run_filters() can see ->filters == NULL and WARN(),
> smp_load_acquire() only serializes that LOAD with the subsequent memory
> operations.

Hm, actually, now I'm worried about smp_load_acquire() being too slow
in run_filters().

The ordering must be:
- task->seccomp.filter must be valid before
- task->seccomp.mode is set, which must be valid before
- TIF_SECCOMP is set

But I don't want to impact secure_computing(). What's the best way to
make sure this ordering is respected?

-Kees
Andy Lutomirski June 25, 2014, 4:10 p.m. UTC | #3
On Wed, Jun 25, 2014 at 7:51 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> On 06/24, Kees Cook wrote:
>>>
>>> +static inline void seccomp_assign_mode(struct task_struct *task,
>>> +                                    unsigned long seccomp_mode)
>>> +{
>>> +     BUG_ON(!spin_is_locked(&task->sighand->siglock));
>>> +
>>> +     task->seccomp.mode = seccomp_mode;
>>> +     set_tsk_thread_flag(task, TIF_SECCOMP);
>>> +}
>>
>> OK, but unless task == current this can race with secure_computing().
>> I think this needs smp_mb__before_atomic() and secure_computing() needs
>> rmb() after test_bit(TIF_SECCOMP).
>>
>> Otherwise, can't __secure_computing() hit BUG() if it sees the old
>> mode == SECCOMP_MODE_DISABLED ?
>>
>> Or seccomp_run_filters() can see ->filters == NULL and WARN(),
>> smp_load_acquire() only serializes that LOAD with the subsequent memory
>> operations.
>
> Hm, actually, now I'm worried about smp_load_acquire() being too slow
> in run_filters().
>
> The ordering must be:
> - task->seccomp.filter must be valid before
> - task->seccomp.mode is set, which must be valid before
> - TIF_SECCOMP is set
>
> But I don't want to impact secure_computing(). What's the best way to
> make sure this ordering is respected?

Remove the ordering requirement, perhaps?

What if you moved mode into seccomp.filter?  Then there would be
little reason to check TIF_SECCOMP from secure_computing; instead, you
could smp_load_acquire (or read_barrier_depends, maybe) seccomp.filter
from secure_computing and pass the result as a parameter to
__secure_computing.  Or you could even remove the distinction between
secure_computing and __secure_computing -- it's essentially useless
anyway to split entry hook approaches like my x86 fastpath prototype.

--Andy
Kees Cook June 25, 2014, 4:54 p.m. UTC | #4
On Wed, Jun 25, 2014 at 9:10 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Jun 25, 2014 at 7:51 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> On 06/24, Kees Cook wrote:
>>>>
>>>> +static inline void seccomp_assign_mode(struct task_struct *task,
>>>> +                                    unsigned long seccomp_mode)
>>>> +{
>>>> +     BUG_ON(!spin_is_locked(&task->sighand->siglock));
>>>> +
>>>> +     task->seccomp.mode = seccomp_mode;
>>>> +     set_tsk_thread_flag(task, TIF_SECCOMP);
>>>> +}
>>>
>>> OK, but unless task == current this can race with secure_computing().
>>> I think this needs smp_mb__before_atomic() and secure_computing() needs
>>> rmb() after test_bit(TIF_SECCOMP).
>>>
>>> Otherwise, can't __secure_computing() hit BUG() if it sees the old
>>> mode == SECCOMP_MODE_DISABLED ?
>>>
>>> Or seccomp_run_filters() can see ->filters == NULL and WARN(),
>>> smp_load_acquire() only serializes that LOAD with the subsequent memory
>>> operations.
>>
>> Hm, actually, now I'm worried about smp_load_acquire() being too slow
>> in run_filters().
>>
>> The ordering must be:
>> - task->seccomp.filter must be valid before
>> - task->seccomp.mode is set, which must be valid before
>> - TIF_SECCOMP is set
>>
>> But I don't want to impact secure_computing(). What's the best way to
>> make sure this ordering is respected?
>
> Remove the ordering requirement, perhaps?
>
> What if you moved mode into seccomp.filter?  Then there would be
> little reason to check TIF_SECCOMP from secure_computing; instead, you
> could smp_load_acquire (or read_barrier_depends, maybe) seccomp.filter
> from secure_computing and pass the result as a parameter to
> __secure_computing.  Or you could even remove the distinction between
> secure_computing and __secure_computing -- it's essentially useless
> anyway to split entry hook approaches like my x86 fastpath prototype.

The TIF_SECCOMP is needed for the syscall entry path. The check in
secure_computing() is just because the "I am being traced" trigger
includes a call to secure_computing, which filters out tracing
reasons.

Your fast path work would clean a lot of that up, as you say. But it
still doesn't change the ordering check here. TIF_SECCOMP indicates
seccomp.mode must be checked, so that ordering will remain, and if
mode == FILTER, seccomp.filter must be valid.

Isn't there a way we can force the assignment ordering in seccomp_assign_mode()?

-Kees
Oleg Nesterov June 25, 2014, 5 p.m. UTC | #5
On 06/25, Kees Cook wrote:
>
> On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 06/24, Kees Cook wrote:
> >>
> >> +static inline void seccomp_assign_mode(struct task_struct *task,
> >> +                                    unsigned long seccomp_mode)
> >> +{
> >> +     BUG_ON(!spin_is_locked(&task->sighand->siglock));
> >> +
> >> +     task->seccomp.mode = seccomp_mode;
> >> +     set_tsk_thread_flag(task, TIF_SECCOMP);
> >> +}
> >
> > OK, but unless task == current this can race with secure_computing().
> > I think this needs smp_mb__before_atomic() and secure_computing() needs
> > rmb() after test_bit(TIF_SECCOMP).
> >
> > Otherwise, can't __secure_computing() hit BUG() if it sees the old
> > mode == SECCOMP_MODE_DISABLED ?
> >
> > Or seccomp_run_filters() can see ->filters == NULL and WARN(),
> > smp_load_acquire() only serializes that LOAD with the subsequent memory
> > operations.
>
> Hm, actually, now I'm worried about smp_load_acquire() being too slow
> in run_filters().
>
> The ordering must be:
> - task->seccomp.filter must be valid before
> - task->seccomp.mode is set, which must be valid before
> - TIF_SECCOMP is set
>
> But I don't want to impact secure_computing(). What's the best way to
> make sure this ordering is respected?

Cough, confused... can't understand even after I read the email from Andy.

We do not care if __secure_computing() misses the recently added filter,
this can happen anyway, whatever we do.

seccomp.mode is frozen after we set it != SECCOMP_MODE_DISABLED.

So we should only worry if set_tsk_thread_flag(TIF_SECCOMP) actually
changes this bit and makes __secure_computing() possible. If we add
smp_mb__before_atomic() into seccomp_assign_mode() and rmb() at the
start of __secure_computing() everything should be fine?

Oleg.
Andy Lutomirski June 25, 2014, 5:03 p.m. UTC | #6
On Wed, Jun 25, 2014 at 9:54 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jun 25, 2014 at 9:10 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, Jun 25, 2014 at 7:51 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>> On 06/24, Kees Cook wrote:
>>>>>
>>>>> +static inline void seccomp_assign_mode(struct task_struct *task,
>>>>> +                                    unsigned long seccomp_mode)
>>>>> +{
>>>>> +     BUG_ON(!spin_is_locked(&task->sighand->siglock));
>>>>> +
>>>>> +     task->seccomp.mode = seccomp_mode;
>>>>> +     set_tsk_thread_flag(task, TIF_SECCOMP);
>>>>> +}
>>>>
>>>> OK, but unless task == current this can race with secure_computing().
>>>> I think this needs smp_mb__before_atomic() and secure_computing() needs
>>>> rmb() after test_bit(TIF_SECCOMP).
>>>>
>>>> Otherwise, can't __secure_computing() hit BUG() if it sees the old
>>>> mode == SECCOMP_MODE_DISABLED ?
>>>>
>>>> Or seccomp_run_filters() can see ->filters == NULL and WARN(),
>>>> smp_load_acquire() only serializes that LOAD with the subsequent memory
>>>> operations.
>>>
>>> Hm, actually, now I'm worried about smp_load_acquire() being too slow
>>> in run_filters().
>>>
>>> The ordering must be:
>>> - task->seccomp.filter must be valid before
>>> - task->seccomp.mode is set, which must be valid before
>>> - TIF_SECCOMP is set
>>>
>>> But I don't want to impact secure_computing(). What's the best way to
>>> make sure this ordering is respected?
>>
>> Remove the ordering requirement, perhaps?
>>
>> What if you moved mode into seccomp.filter?  Then there would be
>> little reason to check TIF_SECCOMP from secure_computing; instead, you
>> could smp_load_acquire (or read_barrier_depends, maybe) seccomp.filter
>> from secure_computing and pass the result as a parameter to
>> __secure_computing.  Or you could even remove the distinction between
>> secure_computing and __secure_computing -- it's essentially useless
>> anyway to split entry hook approaches like my x86 fastpath prototype.
>
> The TIF_SECCOMP is needed for the syscall entry path. The check in
> secure_computing() is just because the "I am being traced" trigger
> includes a call to secure_computing, which filters out tracing
> reasons.

Right.  I'm suggesting just checking a single indication that seccomp
is on from the process in the seccomp code so that the order doesn't
matter.

IOW, TIF_SECCOMP causes __secure_computing to be invoked, but the race
only seems to matter because of the warning and the BUG.  I think that
both can be fixed if you merge mode into filter so that
__secure_computing atomically checks one condition.

>
> Your fast path work would clean a lot of that up, as you say. But it
> still doesn't change the ordering check here. TIF_SECCOMP indicates
> seccomp.mode must be checked, so that ordering will remain, and if
> mode == FILTER, seccomp.filter must be valid.
>
> Isn't there a way we can force the assignment ordering in seccomp_assign_mode()?

Write the filter, then smp_mb (or maybe a weaker barrier is okay),
then set the bit.

--Andy

>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security
Oleg Nesterov June 25, 2014, 5:32 p.m. UTC | #7
On 06/25, Andy Lutomirski wrote:
>
> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
> then set the bit.

Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().

But I still can't understand the rest of your discussion about the
ordering we need ;)

Oleg.
Andy Lutomirski June 25, 2014, 5:38 p.m. UTC | #8
On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/25, Andy Lutomirski wrote:
>>
>> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
>> then set the bit.
>
> Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
>
> But I still can't understand the rest of your discussion about the
> ordering we need ;)

Let me try again from scratch.

Currently there are three relevant variables: TIF_SECCOMP,
seccomp.mode, and seccomp.filter.  __secure_computing needs
seccomp.mode and seccomp.filter to be in sync, and it wants (but
doesn't really need) TIF_SECCOMP to be in sync as well.

My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
(so that filter == NULL implies no seccomp) and don't check
TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
atomic except for the fact that the seccomp hooks won't be called if
filter != NULL but !TIF_SECCOMP.  This removes all ordering
requirements.

Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
In that case, filter needs to be set before TIF_SECCOMP is set, but
that's straightforward.

--Andy
Oleg Nesterov June 25, 2014, 5:51 p.m. UTC | #9
On 06/25, Andy Lutomirski wrote:
>
> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 06/25, Andy Lutomirski wrote:
> >>
> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
> >> then set the bit.
> >
> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
> >
> > But I still can't understand the rest of your discussion about the
> > ordering we need ;)
>
> Let me try again from scratch.
>
> Currently there are three relevant variables: TIF_SECCOMP,
> seccomp.mode, and seccomp.filter.  __secure_computing needs
> seccomp.mode and seccomp.filter to be in sync, and it wants (but
> doesn't really need) TIF_SECCOMP to be in sync as well.
>
> My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
> (so that filter == NULL implies no seccomp) and don't check
> TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
> atomic except for the fact that the seccomp hooks won't be called if
> filter != NULL but !TIF_SECCOMP.  This removes all ordering
> requirements.

Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like
unnecessary complication at first glance.

We alredy have TIF_SECCOMP, we need it anyway, and we should only care
about the case when this bit is actually set, so that we can race with
the 1st call of __secure_computing().

Otherwise we are fine: we can miss the new filter anyway, ->mode can't
be changed it is already nonzero.

> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
> In that case, filter needs to be set before TIF_SECCOMP is set, but
> that's straightforward.

Yep. And this is how seccomp_assign_mode() already works? It is called
after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP)
just it lacks a barrier.

Oleg.
Kees Cook June 25, 2014, 6 p.m. UTC | #10
On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/25, Andy Lutomirski wrote:
>>
>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> > On 06/25, Andy Lutomirski wrote:
>> >>
>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
>> >> then set the bit.
>> >
>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
>> >
>> > But I still can't understand the rest of your discussion about the
>> > ordering we need ;)
>>
>> Let me try again from scratch.
>>
>> Currently there are three relevant variables: TIF_SECCOMP,
>> seccomp.mode, and seccomp.filter.  __secure_computing needs
>> seccomp.mode and seccomp.filter to be in sync, and it wants (but
>> doesn't really need) TIF_SECCOMP to be in sync as well.
>>
>> My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
>> (so that filter == NULL implies no seccomp) and don't check

This would require that we reimplement mode 1 seccomp via mode 2
filters. Which isn't too hard, but may add complexity.

>> TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
>> atomic except for the fact that the seccomp hooks won't be called if
>> filter != NULL but !TIF_SECCOMP.  This removes all ordering
>> requirements.
>
> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like
> unnecessary complication at first glance.
>
> We alredy have TIF_SECCOMP, we need it anyway, and we should only care
> about the case when this bit is actually set, so that we can race with
> the 1st call of __secure_computing().
>
> Otherwise we are fine: we can miss the new filter anyway, ->mode can't
> be changed it is already nonzero.
>
>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
>> In that case, filter needs to be set before TIF_SECCOMP is set, but
>> that's straightforward.
>
> Yep. And this is how seccomp_assign_mode() already works? It is called
> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP)
> just it lacks a barrier.

Right, I think the best solution is to add the barrier. I was
concerned that adding the read barrier in secure_computing would have
a performance impact, though.
Andy Lutomirski June 25, 2014, 6:07 p.m. UTC | #11
On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> On 06/25, Andy Lutomirski wrote:
>>>
>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> > On 06/25, Andy Lutomirski wrote:
>>> >>
>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
>>> >> then set the bit.
>>> >
>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
>>> >
>>> > But I still can't understand the rest of your discussion about the
>>> > ordering we need ;)
>>>
>>> Let me try again from scratch.
>>>
>>> Currently there are three relevant variables: TIF_SECCOMP,
>>> seccomp.mode, and seccomp.filter.  __secure_computing needs
>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but
>>> doesn't really need) TIF_SECCOMP to be in sync as well.
>>>
>>> My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
>>> (so that filter == NULL implies no seccomp) and don't check
>
> This would require that we reimplement mode 1 seccomp via mode 2
> filters. Which isn't too hard, but may add complexity.
>
>>> TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
>>> atomic except for the fact that the seccomp hooks won't be called if
>>> filter != NULL but !TIF_SECCOMP.  This removes all ordering
>>> requirements.
>>
>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like
>> unnecessary complication at first glance.
>>
>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care
>> about the case when this bit is actually set, so that we can race with
>> the 1st call of __secure_computing().
>>
>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't
>> be changed it is already nonzero.
>>
>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
>>> In that case, filter needs to be set before TIF_SECCOMP is set, but
>>> that's straightforward.
>>
>> Yep. And this is how seccomp_assign_mode() already works? It is called
>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP)
>> just it lacks a barrier.
>
> Right, I think the best solution is to add the barrier. I was
> concerned that adding the read barrier in secure_computing would have
> a performance impact, though.
>

I can't speak for ARM, but I think that all of the read barriers are
essentially free on x86.  (smp_mb is a very different story, but that
shouldn't be needed here.)

--Andy
Kees Cook June 27, 2014, 6:33 p.m. UTC | #12
On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> On 06/25, Andy Lutomirski wrote:
>>>>
>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>> > On 06/25, Andy Lutomirski wrote:
>>>> >>
>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
>>>> >> then set the bit.
>>>> >
>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
>>>> >
>>>> > But I still can't understand the rest of your discussion about the
>>>> > ordering we need ;)
>>>>
>>>> Let me try again from scratch.
>>>>
>>>> Currently there are three relevant variables: TIF_SECCOMP,
>>>> seccomp.mode, and seccomp.filter.  __secure_computing needs
>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but
>>>> doesn't really need) TIF_SECCOMP to be in sync as well.
>>>>
>>>> My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
>>>> (so that filter == NULL implies no seccomp) and don't check
>>
>> This would require that we reimplement mode 1 seccomp via mode 2
>> filters. Which isn't too hard, but may add complexity.
>>
>>>> TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
>>>> atomic except for the fact that the seccomp hooks won't be called if
>>>> filter != NULL but !TIF_SECCOMP.  This removes all ordering
>>>> requirements.
>>>
>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like
>>> unnecessary complication at first glance.
>>>
>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care
>>> about the case when this bit is actually set, so that we can race with
>>> the 1st call of __secure_computing().
>>>
>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't
>>> be changed it is already nonzero.
>>>
>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but
>>>> that's straightforward.
>>>
>>> Yep. And this is how seccomp_assign_mode() already works? It is called
>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP)
>>> just it lacks a barrier.
>>
>> Right, I think the best solution is to add the barrier. I was
>> concerned that adding the read barrier in secure_computing would have
>> a performance impact, though.
>>
>
> I can't speak for ARM, but I think that all of the read barriers are
> essentially free on x86.  (smp_mb is a very different story, but that
> shouldn't be needed here.)

It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html

If I skip the rmb in the secure_computing call before checking mode,
it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs
mode and filter. This seems unlikely to me, given an addition of the
smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I
don't have a sense of how aggressively ARM might do data caching in
this area. Could the other thread actually see TIF_SECCOMP get set but
still have an out of date copy of seccomp.mode?

I really want to avoid adding anything to the secure_computing()
execution path. :(

-Kees
Andy Lutomirski June 27, 2014, 6:39 p.m. UTC | #13
On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>> On 06/25, Andy Lutomirski wrote:
>>>>>
>>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>>> > On 06/25, Andy Lutomirski wrote:
>>>>> >>
>>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
>>>>> >> then set the bit.
>>>>> >
>>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
>>>>> >
>>>>> > But I still can't understand the rest of your discussion about the
>>>>> > ordering we need ;)
>>>>>
>>>>> Let me try again from scratch.
>>>>>
>>>>> Currently there are three relevant variables: TIF_SECCOMP,
>>>>> seccomp.mode, and seccomp.filter.  __secure_computing needs
>>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but
>>>>> doesn't really need) TIF_SECCOMP to be in sync as well.
>>>>>
>>>>> My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
>>>>> (so that filter == NULL implies no seccomp) and don't check
>>>
>>> This would require that we reimplement mode 1 seccomp via mode 2
>>> filters. Which isn't too hard, but may add complexity.
>>>
>>>>> TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
>>>>> atomic except for the fact that the seccomp hooks won't be called if
>>>>> filter != NULL but !TIF_SECCOMP.  This removes all ordering
>>>>> requirements.
>>>>
>>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like
>>>> unnecessary complication at first glance.
>>>>
>>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care
>>>> about the case when this bit is actually set, so that we can race with
>>>> the 1st call of __secure_computing().
>>>>
>>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't
>>>> be changed it is already nonzero.
>>>>
>>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
>>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but
>>>>> that's straightforward.
>>>>
>>>> Yep. And this is how seccomp_assign_mode() already works? It is called
>>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP)
>>>> just it lacks a barrier.
>>>
>>> Right, I think the best solution is to add the barrier. I was
>>> concerned that adding the read barrier in secure_computing would have
>>> a performance impact, though.
>>>
>>
>> I can't speak for ARM, but I think that all of the read barriers are
>> essentially free on x86.  (smp_mb is a very different story, but that
>> shouldn't be needed here.)
>
> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html
>
> If I skip the rmb in the secure_computing call before checking mode,
> it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs
> mode and filter. This seems unlikely to me, given an addition of the
> smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I
> don't have a sense of how aggressively ARM might do data caching in
> this area. Could the other thread actually see TIF_SECCOMP get set but
> still have an out of date copy of seccomp.mode?
>
> I really want to avoid adding anything to the secure_computing()
> execution path. :(

Hence my suggestion to make the ordering not matter.  No ordering
requirement, no barriers.

--Andy

>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security
Kees Cook June 27, 2014, 6:52 p.m. UTC | #14
On Fri, Jun 27, 2014 at 11:39 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote:
>>>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>>> On 06/25, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>>>> > On 06/25, Andy Lutomirski wrote:
>>>>>> >>
>>>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
>>>>>> >> then set the bit.
>>>>>> >
>>>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
>>>>>> >
>>>>>> > But I still can't understand the rest of your discussion about the
>>>>>> > ordering we need ;)
>>>>>>
>>>>>> Let me try again from scratch.
>>>>>>
>>>>>> Currently there are three relevant variables: TIF_SECCOMP,
>>>>>> seccomp.mode, and seccomp.filter.  __secure_computing needs
>>>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but
>>>>>> doesn't really need) TIF_SECCOMP to be in sync as well.
>>>>>>
>>>>>> My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
>>>>>> (so that filter == NULL implies no seccomp) and don't check
>>>>
>>>> This would require that we reimplement mode 1 seccomp via mode 2
>>>> filters. Which isn't too hard, but may add complexity.
>>>>
>>>>>> TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
>>>>>> atomic except for the fact that the seccomp hooks won't be called if
>>>>>> filter != NULL but !TIF_SECCOMP.  This removes all ordering
>>>>>> requirements.
>>>>>
>>>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like
>>>>> unnecessary complication at first glance.
>>>>>
>>>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care
>>>>> about the case when this bit is actually set, so that we can race with
>>>>> the 1st call of __secure_computing().
>>>>>
>>>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't
>>>>> be changed it is already nonzero.
>>>>>
>>>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
>>>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but
>>>>>> that's straightforward.
>>>>>
>>>>> Yep. And this is how seccomp_assign_mode() already works? It is called
>>>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP)
>>>>> just it lacks a barrier.
>>>>
>>>> Right, I think the best solution is to add the barrier. I was
>>>> concerned that adding the read barrier in secure_computing would have
>>>> a performance impact, though.
>>>>
>>>
>>> I can't speak for ARM, but I think that all of the read barriers are
>>> essentially free on x86.  (smp_mb is a very different story, but that
>>> shouldn't be needed here.)
>>
>> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html
>>
>> If I skip the rmb in the secure_computing call before checking mode,
>> it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs
>> mode and filter. This seems unlikely to me, given an addition of the
>> smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I
>> don't have a sense of how aggressively ARM might do data caching in
>> this area. Could the other thread actually see TIF_SECCOMP get set but
>> still have an out of date copy of seccomp.mode?
>>
>> I really want to avoid adding anything to the secure_computing()
>> execution path. :(
>
> Hence my suggestion to make the ordering not matter.  No ordering
> requirement, no barriers.

I may be misunderstanding something, but I think there's still an
ordering problem. We'll have TIF_SECCOMP already, so if we enter
secure_computing with a NULL filter, we'll kill the process.

Merging .mode and .filter would remove one of the race failure paths:
having TIF_SECCOMP and not having a mode set (leading to BUG). With
the merge, we could still race and land in the same place as have
TIF_SECCOMP and mode==2, but filter==NULL, leading to WARN and kill.

I guess the question is how large is the race risk on ARM? Is it
possible to have TIF_SECCOMP that far out of sync for the thread?

-Kees
Andy Lutomirski June 27, 2014, 6:56 p.m. UTC | #15
On Fri, Jun 27, 2014 at 11:52 AM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Jun 27, 2014 at 11:39 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>>>> On 06/25, Andy Lutomirski wrote:
>>>>>>>
>>>>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>>>>> > On 06/25, Andy Lutomirski wrote:
>>>>>>> >>
>>>>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
>>>>>>> >> then set the bit.
>>>>>>> >
>>>>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
>>>>>>> >
>>>>>>> > But I still can't understand the rest of your discussion about the
>>>>>>> > ordering we need ;)
>>>>>>>
>>>>>>> Let me try again from scratch.
>>>>>>>
>>>>>>> Currently there are three relevant variables: TIF_SECCOMP,
>>>>>>> seccomp.mode, and seccomp.filter.  __secure_computing needs
>>>>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but
>>>>>>> doesn't really need) TIF_SECCOMP to be in sync as well.
>>>>>>>
>>>>>>> My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
>>>>>>> (so that filter == NULL implies no seccomp) and don't check
>>>>>
>>>>> This would require that we reimplement mode 1 seccomp via mode 2
>>>>> filters. Which isn't too hard, but may add complexity.
>>>>>
>>>>>>> TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
>>>>>>> atomic except for the fact that the seccomp hooks won't be called if
>>>>>>> filter != NULL but !TIF_SECCOMP.  This removes all ordering
>>>>>>> requirements.
>>>>>>
>>>>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like
>>>>>> unnecessary complication at first glance.
>>>>>>
>>>>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care
>>>>>> about the case when this bit is actually set, so that we can race with
>>>>>> the 1st call of __secure_computing().
>>>>>>
>>>>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't
>>>>>> be changed it is already nonzero.
>>>>>>
>>>>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
>>>>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but
>>>>>>> that's straightforward.
>>>>>>
>>>>>> Yep. And this is how seccomp_assign_mode() already works? It is called
>>>>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP)
>>>>>> just it lacks a barrier.
>>>>>
>>>>> Right, I think the best solution is to add the barrier. I was
>>>>> concerned that adding the read barrier in secure_computing would have
>>>>> a performance impact, though.
>>>>>
>>>>
>>>> I can't speak for ARM, but I think that all of the read barriers are
>>>> essentially free on x86.  (smp_mb is a very different story, but that
>>>> shouldn't be needed here.)
>>>
>>> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html
>>>
>>> If I skip the rmb in the secure_computing call before checking mode,
>>> it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs
>>> mode and filter. This seems unlikely to me, given an addition of the
>>> smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I
>>> don't have a sense of how aggressively ARM might do data caching in
>>> this area. Could the other thread actually see TIF_SECCOMP get set but
>>> still have an out of date copy of seccomp.mode?
>>>
>>> I really want to avoid adding anything to the secure_computing()
>>> execution path. :(
>>
>> Hence my suggestion to make the ordering not matter.  No ordering
>> requirement, no barriers.
>
> I may be misunderstanding something, but I think there's still an
> ordering problem. We'll have TIF_SECCOMP already, so if we enter
> secure_computing with a NULL filter, we'll kill the process.
>
> Merging .mode and .filter would remove one of the race failure paths:
> having TIF_SECCOMP and not having a mode set (leading to BUG). With
> the merge, we could still race and land in the same place as have
> TIF_SECCOMP and mode==2, but filter==NULL, leading to WARN and kill.

You could just make secure_computing do nothing if filter == NULL.
It's probably faster to test that than TIF_SECCOMP anyway, since you
need to read the filter cacheline regardless, and testing a regular
variable for non-NULLness might be faster than an atomic bit test
operation.  (Or may not -- I don't know.)

>
> I guess the question is how large is the race risk on ARM? Is it
> possible to have TIF_SECCOMP that far out of sync for the thread?

Dunno.  I don't like leaving crashy known races around.

--Andy



>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security
Kees Cook June 27, 2014, 7:04 p.m. UTC | #16
On Fri, Jun 27, 2014 at 11:56 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Jun 27, 2014 at 11:52 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, Jun 27, 2014 at 11:39 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote:
>>>> On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>>>>> On 06/25, Andy Lutomirski wrote:
>>>>>>>>
>>>>>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>>>>>> > On 06/25, Andy Lutomirski wrote:
>>>>>>>> >>
>>>>>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
>>>>>>>> >> then set the bit.
>>>>>>>> >
>>>>>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
>>>>>>>> >
>>>>>>>> > But I still can't understand the rest of your discussion about the
>>>>>>>> > ordering we need ;)
>>>>>>>>
>>>>>>>> Let me try again from scratch.
>>>>>>>>
>>>>>>>> Currently there are three relevant variables: TIF_SECCOMP,
>>>>>>>> seccomp.mode, and seccomp.filter.  __secure_computing needs
>>>>>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but
>>>>>>>> doesn't really need) TIF_SECCOMP to be in sync as well.
>>>>>>>>
>>>>>>>> My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
>>>>>>>> (so that filter == NULL implies no seccomp) and don't check
>>>>>>
>>>>>> This would require that we reimplement mode 1 seccomp via mode 2
>>>>>> filters. Which isn't too hard, but may add complexity.
>>>>>>
>>>>>>>> TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
>>>>>>>> atomic except for the fact that the seccomp hooks won't be called if
>>>>>>>> filter != NULL but !TIF_SECCOMP.  This removes all ordering
>>>>>>>> requirements.
>>>>>>>
>>>>>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like
>>>>>>> unnecessary complication at first glance.
>>>>>>>
>>>>>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care
>>>>>>> about the case when this bit is actually set, so that we can race with
>>>>>>> the 1st call of __secure_computing().
>>>>>>>
>>>>>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't
>>>>>>> be changed it is already nonzero.
>>>>>>>
>>>>>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
>>>>>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but
>>>>>>>> that's straightforward.
>>>>>>>
>>>>>>> Yep. And this is how seccomp_assign_mode() already works? It is called
>>>>>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP)
>>>>>>> just it lacks a barrier.
>>>>>>
>>>>>> Right, I think the best solution is to add the barrier. I was
>>>>>> concerned that adding the read barrier in secure_computing would have
>>>>>> a performance impact, though.
>>>>>>
>>>>>
>>>>> I can't speak for ARM, but I think that all of the read barriers are
>>>>> essentially free on x86.  (smp_mb is a very different story, but that
>>>>> shouldn't be needed here.)
>>>>
>>>> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
>>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html
>>>>
>>>> If I skip the rmb in the secure_computing call before checking mode,
>>>> it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs
>>>> mode and filter. This seems unlikely to me, given an addition of the
>>>> smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I
>>>> don't have a sense of how aggressively ARM might do data caching in
>>>> this area. Could the other thread actually see TIF_SECCOMP get set but
>>>> still have an out of date copy of seccomp.mode?
>>>>
>>>> I really want to avoid adding anything to the secure_computing()
>>>> execution path. :(
>>>
>>> Hence my suggestion to make the ordering not matter.  No ordering
>>> requirement, no barriers.
>>
>> I may be misunderstanding something, but I think there's still an
>> ordering problem. We'll have TIF_SECCOMP already, so if we enter
>> secure_computing with a NULL filter, we'll kill the process.
>>
>> Merging .mode and .filter would remove one of the race failure paths:
>> having TIF_SECCOMP and not having a mode set (leading to BUG). With
>> the merge, we could still race and land in the same place as have
>> TIF_SECCOMP and mode==2, but filter==NULL, leading to WARN and kill.
>
> You could just make secure_computing do nothing if filter == NULL.
> It's probably faster to test that than TIF_SECCOMP anyway, since you
> need to read the filter cacheline regardless, and testing a regular
> variable for non-NULLness might be faster than an atomic bit test
> operation.  (Or may not -- I don't know.)

I am uncomfortable about making filter == NULL be a "fail open"
condition if TIF_SECCOMP is set.

>> I guess the question is how large is the race risk on ARM? Is it
>> possible to have TIF_SECCOMP that far out of sync for the thread?
>
> Dunno.  I don't like leaving crashy known races around.

Yeah, me too. Hrmpf. I will do some rmb() timing tests...

-Kees
Andy Lutomirski June 27, 2014, 7:11 p.m. UTC | #17
On Fri, Jun 27, 2014 at 12:04 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Jun 27, 2014 at 11:56 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Jun 27, 2014 at 11:52 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Fri, Jun 27, 2014 at 11:39 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>> On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>>>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>>>>>> On 06/25, Andy Lutomirski wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>>>>>>> > On 06/25, Andy Lutomirski wrote:
>>>>>>>>> >>
>>>>>>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay),
>>>>>>>>> >> then set the bit.
>>>>>>>>> >
>>>>>>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing().
>>>>>>>>> >
>>>>>>>>> > But I still can't understand the rest of your discussion about the
>>>>>>>>> > ordering we need ;)
>>>>>>>>>
>>>>>>>>> Let me try again from scratch.
>>>>>>>>>
>>>>>>>>> Currently there are three relevant variables: TIF_SECCOMP,
>>>>>>>>> seccomp.mode, and seccomp.filter.  __secure_computing needs
>>>>>>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but
>>>>>>>>> doesn't really need) TIF_SECCOMP to be in sync as well.
>>>>>>>>>
>>>>>>>>> My suggestion is to rearrange it a bit.  Move mode into seccomp.filter
>>>>>>>>> (so that filter == NULL implies no seccomp) and don't check
>>>>>>>
>>>>>>> This would require that we reimplement mode 1 seccomp via mode 2
>>>>>>> filters. Which isn't too hard, but may add complexity.
>>>>>>>
>>>>>>>>> TIF_SECCOMP in secure_computing.  Then turning on seccomp is entirely
>>>>>>>>> atomic except for the fact that the seccomp hooks won't be called if
>>>>>>>>> filter != NULL but !TIF_SECCOMP.  This removes all ordering
>>>>>>>>> requirements.
>>>>>>>>
>>>>>>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like
>>>>>>>> unnecessary complication at first glance.
>>>>>>>>
>>>>>>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care
>>>>>>>> about the case when this bit is actually set, so that we can race with
>>>>>>>> the 1st call of __secure_computing().
>>>>>>>>
>>>>>>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't
>>>>>>>> be changed it is already nonzero.
>>>>>>>>
>>>>>>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter).
>>>>>>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but
>>>>>>>>> that's straightforward.
>>>>>>>>
>>>>>>>> Yep. And this is how seccomp_assign_mode() already works? It is called
>>>>>>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP)
>>>>>>>> just it lacks a barrier.
>>>>>>>
>>>>>>> Right, I think the best solution is to add the barrier. I was
>>>>>>> concerned that adding the read barrier in secure_computing would have
>>>>>>> a performance impact, though.
>>>>>>>
>>>>>>
>>>>>> I can't speak for ARM, but I think that all of the read barriers are
>>>>>> essentially free on x86.  (smp_mb is a very different story, but that
>>>>>> shouldn't be needed here.)
>>>>>
>>>>> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
>>>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html
>>>>>
>>>>> If I skip the rmb in the secure_computing call before checking mode,
>>>>> it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs
>>>>> mode and filter. This seems unlikely to me, given an addition of the
>>>>> smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I
>>>>> don't have a sense of how aggressively ARM might do data caching in
>>>>> this area. Could the other thread actually see TIF_SECCOMP get set but
>>>>> still have an out of date copy of seccomp.mode?
>>>>>
>>>>> I really want to avoid adding anything to the secure_computing()
>>>>> execution path. :(
>>>>
>>>> Hence my suggestion to make the ordering not matter.  No ordering
>>>> requirement, no barriers.
>>>
>>> I may be misunderstanding something, but I think there's still an
>>> ordering problem. We'll have TIF_SECCOMP already, so if we enter
>>> secure_computing with a NULL filter, we'll kill the process.
>>>
>>> Merging .mode and .filter would remove one of the race failure paths:
>>> having TIF_SECCOMP and not having a mode set (leading to BUG). With
>>> the merge, we could still race and land in the same place as have
>>> TIF_SECCOMP and mode==2, but filter==NULL, leading to WARN and kill.
>>
>> You could just make secure_computing do nothing if filter == NULL.
>> It's probably faster to test that than TIF_SECCOMP anyway, since you
>> need to read the filter cacheline regardless, and testing a regular
>> variable for non-NULLness might be faster than an atomic bit test
>> operation.  (Or may not -- I don't know.)
>
> I am uncomfortable about making filter == NULL be a "fail open"
> condition if TIF_SECCOMP is set.

I'm unconvinced here.  TIF_SECCOMP unset is already a fail-open
condition if filter is set.

--Andy
Oleg Nesterov June 27, 2014, 7:27 p.m. UTC | #18
On 06/27, Kees Cook wrote:
>
> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm
>
> ...
>
> I really want to avoid adding anything to the secure_computing()
> execution path. :(

I must have missed something but I do not understand your concerns.

__secure_computing() is not trivial, and we are going to execute the
filters. Do you really think rmb() can add the noticeable difference?

Not to mention that we can only get here if we take the slow syscall
enter path due to TIF_SECCOMP...

Oleg.
Andy Lutomirski June 27, 2014, 7:31 p.m. UTC | #19
On Fri, Jun 27, 2014 at 12:27 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/27, Kees Cook wrote:
>>
>> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm
>>
>> ...
>>
>> I really want to avoid adding anything to the secure_computing()
>> execution path. :(
>
> I must have missed something but I do not understand your concerns.
>
> __secure_computing() is not trivial, and we are going to execute the
> filters. Do you really think rmb() can add the noticeable difference?
>
> Not to mention that we can only get here if we take the slow syscall
> enter path due to TIF_SECCOMP...
>

On my box, with my fancy multi-phase seccomp patches, the total
seccomp overhead for a very short filter is about 13ns.  Adding a full
barrier would add several ns, I think.

Admittedly, this is x86, not ARM, so comparisons here are completely
bogus.  And that read memory barrier doesn't even need an instruction
on x86.  But still, let's try to keep this fast.

--Andy
Oleg Nesterov June 27, 2014, 7:55 p.m. UTC | #20
On 06/27, Andy Lutomirski wrote:
>
> On Fri, Jun 27, 2014 at 12:27 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 06/27, Kees Cook wrote:
> >>
> >> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm
> >>
> >> ...
> >>
> >> I really want to avoid adding anything to the secure_computing()
> >> execution path. :(
> >
> > I must have missed something but I do not understand your concerns.
> >
> > __secure_computing() is not trivial, and we are going to execute the
> > filters. Do you really think rmb() can add the noticeable difference?
> >
> > Not to mention that we can only get here if we take the slow syscall
> > enter path due to TIF_SECCOMP...
> >
>
> On my box, with my fancy multi-phase seccomp patches, the total
> seccomp overhead for a very short filter is about 13ns.  Adding a full
> barrier would add several ns, I think.

I am just curious, does this 13ns overhead include the penalty from the
slow syscall enter path triggered by TIF_SECCOMP ?

> Admittedly, this is x86, not ARM, so comparisons here are completely
> bogus.  And that read memory barrier doesn't even need an instruction
> on x86.  But still, let's try to keep this fast.

Well, I am not going to insist...

But imo it would be better to make it correct in a most simple way, then
we can optimize this code and see if there is a noticeable difference.

Not only we can change the ordering, we can remove the BUG_ON's and just
accept the fact the __secure_computing() can race with sys_seccomp() from
another thread.

If nothing else, it would be much simpler to discuss this patch if it comes
as a separate change.

Oleg.
Andy Lutomirski June 27, 2014, 8:08 p.m. UTC | #21
On Fri, Jun 27, 2014 at 12:55 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/27, Andy Lutomirski wrote:
>>
>> On Fri, Jun 27, 2014 at 12:27 PM, Oleg Nesterov <oleg@redhat.com> wrote:
>> > On 06/27, Kees Cook wrote:
>> >>
>> >> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
>> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm
>> >>
>> >> ...
>> >>
>> >> I really want to avoid adding anything to the secure_computing()
>> >> execution path. :(
>> >
>> > I must have missed something but I do not understand your concerns.
>> >
>> > __secure_computing() is not trivial, and we are going to execute the
>> > filters. Do you really think rmb() can add the noticeable difference?
>> >
>> > Not to mention that we can only get here if we take the slow syscall
>> > enter path due to TIF_SECCOMP...
>> >
>>
>> On my box, with my fancy multi-phase seccomp patches, the total
>> seccomp overhead for a very short filter is about 13ns.  Adding a full
>> barrier would add several ns, I think.
>
> I am just curious, does this 13ns overhead include the penalty from the
> slow syscall enter path triggered by TIF_SECCOMP ?

Yes, which is more or less the whole point of that patch series.  I
rewrote part of the TIF_SECCOMP-but-no-tracing case in assembly :)

I'm playing with rewriting it in C, but it's looking like it'll be a
bit more far-reaching than I was hoping.

--Andy
Kees Cook June 27, 2014, 8:56 p.m. UTC | #22
On Fri, Jun 27, 2014 at 12:55 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/27, Andy Lutomirski wrote:
>>
>> On Fri, Jun 27, 2014 at 12:27 PM, Oleg Nesterov <oleg@redhat.com> wrote:
>> > On 06/27, Kees Cook wrote:
>> >>
>> >> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive.
>> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm
>> >>
>> >> ...
>> >>
>> >> I really want to avoid adding anything to the secure_computing()
>> >> execution path. :(
>> >
>> > I must have missed something but I do not understand your concerns.
>> >
>> > __secure_computing() is not trivial, and we are going to execute the
>> > filters. Do you really think rmb() can add the noticeable difference?
>> >
>> > Not to mention that we can only get here if we take the slow syscall
>> > enter path due to TIF_SECCOMP...
>> >
>>
>> On my box, with my fancy multi-phase seccomp patches, the total
>> seccomp overhead for a very short filter is about 13ns.  Adding a full
>> barrier would add several ns, I think.
>
> I am just curious, does this 13ns overhead include the penalty from the
> slow syscall enter path triggered by TIF_SECCOMP ?
>
>> Admittedly, this is x86, not ARM, so comparisons here are completely
>> bogus.  And that read memory barrier doesn't even need an instruction
>> on x86.  But still, let's try to keep this fast.
>
> Well, I am not going to insist...
>
> But imo it would be better to make it correct in a most simple way, then
> we can optimize this code and see if there is a noticeable difference.
>
> Not only we can change the ordering, we can remove the BUG_ON's and just
> accept the fact the __secure_computing() can race with sys_seccomp() from
> another thread.
>
> If nothing else, it would be much simpler to discuss this patch if it comes
> as a separate change.

Yeah, the way I want to go is to add the rmb() for now (it gets us
"correctness"), and then later deal with any potential performance
problems on ARM at a later time. (At present, I'm unable to perform
any timings on ARM -- maybe next week.)

I will send the v9 series in a moment here...

-Kees
diff mbox

Patch

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index eb8248ab045e..c81000dd0a53 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -194,7 +194,29 @@  static u32 seccomp_run_filters(int syscall)
 	}
 	return ret;
 }
+#endif /* CONFIG_SECCOMP_FILTER */
 
+static inline bool seccomp_check_mode(struct task_struct *task,
+				      unsigned long seccomp_mode)
+{
+	BUG_ON(!spin_is_locked(&task->sighand->siglock));
+
+	if (task->seccomp.mode && task->seccomp.mode != seccomp_mode)
+		return false;
+
+	return true;
+}
+
+static inline void seccomp_assign_mode(struct task_struct *task,
+				       unsigned long seccomp_mode)
+{
+	BUG_ON(!spin_is_locked(&task->sighand->siglock));
+
+	task->seccomp.mode = seccomp_mode;
+	set_tsk_thread_flag(task, TIF_SECCOMP);
+}
+
+#ifdef CONFIG_SECCOMP_FILTER
 /**
  * seccomp_prepare_filter: Prepares a seccomp filter for use.
  * @fprog: BPF program to install
@@ -501,67 +523,83 @@  long prctl_get_seccomp(void)
 }
 
 /**
- * seccomp_set_mode: internal function for setting seccomp mode
- * @seccomp_mode: requested mode to use
- * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
+ * seccomp_set_mode_strict: internal function for setting strict seccomp
  *
- * This function may be called repeatedly with a @seccomp_mode of
- * SECCOMP_MODE_FILTER to install additional filters.  Every filter
- * successfully installed will be evaluated (in reverse order) for each system
- * call the task makes.
+ * Once current->seccomp.mode is non-zero, it may not be changed.
+ *
+ * Returns 0 on success or -EINVAL on failure.
+ */
+static long seccomp_set_mode_strict(void)
+{
+	const unsigned long seccomp_mode = SECCOMP_MODE_STRICT;
+	unsigned long irqflags;
+	int ret = -EINVAL;
+
+	spin_lock_irqsave(&current->sighand->siglock, irqflags);
+
+	if (!seccomp_check_mode(current, seccomp_mode))
+		goto out;
+
+#ifdef TIF_NOTSC
+	disable_TSC();
+#endif
+	seccomp_assign_mode(current, seccomp_mode);
+	ret = 0;
+
+out:
+	spin_unlock_irqrestore(&current->sighand->siglock, irqflags);
+
+	return ret;
+}
+
+#ifdef CONFIG_SECCOMP_FILTER
+/**
+ * seccomp_set_mode_filter: internal function for setting seccomp filter
+ * @filter: struct sock_fprog containing filter
+ *
+ * This function may be called repeatedly to install additional filters.
+ * Every filter successfully installed will be evaluated (in reverse order)
+ * for each system call the task makes.
  *
  * Once current->seccomp.mode is non-zero, it may not be changed.
  *
  * Returns 0 on success or -EINVAL on failure.
  */
-static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter)
+static long seccomp_set_mode_filter(char __user *filter)
 {
+	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
 	struct seccomp_filter *prepared = NULL;
 	unsigned long irqflags;
 	long ret = -EINVAL;
 
-#ifdef CONFIG_SECCOMP_FILTER
-	/* Prepare the new filter outside of the seccomp lock. */
-	if (seccomp_mode == SECCOMP_MODE_FILTER) {
-		prepared = seccomp_prepare_user_filter(filter);
-		if (IS_ERR(prepared))
-			return PTR_ERR(prepared);
-	}
-#endif
+	/* Prepare the new filter outside of any locking. */
+	prepared = seccomp_prepare_user_filter(filter);
+	if (IS_ERR(prepared))
+		return PTR_ERR(prepared);
 
 	spin_lock_irqsave(&current->sighand->siglock, irqflags);
 
-	if (current->seccomp.mode &&
-	    current->seccomp.mode != seccomp_mode)
+	if (!seccomp_check_mode(current, seccomp_mode))
 		goto out;
 
-	switch (seccomp_mode) {
-	case SECCOMP_MODE_STRICT:
-		ret = 0;
-#ifdef TIF_NOTSC
-		disable_TSC();
-#endif
-		break;
-#ifdef CONFIG_SECCOMP_FILTER
-	case SECCOMP_MODE_FILTER:
-		ret = seccomp_attach_filter(prepared);
-		if (ret)
-			goto out;
-		/* Do not free the successfully attached filter. */
-		prepared = NULL;
-		break;
-#endif
-	default:
+	ret = seccomp_attach_filter(prepared);
+	if (ret)
 		goto out;
-	}
+	/* Do not free the successfully attached filter. */
+	prepared = NULL;
 
-	current->seccomp.mode = seccomp_mode;
-	set_thread_flag(TIF_SECCOMP);
+	seccomp_assign_mode(current, seccomp_mode);
 out:
 	spin_unlock_irqrestore(&current->sighand->siglock, irqflags);
 	seccomp_filter_free(prepared);
 	return ret;
 }
+#else
+static inline long seccomp_set_mode_filter(char __user *filter)
+{
+	return -EINVAL;
+}
+#endif
 
 /**
  * prctl_set_seccomp: configures current->seccomp.mode
@@ -572,5 +610,12 @@  out:
  */
 long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 {
-	return seccomp_set_mode(seccomp_mode, filter);
+	switch (seccomp_mode) {
+	case SECCOMP_MODE_STRICT:
+		return seccomp_set_mode_strict();
+	case SECCOMP_MODE_FILTER:
+		return seccomp_set_mode_filter(filter);
+	default:
+		return -EINVAL;
+	}
 }