Message ID | 1403642893-23107-6-git-send-email-keescook@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 06/24, Kees Cook wrote: > > +static inline void seccomp_assign_mode(struct task_struct *task, > + unsigned long seccomp_mode) > +{ > + BUG_ON(!spin_is_locked(&task->sighand->siglock)); > + > + task->seccomp.mode = seccomp_mode; > + set_tsk_thread_flag(task, TIF_SECCOMP); > +} OK, but unless task == current this can race with secure_computing(). I think this needs smp_mb__before_atomic() and secure_computing() needs rmb() after test_bit(TIF_SECCOMP). Otherwise, can't __secure_computing() hit BUG() if it sees the old mode == SECCOMP_MODE_DISABLED ? Or seccomp_run_filters() can see ->filters == NULL and WARN(), smp_load_acquire() only serializes that LOAD with the subsequent memory operations. Oleg.
On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: > On 06/24, Kees Cook wrote: >> >> +static inline void seccomp_assign_mode(struct task_struct *task, >> + unsigned long seccomp_mode) >> +{ >> + BUG_ON(!spin_is_locked(&task->sighand->siglock)); >> + >> + task->seccomp.mode = seccomp_mode; >> + set_tsk_thread_flag(task, TIF_SECCOMP); >> +} > > OK, but unless task == current this can race with secure_computing(). > I think this needs smp_mb__before_atomic() and secure_computing() needs > rmb() after test_bit(TIF_SECCOMP). > > Otherwise, can't __secure_computing() hit BUG() if it sees the old > mode == SECCOMP_MODE_DISABLED ? > > Or seccomp_run_filters() can see ->filters == NULL and WARN(), > smp_load_acquire() only serializes that LOAD with the subsequent memory > operations. Hm, actually, now I'm worried about smp_load_acquire() being too slow in run_filters(). The ordering must be: - task->seccomp.filter must be valid before - task->seccomp.mode is set, which must be valid before - TIF_SECCOMP is set But I don't want to impact secure_computing(). What's the best way to make sure this ordering is respected? -Kees
On Wed, Jun 25, 2014 at 7:51 AM, Kees Cook <keescook@chromium.org> wrote: > On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >> On 06/24, Kees Cook wrote: >>> >>> +static inline void seccomp_assign_mode(struct task_struct *task, >>> + unsigned long seccomp_mode) >>> +{ >>> + BUG_ON(!spin_is_locked(&task->sighand->siglock)); >>> + >>> + task->seccomp.mode = seccomp_mode; >>> + set_tsk_thread_flag(task, TIF_SECCOMP); >>> +} >> >> OK, but unless task == current this can race with secure_computing(). >> I think this needs smp_mb__before_atomic() and secure_computing() needs >> rmb() after test_bit(TIF_SECCOMP). >> >> Otherwise, can't __secure_computing() hit BUG() if it sees the old >> mode == SECCOMP_MODE_DISABLED ? >> >> Or seccomp_run_filters() can see ->filters == NULL and WARN(), >> smp_load_acquire() only serializes that LOAD with the subsequent memory >> operations. > > Hm, actually, now I'm worried about smp_load_acquire() being too slow > in run_filters(). > > The ordering must be: > - task->seccomp.filter must be valid before > - task->seccomp.mode is set, which must be valid before > - TIF_SECCOMP is set > > But I don't want to impact secure_computing(). What's the best way to > make sure this ordering is respected? Remove the ordering requirement, perhaps? What if you moved mode into seccomp.filter? Then there would be little reason to check TIF_SECCOMP from secure_computing; instead, you could smp_load_acquire (or read_barrier_depends, maybe) seccomp.filter from secure_computing and pass the result as a parameter to __secure_computing. Or you could even remove the distinction between secure_computing and __secure_computing -- it's essentially useless anyway to split entry hook approaches like my x86 fastpath prototype. --Andy
On Wed, Jun 25, 2014 at 9:10 AM, Andy Lutomirski <luto@amacapital.net> wrote: > On Wed, Jun 25, 2014 at 7:51 AM, Kees Cook <keescook@chromium.org> wrote: >> On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>> On 06/24, Kees Cook wrote: >>>> >>>> +static inline void seccomp_assign_mode(struct task_struct *task, >>>> + unsigned long seccomp_mode) >>>> +{ >>>> + BUG_ON(!spin_is_locked(&task->sighand->siglock)); >>>> + >>>> + task->seccomp.mode = seccomp_mode; >>>> + set_tsk_thread_flag(task, TIF_SECCOMP); >>>> +} >>> >>> OK, but unless task == current this can race with secure_computing(). >>> I think this needs smp_mb__before_atomic() and secure_computing() needs >>> rmb() after test_bit(TIF_SECCOMP). >>> >>> Otherwise, can't __secure_computing() hit BUG() if it sees the old >>> mode == SECCOMP_MODE_DISABLED ? >>> >>> Or seccomp_run_filters() can see ->filters == NULL and WARN(), >>> smp_load_acquire() only serializes that LOAD with the subsequent memory >>> operations. >> >> Hm, actually, now I'm worried about smp_load_acquire() being too slow >> in run_filters(). >> >> The ordering must be: >> - task->seccomp.filter must be valid before >> - task->seccomp.mode is set, which must be valid before >> - TIF_SECCOMP is set >> >> But I don't want to impact secure_computing(). What's the best way to >> make sure this ordering is respected? > > Remove the ordering requirement, perhaps? > > What if you moved mode into seccomp.filter? Then there would be > little reason to check TIF_SECCOMP from secure_computing; instead, you > could smp_load_acquire (or read_barrier_depends, maybe) seccomp.filter > from secure_computing and pass the result as a parameter to > __secure_computing. Or you could even remove the distinction between > secure_computing and __secure_computing -- it's essentially useless > anyway to split entry hook approaches like my x86 fastpath prototype. The TIF_SECCOMP is needed for the syscall entry path. The check in secure_computing() is just because the "I am being traced" trigger includes a call to secure_computing, which filters out tracing reasons. Your fast path work would clean a lot of that up, as you say. But it still doesn't change the ordering check here. TIF_SECCOMP indicates seccomp.mode must be checked, so that ordering will remain, and if mode == FILTER, seccomp.filter must be valid. Isn't there a way we can force the assignment ordering in seccomp_assign_mode()? -Kees
On 06/25, Kees Cook wrote: > > On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > On 06/24, Kees Cook wrote: > >> > >> +static inline void seccomp_assign_mode(struct task_struct *task, > >> + unsigned long seccomp_mode) > >> +{ > >> + BUG_ON(!spin_is_locked(&task->sighand->siglock)); > >> + > >> + task->seccomp.mode = seccomp_mode; > >> + set_tsk_thread_flag(task, TIF_SECCOMP); > >> +} > > > > OK, but unless task == current this can race with secure_computing(). > > I think this needs smp_mb__before_atomic() and secure_computing() needs > > rmb() after test_bit(TIF_SECCOMP). > > > > Otherwise, can't __secure_computing() hit BUG() if it sees the old > > mode == SECCOMP_MODE_DISABLED ? > > > > Or seccomp_run_filters() can see ->filters == NULL and WARN(), > > smp_load_acquire() only serializes that LOAD with the subsequent memory > > operations. > > Hm, actually, now I'm worried about smp_load_acquire() being too slow > in run_filters(). > > The ordering must be: > - task->seccomp.filter must be valid before > - task->seccomp.mode is set, which must be valid before > - TIF_SECCOMP is set > > But I don't want to impact secure_computing(). What's the best way to > make sure this ordering is respected? Cough, confused... can't understand even after I read the email from Andy. We do not care if __secure_computing() misses the recently added filter, this can happen anyway, whatever we do. seccomp.mode is frozen after we set it != SECCOMP_MODE_DISABLED. So we should only worry if set_tsk_thread_flag(TIF_SECCOMP) actually changes this bit and makes __secure_computing() possible. If we add smp_mb__before_atomic() into seccomp_assign_mode() and rmb() at the start of __secure_computing() everything should be fine? Oleg.
On Wed, Jun 25, 2014 at 9:54 AM, Kees Cook <keescook@chromium.org> wrote: > On Wed, Jun 25, 2014 at 9:10 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> On Wed, Jun 25, 2014 at 7:51 AM, Kees Cook <keescook@chromium.org> wrote: >>> On Wed, Jun 25, 2014 at 6:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>> On 06/24, Kees Cook wrote: >>>>> >>>>> +static inline void seccomp_assign_mode(struct task_struct *task, >>>>> + unsigned long seccomp_mode) >>>>> +{ >>>>> + BUG_ON(!spin_is_locked(&task->sighand->siglock)); >>>>> + >>>>> + task->seccomp.mode = seccomp_mode; >>>>> + set_tsk_thread_flag(task, TIF_SECCOMP); >>>>> +} >>>> >>>> OK, but unless task == current this can race with secure_computing(). >>>> I think this needs smp_mb__before_atomic() and secure_computing() needs >>>> rmb() after test_bit(TIF_SECCOMP). >>>> >>>> Otherwise, can't __secure_computing() hit BUG() if it sees the old >>>> mode == SECCOMP_MODE_DISABLED ? >>>> >>>> Or seccomp_run_filters() can see ->filters == NULL and WARN(), >>>> smp_load_acquire() only serializes that LOAD with the subsequent memory >>>> operations. >>> >>> Hm, actually, now I'm worried about smp_load_acquire() being too slow >>> in run_filters(). >>> >>> The ordering must be: >>> - task->seccomp.filter must be valid before >>> - task->seccomp.mode is set, which must be valid before >>> - TIF_SECCOMP is set >>> >>> But I don't want to impact secure_computing(). What's the best way to >>> make sure this ordering is respected? >> >> Remove the ordering requirement, perhaps? >> >> What if you moved mode into seccomp.filter? Then there would be >> little reason to check TIF_SECCOMP from secure_computing; instead, you >> could smp_load_acquire (or read_barrier_depends, maybe) seccomp.filter >> from secure_computing and pass the result as a parameter to >> __secure_computing. Or you could even remove the distinction between >> secure_computing and __secure_computing -- it's essentially useless >> anyway to split entry hook approaches like my x86 fastpath prototype. > > The TIF_SECCOMP is needed for the syscall entry path. The check in > secure_computing() is just because the "I am being traced" trigger > includes a call to secure_computing, which filters out tracing > reasons. Right. I'm suggesting just checking a single indication that seccomp is on from the process in the seccomp code so that the order doesn't matter. IOW, TIF_SECCOMP causes __secure_computing to be invoked, but the race only seems to matter because of the warning and the BUG. I think that both can be fixed if you merge mode into filter so that __secure_computing atomically checks one condition. > > Your fast path work would clean a lot of that up, as you say. But it > still doesn't change the ordering check here. TIF_SECCOMP indicates > seccomp.mode must be checked, so that ordering will remain, and if > mode == FILTER, seccomp.filter must be valid. > > Isn't there a way we can force the assignment ordering in seccomp_assign_mode()? Write the filter, then smp_mb (or maybe a weaker barrier is okay), then set the bit. --Andy > > -Kees > > -- > Kees Cook > Chrome OS Security
On 06/25, Andy Lutomirski wrote: > > Write the filter, then smp_mb (or maybe a weaker barrier is okay), > then set the bit. Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). But I still can't understand the rest of your discussion about the ordering we need ;) Oleg.
On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: > On 06/25, Andy Lutomirski wrote: >> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), >> then set the bit. > > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). > > But I still can't understand the rest of your discussion about the > ordering we need ;) Let me try again from scratch. Currently there are three relevant variables: TIF_SECCOMP, seccomp.mode, and seccomp.filter. __secure_computing needs seccomp.mode and seccomp.filter to be in sync, and it wants (but doesn't really need) TIF_SECCOMP to be in sync as well. My suggestion is to rearrange it a bit. Move mode into seccomp.filter (so that filter == NULL implies no seccomp) and don't check TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely atomic except for the fact that the seccomp hooks won't be called if filter != NULL but !TIF_SECCOMP. This removes all ordering requirements. Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). In that case, filter needs to be set before TIF_SECCOMP is set, but that's straightforward. --Andy
On 06/25, Andy Lutomirski wrote: > > On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > On 06/25, Andy Lutomirski wrote: > >> > >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), > >> then set the bit. > > > > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). > > > > But I still can't understand the rest of your discussion about the > > ordering we need ;) > > Let me try again from scratch. > > Currently there are three relevant variables: TIF_SECCOMP, > seccomp.mode, and seccomp.filter. __secure_computing needs > seccomp.mode and seccomp.filter to be in sync, and it wants (but > doesn't really need) TIF_SECCOMP to be in sync as well. > > My suggestion is to rearrange it a bit. Move mode into seccomp.filter > (so that filter == NULL implies no seccomp) and don't check > TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely > atomic except for the fact that the seccomp hooks won't be called if > filter != NULL but !TIF_SECCOMP. This removes all ordering > requirements. Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like unnecessary complication at first glance. We alredy have TIF_SECCOMP, we need it anyway, and we should only care about the case when this bit is actually set, so that we can race with the 1st call of __secure_computing(). Otherwise we are fine: we can miss the new filter anyway, ->mode can't be changed it is already nonzero. > Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). > In that case, filter needs to be set before TIF_SECCOMP is set, but > that's straightforward. Yep. And this is how seccomp_assign_mode() already works? It is called after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP) just it lacks a barrier. Oleg.
On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: > On 06/25, Andy Lutomirski wrote: >> >> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: >> > On 06/25, Andy Lutomirski wrote: >> >> >> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), >> >> then set the bit. >> > >> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). >> > >> > But I still can't understand the rest of your discussion about the >> > ordering we need ;) >> >> Let me try again from scratch. >> >> Currently there are three relevant variables: TIF_SECCOMP, >> seccomp.mode, and seccomp.filter. __secure_computing needs >> seccomp.mode and seccomp.filter to be in sync, and it wants (but >> doesn't really need) TIF_SECCOMP to be in sync as well. >> >> My suggestion is to rearrange it a bit. Move mode into seccomp.filter >> (so that filter == NULL implies no seccomp) and don't check This would require that we reimplement mode 1 seccomp via mode 2 filters. Which isn't too hard, but may add complexity. >> TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely >> atomic except for the fact that the seccomp hooks won't be called if >> filter != NULL but !TIF_SECCOMP. This removes all ordering >> requirements. > > Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like > unnecessary complication at first glance. > > We alredy have TIF_SECCOMP, we need it anyway, and we should only care > about the case when this bit is actually set, so that we can race with > the 1st call of __secure_computing(). > > Otherwise we are fine: we can miss the new filter anyway, ->mode can't > be changed it is already nonzero. > >> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). >> In that case, filter needs to be set before TIF_SECCOMP is set, but >> that's straightforward. > > Yep. And this is how seccomp_assign_mode() already works? It is called > after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP) > just it lacks a barrier. Right, I think the best solution is to add the barrier. I was concerned that adding the read barrier in secure_computing would have a performance impact, though.
On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote: > On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >> On 06/25, Andy Lutomirski wrote: >>> >>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>> > On 06/25, Andy Lutomirski wrote: >>> >> >>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), >>> >> then set the bit. >>> > >>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). >>> > >>> > But I still can't understand the rest of your discussion about the >>> > ordering we need ;) >>> >>> Let me try again from scratch. >>> >>> Currently there are three relevant variables: TIF_SECCOMP, >>> seccomp.mode, and seccomp.filter. __secure_computing needs >>> seccomp.mode and seccomp.filter to be in sync, and it wants (but >>> doesn't really need) TIF_SECCOMP to be in sync as well. >>> >>> My suggestion is to rearrange it a bit. Move mode into seccomp.filter >>> (so that filter == NULL implies no seccomp) and don't check > > This would require that we reimplement mode 1 seccomp via mode 2 > filters. Which isn't too hard, but may add complexity. > >>> TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely >>> atomic except for the fact that the seccomp hooks won't be called if >>> filter != NULL but !TIF_SECCOMP. This removes all ordering >>> requirements. >> >> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like >> unnecessary complication at first glance. >> >> We alredy have TIF_SECCOMP, we need it anyway, and we should only care >> about the case when this bit is actually set, so that we can race with >> the 1st call of __secure_computing(). >> >> Otherwise we are fine: we can miss the new filter anyway, ->mode can't >> be changed it is already nonzero. >> >>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). >>> In that case, filter needs to be set before TIF_SECCOMP is set, but >>> that's straightforward. >> >> Yep. And this is how seccomp_assign_mode() already works? It is called >> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP) >> just it lacks a barrier. > > Right, I think the best solution is to add the barrier. I was > concerned that adding the read barrier in secure_computing would have > a performance impact, though. > I can't speak for ARM, but I think that all of the read barriers are essentially free on x86. (smp_mb is a very different story, but that shouldn't be needed here.) --Andy
On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote: > On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote: >> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>> On 06/25, Andy Lutomirski wrote: >>>> >>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>> > On 06/25, Andy Lutomirski wrote: >>>> >> >>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), >>>> >> then set the bit. >>>> > >>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). >>>> > >>>> > But I still can't understand the rest of your discussion about the >>>> > ordering we need ;) >>>> >>>> Let me try again from scratch. >>>> >>>> Currently there are three relevant variables: TIF_SECCOMP, >>>> seccomp.mode, and seccomp.filter. __secure_computing needs >>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but >>>> doesn't really need) TIF_SECCOMP to be in sync as well. >>>> >>>> My suggestion is to rearrange it a bit. Move mode into seccomp.filter >>>> (so that filter == NULL implies no seccomp) and don't check >> >> This would require that we reimplement mode 1 seccomp via mode 2 >> filters. Which isn't too hard, but may add complexity. >> >>>> TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely >>>> atomic except for the fact that the seccomp hooks won't be called if >>>> filter != NULL but !TIF_SECCOMP. This removes all ordering >>>> requirements. >>> >>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like >>> unnecessary complication at first glance. >>> >>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care >>> about the case when this bit is actually set, so that we can race with >>> the 1st call of __secure_computing(). >>> >>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't >>> be changed it is already nonzero. >>> >>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). >>>> In that case, filter needs to be set before TIF_SECCOMP is set, but >>>> that's straightforward. >>> >>> Yep. And this is how seccomp_assign_mode() already works? It is called >>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP) >>> just it lacks a barrier. >> >> Right, I think the best solution is to add the barrier. I was >> concerned that adding the read barrier in secure_computing would have >> a performance impact, though. >> > > I can't speak for ARM, but I think that all of the read barriers are > essentially free on x86. (smp_mb is a very different story, but that > shouldn't be needed here.) It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html If I skip the rmb in the secure_computing call before checking mode, it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs mode and filter. This seems unlikely to me, given an addition of the smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I don't have a sense of how aggressively ARM might do data caching in this area. Could the other thread actually see TIF_SECCOMP get set but still have an out of date copy of seccomp.mode? I really want to avoid adding anything to the secure_computing() execution path. :( -Kees
On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote: > On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote: >>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>> On 06/25, Andy Lutomirski wrote: >>>>> >>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>>> > On 06/25, Andy Lutomirski wrote: >>>>> >> >>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), >>>>> >> then set the bit. >>>>> > >>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). >>>>> > >>>>> > But I still can't understand the rest of your discussion about the >>>>> > ordering we need ;) >>>>> >>>>> Let me try again from scratch. >>>>> >>>>> Currently there are three relevant variables: TIF_SECCOMP, >>>>> seccomp.mode, and seccomp.filter. __secure_computing needs >>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but >>>>> doesn't really need) TIF_SECCOMP to be in sync as well. >>>>> >>>>> My suggestion is to rearrange it a bit. Move mode into seccomp.filter >>>>> (so that filter == NULL implies no seccomp) and don't check >>> >>> This would require that we reimplement mode 1 seccomp via mode 2 >>> filters. Which isn't too hard, but may add complexity. >>> >>>>> TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely >>>>> atomic except for the fact that the seccomp hooks won't be called if >>>>> filter != NULL but !TIF_SECCOMP. This removes all ordering >>>>> requirements. >>>> >>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like >>>> unnecessary complication at first glance. >>>> >>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care >>>> about the case when this bit is actually set, so that we can race with >>>> the 1st call of __secure_computing(). >>>> >>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't >>>> be changed it is already nonzero. >>>> >>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). >>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but >>>>> that's straightforward. >>>> >>>> Yep. And this is how seccomp_assign_mode() already works? It is called >>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP) >>>> just it lacks a barrier. >>> >>> Right, I think the best solution is to add the barrier. I was >>> concerned that adding the read barrier in secure_computing would have >>> a performance impact, though. >>> >> >> I can't speak for ARM, but I think that all of the read barriers are >> essentially free on x86. (smp_mb is a very different story, but that >> shouldn't be needed here.) > > It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html > > If I skip the rmb in the secure_computing call before checking mode, > it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs > mode and filter. This seems unlikely to me, given an addition of the > smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I > don't have a sense of how aggressively ARM might do data caching in > this area. Could the other thread actually see TIF_SECCOMP get set but > still have an out of date copy of seccomp.mode? > > I really want to avoid adding anything to the secure_computing() > execution path. :( Hence my suggestion to make the ordering not matter. No ordering requirement, no barriers. --Andy > > -Kees > > -- > Kees Cook > Chrome OS Security
On Fri, Jun 27, 2014 at 11:39 AM, Andy Lutomirski <luto@amacapital.net> wrote: > On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote: >> On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote: >>>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>>> On 06/25, Andy Lutomirski wrote: >>>>>> >>>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>>>> > On 06/25, Andy Lutomirski wrote: >>>>>> >> >>>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), >>>>>> >> then set the bit. >>>>>> > >>>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). >>>>>> > >>>>>> > But I still can't understand the rest of your discussion about the >>>>>> > ordering we need ;) >>>>>> >>>>>> Let me try again from scratch. >>>>>> >>>>>> Currently there are three relevant variables: TIF_SECCOMP, >>>>>> seccomp.mode, and seccomp.filter. __secure_computing needs >>>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but >>>>>> doesn't really need) TIF_SECCOMP to be in sync as well. >>>>>> >>>>>> My suggestion is to rearrange it a bit. Move mode into seccomp.filter >>>>>> (so that filter == NULL implies no seccomp) and don't check >>>> >>>> This would require that we reimplement mode 1 seccomp via mode 2 >>>> filters. Which isn't too hard, but may add complexity. >>>> >>>>>> TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely >>>>>> atomic except for the fact that the seccomp hooks won't be called if >>>>>> filter != NULL but !TIF_SECCOMP. This removes all ordering >>>>>> requirements. >>>>> >>>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like >>>>> unnecessary complication at first glance. >>>>> >>>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care >>>>> about the case when this bit is actually set, so that we can race with >>>>> the 1st call of __secure_computing(). >>>>> >>>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't >>>>> be changed it is already nonzero. >>>>> >>>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). >>>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but >>>>>> that's straightforward. >>>>> >>>>> Yep. And this is how seccomp_assign_mode() already works? It is called >>>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP) >>>>> just it lacks a barrier. >>>> >>>> Right, I think the best solution is to add the barrier. I was >>>> concerned that adding the read barrier in secure_computing would have >>>> a performance impact, though. >>>> >>> >>> I can't speak for ARM, but I think that all of the read barriers are >>> essentially free on x86. (smp_mb is a very different story, but that >>> shouldn't be needed here.) >> >> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html >> >> If I skip the rmb in the secure_computing call before checking mode, >> it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs >> mode and filter. This seems unlikely to me, given an addition of the >> smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I >> don't have a sense of how aggressively ARM might do data caching in >> this area. Could the other thread actually see TIF_SECCOMP get set but >> still have an out of date copy of seccomp.mode? >> >> I really want to avoid adding anything to the secure_computing() >> execution path. :( > > Hence my suggestion to make the ordering not matter. No ordering > requirement, no barriers. I may be misunderstanding something, but I think there's still an ordering problem. We'll have TIF_SECCOMP already, so if we enter secure_computing with a NULL filter, we'll kill the process. Merging .mode and .filter would remove one of the race failure paths: having TIF_SECCOMP and not having a mode set (leading to BUG). With the merge, we could still race and land in the same place as have TIF_SECCOMP and mode==2, but filter==NULL, leading to WARN and kill. I guess the question is how large is the race risk on ARM? Is it possible to have TIF_SECCOMP that far out of sync for the thread? -Kees
On Fri, Jun 27, 2014 at 11:52 AM, Kees Cook <keescook@chromium.org> wrote: > On Fri, Jun 27, 2014 at 11:39 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote: >>> On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>>> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote: >>>>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>>>> On 06/25, Andy Lutomirski wrote: >>>>>>> >>>>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>>>>> > On 06/25, Andy Lutomirski wrote: >>>>>>> >> >>>>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), >>>>>>> >> then set the bit. >>>>>>> > >>>>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). >>>>>>> > >>>>>>> > But I still can't understand the rest of your discussion about the >>>>>>> > ordering we need ;) >>>>>>> >>>>>>> Let me try again from scratch. >>>>>>> >>>>>>> Currently there are three relevant variables: TIF_SECCOMP, >>>>>>> seccomp.mode, and seccomp.filter. __secure_computing needs >>>>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but >>>>>>> doesn't really need) TIF_SECCOMP to be in sync as well. >>>>>>> >>>>>>> My suggestion is to rearrange it a bit. Move mode into seccomp.filter >>>>>>> (so that filter == NULL implies no seccomp) and don't check >>>>> >>>>> This would require that we reimplement mode 1 seccomp via mode 2 >>>>> filters. Which isn't too hard, but may add complexity. >>>>> >>>>>>> TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely >>>>>>> atomic except for the fact that the seccomp hooks won't be called if >>>>>>> filter != NULL but !TIF_SECCOMP. This removes all ordering >>>>>>> requirements. >>>>>> >>>>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like >>>>>> unnecessary complication at first glance. >>>>>> >>>>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care >>>>>> about the case when this bit is actually set, so that we can race with >>>>>> the 1st call of __secure_computing(). >>>>>> >>>>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't >>>>>> be changed it is already nonzero. >>>>>> >>>>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). >>>>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but >>>>>>> that's straightforward. >>>>>> >>>>>> Yep. And this is how seccomp_assign_mode() already works? It is called >>>>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP) >>>>>> just it lacks a barrier. >>>>> >>>>> Right, I think the best solution is to add the barrier. I was >>>>> concerned that adding the read barrier in secure_computing would have >>>>> a performance impact, though. >>>>> >>>> >>>> I can't speak for ARM, but I think that all of the read barriers are >>>> essentially free on x86. (smp_mb is a very different story, but that >>>> shouldn't be needed here.) >>> >>> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. >>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html >>> >>> If I skip the rmb in the secure_computing call before checking mode, >>> it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs >>> mode and filter. This seems unlikely to me, given an addition of the >>> smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I >>> don't have a sense of how aggressively ARM might do data caching in >>> this area. Could the other thread actually see TIF_SECCOMP get set but >>> still have an out of date copy of seccomp.mode? >>> >>> I really want to avoid adding anything to the secure_computing() >>> execution path. :( >> >> Hence my suggestion to make the ordering not matter. No ordering >> requirement, no barriers. > > I may be misunderstanding something, but I think there's still an > ordering problem. We'll have TIF_SECCOMP already, so if we enter > secure_computing with a NULL filter, we'll kill the process. > > Merging .mode and .filter would remove one of the race failure paths: > having TIF_SECCOMP and not having a mode set (leading to BUG). With > the merge, we could still race and land in the same place as have > TIF_SECCOMP and mode==2, but filter==NULL, leading to WARN and kill. You could just make secure_computing do nothing if filter == NULL. It's probably faster to test that than TIF_SECCOMP anyway, since you need to read the filter cacheline regardless, and testing a regular variable for non-NULLness might be faster than an atomic bit test operation. (Or may not -- I don't know.) > > I guess the question is how large is the race risk on ARM? Is it > possible to have TIF_SECCOMP that far out of sync for the thread? Dunno. I don't like leaving crashy known races around. --Andy > > -Kees > > -- > Kees Cook > Chrome OS Security
On Fri, Jun 27, 2014 at 11:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: > On Fri, Jun 27, 2014 at 11:52 AM, Kees Cook <keescook@chromium.org> wrote: >> On Fri, Jun 27, 2014 at 11:39 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>> On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote: >>>> On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>>>> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote: >>>>>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>>>>> On 06/25, Andy Lutomirski wrote: >>>>>>>> >>>>>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>>>>>> > On 06/25, Andy Lutomirski wrote: >>>>>>>> >> >>>>>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), >>>>>>>> >> then set the bit. >>>>>>>> > >>>>>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). >>>>>>>> > >>>>>>>> > But I still can't understand the rest of your discussion about the >>>>>>>> > ordering we need ;) >>>>>>>> >>>>>>>> Let me try again from scratch. >>>>>>>> >>>>>>>> Currently there are three relevant variables: TIF_SECCOMP, >>>>>>>> seccomp.mode, and seccomp.filter. __secure_computing needs >>>>>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but >>>>>>>> doesn't really need) TIF_SECCOMP to be in sync as well. >>>>>>>> >>>>>>>> My suggestion is to rearrange it a bit. Move mode into seccomp.filter >>>>>>>> (so that filter == NULL implies no seccomp) and don't check >>>>>> >>>>>> This would require that we reimplement mode 1 seccomp via mode 2 >>>>>> filters. Which isn't too hard, but may add complexity. >>>>>> >>>>>>>> TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely >>>>>>>> atomic except for the fact that the seccomp hooks won't be called if >>>>>>>> filter != NULL but !TIF_SECCOMP. This removes all ordering >>>>>>>> requirements. >>>>>>> >>>>>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like >>>>>>> unnecessary complication at first glance. >>>>>>> >>>>>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care >>>>>>> about the case when this bit is actually set, so that we can race with >>>>>>> the 1st call of __secure_computing(). >>>>>>> >>>>>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't >>>>>>> be changed it is already nonzero. >>>>>>> >>>>>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). >>>>>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but >>>>>>>> that's straightforward. >>>>>>> >>>>>>> Yep. And this is how seccomp_assign_mode() already works? It is called >>>>>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP) >>>>>>> just it lacks a barrier. >>>>>> >>>>>> Right, I think the best solution is to add the barrier. I was >>>>>> concerned that adding the read barrier in secure_computing would have >>>>>> a performance impact, though. >>>>>> >>>>> >>>>> I can't speak for ARM, but I think that all of the read barriers are >>>>> essentially free on x86. (smp_mb is a very different story, but that >>>>> shouldn't be needed here.) >>>> >>>> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. >>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html >>>> >>>> If I skip the rmb in the secure_computing call before checking mode, >>>> it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs >>>> mode and filter. This seems unlikely to me, given an addition of the >>>> smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I >>>> don't have a sense of how aggressively ARM might do data caching in >>>> this area. Could the other thread actually see TIF_SECCOMP get set but >>>> still have an out of date copy of seccomp.mode? >>>> >>>> I really want to avoid adding anything to the secure_computing() >>>> execution path. :( >>> >>> Hence my suggestion to make the ordering not matter. No ordering >>> requirement, no barriers. >> >> I may be misunderstanding something, but I think there's still an >> ordering problem. We'll have TIF_SECCOMP already, so if we enter >> secure_computing with a NULL filter, we'll kill the process. >> >> Merging .mode and .filter would remove one of the race failure paths: >> having TIF_SECCOMP and not having a mode set (leading to BUG). With >> the merge, we could still race and land in the same place as have >> TIF_SECCOMP and mode==2, but filter==NULL, leading to WARN and kill. > > You could just make secure_computing do nothing if filter == NULL. > It's probably faster to test that than TIF_SECCOMP anyway, since you > need to read the filter cacheline regardless, and testing a regular > variable for non-NULLness might be faster than an atomic bit test > operation. (Or may not -- I don't know.) I am uncomfortable about making filter == NULL be a "fail open" condition if TIF_SECCOMP is set. >> I guess the question is how large is the race risk on ARM? Is it >> possible to have TIF_SECCOMP that far out of sync for the thread? > > Dunno. I don't like leaving crashy known races around. Yeah, me too. Hrmpf. I will do some rmb() timing tests... -Kees
On Fri, Jun 27, 2014 at 12:04 PM, Kees Cook <keescook@chromium.org> wrote: > On Fri, Jun 27, 2014 at 11:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> On Fri, Jun 27, 2014 at 11:52 AM, Kees Cook <keescook@chromium.org> wrote: >>> On Fri, Jun 27, 2014 at 11:39 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>>> On Fri, Jun 27, 2014 at 11:33 AM, Kees Cook <keescook@chromium.org> wrote: >>>>> On Wed, Jun 25, 2014 at 11:07 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>>>>> On Wed, Jun 25, 2014 at 11:00 AM, Kees Cook <keescook@chromium.org> wrote: >>>>>>> On Wed, Jun 25, 2014 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>>>>>> On 06/25, Andy Lutomirski wrote: >>>>>>>>> >>>>>>>>> On Wed, Jun 25, 2014 at 10:32 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>>>>>>>> > On 06/25, Andy Lutomirski wrote: >>>>>>>>> >> >>>>>>>>> >> Write the filter, then smp_mb (or maybe a weaker barrier is okay), >>>>>>>>> >> then set the bit. >>>>>>>>> > >>>>>>>>> > Yes, exactly, this is what I meant. Plas rmb() in __secure_computing(). >>>>>>>>> > >>>>>>>>> > But I still can't understand the rest of your discussion about the >>>>>>>>> > ordering we need ;) >>>>>>>>> >>>>>>>>> Let me try again from scratch. >>>>>>>>> >>>>>>>>> Currently there are three relevant variables: TIF_SECCOMP, >>>>>>>>> seccomp.mode, and seccomp.filter. __secure_computing needs >>>>>>>>> seccomp.mode and seccomp.filter to be in sync, and it wants (but >>>>>>>>> doesn't really need) TIF_SECCOMP to be in sync as well. >>>>>>>>> >>>>>>>>> My suggestion is to rearrange it a bit. Move mode into seccomp.filter >>>>>>>>> (so that filter == NULL implies no seccomp) and don't check >>>>>>> >>>>>>> This would require that we reimplement mode 1 seccomp via mode 2 >>>>>>> filters. Which isn't too hard, but may add complexity. >>>>>>> >>>>>>>>> TIF_SECCOMP in secure_computing. Then turning on seccomp is entirely >>>>>>>>> atomic except for the fact that the seccomp hooks won't be called if >>>>>>>>> filter != NULL but !TIF_SECCOMP. This removes all ordering >>>>>>>>> requirements. >>>>>>>> >>>>>>>> Ah, got it, thanks. Perhaps I missed somehing, but to me this looks like >>>>>>>> unnecessary complication at first glance. >>>>>>>> >>>>>>>> We alredy have TIF_SECCOMP, we need it anyway, and we should only care >>>>>>>> about the case when this bit is actually set, so that we can race with >>>>>>>> the 1st call of __secure_computing(). >>>>>>>> >>>>>>>> Otherwise we are fine: we can miss the new filter anyway, ->mode can't >>>>>>>> be changed it is already nonzero. >>>>>>>> >>>>>>>>> Alternatively, __secure_computing could still BUG_ON(!seccomp.filter). >>>>>>>>> In that case, filter needs to be set before TIF_SECCOMP is set, but >>>>>>>>> that's straightforward. >>>>>>>> >>>>>>>> Yep. And this is how seccomp_assign_mode() already works? It is called >>>>>>>> after we change ->filter chain, it changes ->mode before set(TIF_SECCOMP) >>>>>>>> just it lacks a barrier. >>>>>>> >>>>>>> Right, I think the best solution is to add the barrier. I was >>>>>>> concerned that adding the read barrier in secure_computing would have >>>>>>> a performance impact, though. >>>>>>> >>>>>> >>>>>> I can't speak for ARM, but I think that all of the read barriers are >>>>>> essentially free on x86. (smp_mb is a very different story, but that >>>>>> shouldn't be needed here.) >>>>> >>>>> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. >>>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.html >>>>> >>>>> If I skip the rmb in the secure_computing call before checking mode, >>>>> it sounds like I run the risk of racing an out-of-order TIF_SECCOMP vs >>>>> mode and filter. This seems unlikely to me, given an addition of the >>>>> smp_mb__before_atomic() during the seccomp_assign_mode()? I guess I >>>>> don't have a sense of how aggressively ARM might do data caching in >>>>> this area. Could the other thread actually see TIF_SECCOMP get set but >>>>> still have an out of date copy of seccomp.mode? >>>>> >>>>> I really want to avoid adding anything to the secure_computing() >>>>> execution path. :( >>>> >>>> Hence my suggestion to make the ordering not matter. No ordering >>>> requirement, no barriers. >>> >>> I may be misunderstanding something, but I think there's still an >>> ordering problem. We'll have TIF_SECCOMP already, so if we enter >>> secure_computing with a NULL filter, we'll kill the process. >>> >>> Merging .mode and .filter would remove one of the race failure paths: >>> having TIF_SECCOMP and not having a mode set (leading to BUG). With >>> the merge, we could still race and land in the same place as have >>> TIF_SECCOMP and mode==2, but filter==NULL, leading to WARN and kill. >> >> You could just make secure_computing do nothing if filter == NULL. >> It's probably faster to test that than TIF_SECCOMP anyway, since you >> need to read the filter cacheline regardless, and testing a regular >> variable for non-NULLness might be faster than an atomic bit test >> operation. (Or may not -- I don't know.) > > I am uncomfortable about making filter == NULL be a "fail open" > condition if TIF_SECCOMP is set. I'm unconvinced here. TIF_SECCOMP unset is already a fail-open condition if filter is set. --Andy
On 06/27, Kees Cook wrote: > > It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm > > ... > > I really want to avoid adding anything to the secure_computing() > execution path. :( I must have missed something but I do not understand your concerns. __secure_computing() is not trivial, and we are going to execute the filters. Do you really think rmb() can add the noticeable difference? Not to mention that we can only get here if we take the slow syscall enter path due to TIF_SECCOMP... Oleg.
On Fri, Jun 27, 2014 at 12:27 PM, Oleg Nesterov <oleg@redhat.com> wrote: > On 06/27, Kees Cook wrote: >> >> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm >> >> ... >> >> I really want to avoid adding anything to the secure_computing() >> execution path. :( > > I must have missed something but I do not understand your concerns. > > __secure_computing() is not trivial, and we are going to execute the > filters. Do you really think rmb() can add the noticeable difference? > > Not to mention that we can only get here if we take the slow syscall > enter path due to TIF_SECCOMP... > On my box, with my fancy multi-phase seccomp patches, the total seccomp overhead for a very short filter is about 13ns. Adding a full barrier would add several ns, I think. Admittedly, this is x86, not ARM, so comparisons here are completely bogus. And that read memory barrier doesn't even need an instruction on x86. But still, let's try to keep this fast. --Andy
On 06/27, Andy Lutomirski wrote: > > On Fri, Jun 27, 2014 at 12:27 PM, Oleg Nesterov <oleg@redhat.com> wrote: > > On 06/27, Kees Cook wrote: > >> > >> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. > >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm > >> > >> ... > >> > >> I really want to avoid adding anything to the secure_computing() > >> execution path. :( > > > > I must have missed something but I do not understand your concerns. > > > > __secure_computing() is not trivial, and we are going to execute the > > filters. Do you really think rmb() can add the noticeable difference? > > > > Not to mention that we can only get here if we take the slow syscall > > enter path due to TIF_SECCOMP... > > > > On my box, with my fancy multi-phase seccomp patches, the total > seccomp overhead for a very short filter is about 13ns. Adding a full > barrier would add several ns, I think. I am just curious, does this 13ns overhead include the penalty from the slow syscall enter path triggered by TIF_SECCOMP ? > Admittedly, this is x86, not ARM, so comparisons here are completely > bogus. And that read memory barrier doesn't even need an instruction > on x86. But still, let's try to keep this fast. Well, I am not going to insist... But imo it would be better to make it correct in a most simple way, then we can optimize this code and see if there is a noticeable difference. Not only we can change the ordering, we can remove the BUG_ON's and just accept the fact the __secure_computing() can race with sys_seccomp() from another thread. If nothing else, it would be much simpler to discuss this patch if it comes as a separate change. Oleg.
On Fri, Jun 27, 2014 at 12:55 PM, Oleg Nesterov <oleg@redhat.com> wrote: > On 06/27, Andy Lutomirski wrote: >> >> On Fri, Jun 27, 2014 at 12:27 PM, Oleg Nesterov <oleg@redhat.com> wrote: >> > On 06/27, Kees Cook wrote: >> >> >> >> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. >> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm >> >> >> >> ... >> >> >> >> I really want to avoid adding anything to the secure_computing() >> >> execution path. :( >> > >> > I must have missed something but I do not understand your concerns. >> > >> > __secure_computing() is not trivial, and we are going to execute the >> > filters. Do you really think rmb() can add the noticeable difference? >> > >> > Not to mention that we can only get here if we take the slow syscall >> > enter path due to TIF_SECCOMP... >> > >> >> On my box, with my fancy multi-phase seccomp patches, the total >> seccomp overhead for a very short filter is about 13ns. Adding a full >> barrier would add several ns, I think. > > I am just curious, does this 13ns overhead include the penalty from the > slow syscall enter path triggered by TIF_SECCOMP ? Yes, which is more or less the whole point of that patch series. I rewrote part of the TIF_SECCOMP-but-no-tracing case in assembly :) I'm playing with rewriting it in C, but it's looking like it'll be a bit more far-reaching than I was hoping. --Andy
On Fri, Jun 27, 2014 at 12:55 PM, Oleg Nesterov <oleg@redhat.com> wrote: > On 06/27, Andy Lutomirski wrote: >> >> On Fri, Jun 27, 2014 at 12:27 PM, Oleg Nesterov <oleg@redhat.com> wrote: >> > On 06/27, Kees Cook wrote: >> >> >> >> It looks like SMP ARM issues dsb for rmb, which seems a bit expensive. >> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/CIHJFGFE.htm >> >> >> >> ... >> >> >> >> I really want to avoid adding anything to the secure_computing() >> >> execution path. :( >> > >> > I must have missed something but I do not understand your concerns. >> > >> > __secure_computing() is not trivial, and we are going to execute the >> > filters. Do you really think rmb() can add the noticeable difference? >> > >> > Not to mention that we can only get here if we take the slow syscall >> > enter path due to TIF_SECCOMP... >> > >> >> On my box, with my fancy multi-phase seccomp patches, the total >> seccomp overhead for a very short filter is about 13ns. Adding a full >> barrier would add several ns, I think. > > I am just curious, does this 13ns overhead include the penalty from the > slow syscall enter path triggered by TIF_SECCOMP ? > >> Admittedly, this is x86, not ARM, so comparisons here are completely >> bogus. And that read memory barrier doesn't even need an instruction >> on x86. But still, let's try to keep this fast. > > Well, I am not going to insist... > > But imo it would be better to make it correct in a most simple way, then > we can optimize this code and see if there is a noticeable difference. > > Not only we can change the ordering, we can remove the BUG_ON's and just > accept the fact the __secure_computing() can race with sys_seccomp() from > another thread. > > If nothing else, it would be much simpler to discuss this patch if it comes > as a separate change. Yeah, the way I want to go is to add the rmb() for now (it gets us "correctness"), and then later deal with any potential performance problems on ARM at a later time. (At present, I'm unable to perform any timings on ARM -- maybe next week.) I will send the v9 series in a moment here... -Kees
diff --git a/kernel/seccomp.c b/kernel/seccomp.c index eb8248ab045e..c81000dd0a53 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -194,7 +194,29 @@ static u32 seccomp_run_filters(int syscall) } return ret; } +#endif /* CONFIG_SECCOMP_FILTER */ +static inline bool seccomp_check_mode(struct task_struct *task, + unsigned long seccomp_mode) +{ + BUG_ON(!spin_is_locked(&task->sighand->siglock)); + + if (task->seccomp.mode && task->seccomp.mode != seccomp_mode) + return false; + + return true; +} + +static inline void seccomp_assign_mode(struct task_struct *task, + unsigned long seccomp_mode) +{ + BUG_ON(!spin_is_locked(&task->sighand->siglock)); + + task->seccomp.mode = seccomp_mode; + set_tsk_thread_flag(task, TIF_SECCOMP); +} + +#ifdef CONFIG_SECCOMP_FILTER /** * seccomp_prepare_filter: Prepares a seccomp filter for use. * @fprog: BPF program to install @@ -501,67 +523,83 @@ long prctl_get_seccomp(void) } /** - * seccomp_set_mode: internal function for setting seccomp mode - * @seccomp_mode: requested mode to use - * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER + * seccomp_set_mode_strict: internal function for setting strict seccomp * - * This function may be called repeatedly with a @seccomp_mode of - * SECCOMP_MODE_FILTER to install additional filters. Every filter - * successfully installed will be evaluated (in reverse order) for each system - * call the task makes. + * Once current->seccomp.mode is non-zero, it may not be changed. + * + * Returns 0 on success or -EINVAL on failure. + */ +static long seccomp_set_mode_strict(void) +{ + const unsigned long seccomp_mode = SECCOMP_MODE_STRICT; + unsigned long irqflags; + int ret = -EINVAL; + + spin_lock_irqsave(¤t->sighand->siglock, irqflags); + + if (!seccomp_check_mode(current, seccomp_mode)) + goto out; + +#ifdef TIF_NOTSC + disable_TSC(); +#endif + seccomp_assign_mode(current, seccomp_mode); + ret = 0; + +out: + spin_unlock_irqrestore(¤t->sighand->siglock, irqflags); + + return ret; +} + +#ifdef CONFIG_SECCOMP_FILTER +/** + * seccomp_set_mode_filter: internal function for setting seccomp filter + * @filter: struct sock_fprog containing filter + * + * This function may be called repeatedly to install additional filters. + * Every filter successfully installed will be evaluated (in reverse order) + * for each system call the task makes. * * Once current->seccomp.mode is non-zero, it may not be changed. * * Returns 0 on success or -EINVAL on failure. */ -static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter) +static long seccomp_set_mode_filter(char __user *filter) { + const unsigned long seccomp_mode = SECCOMP_MODE_FILTER; struct seccomp_filter *prepared = NULL; unsigned long irqflags; long ret = -EINVAL; -#ifdef CONFIG_SECCOMP_FILTER - /* Prepare the new filter outside of the seccomp lock. */ - if (seccomp_mode == SECCOMP_MODE_FILTER) { - prepared = seccomp_prepare_user_filter(filter); - if (IS_ERR(prepared)) - return PTR_ERR(prepared); - } -#endif + /* Prepare the new filter outside of any locking. */ + prepared = seccomp_prepare_user_filter(filter); + if (IS_ERR(prepared)) + return PTR_ERR(prepared); spin_lock_irqsave(¤t->sighand->siglock, irqflags); - if (current->seccomp.mode && - current->seccomp.mode != seccomp_mode) + if (!seccomp_check_mode(current, seccomp_mode)) goto out; - switch (seccomp_mode) { - case SECCOMP_MODE_STRICT: - ret = 0; -#ifdef TIF_NOTSC - disable_TSC(); -#endif - break; -#ifdef CONFIG_SECCOMP_FILTER - case SECCOMP_MODE_FILTER: - ret = seccomp_attach_filter(prepared); - if (ret) - goto out; - /* Do not free the successfully attached filter. */ - prepared = NULL; - break; -#endif - default: + ret = seccomp_attach_filter(prepared); + if (ret) goto out; - } + /* Do not free the successfully attached filter. */ + prepared = NULL; - current->seccomp.mode = seccomp_mode; - set_thread_flag(TIF_SECCOMP); + seccomp_assign_mode(current, seccomp_mode); out: spin_unlock_irqrestore(¤t->sighand->siglock, irqflags); seccomp_filter_free(prepared); return ret; } +#else +static inline long seccomp_set_mode_filter(char __user *filter) +{ + return -EINVAL; +} +#endif /** * prctl_set_seccomp: configures current->seccomp.mode @@ -572,5 +610,12 @@ out: */ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter) { - return seccomp_set_mode(seccomp_mode, filter); + switch (seccomp_mode) { + case SECCOMP_MODE_STRICT: + return seccomp_set_mode_strict(); + case SECCOMP_MODE_FILTER: + return seccomp_set_mode_filter(filter); + default: + return -EINVAL; + } }
Extracts the common check/assign logic, and separates the two mode setting paths to make things more readable with fewer #ifdefs within function bodies. Signed-off-by: Kees Cook <keescook@chromium.org> --- kernel/seccomp.c | 123 +++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 84 insertions(+), 39 deletions(-)