Message ID | 20230601122622.513140-1-kpsingh@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf] bpf: Fix UAF in task local storage | expand |
On Thu, Jun 1, 2023 at 5:26 AM KP Singh <kpsingh@kernel.org> wrote: > > When the the task local storage was generalized for tracing programs, the > bpf_task_local_storage callback was moved from a BPF LSM hook callback > for security_task_free LSM hook to it's own callback. But a failure case > in bad_fork_cleanup_security was missed which, when triggered, led to a dangling > task owner pointer and a subsequent use-after-free. > > This issue was noticed when a BPF LSM program was attached to the > task_alloc hook on a kernel with KASAN enabled. The program used > bpf_task_storage_get to copy the task local storage from the current > task to the new task being created. This is pretty tricky. Let's add a selftest for this. > > Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") > Reported-by: Kuba Piecuch <jpiecuch@google.com> > Signed-off-by: KP Singh <kpsingh@kernel.org> > --- > > This fixes the regression from the LSM blob based implementation, we can > still have UAFs, if bpf_task_storage_get is invoked after bpf_task_storage_free > in the cleanup path. Can we fix this by calling bpf_task_storage_free() from free_task()? Thanks, Song > > kernel/fork.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/fork.c b/kernel/fork.c > index ed4e01daccaa..112d85091ae6 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -2781,6 +2781,7 @@ __latent_entropy struct task_struct *copy_process( > exit_sem(p); > bad_fork_cleanup_security: > security_task_free(p); > + bpf_task_storage_free(p); > bad_fork_cleanup_audit: > audit_free(p); > bad_fork_cleanup_perf: > -- > 2.41.0.rc0.172.g3f132b7071-goog > >
On Thu, Jun 1, 2023 at 6:18 PM Song Liu <song@kernel.org> wrote: > > On Thu, Jun 1, 2023 at 5:26 AM KP Singh <kpsingh@kernel.org> wrote: > > > > When the the task local storage was generalized for tracing programs, the > > bpf_task_local_storage callback was moved from a BPF LSM hook callback > > for security_task_free LSM hook to it's own callback. But a failure case > > in bad_fork_cleanup_security was missed which, when triggered, led to a dangling > > task owner pointer and a subsequent use-after-free. > > > > This issue was noticed when a BPF LSM program was attached to the > > task_alloc hook on a kernel with KASAN enabled. The program used > > bpf_task_storage_get to copy the task local storage from the current > > task to the new task being created. > > This is pretty tricky. Let's add a selftest for this. I don't have an easy repro for this (the UAF does not trigger immediately), Also I am not sure how one would test a UAF in a selftest. What actually happens is: * We have a dangling task pointer in local storage. * This is used sometime later which then leads to weird memory corruption errors. > > > > > Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") > > Reported-by: Kuba Piecuch <jpiecuch@google.com> > > Signed-off-by: KP Singh <kpsingh@kernel.org> > > --- > > > > This fixes the regression from the LSM blob based implementation, we can > > still have UAFs, if bpf_task_storage_get is invoked after bpf_task_storage_free > > in the cleanup path. > > Can we fix this by calling bpf_task_storage_free() from free_task()? I think we can yeah. But, this is yet another deviation from how the security blob is managed (security_task_free) frees the blob that we were previously using. > > Thanks, > Song > > > > > kernel/fork.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/kernel/fork.c b/kernel/fork.c > > index ed4e01daccaa..112d85091ae6 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -2781,6 +2781,7 @@ __latent_entropy struct task_struct *copy_process( > > exit_sem(p); > > bad_fork_cleanup_security: > > security_task_free(p); > > + bpf_task_storage_free(p); > > bad_fork_cleanup_audit: > > audit_free(p); > > bad_fork_cleanup_perf: > > -- > > 2.41.0.rc0.172.g3f132b7071-goog > > > >
> On Jun 1, 2023, at 9:27 AM, KP Singh <kpsingh@kernel.org> wrote: > > On Thu, Jun 1, 2023 at 6:18 PM Song Liu <song@kernel.org> wrote: >> >> On Thu, Jun 1, 2023 at 5:26 AM KP Singh <kpsingh@kernel.org> wrote: >>> >>> When the the task local storage was generalized for tracing programs, the >>> bpf_task_local_storage callback was moved from a BPF LSM hook callback >>> for security_task_free LSM hook to it's own callback. But a failure case >>> in bad_fork_cleanup_security was missed which, when triggered, led to a dangling >>> task owner pointer and a subsequent use-after-free. >>> >>> This issue was noticed when a BPF LSM program was attached to the >>> task_alloc hook on a kernel with KASAN enabled. The program used >>> bpf_task_storage_get to copy the task local storage from the current >>> task to the new task being created. >> >> This is pretty tricky. Let's add a selftest for this. > > I don't have an easy repro for this (the UAF does not trigger > immediately), Also I am not sure how one would test a UAF in a > selftest. What actually happens is: > > * We have a dangling task pointer in local storage. > * This is used sometime later which then leads to weird memory > corruption errors. I think we will see it easily with KASAN, no? > >> >>> >>> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") >>> Reported-by: Kuba Piecuch <jpiecuch@google.com> >>> Signed-off-by: KP Singh <kpsingh@kernel.org> >>> --- >>> >>> This fixes the regression from the LSM blob based implementation, we can >>> still have UAFs, if bpf_task_storage_get is invoked after bpf_task_storage_free >>> in the cleanup path. >> >> Can we fix this by calling bpf_task_storage_free() from free_task()? > > I think we can yeah. But, this is yet another deviation from how the > security blob is managed (security_task_free) frees the blob that we > were previously using. Yeah, this will make the code even more tricky. Another idea I had is to filter on task->__state in the helper. IOW, task local storage does not work on starting or died tasks. But I am not sure whether this will make BPF_LSM less effective (not covering certain tasks). Thanks, Song
On Thu, Jun 1, 2023 at 9:54 AM Song Liu <songliubraving@meta.com> wrote: > > > > > On Jun 1, 2023, at 9:27 AM, KP Singh <kpsingh@kernel.org> wrote: > > > > On Thu, Jun 1, 2023 at 6:18 PM Song Liu <song@kernel.org> wrote: > >> > >> On Thu, Jun 1, 2023 at 5:26 AM KP Singh <kpsingh@kernel.org> wrote: > >>> <...> > >>> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") > >>> Reported-by: Kuba Piecuch <jpiecuch@google.com> > >>> Signed-off-by: KP Singh <kpsingh@kernel.org> > >>> --- > >>> > >>> This fixes the regression from the LSM blob based implementation, we can > >>> still have UAFs, if bpf_task_storage_get is invoked after bpf_task_storage_free > >>> in the cleanup path. > >> > >> Can we fix this by calling bpf_task_storage_free() from free_task()? > > > > I think we can yeah. But, this is yet another deviation from how the > > security blob is managed (security_task_free) frees the blob that we > > were previously using. > > Yeah, this will make the code even more tricky. > > Another idea I had is to filter on task->__state in the helper. IOW, > task local storage does not work on starting or died tasks. But I am > not sure whether this will make BPF_LSM less effective (not covering > certain tasks). > I thought about this as well. But I think some use cases would want to use task local storage at task creation. At least attaching at the sched_newtask tracepoints should be fine in those cases. But at the time of sched_newtask, task->__state is still TASK_NEW. Another idea, is it possible to allow LSM + local storage on starting/dying tasks, but restrict tracing + local storage to created tasks? Hao
On 6/1/23 9:54 AM, Song Liu wrote: > > >> On Jun 1, 2023, at 9:27 AM, KP Singh <kpsingh@kernel.org> wrote: >> >> On Thu, Jun 1, 2023 at 6:18 PM Song Liu <song@kernel.org> wrote: >>> >>> On Thu, Jun 1, 2023 at 5:26 AM KP Singh <kpsingh@kernel.org> wrote: >>>> >>>> When the the task local storage was generalized for tracing programs, the >>>> bpf_task_local_storage callback was moved from a BPF LSM hook callback >>>> for security_task_free LSM hook to it's own callback. But a failure case >>>> in bad_fork_cleanup_security was missed which, when triggered, led to a dangling >>>> task owner pointer and a subsequent use-after-free. >>>> >>>> This issue was noticed when a BPF LSM program was attached to the >>>> task_alloc hook on a kernel with KASAN enabled. The program used >>>> bpf_task_storage_get to copy the task local storage from the current >>>> task to the new task being created. >>> >>> This is pretty tricky. Let's add a selftest for this. >> >> I don't have an easy repro for this (the UAF does not trigger >> immediately), Also I am not sure how one would test a UAF in a >> selftest. What actually happens is: >> >> * We have a dangling task pointer in local storage. >> * This is used sometime later which then leads to weird memory >> corruption errors. > > I think we will see it easily with KASAN, no? > >> >>> >>>> >>>> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") >>>> Reported-by: Kuba Piecuch <jpiecuch@google.com> >>>> Signed-off-by: KP Singh <kpsingh@kernel.org> >>>> --- >>>> >>>> This fixes the regression from the LSM blob based implementation, we can >>>> still have UAFs, if bpf_task_storage_get is invoked after bpf_task_storage_free >>>> in the cleanup path. >>> >>> Can we fix this by calling bpf_task_storage_free() from free_task()? >> >> I think we can yeah. But, this is yet another deviation from how the >> security blob is managed (security_task_free) frees the blob that we >> were previously using. Does it mean doing bpf_task_storage_free() in free_task() will break some use cases? Could you explain? Doing bpf_task_storage_free() in free_task() seems to be more straight forward. > > Yeah, this will make the code even more tricky. > > Another idea I had is to filter on task->__state in the helper. IOW, > task local storage does not work on starting or died tasks. But I am > not sure whether this will make BPF_LSM less effective (not covering > certain tasks). > > Thanks, > Song > >
On Thu, Jun 1, 2023 at 7:47 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 6/1/23 9:54 AM, Song Liu wrote: > > > > > >> On Jun 1, 2023, at 9:27 AM, KP Singh <kpsingh@kernel.org> wrote: > >> > >> On Thu, Jun 1, 2023 at 6:18 PM Song Liu <song@kernel.org> wrote: > >>> > >>> On Thu, Jun 1, 2023 at 5:26 AM KP Singh <kpsingh@kernel.org> wrote: > >>>> > >>>> When the the task local storage was generalized for tracing programs, the > >>>> bpf_task_local_storage callback was moved from a BPF LSM hook callback > >>>> for security_task_free LSM hook to it's own callback. But a failure case > >>>> in bad_fork_cleanup_security was missed which, when triggered, led to a dangling > >>>> task owner pointer and a subsequent use-after-free. > >>>> > >>>> This issue was noticed when a BPF LSM program was attached to the > >>>> task_alloc hook on a kernel with KASAN enabled. The program used > >>>> bpf_task_storage_get to copy the task local storage from the current > >>>> task to the new task being created. > >>> > >>> This is pretty tricky. Let's add a selftest for this. > >> > >> I don't have an easy repro for this (the UAF does not trigger > >> immediately), Also I am not sure how one would test a UAF in a > >> selftest. What actually happens is: > >> > >> * We have a dangling task pointer in local storage. > >> * This is used sometime later which then leads to weird memory > >> corruption errors. > > > > I think we will see it easily with KASAN, no? No, the issue only happens when copy_process fails for some reason (which one can possibly trigger with error injection / fexit) and then somehow triggers the UAF. Even if one does manage to trigger the KASAN warning, we won't fail the selftest, so I don't see this in the selftest territory TBH. What do you have in mind? > > > >> > >>> > >>>> > >>>> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") > >>>> Reported-by: Kuba Piecuch <jpiecuch@google.com> > >>>> Signed-off-by: KP Singh <kpsingh@kernel.org> > >>>> --- > >>>> > >>>> This fixes the regression from the LSM blob based implementation, we can > >>>> still have UAFs, if bpf_task_storage_get is invoked after bpf_task_storage_free > >>>> in the cleanup path. > >>> > >>> Can we fix this by calling bpf_task_storage_free() from free_task()? > >> > >> I think we can yeah. But, this is yet another deviation from how the > >> security blob is managed (security_task_free) frees the blob that we > >> were previously using. > > Does it mean doing bpf_task_storage_free() in free_task() will break some use > cases? Could you explain? > Doing bpf_task_storage_free() in free_task() seems to be more straight forward. Superficially, I don't see any issues . All I am saying is that, before we generalized task local storage, it was allocated and freed as a security blob and now it's deviating further. Should we just consider moving security_task_free into task_free then? > > > > > Yeah, this will make the code even more tricky. > > > > Another idea I had is to filter on task->__state in the helper. IOW, bailing out on __state == TASK_DEAD should be reasonable. > > task local storage does not work on starting or died tasks. But I am > > not sure whether this will make BPF_LSM less effective (not covering > > certain tasks). As long as the task local storage is usable in LSM hooks like security_task_alloc it's okay > > > > Thanks, > > Song > > > > >
On Thu, Jun 1, 2023 at 8:24 PM KP Singh <kpsingh@kernel.org> wrote: > > On Thu, Jun 1, 2023 at 7:47 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > > > On 6/1/23 9:54 AM, Song Liu wrote: > > > > > > > > >> On Jun 1, 2023, at 9:27 AM, KP Singh <kpsingh@kernel.org> wrote: > > >> > > >> On Thu, Jun 1, 2023 at 6:18 PM Song Liu <song@kernel.org> wrote: > > >>> > > >>> On Thu, Jun 1, 2023 at 5:26 AM KP Singh <kpsingh@kernel.org> wrote: > > >>>> > > >>>> When the the task local storage was generalized for tracing programs, the > > >>>> bpf_task_local_storage callback was moved from a BPF LSM hook callback > > >>>> for security_task_free LSM hook to it's own callback. But a failure case > > >>>> in bad_fork_cleanup_security was missed which, when triggered, led to a dangling > > >>>> task owner pointer and a subsequent use-after-free. > > >>>> > > >>>> This issue was noticed when a BPF LSM program was attached to the > > >>>> task_alloc hook on a kernel with KASAN enabled. The program used > > >>>> bpf_task_storage_get to copy the task local storage from the current > > >>>> task to the new task being created. > > >>> > > >>> This is pretty tricky. Let's add a selftest for this. > > >> > > >> I don't have an easy repro for this (the UAF does not trigger > > >> immediately), Also I am not sure how one would test a UAF in a > > >> selftest. What actually happens is: > > >> > > >> * We have a dangling task pointer in local storage. > > >> * This is used sometime later which then leads to weird memory > > >> corruption errors. > > > > > > I think we will see it easily with KASAN, no? > > No, the issue only happens when copy_process fails for some reason > (which one can possibly trigger with error injection / fexit) and then > somehow triggers the UAF. > > Even if one does manage to trigger the KASAN warning, we won't fail > the selftest, so I don't see this in the selftest territory TBH. What > do you have in mind? > > > > > > >> > > >>> > > >>>> > > >>>> Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") > > >>>> Reported-by: Kuba Piecuch <jpiecuch@google.com> > > >>>> Signed-off-by: KP Singh <kpsingh@kernel.org> > > >>>> --- > > >>>> > > >>>> This fixes the regression from the LSM blob based implementation, we can > > >>>> still have UAFs, if bpf_task_storage_get is invoked after bpf_task_storage_free > > >>>> in the cleanup path. > > >>> > > >>> Can we fix this by calling bpf_task_storage_free() from free_task()? > > >> > > >> I think we can yeah. But, this is yet another deviation from how the > > >> security blob is managed (security_task_free) frees the blob that we > > >> were previously using. > > > > Does it mean doing bpf_task_storage_free() in free_task() will break some use > > cases? Could you explain? > > Doing bpf_task_storage_free() in free_task() seems to be more straight forward. > > Superficially, I don't see any issues . All I am saying is that, > before we generalized task local storage, it was allocated and freed > as a security blob and now it's deviating further. Should we just > consider moving security_task_free into task_free then? For now, I am going to move bpf_task_storage_free into free_task as it's the immediate fix and respin. > > > > > > > > > Yeah, this will make the code even more tricky. > > > > > > Another idea I had is to filter on task->__state in the helper. IOW, > > bailing out on __state == TASK_DEAD should be reasonable. > > > > task local storage does not work on starting or died tasks. But I am > > > not sure whether this will make BPF_LSM less effective (not covering > > > certain tasks). > > As long as the task local storage is usable in LSM hooks like > security_task_alloc it's okay > > > > > > > Thanks, > > > Song > > > > > > > >
diff --git a/kernel/fork.c b/kernel/fork.c index ed4e01daccaa..112d85091ae6 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2781,6 +2781,7 @@ __latent_entropy struct task_struct *copy_process( exit_sem(p); bad_fork_cleanup_security: security_task_free(p); + bpf_task_storage_free(p); bad_fork_cleanup_audit: audit_free(p); bad_fork_cleanup_perf:
When the the task local storage was generalized for tracing programs, the bpf_task_local_storage callback was moved from a BPF LSM hook callback for security_task_free LSM hook to it's own callback. But a failure case in bad_fork_cleanup_security was missed which, when triggered, led to a dangling task owner pointer and a subsequent use-after-free. This issue was noticed when a BPF LSM program was attached to the task_alloc hook on a kernel with KASAN enabled. The program used bpf_task_storage_get to copy the task local storage from the current task to the new task being created. Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") Reported-by: Kuba Piecuch <jpiecuch@google.com> Signed-off-by: KP Singh <kpsingh@kernel.org> --- This fixes the regression from the LSM blob based implementation, we can still have UAFs, if bpf_task_storage_get is invoked after bpf_task_storage_free in the cleanup path. kernel/fork.c | 1 + 1 file changed, 1 insertion(+)