[bpf,v2,3/4] bpf: track subprog poke correctly

Message ID 162388413965.151936.16775592753297385087.stgit@john-XPS-13-9370 (mailing list archive)
State Changes Requested
Delegated to: BPF
Series BPF fixes mixed tail and bpf2bpf calls

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for bpf
netdev/subject_prefix success
netdev/cc_maintainers warning 6 maintainers not CCed: yhs@fb.com kpsingh@kernel.org bpf@vger.kernel.org andrii@kernel.org kafai@fb.com songliubraving@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 10031 this patch: 10031
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/verify_fixes success
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns; WARNING: line length of 89 exceeds 80 columns; WARNING: line length of 90 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 10444 this patch: 10444
netdev/header_inline success

Commit Message

John Fastabend June 16, 2021, 10:55 p.m. UTC
Subprograms call map_poke_track, but there is no matching hook to call
map_poke_untrack on program release. So on prog release the aux memory is
freed even though the element list of the map aux data still holds a
reference to it.

When we then run map_poke_run() we end up accessing freed memory. With
KASAN this triggers in prog_array_map_poke_run() as shown here.

[  402.824686] ==================================================================
[  402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e
[  402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337

[  402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G          I       5.12.0+ #399
[  402.824715] Call Trace:
[  402.824719]  dump_stack+0x93/0xc2
[  402.824727]  print_address_description.constprop.0+0x1a/0x140
[  402.824736]  ? prog_array_map_poke_run+0xc2/0x34e
[  402.824740]  ? prog_array_map_poke_run+0xc2/0x34e
[  402.824744]  kasan_report.cold+0x7c/0xd8
[  402.824752]  ? prog_array_map_poke_run+0xc2/0x34e
[  402.824757]  prog_array_map_poke_run+0xc2/0x34e
[  402.824765]  bpf_fd_array_map_update_elem+0x124/0x1a0

The elements concerned are walked like this,

                for (i = 0; i < elem->aux->size_poke_tab; i++) {
                        poke = &elem->aux->poke_tab[i];

So the access to size_poke_tab is the 4B read, verified by checking offsets
in the KASAN dump,

[  402.825004] The buggy address belongs to the object at ffff8881905a7800
                which belongs to the cache kmalloc-1k of size 1024
[  402.825008] The buggy address is located 320 bytes inside of
                1024-byte region [ffff8881905a7800, ffff8881905a7c00)

With pahole output,

 struct bpf_prog_aux {
 ...
        /* --- cacheline 5 boundary (320 bytes) --- */
        u32                        size_poke_tab;        /*   320     4 */
 ...

To fix this, track the map references by using a per-subprogram used_maps
array and used_map_cnt value to hold references into the maps, so that when
the subprogram is released we can untrack from the correct map using the
correct aux field.

Here we are slightly less than optimal because we insert all poke entries
into the used_maps array, even duplicates. In theory we could get by with
only unique entries, but that would require an extra collect-the-maps stage
and seems unnecessary when the cost is simply an extra 8B per duplicate.
Keeping the duplicates also makes the logic simpler, and the untrack hook
is happy to ignore entries previously removed.

Reported-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/bpf.h   |    1 +
 kernel/bpf/core.c     |    6 ++++--
 kernel/bpf/verifier.c |   36 +++++++++++++++++++++++++-----------
 3 files changed, 30 insertions(+), 13 deletions(-)

Comments

Alexei Starovoitov June 22, 2021, 10 p.m. UTC | #1
On Wed, Jun 16, 2021 at 03:55:39PM -0700, John Fastabend wrote:
>  
> -static void bpf_free_used_maps(struct bpf_prog_aux *aux)
> +void bpf_free_used_maps(struct bpf_prog_aux *aux)
>  {
>  	__bpf_free_used_maps(aux, aux->used_maps, aux->used_map_cnt);
>  	kfree(aux->used_maps);
> @@ -2211,8 +2211,10 @@ static void bpf_prog_free_deferred(struct work_struct *work)
>  #endif
>  	if (aux->dst_trampoline)
>  		bpf_trampoline_put(aux->dst_trampoline);
> -	for (i = 0; i < aux->func_cnt; i++)
> +	for (i = 0; i < aux->func_cnt; i++) {
> +		bpf_free_used_maps(aux->func[i]->aux);
>  		bpf_jit_free(aux->func[i]);
> +	}

The sub-progs don't have all the properties of the main prog.
Only the main prog is supposed to keep maps incremented.
After this patch a prog with 100 subprogs will artificially bump maps
refcnt 100 times as a workaround for poke_tab access.
Maybe we can use a single poke_tab in the main prog instead.
Looks like jit_subprogs is splitting the poke_tab into individual arrays
for each subprog, but maps are tracked by the main prog only.
That's the root cause of the issue, right?
I think that split of poke_tab isn't necessary.
bpf_int_jit_compile() can look into main prog poke_tab instead.
Then the loop:
for (j = 0; j < prog->aux->size_poke_tab)
    bpf_jit_add_poke_descriptor(func[i], &prog->aux->poke_tab[j]);
can be removed (It will address the concern in patch 2 as well, right?)
And hopefully will fix UAF too?
John Fastabend June 22, 2021, 11:02 p.m. UTC | #2
Alexei Starovoitov wrote:
> On Wed, Jun 16, 2021 at 03:55:39PM -0700, John Fastabend wrote:
> >  
> > -static void bpf_free_used_maps(struct bpf_prog_aux *aux)
> > +void bpf_free_used_maps(struct bpf_prog_aux *aux)
> >  {
> >  	__bpf_free_used_maps(aux, aux->used_maps, aux->used_map_cnt);
> >  	kfree(aux->used_maps);
> > @@ -2211,8 +2211,10 @@ static void bpf_prog_free_deferred(struct work_struct *work)
> >  #endif
> >  	if (aux->dst_trampoline)
> >  		bpf_trampoline_put(aux->dst_trampoline);
> > -	for (i = 0; i < aux->func_cnt; i++)
> > +	for (i = 0; i < aux->func_cnt; i++) {
> > +		bpf_free_used_maps(aux->func[i]->aux);
> >  		bpf_jit_free(aux->func[i]);
> > +	}
> 
> The sub-progs don't have all the properties of the main prog.
> Only the main prog is supposed to keep maps incremented.
> After this patch a prog with 100 subprogs will artificially bump maps
> refcnt 100 times as a workaround for poke_tab access.

Yep.

> Maybe we can use a single poke_tab in the main prog instead.
> Looks like jit_subprogs is splitting the poke_tab into individual arrays
> for each subprog, but maps are tracked by the main prog only.
> That's the root cause of the issue, right?

Correct.

> I think that split of poke_tab isn't necessary.
> bpf_int_jit_compile() can look into main prog poke_tab instead.
> Then the loop:
> for (j = 0; j < prog->aux->size_poke_tab)
>     bpf_jit_add_poke_descriptor(func[i], &prog->aux->poke_tab[j]);
> can be removed (It will address the concern in patch 2 as well, right?)
> And hopefully will fix UAF too?

Looks like it to me as well. A few details to work out around
imm value and emit hooks on the jit side, but looks doable to me.
I'll give it a try tomorrow or tonight.

Thanks,
John
Alexei Starovoitov June 22, 2021, 11:07 p.m. UTC | #3
On Tue, Jun 22, 2021 at 4:02 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Alexei Starovoitov wrote:
> > On Wed, Jun 16, 2021 at 03:55:39PM -0700, John Fastabend wrote:
> > >
> > > -static void bpf_free_used_maps(struct bpf_prog_aux *aux)
> > > +void bpf_free_used_maps(struct bpf_prog_aux *aux)
> > >  {
> > >     __bpf_free_used_maps(aux, aux->used_maps, aux->used_map_cnt);
> > >     kfree(aux->used_maps);
> > > @@ -2211,8 +2211,10 @@ static void bpf_prog_free_deferred(struct work_struct *work)
> > >  #endif
> > >     if (aux->dst_trampoline)
> > >             bpf_trampoline_put(aux->dst_trampoline);
> > > -   for (i = 0; i < aux->func_cnt; i++)
> > > +   for (i = 0; i < aux->func_cnt; i++) {
> > > +           bpf_free_used_maps(aux->func[i]->aux);
> > >             bpf_jit_free(aux->func[i]);
> > > +   }
> >
> > The sub-progs don't have all the properties of the main prog.
> > Only the main prog is supposed to keep maps incremented.
> > After this patch a prog with 100 subprogs will artificially bump maps
> > refcnt 100 times as a workaround for poke_tab access.
>
> Yep.
>
> > Maybe we can use a single poke_tab in the main prog instead.
> > Looks like jit_subprogs is splitting the poke_tab into individual arrays
> > for each subprog, but maps are tracked by the main prog only.
> > That's the root cause of the issue, right?
>
> Correct.
>
> > I think that split of poke_tab isn't necessary.
> > bpf_int_jit_compile() can look into main prog poke_tab instead.
> > Then the loop:
> > for (j = 0; j < prog->aux->size_poke_tab)
> >     bpf_jit_add_poke_descriptor(func[i], &prog->aux->poke_tab[j]);
> > can be removed (It will address the concern in patch 2 as well, right?)
> > And hopefully will fix UAF too?
>
> Looks like it to me as well. A few details to work out around
> imm value and emit hooks on the jit side, but looks doable to me.
> I'll give it a try tomorrow or tonight.

Perfect. Thank you John.

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 02b02cb29ce2..c037c67347c0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1780,6 +1780,7 @@  static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
 	return bpf_prog_get_type_dev(ufd, type, false);
 }
 
+void bpf_free_used_maps(struct bpf_prog_aux *aux);
 void __bpf_free_used_maps(struct bpf_prog_aux *aux,
 			  struct bpf_map **used_maps, u32 len);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 5e31ee9f7512..ce5bb8932958 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2167,7 +2167,7 @@  void __bpf_free_used_maps(struct bpf_prog_aux *aux,
 	}
 }
 
-static void bpf_free_used_maps(struct bpf_prog_aux *aux)
+void bpf_free_used_maps(struct bpf_prog_aux *aux)
 {
 	__bpf_free_used_maps(aux, aux->used_maps, aux->used_map_cnt);
 	kfree(aux->used_maps);
@@ -2211,8 +2211,10 @@  static void bpf_prog_free_deferred(struct work_struct *work)
 #endif
 	if (aux->dst_trampoline)
 		bpf_trampoline_put(aux->dst_trampoline);
-	for (i = 0; i < aux->func_cnt; i++)
+	for (i = 0; i < aux->func_cnt; i++) {
+		bpf_free_used_maps(aux->func[i]->aux);
 		bpf_jit_free(aux->func[i]);
+	}
 	if (aux->func_cnt) {
 		kfree(aux->func);
 		bpf_prog_unlock_free(aux->prog);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 066fac9b5460..31c0f3ad9626 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12128,14 +12128,32 @@  static int jit_subprogs(struct bpf_verifier_env *env)
 			func[i]->insnsi[insn_idx - subprog_start].imm = ret + 1;
 		}
 
-		for (j = 0; j < func[i]->aux->size_poke_tab; j++) {
-			int ret;
+		/* overapproximate the number of map slots. Untrack will just skip
+		 * the lookup anyways and we avoid an extra layer of accounting.
+		 */
+		if (func[i]->aux->size_poke_tab) {
+			struct bpf_map **used_maps;
 
-			map_ptr = func[i]->aux->poke_tab[j].tail_call.map;
-			ret = map_ptr->ops->map_poke_track(map_ptr, func[i]->aux);
-			if (ret < 0) {
-				verbose(env, "tracking tail call prog failed\n");
+			used_maps = kmalloc_array(func[i]->aux->size_poke_tab,
+						  sizeof(struct bpf_map *),
+						  GFP_KERNEL);
+			if (!used_maps)
 				goto out_free;
+
+			func[i]->aux->used_maps = used_maps;
+
+			for (j = 0; j < func[i]->aux->size_poke_tab; j++) {
+				int ret;
+
+				map_ptr = func[i]->aux->poke_tab[j].tail_call.map;
+				ret = map_ptr->ops->map_poke_track(map_ptr, func[i]->aux);
+				if (ret < 0) {
+					verbose(env, "tracking tail call prog failed\n");
+					goto out_free;
+				}
+				bpf_map_inc(map_ptr);
+				func[i]->aux->used_map_cnt++;
+				func[i]->aux->used_maps[j] = map_ptr;
 			}
 		}
 
@@ -12259,11 +12277,7 @@  static int jit_subprogs(struct bpf_verifier_env *env)
 	for (i = 0; i < env->subprog_cnt; i++) {
 		if (!func[i])
 			continue;
-
-		for (j = 0; j < func[i]->aux->size_poke_tab; j++) {
-			map_ptr = func[i]->aux->poke_tab[j].tail_call.map;
-			map_ptr->ops->map_poke_untrack(map_ptr, func[i]->aux);
-		}
+		bpf_free_used_maps(func[i]->aux);
 		bpf_jit_free(func[i]);
 	}
 	kfree(func);