Message ID | 20220316195400.2998326-1-joannekoong@fb.com (mailing list archive) |
---|---
State | Changes Requested |
Delegated to | BPF
Series | [bpf-next,v2] bpf: Enable non-atomic allocations in local storage
On Wed, Mar 16, 2022 at 12:54:00PM -0700, Joanne Koong wrote:
> From: Joanne Koong <joannelkoong@gmail.com>
>
> Currently, local storage memory can only be allocated atomically
> (GFP_ATOMIC). This restriction is too strict for sleepable bpf
> programs.
>
> In this patch, the verifier detects whether the program is sleepable,
> and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a
> 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate
> down to the local storage functions that allocate memory.
>
> Please note that bpf_task/sk/inode_storage_update_elem functions are
> invoked by userspace applications through syscalls. Preemption is
> disabled before bpf_task/sk/inode_storage_update_elem is called, which
> means they will always have to allocate memory atomically.
>
> The existing local storage selftests cover both the GFP_ATOMIC and the
> GFP_KERNEL cases in bpf_local_storage_update.
>
> v2 <- v1:
> * Allocate the memory before/after the raw_spin_lock_irqsave, depending
>   on the gfp flags
> * Rename mem_flags to gfp_flags
> * Reword the comment "*mem_flags* is set by the bpf verifier" to
>   "*gfp_flags* is a hidden argument provided by the verifier"
> * Add a sentence to the commit message about existing local storage
>   selftests covering both the GFP_ATOMIC and GFP_KERNEL paths in
>   bpf_local_storage_update.

[ ... ]

>  struct bpf_local_storage_data *
>  bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> -			 void *value, u64 map_flags)
> +			 void *value, u64 map_flags, gfp_t gfp_flags)
>  {
>  	struct bpf_local_storage_data *old_sdata = NULL;
> -	struct bpf_local_storage_elem *selem;
> +	struct bpf_local_storage_elem *selem = NULL;
>  	struct bpf_local_storage *local_storage;
>  	unsigned long flags;
>  	int err;
> @@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>  		     !map_value_has_spin_lock(&smap->map)))
>  		return ERR_PTR(-EINVAL);
>
> +	if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
> +		return ERR_PTR(-EINVAL);
> +
>  	local_storage = rcu_dereference_check(*owner_storage(smap, owner),
>  					      bpf_rcu_lock_held());
>  	if (!local_storage || hlist_empty(&local_storage->list)) {
> @@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>  		if (err)
>  			return ERR_PTR(err);
>
> -		selem = bpf_selem_alloc(smap, owner, value, true);
> +		selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
>  		if (!selem)
>  			return ERR_PTR(-ENOMEM);
>
> -		err = bpf_local_storage_alloc(owner, smap, selem);
> +		err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
>  		if (err) {
>  			kfree(selem);
>  			mem_uncharge(smap, owner, smap->elem_size);
> @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>  		}
>  	}
>
> +	if (gfp_flags == GFP_KERNEL) {
> +		selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
I think this new path is not executed by the existing
"progs/local_storage.c" test, which has a sleepable lsm prog. At least a
second BPF_MAP_TYPE_TASK_STORAGE map (or SK_STORAGE) is needed?
Or is there another selftest covering this new path that I missed?

Others lgtm.

Acked-by: Martin KaFai Lau <kafai@fb.com>

> +		if (!selem)
> +			return ERR_PTR(-ENOMEM);
> +	}
> +
>  	raw_spin_lock_irqsave(&local_storage->lock, flags);
>
>  	/* Recheck local_storage->list under local_storage->lock */
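Condensed for readability, the ordering the patch establishes in
bpf_local_storage_update() is the following. This is a sketch distilled
from the hunks above, not the verbatim kernel code:

	struct bpf_local_storage_elem *selem = NULL;

	/* Sleepable path: allocate (and charge) before taking the lock,
	 * since a GFP_KERNEL allocation may sleep. The BPF_NOEXIST
	 * restriction checked earlier means charge_mem can simply be true;
	 * if the element turns out to already exist, the update fails and
	 * the error path uncharges and frees this pre-allocated selem.
	 */
	if (gfp_flags == GFP_KERNEL) {
		selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
		if (!selem)
			return ERR_PTR(-ENOMEM);
	}

	raw_spin_lock_irqsave(&local_storage->lock, flags);
	/* ... recheck the list, check_flags(), look up old_sdata ... */

	/* Atomic path: unchanged from before; the allocation stays under
	 * the lock because GFP_ATOMIC cannot sleep.
	 */
	if (gfp_flags != GFP_KERNEL) {
		selem = bpf_selem_alloc(smap, owner, value, !old_sdata, gfp_flags);
		if (!selem) {
			err = -ENOMEM;
			goto unlock_err;
		}
	}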
Hi Joanne,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/0day-ci/linux/commits/Joanne-Koong/bpf-Enable-non-atomic-allocations-in-local-storage/20220317-035639
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: s390-randconfig-s031-20220313 (https://download.01.org/0day-ci/archive/20220317/202203171134.RcEuYPsA-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/0day-ci/linux/commit/022ee8bbd2bfdefff36373ab838326754fe2bb55
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Joanne-Koong/bpf-Enable-non-atomic-allocations-in-local-storage/20220317-035639
git checkout 022ee8bbd2bfdefff36373ab838326754fe2bb55
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=s390 SHELL=/bin/bash
If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
>> kernel/bpf/verifier.c:13498:47: sparse: sparse: incorrect type in initializer (different base types) @@ expected signed int [usertype] imm @@ got restricted gfp_t @@
kernel/bpf/verifier.c:13498:47: sparse: expected signed int [usertype] imm
kernel/bpf/verifier.c:13498:47: sparse: got restricted gfp_t
kernel/bpf/verifier.c:13500:47: sparse: sparse: incorrect type in initializer (different base types) @@ expected signed int [usertype] imm @@ got restricted gfp_t @@
kernel/bpf/verifier.c:13500:47: sparse: expected signed int [usertype] imm
kernel/bpf/verifier.c:13500:47: sparse: got restricted gfp_t
kernel/bpf/verifier.c:13726:38: sparse: sparse: subtraction of functions? Share your drugs
kernel/bpf/verifier.c: note: in included file (through include/linux/bpf.h, include/linux/bpf-cgroup.h):
include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
include/linux/bpfptr.h:63:40: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:63:40: sparse: sparse: cast from non-scalar
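The two new warnings at verifier.c:13498 and :13500 come from the storage-get
fixup shown in the listing below: gfp_t is a sparse "__bitwise" (restricted)
type, so assigning GFP_KERNEL/GFP_ATOMIC to the plain signed "imm" field of
BPF_MOV64_IMM() trips "incorrect type in initializer". A sketch of the
conventional sparse-clean fix is a __force cast; whether the re-spin resolves
it exactly this way is an assumption here:

		if (env->prog->aux->sleepable)
			insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5,
						    (__force __s32)GFP_KERNEL);
		else
			insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5,
						    (__force __s32)GFP_ATOMIC);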
vim +13498 kernel/bpf/verifier.c
13234
13235 /* Do various post-verification rewrites in a single program pass.
13236 * These rewrites simplify JIT and interpreter implementations.
13237 */
13238 static int do_misc_fixups(struct bpf_verifier_env *env)
13239 {
13240 struct bpf_prog *prog = env->prog;
13241 enum bpf_attach_type eatype = prog->expected_attach_type;
13242 bool expect_blinding = bpf_jit_blinding_enabled(prog);
13243 enum bpf_prog_type prog_type = resolve_prog_type(prog);
13244 struct bpf_insn *insn = prog->insnsi;
13245 const struct bpf_func_proto *fn;
13246 const int insn_cnt = prog->len;
13247 const struct bpf_map_ops *ops;
13248 struct bpf_insn_aux_data *aux;
13249 struct bpf_insn insn_buf[16];
13250 struct bpf_prog *new_prog;
13251 struct bpf_map *map_ptr;
13252 int i, ret, cnt, delta = 0;
13253
13254 for (i = 0; i < insn_cnt; i++, insn++) {
13255 /* Make divide-by-zero exceptions impossible. */
13256 if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
13257 insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
13258 insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
13259 insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
13260 bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
13261 bool isdiv = BPF_OP(insn->code) == BPF_DIV;
13262 struct bpf_insn *patchlet;
13263 struct bpf_insn chk_and_div[] = {
13264 /* [R,W]x div 0 -> 0 */
13265 BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
13266 BPF_JNE | BPF_K, insn->src_reg,
13267 0, 2, 0),
13268 BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
13269 BPF_JMP_IMM(BPF_JA, 0, 0, 1),
13270 *insn,
13271 };
13272 struct bpf_insn chk_and_mod[] = {
13273 /* [R,W]x mod 0 -> [R,W]x */
13274 BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
13275 BPF_JEQ | BPF_K, insn->src_reg,
13276 0, 1 + (is64 ? 0 : 1), 0),
13277 *insn,
13278 BPF_JMP_IMM(BPF_JA, 0, 0, 1),
13279 BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
13280 };
13281
13282 patchlet = isdiv ? chk_and_div : chk_and_mod;
13283 cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
13284 ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
13285
13286 new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
13287 if (!new_prog)
13288 return -ENOMEM;
13289
13290 delta += cnt - 1;
13291 env->prog = prog = new_prog;
13292 insn = new_prog->insnsi + i + delta;
13293 continue;
13294 }
13295
13296 /* Implement LD_ABS and LD_IND with a rewrite, if supported by the program type. */
13297 if (BPF_CLASS(insn->code) == BPF_LD &&
13298 (BPF_MODE(insn->code) == BPF_ABS ||
13299 BPF_MODE(insn->code) == BPF_IND)) {
13300 cnt = env->ops->gen_ld_abs(insn, insn_buf);
13301 if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) {
13302 verbose(env, "bpf verifier is misconfigured\n");
13303 return -EINVAL;
13304 }
13305
13306 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13307 if (!new_prog)
13308 return -ENOMEM;
13309
13310 delta += cnt - 1;
13311 env->prog = prog = new_prog;
13312 insn = new_prog->insnsi + i + delta;
13313 continue;
13314 }
13315
13316 /* Rewrite pointer arithmetic to mitigate speculation attacks. */
13317 if (insn->code == (BPF_ALU64 | BPF_ADD | BPF_X) ||
13318 insn->code == (BPF_ALU64 | BPF_SUB | BPF_X)) {
13319 const u8 code_add = BPF_ALU64 | BPF_ADD | BPF_X;
13320 const u8 code_sub = BPF_ALU64 | BPF_SUB | BPF_X;
13321 struct bpf_insn *patch = &insn_buf[0];
13322 bool issrc, isneg, isimm;
13323 u32 off_reg;
13324
13325 aux = &env->insn_aux_data[i + delta];
13326 if (!aux->alu_state ||
13327 aux->alu_state == BPF_ALU_NON_POINTER)
13328 continue;
13329
13330 isneg = aux->alu_state & BPF_ALU_NEG_VALUE;
13331 issrc = (aux->alu_state & BPF_ALU_SANITIZE) ==
13332 BPF_ALU_SANITIZE_SRC;
13333 isimm = aux->alu_state & BPF_ALU_IMMEDIATE;
13334
13335 off_reg = issrc ? insn->src_reg : insn->dst_reg;
13336 if (isimm) {
13337 *patch++ = BPF_MOV32_IMM(BPF_REG_AX, aux->alu_limit);
13338 } else {
13339 if (isneg)
13340 *patch++ = BPF_ALU64_IMM(BPF_MUL, off_reg, -1);
13341 *patch++ = BPF_MOV32_IMM(BPF_REG_AX, aux->alu_limit);
13342 *patch++ = BPF_ALU64_REG(BPF_SUB, BPF_REG_AX, off_reg);
13343 *patch++ = BPF_ALU64_REG(BPF_OR, BPF_REG_AX, off_reg);
13344 *patch++ = BPF_ALU64_IMM(BPF_NEG, BPF_REG_AX, 0);
13345 *patch++ = BPF_ALU64_IMM(BPF_ARSH, BPF_REG_AX, 63);
13346 *patch++ = BPF_ALU64_REG(BPF_AND, BPF_REG_AX, off_reg);
13347 }
13348 if (!issrc)
13349 *patch++ = BPF_MOV64_REG(insn->dst_reg, insn->src_reg);
13350 insn->src_reg = BPF_REG_AX;
13351 if (isneg)
13352 insn->code = insn->code == code_add ?
13353 code_sub : code_add;
13354 *patch++ = *insn;
13355 if (issrc && isneg && !isimm)
13356 *patch++ = BPF_ALU64_IMM(BPF_MUL, off_reg, -1);
13357 cnt = patch - insn_buf;
13358
13359 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13360 if (!new_prog)
13361 return -ENOMEM;
13362
13363 delta += cnt - 1;
13364 env->prog = prog = new_prog;
13365 insn = new_prog->insnsi + i + delta;
13366 continue;
13367 }
13368
13369 if (insn->code != (BPF_JMP | BPF_CALL))
13370 continue;
13371 if (insn->src_reg == BPF_PSEUDO_CALL)
13372 continue;
13373 if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
13374 ret = fixup_kfunc_call(env, insn);
13375 if (ret)
13376 return ret;
13377 continue;
13378 }
13379
13380 if (insn->imm == BPF_FUNC_get_route_realm)
13381 prog->dst_needed = 1;
13382 if (insn->imm == BPF_FUNC_get_prandom_u32)
13383 bpf_user_rnd_init_once();
13384 if (insn->imm == BPF_FUNC_override_return)
13385 prog->kprobe_override = 1;
13386 if (insn->imm == BPF_FUNC_tail_call) {
13387 /* If we tail call into other programs, we
13388 * cannot make any assumptions since they can
13389 * be replaced dynamically during runtime in
13390 * the program array.
13391 */
13392 prog->cb_access = 1;
13393 if (!allow_tail_call_in_subprogs(env))
13394 prog->aux->stack_depth = MAX_BPF_STACK;
13395 prog->aux->max_pkt_offset = MAX_PACKET_OFF;
13396
13397 /* mark bpf_tail_call as different opcode to avoid
13398 * conditional branch in the interpreter for every normal
13399 * call and to prevent accidental JITing by JIT compiler
13400 * that doesn't support bpf_tail_call yet
13401 */
13402 insn->imm = 0;
13403 insn->code = BPF_JMP | BPF_TAIL_CALL;
13404
13405 aux = &env->insn_aux_data[i + delta];
13406 if (env->bpf_capable && !expect_blinding &&
13407 prog->jit_requested &&
13408 !bpf_map_key_poisoned(aux) &&
13409 !bpf_map_ptr_poisoned(aux) &&
13410 !bpf_map_ptr_unpriv(aux)) {
13411 struct bpf_jit_poke_descriptor desc = {
13412 .reason = BPF_POKE_REASON_TAIL_CALL,
13413 .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
13414 .tail_call.key = bpf_map_key_immediate(aux),
13415 .insn_idx = i + delta,
13416 };
13417
13418 ret = bpf_jit_add_poke_descriptor(prog, &desc);
13419 if (ret < 0) {
13420 verbose(env, "adding tail call poke descriptor failed\n");
13421 return ret;
13422 }
13423
13424 insn->imm = ret + 1;
13425 continue;
13426 }
13427
13428 if (!bpf_map_ptr_unpriv(aux))
13429 continue;
13430
13431 /* instead of changing every JIT dealing with tail_call
13432 * emit two extra insns:
13433 * if (index >= max_entries) goto out;
13434 * index &= array->index_mask;
13435 * to avoid out-of-bounds cpu speculation
13436 */
13437 if (bpf_map_ptr_poisoned(aux)) {
13438 verbose(env, "tail_call abusing map_ptr\n");
13439 return -EINVAL;
13440 }
13441
13442 map_ptr = BPF_MAP_PTR(aux->map_ptr_state);
13443 insn_buf[0] = BPF_JMP_IMM(BPF_JGE, BPF_REG_3,
13444 map_ptr->max_entries, 2);
13445 insn_buf[1] = BPF_ALU32_IMM(BPF_AND, BPF_REG_3,
13446 container_of(map_ptr,
13447 struct bpf_array,
13448 map)->index_mask);
13449 insn_buf[2] = *insn;
13450 cnt = 3;
13451 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13452 if (!new_prog)
13453 return -ENOMEM;
13454
13455 delta += cnt - 1;
13456 env->prog = prog = new_prog;
13457 insn = new_prog->insnsi + i + delta;
13458 continue;
13459 }
13460
13461 if (insn->imm == BPF_FUNC_timer_set_callback) {
13462 /* The verifier will process callback_fn as many times as necessary
13463 * with different maps and the register states prepared by
13464 * set_timer_callback_state will be accurate.
13465 *
13466 * The following use case is valid:
13467 * map1 is shared by prog1, prog2, prog3.
13468 * prog1 calls bpf_timer_init for some map1 elements
13469 * prog2 calls bpf_timer_set_callback for some map1 elements.
13470 * Those that were not bpf_timer_init-ed will return -EINVAL.
13471 * prog3 calls bpf_timer_start for some map1 elements.
13472 * Those that were not both bpf_timer_init-ed and
13473 * bpf_timer_set_callback-ed will return -EINVAL.
13474 */
13475 struct bpf_insn ld_addrs[2] = {
13476 BPF_LD_IMM64(BPF_REG_3, (long)prog->aux),
13477 };
13478
13479 insn_buf[0] = ld_addrs[0];
13480 insn_buf[1] = ld_addrs[1];
13481 insn_buf[2] = *insn;
13482 cnt = 3;
13483
13484 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13485 if (!new_prog)
13486 return -ENOMEM;
13487
13488 delta += cnt - 1;
13489 env->prog = prog = new_prog;
13490 insn = new_prog->insnsi + i + delta;
13491 goto patch_call_imm;
13492 }
13493
13494 if (insn->imm == BPF_FUNC_task_storage_get ||
13495 insn->imm == BPF_FUNC_sk_storage_get ||
13496 insn->imm == BPF_FUNC_inode_storage_get) {
13497 if (env->prog->aux->sleepable)
13498 insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, GFP_KERNEL);
13499 else
13500 insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, GFP_ATOMIC);
13501 insn_buf[1] = *insn;
13502 cnt = 2;
13503
13504 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13505 if (!new_prog)
13506 return -ENOMEM;
13507
13508 delta += cnt - 1;
13509 env->prog = prog = new_prog;
13510 insn = new_prog->insnsi + i + delta;
13511 goto patch_call_imm;
13512 }
13513
13514 /* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
13515 * and other inlining handlers are currently limited to 64 bit
13516 * only.
13517 */
13518 if (prog->jit_requested && BITS_PER_LONG == 64 &&
13519 (insn->imm == BPF_FUNC_map_lookup_elem ||
13520 insn->imm == BPF_FUNC_map_update_elem ||
13521 insn->imm == BPF_FUNC_map_delete_elem ||
13522 insn->imm == BPF_FUNC_map_push_elem ||
13523 insn->imm == BPF_FUNC_map_pop_elem ||
13524 insn->imm == BPF_FUNC_map_peek_elem ||
13525 insn->imm == BPF_FUNC_redirect_map ||
13526 insn->imm == BPF_FUNC_for_each_map_elem)) {
13527 aux = &env->insn_aux_data[i + delta];
13528 if (bpf_map_ptr_poisoned(aux))
13529 goto patch_call_imm;
13530
13531 map_ptr = BPF_MAP_PTR(aux->map_ptr_state);
13532 ops = map_ptr->ops;
13533 if (insn->imm == BPF_FUNC_map_lookup_elem &&
13534 ops->map_gen_lookup) {
13535 cnt = ops->map_gen_lookup(map_ptr, insn_buf);
13536 if (cnt == -EOPNOTSUPP)
13537 goto patch_map_ops_generic;
13538 if (cnt <= 0 || cnt >= ARRAY_SIZE(insn_buf)) {
13539 verbose(env, "bpf verifier is misconfigured\n");
13540 return -EINVAL;
13541 }
13542
13543 new_prog = bpf_patch_insn_data(env, i + delta,
13544 insn_buf, cnt);
13545 if (!new_prog)
13546 return -ENOMEM;
13547
13548 delta += cnt - 1;
13549 env->prog = prog = new_prog;
13550 insn = new_prog->insnsi + i + delta;
13551 continue;
13552 }
13553
13554 BUILD_BUG_ON(!__same_type(ops->map_lookup_elem,
13555 (void *(*)(struct bpf_map *map, void *key))NULL));
13556 BUILD_BUG_ON(!__same_type(ops->map_delete_elem,
13557 (int (*)(struct bpf_map *map, void *key))NULL));
13558 BUILD_BUG_ON(!__same_type(ops->map_update_elem,
13559 (int (*)(struct bpf_map *map, void *key, void *value,
13560 u64 flags))NULL));
13561 BUILD_BUG_ON(!__same_type(ops->map_push_elem,
13562 (int (*)(struct bpf_map *map, void *value,
13563 u64 flags))NULL));
13564 BUILD_BUG_ON(!__same_type(ops->map_pop_elem,
13565 (int (*)(struct bpf_map *map, void *value))NULL));
13566 BUILD_BUG_ON(!__same_type(ops->map_peek_elem,
13567 (int (*)(struct bpf_map *map, void *value))NULL));
13568 BUILD_BUG_ON(!__same_type(ops->map_redirect,
13569 (int (*)(struct bpf_map *map, u32 ifindex, u64 flags))NULL));
13570 BUILD_BUG_ON(!__same_type(ops->map_for_each_callback,
13571 (int (*)(struct bpf_map *map,
13572 bpf_callback_t callback_fn,
13573 void *callback_ctx,
13574 u64 flags))NULL));
13575
13576 patch_map_ops_generic:
13577 switch (insn->imm) {
13578 case BPF_FUNC_map_lookup_elem:
13579 insn->imm = BPF_CALL_IMM(ops->map_lookup_elem);
13580 continue;
13581 case BPF_FUNC_map_update_elem:
13582 insn->imm = BPF_CALL_IMM(ops->map_update_elem);
13583 continue;
13584 case BPF_FUNC_map_delete_elem:
13585 insn->imm = BPF_CALL_IMM(ops->map_delete_elem);
13586 continue;
13587 case BPF_FUNC_map_push_elem:
13588 insn->imm = BPF_CALL_IMM(ops->map_push_elem);
13589 continue;
13590 case BPF_FUNC_map_pop_elem:
13591 insn->imm = BPF_CALL_IMM(ops->map_pop_elem);
13592 continue;
13593 case BPF_FUNC_map_peek_elem:
13594 insn->imm = BPF_CALL_IMM(ops->map_peek_elem);
13595 continue;
13596 case BPF_FUNC_redirect_map:
13597 insn->imm = BPF_CALL_IMM(ops->map_redirect);
13598 continue;
13599 case BPF_FUNC_for_each_map_elem:
13600 insn->imm = BPF_CALL_IMM(ops->map_for_each_callback);
13601 continue;
13602 }
13603
13604 goto patch_call_imm;
13605 }
13606
13607 /* Implement bpf_jiffies64 inline. */
13608 if (prog->jit_requested && BITS_PER_LONG == 64 &&
13609 insn->imm == BPF_FUNC_jiffies64) {
13610 struct bpf_insn ld_jiffies_addr[2] = {
13611 BPF_LD_IMM64(BPF_REG_0,
13612 (unsigned long)&jiffies),
13613 };
13614
13615 insn_buf[0] = ld_jiffies_addr[0];
13616 insn_buf[1] = ld_jiffies_addr[1];
13617 insn_buf[2] = BPF_LDX_MEM(BPF_DW, BPF_REG_0,
13618 BPF_REG_0, 0);
13619 cnt = 3;
13620
13621 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf,
13622 cnt);
13623 if (!new_prog)
13624 return -ENOMEM;
13625
13626 delta += cnt - 1;
13627 env->prog = prog = new_prog;
13628 insn = new_prog->insnsi + i + delta;
13629 continue;
13630 }
13631
13632 /* Implement bpf_get_func_arg inline. */
13633 if (prog_type == BPF_PROG_TYPE_TRACING &&
13634 insn->imm == BPF_FUNC_get_func_arg) {
13635 /* Load nr_args from ctx - 8 */
13636 insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
13637 insn_buf[1] = BPF_JMP32_REG(BPF_JGE, BPF_REG_2, BPF_REG_0, 6);
13638 insn_buf[2] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 3);
13639 insn_buf[3] = BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1);
13640 insn_buf[4] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0);
13641 insn_buf[5] = BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, 0);
13642 insn_buf[6] = BPF_MOV64_IMM(BPF_REG_0, 0);
13643 insn_buf[7] = BPF_JMP_A(1);
13644 insn_buf[8] = BPF_MOV64_IMM(BPF_REG_0, -EINVAL);
13645 cnt = 9;
13646
13647 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13648 if (!new_prog)
13649 return -ENOMEM;
13650
13651 delta += cnt - 1;
13652 env->prog = prog = new_prog;
13653 insn = new_prog->insnsi + i + delta;
13654 continue;
13655 }
13656
13657 /* Implement bpf_get_func_ret inline. */
13658 if (prog_type == BPF_PROG_TYPE_TRACING &&
13659 insn->imm == BPF_FUNC_get_func_ret) {
13660 if (eatype == BPF_TRACE_FEXIT ||
13661 eatype == BPF_MODIFY_RETURN) {
13662 /* Load nr_args from ctx - 8 */
13663 insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
13664 insn_buf[1] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 3);
13665 insn_buf[2] = BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1);
13666 insn_buf[3] = BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, 0);
13667 insn_buf[4] = BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_3, 0);
13668 insn_buf[5] = BPF_MOV64_IMM(BPF_REG_0, 0);
13669 cnt = 6;
13670 } else {
13671 insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, -EOPNOTSUPP);
13672 cnt = 1;
13673 }
13674
13675 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13676 if (!new_prog)
13677 return -ENOMEM;
13678
13679 delta += cnt - 1;
13680 env->prog = prog = new_prog;
13681 insn = new_prog->insnsi + i + delta;
13682 continue;
13683 }
13684
13685 /* Implement get_func_arg_cnt inline. */
13686 if (prog_type == BPF_PROG_TYPE_TRACING &&
13687 insn->imm == BPF_FUNC_get_func_arg_cnt) {
13688 /* Load nr_args from ctx - 8 */
13689 insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
13690
13691 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 1);
13692 if (!new_prog)
13693 return -ENOMEM;
13694
13695 env->prog = prog = new_prog;
13696 insn = new_prog->insnsi + i + delta;
13697 continue;
13698 }
13699
13700 /* Implement bpf_get_func_ip inline. */
13701 if (prog_type == BPF_PROG_TYPE_TRACING &&
13702 insn->imm == BPF_FUNC_get_func_ip) {
13703 /* Load IP address from ctx - 16 */
13704 insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -16);
13705
13706 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 1);
13707 if (!new_prog)
13708 return -ENOMEM;
13709
13710 env->prog = prog = new_prog;
13711 insn = new_prog->insnsi + i + delta;
13712 continue;
13713 }
13714
13715 patch_call_imm:
13716 fn = env->ops->get_func_proto(insn->imm, env->prog);
13717 /* all functions that have prototype and verifier allowed
13718 * programs to call them, must be real in-kernel functions
13719 */
13720 if (!fn->func) {
13721 verbose(env,
13722 "kernel subsystem misconfigured func %s#%d\n",
13723 func_id_name(insn->imm), insn->imm);
13724 return -EFAULT;
13725 }
13726 insn->imm = fn->func - __bpf_call_base;
13727 }
13728
13729 /* Since poke tab is now finalized, publish aux to tracker. */
13730 for (i = 0; i < prog->aux->size_poke_tab; i++) {
13731 map_ptr = prog->aux->poke_tab[i].tail_call.map;
13732 if (!map_ptr->ops->map_poke_track ||
13733 !map_ptr->ops->map_poke_untrack ||
13734 !map_ptr->ops->map_poke_run) {
13735 verbose(env, "bpf verifier is misconfigured\n");
13736 return -EINVAL;
13737 }
13738
13739 ret = map_ptr->ops->map_poke_track(map_ptr, prog->aux);
13740 if (ret < 0) {
13741 verbose(env, "tracking tail call prog failed\n");
13742 return ret;
13743 }
13744 }
13745
13746 sort_kfunc_descs_by_imm(env->prog);
13747
13748 return 0;
13749 }
13750
---
0-DAY CI Kernel Test Service
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Wed, Mar 16, 2022 at 7:23 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Mar 16, 2022 at 12:54:00PM -0700, Joanne Koong wrote:
> > From: Joanne Koong <joannelkoong@gmail.com>
> >
> > Currently, local storage memory can only be allocated atomically
> > (GFP_ATOMIC). This restriction is too strict for sleepable bpf
> > programs.

[ ... ]

> > +	if (gfp_flags == GFP_KERNEL) {
> > +		selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
> I think this new path is not executed by the existing
> "progs/local_storage.c" test, which has a sleepable lsm prog. At least a
> second BPF_MAP_TYPE_TASK_STORAGE map (or SK_STORAGE) is needed?
> Or is there another selftest covering this new path that I missed?

Thanks for your feedback. I think I'm misunderstanding how the
progs/local_storage.c test and/or local storage works then. Would you
mind explaining why a second map is needed?

This is my (faulty) understanding of what is happening:

1) In "local_storage.c" in SEC("lsm.s/inode_rename") there is a call to
   bpf_inode_storage_get with the BPF_LOCAL_STORAGE_GET_F_CREATE flag set,
   which will call into bpf_local_storage_update (which will create the
   local storage + the selem, and put it in the RCU for that
   inode_storage_map).

2) Then, further down in the "local_storage.c" file in
   SEC("lsm.s/bprm_committed_creds"), there is another call to
   bpf_inode_storage_get on the same inode_storage_map but on a different
   inode, also with the BPF_LOCAL_STORAGE_GET_F_CREATE flag set. This will
   also call into bpf_local_storage_update.

3) In bpf_local_storage_update from the call in #2, it sees that there is
   a local storage associated with this map in the RCU, it tries to look
   for the inode but doesn't find it, so it needs to allocate a new selem
   with GFP_KERNEL and then update with the new selem.

I just tried out some debug statements locally to test, and it looks like
my analysis of #3 is wrong. I will debug this some more tomorrow.

> Others lgtm.
>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
On Wed, Mar 16, 2022 at 10:26:57PM -0700, Joanne Koong wrote:
> On Wed, Mar 16, 2022 at 7:23 PM Martin KaFai Lau <kafai@fb.com> wrote:

[ ... ]

> > I think this new path is not executed by the existing
> > "progs/local_storage.c" test, which has a sleepable lsm prog. At least a
> > second BPF_MAP_TYPE_TASK_STORAGE map (or SK_STORAGE) is needed?
> > Or is there another selftest covering this new path that I missed?
> Thanks for your feedback. I think I'm misunderstanding how the
> progs/local_storage.c test and/or local storage works then. Would you
> mind explaining why a second map is needed?
>
> This is my (faulty) understanding of what is happening:
>
> 1) In "local_storage.c" in SEC("lsm.s/inode_rename") there is a call to
>    bpf_inode_storage_get with the BPF_LOCAL_STORAGE_GET_F_CREATE flag set,
>    which will call into bpf_local_storage_update (which will create the
>    local storage + the selem, and put it in the RCU for that
>    inode_storage_map).

From reading the comment above the
bpf_inode_storage_get(BPF_LOCAL_STORAGE_GET_F_CREATE): "new_dentry->d_inode
can be NULL", so it is expected to fail. Meaning no storage is created.

> 2) Then, further down in the "local_storage.c" file in
>    SEC("lsm.s/bprm_committed_creds"), there is another call to
>    bpf_inode_storage_get on the same inode_storage_map but on a different
>    inode, also with the BPF_LOCAL_STORAGE_GET_F_CREATE flag set. This will
>    also call into bpf_local_storage_update.

I believe this is the inode and the storage that the second
bpf_inode_storage_get(..., 0) in the "inode_rename" bpf-prog is supposed
to get. Otherwise, I don't see how the test can pass.

> 3) In bpf_local_storage_update from the call in #2, it sees that there is
>    a local storage associated with this map in the RCU, it tries to look
>    for the inode but doesn't find it, so it needs to allocate a new selem
>    with GFP_KERNEL and then update with the new selem.

Correct, that will be the very first storage created for this inode, and
it will go through the "if (!local_storage || hlist_empty(&local_storage->list))"
allocation code path in bpf_local_storage_update(), which is an existing
code path.

I was talking specifically about the "if (gfp_flags == GFP_KERNEL)"
allocation code path. Thus, it needs a second inode local storage (i.e. a
second inode map) for the same inode. A second inode storage map and
another "bpf_inode_storage_get(&second_inode_storage_map, ...,
BPF_LOCAL_STORAGE_GET_F_CREATE)" should be enough.

It seems it needs a re-spin because of the sparse warning. I don't see an
issue from the code; I just think it will be useful to have a test that
exercises this path. It could be a follow-up as an individual patch if not
in v3.
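To make the layout Martin describes concrete, here is a trimmed view of the
two structs involved (a field subset only, simplified from
include/linux/bpf_local_storage.h): each owner carries exactly one
bpf_local_storage, and its list links selems coming from every local storage
map, so a second map's first F_CREATE get on an already-populated owner is
what reaches the new GFP_KERNEL path.

	struct bpf_local_storage {
		struct hlist_head list;  /* selems from *all* maps for this owner */
		void *owner;             /* the inode/task/sock that owns "list" */
		raw_spinlock_t lock;     /* protects adding/removing from "list" */
		/* ... cache[], rcu ... */
	};

	struct bpf_local_storage_elem {
		struct hlist_node map_node;  /* linked into one bpf_local_storage_map */
		struct hlist_node snode;     /* linked into the owner's storage list */
		/* ... local_storage back-pointer, rcu, sdata ... */
	};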
On Thu, Mar 17, 2022 at 11:04 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Mar 16, 2022 at 10:26:57PM -0700, Joanne Koong wrote:

[ ... ]

> > 3) In bpf_local_storage_update from the call in #2, it sees that there is
> >    a local storage associated with this map in the RCU, it tries to look
> >    for the inode but doesn't find it, so it needs to allocate a new selem
> >    with GFP_KERNEL and then update with the new selem.
> Correct, that will be the very first storage created for this inode, and
> it will go through the "if (!local_storage || hlist_empty(&local_storage->list))"
> allocation code path in bpf_local_storage_update(), which is an existing
> code path.

Ah, I see. I mistakenly thought inodes shared local storages if you passed
in the same map. Thanks for the clarification!

> I was talking specifically about the "if (gfp_flags == GFP_KERNEL)"
> allocation code path. Thus, it needs a second inode local storage (i.e. a
> second inode map) for the same inode. A second inode storage map and
> another "bpf_inode_storage_get(&second_inode_storage_map, ...,
> BPF_LOCAL_STORAGE_GET_F_CREATE)" should be enough.
>
> It seems it needs a re-spin because of the sparse warning. I don't see an
> issue from the code; I just think it will be useful to have a test that
> exercises this path. It could be a follow-up as an individual patch if not
> in v3.

I will submit a v3 that fixes the sparse warning and adds a case to
exercise this path. Thanks for reviewing this, Martin!
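A hedged sketch of what such a test addition to progs/local_storage.c might
look like, following Martin's suggestion of a second inode storage map. The
map and program names are invented here for illustration (not taken from the
eventual v3), and the snippet assumes the file's existing includes and its
struct local_storage value type:

	struct {
		__uint(type, BPF_MAP_TYPE_INODE_STORAGE);
		__uint(map_flags, BPF_F_NO_PREALLOC);
		__type(key, int);
		__type(value, struct local_storage);
	} second_inode_storage_map SEC(".maps");

	SEC("lsm.s/bprm_committed_creds")
	int BPF_PROG(second_map_exec, struct linux_binprm *bprm)
	{
		struct local_storage *storage;

		/* Runs after the existing first-map bpf_inode_storage_get(...,
		 * BPF_LOCAL_STORAGE_GET_F_CREATE) on this hook has created the
		 * inode's bpf_local_storage. The lookup for this second map then
		 * misses while local_storage->list is non-empty, so a sleepable
		 * program lands in the new "gfp_flags == GFP_KERNEL"
		 * pre-allocation path of bpf_local_storage_update().
		 */
		storage = bpf_inode_storage_get(&second_inode_storage_map,
						bprm->file->f_inode, 0,
						BPF_LOCAL_STORAGE_GET_F_CREATE);
		if (!storage)
			return 0;

		return 0;
	}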
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 37b3906af8b1..493e63258497 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -154,16 +154,17 @@ void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
 
 struct bpf_local_storage_elem *
 bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
-		bool charge_mem);
+		bool charge_mem, gfp_t gfp_flags);
 
 int bpf_local_storage_alloc(void *owner,
 			    struct bpf_local_storage_map *smap,
-			    struct bpf_local_storage_elem *first_selem);
+			    struct bpf_local_storage_elem *first_selem,
+			    gfp_t gfp_flags);
 
 struct bpf_local_storage_data *
 bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
-			 void *value, u64 map_flags);
+			 void *value, u64 map_flags, gfp_t gfp_flags);
 
 void bpf_local_storage_free_rcu(struct rcu_head *rcu);
 
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index e29d9e3d853e..96be8d518885 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -136,7 +136,7 @@ static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
 
 	sdata = bpf_local_storage_update(f->f_inode,
 					 (struct bpf_local_storage_map *)map,
-					 value, map_flags);
+					 value, map_flags, GFP_ATOMIC);
 	fput(f);
 	return PTR_ERR_OR_ZERO(sdata);
 }
@@ -169,8 +169,9 @@ static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
 	return err;
 }
 
-BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
-	   void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
+	   void *, value, u64, flags, gfp_t, gfp_flags)
 {
 	struct bpf_local_storage_data *sdata;
 
@@ -196,7 +197,7 @@ BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
 	if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
 		sdata = bpf_local_storage_update(
 			inode, (struct bpf_local_storage_map *)map, value,
-			BPF_NOEXIST);
+			BPF_NOEXIST, gfp_flags);
 		return IS_ERR(sdata) ? (unsigned long)NULL :
 			(unsigned long)sdata->data;
 	}
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 092a1ac772d7..01aa2b51ec4d 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
 
 struct bpf_local_storage_elem *
 bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
-		void *value, bool charge_mem)
+		void *value, bool charge_mem, gfp_t gfp_flags)
 {
 	struct bpf_local_storage_elem *selem;
 
@@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
 		return NULL;
 
 	selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
-				GFP_ATOMIC | __GFP_NOWARN);
+				gfp_flags | __GFP_NOWARN);
 	if (selem) {
 		if (value)
 			memcpy(SDATA(selem)->data, value, smap->map.value_size);
@@ -282,7 +282,8 @@ static int check_flags(const struct bpf_local_storage_data *old_sdata,
 
 int bpf_local_storage_alloc(void *owner,
 			    struct bpf_local_storage_map *smap,
-			    struct bpf_local_storage_elem *first_selem)
+			    struct bpf_local_storage_elem *first_selem,
+			    gfp_t gfp_flags)
 {
 	struct bpf_local_storage *prev_storage, *storage;
 	struct bpf_local_storage **owner_storage_ptr;
@@ -293,7 +294,7 @@ int bpf_local_storage_alloc(void *owner,
 		return err;
 
 	storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
-				  GFP_ATOMIC | __GFP_NOWARN);
+				  gfp_flags | __GFP_NOWARN);
 	if (!storage) {
 		err = -ENOMEM;
 		goto uncharge;
@@ -350,10 +351,10 @@ int bpf_local_storage_alloc(void *owner,
  */
 struct bpf_local_storage_data *
 bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
-			 void *value, u64 map_flags)
+			 void *value, u64 map_flags, gfp_t gfp_flags)
 {
 	struct bpf_local_storage_data *old_sdata = NULL;
-	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_elem *selem = NULL;
 	struct bpf_local_storage *local_storage;
 	unsigned long flags;
 	int err;
@@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		     !map_value_has_spin_lock(&smap->map)))
 		return ERR_PTR(-EINVAL);
 
+	if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
+		return ERR_PTR(-EINVAL);
+
 	local_storage = rcu_dereference_check(*owner_storage(smap, owner),
 					      bpf_rcu_lock_held());
 	if (!local_storage || hlist_empty(&local_storage->list)) {
@@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		if (err)
 			return ERR_PTR(err);
 
-		selem = bpf_selem_alloc(smap, owner, value, true);
+		selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
 		if (!selem)
 			return ERR_PTR(-ENOMEM);
 
-		err = bpf_local_storage_alloc(owner, smap, selem);
+		err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
 		if (err) {
 			kfree(selem);
 			mem_uncharge(smap, owner, smap->elem_size);
@@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		}
 	}
 
+	if (gfp_flags == GFP_KERNEL) {
+		selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
+		if (!selem)
+			return ERR_PTR(-ENOMEM);
+	}
+
 	raw_spin_lock_irqsave(&local_storage->lock, flags);
 
 	/* Recheck local_storage->list under local_storage->lock */
@@ -429,19 +439,21 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		goto unlock;
 	}
 
-	/* local_storage->lock is held. Hence, we are sure
-	 * we can unlink and uncharge the old_sdata successfully
-	 * later. Hence, instead of charging the new selem now
-	 * and then uncharge the old selem later (which may cause
-	 * a potential but unnecessary charge failure), avoid taking
-	 * a charge at all here (the "!old_sdata" check) and the
-	 * old_sdata will not be uncharged later during
-	 * bpf_selem_unlink_storage_nolock().
-	 */
-	selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
-	if (!selem) {
-		err = -ENOMEM;
-		goto unlock_err;
+	if (gfp_flags != GFP_KERNEL) {
+		/* local_storage->lock is held. Hence, we are sure
+		 * we can unlink and uncharge the old_sdata successfully
+		 * later. Hence, instead of charging the new selem now
+		 * and then uncharge the old selem later (which may cause
+		 * a potential but unnecessary charge failure), avoid taking
+		 * a charge at all here (the "!old_sdata" check) and the
+		 * old_sdata will not be uncharged later during
+		 * bpf_selem_unlink_storage_nolock().
+		 */
+		selem = bpf_selem_alloc(smap, owner, value, !old_sdata, gfp_flags);
+		if (!selem) {
+			err = -ENOMEM;
+			goto unlock_err;
+		}
 	}
 
 	/* First, link the new selem to the map */
@@ -463,6 +475,10 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 
 unlock_err:
 	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+	if (selem) {
+		mem_uncharge(smap, owner, smap->elem_size);
+		kfree(selem);
+	}
 	return ERR_PTR(err);
 }
 
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index 5da7bed0f5f6..6638a0ecc3d2 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -174,7 +174,8 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
 
 	bpf_task_storage_lock();
 	sdata = bpf_local_storage_update(
-		task, (struct bpf_local_storage_map *)map, value, map_flags);
+		task, (struct bpf_local_storage_map *)map, value, map_flags,
+		GFP_ATOMIC);
 	bpf_task_storage_unlock();
 
 	err = PTR_ERR_OR_ZERO(sdata);
@@ -226,8 +227,9 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
 	return err;
 }
 
-BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
-	   task, void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
+	   task, void *, value, u64, flags, gfp_t, gfp_flags)
 {
 	struct bpf_local_storage_data *sdata;
 
@@ -250,7 +252,7 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
 	    (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
 		sdata = bpf_local_storage_update(
 			task, (struct bpf_local_storage_map *)map, value,
-			BPF_NOEXIST);
+			BPF_NOEXIST, gfp_flags);
 
 unlock:
 	bpf_task_storage_unlock();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0db6cd8dcb35..392fdaabedbd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13491,6 +13491,26 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto patch_call_imm;
 		}
 
+		if (insn->imm == BPF_FUNC_task_storage_get ||
+		    insn->imm == BPF_FUNC_sk_storage_get ||
+		    insn->imm == BPF_FUNC_inode_storage_get) {
+			if (env->prog->aux->sleepable)
+				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, GFP_KERNEL);
+			else
+				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, GFP_ATOMIC);
+			insn_buf[1] = *insn;
+			cnt = 2;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta += cnt - 1;
+			env->prog = prog = new_prog;
+			insn = new_prog->insnsi + i + delta;
+			goto patch_call_imm;
+		}
+
 		/* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
 		 * and other inlining handlers are currently limited to 64 bit
 		 * only.
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index d9c37fd10809..7aff1206a851 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -141,7 +141,7 @@ static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
 	if (sock) {
 		sdata = bpf_local_storage_update(
 			sock->sk, (struct bpf_local_storage_map *)map, value,
-			map_flags);
+			map_flags, GFP_ATOMIC);
 		sockfd_put(sock);
 		return PTR_ERR_OR_ZERO(sdata);
 	}
@@ -172,7 +172,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
 {
 	struct bpf_local_storage_elem *copy_selem;
 
-	copy_selem = bpf_selem_alloc(smap, newsk, NULL, true);
+	copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, GFP_ATOMIC);
 	if (!copy_selem)
 		return NULL;
 
@@ -230,7 +230,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 			bpf_selem_link_map(smap, copy_selem);
 			bpf_selem_link_storage_nolock(new_sk_storage, copy_selem);
 		} else {
-			ret = bpf_local_storage_alloc(newsk, smap, copy_selem);
+			ret = bpf_local_storage_alloc(newsk, smap, copy_selem, GFP_ATOMIC);
 			if (ret) {
 				kfree(copy_selem);
 				atomic_sub(smap->elem_size,
@@ -255,8 +255,9 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 	return ret;
 }
 
-BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
-	   void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
+	   void *, value, u64, flags, gfp_t, gfp_flags)
 {
 	struct bpf_local_storage_data *sdata;
 
@@ -277,7 +278,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
 	    refcount_inc_not_zero(&sk->sk_refcnt)) {
 		sdata = bpf_local_storage_update(
 			sk, (struct bpf_local_storage_map *)map, value,
-			BPF_NOEXIST);
+			BPF_NOEXIST, gfp_flags);
 		/* sk must be a fullsock (guaranteed by verifier),
 		 * so sock_gen_put() is unnecessary.
 		 */
@@ -417,14 +418,16 @@ static bool bpf_sk_storage_tracing_allowed(const struct bpf_prog *prog)
 	return false;
 }
 
-BPF_CALL_4(bpf_sk_storage_get_tracing, struct bpf_map *, map, struct sock *, sk,
-	   void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_sk_storage_get_tracing, struct bpf_map *, map, struct sock *, sk,
+	   void *, value, u64, flags, gfp_t, gfp_flags)
 {
 	WARN_ON_ONCE(!bpf_rcu_lock_held());
 	if (in_hardirq() || in_nmi())
 		return (unsigned long)NULL;
 
-	return (unsigned long)____bpf_sk_storage_get(map, sk, value, flags);
+	return (unsigned long)____bpf_sk_storage_get(map, sk, value, flags,
+						     gfp_flags);
 }
 
 BPF_CALL_2(bpf_sk_storage_delete_tracing, struct bpf_map *, map,
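For completeness, the helper's BPF-side contract is unchanged by the hidden
argument: programs still pass four arguments, and the verifier fills R5
behind the scenes. A minimal, self-contained sleepable example follows; the
hook choice and all names are illustrative assumptions, not taken from the
patch or its selftests:

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	char _license[] SEC("license") = "GPL";

	struct {
		__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
		__uint(map_flags, BPF_F_NO_PREALLOC);
		__type(key, int);
		__type(value, __u64);
	} exec_count SEC(".maps");

	/* Sleepable LSM program: after this patch the verifier rewrites the
	 * call below to pass GFP_KERNEL in R5, so a missing element is
	 * allocated with a sleepable allocation; the same source in a
	 * non-sleepable program would get GFP_ATOMIC instead.
	 */
	SEC("lsm.s/bprm_committed_creds")
	int BPF_PROG(count_execs, struct linux_binprm *bprm)
	{
		__u64 *cnt;

		cnt = bpf_task_storage_get(&exec_count,
					   bpf_get_current_task_btf(), 0,
					   BPF_LOCAL_STORAGE_GET_F_CREATE);
		if (cnt)
			__sync_fetch_and_add(cnt, 1);

		return 0;
	}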