Message ID | 20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | allow HID-BPF to do device IOs | expand |
Benjamin Tissoires <bentiss@kernel.org> writes: > [Putting this as a RFC because I'm pretty sure I'm not doing the things > correctly at the BPF level.] > [Also using bpf-next as the base tree as there will be conflicting > changes otherwise] > > Ideally I'd like to have something similar to bpf_timers, but not > in soft IRQ context. So I'm emulating this with a sleepable > bpf_tail_call() (see "HID: bpf: allow to defer work in a delayed > workqueue"). Why implement a new mechanism? Sounds like what you need is essentially the bpf_timer functionality, just running in a different context, right? So why not just add a flag to the timer setup that controls the callback context? I've been toying with something similar for restarting XDP TX for my queueing patch series (though I'm not sure if this will actually end up being needed in the end): https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/commit/?h=xdp-queueing-08&id=54bc201a358d1ac6ebfe900099315bbd0a76e862 -Toke
On Fri, Feb 9, 2024 at 4:42 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Benjamin Tissoires <bentiss@kernel.org> writes: > > > [Putting this as a RFC because I'm pretty sure I'm not doing the things > > correctly at the BPF level.] > > [Also using bpf-next as the base tree as there will be conflicting > > changes otherwise] > > > > Ideally I'd like to have something similar to bpf_timers, but not > > in soft IRQ context. So I'm emulating this with a sleepable > > bpf_tail_call() (see "HID: bpf: allow to defer work in a delayed > > workqueue"). > > Why implement a new mechanism? Sounds like what you need is essentially > the bpf_timer functionality, just running in a different context, right? Heh, that's exactly why I put in a RFC :) So yes, the bpf_timer approach is cleaner, but I need it in a workqueue, as a hrtimer in a softIRQ would prevent me to kzalloc and wait for the device. > So why not just add a flag to the timer setup that controls the callback > context? I've been toying with something similar for restarting XDP TX > for my queueing patch series (though I'm not sure if this will actually > end up being needed in the end): > > https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/commit/?h=xdp-queueing-08&id=54bc201a358d1ac6ebfe900099315bbd0a76e862 > Oh, nice. Good idea. But would it be OK to have a "timer-like" where it actually defers the job in a workqueue instead of using an hrtimer? I thought I would have to rewrite the entire bpf_timer approach without the softIRQ, but if I can just add a new flag, that will make things way simpler for me. This however raises another issue if I were to use the bpf_timers: now the HID-BPF kfuncs will not be available as they are only available to tracing prog types. And when I tried to call them from a bpf_timer (in softIRQ) they were not available. Cheers, Benjamin
Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: > On Fri, Feb 9, 2024 at 4:42 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Benjamin Tissoires <bentiss@kernel.org> writes: >> >> > [Putting this as a RFC because I'm pretty sure I'm not doing the things >> > correctly at the BPF level.] >> > [Also using bpf-next as the base tree as there will be conflicting >> > changes otherwise] >> > >> > Ideally I'd like to have something similar to bpf_timers, but not >> > in soft IRQ context. So I'm emulating this with a sleepable >> > bpf_tail_call() (see "HID: bpf: allow to defer work in a delayed >> > workqueue"). >> >> Why implement a new mechanism? Sounds like what you need is essentially >> the bpf_timer functionality, just running in a different context, right? > > Heh, that's exactly why I put in a RFC :) > > So yes, the bpf_timer approach is cleaner, but I need it in a > workqueue, as a hrtimer in a softIRQ would prevent me to kzalloc and > wait for the device. Right, makes sense. >> So why not just add a flag to the timer setup that controls the callback >> context? I've been toying with something similar for restarting XDP TX >> for my queueing patch series (though I'm not sure if this will actually >> end up being needed in the end): >> >> https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/commit/?h=xdp-queueing-08&id=54bc201a358d1ac6ebfe900099315bbd0a76e862 >> > > Oh, nice. Good idea. But would it be OK to have a "timer-like" where > it actually defers the job in a workqueue instead of using an hrtimer? That's conceptually still a timer, though, isn't it? I.e., it's a mechanism whereby you specify a callback and a delay, and bpf_timer ensures that your callback is called after that delay. IMO it's totally congruent with that API to be able to specify a different execution context as part of the timer setup. As for how to implement it, I suspect the easiest may be something similar to what the patch I linked above does: keep the hrtimer, and just have a different (kernel) callback function when the timer fires which does an immediate schedule_work() (without the _delayed) and then runs the BPF callback in that workqueue. I.e., keep the delay handling the way the existing bpf_timer implementation does it, and just add an indirection to start the workqueue in the kernel dispatch code. > I thought I would have to rewrite the entire bpf_timer approach > without the softIRQ, but if I can just add a new flag, that will make > things way simpler for me. IMO that would be fine. You may want to wait for the maintainers to chime in before going down this route, though :) > This however raises another issue if I were to use the bpf_timers: now > the HID-BPF kfuncs will not be available as they are only available to > tracing prog types. And when I tried to call them from a bpf_timer (in > softIRQ) they were not available. IIUC, the bpf_timer callback is just a function (subprog) from the verifier PoV, so it is verified as whatever program type is creating the timer. So in other words, as long as you setup the timer from inside a tracing prog type, you should have access to all the same kfuncs, I think? -Toke
On Fri, Feb 9, 2024 at 6:05 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: > > > On Fri, Feb 9, 2024 at 4:42 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> > >> Benjamin Tissoires <bentiss@kernel.org> writes: > >> > >> > [Putting this as a RFC because I'm pretty sure I'm not doing the things > >> > correctly at the BPF level.] > >> > [Also using bpf-next as the base tree as there will be conflicting > >> > changes otherwise] > >> > > >> > Ideally I'd like to have something similar to bpf_timers, but not > >> > in soft IRQ context. So I'm emulating this with a sleepable > >> > bpf_tail_call() (see "HID: bpf: allow to defer work in a delayed > >> > workqueue"). > >> > >> Why implement a new mechanism? Sounds like what you need is essentially > >> the bpf_timer functionality, just running in a different context, right? > > > > Heh, that's exactly why I put in a RFC :) > > > > So yes, the bpf_timer approach is cleaner, but I need it in a > > workqueue, as a hrtimer in a softIRQ would prevent me to kzalloc and > > wait for the device. > > Right, makes sense. > > >> So why not just add a flag to the timer setup that controls the callback > >> context? I've been toying with something similar for restarting XDP TX > >> for my queueing patch series (though I'm not sure if this will actually > >> end up being needed in the end): > >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/commit/?h=xdp-queueing-08&id=54bc201a358d1ac6ebfe900099315bbd0a76e862 > >> > > > > Oh, nice. Good idea. But would it be OK to have a "timer-like" where > > it actually defers the job in a workqueue instead of using an hrtimer? > > That's conceptually still a timer, though, isn't it? I.e., it's a > mechanism whereby you specify a callback and a delay, and bpf_timer > ensures that your callback is called after that delay. IMO it's totally > congruent with that API to be able to specify a different execution > context as part of the timer setup. Yep :) There is still a problem I wasn't able to fix over the week end and today. How can I tell the verifier that the callback is sleepable, when the tracing function that called the timer_start() function is not? (more on that below). > > As for how to implement it, I suspect the easiest may be something > similar to what the patch I linked above does: keep the hrtimer, and > just have a different (kernel) callback function when the timer fires > which does an immediate schedule_work() (without the _delayed) and then > runs the BPF callback in that workqueue. I.e., keep the delay handling > the way the existing bpf_timer implementation does it, and just add an > indirection to start the workqueue in the kernel dispatch code. Sounds good, especially given that's roughly how the delayed_timers are implemented. > > > I thought I would have to rewrite the entire bpf_timer approach > > without the softIRQ, but if I can just add a new flag, that will make > > things way simpler for me. > > IMO that would be fine. You may want to wait for the maintainers to > chime in before going down this route, though :) > > > This however raises another issue if I were to use the bpf_timers: now > > the HID-BPF kfuncs will not be available as they are only available to > > tracing prog types. And when I tried to call them from a bpf_timer (in > > softIRQ) they were not available. > > IIUC, the bpf_timer callback is just a function (subprog) from the > verifier PoV, so it is verified as whatever program type is creating the > timer. So in other words, as long as you setup the timer from inside a > tracing prog type, you should have access to all the same kfuncs, I > think? Yep, you are correct. But as mentioned above, I am now in trouble with the sleepable state: - I need to call timer_start() from a non sleepable tracing function (I'm in hard IRQ when dealing with a physical device) - but then, ideally, the callback function needs to be tagged as a sleepable one, so I can export my kfuncs which are doing kzalloc and device IO as such. However, I can not really teach the BPF verifier to do so: - it seems to check for the callback first when it is loaded, and there is no SEC() equivalent for static functions - libbpf doesn't have access to the callback as a prog as it has to be a static function, and thus isn't exported as a full-blown prog. - the verifier only checks for the callback when dealing with BPF_FUNC_timer_set_callback, which doesn't have a "flag" argument (though the validation of the callback has already been done while checking it first, so we are already too late to change the sleppable state of the callback) Right now, the only OK-ish version I have is declaring the kfunc as non-sleepable, but checking that we are in a different context than the IRQ of the initial event. This way, I am not crashing if this function is called from the initial IRQ, but will still crash if used outside of the hid context. This is not satisfactory, but I feel like it's going to be hard to teach the verifier that the callback function is sleepable in that case (maybe we could suffix the callback name, like we do for arguments, but this is not very clean either). Cheers, Benjamin
Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: [...] >> IIUC, the bpf_timer callback is just a function (subprog) from the >> verifier PoV, so it is verified as whatever program type is creating the >> timer. So in other words, as long as you setup the timer from inside a >> tracing prog type, you should have access to all the same kfuncs, I >> think? > > Yep, you are correct. But as mentioned above, I am now in trouble with > the sleepable state: > - I need to call timer_start() from a non sleepable tracing function > (I'm in hard IRQ when dealing with a physical device) > - but then, ideally, the callback function needs to be tagged as a > sleepable one, so I can export my kfuncs which are doing kzalloc and > device IO as such. > > However, I can not really teach the BPF verifier to do so: > - it seems to check for the callback first when it is loaded, and > there is no SEC() equivalent for static functions > - libbpf doesn't have access to the callback as a prog as it has to be > a static function, and thus isn't exported as a full-blown prog. > - the verifier only checks for the callback when dealing with > BPF_FUNC_timer_set_callback, which doesn't have a "flag" argument > (though the validation of the callback has already been done while > checking it first, so we are already too late to change the sleppable > state of the callback) > > Right now, the only OK-ish version I have is declaring the kfunc as > non-sleepable, but checking that we are in a different context than > the IRQ of the initial event. This way, I am not crashing if this > function is called from the initial IRQ, but will still crash if used > outside of the hid context. > > This is not satisfactory, but I feel like it's going to be hard to > teach the verifier that the callback function is sleepable in that > case (maybe we could suffix the callback name, like we do for > arguments, but this is not very clean either). The callback is only set once when the timer is first setup; I *think* it works to do the setup (bpf_timer_init() and bpf_timer_set_callback()) in the context you need (from a sleepable prog), but do the arming (bpf_timer_start()) from a different program that is not itself sleepable? -Toke
On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: > > [...] > >> IIUC, the bpf_timer callback is just a function (subprog) from the > >> verifier PoV, so it is verified as whatever program type is creating the > >> timer. So in other words, as long as you setup the timer from inside a > >> tracing prog type, you should have access to all the same kfuncs, I > >> think? > > > > Yep, you are correct. But as mentioned above, I am now in trouble with > > the sleepable state: > > - I need to call timer_start() from a non sleepable tracing function > > (I'm in hard IRQ when dealing with a physical device) > > - but then, ideally, the callback function needs to be tagged as a > > sleepable one, so I can export my kfuncs which are doing kzalloc and > > device IO as such. > > > > However, I can not really teach the BPF verifier to do so: > > - it seems to check for the callback first when it is loaded, and > > there is no SEC() equivalent for static functions > > - libbpf doesn't have access to the callback as a prog as it has to be > > a static function, and thus isn't exported as a full-blown prog. > > - the verifier only checks for the callback when dealing with > > BPF_FUNC_timer_set_callback, which doesn't have a "flag" argument > > (though the validation of the callback has already been done while > > checking it first, so we are already too late to change the sleppable > > state of the callback) > > > > Right now, the only OK-ish version I have is declaring the kfunc as > > non-sleepable, but checking that we are in a different context than > > the IRQ of the initial event. This way, I am not crashing if this > > function is called from the initial IRQ, but will still crash if used > > outside of the hid context. > > > > This is not satisfactory, but I feel like it's going to be hard to > > teach the verifier that the callback function is sleepable in that > > case (maybe we could suffix the callback name, like we do for > > arguments, but this is not very clean either). > > The callback is only set once when the timer is first setup; I *think* > it works to do the setup (bpf_timer_init() and bpf_timer_set_callback()) > in the context you need (from a sleepable prog), but do the arming > (bpf_timer_start()) from a different program that is not itself sleepable? > Genius! It works, and I can just keep having them declared as a syscall kfunc, not as a tracing kfunc. But isn't this an issue outside of my use case? I mean, if the callback is assuming the environment for when it is set up but can be called from any context there seems to be a problem when 2 contexts are not equivalent, no?
On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires <benjamin.tissoires@redhat.com> wrote: > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: > > > > [...] > > >> IIUC, the bpf_timer callback is just a function (subprog) from the > > >> verifier PoV, so it is verified as whatever program type is creating the > > >> timer. So in other words, as long as you setup the timer from inside a > > >> tracing prog type, you should have access to all the same kfuncs, I > > >> think? > > > > > > Yep, you are correct. But as mentioned above, I am now in trouble with > > > the sleepable state: > > > - I need to call timer_start() from a non sleepable tracing function > > > (I'm in hard IRQ when dealing with a physical device) > > > - but then, ideally, the callback function needs to be tagged as a > > > sleepable one, so I can export my kfuncs which are doing kzalloc and > > > device IO as such. > > > > > > However, I can not really teach the BPF verifier to do so: > > > - it seems to check for the callback first when it is loaded, and > > > there is no SEC() equivalent for static functions > > > - libbpf doesn't have access to the callback as a prog as it has to be > > > a static function, and thus isn't exported as a full-blown prog. > > > - the verifier only checks for the callback when dealing with > > > BPF_FUNC_timer_set_callback, which doesn't have a "flag" argument > > > (though the validation of the callback has already been done while > > > checking it first, so we are already too late to change the sleppable > > > state of the callback) > > > > > > Right now, the only OK-ish version I have is declaring the kfunc as > > > non-sleepable, but checking that we are in a different context than > > > the IRQ of the initial event. This way, I am not crashing if this > > > function is called from the initial IRQ, but will still crash if used > > > outside of the hid context. > > > > > > This is not satisfactory, but I feel like it's going to be hard to > > > teach the verifier that the callback function is sleepable in that > > > case (maybe we could suffix the callback name, like we do for > > > arguments, but this is not very clean either). > > > > The callback is only set once when the timer is first setup; I *think* > > it works to do the setup (bpf_timer_init() and bpf_timer_set_callback()) > > in the context you need (from a sleepable prog), but do the arming > > (bpf_timer_start()) from a different program that is not itself sleepable? > > > > Genius! It works, and I can just keep having them declared as a > syscall kfunc, not as a tracing kfunc. > > But isn't this an issue outside of my use case? I mean, if the > callback is assuming the environment for when it is set up but can be > called from any context there seems to be a problem when 2 contexts > are not equivalent, no? I agree that workqueue delegation fits into the bpf_timer concept and a lot of code can and should be shared. All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :) Too bad, bpf_timer_set_callback() doesn't have a flag argument, so we need a new kfunc to set a sleepable callback. Maybe bpf_timer_set_sleepable_cb() ? The verifier will set is_async_cb = true for it (like it does for regular cb-s). And since prog->aux->sleepable is kinda "global" we need another per subprog flag: bool is_sleepable: 1; We can factor out a check "if (prog->aux->sleepable)" into a helper that will check that "global" flag and another env->cur_state->in_sleepable flag that will work similar to active_rcu_lock. Once the verifier starts processing subprog->is_sleepable it will set cur_state->in_sleepable = true; to make all subprogs called from that cb to be recognized as sleepable too. A bit of a challenge is what to do with global subprogs, since they're verified lazily. They can be called from sleepable and non-sleepable contex. Should be solvable. Overall I think this feature is needed urgently, so if you don't have cycles to work on this soon, I can prioritize it right after bpf_arena work.
On Feb 12 2024, Alexei Starovoitov wrote: > On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires > <benjamin.tissoires@redhat.com> wrote: > > > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: > > > [...] > I agree that workqueue delegation fits into the bpf_timer concept and > a lot of code can and should be shared. Thanks Alexei for the detailed answer. I've given it an attempt but still can not figure it out entirely. > All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :) > Too bad, bpf_timer_set_callback() doesn't have a flag argument, > so we need a new kfunc to set a sleepable callback. > Maybe > bpf_timer_set_sleepable_cb() ? OK. So I guess I should drop Toke's suggestion with the bpf_timer_ini() flag? > The verifier will set is_async_cb = true for it (like it does for regular cb-s). > And since prog->aux->sleepable is kinda "global" we need another > per subprog flag: > bool is_sleepable: 1; done (in push_callback_call()) > > We can factor out a check "if (prog->aux->sleepable)" into a helper > that will check that "global" flag and another env->cur_state->in_sleepable > flag that will work similar to active_rcu_lock. done (I think), cf patch 2 below > Once the verifier starts processing subprog->is_sleepable > it will set cur_state->in_sleepable = true; > to make all subprogs called from that cb to be recognized as sleepable too. That's the point I don't know where to put the new code. It seems the best place would be in do_check(), but I am under the impression that the code of the callback is added at the end of the instruction list, meaning that I do not know where it starts, and which subprog index it corresponds to. > > A bit of a challenge is what to do with global subprogs, > since they're verified lazily. They can be called from > sleepable and non-sleepable contex. Should be solvable. I must confess this is way over me (and given that I didn't even managed to make the "easy" case working, that might explain things a little :-P ) > > Overall I think this feature is needed urgently, > so if you don't have cycles to work on this soon, > I can prioritize it right after bpf_arena work. I can try to spare a few cycles on it. Even if your instructions were on spot, I still can't make the subprogs recognized as sleepable. For reference, this is where I am (probably bogus, but seems to be working when timer_set_sleepable_cb() is called from a sleepable context as mentioned by Toke): --- >From d4aa3d969fa9a89c6447d843dad338fde2ac0155 Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires <bentiss@kernel.org> Date: Tue, 13 Feb 2024 18:40:01 +0100 Subject: [PATCH RFC bpf-next v2 01/11] Sleepable timers MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20240213-hid-bpf-sleepable-v2-1-6c2d6b49865c@kernel.org> --- include/linux/bpf_verifier.h | 2 + include/uapi/linux/bpf.h | 13 ++++++ kernel/bpf/helpers.c | 91 +++++++++++++++++++++++++++++++++++++++--- kernel/bpf/verifier.c | 20 ++++++++-- tools/include/uapi/linux/bpf.h | 13 ++++++ 5 files changed, 130 insertions(+), 9 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 84365e6dd85d..789ef5fec547 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -426,6 +426,7 @@ struct bpf_verifier_state { * while they are still in use. */ bool used_as_loop_entry; + bool in_sleepable; /* first and last insn idx of this verifier state */ u32 first_insn_idx; @@ -626,6 +627,7 @@ struct bpf_subprog_info { bool is_async_cb: 1; bool is_exception_cb: 1; bool args_cached: 1; + bool is_sleepable: 1; u8 arg_cnt; struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS]; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d96708380e52..ef1f2be4cfef 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5742,6 +5742,18 @@ union bpf_attr { * 0 on success. * * **-ENOENT** if the bpf_local_storage cannot be found. + * + * long bpf_timer_set_sleepable_cb(struct bpf_timer *timer, void *callback_fn) + * Description + * Configure the timer to call *callback_fn* static function in a + * sleepable context. + * Return + * 0 on success. + * **-EINVAL** if *timer* was not initialized with bpf_timer_init() earlier. + * **-EPERM** if *timer* is in a map that doesn't have any user references. + * The user space should either hold a file descriptor to a map with timers + * or pin such map in bpffs. When map is unpinned or file descriptor is + * closed all timers in the map will be cancelled and freed. */ #define ___BPF_FUNC_MAPPER(FN, ctx...) \ FN(unspec, 0, ##ctx) \ @@ -5956,6 +5968,7 @@ union bpf_attr { FN(user_ringbuf_drain, 209, ##ctx) \ FN(cgrp_storage_get, 210, ##ctx) \ FN(cgrp_storage_delete, 211, ##ctx) \ + FN(timer_set_sleepable_cb, 212, ##ctx) \ /* */ /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 4db1c658254c..e3b83d27b1b6 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1097,9 +1097,11 @@ const struct bpf_func_proto bpf_snprintf_proto = { */ struct bpf_hrtimer { struct hrtimer timer; + struct work_struct work; struct bpf_map *map; struct bpf_prog *prog; void __rcu *callback_fn; + void __rcu *sleepable_cb_fn; void *value; }; @@ -1113,18 +1115,64 @@ struct bpf_timer_kern { struct bpf_spin_lock lock; } __attribute__((aligned(8))); +static void bpf_timer_work_cb(struct work_struct *work) +{ + struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, work); + struct bpf_map *map = t->map; + void *value = t->value; + bpf_callback_t callback_fn; + void *key; + u32 idx; + + BTF_TYPE_EMIT(struct bpf_timer); + + rcu_read_lock(); + callback_fn = rcu_dereference(t->sleepable_cb_fn); + rcu_read_unlock(); + if (!callback_fn) + return; + + // /* bpf_timer_work_cb() runs in hrtimer_run_softirq. It doesn't migrate and + // * cannot be preempted by another bpf_timer_cb() on the same cpu. + // * Remember the timer this callback is servicing to prevent + // * deadlock if callback_fn() calls bpf_timer_cancel() or + // * bpf_map_delete_elem() on the same timer. + // */ + // this_cpu_write(hrtimer_running, t); + if (map->map_type == BPF_MAP_TYPE_ARRAY) { + struct bpf_array *array = container_of(map, struct bpf_array, map); + + /* compute the key */ + idx = ((char *)value - array->value) / array->elem_size; + key = &idx; + } else { /* hash or lru */ + key = value - round_up(map->key_size, 8); + } + + callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0); + /* The verifier checked that return value is zero. */ + + // this_cpu_write(hrtimer_running, NULL); +} + static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running); static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer) { struct bpf_hrtimer *t = container_of(hrtimer, struct bpf_hrtimer, timer); + bpf_callback_t callback_fn, sleepable_cb_fn; struct bpf_map *map = t->map; void *value = t->value; - bpf_callback_t callback_fn; void *key; u32 idx; BTF_TYPE_EMIT(struct bpf_timer); + sleepable_cb_fn = rcu_dereference_check(t->sleepable_cb_fn, rcu_read_lock_bh_held()); + if (sleepable_cb_fn) { + schedule_work(&t->work); + goto out; + } + callback_fn = rcu_dereference_check(t->callback_fn, rcu_read_lock_bh_held()); if (!callback_fn) goto out; @@ -1190,7 +1238,9 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map t->map = map; t->prog = NULL; rcu_assign_pointer(t->callback_fn, NULL); + rcu_assign_pointer(t->sleepable_cb_fn, NULL); hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT); + INIT_WORK(&t->work, bpf_timer_work_cb); t->timer.function = bpf_timer_cb; WRITE_ONCE(timer->timer, t); /* Guarantee the order between timer->timer and map->usercnt. So @@ -1221,8 +1271,8 @@ static const struct bpf_func_proto bpf_timer_init_proto = { .arg3_type = ARG_ANYTHING, }; -BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn, - struct bpf_prog_aux *, aux) +static int __bpf_timer_set_callback(struct bpf_timer_kern *timer, void *callback_fn, + struct bpf_prog_aux *aux, bool is_sleepable) { struct bpf_prog *prev, *prog = aux->prog; struct bpf_hrtimer *t; @@ -1260,12 +1310,24 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb bpf_prog_put(prev); t->prog = prog; } - rcu_assign_pointer(t->callback_fn, callback_fn); + if (is_sleepable) { + rcu_assign_pointer(t->sleepable_cb_fn, callback_fn); + rcu_assign_pointer(t->callback_fn, NULL); + } else { + rcu_assign_pointer(t->callback_fn, callback_fn); + rcu_assign_pointer(t->sleepable_cb_fn, NULL); + } out: __bpf_spin_unlock_irqrestore(&timer->lock); return ret; } +BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn, + struct bpf_prog_aux *, aux) +{ + return __bpf_timer_set_callback(timer, callback_fn, aux, false); +} + static const struct bpf_func_proto bpf_timer_set_callback_proto = { .func = bpf_timer_set_callback, .gpl_only = true, @@ -1274,6 +1336,20 @@ static const struct bpf_func_proto bpf_timer_set_callback_proto = { .arg2_type = ARG_PTR_TO_FUNC, }; +BPF_CALL_3(bpf_timer_set_sleepable_cb, struct bpf_timer_kern *, timer, void *, callback_fn, + struct bpf_prog_aux *, aux) +{ + return __bpf_timer_set_callback(timer, callback_fn, aux, true); +} + +static const struct bpf_func_proto bpf_timer_set_sleepable_cb_proto = { + .func = bpf_timer_set_sleepable_cb, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_TIMER, + .arg2_type = ARG_PTR_TO_FUNC, +}; + BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, flags) { struct bpf_hrtimer *t; @@ -1353,6 +1429,7 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer) * if it was running. */ ret = ret ?: hrtimer_cancel(&t->timer); + ret = ret ?: cancel_work_sync(&t->work); return ret; } @@ -1405,8 +1482,10 @@ void bpf_timer_cancel_and_free(void *val) * effectively cancelled because bpf_timer_cb() will return * HRTIMER_NORESTART. */ - if (this_cpu_read(hrtimer_running) != t) + if (this_cpu_read(hrtimer_running) != t) { hrtimer_cancel(&t->timer); + } + cancel_work_sync(&t->work); kfree(t); } @@ -1749,6 +1828,8 @@ bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_timer_init_proto; case BPF_FUNC_timer_set_callback: return &bpf_timer_set_callback_proto; + case BPF_FUNC_timer_set_sleepable_cb: + return &bpf_timer_set_sleepable_cb_proto; case BPF_FUNC_timer_start: return &bpf_timer_start_proto; case BPF_FUNC_timer_cancel: diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 64fa188d00ad..400c625efe22 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -513,7 +513,8 @@ static bool is_sync_callback_calling_function(enum bpf_func_id func_id) static bool is_async_callback_calling_function(enum bpf_func_id func_id) { - return func_id == BPF_FUNC_timer_set_callback; + return func_id == BPF_FUNC_timer_set_callback || + func_id == BPF_FUNC_timer_set_sleepable_cb; } static bool is_callback_calling_function(enum bpf_func_id func_id) @@ -1414,6 +1415,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state, } dst_state->speculative = src->speculative; dst_state->active_rcu_lock = src->active_rcu_lock; + dst_state->in_sleepable = src->in_sleepable; dst_state->curframe = src->curframe; dst_state->active_lock.ptr = src->active_lock.ptr; dst_state->active_lock.id = src->active_lock.id; @@ -9434,11 +9436,13 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins if (insn->code == (BPF_JMP | BPF_CALL) && insn->src_reg == 0 && - insn->imm == BPF_FUNC_timer_set_callback) { + (insn->imm == BPF_FUNC_timer_set_callback || + insn->imm == BPF_FUNC_timer_set_sleepable_cb)) { struct bpf_verifier_state *async_cb; /* there is no real recursion here. timer callbacks are async */ env->subprog_info[subprog].is_async_cb = true; + env->subprog_info[subprog].is_sleepable = insn->imm == BPF_FUNC_timer_set_sleepable_cb; async_cb = push_async_cb(env, env->subprog_info[subprog].start, insn_idx, subprog); if (!async_cb) @@ -10280,6 +10284,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn set_map_elem_callback_state); break; case BPF_FUNC_timer_set_callback: + fallthrough; + case BPF_FUNC_timer_set_sleepable_cb: err = push_callback_call(env, insn, insn_idx, meta.subprogno, set_timer_callback_state); break; @@ -15586,7 +15592,9 @@ static int visit_insn(int t, struct bpf_verifier_env *env) return DONE_EXPLORING; case BPF_CALL: - if (insn->src_reg == 0 && insn->imm == BPF_FUNC_timer_set_callback) + if (insn->src_reg == 0 && + (insn->imm == BPF_FUNC_timer_set_callback || + insn->imm == BPF_FUNC_timer_set_sleepable_cb)) /* Mark this call insn as a prune point to trigger * is_state_visited() check before call itself is * processed by __check_func_call(). Otherwise new @@ -16762,6 +16770,9 @@ static bool states_equal(struct bpf_verifier_env *env, if (old->active_rcu_lock != cur->active_rcu_lock) return false; + if (old->in_sleepable != cur->in_sleepable) + return false; + /* for states to be equal callsites have to be the same * and all frame states need to be equivalent */ @@ -19639,7 +19650,8 @@ static int do_misc_fixups(struct bpf_verifier_env *env) continue; } - if (insn->imm == BPF_FUNC_timer_set_callback) { + if (insn->imm == BPF_FUNC_timer_set_callback || + insn->imm == BPF_FUNC_timer_set_sleepable_cb) { /* The verifier will process callback_fn as many times as necessary * with different maps and the register states prepared by * set_timer_callback_state will be accurate. diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index d96708380e52..ef1f2be4cfef 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5742,6 +5742,18 @@ union bpf_attr { * 0 on success. * * **-ENOENT** if the bpf_local_storage cannot be found. + * + * long bpf_timer_set_sleepable_cb(struct bpf_timer *timer, void *callback_fn) + * Description + * Configure the timer to call *callback_fn* static function in a + * sleepable context. + * Return + * 0 on success. + * **-EINVAL** if *timer* was not initialized with bpf_timer_init() earlier. + * **-EPERM** if *timer* is in a map that doesn't have any user references. + * The user space should either hold a file descriptor to a map with timers + * or pin such map in bpffs. When map is unpinned or file descriptor is + * closed all timers in the map will be cancelled and freed. */ #define ___BPF_FUNC_MAPPER(FN, ctx...) \ FN(unspec, 0, ##ctx) \ @@ -5956,6 +5968,7 @@ union bpf_attr { FN(user_ringbuf_drain, 209, ##ctx) \ FN(cgrp_storage_get, 210, ##ctx) \ FN(cgrp_storage_delete, 211, ##ctx) \ + FN(timer_set_sleepable_cb, 212, ##ctx) \ /* */ /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
On Tue, 13 Feb 2024 at 18:46, Benjamin Tissoires <bentiss@kernel.org> wrote: > > On Feb 12 2024, Alexei Starovoitov wrote: > > On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires > > <benjamin.tissoires@redhat.com> wrote: > > > > > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > > > > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: > > > > > [...] > > I agree that workqueue delegation fits into the bpf_timer concept and > > a lot of code can and should be shared. > > Thanks Alexei for the detailed answer. I've given it an attempt but still can not > figure it out entirely. > > > All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :) > > Too bad, bpf_timer_set_callback() doesn't have a flag argument, > > so we need a new kfunc to set a sleepable callback. > > Maybe > > bpf_timer_set_sleepable_cb() ? > > OK. So I guess I should drop Toke's suggestion with the bpf_timer_ini() flag? > > > The verifier will set is_async_cb = true for it (like it does for regular cb-s). > > And since prog->aux->sleepable is kinda "global" we need another > > per subprog flag: > > bool is_sleepable: 1; > > done (in push_callback_call()) > > > > > We can factor out a check "if (prog->aux->sleepable)" into a helper > > that will check that "global" flag and another env->cur_state->in_sleepable > > flag that will work similar to active_rcu_lock. > > done (I think), cf patch 2 below > > > Once the verifier starts processing subprog->is_sleepable > > it will set cur_state->in_sleepable = true; > > to make all subprogs called from that cb to be recognized as sleepable too. > > That's the point I don't know where to put the new code. > I think that would go in the already existing special case for push_async_cb where you get the verifier state of the async callback. You can make setting the boolean in that verifier state conditional on whether it's your kfunc/helper you're processing taking a sleepable callback. > It seems the best place would be in do_check(), but I am under the impression > that the code of the callback is added at the end of the instruction list, meaning > that I do not know where it starts, and which subprog index it corresponds to. > > > > > A bit of a challenge is what to do with global subprogs, > > since they're verified lazily. They can be called from > > sleepable and non-sleepable contex. Should be solvable. > > I must confess this is way over me (and given that I didn't even managed to make > the "easy" case working, that might explain things a little :-P ) > I think it will be solvable but made somewhat difficult by the fact that even if we mark subprog_info of some global_func A as in_sleepable, so that we explore it as sleepable during its verification, we might encounter later another global_func that calls a global func, already explored as non-sleepable, in sleepable context. In this case I think we need to redo the verification of that global func as sleepable once again. It could be that it is called from both non-sleepable and sleepable contexts, so both paths (in_sleepable = true, and in_sleepable = false) need to be explored, or we could reject such cases, but it might be a little restrictive. Some common helper global func unrelated to caller context doing some auxiliary work, called from sleepable timer callback and normal main subprog might be an example where rejection will be prohibitive. An approach might be to explore main and global subprogs once as we do now, and then keep a list of global subprogs that need to be revisited as in_sleepable (due to being called from a sleepable context) and trigger do_check_common for them again, this might have to be repeated as the list grows on each iteration, but eventually we will have explored all of them as in_sleepable if need be, and the loop will end. Surely, this trades off logical simplicity of verifier code with redoing verification of global subprogs again. To add items to such a list, for each global subprog we encounter that needs to be analyzed as in_sleepable, we will also collect all its callee global subprogs by walking its instructions (a bit like check_max_stack_depth does). > > > > Overall I think this feature is needed urgently, > > so if you don't have cycles to work on this soon, > > I can prioritize it right after bpf_arena work. > > I can try to spare a few cycles on it. Even if your instructions were on > spot, I still can't make the subprogs recognized as sleepable. > > For reference, this is where I am (probably bogus, but seems to be > working when timer_set_sleepable_cb() is called from a sleepable context > as mentioned by Toke): > I just skimmed the patch but I think it's already 90% there. The only other change I would suggest is switching from helper to kfunc, as originally proposed by Alexei. > [...]
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes: > On Tue, 13 Feb 2024 at 18:46, Benjamin Tissoires <bentiss@kernel.org> wrote: >> >> On Feb 12 2024, Alexei Starovoitov wrote: >> > On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires >> > <benjamin.tissoires@redhat.com> wrote: >> > > >> > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> > > > >> > > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: >> > > > >> [...] >> > I agree that workqueue delegation fits into the bpf_timer concept and >> > a lot of code can and should be shared. >> >> Thanks Alexei for the detailed answer. I've given it an attempt but still can not >> figure it out entirely. >> >> > All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :) >> > Too bad, bpf_timer_set_callback() doesn't have a flag argument, >> > so we need a new kfunc to set a sleepable callback. >> > Maybe >> > bpf_timer_set_sleepable_cb() ? >> >> OK. So I guess I should drop Toke's suggestion with the bpf_timer_ini() flag? >> >> > The verifier will set is_async_cb = true for it (like it does for regular cb-s). >> > And since prog->aux->sleepable is kinda "global" we need another >> > per subprog flag: >> > bool is_sleepable: 1; >> >> done (in push_callback_call()) >> >> > >> > We can factor out a check "if (prog->aux->sleepable)" into a helper >> > that will check that "global" flag and another env->cur_state->in_sleepable >> > flag that will work similar to active_rcu_lock. >> >> done (I think), cf patch 2 below >> >> > Once the verifier starts processing subprog->is_sleepable >> > it will set cur_state->in_sleepable = true; >> > to make all subprogs called from that cb to be recognized as sleepable too. >> >> That's the point I don't know where to put the new code. >> > > I think that would go in the already existing special case for > push_async_cb where you get the verifier state of the async callback. > You can make setting the boolean in that verifier state conditional on > whether it's your kfunc/helper you're processing taking a sleepable > callback. > >> It seems the best place would be in do_check(), but I am under the impression >> that the code of the callback is added at the end of the instruction list, meaning >> that I do not know where it starts, and which subprog index it corresponds to. >> >> > >> > A bit of a challenge is what to do with global subprogs, >> > since they're verified lazily. They can be called from >> > sleepable and non-sleepable contex. Should be solvable. >> >> I must confess this is way over me (and given that I didn't even managed to make >> the "easy" case working, that might explain things a little :-P ) >> > > I think it will be solvable but made somewhat difficult by the fact > that even if we mark subprog_info of some global_func A as > in_sleepable, so that we explore it as sleepable during its > verification, we might encounter later another global_func that calls > a global func, already explored as non-sleepable, in sleepable > context. In this case I think we need to redo the verification of that > global func as sleepable once again. It could be that it is called > from both non-sleepable and sleepable contexts, so both paths > (in_sleepable = true, and in_sleepable = false) need to be explored, > or we could reject such cases, but it might be a little restrictive. > > Some common helper global func unrelated to caller context doing some > auxiliary work, called from sleepable timer callback and normal main > subprog might be an example where rejection will be prohibitive. > > An approach might be to explore main and global subprogs once as we do > now, and then keep a list of global subprogs that need to be revisited > as in_sleepable (due to being called from a sleepable context) and > trigger do_check_common for them again, this might have to be repeated > as the list grows on each iteration, but eventually we will have > explored all of them as in_sleepable if need be, and the loop will > end. Surely, this trades off logical simplicity of verifier code with > redoing verification of global subprogs again. > > To add items to such a list, for each global subprog we encounter that > needs to be analyzed as in_sleepable, we will also collect all its > callee global subprogs by walking its instructions (a bit like > check_max_stack_depth does). Sorry if I'm being dense, but why is all this needed if it's already possible to just define the timer callback from a program type that allows sleeping, and then set the actual timeout from a different program that is not sleepable? Isn't the set_sleepable_cb() kfunc just a convenience then? Or did I misunderstand and it's not actually possible to mix callback/timer arming from different program types? -Toke
On Tue, Feb 13, 2024 at 08:23:17PM +0100, Kumar Kartikeya Dwivedi wrote: > On Tue, 13 Feb 2024 at 18:46, Benjamin Tissoires <bentiss@kernel.org> wrote: > > > > On Feb 12 2024, Alexei Starovoitov wrote: > > > On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires > > > <benjamin.tissoires@redhat.com> wrote: > > > > > > > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > > > > > > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: > > > > > > > [...] > > > I agree that workqueue delegation fits into the bpf_timer concept and > > > a lot of code can and should be shared. > > > > Thanks Alexei for the detailed answer. I've given it an attempt but still can not > > figure it out entirely. > > > > > All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :) > > > Too bad, bpf_timer_set_callback() doesn't have a flag argument, > > > so we need a new kfunc to set a sleepable callback. > > > Maybe > > > bpf_timer_set_sleepable_cb() ? > > > > OK. So I guess I should drop Toke's suggestion with the bpf_timer_ini() flag? I think the flag for bpf_timer_init() is still needed. Since we probably shouldn't combine hrtimer with workqueue. bpf_timer_start() should do schedule_work() directly. The latency matters in some cases. We will use flags to specify which workqueue strategy to use. Fingers crossed, WQ_BH will be merged in the next merge window and we'd need to expose it as an option to bpf progs. > > > > > The verifier will set is_async_cb = true for it (like it does for regular cb-s). > > > And since prog->aux->sleepable is kinda "global" we need another > > > per subprog flag: > > > bool is_sleepable: 1; > > > > done (in push_callback_call()) > > > > > > > > We can factor out a check "if (prog->aux->sleepable)" into a helper > > > that will check that "global" flag and another env->cur_state->in_sleepable > > > flag that will work similar to active_rcu_lock. > > > > done (I think), cf patch 2 below > > > > > Once the verifier starts processing subprog->is_sleepable > > > it will set cur_state->in_sleepable = true; > > > to make all subprogs called from that cb to be recognized as sleepable too. > > > > That's the point I don't know where to put the new code. > > > > I think that would go in the already existing special case for > push_async_cb where you get the verifier state of the async callback. > You can make setting the boolean in that verifier state conditional on > whether it's your kfunc/helper you're processing taking a sleepable > callback. > > > It seems the best place would be in do_check(), but I am under the impression > > that the code of the callback is added at the end of the instruction list, meaning > > that I do not know where it starts, and which subprog index it corresponds to. > > > > > > > > A bit of a challenge is what to do with global subprogs, > > > since they're verified lazily. They can be called from > > > sleepable and non-sleepable contex. Should be solvable. > > > > I must confess this is way over me (and given that I didn't even managed to make > > the "easy" case working, that might explain things a little :-P ) > > > > I think it will be solvable but made somewhat difficult by the fact > that even if we mark subprog_info of some global_func A as > in_sleepable, so that we explore it as sleepable during its > verification, we might encounter later another global_func that calls > a global func, already explored as non-sleepable, in sleepable > context. In this case I think we need to redo the verification of that > global func as sleepable once again. It could be that it is called > from both non-sleepable and sleepable contexts, so both paths > (in_sleepable = true, and in_sleepable = false) need to be explored, > or we could reject such cases, but it might be a little restrictive. > > Some common helper global func unrelated to caller context doing some > auxiliary work, called from sleepable timer callback and normal main > subprog might be an example where rejection will be prohibitive. > > An approach might be to explore main and global subprogs once as we do > now, and then keep a list of global subprogs that need to be revisited > as in_sleepable (due to being called from a sleepable context) and > trigger do_check_common for them again, this might have to be repeated > as the list grows on each iteration, but eventually we will have > explored all of them as in_sleepable if need be, and the loop will > end. Surely, this trades off logical simplicity of verifier code with > redoing verification of global subprogs again. > > To add items to such a list, for each global subprog we encounter that > needs to be analyzed as in_sleepable, we will also collect all its > callee global subprogs by walking its instructions (a bit like > check_max_stack_depth does). imo that would be a serious increase in the verifier complexity. It's not clear whether this is needed at this point. For now I would drop cur_state->in_sleepable to false before verifying global subprogs. > > > > > > Overall I think this feature is needed urgently, > > > so if you don't have cycles to work on this soon, > > > I can prioritize it right after bpf_arena work. > > > > I can try to spare a few cycles on it. Even if your instructions were on > > spot, I still can't make the subprogs recognized as sleepable. > > > > For reference, this is where I am (probably bogus, but seems to be > > working when timer_set_sleepable_cb() is called from a sleepable context > > as mentioned by Toke): > > > > I just skimmed the patch but I think it's already 90% there. The only > other change I would suggest is switching from helper to kfunc, as > originally proposed by Alexei. +1. I quickly looked through the patches and it does look like 90% there.
On Tue, Feb 13, 2024 at 08:51:26PM +0100, Toke Høiland-Jørgensen wrote: > Kumar Kartikeya Dwivedi <memxor@gmail.com> writes: > > > On Tue, 13 Feb 2024 at 18:46, Benjamin Tissoires <bentiss@kernel.org> wrote: > >> > >> On Feb 12 2024, Alexei Starovoitov wrote: > >> > On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires > >> > <benjamin.tissoires@redhat.com> wrote: > >> > > > >> > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> > > > > >> > > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: > >> > > > > >> [...] > >> > I agree that workqueue delegation fits into the bpf_timer concept and > >> > a lot of code can and should be shared. > >> > >> Thanks Alexei for the detailed answer. I've given it an attempt but still can not > >> figure it out entirely. > >> > >> > All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :) > >> > Too bad, bpf_timer_set_callback() doesn't have a flag argument, > >> > so we need a new kfunc to set a sleepable callback. > >> > Maybe > >> > bpf_timer_set_sleepable_cb() ? > >> > >> OK. So I guess I should drop Toke's suggestion with the bpf_timer_ini() flag? > >> > >> > The verifier will set is_async_cb = true for it (like it does for regular cb-s). > >> > And since prog->aux->sleepable is kinda "global" we need another > >> > per subprog flag: > >> > bool is_sleepable: 1; > >> > >> done (in push_callback_call()) > >> > >> > > >> > We can factor out a check "if (prog->aux->sleepable)" into a helper > >> > that will check that "global" flag and another env->cur_state->in_sleepable > >> > flag that will work similar to active_rcu_lock. > >> > >> done (I think), cf patch 2 below > >> > >> > Once the verifier starts processing subprog->is_sleepable > >> > it will set cur_state->in_sleepable = true; > >> > to make all subprogs called from that cb to be recognized as sleepable too. > >> > >> That's the point I don't know where to put the new code. > >> > > > > I think that would go in the already existing special case for > > push_async_cb where you get the verifier state of the async callback. > > You can make setting the boolean in that verifier state conditional on > > whether it's your kfunc/helper you're processing taking a sleepable > > callback. > > > >> It seems the best place would be in do_check(), but I am under the impression > >> that the code of the callback is added at the end of the instruction list, meaning > >> that I do not know where it starts, and which subprog index it corresponds to. > >> > >> > > >> > A bit of a challenge is what to do with global subprogs, > >> > since they're verified lazily. They can be called from > >> > sleepable and non-sleepable contex. Should be solvable. > >> > >> I must confess this is way over me (and given that I didn't even managed to make > >> the "easy" case working, that might explain things a little :-P ) > >> > > > > I think it will be solvable but made somewhat difficult by the fact > > that even if we mark subprog_info of some global_func A as > > in_sleepable, so that we explore it as sleepable during its > > verification, we might encounter later another global_func that calls > > a global func, already explored as non-sleepable, in sleepable > > context. In this case I think we need to redo the verification of that > > global func as sleepable once again. It could be that it is called > > from both non-sleepable and sleepable contexts, so both paths > > (in_sleepable = true, and in_sleepable = false) need to be explored, > > or we could reject such cases, but it might be a little restrictive. > > > > Some common helper global func unrelated to caller context doing some > > auxiliary work, called from sleepable timer callback and normal main > > subprog might be an example where rejection will be prohibitive. > > > > An approach might be to explore main and global subprogs once as we do > > now, and then keep a list of global subprogs that need to be revisited > > as in_sleepable (due to being called from a sleepable context) and > > trigger do_check_common for them again, this might have to be repeated > > as the list grows on each iteration, but eventually we will have > > explored all of them as in_sleepable if need be, and the loop will > > end. Surely, this trades off logical simplicity of verifier code with > > redoing verification of global subprogs again. > > > > To add items to such a list, for each global subprog we encounter that > > needs to be analyzed as in_sleepable, we will also collect all its > > callee global subprogs by walking its instructions (a bit like > > check_max_stack_depth does). > > Sorry if I'm being dense, but why is all this needed if it's already > possible to just define the timer callback from a program type that > allows sleeping, and then set the actual timeout from a different > program that is not sleepable? Isn't the set_sleepable_cb() kfunc just a > convenience then? Or did I misunderstand and it's not actually possible > to mix callback/timer arming from different program types? More than just convience. bpf_set_sleepable_cb() might need to be called from non-sleepable and there could be no way to hack it around with fake sleepable entry. bpf_timer_cancel() clears callback_fn. So if prog wants to bpf_timer_start() and later bpf_timer_cancel() it would need to bpf_set_sleepable_cb() every time before bpf_timer_start(). And at that time it might be in non-sleepable ctx.
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Tue, Feb 13, 2024 at 08:51:26PM +0100, Toke Høiland-Jørgensen wrote: >> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes: >> >> > On Tue, 13 Feb 2024 at 18:46, Benjamin Tissoires <bentiss@kernel.org> wrote: >> >> >> >> On Feb 12 2024, Alexei Starovoitov wrote: >> >> > On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires >> >> > <benjamin.tissoires@redhat.com> wrote: >> >> > > >> >> > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> > > > >> >> > > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: >> >> > > > >> >> [...] >> >> > I agree that workqueue delegation fits into the bpf_timer concept and >> >> > a lot of code can and should be shared. >> >> >> >> Thanks Alexei for the detailed answer. I've given it an attempt but still can not >> >> figure it out entirely. >> >> >> >> > All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :) >> >> > Too bad, bpf_timer_set_callback() doesn't have a flag argument, >> >> > so we need a new kfunc to set a sleepable callback. >> >> > Maybe >> >> > bpf_timer_set_sleepable_cb() ? >> >> >> >> OK. So I guess I should drop Toke's suggestion with the bpf_timer_ini() flag? >> >> >> >> > The verifier will set is_async_cb = true for it (like it does for regular cb-s). >> >> > And since prog->aux->sleepable is kinda "global" we need another >> >> > per subprog flag: >> >> > bool is_sleepable: 1; >> >> >> >> done (in push_callback_call()) >> >> >> >> > >> >> > We can factor out a check "if (prog->aux->sleepable)" into a helper >> >> > that will check that "global" flag and another env->cur_state->in_sleepable >> >> > flag that will work similar to active_rcu_lock. >> >> >> >> done (I think), cf patch 2 below >> >> >> >> > Once the verifier starts processing subprog->is_sleepable >> >> > it will set cur_state->in_sleepable = true; >> >> > to make all subprogs called from that cb to be recognized as sleepable too. >> >> >> >> That's the point I don't know where to put the new code. >> >> >> > >> > I think that would go in the already existing special case for >> > push_async_cb where you get the verifier state of the async callback. >> > You can make setting the boolean in that verifier state conditional on >> > whether it's your kfunc/helper you're processing taking a sleepable >> > callback. >> > >> >> It seems the best place would be in do_check(), but I am under the impression >> >> that the code of the callback is added at the end of the instruction list, meaning >> >> that I do not know where it starts, and which subprog index it corresponds to. >> >> >> >> > >> >> > A bit of a challenge is what to do with global subprogs, >> >> > since they're verified lazily. They can be called from >> >> > sleepable and non-sleepable contex. Should be solvable. >> >> >> >> I must confess this is way over me (and given that I didn't even managed to make >> >> the "easy" case working, that might explain things a little :-P ) >> >> >> > >> > I think it will be solvable but made somewhat difficult by the fact >> > that even if we mark subprog_info of some global_func A as >> > in_sleepable, so that we explore it as sleepable during its >> > verification, we might encounter later another global_func that calls >> > a global func, already explored as non-sleepable, in sleepable >> > context. In this case I think we need to redo the verification of that >> > global func as sleepable once again. It could be that it is called >> > from both non-sleepable and sleepable contexts, so both paths >> > (in_sleepable = true, and in_sleepable = false) need to be explored, >> > or we could reject such cases, but it might be a little restrictive. >> > >> > Some common helper global func unrelated to caller context doing some >> > auxiliary work, called from sleepable timer callback and normal main >> > subprog might be an example where rejection will be prohibitive. >> > >> > An approach might be to explore main and global subprogs once as we do >> > now, and then keep a list of global subprogs that need to be revisited >> > as in_sleepable (due to being called from a sleepable context) and >> > trigger do_check_common for them again, this might have to be repeated >> > as the list grows on each iteration, but eventually we will have >> > explored all of them as in_sleepable if need be, and the loop will >> > end. Surely, this trades off logical simplicity of verifier code with >> > redoing verification of global subprogs again. >> > >> > To add items to such a list, for each global subprog we encounter that >> > needs to be analyzed as in_sleepable, we will also collect all its >> > callee global subprogs by walking its instructions (a bit like >> > check_max_stack_depth does). >> >> Sorry if I'm being dense, but why is all this needed if it's already >> possible to just define the timer callback from a program type that >> allows sleeping, and then set the actual timeout from a different >> program that is not sleepable? Isn't the set_sleepable_cb() kfunc just a >> convenience then? Or did I misunderstand and it's not actually possible >> to mix callback/timer arming from different program types? > > More than just convience. > bpf_set_sleepable_cb() might need to be called from non-sleepable and > there could be no way to hack it around with fake sleepable entry. > bpf_timer_cancel() clears callback_fn. > So if prog wants to bpf_timer_start() and later bpf_timer_cancel() > it would need to bpf_set_sleepable_cb() every time before bpf_timer_start(). > And at that time it might be in non-sleepable ctx. Ah, right, makes sense; didn't think about bpf_timer_cancel(). Thanks for the explanation :) -Toke
On Feb 13 2024, Kumar Kartikeya Dwivedi wrote: > On Tue, 13 Feb 2024 at 18:46, Benjamin Tissoires <bentiss@kernel.org> wrote: > > > > On Feb 12 2024, Alexei Starovoitov wrote: > > > On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires > > > <benjamin.tissoires@redhat.com> wrote: > > > > > > > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > > > > > > > Benjamin Tissoires <benjamin.tissoires@redhat.com> writes: > > > > > > > [...] > > > I agree that workqueue delegation fits into the bpf_timer concept and > > > a lot of code can and should be shared. > > > > Thanks Alexei for the detailed answer. I've given it an attempt but still can not > > figure it out entirely. > > > > > All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :) > > > Too bad, bpf_timer_set_callback() doesn't have a flag argument, > > > so we need a new kfunc to set a sleepable callback. > > > Maybe > > > bpf_timer_set_sleepable_cb() ? > > > > OK. So I guess I should drop Toke's suggestion with the bpf_timer_ini() flag? > > > > > The verifier will set is_async_cb = true for it (like it does for regular cb-s). > > > And since prog->aux->sleepable is kinda "global" we need another > > > per subprog flag: > > > bool is_sleepable: 1; > > > > done (in push_callback_call()) > > > > > > > > We can factor out a check "if (prog->aux->sleepable)" into a helper > > > that will check that "global" flag and another env->cur_state->in_sleepable > > > flag that will work similar to active_rcu_lock. > > > > done (I think), cf patch 2 below > > > > > Once the verifier starts processing subprog->is_sleepable > > > it will set cur_state->in_sleepable = true; > > > to make all subprogs called from that cb to be recognized as sleepable too. > > > > That's the point I don't know where to put the new code. > > > > I think that would go in the already existing special case for > push_async_cb where you get the verifier state of the async callback. > You can make setting the boolean in that verifier state conditional on > whether it's your kfunc/helper you're processing taking a sleepable > callback. Hehe, thanks a lot. Indeed, it was a simple fix. I tried to put this everywhere but here, and with your help got it working in 2 mins :) > > > It seems the best place would be in do_check(), but I am under the impression > > that the code of the callback is added at the end of the instruction list, meaning > > that I do not know where it starts, and which subprog index it corresponds to. > > > > > > > > A bit of a challenge is what to do with global subprogs, > > > since they're verified lazily. They can be called from > > > sleepable and non-sleepable contex. Should be solvable. > > > > I must confess this is way over me (and given that I didn't even managed to make > > the "easy" case working, that might explain things a little :-P ) > > > > I think it will be solvable but made somewhat difficult by the fact > that even if we mark subprog_info of some global_func A as > in_sleepable, so that we explore it as sleepable during its > verification, we might encounter later another global_func that calls > a global func, already explored as non-sleepable, in sleepable > context. In this case I think we need to redo the verification of that > global func as sleepable once again. It could be that it is called > from both non-sleepable and sleepable contexts, so both paths > (in_sleepable = true, and in_sleepable = false) need to be explored, > or we could reject such cases, but it might be a little restrictive. > > Some common helper global func unrelated to caller context doing some > auxiliary work, called from sleepable timer callback and normal main > subprog might be an example where rejection will be prohibitive. > > An approach might be to explore main and global subprogs once as we do > now, and then keep a list of global subprogs that need to be revisited > as in_sleepable (due to being called from a sleepable context) and > trigger do_check_common for them again, this might have to be repeated > as the list grows on each iteration, but eventually we will have > explored all of them as in_sleepable if need be, and the loop will > end. Surely, this trades off logical simplicity of verifier code with > redoing verification of global subprogs again. > > To add items to such a list, for each global subprog we encounter that > needs to be analyzed as in_sleepable, we will also collect all its > callee global subprogs by walking its instructions (a bit like > check_max_stack_depth does). FWIW, this (or Alexei's suggestion) is still not implemented in v2 > > > > > > > Overall I think this feature is needed urgently, > > > so if you don't have cycles to work on this soon, > > > I can prioritize it right after bpf_arena work. > > > > I can try to spare a few cycles on it. Even if your instructions were on > > spot, I still can't make the subprogs recognized as sleepable. > > > > For reference, this is where I am (probably bogus, but seems to be > > working when timer_set_sleepable_cb() is called from a sleepable context > > as mentioned by Toke): > > > > I just skimmed the patch but I think it's already 90% there. The only > other change I would suggest is switching from helper to kfunc, as > originally proposed by Alexei. the kfunc was a rabbit hole: - I needed to teach the verifier about BPF_TIMER in kfunc - I needed to teach the verifier about the kfunc itself - I'm failing at calling the callback :( Anyway, I'm about to send a second RFC so we can discuss on the code and see where my monkey patching capabilities are reaching their limits. Cheers, Benjamin
[Putting this as a RFC because I'm pretty sure I'm not doing the things correctly at the BPF level.] [Also using bpf-next as the base tree as there will be conflicting changes otherwise] Ideally I'd like to have something similar to bpf_timers, but not in soft IRQ context. So I'm emulating this with a sleepable bpf_tail_call() (see "HID: bpf: allow to defer work in a delayed workqueue"). Basically, I need to be able to defer a HID-BPF program for the following reasons (from the aforementioned patch): 1. defer an event: Sometimes we receive an out of proximity event, but the device can not be trusted enough, and we need to ensure that we won't receive another one in the following n milliseconds. So we need to wait those n milliseconds, and eventually re-inject that event in the stack. 2. inject new events in reaction to one given event: We might want to transform one given event into several. This is the case for macro keys where a single key press is supposed to send a sequence of key presses. But this could also be used to patch a faulty behavior, if a device forgets to send a release event. 3. communicate with the device in reaction to one event: We might want to communicate back to the device after a given event. For example a device might send us an event saying that it came back from sleeping state and needs to be re-initialized. Currently we can achieve that by keeping a userspace program around, raise a bpf event, and let that userspace program inject the events and commands. However, we are just keeping that program alive as a daemon for just scheduling commands. There is no logic in it, so it doesn't really justify an actual userspace wakeup. So a kernel workqueue seems simpler to handle. Besides the "this is insane to reimplement the wheel", the other part I'm not sure is whether we can say that BPF maps of type prog_array and of type queue/stack can be used in sleepable context. I don't see any warning when running the test programs, but that's probably not a guarantee I'm doing the things properly :) Cheers, Benjamin Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> --- Benjamin Tissoires (9): bpf: allow more maps in sleepable bpf programs HID: bpf/dispatch: regroup kfuncs definitions HID: bpf: export hid_hw_output_report as a BPF kfunc selftests/hid: Add test for hid_bpf_hw_output_report HID: bpf: allow to inject HID event from BPF selftests/hid: add tests for hid_bpf_input_report HID: bpf: allow to defer work in a delayed workqueue selftests/hid: add test for hid_bpf_schedule_delayed_work selftests/hid: add another set of delayed work tests Documentation/hid/hid-bpf.rst | 26 +- drivers/hid/bpf/entrypoints/entrypoints.bpf.c | 8 + drivers/hid/bpf/entrypoints/entrypoints.lskel.h | 236 +++++++++------ drivers/hid/bpf/hid_bpf_dispatch.c | 315 ++++++++++++++++----- drivers/hid/bpf/hid_bpf_jmp_table.c | 34 ++- drivers/hid/hid-core.c | 2 + include/linux/hid_bpf.h | 10 + kernel/bpf/verifier.c | 3 + tools/testing/selftests/hid/hid_bpf.c | 286 ++++++++++++++++++- tools/testing/selftests/hid/progs/hid.c | 205 ++++++++++++++ .../testing/selftests/hid/progs/hid_bpf_helpers.h | 8 + 11 files changed, 962 insertions(+), 171 deletions(-) --- base-commit: 4f7a05917237b006ceae760507b3d15305769ade change-id: 20240205-hid-bpf-sleepable-c01260fd91c4 Best regards,