diff mbox series

[sched_ext/for-6.15,v3,2/5] sched_ext: Declare context-sensitive kfunc groups that can be used by different SCX operations

Message ID AM6PR03MB508018ABBD34FBAA089DD9F799C22@AM6PR03MB5080.eurprd03.prod.outlook.com (mailing list archive)
State Not Applicable
Headers show
Series bpf, sched_ext: Make kfunc filters support struct_ops context to reduce runtime overhead | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply, async

Commit Message

Juntong Deng Feb. 26, 2025, 7:28 p.m. UTC
This patch declare context-sensitive kfunc groups that can be used by
different SCX operations.

In SCX, some kfuncs are context-sensitive and can only be used in
specific SCX operations.

Currently context-sensitive kfuncs can be grouped into UNLOCKED,
CPU_RELEASE, DISPATCH, ENQUEUE, SELECT_CPU.

In this patch enum scx_ops_kf_flags was added to represent these groups,
which is based on scx_kf_mask.

SCX_OPS_KF_ANY is a special value that indicates kfuncs can be used in
any context.

scx_ops_context_flags is used to declare the groups of kfuncs that can
be used by each SCX operation. An SCX operation can use multiple groups
of kfuncs.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
---
 kernel/sched/ext.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

Comments

Tejun Heo Feb. 27, 2025, 7:19 p.m. UTC | #1
On Wed, Feb 26, 2025 at 07:28:17PM +0000, Juntong Deng wrote:
> This patch declare context-sensitive kfunc groups that can be used by
             ^
             declares

> different SCX operations.
...
> +/* Each flag corresponds to a btf kfunc id set */
> +enum scx_ops_kf_flags {
> +	SCX_OPS_KF_ANY			= 0,
> +	SCX_OPS_KF_UNLOCKED		= 1 << 1,

nit: Any specific reason to skip bit 0?

> +	SCX_OPS_KF_CPU_RELEASE		= 1 << 2,
> +	SCX_OPS_KF_DISPATCH		= 1 << 3,
> +	SCX_OPS_KF_ENQUEUE		= 1 << 4,
> +	SCX_OPS_KF_SELECT_CPU		= 1 << 5,
> +};
> +
> +static const u32 scx_ops_context_flags[] = {
> +	[SCX_OP_IDX(select_cpu)]		= SCX_OPS_KF_SELECT_CPU | SCX_OPS_KF_ENQUEUE,
> +	[SCX_OP_IDX(enqueue)]			= SCX_OPS_KF_ENQUEUE,
> +	[SCX_OP_IDX(dequeue)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(dispatch)]			= SCX_OPS_KF_DISPATCH | SCX_OPS_KF_ENQUEUE,
> +	[SCX_OP_IDX(tick)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(runnable)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(running)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(stopping)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(quiescent)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(yield)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(core_sched_before)]		= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(set_weight)]		= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(set_cpumask)]		= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(update_idle)]		= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(cpu_acquire)]		= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(cpu_release)]		= SCX_OPS_KF_CPU_RELEASE,
> +	[SCX_OP_IDX(init_task)]			= SCX_OPS_KF_UNLOCKED,
> +	[SCX_OP_IDX(exit_task)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(enable)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(disable)]			= SCX_OPS_KF_ANY,
> +	[SCX_OP_IDX(dump)]			= SCX_OPS_KF_DISPATCH,

Shouldn't this be SCX_OPS_KF_UNLOCKED?

Thanks.
Tejun Heo Feb. 28, 2025, 9:57 p.m. UTC | #2
On Wed, Feb 26, 2025 at 07:28:17PM +0000, Juntong Deng wrote:
> This patch declare context-sensitive kfunc groups that can be used by
> different SCX operations.
> 
> In SCX, some kfuncs are context-sensitive and can only be used in
> specific SCX operations.
> 
> Currently context-sensitive kfuncs can be grouped into UNLOCKED,
> CPU_RELEASE, DISPATCH, ENQUEUE, SELECT_CPU.
> 
> In this patch enum scx_ops_kf_flags was added to represent these groups,
> which is based on scx_kf_mask.
> 
> SCX_OPS_KF_ANY is a special value that indicates kfuncs can be used in
> any context.
> 
> scx_ops_context_flags is used to declare the groups of kfuncs that can
> be used by each SCX operation. An SCX operation can use multiple groups
> of kfuncs.
> 

Can you merge this into the next patch? I don't think separating this out
helps with reviewing.

Thanks.
Juntong Deng March 3, 2025, 9:12 p.m. UTC | #3
On 2025/2/28 21:57, Tejun Heo wrote:

> On Fri, Feb 28, 2025 at 06:42:11PM +0000, Juntong Deng wrote:
> > > > Return 0 means allowed. So kfuncs in scx_kfunc_ids_unlocked can be
> > > > called by other struct_ops programs.
> >
> > > Hmm... would that mean a non-sched_ext bpf prog would be able to call e.g.
> > > scx_bpf_dsq_insert()?
> >
> > For other struct_ops programs, yes, in the current logic,
> > when prog->aux->st_ops != &bpf_sched_ext_ops, all calls are allowed.
> >
> > This may seem a bit weird, but the reason I did it is that in other
> > struct_ops programs, the meaning of member_off changes, so the logic
> > that follows makes no sense at all.
> >
> > Of course, we can change this, and ideally there would be some groupings
> > (kfunc id set) that declare which kfunc can be called by other
> > struct_ops programs and which cannot.

> Other than any and unlocked, I don't think other bpf struct ops should be
> able to call SCX kfuncs. They all assume rq lock to be held which wouldn't
> be true for other struct_ops after all.

Ok, I will allow only any and unlocked to be called by other struct_ops
programs in the next version.

> ...
> > > I see, scx_dsq_move_*() are in both groups, so it should be fine. I'm not
> > > fully sure the groupings are the actually implemented filtering are in sync.
> > > They are intended to be but the grouping didn't really matter in the
> > > previous implementation. So, they need to be carefully audited.
> >
> > After you audit the current groupings of scx kfuncs, please tell me how
> > you would like to change the current groupings.

> Yeah, I'll go over them but after all, we need to ensure that the behavior
> currently implemented by scx_kf_allowed*() matches what the new code does,
> so I'd appreciate if you can go over with that in mind too. This is kinda
> confusing so we can definitely use more eyes.

Yes, I will use more eyes and be more careful on consistency.

> On Wed, Feb 26, 2025 at 07:28:17PM +0000, Juntong Deng wrote:
>> This patch declare context-sensitive kfunc groups that can be used by
>> different SCX operations.
>>
>> In SCX, some kfuncs are context-sensitive and can only be used in
>> specific SCX operations.
>>
>> Currently context-sensitive kfuncs can be grouped into UNLOCKED,
>> CPU_RELEASE, DISPATCH, ENQUEUE, SELECT_CPU.
>>
>> In this patch enum scx_ops_kf_flags was added to represent these groups,
>> which is based on scx_kf_mask.
>>
>> SCX_OPS_KF_ANY is a special value that indicates kfuncs can be used in
>> any context.
>>
>> scx_ops_context_flags is used to declare the groups of kfuncs that can
>> be used by each SCX operation. An SCX operation can use multiple groups
>> of kfuncs.
>>
> 
> Can you merge this into the next patch? I don't think separating this out
> helps with reviewing.
> 

Yes, I can merge them in the next version.

I am not sure, but it seems to me that the two patches are doing
different things?

> Thanks.
>
Tejun Heo March 4, 2025, 6 p.m. UTC | #4
On Mon, Mar 03, 2025 at 09:12:52PM +0000, Juntong Deng wrote:
> > Can you merge this into the next patch? I don't think separating this out
> > helps with reviewing.
> > 
> 
> Yes, I can merge them in the next version.
> 
> I am not sure, but it seems to me that the two patches are doing
> different things?

I don't know. It can be argued either way but it's more that the table added
by the first patch is not used in the first patch and the second patch is
difficult to review without referring to the table added in the first patch.
It can be either way but I don't see benefits of them being separate
patches.

Thanks.
Juntong Deng March 7, 2025, 12:07 a.m. UTC | #5
On 2025/3/4 18:00, Tejun Heo wrote:
> On Mon, Mar 03, 2025 at 09:12:52PM +0000, Juntong Deng wrote:
>>> Can you merge this into the next patch? I don't think separating this out
>>> helps with reviewing.
>>>
>>
>> Yes, I can merge them in the next version.
>>
>> I am not sure, but it seems to me that the two patches are doing
>> different things?
> 
> I don't know. It can be argued either way but it's more that the table added
> by the first patch is not used in the first patch and the second patch is
> difficult to review without referring to the table added in the first patch.
> It can be either way but I don't see benefits of them being separate
> patches.
> 

Yes, I agree, that is a good point.

I will merge them in the next version.

> Thanks.
>
diff mbox series

Patch

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 74b247c36b3d..15fac968629e 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -730,6 +730,54 @@  struct sched_ext_ops {
 	char name[SCX_OPS_NAME_LEN];
 };
 
+/* Each flag corresponds to a btf kfunc id set */
+enum scx_ops_kf_flags {
+	SCX_OPS_KF_ANY			= 0,
+	SCX_OPS_KF_UNLOCKED		= 1 << 1,
+	SCX_OPS_KF_CPU_RELEASE		= 1 << 2,
+	SCX_OPS_KF_DISPATCH		= 1 << 3,
+	SCX_OPS_KF_ENQUEUE		= 1 << 4,
+	SCX_OPS_KF_SELECT_CPU		= 1 << 5,
+};
+
+static const u32 scx_ops_context_flags[] = {
+	[SCX_OP_IDX(select_cpu)]		= SCX_OPS_KF_SELECT_CPU | SCX_OPS_KF_ENQUEUE,
+	[SCX_OP_IDX(enqueue)]			= SCX_OPS_KF_ENQUEUE,
+	[SCX_OP_IDX(dequeue)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(dispatch)]			= SCX_OPS_KF_DISPATCH | SCX_OPS_KF_ENQUEUE,
+	[SCX_OP_IDX(tick)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(runnable)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(running)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(stopping)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(quiescent)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(yield)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(core_sched_before)]		= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(set_weight)]		= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(set_cpumask)]		= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(update_idle)]		= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(cpu_acquire)]		= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(cpu_release)]		= SCX_OPS_KF_CPU_RELEASE,
+	[SCX_OP_IDX(init_task)]			= SCX_OPS_KF_UNLOCKED,
+	[SCX_OP_IDX(exit_task)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(enable)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(disable)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(dump)]			= SCX_OPS_KF_DISPATCH,
+	[SCX_OP_IDX(dump_cpu)]			= SCX_OPS_KF_ANY,
+	[SCX_OP_IDX(dump_task)]			= SCX_OPS_KF_ANY,
+#ifdef CONFIG_EXT_GROUP_SCHED
+	[SCX_OP_IDX(cgroup_init)]		= SCX_OPS_KF_UNLOCKED,
+	[SCX_OP_IDX(cgroup_exit)]		= SCX_OPS_KF_UNLOCKED,
+	[SCX_OP_IDX(cgroup_prep_move)]		= SCX_OPS_KF_UNLOCKED,
+	[SCX_OP_IDX(cgroup_move)]		= SCX_OPS_KF_UNLOCKED,
+	[SCX_OP_IDX(cgroup_cancel_move)]	= SCX_OPS_KF_UNLOCKED,
+	[SCX_OP_IDX(cgroup_set_weight)]		= SCX_OPS_KF_UNLOCKED,
+#endif	/* CONFIG_EXT_GROUP_SCHED */
+	[SCX_OP_IDX(cpu_online)]		= SCX_OPS_KF_UNLOCKED,
+	[SCX_OP_IDX(cpu_offline)]		= SCX_OPS_KF_UNLOCKED,
+	[SCX_OP_IDX(init)]			= SCX_OPS_KF_UNLOCKED,
+	[SCX_OP_IDX(exit)]			= SCX_OPS_KF_UNLOCKED,
+};
+
 enum scx_opi {
 	SCX_OPI_BEGIN			= 0,
 	SCX_OPI_NORMAL_BEGIN		= 0,