[2/6] sched: add function to execute a function synchronously on a physical cpu

Message ID 1457697574-6710-3-git-send-email-jgross@suse.com (mailing list archive)
State New, archived

Commit Message

Jürgen Groß March 11, 2016, 11:59 a.m. UTC
On some hardware models (e.g. the Dell Studio 1555 laptop) certain
hardware-related functions (e.g. SMIs) must be executed on physical
cpu 0 only. Instead of open coding such functionality multiple times
in the kernel, add a service function for this purpose. This will also
make it possible to take special measures in virtualized environments
like Xen.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 include/linux/sched.h |  9 +++++++++
 kernel/sched/core.c   | 26 ++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
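
For illustration, a minimal usage sketch of the helper this patch adds,
roughly the shape in which a driver like dcdbas would call it. The names
my_smi_data, my_smi_request() and my_driver_do_smi() are hypothetical
stand-ins, not code from this series:

#include <linux/sched.h>	/* call_sync_on_phys_cpu(), per this patch */
#include <linux/types.h>

struct my_smi_data {		/* hypothetical driver state */
	u16 command;
};

/* Executed pinned to physical cpu 0; the actual SMI would be issued here. */
static int my_smi_request(void *par)
{
	struct my_smi_data *smi = par;

	(void)smi;		/* e.g. an outb()/inb() sequence using smi */
	return 0;
}

static int my_driver_do_smi(struct my_smi_data *smi)
{
	/* SMIs on the affected hardware must be triggered from cpu 0. */
	return call_sync_on_phys_cpu(0, my_smi_request, smi);
}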

Comments

Peter Zijlstra March 11, 2016, 12:19 p.m. UTC | #1
On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
> +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
> +{
> +	cpumask_var_t old_mask;
> +	int ret;
> +
> +	if (cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	cpumask_copy(old_mask, &current->cpus_allowed);
> +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +	if (ret)
> +		goto out;

So what happens if someone does sched_setaffinity() right about here?

> +
> +	ret = func(par);
> +
> +	set_cpus_allowed_ptr(current, old_mask);
> +
> +out:
> +	free_cpumask_var(old_mask);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);

This is disgusting, and you're adding this to !Xen kernels too.
Peter Zijlstra March 11, 2016, 12:42 p.m. UTC | #2
On Fri, Mar 11, 2016 at 01:19:50PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
> > +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
> > +{
> > +	cpumask_var_t old_mask;
> > +	int ret;
> > +
> > +	if (cpu >= nr_cpu_ids)
> > +		return -EINVAL;
> > +
> > +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
> > +		return -ENOMEM;
> > +
> > +	cpumask_copy(old_mask, &current->cpus_allowed);
> > +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > +	if (ret)
> > +		goto out;
> 
> So what happens if someone does sched_setaffinity() right about here?
> 
> > +
> > +	ret = func(par);
> > +
> > +	set_cpus_allowed_ptr(current, old_mask);
> > +
> > +out:
> > +	free_cpumask_var(old_mask);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);
> 
> This is disgusting, and you're adding this to !Xen kernels too.

how about something like:

struct xen_callback_struct {
	struct work_struct	work;
	struct completion	done;
	void *			data;
	int			ret;
};

static void xen_callback_f(struct work_struct *work)
{
	struct xen_callback_struct *xcs = container_of(work, struct xen_callback_struct, work);

	xcs->ret = xcs->func(xcs->data);

	complete(&xcs->done);
}

xen_call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
{
	struct xen_callback_state xcs = {
		.work = __WORK_INITIALIZER(xcs.work, xen_callback_f);
		.done = COMPLETION_INITIALIZER_ONSTACK(xcs.done),
		.data = data,
	};

	queue_work_on(&work, cpu);
	wait_for_completion(&xcs.done);

	return xcs.ret;
}

No mucking about with the scheduler state, no new exported functions
etc..
Jürgen Groß March 11, 2016, 12:43 p.m. UTC | #3
On 11/03/16 13:19, Peter Zijlstra wrote:
> On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
>> +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
>> +{
>> +	cpumask_var_t old_mask;
>> +	int ret;
>> +
>> +	if (cpu >= nr_cpu_ids)
>> +		return -EINVAL;
>> +
>> +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
>> +		return -ENOMEM;
>> +
>> +	cpumask_copy(old_mask, &current->cpus_allowed);
>> +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
>> +	if (ret)
>> +		goto out;
> 
> So what happens if someone does sched_setaffinity() right about here?

Aah, okay. Seems I should add preempt_disable() in this patch already
and retry the set_cpus_allowed_ptr() in case cpus_allowed has changed.

> 
>> +
>> +	ret = func(par);
>> +
>> +	set_cpus_allowed_ptr(current, old_mask);
>> +
>> +out:
>> +	free_cpumask_var(old_mask);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);
> 
> This is disgusting, and you're adding this to !Xen kernels too.

Sure. It is called on !Xen kernels too. Without Xen it just omits the
vcpu pinning. I copied the above logic from dcdbas/i8k (it was open
coded there twice).

BTW: I tried to get rid of the call-a-function-on-cpu-0 logic
completely, but that was rejected by the Dell maintainers.


Juergen
Peter Zijlstra March 11, 2016, 12:45 p.m. UTC | #4
On Fri, Mar 11, 2016 at 01:43:53PM +0100, Juergen Gross wrote:
> On 11/03/16 13:19, Peter Zijlstra wrote:
> > On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
> >> +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
> >> +{
> >> +	cpumask_var_t old_mask;
> >> +	int ret;
> >> +
> >> +	if (cpu >= nr_cpu_ids)
> >> +		return -EINVAL;
> >> +
> >> +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
> >> +		return -ENOMEM;
> >> +
> >> +	cpumask_copy(old_mask, &current->cpus_allowed);
> >> +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
> >> +	if (ret)
> >> +		goto out;
> > 
> > So what happens if someone does sched_setaffinity() right about here?
> 
> Aah, okay. Seems I should add preempt_disable() in this patch already
> and retry the set_cpus_allowed_ptr() in case cpus_allowed has changed.

Doesn't help, you should simply not _ever_ touch this for random tasks.
Jürgen Groß March 11, 2016, 12:48 p.m. UTC | #5
On 11/03/16 13:42, Peter Zijlstra wrote:
> On Fri, Mar 11, 2016 at 01:19:50PM +0100, Peter Zijlstra wrote:
>> On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
>>> +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
>>> +{
>>> +	cpumask_var_t old_mask;
>>> +	int ret;
>>> +
>>> +	if (cpu >= nr_cpu_ids)
>>> +		return -EINVAL;
>>> +
>>> +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
>>> +		return -ENOMEM;
>>> +
>>> +	cpumask_copy(old_mask, &current->cpus_allowed);
>>> +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
>>> +	if (ret)
>>> +		goto out;
>>
>> So what happens if someone does sched_setaffinity() right about here?
>>
>>> +
>>> +	ret = func(par);
>>> +
>>> +	set_cpus_allowed_ptr(current, old_mask);
>>> +
>>> +out:
>>> +	free_cpumask_var(old_mask);
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);
>>
>> This is disgusting, and you're adding this to !Xen kernels too.
> 
> how about something like:
> 
> struct xen_callback_struct {
> 	struct work_struct	work;
> 	struct completion	done;
> 	void *			data;
> 	int			ret;
> };
> 
> static void xen_callback_f(struct work_struct *work)
> {
> 	struct xen_callback_struct *xcs = container_of(work, struct xen_callback_struct, work);
> 
> 	xcs->ret = xcs->func(xcs->data);
> 
> 	complete(&xcs->done);
> }
> 
> xen_call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
> {
> 	struct xen_callback_state xcs = {
> 		.work = __WORK_INITIALIZER(xcs.work, xen_callback_f);
> 		.done = COMPLETION_INITIALIZER_ONSTACK(xcs.done),
> 		.data = data,
> 	};
> 
> 	queue_work_on(&work, cpu);
> 	wait_for_completion(&xcs.done);
> 
> 	return xcs.ret;
> }
> 
> No mucking about with the scheduler state, no new exported functions
> etc..
> 

Hey, I like it. Can't be limited to Xen as on bare metal the function
needs to be called on cpu 0, too. But avoiding the scheduler fiddling
is much better! As this seems to be required for Dell hardware only,
I could add it to some Dell base driver in case you don't want to add
it to core code.


Juergen
Peter Zijlstra March 11, 2016, 12:57 p.m. UTC | #6
On Fri, Mar 11, 2016 at 01:48:12PM +0100, Juergen Gross wrote:
> On 11/03/16 13:42, Peter Zijlstra wrote:
> > how about something like:
> > 
> > struct xen_callback_struct {
> > 	struct work_struct	work;
> > 	struct completion	done;
	int			(*func)(void*);
> > 	void *			data;
> > 	int			ret;
> > };
> > 
> > static void xen_callback_f(struct work_struct *work)
> > {
> > 	struct xen_callback_struct *xcs = container_of(work, struct xen_callback_struct, work);
> > 
> > 	xcs->ret = xcs->func(xcs->data);
> > 
> > 	complete(&xcs->done);
> > }
> > 
> > xen_call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
> > {
> > 	struct xen_callback_state xcs = {
> > 		.work = __WORK_INITIALIZER(xcs.work, xen_callback_f);
> > 		.done = COMPLETION_INITIALIZER_ONSTACK(xcs.done),
		.func = func,
> > 		.data = data,
> > 	};
> > 
> > 	queue_work_on(&work, cpu);
> > 	wait_for_completion(&xcs.done);
> > 
> > 	return xcs.ret;
> > }
> > 
> > No mucking about with the scheduler state, no new exported functions
> > etc..
> > 
> 
> Hey, I like it. Can't be limited to Xen as on bare metal the function
> needs to be called on cpu 0, too. But avoiding the scheduler fiddling
> is much better! As this seems to be required for Dell hardware only,
> I could add it to some Dell base driver in case you don't want to add
> it to core code.

Urgh yeah, saw that in your other mail. It looks like I should go look
at set_cpus_allowed_ptr() abuse :/

Not sure where this would fit best, maybe somewhere near workqueue.c or
smp.c.
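
Folding in the ->func member added above, the sketch still has a few
typos that would keep it from compiling: the struct is declared as
xen_callback_struct but instantiated as xen_callback_state, the .work
initializer line ends in a semicolon instead of a comma, the wrapper is
missing its int return type, and queue_work_on() actually takes
(cpu, wq, work). A compile-oriented version of the idea, with generic
stand-in names (cpu_call_struct, cpu_call_f, call_on_cpu_sync) and
INIT_WORK_ONSTACK() used because the work item lives on the stack; a
sketch, not the code that eventually got merged:

#include <linux/completion.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct cpu_call_struct {
	struct work_struct	work;
	struct completion	done;
	int			(*func)(void *);
	void			*data;
	int			ret;
};

static void cpu_call_f(struct work_struct *work)
{
	struct cpu_call_struct *xcs = container_of(work, struct cpu_call_struct, work);

	/* Runs in a workqueue worker bound to the target cpu. */
	xcs->ret = xcs->func(xcs->data);
	complete(&xcs->done);
}

/* Caller must ensure @cpu is online. */
static int call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
{
	struct cpu_call_struct xcs = {
		.done	= COMPLETION_INITIALIZER_ONSTACK(xcs.done),
		.func	= func,
		.data	= data,
	};

	/* On-stack work items need the ONSTACK variants under debugobjects. */
	INIT_WORK_ONSTACK(&xcs.work, cpu_call_f);
	queue_work_on(cpu, system_wq, &xcs.work);
	wait_for_completion(&xcs.done);
	destroy_work_on_stack(&xcs.work);

	return xcs.ret;
}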
Jürgen Groß March 11, 2016, 1:07 p.m. UTC | #7
On 11/03/16 13:57, Peter Zijlstra wrote:
> On Fri, Mar 11, 2016 at 01:48:12PM +0100, Juergen Gross wrote:
>> On 11/03/16 13:42, Peter Zijlstra wrote:
>>> how about something like:
>>>
>>> struct xen_callback_struct {
>>> 	struct work_struct	work;
>>> 	struct completion	done;
> 	int			(*func)(void*);
>>> 	void *			data;
>>> 	int			ret;
>>> };
>>>
>>> static void xen_callback_f(struct work_struct *work)
>>> {
>>> 	struct xen_callback_struct *xcs = container_of(work, struct xen_callback_struct, work);
>>>
>>> 	xcs->ret = xcs->func(xcs->data);
>>>
>>> 	complete(&xcs->done);
>>> }
>>>
>>> xen_call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
>>> {
>>> 	struct xen_callback_state xcs = {
>>> 		.work = __WORK_INITIALIZER(xcs.work, xen_callback_f);
>>> 		.done = COMPLETION_INITIALIZER_ONSTACK(xcs.done),
> 		.func = func,
>>> 		.data = data,
>>> 	};
>>>
>>> 	queue_work_on(&work, cpu);
>>> 	wait_for_completion(&xcs.done);
>>>
>>> 	return xcs.ret;
>>> }
>>>
>>> No mucking about with the scheduler state, no new exported functions
>>> etc..
>>>
>>
>> Hey, I like it. Can't be limited to Xen as on bare metal the function
>> needs to be called on cpu 0, too. But avoiding the scheduler fiddling
>> is much better! As this seems to be required for Dell hardware only,
>> I could add it to some Dell base driver in case you don't want to add
>> it to core code.
> 
> Urgh yeah, saw that in your other mail. It looks like I should go look
> at set_cpus_allowed_ptr() abuse :/
> 
> Not sure where this would fit best, maybe somewhere near workqueue.c or
> smp.c.

At first glance I think smp.c would be the better choice. I'll give it a try.

Thanks,

Juergen
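
A closing note on where this ended up: in later revisions the helper
moved into kernel/smp.c, and mainline kernels carry it (to the best of
my knowledge; check include/linux/smp.h in a current tree) as:

int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys);

with the phys flag selecting whether the call must happen on the
physical cpu (e.g. pinning the vcpu when running under Xen) or merely
on the given Linux cpu.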

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a10494a..dfadf1a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2247,6 +2247,7 @@  extern void do_set_cpus_allowed(struct task_struct *p,
 
 extern int set_cpus_allowed_ptr(struct task_struct *p,
 				const struct cpumask *new_mask);
+int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par);
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p,
 				      const struct cpumask *new_mask)
@@ -2259,6 +2260,14 @@  static inline int set_cpus_allowed_ptr(struct task_struct *p,
 		return -EINVAL;
 	return 0;
 }
+static inline int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *),
+					void *par)
+{
+	if (cpu != 0)
+		return -EINVAL;
+
+	return func(par);
+}
 #endif
 
 #ifdef CONFIG_NO_HZ_COMMON
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41f6b22..cb9955f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1265,6 +1265,32 @@  int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 }
 EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
+int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
+{
+	cpumask_var_t old_mask;
+	int ret;
+
+	if (cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_copy(old_mask, &current->cpus_allowed);
+	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
+	if (ret)
+		goto out;
+
+	ret = func(par);
+
+	set_cpus_allowed_ptr(current, old_mask);
+
+out:
+	free_cpumask_var(old_mask);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);
+
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
 #ifdef CONFIG_SCHED_DEBUG