Message ID | b3f648c2c4cd36b6a043239bee8437a2060c0ac4.1477000078.git.tim.c.chen@linux.intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
On Thu, 20 Oct 2016, Tim Chen wrote:

> +static int sched_itmt_update_handler(struct ctl_table *table, int write,
> +			      void __user *buffer, size_t *lenp, loff_t *ppos)

Please align the arguments properly:

static int
sched_itmt_update_handler(struct ctl_table *table, int write,
			  void __user *buffer, size_t *lenp, loff_t *ppos)

> +{
> +	int ret;
> +	unsigned int old_sysctl;

	unsigned int old_sysctl;
	int ret;

Please. It's way simpler to read.

> -void sched_set_itmt_support(void)
> +int sched_set_itmt_support(void)
>  {
>  	mutex_lock(&itmt_update_mutex);
>
> +	if (sched_itmt_capable) {
> +		mutex_unlock(&itmt_update_mutex);
> +		return 0;
> +	}
> +
> +	itmt_sysctl_header = register_sysctl_table(itmt_root_table);
> +	if (!itmt_sysctl_header) {
> +		mutex_unlock(&itmt_update_mutex);
> +		return -ENOMEM;
> +	}
> +
>  	sched_itmt_capable = true;
>
> +	/*
> +	 * ITMT capability automatically enables ITMT
> +	 * scheduling for small systems (single node).
> +	 */
> +	if (topology_num_packages() == 1)
> +		sysctl_sched_itmt_enabled = 1;

I really hate this. This is policy and the kernel should not impose
policy. Why would I like to have this enforced on my single socket XEON
server?

> +	if (sysctl_sched_itmt_enabled) {

Why would sysctl_sched_itmt_enabled be true at this point, aside from the
above policy imposition?

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 20 Oct 2016, Tim Chen wrote:

> +	if (itmt_sysctl_header)
> +		unregister_sysctl_table(itmt_sysctl_header);

What sets itmt_sysctl_header to NULL?

Thanks,

	tglx
On Wed, 26 Oct 2016, Peter Zijlstra wrote:
> On Wed, Oct 26, 2016 at 12:49:36PM +0200, Thomas Gleixner wrote:
> > > +	/*
> > > +	 * ITMT capability automatically enables ITMT
> > > +	 * scheduling for small systems (single node).
> > > +	 */
> > > +	if (topology_num_packages() == 1)
> > > +		sysctl_sched_itmt_enabled = 1;
> >
> > I really hate this. This is policy and the kernel should not impose
> > policy. Why would I like to have this enforced on my single socket XEON
> > server?
>
> So this really wants to be enabled by default; otherwise nobody will use
> this, and it really does help single threaded workloads.

Fair enough. Then this wants to be documented.

> There were reservations on the multi-socket case of ITMT, maybe it would
> help to spell those out in great detail here. That is, have the comment
> explain the policy instead of simply stating what the code does (which
> is always bad comment policy, you can read the code just fine).

What is the objection for multi sockets? If it improves the behaviour then
why would this be a bad thing for multi sockets?

Thanks,

	tglx
On Wed, Oct 26, 2016 at 12:49:36PM +0200, Thomas Gleixner wrote:
> > +	/*
> > +	 * ITMT capability automatically enables ITMT
> > +	 * scheduling for small systems (single node).
> > +	 */
> > +	if (topology_num_packages() == 1)
> > +		sysctl_sched_itmt_enabled = 1;
>
> I really hate this. This is policy and the kernel should not impose
> policy. Why would I like to have this enforced on my single socket XEON
> server?

So this really wants to be enabled by default; otherwise nobody will use
this, and it really does help single threaded workloads.

There were reservations on the multi-socket case of ITMT, maybe it would
help to spell those out in great detail here. That is, have the comment
explain the policy instead of simply stating what the code does (which
is always bad comment policy, you can read the code just fine).
On Wed, 2016-10-26 at 13:24 +0200, Thomas Gleixner wrote:
> On Wed, 26 Oct 2016, Peter Zijlstra wrote:
> > On Wed, Oct 26, 2016 at 12:49:36PM +0200, Thomas Gleixner wrote:
> > > > +	/*
> > > > +	 * ITMT capability automatically enables ITMT
> > > > +	 * scheduling for small systems (single node).
> > > > +	 */
> > > > +	if (topology_num_packages() == 1)
> > > > +		sysctl_sched_itmt_enabled = 1;
> > > I really hate this. This is policy and the kernel should not impose
> > > policy. Why would I like to have this enforced on my single socket XEON
> > > server?
> > So this really wants to be enabled by default; otherwise nobody will use
> > this, and it really does help single threaded workloads.
> Fair enough. Then this wants to be documented.
>
> > There were reservations on the multi-socket case of ITMT, maybe it would
> > help to spell those out in great detail here. That is, have the comment
> > explain the policy instead of simply stating what the code does (which
> > is always bad comment policy, you can read the code just fine).
> What is the objection for multi sockets? If it improves the behaviour then
> why would this be a bad thing for multi sockets?

For multi-socket (server) systems, it is much more likely that multiple
cpus in a socket are busy and the socket does not run in turbo mode. So the
extra work of migrating the workload to the cpu with extra headroom will
not make use of that headroom in that scenario. I will update the comment
to reflect this policy.

See also our previous discussions:
http://lkml.iu.edu/hypermail/linux/kernel/1609.1/03381.html

Tim
On Wed, 2016-10-26 at 12:49 +0200, Thomas Gleixner wrote:
> On Thu, 20 Oct 2016, Tim Chen wrote:
> > +static int sched_itmt_update_handler(struct ctl_table *table, int write,
> > +			      void __user *buffer, size_t *lenp, loff_t *ppos)
> Please align the arguments properly:
>
> static int
> sched_itmt_update_handler(struct ctl_table *table, int write,
> 			  void __user *buffer, size_t *lenp, loff_t *ppos)
>

Okay.

> > +{
> > +	int ret;
> > +	unsigned int old_sysctl;
> 	unsigned int old_sysctl;
> 	int ret;
>
> Please. It's way simpler to read.

Sure.

> > -void sched_set_itmt_support(void)
> > +int sched_set_itmt_support(void)
> >  {
> >  	mutex_lock(&itmt_update_mutex);
> >
> > +	if (sched_itmt_capable) {
> > +		mutex_unlock(&itmt_update_mutex);
> > +		return 0;
> > +	}
> > +
> > +	itmt_sysctl_header = register_sysctl_table(itmt_root_table);
> > +	if (!itmt_sysctl_header) {
> > +		mutex_unlock(&itmt_update_mutex);
> > +		return -ENOMEM;
> > +	}
> > +
> >  	sched_itmt_capable = true;
> >
> > +	/*
> > +	 * ITMT capability automatically enables ITMT
> > +	 * scheduling for small systems (single node).
> > +	 */
> > +	if (topology_num_packages() == 1)
> > +		sysctl_sched_itmt_enabled = 1;
> I really hate this. This is policy and the kernel should not impose
> policy. Why would I like to have this enforced on my single socket XEON
> server?
>
> > +	if (sysctl_sched_itmt_enabled) {
> Why would sysctl_sched_itmt_enabled be true at this point, aside from the
> above policy imposition?

That's true, it will only be enabled for the above case. I can merge it
into the if check above.

Tim
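The merge Tim mentions at the end of this reply could look roughly like the following. This is only a user-space sketch under stated assumptions: `struct itmt_state`, `model_set_itmt_support()`, and the `register_ok` flag are hypothetical names standing in for the kernel globals, `register_sysctl_table()`, and `rebuild_sched_domains()`, and the mutex is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of the globals in arch/x86/kernel/itmt.c. */
struct itmt_state {
	bool capable;          /* models sched_itmt_capable */
	unsigned int enabled;  /* models sysctl_sched_itmt_enabled */
	bool topology_update;  /* models x86_topology_update */
	int rebuilds;          /* counts modelled rebuild_sched_domains() calls */
};

/*
 * Model of sched_set_itmt_support() with the enable and the rebuild merged
 * into one branch. num_packages stands in for topology_num_packages() and
 * register_ok for the outcome of register_sysctl_table(); both parameters
 * are assumptions made for this sketch.
 */
static int model_set_itmt_support(struct itmt_state *s, int num_packages,
				  bool register_ok)
{
	assert(s);

	if (s->capable)
		return 0;               /* already set up, nothing to do */

	if (!register_ok)
		return -ENOMEM;         /* sysctl registration failed */

	s->capable = true;

	/* The flag can only become true here, so the rebuild lives here too. */
	if (num_packages == 1) {
		s->enabled = 1;
		s->topology_update = true;
		s->rebuilds++;
	}

	return 0;
}
```

With the two checks folded together there is no second `if (sysctl_sched_itmt_enabled)` whose condition depends on the branch just above it.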
On Wed, 2016-10-26 at 12:52 +0200, Thomas Gleixner wrote:
> On Thu, 20 Oct 2016, Tim Chen wrote:
> > +	if (itmt_sysctl_header)
> > +		unregister_sysctl_table(itmt_sysctl_header);
> What sets itmt_sysctl_header to NULL?
>

If the registration of the itmt sysctl table has failed, it will
be NULL.

Tim
On Wed, 26 Oct 2016, Tim Chen wrote:
> On Wed, 2016-10-26 at 13:24 +0200, Thomas Gleixner wrote:
> > > There were reservations on the multi-socket case of ITMT, maybe it would
> > > help to spell those out in great detail here. That is, have the comment
> > > explain the policy instead of simply stating what the code does (which
> > > is always bad comment policy, you can read the code just fine).
> > What is the objection for multi sockets? If it improves the behaviour then
> > why would this be a bad thing for multi sockets?
>
> For multi-socket (server system), it is much more likely that they will
> have multiple cpus in a socket busy and not run in turbo mode. So the extra
> work in migrating the workload to the one with extra headroom will
> not make use of those headroom in that scenario. I will update the comment
> to reflect this policy.

So on a single socket server system the extra work does not matter, right?
Don't tell me that single socket server systems are irrelevant. Intel is
actively promoting single socket CPUs, like XEON D, for high density
servers...

Instead of handwaving arguments I prefer a proper analysis of what the
overhead is and why it is not a good thing for loaded servers in general.

Then instead of slapping half-baked heuristics into the code, we should sit
down and think a bit harder about it.

Thanks,

	tglx
On Wed, 26 Oct 2016, Tim Chen wrote:
> On Wed, 2016-10-26 at 12:52 +0200, Thomas Gleixner wrote:
> > On Thu, 20 Oct 2016, Tim Chen wrote:
> > > +	if (itmt_sysctl_header)
> > > +		unregister_sysctl_table(itmt_sysctl_header);
> > What sets itmt_sysctl_header to NULL?
> >
>
> If the registration of the itmt sysctl table has failed, it will
> be NULL.

And what clears it _AFTER_ the deregistration? Nothing, AFAICT.
On Wed, 2016-10-26 at 20:11 +0200, Thomas Gleixner wrote:
> On Wed, 26 Oct 2016, Tim Chen wrote:
> > On Wed, 2016-10-26 at 12:52 +0200, Thomas Gleixner wrote:
> > > On Thu, 20 Oct 2016, Tim Chen wrote:
> > > > +	if (itmt_sysctl_header)
> > > > +		unregister_sysctl_table(itmt_sysctl_header);
> > > What sets itmt_sysctl_header to NULL?
> > >
> > If the registration of the itmt sysctl table has failed, it will
> > be NULL.
> And what clears it _AFTER_ the deregistration? Nothing, AFAICT.

Ok. I'll clear itmt_sysctl_header here.

Thanks.

Tim
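The fix agreed on in this exchange is to reset the pointer once the table has been unregistered, so that the `if (itmt_sysctl_header)` test reflects the current state rather than a past registration. A minimal user-space sketch of that pattern follows; `struct fake_header`, `fake_register()`, `fake_unregister()`, and the call counter are hypothetical stand-ins for the kernel sysctl API, not real interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ctl_table_header. */
struct fake_header {
	int registered;
};

static struct fake_header *itmt_header;  /* models itmt_sysctl_header */
static int unregister_calls;             /* illustration only */

/* Hypothetical stand-in for register_sysctl_table(); NULL on failure. */
static struct fake_header *fake_register(void)
{
	struct fake_header *h = malloc(sizeof(*h));

	if (h)
		h->registered = 1;
	return h;
}

/* Hypothetical stand-in for unregister_sysctl_table(). */
static void fake_unregister(struct fake_header *h)
{
	assert(h);
	unregister_calls++;
	free(h);
}

static int set_support(void)
{
	if (itmt_header)
		return 0;                /* already registered */

	itmt_header = fake_register();
	return itmt_header ? 0 : -1;
}

static void clear_support(void)
{
	if (itmt_header) {
		fake_unregister(itmt_header);
		itmt_header = NULL;      /* the fix: drop the stale pointer */
	}
}
```

In the posted patch the early return on `!sched_itmt_capable` already keeps a second clear from reaching the unregister call, so the reset is mainly about keeping the pointer check honest if the surrounding logic changes.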
On Wed, 2016-10-26 at 20:09 +0200, Thomas Gleixner wrote:
> On Wed, 26 Oct 2016, Tim Chen wrote:
> > On Wed, 2016-10-26 at 13:24 +0200, Thomas Gleixner wrote:
> > > > There were reservations on the multi-socket case of ITMT, maybe it would
> > > > help to spell those out in great detail here. That is, have the comment
> > > > explain the policy instead of simply stating what the code does (which
> > > > is always bad comment policy, you can read the code just fine).
> > > What is the objection for multi sockets? If it improves the behaviour then
> > > why would this be a bad thing for multi sockets?
> > For multi-socket (server system), it is much more likely that they will
> > have multiple cpus in a socket busy and not run in turbo mode. So the extra
> > work in migrating the workload to the one with extra headroom will
> > not make use of those headroom in that scenario. I will update the comment
> > to reflect this policy.
> So on a single socket server system the extra work does not matter, right?
> Don't tell me that single socket server systems are irrelevant. Intel is
> actively promoting single socket CPUs, like XEON D, for high density
> servers...
>
> Instead of handwaving arguments I prefer a proper analysis of what the
> overhead is and why it is not a good thing for loaded servers in general.
>
> Then instead of slapping half-baked heuristics into the code, we should sit
> down and think a bit harder about it.
>

The ITMT scheduling overhead should be small: mostly a small number of
cycles initially spent to idle balance tasks towards an idle favored core,
plus the cycles needed to refill hot data in the mid level cache for the
migrated task. Those should be a very small percentage of the cycles that
the task spends running on the favored core. Any extra boost in frequency
should compensate for that, so this should be a good trade-off.

After some internal discussions, we think we should enable the ITMT feature
by default for all systems supporting ITMT. I will remove the single socket
restriction.

Thanks.

Tim
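The trade-off argued above can be written as a break-even estimate. Assume a task needs W cycles of work, the migration costs M extra cycles (idle balancing plus cache refill), and the favored core runs at a frequency higher by a fraction b. Staying put takes W/f, migrating takes (W + M) / (f(1 + b)), so migrating wins once W > M / b. The sketch below only encodes that inequality; the function names and any numbers fed into it are assumptions for illustration, not measurements from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Cycles of work a task must perform on the favored core before the
 * migration overhead is paid back.
 *
 * overhead_cycles: cycles lost to idle balancing and cache refill (assumed)
 * boost:           relative frequency gain of the favored core,
 *                  e.g. 0.25 for a 25% higher frequency (assumed)
 */
static double itmt_break_even_cycles(double overhead_cycles, double boost)
{
	assert(overhead_cycles >= 0.0 && boost > 0.0);
	return overhead_cycles / boost;
}

/* True when running work_cycles on the favored core finishes sooner. */
static bool itmt_migration_pays_off(double work_cycles,
				    double overhead_cycles, double boost)
{
	return work_cycles > itmt_break_even_cycles(overhead_cycles, boost);
}
```

Under this model the migration is a loss only for tasks that run for fewer than M / b cycles after being moved, which is the kind of figure the requested overhead analysis would need to back with measurements.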
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index a73fb80..46ebdd1 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -155,23 +155,26 @@ extern bool x86_topology_update;
 #include <asm/percpu.h>
 
 DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+extern unsigned int __read_mostly sysctl_sched_itmt_enabled;
 
 /* Interface to set priority of a cpu */
 void sched_set_itmt_core_prio(int prio, int core_cpu);
 
 /* Interface to notify scheduler that system supports ITMT */
-void sched_set_itmt_support(void);
+int sched_set_itmt_support(void);
 
 /* Interface to notify scheduler that system revokes ITMT support */
 void sched_clear_itmt_support(void);
 
 #else /* CONFIG_SCHED_ITMT */
 
+#define sysctl_sched_itmt_enabled	0
 static inline void sched_set_itmt_core_prio(int prio, int core_cpu)
 {
 }
-static inline void sched_set_itmt_support(void)
+static inline int sched_set_itmt_support(void)
 {
+	return 0;
 }
 static inline void sched_clear_itmt_support(void)
 {
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 63c9b3e..e999e6e 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -34,6 +34,67 @@ DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
 /* Boolean to track if system has ITMT capabilities */
 static bool __read_mostly sched_itmt_capable;
 
+/*
+ * Boolean to control whether we want to move processes to cpu capable
+ * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
+ * Technology 3.0.
+ *
+ * It can be set via /proc/sys/kernel/sched_itmt_enabled
+ */
+unsigned int __read_mostly sysctl_sched_itmt_enabled;
+
+static int sched_itmt_update_handler(struct ctl_table *table, int write,
+			      void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+	unsigned int old_sysctl;
+
+	mutex_lock(&itmt_update_mutex);
+
+	if (!sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return -EINVAL;
+	}
+
+	old_sysctl = sysctl_sched_itmt_enabled;
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (!ret && write && old_sysctl != sysctl_sched_itmt_enabled) {
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
+	mutex_unlock(&itmt_update_mutex);
+
+	return ret;
+}
+
+static unsigned int zero;
+static unsigned int one = 1;
+static struct ctl_table itmt_kern_table[] = {
+	{
+		.procname	= "sched_itmt_enabled",
+		.data		= &sysctl_sched_itmt_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sched_itmt_update_handler,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{}
+};
+
+static struct ctl_table itmt_root_table[] = {
+	{
+		.procname	= "kernel",
+		.mode		= 0555,
+		.child		= itmt_kern_table,
+	},
+	{}
+};
+
+static struct ctl_table_header *itmt_sysctl_header;
+
 /**
  * sched_set_itmt_support() - Indicate platform supports ITMT
  *
@@ -45,14 +106,44 @@ static bool __read_mostly sched_itmt_capable;
  *
  * This must be done only after sched_set_itmt_core_prio
  * has been called to set the cpus' priorities.
+ * It must not be called with cpu hot plug lock
+ * held as we need to acquire the lock to rebuild sched domains
+ * later.
+ *
+ * Return: 0 on success
  */
-void sched_set_itmt_support(void)
+int sched_set_itmt_support(void)
 {
 	mutex_lock(&itmt_update_mutex);
 
+	if (sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return 0;
+	}
+
+	itmt_sysctl_header = register_sysctl_table(itmt_root_table);
+	if (!itmt_sysctl_header) {
+		mutex_unlock(&itmt_update_mutex);
+		return -ENOMEM;
+	}
+
 	sched_itmt_capable = true;
 
+	/*
+	 * ITMT capability automatically enables ITMT
+	 * scheduling for small systems (single node).
+	 */
+	if (topology_num_packages() == 1)
+		sysctl_sched_itmt_enabled = 1;
+
+	if (sysctl_sched_itmt_enabled) {
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
 	mutex_unlock(&itmt_update_mutex);
+
+	return 0;
 }
 
 /**
@@ -61,13 +152,30 @@ void sched_set_itmt_support(void)
  * This function is used by the OS to indicate that it has
  * revoked the platform's support of ITMT feature.
  *
+ * It must not be called with cpu hot plug lock
+ * held as we need to acquire the lock to rebuild sched domains
+ * later.
  */
 void sched_clear_itmt_support(void)
 {
 	mutex_lock(&itmt_update_mutex);
 
+	if (!sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return;
+	}
 	sched_itmt_capable = false;
 
+	if (itmt_sysctl_header)
+		unregister_sysctl_table(itmt_sysctl_header);
+
+	if (sysctl_sched_itmt_enabled) {
+		/* disable sched_itmt if we are no longer ITMT capable */
+		sysctl_sched_itmt_enabled = 0;
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
 	mutex_unlock(&itmt_update_mutex);
 }
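The behaviour of `sched_itmt_update_handler()` in the patch can be summarised as three rules: reject access while the platform is not ITMT capable, accept only 0 or 1, and rebuild the sched domains only when a write actually changes the value. The following user-space model is a sketch of those rules under stated assumptions: `struct itmt_sysctl_model` and `model_update_handler()` are hypothetical names, the range check stands in for `proc_dointvec_minmax()` with `extra1`/`extra2` set to 0 and 1, and the mutex is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of the state the handler touches. */
struct itmt_sysctl_model {
	bool capable;          /* models sched_itmt_capable */
	unsigned int enabled;  /* models sysctl_sched_itmt_enabled */
	bool topology_update;  /* models x86_topology_update */
	int rebuilds;          /* counts modelled rebuild_sched_domains() calls */
};

/*
 * write != 0 models a write of val to the sysctl file; write == 0 models
 * a read and leaves the state untouched.
 */
static int model_update_handler(struct itmt_sysctl_model *s, int write,
				unsigned int val)
{
	unsigned int old;

	assert(s);

	if (!s->capable)
		return -EINVAL;        /* same early exit as the patch */

	old = s->enabled;

	if (write) {
		if (val > 1)
			return -EINVAL; /* outside the 0..1 range */
		s->enabled = val;
	}

	/* Rebuild only when a write changed the value. */
	if (write && old != s->enabled) {
		s->topology_update = true;
		s->rebuilds++;
	}

	return 0;
}
```

Writing the same value twice therefore triggers a single rebuild in this model, which matches the `old_sysctl != sysctl_sched_itmt_enabled` test in the patch.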