[00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck

Message ID 1479770177.6544.195.camel@intel.com (mailing list archive)
State Not Applicable, archived

Commit Message

Pandruvada, Srinivas Nov. 21, 2016, 11:16 p.m. UTC
On Mon, 2016-11-21 at 22:34 +0100, Thomas Gleixner wrote:
> On Mon, 21 Nov 2016, Pandruvada, Srinivas wrote:


[...]

> Stupid me. I tested putting a socket offline, which works, but did not
> check what happens on module removal. Delta fix below. That needs to be
> folded into the series as the wreckage already happens before the last
> patch.


Your change below fixes the crash issue. I then tested the case where the
last CPU of a package is offlined: the thermal zone was removed, and it was
added back once any CPU from the package came online again. So this is
working.

I want to run some workload on those CPUs to bump up the temperature and
check interrupts, but I am hitting an issue that may be unrelated to this
change. I onlined three CPUs from package 1.

[189443.567728] smpboot: Booting Node 1 Processor 15 APIC 0x2e
[189656.625947] smpboot: Booting Node 1 Processor 8 APIC 0x20
[189829.545851] smpboot: Booting Node 1 Processor 24 APIC 0x21

But I can't schedule anything on those CPUs. For example, I now can't run
turbostat; it complains:
"
turbostat: re-initialized with num_cpus 19
Could not migrate to CPU 8
"

Same with

#taskset 0x100 stress -c 1
taskset: failed to set pid 0's affinity: Invalid argument
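
For reference, taskset reads its argument as a hexadecimal bitmask in which bit N selects CPU N, so 0x100 selects CPU 8, the same CPU turbostat could not migrate to. sched_setaffinity(2) documents EINVAL ("Invalid argument") for a mask that contains no CPU the thread is currently allowed to run on. The sketch below only illustrates the mask arithmetic; the helper names `lowest_cpu_in_mask` and `mask_for_cpu` are hypothetical and are not taken from taskset or turbostat, and nothing here diagnoses the failure above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the lowest CPU number selected by a taskset-style bitmask,
 * or -1 if the mask is empty. Bit N set means CPU N is selected.
 * Hypothetical helper, limited to the first 64 CPUs.
 */
static int lowest_cpu_in_mask(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (mask & ((uint64_t)1 << cpu))
			return cpu;
	}
	return -1;
}

/*
 * Build the single-CPU mask that would be passed to taskset for the
 * given CPU. Hypothetical helper; cpu must be below 64.
 */
static uint64_t mask_for_cpu(unsigned int cpu)
{
	assert(cpu < 64);
	return (uint64_t)1 << cpu;
}
```

With these helpers, `lowest_cpu_in_mask(0x100)` evaluates to 8 and `mask_for_cpu(8)` evaluates to 0x100.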

I am on the latest linux-pm/linux-next tree on this server. I will
switch to the latest mainline and try again.

Thanks,
Srinivas



8<--------------------
        spin_unlock_irq(&pkg_temp_lock);
@@ -399,13 +401,15 @@ static int pkg_temp_thermal_device_add(u
 
 static int pkg_thermal_cpu_offline(unsigned int cpu)
 {
-       int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
        struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
        bool lastcpu, was_target;
+       int target;
 
        if (!pkgdev)
                return 0;
 
+       target = cpumask_any_but(&pkgdev->cpumask, cpu);
+       cpumask_clear_cpu(cpu, &pkgdev->cpumask);
        lastcpu = target >= nr_cpu_ids;
 
        /*
@@ -492,8 +496,10 @@ static int pkg_thermal_cpu_online(unsign
                return -ENODEV;
 
        /* If the package exists, nothing to do */
-       if (pkgdev)
+       if (pkgdev) {
+               cpumask_set_cpu(cpu, &pkgdev->cpumask);
                return 0;
+       }
        return pkg_temp_thermal_device_add(cpu);
 }
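
The delta above replaces the topology_core_cpumask() lookup with a cpumask kept in struct pkg_device: the online path sets the CPU's bit, and the offline path picks a replacement target with cpumask_any_but() before clearing the outgoing CPU's bit. The following is a minimal userspace model of that bookkeeping, not kernel code. It assumes at most 64 CPUs, uses a plain 64-bit word in place of struct cpumask, and every `model_*` name and `MODEL_NR_CPUS` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64	/* stand-in for nr_cpu_ids (assumption) */

/* Simplified stand-in for struct pkg_device: only the new cpumask field. */
struct pkg_model {
	uint64_t cpumask;	/* bit N set: CPU N of this package is tracked */
};

/*
 * Mimics cpumask_any_but(): return a CPU set in mask other than cpu,
 * or MODEL_NR_CPUS if there is none.
 */
static unsigned int model_any_but(uint64_t mask, unsigned int cpu)
{
	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++) {
		if (i != cpu && (mask & ((uint64_t)1 << i)))
			return i;
	}
	return MODEL_NR_CPUS;
}

/* Online path: record the CPU in its package's mask. */
static void model_cpu_online(struct pkg_model *pkg, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	pkg->cpumask |= (uint64_t)1 << cpu;
}

/*
 * Offline path: pick a replacement target first, then drop the CPU.
 * Returns true when the outgoing CPU was the last one in the package,
 * matching the "lastcpu = target >= nr_cpu_ids" check in the patch.
 */
static bool model_cpu_offline(struct pkg_model *pkg, unsigned int cpu,
			      unsigned int *target)
{
	assert(cpu < MODEL_NR_CPUS);
	*target = model_any_but(pkg->cpumask, cpu);
	pkg->cpumask &= ~((uint64_t)1 << cpu);
	return *target >= MODEL_NR_CPUS;
}
```

In this model, onlining CPUs 8 and 15 and then offlining CPU 8 yields target 15 and leaves the package populated; offlining CPU 15 afterwards reports it as the last CPU, which is the point where the driver would tear the package device down.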

Comments

Thomas Gleixner Nov. 22, 2016, 9:05 a.m. UTC | #1
On Mon, 21 Nov 2016, Pandruvada, Srinivas wrote:
> Your change below fixes the crash issue. I then tested the case where the
> last CPU of a package is offlined: the thermal zone was removed, and it was
> added back once any CPU from the package came online again. So this is
> working.
> 
> I want to run some workload on those CPUs to bump up the temperature and
> check interrupts, but I am hitting an issue that may be unrelated to this
> change. I onlined three CPUs from package 1.
> 
> [189443.567728] smpboot: Booting Node 1 Processor 15 APIC 0x2e
> [189656.625947] smpboot: Booting Node 1 Processor 8 APIC 0x20
> [189829.545851] smpboot: Booting Node 1 Processor 24 APIC 0x21
> 
> But I can't schedule anything on those CPUs. For example, I now can't run
> turbostat; it complains:
> "
> turbostat: re-initialized with num_cpus 19
> Could not migrate to CPU 8
> "
> 
> Same with
> 
> #taskset 0x100 stress -c 1
> taskset: failed to set pid 0's affinity: Invalid argument
> 
> I am on the latest linux-pm/linux-next tree on this server. I will
> switch to the latest mainline and try again.

That must be something unrelated. I can use turbostat and taskset after
doing the above.

Thanks,

	tglx

Patch

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -63,6 +63,7 @@  struct pkg_device {
        u32                             msr_pkg_therm_high;
        struct delayed_work             work;
        struct thermal_zone_device      *tzone;
+       struct cpumask                  cpumask;
 };
 
 static struct thermal_zone_params pkg_temp_tz_params = {
@@ -391,6 +392,7 @@  static int pkg_temp_thermal_device_add(u
        rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, pkgdev->msr_pkg_therm_low,
              pkgdev->msr_pkg_therm_high);
 
+       cpumask_set_cpu(cpu, &pkgdev->cpumask);
        spin_lock_irq(&pkg_temp_lock);
        packages[pkgid] = pkgdev;