diff mbox series

x86/x2APIC: correct cluster tracking upon CPUs going down for S3

Message ID 0a082833-3d52-4c9d-8ea4-5be01986cd2c@suse.com (mailing list archive)
State New
Headers show
Series x86/x2APIC: correct cluster tracking upon CPUs going down for S3 | expand

Commit Message

Jan Beulich Aug. 8, 2024, 1:30 p.m. UTC
Downing CPUs for S3 is somewhat special: Since we can expect the system
to come back up in exactly the same hardware configuration, per-CPU data
for the secondary CPUs isn't de-allocated (and then cleared upon re-
allocation when the CPUs are being brought back up). Therefore the
cluster_cpus per-CPU pointer will retain its value for all CPUs other
than the final one in a cluster (i.e. in particular for all CPUs in the
same cluster as CPU0). That, however, is in conflict with the assertion
early in init_apic_ldr_x2apic_cluster().

Note that the issue is avoided on Intel hardware, where we park CPUs
instead of bringing them down.

Extend the bypassing of the freeing to the suspend case, thus making
suspend/resume also a tiny bit faster.

Fixes: 2e6c8f182c9c ("x86: distinguish CPU offlining from CPU removal")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Comments

Andrew Cooper Aug. 23, 2024, 1:42 p.m. UTC | #1
On 08/08/2024 2:30 pm, Jan Beulich wrote:
> Downing CPUs for S3 is somewhat special: Since we can expect the system
> to come back up in exactly the same hardware configuration, per-CPU data
> for the secondary CPUs isn't de-allocated (and then cleared upon re-
> allocation when the CPUs are being brought back up). Therefore the
> cluster_cpus per-CPU pointer will retain its value for all CPUs other
> than the final one in a cluster (i.e. in particular for all CPUs in the
> same cluster as CPU0). That, however, is in conflict with the assertion
> early in init_apic_ldr_x2apic_cluster().
>
> Note that the issue is avoided on Intel hardware, where we park CPUs
> instead of bringing them down.
>
> Extend the bypassing of the freeing to the suspend case, thus making
> suspend/resume also a tiny bit faster.
>
> Fixes: 2e6c8f182c9c ("x86: distinguish CPU offlining from CPU removal")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
diff mbox series

Patch

--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -228,7 +228,8 @@  static int cf_check update_clusterinfo(
     case CPU_UP_CANCELED:
     case CPU_DEAD:
     case CPU_REMOVE:
-        if ( park_offline_cpus == (action != CPU_REMOVE) )
+        if ( park_offline_cpus == (action != CPU_REMOVE) ||
+             system_state == SYS_STATE_suspend )
             break;
         if ( per_cpu(cluster_cpus, cpu) )
         {