Message ID | alpine.DEB.2.20.1612121102260.3429@nanos (mailing list archive) |
---|---|
State | New, archived |
On 12/12/2016 05:04 AM, Thomas Gleixner wrote:
> The logical package management has several issues:
>
>  - The APIC ids provided by ACPI are not required to be the same as the
>    initial APIC id which can be retrieved by CPUID. The APIC ids provided
>    by ACPI are those which are written by the BIOS into the APIC. The
>    initial id is set by hardware and can not be changed. The hardware
>    provided ids contain the real hardware package information.
>
>    Especially AMD sets the effective APIC id different from the hardware id
>    as they need to reserve space for the IOAPIC ids starting at id 0.
>
>    As a consequence those machines trigger the currently active firmware
>    bug printouts in dmesg. These are obviously wrong.
>
>  - Virtual machines have their own interesting ways of enumerating APICs and
>    packages which are not reliably covered by the current implementation.
>
> The sizing of the mapping array has been tweaked to be generously large to
> handle systems which provide a wrong core count when HT is disabled so the
> whole magic which checks for space in the physical hotplug case is not
> needed anymore.
>
> Simplify the whole machinery and do the mapping when the CPU starts and the
> CPUID derived physical package information is available. This solves the
> observed problems on AMD machines and works for the virtualization issues
> as well.
>
> Remove the extra call from XEN cpu bringup code as it is no longer
> required.
>
> Fixes: d49597fd3bc7 ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
> Reported-and-tested-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org

For Xen:
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

(Note that we still have
    [Firmware Bug]: CPU13: APIC id mismatch. Firmware: 0 APIC: d
but that will be fixed in Xen code.)

-boris
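To make the first issue in the description concrete, here is a minimal user-space sketch of how a package id derived from a firmware-written APIC id can disagree with the one derived from the hardware id. The helper names, the 3 core-id bits and the offset of 4 are purely illustrative assumptions and do not come from any real machine; only the shift mirrors the `apicid >> boot_cpu_data.x86_coreid_bits` expression that the patch removes.

```c
#include <stdio.h>

/* Hypothetical helper: drop the low core/thread bits of an APIC id. */
static unsigned int pkg_from_apicid(unsigned int apicid, unsigned int coreid_bits)
{
	return apicid >> coreid_bits;
}

/*
 * Model ncpus CPUs whose hardware (initial) APIC ids are 0..ncpus-1 and
 * whose firmware-written ids are shifted up by fw_offset (assumed layout).
 * Returns how many CPUs end up with a firmware-derived package id that
 * differs from the hardware-derived one.
 */
static unsigned int count_pkg_mismatches(unsigned int ncpus,
					 unsigned int coreid_bits,
					 unsigned int fw_offset)
{
	unsigned int cpu, mismatches = 0;

	for (cpu = 0; cpu < ncpus; cpu++) {
		unsigned int hw_pkg = pkg_from_apicid(cpu, coreid_bits);
		unsigned int fw_pkg = pkg_from_apicid(cpu + fw_offset, coreid_bits);

		if (hw_pkg != fw_pkg) {
			printf("CPU%u: firmware package %u, hardware package %u\n",
			       cpu, fw_pkg, hw_pkg);
			mismatches++;
		}
	}
	return mismatches;
}
```

Calling `count_pkg_mismatches(8, 3, 4)` in this model reports a mismatch for the upper four CPUs, because their shifted ids cross into the next 8-id block; with an offset of 0 nothing is reported. A real machine may lay out its ids quite differently.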
Hi Thomas,

there is a problem booting recent kernels on some Xen domUs hosted by
provider JiffyBox.

The kernel seems to crash just after logging
[ 0.038700] SMP alternatives: switching to SMP code

We started seeing this with 4.9.2 and bisecting the 4.9 stable kernels
determined that this commit introduced the problem. Reverting it from 4.9.2
makes the kernel boot again.

Older kernels (starting from 3.16 up to and including 4.9.1) were running
fine in this setup. But recent mainline (tested 4.12-rc3) and 4.9.x both
fail to boot there.

Unfortunately we have no detailed information about the hypervisor or
setup and the provider is not very forthcoming with details. I'm attaching
dmesg of a successful boot (4.9.2 with this commit reverted).

It shows a fairly old XEN version:

[ 0.000000] Xen version: 3.1.2-416.el5 (preserve-AD)

Any ideas?

Thanks and kind regards,
Max

git bisect start
# good: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
git bisect good 69973b830859bc6529a7a0468ba0d80ee5117826
# bad: [a8c90ef62281db933118aa84489eb0e1e9cc347c] Linux 4.9.25
git bisect bad a8c90ef62281db933118aa84489eb0e1e9cc347c
# bad: [84d209b75e7254fba5de26ee2d3b31e638337a82] target: Fix COMPARE_AND_WRITE ref leak for non GOOD status
git bisect bad 84d209b75e7254fba5de26ee2d3b31e638337a82
# bad: [3f41ee3a45cb3b2a458e7c4aa69c0638fd745ad2] drm/i915/gen9: Fix PCODE polling during CDCLK change notification
git bisect bad 3f41ee3a45cb3b2a458e7c4aa69c0638fd745ad2
# bad: [2b95c939cb88c3182e9dd681d4cf40b70985b8a5] usb: gadget: composite: Test get_alt() presence instead of set_alt()
git bisect bad 2b95c939cb88c3182e9dd681d4cf40b70985b8a5
# good: [8e1b86f30bc1e3d213d269a74b3375a06ba8199f] drm/amdgpu: Also call cursor_move_locked when the cursor size changes
git bisect good 8e1b86f30bc1e3d213d269a74b3375a06ba8199f
# bad: [afd2a1994ea4e37fb602410b350827d5909714fe] Input: drv260x - fix input device's parent assignment
git bisect bad afd2a1994ea4e37fb602410b350827d5909714fe
# good: [9d33a399566771b023a08f490344b70a200c87da] scsi: avoid a permanent stop of the scsi device's request queue
git bisect good 9d33a399566771b023a08f490344b70a200c87da
# good: [e80ceb2da52e0aae8e0ae9632c3abbfdd579cf61] vsock/virtio: fix src/dst cid format
git bisect good e80ceb2da52e0aae8e0ae9632c3abbfdd579cf61
# bad: [5984423bf7ebea12f953e4665aa72ccff83623d1] IB/multicast: Check ib_find_pkey() return value
git bisect bad 5984423bf7ebea12f953e4665aa72ccff83623d1
# bad: [a035dc674dd477e61e5b917c60c30622b6d083f8] x86/smpboot: Make logical package management more robust
git bisect bad a035dc674dd477e61e5b917c60c30622b6d083f8
# good: [3168762e8ad3600392b0b6e230e550271c68fe36] platform/x86: asus-nb-wmi.c: Add X45U quirk
git bisect good 3168762e8ad3600392b0b6e230e550271c68fe36
# first bad commit: [a035dc674dd477e61e5b917c60c30622b6d083f8] x86/smpboot: Make logical package management more robust

[ 0.000000] Linux version 4.9.2-bisect-00001-g3327865 (aaa@example.com) (gcc version 4.9.2 (Debian 4.9.2-10) ) #13 SMP Tue Jun 6 12:01:42 UTC 2017
[ 0.000000] Command line: root=/dev/xvda ro cgroup_enable=memory apparmor=1 security=apparmor
[ 0.000000] x86/fpu: Legacy x87 FPU detected.
[ 0.000000] x86/fpu: Using 'eager' FPU context switches.
[ 0.000000] ACPI in unprivileged domain disabled
[ 0.000000] Released 0 page(s)
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[ 0.000000] Xen: [mem 0x0000000000100000-0x00000000807fffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] Hypervisor detected: Xen
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] e820: last_pfn = 0x80800 max_arch_pfn = 0x400000000
[ 0.000000] MTRR: Disabled
[ 0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC- UC WC WP UC UC
[ 0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 24576
[ 0.000000] BRK [0x01fb4000, 0x01fb4fff] PGTABLE
[ 0.000000] BRK [0x01fb5000, 0x01fb5fff] PGTABLE
[ 0.000000] BRK [0x01fb6000, 0x01fb6fff] PGTABLE
[ 0.000000] BRK [0x01fb7000, 0x01fb7fff] PGTABLE
[ 0.000000] BRK [0x01fb8000, 0x01fb8fff] PGTABLE
[ 0.000000] BRK [0x01fb9000, 0x01fb9fff] PGTABLE
[ 0.000000] BRK [0x01fba000, 0x01fbafff] PGTABLE
[ 0.000000] BRK [0x01fbb000, 0x01fbbfff] PGTABLE
[ 0.000000] BRK [0x01fbc000, 0x01fbcfff] PGTABLE
[ 0.000000] BRK [0x01fbd000, 0x01fbdfff] PGTABLE
[ 0.000000] BRK [0x01fbe000, 0x01fbefff] PGTABLE
[ 0.000000] BRK [0x01fbf000, 0x01fbffff] PGTABLE
[ 0.000000] RAMDISK: [mem 0x01fe1000-0x0264afff]
[ 0.000000] NUMA turned off
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x00000000807fffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x7fc17000-0x7fc1bfff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x00000000807fffff]
[ 0.000000] Normal empty
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009ffff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x00000000807fffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x00000000807fffff]
[ 0.000000] On node 0 totalpages: 526239
[ 0.000000] DMA zone: 64 pages used for memmap
[ 0.000000] DMA zone: 21 pages reserved
[ 0.000000] DMA zone: 3999 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 8160 pages used for memmap
[ 0.000000] DMA32 zone: 522240 pages, LIFO batch:31
[ 0.000000] p2m virtual area at ffffc90000000000, size is 40000000
[ 0.000000] Remapped 0 page(s)
[ 0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[ 0.000000] smpboot: Allowing 3 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[ 0.000000] e820: [mem 0x80800000-0xffffffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on Xen
[ 0.000000] Xen version: 3.1.2-416.el5 (preserve-AD)
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:3 nr_node_ids:1
[ 0.000000] percpu: Embedded 35 pages/cpu @ffff88007d000000 s105240 r8192 d29928 u524288
[ 0.000000] pcpu-alloc: s105240 r8192 d29928 u524288 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1 2 -
[ 0.000000] xen: PV spinlocks enabled
[ 0.000000] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes)
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 517994
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: root=/dev/xvda ro cgroup_enable=memory apparmor=1 security=apparmor
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Calgary: detecting Calgary via BIOS EBDA area
[ 0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[ 0.000000] Memory: 2030340K/2104956K available (7412K kernel code, 1426K rwdata, 3120K rodata, 1480K init, 848K bss, 74616K reserved, 0K cma-reserved)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] Build-time adjustment of leaf fanout to 64.
[ 0.000000] RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=3.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=3
[ 0.000000] Using NULL legacy PIC
[ 0.000000] NR_IRQS:33024 nr_irqs:64 0
[ 0.000000] xen:events: Using 2-level ABI
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] console [hvc0] enabled
[ 0.000000] clocksource: xen: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000000] Xen: using vcpuop timer interface
[ 0.000000] installing Xen timer for CPU 0
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2133.309 MHz processor
[ 0.008000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4266.81 BogoMIPS (lpj=8533632)
[ 0.008000] pid_max: default: 32768 minimum: 301
[ 0.008000] Security Framework initialized
[ 0.008000] Yama: becoming mindful.
[ 0.008000] AppArmor: AppArmor initialized
[ 0.008000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.008000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.008000] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.008000] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.008000] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 CPUID: 12
[ 0.008000] [Firmware Bug]: CPU0: Using firmware package id 0 instead of 18
[ 0.008000] Last level iTLB entries: 4KB 512, 2MB 7, 4MB 7
[ 0.008000] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
[ 0.039801] ftrace: allocating 28842 entries in 113 pages
[ 0.048116] cpu 0 spinlock event irq 1
[ 0.048133] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.048140] smpboot: Max logical packages: 1
[ 0.048148] VPMU disabled by hypervisor.
[ 0.048172] Performance Events: unsupported p6 CPU model 44 no PMU driver, software events only.
[ 0.048962] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 0.048973] NMI watchdog: Shutting down hard lockup detector on all cpus
[ 0.049148] installing Xen timer for CPU 1
[ 0.049187] SMP alternatives: switching to SMP code
[ 0.076036] cpu 1 spinlock event irq 13
[ 0.077674] installing Xen timer for CPU 2
[ 0.008000] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 0 CPUID: 20
[ 0.008000] [Firmware Bug]: CPU2: Using firmware package id 0 instead of 32
[ 0.076037] cpu 2 spinlock event irq 20
[ 0.077674] x86: Booted up 1 node, 3 CPUs
[ 0.077674] devtmpfs: initialized
[ 0.077674] x86/mm: Memory block size: 128MB
[ 0.081523] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.081523] pinctrl core: initialized pinctrl subsystem
[ 0.081523] NET: Registered protocol family 16
[ 0.081523] xen:grant_table: Grant tables using version 1 layout
[ 0.081523] Grant table initialized
[ 0.081523] PCI: setting up Xen PCI frontend stub
[ 0.081523] PCI: pci_cache_line_size set to 64 bytes
[ 0.096068] ACPI: Interpreter disabled.
[ 0.096081] xen:balloon: Initialising balloon driver
[ 0.100033] xen_balloon: Initialising balloon driver
[ 0.100062] vgaarb: loaded
[ 0.100079] dmi: Firmware registration failed.
[ 0.100124] PCI: System does not support PCI
[ 0.100124] PCI: System does not support PCI
[ 0.100184] clocksource: Switched to clocksource xen
[ 0.110141] VFS: Disk quotas dquot_6.6.0
[ 0.110177] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.110211] hugetlbfs: disabling because there are no supported hugepage sizes
[ 0.110259] AppArmor: AppArmor Filesystem Enabled
[ 0.110305] pnp: PnP ACPI: disabled
[ 0.110857] random: fast init done
[ 0.113749] NET: Registered protocol family 2
[ 0.113985] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.114063] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.114112] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.114156] UDP hash table entries: 1024 (order: 3, 32768 bytes)
[ 0.114175] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
[ 0.186960] NET: Registered protocol family 1
[ 0.186982] PCI: CLS 0 bytes, default 64
[ 0.187065] Unpacking initramfs...
[ 0.196523] Freeing initrd memory: 6568K (ffff880001fe1000 - ffff88000264b000)
[ 0.196820] kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
[ 0.197630] futex hash table entries: 1024 (order: 4, 65536 bytes)
[ 0.197687] audit: initializing netlink subsys (disabled)
[ 0.197722] audit: type=2000 audit(1496753281.728:1): initialized
[ 0.198325] Initialise system trusted keyrings
[ 0.198481] workingset: timestamp_bits=40 max_order=19 bucket_order=0
[ 0.198557] zbud: loaded
[ 0.267817] NET: Registered protocol family 38
[ 0.267833] Key type asymmetric registered
[ 0.267840] Asymmetric key parser 'x509' registered
[ 0.267888] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[ 0.267965] io scheduler noop registered
[ 0.267973] io scheduler deadline registered
[ 0.267986] io scheduler cfq registered (default)
[ 0.268149] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[ 0.268164] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[ 0.268197] intel_idle: does not run on family 6 model 44
[ 0.268849] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.269305] Linux agpgart interface v0.103
[ 0.269347] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[ 0.269354] AMD IOMMUv2 functionality not available on this system
[ 0.271755] loop: module loaded
[ 0.271764] Invalid max_queues (4), will use default max: 3.
[ 0.277796] tun: Universal TUN/TAP device driver, 1.6
[ 0.277806] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[ 0.277879] xen_netfront: Initialising Xen virtual ethernet driver
[ 0.283767] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: disabled;
[ 0.284987] i8042: PNP: No PS/2 controller found. Probing ports directly.
[ 0.290649] blkfront: xvdb: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: disabled;
[ 1.212043] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1ec077e4ba6, max_idle_ns: 440795309929 ns
[ 1.288242] i8042: No controller found
[ 1.288419] mousedev: PS/2 mouse device common for all mice
[ 1.288520] input: PC Speaker as /devices/platform/pcspkr/input/input0
[ 1.288641] device-mapper: uevent: version 1.0.3
[ 1.288720] device-mapper: ioctl: 4.35.0-ioctl (2016-06-23) initialised: dm-devel@redhat.com
[ 1.288795] ledtrig-cpu: registered to indicate activity on CPUs
[ 1.288804] dmi-sysfs: dmi entry is absent.
[ 1.288888] Netfilter messages via NETLINK v0.30.
[ 1.288992] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 1.289071] ctnetlink v0.93: registering with nfnetlink.
[ 1.289628] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 1.289816] NET: Registered protocol family 10
[ 1.290189] mip6: Mobile IPv6
[ 1.290204] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 1.290828] NET: Registered protocol family 17
[ 1.290850] sctp: Hash tables configured (bind 256/256)
[ 1.290918] mpls_gso: MPLS GSO support
[ 1.290927] mce: Unable to init device /dev/mcelog (rc: -5)
[ 1.291025] microcode: sig=0x206c2, pf=0x1, revision=0xc
[ 1.291108] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[ 1.291268] registered taskstats version 1
[ 1.291278] Loading compiled-in X.509 certificates
[ 1.291305] zswap: loaded using pool lzo/zbud
[ 1.295214] Key type encrypted registered
[ 1.295229] AppArmor: AppArmor sha1 policy hashing enabled
[ 1.295269] hctosys: unable to open rtc device (rtc0)
[ 1.295293] PM: Hibernation image not present or could not be loaded.
[ 1.296109] Freeing unused kernel memory: 1480K (ffffffff81d66000 - ffffffff81ed8000)
[ 1.296122] Write protecting the kernel read-only data: 12288k
[ 1.299830] Freeing unused kernel memory: 756K (ffff880001743000 - ffff880001800000)
[ 1.300377] Freeing unused kernel memory: 976K (ffff880001b0c000 - ffff880001c00000)
[ 1.300395] ------------[ cut here ]------------
[ 1.300406] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x639/0x800
[ 1.300417] x86/mm: Found insecure W+X mapping at address ffff880000000000/0xffff880000000000
[ 1.300427] Modules linked in:
[ 1.300435] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.9.2-bisect-00001-g3327865 #13
[ 1.300446] 0000000000000000 ffffffff813ed495 ffffc90040317de0 0000000000000000
[ 1.300458] ffffffff810cee84 0000000000000000 ffffc90040317e38 8010000000000065
[ 1.300470] ffffc00000000fff ffffc90040317ed0 0000000000000000 ffffffff810ceeff
[ 1.300482] Call Trace:
[ 1.300491] [<ffffffff813ed495>] ? dump_stack+0x5c/0x77
[ 1.300499] [<ffffffff810cee84>] ? __warn+0xc4/0xe0
[ 1.300506] [<ffffffff810ceeff>] ? warn_slowpath_fmt+0x5f/0x80
[ 1.300513] [<ffffffff810be609>] ? note_page+0x639/0x800
[ 1.300520] [<ffffffff810beb22>] ? ptdump_walk_pgd_level_core+0x352/0x400
[ 1.300531] [<ffffffff81727300>] ? rest_init+0x80/0x80
[ 1.300537] [<ffffffff81727326>] ? kernel_init+0x26/0x100
[ 1.300545] [<ffffffff81734a75>] ? ret_from_fork+0x25/0x30
[ 1.300552] ---[ end trace 4ccffb8d1f2dada7 ]---
[ 1.316671] x86/mm: Checked W+X mappings: FAILED, 5529 W+X pages found.
[ 1.637410] EXT4-fs (xvda): mounted filesystem with ordered data mode. Opts: (null)
[ 2.307402] systemd[1]: Failed to insert module 'kdbus': Function not implemented
[ 2.356830] systemd[1]: systemd 230 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
[ 2.356941] systemd[1]: Detected virtualization xen.
[ 2.356956] systemd[1]: Detected architecture x86-64.
[ 2.383519] systemd[1]: Set hostname to <neon.leisure.amaext.net>.
[ 3.049425] systemd[1]: Listening on Syslog Socket.
[ 3.051540] systemd[1]: Created slice User and Session Slice.
[ 3.051688] systemd[1]: Listening on fsck to fsckd communication Socket.
[ 3.051866] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[ 3.052102] systemd[1]: Listening on Network Service Netlink Socket.
[ 3.052275] systemd[1]: Listening on Journal Socket.
[ 3.182750] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 3.282990] EXT4-fs (xvda): re-mounted. Opts: (null)
[ 3.311244] systemd-journald[347]: Received request to flush runtime journal from PID 1
[ 4.451580] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[ 4.690547] Adding 524284k swap on /dev/mapper/swap. Priority:-1 extents:1 across:524284k SSFS
[ 5.429801] audit: type=1400 audit(1496753286.960:2): apparmor="STATUS" operation="profile_load" name="gst_plugin_scanner" pid=851 comm="apparmor_parser"
[ 5.446321] audit: type=1400 audit(1496753286.976:3): apparmor="STATUS" operation="profile_load" name="/sbin/klogd" pid=852 comm="apparmor_parser"
[ 5.482514] audit: type=1400 audit(1496753287.012:4): apparmor="STATUS" operation="profile_load" name="/sbin/syslogd" pid=853 comm="apparmor_parser"
[ 5.491485] audit: type=1400 audit(1496753287.020:5): apparmor="STATUS" operation="profile_load" name="/sbin/syslog-ng" pid=854 comm="apparmor_parser"
[ 5.498191] audit: type=1400 audit(1496753287.028:6): apparmor="STATUS" operation="profile_load" name="/{usr/,}bin/ping" pid=850 comm="apparmor_parser"
[ 5.577004] audit: type=1400 audit(1496753287.108:7): apparmor="STATUS" operation="profile_load" name="/usr/bin/irssi" pid=857 comm="apparmor_parser"
[ 5.630754] audit: type=1400 audit(1496753287.160:8): apparmor="STATUS" operation="profile_load" name="/usr/bin/pidgin" pid=858 comm="apparmor_parser"
[ 5.630780] audit: type=1400 audit(1496753287.160:9): apparmor="STATUS" operation="profile_load" name="/usr/bin/pidgin//launchpad_integration" pid=858 comm="apparmor_parser"
[ 5.630800] audit: type=1400 audit(1496753287.160:10): apparmor="STATUS" operation="profile_load" name="/usr/bin/pidgin//sanitized_helper" pid=858 comm="apparmor_parser"
[ 5.656699] audit: type=1400 audit(1496753287.188:11): apparmor="STATUS" operation="profile_load" name="/usr/bin/evince" pid=856 comm="apparmor_parser"
[ 6.268508] random: crng init done
[ 6.936495] xt_CT: No such helper "pptp"
[ 6.985142] xt_CT: No such helper "snmp"
[ 6.998182] xt_CT: No such helper "irc"
[ 7.011142] xt_CT: No such helper "irc-0"
[ 7.025865] xt_CT: No such helper "netbios-ns"
[ 7.197284] xt_addrtype: ipv6 does not support BROADCAST matching
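A note on reading the two "[Firmware Bug]" lines in the log above: judging by the `pr_err()` format strings that the patch removes, the APIC ids are printed with `%x` (hexadecimal) while the package ids are printed with `%u` (decimal). The small sketch below redoes that arithmetic; the helper name is hypothetical, and the shift of 0 is only an inference from the numbers, since the log does not print `x86_coreid_bits`.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: take an APIC id exactly as it appears in the log
 * (hexadecimal, no 0x prefix) and derive a package id by shifting out
 * coreid_bits low bits, as the pre-patch code did.
 */
static unsigned int pkg_from_logged_apicid(const char *hex, unsigned int coreid_bits)
{
	unsigned long apicid = strtoul(hex, NULL, 16);

	return (unsigned int)(apicid >> coreid_bits);
}
```

With a shift of 0, the logged CPUID values `12` and `20` give 18 and 32, which matches the "instead of 18" and "instead of 32" package ids in the log, while the firmware value `0` gives package 0.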
On 06/06/2017 09:39 AM, Max Vozeler wrote:
> Hi Thomas,
>
> there is a problem booting recent kernels on some Xen domUs hosted by
> provider JiffyBox.
>
> The kernel seems to crash just after logging
> [ 0.038700] SMP alternatives: switching to SMP code

Do you have the crash splat? Stack trace and such.

In fact, full boot log might be useful.

>
> We started seeing this with 4.9.2 and bisecting the 4.9 stable kernels
> determined that this commit introduced the problem. Reverting it from 4.9.2
> makes the kernel boot again.
>
> Older kernels (starting from 3.16 up to and including 4.9.1) were running
> fine in this setup. But recent mainline (tested 4.12-rc3) and 4.9.x both
> fail to boot there.
>
> Unfortunately we have no detailed information about the hypervisor or
> setup and the provider is not very forthcoming with details. I'm attaching
> dmesg of a successful boot (4.9.2 with this commit reverted).
>
> It shows a fairly old XEN version:
>
> [ 0.000000] Xen version: 3.1.2-416.el5 (preserve-AD)

This is a 10 year old hypervisor so it's not especially surprising that
newer kernels don't work. (If anything, I am surprised that you actually
booted 4.9 at all).

There have been a bunch of problems in this area (topology) on PV guests.

-boris
On Tue, Jun 06, 2017 at 09:48:37PM -0400, Boris Ostrovsky wrote:
> On 06/06/2017 09:39 AM, Max Vozeler wrote:
> >there is a problem booting recent kernels on some Xen domUs hosted by
> >provider JiffyBox.
> >
> >The kernel seems to crash just after logging
> >[ 0.038700] SMP alternatives: switching to SMP code
>
> Do you have the crash splat? Stack trace and such.
>
> In fact, full boot log might be useful.

Unfortunately, we don't have much more information. Just after "switching
to SMP code" the console connection is lost and we get a notification that
the VM has crashed. I'm attaching the boot log up to that point.. just in
case.

I have asked the hosting provider if they can provide XEN hypervisor logs.

> >We started seeing this with 4.9.2 and bisecting the 4.9 stable kernels
> >determined that this commit introduced the problem. Reverting it from 4.9.2
> >makes the kernel boot again.
> >
> >Older kernels (starting from 3.16 up to and including 4.9.1) were running
> >fine in this setup. But recent mainline (tested 4.12-rc3) and 4.9.x both
> >fail to boot there.
> >
> >Unfortunately we have no detailed information about the hypervisor or
> >setup and the provider is not very forthcoming with details. I'm attaching
> >dmesg of a successful boot (4.9.2 with this commit reverted).
> >
> >It shows a fairly old XEN version:
> >
> >[ 0.000000] Xen version: 3.1.2-416.el5 (preserve-AD)
>
> This is a 10 year old hypervisor so it's not especially surprising that
> newer kernels don't work. (If anything, I am surprised that you actually
> booted 4.9 at all).
>
> There have been a bunch of problems in this area (topology) on PV guests.
Thanks and kind regards,
Max

Filesystem type is ext2fs, using whole disk
kernel /boot/vmlinuz-4.9.2-bisect-00033-g2b95c93 root=/dev/xvda ro cgroup_enable=memory apparmor=1 security=apparmor
initrd /boot/initrd.img-4.9.2-bisect-00033-g2b95c93
============= Init TPM Front ================
Tpmfront:Error Unable to read device/vtpm/0/backend-id during tpmfront initialization! error = ENOENT
Tpmfront:Info Shutting down tpmfront
close blk: backend=/local/domain/0/backend/vbd/226/51712 node=device/vbd/51712
close blk: backend=/local/domain/0/backend/vbd/226/51728 node=device/vbd/51728
[ 0.000000] Linux version 4.9.2-bisect-00033-g2b95c93 (aaa@example.com) (gcc version 4.9.2 (Debian 4.9.2-10) ) #5 SMP Wed May 31 10:54:25 UTC 2017
[ 0.000000] Command line: root=/dev/xvda ro cgroup_enable=memory apparmor=1 security=apparmor
[ 0.000000] x86/fpu: Legacy x87 FPU detected.
[ 0.000000] x86/fpu: Using 'eager' FPU context switches.
[ 0.000000] ACPI in unprivileged domain disabled
[ 0.000000] Released 0 page(s)
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[ 0.000000] Xen: [mem 0x0000000000100000-0x00000000807fffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] Hypervisor detected: Xen
[ 0.000000] e820: last_pfn = 0x80800 max_arch_pfn = 0x400000000
[ 0.000000] MTRR: Disabled
[ 0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC- UC WC WP UC UC
[ 0.000000] RAMDISK: [mem 0x01fe1000-0x0264afff]
[ 0.000000] NUMA turned off
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x00000000807fffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x7fc17000-0x7fc1bfff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x00000000807fffff]
[ 0.000000] Normal empty
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009ffff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x00000000807fffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x00000000807fffff]
[ 0.000000] p2m virtual area at ffffc90000000000, size is 40000000
[ 0.000000] Remapped 0 page(s)
[ 0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[ 0.000000] smpboot: Allowing 3 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[ 0.000000] e820: [mem 0x80800000-0xffffffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on Xen
[ 0.000000] Xen version: 3.1.2-416.el5 (preserve-AD)
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:3 nr_node_ids:1
[ 0.000000] percpu: Embedded 35 pages/cpu @ffff88007d000000 s105240 r8192 d29928 u524288
[ 0.000000] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes)
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 517994
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: root=/dev/xvda ro cgroup_enable=memory apparmor=1 security=apparmor
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Memory: 2030340K/2104956K available (7412K kernel code, 1426K rwdata, 3120K rodata, 1480K init, 848K bss, 74616K reserved, 0K cma-reserved)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] Build-time adjustment of leaf fanout to 64.
[ 0.000000] RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=3.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=3
[ 0.000000] Using NULL legacy PIC
[ 0.000000] NR_IRQS:33024 nr_irqs:64 0
[ 0.000000] xen:events: Using 2-level ABI
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] console [hvc0] enabled
[ 0.000000] clocksource: xen: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000000] installing Xen timer for CPU 0
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2133.431 MHz processor
[ 0.008000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4266.81 BogoMIPS (lpj=8533632)
[ 0.008000] pid_max: default: 32768 minimum: 301
[ 0.008000] Security Framework initialized
[ 0.008000] Yama: becoming mindful.
[ 0.008000] AppArmor: AppArmor initialized
[ 0.008000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.008000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.008000] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.008000] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.008000] Last level iTLB entries: 4KB 512, 2MB 7, 4MB 7
[ 0.008000] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
[ 0.037256] ftrace: allocating 28843 entries in 113 pages
[ 0.044095] cpu 0 spinlock event irq 1
[ 0.044108] smpboot: Max logical packages: 1
[ 0.044114] smpboot: CPU 0 Converting physical 33 to logical package 0
[ 0.044123] VPMU disabled by hypervisor.
[ 0.044144] Performance Events: unsupported p6 CPU model 44 no PMU driver, software events only.
[ 0.044911] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 0.044922] NMI watchdog: Shutting down hard lockup detector on all cpus
[ 0.045096] installing Xen timer for CPU 1
[ 0.045135] SMP alternatives: switching to SMP code
Connection to console lost
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2159,21 +2159,6 @@ int __generic_processor_info(int apicid,
 	}
 
 	/*
-	 * This can happen on physical hotplug. The sanity check at boot time
-	 * is done from native_smp_prepare_cpus() after num_possible_cpus() is
-	 * established.
-	 */
-	if (topology_update_package_map(apicid, cpu) < 0) {
-		int thiscpu = max + disabled_cpus;
-
-		pr_warning("APIC: Package limit reached. Processor %d/0x%x ignored.\n",
-			   thiscpu, apicid);
-
-		disabled_cpus++;
-		return -ENOSPC;
-	}
-
-	/*
 	 * Validate version
 	 */
 	if (version == 0x0) {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -979,29 +979,21 @@ static void x86_init_cache_qos(struct cp
 }
 
 /*
- * The physical to logical package id mapping is initialized from the
- * acpi/mptables information. Make sure that CPUID actually agrees with
- * that.
+ * Validate that ACPI/mptables have the same information about the
+ * effective APIC id and update the package map.
  */
-static void sanitize_package_id(struct cpuinfo_x86 *c)
+static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
-	unsigned int pkg, apicid, cpu = smp_processor_id();
+	unsigned int apicid, cpu = smp_processor_id();
 
 	apicid = apic->cpu_present_to_apicid(cpu);
-	pkg = apicid >> boot_cpu_data.x86_coreid_bits;
 
-	if (apicid != c->initial_apicid) {
-		pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
+	if (apicid != c->apicid) {
+		pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n",
 		       cpu, apicid, c->initial_apicid);
-		c->initial_apicid = apicid;
 	}
-	if (pkg != c->phys_proc_id) {
-		pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
-		       cpu, pkg, c->phys_proc_id);
-		c->phys_proc_id = pkg;
-	}
-	c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
+	BUG_ON(topology_update_package_map(c->phys_proc_id, cpu));
 #else
 	c->logical_proc_id = 0;
 #endif
@@ -1132,7 +1124,6 @@ static void identify_cpu(struct cpuinfo_
 #ifdef CONFIG_NUMA
 	numa_add_cpu(smp_processor_id());
 #endif
-	sanitize_package_id(c);
 }
 
 /*
@@ -1188,6 +1179,7 @@ void identify_secondary_cpu(struct cpuin
 	enable_sep_cpu();
 #endif
 	mtrr_ap_init();
+	validate_apic_and_package_id(c);
 }
 
 struct msr_range {
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -104,7 +104,6 @@ static unsigned int max_physical_pkg_id
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
 static unsigned int logical_packages __read_mostly;
-static bool logical_packages_frozen __read_mostly;
 
 /* Maximum number of SMT threads on any online core */
 int __max_smt_threads __read_mostly;
@@ -263,9 +262,14 @@ static void notrace start_secondary(void
 	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
 }
 
-int topology_update_package_map(unsigned int apicid, unsigned int cpu)
+/**
+ * topology_update_package_map - Update the physical to logical package map
+ * @pkg:	The physical package id as retrieved via CPUID
+ * @cpu:	The cpu for which this is updated
+ */
+int topology_update_package_map(unsigned int pkg, unsigned int cpu)
 {
-	unsigned int new, pkg = apicid >> boot_cpu_data.x86_coreid_bits;
+	unsigned int new;
 
 	/* Called from early boot ? */
 	if (!physical_package_map)
@@ -278,16 +282,17 @@ int topology_update_package_map(unsigned
 	if (test_and_set_bit(pkg, physical_package_map))
 		goto found;
 
-	if (logical_packages_frozen) {
-		physical_to_logical_pkg[pkg] = -1;
-		pr_warn("APIC(%x) Package %u exceeds logical package max\n",
-			apicid, pkg);
+	if (logical_packages >= __max_logical_packages) {
+		pr_warn("Package %u of CPU %u exceeds BIOS package data %u.\n",
+			logical_packages, cpu, __max_logical_packages);
 		return -ENOSPC;
 	}
 
 	new = logical_packages++;
-	pr_info("APIC(%x) Converting physical %u to logical package %u\n",
-		apicid, pkg, new);
+	if (new != pkg) {
+		pr_info("CPU %u Converting physical %u to logical package %u\n",
+			cpu, pkg, new);
+	}
 	physical_to_logical_pkg[pkg] = new;
 
 found:
@@ -308,9 +313,9 @@ int topology_phys_to_logical_pkg(unsigne
 }
 EXPORT_SYMBOL(topology_phys_to_logical_pkg);
 
-static void __init smp_init_package_map(void)
+static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 {
-	unsigned int ncpus, cpu;
+	unsigned int ncpus;
 	size_t size;
 
 	/*
@@ -355,27 +360,9 @@ static void __init smp_init_package_map(
 	size = BITS_TO_LONGS(max_physical_pkg_id) * sizeof(unsigned long);
 	physical_package_map = kzalloc(size, GFP_KERNEL);
 
-	for_each_present_cpu(cpu) {
-		unsigned int apicid = apic->cpu_present_to_apicid(cpu);
-
-		if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
-			continue;
-		if (!topology_update_package_map(apicid, cpu))
-			continue;
-		pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
-		per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
-		set_cpu_possible(cpu, false);
-		set_cpu_present(cpu, false);
-	}
-
-	if (logical_packages > __max_logical_packages) {
-		pr_warn("Detected more packages (%u), then computed by BIOS data (%u).\n",
-			logical_packages, __max_logical_packages);
-		logical_packages_frozen = true;
-		__max_logical_packages = logical_packages;
-	}
-
 	pr_info("Max logical packages: %u\n", __max_logical_packages);
+
+	topology_update_package_map(c->phys_proc_id, cpu);
 }
 
 void __init smp_store_boot_cpu_info(void)
@@ -385,7 +372,7 @@ void __init smp_store_boot_cpu_info(void
 
 	*c = boot_cpu_data;
 	c->cpu_index = id;
-	smp_init_package_map();
+	smp_init_package_map(c, id);
 }
 
 /*
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -87,12 +87,6 @@ static void cpu_bringup(void)
 	cpu_data(cpu).x86_max_cores = 1;
 	set_cpu_sibling_map(cpu);
 
-	/*
-	 * identify_cpu() may have set logical_pkg_id to -1 due
-	 * to incorrect phys_proc_id. Let's re-comupte it.
-	 */
-	topology_update_package_map(apic->cpu_present_to_apicid(cpu), cpu);
-
 	xen_setup_cpu_clockevents();
 
 	notify_cpu_starting(cpu);
The logical package management has several issues:

 - The APIC ids provided by ACPI are not required to be the same as the
   initial APIC id which can be retrieved by CPUID. The APIC ids provided
   by ACPI are those which are written by the BIOS into the APIC. The
   initial id is set by hardware and cannot be changed. The hardware
   provided ids contain the real hardware package information.

   Especially AMD sets the effective APIC id different from the hardware id
   as they need to reserve space for the IOAPIC ids starting at id 0.

   As a consequence those machines trigger the currently active firmware
   bug printouts in dmesg. These are obviously wrong.

 - Virtual machines have their own interesting ways of enumerating APICs
   and packages which are not reliably covered by the current
   implementation.

The sizing of the mapping array has been tweaked to be generously large to
handle systems which provide a wrong core count when HT is disabled, so the
whole magic which checks for space in the physical hotplug case is not
needed anymore.

Simplify the whole machinery and do the mapping when the CPU starts and the
CPUID derived physical package information is available. This solves the
observed problems on AMD machines and works for the virtualization issues
as well.

Remove the extra call from XEN cpu bringup code as it is no longer
required.

Fixes: d49597fd3bc7 ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/apic/apic.c  | 15 ------------
 arch/x86/kernel/cpu/common.c | 24 ++++++--------------
 arch/x86/kernel/smpboot.c    | 51 ++++++++++++++++---------------------------
 arch/x86/xen/smp.c           |  6 -----
 4 files changed, 27 insertions(+), 69 deletions(-)
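
For readers following the thread, the mapping scheme described above (assign a logical package number the first time a CPUID derived physical package id shows up, and refuse once the limit is reached) can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: MAX_PHYS_PKGS, update_package_map(), phys_to_logical_pkg() and the limit of 2 are hypothetical names and values, and fixed-size arrays stand in for the kernel's allocated bitmap and table.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical bound on physical package ids, for illustration only. */
#define MAX_PHYS_PKGS 16

static bool phys_pkg_seen[MAX_PHYS_PKGS];     /* stands in for the package bitmap */
static int phys_to_logical[MAX_PHYS_PKGS];    /* physical id -> logical id */
static unsigned int logical_packages;         /* logical ids handed out so far */
static unsigned int max_logical_packages = 2; /* stands in for the BIOS derived limit */

/*
 * Record the physical package of a starting CPU.
 * Returns 0 on success, -ENOSPC when no logical package slot is left.
 */
static int update_package_map(unsigned int pkg, unsigned int cpu)
{
	assert(pkg < MAX_PHYS_PKGS);

	/* Package already known: nothing to allocate. */
	if (phys_pkg_seen[pkg])
		return 0;

	if (logical_packages >= max_logical_packages) {
		fprintf(stderr, "Package %u of CPU %u exceeds limit %u\n",
			pkg, cpu, max_logical_packages);
		return -ENOSPC;
	}

	/* First CPU of a new package: hand out the next logical id. */
	phys_pkg_seen[pkg] = true;
	phys_to_logical[pkg] = (int)logical_packages++;
	return 0;
}

/* Look up the logical id of a physical package, -1 if it was never mapped. */
static int phys_to_logical_pkg(unsigned int pkg)
{
	if (pkg >= MAX_PHYS_PKGS || !phys_pkg_seen[pkg])
		return -1;
	return phys_to_logical[pkg];
}
```

With the limit of 2 assumed here, CPUs reporting physical packages 4 and 9 end up in logical packages 0 and 1 regardless of the order in which their siblings start, and a CPU reporting a third distinct package gets -ENOSPC.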