Message ID: 20190807143119.8360-1-jgross@suse.com
Series: enhance lock debugging
On 07/08/2019 15:31, Juergen Gross wrote:
> While hunting a locking problem in my core scheduling series I have
> added some debugging aids to spinlock handling making it easier to
> find the root cause for the problem.
>
> Making use of the already existing lock profiling and enhancing it a
> little bit produces some really valuable diagnostic data e.g. when an
> NMI watchdog is triggering a crash.

So I'm not sure where best to report this in the series, so 0/$N will
have to do.  Here is a sample trace from panic():

(XEN) Xen lock profile info SHOW (now = 49695791886 total = 49695791886)
(XEN) Global grant_lock: addr=ffff82d080818640, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global balloon_lock: addr=ffff82d080818630, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global osvw_lock: addr=ffff82d080818610, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global mtrr_mutex: addr=ffff82d0808185f0, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global set_atomicity_lock: addr=ffff82d0808185e0, lockval=80008, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global cmci_discover_lock: addr=ffff82d0808185c0, lockval=80008, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global mce_logout_lock: addr=ffff82d0808185a0, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global processing_lock: addr=ffff82d080818580, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global vpmu_lock: addr=ffff82d080818560, lockval=40004, cpu=4095
(XEN)   lock: 4(00000000:0000007D), block: 0(00000000:00000000)
(XEN) Global pm_lock: addr=ffff82d080818230, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global rtc_lock: addr=ffff82d080818210, lockval=310031, cpu=4095
(XEN)   lock: 48(00000000:0001892E), block: 0(00000000:00000000)
(XEN) Global pit_lock: addr=ffff82d080818200, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global platform_timer_lock: addr=ffff82d0808181f0, lockval=6c006c, cpu=4095
(XEN)   lock: 92(00000000:00012CAB), block: 0(00000000:00000000)
(XEN) Global sync_lock: addr=ffff82d0808181d0, lockval=c188c188, cpu=4095
(XEN)   lock: 115080(00000000:0074F130), block: 115079(00000000:0871000E)
(XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
(XEN)   lock: 1(00000000:01322165), block: 0(00000000:00000000)
(XEN) Global flush_lock: addr=ffff82d0808181c0, lockval=91139111, cpu=3
(XEN)   lock: 168195(00000000:159C6F6E), block: 1695(00000000:006B31FD)
(XEN) Global xenpf_lock: addr=ffff82d0808181b0, lockval=2f002f, cpu=4095
(XEN)   lock: 47(00000000:0001BC02), block: 0(00000000:00000000)
(XEN) Global pci_config_lock: addr=ffff82d0808181a0, lockval=374d374d, cpu=4095
(XEN)   lock: 5196(00000000:004545AA), block: 0(00000000:00000000)
(XEN) Global lapic_nmi_owner_lock: addr=ffff82d080817d70, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global last_lock: addr=ffff82d080817d40, lockval=40034003, cpu=4095
(XEN)   lock: 16387(00000000:00047132), block: 0(00000000:00000000)
(XEN) Global map_pgdir_lock: addr=ffff82d080817d50, lockval=2c002c, cpu=4095
(XEN)   lock: 38(00000000:00000ACF), block: 0(00000000:00000000)
(XEN) Global microcode_mutex: addr=ffff82d080817d20, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global microcode_update_lock: addr=ffff82d080817d10, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global microcode_update_lock: addr=ffff82d080817d00, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global vector_lock: addr=ffff82d080817cf0, lockval=440044, cpu=4095
(XEN)   lock: 27(00000000:00003B20), block: 0(00000000:00000000)
(XEN) Global irq_ratelimit_lock: addr=ffff82d080817ce0, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global msix_fixmap_lock: addr=ffff82d080817cb0, lockval=20002, cpu=4095
(XEN)   lock: 2(00000000:000000CF), block: 0(00000000:00000000)
(XEN) Global ioapic_lock: addr=ffff82d080817c90, lockval=660066, cpu=4095
(XEN)   lock: 20(00000000:00026E6C), block: 0(00000000:00000000)
(XEN) Global i8259A_lock: addr=ffff82d080817c80, lockval=2d002d, cpu=4095
(XEN)   lock: 6(00000000:00001BA0), block: 0(00000000:00000000)
(XEN) Global apei_iomaps_lock: addr=ffff82d080817c50, lockval=110011, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global erst_lock: addr=ffff82d080817c40, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global shared_intremap_lock: addr=ffff82d080817bc0, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global iommu_pt_cleanup_lock: addr=ffff82d080817ba0, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global rx_lock: addr=ffff82d080817b70, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global tx_lock: addr=ffff82d080817b60, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global console_lock: addr=ffff82d080817830, lockval=770077, cpu=4095
(XEN)   lock: 3447(00000000:07595894), block: 0(00000000:00000000)
(XEN) Global ratelimit_lock: addr=ffff82d080817820, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global pool_list_lock: addr=ffff82d0808177e0, lockval=10001, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global xenoprof_lock: addr=ffff82d0808177d0, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global pmu_owner_lock: addr=ffff82d0808177c0, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global vm_lock: addr=ffff82d0808177b0, lockval=220022, cpu=4095
(XEN)   lock: 17(00000000:00001ED1), block: 0(00000000:00000000)
(XEN) Global virtual_region_lock: addr=ffff82d0808177a0, lockval=10001, cpu=4095
(XEN)   lock: 1(00000000:00000054), block: 0(00000000:00000000)
(XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global wc_lock: addr=ffff82d080817770, lockval=c000c, cpu=4095
(XEN)   lock: 11(00000000:00000995), block: 0(00000000:00000000)
(XEN) Global tasklet_lock: addr=ffff82d080817760, lockval=3f403f4, cpu=4095
(XEN)   lock: 426(00000000:00007AFF), block: 0(00000000:00000000)
(XEN) Global sysctl_lock: addr=ffff82d080817750, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global symbols_mutex: addr=ffff82d080817740, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global stopmachine_lock: addr=ffff82d080817730, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global call_lock: addr=ffff82d080817710, lockval=4d004d, cpu=4095
(XEN)   lock: 67(00000000:00176310), block: 0(00000000:00000000)
(XEN) Global heap_lock: addr=ffff82d080817570, lockval=6b286b28, cpu=4095
(XEN)   lock: 753071(00000000:018E9B8B), block: 33(00000000:0000916A)
(XEN) Global payload_lock: addr=ffff82d080817550, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global crash_notes_lock: addr=ffff82d080817530, lockval=80008, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global global_virq_handlers_lock: addr=ffff82d080817500, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global domlist_update_lock: addr=ffff82d0808174e0, lockval=10001, cpu=4095
(XEN)   lock: 1(00000000:0000005B), block: 0(00000000:00000000)
(XEN) Global domctl_lock: addr=ffff82d0808174d0, lockval=50005, cpu=4095
(XEN)   lock: 5(00000000:000008DC), block: 0(00000000:00000000)
(XEN) Global cpupool_lock: addr=ffff82d0808174b0, lockval=a000a, cpu=4095
(XEN)   lock: 1(00000000:00000176), block: 0(00000000:00000000)
(XEN) Global cpu_add_remove_lock: addr=ffff82d0808174a0, lockval=2d002d, cpu=4095
(XEN)   lock: 24(00000000:0000175F), block: 0(00000000:00000000)
(XEN) Global efi_rs_lock: addr=ffff82d080817030, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Domain 0 event_lock: addr=ffff8308366c50e8, lockval=1d451d45, cpu=4095
(XEN)   lock: 7493(00000000:00866052), block: 0(00000000:00000000)
(XEN) Domain 0 page_alloc_lock: addr=ffff8308366c5028, lockval=9aa59aa5, cpu=4095
(XEN)   lock: 39589(00000000:0055339E), block: 0(00000000:00000000)
(XEN) Domain 0 domain_lock: addr=ffff8308366c5018, lockval=150015, cpu=4095
(XEN)   lock: 21(00000000:00009D70), block: 0(00000000:00000000)
(XEN) Domain 32767 page_alloc_lock: addr=ffff83084ccc5028, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Domain 32767 domain_lock: addr=ffff83084ccc5018, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Domain 32753 page_alloc_lock: addr=ffff83084cccf028, lockval=3ee43ee4, cpu=4095
(XEN)   lock: 16100(00000000:0005173F), block: 0(00000000:00000000)
(XEN) Domain 32753 domain_lock: addr=ffff83084cccf018, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Domain 32754 page_alloc_lock: addr=ffff83084ccd0028, lockval=64006400, cpu=4095
(XEN)   lock: 25600(00000000:000814D9), block: 0(00000000:00000000)
(XEN) Domain 32754 domain_lock: addr=ffff83084ccd0018, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)

It's not exactly the easiest dump to follow.

First of all - I don't see why the hold/block times are printed like
that.  It might be a holdover from the 32bit build, pre PRId64 support.
They should probably use PRI_stime anyway.

The domid rendering is unfortunate.  Ideally we'd use %pd but that would
involve rearranging the logic to get a struct domain* in hand.  Seeing
as you're the last person to modify this code, how hard would that be to
do?

If/when the per-domain locks are rendered more clearly, the "Global "
prefix can be dropped.

An interesting quirk of ticketlocks is that lockval= is redundant with
lock_cnt.  Could we perhaps fold the two fields?

Distinguishing cpu=4095 as "not locked" would be useful.

We have several global locks called lock:

(XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
(XEN)   lock: 1(00000000:01322165), block: 0(00000000:00000000)
(XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
(XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
(XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)

The second one in particular has corrupt data, seeing as it has been
taken and released several times without lock_cnt increasing.

For sanity's sake, we should enforce unique naming of any lock
registered for profiling.

~Andrew
On 07.08.19 20:11, Andrew Cooper wrote:
> On 07/08/2019 15:31, Juergen Gross wrote:
>> While hunting a locking problem in my core scheduling series I have
>> added some debugging aids to spinlock handling making it easier to
>> find the root cause for the problem.
>>
>> Making use of the already existing lock profiling and enhancing it a
>> little bit produces some really valuable diagnostic data e.g. when an
>> NMI watchdog is triggering a crash.
>
> So I'm not sure where best to report this in the series, so 0/$N will
> have to do.  Here is a sample trace from panic():
>
> <snip>
>
> It's not exactly the easiest dump to follow.
>
> First of all - I don't see why the hold/block times are printed like
> that.  It might be a holdover from the 32bit build, pre PRId64
> support.  They should probably use PRI_stime anyway.

Fine with me.

> The domid rendering is unfortunate.  Ideally we'd use %pd but that
> would involve rearranging the logic to get a struct domain* in hand.
> Seeing as you're the last person to modify this code, how hard would
> that be to do?

It would completely break the struct type agnostic design.

I have a debug patch adding the credit2 run-queue lock.  It requires
adding just 5 lines of code (and this includes moving the
spinlock_init() out of an irq-off section).  It will produce:

(XEN) credit2-runq 0 lock: addr=ffff830413351010, lockval=de00ddf, cpu=6
(XEN)   lock: 896287(00000000:00FAA6B2), block: 819(00000000:00009C80)

> If/when the per-domain locks are rendered more clearly, the "Global "
> prefix can be dropped.

I'm not sure we should do that.

> An interesting quirk of ticketlocks is that lockval= is redundant with
> lock_cnt.  Could we perhaps fold the two fields?

No.  See my credit2 example.  As soon as we get above 65535 lockings
the values are no longer redundant.

> Distinguishing cpu=4095 as "not locked" would be useful.

Yes, will change.

> We have several global locks called lock:
>
> (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
> (XEN)   lock: 1(00000000:01322165), block: 0(00000000:00000000)
> (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
> (XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
> (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
> (XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
> (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
> (XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
> (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
> (XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
>
> The second one in particular has corrupt data, seeing as it has been
> taken and released several times without lock_cnt increasing.

The lock might have been taken/released before lock profiling has been
initialized.

> For sanity's sake, we should enforce unique naming of any lock
> registered for profiling.

This would be every lock inited via DEFINE_SPINLOCK().  I can do a
followup patch for that purpose.


Juergen
On 08/08/2019 05:50, Juergen Gross wrote:
> On 07.08.19 20:11, Andrew Cooper wrote:
>>
>> <snip>
>>
>> It's not exactly the easiest dump to follow.
>>
>> First of all - I don't see why the hold/block times are printed like
>> that.  It might be a holdover from the 32bit build, pre PRId64
>> support.  They should probably use PRI_stime anyway.
>
> Fine with me.
>
>> The domid rendering is unfortunate.  Ideally we'd use %pd but that
>> would involve rearranging the logic to get a struct domain* in hand.
>> Seeing as you're the last person to modify this code, how hard would
>> that be to do?
>
> It would completely break the struct type agnostic design.

Ok.  As an alternative, how about %pdr which takes a raw domid?  It
would be a trivial adjustment in the vsnprintf code, and could plausibly
be useful elsewhere where we have a domid and not a domain pointer.

> I have a debug patch adding the credit2 run-queue lock.  It requires
> adding just 5 lines of code (and this includes moving the
> spinlock_init() out of an irq-off section).  It will produce:
>
> (XEN) credit2-runq 0 lock: addr=ffff830413351010, lockval=de00ddf, cpu=6
> (XEN)   lock: 896287(00000000:00FAA6B2), block: 819(00000000:00009C80)

What is the 0 here?
>> We have several global locks called lock:
>>
>> (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
>> (XEN)   lock: 1(00000000:01322165), block: 0(00000000:00000000)
>> (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
>> (XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
>> (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
>> (XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
>> (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
>> (XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
>> (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
>> (XEN)   lock: 0(00000000:00000000), block: 0(00000000:00000000)
>>
>> The second one in particular has corrupt data, seeing as it has been
>> taken and released several times without lock_cnt increasing.
>
> The lock might have been taken/released before lock profiling has been
> initialized.

What is there to initialise?  It all looks statically set up.

>> For sanity's sake, we should enforce unique naming of any lock
>> registered for profiling.
>
> This would be every lock inited via DEFINE_SPINLOCK().  I can do a
> followup patch for that purpose.

I was wondering how to do this.  One option which comes to mind is to
put a non-static object into .discard.lock_profile or something, so the
linker will object to repeated symbol names and then throw all of them
away.

~Andrew
On 08.08.2019 10:33, Andrew Cooper wrote:
> On 08/08/2019 05:50, Juergen Gross wrote:
>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>
>>> <snip>
>>>
>>> The domid rendering is unfortunate.  Ideally we'd use %pd but that
>>> would involve rearranging the logic to get a struct domain* in hand.
>>> Seeing as you're the last person to modify this code, how hard would
>>> that be to do?
>>
>> It would completely break the struct type agnostic design.
>
> Ok.  As an alternative, how about %pdr which takes a raw domid?  It
> would be a trivial adjustment in the vsnprintf code, and could
> plausibly be useful elsewhere where we have a domid and not a domain
> pointer.

At the risk of upsetting / annoying you: a domid_t would again better be
formatted via %od (and without the need to take the address of a
respective variable).  In the case here it would have the minor
additional benefit of conserving on format string length.

Jan
On 08.08.19 10:33, Andrew Cooper wrote:
> On 08/08/2019 05:50, Juergen Gross wrote:
>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>
>>> <snip>
>>>
>>> The domid rendering is unfortunate.  Ideally we'd use %pd but that
>>> would involve rearranging the logic to get a struct domain* in hand.
>>> Seeing as you're the last person to modify this code, how hard would
>>> that be to do?
>>
>> It would completely break the struct type agnostic design.
>
> Ok.  As an alternative, how about %pdr which takes a raw domid?  It
> would be a trivial adjustment in the vsnprintf code, and could
> plausibly be useful elsewhere where we have a domid and not a domain
> pointer.

Lock profiling has no knowledge that it is working on a struct domain.
It is just working on a lock in a struct and an index to differentiate
the struct instance.  Using the domid as the index is just for making it
easier to understand the printout.

I wouldn't want e.g. a per-event lock to be named "IDLE" just because it
happens to be index 32767.

>> I have a debug patch adding the credit2 run-queue lock.  It requires
>> adding just 5 lines of code (and this includes moving the
>> spinlock_init() out of an irq-off section).  It will produce:
>>
>> (XEN) credit2-runq 0 lock: addr=ffff830413351010, lockval=de00ddf, cpu=6
>> (XEN)   lock: 896287(00000000:00FAA6B2), block: 819(00000000:00009C80)
>
> What is the 0 here?

The runq number.

>>> We have several global locks called lock:
>>>
>>> <snip>
>>>
>>> The second one in particular has corrupt data, seeing as it has been
>>> taken and released several times without lock_cnt increasing.
>>
>> The lock might have been taken/released before lock profiling has
>> been initialized.
>
> What is there to initialise?  It all looks statically set up.

lock->profile is set only in lock_prof_init().

>>> For sanity's sake, we should enforce unique naming of any lock
>>> registered for profiling.
>>
>> This would be every lock inited via DEFINE_SPINLOCK().  I can do a
>> followup patch for that purpose.
>
> I was wondering how to do this.  One option which comes to mind is to
> put a non-static object into .discard.lock_profile or something, so
> the linker will object to repeated symbol names and then throw all of
> them away.

I could just drop the "static" in the _LOCK_PROFILE_PTR() macro.  At the
same time I should maybe move the ".lockprofile.data" section in the
linker scripts to the init part.


Juergen
On 08/08/2019 10:08, Jan Beulich wrote:
> On 08.08.2019 10:33, Andrew Cooper wrote:
>> On 08/08/2019 05:50, Juergen Gross wrote:
>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>> <snip>
>>>> It's not exactly the easiest dump to follow.
>>>>
>>>> First of all - I don't see why the hold/block times are printed
>>>> like that. It might be a holdover from the 32bit build, pre PRId64
>>>> support. They should probably use PRI_stime anyway.
>>> Fine with me.
>>>
>>>> The domid rendering is unfortunate. Ideally we'd use %pd but that
>>>> would involve rearranging the logic to get a struct domain* in
>>>> hand. Seeing as you're the last person to modify this code, how
>>>> hard would that be to do?
>>> It would completely break the struct type agnostic design.
>> Ok. As an alternative, how about %pdr which takes a raw domid? It
>> would be a trivial adjustment in the vsnprintf code, and could
>> plausibly be useful elsewhere where we have a domid and not a domain
>> pointer.
> At the risk of upsetting / annoying you:

Yes, you really have.

> A domid_t would again better be formatted via %od (and without the
> need to take the address of a respective variable). In the case here
> it would have the minor additional benefit of conserving on format
> string length.

There are concrete reasons why it is a terrible idea: none of this has
anything to do with octal, and %od has a well-defined meaning which is
not related to domids. There is also a very good reason why all the
magic is hidden behind one single formatter.

You seem hell-bent on making it actively confusing and complicated to
write and read printk()'s, and I am not willing to lumber anyone
developing on Xen with this cognitive complexity.

I am sick to death of having the same argument over and over, so let me
stop any further wasting of time and be absolutely crystal clear.

NACK to any form of overloading %o.

~Andrew
On 08/08/2019 10:36, Juergen Gross wrote:
> On 08.08.19 10:33, Andrew Cooper wrote:
>> On 08/08/2019 05:50, Juergen Gross wrote:
>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>>
>>>> <snip>
>>>> It's not exactly the easiest dump to follow.
>>>>
>>>> First of all - I don't see why the hold/block times are printed
>>>> like that. It might be a holdover from the 32bit build, pre PRId64
>>>> support. They should probably use PRI_stime anyway.
>>>
>>> Fine with me.
>>>
>>>> The domid rendering is unfortunate. Ideally we'd use %pd but that
>>>> would involve rearranging the logic to get a struct domain* in
>>>> hand. Seeing as you're the last person to modify this code, how
>>>> hard would that be to do?
>>>
>>> It would completely break the struct type agnostic design.
>>
>> Ok. As an alternative, how about %pdr which takes a raw domid? It
>> would be a trivial adjustment in the vsnprintf code, and could
>> plausibly be useful elsewhere where we have a domid and not a domain
>> pointer.
>
> Lock profiling has no knowledge that it is working on a struct domain.
> It is just working on a lock in a struct and an index to differentiate
> the struct instance. Using the domid as the index just makes the
> printout easier to understand.
>
> I wouldn't want e.g. a per-event lock to be named "IDLE" just because
> it happens to be index 32767.

Ok, but clearly there is something which distinguishes domain indices
from other indices?

>
>>
>>>
>>>> We have several global locks called lock:
>>>>
>>>> (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
>>>> (XEN)     lock: 1(00000000:01322165), block: 0(00000000:00000000)
>>>> (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
>>>> (XEN)     lock: 0(00000000:00000000), block: 0(00000000:00000000)
>>>> (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
>>>> (XEN)     lock: 0(00000000:00000000), block: 0(00000000:00000000)
>>>> (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
>>>> (XEN)     lock: 0(00000000:00000000), block: 0(00000000:00000000)
>>>> (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
>>>> (XEN)     lock: 0(00000000:00000000), block: 0(00000000:00000000)
>>>>
>>>> The second one in particular has corrupt data, seeing as it has
>>>> been taken and released several times without lock_cnt increasing.
>>>
>>> The lock might have been taken/released before lock profiling has
>>> been initialized.
>>
>> What is there to initialise? It all looks statically set up.
>
> lock->profile is set only in lock_prof_init().

Ah - so it is. I wonder if this can be done at compile time? It's just
a backreference for the forward reference which is done at
initialisation. (Wow, this code is complicated to follow.)

I think it can be done with a forward declaration of

  static struct lock_profile __lock_profile_data_##l

and passing something other than NULL into _SPIN_LOCK_UNLOCKED(), but
that will break the ability to do static DEFINE_SPINLOCK().

Probably not worth messing with this now, but perhaps something to
think over.

~Andrew
On 08.08.2019 11:45, Andrew Cooper wrote:
> On 08/08/2019 10:08, Jan Beulich wrote:
>> On 08.08.2019 10:33, Andrew Cooper wrote:
>>> On 08/08/2019 05:50, Juergen Gross wrote:
>>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>>> <snip>
>>>>> It's not exactly the easiest dump to follow.
>>>>>
>>>>> First of all - I don't see why the hold/block times are printed
>>>>> like that. It might be a holdover from the 32bit build, pre PRId64
>>>>> support. They should probably use PRI_stime anyway.
>>>> Fine with me.
>>>>
>>>>> The domid rendering is unfortunate. Ideally we'd use %pd but that
>>>>> would involve rearranging the logic to get a struct domain* in
>>>>> hand. Seeing as you're the last person to modify this code, how
>>>>> hard would that be to do?
>>>> It would completely break the struct type agnostic design.
>>> Ok. As an alternative, how about %pdr which takes a raw domid? It
>>> would be a trivial adjustment in the vsnprintf code, and could
>>> plausibly be useful elsewhere where we have a domid and not a domain
>>> pointer.
>> At the risk of upsetting / annoying you:
>
> Yes, you really have.
>
>> A domid_t would again better be formatted via %od (and without the
>> need to take the address of a respective variable). In the case here
>> it would have the minor additional benefit of conserving on format
>> string length.
>
> There are concrete reasons why it is a terrible idea: none of this has
> anything to do with octal, and %od has a well-defined meaning which is
> not related to domids.

I'm curious to learn what well-defined meaning %od has.

> There is also a very good reason why all the magic is hidden behind
> one single formatter.
>
> You seem hell-bent on making it actively confusing and complicated to
> write and read printk()'s, and I am not willing to lumber anyone
> developing on Xen with this cognitive complexity.
>
> I am sick to death of having the same argument over and over, so let
> me stop any further wasting of time and be absolutely crystal clear.
>
> NACK to any form of overloading %o

In which case, if I took a similar position for the PCI SBDF formatting
using %pp, we'd be in a dead end.

Waste of time or not - while you have made crystal clear why you
personally dislike overloading %o, afaic you've not provided any
non-subjective (i.e. truly technical) reasons, which is what might help
convince me of sticking to just %p overloading.

Jan
On 08.08.19 12:20, Andrew Cooper wrote:
> On 08/08/2019 10:36, Juergen Gross wrote:
>> On 08.08.19 10:33, Andrew Cooper wrote:
>>> On 08/08/2019 05:50, Juergen Gross wrote:
>>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>>>
>>>>> <snip>
>>>>> It's not exactly the easiest dump to follow.
>>>>>
>>>>> First of all - I don't see why the hold/block times are printed
>>>>> like that. It might be a holdover from the 32bit build, pre PRId64
>>>>> support. They should probably use PRI_stime anyway.
>>>>
>>>> Fine with me.
>>>>
>>>>> The domid rendering is unfortunate. Ideally we'd use %pd but that
>>>>> would involve rearranging the logic to get a struct domain* in
>>>>> hand. Seeing as you're the last person to modify this code, how
>>>>> hard would that be to do?
>>>>
>>>> It would completely break the struct type agnostic design.
>>>
>>> Ok. As an alternative, how about %pdr which takes a raw domid? It
>>> would be a trivial adjustment in the vsnprintf code, and could
>>> plausibly be useful elsewhere where we have a domid and not a domain
>>> pointer.
>>
>> Lock profiling has no knowledge that it is working on a struct domain.
>> It is just working on a lock in a struct and an index to differentiate
>> the struct instance. Using the domid as the index just makes the
>> printout easier to understand.
>>
>> I wouldn't want e.g. a per-event lock to be named "IDLE" just because
>> it happens to be index 32767.
>
> Ok, but clearly there is something which distinguishes domain indices
> from other indices?

Not for lock profiling.

Juergen
On 08.08.19 11:36, Juergen Gross wrote:
> On 08.08.19 10:33, Andrew Cooper wrote:
>> On 08/08/2019 05:50, Juergen Gross wrote:
>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>> For sanity's sake, we should enforce unique naming of any lock
>>>> registered for profiling.
>>>
>>> This would be every lock inited via DEFINE_SPINLOCK(). I can do a
>>> followup patch for that purpose.
>>
>> I was wondering how to do this. One option which comes to mind is to
>> put a non-static object into .discard.lock_profile or something, so
>> the linker will object to repeated symbol names and then throw all of
>> them away.
>
> I could just drop the "static" in the _LOCK_PROFILE_PTR() macro.

Unfortunately this doesn't work for locks defined in a function body.

Juergen