[0/4] enhance lock debugging

Message ID: 20190807143119.8360-1-jgross@suse.com

Message

Jürgen Groß Aug. 7, 2019, 2:31 p.m. UTC
While hunting a locking problem in my core scheduling series I have
added some debugging aids to spinlock handling, making it easier to
find the root cause of the problem.

Making use of the already existing lock profiling and enhancing it a
little bit produces some really valuable diagnostic data, e.g. when an
NMI watchdog is triggering a crash.

Juergen Gross (4):
  xen/spinlocks: in debug builds store cpu holding the lock
  xen: add new CONFIG_SPINLOCK_DEBUG option
  xen: print lock profile info in panic()
  xen: modify lock profiling interface

 tools/libxc/xc_misc.c       |   1 +
 tools/misc/xenlockprof.c    |  17 ++----
 xen/Kconfig.debug           |   8 +++
 xen/common/domain.c         |   4 +-
 xen/common/spinlock.c       | 134 +++++++++++++++++++++++++++++++-------------
 xen/drivers/char/console.c  |   4 +-
 xen/include/public/sysctl.h |  11 ++--
 xen/include/xen/spinlock.h  |  42 ++++++++------
 8 files changed, 142 insertions(+), 79 deletions(-)
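As a rough, hypothetical sketch of the kind of per-lock record such lock
profiling maintains (field names and layout here are illustrative, not
taken from the patches): each entry in the sample dump below prints the
acquisition count with the total hold time, the contention count with
the total time spent waiting, plus the raw lock value and the holding
CPU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a per-lock profiling record; names are
 * illustrative, not the patch's.  The debug-build addition from
 * patch 1 is the "cpu" field recording the current holder.
 */
struct lock_profile_sketch {
    const char *name;       /* e.g. "heap_lock" */
    uint64_t    lock_cnt;   /* times the lock was taken */
    int64_t     time_hold;  /* total time held */
    uint64_t    block_cnt;  /* times a cpu had to spin waiting */
    int64_t     time_block; /* total time spent spinning */
    uint16_t    cpu;        /* current holder; 4095 = none (debug builds) */
};
```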

Comments

Andrew Cooper Aug. 7, 2019, 6:11 p.m. UTC | #1
On 07/08/2019 15:31, Juergen Gross wrote:
> While hunting a locking problem in my core scheduling series I have
> added some debugging aids to spinlock handling, making it easier to
> find the root cause of the problem.
> 
> Making use of the already existing lock profiling and enhancing it a
> little bit produces some really valuable diagnostic data, e.g. when an
> NMI watchdog is triggering a crash.

I'm not sure where best to report this in the series, so 0/$N will
have to do.  Here is a sample trace from panic():

  (XEN) Xen lock profile info SHOW  (now = 49695791886 total = 49695791886)
  (XEN) Global grant_lock: addr=ffff82d080818640, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global balloon_lock: addr=ffff82d080818630, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global osvw_lock: addr=ffff82d080818610, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global mtrr_mutex: addr=ffff82d0808185f0, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global set_atomicity_lock: addr=ffff82d0808185e0, lockval=80008, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global cmci_discover_lock: addr=ffff82d0808185c0, lockval=80008, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global mce_logout_lock: addr=ffff82d0808185a0, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global processing_lock: addr=ffff82d080818580, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global vpmu_lock: addr=ffff82d080818560, lockval=40004, cpu=4095
  (XEN)   lock:           4(00000000:0000007D), block:            0(00000000:00000000)
  (XEN) Global pm_lock: addr=ffff82d080818230, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global rtc_lock: addr=ffff82d080818210, lockval=310031, cpu=4095
  (XEN)   lock:          48(00000000:0001892E), block:            0(00000000:00000000)
  (XEN) Global pit_lock: addr=ffff82d080818200, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global platform_timer_lock: addr=ffff82d0808181f0, lockval=6c006c, cpu=4095
  (XEN)   lock:          92(00000000:00012CAB), block:            0(00000000:00000000)
  (XEN) Global sync_lock: addr=ffff82d0808181d0, lockval=c188c188, cpu=4095
  (XEN)   lock:      115080(00000000:0074F130), block:            115079(00000000:0871000E)
  (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
  (XEN)   lock:           1(00000000:01322165), block:            0(00000000:00000000)
  (XEN) Global flush_lock: addr=ffff82d0808181c0, lockval=91139111, cpu=3
  (XEN)   lock:      168195(00000000:159C6F6E), block:            1695(00000000:006B31FD)
  (XEN) Global xenpf_lock: addr=ffff82d0808181b0, lockval=2f002f, cpu=4095
  (XEN)   lock:          47(00000000:0001BC02), block:            0(00000000:00000000)
  (XEN) Global pci_config_lock: addr=ffff82d0808181a0, lockval=374d374d, cpu=4095
  (XEN)   lock:        5196(00000000:004545AA), block:            0(00000000:00000000)
  (XEN) Global lapic_nmi_owner_lock: addr=ffff82d080817d70, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global last_lock: addr=ffff82d080817d40, lockval=40034003, cpu=4095
  (XEN)   lock:       16387(00000000:00047132), block:            0(00000000:00000000)
  (XEN) Global map_pgdir_lock: addr=ffff82d080817d50, lockval=2c002c, cpu=4095
  (XEN)   lock:          38(00000000:00000ACF), block:            0(00000000:00000000)
  (XEN) Global microcode_mutex: addr=ffff82d080817d20, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global microcode_update_lock: addr=ffff82d080817d10, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global microcode_update_lock: addr=ffff82d080817d00, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global vector_lock: addr=ffff82d080817cf0, lockval=440044, cpu=4095
  (XEN)   lock:          27(00000000:00003B20), block:            0(00000000:00000000)
  (XEN) Global irq_ratelimit_lock: addr=ffff82d080817ce0, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global msix_fixmap_lock: addr=ffff82d080817cb0, lockval=20002, cpu=4095
  (XEN)   lock:           2(00000000:000000CF), block:            0(00000000:00000000)
  (XEN) Global ioapic_lock: addr=ffff82d080817c90, lockval=660066, cpu=4095
  (XEN)   lock:          20(00000000:00026E6C), block:            0(00000000:00000000)
  (XEN) Global i8259A_lock: addr=ffff82d080817c80, lockval=2d002d, cpu=4095
  (XEN)   lock:           6(00000000:00001BA0), block:            0(00000000:00000000)
  (XEN) Global apei_iomaps_lock: addr=ffff82d080817c50, lockval=110011, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global erst_lock: addr=ffff82d080817c40, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global shared_intremap_lock: addr=ffff82d080817bc0, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global iommu_pt_cleanup_lock: addr=ffff82d080817ba0, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global rx_lock: addr=ffff82d080817b70, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global tx_lock: addr=ffff82d080817b60, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global console_lock: addr=ffff82d080817830, lockval=770077, cpu=4095
  (XEN)   lock:        3447(00000000:07595894), block:            0(00000000:00000000)
  (XEN) Global ratelimit_lock: addr=ffff82d080817820, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global pool_list_lock: addr=ffff82d0808177e0, lockval=10001, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global xenoprof_lock: addr=ffff82d0808177d0, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global pmu_owner_lock: addr=ffff82d0808177c0, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global vm_lock: addr=ffff82d0808177b0, lockval=220022, cpu=4095
  (XEN)   lock:          17(00000000:00001ED1), block:            0(00000000:00000000)
  (XEN) Global virtual_region_lock: addr=ffff82d0808177a0, lockval=10001, cpu=4095
  (XEN)   lock:           1(00000000:00000054), block:            0(00000000:00000000)
  (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global wc_lock: addr=ffff82d080817770, lockval=c000c, cpu=4095
  (XEN)   lock:          11(00000000:00000995), block:            0(00000000:00000000)
  (XEN) Global tasklet_lock: addr=ffff82d080817760, lockval=3f403f4, cpu=4095
  (XEN)   lock:         426(00000000:00007AFF), block:            0(00000000:00000000)
  (XEN) Global sysctl_lock: addr=ffff82d080817750, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global symbols_mutex: addr=ffff82d080817740, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global stopmachine_lock: addr=ffff82d080817730, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global call_lock: addr=ffff82d080817710, lockval=4d004d, cpu=4095
  (XEN)   lock:          67(00000000:00176310), block:            0(00000000:00000000)
  (XEN) Global heap_lock: addr=ffff82d080817570, lockval=6b286b28, cpu=4095
  (XEN)   lock:      753071(00000000:018E9B8B), block:            33(00000000:0000916A)
  (XEN) Global payload_lock: addr=ffff82d080817550, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global crash_notes_lock: addr=ffff82d080817530, lockval=80008, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global global_virq_handlers_lock: addr=ffff82d080817500, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global domlist_update_lock: addr=ffff82d0808174e0, lockval=10001, cpu=4095
  (XEN)   lock:           1(00000000:0000005B), block:            0(00000000:00000000)
  (XEN) Global domctl_lock: addr=ffff82d0808174d0, lockval=50005, cpu=4095
  (XEN)   lock:           5(00000000:000008DC), block:            0(00000000:00000000)
  (XEN) Global cpupool_lock: addr=ffff82d0808174b0, lockval=a000a, cpu=4095
  (XEN)   lock:           1(00000000:00000176), block:            0(00000000:00000000)
  (XEN) Global cpu_add_remove_lock: addr=ffff82d0808174a0, lockval=2d002d, cpu=4095
  (XEN)   lock:          24(00000000:0000175F), block:            0(00000000:00000000)
  (XEN) Global efi_rs_lock: addr=ffff82d080817030, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Domain 0 event_lock: addr=ffff8308366c50e8, lockval=1d451d45, cpu=4095
  (XEN)   lock:        7493(00000000:00866052), block:            0(00000000:00000000)
  (XEN) Domain 0 page_alloc_lock: addr=ffff8308366c5028, lockval=9aa59aa5, cpu=4095
  (XEN)   lock:       39589(00000000:0055339E), block:            0(00000000:00000000)
  (XEN) Domain 0 domain_lock: addr=ffff8308366c5018, lockval=150015, cpu=4095
  (XEN)   lock:          21(00000000:00009D70), block:            0(00000000:00000000)
  (XEN) Domain 32767 page_alloc_lock: addr=ffff83084ccc5028, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Domain 32767 domain_lock: addr=ffff83084ccc5018, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Domain 32753 page_alloc_lock: addr=ffff83084cccf028, lockval=3ee43ee4, cpu=4095
  (XEN)   lock:       16100(00000000:0005173F), block:            0(00000000:00000000)
  (XEN) Domain 32753 domain_lock: addr=ffff83084cccf018, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Domain 32754 page_alloc_lock: addr=ffff83084ccd0028, lockval=64006400, cpu=4095
  (XEN)   lock:       25600(00000000:000814D9), block:            0(00000000:00000000)
  (XEN) Domain 32754 domain_lock: addr=ffff83084ccd0018, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)

It's not exactly the easiest dump to follow.

First of all - I don't see why the hold/block times are printed like
that.  It might be a holdover from the 32-bit build, from before PRId64
support.  They should probably use PRI_stime anyway.

The domid rendering is unfortunate.  Ideally we'd use %pd, but that
would involve rearranging the logic to get a struct domain * in hand.
Seeing as you're the last person to have modified this code, how hard
would that be to do?

If/when the per-domain locks are rendered more clearly, the "Global " prefix
can be dropped.

An interesting quirk of ticketlocks is that lockval= is redundant with
lock_cnt.  Could we perhaps fold the two fields?

Distinguishing cpu=4095 as "not locked" would be useful.
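(For illustration: 4095 is the all-ones value of a 12-bit cpu field,
i.e. the "no CPU holds this lock" sentinel.  A sketch of the suggested
rendering; `SPINLOCK_NO_CPU` and `fmt_cpu()` are illustrative names,
not necessarily Xen's.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 4095 == 0xfff, the all-ones value of the 12-bit holder field,
 * used as a "nobody holds this lock" sentinel.  Names here are
 * assumptions for illustration. */
#define SPINLOCK_NO_CPU 4095

static const char *fmt_cpu(unsigned int cpu)
{
    static char buf[16];

    if (cpu == SPINLOCK_NO_CPU)
        return "not locked";
    snprintf(buf, sizeof(buf), "cpu=%u", cpu);
    return buf;
}
```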

We have several global locks called lock:

  (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
  (XEN)   lock:           1(00000000:01322165), block:            0(00000000:00000000)
  (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
  (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
  (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)

The second one in particular has corrupt data, seeing as it has been
taken and released several times without lock_cnt increasing.

For sanity's sake, we should enforce unique naming of any lock
registered for profiling.

~Andrew
Jürgen Groß Aug. 8, 2019, 4:50 a.m. UTC | #2
On 07.08.19 20:11, Andrew Cooper wrote:
> On 07/08/2019 15:31, Juergen Gross wrote:
>> While hunting a locking problem in my core scheduling series I have
>> added some debugging aids to spinlock handling, making it easier to
>> find the root cause of the problem.
>>
>> Making use of the already existing lock profiling and enhancing it a
>> little bit produces some really valuable diagnostic data, e.g. when an
>> NMI watchdog is triggering a crash.
> 
> I'm not sure where best to report this in the series, so 0/$N will
> have to do.  Here is a sample trace from panic():
> 
>    (XEN) Xen lock profile info SHOW  (now = 49695791886 total = 49695791886)
> <snip>
> 
> It's not exactly the easiest dump to follow.
> 
> First of all - I don't see why the hold/block times are printed like
> that.  It might be a holdover from the 32-bit build, from before PRId64
> support.  They should probably use PRI_stime anyway.

Fine with me.

> The domid rendering is unfortunate.  Ideally we'd use %pd, but that
> would involve rearranging the logic to get a struct domain * in hand.
> Seeing as you're the last person to have modified this code, how hard
> would that be to do?

It would completely break the struct-type-agnostic design.

I have a debug patch adding the credit2 run-queue lock. It requires
adding just 5 lines of code (and this includes moving the
spinlock_init() out of an irq-off section). It will produce:

(XEN) credit2-runq 0 lock: addr=ffff830413351010, lockval=de00ddf, cpu=6
(XEN)   lock: 896287(00000000:00FAA6B2), block:  819(00000000:00009C80)

> If/when the per-domain locks are rendered more clearly, the "Global " prefix
> can be dropped.

I'm not sure we should do that.

> An interesting quirk of ticketlocks is that lockval= is redundant with
> lock_cnt.  Could we perhaps fold the two fields?

No. See my credit2 example: as soon as we get above 65535 lockings,
the values are no longer redundant.
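This can be checked against the dump above: sync_lock has
lock_cnt=115080 but lockval=c188c188, and 0xc188 = 49544 =
115080 mod 65536, i.e. the 16-bit ticket counters have already wrapped
once.  A sketch of the encoding (which half is head vs. tail is an
assumption for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a 32-bit ticket lock value as printed in lockval=: two
 * 16-bit counters, one for the next ticket to serve and one for the
 * next ticket to hand out.  Which half is which is an assumption. */
static unsigned int ticket_head(uint32_t lockval) { return lockval & 0xffff; }
static unsigned int ticket_tail(uint32_t lockval) { return lockval >> 16; }

/* The lock is held whenever the two counters differ, e.g. the
 * flush_lock entry above (lockval=91139111, cpu=3). */
static int ticket_is_locked(uint32_t lockval)
{
    return ticket_head(lockval) != ticket_tail(lockval);
}
```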

> Distinguishing cpu=4095 as "not locked" would be useful.

Yes, will change.

> We have several global locks called lock:
> 
>    (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
>    (XEN)   lock:           1(00000000:01322165), block:            0(00000000:00000000)
>    (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
>    (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>    (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
>    (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>    (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
>    (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>    (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
>    (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
> 
> The second one in particular has corrupt data, seeing as it has been
> taken and released several times without lock_cnt increasing.

The lock might have been taken/released before lock profiling was
initialized.

> For sanity's sake, we should enforce unique naming of any lock
> registered for profiling.

This would be every lock initialized via DEFINE_SPINLOCK(). I can do a
followup patch for that purpose.


Juergen
Andrew Cooper Aug. 8, 2019, 8:33 a.m. UTC | #3
On 08/08/2019 05:50, Juergen Gross wrote:
> On 07.08.19 20:11, Andrew Cooper wrote:
>>
>> <snip>
>> It's not exactly the easiest dump to follow.
>>
>> First of all - I don't see why the hold/block times are printed like
>> that.  It might be a holdover from the 32-bit build, from before
>> PRId64 support.  They should probably use PRI_stime anyway.
>
> Fine with me.
>
>> The domid rendering is unfortunate.  Ideally we'd use %pd, but that
>> would involve rearranging the logic to get a struct domain * in hand.
>> Seeing as you're the last person to have modified this code, how hard
>> would that be to do?
>
> It would completely break the struct type agnostic design.

Ok.  As an alternative, how about %pdr, which takes a raw domid?  It
would be a trivial adjustment in the vsnprintf code, and could
plausibly be useful elsewhere, where we have a domid and not a domain
pointer.
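As a sketch of what such a raw-domid formatter might render: ordinary
domains as dN, reserved IDs symbolically.  The reserved values mirror
Xen's public headers, but the function and its exact output are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reserved domain IDs per Xen's public headers; the dump's
 * "Domain 32767" (0x7fff) is the idle domain.  domid_str() and its
 * output format are illustrative, not Xen's actual %pd/%pdr code. */
#define DOMID_IO   0x7ff1
#define DOMID_XEN  0x7ff2
#define DOMID_IDLE 0x7fff

static const char *domid_str(unsigned int domid)
{
    static char buf[16];

    switch (domid) {
    case DOMID_IO:   return "d[IO]";
    case DOMID_XEN:  return "d[XEN]";
    case DOMID_IDLE: return "d[IDLE]";
    default:
        snprintf(buf, sizeof(buf), "d%u", domid);
        return buf;
    }
}
```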

>
> I have a debug patch adding the credit2 run-queue lock. It requires
> adding just 5 lines of code (and this includes moving the
> spinlock_init() out of an irq-off section). It will produce:
>
> (XEN) credit2-runq 0 lock: addr=ffff830413351010, lockval=de00ddf, cpu=6
> (XEN)   lock: 896287(00000000:00FAA6B2), block:  819(00000000:00009C80)

What is the 0 here?

>
>> We have several global locks called lock:
>>
>>    (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
>>    (XEN)   lock:           1(00000000:01322165), block:            0(00000000:00000000)
>>    (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
>>    (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>    (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
>>    (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>    (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
>>    (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>    (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
>>    (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>
>> The second one in particular has corrupt data, seeing as it has been
>> taken
>> and released several times without lock_cnt increasing.
>
> The lock might have been taken/released before lock profiling has been
> initialized.

What is there to initialise?  It all looks statically set up.

>
>> For sanity's sake, we should enforce unique naming of any lock
>> registered for profiling.
>
> This would be every lock inited via DEFINE_SPINLOCK(). I can do a
> followup patch for that purpose.

I was wondering how to do this.  One option which comes to mind is to
put a non-static object into .discard.lock_profile or something, so the
linker will object to repeated symbol names and then throw all of them away.
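As a rough sketch of that trick (section and symbol names invented here,
and untested against our linker scripts):

```c
#include <assert.h>

/*
 * Each profiled lock emits one non-static, oddly-named object.  Two
 * locks sharing a name in different translation units then collide at
 * link time ("multiple definition of __lock_profile_chk_..."), while a
 * /DISCARD/ rule for .discard.* in the linker script would throw the
 * objects away so they cost nothing in the final image.  The explicit
 * "= 0" makes this a strong definition, so even toolchains defaulting
 * to -fcommon can't silently merge duplicates.
 */
#define LOCK_PROFILE_UNIQUE_NAME(name)                             \
    char __lock_profile_chk_##name                                 \
    __attribute__((__section__(".discard.lock_profile"))) = 0

/* One of these per DEFINE_SPINLOCK()-style registration: */
LOCK_PROFILE_UNIQUE_NAME(grant_lock);
```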

~Andrew
Jan Beulich Aug. 8, 2019, 9:08 a.m. UTC | #4
On 08.08.2019 10:33, Andrew Cooper wrote:
> On 08/08/2019 05:50, Juergen Gross wrote:
>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>
>>> <snip>
>>> It's not exactly the easiest dump to follow.
>>>
>>> First of all - I don't see why the hold/block time are printed like
>>> that.  It might be a holdover from the 32bit build, pre PRId64 support.
>>> They should probably use PRI_stime anyway.
>>
>> Fine with me.
>>
>>> The domid rendering is unfortunate.  Ideally we'd use %pd but that would
>>> involve rearranging the logic to get a struct domain* in hand.
>>> Seeing as you're the last person to modify this code, how hard would
>>> that be to do?
>>
>> It would completely break the struct type agnostic design.
> 
> Ok.  As an alternative, how about %pdr which takes a raw domid?  It
> would be a trivial adjustment in the vsnprintf code, and could plausibly
> be useful elsewhere where we have a domid and not a domain pointer.

At the risk of upsetting / annoying you: A domid_t would again
better be formatted via %od (and without the need to take the
address of a respective variable). In the case here it would
have the minor additional benefit of conserving on format string
length.

Jan
Jürgen Groß Aug. 8, 2019, 9:36 a.m. UTC | #5
On 08.08.19 10:33, Andrew Cooper wrote:
> On 08/08/2019 05:50, Juergen Gross wrote:
>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>
>>> <snip>
>>> It's not exactly the easiest dump to follow.
>>>
>>> First of all - I don't see why the hold/block time are printed like
>>> that.  It might be a holdover from the 32bit build, pre PRId64 support.
>>> They should probably use PRI_stime anyway.
>>
>> Fine with me.
>>
>>> The domid rendering is unfortunate.  Ideally we'd use %pd but that would
>>> involve rearranging the logic to get a struct domain* in hand.
>>> Seeing as you're the last person to modify this code, how hard would
>>> that be to do?
>>
>> It would completely break the struct type agnostic design.
> 
> Ok.  As an alternative, how about %pdr which takes a raw domid?  It
> would be a trivial adjustment in the vsnprintf code, and could plausibly
> be useful elsewhere where we have a domid and not a domain pointer.

Lock profiling has no knowledge that it is working on a struct domain.
It is just working on a lock in a struct and an index to differentiate
the struct instance. Using the domid as the index is just for making it
more easy to understand the printout.

I wouldn't want e.g. a per-event lock to be named "IDLE" just because
it happens to be index 32767.

> 
>>
>> I have a debug patch adding credit2 run-queue lock. It requires to just
>> add 5 lines of code (and this includes moving the spinlock_init() out of
>> an irq-off section). It will produce:
>>
>> (XEN) credit2-runq 0 lock: addr=ffff830413351010, lockval=de00ddf, cpu=6
>> (XEN)   lock: 896287(00000000:00FAA6B2), block:  819(00000000:00009C80)
> 
> What is the 0 here?

The runq number.

> 
>>
>>> We have several global locks called lock:
>>>
>>>     (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
>>>     (XEN)   lock:           1(00000000:01322165), block:            0(00000000:00000000)
>>>     (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
>>>     (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>>     (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
>>>     (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>>     (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
>>>     (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>>     (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
>>>     (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>>
>>> The second one in particular has corrupt data, seeing as it has been
>>> taken
>>> and released several times without lock_cnt increasing.
>>
>> The lock might have been taken/released before lock profiling has been
>> initialized.
> 
> What is there to initialise?  It all looks statically set up.

lock->profile is set only in lock_prof_init().
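A much-simplified model of that pass (stand-in structures, not the real
Xen definitions), which also shows why acquisitions before the initcall
never show up in the counts:

```c
#include <assert.h>
#include <stddef.h>

/* Abbreviated stand-ins for the real profiling structures. */
struct spinlock;

struct lock_profile {
    struct lock_profile *next;   /* statically collected record list */
    struct spinlock *lock;       /* forward reference, set at build  */
};

struct spinlock {
    unsigned int raw;
    struct lock_profile *profile; /* backreference, NULL until init  */
};

/*
 * Model of lock_prof_init(): walk the statically collected records
 * and patch the backreference into each lock.  Until this runs, the
 * lock/unlock paths see profile == NULL and record nothing - which is
 * why early acquisitions are missing from the dumped counts.
 */
static void lock_prof_init_model(struct lock_profile *head)
{
    struct lock_profile *p;

    for ( p = head; p; p = p->next )
        p->lock->profile = p;
}
```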

> 
>>
>>> For sanity's sake, we should enforce unique naming of any lock
>>> registered for profiling.
>>
>> This would be every lock inited via DEFINE_SPINLOCK(). I can do a
>> followup patch for that purpose.
> 
> I was wondering how to do this.  One option which comes to mind is to
> put a non-static object into .discard.lock_profile or something, so the
> linker will object to repeated symbol names and then throw all of them away.

I could just drop the "static" in the _LOCK_PROFILE_PTR() macro.
At the same time I should move the ".lockprofile.data" section in the
linker scripts to the init part maybe.


Juergen
Andrew Cooper Aug. 8, 2019, 9:45 a.m. UTC | #6
On 08/08/2019 10:08, Jan Beulich wrote:
> On 08.08.2019 10:33, Andrew Cooper wrote:
>> On 08/08/2019 05:50, Juergen Gross wrote:
>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>> <snip>
>>>> It's not exactly the easiest dump to follow.
>>>>
>>>> First of all - I don't see why the hold/block time are printed like
>>>> that.  It might be a holdover from the 32bit build, pre PRId64 support.
>>>> They should probably use PRI_stime anyway.
>>> Fine with me.
>>>
>>>> The domid rendering is unfortunate.  Ideally we'd use %pd but that would
>>>> involve rearranging the logic to get a struct domain* in hand.
>>>> Seeing as you're the last person to modify this code, how hard would
>>>> that be to do?
>>> It would completely break the struct type agnostic design.
>> Ok.  As an alternative, how about %pdr which takes a raw domid?  It
>> would be a trivial adjustment in the vsnprintf code, and could plausibly
>> be useful elsewhere where we have a domid and not a domain pointer.
> At the risk of upsetting / annoying you:

Yes, you really have.

> A domid_t would again
> better be formatted via %od (and without the need to take the
> address of a respective variable). In the case here it would
> have the minor additional benefit of conserving on format string
> length.

There are concrete reasons why it is a terrible idea: none of
this has anything to do with octal, and %od has a well-defined
meaning which is not related to domids.  There is also a very good
reason why all the magic is hidden behind one single formatter.

You seem hell bent on making it actively confusing and complicated to
write and read printk()'s, and I am not willing to lumber anyone
developing on Xen with this cognitive complexity.

I am sick to death of having the same argument over and over, so let me stop any
further wasting of time and be absolutely crystal clear.

NACK to any form of overloading %o

~Andrew
Andrew Cooper Aug. 8, 2019, 10:20 a.m. UTC | #7
On 08/08/2019 10:36, Juergen Gross wrote:
> On 08.08.19 10:33, Andrew Cooper wrote:
>> On 08/08/2019 05:50, Juergen Gross wrote:
>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>>
>>>> <snip>
>>>> It's not exactly the easiest dump to follow.
>>>>
>>>> First of all - I don't see why the hold/block time are printed like
>>>> that.  It might be a holdover from the 32bit build, pre PRId64 support.
>>>> They should probably use PRI_stime anyway.
>>>
>>> Fine with me.
>>>
>>>> The domid rendering is unfortunate.  Ideally we'd use %pd but that
>>>> would involve rearranging the logic to get a struct domain* in hand.
>>>> Seeing as you're the last person to modify this code, how hard would
>>>> that be to do?
>>>
>>> It would completely break the struct type agnostic design.
>>
>> Ok.  As an alternative, how about %pdr which takes a raw domid?  It
>> would be a trivial adjustment in the vsnprintf code, and could plausibly
>> be useful elsewhere where we have a domid and not a domain pointer.
>
> Lock profiling has no knowledge that it is working on a struct domain.
> It is just working on a lock in a struct and an index to differentiate
> the struct instance. Using the domid as the index is just for making it
> more easy to understand the printout.
>
> I wouldn't want e.g. a per-event lock to be named "IDLE" just because
> it happens to be index 32767.

Ok, but clearly there is something which distinguishes domain indices
from other indices?

>
>>
>>>
>>>> We have several global locks called lock:
>>>>
>>>>     (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
>>>>     (XEN)   lock:           1(00000000:01322165), block:            0(00000000:00000000)
>>>>     (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
>>>>     (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>>>     (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
>>>>     (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>>>     (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
>>>>     (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>>>     (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
>>>>     (XEN)   lock:           0(00000000:00000000), block:            0(00000000:00000000)
>>>>
>>>> The second one in particular has corrupt data, seeing as it has been
>>>> taken
>>>> and released several times without lock_cnt increasing.
>>>
>>> The lock might have been taken/released before lock profiling has been
>>> initialized.
>>
>> What is there to initialise?  It all looks statically set up.
>
> lock->profile is set only in lock_prof_init().

Ah - so it is.  I wonder if this can be done at compile time?  It's just
a backreference for the forward reference, which is done at compile time.

(Wow this code is complicated to follow).  I think it can be done with a
forward declaration of static struct lock_profile
__lock_profile_data_##l and passing something other than NULL into
_SPIN_LOCK_UNLOCKED(), but that will break the ability to do static
DEFINE_SPINLOCK().

Probably not worth messing with this now, but perhaps something to think
over.

~Andrew
Jan Beulich Aug. 8, 2019, 11:51 a.m. UTC | #8
On 08.08.2019 11:45, Andrew Cooper wrote:
> On 08/08/2019 10:08, Jan Beulich wrote:
>> On 08.08.2019 10:33, Andrew Cooper wrote:
>>> On 08/08/2019 05:50, Juergen Gross wrote:
>>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>>> <snip>
>>>>> It's not exactly the easiest dump to follow.
>>>>>
>>>>> First of all - I don't see why the hold/block time are printed like
>>>>> that.  It might be a holdover from the 32bit build, pre PRId64 support.
>>>>> They should probably use PRI_stime anyway.
>>>> Fine with me.
>>>>
>>>>> The domid rendering is unfortunate.  Ideally we'd use %pd but that would
>>>>> involve rearranging the logic to get a struct domain* in hand.
>>>>> Seeing as you're the last person to modify this code, how hard would
>>>>> that be to do?
>>>> It would completely break the struct type agnostic design.
>>> Ok.  As an alternative, how about %pdr which takes a raw domid?  It
>>> would be a trivial adjustment in the vsnprintf code, and could plausibly
>>> be useful elsewhere where we have a domid and not a domain pointer.
>> At the risk of upsetting / annoying you:
> 
> Yes, you really have.
> 
>> A domid_t would again
>> better be formatted via %od (and without the need to take the
>> address of a respective variable). In the case here it would
>> have the minor additional benefit of conserving on format string
>> length.
> 
> There are concrete reasons why it is a terrible idea: none of
> this has anything to do with octal, and %od has a well-defined
> meaning which is not related to domids.

I'm curious to learn what well-defined meaning %od has.

>  There is also a very good
> reason why all the magic is hidden behind one single formatter.
> 
> You seem hell bent on making it actively confusing and complicated to
> write and read printk()'s, and I am not willing to lumber anyone
> developing on Xen with this cognitive complexity.
> 
> I am sick to death of having the same argument over and over, so let me stop any
> further wasting of time and be absolutely crystal clear.
> 
> NACK to any form of overloading %o

In which case, if I took a similar position for the PCI SBDF
formatting using %pp, we'd be in a dead end. Waste of time or not
- while you have made it crystal clear why you personally dislike
overloading %o, afaic you've not provided any non-subjective
(i.e. truly technical) reasons, which would be what might help
convince me of sticking to just %p overloading.

Jan
Jürgen Groß Aug. 8, 2019, 12:02 p.m. UTC | #9
On 08.08.19 12:20, Andrew Cooper wrote:
> On 08/08/2019 10:36, Juergen Gross wrote:
>> On 08.08.19 10:33, Andrew Cooper wrote:
>>> On 08/08/2019 05:50, Juergen Gross wrote:
>>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>>>
>>>>> <snip>
>>>>> It's not exactly the easiest dump to follow.
>>>>>
>>>>> First of all - I don't see why the hold/block time are printed like
>>>>> that.  It might be a holdover from the 32bit build, pre PRId64 support.
>>>>> They should probably use PRI_stime anyway.
>>>>
>>>> Fine with me.
>>>>
>>>>> The domid rendering is unfortunate.  Ideally we'd use %pd but that
>>>>> would involve rearranging the logic to get a struct domain* in hand.
>>>>> Seeing as you're the last person to modify this code, how hard would
>>>>> that be to do?
>>>>
>>>> It would completely break the struct type agnostic design.
>>>
>>> Ok.  As an alternative, how about %pdr which takes a raw domid?  It
>>> would be a trivial adjustment in the vsnprintf code, and could plausibly
>>> be useful elsewhere where we have a domid and not a domain pointer.
>>
>> Lock profiling has no knowledge that it is working on a struct domain.
>> It is just working on a lock in a struct and an index to differentiate
>> the struct instance. Using the domid as the index is just for making it
>> more easy to understand the printout.
>>
>> I wouldn't want e.g. a per-event lock to be named "IDLE" just because
>> it happens to be index 32767.
> 
> Ok, but clearly there is something which distinguishes domain indices
> from other indices?

Not for lock profiling.


Juergen
Jürgen Groß Aug. 8, 2019, 3:26 p.m. UTC | #10
On 08.08.19 11:36, Juergen Gross wrote:
> On 08.08.19 10:33, Andrew Cooper wrote:
>> On 08/08/2019 05:50, Juergen Gross wrote:
>>> On 07.08.19 20:11, Andrew Cooper wrote:
>>>> For sanity's sake, we should enforce unique naming of any lock
>>>> registered for profiling.
>>>
>>> This would be every lock inited via DEFINE_SPINLOCK(). I can do a
>>> followup patch for that purpose.
>>
>> I was wondering how to do this.  One option which comes to mind is to
>> put a non-static object into .discard.lock_profile or something, so the
>> linker will object to repeated symbol names and then throw all of them away.
> 
> I could just drop the "static" in the _LOCK_PROFILE_PTR() macro.

Unfortunately this doesn't work for locks defined in a function body.


Juergen