[RFC,0/7] dlm: the ultimate verifier for DLM lock correctness

Message ID 20240827180236.316946-1-aahringo@redhat.com (mailing list archive)

Alexander Aring Aug. 27, 2024, 6:02 p.m. UTC
Hi,

I am sending this RFC patch series to show a use case (a useful one, for
me) for the DLM net-namespace functionality that is currently pending,
see [0]. This patch series introduces the DLM verifier, which checks DLM
correctness for any workload running on DLM with the net-namespace
feature. E.g. on gfs2 you can just run some filesystem benchmark tests
and see if DLM works as expected.

This is very useful when DLM recovery kicks in, e.g. when nodes leave
the lockspace due to fencing and recovery resolves lock dependencies
transparently to the user. However, there is no "fake fencing switch"
yet for DLM net-namespaces, but it might be an idea for future
functionality.

There could be bugs in the verifier itself, but I am not too worried
about those. We need to check what is happening whenever the verifier
complains, but so far everything looks fine. It would only be an issue
if the verifier stayed silent about a real problem; however, a small bug
introduced in DLM makes the verifier complain a lot.

There may still be room for improvement in the DLM verifier. I needed to
change the Python scripts a little to generate the code, but I did not
add those changes to this patch series. Also, checkpatch complains about
some things in the verifier code, but I mostly followed the other
existing verifiers. There is a printout of all holders if they violate
the DLM compatible locking states. I might improve it when I actually
try to track down an existing problem, but for now this printout is very
minimal.
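
To illustrate what "DLM compatible locking states" means, here is a
minimal userspace sketch of the standard DLM lock-mode compatibility
table and a pairwise check over the current holders of a resource. This
is not the verifier code from this series; the names mode, compat,
modes_compat() and holders_valid() are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock modes, in the same order as the kernel's DLM_LOCK_* constants. */
enum mode { NL, CR, CW, PR, PW, EX, NR_MODES };

/* compat[a][b] is true if modes a and b may be granted at the same time. */
static const bool compat[NR_MODES][NR_MODES] = {
	/*         NL CR CW PR PW EX */
	[NL] = {   1, 1, 1, 1, 1, 1 },
	[CR] = {   1, 1, 1, 1, 1, 0 },
	[CW] = {   1, 1, 1, 0, 0, 0 },
	[PR] = {   1, 1, 0, 1, 0, 0 },
	[PW] = {   1, 1, 0, 0, 0, 0 },
	[EX] = {   1, 0, 0, 0, 0, 0 },
};

/* Hypothetical helper: look up whether two granted modes can coexist. */
static bool modes_compat(enum mode a, enum mode b)
{
	assert(a < NR_MODES && b < NR_MODES);
	return compat[a][b];
}

/*
 * Hypothetical helper: returns true if every pair of holders of one
 * resource is mutually compatible, false on the first violation.
 */
static bool holders_valid(const enum mode *holders, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (!modes_compat(holders[i], holders[j]))
				return false;
	return true;
}
```

For example, holders in PR, PR and CR pass this check, while PR
together with EX is a violation that a monitor would report.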

I mainly do this work because I am preparing more changes to the DLM
recovery code in the future, so that it scales to lockspaces with a lot
of members, which we can easily try out with the net-namespace
functionality.

I am cc'ing the RCU people here; maybe they will also get some ideas for
checking lock correctness using the kernel's tracing-based verifier
subsystem.

- Alex

[0] https://lore.kernel.org/gfs2/20240814143414.1877505-1-aahringo@redhat.com/

Alexander Aring (7):
  dlm: fix possible lkb_resource null dereference
  dlm: fix swapped args sb_flags vs sb_status
  dlm: make add_to_waiters() that is can't fail
  dlm: add our_nodeid to tracepoints
  dlm: add lkb rv mode to ast tracepoint
  dlm: add more tracepoints for DLM kernel verifier
  rv: add dlm compatible lock state kernel verifier

 Documentation/trace/rv/monitor_dlm.rst |  77 +++++
 fs/dlm/ast.c                           |  30 +-
 fs/dlm/dlm_internal.h                  |   3 +
 fs/dlm/lock.c                          |  64 ++--
 fs/dlm/lockspace.c                     |   4 +
 fs/dlm/user.c                          |   9 +-
 include/trace/events/dlm.h             | 121 ++++++-
 include/trace/events/rv.h              |   9 +
 kernel/trace/rv/Kconfig                |  18 +
 kernel/trace/rv/Makefile               |   1 +
 kernel/trace/rv/monitors/dlm/dlm.c     | 445 +++++++++++++++++++++++++
 kernel/trace/rv/monitors/dlm/dlm.h     |  38 +++
 kernel/trace/rv/monitors/dlm/dlm_da.h  | 143 ++++++++
 tools/verification/models/dlm.dot      |  14 +
 14 files changed, 907 insertions(+), 69 deletions(-)
 create mode 100644 Documentation/trace/rv/monitor_dlm.rst
 create mode 100644 kernel/trace/rv/monitors/dlm/dlm.c
 create mode 100644 kernel/trace/rv/monitors/dlm/dlm.h
 create mode 100644 kernel/trace/rv/monitors/dlm/dlm_da.h
 create mode 100644 tools/verification/models/dlm.dot