diff mbox

rdma-core in Dabian

Message ID 1494613473.14477.12.camel@sandisk.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Bart Van Assche May 12, 2017, 6:24 p.m. UTC
On Tue, 2017-05-09 at 12:18 -0600, Jason Gunthorpe wrote:
> On Tue, May 09, 2017 at 05:54:33PM +0000, Bart Van Assche wrote:
> > > * Bonus points: consolidate the srp daemon. Debian ships a different
> > > service file than upstream, but I am against an additional layer
> > > introduced by srp_daemon.sh. It would also be nice to have a systemd
> > > service shipped by upstream (and not just in the redhat directory)
> > 
> > I will have a look at this too.
> 
> My thoughts..
> 
> It looked to me like srp_daemon needed one process per port.
> 
> The best path looked to me like using systemd templates (eg
> srp_daemon@mlx4_0/0) to allow systemd to directly manage the per port
> srp_daemon.
> 
> Then the question is how to request the right templates are
> created.. Perhaps udev rules can do this directly, but I'm not sure
> about how to get the port #, perhaps an udev triggered script or
> something can do it.
> 
> Perhaps there could be an inbetween progarm that took the udev events
> and asked systemd to make the right units.
> 
> Another option is to have a srp_daemon 'runner' that monitors udev and
> actively manages a set of fork'd childern so that the children cover
> all required ports. But this is actually somewhat hard to do well..
> 
> The big issue with the redhat script is that it is not hotplug safe,
> and needs special boot ordering, which we really need to get away from
> in general for robustness.

Hello Jason,

Thanks for having shared your thoughts.

When using systemd templates instead of the current approach it would
become cumbersome for users to stop and start the srp_daemon on all ports
simultaneously. Additionally, if an RDMA adapter is hot-added, should the
srp_daemon be started or should it not be started for the newly added ports?

An in-between program could indeed do the job of listening for udev events
and actively managing srp_daemon children. Since we already have an
in-between program, namely srp_daemon.sh, the simplest approach is to modify
that shell script. Can you have a look at the attached patch?

Thanks,

Bart.

Comments

Jason Gunthorpe May 12, 2017, 7:51 p.m. UTC | #1
On Fri, May 12, 2017 at 06:24:35PM +0000, Bart Van Assche wrote:

> When using systemd templates instead of the current approach it would
> become cumbersome for users to stop and start the srp_daemon on all ports
> simultaneously.

I think systemd PartOf dependencies is intended to solve that
problem..

       PartOf=

           Configures dependencies similar to Requires=, but limited
           to stopping and restarting of units. When systemd stops or restarts
           the units listed here, the action is propagated to this
           unit. Note that this is a one-way dependency -- changes to this unit
           do not affect the listed units.

So there could still be a service name that covered all of the
children (eg srp_daemon.service and srp_daemon_port@.service)

> Additionally, if an RDMA adapter is hot-added, should the srp_daemon
> be started or should it not be started for the newly added ports?

I'm not sure what is best for that policy..

Today it looks like srp_daemon defaults to explicitly opt-in to
running on ports. To continue that policy the admin would have to
explicitly opt in to each port. Eg perhaps something like:

 systemctl enable srp_daemon@mlx4_0/0

Instead of editing a /etc/ config file.

systemd has a '.device' system, which as I understand it, would allow
srp_daemon@mlx4_0/0 to depend on mlx4_0/0.device, which is created by
udev on hotplug, and would trigger running the service. 

(perhaps for this reason the template value would be umad0, not mlx4_0/0)

> An in-between program could indeed do the job of listening for udev events
> and actively managing srp_daemon children. Since we already have an
> in-between program, namely srp_daemon.sh, the simplest approach is to modify
> that shell script. Can you have a look at the attached patch?

This seems like a reasonable improvement, but as I said, the inbetween
program is a bit more complex that you probably want to do with a
script:

- Sites do not like things running on timers due to the introduced
  jitter, so the trivial sleep loop is not nice.
- Failure of a child should result in a sensible log message and a
  restart protocol like systemd has. Failure to SIGTERM the child should
  escalate to SIGKILL to prevent lockups during shutdown
- Dumping stderr/etc into /dev/null is a bad idea (eg glibc internal
  asserts are lost now). In a systemd world each child should get its
  own stderrr logger connection so this stuff works right.
- We still need to be able to configure which ports srp_daemon runs
  on and to 'reload' that configuration into the script
- Ideally hot removal of an adaptor requires halting just those
  daemons without triggering restart

Since so much of that overlaps with systemd capabilities it does make
some sense to have systemd manage the child processes directly.

For instance you could have the srp_daemon script like in the patch
but replace this:

  ${prog} -e -c -n [..]
with
  systemctl start srp_daemon_port@${hca_id}/${port}

Using PartOf dependencies.. The user experience would be very similar
to the script directly forking, but many more of the above areas are
covered off..

But.. in that case we don't even need the script. If the admin wishes
to run srp_daemon on all IB ports then a simple udev rule will do the
job:

SUBSYSTEM=="infiniband", TAG+="systemd", ATTRS{node_desc}=="*", ENV{SYSTEMD_WANTS}+="srp_daemon_port@$name.service"

[more or less]

And now the timer is gone too.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From bec3884e36298ab2b6fb09c533b575fa95bd3378 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Fri, 12 May 2017 10:24:44 -0700
Subject: [PATCH] srp_daemon.service: Add support for hotplugging

Instead of only starting srp_daemon.sh after the RDMA subsystem has
been initialized, make srp_daemon.sh scan periodically for RDMA ports.
An advantage of the new approach is that it supports hot-add of RDMA
adapters.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 redhat/srp_daemon.service   |  4 ----
 srp_daemon/srp_daemon.sh.in | 41 ++++++++++++++++++++++++++---------------
 2 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/redhat/srp_daemon.service b/redhat/srp_daemon.service
index 9510f5fb..cf98aa74 100644
--- a/redhat/srp_daemon.service
+++ b/redhat/srp_daemon.service
@@ -3,10 +3,6 @@  Description=Start or stop the daemon that attaches to SRP devices
 Documentation=man:srp_daemon file:/etc/rdma/rdma.conf file:/etc/srp_daemon.conf
 DefaultDependencies=false
 Conflicts=emergency.target emergency.service
-Requires=rdma.service
-Wants=opensm.service
-After=rdma.service opensm.service
-After=network.target
 Before=remote-fs-pre.target
 
 [Service]
diff --git a/srp_daemon/srp_daemon.sh.in b/srp_daemon/srp_daemon.sh.in
index 75e8a31b..74c08d7a 100755
--- a/srp_daemon/srp_daemon.sh.in
+++ b/srp_daemon/srp_daemon.sh.in
@@ -37,6 +37,7 @@  rescan_interval=60
 pids=()
 pidfile="@CMAKE_INSTALL_FULL_RUNDIR@/srp_daemon.sh.pid"
 mypid=$$
+umad_devs=()
 
 trap_handler()
 {
@@ -49,6 +50,17 @@  trap_handler()
     exit 0
 }
 
+# Check whether $1 is equal to one of $2..${$#}
+contains()
+{
+    local v
+
+    for v in "${@:2}"; do
+	[ "$v" = "$1" ] && return 0
+    done
+    return 1
+}
+
 # Check if there is another copy running of srp_daemon.sh
 if [ -f "$pidfile" ]; then
     if [ -e "/proc/$(cat "$pidfile" 2>/dev/null)/status" ]; then
@@ -66,19 +78,18 @@  fi
 
 trap 'trap_handler' 2 15
 
-while [ ! -d ${ibdir} ]
-do
-    sleep 30
+while :; do
+    for d in ${ibdir}_mad/umad*; do
+	[ -e "$d" ] || continue
+	contains "$d" "${umad_devs[@]}" && continue
+	hca_id="$(<"$d/ibdev")"
+	port="$(<"$d/port")"
+	add_target="${ibdir}_srp/srp-${hca_id}-${port}/add_target"
+	if [ -e "${add_target}" ]; then
+            ${prog} -e -c -n -i "${hca_id}" -p "${port}" -R "${rescan_interval}" "${params[@]}" >/dev/null 2>&1 &
+            pids+=($!)
+	    umad_dev+=($d)
+	fi
+    done
+    sleep $rescan_interval
 done
-
-for d in ${ibdir}_mad/umad*; do
-    hca_id="$(<"$d/ibdev")"
-    port="$(<"$d/port")"
-    add_target="${ibdir}_srp/srp-${hca_id}-${port}/add_target"
-    if [ -e "${add_target}" ]; then
-        ${prog} -e -c -n -i "${hca_id}" -p "${port}" -R "${rescan_interval}" "${params[@]}" >/dev/null 2>&1 &
-        pids+=($!)
-    fi
-done
-
-wait
-- 
2.12.2