[rdma-core] srp_daemon: srp_daemon.service should be started after network.target

Message ID: 20171229101006.27861-1-honli@redhat.com
State: Rejected

Commit Message

Honggang LI Dec. 29, 2017, 10:10 a.m. UTC
From: Honggang Li <honli@redhat.com>

When the machine boots or reboots, systemd starts srp_daemon.service
at the very beginning of the boot process unless srp_daemon.service
is ordered after network.target. As a result, srp_daemon.service is
terminated with SERVICE_FAILURE_RESOURCES.

Fixes: 1c7fe513e3e9 ("srp_daemon: One systemd service per port")
Signed-off-by: Honggang Li <honli@redhat.com>
---
 srp_daemon/srp_daemon.service.in | 1 +
 1 file changed, 1 insertion(+)

Comments

Jason Gunthorpe Dec. 29, 2017, 6 p.m. UTC | #1
On Fri, Dec 29, 2017 at 06:10:06PM +0800, Honggang LI wrote:
> From: Honggang Li <honli@redhat.com>
> 
> When the machine boots or reboots, systemd starts srp_daemon.service
> at the very beginning of the boot process unless srp_daemon.service
> is ordered after network.target. As a result, srp_daemon.service is
> terminated with SERVICE_FAILURE_RESOURCES.

How is this possible?  srp_daemon.service just runs a script that
doesn't touch the network.

I can't see any way that srp_daemon.service should have this added,
please explain more what is going on.

I could potentially understand needing it in srp_daemon_port@.service,
but even that needs much more explanation about what exactly is
causing this requirement.

You said SERVICE_FAILURE_RESOURCES which is an internal systemd error
code. Is this because of PrivateNetwork=yes or something similar?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Honggang LI Dec. 30, 2017, 1:59 p.m. UTC | #2
On Fri, Dec 29, 2017 at 11:00:58AM -0700, Jason Gunthorpe wrote:
> On Fri, Dec 29, 2017 at 06:10:06PM +0800, Honggang LI wrote:
> > From: Honggang Li <honli@redhat.com>
> > 
> > When the machine boots or reboots, systemd starts srp_daemon.service
> > at the very beginning of the boot process unless srp_daemon.service
> > is ordered after network.target. As a result, srp_daemon.service is
> > terminated with SERVICE_FAILURE_RESOURCES.
> 
> How is this possible?  srp_daemon.service just runs a script that
> doesn't touch the network.

To reproduce it, you just need to enable srp_daemon.service and then
reboot the machine. Watch the serial console while the machine boots.
Please see the attached /var/log/boot.log for details. After the system
has booted, check the status of srp_daemon.service.

============
systemctl status srp_daemon.service -l
● srp_daemon.service - Daemon that discovers and logs in to SRP target systems
   Loaded: loaded (/usr/lib/systemd/system/srp_daemon.service; enabled; vendor preset: disabled)
   Active: failed (Result: resources)
     Docs: man:srp_daemon
           file:/etc/srp_daemon.conf

Dec 30 08:45:25 localhost.localdomain systemd[1]: [/usr/lib/systemd/system/srp_daemon.service:12] Unknown lvalue 'MemoryDenyWriteExecute' in section 'Service'
Dec 30 08:45:25 localhost.localdomain systemd[1]: [/usr/lib/systemd/system/srp_daemon.service:15] Unknown lvalue 'ProtectKernelModules' in section 'Service'
Dec 30 08:45:25 localhost.localdomain systemd[1]: [/usr/lib/systemd/system/srp_daemon.service:16] Unknown lvalue 'RestrictRealtime' in section 'Service'
=============

Note: removing all unknown lvalues from srp_daemon.service and
srp_daemon_port@.service makes no difference. We are using an old
version of systemd, which does not support these lvalues.
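The failure shown in the status output above (Active: failed, Result: resources) can also be detected from a script, which may help when rebooting repeatedly to reproduce the problem. This is only a sketch: the function name failed_for_resources is hypothetical, and it assumes the installed systemctl exposes the ActiveState and Result properties through `systemctl show`, where "resources" is the string systemd uses for SERVICE_FAILURE_RESOURCES.

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl show` output (Key=value lines) on
# stdin and succeed only if the unit is failed with Result=resources, the
# string systemd uses for SERVICE_FAILURE_RESOURCES.
failed_for_resources() {
    awk -F= '
        $1 == "ActiveState" { state = $2 }
        $1 == "Result"      { result = $2 }
        # awk exit status 0 means success to the shell, so negate the test.
        END { exit !(state == "failed" && result == "resources") }
    '
}
```

After a reboot it could be used as `systemctl show -p ActiveState -p Result srp_daemon.service | failed_for_resources && echo "srp_daemon.service failed with Result=resources"`. Parsing by key rather than by line position avoids depending on the order in which systemctl prints the properties.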

> 
> I can't see any way that srp_daemon.service should have this added,
> please explain more what is going on.
> 
> I could potentially understand needing it in srp_daemon_port@.service,

No, it does not work with srp_daemon_port@.service.

$ grep -w -n network.target /usr/lib/systemd/system/srp_daemon*
/usr/lib/systemd/system/srp_daemon_port@.service:21:After=srp_daemon.service network.target

> but even that needs much more explanation about what exactly is
> causing this requirement.
> 
> You said SERVICE_FAILURE_RESOURCES which is an internal systemd error
> code. 

Yes, it is an error code from systemd-219-51.el7.x86_64.

> Is this because of PrivateNetwork=yes or something similar?

How can I test or verify this?

[Attachment: /var/log/boot.log]
[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Basic System.
[  OK  ] Found device MB0500GCEHF 7.
         Starting File System Check on /dev/...7-f857-41d3-b7ad-5b9618f6f39e...
[  OK  ] Started dracut initqueue hook.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
[  OK  ] Started File System Check on /dev/d...7b7-f857-41d3-b7ad-5b9618f6f39e.
         Mounting /sysroot...
[  OK  ] Mounted /sysroot.
[  OK  ] Reached target Initrd Root File System.
         Starting Reload Configuration from the Real Root...
[  OK  ] Started Reload Configuration from the Real Root.
[  OK  ] Reached target Initrd File Systems.
[  OK  ] Reached target Initrd Default Target.
         Starting dracut pre-pivot and cleanup hook...
[  OK  ] Started dracut pre-pivot and cleanup hook.
         Starting Cleaning Up and Shutting Down Daemons...
[  OK  ] Stopped target Timers.
         Starting Plymouth switch root service...
[  OK  ] Stopped Cleaning Up and Shutting Down Daemons.
[  OK  ] Stopped dracut pre-pivot and cleanup hook.
         Stopping dracut pre-pivot and cleanup hook...
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Remote File Systems (Pre).
[  OK  ] Stopped dracut initqueue hook.
         Stopping dracut initqueue hook...
[  OK  ] Stopped target Initrd Default Target.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Paths.
[  OK  ] Stopped target Slices.
[  OK  ] Stopped target System Initialization.
         Stopping udev Kernel Device Manager...
[  OK  ] Stopped target Local File Systems.
[  OK  ] Stopped udev Coldplug all Devices.
         Stopping udev Coldplug all Devices...
[  OK  ] Stopped Apply Kernel Variables.
         Stopping Apply Kernel Variables...
[  OK  ] Stopped target Swap.
[  OK  ] Stopped target Sockets.
[  OK  ] Stopped udev Kernel Device Manager.
[  OK  ] Stopped Create Static Device Nodes in /dev.
         Stopping Create Static Device Nodes in /dev...
[  OK  ] Stopped Create list of required sta...ce nodes for the current kernel.
         Stopping Create list of required st... nodes for the current kernel...
[  OK  ] Stopped dracut pre-udev hook.
         Stopping dracut pre-udev hook...
[  OK  ] Stopped dracut cmdline hook.
         Stopping dracut cmdline hook...
[  OK  ] Closed udev Control Socket.
[  OK  ] Closed udev Kernel Socket.
         Starting Cleanup udevd DB...
[  OK  ] Started Cleanup udevd DB.
[  OK  ] Reached target Switch Root.
[  OK  ] Started Plymouth switch root service.
         Starting Switch Root...

Welcome to Red Hat Enterprise Linux Server 7.5 Beta (Maipo)!

[  OK  ] Stopped Switch Root.
[  OK  ] Stopped Journal Service.
         Starting Journal Service...
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on udev Kernel Socket.
         Mounting POSIX Message Queue File System...
[FAILED] Failed to start Daemon that discove...d logs in to SRP target systems.
See 'systemctl status srp_daemon.service' for details.
         Starting Daemon that discovers and logs in to SRP target systems...
[  OK  ] Created slice system-selinux\x2dpol...grate\x2dlocal\x2dchanges.slice.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Stopped File System Check on Root Device.
         Stopping File System Check on Root Device...
[  OK  ] Set up automount Arbitrary Executab...ats File System Automount Point.
[  OK  ] Created slice User and Session Slice.
         Mounting Huge Pages File System...
[  OK  ] Reached target Slices.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Created slice system-serial\x2dgetty.slice.
[  OK  ] Reached target Local Encrypted Volumes.
         Mounting Debug File System...
[  OK  ] Stopped target Switch Root.
[  OK  ] Stopped target Initrd File Systems.
         Starting Create list of required st... nodes for the current kernel...
[  OK  ] Stopped target Initrd Root File System.
[  OK  ] Created slice system-getty.slice.
         Starting Initialize the iWARP/InfiniBand/RDMA stack in the kernel...
         Starting Apply Kernel Variables...
         Starting Remount Root and Kernel File Systems...
[  OK  ] Created slice system-systemd\x2dfsck.slice.
[  OK  ] Mounted Debug File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Started Journal Service.
[  OK  ] Started Create list of required sta...ce nodes for the current kernel.
[  OK  ] Started Remount Root and Kernel File Systems.
         Starting Load/Save Random Seed...
         Starting Configure read-only root support...
         Starting udev Coldplug all Devices...
         Starting Create Static Device Nodes in /dev...
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Reached target Local File Systems (Pre).
         Starting udev Kernel Device Manager...
[  OK  ] Started Configure read-only root support.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Found device /dev/ttyS1.
[  OK  ] Found device MB0500GCEHF 3.
[  OK  ] Found device MB0500GCEHF 2.
[  OK  ] Found device MB0500GCEHF 6.
[  OK  ] Found device MB0500GCEHF 5.
         Mounting /mnt/rdma-xfs...
         Activating swap /dev/disk/by-uuid/a...6-f0ce-4513-99fa-72bbdb9f5309...
         Mounting /boot...
         Starting File System Check on /dev/...6-129a-4cb2-9040-a0484cbc8765...
[  OK  ] Activated swap /dev/disk/by-uuid/add068f6-f0ce-4513-99fa-72bbdb9f5309.
[  OK  ] Reached target Swap.
[   17.739449] systemd-fsck[561]: /dev/sda3: clean, 11/1638400 files, 146893/6553600 blocks
[  OK  ] Mounted /boot.
[  OK  ] Started File System Check on /dev/d...386-129a-4cb2-9040-a0484cbc8765.
         Mounting /mnt/rdma-ext4...
[  OK  ] Mounted /mnt/rdma-xfs.
[  OK  ] Mounted /mnt/rdma-ext4.
[  OK  ] Reached target Local File Systems.
         Starting Preprocess NFS configuration...
         Starting Tell Plymouth To Write Out Runtime Data...
         Starting Import network configuration from initramfs...
         Starting RDMA Node Description Daemon...
[  OK  ] Created slice system-rdma\x2dload\x2dmodules.slice.
         Starting Load RDMA modules from /etc/rdma/modules/roce.conf...
         Starting Load RDMA modules from /etc/rdma/modules/infiniband.conf...
         Starting Load RDMA modules from /etc/rdma/modules/rdma.conf...
[  OK  ] Started Preprocess NFS configuration.
[  OK  ] Started Load RDMA modules from /etc/rdma/modules/roce.conf.
[  OK  ] Started Import network configuration from initramfs.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Tell Plymouth To Write Out Runtime Data.
[  OK  ] Started RDMA Node Description Daemon.
[  OK  ] Started Create Volatile Files and Directories.
         Starting Security Auditing Service...
         Mounting RPC Pipe File System...
[  OK  ] Started Load RDMA modules from /etc/rdma/modules/infiniband.conf.
[  OK  ] Mounted RPC Pipe File System.
[  OK  ] Reached target rpc_pipefs.target.
[  OK  ] Started Load RDMA modules from /etc/rdma/modules/rdma.conf.
[  OK  ] Reached target RDMA Hardware.
[  OK  ] Reached target Network (Pre).
[  OK  ] Started Initialize the iWARP/InfiniBand/RDMA stack in the kernel.
         Starting Starts the OpenSM InfiniBand fabric Subnet Manager...
[  OK  ] Started Starts the OpenSM InfiniBand fabric Subnet Manager.
[  OK  ] Started Security Auditing Service.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Paths.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Timers.
[  OK  ] Listening on Open-iSCSI iscsid Socket.
[  OK  ] Listening on RPCbind Server Activation Socket.
         Starting RPC bind service...
[  OK  ] Listening on Open-iSCSI iscsiuio Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting GSSAPI Proxy Daemon...
[  OK  ] Started D-Bus System Message Bus.
         Starting D-Bus System Message Bus...
         Starting Network Manager...
         Starting Dump dmesg to /var/log/dmesg...
[  OK  ] Started irqbalance daemon.
         Starting irqbalance daemon...
         Starting Login Service...
         Starting Authorization Manager...
         Starting Load CPU microcode update...
         Starting NTP client/server...
[  OK  ] Started RPC bind service.
[  OK  ] Started GSSAPI Proxy Daemon.
[  OK  ] Started Load CPU microcode update.
[  OK  ] Reached target NFS client services.
[  OK  ] Started Login Service.
[  OK  ] Started Dump dmesg to /var/log/dmesg.
[  OK  ] Started NTP client/server.
         Starting Wait for chrony to synchronize system clock...
[  OK  ] Started Authorization Manager.
         Starting Hostname Service...
[  OK  ] Started Hostname Service.
[  OK  ] Started Network Manager.
[  OK  ] Reached target Network.
         Starting Enable periodic update of entitlement certificates....
         Starting Login and scanning of iSCSI devices...
         Starting OpenSSH server daemon...
         Starting Dynamic System Tuning Daemon...
         Starting Logout off all iSCSI sessions on shutdown...
         Starting Postfix Mail Transport Agent...
[  OK  ] Reached target Network is Online.
         Starting System Logging Service...
         Starting Notify NFS peers of a restart...
[  OK  ] Started Enable periodic update of entitlement certificates..
[  OK  ] Started Logout off all iSCSI sessions on shutdown.
         Starting Network Manager Script Dispatcher Service...
[  OK  ] Started Notify NFS peers of a restart.
[  OK  ] Started System Logging Service.
         Starting Open-iSCSI...
[  OK  ] Started Open-iSCSI.
[  OK  ] Started OpenSSH server daemon.
[  OK  ] Started Network Manager Script Dispatcher Service.
[*     ] (1 of 4) A start job is running for...ze system clock (12s / no limit)
[  OK  ] Started Dynamic System Tuning Daemon.
[  OK  ] Started Postfix Mail Transport Agent.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Starting Permit User Sessions...
         Starting Crash recovery kernel arming...
         Starting Availability of block devices...
[  OK  ] Started Availability of block devices.
[  OK  ] Started Permit User Sessions.
         Starting Terminate Plymouth Boot Screen...
         Starting Wait for Plymouth Boot Screen to Quit...
[  OK  ] Started Job spooling tools.
         Starting Job spooling tools...
Jason Gunthorpe Jan. 1, 2018, 7:19 p.m. UTC | #3
On Sat, Dec 30, 2017 at 09:59:27PM +0800, Honggang LI wrote:
> On Fri, Dec 29, 2017 at 11:00:58AM -0700, Jason Gunthorpe wrote:
> > On Fri, Dec 29, 2017 at 06:10:06PM +0800, Honggang LI wrote:
> > > From: Honggang Li <honli@redhat.com>
> > > 
> > > When the machine boots or reboots, systemd starts srp_daemon.service
> > > at the very beginning of the boot process unless srp_daemon.service
> > > is ordered after network.target. As a result, srp_daemon.service is
> > > terminated with SERVICE_FAILURE_RESOURCES.
> > 
> > How is this possible?  srp_daemon.service just runs a script that
> > doesn't touch the network.
> 
> To reproduce it, you just need to enable srp_daemon.service and then
> reboot the machine. Watch the serial console while the machine boots.
> Please see the attached /var/log/boot.log for details. After the system
> has booted, check the status of srp_daemon.service.

Well, I did this sort of testing when I originally set this up, with
no problems. But I used Ubuntu Xenial, which has a newer systemd.

So we need to find the root cause before we can evaluate whether this
is the right solution.

> > but even that needs much more explanation about what exactly is
> > causing this requirement.
> > 
> > You said SERVICE_FAILURE_RESOURCES which is an internal systemd error
> > code. 
> 
> Yes, it is an error code from systemd-219-51.el7.x86_64.
> 
> > Is this because of PrivateNetwork=yes or something similar?
>
> How can I test or verify this?

Remove all the sandboxing options and see if it starts working.

Add them back in until you find the one that breaks it.
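One way to carry out this bisection without editing the packaged unit file is a drop-in override (a *.conf file under /etc/systemd/system/srp_daemon.service.d/) that switches individual options off. The helpers below are a hypothetical sketch: the function names, the option list in SANDBOX_KEYS, and the assumption that each option is a plain Key=value line accepting "no" are illustrative and are not taken from the rdma-core sources.

```shell
#!/bin/sh
# Hypothetical helpers for bisecting systemd sandboxing options.
# The option list is an assumption: boolean-style settings that accept "no".
SANDBOX_KEYS='PrivateNetwork|PrivateTmp|PrivateDevices|ProtectSystem|ProtectHome|ProtectKernelModules|ProtectKernelTunables|ProtectControlGroups|MemoryDenyWriteExecute|RestrictRealtime|NoNewPrivileges'

# Print the sandboxing keys set in a unit file, one per line.
# Assumes plain "Key=value" lines with no spaces around "=".
sandbox_keys() {
    grep -E "^(${SANDBOX_KEYS})=" "$1" | cut -d= -f1
}

# Print a drop-in body that sets every sandboxing key found in UNIT to "no",
# except the keys given as further arguments (the ones added back so far).
dropin_keep() {
    unit=$1
    shift
    printf '[Service]\n'
    for key in $(sandbox_keys "$unit"); do
        keep=no
        for k in "$@"; do
            if [ "$k" = "$key" ]; then keep=yes; fi
        done
        if [ "$keep" = no ]; then printf '%s=no\n' "$key"; fi
    done
}
```

For example, after creating the drop-in directory, `dropin_keep /usr/lib/systemd/system/srp_daemon.service > /etc/systemd/system/srp_daemon.service.d/bisect.conf` followed by `systemctl daemon-reload` and a reboot disables every listed option; naming options as extra arguments re-enables them one step at a time. On systemd 219, options it does not recognize (such as MemoryDenyWriteExecute) would still be reported as unknown lvalues.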

Jason
Honggang LI Jan. 2, 2018, 7:41 a.m. UTC | #4
On Mon, Jan 01, 2018 at 12:19:50PM -0700, Jason Gunthorpe wrote:
> > Yes, it is an error code from systemd-219-51.el7.x86_64.
> > 
> > > Is this because of PrivateNetwork=yes or something similar?
> >
> > How can I test or verify this?
> 
> Remove all the sandboxing options and see if it starts working.

"PrivateNetwork=yes/no" is not the root cause of this issue.

> 
> Add them back in until you find the one that breaks it.

I have confirmed that this is a systemd issue by testing on Fedora 27.
F27 works because it runs a newer version of systemd.

Please drop this patch.

thanks
Jason Gunthorpe Jan. 2, 2018, 2:58 p.m. UTC | #5
On Tue, Jan 02, 2018 at 03:41:25PM +0800, Honggang LI wrote:
> On Mon, Jan 01, 2018 at 12:19:50PM -0700, Jason Gunthorpe wrote:
> > > Yes, it is an error code from systemd-219-51.el7.x86_64.
> > > 
> > > > Is this because of PrivateNetwork=yes or something similar?
> > >
> > > How can I test or verify this?
> > 
> > Remove all the sandboxing options and see if it starts working.
> 
> "PrivateNetwork=yes/no" is not the root cause of this issue.
> 
> > 
> > Add them back in until you find the one that breaks it.
> 
> I have confirmed that this is a systemd issue by testing on Fedora 27.
> F27 works because it runs a newer version of systemd.
> 
> Please drop this patch.

Very mysterious, then. Any idea what is going on?

Jason
Honggang LI Jan. 3, 2018, 6:54 a.m. UTC | #6
On Tue, Jan 02, 2018 at 07:58:08AM -0700, Jason Gunthorpe wrote:
> On Tue, Jan 02, 2018 at 03:41:25PM +0800, Honggang LI wrote:
> > On Mon, Jan 01, 2018 at 12:19:50PM -0700, Jason Gunthorpe wrote:
> > > > Yes, it is an error code from systemd-219-51.el7.x86_64.
> > > > 
> > > > > Is this because of PrivateNetwork=yes or something similar?
> > > >
> > > > How can I test or verify this?
> > > 
> > > Remove all the sandboxing options and see if it starts working.
> > 
> > "PrivateNetwork=yes/no" is not the root cause of this issue.
> > 
> > > 
> > > Add them back in until you find the one that breaks it.
> > 
> > I have confirmed that this is a systemd issue by testing on Fedora 27.
> > F27 works because it runs a newer version of systemd.
> > 
> > Please drop this patch.
> 
> Very mysterious, then. Any idea what is going on?

Sorry, but no. I'm done with this issue unless it bites me again.

Patch

diff --git a/srp_daemon/srp_daemon.service.in b/srp_daemon/srp_daemon.service.in
index 188b7e1a..93e44425 100644
--- a/srp_daemon/srp_daemon.service.in
+++ b/srp_daemon/srp_daemon.service.in
@@ -3,6 +3,7 @@  Description=Daemon that discovers and logs in to SRP target systems
 Documentation=man:srp_daemon file:/etc/srp_daemon.conf
 DefaultDependencies=false
 Conflicts=emergency.target emergency.service
+After=network.target
 Before=remote-fs-pre.target
 
 [Service]
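The proposed change adds After=network.target to the [Unit] section. To try the same ordering on an installed system without rebuilding rdma-core, a drop-in can add it, since After= is a list setting that accumulates rather than replacing the packaged list. This is a sketch that assumes local overrides go in /etc/systemd/system/<unit>.d/; the function write_after_dropin and the after-<target>.conf file name are hypothetical. The reporter later attributed the failure to systemd-219 itself, so this ordering is not a confirmed fix.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that orders UNIT after TARGET.
# DIR defaults to /etc/systemd/system, where local overrides normally live;
# pass another directory to try it out without touching the real system.
write_after_dropin() {
    unit=$1
    target=$2
    dir=${3:-/etc/systemd/system}
    # Drop-ins live in "<unit>.d/" and must end in ".conf".
    mkdir -p "$dir/$unit.d" || return 1
    # After= is a list setting, so this adds TARGET to the ordering already
    # declared in the packaged unit instead of replacing it.
    printf '[Unit]\nAfter=%s\n' "$target" > "$dir/$unit.d/after-$target.conf"
}
```

On a real system this would be followed by `systemctl daemon-reload` and a reboot, e.g. `write_after_dropin srp_daemon.service network.target && systemctl daemon-reload`.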