diff mbox series

[v1] tools/hotplug: systemd: Make dependency on Xen device nodes

Message ID 20230825033616.3402812-1-leo.yan@linaro.org (mailing list archive)
State New, archived
Headers show
Series [v1] tools/hotplug: systemd: Make dependency on Xen device nodes | expand

Commit Message

Leo Yan Aug. 25, 2023, 3:36 a.m. UTC
When system booting up, the kernel module xen_gntdev.ko is loaded and
the device node '/dev/xen/gntdev' is created; later the xenstored
service in systemd launches daemon to open this device node.

This flow has a race condition between creating the device node in the
kernel module and using the device node in the systemd service.  It's
possible that the xenstored service fails to open the device node due
to the delay of creating the device node.  In the end, xenbus cannot be
used between the Dom0 kernel and the Xen hypervisor.

To resolve this issue, we need to synchronize between udev and systemd
for the device node.  There have an extra change in the udev rules for
tagging 'systemd' for Xen device nodes, which notifies device node
creating to systemd; besides udev change, this patch adds dependency in
systemd service for waiting the device node.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---

 The udev rules change is on github:
 https://github.com/systemd/systemd/pull/28962/commits/520340dfea3b6cf9fe7a24c9238313b1a5fe8539

 tools/hotplug/Linux/systemd/xenstored.service.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Erik Schilling Aug. 25, 2023, 5:02 a.m. UTC | #1
On Fri Aug 25, 2023 at 5:36 AM CEST, Leo Yan wrote:
> When system booting up, the kernel module xen_gntdev.ko is loaded and
> the device node '/dev/xen/gntdev' is created; later the xenstored
> service in systemd launches daemon to open this device node.
>
> This flow has a race condition between creating the device node in the
> kernel module and using the device node in the systemd service.  It's
> possible that the xenstored service fails to open the device node due
> to the delay of creating the device node.  In the end, xenbus cannot be
> used between the Dom0 kernel and the Xen hypervisor.
>
> To resolve this issue, we need to synchronize between udev and systemd
> for the device node.  There have an extra change in the udev rules for
> tagging 'systemd' for Xen device nodes, which notifies device node
> creating to systemd; besides udev change, this patch adds dependency in
> systemd service for waiting the device node.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>
>  The udev rules change is on github:
>  https://github.com/systemd/systemd/pull/28962/commits/520340dfea3b6cf9fe7a24c9238313b1a5fe8539

Let's see what they think, but I fear systemd may not be the correct
upstream to carry this... Skimming through the rules that they have
there, it mostly looks like they only carry rules that are relevant for
systemd-related services directly.

>  tools/hotplug/Linux/systemd/xenstored.service.in | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in b/tools/hotplug/Linux/systemd/xenstored.service.in
> index 261077dc92..6e48cdb0e7 100644
> --- a/tools/hotplug/Linux/systemd/xenstored.service.in
> +++ b/tools/hotplug/Linux/systemd/xenstored.service.in
> @@ -1,7 +1,7 @@
>  [Unit]
>  Description=The Xen xenstore
> -Requires=proc-xen.mount
> -After=proc-xen.mount
> +Requires=proc-xen.mount dev-xen-gntdev.device
> +After=proc-xen.mount dev-xen-gntdev.device

I must admit that I am a bit confused why this is enough... During our
discussion on Slack, when you quoted from your rule it included
`ENV{SYSTEMD_WANTS}="xenstored.service"`. Did you drop that during later
development? Was there additional reasearch involved in dropping it?

My understanding was that if devices do not exist when systemd builds
its transaction model, dependencies on them will just get tossed out[1].
So I would have expected that there should still be a race if
the .device does not pull in the service.

Did you hit the race frequently enough to be sure that this fixes it in
entirety? Is there a place somewhere in the Xen or kernel code where one
could add an excessive sleep in order to trigger the race more reliably?

[1] "citation-needed": But I vaguely remember dealing with a similar
    issue and getting told this on #systemd IRC

- Erik
Leo Yan Aug. 25, 2023, 6:36 a.m. UTC | #2
On Fri, Aug 25, 2023 at 07:02:33AM +0200, Erik Schilling wrote:
> On Fri Aug 25, 2023 at 5:36 AM CEST, Leo Yan wrote:
> > When system booting up, the kernel module xen_gntdev.ko is loaded and
> > the device node '/dev/xen/gntdev' is created; later the xenstored
> > service in systemd launches daemon to open this device node.
> >
> > This flow has a race condition between creating the device node in the
> > kernel module and using the device node in the systemd service.  It's
> > possible that the xenstored service fails to open the device node due
> > to the delay of creating the device node.  In the end, xenbus cannot be
> > used between the Dom0 kernel and the Xen hypervisor.
> >
> > To resolve this issue, we need to synchronize between udev and systemd
> > for the device node.  There have an extra change in the udev rules for
> > tagging 'systemd' for Xen device nodes, which notifies device node
> > creating to systemd; besides udev change, this patch adds dependency in
> > systemd service for waiting the device node.
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >
> >  The udev rules change is on github:
> >  https://github.com/systemd/systemd/pull/28962/commits/520340dfea3b6cf9fe7a24c9238313b1a5fe8539
> 
> Let's see what they think, but I fear systemd may not be the correct
> upstream to carry this... Skimming through the rules that they have
> there, it mostly looks like they only carry rules that are relevant for
> systemd-related services directly.

Yeah, I share the same concern.  But seems to me, upstreaming change
to the systemd is the most neat fixing.

> >  tools/hotplug/Linux/systemd/xenstored.service.in | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in b/tools/hotplug/Linux/systemd/xenstored.service.in
> > index 261077dc92..6e48cdb0e7 100644
> > --- a/tools/hotplug/Linux/systemd/xenstored.service.in
> > +++ b/tools/hotplug/Linux/systemd/xenstored.service.in
> > @@ -1,7 +1,7 @@
> >  [Unit]
> >  Description=The Xen xenstore
> > -Requires=proc-xen.mount
> > -After=proc-xen.mount
> > +Requires=proc-xen.mount dev-xen-gntdev.device
> > +After=proc-xen.mount dev-xen-gntdev.device
> 
> I must admit that I am a bit confused why this is enough... During our
> discussion on Slack, when you quoted from your rule it included
> `ENV{SYSTEMD_WANTS}="xenstored.service"`. Did you drop that during later
> development? Was there additional reasearch involved in dropping it?

Yes, I dropped ENV{SYSTEMD_WANTS}="xenstored.service" and it works well
at my side.

My purpose is to upstream the udev rules in systemd as general as
possible.  As you mentioned, the "xenstored.service" is maintained in Xen
but not in systemd, for this reason, I would like avoid to add
"xenstored.service" into udev rules in systemd.

> My understanding was that if devices do not exist when systemd builds
> its transaction model, dependencies on them will just get tossed out[1].
> So I would have expected that there should still be a race if
> the .device does not pull in the service.

Thanks for sharing.  To be honest, I don't know this part.  Doesn't
systemd parse the service script and let service to wait for specific
.device until systemd receives notification for the .device?

> Did you hit the race frequently enough to be sure that this fixes it in
> entirety?

I have two boards.  One on board (Telechips), it's consistently
reproduce this issue, and after applying the udev rules change and
this patch, I confirmed the issue is dismissed entirely.

> Is there a place somewhere in the Xen or kernel code where one
> could add an excessive sleep in order to trigger the race more reliably?

On my AVA board with running Xen, I tried to add a long delay (10
seconds) in the kernel driver 'drivers/xen/gntdev.c', but I can see
systemd will wait for the kernel modules loading, and then it launches
Xen services.  Thus I cannot reproduce this issue on my AVA board.

So in below flow:

- Step 1: drivers/xen/gntdev.c registers misc device;
- Step 2: udev creates device node '/dev/xen/gntdev';
- Step 3: systemd launches xenstored.service.

Seems to me, the race condition happens between step 2 and step 3.  If
there have any delay in udev for creating device node, then the step3
for xenstored.service will fail.

Thanks for review.

Leo

> [1] "citation-needed": But I vaguely remember dealing with a similar
>     issue and getting told this on #systemd IRC
> 
> - Erik
Erik Schilling Aug. 25, 2023, 7:37 a.m. UTC | #3
On Fri Aug 25, 2023 at 8:36 AM CEST, Leo Yan wrote:
> On Fri, Aug 25, 2023 at 07:02:33AM +0200, Erik Schilling wrote:
> > On Fri Aug 25, 2023 at 5:36 AM CEST, Leo Yan wrote:
> > > When system booting up, the kernel module xen_gntdev.ko is loaded and
> > > the device node '/dev/xen/gntdev' is created; later the xenstored
> > > service in systemd launches daemon to open this device node.
> > >
> > > This flow has a race condition between creating the device node in the
> > > kernel module and using the device node in the systemd service.  It's
> > > possible that the xenstored service fails to open the device node due
> > > to the delay of creating the device node.  In the end, xenbus cannot be
> > > used between the Dom0 kernel and the Xen hypervisor.
> > >
> > > To resolve this issue, we need to synchronize between udev and systemd
> > > for the device node.  There have an extra change in the udev rules for
> > > tagging 'systemd' for Xen device nodes, which notifies device node
> > > creating to systemd; besides udev change, this patch adds dependency in
> > > systemd service for waiting the device node.
> > >
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > >
> > >  The udev rules change is on github:
> > >  https://github.com/systemd/systemd/pull/28962/commits/520340dfea3b6cf9fe7a24c9238313b1a5fe8539
> > 
> > Let's see what they think, but I fear systemd may not be the correct
> > upstream to carry this... Skimming through the rules that they have
> > there, it mostly looks like they only carry rules that are relevant for
> > systemd-related services directly.
>
> Yeah, I share the same concern.  But seems to me, upstreaming change
> to the systemd is the most neat fixing.
>
> > >  tools/hotplug/Linux/systemd/xenstored.service.in | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in b/tools/hotplug/Linux/systemd/xenstored.service.in
> > > index 261077dc92..6e48cdb0e7 100644
> > > --- a/tools/hotplug/Linux/systemd/xenstored.service.in
> > > +++ b/tools/hotplug/Linux/systemd/xenstored.service.in
> > > @@ -1,7 +1,7 @@
> > >  [Unit]
> > >  Description=The Xen xenstore
> > > -Requires=proc-xen.mount
> > > -After=proc-xen.mount
> > > +Requires=proc-xen.mount dev-xen-gntdev.device
> > > +After=proc-xen.mount dev-xen-gntdev.device
> > 
> > I must admit that I am a bit confused why this is enough... During our
> > discussion on Slack, when you quoted from your rule it included
> > `ENV{SYSTEMD_WANTS}="xenstored.service"`. Did you drop that during later
> > development? Was there additional reasearch involved in dropping it?
>
> Yes, I dropped ENV{SYSTEMD_WANTS}="xenstored.service" and it works well
> at my side.

Hm. Interesting. Could you plot the activations after boot?

    systemd-analyze plot > /tmp/plot.svg

Seeing failure vs success but also the success cases on AVA vs Telechips
may be interesting.

I just did some tests on my workstation where I added dependencies on
a random USB device. If that one was not plugged at all, the service
still happily started. Yet, it obviously seems to do something in your
case... So I fear we may not fully understand the real problem yet.

I must admit that I find this case a bit under-documented. While "Wants"
explicitly says that failling transactions are ignored, there is no
clear statement about what happens in that case with "Requires".

While skimming the docs I also realized that maybe BindsTo= would maybe
be more suitable compared to Requires= here. But that is unrelated to
the confusion that I have about the original problem.

> My purpose is to upstream the udev rules in systemd as general as
> possible.  As you mentioned, the "xenstored.service" is maintained in Xen
> but not in systemd, for this reason, I would like avoid to add
> "xenstored.service" into udev rules in systemd.
>
> > My understanding was that if devices do not exist when systemd builds
> > its transaction model, dependencies on them will just get tossed out[1].
> > So I would have expected that there should still be a race if
> > the .device does not pull in the service.
>
> Thanks for sharing.  To be honest, I don't know this part.  Doesn't
> systemd parse the service script and let service to wait for specific
> .device until systemd receives notification for the .device?
>
> > Did you hit the race frequently enough to be sure that this fixes it in
> > entirety?
>
> I have two boards.  One on board (Telechips), it's consistently
> reproduce this issue, and after applying the udev rules change and
> this patch, I confirmed the issue is dismissed entirely.
>
> > Is there a place somewhere in the Xen or kernel code where one
> > could add an excessive sleep in order to trigger the race more reliably?
>
> On my AVA board with running Xen, I tried to add a long delay (10
> seconds) in the kernel driver 'drivers/xen/gntdev.c', but I can see
> systemd will wait for the kernel modules loading, and then it launches
> Xen services.  Thus I cannot reproduce this issue on my AVA board.

Seeing the systemd-analyze plots would maybe help to understand the
differences here... Does one or the other maybe involve some initrd
stage that already creates devices?

Also: What was the original error that you saw? Was /dev/xen/gntdev
missing entirely? Or did interaction with it fail with some error code?

Can you reproduce the race while booting with systemd.log_level=debug
and export the full journal of the current boot?

    journalctl -b0 -o export > journal.txt


> So in below flow:
>
> - Step 1: drivers/xen/gntdev.c registers misc device;
> - Step 2: udev creates device node '/dev/xen/gntdev';
> - Step 3: systemd launches xenstored.service.
>
> Seems to me, the race condition happens between step 2 and step 3.  If
> there have any delay in udev for creating device node, then the step3
> for xenstored.service will fail.

Hm... If you see systemd waiting for the module, then Step 2 + 3 should
not be able to race. The service does not set DefaultDependencies=no,
so xenstored.service should only run after sysinit.target is reached.
Since that target depends on the module loading, everything should - in
theory - work out.

- Erik

>
> Thanks for review.
>
> Leo
>
> > [1] "citation-needed": But I vaguely remember dealing with a similar
> >     issue and getting told this on #systemd IRC
> > 
> > - Erik
Leo Yan Aug. 25, 2023, 9:46 a.m. UTC | #4
On Fri, Aug 25, 2023 at 09:37:58AM +0200, Erik Schilling wrote:

[...]

> > > > diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in b/tools/hotplug/Linux/systemd/xenstored.service.in
> > > > index 261077dc92..6e48cdb0e7 100644
> > > > --- a/tools/hotplug/Linux/systemd/xenstored.service.in
> > > > +++ b/tools/hotplug/Linux/systemd/xenstored.service.in
> > > > @@ -1,7 +1,7 @@
> > > >  [Unit]
> > > >  Description=The Xen xenstore
> > > > -Requires=proc-xen.mount
> > > > -After=proc-xen.mount
> > > > +Requires=proc-xen.mount dev-xen-gntdev.device
> > > > +After=proc-xen.mount dev-xen-gntdev.device
> > > 
> > > I must admit that I am a bit confused why this is enough... During our
> > > discussion on Slack, when you quoted from your rule it included
> > > `ENV{SYSTEMD_WANTS}="xenstored.service"`. Did you drop that during later
> > > development? Was there additional reasearch involved in dropping it?
> >
> > Yes, I dropped ENV{SYSTEMD_WANTS}="xenstored.service" and it works well
> > at my side.
> 
> Hm. Interesting. Could you plot the activations after boot?
> 
>     systemd-analyze plot > /tmp/plot.svg
> 
> Seeing failure vs success but also the success cases on AVA vs Telechips
> may be interesting.
> 
> I just did some tests on my workstation where I added dependencies on
> a random USB device. If that one was not plugged at all, the service
> still happily started. Yet, it obviously seems to do something in your
> case... So I fear we may not fully understand the real problem yet.
> 
> I must admit that I find this case a bit under-documented. While "Wants"
> explicitly says that failling transactions are ignored, there is no
> clear statement about what happens in that case with "Requires".
> 
> While skimming the docs I also realized that maybe BindsTo= would maybe
> be more suitable compared to Requires= here. But that is unrelated to
> the confusion that I have about the original problem.

Please ignore this patch and sorry for noise.

Erik and me confirmed the systemd has established the dependencies
properly.  The dependency is (the above one depends on the blow one):

  xenstored.service
    `> sysinit.target
         `> systemd-modules-load.service (Load xen modules)


My issue is caused by a local customzied systemd-modules-load.service
which breaks the dependency between sysinit.target and
systemd-modules-load.service.

Very appreciate Erik's help for root cause the issue.

Thanks,
Leo
diff mbox series

Patch

diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in b/tools/hotplug/Linux/systemd/xenstored.service.in
index 261077dc92..6e48cdb0e7 100644
--- a/tools/hotplug/Linux/systemd/xenstored.service.in
+++ b/tools/hotplug/Linux/systemd/xenstored.service.in
@@ -1,7 +1,7 @@ 
 [Unit]
 Description=The Xen xenstore
-Requires=proc-xen.mount
-After=proc-xen.mount
+Requires=proc-xen.mount dev-xen-gntdev.device
+After=proc-xen.mount dev-xen-gntdev.device
 Before=libvirtd.service libvirt-guests.service
 RefuseManualStop=true
 ConditionPathExists=/proc/xen/capabilities