diff mbox series

automation: fix race condition in adl-suspend test

Message ID 20231028033404.262729-1-marmarek@invisiblethingslab.com (mailing list archive)
State Superseded
Headers show
Series automation: fix race condition in adl-suspend test | expand

Commit Message

Marek Marczykowski-Górecki Oct. 28, 2023, 3:33 a.m. UTC
If system suspends too quickly, the message for the test controller to
wake up the system may be not sent to the console before suspending.
This will cause the test to timeout.

Fix this by waiting a bit after printing the message. The test
controller then resumes the system 30s after the message, so as long as
the delay + suspending takes less time it is okay.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
This is consistent with the observation that sync_console "fixes" the
issue.
---
 automation/scripts/qubes-x86-64.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Roger Pau Monne Oct. 30, 2023, 11:42 a.m. UTC | #1
On Sat, Oct 28, 2023 at 05:33:57AM +0200, Marek Marczykowski-Górecki wrote:
> If system suspends too quickly, the message for the test controller to
> wake up the system may be not sent to the console before suspending.
> This will cause the test to timeout.
> 
> Fix this by waiting a bit after printing the message. The test
> controller then resumes the system 30s after the message, so as long as
> the delay + suspending takes less time it is okay.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> This is consistent with the observation that sync_console "fixes" the
> issue.
> ---
>  automation/scripts/qubes-x86-64.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
> index 26131b082671..a34db96e4585 100755
> --- a/automation/scripts/qubes-x86-64.sh
> +++ b/automation/scripts/qubes-x86-64.sh
> @@ -54,11 +54,11 @@ until grep 'domU started' /var/log/xen/console/guest-domU.log; do
>      sleep 1
>  done
>  echo \"${wait_and_wakeup}\"
> +# let the above message flow to console, then suspend
> +sleep 5

Could you use `sync /dev/stdout`?  I guess that might not be enough,
since the sync won't be propagated to the hypervisor, and hence even
if flushed from Linux, we have no guarantee that the hypervisor has
also flushed it.

Xen should flush the buffer when a newline character is found, but I
have no idea whether context could return to guest while the buffer is
still in the process of being fully flushed.

Anyway, adding the extra sync might be good regardless, and keeping
the sleep.

Thanks, Roger.
Marek Marczykowski-Górecki Oct. 30, 2023, 4:32 p.m. UTC | #2
On Mon, Oct 30, 2023 at 12:42:52PM +0100, Roger Pau Monné wrote:
> On Sat, Oct 28, 2023 at 05:33:57AM +0200, Marek Marczykowski-Górecki wrote:
> > If system suspends too quickly, the message for the test controller to
> > wake up the system may be not sent to the console before suspending.
> > This will cause the test to timeout.
> > 
> > Fix this by waiting a bit after printing the message. The test
> > controller then resumes the system 30s after the message, so as long as
> > the delay + suspending takes less time it is okay.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> > This is consistent with the observation that sync_console "fixes" the
> > issue.
> > ---
> >  automation/scripts/qubes-x86-64.sh | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
> > index 26131b082671..a34db96e4585 100755
> > --- a/automation/scripts/qubes-x86-64.sh
> > +++ b/automation/scripts/qubes-x86-64.sh
> > @@ -54,11 +54,11 @@ until grep 'domU started' /var/log/xen/console/guest-domU.log; do
> >      sleep 1
> >  done
> >  echo \"${wait_and_wakeup}\"
> > +# let the above message flow to console, then suspend
> > +sleep 5
> 
> Could you use `sync /dev/stdout`?  I guess that might not be enough,
> since the sync won't be propagated to the hypervisor, and hence even
> if flushed from Linux, we have no guarantee that the hypervisor has
> also flushed it.

It seems `sync /dev/stdout` helps too, at least in a limited sample of
two.

> Xen should flush the buffer when a newline character is found, but I
> have no idea whether context could return to guest while the buffer is
> still in the process of being fully flushed.

IIC Xen should flush the console buffer on the suspend path (there is
console_start_sync() in enter_state()). So, if linux manages to send it
to Xen in time, all should be good (in theory at least).

> Anyway, adding the extra sync might be good regardless, and keeping
> the sleep.

Good idea, I'll send v2 with it included.
Marek Marczykowski-Górecki Oct. 30, 2023, 4:38 p.m. UTC | #3
On Mon, Oct 30, 2023 at 05:32:08PM +0100, Marek Marczykowski-Górecki wrote:
> On Mon, Oct 30, 2023 at 12:42:52PM +0100, Roger Pau Monné wrote:
> > On Sat, Oct 28, 2023 at 05:33:57AM +0200, Marek Marczykowski-Górecki wrote:
> > > If system suspends too quickly, the message for the test controller to
> > > wake up the system may be not sent to the console before suspending.
> > > This will cause the test to timeout.
> > > 
> > > Fix this by waiting a bit after printing the message. The test
> > > controller then resumes the system 30s after the message, so as long as
> > > the delay + suspending takes less time it is okay.
> > > 
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > ---
> > > This is consistent with the observation that sync_console "fixes" the
> > > issue.
> > > ---
> > >  automation/scripts/qubes-x86-64.sh | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
> > > index 26131b082671..a34db96e4585 100755
> > > --- a/automation/scripts/qubes-x86-64.sh
> > > +++ b/automation/scripts/qubes-x86-64.sh
> > > @@ -54,11 +54,11 @@ until grep 'domU started' /var/log/xen/console/guest-domU.log; do
> > >      sleep 1
> > >  done
> > >  echo \"${wait_and_wakeup}\"
> > > +# let the above message flow to console, then suspend
> > > +sleep 5
> > 
> > Could you use `sync /dev/stdout`?  I guess that might not be enough,
> > since the sync won't be propagated to the hypervisor, and hence even
> > if flushed from Linux, we have no guarantee that the hypervisor has
> > also flushed it.
> 
> It seems `sync /dev/stdout` helps too, at least in a limited sample of
> two.

... and the third attempt (with sync instead of sleep) failed.

> > Xen should flush the buffer when a newline character is found, but I
> > have no idea whether context could return to guest while the buffer is
> > still in the process of being fully flushed.
> 
> IIC Xen should flush the console buffer on the suspend path (there is
> console_start_sync() in enter_state()). So, if linux manages to send it
> to Xen in time, all should be good (in theory at least).
> 
> > Anyway, adding the extra sync might be good regardless, and keeping
> > the sleep.
> 
> Good idea, I'll send v2 with it included.
diff mbox series

Patch

diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index 26131b082671..a34db96e4585 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -54,11 +54,11 @@  until grep 'domU started' /var/log/xen/console/guest-domU.log; do
     sleep 1
 done
 echo \"${wait_and_wakeup}\"
+# let the above message flow to console, then suspend
+sleep 5
 set -x
 echo deep > /sys/power/mem_sleep
 echo mem > /sys/power/state
-# now wait for resume
-sleep 5
 xl list
 xl dmesg | grep 'Finishing wakeup from ACPI S3 state' || exit 1
 # check if domU is still alive