[2/2] automation: add a suspend test on an Alder Lake system

Message ID	aa4385f5291ebc06551414e4d8cbf7cdd3996eef.1679023966.git-series.marmarek@invisiblethingslab.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <xen-devel-bounces@lists.xenproject.org> Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> Feedback-ID: i1568416f:Fastmail From: =?utf-8?q?Marek_Marczykowski-G=C3=B3recki?= <marmarek@invisiblethingslab.com> To: xen-devel@lists.xenproject.org Cc: =?utf-8?q?Marek_Marczykowski-G=C3=B3recki?= <marmarek@invisiblethingslab.com>, Doug Goldstein <cardoe@cardoe.com>, Stefano Stabellini <sstabellini@kernel.org> Subject: [PATCH 2/2] automation: add a suspend test on an Alder Lake system Date: Fri, 17 Mar 2023 04:34:18 +0100 Message-Id: <aa4385f5291ebc06551414e4d8cbf7cdd3996eef.1679023966.git-series.marmarek@invisiblethingslab.com> In-Reply-To: <cover.5a69c1f96ff446a5872e9dbf6308be9ab278f9df.1679023966.git-series.marmarek@invisiblethingslab.com> References: <cover.5a69c1f96ff446a5872e9dbf6308be9ab278f9df.1679023966.git-series.marmarek@invisiblethingslab.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
Series	automation: add another Gitlab-CI test, x86-64 S3 this time \| expand [0/2] automation: add another Gitlab-CI test, x86-64 S3 this time [1/2] automation: build 6.1.19 kernel for x86-64 dom0 [2/2] automation: add a suspend test on an Alder Lake system

Marek Marczykowski-Górecki March 17, 2023, 3:34 a.m. UTC

This is a first test using Qubes OS CI infra. The gitlab-runner has
access to ssh-based control interface (control@thor.testnet, ssh key
exposed to the test via ssh-agent) and pre-configured HTTP dir for boot
files (mapped under /scratch/gitlab-runner/tftp inside the container).
Details about the setup are described on
https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/

This test boots Xen, and try if S3 works. It runs on a ADL-based desktop
system. The test script is based on the Xilinx one.

The machine needs newer kernel than other x86 tests run, so use 6.1.x
kernel added in previous commit.

When building rootfs, use fakeroot to preserve device files when
repacking rootfs archives in a non-privileged container.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
I'm bad at naming things. Things that I could use naming suggestions:
 - test script (qubes-x86-64-suspend.sh) - this might be okay?
 - test template job name (.adl-x86-64)
 - test job name (adl-suspend-x86-64-gcc)
 - tag (qubes-hw2)

For context, our CI has several machines, named test-X or hwX, each
controlled with matching hal900X RPi (this is where gitlab-runner is).
This test uses only one specific hw, but I plan adding few others too.
---
 automation/gitlab-ci/test.yaml             |  31 ++++-
 automation/scripts/qubes-x86-64-suspend.sh | 155 ++++++++++++++++++++++-
 2 files changed, 186 insertions(+)
 create mode 100755 automation/scripts/qubes-x86-64-suspend.sh

Stefano Stabellini March 17, 2023, 11:10 p.m. UTC | #1

On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> This is a first test using Qubes OS CI infra. The gitlab-runner has
> access to ssh-based control interface (control@thor.testnet, ssh key
> exposed to the test via ssh-agent) and pre-configured HTTP dir for boot
> files (mapped under /scratch/gitlab-runner/tftp inside the container).
> Details about the setup are described on
> https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/
> 
> This test boots Xen, and try if S3 works. It runs on a ADL-based desktop
> system. The test script is based on the Xilinx one.
> 
> The machine needs newer kernel than other x86 tests run, so use 6.1.x
> kernel added in previous commit.
> 
> When building rootfs, use fakeroot to preserve device files when
> repacking rootfs archives in a non-privileged container.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

This is awesome!! Thanks Marek!


> ---
> I'm bad at naming things. Things that I could use naming suggestions:
>  - test script (qubes-x86-64-suspend.sh) - this might be okay?
>  - test template job name (.adl-x86-64)
>  - test job name (adl-suspend-x86-64-gcc)
>  - tag (qubes-hw2)

I think these names are OK. I would maybe rename the tag "qubes-hw2" to
"qubes" because so far there is only one but I am fine with qubes-hw2
also.


> For context, our CI has several machines, named test-X or hwX, each
> controlled with matching hal900X RPi (this is where gitlab-runner is).
> This test uses only one specific hw, but I plan adding few others too.
> ---
>  automation/gitlab-ci/test.yaml             |  31 ++++-
>  automation/scripts/qubes-x86-64-suspend.sh | 155 ++++++++++++++++++++++-
>  2 files changed, 186 insertions(+)
>  create mode 100755 automation/scripts/qubes-x86-64-suspend.sh
> 
> diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> index 2e1a6886df7f..f5511dd6da70 100644
> --- a/automation/gitlab-ci/test.yaml
> +++ b/automation/gitlab-ci/test.yaml
> @@ -15,6 +15,10 @@
>  .arm32-test-needs: &arm32-test-needs
>    - qemu-system-aarch64-6.0.0-arm32-export
>  
> +.x86-64-test-needs: &x86-64-test-needs
> +  - alpine-3.12-rootfs-export
> +  - kernel-6.1.19-export

As you know, the jobs starting with a "." are template jobs to avoid
repeating the same things over and over. .x86-64-test-needs could be
used in qemu-alpine-x86_64-gcc also.


>  .qemu-arm64:
>    extends: .test-jobs-common
>    variables:
> @@ -84,6 +88,25 @@
>    tags:
>      - xilinx
>  
> +.adl-x86-64:
> +  extends: .test-jobs-common
> +  variables:
> +    # the test controller runs on RPi4
> +    CONTAINER: alpine:3.12-arm64v8
> +    LOGFILE: smoke-test.log
> +  artifacts:
> +    paths:
> +      - smoke.serial
> +      - '*.log'
> +    when: always
> +  only:
> +    variables:
> +      - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"

Let me know which trees should have QUBES_JOBS set to true (thus able to
start Qubes jobs.) At a minimum, I think we would want
https://gitlab.com/xen-project/xen to test "staging" and "master". I can
set QUBES_JOBS to true to https://gitlab.com/xen-project/xen if you are
OK with it.


> +  before_script:
> +    - apk --no-cache add openssh-client fakeroot

This would work but we typically try to avoid installing more packages
in the test jobs for speed and efficiency. Instead, add those packages
directly to the build container dockerfile
(automation/build/alpine/3.12-arm64v8.dockerfile), e.g.:

diff --git a/automation/build/alpine/3.12-arm64v8.dockerfile b/automation/build/alpine/3.12-arm64v8.dockerfile
index 180c978964..3f1e6a3fc6 100644
--- a/automation/build/alpine/3.12-arm64v8.dockerfile
+++ b/automation/build/alpine/3.12-arm64v8.dockerfile
@@ -41,3 +41,6 @@ RUN apk --no-cache add \
   libattr \
   libcap-ng-dev \
   pixman-dev \
+  # qubes test deps
+  openssh-client \
+  fakeroot \

But I am not sure why you need fakeroot, more questions below about it


> +  tags:
> +    - qubes-hw2
> +
>  # Test jobs
>  build-each-commit-gcc:
>    extends: .test-jobs-common
> @@ -110,6 +133,14 @@ xilinx-smoke-dom0less-arm64-gcc:
>      - *arm64-test-needs
>      - alpine-3.12-gcc-arm64
>  
> +adl-suspend-x86-64-gcc:
> +  extends: .adl-x86-64
> +  script:
> +    - ./automation/scripts/qubes-x86-64-suspend.sh s3 2>&1 | tee ${LOGFILE}
> +  needs:
> +    - *x86-64-test-needs
> +    - alpine-3.12-gcc

This up to you, but qubes-x86-64-suspend.sh is very cool and
sophisticated and I would use it to create at least 2 jobs:
- a plain dom0 + domU boot (no suspend)
- a suspend/resume job

I am happy either way


>  qemu-smoke-dom0-arm64-gcc:
>    extends: .qemu-arm64
>    script:
> diff --git a/automation/scripts/qubes-x86-64-suspend.sh b/automation/scripts/qubes-x86-64-suspend.sh
> new file mode 100755
> index 000000000000..fc479c9faaec
> --- /dev/null
> +++ b/automation/scripts/qubes-x86-64-suspend.sh
> @@ -0,0 +1,155 @@
> +#!/bin/sh
> +
> +set -ex
> +
> +test_variant=$1
> +
> +wait_and_wakeup=
> +if [ -z "${test_variant}" ]; then
> +    passed="ping test passed"
> +    domU_check="
> +until ifconfig eth0 192.168.0.2 &> /dev/null && ping -c 10 192.168.0.1; do
> +    sleep 30
> +done
> +echo \"${passed}\"
> +"
> +elif [ "${test_variant}" = "s3" ]; then
> +    passed="suspend test passed"
> +    wait_and_wakeup="started, suspending"
> +    domU_check="
> +ifconfig eth0 192.168.0.2
> +echo domU started
> +"
> +    dom0_check="
> +until grep 'domU started' /var/log/xen/console/guest-domU.log; do
> +    sleep 1
> +done
> +echo \"${wait_and_wakeup}\"
> +set -x
> +echo deep > /sys/power/mem_sleep
> +echo mem > /sys/power/state
> +# now wait for resume
> +sleep 5
> +# get domU console content into test log
> +tail -n 100 /var/log/xen/console/guest-domU.log
> +xl list
> +xl dmesg | grep 'Finishing wakeup from ACPI S3 state' || exit 1
> +# check if domU is still alive
> +ping -c 10 192.168.0.2 || exit 1
> +echo \"${passed}\"
> +"
> +fi

Very nice!


> +# DomU
> +mkdir -p rootfs
> +cd rootfs
> +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz

I am a bit confused about it: are you sure you need fakeroot for this?
This script is running inside a container as root already? Are you using
Docker on the RPi4 to run this job?


> +mkdir proc
> +mkdir run
> +mkdir srv
> +mkdir sys
> +rm var/run
> +echo "#!/bin/sh
> +
> +${domU_check}
> +/bin/sh" > etc/local.d/xen.start
> +chmod +x etc/local.d/xen.start
> +echo "rc_verbose=yes" >> etc/rc.conf
> +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > ../binaries/domU-rootfs.cpio.gz

same here


> +cd ..
> +rm -rf rootfs
> +
> +# DOM0 rootfs
> +mkdir -p rootfs
> +cd rootfs
> +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> +mkdir boot
> +mkdir proc
> +mkdir run
> +mkdir srv
> +mkdir sys
> +rm var/run
> +cp -ar ../binaries/dist/install/* .
> +
> +echo "#!/bin/bash
> +
> +export LD_LIBRARY_PATH=/usr/local/lib
> +bash /etc/init.d/xencommons start
> +
> +brctl addbr xenbr0
> +brctl addif xenbr0 eth0
> +ifconfig eth0 up
> +ifconfig xenbr0 up
> +ifconfig xenbr0 192.168.0.1
> +
> +xl create /etc/xen/domU.cfg
> +${dom0_check}
> +" > etc/local.d/xen.start
> +chmod +x etc/local.d/xen.start
> +# just PVH for now
> +echo '
> +type = "pvh"
> +name = "domU"
> +kernel = "/boot/vmlinuz"
> +ramdisk = "/boot/initrd-domU"
> +extra = "root=/dev/ram0 console=hvc0"
> +memory = 512
> +vif = [ "bridge=xenbr0", ]
> +disk = [ ]
> +' > etc/xen/domU.cfg
> +
> +echo "rc_verbose=yes" >> etc/rc.conf
> +echo "XENCONSOLED_TRACE=all" >> etc/default/xencommons
> +echo "QEMU_XEN=/bin/false" >> etc/default/xencommons
> +mkdir -p var/log/xen/console
> +cp ../binaries/bzImage boot/vmlinuz
> +cp ../binaries/domU-rootfs.cpio.gz boot/initrd-domU
> +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > ../binaries/dom0-rootfs.cpio.gz
> +cd ..
> +
> +
> +TFTP=/scratch/gitlab-runner/tftp
> +
> +echo '
> +multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all
> +module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
> +module2 (http)/gitlab-ci/initrd-dom0
> +' > $TFTP/grub.cfg
> +
> +cp -f binaries/xen $TFTP/xen
> +cp -f binaries/bzImage $TFTP/vmlinuz
> +cp -f binaries/dom0-rootfs.cpio.gz $TFTP/initrd-dom0
> +
> +# start logging the serial; this gives interactive console, don't close its
> +# stdin to not close it; the 'cat' is important, plain redirection would hang
> +# until somebody opens the pipe; opening and closing the pipe is used to close
> +# the console
> +mkfifo /tmp/console-stdin
> +cat /tmp/console-stdin |\
> +ssh control@thor.testnet console | tee smoke.serial &

For clarity I would add a variable just below TFTP above to store the
controller node. This will make it easier to change in the future and
also makes it easier to see that it is runner-specific detail, e.g.:

TFTP=/scratch/gitlab-runner/tftp
CONTROLLER=control@thor.testnet

then here:

ssh $CONTROLLER console | tee smoke.serial &


> +# start the system pointing at gitlab-ci predefined config
> +ssh control@thor.testnet gitlabci poweron
> +trap "ssh control@thor.testnet poweroff; : > /tmp/console-stdin" EXIT
> +
> +if [ -n "$wait_and_wakeup" ]; then
> +    # wait for suspend or a timeout
> +    timeout=120
> +    until grep "$wait_and_wakeup" smoke.serial || [ $timeout -le 0 ]; do
> +        sleep 1;
> +        : $((--timeout))
> +    done
> +    if [ $timeout -le 0 ]; then
> +        echo "ERROR: suspend timeout, aborting"
> +        exit 1
> +    fi
> +    # keep it suspended a bit, then wakeup
> +    sleep 30
> +    ssh control@thor.testnet wake
> +fi

I like this. It will shorten the duration of the test job. I am not
asking you to do anything, but I think it would be a good idea to have
all test scripts loop around like this if possible to poll for success
every second rather than wait a pre-determined amount of time, which is
usually several minutes longer than necessary.


> +# give the test time to run a bit, and then check
> +sleep 30
> +
> +(grep -q "^Welcome to Alpine Linux" smoke.serial && grep -q "${passed}" smoke.serial) || exit 1
> +exit 0
> -- 
> git-series 0.9.1
>

Marek Marczykowski-Górecki March 18, 2023, 12:53 a.m. UTC | #2

On Fri, Mar 17, 2023 at 04:10:22PM -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> > This is a first test using Qubes OS CI infra. The gitlab-runner has
> > access to ssh-based control interface (control@thor.testnet, ssh key
> > exposed to the test via ssh-agent) and pre-configured HTTP dir for boot
> > files (mapped under /scratch/gitlab-runner/tftp inside the container).
> > Details about the setup are described on
> > https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/
> > 
> > This test boots Xen, and try if S3 works. It runs on a ADL-based desktop
> > system. The test script is based on the Xilinx one.
> > 
> > The machine needs newer kernel than other x86 tests run, so use 6.1.x
> > kernel added in previous commit.
> > 
> > When building rootfs, use fakeroot to preserve device files when
> > repacking rootfs archives in a non-privileged container.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> 
> This is awesome!! Thanks Marek!
> 
> 
> > ---
> > I'm bad at naming things. Things that I could use naming suggestions:
> >  - test script (qubes-x86-64-suspend.sh) - this might be okay?
> >  - test template job name (.adl-x86-64)
> >  - test job name (adl-suspend-x86-64-gcc)
> >  - tag (qubes-hw2)
> 
> I think these names are OK. I would maybe rename the tag "qubes-hw2" to
> "qubes" because so far there is only one but I am fine with qubes-hw2
> also.

I have 10 of them (and growing), so I'd like to keep tag name at least
somehow referencing which runner it uses. For example, this one is
a desktop with Alder Lake, but some other tests I may want to use a laptop
with Tiger Lake, for example. The mapping name -> hw spec isn't publicly
written down (although our openQA publishes all kind of logs from them,
so it's possible to infer this info).

> > For context, our CI has several machines, named test-X or hwX, each
> > controlled with matching hal900X RPi (this is where gitlab-runner is).
> > This test uses only one specific hw, but I plan adding few others too.
> > ---
> >  automation/gitlab-ci/test.yaml             |  31 ++++-
> >  automation/scripts/qubes-x86-64-suspend.sh | 155 ++++++++++++++++++++++-
> >  2 files changed, 186 insertions(+)
> >  create mode 100755 automation/scripts/qubes-x86-64-suspend.sh
> > 
> > diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> > index 2e1a6886df7f..f5511dd6da70 100644
> > --- a/automation/gitlab-ci/test.yaml
> > +++ b/automation/gitlab-ci/test.yaml
> > @@ -15,6 +15,10 @@
> >  .arm32-test-needs: &arm32-test-needs
> >    - qemu-system-aarch64-6.0.0-arm32-export
> >  
> > +.x86-64-test-needs: &x86-64-test-needs
> > +  - alpine-3.12-rootfs-export
> > +  - kernel-6.1.19-export
> 
> As you know, the jobs starting with a "." are template jobs to avoid
> repeating the same things over and over. .x86-64-test-needs could be
> used in qemu-alpine-x86_64-gcc also.

Right, when switching all of them to 6.1 kernel, that makes sense.

> >  .qemu-arm64:
> >    extends: .test-jobs-common
> >    variables:
> > @@ -84,6 +88,25 @@
> >    tags:
> >      - xilinx
> >  
> > +.adl-x86-64:
> > +  extends: .test-jobs-common
> > +  variables:
> > +    # the test controller runs on RPi4
> > +    CONTAINER: alpine:3.12-arm64v8
> > +    LOGFILE: smoke-test.log
> > +  artifacts:
> > +    paths:
> > +      - smoke.serial
> > +      - '*.log'
> > +    when: always
> > +  only:
> > +    variables:
> > +      - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"
> 
> Let me know which trees should have QUBES_JOBS set to true (thus able to
> start Qubes jobs.) At a minimum, I think we would want
> https://gitlab.com/xen-project/xen to test "staging" and "master". I can
> set QUBES_JOBS to true to https://gitlab.com/xen-project/xen if you are
> OK with it.

Yes, that's perfectly okay. I'd like at least also staging-4.17, but
depending on push frequency other staging-* are probably fine too (I
very much doubt long queue would be an issue). Of course that assumes
those tests would be backported, which I'm not sure if is planned. I'm
also okay with allowing committers and/or other maintainers to use it on
demand, but preferably not all patchew branches.

BTW, if you trigger job manually via a web interface or API, you can
specify variables for just a single pipeline. And if you use that
feature, you can also make gitlab present you a bit more convenient
interface with variables listed already. See example here:
https://gitlab.com/QubesOS/qubes-continuous-integration/-/blob/main/.gitlab-ci.yml#L15-26
This could be useful to start only a subset of tests.

> > +  before_script:
> > +    - apk --no-cache add openssh-client fakeroot
> 
> This would work but we typically try to avoid installing more packages
> in the test jobs for speed and efficiency. 

Yes, I was being lazy and wanted to avoid rebuilding container (and
changing from where it's pulled) when developing the test. Sure, will
fold that into the dockerfile.

> Instead, add those packages
> directly to the build container dockerfile
> (automation/build/alpine/3.12-arm64v8.dockerfile), e.g.:
> 
> diff --git a/automation/build/alpine/3.12-arm64v8.dockerfile b/automation/build/alpine/3.12-arm64v8.dockerfile
> index 180c978964..3f1e6a3fc6 100644
> --- a/automation/build/alpine/3.12-arm64v8.dockerfile
> +++ b/automation/build/alpine/3.12-arm64v8.dockerfile
> @@ -41,3 +41,6 @@ RUN apk --no-cache add \
>    libattr \
>    libcap-ng-dev \
>    pixman-dev \
> +  # qubes test deps
> +  openssh-client \
> +  fakeroot \
> 
> But I am not sure why you need fakeroot, more questions below about it
> 
> 
> > +  tags:
> > +    - qubes-hw2
> > +
> >  # Test jobs
> >  build-each-commit-gcc:
> >    extends: .test-jobs-common
> > @@ -110,6 +133,14 @@ xilinx-smoke-dom0less-arm64-gcc:
> >      - *arm64-test-needs
> >      - alpine-3.12-gcc-arm64
> >  
> > +adl-suspend-x86-64-gcc:
> > +  extends: .adl-x86-64
> > +  script:
> > +    - ./automation/scripts/qubes-x86-64-suspend.sh s3 2>&1 | tee ${LOGFILE}
> > +  needs:
> > +    - *x86-64-test-needs
> > +    - alpine-3.12-gcc
> 
> This up to you, but qubes-x86-64-suspend.sh is very cool and
> sophisticated and I would use it to create at least 2 jobs:
> - a plain dom0 + domU boot (no suspend)
> - a suspend/resume job

Indeed both tests are implemented there already, just a matter of
creating another job with a different parameter.

But then, the script name isn't really accurate. qubes-x86-64.sh then?
FWIW, next tests I'm planning to contribute:
 - PCI passthrough to HVM with qemu in dom0
 - linux stubdomain
 - PCI passthrough with linux stubdomain

And possibly some of the above exercising other systems, maybe something
on AMD (currently only one runner is on AMD, Ryzen 5 4500U specifically,
and is rather busy because of that, but we can try).

I'm not sure yet if putting them all in a single script is a smart idea.
On the other hand, quite a bit of boilerplate would be duplicated
otherwise.

> I am happy either way
> 
> 
> >  qemu-smoke-dom0-arm64-gcc:
> >    extends: .qemu-arm64
> >    script:
> > diff --git a/automation/scripts/qubes-x86-64-suspend.sh b/automation/scripts/qubes-x86-64-suspend.sh
> > new file mode 100755
> > index 000000000000..fc479c9faaec
> > --- /dev/null
> > +++ b/automation/scripts/qubes-x86-64-suspend.sh
> > @@ -0,0 +1,155 @@
> > +#!/bin/sh
> > +
> > +set -ex
> > +
> > +test_variant=$1
> > +
> > +wait_and_wakeup=
> > +if [ -z "${test_variant}" ]; then
> > +    passed="ping test passed"
> > +    domU_check="
> > +until ifconfig eth0 192.168.0.2 &> /dev/null && ping -c 10 192.168.0.1; do
> > +    sleep 30
> > +done
> > +echo \"${passed}\"
> > +"
> > +elif [ "${test_variant}" = "s3" ]; then
> > +    passed="suspend test passed"
> > +    wait_and_wakeup="started, suspending"
> > +    domU_check="
> > +ifconfig eth0 192.168.0.2
> > +echo domU started
> > +"
> > +    dom0_check="
> > +until grep 'domU started' /var/log/xen/console/guest-domU.log; do
> > +    sleep 1
> > +done
> > +echo \"${wait_and_wakeup}\"
> > +set -x
> > +echo deep > /sys/power/mem_sleep
> > +echo mem > /sys/power/state
> > +# now wait for resume
> > +sleep 5
> > +# get domU console content into test log
> > +tail -n 100 /var/log/xen/console/guest-domU.log
> > +xl list
> > +xl dmesg | grep 'Finishing wakeup from ACPI S3 state' || exit 1
> > +# check if domU is still alive
> > +ping -c 10 192.168.0.2 || exit 1
> > +echo \"${passed}\"
> > +"
> > +fi
> 
> Very nice!
> 
> 
> > +# DomU
> > +mkdir -p rootfs
> > +cd rootfs
> > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> 
> I am a bit confused about it: are you sure you need fakeroot for this?
> This script is running inside a container as root already? Are you using
> Docker on the RPi4 to run this job?

This is running in a rootless podman container. But even with docker,
for device files to work (see commit message) it would need to run
privileged container, which I'd like to avoid.

> > +mkdir proc
> > +mkdir run
> > +mkdir srv
> > +mkdir sys
> > +rm var/run
> > +echo "#!/bin/sh
> > +
> > +${domU_check}
> > +/bin/sh" > etc/local.d/xen.start
> > +chmod +x etc/local.d/xen.start
> > +echo "rc_verbose=yes" >> etc/rc.conf
> > +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > ../binaries/domU-rootfs.cpio.gz
> 
> same here

Same answer :)

Specifically it's required to preserve dev/null, dev/console etc.

> > +cd ..
> > +rm -rf rootfs
> > +
> > +# DOM0 rootfs
> > +mkdir -p rootfs
> > +cd rootfs
> > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> > +mkdir boot
> > +mkdir proc
> > +mkdir run
> > +mkdir srv
> > +mkdir sys
> > +rm var/run
> > +cp -ar ../binaries/dist/install/* .
> > +
> > +echo "#!/bin/bash
> > +
> > +export LD_LIBRARY_PATH=/usr/local/lib
> > +bash /etc/init.d/xencommons start
> > +
> > +brctl addbr xenbr0
> > +brctl addif xenbr0 eth0
> > +ifconfig eth0 up
> > +ifconfig xenbr0 up
> > +ifconfig xenbr0 192.168.0.1
> > +
> > +xl create /etc/xen/domU.cfg
> > +${dom0_check}
> > +" > etc/local.d/xen.start
> > +chmod +x etc/local.d/xen.start
> > +# just PVH for now
> > +echo '
> > +type = "pvh"
> > +name = "domU"
> > +kernel = "/boot/vmlinuz"
> > +ramdisk = "/boot/initrd-domU"
> > +extra = "root=/dev/ram0 console=hvc0"
> > +memory = 512
> > +vif = [ "bridge=xenbr0", ]
> > +disk = [ ]
> > +' > etc/xen/domU.cfg
> > +
> > +echo "rc_verbose=yes" >> etc/rc.conf
> > +echo "XENCONSOLED_TRACE=all" >> etc/default/xencommons
> > +echo "QEMU_XEN=/bin/false" >> etc/default/xencommons
> > +mkdir -p var/log/xen/console
> > +cp ../binaries/bzImage boot/vmlinuz
> > +cp ../binaries/domU-rootfs.cpio.gz boot/initrd-domU
> > +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > ../binaries/dom0-rootfs.cpio.gz
> > +cd ..
> > +
> > +
> > +TFTP=/scratch/gitlab-runner/tftp
> > +
> > +echo '
> > +multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all
> > +module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
> > +module2 (http)/gitlab-ci/initrd-dom0
> > +' > $TFTP/grub.cfg
> > +
> > +cp -f binaries/xen $TFTP/xen
> > +cp -f binaries/bzImage $TFTP/vmlinuz
> > +cp -f binaries/dom0-rootfs.cpio.gz $TFTP/initrd-dom0
> > +
> > +# start logging the serial; this gives interactive console, don't close its
> > +# stdin to not close it; the 'cat' is important, plain redirection would hang
> > +# until somebody opens the pipe; opening and closing the pipe is used to close
> > +# the console
> > +mkfifo /tmp/console-stdin
> > +cat /tmp/console-stdin |\
> > +ssh control@thor.testnet console | tee smoke.serial &
> 
> For clarity I would add a variable just below TFTP above to store the
> controller node. This will make it easier to change in the future and
> also makes it easier to see that it is runner-specific detail, e.g.:
> 
> TFTP=/scratch/gitlab-runner/tftp
> CONTROLLER=control@thor.testnet
> 
> then here:
> 
> ssh $CONTROLLER console | tee smoke.serial &

Ok.

> > +# start the system pointing at gitlab-ci predefined config
> > +ssh control@thor.testnet gitlabci poweron
> > +trap "ssh control@thor.testnet poweroff; : > /tmp/console-stdin" EXIT
> > +
> > +if [ -n "$wait_and_wakeup" ]; then
> > +    # wait for suspend or a timeout
> > +    timeout=120
> > +    until grep "$wait_and_wakeup" smoke.serial || [ $timeout -le 0 ]; do
> > +        sleep 1;
> > +        : $((--timeout))
> > +    done
> > +    if [ $timeout -le 0 ]; then
> > +        echo "ERROR: suspend timeout, aborting"
> > +        exit 1
> > +    fi
> > +    # keep it suspended a bit, then wakeup
> > +    sleep 30
> > +    ssh control@thor.testnet wake
> > +fi
> 
> I like this. It will shorten the duration of the test job. I am not
> asking you to do anything, but I think it would be a good idea to have
> all test scripts loop around like this if possible to poll for success
> every second rather than wait a pre-determined amount of time, which is
> usually several minutes longer than necessary.
> 
> 
> > +# give the test time to run a bit, and then check
> > +sleep 30
> > +
> > +(grep -q "^Welcome to Alpine Linux" smoke.serial && grep -q "${passed}" smoke.serial) || exit 1
> > +exit 0
> > -- 
> > git-series 0.9.1
> >

Stefano Stabellini March 20, 2023, 8:08 p.m. UTC | #3

On Sat, 18 Mar 2023, Marek Marczykowski-Górecki wrote:
> On Fri, Mar 17, 2023 at 04:10:22PM -0700, Stefano Stabellini wrote:
> > On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > This is a first test using Qubes OS CI infra. The gitlab-runner has
> > > access to ssh-based control interface (control@thor.testnet, ssh key
> > > exposed to the test via ssh-agent) and pre-configured HTTP dir for boot
> > > files (mapped under /scratch/gitlab-runner/tftp inside the container).
> > > Details about the setup are described on
> > > https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/
> > > 
> > > This test boots Xen, and try if S3 works. It runs on a ADL-based desktop
> > > system. The test script is based on the Xilinx one.
> > > 
> > > The machine needs newer kernel than other x86 tests run, so use 6.1.x
> > > kernel added in previous commit.
> > > 
> > > When building rootfs, use fakeroot to preserve device files when
> > > repacking rootfs archives in a non-privileged container.
> > > 
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > 
> > This is awesome!! Thanks Marek!
> > 
> > 
> > > ---
> > > I'm bad at naming things. Things that I could use naming suggestions:
> > >  - test script (qubes-x86-64-suspend.sh) - this might be okay?
> > >  - test template job name (.adl-x86-64)
> > >  - test job name (adl-suspend-x86-64-gcc)
> > >  - tag (qubes-hw2)
> > 
> > I think these names are OK. I would maybe rename the tag "qubes-hw2" to
> > "qubes" because so far there is only one but I am fine with qubes-hw2
> > also.
> 
> I have 10 of them (and growing), so I'd like to keep tag name at least
> somehow referencing which runner it uses. For example, this one is
> a desktop with Alder Lake, but some other tests I may want to use a laptop
> with Tiger Lake, for example. The mapping name -> hw spec isn't publicly
> written down (although our openQA publishes all kind of logs from them,
> so it's possible to infer this info).

That's fine by me, use whatever naming scheme works for you :-)


> > > For context, our CI has several machines, named test-X or hwX, each
> > > controlled with matching hal900X RPi (this is where gitlab-runner is).
> > > This test uses only one specific hw, but I plan adding few others too.
> > > ---
> > >  automation/gitlab-ci/test.yaml             |  31 ++++-
> > >  automation/scripts/qubes-x86-64-suspend.sh | 155 ++++++++++++++++++++++-
> > >  2 files changed, 186 insertions(+)
> > >  create mode 100755 automation/scripts/qubes-x86-64-suspend.sh
> > > 
> > > diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> > > index 2e1a6886df7f..f5511dd6da70 100644
> > > --- a/automation/gitlab-ci/test.yaml
> > > +++ b/automation/gitlab-ci/test.yaml
> > > @@ -15,6 +15,10 @@
> > >  .arm32-test-needs: &arm32-test-needs
> > >    - qemu-system-aarch64-6.0.0-arm32-export
> > >  
> > > +.x86-64-test-needs: &x86-64-test-needs
> > > +  - alpine-3.12-rootfs-export
> > > +  - kernel-6.1.19-export
> > 
> > As you know, the jobs starting with a "." are template jobs to avoid
> > repeating the same things over and over. .x86-64-test-needs could be
> > used in qemu-alpine-x86_64-gcc also.
> 
> Right, when switching all of them to 6.1 kernel, that makes sense.
> 
> > >  .qemu-arm64:
> > >    extends: .test-jobs-common
> > >    variables:
> > > @@ -84,6 +88,25 @@
> > >    tags:
> > >      - xilinx
> > >  
> > > +.adl-x86-64:
> > > +  extends: .test-jobs-common
> > > +  variables:
> > > +    # the test controller runs on RPi4
> > > +    CONTAINER: alpine:3.12-arm64v8
> > > +    LOGFILE: smoke-test.log
> > > +  artifacts:
> > > +    paths:
> > > +      - smoke.serial
> > > +      - '*.log'
> > > +    when: always
> > > +  only:
> > > +    variables:
> > > +      - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"
> > 
> > Let me know which trees should have QUBES_JOBS set to true (thus able to
> > start Qubes jobs.) At a minimum, I think we would want
> > https://gitlab.com/xen-project/xen to test "staging" and "master". I can
> > set QUBES_JOBS to true to https://gitlab.com/xen-project/xen if you are
> > OK with it.
> 
> Yes, that's perfectly okay. I'd like at least also staging-4.17, but
> depending on push frequency other staging-* are probably fine too (I
> very much doubt long queue would be an issue). Of course that assumes
> those tests would be backported, which I'm not sure if is planned. I'm
> also okay with allowing committers and/or other maintainers to use it on
> demand, but preferably not all patchew branches.

Yes that makes sense. Let's start with
https://gitlab.com/xen-project/xen for now, then we'll figure out a way
to implement more precisely what you wrote above, which also matches the
requirements for the xilinx runner.


> BTW, if you trigger job manually via a web interface or API, you can
> specify variables for just a single pipeline. And if you use that
> feature, you can also make gitlab present you a bit more convenient
> interface with variables listed already. See example here:
> https://gitlab.com/QubesOS/qubes-continuous-integration/-/blob/main/.gitlab-ci.yml#L15-26
> This could be useful to start only a subset of tests.

Good to know!


> > > +  before_script:
> > > +    - apk --no-cache add openssh-client fakeroot
> > 
> > This would work but we typically try to avoid installing more packages
> > in the test jobs for speed and efficiency. 
> 
> Yes, I was being lazy and wanted to avoid rebuilding container (and
> changing from where it's pulled) when developing the test. Sure, will
> fold that into the dockerfile.

Thanks!


> > Instead, add those packages
> > directly to the build container dockerfile
> > (automation/build/alpine/3.12-arm64v8.dockerfile), e.g.:
> > 
> > diff --git a/automation/build/alpine/3.12-arm64v8.dockerfile b/automation/build/alpine/3.12-arm64v8.dockerfile
> > index 180c978964..3f1e6a3fc6 100644
> > --- a/automation/build/alpine/3.12-arm64v8.dockerfile
> > +++ b/automation/build/alpine/3.12-arm64v8.dockerfile
> > @@ -41,3 +41,6 @@ RUN apk --no-cache add \
> >    libattr \
> >    libcap-ng-dev \
> >    pixman-dev \
> > +  # qubes test deps
> > +  openssh-client \
> > +  fakeroot \
> > 
> > But I am not sure why you need fakeroot, more questions below about it
> > 
> > 
> > > +  tags:
> > > +    - qubes-hw2
> > > +
> > >  # Test jobs
> > >  build-each-commit-gcc:
> > >    extends: .test-jobs-common
> > > @@ -110,6 +133,14 @@ xilinx-smoke-dom0less-arm64-gcc:
> > >      - *arm64-test-needs
> > >      - alpine-3.12-gcc-arm64
> > >  
> > > +adl-suspend-x86-64-gcc:
> > > +  extends: .adl-x86-64
> > > +  script:
> > > +    - ./automation/scripts/qubes-x86-64-suspend.sh s3 2>&1 | tee ${LOGFILE}
> > > +  needs:
> > > +    - *x86-64-test-needs
> > > +    - alpine-3.12-gcc
> > 
> > This up to you, but qubes-x86-64-suspend.sh is very cool and
> > sophisticated and I would use it to create at least 2 jobs:
> > - a plain dom0 + domU boot (no suspend)
> > - a suspend/resume job
> 
> Indeed both tests are implemented there already, just a matter of
> creating another job with a different parameter.
> 
> But then, the script name isn't really accurate. qubes-x86-64.sh then?

Yes good idea to rename the script


> FWIW, next tests I'm planning to contribute:
>  - PCI passthrough to HVM with qemu in dom0
>  - linux stubdomain
>  - PCI passthrough with linux stubdomain
> 
> And possibly some of the above exercising other systems, maybe something
> on AMD (currently only one runner is on AMD, Ryzen 5 4500U specifically,
> and is rather busy because of that, but we can try).

That's very cool! I am looking forward to it.


> I'm not sure yet if putting them all in a single script is a smart idea.
> On the other hand, quite a bit of boilerplate would be duplicated
> otherwise.

There is no hard rule on this. I'd say try to put them on a single
script because it tends to help with maintainability. But if one of the
tests is sufficiently different and would make things complicated is
also OK to move it to a new script.


> > I am happy either way
> > 
> > 
> > >  qemu-smoke-dom0-arm64-gcc:
> > >    extends: .qemu-arm64
> > >    script:
> > > diff --git a/automation/scripts/qubes-x86-64-suspend.sh b/automation/scripts/qubes-x86-64-suspend.sh
> > > new file mode 100755
> > > index 000000000000..fc479c9faaec
> > > --- /dev/null
> > > +++ b/automation/scripts/qubes-x86-64-suspend.sh
> > > @@ -0,0 +1,155 @@
> > > +#!/bin/sh
> > > +
> > > +set -ex
> > > +
> > > +test_variant=$1
> > > +
> > > +wait_and_wakeup=
> > > +if [ -z "${test_variant}" ]; then
> > > +    passed="ping test passed"
> > > +    domU_check="
> > > +until ifconfig eth0 192.168.0.2 &> /dev/null && ping -c 10 192.168.0.1; do
> > > +    sleep 30
> > > +done
> > > +echo \"${passed}\"
> > > +"
> > > +elif [ "${test_variant}" = "s3" ]; then
> > > +    passed="suspend test passed"
> > > +    wait_and_wakeup="started, suspending"
> > > +    domU_check="
> > > +ifconfig eth0 192.168.0.2
> > > +echo domU started
> > > +"
> > > +    dom0_check="
> > > +until grep 'domU started' /var/log/xen/console/guest-domU.log; do
> > > +    sleep 1
> > > +done
> > > +echo \"${wait_and_wakeup}\"
> > > +set -x
> > > +echo deep > /sys/power/mem_sleep
> > > +echo mem > /sys/power/state
> > > +# now wait for resume
> > > +sleep 5
> > > +# get domU console content into test log
> > > +tail -n 100 /var/log/xen/console/guest-domU.log
> > > +xl list
> > > +xl dmesg | grep 'Finishing wakeup from ACPI S3 state' || exit 1
> > > +# check if domU is still alive
> > > +ping -c 10 192.168.0.2 || exit 1
> > > +echo \"${passed}\"
> > > +"
> > > +fi
> > 
> > Very nice!
> > 
> > 
> > > +# DomU
> > > +mkdir -p rootfs
> > > +cd rootfs
> > > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> > 
> > I am a bit confused about it: are you sure you need fakeroot for this?
> > This script is running inside a container as root already? Are you using
> > Docker on the RPi4 to run this job?
> 
> This is running in a rootless podman container. But even with docker,
> for device files to work (see commit message) it would need to run
> privileged container, which I'd like to avoid.

Are you sure? I can run a non-privileged container with device assigned
just fine with Docker and
 
  devices = ["/dev/ttyUSB0:/dev/ttyUSB0"]

in the gitlab-runner config.toml.

Maybe it is just a matter of access permissions on your devices?


In any case, if your podman environment requires fakeroot, that is a
good enough reason for me. Just please add a note about it in the
commit message or as in-code comment in the script.


> > > +mkdir proc
> > > +mkdir run
> > > +mkdir srv
> > > +mkdir sys
> > > +rm var/run
> > > +echo "#!/bin/sh
> > > +
> > > +${domU_check}
> > > +/bin/sh" > etc/local.d/xen.start
> > > +chmod +x etc/local.d/xen.start
> > > +echo "rc_verbose=yes" >> etc/rc.conf
> > > +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > ../binaries/domU-rootfs.cpio.gz
> > 
> > same here
> 
> Same answer :)
> 
> Specifically it's required to preserve dev/null, dev/console etc.
> 
> > > +cd ..
> > > +rm -rf rootfs
> > > +
> > > +# DOM0 rootfs
> > > +mkdir -p rootfs
> > > +cd rootfs
> > > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> > > +mkdir boot
> > > +mkdir proc
> > > +mkdir run
> > > +mkdir srv
> > > +mkdir sys
> > > +rm var/run
> > > +cp -ar ../binaries/dist/install/* .
> > > +
> > > +echo "#!/bin/bash
> > > +
> > > +export LD_LIBRARY_PATH=/usr/local/lib
> > > +bash /etc/init.d/xencommons start
> > > +
> > > +brctl addbr xenbr0
> > > +brctl addif xenbr0 eth0
> > > +ifconfig eth0 up
> > > +ifconfig xenbr0 up
> > > +ifconfig xenbr0 192.168.0.1
> > > +
> > > +xl create /etc/xen/domU.cfg
> > > +${dom0_check}
> > > +" > etc/local.d/xen.start
> > > +chmod +x etc/local.d/xen.start
> > > +# just PVH for now
> > > +echo '
> > > +type = "pvh"
> > > +name = "domU"
> > > +kernel = "/boot/vmlinuz"
> > > +ramdisk = "/boot/initrd-domU"
> > > +extra = "root=/dev/ram0 console=hvc0"
> > > +memory = 512
> > > +vif = [ "bridge=xenbr0", ]
> > > +disk = [ ]
> > > +' > etc/xen/domU.cfg
> > > +
> > > +echo "rc_verbose=yes" >> etc/rc.conf
> > > +echo "XENCONSOLED_TRACE=all" >> etc/default/xencommons
> > > +echo "QEMU_XEN=/bin/false" >> etc/default/xencommons
> > > +mkdir -p var/log/xen/console
> > > +cp ../binaries/bzImage boot/vmlinuz
> > > +cp ../binaries/domU-rootfs.cpio.gz boot/initrd-domU
> > > +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > ../binaries/dom0-rootfs.cpio.gz
> > > +cd ..
> > > +
> > > +
> > > +TFTP=/scratch/gitlab-runner/tftp
> > > +
> > > +echo '
> > > +multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all
> > > +module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
> > > +module2 (http)/gitlab-ci/initrd-dom0
> > > +' > $TFTP/grub.cfg
> > > +
> > > +cp -f binaries/xen $TFTP/xen
> > > +cp -f binaries/bzImage $TFTP/vmlinuz
> > > +cp -f binaries/dom0-rootfs.cpio.gz $TFTP/initrd-dom0
> > > +
> > > +# start logging the serial; this gives interactive console, don't close its
> > > +# stdin to not close it; the 'cat' is important, plain redirection would hang
> > > +# until somebody opens the pipe; opening and closing the pipe is used to close
> > > +# the console
> > > +mkfifo /tmp/console-stdin
> > > +cat /tmp/console-stdin |\
> > > +ssh control@thor.testnet console | tee smoke.serial &
> > 
> > For clarity I would add a variable just below TFTP above to store the
> > controller node. This will make it easier to change in the future and
> > also makes it easier to see that it is runner-specific detail, e.g.:
> > 
> > TFTP=/scratch/gitlab-runner/tftp
> > CONTROLLER=control@thor.testnet
> > 
> > then here:
> > 
> > ssh $CONTROLLER console | tee smoke.serial &
> 
> Ok.
> 
> > > +# start the system pointing at gitlab-ci predefined config
> > > +ssh control@thor.testnet gitlabci poweron
> > > +trap "ssh control@thor.testnet poweroff; : > /tmp/console-stdin" EXIT
> > > +
> > > +if [ -n "$wait_and_wakeup" ]; then
> > > +    # wait for suspend or a timeout
> > > +    timeout=120
> > > +    until grep "$wait_and_wakeup" smoke.serial || [ $timeout -le 0 ]; do
> > > +        sleep 1;
> > > +        : $((--timeout))
> > > +    done
> > > +    if [ $timeout -le 0 ]; then
> > > +        echo "ERROR: suspend timeout, aborting"
> > > +        exit 1
> > > +    fi
> > > +    # keep it suspended a bit, then wakeup
> > > +    sleep 30
> > > +    ssh control@thor.testnet wake
> > > +fi
> > 
> > I like this. It will shorten the duration of the test job. I am not
> > asking you to do anything, but I think it would be a good idea to have
> > all test scripts loop around like this if possible to poll for success
> > every second rather than wait a pre-determined amount of time, which is
> > usually several minutes longer than necessary.
> > 
> > 
> > > +# give the test time to run a bit, and then check
> > > +sleep 30
> > > +
> > > +(grep -q "^Welcome to Alpine Linux" smoke.serial && grep -q "${passed}" smoke.serial) || exit 1
> > > +exit 0
> > > -- 
> > > git-series 0.9.1
> > > 
> 
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
>

Marek Marczykowski-Górecki March 20, 2023, 8:36 p.m. UTC | #4

On Mon, Mar 20, 2023 at 01:08:42PM -0700, Stefano Stabellini wrote:
> On Sat, 18 Mar 2023, Marek Marczykowski-Górecki wrote:
> > On Fri, Mar 17, 2023 at 04:10:22PM -0700, Stefano Stabellini wrote:
> > > On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> > > 
> > > I am a bit confused about it: are you sure you need fakeroot for this?
> > > This script is running inside a container as root already? Are you using
> > > Docker on the RPi4 to run this job?
> > 
> > This is running in a rootless podman container. But even with docker,
> > for device files to work (see commit message) it would need to run
> > privileged container, which I'd like to avoid.
> 
> Are you sure? I can run a non-privileged container with device assigned
> just fine with Docker and
>  
>   devices = ["/dev/ttyUSB0:/dev/ttyUSB0"]
> 
> in the gitlab-runner config.toml.

It isn't about accessing existing devices, it's about creating them
(unpacking rootfs which contains static /dev) and then packing it back
still having those devices.

Stefano Stabellini March 20, 2023, 8:41 p.m. UTC | #5

On Mon, 20 Mar 2023, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 20, 2023 at 01:08:42PM -0700, Stefano Stabellini wrote:
> > On Sat, 18 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > On Fri, Mar 17, 2023 at 04:10:22PM -0700, Stefano Stabellini wrote:
> > > > On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > > > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> > > > 
> > > > I am a bit confused about it: are you sure you need fakeroot for this?
> > > > This script is running inside a container as root already? Are you using
> > > > Docker on the RPi4 to run this job?
> > > 
> > > This is running in a rootless podman container. But even with docker,
> > > for device files to work (see commit message) it would need to run
> > > privileged container, which I'd like to avoid.
> > 
> > Are you sure? I can run a non-privileged container with device assigned
> > just fine with Docker and
> >  
> >   devices = ["/dev/ttyUSB0:/dev/ttyUSB0"]
> > 
> > in the gitlab-runner config.toml.
> 
> It isn't about accessing existing devices, it's about creating them
> (unpacking rootfs which contains static /dev) and then packing it back
> still having those devices.

OK for that definitely you don't need a privileged container. A regular
container comes with "root" by default, but without all the privileges
that "root" normally allows outside of a container. That is enough (at
least in my environments) to pack/unpack a rootfs successfully without
fakeroot. Maybe this is a podman-specific limitation?

If you are curious to try, you should be able to run a simple
pack/unpack rootfs with Docker (of course without --privileged) without
issues.

Marek Marczykowski-Górecki March 20, 2023, 8:55 p.m. UTC | #6

On Mon, Mar 20, 2023 at 01:41:45PM -0700, Stefano Stabellini wrote:
> On Mon, 20 Mar 2023, Marek Marczykowski-Górecki wrote:
> > On Mon, Mar 20, 2023 at 01:08:42PM -0700, Stefano Stabellini wrote:
> > > On Sat, 18 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > > On Fri, Mar 17, 2023 at 04:10:22PM -0700, Stefano Stabellini wrote:
> > > > > On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > > > > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> > > > > 
> > > > > I am a bit confused about it: are you sure you need fakeroot for this?
> > > > > This script is running inside a container as root already? Are you using
> > > > > Docker on the RPi4 to run this job?
> > > > 
> > > > This is running in a rootless podman container. But even with docker,
> > > > for device files to work (see commit message) it would need to run
> > > > privileged container, which I'd like to avoid.
> > > 
> > > Are you sure? I can run a non-privileged container with device assigned
> > > just fine with Docker and
> > >  
> > >   devices = ["/dev/ttyUSB0:/dev/ttyUSB0"]
> > > 
> > > in the gitlab-runner config.toml.
> > 
> > It isn't about accessing existing devices, it's about creating them
> > (unpacking rootfs which contains static /dev) and then packing it back
> > still having those devices.
> 
> OK for that definitely you don't need a privileged container. A regular
> container comes with "root" by default, but without all the privileges
> that "root" normally allows outside of a container. That is enough (at
> least in my environments) to pack/unpack a rootfs successfully without
> fakeroot. Maybe this is a podman-specific limitation?

It seems so, as rootless podman isn't running commands as root on the
"host", but instead in a separate user namespace.

> If you are curious to try, you should be able to run a simple
> pack/unpack rootfs with Docker (of course without --privileged) without
> issues.

In fact, the same issue happens in docker, if I enable this extra
protection there[1].

[1] https://docs.docker.com/engine/security/userns-remap/

Stefano Stabellini March 20, 2023, 9:40 p.m. UTC | #7

On Mon, 20 Mar 2023, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 20, 2023 at 01:41:45PM -0700, Stefano Stabellini wrote:
> > On Mon, 20 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > On Mon, Mar 20, 2023 at 01:08:42PM -0700, Stefano Stabellini wrote:
> > > > On Sat, 18 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > > > On Fri, Mar 17, 2023 at 04:10:22PM -0700, Stefano Stabellini wrote:
> > > > > > On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > > > > > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> > > > > > 
> > > > > > I am a bit confused about it: are you sure you need fakeroot for this?
> > > > > > This script is running inside a container as root already? Are you using
> > > > > > Docker on the RPi4 to run this job?
> > > > > 
> > > > > This is running in a rootless podman container. But even with docker,
> > > > > for device files to work (see commit message) it would need to run
> > > > > privileged container, which I'd like to avoid.
> > > > 
> > > > Are you sure? I can run a non-privileged container with device assigned
> > > > just fine with Docker and
> > > >  
> > > >   devices = ["/dev/ttyUSB0:/dev/ttyUSB0"]
> > > > 
> > > > in the gitlab-runner config.toml.
> > > 
> > > It isn't about accessing existing devices, it's about creating them
> > > (unpacking rootfs which contains static /dev) and then packing it back
> > > still having those devices.
> > 
> > OK for that definitely you don't need a privileged container. A regular
> > container comes with "root" by default, but without all the privileges
> > that "root" normally allows outside of a container. That is enough (at
> > least in my environments) to pack/unpack a rootfs successfully without
> > fakeroot. Maybe this is a podman-specific limitation?
> 
> It seems so, as rootless podman isn't running commands as root on the
> "host", but instead in a separate user namespace.
> 
> > If you are curious to try, you should be able to run a simple
> > pack/unpack rootfs with Docker (of course without --privileged) without
> > issues.
> 
> In fact, the same issue happens in docker, if I enable this extra
> protection there[1].
> 
> [1] https://docs.docker.com/engine/security/userns-remap/
 
Useful to know! Please add all the info to the commit message or
a in-code comment. I am fine with using fakeroot for this (and in fact we
might want to add it to the other scripts as well at some point, given
that it shouldn't hurt in the "root" case either).

[2/2] automation: add a suspend test on an Alder Lake system

Commit Message

Comments

Patch