diff mbox

[for-2.13] pc-bios/s390-ccw: Increase virtio timeout to 30 seconds

Message ID 1522316251-16399-1-git-send-email-thuth@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Thomas Huth March 29, 2018, 9:37 a.m. UTC
The current timeout is set to only three seconds - and considering that
vring_wait_reply() or rather get_second() is not doing any rounding,
the real timeout is likely rather 2 seconds in most cases. When the
host is really badly loaded and we run the guest in TCG mode instead
of KVM, it's possible that we hit this timeout by mistake. So let's
increase the timeout to 30 seconds instead to ease this situation (30
seconds is also the timeout that is used by the Linux SCSI subsystem
for example, so this seems to be a sane value for block IO timeout).

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/virtio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christian Borntraeger March 29, 2018, 10:03 a.m. UTC | #1
On 03/29/2018 11:37 AM, Thomas Huth wrote:
> The current timeout is set to only three seconds - and considering that
> vring_wait_reply() or rather get_second() is not doing any rounding,
> the real timeout is likely rather 2 seconds in most cases. When the
> host is really badly loaded and we run the guest in TCG mode instead
> of KVM, it's possible that we hit this timeout by mistake. So let's
> increase the timeout to 30 seconds instead to ease this situation (30
> seconds is also the timeout that is used by the Linux SCSI subsystem
> for example, so this seems to be a sane value for block IO timeout).

I have never seen this, but cant this also with KVM (e.g. host paging
and the swap disk is busy for some seconds).

Wouldnt it also qualify for 2.12?

> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  pc-bios/s390-ccw/virtio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> index 817e7f5..cdb66f4 100644
> --- a/pc-bios/s390-ccw/virtio.c
> +++ b/pc-bios/s390-ccw/virtio.c
> @@ -14,7 +14,7 @@
>  #include "virtio-scsi.h"
>  #include "bswap.h"
> 
> -#define VRING_WAIT_REPLY_TIMEOUT 3
> +#define VRING_WAIT_REPLY_TIMEOUT 30
> 
>  static VRing block[VIRTIO_MAX_VQS];
>  static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS]
>
Cornelia Huck March 29, 2018, 11:02 a.m. UTC | #2
On Thu, 29 Mar 2018 12:03:42 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 03/29/2018 11:37 AM, Thomas Huth wrote:
> > The current timeout is set to only three seconds - and considering that
> > vring_wait_reply() or rather get_second() is not doing any rounding,
> > the real timeout is likely rather 2 seconds in most cases. When the
> > host is really badly loaded and we run the guest in TCG mode instead
> > of KVM, it's possible that we hit this timeout by mistake. So let's
> > increase the timeout to 30 seconds instead to ease this situation (30
> > seconds is also the timeout that is used by the Linux SCSI subsystem
> > for example, so this seems to be a sane value for block IO timeout).  
> 
> I have never seen this, but cant this also with KVM (e.g. host paging
> and the swap disk is busy for some seconds).

It's probably less likely, but I don't see why not. 30s sounds like a
more forgiving value.

> 
> Wouldnt it also qualify for 2.12?

I can queue it (and a bios rebuild) to s390-fixes, it's certainly
reasonable.

Else,

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

> 
> > 
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079
> > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > ---
> >  pc-bios/s390-ccw/virtio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> > index 817e7f5..cdb66f4 100644
> > --- a/pc-bios/s390-ccw/virtio.c
> > +++ b/pc-bios/s390-ccw/virtio.c
> > @@ -14,7 +14,7 @@
> >  #include "virtio-scsi.h"
> >  #include "bswap.h"
> > 
> > -#define VRING_WAIT_REPLY_TIMEOUT 3
> > +#define VRING_WAIT_REPLY_TIMEOUT 30
> > 
> >  static VRing block[VIRTIO_MAX_VQS];
> >  static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS]
> >   
>
Thomas Huth March 29, 2018, 11:32 a.m. UTC | #3
On 29.03.2018 12:03, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 11:37 AM, Thomas Huth wrote:
>> The current timeout is set to only three seconds - and considering that
>> vring_wait_reply() or rather get_second() is not doing any rounding,
>> the real timeout is likely rather 2 seconds in most cases. When the
>> host is really badly loaded and we run the guest in TCG mode instead
>> of KVM, it's possible that we hit this timeout by mistake. So let's
>> increase the timeout to 30 seconds instead to ease this situation (30
>> seconds is also the timeout that is used by the Linux SCSI subsystem
>> for example, so this seems to be a sane value for block IO timeout).
> 
> I have never seen this, but cant this also with KVM (e.g. host paging
> and the swap disk is busy for some seconds).

Sure, I even assume that Lukáš might have hit this with KVM, and not
with TCG. It's just a little bit more unlikely there.

> Wouldnt it also qualify for 2.12?

Yes, I was just not sure whether we want to rebuild the s390-ccw.img
just because of this ... I'd like to leave that decision to you and
Cornelia.

 Thomas
Christian Borntraeger March 29, 2018, 11:46 a.m. UTC | #4
On 03/29/2018 11:37 AM, Thomas Huth wrote:
> The current timeout is set to only three seconds - and considering that
> vring_wait_reply() or rather get_second() is not doing any rounding,
> the real timeout is likely rather 2 seconds in most cases. When the
> host is really badly loaded and we run the guest in TCG mode instead
> of KVM, it's possible that we hit this timeout by mistake. So let's
> increase the timeout to 30 seconds instead to ease this situation (30
> seconds is also the timeout that is used by the Linux SCSI subsystem
> for example, so this seems to be a sane value for block IO timeout).
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079
> Signed-off-by: Thomas Huth <thuth@redhat.com>

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>

I think this should go into 2.12 with a rebuild


> ---
>  pc-bios/s390-ccw/virtio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> index 817e7f5..cdb66f4 100644
> --- a/pc-bios/s390-ccw/virtio.c
> +++ b/pc-bios/s390-ccw/virtio.c
> @@ -14,7 +14,7 @@
>  #include "virtio-scsi.h"
>  #include "bswap.h"
> 
> -#define VRING_WAIT_REPLY_TIMEOUT 3
> +#define VRING_WAIT_REPLY_TIMEOUT 30
> 
>  static VRing block[VIRTIO_MAX_VQS];
>  static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS]
>
Cornelia Huck March 29, 2018, 1:15 p.m. UTC | #5
On Thu, 29 Mar 2018 11:37:31 +0200
Thomas Huth <thuth@redhat.com> wrote:

> The current timeout is set to only three seconds - and considering that
> vring_wait_reply() or rather get_second() is not doing any rounding,
> the real timeout is likely rather 2 seconds in most cases. When the
> host is really badly loaded and we run the guest in TCG mode instead
> of KVM, it's possible that we hit this timeout by mistake. 

Let's tweak this sentence to

"When the host is really badly loaded, it's possible that we hit this
timeout by mistake; it's even more likely if we run the guest in TCG
mode instead of KVM."

?

> So let's
> increase the timeout to 30 seconds instead to ease this situation (30
> seconds is also the timeout that is used by the Linux SCSI subsystem
> for example, so this seems to be a sane value for block IO timeout).
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  pc-bios/s390-ccw/virtio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
Thomas Huth March 29, 2018, 1:23 p.m. UTC | #6
On 29.03.2018 15:15, Cornelia Huck wrote:
> On Thu, 29 Mar 2018 11:37:31 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
>> The current timeout is set to only three seconds - and considering that
>> vring_wait_reply() or rather get_second() is not doing any rounding,
>> the real timeout is likely rather 2 seconds in most cases. When the
>> host is really badly loaded and we run the guest in TCG mode instead
>> of KVM, it's possible that we hit this timeout by mistake. 
> 
> Let's tweak this sentence to
> 
> "When the host is really badly loaded, it's possible that we hit this
> timeout by mistake; it's even more likely if we run the guest in TCG
> mode instead of KVM."
> 
> ?

Fine for me. You could even remove that TCG vs. KVM part if you like.

 Thomas
Cornelia Huck March 29, 2018, 2:34 p.m. UTC | #7
On Thu, 29 Mar 2018 11:37:31 +0200
Thomas Huth <thuth@redhat.com> wrote:

> The current timeout is set to only three seconds - and considering that
> vring_wait_reply() or rather get_second() is not doing any rounding,
> the real timeout is likely rather 2 seconds in most cases. When the
> host is really badly loaded and we run the guest in TCG mode instead
> of KVM, it's possible that we hit this timeout by mistake. So let's
> increase the timeout to 30 seconds instead to ease this situation (30
> seconds is also the timeout that is used by the Linux SCSI subsystem
> for example, so this seems to be a sane value for block IO timeout).
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  pc-bios/s390-ccw/virtio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> index 817e7f5..cdb66f4 100644
> --- a/pc-bios/s390-ccw/virtio.c
> +++ b/pc-bios/s390-ccw/virtio.c
> @@ -14,7 +14,7 @@
>  #include "virtio-scsi.h"
>  #include "bswap.h"
>  
> -#define VRING_WAIT_REPLY_TIMEOUT 3
> +#define VRING_WAIT_REPLY_TIMEOUT 30
>  
>  static VRing block[VIRTIO_MAX_VQS];
>  static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS]

Thanks, merged this and a rebuild into s390-fixes.
no-reply@patchew.org March 31, 2018, 6:28 a.m. UTC | #8
Hi,

This series failed docker-quick@centos6 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 1522316251-16399-1-git-send-email-thuth@redhat.com
Subject: [Qemu-devel] [PATCH for-2.13] pc-bios/s390-ccw: Increase virtio timeout to 30 seconds

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
eab5813eba pc-bios/s390-ccw: Increase virtio timeout to 30 seconds

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-btm39b6h/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-btm39b6h/src'
  GEN     /var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar.vroot'...
done.
Checking out files:  41% (2535/6066)   
Checking out files:  42% (2548/6066)   
Checking out files:  43% (2609/6066)   
Checking out files:  44% (2670/6066)   
Checking out files:  45% (2730/6066)   
Checking out files:  46% (2791/6066)   
Checking out files:  47% (2852/6066)   
Checking out files:  48% (2912/6066)   
Checking out files:  49% (2973/6066)   
Checking out files:  50% (3033/6066)   
Checking out files:  51% (3094/6066)   
Checking out files:  52% (3155/6066)   
Checking out files:  53% (3215/6066)   
Checking out files:  54% (3276/6066)   
Checking out files:  55% (3337/6066)   
Checking out files:  56% (3397/6066)   
Checking out files:  57% (3458/6066)   
Checking out files:  58% (3519/6066)   
Checking out files:  59% (3579/6066)   
Checking out files:  60% (3640/6066)   
Checking out files:  60% (3652/6066)   
Checking out files:  61% (3701/6066)   
Checking out files:  62% (3761/6066)   
Checking out files:  63% (3822/6066)   
Checking out files:  64% (3883/6066)   
Checking out files:  65% (3943/6066)   
Checking out files:  66% (4004/6066)   
Checking out files:  67% (4065/6066)   
Checking out files:  68% (4125/6066)   
Checking out files:  69% (4186/6066)   
Checking out files:  69% (4194/6066)   
Checking out files:  70% (4247/6066)   
Checking out files:  71% (4307/6066)   
Checking out files:  72% (4368/6066)   
Checking out files:  73% (4429/6066)   
Checking out files:  73% (4481/6066)   
Checking out files:  74% (4489/6066)   
Checking out files:  75% (4550/6066)   
Checking out files:  76% (4611/6066)   
Checking out files:  77% (4671/6066)   
Checking out files:  78% (4732/6066)   
Checking out files:  79% (4793/6066)   
Checking out files:  80% (4853/6066)   
Checking out files:  81% (4914/6066)   
Checking out files:  82% (4975/6066)   
Checking out files:  83% (5035/6066)   
Checking out files:  84% (5096/6066)   
Checking out files:  85% (5157/6066)   
Checking out files:  86% (5217/6066)   
Checking out files:  87% (5278/6066)   
Checking out files:  88% (5339/6066)   
Checking out files:  89% (5399/6066)   
Checking out files:  90% (5460/6066)   
Checking out files:  91% (5521/6066)   
Checking out files:  92% (5581/6066)   
Checking out files:  93% (5642/6066)   
Checking out files:  94% (5703/6066)   
Checking out files:  95% (5763/6066)   
Checking out files:  96% (5824/6066)   
Checking out files:  97% (5885/6066)   
Checking out files:  98% (5945/6066)   
Checking out files:  99% (6006/6066)   
Checking out files: 100% (6066/6066)   
Checking out files: 100% (6066/6066), done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
tar: /var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar: Wrote only 4096 of 10240 bytes
tar: Error is not recoverable: exiting now
failed to create tar file
  COPY    RUNNER
    RUN test-quick in qemu:centos6 
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
/var/tmp/qemu/run: line 32: prep_fail: command not found
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
bzip2-devel-1.0.5-7.el6_0.x86_64
ccache-3.1.6-2.el6.x86_64
csnappy-devel-0-6.20150729gitd7bc683.el6.x86_64
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
gettext-0.17-18.el6.x86_64
git-1.7.1-9.el6_9.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libepoxy-devel-1.2-3.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
librdmacm-devel-1.0.21-0.el6.x86_64
lzo-devel-2.03-3.1.el6_5.1.x86_64
make-3.81-23.el6.x86_64
mesa-libEGL-devel-11.0.7-4.el6.x86_64
mesa-libgbm-devel-11.0.7-4.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
spice-glib-devel-0.26-8.el6.x86_64
spice-server-devel-0.12.4-16.el6.x86_64
tar-1.23-15.el6_8.x86_64
vte-devel-0.25.1-9.el6.x86_64
xen-devel-4.6.6-2.el6.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=bison     bzip2-devel     ccache     csnappy-devel     flex     g++     gcc     gettext     git     glib2-devel     libepoxy-devel     libfdt-devel     librdmacm-devel     lzo-devel     make     mesa-libEGL-devel     mesa-libgbm-devel     pixman-devel     SDL-devel     spice-glib-devel     spice-server-devel     tar     vte-devel     xen-devel     zlib-devel
HOSTNAME=da49c0188004
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/tmp/qemu-test/install

ERROR: DTC (libfdt) version >= 1.4.2 not present.
       Please install the DTC (libfdt) devel package

Traceback (most recent call last):
  File "./tests/docker/docker.py", line 407, in <module>
    sys.exit(main())
  File "./tests/docker/docker.py", line 404, in main
    return args.cmdobj.run(args, argv)
  File "./tests/docker/docker.py", line 261, in run
    return Docker().run(argv, args.keep, quiet=args.quiet)
  File "./tests/docker/docker.py", line 229, in run
    quiet=quiet)
  File "./tests/docker/docker.py", line 147, in _do_check
    return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'run', '--label', 'com.qemu.instance.uuid=a9e9312034ac11e8bfdf52540069c830', '-u', '0', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/root/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655:/var/tmp/qemu:z,ro', 'qemu:centos6', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 1
make[1]: *** [tests/docker/Makefile.include:129: docker-run] Error 1
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-btm39b6h/src'
make: *** [tests/docker/Makefile.include:163: docker-run-test-quick@centos6] Error 2

real	0m52.122s
user	0m8.994s
sys	0m6.805s
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
diff mbox

Patch

diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
index 817e7f5..cdb66f4 100644
--- a/pc-bios/s390-ccw/virtio.c
+++ b/pc-bios/s390-ccw/virtio.c
@@ -14,7 +14,7 @@ 
 #include "virtio-scsi.h"
 #include "bswap.h"
 
-#define VRING_WAIT_REPLY_TIMEOUT 3
+#define VRING_WAIT_REPLY_TIMEOUT 30
 
 static VRing block[VIRTIO_MAX_VQS];
 static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS]