Message ID | 20240731132524.308273-1-philipp.reisner@linbit.com |
---|---|
State | New, archived |
Series | util: retry open() when it gets interrupted by a signal |
On 31.07.24 15:25, Philipp Reisner wrote:
> As with many syscalls, open() might be interrupted by a signal.
>
> The experienced logfile entry is:
>
> qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call
>
> Retry it until it is not interrupted by a signal.
> FYI, dd has the same kind of loop around open().
> https://github.com/coreutils/coreutils/blob/1ae98dbda7322427e8226356fd110d2553f5fac9/src/dd.c#L1294-L1299
>
> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
> ---
>  util/osdep.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/util/osdep.c b/util/osdep.c
> index 770369831b..a1269d9345 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -294,14 +294,17 @@ bool qemu_has_direct_io(void)
>  static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
>  {
>      int ret;
> +    do {
>  #ifdef O_CLOEXEC
> -    ret = open(name, flags | O_CLOEXEC, mode);
> +        ret = open(name, flags | O_CLOEXEC, mode);
>  #else
> -    ret = open(name, flags, mode);
> -    if (ret >= 0) {
> -        qemu_set_cloexec(ret);
> -    }
> +        ret = open(name, flags, mode);
> +        if (ret >= 0) {
> +            qemu_set_cloexec(ret);
> +        }
>  #endif
> +    } while (ret == -1 && errno == EINTR);
> +
>      return ret;
>  }
>

Reviewed-by: David Hildenbrand <david@redhat.com>

On Wed, Jul 31, 2024 at 03:25:24PM +0200, Philipp Reisner wrote:
> As with many syscalls, open() might be interrupted by a signal.
>
> The experienced logfile entry is:
>
> qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call
>
> Retry it until it is not interrupted by a signal.

As you say, many syscalls can be interrupted by signals, so
special casing open() isn't really a solution - it's just
addressing one specific instance you happened to see.

If there are certain signals that we don't want to have a
fatal interruption for, it'd be better to set SA_RESTART
with sigaction, which will auto-restart a large set of
syscalls, while allowing other signals to be fatal.

> FYI, dd has the same kind of loop around open().
> https://github.com/coreutils/coreutils/blob/1ae98dbda7322427e8226356fd110d2553f5fac9/src/dd.c#L1294-L1299
>
> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
> ---
>  util/osdep.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/util/osdep.c b/util/osdep.c
> index 770369831b..a1269d9345 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -294,14 +294,17 @@ bool qemu_has_direct_io(void)
>  static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
>  {
>      int ret;
> +    do {
>  #ifdef O_CLOEXEC
> -    ret = open(name, flags | O_CLOEXEC, mode);
> +        ret = open(name, flags | O_CLOEXEC, mode);
>  #else
> -    ret = open(name, flags, mode);
> -    if (ret >= 0) {
> -        qemu_set_cloexec(ret);
> -    }
> +        ret = open(name, flags, mode);
> +        if (ret >= 0) {
> +            qemu_set_cloexec(ret);
> +        }
>  #endif
> +    } while (ret == -1 && errno == EINTR);
> +
>      return ret;
>  }
>
> --
> 2.45.2
>

With regards,
Daniel

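Daniel's SA_RESTART suggestion can be illustrated with a small experiment. Reproducing a slow open() on a DRBD device is not practical here, so this sketch uses read() on an empty pipe as a stand-in for a slow, interruptible system call. That substitution is an assumption, though on Linux signal(7) lists both read() on "slow" devices and an open() that can block (e.g. of a FIFO) among the calls that SA_RESTART restarts. A one-shot SIGALRM arrives about 50 ms into the read, and its handler writes one byte into the pipe. The names `on_alarm` and `blocking_read_demo` are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Write end of the demo pipe; used by the signal handler. */
static volatile sig_atomic_t pipe_wr = -1;

/* SIGALRM handler: puts one byte into the pipe so that a restarted
 * read() has data to return. write(2) is async-signal-safe, and errno
 * is saved and restored so the interrupted code sees its own value. */
static void on_alarm(int sig)
{
    int saved_errno = errno;
    char c = 'x';
    ssize_t n = write(pipe_wr, &c, 1);
    (void)sig;
    (void)n;
    errno = saved_errno;
}

/* Blocks in read() on an empty pipe until SIGALRM fires ~50 ms later.
 * Returns read()'s result, or -errno if read() failed. */
static long blocking_read_demo(int use_sa_restart)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -errno;
    }
    pipe_wr = fds[1];

    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    sigaction(SIGALRM, &sa, &old);

    /* One-shot timer: it_interval = 0, it_value = 50 ms. */
    struct itimerval timer = { {0, 0}, {0, 50000} };
    setitimer(ITIMER_REAL, &timer, NULL);

    char buf;
    ssize_t r = read(fds[0], &buf, 1);
    long result = (r < 0) ? -errno : (long)r;

    sigaction(SIGALRM, &old, NULL);   /* restore previous disposition */
    close(fds[0]);
    close(fds[1]);
    pipe_wr = -1;
    return result;
}
```

Without SA_RESTART the interrupted read fails, so `blocking_read_demo(0)` returns `-EINTR`. With SA_RESTART the kernel restarts the read once the handler returns, and `blocking_read_demo(1)` returns 1 because the handler's byte is now waiting. On Linux, signal(7) also lists calls that are never restarted regardless of SA_RESTART, such as poll(), select() and the sleep functions. SA_RESTART therefore shrinks, but does not remove, the set of places that must handle EINTR.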
On Wed, 31 Jul 2024 at 15:11, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Wed, Jul 31, 2024 at 03:25:24PM +0200, Philipp Reisner wrote:
> > As with many syscalls, open() might be interrupted by a signal.
> >
> > The experienced logfile entry is:
> >
> > qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call
> >
> > Retry it until it is not interrupted by a signal.
>
> As you say, many syscalls can be interrupted by signals, so
> special casing open() isn't really a solution - it's just
> addressing one specific instance you happened to see.
>
> If there are certain signals that we don't want to have a
> fatal interruption for, it'd be better to set SA_RESTART
> with sigaction, which will auto-restart a large set of
> syscalls, while allowing other signals to be fatal.

This is why we have the RETRY_ON_EINTR() macro, right?

Currently we have some places that call qemu_open_old() inside
RETRY_ON_EINTR -- we should decide whether we want to
handle EINTR inside the qemu_open family of functions,
or make the caller deal with it, and put the macro uses
in the right place consistently.

I agree that it would be nicer if we could use SA_RESTART,
but presumably there's a reason why we don't. (At any
rate code that's shared with the user-mode emulation
has to be EINTR-resistant, because we can't force the
user-mode guest code to avoid registering signal handlers
that aren't SA_RESTART.)

thanks
-- PMM

On Wed, Jul 31, 2024 at 03:32:52PM +0100, Peter Maydell wrote:
> On Wed, 31 Jul 2024 at 15:11, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Wed, Jul 31, 2024 at 03:25:24PM +0200, Philipp Reisner wrote:
> > > As with many syscalls, open() might be interrupted by a signal.
> > >
> > > The experienced logfile entry is:
> > >
> > > qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call

What is the actual signal you are seeing that impacts QEMU
in this way?

> > > Retry it until it is not interrupted by a signal.
> >
> > As you say, many syscalls can be interrupted by signals, so
> > special casing open() isn't really a solution - it's just
> > addressing one specific instance you happened to see.
> >
> > If there are certain signals that we don't want to have a
> > fatal interruption for, it'd be better to set SA_RESTART
> > with sigaction, which will auto-restart a large set of
> > syscalls, while allowing other signals to be fatal.
>
> This is why we have the RETRY_ON_EINTR() macro, right?
>
> Currently we have some places that call qemu_open_old() inside
> RETRY_ON_EINTR -- we should decide whether we want to
> handle EINTR inside the qemu_open family of functions,
> or make the caller deal with it, and put the macro uses
> in the right place consistently.

It is incredibly arbitrary where we use RETRY_ON_EINTR, which I think
points towards it being a sub-optimal solution to the general problem.

>
> I agree that it would be nicer if we could use SA_RESTART,
> but presumably there's a reason why we don't. (At any
> rate code that's shared with the user-mode emulation
> has to be EINTR-resistant, because we can't force the
> user-mode guest code to avoid registering signal handlers
> that aren't SA_RESTART.)

For user mode emulation isn't it valid to just propagate the
EINTR back up to the application, since EINTR is a valid errno
they have to be willing to handle unless the app has itself
used SA_RESTART?

With regards,
Daniel

On Wed, 31 Jul 2024 at 16:21, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Wed, Jul 31, 2024 at 03:32:52PM +0100, Peter Maydell wrote:
> > This is why we have the RETRY_ON_EINTR() macro, right?
> >
> > Currently we have some places that call qemu_open_old() inside
> > RETRY_ON_EINTR -- we should decide whether we want to
> > handle EINTR inside the qemu_open family of functions,
> > or make the caller deal with it, and put the macro uses
> > in the right place consistently.
>
> It is incredibly arbitrary where we use RETRY_ON_EINTR, which I think
> points towards it being a sub-optimal solution to the general problem.

Agreed (and agreed that SA_RESTART is the usual approach to
avoid this mess). Partly I just vaguely recall discussions
about this back when we added/improved the RETRY_ON_EINTR
macro in the first place: maybe there's a reason we have it
still...

> > I agree that it would be nicer if we could use SA_RESTART,
> > but presumably there's a reason why we don't. (At any
> > rate code that's shared with the user-mode emulation
> > has to be EINTR-resistant, because we can't force the
> > user-mode guest code to avoid registering signal handlers
> > that aren't SA_RESTART.)
>
> For user mode emulation isn't it valid to just propagate the
> EINTR back up to the application, since EINTR is a valid errno
> they have to be willing to handle unless the app has itself
> used SA_RESTART?

Yes, that's what we must do for cases where we are doing some
syscall on behalf of the guest. But for cases where we're
doing a syscall because of something QEMU itself needs to do,
we may need to retry, because we might not be in a position
to be able to back out of what we're doing (or we might not
even be inside the "handle a guest syscall" codepath at all).

-- PMM

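Peter's distinction between the two kinds of syscalls in user-mode emulation can be sketched as two hypothetical helpers. Neither is QEMU code, and the names `guest_read` and `internal_read` are made up. The first performs a read on behalf of the guest and hands an interruption back as `-EINTR` (negative-errno convention). The second performs a read QEMU needs for itself and retries on EINTR, because it has no guest to report to and may not be able to back out. A real emulator would also have to translate host errno values into the guest ABI's values, which this sketch omits.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: read() issued on behalf of the emulated guest.
 * EINTR is not retried; it is returned as -EINTR so the guest program
 * sees the interruption as it would when running natively. */
static long guest_read(int fd, void *buf, size_t len)
{
    ssize_t r = read(fd, buf, len);
    return r < 0 ? -(long)errno : (long)r;
}

/* Hypothetical: read() that the emulator needs for its own purposes.
 * There is no guest to hand EINTR back to, so the call is retried. */
static long internal_read(int fd, void *buf, size_t len)
{
    ssize_t r;
    do {
        r = read(fd, buf, len);
    } while (r < 0 && errno == EINTR);
    return r < 0 ? -(long)errno : (long)r;
}
```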
On Wed, Jul 31, 2024 at 04:24:45PM +0100, Peter Maydell wrote:
> On Wed, 31 Jul 2024 at 16:21, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Wed, Jul 31, 2024 at 03:32:52PM +0100, Peter Maydell wrote:
> > > This is why we have the RETRY_ON_EINTR() macro, right?
> > >
> > > Currently we have some places that call qemu_open_old() inside
> > > RETRY_ON_EINTR -- we should decide whether we want to
> > > handle EINTR inside the qemu_open family of functions,
> > > or make the caller deal with it, and put the macro uses
> > > in the right place consistently.
> >
> > It is incredibly arbitrary where we use RETRY_ON_EINTR, which I think
> > points towards it being a sub-optimal solution to the general problem.
>
> Agreed (and agreed that SA_RESTART is the usual approach to
> avoid this mess). Partly I just vaguely recall discussions
> about this back when we added/improved the RETRY_ON_EINTR
> macro in the first place: maybe there's a reason we have it
> still...
>
> > > I agree that it would be nicer if we could use SA_RESTART,
> > > but presumably there's a reason why we don't. (At any
> > > rate code that's shared with the user-mode emulation
> > > has to be EINTR-resistant, because we can't force the
> > > user-mode guest code to avoid registering signal handlers
> > > that aren't SA_RESTART.)
> >
> > For user mode emulation isn't it valid to just propagate the
> > EINTR back up to the application, since EINTR is a valid errno
> > they have to be willing to handle unless the app has itself
> > used SA_RESTART?
>
> Yes, that's what we must do for cases where we are doing some
> syscall on behalf of the guest. But for cases where we're
> doing a syscall because of something QEMU itself needs to do,
> we may need to retry, because we might not be in a position
> to be able to back out of what we're doing (or we might not
> even be inside the "handle a guest syscall" codepath at all).

Ah ok, so RETRY_ON_EINTR conceivably makes sense in the linux-user / bsd-user
code in certain scenarios... but it seems almost every single use today is in
system emulator code!

With regards,
Daniel

Hi Daniel,

> > > > The experienced logfile entry is:
> > > >
> > > > qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call
>
> What is the actual signal you are seeing that impacts QEMU
> in this way?
>

I do not know at this point. This only reproduces on a customer's system
that we do not have access to; we do not see it in our in-house lab. QEMU
is started by libvirt, which in turn is driven by Apache CloudStack, and
the failure affects only about 10% to 20% of VM start operations.

I will wrap my head around bpftrace and see if I can instruct the customer
to run it on his systems. So maybe I can answer the question about the
signal in a few days, possibly next week.

The backing device we use (DRBD) performs an "auto promote" action in its
open implementation. That involves exchanging some packets with peers on
the local network, which I guess takes between 1 ms and 10 ms. So it
exposes a larger time window than other backing block devices, whose open
implementations probably complete faster. This is why we see it sometimes.

With regards,
Philipp

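Philipp's estimate that the DRBD open takes roughly 1 ms to 10 ms, and so leaves a wider window for a signal to land, could be checked by timing open() directly. Here is a minimal sketch, assuming one can run a small program on the affected host. The helper name `time_open_ns` is hypothetical. The device path and open flags to use are also assumptions: they should match what QEMU does, and a device in production use may react to being opened.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: how long one open(path, flags) takes, in
 * nanoseconds, or -errno if open() fails. The fd is closed again. */
static long long time_open_ns(const char *path, int flags)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int fd = open(path, flags | O_CLOEXEC);
    int saved_errno = errno;          /* capture before anything else runs */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (fd < 0) {
        return -(long long)saved_errno;
    }
    close(fd);
    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (t1.tv_nsec - t0.tv_nsec);
}
```

The measured duration approximates the window during which a signal can interrupt the call. Comparing it against an ordinary local block device would show whether the DRBD open really is markedly slower.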
As with many syscalls, open() might be interrupted by a signal.

The experienced logfile entry is:

qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call

Retry it until it is not interrupted by a signal.
FYI, dd has the same kind of loop around open().
https://github.com/coreutils/coreutils/blob/1ae98dbda7322427e8226356fd110d2553f5fac9/src/dd.c#L1294-L1299

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
---
 util/osdep.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/util/osdep.c b/util/osdep.c
index 770369831b..a1269d9345 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -294,14 +294,17 @@ bool qemu_has_direct_io(void)
 static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
 {
     int ret;
+    do {
 #ifdef O_CLOEXEC
-    ret = open(name, flags | O_CLOEXEC, mode);
+        ret = open(name, flags | O_CLOEXEC, mode);
 #else
-    ret = open(name, flags, mode);
-    if (ret >= 0) {
-        qemu_set_cloexec(ret);
-    }
+        ret = open(name, flags, mode);
+        if (ret >= 0) {
+            qemu_set_cloexec(ret);
+        }
 #endif
+    } while (ret == -1 && errno == EINTR);
+
     return ret;
 }
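For experimenting outside the QEMU tree, the patched helper can be reduced to a standalone function. This is a sketch, not QEMU code. The name `open_cloexec_retry` is hypothetical, and the fallback branch uses `fcntl(F_SETFD, FD_CLOEXEC)` as an assumed stand-in for QEMU's `qemu_set_cloexec()`.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Standalone sketch of the patched qemu_open_cloexec(): retry open()
 * for as long as it fails with EINTR, and make the fd close-on-exec. */
static int open_cloexec_retry(const char *name, int flags, mode_t mode)
{
    int ret;
    do {
#ifdef O_CLOEXEC
        ret = open(name, flags | O_CLOEXEC, mode);
#else
        ret = open(name, flags, mode);
        if (ret >= 0) {
            /* Not atomic: another thread may fork()+exec() in between. */
            fcntl(ret, F_SETFD, fcntl(ret, F_GETFD) | FD_CLOEXEC);
        }
#endif
    } while (ret == -1 && errno == EINTR);
    return ret;
}
```

As in the patch, the loop retries only when open() returns -1 with errno set to EINTR. Any other failure, such as ENOENT, is returned to the caller immediately.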