diff mbox series

[1/9] monitor: Honor QMP request for fd removal immediately

Message ID 20240426142042.14573-2-farosas@suse.de (mailing list archive)
State New, archived
Headers show
Series migration/mapped-ram: Add direct-io support | expand

Commit Message

Fabiano Rosas April 26, 2024, 2:20 p.m. UTC
We're enabling using the fdset interface to pass file descriptors for
use in the migration code. Since migrations can happen more than once
during the VMs lifetime, we need a way to remove an fd from the fdset
at the end of migration.

The current code only removes an fd from the fdset if the VM is
running. This causes a QMP call to "remove-fd" to not actually remove
the fd if the VM happens to be stopped.

While the fd would eventually be removed when monitor_fdset_cleanup()
is called again, the user request should be honored and the fd
actually removed. Calling remove-fd + query-fdset shows a recently
removed fd still present.

The runstate_is_running() check was introduced by commit ebe52b592d
("monitor: Prevent removing fd from set during init"), which by the
shortlog indicates that they were trying to avoid removing an
yet-unduplicated fd too early.

I don't see why an fd explicitly removed with qmp_remove_fd() should
be under runstate_is_running(). I'm assuming this was a mistake when
adding the parenthesis around the expression.

Move the runstate_is_running() check to apply only to the
QLIST_EMPTY(dup_fds) side of the expression and ignore it when
mon_fdset_fd->removed has been explicitly set.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 monitor/fds.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Peter Xu May 3, 2024, 4:02 p.m. UTC | #1
On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
> We're enabling using the fdset interface to pass file descriptors for
> use in the migration code. Since migrations can happen more than once
> during the VMs lifetime, we need a way to remove an fd from the fdset
> at the end of migration.
> 
> The current code only removes an fd from the fdset if the VM is
> running. This causes a QMP call to "remove-fd" to not actually remove
> the fd if the VM happens to be stopped.
> 
> While the fd would eventually be removed when monitor_fdset_cleanup()
> is called again, the user request should be honored and the fd
> actually removed. Calling remove-fd + query-fdset shows a recently
> removed fd still present.
> 
> The runstate_is_running() check was introduced by commit ebe52b592d
> ("monitor: Prevent removing fd from set during init"), which by the
> shortlog indicates that they were trying to avoid removing an
> yet-unduplicated fd too early.
> 
> I don't see why an fd explicitly removed with qmp_remove_fd() should
> be under runstate_is_running(). I'm assuming this was a mistake when
> adding the parenthesis around the expression.
> 
> Move the runstate_is_running() check to apply only to the
> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
> mon_fdset_fd->removed has been explicitly set.

I am confused on why the fdset removal is as complicated.  I'm also
wondering here whether it's dropped because we checked against
"mon_refcount == 0", and maybe monitor_fdset_cleanup() is simply called
_before_ a monitor is created?  Why do we need such check on the first
place?

I'm thinking one case where the only QMP monitor got (for some reason)
disconnected, and reconnected again during VM running.  Won't current code
already lead to unwanted removal of mostly all fds due to mon_refcount==0?

I also am confused why ->removed flags is ever needed, and why we can't
already remove the fdsets fds if found matching.

Copy Corey, Eric and Kevin.

> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  monitor/fds.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/monitor/fds.c b/monitor/fds.c
> index d86c2c674c..4ec3b7eea9 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>      MonFdsetFd *mon_fdset_fd_next;
>  
>      QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
> -        if ((mon_fdset_fd->removed ||
> -                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
> -                runstate_is_running()) {
> +        if (mon_fdset_fd->removed ||
> +            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
> +             runstate_is_running())) {
>              close(mon_fdset_fd->fd);
>              g_free(mon_fdset_fd->opaque);
>              QLIST_REMOVE(mon_fdset_fd, next);
> -- 
> 2.35.3
>
Daniel P. Berrangé May 8, 2024, 7:17 a.m. UTC | #2
On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
> We're enabling using the fdset interface to pass file descriptors for
> use in the migration code. Since migrations can happen more than once
> during the VMs lifetime, we need a way to remove an fd from the fdset
> at the end of migration.
> 
> The current code only removes an fd from the fdset if the VM is
> running. This causes a QMP call to "remove-fd" to not actually remove
> the fd if the VM happens to be stopped.
> 
> While the fd would eventually be removed when monitor_fdset_cleanup()
> is called again, the user request should be honored and the fd
> actually removed. Calling remove-fd + query-fdset shows a recently
> removed fd still present.
> 
> The runstate_is_running() check was introduced by commit ebe52b592d
> ("monitor: Prevent removing fd from set during init"), which by the
> shortlog indicates that they were trying to avoid removing an
> yet-unduplicated fd too early.

IMHO that should be reverted. The justification says

  "If an fd is added to an fd set via the command line, and it is not
   referenced by another command line option (ie. -drive), then clean
   it up after QEMU initialization is complete"

which I think is pretty weak. Why should QEMU forceably stop an app
from passing in an FD to be used by a QMP command issued just after
the VM starts running ?  While it could just use QMP to pass in the
FD set, the mgmt app might have its own reason for wanting QEMU to
own the passed FD from the very start of the process execve().

Implicitly this cleanup is attempting to "fix" a bug where the mgmt
app passes in an FD that it never needed. If any such bug were ever
found, then the mgmt app should just be fixed to not pass it in. I
don't think QEMU needs to be trying to fix mgmt app bugs.

IOW, this commit is imposing an arbitrary & unecessary usage policy
on passed in FD sets, and as your commit explains has further
unhelpful (& undocumented) side effects on the 'remove-fd' QMP command.

Just revert it IMHO.

> 
> I don't see why an fd explicitly removed with qmp_remove_fd() should
> be under runstate_is_running(). I'm assuming this was a mistake when
> adding the parenthesis around the expression.
> 
> Move the runstate_is_running() check to apply only to the
> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
> mon_fdset_fd->removed has been explicitly set.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  monitor/fds.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/monitor/fds.c b/monitor/fds.c
> index d86c2c674c..4ec3b7eea9 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>      MonFdsetFd *mon_fdset_fd_next;
>  
>      QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
> -        if ((mon_fdset_fd->removed ||
> -                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
> -                runstate_is_running()) {
> +        if (mon_fdset_fd->removed ||
> +            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
> +             runstate_is_running())) {
>              close(mon_fdset_fd->fd);
>              g_free(mon_fdset_fd->opaque);
>              QLIST_REMOVE(mon_fdset_fd, next);
> -- 
> 2.35.3
> 

With regards,
Daniel
Fabiano Rosas May 16, 2024, 9:46 p.m. UTC | #3
Hi all, sorry to have been away from this thread for a while, I was
trying to catch up on my reviews queue.

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
>> We're enabling using the fdset interface to pass file descriptors for
>> use in the migration code. Since migrations can happen more than once
>> during the VMs lifetime, we need a way to remove an fd from the fdset
>> at the end of migration.
>> 
>> The current code only removes an fd from the fdset if the VM is
>> running. This causes a QMP call to "remove-fd" to not actually remove
>> the fd if the VM happens to be stopped.
>> 
>> While the fd would eventually be removed when monitor_fdset_cleanup()
>> is called again, the user request should be honored and the fd
>> actually removed. Calling remove-fd + query-fdset shows a recently
>> removed fd still present.
>> 
>> The runstate_is_running() check was introduced by commit ebe52b592d
>> ("monitor: Prevent removing fd from set during init"), which by the
>> shortlog indicates that they were trying to avoid removing an
>> yet-unduplicated fd too early.
>> 
>> I don't see why an fd explicitly removed with qmp_remove_fd() should
>> be under runstate_is_running(). I'm assuming this was a mistake when
>> adding the parenthesis around the expression.
>> 
>> Move the runstate_is_running() check to apply only to the
>> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
>> mon_fdset_fd->removed has been explicitly set.
>
> I am confused on why the fdset removal is as complicated.  I'm also
> wondering here whether it's dropped because we checked against
> "mon_refcount == 0", and maybe monitor_fdset_cleanup() is simply called
> _before_ a monitor is created?  Why do we need such check on the first
> place?
>

It seems the intention was to reuse monitor_fdset_cleanup() to do
cleanup when all monitors connections are closed:

efb87c1697 ("monitor: Clean up fd sets on monitor disconnect")
Author: Corey Bryant <coreyb@linux.vnet.ibm.com>
Date:   Tue Aug 14 16:43:48 2012 -0400

    monitor: Clean up fd sets on monitor disconnect
    
    Fd sets are shared by all monitor connections.  Fd sets are considered
    to be in use while at least one monitor is connected.  When the last
    monitor disconnects, all fds that are members of an fd set with no
    outstanding dup references are closed.  This prevents any fd leakage
    associated with a client disconnect prior to using a passed fd.
    
    Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>

This could have been done differently at monitor_qmp_event():

    case CHR_EVENT_CLOSED:
        ...
        mon_refcount--;
        if (mon_refcount == 0) {
            monitor_fdsets_cleanup();
        }

But maybe there was a concern about making sure the empty fdsets (last
block in monitor_fdset_cleanup) were removed at every refcount decrement
and not only when mon_refcount == 0 for some reason.

> I'm thinking one case where the only QMP monitor got (for some reason)
> disconnected, and reconnected again during VM running.  Won't current code
> already lead to unwanted removal of mostly all fds due to mon_refcount==0?

I think that's the case that the patch in question was trying to
avoid. If the last monitor connects and disconnects while fds have not
been dup'ed yet, the mon_fdset->dup_fds list will be empty and what you
say will happen. There seems to be an assumption that after the guest
starts running all fds that need to be dup'ed will already have been
dup'ed.

So I think we cannot simply revert the patch as Daniel suggested,
because that might regress the original block layer use-case if a
monitor open->close causes the removal of all the yet undup'ed fds[1].

For the migration use-case, the dup() only happens after the migrate
command has been issued, so the runstate_is_running() check doesn't help
us. But it also doesn't hurt. However, we're still exposed to a monitor
disconnection removing the undup'ed fds. So AFAICS, we either stop
calling monitor_fdset_cleanup() at monitor close or declare that this
issue is unlikely to occur (because it is) and leave a comment in the
code.

===========
1- I ran a quick test:

connect()         // monitor opened: refcnt: 1

{"execute": "add-fd", "arguments": {"fdset-id": 1}}
{"return": {"fd": 9, "fdset-id": 1}}

{"execute": "add-fd", "arguments": {"fdset-id": 1}}
{"return": {"fd": 21, "fdset-id": 1}}

close()           // monitor closed: refcnt: 0

connect()         // monitor opened: refcnt: 1

{"execute": "migrate", "arguments": {"uri": "file:/dev/fdset/1,offset=4096"}}
{
    "error": {
        "class": "GenericError",
        "desc": "Outgoing migration needs two fds in the fdset, got 0"
    }
}

>
> I also am confused why ->removed flags is ever needed, and why we can't
> already remove the fdsets fds if found matching.
>

Prior to commit efb87c1697 ("monitor: Clean up fd sets on monitor
disconnect") we only called monitor_fdset_cleanup() from
qmp_remove_fd(), so we effectively always removed immediately after
setting ->removed = true. I don't see a reason to have the flag either.
Fabiano Rosas May 16, 2024, 10 p.m. UTC | #4
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
>> We're enabling using the fdset interface to pass file descriptors for
>> use in the migration code. Since migrations can happen more than once
>> during the VMs lifetime, we need a way to remove an fd from the fdset
>> at the end of migration.
>> 
>> The current code only removes an fd from the fdset if the VM is
>> running. This causes a QMP call to "remove-fd" to not actually remove
>> the fd if the VM happens to be stopped.
>> 
>> While the fd would eventually be removed when monitor_fdset_cleanup()
>> is called again, the user request should be honored and the fd
>> actually removed. Calling remove-fd + query-fdset shows a recently
>> removed fd still present.
>> 
>> The runstate_is_running() check was introduced by commit ebe52b592d
>> ("monitor: Prevent removing fd from set during init"), which by the
>> shortlog indicates that they were trying to avoid removing an
>> yet-unduplicated fd too early.
>
> IMHO that should be reverted. The justification says
>
>   "If an fd is added to an fd set via the command line, and it is not
>    referenced by another command line option (ie. -drive), then clean
>    it up after QEMU initialization is complete"
>
> which I think is pretty weak. Why should QEMU forceably stop an app
> from passing in an FD to be used by a QMP command issued just after
> the VM starts running ?  While it could just use QMP to pass in the
> FD set, the mgmt app might have its own reason for wanting QEMU to
> own the passed FD from the very start of the process execve().

I don't think that's what that patch does. That description is
misleading. I read it as:

   "If an fd is added to an fd set via the command line, and it is not
    referenced by another command line option (ie. -drive), then clean
    it up ONLY after QEMU initialization is complete"
          ^

By the subject ("monitor: Prevent removing fd from set during init") and
the fact that this function is only called when the monitor connection
closes, I believe the idea was to *save* the fds until after the VM
starts running, i.e. some fd was being lost because
monitor_fdset_cleanup() was being called before the dup().

See my reply to Peter in this same patch (PATCH 1/9).

>
> Implicitly this cleanup is attempting to "fix" a bug where the mgmt
> app passes in an FD that it never needed. If any such bug were ever
> found, then the mgmt app should just be fixed to not pass it in. I
> don't think QEMU needs to be trying to fix mgmt app bugs.
>
> IOW, this commit is imposing an arbitrary & unecessary usage policy
> on passed in FD sets, and as your commit explains has further
> unhelpful (& undocumented) side effects on the 'remove-fd' QMP command.
>
> Just revert it IMHO.
>
>> 
>> I don't see why an fd explicitly removed with qmp_remove_fd() should
>> be under runstate_is_running(). I'm assuming this was a mistake when
>> adding the parenthesis around the expression.
>> 
>> Move the runstate_is_running() check to apply only to the
>> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
>> mon_fdset_fd->removed has been explicitly set.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  monitor/fds.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/monitor/fds.c b/monitor/fds.c
>> index d86c2c674c..4ec3b7eea9 100644
>> --- a/monitor/fds.c
>> +++ b/monitor/fds.c
>> @@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>>      MonFdsetFd *mon_fdset_fd_next;
>>  
>>      QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
>> -        if ((mon_fdset_fd->removed ||
>> -                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
>> -                runstate_is_running()) {
>> +        if (mon_fdset_fd->removed ||
>> +            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
>> +             runstate_is_running())) {
>>              close(mon_fdset_fd->fd);
>>              g_free(mon_fdset_fd->opaque);
>>              QLIST_REMOVE(mon_fdset_fd, next);
>> -- 
>> 2.35.3
>> 
>
> With regards,
> Daniel
Daniel P. Berrangé May 17, 2024, 7:33 a.m. UTC | #5
On Thu, May 16, 2024 at 07:00:11PM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
> >> We're enabling using the fdset interface to pass file descriptors for
> >> use in the migration code. Since migrations can happen more than once
> >> during the VMs lifetime, we need a way to remove an fd from the fdset
> >> at the end of migration.
> >> 
> >> The current code only removes an fd from the fdset if the VM is
> >> running. This causes a QMP call to "remove-fd" to not actually remove
> >> the fd if the VM happens to be stopped.
> >> 
> >> While the fd would eventually be removed when monitor_fdset_cleanup()
> >> is called again, the user request should be honored and the fd
> >> actually removed. Calling remove-fd + query-fdset shows a recently
> >> removed fd still present.
> >> 
> >> The runstate_is_running() check was introduced by commit ebe52b592d
> >> ("monitor: Prevent removing fd from set during init"), which by the
> >> shortlog indicates that they were trying to avoid removing an
> >> yet-unduplicated fd too early.
> >
> > IMHO that should be reverted. The justification says
> >
> >   "If an fd is added to an fd set via the command line, and it is not
> >    referenced by another command line option (ie. -drive), then clean
> >    it up after QEMU initialization is complete"
> >
> > which I think is pretty weak. Why should QEMU forceably stop an app
> > from passing in an FD to be used by a QMP command issued just after
> > the VM starts running ?  While it could just use QMP to pass in the
> > FD set, the mgmt app might have its own reason for wanting QEMU to
> > own the passed FD from the very start of the process execve().
> 
> I don't think that's what that patch does. That description is
> misleading. I read it as:
> 
>    "If an fd is added to an fd set via the command line, and it is not
>     referenced by another command line option (ie. -drive), then clean
>     it up ONLY after QEMU initialization is complete"
>           ^
> 
> By the subject ("monitor: Prevent removing fd from set during init") and
> the fact that this function is only called when the monitor connection
> closes, I believe the idea was to *save* the fds until after the VM
> starts running, i.e. some fd was being lost because
> monitor_fdset_cleanup() was being called before the dup().

I know that, but I'm saying QEMU should not be doing *any* generic cleanup
of passed in FDs at any point. 

A passed in FD should be taken by whatever part of the QEMU configuration
is told to use it when needed, and this takes responsibility for closing
it. If nothing is told to use the fdset /yet/, then it should stay in the
fdset untouched for later use.

If an application accidentally passes in a FD that it doesn't reference
in any configuration, that's simply a application bug to fix. QEMU does
not need to secondguess the app's intent and decide to arbitrarily close
it.

With regards,
Daniel
diff mbox series

Patch

diff --git a/monitor/fds.c b/monitor/fds.c
index d86c2c674c..4ec3b7eea9 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -173,9 +173,9 @@  static void monitor_fdset_cleanup(MonFdset *mon_fdset)
     MonFdsetFd *mon_fdset_fd_next;
 
     QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
-        if ((mon_fdset_fd->removed ||
-                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
-                runstate_is_running()) {
+        if (mon_fdset_fd->removed ||
+            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
+             runstate_is_running())) {
             close(mon_fdset_fd->fd);
             g_free(mon_fdset_fd->opaque);
             QLIST_REMOVE(mon_fdset_fd, next);