diff mbox series

migration: Cleanup during exit

Message ID 20190227164900.16378-1-dgilbert@redhat.com (mailing list archive)
State New, archived
Headers show
Series migration: Cleanup during exit | expand

Commit Message

Dr. David Alan Gilbert Feb. 27, 2019, 4:49 p.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Currently we cleanup the migration object as we exit main after the
main_loop finishes; however if there's a migration running things
get messy and we can end up with the migration thread still trying
to access freed structures.

We now take a ref to the object around the migration thread itself,
so the act of dropping the ref during exit doesn't cause us to lose
the state until the thread quits.

Cancelling the migration during migration also tries to get the thread
to quit.

We do this a bit earlier; so hopefully migration gets out of the way
before all the devices etc are freed.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
---
 include/migration/misc.h |  2 +-
 migration/migration.c    | 10 +++++++++-
 vl.c                     |  7 ++++++-
 3 files changed, 16 insertions(+), 3 deletions(-)

Comments

Juan Quintela Feb. 27, 2019, 8:32 p.m. UTC | #1
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Currently we cleanup the migration object as we exit main after the
> main_loop finishes; however if there's a migration running things
> get messy and we can end up with the migration thread still trying
> to access freed structures.
>
> We now take a ref to the object around the migration thread itself,
> so the act of dropping the ref during exit doesn't cause us to lose
> the state until the thread quits.
>
> Cancelling the migration during migration also tries to get the thread
> to quit.
>
> We do this a bit earlier; so hopefully migration gets out of the way
> before all the devices etc are freed.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Tested-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Juan Quintela <quintela@redhat.com>
Peter Xu Feb. 28, 2019, 6:28 a.m. UTC | #2
On Wed, Feb 27, 2019 at 04:49:00PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Currently we cleanup the migration object as we exit main after the
> main_loop finishes; however if there's a migration running things
> get messy and we can end up with the migration thread still trying
> to access freed structures.
> 
> We now take a ref to the object around the migration thread itself,
> so the act of dropping the ref during exit doesn't cause us to lose
> the state until the thread quits.
> 
> Cancelling the migration during migration also tries to get the thread
> to quit.
> 
> We do this a bit earlier; so hopefully migration gets out of the way
> before all the devices etc are freed.

So does it mean that even with the patch it's still possible the
migration thread will be accessing device structs that have already
been freed which can still crash QEMU?

Thanks,
Alex Bennée Feb. 28, 2019, 10:49 a.m. UTC | #3
Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Currently we cleanup the migration object as we exit main after the
> main_loop finishes; however if there's a migration running things
> get messy and we can end up with the migration thread still trying
> to access freed structures.
>
> We now take a ref to the object around the migration thread itself,
> so the act of dropping the ref during exit doesn't cause us to lose
> the state until the thread quits.
>
> Cancelling the migration during migration also tries to get the thread
> to quit.
>
> We do this a bit earlier; so hopefully migration gets out of the way
> before all the devices etc are freed.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Tested-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  include/migration/misc.h |  2 +-
>  migration/migration.c    | 10 +++++++++-
>  vl.c                     |  7 ++++++-
>  3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index 0471e04d1f..6f9df74436 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -36,7 +36,7 @@ void dump_vmstate_json_to_file(FILE *out_fp);
>
>  /* migration/migration.c */
>  void migration_object_init(void);
> -void migration_object_finalize(void);
> +void migration_shutdown(void);
>  void qemu_start_incoming_migration(const char *uri, Error **errp);
>  bool migration_is_idle(void);
>  void add_migration_state_change_notifier(Notifier *notify);
> diff --git a/migration/migration.c b/migration/migration.c
> index e44f77af02..d45561f9b8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -126,6 +126,7 @@ static bool migration_object_check(MigrationState *ms, Error **errp);
>  static int migration_maybe_pause(MigrationState *s,
>                                   int *current_active_state,
>                                   int new_state);
> +static void migrate_fd_cancel(MigrationState *s);
>
>  void migration_object_init(void)
>  {
> @@ -167,8 +168,13 @@ void migration_object_init(void)
>      }
>  }
>
> -void migration_object_finalize(void)
> +void migration_shutdown(void)
>  {
> +    /*
> +     * Cancel the current migration - that will (eventually)
> +     * stop the migration using this structure
> +     */
> +    migrate_fd_cancel(current_migration);
>      object_unref(OBJECT(current_migration));
>  }
>
> @@ -3134,6 +3140,7 @@ static void *migration_thread(void *opaque)
>
>      rcu_register_thread();
>
> +    object_ref(OBJECT(s));
>      s->iteration_start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>
>      qemu_savevm_state_header(s->to_dst_file);
> @@ -3230,6 +3237,7 @@ static void *migration_thread(void *opaque)
>
>      trace_migration_thread_after_loop();
>      migration_iteration_finish(s);
> +    object_unref(OBJECT(s));
>      rcu_unregister_thread();
>      return NULL;
>  }
> diff --git a/vl.c b/vl.c
> index 2f340686a7..e7f460a114 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4579,6 +4579,12 @@ int main(int argc, char **argv, char **envp)
>
>      gdbserver_cleanup();
>
> +    /*
> +     * cleaning up the migration object cancels any existing migration
> +     * try to do this early so that it also stops using devices.
> +     */
> +    migration_shutdown();
> +

We do like to vary our language over here, I guess shutdown and cleanup
imply subtly different things. FWIW I suspect the gdbserver_cleanup()
should be a shutdown ;-)

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>      /* No more vcpu or device emulation activity beyond this point */
>      vm_shutdown();
>
> @@ -4594,7 +4600,6 @@ int main(int argc, char **argv, char **envp)
>      monitor_cleanup();
>      qemu_chr_cleanup();
>      user_creatable_cleanup();
> -    migration_object_finalize();
>      /* TODO: unref root container, check all devices are ok */
>
>      return 0;


--
Alex Bennée
Dr. David Alan Gilbert Feb. 28, 2019, 11:40 a.m. UTC | #4
* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Feb 27, 2019 at 04:49:00PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Currently we cleanup the migration object as we exit main after the
> > main_loop finishes; however if there's a migration running things
> > get messy and we can end up with the migration thread still trying
> > to access freed structures.
> > 
> > We now take a ref to the object around the migration thread itself,
> > so the act of dropping the ref during exit doesn't cause us to lose
> > the state until the thread quits.
> > 
> > Cancelling the migration during migration also tries to get the thread
> > to quit.
> > 
> > We do this a bit earlier; so hopefully migration gets out of the way
> > before all the devices etc are freed.
> 
> So does it mean that even with the patch it's still possible the
> migration thread will be accessing device structs that have already
> been freed which can still crash QEMU?

Possibly yes; I'm not sure how to go to the next stage and stop that
case; the consensus seems to be we don't want to explicitly block
during the exit process, so doing a join on the migration thread doesn't
seem to be wanted.

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Daniel P. Berrangé Feb. 28, 2019, 12:01 p.m. UTC | #5
On Thu, Feb 28, 2019 at 11:40:19AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Wed, Feb 27, 2019 at 04:49:00PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Currently we cleanup the migration object as we exit main after the
> > > main_loop finishes; however if there's a migration running things
> > > get messy and we can end up with the migration thread still trying
> > > to access freed structures.
> > > 
> > > We now take a ref to the object around the migration thread itself,
> > > so the act of dropping the ref during exit doesn't cause us to lose
> > > the state until the thread quits.
> > > 
> > > Cancelling the migration during migration also tries to get the thread
> > > to quit.
> > > 
> > > We do this a bit earlier; so hopefully migration gets out of the way
> > > before all the devices etc are freed.
> > 
> > So does it mean that even with the patch it's still possible the
> > migration thread will be accessing device structs that have already
> > been freed which can still crash QEMU?
> 
> Possibly yes; I'm not sure how to go to the next stage and stop that
> case; the consensus seems to be we don't want to explicitly block
> during the exit process, so doing a join on the migration thread doesn't
> seem to be wanted.

Essentially the many  *_cleanup() calls at the end of main() in vl.c
are only ever safe if all background threads have stopped using the
resources that are being freed. This isn't the case with migration
currently. I also worry about other threads that might be running
in QEMU, SPICE in particular as it touchs many device backends.

Aborting the migration & joining the migration threads is the natural
way to address this. Cancelling appears to require the main loop to
still be running, so would require main_loop_should_exit() to issue
the cancel & return false unless it has completed. This would delay
shutdown for as long as it takes migration to abort.

FWIW, even the bdrv_close_all() call during shutdown can delay things
for a considerable time if draining the backends isn't a quick op,
which could be the case if there are storage problems (blocked local
I/O, or interrupted network - rbd/ceph/nfs clients). So I'm not sure
that stopping migration is inherantly worse than what's already
possible with block cleanup, in terms of delaying things.

A slightly more hacky approach would be to pthread_cancel() all the
migration threads. Normally we'd never use pthread_cancel() as it
is incredibly hard to ensure all memory used by the threads is
freed. Since we're about to exit the process though, this cleanup
does not matter. The risk, however, is that one of the threads is
holding a mutex which then blocks the rest of the *cleanup functions,
so this still feels dangerous.

Overall to me a clean cancel & join of the migration threads feels
like the only safe option, unless we just remove all notion of
cleanup from QEMU & delete all the _cleanup functions in main()
and let the OS free all memory.

Regards,
Daniel
Dr. David Alan Gilbert Feb. 28, 2019, 12:28 p.m. UTC | #6
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Feb 28, 2019 at 11:40:19AM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Wed, Feb 27, 2019 at 04:49:00PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Currently we cleanup the migration object as we exit main after the
> > > > main_loop finishes; however if there's a migration running things
> > > > get messy and we can end up with the migration thread still trying
> > > > to access freed structures.
> > > > 
> > > > We now take a ref to the object around the migration thread itself,
> > > > so the act of dropping the ref during exit doesn't cause us to lose
> > > > the state until the thread quits.
> > > > 
> > > > Cancelling the migration during migration also tries to get the thread
> > > > to quit.
> > > > 
> > > > We do this a bit earlier; so hopefully migration gets out of the way
> > > > before all the devices etc are freed.
> > > 
> > > So does it mean that even with the patch it's still possible the
> > > migration thread will be accessing device structs that have already
> > > been freed which can still crash QEMU?
> > 
> > Possibly yes; I'm not sure how to go to the next stage and stop that
> > case; the consensus seems to be we don't want to explicitly block
> > during the exit process, so doing a join on the migration thread doesn't
> > seem to be wanted.
> 
> Essentially the many  *_cleanup() calls at the end of main() in vl.c
> are only ever safe if all background threads have stopped using the
> resources that are being freed. This isn't the case with migration
> currently. I also worry about other threads that might be running
> in QEMU, SPICE in particular as it touchs many device backends.
> 
> Aborting the migration & joining the migration threads is the natural
> way to address this. Cancelling appears to require the main loop to
> still be running, so would require main_loop_should_exit() to issue
> the cancel & return false unless it has completed. This would delay
> shutdown for as long as it takes migration to abort.

ish - cancelling performs a shutdown(2) on the fd, that should be enough
in most cases to kick the migration thread into falling out without
main loop interaction; I think...

> FWIW, even the bdrv_close_all() call during shutdown can delay things
> for a considerable time if draining the backends isn't a quick op,
> which could be the case if there are storage problems (blocked local
> I/O, or interrupted network - rbd/ceph/nfs clients). So I'm not sure
> that stopping migration is inherantly worse than what's already
> possible with block cleanup, in terms of delaying things.
> 
> A slightly more hacky approach would be to pthread_cancel() all the
> migration threads. Normally we'd never use pthread_cancel() as it
> is incredibly hard to ensure all memory used by the threads is
> freed. Since we're about to exit the process though, this cleanup
> does not matter. The risk, however, is that one of the threads is
> holding a mutex which then blocks the rest of the *cleanup functions,
> so this still feels dangerous.
> 
> Overall to me a clean cancel & join of the migration threads feels
> like the only safe option, unless we just remove all notion of
> cleanup from QEMU & delete all the _cleanup functions in main()
> and let the OS free all memory.

That's actually my preference; I think trying to do clean tidy ups
here is very racy and doesn't gain much.  However, for things like
storage there may be locks that have to be properly dropped and
bitmaps and things to be stored/sync'd - so just exiting probably
isn't safe either.

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu March 1, 2019, 3:27 a.m. UTC | #7
On Thu, Feb 28, 2019 at 12:28:22PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Feb 28, 2019 at 11:40:19AM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > On Wed, Feb 27, 2019 at 04:49:00PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > 
> > > > > Currently we cleanup the migration object as we exit main after the
> > > > > main_loop finishes; however if there's a migration running things
> > > > > get messy and we can end up with the migration thread still trying
> > > > > to access freed structures.
> > > > > 
> > > > > We now take a ref to the object around the migration thread itself,
> > > > > so the act of dropping the ref during exit doesn't cause us to lose
> > > > > the state until the thread quits.
> > > > > 
> > > > > Cancelling the migration during migration also tries to get the thread
> > > > > to quit.
> > > > > 
> > > > > We do this a bit earlier; so hopefully migration gets out of the way
> > > > > before all the devices etc are freed.
> > > > 
> > > > So does it mean that even with the patch it's still possible the
> > > > migration thread will be accessing device structs that have already
> > > > been freed which can still crash QEMU?
> > > 
> > > Possibly yes; I'm not sure how to go to the next stage and stop that
> > > case; the consensus seems to be we don't want to explicitly block
> > > during the exit process, so doing a join on the migration thread doesn't
> > > seem to be wanted.
> > 
> > Essentially the many  *_cleanup() calls at the end of main() in vl.c
> > are only ever safe if all background threads have stopped using the
> > resources that are being freed. This isn't the case with migration
> > currently. I also worry about other threads that might be running
> > in QEMU, SPICE in particular as it touchs many device backends.
> > 
> > Aborting the migration & joining the migration threads is the natural
> > way to address this. Cancelling appears to require the main loop to
> > still be running, so would require main_loop_should_exit() to issue
> > the cancel & return false unless it has completed. This would delay
> > shutdown for as long as it takes migration to abort.
> 
> ish - cancelling performs a shutdown(2) on the fd, that should be enough
> in most cases to kick the migration thread into falling out without
> main loop interaction; I think...

Dave, could you hint me on when shutdown() will not suffice (say,
we'll hang death if we do join() upon the migration thread)?

> 
> > FWIW, even the bdrv_close_all() call during shutdown can delay things
> > for a considerable time if draining the backends isn't a quick op,
> > which could be the case if there are storage problems (blocked local
> > I/O, or interrupted network - rbd/ceph/nfs clients). So I'm not sure
> > that stopping migration is inherantly worse than what's already
> > possible with block cleanup, in terms of delaying things.
> > 
> > A slightly more hacky approach would be to pthread_cancel() all the
> > migration threads. Normally we'd never use pthread_cancel() as it
> > is incredibly hard to ensure all memory used by the threads is
> > freed. Since we're about to exit the process though, this cleanup
> > does not matter. The risk, however, is that one of the threads is
> > holding a mutex which then blocks the rest of the *cleanup functions,
> > so this still feels dangerous.
> > 
> > Overall to me a clean cancel & join of the migration threads feels
> > like the only safe option, unless we just remove all notion of
> > cleanup from QEMU & delete all the _cleanup functions in main()
> > and let the OS free all memory.
> 
> That's actually my preference; I think trying to do clean tidy ups
> here is very racy and doesn't gain much.  However, for things like
> storage there may be locks that have to be properly dropped and
> bitmaps and things to be stored/sync'd - so just exiting probably
> isn't safe either.

I'm unsure about whether I caught the whole idea but I feel like we
can't drop all the cleanup hooks since what if we want to do something
else than "freeing memories"?  Or anything that the OS can't do for us
but we want to try to do before the process quits.  If that operation
hangs we'll probably face a similar hang issue.

Regarding pthread_cancel() - it will only work if the thread reaches
cancellation points, right?  Then does it mean that it still cannot
guarantee the thread will quit very soon and we still have chance that
we'll wait forever when join()?  One major reason that the thread will
be waiting should be the migration streams but AFAIU shutdown() (as
Dave has mentioned) should solve this problem properly, then I'm
unsuer how pthread_cancel() can help much comparing to shutdown()...

My understanding (probably not correct, but I'll just speak loud...)
is that we will never have a way to guarantee a process to quit
cleanly all the time because there're really many things that can hang
(I'm assuming we've already solved the MigrationState refcounting
issue by this patch, so I'm assuming we're having a "hang QEMU" rather
than crashing issue).  Then IMHO we could simply do whatever we can do
to cleanup everything assuming no hang will happen, and if it really
happens we use SIGKILL which will be the last thing we can do by which
we might lost many things (e.g., unflushed caches in block layer) but
we've tried our best.  For migration, it'll be (1) cancel (not
pthread_cancel, but cancel the migration to trigger shutdown() on fds
or whatever) (2) join thread (3) finalize/cleanup data structures.

IMHO SIGKILL is the real de-facto solution for all these issues to me.
And even if with SIGKILL we can still hang.... but I'll assume those
hangs will be kernel problems not QEMU since SIGKILL should be
designed to kill a process without a question.

Thanks,
Dr. David Alan Gilbert March 1, 2019, 10:11 a.m. UTC | #8
* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Feb 28, 2019 at 12:28:22PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Feb 28, 2019 at 11:40:19AM +0000, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > On Wed, Feb 27, 2019 at 04:49:00PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > 
> > > > > > Currently we cleanup the migration object as we exit main after the
> > > > > > main_loop finishes; however if there's a migration running things
> > > > > > get messy and we can end up with the migration thread still trying
> > > > > > to access freed structures.
> > > > > > 
> > > > > > We now take a ref to the object around the migration thread itself,
> > > > > > so the act of dropping the ref during exit doesn't cause us to lose
> > > > > > the state until the thread quits.
> > > > > > 
> > > > > > Cancelling the migration during migration also tries to get the thread
> > > > > > to quit.
> > > > > > 
> > > > > > We do this a bit earlier; so hopefully migration gets out of the way
> > > > > > before all the devices etc are freed.
> > > > > 
> > > > > So does it mean that even with the patch it's still possible the
> > > > > migration thread will be accessing device structs that have already
> > > > > been freed which can still crash QEMU?
> > > > 
> > > > Possibly yes; I'm not sure how to go to the next stage and stop that
> > > > case; the consensus seems to be we don't want to explicitly block
> > > > during the exit process, so doing a join on the migration thread doesn't
> > > > seem to be wanted.
> > > 
> > > Essentially the many  *_cleanup() calls at the end of main() in vl.c
> > > are only ever safe if all background threads have stopped using the
> > > resources that are being freed. This isn't the case with migration
> > > currently. I also worry about other threads that might be running
> > > in QEMU, SPICE in particular as it touchs many device backends.
> > > 
> > > Aborting the migration & joining the migration threads is the natural
> > > way to address this. Cancelling appears to require the main loop to
> > > still be running, so would require main_loop_should_exit() to issue
> > > the cancel & return false unless it has completed. This would delay
> > > shutdown for as long as it takes migration to abort.
> > 
> > ish - cancelling performs a shutdown(2) on the fd, that should be enough
> > in most cases to kick the migration thread into falling out without
> > main loop interaction; I think...
> 
> Dave, could you hint me on when shutdown() will not suffice (say,
> we'll hang death if we do join() upon the migration thread)?

I think I mostly worry about when we hang in a device's migration code
or where that interacts with an external program (e.g. vhost-user).

> > 
> > > FWIW, even the bdrv_close_all() call during shutdown can delay things
> > > for a considerable time if draining the backends isn't a quick op,
> > > which could be the case if there are storage problems (blocked local
> > > I/O, or interrupted network - rbd/ceph/nfs clients). So I'm not sure
> > > that stopping migration is inherantly worse than what's already
> > > possible with block cleanup, in terms of delaying things.
> > > 
> > > A slightly more hacky approach would be to pthread_cancel() all the
> > > migration threads. Normally we'd never use pthread_cancel() as it
> > > is incredibly hard to ensure all memory used by the threads is
> > > freed. Since we're about to exit the process though, this cleanup
> > > does not matter. The risk, however, is that one of the threads is
> > > holding a mutex which then blocks the rest of the *cleanup functions,
> > > so this still feels dangerous.
> > > 
> > > Overall to me a clean cancel & join of the migration threads feels
> > > like the only safe option, unless we just remove all notion of
> > > cleanup from QEMU & delete all the _cleanup functions in main()
> > > and let the OS free all memory.
> > 
> > That's actually my preference; I think trying to do clean tidy ups
> > here is very racy and doesn't gain much.  However, for things like
> > storage there may be locks that have to be properly dropped and
> > bitmaps and things to be stored/sync'd - so just exiting probably
> > isn't safe either.
> 
> I'm unsure about whether I caught the whole idea but I feel like we
> can't drop all the cleanup hooks since what if we want to do something
> else than "freeing memories"?  Or anything that the OS can't do for us
> but we want to try to do before the process quits.  If that operation
> hangs we'll probably face a similar hang issue.

Right; things like where the block code wants to ensure that
bitmaps/data structtures/buffers are saved out - you need to give them a
chance.

> Regarding pthread_cancel() - it will only work if the thread reaches
> cancellation points, right?  Then does it mean that it still cannot
> guarantee the thread will quit very soon and we still have chance that
> we'll wait forever when join()?  One major reason that the thread will
> be waiting should be the migration streams but AFAIU shutdown() (as
> Dave has mentioned) should solve this problem properly, then I'm
> unsuer how pthread_cancel() can help much comparing to shutdown()...

If I understand correctly, threads are always cancelable by default?

> My understanding (probably not correct, but I'll just speak loud...)
> is that we will never have a way to guarantee a process to quit
> cleanly all the time because there're really many things that can hang
> (I'm assuming we've already solved the MigrationState refcounting
> issue by this patch, so I'm assuming we're having a "hang QEMU" rather
> than crashing issue).  Then IMHO we could simply do whatever we can do
> to cleanup everything assuming no hang will happen, and if it really
> happens we use SIGKILL which will be the last thing we can do by which
> we might lost many things (e.g., unflushed caches in block layer) but
> we've tried our best.  For migration, it'll be (1) cancel (not
> pthread_cancel, but cancel the migration to trigger shutdown() on fds
> or whatever) (2) join thread (3) finalize/cleanup data structures.

Right so the question I think is just whether it's worth adding the join
at the moment which turns us from a small-risk of crash to small-risk of
hang.

> IMHO SIGKILL is the real de-facto solution for all these issues to me.
> And even if with SIGKILL we can still hang.... but I'll assume those
> hangs will be kernel problems not QEMU since SIGKILL should be
> designed to kill a process without a question.

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu March 4, 2019, 7:52 a.m. UTC | #9
On Fri, Mar 01, 2019 at 10:11:42AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Feb 28, 2019 at 12:28:22PM +0000, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Thu, Feb 28, 2019 at 11:40:19AM +0000, Dr. David Alan Gilbert wrote:
> > > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > > On Wed, Feb 27, 2019 at 04:49:00PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > 
> > > > > > > Currently we cleanup the migration object as we exit main after the
> > > > > > > main_loop finishes; however if there's a migration running things
> > > > > > > get messy and we can end up with the migration thread still trying
> > > > > > > to access freed structures.
> > > > > > > 
> > > > > > > We now take a ref to the object around the migration thread itself,
> > > > > > > so the act of dropping the ref during exit doesn't cause us to lose
> > > > > > > the state until the thread quits.
> > > > > > > 
> > > > > > > Cancelling the migration during migration also tries to get the thread
> > > > > > > to quit.
> > > > > > > 
> > > > > > > We do this a bit earlier; so hopefully migration gets out of the way
> > > > > > > before all the devices etc are freed.
> > > > > > 
> > > > > > So does it mean that even with the patch it's still possible the
> > > > > > migration thread will be accessing device structs that have already
> > > > > > been freed which can still crash QEMU?
> > > > > 
> > > > > Possibly yes; I'm not sure how to go to the next stage and stop that
> > > > > case; the consensus seems to be we don't want to explicitly block
> > > > > during the exit process, so doing a join on the migration thread doesn't
> > > > > seem to be wanted.
> > > > 
> > > > Essentially the many  *_cleanup() calls at the end of main() in vl.c
> > > > are only ever safe if all background threads have stopped using the
> > > > resources that are being freed. This isn't the case with migration
> > > > currently. I also worry about other threads that might be running
> > > > in QEMU, SPICE in particular as it touchs many device backends.
> > > > 
> > > > Aborting the migration & joining the migration threads is the natural
> > > > way to address this. Cancelling appears to require the main loop to
> > > > still be running, so would require main_loop_should_exit() to issue
> > > > the cancel & return false unless it has completed. This would delay
> > > > shutdown for as long as it takes migration to abort.
> > > 
> > > ish - cancelling performs a shutdown(2) on the fd, that should be enough
> > > in most cases to kick the migration thread into falling out without
> > > main loop interaction; I think...
> > 
> > Dave, could you hint me on when shutdown() will not suffice (say,
> > we'll hang death if we do join() upon the migration thread)?
> 
> I think I mostly worry about when we hang in a device's migration code
> or where that interacts with an external program (e.g. vhost-user).

I see.

> 
> > > 
> > > > FWIW, even the bdrv_close_all() call during shutdown can delay things
> > > > for a considerable time if draining the backends isn't a quick op,
> > > > which could be the case if there are storage problems (blocked local
> > > > I/O, or interrupted network - rbd/ceph/nfs clients). So I'm not sure
> > > > that stopping migration is inherantly worse than what's already
> > > > possible with block cleanup, in terms of delaying things.
> > > > 
> > > > A slightly more hacky approach would be to pthread_cancel() all the
> > > > migration threads. Normally we'd never use pthread_cancel() as it
> > > > is incredibly hard to ensure all memory used by the threads is
> > > > freed. Since we're about to exit the process though, this cleanup
> > > > does not matter. The risk, however, is that one of the threads is
> > > > holding a mutex which then blocks the rest of the *cleanup functions,
> > > > so this still feels dangerous.
> > > > 
> > > > Overall to me a clean cancel & join of the migration threads feels
> > > > like the only safe option, unless we just remove all notion of
> > > > cleanup from QEMU & delete all the _cleanup functions in main()
> > > > and let the OS free all memory.
> > > 
> > > That's actually my preference; I think trying to do clean tidy ups
> > > here is very racy and doesn't gain much.  However, for things like
> > > storage there may be locks that have to be properly dropped and
> > > bitmaps and things to be stored/sync'd - so just exiting probably
> > > isn't safe either.
> > 
> > I'm unsure about whether I caught the whole idea but I feel like we
> > can't drop all the cleanup hooks since what if we want to do something
> > else than "freeing memories"?  Or anything that the OS can't do for us
> > but we want to try to do before the process quits.  If that operation
> > hangs we'll probably face a similar hang issue.
> 
> Right; things like where the block code wants to ensure that
> bitmaps/data structtures/buffers are saved out - you need to give them a
> chance.
> 
> > Regarding pthread_cancel() - it will only work if the thread reaches
> > cancellation points, right?  Then does it mean that it still cannot
> > guarantee the thread will quit very soon and we still have chance that
> > we'll wait forever when join()?  One major reason that the thread will
> > be waiting should be the migration streams but AFAIU shutdown() (as
> > Dave has mentioned) should solve this problem properly, then I'm
> > unsuer how pthread_cancel() can help much comparing to shutdown()...
> 
> If I understand correctly, threads are always cancelable by default?

I never used that, but I'm reading man pthread_setcancelstate and the
default cancel state/type for pthreads should be PTHREAD_CANCEL_ENABLE
& PTHREAD_CANCEL_DEFERRED, which IIUC means that a process can be
cancelled but in a deferred fashion (until the cancellation points,
which is listed in "man pthread", section "Cancellation points").
There is an PTHREAD_CANCEL_ASYNCHRONOUS comparing to
PTHREAD_CANCEL_DEFERRED which seems to allow the thread to respond to
the cancel request faster but still not guaranteed:

       PTHREAD_CANCEL_ASYNCHRONOUS
              The thread can be canceled at any time.  (Typically, it
              will be canceled immediately upon receiving a
              cancellation request, but the system doesn't guarantee
              this.)

I took a quick glance on how it's implemented in glibc and it should
be using a SIGTIMER to wake up the thread, and that explains why it's
not guaranteed.  As a conclusion, there seems to have no way to
guarantee that we can cancel a thread immediately.

> 
> > My understanding (probably not correct, but I'll just speak loud...)
> > is that we will never have a way to guarantee a process to quit
> > cleanly all the time because there're really many things that can hang
> > (I'm assuming we've already solved the MigrationState refcounting
> > issue by this patch, so I'm assuming we're having a "hang QEMU" rather
> > than crashing issue).  Then IMHO we could simply do whatever we can do
> > to cleanup everything assuming no hang will happen, and if it really
> > happens we use SIGKILL which will be the last thing we can do by which
> > we might lost many things (e.g., unflushed caches in block layer) but
> > we've tried our best.  For migration, it'll be (1) cancel (not
> > pthread_cancel, but cancel the migration to trigger shutdown() on fds
> > or whatever) (2) join thread (3) finalize/cleanup data structures.
> 
> Right so the question I think is just whether it's worth adding the join
> at the moment which turns us from a small-risk of crash to small-risk of
> hang.

Sorry to be unclear on this... I would see the discussion was about
the next step on how we should proceed (especially we've had thought
about removing all the cleanup hooks which seem hard to achieve from
the first glance).  This patch itself was well tested and it
guarantees to unbreak people so I would assume it's good.

Thanks,
diff mbox series

Patch

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 0471e04d1f..6f9df74436 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -36,7 +36,7 @@  void dump_vmstate_json_to_file(FILE *out_fp);
 
 /* migration/migration.c */
 void migration_object_init(void);
-void migration_object_finalize(void);
+void migration_shutdown(void);
 void qemu_start_incoming_migration(const char *uri, Error **errp);
 bool migration_is_idle(void);
 void add_migration_state_change_notifier(Notifier *notify);
diff --git a/migration/migration.c b/migration/migration.c
index e44f77af02..d45561f9b8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -126,6 +126,7 @@  static bool migration_object_check(MigrationState *ms, Error **errp);
 static int migration_maybe_pause(MigrationState *s,
                                  int *current_active_state,
                                  int new_state);
+static void migrate_fd_cancel(MigrationState *s);
 
 void migration_object_init(void)
 {
@@ -167,8 +168,13 @@  void migration_object_init(void)
     }
 }
 
-void migration_object_finalize(void)
+void migration_shutdown(void)
 {
+    /*
+     * Cancel the current migration - that will (eventually)
+     * stop the migration using this structure
+     */
+    migrate_fd_cancel(current_migration);
     object_unref(OBJECT(current_migration));
 }
 
@@ -3134,6 +3140,7 @@  static void *migration_thread(void *opaque)
 
     rcu_register_thread();
 
+    object_ref(OBJECT(s));
     s->iteration_start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
     qemu_savevm_state_header(s->to_dst_file);
@@ -3230,6 +3237,7 @@  static void *migration_thread(void *opaque)
 
     trace_migration_thread_after_loop();
     migration_iteration_finish(s);
+    object_unref(OBJECT(s));
     rcu_unregister_thread();
     return NULL;
 }
diff --git a/vl.c b/vl.c
index 2f340686a7..e7f460a114 100644
--- a/vl.c
+++ b/vl.c
@@ -4579,6 +4579,12 @@  int main(int argc, char **argv, char **envp)
 
     gdbserver_cleanup();
 
+    /*
+     * cleaning up the migration object cancels any existing migration
+     * try to do this early so that it also stops using devices.
+     */
+    migration_shutdown();
+
     /* No more vcpu or device emulation activity beyond this point */
     vm_shutdown();
 
@@ -4594,7 +4600,6 @@  int main(int argc, char **argv, char **envp)
     monitor_cleanup();
     qemu_chr_cleanup();
     user_creatable_cleanup();
-    migration_object_finalize();
     /* TODO: unref root container, check all devices are ok */
 
     return 0;