diff mbox series

[v6,07/36] migration: postcopy_ram_listen_thread() should take BQL for some calls

Message ID 24a7412cc151f8b48d74cd170a3bdc1ce8e6c879.1741124640.git.maciej.szmigiero@oracle.com (mailing list archive)
State New
Headers show
Series Multifd | expand

Commit Message

Maciej S. Szmigiero March 4, 2025, 10:03 p.m. UTC
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

All callers to migration_incoming_state_destroy() other than
postcopy_ram_listen_thread() do this call with BQL held.

Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
to always call that function under BQL rather than to have it deal with
both cases (with BQL and without BQL).
Add the necessary bql_lock() and bql_unlock() to
postcopy_ram_listen_thread().

qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
"load_state" SaveVMHandlers that are expecting BQL to be held.

In principle, the only devices that should be arriving on migration
channel serviced by postcopy_ram_listen_thread() are those that are
postcopiable and whose load handlers are safe to be called without BQL
being held.

But nothing currently prevents the source from sending data for "unsafe"
devices which would cause trouble there.
Add a TODO comment there so it's clear that it would be good to improve
handling of such (erroneous) case in the future.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
 migration/migration.c | 16 ++++++++++++++++
 migration/savevm.c    |  4 ++++
 2 files changed, 20 insertions(+)

Comments

Peter Xu March 5, 2025, 12:34 p.m. UTC | #1
On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> All callers to migration_incoming_state_destroy() other than
> postcopy_ram_listen_thread() do this call with BQL held.
> 
> Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
> SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
> to always call that function under BQL rather than to have it deal with
> both cases (with BQL and without BQL).
> Add the necessary bql_lock() and bql_unlock() to
> postcopy_ram_listen_thread().
> 
> qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
> "load_state" SaveVMHandlers that are expecting BQL to be held.
> 
> In principle, the only devices that should be arriving on migration
> channel serviced by postcopy_ram_listen_thread() are those that are
> postcopiable and whose load handlers are safe to be called without BQL
> being held.
> 
> But nothing currently prevents the source from sending data for "unsafe"
> devices which would cause trouble there.
> Add a TODO comment there so it's clear that it would be good to improve
> handling of such (erroneous) case in the future.
> 
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> ---
>  migration/migration.c | 16 ++++++++++++++++
>  migration/savevm.c    |  4 ++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9e9db26667f1..6b2a8af4231d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
>      struct MigrationIncomingState *mis = migration_incoming_get_current();
>  
>      multifd_recv_cleanup();
> +
>      /*
>       * RAM state cleanup needs to happen after multifd cleanup, because
>       * multifd threads can use some of its states (receivedmap).
> +     *
> +     * This call also needs BQL held since it calls all registered
> +     * load_cleanup SaveVMHandlers and at least the VFIO implementation is
> +     * BQL-sensitive.
> +     *
> +     * In addition to the above, it also performs cleanup of load threads
> +     * thread pool.
> +     * This cleanup operation is BQL-sensitive as it requires unlocking BQL
> +     * so a thread possibly waiting for it could get unblocked and finally
> +     * exit.
> +     * The reason why a load thread may need to hold BQL in the first place
> +     * is because address space modification operations require it.

Hold on...

This almost says exactly why load_cleanup() should _not_ take BQL... rather
than should..

So I had a closer look at the latest code, it's about this:

static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
{
    /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
    bql_unlock();
    WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
        while (multifd->load_bufs_thread_running) {
            multifd->load_bufs_thread_want_exit = true;

            qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
            qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
            qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
                           &multifd->load_bufs_mutex);
        }
    }
    bql_lock();
}

It doesn't make much sense to me to take it only because we want to drop it
unconditionally. Can we guarantee the function not taking BQL instead?  I
had a quick look on pmem's pmem_persist() (from libpmem, qemu_ram_msync <-
qemu_ram_block_writeback <- ram_load_cleanup), it looks ok.

So the question is, is it safe to unlock BQL in whatever context (in
coroutines, or in a bottom half)?

If the answer is yes, we could make migration_incoming_state_destroy()
always not taking BQL (and assert(!bql_locked()) instead).

If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
version may not work either..

> +     *
> +     * Check proper BQL state here rather than risk possible deadlock later.
>       */
> +    assert(bql_locked());
>      qemu_loadvm_state_cleanup();
>  
>      if (mis->to_src_file) {
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7c1aa8ad7b9d..3e86b572cfa8 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1986,6 +1986,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       * in qemu_file, and thus we must be blocking now.
>       */
>      qemu_file_set_blocking(f, true);
> +
> +    /* TODO: sanity check that only postcopiable data will be loaded here */
>      load_res = qemu_loadvm_state_main(f, mis);
>  
>      /*
> @@ -2046,7 +2048,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       * (If something broke then qemu will have to exit anyway since it's
>       * got a bad migration state).
>       */
> +    bql_lock();
>      migration_incoming_state_destroy();
> +    bql_unlock();
>  
>      rcu_unregister_thread();
>      mis->have_listen_thread = false;
>
Maciej S. Szmigiero March 5, 2025, 3:11 p.m. UTC | #2
On 5.03.2025 13:34, Peter Xu wrote:
> On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> All callers to migration_incoming_state_destroy() other than
>> postcopy_ram_listen_thread() do this call with BQL held.
>>
>> Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
>> SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
>> to always call that function under BQL rather than to have it deal with
>> both cases (with BQL and without BQL).
>> Add the necessary bql_lock() and bql_unlock() to
>> postcopy_ram_listen_thread().
>>
>> qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
>> "load_state" SaveVMHandlers that are expecting BQL to be held.
>>
>> In principle, the only devices that should be arriving on migration
>> channel serviced by postcopy_ram_listen_thread() are those that are
>> postcopiable and whose load handlers are safe to be called without BQL
>> being held.
>>
>> But nothing currently prevents the source from sending data for "unsafe"
>> devices which would cause trouble there.
>> Add a TODO comment there so it's clear that it would be good to improve
>> handling of such (erroneous) case in the future.
>>
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>> ---
>>   migration/migration.c | 16 ++++++++++++++++
>>   migration/savevm.c    |  4 ++++
>>   2 files changed, 20 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9e9db26667f1..6b2a8af4231d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
>>       struct MigrationIncomingState *mis = migration_incoming_get_current();
>>   
>>       multifd_recv_cleanup();
>> +
>>       /*
>>        * RAM state cleanup needs to happen after multifd cleanup, because
>>        * multifd threads can use some of its states (receivedmap).
>> +     *
>> +     * This call also needs BQL held since it calls all registered
>> +     * load_cleanup SaveVMHandlers and at least the VFIO implementation is
>> +     * BQL-sensitive.
>> +     *
>> +     * In addition to the above, it also performs cleanup of load threads
>> +     * thread pool.
>> +     * This cleanup operation is BQL-sensitive as it requires unlocking BQL
>> +     * so a thread possibly waiting for it could get unblocked and finally
>> +     * exit.
>> +     * The reason why a load thread may need to hold BQL in the first place
>> +     * is because address space modification operations require it.
> 
> Hold on...
> 
> This almost says exactly why load_cleanup() should _not_ take BQL... rather
> than should..
> 
> So I had a closer look at the latest code, it's about this:
> 
> static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
> {
>      /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
>      bql_unlock();
>      WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
>          while (multifd->load_bufs_thread_running) {
>              multifd->load_bufs_thread_want_exit = true;
> 
>              qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
>              qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
>              qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
>                             &multifd->load_bufs_mutex);
>          }
>      }
>      bql_lock();
> }
> 
> It doesn't make much sense to me to take it only because we want to drop it
> unconditionally. Can we guarantee the function not taking BQL instead?  I
> had a quick look on pmem's pmem_persist() (from libpmem, qemu_ram_msync <-
> qemu_ram_block_writeback <- ram_load_cleanup), it looks ok.
>
> So the question is, is it safe to unlock BQL in whatever context (in
> coroutines, or in a bottom half)?
> 
> If the answer is yes, we could make migration_incoming_state_destroy()
> always not taking BQL (and assert(!bql_locked()) instead).

All the other callers of migration_incoming_state_destroy() are holding BQL:
process_incoming_migration_bh(), process_incoming_migration_co() (called on,
failure path only), load_snapshot() and qmp_xen_load_devices_state().

So AFAIK the safer way is to standardize on holding BQL when calling
that function.
  
> If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
> version may not work either..

I think the reason for BQL is to serialize access to the QEMU internals
which are not thread-safe.

So as long as these internals aren't touched when not holding BQL then
we should be safe - I don't see any particular state that's cached
around these BQL calls and would need explicit reloading after re-gaining
it.

Thanks,
Maciej
Peter Xu March 5, 2025, 4:15 p.m. UTC | #3
On Wed, Mar 05, 2025 at 04:11:30PM +0100, Maciej S. Szmigiero wrote:
> On 5.03.2025 13:34, Peter Xu wrote:
> > On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> > > 
> > > All callers to migration_incoming_state_destroy() other than
> > > postcopy_ram_listen_thread() do this call with BQL held.
> > > 
> > > Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
> > > SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
> > > to always call that function under BQL rather than to have it deal with
> > > both cases (with BQL and without BQL).
> > > Add the necessary bql_lock() and bql_unlock() to
> > > postcopy_ram_listen_thread().
> > > 
> > > qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
> > > "load_state" SaveVMHandlers that are expecting BQL to be held.
> > > 
> > > In principle, the only devices that should be arriving on migration
> > > channel serviced by postcopy_ram_listen_thread() are those that are
> > > postcopiable and whose load handlers are safe to be called without BQL
> > > being held.
> > > 
> > > But nothing currently prevents the source from sending data for "unsafe"
> > > devices which would cause trouble there.
> > > Add a TODO comment there so it's clear that it would be good to improve
> > > handling of such (erroneous) case in the future.
> > > 
> > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> > > ---
> > >   migration/migration.c | 16 ++++++++++++++++
> > >   migration/savevm.c    |  4 ++++
> > >   2 files changed, 20 insertions(+)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 9e9db26667f1..6b2a8af4231d 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
> > >       struct MigrationIncomingState *mis = migration_incoming_get_current();
> > >       multifd_recv_cleanup();
> > > +
> > >       /*
> > >        * RAM state cleanup needs to happen after multifd cleanup, because
> > >        * multifd threads can use some of its states (receivedmap).
> > > +     *
> > > +     * This call also needs BQL held since it calls all registered
> > > +     * load_cleanup SaveVMHandlers and at least the VFIO implementation is
> > > +     * BQL-sensitive.
> > > +     *
> > > +     * In addition to the above, it also performs cleanup of load threads
> > > +     * thread pool.
> > > +     * This cleanup operation is BQL-sensitive as it requires unlocking BQL
> > > +     * so a thread possibly waiting for it could get unblocked and finally
> > > +     * exit.
> > > +     * The reason why a load thread may need to hold BQL in the first place
> > > +     * is because address space modification operations require it.
> > 
> > Hold on...
> > 
> > This almost says exactly why load_cleanup() should _not_ take BQL... rather
> > than should..
> > 
> > So I had a closer look at the latest code, it's about this:
> > 
> > static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
> > {
> >      /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
> >      bql_unlock();
> >      WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
> >          while (multifd->load_bufs_thread_running) {
> >              multifd->load_bufs_thread_want_exit = true;
> > 
> >              qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
> >              qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
> >              qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
> >                             &multifd->load_bufs_mutex);
> >          }
> >      }
> >      bql_lock();
> > }
> > 
> > It doesn't make much sense to me to take it only because we want to drop it
> > unconditionally. Can we guarantee the function not taking BQL instead?  I
> > had a quick look on pmem's pmem_persist() (from libpmem, qemu_ram_msync <-
> > qemu_ram_block_writeback <- ram_load_cleanup), it looks ok.
> > 
> > So the question is, is it safe to unlock BQL in whatever context (in
> > coroutines, or in a bottom half)?
> > 
> > If the answer is yes, we could make migration_incoming_state_destroy()
> > always not taking BQL (and assert(!bql_locked()) instead).
> 
> All the other callers of migration_incoming_state_destroy() are holding BQL:
> process_incoming_migration_bh(), process_incoming_migration_co() (called on,
> failure path only), load_snapshot() and qmp_xen_load_devices_state().
> 
> So AFAIK the safer way is to standardize on holding BQL when calling
> that function.
> > If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
> > version may not work either..
> 
> I think the reason for BQL is to serialize access to the QEMU internals
> which are not thread-safe.
> 
> So as long as these internals aren't touched when not holding BQL then
> we should be safe - I don't see any particular state that's cached
> around these BQL calls and would need explicit reloading after re-gaining
> it.

OK, I checked with misterious force and looks like it's ok.

Would you please rephrase the comment, though?  I want to make it crystal
clear that what we're looking for is not holding BQL.. Maybe something like
this:

    /*
     * The VFIO load_cleanup() implementation is BQL-sensitive. It requires
     * BQL must NOT be taken when recycling load threads, so that it won't
     * block the load threads from making progress on address space
     * modification operations.
     *
     * To make it work, we could try to not take BQL for all load_cleanup(),
     * or conditionally unlock BQL only if bql_locked() in VFIO.
     *
     * Since most existing call sites take BQL for load_cleanup(), make
     * it simple by taking BQL always as the rule, so that VFIO can unlock
     * BQL and retake unconditionally.
     */

We may also want to update the subject.  Currently:

  "migration: postcopy_ram_listen_thread() should take BQL for some calls"

It's not accurate anymore, it could be:

  "migration: Always take BQL for migration_incoming_state_destroy()"

If with all above, please feel free to take:

Acked-by: Peter Xu <peterx@redhat.com>

I'm OK if it'll be touched up when merge too.

Thanks,
Cédric Le Goater March 5, 2025, 4:37 p.m. UTC | #4
On 3/5/25 17:15, Peter Xu wrote:
> On Wed, Mar 05, 2025 at 04:11:30PM +0100, Maciej S. Szmigiero wrote:
>> On 5.03.2025 13:34, Peter Xu wrote:
>>> On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>
>>>> All callers to migration_incoming_state_destroy() other than
>>>> postcopy_ram_listen_thread() do this call with BQL held.
>>>>
>>>> Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
>>>> SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
>>>> to always call that function under BQL rather than to have it deal with
>>>> both cases (with BQL and without BQL).
>>>> Add the necessary bql_lock() and bql_unlock() to
>>>> postcopy_ram_listen_thread().
>>>>
>>>> qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
>>>> "load_state" SaveVMHandlers that are expecting BQL to be held.
>>>>
>>>> In principle, the only devices that should be arriving on migration
>>>> channel serviced by postcopy_ram_listen_thread() are those that are
>>>> postcopiable and whose load handlers are safe to be called without BQL
>>>> being held.
>>>>
>>>> But nothing currently prevents the source from sending data for "unsafe"
>>>> devices which would cause trouble there.
>>>> Add a TODO comment there so it's clear that it would be good to improve
>>>> handling of such (erroneous) case in the future.
>>>>
>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>>> ---
>>>>    migration/migration.c | 16 ++++++++++++++++
>>>>    migration/savevm.c    |  4 ++++
>>>>    2 files changed, 20 insertions(+)
>>>>
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 9e9db26667f1..6b2a8af4231d 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
>>>>        struct MigrationIncomingState *mis = migration_incoming_get_current();
>>>>        multifd_recv_cleanup();
>>>> +
>>>>        /*
>>>>         * RAM state cleanup needs to happen after multifd cleanup, because
>>>>         * multifd threads can use some of its states (receivedmap).
>>>> +     *
>>>> +     * This call also needs BQL held since it calls all registered
>>>> +     * load_cleanup SaveVMHandlers and at least the VFIO implementation is
>>>> +     * BQL-sensitive.
>>>> +     *
>>>> +     * In addition to the above, it also performs cleanup of load threads
>>>> +     * thread pool.
>>>> +     * This cleanup operation is BQL-sensitive as it requires unlocking BQL
>>>> +     * so a thread possibly waiting for it could get unblocked and finally
>>>> +     * exit.
>>>> +     * The reason why a load thread may need to hold BQL in the first place
>>>> +     * is because address space modification operations require it.
>>>
>>> Hold on...
>>>
>>> This almost says exactly why load_cleanup() should _not_ take BQL... rather
>>> than should..
>>>
>>> So I had a closer look at the latest code, it's about this:
>>>
>>> static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
>>> {
>>>       /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
>>>       bql_unlock();
>>>       WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
>>>           while (multifd->load_bufs_thread_running) {
>>>               multifd->load_bufs_thread_want_exit = true;
>>>
>>>               qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
>>>               qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
>>>               qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
>>>                              &multifd->load_bufs_mutex);
>>>           }
>>>       }
>>>       bql_lock();
>>> }
>>>
>>> It doesn't make much sense to me to take it only because we want to drop it
>>> unconditionally. Can we guarantee the function not taking BQL instead?  I
>>> had a quick look on pmem's pmem_persist() (from libpmem, qemu_ram_msync <-
>>> qemu_ram_block_writeback <- ram_load_cleanup), it looks ok.
>>>
>>> So the question is, is it safe to unlock BQL in whatever context (in
>>> coroutines, or in a bottom half)?
>>>
>>> If the answer is yes, we could make migration_incoming_state_destroy()
>>> always not taking BQL (and assert(!bql_locked()) instead).
>>
>> All the other callers of migration_incoming_state_destroy() are holding BQL:
>> process_incoming_migration_bh(), process_incoming_migration_co() (called on,
>> failure path only), load_snapshot() and qmp_xen_load_devices_state().
>>
>> So AFAIK the safer way is to standardize on holding BQL when calling
>> that function.
>>> If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
>>> version may not work either..
>>
>> I think the reason for BQL is to serialize access to the QEMU internals
>> which are not thread-safe.
>>
>> So as long as these internals aren't touched when not holding BQL then
>> we should be safe - I don't see any particular state that's cached
>> around these BQL calls and would need explicit reloading after re-gaining
>> it.
> 
> OK, I checked with misterious force and looks like it's ok.
> 
> Would you please rephrase the comment, though?  I want to make it crystal
> clear that what we're looking for is not holding BQL.. Maybe something like
> this:
> 
>      /*
>       * The VFIO load_cleanup() implementation is BQL-sensitive. It requires
>       * BQL must NOT be taken when recycling load threads, so that it won't
>       * block the load threads from making progress on address space
>       * modification operations.
>       *
>       * To make it work, we could try to not take BQL for all load_cleanup(),
>       * or conditionally unlock BQL only if bql_locked() in VFIO.
>       *
>       * Since most existing call sites take BQL for load_cleanup(), make
>       * it simple by taking BQL always as the rule, so that VFIO can unlock
>       * BQL and retake unconditionally.
>       */
> 
> We may also want to update the subject.  Currently:
> 
>    "migration: postcopy_ram_listen_thread() should take BQL for some calls"
> 
> It's not accurate anymore, it could be:
> 
>    "migration: Always take BQL for migration_incoming_state_destroy()"
> 
> If with all above, please feel free to take:
> 
> Acked-by: Peter Xu <peterx@redhat.com>
> 
> I'm OK if it'll be touched up when merge too.

Maciej,

Could you please resend just that patch as a reply to the series ? No need to
resend the whole series.

Thanks,

C.
Maciej S. Szmigiero March 5, 2025, 4:49 p.m. UTC | #5
On 5.03.2025 17:37, Cédric Le Goater wrote:
> On 3/5/25 17:15, Peter Xu wrote:
>> On Wed, Mar 05, 2025 at 04:11:30PM +0100, Maciej S. Szmigiero wrote:
>>> On 5.03.2025 13:34, Peter Xu wrote:
>>>> On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
>>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>>
>>>>> All callers to migration_incoming_state_destroy() other than
>>>>> postcopy_ram_listen_thread() do this call with BQL held.
>>>>>
>>>>> Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
>>>>> SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
>>>>> to always call that function under BQL rather than to have it deal with
>>>>> both cases (with BQL and without BQL).
>>>>> Add the necessary bql_lock() and bql_unlock() to
>>>>> postcopy_ram_listen_thread().
>>>>>
>>>>> qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
>>>>> "load_state" SaveVMHandlers that are expecting BQL to be held.
>>>>>
>>>>> In principle, the only devices that should be arriving on migration
>>>>> channel serviced by postcopy_ram_listen_thread() are those that are
>>>>> postcopiable and whose load handlers are safe to be called without BQL
>>>>> being held.
>>>>>
>>>>> But nothing currently prevents the source from sending data for "unsafe"
>>>>> devices which would cause trouble there.
>>>>> Add a TODO comment there so it's clear that it would be good to improve
>>>>> handling of such (erroneous) case in the future.
>>>>>
>>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>>>> ---
>>>>>    migration/migration.c | 16 ++++++++++++++++
>>>>>    migration/savevm.c    |  4 ++++
>>>>>    2 files changed, 20 insertions(+)
>>>>>
>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>> index 9e9db26667f1..6b2a8af4231d 100644
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
>>>>>        struct MigrationIncomingState *mis = migration_incoming_get_current();
>>>>>        multifd_recv_cleanup();
>>>>> +
>>>>>        /*
>>>>>         * RAM state cleanup needs to happen after multifd cleanup, because
>>>>>         * multifd threads can use some of its states (receivedmap).
>>>>> +     *
>>>>> +     * This call also needs BQL held since it calls all registered
>>>>> +     * load_cleanup SaveVMHandlers and at least the VFIO implementation is
>>>>> +     * BQL-sensitive.
>>>>> +     *
>>>>> +     * In addition to the above, it also performs cleanup of load threads
>>>>> +     * thread pool.
>>>>> +     * This cleanup operation is BQL-sensitive as it requires unlocking BQL
>>>>> +     * so a thread possibly waiting for it could get unblocked and finally
>>>>> +     * exit.
>>>>> +     * The reason why a load thread may need to hold BQL in the first place
>>>>> +     * is because address space modification operations require it.
>>>>
>>>> Hold on...
>>>>
>>>> This almost says exactly why load_cleanup() should _not_ take BQL... rather
>>>> than should..
>>>>
>>>> So I had a closer look at the latest code, it's about this:
>>>>
>>>> static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
>>>> {
>>>>       /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
>>>>       bql_unlock();
>>>>       WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
>>>>           while (multifd->load_bufs_thread_running) {
>>>>               multifd->load_bufs_thread_want_exit = true;
>>>>
>>>>               qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
>>>>               qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
>>>>               qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
>>>>                              &multifd->load_bufs_mutex);
>>>>           }
>>>>       }
>>>>       bql_lock();
>>>> }
>>>>
>>>> It doesn't make much sense to me to take it only because we want to drop it
>>>> unconditionally. Can we guarantee the function not taking BQL instead?  I
>>>> had a quick look on pmem's pmem_persist() (from libpmem, qemu_ram_msync <-
>>>> qemu_ram_block_writeback <- ram_load_cleanup), it looks ok.
>>>>
>>>> So the question is, is it safe to unlock BQL in whatever context (in
>>>> coroutines, or in a bottom half)?
>>>>
>>>> If the answer is yes, we could make migration_incoming_state_destroy()
>>>> always not taking BQL (and assert(!bql_locked()) instead).
>>>
>>> All the other callers of migration_incoming_state_destroy() are holding BQL:
>>> process_incoming_migration_bh(), process_incoming_migration_co() (called on,
>>> failure path only), load_snapshot() and qmp_xen_load_devices_state().
>>>
>>> So AFAIK the safer way is to standardize on holding BQL when calling
>>> that function.
>>>> If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
>>>> version may not work either..
>>>
>>> I think the reason for BQL is to serialize access to the QEMU internals
>>> which are not thread-safe.
>>>
>>> So as long as these internals aren't touched when not holding BQL then
>>> we should be safe - I don't see any particular state that's cached
>>> around these BQL calls and would need explicit reloading after re-gaining
>>> it.
>>
>> OK, I checked with misterious force and looks like it's ok.
>>
>> Would you please rephrase the comment, though?  I want to make it crystal
>> clear that what we're looking for is not holding BQL.. Maybe something like
>> this:
>>
>>      /*
>>       * The VFIO load_cleanup() implementation is BQL-sensitive. It requires
>>       * BQL must NOT be taken when recycling load threads, so that it won't
>>       * block the load threads from making progress on address space
>>       * modification operations.
>>       *
>>       * To make it work, we could try to not take BQL for all load_cleanup(),
>>       * or conditionally unlock BQL only if bql_locked() in VFIO.
>>       *
>>       * Since most existing call sites take BQL for load_cleanup(), make
>>       * it simple by taking BQL always as the rule, so that VFIO can unlock
>>       * BQL and retake unconditionally.
>>       */
>>
>> We may also want to update the subject.  Currently:
>>
>>    "migration: postcopy_ram_listen_thread() should take BQL for some calls"
>>
>> It's not accurate anymore, it could be:
>>
>>    "migration: Always take BQL for migration_incoming_state_destroy()"
>>
>> If with all above, please feel free to take:
>>
>> Acked-by: Peter Xu <peterx@redhat.com>
>>
>> I'm OK if it'll be touched up when merge too.

@Peter: Updated the comment and patch subject.
Added your review tag.

> Maciej,
> 
> Could you please resend just that patch as a reply to the series ? No need to
> resend the whole series.

@Cedric:
I've sent the updated patch.

I also updated the tree at GitLab with collected review tags and
updated version of this patch.

> Thanks,
> 
> C.
> 
> 

Thanks,
Maciej
diff mbox series

Patch

diff --git a/migration/migration.c b/migration/migration.c
index 9e9db26667f1..6b2a8af4231d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -402,10 +402,26 @@  void migration_incoming_state_destroy(void)
     struct MigrationIncomingState *mis = migration_incoming_get_current();
 
     multifd_recv_cleanup();
+
     /*
      * RAM state cleanup needs to happen after multifd cleanup, because
      * multifd threads can use some of its states (receivedmap).
+     *
+     * This call also needs BQL held since it calls all registered
+     * load_cleanup SaveVMHandlers and at least the VFIO implementation is
+     * BQL-sensitive.
+     *
+     * In addition to the above, it also performs cleanup of load threads
+     * thread pool.
+     * This cleanup operation is BQL-sensitive as it requires unlocking BQL
+     * so a thread possibly waiting for it could get unblocked and finally
+     * exit.
+     * The reason why a load thread may need to hold BQL in the first place
+     * is because address space modification operations require it.
+     *
+     * Check proper BQL state here rather than risk possible deadlock later.
      */
+    assert(bql_locked());
     qemu_loadvm_state_cleanup();
 
     if (mis->to_src_file) {
diff --git a/migration/savevm.c b/migration/savevm.c
index 7c1aa8ad7b9d..3e86b572cfa8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1986,6 +1986,8 @@  static void *postcopy_ram_listen_thread(void *opaque)
      * in qemu_file, and thus we must be blocking now.
      */
     qemu_file_set_blocking(f, true);
+
+    /* TODO: sanity check that only postcopiable data will be loaded here */
     load_res = qemu_loadvm_state_main(f, mis);
 
     /*
@@ -2046,7 +2048,9 @@  static void *postcopy_ram_listen_thread(void *opaque)
      * (If something broke then qemu will have to exit anyway since it's
      * got a bad migration state).
      */
+    bql_lock();
     migration_incoming_state_destroy();
+    bql_unlock();
 
     rcu_unregister_thread();
     mis->have_listen_thread = false;