
[v2,04/20] qemu-file: We only call qemu_file_transferred_* on the sending side

Message ID 20230530183941.7223-5-quintela@redhat.com (mailing list archive)
State New, archived
Series Next round of migration atomic counters

Commit Message

Juan Quintela May 30, 2023, 6:39 p.m. UTC
Remove the increment of total_transferred in qemu_fill_buffer() and
add asserts to the qemu_file_transferred* functions.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/qemu-file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Peter Xu June 5, 2023, 6:18 p.m. UTC | #1
On Tue, May 30, 2023 at 08:39:25PM +0200, Juan Quintela wrote:
> Remove the increment of total_transferred in qemu_fill_buffer() and
> add asserts to the qemu_file_transferred* functions.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>

The read side accounting does look a bit weird and never caught my attention..

Maybe worth also touching the documentation of QEMUFile::total_transferred
to clarify what it accounts for?

Reviewed-by: Peter Xu <peterx@redhat.com>

Though when looking at the counters (I didn't follow every single
recent patch on this..), I found that reading the transferred value is
now actually more expensive - qemu_file_transferred() needs a flush,
and even the fast version, qemu_file_transferred_fast(), loops over
all pending iovs, of which there can be as many as MAX_IOV_SIZE==64.
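For concreteness, the two read paths look roughly like this (a sketch
mirroring the hunks at the bottom of this page; the "fast" version is
the one named qemu_file_transferred_noflush() there):

  /* No flush, but walks every iovec still queued in the file. */
  uint64_t qemu_file_transferred_noflush(QEMUFile *f)
  {
      uint64_t ret = f->total_transferred;
      int i;

      /* iovcnt can be as large as MAX_IOV_SIZE (64). */
      for (i = 0; i < f->iovcnt; i++) {
          ret += f->iov[i].iov_len;
      }
      return ret;
  }

  /* Does real I/O (flushes the queued iovecs) just to read a counter. */
  uint64_t qemu_file_transferred(QEMUFile *f)
  {
      qemu_fflush(f);
      return f->total_transferred;
  }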

To be explicit, I _think_ for each guest page we now need to flush...

  ram_save_iterate
    migration_rate_exceeded
      migration_transferred_bytes
        qemu_file_transferred

I hope I'm wrong..

Does it mean that perhaps we simply need "sent and put into the send
buffer" more than "what really got transferred"?  So I start to wonder
what the original purpose of this change was, and which one is better..
Juan Quintela June 13, 2023, 4:02 p.m. UTC | #2
Peter Xu <peterx@redhat.com> wrote:
> On Tue, May 30, 2023 at 08:39:25PM +0200, Juan Quintela wrote:
>> Remove the increment of total_transferred in qemu_fill_buffer() and
>> add asserts to the qemu_file_transferred* functions.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>
> The read side accounting does look a bit weird and never caught my attention..
>
> Maybe worth also touching the documentation of QEMUFile::total_transferred
> to clarify what it accounts for?
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> Though when looking at the counters (I didn't follow every single
> recent patch on this..), I found that reading the transferred value is
> now actually more expensive - qemu_file_transferred() needs a flush,
> and even the fast version, qemu_file_transferred_fast(), loops over
> all pending iovs, of which there can be as many as MAX_IOV_SIZE==64.
>
> To be explicit, I _think_ for each guest page we now need to flush...
>
>   ram_save_iterate
>     migration_rate_exceeded
>       migration_transferred_bytes
>         qemu_file_transferred
>
> I hope I'm wrong..

See patch 7:

diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index 79eea8d865..1696185694 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -62,7 +62,7 @@ uint64_t migration_transferred_bytes(QEMUFile *f)
 {
     uint64_t multifd = stat64_get(&mig_stats.multifd_bytes);
     uint64_t rdma = stat64_get(&mig_stats.rdma_bytes);
-    uint64_t qemu_file = qemu_file_transferred(f);
+    uint64_t qemu_file = stat64_get(&mig_stats.qemu_file_transferred);
 
     trace_migration_transferred_bytes(qemu_file, multifd, rdma);
     return qemu_file + multifd + rdma;
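(stat64_get() is, as far as I can tell, a plain atomic load on 64-bit
hosts, with a lock-based fallback on 32-bit ones, so the read side is
O(1) again.)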


> Does it mean that perhaps we simply need "sent and put into the send
> buffer" more than "what really got transferred"?  So I start to wonder
> what the original purpose of this change was, and which one is better..

That is basically what patches 5 and 6 do O:-)

The problem is arriving at something that is bisectable (for
correctness) and easy to review.

And yes, my choices can be different from the ones that you would make.

The other reasons for the small patches are:
a - sometimes I find a different test where things break, and have to
    bisect
b - small patches are much easier to rebase (which I am doing a lot)

Later, Juan.
Peter Xu June 14, 2023, 1:36 p.m. UTC | #3
On Tue, Jun 13, 2023 at 06:02:05PM +0200, Juan Quintela wrote:
> Peter Xu <peterx@redhat.com> wrote:
> > On Tue, May 30, 2023 at 08:39:25PM +0200, Juan Quintela wrote:
> >> Remove the increment of total_transferred in qemu_fill_buffer() and
> >> add asserts to the qemu_file_transferred* functions.
> >> 
> >> Signed-off-by: Juan Quintela <quintela@redhat.com>
> >
> > The read side accounting does look a bit weird and never caught my attention..
> >
> > Maybe worth also touching the documentation of QEMUFile::total_transferred
> > to clarify what it accounts for?
> >
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> >
> > Though when looking at the counters (I didn't follow every single
> > recent patch on this..), I found that reading the transferred value is
> > now actually more expensive - qemu_file_transferred() needs a flush,
> > and even the fast version, qemu_file_transferred_fast(), loops over
> > all pending iovs, of which there can be as many as MAX_IOV_SIZE==64.
> >
> > To be explicit, I _think_ for each guest page we now need to flush...
> >
> >   ram_save_iterate
> >     migration_rate_exceeded
> >       migration_transferred_bytes
> >         qemu_file_transferred
> >
> > I hope I'm wrong..
> 
> See patch 7:
> 
> diff --git a/migration/migration-stats.c b/migration/migration-stats.c
> index 79eea8d865..1696185694 100644
> --- a/migration/migration-stats.c
> +++ b/migration/migration-stats.c
> @@ -62,7 +62,7 @@ uint64_t migration_transferred_bytes(QEMUFile *f)
>  {
>      uint64_t multifd = stat64_get(&mig_stats.multifd_bytes);
>      uint64_t rdma = stat64_get(&mig_stats.rdma_bytes);
> -    uint64_t qemu_file = qemu_file_transferred(f);
> +    uint64_t qemu_file = stat64_get(&mig_stats.qemu_file_transferred);
>  
>      trace_migration_transferred_bytes(qemu_file, multifd, rdma);
>      return qemu_file + multifd + rdma;

If this is a known regression, should we make a first patchset to fix
it, and give it higher priority to merge?

It seems this is not even mentioned in the cover letter.. while IMHO
this is the most important bit to have in it..

> 
> 
> > Does it mean that perhaps we simply need "sent and put into the send
> > buffer" more than "what really got transferred"?  So I start to wonder
> > what the original purpose of this change was, and which one is better..
> 
> That is basically what patches 5 and 6 do O:-)
> 
> The problem is arriving at something that is bisectable (for
> correctness) and easy to review.
> 
> And yes, my choices can be different from the ones that you would make.
> 
> The other reasons for the small patches are:
> a - sometimes I find a different test where things break, and have to
>     bisect
> b - small patches are much easier to rebase (which I am doing a lot)

That's okay.  Thanks,
Juan Quintela June 21, 2023, 10:20 p.m. UTC | #4
Peter Xu <peterx@redhat.com> wrote:
> On Tue, Jun 13, 2023 at 06:02:05PM +0200, Juan Quintela wrote:
>> Peter Xu <peterx@redhat.com> wrote:
>> > On Tue, May 30, 2023 at 08:39:25PM +0200, Juan Quintela wrote:
>> >> Remove the increment of total_transferred in qemu_fill_buffer() and
>> >> add asserts to the qemu_file_transferred* functions.
>> >> 
>> >> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> >
>> > The read side accounting does look a bit weird and never caught my attention..
>> >
>> > Maybe worth also touching the documentation of QEMUFile::total_transferred
>> > to clarify what it accounts for?
>> >
>> > Reviewed-by: Peter Xu <peterx@redhat.com>
>> >
>> > Though when looking at the counters (I didn't follow every single
>> > recent patch on this..), I found that reading the transferred value is
>> > now actually more expensive - qemu_file_transferred() needs a flush,
>> > and even the fast version, qemu_file_transferred_fast(), loops over
>> > all pending iovs, of which there can be as many as MAX_IOV_SIZE==64.
>> >
>> > To be explicit, I _think_ for each guest page we now need to flush...
>> >
>> >   ram_save_iterate
>> >     migration_rate_exceeded
>> >       migration_transferred_bytes
>> >         qemu_file_transferred
>> >
>> > I hope I'm wrong..
>> 
>> See patch 7:
>> 
>> diff --git a/migration/migration-stats.c b/migration/migration-stats.c
>> index 79eea8d865..1696185694 100644
>> --- a/migration/migration-stats.c
>> +++ b/migration/migration-stats.c
>> @@ -62,7 +62,7 @@ uint64_t migration_transferred_bytes(QEMUFile *f)
>>  {
>>      uint64_t multifd = stat64_get(&mig_stats.multifd_bytes);
>>      uint64_t rdma = stat64_get(&mig_stats.rdma_bytes);
>> -    uint64_t qemu_file = qemu_file_transferred(f);
>> +    uint64_t qemu_file = stat64_get(&mig_stats.qemu_file_transferred);
>>  
>>      trace_migration_transferred_bytes(qemu_file, multifd, rdma);
>>      return qemu_file + multifd + rdma;
>
> If this is a known regression, should we make a first patchset to fix
> it, and give it higher priority to merge?

This is the simplest way that I have found to get from A to B.
The reason why I (and everybody before me) didn't want to get into the
atomic vars was that there were so many changing bits.

And here we are: moving to single counters instead of several of them
took something like 200 patches.

> It seems this is not even mentioned in the cover letter.. while IMHO
> this is the most important bit to have in it..

My fault.

I cc'd Fiona on v1, and she confirmed that this fixed her problem.

This is the commit that introduces the slowdown.

commit 813cd61669e45ee6d5db09a83d03df8f0c6eb5d2
Author: Juan Quintela <quintela@redhat.com>
Date:   Mon May 15 21:57:01 2023 +0200

    migration: Use migration_transferred_bytes() to calculate rate_limit
    
    Signed-off-by: Juan Quintela <quintela@redhat.com>
    Reviewed-by: Cédric Le Goater <clg@kaod.org>
    Message-Id: <20230515195709.63843-9-quintela@redhat.com>

Important bits:

-    uint64_t rate_limit_used = stat64_get(&mig_stats.rate_limit_used);
+    uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
+    uint64_t rate_limit_current = migration_transferred_bytes(f);
+    uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
     uint64_t rate_limit_max = stat64_get(&mig_stats.rate_limit_max);

We moved from reading an atomic to calling qemu_file_transferred(),
which does the iovec dance.
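Putting the hunk above in context, the check that now runs for every
guest page looks roughly like this (a sketch assembled from those
lines; the error and disabled-limit handling is omitted):

  bool migration_rate_exceeded(QEMUFile *f)
  {
      uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
      /* Ends up in qemu_file_transferred(), which flushes the file. */
      uint64_t rate_limit_current = migration_transferred_bytes(f);
      uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
      uint64_t rate_limit_max = stat64_get(&mig_stats.rate_limit_max);

      return rate_limit_used > rate_limit_max;
  }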

This commit (in this series):

commit 524072cb5f5ce5605f1171f86ba0879405e4b9b3
Author: Juan Quintela <quintela@redhat.com>
Date:   Mon May 8 12:16:47 2023 +0200

    migration: Use the number of transferred bytes directly
    
    We only use migration_transferred_bytes() to calculate the rate_limit;
    for that we don't need to flush whatever is in the qemu_file buffer.
    Remember that the buffer is really small (the normal case is 32K; with
    iovs it can be 64 * TARGET_PAGE_SIZE), so this is not relevant to the
    calculations.
    
    Signed-off-by: Juan Quintela <quintela@redhat.com>

diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index 79eea8d865..1696185694 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -62,7 +62,7 @@ uint64_t migration_transferred_bytes(QEMUFile *f)
 {
     uint64_t multifd = stat64_get(&mig_stats.multifd_bytes);
     uint64_t rdma = stat64_get(&mig_stats.rdma_bytes);
-    uint64_t qemu_file = qemu_file_transferred(f);
+    uint64_t qemu_file = stat64_get(&mig_stats.qemu_file_transferred);
 
     trace_migration_transferred_bytes(qemu_file, multifd, rdma);
     return qemu_file + multifd + rdma;

Undoes the damage.
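Once the write side does, at flush time, something like

    stat64_add(&mig_stats.qemu_file_transferred,
               iov_size(f->iov, f->iovcnt));

(a sketch - pinning down the exact placement is what patches 5 and 6
are about), migration_rate_exceeded() is back to a handful of atomic
reads per guest page: no flush, no iovec walk.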

And yes, before you ask, I got this wrong lots of times, and had to
rebase and change the order of patches several times O:-)

>> 
>> > Does it mean that perhaps we simply need "sent and put into the send
>> > buffer" more than "what really got transferred"?  So I start to wonder
>> > what the original purpose of this change was, and which one is better..
>> 
>> That is basically what patches 5 and 6 do O:-)
>> 
>> The problem is arriving at something that is bisectable (for
>> correctness) and easy to review.
>> 
>> And yes, my choices can be different from the ones that you would make.
>> 
>> The other reasons for the small patches are:
>> a - sometimes I find a different test where things break, and have to
>>     bisect
>> b - small patches are much easier to rebase (which I am doing a lot)
>
> That's okay.  Thanks,

Later, Juan.

Patch

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 7a5c1a5e32..be3dab85cb 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -342,7 +342,6 @@  static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
 
     if (len > 0) {
         f->buf_size += len;
-        f->total_transferred += len;
     } else if (len == 0) {
         qemu_file_set_error_obj(f, -EIO, local_error);
     } else {
@@ -632,6 +631,8 @@  uint64_t qemu_file_transferred_noflush(QEMUFile *f)
     uint64_t ret = f->total_transferred;
     int i;
 
+    g_assert(qemu_file_is_writable(f));
+
     for (i = 0; i < f->iovcnt; i++) {
         ret += f->iov[i].iov_len;
     }
@@ -641,6 +642,7 @@  uint64_t qemu_file_transferred_noflush(QEMUFile *f)
 
 uint64_t qemu_file_transferred(QEMUFile *f)
 {
+    g_assert(qemu_file_is_writable(f));
     qemu_fflush(f);
     return f->total_transferred;
 }