diff mbox

[QEMU,1/7] balloon: speed up inflating & deflating process

Message ID 1465813009-21390-2-git-send-email-liang.z.li@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Liang Li June 13, 2016, 10:16 a.m. UTC
The implementation of the current virtio-balloon is not very efficient,
Bellow is test result of time spends on inflating the balloon to 3GB of
a 4GB idle guest:

a. allocating pages (6.5%, 103ms)
b. sending PFNs to host (68.3%, 787ms)
c. address translation (6.1%, 96ms)
d. madvise (19%, 300ms)

It takes about 1577ms for the whole inflating process to complete. The
test shows that the bottle neck is the stage b and stage d.

If using a bitmap to send the page info instead of the PFNs, we can
reduce the overhead spends on stage b quite a lot. Furthermore, it's
possible to do the address translation and do the madvise with a bulk
of pages, instead of the current page per page way, so the overhead of
stage c and stage d can also be reduced a lot.

This patch is the QEMU side implementation which is intended to speed
up the inflating & deflating process by adding a new feature to the
virtio-balloon device. And now, inflating the balloon to 3GB of a 4GB
idle guest only takes 210ms, it's about 8 times as fast as before.

TODO: optimize stage a by allocating/freeing a chunk of pages instead
of a single page at a time.

Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 hw/virtio/virtio-balloon.c                      | 159 ++++++++++++++++++++----
 include/standard-headers/linux/virtio_balloon.h |   1 +
 2 files changed, 139 insertions(+), 21 deletions(-)

Comments

Thomas Huth June 14, 2016, 11:37 a.m. UTC | #1
On 13.06.2016 12:16, Liang Li wrote:
> The implementation of the current virtio-balloon is not very efficient,
> Bellow is test result of time spends on inflating the balloon to 3GB of
> a 4GB idle guest:
> 
> a. allocating pages (6.5%, 103ms)
> b. sending PFNs to host (68.3%, 787ms)
> c. address translation (6.1%, 96ms)
> d. madvise (19%, 300ms)
> 
> It takes about 1577ms for the whole inflating process to complete. The
> test shows that the bottle neck is the stage b and stage d.
> 
> If using a bitmap to send the page info instead of the PFNs, we can
> reduce the overhead spends on stage b quite a lot. Furthermore, it's
> possible to do the address translation and do the madvise with a bulk
> of pages, instead of the current page per page way, so the overhead of
> stage c and stage d can also be reduced a lot.
> 
> This patch is the QEMU side implementation which is intended to speed
> up the inflating & deflating process by adding a new feature to the
> virtio-balloon device. And now, inflating the balloon to 3GB of a 4GB
> idle guest only takes 210ms, it's about 8 times as fast as before.
> 
> TODO: optimize stage a by allocating/freeing a chunk of pages instead
> of a single page at a time.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> ---
>  hw/virtio/virtio-balloon.c                      | 159 ++++++++++++++++++++----
>  include/standard-headers/linux/virtio_balloon.h |   1 +
>  2 files changed, 139 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 8c15e09..8cf74c2 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -47,6 +47,76 @@ static void balloon_page(void *addr, int deflate)
>  #endif
>  }
>  
> +static void do_balloon_bulk_pages(ram_addr_t base_pfn, int page_shift,
> +                                  unsigned long len, bool deflate)
> +{
> +    ram_addr_t size, processed, chunk, base;
> +    void *addr;
> +    MemoryRegionSection section = {.mr = NULL};
> +
> +    size = (len << page_shift);
> +    base = (base_pfn << page_shift);
> +
> +    for (processed = 0; processed < size; processed += chunk) {
> +        chunk = size - processed;
> +        while (chunk >= TARGET_PAGE_SIZE) {
> +            section = memory_region_find(get_system_memory(),
> +                                         base + processed, chunk);
> +            if (!section.mr) {
> +                chunk = QEMU_ALIGN_DOWN(chunk / 2, TARGET_PAGE_SIZE);
> +            } else {
> +                break;
> +            }
> +        }
> +
> +        if (section.mr &&
> +            (int128_nz(section.size) && memory_region_is_ram(section.mr))) {
> +            addr = section.offset_within_region +
> +                   memory_region_get_ram_ptr(section.mr);
> +            qemu_madvise(addr, chunk,
> +                         deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> +        } else {
> +            fprintf(stderr, "can't find the chunk, skip\n");

Please try to avoid new fprintf(stderr, ...) in the QEMU sources.
Use error_report(...) or in this case maybe rather
qemu_log_mask(LOG_GUEST_ERROR, ...) instead, and try to use a more
reasonable error message (e.g. that it is clear that the error happened
in the balloon code).

> +            chunk = TARGET_PAGE_SIZE;
> +        }
> +    }
> +}
> +
> +static void balloon_bulk_pages(ram_addr_t base_pfn, unsigned long *bitmap,
> +                               unsigned long len, int page_shift, bool deflate)
> +{
> +#if defined(__linux__)

Why do you need this #if here?

> +    unsigned long end  = len * 8;
> +    unsigned long current = 0;
> +
> +    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
> +                                         kvm_has_sync_mmu())) {
> +        while (current < end) {
> +            unsigned long one = find_next_bit(bitmap, end, current);
> +
> +            if (one < end) {
> +                unsigned long zero = find_next_zero_bit(bitmap, end, one + 1);
> +                unsigned long page_length;
> +
> +                if (zero >= end) {
> +                    page_length = end - one;
> +                } else {
> +                    page_length = zero - one;
> +                }
> +
> +                if (page_length) {
> +                    do_balloon_bulk_pages(base_pfn + one, page_shift,
> +                                          page_length, deflate);
> +                }
> +                current = one + page_length;
> +            } else {
> +                current = one;
> +            }
> +        }
> +    }
> +#endif
> +}

 Thomas
Liang Li June 14, 2016, 2:22 p.m. UTC | #2
> Subject: Re: [QEMU 1/7] balloon: speed up inflating & deflating process

> 

> On 13.06.2016 12:16, Liang Li wrote:

> > The implementation of the current virtio-balloon is not very

> > efficient, Bellow is test result of time spends on inflating the

> > balloon to 3GB of a 4GB idle guest:

> >

> > a. allocating pages (6.5%, 103ms)

> > b. sending PFNs to host (68.3%, 787ms) c. address translation (6.1%,

> > 96ms) d. madvise (19%, 300ms)

> >

> > It takes about 1577ms for the whole inflating process to complete. The

> > test shows that the bottle neck is the stage b and stage d.

> >

> > If using a bitmap to send the page info instead of the PFNs, we can

> > reduce the overhead spends on stage b quite a lot. Furthermore, it's

> > possible to do the address translation and do the madvise with a bulk

> > of pages, instead of the current page per page way, so the overhead of

> > stage c and stage d can also be reduced a lot.

> >

> > This patch is the QEMU side implementation which is intended to speed

> > up the inflating & deflating process by adding a new feature to the

> > virtio-balloon device. And now, inflating the balloon to 3GB of a 4GB

> > idle guest only takes 210ms, it's about 8 times as fast as before.

> >

> > TODO: optimize stage a by allocating/freeing a chunk of pages instead

> > of a single page at a time.

> >

> > Signed-off-by: Liang Li <liang.z.li@intel.com>

> > ---

> >  hw/virtio/virtio-balloon.c                      | 159 ++++++++++++++++++++----

> >  include/standard-headers/linux/virtio_balloon.h |   1 +

> >  2 files changed, 139 insertions(+), 21 deletions(-)

> >

> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c

> > index 8c15e09..8cf74c2 100644

> > --- a/hw/virtio/virtio-balloon.c

> > +++ b/hw/virtio/virtio-balloon.c

> > @@ -47,6 +47,76 @@ static void balloon_page(void *addr, int deflate)

> > #endif  }

> >

> > +static void do_balloon_bulk_pages(ram_addr_t base_pfn, int page_shift,

> > +                                  unsigned long len, bool deflate) {

> > +    ram_addr_t size, processed, chunk, base;

> > +    void *addr;

> > +    MemoryRegionSection section = {.mr = NULL};

> > +

> > +    size = (len << page_shift);

> > +    base = (base_pfn << page_shift);

> > +

> > +    for (processed = 0; processed < size; processed += chunk) {

> > +        chunk = size - processed;

> > +        while (chunk >= TARGET_PAGE_SIZE) {

> > +            section = memory_region_find(get_system_memory(),

> > +                                         base + processed, chunk);

> > +            if (!section.mr) {

> > +                chunk = QEMU_ALIGN_DOWN(chunk / 2, TARGET_PAGE_SIZE);

> > +            } else {

> > +                break;

> > +            }

> > +        }

> > +

> > +        if (section.mr &&

> > +            (int128_nz(section.size) && memory_region_is_ram(section.mr))) {

> > +            addr = section.offset_within_region +

> > +                   memory_region_get_ram_ptr(section.mr);

> > +            qemu_madvise(addr, chunk,

> > +                         deflate ? QEMU_MADV_WILLNEED :

> QEMU_MADV_DONTNEED);

> > +        } else {

> > +            fprintf(stderr, "can't find the chunk, skip\n");

> 

> Please try to avoid new fprintf(stderr, ...) in the QEMU sources.

> Use error_report(...) or in this case maybe rather

> qemu_log_mask(LOG_GUEST_ERROR, ...) instead, and try to use a more

> reasonable error message (e.g. that it is clear that the error happened in the

> balloon code).

> 


Indeed, the error message is no good, will change in next version.

> > +            chunk = TARGET_PAGE_SIZE;

> > +        }

> > +    }

> > +}

> > +

> > +static void balloon_bulk_pages(ram_addr_t base_pfn, unsigned long

> *bitmap,

> > +                               unsigned long len, int page_shift,

> > +bool deflate) { #if defined(__linux__)

> 

> Why do you need this #if here?

> 


Ooh,  it is wrong to add the '#if' here, will remove.

Thanks a lot!

Liang
Liang Li June 14, 2016, 2:41 p.m. UTC | #3
> > On 13.06.2016 12:16, Liang Li wrote:

> > > The implementation of the current virtio-balloon is not very

> > > efficient, Bellow is test result of time spends on inflating the

> > > balloon to 3GB of a 4GB idle guest:

> > >

> > > a. allocating pages (6.5%, 103ms)

> > > b. sending PFNs to host (68.3%, 787ms) c. address translation (6.1%,

> > > 96ms) d. madvise (19%, 300ms)

> > >

> > > It takes about 1577ms for the whole inflating process to complete.

> > > The test shows that the bottle neck is the stage b and stage d.

> > >

> > > If using a bitmap to send the page info instead of the PFNs, we can

> > > reduce the overhead spends on stage b quite a lot. Furthermore, it's

> > > possible to do the address translation and do the madvise with a

> > > bulk of pages, instead of the current page per page way, so the

> > > overhead of stage c and stage d can also be reduced a lot.

> > >

> > > This patch is the QEMU side implementation which is intended to

> > > speed up the inflating & deflating process by adding a new feature

> > > to the virtio-balloon device. And now, inflating the balloon to 3GB

> > > of a 4GB idle guest only takes 210ms, it's about 8 times as fast as before.

> > >

> > > TODO: optimize stage a by allocating/freeing a chunk of pages

> > > instead of a single page at a time.

> > >

> > > Signed-off-by: Liang Li <liang.z.li@intel.com>

> > > ---

> > >  hw/virtio/virtio-balloon.c                      | 159 ++++++++++++++++++++----

> > >  include/standard-headers/linux/virtio_balloon.h |   1 +

> > >  2 files changed, 139 insertions(+), 21 deletions(-)

> > >

> > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c

> > > index 8c15e09..8cf74c2 100644

> > > --- a/hw/virtio/virtio-balloon.c

> > > +++ b/hw/virtio/virtio-balloon.c

> > > @@ -47,6 +47,76 @@ static void balloon_page(void *addr, int deflate)

> > > #endif  }

> > >

> > > +static void do_balloon_bulk_pages(ram_addr_t base_pfn, int

> page_shift,

> > > +                                  unsigned long len, bool deflate) {

> > > +    ram_addr_t size, processed, chunk, base;

> > > +    void *addr;

> > > +    MemoryRegionSection section = {.mr = NULL};

> > > +

> > > +    size = (len << page_shift);

> > > +    base = (base_pfn << page_shift);

> > > +

> > > +    for (processed = 0; processed < size; processed += chunk) {

> > > +        chunk = size - processed;

> > > +        while (chunk >= TARGET_PAGE_SIZE) {

> > > +            section = memory_region_find(get_system_memory(),

> > > +                                         base + processed, chunk);

> > > +            if (!section.mr) {

> > > +                chunk = QEMU_ALIGN_DOWN(chunk / 2, TARGET_PAGE_SIZE);

> > > +            } else {

> > > +                break;

> > > +            }

> > > +        }

> > > +

> > > +        if (section.mr &&

> > > +            (int128_nz(section.size) && memory_region_is_ram(section.mr)))

> {

> > > +            addr = section.offset_within_region +

> > > +                   memory_region_get_ram_ptr(section.mr);

> > > +            qemu_madvise(addr, chunk,

> > > +                         deflate ? QEMU_MADV_WILLNEED :

> > QEMU_MADV_DONTNEED);

> > > +        } else {

> > > +            fprintf(stderr, "can't find the chunk, skip\n");

> >

> > Please try to avoid new fprintf(stderr, ...) in the QEMU sources.

> > Use error_report(...) or in this case maybe rather

> > qemu_log_mask(LOG_GUEST_ERROR, ...) instead, and try to use a more

> > reasonable error message (e.g. that it is clear that the error

> > happened in the balloon code).

> >

> 

> Indeed, the error message is no good, will change in next version.

> 

> > > +            chunk = TARGET_PAGE_SIZE;

> > > +        }

> > > +    }

> > > +}

> > > +

> > > +static void balloon_bulk_pages(ram_addr_t base_pfn, unsigned long

> > *bitmap,

> > > +                               unsigned long len, int page_shift,

> > > +bool deflate) { #if defined(__linux__)

> >

> > Why do you need this #if here?

> >

> 

> Ooh,  it is wrong to add the '#if' here, will remove.

No, it is needed, just follow the code in balloon_page().
only Linux support the madvise().

Liang
Thomas Huth June 14, 2016, 3:33 p.m. UTC | #4
On 14.06.2016 16:41, Li, Liang Z wrote:
>>> On 13.06.2016 12:16, Liang Li wrote:
>>>> The implementation of the current virtio-balloon is not very
>>>> efficient, Bellow is test result of time spends on inflating the
>>>> balloon to 3GB of a 4GB idle guest:
>>>>
>>>> a. allocating pages (6.5%, 103ms)
>>>> b. sending PFNs to host (68.3%, 787ms) c. address translation (6.1%,
>>>> 96ms) d. madvise (19%, 300ms)
>>>>
>>>> It takes about 1577ms for the whole inflating process to complete.
>>>> The test shows that the bottle neck is the stage b and stage d.
>>>>
>>>> If using a bitmap to send the page info instead of the PFNs, we can
>>>> reduce the overhead spends on stage b quite a lot. Furthermore, it's
>>>> possible to do the address translation and do the madvise with a
>>>> bulk of pages, instead of the current page per page way, so the
>>>> overhead of stage c and stage d can also be reduced a lot.
>>>>
>>>> This patch is the QEMU side implementation which is intended to
>>>> speed up the inflating & deflating process by adding a new feature
>>>> to the virtio-balloon device. And now, inflating the balloon to 3GB
>>>> of a 4GB idle guest only takes 210ms, it's about 8 times as fast as before.
[...]
>>>> +            chunk = TARGET_PAGE_SIZE;
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +static void balloon_bulk_pages(ram_addr_t base_pfn, unsigned long
>>> *bitmap,
>>>> +                               unsigned long len, int page_shift,
>>>> +bool deflate) { #if defined(__linux__)
>>>
>>> Why do you need this #if here?
>>>
>> Ooh,  it is wrong to add the '#if' here, will remove.
>>
> No, it is needed, just follow the code in balloon_page().
> only Linux support the madvise().

I think it is not needed anymore today and the #if in balloon_page could
be removed, too: As far as I can see, the #if there is from the early
days, when there was no wrapper around madvise() yet. But nowadays,
we've got the qemu_madvise() wrapper which takes care of either using
madvise(), posix_madvise() or doing nothing, so the virtio-balloon code
should be able to work without the #if now.

 Thomas
Liang Li June 17, 2016, 12:54 a.m. UTC | #5
> >>>> +            chunk = TARGET_PAGE_SIZE;

> >>>> +        }

> >>>> +    }

> >>>> +}

> >>>> +

> >>>> +static void balloon_bulk_pages(ram_addr_t base_pfn, unsigned long

> >>> *bitmap,

> >>>> +                               unsigned long len, int page_shift,

> >>>> +bool deflate) { #if defined(__linux__)

> >>>

> >>> Why do you need this #if here?

> >>>

> >> Ooh,  it is wrong to add the '#if' here, will remove.

> >>

> > No, it is needed, just follow the code in balloon_page().

> > only Linux support the madvise().

> 

> I think it is not needed anymore today and the #if in balloon_page could be

> removed, too: As far as I can see, the #if there is from the early days, when

> there was no wrapper around madvise() yet. But nowadays, we've got the

> qemu_madvise() wrapper which takes care of either using madvise(),

> posix_madvise() or doing nothing, so the virtio-balloon code should be able

> to work without the #if now.

> 

>  Thomas


You are right! I will remove both of them.

Thanks!
Liang
Michael S. Tsirkin June 19, 2016, 4:12 a.m. UTC | #6
On Mon, Jun 13, 2016 at 06:16:43PM +0800, Liang Li wrote:
> The implementation of the current virtio-balloon is not very efficient,
> Bellow is test result of time spends on inflating the balloon to 3GB of
> a 4GB idle guest:
> 
> a. allocating pages (6.5%, 103ms)
> b. sending PFNs to host (68.3%, 787ms)
> c. address translation (6.1%, 96ms)
> d. madvise (19%, 300ms)
> 
> It takes about 1577ms for the whole inflating process to complete. The
> test shows that the bottle neck is the stage b and stage d.
> 
> If using a bitmap to send the page info instead of the PFNs, we can
> reduce the overhead spends on stage b quite a lot. Furthermore, it's
> possible to do the address translation and do the madvise with a bulk
> of pages, instead of the current page per page way, so the overhead of
> stage c and stage d can also be reduced a lot.
> 
> This patch is the QEMU side implementation which is intended to speed
> up the inflating & deflating process by adding a new feature to the
> virtio-balloon device. And now, inflating the balloon to 3GB of a 4GB
> idle guest only takes 210ms, it's about 8 times as fast as before.
> 
> TODO: optimize stage a by allocating/freeing a chunk of pages instead
> of a single page at a time.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> ---
>  hw/virtio/virtio-balloon.c                      | 159 ++++++++++++++++++++----
>  include/standard-headers/linux/virtio_balloon.h |   1 +
>  2 files changed, 139 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 8c15e09..8cf74c2 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -47,6 +47,76 @@ static void balloon_page(void *addr, int deflate)
>  #endif
>  }
>  
> +static void do_balloon_bulk_pages(ram_addr_t base_pfn, int page_shift,
> +                                  unsigned long len, bool deflate)
> +{
> +    ram_addr_t size, processed, chunk, base;
> +    void *addr;
> +    MemoryRegionSection section = {.mr = NULL};
> +
> +    size = (len << page_shift);
> +    base = (base_pfn << page_shift);
> +
> +    for (processed = 0; processed < size; processed += chunk) {
> +        chunk = size - processed;
> +        while (chunk >= TARGET_PAGE_SIZE) {
> +            section = memory_region_find(get_system_memory(),
> +                                         base + processed, chunk);
> +            if (!section.mr) {
> +                chunk = QEMU_ALIGN_DOWN(chunk / 2, TARGET_PAGE_SIZE);
> +            } else {
> +                break;
> +            }
> +        }
> +
> +        if (section.mr &&
> +            (int128_nz(section.size) && memory_region_is_ram(section.mr))) {
> +            addr = section.offset_within_region +
> +                   memory_region_get_ram_ptr(section.mr);
> +            qemu_madvise(addr, chunk,
> +                         deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> +        } else {
> +            fprintf(stderr, "can't find the chunk, skip\n");
> +            chunk = TARGET_PAGE_SIZE;
> +        }
> +    }
> +}
> +
> +static void balloon_bulk_pages(ram_addr_t base_pfn, unsigned long *bitmap,
> +                               unsigned long len, int page_shift, bool deflate)
> +{
> +#if defined(__linux__)
> +    unsigned long end  = len * 8;
> +    unsigned long current = 0;
> +
> +    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
> +                                         kvm_has_sync_mmu())) {
> +        while (current < end) {
> +            unsigned long one = find_next_bit(bitmap, end, current);
> +
> +            if (one < end) {
> +                unsigned long zero = find_next_zero_bit(bitmap, end, one + 1);
> +                unsigned long page_length;
> +
> +                if (zero >= end) {
> +                    page_length = end - one;
> +                } else {
> +                    page_length = zero - one;
> +                }
> +
> +                if (page_length) {
> +                    do_balloon_bulk_pages(base_pfn + one, page_shift,
> +                                          page_length, deflate);
> +                }
> +                current = one + page_length;
> +            } else {
> +                current = one;
> +            }
> +        }
> +    }
> +#endif
> +}
> +
>  static const char *balloon_stat_names[] = {
>     [VIRTIO_BALLOON_S_SWAP_IN] = "stat-swap-in",
>     [VIRTIO_BALLOON_S_SWAP_OUT] = "stat-swap-out",
> @@ -78,6 +148,12 @@ static bool balloon_stats_supported(const VirtIOBalloon *s)
>      return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ);
>  }
>  
> +static bool balloon_page_bitmap_supported(const VirtIOBalloon *s)
> +{
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +}
> +
>  static bool balloon_stats_enabled(const VirtIOBalloon *s)
>  {
>      return s->stats_poll_interval > 0;
> @@ -224,27 +300,66 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>              return;
>          }
>  
> -        while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
> -            ram_addr_t pa;
> -            ram_addr_t addr;
> -            int p = virtio_ldl_p(vdev, &pfn);
> -
> -            pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
> -            offset += 4;
> -
> -            /* FIXME: remove get_system_memory(), but how? */
> -            section = memory_region_find(get_system_memory(), pa, 1);
> -            if (!int128_nz(section.size) || !memory_region_is_ram(section.mr))
> -                continue;
> -
> -            trace_virtio_balloon_handle_output(memory_region_name(section.mr),
> -                                               pa);
> -            /* Using memory_region_get_ram_ptr is bending the rules a bit, but
> -               should be OK because we only want a single page.  */
> -            addr = section.offset_within_region;
> -            balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
> -                         !!(vq == s->dvq));
> -            memory_region_unref(section.mr);
> +        if (balloon_page_bitmap_supported(s)) {
> +            uint64_t base_pfn, tmp64, bmap_len;
> +            uint32_t tmp32, page_shift, id;
> +            unsigned long *bitmap;
> +
> +            iov_to_buf(elem->out_sg, elem->out_num, offset,
> +                       &tmp32, sizeof(uint32_t));
> +            id = virtio_ldl_p(vdev, &tmp32);
> +            offset += sizeof(uint32_t);
> +            /* to suppress build warning */
> +            id = id;
> +
> +            iov_to_buf(elem->out_sg, elem->out_num, offset,
> +                       &tmp32, sizeof(uint32_t));
> +            page_shift = virtio_ldl_p(vdev, &tmp32);
> +            offset += sizeof(uint32_t);
> +
> +            iov_to_buf(elem->out_sg, elem->out_num, offset,
> +                       &tmp64, sizeof(uint64_t));
> +            base_pfn = virtio_ldq_p(vdev, &tmp64);
> +            offset += sizeof(uint64_t);
> +
> +            iov_to_buf(elem->out_sg, elem->out_num, offset,
> +                       &tmp64, sizeof(uint64_t));
> +            bmap_len = virtio_ldq_p(vdev, &tmp64);
> +            offset += sizeof(uint64_t);
> +
> +            bitmap = bitmap_new(bmap_len * BITS_PER_BYTE);
> +            iov_to_buf(elem->out_sg, elem->out_num, offset,
> +                       bitmap, bmap_len);
> +            offset += bmap_len;
> +
> +            balloon_bulk_pages(base_pfn, bitmap, bmap_len,
> +                               page_shift, !!(vq == s->dvq));
> +            g_free(bitmap);
> +        } else {
> +            while (iov_to_buf(elem->out_sg, elem->out_num, offset,
> +                              &pfn, 4) == 4) {
> +                ram_addr_t pa;
> +                ram_addr_t addr;
> +                int p = virtio_ldl_p(vdev, &pfn);
> +
> +                pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
> +                offset += 4;
> +
> +                /* FIXME: remove get_system_memory(), but how? */
> +                section = memory_region_find(get_system_memory(), pa, 1);
> +                if (!int128_nz(section.size) ||
> +                    !memory_region_is_ram(section.mr))
> +                    continue;
> +
> +                trace_virtio_balloon_handle_output(memory_region_name(
> +                                                            section.mr), pa);
> +                /* Using memory_region_get_ram_ptr is bending the rules a bit,
> +                 * but should be OK because we only want a single page.  */
> +                addr = section.offset_within_region;
> +                balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
> +                             !!(vq == s->dvq));
> +                memory_region_unref(section.mr);
> +            }
>          }
>  
>          virtqueue_push(vq, elem, offset);
> @@ -374,6 +489,7 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +    virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_BITMAP);
>      return f;
>  }
>

Pls add features to virtio_balloon_properties.
You also need to handle compatibility by disabling for
old machine types.

  
> @@ -388,6 +504,7 @@ static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +
>      ram_addr_t vm_ram_size = get_current_ram_size();
>  
>      if (target > vm_ram_size) {
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 9d06ccd..7c9686c 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -34,6 +34,7 @@
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> +#define VIRTIO_BALLOON_F_PAGE_BITMAP  3 /* Use page bitmap to send page info */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12

We want to keep this in sync with Linux.
Let's get a minimal patch to extend this header merged in linux, then update this one.

> -- 
> 1.9.1
Liang Li June 20, 2016, 1:37 a.m. UTC | #7
> >
> >          virtqueue_push(vq, elem, offset); @@ -374,6 +489,7 @@ static
> > uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
> >      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> >      f |= dev->host_features;
> >      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> > +    virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_BITMAP);
> >      return f;
> >  }
> >
> 
> Pls add features to virtio_balloon_properties.
> You also need to handle compatibility by disabling for old machine types.
> 

I forgot that, will add in next version.

> > --- a/include/standard-headers/linux/virtio_balloon.h
> > +++ b/include/standard-headers/linux/virtio_balloon.h
> > @@ -34,6 +34,7 @@
> >  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before
> reclaiming pages */
> >  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue
> */
> >  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon
> on OOM */
> > +#define VIRTIO_BALLOON_F_PAGE_BITMAP  3 /* Use page bitmap to
> send
> > +page info */
> >
> >  /* Size of a PFN in the balloon interface. */  #define
> > VIRTIO_BALLOON_PFN_SHIFT 12
> 
> We want to keep this in sync with Linux.
> Let's get a minimal patch to extend this header merged in linux, then update
> this one.

OK. Can this be independent of the virtio-balloon SPEC? As I understand it,
 it will not get merged before the SPEC is set?

Thanks!
Liang
diff mbox

Patch

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 8c15e09..8cf74c2 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -47,6 +47,76 @@  static void balloon_page(void *addr, int deflate)
 #endif
 }
 
+static void do_balloon_bulk_pages(ram_addr_t base_pfn, int page_shift,
+                                  unsigned long len, bool deflate)
+{
+    ram_addr_t size, processed, chunk, base;
+    void *addr;
+    MemoryRegionSection section = {.mr = NULL};
+
+    size = (len << page_shift);
+    base = (base_pfn << page_shift);
+
+    for (processed = 0; processed < size; processed += chunk) {
+        chunk = size - processed;
+        while (chunk >= TARGET_PAGE_SIZE) {
+            section = memory_region_find(get_system_memory(),
+                                         base + processed, chunk);
+            if (!section.mr) {
+                chunk = QEMU_ALIGN_DOWN(chunk / 2, TARGET_PAGE_SIZE);
+            } else {
+                break;
+            }
+        }
+
+        if (section.mr &&
+            (int128_nz(section.size) && memory_region_is_ram(section.mr))) {
+            addr = section.offset_within_region +
+                   memory_region_get_ram_ptr(section.mr);
+            qemu_madvise(addr, chunk,
+                         deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
+        } else {
+            fprintf(stderr, "can't find the chunk, skip\n");
+            chunk = TARGET_PAGE_SIZE;
+        }
+    }
+}
+
+static void balloon_bulk_pages(ram_addr_t base_pfn, unsigned long *bitmap,
+                               unsigned long len, int page_shift, bool deflate)
+{
+#if defined(__linux__)
+    unsigned long end  = len * 8;
+    unsigned long current = 0;
+
+    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
+                                         kvm_has_sync_mmu())) {
+        while (current < end) {
+            unsigned long one = find_next_bit(bitmap, end, current);
+
+            if (one < end) {
+                unsigned long zero = find_next_zero_bit(bitmap, end, one + 1);
+                unsigned long page_length;
+
+                if (zero >= end) {
+                    page_length = end - one;
+                } else {
+                    page_length = zero - one;
+                }
+
+                if (page_length) {
+                    do_balloon_bulk_pages(base_pfn + one, page_shift,
+                                          page_length, deflate);
+                }
+                current = one + page_length;
+            } else {
+                current = one;
+            }
+        }
+    }
+#endif
+}
+
 static const char *balloon_stat_names[] = {
    [VIRTIO_BALLOON_S_SWAP_IN] = "stat-swap-in",
    [VIRTIO_BALLOON_S_SWAP_OUT] = "stat-swap-out",
@@ -78,6 +148,12 @@  static bool balloon_stats_supported(const VirtIOBalloon *s)
     return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ);
 }
 
+static bool balloon_page_bitmap_supported(const VirtIOBalloon *s)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+}
+
 static bool balloon_stats_enabled(const VirtIOBalloon *s)
 {
     return s->stats_poll_interval > 0;
@@ -224,27 +300,66 @@  static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
             return;
         }
 
-        while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
-            ram_addr_t pa;
-            ram_addr_t addr;
-            int p = virtio_ldl_p(vdev, &pfn);
-
-            pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
-            offset += 4;
-
-            /* FIXME: remove get_system_memory(), but how? */
-            section = memory_region_find(get_system_memory(), pa, 1);
-            if (!int128_nz(section.size) || !memory_region_is_ram(section.mr))
-                continue;
-
-            trace_virtio_balloon_handle_output(memory_region_name(section.mr),
-                                               pa);
-            /* Using memory_region_get_ram_ptr is bending the rules a bit, but
-               should be OK because we only want a single page.  */
-            addr = section.offset_within_region;
-            balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
-                         !!(vq == s->dvq));
-            memory_region_unref(section.mr);
+        if (balloon_page_bitmap_supported(s)) {
+            uint64_t base_pfn, tmp64, bmap_len;
+            uint32_t tmp32, page_shift, id;
+            unsigned long *bitmap;
+
+            iov_to_buf(elem->out_sg, elem->out_num, offset,
+                       &tmp32, sizeof(uint32_t));
+            id = virtio_ldl_p(vdev, &tmp32);
+            offset += sizeof(uint32_t);
+            /* to suppress build warning */
+            id = id;
+
+            iov_to_buf(elem->out_sg, elem->out_num, offset,
+                       &tmp32, sizeof(uint32_t));
+            page_shift = virtio_ldl_p(vdev, &tmp32);
+            offset += sizeof(uint32_t);
+
+            iov_to_buf(elem->out_sg, elem->out_num, offset,
+                       &tmp64, sizeof(uint64_t));
+            base_pfn = virtio_ldq_p(vdev, &tmp64);
+            offset += sizeof(uint64_t);
+
+            iov_to_buf(elem->out_sg, elem->out_num, offset,
+                       &tmp64, sizeof(uint64_t));
+            bmap_len = virtio_ldq_p(vdev, &tmp64);
+            offset += sizeof(uint64_t);
+
+            bitmap = bitmap_new(bmap_len * BITS_PER_BYTE);
+            iov_to_buf(elem->out_sg, elem->out_num, offset,
+                       bitmap, bmap_len);
+            offset += bmap_len;
+
+            balloon_bulk_pages(base_pfn, bitmap, bmap_len,
+                               page_shift, !!(vq == s->dvq));
+            g_free(bitmap);
+        } else {
+            while (iov_to_buf(elem->out_sg, elem->out_num, offset,
+                              &pfn, 4) == 4) {
+                ram_addr_t pa;
+                ram_addr_t addr;
+                int p = virtio_ldl_p(vdev, &pfn);
+
+                pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
+                offset += 4;
+
+                /* FIXME: remove get_system_memory(), but how? */
+                section = memory_region_find(get_system_memory(), pa, 1);
+                if (!int128_nz(section.size) ||
+                    !memory_region_is_ram(section.mr))
+                    continue;
+
+                trace_virtio_balloon_handle_output(memory_region_name(
+                                                            section.mr), pa);
+                /* Using memory_region_get_ram_ptr is bending the rules a bit,
+                 * but should be OK because we only want a single page.  */
+                addr = section.offset_within_region;
+                balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
+                             !!(vq == s->dvq));
+                memory_region_unref(section.mr);
+            }
         }
 
         virtqueue_push(vq, elem, offset);
@@ -374,6 +489,7 @@  static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+    virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_BITMAP);
     return f;
 }
 
@@ -388,6 +504,7 @@  static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+
     ram_addr_t vm_ram_size = get_current_ram_size();
 
     if (target > vm_ram_size) {
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 9d06ccd..7c9686c 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -34,6 +34,7 @@ 
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_PAGE_BITMAP  3 /* Use page bitmap to send page info */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12