[1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate

Message ID 20190214043916.22128-2-david@gibson.dropbear.id.au (mailing list archive)
State New, archived
Series Improve balloon handling of pagesizes other than 4kiB

Commit Message

David Gibson Feb. 14, 2019, 4:39 a.m. UTC
When the balloon is inflated, we discard the memory placed in it using
madvise() with MADV_DONTNEED.  And when we deflate it we use
MADV_WILLNEED, which sounds like it makes sense but is actually
unnecessary.

The misleadingly named MADV_DONTNEED just discards the memory in
question; it doesn't set any persistent state on it in-kernel.  All
that's necessary to bring the memory back is to touch it.
MADV_WILLNEED, in contrast, specifically says that the memory will be
used soon and faults it in.

This patch simplifies the balloon operation by dropping the madvise()
on deflate.  This might have an impact on performance - it moves a
delay from deflate time to the time the memory is actually touched,
which might be more latency sensitive.  However:

  * Memory that's being given back to the guest by deflating the
    balloon *might* be used soon, but it equally could just sit around
    in the guest's pools until needed (or even be faulted out again if
    the host is under memory pressure).

  * Usually, the timescale over which you'll be adjusting the balloon
    is long enough that a few extra faults after deflation aren't
    going to make a difference.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
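
As an aside, the semantics described above are easy to demonstrate.
Here is a minimal standalone C sketch (an illustration only, not part
of the patch): anonymous memory discarded with MADV_DONTNEED comes back
as fresh zero-filled pages on the next touch, with no MADV_WILLNEED
required.

    /* Minimal sketch, not part of the patch: MADV_DONTNEED'd anonymous
     * memory is recovered simply by touching it again. */
    #include <assert.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert(p != MAP_FAILED);

        memset(p, 0xaa, len);            /* populate the page */
        madvise(p, len, MADV_DONTNEED);  /* discard it, as inflate does */

        /* No MADV_WILLNEED needed: the next access faults in a fresh
         * zero-filled page. */
        assert(p[0] == 0);
        return 0;
    }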

Comments

Michael S. Tsirkin Feb. 28, 2019, 1:36 p.m. UTC | #1
On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> [patch description snipped; quoted in full above]

I'm having second thoughts about this. It might affect performance,
though it probably won't; we have no idea either way.  It might cause
latency jitter after deflate where it previously didn't happen.  This
kind of patch should really be accompanied by benchmarking results, not
philosophy.

> [diff snipped; see the full patch at the bottom of the page]
David Gibson March 5, 2019, 12:52 a.m. UTC | #2
On Thu, Feb 28, 2019 at 08:36:58AM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> > [patch description snipped]
> 
> I'm having second thoughts about this. It might affect performance,
> though it probably won't; we have no idea either way.  It might cause
> latency jitter after deflate where it previously didn't happen.  This
> kind of patch should really be accompanied by benchmarking results, not
> philosophy.

I guess I see your point, much as it's annoying to spend time
benchmarking a device that's basically broken by design.

That said, I don't really know how I'd go about benchmarking it.  Any
guesses at a suitable workload which would be most likely to show a
performance degradation here?
Michael S. Tsirkin March 5, 2019, 2:29 a.m. UTC | #3
On Tue, Mar 05, 2019 at 11:52:08AM +1100, David Gibson wrote:
> [earlier discussion snipped]
> 
> I guess I see your point, much as it's annoying to spend time
> benchmarking a device that's basically broken by design.

Because of 4K page thing? It's an annoying bug for sure.  There were
patches to add a feature bit to just switch to plain s/g format, but they
were abandoned. You are welcome to revive them though.
Additionally or alternatively, we can easily add a field specifying page size.

> That said, I don't really know how I'd go about benchmarking it.  Any
> guesses at a suitable workload which would be most likely to show a
> performance degradation here?

Here's one idea - I tried to come up with a worst-case scenario here.
It's basically based on an idea by Alex Duyck. All credits are his, all
bugs are mine:


Setup:
  * Host memory: 15837 MB
  * Guest memory size: 5 GB
  * Swap: disabled
  * Test program: a simple program which allocates 4 GB of memory via
    malloc(), touches it via memset(), and exits.
  * Use case: the number of guests that can be launched completely,
    including successful execution of the test program.

Procedure:
A first guest is launched and, once its console is up, the test
allocation program is executed with a 4 GB memory request (because of
this the guest occupies almost 4-5 GB of memory in the host).
Afterwards the balloon is inflated by 4 GB in the guest.  We continue
launching guests until a guest gets killed due to a low-memory
condition in the host.

Now, repeatedly, in each guest in turn, the balloon is deflated and
the test allocation program is executed with a 4 GB memory request
(again the guest occupies almost 4-5 GB of memory in the host).  After
the program finishes, the balloon is inflated by 4 GB again.

Then we switch to another guest.

Time how many cycles of this we can do.
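
For concreteness, here is a sketch of the test allocation program
described above (the malloc/memset/exit behaviour and the 4 GB size are
from the description; everything else is an assumption):

    /* Sketch of the test program: allocate 4 GB via malloc(), touch it
     * all via memset() so the host must back it, then exit. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t size = 4ULL * 1024 * 1024 * 1024;  /* 4 GB */
        char *buf = malloc(size);

        if (!buf) {
            perror("malloc");
            return 1;
        }
        memset(buf, 1, size);  /* fault in every page */
        free(buf);
        return 0;
    }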


Hope this helps.
David Gibson March 5, 2019, 5:03 a.m. UTC | #4
On Mon, Mar 04, 2019 at 09:29:24PM -0500, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:52:08AM +1100, David Gibson wrote:
> > [earlier discussion snipped]
> > 
> > I guess I see your point, much as it's annoying to spend time
> > benchmarking a device that's basically broken by design.
> 
> Because of 4K page thing?

For one thing.  I believe David H has a bunch of other reasons.

> It's an annoying bug for sure.  There were
> patches to add a feature bit to just switch to plain s/g format, but they
> were abandoned. You are welcome to revive them though.
> Additionally or alternatively, we can easily add a field specifying
> page size.

We could, but I'm pretty disinclined to work on this when virtio-mem
is a better solution in nearly every way.

> > That said, I don't really know how I'd go about benchmarking it.  Any
> > guesses at a suitable workload which would be most likely to show a
> > performance degradation here?
> 
> Here's one idea - I tried to come up with a worst-case scenario here.
> It's basically based on an idea by Alex Duyck. All credits are his, all
> bugs are mine:

Ok.  I'll try to find time to implement this and test it.

> [benchmark setup and procedure snipped; quoted in full above]
Michael S. Tsirkin March 5, 2019, 2:41 p.m. UTC | #5
On Tue, Mar 05, 2019 at 04:03:00PM +1100, David Gibson wrote:
> [earlier discussion snipped]
> 
> We could, but I'm pretty disinclined to work on this when virtio-mem
> is a better solution in nearly every way.

Then one way would be to just let the balloon be. Make it behave the
same as always and don't make changes to it :)

David Gibson March 5, 2019, 11:35 p.m. UTC | #6
On Tue, Mar 05, 2019 at 09:41:34AM -0500, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 04:03:00PM +1100, David Gibson wrote:
> > [earlier discussion snipped]
> > We could, but I'm pretty disinclined to work on this when virtio-mem
> > is a better solution in nearly every way.
> 
> Then one way would be to just let the balloon be. Make it behave the
> same as always and don't make changes to it :)

I'd love to, but it is in real world use, so I think we do need to fix
serious bugs in it - at least if they can be fixed on one side,
without needing to roll out both qemu and guest changes (which adding
page size negotiation would require).

Michael S. Tsirkin March 6, 2019, 12:14 a.m. UTC | #7
On Wed, Mar 06, 2019 at 10:35:12AM +1100, David Gibson wrote:
> [earlier discussion snipped]
> 
> I'd love to, but it is in real world use, so I think we do need to fix
> serious bugs in it - at least if they can be fixed on one side,
> without needing to roll out both qemu and guest changes (which adding
> page size negotiation would require).


Absolutely.  I'm just saying don't add optimizations in that case :)

David Gibson March 6, 2019, 12:58 a.m. UTC | #8
On Tue, Mar 05, 2019 at 07:14:09PM -0500, Michael S. Tsirkin wrote:
> On Wed, Mar 06, 2019 at 10:35:12AM +1100, David Gibson wrote:
> > [earlier discussion snipped]
> > 
> > I'd love to, but it is in real world use, so I think we do need to fix
> > serious bugs in it - at least if they can be fixed on one side,
> > without needing to roll out both qemu and guest changes (which adding
> > page size negotiation would require).
> 
> 
> Absolutely.  I'm just saying don't add optimizations in that case :)

I don't plan to.

Patch

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a12677d4d5..43af521884 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -35,9 +35,8 @@ 
 
 static void balloon_page(void *addr, int deflate)
 {
-    if (!qemu_balloon_is_inhibited()) {
-        qemu_madvise(addr, BALLOON_PAGE_SIZE,
-                deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
+    if (!qemu_balloon_is_inhibited() && !deflate) {
+        qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
     }
 }
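
For readability, here is the resulting function after this patch is
applied (reconstructed from the diff above), with explanatory comments
added:

    static void balloon_page(void *addr, int deflate)
    {
        /* Inflate: discard the page so the host can reclaim it.
         * Deflate is now a host-side no-op: the guest's first touch of
         * the page simply faults it back in, so no MADV_WILLNEED is
         * needed. */
        if (!qemu_balloon_is_inhibited() && !deflate) {
            qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
        }
    }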