diff mbox series

[v2,1/4] mm/balloon_compaction: list interfaces

Message ID 20190328010718.2248-2-namit@vmware.com (mailing list archive)
State New, archived
Headers show
Series vmw_balloon: compaction and shrinker support | expand

Commit Message

Nadav Amit March 28, 2019, 1:07 a.m. UTC
Introduce interfaces for ballooning enqueueing and dequeueing of a list
of pages. These interfaces reduce the overhead of storing and restoring
IRQs by batching the operations. In addition they do not panic if the
list of pages is empty.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: linux-mm@kvack.org
Cc: virtualization@lists.linux-foundation.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 include/linux/balloon_compaction.h |   4 +
 mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
 2 files changed, 111 insertions(+), 38 deletions(-)

Comments

Nadav Amit April 8, 2019, 5:35 p.m. UTC | #1
> On Mar 27, 2019, at 6:07 PM, Nadav Amit <namit@vmware.com> wrote:
> 
> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> of pages. These interfaces reduce the overhead of storing and restoring
> IRQs by batching the operations. In addition they do not panic if the
> list of pages is empty.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>

Michael, may I ping for your ack?
Nadav Amit April 19, 2019, 9:41 p.m. UTC | #2
> On Apr 8, 2019, at 10:35 AM, Nadav Amit <namit@vmware.com> wrote:
> 
>> On Mar 27, 2019, at 6:07 PM, Nadav Amit <namit@vmware.com> wrote:
>> 
>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>> of pages. These interfaces reduce the overhead of storing and restoring
>> IRQs by batching the operations. In addition they do not panic if the
>> list of pages is empty.
>> 
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> 
> Michael, may I ping for your ack?

Ping again?
Michael S. Tsirkin April 19, 2019, 10:07 p.m. UTC | #3
On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> of pages. These interfaces reduce the overhead of storing and restoring
> IRQs by batching the operations. In addition they do not panic if the
> list of pages is empty.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: virtualization@lists.linux-foundation.org
> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  include/linux/balloon_compaction.h |   4 +
>  mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
>  2 files changed, 111 insertions(+), 38 deletions(-)
> 
> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> index f111c780ef1d..1da79edadb69 100644
> --- a/include/linux/balloon_compaction.h
> +++ b/include/linux/balloon_compaction.h
> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
>  extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>  				 struct page *page);
>  extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> +				      struct list_head *pages);
> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> +				     struct list_head *pages, int n_req_pages);
>  

Why size_t I wonder? It can never be > n_req_pages which is int.
Callers also seem to assume int.

>  static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>  {


> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index ef858d547e2d..88d5d9a01072 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -10,6 +10,106 @@
>  #include <linux/export.h>
>  #include <linux/balloon_compaction.h>
>  
> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
> +				     struct page *page)
> +{
> +	/*
> +	 * Block others from accessing the 'page' when we get around to
> +	 * establishing additional references. We should be the only one
> +	 * holding a reference to the 'page' at this point.
> +	 */
> +	if (!trylock_page(page)) {
> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
> +		return -EFAULT;

Looks like all callers bug on a failure. So let's just do it here,
and then make this void?

> +	}
> +	list_del(&page->lru);
> +	balloon_page_insert(b_dev_info, page);
> +	unlock_page(page);
> +	__count_vm_event(BALLOON_INFLATE);
> +	return 0;
> +}
> +
> +/**
> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
> + *				 list.
> + * @b_dev_info: balloon device descriptor where we will insert a new page to
> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
> + *
> + * Driver must call it to properly enqueue a balloon pages before definitively
> + * removing it from the guest system.

A bunch of grammar error here. Pls fix for clarify.
Also - document that nothing must lock the pages? More assumptions?
What is "it" in this context? All pages? And what does removing from
guest mean? Really adding to the balloon?

> + *
> + * Return: number of pages that were enqueued.
> + */
> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> +			       struct list_head *pages)
> +{
> +	struct page *page, *tmp;
> +	unsigned long flags;
> +	size_t n_pages = 0;
> +
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	list_for_each_entry_safe(page, tmp, pages, lru) {
> +		balloon_page_enqueue_one(b_dev_info, page);

Do we want to do something about an error here?

> +		n_pages++;
> +	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +	return n_pages;
> +}
> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
> +
> +/**
> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
> + *				 returns a list of the pages.
> + * @b_dev_info: balloon device decriptor where we will grab a page from.
> + * @pages: pointer to the list of pages that would be returned to the caller.
> + * @n_req_pages: number of requested pages.
> + *
> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
> + * before definetively releasing it back to the guest system. This function
> + * tries to remove @n_req_pages from the ballooned pages and return it to the
> + * caller in the @pages list.
> + *
> + * Note that this function may fail to dequeue some pages temporarily empty due
> + * to compaction isolated pages.
> + *
> + * Return: number of pages that were added to the @pages list.
> + */
> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> +				 struct list_head *pages, int n_req_pages)
> +{
> +	struct page *page, *tmp;
> +	unsigned long flags;
> +	size_t n_pages = 0;
> +
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> +		/*
> +		 * Block others from accessing the 'page' while we get around
> +		 * establishing additional references and preparing the 'page'
> +		 * to be released by the balloon driver.
> +		 */
> +		if (!trylock_page(page))
> +			continue;
> +
> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
> +		    PageIsolated(page)) {
> +			/* raced with isolation */
> +			unlock_page(page);
> +			continue;
> +		}
> +		balloon_page_delete(page);
> +		__count_vm_event(BALLOON_DEFLATE);
> +		unlock_page(page);
> +		list_add(&page->lru, pages);
> +		if (++n_pages >= n_req_pages)
> +			break;
> +	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +
> +	return n_pages;
> +}
> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
> +
>  /*
>   * balloon_page_alloc - allocates a new page for insertion into the balloon
>   *			  page list.
> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>  {
>  	unsigned long flags;
>  
> -	/*
> -	 * Block others from accessing the 'page' when we get around to
> -	 * establishing additional references. We should be the only one
> -	 * holding a reference to the 'page' at this point.
> -	 */
> -	BUG_ON(!trylock_page(page));
>  	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> -	balloon_page_insert(b_dev_info, page);
> -	__count_vm_event(BALLOON_INFLATE);
> +	balloon_page_enqueue_one(b_dev_info, page);

We used to bug on failure to lock page, now we
silently ignore this error. Why?


>  	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> -	unlock_page(page);
>  }
>  EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>  
> @@ -70,36 +162,13 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>   */
>  struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  {
> -	struct page *page, *tmp;
>  	unsigned long flags;
> -	bool dequeued_page;
> +	LIST_HEAD(pages);
> +	int n_pages;
>  
> -	dequeued_page = false;
> -	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> -	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> -		/*
> -		 * Block others from accessing the 'page' while we get around
> -		 * establishing additional references and preparing the 'page'
> -		 * to be released by the balloon driver.
> -		 */
> -		if (trylock_page(page)) {
> -#ifdef CONFIG_BALLOON_COMPACTION
> -			if (PageIsolated(page)) {
> -				/* raced with isolation */
> -				unlock_page(page);
> -				continue;
> -			}
> -#endif
> -			balloon_page_delete(page);
> -			__count_vm_event(BALLOON_DEFLATE);
> -			unlock_page(page);
> -			dequeued_page = true;
> -			break;
> -		}
> -	}
> -	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +	n_pages = balloon_page_list_dequeue(b_dev_info, &pages, 1);
>  
> -	if (!dequeued_page) {
> +	if (n_pages != 1) {
>  		/*
>  		 * If we are unable to dequeue a balloon page because the page
>  		 * list is empty and there is no isolated pages, then something
> @@ -112,9 +181,9 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  			     !b_dev_info->isolated_pages))
>  			BUG();
>  		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> -		page = NULL;
> +		return NULL;
>  	}
> -	return page;
> +	return list_first_entry(&pages, struct page, lru);
>  }
>  EXPORT_SYMBOL_GPL(balloon_page_dequeue);
>  
> -- 
> 2.19.1
Nadav Amit April 19, 2019, 10:34 p.m. UTC | #4
> On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>> of pages. These interfaces reduce the overhead of storing and restoring
>> IRQs by batching the operations. In addition they do not panic if the
>> list of pages is empty.
>> 
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: linux-mm@kvack.org
>> Cc: virtualization@lists.linux-foundation.org
>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>> include/linux/balloon_compaction.h |   4 +
>> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
>> 2 files changed, 111 insertions(+), 38 deletions(-)
>> 
>> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
>> index f111c780ef1d..1da79edadb69 100644
>> --- a/include/linux/balloon_compaction.h
>> +++ b/include/linux/balloon_compaction.h
>> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
>> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>> 				 struct page *page);
>> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
>> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>> +				      struct list_head *pages);
>> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>> +				     struct list_head *pages, int n_req_pages);
> 
> Why size_t I wonder? It can never be > n_req_pages which is int.
> Callers also seem to assume int.

Only because on the previous iteration
( https://lkml.org/lkml/2019/2/6/912 ) you said:

> Are we sure this int never overflows? Why not just use u64
> or size_t straight away?

I am ok either way, but please be consistent.

> 
>> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>> {
> 
> 
>> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
>> index ef858d547e2d..88d5d9a01072 100644
>> --- a/mm/balloon_compaction.c
>> +++ b/mm/balloon_compaction.c
>> @@ -10,6 +10,106 @@
>> #include <linux/export.h>
>> #include <linux/balloon_compaction.h>
>> 
>> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
>> +				     struct page *page)
>> +{
>> +	/*
>> +	 * Block others from accessing the 'page' when we get around to
>> +	 * establishing additional references. We should be the only one
>> +	 * holding a reference to the 'page' at this point.
>> +	 */
>> +	if (!trylock_page(page)) {
>> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
>> +		return -EFAULT;
> 
> Looks like all callers bug on a failure. So let's just do it here,
> and then make this void?

As you noted below, actually balloon_page_list_enqueue() does not do
anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
I always get pushed back on such things. Yes, this might lead to memory
leak, but there is no reason to crash the system.

>> +	}
>> +	list_del(&page->lru);
>> +	balloon_page_insert(b_dev_info, page);
>> +	unlock_page(page);
>> +	__count_vm_event(BALLOON_INFLATE);
>> +	return 0;
>> +}
>> +
>> +/**
>> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
>> + *				 list.
>> + * @b_dev_info: balloon device descriptor where we will insert a new page to
>> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
>> + *
>> + * Driver must call it to properly enqueue a balloon pages before definitively
>> + * removing it from the guest system.
> 
> A bunch of grammar error here. Pls fix for clarify.
> Also - document that nothing must lock the pages? More assumptions?
> What is "it" in this context? All pages? And what does removing from
> guest mean? Really adding to the balloon?

I pretty much copy-pasted this description from balloon_page_enqueue(). I
see that you edited this message in the past at least couple of times (e.g.,
c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.

So maybe all of the comments in this file need a rework, but I don’t think
this patch-set needs to do it.

>> + *
>> + * Return: number of pages that were enqueued.
>> + */
>> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>> +			       struct list_head *pages)
>> +{
>> +	struct page *page, *tmp;
>> +	unsigned long flags;
>> +	size_t n_pages = 0;
>> +
>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> +	list_for_each_entry_safe(page, tmp, pages, lru) {
>> +		balloon_page_enqueue_one(b_dev_info, page);
> 
> Do we want to do something about an error here?

Hmm… This is really something that should never happen, but I still prefer
to avoid BUG_ON(), as I said before. I will just not count the page.

> 
>> +		n_pages++;
>> +	}
>> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>> +	return n_pages;
>> +}
>> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
>> +
>> +/**
>> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
>> + *				 returns a list of the pages.
>> + * @b_dev_info: balloon device decriptor where we will grab a page from.
>> + * @pages: pointer to the list of pages that would be returned to the caller.
>> + * @n_req_pages: number of requested pages.
>> + *
>> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
>> + * before definetively releasing it back to the guest system. This function
>> + * tries to remove @n_req_pages from the ballooned pages and return it to the
>> + * caller in the @pages list.
>> + *
>> + * Note that this function may fail to dequeue some pages temporarily empty due
>> + * to compaction isolated pages.
>> + *
>> + * Return: number of pages that were added to the @pages list.
>> + */
>> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>> +				 struct list_head *pages, int n_req_pages)
>> +{
>> +	struct page *page, *tmp;
>> +	unsigned long flags;
>> +	size_t n_pages = 0;
>> +
>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
>> +		/*
>> +		 * Block others from accessing the 'page' while we get around
>> +		 * establishing additional references and preparing the 'page'
>> +		 * to be released by the balloon driver.
>> +		 */
>> +		if (!trylock_page(page))
>> +			continue;
>> +
>> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
>> +		    PageIsolated(page)) {
>> +			/* raced with isolation */
>> +			unlock_page(page);
>> +			continue;
>> +		}
>> +		balloon_page_delete(page);
>> +		__count_vm_event(BALLOON_DEFLATE);
>> +		unlock_page(page);
>> +		list_add(&page->lru, pages);
>> +		if (++n_pages >= n_req_pages)
>> +			break;
>> +	}
>> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>> +
>> +	return n_pages;
>> +}
>> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
>> +
>> /*
>>  * balloon_page_alloc - allocates a new page for insertion into the balloon
>>  *			  page list.
>> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>> {
>> 	unsigned long flags;
>> 
>> -	/*
>> -	 * Block others from accessing the 'page' when we get around to
>> -	 * establishing additional references. We should be the only one
>> -	 * holding a reference to the 'page' at this point.
>> -	 */
>> -	BUG_ON(!trylock_page(page));
>> 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> -	balloon_page_insert(b_dev_info, page);
>> -	__count_vm_event(BALLOON_INFLATE);
>> +	balloon_page_enqueue_one(b_dev_info, page);
> 
> We used to bug on failure to lock page, now we
> silently ignore this error. Why?

That’s a mistake. I’ll add a BUG_ON() if balloon_page_enqueue_one() fails.
Michael S. Tsirkin April 19, 2019, 10:47 p.m. UTC | #5
On Fri, Apr 19, 2019 at 10:34:04PM +0000, Nadav Amit wrote:
> > On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
> >> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> >> of pages. These interfaces reduce the overhead of storing and restoring
> >> IRQs by batching the operations. In addition they do not panic if the
> >> list of pages is empty.
> >> 
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Jason Wang <jasowang@redhat.com>
> >> Cc: linux-mm@kvack.org
> >> Cc: virtualization@lists.linux-foundation.org
> >> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
> >> Signed-off-by: Nadav Amit <namit@vmware.com>
> >> ---
> >> include/linux/balloon_compaction.h |   4 +
> >> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
> >> 2 files changed, 111 insertions(+), 38 deletions(-)
> >> 
> >> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> >> index f111c780ef1d..1da79edadb69 100644
> >> --- a/include/linux/balloon_compaction.h
> >> +++ b/include/linux/balloon_compaction.h
> >> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
> >> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> >> 				 struct page *page);
> >> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
> >> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> >> +				      struct list_head *pages);
> >> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> >> +				     struct list_head *pages, int n_req_pages);
> > 
> > Why size_t I wonder? It can never be > n_req_pages which is int.
> > Callers also seem to assume int.
> 
> Only because on the previous iteration
> ( https://lkml.org/lkml/2019/2/6/912 ) you said:
> 
> > Are we sure this int never overflows? Why not just use u64
> > or size_t straight away?

And the answer is because n_req_pages is an int too?

> 
> I am ok either way, but please be consistent.

I guess n_req_pages should be size_t too then?

> > 
> >> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
> >> {
> > 
> > 
> >> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> >> index ef858d547e2d..88d5d9a01072 100644
> >> --- a/mm/balloon_compaction.c
> >> +++ b/mm/balloon_compaction.c
> >> @@ -10,6 +10,106 @@
> >> #include <linux/export.h>
> >> #include <linux/balloon_compaction.h>
> >> 
> >> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
> >> +				     struct page *page)
> >> +{
> >> +	/*
> >> +	 * Block others from accessing the 'page' when we get around to
> >> +	 * establishing additional references. We should be the only one
> >> +	 * holding a reference to the 'page' at this point.
> >> +	 */
> >> +	if (!trylock_page(page)) {
> >> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
> >> +		return -EFAULT;
> > 
> > Looks like all callers bug on a failure. So let's just do it here,
> > and then make this void?
> 
> As you noted below, actually balloon_page_list_enqueue() does not do
> anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
> I always get pushed back on such things. Yes, this might lead to memory
> leak, but there is no reason to crash the system.

Need to audit callers to make sure they don't misbehave in worse ways.

I think in this case this indicates that someone is using the page so if
one keeps going and adds it into balloon this will lead to corruption down the road.

If you can change the caller code such that it's just a leak,
then a warning is more appropriate. Or even do not warn at all.


> >> +	}
> >> +	list_del(&page->lru);
> >> +	balloon_page_insert(b_dev_info, page);
> >> +	unlock_page(page);
> >> +	__count_vm_event(BALLOON_INFLATE);
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
> >> + *				 list.
> >> + * @b_dev_info: balloon device descriptor where we will insert a new page to
> >> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
> >> + *
> >> + * Driver must call it to properly enqueue a balloon pages before definitively
> >> + * removing it from the guest system.
> > 
> > A bunch of grammar error here. Pls fix for clarify.
> > Also - document that nothing must lock the pages? More assumptions?
> > What is "it" in this context? All pages? And what does removing from
> > guest mean? Really adding to the balloon?
> 
> I pretty much copy-pasted this description from balloon_page_enqueue(). I
> see that you edited this message in the past at least couple of times (e.g.,
> c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.
> 
> So maybe all of the comments in this file need a rework, but I don’t think
> this patch-set needs to do it.

I see.
That one dealt with one page so "it" was the page. This one deals with
many pages so you can't just copy it over without changes.
Makes it look like "it" refers to driver or guest.

> >> + *
> >> + * Return: number of pages that were enqueued.
> >> + */
> >> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> >> +			       struct list_head *pages)
> >> +{
> >> +	struct page *page, *tmp;
> >> +	unsigned long flags;
> >> +	size_t n_pages = 0;
> >> +
> >> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> +	list_for_each_entry_safe(page, tmp, pages, lru) {
> >> +		balloon_page_enqueue_one(b_dev_info, page);
> > 
> > Do we want to do something about an error here?
> 
> Hmm… This is really something that should never happen, but I still prefer
> to avoid BUG_ON(), as I said before. I will just not count the page.

Callers can BUG then if they want. That is fine but you then
need to change the callers to do it.

> > 
> >> +		n_pages++;
> >> +	}
> >> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >> +	return n_pages;
> >> +}
> >> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
> >> +
> >> +/**
> >> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
> >> + *				 returns a list of the pages.
> >> + * @b_dev_info: balloon device decriptor where we will grab a page from.
> >> + * @pages: pointer to the list of pages that would be returned to the caller.
> >> + * @n_req_pages: number of requested pages.
> >> + *
> >> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
> >> + * before definetively releasing it back to the guest system. This function
> >> + * tries to remove @n_req_pages from the ballooned pages and return it to the
> >> + * caller in the @pages list.
> >> + *
> >> + * Note that this function may fail to dequeue some pages temporarily empty due
> >> + * to compaction isolated pages.
> >> + *
> >> + * Return: number of pages that were added to the @pages list.
> >> + */
> >> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> >> +				 struct list_head *pages, int n_req_pages)
> >> +{
> >> +	struct page *page, *tmp;
> >> +	unsigned long flags;
> >> +	size_t n_pages = 0;
> >> +
> >> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> >> +		/*
> >> +		 * Block others from accessing the 'page' while we get around
> >> +		 * establishing additional references and preparing the 'page'
> >> +		 * to be released by the balloon driver.
> >> +		 */
> >> +		if (!trylock_page(page))
> >> +			continue;
> >> +
> >> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
> >> +		    PageIsolated(page)) {
> >> +			/* raced with isolation */
> >> +			unlock_page(page);
> >> +			continue;
> >> +		}
> >> +		balloon_page_delete(page);
> >> +		__count_vm_event(BALLOON_DEFLATE);
> >> +		unlock_page(page);
> >> +		list_add(&page->lru, pages);
> >> +		if (++n_pages >= n_req_pages)
> >> +			break;
> >> +	}
> >> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >> +
> >> +	return n_pages;
> >> +}
> >> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
> >> +
> >> /*
> >>  * balloon_page_alloc - allocates a new page for insertion into the balloon
> >>  *			  page list.
> >> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> >> {
> >> 	unsigned long flags;
> >> 
> >> -	/*
> >> -	 * Block others from accessing the 'page' when we get around to
> >> -	 * establishing additional references. We should be the only one
> >> -	 * holding a reference to the 'page' at this point.
> >> -	 */
> >> -	BUG_ON(!trylock_page(page));
> >> 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> -	balloon_page_insert(b_dev_info, page);
> >> -	__count_vm_event(BALLOON_INFLATE);
> >> +	balloon_page_enqueue_one(b_dev_info, page);
> > 
> > We used to bug on failure to lock page, now we
> > silently ignore this error. Why?
> 
> That’s a mistake. I’ll add a BUG_ON() if balloon_page_enqueue_one() fails.
> 
>
Nadav Amit April 19, 2019, 10:58 p.m. UTC | #6
> On Apr 19, 2019, at 3:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Fri, Apr 19, 2019 at 10:34:04PM +0000, Nadav Amit wrote:
>>> On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
>>>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>>>> of pages. These interfaces reduce the overhead of storing and restoring
>>>> IRQs by batching the operations. In addition they do not panic if the
>>>> list of pages is empty.
>>>> 
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: linux-mm@kvack.org
>>>> Cc: virtualization@lists.linux-foundation.org
>>>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>> ---
>>>> include/linux/balloon_compaction.h |   4 +
>>>> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
>>>> 2 files changed, 111 insertions(+), 38 deletions(-)
>>>> 
>>>> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
>>>> index f111c780ef1d..1da79edadb69 100644
>>>> --- a/include/linux/balloon_compaction.h
>>>> +++ b/include/linux/balloon_compaction.h
>>>> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
>>>> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>>>> 				 struct page *page);
>>>> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
>>>> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>>>> +				      struct list_head *pages);
>>>> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>>>> +				     struct list_head *pages, int n_req_pages);
>>> 
>>> Why size_t I wonder? It can never be > n_req_pages which is int.
>>> Callers also seem to assume int.
>> 
>> Only because on the previous iteration
>> ( https://lkml.org/lkml/2019/2/6/912 ) you said:
>> 
>>> Are we sure this int never overflows? Why not just use u64
>>> or size_t straight away?
> 
> And the answer is because n_req_pages is an int too?
> 
>> I am ok either way, but please be consistent.
> 
> I guess n_req_pages should be size_t too then?

Yes. I will change it.

> 
>>>> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>>>> {
>>> 
>>> 
>>>> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
>>>> index ef858d547e2d..88d5d9a01072 100644
>>>> --- a/mm/balloon_compaction.c
>>>> +++ b/mm/balloon_compaction.c
>>>> @@ -10,6 +10,106 @@
>>>> #include <linux/export.h>
>>>> #include <linux/balloon_compaction.h>
>>>> 
>>>> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
>>>> +				     struct page *page)
>>>> +{
>>>> +	/*
>>>> +	 * Block others from accessing the 'page' when we get around to
>>>> +	 * establishing additional references. We should be the only one
>>>> +	 * holding a reference to the 'page' at this point.
>>>> +	 */
>>>> +	if (!trylock_page(page)) {
>>>> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
>>>> +		return -EFAULT;
>>> 
>>> Looks like all callers bug on a failure. So let's just do it here,
>>> and then make this void?
>> 
>> As you noted below, actually balloon_page_list_enqueue() does not do
>> anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
>> I always get pushed back on such things. Yes, this might lead to memory
>> leak, but there is no reason to crash the system.
> 
> Need to audit callers to make sure they don't misbehave in worse ways.
> 
> I think in this case this indicates that someone is using the page so if
> one keeps going and adds it into balloon this will lead to corruption down the road.
> 
> If you can change the caller code such that it's just a leak,
> then a warning is more appropriate. Or even do not warn at all.

Yes, you are right (and I was wrong) - this is indeed much more than a
memory leak. I’ll see if it is easy to handle this case (I am not sure), but
I think the warning should stay.

>>>> +	}
>>>> +	list_del(&page->lru);
>>>> +	balloon_page_insert(b_dev_info, page);
>>>> +	unlock_page(page);
>>>> +	__count_vm_event(BALLOON_INFLATE);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/**
>>>> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
>>>> + *				 list.
>>>> + * @b_dev_info: balloon device descriptor where we will insert a new page to
>>>> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
>>>> + *
>>>> + * Driver must call it to properly enqueue a balloon pages before definitively
>>>> + * removing it from the guest system.
>>> 
>>> A bunch of grammar error here. Pls fix for clarify.
>>> Also - document that nothing must lock the pages? More assumptions?
>>> What is "it" in this context? All pages? And what does removing from
>>> guest mean? Really adding to the balloon?
>> 
>> I pretty much copy-pasted this description from balloon_page_enqueue(). I
>> see that you edited this message in the past at least couple of times (e.g.,
>> c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.
>> 
>> So maybe all of the comments in this file need a rework, but I don’t think
>> this patch-set needs to do it.
> 
> I see.
> That one dealt with one page so "it" was the page. This one deals with
> many pages so you can't just copy it over without changes.
> Makes it look like "it" refers to driver or guest.

I will fix “it”. ;-)

> 
>>>> + *
>>>> + * Return: number of pages that were enqueued.
>>>> + */
>>>> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>>>> +			       struct list_head *pages)
>>>> +{
>>>> +	struct page *page, *tmp;
>>>> +	unsigned long flags;
>>>> +	size_t n_pages = 0;
>>>> +
>>>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>>>> +	list_for_each_entry_safe(page, tmp, pages, lru) {
>>>> +		balloon_page_enqueue_one(b_dev_info, page);
>>> 
>>> Do we want to do something about an error here?
>> 
>> Hmm… This is really something that should never happen, but I still prefer
>> to avoid BUG_ON(), as I said before. I will just not count the page.
> 
> Callers can BUG then if they want. That is fine but you then
> need to change the callers to do it.

Ok, I’ll pay more attention this time. Thanks!
diff mbox series

Patch

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index f111c780ef1d..1da79edadb69 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -64,6 +64,10 @@  extern struct page *balloon_page_alloc(void);
 extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
 				 struct page *page);
 extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
+extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
+				      struct list_head *pages);
+extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
+				     struct list_head *pages, int n_req_pages);
 
 static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
 {
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index ef858d547e2d..88d5d9a01072 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -10,6 +10,106 @@ 
 #include <linux/export.h>
 #include <linux/balloon_compaction.h>
 
+static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
+				     struct page *page)
+{
+	/*
+	 * Block others from accessing the 'page' when we get around to
+	 * establishing additional references. We should be the only one
+	 * holding a reference to the 'page' at this point.
+	 */
+	if (!trylock_page(page)) {
+		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
+		return -EFAULT;
+	}
+	list_del(&page->lru);
+	balloon_page_insert(b_dev_info, page);
+	unlock_page(page);
+	__count_vm_event(BALLOON_INFLATE);
+	return 0;
+}
+
+/**
+ * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
+ *				 list.
+ * @b_dev_info: balloon device descriptor where we will insert a new page to
+ * @pages: pages to enqueue - allocated using balloon_page_alloc.
+ *
+ * Driver must call it to properly enqueue a balloon pages before definitively
+ * removing it from the guest system.
+ *
+ * Return: number of pages that were enqueued.
+ */
+size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
+			       struct list_head *pages)
+{
+	struct page *page, *tmp;
+	unsigned long flags;
+	size_t n_pages = 0;
+
+	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+	list_for_each_entry_safe(page, tmp, pages, lru) {
+		balloon_page_enqueue_one(b_dev_info, page);
+		n_pages++;
+	}
+	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+	return n_pages;
+}
+EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
+
+/**
+ * balloon_page_list_dequeue() - removes pages from balloon's page list and
+ *				 returns a list of the pages.
+ * @b_dev_info: balloon device decriptor where we will grab a page from.
+ * @pages: pointer to the list of pages that would be returned to the caller.
+ * @n_req_pages: number of requested pages.
+ *
+ * Driver must call it to properly de-allocate a previous enlisted balloon pages
+ * before definetively releasing it back to the guest system. This function
+ * tries to remove @n_req_pages from the ballooned pages and return it to the
+ * caller in the @pages list.
+ *
+ * Note that this function may fail to dequeue some pages temporarily empty due
+ * to compaction isolated pages.
+ *
+ * Return: number of pages that were added to the @pages list.
+ */
+size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
+				 struct list_head *pages, int n_req_pages)
+{
+	struct page *page, *tmp;
+	unsigned long flags;
+	size_t n_pages = 0;
+
+	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
+		/*
+		 * Block others from accessing the 'page' while we get around
+		 * establishing additional references and preparing the 'page'
+		 * to be released by the balloon driver.
+		 */
+		if (!trylock_page(page))
+			continue;
+
+		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
+		    PageIsolated(page)) {
+			/* raced with isolation */
+			unlock_page(page);
+			continue;
+		}
+		balloon_page_delete(page);
+		__count_vm_event(BALLOON_DEFLATE);
+		unlock_page(page);
+		list_add(&page->lru, pages);
+		if (++n_pages >= n_req_pages)
+			break;
+	}
+	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+
+	return n_pages;
+}
+EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
+
 /*
  * balloon_page_alloc - allocates a new page for insertion into the balloon
  *			  page list.
@@ -43,17 +143,9 @@  void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
 {
 	unsigned long flags;
 
-	/*
-	 * Block others from accessing the 'page' when we get around to
-	 * establishing additional references. We should be the only one
-	 * holding a reference to the 'page' at this point.
-	 */
-	BUG_ON(!trylock_page(page));
 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
-	balloon_page_insert(b_dev_info, page);
-	__count_vm_event(BALLOON_INFLATE);
+	balloon_page_enqueue_one(b_dev_info, page);
 	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
-	unlock_page(page);
 }
 EXPORT_SYMBOL_GPL(balloon_page_enqueue);
 
@@ -70,36 +162,13 @@  EXPORT_SYMBOL_GPL(balloon_page_enqueue);
  */
 struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 {
-	struct page *page, *tmp;
 	unsigned long flags;
-	bool dequeued_page;
+	LIST_HEAD(pages);
+	int n_pages;
 
-	dequeued_page = false;
-	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
-	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
-		/*
-		 * Block others from accessing the 'page' while we get around
-		 * establishing additional references and preparing the 'page'
-		 * to be released by the balloon driver.
-		 */
-		if (trylock_page(page)) {
-#ifdef CONFIG_BALLOON_COMPACTION
-			if (PageIsolated(page)) {
-				/* raced with isolation */
-				unlock_page(page);
-				continue;
-			}
-#endif
-			balloon_page_delete(page);
-			__count_vm_event(BALLOON_DEFLATE);
-			unlock_page(page);
-			dequeued_page = true;
-			break;
-		}
-	}
-	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+	n_pages = balloon_page_list_dequeue(b_dev_info, &pages, 1);
 
-	if (!dequeued_page) {
+	if (n_pages != 1) {
 		/*
 		 * If we are unable to dequeue a balloon page because the page
 		 * list is empty and there is no isolated pages, then something
@@ -112,9 +181,9 @@  struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 			     !b_dev_info->isolated_pages))
 			BUG();
 		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
-		page = NULL;
+		return NULL;
 	}
-	return page;
+	return list_first_entry(&pages, struct page, lru);
 }
 EXPORT_SYMBOL_GPL(balloon_page_dequeue);