diff mbox series

[v4,3/4] mm/page_reporting: Allow driver to specify reporting order

Message ID 20210625014710.42954-4-gshan@redhat.com (mailing list archive)
State New
Headers show
Series mm/page_reporting: Make page reporting work on arm64 with 64KB page size | expand

Commit Message

Gavin Shan June 25, 2021, 1:47 a.m. UTC
The page reporting order (threshold) is sticky to @pageblock_order
by default. The page reporting can never be triggered because the
freeing page can't come up with a free area like that huge. The
situation becomes worse when the system memory becomes heavily
fragmented.

For example, the following configurations are used on ARM64 when 64KB
base page size is enabled. In this specific case, the page reporting
won't be triggered until the freeing page comes up with a 512MB free
area. That's hard to be met, especially when the system memory becomes
heavily fragmented.

   PAGE_SIZE:          64KB
   HPAGE_SIZE:         512MB
   pageblock_order:    13       (512MB)
   MAX_ORDER:          14

This allows the drivers to specify the page reporting order when the
page reporting device is registered. It falls back to @pageblock_order
if it's not specified by the driver. The existing users (hv_balloon
and virtio_balloon) don't specify it and @pageblock_order is still
taken as their page reporting order. So this shouldn't introduce any
functional changes.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
---
 include/linux/page_reporting.h | 3 +++
 mm/page_reporting.c            | 6 ++++++
 2 files changed, 9 insertions(+)

Comments

Alexander Duyck June 25, 2021, 1:19 a.m. UTC | #1
On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote:
>
> The page reporting order (threshold) is sticky to @pageblock_order
> by default. The page reporting can never be triggered because the
> freeing page can't come up with a free area like that huge. The
> situation becomes worse when the system memory becomes heavily
> fragmented.
>
> For example, the following configurations are used on ARM64 when 64KB
> base page size is enabled. In this specific case, the page reporting
> won't be triggered until the freeing page comes up with a 512MB free
> area. That's hard to be met, especially when the system memory becomes
> heavily fragmented.
>
>    PAGE_SIZE:          64KB
>    HPAGE_SIZE:         512MB
>    pageblock_order:    13       (512MB)
>    MAX_ORDER:          14
>
> This allows the drivers to specify the page reporting order when the
> page reporting device is registered. It falls back to @pageblock_order
> if it's not specified by the driver. The existing users (hv_balloon
> and virtio_balloon) don't specify it and @pageblock_order is still
> taken as their page reporting order. So this shouldn't introduce any
> functional changes.
>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
>  include/linux/page_reporting.h | 3 +++
>  mm/page_reporting.c            | 6 ++++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
> index 3b99e0ec24f2..fe648dfa3a7c 100644
> --- a/include/linux/page_reporting.h
> +++ b/include/linux/page_reporting.h
> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>
>         /* Current state of page reporting */
>         atomic_t state;
> +
> +       /* Minimal order of page reporting */
> +       unsigned int order;
>  };
>
>  /* Tear-down and bring-up for page reporting devices */
> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
> index 34bf4d26c2c4..382958eef8a9 100644
> --- a/mm/page_reporting.c
> +++ b/mm/page_reporting.c
> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>                 goto err_out;
>         }
>
> +       /*
> +        * Update the page reporting order if it's specified by driver.
> +        * Otherwise, it falls back to @pageblock_order.
> +        */
> +       page_reporting_order = prdev->order ? : pageblock_order;
> +

An alternative to this would be to look at setting up some
comparisons. I might add another variable and do something like:
order = prdev->order ? : pageblock_order;
if (order < page_reporting_order)
    page_reporting_order = order;

You could essentially do something similar in the previous patch but
just use pageblock_order directly rather than having to add a local
variable.

That way if you need to still pull down the page reporting order you
can do so without prdev->order or pageblock_order overwriting the
value and pushing it back up.
Gavin Shan June 25, 2021, 4 a.m. UTC | #2
On 6/25/21 11:19 AM, Alexander Duyck wrote:
> On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote:
>>
>> The page reporting order (threshold) is sticky to @pageblock_order
>> by default. The page reporting can never be triggered because the
>> freeing page can't come up with a free area like that huge. The
>> situation becomes worse when the system memory becomes heavily
>> fragmented.
>>
>> For example, the following configurations are used on ARM64 when 64KB
>> base page size is enabled. In this specific case, the page reporting
>> won't be triggered until the freeing page comes up with a 512MB free
>> area. That's hard to be met, especially when the system memory becomes
>> heavily fragmented.
>>
>>     PAGE_SIZE:          64KB
>>     HPAGE_SIZE:         512MB
>>     pageblock_order:    13       (512MB)
>>     MAX_ORDER:          14
>>
>> This allows the drivers to specify the page reporting order when the
>> page reporting device is registered. It falls back to @pageblock_order
>> if it's not specified by the driver. The existing users (hv_balloon
>> and virtio_balloon) don't specify it and @pageblock_order is still
>> taken as their page reporting order. So this shouldn't introduce any
>> functional changes.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>> ---
>>   include/linux/page_reporting.h | 3 +++
>>   mm/page_reporting.c            | 6 ++++++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
>> index 3b99e0ec24f2..fe648dfa3a7c 100644
>> --- a/include/linux/page_reporting.h
>> +++ b/include/linux/page_reporting.h
>> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>>
>>          /* Current state of page reporting */
>>          atomic_t state;
>> +
>> +       /* Minimal order of page reporting */
>> +       unsigned int order;
>>   };
>>
>>   /* Tear-down and bring-up for page reporting devices */
>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>> index 34bf4d26c2c4..382958eef8a9 100644
>> --- a/mm/page_reporting.c
>> +++ b/mm/page_reporting.c
>> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>>                  goto err_out;
>>          }
>>
>> +       /*
>> +        * Update the page reporting order if it's specified by driver.
>> +        * Otherwise, it falls back to @pageblock_order.
>> +        */
>> +       page_reporting_order = prdev->order ? : pageblock_order;
>> +
> 
> An alternative to this would be to look at setting up some
> comparisons. I might add another variable and do something like:
> order = prdev->order ? : pageblock_order;
> if (order < page_reporting_order)
>      page_reporting_order = order;
> 
> You could essentially do something similar in the previous patch but
> just use pageblock_order directly rather than having to add a local
> variable.
> 
> That way if you need to still pull down the page reporting order you
> can do so without prdev->order or pageblock_order overwriting the
> value and pushing it back up.
> 

Thanks, Alex. Lets do both in v5, which will be posted shortly.

Thanks,
Gavin
Gavin Shan June 25, 2021, 4:24 a.m. UTC | #3
On 6/25/21 2:00 PM, Gavin Shan wrote:
> On 6/25/21 11:19 AM, Alexander Duyck wrote:
>> On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote:
>>>
>>> The page reporting order (threshold) is sticky to @pageblock_order
>>> by default. The page reporting can never be triggered because the
>>> freeing page can't come up with a free area like that huge. The
>>> situation becomes worse when the system memory becomes heavily
>>> fragmented.
>>>
>>> For example, the following configurations are used on ARM64 when 64KB
>>> base page size is enabled. In this specific case, the page reporting
>>> won't be triggered until the freeing page comes up with a 512MB free
>>> area. That's hard to be met, especially when the system memory becomes
>>> heavily fragmented.
>>>
>>>     PAGE_SIZE:          64KB
>>>     HPAGE_SIZE:         512MB
>>>     pageblock_order:    13       (512MB)
>>>     MAX_ORDER:          14
>>>
>>> This allows the drivers to specify the page reporting order when the
>>> page reporting device is registered. It falls back to @pageblock_order
>>> if it's not specified by the driver. The existing users (hv_balloon
>>> and virtio_balloon) don't specify it and @pageblock_order is still
>>> taken as their page reporting order. So this shouldn't introduce any
>>> functional changes.
>>>
>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>>> ---
>>>   include/linux/page_reporting.h | 3 +++
>>>   mm/page_reporting.c            | 6 ++++++
>>>   2 files changed, 9 insertions(+)
>>>
>>> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
>>> index 3b99e0ec24f2..fe648dfa3a7c 100644
>>> --- a/include/linux/page_reporting.h
>>> +++ b/include/linux/page_reporting.h
>>> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>>>
>>>          /* Current state of page reporting */
>>>          atomic_t state;
>>> +
>>> +       /* Minimal order of page reporting */
>>> +       unsigned int order;
>>>   };
>>>
>>>   /* Tear-down and bring-up for page reporting devices */
>>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>>> index 34bf4d26c2c4..382958eef8a9 100644
>>> --- a/mm/page_reporting.c
>>> +++ b/mm/page_reporting.c
>>> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>>>                  goto err_out;
>>>          }
>>>
>>> +       /*
>>> +        * Update the page reporting order if it's specified by driver.
>>> +        * Otherwise, it falls back to @pageblock_order.
>>> +        */
>>> +       page_reporting_order = prdev->order ? : pageblock_order;
>>> +
>>
>> An alternative to this would be to look at setting up some
>> comparisons. I might add another variable and do something like:
>> order = prdev->order ? : pageblock_order;
>> if (order < page_reporting_order)
>>      page_reporting_order = order;
>>
>> You could essentially do something similar in the previous patch but
>> just use pageblock_order directly rather than having to add a local
>> variable.
>>
>> That way if you need to still pull down the page reporting order you
>> can do so without prdev->order or pageblock_order overwriting the
>> value and pushing it back up.
>>
> 
> Thanks, Alex. Lets do both in v5, which will be posted shortly.
> 

Alex, I just posted v5 to have the checks you suggested. Could
you help to have a quick scan. It's pointless to let Andrew
drop the patches and apply the last one again :)

Thanks,
Gavin
Michael S. Tsirkin June 25, 2021, 5:48 a.m. UTC | #4
On Fri, Jun 25, 2021 at 09:47:09AM +0800, Gavin Shan wrote:
> The page reporting order (threshold) is sticky to @pageblock_order
> by default. The page reporting can never be triggered because the
> freeing page can't come up with a free area like that huge. The
> situation becomes worse when the system memory becomes heavily
> fragmented.
> 
> For example, the following configurations are used on ARM64 when 64KB
> base page size is enabled. In this specific case, the page reporting
> won't be triggered until the freeing page comes up with a 512MB free
> area. That's hard to be met, especially when the system memory becomes
> heavily fragmented.
> 
>    PAGE_SIZE:          64KB
>    HPAGE_SIZE:         512MB
>    pageblock_order:    13       (512MB)
>    MAX_ORDER:          14
> 
> This allows the drivers to specify the page reporting order when the
> page reporting device is registered. It falls back to @pageblock_order
> if it's not specified by the driver. The existing users (hv_balloon
> and virtio_balloon) don't specify it and @pageblock_order is still
> taken as their page reporting order. So this shouldn't introduce any
> functional changes.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
>  include/linux/page_reporting.h | 3 +++
>  mm/page_reporting.c            | 6 ++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
> index 3b99e0ec24f2..fe648dfa3a7c 100644
> --- a/include/linux/page_reporting.h
> +++ b/include/linux/page_reporting.h
> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>  
>  	/* Current state of page reporting */
>  	atomic_t state;
> +
> +	/* Minimal order of page reporting */
> +	unsigned int order;
>  };
>  
>  /* Tear-down and bring-up for page reporting devices */
> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
> index 34bf4d26c2c4..382958eef8a9 100644
> --- a/mm/page_reporting.c
> +++ b/mm/page_reporting.c
> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>  		goto err_out;
>  	}
>  
> +	/*
> +	 * Update the page reporting order if it's specified by driver.
> +	 * Otherwise, it falls back to @pageblock_order.
> +	 */
> +	page_reporting_order = prdev->order ? : pageblock_order;
> +

Hmm. So on ARM achitectures with 64K pages, the command line parameter
is silently ignored?

Isn't this a problem?

>  	/* initialize state and work structures */
>  	atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
>  	INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);
> -- 
> 2.23.0
Gavin Shan June 25, 2021, 6:04 a.m. UTC | #5
On 6/25/21 3:48 PM, Michael S. Tsirkin wrote:
> On Fri, Jun 25, 2021 at 09:47:09AM +0800, Gavin Shan wrote:
>> The page reporting order (threshold) is sticky to @pageblock_order
>> by default. The page reporting can never be triggered because the
>> freeing page can't come up with a free area like that huge. The
>> situation becomes worse when the system memory becomes heavily
>> fragmented.
>>
>> For example, the following configurations are used on ARM64 when 64KB
>> base page size is enabled. In this specific case, the page reporting
>> won't be triggered until the freeing page comes up with a 512MB free
>> area. That's hard to be met, especially when the system memory becomes
>> heavily fragmented.
>>
>>     PAGE_SIZE:          64KB
>>     HPAGE_SIZE:         512MB
>>     pageblock_order:    13       (512MB)
>>     MAX_ORDER:          14
>>
>> This allows the drivers to specify the page reporting order when the
>> page reporting device is registered. It falls back to @pageblock_order
>> if it's not specified by the driver. The existing users (hv_balloon
>> and virtio_balloon) don't specify it and @pageblock_order is still
>> taken as their page reporting order. So this shouldn't introduce any
>> functional changes.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>> ---
>>   include/linux/page_reporting.h | 3 +++
>>   mm/page_reporting.c            | 6 ++++++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
>> index 3b99e0ec24f2..fe648dfa3a7c 100644
>> --- a/include/linux/page_reporting.h
>> +++ b/include/linux/page_reporting.h
>> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>>   
>>   	/* Current state of page reporting */
>>   	atomic_t state;
>> +
>> +	/* Minimal order of page reporting */
>> +	unsigned int order;
>>   };
>>   
>>   /* Tear-down and bring-up for page reporting devices */
>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>> index 34bf4d26c2c4..382958eef8a9 100644
>> --- a/mm/page_reporting.c
>> +++ b/mm/page_reporting.c
>> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>>   		goto err_out;
>>   	}
>>   
>> +	/*
>> +	 * Update the page reporting order if it's specified by driver.
>> +	 * Otherwise, it falls back to @pageblock_order.
>> +	 */
>> +	page_reporting_order = prdev->order ? : pageblock_order;
>> +
> 
> Hmm. So on ARM achitectures with 64K pages, the command line parameter
> is silently ignored?
> 
> Isn't this a problem?
> 

It's fine as the command line parameter is used as debugging purpose.
Besides, it can be changed by writing to the following file:

    /sys/module/page_reporting/parameters/page_reporting_order


>>   	/* initialize state and work structures */
>>   	atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
>>   	INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);
>> -- 
>> 2.23.0

Thanks,
Gavin
diff mbox series

Patch

diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index 3b99e0ec24f2..fe648dfa3a7c 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -18,6 +18,9 @@  struct page_reporting_dev_info {
 
 	/* Current state of page reporting */
 	atomic_t state;
+
+	/* Minimal order of page reporting */
+	unsigned int order;
 };
 
 /* Tear-down and bring-up for page reporting devices */
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 34bf4d26c2c4..382958eef8a9 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -329,6 +329,12 @@  int page_reporting_register(struct page_reporting_dev_info *prdev)
 		goto err_out;
 	}
 
+	/*
+	 * Update the page reporting order if it's specified by driver.
+	 * Otherwise, it falls back to @pageblock_order.
+	 */
+	page_reporting_order = prdev->order ? : pageblock_order;
+
 	/* initialize state and work structures */
 	atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
 	INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);