Message ID | 20210625014710.42954-4-gshan@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/page_reporting: Make page reporting work on arm64 with 64KB page size | expand |
On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote: > > The page reporting order (threshold) is sticky to @pageblock_order > by default. The page reporting can never be triggered because the > freeing page can't come up with a free area like that huge. The > situation becomes worse when the system memory becomes heavily > fragmented. > > For example, the following configurations are used on ARM64 when 64KB > base page size is enabled. In this specific case, the page reporting > won't be triggered until the freeing page comes up with a 512MB free > area. That's hard to be met, especially when the system memory becomes > heavily fragmented. > > PAGE_SIZE: 64KB > HPAGE_SIZE: 512MB > pageblock_order: 13 (512MB) > MAX_ORDER: 14 > > This allows the drivers to specify the page reporting order when the > page reporting device is registered. It falls back to @pageblock_order > if it's not specified by the driver. The existing users (hv_balloon > and virtio_balloon) don't specify it and @pageblock_order is still > taken as their page reporting order. So this shouldn't introduce any > functional changes. > > Signed-off-by: Gavin Shan <gshan@redhat.com> > Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> > --- > include/linux/page_reporting.h | 3 +++ > mm/page_reporting.c | 6 ++++++ > 2 files changed, 9 insertions(+) > > diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h > index 3b99e0ec24f2..fe648dfa3a7c 100644 > --- a/include/linux/page_reporting.h > +++ b/include/linux/page_reporting.h > @@ -18,6 +18,9 @@ struct page_reporting_dev_info { > > /* Current state of page reporting */ > atomic_t state; > + > + /* Minimal order of page reporting */ > + unsigned int order; > }; > > /* Tear-down and bring-up for page reporting devices */ > diff --git a/mm/page_reporting.c b/mm/page_reporting.c > index 34bf4d26c2c4..382958eef8a9 100644 > --- a/mm/page_reporting.c > +++ b/mm/page_reporting.c > @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev) > goto err_out; > } > > + /* > + * Update the page reporting order if it's specified by driver. > + * Otherwise, it falls back to @pageblock_order. > + */ > + page_reporting_order = prdev->order ? : pageblock_order; > + An alternative to this would be to look at setting up some comparisons. I might add another variable and do something like: order = prdev->order ? : pageblock_order; if (order < page_reporting_order) page_reporting_order = order; You could essentially do something similar in the previous patch but just use pageblock_order directly rather than having to add a local variable. That way if you need to still pull down the page reporting order you can do so without prdev->order or pageblock_order overwriting the value and pushing it back up.
On 6/25/21 11:19 AM, Alexander Duyck wrote: > On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote: >> >> The page reporting order (threshold) is sticky to @pageblock_order >> by default. The page reporting can never be triggered because the >> freeing page can't come up with a free area like that huge. The >> situation becomes worse when the system memory becomes heavily >> fragmented. >> >> For example, the following configurations are used on ARM64 when 64KB >> base page size is enabled. In this specific case, the page reporting >> won't be triggered until the freeing page comes up with a 512MB free >> area. That's hard to be met, especially when the system memory becomes >> heavily fragmented. >> >> PAGE_SIZE: 64KB >> HPAGE_SIZE: 512MB >> pageblock_order: 13 (512MB) >> MAX_ORDER: 14 >> >> This allows the drivers to specify the page reporting order when the >> page reporting device is registered. It falls back to @pageblock_order >> if it's not specified by the driver. The existing users (hv_balloon >> and virtio_balloon) don't specify it and @pageblock_order is still >> taken as their page reporting order. So this shouldn't introduce any >> functional changes. >> >> Signed-off-by: Gavin Shan <gshan@redhat.com> >> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> >> --- >> include/linux/page_reporting.h | 3 +++ >> mm/page_reporting.c | 6 ++++++ >> 2 files changed, 9 insertions(+) >> >> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h >> index 3b99e0ec24f2..fe648dfa3a7c 100644 >> --- a/include/linux/page_reporting.h >> +++ b/include/linux/page_reporting.h >> @@ -18,6 +18,9 @@ struct page_reporting_dev_info { >> >> /* Current state of page reporting */ >> atomic_t state; >> + >> + /* Minimal order of page reporting */ >> + unsigned int order; >> }; >> >> /* Tear-down and bring-up for page reporting devices */ >> diff --git a/mm/page_reporting.c b/mm/page_reporting.c >> index 34bf4d26c2c4..382958eef8a9 100644 >> --- a/mm/page_reporting.c >> +++ b/mm/page_reporting.c >> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev) >> goto err_out; >> } >> >> + /* >> + * Update the page reporting order if it's specified by driver. >> + * Otherwise, it falls back to @pageblock_order. >> + */ >> + page_reporting_order = prdev->order ? : pageblock_order; >> + > > An alternative to this would be to look at setting up some > comparisons. I might add another variable and do something like: > order = prdev->order ? : pageblock_order; > if (order < page_reporting_order) > page_reporting_order = order; > > You could essentially do something similar in the previous patch but > just use pageblock_order directly rather than having to add a local > variable. > > That way if you need to still pull down the page reporting order you > can do so without prdev->order or pageblock_order overwriting the > value and pushing it back up. > Thanks, Alex. Lets do both in v5, which will be posted shortly. Thanks, Gavin
On 6/25/21 2:00 PM, Gavin Shan wrote: > On 6/25/21 11:19 AM, Alexander Duyck wrote: >> On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote: >>> >>> The page reporting order (threshold) is sticky to @pageblock_order >>> by default. The page reporting can never be triggered because the >>> freeing page can't come up with a free area like that huge. The >>> situation becomes worse when the system memory becomes heavily >>> fragmented. >>> >>> For example, the following configurations are used on ARM64 when 64KB >>> base page size is enabled. In this specific case, the page reporting >>> won't be triggered until the freeing page comes up with a 512MB free >>> area. That's hard to be met, especially when the system memory becomes >>> heavily fragmented. >>> >>> PAGE_SIZE: 64KB >>> HPAGE_SIZE: 512MB >>> pageblock_order: 13 (512MB) >>> MAX_ORDER: 14 >>> >>> This allows the drivers to specify the page reporting order when the >>> page reporting device is registered. It falls back to @pageblock_order >>> if it's not specified by the driver. The existing users (hv_balloon >>> and virtio_balloon) don't specify it and @pageblock_order is still >>> taken as their page reporting order. So this shouldn't introduce any >>> functional changes. >>> >>> Signed-off-by: Gavin Shan <gshan@redhat.com> >>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> >>> --- >>> include/linux/page_reporting.h | 3 +++ >>> mm/page_reporting.c | 6 ++++++ >>> 2 files changed, 9 insertions(+) >>> >>> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h >>> index 3b99e0ec24f2..fe648dfa3a7c 100644 >>> --- a/include/linux/page_reporting.h >>> +++ b/include/linux/page_reporting.h >>> @@ -18,6 +18,9 @@ struct page_reporting_dev_info { >>> >>> /* Current state of page reporting */ >>> atomic_t state; >>> + >>> + /* Minimal order of page reporting */ >>> + unsigned int order; >>> }; >>> >>> /* Tear-down and bring-up for page reporting devices */ >>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c >>> index 34bf4d26c2c4..382958eef8a9 100644 >>> --- a/mm/page_reporting.c >>> +++ b/mm/page_reporting.c >>> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev) >>> goto err_out; >>> } >>> >>> + /* >>> + * Update the page reporting order if it's specified by driver. >>> + * Otherwise, it falls back to @pageblock_order. >>> + */ >>> + page_reporting_order = prdev->order ? : pageblock_order; >>> + >> >> An alternative to this would be to look at setting up some >> comparisons. I might add another variable and do something like: >> order = prdev->order ? : pageblock_order; >> if (order < page_reporting_order) >> page_reporting_order = order; >> >> You could essentially do something similar in the previous patch but >> just use pageblock_order directly rather than having to add a local >> variable. >> >> That way if you need to still pull down the page reporting order you >> can do so without prdev->order or pageblock_order overwriting the >> value and pushing it back up. >> > > Thanks, Alex. Lets do both in v5, which will be posted shortly. > Alex, I just posted v5 to have the checks you suggested. Could you help to have a quick scan. It's pointless to let Andrew drop the patches and apply the last one again :) Thanks, Gavin
On Fri, Jun 25, 2021 at 09:47:09AM +0800, Gavin Shan wrote: > The page reporting order (threshold) is sticky to @pageblock_order > by default. The page reporting can never be triggered because the > freeing page can't come up with a free area like that huge. The > situation becomes worse when the system memory becomes heavily > fragmented. > > For example, the following configurations are used on ARM64 when 64KB > base page size is enabled. In this specific case, the page reporting > won't be triggered until the freeing page comes up with a 512MB free > area. That's hard to be met, especially when the system memory becomes > heavily fragmented. > > PAGE_SIZE: 64KB > HPAGE_SIZE: 512MB > pageblock_order: 13 (512MB) > MAX_ORDER: 14 > > This allows the drivers to specify the page reporting order when the > page reporting device is registered. It falls back to @pageblock_order > if it's not specified by the driver. The existing users (hv_balloon > and virtio_balloon) don't specify it and @pageblock_order is still > taken as their page reporting order. So this shouldn't introduce any > functional changes. > > Signed-off-by: Gavin Shan <gshan@redhat.com> > Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> > --- > include/linux/page_reporting.h | 3 +++ > mm/page_reporting.c | 6 ++++++ > 2 files changed, 9 insertions(+) > > diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h > index 3b99e0ec24f2..fe648dfa3a7c 100644 > --- a/include/linux/page_reporting.h > +++ b/include/linux/page_reporting.h > @@ -18,6 +18,9 @@ struct page_reporting_dev_info { > > /* Current state of page reporting */ > atomic_t state; > + > + /* Minimal order of page reporting */ > + unsigned int order; > }; > > /* Tear-down and bring-up for page reporting devices */ > diff --git a/mm/page_reporting.c b/mm/page_reporting.c > index 34bf4d26c2c4..382958eef8a9 100644 > --- a/mm/page_reporting.c > +++ b/mm/page_reporting.c > @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev) > goto err_out; > } > > + /* > + * Update the page reporting order if it's specified by driver. > + * Otherwise, it falls back to @pageblock_order. > + */ > + page_reporting_order = prdev->order ? : pageblock_order; > + Hmm. So on ARM achitectures with 64K pages, the command line parameter is silently ignored? Isn't this a problem? > /* initialize state and work structures */ > atomic_set(&prdev->state, PAGE_REPORTING_IDLE); > INIT_DELAYED_WORK(&prdev->work, &page_reporting_process); > -- > 2.23.0
On 6/25/21 3:48 PM, Michael S. Tsirkin wrote: > On Fri, Jun 25, 2021 at 09:47:09AM +0800, Gavin Shan wrote: >> The page reporting order (threshold) is sticky to @pageblock_order >> by default. The page reporting can never be triggered because the >> freeing page can't come up with a free area like that huge. The >> situation becomes worse when the system memory becomes heavily >> fragmented. >> >> For example, the following configurations are used on ARM64 when 64KB >> base page size is enabled. In this specific case, the page reporting >> won't be triggered until the freeing page comes up with a 512MB free >> area. That's hard to be met, especially when the system memory becomes >> heavily fragmented. >> >> PAGE_SIZE: 64KB >> HPAGE_SIZE: 512MB >> pageblock_order: 13 (512MB) >> MAX_ORDER: 14 >> >> This allows the drivers to specify the page reporting order when the >> page reporting device is registered. It falls back to @pageblock_order >> if it's not specified by the driver. The existing users (hv_balloon >> and virtio_balloon) don't specify it and @pageblock_order is still >> taken as their page reporting order. So this shouldn't introduce any >> functional changes. >> >> Signed-off-by: Gavin Shan <gshan@redhat.com> >> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> >> --- >> include/linux/page_reporting.h | 3 +++ >> mm/page_reporting.c | 6 ++++++ >> 2 files changed, 9 insertions(+) >> >> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h >> index 3b99e0ec24f2..fe648dfa3a7c 100644 >> --- a/include/linux/page_reporting.h >> +++ b/include/linux/page_reporting.h >> @@ -18,6 +18,9 @@ struct page_reporting_dev_info { >> >> /* Current state of page reporting */ >> atomic_t state; >> + >> + /* Minimal order of page reporting */ >> + unsigned int order; >> }; >> >> /* Tear-down and bring-up for page reporting devices */ >> diff --git a/mm/page_reporting.c b/mm/page_reporting.c >> index 34bf4d26c2c4..382958eef8a9 100644 >> --- a/mm/page_reporting.c >> +++ b/mm/page_reporting.c >> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev) >> goto err_out; >> } >> >> + /* >> + * Update the page reporting order if it's specified by driver. >> + * Otherwise, it falls back to @pageblock_order. >> + */ >> + page_reporting_order = prdev->order ? : pageblock_order; >> + > > Hmm. So on ARM achitectures with 64K pages, the command line parameter > is silently ignored? > > Isn't this a problem? > It's fine as the command line parameter is used as debugging purpose. Besides, it can be changed by writing to the following file: /sys/module/page_reporting/parameters/page_reporting_order >> /* initialize state and work structures */ >> atomic_set(&prdev->state, PAGE_REPORTING_IDLE); >> INIT_DELAYED_WORK(&prdev->work, &page_reporting_process); >> -- >> 2.23.0 Thanks, Gavin
diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h index 3b99e0ec24f2..fe648dfa3a7c 100644 --- a/include/linux/page_reporting.h +++ b/include/linux/page_reporting.h @@ -18,6 +18,9 @@ struct page_reporting_dev_info { /* Current state of page reporting */ atomic_t state; + + /* Minimal order of page reporting */ + unsigned int order; }; /* Tear-down and bring-up for page reporting devices */ diff --git a/mm/page_reporting.c b/mm/page_reporting.c index 34bf4d26c2c4..382958eef8a9 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev) goto err_out; } + /* + * Update the page reporting order if it's specified by driver. + * Otherwise, it falls back to @pageblock_order. + */ + page_reporting_order = prdev->order ? : pageblock_order; + /* initialize state and work structures */ atomic_set(&prdev->state, PAGE_REPORTING_IDLE); INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);