mm/thp: decrease nr_thps in file's mapping on THP split

Message ID 20211012120237.2600-1-m.szyprowski@samsung.com (mailing list archive)
State New

Commit Message

Marek Szyprowski Oct. 12, 2021, 12:02 p.m. UTC
Decrease the nr_thps counter in the file's mapping to ensure that the
page cache won't be dropped excessively on file write access if the
page has already been split.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: 09d91cda0e82 ("mm,thp: avoid writes to file with THP in pagecache")
Fixes: 06d3eff62d9d ("mm/thp: fix node page state in split_huge_page_to_list()")
---
I've analyzed the code a few times but either I missed something or the
nr_thps counter is not decremented during the THP split on non-shmem file
pages.
---
 mm/huge_memory.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Matthew Wilcox Oct. 12, 2021, 12:43 p.m. UTC | #1
On Tue, Oct 12, 2021 at 02:02:37PM +0200, Marek Szyprowski wrote:
> Decrease the nr_thps counter in the file's mapping to ensure that the
> page cache won't be dropped excessively on file write access if the
> page has already been split.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Fixes: 09d91cda0e82 ("mm,thp: avoid writes to file with THP in pagecache")
> Fixes: 06d3eff62d9d ("mm/thp: fix node page state in split_huge_page_to_list()")
> ---
> I've analyzed the code a few times but either I missed something or the
> nr_thps counter is not decremented during the THP split on non-shmem file
> pages.

This looks OK to me, but have you tested it?  If so, what workload did
you use?  The way you wrote this changelog makes it sound like you only
read the code and there have been rather too many bugs introduced recently
that way :-(
Marek Szyprowski Oct. 13, 2021, 10:47 a.m. UTC | #2
On 12.10.2021 14:43, Matthew Wilcox wrote:
> On Tue, Oct 12, 2021 at 02:02:37PM +0200, Marek Szyprowski wrote:
>> Decrease the nr_thps counter in the file's mapping to ensure that the
>> page cache won't be dropped excessively on file write access if the
>> page has already been split.
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> Fixes: 09d91cda0e82 ("mm,thp: avoid writes to file with THP in pagecache")
>> Fixes: 06d3eff62d9d ("mm/thp: fix node page state in split_huge_page_to_list()")
>> ---
>> I've analyzed the code a few times but either I missed something or the
>> nr_thps counter is not decremented during the THP split on non-shmem file
>> pages.
> This looks OK to me, but have you tested it?  If so, what workload did
> you use?  The way you wrote this changelog makes it sound like you only
> read the code and there have been rather too many bugs introduced recently
> that way :-(

Well, indeed I found it while reading the code. However, I've just 
tried a test scenario: one runs a big binary, the kernel remaps it 
with THPs, then one forces a THP split with 
/sys/kernel/debug/split_huge_pages. On any further open of that 
binary with O_RDWR or O_WRONLY, the kernel drops its page cache 
because of the non-zero nr_thps counter.

Best regards
Matthew Wilcox Oct. 13, 2021, 12:01 p.m. UTC | #3
On Wed, Oct 13, 2021 at 12:47:03PM +0200, Marek Szyprowski wrote:
> On 12.10.2021 14:43, Matthew Wilcox wrote:
> > On Tue, Oct 12, 2021 at 02:02:37PM +0200, Marek Szyprowski wrote:
> >> Decrease the nr_thps counter in the file's mapping to ensure that the
> >> page cache won't be dropped excessively on file write access if the
> >> page has already been split.
> >>
> >> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >> Fixes: 09d91cda0e82 ("mm,thp: avoid writes to file with THP in pagecache")
> >> Fixes: 06d3eff62d9d ("mm/thp: fix node page state in split_huge_page_to_list()")
> >> ---
> >> I've analyzed the code a few times but either I missed something or the
> >> nr_thps counter is not decremented during the THP split on non-shmem file
> >> pages.
> > This looks OK to me, but have you tested it?  If so, what workload did
> > you use?  The way you wrote this changelog makes it sound like you only
> > read the code and there have been rather too many bugs introduced recently
> > that way :-(
> 
> Well, indeed I found it while reading the code. However, I've just 
> tried a test scenario: one runs a big binary, the kernel remaps it 
> with THPs, then one forces a THP split with 
> /sys/kernel/debug/split_huge_pages. On any further open of that 
> binary with O_RDWR or O_WRONLY, the kernel drops its page cache 
> because of the non-zero nr_thps counter.

... and with this patch, it no longer happens?  Good enough for me!

Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Yang Shi Oct. 13, 2021, 9:44 p.m. UTC | #4
On Tue, Oct 12, 2021 at 5:03 AM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> Decrease the nr_thps counter in the file's mapping to ensure that the
> page cache won't be dropped excessively on file write access if the
> page has already been split.
>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Fixes: 09d91cda0e82 ("mm,thp: avoid writes to file with THP in pagecache")
> Fixes: 06d3eff62d9d ("mm/thp: fix node page state in split_huge_page_to_list()")

Reviewed-by: Yang Shi <shy828301@gmail.com>

> ---
> I've analyzed the code a few times but either I missed something or the
> nr_thps counter is not decremented during the THP split on non-shmem file
> pages.
> ---
>  mm/huge_memory.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ec2bb93f7431..a6c2ba59abcd 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2709,10 +2709,12 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>                 }
>                 spin_unlock(&ds_queue->split_queue_lock);
>                 if (mapping) {
> -                       if (PageSwapBacked(head))
> +                       if (PageSwapBacked(head)) {
>                                 __dec_node_page_state(head, NR_SHMEM_THPS);
> -                       else
> +                       } else {
>                                 __dec_node_page_state(head, NR_FILE_THPS);
> +                               filemap_nr_thps_dec(mapping);
> +                       }
>                 }
>
>                 __split_huge_page(page, list, end, flags);
> --
> 2.17.1
>
>
Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ec2bb93f7431..a6c2ba59abcd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2709,10 +2709,12 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		}
 		spin_unlock(&ds_queue->split_queue_lock);
 		if (mapping) {
-			if (PageSwapBacked(head))
+			if (PageSwapBacked(head)) {
 				__dec_node_page_state(head, NR_SHMEM_THPS);
-			else
+			} else {
 				__dec_node_page_state(head, NR_FILE_THPS);
+				filemap_nr_thps_dec(mapping);
+			}
 		}
 
 		__split_huge_page(page, list, end, flags);