Message ID: 20200828142546.GN14765@casper.infradead.org (mailing list archive)
State: New, archived
Series: Is shmem page accounting wrong on split?
On Fri, Aug 28, 2020 at 03:25:46PM +0100, Matthew Wilcox wrote:
> If I understand truncate of a shmem THP correctly ...
>
> Let's suppose the file has a single 2MB page at index 0, and is being
> truncated down to 7 bytes in size.
>
> shmem_setattr()
>   i_size_write(7);
>   shmem_truncate_range(7, -1);
>     shmem_undo_range(7, -1)
>       start = 1;
>       page = &head[1];
>       shmem_punch_compound();
>         split_huge_page()
>           end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);  # == 1
>           __split_huge_page(..., 1, ...);
>             __delete_from_page_cache(&head[1], ...);
>       truncate_inode_page(page);
>         delete_from_page_cache(page)
>           __delete_from_page_cache(&head[1])
>
> I think the solution is to call truncate_inode_page() from within
> shmem_punch_compound() if we don't call split_huge_page().  I came across
> this while reusing all this infrastructure for the XFS THP patchset,
> so I'm not in a great position to test this patch.

Oh, this works for truncate, but not hole-punch.  __split_huge_page()
won't call __delete_from_page_cache() for pages below the end of the
file.  So maybe this instead?

It's a bit cheesy ... maybe split_huge_page() could return 1 to indicate
that it actually disposed of the page passed in?

+++ b/mm/shmem.c
@@ -827,7 +827,7 @@ static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
 		return true;
 
 	/* Try to split huge page, so we can truly punch the hole or truncate */
-	return split_huge_page(page) >= 0;
+	return split_huge_page(page) >= 0 && end < -1;
 }
 
 /*
On Fri, Aug 28, 2020 at 7:55 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Aug 28, 2020 at 03:25:46PM +0100, Matthew Wilcox wrote:
> > If I understand truncate of a shmem THP correctly ...
> >
> > Let's suppose the file has a single 2MB page at index 0, and is being
> > truncated down to 7 bytes in size.
> >
> > shmem_setattr()
> >   i_size_write(7);
> >   shmem_truncate_range(7, -1);
> >     shmem_undo_range(7, -1)
> >       start = 1;
> >       page = &head[1];
> >       shmem_punch_compound();
> >         split_huge_page()
> >           end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);  # == 1
> >           __split_huge_page(..., 1, ...);
> >             __delete_from_page_cache(&head[1], ...);
> >       truncate_inode_page(page);
> >         delete_from_page_cache(page)
> >           __delete_from_page_cache(&head[1])
> >
> > I think the solution is to call truncate_inode_page() from within
> > shmem_punch_compound() if we don't call split_huge_page().  I came across
> > this while reusing all this infrastructure for the XFS THP patchset,
> > so I'm not in a great position to test this patch.
>
> Oh, this works for truncate, but not hole-punch.  __split_huge_page()
> won't call __delete_from_page_cache() for pages below the end of the
> file.  So maybe this instead?
>
> It's a bit cheesy ... maybe split_huge_page() could return 1 to indicate
> that it actually disposed of the page passed in?

I'm fine to have split_huge_page() return 1.

>
> +++ b/mm/shmem.c
> @@ -827,7 +827,7 @@ static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
>  		return true;
>
>  	/* Try to split huge page, so we can truly punch the hole or truncate */
> -	return split_huge_page(page) >= 0;
> +	return split_huge_page(page) >= 0 && end < -1;

It would be more clear if we could have some comment about what "-1"
means. It took me a little while to understand the magic number, but
once I understood it it looks more straightforward to me.

> }
>
> /*
>
On Fri, 28 Aug 2020, Yang Shi wrote:
> On Fri, Aug 28, 2020 at 7:55 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Aug 28, 2020 at 03:25:46PM +0100, Matthew Wilcox wrote:
> > > If I understand truncate of a shmem THP correctly ...
> > >
> > > Let's suppose the file has a single 2MB page at index 0, and is being
> > > truncated down to 7 bytes in size.
> > >
> > > shmem_setattr()
> > >   i_size_write(7);
> > >   shmem_truncate_range(7, -1);
> > >     shmem_undo_range(7, -1)
> > >       start = 1;
> > >       page = &head[1];
> > >       shmem_punch_compound();
> > >         split_huge_page()
> > >           end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);  # == 1
> > >           __split_huge_page(..., 1, ...);
> > >             __delete_from_page_cache(&head[1], ...);
> > >       truncate_inode_page(page);
> > >         delete_from_page_cache(page)
> > >           __delete_from_page_cache(&head[1])
> > >
> > > I think the solution is to call truncate_inode_page() from within
> > > shmem_punch_compound() if we don't call split_huge_page().  I came across
> > > this while reusing all this infrastructure for the XFS THP patchset,
> > > so I'm not in a great position to test this patch.

It's a good observation of an oddity that I probably didn't think of,
but you haven't said which kind of shmem page accounting goes wrong here
(vm_enough_memory? df of filesystem? du of filesystem? memcg charge?
all of the above? observed in practice?), and what needs solving.

If that page has already been deleted from page cache when splitting,
truncate_inode_page() sees NULL page->mapping != mapping and returns
without doing anything.  What's the problem?

Hugh

> >
> > Oh, this works for truncate, but not hole-punch.  __split_huge_page()
> > won't call __delete_from_page_cache() for pages below the end of the
> > file.  So maybe this instead?
> >
> > It's a bit cheesy ... maybe split_huge_page() could return 1 to indicate
> > that it actually disposed of the page passed in?
>
> I'm fine to have split_huge_page() return 1.
>
> >
> > +++ b/mm/shmem.c
> > @@ -827,7 +827,7 @@ static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
> >  		return true;
> >
> >  	/* Try to split huge page, so we can truly punch the hole or truncate */
> > -	return split_huge_page(page) >= 0;
> > +	return split_huge_page(page) >= 0 && end < -1;
>
> It would be more clear if we could have some comment about what "-1"
> means. It took me a little while to understand the magic number, but
> once I understood it it looks more straightforward to me.
>
> > }
> >
> > /*
On Fri, Aug 28, 2020 at 10:08:52AM -0700, Hugh Dickins wrote:
> On Fri, 28 Aug 2020, Yang Shi wrote:
> > On Fri, Aug 28, 2020 at 7:55 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Aug 28, 2020 at 03:25:46PM +0100, Matthew Wilcox wrote:
> > > > If I understand truncate of a shmem THP correctly ...
> > > >
> > > > Let's suppose the file has a single 2MB page at index 0, and is being
> > > > truncated down to 7 bytes in size.
> > > >
> > > > shmem_setattr()
> > > >   i_size_write(7);
> > > >   shmem_truncate_range(7, -1);
> > > >     shmem_undo_range(7, -1)
> > > >       start = 1;
> > > >       page = &head[1];
> > > >       shmem_punch_compound();
> > > >         split_huge_page()
> > > >           end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);  # == 1
> > > >           __split_huge_page(..., 1, ...);
> > > >             __delete_from_page_cache(&head[1], ...);
> > > >       truncate_inode_page(page);
> > > >         delete_from_page_cache(page)
> > > >           __delete_from_page_cache(&head[1])
> > > >
> > > > I think the solution is to call truncate_inode_page() from within
> > > > shmem_punch_compound() if we don't call split_huge_page().  I came across
> > > > this while reusing all this infrastructure for the XFS THP patchset,
> > > > so I'm not in a great position to test this patch.
>
> It's a good observation of an oddity that I probably didn't think of,
> but you haven't said which kind of shmem page accounting goes wrong here
> (vm_enough_memory? df of filesystem? du of filesystem? memcg charge?
> all of the above? observed in practice?), and what needs solving.
>
> If that page has already been deleted from page cache when splitting,
> truncate_inode_page() sees NULL page->mapping != mapping and returns
> without doing anything.  What's the problem?

Ah!  I missed the check in truncate_inode_page().  This should be fine
then.

The problem I've observed in practice is following the same pattern in
truncate_inode_pages_range().  The call to delete_from_page_cache_batch()
trips the assertion that the page hasn't already been deleted from the
page cache.  I think the solution is obvious -- don't add the page to
locked_pvec if page->mapping is NULL.

			if (thp_punch(page, lstart, lend))
				pagevec_add(&locked_pvec, page);
			else
				unlock_page(page);
		}
		for (i = 0; i < pagevec_count(&locked_pvec); i++)
			truncate_cleanup_page(mapping, locked_pvec.pages[i]);
		delete_from_page_cache_batch(mapping, &locked_pvec);
		for (i = 0; i < pagevec_count(&locked_pvec); i++)
			unlock_page(locked_pvec.pages[i]);
		truncate_exceptional_pvec_entries(mapping, &pvec, indices);
		pagevec_release(&pvec);

(shmem_punch_compound() got renamed to thp_punch() and moved to truncate.c)
On Fri, Aug 28, 2020 at 06:31:22PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 28, 2020 at 10:08:52AM -0700, Hugh Dickins wrote:
> > It's a good observation of an oddity that I probably didn't think of,
> > but you haven't said which kind of shmem page accounting goes wrong here
> > (vm_enough_memory? df of filesystem? du of filesystem? memcg charge?
> > all of the above? observed in practice?), and what needs solving.

Oh, I forgot to say which

> The problem I've observed in practice is following the same pattern in
> truncate_inode_pages_range().  The call to delete_from_page_cache_batch()
> trips the assertion that the page hasn't already been deleted from the
> page cache.  I think the solution is obvious -- don't add the page to
> locked_pvec if page->mapping is NULL.

Here's the change I'm currently testing.  It's survived about eight
minutes of xfstests so far, which is far longer than it was surviving
before.

-	/* Try to split huge page, so we can truly punch the hole or truncate */
-	return split_huge_page(page) >= 0;
+	/*
+	 * split_huge_page may remove the page itself; if so, we should
+	 * not attempt to remove it again.
+	 */
+	return split_huge_page(page) >= 0 && page->mapping;
diff --git a/mm/shmem.c b/mm/shmem.c
index b2abca3f7f33..a0bc42974c2d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -819,15 +819,18 @@ void shmem_unlock_mapping(struct address_space *mapping)
 static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
 {
 	if (!PageTransCompound(page))
-		return true;
+		goto nosplit;
 
 	/* Just proceed to delete a huge page wholly within the range punched */
 	if (PageHead(page) &&
 	    page->index >= start && page->index + HPAGE_PMD_NR <= end)
-		return true;
+		goto nosplit;
 
 	/* Try to split huge page, so we can truly punch the hole or truncate */
 	return split_huge_page(page) >= 0;
+nosplit:
+	truncate_inode_page(page->mapping, page);
+	return true;
 }
 
 /*
@@ -883,8 +886,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			if ((!unfalloc || !PageUptodate(page)) &&
 			    page_mapping(page) == mapping) {
 				VM_BUG_ON_PAGE(PageWriteback(page), page);
-				if (shmem_punch_compound(page, start, end))
-					truncate_inode_page(mapping, page);
+				shmem_punch_compound(page, start, end);
 			}
 			unlock_page(page);
 		}
@@ -966,9 +968,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 				break;
 			}
 			VM_BUG_ON_PAGE(PageWriteback(page), page);
-			if (shmem_punch_compound(page, start, end))
-				truncate_inode_page(mapping, page);
-			else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+			if (!shmem_punch_compound(page, start, end)) {
 				/* Wipe the page and don't get stuck */
 				clear_highpage(page);
 				flush_dcache_page(page);