mm: fix livelock caused by iterating multi order entry
diff mbox series

Message ID 1563495160-25647-1-git-send-email-bo.liu@linux.alibaba.com
State New
Headers show
Series
  • mm: fix livelock caused by iterating multi order entry
Related show

Commit Message

Liu Bo July 19, 2019, 12:12 a.m. UTC
The livelock can be triggerred in the following pattern,

	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
				min(end - index, (pgoff_t)PAGEVEC_SIZE),
				indices)) {
		...
		for (i = 0; i < pagevec_count(&pvec); i++) {
			index = indices[i];
			...
		}
		index++; /* BUG */
	}

multi order exceptional entry is not specially considered in
invalidate_inode_pages2_range() and it ended up with a livelock because
both index 0 and index 1 finds the same pmd, but this pmd is binded to
index 0, so index is set to 0 again.

This introduces a helper to take the pmd entry's length into account when
deciding the next index.

Note that there're other users of the above pattern which doesn't need to
fix,

- dax_layout_busy_page
It's been fixed in commit d7782145e1ad
("filesystem-dax: Fix dax_layout_busy_page() livelock")

- truncate_inode_pages_range
This won't loop forever since the exceptional entries are immediately
removed from radix tree after the search.

Fixes: 642261a ("dax: add struct iomap based DAX PMD support")
Cc: <stable@vger.kernel.org> since 4.9 to 4.19
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---

The problem is gone after commit f280bf092d48 ("page cache: Convert
find_get_entries to XArray"), but since xarray seems too new to backport
to 4.19, I made this fix based on radix tree implementation.

 fs/dax.c            | 19 +++++++++++++++++++
 include/linux/dax.h |  8 ++++++++
 mm/truncate.c       | 26 ++++++++++++++++++++++++--
 3 files changed, 51 insertions(+), 2 deletions(-)

Comments

Dan Williams July 19, 2019, 2:53 a.m. UTC | #1
[ add Sasha for -stable advice ]

On Thu, Jul 18, 2019 at 5:13 PM Liu Bo <bo.liu@linux.alibaba.com> wrote:
>
> The livelock can be triggerred in the following pattern,
>
>         while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
>                                 min(end - index, (pgoff_t)PAGEVEC_SIZE),
>                                 indices)) {
>                 ...
>                 for (i = 0; i < pagevec_count(&pvec); i++) {
>                         index = indices[i];
>                         ...
>                 }
>                 index++; /* BUG */
>         }
>
> multi order exceptional entry is not specially considered in
> invalidate_inode_pages2_range() and it ended up with a livelock because
> both index 0 and index 1 finds the same pmd, but this pmd is binded to
> index 0, so index is set to 0 again.
>
> This introduces a helper to take the pmd entry's length into account when
> deciding the next index.
>
> Note that there're other users of the above pattern which doesn't need to
> fix,
>
> - dax_layout_busy_page
> It's been fixed in commit d7782145e1ad
> ("filesystem-dax: Fix dax_layout_busy_page() livelock")
>
> - truncate_inode_pages_range
> This won't loop forever since the exceptional entries are immediately
> removed from radix tree after the search.
>
> Fixes: 642261a ("dax: add struct iomap based DAX PMD support")
> Cc: <stable@vger.kernel.org> since 4.9 to 4.19
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>
> The problem is gone after commit f280bf092d48 ("page cache: Convert
> find_get_entries to XArray"), but since xarray seems too new to backport
> to 4.19, I made this fix based on radix tree implementation.

I think in this situation, since mainline does not need this change
and the bug has been buried under a major refactoring, is to send a
backport directly against the v4.19 kernel. Include notes about how it
replaces the fix that was inadvertently contained in f280bf092d48
("page cache: Convert find_get_entries to XArray"). Do you have a test
case that you can include in the changelog?
Greg Kroah-Hartman July 19, 2019, 3:43 a.m. UTC | #2
On Thu, Jul 18, 2019 at 07:53:42PM -0700, Dan Williams wrote:
> [ add Sasha for -stable advice ]
> 
> On Thu, Jul 18, 2019 at 5:13 PM Liu Bo <bo.liu@linux.alibaba.com> wrote:
> >
> > The livelock can be triggerred in the following pattern,
> >
> >         while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
> >                                 min(end - index, (pgoff_t)PAGEVEC_SIZE),
> >                                 indices)) {
> >                 ...
> >                 for (i = 0; i < pagevec_count(&pvec); i++) {
> >                         index = indices[i];
> >                         ...
> >                 }
> >                 index++; /* BUG */
> >         }
> >
> > multi order exceptional entry is not specially considered in
> > invalidate_inode_pages2_range() and it ended up with a livelock because
> > both index 0 and index 1 finds the same pmd, but this pmd is binded to
> > index 0, so index is set to 0 again.
> >
> > This introduces a helper to take the pmd entry's length into account when
> > deciding the next index.
> >
> > Note that there're other users of the above pattern which doesn't need to
> > fix,
> >
> > - dax_layout_busy_page
> > It's been fixed in commit d7782145e1ad
> > ("filesystem-dax: Fix dax_layout_busy_page() livelock")
> >
> > - truncate_inode_pages_range
> > This won't loop forever since the exceptional entries are immediately
> > removed from radix tree after the search.
> >
> > Fixes: 642261a ("dax: add struct iomap based DAX PMD support")
> > Cc: <stable@vger.kernel.org> since 4.9 to 4.19
> > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > ---
> >
> > The problem is gone after commit f280bf092d48 ("page cache: Convert
> > find_get_entries to XArray"), but since xarray seems too new to backport
> > to 4.19, I made this fix based on radix tree implementation.
> 
> I think in this situation, since mainline does not need this change
> and the bug has been buried under a major refactoring, is to send a
> backport directly against the v4.19 kernel. Include notes about how it
> replaces the fix that was inadvertently contained in f280bf092d48
> ("page cache: Convert find_get_entries to XArray"). Do you have a test
> case that you can include in the changelog?

Yes, I need a _TON_ of documentation, and signed off by from all of the
developers involved in this part of the kernel, before I can take this
not-in-mainline patch.

thanks,

greg k-h
Liu Bo July 19, 2019, 9:43 p.m. UTC | #3
On Thu, Jul 18, 2019 at 07:53:42PM -0700, Dan Williams wrote:
> [ add Sasha for -stable advice ]
> 
> On Thu, Jul 18, 2019 at 5:13 PM Liu Bo <bo.liu@linux.alibaba.com> wrote:
> >
> > The livelock can be triggerred in the following pattern,
> >
> >         while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
> >                                 min(end - index, (pgoff_t)PAGEVEC_SIZE),
> >                                 indices)) {
> >                 ...
> >                 for (i = 0; i < pagevec_count(&pvec); i++) {
> >                         index = indices[i];
> >                         ...
> >                 }
> >                 index++; /* BUG */
> >         }
> >
> > multi order exceptional entry is not specially considered in
> > invalidate_inode_pages2_range() and it ended up with a livelock because
> > both index 0 and index 1 finds the same pmd, but this pmd is binded to
> > index 0, so index is set to 0 again.
> >
> > This introduces a helper to take the pmd entry's length into account when
> > deciding the next index.
> >
> > Note that there're other users of the above pattern which doesn't need to
> > fix,
> >
> > - dax_layout_busy_page
> > It's been fixed in commit d7782145e1ad
> > ("filesystem-dax: Fix dax_layout_busy_page() livelock")
> >
> > - truncate_inode_pages_range
> > This won't loop forever since the exceptional entries are immediately
> > removed from radix tree after the search.
> >
> > Fixes: 642261a ("dax: add struct iomap based DAX PMD support")
> > Cc: <stable@vger.kernel.org> since 4.9 to 4.19
> > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > ---
> >
> > The problem is gone after commit f280bf092d48 ("page cache: Convert
> > find_get_entries to XArray"), but since xarray seems too new to backport
> > to 4.19, I made this fix based on radix tree implementation.
> 
> I think in this situation, since mainline does not need this change
> and the bug has been buried under a major refactoring, is to send a
> backport directly against the v4.19 kernel. Include notes about how it
> replaces the fix that was inadvertently contained in f280bf092d48
> ("page cache: Convert find_get_entries to XArray"). Do you have a test
> case that you can include in the changelog?

The root cause behind the bug is exactly same as what commit
d7782145e1ad ("filesystem-dax: Fix dax_layout_busy_page() livelock")
does.

For test case, I have a not 100% reproducible one based on ltp's
rwtest[1] and virtiofs.

[1]:
$mount -t virtio_fs -o tag=alwaysdax -o rootmode=040000,user_id=0,group_id=0,dax,default_permissions,allow_other alwaysdax /mnt/virtio-fs/
$cat test.txt
rwtest01 export LTPROOT; rwtest -N rwtest01 -c -q -i 60s -f sync 10%25000:$TMPDIR/rw-sync-$$
$runltp -d /mnt/virtio-fs -f test.txt

thanks,
-liubo

Patch
diff mbox series

diff --git a/fs/dax.c b/fs/dax.c
index ac334bc..cd05337 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -764,6 +764,25 @@  int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 	return __dax_invalidate_mapping_entry(mapping, index, false);
 }
 
+pgoff_t dax_get_multi_order(struct address_space *mapping, pgoff_t index,
+			    void *entry)
+{
+	struct radix_tree_root *pages = &mapping->i_pages;
+	pgoff_t nr_pages = 1;
+
+	if (!dax_mapping(mapping))
+		return nr_pages;
+
+	xa_lock_irq(pages);
+	entry = get_unlocked_mapping_entry(mapping, index, NULL);
+	if (entry)
+		nr_pages = 1UL << dax_radix_order(entry);
+	put_unlocked_mapping_entry(mapping, index, entry);
+	xa_unlock_irq(pages);
+
+	return nr_pages;
+}
+
 static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev,
 		sector_t sector, size_t size, struct page *to,
 		unsigned long vaddr)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index a846184..f3c95c6 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -91,6 +91,8 @@  int dax_writeback_mapping_range(struct address_space *mapping,
 struct page *dax_layout_busy_page(struct address_space *mapping);
 bool dax_lock_mapping_entry(struct page *page);
 void dax_unlock_mapping_entry(struct page *page);
+pgoff_t dax_get_multi_order(struct address_space *mapping, pgoff_t index,
+			    void *entry);
 #else
 static inline bool bdev_dax_supported(struct block_device *bdev,
 		int blocksize)
@@ -134,6 +136,12 @@  static inline bool dax_lock_mapping_entry(struct page *page)
 static inline void dax_unlock_mapping_entry(struct page *page)
 {
 }
+
+static inline pgoff_t dax_get_multi_order(struct address_space *mapping,
+					  pgoff_t index, void *entry)
+{
+	return 1;
+}
 #endif
 
 int dax_read_lock(void);
diff --git a/mm/truncate.c b/mm/truncate.c
index 71b65aa..835911f 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -557,6 +557,8 @@  unsigned long invalidate_mapping_pages(struct address_space *mapping,
 	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
 			indices)) {
+		pgoff_t nr_pages = 1;
+
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -568,6 +570,15 @@  unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			if (radix_tree_exceptional_entry(page)) {
 				invalidate_exceptional_entry(mapping, index,
 							     page);
+				/*
+				 * Account for multi-order entries at
+				 * the end of the pagevec.
+				 */
+				if (i < pagevec_count(&pvec) - 1)
+					continue;
+
+				nr_pages = dax_get_multi_order(mapping, index,
+							       page);
 				continue;
 			}
 
@@ -607,7 +618,7 @@  unsigned long invalidate_mapping_pages(struct address_space *mapping,
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		cond_resched();
-		index++;
+		index += nr_pages;
 	}
 	return count;
 }
@@ -688,6 +699,8 @@  int invalidate_inode_pages2_range(struct address_space *mapping,
 	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
 			indices)) {
+		pgoff_t nr_pages = 1;
+
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -700,6 +713,15 @@  int invalidate_inode_pages2_range(struct address_space *mapping,
 				if (!invalidate_exceptional_entry2(mapping,
 								   index, page))
 					ret = -EBUSY;
+				/*
+				 * Account for multi-order entries at
+				 * the end of the pagevec.
+				 */
+				if (i < pagevec_count(&pvec) - 1)
+					continue;
+
+				nr_pages = dax_get_multi_order(mapping, index,
+							       page);
 				continue;
 			}
 
@@ -739,7 +761,7 @@  int invalidate_inode_pages2_range(struct address_space *mapping,
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		cond_resched();
-		index++;
+		index += nr_pages;
 	}
 	/*
 	 * For DAX we invalidate page tables after invalidating radix tree.  We