
[mm-unstable,v1] mm/truncate: batch-clear shadow entries

Message ID 20240708212753.3120511-1-yuzhao@google.com (mailing list archive)
State New
Series [mm-unstable,v1] mm/truncate: batch-clear shadow entries

Commit Message

Yu Zhao July 8, 2024, 9:27 p.m. UTC
Make clear_shadow_entry() clear shadow entries in `struct folio_batch`
so that it can reduce contention on i_lock and i_pages locks, e.g.,

  watchdog: BUG: soft lockup - CPU#29 stuck for 11s! [fio:2701649]
    clear_shadow_entry+0x3d/0x100
    mapping_try_invalidate+0x117/0x1d0
    invalidate_mapping_pages+0x10/0x20
    invalidate_bdev+0x3c/0x50
    blkdev_common_ioctl+0x5f7/0xa90
    blkdev_ioctl+0x109/0x270
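
The locking change amounts to the following simplified sketch (the
shmem/DAX check and the mapping_shrinkable()/inode_add_lru() handling
are left out here):

  /*
   * Before: shadow entries were cleared one at a time, and every call to
   * clear_shadow_entry() took and dropped both locks.
   */
  for (i = 0; i < folio_batch_count(&fbatch); i++) {
  	if (xa_is_value(fbatch.folios[i])) {
  		spin_lock(&mapping->host->i_lock);
  		xa_lock_irq(&mapping->i_pages);
  		__clear_shadow_entry(mapping, indices[i], fbatch.folios[i]);
  		xa_unlock_irq(&mapping->i_pages);
  		spin_unlock(&mapping->host->i_lock);
  	}
  }

  /*
   * After: both locks are taken once per folio_batch, and all shadow
   * entries in the batch are cleared under that single critical section.
   */
  spin_lock(&mapping->host->i_lock);
  xa_lock_irq(&mapping->i_pages);
  for (i = 0; i < folio_batch_count(&fbatch); i++) {
  	if (xa_is_value(fbatch.folios[i]))
  		__clear_shadow_entry(mapping, indices[i], fbatch.folios[i]);
  }
  xa_unlock_irq(&mapping->i_pages);
  spin_unlock(&mapping->host->i_lock);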

Reported-by: Bharata B Rao <bharata@amd.com>
Closes: https://lore.kernel.org/d2841226-e27b-4d3d-a578-63587a3aa4f3@amd.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/truncate.c | 67 +++++++++++++++++++++++----------------------------
 1 file changed, 30 insertions(+), 37 deletions(-)

Comments

Andrew Morton July 8, 2024, 10:16 p.m. UTC | #1
On Mon,  8 Jul 2024 15:27:53 -0600 Yu Zhao <yuzhao@google.com> wrote:

> Make clear_shadow_entry() clear shadow entries in `struct folio_batch`
> so that it can reduce contention on i_lock and i_pages locks, e.g.,
> 
>   watchdog: BUG: soft lockup - CPU#29 stuck for 11s! [fio:2701649]
>     clear_shadow_entry+0x3d/0x100
>     mapping_try_invalidate+0x117/0x1d0
>     invalidate_mapping_pages+0x10/0x20
>     invalidate_bdev+0x3c/0x50
>     blkdev_common_ioctl+0x5f7/0xa90
>     blkdev_ioctl+0x109/0x270

This will clearly reduce lock traffic a lot, but does it truly fix the
issue?  Is it the case that sufficiently extreme loads will still run
into problems?

> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -39,12 +39,24 @@ static inline void __clear_shadow_entry(struct address_space *mapping,
>  	xas_store(&xas, NULL);
>  }
>  
> -static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
> -			       void *entry)
> +static void clear_shadow_entry(struct address_space *mapping,
> +			       struct folio_batch *fbatch, pgoff_t *indices)
>  {
> +	int i;
> +
> +	if (shmem_mapping(mapping) || dax_mapping(mapping))
> +		return;

We lost the comment which was in invalidate_exceptional_entry() and
elsewhere.  It wasn't a terribly good one but I do think a few words
which explain why we're testing for these things would be helpful.

I expect we should backport this.  But identifying a Fixes: target
looks to be challenging.
Matthew Wilcox (Oracle) July 8, 2024, 10:36 p.m. UTC | #2
On Mon, Jul 08, 2024 at 03:27:53PM -0600, Yu Zhao wrote:
> Make clear_shadow_entry() clear shadow entries in `struct folio_batch`
> so that it can reduce contention on i_lock and i_pages locks, e.g.,

I think it needs to be renamed, perhaps to clear_shadow_entries().
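That is, only the declaration and its two call sites would need to change,
e.g.:

	static void clear_shadow_entries(struct address_space *mapping,
					 struct folio_batch *fbatch, pgoff_t *indices);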

> @@ -503,8 +486,8 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
>  			/* We rely upon deletion not changing folio->index */
>  
>  			if (xa_is_value(folio)) {
> -				count += invalidate_exceptional_entry(mapping,
> -							     indices[i], folio);
> +				xa_has_values = true;
> +				count++;

Mmm.  This is awkward.  It's supposed to return the number of pages,
not the number of folios (or shadow entries) invalidated.
Yu Zhao July 9, 2024, 10:27 p.m. UTC | #3
On Mon, Jul 8, 2024 at 4:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon,  8 Jul 2024 15:27:53 -0600 Yu Zhao <yuzhao@google.com> wrote:
>
> > Make clear_shadow_entry() clear shadow entries in `struct folio_batch`
> > so that it can reduce contention on i_lock and i_pages locks, e.g.,
> >
> >   watchdog: BUG: soft lockup - CPU#29 stuck for 11s! [fio:2701649]
> >     clear_shadow_entry+0x3d/0x100
> >     mapping_try_invalidate+0x117/0x1d0
> >     invalidate_mapping_pages+0x10/0x20
> >     invalidate_bdev+0x3c/0x50
> >     blkdev_common_ioctl+0x5f7/0xa90
> >     blkdev_ioctl+0x109/0x270
>
> This will clearly reduce lock traffic a lot, but does it truly fix the
> issue?  Is it the case that sufficiently extreme loads will still run
> into problems?

I think Bharata was running extreme loads. So I'd say it's good enough
for now, considering truncation doesn't happen that often.

> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -39,12 +39,24 @@ static inline void __clear_shadow_entry(struct address_space *mapping,
> >       xas_store(&xas, NULL);
> >  }
> >
> > -static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
> > -                            void *entry)
> > +static void clear_shadow_entry(struct address_space *mapping,
> > +                            struct folio_batch *fbatch, pgoff_t *indices)
> >  {
> > +     int i;
> > +
> > +     if (shmem_mapping(mapping) || dax_mapping(mapping))
> > +             return;
>
> We lost the comment which was in invalidate_exceptional_entry() and
> elsewhere.  It wasn't a terribly good one but I do think a few words
> which explain why we're testing for these things would be helpful.

I'll put the original comment back. It seems to me it was stating the
obvious, and I don't really know how to improve it since it's obvious
;)
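
Something like this, with the one-liner from the removed helpers moved to
the top of the batched function (exact wording TBD, since this path now
also covers invalidate_inode_pages2_range(), where DAX is handled at the
call site):

	/* Handled by shmem itself, or for DAX we do nothing. */
	if (shmem_mapping(mapping) || dax_mapping(mapping))
		return;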

> I expect we should backport this.  But identifying a Fixes: target
> looks to be challenging.

I wouldn't worry about backporting; nobody else has run into this
scalability issue (it isn't even a day-to-day performance problem).
Yu Zhao July 9, 2024, 10:30 p.m. UTC | #4
On Mon, Jul 8, 2024 at 4:36 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Jul 08, 2024 at 03:27:53PM -0600, Yu Zhao wrote:
> > Make clear_shadow_entry() clear shadow entries in `struct folio_batch`
> > so that it can reduce contention on i_lock and i_pages locks, e.g.,
>
> I think it needs to be renamed, perhaps to clear_shadow_entries().

Yes, thank you.

> > @@ -503,8 +486,8 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
> >                       /* We rely upon deletion not changing folio->index */
> >
> >                       if (xa_is_value(folio)) {
> > -                             count += invalidate_exceptional_entry(mapping,
> > -                                                          indices[i], folio);
> > +                             xa_has_values = true;
> > +                             count++;
>
> Mmm.  This is awkward.  It's supposed to return the number of pages,
> not the number of folios (or shadow entries) invalidated.

I didn't think about this much. It seems to me no callers really care
about it other than some debugging messages?

Patch

diff --git a/mm/truncate.c b/mm/truncate.c
index 5ce62a939e55..bc12e7ff7d6a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -39,12 +39,24 @@  static inline void __clear_shadow_entry(struct address_space *mapping,
 	xas_store(&xas, NULL);
 }
 
-static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
-			       void *entry)
+static void clear_shadow_entry(struct address_space *mapping,
+			       struct folio_batch *fbatch, pgoff_t *indices)
 {
+	int i;
+
+	if (shmem_mapping(mapping) || dax_mapping(mapping))
+		return;
+
 	spin_lock(&mapping->host->i_lock);
 	xa_lock_irq(&mapping->i_pages);
-	__clear_shadow_entry(mapping, index, entry);
+
+	for (i = 0; i < folio_batch_count(fbatch); i++) {
+		struct folio *folio = fbatch->folios[i];
+
+		if (xa_is_value(folio))
+			__clear_shadow_entry(mapping, indices[i], folio);
+	}
+
 	xa_unlock_irq(&mapping->i_pages);
 	if (mapping_shrinkable(mapping))
 		inode_add_lru(mapping->host);
@@ -105,36 +117,6 @@  static void truncate_folio_batch_exceptionals(struct address_space *mapping,
 	fbatch->nr = j;
 }
 
-/*
- * Invalidate exceptional entry if easily possible. This handles exceptional
- * entries for invalidate_inode_pages().
- */
-static int invalidate_exceptional_entry(struct address_space *mapping,
-					pgoff_t index, void *entry)
-{
-	/* Handled by shmem itself, or for DAX we do nothing. */
-	if (shmem_mapping(mapping) || dax_mapping(mapping))
-		return 1;
-	clear_shadow_entry(mapping, index, entry);
-	return 1;
-}
-
-/*
- * Invalidate exceptional entry if clean. This handles exceptional entries for
- * invalidate_inode_pages2() so for DAX it evicts only clean entries.
- */
-static int invalidate_exceptional_entry2(struct address_space *mapping,
-					 pgoff_t index, void *entry)
-{
-	/* Handled by shmem itself */
-	if (shmem_mapping(mapping))
-		return 1;
-	if (dax_mapping(mapping))
-		return dax_invalidate_mapping_entry_sync(mapping, index);
-	clear_shadow_entry(mapping, index, entry);
-	return 1;
-}
-
 /**
  * folio_invalidate - Invalidate part or all of a folio.
  * @folio: The folio which is affected.
@@ -494,6 +476,7 @@  unsigned long mapping_try_invalidate(struct address_space *mapping,
 	unsigned long ret;
 	unsigned long count = 0;
 	int i;
+	bool xa_has_values = false;
 
 	folio_batch_init(&fbatch);
 	while (find_lock_entries(mapping, &index, end, &fbatch, indices)) {
@@ -503,8 +486,8 @@  unsigned long mapping_try_invalidate(struct address_space *mapping,
 			/* We rely upon deletion not changing folio->index */
 
 			if (xa_is_value(folio)) {
-				count += invalidate_exceptional_entry(mapping,
-							     indices[i], folio);
+				xa_has_values = true;
+				count++;
 				continue;
 			}
 
@@ -522,6 +505,10 @@  unsigned long mapping_try_invalidate(struct address_space *mapping,
 			}
 			count += ret;
 		}
+
+		if (xa_has_values)
+			clear_shadow_entry(mapping, &fbatch, indices);
+
 		folio_batch_remove_exceptionals(&fbatch);
 		folio_batch_release(&fbatch);
 		cond_resched();
@@ -616,6 +603,7 @@  int invalidate_inode_pages2_range(struct address_space *mapping,
 	int ret = 0;
 	int ret2 = 0;
 	int did_range_unmap = 0;
+	bool xa_has_values = false;
 
 	if (mapping_empty(mapping))
 		return 0;
@@ -629,8 +617,9 @@  int invalidate_inode_pages2_range(struct address_space *mapping,
 			/* We rely upon deletion not changing folio->index */
 
 			if (xa_is_value(folio)) {
-				if (!invalidate_exceptional_entry2(mapping,
-						indices[i], folio))
+				xa_has_values = true;
+				if (dax_mapping(mapping) &&
+				    !dax_invalidate_mapping_entry_sync(mapping, indices[i]))
 					ret = -EBUSY;
 				continue;
 			}
@@ -666,6 +655,10 @@  int invalidate_inode_pages2_range(struct address_space *mapping,
 				ret = ret2;
 			folio_unlock(folio);
 		}
+
+		if (xa_has_values)
+			clear_shadow_entry(mapping, &fbatch, indices);
+
 		folio_batch_remove_exceptionals(&fbatch);
 		folio_batch_release(&fbatch);
 		cond_resched();