mm: fs: fix lru_cache_disabled race in bh_lru

Message ID 20220308180709.2017638-1-minchan@kernel.org (mailing list archive)
State New

Commit Message

Minchan Kim March 8, 2022, 6:07 p.m. UTC
Check lru_cache_disabled() under bh_lru_lock. Otherwise, the race below
can occur: a bh installed after the invalidation keeps its reference,
so migration of the page containing that buffer_head fails.

   CPU 0                               CPU 1

bh_lru_install
                                       lru_cache_disable
  lru_cache_disabled = false
                                       atomic_inc(&lru_disable_count);
                                       invalidate_bh_lrus_cpu of CPU 0
                                       bh_lru_lock
                                       __invalidate_bh_lrus
                                       bh_lru_unlock
  bh_lru_lock
  install the bh
  bh_lru_unlock

Fixes: 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration")
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 fs/buffer.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
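
The interleaving in the commit message can be replayed deterministically
in user space. The following is only a minimal sketch under stated
assumptions, not kernel code: struct model, the model_* helpers,
install_unlocked_check(), install_locked_check(), stale_entry() and
replay_demo() are all hypothetical names, the per-CPU bh_lrus array is
reduced to a single slot, and the lock is represented by comments since
the replay is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model {
	int lru_disable_count;	/* stands in for lru_disable_count */
	const void *lru_slot;	/* one-entry stand-in for the per-CPU bh_lrus */
};

static bool model_lru_cache_disabled(const struct model *m)
{
	return m->lru_disable_count > 0;
}

/* Migration side (CPU 1 in the diagram): raise the count, empty the LRU. */
static void model_disable_and_invalidate(struct model *m)
{
	m->lru_disable_count++;
	m->lru_slot = NULL;
}

/*
 * Pre-patch ordering: the check runs before the critical section, so
 * the migration side can slip in between the check and the install.
 */
static void install_unlocked_check(struct model *m, const void *bh,
				   bool migration_in_window)
{
	bool disabled = model_lru_cache_disabled(m);	/* no lock held */

	if (migration_in_window)
		model_disable_and_invalidate(m);	/* CPU 1 runs here */

	if (disabled)
		return;
	/* the lock would be taken here */
	m->lru_slot = bh;
	/* and released here */
}

/*
 * Patched ordering: check and install form one critical section, so
 * the migration side can only run entirely before or entirely after.
 */
static void install_locked_check(struct model *m, const void *bh,
				 bool migration_first)
{
	if (migration_first)
		model_disable_and_invalidate(m);

	/* the lock would be taken here */
	if (!model_lru_cache_disabled(m))
		m->lru_slot = bh;
	/* and released here */

	if (!migration_first)
		model_disable_and_invalidate(m);
}

/* True if an entry is still cached although the LRU cache is disabled. */
static bool stale_entry(const struct model *m)
{
	return model_lru_cache_disabled(m) && m->lru_slot != NULL;
}

/* Replays the three interleavings; returns 0 if they behave as described. */
static int replay_demo(void)
{
	int bh = 0;
	struct model racy = { 0, NULL };
	struct model before = { 0, NULL };
	struct model after = { 0, NULL };

	install_unlocked_check(&racy, &bh, true);
	install_locked_check(&before, &bh, true);
	install_locked_check(&after, &bh, false);

	assert(stale_entry(&racy));
	assert(!stale_entry(&before));
	assert(!stale_entry(&after));
	return 0;
}
```

In this model only the pre-patch ordering can leave an entry behind
after the invalidation. With the check inside the critical section, the
install either sees the raised count and is skipped, or completes first
and is then cleared by the invalidation.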

Comments

Andrew Morton March 9, 2022, 10:06 p.m. UTC | #1
On Tue,  8 Mar 2022 10:07:09 -0800 Minchan Kim <minchan@kernel.org> wrote:

> Check lru_cache_disabled under bh_lru_lock. Otherwise, it could
> introduce race below and it fails to migrate pages containing
> buffer_head.
> 
>    CPU 0					CPU 1
> 
> bh_lru_install
>                                        lru_cache_disable
>   lru_cache_disabled = false
>                                        atomic_inc(&lru_disable_count);
> 				       invalidate_bh_lrus_cpu of CPU 0
> 				       bh_lru_lock
> 				       __invalidate_bh_lrus
> 				       bh_lru_unlock
>   bh_lru_lock
>   install the bh
>   bh_lru_unlock

What are the user-visible runtime effects of this bug?

Is a cc:stable needed?

Should there be a reported-by?

Thanks.
Minchan Kim March 9, 2022, 10:40 p.m. UTC | #2
On Wed, Mar 09, 2022 at 02:06:27PM -0800, Andrew Morton wrote:
> On Tue,  8 Mar 2022 10:07:09 -0800 Minchan Kim <minchan@kernel.org> wrote:
> 
> > Check lru_cache_disabled under bh_lru_lock. Otherwise, it could
> > introduce race below and it fails to migrate pages containing
> > buffer_head.
> > 
> >    CPU 0					CPU 1
> > 
> > bh_lru_install
> >                                        lru_cache_disable
> >   lru_cache_disabled = false
> >                                        atomic_inc(&lru_disable_count);
> > 				       invalidate_bh_lrus_cpu of CPU 0
> > 				       bh_lru_lock
> > 				       __invalidate_bh_lrus
> > 				       bh_lru_unlock
> >   bh_lru_lock
> >   install the bh
> >   bh_lru_unlock
> 
> What are the user-visible runtime effects of this bug?

Once the race happens, CMA allocation fails, which is critical for
workloads that depend on CMA allocation.

> 
> Is a cc:stable needed?

Ah, I missed it. I think the race is rare to trigger, considering how
rare CMA allocation is, but once it happens it makes the CMA allocation
fail, which is critical for some users. The patch is also small enough,
so I think it's worth adding to stable.

> 
> Should there be a reported-by?

I found it on my own while reviewing Marcelo's other patchset, so
I don't think we need to add my reported-by.

Andrew, please tell me if you want me to resend it.

Patch

diff --git a/fs/buffer.c b/fs/buffer.c
index 8e112b6bd371..c76a8ef60a75 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1235,16 +1235,18 @@  static void bh_lru_install(struct buffer_head *bh)
 	int i;
 
 	check_irqs_on();
+	bh_lru_lock();
+
 	/*
 	 * the refcount of buffer_head in bh_lru prevents dropping the
 	 * attached page(i.e., try_to_free_buffers) so it could cause
 	 * failing page migration.
 	 * Skip putting upcoming bh into bh_lru until migration is done.
 	 */
-	if (lru_cache_disabled())
+	if (lru_cache_disabled()) {
+		bh_lru_unlock();
 		return;
-
-	bh_lru_lock();
+	}
 
 	b = this_cpu_ptr(&bh_lrus);
 	for (i = 0; i < BH_LRU_SIZE; i++) {