diff mbox series

fscache: Need to go round again after processing LRU_DISCARDING state

Message ID 599331.1639410068@warthog.procyon.org.uk (mailing list archive)
State New, archived
Headers show
Series fscache: Need to go round again after processing LRU_DISCARDING state | expand

Commit Message

David Howells Dec. 13, 2021, 3:41 p.m. UTC
David Wysochanski <dwysocha@redhat.com> wrote:

> > [  432.921382] BUG: KASAN: use-after-free in
> > fscache_unhash_cookie+0x9e/0x160 [fscache]^M

I think the patch below is the way to fix this.

David
---
fscache: Need to go round again after processing LRU_DISCARDING state

There's a race between the LRU discard and relinquishment actions.  In the
state machine, fscache_cookie_state_machine(), the ACTIVE state transits to
the LRU_DISCARD state in preference to transiting to the RELINQUISHING or
WITHDRAWING states.

This should be fine, but the LRU_DISCARDING state just breaks out the
bottom of the function without going round again after transiting to the
QUIESCENT state.

However, if both LRU discard and relinquishment happen *before* the SM
runs, one of the queue events will get discarded, along with the ref that
would be associated with it.  The last ref is then discarded and the cookie
is removed without completing the relinquishment process - leaving the
cookie hashed.

The fix is to make sure that the SM always goes back around after changing
the state.

Signed-off-by: David Howells <dhowells@redhat.com>
---

Comments

David Howells Dec. 13, 2021, 4:22 p.m. UTC | #1
David Howells <dhowells@redhat.com> wrote:

> However, if both LRU discard and relinquishment happen *before* the SM
> runs, one of the queue events will get discarded, along with the ref that
> would be associated with it.  The last ref is then discarded and the cookie
> is removed without completing the relinquishment process - leaving the
> cookie hashed.

This can be seen in a trace, e.g.:

  kworker/u16:97-5939    [000] .....   639.403740: fscache_cookie: c=000071a9 -   lrudo r=3
  kworker/u16:97-5939    [000] .....   639.403741: fscache_cookie: c=000071a9 GQ  endac r=4
  kworker/u16:97-5939    [000] .....   639.403745: fscache_cookie: c=000071a9 PUT lru   r=3
       dirstress-7027    [002] .....   639.427220: fscache_relinquish: c=000071a9 V=00000001 r=3 U=0 f=bd rt=0
       dirstress-7027    [002] .....   639.427222: fscache_cookie: c=000071a9 GQ  endac r=4
       dirstress-7027    [002] .....   639.427223: fscache_cookie: c=000071a9 PQ  overq r=3

where the "overq" line marks the discarded event and ref.

David
David Howells Dec. 13, 2021, 8:03 p.m. UTC | #2
I forgot to commit part of the patch.  Attached is the patch in full.

David
---
commit a3d01f1a21bcf2c39aa6db49edbc08540378a593
Author: David Howells <dhowells@redhat.com>
Date:   Mon Dec 13 16:26:44 2021 +0000

    afs: Fix mmap
    
    Fix afs_add_open_map() to check that the vnode isn't already on the list
    when it adds it.  It's possible that afs_drop_open_mmap() decremented the
    cb_nr_mmap counter, but hadn't yet got into the locked section to remove
    it.
    
    Also vnode->cb_mmap_link should be initialised, so fix that too.
    
    Fixes: 6e0e99d58a65 ("afs: Fix mmap coherency vs 3rd-party changes")
    Reported-by: Marc Dionne <marc.dionne@auristor.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: linux-afs@lists.infradead.org

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 572063dad0b3..bcda99dcfdec 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -539,8 +539,9 @@ static void afs_add_open_mmap(struct afs_vnode *vnode)
 	if (atomic_inc_return(&vnode->cb_nr_mmap) == 1) {
 		down_write(&vnode->volume->cell->fs_open_mmaps_lock);
 
-		list_add_tail(&vnode->cb_mmap_link,
-			      &vnode->volume->cell->fs_open_mmaps);
+		if (list_empty(&vnode->cb_mmap_link))
+			list_add_tail(&vnode->cb_mmap_link,
+				      &vnode->volume->cell->fs_open_mmaps);
 
 		up_write(&vnode->volume->cell->fs_open_mmaps_lock);
 	}
diff --git a/fs/afs/super.c b/fs/afs/super.c
index af7cbd9949c5..5ec9fd97eccc 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -668,6 +668,7 @@ static void afs_i_init_once(void *_vnode)
 	INIT_LIST_HEAD(&vnode->pending_locks);
 	INIT_LIST_HEAD(&vnode->granted_locks);
 	INIT_DELAYED_WORK(&vnode->lock_work, afs_lock_work);
+	INIT_LIST_HEAD(&vnode->cb_mmap_link);
 	seqlock_init(&vnode->cb_lock);
 }
David Wysochanski Dec. 13, 2021, 9:05 p.m. UTC | #3
On Mon, Dec 13, 2021 at 10:41 AM David Howells <dhowells@redhat.com> wrote:
>
> David Wysochanski <dwysocha@redhat.com> wrote:
>
> > > [  432.921382] BUG: KASAN: use-after-free in
> > > fscache_unhash_cookie+0x9e/0x160 [fscache]^M
>
> I think the patch below is the way to fix this.
>
> David
> ---
> fscache: Need to go round again after processing LRU_DISCARDING state
>
> There's a race between the LRU discard and relinquishment actions.  In the
> state machine, fscache_cookie_state_machine(), the ACTIVE state transits to
> the LRU_DISCARD state in preference to transiting to the RELINQUISHING or
> WITHDRAWING states.
>
> This should be fine, but the LRU_DISCARDING state just breaks out the
> bottom of the function without going round again after transiting to the
> QUIESCENT state.
>
> However, if both LRU discard and relinquishment happen *before* the SM
> runs, one of the queue events will get discarded, along with the ref that
> would be associated with it.  The last ref is then discarded and the cookie
> is removed without completing the relinquishment process - leaving the
> cookie hashed.
>
> The fix is to make sure that the SM always goes back around after changing
> the state.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
>
> diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
> index d7e825d636e2..8d0769a5ee2b 100644
> --- a/fs/fscache/cookie.c
> +++ b/fs/fscache/cookie.c
> @@ -755,7 +755,7 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
>                 set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
>                 __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT);
>                 wake = true;
> -               break;
> +               goto again_locked;
>
>         case FSCACHE_COOKIE_STATE_DROPPED:
>                 break;
>

Agree and verified with xfstests generic full runs twice with NFSv3.
Prior to this patch with NFSv3 xfstest I'd regularly see the crash:
BUG: KASAN: use-after-free in __fscache_acquire_cookie+0x437
https://marc.info/?l=v9fs-developer&m=163916153103008&w=2
https://marc.info/?l=linux-nfs&m=163917893813589&w=2

Tested-by: Dave Wysochanski <dwysocha@redhat.com>
David Howells Dec. 13, 2021, 10:39 p.m. UTC | #4
David Howells <dhowells@redhat.com> wrote:

> I forgot to commit part of the patch.  Attached is the patch in full.

Sigh.  I replied to the wrong message.

David
diff mbox series

Patch

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index d7e825d636e2..8d0769a5ee2b 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -755,7 +755,7 @@  static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
 		set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
 		__fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT);
 		wake = true;
-		break;
+		goto again_locked;
 
 	case FSCACHE_COOKIE_STATE_DROPPED:
 		break;