Message ID | 599331.1639410068@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fscache: Need to go round again after processing LRU_DISCARDING state | expand |
David Howells <dhowells@redhat.com> wrote: > However, if both LRU discard and relinquishment happen *before* the SM > runs, one of the queue events will get discarded, along with the ref that > would be associated with it. The last ref is then discarded and the cookie > is removed without completing the relinquishment process - leaving the > cookie hashed. This can be seen in a trace, e.g.: kworker/u16:97-5939 [000] ..... 639.403740: fscache_cookie: c=000071a9 - lrudo r=3 kworker/u16:97-5939 [000] ..... 639.403741: fscache_cookie: c=000071a9 GQ endac r=4 kworker/u16:97-5939 [000] ..... 639.403745: fscache_cookie: c=000071a9 PUT lru r=3 dirstress-7027 [002] ..... 639.427220: fscache_relinquish: c=000071a9 V=00000001 r=3 U=0 f=bd rt=0 dirstress-7027 [002] ..... 639.427222: fscache_cookie: c=000071a9 GQ endac r=4 dirstress-7027 [002] ..... 639.427223: fscache_cookie: c=000071a9 PQ overq r=3 where the "overq" line marks the discarded event and ref. David
I forgot to commit part of the patch. Attached is the patch in full.
David
---
commit a3d01f1a21bcf2c39aa6db49edbc08540378a593
Author: David Howells <dhowells@redhat.com>
Date: Mon Dec 13 16:26:44 2021 +0000
afs: Fix mmap
Fix afs_add_open_map() to check that the vnode isn't already on the list
when it adds it. It's possible that afs_drop_open_mmap() decremented the
cb_nr_mmap counter, but hadn't yet got into the locked section to remove
it.
Also vnode->cb_mmap_link should be initialised, so fix that too.
Fixes: 6e0e99d58a65 ("afs: Fix mmap coherency vs 3rd-party changes")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 572063dad0b3..bcda99dcfdec 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -539,8 +539,9 @@ static void afs_add_open_mmap(struct afs_vnode *vnode)
if (atomic_inc_return(&vnode->cb_nr_mmap) == 1) {
down_write(&vnode->volume->cell->fs_open_mmaps_lock);
- list_add_tail(&vnode->cb_mmap_link,
- &vnode->volume->cell->fs_open_mmaps);
+ if (list_empty(&vnode->cb_mmap_link))
+ list_add_tail(&vnode->cb_mmap_link,
+ &vnode->volume->cell->fs_open_mmaps);
up_write(&vnode->volume->cell->fs_open_mmaps_lock);
}
diff --git a/fs/afs/super.c b/fs/afs/super.c
index af7cbd9949c5..5ec9fd97eccc 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -668,6 +668,7 @@ static void afs_i_init_once(void *_vnode)
INIT_LIST_HEAD(&vnode->pending_locks);
INIT_LIST_HEAD(&vnode->granted_locks);
INIT_DELAYED_WORK(&vnode->lock_work, afs_lock_work);
+ INIT_LIST_HEAD(&vnode->cb_mmap_link);
seqlock_init(&vnode->cb_lock);
}
On Mon, Dec 13, 2021 at 10:41 AM David Howells <dhowells@redhat.com> wrote: > > David Wysochanski <dwysocha@redhat.com> wrote: > > > > [ 432.921382] BUG: KASAN: use-after-free in > > > fscache_unhash_cookie+0x9e/0x160 [fscache]^M > > I think the patch below is the way to fix this. > > David > --- > fscache: Need to go round again after processing LRU_DISCARDING state > > There's a race between the LRU discard and relinquishment actions. In the > state machine, fscache_cookie_state_machine(), the ACTIVE state transits to > the LRU_DISCARD state in preference to transiting to the RELINQUISHING or > WITHDRAWING states. > > This should be fine, but the LRU_DISCARDING state just breaks out the > bottom of the function without going round again after transiting to the > QUIESCENT state. > > However, if both LRU discard and relinquishment happen *before* the SM > runs, one of the queue events will get discarded, along with the ref that > would be associated with it. The last ref is then discarded and the cookie > is removed without completing the relinquishment process - leaving the > cookie hashed. > > The fix is to make sure that the SM always goes back around after changing > the state. > > Signed-off-by: David Howells <dhowells@redhat.com> > --- > > diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c > index d7e825d636e2..8d0769a5ee2b 100644 > --- a/fs/fscache/cookie.c > +++ b/fs/fscache/cookie.c > @@ -755,7 +755,7 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie) > set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); > __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT); > wake = true; > - break; > + goto again_locked; > > case FSCACHE_COOKIE_STATE_DROPPED: > break; > Agree and verified with xfstests generic full runs twice with NFSv3. Prior to this patch with NFSv3 xfstest I'd regularly see the crash: BUG: KASAN: use-after-free in __fscache_acquire_cookie+0x437 https://marc.info/?l=v9fs-developer&m=163916153103008&w=2 https://marc.info/?l=linux-nfs&m=163917893813589&w=2 Tested-by: Dave Wysochanski <dwysocha@redhat.com>
David Howells <dhowells@redhat.com> wrote:
> I forgot to commit part of the patch. Attached is the patch in full.
Sigh. I replied to the wrong message.
David
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index d7e825d636e2..8d0769a5ee2b 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -755,7 +755,7 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie) set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT); wake = true; - break; + goto again_locked; case FSCACHE_COOKIE_STATE_DROPPED: break;