Message ID | 20220403120119.235457-4-hch@lst.de (mailing list archive)
---|---
State | Superseded, archived
Series | [1/5] xfs: add a flags argument to xfs_buf_get
On Sun, Apr 03, 2022 at 02:01:17PM +0200, Christoph Hellwig wrote:
> xfs_buf_get_map has a bit of a strange structure where the xfs_buf_find
> helper is called twice before we actually insert a new buffer on a cache
> miss.  Given that the rhashtable has an interface to insert a new entry
> and return the found one on a conflict we can easily get rid of the
> double lookup by using that.

We can do that without completely rewriting this code.

	spin_lock(&pag->pag_buf_lock);
	if (!new_bp) {
		bp = rhashtable_lookup_fast(&pag->pag_buf_hash, &cmap,
				xfs_buf_hash_params);
		if (!bp) {
			error = -ENOENT;
			goto not_found;
		}
	} else {
		bp = rhashtable_lookup_get_insert_fast(&pag->pag_buf_hash,
				&new_bp->b_rhash_head, xfs_buf_hash_params);
		if (IS_ERR(bp)) {
			error = PTR_ERR(bp);
			goto not_found;
		}
		if (!bp) {
			/*
			 * Inserted new buffer, it keeps the perag reference
			 * until it is freed.
			 */
			new_bp->b_pag = pag;
			spin_unlock(&pag->pag_buf_lock);
			*found_bp = new_bp;
			return 0;
		}
	}
	atomic_inc(&bp->b_hold);
	spin_unlock(&pag->pag_buf_lock);
	xfs_perag_put(pag);
	<lock buffer>
	*found_bp = bp;
	return 0;

not_found:
	spin_unlock(&pag->pag_buf_lock);
	xfs_perag_put(pag);
	return error;
}

And now we have the existing code with the optimised rhashtable
insertion. I'd much prefer this be separated out like this so that the
rhashtable usage change is a separate bisectable commit to all the
other changes.

I'm also not sold on moving the buffer allocation as done in this
patch because:

> +static int
> +xfs_buf_insert(
> +	struct xfs_buftarg	*btp,
> +	struct xfs_buf_map	*map,
> +	int			nmaps,
> +	xfs_buf_flags_t	flags,
> +	struct xfs_perag	*pag,
> +	struct xfs_buf		**bpp)
> +{
> +	int			error;
> +	struct xfs_buf		*bp;
> +
> +	error = _xfs_buf_alloc(btp, map, nmaps, flags, &bp);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * For buffers that fit entirely within a single page, first attempt to
> +	 * allocate the memory from the heap to minimise memory usage.  If we
> +	 * can't get heap memory for these small buffers, we fall back to using
> +	 * the page allocator.
> +	 */
> +	if (BBTOB(bp->b_length) >= PAGE_SIZE ||
> +	    xfs_buf_alloc_kmem(bp, flags) < 0) {
> +		error = xfs_buf_alloc_pages(bp, flags);
> +		if (error)
> +			goto out_free_buf;
> +	}
> +
> +	/* the buffer keeps the perag reference until it is freed */
> +	bp->b_pag = pag;
> +
> +	spin_lock(&pag->pag_buf_lock);
> +	*bpp = rhashtable_lookup_get_insert_fast(&pag->pag_buf_hash,
> +			&bp->b_rhash_head, xfs_buf_hash_params);
> +	if (IS_ERR(*bpp)) {
> +		error = PTR_ERR(*bpp);
> +		goto out_unlock;
> +	}
> +	if (*bpp) {
> +		/* found an existing buffer */
> +		atomic_inc(&(*bpp)->b_hold);
> +		error = -EEXIST;
> +		goto out_unlock;
> +	}
> +	spin_unlock(&pag->pag_buf_lock);
> +	*bpp = bp;
> +	return 0;

The return cases of this function end up being a bit of a mess. We can
return:

  - error = 0 and a locked buffer in *bpp
  - error = -EEXIST and an unlocked buffer in *bpp
  - error != 0 and a modified *bpp pointer
  - error != 0 and an unmodified *bpp pointer

So we end up with a bunch of logic here to separate out the return
cases into different error values, then have to add logic to the
caller to handle the different return cases. And if we look at the new
caller:

> +	if (unlikely(!bp)) {
> +		if (flags & XBF_NOALLOC) {
> +			XFS_STATS_INC(mp, xb_miss_locked);
> +			xfs_perag_put(pag);
> +			return -ENOENT;
> +		}
> +
> +		error = xfs_buf_insert(btp, map, nmaps, flags, pag, &bp);
> +		if (!error)
> +			goto finish;
> +		if (error != -EEXIST) {
> +			xfs_perag_put(pag);
> +			return error;
> +		}
> +	}
> 	xfs_perag_put(pag);

	<lock the buffer>

It took me quite some time to realise this wasn't buggy - it looks for
all the world like the "!error" case here fails to lock the buffer and
leaks the perag reference. It isn't at all clear that the new buffer is
inserted locked into the hash and that it steals the caller's perag
reference, whilst the -EEXIST case does neither yet still returns a
buffer.
Hence while I like the idea of changing how we allocate the buffer on
lookup failure, I'm not sold on this interface yet. I think splitting
the rhashtable insert optimisation from the allocation rework might
help clean this up more.

Cheers,

Dave.
On Mon, Apr 04, 2022 at 09:04:52AM +1000, Dave Chinner wrote:
> On Sun, Apr 03, 2022 at 02:01:17PM +0200, Christoph Hellwig wrote:
> > xfs_buf_get_map has a bit of a strange structure where the xfs_buf_find
> > helper is called twice before we actually insert a new buffer on a cache
> > miss.  Given that the rhashtable has an interface to insert a new entry
> > and return the found one on a conflict we can easily get rid of the
> > double lookup by using that.
>
> We can do that without completely rewriting this code.

We could.  And I had something similar earlier.  But I actually think
the structure of the code after this patch makes much more sense.  All
the logic for the fast path buffer lookup is clearly laid out in one
function, which then just calls a helper to perform the insert.  The
new scheme also is slightly less code overall.  Even more so once the
lockless lookup comes into play, which requires different locking and
refcount increments.

> The return cases of this function end up being a bit of a mess. We can
> return:
>
>   - error = 0 and a locked buffer in *bpp
>   - error = -EEXIST and an unlocked buffer in *bpp
>   - error != 0 and a modified *bpp pointer
>   - error != 0 and an unmodified *bpp pointer

The last two are the same - the *bpp pointer simply is not valid on a
"real" error return.  So the return really is a tristate, similar to
many other places in XFS.
On Tue, Apr 05, 2022 at 05:00:27PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 04, 2022 at 09:04:52AM +1000, Dave Chinner wrote:
> > On Sun, Apr 03, 2022 at 02:01:17PM +0200, Christoph Hellwig wrote:
> > > xfs_buf_get_map has a bit of a strange structure where the xfs_buf_find
> > > helper is called twice before we actually insert a new buffer on a cache
> > > miss.  Given that the rhashtable has an interface to insert a new entry
> > > and return the found one on a conflict we can easily get rid of the
> > > double lookup by using that.
> >
> > We can do that without completely rewriting this code.
>
> We could.  And I had something similar earlier.  But I actually think
> the structure of the code after this patch makes much more sense.  All
> the logic for the fast path buffer lookup is clearly laid out in one
> function, which then just calls a helper to perform the insert.  The
> new scheme also is slightly less code overall.  Even more so once the
> lockless lookup comes into play, which requires different locking and
> refcount increments.

Agreed, but you're making two distinct, significant modifications in
the one patchset. One is changing the way we use a generic library
functionality, the other is changing the entire structure of the
lookup path.

IOWs, I was not saying the end result was bad, I was (clumsily) trying
to suggest that you should split these two modifications into separate
patches because they are largely separate changes. Once I thought
about it that way, looking at them separately made me want to
structure the code quite differently.

> > The return cases of this function end up being a bit of a mess. We can
> > return:
> >
> >   - error = 0 and a locked buffer in *bpp
> >   - error = -EEXIST and an unlocked buffer in *bpp
> >   - error != 0 and a modified *bpp pointer
> >   - error != 0 and an unmodified *bpp pointer
>
> The last two are the same - the *bpp pointer simply is not valid on a
> "real" error return.  So the return really is a tristate, similar to
> many other places in xfs.

I think you missed the point I was making. I'm not complaining about
whether it's a tristate return or not, it's the fact that it can
return buffers in different states and the caller has to handle that
inconsistency itself whilst still maintaining an efficient fast path.
That's what makes the code difficult to follow - xfs_buf_insert() is
the slow path, so all the complexity and twisted logic should be
inside that function rather than directly impacting the fast path
code.

e.g. Most of the complexity goes away if we factor out the buffer
trylock/locking code into a helper (like we have in the iomap code)
and then have xfs_buf_insert() call it when it finds an existing
buffer. Then the -EEXIST return value can go away, and xfs_buf_insert
can return a locked buffer exactly the same as if it inserted a new
buffer. Have the newly allocated buffer take a new perag reference,
too, instead of stealing the caller's reference, and then all the
differences between the insert and -EEXIST cases go away.

Then you can move all the conditional lookup failure cases into
xfs_buf_insert(), too, and we end up with high level fast path code
that is clear and concise:

	/* cache hits generally outnumber misses by at least 10:1 */
	bp = xfs_buf_lookup_fast(pag, &cmap);
	if (likely(bp))
		error = xfs_buf_lookup_lock(bp, flags);
	else
		error = xfs_buf_lookup_insert(pag, &cmap, flags, &bp);

	xfs_perag_put(pag);
	if (!error)
		*bpp = bp;
	return error;

Cheers,

Dave.
On Wed, Apr 06, 2022 at 08:01:21AM +1000, Dave Chinner wrote:
> Agreed, but you're making two distinct, significant modifications in
> the one patchset. One is changing the way we use a generic library
> functionality, the other is changing the entire structure of the
> lookup path.
>
> IOWs, I was not saying the end result was bad, I was (clumsily)
> trying to suggest that you should split these two modifications into
> separate patches because they are largely separate changes.
>
> Once I thought about it that way, looking at them separately made me
> want to structure the code quite differently.

Ok, I'll see if I can split things up a bit better.

> e.g. Most of the complexity goes away if we factor out the buffer
> trylock/locking code into a helper (like we have in the iomap code)
> and then have xfs_buf_insert() call it when it finds an existing
> buffer. Then the -EEXIST return value can go away, and
> xfs_buf_insert can return a locked buffer exactly the same as if it
> inserted a new buffer. Have the newly allocated buffer take a new
> perag reference, too, instead of stealing the caller's reference,
> and then all the differences between the insert and -EEXIST cases go
> away.

I actually had that earlier as well and really like the flow of the
single function.  So it certainly is doable.
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 0951b7cbd79675..ef645e15935369 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -504,12 +504,66 @@ xfs_buf_hash_destroy(
 	rhashtable_destroy(&pag->pag_buf_hash);
 }
 
+static int
+xfs_buf_insert(
+	struct xfs_buftarg	*btp,
+	struct xfs_buf_map	*map,
+	int			nmaps,
+	xfs_buf_flags_t		flags,
+	struct xfs_perag	*pag,
+	struct xfs_buf		**bpp)
+{
+	int			error;
+	struct xfs_buf		*bp;
+
+	error = _xfs_buf_alloc(btp, map, nmaps, flags, &bp);
+	if (error)
+		return error;
+
+	/*
+	 * For buffers that fit entirely within a single page, first attempt to
+	 * allocate the memory from the heap to minimise memory usage.  If we
+	 * can't get heap memory for these small buffers, we fall back to using
+	 * the page allocator.
+	 */
+	if (BBTOB(bp->b_length) >= PAGE_SIZE ||
+	    xfs_buf_alloc_kmem(bp, flags) < 0) {
+		error = xfs_buf_alloc_pages(bp, flags);
+		if (error)
+			goto out_free_buf;
+	}
+
+	/* the buffer keeps the perag reference until it is freed */
+	bp->b_pag = pag;
+
+	spin_lock(&pag->pag_buf_lock);
+	*bpp = rhashtable_lookup_get_insert_fast(&pag->pag_buf_hash,
+			&bp->b_rhash_head, xfs_buf_hash_params);
+	if (IS_ERR(*bpp)) {
+		error = PTR_ERR(*bpp);
+		goto out_unlock;
+	}
+	if (*bpp) {
+		/* found an existing buffer */
+		atomic_inc(&(*bpp)->b_hold);
+		error = -EEXIST;
+		goto out_unlock;
+	}
+	spin_unlock(&pag->pag_buf_lock);
+	*bpp = bp;
+	return 0;
+
+out_unlock:
+	spin_unlock(&pag->pag_buf_lock);
+out_free_buf:
+	xfs_buf_free(bp);
+	return error;
+}
+
 /*
- * Look up a buffer in the buffer cache and return it referenced and locked
- * in @found_bp.
- *
- * If @new_bp is supplied and we have a lookup miss, insert @new_bp into the
- * cache.
+ * Assemble a buffer covering the specified range. The code is optimised for
+ * cache hits, as metadata intensive workloads will see 3 orders of magnitude
+ * more hits than misses.
  *
  * If XBF_TRYLOCK is set in @flags, only try to lock the buffer and return
  * -EAGAIN if we fail to lock it.
@@ -517,27 +571,26 @@ xfs_buf_hash_destroy(
  * Return values are:
  *	-EFSCORRUPTED if have been supplied with an invalid address
  *	-EAGAIN on trylock failure
- *	-ENOENT if we fail to find a match and @new_bp was NULL
- *	0, with @found_bp:
- *		- @new_bp if we inserted it into the cache
- *		- the buffer we found and locked.
+ *	-ENOENT if we fail to find a match and alloc was %false
+ *	0 if a buffer was found or newly created
  */
-static int
-xfs_buf_find(
+int
+xfs_buf_get_map(
 	struct xfs_buftarg	*btp,
 	struct xfs_buf_map	*map,
 	int			nmaps,
 	xfs_buf_flags_t		flags,
-	struct xfs_buf		*new_bp,
-	struct xfs_buf		**found_bp)
+	struct xfs_buf		**bpp)
 {
+	struct xfs_mount	*mp = btp->bt_mount;
 	struct xfs_perag	*pag;
 	struct xfs_buf		*bp;
 	struct xfs_buf_map	cmap = { .bm_bn = map[0].bm_bn };
+	int			error = 0;
 	xfs_daddr_t		eofs;
 	int			i;
 
-	*found_bp = NULL;
+	*bpp = NULL;
 
 	for (i = 0; i < nmaps; i++)
 		cmap.bm_len += map[i].bm_len;
@@ -550,54 +603,47 @@ xfs_buf_find(
 	 * Corrupted block numbers can get through to here, unfortunately, so we
 	 * have to check that the buffer falls within the filesystem bounds.
 	 */
-	eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
-	if (cmap.bm_bn < 0 || cmap.bm_bn >= eofs) {
-		xfs_alert(btp->bt_mount,
-			"%s: daddr 0x%llx out of range, EOFS 0x%llx",
+	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
+	if (WARN_ON_ONCE(cmap.bm_bn < 0) || WARN_ON_ONCE(cmap.bm_bn >= eofs)) {
+		xfs_alert(mp, "%s: daddr 0x%llx out of range, EOFS 0x%llx",
 			__func__, cmap.bm_bn, eofs);
-		WARN_ON(1);
 		return -EFSCORRUPTED;
 	}
 
-	pag = xfs_perag_get(btp->bt_mount,
-			    xfs_daddr_to_agno(btp->bt_mount, cmap.bm_bn));
+	pag = xfs_perag_get(mp, xfs_daddr_to_agno(mp, cmap.bm_bn));
 
 	spin_lock(&pag->pag_buf_lock);
 	bp = rhashtable_lookup_fast(&pag->pag_buf_hash, &cmap,
 				    xfs_buf_hash_params);
-	if (bp) {
+	if (bp)
 		atomic_inc(&bp->b_hold);
-		goto found;
-	}
-
-	/* No match found */
-	if (!new_bp) {
-		XFS_STATS_INC(btp->bt_mount, xb_miss_locked);
-		spin_unlock(&pag->pag_buf_lock);
-		xfs_perag_put(pag);
-		return -ENOENT;
-	}
-
-	/* the buffer keeps the perag reference until it is freed */
-	new_bp->b_pag = pag;
-	rhashtable_insert_fast(&pag->pag_buf_hash, &new_bp->b_rhash_head,
-			       xfs_buf_hash_params);
 	spin_unlock(&pag->pag_buf_lock);
-	*found_bp = new_bp;
-	return 0;
 
-found:
-	spin_unlock(&pag->pag_buf_lock);
+	if (unlikely(!bp)) {
+		if (flags & XBF_NOALLOC) {
+			XFS_STATS_INC(mp, xb_miss_locked);
+			xfs_perag_put(pag);
+			return -ENOENT;
+		}
+
+		error = xfs_buf_insert(btp, map, nmaps, flags, pag, &bp);
+		if (!error)
+			goto finish;
+		if (error != -EEXIST) {
+			xfs_perag_put(pag);
+			return error;
+		}
+	}
 	xfs_perag_put(pag);
 
 	if (!xfs_buf_trylock(bp)) {
 		if (flags & XBF_TRYLOCK) {
 			xfs_buf_rele(bp);
-			XFS_STATS_INC(btp->bt_mount, xb_busy_locked);
+			XFS_STATS_INC(mp, xb_busy_locked);
 			return -EAGAIN;
 		}
 		xfs_buf_lock(bp);
-		XFS_STATS_INC(btp->bt_mount, xb_get_locked_waited);
+		XFS_STATS_INC(mp, xb_get_locked_waited);
 	}
 
 	/*
@@ -612,64 +658,12 @@ xfs_buf_find(
 	}
 
 	trace_xfs_buf_find(bp, flags, _RET_IP_);
-	XFS_STATS_INC(btp->bt_mount, xb_get_locked);
-	*found_bp = bp;
-	return 0;
-}
-
-/*
- * Assembles a buffer covering the specified range. The code is optimised for
- * cache hits, as metadata intensive workloads will see 3 orders of magnitude
- * more hits than misses.
- */
-int
-xfs_buf_get_map(
-	struct xfs_buftarg	*target,
-	struct xfs_buf_map	*map,
-	int			nmaps,
-	xfs_buf_flags_t		flags,
-	struct xfs_buf		**bpp)
-{
-	struct xfs_buf		*bp;
-	struct xfs_buf		*new_bp;
-	int			error;
-
-	*bpp = NULL;
-	error = xfs_buf_find(target, map, nmaps, flags, NULL, &bp);
-	if (!error)
-		goto found;
-	if (error != -ENOENT || (flags & XBF_NOALLOC))
-		return error;
-
-	error = _xfs_buf_alloc(target, map, nmaps, flags, &new_bp);
-	if (error)
-		return error;
-
-	/*
-	 * For buffers that fit entirely within a single page, first attempt to
-	 * allocate the memory from the heap to minimise memory usage. If we
-	 * can't get heap memory for these small buffers, we fall back to using
-	 * the page allocator.
-	 */
-	if (BBTOB(new_bp->b_length) >= PAGE_SIZE ||
-	    xfs_buf_alloc_kmem(new_bp, flags) < 0) {
-		error = xfs_buf_alloc_pages(new_bp, flags);
-		if (error)
-			goto out_free_buf;
-	}
-
-	error = xfs_buf_find(target, map, nmaps, flags, new_bp, &bp);
-	if (error)
-		goto out_free_buf;
-
-	if (bp != new_bp)
-		xfs_buf_free(new_bp);
-
-found:
+	XFS_STATS_INC(mp, xb_get_locked);
+finish:
 	if (!bp->b_addr) {
 		error = _xfs_buf_map_pages(bp, flags);
 		if (unlikely(error)) {
-			xfs_warn_ratelimited(target->bt_mount,
+			xfs_warn_ratelimited(mp,
 				"%s: failed to map %u pages", __func__,
 				bp->b_page_count);
 			xfs_buf_relse(bp);
@@ -684,13 +678,10 @@ xfs_buf_get_map(
 	if (!(flags & XBF_READ))
 		xfs_buf_ioerror(bp, 0);
 
-	XFS_STATS_INC(target->bt_mount, xb_get);
+	XFS_STATS_INC(mp, xb_get);
 	trace_xfs_buf_get(bp, flags, _RET_IP_);
 	*bpp = bp;
 	return 0;
-out_free_buf:
-	xfs_buf_free(new_bp);
-	return error;
 }
 
 int
@@ -962,12 +953,12 @@ xfs_buf_rele(
 	/*
 	 * We grab the b_lock here first to serialise racing xfs_buf_rele()
	 * calls.  The pag_buf_lock being taken on the last reference only
-	 * serialises against racing lookups in xfs_buf_find(). IOWs, the second
-	 * to last reference we drop here is not serialised against the last
-	 * reference until we take bp->b_lock. Hence if we don't grab b_lock
-	 * first, the last "release" reference can win the race to the lock and
-	 * free the buffer before the second-to-last reference is processed,
-	 * leading to a use-after-free scenario.
+	 * serialises against racing lookups in xfs_buf_get_map(). IOWs, the
+	 * second to last reference we drop here is not serialised against the
+	 * last reference until we take bp->b_lock. Hence if we don't grab
+	 * b_lock first, the last "release" reference can win the race to the
+	 * lock and free the buffer before the second-to-last reference is
+	 * processed, leading to a use-after-free scenario.
	 */
	spin_lock(&bp->b_lock);
	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
xfs_buf_get_map has a bit of a strange structure where the xfs_buf_find
helper is called twice before we actually insert a new buffer on a cache
miss.  Given that the rhashtable has an interface to insert a new entry
and return the found one on a conflict, we can easily get rid of the
double lookup by using that.

Getting a straight code path requires folding the xfs_buf_find helper
into xfs_buf_get_map, but on the flip side we can now easily move the
logic for the slow path cache insert into a new helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c | 207 +++++++++++++++++++++++------------------------
 1 file changed, 99 insertions(+), 108 deletions(-)