Message ID | 2117977.1733750054@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | netfs: Fix ceph copy to cache on write-begin | expand |
On Mon, Dec 9, 2024 at 2:14 PM David Howells <dhowells@redhat.com> wrote:
> Could you try this?
No change, still hangs immediately on the first try.
How about if you add the attached?
For convenience, I've put the outstanding fix patches I have here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
David
---
commit d0bc2ecca996105f55da22e8867905ca1dad7c8f
Author: David Howells <dhowells@redhat.com>
Date: Thu Dec 12 15:26:24 2024 +0000
netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled
When the caching for a cookie is temporarily disabled (e.g. due to a DIO
write on that file), future copying to the cache for that file is disabled
until all fds open on that file are closed. However, if netfslib is using
the deprecated PG_private_2 method (such as is currently used by ceph), and
decides it wants to copy to the cache, netfs_advance_write() will just bail
at the first check seeing that the cache stream is unavailable, and
indicate that it dealt with all the content.
This means that we have no subrequests to provide notifications to drive
the state machine or even to pin the request and the request just gets
discarded, leaving the folios with PG_private_2 set.
Fix this by jumping directly to cancel the request if the cache is not
available. That way, we don't remove mark3 from the folio_queue list and
netfs_pgpriv2_cancel() will clean up the folios.
This was found by running the generic/013 xfstest against ceph with an active
cache and the "-o fsc" option passed to ceph. That would usually hang
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index ba5af89d37fa..54d5004fec18 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -170,6 +170,10 @@ void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq)
trace_netfs_write(wreq, netfs_write_trace_copy_to_cache);
netfs_stat(&netfs_n_wh_copy_to_cache);
+ if (!wreq->io_streams[1].avail) {
+ netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+ goto couldnt_start;
+ }
for (;;) {
error = netfs_pgpriv2_copy_folio(wreq, folio);
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c index 849f40f64443..72a16222b63b 100644 --- a/fs/netfs/read_collect.c +++ b/fs/netfs/read_collect.c @@ -62,10 +62,14 @@ static void netfs_unlock_read_folio(struct netfs_io_subrequest *subreq, } else { trace_netfs_folio(folio, netfs_folio_trace_read_done); } + + folioq_clear(folioq, slot); } else { // TODO: Use of PG_private_2 is deprecated. if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) netfs_pgpriv2_mark_copy_to_cache(subreq, rreq, folioq, slot); + else + folioq_clear(folioq, slot); } if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) { @@ -77,8 +81,6 @@ static void netfs_unlock_read_folio(struct netfs_io_subrequest *subreq, folio_unlock(folio); } } - - folioq_clear(folioq, slot); } /*
Hi Max, Could you try this? David --- netfs: Fix ceph copy to cache on write-begin At the end of netfs_unlock_read_folio() in which folios are marked appropriately for copying to the cache (either with by being marked dirty and having their private data set or by having PG_private_2 set) and then unlocked, the folio_queue struct has the entry pointing to the folio cleared. This presents a problem for netfs_pgpriv2_write_to_the_cache(), which is used to write folios marked with PG_private_2 to the cache as it expects to be able to trawl the folio_queue list thereafter to find the relevant folios, leading to a hang. Fix this by not clearing the folio_queue entry if we're going to do the deprecated copy-to-cache. The clearance will be done instead as the folios are written to the cache. This can be reproduced by starting cachefiles, mounting a ceph filesystem with "-o fsc" and writing to it. Reported-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: netfs@lists.linux.dev cc: ceph-devel@vger.kernel.org cc: linux-fsdevel@vger.kernel.org --- fs/netfs/read_collect.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)