diff mbox series

[19/26] netfs: New writeback implementation

Message ID 20240328163424.2781320-20-dhowells@redhat.com (mailing list archive)
State New
Headers show
Series netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache | expand

Commit Message

David Howells March 28, 2024, 4:34 p.m. UTC
The current netfslib writeback implementation creates writeback requests of
contiguous folio data and then separately tiles subrequests over the space
twice, once for the server and once for the cache.  This creates a few
issues:

 (1) Every time there's a discontiguity or a change between writing to only
     one destination or writing to both, it must create a new request.
     This makes it harder to do vectored writes.

 (2) The folios don't have the writeback mark removed until the end of the
     request - and a request could be hundreds of megabytes.

 (3) In future, I want to support a larger cache granularity, which will
     require aggregation of some folios that contain unmodified data (which
     only need to go to the cache) and some which contain modifications
     (which need to be uploaded and stored to the cache) - but, currently,
     these are treated as discontiguous.

There's also a move to get everyone to use writeback_iter() to extract
writable folios from the pagecache.  That said, writeback_iter() currently
has some issues that make it less than ideal (a minimal sketch of its pump
loop follows the list below):

 (1) there's no way to cancel the iteration, even if you find a "temporary"
     error that means the current folio and all subsequent folios are going
     to fail;

 (2) there's no way to filter the folios being written back - something
     that will impact Ceph with its ordered snap system;

 (3) and if you get a folio you can't immediately deal with (say you need
     to flush the preceding writes), you are left with the folio hanging in
     the locked state for the duration, when really it should be unlocked
     and relocked later.
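
For reference, a minimal sketch of the pump loop that writeback_iter()
imposes on its caller looks something like the following, where
write_one_folio() is a hypothetical stand-in for whatever per-folio
handling the caller does (the real loop is in new_netfs_writepages()
below):

	struct folio *folio = NULL;
	int error = 0;

	/* Each call hands back the next locked, dirty folio and is told the
	 * fate of the previous one through *error; the loop itself offers no
	 * hook for filtering folios and no tidy way of stopping early.
	 */
	while ((folio = writeback_iter(mapping, wbc, folio, &error)))
		error = write_one_folio(folio);	/* hypothetical helper */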

In this new implementation, I use writeback_iter() to pump folios,
progressively creating two parallel, but separate streams and cleaning up
the finished folios as the subrequests complete.  Either or both streams
can contain gaps, and the subrequests in each stream can be of variable
size, don't need to align with each other and don't need to align with the
folios.

Indeed, subrequests can cross folio boundaries, may cover several folios or
a folio may be spanned by multiple subrequests, e.g.:

         +---+---+-----+-----+---+----------+
Folios:  |   |   |     |     |   |          |
         +---+---+-----+-----+---+----------+

           +------+------+     +----+----+
Upload:    |      |      |.....|    |    |
           +------+------+     +----+----+

         +------+------+------+------+------+
Cache:   |      |      |      |      |      |
         +------+------+------+------+------+
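
In code terms, the two rows above map onto the request's two netfs_io_stream
slots.  Condensed from netfs_create_write_req() in this patch, and assuming
the cache resources are valid, the streams are wired up like so:

	/* Stream 0 uploads modified data to the server; stream 1 writes to
	 * the local cache.  Each stream has its own subrequest list and its
	 * own collection point, so their subrequests need not line up.
	 */
	wreq->io_streams[0].source	  = NETFS_UPLOAD_TO_SERVER;
	wreq->io_streams[0].prepare_write = ictx->ops->prepare_write;
	wreq->io_streams[0].issue_write	  = ictx->ops->issue_write;

	wreq->io_streams[1].source	  = NETFS_WRITE_TO_CACHE;
	if (fscache_resources_valid(&wreq->cache_resources)) {
		wreq->io_streams[1].avail	  = true;
		wreq->io_streams[1].prepare_write = wreq->cache_resources.ops->prepare_write_subreq;
		wreq->io_streams[1].issue_write	  = wreq->cache_resources.ops->issue_write;
	}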

The progressive subrequest construction permits the algorithm to be
preparing both the next upload to the server and the next write to the
cache whilst the previous ones are already in progress.  Throttling can be
applied to control the rate of production of subrequests - and, in any
case, we probably want to write them to the server in ascending order,
particularly if the file will be extended.

Content crypto can also be prepared at the same time as the subrequests and
run asynchronously, with the prepped requests being stalled until the
crypto catches up with them.  This might also be useful for transport
crypto, but that happens at a lower layer, so probably would be harder to
pull off.

The algorithm is split into three parts (a condensed sketch of the issuer's
loop follows the list):

 (1) The issuer.  This walks through the data, packaging it up, encrypting
     it and creating subrequests.  The part of this that generates
     subrequests only deals with file positions and spans and so is usable
     for DIO/unbuffered writes as well as buffered writes.

 (2) The collector. This asynchronously collects completed subrequests,
     unlocks folios, frees crypto buffers and performs any retries.  This
     runs in a work queue so that the issuer can return to the caller for
     writeback (so that the VM can have its kswapd thread back) or async
     writes.

 (3) The retryer.  This pauses the issuer, waits for all outstanding
     subrequests to complete and then goes through the failed subrequests
     to reissue them.  This may involve reprepping them (with cifs, the
     credits must be renegotiated, and a subrequest may need splitting),
     and doing RMW for content crypto if there's a conflicting change on
     the server.
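
As a condensed sketch, the issuer's main loop (roughly what
new_netfs_writepages() in this patch does, with error handling and locking
omitted) looks like this:

	/* Pump dirty folios out of the pagecache and into the request.  Each
	 * folio is carved up and fed to whichever streams need it by
	 * netfs_write_folio(); completed subrequests are cleaned up
	 * asynchronously by the collector.
	 */
	folio = writeback_iter(mapping, wbc, NULL, &error);
	wreq = netfs_create_write_req(mapping, NULL, folio_pos(folio), NETFS_WRITEBACK);

	do {
		error = netfs_write_folio(wreq, wbc, folio);
	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));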

[!] Note that some of the functions are prefixed with "new_" to avoid
clashes with existing functions.  These will be renamed in a later patch
that cuts over to the new algorithm.
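
For completeness, the contract that a filesystem's ->issue_write() hook is
expected to follow: netfslib sets up subreq->io_iter and calls the hook; the
filesystem sends the data and then reports the outcome by calling
netfs_write_subrequest_terminated() with either the number of bytes
transferred or a negative error.  A sketch, in which some_fs_send() is a
hypothetical transport call and not part of this patch:

	static void some_fs_issue_write(struct netfs_io_subrequest *subreq)
	{
		/* Send the data covered by subreq->io_iter, i.e. the
		 * subreq->len - subreq->transferred bytes starting at file
		 * position subreq->start + subreq->transferred.
		 */
		ssize_t ret = some_fs_send(subreq);	/* bytes sent or -errno */

		netfs_write_subrequest_terminated(subreq, ret, false);
	}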

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/Makefile            |   4 +-
 fs/netfs/buffered_write.c    |   4 -
 fs/netfs/internal.h          |  27 ++
 fs/netfs/objects.c           |  17 +
 fs/netfs/write_collect.c     | 808 +++++++++++++++++++++++++++++++++++
 fs/netfs/write_issue.c       | 673 +++++++++++++++++++++++++++++
 include/linux/netfs.h        |  68 ++-
 include/trace/events/netfs.h | 232 +++++++++-
 8 files changed, 1824 insertions(+), 9 deletions(-)
 create mode 100644 fs/netfs/write_collect.c
 create mode 100644 fs/netfs/write_issue.c

Comments

Naveen Mamindlapalli March 29, 2024, 10:34 a.m. UTC | #1
> -----Original Message-----
> From: David Howells <dhowells@redhat.com>
> Sent: Thursday, March 28, 2024 10:04 PM
> To: Christian Brauner <christian@brauner.io>; Jeff Layton <jlayton@kernel.org>; Gao Xiang <hsiangkao@linux.alibaba.com>; Dominique Martinet <asmadeus@codewreck.org>
> Cc: David Howells <dhowells@redhat.com>; Matthew Wilcox <willy@infradead.org>; Steve French <smfrench@gmail.com>; Marc Dionne <marc.dionne@auristor.com>; Paulo Alcantara <pc@manguebit.com>; Shyam Prasad N <sprasad@microsoft.com>; Tom Talpey <tom@talpey.com>; Eric Van Hensbergen <ericvh@kernel.org>; Ilya Dryomov <idryomov@gmail.com>; netfs@lists.linux.dev; linux-cachefs@redhat.com; linux-afs@lists.infradead.org; linux-cifs@vger.kernel.org; linux-nfs@vger.kernel.org; ceph-devel@vger.kernel.org; v9fs@lists.linux.dev; linux-erofs@lists.ozlabs.org; linux-fsdevel@vger.kernel.org; linux-mm@kvack.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Latchesar Ionkov <lucho@ionkov.net>; Christian Schoenebeck <linux_oss@crudebyte.com>
> Subject: [PATCH 19/26] netfs: New writeback implementation
> 
> diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
> index d4d1d799819e..1eb86e34b5a9 100644
> --- a/fs/netfs/Makefile
> +++ b/fs/netfs/Makefile
> @@ -11,7 +11,9 @@ netfs-y := \
>  	main.o \
>  	misc.o \
>  	objects.o \
> -	output.o
> +	output.o \
> +	write_collect.o \
> +	write_issue.o
> 
>  netfs-$(CONFIG_NETFS_STATS) += stats.o
> 
> diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
> index 244d67a43972..621532dacef5 100644
> --- a/fs/netfs/buffered_write.c
> +++ b/fs/netfs/buffered_write.c
> @@ -74,16 +74,12 @@ static enum netfs_how_to_modify netfs_how_to_modify(struct netfs_inode *ctx,
> 
>  	if (file->f_mode & FMODE_READ)
>  		goto no_write_streaming;
> -	if (test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
> -		goto no_write_streaming;
> 
>  	if (netfs_is_cache_enabled(ctx)) {
>  		/* We don't want to get a streaming write on a file that loses
>  		 * caching service temporarily because the backing store got
>  		 * culled.
>  		 */
> -		if (!test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
> -			set_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags);
>  		goto no_write_streaming;
>  	}
> 
> diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
> index 58289cc65e25..5d3f74a70fa7 100644
> --- a/fs/netfs/internal.h
> +++ b/fs/netfs/internal.h
> @@ -153,6 +153,33 @@ static inline void netfs_stat_d(atomic_t *stat)
>  #define netfs_stat_d(x) do {} while(0)
>  #endif
> 
> +/*
> + * write_collect.c
> + */
> +int netfs_folio_written_back(struct folio *folio);
> +void netfs_write_collection_worker(struct work_struct *work);
> +void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async);
> +
> +/*
> + * write_issue.c
> + */
> +struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
> +						struct file *file,
> +						loff_t start,
> +						enum netfs_io_origin origin);
> +void netfs_reissue_write(struct netfs_io_stream *stream,
> +			 struct netfs_io_subrequest *subreq);
> +int netfs_advance_write(struct netfs_io_request *wreq,
> +			struct netfs_io_stream *stream,
> +			loff_t start, size_t len, bool to_eof);
> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len);
> +int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
> +				   struct folio *folio, size_t copied, bool to_page_end,
> +				   struct folio **writethrough_cache);
> +int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
> +			       struct folio *writethrough_cache);
> +int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len);
> +
>  /*
>   * Miscellaneous functions.
>   */
> diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
> index 1a4e2ce735ce..c90d482b1650 100644
> --- a/fs/netfs/objects.c
> +++ b/fs/netfs/objects.c
> @@ -47,6 +47,10 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
>  	rreq->inode	= inode;
>  	rreq->i_size	= i_size_read(inode);
>  	rreq->debug_id	= atomic_inc_return(&debug_ids);
> +	rreq->wsize	= INT_MAX;
> +	spin_lock_init(&rreq->lock);
> +	INIT_LIST_HEAD(&rreq->io_streams[0].subrequests);
> +	INIT_LIST_HEAD(&rreq->io_streams[1].subrequests);
>  	INIT_LIST_HEAD(&rreq->subrequests);
>  	INIT_WORK(&rreq->work, NULL);
>  	refcount_set(&rreq->ref, 1);
> @@ -85,6 +89,8 @@ void netfs_get_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace
>  void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
>  {
>  	struct netfs_io_subrequest *subreq;
> +	struct netfs_io_stream *stream;
> +	int s;
> 
>  	while (!list_empty(&rreq->subrequests)) {
>  		subreq = list_first_entry(&rreq->subrequests,
> @@ -93,6 +99,17 @@ void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
>  		netfs_put_subrequest(subreq, was_async,
>  				     netfs_sreq_trace_put_clear);
>  	}
> +
> +	for (s = 0; s < ARRAY_SIZE(rreq->io_streams); s++) {
> +		stream = &rreq->io_streams[s];
> +		while (!list_empty(&stream->subrequests)) {
> +			subreq = list_first_entry(&stream->subrequests,
> +						  struct netfs_io_subrequest, rreq_link);
> +			list_del(&subreq->rreq_link);
> +			netfs_put_subrequest(subreq, was_async,
> +					     netfs_sreq_trace_put_clear);
> +		}
> +	}
>  }
> 
>  static void netfs_free_request_rcu(struct rcu_head *rcu)
> diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
> new file mode 100644
> index 000000000000..5e2ca8b25af0
> --- /dev/null
> +++ b/fs/netfs/write_collect.c
> @@ -0,0 +1,808 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Network filesystem write subrequest result collection, assessment
> + * and retrying.
> + *
> + * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#include <linux/export.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/pagemap.h>
> +#include <linux/slab.h>
> +#include "internal.h"
> +
> +/* Notes made in the collector */
> +#define HIT_PENDING		0x01	/* A front op was still pending */
> +#define SOME_EMPTY		0x02	/* One or more streams are empty */
> +#define ALL_EMPTY		0x04	/* All streams are empty */
> +#define MAYBE_DISCONTIG		0x08	/* A front op may be discontiguous (rounded to PAGE_SIZE) */
> +#define NEED_REASSESS		0x10	/* Need to loop round and reassess */
> +#define REASSESS_DISCONTIG	0x20	/* Reassess discontiguity if contiguity advances */
> +#define MADE_PROGRESS		0x40	/* Made progress cleaning up a stream or the folio set */
> +#define BUFFERED		0x80	/* The pagecache needs cleaning up */
> +#define NEED_RETRY		0x100	/* A front op requests retrying */
> +#define SAW_FAILURE		0x200	/* One stream or another hit a permanent failure */
> +
> +/*
> + * Successful completion of write of a folio to the server and/or cache.  Note
> + * that we are not allowed to lock the folio here on pain of deadlocking with
> + * truncate.
> + */
> +int netfs_folio_written_back(struct folio *folio)
> +{
> +	enum netfs_folio_trace why = netfs_folio_trace_clear;
> +	struct netfs_folio *finfo;
> +	struct netfs_group *group = NULL;
> +	int gcount = 0;

Reverse xmas tree order missing in multiple functions.

> +
> +	if ((finfo = netfs_folio_info(folio))) {
> +		/* Streaming writes cannot be redirtied whilst under writeback,
> +		 * so discard the streaming record.
> +		 */
> +		folio_detach_private(folio);
> +		group = finfo->netfs_group;
> +		gcount++;
> +		kfree(finfo);
> +		why = netfs_folio_trace_clear_s;
> +		goto end_wb;
> +	}
> +
> +	if ((group = netfs_folio_group(folio))) {
> +		if (group == NETFS_FOLIO_COPY_TO_CACHE) {
> +			why = netfs_folio_trace_clear_cc;
> +			if (group == NETFS_FOLIO_COPY_TO_CACHE)
> +				folio_detach_private(folio);
> +			else
> +				why = netfs_folio_trace_redirtied;
> +			goto end_wb;
> +		}
> +
> +		/* Need to detach the group pointer if the page didn't get
> +		 * redirtied.  If it has been redirtied, then it must be within
> +		 * the same group.
> +		 */
> +		why = netfs_folio_trace_redirtied;
> +		if (!folio_test_dirty(folio)) {
> +			if (!folio_test_dirty(folio)) {
> +				folio_detach_private(folio);
> +				gcount++;
> +				why = netfs_folio_trace_clear_g;
> +			}
> +		}
> +	}
> +
> +end_wb:
> +	trace_netfs_folio(folio, why);
> +	folio_end_writeback(folio);
> +	return gcount;
> +}
> +
> +/*
> + * Get hold of a folio we have under writeback.  We don't want to get the
> + * refcount on it.
> + */
> +static struct folio *netfs_writeback_lookup_folio(struct netfs_io_request *wreq, loff_t pos)
> +{
> +	XA_STATE(xas, &wreq->mapping->i_pages, pos / PAGE_SIZE);
> +	struct folio *folio;
> +
> +	rcu_read_lock();
> +
> +	for (;;) {
> +		xas_reset(&xas);
> +		folio = xas_load(&xas);
> +		if (xas_retry(&xas, folio))
> +			continue;
> +
> +		if (!folio || xa_is_value(folio))
> +			kdebug("R=%08x: folio %lx (%llx) not present",
> +			       wreq->debug_id, xas.xa_index, pos / PAGE_SIZE);
> +		BUG_ON(!folio || xa_is_value(folio));
> +
> +		if (folio == xas_reload(&xas))
> +			break;
> +	}
> +
> +	rcu_read_unlock();
> +
> +	if (WARN_ONCE(!folio_test_writeback(folio),
> +		      "R=%08x: folio %lx is not under writeback\n",
> +		      wreq->debug_id, folio->index)) {
> +		trace_netfs_folio(folio, netfs_folio_trace_not_under_wback);
> +	}
> +	return folio;
> +}
> +
> +/*
> + * Unlock any folios we've finished with.
> + */
> +static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
> +					  unsigned long long collected_to,
> +					  unsigned int *notes)
> +{
> +	for (;;) {
> +		struct folio *folio;
> +		struct netfs_folio *finfo;
> +		unsigned long long fpos, fend;
> +		size_t fsize, flen;
> +
> +		folio = netfs_writeback_lookup_folio(wreq, wreq->cleaned_to);
> +
> +		fpos = folio_pos(folio);
> +		fsize = folio_size(folio);
> +		finfo = netfs_folio_info(folio);
> +		flen = finfo ? finfo->dirty_offset + finfo->dirty_len : fsize;
> +
> +		fend = min_t(unsigned long long, fpos + flen, wreq->i_size);
> +
> +		trace_netfs_collect_folio(wreq, folio, fend, collected_to);
> +
> +		if (fpos + fsize > wreq->contiguity) {
> +			trace_netfs_collect_contig(wreq, fpos + fsize,
> +						   netfs_contig_trace_unlock);
> +			wreq->contiguity = fpos + fsize;
> +		}
> +
> +		/* Unlock any folio we've transferred all of. */
> +		if (collected_to < fend)
> +			break;
> +
> +		wreq->nr_group_rel += netfs_folio_written_back(folio);
> +		wreq->cleaned_to = fpos + fsize;
> +		*notes |= MADE_PROGRESS;
> +
> +		if (fpos + fsize >= collected_to)
> +			break;
> +	}
> +}
> +
> +/*
> + * Perform retries on the streams that need it.
> + */
> +static void netfs_retry_write_stream(struct netfs_io_request *wreq,
> +				     struct netfs_io_stream *stream)
> +{
> +	struct list_head *next;
> +
> +	_enter("R=%x[%x:]", wreq->debug_id, stream->stream_nr);
> +
> +	if (unlikely(stream->failed))
> +		return;
> +
> +	/* If there's no renegotiation to do, just resend each failed subreq. */
> +	if (!stream->prepare_write) {
> +		struct netfs_io_subrequest *subreq;
> +
> +		list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
> +			if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
> +				break;
> +			if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
> +				__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
> +				netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> +				netfs_reissue_write(stream, subreq);
> +			}
> +		}
> +		return;
> +	}
> +
> +	if (list_empty(&stream->subrequests))
> +		return;
> +	next = stream->subrequests.next;
> +
> +	do {
> +		struct netfs_io_subrequest *subreq = NULL, *from, *to, *tmp;
> +		unsigned long long start, len;
> +		size_t part;
> +		bool boundary = false;
> +
> +		/* Go through the stream and find the next span of contiguous
> +		 * data that we then rejig (cifs, for example, needs the wsize
> +		 * renegotiating) and reissue.
> +		 */
> +		from = list_entry(next, struct netfs_io_subrequest, rreq_link);
> +		to = from;
> +		start = from->start + from->transferred;
> +		len   = from->len   - from->transferred;
> +
> +		if (test_bit(NETFS_SREQ_FAILED, &from->flags) ||
> +		    !test_bit(NETFS_SREQ_NEED_RETRY, &from->flags))
> +			return;
> +
> +		list_for_each_continue(next, &stream->subrequests) {
> +			subreq = list_entry(next, struct netfs_io_subrequest, rreq_link);
> +			if (subreq->start + subreq->transferred != start + len ||
> +			    test_bit(NETFS_SREQ_BOUNDARY, &subreq->flags) ||
> +			    !test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags))
> +				break;
> +			to = subreq;
> +			len += to->len;
> +		}
> +
> +		/* Work through the sublist. */
> +		subreq = from;
> +		list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
> +			if (!len)
> +				break;
> +			/* Renegotiate max_len (wsize) */
> +			trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
> +			__clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
> +			__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
> +			stream->prepare_write(subreq);
> +
> +			part = min(len, subreq->max_len);
> +			subreq->len = part;
> +			subreq->start = start;
> +			subreq->transferred = 0;
> +			len -= part;
> +			start += part;
> +			if (len && subreq == to &&
> +			    __test_and_clear_bit(NETFS_SREQ_BOUNDARY, &to->flags))
> +				boundary = true;
> +
> +			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> +			netfs_reissue_write(stream, subreq);
> +			if (subreq == to)
> +				break;
> +		}
> +
> +		/* If we managed to use fewer subreqs, we can discard the
> +		 * excess; if we used the same number, then we're done.
> +		 */
> +		if (!len) {
> +			if (subreq == to)
> +				continue;
> +			list_for_each_entry_safe_from(subreq, tmp,
> +						      &stream->subrequests, rreq_link) {
> +				trace_netfs_sreq(subreq, netfs_sreq_trace_discard);
> +				list_del(&subreq->rreq_link);
> +				netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done);
> +				if (subreq == to)
> +					break;
> +			}
> +			continue;
> +		}
> +
> +		/* We ran out of subrequests, so we need to allocate some more
> +		 * and insert them after.
> +		 */
> +		do {
> +			subreq = netfs_alloc_subrequest(wreq);
> +			subreq->source		= to->source;
> +			subreq->start		= start;
> +			subreq->max_len		= len;
> +			subreq->max_nr_segs	= INT_MAX;
> +			subreq->debug_index	= atomic_inc_return(&wreq->subreq_counter);
> +			subreq->stream_nr	= to->stream_nr;
> +			__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
> +
> +			trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
> +					     refcount_read(&subreq->ref),
> +					     netfs_sreq_trace_new);
> +			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> +
> +			list_add(&subreq->rreq_link, &to->rreq_link);
> +			to = list_next_entry(to, rreq_link);
> +			trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
> +
> +			switch (stream->source) {
> +			case NETFS_UPLOAD_TO_SERVER:
> +				netfs_stat(&netfs_n_wh_upload);
> +				subreq->max_len = min(len, wreq->wsize);
> +				break;
> +			case NETFS_WRITE_TO_CACHE:
> +				netfs_stat(&netfs_n_wh_write);
> +				break;
> +			default:
> +				WARN_ON_ONCE(1);
> +			}
> +
> +			stream->prepare_write(subreq);
> +
> +			part = min(len, subreq->max_len);
> +			subreq->len = subreq->transferred + part;
> +			len -= part;
> +			start += part;
> +			if (!len && boundary) {
> +				__set_bit(NETFS_SREQ_BOUNDARY, &to->flags);
> +				boundary = false;
> +			}
> +
> +			netfs_reissue_write(stream, subreq);
> +			if (!len)
> +				break;
> +
> +		} while (len);
> +
> +	} while (!list_is_head(next, &stream->subrequests));
> +}
> +
> +/*
> + * Perform retries on the streams that need it.  If we're doing content
> + * encryption and the server copy changed due to a third-party write, we may
> + * need to do an RMW cycle and also rewrite the data to the cache.
> + */
> +static void netfs_retry_writes(struct netfs_io_request *wreq)
> +{
> +	struct netfs_io_subrequest *subreq;
> +	struct netfs_io_stream *stream;
> +	int s;
> +
> +	/* Wait for all outstanding I/O to quiesce before performing retries as
> +	 * we may need to renegotiate the I/O sizes.
> +	 */
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		stream = &wreq->io_streams[s];
> +		if (!stream->active)
> +			continue;
> +
> +		list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
> +			wait_on_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS,
> +				    TASK_UNINTERRUPTIBLE);
> +		}
> +	}
> +
> +	// TODO: Enc: Fetch changed partial pages
> +	// TODO: Enc: Reencrypt content if needed.
> +	// TODO: Enc: Wind back transferred point.
> +	// TODO: Enc: Mark cache pages for retry.
> +
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		stream = &wreq->io_streams[s];
> +		if (stream->need_retry) {
> +			stream->need_retry = false;
> +			netfs_retry_write_stream(wreq, stream);
> +		}
> +	}
> +}
> +
> +/*
> + * Collect and assess the results of various write subrequests.  We may need to
> + * retry some of the results - or even do an RMW cycle for content crypto.
> + *
> + * Note that we have a number of parallel, overlapping lists of subrequests,
> + * one to the server and one to the local cache for example, which may not be
> + * the same size or starting position and may not even correspond in boundary
> + * alignment.
> + */
> +static void netfs_collect_write_results(struct netfs_io_request *wreq)
> +{
> +	struct netfs_io_subrequest *front, *remove;
> +	struct netfs_io_stream *stream;
> +	unsigned long long collected_to;
> +	unsigned int notes;
> +	int s;
> +
> +	_enter("%llx-%llx", wreq->start, wreq->start + wreq->len);
> +	trace_netfs_collect(wreq);
> +	trace_netfs_rreq(wreq, netfs_rreq_trace_collect);
> +
> +reassess_streams:
> +	smp_rmb();
> +	collected_to = ULLONG_MAX;
> +	if (wreq->origin == NETFS_WRITEBACK)
> +		notes = ALL_EMPTY | BUFFERED | MAYBE_DISCONTIG;
> +	else if (wreq->origin == NETFS_WRITETHROUGH)
> +		notes = ALL_EMPTY | BUFFERED;
> +	else
> +		notes = ALL_EMPTY;
> +
> +	/* Remove completed subrequests from the front of the streams and
> +	 * advance the completion point on each stream.  We stop when we hit
> +	 * something that's in progress.  The issuer thread may be adding stuff
> +	 * to the tail whilst we're doing this.
> +	 *
> +	 * We must not, however, merge in discontiguities that span whole
> +	 * folios that aren't under writeback.  This is made more complicated
> +	 * by the folios in the gap being of unpredictable sizes - if they even
> +	 * exist - but we don't want to look them up.
> +	 */
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		loff_t rstart, rend;
> +
> +		stream = &wreq->io_streams[s];
> +		/* Read active flag before list pointers */
> +		if (!smp_load_acquire(&stream->active))
> +			continue;
> +
> +		front = stream->front;
> +		while (front) {
> +			trace_netfs_collect_sreq(wreq, front);
> +			//_debug("sreq [%x] %llx %zx/%zx",
> +			//       front->debug_index, front->start, front->transferred, front->len);
> +
> +			/* Stall if there may be a discontinuity. */
> +			rstart = round_down(front->start, PAGE_SIZE);
> +			if (rstart > wreq->contiguity) {
> +				if (wreq->contiguity > stream->collected_to) {
> +					trace_netfs_collect_gap(wreq, stream,
> +								wreq->contiguity, 'D');
> +					stream->collected_to = wreq->contiguity;
> +				}
> +				notes |= REASSESS_DISCONTIG;
> +				break;
> +			}
> +			rend = round_up(front->start + front->len, PAGE_SIZE);
> +			if (rend > wreq->contiguity) {
> +				trace_netfs_collect_contig(wreq, rend,
> +						   netfs_contig_trace_collect);
> +				wreq->contiguity = rend;
> +				if (notes & REASSESS_DISCONTIG)
> +					notes |= NEED_REASSESS;
> +			}
> +			notes &= ~MAYBE_DISCONTIG;
> +
> +			/* Stall if the front is still undergoing I/O. */
> +			if (test_bit(NETFS_SREQ_IN_PROGRESS, &front->flags)) {
> +				notes |= HIT_PENDING;
> +				break;
> +			}
> +			smp_rmb(); /* Read counters after I-P flag. */
> +
> +			if (stream->failed) {
> +				stream->collected_to = front->start + front->len;
> +				notes |= MADE_PROGRESS | SAW_FAILURE;
> +				goto cancel;
> +			}
> +			if (front->start + front->transferred > stream->collected_to) {
> +				stream->collected_to = front->start + front->transferred;
> +				stream->transferred = stream->collected_to - wreq->start;
> +				notes |= MADE_PROGRESS;
> +			}
> +			if (test_bit(NETFS_SREQ_FAILED, &front->flags)) {
> +				stream->failed = true;
> +				stream->error = front->error;
> +				if (stream->source == NETFS_UPLOAD_TO_SERVER)
> +					mapping_set_error(wreq->mapping, front->error);
> +				notes |= NEED_REASSESS | SAW_FAILURE;
> +				break;
> +			}
> +			if (front->transferred < front->len) {
> +				stream->need_retry = true;
> +				notes |= NEED_RETRY | MADE_PROGRESS;
> +				break;
> +			}
> +
> +		cancel:
> +			/* Remove if completely consumed. */
> +			spin_lock(&wreq->lock);
> +
> +			remove = front;
> +			list_del_init(&front->rreq_link);
> +			front = list_first_entry_or_null(&stream->subrequests,
> +							 struct netfs_io_subrequest, rreq_link);
> +			stream->front = front;
> +			if (!front) {
> +				unsigned long long jump_to = atomic64_read(&wreq->issued_to);
> +
> +				if (stream->collected_to < jump_to) {
> +					trace_netfs_collect_gap(wreq, stream, jump_to, 'A');
> +					stream->collected_to = jump_to;
> +				}
> +			}
> +
> +			spin_unlock(&wreq->lock);
> +			netfs_put_subrequest(remove, false,
> +					     notes & SAW_FAILURE ?
> +					     netfs_sreq_trace_put_cancel :
> +					     netfs_sreq_trace_put_done);
> +		}
> +
> +		if (front)
> +			notes &= ~ALL_EMPTY;
> +		else
> +			notes |= SOME_EMPTY;
> +
> +		if (stream->collected_to < collected_to)
> +			collected_to = stream->collected_to;
> +	}
> +
> +	if (collected_to != ULLONG_MAX && collected_to > wreq->collected_to)
> +		wreq->collected_to = collected_to;
> +
> +	/* If we have an empty stream, we need to jump it forward over any gap
> +	 * otherwise the collection point will never advance.
> +	 *
> +	 * Note that the issuer always adds to the stream with the lowest
> +	 * so-far submitted start, so if we see two consecutive subreqs in one
> +	 * stream with nothing between then in another stream, then the second
> +	 * stream has a gap that can be jumped.
> +	 */
> +	if (notes & SOME_EMPTY) {
> +		unsigned long long jump_to = wreq->start + wreq->len;
> +
> +		for (s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->active &&
> +			    stream->front &&
> +			    stream->front->start < jump_to)
> +				jump_to = stream->front->start;
> +		}
> +
> +		for (s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->active &&
> +			    !stream->front &&
> +			    stream->collected_to < jump_to) {
> +				trace_netfs_collect_gap(wreq, stream, jump_to, 'B');
> +				stream->collected_to = jump_to;
> +			}
> +		}
> +	}
> +
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		stream = &wreq->io_streams[s];
> +		if (stream->active)
> +			trace_netfs_collect_stream(wreq, stream);
> +	}
> +
> +	trace_netfs_collect_state(wreq, wreq->collected_to, notes);
> +
> +	/* Unlock any folios that we have now finished with. */
> +	if (notes & BUFFERED) {
> +		unsigned long long clean_to = min(wreq->collected_to, wreq->contiguity);
> +
> +		if (wreq->cleaned_to < clean_to)
> +			netfs_writeback_unlock_folios(wreq, clean_to, &notes);
> +	} else {
> +		wreq->cleaned_to = wreq->collected_to;
> +	}
> +
> +	// TODO: Discard encryption buffers
> +
> +	/* If all streams are discontiguous with the last folio we cleared, we
> +	 * may need to skip a set of folios.
> +	 */
> +	if ((notes & (MAYBE_DISCONTIG | ALL_EMPTY)) == MAYBE_DISCONTIG) {
> +		unsigned long long jump_to = ULLONG_MAX;
> +
> +		for (s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->active && stream->front &&
> +			    stream->front->start < jump_to)
> +				jump_to = stream->front->start;
> +		}
> +
> +		trace_netfs_collect_contig(wreq, jump_to, netfs_contig_trace_jump);
> +		wreq->contiguity = jump_to;
> +		wreq->cleaned_to = jump_to;
> +		wreq->collected_to = jump_to;
> +		for (s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->collected_to < jump_to)
> +				stream->collected_to = jump_to;
> +		}
> +		//cond_resched();
> +		notes |= MADE_PROGRESS;
> +		goto reassess_streams;
> +	}
> +
> +	if (notes & NEED_RETRY)
> +		goto need_retry;
> +	if ((notes & MADE_PROGRESS) && test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
> +		trace_netfs_rreq(wreq, netfs_rreq_trace_unpause);
> +		clear_bit_unlock(NETFS_RREQ_PAUSE, &wreq->flags);
> +		wake_up_bit(&wreq->flags, NETFS_RREQ_PAUSE);
> +	}
> +
> +	if (notes & NEED_REASSESS) {
> +		//cond_resched();
> +		goto reassess_streams;
> +	}
> +	if (notes & MADE_PROGRESS) {
> +		//cond_resched();
> +		goto reassess_streams;
> +	}
> +
> +out:
> +	netfs_put_group_many(wreq->group, wreq->nr_group_rel);
> +	wreq->nr_group_rel = 0;
> +	_leave(" = %x", notes);
> +	return;
> +
> +need_retry:
> +	/* Okay...  We're going to have to retry one or both streams.  Note
> +	 * that any partially completed op will have had any wholly transferred
> +	 * folios removed from it.
> +	 */
> +	_debug("retry");
> +	netfs_retry_writes(wreq);
> +	goto out;
> +}
> +
> +/*
> + * Perform the collection of subrequests, folios and encryption buffers.
> + */
> +void netfs_write_collection_worker(struct work_struct *work)
> +{
> +	struct netfs_io_request *wreq = container_of(work, struct netfs_io_request, work);
> +	struct netfs_inode *ictx = netfs_inode(wreq->inode);
> +	size_t transferred;
> +	int s;
> +
> +	_enter("R=%x", wreq->debug_id);
> +
> +	netfs_see_request(wreq, netfs_rreq_trace_see_work);
> +	if (!test_bit(NETFS_RREQ_IN_PROGRESS, &wreq->flags)) {
> +		netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
> +		return;
> +	}
> +
> +	netfs_collect_write_results(wreq);
> +
> +	/* We're done when the app thread has finished posting subreqs and all
> +	 * the queues in all the streams are empty.
> +	 */
> +	if (!test_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags)) {
> +		netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
> +		return;
> +	}
> +	smp_rmb(); /* Read ALL_QUEUED before lists. */
> +
> +	transferred = LONG_MAX;
> +	for (s = 0; s < NR_IO_STREAMS; s++) {
> +		struct netfs_io_stream *stream = &wreq->io_streams[s];
> +		if (!stream->active)
> +			continue;
> +		if (!list_empty(&stream->subrequests)) {
> +			netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
> +			return;
> +		}
> +		if (stream->transferred < transferred)
> +			transferred = stream->transferred;
> +	}
> +
> +	/* Okay, declare that all I/O is complete. */
> +	wreq->transferred = transferred;
> +	trace_netfs_rreq(wreq, netfs_rreq_trace_write_done);
> +
> +	if (wreq->io_streams[1].active &&
> +	    wreq->io_streams[1].failed) {
> +		/* Cache write failure doesn't prevent writeback completion
> +		 * unless we're in disconnected mode.
> +		 */
> +		ictx->ops->invalidate_cache(wreq);
> +	}
> +
> +	if (wreq->cleanup)
> +		wreq->cleanup(wreq);
> +
> +	if (wreq->origin == NETFS_DIO_WRITE &&
> +	    wreq->mapping->nrpages) {
> +		/* mmap may have got underfoot and we may now have folios
> +		 * locally covering the region we just wrote.  Attempt to
> +		 * discard the folios, but leave in place any modified locally.
> +		 * ->write_iter() is prevented from interfering by the DIO
> +		 * counter.
> +		 */
> +		pgoff_t first = wreq->start >> PAGE_SHIFT;
> +		pgoff_t last = (wreq->start + wreq->transferred - 1) >> PAGE_SHIFT;
> +		invalidate_inode_pages2_range(wreq->mapping, first, last);
> +	}
> +
> +	if (wreq->origin == NETFS_DIO_WRITE)
> +		inode_dio_end(wreq->inode);
> +
> +	_debug("finished");
> +	trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip);
> +	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &wreq->flags);
> +	wake_up_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS);
> +
> +	if (wreq->iocb) {
> +		wreq->iocb->ki_pos += wreq->transferred;
> +		if (wreq->iocb->ki_complete)
> +			wreq->iocb->ki_complete(
> +				wreq->iocb, wreq->error ? wreq->error : wreq->transferred);
> +		wreq->iocb = VFS_PTR_POISON;
> +	}
> +
> +	netfs_clear_subrequests(wreq, false);
> +	netfs_put_request(wreq, false, netfs_rreq_trace_put_work_complete);
> +}
> +
> +/*
> + * Wake the collection work item.
> + */
> +void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async)
> +{
> +	if (!work_pending(&wreq->work)) {
> +		netfs_get_request(wreq, netfs_rreq_trace_get_work);
> +		if (!queue_work(system_unbound_wq, &wreq->work))
> +			netfs_put_request(wreq, was_async, netfs_rreq_trace_put_work_nq);
> +	}
> +}
> +
> +/**
> + * new_netfs_write_subrequest_terminated - Note the termination of a write operation.
> + * @_op: The I/O request that has terminated.
> + * @transferred_or_error: The amount of data transferred or an error code.
> + * @was_async: The termination was asynchronous
> + *
> + * This tells the library that a contributory write I/O operation has
> + * terminated, one way or another, and that it should collect the results.
> + *
> + * The caller indicates in @transferred_or_error the outcome of the operation,
> + * supplying a positive value to indicate the number of bytes transferred or a
> + * negative error code.  The library will look after reissuing I/O operations
> + * as appropriate and writing downloaded data to the cache.
> + *
> + * If @was_async is true, the caller might be running in softirq or interrupt
> + * context and we can't sleep.
> + *
> + * When this is called, ownership of the subrequest is transferred back to the
> + * library, along with a ref.
> + *
> + * Note that %_op is a void* so that the function can be passed to
> + * kiocb::term_func without the need for a casting wrapper.
> + */
> +void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
> +					   bool was_async)
> +{
> +	struct netfs_io_subrequest *subreq = _op;
> +	struct netfs_io_request *wreq = subreq->rreq;
> +	struct netfs_io_stream *stream = &wreq->io_streams[subreq->stream_nr];
> +
> +	_enter("%x[%x] %zd", wreq->debug_id, subreq->debug_index, transferred_or_error);
> +
> +	switch (subreq->source) {
> +	case NETFS_UPLOAD_TO_SERVER:
> +		netfs_stat(&netfs_n_wh_upload_done);
> +		break;
> +	case NETFS_WRITE_TO_CACHE:
> +		netfs_stat(&netfs_n_wh_write_done);
> +		break;
> +	case NETFS_INVALID_WRITE:
> +		break;
> +	default:
> +		BUG();
> +	}
> +
> +	if (IS_ERR_VALUE(transferred_or_error)) {
> +		subreq->error = transferred_or_error;
> +		if (subreq->error == -EAGAIN)
> +			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
> +		else
> +			set_bit(NETFS_SREQ_FAILED, &subreq->flags);
> +		trace_netfs_failure(wreq, subreq, transferred_or_error, netfs_fail_write);
> +
> +		switch (subreq->source) {
> +		case NETFS_WRITE_TO_CACHE:
> +			netfs_stat(&netfs_n_wh_write_failed);
> +			break;
> +		case NETFS_UPLOAD_TO_SERVER:
> +			netfs_stat(&netfs_n_wh_upload_failed);
> +			break;
> +		default:
> +			break;
> +		}
> +		trace_netfs_rreq(wreq, netfs_rreq_trace_set_pause);
> +		set_bit(NETFS_RREQ_PAUSE, &wreq->flags);
> +	} else {
> +		if (WARN(transferred_or_error > subreq->len - subreq->transferred,
> +			 "Subreq excess write: R=%x[%x] %zd > %zu - %zu",
> +			 wreq->debug_id, subreq->debug_index,
> +			 transferred_or_error, subreq->len, subreq->transferred))
> +			transferred_or_error = subreq->len - subreq->transferred;
> +
> +		subreq->error = 0;
> +		subreq->transferred += transferred_or_error;
> +
> +		if (subreq->transferred < subreq->len)
> +			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
> +	}
> +
> +	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
> +
> +	clear_bit_unlock(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
> +	wake_up_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS);
> +
> +	/* If we are at the head of the queue, wake up the collector,
> +	 * transferring a ref to it if we were the ones to do so.
> +	 */
> +	if (list_is_first(&subreq->rreq_link, &stream->subrequests))
> +		netfs_wake_write_collector(wreq, was_async);
> +
> +	netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
> +}
> +EXPORT_SYMBOL(new_netfs_write_subrequest_terminated);
> diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
> new file mode 100644
> index 000000000000..e0fb472898f5
> --- /dev/null
> +++ b/fs/netfs/write_issue.c
> @@ -0,0 +1,673 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Network filesystem high-level (buffered) writeback.
> + *
> + * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + *
> + *
> + * To support network filesystems with local caching, we manage a situation
> + * that can be envisioned like the following:
> + *
> + *               +---+---+-----+-----+---+----------+
> + *    Folios:    |   |   |     |     |   |          |
> + *               +---+---+-----+-----+---+----------+
> + *
> + *                 +------+------+     +----+----+
> + *    Upload:      |      |      |.....|    |    |
> + *  (Stream 0)     +------+------+     +----+----+
> + *
> + *               +------+------+------+------+------+
> + *    Cache:     |      |      |      |      |      |
> + *  (Stream 1)   +------+------+------+------+------+
> + *
> + * Where we have a sequence of folios of varying sizes that we need to overlay
> + * with multiple parallel streams of I/O requests, where the I/O requests in a
> + * stream may also be of various sizes (in cifs, for example, the sizes are
> + * negotiated with the server; in something like ceph, they may represent the
> + * sizes of storage objects).
> + *
> + * The sequence in each stream may contain gaps and noncontiguous subrequests
> + * may be glued together into single vectored write RPCs.
> + */
> +
> +#include <linux/export.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/pagemap.h>
> +#include "internal.h"
> +
> +/*
> + * Kill all dirty folios in the event of an unrecoverable error, starting with
> + * a locked folio we've already obtained from writeback_iter().
> + */
> +static void netfs_kill_dirty_pages(struct address_space *mapping,
> +				   struct writeback_control *wbc,
> +				   struct folio *folio)
> +{
> +	int error = 0;
> +
> +	do {
> +		enum netfs_folio_trace why = netfs_folio_trace_kill;
> +		struct netfs_group *group = NULL;
> +		struct netfs_folio *finfo = NULL;
> +		void *priv;
> +
> +		priv = folio_detach_private(folio);
> +		if (priv) {
> +			finfo = __netfs_folio_info(priv);
> +			if (finfo) {
> +				/* Kill folio from streaming write. */
> +				group = finfo->netfs_group;
> +				why = netfs_folio_trace_kill_s;
> +			} else {
> +				group = priv;
> +				if (group == NETFS_FOLIO_COPY_TO_CACHE) {
> +					/* Kill copy-to-cache folio */
> +					why = netfs_folio_trace_kill_cc;
> +					group = NULL;
> +				} else {
> +					/* Kill folio with group */
> +					why = netfs_folio_trace_kill_g;
> +				}
> +			}
> +		}
> +
> +		trace_netfs_folio(folio, why);
> +
> +		folio_start_writeback(folio);
> +		folio_unlock(folio);
> +		folio_end_writeback(folio);
> +
> +		netfs_put_group(group);
> +		kfree(finfo);
> +
> +	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
> +}
> +
> +/*
> + * Create a write request and set it up appropriately for the origin type.
> + */
> +struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
> +						struct file *file,
> +						loff_t start,
> +						enum netfs_io_origin origin)
> +{
> +	struct netfs_io_request *wreq;
> +	struct netfs_inode *ictx;
> +
> +	wreq = netfs_alloc_request(mapping, file, start, 0, origin);
> +	if (IS_ERR(wreq))
> +		return wreq;
> +
> +	_enter("R=%x", wreq->debug_id);
> +
> +	ictx = netfs_inode(wreq->inode);
> +	if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
> +		fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
> +
> +	wreq->contiguity = wreq->start;
> +	wreq->cleaned_to = wreq->start;
> +	INIT_WORK(&wreq->work, netfs_write_collection_worker);
> +
> +	wreq->io_streams[0].stream_nr		= 0;
> +	wreq->io_streams[0].source		= NETFS_UPLOAD_TO_SERVER;
> +	wreq->io_streams[0].prepare_write	= ictx->ops->prepare_write;
> +	wreq->io_streams[0].issue_write		= ictx->ops->issue_write;
> +	wreq->io_streams[0].collected_to	= start;
> +	wreq->io_streams[0].transferred		= LONG_MAX;
> +
> +	wreq->io_streams[1].stream_nr		= 1;
> +	wreq->io_streams[1].source		= NETFS_WRITE_TO_CACHE;
> +	wreq->io_streams[1].collected_to	= start;
> +	wreq->io_streams[1].transferred		= LONG_MAX;
> +	if (fscache_resources_valid(&wreq->cache_resources)) {
> +		wreq->io_streams[1].avail	= true;
> +		wreq->io_streams[1].prepare_write = wreq->cache_resources.ops->prepare_write_subreq;
> +		wreq->io_streams[1].issue_write = wreq->cache_resources.ops->issue_write;
> +	}
> +
> +	return wreq;
> +}
> +
> +/**
> + * netfs_prepare_write_failed - Note write preparation failed
> + * @subreq: The subrequest to mark
> + *
> + * Mark a subrequest to note that preparation for write failed.
> + */
> +void netfs_prepare_write_failed(struct netfs_io_subrequest *subreq)
> +{
> +	__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
> +	trace_netfs_sreq(subreq, netfs_sreq_trace_prep_failed);
> +}
> +EXPORT_SYMBOL(netfs_prepare_write_failed);
> +
> +/*
> + * Prepare a write subrequest.  We need to allocate a new subrequest
> + * if we don't have one.
> + */
> +static void netfs_prepare_write(struct netfs_io_request *wreq,
> +				struct netfs_io_stream *stream,
> +				loff_t start)
> +{
> +	struct netfs_io_subrequest *subreq;
> +
> +	subreq = netfs_alloc_subrequest(wreq);
> +	subreq->source		= stream->source;
> +	subreq->start		= start;
> +	subreq->max_len		= ULONG_MAX;
> +	subreq->max_nr_segs	= INT_MAX;
> +	subreq->stream_nr	= stream->stream_nr;
> +
> +	_enter("R=%x[%x]", wreq->debug_id, subreq->debug_index);
> +
> +	trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
> +			     refcount_read(&subreq->ref),
> +			     netfs_sreq_trace_new);
> +
> +	trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
> +
> +	switch (stream->source) {
> +	case NETFS_UPLOAD_TO_SERVER:
> +		netfs_stat(&netfs_n_wh_upload);
> +		subreq->max_len = wreq->wsize;
> +		break;
> +	case NETFS_WRITE_TO_CACHE:
> +		netfs_stat(&netfs_n_wh_write);
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		break;
> +	}
> +
> +	if (stream->prepare_write)
> +		stream->prepare_write(subreq);
> +
> +	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
> +
> +	/* We add to the end of the list whilst the collector may be walking
> +	 * the list.  The collector only goes nextwards and uses the lock to
> +	 * remove entries off of the front.
> +	 */
> +	spin_lock(&wreq->lock);
> +	list_add_tail(&subreq->rreq_link, &stream->subrequests);
> +	if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
> +		stream->front = subreq;
> +		if (!stream->active) {
> +			stream->collected_to = stream->front->start;
> +			/* Write list pointers before active flag */
> +			smp_store_release(&stream->active, true);
> +		}
> +	}
> +
> +	spin_unlock(&wreq->lock);
> +
> +	stream->construct = subreq;
> +}
> +
> +/*
> + * Set the I/O iterator for the filesystem/cache to use and dispatch the I/O
> + * operation.  The operation may be asynchronous and should call
> + * netfs_write_subrequest_terminated() when complete.
> + */
> +static void netfs_do_issue_write(struct netfs_io_stream *stream,
> +				 struct netfs_io_subrequest *subreq)
> +{
> +	struct netfs_io_request *wreq = subreq->rreq;
> +
> +	_enter("R=%x[%x],%zx", wreq->debug_id, subreq->debug_index, subreq->len);
> +
> +	if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
> +		return netfs_write_subrequest_terminated(subreq, subreq->error, false);
> +
> +	// TODO: Use encrypted buffer
> +	if (test_bit(NETFS_RREQ_USE_IO_ITER, &wreq->flags)) {
> +		subreq->io_iter = wreq->io_iter;
> +		iov_iter_advance(&subreq->io_iter,
> +				 subreq->start + subreq->transferred - wreq->start);
> +		iov_iter_truncate(&subreq->io_iter,
> +				 subreq->len - subreq->transferred);
> +	} else {
> +		iov_iter_xarray(&subreq->io_iter, ITER_SOURCE, &wreq->mapping->i_pages,
> +				subreq->start + subreq->transferred,
> +				subreq->len   - subreq->transferred);
> +	}
> +
> +	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
> +	stream->issue_write(subreq);
> +}
> +
> +void netfs_reissue_write(struct netfs_io_stream *stream,
> +			 struct netfs_io_subrequest *subreq)
> +{
> +	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
> +	netfs_do_issue_write(stream, subreq);
> +}
> +
> +static void netfs_issue_write(struct netfs_io_request *wreq,
> +			      struct netfs_io_stream *stream)
> +{
> +	struct netfs_io_subrequest *subreq = stream->construct;
> +
> +	if (!subreq)
> +		return;
> +	stream->construct = NULL;
> +
> +	if (subreq->start + subreq->len > wreq->start + wreq->submitted)
> +		wreq->len = wreq->submitted = subreq->start + subreq->len - wreq->start;
> +	netfs_do_issue_write(stream, subreq);
> +}
> +
> +/*
> + * Add data to the write subrequest, dispatching each as we fill it up or if it
> + * is discontiguous with the previous.  We only fill one part at a time so that
> + * we can avoid overrunning the credits obtained (cifs) and try to parallelise
> + * content-crypto preparation with network writes.
> + */
> +int netfs_advance_write(struct netfs_io_request *wreq,
> +			struct netfs_io_stream *stream,
> +			loff_t start, size_t len, bool to_eof)
> +{
> +	struct netfs_io_subrequest *subreq = stream->construct;
> +	size_t part;
> +
> +	if (!stream->avail) {
> +		_leave("no write");
> +		return len;
> +	}
> +
> +	_enter("R=%x[%x]", wreq->debug_id, subreq ? subreq->debug_index : 0);
> +
> +	if (subreq && start != subreq->start + subreq->len) {
> +		netfs_issue_write(wreq, stream);
> +		subreq = NULL;
> +	}
> +
> +	if (!stream->construct)
> +		netfs_prepare_write(wreq, stream, start);
> +	subreq = stream->construct;
> +
> +	part = min(subreq->max_len - subreq->len, len);
> +	_debug("part %zx/%zx %zx/%zx", subreq->len, subreq->max_len, part, len);
> +	subreq->len += part;
> +	subreq->nr_segs++;
> +
> +	if (subreq->len >= subreq->max_len ||
> +	    subreq->nr_segs >= subreq->max_nr_segs ||
> +	    to_eof) {
> +		netfs_issue_write(wreq, stream);
> +		subreq = NULL;
> +	}
> +
> +	return part;
> +}
> +
> +/*
> + * Write some of a pending folio data back to the server.
> + */
> +static int netfs_write_folio(struct netfs_io_request *wreq,
> +			     struct writeback_control *wbc,
> +			     struct folio *folio)
> +{
> +	struct netfs_io_stream *upload = &wreq->io_streams[0];
> +	struct netfs_io_stream *cache  = &wreq->io_streams[1];
> +	struct netfs_io_stream *stream;
> +	struct netfs_group *fgroup; /* TODO: Use this with ceph */
> +	struct netfs_folio *finfo;
> +	size_t fsize = folio_size(folio), flen = fsize, foff = 0;
> +	loff_t fpos = folio_pos(folio);
> +	bool to_eof = false, streamw = false;
> +	bool debug = false;
> +
> +	_enter("");
> +
> +	if (fpos >= wreq->i_size) {
> +		/* mmap beyond eof. */
> +		_debug("beyond eof");
> +		folio_start_writeback(folio);
> +		folio_unlock(folio);
> +		wreq->nr_group_rel += netfs_folio_written_back(folio);
> +		netfs_put_group_many(wreq->group, wreq->nr_group_rel);
> +		wreq->nr_group_rel = 0;
> +		return 0;
> +	}
> +
> +	fgroup = netfs_folio_group(folio);
> +	finfo = netfs_folio_info(folio);
> +	if (finfo) {
> +		foff = finfo->dirty_offset;
> +		flen = foff + finfo->dirty_len;
> +		streamw = true;
> +	}
> +
> +	if (wreq->origin == NETFS_WRITETHROUGH) {
> +		to_eof = false;
> +		if (flen > wreq->i_size - fpos)
> +			flen = wreq->i_size - fpos;
> +	} else if (flen > wreq->i_size - fpos) {
> +		flen = wreq->i_size - fpos;
> +		if (!streamw)
> +			folio_zero_segment(folio, flen, fsize);
> +		to_eof = true;
> +	} else if (flen == wreq->i_size - fpos) {
> +		to_eof = true;
> +	}
> +	flen -= foff;
> +
> +	_debug("folio %zx %zx %zx", foff, flen, fsize);
> +
> +	/* Deal with discontinuities in the stream of dirty pages.  These can
> +	 * arise from a number of sources:
> +	 *
> +	 * (1) Intervening non-dirty pages from random-access writes, multiple
> +	 *     flushers writing back different parts simultaneously and manual
> +	 *     syncing.
> +	 *
> +	 * (2) Partially-written pages from write-streaming.
> +	 *
> +	 * (3) Pages that belong to a different write-back group (eg.  Ceph
> +	 *     snapshots).
> +	 *
> +	 * (4) Actually-clean pages that were marked for write to the cache
> +	 *     when they were read.  Note that these appear as a special
> +	 *     write-back group.
> +	 */
> +	if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
> +		netfs_issue_write(wreq, upload);
> +	} else if (fgroup != wreq->group) {
> +		/* We can't write this page to the server yet. */
> +		kdebug("wrong group");
> +		folio_redirty_for_writepage(wbc, folio);
> +		folio_unlock(folio);
> +		netfs_issue_write(wreq, upload);
> +		netfs_issue_write(wreq, cache);
> +		return 0;
> +	}
> +
> +	if (foff > 0)
> +		netfs_issue_write(wreq, upload);
> +	if (streamw)
> +		netfs_issue_write(wreq, cache);
> +
> +	/* Flip the page to the writeback state and unlock.  If we're called
> +	 * from write-through, then the page has already been put into the wb
> +	 * state.
> +	 */
> +	if (wreq->origin == NETFS_WRITEBACK)
> +		folio_start_writeback(folio);
> +	folio_unlock(folio);
> +
> +	if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
> +		if (!fscache_resources_valid(&wreq->cache_resources)) {
> +			trace_netfs_folio(folio, netfs_folio_trace_cancel_copy);
> +			netfs_issue_write(wreq, upload);
> +			netfs_folio_written_back(folio);
> +			return 0;
> +		}
> +		trace_netfs_folio(folio, netfs_folio_trace_store_copy);
> +	} else if (!upload->construct) {
> +		trace_netfs_folio(folio, netfs_folio_trace_store);
> +	} else {
> +		trace_netfs_folio(folio, netfs_folio_trace_store_plus);
> +	}
> +
> +	/* Move the submission point forward to allow for write-streaming data
> +	 * not starting at the front of the page.  We don't do write-streaming
> +	 * with the cache as the cache requires DIO alignment.
> +	 *
> +	 * Also skip uploading for data that's been read and just needs copying
> +	 * to the cache.
> +	 */
> +	for (int s = 0; s < NR_IO_STREAMS; s++) {
> +		stream = &wreq->io_streams[s];
> +		stream->submit_max_len = fsize;
> +		stream->submit_off = foff;
> +		stream->submit_len = flen;
> +		if ((stream->source == NETFS_WRITE_TO_CACHE && streamw) ||
> +		    (stream->source == NETFS_UPLOAD_TO_SERVER &&
> +		     fgroup == NETFS_FOLIO_COPY_TO_CACHE)) {
> +			stream->submit_off = UINT_MAX;
> +			stream->submit_len = 0;
> +			stream->submit_max_len = 0;
> +		}
> +	}
> +
> +	/* Attach the folio to one or more subrequests.  For a big folio, we
> +	 * could end up with thousands of subrequests if the wsize is small -
> +	 * but we might need to wait during the creation of subrequests for
> +	 * network resources (eg. SMB credits).
> +	 */
> +	for (;;) {
> +		ssize_t part;
> +		size_t lowest_off = ULONG_MAX;
> +		int choose_s = -1;
> +
> +		/* Always add to the lowest-submitted stream first. */
> +		for (int s = 0; s < NR_IO_STREAMS; s++) {
> +			stream = &wreq->io_streams[s];
> +			if (stream->submit_len > 0 &&
> +			    stream->submit_off < lowest_off) {
> +				lowest_off = stream->submit_off;
> +				choose_s = s;
> +			}
> +		}
> +
> +		if (choose_s < 0)
> +			break;
> +		stream = &wreq->io_streams[choose_s];
> +
> +		part = netfs_advance_write(wreq, stream, fpos + stream->submit_off,
> +					   stream->submit_len, to_eof);
> +		atomic64_set(&wreq->issued_to, fpos + stream->submit_off);
> +		stream->submit_off += part;
> +		stream->submit_max_len -= part;
> +		if (part > stream->submit_len)
> +			stream->submit_len = 0;
> +		else
> +			stream->submit_len -= part;
> +		if (part > 0)
> +			debug = true;
> +	}
> +
> +	atomic64_set(&wreq->issued_to, fpos + fsize);
> +
> +	if (!debug)
> +		kdebug("R=%x: No submit", wreq->debug_id);
> +
> +	if (flen < fsize)
> +		for (int s = 0; s < NR_IO_STREAMS; s++)
> +			netfs_issue_write(wreq, &wreq->io_streams[s]);
> +
> +	_leave(" = 0");
> +	return 0;
> +}
> +
> +/*
> + * Write some of the pending data back to the server
> + */
> +int new_netfs_writepages(struct address_space *mapping,
> +			 struct writeback_control *wbc)
> +{
> +	struct netfs_inode *ictx = netfs_inode(mapping->host);
> +	struct netfs_io_request *wreq = NULL;
> +	struct folio *folio;
> +	int error = 0;
> +
> +	if (wbc->sync_mode == WB_SYNC_ALL)
> +		mutex_lock(&ictx->wb_lock);
> +	else if (!mutex_trylock(&ictx->wb_lock))
> +		return 0;
> +
> +	/* Need the first folio to be able to set up the op. */
> +	folio = writeback_iter(mapping, wbc, NULL, &error);
> +	if (!folio)
> +		goto out;
> +
> +	wreq = netfs_create_write_req(mapping, NULL, folio_pos(folio), NETFS_WRITEBACK);
> +	if (IS_ERR(wreq)) {
> +		error = PTR_ERR(wreq);
> +		goto couldnt_start;
> +	}
> +
> +	trace_netfs_write(wreq, netfs_write_trace_writeback);
> +	netfs_stat(&netfs_n_wh_writepages);
> +
> +	do {
> +		_debug("wbiter %lx %llx", folio->index, wreq->start + wreq-
> >submitted);
> +
> +		/* It appears we don't have to handle cyclic writeback wrapping. */
> +		WARN_ON_ONCE(wreq && folio_pos(folio) < wreq->start + wreq-
> >submitted);
> +
> +		if (netfs_folio_group(folio) != NETFS_FOLIO_COPY_TO_CACHE
> &&
> +		    unlikely(!test_bit(NETFS_RREQ_UPLOAD_TO_SERVER,
> &wreq->flags))) {
> +			set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq-
> >flags);
> +			wreq->netfs_ops->begin_writeback(wreq);
> +		}
> +
> +		error = netfs_write_folio(wreq, wbc, folio);
> +		if (error < 0)
> +			break;
> +	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
> +
> +	for (int s = 0; s < NR_IO_STREAMS; s++)
> +		netfs_issue_write(wreq, &wreq->io_streams[s]);
> +	smp_wmb(); /* Write lists before ALL_QUEUED. */
> +	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
> +
> +	mutex_unlock(&ictx->wb_lock);
> +
> +	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
> +	_leave(" = %d", error);
> +	return error;
> +
> +couldnt_start:
> +	netfs_kill_dirty_pages(mapping, wbc, folio);
> +out:
> +	mutex_unlock(&ictx->wb_lock);
> +	_leave(" = %d", error);
> +	return error;
> +}
> +EXPORT_SYMBOL(new_netfs_writepages);
> +
> +/*
> + * Begin a write operation for writing through the pagecache.
> + */
> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
> +{
> +	struct netfs_io_request *wreq = NULL;
> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
> +
> +	mutex_lock(&ictx->wb_lock);
> +
> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
> +	if (IS_ERR(wreq))
> +		mutex_unlock(&ictx->wb_lock);
> +
> +	wreq->io_streams[0].avail = true;
> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);

Missing mutex_unlock() before return.

Thanks,
Naveen

> +	return wreq;
> +}
> +
> +/*
> + * Advance the state of the write operation used when writing through the
> + * pagecache.  Data has been copied into the pagecache that we need to append
> + * to the request.  If we've added more than wsize then we need to create a new
> + * subrequest.
> + */
> +int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
> +				   struct folio *folio, size_t copied, bool to_page_end,
> +				   struct folio **writethrough_cache)
> +{
> +	_enter("R=%x ic=%zu ws=%u cp=%zu tp=%u",
> +	       wreq->debug_id, wreq->iter.count, wreq->wsize, copied, to_page_end);
> +
> +	if (!*writethrough_cache) {
> +		if (folio_test_dirty(folio))
> +			/* Sigh.  mmap. */
> +			folio_clear_dirty_for_io(folio);
> +
> +		/* We can make multiple writes to the folio... */
> +		folio_start_writeback(folio);
> +		if (wreq->len == 0)
> +			trace_netfs_folio(folio, netfs_folio_trace_wthru);
> +		else
> +			trace_netfs_folio(folio, netfs_folio_trace_wthru_plus);
> +		*writethrough_cache = folio;
> +	}
> +
> +	wreq->len += copied;
> +	if (!to_page_end)
> +		return 0;
> +
> +	*writethrough_cache = NULL;
> +	return netfs_write_folio(wreq, wbc, folio);
> +}
> +
> +/*
> + * End a write operation used when writing through the pagecache.
> + */
> +int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
> +			       struct folio *writethrough_cache)
> +{
> +	struct netfs_inode *ictx = netfs_inode(wreq->inode);
> +	int ret;
> +
> +	_enter("R=%x", wreq->debug_id);
> +
> +	if (writethrough_cache)
> +		netfs_write_folio(wreq, wbc, writethrough_cache);
> +
> +	netfs_issue_write(wreq, &wreq->io_streams[0]);
> +	netfs_issue_write(wreq, &wreq->io_streams[1]);
> +	smp_wmb(); /* Write lists before ALL_QUEUED. */
> +	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
> +
> +	mutex_unlock(&ictx->wb_lock);
> +
> +	ret = wreq->error;
> +	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
> +	return ret;
> +}
> +
> +/*
> + * Write data to the server without going through the pagecache and without
> + * writing it to the local cache.
> + */
> +int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len)
> +{
> +	struct netfs_io_stream *upload = &wreq->io_streams[0];
> +	ssize_t part;
> +	loff_t start = wreq->start;
> +	int error = 0;
> +
> +	_enter("%zx", len);
> +
> +	if (wreq->origin == NETFS_DIO_WRITE)
> +		inode_dio_begin(wreq->inode);
> +
> +	while (len) {
> +		// TODO: Prepare content encryption
> +
> +		_debug("unbuffered %zx", len);
> +		part = netfs_advance_write(wreq, upload, start, len, false);
> +		start += part;
> +		len -= part;
> +		if (test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
> +			trace_netfs_rreq(wreq, netfs_rreq_trace_wait_pause);
> +			wait_on_bit(&wreq->flags, NETFS_RREQ_PAUSE, TASK_UNINTERRUPTIBLE);
> +		}
> +		if (test_bit(NETFS_RREQ_FAILED, &wreq->flags))
> +			break;
> +	}
> +
> +	netfs_issue_write(wreq, upload);
> +
> +	smp_wmb(); /* Write lists before ALL_QUEUED. */
> +	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
> +	if (list_empty(&upload->subrequests))
> +		netfs_wake_write_collector(wreq, false);
> +
> +	_leave(" = %d", error);
> +	return error;
> +}
> diff --git a/include/linux/netfs.h b/include/linux/netfs.h
> index 88269681d4fc..42dba05a428b 100644
> --- a/include/linux/netfs.h
> +++ b/include/linux/netfs.h
> @@ -64,6 +64,7 @@ struct netfs_inode {
>  #if IS_ENABLED(CONFIG_FSCACHE)
>  	struct fscache_cookie	*cache;
>  #endif
> +	struct mutex		wb_lock;	/* Writeback serialisation */
>  	loff_t			remote_i_size;	/* Size of the remote file */
>  	loff_t			zero_point;	/* Size after which we assume there's no data
>  						 * on the server */
> @@ -71,7 +72,6 @@ struct netfs_inode {
>  #define NETFS_ICTX_ODIRECT	0		/* The file has DIO in progress */
>  #define NETFS_ICTX_UNBUFFERED	1		/* I/O should not use the pagecache */
>  #define NETFS_ICTX_WRITETHROUGH	2		/* Write-through caching */
> -#define NETFS_ICTX_NO_WRITE_STREAMING	3	/* Don't engage in write-streaming */
>  #define NETFS_ICTX_USE_PGPRIV2	31		/* [DEPRECATED] Use PG_private_2 to mark
>  						 * write to cache on read */
>  };
> @@ -126,6 +126,33 @@ static inline struct netfs_group *netfs_folio_group(struct folio *folio)
>  	return priv;
>  }
> 
> +/*
> + * Stream of I/O subrequests going to a particular destination, such as the
> + * server or the local cache.  This is mainly intended for writing where we may
> + * have to write to multiple destinations concurrently.
> + */
> +struct netfs_io_stream {
> +	/* Submission tracking */
> +	struct netfs_io_subrequest *construct;	/* Op being constructed */
> +	unsigned int		submit_off;	/* Folio offset we're submitting from */
> +	unsigned int		submit_len;	/* Amount of data left to submit */
> +	unsigned int		submit_max_len;	/* Amount I/O can be rounded up to */
> +	void (*prepare_write)(struct netfs_io_subrequest *subreq);
> +	void (*issue_write)(struct netfs_io_subrequest *subreq);
> +	/* Collection tracking */
> +	struct list_head	subrequests;	/* Contributory I/O operations */
> +	struct netfs_io_subrequest *front;	/* Op being collected */
> +	unsigned long long	collected_to;	/* Position we've collected results to */
> +	size_t			transferred;	/* The amount transferred from this stream */
> +	enum netfs_io_source	source;		/* Where to read from/write to */
> +	unsigned short		error;		/* Aggregate error for the stream */
> +	unsigned char		stream_nr;	/* Index of stream in parent table */
> +	bool			avail;		/* T if stream is available */
> +	bool			active;		/* T if stream is active */
> +	bool			need_retry;	/* T if this stream needs retrying */
> +	bool			failed;		/* T if this stream failed */
> +};
> +
>  /*
>   * Resources required to do operations on a cache.
>   */
> @@ -150,13 +177,16 @@ struct netfs_io_subrequest {
>  	struct list_head	rreq_link;	/* Link in rreq->subrequests */
>  	struct iov_iter		io_iter;	/* Iterator for this subrequest */
>  	unsigned long long	start;		/* Where to start the I/O */
> +	size_t			max_len;	/* Maximum size of the I/O */
>  	size_t			len;		/* Size of the I/O */
>  	size_t			transferred;	/* Amount of data transferred */
>  	refcount_t		ref;
>  	short			error;		/* 0 or error that occurred */
>  	unsigned short		debug_index;	/* Index in list (for debugging output) */
> +	unsigned int		nr_segs;	/* Number of segs in io_iter */
>  	unsigned int		max_nr_segs;	/* 0 or max number of segments in an iterator */
>  	enum netfs_io_source	source;		/* Where to read from/write to */
> +	unsigned char		stream_nr;	/* I/O stream this belongs to */
>  	unsigned long		flags;
>  #define NETFS_SREQ_COPY_TO_CACHE	0	/* Set if should copy the data to the cache */
>  #define NETFS_SREQ_CLEAR_TAIL		1	/* Set if the rest of the read should be cleared */
> @@ -164,6 +194,11 @@ struct netfs_io_subrequest {
>  #define NETFS_SREQ_SEEK_DATA_READ	3	/* Set if ->read() should SEEK_DATA first */
>  #define NETFS_SREQ_NO_PROGRESS		4	/* Set if we didn't manage to read any data */
>  #define NETFS_SREQ_ONDEMAND		5	/* Set if it's from on-demand read mode */
> +#define NETFS_SREQ_BOUNDARY		6	/* Set if ends on hard boundary (eg. ceph object) */
> +#define NETFS_SREQ_IN_PROGRESS		8	/* Unlocked when the subrequest completes */
> +#define NETFS_SREQ_NEED_RETRY		9	/* Set if the filesystem requests a retry */
> +#define NETFS_SREQ_RETRYING		10	/* Set if we're retrying */
> +#define NETFS_SREQ_FAILED		11	/* Set if the subreq failed unretryably */
>  };
> 
>  enum netfs_io_origin {
> @@ -194,6 +229,9 @@ struct netfs_io_request {
>  	struct netfs_cache_resources cache_resources;
>  	struct list_head	proc_link;	/* Link in netfs_iorequests */
>  	struct list_head	subrequests;	/* Contributory I/O operations */
> +	struct netfs_io_stream	io_streams[2];	/* Streams of parallel I/O operations */
> +#define NR_IO_STREAMS 2 //wreq->nr_io_streams
> +	struct netfs_group	*group;		/* Writeback group being written back */
>  	struct iov_iter		iter;		/* Unencrypted-side iterator */
>  	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
>  	void			*netfs_priv;	/* Private data for the netfs */
> @@ -203,6 +241,8 @@ struct netfs_io_request {
>  	unsigned int		rsize;		/* Maximum read size (0 for none) */
>  	unsigned int		wsize;		/* Maximum write size (0 for none) */
>  	atomic_t		subreq_counter;	/* Next subreq->debug_index */
> +	unsigned int		nr_group_rel;	/* Number of refs to release on ->group */
> +	spinlock_t		lock;		/* Lock for queuing subreqs */
>  	atomic_t		nr_outstanding;	/* Number of ops in progress */
>  	atomic_t		nr_copy_ops;	/* Number of copy-to-cache ops in progress */
>  	size_t			upper_len;	/* Length can be extended to here */
> @@ -214,6 +254,10 @@ struct netfs_io_request {
>  	bool			direct_bv_unpin; /* T if direct_bv[] must be unpinned */
>  	unsigned long long	i_size;		/* Size of the file */
>  	unsigned long long	start;		/* Start position */
> +	atomic64_t		issued_to;	/* Write issuer folio cursor */
> +	unsigned long long	contiguity;	/* Tracking for gaps in the writeback sequence */
> +	unsigned long long	collected_to;	/* Point we've collected to */
> +	unsigned long long	cleaned_to;	/* Position we've cleaned folios to */
>  	pgoff_t			no_unlock_folio; /* Don't unlock this folio after read */
>  	refcount_t		ref;
>  	unsigned long		flags;
> @@ -227,6 +271,9 @@ struct netfs_io_request {
>  #define NETFS_RREQ_UPLOAD_TO_SERVER	8	/* Need to write to the server */
>  #define NETFS_RREQ_NONBLOCK		9	/* Don't block if possible (O_NONBLOCK) */
>  #define NETFS_RREQ_BLOCKED		10	/* We blocked */
> +#define NETFS_RREQ_PAUSE		11	/* Pause subrequest generation */
> +#define NETFS_RREQ_USE_IO_ITER		12	/* Use ->io_iter rather than ->i_pages */
> +#define NETFS_RREQ_ALL_QUEUED		13	/* All subreqs are now queued */
>  #define NETFS_RREQ_USE_PGPRIV2		31	/* [DEPRECATED] Use PG_private_2 to mark
>  						 * write to cache on read */
>  	const struct netfs_request_ops *netfs_ops;
> @@ -258,6 +305,9 @@ struct netfs_request_ops {
>  	/* Write request handling */
>  	void (*create_write_requests)(struct netfs_io_request *wreq,
>  				      loff_t start, size_t len);
> +	void (*begin_writeback)(struct netfs_io_request *wreq);
> +	void (*prepare_write)(struct netfs_io_subrequest *subreq);
> +	void (*issue_write)(struct netfs_io_subrequest *subreq);
>  	void (*invalidate_cache)(struct netfs_io_request *wreq);
>  };
> 
> @@ -292,6 +342,9 @@ struct netfs_cache_ops {
>  		     netfs_io_terminated_t term_func,
>  		     void *term_func_priv);
> 
> +	/* Write data to the cache from a netfs subrequest. */
> +	void (*issue_write)(struct netfs_io_subrequest *subreq);
> +
>  	/* Expand readahead request */
>  	void (*expand_readahead)(struct netfs_cache_resources *cres,
>  				 unsigned long long *_start,
> @@ -304,6 +357,13 @@ struct netfs_cache_ops {
>  	enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
>  					     unsigned long long i_size);
> 
> +	/* Prepare a write subrequest, working out if we're allowed to do it
> +	 * and finding out the maximum amount of data to gather before
> +	 * attempting to submit.  If we're not permitted to do it, the
> +	 * subrequest should be marked failed.
> +	 */
> +	void (*prepare_write_subreq)(struct netfs_io_subrequest *subreq);
> +
>  	/* Prepare a write operation, working out what part of the write we can
>  	 * actually do.
>  	 */
> @@ -349,6 +409,8 @@ int netfs_write_begin(struct netfs_inode *, struct file *,
>  		      struct folio **, void **fsdata);
>  int netfs_writepages(struct address_space *mapping,
>  		     struct writeback_control *wbc);
> +int new_netfs_writepages(struct address_space *mapping,
> +			struct writeback_control *wbc);
>  bool netfs_dirty_folio(struct address_space *mapping, struct folio *folio);
>  int netfs_unpin_writeback(struct inode *inode, struct writeback_control *wbc);
>  void netfs_clear_inode_writeback(struct inode *inode, const void *aux);
> @@ -372,8 +434,11 @@ size_t netfs_limit_iter(const struct iov_iter *iter, size_t start_offset,
>  struct netfs_io_subrequest *netfs_create_write_request(
>  	struct netfs_io_request *wreq, enum netfs_io_source dest,
>  	loff_t start, size_t len, work_func_t worker);
> +void netfs_prepare_write_failed(struct netfs_io_subrequest *subreq);
>  void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
>  				       bool was_async);
> +void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
> +					   bool was_async);
>  void netfs_queue_write_request(struct netfs_io_subrequest *subreq);
> 
>  int netfs_start_io_read(struct inode *inode);
> @@ -415,6 +480,7 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
>  #if IS_ENABLED(CONFIG_FSCACHE)
>  	ctx->cache = NULL;
>  #endif
> +	mutex_init(&ctx->wb_lock);
>  	/* ->releasepage() drives zero_point */
>  	if (use_zero_point) {
>  		ctx->zero_point = ctx->remote_i_size;
> diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
> index 7126d2ea459c..e7700172ae7e 100644
> --- a/include/trace/events/netfs.h
> +++ b/include/trace/events/netfs.h
> @@ -44,14 +44,18 @@
>  #define netfs_rreq_traces					\
>  	EM(netfs_rreq_trace_assess,		"ASSESS ")	\
>  	EM(netfs_rreq_trace_copy,		"COPY   ")	\
> +	EM(netfs_rreq_trace_collect,		"COLLECT")	\
>  	EM(netfs_rreq_trace_done,		"DONE   ")	\
>  	EM(netfs_rreq_trace_free,		"FREE   ")	\
>  	EM(netfs_rreq_trace_redirty,		"REDIRTY")	\
>  	EM(netfs_rreq_trace_resubmit,		"RESUBMT")	\
> +	EM(netfs_rreq_trace_set_pause,		"PAUSE  ")	\
>  	EM(netfs_rreq_trace_unlock,		"UNLOCK ")	\
>  	EM(netfs_rreq_trace_unmark,		"UNMARK ")	\
>  	EM(netfs_rreq_trace_wait_ip,		"WAIT-IP")	\
> +	EM(netfs_rreq_trace_wait_pause,		"WT-PAUS")	\
>  	EM(netfs_rreq_trace_wake_ip,		"WAKE-IP")	\
> +	EM(netfs_rreq_trace_unpause,		"UNPAUSE")	\
>  	E_(netfs_rreq_trace_write_done,		"WR-DONE")
> 
>  #define netfs_sreq_sources					\
> @@ -64,11 +68,15 @@
>  	E_(NETFS_INVALID_WRITE,			"INVL")
> 
>  #define netfs_sreq_traces					\
> +	EM(netfs_sreq_trace_discard,		"DSCRD")	\
>  	EM(netfs_sreq_trace_download_instead,	"RDOWN")	\
> +	EM(netfs_sreq_trace_fail,		"FAIL ")	\
>  	EM(netfs_sreq_trace_free,		"FREE ")	\
>  	EM(netfs_sreq_trace_limited,		"LIMIT")	\
>  	EM(netfs_sreq_trace_prepare,		"PREP ")	\
> +	EM(netfs_sreq_trace_prep_failed,	"PRPFL")	\
>  	EM(netfs_sreq_trace_resubmit_short,	"SHORT")	\
> +	EM(netfs_sreq_trace_retry,		"RETRY")	\
>  	EM(netfs_sreq_trace_submit,		"SUBMT")	\
>  	EM(netfs_sreq_trace_terminated,		"TERM ")	\
>  	EM(netfs_sreq_trace_write,		"WRITE")	\
> @@ -88,6 +96,7 @@
>  #define netfs_rreq_ref_traces					\
>  	EM(netfs_rreq_trace_get_for_outstanding,"GET OUTSTND")	\
>  	EM(netfs_rreq_trace_get_subreq,		"GET SUBREQ ")	\
> +	EM(netfs_rreq_trace_get_work,		"GET WORK   ")	\
>  	EM(netfs_rreq_trace_put_complete,	"PUT COMPLT ")	\
>  	EM(netfs_rreq_trace_put_discard,	"PUT DISCARD")	\
>  	EM(netfs_rreq_trace_put_failed,		"PUT FAILED ")	\
> @@ -95,6 +104,8 @@
>  	EM(netfs_rreq_trace_put_return,		"PUT RETURN ")	\
>  	EM(netfs_rreq_trace_put_subreq,		"PUT SUBREQ ")	\
>  	EM(netfs_rreq_trace_put_work,		"PUT WORK   ")	\
> +	EM(netfs_rreq_trace_put_work_complete,	"PUT WORK CP")	\
> +	EM(netfs_rreq_trace_put_work_nq,	"PUT WORK NQ")	\
>  	EM(netfs_rreq_trace_see_work,		"SEE WORK   ")	\
>  	E_(netfs_rreq_trace_new,		"NEW        ")
> 
> @@ -103,11 +114,14 @@
>  	EM(netfs_sreq_trace_get_resubmit,	"GET RESUBMIT")	\
>  	EM(netfs_sreq_trace_get_short_read,	"GET SHORTRD")	\
>  	EM(netfs_sreq_trace_new,		"NEW        ")	\
> +	EM(netfs_sreq_trace_put_cancel,		"PUT CANCEL ")	\
>  	EM(netfs_sreq_trace_put_clear,		"PUT CLEAR  ")	\
>  	EM(netfs_sreq_trace_put_discard,	"PUT DISCARD")	\
> +	EM(netfs_sreq_trace_put_done,		"PUT DONE   ")	\
>  	EM(netfs_sreq_trace_put_failed,		"PUT FAILED ")	\
>  	EM(netfs_sreq_trace_put_merged,		"PUT MERGED ")	\
>  	EM(netfs_sreq_trace_put_no_copy,	"PUT NO COPY")	\
> +	EM(netfs_sreq_trace_put_oom,		"PUT OOM    ")	\
>  	EM(netfs_sreq_trace_put_wip,		"PUT WIP    ")	\
>  	EM(netfs_sreq_trace_put_work,		"PUT WORK   ")	\
>  	E_(netfs_sreq_trace_put_terminated,	"PUT TERM   ")
> @@ -124,7 +138,9 @@
>  	EM(netfs_streaming_filled_page,		"mod-streamw-f") \
>  	EM(netfs_streaming_cont_filled_page,	"mod-streamw-f+") \
>  	/* The rest are for writeback */			\
> +	EM(netfs_folio_trace_cancel_copy,	"cancel-copy")	\
>  	EM(netfs_folio_trace_clear,		"clear")	\
> +	EM(netfs_folio_trace_clear_cc,		"clear-cc")	\
>  	EM(netfs_folio_trace_clear_s,		"clear-s")	\
>  	EM(netfs_folio_trace_clear_g,		"clear-g")	\
>  	EM(netfs_folio_trace_copy,		"copy")		\
> @@ -133,16 +149,26 @@
>  	EM(netfs_folio_trace_end_copy,		"end-copy")	\
>  	EM(netfs_folio_trace_filled_gaps,	"filled-gaps")	\
>  	EM(netfs_folio_trace_kill,		"kill")		\
> +	EM(netfs_folio_trace_kill_cc,		"kill-cc")	\
> +	EM(netfs_folio_trace_kill_g,		"kill-g")	\
> +	EM(netfs_folio_trace_kill_s,		"kill-s")	\
>  	EM(netfs_folio_trace_mkwrite,		"mkwrite")	\
>  	EM(netfs_folio_trace_mkwrite_plus,	"mkwrite+")	\
> +	EM(netfs_folio_trace_not_under_wback,	"!wback")	\
>  	EM(netfs_folio_trace_read_gaps,		"read-gaps")	\
>  	EM(netfs_folio_trace_redirty,		"redirty")	\
>  	EM(netfs_folio_trace_redirtied,		"redirtied")	\
>  	EM(netfs_folio_trace_store,		"store")	\
> +	EM(netfs_folio_trace_store_copy,	"store-copy")	\
>  	EM(netfs_folio_trace_store_plus,	"store+")	\
>  	EM(netfs_folio_trace_wthru,		"wthru")	\
>  	E_(netfs_folio_trace_wthru_plus,	"wthru+")
> 
> +#define netfs_collect_contig_traces				\
> +	EM(netfs_contig_trace_collect,		"Collect")	\
> +	EM(netfs_contig_trace_jump,		"-->JUMP-->")	\
> +	E_(netfs_contig_trace_unlock,		"Unlock")
> +
>  #ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
>  #define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
> 
> @@ -159,6 +185,7 @@ enum netfs_failure { netfs_failures } __mode(byte);
>  enum netfs_rreq_ref_trace { netfs_rreq_ref_traces } __mode(byte);
>  enum netfs_sreq_ref_trace { netfs_sreq_ref_traces } __mode(byte);
>  enum netfs_folio_trace { netfs_folio_traces } __mode(byte);
> +enum netfs_collect_contig_trace { netfs_collect_contig_traces } __mode(byte);
> 
>  #endif
> 
> @@ -180,6 +207,7 @@ netfs_failures;
>  netfs_rreq_ref_traces;
>  netfs_sreq_ref_traces;
>  netfs_folio_traces;
> +netfs_collect_contig_traces;
> 
>  /*
>   * Now redefine the EM() and E_() macros to map the enums to the strings that
> @@ -413,16 +441,18 @@ TRACE_EVENT(netfs_write_iter,
>  		    __field(unsigned long long,		start		)
>  		    __field(size_t,			len		)
>  		    __field(unsigned int,		flags		)
> +		    __field(unsigned int,		ino		)
>  			     ),
> 
>  	    TP_fast_assign(
>  		    __entry->start	= iocb->ki_pos;
>  		    __entry->len	= iov_iter_count(from);
> +		    __entry->ino	= iocb->ki_filp->f_inode->i_ino;
>  		    __entry->flags	= iocb->ki_flags;
>  			   ),
> 
> -	    TP_printk("WRITE-ITER s=%llx l=%zx f=%x",
> -		      __entry->start, __entry->len, __entry->flags)
> +	    TP_printk("WRITE-ITER i=%x s=%llx l=%zx f=%x",
> +		      __entry->ino, __entry->start, __entry->len, __entry->flags)
>  	    );
> 
>  TRACE_EVENT(netfs_write,
> @@ -434,6 +464,7 @@ TRACE_EVENT(netfs_write,
>  	    TP_STRUCT__entry(
>  		    __field(unsigned int,		wreq		)
>  		    __field(unsigned int,		cookie		)
> +		    __field(unsigned int,		ino		)
>  		    __field(enum netfs_write_trace,	what		)
>  		    __field(unsigned long long,		start		)
>  		    __field(unsigned long long,		len		)
> @@ -444,18 +475,213 @@ TRACE_EVENT(netfs_write,
>  		    struct fscache_cookie *__cookie = netfs_i_cookie(__ctx);
>  		    __entry->wreq	= wreq->debug_id;
>  		    __entry->cookie	= __cookie ? __cookie->debug_id : 0;
> +		    __entry->ino	= wreq->inode->i_ino;
>  		    __entry->what	= what;
>  		    __entry->start	= wreq->start;
>  		    __entry->len	= wreq->len;
>  			   ),
> 
> -	    TP_printk("R=%08x %s c=%08x by=%llx-%llx",
> +	    TP_printk("R=%08x %s c=%08x i=%x by=%llx-%llx",
>  		      __entry->wreq,
>  		      __print_symbolic(__entry->what, netfs_write_traces),
>  		      __entry->cookie,
> +		      __entry->ino,
>  		      __entry->start, __entry->start + __entry->len - 1)
>  	    );
> 
> +TRACE_EVENT(netfs_collect,
> +	    TP_PROTO(const struct netfs_io_request *wreq),
> +
> +	    TP_ARGS(wreq),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,		wreq		)
> +		    __field(unsigned int,		len		)
> +		    __field(unsigned long long,		transferred	)
> +		    __field(unsigned long long,		start		)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->start	= wreq->start;
> +		    __entry->len	= wreq->len;
> +		    __entry->transferred = wreq->transferred;
> +			   ),
> +
> +	    TP_printk("R=%08x s=%llx-%llx",
> +		      __entry->wreq,
> +		      __entry->start + __entry->transferred,
> +		      __entry->start + __entry->len)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_contig,
> +	    TP_PROTO(const struct netfs_io_request *wreq, unsigned long long to,
> +		     enum netfs_collect_contig_trace type),
> +
> +	    TP_ARGS(wreq, to, type),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,		wreq)
> +		    __field(enum netfs_collect_contig_trace, type)
> +		    __field(unsigned long long,		contiguity)
> +		    __field(unsigned long long,		to)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->type	= type;
> +		    __entry->contiguity	= wreq->contiguity;
> +		    __entry->to		= to;
> +			   ),
> +
> +	    TP_printk("R=%08x %llx -> %llx %s",
> +		      __entry->wreq,
> +		      __entry->contiguity,
> +		      __entry->to,
> +		      __print_symbolic(__entry->type, netfs_collect_contig_traces))
> +	    );
> +
> +TRACE_EVENT(netfs_collect_sreq,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     const struct netfs_io_subrequest *subreq),
> +
> +	    TP_ARGS(wreq, subreq),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,		wreq		)
> +		    __field(unsigned int,		subreq		)
> +		    __field(unsigned int,		stream		)
> +		    __field(unsigned int,		len		)
> +		    __field(unsigned int,		transferred	)
> +		    __field(unsigned long long,		start		)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->subreq	= subreq->debug_index;
> +		    __entry->stream	= subreq->stream_nr;
> +		    __entry->start	= subreq->start;
> +		    __entry->len	= subreq->len;
> +		    __entry->transferred = subreq->transferred;
> +			   ),
> +
> +	    TP_printk("R=%08x[%u:%02x] s=%llx t=%x/%x",
> +		      __entry->wreq, __entry->stream, __entry->subreq,
> +		      __entry->start, __entry->transferred, __entry->len)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_folio,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     const struct folio *folio,
> +		     unsigned long long fend,
> +		     unsigned long long collected_to),
> +
> +	    TP_ARGS(wreq, folio, fend, collected_to),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,	wreq		)
> +		    __field(unsigned long,	index		)
> +		    __field(unsigned long long,	fend		)
> +		    __field(unsigned long long,	cleaned_to	)
> +		    __field(unsigned long long,	collected_to	)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->index	= folio->index;
> +		    __entry->fend	= fend;
> +		    __entry->cleaned_to	= wreq->cleaned_to;
> +		    __entry->collected_to = collected_to;
> +			   ),
> +
> +	    TP_printk("R=%08x ix=%05lx r=%llx-%llx t=%llx/%llx",
> +		      __entry->wreq, __entry->index,
> +		      (unsigned long long)__entry->index * PAGE_SIZE, __entry->fend,
> +		      __entry->cleaned_to, __entry->collected_to)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_state,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     unsigned long long collected_to,
> +		     unsigned int notes),
> +
> +	    TP_ARGS(wreq, collected_to, notes),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,	wreq		)
> +		    __field(unsigned int,	notes		)
> +		    __field(unsigned long long,	collected_to	)
> +		    __field(unsigned long long,	cleaned_to	)
> +		    __field(unsigned long long,	contiguity	)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->notes	= notes;
> +		    __entry->collected_to = collected_to;
> +		    __entry->cleaned_to	= wreq->cleaned_to;
> +		    __entry->contiguity = wreq->contiguity;
> +			   ),
> +
> +	    TP_printk("R=%08x cto=%llx fto=%llx ctg=%llx n=%x",
> +		      __entry->wreq, __entry->collected_to,
> +		      __entry->cleaned_to, __entry->contiguity,
> +		      __entry->notes)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_gap,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     const struct netfs_io_stream *stream,
> +		     unsigned long long jump_to, char type),
> +
> +	    TP_ARGS(wreq, stream, jump_to, type),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,	wreq)
> +		    __field(unsigned char,	stream)
> +		    __field(unsigned char,	type)
> +		    __field(unsigned long long,	from)
> +		    __field(unsigned long long,	to)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->stream	= stream->stream_nr;
> +		    __entry->from	= stream->collected_to;
> +		    __entry->to		= jump_to;
> +		    __entry->type	= type;
> +			   ),
> +
> +	    TP_printk("R=%08x[%x:] %llx->%llx %c",
> +		      __entry->wreq, __entry->stream,
> +		      __entry->from, __entry->to, __entry->type)
> +	    );
> +
> +TRACE_EVENT(netfs_collect_stream,
> +	    TP_PROTO(const struct netfs_io_request *wreq,
> +		     const struct netfs_io_stream *stream),
> +
> +	    TP_ARGS(wreq, stream),
> +
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int,	wreq)
> +		    __field(unsigned char,	stream)
> +		    __field(unsigned long long,	collected_to)
> +		    __field(unsigned long long,	front)
> +			     ),
> +
> +	    TP_fast_assign(
> +		    __entry->wreq	= wreq->debug_id;
> +		    __entry->stream	= stream->stream_nr;
> +		    __entry->collected_to = stream->collected_to;
> +		    __entry->front	= stream->front ? stream->front->start : UINT_MAX;
> +			   ),
> +
> +	    TP_printk("R=%08x[%x:] cto=%llx frn=%llx",
> +		      __entry->wreq, __entry->stream,
> +		      __entry->collected_to, __entry->front)
> +	    );
> +
>  #undef EM
>  #undef E_
>  #endif /* _TRACE_NETFS_H */
>
Vadim Fedorenko March 30, 2024, 1:03 a.m. UTC | #2
On 28/03/2024 16:34, David Howells wrote:
[..snip..]
> 
> The algorithm is split into three parts:
> 
>   (1) The issuer.  This walks through the data, packaging it up, encrypting
>       it and creating subrequests.  The part of this that generates
>       subrequests only deals with file positions and spans and so is usable
>       for DIO/unbuffered writes as well as buffered writes.
> 
>   (2) The collector. This asynchronously collects completed subrequests,
>       unlocks folios, frees crypto buffers and performs any retries.  This
>       runs in a work queue so that the issuer can return to the caller for
>       writeback (so that the VM can have its kswapd thread back) or async
>       writes.
> 
>   (3) The retryer.  This pauses the issuer, waits for all outstanding
>       subrequests to complete and then goes through the failed subrequests
>       to reissue them.  This may involve reprepping them (with cifs, the
>       credits must be renegotiated, and a subrequest may need splitting),
>       and doing RMW for content crypto if there's a conflicting change on
>       the server.
> 
> [!] Note that some of the functions are prefixed with "new_" to avoid
> clashes with existing functions.  These will be renamed in a later patch
> that cuts over to the new algorithm.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: Eric Van Hensbergen <ericvh@kernel.org>
> cc: Latchesar Ionkov <lucho@ionkov.net>
> cc: Dominique Martinet <asmadeus@codewreck.org>
> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: v9fs@lists.linux.dev
> cc: linux-afs@lists.infradead.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org

[..snip..]
> +/*
> + * Begin a write operation for writing through the pagecache.
> + */
> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
> +{
> +	struct netfs_io_request *wreq = NULL;
> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
> +
> +	mutex_lock(&ictx->wb_lock);
> +
> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
> +	if (IS_ERR(wreq))
> +		mutex_unlock(&ictx->wb_lock);
> +
> +	wreq->io_streams[0].avail = true;

in case IS_ERR(wreq) is true, the execution falls through and this dereference is invalid.

> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);

not sure if we still need trace function call in case of error

> +	return wreq;
> +}
> +

[..snip..]
Vadim Fedorenko March 30, 2024, 1:06 a.m. UTC | #3
On 29/03/2024 10:34, Naveen Mamindlapalli wrote:

[..snip..]

>> +/*
>> + * Begin a write operation for writing through the pagecache.
>> + */
>> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
>> +{
>> +	struct netfs_io_request *wreq = NULL;
>> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
>> +
>> +	mutex_lock(&ictx->wb_lock);
>> +
>> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
>> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
>> +	if (IS_ERR(wreq))
>> +		mutex_unlock(&ictx->wb_lock);
>> +
>> +	wreq->io_streams[0].avail = true;
>> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);
> 
> Missing mutex_unlock() before return.
> 

mutex_unlock() happens in new_netfs_end_writethrough()

> Thanks,
> Naveen
>
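To illustrate the locking scheme under discussion: wb_lock is taken in new_netfs_begin_writethrough() and, on success, is only released again by new_netfs_end_writethrough() once the write has been pushed out; on the error path it must be dropped before returning (the fix discussed in this thread).  A rough sketch of a caller, simplified to a single folio and using a hypothetical function name (the real caller is not part of this hunk), might look like this:

static int example_writethrough(struct kiocb *iocb, struct writeback_control *wbc,
				struct folio *folio, size_t copied, bool to_page_end)
{
	struct folio *wbcache = NULL;
	struct netfs_io_request *wreq;
	int ret, ret2;

	/* Takes ictx->wb_lock; on error the lock is assumed to have been
	 * dropped already (per the fix below).
	 */
	wreq = new_netfs_begin_writethrough(iocb, copied);
	if (IS_ERR(wreq))
		return PTR_ERR(wreq);

	/* Append the copied data; this may flush a completed folio. */
	ret = new_netfs_advance_writethrough(wreq, wbc, folio, copied,
					     to_page_end, &wbcache);

	/* Always end the op: this issues the final writes and drops wb_lock. */
	ret2 = new_netfs_end_writethrough(wreq, wbc, wbcache);
	return ret < 0 ? ret : ret2;
}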
David Howells April 2, 2024, 8:46 a.m. UTC | #4
David Howells <dhowells@redhat.com> wrote:

> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
> +{
> +	struct netfs_io_request *wreq = NULL;
> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
> +
> +	mutex_lock(&ictx->wb_lock);
> +
> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
> +	if (IS_ERR(wreq))
> +		mutex_unlock(&ictx->wb_lock);

This needs a "return wreq;" adding and appropriate braces.  Thanks to those
who pointed it out.

David
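For clarity, a minimal sketch of how the corrected error path might look, based on the comments above (only the IS_ERR() branch changes; everything else is as posted, and the lock is still handed off to new_netfs_end_writethrough() on success):

struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
{
	struct netfs_io_request *wreq = NULL;
	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));

	mutex_lock(&ictx->wb_lock);

	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
				      iocb->ki_pos, NETFS_WRITETHROUGH);
	if (IS_ERR(wreq)) {
		/* Drop the lock and bail out rather than falling through and
		 * dereferencing an ERR_PTR.
		 */
		mutex_unlock(&ictx->wb_lock);
		return wreq;
	}

	wreq->io_streams[0].avail = true;
	trace_netfs_write(wreq, netfs_write_trace_writethrough);
	return wreq;
}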
diff mbox series

Patch

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index d4d1d799819e..1eb86e34b5a9 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -11,7 +11,9 @@  netfs-y := \
 	main.o \
 	misc.o \
 	objects.o \
-	output.o
+	output.o \
+	write_collect.o \
+	write_issue.o
 
 netfs-$(CONFIG_NETFS_STATS) += stats.o
 
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 244d67a43972..621532dacef5 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -74,16 +74,12 @@  static enum netfs_how_to_modify netfs_how_to_modify(struct netfs_inode *ctx,
 
 	if (file->f_mode & FMODE_READ)
 		goto no_write_streaming;
-	if (test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
-		goto no_write_streaming;
 
 	if (netfs_is_cache_enabled(ctx)) {
 		/* We don't want to get a streaming write on a file that loses
 		 * caching service temporarily because the backing store got
 		 * culled.
 		 */
-		if (!test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
-			set_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags);
 		goto no_write_streaming;
 	}
 
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 58289cc65e25..5d3f74a70fa7 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -153,6 +153,33 @@  static inline void netfs_stat_d(atomic_t *stat)
 #define netfs_stat_d(x) do {} while(0)
 #endif
 
+/*
+ * write_collect.c
+ */
+int netfs_folio_written_back(struct folio *folio);
+void netfs_write_collection_worker(struct work_struct *work);
+void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async);
+
+/*
+ * write_issue.c
+ */
+struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
+						struct file *file,
+						loff_t start,
+						enum netfs_io_origin origin);
+void netfs_reissue_write(struct netfs_io_stream *stream,
+			 struct netfs_io_subrequest *subreq);
+int netfs_advance_write(struct netfs_io_request *wreq,
+			struct netfs_io_stream *stream,
+			loff_t start, size_t len, bool to_eof);
+struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len);
+int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+				   struct folio *folio, size_t copied, bool to_page_end,
+				   struct folio **writethrough_cache);
+int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+			       struct folio *writethrough_cache);
+int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len);
+
 /*
  * Miscellaneous functions.
  */
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 1a4e2ce735ce..c90d482b1650 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -47,6 +47,10 @@  struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
 	rreq->inode	= inode;
 	rreq->i_size	= i_size_read(inode);
 	rreq->debug_id	= atomic_inc_return(&debug_ids);
+	rreq->wsize	= INT_MAX;
+	spin_lock_init(&rreq->lock);
+	INIT_LIST_HEAD(&rreq->io_streams[0].subrequests);
+	INIT_LIST_HEAD(&rreq->io_streams[1].subrequests);
 	INIT_LIST_HEAD(&rreq->subrequests);
 	INIT_WORK(&rreq->work, NULL);
 	refcount_set(&rreq->ref, 1);
@@ -85,6 +89,8 @@  void netfs_get_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace
 void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
 {
 	struct netfs_io_subrequest *subreq;
+	struct netfs_io_stream *stream;
+	int s;
 
 	while (!list_empty(&rreq->subrequests)) {
 		subreq = list_first_entry(&rreq->subrequests,
@@ -93,6 +99,17 @@  void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
 		netfs_put_subrequest(subreq, was_async,
 				     netfs_sreq_trace_put_clear);
 	}
+
+	for (s = 0; s < ARRAY_SIZE(rreq->io_streams); s++) {
+		stream = &rreq->io_streams[s];
+		while (!list_empty(&stream->subrequests)) {
+			subreq = list_first_entry(&stream->subrequests,
+						  struct netfs_io_subrequest, rreq_link);
+			list_del(&subreq->rreq_link);
+			netfs_put_subrequest(subreq, was_async,
+					     netfs_sreq_trace_put_clear);
+		}
+	}
 }
 
 static void netfs_free_request_rcu(struct rcu_head *rcu)
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
new file mode 100644
index 000000000000..5e2ca8b25af0
--- /dev/null
+++ b/fs/netfs/write_collect.c
@@ -0,0 +1,808 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/* Network filesystem write subrequest result collection, assessment
+ * and retrying.
+ *
+ * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+/* Notes made in the collector */
+#define HIT_PENDING		0x01	/* A front op was still pending */
+#define SOME_EMPTY		0x02	/* One or more streams are empty */
+#define ALL_EMPTY		0x04	/* All streams are empty */
+#define MAYBE_DISCONTIG		0x08	/* A front op may be discontiguous (rounded to PAGE_SIZE) */
+#define NEED_REASSESS		0x10	/* Need to loop round and reassess */
+#define REASSESS_DISCONTIG	0x20	/* Reassess discontiguity if contiguity advances */
+#define MADE_PROGRESS		0x40	/* Made progress cleaning up a stream or the folio set */
+#define BUFFERED		0x80	/* The pagecache needs cleaning up */
+#define NEED_RETRY		0x100	/* A front op requests retrying */
+#define SAW_FAILURE		0x200	/* One or more streams hit a permanent failure */
+
+/*
+ * Successful completion of write of a folio to the server and/or cache.  Note
+ * that we are not allowed to lock the folio here on pain of deadlocking with
+ * truncate.
+ */
+int netfs_folio_written_back(struct folio *folio)
+{
+	enum netfs_folio_trace why = netfs_folio_trace_clear;
+	struct netfs_folio *finfo;
+	struct netfs_group *group = NULL;
+	int gcount = 0;
+
+	if ((finfo = netfs_folio_info(folio))) {
+		/* Streaming writes cannot be redirtied whilst under writeback,
+		 * so discard the streaming record.
+		 */
+		folio_detach_private(folio);
+		group = finfo->netfs_group;
+		gcount++;
+		kfree(finfo);
+		why = netfs_folio_trace_clear_s;
+		goto end_wb;
+	}
+
+	if ((group = netfs_folio_group(folio))) {
+		if (group == NETFS_FOLIO_COPY_TO_CACHE) {
+			why = netfs_folio_trace_clear_cc;
+			folio_detach_private(folio);
+			goto end_wb;
+		}
+
+		/* Need to detach the group pointer if the page didn't get
+		 * redirtied.  If it has been redirtied, then it must be within
+		 * the same group.
+		 */
+		why = netfs_folio_trace_redirtied;
+		if (!folio_test_dirty(folio)) {
+			folio_detach_private(folio);
+			gcount++;
+			why = netfs_folio_trace_clear_g;
+		}
+	}
+
+end_wb:
+	trace_netfs_folio(folio, why);
+	folio_end_writeback(folio);
+	return gcount;
+}
+
+/*
+ * Get hold of a folio we have under writeback.  We don't want to get the
+ * refcount on it.
+ */
+static struct folio *netfs_writeback_lookup_folio(struct netfs_io_request *wreq, loff_t pos)
+{
+	XA_STATE(xas, &wreq->mapping->i_pages, pos / PAGE_SIZE);
+	struct folio *folio;
+
+	rcu_read_lock();
+
+	for (;;) {
+		xas_reset(&xas);
+		folio = xas_load(&xas);
+		if (xas_retry(&xas, folio))
+			continue;
+
+		if (!folio || xa_is_value(folio))
+			kdebug("R=%08x: folio %lx (%llx) not present",
+			       wreq->debug_id, xas.xa_index, pos / PAGE_SIZE);
+		BUG_ON(!folio || xa_is_value(folio));
+
+		if (folio == xas_reload(&xas))
+			break;
+	}
+
+	rcu_read_unlock();
+
+	if (WARN_ONCE(!folio_test_writeback(folio),
+		      "R=%08x: folio %lx is not under writeback\n",
+		      wreq->debug_id, folio->index)) {
+		trace_netfs_folio(folio, netfs_folio_trace_not_under_wback);
+	}
+	return folio;
+}
+
+/*
+ * Unlock any folios we've finished with.
+ */
+static void netfs_writeback_unlock_folios(struct netfs_io_request *wreq,
+					  unsigned long long collected_to,
+					  unsigned int *notes)
+{
+	for (;;) {
+		struct folio *folio;
+		struct netfs_folio *finfo;
+		unsigned long long fpos, fend;
+		size_t fsize, flen;
+
+		folio = netfs_writeback_lookup_folio(wreq, wreq->cleaned_to);
+
+		fpos = folio_pos(folio);
+		fsize = folio_size(folio);
+		finfo = netfs_folio_info(folio);
+		flen = finfo ? finfo->dirty_offset + finfo->dirty_len : fsize;
+
+		fend = min_t(unsigned long long, fpos + flen, wreq->i_size);
+
+		trace_netfs_collect_folio(wreq, folio, fend, collected_to);
+
+		if (fpos + fsize > wreq->contiguity) {
+			trace_netfs_collect_contig(wreq, fpos + fsize,
+						   netfs_contig_trace_unlock);
+			wreq->contiguity = fpos + fsize;
+		}
+
+		/* Unlock any folio we've transferred all of. */
+		if (collected_to < fend)
+			break;
+
+		wreq->nr_group_rel += netfs_folio_written_back(folio);
+		wreq->cleaned_to = fpos + fsize;
+		*notes |= MADE_PROGRESS;
+
+		if (fpos + fsize >= collected_to)
+			break;
+	}
+}
+
+/*
+ * Perform retries on the streams that need it.
+ */
+static void netfs_retry_write_stream(struct netfs_io_request *wreq,
+				     struct netfs_io_stream *stream)
+{
+	struct list_head *next;
+
+	_enter("R=%x[%x:]", wreq->debug_id, stream->stream_nr);
+
+	if (unlikely(stream->failed))
+		return;
+
+	/* If there's no renegotiation to do, just resend each failed subreq. */
+	if (!stream->prepare_write) {
+		struct netfs_io_subrequest *subreq;
+
+		list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
+			if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
+				break;
+			if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
+				__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+				netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+				netfs_reissue_write(stream, subreq);
+			}
+		}
+		return;
+	}
+
+	if (list_empty(&stream->subrequests))
+		return;
+	next = stream->subrequests.next;
+
+	do {
+		struct netfs_io_subrequest *subreq = NULL, *from, *to, *tmp;
+		unsigned long long start, len;
+		size_t part;
+		bool boundary = false;
+
+		/* Go through the stream and find the next span of contiguous
+		 * data that we then rejig (cifs, for example, needs the wsize
+		 * renegotiating) and reissue.
+		 */
+		from = list_entry(next, struct netfs_io_subrequest, rreq_link);
+		to = from;
+		start = from->start + from->transferred;
+		len   = from->len   - from->transferred;
+
+		if (test_bit(NETFS_SREQ_FAILED, &from->flags) ||
+		    !test_bit(NETFS_SREQ_NEED_RETRY, &from->flags))
+			return;
+
+		list_for_each_continue(next, &stream->subrequests) {
+			subreq = list_entry(next, struct netfs_io_subrequest, rreq_link);
+			if (subreq->start + subreq->transferred != start + len ||
+			    test_bit(NETFS_SREQ_BOUNDARY, &subreq->flags) ||
+			    !test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags))
+				break;
+			to = subreq;
+			len += to->len;
+		}
+
+		/* Work through the sublist. */
+		subreq = from;
+		list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
+			if (!len)
+				break;
+			/* Renegotiate max_len (wsize) */
+			trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
+			__clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+			__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+			stream->prepare_write(subreq);
+
+			part = min(len, subreq->max_len);
+			subreq->len = part;
+			subreq->start = start;
+			subreq->transferred = 0;
+			len -= part;
+			start += part;
+			if (len && subreq == to &&
+			    __test_and_clear_bit(NETFS_SREQ_BOUNDARY, &to->flags))
+				boundary = true;
+
+			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+			netfs_reissue_write(stream, subreq);
+			if (subreq == to)
+				break;
+		}
+
+		/* If we managed to use fewer subreqs, we can discard the
+		 * excess; if we used the same number, then we're done.
+		 */
+		if (!len) {
+			if (subreq == to)
+				continue;
+			list_for_each_entry_safe_from(subreq, tmp,
+						      &stream->subrequests, rreq_link) {
+				trace_netfs_sreq(subreq, netfs_sreq_trace_discard);
+				list_del(&subreq->rreq_link);
+				netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done);
+				if (subreq == to)
+					break;
+			}
+			continue;
+		}
+
+		/* We ran out of subrequests, so we need to allocate some more
+		 * and insert them after.
+		 */
+		do {
+			subreq = netfs_alloc_subrequest(wreq);
+			subreq->source		= to->source;
+			subreq->start		= start;
+			subreq->max_len		= len;
+			subreq->max_nr_segs	= INT_MAX;
+			subreq->debug_index	= atomic_inc_return(&wreq->subreq_counter);
+			subreq->stream_nr	= to->stream_nr;
+			__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+
+			trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
+					     refcount_read(&subreq->ref),
+					     netfs_sreq_trace_new);
+			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+
+			list_add(&subreq->rreq_link, &to->rreq_link);
+			to = list_next_entry(to, rreq_link);
+			trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
+
+			switch (stream->source) {
+			case NETFS_UPLOAD_TO_SERVER:
+				netfs_stat(&netfs_n_wh_upload);
+				subreq->max_len = min(len, wreq->wsize);
+				break;
+			case NETFS_WRITE_TO_CACHE:
+				netfs_stat(&netfs_n_wh_write);
+				break;
+			default:
+				WARN_ON_ONCE(1);
+			}
+
+			stream->prepare_write(subreq);
+
+			part = min(len, subreq->max_len);
+			subreq->len = subreq->transferred + part;
+			len -= part;
+			start += part;
+			if (!len && boundary) {
+				__set_bit(NETFS_SREQ_BOUNDARY, &to->flags);
+				boundary = false;
+			}
+
+			netfs_reissue_write(stream, subreq);
+			if (!len)
+				break;
+
+		} while (len);
+
+	} while (!list_is_head(next, &stream->subrequests));
+}
+
+/*
+ * Perform retries on the streams that need it.  If we're doing content
+ * encryption and the server copy changed due to a third-party write, we may
+ * need to do an RMW cycle and also rewrite the data to the cache.
+ */
+static void netfs_retry_writes(struct netfs_io_request *wreq)
+{
+	struct netfs_io_subrequest *subreq;
+	struct netfs_io_stream *stream;
+	int s;
+
+	/* Wait for all outstanding I/O to quiesce before performing retries as
+	 * we may need to renegotiate the I/O sizes.
+	 */
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		stream = &wreq->io_streams[s];
+		if (!stream->active)
+			continue;
+
+		list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
+			wait_on_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS,
+				    TASK_UNINTERRUPTIBLE);
+		}
+	}
+
+	// TODO: Enc: Fetch changed partial pages
+	// TODO: Enc: Reencrypt content if needed.
+	// TODO: Enc: Wind back transferred point.
+	// TODO: Enc: Mark cache pages for retry.
+
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		stream = &wreq->io_streams[s];
+		if (stream->need_retry) {
+			stream->need_retry = false;
+			netfs_retry_write_stream(wreq, stream);
+		}
+	}
+}
+
+/*
+ * Collect and assess the results of various write subrequests.  We may need to
+ * retry some of the results - or even do an RMW cycle for content crypto.
+ *
+ * Note that we have a number of parallel, overlapping lists of subrequests
+ * (one to the server and one to the local cache, for example) that may not be
+ * of the same size or starting position and may not even correspond in
+ * boundary alignment.
+ */
+static void netfs_collect_write_results(struct netfs_io_request *wreq)
+{
+	struct netfs_io_subrequest *front, *remove;
+	struct netfs_io_stream *stream;
+	unsigned long long collected_to;
+	unsigned int notes;
+	int s;
+
+	_enter("%llx-%llx", wreq->start, wreq->start + wreq->len);
+	trace_netfs_collect(wreq);
+	trace_netfs_rreq(wreq, netfs_rreq_trace_collect);
+
+reassess_streams:
+	smp_rmb();
+	collected_to = ULLONG_MAX;
+	if (wreq->origin == NETFS_WRITEBACK)
+		notes = ALL_EMPTY | BUFFERED | MAYBE_DISCONTIG;
+	else if (wreq->origin == NETFS_WRITETHROUGH)
+		notes = ALL_EMPTY | BUFFERED;
+	else
+		notes = ALL_EMPTY;
+
+	/* Remove completed subrequests from the front of the streams and
+	 * advance the completion point on each stream.  We stop when we hit
+	 * something that's in progress.  The issuer thread may be adding stuff
+	 * to the tail whilst we're doing this.
+	 *
+	 * We must not, however, merge in discontiguities that span whole
+	 * folios that aren't under writeback.  This is made more complicated
+	 * by the folios in the gap being of unpredictable sizes - if they even
+	 * exist - but we don't want to look them up.
+	 */
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		loff_t rstart, rend;
+
+		stream = &wreq->io_streams[s];
+		/* Read active flag before list pointers */
+		if (!smp_load_acquire(&stream->active))
+			continue;
+
+		front = stream->front;
+		while (front) {
+			trace_netfs_collect_sreq(wreq, front);
+			//_debug("sreq [%x] %llx %zx/%zx",
+			//       front->debug_index, front->start, front->transferred, front->len);
+
+			/* Stall if there may be a discontinuity. */
+			rstart = round_down(front->start, PAGE_SIZE);
+			if (rstart > wreq->contiguity) {
+				if (wreq->contiguity > stream->collected_to) {
+					trace_netfs_collect_gap(wreq, stream,
+								wreq->contiguity, 'D');
+					stream->collected_to = wreq->contiguity;
+				}
+				notes |= REASSESS_DISCONTIG;
+				break;
+			}
+			rend = round_up(front->start + front->len, PAGE_SIZE);
+			if (rend > wreq->contiguity) {
+				trace_netfs_collect_contig(wreq, rend,
+							   netfs_contig_trace_collect);
+				wreq->contiguity = rend;
+				if (notes & REASSESS_DISCONTIG)
+					notes |= NEED_REASSESS;
+			}
+			notes &= ~MAYBE_DISCONTIG;
+
+			/* Stall if the front is still undergoing I/O. */
+			if (test_bit(NETFS_SREQ_IN_PROGRESS, &front->flags)) {
+				notes |= HIT_PENDING;
+				break;
+			}
+			smp_rmb(); /* Read counters after I-P flag. */
+
+			if (stream->failed) {
+				stream->collected_to = front->start + front->len;
+				notes |= MADE_PROGRESS | SAW_FAILURE;
+				goto cancel;
+			}
+			if (front->start + front->transferred > stream->collected_to) {
+				stream->collected_to = front->start + front->transferred;
+				stream->transferred = stream->collected_to - wreq->start;
+				notes |= MADE_PROGRESS;
+			}
+			if (test_bit(NETFS_SREQ_FAILED, &front->flags)) {
+				stream->failed = true;
+				stream->error = front->error;
+				if (stream->source == NETFS_UPLOAD_TO_SERVER)
+					mapping_set_error(wreq->mapping, front->error);
+				notes |= NEED_REASSESS | SAW_FAILURE;
+				break;
+			}
+			if (front->transferred < front->len) {
+				stream->need_retry = true;
+				notes |= NEED_RETRY | MADE_PROGRESS;
+				break;
+			}
+
+		cancel:
+			/* Remove if completely consumed. */
+			spin_lock(&wreq->lock);
+
+			remove = front;
+			list_del_init(&front->rreq_link);
+			front = list_first_entry_or_null(&stream->subrequests,
+							 struct netfs_io_subrequest, rreq_link);
+			stream->front = front;
+			if (!front) {
+				unsigned long long jump_to = atomic64_read(&wreq->issued_to);
+
+				if (stream->collected_to < jump_to) {
+					trace_netfs_collect_gap(wreq, stream, jump_to, 'A');
+					stream->collected_to = jump_to;
+				}
+			}
+
+			spin_unlock(&wreq->lock);
+			netfs_put_subrequest(remove, false,
+					     notes & SAW_FAILURE ?
+					     netfs_sreq_trace_put_cancel :
+					     netfs_sreq_trace_put_done);
+		}
+
+		if (front)
+			notes &= ~ALL_EMPTY;
+		else
+			notes |= SOME_EMPTY;
+
+		if (stream->collected_to < collected_to)
+			collected_to = stream->collected_to;
+	}
+
+	if (collected_to != ULLONG_MAX && collected_to > wreq->collected_to)
+		wreq->collected_to = collected_to;
+
+	/* If we have an empty stream, we need to jump it forward over any gap
+	 * otherwise the collection point will never advance.
+	 *
+	 * Note that the issuer always adds to the stream with the lowest
+	 * so-far submitted start, so if we see two consecutive subreqs in one
+	 * stream with nothing between them in another stream, then the second
+	 * stream has a gap that can be jumped.
+	 */
+	if (notes & SOME_EMPTY) {
+		unsigned long long jump_to = wreq->start + wreq->len;
+
+		for (s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->active &&
+			    stream->front &&
+			    stream->front->start < jump_to)
+				jump_to = stream->front->start;
+		}
+
+		for (s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->active &&
+			    !stream->front &&
+			    stream->collected_to < jump_to) {
+				trace_netfs_collect_gap(wreq, stream, jump_to, 'B');
+				stream->collected_to = jump_to;
+			}
+		}
+	}
+
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		stream = &wreq->io_streams[s];
+		if (stream->active)
+			trace_netfs_collect_stream(wreq, stream);
+	}
+
+	trace_netfs_collect_state(wreq, wreq->collected_to, notes);
+
+	/* Unlock any folios that we have now finished with. */
+	if (notes & BUFFERED) {
+		unsigned long long clean_to = min(wreq->collected_to, wreq->contiguity);
+
+		if (wreq->cleaned_to < clean_to)
+			netfs_writeback_unlock_folios(wreq, clean_to, &notes);
+	} else {
+		wreq->cleaned_to = wreq->collected_to;
+	}
+
+	// TODO: Discard encryption buffers
+
+	/* If all streams are discontiguous with the last folio we cleared, we
+	 * may need to skip a set of folios.
+	 */
+	if ((notes & (MAYBE_DISCONTIG | ALL_EMPTY)) == MAYBE_DISCONTIG) {
+		unsigned long long jump_to = ULLONG_MAX;
+
+		for (s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->active && stream->front &&
+			    stream->front->start < jump_to)
+				jump_to = stream->front->start;
+		}
+
+		trace_netfs_collect_contig(wreq, jump_to, netfs_contig_trace_jump);
+		wreq->contiguity = jump_to;
+		wreq->cleaned_to = jump_to;
+		wreq->collected_to = jump_to;
+		for (s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->collected_to < jump_to)
+				stream->collected_to = jump_to;
+		}
+		//cond_resched();
+		notes |= MADE_PROGRESS;
+		goto reassess_streams;
+	}
+
+	if (notes & NEED_RETRY)
+		goto need_retry;
+	if ((notes & MADE_PROGRESS) && test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
+		trace_netfs_rreq(wreq, netfs_rreq_trace_unpause);
+		clear_bit_unlock(NETFS_RREQ_PAUSE, &wreq->flags);
+		wake_up_bit(&wreq->flags, NETFS_RREQ_PAUSE);
+	}
+
+	if (notes & NEED_REASSESS) {
+		//cond_resched();
+		goto reassess_streams;
+	}
+	if (notes & MADE_PROGRESS) {
+		//cond_resched();
+		goto reassess_streams;
+	}
+
+out:
+	netfs_put_group_many(wreq->group, wreq->nr_group_rel);
+	wreq->nr_group_rel = 0;
+	_leave(" = %x", notes);
+	return;
+
+need_retry:
+	/* Okay...  We're going to have to retry one or both streams.  Note
+	 * that any partially completed op will have had any wholly transferred
+	 * folios removed from it.
+	 */
+	_debug("retry");
+	netfs_retry_writes(wreq);
+	goto out;
+}
+
+/*
+ * Perform the collection of subrequests, folios and encryption buffers.
+ */
+void netfs_write_collection_worker(struct work_struct *work)
+{
+	struct netfs_io_request *wreq = container_of(work, struct netfs_io_request, work);
+	struct netfs_inode *ictx = netfs_inode(wreq->inode);
+	size_t transferred;
+	int s;
+
+	_enter("R=%x", wreq->debug_id);
+
+	netfs_see_request(wreq, netfs_rreq_trace_see_work);
+	if (!test_bit(NETFS_RREQ_IN_PROGRESS, &wreq->flags)) {
+		netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
+		return;
+	}
+
+	netfs_collect_write_results(wreq);
+
+	/* We're done when the app thread has finished posting subreqs and all
+	 * the queues in all the streams are empty.
+	 */
+	if (!test_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags)) {
+		netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
+		return;
+	}
+	smp_rmb(); /* Read ALL_QUEUED before lists. */
+
+	transferred = LONG_MAX;
+	for (s = 0; s < NR_IO_STREAMS; s++) {
+		struct netfs_io_stream *stream = &wreq->io_streams[s];
+		if (!stream->active)
+			continue;
+		if (!list_empty(&stream->subrequests)) {
+			netfs_put_request(wreq, false, netfs_rreq_trace_put_work);
+			return;
+		}
+		if (stream->transferred < transferred)
+			transferred = stream->transferred;
+	}
+
+	/* Okay, declare that all I/O is complete. */
+	wreq->transferred = transferred;
+	trace_netfs_rreq(wreq, netfs_rreq_trace_write_done);
+
+	if (wreq->io_streams[1].active &&
+	    wreq->io_streams[1].failed) {
+		/* Cache write failure doesn't prevent writeback completion
+		 * unless we're in disconnected mode.
+		 */
+		ictx->ops->invalidate_cache(wreq);
+	}
+
+	if (wreq->cleanup)
+		wreq->cleanup(wreq);
+
+	if (wreq->origin == NETFS_DIO_WRITE &&
+	    wreq->mapping->nrpages) {
+		/* mmap may have got underfoot and we may now have folios
+		 * locally covering the region we just wrote.  Attempt to
+		 * discard the folios, but leave in place any that were modified locally.
+		 * ->write_iter() is prevented from interfering by the DIO
+		 * counter.
+		 */
+		pgoff_t first = wreq->start >> PAGE_SHIFT;
+		pgoff_t last = (wreq->start + wreq->transferred - 1) >> PAGE_SHIFT;
+		invalidate_inode_pages2_range(wreq->mapping, first, last);
+	}
+
+	if (wreq->origin == NETFS_DIO_WRITE)
+		inode_dio_end(wreq->inode);
+
+	_debug("finished");
+	trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip);
+	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &wreq->flags);
+	wake_up_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS);
+
+	if (wreq->iocb) {
+		wreq->iocb->ki_pos += wreq->transferred;
+		if (wreq->iocb->ki_complete)
+			wreq->iocb->ki_complete(
+				wreq->iocb, wreq->error ? wreq->error : wreq->transferred);
+		wreq->iocb = VFS_PTR_POISON;
+	}
+
+	netfs_clear_subrequests(wreq, false);
+	netfs_put_request(wreq, false, netfs_rreq_trace_put_work_complete);
+}
+
+/*
+ * Wake the collection work item.
+ */
+void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async)
+{
+	if (!work_pending(&wreq->work)) {
+		netfs_get_request(wreq, netfs_rreq_trace_get_work);
+		if (!queue_work(system_unbound_wq, &wreq->work))
+			netfs_put_request(wreq, was_async, netfs_rreq_trace_put_work_nq);
+	}
+}
+
+/**
+ * new_netfs_write_subrequest_terminated - Note the termination of a write operation.
+ * @_op: The I/O subrequest that has terminated.
+ * @transferred_or_error: The amount of data transferred or an error code.
+ * @was_async: The termination was asynchronous
+ *
+ * This tells the library that a contributory write I/O operation has
+ * terminated, one way or another, and that it should collect the results.
+ *
+ * The caller indicates in @transferred_or_error the outcome of the operation,
+ * supplying a positive value to indicate the number of bytes transferred or a
+ * negative error code.  The library will look after reissuing I/O operations
+ * as appropriate and writing downloaded data to the cache.
+ *
+ * If @was_async is true, the caller might be running in softirq or interrupt
+ * context and we can't sleep.
+ *
+ * When this is called, ownership of the subrequest is transferred back to the
+ * library, along with a ref.
+ *
+ * Note that %_op is a void* so that the function can be passed to
+ * kiocb::term_func without the need for a casting wrapper.
+ */
+void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
+					   bool was_async)
+{
+	struct netfs_io_subrequest *subreq = _op;
+	struct netfs_io_request *wreq = subreq->rreq;
+	struct netfs_io_stream *stream = &wreq->io_streams[subreq->stream_nr];
+
+	_enter("%x[%x] %zd", wreq->debug_id, subreq->debug_index, transferred_or_error);
+
+	switch (subreq->source) {
+	case NETFS_UPLOAD_TO_SERVER:
+		netfs_stat(&netfs_n_wh_upload_done);
+		break;
+	case NETFS_WRITE_TO_CACHE:
+		netfs_stat(&netfs_n_wh_write_done);
+		break;
+	case NETFS_INVALID_WRITE:
+		break;
+	default:
+		BUG();
+	}
+
+	if (IS_ERR_VALUE(transferred_or_error)) {
+		subreq->error = transferred_or_error;
+		if (subreq->error == -EAGAIN)
+			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+		else
+			set_bit(NETFS_SREQ_FAILED, &subreq->flags);
+		trace_netfs_failure(wreq, subreq, transferred_or_error, netfs_fail_write);
+
+		switch (subreq->source) {
+		case NETFS_WRITE_TO_CACHE:
+			netfs_stat(&netfs_n_wh_write_failed);
+			break;
+		case NETFS_UPLOAD_TO_SERVER:
+			netfs_stat(&netfs_n_wh_upload_failed);
+			break;
+		default:
+			break;
+		}
+		trace_netfs_rreq(wreq, netfs_rreq_trace_set_pause);
+		set_bit(NETFS_RREQ_PAUSE, &wreq->flags);
+	} else {
+		if (WARN(transferred_or_error > subreq->len - subreq->transferred,
+			 "Subreq excess write: R=%x[%x] %zd > %zu - %zu",
+			 wreq->debug_id, subreq->debug_index,
+			 transferred_or_error, subreq->len, subreq->transferred))
+			transferred_or_error = subreq->len - subreq->transferred;
+
+		subreq->error = 0;
+		subreq->transferred += transferred_or_error;
+
+		if (subreq->transferred < subreq->len)
+			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+	}
+
+	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
+
+	clear_bit_unlock(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+	wake_up_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS);
+
+	/* If we are at the head of the queue, wake up the collector,
+	 * transferring a ref to it if we were the ones to do so.
+	 */
+	if (list_is_first(&subreq->rreq_link, &stream->subrequests))
+		netfs_wake_write_collector(wreq, was_async);
+
+	netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
+}
+EXPORT_SYMBOL(new_netfs_write_subrequest_terminated);
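
As an illustration of the contract above, a filesystem's ->issue_write()
handler might hand the subrequest's iterator to its transport and report the
outcome back through this function.  This is only a sketch; the myfs_*()
names are hypothetical and stand in for whatever async send primitive the
filesystem actually provides:

	/* Completion callback invoked by the (assumed) transport layer. */
	static void myfs_wb_done(void *priv, ssize_t transferred_or_error,
				 bool was_async)
	{
		/* Pass the byte count or negative error straight back to
		 * netfslib, which will collect, retry or fail as needed.
		 */
		new_netfs_write_subrequest_terminated(priv, transferred_or_error,
						      was_async);
	}

	static void myfs_issue_write(struct netfs_io_subrequest *subreq)
	{
		/* Send subreq->len bytes from subreq->io_iter, starting at
		 * subreq->start, and arrange for myfs_wb_done() to be called
		 * on completion with subreq as the private pointer.
		 */
		myfs_send_async(subreq->rreq->inode, &subreq->io_iter,
				subreq->start, subreq->len,
				myfs_wb_done, subreq);
	}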
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
new file mode 100644
index 000000000000..e0fb472898f5
--- /dev/null
+++ b/fs/netfs/write_issue.c
@@ -0,0 +1,673 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/* Network filesystem high-level (buffered) writeback.
+ *
+ * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ *
+ * To support network filesystems with local caching, we manage a situation
+ * that can be envisioned like the following:
+ *
+ *               +---+---+-----+-----+---+----------+
+ *    Folios:    |   |   |     |     |   |          |
+ *               +---+---+-----+-----+---+----------+
+ *
+ *                 +------+------+     +----+----+
+ *    Upload:      |      |      |.....|    |    |
+ *  (Stream 0)     +------+------+     +----+----+
+ *
+ *               +------+------+------+------+------+
+ *    Cache:     |      |      |      |      |      |
+ *  (Stream 1)   +------+------+------+------+------+
+ *
+ * Where we have a sequence of folios of varying sizes that we need to overlay
+ * with multiple parallel streams of I/O requests, where the I/O requests in a
+ * stream may also be of various sizes (in cifs, for example, the sizes are
+ * negotiated with the server; in something like ceph, they may represent the
+ * sizes of storage objects).
+ *
+ * The sequence in each stream may contain gaps and noncontiguous subrequests
+ * may be glued together into single vectored write RPCs.
+ */
+
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include "internal.h"
+
+/*
+ * Kill all dirty folios in the event of an unrecoverable error, starting with
+ * a locked folio we've already obtained from writeback_iter().
+ */
+static void netfs_kill_dirty_pages(struct address_space *mapping,
+				   struct writeback_control *wbc,
+				   struct folio *folio)
+{
+	int error = 0;
+
+	do {
+		enum netfs_folio_trace why = netfs_folio_trace_kill;
+		struct netfs_group *group = NULL;
+		struct netfs_folio *finfo = NULL;
+		void *priv;
+
+		priv = folio_detach_private(folio);
+		if (priv) {
+			finfo = __netfs_folio_info(priv);
+			if (finfo) {
+				/* Kill folio from streaming write. */
+				group = finfo->netfs_group;
+				why = netfs_folio_trace_kill_s;
+			} else {
+				group = priv;
+				if (group == NETFS_FOLIO_COPY_TO_CACHE) {
+					/* Kill copy-to-cache folio */
+					why = netfs_folio_trace_kill_cc;
+					group = NULL;
+				} else {
+					/* Kill folio with group */
+					why = netfs_folio_trace_kill_g;
+				}
+			}
+		}
+
+		trace_netfs_folio(folio, why);
+
+		folio_start_writeback(folio);
+		folio_unlock(folio);
+		folio_end_writeback(folio);
+
+		netfs_put_group(group);
+		kfree(finfo);
+
+	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
+}
+
+/*
+ * Create a write request and set it up appropriately for the origin type.
+ */
+struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
+						struct file *file,
+						loff_t start,
+						enum netfs_io_origin origin)
+{
+	struct netfs_io_request *wreq;
+	struct netfs_inode *ictx;
+
+	wreq = netfs_alloc_request(mapping, file, start, 0, origin);
+	if (IS_ERR(wreq))
+		return wreq;
+
+	_enter("R=%x", wreq->debug_id);
+
+	ictx = netfs_inode(wreq->inode);
+	if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
+		fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
+
+	wreq->contiguity = wreq->start;
+	wreq->cleaned_to = wreq->start;
+	INIT_WORK(&wreq->work, netfs_write_collection_worker);
+
+	wreq->io_streams[0].stream_nr		= 0;
+	wreq->io_streams[0].source		= NETFS_UPLOAD_TO_SERVER;
+	wreq->io_streams[0].prepare_write	= ictx->ops->prepare_write;
+	wreq->io_streams[0].issue_write		= ictx->ops->issue_write;
+	wreq->io_streams[0].collected_to	= start;
+	wreq->io_streams[0].transferred		= LONG_MAX;
+
+	wreq->io_streams[1].stream_nr		= 1;
+	wreq->io_streams[1].source		= NETFS_WRITE_TO_CACHE;
+	wreq->io_streams[1].collected_to	= start;
+	wreq->io_streams[1].transferred		= LONG_MAX;
+	if (fscache_resources_valid(&wreq->cache_resources)) {
+		wreq->io_streams[1].avail	= true;
+		wreq->io_streams[1].prepare_write = wreq->cache_resources.ops->prepare_write_subreq;
+		wreq->io_streams[1].issue_write = wreq->cache_resources.ops->issue_write;
+	}
+
+	return wreq;
+}
+
+/**
+ * netfs_prepare_write_failed - Note write preparation failed
+ * @subreq: The subrequest to mark
+ *
+ * Mark a subrequest to note that preparation for write failed.
+ */
+void netfs_prepare_write_failed(struct netfs_io_subrequest *subreq)
+{
+	__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
+	trace_netfs_sreq(subreq, netfs_sreq_trace_prep_failed);
+}
+EXPORT_SYMBOL(netfs_prepare_write_failed);
+
+/*
+ * Prepare a write subrequest.  We need to allocate a new subrequest
+ * if we don't have one.
+ */
+static void netfs_prepare_write(struct netfs_io_request *wreq,
+				struct netfs_io_stream *stream,
+				loff_t start)
+{
+	struct netfs_io_subrequest *subreq;
+
+	subreq = netfs_alloc_subrequest(wreq);
+	subreq->source		= stream->source;
+	subreq->start		= start;
+	subreq->max_len		= ULONG_MAX;
+	subreq->max_nr_segs	= INT_MAX;
+	subreq->stream_nr	= stream->stream_nr;
+
+	_enter("R=%x[%x]", wreq->debug_id, subreq->debug_index);
+
+	trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
+			     refcount_read(&subreq->ref),
+			     netfs_sreq_trace_new);
+
+	trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
+
+	switch (stream->source) {
+	case NETFS_UPLOAD_TO_SERVER:
+		netfs_stat(&netfs_n_wh_upload);
+		subreq->max_len = wreq->wsize;
+		break;
+	case NETFS_WRITE_TO_CACHE:
+		netfs_stat(&netfs_n_wh_write);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	if (stream->prepare_write)
+		stream->prepare_write(subreq);
+
+	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+
+	/* We add to the end of the list whilst the collector may be walking
+	 * the list.  The collector only moves forwards along the list and uses
+	 * the lock to remove entries from the front.
+	 */
+	spin_lock(&wreq->lock);
+	list_add_tail(&subreq->rreq_link, &stream->subrequests);
+	if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
+		stream->front = subreq;
+		if (!stream->active) {
+			stream->collected_to = stream->front->start;
+			/* Write list pointers before active flag */
+			smp_store_release(&stream->active, true);
+		}
+	}
+
+	spin_unlock(&wreq->lock);
+
+	stream->construct = subreq;
+}
+
+/*
+ * Set the I/O iterator for the filesystem/cache to use and dispatch the I/O
+ * operation.  The operation may be asynchronous and should call
+ * netfs_write_subrequest_terminated() when complete.
+ */
+static void netfs_do_issue_write(struct netfs_io_stream *stream,
+				 struct netfs_io_subrequest *subreq)
+{
+	struct netfs_io_request *wreq = subreq->rreq;
+
+	_enter("R=%x[%x],%zx", wreq->debug_id, subreq->debug_index, subreq->len);
+
+	if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
+		return netfs_write_subrequest_terminated(subreq, subreq->error, false);
+
+	// TODO: Use encrypted buffer
+	if (test_bit(NETFS_RREQ_USE_IO_ITER, &wreq->flags)) {
+		subreq->io_iter = wreq->io_iter;
+		iov_iter_advance(&subreq->io_iter,
+				 subreq->start + subreq->transferred - wreq->start);
+		iov_iter_truncate(&subreq->io_iter,
+				 subreq->len - subreq->transferred);
+	} else {
+		iov_iter_xarray(&subreq->io_iter, ITER_SOURCE, &wreq->mapping->i_pages,
+				subreq->start + subreq->transferred,
+				subreq->len   - subreq->transferred);
+	}
+
+	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
+	stream->issue_write(subreq);
+}
+
+void netfs_reissue_write(struct netfs_io_stream *stream,
+			 struct netfs_io_subrequest *subreq)
+{
+	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+	netfs_do_issue_write(stream, subreq);
+}
+
+static void netfs_issue_write(struct netfs_io_request *wreq,
+			      struct netfs_io_stream *stream)
+{
+	struct netfs_io_subrequest *subreq = stream->construct;
+
+	if (!subreq)
+		return;
+	stream->construct = NULL;
+
+	if (subreq->start + subreq->len > wreq->start + wreq->submitted)
+		wreq->len = wreq->submitted = subreq->start + subreq->len - wreq->start;
+	netfs_do_issue_write(stream, subreq);
+}
+
+/*
+ * Add data to the write subrequest, dispatching each as we fill it up or if it
+ * is discontiguous with the previous.  We only fill one part at a time so that
+ * we can avoid overrunning the credits obtained (cifs) and try to parallelise
+ * content-crypto preparation with network writes.
+ */
+int netfs_advance_write(struct netfs_io_request *wreq,
+			struct netfs_io_stream *stream,
+			loff_t start, size_t len, bool to_eof)
+{
+	struct netfs_io_subrequest *subreq = stream->construct;
+	size_t part;
+
+	if (!stream->avail) {
+		_leave("no write");
+		return len;
+	}
+
+	_enter("R=%x[%x]", wreq->debug_id, subreq ? subreq->debug_index : 0);
+
+	if (subreq && start != subreq->start + subreq->len) {
+		netfs_issue_write(wreq, stream);
+		subreq = NULL;
+	}
+
+	if (!stream->construct)
+		netfs_prepare_write(wreq, stream, start);
+	subreq = stream->construct;
+
+	part = min(subreq->max_len - subreq->len, len);
+	_debug("part %zx/%zx %zx/%zx", subreq->len, subreq->max_len, part, len);
+	subreq->len += part;
+	subreq->nr_segs++;
+
+	if (subreq->len >= subreq->max_len ||
+	    subreq->nr_segs >= subreq->max_nr_segs ||
+	    to_eof) {
+		netfs_issue_write(wreq, stream);
+		subreq = NULL;
+	}
+
+	return part;
+}
+
+/*
+ * Write some of a pending folio data back to the server.
+ */
+static int netfs_write_folio(struct netfs_io_request *wreq,
+			     struct writeback_control *wbc,
+			     struct folio *folio)
+{
+	struct netfs_io_stream *upload = &wreq->io_streams[0];
+	struct netfs_io_stream *cache  = &wreq->io_streams[1];
+	struct netfs_io_stream *stream;
+	struct netfs_group *fgroup; /* TODO: Use this with ceph */
+	struct netfs_folio *finfo;
+	size_t fsize = folio_size(folio), flen = fsize, foff = 0;
+	loff_t fpos = folio_pos(folio);
+	bool to_eof = false, streamw = false;
+	bool debug = false;
+
+	_enter("");
+
+	if (fpos >= wreq->i_size) {
+		/* mmap beyond eof. */
+		_debug("beyond eof");
+		folio_start_writeback(folio);
+		folio_unlock(folio);
+		wreq->nr_group_rel += netfs_folio_written_back(folio);
+		netfs_put_group_many(wreq->group, wreq->nr_group_rel);
+		wreq->nr_group_rel = 0;
+		return 0;
+	}
+
+	fgroup = netfs_folio_group(folio);
+	finfo = netfs_folio_info(folio);
+	if (finfo) {
+		foff = finfo->dirty_offset;
+		flen = foff + finfo->dirty_len;
+		streamw = true;
+	}
+
+	if (wreq->origin == NETFS_WRITETHROUGH) {
+		to_eof = false;
+		if (flen > wreq->i_size - fpos)
+			flen = wreq->i_size - fpos;
+	} else if (flen > wreq->i_size - fpos) {
+		flen = wreq->i_size - fpos;
+		if (!streamw)
+			folio_zero_segment(folio, flen, fsize);
+		to_eof = true;
+	} else if (flen == wreq->i_size - fpos) {
+		to_eof = true;
+	}
+	flen -= foff;
+
+	_debug("folio %zx %zx %zx", foff, flen, fsize);
+
+	/* Deal with discontinuities in the stream of dirty pages.  These can
+	 * arise from a number of sources:
+	 *
+	 * (1) Intervening non-dirty pages from random-access writes, multiple
+	 *     flushers writing back different parts simultaneously and manual
+	 *     syncing.
+	 *
+	 * (2) Partially-written pages from write-streaming.
+	 *
+	 * (3) Pages that belong to a different write-back group (eg.  Ceph
+	 *     snapshots).
+	 *
+	 * (4) Actually-clean pages that were marked for write to the cache
+	 *     when they were read.  Note that these appear as a special
+	 *     write-back group.
+	 */
+	if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
+		netfs_issue_write(wreq, upload);
+	} else if (fgroup != wreq->group) {
+		/* We can't write this page to the server yet. */
+		kdebug("wrong group");
+		folio_redirty_for_writepage(wbc, folio);
+		folio_unlock(folio);
+		netfs_issue_write(wreq, upload);
+		netfs_issue_write(wreq, cache);
+		return 0;
+	}
+
+	if (foff > 0)
+		netfs_issue_write(wreq, upload);
+	if (streamw)
+		netfs_issue_write(wreq, cache);
+
+	/* Flip the page to the writeback state and unlock.  If we're called
+	 * from write-through, then the page has already been put into the wb
+	 * state.
+	 */
+	if (wreq->origin == NETFS_WRITEBACK)
+		folio_start_writeback(folio);
+	folio_unlock(folio);
+
+	if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
+		if (!fscache_resources_valid(&wreq->cache_resources)) {
+			trace_netfs_folio(folio, netfs_folio_trace_cancel_copy);
+			netfs_issue_write(wreq, upload);
+			netfs_folio_written_back(folio);
+			return 0;
+		}
+		trace_netfs_folio(folio, netfs_folio_trace_store_copy);
+	} else if (!upload->construct) {
+		trace_netfs_folio(folio, netfs_folio_trace_store);
+	} else {
+		trace_netfs_folio(folio, netfs_folio_trace_store_plus);
+	}
+
+	/* Move the submission point forward to allow for write-streaming data
+	 * not starting at the front of the page.  We don't do write-streaming
+	 * with the cache as the cache requires DIO alignment.
+	 *
+	 * Also skip uploading for data that's been read and just needs copying
+	 * to the cache.
+	 */
+	for (int s = 0; s < NR_IO_STREAMS; s++) {
+		stream = &wreq->io_streams[s];
+		stream->submit_max_len = fsize;
+		stream->submit_off = foff;
+		stream->submit_len = flen;
+		if ((stream->source == NETFS_WRITE_TO_CACHE && streamw) ||
+		    (stream->source == NETFS_UPLOAD_TO_SERVER &&
+		     fgroup == NETFS_FOLIO_COPY_TO_CACHE)) {
+			stream->submit_off = UINT_MAX;
+			stream->submit_len = 0;
+			stream->submit_max_len = 0;
+		}
+	}
+
+	/* Attach the folio to one or more subrequests.  For a big folio, we
+	 * could end up with thousands of subrequests if the wsize is small -
+	 * but we might need to wait during the creation of subrequests for
+	 * network resources (eg. SMB credits).
+	 */
+	for (;;) {
+		ssize_t part;
+		size_t lowest_off = ULONG_MAX;
+		int choose_s = -1;
+
+		/* Always add to the lowest-submitted stream first. */
+		for (int s = 0; s < NR_IO_STREAMS; s++) {
+			stream = &wreq->io_streams[s];
+			if (stream->submit_len > 0 &&
+			    stream->submit_off < lowest_off) {
+				lowest_off = stream->submit_off;
+				choose_s = s;
+			}
+		}
+
+		if (choose_s < 0)
+			break;
+		stream = &wreq->io_streams[choose_s];
+
+		part = netfs_advance_write(wreq, stream, fpos + stream->submit_off,
+					   stream->submit_len, to_eof);
+		atomic64_set(&wreq->issued_to, fpos + stream->submit_off);
+		stream->submit_off += part;
+		stream->submit_max_len -= part;
+		if (part > stream->submit_len)
+			stream->submit_len = 0;
+		else
+			stream->submit_len -= part;
+		if (part > 0)
+			debug = true;
+	}
+
+	atomic64_set(&wreq->issued_to, fpos + fsize);
+
+	if (!debug)
+		kdebug("R=%x: No submit", wreq->debug_id);
+
+	if (flen < fsize)
+		for (int s = 0; s < NR_IO_STREAMS; s++)
+			netfs_issue_write(wreq, &wreq->io_streams[s]);
+
+	_leave(" = 0");
+	return 0;
+}
+
+/*
+ * Write some of the pending data back to the server
+ */
+int new_netfs_writepages(struct address_space *mapping,
+			 struct writeback_control *wbc)
+{
+	struct netfs_inode *ictx = netfs_inode(mapping->host);
+	struct netfs_io_request *wreq = NULL;
+	struct folio *folio;
+	int error = 0;
+
+	if (wbc->sync_mode == WB_SYNC_ALL)
+		mutex_lock(&ictx->wb_lock);
+	else if (!mutex_trylock(&ictx->wb_lock))
+		return 0;
+
+	/* Need the first folio to be able to set up the op. */
+	folio = writeback_iter(mapping, wbc, NULL, &error);
+	if (!folio)
+		goto out;
+
+	wreq = netfs_create_write_req(mapping, NULL, folio_pos(folio), NETFS_WRITEBACK);
+	if (IS_ERR(wreq)) {
+		error = PTR_ERR(wreq);
+		goto couldnt_start;
+	}
+
+	trace_netfs_write(wreq, netfs_write_trace_writeback);
+	netfs_stat(&netfs_n_wh_writepages);
+
+	do {
+		_debug("wbiter %lx %llx", folio->index, wreq->start + wreq->submitted);
+
+		/* It appears we don't have to handle cyclic writeback wrapping. */
+		WARN_ON_ONCE(wreq && folio_pos(folio) < wreq->start + wreq->submitted);
+
+		if (netfs_folio_group(folio) != NETFS_FOLIO_COPY_TO_CACHE &&
+		    unlikely(!test_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags))) {
+			set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
+			wreq->netfs_ops->begin_writeback(wreq);
+		}
+
+		error = netfs_write_folio(wreq, wbc, folio);
+		if (error < 0)
+			break;
+	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
+
+	for (int s = 0; s < NR_IO_STREAMS; s++)
+		netfs_issue_write(wreq, &wreq->io_streams[s]);
+	smp_wmb(); /* Write lists before ALL_QUEUED. */
+	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
+
+	mutex_unlock(&ictx->wb_lock);
+
+	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+	_leave(" = %d", error);
+	return error;
+
+couldnt_start:
+	netfs_kill_dirty_pages(mapping, wbc, folio);
+out:
+	mutex_unlock(&ictx->wb_lock);
+	_leave(" = %d", error);
+	return error;
+}
+EXPORT_SYMBOL(new_netfs_writepages);
+
+/*
+ * Begin a write operation for writing through the pagecache.
+ */
+struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
+{
+	struct netfs_io_request *wreq = NULL;
+	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
+
+	mutex_lock(&ictx->wb_lock);
+
+	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
+				      iocb->ki_pos, NETFS_WRITETHROUGH);
+	if (IS_ERR(wreq)) {
+		mutex_unlock(&ictx->wb_lock);
+		return wreq;
+	}
+
+	wreq->io_streams[0].avail = true;
+	trace_netfs_write(wreq, netfs_write_trace_writethrough);
+	return wreq;
+}
+
+/*
+ * Advance the state of the write operation used when writing through the
+ * pagecache.  Data has been copied into the pagecache that we need to append
+ * to the request.  If we've added more than wsize then we need to create a new
+ * subrequest.
+ */
+int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+				   struct folio *folio, size_t copied, bool to_page_end,
+				   struct folio **writethrough_cache)
+{
+	_enter("R=%x ic=%zu ws=%u cp=%zu tp=%u",
+	       wreq->debug_id, wreq->iter.count, wreq->wsize, copied, to_page_end);
+
+	if (!*writethrough_cache) {
+		if (folio_test_dirty(folio))
+			/* Sigh.  mmap. */
+			folio_clear_dirty_for_io(folio);
+
+		/* We can make multiple writes to the folio... */
+		folio_start_writeback(folio);
+		if (wreq->len == 0)
+			trace_netfs_folio(folio, netfs_folio_trace_wthru);
+		else
+			trace_netfs_folio(folio, netfs_folio_trace_wthru_plus);
+		*writethrough_cache = folio;
+	}
+
+	wreq->len += copied;
+	if (!to_page_end)
+		return 0;
+
+	*writethrough_cache = NULL;
+	return netfs_write_folio(wreq, wbc, folio);
+}
+
+/*
+ * End a write operation used when writing through the pagecache.
+ */
+int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+			       struct folio *writethrough_cache)
+{
+	struct netfs_inode *ictx = netfs_inode(wreq->inode);
+	int ret;
+
+	_enter("R=%x", wreq->debug_id);
+
+	if (writethrough_cache)
+		netfs_write_folio(wreq, wbc, writethrough_cache);
+
+	netfs_issue_write(wreq, &wreq->io_streams[0]);
+	netfs_issue_write(wreq, &wreq->io_streams[1]);
+	smp_wmb(); /* Write lists before ALL_QUEUED. */
+	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
+
+	mutex_unlock(&ictx->wb_lock);
+
+	ret = wreq->error;
+	netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+	return ret;
+}
+
+/*
+ * Write data to the server without going through the pagecache and without
+ * writing it to the local cache.
+ */
+int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len)
+{
+	struct netfs_io_stream *upload = &wreq->io_streams[0];
+	ssize_t part;
+	loff_t start = wreq->start;
+	int error = 0;
+
+	_enter("%zx", len);
+
+	if (wreq->origin == NETFS_DIO_WRITE)
+		inode_dio_begin(wreq->inode);
+
+	while (len) {
+		// TODO: Prepare content encryption
+
+		_debug("unbuffered %zx", len);
+		part = netfs_advance_write(wreq, upload, start, len, false);
+		start += part;
+		len -= part;
+		if (test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
+			trace_netfs_rreq(wreq, netfs_rreq_trace_wait_pause);
+			wait_on_bit(&wreq->flags, NETFS_RREQ_PAUSE, TASK_UNINTERRUPTIBLE);
+		}
+		if (test_bit(NETFS_RREQ_FAILED, &wreq->flags))
+			break;
+	}
+
+	netfs_issue_write(wreq, upload);
+
+	smp_wmb(); /* Write lists before ALL_QUEUED. */
+	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
+	if (list_empty(&upload->subrequests))
+		netfs_wake_write_collector(wreq, false);
+
+	_leave(" = %d", error);
+	return error;
+}
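
To tie the pieces in this file together, a filesystem converting to the new
scheme would point ->writepages() at new_netfs_writepages() and supply the
begin_writeback/prepare_write/issue_write hooks declared below in netfs.h.  A
rough sketch, with all myfs_*() names hypothetical:

	static const struct netfs_request_ops myfs_req_ops = {
		/* ... read-side hooks elided ... */
		.begin_writeback	= myfs_begin_writeback,
		.prepare_write		= myfs_prepare_write,
		.issue_write		= myfs_issue_write,
		.invalidate_cache	= myfs_invalidate_cache,
	};

	static const struct address_space_operations myfs_aops = {
		.writepages		= new_netfs_writepages,
		.dirty_folio		= netfs_dirty_folio,
		/* ... other aops elided ... */
	};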
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 88269681d4fc..42dba05a428b 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -64,6 +64,7 @@  struct netfs_inode {
 #if IS_ENABLED(CONFIG_FSCACHE)
 	struct fscache_cookie	*cache;
 #endif
+	struct mutex		wb_lock;	/* Writeback serialisation */
 	loff_t			remote_i_size;	/* Size of the remote file */
 	loff_t			zero_point;	/* Size after which we assume there's no data
 						 * on the server */
@@ -71,7 +72,6 @@  struct netfs_inode {
 #define NETFS_ICTX_ODIRECT	0		/* The file has DIO in progress */
 #define NETFS_ICTX_UNBUFFERED	1		/* I/O should not use the pagecache */
 #define NETFS_ICTX_WRITETHROUGH	2		/* Write-through caching */
-#define NETFS_ICTX_NO_WRITE_STREAMING	3	/* Don't engage in write-streaming */
 #define NETFS_ICTX_USE_PGPRIV2	31		/* [DEPRECATED] Use PG_private_2 to mark
 						 * write to cache on read */
 };
@@ -126,6 +126,33 @@  static inline struct netfs_group *netfs_folio_group(struct folio *folio)
 	return priv;
 }
 
+/*
+ * Stream of I/O subrequests going to a particular destination, such as the
+ * server or the local cache.  This is mainly intended for writing where we may
+ * have to write to multiple destinations concurrently.
+ */
+struct netfs_io_stream {
+	/* Submission tracking */
+	struct netfs_io_subrequest *construct;	/* Op being constructed */
+	unsigned int		submit_off;	/* Folio offset we're submitting from */
+	unsigned int		submit_len;	/* Amount of data left to submit */
+	unsigned int		submit_max_len;	/* Amount I/O can be rounded up to */
+	void (*prepare_write)(struct netfs_io_subrequest *subreq);
+	void (*issue_write)(struct netfs_io_subrequest *subreq);
+	/* Collection tracking */
+	struct list_head	subrequests;	/* Contributory I/O operations */
+	struct netfs_io_subrequest *front;	/* Op being collected */
+	unsigned long long	collected_to;	/* Position we've collected results to */
+	size_t			transferred;	/* The amount transferred from this stream */
+	enum netfs_io_source	source;		/* Where to read from/write to */
+	unsigned short		error;		/* Aggregate error for the stream */
+	unsigned char		stream_nr;	/* Index of stream in parent table */
+	bool			avail;		/* T if stream is available */
+	bool			active;		/* T if stream is active */
+	bool			need_retry;	/* T if this stream needs retrying */
+	bool			failed;		/* T if this stream failed */
+};
+
 /*
  * Resources required to do operations on a cache.
  */
@@ -150,13 +177,16 @@  struct netfs_io_subrequest {
 	struct list_head	rreq_link;	/* Link in rreq->subrequests */
 	struct iov_iter		io_iter;	/* Iterator for this subrequest */
 	unsigned long long	start;		/* Where to start the I/O */
+	size_t			max_len;	/* Maximum size of the I/O */
 	size_t			len;		/* Size of the I/O */
 	size_t			transferred;	/* Amount of data transferred */
 	refcount_t		ref;
 	short			error;		/* 0 or error that occurred */
 	unsigned short		debug_index;	/* Index in list (for debugging output) */
+	unsigned int		nr_segs;	/* Number of segs in io_iter */
 	unsigned int		max_nr_segs;	/* 0 or max number of segments in an iterator */
 	enum netfs_io_source	source;		/* Where to read from/write to */
+	unsigned char		stream_nr;	/* I/O stream this belongs to */
 	unsigned long		flags;
 #define NETFS_SREQ_COPY_TO_CACHE	0	/* Set if should copy the data to the cache */
 #define NETFS_SREQ_CLEAR_TAIL		1	/* Set if the rest of the read should be cleared */
@@ -164,6 +194,11 @@  struct netfs_io_subrequest {
 #define NETFS_SREQ_SEEK_DATA_READ	3	/* Set if ->read() should SEEK_DATA first */
 #define NETFS_SREQ_NO_PROGRESS		4	/* Set if we didn't manage to read any data */
 #define NETFS_SREQ_ONDEMAND		5	/* Set if it's from on-demand read mode */
+#define NETFS_SREQ_BOUNDARY		6	/* Set if ends on hard boundary (eg. ceph object) */
+#define NETFS_SREQ_IN_PROGRESS		8	/* Unlocked when the subrequest completes */
+#define NETFS_SREQ_NEED_RETRY		9	/* Set if the filesystem requests a retry */
+#define NETFS_SREQ_RETRYING		10	/* Set if we're retrying */
+#define NETFS_SREQ_FAILED		11	/* Set if the subreq failed unretryably */
 };
 
 enum netfs_io_origin {
@@ -194,6 +229,9 @@  struct netfs_io_request {
 	struct netfs_cache_resources cache_resources;
 	struct list_head	proc_link;	/* Link in netfs_iorequests */
 	struct list_head	subrequests;	/* Contributory I/O operations */
+	struct netfs_io_stream	io_streams[2];	/* Streams of parallel I/O operations */
+#define NR_IO_STREAMS 2 //wreq->nr_io_streams
+	struct netfs_group	*group;		/* Writeback group being written back */
 	struct iov_iter		iter;		/* Unencrypted-side iterator */
 	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
 	void			*netfs_priv;	/* Private data for the netfs */
@@ -203,6 +241,8 @@  struct netfs_io_request {
 	unsigned int		rsize;		/* Maximum read size (0 for none) */
 	unsigned int		wsize;		/* Maximum write size (0 for none) */
 	atomic_t		subreq_counter;	/* Next subreq->debug_index */
+	unsigned int		nr_group_rel;	/* Number of refs to release on ->group */
+	spinlock_t		lock;		/* Lock for queuing subreqs */
 	atomic_t		nr_outstanding;	/* Number of ops in progress */
 	atomic_t		nr_copy_ops;	/* Number of copy-to-cache ops in progress */
 	size_t			upper_len;	/* Length can be extended to here */
@@ -214,6 +254,10 @@  struct netfs_io_request {
 	bool			direct_bv_unpin; /* T if direct_bv[] must be unpinned */
 	unsigned long long	i_size;		/* Size of the file */
 	unsigned long long	start;		/* Start position */
+	atomic64_t		issued_to;	/* Write issuer folio cursor */
+	unsigned long long	contiguity;	/* Tracking for gaps in the writeback sequence */
+	unsigned long long	collected_to;	/* Point we've collected to */
+	unsigned long long	cleaned_to;	/* Position we've cleaned folios to */
 	pgoff_t			no_unlock_folio; /* Don't unlock this folio after read */
 	refcount_t		ref;
 	unsigned long		flags;
@@ -227,6 +271,9 @@  struct netfs_io_request {
 #define NETFS_RREQ_UPLOAD_TO_SERVER	8	/* Need to write to the server */
 #define NETFS_RREQ_NONBLOCK		9	/* Don't block if possible (O_NONBLOCK) */
 #define NETFS_RREQ_BLOCKED		10	/* We blocked */
+#define NETFS_RREQ_PAUSE		11	/* Pause subrequest generation */
+#define NETFS_RREQ_USE_IO_ITER		12	/* Use ->io_iter rather than ->i_pages */
+#define NETFS_RREQ_ALL_QUEUED		13	/* All subreqs are now queued */
 #define NETFS_RREQ_USE_PGPRIV2		31	/* [DEPRECATED] Use PG_private_2 to mark
 						 * write to cache on read */
 	const struct netfs_request_ops *netfs_ops;
@@ -258,6 +305,9 @@  struct netfs_request_ops {
 	/* Write request handling */
 	void (*create_write_requests)(struct netfs_io_request *wreq,
 				      loff_t start, size_t len);
+	void (*begin_writeback)(struct netfs_io_request *wreq);
+	void (*prepare_write)(struct netfs_io_subrequest *subreq);
+	void (*issue_write)(struct netfs_io_subrequest *subreq);
 	void (*invalidate_cache)(struct netfs_io_request *wreq);
 };
 
@@ -292,6 +342,9 @@  struct netfs_cache_ops {
 		     netfs_io_terminated_t term_func,
 		     void *term_func_priv);
 
+	/* Write data to the cache from a netfs subrequest. */
+	void (*issue_write)(struct netfs_io_subrequest *subreq);
+
 	/* Expand readahead request */
 	void (*expand_readahead)(struct netfs_cache_resources *cres,
 				 unsigned long long *_start,
@@ -304,6 +357,13 @@  struct netfs_cache_ops {
 	enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
 					     unsigned long long i_size);
 
+	/* Prepare a write subrequest, working out if we're allowed to do it
+	 * and finding out the maximum amount of data to gather before
+	 * attempting to submit.  If we're not permitted to do it, the
+	 * subrequest should be marked failed.
+	 */
+	void (*prepare_write_subreq)(struct netfs_io_subrequest *subreq);
+
 	/* Prepare a write operation, working out what part of the write we can
 	 * actually do.
 	 */
@@ -349,6 +409,8 @@  int netfs_write_begin(struct netfs_inode *, struct file *,
 		      struct folio **, void **fsdata);
 int netfs_writepages(struct address_space *mapping,
 		     struct writeback_control *wbc);
+int new_netfs_writepages(struct address_space *mapping,
+			struct writeback_control *wbc);
 bool netfs_dirty_folio(struct address_space *mapping, struct folio *folio);
 int netfs_unpin_writeback(struct inode *inode, struct writeback_control *wbc);
 void netfs_clear_inode_writeback(struct inode *inode, const void *aux);
@@ -372,8 +434,11 @@  size_t netfs_limit_iter(const struct iov_iter *iter, size_t start_offset,
 struct netfs_io_subrequest *netfs_create_write_request(
 	struct netfs_io_request *wreq, enum netfs_io_source dest,
 	loff_t start, size_t len, work_func_t worker);
+void netfs_prepare_write_failed(struct netfs_io_subrequest *subreq);
 void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
 				       bool was_async);
+void new_netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
+					   bool was_async);
 void netfs_queue_write_request(struct netfs_io_subrequest *subreq);
 
 int netfs_start_io_read(struct inode *inode);
@@ -415,6 +480,7 @@  static inline void netfs_inode_init(struct netfs_inode *ctx,
 #if IS_ENABLED(CONFIG_FSCACHE)
 	ctx->cache = NULL;
 #endif
+	mutex_init(&ctx->wb_lock);
 	/* ->releasepage() drives zero_point */
 	if (use_zero_point) {
 		ctx->zero_point = ctx->remote_i_size;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 7126d2ea459c..e7700172ae7e 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -44,14 +44,18 @@ 
 #define netfs_rreq_traces					\
 	EM(netfs_rreq_trace_assess,		"ASSESS ")	\
 	EM(netfs_rreq_trace_copy,		"COPY   ")	\
+	EM(netfs_rreq_trace_collect,		"COLLECT")	\
 	EM(netfs_rreq_trace_done,		"DONE   ")	\
 	EM(netfs_rreq_trace_free,		"FREE   ")	\
 	EM(netfs_rreq_trace_redirty,		"REDIRTY")	\
 	EM(netfs_rreq_trace_resubmit,		"RESUBMT")	\
+	EM(netfs_rreq_trace_set_pause,		"PAUSE  ")	\
 	EM(netfs_rreq_trace_unlock,		"UNLOCK ")	\
 	EM(netfs_rreq_trace_unmark,		"UNMARK ")	\
 	EM(netfs_rreq_trace_wait_ip,		"WAIT-IP")	\
+	EM(netfs_rreq_trace_wait_pause,		"WT-PAUS")	\
 	EM(netfs_rreq_trace_wake_ip,		"WAKE-IP")	\
+	EM(netfs_rreq_trace_unpause,		"UNPAUSE")	\
 	E_(netfs_rreq_trace_write_done,		"WR-DONE")
 
 #define netfs_sreq_sources					\
@@ -64,11 +68,15 @@ 
 	E_(NETFS_INVALID_WRITE,			"INVL")
 
 #define netfs_sreq_traces					\
+	EM(netfs_sreq_trace_discard,		"DSCRD")	\
 	EM(netfs_sreq_trace_download_instead,	"RDOWN")	\
+	EM(netfs_sreq_trace_fail,		"FAIL ")	\
 	EM(netfs_sreq_trace_free,		"FREE ")	\
 	EM(netfs_sreq_trace_limited,		"LIMIT")	\
 	EM(netfs_sreq_trace_prepare,		"PREP ")	\
+	EM(netfs_sreq_trace_prep_failed,	"PRPFL")	\
 	EM(netfs_sreq_trace_resubmit_short,	"SHORT")	\
+	EM(netfs_sreq_trace_retry,		"RETRY")	\
 	EM(netfs_sreq_trace_submit,		"SUBMT")	\
 	EM(netfs_sreq_trace_terminated,		"TERM ")	\
 	EM(netfs_sreq_trace_write,		"WRITE")	\
@@ -88,6 +96,7 @@ 
 #define netfs_rreq_ref_traces					\
 	EM(netfs_rreq_trace_get_for_outstanding,"GET OUTSTND")	\
 	EM(netfs_rreq_trace_get_subreq,		"GET SUBREQ ")	\
+	EM(netfs_rreq_trace_get_work,		"GET WORK   ")	\
 	EM(netfs_rreq_trace_put_complete,	"PUT COMPLT ")	\
 	EM(netfs_rreq_trace_put_discard,	"PUT DISCARD")	\
 	EM(netfs_rreq_trace_put_failed,		"PUT FAILED ")	\
@@ -95,6 +104,8 @@ 
 	EM(netfs_rreq_trace_put_return,		"PUT RETURN ")	\
 	EM(netfs_rreq_trace_put_subreq,		"PUT SUBREQ ")	\
 	EM(netfs_rreq_trace_put_work,		"PUT WORK   ")	\
+	EM(netfs_rreq_trace_put_work_complete,	"PUT WORK CP")	\
+	EM(netfs_rreq_trace_put_work_nq,	"PUT WORK NQ")	\
 	EM(netfs_rreq_trace_see_work,		"SEE WORK   ")	\
 	E_(netfs_rreq_trace_new,		"NEW        ")
 
@@ -103,11 +114,14 @@ 
 	EM(netfs_sreq_trace_get_resubmit,	"GET RESUBMIT")	\
 	EM(netfs_sreq_trace_get_short_read,	"GET SHORTRD")	\
 	EM(netfs_sreq_trace_new,		"NEW        ")	\
+	EM(netfs_sreq_trace_put_cancel,		"PUT CANCEL ")	\
 	EM(netfs_sreq_trace_put_clear,		"PUT CLEAR  ")	\
 	EM(netfs_sreq_trace_put_discard,	"PUT DISCARD")	\
+	EM(netfs_sreq_trace_put_done,		"PUT DONE   ")	\
 	EM(netfs_sreq_trace_put_failed,		"PUT FAILED ")	\
 	EM(netfs_sreq_trace_put_merged,		"PUT MERGED ")	\
 	EM(netfs_sreq_trace_put_no_copy,	"PUT NO COPY")	\
+	EM(netfs_sreq_trace_put_oom,		"PUT OOM    ")	\
 	EM(netfs_sreq_trace_put_wip,		"PUT WIP    ")	\
 	EM(netfs_sreq_trace_put_work,		"PUT WORK   ")	\
 	E_(netfs_sreq_trace_put_terminated,	"PUT TERM   ")
@@ -124,7 +138,9 @@ 
 	EM(netfs_streaming_filled_page,		"mod-streamw-f") \
 	EM(netfs_streaming_cont_filled_page,	"mod-streamw-f+") \
 	/* The rest are for writeback */			\
+	EM(netfs_folio_trace_cancel_copy,	"cancel-copy")	\
 	EM(netfs_folio_trace_clear,		"clear")	\
+	EM(netfs_folio_trace_clear_cc,		"clear-cc")	\
 	EM(netfs_folio_trace_clear_s,		"clear-s")	\
 	EM(netfs_folio_trace_clear_g,		"clear-g")	\
 	EM(netfs_folio_trace_copy,		"copy")		\
@@ -133,16 +149,26 @@ 
 	EM(netfs_folio_trace_end_copy,		"end-copy")	\
 	EM(netfs_folio_trace_filled_gaps,	"filled-gaps")	\
 	EM(netfs_folio_trace_kill,		"kill")		\
+	EM(netfs_folio_trace_kill_cc,		"kill-cc")	\
+	EM(netfs_folio_trace_kill_g,		"kill-g")	\
+	EM(netfs_folio_trace_kill_s,		"kill-s")	\
 	EM(netfs_folio_trace_mkwrite,		"mkwrite")	\
 	EM(netfs_folio_trace_mkwrite_plus,	"mkwrite+")	\
+	EM(netfs_folio_trace_not_under_wback,	"!wback")	\
 	EM(netfs_folio_trace_read_gaps,		"read-gaps")	\
 	EM(netfs_folio_trace_redirty,		"redirty")	\
 	EM(netfs_folio_trace_redirtied,		"redirtied")	\
 	EM(netfs_folio_trace_store,		"store")	\
+	EM(netfs_folio_trace_store_copy,	"store-copy")	\
 	EM(netfs_folio_trace_store_plus,	"store+")	\
 	EM(netfs_folio_trace_wthru,		"wthru")	\
 	E_(netfs_folio_trace_wthru_plus,	"wthru+")
 
+#define netfs_collect_contig_traces				\
+	EM(netfs_contig_trace_collect,		"Collect")	\
+	EM(netfs_contig_trace_jump,		"-->JUMP-->")	\
+	E_(netfs_contig_trace_unlock,		"Unlock")
+
 #ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
 #define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
 
@@ -159,6 +185,7 @@  enum netfs_failure { netfs_failures } __mode(byte);
 enum netfs_rreq_ref_trace { netfs_rreq_ref_traces } __mode(byte);
 enum netfs_sreq_ref_trace { netfs_sreq_ref_traces } __mode(byte);
 enum netfs_folio_trace { netfs_folio_traces } __mode(byte);
+enum netfs_collect_contig_trace { netfs_collect_contig_traces } __mode(byte);
 
 #endif
 
@@ -180,6 +207,7 @@  netfs_failures;
 netfs_rreq_ref_traces;
 netfs_sreq_ref_traces;
 netfs_folio_traces;
+netfs_collect_contig_traces;
 
 /*
  * Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -413,16 +441,18 @@  TRACE_EVENT(netfs_write_iter,
 		    __field(unsigned long long,		start		)
 		    __field(size_t,			len		)
 		    __field(unsigned int,		flags		)
+		    __field(unsigned int,		ino		)
 			     ),
 
 	    TP_fast_assign(
 		    __entry->start	= iocb->ki_pos;
 		    __entry->len	= iov_iter_count(from);
+		    __entry->ino	= iocb->ki_filp->f_inode->i_ino;
 		    __entry->flags	= iocb->ki_flags;
 			   ),
 
-	    TP_printk("WRITE-ITER s=%llx l=%zx f=%x",
-		      __entry->start, __entry->len, __entry->flags)
+	    TP_printk("WRITE-ITER i=%x s=%llx l=%zx f=%x",
+		      __entry->ino, __entry->start, __entry->len, __entry->flags)
 	    );
 
 TRACE_EVENT(netfs_write,
@@ -434,6 +464,7 @@  TRACE_EVENT(netfs_write,
 	    TP_STRUCT__entry(
 		    __field(unsigned int,		wreq		)
 		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		ino		)
 		    __field(enum netfs_write_trace,	what		)
 		    __field(unsigned long long,		start		)
 		    __field(unsigned long long,		len		)
@@ -444,18 +475,213 @@  TRACE_EVENT(netfs_write,
 		    struct fscache_cookie *__cookie = netfs_i_cookie(__ctx);
 		    __entry->wreq	= wreq->debug_id;
 		    __entry->cookie	= __cookie ? __cookie->debug_id : 0;
+		    __entry->ino	= wreq->inode->i_ino;
 		    __entry->what	= what;
 		    __entry->start	= wreq->start;
 		    __entry->len	= wreq->len;
 			   ),
 
-	    TP_printk("R=%08x %s c=%08x by=%llx-%llx",
+	    TP_printk("R=%08x %s c=%08x i=%x by=%llx-%llx",
 		      __entry->wreq,
 		      __print_symbolic(__entry->what, netfs_write_traces),
 		      __entry->cookie,
+		      __entry->ino,
 		      __entry->start, __entry->start + __entry->len - 1)
 	    );
 
+TRACE_EVENT(netfs_collect,
+	    TP_PROTO(const struct netfs_io_request *wreq),
+
+	    TP_ARGS(wreq),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		wreq		)
+		    __field(unsigned int,		len		)
+		    __field(unsigned long long,		transferred	)
+		    __field(unsigned long long,		start		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->start	= wreq->start;
+		    __entry->len	= wreq->len;
+		    __entry->transferred = wreq->transferred;
+			   ),
+
+	    TP_printk("R=%08x s=%llx-%llx",
+		      __entry->wreq,
+		      __entry->start + __entry->transferred,
+		      __entry->start + __entry->len)
+	    );
+
+TRACE_EVENT(netfs_collect_contig,
+	    TP_PROTO(const struct netfs_io_request *wreq, unsigned long long to,
+		     enum netfs_collect_contig_trace type),
+
+	    TP_ARGS(wreq, to, type),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		wreq)
+		    __field(enum netfs_collect_contig_trace, type)
+		    __field(unsigned long long,		contiguity)
+		    __field(unsigned long long,		to)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->type	= type;
+		    __entry->contiguity	= wreq->contiguity;
+		    __entry->to		= to;
+			   ),
+
+	    TP_printk("R=%08x %llx -> %llx %s",
+		      __entry->wreq,
+		      __entry->contiguity,
+		      __entry->to,
+		      __print_symbolic(__entry->type, netfs_collect_contig_traces))
+	    );
+
+TRACE_EVENT(netfs_collect_sreq,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     const struct netfs_io_subrequest *subreq),
+
+	    TP_ARGS(wreq, subreq),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		wreq		)
+		    __field(unsigned int,		subreq		)
+		    __field(unsigned int,		stream		)
+		    __field(unsigned int,		len		)
+		    __field(unsigned int,		transferred	)
+		    __field(unsigned long long,		start		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->subreq	= subreq->debug_index;
+		    __entry->stream	= subreq->stream_nr;
+		    __entry->start	= subreq->start;
+		    __entry->len	= subreq->len;
+		    __entry->transferred = subreq->transferred;
+			   ),
+
+	    TP_printk("R=%08x[%u:%02x] s=%llx t=%x/%x",
+		      __entry->wreq, __entry->stream, __entry->subreq,
+		      __entry->start, __entry->transferred, __entry->len)
+	    );
+
+TRACE_EVENT(netfs_collect_folio,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     const struct folio *folio,
+		     unsigned long long fend,
+		     unsigned long long collected_to),
+
+	    TP_ARGS(wreq, folio, fend, collected_to),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	wreq		)
+		    __field(unsigned long,	index		)
+		    __field(unsigned long long,	fend		)
+		    __field(unsigned long long,	cleaned_to	)
+		    __field(unsigned long long,	collected_to	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->index	= folio->index;
+		    __entry->fend	= fend;
+		    __entry->cleaned_to	= wreq->cleaned_to;
+		    __entry->collected_to = collected_to;
+			   ),
+
+	    TP_printk("R=%08x ix=%05lx r=%llx-%llx t=%llx/%llx",
+		      __entry->wreq, __entry->index,
+		      (unsigned long long)__entry->index * PAGE_SIZE, __entry->fend,
+		      __entry->cleaned_to, __entry->collected_to)
+	    );
+
+TRACE_EVENT(netfs_collect_state,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     unsigned long long collected_to,
+		     unsigned int notes),
+
+	    TP_ARGS(wreq, collected_to, notes),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	wreq		)
+		    __field(unsigned int,	notes		)
+		    __field(unsigned long long,	collected_to	)
+		    __field(unsigned long long,	cleaned_to	)
+		    __field(unsigned long long,	contiguity	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->notes	= notes;
+		    __entry->collected_to = collected_to;
+		    __entry->cleaned_to	= wreq->cleaned_to;
+		    __entry->contiguity = wreq->contiguity;
+			   ),
+
+	    TP_printk("R=%08x cto=%llx fto=%llx ctg=%llx n=%x",
+		      __entry->wreq, __entry->collected_to,
+		      __entry->cleaned_to, __entry->contiguity,
+		      __entry->notes)
+	    );
+
+TRACE_EVENT(netfs_collect_gap,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     const struct netfs_io_stream *stream,
+		     unsigned long long jump_to, char type),
+
+	    TP_ARGS(wreq, stream, jump_to, type),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	wreq)
+		    __field(unsigned char,	stream)
+		    __field(unsigned char,	type)
+		    __field(unsigned long long,	from)
+		    __field(unsigned long long,	to)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->stream	= stream->stream_nr;
+		    __entry->from	= stream->collected_to;
+		    __entry->to		= jump_to;
+		    __entry->type	= type;
+			   ),
+
+	    TP_printk("R=%08x[%x:] %llx->%llx %c",
+		      __entry->wreq, __entry->stream,
+		      __entry->from, __entry->to, __entry->type)
+	    );
+
+TRACE_EVENT(netfs_collect_stream,
+	    TP_PROTO(const struct netfs_io_request *wreq,
+		     const struct netfs_io_stream *stream),
+
+	    TP_ARGS(wreq, stream),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	wreq)
+		    __field(unsigned char,	stream)
+		    __field(unsigned long long,	collected_to)
+		    __field(unsigned long long,	front)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->wreq	= wreq->debug_id;
+		    __entry->stream	= stream->stream_nr;
+		    __entry->collected_to = stream->collected_to;
+		    __entry->front	= stream->front ? stream->front->start : UINT_MAX;
+			   ),
+
+	    TP_printk("R=%08x[%x:] cto=%llx frn=%llx",
+		      __entry->wreq, __entry->stream,
+		      __entry->collected_to, __entry->front)
+	    );
+
 #undef EM
 #undef E_
 #endif /* _TRACE_NETFS_H */
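
For reference, each TRACE_EVENT() above expands into a trace_<name>() helper taking the
TP_PROTO() arguments, which the write collector is expected to call as it retires
subrequests and folios.  A minimal sketch of how the new collection events might be
emitted follows; the function and variable names are illustrative only, not taken from
the collector code in this series:

	/* Illustrative sketch only: emitting the new collection tracepoints
	 * from a hypothetical collector step.  Each TRACE_EVENT(name, ...)
	 * above generates a trace_name() call taking the TP_PROTO() args.
	 */
	#include <linux/netfs.h>
	#include <linux/pagemap.h>
	#include <trace/events/netfs.h>

	static void example_collect_step(struct netfs_io_request *wreq,
					 struct netfs_io_stream *stream,
					 struct netfs_io_subrequest *subreq,
					 struct folio *folio,
					 unsigned long long collected_to)
	{
		trace_netfs_collect(wreq);		  /* whole-request window */
		trace_netfs_collect_sreq(wreq, subreq);	  /* per-subrequest progress */
		trace_netfs_collect_folio(wreq, folio,	  /* folio being cleaned up */
					  folio_pos(folio) + folio_size(folio),
					  collected_to);
		trace_netfs_collect_stream(wreq, stream); /* per-stream collected_to/front */
	}

Once enabled through tracefs (events/netfs/netfs_collect*/enable), each record in the
trace buffer follows the corresponding TP_printk() format above and is keyed by the
request debug ID (R=%08x), so the per-stream, per-subrequest and per-folio progress of a
single writeback request can be correlated from one trace.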