From patchwork Mon Feb 15 15:44:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088437 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4AFD1C433E0 for ; Mon, 15 Feb 2021 15:44:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id ABC3264E8E for ; Mon, 15 Feb 2021 15:44:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ABC3264E8E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3BBF38D010C; Mon, 15 Feb 2021 10:44:42 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 36B2B8D00FD; Mon, 15 Feb 2021 10:44:42 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 20CAB8D010C; Mon, 15 Feb 2021 10:44:42 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0057.hostedemail.com [216.40.44.57]) by kanga.kvack.org (Postfix) with ESMTP id 010268D00FD for ; Mon, 15 Feb 2021 10:44:41 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id B95BB781C for ; Mon, 15 Feb 2021 15:44:41 +0000 (UTC) X-FDA: 77820924762.26.actor56_0614c762763c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 981B01810DA5C for ; Mon, 15 Feb 2021 15:44:41 +0000 (UTC) X-HE-Tag: actor56_0614c762763c X-Filterd-Recvd-Size: 24032 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf24.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 15:44:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403880; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YHHZFkMMZ6ql5k3TCAeh2HP9Bgj5fyKyaZUTKJfb3T4=; b=Z7lteGiPbPCXAleTRw6bhPD9MwW61iXgfPNOXUnmAgqyYDtmoOtWaiCOCRq8kqG5Nyc2gb G1501Ko//ry9grQixduiRS2z6UI3vwHNAEMEDWORFVmoGMTyiN+DkSOUtRhQXcSU5B6QJh gjd7URGnKf1AlqeskE6UZI4DPyGayq0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-224-I_F9V2M5N06wwvoYuLp8rA-1; Mon, 15 Feb 2021 10:44:36 -0500 X-MC-Unique: I_F9V2M5N06wwvoYuLp8rA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 35F5379EC2; Mon, 15 Feb 2021 15:44:34 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9452F10016DB; Mon, 15 Feb 2021 15:44:27 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 01/33] iov_iter: Add ITER_XARRAY From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Alexander Viro , "Matthew Wilcox (Oracle)" , Christoph Hellwig , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:44:26 +0000 Message-ID: <161340386671.1303470.10752208972482479840.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add an iterator, ITER_XARRAY, that walks through a set of pages attached to an xarray, starting at a given page and offset and walking for the specified amount of bytes. The iterator supports transparent huge pages. The caller must guarantee that the pages are all present and they must be locked using PG_locked, PG_writeback or PG_fscache to prevent them from going away or being migrated whilst they're being accessed. This is useful for copying data from socket buffers to inodes in network filesystems and for transferring data between those inodes and the cache using direct I/O. Whilst it is true that ITER_BVEC could be used instead, that would require a bio_vec array to be allocated to refer to all the pages - which should be redundant if inode->i_pages also points to all these pages. Signed-off-by: David Howells cc: Alexander Viro cc: Matthew Wilcox (Oracle) cc: Christoph Hellwig cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- include/linux/uio.h | 11 ++ lib/iov_iter.c | 313 +++++++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 301 insertions(+), 23 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 72d88566694e..08b186df54ac 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -10,6 +10,7 @@ #include struct page; +struct address_space; struct pipe_inode_info; struct kvec { @@ -24,6 +25,7 @@ enum iter_type { ITER_BVEC = 16, ITER_PIPE = 32, ITER_DISCARD = 64, + ITER_XARRAY = 128, }; struct iov_iter { @@ -39,6 +41,7 @@ struct iov_iter { const struct iovec *iov; const struct kvec *kvec; const struct bio_vec *bvec; + struct xarray *xarray; struct pipe_inode_info *pipe; }; union { @@ -47,6 +50,7 @@ struct iov_iter { unsigned int head; unsigned int start_head; }; + loff_t xarray_start; }; }; @@ -80,6 +84,11 @@ static inline bool iov_iter_is_discard(const struct iov_iter *i) return iov_iter_type(i) == ITER_DISCARD; } +static inline bool iov_iter_is_xarray(const struct iov_iter *i) +{ + return iov_iter_type(i) == ITER_XARRAY; +} + static inline unsigned char iov_iter_rw(const struct iov_iter *i) { return i->type & (READ | WRITE); @@ -221,6 +230,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe, size_t count); void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); +void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray, + loff_t start, size_t count); ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start); ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, diff --git a/lib/iov_iter.c b/lib/iov_iter.c index a21e6a5792c5..f53a57588489 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -78,7 +78,44 @@ } \ } -#define iterate_all_kinds(i, n, v, I, B, K) { \ +#define iterate_xarray(i, n, __v, skip, STEP) { \ + struct page *head = NULL; \ + size_t wanted = n, seg, offset; \ + loff_t start = i->xarray_start + skip; \ + pgoff_t index = start >> PAGE_SHIFT; \ + int j; \ + \ + XA_STATE(xas, i->xarray, index); \ + \ + rcu_read_lock(); \ + xas_for_each(&xas, head, ULONG_MAX) { \ + if (xas_retry(&xas, head)) \ + continue; \ + if (WARN_ON(xa_is_value(head))) \ + break; \ + if (WARN_ON(PageHuge(head))) \ + break; \ + for (j = (head->index < index) ? index - head->index : 0; \ + j < thp_nr_pages(head); j++) { \ + __v.bv_page = head + j; \ + offset = (i->xarray_start + skip) & ~PAGE_MASK; \ + seg = PAGE_SIZE - offset; \ + __v.bv_offset = offset; \ + __v.bv_len = min(n, seg); \ + (void)(STEP); \ + n -= __v.bv_len; \ + skip += __v.bv_len; \ + if (n == 0) \ + break; \ + } \ + if (n == 0) \ + break; \ + } \ + rcu_read_unlock(); \ + n = wanted - n; \ +} + +#define iterate_all_kinds(i, n, v, I, B, K, X) { \ if (likely(n)) { \ size_t skip = i->iov_offset; \ if (unlikely(i->type & ITER_BVEC)) { \ @@ -90,6 +127,9 @@ struct kvec v; \ iterate_kvec(i, n, v, kvec, skip, (K)) \ } else if (unlikely(i->type & ITER_DISCARD)) { \ + } else if (unlikely(i->type & ITER_XARRAY)) { \ + struct bio_vec v; \ + iterate_xarray(i, n, v, skip, (X)); \ } else { \ const struct iovec *iov; \ struct iovec v; \ @@ -98,7 +138,7 @@ } \ } -#define iterate_and_advance(i, n, v, I, B, K) { \ +#define iterate_and_advance(i, n, v, I, B, K, X) { \ if (unlikely(i->count < n)) \ n = i->count; \ if (i->count) { \ @@ -123,6 +163,9 @@ i->kvec = kvec; \ } else if (unlikely(i->type & ITER_DISCARD)) { \ skip += n; \ + } else if (unlikely(i->type & ITER_XARRAY)) { \ + struct bio_vec v; \ + iterate_xarray(i, n, v, skip, (X)) \ } else { \ const struct iovec *iov; \ struct iovec v; \ @@ -636,7 +679,9 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i) copyout(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len), memcpy_to_page(v.bv_page, v.bv_offset, (from += v.bv_len) - v.bv_len, v.bv_len), - memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len) + memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len), + memcpy_to_page(v.bv_page, v.bv_offset, + (from += v.bv_len) - v.bv_len, v.bv_len) ) return bytes; @@ -752,6 +797,16 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i) bytes = curr_addr - s_addr - rem; return bytes; } + }), + ({ + rem = copy_mc_to_page(v.bv_page, v.bv_offset, + (from += v.bv_len) - v.bv_len, v.bv_len); + if (rem) { + curr_addr = (unsigned long) from; + bytes = curr_addr - s_addr - rem; + rcu_read_unlock(); + return bytes; + } }) ) @@ -773,7 +828,9 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i) copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) return bytes; @@ -799,7 +856,9 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i) 0;}), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) iov_iter_advance(i, bytes); @@ -819,7 +878,9 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) v.iov_base, v.iov_len), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) return bytes; @@ -854,7 +915,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base, - v.iov_len) + v.iov_len), + memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) return bytes; @@ -878,7 +941,9 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i) 0;}), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) iov_iter_advance(i, bytes); @@ -915,7 +980,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes, { if (unlikely(!page_copy_sane(page, offset, bytes))) return 0; - if (i->type & (ITER_BVEC|ITER_KVEC)) { + if (i->type & (ITER_BVEC | ITER_KVEC | ITER_XARRAY)) { void *kaddr = kmap_atomic(page); size_t wanted = copy_to_iter(kaddr + offset, bytes, i); kunmap_atomic(kaddr); @@ -938,7 +1003,7 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes, WARN_ON(1); return 0; } - if (i->type & (ITER_BVEC|ITER_KVEC)) { + if (i->type & (ITER_BVEC | ITER_KVEC | ITER_XARRAY)) { void *kaddr = kmap_atomic(page); size_t wanted = _copy_from_iter(kaddr + offset, bytes, i); kunmap_atomic(kaddr); @@ -982,7 +1047,8 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i) iterate_and_advance(i, bytes, v, clear_user(v.iov_base, v.iov_len), memzero_page(v.bv_page, v.bv_offset, v.bv_len), - memset(v.iov_base, 0, v.iov_len) + memset(v.iov_base, 0, v.iov_len), + memzero_page(v.bv_page, v.bv_offset, v.bv_len) ) return bytes; @@ -1006,7 +1072,9 @@ size_t iov_iter_copy_from_user_atomic(struct page *page, copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) kunmap_atomic(kaddr); return bytes; @@ -1077,7 +1145,12 @@ void iov_iter_advance(struct iov_iter *i, size_t size) i->count -= size; return; } - iterate_and_advance(i, size, v, 0, 0, 0) + if (unlikely(iov_iter_is_xarray(i))) { + i->iov_offset += size; + i->count -= size; + return; + } + iterate_and_advance(i, size, v, 0, 0, 0, 0) } EXPORT_SYMBOL(iov_iter_advance); @@ -1121,7 +1194,12 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll) return; } unroll -= i->iov_offset; - if (iov_iter_is_bvec(i)) { + if (iov_iter_is_xarray(i)) { + BUG(); /* We should never go beyond the start of the specified + * range since we might then be straying into pages that + * aren't pinned. + */ + } else if (iov_iter_is_bvec(i)) { const struct bio_vec *bvec = i->bvec; while (1) { size_t n = (--bvec)->bv_len; @@ -1158,9 +1236,9 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i) return i->count; // it is a silly place, anyway if (i->nr_segs == 1) return i->count; - if (unlikely(iov_iter_is_discard(i))) + if (unlikely(iov_iter_is_discard(i) || iov_iter_is_xarray(i))) return i->count; - else if (iov_iter_is_bvec(i)) + if (iov_iter_is_bvec(i)) return min(i->count, i->bvec->bv_len - i->iov_offset); else return min(i->count, i->iov->iov_len - i->iov_offset); @@ -1208,6 +1286,31 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, } EXPORT_SYMBOL(iov_iter_pipe); +/** + * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray + * @i: The iterator to initialise. + * @direction: The direction of the transfer. + * @xarray: The xarray to access. + * @start: The start file position. + * @count: The size of the I/O buffer in bytes. + * + * Set up an I/O iterator to either draw data out of the pages attached to an + * inode or to inject data into those pages. The pages *must* be prevented + * from evaporation, either by taking a ref on them or locking them by the + * caller. + */ +void iov_iter_xarray(struct iov_iter *i, unsigned int direction, + struct xarray *xarray, loff_t start, size_t count) +{ + BUG_ON(direction & ~1); + i->type = ITER_XARRAY | (direction & (READ | WRITE)); + i->xarray = xarray; + i->xarray_start = start; + i->count = count; + i->iov_offset = 0; +} +EXPORT_SYMBOL(iov_iter_xarray); + /** * iov_iter_discard - Initialise an I/O iterator that discards data * @i: The iterator to initialise. @@ -1241,7 +1344,8 @@ unsigned long iov_iter_alignment(const struct iov_iter *i) iterate_all_kinds(i, size, v, (res |= (unsigned long)v.iov_base | v.iov_len, 0), res |= v.bv_offset | v.bv_len, - res |= (unsigned long)v.iov_base | v.iov_len + res |= (unsigned long)v.iov_base | v.iov_len, + res |= v.bv_offset | v.bv_len ) return res; } @@ -1263,7 +1367,9 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i) (res |= (!res ? 0 : (unsigned long)v.bv_offset) | (size != v.bv_len ? size : 0)), (res |= (!res ? 0 : (unsigned long)v.iov_base) | - (size != v.iov_len ? size : 0)) + (size != v.iov_len ? size : 0)), + (res |= (!res ? 0 : (unsigned long)v.bv_offset) | + (size != v.bv_len ? size : 0)) ); return res; } @@ -1313,6 +1419,75 @@ static ssize_t pipe_get_pages(struct iov_iter *i, return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start); } +static ssize_t iter_xarray_copy_pages(struct page **pages, struct xarray *xa, + pgoff_t index, unsigned int nr_pages) +{ + XA_STATE(xas, xa, index); + struct page *page; + unsigned int ret = 0; + + rcu_read_lock(); + for (page = xas_load(&xas); page; page = xas_next(&xas)) { + if (xas_retry(&xas, page)) + continue; + + /* Has the page moved or been split? */ + if (unlikely(page != xas_reload(&xas))) { + xas_reset(&xas); + continue; + } + + pages[ret] = find_subpage(page, xas.xa_index); + get_page(pages[ret]); + if (++ret == nr_pages) + break; + } + rcu_read_unlock(); + return ret; +} + +static ssize_t iter_xarray_get_pages(struct iov_iter *i, + struct page **pages, size_t maxsize, + unsigned maxpages, size_t *_start_offset) +{ + unsigned nr, offset; + pgoff_t index, count; + size_t size = maxsize, actual; + loff_t pos; + + if (!size || !maxpages) + return 0; + + pos = i->xarray_start + i->iov_offset; + index = pos >> PAGE_SHIFT; + offset = pos & ~PAGE_MASK; + *_start_offset = offset; + + count = 1; + if (size > PAGE_SIZE - offset) { + size -= PAGE_SIZE - offset; + count += size >> PAGE_SHIFT; + size &= ~PAGE_MASK; + if (size) + count++; + } + + if (count > maxpages) + count = maxpages; + + nr = iter_xarray_copy_pages(pages, i->xarray, index, count); + if (nr == 0) + return 0; + + actual = PAGE_SIZE * nr; + actual -= offset; + if (nr == count && size > 0) { + unsigned last_offset = (nr > 1) ? 0 : offset; + actual -= PAGE_SIZE - (last_offset + size); + } + return actual; +} + ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start) @@ -1322,6 +1497,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, if (unlikely(iov_iter_is_pipe(i))) return pipe_get_pages(i, pages, maxsize, maxpages, start); + if (unlikely(iov_iter_is_xarray(i))) + return iter_xarray_get_pages(i, pages, maxsize, maxpages, start); if (unlikely(iov_iter_is_discard(i))) return -EFAULT; @@ -1348,7 +1525,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, return v.bv_len; }),({ return -EFAULT; - }) + }), + 0 ) return 0; } @@ -1392,6 +1570,51 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i, return n; } +static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i, + struct page ***pages, size_t maxsize, + size_t *_start_offset) +{ + struct page **p; + unsigned nr, offset; + pgoff_t index, count; + size_t size = maxsize, actual; + loff_t pos; + + if (!size) + return 0; + + pos = i->xarray_start + i->iov_offset; + index = pos >> PAGE_SHIFT; + offset = pos & ~PAGE_MASK; + *_start_offset = offset; + + count = 1; + if (size > PAGE_SIZE - offset) { + size -= PAGE_SIZE - offset; + count += size >> PAGE_SHIFT; + size &= ~PAGE_MASK; + if (size) + count++; + } + + p = get_pages_array(count); + if (!p) + return -ENOMEM; + *pages = p; + + nr = iter_xarray_copy_pages(p, i->xarray, index, count); + if (nr == 0) + return 0; + + actual = PAGE_SIZE * nr; + actual -= offset; + if (nr == count && size > 0) { + unsigned last_offset = (nr > 1) ? 0 : offset; + actual -= PAGE_SIZE - (last_offset + size); + } + return actual; +} + ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start) @@ -1403,6 +1626,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, if (unlikely(iov_iter_is_pipe(i))) return pipe_get_pages_alloc(i, pages, maxsize, start); + if (unlikely(iov_iter_is_xarray(i))) + return iter_xarray_get_pages_alloc(i, pages, maxsize, start); if (unlikely(iov_iter_is_discard(i))) return -EFAULT; @@ -1435,7 +1660,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, return v.bv_len; }),({ return -EFAULT; - }) + }), 0 ) return 0; } @@ -1473,6 +1698,13 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, v.iov_base, v.iov_len, sum, off); off += v.iov_len; + }), ({ + char *p = kmap_atomic(v.bv_page); + sum = csum_and_memcpy((to += v.bv_len) - v.bv_len, + p + v.bv_offset, v.bv_len, + sum, off); + kunmap_atomic(p); + off += v.bv_len; }) ) *csum = sum; @@ -1514,6 +1746,13 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, v.iov_base, v.iov_len, sum, off); off += v.iov_len; + }), ({ + char *p = kmap_atomic(v.bv_page); + sum = csum_and_memcpy((to += v.bv_len) - v.bv_len, + p + v.bv_offset, v.bv_len, + sum, off); + kunmap_atomic(p); + off += v.bv_len; }) ) *csum = sum; @@ -1559,6 +1798,13 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, (from += v.iov_len) - v.iov_len, v.iov_len, sum, off); off += v.iov_len; + }), ({ + char *p = kmap_atomic(v.bv_page); + sum = csum_and_memcpy(p + v.bv_offset, + (from += v.bv_len) - v.bv_len, + v.bv_len, sum, off); + kunmap_atomic(p); + off += v.bv_len; }) ) *csum = sum; @@ -1608,6 +1854,21 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages) npages = pipe_space_for_user(iter_head, pipe->tail, pipe); if (npages >= maxpages) return maxpages; + } else if (unlikely(iov_iter_is_xarray(i))) { + unsigned offset; + + offset = (i->xarray_start + i->iov_offset) & ~PAGE_MASK; + + npages = 1; + if (size > PAGE_SIZE - offset) { + size -= PAGE_SIZE - offset; + npages += size >> PAGE_SHIFT; + size &= ~PAGE_MASK; + if (size) + npages++; + } + if (npages >= maxpages) + return maxpages; } else iterate_all_kinds(i, size, v, ({ unsigned long p = (unsigned long)v.iov_base; npages += DIV_ROUND_UP(p + v.iov_len, PAGE_SIZE) @@ -1624,7 +1885,8 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages) - p / PAGE_SIZE; if (npages >= maxpages) return maxpages; - }) + }), + 0 ) return npages; } @@ -1637,7 +1899,7 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags) WARN_ON(1); return NULL; } - if (unlikely(iov_iter_is_discard(new))) + if (unlikely(iov_iter_is_discard(new) || iov_iter_is_xarray(new))) return NULL; if (iov_iter_is_bvec(new)) return new->bvec = kmemdup(new->bvec, @@ -1842,7 +2104,12 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes, kunmap(v.bv_page); err;}), ({ w = v; - err = f(&w, context);}) + err = f(&w, context);}), ({ + w.iov_base = kmap(v.bv_page) + v.bv_offset; + w.iov_len = v.bv_len; + err = f(&w, context); + kunmap(v.bv_page); + err;}) ) return err; } From patchwork Mon Feb 15 15:44:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B2F2C433E0 for ; Mon, 15 Feb 2021 15:44:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E480D64E8C for ; Mon, 15 Feb 2021 15:44:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E480D64E8C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 874B78D010D; Mon, 15 Feb 2021 10:44:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7FD0F8D00FD; Mon, 15 Feb 2021 10:44:52 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6C58A8D010D; Mon, 15 Feb 2021 10:44:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0254.hostedemail.com [216.40.44.254]) by kanga.kvack.org (Postfix) with ESMTP id 534B38D00FD for ; Mon, 15 Feb 2021 10:44:52 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 1C3758410 for ; Mon, 15 Feb 2021 15:44:52 +0000 (UTC) X-FDA: 77820925224.24.ice43_2d170552763c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin24.hostedemail.com (Postfix) with ESMTP id F3AD91A4A7 for ; Mon, 15 Feb 2021 15:44:51 +0000 (UTC) X-HE-Tag: ice43_2d170552763c X-Filterd-Recvd-Size: 6210 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf29.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 15:44:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403890; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eRjZtsA5zhSS/WW5VYasjVbQdw0rYUOiOpY9AmwMNHQ=; b=i47rPONr3SPATkRDe9UWHj5Z1nez118tvZzzo6Y4u1GBA8bySc46UekOr3CUQVNCmgB+go ai+CGTZvzhrgudImHTTH7+g2HmL+KPPzHOOtYnxg5U49mpp7SfiV/fyD4mAnaITEqIlRaw +ePfve4w7ULA6m8I2ARVv8UVz1sqXK4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-448-MqlpGAkWNeC56cKekoPN7Q-1; Mon, 15 Feb 2021 10:44:49 -0500 X-MC-Unique: MqlpGAkWNeC56cKekoPN7Q-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id BFEF1107ACC7; Mon, 15 Feb 2021 15:44:46 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 462C060CEF; Mon, 15 Feb 2021 15:44:40 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 02/33] mm: Add an unlock function for PG_private_2/PG_fscache From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Linus Torvalds , Linus Torvalds , "Matthew Wilcox (Oracle)" , Alexander Viro , Christoph Hellwig , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:44:39 +0000 Message-ID: <161340387944.1303470.7944159520278177652.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a function, unlock_page_private_2(), to unlock PG_private_2 analogous to that of PG_lock. Add a kerneldoc banner to that indicating the example usage case. A wrapper will need to be placed in the netfs header in the patch that adds that. [This implements a suggestion by Linus to not mix the terminology of PG_private_2 and PG_fscache in the mm core function] Suggested-by: Linus Torvalds Signed-off-by: David Howells cc: Linus Torvalds cc: Matthew Wilcox (Oracle) cc: Alexander Viro cc: Christoph Hellwig cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/linux-fsdevel/1330473.1612974547@warthog.procyon.org.uk/ Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ Reviewed-by: Christoph Hellwig --- include/linux/pagemap.h | 1 + mm/filemap.c | 20 ++++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index d5570deff400..365a28ece763 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -591,6 +591,7 @@ extern int __lock_page_async(struct page *page, struct wait_page_queue *wait); extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm, unsigned int flags); extern void unlock_page(struct page *page); +extern void unlock_page_private_2(struct page *page); /* * Return true if the page was successfully locked diff --git a/mm/filemap.c b/mm/filemap.c index 5c9d564317a5..7d321152d579 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1466,6 +1466,26 @@ void unlock_page(struct page *page) } EXPORT_SYMBOL(unlock_page); +/** + * unlock_page_private_2 - Unlock a page that's locked with PG_private_2 + * @page: The page + * + * Unlocks a page that's locked with PG_private_2 and wakes up sleepers in + * wait_on_page_private_2(). + * + * This is, for example, used when a netfs page is being written to a local + * disk cache, thereby allowing writes to the cache for the same page to be + * serialised. + */ +void unlock_page_private_2(struct page *page) +{ + page = compound_head(page); + VM_BUG_ON_PAGE(!PagePrivate2(page), page); + clear_bit_unlock(PG_private_2, &page->flags); + wake_up_page_bit(page, PG_private_2); +} +EXPORT_SYMBOL(unlock_page_private_2); + /** * end_page_writeback - end writeback against a page * @page: the page From patchwork Mon Feb 15 15:44:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088441 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12F7FC433E0 for ; Mon, 15 Feb 2021 15:45:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A892364E9E for ; Mon, 15 Feb 2021 15:45:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A892364E9E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 393878D010E; Mon, 15 Feb 2021 10:45:02 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 31B658D00FD; Mon, 15 Feb 2021 10:45:02 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1E3FC8D010E; Mon, 15 Feb 2021 10:45:02 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0247.hostedemail.com [216.40.44.247]) by kanga.kvack.org (Postfix) with ESMTP id F0DC08D00FD for ; Mon, 15 Feb 2021 10:45:01 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id ACE09812F for ; Mon, 15 Feb 2021 15:45:01 +0000 (UTC) X-FDA: 77820925602.30.B384B4E Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf04.hostedemail.com (Postfix) with ESMTP id B930B12E for ; Mon, 15 Feb 2021 15:44:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403900; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=I0qlBdxKxClBfihp/lgdjTIf+rONu/U7ZON8p6l2s80=; b=E9spuKvgvOFhnyMiJ6rtlpGUjcShzcArFHvFka6j/8twiMUz+vAnRd5mxdUJZjnWASzbjU dIKhxRN9XDQRDK5P68tOd3/t+ySlcXNPrFgghze4Je1OzV2Db3WkQQrGHce0ohf6WpteIn CJ+7EL/+zIu99FcQWBK7zFcK1OI8m9Q= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-463-XD3HjPbrO-a3_Rft59zEag-1; Mon, 15 Feb 2021 10:44:58 -0500 X-MC-Unique: XD3HjPbrO-a3_Rft59zEag-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 479DD801962; Mon, 15 Feb 2021 15:44:56 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id CF03C608DB; Mon, 15 Feb 2021 15:44:52 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 03/33] mm: Implement readahead_control pageset expansion From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: "Matthew Wilcox (Oracle)" , "Matthew Wilcox (Oracle)" , Alexander Viro , Christoph Hellwig , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:44:52 +0000 Message-ID: <161340389201.1303470.14353807284546854878.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Stat-Signature: qho4i1dzwxuqbjak9o693g8b7bsjuytf X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: B930B12E Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf04; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1613403899-503070 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Provide a function, readahead_expand(), that expands the set of pages specified by a readahead_control object to encompass a revised area with a proposed size and length. The proposed area must include all of the old area and may be expanded yet more by this function so that the edges align on (transparent huge) page boundaries as allocated. The expansion will be cut short if a page already exists in either of the areas being expanded into. Note that any expansion made in such a case is not rolled back. This will be used by fscache so that reads can be expanded to cache granule boundaries, thereby allowing whole granules to be stored in the cache, but there are other potential users also. Suggested-by: Matthew Wilcox (Oracle) Signed-off-by: David Howells cc: Matthew Wilcox (Oracle) cc: Alexander Viro cc: Christoph Hellwig cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- include/linux/pagemap.h | 2 + mm/readahead.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 365a28ece763..d2786607d297 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -761,6 +761,8 @@ extern void __delete_from_page_cache(struct page *page, void *shadow); int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask); void delete_from_page_cache_batch(struct address_space *mapping, struct pagevec *pvec); +void readahead_expand(struct readahead_control *ractl, + loff_t new_start, size_t new_len); /* * Like add_to_page_cache_locked, but used to add newly allocated pages: diff --git a/mm/readahead.c b/mm/readahead.c index c5b0457415be..4446dada0bc2 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -638,3 +638,73 @@ SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count) { return ksys_readahead(fd, offset, count); } + +/** + * readahead_expand - Expand a readahead request + * @ractl: The request to be expanded + * @new_start: The revised start + * @new_len: The revised size of the request + * + * Attempt to expand a readahead request outwards from the current size to the + * specified size by inserting locked pages before and after the current window + * to increase the size to the new window. This may involve the insertion of + * THPs, in which case the window may get expanded even beyond what was + * requested. + * + * The algorithm will stop if it encounters a conflicting page already in the + * pagecache and leave a smaller expansion than requested. + * + * The caller must check for this by examining the revised @ractl object for a + * different expansion than was requested. + */ +void readahead_expand(struct readahead_control *ractl, + loff_t new_start, size_t new_len) +{ + struct address_space *mapping = ractl->mapping; + pgoff_t new_index, new_nr_pages; + gfp_t gfp_mask = readahead_gfp_mask(mapping); + + new_index = new_start / PAGE_SIZE; + + /* Expand the leading edge downwards */ + while (ractl->_index > new_index) { + unsigned long index = ractl->_index - 1; + struct page *page = xa_load(&mapping->i_pages, index); + + if (page && !xa_is_value(page)) + return; /* Page apparently present */ + + page = __page_cache_alloc(gfp_mask); + if (!page) + return; + if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) { + put_page(page); + return; + } + + ractl->_nr_pages++; + ractl->_index = page->index; + } + + new_len += new_start - readahead_pos(ractl); + new_nr_pages = DIV_ROUND_UP(new_len, PAGE_SIZE); + + /* Expand the trailing edge upwards */ + while (ractl->_nr_pages < new_nr_pages) { + unsigned long index = ractl->_index + ractl->_nr_pages; + struct page *page = xa_load(&mapping->i_pages, index); + + if (page && !xa_is_value(page)) + return; /* Page apparently present */ + + page = __page_cache_alloc(gfp_mask); + if (!page) + return; + if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) { + put_page(page); + return; + } + ractl->_nr_pages++; + } +} +EXPORT_SYMBOL(readahead_expand); From patchwork Mon Feb 15 15:45:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088443 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD586C433DB for ; Mon, 15 Feb 2021 15:45:17 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 78F5164E8E for ; Mon, 15 Feb 2021 15:45:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 78F5164E8E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 122E98D010F; Mon, 15 Feb 2021 10:45:17 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0AAA18D00FD; Mon, 15 Feb 2021 10:45:17 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EDB678D010F; Mon, 15 Feb 2021 10:45:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0089.hostedemail.com [216.40.44.89]) by kanga.kvack.org (Postfix) with ESMTP id D4B878D00FD for ; Mon, 15 Feb 2021 10:45:16 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 9F2738248068 for ; Mon, 15 Feb 2021 15:45:16 +0000 (UTC) X-FDA: 77820926232.11.674E684 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf27.hostedemail.com (Postfix) with ESMTP id BC48180192C7 for ; Mon, 15 Feb 2021 15:45:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403915; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YZMneCWDbQEk+LndIaAM6UfAna/NfoWJ6AKcotcHN5o=; b=Mo1zqK0OVQDG1MXdvGMAShCeHumtCzvSHi7Tyl7MYZoEo8ornHaV1cg58XczDp5UrJpFor 8XASYOmxgZYAw+w7MlJtZxtfw1xJLoyO5TTjP4OBFytjdFBJr9OVjAmUuTl+Ja8chL6ScY kP7nlqvMc6V3ZUiMRLEAANyB0Unuw70= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-503-3Du8V8weOv22OMkr9mhwtw-1; Mon, 15 Feb 2021 10:45:11 -0500 X-MC-Unique: 3Du8V8weOv22OMkr9mhwtw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 0CB4A874981; Mon, 15 Feb 2021 15:45:09 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7C2E05D9D3; Mon, 15 Feb 2021 15:45:02 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 04/33] vfs: Export rw_verify_area() for use by cachefiles From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Alexander Viro , Christoph Hellwig , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:45:01 +0000 Message-ID: <161340390150.1303470.509630287091953754.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Stat-Signature: 7xx3m1ftbxnk7cbikren6bgnynefpdhb X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: BC48180192C7 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf27; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1613403913-38659 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Export rw_verify_area() for so that cachefiles can use it before issuing call_read_iter() and call_write_iter() to effect async DIO operations against the cache. This is analogous to aio_read() and aio_write(). Signed-off-by: David Howells cc: Alexander Viro cc: Christoph Hellwig cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- fs/internal.h | 5 ----- fs/read_write.c | 1 + include/linux/fs.h | 1 + 3 files changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index 77c50befbfbe..92e686249c40 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -164,11 +164,6 @@ extern char *simple_dname(struct dentry *, char *, int); extern void dput_to_list(struct dentry *, struct list_head *); extern void shrink_dentry_list(struct list_head *); -/* - * read_write.c - */ -extern int rw_verify_area(int, struct file *, const loff_t *, size_t); - /* * pipe.c */ diff --git a/fs/read_write.c b/fs/read_write.c index 75f764b43418..fe84e11245bd 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -400,6 +400,7 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t return security_file_permission(file, read_write == READ ? MAY_READ : MAY_WRITE); } +EXPORT_SYMBOL(rw_verify_area); static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos) { diff --git a/include/linux/fs.h b/include/linux/fs.h index fd47deea7c17..493804856ab3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2760,6 +2760,7 @@ extern int notify_change(struct dentry *, struct iattr *, struct inode **); extern int inode_permission(struct inode *, int); extern int generic_permission(struct inode *, int); extern int __check_sticky(struct inode *dir, struct inode *inode); +extern int rw_verify_area(int, struct file *, const loff_t *, size_t); static inline bool execute_ok(struct inode *inode) { From patchwork Mon Feb 15 15:45:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088445 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28E57C433DB for ; Mon, 15 Feb 2021 15:45:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BE9B864E8E for ; Mon, 15 Feb 2021 15:45:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BE9B864E8E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5B5298D0110; Mon, 15 Feb 2021 10:45:25 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 53C918D00FD; Mon, 15 Feb 2021 10:45:25 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 42B8B8D0110; Mon, 15 Feb 2021 10:45:25 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 2AC6A8D00FD for ; Mon, 15 Feb 2021 10:45:25 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id DA9BE18213824 for ; Mon, 15 Feb 2021 15:45:24 +0000 (UTC) X-FDA: 77820926568.06.8630769 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf14.hostedemail.com (Postfix) with ESMTP id 6F3D8C000C48 for ; Mon, 15 Feb 2021 15:45:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403923; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oBTSFhINMsM68khIK0Wz2lQs8Yep4BBTL+CrILJHwTU=; b=eSIYMyF9l5MjHC3zejUZbNIZ5HgU46zJMp1s0zAzkGCLo3hOtTSuOpxMd3hdFGEa5FNuHB I7L1bkDN1fI71rC8rPMk7HRnaxBiP60Dmicrbo+v33wSHDkGnRDFwgiJeSReOlX8YidLsW e/q3WRwlUhE55IM3wTNCYU3LrAZ5MiU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-386-EpmUWSaFOcu3F9GItixxoA-1; Mon, 15 Feb 2021 10:45:20 -0500 X-MC-Unique: EpmUWSaFOcu3F9GItixxoA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3D74F189DF56; Mon, 15 Feb 2021 15:45:18 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 233A15D9C0; Mon, 15 Feb 2021 15:45:14 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 05/33] netfs: Make a netfs helper module From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:45:14 +0000 Message-ID: <161340391427.1303470.14884950716721956560.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Stat-Signature: tby9dw3dr17fbxuazd4kjikr3dkiy6cp X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 6F3D8C000C48 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf14; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1613403921-906192 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Make a netfs helper module to manage read request segmentation, caching support and transparent huge page support on behalf of a network filesystem. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- fs/netfs/Kconfig | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 fs/netfs/Kconfig diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig new file mode 100644 index 000000000000..2ebf90e6ca95 --- /dev/null +++ b/fs/netfs/Kconfig @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config NETFS_SUPPORT + tristate "Support for network filesystem high-level I/O" + help + This option enables support for network filesystems, including + helpers for high-level buffered I/O, abstracting out read + segmentation, local caching and transparent huge page support. From patchwork Mon Feb 15 15:45:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088447 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BECBBC433E0 for ; Mon, 15 Feb 2021 15:45:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 76A9664EA8 for ; Mon, 15 Feb 2021 15:45:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 76A9664EA8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 193408D0111; Mon, 15 Feb 2021 10:45:39 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 143E88D00FD; Mon, 15 Feb 2021 10:45:39 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 033EA8D0111; Mon, 15 Feb 2021 10:45:38 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0131.hostedemail.com [216.40.44.131]) by kanga.kvack.org (Postfix) with ESMTP id DE9F38D00FD for ; Mon, 15 Feb 2021 10:45:38 -0500 (EST) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id AEDA85DC9 for ; Mon, 15 Feb 2021 15:45:38 +0000 (UTC) X-FDA: 77820927156.04.desk65_2b14f9d2763c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin04.hostedemail.com (Postfix) with ESMTP id 941588021F78 for ; Mon, 15 Feb 2021 15:45:38 +0000 (UTC) X-HE-Tag: desk65_2b14f9d2763c X-Filterd-Recvd-Size: 6104 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf01.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 15:45:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403937; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+LfM2EJONX4z+yqWB0bfNhcb8TeIK0udZvbEVRj8Fs8=; b=GAF/r3Yqo0PnkgC7UMlE+bPssS9ek0LkyYRWMIxviURwmbpdK5bqKWfxDW6tTc7MwDX2D9 fqThLsI71hLrd85iYFBds5j+YRHuRAxiG0mpq907j5ycCRaV3BCYDDzBEsVFXpA+J4qqVy d/oP/0B+3x+lxdDaDRW5FB1mllPw3NY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-396-p1buAmfhNqqEmdKSa54Mnw-1; Mon, 15 Feb 2021 10:45:33 -0500 X-MC-Unique: p1buAmfhNqqEmdKSa54Mnw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 6F7B879EC1; Mon, 15 Feb 2021 15:45:30 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4DC845D9C0; Mon, 15 Feb 2021 15:45:24 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 06/33] netfs, mm: Move PG_fscache helper funcs to linux/netfs.h From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:45:23 +0000 Message-ID: <161340392347.1303470.18065131603507621762.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Move the PG_fscache related helper funcs (such as SetPageFsCache()) to linux/netfs.h rather than linux/fscache.h as the intention is to move to a model where they're used by the network filesystem and the helper library, but not by fscache/cachefiles itself. Signed-off-by: David Howells cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- include/linux/fscache.h | 11 +---------- include/linux/netfs.h | 25 +++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 10 deletions(-) create mode 100644 include/linux/netfs.h diff --git a/include/linux/fscache.h b/include/linux/fscache.h index a1c928fe98e7..1f8dc72369ee 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -19,6 +19,7 @@ #include #include #include +#include #if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE) #define fscache_available() (1) @@ -29,16 +30,6 @@ #endif -/* - * overload PG_private_2 to give us PG_fscache - this is used to indicate that - * a page is currently backed by a local disk cache - */ -#define PageFsCache(page) PagePrivate2((page)) -#define SetPageFsCache(page) SetPagePrivate2((page)) -#define ClearPageFsCache(page) ClearPagePrivate2((page)) -#define TestSetPageFsCache(page) TestSetPagePrivate2((page)) -#define TestClearPageFsCache(page) TestClearPagePrivate2((page)) - /* pattern used to fill dead space in an index entry */ #define FSCACHE_INDEX_DEADFILL_PATTERN 0x79 diff --git a/include/linux/netfs.h b/include/linux/netfs.h new file mode 100644 index 000000000000..b3d869ec7d2a --- /dev/null +++ b/include/linux/netfs.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Network filesystem support services. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + * + * for a description of the network filesystem interface declared here. + */ + +#ifndef _LINUX_NETFS_H +#define _LINUX_NETFS_H + +#include + +/* + * Overload PG_private_2 to give us PG_fscache - this is used to indicate that + * a page is currently backed by a local disk cache + */ +#define PageFsCache(page) PagePrivate2((page)) +#define SetPageFsCache(page) SetPagePrivate2((page)) +#define ClearPageFsCache(page) ClearPagePrivate2((page)) +#define TestSetPageFsCache(page) TestSetPagePrivate2((page)) +#define TestClearPageFsCache(page) TestClearPagePrivate2((page)) + +#endif /* _LINUX_NETFS_H */ From patchwork Mon Feb 15 15:45:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088449 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0A16C433E0 for ; Mon, 15 Feb 2021 15:45:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 69F2864E8E for ; Mon, 15 Feb 2021 15:45:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 69F2864E8E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0ABEF8D0112; Mon, 15 Feb 2021 10:45:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 033628D00FD; Mon, 15 Feb 2021 10:45:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DF1038D0112; Mon, 15 Feb 2021 10:45:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0151.hostedemail.com [216.40.44.151]) by kanga.kvack.org (Postfix) with ESMTP id C87568D00FD for ; Mon, 15 Feb 2021 10:45:48 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 9014B8E75 for ; Mon, 15 Feb 2021 15:45:48 +0000 (UTC) X-FDA: 77820927576.17.ball30_260cf4a2763c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id 77865180F76C3 for ; Mon, 15 Feb 2021 15:45:48 +0000 (UTC) X-HE-Tag: ball30_260cf4a2763c X-Filterd-Recvd-Size: 5701 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 15:45:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403947; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wLM2gSa0kD6zIHb8GDXyN66lajfI3bDPXzUqDwmVInE=; b=gJy71bJB2Q9caJEUn+LDcfoZ+VE8XUshQKlyBZ0u11Bx5azYSHO/YVVMZ8/RP6dH/yZhYA d3EXLgxJNQkQWkAVzAAqPKZiVsWTfkgSfbt+JkdE1sXbM/Tq/StDd9KJ6niqPFGE6v5Dxs RWlFlgbhJziu9AYgKSU1ACBuwWN86kQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-201-kf4scoX2NFOwQcvbtP7Y1g-1; Mon, 15 Feb 2021 10:45:45 -0500 X-MC-Unique: kf4scoX2NFOwQcvbtP7Y1g-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7BF09874980; Mon, 15 Feb 2021 15:45:43 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id BB4A519C99; Mon, 15 Feb 2021 15:45:36 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 07/33] netfs, mm: Add unlock_page_fscache() and wait_on_page_fscache() From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Linus Torvalds , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:45:35 +0000 Message-ID: <161340393568.1303470.4997526899111310530.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add unlock_page_fscache() as an alias of unlock_page_private_2(). This allows a page 'locked' with PG_fscache to be unlocked. Add wait_on_page_fscache() to wait for PG_fscache to be unlocked. [Linus suggested putting the fscache-themed functions into the caching-specific headers rather than pagemap.h] Signed-off-by: David Howells cc: Linus Torvalds cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/linux-fsdevel/1330473.1612974547@warthog.procyon.org.uk/ Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ --- include/linux/netfs.h | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/include/linux/netfs.h b/include/linux/netfs.h index b3d869ec7d2a..f69703543788 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -22,4 +22,33 @@ #define TestSetPageFsCache(page) TestSetPagePrivate2((page)) #define TestClearPageFsCache(page) TestClearPagePrivate2((page)) +/** + * unlock_page_fscache - Unlock a page that's locked with PG_fscache + * @page: The page + * + * Unlocks a page that's locked with PG_fscache and wakes up sleepers in + * wait_on_page_fscache(). This page bit is used by the netfs helpers when a + * netfs page is being written to a local disk cache, thereby allowing writes + * to the cache for the same page to be serialised. + */ +static inline void unlock_page_fscache(struct page *page) +{ + unlock_page_private_2(page); +} + +/** + * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page + * @page: The page + * + * Wait for the PG_fscache (PG_private_2) page bit to be removed from a page. + * This is, for example, used to handle a netfs page being written to a local + * disk cache, thereby allowing writes to the cache for the same page to be + * serialised. + */ +static inline void wait_on_page_fscache(struct page *page) +{ + if (PageFsCache(page)) + wait_on_page_bit(compound_head(page), PG_fscache); +} + #endif /* _LINUX_NETFS_H */ From patchwork Mon Feb 15 15:45:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088451 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF161C433E0 for ; Mon, 15 Feb 2021 15:46:01 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4C7DF64E8C for ; Mon, 15 Feb 2021 15:46:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4C7DF64E8C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E193D8D0113; Mon, 15 Feb 2021 10:46:00 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id DED4D8D00FD; Mon, 15 Feb 2021 10:46:00 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C1B068D0113; Mon, 15 Feb 2021 10:46:00 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0112.hostedemail.com [216.40.44.112]) by kanga.kvack.org (Postfix) with ESMTP id A1D1D8D00FD for ; Mon, 15 Feb 2021 10:46:00 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 651E98248068 for ; Mon, 15 Feb 2021 15:46:00 +0000 (UTC) X-FDA: 77820928080.24.spy03_260a9462763c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin24.hostedemail.com (Postfix) with ESMTP id 4151A1A4A7 for ; Mon, 15 Feb 2021 15:46:00 +0000 (UTC) X-HE-Tag: spy03_260a9462763c X-Filterd-Recvd-Size: 35640 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf21.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 15:45:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403959; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yJ0a7jlaA/S/WT5cv+SVpjmVvXXZhzt+EhOtoJIcvuQ=; b=PKY0Bo0q5IkViP1rzowQcM+XlpCk093vczrxL2RiakC0okrjoqXxNEtH/D5AKMkXhN4Frb AGHjQ4+fhffl+e9yPW5tp1wAXENLqr2Nv6ABKO27OYnTFEIS6nHKyYxIwuI7xj3cNDK736 WxvKA9pzEbrEdNZJodC5xTfkpgX1vjQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-444-2oYO_Y_uP9mgsDpCeS0hWQ-1; Mon, 15 Feb 2021 10:45:55 -0500 X-MC-Unique: 2oYO_Y_uP9mgsDpCeS0hWQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 33960100A671; Mon, 15 Feb 2021 15:45:53 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9866519C99; Mon, 15 Feb 2021 15:45:49 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 08/33] netfs: Provide readahead and readpage netfs helpers From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:45:48 +0000 Message-ID: <161340394873.1303470.6237319335883242536.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a pair of helper functions: (*) netfs_readahead() (*) netfs_readpage() to do the work of handling a readahead or a readpage, where the page(s) that form part of the request may be split between the local cache, the server or just require clearing, and may be single pages and transparent huge pages. This is all handled within the helper. Note that while both will read from the cache if there is data present, only netfs_readahead() will expand the request beyond what it was asked to do, and only netfs_readahead() will write back to the cache. netfs_readpage(), on the other hand, is synchronous and only fetches the page (which might be a THP) it is asked for. The netfs gives the helper parameters from the VM, the cache cookie it wants to use (or NULL) and a table of operations (only one of which is mandatory): (*) expand_readahead() [optional] Called to allow the netfs to request an expansion of a readahead request to meet its own alignment requirements. This is done by changing rreq->start and rreq->len. (*) clamp_length() [optional] Called to allow the netfs to cut down a subrequest to meet its own boundary requirements. If it does this, the helper will generate additional subrequests until the full request is satisfied. (*) is_still_valid() [optional] Called to find out if the data just read from the cache has been invalidated and must be reread from the server. (*) issue_op() [required] Called to ask the netfs to issue a read to the server. The subrequest describes the read. The read request holds information about the file being accessed. The netfs can cache information in rreq->netfs_priv. Upon completion, the netfs should set the error, transferred and can also set FSCACHE_SREQ_CLEAR_TAIL and then call fscache_subreq_terminated(). (*) done() [optional] Called after the pages have been unlocked. The read request is still pinning the file and mapping and may still be pinning pages with PG_fscache. rreq->error indicates any error that has been accumulated. (*) cleanup() [optional] Called when the helper is disposing of a finished read request. This allows the netfs to clear rreq->netfs_priv. Netfs support is enabled with CONFIG_NETFS_SUPPORT=y. It will be built even if CONFIG_FSCACHE=n and in this case much of it should be optimised away, allowing the filesystem to use it even when caching is disabled. changes: - Folded in a kerneldoc comment fix - Folded in a fix for the error handling in the case that ENOMEM occurs. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- fs/Kconfig | 1 fs/Makefile | 1 fs/netfs/Makefile | 6 fs/netfs/internal.h | 61 ++++ fs/netfs/read_helper.c | 717 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 82 +++++ 6 files changed, 868 insertions(+) create mode 100644 fs/netfs/Makefile create mode 100644 fs/netfs/internal.h create mode 100644 fs/netfs/read_helper.c diff --git a/fs/Kconfig b/fs/Kconfig index aa4c12282301..a8ab2802c52e 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -125,6 +125,7 @@ source "fs/overlayfs/Kconfig" menu "Caches" +source "fs/netfs/Kconfig" source "fs/fscache/Kconfig" source "fs/cachefiles/Kconfig" diff --git a/fs/Makefile b/fs/Makefile index 999d1a23f036..e922c3bc0a43 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -68,6 +68,7 @@ obj-$(CONFIG_PROFILING) += dcookies.o obj-$(CONFIG_DLM) += dlm/ # Do not add any filesystems before this line +obj-$(CONFIG_NETFS_SUPPORT) += netfs/ obj-$(CONFIG_FSCACHE) += fscache/ obj-$(CONFIG_REISERFS_FS) += reiserfs/ obj-$(CONFIG_EXT4_FS) += ext4/ diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile new file mode 100644 index 000000000000..4b4eff2ba369 --- /dev/null +++ b/fs/netfs/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 + +netfs-y := \ + read_helper.o + +obj-$(CONFIG_NETFS_SUPPORT) := netfs.o diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h new file mode 100644 index 000000000000..ee665c0e7dc8 --- /dev/null +++ b/fs/netfs/internal.h @@ -0,0 +1,61 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Internal definitions for network filesystem support + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) "netfs: " fmt + +/* + * read_helper.c + */ +extern unsigned int netfs_debug; + +#define netfs_stat(x) do {} while(0) +#define netfs_stat_d(x) do {} while(0) + +/*****************************************************************************/ +/* + * debug tracing + */ +#define dbgprintk(FMT, ...) \ + printk("[%-6.6s] "FMT"\n", current->comm, ##__VA_ARGS__) + +#define kenter(FMT, ...) dbgprintk("==> %s("FMT")", __func__, ##__VA_ARGS__) +#define kleave(FMT, ...) dbgprintk("<== %s()"FMT"", __func__, ##__VA_ARGS__) +#define kdebug(FMT, ...) dbgprintk(FMT, ##__VA_ARGS__) + +#ifdef __KDEBUG +#define _enter(FMT, ...) kenter(FMT, ##__VA_ARGS__) +#define _leave(FMT, ...) kleave(FMT, ##__VA_ARGS__) +#define _debug(FMT, ...) kdebug(FMT, ##__VA_ARGS__) + +#elif defined(CONFIG_NETFS_DEBUG) +#define _enter(FMT, ...) \ +do { \ + if (netfs_debug) \ + kenter(FMT, ##__VA_ARGS__); \ +} while (0) + +#define _leave(FMT, ...) \ +do { \ + if (netfs_debug) \ + kleave(FMT, ##__VA_ARGS__); \ +} while (0) + +#define _debug(FMT, ...) \ +do { \ + if (netfs_debug) \ + kdebug(FMT, ##__VA_ARGS__); \ +} while (0) + +#else +#define _enter(FMT, ...) no_printk("==> %s("FMT")", __func__, ##__VA_ARGS__) +#define _leave(FMT, ...) no_printk("<== %s()"FMT"", __func__, ##__VA_ARGS__) +#define _debug(FMT, ...) no_printk(FMT, ##__VA_ARGS__) +#endif diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c new file mode 100644 index 000000000000..a35bd29928fa --- /dev/null +++ b/fs/netfs/read_helper.c @@ -0,0 +1,717 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Network filesystem high-level read support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "internal.h" + +MODULE_DESCRIPTION("Network fs support"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +unsigned netfs_debug; +module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask"); + +static void netfs_rreq_work(struct work_struct *); +static void __netfs_put_subrequest(struct netfs_read_subrequest *); + +static void netfs_put_subrequest(struct netfs_read_subrequest *subreq) +{ + if (refcount_dec_and_test(&subreq->usage)) + __netfs_put_subrequest(subreq); +} + +static struct netfs_read_request *netfs_alloc_read_request( + const struct netfs_read_request_ops *ops, void *netfs_priv, + struct file *file) +{ + struct netfs_read_request *rreq; + + rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL); + if (rreq) { + rreq->netfs_ops = ops; + rreq->netfs_priv = netfs_priv; + rreq->inode = file_inode(file); + rreq->i_size = i_size_read(rreq->inode); + INIT_LIST_HEAD(&rreq->subrequests); + INIT_WORK(&rreq->work, netfs_rreq_work); + refcount_set(&rreq->usage, 1); + __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); + ops->init_rreq(rreq, file); + } + + return rreq; +} + +static void netfs_get_read_request(struct netfs_read_request *rreq) +{ + refcount_inc(&rreq->usage); +} + +static void netfs_rreq_clear_subreqs(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + while (!list_empty(&rreq->subrequests)) { + subreq = list_first_entry(&rreq->subrequests, + struct netfs_read_subrequest, rreq_link); + list_del(&subreq->rreq_link); + netfs_put_subrequest(subreq); + } +} + +static void netfs_free_read_request(struct work_struct *work) +{ + struct netfs_read_request *rreq = + container_of(work, struct netfs_read_request, work); + netfs_rreq_clear_subreqs(rreq); + if (rreq->netfs_priv) + rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); + kfree(rreq); +} + +static void netfs_put_read_request(struct netfs_read_request *rreq) +{ + if (refcount_dec_and_test(&rreq->usage)) { + if (in_softirq()) { + rreq->work.func = netfs_free_read_request; + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_free_read_request(&rreq->work); + } + } +} + +/* + * Allocate and partially initialise an I/O request structure. + */ +static struct netfs_read_subrequest *netfs_alloc_subrequest( + struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + subreq = kzalloc(sizeof(struct netfs_read_subrequest), GFP_KERNEL); + if (subreq) { + INIT_LIST_HEAD(&subreq->rreq_link); + refcount_set(&subreq->usage, 2); + subreq->rreq = rreq; + netfs_get_read_request(rreq); + } + + return subreq; +} + +static void netfs_get_read_subrequest(struct netfs_read_subrequest *subreq) +{ + refcount_inc(&subreq->usage); +} + +static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq) +{ + netfs_put_read_request(subreq->rreq); + kfree(subreq); +} + +/* + * Clear the unread part of an I/O request. + */ +static void netfs_clear_unread(struct netfs_read_subrequest *subreq) +{ + struct iov_iter iter; + + iov_iter_xarray(&iter, WRITE, &subreq->rreq->mapping->i_pages, + subreq->start + subreq->transferred, + subreq->len - subreq->transferred); + iov_iter_zero(iov_iter_count(&iter), &iter); +} + +/* + * Fill a subrequest region with zeroes. + */ +static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + netfs_subreq_terminated(subreq, 0); +} + +/* + * Ask the netfs to issue a read request to the server for us. + * + * The netfs is expected to read from subreq->pos + subreq->transferred to + * subreq->pos + subreq->len - 1. It may not backtrack and write data into the + * buffer prior to the transferred point as it might clobber dirty data + * obtained from the cache. + * + * Alternatively, the netfs is allowed to indicate one of two things: + * + * - NETFS_SREQ_SHORT_READ: A short read - it will get called again to try and + * make progress. + * + * - NETFS_SREQ_CLEAR_TAIL: A short read - the rest of the buffer will be + * cleared. + */ +static void netfs_read_from_server(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + rreq->netfs_ops->issue_op(subreq); +} + +/* + * Release those waiting. + */ +static void netfs_rreq_completed(struct netfs_read_request *rreq) +{ + netfs_rreq_clear_subreqs(rreq); + netfs_put_read_request(rreq); +} + +/* + * Unlock the pages in a read operation. We need to set PG_fscache on any + * pages we're going to write back before we unlock them. + */ +static void netfs_rreq_unlock(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + struct page *page; + unsigned int iopos, account = 0; + pgoff_t start_page = rreq->start / PAGE_SIZE; + pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1; + bool subreq_failed = false; + int i; + + XA_STATE(xas, &rreq->mapping->i_pages, start_page); + + if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) { + __clear_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + __clear_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); + } + } + + /* Walk through the pagecache and the I/O request lists simultaneously. + * We may have a mixture of cached and uncached sections and we only + * really want to write out the uncached sections. This is slightly + * complicated by the possibility that we might have huge pages with a + * mixture inside. + */ + subreq = list_first_entry(&rreq->subrequests, + struct netfs_read_subrequest, rreq_link); + iopos = 0; + subreq_failed = (subreq->error < 0); + + rcu_read_lock(); + xas_for_each(&xas, page, last_page) { + unsigned int pgpos = (page->index - start_page) * PAGE_SIZE; + unsigned int pgend = pgpos + thp_size(page); + bool pg_failed = false; + + for (;;) { + if (!subreq) { + pg_failed = true; + break; + } + if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) + SetPageFsCache(page); + pg_failed |= subreq_failed; + if (pgend < iopos + subreq->len) + break; + + account += subreq->transferred; + iopos += subreq->len; + if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + subreq = list_next_entry(subreq, rreq_link); + subreq_failed = (subreq->error < 0); + } else { + subreq = NULL; + subreq_failed = false; + } + if (pgend == iopos) + break; + } + + if (!pg_failed) { + for (i = 0; i < thp_nr_pages(page); i++) + flush_dcache_page(page); + SetPageUptodate(page); + } + + if (!test_bit(NETFS_RREQ_DONT_UNLOCK_PAGES, &rreq->flags)) { + if (page->index == rreq->no_unlock_page && + test_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags)) + _debug("no unlock"); + else + unlock_page(page); + } + } + rcu_read_unlock(); + + task_io_account_read(account); + if (rreq->netfs_ops->done) + rreq->netfs_ops->done(rreq); +} + +/* + * Handle a short read. + */ +static void netfs_rreq_short_read(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); + __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + + netfs_get_read_subrequest(subreq); + atomic_inc(&rreq->nr_rd_ops); + netfs_read_from_server(rreq, subreq); +} + +/* + * Resubmit any short or failed operations. Returns true if we got the rreq + * ref back. + */ +static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + WARN_ON(in_softirq()); + + /* We don't want terminating submissions trying to wake us up whilst + * we're still going through the list. + */ + atomic_inc(&rreq->nr_rd_ops); + + __clear_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + if (subreq->error) { + if (subreq->source != NETFS_READ_FROM_CACHE) + break; + subreq->source = NETFS_DOWNLOAD_FROM_SERVER; + subreq->error = 0; + netfs_get_read_subrequest(subreq); + atomic_inc(&rreq->nr_rd_ops); + netfs_read_from_server(rreq, subreq); + } else if (test_bit(NETFS_SREQ_SHORT_READ, &subreq->flags)) { + netfs_rreq_short_read(rreq, subreq); + } + } + + /* If we decrement nr_rd_ops to 0, the usage ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_rd_ops)) + return true; + + wake_up_var(&rreq->nr_rd_ops); + return false; +} + +/* + * Assess the state of a read request and decide what to do next. + * + * Note that we could be in an ordinary kernel thread, on a workqueue or in + * softirq context at this point. We inherit a ref from the caller. + */ +static void netfs_rreq_assess(struct netfs_read_request *rreq) +{ +again: + if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && + test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { + if (netfs_rreq_perform_resubmissions(rreq)) + goto again; + return; + } + + netfs_rreq_unlock(rreq); + + clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); + wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); + + netfs_rreq_completed(rreq); +} + +static void netfs_rreq_work(struct work_struct *work) +{ + struct netfs_read_request *rreq = + container_of(work, struct netfs_read_request, work); + netfs_rreq_assess(rreq); +} + +/* + * Handle the completion of all outstanding I/O operations on a read request. + * We inherit a ref from the caller. + */ +static void netfs_rreq_terminated(struct netfs_read_request *rreq) +{ + if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) && + in_softirq()) { + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_rreq_assess(rreq); + } +} + +/** + * netfs_subreq_terminated - Note the termination of an I/O operation. + * @subreq: The I/O request that has terminated. + * @transferred_or_error: The amount of data transferred or an error code. + * + * This tells the read helper that a contributory I/O operation has terminated, + * one way or another, and that it should integrate the results. + * + * The caller indicates in @transferred_or_error the outcome of the operation, + * supplying a positive value to indicate the number of bytes transferred, 0 to + * indicate a failure to transfer anything that should be retried or a negative + * error code. The helper will look after reissuing I/O operations as + * appropriate and writing downloaded data to the cache. + * + * This may be called from a softirq handler, so we want to avoid taking the + * spinlock if we can. + */ +void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, + ssize_t transferred_or_error) +{ + struct netfs_read_request *rreq = subreq->rreq; + int u; + + _enter("[%u]{%llx,%lx},%zd", + subreq->debug_index, subreq->start, subreq->flags, + transferred_or_error); + + if (IS_ERR_VALUE(transferred_or_error)) { + subreq->error = transferred_or_error; + goto failed; + } + + if (WARN_ON(transferred_or_error > subreq->len - subreq->transferred)) + transferred_or_error = subreq->len - subreq->transferred; + + subreq->error = 0; + subreq->transferred += transferred_or_error; + if (subreq->transferred < subreq->len) + goto incomplete; + +complete: + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) + set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); + +out: + /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ + u = atomic_dec_return(&rreq->nr_rd_ops); + if (u == 0) + netfs_rreq_terminated(rreq); + else if (u == 1) + wake_up_var(&rreq->nr_rd_ops); + + netfs_put_subrequest(subreq); + return; + +incomplete: + if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) { + netfs_clear_unread(subreq); + subreq->transferred = subreq->len; + goto complete; + } + + if (transferred_or_error == 0) { + if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { + subreq->error = -ENODATA; + goto failed; + } + } else { + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + } + + __set_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); + set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + goto out; + +failed: + if (subreq->source == NETFS_READ_FROM_CACHE) { + set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + } else { + set_bit(NETFS_RREQ_FAILED, &rreq->flags); + rreq->error = subreq->error; + } + goto out; +} +EXPORT_SYMBOL(netfs_subreq_terminated); + +static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequest *subreq, + loff_t i_size) +{ + struct netfs_read_request *rreq = subreq->rreq; + + if (subreq->start >= rreq->i_size) + return NETFS_FILL_WITH_ZEROES; + return NETFS_DOWNLOAD_FROM_SERVER; +} + +/* + * Work out what sort of subrequest the next one will be. + */ +static enum netfs_read_source +netfs_rreq_prepare_read(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + enum netfs_read_source source; + + _enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size); + + source = netfs_cache_prepare_read(subreq, rreq->i_size); + if (source == NETFS_INVALID_READ) + goto out; + + if (source == NETFS_DOWNLOAD_FROM_SERVER) { + /* Call out to the netfs to let it shrink the request to fit + * its own I/O sizes and boundaries. If it shinks it here, it + * will be called again to make simultaneous calls; if it wants + * to make serial calls, it can indicate a short read and then + * we will call it again. + */ + if (subreq->len > rreq->i_size - subreq->start) + subreq->len = rreq->i_size - subreq->start; + + if (rreq->netfs_ops->clamp_length && + !rreq->netfs_ops->clamp_length(subreq)) { + source = NETFS_INVALID_READ; + goto out; + } + } + + if (WARN_ON(subreq->len == 0)) + source = NETFS_INVALID_READ; + +out: + subreq->source = source; + return source; +} + +/* + * Slice off a piece of a read request and submit an I/O request for it. + */ +static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, + unsigned int *_debug_index) +{ + struct netfs_read_subrequest *subreq; + enum netfs_read_source source; + + subreq = netfs_alloc_subrequest(rreq); + if (!subreq) + return false; + + subreq->debug_index = (*_debug_index)++; + subreq->start = rreq->start + rreq->submitted; + subreq->len = rreq->len - rreq->submitted; + + _debug("slice %llx,%zx,%zx", subreq->start, subreq->len, rreq->submitted); + list_add_tail(&subreq->rreq_link, &rreq->subrequests); + + /* Call out to the cache to find out what it can do with the remaining + * subset. It tells us in subreq->flags what it decided should be done + * and adjusts subreq->len down if the subset crosses a cache boundary. + * + * Then when we hand the subset, it can choose to take a subset of that + * (the starts must coincide), in which case, we go around the loop + * again and ask it to download the next piece. + */ + source = netfs_rreq_prepare_read(rreq, subreq); + if (source == NETFS_INVALID_READ) + goto subreq_failed; + + atomic_inc(&rreq->nr_rd_ops); + + rreq->submitted += subreq->len; + + switch (source) { + case NETFS_FILL_WITH_ZEROES: + netfs_fill_with_zeroes(rreq, subreq); + break; + case NETFS_DOWNLOAD_FROM_SERVER: + netfs_read_from_server(rreq, subreq); + break; + default: + BUG(); + } + + return true; + +subreq_failed: + rreq->error = subreq->error; + netfs_put_subrequest(subreq); + return false; +} + +static void netfs_rreq_expand(struct netfs_read_request *rreq, + struct readahead_control *ractl) +{ + /* Give the netfs a chance to change the request parameters. The + * resultant request must contain the original region. + */ + if (rreq->netfs_ops->expand_readahead) + rreq->netfs_ops->expand_readahead(rreq); + + /* Expand the request if the cache wants it to start earlier. Note + * that the expansion may get further extended if the VM wishes to + * insert THPs and the preferred start and/or end wind up in the middle + * of THPs. + * + * If this is the case, however, the THP size should be an integer + * multiple of the cache granule size, so we get a whole number of + * granules to deal with. + */ + if (rreq->start != readahead_pos(ractl) || + rreq->len != readahead_length(ractl)) { + readahead_expand(ractl, rreq->start, rreq->len); + rreq->start = readahead_pos(ractl); + rreq->len = readahead_length(ractl); + } +} + +/** + * netfs_readahead - Helper to manage a read request + * @ractl: The description of the readahead request + * @ops: The network filesystem's operations for the helper to use + * @netfs_priv: Private netfs data to be retained in the request + * + * Fulfil a readahead request by drawing data from the cache if possible, or + * the netfs if not. Space beyond the EOF is zero-filled. Multiple I/O + * requests from different sources will get munged together. If necessary, the + * readahead window can be expanded in either direction to a more convenient + * alighment for RPC efficiency or to make storage in the cache feasible. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. It may also be passed a private token, which will + * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * + * This is usable whether or not caching is enabled. + */ +void netfs_readahead(struct readahead_control *ractl, + const struct netfs_read_request_ops *ops, + void *netfs_priv) +{ + struct netfs_read_request *rreq; + struct page *page; + unsigned int debug_index = 0; + + _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); + + if (readahead_count(ractl) == 0) + goto cleanup; + + rreq = netfs_alloc_read_request(ops, netfs_priv, ractl->file); + if (!rreq) + goto cleanup; + rreq->mapping = ractl->mapping; + rreq->start = readahead_pos(ractl); + rreq->len = readahead_length(ractl); + + netfs_rreq_expand(rreq, ractl); + + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + if (rreq->submitted == 0) { + netfs_put_read_request(rreq); + return; + } + + // TODO: If we didn't submit enough readage, we need to try punting to + // a work queue. + + while ((page = readahead_page(ractl))) + put_page(page); + + /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_rd_ops)) + netfs_rreq_assess(rreq); + return; + +cleanup: + if (netfs_priv) + ops->cleanup(ractl->mapping, netfs_priv); + return; +} +EXPORT_SYMBOL(netfs_readahead); + +/** + * netfs_page - Helper to manage a readpage request + * @file: The file to read from + * @page: The page to read + * @ops: The network filesystem's operations for the helper to use + * @netfs_priv: Private netfs data to be retained in the request + * + * Fulfil a readpage request by drawing data from the cache if possible, or the + * netfs if not. Space beyond the EOF is zero-filled. Multiple I/O requests + * from different sources will get munged together. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. It may also be passed a private token, which will + * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * + * This is usable whether or not caching is enabled. + */ +int netfs_readpage(struct file *file, + struct page *page, + const struct netfs_read_request_ops *ops, + void *netfs_priv) +{ + struct netfs_read_request *rreq; + unsigned int debug_index = 0; + int ret; + + _enter("%lx", page->index); + + rreq = netfs_alloc_read_request(ops, netfs_priv, file); + if (!rreq) { + if (netfs_priv) + ops->cleanup(netfs_priv, page->mapping); + unlock_page(page); + return -ENOMEM; + } + rreq->mapping = page->mapping; + rreq->start = page->index * PAGE_SIZE; + rreq->len = thp_size(page); + + netfs_get_read_request(rreq); + + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + /* Keep nr_rd_ops incremented so that the ref always belongs to us, and + * the service code isn't punted off to a random thread pool to + * process. + */ + do { + wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); + netfs_rreq_assess(rreq); + } while (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)); + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) + ret = -EIO; + netfs_put_read_request(rreq); + return ret; +} +EXPORT_SYMBOL(netfs_readpage); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index f69703543788..7f74d668a459 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -10,6 +10,8 @@ #ifndef _LINUX_NETFS_H #define _LINUX_NETFS_H +#include +#include #include /* @@ -51,4 +53,84 @@ static inline void wait_on_page_fscache(struct page *page) wait_on_page_bit(compound_head(page), PG_fscache); } +enum netfs_read_source { + NETFS_FILL_WITH_ZEROES, + NETFS_DOWNLOAD_FROM_SERVER, + NETFS_READ_FROM_CACHE, + NETFS_INVALID_READ, +} __mode(byte); + +/* + * Descriptor for a single component subrequest. + */ +struct netfs_read_subrequest { + struct netfs_read_request *rreq; /* Supervising read request */ + struct list_head rreq_link; /* Link in rreq->subrequests */ + loff_t start; /* Where to start the I/O */ + size_t len; /* Size of the I/O */ + size_t transferred; /* Amount of data transferred */ + refcount_t usage; + short error; /* 0 or error that occurred */ + unsigned short debug_index; /* Index in list (for debugging output) */ + enum netfs_read_source source; /* Where to read from */ + unsigned long flags; +#define NETFS_SREQ_WRITE_TO_CACHE 0 /* Set if should write to cache */ +#define NETFS_SREQ_CLEAR_TAIL 1 /* Set if the rest of the read should be cleared */ +#define NETFS_SREQ_SHORT_READ 2 /* Set if there was a short read from the cache */ +#define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */ +#define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */ +}; + +/* + * Descriptor for a read helper request. This is used to make multiple I/O + * requests on a variety of sources and then stitch the result together. + */ +struct netfs_read_request { + struct work_struct work; + struct inode *inode; /* The file being accessed */ + struct address_space *mapping; /* The mapping being accessed */ + struct list_head subrequests; /* Requests to fetch I/O from disk or net */ + void *netfs_priv; /* Private data for the netfs */ + atomic_t nr_rd_ops; /* Number of read ops in progress */ + size_t submitted; /* Amount submitted for I/O so far */ + size_t len; /* Length of the request */ + short error; /* 0 or error that occurred */ + loff_t i_size; /* Size of the file */ + loff_t start; /* Start position */ + pgoff_t no_unlock_page; /* Don't unlock this page after read */ + refcount_t usage; + unsigned long flags; +#define NETFS_RREQ_INCOMPLETE_IO 0 /* Some ioreqs terminated short or with error */ +#define NETFS_RREQ_WRITE_TO_CACHE 1 /* Need to write to the cache */ +#define NETFS_RREQ_NO_UNLOCK_PAGE 2 /* Don't unlock no_unlock_page on completion */ +#define NETFS_RREQ_DONT_UNLOCK_PAGES 3 /* Don't unlock the pages on completion */ +#define NETFS_RREQ_FAILED 4 /* The request failed */ +#define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */ + const struct netfs_read_request_ops *netfs_ops; +}; + +/* + * Operations the network filesystem can/must provide to the helpers. + */ +struct netfs_read_request_ops { + void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); + void (*expand_readahead)(struct netfs_read_request *rreq); + bool (*clamp_length)(struct netfs_read_subrequest *subreq); + void (*issue_op)(struct netfs_read_subrequest *subreq); + bool (*is_still_valid)(struct netfs_read_request *rreq); + void (*done)(struct netfs_read_request *rreq); + void (*cleanup)(struct address_space *mapping, void *netfs_priv); +}; + +struct readahead_control; +extern void netfs_readahead(struct readahead_control *, + const struct netfs_read_request_ops *, + void *); +extern int netfs_readpage(struct file *, + struct page *, + const struct netfs_read_request_ops *, + void *); + +extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t); + #endif /* _LINUX_NETFS_H */ From patchwork Mon Feb 15 15:45:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088453 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNWANTED_LANGUAGE_BODY autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7111C433E0 for ; Mon, 15 Feb 2021 15:46:12 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 47AA164E8C for ; Mon, 15 Feb 2021 15:46:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 47AA164E8C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DA1C28D0114; Mon, 15 Feb 2021 10:46:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D34D58D00FD; Mon, 15 Feb 2021 10:46:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BF3418D0114; Mon, 15 Feb 2021 10:46:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0053.hostedemail.com [216.40.44.53]) by kanga.kvack.org (Postfix) with ESMTP id A546C8D00FD for ; Mon, 15 Feb 2021 10:46:11 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 337A0180FDD83 for ; Mon, 15 Feb 2021 15:46:11 +0000 (UTC) X-FDA: 77820928542.30.quiet34_551023f2763c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id 108F21801380D for ; Mon, 15 Feb 2021 15:46:11 +0000 (UTC) X-HE-Tag: quiet34_551023f2763c X-Filterd-Recvd-Size: 16469 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 15:46:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403969; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OhawpzYF9TXmoZ6dVvDLnLzXkrsQa778e+Po+y2nBnY=; b=BxDgWeEjLzwApJu/ZCye5qSXslUO6m751V+VDXmZvf232dlz17wPktU4tjFJVTd92sQZTD 7tcKpBzKP1e0XsW6KycuSxZ8VYZbpuJX784TbWtBbw3KFNEd1oOC1yrSjfZAJ+dR1Pfpnx aAZcjLVEc/x9Pczn2VjJ/ljrwEbGBwo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-418-Cy2-TE12MsWZ4xtZuKaaCw-1; Mon, 15 Feb 2021 10:46:07 -0500 X-MC-Unique: Cy2-TE12MsWZ4xtZuKaaCw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C0EEA874980; Mon, 15 Feb 2021 15:46:05 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 422465C241; Mon, 15 Feb 2021 15:45:59 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 09/33] netfs: Add tracepoints From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:45:58 +0000 Message-ID: <161340395843.1303470.7355519662919639648.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add three tracepoints to track the activity of the read helpers: (1) netfs/netfs_read This logs entry to the read helpers and also expansion of the range in a readahead request. (2) netfs/netfs_rreq This logs the progress of netfs_read_request objects which track read requests. A read request may be a compound of multiple subrequests. (3) netfs/netfs_sreq This logs the progress of netfs_read_subrequest objects, which track the contributions from various sources to a read request. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- fs/netfs/read_helper.c | 28 ++++++ include/linux/netfs.h | 2 include/trace/events/netfs.h | 199 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 229 insertions(+) create mode 100644 include/trace/events/netfs.h diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index a35bd29928fa..53b04b105179 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -16,6 +16,8 @@ #include #include #include "internal.h" +#define CREATE_TRACE_POINTS +#include MODULE_DESCRIPTION("Network fs support"); MODULE_AUTHOR("Red Hat, Inc."); @@ -38,6 +40,7 @@ static struct netfs_read_request *netfs_alloc_read_request( const struct netfs_read_request_ops *ops, void *netfs_priv, struct file *file) { + static atomic_t debug_ids; struct netfs_read_request *rreq; rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL); @@ -46,6 +49,7 @@ static struct netfs_read_request *netfs_alloc_read_request( rreq->netfs_priv = netfs_priv; rreq->inode = file_inode(file); rreq->i_size = i_size_read(rreq->inode); + rreq->debug_id = atomic_inc_return(&debug_ids); INIT_LIST_HEAD(&rreq->subrequests); INIT_WORK(&rreq->work, netfs_rreq_work); refcount_set(&rreq->usage, 1); @@ -80,6 +84,7 @@ static void netfs_free_read_request(struct work_struct *work) netfs_rreq_clear_subreqs(rreq); if (rreq->netfs_priv) rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); + trace_netfs_rreq(rreq, netfs_rreq_trace_free); kfree(rreq); } @@ -122,6 +127,7 @@ static void netfs_get_read_subrequest(struct netfs_read_subrequest *subreq) static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq) { + trace_netfs_sreq(subreq, netfs_sreq_trace_free); netfs_put_read_request(subreq->rreq); kfree(subreq); } @@ -176,6 +182,7 @@ static void netfs_read_from_server(struct netfs_read_request *rreq, */ static void netfs_rreq_completed(struct netfs_read_request *rreq) { + trace_netfs_rreq(rreq, netfs_rreq_trace_done); netfs_rreq_clear_subreqs(rreq); netfs_put_read_request(rreq); } @@ -214,6 +221,8 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) iopos = 0; subreq_failed = (subreq->error < 0); + trace_netfs_rreq(rreq, netfs_rreq_trace_unlock); + rcu_read_lock(); xas_for_each(&xas, page, last_page) { unsigned int pgpos = (page->index - start_page) * PAGE_SIZE; @@ -274,6 +283,8 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq, __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short); + netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); netfs_read_from_server(rreq, subreq); @@ -289,6 +300,8 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) WARN_ON(in_softirq()); + trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); + /* We don't want terminating submissions trying to wake us up whilst * we're still going through the list. */ @@ -301,6 +314,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) break; subreq->source = NETFS_DOWNLOAD_FROM_SERVER; subreq->error = 0; + trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead); netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); netfs_read_from_server(rreq, subreq); @@ -325,6 +339,8 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) */ static void netfs_rreq_assess(struct netfs_read_request *rreq) { + trace_netfs_rreq(rreq, netfs_rreq_trace_assess); + again: if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { @@ -409,6 +425,8 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); out: + trace_netfs_sreq(subreq, netfs_sreq_trace_terminated); + /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ u = atomic_dec_return(&rreq->nr_rd_ops); if (u == 0) @@ -497,6 +515,7 @@ netfs_rreq_prepare_read(struct netfs_read_request *rreq, out: subreq->source = source; + trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); return source; } @@ -536,6 +555,7 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, rreq->submitted += subreq->len; + trace_netfs_sreq(subreq, netfs_sreq_trace_submit); switch (source) { case NETFS_FILL_WITH_ZEROES: netfs_fill_with_zeroes(rreq, subreq); @@ -578,6 +598,9 @@ static void netfs_rreq_expand(struct netfs_read_request *rreq, readahead_expand(ractl, rreq->start, rreq->len); rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); + + trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), + netfs_read_trace_expanded); } } @@ -619,6 +642,9 @@ void netfs_readahead(struct readahead_control *ractl, rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); + trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), + netfs_read_trace_readahead); + netfs_rreq_expand(rreq, ractl); atomic_set(&rreq->nr_rd_ops, 1); @@ -690,6 +716,8 @@ int netfs_readpage(struct file *file, rreq->start = page->index * PAGE_SIZE; rreq->len = thp_size(page); + trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); + netfs_get_read_request(rreq); atomic_set(&rreq->nr_rd_ops, 1); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 7f74d668a459..24083dc0adfa 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -91,6 +91,8 @@ struct netfs_read_request { struct address_space *mapping; /* The mapping being accessed */ struct list_head subrequests; /* Requests to fetch I/O from disk or net */ void *netfs_priv; /* Private data for the netfs */ + unsigned int debug_id; + unsigned int cookie_debug_id; atomic_t nr_rd_ops; /* Number of read ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h new file mode 100644 index 000000000000..12ad382764c5 --- /dev/null +++ b/include/trace/events/netfs.h @@ -0,0 +1,199 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Network filesystem support module tracepoints + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM netfs + +#if !defined(_TRACE_NETFS_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_NETFS_H + +#include + +/* + * Define enums for tracing information. + */ +#ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY +#define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY + +enum netfs_read_trace { + netfs_read_trace_expanded, + netfs_read_trace_readahead, + netfs_read_trace_readpage, +}; + +enum netfs_rreq_trace { + netfs_rreq_trace_assess, + netfs_rreq_trace_done, + netfs_rreq_trace_free, + netfs_rreq_trace_resubmit, + netfs_rreq_trace_unlock, + netfs_rreq_trace_unmark, + netfs_rreq_trace_write, +}; + +enum netfs_sreq_trace { + netfs_sreq_trace_download_instead, + netfs_sreq_trace_free, + netfs_sreq_trace_prepare, + netfs_sreq_trace_resubmit_short, + netfs_sreq_trace_submit, + netfs_sreq_trace_terminated, + netfs_sreq_trace_write, + netfs_sreq_trace_write_term, +}; + +#endif + +#define netfs_read_traces \ + EM(netfs_read_trace_expanded, "EXPANDED ") \ + EM(netfs_read_trace_readahead, "READAHEAD") \ + E_(netfs_read_trace_readpage, "READPAGE ") + +#define netfs_rreq_traces \ + EM(netfs_rreq_trace_assess, "ASSESS") \ + EM(netfs_rreq_trace_done, "DONE ") \ + EM(netfs_rreq_trace_free, "FREE ") \ + EM(netfs_rreq_trace_resubmit, "RESUBM") \ + EM(netfs_rreq_trace_unlock, "UNLOCK") \ + EM(netfs_rreq_trace_unmark, "UNMARK") \ + E_(netfs_rreq_trace_write, "WRITE ") + +#define netfs_sreq_sources \ + EM(NETFS_FILL_WITH_ZEROES, "ZERO") \ + EM(NETFS_DOWNLOAD_FROM_SERVER, "DOWN") \ + EM(NETFS_READ_FROM_CACHE, "READ") \ + E_(NETFS_INVALID_READ, "INVL") \ + +#define netfs_sreq_traces \ + EM(netfs_sreq_trace_download_instead, "RDOWN") \ + EM(netfs_sreq_trace_free, "FREE ") \ + EM(netfs_sreq_trace_prepare, "PREP ") \ + EM(netfs_sreq_trace_resubmit_short, "SHORT") \ + EM(netfs_sreq_trace_submit, "SUBMT") \ + EM(netfs_sreq_trace_terminated, "TERM ") \ + EM(netfs_sreq_trace_write, "WRITE") \ + E_(netfs_sreq_trace_write_term, "WTERM") + + +/* + * Export enum symbols via userspace. + */ +#undef EM +#undef E_ +#define EM(a, b) TRACE_DEFINE_ENUM(a); +#define E_(a, b) TRACE_DEFINE_ENUM(a); + +netfs_read_traces; +netfs_rreq_traces; +netfs_sreq_sources; +netfs_sreq_traces; + +/* + * Now redefine the EM() and E_() macros to map the enums to the strings that + * will be printed in the output. + */ +#undef EM +#undef E_ +#define EM(a, b) { a, b }, +#define E_(a, b) { a, b } + +TRACE_EVENT(netfs_read, + TP_PROTO(struct netfs_read_request *rreq, + loff_t start, size_t len, + enum netfs_read_trace what), + + TP_ARGS(rreq, start, len, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned int, cookie ) + __field(loff_t, start ) + __field(size_t, len ) + __field(enum netfs_read_trace, what ) + ), + + TP_fast_assign( + __entry->rreq = rreq->debug_id; + __entry->cookie = rreq->cookie_debug_id; + __entry->start = start; + __entry->len = len; + __entry->what = what; + ), + + TP_printk("R=%08x %s c=%08x s=%llx %zx", + __entry->rreq, + __print_symbolic(__entry->what, netfs_read_traces), + __entry->cookie, + __entry->start, __entry->len) + ); + +TRACE_EVENT(netfs_rreq, + TP_PROTO(struct netfs_read_request *rreq, + enum netfs_rreq_trace what), + + TP_ARGS(rreq, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned short, flags ) + __field(enum netfs_rreq_trace, what ) + ), + + TP_fast_assign( + __entry->rreq = rreq->debug_id; + __entry->flags = rreq->flags; + __entry->what = what; + ), + + TP_printk("R=%08x %s f=%02x", + __entry->rreq, + __print_symbolic(__entry->what, netfs_rreq_traces), + __entry->flags) + ); + +TRACE_EVENT(netfs_sreq, + TP_PROTO(struct netfs_read_subrequest *sreq, + enum netfs_sreq_trace what), + + TP_ARGS(sreq, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned short, index ) + __field(short, error ) + __field(unsigned short, flags ) + __field(enum netfs_read_source, source ) + __field(enum netfs_sreq_trace, what ) + __field(size_t, len ) + __field(size_t, transferred ) + __field(loff_t, start ) + ), + + TP_fast_assign( + __entry->rreq = sreq->rreq->debug_id; + __entry->index = sreq->debug_index; + __entry->error = sreq->error; + __entry->flags = sreq->flags; + __entry->source = sreq->source; + __entry->what = what; + __entry->len = sreq->len; + __entry->transferred = sreq->transferred; + __entry->start = sreq->start; + ), + + TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx/%zx e=%d", + __entry->rreq, __entry->index, + __print_symbolic(__entry->what, netfs_sreq_traces), + __print_symbolic(__entry->source, netfs_sreq_sources), + __entry->flags, + __entry->start, __entry->transferred, __entry->len, + __entry->error) + ); + +#endif /* _TRACE_NETFS_H */ + +/* This part must be outside protection */ +#include From patchwork Mon Feb 15 15:46:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088455 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B496FC433DB for ; Mon, 15 Feb 2021 15:46:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4D31164DDA for ; Mon, 15 Feb 2021 15:46:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4D31164DDA Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D29B98D0115; Mon, 15 Feb 2021 10:46:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id CDBA88D00FD; Mon, 15 Feb 2021 10:46:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BF0E08D0115; Mon, 15 Feb 2021 10:46:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0254.hostedemail.com [216.40.44.254]) by kanga.kvack.org (Postfix) with ESMTP id A78008D00FD for ; Mon, 15 Feb 2021 10:46:23 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 66C4C1820BA7A for ; Mon, 15 Feb 2021 15:46:23 +0000 (UTC) X-FDA: 77820929046.13.rice63_0b035092763c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id 409E2180FDD83 for ; Mon, 15 Feb 2021 15:46:23 +0000 (UTC) X-HE-Tag: rice63_0b035092763c X-Filterd-Recvd-Size: 13312 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 15:46:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403982; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FSeGQPRgQOfnLZQYeacjhtBI6uVsAcX8IrVMZtVQC4Q=; b=VMkHXAoJ96/f4Uu4E2pznKEEyYn+1nwX8wTY/8bbjB/BKDIE2B+1cZFOAVApNCFQAhnW8H u3YBq/RM7VpJvsJ+GYVPPBQywPHxuhPDzjZETnfk/Vz7Z5AGSbe1RlgFZ96AgpNZ3s/aXv 5k9d4igA9AQKyFSDMeXe8tGlBg6vMj4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-75-RR9oVwJCMjih5l2RF-Fw4Q-1; Mon, 15 Feb 2021 10:46:20 -0500 X-MC-Unique: RR9oVwJCMjih5l2RF-Fw4Q-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 711A479EC2; Mon, 15 Feb 2021 15:46:18 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id F32BB5D9C0; Mon, 15 Feb 2021 15:46:11 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 10/33] netfs: Gather stats From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:46:11 +0000 Message-ID: <161340397101.1303470.17581910581108378458.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Gather statistics from the netfs interface that can be exported through a seqfile. This is intended to be called by a later patch when viewing /proc/fs/fscache/stats. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- fs/netfs/Kconfig | 15 +++++++++++++ fs/netfs/Makefile | 3 +-- fs/netfs/internal.h | 34 ++++++++++++++++++++++++++++++ fs/netfs/read_helper.c | 23 ++++++++++++++++++++ fs/netfs/stats.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 1 + 6 files changed, 128 insertions(+), 2 deletions(-) create mode 100644 fs/netfs/stats.c diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig index 2ebf90e6ca95..578112713703 100644 --- a/fs/netfs/Kconfig +++ b/fs/netfs/Kconfig @@ -6,3 +6,18 @@ config NETFS_SUPPORT This option enables support for network filesystems, including helpers for high-level buffered I/O, abstracting out read segmentation, local caching and transparent huge page support. + +config NETFS_STATS + bool "Gather statistical information on local caching" + depends on NETFS_SUPPORT && PROC_FS + help + This option causes statistical information to be gathered on local + caching and exported through file: + + /proc/fs/fscache/stats + + The gathering of statistics adds a certain amount of overhead to + execution as there are a quite a few stats gathered, and on a + multi-CPU system these may be on cachelines that keep bouncing + between CPUs. On the other hand, the stats are very useful for + debugging purposes. Saying 'Y' here is recommended. diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index 4b4eff2ba369..c15bfc966d96 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -netfs-y := \ - read_helper.o +netfs-y := read_helper.o stats.o obj-$(CONFIG_NETFS_SUPPORT) := netfs.o diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index ee665c0e7dc8..98b6f4516da1 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -16,8 +16,42 @@ */ extern unsigned int netfs_debug; +/* + * stats.c + */ +#ifdef CONFIG_NETFS_STATS +extern atomic_t netfs_n_rh_readahead; +extern atomic_t netfs_n_rh_readpage; +extern atomic_t netfs_n_rh_rreq; +extern atomic_t netfs_n_rh_sreq; +extern atomic_t netfs_n_rh_download; +extern atomic_t netfs_n_rh_download_done; +extern atomic_t netfs_n_rh_download_failed; +extern atomic_t netfs_n_rh_download_instead; +extern atomic_t netfs_n_rh_read; +extern atomic_t netfs_n_rh_read_done; +extern atomic_t netfs_n_rh_read_failed; +extern atomic_t netfs_n_rh_zero; +extern atomic_t netfs_n_rh_short_read; +extern atomic_t netfs_n_rh_write; +extern atomic_t netfs_n_rh_write_done; +extern atomic_t netfs_n_rh_write_failed; + + +static inline void netfs_stat(atomic_t *stat) +{ + atomic_inc(stat); +} + +static inline void netfs_stat_d(atomic_t *stat) +{ + atomic_dec(stat); +} + +#else #define netfs_stat(x) do {} while(0) #define netfs_stat_d(x) do {} while(0) +#endif /*****************************************************************************/ /* diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 53b04b105179..4f6f708f8f18 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -55,6 +55,7 @@ static struct netfs_read_request *netfs_alloc_read_request( refcount_set(&rreq->usage, 1); __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); ops->init_rreq(rreq, file); + netfs_stat(&netfs_n_rh_rreq); } return rreq; @@ -86,6 +87,7 @@ static void netfs_free_read_request(struct work_struct *work) rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); trace_netfs_rreq(rreq, netfs_rreq_trace_free); kfree(rreq); + netfs_stat_d(&netfs_n_rh_rreq); } static void netfs_put_read_request(struct netfs_read_request *rreq) @@ -115,6 +117,7 @@ static struct netfs_read_subrequest *netfs_alloc_subrequest( refcount_set(&subreq->usage, 2); subreq->rreq = rreq; netfs_get_read_request(rreq); + netfs_stat(&netfs_n_rh_sreq); } return subreq; @@ -130,6 +133,7 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq) trace_netfs_sreq(subreq, netfs_sreq_trace_free); netfs_put_read_request(subreq->rreq); kfree(subreq); + netfs_stat_d(&netfs_n_rh_sreq); } /* @@ -151,6 +155,7 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq) static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, struct netfs_read_subrequest *subreq) { + netfs_stat(&netfs_n_rh_zero); __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); netfs_subreq_terminated(subreq, 0); } @@ -174,6 +179,7 @@ static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, static void netfs_read_from_server(struct netfs_read_request *rreq, struct netfs_read_subrequest *subreq) { + netfs_stat(&netfs_n_rh_download); rreq->netfs_ops->issue_op(subreq); } @@ -283,6 +289,7 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq, __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + netfs_stat(&netfs_n_rh_short_read); trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short); netfs_get_read_subrequest(subreq); @@ -314,6 +321,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) break; subreq->source = NETFS_DOWNLOAD_FROM_SERVER; subreq->error = 0; + netfs_stat(&netfs_n_rh_download_instead); trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead); netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); @@ -406,6 +414,17 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, subreq->debug_index, subreq->start, subreq->flags, transferred_or_error); + switch (subreq->source) { + case NETFS_READ_FROM_CACHE: + netfs_stat(&netfs_n_rh_read_done); + break; + case NETFS_DOWNLOAD_FROM_SERVER: + netfs_stat(&netfs_n_rh_download_done); + break; + default: + break; + } + if (IS_ERR_VALUE(transferred_or_error)) { subreq->error = transferred_or_error; goto failed; @@ -459,8 +478,10 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, failed: if (subreq->source == NETFS_READ_FROM_CACHE) { + netfs_stat(&netfs_n_rh_read_failed); set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); } else { + netfs_stat(&netfs_n_rh_download_failed); set_bit(NETFS_RREQ_FAILED, &rreq->flags); rreq->error = subreq->error; } @@ -642,6 +663,7 @@ void netfs_readahead(struct readahead_control *ractl, rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); + netfs_stat(&netfs_n_rh_readahead); trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), netfs_read_trace_readahead); @@ -716,6 +738,7 @@ int netfs_readpage(struct file *file, rreq->start = page->index * PAGE_SIZE; rreq->len = thp_size(page); + netfs_stat(&netfs_n_rh_readpage); trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); netfs_get_read_request(rreq); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c new file mode 100644 index 000000000000..df6ff5718f25 --- /dev/null +++ b/fs/netfs/stats.c @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Netfs support statistics + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include "internal.h" + +atomic_t netfs_n_rh_readahead; +atomic_t netfs_n_rh_readpage; +atomic_t netfs_n_rh_rreq; +atomic_t netfs_n_rh_sreq; +atomic_t netfs_n_rh_download; +atomic_t netfs_n_rh_download_done; +atomic_t netfs_n_rh_download_failed; +atomic_t netfs_n_rh_download_instead; +atomic_t netfs_n_rh_read; +atomic_t netfs_n_rh_read_done; +atomic_t netfs_n_rh_read_failed; +atomic_t netfs_n_rh_zero; +atomic_t netfs_n_rh_short_read; +atomic_t netfs_n_rh_write; +atomic_t netfs_n_rh_write_done; +atomic_t netfs_n_rh_write_failed; + +void netfs_stats_show(struct seq_file *m) +{ + seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n", + atomic_read(&netfs_n_rh_readahead), + atomic_read(&netfs_n_rh_readpage), + atomic_read(&netfs_n_rh_rreq), + atomic_read(&netfs_n_rh_sreq)); + seq_printf(m, "RdHelp : ZR=%u sh=%u\n", + atomic_read(&netfs_n_rh_zero), + atomic_read(&netfs_n_rh_short_read)); + seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n", + atomic_read(&netfs_n_rh_download), + atomic_read(&netfs_n_rh_download_done), + atomic_read(&netfs_n_rh_download_failed), + atomic_read(&netfs_n_rh_download_instead)); + seq_printf(m, "RdHelp : RD=%u rs=%u rf=%u\n", + atomic_read(&netfs_n_rh_read), + atomic_read(&netfs_n_rh_read_done), + atomic_read(&netfs_n_rh_read_failed)); + seq_printf(m, "RdHelp : WR=%u ws=%u wf=%u\n", + atomic_read(&netfs_n_rh_write), + atomic_read(&netfs_n_rh_write_done), + atomic_read(&netfs_n_rh_write_failed)); +} +EXPORT_SYMBOL(netfs_stats_show); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 24083dc0adfa..b8237b6f17cb 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -134,5 +134,6 @@ extern int netfs_readpage(struct file *, void *); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t); +extern void netfs_stats_show(struct seq_file *); #endif /* _LINUX_NETFS_H */ From patchwork Mon Feb 15 15:46:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088457 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0BF88C433E0 for ; Mon, 15 Feb 2021 15:46:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 86CF664DDA for ; Mon, 15 Feb 2021 15:46:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 86CF664DDA Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1E3F88D0116; Mon, 15 Feb 2021 10:46:38 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1B7F08D00FD; Mon, 15 Feb 2021 10:46:38 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A6ED8D0116; Mon, 15 Feb 2021 10:46:38 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0152.hostedemail.com [216.40.44.152]) by kanga.kvack.org (Postfix) with ESMTP id E5CDB8D00FD for ; Mon, 15 Feb 2021 10:46:37 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id AD9D18248068 for ; Mon, 15 Feb 2021 15:46:37 +0000 (UTC) X-FDA: 77820929634.14.seat34_2e02bfc2763c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id 8EFB118229996 for ; Mon, 15 Feb 2021 15:46:37 +0000 (UTC) X-HE-Tag: seat34_2e02bfc2763c X-Filterd-Recvd-Size: 13407 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf01.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 15:46:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613403996; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1NvFOYT3VpnmCz9Zzl3doeruth2whDwdqaa9ZR5fTAA=; b=iII34ihwaQifz1lKDV7BDCVAZJ5uxxWk+Gz2U4LR84EoNhJHkRCVJokh1qWMFWwROLEQZO FUwqT062Pxl0Y5obIyP3gbsUeUbyoT4bL9CebcAy1H/05IQv6GHugvSyN6pPNr7+aZYOvd wCGCIglzEWDBNim4lhseEESdQzMCsfc= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-154-dsAS2Yf_P8uiPynAvD7FjQ-1; Mon, 15 Feb 2021 10:46:32 -0500 X-MC-Unique: dsAS2Yf_P8uiPynAvD7FjQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 71F2B18A08BF; Mon, 15 Feb 2021 15:46:30 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8A90E5D9C0; Mon, 15 Feb 2021 15:46:24 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 11/33] netfs: Add write_begin helper From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:46:23 +0000 Message-ID: <161340398368.1303470.11242918276563276090.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a helper to do the pre-reading work for the netfs write_begin address space op. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- fs/netfs/internal.h | 2 + fs/netfs/read_helper.c | 165 ++++++++++++++++++++++++++++++++++++++++++ fs/netfs/stats.c | 10 ++- include/linux/netfs.h | 8 ++ include/trace/events/netfs.h | 4 + 5 files changed, 185 insertions(+), 4 deletions(-) diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 98b6f4516da1..b7f2c4459f33 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -34,8 +34,10 @@ extern atomic_t netfs_n_rh_read_failed; extern atomic_t netfs_n_rh_zero; extern atomic_t netfs_n_rh_short_read; extern atomic_t netfs_n_rh_write; +extern atomic_t netfs_n_rh_write_begin; extern atomic_t netfs_n_rh_write_done; extern atomic_t netfs_n_rh_write_failed; +extern atomic_t netfs_n_rh_write_zskip; static inline void netfs_stat(atomic_t *stat) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 4f6f708f8f18..d179a37b92fd 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -766,3 +766,168 @@ int netfs_readpage(struct file *file, return ret; } EXPORT_SYMBOL(netfs_readpage); + +static void netfs_clear_thp(struct page *page) +{ + unsigned int i; + + for (i = 0; i < thp_nr_pages(page); i++) + clear_highpage(page + i); +} + +/** + * netfs_write_begin - Helper to prepare for writing + * @file: The file to read from + * @mapping: The mapping to read from + * @pos: File position at which the write will begin + * @len: The length of the write in this page + * @flags: AOP_* flags + * @_page: Where to put the resultant page + * @_fsdata: Place for the netfs to store a cookie + * @ops: The network filesystem's operations for the helper to use + * @netfs_priv: Private netfs data to be retained in the request + * + * Pre-read data for a write-begin request by drawing data from the cache if + * possible, or the netfs if not. Space beyond the EOF is zero-filled. + * Multiple I/O requests from different sources will get munged together. If + * necessary, the readahead window can be expanded in either direction to a + * more convenient alighment for RPC efficiency or to make storage in the cache + * feasible. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. + * + * The check_write_begin() operation can be provided to check for and flush + * conflicting writes once the page is grabbed and locked. It is passed a + * pointer to the fsdata cookie that gets returned to the VM to be passed to + * write_end. It is permitted to sleep. It should return 0 if the request + * should go ahead; unlock the page and return -EAGAIN to cause the page to be + * regot; or return an error. + * + * This is usable whether or not caching is enabled. + */ +int netfs_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned int len, unsigned int flags, + struct page **_page, void **_fsdata, + const struct netfs_read_request_ops *ops, + void *netfs_priv) +{ + struct netfs_read_request *rreq; + struct page *page, *xpage; + struct inode *inode = file_inode(file); + unsigned int debug_index = 0; + pgoff_t index = pos >> PAGE_SHIFT; + int pos_in_page = pos & ~PAGE_MASK; + loff_t size; + int ret; + + struct readahead_control ractl = { + .file = file, + .mapping = mapping, + ._index = index, + ._nr_pages = 0, + }; + +retry: + page = grab_cache_page_write_begin(mapping, index, 0); + if (!page) + return -ENOMEM; + + if (ops->check_write_begin) { + /* Allow the netfs (eg. ceph) to flush conflicts. */ + ret = ops->check_write_begin(file, pos, len, page, _fsdata); + if (ret < 0) { + if (ret == -EAGAIN) + goto retry; + goto error; + } + } + + if (PageUptodate(page)) + goto have_page; + + /* If the page is beyond the EOF, we want to clear it - unless it's + * within the cache granule containing the EOF, in which case we need + * to preload the granule. + */ + size = i_size_read(inode); + if (!ops->is_cache_enabled(inode) && + ((pos_in_page == 0 && len == thp_size(page)) || + (pos >= size) || + (pos_in_page == 0 && (pos + len) >= size))) { + netfs_clear_thp(page); + SetPageUptodate(page); + netfs_stat(&netfs_n_rh_write_zskip); + goto have_page_no_wait; + } + + ret = -ENOMEM; + rreq = netfs_alloc_read_request(ops, netfs_priv, file); + if (!rreq) + goto error; + rreq->mapping = page->mapping; + rreq->start = page->index * PAGE_SIZE; + rreq->len = thp_size(page); + rreq->no_unlock_page = page->index; + __set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags); + netfs_priv = NULL; + + netfs_stat(&netfs_n_rh_write_begin); + trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin); + + /* Expand the request to meet caching requirements and download + * preferences. + */ + ractl._nr_pages = thp_nr_pages(page); + netfs_rreq_expand(rreq, &ractl); + netfs_get_read_request(rreq); + + /* We hold the page locks, so we can drop the references */ + while ((xpage = readahead_page(&ractl))) + if (xpage != page) + put_page(xpage); + + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + /* Keep nr_rd_ops incremented so that the ref always belongs to us, and + * the service code isn't punted off to a random thread pool to + * process. + */ + for (;;) { + wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); + netfs_rreq_assess(rreq); + if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) + break; + cond_resched(); + } + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) + ret = -EIO; + netfs_put_read_request(rreq); + if (ret < 0) + goto error; + +have_page: + wait_on_page_fscache(page); +have_page_no_wait: + if (netfs_priv) + ops->cleanup(netfs_priv, mapping); + *_page = page; + _leave(" = 0"); + return 0; + +error: + unlock_page(page); + put_page(page); + if (netfs_priv) + ops->cleanup(netfs_priv, mapping); + _leave(" = %d", ret); + return ret; +} +EXPORT_SYMBOL(netfs_write_begin); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index df6ff5718f25..dd7ad66ed07e 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -24,19 +24,23 @@ atomic_t netfs_n_rh_read_failed; atomic_t netfs_n_rh_zero; atomic_t netfs_n_rh_short_read; atomic_t netfs_n_rh_write; +atomic_t netfs_n_rh_write_begin; atomic_t netfs_n_rh_write_done; atomic_t netfs_n_rh_write_failed; +atomic_t netfs_n_rh_write_zskip; void netfs_stats_show(struct seq_file *m) { - seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n", + seq_printf(m, "RdHelp : RA=%u RP=%u WB=%u rr=%u sr=%u\n", atomic_read(&netfs_n_rh_readahead), atomic_read(&netfs_n_rh_readpage), + atomic_read(&netfs_n_rh_write_begin), atomic_read(&netfs_n_rh_rreq), atomic_read(&netfs_n_rh_sreq)); - seq_printf(m, "RdHelp : ZR=%u sh=%u\n", + seq_printf(m, "RdHelp : ZR=%u sh=%u sk=%u\n", atomic_read(&netfs_n_rh_zero), - atomic_read(&netfs_n_rh_short_read)); + atomic_read(&netfs_n_rh_short_read), + atomic_read(&netfs_n_rh_write_zskip)); seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n", atomic_read(&netfs_n_rh_download), atomic_read(&netfs_n_rh_download_done), diff --git a/include/linux/netfs.h b/include/linux/netfs.h index b8237b6f17cb..ec9d1240ba49 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -115,11 +115,14 @@ struct netfs_read_request { * Operations the network filesystem can/must provide to the helpers. */ struct netfs_read_request_ops { + bool (*is_cache_enabled)(struct inode *inode); void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); void (*expand_readahead)(struct netfs_read_request *rreq); bool (*clamp_length)(struct netfs_read_subrequest *subreq); void (*issue_op)(struct netfs_read_subrequest *subreq); bool (*is_still_valid)(struct netfs_read_request *rreq); + int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, + struct page *page, void **_fsdata); void (*done)(struct netfs_read_request *rreq); void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; @@ -132,6 +135,11 @@ extern int netfs_readpage(struct file *, struct page *, const struct netfs_read_request_ops *, void *); +extern int netfs_write_begin(struct file *, struct address_space *, + loff_t, unsigned int, unsigned int, struct page **, + void **, + const struct netfs_read_request_ops *, + void *); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t); extern void netfs_stats_show(struct seq_file *); diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 12ad382764c5..a2bf6cd84bd4 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -22,6 +22,7 @@ enum netfs_read_trace { netfs_read_trace_expanded, netfs_read_trace_readahead, netfs_read_trace_readpage, + netfs_read_trace_write_begin, }; enum netfs_rreq_trace { @@ -50,7 +51,8 @@ enum netfs_sreq_trace { #define netfs_read_traces \ EM(netfs_read_trace_expanded, "EXPANDED ") \ EM(netfs_read_trace_readahead, "READAHEAD") \ - E_(netfs_read_trace_readpage, "READPAGE ") + EM(netfs_read_trace_readpage, "READPAGE ") \ + E_(netfs_read_trace_write_begin, "WRITEBEGN") #define netfs_rreq_traces \ EM(netfs_rreq_trace_assess, "ASSESS") \ From patchwork Mon Feb 15 15:46:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088459 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37187C433DB for ; Mon, 15 Feb 2021 15:46:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 85A2A64E73 for ; Mon, 15 Feb 2021 15:46:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 85A2A64E73 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 169208D0117; Mon, 15 Feb 2021 10:46:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1412D8D00FD; Mon, 15 Feb 2021 10:46:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 02FA68D0117; Mon, 15 Feb 2021 10:46:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0230.hostedemail.com [216.40.44.230]) by kanga.kvack.org (Postfix) with ESMTP id DDA648D00FD for ; Mon, 15 Feb 2021 10:46:48 -0500 (EST) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id A0CAA181CE299 for ; Mon, 15 Feb 2021 15:46:48 +0000 (UTC) X-FDA: 77820930096.19.trail81_1005ea32763c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin19.hostedemail.com (Postfix) with ESMTP id 787CC1AD1B1 for ; Mon, 15 Feb 2021 15:46:48 +0000 (UTC) X-HE-Tag: trail81_1005ea32763c X-Filterd-Recvd-Size: 21459 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf42.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 15:46:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613404007; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QE0n7rWrBQ2i7DgbcizqLoWK1/iqYLgeje7wNt2nODM=; b=ILKfVJ0EEKdRJ3fnm2gwz0tNQQzpxSs/MTIE2umLsns1t1oVHRCxsBU6p79VsSx9ge8ODS XPmzLyrCd/3QVlHQuaLonVRnD9pl9+hfUldY1DPCuauTnnzjzhSRwfepuryPeQ8jZuR/7Y drTV41dM7bcIams8R1xNjMB1klfUG/s= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-297-JSfeZ1kkO5adpEhXBnjahA-1; Mon, 15 Feb 2021 10:46:45 -0500 X-MC-Unique: JSfeZ1kkO5adpEhXBnjahA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1B7DC801962; Mon, 15 Feb 2021 15:46:43 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id BE26E5B697; Mon, 15 Feb 2021 15:46:36 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 12/33] netfs: Define an interface to talk to a cache From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:46:35 +0000 Message-ID: <161340399569.1303470.1138884774643385730.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add an interface to the netfs helper library for reading data from the cache instead of downloading it from the server and support for writing data just downloaded or cleared to the cache. The API passes an iov_iter to the cache read/write routines to indicate the data/buffer to be used. This is done using the ITER_XARRAY type to provide direct access to the netfs inode's pagecache. When the netfs's ->begin_cache_operation() method is called, this must fill in the cache_resources in the netfs_read_request struct, including the netfs_cache_ops used by the helper lib to talk to the cache. The helper lib does not directly access the cache. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- fs/netfs/read_helper.c | 230 +++++++++++++++++++++++++++++++++++++++++ fs/netfs/stats.c | 3 - include/linux/fscache-cache.h | 4 + include/linux/fscache.h | 17 +++ include/linux/netfs.h | 48 +++++++++ 5 files changed, 300 insertions(+), 2 deletions(-) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index d179a37b92fd..156941e0de61 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -86,6 +86,8 @@ static void netfs_free_read_request(struct work_struct *work) if (rreq->netfs_priv) rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); trace_netfs_rreq(rreq, netfs_rreq_trace_free); + if (rreq->cache_resources.ops) + rreq->cache_resources.ops->end_operation(&rreq->cache_resources); kfree(rreq); netfs_stat_d(&netfs_n_rh_rreq); } @@ -149,6 +151,32 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq) iov_iter_zero(iov_iter_count(&iter), &iter); } +static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error) +{ + struct netfs_read_subrequest *subreq = priv; + + netfs_subreq_terminated(subreq, transferred_or_error); +} + +/* + * Issue a read against the cache. + * - Eats the caller's ref on subreq. + */ +static void netfs_read_from_cache(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq, + bool seek_data) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + struct iov_iter iter; + + iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, + subreq->start + subreq->transferred, + subreq->len - subreq->transferred); + + cres->ops->read(cres, subreq->start, &iter, seek_data, + netfs_cache_read_terminated, subreq); +} + /* * Fill a subrequest region with zeroes. */ @@ -193,6 +221,141 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq) netfs_put_read_request(rreq); } +/* + * Deal with the completion of writing the data to the cache. We have to clear + * the PG_fscache bits on the pages involved and release the caller's ref. + * + * May be called in softirq mode and we inherit a ref from the caller. + */ +static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + struct page *page; + pgoff_t unlocked = 0; + bool have_unlocked = false; + + rcu_read_lock(); + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE); + + xas_for_each(&xas, page, (subreq->start + subreq->len - 1) / PAGE_SIZE) { + /* We might have multiple writes from the same huge + * page, but we mustn't unlock a page more than once. + */ + if (have_unlocked && page->index <= unlocked) + continue; + unlocked = page->index; + unlock_page_fscache(page); + have_unlocked = true; + } + } + + rcu_read_unlock(); + netfs_rreq_completed(rreq); +} + +static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error) +{ + struct netfs_read_subrequest *subreq = priv; + struct netfs_read_request *rreq = subreq->rreq; + + if (IS_ERR_VALUE(transferred_or_error)) { + subreq->error = transferred_or_error; + netfs_stat(&netfs_n_rh_write_failed); + } else { + subreq->error = 0; + netfs_stat(&netfs_n_rh_write_done); + } + + trace_netfs_sreq(subreq, netfs_sreq_trace_write_term); + + /* If we decrement nr_wr_ops to 0, the ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_wr_ops)) + netfs_rreq_unmark_after_write(rreq); + + netfs_put_subrequest(subreq); +} + +/* + * Perform any outstanding writes to the cache. We inherit a ref from the + * caller. + */ +static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + struct netfs_read_subrequest *subreq, *next, *p; + struct iov_iter iter; + loff_t pos; + + trace_netfs_rreq(rreq, netfs_rreq_trace_write); + + /* We don't want terminating writes trying to wake us up whilst we're + * still going through the list. + */ + atomic_inc(&rreq->nr_wr_ops); + + list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { + if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { + list_del_init(&subreq->rreq_link); + netfs_put_subrequest(subreq); + } + } + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + /* Amalgamate adjacent writes */ + pos = round_down(subreq->start, PAGE_SIZE); + if (pos != subreq->start) { + subreq->len += subreq->start - pos; + subreq->start = pos; + } + subreq->len = round_up(subreq->len, PAGE_SIZE); + + while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + next = list_next_entry(subreq, rreq_link); + if (next->start > subreq->start + subreq->len) + break; + subreq->len += next->len; + subreq->len = round_up(subreq->len, PAGE_SIZE); + list_del_init(&next->rreq_link); + netfs_put_subrequest(next); + } + + iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages, + subreq->start, subreq->len); + + atomic_inc(&rreq->nr_wr_ops); + netfs_stat(&netfs_n_rh_write); + netfs_get_read_subrequest(subreq); + trace_netfs_sreq(subreq, netfs_sreq_trace_write); + cres->ops->write(cres, subreq->start, &iter, + netfs_rreq_copy_terminated, subreq); + } + + /* If we decrement nr_wr_ops to 0, the usage ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_wr_ops)) + netfs_rreq_unmark_after_write(rreq); +} + +static void netfs_rreq_write_to_cache_work(struct work_struct *work) +{ + struct netfs_read_request *rreq = + container_of(work, struct netfs_read_request, work); + + netfs_rreq_do_write_to_cache(rreq); +} + +static void netfs_rreq_write_to_cache(struct netfs_read_request *rreq) +{ + if (in_softirq()) { + rreq->work.func = netfs_rreq_write_to_cache_work; + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_rreq_do_write_to_cache(rreq); + } +} + /* * Unlock the pages in a read operation. We need to set PG_fscache on any * pages we're going to write back before we unlock them. @@ -294,7 +457,10 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq, netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); - netfs_read_from_server(rreq, subreq); + if (subreq->source == NETFS_READ_FROM_CACHE) + netfs_read_from_cache(rreq, subreq, true); + else + netfs_read_from_server(rreq, subreq); } /* @@ -339,6 +505,25 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) return false; } +/* + * Check to see if the data read is still valid. + */ +static void netfs_rreq_is_still_valid(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + if (!rreq->netfs_ops->is_still_valid || + rreq->netfs_ops->is_still_valid(rreq)) + return; + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + if (subreq->source == NETFS_READ_FROM_CACHE) { + subreq->error = -ESTALE; + __set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + } + } +} + /* * Assess the state of a read request and decide what to do next. * @@ -350,6 +535,8 @@ static void netfs_rreq_assess(struct netfs_read_request *rreq) trace_netfs_rreq(rreq, netfs_rreq_trace_assess); again: + netfs_rreq_is_still_valid(rreq); + if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { if (netfs_rreq_perform_resubmissions(rreq)) @@ -362,6 +549,9 @@ static void netfs_rreq_assess(struct netfs_read_request *rreq) clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); + if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags)) + return netfs_rreq_write_to_cache(rreq); + netfs_rreq_completed(rreq); } @@ -493,7 +683,10 @@ static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequ loff_t i_size) { struct netfs_read_request *rreq = subreq->rreq; + struct netfs_cache_resources *cres = &rreq->cache_resources; + if (cres->ops) + return cres->ops->prepare_read(subreq, i_size); if (subreq->start >= rreq->i_size) return NETFS_FILL_WITH_ZEROES; return NETFS_DOWNLOAD_FROM_SERVER; @@ -584,6 +777,9 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, case NETFS_DOWNLOAD_FROM_SERVER: netfs_read_from_server(rreq, subreq); break; + case NETFS_READ_FROM_CACHE: + netfs_read_from_cache(rreq, subreq, false); + break; default: BUG(); } @@ -596,9 +792,23 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, return false; } +static void netfs_cache_expand_readahead(struct netfs_read_request *rreq, + loff_t *_start, size_t *_len, loff_t i_size) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + + if (cres->ops && cres->ops->expand_readahead) + cres->ops->expand_readahead(cres, _start, _len, i_size); +} + static void netfs_rreq_expand(struct netfs_read_request *rreq, struct readahead_control *ractl) { + /* Give the cache a chance to change the request parameters. The + * resultant request must contain the original region. + */ + netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size); + /* Give the netfs a chance to change the request parameters. The * resultant request must contain the original region. */ @@ -650,6 +860,7 @@ void netfs_readahead(struct readahead_control *ractl, struct netfs_read_request *rreq; struct page *page; unsigned int debug_index = 0; + int ret; _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); @@ -667,6 +878,11 @@ void netfs_readahead(struct readahead_control *ractl, trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), netfs_read_trace_readahead); + if (ops->begin_cache_operation) { + ret = ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto cleanup_free; + } netfs_rreq_expand(rreq, ractl); atomic_set(&rreq->nr_rd_ops, 1); @@ -692,6 +908,9 @@ void netfs_readahead(struct readahead_control *ractl, netfs_rreq_assess(rreq); return; +cleanup_free: + netfs_put_read_request(rreq); + return; cleanup: if (netfs_priv) ops->cleanup(ractl->mapping, netfs_priv); @@ -741,6 +960,14 @@ int netfs_readpage(struct file *file, netfs_stat(&netfs_n_rh_readpage); trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); + if (ops->begin_cache_operation) { + ret = ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) { + unlock_page(page); + goto out; + } + } + netfs_get_read_request(rreq); atomic_set(&rreq->nr_rd_ops, 1); @@ -762,6 +989,7 @@ int netfs_readpage(struct file *file, ret = rreq->error; if (ret == 0 && rreq->submitted < rreq->len) ret = -EIO; +out: netfs_put_read_request(rreq); return ret; } diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index dd7ad66ed07e..9ae538c85378 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -31,10 +31,11 @@ atomic_t netfs_n_rh_write_zskip; void netfs_stats_show(struct seq_file *m) { - seq_printf(m, "RdHelp : RA=%u RP=%u WB=%u rr=%u sr=%u\n", + seq_printf(m, "RdHelp : RA=%u RP=%u WB=%u WBZ=%u rr=%u sr=%u\n", atomic_read(&netfs_n_rh_readahead), atomic_read(&netfs_n_rh_readpage), atomic_read(&netfs_n_rh_write_begin), + atomic_read(&netfs_n_rh_write_zskip), atomic_read(&netfs_n_rh_rreq), atomic_read(&netfs_n_rh_sreq)); seq_printf(m, "RdHelp : ZR=%u sh=%u sk=%u\n", diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index 3f0b19dcfae7..3235ddbdcc09 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -304,6 +304,10 @@ struct fscache_cache_ops { /* dissociate a cache from all the pages it was backing */ void (*dissociate_pages)(struct fscache_cache *cache); + + /* Begin a read operation for the netfs lib */ + int (*begin_read_operation)(struct netfs_read_request *rreq, + struct fscache_retrieval *op); }; extern struct fscache_cookie fscache_fsdef_index; diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 1f8dc72369ee..0e4161ce347c 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -37,6 +37,7 @@ struct pagevec; struct fscache_cache_tag; struct fscache_cookie; struct fscache_netfs; +struct netfs_read_request; typedef void (*fscache_rw_complete_t)(struct page *page, void *context, @@ -217,6 +218,7 @@ extern void __fscache_readpages_cancel(struct fscache_cookie *cookie, extern void __fscache_disable_cookie(struct fscache_cookie *, const void *, bool); extern void __fscache_enable_cookie(struct fscache_cookie *, const void *, loff_t, bool (*)(void *), void *); +extern int __fscache_begin_read_operation(struct netfs_read_request *, struct fscache_cookie *); /** * fscache_register_netfs - Register a filesystem as desiring caching services @@ -831,4 +833,19 @@ void fscache_enable_cookie(struct fscache_cookie *cookie, can_enable, data); } +/** + * fscache_begin_read_operation - Begin a read operation for the netfs lib + * @rreq: The read request being undertaken + * @cookie: The cookie representing the cache object + */ +static inline +int fscache_begin_read_operation(struct netfs_read_request *rreq, + struct fscache_cookie *cookie) +{ + if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie)) + return __fscache_begin_read_operation(rreq, cookie); + else + return -ENOBUFS; +} + #endif /* _LINUX_FSCACHE_H */ diff --git a/include/linux/netfs.h b/include/linux/netfs.h index ec9d1240ba49..b2589b39feb8 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -60,6 +60,17 @@ enum netfs_read_source { NETFS_INVALID_READ, } __mode(byte); +typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error); + +/* + * Resources required to do operations on a cache. + */ +struct netfs_cache_resources { + const struct netfs_cache_ops *ops; + void *cache_priv; + void *cache_priv2; +}; + /* * Descriptor for a single component subrequest. */ @@ -89,11 +100,13 @@ struct netfs_read_request { struct work_struct work; struct inode *inode; /* The file being accessed */ struct address_space *mapping; /* The mapping being accessed */ + struct netfs_cache_resources cache_resources; struct list_head subrequests; /* Requests to fetch I/O from disk or net */ void *netfs_priv; /* Private data for the netfs */ unsigned int debug_id; unsigned int cookie_debug_id; atomic_t nr_rd_ops; /* Number of read ops in progress */ + atomic_t nr_wr_ops; /* Number of write ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ short error; /* 0 or error that occurred */ @@ -117,6 +130,7 @@ struct netfs_read_request { struct netfs_read_request_ops { bool (*is_cache_enabled)(struct inode *inode); void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); + int (*begin_cache_operation)(struct netfs_read_request *rreq); void (*expand_readahead)(struct netfs_read_request *rreq); bool (*clamp_length)(struct netfs_read_subrequest *subreq); void (*issue_op)(struct netfs_read_subrequest *subreq); @@ -127,6 +141,40 @@ struct netfs_read_request_ops { void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; +/* + * Table of operations for access to a cache. This is obtained by + * rreq->ops->begin_cache_operation(). + */ +struct netfs_cache_ops { + /* End an operation */ + void (*end_operation)(struct netfs_cache_resources *cres); + + /* Read data from the cache */ + int (*read)(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + bool seek_data, + netfs_io_terminated_t term_func, + void *term_func_priv); + + /* Write data to the cache */ + int (*write)(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + netfs_io_terminated_t term_func, + void *term_func_priv); + + /* Expand readahead request */ + void (*expand_readahead)(struct netfs_cache_resources *cres, + loff_t *_start, size_t *_len, loff_t i_size); + + /* Prepare a read operation, shortening it to a cached/uncached + * boundary as appropriate. + */ + enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq, + loff_t i_size); +}; + struct readahead_control; extern void netfs_readahead(struct readahead_control *, const struct netfs_read_request_ops *, From patchwork Mon Feb 15 15:46:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12088461 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 892CCC433E0 for ; Mon, 15 Feb 2021 15:47:01 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 39A6061494 for ; Mon, 15 Feb 2021 15:47:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 39A6061494 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id CB7578D0118; Mon, 15 Feb 2021 10:47:00 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C8CDC8D00FD; Mon, 15 Feb 2021 10:47:00 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BA0948D0118; Mon, 15 Feb 2021 10:47:00 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0212.hostedemail.com [216.40.44.212]) by kanga.kvack.org (Postfix) with ESMTP id A43E98D00FD for ; Mon, 15 Feb 2021 10:47:00 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 629238248076 for ; Mon, 15 Feb 2021 15:47:00 +0000 (UTC) X-FDA: 77820930600.17.4361C34 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf28.hostedemail.com (Postfix) with ESMTP id 609A920001D6 for ; Mon, 15 Feb 2021 15:46:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613404019; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=D7aSMvUOK48InysNZ6yfpBVuSlFVpk14VB2492kELOo=; b=Hs96xRxf04PqN9xTQMMA4vfO82oZIZqCMVIBqNW6g5tCbmUUPgBIKUkxAfOjC0Og1Z/um+ CF5Qy3zA0MKKuDIzlFQIsCRXdMTlZ44YE4ycDsdLx02FPIbrlLQ/2x8yulf1zsZhxARP4u IJ3wglVJnXaxg8a0ZOEOF2b+yBLqC7A= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-162-zu5zqU1zMpG1S-ynmgtivA-1; Mon, 15 Feb 2021 10:46:57 -0500 X-MC-Unique: zu5zqU1zMpG1S-ynmgtivA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 53A5679EC5; Mon, 15 Feb 2021 15:46:55 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 344EB60BE2; Mon, 15 Feb 2021 15:46:49 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 13/33] netfs: Hold a ref on a page when PG_private_2 is set From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Linus Torvalds , Matthew Wilcox , Linus Torvalds , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 15 Feb 2021 15:46:48 +0000 Message-ID: <161340400833.1303470.8070750156044982282.stgit@warthog.procyon.org.uk> In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 609A920001D6 X-Stat-Signature: zamdaoy81ow1kwmxjgsbguad4dp1xjbg Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf28; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1613404019-161506 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Take a reference on a page when PG_private_2 is set and drop it once the bit is unlocked. Reported-by: Linus Torvalds Signed-off-by: David Howells cc: Matthew Wilcox cc: Linus Torvalds cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/linux-fsdevel/1331025.1612974993@warthog.procyon.org.uk/ --- fs/netfs/read_helper.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 156941e0de61..9191a3617d91 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -230,10 +231,13 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq) static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq) { struct netfs_read_subrequest *subreq; + struct pagevec pvec; struct page *page; pgoff_t unlocked = 0; bool have_unlocked = false; + pagevec_init(&pvec); + rcu_read_lock(); list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { @@ -247,6 +251,8 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq) continue; unlocked = page->index; unlock_page_fscache(page); + if (pagevec_add(&pvec, page) == 0) + pagevec_release(&pvec); have_unlocked = true; } } @@ -403,8 +409,10 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) pg_failed = true; break; } - if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) + if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { + get_page(page); SetPageFsCache(page); + } pg_failed |= subreq_failed; if (pgend < iopos + subreq->len) break; From patchwork Mon Feb 15 22:46:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12089231 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44135C433E0 for ; Mon, 15 Feb 2021 22:46:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id ACAA664DEB for ; Mon, 15 Feb 2021 22:46:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ACAA664DEB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0EB938D014A; Mon, 15 Feb 2021 17:46:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 09D3A8D0140; Mon, 15 Feb 2021 17:46:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E7F8A8D014A; Mon, 15 Feb 2021 17:46:39 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0135.hostedemail.com [216.40.44.135]) by kanga.kvack.org (Postfix) with ESMTP id D2A9C8D0140 for ; Mon, 15 Feb 2021 17:46:39 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 8EC811E1B for ; Mon, 15 Feb 2021 22:46:39 +0000 (UTC) X-FDA: 77821988118.07.tree84_2e130112763e Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id 770A21824BB47 for ; Mon, 15 Feb 2021 22:46:39 +0000 (UTC) X-HE-Tag: tree84_2e130112763e X-Filterd-Recvd-Size: 6777 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Feb 2021 22:46:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613429198; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=siuHUtdQ6wlk6BrFE1gCfr6HqsnZ29fFXCRpUG5E5oE=; b=ityYQ/eOfIEQ9RMM3cqm9UL3DKROp0Sd1Znf1x08vb8zjtCjfhbTuQDYYMJ4SMfUKqiXTT MgcFBg9J1pdh0M+r7hBqoB0D7FNJ11SYYNO0tQpqgs0aFC6V/esB5JGpdiOUkxeSWtFWL5 J4tZ9eGOeyAPAzH7qDOje/esUcOvy3g= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-316-Wht9_IG0Ptu0_iui9ofi0w-1; Mon, 15 Feb 2021 17:46:34 -0500 X-MC-Unique: Wht9_IG0Ptu0_iui9ofi0w-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1F5B56DD22; Mon, 15 Feb 2021 22:46:32 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-119-68.rdu2.redhat.com [10.10.119.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 187641F0; Mon, 15 Feb 2021 22:46:23 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk> To: Trond Myklebust Cc: dhowells@redhat.com, Marc Dionne , Anna Schumaker , Steve French , Dominique Martinet , linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, Jeff Layton , Matthew Wilcox , linux-cachefs@redhat.com, Alexander Viro , linux-mm@kvack.org, linux-afs@lists.infradead.org, v9fs-developer@lists.sourceforge.net, Christoph Hellwig , linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org, Jeff Layton , David Wysochanski , linux-kernel@vger.kernel.org Subject: [PATCH 34/33] netfs: Use in_interrupt() not in_softirq() From: David Howells MIME-Version: 1.0 Content-ID: <1376937.1613429183.1@warthog.procyon.org.uk> Date: Mon, 15 Feb 2021 22:46:23 +0000 Message-ID: <1376938.1613429183@warthog.procyon.org.uk> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The in_softirq() in netfs_rreq_terminated() works fine for the cache being on a normal disk, as the completion handlers may get called in softirq context, but for an NVMe drive, the completion handler may get called in IRQ context. Fix to use in_interrupt() instead of in_softirq() throughout the read helpers, particularly when deciding whether to punt code that might sleep off to a worker thread. The symptom involves warnings like the following appearing and the kernel hanging: WARNING: CPU: 0 PID: 0 at kernel/softirq.c:175 __local_bh_enable_ip+0x35/0x50 ... RIP: 0010:__local_bh_enable_ip+0x35/0x50 ... Call Trace: rxrpc_kernel_begin_call+0x7d/0x1b0 [rxrpc] ? afs_rx_new_call+0x40/0x40 [kafs] ? afs_alloc_call+0x28/0x120 [kafs] afs_make_call+0x120/0x510 [kafs] ? afs_rx_new_call+0x40/0x40 [kafs] ? afs_alloc_flat_call+0xba/0x100 [kafs] ? __kmalloc+0x167/0x2f0 ? afs_alloc_flat_call+0x9b/0x100 [kafs] afs_wait_for_operation+0x2d/0x200 [kafs] afs_do_sync_operation+0x16/0x20 [kafs] afs_req_issue_op+0x8c/0xb0 [kafs] netfs_rreq_assess+0x125/0x7d0 [netfs] ? cachefiles_end_operation+0x40/0x40 [cachefiles] netfs_subreq_terminated+0x117/0x220 [netfs] cachefiles_read_complete+0x21/0x60 [cachefiles] iomap_dio_bio_end_io+0xdd/0x110 blk_update_request+0x20a/0x380 blk_mq_end_request+0x1c/0x120 nvme_process_cq+0x159/0x1f0 [nvme] nvme_irq+0x10/0x20 [nvme] __handle_irq_event_percpu+0x37/0x150 handle_irq_event+0x49/0xb0 handle_edge_irq+0x7c/0x200 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0xad/0x120 asm_common_interrupt+0x1e/0x40 ... Reported-by: Marc Dionne Signed-off-by: David Howells cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org --- read_helper.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 9191a3617d91..db582008b4bd 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -96,7 +96,7 @@ static void netfs_free_read_request(struct work_struct *work) static void netfs_put_read_request(struct netfs_read_request *rreq) { if (refcount_dec_and_test(&rreq->usage)) { - if (in_softirq()) { + if (in_interrupt()) { rreq->work.func = netfs_free_read_request; if (!queue_work(system_unbound_wq, &rreq->work)) BUG(); @@ -353,7 +353,7 @@ static void netfs_rreq_write_to_cache_work(struct work_struct *work) static void netfs_rreq_write_to_cache(struct netfs_read_request *rreq) { - if (in_softirq()) { + if (in_interrupt()) { rreq->work.func = netfs_rreq_write_to_cache_work; if (!queue_work(system_unbound_wq, &rreq->work)) BUG(); @@ -479,7 +479,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) { struct netfs_read_subrequest *subreq; - WARN_ON(in_softirq()); + WARN_ON(in_interrupt()); trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); @@ -577,7 +577,7 @@ static void netfs_rreq_work(struct work_struct *work) static void netfs_rreq_terminated(struct netfs_read_request *rreq) { if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) && - in_softirq()) { + in_interrupt()) { if (!queue_work(system_unbound_wq, &rreq->work)) BUG(); } else {