From patchwork Mon May 9 07:40:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843121 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E470C46467 for ; Mon, 9 May 2022 08:01:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235802AbiEIIDO (ORCPT ); Mon, 9 May 2022 04:03:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59056 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234856AbiEIHox (ORCPT ); Mon, 9 May 2022 03:44:53 -0400 Received: from out199-3.us.a.mail.aliyun.com (out199-3.us.a.mail.aliyun.com [47.90.199.3]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F39C0140C9; Mon, 9 May 2022 00:40:57 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R211e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=alimailimapcm10staff010182156082;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfUcI7_1652082030; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfUcI7_1652082030) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:31 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 01/22] cachefiles: extract write routine Date: Mon, 9 May 2022 15:40:07 +0800 Message-Id: <20220509074028.74954-2-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Extract the generic routine of writing data to cache files, and make it generally available. This will be used by the following patch implementing on-demand read mode. Since it's called inside CacheFiles module, make the interface generic and unrelated to netfs_cache_resources. It is worth noting that, ki->inval_counter is not initialized after this cleanup. It shall not make any visible difference, since inval_counter is no longer used in the write completion routine, i.e. cachefiles_write_complete(). Signed-off-by: Jeffle Xu Acked-by: David Howells --- fs/cachefiles/internal.h | 10 +++++++ fs/cachefiles/io.c | 61 +++++++++++++++++++++++----------------- 2 files changed, 45 insertions(+), 26 deletions(-) diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index c793d33b0224..e80673d0ab97 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -201,6 +201,16 @@ extern void cachefiles_put_object(struct cachefiles_object *object, */ extern bool cachefiles_begin_operation(struct netfs_cache_resources *cres, enum fscache_want_state want_state); +extern int __cachefiles_prepare_write(struct cachefiles_object *object, + struct file *file, + loff_t *_start, size_t *_len, + bool no_space_allocated_yet); +extern int __cachefiles_write(struct cachefiles_object *object, + struct file *file, + loff_t start_pos, + struct iov_iter *iter, + netfs_io_terminated_t term_func, + void *term_func_priv); /* * key.c diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c index 9dc81e781f2b..50a14e8f0aac 100644 --- a/fs/cachefiles/io.c +++ b/fs/cachefiles/io.c @@ -277,36 +277,33 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret) /* * Initiate a write to the cache. */ -static int cachefiles_write(struct netfs_cache_resources *cres, - loff_t start_pos, - struct iov_iter *iter, - netfs_io_terminated_t term_func, - void *term_func_priv) +int __cachefiles_write(struct cachefiles_object *object, + struct file *file, + loff_t start_pos, + struct iov_iter *iter, + netfs_io_terminated_t term_func, + void *term_func_priv) { - struct cachefiles_object *object; struct cachefiles_cache *cache; struct cachefiles_kiocb *ki; struct inode *inode; - struct file *file; unsigned int old_nofs; - ssize_t ret = -ENOBUFS; + ssize_t ret; size_t len = iov_iter_count(iter); - if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE)) - goto presubmission_error; fscache_count_write(); - object = cachefiles_cres_object(cres); cache = object->volume->cache; - file = cachefiles_cres_file(cres); _enter("%pD,%li,%llx,%zx/%llx", file, file_inode(file)->i_ino, start_pos, len, i_size_read(file_inode(file))); - ret = -ENOMEM; ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL); - if (!ki) - goto presubmission_error; + if (!ki) { + if (term_func) + term_func(term_func_priv, -ENOMEM, false); + return -ENOMEM; + } refcount_set(&ki->ki_refcnt, 2); ki->iocb.ki_filp = file; @@ -314,7 +311,6 @@ static int cachefiles_write(struct netfs_cache_resources *cres, ki->iocb.ki_flags = IOCB_DIRECT | IOCB_WRITE; ki->iocb.ki_ioprio = get_current_ioprio(); ki->object = object; - ki->inval_counter = cres->inval_counter; ki->start = start_pos; ki->len = len; ki->term_func = term_func; @@ -369,11 +365,24 @@ static int cachefiles_write(struct netfs_cache_resources *cres, cachefiles_put_kiocb(ki); _leave(" = %zd", ret); return ret; +} -presubmission_error: - if (term_func) - term_func(term_func_priv, ret, false); - return ret; +static int cachefiles_write(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + netfs_io_terminated_t term_func, + void *term_func_priv) +{ + if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE)) { + if (term_func) + term_func(term_func_priv, -ENOBUFS, false); + return -ENOBUFS; + } + + return __cachefiles_write(cachefiles_cres_object(cres), + cachefiles_cres_file(cres), + start_pos, iter, + term_func, term_func_priv); } /* @@ -484,13 +493,12 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest * /* * Prepare for a write to occur. */ -static int __cachefiles_prepare_write(struct netfs_cache_resources *cres, - loff_t *_start, size_t *_len, loff_t i_size, - bool no_space_allocated_yet) +int __cachefiles_prepare_write(struct cachefiles_object *object, + struct file *file, + loff_t *_start, size_t *_len, + bool no_space_allocated_yet) { - struct cachefiles_object *object = cachefiles_cres_object(cres); struct cachefiles_cache *cache = object->volume->cache; - struct file *file = cachefiles_cres_file(cres); loff_t start = *_start, pos; size_t len = *_len, down; int ret; @@ -577,7 +585,8 @@ static int cachefiles_prepare_write(struct netfs_cache_resources *cres, } cachefiles_begin_secure(cache, &saved_cred); - ret = __cachefiles_prepare_write(cres, _start, _len, i_size, + ret = __cachefiles_prepare_write(object, cachefiles_cres_file(cres), + _start, _len, no_space_allocated_yet); cachefiles_end_secure(cache, saved_cred); return ret; From patchwork Mon May 9 07:40:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843140 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C9C15C35296 for ; Mon, 9 May 2022 08:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238183AbiEIIEE (ORCPT ); Mon, 9 May 2022 04:04:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234930AbiEIHoy (ORCPT ); Mon, 9 May 2022 03:44:54 -0400 Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 210E3140E6; Mon, 9 May 2022 00:40:57 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R211e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04423;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfxsVJ_1652082032; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfxsVJ_1652082032) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:32 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 02/22] cachefiles: notify the user daemon when looking up cookie Date: Mon, 9 May 2022 15:40:08 +0800 Message-Id: <20220509074028.74954-3-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Fscache/CacheFiles used to serve as a local cache for a remote networking fs. A new on-demand read mode will be introduced for CacheFiles, which can boost the scenario where on-demand read semantics are needed, e.g. container image distribution. The essential difference between these two modes is seen when a cache miss occurs: In the original mode, the netfs will fetch the data from the remote server and then write it to the cache file; in on-demand read mode, fetching the data and writing it into the cache is delegated to a user daemon. As the first step, notify the user daemon when looking up cookie. In this case, an anonymous fd is sent to the user daemon, through which the user daemon can write the fetched data to the cache file. Since the user daemon may move the anonymous fd around, e.g. through dup(), an object ID uniquely identifying the cache file is also attached. Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that the cache file size shall be retrieved at runtime. This helps the scenario where one cache file contains multiple netfs files, e.g. for the purpose of deduplication. In this case, netfs itself has no idea the size of the cache file, whilst the user daemon should give the hint on it. Signed-off-by: Jeffle Xu Acked-by: David Howells --- fs/cachefiles/Kconfig | 12 + fs/cachefiles/Makefile | 1 + fs/cachefiles/daemon.c | 81 ++++++- fs/cachefiles/internal.h | 51 ++++ fs/cachefiles/namei.c | 16 +- fs/cachefiles/ondemand.c | 378 ++++++++++++++++++++++++++++++ include/linux/fscache.h | 1 + include/trace/events/cachefiles.h | 2 + include/uapi/linux/cachefiles.h | 50 ++++ 9 files changed, 577 insertions(+), 15 deletions(-) create mode 100644 fs/cachefiles/ondemand.c create mode 100644 include/uapi/linux/cachefiles.h diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig index 719faeeda168..8df715640a48 100644 --- a/fs/cachefiles/Kconfig +++ b/fs/cachefiles/Kconfig @@ -26,3 +26,15 @@ config CACHEFILES_ERROR_INJECTION help This permits error injection to be enabled in cachefiles whilst a cache is in service. + +config CACHEFILES_ONDEMAND + bool "Support for on-demand read" + depends on CACHEFILES + default n + help + This permits userspace to enable the cachefiles on-demand read mode. + In this mode, when a cache miss occurs, responsibility for fetching + the data lies with the cachefiles backend instead of with the netfs + and is delegated to userspace. + + If unsure, say N. diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile index 16d811f1a2fa..c37a7a9af10b 100644 --- a/fs/cachefiles/Makefile +++ b/fs/cachefiles/Makefile @@ -16,5 +16,6 @@ cachefiles-y := \ xattr.o cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o +cachefiles-$(CONFIG_CACHEFILES_ONDEMAND) += ondemand.o obj-$(CONFIG_CACHEFILES) := cachefiles.o diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 7ac04ee2c0a0..d5417da7f792 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -75,6 +75,9 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = { { "inuse", cachefiles_daemon_inuse }, { "secctx", cachefiles_daemon_secctx }, { "tag", cachefiles_daemon_tag }, +#ifdef CONFIG_CACHEFILES_ONDEMAND + { "copen", cachefiles_ondemand_copen }, +#endif { "", NULL } }; @@ -108,6 +111,8 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file) INIT_LIST_HEAD(&cache->volumes); INIT_LIST_HEAD(&cache->object_list); spin_lock_init(&cache->object_list_lock); + xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC); + xa_init_flags(&cache->ondemand_ids, XA_FLAGS_ALLOC1); /* set default caching limits * - limit at 1% free space and/or free files @@ -126,6 +131,39 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file) return 0; } +static void cachefiles_flush_reqs(struct cachefiles_cache *cache) +{ + struct xarray *xa = &cache->reqs; + struct cachefiles_req *req; + unsigned long index; + + /* + * Make sure the following two operations won't be reordered. + * 1) set CACHEFILES_DEAD bit + * 2) flush requests in the xarray + * Otherwise the request may be enqueued after xarray has been + * flushed, leaving the orphan request never being completed. + * + * CPU 1 CPU 2 + * ===== ===== + * flush requests in the xarray + * test CACHEFILES_DEAD bit + * enqueue the request + * set CACHEFILES_DEAD bit + */ + smp_mb(); + + xa_lock(xa); + xa_for_each(xa, index, req) { + req->error = -EIO; + complete(&req->done); + } + xa_unlock(xa); + + xa_destroy(&cache->reqs); + xa_destroy(&cache->ondemand_ids); +} + /* * Release a cache. */ @@ -139,6 +177,8 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file) set_bit(CACHEFILES_DEAD, &cache->flags); + if (cachefiles_in_ondemand_mode(cache)) + cachefiles_flush_reqs(cache); cachefiles_daemon_unbind(cache); /* clean up the control file interface */ @@ -152,23 +192,14 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file) return 0; } -/* - * Read the cache state. - */ -static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer, - size_t buflen, loff_t *pos) +static ssize_t cachefiles_do_daemon_read(struct cachefiles_cache *cache, + char __user *_buffer, size_t buflen) { - struct cachefiles_cache *cache = file->private_data; unsigned long long b_released; unsigned f_released; char buffer[256]; int n; - //_enter(",,%zu,", buflen); - - if (!test_bit(CACHEFILES_READY, &cache->flags)) - return 0; - /* check how much space the cache has */ cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check); @@ -206,6 +237,25 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer, return n; } +/* + * Read the cache state. + */ +static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer, + size_t buflen, loff_t *pos) +{ + struct cachefiles_cache *cache = file->private_data; + + //_enter(",,%zu,", buflen); + + if (!test_bit(CACHEFILES_READY, &cache->flags)) + return 0; + + if (cachefiles_in_ondemand_mode(cache)) + return cachefiles_ondemand_daemon_read(cache, _buffer, buflen); + else + return cachefiles_do_daemon_read(cache, _buffer, buflen); +} + /* * Take a command from cachefilesd, parse it and act on it. */ @@ -297,8 +347,13 @@ static __poll_t cachefiles_daemon_poll(struct file *file, poll_wait(file, &cache->daemon_pollwq, poll); mask = 0; - if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags)) - mask |= EPOLLIN; + if (cachefiles_in_ondemand_mode(cache)) { + if (!xa_empty(&cache->reqs)) + mask |= EPOLLIN; + } else { + if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags)) + mask |= EPOLLIN; + } if (test_bit(CACHEFILES_CULLING, &cache->flags)) mask |= EPOLLOUT; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index e80673d0ab97..4f5150a96849 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -15,6 +15,8 @@ #include #include #include +#include +#include #define CACHEFILES_DIO_BLOCK_SIZE 4096 @@ -58,8 +60,13 @@ struct cachefiles_object { enum cachefiles_content content_info:8; /* Info about content presence */ unsigned long flags; #define CACHEFILES_OBJECT_USING_TMPFILE 0 /* Have an unlinked tmpfile */ +#ifdef CONFIG_CACHEFILES_ONDEMAND + int ondemand_id; +#endif }; +#define CACHEFILES_ONDEMAND_ID_CLOSED -1 + /* * Cache files cache definition */ @@ -98,11 +105,30 @@ struct cachefiles_cache { #define CACHEFILES_DEAD 1 /* T if cache dead */ #define CACHEFILES_CULLING 2 /* T if cull engaged */ #define CACHEFILES_STATE_CHANGED 3 /* T if state changed (poll trigger) */ +#define CACHEFILES_ONDEMAND_MODE 4 /* T if in on-demand read mode */ char *rootdirname; /* name of cache root directory */ char *secctx; /* LSM security context */ char *tag; /* cache binding tag */ + struct xarray reqs; /* xarray of pending on-demand requests */ + struct xarray ondemand_ids; /* xarray for ondemand_id allocation */ + u32 ondemand_id_next; }; +static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache) +{ + return IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) && + test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags); +} + +struct cachefiles_req { + struct cachefiles_object *object; + struct completion done; + int error; + struct cachefiles_msg msg; +}; + +#define CACHEFILES_REQ_NEW XA_MARK_1 + #include static inline @@ -250,6 +276,31 @@ extern struct file *cachefiles_create_tmpfile(struct cachefiles_object *object); extern bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache, struct cachefiles_object *object); +/* + * ondemand.c + */ +#ifdef CONFIG_CACHEFILES_ONDEMAND +extern ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, + char __user *_buffer, size_t buflen); + +extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache, + char *args); + +extern int cachefiles_ondemand_init_object(struct cachefiles_object *object); + +#else +static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, + char __user *_buffer, size_t buflen) +{ + return -EOPNOTSUPP; +} + +static inline int cachefiles_ondemand_init_object(struct cachefiles_object *object) +{ + return 0; +} +#endif + /* * security.c */ diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index ca9f3e4ec4b3..facf2ebe464b 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -452,10 +452,9 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object) struct dentry *fan = volume->fanout[(u8)object->cookie->key_hash]; struct file *file; struct path path; - uint64_t ni_size = object->cookie->object_size; + uint64_t ni_size; long ret; - ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE); cachefiles_begin_secure(cache, &saved_cred); @@ -481,6 +480,15 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object) goto out_dput; } + ret = cachefiles_ondemand_init_object(object); + if (ret < 0) { + file = ERR_PTR(ret); + goto out_unuse; + } + + ni_size = object->cookie->object_size; + ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE); + if (ni_size > 0) { trace_cachefiles_trunc(object, d_backing_inode(path.dentry), 0, ni_size, cachefiles_trunc_expand_tmpfile); @@ -586,6 +594,10 @@ static bool cachefiles_open_file(struct cachefiles_object *object, } _debug("file -> %pd positive", dentry); + ret = cachefiles_ondemand_init_object(object); + if (ret < 0) + goto error_fput; + ret = cachefiles_check_auxdata(object, file); if (ret < 0) goto check_failed; diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c new file mode 100644 index 000000000000..64fc312b16d3 --- /dev/null +++ b/fs/cachefiles/ondemand.c @@ -0,0 +1,378 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include +#include +#include +#include "internal.h" + +static int cachefiles_ondemand_fd_release(struct inode *inode, + struct file *file) +{ + struct cachefiles_object *object = file->private_data; + struct cachefiles_cache *cache = object->volume->cache; + int object_id = object->ondemand_id; + + object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; + xa_erase(&cache->ondemand_ids, object_id); + cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd); + return 0; +} + +static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, + struct iov_iter *iter) +{ + struct cachefiles_object *object = kiocb->ki_filp->private_data; + struct cachefiles_cache *cache = object->volume->cache; + struct file *file = object->file; + size_t len = iter->count; + loff_t pos = kiocb->ki_pos; + const struct cred *saved_cred; + int ret; + + if (!file) + return -ENOBUFS; + + cachefiles_begin_secure(cache, &saved_cred); + ret = __cachefiles_prepare_write(object, file, &pos, &len, true); + cachefiles_end_secure(cache, saved_cred); + if (ret < 0) + return ret; + + ret = __cachefiles_write(object, file, pos, iter, NULL, NULL); + if (!ret) + ret = len; + + return ret; +} + +static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos, + int whence) +{ + struct cachefiles_object *object = filp->private_data; + struct file *file = object->file; + + if (!file) + return -ENOBUFS; + + return vfs_llseek(file, pos, whence); +} + +static const struct file_operations cachefiles_ondemand_fd_fops = { + .owner = THIS_MODULE, + .release = cachefiles_ondemand_fd_release, + .write_iter = cachefiles_ondemand_fd_write_iter, + .llseek = cachefiles_ondemand_fd_llseek, +}; + +/* + * OPEN request Completion (copen) + * - command: "copen ," + * indicates the object size if >=0, error code if negative + */ +int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) +{ + struct cachefiles_req *req; + struct fscache_cookie *cookie; + char *pid, *psize; + unsigned long id; + long size; + int ret; + + if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) + return -EOPNOTSUPP; + + if (!*args) { + pr_err("Empty id specified\n"); + return -EINVAL; + } + + pid = args; + psize = strchr(args, ','); + if (!psize) { + pr_err("Cache size is not specified\n"); + return -EINVAL; + } + + *psize = 0; + psize++; + + ret = kstrtoul(pid, 0, &id); + if (ret) + return ret; + + req = xa_erase(&cache->reqs, id); + if (!req) + return -EINVAL; + + /* fail OPEN request if copen format is invalid */ + ret = kstrtol(psize, 0, &size); + if (ret) { + req->error = ret; + goto out; + } + + /* fail OPEN request if daemon reports an error */ + if (size < 0) { + if (!IS_ERR_VALUE(size)) + size = -EINVAL; + req->error = size; + goto out; + } + + cookie = req->object->cookie; + cookie->object_size = size; + if (size) + clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); + else + set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); + +out: + complete(&req->done); + return ret; +} + +static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) +{ + struct cachefiles_object *object; + struct cachefiles_cache *cache; + struct cachefiles_open *load; + struct file *file; + u32 object_id; + int ret, fd; + + object = cachefiles_grab_object(req->object, + cachefiles_obj_get_ondemand_fd); + cache = object->volume->cache; + + ret = xa_alloc_cyclic(&cache->ondemand_ids, &object_id, NULL, + XA_LIMIT(1, INT_MAX), + &cache->ondemand_id_next, GFP_KERNEL); + if (ret < 0) + goto err; + + fd = get_unused_fd_flags(O_WRONLY); + if (fd < 0) { + ret = fd; + goto err_free_id; + } + + file = anon_inode_getfile("[cachefiles]", &cachefiles_ondemand_fd_fops, + object, O_WRONLY); + if (IS_ERR(file)) { + ret = PTR_ERR(file); + goto err_put_fd; + } + + file->f_mode |= FMODE_PWRITE | FMODE_LSEEK; + fd_install(fd, file); + + load = (void *)req->msg.data; + load->fd = fd; + req->msg.object_id = object_id; + object->ondemand_id = object_id; + return 0; + +err_put_fd: + put_unused_fd(fd); +err_free_id: + xa_erase(&cache->ondemand_ids, object_id); +err: + cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd); + return ret; +} + +ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, + char __user *_buffer, size_t buflen) +{ + struct cachefiles_req *req; + struct cachefiles_msg *msg; + unsigned long id = 0; + size_t n; + int ret = 0; + XA_STATE(xas, &cache->reqs, 0); + + /* + * Search for a request that has not ever been processed, to prevent + * requests from being processed repeatedly. + */ + xa_lock(&cache->reqs); + req = xas_find_marked(&xas, UINT_MAX, CACHEFILES_REQ_NEW); + if (!req) { + xa_unlock(&cache->reqs); + return 0; + } + + msg = &req->msg; + n = msg->len; + + if (n > buflen) { + xa_unlock(&cache->reqs); + return -EMSGSIZE; + } + + xas_clear_mark(&xas, CACHEFILES_REQ_NEW); + xa_unlock(&cache->reqs); + + id = xas.xa_index; + msg->msg_id = id; + + if (msg->opcode == CACHEFILES_OP_OPEN) { + ret = cachefiles_ondemand_get_fd(req); + if (ret) + goto error; + } + + if (copy_to_user(_buffer, msg, n) != 0) { + ret = -EFAULT; + goto err_put_fd; + } + + return n; + +err_put_fd: + if (msg->opcode == CACHEFILES_OP_OPEN) + close_fd(((struct cachefiles_open *)msg->data)->fd); +error: + xa_erase(&cache->reqs, id); + req->error = ret; + complete(&req->done); + return ret; +} + +typedef int (*init_req_fn)(struct cachefiles_req *req, void *private); + +static int cachefiles_ondemand_send_req(struct cachefiles_object *object, + enum cachefiles_opcode opcode, + size_t data_len, + init_req_fn init_req, + void *private) +{ + struct cachefiles_cache *cache = object->volume->cache; + struct cachefiles_req *req; + XA_STATE(xas, &cache->reqs, 0); + int ret; + + if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) + return 0; + + if (test_bit(CACHEFILES_DEAD, &cache->flags)) + return -EIO; + + req = kzalloc(sizeof(*req) + data_len, GFP_KERNEL); + if (!req) + return -ENOMEM; + + req->object = object; + init_completion(&req->done); + req->msg.opcode = opcode; + req->msg.len = sizeof(struct cachefiles_msg) + data_len; + + ret = init_req(req, private); + if (ret) + goto out; + + do { + /* + * Stop enqueuing the request when daemon is dying. The + * following two operations need to be atomic as a whole. + * 1) check cache state, and + * 2) enqueue request if cache is alive. + * Otherwise the request may be enqueued after xarray has been + * flushed, leaving the orphan request never being completed. + * + * CPU 1 CPU 2 + * ===== ===== + * test CACHEFILES_DEAD bit + * set CACHEFILES_DEAD bit + * flush requests in the xarray + * enqueue the request + */ + xas_lock(&xas); + + if (test_bit(CACHEFILES_DEAD, &cache->flags)) { + xas_unlock(&xas); + ret = -EIO; + goto out; + } + + /* coupled with the barrier in cachefiles_flush_reqs() */ + smp_mb(); + + xas.xa_index = 0; + xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK); + if (xas.xa_node == XAS_RESTART) + xas_set_err(&xas, -EBUSY); + xas_store(&xas, req); + xas_clear_mark(&xas, XA_FREE_MARK); + xas_set_mark(&xas, CACHEFILES_REQ_NEW); + xas_unlock(&xas); + } while (xas_nomem(&xas, GFP_KERNEL)); + + ret = xas_error(&xas); + if (ret) + goto out; + + wake_up_all(&cache->daemon_pollwq); + wait_for_completion(&req->done); + ret = req->error; +out: + kfree(req); + return ret; +} + +static int cachefiles_ondemand_init_open_req(struct cachefiles_req *req, + void *private) +{ + struct cachefiles_object *object = req->object; + struct fscache_cookie *cookie = object->cookie; + struct fscache_volume *volume = object->volume->vcookie; + struct cachefiles_open *load = (void *)req->msg.data; + size_t volume_key_size, cookie_key_size; + void *volume_key, *cookie_key; + + /* + * Volume key is a NUL-terminated string. key[0] stores strlen() of the + * string, followed by the content of the string (excluding '\0'). + */ + volume_key_size = volume->key[0] + 1; + volume_key = volume->key + 1; + + /* Cookie key is binary data, which is netfs specific. */ + cookie_key_size = cookie->key_len; + cookie_key = fscache_get_key(cookie); + + if (!(object->cookie->advice & FSCACHE_ADV_WANT_CACHE_SIZE)) { + pr_err("WANT_CACHE_SIZE is needed for on-demand mode\n"); + return -EINVAL; + } + + load->volume_key_size = volume_key_size; + load->cookie_key_size = cookie_key_size; + memcpy(load->data, volume_key, volume_key_size); + memcpy(load->data + volume_key_size, cookie_key, cookie_key_size); + + return 0; +} + +int cachefiles_ondemand_init_object(struct cachefiles_object *object) +{ + struct fscache_cookie *cookie = object->cookie; + struct fscache_volume *volume = object->volume->vcookie; + size_t volume_key_size, cookie_key_size, data_len; + + /* + * CacheFiles will firstly check the cache file under the root cache + * directory. If the coherency check failed, it will fallback to + * creating a new tmpfile as the cache file. Reuse the previously + * allocated object ID if any. + */ + if (object->ondemand_id > 0) + return 0; + + volume_key_size = volume->key[0] + 1; + cookie_key_size = cookie->key_len; + data_len = sizeof(struct cachefiles_open) + + volume_key_size + cookie_key_size; + + return cachefiles_ondemand_send_req(object, CACHEFILES_OP_OPEN, + data_len, cachefiles_ondemand_init_open_req, NULL); +} diff --git a/include/linux/fscache.h b/include/linux/fscache.h index e25539072463..72585c9729a2 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -39,6 +39,7 @@ struct fscache_cookie; #define FSCACHE_ADV_SINGLE_CHUNK 0x01 /* The object is a single chunk of data */ #define FSCACHE_ADV_WRITE_CACHE 0x00 /* Do cache if written to locally */ #define FSCACHE_ADV_WRITE_NOCACHE 0x02 /* Don't cache if written to locally */ +#define FSCACHE_ADV_WANT_CACHE_SIZE 0x04 /* Retrieve cache size at runtime */ #define FSCACHE_INVAL_DIO_WRITE 0x01 /* Invalidate due to DIO write */ diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h index 311c14a20e70..93df9391bd7f 100644 --- a/include/trace/events/cachefiles.h +++ b/include/trace/events/cachefiles.h @@ -31,6 +31,8 @@ enum cachefiles_obj_ref_trace { cachefiles_obj_see_lookup_failed, cachefiles_obj_see_withdraw_cookie, cachefiles_obj_see_withdrawal, + cachefiles_obj_get_ondemand_fd, + cachefiles_obj_put_ondemand_fd, }; enum fscache_why_object_killed { diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h new file mode 100644 index 000000000000..521f2fe4fe9c --- /dev/null +++ b/include/uapi/linux/cachefiles.h @@ -0,0 +1,50 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _LINUX_CACHEFILES_H +#define _LINUX_CACHEFILES_H + +#include + +/* + * Fscache ensures that the maximum length of cookie key is 255. The volume key + * is controlled by netfs, and generally no bigger than 255. + */ +#define CACHEFILES_MSG_MAX_SIZE 1024 + +enum cachefiles_opcode { + CACHEFILES_OP_OPEN, +}; + +/* + * Message Header + * + * @msg_id a unique ID identifying this message + * @opcode message type, CACHEFILE_OP_* + * @len message length, including message header and following data + * @object_id a unique ID identifying a cache file + * @data message type specific payload + */ +struct cachefiles_msg { + __u32 msg_id; + __u32 opcode; + __u32 len; + __u32 object_id; + __u8 data[]; +}; + +/* + * @data contains the volume_key followed directly by the cookie_key. volume_key + * is a NUL-terminated string; @volume_key_size indicates the size of the volume + * key in bytes. cookie_key is binary data, which is netfs specific; + * @cookie_key_size indicates the size of the cookie key in bytes. + * + * @fd identifies an anon_fd referring to the cache file. + */ +struct cachefiles_open { + __u32 volume_key_size; + __u32 cookie_key_size; + __u32 fd; + __u32 flags; + __u8 data[]; +}; + +#endif From patchwork Mon May 9 07:40:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843134 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FB5FC3527B for ; Mon, 9 May 2022 08:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238103AbiEIIDi (ORCPT ); Mon, 9 May 2022 04:03:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59026 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234790AbiEIHow (ORCPT ); Mon, 9 May 2022 03:44:52 -0400 Received: from out30-56.freemail.mail.aliyun.com (out30-56.freemail.mail.aliyun.com [115.124.30.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F31BC14022; Mon, 9 May 2022 00:40:57 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R481e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04395;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfaIjH_1652082033; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfaIjH_1652082033) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:34 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 03/22] cachefiles: unbind cachefiles gracefully in on-demand mode Date: Mon, 9 May 2022 15:40:09 +0800 Message-Id: <20220509074028.74954-4-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add a refcount to avoid the deadlock in on-demand read mode. The on-demand read mode will pin the corresponding cachefiles object for each anonymous fd. The cachefiles object is unpinned when the anonymous fd gets closed. When the user daemon exits and the fd of "/dev/cachefiles" device node gets closed, it will wait for all cahcefiles objects getting withdrawn. Then if there's any anonymous fd getting closed after the fd of the device node, the user daemon will hang forever, waiting for all objects getting withdrawn. To fix this, add a refcount indicating if there's any object pinned by anonymous fds. The cachefiles cache gets unbound and withdrawn when the refcount is decreased to 0. It won't change the behaviour of the original mode, in which case the cachefiles cache gets unbound and withdrawn as long as the fd of the device node gets closed. Signed-off-by: Jeffle Xu Acked-by: David Howells --- fs/cachefiles/daemon.c | 19 ++++++++++++++++--- fs/cachefiles/internal.h | 3 +++ fs/cachefiles/ondemand.c | 3 +++ 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index d5417da7f792..5b1d0642c749 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -111,6 +111,7 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file) INIT_LIST_HEAD(&cache->volumes); INIT_LIST_HEAD(&cache->object_list); spin_lock_init(&cache->object_list_lock); + refcount_set(&cache->unbind_pincount, 1); xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC); xa_init_flags(&cache->ondemand_ids, XA_FLAGS_ALLOC1); @@ -164,6 +165,20 @@ static void cachefiles_flush_reqs(struct cachefiles_cache *cache) xa_destroy(&cache->ondemand_ids); } +void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache) +{ + if (refcount_dec_and_test(&cache->unbind_pincount)) { + cachefiles_daemon_unbind(cache); + cachefiles_open = 0; + kfree(cache); + } +} + +void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache) +{ + refcount_inc(&cache->unbind_pincount); +} + /* * Release a cache. */ @@ -179,14 +194,12 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file) if (cachefiles_in_ondemand_mode(cache)) cachefiles_flush_reqs(cache); - cachefiles_daemon_unbind(cache); /* clean up the control file interface */ cache->cachefilesd = NULL; file->private_data = NULL; - cachefiles_open = 0; - kfree(cache); + cachefiles_put_unbind_pincount(cache); _leave(""); return 0; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 4f5150a96849..e5c612888f84 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -109,6 +109,7 @@ struct cachefiles_cache { char *rootdirname; /* name of cache root directory */ char *secctx; /* LSM security context */ char *tag; /* cache binding tag */ + refcount_t unbind_pincount;/* refcount to do daemon unbind */ struct xarray reqs; /* xarray of pending on-demand requests */ struct xarray ondemand_ids; /* xarray for ondemand_id allocation */ u32 ondemand_id_next; @@ -171,6 +172,8 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache, * daemon.c */ extern const struct file_operations cachefiles_daemon_fops; +extern void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache); +extern void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache); /* * error_inject.c diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 64fc312b16d3..7946ee6c40be 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -14,6 +14,7 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; xa_erase(&cache->ondemand_ids, object_id); cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd); + cachefiles_put_unbind_pincount(cache); return 0; } @@ -169,6 +170,8 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) load->fd = fd; req->msg.object_id = object_id; object->ondemand_id = object_id; + + cachefiles_get_unbind_pincount(cache); return 0; err_put_fd: From patchwork Mon May 9 07:40:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843122 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36A16C433F5 for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235993AbiEIIDU (ORCPT ); Mon, 9 May 2022 04:03:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234814AbiEIHow (ORCPT ); Mon, 9 May 2022 03:44:52 -0400 Received: from out30-45.freemail.mail.aliyun.com (out30-45.freemail.mail.aliyun.com [115.124.30.45]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A2EF165BC; Mon, 9 May 2022 00:40:58 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R961e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCggc7l_1652082035; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCggc7l_1652082035) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:36 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 04/22] cachefiles: notify the user daemon when withdrawing cookie Date: Mon, 9 May 2022 15:40:10 +0800 Message-Id: <20220509074028.74954-5-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Notify the user daemon that cookie is going to be withdrawn, providing a hint that the associated anonymous fd can be closed. Be noted that this is only a hint. The user daemon may close the associated anonymous fd when receiving the CLOSE request, then it will receive another anonymous fd when the cookie gets looked up. Or it may ignore the CLOSE request, and keep writing data through the anonymous fd. However the next time the cookie gets looked up, the user daemon will still receive another new anonymous fd. Signed-off-by: Jeffle Xu Acked-by: David Howells --- fs/cachefiles/interface.c | 2 ++ fs/cachefiles/internal.h | 5 +++++ fs/cachefiles/ondemand.c | 38 +++++++++++++++++++++++++++++++++ include/uapi/linux/cachefiles.h | 1 + 4 files changed, 46 insertions(+) diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index ae93cee9d25d..a69073a1d3f0 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -362,6 +362,8 @@ static void cachefiles_withdraw_cookie(struct fscache_cookie *cookie) spin_unlock(&cache->object_list_lock); } + cachefiles_ondemand_clean_object(object); + if (object->file) { cachefiles_begin_secure(cache, &saved_cred); cachefiles_clean_up_object(object, cache); diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index e5c612888f84..da388ba127eb 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -290,6 +290,7 @@ extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args); extern int cachefiles_ondemand_init_object(struct cachefiles_object *object); +extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object); #else static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, @@ -302,6 +303,10 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje { return 0; } + +static inline void cachefiles_ondemand_clean_object(struct cachefiles_object *object) +{ +} #endif /* diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 7946ee6c40be..11b1c15ac697 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -229,6 +229,12 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, goto err_put_fd; } + /* CLOSE request has no reply */ + if (msg->opcode == CACHEFILES_OP_CLOSE) { + xa_erase(&cache->reqs, id); + complete(&req->done); + } + return n; err_put_fd: @@ -300,6 +306,13 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, /* coupled with the barrier in cachefiles_flush_reqs() */ smp_mb(); + if (opcode != CACHEFILES_OP_OPEN && object->ondemand_id <= 0) { + WARN_ON_ONCE(object->ondemand_id == 0); + xas_unlock(&xas); + ret = -EIO; + goto out; + } + xas.xa_index = 0; xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK); if (xas.xa_node == XAS_RESTART) @@ -356,6 +369,25 @@ static int cachefiles_ondemand_init_open_req(struct cachefiles_req *req, return 0; } +static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req, + void *private) +{ + struct cachefiles_object *object = req->object; + int object_id = object->ondemand_id; + + /* + * It's possible that object id is still 0 if the cookie looking up + * phase failed before OPEN request has ever been sent. Also avoid + * sending CLOSE request for CACHEFILES_ONDEMAND_ID_CLOSED, which means + * anon_fd has already been closed. + */ + if (object_id <= 0) + return -ENOENT; + + req->msg.object_id = object_id; + return 0; +} + int cachefiles_ondemand_init_object(struct cachefiles_object *object) { struct fscache_cookie *cookie = object->cookie; @@ -379,3 +411,9 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object) return cachefiles_ondemand_send_req(object, CACHEFILES_OP_OPEN, data_len, cachefiles_ondemand_init_open_req, NULL); } + +void cachefiles_ondemand_clean_object(struct cachefiles_object *object) +{ + cachefiles_ondemand_send_req(object, CACHEFILES_OP_CLOSE, 0, + cachefiles_ondemand_init_close_req, NULL); +} diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h index 521f2fe4fe9c..37a0071037c8 100644 --- a/include/uapi/linux/cachefiles.h +++ b/include/uapi/linux/cachefiles.h @@ -12,6 +12,7 @@ enum cachefiles_opcode { CACHEFILES_OP_OPEN, + CACHEFILES_OP_CLOSE, }; /* From patchwork Mon May 9 07:40:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843123 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4680FC433FE for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238048AbiEIIDW (ORCPT ); Mon, 9 May 2022 04:03:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59090 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234895AbiEIHox (ORCPT ); Mon, 9 May 2022 03:44:53 -0400 Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E707167C5; Mon, 9 May 2022 00:40:58 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R171e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04400;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfUcKJ_1652082036; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfUcKJ_1652082036) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:37 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 05/22] cachefiles: implement on-demand read Date: Mon, 9 May 2022 15:40:11 +0800 Message-Id: <20220509074028.74954-6-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Implement the data plane of on-demand read mode. The early implementation [1] place the entry to cachefiles_ondemand_read() in fscache_read(). However, fscache_read() can only detect if the requested file range is fully cache miss, whilst we need to notify the user daemon as long as there's a hole inside the requested file range. Thus the entry is now placed in cachefiles_prepare_read(). When working in on-demand read mode, once a hole detected, the read routine will send a READ request to the user daemon. The user daemon needs to fetch the data and write it to the cache file. After sending the READ request, the read routine will hang there, until the READ request is handled by the user daemon. Then it will retry to read from the same file range. If no progress encountered, the read routine will fail then. A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that on-demand read should be done when a cache miss encountered. [1] https://lore.kernel.org/all/20220406075612.60298-6-jefflexu@linux.alibaba.com/ #v8 Signed-off-by: Jeffle Xu Acked-by: David Howells --- fs/cachefiles/internal.h | 9 ++++ fs/cachefiles/io.c | 15 ++++++- fs/cachefiles/ondemand.c | 77 +++++++++++++++++++++++++++++++++ include/linux/netfs.h | 1 + include/uapi/linux/cachefiles.h | 17 ++++++++ 5 files changed, 117 insertions(+), 2 deletions(-) diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index da388ba127eb..6cba2c6de2f9 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -292,6 +292,9 @@ extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache, extern int cachefiles_ondemand_init_object(struct cachefiles_object *object); extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object); +extern int cachefiles_ondemand_read(struct cachefiles_object *object, + loff_t pos, size_t len); + #else static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, char __user *_buffer, size_t buflen) @@ -307,6 +310,12 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje static inline void cachefiles_ondemand_clean_object(struct cachefiles_object *object) { } + +static inline int cachefiles_ondemand_read(struct cachefiles_object *object, + loff_t pos, size_t len) +{ + return -EOPNOTSUPP; +} #endif /* diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c index 50a14e8f0aac..000a28f46e59 100644 --- a/fs/cachefiles/io.c +++ b/fs/cachefiles/io.c @@ -403,6 +403,7 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest * enum netfs_io_source ret = NETFS_DOWNLOAD_FROM_SERVER; loff_t off, to; ino_t ino = file ? file_inode(file)->i_ino : 0; + int rc; _enter("%zx @%llx/%llx", subreq->len, subreq->start, i_size); @@ -415,7 +416,8 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest * if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) { __set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); why = cachefiles_trace_read_no_data; - goto out_no_object; + if (!test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags)) + goto out_no_object; } /* The object and the file may be being created in the background. */ @@ -432,7 +434,7 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest * object = cachefiles_cres_object(cres); cache = object->volume->cache; cachefiles_begin_secure(cache, &saved_cred); - +retry: off = cachefiles_inject_read_error(); if (off == 0) off = vfs_llseek(file, subreq->start, SEEK_DATA); @@ -483,6 +485,15 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest * download_and_store: __set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); + if (test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags)) { + rc = cachefiles_ondemand_read(object, subreq->start, + subreq->len); + if (!rc) { + __clear_bit(NETFS_SREQ_ONDEMAND, &subreq->flags); + goto retry; + } + ret = NETFS_INVALID_READ; + } out: cachefiles_end_secure(cache, saved_cred); out_no_object: diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 11b1c15ac697..3470d4e8f0cb 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -10,8 +10,25 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, struct cachefiles_object *object = file->private_data; struct cachefiles_cache *cache = object->volume->cache; int object_id = object->ondemand_id; + struct cachefiles_req *req; + XA_STATE(xas, &cache->reqs, 0); + xa_lock(&cache->reqs); object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; + + /* + * Flush all pending READ requests since their completion depends on + * anon_fd. + */ + xas_for_each(&xas, req, ULONG_MAX) { + if (req->msg.opcode == CACHEFILES_OP_READ) { + req->error = -EIO; + complete(&req->done); + xas_store(&xas, NULL); + } + } + xa_unlock(&cache->reqs); + xa_erase(&cache->ondemand_ids, object_id); cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd); cachefiles_put_unbind_pincount(cache); @@ -57,11 +74,35 @@ static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos, return vfs_llseek(file, pos, whence); } +static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl, + unsigned long arg) +{ + struct cachefiles_object *object = filp->private_data; + struct cachefiles_cache *cache = object->volume->cache; + struct cachefiles_req *req; + unsigned long id; + + if (ioctl != CACHEFILES_IOC_READ_COMPLETE) + return -EINVAL; + + if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) + return -EOPNOTSUPP; + + id = arg; + req = xa_erase(&cache->reqs, id); + if (!req) + return -EINVAL; + + complete(&req->done); + return 0; +} + static const struct file_operations cachefiles_ondemand_fd_fops = { .owner = THIS_MODULE, .release = cachefiles_ondemand_fd_release, .write_iter = cachefiles_ondemand_fd_write_iter, .llseek = cachefiles_ondemand_fd_llseek, + .unlocked_ioctl = cachefiles_ondemand_fd_ioctl, }; /* @@ -388,6 +429,32 @@ static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req, return 0; } +struct cachefiles_read_ctx { + loff_t off; + size_t len; +}; + +static int cachefiles_ondemand_init_read_req(struct cachefiles_req *req, + void *private) +{ + struct cachefiles_object *object = req->object; + struct cachefiles_read *load = (void *)req->msg.data; + struct cachefiles_read_ctx *read_ctx = private; + int object_id = object->ondemand_id; + + /* Stop enqueuing requests when daemon has closed anon_fd. */ + if (object_id <= 0) { + WARN_ON_ONCE(object_id == 0); + pr_info_once("READ: anonymous fd closed prematurely.\n"); + return -EIO; + } + + req->msg.object_id = object_id; + load->off = read_ctx->off; + load->len = read_ctx->len; + return 0; +} + int cachefiles_ondemand_init_object(struct cachefiles_object *object) { struct fscache_cookie *cookie = object->cookie; @@ -417,3 +484,13 @@ void cachefiles_ondemand_clean_object(struct cachefiles_object *object) cachefiles_ondemand_send_req(object, CACHEFILES_OP_CLOSE, 0, cachefiles_ondemand_init_close_req, NULL); } + +int cachefiles_ondemand_read(struct cachefiles_object *object, + loff_t pos, size_t len) +{ + struct cachefiles_read_ctx read_ctx = {pos, len}; + + return cachefiles_ondemand_send_req(object, CACHEFILES_OP_READ, + sizeof(struct cachefiles_read), + cachefiles_ondemand_init_read_req, &read_ctx); +} diff --git a/include/linux/netfs.h b/include/linux/netfs.h index c7bf1eaf51d5..057d04efaf79 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -159,6 +159,7 @@ struct netfs_io_subrequest { #define NETFS_SREQ_SHORT_IO 2 /* Set if the I/O was short */ #define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */ #define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */ +#define NETFS_SREQ_ONDEMAND 5 /* Set if it's from on-demand read mode */ }; enum netfs_io_origin { diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h index 37a0071037c8..78caa73e5343 100644 --- a/include/uapi/linux/cachefiles.h +++ b/include/uapi/linux/cachefiles.h @@ -3,6 +3,7 @@ #define _LINUX_CACHEFILES_H #include +#include /* * Fscache ensures that the maximum length of cookie key is 255. The volume key @@ -13,6 +14,7 @@ enum cachefiles_opcode { CACHEFILES_OP_OPEN, CACHEFILES_OP_CLOSE, + CACHEFILES_OP_READ, }; /* @@ -48,4 +50,19 @@ struct cachefiles_open { __u8 data[]; }; +/* + * @off indicates the starting offset of the requested file range + * @len indicates the length of the requested file range + */ +struct cachefiles_read { + __u64 off; + __u64 len; +}; + +/* + * Reply for READ request + * @arg for this ioctl is the @id field of READ request. + */ +#define CACHEFILES_IOC_READ_COMPLETE _IOW(0x98, 1, int) + #endif From patchwork Mon May 9 07:40:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843132 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45E20C3527C for ; Mon, 9 May 2022 08:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238122AbiEIIDm (ORCPT ); Mon, 9 May 2022 04:03:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59094 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234913AbiEIHox (ORCPT ); Mon, 9 May 2022 03:44:53 -0400 Received: from out30-42.freemail.mail.aliyun.com (out30-42.freemail.mail.aliyun.com [115.124.30.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D549C17050; Mon, 9 May 2022 00:40:58 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R121e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCgwz9P_1652082038; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCgwz9P_1652082038) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:39 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 06/22] cachefiles: enable on-demand read mode Date: Mon, 9 May 2022 15:40:12 +0800 Message-Id: <20220509074028.74954-7-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Enable on-demand read mode by adding an optional parameter to the "bind" command. On-demand mode will be turned on when this parameter is "ondemand", i.e. "bind ondemand". Otherwise cachefiles will work in the original mode. Signed-off-by: Jeffle Xu Acked-by: David Howells --- fs/cachefiles/daemon.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 5b1d0642c749..aa4efcabb5e3 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -755,11 +755,6 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args) cache->brun_percent >= 100) return -ERANGE; - if (*args) { - pr_err("'bind' command doesn't take an argument\n"); - return -EINVAL; - } - if (!cache->rootdirname) { pr_err("No cache directory specified\n"); return -EINVAL; @@ -771,6 +766,18 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args) return -EBUSY; } + if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND)) { + if (!strcmp(args, "ondemand")) { + set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags); + } else if (*args) { + pr_err("Invalid argument to the 'bind' command\n"); + return -EINVAL; + } + } else if (*args) { + pr_err("'bind' command doesn't take an argument\n"); + return -EINVAL; + } + /* Make sure we have copies of the tag string */ if (!cache->tag) { /* From patchwork Mon May 9 07:40:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843135 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 65966C46467 for ; Mon, 9 May 2022 08:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238128AbiEIIDo (ORCPT ); Mon, 9 May 2022 04:03:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59152 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234927AbiEIHoy (ORCPT ); Mon, 9 May 2022 03:44:54 -0400 Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B6DC165A1; Mon, 9 May 2022 00:40:58 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R331e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04426;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCggc9B_1652082039; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCggc9B_1652082039) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:40 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 07/22] cachefiles: add tracepoints for on-demand read mode Date: Mon, 9 May 2022 15:40:13 +0800 Message-Id: <20220509074028.74954-8-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add tracepoints for on-demand read mode. Currently following tracepoints are added: OPEN request / COPEN reply CLOSE request READ request / CREAD reply write through anonymous fd release of anonymous fd Signed-off-by: Jeffle Xu Acked-by: David Howells --- fs/cachefiles/ondemand.c | 7 ++ include/trace/events/cachefiles.h | 174 ++++++++++++++++++++++++++++++ 2 files changed, 181 insertions(+) diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 3470d4e8f0cb..a41ae6efc545 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -30,6 +30,7 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, xa_unlock(&cache->reqs); xa_erase(&cache->ondemand_ids, object_id); + trace_cachefiles_ondemand_fd_release(object, object_id); cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd); cachefiles_put_unbind_pincount(cache); return 0; @@ -55,6 +56,7 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, if (ret < 0) return ret; + trace_cachefiles_ondemand_fd_write(object, file_inode(file), pos, len); ret = __cachefiles_write(object, file, pos, iter, NULL, NULL); if (!ret) ret = len; @@ -93,6 +95,7 @@ static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl, if (!req) return -EINVAL; + trace_cachefiles_ondemand_cread(object, id); complete(&req->done); return 0; } @@ -166,6 +169,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); else set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); + trace_cachefiles_ondemand_copen(req->object, id, size); out: complete(&req->done); @@ -213,6 +217,7 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) object->ondemand_id = object_id; cachefiles_get_unbind_pincount(cache); + trace_cachefiles_ondemand_open(object, &req->msg, load); return 0; err_put_fd: @@ -426,6 +431,7 @@ static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req, return -ENOENT; req->msg.object_id = object_id; + trace_cachefiles_ondemand_close(object, &req->msg); return 0; } @@ -452,6 +458,7 @@ static int cachefiles_ondemand_init_read_req(struct cachefiles_req *req, req->msg.object_id = object_id; load->off = read_ctx->off; load->len = read_ctx->len; + trace_cachefiles_ondemand_read(object, &req->msg, load); return 0; } diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h index 93df9391bd7f..d8d4d73fe7b6 100644 --- a/include/trace/events/cachefiles.h +++ b/include/trace/events/cachefiles.h @@ -673,6 +673,180 @@ TRACE_EVENT(cachefiles_io_error, __entry->error) ); +TRACE_EVENT(cachefiles_ondemand_open, + TP_PROTO(struct cachefiles_object *obj, struct cachefiles_msg *msg, + struct cachefiles_open *load), + + TP_ARGS(obj, msg, load), + + TP_STRUCT__entry( + __field(unsigned int, obj ) + __field(unsigned int, msg_id ) + __field(unsigned int, object_id ) + __field(unsigned int, fd ) + __field(unsigned int, flags ) + ), + + TP_fast_assign( + __entry->obj = obj ? obj->debug_id : 0; + __entry->msg_id = msg->msg_id; + __entry->object_id = msg->object_id; + __entry->fd = load->fd; + __entry->flags = load->flags; + ), + + TP_printk("o=%08x mid=%x oid=%x fd=%d f=%x", + __entry->obj, + __entry->msg_id, + __entry->object_id, + __entry->fd, + __entry->flags) + ); + +TRACE_EVENT(cachefiles_ondemand_copen, + TP_PROTO(struct cachefiles_object *obj, unsigned int msg_id, + long len), + + TP_ARGS(obj, msg_id, len), + + TP_STRUCT__entry( + __field(unsigned int, obj ) + __field(unsigned int, msg_id ) + __field(long, len ) + ), + + TP_fast_assign( + __entry->obj = obj ? obj->debug_id : 0; + __entry->msg_id = msg_id; + __entry->len = len; + ), + + TP_printk("o=%08x mid=%x l=%lx", + __entry->obj, + __entry->msg_id, + __entry->len) + ); + +TRACE_EVENT(cachefiles_ondemand_close, + TP_PROTO(struct cachefiles_object *obj, struct cachefiles_msg *msg), + + TP_ARGS(obj, msg), + + TP_STRUCT__entry( + __field(unsigned int, obj ) + __field(unsigned int, msg_id ) + __field(unsigned int, object_id ) + ), + + TP_fast_assign( + __entry->obj = obj ? obj->debug_id : 0; + __entry->msg_id = msg->msg_id; + __entry->object_id = msg->object_id; + ), + + TP_printk("o=%08x mid=%x oid=%x", + __entry->obj, + __entry->msg_id, + __entry->object_id) + ); + +TRACE_EVENT(cachefiles_ondemand_read, + TP_PROTO(struct cachefiles_object *obj, struct cachefiles_msg *msg, + struct cachefiles_read *load), + + TP_ARGS(obj, msg, load), + + TP_STRUCT__entry( + __field(unsigned int, obj ) + __field(unsigned int, msg_id ) + __field(unsigned int, object_id ) + __field(loff_t, start ) + __field(size_t, len ) + ), + + TP_fast_assign( + __entry->obj = obj ? obj->debug_id : 0; + __entry->msg_id = msg->msg_id; + __entry->object_id = msg->object_id; + __entry->start = load->off; + __entry->len = load->len; + ), + + TP_printk("o=%08x mid=%x oid=%x s=%llx l=%zx", + __entry->obj, + __entry->msg_id, + __entry->object_id, + __entry->start, + __entry->len) + ); + +TRACE_EVENT(cachefiles_ondemand_cread, + TP_PROTO(struct cachefiles_object *obj, unsigned int msg_id), + + TP_ARGS(obj, msg_id), + + TP_STRUCT__entry( + __field(unsigned int, obj ) + __field(unsigned int, msg_id ) + ), + + TP_fast_assign( + __entry->obj = obj ? obj->debug_id : 0; + __entry->msg_id = msg_id; + ), + + TP_printk("o=%08x mid=%x", + __entry->obj, + __entry->msg_id) + ); + +TRACE_EVENT(cachefiles_ondemand_fd_write, + TP_PROTO(struct cachefiles_object *obj, struct inode *backer, + loff_t start, size_t len), + + TP_ARGS(obj, backer, start, len), + + TP_STRUCT__entry( + __field(unsigned int, obj ) + __field(unsigned int, backer ) + __field(loff_t, start ) + __field(size_t, len ) + ), + + TP_fast_assign( + __entry->obj = obj ? obj->debug_id : 0; + __entry->backer = backer->i_ino; + __entry->start = start; + __entry->len = len; + ), + + TP_printk("o=%08x iB=%x s=%llx l=%zx", + __entry->obj, + __entry->backer, + __entry->start, + __entry->len) + ); + +TRACE_EVENT(cachefiles_ondemand_fd_release, + TP_PROTO(struct cachefiles_object *obj, int object_id), + + TP_ARGS(obj, object_id), + + TP_STRUCT__entry( + __field(unsigned int, obj ) + __field(unsigned int, object_id ) + ), + + TP_fast_assign( + __entry->obj = obj ? obj->debug_id : 0; + __entry->object_id = object_id; + ), + + TP_printk("o=%08x oid=%x", + __entry->obj, + __entry->object_id) + ); + #endif /* _TRACE_CACHEFILES_H */ /* This part must be outside protection */ From patchwork Mon May 9 07:40:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843139 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84ABDC4707A for ; Mon, 9 May 2022 08:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238159AbiEIIDx (ORCPT ); Mon, 9 May 2022 04:03:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235007AbiEIHoy (ORCPT ); Mon, 9 May 2022 03:44:54 -0400 Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com [115.124.30.133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 50C2017073; Mon, 9 May 2022 00:40:59 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R231e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04395;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfxsXb_1652082041; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfxsXb_1652082041) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:42 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 08/22] cachefiles: document on-demand read mode Date: Mon, 9 May 2022 15:40:14 +0800 Message-Id: <20220509074028.74954-9-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Document new user interface introduced by on-demand read mode. Signed-off-by: Jeffle Xu Acked-by: David Howells --- .../filesystems/caching/cachefiles.rst | 178 ++++++++++++++++++ 1 file changed, 178 insertions(+) diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst index 8bf396b76359..fc7abf712315 100644 --- a/Documentation/filesystems/caching/cachefiles.rst +++ b/Documentation/filesystems/caching/cachefiles.rst @@ -28,6 +28,7 @@ Cache on Already Mounted Filesystem (*) Debugging. + (*) On-demand Read. Overview @@ -482,3 +483,180 @@ the control file. For example:: echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug will turn on all function entry debugging. + + +On-demand Read +============== + +When working in its original mode, CacheFiles serves as a local cache for a +remote networking fs - while in on-demand read mode, CacheFiles can boost the +scenario where on-demand read semantics are needed, e.g. container image +distribution. + +The essential difference between these two modes is seen when a cache miss +occurs: In the original mode, the netfs will fetch the data from the remote +server and then write it to the cache file; in on-demand read mode, fetching +the data and writing it into the cache is delegated to a user daemon. + +``CONFIG_CACHEFILES_ONDEMAND`` should be enabled to support on-demand read mode. + + +Protocol Communication +---------------------- + +The on-demand read mode uses a simple protocol for communication between kernel +and user daemon. The protocol can be modeled as:: + + kernel --[request]--> user daemon --[reply]--> kernel + +CacheFiles will send requests to the user daemon when needed. The user daemon +should poll the devnode ('/dev/cachefiles') to check if there's a pending +request to be processed. A POLLIN event will be returned when there's a pending +request. + +The user daemon then reads the devnode to fetch a request to process. It should +be noted that each read only gets one request. When it has finished processing +the request, the user daemon should write the reply to the devnode. + +Each request starts with a message header of the form:: + + struct cachefiles_msg { + __u32 msg_id; + __u32 opcode; + __u32 len; + __u32 object_id; + __u8 data[]; + }; + +where: + + * ``msg_id`` is a unique ID identifying this request among all pending + requests. + + * ``opcode`` indicates the type of this request. + + * ``object_id`` is a unique ID identifying the cache file operated on. + + * ``data`` indicates the payload of this request. + + * ``len`` indicates the whole length of this request, including the + header and following type-specific payload. + + +Turning on On-demand Mode +------------------------- + +An optional parameter becomes available to the "bind" command:: + + bind [ondemand] + +When the "bind" command is given no argument, it defaults to the original mode. +When it is given the "ondemand" argument, i.e. "bind ondemand", on-demand read +mode will be enabled. + + +The OPEN Request +---------------- + +When the netfs opens a cache file for the first time, a request with the +CACHEFILES_OP_OPEN opcode, a.k.a an OPEN request will be sent to the user +daemon. The payload format is of the form:: + + struct cachefiles_open { + __u32 volume_key_size; + __u32 cookie_key_size; + __u32 fd; + __u32 flags; + __u8 data[]; + }; + +where: + + * ``data`` contains the volume_key followed directly by the cookie_key. + The volume key is a NUL-terminated string; the cookie key is binary + data. + + * ``volume_key_size`` indicates the size of the volume key in bytes. + + * ``cookie_key_size`` indicates the size of the cookie key in bytes. + + * ``fd`` indicates an anonymous fd referring to the cache file, through + which the user daemon can perform write/llseek file operations on the + cache file. + + +The user daemon can use the given (volume_key, cookie_key) pair to distinguish +the requested cache file. With the given anonymous fd, the user daemon can +fetch the data and write it to the cache file in the background, even when +kernel has not triggered a cache miss yet. + +Be noted that each cache file has a unique object_id, while it may have multiple +anonymous fds. The user daemon may duplicate anonymous fds from the initial +anonymous fd indicated by the @fd field through dup(). Thus each object_id can +be mapped to multiple anonymous fds, while the usr daemon itself needs to +maintain the mapping. + +When implementing a user daemon, please be careful of RLIMIT_NOFILE, +``/proc/sys/fs/nr_open`` and ``/proc/sys/fs/file-max``. Typically these needn't +be huge since they're related to the number of open device blobs rather than +open files of each individual filesystem. + +The user daemon should reply the OPEN request by issuing a "copen" (complete +open) command on the devnode:: + + copen , + +where: + + * ``msg_id`` must match the msg_id field of the OPEN request. + + * When >= 0, ``cache_size`` indicates the size of the cache file; + when < 0, ``cache_size`` indicates any error code encountered by the + user daemon. + + +The CLOSE Request +----------------- + +When a cookie withdrawn, a CLOSE request (opcode CACHEFILES_OP_CLOSE) will be +sent to the user daemon. This tells the user daemon to close all anonymous fds +associated with the given object_id. The CLOSE request has no extra payload, +and shouldn't be replied. + + +The READ Request +---------------- + +When a cache miss is encountered in on-demand read mode, CacheFiles will send a +READ request (opcode CACHEFILES_OP_READ) to the user daemon. This tells the user +daemon to fetch the contents of the requested file range. The payload is of the +form:: + + struct cachefiles_read { + __u64 off; + __u64 len; + }; + +where: + + * ``off`` indicates the starting offset of the requested file range. + + * ``len`` indicates the length of the requested file range. + + +When it receives a READ request, the user daemon should fetch the requested data +and write it to the cache file identified by object_id. + +When it has finished processing the READ request, the user daemon should reply +by using the CACHEFILES_IOC_READ_COMPLETE ioctl on one of the anonymous fds +associated with the object_id given in the READ request. The ioctl is of the +form:: + + ioctl(fd, CACHEFILES_IOC_READ_COMPLETE, msg_id); + +where: + + * ``fd`` is one of the anonymous fds associated with the object_id + given. + + * ``msg_id`` must match the msg_id field of the READ request. From patchwork Mon May 9 07:40:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843129 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE764C4167E for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238085AbiEIIDc (ORCPT ); Mon, 9 May 2022 04:03:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59158 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235011AbiEIHoy (ORCPT ); Mon, 9 May 2022 03:44:54 -0400 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7981B183BD; Mon, 9 May 2022 00:40:59 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R151e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04423;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCgX6ia_1652082043; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCgX6ia_1652082043) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:44 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 09/22] erofs: make erofs_map_blocks() generally available Date: Mon, 9 May 2022 15:40:15 +0800 Message-Id: <20220509074028.74954-10-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org ... so that it can be used in the following introduced fscache mode. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/data.c | 4 ++-- fs/erofs/internal.h | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 780db1e5f4b7..bc22642358ec 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -110,8 +110,8 @@ static int erofs_map_blocks_flatmode(struct inode *inode, return 0; } -static int erofs_map_blocks(struct inode *inode, - struct erofs_map_blocks *map, int flags) +int erofs_map_blocks(struct inode *inode, + struct erofs_map_blocks *map, int flags) { struct super_block *sb = inode->i_sb; struct erofs_inode *vi = EROFS_I(inode); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 5298c4ee277d..fe9564e5091e 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -486,6 +486,8 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev); int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len); +int erofs_map_blocks(struct inode *inode, + struct erofs_map_blocks *map, int flags); /* inode.c */ static inline unsigned long erofs_inode_hash(erofs_nid_t nid) From patchwork Mon May 9 07:40:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843130 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4048C4167D for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238088AbiEIIDd (ORCPT ); Mon, 9 May 2022 04:03:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59162 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235012AbiEIHoy (ORCPT ); Mon, 9 May 2022 03:44:54 -0400 Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com [115.124.30.133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49DB618E20; Mon, 9 May 2022 00:41:00 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R151e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfaImD_1652082044; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfaImD_1652082044) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:45 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 10/22] erofs: add fscache mode check helper Date: Mon, 9 May 2022 15:40:16 +0800 Message-Id: <20220509074028.74954-11-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Until then erofs is exactly blockdev based filesystem. A new fscache-based mode is going to be introduced for erofs to support scenarios where on-demand read semantics is needed, e.g. container image distribution. In this case, erofs could be mounted from data blobs through fscache. Add a helper checking which mode erofs works in, and twist the code in preparation for the upcoming fscache mode. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/internal.h | 5 +++++ fs/erofs/super.c | 44 +++++++++++++++++++++++++++++--------------- 2 files changed, 34 insertions(+), 15 deletions(-) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index fe9564e5091e..05a97533b1e9 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -161,6 +161,11 @@ struct erofs_sb_info { #define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option) #define test_opt(opt, option) ((opt)->mount_opt & EROFS_MOUNT_##option) +static inline bool erofs_is_fscache_mode(struct super_block *sb) +{ + return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !sb->s_bdev; +} + enum { EROFS_ZIP_CACHE_DISABLED, EROFS_ZIP_CACHE_READAHEAD, diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 0c4b41130c2f..724d5ff0d78c 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -259,15 +259,19 @@ static int erofs_init_devices(struct super_block *sb, } dis = ptr + erofs_blkoff(pos); - bdev = blkdev_get_by_path(dif->path, - FMODE_READ | FMODE_EXCL, - sb->s_type); - if (IS_ERR(bdev)) { - err = PTR_ERR(bdev); - break; + if (!erofs_is_fscache_mode(sb)) { + bdev = blkdev_get_by_path(dif->path, + FMODE_READ | FMODE_EXCL, + sb->s_type); + if (IS_ERR(bdev)) { + err = PTR_ERR(bdev); + break; + } + dif->bdev = bdev; + dif->dax_dev = fs_dax_get_by_bdev(bdev, + &dif->dax_part_off); } - dif->bdev = bdev; - dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off); + dif->blocks = le32_to_cpu(dis->blocks); dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr); sbi->total_blocks += dif->blocks; @@ -586,21 +590,28 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) sb->s_magic = EROFS_SUPER_MAGIC; - if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { - erofs_err(sb, "failed to set erofs blksize"); - return -EINVAL; - } - sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) return -ENOMEM; sb->s_fs_info = sbi; sbi->opt = ctx->opt; - sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off); sbi->devs = ctx->devs; ctx->devs = NULL; + if (erofs_is_fscache_mode(sb)) { + sb->s_blocksize = EROFS_BLKSIZ; + sb->s_blocksize_bits = LOG_BLOCK_SIZE; + } else { + if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { + erofs_err(sb, "failed to set erofs blksize"); + return -EINVAL; + } + + sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, + &sbi->dax_part_off); + } + err = erofs_read_superblock(sb); if (err) return err; @@ -857,7 +868,10 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; struct erofs_sb_info *sbi = EROFS_SB(sb); - u64 id = huge_encode_dev(sb->s_bdev->bd_dev); + u64 id = 0; + + if (!erofs_is_fscache_mode(sb)) + id = huge_encode_dev(sb->s_bdev->bd_dev); buf->f_type = sb->s_magic; buf->f_bsize = EROFS_BLKSIZ; From patchwork Mon May 9 07:40:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843120 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F1A8C4167E for ; Mon, 9 May 2022 08:01:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235558AbiEIIDN (ORCPT ); Mon, 9 May 2022 04:03:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59374 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235576AbiEIHo5 (ORCPT ); Mon, 9 May 2022 03:44:57 -0400 Received: from out30-42.freemail.mail.aliyun.com (out30-42.freemail.mail.aliyun.com [115.124.30.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BD17B17050; Mon, 9 May 2022 00:41:02 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R171e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=alimailimapcm10staff010182156082;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfxsZ3_1652082046; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfxsZ3_1652082046) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:47 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 11/22] erofs: register fscache volume Date: Mon, 9 May 2022 15:40:17 +0800 Message-Id: <20220509074028.74954-12-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org A new fscache based mode is going to be introduced for erofs, in which case on-demand read semantics is implemented through fscache. As the first step, register fscache volume for each erofs filesystem. That means, data blobs can not be shared among erofs filesystems. In the following iteration, we are going to introduce the domain semantics, in which case several erofs filesystems can belong to one domain, and data blobs can be shared among these erofs filesystems of one domain. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/Kconfig | 10 ++++++++++ fs/erofs/Makefile | 1 + fs/erofs/fscache.c | 37 +++++++++++++++++++++++++++++++++++++ fs/erofs/internal.h | 16 ++++++++++++++++ fs/erofs/super.c | 5 +++++ 5 files changed, 69 insertions(+) create mode 100644 fs/erofs/fscache.c diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig index f57255ab88ed..85490370e0ca 100644 --- a/fs/erofs/Kconfig +++ b/fs/erofs/Kconfig @@ -98,3 +98,13 @@ config EROFS_FS_ZIP_LZMA systems will be readable without selecting this option. If unsure, say N. + +config EROFS_FS_ONDEMAND + bool "EROFS fscache-based on-demand read support" + depends on CACHEFILES_ONDEMAND && (EROFS_FS=m && FSCACHE || EROFS_FS=y && FSCACHE=y) + default n + help + This permits EROFS to use fscache-backed data blobs with on-demand + read support. + + If unsure, say N. diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile index 8a3317e38e5a..99bbc597a3e9 100644 --- a/fs/erofs/Makefile +++ b/fs/erofs/Makefile @@ -5,3 +5,4 @@ erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o sysfs.o erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o +erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c new file mode 100644 index 000000000000..7a6d0239ebb1 --- /dev/null +++ b/fs/erofs/fscache.c @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022, Alibaba Cloud + */ +#include +#include "internal.h" + +int erofs_fscache_register_fs(struct super_block *sb) +{ + struct erofs_sb_info *sbi = EROFS_SB(sb); + struct fscache_volume *volume; + char *name; + int ret = 0; + + name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid); + if (!name) + return -ENOMEM; + + volume = fscache_acquire_volume(name, NULL, NULL, 0); + if (IS_ERR_OR_NULL(volume)) { + erofs_err(sb, "failed to register volume for %s", name); + ret = volume ? PTR_ERR(volume) : -EOPNOTSUPP; + volume = NULL; + } + + sbi->volume = volume; + kfree(name); + return ret; +} + +void erofs_fscache_unregister_fs(struct super_block *sb) +{ + struct erofs_sb_info *sbi = EROFS_SB(sb); + + fscache_relinquish_volume(sbi->volume, NULL, false); + sbi->volume = NULL; +} diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 05a97533b1e9..e4f6a13f161f 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -74,6 +74,7 @@ struct erofs_mount_opts { unsigned int max_sync_decompress_pages; #endif unsigned int mount_opt; + char *fsid; }; struct erofs_dev_context { @@ -146,6 +147,9 @@ struct erofs_sb_info { /* sysfs support */ struct kobject s_kobj; /* /sys/fs/erofs/ */ struct completion s_kobj_unregister; + + /* fscache support */ + struct fscache_volume *volume; }; #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info) @@ -618,6 +622,18 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb, } #endif /* !CONFIG_EROFS_FS_ZIP */ +/* fscache.c */ +#ifdef CONFIG_EROFS_FS_ONDEMAND +int erofs_fscache_register_fs(struct super_block *sb); +void erofs_fscache_unregister_fs(struct super_block *sb); +#else +static inline int erofs_fscache_register_fs(struct super_block *sb) +{ + return 0; +} +static inline void erofs_fscache_unregister_fs(struct super_block *sb) {} +#endif + #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ #endif /* __EROFS_INTERNAL_H */ diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 724d5ff0d78c..fd8daa447237 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -602,6 +602,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) if (erofs_is_fscache_mode(sb)) { sb->s_blocksize = EROFS_BLKSIZ; sb->s_blocksize_bits = LOG_BLOCK_SIZE; + + err = erofs_fscache_register_fs(sb); + if (err) + return err; } else { if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { erofs_err(sb, "failed to set erofs blksize"); @@ -768,6 +772,7 @@ static void erofs_kill_sb(struct super_block *sb) erofs_free_dev_context(sbi->devs); fs_put_dax(sbi->dax_dev); + erofs_fscache_unregister_fs(sb); kfree(sbi); sb->s_fs_info = NULL; } From patchwork Mon May 9 07:40:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843112 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D8C8C433F5 for ; Mon, 9 May 2022 07:59:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232655AbiEIH6p (ORCPT ); Mon, 9 May 2022 03:58:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59376 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235627AbiEIHo6 (ORCPT ); Mon, 9 May 2022 03:44:58 -0400 Received: from out30-42.freemail.mail.aliyun.com (out30-42.freemail.mail.aliyun.com [115.124.30.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7F32318E20; Mon, 9 May 2022 00:41:03 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R771e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04395;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfxsZc_1652082047; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfxsZc_1652082047) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:48 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 12/22] erofs: add fscache context helper functions Date: Mon, 9 May 2022 15:40:18 +0800 Message-Id: <20220509074028.74954-13-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Introduce a context structure for managing data blobs, and helper functions for initializing and cleaning up this context structure. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/fscache.c | 41 +++++++++++++++++++++++++++++++++++++++++ fs/erofs/internal.h | 19 +++++++++++++++++++ 2 files changed, 60 insertions(+) diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 7a6d0239ebb1..dfff245b006b 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -5,6 +5,47 @@ #include #include "internal.h" +int erofs_fscache_register_cookie(struct super_block *sb, + struct erofs_fscache **fscache, char *name) +{ + struct fscache_volume *volume = EROFS_SB(sb)->volume; + struct erofs_fscache *ctx; + struct fscache_cookie *cookie; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return -ENOMEM; + + cookie = fscache_acquire_cookie(volume, FSCACHE_ADV_WANT_CACHE_SIZE, + name, strlen(name), NULL, 0, 0); + if (!cookie) { + erofs_err(sb, "failed to get cookie for %s", name); + kfree(name); + return -EINVAL; + } + + fscache_use_cookie(cookie, false); + ctx->cookie = cookie; + + *fscache = ctx; + return 0; +} + +void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) +{ + struct erofs_fscache *ctx = *fscache; + + if (!ctx) + return; + + fscache_unuse_cookie(ctx->cookie, NULL, NULL); + fscache_relinquish_cookie(ctx->cookie, false); + ctx->cookie = NULL; + + kfree(ctx); + *fscache = NULL; +} + int erofs_fscache_register_fs(struct super_block *sb) { struct erofs_sb_info *sbi = EROFS_SB(sb); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index e4f6a13f161f..b1f19f058503 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -97,6 +97,10 @@ struct erofs_sb_lz4_info { u16 max_pclusterblks; }; +struct erofs_fscache { + struct fscache_cookie *cookie; +}; + struct erofs_sb_info { struct erofs_mount_opts opt; /* options */ #ifdef CONFIG_EROFS_FS_ZIP @@ -626,12 +630,27 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb, #ifdef CONFIG_EROFS_FS_ONDEMAND int erofs_fscache_register_fs(struct super_block *sb); void erofs_fscache_unregister_fs(struct super_block *sb); + +int erofs_fscache_register_cookie(struct super_block *sb, + struct erofs_fscache **fscache, char *name); +void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache); #else static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; } static inline void erofs_fscache_unregister_fs(struct super_block *sb) {} + +static inline int erofs_fscache_register_cookie(struct super_block *sb, + struct erofs_fscache **fscache, + char *name) +{ + return -EOPNOTSUPP; +} + +static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) +{ +} #endif #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ From patchwork Mon May 9 07:40:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843125 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A92C0C4167B for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238076AbiEIIDa (ORCPT ); Mon, 9 May 2022 04:03:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59274 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235587AbiEIHo6 (ORCPT ); Mon, 9 May 2022 03:44:58 -0400 Received: from out30-56.freemail.mail.aliyun.com (out30-56.freemail.mail.aliyun.com [115.124.30.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7F1FB17073; Mon, 9 May 2022 00:41:03 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R241e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04426;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfxsaA_1652082049; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfxsaA_1652082049) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:50 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 13/22] erofs: add anonymous inode caching metadata for data blobs Date: Mon, 9 May 2022 15:40:19 +0800 Message-Id: <20220509074028.74954-14-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Introduce one anonymous inode for data blobs so that erofs can cache metadata directly within such anonymous inode. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/fscache.c | 39 ++++++++++++++++++++++++++++++++++++--- fs/erofs/internal.h | 6 ++++-- 2 files changed, 40 insertions(+), 5 deletions(-) diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index dfff245b006b..26f038d9c4e1 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -5,12 +5,17 @@ #include #include "internal.h" +static const struct address_space_operations erofs_fscache_meta_aops = { +}; + int erofs_fscache_register_cookie(struct super_block *sb, - struct erofs_fscache **fscache, char *name) + struct erofs_fscache **fscache, + char *name, bool need_inode) { struct fscache_volume *volume = EROFS_SB(sb)->volume; struct erofs_fscache *ctx; struct fscache_cookie *cookie; + int ret; ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) @@ -20,15 +25,40 @@ int erofs_fscache_register_cookie(struct super_block *sb, name, strlen(name), NULL, 0, 0); if (!cookie) { erofs_err(sb, "failed to get cookie for %s", name); - kfree(name); - return -EINVAL; + ret = -EINVAL; + goto err; } fscache_use_cookie(cookie, false); ctx->cookie = cookie; + if (need_inode) { + struct inode *const inode = new_inode(sb); + + if (!inode) { + erofs_err(sb, "failed to get anon inode for %s", name); + ret = -ENOMEM; + goto err_cookie; + } + + set_nlink(inode, 1); + inode->i_size = OFFSET_MAX; + inode->i_mapping->a_ops = &erofs_fscache_meta_aops; + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); + + ctx->inode = inode; + } + *fscache = ctx; return 0; + +err_cookie: + fscache_unuse_cookie(ctx->cookie, NULL, NULL); + fscache_relinquish_cookie(ctx->cookie, false); + ctx->cookie = NULL; +err: + kfree(ctx); + return ret; } void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) @@ -42,6 +72,9 @@ void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) fscache_relinquish_cookie(ctx->cookie, false); ctx->cookie = NULL; + iput(ctx->inode); + ctx->inode = NULL; + kfree(ctx); *fscache = NULL; } diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index b1f19f058503..5867cb63fd74 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -99,6 +99,7 @@ struct erofs_sb_lz4_info { struct erofs_fscache { struct fscache_cookie *cookie; + struct inode *inode; }; struct erofs_sb_info { @@ -632,7 +633,8 @@ int erofs_fscache_register_fs(struct super_block *sb); void erofs_fscache_unregister_fs(struct super_block *sb); int erofs_fscache_register_cookie(struct super_block *sb, - struct erofs_fscache **fscache, char *name); + struct erofs_fscache **fscache, + char *name, bool need_inode); void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache); #else static inline int erofs_fscache_register_fs(struct super_block *sb) @@ -643,7 +645,7 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {} static inline int erofs_fscache_register_cookie(struct super_block *sb, struct erofs_fscache **fscache, - char *name) + char *name, bool need_inode) { return -EOPNOTSUPP; } From patchwork Mon May 9 07:40:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843131 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00755C41535 for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238096AbiEIIDf (ORCPT ); Mon, 9 May 2022 04:03:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59274 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235762AbiEIHo7 (ORCPT ); Mon, 9 May 2022 03:44:59 -0400 Received: from out30-57.freemail.mail.aliyun.com (out30-57.freemail.mail.aliyun.com [115.124.30.57]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C6DA918E23; Mon, 9 May 2022 00:41:04 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R431e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e01424;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCgwzCt_1652082051; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCgwzCt_1652082051) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:52 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 14/22] erofs: add erofs_fscache_read_folios() helper Date: Mon, 9 May 2022 15:40:20 +0800 Message-Id: <20220509074028.74954-15-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add erofs_fscache_read_folios() helper reading from fscache. It supports on-demand read semantics. That is, it will make the backend prepare for the data when cache miss. Once data ready, it will read from the cache. This helper can then be used to implement .readpage()/.readahead() of on-demand read semantics. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/fscache.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 26f038d9c4e1..ac02af8cce3e 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -5,6 +5,60 @@ #include #include "internal.h" +/* + * Read data from fscache and fill the read data into page cache described by + * @start/len, which shall be both aligned with PAGE_SIZE. @pstart describes + * the start physical address in the cache file. + */ +static int erofs_fscache_read_folios(struct fscache_cookie *cookie, + struct address_space *mapping, + loff_t start, size_t len, + loff_t pstart) +{ + enum netfs_io_source source; + struct netfs_io_request rreq = {}; + struct netfs_io_subrequest subreq = { .rreq = &rreq, }; + struct netfs_cache_resources *cres = &rreq.cache_resources; + struct super_block *sb = mapping->host->i_sb; + struct iov_iter iter; + size_t done = 0; + int ret; + + ret = fscache_begin_read_operation(cres, cookie); + if (ret) + return ret; + + while (done < len) { + subreq.start = pstart + done; + subreq.len = len - done; + subreq.flags = 1 << NETFS_SREQ_ONDEMAND; + + source = cres->ops->prepare_read(&subreq, LLONG_MAX); + if (WARN_ON(subreq.len == 0)) + source = NETFS_INVALID_READ; + if (source != NETFS_READ_FROM_CACHE) { + erofs_err(sb, "failed to fscache prepare_read (source %d)", + source); + ret = -EIO; + goto out; + } + + iov_iter_xarray(&iter, READ, &mapping->i_pages, + start + done, subreq.len); + ret = fscache_read(cres, subreq.start, &iter, + NETFS_READ_HOLE_FAIL, NULL, NULL); + if (ret) { + erofs_err(sb, "failed to fscache_read (ret %d)", ret); + goto out; + } + + done += subreq.len; + } +out: + fscache_end_operation(cres); + return ret; +} + static const struct address_space_operations erofs_fscache_meta_aops = { }; From patchwork Mon May 9 07:40:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843138 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94F75C35295 for ; Mon, 9 May 2022 08:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238167AbiEIID5 (ORCPT ); Mon, 9 May 2022 04:03:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235773AbiEIHo7 (ORCPT ); Mon, 9 May 2022 03:44:59 -0400 Received: from out30-44.freemail.mail.aliyun.com (out30-44.freemail.mail.aliyun.com [115.124.30.44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8EAB5140C9; Mon, 9 May 2022 00:41:05 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R711e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04426;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCgX6lg_1652082052; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCgX6lg_1652082052) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:53 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 15/22] erofs: register fscache context for primary data blob Date: Mon, 9 May 2022 15:40:21 +0800 Message-Id: <20220509074028.74954-16-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Registers fscache context for primary data blob. Also move the initialization of s_op and related fields forward, since anonymous inode will be allocated under the super block when registering the fscache context. Something worth mentioning about the cleanup routine. 1. The fscache context will instantiate anonymous inodes under the super block. Release these anonymous inodes when .put_super() is called, or we'll get "VFS: Busy inodes after unmount." warning. 2. The fscache context is initialized prior to the root inode. If .kill_sb() is called when mount failed, .put_super() won't be called when root inode has not been initialized yet. Thus .kill_sb() shall also contain the cleanup routine. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/internal.h | 1 + fs/erofs/super.c | 15 +++++++++++---- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 5867cb63fd74..386658416159 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -155,6 +155,7 @@ struct erofs_sb_info { /* fscache support */ struct fscache_volume *volume; + struct erofs_fscache *s_fscache; }; #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info) diff --git a/fs/erofs/super.c b/fs/erofs/super.c index fd8daa447237..61dc900295f9 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -589,6 +589,9 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) int err; sb->s_magic = EROFS_SUPER_MAGIC; + sb->s_flags |= SB_RDONLY | SB_NOATIME; + sb->s_maxbytes = MAX_LFS_FILESIZE; + sb->s_op = &erofs_sops; sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) @@ -606,6 +609,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) err = erofs_fscache_register_fs(sb); if (err) return err; + + err = erofs_fscache_register_cookie(sb, &sbi->s_fscache, + sbi->opt.fsid, true); + if (err) + return err; } else { if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { erofs_err(sb, "failed to set erofs blksize"); @@ -628,11 +636,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) clear_opt(&sbi->opt, DAX_ALWAYS); } } - sb->s_flags |= SB_RDONLY | SB_NOATIME; - sb->s_maxbytes = MAX_LFS_FILESIZE; - sb->s_time_gran = 1; - sb->s_op = &erofs_sops; + sb->s_time_gran = 1; sb->s_xattr = erofs_xattr_handlers; if (test_opt(&sbi->opt, POSIX_ACL)) @@ -772,6 +777,7 @@ static void erofs_kill_sb(struct super_block *sb) erofs_free_dev_context(sbi->devs); fs_put_dax(sbi->dax_dev); + erofs_fscache_unregister_cookie(&sbi->s_fscache); erofs_fscache_unregister_fs(sb); kfree(sbi); sb->s_fs_info = NULL; @@ -790,6 +796,7 @@ static void erofs_put_super(struct super_block *sb) iput(sbi->managed_cache); sbi->managed_cache = NULL; #endif + erofs_fscache_unregister_cookie(&sbi->s_fscache); } static struct file_system_type erofs_fs_type = { From patchwork Mon May 9 07:40:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843119 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26BF7C4167B for ; Mon, 9 May 2022 08:01:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232116AbiEIIDL (ORCPT ); Mon, 9 May 2022 04:03:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235775AbiEIHo7 (ORCPT ); Mon, 9 May 2022 03:44:59 -0400 Received: from out30-130.freemail.mail.aliyun.com (out30-130.freemail.mail.aliyun.com [115.124.30.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8EA6214083; Mon, 9 May 2022 00:41:05 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R181e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=alimailimapcm10staff010182156082;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCgX6mC_1652082054; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCgX6mC_1652082054) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:55 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 16/22] erofs: register fscache context for extra data blobs Date: Mon, 9 May 2022 15:40:22 +0800 Message-Id: <20220509074028.74954-17-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Similar to the multi device mode, erofs could be mounted from one primary data blob (mandatory) and multiple extra data blobs (optional). Register fscache context for each extra data blob. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/data.c | 3 +++ fs/erofs/internal.h | 2 ++ fs/erofs/super.c | 8 +++++++- 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/erofs/data.c b/fs/erofs/data.c index bc22642358ec..14b64d960541 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -199,6 +199,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map) map->m_bdev = sb->s_bdev; map->m_daxdev = EROFS_SB(sb)->dax_dev; map->m_dax_part_off = EROFS_SB(sb)->dax_part_off; + map->m_fscache = EROFS_SB(sb)->s_fscache; if (map->m_deviceid) { down_read(&devs->rwsem); @@ -210,6 +211,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map) map->m_bdev = dif->bdev; map->m_daxdev = dif->dax_dev; map->m_dax_part_off = dif->dax_part_off; + map->m_fscache = dif->fscache; up_read(&devs->rwsem); } else if (devs->extra_devices) { down_read(&devs->rwsem); @@ -227,6 +229,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map) map->m_bdev = dif->bdev; map->m_daxdev = dif->dax_dev; map->m_dax_part_off = dif->dax_part_off; + map->m_fscache = dif->fscache; break; } } diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 386658416159..fa488af8dfcf 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -49,6 +49,7 @@ typedef u32 erofs_blk_t; struct erofs_device_info { char *path; + struct erofs_fscache *fscache; struct block_device *bdev; struct dax_device *dax_dev; u64 dax_part_off; @@ -482,6 +483,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode, #endif /* !CONFIG_EROFS_FS_ZIP */ struct erofs_map_dev { + struct erofs_fscache *m_fscache; struct block_device *m_bdev; struct dax_device *m_daxdev; u64 m_dax_part_off; diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 61dc900295f9..c6755bcae4a6 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -259,7 +259,12 @@ static int erofs_init_devices(struct super_block *sb, } dis = ptr + erofs_blkoff(pos); - if (!erofs_is_fscache_mode(sb)) { + if (erofs_is_fscache_mode(sb)) { + err = erofs_fscache_register_cookie(sb, &dif->fscache, + dif->path, false); + if (err) + break; + } else { bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL, sb->s_type); @@ -710,6 +715,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data) fs_put_dax(dif->dax_dev); if (dif->bdev) blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL); + erofs_fscache_unregister_cookie(&dif->fscache); kfree(dif->path); kfree(dif); return 0; From patchwork Mon May 9 07:40:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843137 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA8B1C4707E for ; Mon, 9 May 2022 08:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238176AbiEIIEB (ORCPT ); Mon, 9 May 2022 04:04:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235808AbiEIHpB (ORCPT ); Mon, 9 May 2022 03:45:01 -0400 Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com [115.124.30.133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC239165BC; Mon, 9 May 2022 00:41:05 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R171e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=alimailimapcm10staff010182156082;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfaIpJ_1652082055; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfaIpJ_1652082055) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:56 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 17/22] erofs: implement fscache-based metadata read Date: Mon, 9 May 2022 15:40:23 +0800 Message-Id: <20220509074028.74954-18-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Implement the data plane of reading metadata from primary data blob over fscache. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/data.c | 19 +++++++++++++++---- fs/erofs/fscache.c | 25 +++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 4 deletions(-) diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 14b64d960541..bb9c1fd48c19 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -6,6 +6,7 @@ */ #include "internal.h" #include +#include #include #include @@ -35,14 +36,20 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode, erofs_off_t offset = blknr_to_addr(blkaddr); pgoff_t index = offset >> PAGE_SHIFT; struct page *page = buf->page; + struct folio *folio; + unsigned int nofs_flag; if (!page || page->index != index) { erofs_put_metabuf(buf); - page = read_cache_page_gfp(mapping, index, - mapping_gfp_constraint(mapping, ~__GFP_FS)); - if (IS_ERR(page)) - return page; + + nofs_flag = memalloc_nofs_save(); + folio = read_cache_folio(mapping, index, NULL, NULL); + memalloc_nofs_restore(nofs_flag); + if (IS_ERR(folio)) + return folio; + /* should already be PageUptodate, no need to lock page */ + page = folio_file_page(folio, index); buf->page = page; } if (buf->kmap_type == EROFS_NO_KMAP) { @@ -63,6 +70,10 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode, void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, erofs_blk_t blkaddr, enum erofs_kmap_type type) { + if (erofs_is_fscache_mode(sb)) + return erofs_bread(buf, EROFS_SB(sb)->s_fscache->inode, + blkaddr, type); + return erofs_bread(buf, sb->s_bdev->bd_inode, blkaddr, type); } diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index ac02af8cce3e..23d7e862eed8 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -59,7 +59,32 @@ static int erofs_fscache_read_folios(struct fscache_cookie *cookie, return ret; } +static int erofs_fscache_meta_readpage(struct file *data, struct page *page) +{ + int ret; + struct folio *folio = page_folio(page); + struct super_block *sb = folio_mapping(folio)->host->i_sb; + struct erofs_map_dev mdev = { + .m_deviceid = 0, + .m_pa = folio_pos(folio), + }; + + ret = erofs_map_dev(sb, &mdev); + if (ret) + goto out; + + ret = erofs_fscache_read_folios(mdev.m_fscache->cookie, + folio_mapping(folio), folio_pos(folio), + folio_size(folio), mdev.m_pa); + if (!ret) + folio_mark_uptodate(folio); +out: + folio_unlock(folio); + return ret; +} + static const struct address_space_operations erofs_fscache_meta_aops = { + .readpage = erofs_fscache_meta_readpage, }; int erofs_fscache_register_cookie(struct super_block *sb, From patchwork Mon May 9 07:40:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843127 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5AA3AC433EF for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238055AbiEIIDX (ORCPT ); Mon, 9 May 2022 04:03:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235793AbiEIHpB (ORCPT ); Mon, 9 May 2022 03:45:01 -0400 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC17E15FFF; Mon, 9 May 2022 00:41:05 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R101e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04423;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCgwzEf_1652082057; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCgwzEf_1652082057) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:58 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 18/22] erofs: implement fscache-based data read for non-inline layout Date: Mon, 9 May 2022 15:40:24 +0800 Message-Id: <20220509074028.74954-19-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Implement the data plane of reading data from data blobs over fscache for non-inline layout. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/fscache.c | 51 +++++++++++++++++++++++++++++++++++++++++++++ fs/erofs/inode.c | 4 ++++ fs/erofs/internal.h | 2 ++ 3 files changed, 57 insertions(+) diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 23d7e862eed8..b3af72af7c88 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -83,10 +83,61 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page) return ret; } +static int erofs_fscache_readpage(struct file *file, struct page *page) +{ + struct folio *folio = page_folio(page); + struct inode *inode = folio_mapping(folio)->host; + struct super_block *sb = inode->i_sb; + struct erofs_map_blocks map; + struct erofs_map_dev mdev; + erofs_off_t pos; + loff_t pstart; + int ret; + + DBG_BUGON(folio_size(folio) != EROFS_BLKSIZ); + + pos = folio_pos(folio); + map.m_la = pos; + + ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW); + if (ret) + goto out_unlock; + + if (!(map.m_flags & EROFS_MAP_MAPPED)) { + folio_zero_range(folio, 0, folio_size(folio)); + goto out_uptodate; + } + + mdev = (struct erofs_map_dev) { + .m_deviceid = map.m_deviceid, + .m_pa = map.m_pa, + }; + + ret = erofs_map_dev(sb, &mdev); + if (ret) + goto out_unlock; + + pstart = mdev.m_pa + (pos - map.m_la); + ret = erofs_fscache_read_folios(mdev.m_fscache->cookie, + folio_mapping(folio), folio_pos(folio), + folio_size(folio), pstart); + +out_uptodate: + if (!ret) + folio_mark_uptodate(folio); +out_unlock: + folio_unlock(folio); + return ret; +} + static const struct address_space_operations erofs_fscache_meta_aops = { .readpage = erofs_fscache_meta_readpage, }; +const struct address_space_operations erofs_fscache_access_aops = { + .readpage = erofs_fscache_readpage, +}; + int erofs_fscache_register_cookie(struct super_block *sb, struct erofs_fscache **fscache, char *name, bool need_inode) diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index e8b37ba5e9ad..8d3f56c6469b 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -297,6 +297,10 @@ static int erofs_fill_inode(struct inode *inode, int isdir) goto out_unlock; } inode->i_mapping->a_ops = &erofs_raw_access_aops; +#ifdef CONFIG_EROFS_FS_ONDEMAND + if (erofs_is_fscache_mode(inode->i_sb)) + inode->i_mapping->a_ops = &erofs_fscache_access_aops; +#endif out_unlock: erofs_put_metabuf(&buf); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index fa488af8dfcf..c8f6ac910976 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -639,6 +639,8 @@ int erofs_fscache_register_cookie(struct super_block *sb, struct erofs_fscache **fscache, char *name, bool need_inode); void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache); + +extern const struct address_space_operations erofs_fscache_access_aops; #else static inline int erofs_fscache_register_fs(struct super_block *sb) { From patchwork Mon May 9 07:40:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843136 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75285C35280 for ; Mon, 9 May 2022 08:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238141AbiEIIDr (ORCPT ); Mon, 9 May 2022 04:03:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235822AbiEIHpB (ORCPT ); Mon, 9 May 2022 03:45:01 -0400 Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC32C167C5; Mon, 9 May 2022 00:41:05 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R711e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04423;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfUcRK_1652082058; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfUcRK_1652082058) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:59 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 19/22] erofs: implement fscache-based data read for inline layout Date: Mon, 9 May 2022 15:40:25 +0800 Message-Id: <20220509074028.74954-20-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Implement the data plane of reading data from data blobs over fscache for inline layout. For the heading non-inline part, the data plane for non-inline layout is reused, while only the tail packing part needs special handling. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/fscache.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index b3af72af7c88..5b779812a5ee 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -83,6 +83,33 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page) return ret; } +static int erofs_fscache_readpage_inline(struct folio *folio, + struct erofs_map_blocks *map) +{ + struct super_block *sb = folio_mapping(folio)->host->i_sb; + struct erofs_buf buf = __EROFS_BUF_INITIALIZER; + erofs_blk_t blknr; + size_t offset, len; + void *src, *dst; + + /* For tail packing layout, the offset may be non-zero. */ + offset = erofs_blkoff(map->m_pa); + blknr = erofs_blknr(map->m_pa); + len = map->m_llen; + + src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP); + if (IS_ERR(src)) + return PTR_ERR(src); + + dst = kmap_local_folio(folio, 0); + memcpy(dst, src + offset, len); + memset(dst + len, 0, PAGE_SIZE - len); + kunmap_local(dst); + + erofs_put_metabuf(&buf); + return 0; +} + static int erofs_fscache_readpage(struct file *file, struct page *page) { struct folio *folio = page_folio(page); @@ -108,6 +135,11 @@ static int erofs_fscache_readpage(struct file *file, struct page *page) goto out_uptodate; } + if (map.m_flags & EROFS_MAP_META) { + ret = erofs_fscache_readpage_inline(folio, &map); + goto out_uptodate; + } + mdev = (struct erofs_map_dev) { .m_deviceid = map.m_deviceid, .m_pa = map.m_pa, From patchwork Mon May 9 07:40:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843124 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F518C43219 for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238061AbiEIID1 (ORCPT ); Mon, 9 May 2022 04:03:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59616 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235810AbiEIHpB (ORCPT ); Mon, 9 May 2022 03:45:01 -0400 Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC295167C4; Mon, 9 May 2022 00:41:06 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R611e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04357;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfxsdD_1652082060; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfxsdD_1652082060) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:41:01 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 20/22] erofs: implement fscache-based data readahead Date: Mon, 9 May 2022 15:40:26 +0800 Message-Id: <20220509074028.74954-21-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Implement fscache-based data readahead. Also registers an individual bdi for each erofs instance to enable readahead. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang --- fs/erofs/fscache.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++ fs/erofs/super.c | 4 +++ 2 files changed, 94 insertions(+) diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 5b779812a5ee..a402d8f0a063 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -162,12 +162,102 @@ static int erofs_fscache_readpage(struct file *file, struct page *page) return ret; } +static void erofs_fscache_unlock_folios(struct readahead_control *rac, + size_t len) +{ + while (len) { + struct folio *folio = readahead_folio(rac); + + len -= folio_size(folio); + folio_mark_uptodate(folio); + folio_unlock(folio); + } +} + +static void erofs_fscache_readahead(struct readahead_control *rac) +{ + struct inode *inode = rac->mapping->host; + struct super_block *sb = inode->i_sb; + size_t len, count, done = 0; + erofs_off_t pos; + loff_t start, offset; + int ret; + + if (!readahead_count(rac)) + return; + + start = readahead_pos(rac); + len = readahead_length(rac); + + do { + struct erofs_map_blocks map; + struct erofs_map_dev mdev; + + pos = start + done; + map.m_la = pos; + + ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW); + if (ret) + return; + + offset = start + done; + count = min_t(size_t, map.m_llen - (pos - map.m_la), + len - done); + + if (!(map.m_flags & EROFS_MAP_MAPPED)) { + struct iov_iter iter; + + iov_iter_xarray(&iter, READ, &rac->mapping->i_pages, + offset, count); + iov_iter_zero(count, &iter); + + erofs_fscache_unlock_folios(rac, count); + ret = count; + continue; + } + + if (map.m_flags & EROFS_MAP_META) { + struct folio *folio = readahead_folio(rac); + + ret = erofs_fscache_readpage_inline(folio, &map); + if (!ret) { + folio_mark_uptodate(folio); + ret = folio_size(folio); + } + + folio_unlock(folio); + continue; + } + + mdev = (struct erofs_map_dev) { + .m_deviceid = map.m_deviceid, + .m_pa = map.m_pa, + }; + ret = erofs_map_dev(sb, &mdev); + if (ret) + return; + + ret = erofs_fscache_read_folios(mdev.m_fscache->cookie, + rac->mapping, offset, count, + mdev.m_pa + (pos - map.m_la)); + /* + * For the error cases, the folios will be unlocked when + * .readahead() returns. + */ + if (!ret) { + erofs_fscache_unlock_folios(rac, count); + ret = count; + } + } while (ret > 0 && ((done += ret) < len)); +} + static const struct address_space_operations erofs_fscache_meta_aops = { .readpage = erofs_fscache_meta_readpage, }; const struct address_space_operations erofs_fscache_access_aops = { .readpage = erofs_fscache_readpage, + .readahead = erofs_fscache_readahead, }; int erofs_fscache_register_cookie(struct super_block *sb, diff --git a/fs/erofs/super.c b/fs/erofs/super.c index c6755bcae4a6..f68ba929100d 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -619,6 +619,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) sbi->opt.fsid, true); if (err) return err; + + err = super_setup_bdi(sb); + if (err) + return err; } else { if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { erofs_err(sb, "failed to set erofs blksize"); From patchwork Mon May 9 07:40:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843126 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90369C43217 for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238072AbiEIID3 (ORCPT ); Mon, 9 May 2022 04:03:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59782 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235875AbiEIHpE (ORCPT ); Mon, 9 May 2022 03:45:04 -0400 Received: from out199-2.us.a.mail.aliyun.com (out199-2.us.a.mail.aliyun.com [47.90.199.2]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C6371B7AD; Mon, 9 May 2022 00:41:08 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R181e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCggcGT_1652082062; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCggcGT_1652082062) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:41:03 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 21/22] erofs: add 'fsid' mount option Date: Mon, 9 May 2022 15:40:27 +0800 Message-Id: <20220509074028.74954-22-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Introduce 'fsid' mount option to enable on-demand read sementics, in which case, erofs will be mounted from data blobs. Users could specify the name of primary data blob by this mount option. Signed-off-by: Jeffle Xu Reviewed-by: Gao Xiang Tested-by: Zichen Tian Tested-by: Jia Zhu --- fs/erofs/super.c | 31 ++++++++++++++++++++++++++++++- fs/erofs/sysfs.c | 4 ++-- 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/fs/erofs/super.c b/fs/erofs/super.c index f68ba929100d..4a623630e1c4 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -371,6 +371,8 @@ static int erofs_read_superblock(struct super_block *sb) if (erofs_sb_has_ztailpacking(sbi)) erofs_info(sb, "EXPERIMENTAL compressed inline data feature in use. Use at your own risk!"); + if (erofs_is_fscache_mode(sb)) + erofs_info(sb, "EXPERIMENTAL fscache-based on-demand read feature in use. Use at your own risk!"); out: erofs_put_metabuf(&buf); return ret; @@ -399,6 +401,7 @@ enum { Opt_dax, Opt_dax_enum, Opt_device, + Opt_fsid, Opt_err }; @@ -423,6 +426,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = { fsparam_flag("dax", Opt_dax), fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums), fsparam_string("device", Opt_device), + fsparam_string("fsid", Opt_fsid), {} }; @@ -518,6 +522,16 @@ static int erofs_fc_parse_param(struct fs_context *fc, } ++ctx->devs->extra_devices; break; + case Opt_fsid: +#ifdef CONFIG_EROFS_FS_ONDEMAND + kfree(ctx->opt.fsid); + ctx->opt.fsid = kstrdup(param->string, GFP_KERNEL); + if (!ctx->opt.fsid) + return -ENOMEM; +#else + errorfc(fc, "fsid option not supported"); +#endif + break; default: return -ENOPARAM; } @@ -604,6 +618,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) sb->s_fs_info = sbi; sbi->opt = ctx->opt; + ctx->opt.fsid = NULL; sbi->devs = ctx->devs; ctx->devs = NULL; @@ -690,6 +705,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) static int erofs_fc_get_tree(struct fs_context *fc) { + struct erofs_fs_context *ctx = fc->fs_private; + + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid) + return get_tree_nodev(fc, erofs_fc_fill_super); + return get_tree_bdev(fc, erofs_fc_fill_super); } @@ -739,6 +759,7 @@ static void erofs_fc_free(struct fs_context *fc) struct erofs_fs_context *ctx = fc->fs_private; erofs_free_dev_context(ctx->devs); + kfree(ctx->opt.fsid); kfree(ctx); } @@ -779,7 +800,10 @@ static void erofs_kill_sb(struct super_block *sb) WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC); - kill_block_super(sb); + if (erofs_is_fscache_mode(sb)) + generic_shutdown_super(sb); + else + kill_block_super(sb); sbi = EROFS_SB(sb); if (!sbi) @@ -789,6 +813,7 @@ static void erofs_kill_sb(struct super_block *sb) fs_put_dax(sbi->dax_dev); erofs_fscache_unregister_cookie(&sbi->s_fscache); erofs_fscache_unregister_fs(sb); + kfree(sbi->opt.fsid); kfree(sbi); sb->s_fs_info = NULL; } @@ -938,6 +963,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root) seq_puts(seq, ",dax=always"); if (test_opt(opt, DAX_NEVER)) seq_puts(seq, ",dax=never"); +#ifdef CONFIG_EROFS_FS_ONDEMAND + if (opt->fsid) + seq_printf(seq, ",fsid=%s", opt->fsid); +#endif return 0; } diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c index f3babf1e6608..c1383e508bbe 100644 --- a/fs/erofs/sysfs.c +++ b/fs/erofs/sysfs.c @@ -205,8 +205,8 @@ int erofs_register_sysfs(struct super_block *sb) sbi->s_kobj.kset = &erofs_root; init_completion(&sbi->s_kobj_unregister); - err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, - "%s", sb->s_id); + err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s", + erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id); if (err) goto put_sb_kobj; return 0; From patchwork Mon May 9 07:40:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12843128 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C3ADC4332F for ; Mon, 9 May 2022 08:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238058AbiEIIDY (ORCPT ); Mon, 9 May 2022 04:03:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59788 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235861AbiEIHpE (ORCPT ); Mon, 9 May 2022 03:45:04 -0400 Received: from out30-45.freemail.mail.aliyun.com (out30-45.freemail.mail.aliyun.com [115.124.30.45]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 415A123179; Mon, 9 May 2022 00:41:09 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R211e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04357;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCgwzGy_1652082063; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCgwzGy_1652082063) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:41:04 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 22/22] erofs: change to use asynchronous io for fscache readpage/readahead Date: Mon, 9 May 2022 15:40:28 +0800 Message-Id: <20220509074028.74954-23-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Xin Yin Use asynchronous io to read data from fscache may greatly improve IO bandwidth for sequential buffer read scenario. Change erofs_fscache_read_folios to erofs_fscache_read_folios_async, and read data from fscache asynchronously. Make .readpage()/.readahead() to use this new helper. Signed-off-by: Xin Yin Reviewed-by: Jeffle Xu Signed-off-by: Jeffle Xu --- fs/erofs/fscache.c | 239 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 199 insertions(+), 40 deletions(-) diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index a402d8f0a063..f23fde003c6d 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -5,57 +5,203 @@ #include #include "internal.h" +static struct netfs_io_request *erofs_fscache_alloc_request(struct address_space *mapping, + loff_t start, size_t len) +{ + struct netfs_io_request *rreq; + + rreq = kzalloc(sizeof(struct netfs_io_request), GFP_KERNEL); + if (!rreq) + return ERR_PTR(-ENOMEM); + + rreq->start = start; + rreq->len = len; + rreq->mapping = mapping; + INIT_LIST_HEAD(&rreq->subrequests); + refcount_set(&rreq->ref, 1); + + return rreq; +} + +static void erofs_fscache_put_request(struct netfs_io_request *rreq) +{ + if (refcount_dec_and_test(&rreq->ref)) { + if (rreq->cache_resources.ops) + rreq->cache_resources.ops->end_operation(&rreq->cache_resources); + kfree(rreq); + } +} + +static void erofs_fscache_put_subrequest(struct netfs_io_subrequest *subreq) +{ + if (refcount_dec_and_test(&subreq->ref)) { + erofs_fscache_put_request(subreq->rreq); + kfree(subreq); + } +} + +static void erofs_fscache_clear_subrequests(struct netfs_io_request *rreq) +{ + struct netfs_io_subrequest *subreq; + + while (!list_empty(&rreq->subrequests)) { + subreq = list_first_entry(&rreq->subrequests, + struct netfs_io_subrequest, rreq_link); + list_del(&subreq->rreq_link); + erofs_fscache_put_subrequest(subreq); + } +} + +static void erofs_fscache_rreq_unlock_folios(struct netfs_io_request *rreq) +{ + struct netfs_io_subrequest *subreq; + struct folio *folio; + unsigned int iopos; + pgoff_t start_page = rreq->start / PAGE_SIZE; + pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1; + bool subreq_failed = false; + + XA_STATE(xas, &rreq->mapping->i_pages, start_page); + + subreq = list_first_entry(&rreq->subrequests, + struct netfs_io_subrequest, rreq_link); + iopos = 0; + subreq_failed = (subreq->error < 0); + + rcu_read_lock(); + xas_for_each(&xas, folio, last_page) { + unsigned int pgpos = (folio_index(folio) - start_page) * PAGE_SIZE; + unsigned int pgend = pgpos + folio_size(folio); + bool pg_failed = false; + + for (;;) { + if (!subreq) { + pg_failed = true; + break; + } + + pg_failed |= subreq_failed; + if (pgend < iopos + subreq->len) + break; + + iopos += subreq->len; + if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + subreq = list_next_entry(subreq, rreq_link); + subreq_failed = (subreq->error < 0); + } else { + subreq = NULL; + subreq_failed = false; + } + if (pgend == iopos) + break; + } + + if (!pg_failed) + folio_mark_uptodate(folio); + + folio_unlock(folio); + } + rcu_read_unlock(); +} + +static void erofs_fscache_rreq_complete(struct netfs_io_request *rreq) +{ + erofs_fscache_rreq_unlock_folios(rreq); + erofs_fscache_clear_subrequests(rreq); + erofs_fscache_put_request(rreq); +} + +static void erofc_fscache_subreq_complete(void *priv, + ssize_t transferred_or_error, bool was_async) +{ + struct netfs_io_subrequest *subreq = priv; + struct netfs_io_request *rreq = subreq->rreq; + + if (IS_ERR_VALUE(transferred_or_error)) + subreq->error = transferred_or_error; + + if (atomic_dec_and_test(&rreq->nr_outstanding)) + erofs_fscache_rreq_complete(rreq); + + erofs_fscache_put_subrequest(subreq); +} + /* * Read data from fscache and fill the read data into page cache described by - * @start/len, which shall be both aligned with PAGE_SIZE. @pstart describes + * @rreq, which shall be both aligned with PAGE_SIZE. @pstart describes * the start physical address in the cache file. */ -static int erofs_fscache_read_folios(struct fscache_cookie *cookie, - struct address_space *mapping, - loff_t start, size_t len, +static int erofs_fscache_read_folios_async(struct fscache_cookie *cookie, + struct netfs_io_request *rreq, loff_t pstart) { enum netfs_io_source source; - struct netfs_io_request rreq = {}; - struct netfs_io_subrequest subreq = { .rreq = &rreq, }; - struct netfs_cache_resources *cres = &rreq.cache_resources; - struct super_block *sb = mapping->host->i_sb; + struct super_block *sb = rreq->mapping->host->i_sb; + struct netfs_io_subrequest *subreq; + struct netfs_cache_resources *cres = &rreq->cache_resources; struct iov_iter iter; + loff_t start = rreq->start; + size_t len = rreq->len; size_t done = 0; int ret; + atomic_set(&rreq->nr_outstanding, 1); + ret = fscache_begin_read_operation(cres, cookie); if (ret) - return ret; + goto out; while (done < len) { - subreq.start = pstart + done; - subreq.len = len - done; - subreq.flags = 1 << NETFS_SREQ_ONDEMAND; + subreq = kzalloc(sizeof(struct netfs_io_subrequest), GFP_KERNEL); + if (subreq) { + INIT_LIST_HEAD(&subreq->rreq_link); + refcount_set(&subreq->ref, 2); + subreq->rreq = rreq; + refcount_inc(&rreq->ref); + } else { + ret = -ENOMEM; + goto out; + } + + subreq->start = pstart + done; + subreq->len = len - done; + subreq->flags = 1 << NETFS_SREQ_ONDEMAND; - source = cres->ops->prepare_read(&subreq, LLONG_MAX); - if (WARN_ON(subreq.len == 0)) + list_add_tail(&subreq->rreq_link, &rreq->subrequests); + + source = cres->ops->prepare_read(subreq, LLONG_MAX); + if (WARN_ON(subreq->len == 0)) source = NETFS_INVALID_READ; if (source != NETFS_READ_FROM_CACHE) { erofs_err(sb, "failed to fscache prepare_read (source %d)", source); ret = -EIO; + subreq->error = ret; + erofs_fscache_put_subrequest(subreq); goto out; } - iov_iter_xarray(&iter, READ, &mapping->i_pages, - start + done, subreq.len); - ret = fscache_read(cres, subreq.start, &iter, - NETFS_READ_HOLE_FAIL, NULL, NULL); + atomic_inc(&rreq->nr_outstanding); + + iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, + start + done, subreq->len); + + ret = fscache_read(cres, subreq->start, &iter, + NETFS_READ_HOLE_FAIL, + erofc_fscache_subreq_complete, subreq); + if (ret == -EIOCBQUEUED) + ret = 0; if (ret) { erofs_err(sb, "failed to fscache_read (ret %d)", ret); goto out; } - done += subreq.len; + done += subreq->len; } out: - fscache_end_operation(cres); + if (atomic_dec_and_test(&rreq->nr_outstanding)) + erofs_fscache_rreq_complete(rreq); + return ret; } @@ -64,6 +210,7 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page) int ret; struct folio *folio = page_folio(page); struct super_block *sb = folio_mapping(folio)->host->i_sb; + struct netfs_io_request *rreq; struct erofs_map_dev mdev = { .m_deviceid = 0, .m_pa = folio_pos(folio), @@ -73,11 +220,13 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page) if (ret) goto out; - ret = erofs_fscache_read_folios(mdev.m_fscache->cookie, - folio_mapping(folio), folio_pos(folio), - folio_size(folio), mdev.m_pa); - if (!ret) - folio_mark_uptodate(folio); + rreq = erofs_fscache_alloc_request(folio_mapping(folio), + folio_pos(folio), folio_size(folio)); + if (IS_ERR(rreq)) + goto out; + + return erofs_fscache_read_folios_async(mdev.m_fscache->cookie, + rreq, mdev.m_pa); out: folio_unlock(folio); return ret; @@ -117,6 +266,7 @@ static int erofs_fscache_readpage(struct file *file, struct page *page) struct super_block *sb = inode->i_sb; struct erofs_map_blocks map; struct erofs_map_dev mdev; + struct netfs_io_request *rreq; erofs_off_t pos; loff_t pstart; int ret; @@ -149,10 +299,15 @@ static int erofs_fscache_readpage(struct file *file, struct page *page) if (ret) goto out_unlock; + + rreq = erofs_fscache_alloc_request(folio_mapping(folio), + folio_pos(folio), folio_size(folio)); + if (IS_ERR(rreq)) + goto out_unlock; + pstart = mdev.m_pa + (pos - map.m_la); - ret = erofs_fscache_read_folios(mdev.m_fscache->cookie, - folio_mapping(folio), folio_pos(folio), - folio_size(folio), pstart); + return erofs_fscache_read_folios_async(mdev.m_fscache->cookie, + rreq, pstart); out_uptodate: if (!ret) @@ -162,15 +317,16 @@ static int erofs_fscache_readpage(struct file *file, struct page *page) return ret; } -static void erofs_fscache_unlock_folios(struct readahead_control *rac, - size_t len) +static void erofs_fscache_advance_folios(struct readahead_control *rac, + size_t len, bool unlock) { while (len) { struct folio *folio = readahead_folio(rac); - len -= folio_size(folio); - folio_mark_uptodate(folio); - folio_unlock(folio); + if (unlock) { + folio_mark_uptodate(folio); + folio_unlock(folio); + } } } @@ -192,6 +348,7 @@ static void erofs_fscache_readahead(struct readahead_control *rac) do { struct erofs_map_blocks map; struct erofs_map_dev mdev; + struct netfs_io_request *rreq; pos = start + done; map.m_la = pos; @@ -211,7 +368,7 @@ static void erofs_fscache_readahead(struct readahead_control *rac) offset, count); iov_iter_zero(count, &iter); - erofs_fscache_unlock_folios(rac, count); + erofs_fscache_advance_folios(rac, count, true); ret = count; continue; } @@ -237,15 +394,17 @@ static void erofs_fscache_readahead(struct readahead_control *rac) if (ret) return; - ret = erofs_fscache_read_folios(mdev.m_fscache->cookie, - rac->mapping, offset, count, - mdev.m_pa + (pos - map.m_la)); + rreq = erofs_fscache_alloc_request(rac->mapping, offset, count); + if (IS_ERR(rreq)) + return; /* - * For the error cases, the folios will be unlocked when - * .readahead() returns. + * Drop the ref of folios here. Unlock them in + * rreq_unlock_folios() when rreq complete. */ + erofs_fscache_advance_folios(rac, count, false); + ret = erofs_fscache_read_folios_async(mdev.m_fscache->cookie, + rreq, mdev.m_pa + (pos - map.m_la)); if (!ret) { - erofs_fscache_unlock_folios(rac, count); ret = count; } } while (ret > 0 && ((done += ret) < len));