Message ID: 20220406075612.60298-8-jefflexu@linux.alibaba.com
State: New, archived
Series: fscache,erofs: fscache-based on-demand read semantics
Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +  (*) On-demand Read.
> +

Unnecessary extra blank line.

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +
> +
> +On-demand Read
> +==============
> +
> +When working in original mode, cachefiles mainly serves as a local cache for
> +remote networking fs, while in on-demand read mode, cachefiles can boost the
> +scenario where on-demand read semantics is needed, e.g. container image
> +distribution.
> +
> +The essential difference between these two modes is that, in original mode,
> +when cache miss, netfs itself will fetch data from remote, and then write the
> +fetched data into cache file. While in on-demand read mode, a user daemon is
> +responsible for fetching data and then writing to the cache file.
> +
> +``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.

You're missing a few articles there.  How about:

	"""
	When working in its original mode, cachefiles mainly serves as a local
	cache for a remote networking fs - while in on-demand read mode,
	cachefiles can boost the scenario where on-demand read semantics are
	needed, e.g. container image distribution.

	The essential difference between these two modes is that, in original
	mode, when a cache miss occurs, the netfs will fetch the data from the
	remote server and then write it to the cache file.  With on-demand
	read mode, however, fetching the data and writing it into the cache is
	delegated to a user daemon.

	``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand
	read mode.
	"""

"shall be enabled" -> "should be enabled".  Also, two spaces after a full stop
please (but not after the dot in a contraction, e.g. "e.g.").

> +The on-demand read mode relies on a simple protocol used for communication
> +between kernel and user daemon. The model is like::

"The protocol can be modelled as"?

> +The cachefiles kernel module will send requests to user daemon when needed.
> +User daemon needs to poll on the devnode ('/dev/cachefiles') to check if
> +there's pending request to be processed. A POLLIN event will be returned
> +when there's pending request.
> +
> +Then user daemon needs to read the devnode to fetch one request and process it
> +accordingly. It is worth nothing that each read only gets one request. When
> +finished processing the request, user daemon needs to write the reply to the
> +devnode.

"to user daemon" -> "to the user daemon".  "User daemon needs to poll" -> "The
user daemon needs to poll".  "there's pending request" -> "there's a pending
request" (twice).  "Then user daemon needs to read" -> "The user daemon [then]
reads".  "one request" -> "a request".  "worth nothing" -> "worth noting".
"user daemon needs to write" -> "the user daemon needs to write".

> +Each request is started with a message header like::

"is started with" -> "starts with".  "like" -> "of the form".

> +  * ``id`` is a unique ID identifying this request among all pending
> +    requests.

What's the scope of the uniqueness of "id"?  Is it just unique to a particular
cachefiles cache?

> +  * ``len`` identifies the whole length of this request, including the
> +    header and following type specific payload.

"type specific" -> "type-specific".

> +An optional parameter is added to "bind" command::

"to the "bind" command".

> +When "bind" command takes without argument, it defaults to the original mode.
> +When "bind" command takes with "ondemand" argument, i.e. "bind ondemand",
> +on-demand read mode will be enabled.

"When the "bind" command is given without argument, ...".  "When the "bind"
command is given with the "ondemand" argument, i.e. "bind ondemand", ...".

> +OPEN Request
> +------------
> +
> +When netfs opens a cache file for the first time, a request with
> +CACHEFILES_OP_OPEN opcode, a.k.a OPEN request will be sent to user daemon. The
> +payload format is like::

"When the netfs opens ...".  "a request with the CACHEFILES_OP_OPEN opcode,
a.k.a. an OPEN request, will be sent to the user daemon".  "format is like" ->
"of the form".

> +
> +	struct cachefiles_open {
> +		__u32 volume_key_size;
> +		__u32 cookie_key_size;
> +		__u32 fd;
> +		__u32 flags;
> +		__u8  data[];
> +	};
> +

"where:"

> +  * ``data`` contains volume_key and cookie_key in sequence.

Might be better to say "contains the volume_key followed directly by the
cookie_key.  The volume key is a NUL-terminated string; cookie_key is binary
data.".

> +  * ``volume_key_size`` identifies the size of volume key of the cache
> +    file, in bytes. volume_key is of string format, with a suffix '\0'.

"identifies" -> "indicates"/"supplies".  "the size of the volume key".

> +  * ``cookie_key_size`` identifies the size of cookie key of the cache
> +    file, in bytes. cookie_key is of binary format, which is netfs
> +    specific.

"... indicates the size of the cookie key in bytes."

> +  * ``fd`` identifies the anonymous fd of the cache file, with which user
> +    daemon can perform write/llseek file operations on the cache file.

"the anonymous fd" -> "an anonymous fd".  "of the cache file" -> "referring to
the cache file".  "with which user daemon" -> "through which the user daemon".

> +OPEN request contains (volume_key, cookie_key, anon_fd) triple for corresponding
> +cache file. With this triple, user daemon could fetch and write data into the
> +cache file in the background, even when kernel has not triggered the cache miss
> +yet. User daemon is able to distinguish the requested cache file with the given
> +(volume_key, cookie_key), and write the fetched data into cache file with the
> +given anon_fd.

"The OPEN request contains a (volume_key, cookie_key, anon_fd) triplet for the
corresponding cache file".  I would probably also use {...} rather than (...).
"With this triplet, the user daemon can fetch and write data into the cache
file in the background, even when the kernel has not triggered a cache miss
yet".  "The user daemon is able to distinguish the requested cache file with
the given (volume_key, cookie_key), and write the fetched data into the cache
file using the given anon_fd".

> +After recording the (volume_key, cookie_key, anon_fd) triple, user daemon shall
> +reply with "copen" (complete open) command::

"triple" -> "triplet".  "the user daemon".  "shall" -> "should".  "reply with"
-> "complete the request by issuing a" "copen" (complete open) command.

> +	copen <id>,<cache_size>
> +
> +  * ``id`` is exactly the id field of the previous OPEN request.
> +
> +  * When >= 0, ``cache_size`` identifies the size of the cache file;
> +    when < 0, ``cache_size`` identifies the error code ecountered by the
> +    user daemon.

"identifies" -> "indicates".  "ecountered" -> "encountered".

> +CLOSE Request
> +-------------
> +When cookie withdrawed, a request with CACHEFILES_OP_CLOSE opcode, a.k.a CLOSE
> +request, will be sent to user daemon. It will notify user daemon to close the
> +attached anon_fd. The payload format is like::

"When a cookie is withdrawn" ("withdrawed" -> "withdrawn").  Maybe phrase it
as "... a CLOSE request (opcode CACHEFILES_OP_CLOSE) will be sent to the user
daemon".  "It will notify the user daemon to close the attached anon_fd".
"like" -> "of the form".

> +
> +	struct cachefiles_close {
> +		__u32 fd;
> +	};
> +

"where:"

> +  * ``fd`` identifies the anon_fd to be closed, which is exactly the same
> +    with that in OPEN request.

"... which should be the same as that provided to the OPEN request".

Is it possible for userspace to move the fd around with dup() or whatever?

> +READ Request
> +------------
> +
> +When on-demand read mode is turned on, and cache miss encountered, kernel will
> +send a request with CACHEFILES_OP_READ opcode, a.k.a READ request, to user
> +daemon. It will notify user daemon to fetch data in the requested file range.
> +The payload format is like::

"and a cache miss is encountered, the kernel will send a READ request (opcode
CACHEFILES_OP_READ) to the user daemon".  "It will notify" -> "This will
ask/tell the user daemon to fetch data in the requested file range".  "format
is like" -> "is of the form".

> +  * ``off`` identifies the starting offset of the requested file range.

"identifies" -> "indicates".

> +  * ``len`` identifies the length of the requested file range.

"identifies" -> "indicates" (you could alternatively say "specifies").

> +  * ``fd`` identifies the anonymous fd of the requested cache file. It is
> +    guaranteed that it shall be the same with the fd field in the previous
> +    OPEN request.

"same with" -> "same as".

Since the kernel cannot make such a guarantee, I think you may need to restate
this as something like "Userspace must present the same fd as was given in the
previous OPEN request".

> +When receiving one READ request, user daemon needs to fetch data of the
> +requested file range, and then write the fetched data into cache file with the
> +given anonymous fd.

"one" -> "a".  "the user daemon needs to fetch the data of the requested file
range and write it into the cache using the given anonymous fd to indicate the
destination".

> +When finished processing the READ request, user daemon needs to reply with
> +CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::

"When finished" -> "To finish".  "the user daemon needs to reply with the
CACHEFILES_IOC_CREAD ioctl".

> +	ioctl(fd, CACHEFILES_IOC_CREAD, id);
> +
> +  * ``fd`` is exactly the fd field of the previous READ request.

Does that have to be true?  What if userspace moves it somewhere else?

> +  * ``id`` is exactly the id field of the previous READ request.

"is exactly the" -> "must match the".

David
Hi, thanks for such a thorough and detailed review and all these corrections.
I will fix them in the next version.

On 4/11/22 9:38 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>
>> +  (*) On-demand Read.
>> +
>
> Unnecessary extra blank line.
>
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>
> What's the scope of the uniqueness of "id"?  Is it just unique to a
> particular cachefiles cache?

Yes.  Currently each cache, I mean, each "struct cachefiles_cache", maintains
an xarray.  The id is unique in the scope of the cache.

>> +
>> +	struct cachefiles_close {
>> +		__u32 fd;
>> +	};
>> +
>
> "where:"
>
>> +  * ``fd`` identifies the anon_fd to be closed, which is exactly the same
>
> "... which should be the same as that provided to the OPEN request".
>
> Is it possible for userspace to move the fd around with dup() or whatever?

Currently no.  The anon_fd is stored in

```
struct cachefiles_object {
	int fd;
	...
}
```

When sending a READ/CLOSE request, the associated anon_fd is always fetched
from the @fd field of struct cachefiles_object.  dup() won't update the @fd
field of struct cachefiles_object.

Thus when dup() is done, let's say there are fd A (original) and fd B
(duplicated from fd A) associated with the cachefiles_object.  Then the @fd
field of the following READ/CLOSE requests is always fd A, since the @fd
field of struct cachefiles_object is not updated.  However the CREAD (reply
to READ request) ioctl can indeed be done on either fd A or fd B.  Then when
fd A is closed while fd B is still alive, the @fd field of the following
READ/CLOSE requests is still fd A, which is indeed buggy since fd A can be
reused then.

To fix this, I plan to replace the @fd field of READ/CLOSE requests with an
@object_id field.

```
struct cachefiles_close {
	__u32 object_id;
};

struct cachefiles_read {
	__u32 object_id;
	__u64 off;
	__u64 len;
};
```

Then each cachefiles_object has a unique object_id (in the scope of the
cachefiles_cache).  Each object_id can be mapped to multiple fds (1:N
mapping), while the kernel only sends an initial fd for this object_id
through the OPEN request.

```
struct cachefiles_open {
	__u32 object_id;
	__u32 fd;
	__u32 volume_key_size;
	__u32 cookie_key_size;
	__u32 flags;
	__u8 data[];
};
```

The user daemon can modify the mapping through dup(), but it's responsible
for maintaining and updating this mapping.  That is, the mapping between an
object_id and all its associated fds should be maintained in user space.

>> +
>> +	struct cachefiles_read {
>> +		__u64 off;
>> +		__u64 len;
>> +		__u32 fd;
>> +	};
>> +
>> +  * ``off`` identifies the starting offset of the requested file range.
>
> "identifies" -> "indicates"
>
>> +
>> +  * ``len`` identifies the length of the requested file range.
>> +
>
> "identifies" -> "indicates" (you could alternatively say "specifies")
>
>> +  * ``fd`` identifies the anonymous fd of the requested cache file. It is
>> +    guaranteed that it shall be the same with
>
> "same with" -> "same as"
>
> Since the kernel cannot make such a guarantee, I think you may need to
> restate this as something like "Userspace must present the same fd as was
> given in the previous OPEN request".

Yes, whether the @fd field of the READ request is the same as that of the
OPEN request or not is actually implementation dependent.  However, as
described above, I'm going to change the @fd field into an @object_id field.
After that refactoring, the @object_id field of a READ/CLOSE request should
be the same as the @object_id field of the OPEN request.

>> +CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::
>> +
>> +	ioctl(fd, CACHEFILES_IOC_CREAD, id);
>> +
>> +  * ``fd`` is exactly the fd field of the previous READ request.
>
> Does that have to be true?  What if userspace moves it somewhere else?

As described above, I'm going to change the @fd field into an @object_id
field.  Then there is an @object_id field in the READ request.  When replying
to the READ request, the user daemon itself needs to get the corresponding
anon_fd for the given @object_id through the self-maintained mapping.
diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst
index 8bf396b76359..386801135027 100644
--- a/Documentation/filesystems/caching/cachefiles.rst
+++ b/Documentation/filesystems/caching/cachefiles.rst
@@ -28,6 +28,8 @@ Cache on Already Mounted Filesystem
 
   (*) Debugging.
 
+  (*) On-demand Read.
+
 
 Overview
 ========
@@ -482,3 +484,166 @@ the control file.  For example::
 	echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
 
 will turn on all function entry debugging.
+
+
+On-demand Read
+==============
+
+When working in original mode, cachefiles mainly serves as a local cache for
+remote networking fs, while in on-demand read mode, cachefiles can boost the
+scenario where on-demand read semantics is needed, e.g. container image
+distribution.
+
+The essential difference between these two modes is that, in original mode,
+when cache miss, netfs itself will fetch data from remote, and then write the
+fetched data into cache file. While in on-demand read mode, a user daemon is
+responsible for fetching data and then writing to the cache file.
+
+``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.
+
+
+Protocol Communication
+----------------------
+
+The on-demand read mode relies on a simple protocol used for communication
+between kernel and user daemon. The model is like::
+
+	kernel --[request]--> user daemon --[reply]--> kernel
+
+The cachefiles kernel module will send requests to user daemon when needed.
+User daemon needs to poll on the devnode ('/dev/cachefiles') to check if
+there's pending request to be processed. A POLLIN event will be returned
+when there's pending request.
+
+Then user daemon needs to read the devnode to fetch one request and process it
+accordingly. It is worth nothing that each read only gets one request. When
+finished processing the request, user daemon needs to write the reply to the
+devnode.
+
+Each request is started with a message header like::
+
+	struct cachefiles_msg {
+		__u32 id;
+		__u32 opcode;
+		__u32 len;
+		__u8  data[];
+	};
+
+  * ``id`` is a unique ID identifying this request among all pending
+    requests.
+
+  * ``opcode`` identifies the type of this request.
+
+  * ``data`` identifies the payload of this request.
+
+  * ``len`` identifies the whole length of this request, including the
+    header and following type specific payload.
+
+
+Turn on On-demand Mode
+----------------------
+
+An optional parameter is added to "bind" command::
+
+	bind [ondemand]
+
+When "bind" command takes without argument, it defaults to the original mode.
+When "bind" command takes with "ondemand" argument, i.e. "bind ondemand",
+on-demand read mode will be enabled.
+
+
+OPEN Request
+------------
+
+When netfs opens a cache file for the first time, a request with
+CACHEFILES_OP_OPEN opcode, a.k.a OPEN request will be sent to user daemon. The
+payload format is like::
+
+	struct cachefiles_open {
+		__u32 volume_key_size;
+		__u32 cookie_key_size;
+		__u32 fd;
+		__u32 flags;
+		__u8  data[];
+	};
+
+  * ``data`` contains volume_key and cookie_key in sequence.
+
+  * ``volume_key_size`` identifies the size of volume key of the cache
+    file, in bytes. volume_key is of string format, with a suffix '\0'.
+
+  * ``cookie_key_size`` identifies the size of cookie key of the cache
+    file, in bytes. cookie_key is of binary format, which is netfs
+    specific.
+
+  * ``fd`` identifies the anonymous fd of the cache file, with which user
+    daemon can perform write/llseek file operations on the cache file.
+
+
+OPEN request contains (volume_key, cookie_key, anon_fd) triple for corresponding
+cache file. With this triple, user daemon could fetch and write data into the
+cache file in the background, even when kernel has not triggered the cache miss
+yet. User daemon is able to distinguish the requested cache file with the given
+(volume_key, cookie_key), and write the fetched data into cache file with the
+given anon_fd.
+
+After recording the (volume_key, cookie_key, anon_fd) triple, user daemon shall
+reply with "copen" (complete open) command::
+
+	copen <id>,<cache_size>
+
+  * ``id`` is exactly the id field of the previous OPEN request.
+
+  * When >= 0, ``cache_size`` identifies the size of the cache file;
+    when < 0, ``cache_size`` identifies the error code ecountered by the
+    user daemon.
+
+
+CLOSE Request
+-------------
+When cookie withdrawed, a request with CACHEFILES_OP_CLOSE opcode, a.k.a CLOSE
+request, will be sent to user daemon. It will notify user daemon to close the
+attached anon_fd. The payload format is like::
+
+	struct cachefiles_close {
+		__u32 fd;
+	};
+
+  * ``fd`` identifies the anon_fd to be closed, which is exactly the same
+    with that in OPEN request.
+
+
+READ Request
+------------
+
+When on-demand read mode is turned on, and cache miss encountered, kernel will
+send a request with CACHEFILES_OP_READ opcode, a.k.a READ request, to user
+daemon. It will notify user daemon to fetch data in the requested file range.
+The payload format is like::
+
+	struct cachefiles_read {
+		__u64 off;
+		__u64 len;
+		__u32 fd;
+	};
+
+  * ``off`` identifies the starting offset of the requested file range.
+
+  * ``len`` identifies the length of the requested file range.
+
+  * ``fd`` identifies the anonymous fd of the requested cache file. It is
+    guaranteed that it shall be the same with the fd field in the previous
+    OPEN request.
+
+When receiving one READ request, user daemon needs to fetch data of the
+requested file range, and then write the fetched data into cache file with the
+given anonymous fd.
+
+When finished processing the READ request, user daemon needs to reply with
+CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::
+
+	ioctl(fd, CACHEFILES_IOC_CREAD, id);
+
+  * ``fd`` is exactly the fd field of the previous READ request.
+
+  * ``id`` is exactly the id field of the previous READ request.
Document new user interface introduced by on-demand read mode.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 .../filesystems/caching/cachefiles.rst        | 165 ++++++++++++++++++
 1 file changed, 165 insertions(+)