Message ID: 20220406075612.60298-8-jefflexu@linux.alibaba.com
State: New, archived
Series: fscache,erofs: fscache-based on-demand read semantics
Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +  (*) On-demand Read.
> +

Unnecessary extra blank line.

Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> +
> +
> +On-demand Read
> +==============
> +
> +When working in original mode, cachefiles mainly serves as a local cache for
> +remote networking fs, while in on-demand read mode, cachefiles can boost the
> +scenario where on-demand read semantics is needed, e.g. container image
> +distribution.
> +
> +The essential difference between these two modes is that, in original mode,
> +when cache miss, netfs itself will fetch data from remote, and then write the
> +fetched data into cache file. While in on-demand read mode, a user daemon is
> +responsible for fetching data and then writing to the cache file.
> +
> +``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.

You're missing a few articles there.  How about:

	"""
	When working in its original mode, cachefiles mainly serves as a local
	cache for a remote networking fs - while in on-demand read mode,
	cachefiles can boost the scenario where on-demand read semantics are
	needed, e.g. container image distribution.

	The essential difference between these two modes is that, in original
	mode, when a cache miss occurs, the netfs will fetch the data from the
	remote server and then write it to the cache file.  With on-demand
	read mode, however, fetching the data and writing it into the cache is
	delegated to a user daemon.

	``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand
	read mode.
	"""

"shall be enabled" -> "should be enabled".  Also, two spaces after a full stop
please (but not after the dot in a contraction, e.g. "e.g.").

> +The on-demand read mode relies on a simple protocol used for communication
> +between kernel and user daemon. The model is like::

"The protocol can be modelled as"?

> +The cachefiles kernel module will send requests to user daemon when needed.
> +User daemon needs to poll on the devnode ('/dev/cachefiles') to check if
> +there's pending request to be processed. A POLLIN event will be returned
> +when there's pending request.
> +
> +Then user daemon needs to read the devnode to fetch one request and process it
> +accordingly. It is worth nothing that each read only gets one request. When
> +finished processing the request, user daemon needs to write the reply to the
> +devnode.

"to user daemon" -> "to the user daemon".  "User daemon needs to poll" -> "The
user daemon needs to poll".  "there's pending request" -> "there's a pending
request" (twice).  "Then user daemon needs to read" -> "The user daemon [then]
reads".  "one request" -> "a request".  "worth nothing" -> "worth noting".
"user daemon needs to write" -> "the user daemon needs to write".

> +Each request is started with a message header like::

"is started with" -> "starts with".  "like" -> "of the form".

> +  * ``id`` is a unique ID identifying this request among all pending
> +    requests.

What's the scope of the uniqueness of "id"?  Is it just unique to a particular
cachefiles cache?

> +  * ``len`` identifies the whole length of this request, including the
> +    header and following type specific payload.

"type specific" -> "type-specific".

> +An optional parameter is added to "bind" command::

"to the "bind" command".

> +When "bind" command takes without argument, it defaults to the original mode.
> +When "bind" command takes with "ondemand" argument, i.e. "bind ondemand",
> +on-demand read mode will be enabled.

"When the "bind" command is given without argument, ...".  "When the "bind"
command is given with the "ondemand" argument, i.e. "bind ondemand", ...".

> +OPEN Request
> +------------
> +
> +When netfs opens a cache file for the first time, a request with
> +CACHEFILES_OP_OPEN opcode, a.k.a OPEN request will be sent to user daemon. The
> +payload format is like::

"When the netfs opens ...".  "a request with the CACHEFILES_OP_OPEN opcode,
a.k.a. an OPEN request, will be sent to the user daemon".  "format is like" ->
"of the form".

> +
> +	struct cachefiles_open {
> +		__u32 volume_key_size;
> +		__u32 cookie_key_size;
> +		__u32 fd;
> +		__u32 flags;
> +		__u8  data[];
> +	};
> +

"where:"

> +  * ``data`` contains volume_key and cookie_key in sequence.

Might be better to say "contains the volume_key followed directly by the
cookie_key.  The volume key is a NUL-terminated string; cookie_key is binary
data.".

> +  * ``volume_key_size`` identifies the size of volume key of the cache
> +    file, in bytes. volume_key is of string format, with a suffix '\0'.

"identifies" -> "indicates"/"supplies".  "the size of the volume key".

> +  * ``cookie_key_size`` identifies the size of cookie key of the cache
> +    file, in bytes. cookie_key is of binary format, which is netfs
> +    specific.

"... indicates the size of the cookie key in bytes."

> +  * ``fd`` identifies the anonymous fd of the cache file, with which user
> +    daemon can perform write/llseek file operations on the cache file.

"the anonymous fd" -> "an anonymous fd".  "of the cache file" -> "referring to
the cache file".  "with which user daemon" -> "through which the user daemon".

> +OPEN request contains (volume_key, cookie_key, anon_fd) triple for corresponding
> +cache file. With this triple, user daemon could fetch and write data into the
> +cache file in the background, even when kernel has not triggered the cache miss
> +yet. User daemon is able to distinguish the requested cache file with the given
> +(volume_key, cookie_key), and write the fetched data into cache file with the
> +given anon_fd.

"The OPEN request contains a (volume_key, cookie_key, anon_fd) triplet for the
corresponding cache file".  I would probably also use {...} rather than (...).
"With this triplet, the user daemon can fetch and write data into the cache
file in the background, even when the kernel has not triggered a cache miss
yet".  "The user daemon is able to distinguish the requested cache file with
the given (volume_key, cookie_key), and write the fetched data into the cache
file using the given anon_fd".

> +After recording the (volume_key, cookie_key, anon_fd) triple, user daemon shall
> +reply with "copen" (complete open) command::

"triple" -> "triplet".  "the user daemon".  "shall" -> "should".  "reply with"
-> "complete the request by issuing a" "copen" (complete open) command.

> +	copen <id>,<cache_size>
> +
> +  * ``id`` is exactly the id field of the previous OPEN request.
> +
> +  * When >= 0, ``cache_size`` identifies the size of the cache file;
> +    when < 0, ``cache_size`` identifies the error code ecountered by the
> +    user daemon.

"identifies" -> "indicates".  "ecountered" -> "encountered".

> +CLOSE Request
> +-------------
> +When cookie withdrawed, a request with CACHEFILES_OP_CLOSE opcode, a.k.a CLOSE
> +request, will be sent to user daemon. It will notify user daemon to close the
> +attached anon_fd. The payload format is like::

"When a cookie is withdrawn" ("withdrawed" -> "withdrawn").  Maybe phrase it
as "... a CLOSE request (opcode CACHEFILES_OP_CLOSE) will be sent to the user
daemon".  "It will notify the user daemon to close the attached anon_fd".
"like" -> "of the form".

> +
> +	struct cachefiles_close {
> +		__u32 fd;
> +	};
> +

"where:"

> +  * ``fd`` identifies the anon_fd to be closed, which is exactly the same
> +    with that in OPEN request.

"... which should be the same as that provided to the OPEN request".

Is it possible for userspace to move the fd around with dup() or whatever?

> +READ Request
> +------------
> +
> +When on-demand read mode is turned on, and cache miss encountered, kernel will
> +send a request with CACHEFILES_OP_READ opcode, a.k.a READ request, to user
> +daemon. It will notify user daemon to fetch data in the requested file range.
> +The payload format is like::

"and a cache miss is encountered, the kernel will send a READ request (opcode
CACHEFILES_OP_READ) to the user daemon".  "It will notify" -> "This will
ask/tell the user daemon to fetch data in the requested file range".  "format
is like" -> "is of the form".

> +  * ``off`` identifies the starting offset of the requested file range.

"identifies" -> "indicates".

> +  * ``len`` identifies the length of the requested file range.

"identifies" -> "indicates" (you could alternatively say "specifies").

> +  * ``fd`` identifies the anonymous fd of the requested cache file. It is
> +    guaranteed that it shall be the same with the fd field in the previous
> +    OPEN request.

"same with" -> "same as".

Since the kernel cannot make such a guarantee, I think you may need to restate
this as something like "Userspace must present the same fd as was given in the
previous OPEN request".

> +When receiving one READ request, user daemon needs to fetch data of the
> +requested file range, and then write the fetched data into cache file with the
> +given anonymous fd.

"one" -> "a".  "the user daemon needs to fetch the data of the requested file
range and write it into the cache using the given anonymous fd to indicate the
destination".

> +When finished processing the READ request, user daemon needs to reply with
> +CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::

"When finished" -> "To finish".  "the user daemon needs to reply with the
CACHEFILES_IOC_CREAD ioctl".

> +	ioctl(fd, CACHEFILES_IOC_CREAD, id);
> +
> +  * ``fd`` is exactly the fd field of the previous READ request.

Does that have to be true?  What if userspace moves it somewhere else?

> +  * ``id`` is exactly the id field of the previous READ request.

"is exactly the" -> "must match the".

David
Hi, thanks for such a thorough and detailed review and all these corrections.
I will fix them in the next version.

On 4/11/22 9:38 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>
>> +  (*) On-demand Read.
>> +
>
> Unnecessary extra blank line.
>
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>
> What's the scope of the uniqueness of "id"?  Is it just unique to a
> particular cachefiles cache?

Yes.  Currently each cache, I mean, each "struct cachefiles_cache", maintains
an xarray.  The id is unique in the scope of the cache.

>> +
>> +	struct cachefiles_close {
>> +		__u32 fd;
>> +	};
>> +
>
> "where:"
>
>> +  * ``fd`` identifies the anon_fd to be closed, which is exactly the same
>
> "... which should be the same as that provided to the OPEN request".
>
> Is it possible for userspace to move the fd around with dup() or whatever?

Currently no.  The anon_fd is stored in

```
struct cachefiles_object {
	int fd;
	...
}
```

When sending a READ/CLOSE request, the associated anon_fd is always fetched
from the @fd field of struct cachefiles_object.  dup() won't update the @fd
field of struct cachefiles_object.

Thus when dup() is done, let's say there are fd A (original) and fd B
(duplicated from fd A) associated with the cachefiles_object.  Then the @fd
field of the following READ/CLOSE requests is always fd A, since the @fd
field of struct cachefiles_object is not updated.  However the CREAD (reply
to READ request) ioctl can indeed be done on either fd A or fd B.  Then when
fd A is closed while fd B is still alive, the @fd field of the following
READ/CLOSE requests is still fd A, which is indeed buggy since fd A can be
reused then.

To fix this, I plan to replace the @fd field of READ/CLOSE requests with an
@object_id field.

```
struct cachefiles_close {
	__u32 object_id;
};

struct cachefiles_read {
	__u32 object_id;
	__u64 off;
	__u64 len;
};
```

Then each cachefiles_object has a unique object_id (in the scope of the
cachefiles_cache).  Each object_id can be mapped to multiple fds (1:N
mapping), while the kernel only sends an initial fd for this object_id
through the OPEN request.

```
struct cachefiles_open {
	__u32 object_id;
	__u32 fd;
	__u32 volume_key_size;
	__u32 cookie_key_size;
	__u32 flags;
	__u8 data[];
};
```

The user daemon can modify the mapping through dup(), but it's responsible
for maintaining and updating this mapping.  That is, the mapping between an
object_id and all its associated fds should be maintained in user space.

>> +
>> +	struct cachefiles_read {
>> +		__u64 off;
>> +		__u64 len;
>> +		__u32 fd;
>> +	};
>> +
>> +  * ``off`` identifies the starting offset of the requested file range.
>
> "identifies" -> "indicates"
>
>> +
>> +  * ``len`` identifies the length of the requested file range.
>> +
>
> "identifies" -> "indicates" (you could alternatively say "specifies")
>
>> +  * ``fd`` identifies the anonymous fd of the requested cache file. It is
>> +    guaranteed that it shall be the same with
>
> "same with" -> "same as"
>
> Since the kernel cannot make such a guarantee, I think you may need to
> restate this as something like "Userspace must present the same fd as was
> given in the previous OPEN request".

Yes, whether the @fd field of the READ request is the same as that of the
OPEN request or not is actually implementation dependent.  However, as
described above, I'm going to change the @fd field into an @object_id field.
After that refactoring, the @object_id field of a READ/CLOSE request should
be the same as the @object_id field of the OPEN request.

>> +CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::
>> +
>> +	ioctl(fd, CACHEFILES_IOC_CREAD, id);
>> +
>> +  * ``fd`` is exactly the fd field of the previous READ request.
>
> Does that have to be true?  What if userspace moves it somewhere else?

As described above, I'm going to change the @fd field into an @object_id
field.  Then there is an @object_id field in the READ request.  When replying
to the READ request, the user daemon itself needs to get the corresponding
anon_fd for the given @object_id through the self-maintained mapping.
diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst
index 8bf396b76359..386801135027 100644
--- a/Documentation/filesystems/caching/cachefiles.rst
+++ b/Documentation/filesystems/caching/cachefiles.rst
@@ -28,6 +28,8 @@ Cache on Already Mounted Filesystem
 
   (*) Debugging.
 
+  (*) On-demand Read.
+
 
 Overview
 ========
@@ -482,3 +484,166 @@ the control file.  For example::
 	echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
 
 will turn on all function entry debugging.
+
+
+On-demand Read
+==============
+
+When working in original mode, cachefiles mainly serves as a local cache for
+remote networking fs, while in on-demand read mode, cachefiles can boost the
+scenario where on-demand read semantics is needed, e.g. container image
+distribution.
+
+The essential difference between these two modes is that, in original mode,
+when cache miss, netfs itself will fetch data from remote, and then write the
+fetched data into cache file. While in on-demand read mode, a user daemon is
+responsible for fetching data and then writing to the cache file.
+
+``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.
+
+
+Protocol Communication
+----------------------
+
+The on-demand read mode relies on a simple protocol used for communication
+between kernel and user daemon. The model is like::
+
+	kernel --[request]--> user daemon --[reply]--> kernel
+
+The cachefiles kernel module will send requests to user daemon when needed.
+User daemon needs to poll on the devnode ('/dev/cachefiles') to check if
+there's pending request to be processed. A POLLIN event will be returned
+when there's pending request.
+
+Then user daemon needs to read the devnode to fetch one request and process it
+accordingly. It is worth nothing that each read only gets one request. When
+finished processing the request, user daemon needs to write the reply to the
+devnode.
+
+Each request is started with a message header like::
+
+	struct cachefiles_msg {
+		__u32 id;
+		__u32 opcode;
+		__u32 len;
+		__u8  data[];
+	};
+
+  * ``id`` is a unique ID identifying this request among all pending
+    requests.
+
+  * ``opcode`` identifies the type of this request.
+
+  * ``data`` identifies the payload of this request.
+
+  * ``len`` identifies the whole length of this request, including the
+    header and following type specific payload.
+
+
+Turn on On-demand Mode
+----------------------
+
+An optional parameter is added to "bind" command::
+
+	bind [ondemand]
+
+When "bind" command takes without argument, it defaults to the original mode.
+When "bind" command takes with "ondemand" argument, i.e. "bind ondemand",
+on-demand read mode will be enabled.
+
+
+OPEN Request
+------------
+
+When netfs opens a cache file for the first time, a request with
+CACHEFILES_OP_OPEN opcode, a.k.a OPEN request will be sent to user daemon. The
+payload format is like::
+
+	struct cachefiles_open {
+		__u32 volume_key_size;
+		__u32 cookie_key_size;
+		__u32 fd;
+		__u32 flags;
+		__u8  data[];
+	};
+
+  * ``data`` contains volume_key and cookie_key in sequence.
+
+  * ``volume_key_size`` identifies the size of volume key of the cache
+    file, in bytes. volume_key is of string format, with a suffix '\0'.
+
+  * ``cookie_key_size`` identifies the size of cookie key of the cache
+    file, in bytes. cookie_key is of binary format, which is netfs
+    specific.
+
+  * ``fd`` identifies the anonymous fd of the cache file, with which user
+    daemon can perform write/llseek file operations on the cache file.
+
+
+OPEN request contains (volume_key, cookie_key, anon_fd) triple for corresponding
+cache file. With this triple, user daemon could fetch and write data into the
+cache file in the background, even when kernel has not triggered the cache miss
+yet. User daemon is able to distinguish the requested cache file with the given
+(volume_key, cookie_key), and write the fetched data into cache file with the
+given anon_fd.
+
+After recording the (volume_key, cookie_key, anon_fd) triple, user daemon shall
+reply with "copen" (complete open) command::
+
+	copen <id>,<cache_size>
+
+  * ``id`` is exactly the id field of the previous OPEN request.
+
+  * When >= 0, ``cache_size`` identifies the size of the cache file;
+    when < 0, ``cache_size`` identifies the error code ecountered by the
+    user daemon.
+
+
+CLOSE Request
+-------------
+When cookie withdrawed, a request with CACHEFILES_OP_CLOSE opcode, a.k.a CLOSE
+request, will be sent to user daemon. It will notify user daemon to close the
+attached anon_fd. The payload format is like::
+
+	struct cachefiles_close {
+		__u32 fd;
+	};
+
+  * ``fd`` identifies the anon_fd to be closed, which is exactly the same
+    with that in OPEN request.
+
+
+READ Request
+------------
+
+When on-demand read mode is turned on, and cache miss encountered, kernel will
+send a request with CACHEFILES_OP_READ opcode, a.k.a READ request, to user
+daemon. It will notify user daemon to fetch data in the requested file range.
+The payload format is like::
+
+	struct cachefiles_read {
+		__u64 off;
+		__u64 len;
+		__u32 fd;
+	};
+
+  * ``off`` identifies the starting offset of the requested file range.
+
+  * ``len`` identifies the length of the requested file range.
+
+  * ``fd`` identifies the anonymous fd of the requested cache file. It is
+    guaranteed that it shall be the same with the fd field in the previous
+    OPEN request.
+
+When receiving one READ request, user daemon needs to fetch data of the
+requested file range, and then write the fetched data into cache file with the
+given anonymous fd.
+
+When finished processing the READ request, user daemon needs to reply with
+CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::
+
+	ioctl(fd, CACHEFILES_IOC_CREAD, id);
+
+  * ``fd`` is exactly the fd field of the previous READ request.
+
+  * ``id`` is exactly the id field of the previous READ request.
Document new user interface introduced by on-demand read mode.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 .../filesystems/caching/cachefiles.rst        | 165 ++++++++++++++++++
 1 file changed, 165 insertions(+)