[v5,03/22] cachefiles: introduce on-demand read mode

Message ID 20220316131723.111553-4-jefflexu@linux.alibaba.com (mailing list archive)
State New, archived
Series fscache,erofs: fscache-based on-demand read semantics

Commit Message

Jingbo Xu March 16, 2022, 1:17 p.m. UTC
Fscache/cachefiles have traditionally served as a local cache for remote
filesystems. This patch, along with the following patches, introduces a
new on-demand read mode for cachefiles, which can benefit scenarios
where on-demand read semantics are needed, e.g. container image
distribution.

The essential difference between the original mode and on-demand read
mode is that, in the original mode, on a cache miss the netfs itself
fetches the data from the remote end and then writes the fetched data
into the cache file, while in on-demand read mode a user daemon is
responsible for fetching the data and then writing it to the cache file.

This patch only adds the command to enable on-demand read mode. An
optional parameter to the "bind" command is added. On-demand mode will
be turned on when this optional argument matches "ondemand" exactly,
i.e. "bind ondemand". Otherwise cachefiles will keep working in the
original mode.
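
For illustration only (this is not part of the patch): a user daemon
might enable the new mode through the usual /dev/cachefiles command
interface, as in the sketch below. The cache directory and tag are
made-up examples, and error handling is minimal.

```
/* Hypothetical userspace sketch: bind a cache in on-demand mode. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int cmd(int fd, const char *s)
{
	/* each command must be submitted as a single write() */
	return write(fd, s, strlen(s)) < 0 ? -1 : 0;
}

int main(void)
{
	int fd = open("/dev/cachefiles", O_RDWR);

	if (fd < 0) {
		perror("/dev/cachefiles");
		return 1;
	}
	if (cmd(fd, "dir /var/cache/fscache") ||	/* example path */
	    cmd(fd, "tag mycache") ||			/* example tag */
	    cmd(fd, "bind ondemand")) {			/* new optional argument */
		perror("bind");
		return 1;
	}
	pause();	/* closing the fd would unbind the cache */
	return 0;
}
```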

The following patches will implement the data plane of on-demand read
mode.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/cachefiles/Kconfig    |  11 ++++
 fs/cachefiles/daemon.c   | 132 +++++++++++++++++++++++++++++++--------
 fs/cachefiles/internal.h |   6 ++
 3 files changed, 124 insertions(+), 25 deletions(-)

Comments

David Howells March 21, 2022, 1:34 p.m. UTC | #1
Jeffle Xu <jefflexu@linux.alibaba.com> wrote:

> Fscache/cachefiles have traditionally served as a local cache for remote
> filesystems. This patch, along with the following patches, introduces a
> new on-demand read mode for cachefiles, which can benefit scenarios
> where on-demand read semantics are needed, e.g. container image
> distribution.
> 
> The essential difference between the original mode and on-demand read
> mode is that, in the original mode, on a cache miss the netfs itself
> fetches the data from the remote end and then writes the fetched data
> into the cache file, while in on-demand read mode a user daemon is
> responsible for fetching the data and then writing it to the cache file.
> 
> This patch only adds the command to enable on-demand read mode. An
> optional parameter to the "bind" command is added. On-demand mode will
> be turned on when this optional argument matches "ondemand" exactly,
> i.e. "bind ondemand". Otherwise cachefiles will keep working in the
> original mode.

You're not really adding a command, per se.  Also, I would recommend
starting the paragraph with a verb.  How about:

	Make it possible to enable on-demand read mode by adding an
	optional parameter to the "bind" command.  On-demand mode will be
	turned on when this parameter is "ondemand", i.e. "bind ondemand".
	Otherwise cachefiles will work in the original mode.

Also, I'd add a note something like the following:

	This is implemented as a variation on the bind command so that it
	can't be turned on accidentally in /etc/cachefilesd.conf when
	cachefilesd isn't expecting it.	

> The following patches will implement the data plane of on-demand read
> mode.

I would remove this line.  If ondemand mode is not fully implemented in
cachefiles at this point, I would be tempted to move this to the end of the
cachefiles subset of the patchset.  That said, I'm not sure it can be made
to do anything much before that point.

> +#ifdef CONFIG_CACHEFILES_ONDEMAND
> +static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache)
> +{
> +	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
> +	rwlock_init(&cache->reqs_lock);
> +}

Just merge that into the caller.

> +static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache)
> +{
> +	xa_destroy(&cache->reqs);
> +}

Ditto.
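
If merged as suggested, the setup and teardown could live directly in
the callers, still under the config guard. A sketch:

```
/* in cachefiles_daemon_open(), after the existing initialisation: */
#ifdef CONFIG_CACHEFILES_ONDEMAND
	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
	rwlock_init(&cache->reqs_lock);
#endif

/* in cachefiles_daemon_release(), before cachefiles_daemon_unbind(): */
#ifdef CONFIG_CACHEFILES_ONDEMAND
	xa_destroy(&cache->reqs);
#endif
```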

> +static inline
> +bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
> +{
> +	if (!strcmp(args, "ondemand")) {
> +		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
> +		return true;
> +	}
> +
> +	return false;
> +}
> ...
> +	if (!cachefiles_ondemand_daemon_bind(cache, args) && *args) {
> +		pr_err("'bind' command doesn't take an argument\n");
> +		return -EINVAL;
> +	}
> +

I would merge these together, I think, and say something like "Ondemand
mode not enabled in kernel" if CONFIG_CACHEFILES_ONDEMAND=n.
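
Merged together in cachefiles_daemon_bind(), that might look like the
following sketch (the error string is only the suggestion above):

```
	if (!strcmp(args, "ondemand")) {
#ifdef CONFIG_CACHEFILES_ONDEMAND
		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
#else
		pr_err("Ondemand mode not enabled in kernel\n");
		return -EINVAL;
#endif
	} else if (*args) {
		pr_err("'bind' command doesn't take an argument\n");
		return -EINVAL;
	}
```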

David
Matthew Wilcox March 21, 2022, 1:40 p.m. UTC | #2
On Wed, Mar 16, 2022 at 09:17:04PM +0800, Jeffle Xu wrote:
> +#ifdef CONFIG_CACHEFILES_ONDEMAND
> +	struct xarray			reqs;		/* xarray of pending on-demand requests */
> +	rwlock_t			reqs_lock;	/* Lock for reqs xarray */

Why do you have a separate rwlock when the xarray already has its own
spinlock?  This is usually a really bad idea.
Jingbo Xu March 21, 2022, 2:08 p.m. UTC | #3
On 3/21/22 9:40 PM, Matthew Wilcox wrote:
> On Wed, Mar 16, 2022 at 09:17:04PM +0800, Jeffle Xu wrote:
>> +#ifdef CONFIG_CACHEFILES_ONDEMAND
>> +	struct xarray			reqs;		/* xarray of pending on-demand requests */
>> +	rwlock_t			reqs_lock;	/* Lock for reqs xarray */
> 
> Why do you have a separate rwlock when the xarray already has its own
> spinlock?  This is usually a really bad idea.

Hi,

Thanks for reviewing.

reqs_lock is also used to protect the check of cache->flags. Please
refer to patch 4 [1] of this patchset.

```
+	/*
+	 * Enqueue the pending request.
+	 *
+	 * Stop enqueuing the request when daemon is dying. So we need to
+	 * 1) check cache state, and 2) enqueue request if cache is alive.
+	 *
+	 * The above two ops need to be atomic as a whole. @reqs_lock is used
+	 * here to ensure that. Otherwise, request may be enqueued after xarray
+	 * has been flushed, in which case the orphan request will never be
+	 * completed and thus netfs will hang there forever.
+	 */
+	read_lock(&cache->reqs_lock);
+
+	/* recheck dead state under lock */
+	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+		read_unlock(&cache->reqs_lock);
+		ret = -EIO;
+		goto out;
+	}
+
+	xa_lock(xa);
+	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
+	if (!ret)
+		__xa_set_mark(xa, id, CACHEFILES_REQ_NEW);
+	xa_unlock(xa);
+
+	read_unlock(&cache->reqs_lock);
```

It's mainly used to protect against the xarray flush.

Besides, IMHO a read-write lock should be more performance friendly,
since most accesses are on the read side.


[1] https://lkml.org/lkml/2022/3/16/351
David Howells March 21, 2022, 2:14 p.m. UTC | #4
Matthew Wilcox <willy@infradead.org> wrote:

> Why do you have a separate rwlock when the xarray already has its own
> spinlock?  This is usually a really bad idea.

Jeffle wants to hold a lock across the CACHEFILES_DEAD check and the xarray
access.

However, he tells xarray to do a GFP_KERNEL alloc whilst holding the rwlock:-/

David
Jingbo Xu March 21, 2022, 2:16 p.m. UTC | #5
Hi,

Thanks for reviewing.


On 3/21/22 9:34 PM, David Howells wrote:
> Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> 
>> Fscache/cachefiles have traditionally served as a local cache for remote
>> filesystems. This patch, along with the following patches, introduces a
>> new on-demand read mode for cachefiles, which can benefit scenarios
>> where on-demand read semantics are needed, e.g. container image
>> distribution.
>>
>> The essential difference between the original mode and on-demand read
>> mode is that, in the original mode, on a cache miss the netfs itself
>> fetches the data from the remote end and then writes the fetched data
>> into the cache file, while in on-demand read mode a user daemon is
>> responsible for fetching the data and then writing it to the cache file.
>>
>> This patch only adds the command to enable on-demand read mode. An
>> optional parameter to the "bind" command is added. On-demand mode will
>> be turned on when this optional argument matches "ondemand" exactly,
>> i.e. "bind ondemand". Otherwise cachefiles will keep working in the
>> original mode.
> 
> You're not really adding a command, per se.  Also, I would recommend
> starting the paragraph with a verb.  How about:
> 
> 	Make it possible to enable on-demand read mode by adding an
> 	optional parameter to the "bind" command.  On-demand mode will be
> 	turned on when this parameter is "ondemand", i.e. "bind ondemand".
> 	Otherwise cachefiles will work in the original mode.
> 
> Also, I'd add a note something like the following:
> 
> 	This is implemented as a variation on the bind command so that it
> 	can't be turned on accidentally in /etc/cachefilesd.conf when
> 	cachefilesd isn't expecting it.	

Alright, looks much better :)

> 
>> The following patches will implement the data plane of on-demand read
>> mode.
> 
> I would remove this line.  If ondemand mode is not fully implemented in
> cachefiles at this point, I would be tempted to move this to the end of the
> cachefiles subset of the patchset.  That said, I'm not sure it can be made
> to do anything much before that point.


Alright.

> 
>> +#ifdef CONFIG_CACHEFILES_ONDEMAND
>> +static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache)
>> +{
>> +	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
>> +	rwlock_init(&cache->reqs_lock);
>> +}
> 
> Just merge that into the caller.
> 
>> +static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache)
>> +{
>> +	xa_destroy(&cache->reqs);
>> +}
> 
> Ditto.
> 
>> +static inline
>> +bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
>> +{
>> +	if (!strcmp(args, "ondemand")) {
>> +		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
>> +		return true;
>> +	}
>> +
>> +	return false;
>> +}
>> ...
>> +	if (!cachefiles_ondemand_daemon_bind(cache, args) && *args) {
>> +		pr_err("'bind' command doesn't take an argument\n");
>> +		return -EINVAL;
>> +	}
>> +
> 
> I would merge these together, I think, and say something like "Ondemand
> mode not enabled in kernel" if CONFIG_CACHEFILES_ONDEMAND=n.
> 

The reason why I extracted all this logic into small helper functions is
that the **callers** can then call cachefiles_ondemand_daemon_bind()
directly, without any clause like:

```
#ifdef CONFIG_CACHEFILES_ONDEMAND
	...
#else
	...
```



Another choice would be something like:

```
if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND))
	...
else
	...
```
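
Fleshed out, the IS_ENABLED() variant could keep a single definition
that the compiler discards when the option is off, e.g.:

```
static bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache,
					    char *args)
{
	/* folds to "return false" when CONFIG_CACHEFILES_ONDEMAND=n */
	if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
	    !strcmp(args, "ondemand")) {
		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
		return true;
	}
	return false;
}
```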
Gao Xiang March 21, 2022, 2:20 p.m. UTC | #6
On Mon, Mar 21, 2022 at 02:14:03PM +0000, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > Why do you have a separate rwlock when the xarray already has its own
> > spinlock?  This is usually a really bad idea.
> 
> Jeffle wants to hold a lock across the CACHEFILES_DEAD check and the xarray
> access.
> 
> However, he tells xarray to do a GFP_KERNEL alloc whilst holding the rwlock:-/

Yeah, sorry, there are trivial sleep-in-atomic-context mistakes here
(sorry that I didn't catch them earlier..)

Thanks,
Gao Xiang

> 
> David
Matthew Wilcox March 21, 2022, 2:26 p.m. UTC | #7
On Mon, Mar 21, 2022 at 10:08:47PM +0800, JeffleXu wrote:
> reqs_lock is also used to protect the check of cache->flags. Please
> refer to patch 4 [1] of this patchset.

Yes, that's exactly what I meant by "bad idea".

> ```
> +	/*
> +	 * Enqueue the pending request.
> +	 *
> +	 * Stop enqueuing the request when daemon is dying. So we need to
> +	 * 1) check cache state, and 2) enqueue request if cache is alive.
> +	 *
> +	 * The above two ops need to be atomic as a whole. @reqs_lock is used
> +	 * here to ensure that. Otherwise, request may be enqueued after xarray
> +	 * has been flushed, in which case the orphan request will never be
> +	 * completed and thus netfs will hang there forever.
> +	 */
> +	read_lock(&cache->reqs_lock);
> +
> +	/* recheck dead state under lock */
> +	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
> +		read_unlock(&cache->reqs_lock);
> +		ret = -EIO;
> +		goto out;
> +	}

So this is an error path.  We're almost always going to take the xa_lock
immediately after taking the read_lock.  In other words, you've done two
atomic operations instead of one.

> +	xa_lock(xa);
> +	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
> +	if (!ret)
> +		__xa_set_mark(xa, id, CACHEFILES_REQ_NEW);
> +	xa_unlock(xa);
> +
> +	read_unlock(&cache->reqs_lock);
> ```
> 
> It's mainly used to protect against the xarray flush.
> 
> Besides, IMHO a read-write lock should be more performance friendly,
> since most accesses are on the read side.

That's almost never true.  rwlocks are usually a bad idea because you
still have to bounce the cacheline, so you replace lock contention
(which you can see) with cacheline contention (which is harder to
measure).  And then you have questions about reader/writer fairness
(should new readers queue behind a writer if there's one waiting, or
should a steady stream of readers be able to hold a writer off
indefinitely?)
Jingbo Xu March 21, 2022, 3:18 p.m. UTC | #8
On 3/21/22 10:26 PM, Matthew Wilcox wrote:
> On Mon, Mar 21, 2022 at 10:08:47PM +0800, JeffleXu wrote:
>> reqs_lock is also used to protect the check of cache->flags. Please
>> refer to patch 4 [1] of this patchset.
> 
> Yes, that's exactly what I meant by "bad idea".
> 
>> ```
>> +	/*
>> +	 * Enqueue the pending request.
>> +	 *
>> +	 * Stop enqueuing the request when daemon is dying. So we need to
>> +	 * 1) check cache state, and 2) enqueue request if cache is alive.
>> +	 *
>> +	 * The above two ops need to be atomic as a whole. @reqs_lock is used
>> +	 * here to ensure that. Otherwise, request may be enqueued after xarray
>> +	 * has been flushed, in which case the orphan request will never be
>> +	 * completed and thus netfs will hang there forever.
>> +	 */
>> +	read_lock(&cache->reqs_lock);
>> +
>> +	/* recheck dead state under lock */
>> +	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
>> +		read_unlock(&cache->reqs_lock);
>> +		ret = -EIO;
>> +		goto out;
>> +	}
> 
> So this is an error path.  We're almost always going to take the xa_lock
> immediately after taking the read_lock.  In other words, you've done two
> atomic operations instead of one.

Right.

> 
>> +	xa_lock(xa);
>> +	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
>> +	if (!ret)
>> +		__xa_set_mark(xa, id, CACHEFILES_REQ_NEW);
>> +	xa_unlock(xa);
>> +
>> +	read_unlock(&cache->reqs_lock);
>> ```
>>
>> It's mainly used to protect against the xarray flush.
>>
>> Besides, IMHO a read-write lock should be more performance friendly,
>> since most accesses are on the read side.
> 
> That's almost never true.  rwlocks are usually a bad idea because you
> still have to bounce the cacheline, so you replace lock contention
> (which you can see) with cacheline contention (which is harder to
> measure).  And then you have questions about reader/writer fairness
> (should new readers queue behind a writer if there's one waiting, or
> should a steady stream of readers be able to hold a writer off
> indefinitely?)

Interesting, I didn't notice it before. Thanks for explaining it.


BTW what I want is just

```
PROCESS 1		PROCESS 2
=========		=========
#lock			#lock
set DEAD state		if (not DEAD)
flush xarray		   enqueue into xarray
#unlock			#unlock
```

I think it is a generic paradigm. So it seems that the spinlock inside
xarray is already adequate for this job?
Matthew Wilcox March 21, 2022, 3:21 p.m. UTC | #9
On Mon, Mar 21, 2022 at 11:18:05PM +0800, JeffleXu wrote:
> >> Besides, IMHO a read-write lock should be more performance friendly,
> >> since most accesses are on the read side.
> > 
> > That's almost never true.  rwlocks are usually a bad idea because you
> > still have to bounce the cacheline, so you replace lock contention
> > (which you can see) with cacheline contention (which is harder to
> > measure).  And then you have questions about reader/writer fairness
> > (should new readers queue behind a writer if there's one waiting, or
> > should a steady stream of readers be able to hold a writer off
> > indefinitely?)
> 
> Interesting, I didn't notice it before. Thanks for explaining it.

No problem.  It's hard to notice.

> BTW what I want is just
> 
> ```
> PROCESS 1		PROCESS 2
> =========		=========
> #lock			#lock
> set DEAD state		if (not DEAD)
> flush xarray		   enqueue into xarray
> #unlock			#unlock
> ```
> 
> I think it is a generic paradigm. So it seems that the spinlock inside
> xarray is already adequate for this job?

Absolutely; just use xa_lock() to protect both setting & testing the
flag.
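
For the destroy side, which never allocates, that could be as simple as
the following sketch (the request fields are assumptions based on the
later patches in this series):

```
	struct cachefiles_req *req;
	unsigned long index;

	xa_lock(&cache->reqs);
	set_bit(CACHEFILES_DEAD, &cache->flags);
	xa_for_each(&cache->reqs, index, req) {
		req->error = -EIO;	/* assumed request layout */
		complete(&req->done);
		__xa_erase(&cache->reqs, index);
	}
	xa_unlock(&cache->reqs);
```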
David Howells March 21, 2022, 3:30 p.m. UTC | #10
Matthew Wilcox <willy@infradead.org> wrote:

> Absolutely; just use xa_lock() to protect both setting & testing the
> flag.

How should Jeffle deal with xarray dropping the lock internally in order to do
an allocation and then taking it again (actually in patch 5)?

David
Matthew Wilcox March 22, 2022, 5:04 p.m. UTC | #11
On Mon, Mar 21, 2022 at 03:30:52PM +0000, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > Absolutely; just use xa_lock() to protect both setting & testing the
> > flag.
> 
> How should Jeffle deal with xarray dropping the lock internally in order to do
> an allocation and then taking it again (actually in patch 5)?

There are a number of ways to handle this.  I'll outline two; others
are surely possible.

option 1:

add side:

xa_lock();
if (!DEAD) {
	xa_store(GFP_KERNEL);
	if (DEAD)
		xa_erase();
}
xa_unlock();

destroy side:

xa_lock();
set DEAD;
xa_for_each()
	xa_erase();
xa_unlock();

That has the problem (?) that it might be temporarily possible to see
a newly-added entry in a DEAD array.

If that is a problem, you can use xa_reserve() on the add side, followed
by overwriting it or removing it, depending on the state of the DEAD flag.

If you really want to, you can decompose the add side so that you always
check the DEAD flag before doing the store, ie:

do {
	xas_lock();
	if (DEAD)
		xas_set_error(-EINVAL);
	else
		xas_store();
	xas_unlock();
} while (xas_nomem(GFP_KERNEL));
Jingbo Xu March 23, 2022, 5:32 a.m. UTC | #12
On 3/23/22 1:04 AM, Matthew Wilcox wrote:
> On Mon, Mar 21, 2022 at 03:30:52PM +0000, David Howells wrote:
>> Matthew Wilcox <willy@infradead.org> wrote:
>>
>>> Absolutely; just use xa_lock() to protect both setting & testing the
>>> flag.
>>
>> How should Jeffle deal with xarray dropping the lock internally in order to do
>> an allocation and then taking it again (actually in patch 5)?
> 
> There are a number of ways to handle this.  I'll outline two; others
> are surely possible.

Thanks.


> 
> option 1:
> 
> add side:
> 
> xa_lock();
> if (!DEAD) {
> 	xa_store(GFP_KERNEL);
> 	if (DEAD)
> 		xa_erase();
> }
> xa_unlock();
> 
> destroy side:
> 
> xa_lock();
> set DEAD;
> xa_for_each()
> 	xa_erase();
> xa_unlock();
> 
> That has the problem (?) that it might be temporarily possible to see
> a newly-added entry in a DEAD array.

I think this problem doesn't matter in our scenario.


> 
> If that is a problem, you can use xa_reserve() on the add side, followed
> by overwriting it or removing it, depending on the state of the DEAD flag.

Right. Then even the normal path (when memory allocation succeeds) needs
to call xa_reserve() once.


> 
> If you really want to, you can decompose the add side so that you always
> check the DEAD flag before doing the store, ie:
> 
> do {
> 	xas_lock();
> 	if (DEAD)
> 		xas_set_error(-EINVAL);
> 	else
> 		xas_store();
> 	xas_unlock();
> } while (xas_nomem(GFP_KERNEL));

This way is cleaner in terms of locking semantics, at the cost of some
code duplication. However, after decomposing __xa_alloc(), we can also
reuse the xas when setting the CACHEFILES_REQ_NEW mark.

```
+	xa_lock(xa);
+	ret = __xa_alloc(xa, &id, req, xa_limit_32b, GFP_KERNEL);
+	if (!ret)
+		__xa_set_mark(xa, id, CACHEFILES_REQ_NEW);
+	xa_unlock(xa);
```

So far I personally prefer the decomposing way for our scenario.
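
For reference, a decomposed add side along those lines might look like
the sketch below (modelled on the outline above; the XA_FREE_MARK search
stands in for __xa_alloc()'s ID allocation and relies on the xarray
having been initialised with XA_FLAGS_ALLOC, while @req and @cache are
assumed from the surrounding context):

```
	XA_STATE(xas, &cache->reqs, 0);
	int ret;

	do {
		xas_lock(&xas);
		if (test_bit(CACHEFILES_DEAD, &cache->flags))
			xas_set_err(&xas, -EIO);

		/* find a free index, much as __xa_alloc() would */
		xas.xa_index = 0;
		xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
		if (xas.xa_node == XAS_RESTART)
			xas_set_err(&xas, -EBUSY);

		/* the xas_*() ops below are no-ops once an error is set */
		xas_store(&xas, req);
		xas_clear_mark(&xas, XA_FREE_MARK);
		xas_set_mark(&xas, CACHEFILES_REQ_NEW);
		xas_unlock(&xas);
	} while (xas_nomem(&xas, GFP_KERNEL));

	ret = xas_error(&xas);
```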
Patch

diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig
index 719faeeda168..58aad1fb4c5c 100644
--- a/fs/cachefiles/Kconfig
+++ b/fs/cachefiles/Kconfig
@@ -26,3 +26,14 @@  config CACHEFILES_ERROR_INJECTION
 	help
 	  This permits error injection to be enabled in cachefiles whilst a
 	  cache is in service.
+
+config CACHEFILES_ONDEMAND
+	bool "Support for on-demand read"
+	depends on CACHEFILES
+	default n
+	help
+	  This permits on-demand read mode in cachefiles. In this mode, on
+	  a cache miss, the cachefiles backend instead of the netfs is
+	  responsible for fetching data, e.g. through a user daemon.
+
+	  If unsure, say N.
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 7ac04ee2c0a0..c0c3a3cbee28 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -78,6 +78,65 @@  static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
 	{ "",		NULL				}
 };
 
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache)
+{
+	xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
+	rwlock_init(&cache->reqs_lock);
+}
+
+static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache)
+{
+	xa_destroy(&cache->reqs);
+}
+
+static inline __poll_t cachefiles_ondemand_mask(struct cachefiles_cache *cache)
+{
+	__poll_t mask = 0;
+
+	if (!xa_empty(&cache->reqs))
+		mask |= EPOLLIN;
+
+	if (test_bit(CACHEFILES_CULLING, &cache->flags))
+		mask |= EPOLLOUT;
+
+	return mask;
+}
+
+static inline
+bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
+{
+	if (!strcmp(args, "ondemand")) {
+		set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
+		return true;
+	}
+
+	return false;
+}
+
+#else
+static inline void cachefiles_ondemand_open(struct cachefiles_cache *cache) {}
+static inline void cachefiles_ondemand_release(struct cachefiles_cache *cache) {}
+
+static inline
+__poll_t cachefiles_ondemand_mask(struct cachefiles_cache *cache)
+{
+	return 0;
+}
+
+static inline
+bool cachefiles_ondemand_daemon_bind(struct cachefiles_cache *cache, char *args)
+{
+	return false;
+}
+#endif
+
+static inline
+ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+					char __user *_buffer, size_t buflen)
+{
+	return -EOPNOTSUPP;
+}
 
 /*
  * Prepare a cache for caching.
@@ -108,6 +167,7 @@  static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(&cache->volumes);
 	INIT_LIST_HEAD(&cache->object_list);
 	spin_lock_init(&cache->object_list_lock);
+	cachefiles_ondemand_open(cache);
 
 	/* set default caching limits
 	 * - limit at 1% free space and/or free files
@@ -139,6 +199,7 @@  static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 
 	set_bit(CACHEFILES_DEAD, &cache->flags);
 
+	cachefiles_ondemand_release(cache);
 	cachefiles_daemon_unbind(cache);
 
 	/* clean up the control file interface */
@@ -152,23 +213,15 @@  static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-/*
- * Read the cache state.
- */
-static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
-				      size_t buflen, loff_t *pos)
+static ssize_t cachefiles_do_daemon_read(struct cachefiles_cache *cache,
+					 char __user *_buffer,
+					 size_t buflen)
 {
-	struct cachefiles_cache *cache = file->private_data;
 	unsigned long long b_released;
 	unsigned f_released;
 	char buffer[256];
 	int n;
 
-	//_enter(",,%zu,", buflen);
-
-	if (!test_bit(CACHEFILES_READY, &cache->flags))
-		return 0;
-
 	/* check how much space the cache has */
 	cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);
 
@@ -206,6 +259,25 @@  static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
 	return n;
 }
 
+/*
+ * Read the cache state.
+ */
+static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
+				      size_t buflen, loff_t *pos)
+{
+	struct cachefiles_cache *cache = file->private_data;
+
+	//_enter(",,%zu,", buflen);
+
+	if (!test_bit(CACHEFILES_READY, &cache->flags))
+		return 0;
+
+	if (likely(!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)))
+		return cachefiles_do_daemon_read(cache, _buffer, buflen);
+	else
+		return cachefiles_ondemand_daemon_read(cache, _buffer, buflen);
+}
+
 /*
  * Take a command from cachefilesd, parse it and act on it.
  */
@@ -284,6 +356,21 @@  static ssize_t cachefiles_daemon_write(struct file *file,
 	goto error;
 }
 
+
+static inline
+__poll_t cachefiles_daemon_mask(struct cachefiles_cache *cache)
+{
+	__poll_t mask = 0;
+
+	if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
+		mask |= EPOLLIN;
+
+	if (test_bit(CACHEFILES_CULLING, &cache->flags))
+		mask |= EPOLLOUT;
+
+	return mask;
+}
+
 /*
  * Poll for culling state
  * - use EPOLLOUT to indicate culling state
@@ -292,18 +379,13 @@  static __poll_t cachefiles_daemon_poll(struct file *file,
 					   struct poll_table_struct *poll)
 {
 	struct cachefiles_cache *cache = file->private_data;
-	__poll_t mask;
 
 	poll_wait(file, &cache->daemon_pollwq, poll);
-	mask = 0;
-
-	if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
-		mask |= EPOLLIN;
-
-	if (test_bit(CACHEFILES_CULLING, &cache->flags))
-		mask |= EPOLLOUT;
 
-	return mask;
+	if (likely(!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)))
+		return cachefiles_daemon_mask(cache);
+	else
+		return cachefiles_ondemand_mask(cache);
 }
 
 /*
@@ -687,11 +769,6 @@  static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
 	    cache->brun_percent  >= 100)
 		return -ERANGE;
 
-	if (*args) {
-		pr_err("'bind' command doesn't take an argument\n");
-		return -EINVAL;
-	}
-
 	if (!cache->rootdirname) {
 		pr_err("No cache directory specified\n");
 		return -EINVAL;
@@ -703,6 +780,11 @@  static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
 		return -EBUSY;
 	}
 
+	if (!cachefiles_ondemand_daemon_bind(cache, args) && *args) {
+		pr_err("'bind' command doesn't take an argument\n");
+		return -EINVAL;
+	}
+
 	/* Make sure we have copies of the tag string */
 	if (!cache->tag) {
 		/*
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e80673d0ab97..3f791882fa3f 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -15,6 +15,7 @@ 
 #include <linux/fscache-cache.h>
 #include <linux/cred.h>
 #include <linux/security.h>
+#include <linux/xarray.h>
 
 #define CACHEFILES_DIO_BLOCK_SIZE 4096
 
@@ -98,9 +99,14 @@  struct cachefiles_cache {
 #define CACHEFILES_DEAD			1	/* T if cache dead */
 #define CACHEFILES_CULLING		2	/* T if cull engaged */
 #define CACHEFILES_STATE_CHANGED	3	/* T if state changed (poll trigger) */
+#define CACHEFILES_ONDEMAND_MODE	4	/* T if in on-demand read mode */
 	char				*rootdirname;	/* name of cache root directory */
 	char				*secctx;	/* LSM security context */
 	char				*tag;		/* cache binding tag */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+	struct xarray			reqs;		/* xarray of pending on-demand requests */
+	rwlock_t			reqs_lock;	/* Lock for reqs xarray */
+#endif
 };
 
 #include <trace/events/cachefiles.h>