diff mbox series

[v2] fuse: use newer inode info when writeback cache is enabled

Message ID 20210130085003.1392-1-changfengnan@vivo.com (mailing list archive)
State New, archived
Headers show
Series [v2] fuse: use newer inode info when writeback cache is enabled | expand

Commit Message

常凤楠 Jan. 30, 2021, 8:50 a.m. UTC
When writeback cache is enabled, the inode information in cached is
considered new by default, and the inode information of lowerfs is
stale.
When a lower fs is mount in a different directory through different
connection, for example PATHA and PATHB, since writeback cache is
enabled by default, when the file is modified through PATHA, viewing the
same file from the PATHB, PATHB will think that cached inode is newer
than lowerfs, resulting in file size and time from under PATHA and PATHB
is inconsistent.
Add a judgment condition to check whether to use the info in the cache
according to mtime.

Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
---
 fs/fuse/inode.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--
2.29.0

Comments

Miklos Szeredi June 22, 2021, 7:59 a.m. UTC | #1
On Sat, 30 Jan 2021 at 09:50, Fengnan Chang <changfengnan@vivo.com> wrote:
>
> When writeback cache is enabled, the inode information in cached is
> considered new by default, and the inode information of lowerfs is
> stale.
> When a lower fs is mount in a different directory through different
> connection, for example PATHA and PATHB, since writeback cache is
> enabled by default, when the file is modified through PATHA, viewing the
> same file from the PATHB, PATHB will think that cached inode is newer
> than lowerfs, resulting in file size and time from under PATHA and PATHB
> is inconsistent.
> Add a judgment condition to check whether to use the info in the cache
> according to mtime.

This seems to break the fsx-linux stress test.

I suspect a better direction would be looking at whether the inode has
any files open for write (i_writecount > 0)...

Thanks,
Miklos
常凤楠 June 22, 2021, 12:25 p.m. UTC | #2
Unh, it seems i_writecount not work.
If we modify file through lowerfs, i_writecount won't change, but the 
size already changed.
For example:
echo "111" > /lowerfs/test
ls -l /upper/test
echo "2222" >> /lowerfs/test
ls -l /upper/test

So, can you describe your test enviroment? including kernel version and 
fsx parameters, I will check it.

Thanks.

On 2021/6/22 15:59, Miklos Szeredi wrote:
> On Sat, 30 Jan 2021 at 09:50, Fengnan Chang <changfengnan@vivo.com> wrote:
>>
>> When writeback cache is enabled, the inode information in cached is
>> considered new by default, and the inode information of lowerfs is
>> stale.
>> When a lower fs is mount in a different directory through different
>> connection, for example PATHA and PATHB, since writeback cache is
>> enabled by default, when the file is modified through PATHA, viewing the
>> same file from the PATHB, PATHB will think that cached inode is newer
>> than lowerfs, resulting in file size and time from under PATHA and PATHB
>> is inconsistent.
>> Add a judgment condition to check whether to use the info in the cache
>> according to mtime.
> 
> This seems to break the fsx-linux stress test.
> 
> I suspect a better direction would be looking at whether the inode has
> any files open for write (i_writecount > 0)...
> 
> Thanks,
> Miklos
>
Miklos Szeredi June 22, 2021, 3:19 p.m. UTC | #3
On Tue, 22 Jun 2021 at 14:25, Fengnan Chang <changfengnan@vivo.com> wrote:
>
> Unh, it seems i_writecount not work.
> If we modify file through lowerfs, i_writecount won't change, but the
> size already changed.
> For example:
> echo "111" > /lowerfs/test
> ls -l /upper/test
> echo "2222" >> /lowerfs/test
> ls -l /upper/test
>
> So, can you describe your test enviroment? including kernel version and
> fsx parameters, I will check it.

linux-5.13-rc5 + patch
mkdir /tmp/test
libfuse/example/passthrough_ll -ocache=always,writeback /mnt/fuse/
fsx-linux -N 1000000 /mnt/fuse/tmp/test/fsx

Thanks,
Miklos
常凤楠 June 24, 2021, 7:42 a.m. UTC | #4
Hi Miklos:

Thank you for the information, I have been able to reproduce the problem.

The new version of the patch as below. Previous fsx test is pass now. 
Need do more test, Can you help to test new patch? or send me your test 
case, I will test this.

Here is my test case, and is the problem this patch is trying to solve.
Case A:
mkdir /tmp/test
passthrough_ll -ocache=always,writeback /mnt/test/
echo "11111" > /tmp/test/fsx
ls -l /mnt/test/tmp/test/
echo "2222" >> /tmp/test/fsx
ls -l /mnt/test/tmp/test/

Case B:
mkdir /tmp/test
passthrough_ll -ocache=always,writeback /mnt/test/
passthrough_ll -ocache=always,writeback /mnt/test2/
echo "11111" > /tmp/test/fsx
ls -l /mnt/test/tmp/test/
ls -l /mnt/test2/tmp/test/
echo "222" >> /mnt/test/tmp/test/fsx
ls -l /mnt/test/tmp/test/
ls -l /mnt/test2/tmp/test/


diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b9beb39a4a18..8e22a31b55c4 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -60,6 +60,10 @@ MODULE_PARM_DESC(max_user_congthresh,
  /** Congestion starts at 75% of maximum */
  #define FUSE_DEFAULT_CONGESTION_THRESHOLD (FUSE_DEFAULT_MAX_BACKGROUND 
* 3 / 4)

+static inline bool attr_newer_than_local(struct fuse_attr *attr, struct 
inode *inode) {
+    return (attr->mtime > inode->i_mtime.tv_sec)
+               || ((attr->mtime == inode->i_mtime.tv_sec) && 
(attr->mtimensec > inode->i_mtime.tv_nsec));
+}
  #ifdef CONFIG_BLOCK
  static struct file_system_type fuseblk_fs_type;
  #endif
@@ -241,8 +245,10 @@ void fuse_change_attributes(struct inode *inode, 
struct fuse_attr *attr,
          * extend local i_size without keeping userspace server in 
sync. So,
          * attr->size coming from server can be stale. We cannot trust it.
          */
-       if (!is_wb || !S_ISREG(inode->i_mode))
+       if (!is_wb || !S_ISREG(inode->i_mode)
+               || (attr_newer_than_local(attr, inode) && 
!inode_is_open_for_write(inode))) {
                 i_size_write(inode, attr->size);
+       }
         spin_unlock(&fi->lock);

         if (!is_wb && S_ISREG(inode->i_mode)) {

On 2021/6/22 23:19, Miklos Szeredi wrote:
> On Tue, 22 Jun 2021 at 14:25, Fengnan Chang <changfengnan@vivo.com> wrote:
>>
>> Unh, it seems i_writecount not work.
>> If we modify file through lowerfs, i_writecount won't change, but the
>> size already changed.
>> For example:
>> echo "111" > /lowerfs/test
>> ls -l /upper/test
>> echo "2222" >> /lowerfs/test
>> ls -l /upper/test
>>
>> So, can you describe your test enviroment? including kernel version and
>> fsx parameters, I will check it.
> 
> linux-5.13-rc5 + patch
> mkdir /tmp/test
> libfuse/example/passthrough_ll -ocache=always,writeback /mnt/fuse/
> fsx-linux -N 1000000 /mnt/fuse/tmp/test/fsx
> 
> Thanks,
> Miklos
>
常凤楠 June 25, 2021, 3:42 a.m. UTC | #5
Hi Miklos:

FYI, I run xfstest on fuse, with linux 5.4.61 + patch, no new failure case.

On 2021/6/24 15:42, Fengnan Chang wrote:
> Hi Miklos:
> 
> Thank you for the information, I have been able to reproduce the problem.
> 
> The new version of the patch as below. Previous fsx test is pass now. 
> Need do more test, Can you help to test new patch? or send me your test 
> case, I will test this.
> 
> Here is my test case, and is the problem this patch is trying to solve.
> Case A:
> mkdir /tmp/test
> passthrough_ll -ocache=always,writeback /mnt/test/
> echo "11111" > /tmp/test/fsx
> ls -l /mnt/test/tmp/test/
> echo "2222" >> /tmp/test/fsx
> ls -l /mnt/test/tmp/test/
> 
> Case B:
> mkdir /tmp/test
> passthrough_ll -ocache=always,writeback /mnt/test/
> passthrough_ll -ocache=always,writeback /mnt/test2/
> echo "11111" > /tmp/test/fsx
> ls -l /mnt/test/tmp/test/
> ls -l /mnt/test2/tmp/test/
> echo "222" >> /mnt/test/tmp/test/fsx
> ls -l /mnt/test/tmp/test/
> ls -l /mnt/test2/tmp/test/
> 
> 
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b9beb39a4a18..8e22a31b55c4 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -60,6 +60,10 @@ MODULE_PARM_DESC(max_user_congthresh,
>   /** Congestion starts at 75% of maximum */
>   #define FUSE_DEFAULT_CONGESTION_THRESHOLD (FUSE_DEFAULT_MAX_BACKGROUND 
> * 3 / 4)
> 
> +static inline bool attr_newer_than_local(struct fuse_attr *attr, struct 
> inode *inode) {
> +    return (attr->mtime > inode->i_mtime.tv_sec)
> +               || ((attr->mtime == inode->i_mtime.tv_sec) && 
> (attr->mtimensec > inode->i_mtime.tv_nsec));
> +}
>   #ifdef CONFIG_BLOCK
>   static struct file_system_type fuseblk_fs_type;
>   #endif
> @@ -241,8 +245,10 @@ void fuse_change_attributes(struct inode *inode, 
> struct fuse_attr *attr,
>           * extend local i_size without keeping userspace server in 
> sync. So,
>           * attr->size coming from server can be stale. We cannot trust it.
>           */
> -       if (!is_wb || !S_ISREG(inode->i_mode))
> +       if (!is_wb || !S_ISREG(inode->i_mode)
> +               || (attr_newer_than_local(attr, inode) && 
> !inode_is_open_for_write(inode))) {
>                  i_size_write(inode, attr->size);
> +       }
>          spin_unlock(&fi->lock);
> 
>          if (!is_wb && S_ISREG(inode->i_mode)) {
> 
> On 2021/6/22 23:19, Miklos Szeredi wrote:
>> On Tue, 22 Jun 2021 at 14:25, Fengnan Chang <changfengnan@vivo.com> 
>> wrote:
>>>
>>> Unh, it seems i_writecount not work.
>>> If we modify file through lowerfs, i_writecount won't change, but the
>>> size already changed.
>>> For example:
>>> echo "111" > /lowerfs/test
>>> ls -l /upper/test
>>> echo "2222" >> /lowerfs/test
>>> ls -l /upper/test
>>>
>>> So, can you describe your test enviroment? including kernel version and
>>> fsx parameters, I will check it.
>>
>> linux-5.13-rc5 + patch
>> mkdir /tmp/test
>> libfuse/example/passthrough_ll -ocache=always,writeback /mnt/fuse/
>> fsx-linux -N 1000000 /mnt/fuse/tmp/test/fsx
>>
>> Thanks,
>> Miklos
>>
Miklos Szeredi Aug. 6, 2021, 12:20 p.m. UTC | #6
On Thu, 24 Jun 2021 at 09:42, Fengnan Chang <changfengnan@vivo.com> wrote:
>
> Hi Miklos:
>
> Thank you for the information, I have been able to reproduce the problem.
>
> The new version of the patch as below. Previous fsx test is pass now.
> Need do more test, Can you help to test new patch? or send me your test
> case, I will test this.
>
> Here is my test case, and is the problem this patch is trying to solve.
> Case A:
> mkdir /tmp/test
> passthrough_ll -ocache=always,writeback /mnt/test/
> echo "11111" > /tmp/test/fsx
> ls -l /mnt/test/tmp/test/
> echo "2222" >> /tmp/test/fsx
> ls -l /mnt/test/tmp/test/
>
> Case B:
> mkdir /tmp/test
> passthrough_ll -ocache=always,writeback /mnt/test/
> passthrough_ll -ocache=always,writeback /mnt/test2/
> echo "11111" > /tmp/test/fsx
> ls -l /mnt/test/tmp/test/
> ls -l /mnt/test2/tmp/test/
> echo "222" >> /mnt/test/tmp/test/fsx
> ls -l /mnt/test/tmp/test/
> ls -l /mnt/test2/tmp/test/

Both these testcases have the "cache=always" option, which means:
cached values (both data and metadata) are always valid; i.e. changes
will be made only through this client and not through some other
channel (like the backing filesystem or another instance).

Why is "cache=always" used?

Thanks,
Miklos
常凤楠 Aug. 10, 2021, 1:41 a.m. UTC | #7
Remove cache=always still have this problem, this problem is related 
about FUSE_CAP_WRITEBACK_CACHE.

On 2021/8/6 20:20, Miklos Szeredi wrote:
> On Thu, 24 Jun 2021 at 09:42, Fengnan Chang <changfengnan@vivo.com> wrote:
>>
>> Hi Miklos:
>>
>> Thank you for the information, I have been able to reproduce the problem.
>>
>> The new version of the patch as below. Previous fsx test is pass now.
>> Need do more test, Can you help to test new patch? or send me your test
>> case, I will test this.
>>
>> Here is my test case, and is the problem this patch is trying to solve.
>> Case A:
>> mkdir /tmp/test
>> passthrough_ll -ocache=always,writeback /mnt/test/
>> echo "11111" > /tmp/test/fsx
>> ls -l /mnt/test/tmp/test/
>> echo "2222" >> /tmp/test/fsx
>> ls -l /mnt/test/tmp/test/
>>
>> Case B:
>> mkdir /tmp/test
>> passthrough_ll -ocache=always,writeback /mnt/test/
>> passthrough_ll -ocache=always,writeback /mnt/test2/
>> echo "11111" > /tmp/test/fsx
>> ls -l /mnt/test/tmp/test/
>> ls -l /mnt/test2/tmp/test/
>> echo "222" >> /mnt/test/tmp/test/fsx
>> ls -l /mnt/test/tmp/test/
>> ls -l /mnt/test2/tmp/test/
> 
> Both these testcases have the "cache=always" option, which means:
> cached values (both data and metadata) are always valid; i.e. changes
> will be made only through this client and not through some other
> channel (like the backing filesystem or another instance).
> 
> Why is "cache=always" used?
> 
> Thanks,
> Miklos
>
Peng Tao Aug. 16, 2021, 2:48 a.m. UTC | #8
On Tue, Aug 10, 2021 at 3:12 PM Fengnan Chang <changfengnan@vivo.com> wrote:
>
> Remove cache=always still have this problem, this problem is related
> about FUSE_CAP_WRITEBACK_CACHE.

FUSE_CAP_WRITEBACK_CACHE by definition asks the kernel to trust its
own inode attributes. So I don't think we should fix its semantics.
Instead, in your case, it seems that the two mnts (PATHA and PATHB)
are not sharing the same superblock. I would suggest fixing it at a
higher level:
1. use bind-mount to mount PATHA and PATHB,
2. or add a `tag=xxx` argument to fuse mount option to uniquely
identify a fuse file system (just like we do in the virtiofs case)

Cheers,
Tao
diff mbox series

Patch

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b0e18b470e91..55fdafcaca34 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -182,7 +182,10 @@  void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 	inode->i_atime.tv_sec   = attr->atime;
 	inode->i_atime.tv_nsec  = attr->atimensec;
 	/* mtime from server may be stale due to local buffered write */
-	if (!fc->writeback_cache || !S_ISREG(inode->i_mode)) {
+	if (!fc->writeback_cache || !S_ISREG(inode->i_mode)
+		|| (attr->mtime > inode->i_mtime.tv_sec)
+		|| ((attr->mtime == inode->i_mtime.tv_sec)
+			&& (attr->mtimensec >= inode->i_mtime.tv_nsec))) {
 		inode->i_mtime.tv_sec   = attr->mtime;
 		inode->i_mtime.tv_nsec  = attr->mtimensec;
 		inode->i_ctime.tv_sec   = attr->ctime;
@@ -241,8 +244,12 @@  void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 	 * extend local i_size without keeping userspace server in sync. So,
 	 * attr->size coming from server can be stale. We cannot trust it.
 	 */
-	if (!is_wb || !S_ISREG(inode->i_mode))
+	if (!is_wb || !S_ISREG(inode->i_mode)
+		|| (attr->mtime > inode->i_mtime.tv_sec)
+		|| ((attr->mtime == inode->i_mtime.tv_sec)
+			&& (attr->mtimensec >= inode->i_mtime.tv_nsec))) {
 		i_size_write(inode, attr->size);
+	}
 	spin_unlock(&fi->lock);

 	if (!is_wb && S_ISREG(inode->i_mode)) {