Message ID | 20210130085003.1392-1-changfengnan@vivo.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] fuse: use newer inode info when writeback cache is enabled | expand |
On Sat, 30 Jan 2021 at 09:50, Fengnan Chang <changfengnan@vivo.com> wrote: > > When writeback cache is enabled, the inode information in cached is > considered new by default, and the inode information of lowerfs is > stale. > When a lower fs is mount in a different directory through different > connection, for example PATHA and PATHB, since writeback cache is > enabled by default, when the file is modified through PATHA, viewing the > same file from the PATHB, PATHB will think that cached inode is newer > than lowerfs, resulting in file size and time from under PATHA and PATHB > is inconsistent. > Add a judgment condition to check whether to use the info in the cache > according to mtime. This seems to break the fsx-linux stress test. I suspect a better direction would be looking at whether the inode has any files open for write (i_writecount > 0)... Thanks, Miklos
Unh, it seems i_writecount not work. If we modify file through lowerfs, i_writecount won't change, but the size already changed. For example: echo "111" > /lowerfs/test ls -l /upper/test echo "2222" >> /lowerfs/test ls -l /upper/test So, can you describe your test enviroment? including kernel version and fsx parameters, I will check it. Thanks. On 2021/6/22 15:59, Miklos Szeredi wrote: > On Sat, 30 Jan 2021 at 09:50, Fengnan Chang <changfengnan@vivo.com> wrote: >> >> When writeback cache is enabled, the inode information in cached is >> considered new by default, and the inode information of lowerfs is >> stale. >> When a lower fs is mount in a different directory through different >> connection, for example PATHA and PATHB, since writeback cache is >> enabled by default, when the file is modified through PATHA, viewing the >> same file from the PATHB, PATHB will think that cached inode is newer >> than lowerfs, resulting in file size and time from under PATHA and PATHB >> is inconsistent. >> Add a judgment condition to check whether to use the info in the cache >> according to mtime. > > This seems to break the fsx-linux stress test. > > I suspect a better direction would be looking at whether the inode has > any files open for write (i_writecount > 0)... > > Thanks, > Miklos >
On Tue, 22 Jun 2021 at 14:25, Fengnan Chang <changfengnan@vivo.com> wrote: > > Unh, it seems i_writecount not work. > If we modify file through lowerfs, i_writecount won't change, but the > size already changed. > For example: > echo "111" > /lowerfs/test > ls -l /upper/test > echo "2222" >> /lowerfs/test > ls -l /upper/test > > So, can you describe your test enviroment? including kernel version and > fsx parameters, I will check it. linux-5.13-rc5 + patch mkdir /tmp/test libfuse/example/passthrough_ll -ocache=always,writeback /mnt/fuse/ fsx-linux -N 1000000 /mnt/fuse/tmp/test/fsx Thanks, Miklos
Hi Miklos: Thank you for the information, I have been able to reproduce the problem. The new version of the patch as below. Previous fsx test is pass now. Need do more test, Can you help to test new patch? or send me your test case, I will test this. Here is my test case, and is the problem this patch is trying to solve. Case A: mkdir /tmp/test passthrough_ll -ocache=always,writeback /mnt/test/ echo "11111" > /tmp/test/fsx ls -l /mnt/test/tmp/test/ echo "2222" >> /tmp/test/fsx ls -l /mnt/test/tmp/test/ Case B: mkdir /tmp/test passthrough_ll -ocache=always,writeback /mnt/test/ passthrough_ll -ocache=always,writeback /mnt/test2/ echo "11111" > /tmp/test/fsx ls -l /mnt/test/tmp/test/ ls -l /mnt/test2/tmp/test/ echo "222" >> /mnt/test/tmp/test/fsx ls -l /mnt/test/tmp/test/ ls -l /mnt/test2/tmp/test/ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index b9beb39a4a18..8e22a31b55c4 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -60,6 +60,10 @@ MODULE_PARM_DESC(max_user_congthresh, /** Congestion starts at 75% of maximum */ #define FUSE_DEFAULT_CONGESTION_THRESHOLD (FUSE_DEFAULT_MAX_BACKGROUND * 3 / 4) +static inline bool attr_newer_than_local(struct fuse_attr *attr, struct inode *inode) { + return (attr->mtime > inode->i_mtime.tv_sec) + || ((attr->mtime == inode->i_mtime.tv_sec) && (attr->mtimensec > inode->i_mtime.tv_nsec)); +} #ifdef CONFIG_BLOCK static struct file_system_type fuseblk_fs_type; #endif @@ -241,8 +245,10 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, * extend local i_size without keeping userspace server in sync. So, * attr->size coming from server can be stale. We cannot trust it. */ - if (!is_wb || !S_ISREG(inode->i_mode)) + if (!is_wb || !S_ISREG(inode->i_mode) + || (attr_newer_than_local(attr, inode) && !inode_is_open_for_write(inode))) { i_size_write(inode, attr->size); + } spin_unlock(&fi->lock); if (!is_wb && S_ISREG(inode->i_mode)) { On 2021/6/22 23:19, Miklos Szeredi wrote: > On Tue, 22 Jun 2021 at 14:25, Fengnan Chang <changfengnan@vivo.com> wrote: >> >> Unh, it seems i_writecount not work. >> If we modify file through lowerfs, i_writecount won't change, but the >> size already changed. >> For example: >> echo "111" > /lowerfs/test >> ls -l /upper/test >> echo "2222" >> /lowerfs/test >> ls -l /upper/test >> >> So, can you describe your test enviroment? including kernel version and >> fsx parameters, I will check it. > > linux-5.13-rc5 + patch > mkdir /tmp/test > libfuse/example/passthrough_ll -ocache=always,writeback /mnt/fuse/ > fsx-linux -N 1000000 /mnt/fuse/tmp/test/fsx > > Thanks, > Miklos >
Hi Miklos: FYI, I run xfstest on fuse, with linux 5.4.61 + patch, no new failure case. On 2021/6/24 15:42, Fengnan Chang wrote: > Hi Miklos: > > Thank you for the information, I have been able to reproduce the problem. > > The new version of the patch as below. Previous fsx test is pass now. > Need do more test, Can you help to test new patch? or send me your test > case, I will test this. > > Here is my test case, and is the problem this patch is trying to solve. > Case A: > mkdir /tmp/test > passthrough_ll -ocache=always,writeback /mnt/test/ > echo "11111" > /tmp/test/fsx > ls -l /mnt/test/tmp/test/ > echo "2222" >> /tmp/test/fsx > ls -l /mnt/test/tmp/test/ > > Case B: > mkdir /tmp/test > passthrough_ll -ocache=always,writeback /mnt/test/ > passthrough_ll -ocache=always,writeback /mnt/test2/ > echo "11111" > /tmp/test/fsx > ls -l /mnt/test/tmp/test/ > ls -l /mnt/test2/tmp/test/ > echo "222" >> /mnt/test/tmp/test/fsx > ls -l /mnt/test/tmp/test/ > ls -l /mnt/test2/tmp/test/ > > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index b9beb39a4a18..8e22a31b55c4 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -60,6 +60,10 @@ MODULE_PARM_DESC(max_user_congthresh, > /** Congestion starts at 75% of maximum */ > #define FUSE_DEFAULT_CONGESTION_THRESHOLD (FUSE_DEFAULT_MAX_BACKGROUND > * 3 / 4) > > +static inline bool attr_newer_than_local(struct fuse_attr *attr, struct > inode *inode) { > + return (attr->mtime > inode->i_mtime.tv_sec) > + || ((attr->mtime == inode->i_mtime.tv_sec) && > (attr->mtimensec > inode->i_mtime.tv_nsec)); > +} > #ifdef CONFIG_BLOCK > static struct file_system_type fuseblk_fs_type; > #endif > @@ -241,8 +245,10 @@ void fuse_change_attributes(struct inode *inode, > struct fuse_attr *attr, > * extend local i_size without keeping userspace server in > sync. So, > * attr->size coming from server can be stale. We cannot trust it. > */ > - if (!is_wb || !S_ISREG(inode->i_mode)) > + if (!is_wb || !S_ISREG(inode->i_mode) > + || (attr_newer_than_local(attr, inode) && > !inode_is_open_for_write(inode))) { > i_size_write(inode, attr->size); > + } > spin_unlock(&fi->lock); > > if (!is_wb && S_ISREG(inode->i_mode)) { > > On 2021/6/22 23:19, Miklos Szeredi wrote: >> On Tue, 22 Jun 2021 at 14:25, Fengnan Chang <changfengnan@vivo.com> >> wrote: >>> >>> Unh, it seems i_writecount not work. >>> If we modify file through lowerfs, i_writecount won't change, but the >>> size already changed. >>> For example: >>> echo "111" > /lowerfs/test >>> ls -l /upper/test >>> echo "2222" >> /lowerfs/test >>> ls -l /upper/test >>> >>> So, can you describe your test enviroment? including kernel version and >>> fsx parameters, I will check it. >> >> linux-5.13-rc5 + patch >> mkdir /tmp/test >> libfuse/example/passthrough_ll -ocache=always,writeback /mnt/fuse/ >> fsx-linux -N 1000000 /mnt/fuse/tmp/test/fsx >> >> Thanks, >> Miklos >>
On Thu, 24 Jun 2021 at 09:42, Fengnan Chang <changfengnan@vivo.com> wrote: > > Hi Miklos: > > Thank you for the information, I have been able to reproduce the problem. > > The new version of the patch as below. Previous fsx test is pass now. > Need do more test, Can you help to test new patch? or send me your test > case, I will test this. > > Here is my test case, and is the problem this patch is trying to solve. > Case A: > mkdir /tmp/test > passthrough_ll -ocache=always,writeback /mnt/test/ > echo "11111" > /tmp/test/fsx > ls -l /mnt/test/tmp/test/ > echo "2222" >> /tmp/test/fsx > ls -l /mnt/test/tmp/test/ > > Case B: > mkdir /tmp/test > passthrough_ll -ocache=always,writeback /mnt/test/ > passthrough_ll -ocache=always,writeback /mnt/test2/ > echo "11111" > /tmp/test/fsx > ls -l /mnt/test/tmp/test/ > ls -l /mnt/test2/tmp/test/ > echo "222" >> /mnt/test/tmp/test/fsx > ls -l /mnt/test/tmp/test/ > ls -l /mnt/test2/tmp/test/ Both these testcases have the "cache=always" option, which means: cached values (both data and metadata) are always valid; i.e. changes will be made only through this client and not through some other channel (like the backing filesystem or another instance). Why is "cache=always" used? Thanks, Miklos
Remove cache=always still have this problem, this problem is related about FUSE_CAP_WRITEBACK_CACHE. On 2021/8/6 20:20, Miklos Szeredi wrote: > On Thu, 24 Jun 2021 at 09:42, Fengnan Chang <changfengnan@vivo.com> wrote: >> >> Hi Miklos: >> >> Thank you for the information, I have been able to reproduce the problem. >> >> The new version of the patch as below. Previous fsx test is pass now. >> Need do more test, Can you help to test new patch? or send me your test >> case, I will test this. >> >> Here is my test case, and is the problem this patch is trying to solve. >> Case A: >> mkdir /tmp/test >> passthrough_ll -ocache=always,writeback /mnt/test/ >> echo "11111" > /tmp/test/fsx >> ls -l /mnt/test/tmp/test/ >> echo "2222" >> /tmp/test/fsx >> ls -l /mnt/test/tmp/test/ >> >> Case B: >> mkdir /tmp/test >> passthrough_ll -ocache=always,writeback /mnt/test/ >> passthrough_ll -ocache=always,writeback /mnt/test2/ >> echo "11111" > /tmp/test/fsx >> ls -l /mnt/test/tmp/test/ >> ls -l /mnt/test2/tmp/test/ >> echo "222" >> /mnt/test/tmp/test/fsx >> ls -l /mnt/test/tmp/test/ >> ls -l /mnt/test2/tmp/test/ > > Both these testcases have the "cache=always" option, which means: > cached values (both data and metadata) are always valid; i.e. changes > will be made only through this client and not through some other > channel (like the backing filesystem or another instance). > > Why is "cache=always" used? > > Thanks, > Miklos >
On Tue, Aug 10, 2021 at 3:12 PM Fengnan Chang <changfengnan@vivo.com> wrote: > > Remove cache=always still have this problem, this problem is related > about FUSE_CAP_WRITEBACK_CACHE. FUSE_CAP_WRITEBACK_CACHE by definition asks the kernel to trust its own inode attributes. So I don't think we should fix its semantics. Instead, in your case, it seems that the two mnts (PATHA and PATHB) are not sharing the same superblock. I would suggest fixing it at a higher level: 1. use bind-mount to mount PATHA and PATHB, 2. or add a `tag=xxx` argument to fuse mount option to uniquely identify a fuse file system (just like we do in the virtiofs case) Cheers, Tao
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index b0e18b470e91..55fdafcaca34 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -182,7 +182,10 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, inode->i_atime.tv_sec = attr->atime; inode->i_atime.tv_nsec = attr->atimensec; /* mtime from server may be stale due to local buffered write */ - if (!fc->writeback_cache || !S_ISREG(inode->i_mode)) { + if (!fc->writeback_cache || !S_ISREG(inode->i_mode) + || (attr->mtime > inode->i_mtime.tv_sec) + || ((attr->mtime == inode->i_mtime.tv_sec) + && (attr->mtimensec >= inode->i_mtime.tv_nsec))) { inode->i_mtime.tv_sec = attr->mtime; inode->i_mtime.tv_nsec = attr->mtimensec; inode->i_ctime.tv_sec = attr->ctime; @@ -241,8 +244,12 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, * extend local i_size without keeping userspace server in sync. So, * attr->size coming from server can be stale. We cannot trust it. */ - if (!is_wb || !S_ISREG(inode->i_mode)) + if (!is_wb || !S_ISREG(inode->i_mode) + || (attr->mtime > inode->i_mtime.tv_sec) + || ((attr->mtime == inode->i_mtime.tv_sec) + && (attr->mtimensec >= inode->i_mtime.tv_nsec))) { i_size_write(inode, attr->size); + } spin_unlock(&fi->lock); if (!is_wb && S_ISREG(inode->i_mode)) {
When writeback cache is enabled, the inode information in cached is considered new by default, and the inode information of lowerfs is stale. When a lower fs is mount in a different directory through different connection, for example PATHA and PATHB, since writeback cache is enabled by default, when the file is modified through PATHA, viewing the same file from the PATHB, PATHB will think that cached inode is newer than lowerfs, resulting in file size and time from under PATHA and PATHB is inconsistent. Add a judgment condition to check whether to use the info in the cache according to mtime. Signed-off-by: Fengnan Chang <changfengnan@vivo.com> --- fs/fuse/inode.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) -- 2.29.0