diff mbox series

[RFC,v3,09/10] ovl: introduce helper of syncfs writeback inode waiting

Message ID 20201108140307.1385745-10-cgxu519@mykernel.net (mailing list archive)
State New
Headers show
Series implement containerized syncfs for overlayfs | expand

Commit Message

Chengguang Xu Nov. 8, 2020, 2:03 p.m. UTC
Introduce a helper ovl_wait_wb_inodes() to wait until all
target upper inodes finish writeback.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Comments

Chengguang Xu Nov. 9, 2020, 3:33 a.m. UTC | #1
---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > Introduce a helper ovl_wait_wb_inodes() to wait until all
 > target upper inodes finish writeback.
 > 
 > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > ---
 >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 >  1 file changed, 30 insertions(+)
 > 
 > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
 > index e5607a908d82..9a535fc11221 100644
 > --- a/fs/overlayfs/super.c
 > +++ b/fs/overlayfs/super.c
 > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
 >      ovl_free_fs(ofs);
 >  }
 >  
 > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
 > +{
 > +    LIST_HEAD(tmp_list);
 > +    struct ovl_inode *oi;
 > +    struct inode *upper;
 > +
 > +    spin_lock(&ofs->syncfs_wait_list_lock);
 > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
 > +
 > +    while (!list_empty(&tmp_list)) {
 > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
 > +        list_del_init(&oi->wait_list);
 > +        ihold(&oi->vfs_inode);

Maybe I overlooked race condition with inode eviction, so still need to introduce
OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.


Thanks,
Chengguang
Amir Goldstein Nov. 9, 2020, 7:07 a.m. UTC | #2
On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
>  > Introduce a helper ovl_wait_wb_inodes() to wait until all
>  > target upper inodes finish writeback.
>  >
>  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>  > ---
>  >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
>  >  1 file changed, 30 insertions(+)
>  >
>  > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>  > index e5607a908d82..9a535fc11221 100644
>  > --- a/fs/overlayfs/super.c
>  > +++ b/fs/overlayfs/super.c
>  > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
>  >      ovl_free_fs(ofs);
>  >  }
>  >
>  > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
>  > +{
>  > +    LIST_HEAD(tmp_list);
>  > +    struct ovl_inode *oi;
>  > +    struct inode *upper;
>  > +
>  > +    spin_lock(&ofs->syncfs_wait_list_lock);
>  > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
>  > +
>  > +    while (!list_empty(&tmp_list)) {
>  > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
>  > +        list_del_init(&oi->wait_list);
>  > +        ihold(&oi->vfs_inode);
>
> Maybe I overlooked race condition with inode eviction, so still need to introduce
> OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.
>

I am not sure why you added the ovl wait list.

I think you misunderstood Jan's suggestion.
I think what Jan meant is that ovl_sync_fs() should call
wait_sb_inodes(upper_sb)
to wait for writeback of ALL upper inodes after sync_filesystem()
started writeback
only on this ovl instance upper inodes.

I am not sure if this is acceptable or not - it is certainly an improvement over
current situation, but I have a feeling that on a large scale (many
containers) it
won't be enough.

The idea was to keep it simple without over optimizing, since anyway
you are going for the "correct" solution long term (ovl inode aops),
so I wouldn't
add the wait list.

As long as the upper inode is still dirty, we can keep the ovl inode in cache,
so the worst outcome is that drop_caches needs to get called twice before the
ovl inode can be evicted, no?

Thanks,
Amir.
Chengguang Xu Nov. 9, 2020, 8:34 a.m. UTC | #3
---- 在 星期一, 2020-11-09 15:07:18 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > >  > Introduce a helper ovl_wait_wb_inodes() to wait until all
 > >  > target upper inodes finish writeback.
 > >  >
 > >  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > >  > ---
 > >  >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 > >  >  1 file changed, 30 insertions(+)
 > >  >
 > >  > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
 > >  > index e5607a908d82..9a535fc11221 100644
 > >  > --- a/fs/overlayfs/super.c
 > >  > +++ b/fs/overlayfs/super.c
 > >  > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
 > >  >      ovl_free_fs(ofs);
 > >  >  }
 > >  >
 > >  > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
 > >  > +{
 > >  > +    LIST_HEAD(tmp_list);
 > >  > +    struct ovl_inode *oi;
 > >  > +    struct inode *upper;
 > >  > +
 > >  > +    spin_lock(&ofs->syncfs_wait_list_lock);
 > >  > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
 > >  > +
 > >  > +    while (!list_empty(&tmp_list)) {
 > >  > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
 > >  > +        list_del_init(&oi->wait_list);
 > >  > +        ihold(&oi->vfs_inode);
 > >
 > > Maybe I overlooked race condition with inode eviction, so still need to introduce
 > > OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.
 > >
 > 
 > I am not sure why you added the ovl wait list.
 > 
 > I think you misunderstood Jan's suggestion.
 > I think what Jan meant is that ovl_sync_fs() should call
 > wait_sb_inodes(upper_sb)
 > to wait for writeback of ALL upper inodes after sync_filesystem()
 > started writeback
 > only on this ovl instance upper inodes.
 > 


Maybe you are right, the wait list is just for accuracy that can completely
avoid interferes between ovl instances, otherwise we may need to face
waiting interferes  in high density environment. 


 > I am not sure if this is acceptable or not - it is certainly an improvement over
 > current situation, but I have a feeling that on a large scale (many
 > containers) it
 > won't be enough.
 > 

The same as your thought. 


 > The idea was to keep it simple without over optimizing, since anyway
 > you are going for the "correct" solution long term (ovl inode aops),
 > so I wouldn't
 > add the wait list.
 > 

Maybe, I think it depends on how to implement ovl page-cache, so at current
stage I have no idea for the wait list.


 > As long as the upper inode is still dirty, we can keep the ovl inode in cache,
 > so the worst outcome is that drop_caches needs to get called twice before the
 > ovl inode can be evicted, no?
 > 
 
IIUC, since currently ovl does not have it's own page-cache, so there is no affect to page-cache reclaim, 
also  there is no ovl shrinker to reclaim slab because we drop ovl inode directly after final iput. 
So should we add a shrinker in this series?


Thanks,
Chengguang
Amir Goldstein Nov. 9, 2020, 10:07 a.m. UTC | #4
On Mon, Nov 9, 2020 at 10:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期一, 2020-11-09 15:07:18 Amir Goldstein <amir73il@gmail.com> 撰写 ----
>  > On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >
>  > >  ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
>  > >  > Introduce a helper ovl_wait_wb_inodes() to wait until all
>  > >  > target upper inodes finish writeback.
>  > >  >
>  > >  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>  > >  > ---
>  > >  >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
>  > >  >  1 file changed, 30 insertions(+)
>  > >  >
>  > >  > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>  > >  > index e5607a908d82..9a535fc11221 100644
>  > >  > --- a/fs/overlayfs/super.c
>  > >  > +++ b/fs/overlayfs/super.c
>  > >  > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
>  > >  >      ovl_free_fs(ofs);
>  > >  >  }
>  > >  >
>  > >  > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
>  > >  > +{
>  > >  > +    LIST_HEAD(tmp_list);
>  > >  > +    struct ovl_inode *oi;
>  > >  > +    struct inode *upper;
>  > >  > +
>  > >  > +    spin_lock(&ofs->syncfs_wait_list_lock);
>  > >  > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
>  > >  > +
>  > >  > +    while (!list_empty(&tmp_list)) {
>  > >  > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
>  > >  > +        list_del_init(&oi->wait_list);
>  > >  > +        ihold(&oi->vfs_inode);
>  > >
>  > > Maybe I overlooked race condition with inode eviction, so still need to introduce
>  > > OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.
>  > >
>  >
>  > I am not sure why you added the ovl wait list.
>  >
>  > I think you misunderstood Jan's suggestion.
>  > I think what Jan meant is that ovl_sync_fs() should call
>  > wait_sb_inodes(upper_sb)
>  > to wait for writeback of ALL upper inodes after sync_filesystem()
>  > started writeback
>  > only on this ovl instance upper inodes.
>  >
>
>
> Maybe you are right, the wait list is just for accuracy that can completely
> avoid interferes between ovl instances, otherwise we may need to face
> waiting interferes  in high density environment.
>
>
>  > I am not sure if this is acceptable or not - it is certainly an improvement over
>  > current situation, but I have a feeling that on a large scale (many
>  > containers) it
>  > won't be enough.
>  >
>
> The same as your thought.
>
>
>  > The idea was to keep it simple without over optimizing, since anyway
>  > you are going for the "correct" solution long term (ovl inode aops),
>  > so I wouldn't
>  > add the wait list.
>  >
>
> Maybe, I think it depends on how to implement ovl page-cache, so at current
> stage I have no idea for the wait list.
>
>
>  > As long as the upper inode is still dirty, we can keep the ovl inode in cache,
>  > so the worst outcome is that drop_caches needs to get called twice before the
>  > ovl inode can be evicted, no?
>  >
>
> IIUC, since currently ovl does not have it's own page-cache, so there is no affect to page-cache reclaim,
> also  there is no ovl shrinker to reclaim slab because we drop ovl inode directly after final iput.
> So should we add a shrinker in this series?
>

Would that add a lot of complexity?
Thinking out loud: maybe we follow Jan's suggestion and fix remaining
performance with followup series?

Thanks,
Amir.
Chengguang Xu Nov. 9, 2020, 12:06 p.m. UTC | #5
---- 在 星期一, 2020-11-09 18:07:18 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Mon, Nov 9, 2020 at 10:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期一, 2020-11-09 15:07:18 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >
 > >  > >  ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > >  > >  > Introduce a helper ovl_wait_wb_inodes() to wait until all
 > >  > >  > target upper inodes finish writeback.
 > >  > >  >
 > >  > >  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > >  > >  > ---
 > >  > >  >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 > >  > >  >  1 file changed, 30 insertions(+)
 > >  > >  >
 > >  > >  > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
 > >  > >  > index e5607a908d82..9a535fc11221 100644
 > >  > >  > --- a/fs/overlayfs/super.c
 > >  > >  > +++ b/fs/overlayfs/super.c
 > >  > >  > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
 > >  > >  >      ovl_free_fs(ofs);
 > >  > >  >  }
 > >  > >  >
 > >  > >  > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
 > >  > >  > +{
 > >  > >  > +    LIST_HEAD(tmp_list);
 > >  > >  > +    struct ovl_inode *oi;
 > >  > >  > +    struct inode *upper;
 > >  > >  > +
 > >  > >  > +    spin_lock(&ofs->syncfs_wait_list_lock);
 > >  > >  > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
 > >  > >  > +
 > >  > >  > +    while (!list_empty(&tmp_list)) {
 > >  > >  > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
 > >  > >  > +        list_del_init(&oi->wait_list);
 > >  > >  > +        ihold(&oi->vfs_inode);
 > >  > >
 > >  > > Maybe I overlooked race condition with inode eviction, so still need to introduce
 > >  > > OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.
 > >  > >
 > >  >
 > >  > I am not sure why you added the ovl wait list.
 > >  >
 > >  > I think you misunderstood Jan's suggestion.
 > >  > I think what Jan meant is that ovl_sync_fs() should call
 > >  > wait_sb_inodes(upper_sb)
 > >  > to wait for writeback of ALL upper inodes after sync_filesystem()
 > >  > started writeback
 > >  > only on this ovl instance upper inodes.
 > >  >
 > >
 > >
 > > Maybe you are right, the wait list is just for accuracy that can completely
 > > avoid interferes between ovl instances, otherwise we may need to face
 > > waiting interferes  in high density environment.
 > >
 > >
 > >  > I am not sure if this is acceptable or not - it is certainly an improvement over
 > >  > current situation, but I have a feeling that on a large scale (many
 > >  > containers) it
 > >  > won't be enough.
 > >  >
 > >
 > > The same as your thought.
 > >
 > >
 > >  > The idea was to keep it simple without over optimizing, since anyway
 > >  > you are going for the "correct" solution long term (ovl inode aops),
 > >  > so I wouldn't
 > >  > add the wait list.
 > >  >
 > >
 > > Maybe, I think it depends on how to implement ovl page-cache, so at current
 > > stage I have no idea for the wait list.
 > >
 > >
 > >  > As long as the upper inode is still dirty, we can keep the ovl inode in cache,
 > >  > so the worst outcome is that drop_caches needs to get called twice before the
 > >  > ovl inode can be evicted, no?
 > >  >
 > >
 > > IIUC, since currently ovl does not have it's own page-cache, so there is no affect to page-cache reclaim,
 > > also  there is no ovl shrinker to reclaim slab because we drop ovl inode directly after final iput.
 > > So should we add a shrinker in this series?
 > >
 > 
 > Would that add a lot of complexity?

Sorry, don't need any other shrinker because inode and dentry use common vfs shrinker.

 > Thinking out loud: maybe we follow Jan's suggestion and fix remaining
 > performance with followup series?
 > 

Okay,  so let's leave it as homework.


Thanks,
Chengguang
diff mbox series

Patch

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index e5607a908d82..9a535fc11221 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -255,6 +255,36 @@  static void ovl_put_super(struct super_block *sb)
 	ovl_free_fs(ofs);
 }
 
+void ovl_wait_wb_inodes(struct ovl_fs *ofs)
+{
+	LIST_HEAD(tmp_list);
+	struct ovl_inode *oi;
+	struct inode *upper;
+
+	spin_lock(&ofs->syncfs_wait_list_lock);
+	list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
+
+	while (!list_empty(&tmp_list)) {
+		oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
+		list_del_init(&oi->wait_list);
+		ihold(&oi->vfs_inode);
+		spin_unlock(&ofs->syncfs_wait_list_lock);
+
+		upper = ovl_inode_upper(&oi->vfs_inode);
+		if (!mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK)) {
+			iput(&oi->vfs_inode);
+			goto wait_next;
+		}
+
+		filemap_fdatawait_keep_errors(upper->i_mapping);
+		iput(&oi->vfs_inode);
+		cond_resched();
+wait_next:
+		spin_lock(&ofs->syncfs_wait_list_lock);
+	}
+	spin_unlock(&ofs->syncfs_wait_list_lock);
+}
+
 /* Sync real dirty inodes in upper filesystem (if it exists) */
 static int ovl_sync_fs(struct super_block *sb, int wait)
 {