[RFC,V6,0/7] implement containerized syncfs for overlayfs

Message ID: 20211122030038.1938875-1-cgxu519@mykernel.net

Message

Chengguang Xu Nov. 22, 2021, 3 a.m. UTC
From: Chengguang Xu <charliecgxu@tencent.com>

Currently, the syncfs(2) syscall on overlayfs just calls sync_filesystem()
on upper_sb, which synchronizes all dirty inodes in the upper filesystem
regardless of which overlay instance owns them. In the container use case,
when multiple containers share the same underlying upper filesystem, this
has the following shortcomings.

(1) Performance
Synchronization is likely to be heavy because it also syncs inodes that
do not belong to the target overlayfs.

(2) Interference
Unplanned synchronization is likely to impact the IO performance of
unrelated container processes on other overlayfs instances.

This series tries to implement containerized syncfs for overlayfs so that
only the dirty upper inodes that belong to the specific overlayfs instance
are synced. This reduces the cost of synchronization and avoids seriously
impacting the IO performance of unrelated processes.
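
To illustrate the intent only, here is a toy user-space model in C. It is
not kernel code and not the implementation in this series; struct toy_inode,
owner_id, toy_sync_all() and toy_sync_instance() are hypothetical names
invented for this sketch. It contrasts flushing every dirty inode on the
shared upper filesystem with flushing only those owned by one overlay
instance.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (assumption): each upper inode is tagged with the id of the
 * overlay instance that owns it. Real overlayfs does not store this field. */
struct toy_inode {
    int  owner_id;  /* hypothetical overlay instance id */
    bool dirty;     /* true if the inode has unsynced changes */
};

/* Models syncing the whole upper filesystem: every dirty inode is
 * flushed, whoever owns it. Returns the number of inodes flushed. */
size_t toy_sync_all(struct toy_inode *inodes, size_t n)
{
    size_t flushed = 0;

    for (size_t i = 0; i < n; i++) {
        if (inodes[i].dirty) {
            inodes[i].dirty = false;
            flushed++;
        }
    }
    return flushed;
}

/* Models the containerized variant: only dirty inodes owned by
 * owner_id are flushed; inodes of other instances are left alone. */
size_t toy_sync_instance(struct toy_inode *inodes, size_t n, int owner_id)
{
    size_t flushed = 0;

    for (size_t i = 0; i < n; i++) {
        if (inodes[i].owner_id == owner_id && inodes[i].dirty) {
            inodes[i].dirty = false;
            flushed++;
        }
    }
    return flushed;
}
```

In this model, calling toy_sync_instance() for one container leaves the
dirty inodes of the other containers untouched, which is the behaviour the
two shortcomings above are asking for.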

v1->v2:
- Mark the overlayfs inode dirty directly instead of adding a notification
  mechanism to the vfs inode.

v2->v3:
- Introduce an extra overlayfs syncfs wait list to wait for target upper
  inodes in ->sync_fs().

v3->v4:
- Use wait_sb_inodes() to wait for upper inodes that are being synced.
- Mark the overlay inode dirty only when it has an upper inode and the
  VM_SHARED flag is set in ovl_mmap().
- Check upper i_state after checking upper mmap state
  in ovl_write_inode().

v4->v5:
- Add underlying inode dirtiness check after mnt_drop_write().
- Handle both wait and no-wait modes of syncfs(2) in overlayfs' ->sync_fs().

v5->v6:
- Rebase onto the latest overlayfs-next tree.
- Mark the overlay inode dirty when it has an upper inode instead of
  marking it dirty on modification.
- Trigger dirty page writeback in overlayfs' ->write_inode().
- Set the 'DONTCACHE' flag on the overlay inode.
- Delete overlayfs' ->writepages() and ->evict_inode() operations.

Chengguang Xu (7):
  ovl: setup overlayfs' private bdi
  ovl: mark overlayfs inode dirty when it has upper
  ovl: implement overlayfs' own ->write_inode operation
  ovl: set 'DONTCACHE' flag for overlayfs inode
  fs: export wait_sb_inodes()
  ovl: introduce ovl_sync_upper_blockdev()
  ovl: implement containerized syncfs for overlayfs

 fs/fs-writeback.c         |  3 ++-
 fs/overlayfs/inode.c      |  5 +++-
 fs/overlayfs/super.c      | 49 ++++++++++++++++++++++++++++++++-------
 fs/overlayfs/util.c       |  1 +
 include/linux/writeback.h |  1 +
 5 files changed, 48 insertions(+), 11 deletions(-)

Comments

Chengguang Xu Nov. 27, 2021, 9:26 a.m. UTC | #1
---- On Mon, 2021-11-22 11:00:31 Chengguang Xu <cgxu519@mykernel.net> wrote ----

Hi Miklos,

Have you had time to look at this V6 series? I think this version has fixed
the issues raised in the previous feedback from you all, and it passes fstests
(generic/overlay cases).

I ran some long-running stress tests (tar & syncfs & diff, with and without
copy-up) and found no obvious problems. For syncfs with 1M clean upper inodes,
an extra 1.3s was spent waiting on scheduling. I guess this 1.3s will not have
a significant impact on container instances in most cases. I also agree with
Jack that we can start with this approach and make improvements afterwards if
any real users complain.
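
For anyone who wants to take a similar measurement from user space, a
minimal sketch in C follows. It is not the harness used for the numbers
above; time_syncfs() is a hypothetical helper name, and the only assumption
is a Linux system with glibc, where syncfs(2) is declared in <unistd.h>
when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE   /* needed for the syncfs() declaration in glibc */
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Times one syncfs(2) call on the filesystem that contains 'path'.
 * Returns the elapsed wall-clock time in seconds, or -1.0 if the path
 * cannot be opened or syncfs() fails. */
double time_syncfs(const char *path)
{
    struct timespec start, end;
    int fd, ret;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    ret = syncfs(fd);               /* sync the filesystem behind fd */
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    if (ret != 0)
        return -1.0;

    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_syncfs() on the overlay mount point of one container, while
other containers sharing the same upper filesystem generate dirty data,
would be one way to compare behaviour with and without this series.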



Thanks,
Chengguang

