mbox series

[RFC,0/8] fsdax: introduce FS query interface to support reflink

Message ID 20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com (mailing list archive)
Headers show
Series fsdax: introduce FS query interface to support reflink | expand

Message

Ruan Shiyang Aug. 7, 2020, 1:13 p.m. UTC
This patchset is a try to resolve the problem of tracking shared page
for fsdax.

Instead of per-page tracking method, this patchset introduces a query
interface: get_shared_files(), which is implemented by each FS, to
obtain the owners of a shared page.  It returns an owner list of this
shared page.  Then, the memory-failure() iterates the list to be able
to notify each process using files that sharing this page.

The design of the tracking method is as follow:
1. dax_assocaite_entry() associates the owner's info to this page
- For non-reflink case:
  page->mapping,->index stores the file's mapping, offset in file.
    A dax page is not shared by other files. dax_associate_entry() is
    called only once.  So, use page->mapping,->index to store the
    owner's info.
- For reflink case:
  page->mapping,->index stores the block device, offset in device.
    A dax page is shared more than once.  So, dax_assocaite_entry()
    will be called more than once.  We introduce page->zone_device_data
    as reflink counter, to indicate that this page is shared and how
    many owners now is using this page. The page->mapping,->index is
    used to store the block_device of the fs and page offset of this
    device.

2. dax_lock_page() calls query interface to lock each dax entry
- For non-reflink case:
  owner's info is stored in page->mapping,->index.
    So, It is easy to lock its dax entry.
- For reflink case:
  owner's info is obtained by calling get_shared_files(), which is
  implemented by FS.
    The FS context could be found in block_device that stored by
    page->mapping.  Then lock the dax entries of the owners.

In memory-failure(), since the owner list has been obtained in 
dax_lock_page(), just iterate the list and handle the error.  This
patchset didn't handle the memory failure on metadata of FS because
I haven't found a way to distinguish whether this page contains
matadata yet.  Still working on it.

==
I also borrowed and made some changes on Goldwyn's patchsets.
These patches makes up for the lack of CoW mechanism in fsdax.

The rests are dax & reflink support for xfs.

(Rebased on v5.8)
==
Shiyang Ruan (8):
  fs: introduce get_shared_files() for dax&reflink
  fsdax, mm: track files sharing dax page for memory-failure
  fsdax: introduce dax_copy_edges() for COW
  fsdax: copy data before write
  fsdax: replace mmap entry in case of CoW
  fsdax: dedup file range to use a compare function
  fs/xfs: handle CoW for fsdax write() path
  fs/xfs: support dedupe for fsdax

 fs/btrfs/reflink.c     |   3 +-
 fs/dax.c               | 302 +++++++++++++++++++++++++++++++++++------
 fs/ocfs2/file.c        |   2 +-
 fs/read_write.c        |  11 +-
 fs/xfs/xfs_bmap_util.c |   6 +-
 fs/xfs/xfs_file.c      |  10 +-
 fs/xfs/xfs_iomap.c     |   3 +-
 fs/xfs/xfs_iops.c      |  11 +-
 fs/xfs/xfs_reflink.c   |  80 ++++++-----
 fs/xfs/xfs_super.c     |  67 +++++++++
 include/linux/dax.h    |  18 ++-
 include/linux/fs.h     |  11 +-
 include/linux/mm.h     |   8 ++
 mm/memory-failure.c    | 138 ++++++++++++-------
 14 files changed, 529 insertions(+), 141 deletions(-)

Comments

Matthew Wilcox Aug. 7, 2020, 1:38 p.m. UTC | #1
On Fri, Aug 07, 2020 at 09:13:28PM +0800, Shiyang Ruan wrote:
> This patchset is a try to resolve the problem of tracking shared page
> for fsdax.
> 
> Instead of per-page tracking method, this patchset introduces a query
> interface: get_shared_files(), which is implemented by each FS, to
> obtain the owners of a shared page.  It returns an owner list of this
> shared page.  Then, the memory-failure() iterates the list to be able
> to notify each process using files that sharing this page.
> 
> The design of the tracking method is as follow:
> 1. dax_assocaite_entry() associates the owner's info to this page

I think that's the first problem with this design.  dax_associate_entry is
a horrendous idea which needs to be ripped out, not made more important.
It's all part of the general problem of trying to do something on a
per-page basis instead of per-extent basis.
Ruan Shiyang Aug. 10, 2020, 8:15 a.m. UTC | #2
On 2020/8/7 下午9:38, Matthew Wilcox wrote:
> On Fri, Aug 07, 2020 at 09:13:28PM +0800, Shiyang Ruan wrote:
>> This patchset is a try to resolve the problem of tracking shared page
>> for fsdax.
>>
>> Instead of per-page tracking method, this patchset introduces a query
>> interface: get_shared_files(), which is implemented by each FS, to
>> obtain the owners of a shared page.  It returns an owner list of this
>> shared page.  Then, the memory-failure() iterates the list to be able
>> to notify each process using files that sharing this page.
>>
>> The design of the tracking method is as follow:
>> 1. dax_assocaite_entry() associates the owner's info to this page
> 
> I think that's the first problem with this design.  dax_associate_entry is
> a horrendous idea which needs to be ripped out, not made more important.
> It's all part of the general problem of trying to do something on a
> per-page basis instead of per-extent basis.
> 

The memory-failure needs to track owners info from a dax page, so I 
should associate the owner with this page.  In this version, I associate 
the block device to the dax page, so that the memory-failure is able to 
iterate the owners by the query interface provided by filesystem.


--
Thanks,
Ruan Shiyang.
> 
>
Matthew Wilcox Aug. 10, 2020, 11:16 a.m. UTC | #3
On Mon, Aug 10, 2020 at 04:15:50PM +0800, Ruan Shiyang wrote:
> 
> 
> On 2020/8/7 下午9:38, Matthew Wilcox wrote:
> > On Fri, Aug 07, 2020 at 09:13:28PM +0800, Shiyang Ruan wrote:
> > > This patchset is a try to resolve the problem of tracking shared page
> > > for fsdax.
> > > 
> > > Instead of per-page tracking method, this patchset introduces a query
> > > interface: get_shared_files(), which is implemented by each FS, to
> > > obtain the owners of a shared page.  It returns an owner list of this
> > > shared page.  Then, the memory-failure() iterates the list to be able
> > > to notify each process using files that sharing this page.
> > > 
> > > The design of the tracking method is as follow:
> > > 1. dax_assocaite_entry() associates the owner's info to this page
> > 
> > I think that's the first problem with this design.  dax_associate_entry is
> > a horrendous idea which needs to be ripped out, not made more important.
> > It's all part of the general problem of trying to do something on a
> > per-page basis instead of per-extent basis.
> > 
> 
> The memory-failure needs to track owners info from a dax page, so I should
> associate the owner with this page.  In this version, I associate the block
> device to the dax page, so that the memory-failure is able to iterate the
> owners by the query interface provided by filesystem.

No, it doesn't need to track owner info from a DAX page.  What it needs to
do is ask the filesystem.
Dave Chinner Aug. 10, 2020, 9:10 p.m. UTC | #4
On Mon, Aug 10, 2020 at 12:16:57PM +0100, Matthew Wilcox wrote:
> On Mon, Aug 10, 2020 at 04:15:50PM +0800, Ruan Shiyang wrote:
> > 
> > 
> > On 2020/8/7 下午9:38, Matthew Wilcox wrote:
> > > On Fri, Aug 07, 2020 at 09:13:28PM +0800, Shiyang Ruan wrote:
> > > > This patchset is a try to resolve the problem of tracking shared page
> > > > for fsdax.
> > > > 
> > > > Instead of per-page tracking method, this patchset introduces a query
> > > > interface: get_shared_files(), which is implemented by each FS, to
> > > > obtain the owners of a shared page.  It returns an owner list of this
> > > > shared page.  Then, the memory-failure() iterates the list to be able
> > > > to notify each process using files that sharing this page.
> > > > 
> > > > The design of the tracking method is as follow:
> > > > 1. dax_assocaite_entry() associates the owner's info to this page
> > > 
> > > I think that's the first problem with this design.  dax_associate_entry is
> > > a horrendous idea which needs to be ripped out, not made more important.
> > > It's all part of the general problem of trying to do something on a
> > > per-page basis instead of per-extent basis.
> > > 
> > 
> > The memory-failure needs to track owners info from a dax page, so I should
> > associate the owner with this page.  In this version, I associate the block
> > device to the dax page, so that the memory-failure is able to iterate the
> > owners by the query interface provided by filesystem.
> 
> No, it doesn't need to track owner info from a DAX page.  What it needs to
> do is ask the filesystem.

Just to add to this: the owner tracking that is current done deep
inside the DAX code needs to be moved out to the owner of the dax
device. That may be the dax device itself, or it may be a filesystem
like ext4 or XFS. Initially, nothing will be able to share pages and
page owner tracking should be done on the page itself as the DAX
code currently does.

Hence when a page error occurs, the device owner is called with the
page and error information, and the dax device or filesystem can
then look up the page owner (via mapping/index pointers) and run the
Die Userspace Die functions instead of running this all internally
in the DAX code.

Once the implementation is abstracted and controlled by the device
owner, then we can start working to change the XFS implementation to
support shared pages....

Cheers,

Dave.