mbox series

[RFC,0/7] xfs: add reflink & dedupe support for fsdax.

Message ID 20190731114935.11030-1-ruansy.fnst@cn.fujitsu.com (mailing list archive)
Headers show
Series xfs: add reflink & dedupe support for fsdax. | expand

Message

Ruan Shiyang July 31, 2019, 11:49 a.m. UTC
This patchset aims to take care of this issue to make reflink and dedupe
work correctly in XFS.

It is based on Goldwyn's patchsets: "v4 Btrfs dax support" and "Btrfs
iomap".  I picked up some patches related and made a few fix to make it
basically works fine.

For dax framework: 
  1. adapt to the latest change in iomap.

For XFS:
  1. report the source address and set IOMAP_COW type for those write
     operations that need COW.
  2. update extent list at the end.
  3. add file contents comparison function based on dax framework.
  4. use xfs_break_layouts() to support dax.


Goldwyn Rodrigues (3):
  dax: replace mmap entry in case of CoW
  fs: dedup file range to use a compare function
  dax: memcpy before zeroing range

Shiyang Ruan (4):
  dax: Introduce dax_copy_edges() for COW.
  dax: copy data before write.
  xfs: Add COW handle for fsdax.
  xfs: Add dedupe support for fsdax.

 fs/btrfs/ioctl.c      |  11 ++-
 fs/dax.c              | 203 ++++++++++++++++++++++++++++++++++++++----
 fs/iomap.c            |   9 +-
 fs/ocfs2/file.c       |   2 +-
 fs/read_write.c       |  11 +--
 fs/xfs/xfs_iomap.c    |  42 +++++----
 fs/xfs/xfs_reflink.c  |  84 +++++++++--------
 include/linux/dax.h   |  15 ++--
 include/linux/fs.h    |   8 +-
 include/linux/iomap.h |   6 ++
 10 files changed, 294 insertions(+), 97 deletions(-)

Comments

Goldwyn Rodrigues July 31, 2019, 8:33 p.m. UTC | #1
On 19:49 31/07, Shiyang Ruan wrote:
> This patchset aims to take care of this issue to make reflink and dedupe
> work correctly in XFS.
> 
> It is based on Goldwyn's patchsets: "v4 Btrfs dax support" and "Btrfs
> iomap".  I picked up some patches related and made a few fix to make it
> basically works fine.
> 
> For dax framework: 
>   1. adapt to the latest change in iomap.
> 
> For XFS:
>   1. report the source address and set IOMAP_COW type for those write
>      operations that need COW.
>   2. update extent list at the end.
>   3. add file contents comparison function based on dax framework.
>   4. use xfs_break_layouts() to support dax.

Shiyang,

I think you used the older patches which does not contain the iomap changes
which would call iomap_begin() with two iomaps. I have it in the btrfs-iomap
branch and plan to update it today. It is built on v5.3-rcX, so it should
contain the changes which moves the iomap code to the different directory.
I will build the dax patches on top of that.
However, we are making a big dependency chain here :(
Ruan Shiyang Aug. 1, 2019, 1:37 a.m. UTC | #2
On 8/1/19 4:33 AM, Goldwyn Rodrigues wrote:
> On 19:49 31/07, Shiyang Ruan wrote:
>> This patchset aims to take care of this issue to make reflink and dedupe
>> work correctly in XFS.
>>
>> It is based on Goldwyn's patchsets: "v4 Btrfs dax support" and "Btrfs
>> iomap".  I picked up some patches related and made a few fix to make it
>> basically works fine.
>>
>> For dax framework:
>>    1. adapt to the latest change in iomap.
>>
>> For XFS:
>>    1. report the source address and set IOMAP_COW type for those write
>>       operations that need COW.
>>    2. update extent list at the end.
>>    3. add file contents comparison function based on dax framework.
>>    4. use xfs_break_layouts() to support dax.
> 
> Shiyang,
> 
> I think you used the older patches which does not contain the iomap changes
> which would call iomap_begin() with two iomaps. I have it in the btrfs-iomap

Oh, Sorry for my carelessness.  This patchset is built on your "Btrfs 
iomap".  I didn't point it out in cover letter.

> branch and plan to update it today. It is built on v5.3-rcX, so it should
> contain the changes which moves the iomap code to the different directory.
> I will build the dax patches on top of that.
> However, we are making a big dependency chain here 
Don't worry.  It's fine for me.  I'll follow your updates.

>
Dave Chinner Aug. 5, 2019, 12:21 a.m. UTC | #3
On Thu, Aug 01, 2019 at 09:37:04AM +0800, Shiyang Ruan wrote:
> 
> 
> On 8/1/19 4:33 AM, Goldwyn Rodrigues wrote:
> > On 19:49 31/07, Shiyang Ruan wrote:
> > > This patchset aims to take care of this issue to make reflink and dedupe
> > > work correctly in XFS.
> > > 
> > > It is based on Goldwyn's patchsets: "v4 Btrfs dax support" and "Btrfs
> > > iomap".  I picked up some patches related and made a few fix to make it
> > > basically works fine.
> > > 
> > > For dax framework:
> > >    1. adapt to the latest change in iomap.
> > > 
> > > For XFS:
> > >    1. report the source address and set IOMAP_COW type for those write
> > >       operations that need COW.
> > >    2. update extent list at the end.
> > >    3. add file contents comparison function based on dax framework.
> > >    4. use xfs_break_layouts() to support dax.
> > 
> > Shiyang,
> > 
> > I think you used the older patches which does not contain the iomap changes
> > which would call iomap_begin() with two iomaps. I have it in the btrfs-iomap
> 
> Oh, Sorry for my carelessness.  This patchset is built on your "Btrfs
> iomap".  I didn't point it out in cover letter.
> 
> > branch and plan to update it today. It is built on v5.3-rcX, so it should
> > contain the changes which moves the iomap code to the different directory.
> > I will build the dax patches on top of that.
> > However, we are making a big dependency chain here
> Don't worry.  It's fine for me.  I'll follow your updates.

Hi Shiyang,

I'll wait for you to update your patches on top of the latest btrfs
patches before looking at this any futher. it would be good to get
a set of iomap infrastructure patches separated from the btrfs
patchsets so could have them both built from a common patchset.

Cheers,

Dave.
Christoph Hellwig Oct. 9, 2019, 6:31 a.m. UTC | #4
Btw, I just had a chat with Dan last week on this.  And he pointed out
that while this series deals with the read/write path issues of 
reflink on DAX it doesn't deal with the mmap side issue that
page->mapping and page->index can point back to exactly one file.

I think we want a few xfstests that reflink a file and then use the
different links using mmap, as that should blow up pretty reliably.
Darrick J. Wong Oct. 9, 2019, 5:11 p.m. UTC | #5
On Tue, Oct 08, 2019 at 11:31:44PM -0700, Christoph Hellwig wrote:
> Btw, I just had a chat with Dan last week on this.  And he pointed out
> that while this series deals with the read/write path issues of 
> reflink on DAX it doesn't deal with the mmap side issue that
> page->mapping and page->index can point back to exactly one file.
> 
> I think we want a few xfstests that reflink a file and then use the
> different links using mmap, as that should blow up pretty reliably.

Hmm, you're right, we don't actually have a test that checks the
behavior of mwriting all copies of a shared block.  Ok, I'll go write
one.

--D
Dave Chinner Oct. 10, 2019, 7:30 a.m. UTC | #6
On Wed, Oct 09, 2019 at 10:11:52AM -0700, Darrick J. Wong wrote:
> On Tue, Oct 08, 2019 at 11:31:44PM -0700, Christoph Hellwig wrote:
> > Btw, I just had a chat with Dan last week on this.  And he pointed out
> > that while this series deals with the read/write path issues of 
> > reflink on DAX it doesn't deal with the mmap side issue that
> > page->mapping and page->index can point back to exactly one file.
> > 
> > I think we want a few xfstests that reflink a file and then use the
> > different links using mmap, as that should blow up pretty reliably.
> 
> Hmm, you're right, we don't actually have a test that checks the
> behavior of mwriting all copies of a shared block.  Ok, I'll go write
> one.

I've pointed this problem out to everyone who has asked me "what do
we need to do to support reflink on DAX". I've even walked a couple
of people right through the problem that needs to be solved and
discussed the potential solutions to it.

Problems that I think need addressing:

	- device dax and filesystem dax have fundamentally different
	  needs in this space, so they need to be separated and not
	  try to use the same solution.
	- dax_lock_entry() being used as a substitute for
	  page_lock() but it not being held on the page itself means
	  it can't be extended to serialise access to the page
	  across multiple mappings that are unaware of each other
	- dax_lock_page/dax_unlock_page interface for hardware
	  memory errors needs to report to the
	  filesystem for processing and repair, not assume the page
	  is user data and killing processes is the only possible
	  recovery mechanism.
	- dax_associate_entry/dax_disassociate_entry can only work
	  for a 1:1 page:mapping,index relationship. It needs to go
	  away and be replaced by a mechanism that allows
	  tracking multiple page mapping/index/state tuples. This
	  has much wider use than DAX (e.g. sharing page cache pages
	  between reflinked files)

I've proposed shadow pages (based on a concept from Matethw Wilcox)
for each read-only reflink mapping with the real physical page being
owned by the filesystem and indexed by LBA in the filesystem buffer
cache. This would be based on whether the extent in the file the
page is mapped from has multiple references to it.

i.e. When a new page mapping occurs in a shared extent, we add the
page to the buffer cache (i.e. point a struct xfs_buf at it)i if it
isn't already present, then allocate a shadow page, point it at the
master, set it up with the new mapping,index tuple and add it to the
mapping tree. Then we can treat it as a unique page even though it
points to the read-only master page.

When the page get's COWed, we toss away the shadow page and the
master can be reclaimed with the reference count goes to zero or the
extent is no longer shared.  Think of it kind of like the way we
multiply reference the zero page for holes in mmap()d dax regions,
except we can have millions of them and they are found by physical
buffer cache index lookups. 

This works for both DAX and non-DAX sharing of read-only shared
filesytsem pages. i.e. it would form the basis of single-copy
read-only page cache pages for reflinked files.

There was quite a bit of talk at LSFMM 2018 about having a linked
list of mapping structures hanging off a struct page, one for each
mapping that references the page. Operations would then have to walk
all mappings that reference the page. This was useful to other
subsystems (HMM?) for some purpose I forget, but I'm not sure it's
particularly useful by itself for non-dax reflink purposes - I
suspect the filesystem would still need to track such pages itself
in it's buffer cache so it can find the cached page to link new
reflink copies to the same page...

ISTR a couple of other solutions were thrown around, but I don't
think anyone came up with a simple solution...

Cheers,

Dave.