diff mbox series

dax: dax_layout_busy_page() should not unmap cow pages

Message ID 20190802192956.GA3032@redhat.com (mailing list archive)
State Mainlined
Commit d75996dd022b6d83bd14af59b2775b1aa639e4b9
Series dax: dax_layout_busy_page() should not unmap cow pages

Commit Message

Vivek Goyal Aug. 2, 2019, 7:29 p.m. UTC
As of now, dax_layout_busy_page() calls unmap_mapping_range() with the last
argument (even_cows) set to 1, which means COW pages are unmapped as well.
I am wondering who needs to get rid of COW pages too.

I noticed one interesting side effect of this. I mounted xfs with -o dax,
mmap()ed a file with MAP_PRIVATE and wrote some data to a page, which
created a COW page. Then I called fallocate() on that file to zero a page
of the file. fallocate() called dax_layout_busy_page(), which unmapped the
COW pages as well. When I then tried to read back the data I had written,
I got the old data from persistent memory: the read resulted in a new
fault that fetched the data from persistent memory again, so the data I
had written was lost.

This sounds wrong. Are there any users which need to unmap cow pages
as well? If not, I am proposing changing it to not unmap cow pages.

I noticed this while writing virtio_fs code: when I tried to reclaim a
memory range, that corrupted the executable I was running from virtio-fs,
and the program got a segmentation violation.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/dax.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dan Williams Aug. 2, 2019, 7:37 p.m. UTC | #1
On Fri, Aug 2, 2019 at 12:30 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> As of now dax_layout_busy_page() calls unmap_mapping_range() with last
> argument as 1, which says even unmap cow pages. I am wondering who needs
> to get rid of cow pages as well.
>
> I noticed one interesting side affect of this. I mount xfs with -o dax and
> mmaped a file with MAP_PRIVATE and wrote some data to a page which created
> cow page. Then I called fallocate() on that file to zero a page of file.
> fallocate() called dax_layout_busy_page() which unmapped cow pages as well
> and then I tried to read back the data I wrote and what I get is old
> data from persistent memory. I lost the data I had written. This
> read basically resulted in new fault and read back the data from
> persistent memory.
>
> This sounds wrong. Are there any users which need to unmap cow pages
> as well? If not, I am proposing changing it to not unmap cow pages.
>
> I noticed this while while writing virtio_fs code where when I tried
> to reclaim a memory range and that corrupted the executable and I
> was running from virtio-fs and program got segment violation.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/dax.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: rhvgoyal-linux/fs/dax.c
> ===================================================================
> --- rhvgoyal-linux.orig/fs/dax.c        2019-08-01 17:03:10.574675652 -0400
> +++ rhvgoyal-linux/fs/dax.c     2019-08-02 14:32:28.809639116 -0400
> @@ -600,7 +600,7 @@ struct page *dax_layout_busy_page(struct
>          * guaranteed to either see new references or prevent new
>          * references from being established.
>          */
> -       unmap_mapping_range(mapping, 0, 0, 1);
> +       unmap_mapping_range(mapping, 0, 0, 0);

Good find, yes, this looks correct to me and should also go to -stable.
Boaz Harrosh Aug. 5, 2019, 11:53 a.m. UTC | #2
On 02/08/2019 22:37, Dan Williams wrote:
> On Fri, Aug 2, 2019 at 12:30 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>> As of now dax_layout_busy_page() calls unmap_mapping_range() with last
>> argument as 1, which says even unmap cow pages. I am wondering who needs
>> to get rid of cow pages as well.
>>
>> I noticed one interesting side affect of this. I mount xfs with -o dax and
>> mmaped a file with MAP_PRIVATE and wrote some data to a page which created
>> cow page. Then I called fallocate() on that file to zero a page of file.
>> fallocate() called dax_layout_busy_page() which unmapped cow pages as well
>> and then I tried to read back the data I wrote and what I get is old
>> data from persistent memory. I lost the data I had written. This
>> read basically resulted in new fault and read back the data from
>> persistent memory.
>>
>> This sounds wrong. Are there any users which need to unmap cow pages
>> as well? If not, I am proposing changing it to not unmap cow pages.
>>
>> I noticed this while while writing virtio_fs code where when I tried
>> to reclaim a memory range and that corrupted the executable and I
>> was running from virtio-fs and program got segment violation.
>>
>> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> ---
>>  fs/dax.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Index: rhvgoyal-linux/fs/dax.c
>> ===================================================================
>> --- rhvgoyal-linux.orig/fs/dax.c        2019-08-01 17:03:10.574675652 -0400
>> +++ rhvgoyal-linux/fs/dax.c     2019-08-02 14:32:28.809639116 -0400
>> @@ -600,7 +600,7 @@ struct page *dax_layout_busy_page(struct
>>          * guaranteed to either see new references or prevent new
>>          * references from being established.
>>          */
>> -       unmap_mapping_range(mapping, 0, 0, 1);
>> +       unmap_mapping_range(mapping, 0, 0, 0);
> 
> Good find, yes, this looks correct to me and should also go to -stable.
> 

Please note that unmap_mapping_range(mapping, ..., 1) is for the truncate case and friends.

So as I understand the man page:
fallocate(FALLOC_FL_PUNCH_HOLE) means the user is asking to get rid of the COW pages as well.
On the other hand, with fallocate(FALLOC_FL_ZERO_RANGE) only the pmem portion is zeroed and the COW (private) pages stay.

Just saying; I have not followed the above code path.
(Should we have an xfstest for this?)

Cheers
Boaz
Vivek Goyal Aug. 5, 2019, 6:49 p.m. UTC | #3
On Mon, Aug 05, 2019 at 02:53:06PM +0300, Boaz Harrosh wrote:
> On 02/08/2019 22:37, Dan Williams wrote:
> > On Fri, Aug 2, 2019 at 12:30 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >>
> >> As of now dax_layout_busy_page() calls unmap_mapping_range() with last
> >> argument as 1, which says even unmap cow pages. I am wondering who needs
> >> to get rid of cow pages as well.
> >>
> >> I noticed one interesting side affect of this. I mount xfs with -o dax and
> >> mmaped a file with MAP_PRIVATE and wrote some data to a page which created
> >> cow page. Then I called fallocate() on that file to zero a page of file.
> >> fallocate() called dax_layout_busy_page() which unmapped cow pages as well
> >> and then I tried to read back the data I wrote and what I get is old
> >> data from persistent memory. I lost the data I had written. This
> >> read basically resulted in new fault and read back the data from
> >> persistent memory.
> >>
> >> This sounds wrong. Are there any users which need to unmap cow pages
> >> as well? If not, I am proposing changing it to not unmap cow pages.
> >>
> >> I noticed this while while writing virtio_fs code where when I tried
> >> to reclaim a memory range and that corrupted the executable and I
> >> was running from virtio-fs and program got segment violation.
> >>
> >> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> >> ---
> >>  fs/dax.c |    2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> Index: rhvgoyal-linux/fs/dax.c
> >> ===================================================================
> >> --- rhvgoyal-linux.orig/fs/dax.c        2019-08-01 17:03:10.574675652 -0400
> >> +++ rhvgoyal-linux/fs/dax.c     2019-08-02 14:32:28.809639116 -0400
> >> @@ -600,7 +600,7 @@ struct page *dax_layout_busy_page(struct
> >>          * guaranteed to either see new references or prevent new
> >>          * references from being established.
> >>          */
> >> -       unmap_mapping_range(mapping, 0, 0, 1);
> >> +       unmap_mapping_range(mapping, 0, 0, 0);
> > 
> > Good find, yes, this looks correct to me and should also go to -stable.
> > 
> 
> Please pay attention that unmap_mapping_range(mapping, ..., 1) is for the truncate case and friends
> 
> So as I understand the man page:
> fallocate(FL_PUNCH_HOLE); means user is asking to get rid also of COW pages.
> On the other way fallocate(FL_ZERO_RANGE) only the pmem portion is zeroed and COW (private pages) stays

I tested fallocate(FALLOC_FL_PUNCH_HOLE) on xfs (non-dax) and it does not
seem to get rid of COW pages; my test case can still read the data it wrote
to private pages.
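
A minimal userspace sketch of such a test case, for illustration only; it is
not the program used in this thread. The helper name
cow_survives_punch_hole() and the path argument are hypothetical. It writes
through a MAP_PRIVATE mapping, punches a hole over the same page, and checks
whether the private data is still readable. On a non-dax filesystem the
expectation, per the result reported above, is that the COW page survives.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if data written through a MAP_PRIVATE
 * mapping is still visible after punching a hole over the same page,
 * 0 if it was lost, and -1 if the test could not be set up (for example
 * when the filesystem does not support hole punching).
 */
int cow_survives_punch_hole(const char *path)
{
	const char old_data[] = "old-file-data";
	const char new_data[] = "private-write";
	long page = sysconf(_SC_PAGESIZE);
	int result = -1;
	char *map;
	int fd;

	assert(page > 0);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Back the first page with real file data. */
	if (ftruncate(fd, page) < 0 ||
	    pwrite(fd, old_data, sizeof(old_data), 0) != (ssize_t)sizeof(old_data))
		goto out_close;

	map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* The first store triggers copy-on-write; the file is unchanged. */
	memcpy(map, new_data, sizeof(new_data));

	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, page) < 0)
		goto out_unmap;

	/* If the COW page was zapped, this read faults in file data instead. */
	result = memcmp(map, new_data, sizeof(new_data)) == 0;

out_unmap:
	munmap(map, page);
out_close:
	close(fd);
	unlink(path);
	return result;
}
```

Calling the helper on a scratch file on the filesystem under test and
checking that it does not return 0 would be one way to turn this into a
regression check.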

> 
> Just saying I have not followed the above code path
> (We should have an xfstest for this?)

I don't know either. It would indeed be interesting to figure out what the
expected behavior is with fallocate() and truncate() for COW pages and to
cover that with an xfstest (if not already done).

Irrespective of that, for dax it seems particularly bad because
we call unmap_mapping_range() for the whole file. So even if we are
punching a hole in a single page and expect only the COW page associated
with that page to go away, currently it gets rid of all COW pages in the
whole file.

So to me it makes sense to not get rid of COW pages, and possibly to
introduce an option of performing dax_layout_busy_page() on a range
of pages (as opposed to the whole file), where the caller can specify
whether to zap COW pages in the specified range.

Thanks
Vivek
Boaz Harrosh Aug. 5, 2019, 7:16 p.m. UTC | #4
On 05/08/2019 21:49, Vivek Goyal wrote:
> On Mon, Aug 05, 2019 at 02:53:06PM +0300, Boaz Harrosh wrote:
<>
>> So as I understand the man page:
>> fallocate(FL_PUNCH_HOLE); means user is asking to get rid also of COW pages.
>> On the other way fallocate(FL_ZERO_RANGE) only the pmem portion is zeroed and COW (private pages) stays
> 
> I tested fallocate(FL_PUNCH_HOLE) on xfs (non-dax) and it does not seem to
> get rid of COW pages and my test case still can read the data it wrote
> in private pages.
> 

It seems you are right and I am wrong. This is what the kernel code has to say about it:

	/*
	 * Unlike in truncate_pagecache, unmap_mapping_range is called only
	 * once (before truncating pagecache), and without "even_cows" flag:
	 * hole-punching should not remove private COWed pages from the hole.
	 */

For me this is confusing, but that is what it is. So removing private COWed pages
is only done when we do a setattr(ATTR_SIZE).

>>
>> Just saying I have not followed the above code path
>> (We should have an xfstest for this?)
> 
> I don't know either. It indeed is interesting to figure out what's the
> expected behavior with fallocate() and truncate() for COW pages and cover
> that using xfstest (if not already done).
> 

I could not find any test for the COW-positive FALLOC_FL_PUNCH_HOLE case (I have that bug).
It would be nice to write one and let filesystems like mine fail.
Anyway, very nice catch.

> 
> Thanks
> Vivek
> 

Thanks
Boaz
Dan Williams Aug. 5, 2019, 8:11 p.m. UTC | #5
On Mon, Aug 5, 2019 at 12:17 PM Boaz Harrosh <boaz@plexistor.com> wrote:
>
> On 05/08/2019 21:49, Vivek Goyal wrote:
> > On Mon, Aug 05, 2019 at 02:53:06PM +0300, Boaz Harrosh wrote:
> <>
> >> So as I understand the man page:
> >> fallocate(FL_PUNCH_HOLE); means user is asking to get rid also of COW pages.
> >> On the other way fallocate(FL_ZERO_RANGE) only the pmem portion is zeroed and COW (private pages) stays
> >
> > I tested fallocate(FL_PUNCH_HOLE) on xfs (non-dax) and it does not seem to
> > get rid of COW pages and my test case still can read the data it wrote
> > in private pages.
> >
>
> It seems you are right and I am wrong. This is what the Kernel code has to say about it:
>
>         /*
>          * Unlike in truncate_pagecache, unmap_mapping_range is called only
>          * once (before truncating pagecache), and without "even_cows" flag:
>          * hole-punching should not remove private COWed pages from the hole.
>          */
>
> For me this is confusing but that is what it is. So remove private COWed pages
> is only done when we do an setattr(ATTR_SIZE).
>
> >>
> >> Just saying I have not followed the above code path
> >> (We should have an xfstest for this?)
> >
> > I don't know either. It indeed is interesting to figure out what's the
> > expected behavior with fallocate() and truncate() for COW pages and cover
> > that using xfstest (if not already done).
> >
>
> I could not find any test for the COW positive FL_PUNCH_HOLE (I have that bug)
> could be nice to make one, and let FSs like mine fail.
> Any way very nice catch.
>

Yes, and this bug is worse because it affects COW pages that are not
the direct target of the truncate / hole punch. This unmap in
dax_layout_busy_page() is only there to allow the fs to synchronize
against get_user_pages_fast() which might otherwise race to grab a
page reference and prevent the fs from making forward progress. The
unmap_mapping_range() that addresses COW pages in the truncated range
occurs later after the filesystem has regained control of the extent
layout (i.e. break layouts has succeeded).
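
To make the effect of the last argument concrete, here is a toy userspace
model, not kernel code. The names toy_mapping, toy_pte and toy_unmap_range()
are invented for illustration, and it works on page indices where the real
unmap_mapping_range() takes byte offsets. It keeps the two properties
discussed above: a length of 0 means "to the end of the file", and private
COW copies are dropped only when even_cows is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

/* One PTE slot per file page in a single toy MAP_PRIVATE mapping. */
enum toy_pte { TOY_NONE, TOY_FILE, TOY_COW };

struct toy_mapping {
	enum toy_pte pte[TOY_NPAGES];
};

/*
 * Toy analogue of unmap_mapping_range(): zap PTEs for pages in
 * [first, first + npages); npages == 0 means "to the end of the file".
 * File-backed PTEs are always zapped (a later fault repopulates them).
 * Private COW copies are zapped only when even_cows is true, and their
 * contents cannot be recovered afterwards.
 * Returns the number of COW pages discarded.
 */
size_t toy_unmap_range(struct toy_mapping *m, size_t first, size_t npages,
		       bool even_cows)
{
	size_t end = npages ? first + npages : TOY_NPAGES;
	size_t lost = 0;

	if (end > TOY_NPAGES)
		end = TOY_NPAGES;

	for (size_t i = first; i < end; i++) {
		if (m->pte[i] == TOY_COW) {
			if (!even_cows)
				continue;	/* keep the private copy */
			lost++;
		}
		m->pte[i] = TOY_NONE;
	}
	return lost;
}
```

In this model, toy_unmap_range(&m, 0, 0, true) mirrors the old
unmap_mapping_range(mapping, 0, 0, 1) call and discards every private copy
in the file, while passing false mirrors the patched call and leaves them
in place.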

Patch

Index: rhvgoyal-linux/fs/dax.c
===================================================================
--- rhvgoyal-linux.orig/fs/dax.c	2019-08-01 17:03:10.574675652 -0400
+++ rhvgoyal-linux/fs/dax.c	2019-08-02 14:32:28.809639116 -0400
@@ -600,7 +600,7 @@  struct page *dax_layout_busy_page(struct
 	 * guaranteed to either see new references or prevent new
 	 * references from being established.
 	 */
-	unmap_mapping_range(mapping, 0, 0, 1);
+	unmap_mapping_range(mapping, 0, 0, 0);
 
 	xas_lock_irq(&xas);
 	xas_for_each(&xas, entry, ULONG_MAX) {