diff mbox

[3/3,v2] dax: use pfn_mkwrite to update c/mtime + freeze protection

Message ID 54F820E2.9060109@plexistor.com (mailing list archive)
State New, archived
Headers show

Commit Message

Boaz Harrosh March 5, 2015, 9:24 a.m. UTC
From: Yigal Korman <yigal@plexistor.com>

[v1]
Without this patch, c/mtime is not updated correctly when mmap'ed page is
first read from and then written to.

A new xfstest is submitted for testing this (generic/080)

[v2]
Jan Kara has pointed out that if we add the
sb_start/end_pagefault pair in the new pfn_mkwrite we
are then fixing another bug where: A user could start
writing to the page while filesystem is frozen.

CC: Jan Kara <jack@suse.cz>
Signed-off-by: Yigal Korman <yigal@plexistor.com>
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
 fs/dax.c           | 17 +++++++++++++++++
 fs/ext2/file.c     |  1 +
 fs/ext4/file.c     |  1 +
 include/linux/fs.h |  1 +
 4 files changed, 20 insertions(+)

Comments

Boaz Harrosh March 5, 2015, 9:32 a.m. UTC | #1
On 03/05/2015 11:24 AM, Boaz Harrosh wrote:
> 
> [v1]
> Without this patch, c/mtime is not updated correctly when mmap'ed page is
> first read from and then written to.
> 
> A new xfstest is submitted for testing this (generic/080)
> 
> [v2]
> Jan Kara has pointed out that if we add the
> sb_start/end_pagefault pair in the new pfn_mkwrite we
> are then fixing another bug where: A user could start
> writing to the page while filesystem is frozen.
> 

Thanks Jan.

Just as curiosity, does the freezing code goes and turns all mappings
into read-only, Also for pfn mapping?

Do you think there is already an xfstest freezing test that should now
fail, and will succeed after this patch (v2). Something like:
  * mmap-read/write before the freeze
  * freeze the fs
  * Another thread tries to mmap-write, should get stuck
  * unfreeze the fs
  * Now mmap-writer continues

Thanks again
Boaz

> CC: Jan Kara <jack@suse.cz>
> Signed-off-by: Yigal Korman <yigal@plexistor.com>
> Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
<>
Jan Kara March 5, 2015, 10:35 a.m. UTC | #2
On Thu 05-03-15 11:32:25, Boaz Harrosh wrote:
> On 03/05/2015 11:24 AM, Boaz Harrosh wrote:
> > 
> > [v1]
> > Without this patch, c/mtime is not updated correctly when mmap'ed page is
> > first read from and then written to.
> > 
> > A new xfstest is submitted for testing this (generic/080)
> > 
> > [v2]
> > Jan Kara has pointed out that if we add the
> > sb_start/end_pagefault pair in the new pfn_mkwrite we
> > are then fixing another bug where: A user could start
> > writing to the page while filesystem is frozen.
> > 
> 
> Thanks Jan.
> 
> Just as curiosity, does the freezing code goes and turns all mappings
> into read-only, Also for pfn mapping?
  Hum, that's a good question. Probably we don't end up doing that. For
normal filesystems we sync all inodes which also writeprotects all pages
(in clear_page_dirty_for_io() - for normal filesystems we know that if page
is writeably mapped it is dirty). However this won't happen for pfn
mapping as we don't have dirty pages. So we probably need dax_freeze()
implementation that will walk through all inodes with writeable mappings and
writeprotect them.

> Do you think there is already an xfstest freezing test that should now
> fail, and will succeed after this patch (v2). Something like:
>   * mmap-read/write before the freeze
>   * freeze the fs
>   * Another thread tries to mmap-write, should get stuck
>   * unfreeze the fs
>   * Now mmap-writer continues
  I don't remember there would be any test to specifically test this.

								Honza
Boaz Harrosh March 5, 2015, 10:47 a.m. UTC | #3
On 03/05/2015 12:35 PM, Jan Kara wrote:
> On Thu 05-03-15 11:32:25, Boaz Harrosh wrote:
>> On 03/05/2015 11:24 AM, Boaz Harrosh wrote:
<>
>>
>> Just as curiosity, does the freezing code goes and turns all mappings
>> into read-only, Also for pfn mapping?
>   Hum, that's a good question. Probably we don't end up doing that. For
>
> normal filesystems we sync all inodes which also writeprotects all pages
> (in clear_page_dirty_for_io() - for normal filesystems we know that if page
> is writeably mapped it is dirty). However this won't happen for pfn
> mapping as we don't have dirty pages. So we probably need dax_freeze()
> implementation that will walk through all inodes with writeable mappings and
> writeprotect them.
> 

I'll go head and try my shot on implementing a dax_freeze(). But I will
please need help with where to call it from.

Probably something like:
	if (IS_DAX(inode))
		dax_freeze(inode);
	else
		sync(inode)

So to share the for-all-inodes-in-sb loop. And also the IS_DAX(inode)
is per inode, for example dirs need the regular sync (if they are using page cache)

>> Do you think there is already an xfstest freezing test that should now
>> fail, and will succeed after this patch (v2). Something like:
>>   * mmap-read/write before the freeze
>>   * freeze the fs
>>   * Another thread tries to mmap-write, should get stuck
>>   * unfreeze the fs
>>   * Now mmap-writer continues
>   I don't remember there would be any test to specifically test this.
> 

OK Thanks, I was hopping we should already test for mmap vs freeze. This
is not special for our case. Actually mmap is the most fragile access.

> 								Honza
> 

Thanks
Boaz
Jan Kara March 5, 2015, 10:56 a.m. UTC | #4
On Thu 05-03-15 12:47:30, Boaz Harrosh wrote:
> On 03/05/2015 12:35 PM, Jan Kara wrote:
> > On Thu 05-03-15 11:32:25, Boaz Harrosh wrote:
> >> On 03/05/2015 11:24 AM, Boaz Harrosh wrote:
> <>
> >>
> >> Just as curiosity, does the freezing code goes and turns all mappings
> >> into read-only, Also for pfn mapping?
> >   Hum, that's a good question. Probably we don't end up doing that. For
> >
> > normal filesystems we sync all inodes which also writeprotects all pages
> > (in clear_page_dirty_for_io() - for normal filesystems we know that if page
> > is writeably mapped it is dirty). However this won't happen for pfn
> > mapping as we don't have dirty pages. So we probably need dax_freeze()
> > implementation that will walk through all inodes with writeable mappings and
> > writeprotect them.
> > 
> 
> I'll go head and try my shot on implementing a dax_freeze(). But I will
> please need help with where to call it from.
> 
> Probably something like:
> 	if (IS_DAX(inode))
> 		dax_freeze(inode);
> 	else
> 		sync(inode)
  We normally call sync_filesystem() from fs/superblock:freeze_super(). For
DAX filesystems we'd also need to call the special function after that.
Maybe dax_freeze() isn't the best name. It could be called something like
dax_writeprotect(sb) or something like that.

								Honza
diff mbox

Patch

diff --git a/fs/dax.c b/fs/dax.c
index ed1619e..d0bd1f4 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -464,6 +464,23 @@  int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 EXPORT_SYMBOL_GPL(dax_fault);
 
 /**
+ * dax_pfn_mkwrite - handle first write to DAX page
+ * @vma: The virtual memory area where the fault occurred
+ * @vmf: The description of the fault
+ *
+ */
+int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	struct super_block *sb = file_inode(vma->vm_file)->i_sb;
+
+	sb_start_pagefault(sb);
+	file_update_time(vma->vm_file);
+	sb_end_pagefault(sb);
+	return VM_FAULT_NOPAGE;
+}
+EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
+
+/**
  * dax_zero_page_range - zero a range within a page of a DAX file
  * @inode: The file being truncated
  * @from: The file offset that is being truncated to
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index e317017..866a3ce 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -39,6 +39,7 @@  static int ext2_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 static const struct vm_operations_struct ext2_dax_vm_ops = {
 	.fault		= ext2_dax_fault,
 	.page_mkwrite	= ext2_dax_mkwrite,
+	.pfn_mkwrite	= dax_pfn_mkwrite,
 };
 
 static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 33a09da..b43a7a6 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -206,6 +206,7 @@  static int ext4_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 static const struct vm_operations_struct ext4_dax_vm_ops = {
 	.fault		= ext4_dax_fault,
 	.page_mkwrite	= ext4_dax_mkwrite,
+	.pfn_mkwrite	= dax_pfn_mkwrite,
 };
 #else
 #define ext4_dax_vm_ops	ext4_file_vm_ops
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b4d71b5..24af817 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2597,6 +2597,7 @@  int dax_clear_blocks(struct inode *, sector_t block, long size);
 int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
 int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
+int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *);
 #define dax_mkwrite(vma, vmf, gb)	dax_fault(vma, vmf, gb)
 
 #ifdef CONFIG_BLOCK