diff mbox

mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notified

Message ID 1441102961-68041-1-git-send-email-kirill.shutemov@linux.intel.com (mailing list archive)
State Superseded
Headers show

Commit Message

kirill.shutemov@linux.intel.com Sept. 1, 2015, 10:22 a.m. UTC
For VM_PFNMAP and VM_MIXEDMAP we use vm_ops->pfn_mkwrite instead of
vm_ops->page_mkwrite to notify abort write access. This means we want
vma->vm_page_prot to be write-protected if the VMA provides this vm_ops.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yigal Korman <yigal@plexistor.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
---
 mm/mmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Boaz Harrosh Sept. 1, 2015, 10:54 a.m. UTC | #1
On 09/01/2015 01:22 PM, Kirill A. Shutemov wrote:
> For VM_PFNMAP and VM_MIXEDMAP we use vm_ops->pfn_mkwrite instead of
> vm_ops->page_mkwrite to notify abort write access. This means we want
> vma->vm_page_prot to be write-protected if the VMA provides this vm_ops.
> 

Hi Kirill

I will test with this right away and ACK on this.

Hmm so are you saying we might be missing some buffer modifications right now.

What would be a theoretical scenario that will cause these missed events?
I would like to put a test in our test rigs that should fail today and this
patch fixes.

[In our system every modified pmem block is also RDMAed to a remote
 pmem for HA, a missed modification will make the two copies unsynced]

Thanks for catching this
Boaz

> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Yigal Korman <yigal@plexistor.com>
> Cc: Boaz Harrosh <boaz@plexistor.com>
> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <david@fromorbit.com>
> ---
>  mm/mmap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index df6d5f07035b..3f78bceefe5a 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1498,7 +1498,8 @@ int vma_wants_writenotify(struct vm_area_struct *vma)
>  		return 0;
>  
>  	/* The backer wishes to know when pages are first written to? */
> -	if (vma->vm_ops && vma->vm_ops->page_mkwrite)
> +	if (vma->vm_ops &&
> +			(vma->vm_ops->page_mkwrite || vma->vm_ops->pfn_mkwrite))
>  		return 1;
>  
>  	/* The open routine did something to the protections that pgprot_modify
>
Kirill A . Shutemov Sept. 1, 2015, 11:10 a.m. UTC | #2
On Tue, Sep 01, 2015 at 01:54:26PM +0300, Boaz Harrosh wrote:
> On 09/01/2015 01:22 PM, Kirill A. Shutemov wrote:
> > For VM_PFNMAP and VM_MIXEDMAP we use vm_ops->pfn_mkwrite instead of
> > vm_ops->page_mkwrite to notify abort write access. This means we want
> > vma->vm_page_prot to be write-protected if the VMA provides this vm_ops.
> > 
> 
> Hi Kirill
> 
> I will test with this right away and ACK on this.
> 
> Hmm so are you saying we might be missing some buffer modifications right now.
> 
> What would be a theoretical scenario that will cause these missed events?

On writable mapping with vm_ops->pfn_mkwrite, but without
vm_ops->page_mkwrite: read fault followed by write access to the pfn.
Writable pte will be set up on read fault and write fault will not be
generated.

I found it examining Dave's complain on generic/080:

http://lkml.kernel.org/g/20150831233803.GO3902@dastard

Although I don't think it's the reason.

> I would like to put a test in our test rigs that should fail today and this
> patch fixes.
> 
> [In our system every modified pmem block is also RDMAed to a remote
>  pmem for HA, a missed modification will make the two copies unsynced]

It shouldn't be a problem for ext2/ext4 as they provide both pfn_mkwrite
and page_mkwrite.
Boaz Harrosh Sept. 1, 2015, 11:21 a.m. UTC | #3
On 09/01/2015 02:10 PM, Kirill A. Shutemov wrote:
> On Tue, Sep 01, 2015 at 01:54:26PM +0300, Boaz Harrosh wrote:
>> On 09/01/2015 01:22 PM, Kirill A. Shutemov wrote:
>>> For VM_PFNMAP and VM_MIXEDMAP we use vm_ops->pfn_mkwrite instead of
>>> vm_ops->page_mkwrite to notify abort write access. This means we want
>>> vma->vm_page_prot to be write-protected if the VMA provides this vm_ops.
>>>
>>
>> Hi Kirill
>>
>> I will test with this right away and ACK on this.
>>
>> Hmm so are you saying we might be missing some buffer modifications right now.
>>
>> What would be a theoretical scenario that will cause these missed events?
> 
> On writable mapping with vm_ops->pfn_mkwrite, but without
> vm_ops->page_mkwrite: read fault followed by write access to the pfn.
> Writable pte will be set up on read fault and write fault will not be
> generated.
> 
> I found it examining Dave's complain on generic/080:
> 
> http://lkml.kernel.org/g/20150831233803.GO3902@dastard
> 
> Although I don't think it's the reason.
> 
>> I would like to put a test in our test rigs that should fail today and this
>> patch fixes.
>>
>> [In our system every modified pmem block is also RDMAed to a remote
>>  pmem for HA, a missed modification will make the two copies unsynced]
> 
> It shouldn't be a problem for ext2/ext4 as they provide both pfn_mkwrite
> and page_mkwrite.
> 

Ha right we have both as well, and so should xfs I think (because of the
zero pages thing, in fact any dax.c user should).

Thanks so this verifies why we could not see any such breakage.

ACK-by: Boaz Harrosh <boaz@plexistor.com>
diff mbox

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index df6d5f07035b..3f78bceefe5a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1498,7 +1498,8 @@  int vma_wants_writenotify(struct vm_area_struct *vma)
 		return 0;
 
 	/* The backer wishes to know when pages are first written to? */
-	if (vma->vm_ops && vma->vm_ops->page_mkwrite)
+	if (vma->vm_ops &&
+			(vma->vm_ops->page_mkwrite || vma->vm_ops->pfn_mkwrite))
 		return 1;
 
 	/* The open routine did something to the protections that pgprot_modify