Message ID | 20160915115523.29737-13-kirill.shutemov@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
On Thu 15-09-16 14:54:54, Kirill A. Shutemov wrote:
> For filesystems that want to be write-notified (have mkwrite), we will
> encounter write-protection faults for huge PMDs in shared mappings.
>
> The easiest way to handle them is to clear the PMD and let it refault as
> writable.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/memory.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 83be99d9d8a1..aad8d5c6311f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3451,8 +3451,17 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
>  		return fe->vma->vm_ops->pmd_fault(fe->vma, fe->address, fe->pmd,
>  				fe->flags);
>
> +	if (fe->vma->vm_flags & VM_SHARED) {
> +		/* Clear PMD */
> +		zap_page_range_single(fe->vma, fe->address,
> +				HPAGE_PMD_SIZE, NULL);
> +		VM_BUG_ON(!pmd_none(*fe->pmd));
> +
> +		/* Refault to establish writable PMD */
> +		return 0;
> +	}
> +

Since we want to write-protect the page table entry on each page writeback
and write-enable it on the next write, this is relatively expensive.
Would it be that complicated to handle this fully in the ->pmd_fault handler
like we do for DAX?

Maybe it doesn't have to be done now, but longer term I guess it might make
sense.

Otherwise the patch looks good, so feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
On Tue, Oct 11, 2016 at 05:47:50PM +0200, Jan Kara wrote:
> On Thu 15-09-16 14:54:54, Kirill A. Shutemov wrote:
> > For filesystems that want to be write-notified (have mkwrite), we will
> > encounter write-protection faults for huge PMDs in shared mappings.
> >
> > The easiest way to handle them is to clear the PMD and let it refault as
> > writable.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  mm/memory.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 83be99d9d8a1..aad8d5c6311f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3451,8 +3451,17 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
> >  		return fe->vma->vm_ops->pmd_fault(fe->vma, fe->address, fe->pmd,
> >  				fe->flags);
> >
> > +	if (fe->vma->vm_flags & VM_SHARED) {
> > +		/* Clear PMD */
> > +		zap_page_range_single(fe->vma, fe->address,
> > +				HPAGE_PMD_SIZE, NULL);
> > +		VM_BUG_ON(!pmd_none(*fe->pmd));
> > +
> > +		/* Refault to establish writable PMD */
> > +		return 0;
> > +	}
> > +
>
> Since we want to write-protect the page table entry on each page writeback
> and write-enable it on the next write, this is relatively expensive.
> Would it be that complicated to handle this fully in the ->pmd_fault handler
> like we do for DAX?
>
> Maybe it doesn't have to be done now, but longer term I guess it might make
> sense.

Right. This approach is just simpler to implement. We can rework it if it
shows up in traces.

> Otherwise the patch looks good, so feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>

Thanks!
diff --git a/mm/memory.c b/mm/memory.c
index 83be99d9d8a1..aad8d5c6311f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3451,8 +3451,17 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
 		return fe->vma->vm_ops->pmd_fault(fe->vma, fe->address, fe->pmd,
 				fe->flags);
 
+	if (fe->vma->vm_flags & VM_SHARED) {
+		/* Clear PMD */
+		zap_page_range_single(fe->vma, fe->address,
+				HPAGE_PMD_SIZE, NULL);
+		VM_BUG_ON(!pmd_none(*fe->pmd));
+
+		/* Refault to establish writable PMD */
+		return 0;
+	}
+
 	/* COW handled on pte level: split pmd */
-	VM_BUG_ON_VMA(fe->vma->vm_flags & VM_SHARED, fe->vma);
 	split_huge_pmd(fe->vma, fe->pmd, fe->address);
 
 	return VM_FAULT_FALLBACK;
For filesystems that want to be write-notified (have mkwrite), we will
encounter write-protection faults for huge PMDs in shared mappings.

The easiest way to handle them is to clear the PMD and let it refault as
writable.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
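
The clear-and-refault idea in the commit message, and the cost raised in review, can be illustrated with a small userspace toy model. This is only a sketch and not kernel code: `toy_mapping`, `toy_fault`, `toy_write` and `toy_writeback` are hypothetical names invented for illustration, the model tracks a single huge PMD in a shared mapping, and it assumes that writeback write-protects the PMD and that a write fault on an empty PMD delivers the write notification and maps the page writable.

```c
/* Userspace toy model of the approach; every name here is hypothetical. */
#include <stdbool.h>

enum pmd_state { PMD_NONE, PMD_READONLY, PMD_WRITABLE };

struct toy_mapping {
	enum pmd_state pmd;	/* state of the one huge PMD being modelled */
	int faults;		/* total faults taken so far */
	int mkwrite_calls;	/* write notifications seen by the "filesystem" */
};

/* One trip through the fault path for a shared mapping. */
static void toy_fault(struct toy_mapping *m, bool write)
{
	m->faults++;

	if (m->pmd == PMD_NONE) {
		/* Not-present fault: map the huge page. */
		if (write) {
			m->mkwrite_calls++;	/* write notification */
			m->pmd = PMD_WRITABLE;
		} else {
			m->pmd = PMD_READONLY;
		}
		return;
	}

	if (write && m->pmd == PMD_READONLY) {
		/*
		 * Write-protection fault: clear the PMD and return,
		 * so that the retried access takes the path above.
		 */
		m->pmd = PMD_NONE;
	}
}

/* Retry a write until it succeeds; returns the number of faults it took. */
static int toy_write(struct toy_mapping *m)
{
	int before = m->faults;

	while (m->pmd != PMD_WRITABLE)
		toy_fault(m, true);

	return m->faults - before;
}

/* Assumed writeback behaviour: write-protect the PMD again. */
static void toy_writeback(struct toy_mapping *m)
{
	if (m->pmd == PMD_WRITABLE)
		m->pmd = PMD_READONLY;
}
```

In this model a write to an unmapped PMD takes one fault, while the first write after each `toy_writeback()` takes two: one to clear the write-protected PMD and one to re-establish it as writable. That extra fault per writeback cycle is the overhead that handling the case directly in a `->pmd_fault` handler would presumably avoid.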