[7/8] xfs: Support for transparent PUD pages

Message ID 1450974037-24775-8-git-send-email-matthew.r.wilcox@intel.com (mailing list archive)
State Superseded

Commit Message

Wilcox, Matthew R Dec. 24, 2015, 4:20 p.m. UTC
From: Matthew Wilcox <willy@linux.intel.com>

Call into DAX to provide support for PUD pages, just like the PMD cases.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 fs/xfs/xfs_file.c  | 33 +++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h |  1 +
 2 files changed, 34 insertions(+)

Comments

Dave Chinner Dec. 30, 2015, 11:30 p.m. UTC | #1
On Thu, Dec 24, 2015 at 11:20:36AM -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Call into DAX to provide support for PUD pages, just like the PMD cases.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> ---
>  fs/xfs/xfs_file.c  | 33 +++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h |  1 +
>  2 files changed, 34 insertions(+)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index f5392ab..a81b942 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1600,6 +1600,38 @@ xfs_filemap_pmd_fault(
>  	return ret;
>  }
>  
> +STATIC int
> +xfs_filemap_pud_fault(
> +	struct vm_area_struct	*vma,
> +	unsigned long		addr,
> +	pud_t			*pud,
> +	unsigned int		flags)
> +{
> +	struct inode		*inode = file_inode(vma->vm_file);
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	int			ret;
> +
> +	if (!IS_DAX(inode))
> +		return VM_FAULT_FALLBACK;
> +
> +	trace_xfs_filemap_pud_fault(ip);
> +
> +	if (flags & FAULT_FLAG_WRITE) {
> +		sb_start_pagefault(inode->i_sb);
> +		file_update_time(vma->vm_file);
> +	}
> +
> +	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> +	ret = __dax_pud_fault(vma, addr, pud, flags, xfs_get_blocks_dax_fault,
> +			      NULL);
> +	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> +
> +	if (flags & FAULT_FLAG_WRITE)
> +		sb_end_pagefault(inode->i_sb);
> +
> +	return ret;
> +}
> +
>  /*
>   * pfn_mkwrite was originally inteneded to ensure we capture time stamp
>   * updates on write faults. In reality, it's need to serialise against
> @@ -1637,6 +1669,7 @@ xfs_filemap_pfn_mkwrite(
>  static const struct vm_operations_struct xfs_file_vm_ops = {
>  	.fault		= xfs_filemap_fault,
>  	.pmd_fault	= xfs_filemap_pmd_fault,
> +	.pud_fault	= xfs_filemap_pud_fault,

This is getting silly - we now have 3 different page fault handlers
that all do exactly the same thing. Please abstract this so that the
page/pmd/pud is transparent and gets passed through to the generic
handler code that then handles the differences between page/pmd/pud
internally.

This, after all, is the original reason that the ->fault handler was
introduced....
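
To make the duplication concrete, here is a minimal userspace sketch of one shared wrapper, assuming the handlers differ only in the inner DAX call. Every name in it (sketch_inode, sketch_fault_common, sketch_pud_fault) is hypothetical, and the counters merely stand in for sb_start_pagefault()/sb_end_pagefault(), file_update_time() and the XFS_MMAPLOCK_SHARED lock. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical bookkeeping standing in for superblock freeze protection,
 * the timestamp update and the XFS mmap lock. None of this is kernel API.
 */
struct sketch_inode {
	int pagefault_depth;	/* models sb_start_pagefault/sb_end_pagefault */
	int mmaplock_depth;	/* models xfs_ilock/xfs_iunlock (shared) */
	int time_updates;	/* models file_update_time */
};

/* The only part assumed to differ between the pte, pmd and pud cases. */
typedef int (*sketch_fault_fn)(struct sketch_inode *ip);

/* One wrapper holding the bracket that each handler currently repeats. */
int sketch_fault_common(struct sketch_inode *ip, bool write,
			sketch_fault_fn fn)
{
	int ret;

	if (write) {
		ip->pagefault_depth++;
		ip->time_updates++;
	}

	ip->mmaplock_depth++;
	ret = fn(ip);
	ip->mmaplock_depth--;

	if (write)
		ip->pagefault_depth--;

	return ret;
}

/* Placeholder for the size-specific work; 3 is an arbitrary sketch value. */
int sketch_pud_fault(struct sketch_inode *ip)
{
	assert(ip->mmaplock_depth == 1);	/* runs with the lock held */
	return 3;
}
```

Calling sketch_fault_common(&ino, true, sketch_pud_fault) runs the inner function under the same bracket a pte or pmd variant would use, so the ordering of freeze protection and locking lives in one place.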

Cheers,

Dave.

Matthew Wilcox Jan. 2, 2016, 4:43 p.m. UTC | #2
On Thu, Dec 31, 2015 at 10:30:27AM +1100, Dave Chinner wrote:
> > @@ -1637,6 +1669,7 @@ xfs_filemap_pfn_mkwrite(
> >  static const struct vm_operations_struct xfs_file_vm_ops = {
> >  	.fault		= xfs_filemap_fault,
> >  	.pmd_fault	= xfs_filemap_pmd_fault,
> > +	.pud_fault	= xfs_filemap_pud_fault,
> 
> This is getting silly - we now have 3 different page fault handlers
> that all do exactly the same thing. Please abstract this so that the
> page/pmd/pud is transparent and gets passed through to the generic
> handler code that then handles the differences between page/pmd/pud
> internally.
> 
> This, after all, is the original reason that the ->fault handler was
> introduced....

I agree that it's silly, but this is the direction I was asked to go in by
the MM people at the last MM summit.  There was agreement that this needs
to be abstracted, but that should be left for a separate cleanup round.
I did prototype something I called a vpte (virtual pte), but that's very
much on the back burner for now.

Dave Chinner Jan. 3, 2016, 8:33 p.m. UTC | #3
On Sat, Jan 02, 2016 at 11:43:09AM -0500, Matthew Wilcox wrote:
> On Thu, Dec 31, 2015 at 10:30:27AM +1100, Dave Chinner wrote:
> > > @@ -1637,6 +1669,7 @@ xfs_filemap_pfn_mkwrite(
> > >  static const struct vm_operations_struct xfs_file_vm_ops = {
> > >  	.fault		= xfs_filemap_fault,
> > >  	.pmd_fault	= xfs_filemap_pmd_fault,
> > > +	.pud_fault	= xfs_filemap_pud_fault,
> > 
> > This is getting silly - we now have 3 different page fault handlers
> > that all do exactly the same thing. Please abstract this so that the
> > page/pmd/pud is transparent and gets passed through to the generic
> > handler code that then handles the differences between page/pmd/pud
> > internally.
> > 
> > This, after all, is the original reason that the ->fault handler was
> > introduced....
> 
> I agree that it's silly, but this is the direction I was asked to go in by
> the MM people at the last MM summit.  There was agreement that this needs
> to be abstracted, but that should be left for a separate cleanup round.

Ok, so it's time to abstract it now, before we end up with another
round of broken filesystem code (like the first attempts at the
XFS pmd_fault code).

> I did prototype something I called a vpte (virtual pte), but that's very
> much on the back burner for now.

It's trivial to pack the parameters for pmd_fault and pud_fault
into the struct vm_fault - all you need to do is add pmd_t/pud_t
pointers to the structure, and everything else can be put into
existing members of that structure. There's no need for a "virtual
pte" type anywhere - you can do this effectively with an anonymous
union for the pte/pmd/pud pointer and a flag to indicate the fault
type.

Then in __dax_fault() you can check vmf->flags and call the
appropriate __dax_p{te,md,ud}_fault function, all without the
filesystem having to care about the different fault types. Similar
can be done with filemap_fault() - if it gets pmd/pud fault flags
set it can just reject them as they should never occur right now...
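
The packing described above could look roughly like the following userspace sketch. It is only an illustration under stated assumptions: struct sketch_vm_fault, the SKETCH_FAULT_* flag bits, the return codes and all sketch_* functions are hypothetical stand-ins, not the real struct vm_fault, FAULT_FLAG_* values or DAX entry points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's page-table entry types. */
typedef struct { unsigned long val; } pte_t;
typedef struct { unsigned long val; } pmd_t;
typedef struct { unsigned long val; } pud_t;

/* Hypothetical flag bits naming the fault type; not real kernel flags. */
#define SKETCH_FAULT_WRITE	0x01u
#define SKETCH_FAULT_PMD	0x02u
#define SKETCH_FAULT_PUD	0x04u

/* Return codes used by this sketch only. */
enum {
	SKETCH_HANDLED_PTE = 1,
	SKETCH_HANDLED_PMD,
	SKETCH_HANDLED_PUD,
	SKETCH_FALLBACK,
};

/* One fault descriptor: an anonymous union plus a flag selecting it. */
struct sketch_vm_fault {
	unsigned int	flags;
	unsigned long	address;
	union {
		pte_t	*pte;
		pmd_t	*pmd;
		pud_t	*pud;
	};
};

static int sketch_dax_pte_fault(struct sketch_vm_fault *vmf)
{
	(void)vmf;
	return SKETCH_HANDLED_PTE;
}

static int sketch_dax_pmd_fault(struct sketch_vm_fault *vmf)
{
	assert(vmf->pmd != NULL);
	return SKETCH_HANDLED_PMD;
}

static int sketch_dax_pud_fault(struct sketch_vm_fault *vmf)
{
	assert(vmf->pud != NULL);
	return SKETCH_HANDLED_PUD;
}

/* Generic entry point: the filesystem never looks at the fault type. */
int sketch_dax_fault(struct sketch_vm_fault *vmf)
{
	if (vmf->flags & SKETCH_FAULT_PUD)
		return sketch_dax_pud_fault(vmf);
	if (vmf->flags & SKETCH_FAULT_PMD)
		return sketch_dax_pmd_fault(vmf);
	return sketch_dax_pte_fault(vmf);
}

/* A page-cache style handler simply rejects the large fault types. */
int sketch_filemap_fault(struct sketch_vm_fault *vmf)
{
	if (vmf->flags & (SKETCH_FAULT_PMD | SKETCH_FAULT_PUD))
		return SKETCH_FALLBACK;
	return SKETCH_HANDLED_PTE;
}
```

With this shape a filesystem would register a single handler and forward the descriptor unchanged. Only the generic code branches on the flag, which is the point of the suggestion.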

Cheers,

Dave.

Kirill A . Shutemov Jan. 4, 2016, 8:41 p.m. UTC | #4
On Mon, Jan 04, 2016 at 07:33:56AM +1100, Dave Chinner wrote:
> On Sat, Jan 02, 2016 at 11:43:09AM -0500, Matthew Wilcox wrote:
> > On Thu, Dec 31, 2015 at 10:30:27AM +1100, Dave Chinner wrote:
> > > > @@ -1637,6 +1669,7 @@ xfs_filemap_pfn_mkwrite(
> > > >  static const struct vm_operations_struct xfs_file_vm_ops = {
> > > >  	.fault		= xfs_filemap_fault,
> > > >  	.pmd_fault	= xfs_filemap_pmd_fault,
> > > > +	.pud_fault	= xfs_filemap_pud_fault,
> > > 
> > > This is getting silly - we now have 3 different page fault handlers
> > > that all do exactly the same thing. Please abstract this so that the
> > > page/pmd/pud is transparent and gets passed through to the generic
> > > handler code that then handles the differences between page/pmd/pud
> > > internally.
> > > 
> > > This, after all, is the original reason that the ->fault handler was
> > > introduced....
> > 
> > I agree that it's silly, but this is the direction I was asked to go in by
> > the MM people at the last MM summit.  There was agreement that this needs
> > to be abstracted, but that should be left for a separate cleanup round.
> 
> Ok, so it's time to abstract it now, before we end up with another
> round of broken filesystem code (like the first attempts at the
> XFS pmd_fault code).
> 
> > I did prototype something I called a vpte (virtual pte), but that's very
> > much on the back burner for now.
> 
> It's trivial to pack the parameters for pmd_fault and pud_fault
> into the struct vm_fault - all you need to do is add pmd_t/pud_t
> pointers to the structure, and everything else can be put into
> existing members of that structure. There's no need for a "virtual
> pte" type anywhere - you can do this effectively with an anonymous
> union for the pte/pmd/pud pointer and a flag to indicate the fault
> type.
> 
> Then in __dax_fault() you can check vmf->flags and call the
> appropriate __dax_p{te,md,ud}_fault function, all without the
> filesystem having to care about the different fault types. Similar
> can be done with filemap_fault() - if it gets pmd/pud fault flags
> set it can just reject them as they should never occur right now...

I think the first 4 patches of my hugetmpfs RFD patchset[1] are relevant
here. Looks like it shouldn't be a big deal to extend the approach to
cover DAX case.

[1] http://lkml.kernel.org./r/1447889136-6928-1-git-send-email-kirill.shutemov@linux.intel.com

Patch

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index f5392ab..a81b942 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1600,6 +1600,38 @@  xfs_filemap_pmd_fault(
 	return ret;
 }
 
+STATIC int
+xfs_filemap_pud_fault(
+	struct vm_area_struct	*vma,
+	unsigned long		addr,
+	pud_t			*pud,
+	unsigned int		flags)
+{
+	struct inode		*inode = file_inode(vma->vm_file);
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			ret;
+
+	if (!IS_DAX(inode))
+		return VM_FAULT_FALLBACK;
+
+	trace_xfs_filemap_pud_fault(ip);
+
+	if (flags & FAULT_FLAG_WRITE) {
+		sb_start_pagefault(inode->i_sb);
+		file_update_time(vma->vm_file);
+	}
+
+	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+	ret = __dax_pud_fault(vma, addr, pud, flags, xfs_get_blocks_dax_fault,
+			      NULL);
+	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+
+	if (flags & FAULT_FLAG_WRITE)
+		sb_end_pagefault(inode->i_sb);
+
+	return ret;
+}
+
 /*
  * pfn_mkwrite was originally inteneded to ensure we capture time stamp
  * updates on write faults. In reality, it's need to serialise against
@@ -1637,6 +1669,7 @@  xfs_filemap_pfn_mkwrite(
 static const struct vm_operations_struct xfs_file_vm_ops = {
 	.fault		= xfs_filemap_fault,
 	.pmd_fault	= xfs_filemap_pmd_fault,
+	.pud_fault	= xfs_filemap_pud_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= xfs_filemap_page_mkwrite,
 	.pfn_mkwrite	= xfs_filemap_pfn_mkwrite,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 877079eb..16442bb 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -688,6 +688,7 @@  DEFINE_INODE_EVENT(xfs_inode_free_eofblocks_invalid);
 
 DEFINE_INODE_EVENT(xfs_filemap_fault);
 DEFINE_INODE_EVENT(xfs_filemap_pmd_fault);
+DEFINE_INODE_EVENT(xfs_filemap_pud_fault);
 DEFINE_INODE_EVENT(xfs_filemap_page_mkwrite);
 DEFINE_INODE_EVENT(xfs_filemap_pfn_mkwrite);