[v4,RESEND,1/2] thp, dax: add thp_get_unmapped_area for pmd mappings

Message ID 1472497881-9323-2-git-send-email-toshi.kani@hpe.com (mailing list archive)
State New, archived

Commit Message

Kani, Toshi Aug. 29, 2016, 7:11 p.m. UTC
When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using the pmd page
size.  This feature relies on both the mmap virtual address and the FS
block (i.e. physical address) being aligned to the pmd page size.
Users can use mkfs options to have the FS align block allocations.
Aligning the mmap address, however, requires changing existing
applications to pass a pmd-aligned address to mmap().

For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1].
It calls mmap() with a NULL address, which would need to be changed to
provide a pmd-aligned address for testing with DAX pmd mappings.
Changing every application that calls mmap() with NULL is undesirable.

Add thp_get_unmapped_area(), which a filesystem's get_unmapped_area
handler can call to align an mmap address to the pmd size for a DAX
file.  It calls the default handler, mm->get_unmapped_area(), to find
a range and then aligns it for a DAX file.

The patch is based on Matthew Wilcox's change that allows adding
support of the pud page size easily.

[1]: https://github.com/axboe/fio/blob/master/engines/mmap.c
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/huge_mm.h |    7 +++++++
 mm/huge_memory.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

Comments

Dan Williams Aug. 29, 2016, 7:34 p.m. UTC | #1
On Mon, Aug 29, 2016 at 12:11 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using pmd page
> size.  This feature relies on both mmap virtual address and FS
> block (i.e. physical address) to be aligned by the pmd page size.
> Users can use mkfs options to specify FS to align block allocations.
> However, aligning mmap address requires code changes to existing
> applications for providing a pmd-aligned address to mmap().
>
> For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1].
> It calls mmap() with a NULL address, which needs to be changed to
> provide a pmd-aligned address for testing with DAX pmd mappings.
> Changing all applications that call mmap() with NULL is undesirable.
>
> Add thp_get_unmapped_area(), which can be called by filesystem's
> get_unmapped_area to align an mmap address by the pmd size for
> a DAX file.  It calls the default handler, mm->get_unmapped_area(),
> to find a range and then aligns it for a DAX file.
>
> The patch is based on Matthew Wilcox's change that allows adding
> support of the pud page size easily.
>
> [1]: https://github.com/axboe/fio/blob/master/engines/mmap.c
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> ---

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

...with one minor nit:


>  include/linux/huge_mm.h |    7 +++++++
>  mm/huge_memory.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 6f14de4..4fca526 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -87,6 +87,10 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
>
>  extern unsigned long transparent_hugepage_flags;
>
> +extern unsigned long thp_get_unmapped_area(struct file *filp,
> +               unsigned long addr, unsigned long len, unsigned long pgoff,
> +               unsigned long flags);
> +
>  extern void prep_transhuge_page(struct page *page);
>  extern void free_transhuge_page(struct page *page);
>
> @@ -169,6 +173,9 @@ void put_huge_zero_page(void);
>  static inline void prep_transhuge_page(struct page *page) {}
>
>  #define transparent_hugepage_flags 0UL
> +
> +#define thp_get_unmapped_area  NULL

Let's make this:

static inline unsigned long thp_get_unmapped_area(struct file *filp,
               unsigned long addr, unsigned long len, unsigned long pgoff,
               unsigned long flags)
{
    return 0;
}

...to get some type checking in the CONFIG_TRANSPARENT_HUGEPAGE=n case.
Kani, Toshi Aug. 29, 2016, 8:44 p.m. UTC | #2
On Mon, 2016-08-29 at 12:34 -0700, Dan Williams wrote:
> On Mon, Aug 29, 2016 at 12:11 PM, Toshi Kani <toshi.kani@hpe.com>
> wrote:
> >
> > When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using pmd page
> > size.  This feature relies on both mmap virtual address and FS
> > block (i.e. physical address) to be aligned by the pmd page size.
> > Users can use mkfs options to specify FS to align block
> > allocations. However, aligning mmap address requires code changes
> > to existing applications for providing a pmd-aligned address to
> > mmap().
> >
> > For instance, fio with "ioengine=mmap" performs I/Os with mmap()
> > [1]. It calls mmap() with a NULL address, which needs to be changed
> > to provide a pmd-aligned address for testing with DAX pmd mappings.
> > Changing all applications that call mmap() with NULL is
> > undesirable.
> >
> > Add thp_get_unmapped_area(), which can be called by filesystem's
> > get_unmapped_area to align an mmap address by the pmd size for
> > a DAX file.  It calls the default handler, mm->get_unmapped_area(),
> > to find a range and then aligns it for a DAX file.
> >
> > The patch is based on Matthew Wilcox's change that allows adding
> > support of the pud page size easily.
 :
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>

Great!

> ...with one minor nit:
>
> >
> >  include/linux/huge_mm.h |    7 +++++++
> >  mm/huge_memory.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 6f14de4..4fca526 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -87,6 +87,10 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
> >
> >  extern unsigned long transparent_hugepage_flags;
> >
> > +extern unsigned long thp_get_unmapped_area(struct file *filp,
> > +               unsigned long addr, unsigned long len, unsigned long pgoff,
> > +               unsigned long flags);
> > +
> >  extern void prep_transhuge_page(struct page *page);
> >  extern void free_transhuge_page(struct page *page);
> >
> > @@ -169,6 +173,9 @@ void put_huge_zero_page(void);
> >  static inline void prep_transhuge_page(struct page *page) {}
> >
> >  #define transparent_hugepage_flags 0UL
> > +
> > +#define thp_get_unmapped_area  NULL
>
> Lets make this:
>
> static inline unsigned long thp_get_unmapped_area(struct file *filp,
>                unsigned long addr, unsigned long len, unsigned long pgoff,
>                unsigned long flags)
> {
>     return 0;
> }
>
> ...to get some type checking in the CONFIG_TRANSPARENT_HUGEPAGE=n
> case.

Per get_unmapped_area() in mm/mmap.c, I think we need to set it to NULL
when we do not override current->mm->get_unmapped_area.

Thanks!
-Toshi
Dan Williams Aug. 29, 2016, 8:54 p.m. UTC | #3
On Mon, Aug 29, 2016 at 1:44 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Mon, 2016-08-29 at 12:34 -0700, Dan Williams wrote:
>> On Mon, Aug 29, 2016 at 12:11 PM, Toshi Kani <toshi.kani@hpe.com>
>> wrote:
>> >
>> > When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using pmd page
>> > size.  This feature relies on both mmap virtual address and FS
>> > block (i.e. physical address) to be aligned by the pmd page size.
>> > Users can use mkfs options to specify FS to align block
>> > allocations. However, aligning mmap address requires code changes
>> > to existing applications for providing a pmd-aligned address to
>> > mmap().
>> >
>> > For instance, fio with "ioengine=mmap" performs I/Os with mmap()
>> > [1]. It calls mmap() with a NULL address, which needs to be changed
>> > to provide a pmd-aligned address for testing with DAX pmd mappings.
>> > Changing all applications that call mmap() with NULL is
>> > undesirable.
>> >
>> > Add thp_get_unmapped_area(), which can be called by filesystem's
>> > get_unmapped_area to align an mmap address by the pmd size for
>> > a DAX file.  It calls the default handler, mm->get_unmapped_area(),
>> > to find a range and then aligns it for a DAX file.
>> >
>> > The patch is based on Matthew Wilcox's change that allows adding
>> > support of the pud page size easily.
>  :
>>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>
> Great!
>
>> ...with one minor nit:
>>
>>
>> >
>> >  include/linux/huge_mm.h |    7 +++++++
>> >  mm/huge_memory.c        |   43
>> > +++++++++++++++++++++++++++++++++++++++++++
>> >  2 files changed, 50 insertions(+)
>> >
>> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> > index 6f14de4..4fca526 100644
>> > --- a/include/linux/huge_mm.h
>> > +++ b/include/linux/huge_mm.h
>> > @@ -87,6 +87,10 @@ extern bool is_vma_temporary_stack(struct
>> > vm_area_struct *vma);
>> >
>> >  extern unsigned long transparent_hugepage_flags;
>> >
>> > +extern unsigned long thp_get_unmapped_area(struct file *filp,
>> > +               unsigned long addr, unsigned long len, unsigned
>> > long pgoff,
>> > +               unsigned long flags);
>> > +
>> >  extern void prep_transhuge_page(struct page *page);
>> >  extern void free_transhuge_page(struct page *page);
>> >
>> > @@ -169,6 +173,9 @@ void put_huge_zero_page(void);
>> >  static inline void prep_transhuge_page(struct page *page) {}
>> >
>> >  #define transparent_hugepage_flags 0UL
>> > +
>> > +#define thp_get_unmapped_area  NULL
>>
>> Lets make this:
>>
>> static inline unsigned long thp_get_unmapped_area(struct file *filp,
>>                unsigned long addr, unsigned long len, unsigned long
>> pgoff,
>>                unsigned long flags)
>> {
>>     return 0;
>> }
>>
>> ...to get some type checking in the CONFIG_TRANSPARENT_HUGEPAGE=n
>> case.
>>
>
> Per get_unmapped_area() in mm/mmap.c, I think we need to set it to NULL
> when we do not override current->mm->get_unmapped_area.

Ah, ok.

Patch

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 6f14de4..4fca526 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -87,6 +87,10 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
 
 extern unsigned long transparent_hugepage_flags;
 
+extern unsigned long thp_get_unmapped_area(struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags);
+
 extern void prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 
@@ -169,6 +173,9 @@ void put_huge_zero_page(void);
 static inline void prep_transhuge_page(struct page *page) {}
 
 #define transparent_hugepage_flags 0UL
+
+#define thp_get_unmapped_area	NULL
+
 static inline int
 split_huge_page_to_list(struct page *page, struct list_head *list)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2db2112..883f0ee 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -469,6 +469,49 @@ void prep_transhuge_page(struct page *page)
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
 }
 
+unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
+		loff_t off, unsigned long flags, unsigned long size)
+{
+	unsigned long addr;
+	loff_t off_end = off + len;
+	loff_t off_align = round_up(off, size);
+	unsigned long len_pad;
+
+	if (off_end <= off_align || (off_end - off_align) < size)
+		return 0;
+
+	len_pad = len + size;
+	if (len_pad < len || (off + len_pad) < off)
+		return 0;
+
+	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
+					      off >> PAGE_SHIFT, flags);
+	if (IS_ERR_VALUE(addr))
+		return 0;
+
+	addr += (off - addr) & (size - 1);
+	return addr;
+}
+
+unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
+		unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
+
+	if (addr)
+		goto out;
+	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
+		goto out;
+
+	addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
+	if (addr)
+		return addr;
+
+ out:
+	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+}
+EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
+
 static int __do_huge_pmd_anonymous_page(struct fault_env *fe, struct page *page,
 		gfp_t gfp)
 {