diff mbox series

[v2] mm: Introduce new function vm_insert_kmem_page

Message ID 20181003185854.GA1174@jordon-HP-15-Notebook-PC (mailing list archive)
State New, archived
Headers show
Series [v2] mm: Introduce new function vm_insert_kmem_page | expand

Commit Message

Souptick Joarder Oct. 3, 2018, 6:58 p.m. UTC
vm_insert_kmem_page is similar to vm_insert_page and will
be used by drivers to map kernel memory (kmalloc/vmalloc/page
allocations) into a user vma.

Going forward, the plan is to restrict future drivers from
using vm_insert_page in #PF (page fault handler) context
( *it would otherwise reintroduce the errno to VM_FAULT_CODE
mapping code for new drivers, which has already been cleaned
up for existing drivers* ) and have them use vmf_insert_page,
which returns a VM_FAULT_CODE. That is not possible until both
the vm_insert_page and vmf_insert_page APIs exist.

But there are some consumers of vm_insert_page which use it
outside #PF context. A straightforward conversion of
vm_insert_page to vmf_insert_page won't work there, as those
call sites expect an errno, not a vm_fault_t, in return.

These are the approaches which could have been taken to handle
this scenario -

*  Replace vm_insert_page with vmf_insert_page and then write a few
   extra lines of code to convert the VM_FAULT_CODE back to an errno,
   which makes the driver call sites more complex ( also, the reverse
   errno to VM_FAULT_CODE mappings have been cleaned up as part of the
   vm_fault_t migration, so introducing anything similar again is not
   preferred )

*  Maintain both vm_insert_page and vmf_insert_page and use each in
   its respective place. But that won't guarantee that vm_insert_page
   will never be used in #PF context.

*  Introduce an API similar to vm_insert_page, convert all non-#PF
   consumers to use it, and finally remove vm_insert_page by converting
   it to vmf_insert_page.

And the 3rd approach was taken by introducing vm_insert_kmem_page().

In short, vmf_insert_page will be used in page fault handler
context, and vm_insert_kmem_page will be used to map kernel
memory into a user vma outside page fault handler context.

A few drivers are converted to use vm_insert_kmem_page(). This
allows both reviewing the API and verifying that it serves its
purpose. Other consumers of vm_insert_page ( *used in non-#PF
context* ) will be replaced by vm_insert_kmem_page, but in
separate patches.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
---
v2: Few non #PF consumers of vm_insert_page are converted
    to use vm_insert_kmem_page in patch v2.

    Updated the change log.
    
 arch/arm/mm/dma-mapping.c                   |  2 +-
 drivers/auxdisplay/cfag12864bfb.c           |  2 +-
 drivers/auxdisplay/ht16k33.c                |  2 +-
 drivers/firewire/core-iso.c                 |  2 +-
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c |  2 +-
 include/linux/mm.h                          |  2 +
 kernel/kcov.c                               |  4 +-
 mm/memory.c                                 | 69 +++++++++++++++++++++++++++++
 mm/nommu.c                                  |  7 +++
 mm/vmalloc.c                                |  2 +-
 10 files changed, 86 insertions(+), 8 deletions(-)

Comments

Miguel Ojeda Oct. 3, 2018, 7:58 p.m. UTC | #1
Hi Souptick,

On Wed, Oct 3, 2018 at 8:55 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
>
> vm_insert_kmem_page is similar to vm_insert_page and will
> be used by drivers to map kernel (kmalloc/vmalloc/pages)
> allocated memory to user vma.
>
> Going forward, the plan is to restrict future drivers not
> to use vm_insert_page ( *it will generate new errno to
> VM_FAULT_CODE mapping code for new drivers which were already
> cleaned up for existing drivers*) in #PF (page fault handler)
> context but to make use of vmf_insert_page which returns
> VMF_FAULT_CODE and that is not possible until both vm_insert_page
> and vmf_insert_page API exists.
>
> But there are some consumers of vm_insert_page which use it
> outside #PF context. straight forward conversion of vm_insert_page
> to vmf_insert_page won't work there as those function calls expects
> errno not vm_fault_t in return.
>
> These are the approaches which could have been taken to handle
> this scenario -
>
> *  Replace vm_insert_page with vmf_insert_page and then write few
>    extra lines of code to convert VM_FAULT_CODE to errno which
>    makes driver users more complex ( also the reverse mapping errno to
>    VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
>    not preferred to introduce anything similar again)
>
> *  Maintain both vm_insert_page and vmf_insert_page and use it in
>    respective places. But it won't gurantee that vm_insert_page will
>    never be used in #PF context.
>
> *  Introduce a similar API like vm_insert_page, convert all non #PF
>    consumer to use it and finally remove vm_insert_page by converting
>    it to vmf_insert_page.
>
> And the 3rd approach was taken by introducing vm_insert_kmem_page().

This looks better than the previous one of adding non-trivial code to
each driver, thank you!

A couple of comments below.

>
> In short, vmf_insert_page will be used in page fault handlers
> context and vm_insert_kmem_page will be used to map kernel
> memory to user vma outside page fault handlers context.
>
> Few drivers are converted to use vm_insert_kmem_page(). This will
> allow both to review the api and that it serves it purpose. other
> consumers of vm_insert_page (*used in non #PF context*) will be
> replaced by vm_insert_kmem_page, but in separate patches.
>

other -> Other

Also, as far as I can see, there are only a few vm_insert_page users
remaining. With the new function, they should be trivial to convert,
no? Therefore, could we do them all in one go, possibly in a patch
series?

Or, maybe, even better: wait until you remove the vm_* functions and
simply reuse vm_insert_page for this -- that way you don't need a new
name and you don't have to change any of the last users (I mean the
drivers using it outside the page fault handlers).

> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
> ---
> v2: Few non #PF consumers of vm_insert_page are converted
>     to use vm_insert_kmem_page in patch v2.
>
>     Updated the change log.
>
>  arch/arm/mm/dma-mapping.c                   |  2 +-
>  drivers/auxdisplay/cfag12864bfb.c           |  2 +-
>  drivers/auxdisplay/ht16k33.c                |  2 +-
>  drivers/firewire/core-iso.c                 |  2 +-
>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c |  2 +-
>  include/linux/mm.h                          |  2 +
>  kernel/kcov.c                               |  4 +-
>  mm/memory.c                                 | 69 +++++++++++++++++++++++++++++
>  mm/nommu.c                                  |  7 +++
>  mm/vmalloc.c                                |  2 +-
>  10 files changed, 86 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6656647..58d7971 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1598,7 +1598,7 @@ static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma
>         pages += off;
>
>         do {
> -               int ret = vm_insert_page(vma, uaddr, *pages++);
> +               int ret = vm_insert_kmem_page(vma, uaddr, *pages++);
>                 if (ret) {
>                         pr_err("Remapping memory failed: %d\n", ret);
>                         return ret;
> diff --git a/drivers/auxdisplay/cfag12864bfb.c b/drivers/auxdisplay/cfag12864bfb.c
> index 40c8a55..82fd627 100644
> --- a/drivers/auxdisplay/cfag12864bfb.c
> +++ b/drivers/auxdisplay/cfag12864bfb.c
> @@ -52,7 +52,7 @@
>
>  static int cfag12864bfb_mmap(struct fb_info *info, struct vm_area_struct *vma)
>  {
> -       return vm_insert_page(vma, vma->vm_start,
> +       return vm_insert_kmem_page(vma, vma->vm_start,
>                 virt_to_page(cfag12864b_buffer));
>  }
>
> diff --git a/drivers/auxdisplay/ht16k33.c b/drivers/auxdisplay/ht16k33.c
> index a43276c..64de30b 100644
> --- a/drivers/auxdisplay/ht16k33.c
> +++ b/drivers/auxdisplay/ht16k33.c
> @@ -224,7 +224,7 @@ static int ht16k33_mmap(struct fb_info *info, struct vm_area_struct *vma)
>  {
>         struct ht16k33_priv *priv = info->par;
>
> -       return vm_insert_page(vma, vma->vm_start,
> +       return vm_insert_kmem_page(vma, vma->vm_start,
>                               virt_to_page(priv->fbdev.buffer));
>  }
>
> diff --git a/drivers/firewire/core-iso.c b/drivers/firewire/core-iso.c
> index 051327a..5f1548d 100644
> --- a/drivers/firewire/core-iso.c
> +++ b/drivers/firewire/core-iso.c
> @@ -112,7 +112,7 @@ int fw_iso_buffer_map_vma(struct fw_iso_buffer *buffer,
>
>         uaddr = vma->vm_start;
>         for (i = 0; i < buffer->page_count; i++) {
> -               err = vm_insert_page(vma, uaddr, buffer->pages[i]);
> +               err = vm_insert_kmem_page(vma, uaddr, buffer->pages[i]);
>                 if (err)
>                         return err;
>
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
> index a8db758..57eb7af 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
> @@ -234,7 +234,7 @@ static int rockchip_drm_gem_object_mmap_iommu(struct drm_gem_object *obj,
>                 return -ENXIO;
>
>         for (i = offset; i < end; i++) {
> -               ret = vm_insert_page(vma, uaddr, rk_obj->pages[i]);
> +               ret = vm_insert_kmem_page(vma, uaddr, rk_obj->pages[i]);
>                 if (ret)
>                         return ret;
>                 uaddr += PAGE_SIZE;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a61ebe8..5f42d35 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2477,6 +2477,8 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
>  struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
>  int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
>                         unsigned long pfn, unsigned long size, pgprot_t);
> +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> +                               struct page *page);
>  int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
>  int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>                         unsigned long pfn);
> diff --git a/kernel/kcov.c b/kernel/kcov.c
> index 3ebd09e..2afaeb4 100644
> --- a/kernel/kcov.c
> +++ b/kernel/kcov.c
> @@ -293,8 +293,8 @@ static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
>                 spin_unlock(&kcov->lock);
>                 for (off = 0; off < size; off += PAGE_SIZE) {
>                         page = vmalloc_to_page(kcov->area + off);
> -                       if (vm_insert_page(vma, vma->vm_start + off, page))
> -                               WARN_ONCE(1, "vm_insert_page() failed");
> +                       if (vm_insert_kmem_page(vma, vma->vm_start + off, page))
> +                               WARN_ONCE(1, "vm_insert_kmem_page() failed");
>                 }
>                 return 0;
>         }
> diff --git a/mm/memory.c b/mm/memory.c
> index c467102..b800c10 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1682,6 +1682,75 @@ pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
>         return pte_alloc_map_lock(mm, pmd, addr, ptl);
>  }
>
> +static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> +               struct page *page, pgprot_t prot)
> +{
> +       struct mm_struct *mm = vma->vm_mm;
> +       int retval;
> +       pte_t *pte;
> +       spinlock_t *ptl;
> +
> +       retval = -EINVAL;
> +       if (PageAnon(page))
> +               goto out;
> +       retval = -ENOMEM;
> +       flush_dcache_page(page);
> +       pte = get_locked_pte(mm, addr, &ptl);
> +       if (!pte)
> +               goto out;
> +       retval = -EBUSY;
> +       if (!pte_none(*pte))
> +               goto out_unlock;
> +
> +       get_page(page);
> +       inc_mm_counter_fast(mm, mm_counter_file(page));
> +       page_add_file_rmap(page, false);
> +       set_pte_at(mm, addr, pte, mk_pte(page, prot));
> +
> +       retval = 0;
> +       pte_unmap_unlock(pte, ptl);
> +       return retval;
> +out_unlock:
> +       pte_unmap_unlock(pte, ptl);
> +out:
> +       return retval;
> +}
> +
> +/**
> + * vm_insert_kmem_page - insert single page into user vma
> + * @vma: user vma to map to
> + * @addr: target user address of this page
> + * @page: source kernel page
> + *
> + * This allows drivers to insert individual kernel memory into a user vma.
> + * This API should be used outside page fault handlers context.
> + *
> + * Previously the same has been done with vm_insert_page by drivers. But
> + * vm_insert_page will be converted to vmf_insert_page and will be used
> + * in fault handlers context and return type of vmf_insert_page will be
> + * vm_fault_t type.

This is a "temporal" comment, i.e. it refers to things that are
happening at the moment -- I would say that should be part of the
commit message, not the code, since it will be obsolete soon. Also,
consider that, in a way, vm_insert_page is actually being replaced by
vmf_insert_page only in one of the use cases (the other being replaced
by this). Maybe you could instead say something like:

    In the past, vm_insert_page was used for this purpose. Do not use
vmf_insert_page because...

and leave the full explanation in the commit.

> + *
> + * But there are places where drivers need to map kernel memory into user
> + * vma outside fault handlers context. As vmf_insert_page will be restricted
> + * to use within page fault handlers, vm_insert_kmem_page could be used
> + * to map kernel memory to user vma outside fault handlers context.
> + */

Ditto.

> +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> +                       struct page *page)
> +{
> +       if (addr < vma->vm_start || addr >= vma->vm_end)
> +               return -EFAULT;
> +       if (!page_count(page))
> +               return -EINVAL;
> +       if (!(vma->vm_flags & VM_MIXEDMAP)) {
> +               BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
> +               BUG_ON(vma->vm_flags & VM_PFNMAP);
> +               vma->vm_flags |= VM_MIXEDMAP;
> +       }
> +       return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
> +}
> +EXPORT_SYMBOL(vm_insert_kmem_page);
> +
>  /*
>   * This is the old fallback for page remapping.
>   *
> diff --git a/mm/nommu.c b/mm/nommu.c
> index e4aac33..153b8c8 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -473,6 +473,13 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
>  }
>  EXPORT_SYMBOL(vm_insert_page);
>
> +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> +                               struct page *page)
> +{
> +       return -EINVAL;
> +}
> +EXPORT_SYMBOL(vm_insert_kmem_page);
> +
>  /*
>   *  sys_brk() for the most part doesn't need the global kernel
>   *  lock, except when an application is doing something nasty
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a728fc4..61d279f 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2251,7 +2251,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
>                 struct page *page = vmalloc_to_page(kaddr);
>                 int ret;
>
> -               ret = vm_insert_page(vma, uaddr, page);
> +               ret = vm_insert_kmem_page(vma, uaddr, page);
>                 if (ret)
>                         return ret;
>
> --
> 1.9.1
>

Cheers,
Miguel
Matthew Wilcox Oct. 3, 2018, 8 p.m. UTC | #2
On Thu, Oct 04, 2018 at 12:28:54AM +0530, Souptick Joarder wrote:
> These are the approaches which could have been taken to handle
> this scenario -
> 
> *  Replace vm_insert_page with vmf_insert_page and then write few
>    extra lines of code to convert VM_FAULT_CODE to errno which
>    makes driver users more complex ( also the reverse mapping errno to
>    VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
>    not preferred to introduce anything similar again)
> 
> *  Maintain both vm_insert_page and vmf_insert_page and use it in
>    respective places. But it won't gurantee that vm_insert_page will
>    never be used in #PF context.
> 
> *  Introduce a similar API like vm_insert_page, convert all non #PF
>    consumer to use it and finally remove vm_insert_page by converting
>    it to vmf_insert_page.
> 
> And the 3rd approach was taken by introducing vm_insert_kmem_page().
> 
> In short, vmf_insert_page will be used in page fault handlers
> context and vm_insert_kmem_page will be used to map kernel
> memory to user vma outside page fault handlers context.

As far as I can tell, vm_insert_kmem_page() is line-for-line identical
with vm_insert_page().  Seriously, here's a diff I just did:

-static int insert_page(struct vm_area_struct *vma, unsigned long addr,
-                       struct page *page, pgprot_t prot)
+static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+               struct page *page, pgprot_t prot)
-       /* Ok, finally just insert the thing.. */
-int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
-       return insert_page(vma, addr, page, vma->vm_page_prot);
+       return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
-EXPORT_SYMBOL(vm_insert_page);
+EXPORT_SYMBOL(vm_insert_kmem_page);

What on earth are you trying to do?
Russell King (Oracle) Oct. 3, 2018, 10:14 p.m. UTC | #3
On Wed, Oct 03, 2018 at 01:00:03PM -0700, Matthew Wilcox wrote:
> On Thu, Oct 04, 2018 at 12:28:54AM +0530, Souptick Joarder wrote:
> > [...]
> 
> As far as I can tell, vm_insert_kmem_page() is line-for-line identical
> with vm_insert_page().  Seriously, here's a diff I just did:
> 
> -static int insert_page(struct vm_area_struct *vma, unsigned long addr,
> -                       struct page *page, pgprot_t prot)
> +static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> +               struct page *page, pgprot_t prot)
> -       /* Ok, finally just insert the thing.. */
> -int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> -       return insert_page(vma, addr, page, vma->vm_page_prot);
> +       return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
> -EXPORT_SYMBOL(vm_insert_page);
> +EXPORT_SYMBOL(vm_insert_kmem_page);
> 
> What on earth are you trying to do?

Reading the commit log, it seems that the intention is to split out
vm_insert_page() used outside of page-fault handling with the use
within page-fault handling, so that different return codes can be
used.

I don't see that justifies the code duplication - can't
vm_insert_page() and vm_insert_kmem_page() use the same mechanics
to do their job, and just translate the error code from the most-
specific to the least-specific error code?  Do we really need two
copies of the same code just to return different error codes?
Matthew Wilcox Oct. 4, 2018, 12:39 a.m. UTC | #4
On Wed, Oct 03, 2018 at 11:14:45PM +0100, Russell King - ARM Linux wrote:
> On Wed, Oct 03, 2018 at 01:00:03PM -0700, Matthew Wilcox wrote:
> > On Thu, Oct 04, 2018 at 12:28:54AM +0530, Souptick Joarder wrote:
> > > [...]
> > 
> > As far as I can tell, vm_insert_kmem_page() is line-for-line identical
> > with vm_insert_page().  Seriously, here's a diff I just did:
[...]
> > What on earth are you trying to do?
> 
> Reading the commit log, it seems that the intention is to split out
> vm_insert_page() used outside of page-fault handling with the use
> within page-fault handling, so that different return codes can be
> used.

Right, but we already did that.  We now have vmf_insert_page() which
returns a VM_FAULT_* code and vm_insert_page() which returns an errno.
Souptick Joarder Oct. 4, 2018, 11:56 a.m. UTC | #5
On Thu, Oct 4, 2018 at 1:28 AM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> Hi Souptick,
>
> On Wed, Oct 3, 2018 at 8:55 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
> >
> > [...]
> > *  Introduce a similar API like vm_insert_page, convert all non #PF
> >    consumer to use it and finally remove vm_insert_page by converting
> >    it to vmf_insert_page.
> >
> > And the 3rd approach was taken by introducing vm_insert_kmem_page().
>
> This looks better than the previous one of adding non-trivial code to
> each driver, thank you!
>
> A couple of comments below.
>
> >
> > In short, vmf_insert_page will be used in page fault handlers
> > context and vm_insert_kmem_page will be used to map kernel
> > memory to user vma outside page fault handlers context.
> >
> > Few drivers are converted to use vm_insert_kmem_page(). This will
> > allow both to review the api and that it serves it purpose. other
> > consumers of vm_insert_page (*used in non #PF context*) will be
> > replaced by vm_insert_kmem_page, but in separate patches.
> >
>
> other -> Other
>
> Also, as far as I can see, there are only a few vm_insert_page users
> remaining. With the new function, they should be trivial to convert,
> no? Therefore, could we do them all in one go, possibly in a patch
> series?

Yes, all the users can be converted in a patch series.

>
> Or, maybe, even better: wait until you remove the vm_* functions and
> simply reuse vm_insert_page for this -- that way you don't need a new
> name and you don't have to change any of the last users (I mean the
> drivers using it outside the page fault handlers).
>
> > [...]
> > +/**
> > + * vm_insert_kmem_page - insert single page into user vma
> > + * @vma: user vma to map to
> > + * @addr: target user address of this page
> > + * @page: source kernel page
> > + *
> > + * This allows drivers to insert individual kernel memory into a user vma.
> > + * This API should be used outside page fault handlers context.
> > + *
> > + * Previously the same has been done with vm_insert_page by drivers. But
> > + * vm_insert_page will be converted to vmf_insert_page and will be used
> > + * in fault handlers context and return type of vmf_insert_page will be
> > + * vm_fault_t type.
>
> This is a "temporal" comment, i.e. it refers to things that are
> happening at the moment -- I would say that should be part of the
> commit message, not the code, since it will be obsolete soon. Also,
> consider that, in a way, vm_insert_page is actually being replaced by
> vmf_insert_page only in one of the use cases (the other being replaced
> by this). Maybe you could instead say something like:
>
>     In the past, vm_insert_page was used for this purpose. Do not use
> vmf_insert_page because...
>
> and leave the full explanation in the commit.

Sure, I will work on it.
>
> > + *
> > + * But there are places where drivers need to map kernel memory into user
> > + * vma outside fault handlers context. As vmf_insert_page will be restricted
> > + * to use within page fault handlers, vm_insert_kmem_page could be used
> > + * to map kernel memory to user vma outside fault handlers context.
> > + */
>
> Ditto.
>
> > +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> > +                       struct page *page)
> > +{
> > +       if (addr < vma->vm_start || addr >= vma->vm_end)
> > +               return -EFAULT;
> > +       if (!page_count(page))
> > +               return -EINVAL;
> > +       if (!(vma->vm_flags & VM_MIXEDMAP)) {
> > +               BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
> > +               BUG_ON(vma->vm_flags & VM_PFNMAP);
> > +               vma->vm_flags |= VM_MIXEDMAP;
> > +       }
> > +       return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
> > +}
> > +EXPORT_SYMBOL(vm_insert_kmem_page);
> > +
> >  /*
> >   * This is the old fallback for page remapping.
> >   *
> > diff --git a/mm/nommu.c b/mm/nommu.c
> > index e4aac33..153b8c8 100644
> > --- a/mm/nommu.c
> > +++ b/mm/nommu.c
> > @@ -473,6 +473,13 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> >  }
> >  EXPORT_SYMBOL(vm_insert_page);
> >
> > +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> > +                               struct page *page)
> > +{
> > +       return -EINVAL;
> > +}
> > +EXPORT_SYMBOL(vm_insert_kmem_page);
> > +
> >  /*
> >   *  sys_brk() for the most part doesn't need the global kernel
> >   *  lock, except when an application is doing something nasty
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index a728fc4..61d279f 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2251,7 +2251,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
> >                 struct page *page = vmalloc_to_page(kaddr);
> >                 int ret;
> >
> > -               ret = vm_insert_page(vma, uaddr, page);
> > +               ret = vm_insert_kmem_page(vma, uaddr, page);
> >                 if (ret)
> >                         return ret;
> >
> > --
> > 1.9.1
> >
>
> Cheers,
> Miguel
Souptick Joarder Oct. 4, 2018, 12:15 p.m. UTC | #6
On Thu, Oct 4, 2018 at 3:45 AM Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
>
> On Wed, Oct 03, 2018 at 01:00:03PM -0700, Matthew Wilcox wrote:
> > On Thu, Oct 04, 2018 at 12:28:54AM +0530, Souptick Joarder wrote:
> > > These are the approaches which could have been taken to handle
> > > this scenario -
> > >
> > > *  Replace vm_insert_page with vmf_insert_page and then write few
> > >    extra lines of code to convert VM_FAULT_CODE to errno which
> > >    makes driver users more complex ( also the reverse mapping errno to
> > >    VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
> > >    not preferred to introduce anything similar again)
> > >
> > > *  Maintain both vm_insert_page and vmf_insert_page and use it in
> > >    respective places. But it won't gurantee that vm_insert_page will
> > >    never be used in #PF context.
> > >
> > > *  Introduce a similar API like vm_insert_page, convert all non #PF
> > >    consumer to use it and finally remove vm_insert_page by converting
> > >    it to vmf_insert_page.
> > >
> > > And the 3rd approach was taken by introducing vm_insert_kmem_page().
> > >
> > > In short, vmf_insert_page will be used in page fault handlers
> > > context and vm_insert_kmem_page will be used to map kernel
> > > memory to user vma outside page fault handlers context.
> >
> > As far as I can tell, vm_insert_kmem_page() is line-for-line identical
> > with vm_insert_page().  Seriously, here's a diff I just did:
> >
> > -static int insert_page(struct vm_area_struct *vma, unsigned long addr,
> > -                       struct page *page, pgprot_t prot)
> > +static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> > +               struct page *page, pgprot_t prot)
> > -       /* Ok, finally just insert the thing.. */
> > -int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> > +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> > -       return insert_page(vma, addr, page, vma->vm_page_prot);
> > +       return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
> > -EXPORT_SYMBOL(vm_insert_page);
> > +EXPORT_SYMBOL(vm_insert_kmem_page);
> >
> > What on earth are you trying to do?

>
> Reading the commit log, it seems that the intention is to split out
> vm_insert_page() used outside of page-fault handling with the use
> within page-fault handling, so that different return codes can be
> used.
>
> I don't see that justifies the code duplication - can't
> vm_insert_page() and vm_insert_kmem_page() use the same mechanics
> to do their job, and just translate the error code from the most-
> specific to the least-specific error code?  Do we really need two
> copies of the same code just to return different error codes.

Sorry about that.
Can I take the below approach in a patch series?

1. Create a wrapper function vm_insert_kmem_page using vm_insert_page.
2. Convert all the non #PF users to use it.
3. Then make vm_insert_page static and turn vmf_insert_page into an inline caller of it.
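To make the wrapper idea concrete, the end state could be sketched roughly as below. This is an illustrative user-space model, not kernel code: the struct definitions and VM_FAULT_* values are stand-ins, and insert_page_core is a hypothetical name for the shared static helper.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for the kernel types and constants involved. */
typedef unsigned int vm_fault_t;
#define VM_FAULT_OOM	0x0001
#define VM_FAULT_SIGBUS	0x0002
#define VM_FAULT_NOPAGE	0x0100

struct page { int refcount; };
struct vm_area_struct { unsigned long vm_start, vm_end; };

/* One shared static core, returning an errno (0 on success). */
static int insert_page_core(struct vm_area_struct *vma, unsigned long addr,
			    struct page *page)
{
	if (addr < vma->vm_start || addr >= vma->vm_end)
		return -EFAULT;
	if (!page->refcount)
		return -EINVAL;
	return 0;		/* actual page-table insertion elided */
}

/* Non-#PF callers (f_op->mmap handlers etc.) keep getting an errno. */
int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
			struct page *page)
{
	return insert_page_core(vma, addr, page);
}

/* #PF callers get a vm_fault_t; the errno is translated once, centrally. */
vm_fault_t vmf_insert_page(struct vm_area_struct *vma, unsigned long addr,
			   struct page *page)
{
	int err = insert_page_core(vma, addr, page);

	if (err == -ENOMEM)
		return VM_FAULT_OOM;
	if (err < 0)
		return VM_FAULT_SIGBUS;
	return VM_FAULT_NOPAGE;
}
```

With this shape there is only a single copy of the insertion logic, so the two exported entry points differ only in their return-code convention.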
Russell King (Oracle) Oct. 4, 2018, 12:34 p.m. UTC | #7
On Thu, Oct 04, 2018 at 05:45:13PM +0530, Souptick Joarder wrote:
> On Thu, Oct 4, 2018 at 3:45 AM Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, Oct 03, 2018 at 01:00:03PM -0700, Matthew Wilcox wrote:
> > > On Thu, Oct 04, 2018 at 12:28:54AM +0530, Souptick Joarder wrote:
> > > > These are the approaches which could have been taken to handle
> > > > this scenario -
> > > >
> > > > *  Replace vm_insert_page with vmf_insert_page and then write few
> > > >    extra lines of code to convert VM_FAULT_CODE to errno which
> > > >    makes driver users more complex ( also the reverse mapping errno to
> > > >    VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
> > > >    not preferred to introduce anything similar again)
> > > >
> > > > *  Maintain both vm_insert_page and vmf_insert_page and use it in
> > > >    respective places. But it won't gurantee that vm_insert_page will
> > > >    never be used in #PF context.
> > > >
> > > > *  Introduce a similar API like vm_insert_page, convert all non #PF
> > > >    consumer to use it and finally remove vm_insert_page by converting
> > > >    it to vmf_insert_page.
> > > >
> > > > And the 3rd approach was taken by introducing vm_insert_kmem_page().
> > > >
> > > > In short, vmf_insert_page will be used in page fault handlers
> > > > context and vm_insert_kmem_page will be used to map kernel
> > > > memory to user vma outside page fault handlers context.
> > >
> > > As far as I can tell, vm_insert_kmem_page() is line-for-line identical
> > > with vm_insert_page().  Seriously, here's a diff I just did:
> > >
> > > -static int insert_page(struct vm_area_struct *vma, unsigned long addr,
> > > -                       struct page *page, pgprot_t prot)
> > > +static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> > > +               struct page *page, pgprot_t prot)
> > > -       /* Ok, finally just insert the thing.. */
> > > -int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> > > +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> > > -       return insert_page(vma, addr, page, vma->vm_page_prot);
> > > +       return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
> > > -EXPORT_SYMBOL(vm_insert_page);
> > > +EXPORT_SYMBOL(vm_insert_kmem_page);
> > >
> > > What on earth are you trying to do?
> 
> >
> > Reading the commit log, it seems that the intention is to split out
> > vm_insert_page() used outside of page-fault handling with the use
> > within page-fault handling, so that different return codes can be
> > used.
> >
> > I don't see that justifies the code duplication - can't
> > vm_insert_page() and vm_insert_kmem_page() use the same mechanics
> > to do their job, and just translate the error code from the most-
> > specific to the least-specific error code?  Do we really need two
> > copies of the same code just to return different error codes.
> 
> Sorry about it.
> can I take below approach in a patch series ->
> 
> create a wrapper function vm_insert_kmem_page using vm_insert_page.
> Convert all the non #PF users to use it.
> Then make vm_insert_page static and convert inline vmf_insert_page to caller.

I'm confused, what are you trying to do?

It seems that we already have:

vm_insert_page() - returns an errno
vmf_insert_page() - returns a VM_FAULT_* code

From what I _think_ you're saying, you're trying to provide
vm_insert_kmem_page() as a direct replacement for the existing
vm_insert_page(), and then make vm_insert_page() behave as per
vmf_insert_page(), so we end up with:

vm_insert_kmem_page() - returns an errno
vm_insert_page() - returns a VM_FAULT_* code
vmf_insert_page() - returns a VM_FAULT_* code and is identical to
      vm_insert_page()

Given that the documentation for vm_insert_page() says:

 * Usually this function is called from f_op->mmap() handler
 * under mm->mmap_sem write-lock, so it can change vma->vm_flags.
 * Caller must set VM_MIXEDMAP on vma if it wants to call this
 * function from other places, for example from page-fault handler.

this says that the "usual" use method for vm_insert_page() is
_outside_ of page fault handling - if it is used _inside_ page fault
handling, then it states that additional fixups are required on the
VMA.  So I don't get why your patch commentary seems to be saying that
users of vm_insert_page() outside of page fault handling all need to
be patched - isn't this the use case that this function is defined
to be handling?

If you're going to be changing the semantics, doesn't this create a
flag day where we could get new users of vm_insert_page() using the
_existing_ semantics merged after you've changed its semantics (eg,
the return code)?

Maybe I don't understand fully what you're trying to achieve here.
Souptick Joarder Oct. 4, 2018, 6:12 p.m. UTC | #8
On Thu, Oct 4, 2018 at 6:04 PM Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
>
> On Thu, Oct 04, 2018 at 05:45:13PM +0530, Souptick Joarder wrote:
> > On Thu, Oct 4, 2018 at 3:45 AM Russell King - ARM Linux
> > <linux@armlinux.org.uk> wrote:
> > >
> > > On Wed, Oct 03, 2018 at 01:00:03PM -0700, Matthew Wilcox wrote:
> > > > On Thu, Oct 04, 2018 at 12:28:54AM +0530, Souptick Joarder wrote:
> > > > > These are the approaches which could have been taken to handle
> > > > > this scenario -
> > > > >
> > > > > *  Replace vm_insert_page with vmf_insert_page and then write few
> > > > >    extra lines of code to convert VM_FAULT_CODE to errno which
> > > > >    makes driver users more complex ( also the reverse mapping errno to
> > > > >    VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
> > > > >    not preferred to introduce anything similar again)
> > > > >
> > > > > *  Maintain both vm_insert_page and vmf_insert_page and use it in
> > > > >    respective places. But it won't gurantee that vm_insert_page will
> > > > >    never be used in #PF context.
> > > > >
> > > > > *  Introduce a similar API like vm_insert_page, convert all non #PF
> > > > >    consumer to use it and finally remove vm_insert_page by converting
> > > > >    it to vmf_insert_page.
> > > > >
> > > > > And the 3rd approach was taken by introducing vm_insert_kmem_page().
> > > > >
> > > > > In short, vmf_insert_page will be used in page fault handlers
> > > > > context and vm_insert_kmem_page will be used to map kernel
> > > > > memory to user vma outside page fault handlers context.
> > > >
> > > > As far as I can tell, vm_insert_kmem_page() is line-for-line identical
> > > > with vm_insert_page().  Seriously, here's a diff I just did:
> > > >
> > > > -static int insert_page(struct vm_area_struct *vma, unsigned long addr,
> > > > -                       struct page *page, pgprot_t prot)
> > > > +static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> > > > +               struct page *page, pgprot_t prot)
> > > > -       /* Ok, finally just insert the thing.. */
> > > > -int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> > > > +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> > > > -       return insert_page(vma, addr, page, vma->vm_page_prot);
> > > > +       return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
> > > > -EXPORT_SYMBOL(vm_insert_page);
> > > > +EXPORT_SYMBOL(vm_insert_kmem_page);
> > > >
> > > > What on earth are you trying to do?
> >
> > >
> > > Reading the commit log, it seems that the intention is to split out
> > > vm_insert_page() used outside of page-fault handling with the use
> > > within page-fault handling, so that different return codes can be
> > > used.
> > >
> > > I don't see that justifies the code duplication - can't
> > > vm_insert_page() and vm_insert_kmem_page() use the same mechanics
> > > to do their job, and just translate the error code from the most-
> > > specific to the least-specific error code?  Do we really need two
> > > copies of the same code just to return different error codes.
> >
> > Sorry about it.
> > can I take below approach in a patch series ->
> >
> > create a wrapper function vm_insert_kmem_page using vm_insert_page.
> > Convert all the non #PF users to use it.
> > Then make vm_insert_page static and convert inline vmf_insert_page to caller.
>
> I'm confused, what are you trying to do?
>
> It seems that we already have:
>
> vm_insert_page() - returns an errno
> vmf_insert_page() - returns a VM_FAULT_* code
>
> From what I _think_ you're saying, you're trying to provide
> vm_insert_kmem_page() as a direct replacement for the existing
> vm_insert_page(), and then make vm_insert_page() behave as per
> vmf_insert_page(), so we end up with:

Yes, vm_insert_kmem_page() can be a direct replacement for
vm_insert_page, or it might be a wrapper function written using
vm_insert_page, whichever suits better based on feedback.

>
> vm_insert_kmem_page() - returns an errno
> vm_insert_page() - returns a VM_FAULT_* code
> vmf_insert_page() - returns a VM_FAULT_* code and is identical to
>       vm_insert_page()
>

After completion of conversion we end up with

 vm_insert_kmem_page() - returns an errno
 vmf_insert_page() - returns a VM_FAULT_* code


> Given that the documentation for vm_insert_page() says:
>
>  * Usually this function is called from f_op->mmap() handler
>  * under mm->mmap_sem write-lock, so it can change vma->vm_flags.
>  * Caller must set VM_MIXEDMAP on vma if it wants to call this
>  * function from other places, for example from page-fault handler.
>
> this says that the "usual" use method for vm_insert_page() is
> _outside_ of page fault handling - if it is used _inside_ page fault
> handling, then it states that additional fixups are required on the
> VMA.  So I don't get why your patch commentry seems to be saying that
> users of vm_insert_page() outside of page fault handling all need to
> be patched - isn't this the use case that this function is defined
> to be handling?

The answer is yes, to the best of my knowledge.

But as mentioned in the changelog:

Going forward, the plan is to restrict future drivers from
using vm_insert_page (*it would reintroduce the errno to
VM_FAULT_CODE mapping code for new drivers, which has already
been cleaned up for existing drivers*) in #PF (page fault handler)
context, and to make them use vmf_insert_page, which returns a
VM_FAULT_CODE; and that is not possible while both the
vm_insert_page and vmf_insert_page APIs exist.

But there are some consumers of vm_insert_page which use it
outside #PF context. A straightforward conversion of vm_insert_page
to vmf_insert_page won't work there, as those call sites expect
an errno, not a vm_fault_t, in return.

If both {vm, vmf}_insert_page exist, there is no guarantee that
vm_insert_page won't be used in #PF context, which is what we want
to prevent by removing/replacing vm_insert_page with another
similar/wrapper API.

Does that answer your question?

>
> If you're going to be changing the semantics, doesn't this create a
> flag day where we could get new users of vm_insert_page() using the
> _existing_ semantics merged after you've changed its semantics (eg,
> the return code)?

No, the vm_insert_page API will be removed/replaced only when all
the users are converted.
We will do it incrementally: first introduce the new API, convert
all users to it, and at the final step remove the old API.
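For reference, the first approach listed in the changelog above (converting everything to vmf_insert_page) would push boilerplate like the following into every non-#PF caller. This is a user-space sketch with stand-in constants and a hypothetical helper name; it is roughly the reverse mapping that the vm_fault_t migration just cleaned out of drivers.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for the kernel's vm_fault_t codes. */
typedef unsigned int vm_fault_t;
#define VM_FAULT_OOM	0x0001
#define VM_FAULT_SIGBUS	0x0002
#define VM_FAULT_NOPAGE	0x0100

/*
 * Reverse-map a fault code back to an errno.  Every f_op->mmap-style
 * caller of a vm_fault_t-returning API would need a copy of something
 * like this, which is exactly the duplication being argued against.
 */
static int vmfault_to_errno(vm_fault_t ret)
{
	if (ret & VM_FAULT_OOM)
		return -ENOMEM;
	if (ret & VM_FAULT_SIGBUS)
		return -EFAULT;
	if (ret & VM_FAULT_NOPAGE)
		return 0;	/* page installed successfully */
	return -EINVAL;
}
```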
Matthew Wilcox Oct. 4, 2018, 6:17 p.m. UTC | #9
On Thu, Oct 04, 2018 at 11:42:18PM +0530, Souptick Joarder wrote:
> On Thu, Oct 4, 2018 at 6:04 PM Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > I'm confused, what are you trying to do?
> >
> > It seems that we already have:
> >
> > vm_insert_page() - returns an errno
> > vmf_insert_page() - returns a VM_FAULT_* code
> >
> > From what I _think_ you're saying, you're trying to provide
> > vm_insert_kmem_page() as a direct replacement for the existing
> > vm_insert_page(), and then make vm_insert_page() behave as per
> > vmf_insert_page(), so we end up with:
> 
> yes, vm_insert_kmem_page() can be a direct replacement of vm_insert_page
> or might be a wrapper function written using vm_insert_page whichever
> suites better
> based on feedback.
> 
> >
> > vm_insert_kmem_page() - returns an errno
> > vm_insert_page() - returns a VM_FAULT_* code
> > vmf_insert_page() - returns a VM_FAULT_* code and is identical to
> >       vm_insert_page()
> >
> 
> After completion of conversion we end up with
> 
>  vm_insert_kmem_page() - returns an errno
>  vmf_insert_page() - returns a VM_FAULT_* code
> 
> 
> > Given that the documentation for vm_insert_page() says:
> >
> >  * Usually this function is called from f_op->mmap() handler
> >  * under mm->mmap_sem write-lock, so it can change vma->vm_flags.
> >  * Caller must set VM_MIXEDMAP on vma if it wants to call this
> >  * function from other places, for example from page-fault handler.
> >
> > this says that the "usual" use method for vm_insert_page() is
> > _outside_ of page fault handling - if it is used _inside_ page fault
> > handling, then it states that additional fixups are required on the
> > VMA.  So I don't get why your patch commentry seems to be saying that
> > users of vm_insert_page() outside of page fault handling all need to
> > be patched - isn't this the use case that this function is defined
> > to be handling?
> 
> The answer is yes best of my knowledge.
> 
> But as mentioned in change log ->
> 
> Going forward, the plan is to restrict future drivers not
> to use vm_insert_page ( *it will generate new errno to
> VM_FAULT_CODE mapping code for new drivers which were already
> cleaned up for existing drivers*) in #PF (page fault handler)
> context but to make use of vmf_insert_page which returns
> VMF_FAULT_CODE and that is not possible until both vm_insert_page
> and vmf_insert_page API exists.
> 
> But there are some consumers of vm_insert_page which use it
> outside #PF context. straight forward conversion of vm_insert_page
> to vmf_insert_page won't work there as those function calls expects
> errno not vm_fault_t in return.
> 
> If both {vm, vmf}_insert_page exists, vm_insert_page might be used for
> #PF context which we want to protect by removing/ replacing vm_insert_page
> with another similar/ wrapper API.
> 
> Is that the right answer of your question ? no ?

I think this is a bad plan.  What we should rather do is examine the current
users of vm_insert_page() and ask "What interface would better replace
vm_insert_page()?"

As I've said to you before, I believe the right answer is to have a
vm_insert_range() which takes an array of struct page pointers.  That
fits the majority of remaining users.

----

If we do want to rename vm_insert_page() to vm_insert_kmem_page(), then
the right answer is to _just do that_.  Not duplicate vm_insert_page()
in its entirety.  I don't see the point to doing that.
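A vm_insert_range() along the lines suggested here might look roughly as follows. This is a user-space model with stand-in types: the name and signature are only a guess at the proposal, and insert_one_page stands in for the existing single-page insertion path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Illustrative stand-ins for the kernel types involved. */
struct page { int refcount; };
struct vm_area_struct { unsigned long vm_start, vm_end; };

/* Stand-in for the existing single-page insert (returns an errno). */
static int insert_one_page(struct vm_area_struct *vma, unsigned long addr,
			   struct page *page)
{
	if (addr < vma->vm_start || addr + PAGE_SIZE > vma->vm_end)
		return -EFAULT;
	if (!page || !page->refcount)
		return -EINVAL;
	return 0;		/* actual page-table insertion elided */
}

/*
 * Map 'num' pages at consecutive PAGE_SIZE offsets starting at 'addr',
 * stopping at the first failure.  This would fold the open-coded loops
 * in callers such as kcov_mmap() and remap_vmalloc_range_partial()
 * into one helper.
 */
int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
		    struct page **pages, size_t num)
{
	size_t i;
	int err;

	for (i = 0; i < num; i++) {
		err = insert_one_page(vma, addr + i * PAGE_SIZE, pages[i]);
		if (err)
			return err;
	}
	return 0;
}
```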
Souptick Joarder Oct. 4, 2018, 6:21 p.m. UTC | #10
Hi Matthew,

On Thu, Oct 4, 2018 at 1:30 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Oct 04, 2018 at 12:28:54AM +0530, Souptick Joarder wrote:
> > These are the approaches which could have been taken to handle
> > this scenario -
> >
> > *  Replace vm_insert_page with vmf_insert_page and then write few
> >    extra lines of code to convert VM_FAULT_CODE to errno which
> >    makes driver users more complex ( also the reverse mapping errno to
> >    VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
> >    not preferred to introduce anything similar again)
> >
> > *  Maintain both vm_insert_page and vmf_insert_page and use it in
> >    respective places. But it won't gurantee that vm_insert_page will
> >    never be used in #PF context.
> >
> > *  Introduce a similar API like vm_insert_page, convert all non #PF
> >    consumer to use it and finally remove vm_insert_page by converting
> >    it to vmf_insert_page.
> >
> > And the 3rd approach was taken by introducing vm_insert_kmem_page().
> >
> > In short, vmf_insert_page will be used in page fault handlers
> > context and vm_insert_kmem_page will be used to map kernel
> > memory to user vma outside page fault handlers context.
>
> As far as I can tell, vm_insert_kmem_page() is line-for-line identical
> with vm_insert_page().  Seriously, here's a diff I just did:
>
> -static int insert_page(struct vm_area_struct *vma, unsigned long addr,
> -                       struct page *page, pgprot_t prot)
> +static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> +               struct page *page, pgprot_t prot)
> -       /* Ok, finally just insert the thing.. */
> -int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> -       return insert_page(vma, addr, page, vma->vm_page_prot);
> +       return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
> -EXPORT_SYMBOL(vm_insert_page);
> +EXPORT_SYMBOL(vm_insert_kmem_page);
>
> What on earth are you trying to do?

Shall I take the below approach rather than just creating an API
identical to vm_insert_page?

1. Create a wrapper function vm_insert_kmem_page using vm_insert_page.
2. Convert all the non #PF users to use it.
3. Then make vm_insert_page static and turn vmf_insert_page into an inline caller of it.

That way we will end up with two functions, vmf_insert_page (#PF) and
vm_insert_kmem_page (non #PF), both using a common vm_insert_page
which will be static.

I am clear on the problem statement but not very clear on my solution.
Souptick Joarder Oct. 4, 2018, 6:53 p.m. UTC | #11
On Thu, Oct 4, 2018 at 11:47 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Oct 04, 2018 at 11:42:18PM +0530, Souptick Joarder wrote:
> > On Thu, Oct 4, 2018 at 6:04 PM Russell King - ARM Linux
> > <linux@armlinux.org.uk> wrote:
> > > I'm confused, what are you trying to do?
> > >
> > > It seems that we already have:
> > >
> > > vm_insert_page() - returns an errno
> > > vmf_insert_page() - returns a VM_FAULT_* code
> > >
> > > From what I _think_ you're saying, you're trying to provide
> > > vm_insert_kmem_page() as a direct replacement for the existing
> > > vm_insert_page(), and then make vm_insert_page() behave as per
> > > vmf_insert_page(), so we end up with:
> >
> > yes, vm_insert_kmem_page() can be a direct replacement of vm_insert_page
> > or might be a wrapper function written using vm_insert_page whichever
> > suites better
> > based on feedback.
> >
> > >
> > > vm_insert_kmem_page() - returns an errno
> > > vm_insert_page() - returns a VM_FAULT_* code
> > > vmf_insert_page() - returns a VM_FAULT_* code and is identical to
> > >       vm_insert_page()
> > >
> >
> > After completion of conversion we end up with
> >
> >  vm_insert_kmem_page() - returns an errno
> >  vmf_insert_page() - returns a VM_FAULT_* code
> >
> >
> > > Given that the documentation for vm_insert_page() says:
> > >
> > >  * Usually this function is called from f_op->mmap() handler
> > >  * under mm->mmap_sem write-lock, so it can change vma->vm_flags.
> > >  * Caller must set VM_MIXEDMAP on vma if it wants to call this
> > >  * function from other places, for example from page-fault handler.
> > >
> > > this says that the "usual" use method for vm_insert_page() is
> > > _outside_ of page fault handling - if it is used _inside_ page fault
> > > handling, then it states that additional fixups are required on the
> > > VMA.  So I don't get why your patch commentry seems to be saying that
> > > users of vm_insert_page() outside of page fault handling all need to
> > > be patched - isn't this the use case that this function is defined
> > > to be handling?
> >
> > The answer is yes best of my knowledge.
> >
> > But as mentioned in change log ->
> >
> > Going forward, the plan is to restrict future drivers not
> > to use vm_insert_page ( *it will generate new errno to
> > VM_FAULT_CODE mapping code for new drivers which were already
> > cleaned up for existing drivers*) in #PF (page fault handler)
> > context but to make use of vmf_insert_page which returns
> > VMF_FAULT_CODE and that is not possible until both vm_insert_page
> > and vmf_insert_page API exists.
> >
> > But there are some consumers of vm_insert_page which use it
> > outside #PF context. straight forward conversion of vm_insert_page
> > to vmf_insert_page won't work there as those function calls expects
> > errno not vm_fault_t in return.
> >
> > If both {vm, vmf}_insert_page exists, vm_insert_page might be used for
> > #PF context which we want to protect by removing/ replacing vm_insert_page
> > with another similar/ wrapper API.
> >
> > Is that the right answer of your question ? no ?
>
> I think this is a bad plan.  What we should rather do is examine the current
> users of vm_insert_page() and ask "What interface would better replace
> vm_insert_page()?"
>
> As I've said to you before, I believe the right answer is to have a
> vm_insert_range() which takes an array of struct page pointers.  That
> fits the majority of remaining users.

Ok, but it will take some time.
Is it a good idea to introduce the final vm_fault_t patch first and then
start working on vm_insert_range, as the latter will be a bit time-consuming?

>
> ----
>
> If we do want to rename vm_insert_page() to vm_insert_kmem_page(), then
> the right answer is to _just do that_.  Not duplicate vm_insert_page()
> in its entirety.  I don't see the point to doing that.
Miguel Ojeda Oct. 4, 2018, 7:46 p.m. UTC | #12
Hi Souptick,

On Thu, Oct 4, 2018 at 8:49 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
>
> On Thu, Oct 4, 2018 at 11:47 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > I think this is a bad plan.  What we should rather do is examine the current
> > users of vm_insert_page() and ask "What interface would better replace
> > vm_insert_page()?"
> >
> > As I've said to you before, I believe the right answer is to have a
> > vm_insert_range() which takes an array of struct page pointers.  That
> > fits the majority of remaining users.
>
> Ok, but it will take some time.
> Is it a good idea to introduce the final vm_fault_t patch and then
> start working on vm_insert_range as it will be bit time consuming ?
>

Well, why is there a rush? Development should be done in a patch
series or a tree, and submitted as a whole, instead of sending partial
patches.

Also, not sure if you saw my comments/review: if the interface is not
going to change, why the name change? Why can't we simply keep using
vm_insert_page?

Cheers,
Miguel
Souptick Joarder Oct. 5, 2018, 5:50 a.m. UTC | #13
On Fri, Oct 5, 2018 at 1:16 AM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> Hi Souptick,
>
> On Thu, Oct 4, 2018 at 8:49 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
> >
> > On Thu, Oct 4, 2018 at 11:47 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > I think this is a bad plan.  What we should rather do is examine the current
> > > users of vm_insert_page() and ask "What interface would better replace
> > > vm_insert_page()?"
> > >
> > > As I've said to you before, I believe the right answer is to have a
> > > vm_insert_range() which takes an array of struct page pointers.  That
> > > fits the majority of remaining users.
> >
> > Ok, but it will take some time.
> > Is it a good idea to introduce the final vm_fault_t patch and then
> > start working on vm_insert_range as it will be bit time consuming ?
> >
>
> Well, why is there a rush? Development should be done in a patch
> series or a tree, and submitted as a whole, instead of sending partial
> patches.

Not in a hurry, will do it in a patch series :-)
>
> Also, not sure if you saw my comments/review: if the interface is not
> going to change, why the name change? Why can't we simply keep using
> vm_insert_page?

Yes, changing the name without changing the interface is a bad
approach, so that path can't be taken. As Matthew mentioned, a
"vm_insert_range() which takes an array of struct page pointers.
That fits the majority of remaining users" would be a better approach
for this use case.

But we can't keep vm_insert_page and vmf_insert_page together either,
as that doesn't guarantee that future drivers will not use
vm_insert_page in #PF context (which would reintroduce errno to
VM_FAULT_CODE mapping).

Any further comments from others on the vm_insert_range() approach?
Miguel Ojeda Oct. 5, 2018, 8:52 a.m. UTC | #14
Hi Souptick,

On Fri, Oct 5, 2018 at 7:51 AM Souptick Joarder <jrdr.linux@gmail.com> wrote:
>
> On Fri, Oct 5, 2018 at 1:16 AM Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com> wrote:
> >
> >
> > Also, not sure if you saw my comments/review: if the interface is not
> > going to change, why the name change? Why can't we simply keep using
> > vm_insert_page?
>
> yes, changing the name without changing the interface is a
> bad approach and this can't be taken. As Matthew mentioned,
> "vm_insert_range() which takes an array of struct page pointers.
> That fits the majority of remaining users" would be a better approach
> to fit this use case.
>
> But yes, we can't keep vm_insert_page and vmf_insert_page together
> as it doesn't guarantee  that future drivers will not use vm_insert_page
> in #PF context ( which will generate new errno to VM_FAULT_CODE).
>

Maybe I am hard of thinking, but aren't you planning to remove
vm_insert_page with these changes? If yes, why can't you keep the
vm_insert_page name? In other words, keep returning what the drivers
expect?

Cheers,
Miguel
Souptick Joarder Oct. 5, 2018, 10:01 a.m. UTC | #15
On Fri, Oct 5, 2018 at 2:22 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> Hi Souptick,
>
> On Fri, Oct 5, 2018 at 7:51 AM Souptick Joarder <jrdr.linux@gmail.com> wrote:
> >
> > On Fri, Oct 5, 2018 at 1:16 AM Miguel Ojeda
> > <miguel.ojeda.sandonis@gmail.com> wrote:
> > >
> > >
> > > Also, not sure if you saw my comments/review: if the interface is not
> > > going to change, why the name change? Why can't we simply keep using
> > > vm_insert_page?
> >
> > yes, changing the name without changing the interface is a
> > bad approach and this can't be taken. As Matthew mentioned,
> > "vm_insert_range() which takes an array of struct page pointers.
> > That fits the majority of remaining users" would be a better approach
> > to fit this use case.
> >
> > But yes, we can't keep vm_insert_page and vmf_insert_page together
> > as it doesn't guarantee  that future drivers will not use vm_insert_page
> > in #PF context ( which will generate new errno to VM_FAULT_CODE).
> >
>
> Maybe I am hard of thinking, but aren't you planning to remove
> vm_insert_page with these changes? If yes, why you can't use the keep
> vm_insert_page name? In other words, keep returning what the drivers
> expect?

The final goal is to remove vm_insert_page by converting it to
vmf_insert_page. But to do that we first have to introduce a
new API similar to vm_insert_page (for non-#PF use). I tried this by
introducing vm_insert_kmem_page (*identical to vm_insert_page
except for the API name*) in this patch. But this looks like a bad approach.

The new proposal is to introduce vm_insert_range() (*which might be a
bit different from vm_insert_page but will serve all the non-#PF use cases*).
Miguel Ojeda Oct. 5, 2018, 10:49 a.m. UTC | #16
Hi Souptick,

On Fri, Oct 5, 2018 at 12:01 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
>
> The final goal is to remove vm_insert_page by converting it to
> vmf_insert_page. But to do that we have to first introduce the
> new API which is similar to vm_insert_page  (for non #PF). I tried this by
> introducing vm_insert_kmem_page ( * identical as vm_insert_page
> except API name *) in this patch. But this looks like a bad approach.

We are going in circles here. That you want to convert vm_insert_page
to vmf_insert_page for the PF case is fine and understood. However,
you don't *need* to introduce a new name for the remaining non-PF
cases if the function is going to be the exact same thing as before.
You say "The final goal is to remove vm_insert_page", but you haven't
justified *why* you need to remove that name.

Now, if we want to rename the function for some reason (e.g. avoid
confusion with vmf_insert_page), that is fine but is another topic. It
may be or not a good idea, but it is orthogonal to the vmf_ work.
Matthew, on this regard, told you that you shouldn't duplicate
functions. If you want a rename, do so; but don't copy the code. In
other words: nobody said introducing the vm_insert_kmem_page name is a
bad idea -- what Matthew told you is that *duplicating* vm_insert_page
just for that is bad.

Further, you are copying the code (if I understand your thought
process) because you want to change the callers of non-PF first, and
then do the "full conversion from vm_* to vmf_*". However, that is
confusing, because there is no need to change non-PF callers of
vm_insert_page since they don't care about the new vmf_* functions.

Instead, the proper way of doing this is:

  1. Introduce the vmf_* API
  2. Change all PF users to that (leaving all non-PF ones
untouched!) -- if this is too big, you can split this patch into
several patches, one per subsystem, etc.
  3. Remove the vm_* functions (except the ones that are still used in
non-PF contexts, e.g. vm_insert_page)

Then, optionally, if you want to rename the function for the remaining
non-PF users:

  4. Rename vm_insert_page (justifying why the current name is
confusing *on its own merits*).

Otherwise, if you want to pursue Matthew's idea:

  4. Introduce the vm_insert_range (possibly leveraging
vm_insert_page, or not; you have to see what is best).
  5. Replace those callers that can take advantage of vm_insert_range
  6. Remove vm_insert_page and replace callers with vm_insert_range
(only if it is not worthwhile to keep vm_insert_page, again justifying it
*on its own merits*)

As you see, these are all logical step-by-step improvements, without
duplicating functions temporarily, leaving temporary changes or
changing current callers to new APIs for unrelated reasons (i.e. no
need to introduce vm_insert_kmem_page simply to do a "conversion" to
vmf_).

Cheers,
Miguel
Souptick Joarder Oct. 5, 2018, 12:11 p.m. UTC | #17
On Fri, Oct 5, 2018 at 4:19 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> Hi Souptick,
>
> On Fri, Oct 5, 2018 at 12:01 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
> >
> > The final goal is to remove vm_insert_page by converting it to
> > vmf_insert_page. But to do that we have to first introduce the
> > new API which is similar to vm_insert_page  (for non #PF). I tried this by
> > introducing vm_insert_kmem_page ( * identical as vm_insert_page
> > except API name *) in this patch. But this looks like a bad approach.
>
> We are going in circles here. That you want to convert vm_insert_page
> to vmf_insert_page for the PF case is fine and understood. However,
> you don't *need* to introduce a new name for the remaining non-PF
> cases if the function is going to be the exact same thing as before.
> You say "The final goal is to remove vm_insert_page", but you haven't
> justified *why* you need to remove that name.
>
> Now, if we want to rename the function for some reason (e.g. avoid
> confusion with vmf_insert_page), that is fine but is another topic. It
> may be or not a good idea, but it is orthogonal to the vmf_ work.
> Matthew, on this regard, told you that you shouldn't duplicate
> functions. If you want a rename, do so; but don't copy the code. In
> other words: nobody said introducing the vm_insert_kmem_page name is a
> bad idea -- what Matthew told you is that *duplicating* vm_insert_page
> just for that is bad.
>
> Further, you are copying the code (if I understand your thought
> process) because you want to change the callers of non-PF first, and
> then do the "full conversion from vm_* to vmf_*". However, that is
> confusing, because there is no need to change non-PF callers of
> vm_insert_page since they don't care about the new vmf_* functions.
>
> Instead, the proper way of doing this is:
>
>   1. Introduce the vmf_* API
>   2. Change all PF-users users to that (leaving all non-PF ones
> untouched!) -- if this is too big, you can split this patch into
> several patches, one per subsystem, etc.

We are done with step 2. All the PF users are converted to use
vmf_insert_page (ref: linux-next-20181005).

>   3. Remove the vm_* functions (except the ones that are still used in
> non-PF contexts, e.g. vm_insert_page)

Step 3 is part of step 2. Already done.

>
> Then, optionally, if you want to rename the function for the remaining
> non-PF users:
>
>   4. Rename vm_insert_page (justifying why the current name is
> confusing *on its own merits*).
>
> Otherwise, if you want to pursue Matthew's idea:
>
>   4. Introduce the vm_insert_range (possibly leveraging
> vm_insert_page, or not; you have to see what is best).
>   5. Replace those callers that can take advantage of vm_insert_range
>   6. Remove vm_insert_page and replace callers with vm_insert_range
> (only if it is not worth to keep vm_insert_range, again justifying it
> *on its own merits*)

Steps 4 to 6: going to do them. They are part of the plan now :-)

>
> As you see, these are all logical step-by-step improvements, without
> duplicating functions temporarily, leaving temporary changes or
> changing current callers to new APIs for unrelated reasons (i.e. no
> need to introduce vm_insert_kmem_page simply to do a "conversion" to
> vmf_).
>
> Cheers,
> Miguel
Miguel Ojeda Oct. 5, 2018, 6:09 p.m. UTC | #18
On Fri, Oct 5, 2018 at 2:11 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
>
> On Fri, Oct 5, 2018 at 4:19 PM Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com> wrote:
> >
> >   1. Introduce the vmf_* API
> >   2. Change all PF-users users to that (leaving all non-PF ones
> > untouched!) -- if this is too big, you can split this patch into
> > several patches, one per subsystem, etc.
>
> We are done with step 2. All the PF-users are converted to use
> vmf_insert_page. ( Ref - linux-next-20181005)

They are not supposed to be "steps". You did it with 70+ commits (!!)
over the course of several months. Why wasn't a tree created, the work
developed there, and, when done, submitted as a whole for review?

> >
> > Otherwise, if you want to pursue Matthew's idea:
> >
> >   4. Introduce the vm_insert_range (possibly leveraging
> > vm_insert_page, or not; you have to see what is best).
> >   5. Replace those callers that can take advantage of vm_insert_range
> >   6. Remove vm_insert_page and replace callers with vm_insert_range
> > (only if it is not worth to keep vm_insert_range, again justifying it
> > *on its own merits*)
>
> Step 4 to 6, going to do it.  It is part of plan now :-)
>

Fine, but you haven't answered to the other parts of my email: you
don't explain why you choose one alternative over the others, you
simply keep changing the approach.

Cheers,
Miguel
Souptick Joarder Oct. 6, 2018, 5:14 a.m. UTC | #19
On Fri, Oct 5, 2018 at 11:39 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> On Fri, Oct 5, 2018 at 2:11 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
> >
> > On Fri, Oct 5, 2018 at 4:19 PM Miguel Ojeda
> > <miguel.ojeda.sandonis@gmail.com> wrote:
> > >
> > >   1. Introduce the vmf_* API
> > >   2. Change all PF-users users to that (leaving all non-PF ones
> > > untouched!) -- if this is too big, you can split this patch into
> > > several patches, one per subsystem, etc.
> >
> > We are done with step 2. All the PF-users are converted to use
> > vmf_insert_page. ( Ref - linux-next-20181005)
>
> They are not supposed to be "steps". You did it with 70+ commits (!!)
> over the course of several months. Why a tree wasn't created, stuff
> developed there, and when done, submitted it for review?

Because we already had a plan for the entire vm_fault_t migration and
the *instruction* was to send one patch per driver.
>
> > >
> > > Otherwise, if you want to pursue Matthew's idea:
> > >
> > >   4. Introduce the vm_insert_range (possibly leveraging
> > > vm_insert_page, or not; you have to see what is best).
> > >   5. Replace those callers that can take advantage of vm_insert_range
> > >   6. Remove vm_insert_page and replace callers with vm_insert_range
> > > (only if it is not worth to keep vm_insert_range, again justifying it
> > > *on its own merits*)
> >
> > Step 4 to 6, going to do it.  It is part of plan now :-)
> >
>
> Fine, but you haven't answered to the other parts of my email: you
> don't explain why you choose one alternative over the others, you
> simply keep changing the approach.

> We are going in circles here. That you want to convert vm_insert_page
> to vmf_insert_page for the PF case is fine and understood. However,
> you don't *need* to introduce a new name for the remaining non-PF
> cases if the function is going to be the exact same thing as before.
> You say "The final goal is to remove vm_insert_page", but you haven't
> justified *why* you need to remove that name.

I think I have given that answer. If we don't remove vm_insert_page,
future #PF callers will have the option to use it, but they should be
restricted. How are we going to restrict vm_insert_page in one half
of the kernel when the other half is still using it? Is there any way?
(I don't know.)
Miguel Ojeda Oct. 6, 2018, 10:49 a.m. UTC | #20
On Sat, Oct 6, 2018 at 7:11 AM Souptick Joarder <jrdr.linux@gmail.com> wrote:
>
> On Fri, Oct 5, 2018 at 11:39 PM Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com> wrote:
> > They are not supposed to be "steps". You did it with 70+ commits (!!)
> > over the course of several months. Why a tree wasn't created, stuff
> > developed there, and when done, submitted it for review?
>
> Because we already have a plan for entire vm_fault_t migration and
> the * instruction * was to send one patch per driver.

The instruction?

> >
> > Fine, but you haven't answered to the other parts of my email: you
> > don't explain why you choose one alternative over the others, you
> > simply keep changing the approach.
>
> We are going in circles here. That you want to convert vm_insert_page
> to vmf_insert_page for the PF case is fine and understood. However,
> you don't *need* to introduce a new name for the remaining non-PF
> cases if the function is going to be the exact same thing as before.
> You say "The final goal is to remove vm_insert_page", but you haven't
> justified *why* you need to remove that name.
>
> I think I have given that answer. If we don't remove vm_insert_page,
> future #PF caller will have option to use it. But those should be
> restricted. How are we going to restrict vm_insert_page in one half
> of kernel when other half is still using it  ?? Is there any way ? ( I don't
> know)

Ah, so that is what you are concerned about: future misuses. Well, I
don't really see the problem. There are only ~18 calls to
vm_insert_page() in the entire kernel: checking if people is using it
properly for a while should be easy. As long as the new behavior is
documented properly, it should be fine. If you are really concerned
about mistakes being made, then fine, we can rename it as I suggested.

Now, the new vm_insert_range() is another topic. It simplifies a few
of the callers and buys us the rename at the same time, so I am also
OK with it.

As you see, I am not against the changes -- it is just that they
should be clearly justified. :-) It wasn't clear what your problem with
the current vm_insert_page() is.

Cheers,
Miguel
Souptick Joarder Oct. 23, 2018, 12:14 p.m. UTC | #21
On Sat, Oct 6, 2018 at 4:19 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> On Sat, Oct 6, 2018 at 7:11 AM Souptick Joarder <jrdr.linux@gmail.com> wrote:
> >
> > On Fri, Oct 5, 2018 at 11:39 PM Miguel Ojeda
> > <miguel.ojeda.sandonis@gmail.com> wrote:
> > > They are not supposed to be "steps". You did it with 70+ commits (!!)
> > > over the course of several months. Why a tree wasn't created, stuff
> > > developed there, and when done, submitted it for review?
> >
> > Because we already have a plan for entire vm_fault_t migration and
> > the * instruction * was to send one patch per driver.
>
> The instruction?

Sorry for the delayed response.
The instruction came from Matthew Wilcox, who is supervising the entire
vm_fault_t migration work :-)

>
> > >
> > > Fine, but you haven't answered to the other parts of my email: you
> > > don't explain why you choose one alternative over the others, you
> > > simply keep changing the approach.
> >
> > We are going in circles here. That you want to convert vm_insert_page
> > to vmf_insert_page for the PF case is fine and understood. However,
> > you don't *need* to introduce a new name for the remaining non-PF
> > cases if the function is going to be the exact same thing as before.
> > You say "The final goal is to remove vm_insert_page", but you haven't
> > justified *why* you need to remove that name.
> >
> > I think I have given that answer. If we don't remove vm_insert_page,
> > future #PF caller will have option to use it. But those should be
> > restricted. How are we going to restrict vm_insert_page in one half
> > of kernel when other half is still using it  ?? Is there any way ? ( I don't
> > know)
>
> Ah, so that is what you are concerned about: future misuses. Well, I
> don't really see the problem. There are only ~18 calls to
> vm_insert_page() in the entire kernel: checking if people is using it
> properly for a while should be easy. As long as the new behavior is
> documented properly, it should be fine. If you are really concerned
> about mistakes being made, then fine, we can rename it as I suggested.
>
> Now, the new vm_insert_range() is another topic. It simplifies a few
> of the callers and buys us the rename at the same time, so I am also
> OK with it.
>
> As you see, I am not against the changes -- it is just that they
> should clearly justified. :-) It wasn't clear what your problem with
> the current vm_insert_page() is.
>
> Cheers,
> Miguel
Matthew Wilcox Oct. 23, 2018, 12:24 p.m. UTC | #22
On Tue, Oct 23, 2018 at 05:44:32PM +0530, Souptick Joarder wrote:
> On Sat, Oct 6, 2018 at 4:19 PM Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com> wrote:
> >
> > On Sat, Oct 6, 2018 at 7:11 AM Souptick Joarder <jrdr.linux@gmail.com> wrote:
> > >
> > > On Fri, Oct 5, 2018 at 11:39 PM Miguel Ojeda
> > > <miguel.ojeda.sandonis@gmail.com> wrote:
> > > > They are not supposed to be "steps". You did it with 70+ commits (!!)
> > > > over the course of several months. Why a tree wasn't created, stuff
> > > > developed there, and when done, submitted it for review?
> > >
> > > Because we already have a plan for entire vm_fault_t migration and
> > > the * instruction * was to send one patch per driver.
> >
> > The instruction?
> 
> Sorry for the delayed response.
> Instruction from Matthew  Wilcox who is supervising the entire vm_fault_t
> migration work :-)

Hang on.  That was for the initial vm_fault_t conversion in which each
step was clearly an improvement.  What you're looking at now is far
from that.
Souptick Joarder Oct. 23, 2018, 12:33 p.m. UTC | #23
On Tue, Oct 23, 2018 at 5:54 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Oct 23, 2018 at 05:44:32PM +0530, Souptick Joarder wrote:
> > On Sat, Oct 6, 2018 at 4:19 PM Miguel Ojeda
> > <miguel.ojeda.sandonis@gmail.com> wrote:
> > >
> > > On Sat, Oct 6, 2018 at 7:11 AM Souptick Joarder <jrdr.linux@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 5, 2018 at 11:39 PM Miguel Ojeda
> > > > <miguel.ojeda.sandonis@gmail.com> wrote:
> > > > > They are not supposed to be "steps". You did it with 70+ commits (!!)
> > > > > over the course of several months. Why a tree wasn't created, stuff
> > > > > developed there, and when done, submitted it for review?
> > > >
> > > > Because we already have a plan for entire vm_fault_t migration and
> > > > the * instruction * was to send one patch per driver.
> > >
> > > The instruction?
> >
> > Sorry for the delayed response.
> > Instruction from Matthew  Wilcox who is supervising the entire vm_fault_t
> > migration work :-)
>
> Hang on.  That was for the initial vm_fault_t conversion in which each
> step was clearly an improvement.  What you're looking at now is far
> from that.

Ok. But my understanding was that the vm_insert_range approach came up
as part of converting vm_insert_page into vmf_insert_page, which is
still part of the original vm_fault_t conversion discussion. No?
Matthew Wilcox Oct. 23, 2018, 12:59 p.m. UTC | #24
On Tue, Oct 23, 2018 at 06:03:42PM +0530, Souptick Joarder wrote:
> On Tue, Oct 23, 2018 at 5:54 PM Matthew Wilcox <willy@infradead.org> wrote:
> > On Tue, Oct 23, 2018 at 05:44:32PM +0530, Souptick Joarder wrote:
> > > Instruction from Matthew  Wilcox who is supervising the entire vm_fault_t
> > > migration work :-)
> >
> > Hang on.  That was for the initial vm_fault_t conversion in which each
> > step was clearly an improvement.  What you're looking at now is far
> > from that.
> 
> Ok. But my understanding was, the approach of vm_insert_range comes
> into discussion as part of converting vm_insert_page into vmf_insert_page
> which is still part of original vm_fault_t conversion discussion.  No ?

No.  The initial part (converting all page fault methods to vm_fault_t)
is done.  What remains undone (looking at akpm's tree) is changing the
typedef of vm_fault_t from int to unsigned int.  That will prevent new
page fault handlers with the wrong type from being added.

I don't necessarily want to get rid of vm_insert_page().  Maybe it will
make sense to do that, and maybe not.  What I do want to see is thought,
and not "Matthew told me to do it", when I didn't.
Souptick Joarder Oct. 23, 2018, 1:15 p.m. UTC | #25
On Tue, Oct 23, 2018 at 6:29 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Oct 23, 2018 at 06:03:42PM +0530, Souptick Joarder wrote:
> > On Tue, Oct 23, 2018 at 5:54 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > On Tue, Oct 23, 2018 at 05:44:32PM +0530, Souptick Joarder wrote:
> > > > Instruction from Matthew  Wilcox who is supervising the entire vm_fault_t
> > > > migration work :-)
> > >
> > > Hang on.  That was for the initial vm_fault_t conversion in which each
> > > step was clearly an improvement.  What you're looking at now is far
> > > from that.
> >
> > Ok. But my understanding was, the approach of vm_insert_range comes
> > into discussion as part of converting vm_insert_page into vmf_insert_page
> > which is still part of original vm_fault_t conversion discussion.  No ?
>
> No.  The initial part (converting all page fault methods to vm_fault_t)
> is done.  What remains undone (looking at akpm's tree) is changing the
> typedef of vm_fault_t from int to unsigned int.  That will prevent new
> page fault handlers with the wrong type from being added.

Ok, I will post the final vm_fault_t typedef patch.

>
> I don't necessarily want to get rid of vm_insert_page().  Maybe it will
> make sense to do that, and maybe not.  What I do want to see is thought,
> and not "Matthew told me to do it", when I didn't.

I didn't mean it any other way. Sorry about that.
I will work on it.
diff mbox series

Patch

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6656647..58d7971 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1598,7 +1598,7 @@  static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma
 	pages += off;
 
 	do {
-		int ret = vm_insert_page(vma, uaddr, *pages++);
+		int ret = vm_insert_kmem_page(vma, uaddr, *pages++);
 		if (ret) {
 			pr_err("Remapping memory failed: %d\n", ret);
 			return ret;
diff --git a/drivers/auxdisplay/cfag12864bfb.c b/drivers/auxdisplay/cfag12864bfb.c
index 40c8a55..82fd627 100644
--- a/drivers/auxdisplay/cfag12864bfb.c
+++ b/drivers/auxdisplay/cfag12864bfb.c
@@ -52,7 +52,7 @@ 
 
 static int cfag12864bfb_mmap(struct fb_info *info, struct vm_area_struct *vma)
 {
-	return vm_insert_page(vma, vma->vm_start,
+	return vm_insert_kmem_page(vma, vma->vm_start,
 		virt_to_page(cfag12864b_buffer));
 }
 
diff --git a/drivers/auxdisplay/ht16k33.c b/drivers/auxdisplay/ht16k33.c
index a43276c..64de30b 100644
--- a/drivers/auxdisplay/ht16k33.c
+++ b/drivers/auxdisplay/ht16k33.c
@@ -224,7 +224,7 @@  static int ht16k33_mmap(struct fb_info *info, struct vm_area_struct *vma)
 {
 	struct ht16k33_priv *priv = info->par;
 
-	return vm_insert_page(vma, vma->vm_start,
+	return vm_insert_kmem_page(vma, vma->vm_start,
 			      virt_to_page(priv->fbdev.buffer));
 }
 
diff --git a/drivers/firewire/core-iso.c b/drivers/firewire/core-iso.c
index 051327a..5f1548d 100644
--- a/drivers/firewire/core-iso.c
+++ b/drivers/firewire/core-iso.c
@@ -112,7 +112,7 @@  int fw_iso_buffer_map_vma(struct fw_iso_buffer *buffer,
 
 	uaddr = vma->vm_start;
 	for (i = 0; i < buffer->page_count; i++) {
-		err = vm_insert_page(vma, uaddr, buffer->pages[i]);
+		err = vm_insert_kmem_page(vma, uaddr, buffer->pages[i]);
 		if (err)
 			return err;
 
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index a8db758..57eb7af 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -234,7 +234,7 @@  static int rockchip_drm_gem_object_mmap_iommu(struct drm_gem_object *obj,
 		return -ENXIO;
 
 	for (i = offset; i < end; i++) {
-		ret = vm_insert_page(vma, uaddr, rk_obj->pages[i]);
+		ret = vm_insert_kmem_page(vma, uaddr, rk_obj->pages[i]);
 		if (ret)
 			return ret;
 		uaddr += PAGE_SIZE;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8..5f42d35 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2477,6 +2477,8 @@  unsigned long change_prot_numa(struct vm_area_struct *vma,
 struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+				struct page *page);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
diff --git a/kernel/kcov.c b/kernel/kcov.c
index 3ebd09e..2afaeb4 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -293,8 +293,8 @@  static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
 		spin_unlock(&kcov->lock);
 		for (off = 0; off < size; off += PAGE_SIZE) {
 			page = vmalloc_to_page(kcov->area + off);
-			if (vm_insert_page(vma, vma->vm_start + off, page))
-				WARN_ONCE(1, "vm_insert_page() failed");
+			if (vm_insert_kmem_page(vma, vma->vm_start + off, page))
+				WARN_ONCE(1, "vm_insert_kmem_page() failed");
 		}
 		return 0;
 	}
diff --git a/mm/memory.c b/mm/memory.c
index c467102..b800c10 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1682,6 +1682,75 @@  pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
 	return pte_alloc_map_lock(mm, pmd, addr, ptl);
 }
 
+static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+		struct page *page, pgprot_t prot)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	int retval;
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	retval = -EINVAL;
+	if (PageAnon(page))
+		goto out;
+	retval = -ENOMEM;
+	flush_dcache_page(page);
+	pte = get_locked_pte(mm, addr, &ptl);
+	if (!pte)
+		goto out;
+	retval = -EBUSY;
+	if (!pte_none(*pte))
+		goto out_unlock;
+
+	get_page(page);
+	inc_mm_counter_fast(mm, mm_counter_file(page));
+	page_add_file_rmap(page, false);
+	set_pte_at(mm, addr, pte, mk_pte(page, prot));
+
+	retval = 0;
+	pte_unmap_unlock(pte, ptl);
+	return retval;
+out_unlock:
+	pte_unmap_unlock(pte, ptl);
+out:
+	return retval;
+}
+
+/**
+ * vm_insert_kmem_page - insert single page into user vma
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @page: source kernel page
+ *
+ * This allows drivers to insert an individual kernel page into a user vma.
+ * This API should be used outside page fault handler context.
+ *
+ * Previously, drivers did the same with vm_insert_page. But
+ * vm_insert_page will be converted to vmf_insert_page, which will be
+ * used in fault handler context, and the return type of vmf_insert_page
+ * will be vm_fault_t.
+ *
+ * But there are places where drivers need to map kernel memory into a
+ * user vma outside fault handler context. As vmf_insert_page will be
+ * restricted to page fault handlers, vm_insert_kmem_page can be used to
+ * map kernel memory to a user vma outside fault handler context.
+ */
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+			struct page *page)
+{
+	if (addr < vma->vm_start || addr >= vma->vm_end)
+		return -EFAULT;
+	if (!page_count(page))
+		return -EINVAL;
+	if (!(vma->vm_flags & VM_MIXEDMAP)) {
+		BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
+		BUG_ON(vma->vm_flags & VM_PFNMAP);
+		vma->vm_flags |= VM_MIXEDMAP;
+	}
+	return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(vm_insert_kmem_page);
+
 /*
  * This is the old fallback for page remapping.
  *
diff --git a/mm/nommu.c b/mm/nommu.c
index e4aac33..153b8c8 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -473,6 +473,13 @@  int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+				struct page *page)
+{
+	return -EINVAL;
+}
+EXPORT_SYMBOL(vm_insert_kmem_page);
+
 /*
  *  sys_brk() for the most part doesn't need the global kernel
  *  lock, except when an application is doing something nasty
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a728fc4..61d279f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2251,7 +2251,7 @@  int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
 		struct page *page = vmalloc_to_page(kaddr);
 		int ret;
 
-		ret = vm_insert_page(vma, uaddr, page);
+		ret = vm_insert_kmem_page(vma, uaddr, page);
 		if (ret)
 			return ret;