diff mbox series

mm: Introduce new function vm_insert_kmem_page

Message ID 20180927175123.GA16367@jordon-HP-15-Notebook-PC (mailing list archive)
State New, archived
Headers show
Series mm: Introduce new function vm_insert_kmem_page | expand

Commit Message

Souptick Joarder Sept. 27, 2018, 5:51 p.m. UTC
vm_insert_kmem_page is similar to vm_insert_page and will
be used by drivers to map kernel (kmalloc/vmalloc/pages)
allocated memory to user vma.

Previously vm_insert_page is used for both page fault
handlers and outside page fault handlers context. When
vm_insert_page is used in page fault handlers context,
each driver have to map errno to VM_FAULT_CODE in their
own way. But as part of vm_fault_t migration all the
page fault handlers are cleaned up by using new vmf_insert_page.
Going forward, vm_insert_page will be removed by converting
it to vmf_insert_page.
 
But their are places where vm_insert_page is used outside
page fault handlers context and converting those to
vmf_insert_page is not a good approach as drivers will end
up with new VM_FAULT_CODE to errno conversion code and it will
make each user more complex.

So this new vm_insert_kmem_page can be used to map kernel
memory to user vma outside page fault handler context.

In short, vmf_insert_page will be used in page fault handlers
context and vm_insert_kmem_page will be used to map kernel
memory to user vma outside page fault handlers context.

We will slowly convert all the user of vm_insert_page to
vm_insert_kmem_page after this API be available in linus tree.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/nommu.c         |  7 ++++++
 3 files changed, 78 insertions(+)

Comments

Michal Hocko Sept. 27, 2018, 6:32 p.m. UTC | #1
On Thu 27-09-18 23:21:23, Souptick Joarder wrote:
> vm_insert_kmem_page is similar to vm_insert_page and will
> be used by drivers to map kernel (kmalloc/vmalloc/pages)
> allocated memory to user vma.
> 
> Previously vm_insert_page is used for both page fault
> handlers and outside page fault handlers context. When
> vm_insert_page is used in page fault handlers context,
> each driver have to map errno to VM_FAULT_CODE in their
> own way. But as part of vm_fault_t migration all the
> page fault handlers are cleaned up by using new vmf_insert_page.
> Going forward, vm_insert_page will be removed by converting
> it to vmf_insert_page.
>  
> But their are places where vm_insert_page is used outside
> page fault handlers context and converting those to
> vmf_insert_page is not a good approach as drivers will end
> up with new VM_FAULT_CODE to errno conversion code and it will
> make each user more complex.
> 
> So this new vm_insert_kmem_page can be used to map kernel
> memory to user vma outside page fault handler context.
> 
> In short, vmf_insert_page will be used in page fault handlers
> context and vm_insert_kmem_page will be used to map kernel
> memory to user vma outside page fault handlers context.
> 
> We will slowly convert all the user of vm_insert_page to
> vm_insert_kmem_page after this API be available in linus tree.

In general I do not like patches adding a new exports/functionality
without any user added at the same time. I am not going to look at the
implementation right now but the above opens more questions than it
gives answers. Why do we have to distinguish #PF from other paths?
Souptick Joarder Sept. 28, 2018, 12:27 p.m. UTC | #2
On Fri, Sep 28, 2018 at 12:02 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 27-09-18 23:21:23, Souptick Joarder wrote:
> > vm_insert_kmem_page is similar to vm_insert_page and will
> > be used by drivers to map kernel (kmalloc/vmalloc/pages)
> > allocated memory to user vma.
> >
> > Previously vm_insert_page is used for both page fault
> > handlers and outside page fault handlers context. When
> > vm_insert_page is used in page fault handlers context,
> > each driver have to map errno to VM_FAULT_CODE in their
> > own way. But as part of vm_fault_t migration all the
> > page fault handlers are cleaned up by using new vmf_insert_page.
> > Going forward, vm_insert_page will be removed by converting
> > it to vmf_insert_page.
> >
> > But their are places where vm_insert_page is used outside
> > page fault handlers context and converting those to
> > vmf_insert_page is not a good approach as drivers will end
> > up with new VM_FAULT_CODE to errno conversion code and it will
> > make each user more complex.
> >
> > So this new vm_insert_kmem_page can be used to map kernel
> > memory to user vma outside page fault handler context.
> >
> > In short, vmf_insert_page will be used in page fault handlers
> > context and vm_insert_kmem_page will be used to map kernel
> > memory to user vma outside page fault handlers context.
> >
> > We will slowly convert all the user of vm_insert_page to
> > vm_insert_kmem_page after this API be available in linus tree.
>
> In general I do not like patches adding a new exports/functionality
> without any user added at the same time. I am not going to look at the
> implementation right now but the above opens more questions than it
> gives answers. Why do we have to distinguish #PF from other paths?

Going forward, the plan is to restrict future drivers not to use vm_insert_page
( *it will generate new errno to VM_FAULT_CODE mapping code for new drivers
which were already cleaned up for existing drivers*) in #PF context but to make
use of vmf_insert_page which returns VMF_FAULT_CODE and that is not possible
until both vm_insert_page and vmf_insert_page API exists.

But there are some consumers of vm_insert_page which use it outside #PF context.
straight forward conversion of vm_insert_page to vmf_insert_page won't
work there as
those function calls expects errno not vm_fault_t in return.

e.g - drivers/auxdisplay/cfag12864bfb.c, line 55
        drivers/auxdisplay/ht16k33.c, line 227
        drivers/firewire/core-iso.c, line 115
        drivers/gpu/drm/rockchip/rockchip_drm_gem.c, line 237
        drivers/gpu/drm/xen/xen_drm_front_gem.c, line 253
        drivers/iommu/dma-iommu.c, line 600
        drivers/media/common/videobuf2/videobuf2-dma-sg.c, line 343
        drivers/media/usb/usbvision/usbvision-video.c, line 1056
        drivers/xen/gntalloc.c, line 548
        drivers/xen/gntdev.c, line 1149
        drivers/xen/privcmd-buf.c, line 184
        mm/vmalloc.c, line 2254
        net/ipv4/tcp.c, line 1806
        net/packet/af_packet.c, line 4407

These are the approaches which could have been taken to handle this scenario -

1. Replace vm_insert_page with vmf_insert_page and then write few
   extra lines of code to convert VM_FAULT_CODE to errno which
   makes driver users more complex ( also the reverse mapping errno to
   VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
   not preferred to introduce anything similar again)

2. Maintain both vm_insert_page and vmf_insert_page and use it in
   respective places. But it won't gurantee that vm_insert_page will
   never be used in #PF context.

3. Introduce a similar API like vm_insert_page, convert all non #PF
   consumer to use it and finally remove vm_insert_page by converting
   it to vmf_insert_page.

And the 3rd approach was taken by introducing vm_insert_kmem_page().
Souptick Joarder Oct. 2, 2018, 8:21 a.m. UTC | #3
Hi Michal,

On Fri, Sep 28, 2018 at 5:57 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
>
> On Fri, Sep 28, 2018 at 12:02 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 27-09-18 23:21:23, Souptick Joarder wrote:
> > > vm_insert_kmem_page is similar to vm_insert_page and will
> > > be used by drivers to map kernel (kmalloc/vmalloc/pages)
> > > allocated memory to user vma.
> > >
> > > Previously vm_insert_page is used for both page fault
> > > handlers and outside page fault handlers context. When
> > > vm_insert_page is used in page fault handlers context,
> > > each driver have to map errno to VM_FAULT_CODE in their
> > > own way. But as part of vm_fault_t migration all the
> > > page fault handlers are cleaned up by using new vmf_insert_page.
> > > Going forward, vm_insert_page will be removed by converting
> > > it to vmf_insert_page.
> > >
> > > But their are places where vm_insert_page is used outside
> > > page fault handlers context and converting those to
> > > vmf_insert_page is not a good approach as drivers will end
> > > up with new VM_FAULT_CODE to errno conversion code and it will
> > > make each user more complex.
> > >
> > > So this new vm_insert_kmem_page can be used to map kernel
> > > memory to user vma outside page fault handler context.
> > >
> > > In short, vmf_insert_page will be used in page fault handlers
> > > context and vm_insert_kmem_page will be used to map kernel
> > > memory to user vma outside page fault handlers context.
> > >
> > > We will slowly convert all the user of vm_insert_page to
> > > vm_insert_kmem_page after this API be available in linus tree.
> >
> > In general I do not like patches adding a new exports/functionality
> > without any user added at the same time. I am not going to look at the
> > implementation right now but the above opens more questions than it
> > gives answers. Why do we have to distinguish #PF from other paths?
>
> Going forward, the plan is to restrict future drivers not to use vm_insert_page
> ( *it will generate new errno to VM_FAULT_CODE mapping code for new drivers
> which were already cleaned up for existing drivers*) in #PF context but to make
> use of vmf_insert_page which returns VMF_FAULT_CODE and that is not possible
> until both vm_insert_page and vmf_insert_page API exists.
>
> But there are some consumers of vm_insert_page which use it outside #PF context.
> straight forward conversion of vm_insert_page to vmf_insert_page won't
> work there as
> those function calls expects errno not vm_fault_t in return.
>
> e.g - drivers/auxdisplay/cfag12864bfb.c, line 55
>         drivers/auxdisplay/ht16k33.c, line 227
>         drivers/firewire/core-iso.c, line 115
>         drivers/gpu/drm/rockchip/rockchip_drm_gem.c, line 237
>         drivers/gpu/drm/xen/xen_drm_front_gem.c, line 253
>         drivers/iommu/dma-iommu.c, line 600
>         drivers/media/common/videobuf2/videobuf2-dma-sg.c, line 343
>         drivers/media/usb/usbvision/usbvision-video.c, line 1056
>         drivers/xen/gntalloc.c, line 548
>         drivers/xen/gntdev.c, line 1149
>         drivers/xen/privcmd-buf.c, line 184
>         mm/vmalloc.c, line 2254
>         net/ipv4/tcp.c, line 1806
>         net/packet/af_packet.c, line 4407
>
> These are the approaches which could have been taken to handle this scenario -
>
> 1. Replace vm_insert_page with vmf_insert_page and then write few
>    extra lines of code to convert VM_FAULT_CODE to errno which
>    makes driver users more complex ( also the reverse mapping errno to
>    VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
>    not preferred to introduce anything similar again)
>
> 2. Maintain both vm_insert_page and vmf_insert_page and use it in
>    respective places. But it won't gurantee that vm_insert_page will
>    never be used in #PF context.
>
> 3. Introduce a similar API like vm_insert_page, convert all non #PF
>    consumer to use it and finally remove vm_insert_page by converting
>    it to vmf_insert_page.
>
> And the 3rd approach was taken by introducing vm_insert_kmem_page().

Does this 3rd approach looks good or their is a better way to handle
this scenario ?
Remaining vm_fault_t migration work has dependency on this patch.
Michal Hocko Oct. 2, 2018, 1:55 p.m. UTC | #4
On Fri 28-09-18 17:57:17, Souptick Joarder wrote:
> On Fri, Sep 28, 2018 at 12:02 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 27-09-18 23:21:23, Souptick Joarder wrote:
> > > vm_insert_kmem_page is similar to vm_insert_page and will
> > > be used by drivers to map kernel (kmalloc/vmalloc/pages)
> > > allocated memory to user vma.
> > >
> > > Previously vm_insert_page is used for both page fault
> > > handlers and outside page fault handlers context. When
> > > vm_insert_page is used in page fault handlers context,
> > > each driver have to map errno to VM_FAULT_CODE in their
> > > own way. But as part of vm_fault_t migration all the
> > > page fault handlers are cleaned up by using new vmf_insert_page.
> > > Going forward, vm_insert_page will be removed by converting
> > > it to vmf_insert_page.
> > >
> > > But their are places where vm_insert_page is used outside
> > > page fault handlers context and converting those to
> > > vmf_insert_page is not a good approach as drivers will end
> > > up with new VM_FAULT_CODE to errno conversion code and it will
> > > make each user more complex.
> > >
> > > So this new vm_insert_kmem_page can be used to map kernel
> > > memory to user vma outside page fault handler context.
> > >
> > > In short, vmf_insert_page will be used in page fault handlers
> > > context and vm_insert_kmem_page will be used to map kernel
> > > memory to user vma outside page fault handlers context.
> > >
> > > We will slowly convert all the user of vm_insert_page to
> > > vm_insert_kmem_page after this API be available in linus tree.
> >
> > In general I do not like patches adding a new exports/functionality
> > without any user added at the same time. I am not going to look at the
> > implementation right now but the above opens more questions than it
> > gives answers. Why do we have to distinguish #PF from other paths?
> 
> Going forward, the plan is to restrict future drivers not to use vm_insert_page
> ( *it will generate new errno to VM_FAULT_CODE mapping code for new drivers
> which were already cleaned up for existing drivers*) in #PF context but to make
> use of vmf_insert_page which returns VMF_FAULT_CODE and that is not possible
> until both vm_insert_page and vmf_insert_page API exists.
> 
> But there are some consumers of vm_insert_page which use it outside #PF context.
> straight forward conversion of vm_insert_page to vmf_insert_page won't
> work there as
> those function calls expects errno not vm_fault_t in return.
> 
> e.g - drivers/auxdisplay/cfag12864bfb.c, line 55
>         drivers/auxdisplay/ht16k33.c, line 227
>         drivers/firewire/core-iso.c, line 115
>         drivers/gpu/drm/rockchip/rockchip_drm_gem.c, line 237
>         drivers/gpu/drm/xen/xen_drm_front_gem.c, line 253
>         drivers/iommu/dma-iommu.c, line 600
>         drivers/media/common/videobuf2/videobuf2-dma-sg.c, line 343
>         drivers/media/usb/usbvision/usbvision-video.c, line 1056
>         drivers/xen/gntalloc.c, line 548
>         drivers/xen/gntdev.c, line 1149
>         drivers/xen/privcmd-buf.c, line 184
>         mm/vmalloc.c, line 2254
>         net/ipv4/tcp.c, line 1806
>         net/packet/af_packet.c, line 4407
> 
> These are the approaches which could have been taken to handle this scenario -
> 
> 1. Replace vm_insert_page with vmf_insert_page and then write few
>    extra lines of code to convert VM_FAULT_CODE to errno which
>    makes driver users more complex ( also the reverse mapping errno to
>    VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
>    not preferred to introduce anything similar again)
> 
> 2. Maintain both vm_insert_page and vmf_insert_page and use it in
>    respective places. But it won't gurantee that vm_insert_page will
>    never be used in #PF context.
> 
> 3. Introduce a similar API like vm_insert_page, convert all non #PF
>    consumer to use it and finally remove vm_insert_page by converting
>    it to vmf_insert_page.
> 
> And the 3rd approach was taken by introducing vm_insert_kmem_page().

OK, the make sure to convert some of those users in the same patch. This
will allow both to review the api and that it serves it purpose. Other
users can also see how the initial conversion has been done and do it in
a similar way.
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8..5f42d35 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2477,6 +2477,8 @@  unsigned long change_prot_numa(struct vm_area_struct *vma,
 struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+				struct page *page);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
diff --git a/mm/memory.c b/mm/memory.c
index c467102..b800c10 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1682,6 +1682,75 @@  pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
 	return pte_alloc_map_lock(mm, pmd, addr, ptl);
 }
 
+static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+		struct page *page, pgprot_t prot)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	int retval;
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	retval = -EINVAL;
+	if (PageAnon(page))
+		goto out;
+	retval = -ENOMEM;
+	flush_dcache_page(page);
+	pte = get_locked_pte(mm, addr, &ptl);
+	if (!pte)
+		goto out;
+	retval = -EBUSY;
+	if (!pte_none(*pte))
+		goto out_unlock;
+
+	get_page(page);
+	inc_mm_counter_fast(mm, mm_counter_file(page));
+	page_add_file_rmap(page, false);
+	set_pte_at(mm, addr, pte, mk_pte(page, prot));
+
+	retval = 0;
+	pte_unmap_unlock(pte, ptl);
+	return retval;
+out_unlock:
+	pte_unmap_unlock(pte, ptl);
+out:
+	return retval;
+}
+
+/**
+ * vm_insert_kmem_page - insert single page into user vma
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @page: source kernel page
+ *
+ * This allows drivers to insert individual kernel memory into a user vma.
+ * This API should be used outside page fault handlers context.
+ *
+ * Previously the same has been done with vm_insert_page by drivers. But
+ * vm_insert_page will be converted to vmf_insert_page and will be used
+ * in fault handlers context and return type of vmf_insert_page will be
+ * vm_fault_t type.
+ *
+ * But there are places where drivers need to map kernel memory into user
+ * vma outside fault handlers context. As vmf_insert_page will be restricted
+ * to use within page fault handlers, vm_insert_kmem_page could be used
+ * to map kernel memory to user vma outside fault handlers context.
+ */
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+			struct page *page)
+{
+	if (addr < vma->vm_start || addr >= vma->vm_end)
+		return -EFAULT;
+	if (!page_count(page))
+		return -EINVAL;
+	if (!(vma->vm_flags & VM_MIXEDMAP)) {
+		BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
+		BUG_ON(vma->vm_flags & VM_PFNMAP);
+		vma->vm_flags |= VM_MIXEDMAP;
+	}
+	return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(vm_insert_kmem_page);
+
 /*
  * This is the old fallback for page remapping.
  *
diff --git a/mm/nommu.c b/mm/nommu.c
index e4aac33..153b8c8 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -473,6 +473,13 @@  int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+				struct page *page)
+{
+	return -EINVAL;
+}
+EXPORT_SYMBOL(vm_insert_kmem_page);
+
 /*
  *  sys_brk() for the most part doesn't need the global kernel
  *  lock, except when an application is doing something nasty