Message ID: 20180927175123.GA16367@jordon-HP-15-Notebook-PC (mailing list archive)
State: New, archived
Series: mm: Introduce new function vm_insert_kmem_page
On Thu 27-09-18 23:21:23, Souptick Joarder wrote:
> vm_insert_kmem_page is similar to vm_insert_page and will
> be used by drivers to map kernel (kmalloc/vmalloc/pages)
> allocated memory to a user vma.
>
> Previously vm_insert_page was used both in page fault
> handler context and outside it. When vm_insert_page is used
> in page fault handler context, each driver has to map an errno
> to a VM_FAULT_CODE in its own way. But as part of the
> vm_fault_t migration all the page fault handlers were cleaned
> up to use the new vmf_insert_page. Going forward,
> vm_insert_page will be removed by converting it to
> vmf_insert_page.
>
> But there are places where vm_insert_page is used outside
> page fault handler context, and converting those to
> vmf_insert_page is not a good approach, as drivers would end
> up with new VM_FAULT_CODE-to-errno conversion code that
> makes each user more complex.
>
> So this new vm_insert_kmem_page can be used to map kernel
> memory to a user vma outside page fault handler context.
>
> In short, vmf_insert_page will be used in page fault handler
> context and vm_insert_kmem_page will be used to map kernel
> memory to a user vma outside page fault handler context.
>
> We will slowly convert all users of vm_insert_page to
> vm_insert_kmem_page once this API is available in Linus' tree.

In general I do not like patches adding new exports/functionality
without any user added at the same time. I am not going to look at the
implementation right now, but the above opens more questions than it
gives answers. Why do we have to distinguish #PF from other paths?
On Fri, Sep 28, 2018 at 12:02 AM Michal Hocko <mhocko@kernel.org> wrote:
> On Thu 27-09-18 23:21:23, Souptick Joarder wrote:
> > [...]
>
> In general I do not like patches adding new exports/functionality
> without any user added at the same time. I am not going to look at the
> implementation right now, but the above opens more questions than it
> gives answers. Why do we have to distinguish #PF from other paths?
Going forward, the plan is to restrict future drivers from using
vm_insert_page in #PF context (it would reintroduce the errno to
VM_FAULT_CODE mapping code that was already cleaned up for existing
drivers) and to make them use vmf_insert_page, which returns a
VM_FAULT_CODE. That is not possible until both the vm_insert_page and
vmf_insert_page APIs exist.

But there are some consumers of vm_insert_page which use it outside #PF
context. A straightforward conversion of vm_insert_page to
vmf_insert_page won't work there, as those call sites expect an errno,
not a vm_fault_t, in return.

e.g. -  drivers/auxdisplay/cfag12864bfb.c, line 55
        drivers/auxdisplay/ht16k33.c, line 227
        drivers/firewire/core-iso.c, line 115
        drivers/gpu/drm/rockchip/rockchip_drm_gem.c, line 237
        drivers/gpu/drm/xen/xen_drm_front_gem.c, line 253
        drivers/iommu/dma-iommu.c, line 600
        drivers/media/common/videobuf2/videobuf2-dma-sg.c, line 343
        drivers/media/usb/usbvision/usbvision-video.c, line 1056
        drivers/xen/gntalloc.c, line 548
        drivers/xen/gntdev.c, line 1149
        drivers/xen/privcmd-buf.c, line 184
        mm/vmalloc.c, line 2254
        net/ipv4/tcp.c, line 1806
        net/packet/af_packet.c, line 4407

These are the approaches which could have been taken to handle this
scenario:

1. Replace vm_insert_page with vmf_insert_page and then write a few
   extra lines of code to convert the VM_FAULT_CODE to an errno, which
   makes driver users more complex (the reverse errno to VM_FAULT_CODE
   mapping was also cleaned up as part of the vm_fault_t migration, so
   introducing anything similar again is not preferred).

2. Maintain both vm_insert_page and vmf_insert_page and use them in
   their respective places. But that won't guarantee that
   vm_insert_page will never be used in #PF context.

3. Introduce an API similar to vm_insert_page, convert all non-#PF
   consumers to use it, and finally remove vm_insert_page by converting
   it to vmf_insert_page.

And the 3rd approach was taken by introducing vm_insert_kmem_page().
Hi Michal,

On Fri, Sep 28, 2018 at 5:57 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
> On Fri, Sep 28, 2018 at 12:02 AM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> > In general I do not like patches adding new exports/functionality
> > without any user added at the same time. I am not going to look at the
> > implementation right now, but the above opens more questions than it
> > gives answers. Why do we have to distinguish #PF from other paths?
>
> [...]
>
> 3. Introduce an API similar to vm_insert_page, convert all non-#PF
>    consumers to use it, and finally remove vm_insert_page by converting
>    it to vmf_insert_page.
> And the 3rd approach was taken by introducing vm_insert_kmem_page().

Does this 3rd approach look good, or is there a better way to handle
this scenario? The remaining vm_fault_t migration work depends on this
patch.
On Fri 28-09-18 17:57:17, Souptick Joarder wrote:
> On Fri, Sep 28, 2018 at 12:02 AM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> > Why do we have to distinguish #PF from other paths?
>
> [...]
>
> 3. Introduce an API similar to vm_insert_page, convert all non-#PF
>    consumers to use it, and finally remove vm_insert_page by converting
>    it to vmf_insert_page.
> And the 3rd approach was taken by introducing vm_insert_kmem_page().

OK, then make sure to convert some of those users in the same patch.
This will allow us both to review the API and to check that it serves
its purpose. Other users can also see how the initial conversion has
been done and do it in a similar way.
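[For context: the conversion requested here is mechanical at each
non-#PF call site, because vm_insert_kmem_page keeps vm_insert_page's
errno return contract. The hunk below is a hypothetical illustration of
the pattern only; demo_drv_mmap, demo_dev_page, and their variables are
made up, not taken from any of the listed drivers.]

```diff
 static int demo_drv_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct page *page = demo_dev_page(file);
 	int ret;
 
-	/* outside #PF context: an errno return is expected here */
-	ret = vm_insert_page(vma, vma->vm_start, page);
+	/* same call shape, same errno return, non-#PF-only helper */
+	ret = vm_insert_kmem_page(vma, vma->vm_start, page);
 	if (ret < 0)
 		return ret;
 	return 0;
 }
```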
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8..5f42d35 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2477,6 +2477,8 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+			struct page *page);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
diff --git a/mm/memory.c b/mm/memory.c
index c467102..b800c10 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1682,6 +1682,75 @@ pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
 	return pte_alloc_map_lock(mm, pmd, addr, ptl);
 }
 
+static int insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+			struct page *page, pgprot_t prot)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	int retval;
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	retval = -EINVAL;
+	if (PageAnon(page))
+		goto out;
+	retval = -ENOMEM;
+	flush_dcache_page(page);
+	pte = get_locked_pte(mm, addr, &ptl);
+	if (!pte)
+		goto out;
+	retval = -EBUSY;
+	if (!pte_none(*pte))
+		goto out_unlock;
+
+	get_page(page);
+	inc_mm_counter_fast(mm, mm_counter_file(page));
+	page_add_file_rmap(page, false);
+	set_pte_at(mm, addr, pte, mk_pte(page, prot));
+
+	retval = 0;
+	pte_unmap_unlock(pte, ptl);
+	return retval;
+out_unlock:
+	pte_unmap_unlock(pte, ptl);
+out:
+	return retval;
+}
+
+/**
+ * vm_insert_kmem_page - insert single page into user vma
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @page: source kernel page
+ *
+ * This allows drivers to insert individual kernel memory into a user vma.
+ * This API should be used outside page fault handlers context.
+ *
+ * Previously the same has been done with vm_insert_page by drivers. But
+ * vm_insert_page will be converted to vmf_insert_page and will be used
+ * in fault handlers context and return type of vmf_insert_page will be
+ * vm_fault_t type.
+ *
+ * But there are places where drivers need to map kernel memory into user
+ * vma outside fault handlers context. As vmf_insert_page will be restricted
+ * to use within page fault handlers, vm_insert_kmem_page could be used
+ * to map kernel memory to user vma outside fault handlers context.
+ */
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+			struct page *page)
+{
+	if (addr < vma->vm_start || addr >= vma->vm_end)
+		return -EFAULT;
+	if (!page_count(page))
+		return -EINVAL;
+	if (!(vma->vm_flags & VM_MIXEDMAP)) {
+		BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
+		BUG_ON(vma->vm_flags & VM_PFNMAP);
+		vma->vm_flags |= VM_MIXEDMAP;
+	}
+	return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(vm_insert_kmem_page);
+
 /*
  * This is the old fallback for page remapping.
  *
diff --git a/mm/nommu.c b/mm/nommu.c
index e4aac33..153b8c8 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -473,6 +473,13 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
+			struct page *page)
+{
+	return -EINVAL;
+}
+EXPORT_SYMBOL(vm_insert_kmem_page);
+
 /*
  * sys_brk() for the most part doesn't need the global kernel
  * lock, except when an application is doing something nasty
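[For context: a caller outside #PF context would use the new helper
roughly as below. This is a hypothetical driver mmap handler written
against the API added by this patch; struct demo_dev, demo_mmap, and
the page array are made up for illustration and appear nowhere in the
patch or in the listed drivers.]

```c
#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical device holding kernel-allocated pages to expose. */
struct demo_dev {
	struct page **pages;
	int nr_pages;
};

static int demo_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct demo_dev *dev = file->private_data;
	unsigned long uaddr = vma->vm_start;
	int i, ret;

	/*
	 * Map each kernel page into the user vma. The helper returns
	 * an errno, so it can be propagated directly from ->mmap with
	 * no vm_fault_t conversion.
	 */
	for (i = 0; i < dev->nr_pages; i++) {
		ret = vm_insert_kmem_page(vma, uaddr, dev->pages[i]);
		if (ret < 0)
			return ret;
		uaddr += PAGE_SIZE;
	}
	return 0;
}
```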
vm_insert_kmem_page is similar to vm_insert_page and will be used by
drivers to map kernel (kmalloc/vmalloc/pages) allocated memory to a
user vma.

Previously vm_insert_page was used both in page fault handler context
and outside it. When vm_insert_page is used in page fault handler
context, each driver has to map an errno to a VM_FAULT_CODE in its own
way. But as part of the vm_fault_t migration all the page fault
handlers were cleaned up to use the new vmf_insert_page. Going forward,
vm_insert_page will be removed by converting it to vmf_insert_page.

But there are places where vm_insert_page is used outside page fault
handler context, and converting those to vmf_insert_page is not a good
approach, as drivers would end up with new VM_FAULT_CODE-to-errno
conversion code that makes each user more complex.

So this new vm_insert_kmem_page can be used to map kernel memory to a
user vma outside page fault handler context.

In short, vmf_insert_page will be used in page fault handler context
and vm_insert_kmem_page will be used to map kernel memory to a user vma
outside page fault handler context.

We will slowly convert all users of vm_insert_page to
vm_insert_kmem_page once this API is available in Linus' tree.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/nommu.c         |  7 ++++++
 3 files changed, 78 insertions(+)