From patchwork Tue Jul 18 08:28:56 2023
X-Patchwork-Submitter: "Kasireddy, Vivek"
X-Patchwork-Id: 13316899
From: Vivek Kasireddy
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Dongwon Kim, David Hildenbrand, Junxiao Chang, Hugh Dickins, Vivek Kasireddy, Peter Xu, Gerd Hoffmann, Jason Gunthorpe, Mike Kravetz
Subject: [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)
Date: Tue, 18 Jul 2023 01:28:56 -0700
Message-Id: <20230718082858.1570809-2-vivek.kasireddy@intel.com>
In-Reply-To: <20230718082858.1570809-1-vivek.kasireddy@intel.com>
References: <20230718082858.1570809-1-vivek.kasireddy@intel.com>

Currently, there does not appear to be any mechanism for letting drivers or other kernel entities know about updates made in a mapping, particularly when a new page is faulted in. Providing notifications for such situations is really useful when using memfds backed by RAM-based filesystems such as shmem or hugetlbfs that also allow FALLOC_FL_PUNCH_HOLE.
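For instance, a driver that wants to react to such faults could subscribe roughly as sketched below (illustrative only, not part of this series; the update_mapping signature matches the op added by this patch, while the my_* names are made up for the example):

	static void my_update_mapping(struct mmu_notifier *mn, struct mm_struct *mm,
				      unsigned long address, unsigned long pfn)
	{
		/* a new page (pfn) now backs 'address'; refresh any cached references */
	}

	static const struct mmu_notifier_ops my_mn_ops = {
		.update_mapping	= my_update_mapping,
	};

	static int my_attach(struct mmu_notifier *mn, struct mm_struct *mm)
	{
		mn->ops = &my_mn_ops;
		return mmu_notifier_register(mn, mm);	/* existing registration API */
	}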
More specifically, when a hole is punched in a memfd (that is backed by shmem or hugetlbfs), a driver can register for notifications associated with range invalidations. However, it would also be useful to have notifications when new pages are faulted in as a result of writes made to the mapping region that overlaps with a previously punched hole. Cc: David Hildenbrand Cc: Mike Kravetz Cc: Hugh Dickins Cc: Peter Xu Cc: Jason Gunthorpe Cc: Gerd Hoffmann Cc: Dongwon Kim Cc: Junxiao Chang Signed-off-by: Vivek Kasireddy --- include/linux/mmu_notifier.h | 27 +++++++++++++++++++++++++++ mm/hugetlb.c | 9 ++++++++- mm/mmu_notifier.c | 17 +++++++++++++++++ mm/shmem.c | 7 ++++++- 4 files changed, 58 insertions(+), 2 deletions(-) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 64a3e051c3c4..218ddc3b4bc7 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -131,6 +131,16 @@ struct mmu_notifier_ops { unsigned long address, pte_t pte); + /* + * update_mapping is called when a page is replaced (at a given offset) + * in a mapping backed by shmem or hugetlbfs. The new page's pfn will + * be contained in the pfn field. + */ + void (*update_mapping)(struct mmu_notifier *subscription, + struct mm_struct *mm, + unsigned long address, + unsigned long pfn); + /* * invalidate_range_start() and invalidate_range_end() must be * paired and are called only when the mmap_lock and/or the @@ -394,6 +404,9 @@ extern int __mmu_notifier_test_young(struct mm_struct *mm, unsigned long address); extern void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address, pte_t pte); +extern void __mmu_notifier_update_mapping(struct mm_struct *mm, + unsigned long address, + unsigned long pfn); extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r); extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r, bool only_end); @@ -447,6 +460,14 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm, __mmu_notifier_change_pte(mm, address, pte); } +static inline void mmu_notifier_update_mapping(struct mm_struct *mm, + unsigned long address, + unsigned long pfn) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_update_mapping(mm, address, pfn); +} + static inline void mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range) { @@ -695,6 +716,12 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm, { } +static inline void mmu_notifier_update_mapping(struct mm_struct *mm, + unsigned long address, + unsigned long pfn) +{ +} + static inline void mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 64a3239b6407..1f2f0209101a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6096,8 +6096,12 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * hugetlb_no_page will drop vma lock and hugetlb fault * mutex internally, which make us return immediately. 
*/ - return hugetlb_no_page(mm, vma, mapping, idx, address, ptep, + ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, entry, flags); + if (!ret) + mmu_notifier_update_mapping(vma->vm_mm, address, + pte_pfn(*ptep)); + return ret; ret = 0; @@ -6223,6 +6227,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, */ if (need_wait_lock) folio_wait_locked(folio); + if (!ret) + mmu_notifier_update_mapping(vma->vm_mm, address, + pte_pfn(*ptep)); return ret; } diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 50c0dde1354f..6421405334b9 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -441,6 +441,23 @@ void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address, srcu_read_unlock(&srcu, id); } +void __mmu_notifier_update_mapping(struct mm_struct *mm, unsigned long address, + unsigned long pfn) +{ + struct mmu_notifier *subscription; + int id; + + id = srcu_read_lock(&srcu); + hlist_for_each_entry_rcu(subscription, + &mm->notifier_subscriptions->list, hlist, + srcu_read_lock_held(&srcu)) { + if (subscription->ops->update_mapping) + subscription->ops->update_mapping(subscription, mm, + address, pfn); + } + srcu_read_unlock(&srcu, id); +} + static int mn_itree_invalidate(struct mmu_notifier_subscriptions *subscriptions, const struct mmu_notifier_range *range) { diff --git a/mm/shmem.c b/mm/shmem.c index 2f2e0e618072..e59eb5fafadb 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -77,6 +77,7 @@ static struct vfsmount *shm_mnt; #include #include #include +#include #include #include @@ -2164,8 +2165,12 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) gfp, vma, vmf, &ret); if (err) return vmf_error(err); - if (folio) + if (folio) { vmf->page = folio_file_page(folio, vmf->pgoff); + if (ret == VM_FAULT_LOCKED) + mmu_notifier_update_mapping(vma->vm_mm, vmf->address, + page_to_pfn(vmf->page)); + } return ret; } From patchwork Tue Jul 18 08:28:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kasireddy, Vivek" X-Patchwork-Id: 13316896 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 16D3DEB64DD for ; Tue, 18 Jul 2023 08:50:27 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6131510E2FC; Tue, 18 Jul 2023 08:50:26 +0000 (UTC) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTPS id 310D610E2FC for ; Tue, 18 Jul 2023 08:50:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689670225; x=1721206225; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qXZ6AwfAvrK7KjXkWKm1YdazaQCCVWSV4uTMM4BsPII=; b=XvZdibUN9/+tXmqcVCNZK8sHTN07sCVkoxZE+T92kiacowz4vogca+8H r7mi9obLAYBd86SnTnGI1V48epZlbN/G5hzOVK8qd9nzzoAB3Z0Ms/B0e IaJKjny/gBOMRKDiaTFKXIOXAjE2KA1PVh3sIBjIhGLlDxoKZ13PtYhwB 4UgSYjvdzmtSEBVCbbGuU5AsW4J8nIhY2bLCL9k9bACO62H8Dyts14A+v 4ingOawJIBwfAQGjZXK79/wgiyFsyDrUdOPoYxdH2sY2keDbNYbArpTzM b3lbEL6XXWcaFjznEAgTgU6nv+UrC0FmGqYIYvJL3sSBX6a+Ta/6VEGB0 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10774"; a="363616489" X-IronPort-AV: 
E=Sophos;i="6.01,213,1684825200"; d="scan'208";a="363616489"
From: Vivek Kasireddy
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Dongwon Kim, David Hildenbrand, Junxiao Chang, Hugh Dickins, Vivek Kasireddy, Peter Xu, Gerd Hoffmann, Jason Gunthorpe, Mike Kravetz
Subject: [RFC v1 2/3] udmabuf: Replace pages when there is FALLOC_FL_PUNCH_HOLE in memfd
Date: Tue, 18 Jul 2023 01:28:57 -0700
Message-Id: <20230718082858.1570809-3-vivek.kasireddy@intel.com>
In-Reply-To: <20230718082858.1570809-1-vivek.kasireddy@intel.com>
References: <20230718082858.1570809-1-vivek.kasireddy@intel.com>

When a hole is punched in the memfd, or when a page is replaced for any other reason, the udmabuf driver needs to be notified so that it can update its list of pages with the new page. To accomplish this, we first identify the vma ranges into which the pages associated with a given udmabuf are mapped, and then register a handler for the update_mapping mmu notifier to receive mapping updates. Once we are notified that a new page has been faulted in at a given offset in the mapping (backed by shmem or hugetlbfs), the list of pages is updated, and we also zap the now-stale PTEs in the vmas that have mmap'd the udmabuf fd.
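Concretely, the handler added below (update_udmabuf()) translates the notified address back into an index into the udmabuf's page array and swaps in the new page. Stripped of the locking, hugetlb page-size handling, and the multi-page replacement loop, its core is a simplified restatement of the code in this patch:

	struct udmabuf_vma_range *range = find_udmabuf_range(ubuf, address);
	pgoff_t pgoff;

	if (!range || !pfn_valid(pfn))
		return;

	/* translate the faulting address into an index into ubuf->pages[] */
	pgoff = range->ubufindex + ((address - range->start) >> PAGE_SHIFT);

	get_page(pfn_to_page(pfn));		/* take a reference on the new page */
	put_page(ubuf->pages[pgoff]);		/* drop the reference on the old one */
	ubuf->pages[pgoff] = pfn_to_page(pfn);

	/* zap the now-stale PTEs in every vma that has mmap'd this udmabuf */
	invalidate_mmap_vmas(ubuf, range, address, PAGE_SIZE);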
Cc: David Hildenbrand Cc: Mike Kravetz Cc: Hugh Dickins Cc: Peter Xu Cc: Jason Gunthorpe Cc: Gerd Hoffmann Cc: Dongwon Kim Cc: Junxiao Chang Signed-off-by: Vivek Kasireddy --- drivers/dma-buf/udmabuf.c | 172 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 172 insertions(+) diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index 10c47bf77fb5..189a36c41906 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -4,6 +4,8 @@ #include #include #include +#include +#include #include #include #include @@ -30,6 +32,23 @@ struct udmabuf { struct sg_table *sg; struct miscdevice *device; pgoff_t *offsets; + struct udmabuf_vma_range *ranges; + unsigned int num_ranges; + struct mmu_notifier notifier; + struct mutex mn_lock; + struct list_head mmap_vmas; +}; + +struct udmabuf_vma_range { + struct file *memfd; + pgoff_t ubufindex; + unsigned long start; + unsigned long end; +}; + +struct udmabuf_mmap_vma { + struct list_head vma_link; + struct vm_area_struct *vma; }; static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf) @@ -42,28 +61,54 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf) if (pgoff >= ubuf->pagecount) return VM_FAULT_SIGBUS; + mutex_lock(&ubuf->mn_lock); pfn = page_to_pfn(ubuf->pages[pgoff]); if (ubuf->offsets) { pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT; } + mutex_unlock(&ubuf->mn_lock); return vmf_insert_pfn(vma, vmf->address, pfn); } +static void udmabuf_vm_close(struct vm_area_struct *vma) +{ + struct udmabuf *ubuf = vma->vm_private_data; + struct udmabuf_mmap_vma *mmap_vma; + + list_for_each_entry(mmap_vma, &ubuf->mmap_vmas, vma_link) { + if (mmap_vma->vma == vma) { + list_del(&mmap_vma->vma_link); + kfree(mmap_vma); + break; + } + } +} + static const struct vm_operations_struct udmabuf_vm_ops = { .fault = udmabuf_vm_fault, + .close = udmabuf_vm_close, }; static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma) { struct udmabuf *ubuf = buf->priv; + struct udmabuf_mmap_vma *mmap_vma; if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0) return -EINVAL; + mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL); + if (!mmap_vma) + return -ENOMEM; + vma->vm_ops = &udmabuf_vm_ops; vma->vm_private_data = ubuf; vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP); + + mmap_vma->vma = vma; + list_add(&mmap_vma->vma_link, &ubuf->mmap_vmas); + return 0; } @@ -109,6 +154,7 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf, if (ret < 0) goto err_alloc; + mutex_lock(&ubuf->mn_lock); for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) { offset = ubuf->offsets ? 
ubuf->offsets[i] : 0; sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, offset); @@ -116,9 +162,12 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf, ret = dma_map_sgtable(dev, sg, direction, 0); if (ret < 0) goto err_map; + + mutex_unlock(&ubuf->mn_lock); return sg; err_map: + mutex_unlock(&ubuf->mn_lock); sg_free_table(sg); err_alloc: kfree(sg); @@ -157,6 +206,9 @@ static void release_udmabuf(struct dma_buf *buf) for (pg = 0; pg < ubuf->pagecount; pg++) put_page(ubuf->pages[pg]); + + mmu_notifier_unregister(&ubuf->notifier, ubuf->notifier.mm); + kfree(ubuf->ranges); kfree(ubuf->offsets); kfree(ubuf->pages); kfree(ubuf); @@ -208,6 +260,93 @@ static const struct dma_buf_ops udmabuf_ops = { .end_cpu_access = end_cpu_udmabuf, }; +static void invalidate_mmap_vmas(struct udmabuf *ubuf, + struct udmabuf_vma_range *range, + unsigned long address, unsigned long size) +{ + struct udmabuf_mmap_vma *vma; + unsigned long start = range->ubufindex << PAGE_SHIFT; + + start += address - range->start; + list_for_each_entry(vma, &ubuf->mmap_vmas, vma_link) { + zap_vma_ptes(vma->vma, vma->vma->vm_start + start, size); + } +} + +static struct udmabuf_vma_range *find_udmabuf_range(struct udmabuf *ubuf, + unsigned long address) +{ + struct udmabuf_vma_range *range; + int i; + + for (i = 0; i < ubuf->num_ranges; i++) { + range = &ubuf->ranges[i]; + if (address >= range->start && address < range->end) + return range; + } + + return NULL; +} + +static void update_udmabuf(struct mmu_notifier *mn, struct mm_struct *mm, + unsigned long address, unsigned long pfn) +{ + struct udmabuf *ubuf = container_of(mn, struct udmabuf, notifier); + struct udmabuf_vma_range *range = find_udmabuf_range(ubuf, address); + struct page *old_page, *new_page; + pgoff_t pgoff, pgshift = PAGE_SHIFT; + unsigned long size = 0; + + if (!range || !pfn_valid(pfn)) + return; + + if (is_file_hugepages(range->memfd)) + pgshift = huge_page_shift(hstate_file(range->memfd)); + + mutex_lock(&ubuf->mn_lock); + pgoff = range->ubufindex + ((address - range->start) >> pgshift); + old_page = ubuf->pages[pgoff]; + new_page = pfn_to_page(pfn); + + do { + ubuf->pages[pgoff] = new_page; + get_page(new_page); + put_page(old_page); + size += PAGE_SIZE; + } while (ubuf->pages[++pgoff] == old_page); + + mutex_unlock(&ubuf->mn_lock); + invalidate_mmap_vmas(ubuf, range, address, size); +} + +static const struct mmu_notifier_ops udmabuf_update_ops = { + .update_mapping = update_udmabuf, +}; + +static struct vm_area_struct *find_guest_ram_vma(struct udmabuf *ubuf, + struct mm_struct *vmm_mm) +{ + struct vm_area_struct *vma = NULL; + MA_STATE(mas, &vmm_mm->mm_mt, 0, 0); + unsigned long addr; + pgoff_t pg; + + mas_set(&mas, 0); + mmap_read_lock(vmm_mm); + mas_for_each(&mas, vma, ULONG_MAX) { + for (pg = 0; pg < ubuf->pagecount; pg++) { + addr = page_address_in_vma(ubuf->pages[pg], vma); + if (addr == -EFAULT) + break; + } + if (addr != -EFAULT) + break; + } + mmap_read_unlock(vmm_mm); + + return vma; +} + #define SEALS_WANTED (F_SEAL_SHRINK) #define SEALS_DENIED (F_SEAL_WRITE) @@ -218,6 +357,7 @@ static long udmabuf_create(struct miscdevice *device, DEFINE_DMA_BUF_EXPORT_INFO(exp_info); struct file *memfd = NULL; struct address_space *mapping = NULL; + struct vm_area_struct *guest_ram; struct udmabuf *ubuf; struct dma_buf *buf; pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit; @@ -252,6 +392,13 @@ static long udmabuf_create(struct miscdevice *device, goto err; } + ubuf->ranges = kmalloc_array(head->count, sizeof(*ubuf->ranges), + GFP_KERNEL); + 
if (!ubuf->ranges) { + ret = -ENOMEM; + goto err; + } + pgbuf = 0; for (i = 0; i < head->count; i++) { ret = -EBADFD; @@ -270,6 +417,8 @@ static long udmabuf_create(struct miscdevice *device, goto err; pgoff = list[i].offset >> PAGE_SHIFT; pgcnt = list[i].size >> PAGE_SHIFT; + ubuf->ranges[i].ubufindex = pgbuf; + ubuf->ranges[i].memfd = memfd; if (is_file_hugepages(memfd)) { if (!ubuf->offsets) { ubuf->offsets = kmalloc_array(ubuf->pagecount, @@ -299,6 +448,7 @@ static long udmabuf_create(struct miscdevice *device, get_page(hpage); ubuf->pages[pgbuf] = hpage; ubuf->offsets[pgbuf++] = chunkoff << PAGE_SHIFT; + if (++chunkoff == maxchunks) { put_page(hpage); hpage = NULL; @@ -334,6 +484,25 @@ static long udmabuf_create(struct miscdevice *device, goto err; } + guest_ram = find_guest_ram_vma(ubuf, current->mm); + if (!guest_ram) + goto err; + + ubuf->notifier.ops = &udmabuf_update_ops; + ret = mmu_notifier_register(&ubuf->notifier, current->mm); + if (ret) + goto err; + + ubuf->num_ranges = head->count; + for (i = 0; i < ubuf->num_ranges; i++) { + page = ubuf->pages[ubuf->ranges[i].ubufindex]; + ubuf->ranges[i].start = page_address_in_vma(page, guest_ram); + ubuf->ranges[i].end = ubuf->ranges[i].start + list[i].size; + } + + INIT_LIST_HEAD(&ubuf->mmap_vmas); + mutex_init(&ubuf->mn_lock); + flags = 0; if (head->flags & UDMABUF_FLAGS_CLOEXEC) flags |= O_CLOEXEC; @@ -344,6 +513,9 @@ static long udmabuf_create(struct miscdevice *device, put_page(ubuf->pages[--pgbuf]); if (memfd) fput(memfd); + if (ubuf->notifier.mm) + mmu_notifier_unregister(&ubuf->notifier, ubuf->notifier.mm); + kfree(ubuf->ranges); kfree(ubuf->offsets); kfree(ubuf->pages); kfree(ubuf); From patchwork Tue Jul 18 08:28:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kasireddy, Vivek" X-Patchwork-Id: 13316898 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3F135EB64DA for ; Tue, 18 Jul 2023 08:50:33 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 623A010E301; Tue, 18 Jul 2023 08:50:32 +0000 (UTC) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTPS id 60DDD10E2FE for ; Tue, 18 Jul 2023 08:50:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689670225; x=1721206225; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fMX2BBj+J9tuSKUq1MbJBSXs8cZcPuxmqmzRIg8h3FQ=; b=W4mreijEYxFGNfA91z741aSLyjBWCIrSyE0g/Zi4C2IowFfBm+4gsr3C pqMLZrcSU1UdDx/YaInMVqhcijUHtYD+4PCU0QvFf/UsZuCZmB+swAcL3 k4uvp2hvM3fllt7lCEnn/aRl/9+gXCRIg4DZe0DZo3AloHBMkqI5B6OSO ZMcXh/1ifLgQ5Nz5wWL7JUmGk+5wK29BqyXhTxkKu0dbdvIzNIxGIfm47 wsuaFpgzIFxxe9ASQMKCiy8fxUI0fgNS+cWNjBzLRt2omSLBoJ26U+syQ Kpu1GRgxpfhpzaj/n6jM3pzx3pYkuI4E/iSMUuGQqWKm7aXBBbLfx5dpv Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10774"; a="363616498" X-IronPort-AV: E=Sophos;i="6.01,213,1684825200"; d="scan'208";a="363616498" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Jul 2023 01:50:24 -0700 X-ExtLoop1: 1 X-IronPort-AV: 
E=McAfee;i="6600,9927,10774"; a="837205688" X-IronPort-AV: E=Sophos;i="6.01,213,1684825200"; d="scan'208";a="837205688" Received: from vkasired-desk2.fm.intel.com ([10.105.128.127]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Jul 2023 01:50:23 -0700 From: Vivek Kasireddy To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org Subject: [RFC v1 3/3] selftests/dma-buf/udmabuf: Add tests for huge pages and FALLOC_FL_PUNCH_HOLE Date: Tue, 18 Jul 2023 01:28:58 -0700 Message-Id: <20230718082858.1570809-4-vivek.kasireddy@intel.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230718082858.1570809-1-vivek.kasireddy@intel.com> References: <20230718082858.1570809-1-vivek.kasireddy@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Dongwon Kim , David Hildenbrand , Junxiao Chang , Hugh Dickins , Vivek Kasireddy , Peter Xu , Gerd Hoffmann , Jason Gunthorpe , Shuah Khan , Mike Kravetz Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" The new tests added in this patch try to mimic the situation that happens when a hole is punched in the memfd associated with Guest RAM that is managed by a VMM such as Qemu. The main goal of these tests is to ensure that the udmabuf driver updates its list of pages when newer Guest writes overlap with the region of the mapping where a hole had been punched previously. Based-on-patch-by: Mike Kravetz Cc: Shuah Khan Cc: David Hildenbrand Cc: Mike Kravetz Cc: Hugh Dickins Cc: Peter Xu Cc: Jason Gunthorpe Cc: Gerd Hoffmann Cc: Dongwon Kim Cc: Junxiao Chang Signed-off-by: Vivek Kasireddy --- .../selftests/drivers/dma-buf/udmabuf.c | 165 +++++++++++++++++- 1 file changed, 161 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/drivers/dma-buf/udmabuf.c b/tools/testing/selftests/drivers/dma-buf/udmabuf.c index c812080e304e..bee966444e9e 100644 --- a/tools/testing/selftests/drivers/dma-buf/udmabuf.c +++ b/tools/testing/selftests/drivers/dma-buf/udmabuf.c @@ -9,26 +9,144 @@ #include #include #include +#include #include #include +#include #include #include #define TEST_PREFIX "drivers/dma-buf/udmabuf" #define NUM_PAGES 4 +#define NUM_ENTRIES 4 +#define MEMFD_SIZE 1024 /* in pages */ -static int memfd_create(const char *name, unsigned int flags) +static unsigned int page_size; + +static int create_memfd_with_seals(off64_t size, bool hpage) { - return syscall(__NR_memfd_create, name, flags); + int memfd, ret; + unsigned int flags = MFD_ALLOW_SEALING; + + if (hpage) + flags |= MFD_HUGETLB; + + memfd = memfd_create("udmabuf-test", flags); + if (memfd < 0) { + printf("%s: [skip,no-memfd]\n", TEST_PREFIX); + exit(77); + } + + ret = fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK); + if (ret < 0) { + printf("%s: [skip,fcntl-add-seals]\n", TEST_PREFIX); + exit(77); + } + + ret = ftruncate(memfd, size); + if (ret == -1) { + printf("%s: [FAIL,memfd-truncate]\n", TEST_PREFIX); + exit(1); + } + + return memfd; +} + +static int create_udmabuf_list(int devfd, int memfd, off64_t memfd_size) +{ + struct udmabuf_create_list *list; + int ubuf_fd, i; + + list = malloc(sizeof(struct udmabuf_create_list) + + sizeof(struct udmabuf_create_item) * NUM_ENTRIES); + if (!list) { + printf("%s: [FAIL, udmabuf-malloc]\n", TEST_PREFIX); + exit(1); + } + + for (i = 0; i < NUM_ENTRIES; i++) { + list->list[i].memfd = memfd; + list->list[i].offset = i * 
(memfd_size / NUM_ENTRIES); + list->list[i].size = getpagesize() * NUM_PAGES; + } + + list->count = NUM_ENTRIES; + list->flags = UDMABUF_FLAGS_CLOEXEC; + ubuf_fd = ioctl(devfd, UDMABUF_CREATE_LIST, list); + free(list); + if (ubuf_fd < 0) { + printf("%s: [FAIL, udmabuf-create]\n", TEST_PREFIX); + exit(1); + } + + return ubuf_fd; +} + +static void write_to_memfd(void *addr, off64_t size, char chr) +{ + int i; + + for (i = 0; i < size / page_size; i++) { + *((char *)addr + (i * page_size)) = chr; + } +} + +static void *mmap_fd(int fd, off64_t size) +{ + void *addr; + + addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); + if (addr == MAP_FAILED) { + printf("%s: ubuf_fd mmap fail\n", TEST_PREFIX); + exit(1); + } + + return addr; +} + +static void punch_hole(int memfd, int num_pages) +{ + int ret; + + ret = fallocate(memfd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + 0, page_size * num_pages); + if (ret) { + printf("%s: memfd punch hole failed\n", TEST_PREFIX); + exit(1); + } +} + +static int compare_chunks(void *addr1, void *addr2, off64_t memfd_size) +{ + off64_t off; + int i = 0, j, k = 0, ret = 0; + char char1, char2; + + while (i < NUM_ENTRIES) { + off = i * (memfd_size / NUM_ENTRIES); + for (j = 0; j < NUM_PAGES; j++, k++) { + char1 = *((char *)addr1 + off + (j * getpagesize())); + char2 = *((char *)addr2 + (k * getpagesize())); + if (char1 != char2) { + ret = -1; + goto err; + } + } + i++; + } +err: + munmap(addr1, memfd_size); + munmap(addr2, NUM_ENTRIES * NUM_PAGES * getpagesize()); + return ret; } int main(int argc, char *argv[]) { struct udmabuf_create create; int devfd, memfd, buf, ret; - off_t size; - void *mem; + off64_t size; + void *addr1, *addr2; devfd = open("/dev/udmabuf", O_RDWR); if (devfd < 0) { @@ -90,6 +208,9 @@ int main(int argc, char *argv[]) } /* should work */ + page_size = getpagesize(); + addr1 = mmap_fd(memfd, size); + write_to_memfd(addr1, size, 'a'); create.memfd = memfd; create.offset = 0; create.size = size; @@ -98,6 +219,42 @@ int main(int argc, char *argv[]) printf("%s: [FAIL,test-4]\n", TEST_PREFIX); exit(1); } + munmap(addr1, size); + close(buf); + close(memfd); + + /* should work (punch hole)*/ + size = MEMFD_SIZE * page_size; + memfd = create_memfd_with_seals(size, false); + addr1 = mmap_fd(memfd, size); + write_to_memfd(addr1, size, 'a'); + buf = create_udmabuf_list(devfd, memfd, size); + addr2 = mmap_fd(buf, NUM_PAGES * NUM_ENTRIES * getpagesize()); + punch_hole(memfd, MEMFD_SIZE / 3); + write_to_memfd(addr1, size, 'b'); + ret = compare_chunks(addr1, addr2, size); + if (ret < 0) { + printf("%s: [FAIL,test-5]\n", TEST_PREFIX); + exit(1); + } + close(buf); + close(memfd); + + /* should work (huge pages + punch hole)*/ + page_size = getpagesize() * 512; /* 2 MB */ + size = MEMFD_SIZE * page_size; + memfd = create_memfd_with_seals(size, true); + addr1 = mmap_fd(memfd, size); + write_to_memfd(addr1, size, 'a'); + buf = create_udmabuf_list(devfd, memfd, size); + addr2 = mmap_fd(buf, NUM_PAGES * NUM_ENTRIES * getpagesize()); + punch_hole(memfd, MEMFD_SIZE / 3); + write_to_memfd(addr1, size, 'b'); + ret = compare_chunks(addr1, addr2, size); + if (ret < 0) { + printf("%s: [FAIL,test-6]\n", TEST_PREFIX); + exit(1); + } fprintf(stderr, "%s: ok\n", TEST_PREFIX); close(buf);