From patchwork Wed Oct 13 19:58:23 2021
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 12556849
Date: Wed, 13 Oct 2021 12:58:23 -0700
Message-Id: <20211013195825.3058275-1-almasrymina@google.com>
Subject: [PATCH v7 1/2] mm, hugepages: add mremap() support for hugepage backed vma
From: Mina Almasry
Cc: Mina Almasry, Mike Kravetz, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Ken Chen, Chris Kennelly, Michal Hocko,
 Vlastimil Babka, Kirill Shutemov

Support mremap() for hugepage-backed VMAs by repositioning their page
table entries to the new virtual address.

Hugetlb mremap() support is generic; my motivating use case is a
library (hugepage_text) that reloads the ELF text of executables into
hugepages, which significantly improves the execution performance of
those executables.

mremap() on hugepages is restricted to at most the size of the original
mapping, because the underlying hugetlb reservation code cannot yet
handle remapping to a larger size. Lengths are rounded up to the huge
page size, and old or new addresses that are not huge page aligned are
rejected.

During mremap(), PMD-shared (pmd_share'd) mappings are detected and
unshared. Sharing is re-established on subsequent access and fault.

Signed-off-by: Mina Almasry
Reviewed-by: Mike Kravetz
Cc: Mike Kravetz
Cc: Andrew Morton
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Ken Chen
Cc: Chris Kennelly
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Kirill Shutemov
---

Changes in v7:
- Addressed nits from Mike.
- Added Reviewed-by from Mike.

Changes in v6:
- Converted hugetlb_vma_shareable() to static only (fixes build warning).
- Removed huge_pmd_shared(). It's not needed since I removed the
  BUG_ON(huge_pmd_shared) in v3.
- Removed *...* format for emphasis.
- Fixed mremap behavior to hugepage-align lengths but return an error if
  addresses are not hugepage aligned.

Changes in v5:
- Removed hugetlb_vma_shareable and huge_pmd_shared dummy definitions for
  the !CONFIG_HUGETLB_PAGE config, since they are not used and were
  causing warnings and build errors.

Changes in v4:
- Added addr, new_addr, old_len, and new_len hugepage alignment.

Changes in v3:
- Addressed review comments from Mike.
- Separated tests into their own patch.

Changes in v2:
- Re-wrote comment around clear_vma_resv_huge_pages() to make it clear
  that the resv_map has been moved to the new VMA and why we need to
  clear it from the current VMA.
- We detect huge_pmd_shared() PTEs and unshare those rather than BUG on
  hugetlb_vma_shareable().
- This case now returns EFAULT:
	if (!vma || vma->vm_start > addr)
		goto out;
- Added kselftests for mremap() support.
---
 include/linux/hugetlb.h |  19 +++++++
 mm/hugetlb.c            | 111 +++++++++++++++++++++++++++++++++++++---
 mm/mremap.c             |  36 +++++++++++--
 3 files changed, 157 insertions(+), 9 deletions(-)

-- 
2.33.0.882.g93a45727a2-goog

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebaba02706c87..44c2ab0dfa591 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -124,6 +124,7 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
 void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
+void clear_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
 int hugetlb_overcommit_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
@@ -132,6 +133,10 @@ int hugetlb_treat_movable_handler(struct ctl_table *, int, void *, size_t *,
 int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
 
+int move_hugetlb_page_tables(struct vm_area_struct *vma,
+			     struct vm_area_struct *new_vma,
+			     unsigned long old_addr, unsigned long new_addr,
+			     unsigned long len);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
 long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 struct page **, struct vm_area_struct **,
@@ -215,6 +220,10 @@ static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
 {
 }
 
+static inline void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
+{
+}
+
 static inline unsigned long hugetlb_total_pages(void)
 {
 	return 0;
@@ -262,6 +271,16 @@ static inline int copy_hugetlb_page_range(struct mm_struct *dst,
 	return 0;
 }
 
+static inline int move_hugetlb_page_tables(struct vm_area_struct *vma,
+					   struct vm_area_struct *new_vma,
+					   unsigned long old_addr,
+					   unsigned long new_addr,
+					   unsigned long len)
+{
+	BUG();
+	return 0;
+}
+
 static inline void hugetlb_report_meminfo(struct seq_file *m)
 {
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6d2f4c25dd9fb..61168186024a8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1015,6 +1015,35 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
 	vma->vm_private_data = (void *)0;
 }
 
+/*
+ * Reset and decrement one ref on hugepage private reservation.
+ * Called with mm->mmap_sem writer semaphore held.
+ * This function should only be used by move_vma() and operate on a
+ * same-sized vma. It should never be called with the last ref on the
+ * reservation.
+ */
+void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
+{
+	/*
+	 * Clear the old hugetlb private page reservation.
+	 * It has already been transferred to new_vma.
+	 *
+	 * During a mremap() operation of a hugetlb vma we call move_vma()
+	 * which copies vma into new_vma and unmaps vma. After the copy
+	 * operation both new_vma and vma share a reference to the resv_map
+	 * struct, and at that point vma is about to be unmapped. We don't
+	 * want to return the reservation to the pool at unmap of vma because
+	 * the reservation still lives on in new_vma, so simply decrement the
+	 * ref here and remove the resv_map reference from this vma.
+	 */
+	struct resv_map *reservations = vma_resv_map(vma);
+
+	if (reservations && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
+		kref_put(&reservations->refs, resv_map_release);
+
+	reset_vma_resv_huge_pages(vma);
+}
+
 /* Returns true if the VMA has associated reserve pages */
 static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 {
@@ -4800,6 +4829,82 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	return ret;
 }
 
+static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
+			  unsigned long new_addr, pte_t *src_pte)
+{
+	struct hstate *h = hstate_vma(vma);
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t *dst_pte, pte;
+	spinlock_t *src_ptl, *dst_ptl;
+
+	dst_pte = huge_pte_offset(mm, new_addr, huge_page_size(h));
+	dst_ptl = huge_pte_lock(h, mm, dst_pte);
+	src_ptl = huge_pte_lockptr(h, mm, src_pte);
+
+	/*
+	 * We don't have to worry about the ordering of src and dst ptlocks
+	 * because exclusive mmap_sem (or the i_mmap_lock) prevents deadlock.
+	 */
+	if (src_ptl != dst_ptl)
+		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
+
+	pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
+	set_huge_pte_at(mm, new_addr, dst_pte, pte);
+
+	if (src_ptl != dst_ptl)
+		spin_unlock(src_ptl);
+	spin_unlock(dst_ptl);
+}
+
+int move_hugetlb_page_tables(struct vm_area_struct *vma,
+			     struct vm_area_struct *new_vma,
+			     unsigned long old_addr, unsigned long new_addr,
+			     unsigned long len)
+{
+	struct hstate *h = hstate_vma(vma);
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	unsigned long sz = huge_page_size(h);
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned long old_end = old_addr + len;
+	unsigned long old_addr_copy;
+	pte_t *src_pte, *dst_pte;
+	struct mmu_notifier_range range;
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr,
+				old_end);
+	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
+	mmu_notifier_invalidate_range_start(&range);
+	/* Prevent race with file truncation */
+	i_mmap_lock_write(mapping);
+	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
+		src_pte = huge_pte_offset(mm, old_addr, sz);
+		if (!src_pte)
+			continue;
+		if (huge_pte_none(huge_ptep_get(src_pte)))
+			continue;
+
+		/* old_addr arg to huge_pmd_unshare() is a pointer and so the
+		 * arg may be modified. Pass a copy instead to preserve the
+		 * value in old_addr.
+		 */
+		old_addr_copy = old_addr;
+
+		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte))
+			continue;
+
+		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
+		if (!dst_pte)
+			break;
+
+		move_huge_pte(vma, old_addr, new_addr, src_pte);
+	}
+	i_mmap_unlock_write(mapping);
+	flush_tlb_range(vma, old_end - len, old_end);
+	mmu_notifier_invalidate_range_end(&range);
+
+	return len + old_addr - old_end;
+}
+
 static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end,
 				   struct page *ref_page)
@@ -6339,12 +6444,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
  * sharing is possible. For hugetlbfs, this prevents removal of any page
  * table entries associated with the address space. This is important as we
  * are setting up sharing based on existing page table entries (mappings).
- *
- * NOTE: This routine is only called from huge_pte_alloc. Some callers of
- * huge_pte_alloc know that sharing is not possible and do not take
- * i_mmap_rwsem as a performance optimization. This is handled by the
- * if !vma_shareable check at the beginning of the routine. i_mmap_rwsem is
- * only required for subsequent processing.
  */
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
diff --git a/mm/mremap.c b/mm/mremap.c
index c0b6c41b7b78f..002eec83e91e5 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -489,6 +489,10 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 	old_end = old_addr + len;
 	flush_cache_range(vma, old_addr, old_end);
 
+	if (is_vm_hugetlb_page(vma))
+		return move_hugetlb_page_tables(vma, new_vma, old_addr,
+						new_addr, len);
+
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma,
 				vma->vm_mm, old_addr, old_end);
 	mmu_notifier_invalidate_range_start(&range);
@@ -646,6 +650,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		mremap_userfaultfd_prep(new_vma, uf);
 	}
 
+	if (is_vm_hugetlb_page(vma)) {
+		clear_vma_resv_huge_pages(vma);
+	}
+
 	/* Conceal VM_ACCOUNT so old reservation is not undone */
 	if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
 		vma->vm_flags &= ~VM_ACCOUNT;
@@ -739,9 +747,6 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 			(vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
 		return ERR_PTR(-EINVAL);
 
-	if (is_vm_hugetlb_page(vma))
-		return ERR_PTR(-EINVAL);
-
 	/* We can't remap across vm area boundaries */
 	if (old_len > vma->vm_end - addr)
 		return ERR_PTR(-EFAULT);
@@ -937,6 +942,31 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 
 	if (mmap_write_lock_killable(current->mm))
 		return -EINTR;
+	vma = find_vma(mm, addr);
+	if (!vma || vma->vm_start > addr) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	if (is_vm_hugetlb_page(vma)) {
+		struct hstate *h __maybe_unused = hstate_vma(vma);
+
+		old_len = ALIGN(old_len, huge_page_size(h));
+		new_len = ALIGN(new_len, huge_page_size(h));
+
+		/* addrs must be huge page aligned */
+		if (addr & ~huge_page_mask(h))
+			goto out;
+		if (new_addr & ~huge_page_mask(h))
+			goto out;
+
+		/*
+		 * Don't allow remap expansion, because the underlying hugetlb
+		 * reservation is not yet capable of handling split reservations.
+		 */
+		if (new_len > old_len)
+			goto out;
+	}
 
 	if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) {
 		ret = mremap_to(addr, old_len, new_addr, new_len,

From patchwork Wed Oct 13 19:58:24 2021
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 12556851
Date: Wed, 13 Oct 2021 12:58:24 -0700
In-Reply-To: <20211013195825.3058275-1-almasrymina@google.com>
Message-Id: <20211013195825.3058275-2-almasrymina@google.com>
References: <20211013195825.3058275-1-almasrymina@google.com>
Subject: [PATCH v7 2/2] mm, hugepages: Add hugetlb vma mremap() test
From: Mina Almasry
Cc: Mina Almasry, Mike Kravetz, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Ken Chen, Chris Kennelly, Michal Hocko,
 Vlastimil Babka, Kirill Shutemov

Signed-off-by: Mina Almasry
Cc: Mike Kravetz
Cc: Andrew Morton
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Ken Chen
Cc: Chris Kennelly
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Kirill Shutemov
Acked-by: Mike Kravetz
---

Changes in v6:
- Reverted change in v4: test case now passes huge page aligned addrs to
  mmap/mremap.

Changes in v4:
- Added comments to make test output clearer.
- Modified test case slightly to test hugepage alignment of new_addr.
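The v6 note above depends on the alignment rules that patch 1/2 adds to the mremap() syscall for hugetlb VMAs. Both lengths are rounded up to the huge page size. An unaligned old or new address is rejected. A new length larger than the rounded old length is refused.

The block below is a minimal userspace sketch of just those checks, not kernel code. It assumes a 2MB huge page size. hugetlb_mremap_check() and align_up() are hypothetical names standing in for the inline syscall logic and the kernel's ALIGN() macro.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumption: 2MB huge pages. In the kernel these values come from
 * huge_page_size(h) and huge_page_mask(h).
 */
#define HPAGE_SIZE (2UL * 1024 * 1024)
#define HPAGE_MASK (~(HPAGE_SIZE - 1))

/* Round x up to a multiple of a; like ALIGN(), a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert((a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper modelling the hugetlb branch of mremap() in this
 * series. It rounds both lengths up, refuses unaligned addresses and
 * refuses to grow the mapping. The rounded lengths are written back
 * through old_len and new_len. Returns true if the request passes.
 */
bool hugetlb_mremap_check(unsigned long addr, unsigned long new_addr,
			  unsigned long *old_len, unsigned long *new_len)
{
	*old_len = align_up(*old_len, HPAGE_SIZE);
	*new_len = align_up(*new_len, HPAGE_SIZE);

	if (addr & ~HPAGE_MASK)		/* old address not aligned */
		return false;
	if (new_addr & ~HPAGE_MASK)	/* new address not aligned */
		return false;
	if (*new_len > *old_len)	/* expansion not supported yet */
		return false;
	return true;
}
```

Here are some example inputs:

- old_len = 3MB and new_len = 4MB with aligned addresses pass, since both lengths round to 4MB.
- An address such as 0x40001000 fails the alignment test.
- Growing a 2MB mapping to 6MB is refused.

The real syscall also runs the generic mremap() checks before and after this block, and the sketch leaves those out.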
---
 tools/testing/selftests/vm/.gitignore        |   1 +
 tools/testing/selftests/vm/Makefile          |   1 +
 tools/testing/selftests/vm/hugepage-mremap.c | 165 +++++++++++++++++++
 3 files changed, 167 insertions(+)
 create mode 100644 tools/testing/selftests/vm/hugepage-mremap.c

-- 
2.33.0.882.g93a45727a2-goog

diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index b02eac613fdda..2e7e86e852828 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 hugepage-mmap
+hugepage-mremap
 hugepage-shm
 khugepaged
 map_hugetlb
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index d9605bd10f2de..1607322a112c9 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -29,6 +29,7 @@ TEST_GEN_FILES = compaction_test
 TEST_GEN_FILES += gup_test
 TEST_GEN_FILES += hmm-tests
 TEST_GEN_FILES += hugepage-mmap
+TEST_GEN_FILES += hugepage-mremap
 TEST_GEN_FILES += hugepage-shm
 TEST_GEN_FILES += khugepaged
 TEST_GEN_FILES += madv_populate
diff --git a/tools/testing/selftests/vm/hugepage-mremap.c b/tools/testing/selftests/vm/hugepage-mremap.c
new file mode 100644
index 0000000000000..e84b79922fe6e
--- /dev/null
+++ b/tools/testing/selftests/vm/hugepage-mremap.c
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * hugepage-mremap:
+ *
+ * Example of remapping huge page memory in a user application using the
+ * mremap system call. Before running this application, make sure that the
+ * administrator has mounted the hugetlbfs filesystem at /mnt/huge (the
+ * test opens /mnt/huge/test) using the command
+ * mount -t hugetlbfs nodev /mnt/huge. In this example, the app requests
+ * 1GB of memory that is backed by huge pages.
+ *
+ */
+
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include /* Definition of O_* constants */
+#include /* Definition of SYS_* constants */
+#include
+#include
+#include
+
+#define LENGTH (1UL * 1024 * 1024 * 1024)
+
+#define PROTECTION (PROT_READ | PROT_WRITE | PROT_EXEC)
+#define FLAGS (MAP_SHARED | MAP_ANONYMOUS)
+
+static void check_bytes(char *addr)
+{
+	printf("First hex is %x\n", *((unsigned int *)addr));
+}
+
+static void write_bytes(char *addr)
+{
+	unsigned long i;
+
+	for (i = 0; i < LENGTH; i++)
+		*(addr + i) = (char)i;
+}
+
+static int read_bytes(char *addr)
+{
+	unsigned long i;
+
+	check_bytes(addr);
+	for (i = 0; i < LENGTH; i++)
+		if (*(addr + i) != (char)i) {
+			printf("Mismatch at %lu\n", i);
+			return 1;
+		}
+	return 0;
+}
+
+static void register_region_with_uffd(char *addr, size_t len)
+{
+	long uffd; /* userfaultfd file descriptor */
+	struct uffdio_api uffdio_api;
+	struct uffdio_register uffdio_register;
+
+	/* Create and enable userfaultfd object. */
+
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+	if (uffd == -1) {
+		perror("userfaultfd");
+		exit(1);
+	}
+
+	uffdio_api.api = UFFD_API;
+	uffdio_api.features = 0;
+	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1) {
+		perror("ioctl-UFFDIO_API");
+		exit(1);
+	}
+
+	/* Create a private anonymous mapping. The memory will be
+	 * demand-zero paged--that is, not yet allocated. When we
+	 * actually touch the memory, it will be allocated via
+	 * the userfaultfd.
+	 */
+
+	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	printf("Address returned by mmap() = %p\n", addr);
+
+	/* Register the memory range of the mapping we just created for
+	 * handling by the userfaultfd object. In this mode, we request to track
+	 * missing pages (i.e., pages that have not yet been faulted in).
+	 */
+
+	uffdio_register.range.start = (unsigned long)addr;
+	uffdio_register.range.len = len;
+	uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
+	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
+		perror("ioctl-UFFDIO_REGISTER");
+		exit(1);
+	}
+}
+
+int main(void)
+{
+	int ret = 0;
+
+	int fd = open("/mnt/huge/test", O_CREAT | O_RDWR, 0755);
+
+	if (fd < 0) {
+		perror("Open failed");
+		exit(1);
+	}
+
+	/* mmap to a PUD aligned address to hopefully trigger pmd sharing. */
+	unsigned long suggested_addr = 0x7eaa40000000;
+	void *haddr = mmap((void *)suggested_addr, LENGTH, PROTECTION,
+			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
+	printf("Map haddr: Returned address is %p\n", haddr);
+	if (haddr == MAP_FAILED) {
+		perror("mmap1");
+		exit(1);
+	}
+
+	/* mmap again to a dummy address to hopefully trigger pmd sharing. */
+	suggested_addr = 0x7daa40000000;
+	void *daddr = mmap((void *)suggested_addr, LENGTH, PROTECTION,
+			   MAP_HUGETLB | MAP_SHARED | MAP_POPULATE, fd, 0);
+	printf("Map daddr: Returned address is %p\n", daddr);
+	if (daddr == MAP_FAILED) {
+		perror("mmap3");
+		exit(1);
+	}
+
+	suggested_addr = 0x7faa40000000;
+	void *vaddr =
+		mmap((void *)suggested_addr, LENGTH, PROTECTION, FLAGS, -1, 0);
+	printf("Map vaddr: Returned address is %p\n", vaddr);
+	if (vaddr == MAP_FAILED) {
+		perror("mmap2");
+		exit(1);
+	}
+
+	register_region_with_uffd(haddr, LENGTH);
+
+	void *addr = mremap(haddr, LENGTH, LENGTH,
+			    MREMAP_MAYMOVE | MREMAP_FIXED, vaddr);
+	if (addr == MAP_FAILED) {
+		perror("mremap");
+		exit(1);
+	}
+
+	printf("Mremap: Returned address is %p\n", addr);
+	check_bytes(addr);
+	write_bytes(addr);
+	ret = read_bytes(addr);
+
+	munmap(addr, LENGTH);
+
+	return ret;
+}
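The return statement at the end of move_hugetlb_page_tables() in patch 1/2 reports how far the walk got. old_addr advances by one huge page per iteration. So len + old_addr - old_end equals the bytes covered before the loop either finished or hit the break after a failed huge_pte_alloc().

Below is a toy userspace model of that loop, assuming 2MB huge pages. It is not the kernel implementation. toy_move(), the src/dst arrays standing in for huge PTEs, and the alloc_limit knob that simulates allocation failure are all hypothetical. Locking, TLB flushing and PMD unsharing are left out.

```c
#include <assert.h>

#define HPAGE_SZ (2UL << 20)	/* assumption: 2MB huge pages */
#define NSLOTS 8		/* toy address space of 8 huge pages */

/*
 * Toy model of the loop in move_hugetlb_page_tables(). Addresses are byte
 * offsets into the toy address space; src[i] and dst[i] stand in for the
 * huge PTE covering offset i * HPAGE_SZ, with 0 meaning "none".
 * Destination offsets at or above alloc_limit simulate huge_pte_alloc()
 * failing. Returns the bytes walked, using the same formula as the patch.
 */
unsigned long toy_move(unsigned long *src, unsigned long *dst,
		       unsigned long old_addr, unsigned long new_addr,
		       unsigned long len, unsigned long alloc_limit)
{
	unsigned long old_end = old_addr + len;

	assert(len % HPAGE_SZ == 0);
	assert(old_end <= NSLOTS * HPAGE_SZ);
	assert(new_addr + len <= NSLOTS * HPAGE_SZ);

	for (; old_addr < old_end; old_addr += HPAGE_SZ, new_addr += HPAGE_SZ) {
		unsigned long s = old_addr / HPAGE_SZ;
		unsigned long d = new_addr / HPAGE_SZ;

		if (src[s] == 0)		/* like huge_pte_none(): skip */
			continue;
		if (new_addr >= alloc_limit)	/* like huge_pte_alloc() == NULL */
			break;
		dst[d] = src[s];		/* move the entry ... */
		src[s] = 0;			/* ... and clear the source */
	}
	/* old_addr now holds the offset where the walk stopped. */
	return len + old_addr - old_end;
}
```

A caller can compare the result with len to detect a partial move. In mm/mremap.c, move_vma() treats a returned length shorter than old_len as an error.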