From patchwork Thu Jan 5 10:18:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13089660
Date: Thu, 5 Jan 2023 10:18:24 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Message-ID: <20230105101844.1893104-27-jthoughton@google.com>
Subject: [PATCH 26/46] hugetlb: add HGM support for copy_hugetlb_page_range
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This allows fork() to work with high-granularity
mappings. The page table structure is copied such that partially
mapped regions will remain partially mapped in the same way for the
new process.

A page's reference count is incremented for *each* portion of it that
is mapped in the page table. For example, if you have a PMD-mapped 1G
page, the reference count and mapcount will be incremented by 512.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 75 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 25 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 718572444a73..21a5116f509b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5106,7 +5106,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry;
-	struct page *ptepage;
+	struct hugetlb_pte src_hpte, dst_hpte;
+	struct page *ptepage, *hpage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(src_vma->vm_flags);
 	struct hstate *h = hstate_vma(src_vma);
@@ -5126,26 +5127,34 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	} else {
 		/*
 		 * For shared mappings the vma lock must be held before
-		 * calling hugetlb_walk() in the src vma. Otherwise, the
-		 * returned ptep could go away if part of a shared pmd and
-		 * another thread calls huge_pmd_unshare.
+		 * calling hugetlb_full_walk() in the src vma. Otherwise, the
+		 * returned hpte could go away if
+		 *  - part of a shared pmd and another thread calls
+		 *    huge_pmd_unshare, or
+		 *  - another thread collapses a high-granularity mapping.
 		 */
 		hugetlb_vma_lock_read(src_vma);
 	}
 
 	last_addr_mask = hugetlb_mask_last_page(h);
-	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
+	addr = src_vma->vm_start;
+	while (addr < src_vma->vm_end) {
 		spinlock_t *src_ptl, *dst_ptl;
-		src_pte = hugetlb_walk(src_vma, addr, sz);
-		if (!src_pte) {
-			addr |= last_addr_mask;
+		unsigned long hpte_sz;
+
+		if (hugetlb_full_walk(&src_hpte, src_vma, addr)) {
+			addr = (addr | last_addr_mask) + sz;
 			continue;
 		}
-		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
-		if (!dst_pte) {
-			ret = -ENOMEM;
+
+		ret = hugetlb_full_walk_alloc(&dst_hpte, dst_vma, addr,
+					      hugetlb_pte_size(&src_hpte));
+		if (ret)
 			break;
-		}
+
+		src_pte = src_hpte.ptep;
+		dst_pte = dst_hpte.ptep;
+
+		hpte_sz = hugetlb_pte_size(&src_hpte);
 
 		/*
 		 * If the pagetables are shared don't copy or take references.
@@ -5155,13 +5164,14 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		 * another vma. So page_count of ptep page is checked instead
 		 * to reliably determine whether pte is shared.
 		 */
-		if (page_count(virt_to_page(dst_pte)) > 1) {
-			addr |= last_addr_mask;
+		if (hugetlb_pte_size(&dst_hpte) == sz &&
+		    page_count(virt_to_page(dst_pte)) > 1) {
+			addr = (addr | last_addr_mask) + sz;
 			continue;
 		}
 
-		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
+		dst_ptl = hugetlb_pte_lock(&dst_hpte);
+		src_ptl = hugetlb_pte_lockptr(&src_hpte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 again:
@@ -5205,10 +5215,15 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 */
 			if (userfaultfd_wp(dst_vma))
 				set_huge_pte_at(dst, addr, dst_pte, entry);
+		} else if (!hugetlb_pte_present_leaf(&src_hpte, entry)) {
+			/* Retry the walk. */
+			spin_unlock(src_ptl);
+			spin_unlock(dst_ptl);
+			continue;
 		} else {
-			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
-			get_page(ptepage);
+			hpage = compound_head(ptepage);
+			get_page(hpage);
 
 			/*
 			 * Failing to duplicate the anon rmap is a rare case
@@ -5220,25 +5235,31 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 * need to be without the pgtable locks since we could
 			 * sleep during the process.
 			 */
-			if (!PageAnon(ptepage)) {
-				page_dup_file_rmap(ptepage, true);
-			} else if (page_try_dup_anon_rmap(ptepage, true,
+			if (!PageAnon(hpage)) {
+				page_dup_file_rmap(hpage, true);
+			} else if (page_try_dup_anon_rmap(hpage, true,
 							  src_vma)) {
 				pte_t src_pte_old = entry;
 				struct page *new;
 
+				if (hugetlb_pte_size(&src_hpte) != sz) {
+					put_page(hpage);
+					ret = -EINVAL;
+					break;
+				}
+
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
 				new = alloc_huge_page(dst_vma, addr, 1);
 				if (IS_ERR(new)) {
-					put_page(ptepage);
+					put_page(hpage);
 					ret = PTR_ERR(new);
 					break;
 				}
-				copy_user_huge_page(new, ptepage, addr, dst_vma,
+				copy_user_huge_page(new, hpage, addr, dst_vma,
 						    npages);
-				put_page(ptepage);
+				put_page(hpage);
 
 				/* Install the new huge page if src pte stable */
 				dst_ptl = huge_pte_lock(h, dst, dst_pte);
@@ -5256,6 +5277,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				hugetlb_install_page(dst_vma, dst_pte, addr, new);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
+				addr += hugetlb_pte_size(&src_hpte);
 				continue;
 			}
 
@@ -5272,10 +5294,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}
 
 			set_huge_pte_at(dst, addr, dst_pte, entry);
-			hugetlb_count_add(npages, dst);
+			hugetlb_count_add(
+					hugetlb_pte_size(&dst_hpte) / PAGE_SIZE,
+					dst);
 		}
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
+		addr += hugetlb_pte_size(&src_hpte);
 	}
 
 	if (cow) {