From patchwork Fri Mar 17 10:58:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13178882
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton, "Matthew Wilcox (Oracle)", "Yin, Fengwei", Yu Zhao
Cc: Ryan Roberts, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH 5/6] mm: Allocate large folios for anonymous memory
Date: Fri, 17 Mar 2023 10:58:01 +0000
Message-Id: <20230317105802.2634004-6-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230317105802.2634004-1-ryan.roberts@arm.com>
References: <20230317105802.2634004-1-ryan.roberts@arm.com>

Add the machinery to determine what order of folio to allocate within
do_anonymous_page() and deal with racing faults to the same region.

TODO: For now, the maximum order is set to 4. This should probably be set
per-vma based on factors, and adjusted dynamically.

Signed-off-by: Ryan Roberts
---
 mm/memory.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 124 insertions(+), 16 deletions(-)

-- 
2.25.1

diff --git a/mm/memory.c b/mm/memory.c
index c9e09415ee18..3d01eab46d9c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4013,6 +4013,77 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	return ret;
 }
 
+/*
+ * Returns index of first pte that is not none, or nr if all are none.
+ */
+static int check_all_ptes_none(pte_t *pte, int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		if (!pte_none(*pte++))
+			return i;
+	}
+
+	return nr;
+}
+
+static void calc_anonymous_folio_order(struct vm_fault *vmf,
+				       int *order_out,
+				       unsigned long *addr_out)
+{
+	/*
+	 * The aim here is to determine what size of folio we should allocate
+	 * for this fault. Factors include:
+	 * - Folio must be naturally aligned within VA space
+	 * - Folio must not breach boundaries of vma
+	 * - Folio must be fully contained inside one pmd entry
+	 * - Folio must not overlap any non-none ptes
+	 * - Order must not be higher than *order_out upon entry
+	 *
+	 * Note that the caller may or may not choose to lock the pte. If
+	 * unlocked, the calculation should be considered an estimate that will
+	 * need to be validated under the lock.
+	 */
+
+	struct vm_area_struct *vma = vmf->vma;
+	int nr;
+	int order = min(*order_out, PMD_SHIFT - PAGE_SHIFT);
+	unsigned long addr;
+	pte_t *pte;
+	pte_t *first_set = NULL;
+	int ret;
+
+	for (; order > 0; order--) {
+		nr = 1 << order;
+		addr = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
+		pte = vmf->pte - ((vmf->address - addr) >> PAGE_SHIFT);
+
+		/* Check vma bounds. */
+		if (addr < vma->vm_start ||
+		    addr + nr * PAGE_SIZE > vma->vm_end)
+			continue;
+
+		/* All ptes covered by order already known to be none. */
+		if (pte + nr <= first_set)
+			break;
+
+		/* Already found set pte in range covered by order. */
+		if (pte <= first_set)
+			continue;
+
+		/* Need to check if all the ptes are none. */
+		ret = check_all_ptes_none(pte, nr);
+		if (ret == nr)
+			break;
+
+		first_set = pte + ret;
+	}
+
+	*order_out = order;
+	*addr_out = order > 0 ? addr : vmf->address;
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -4024,6 +4095,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	struct folio *folio;
 	vm_fault_t ret = 0;
 	pte_t entry;
+	unsigned long addr;
+	int order = 4; // TODO: Policy for maximum folio order.
+	int pgcount;
 
 	/* File mapping without ->vm_ops ? */
 	if (vma->vm_flags & VM_SHARED)
@@ -4065,24 +4139,41 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			return handle_userfault(vmf, VM_UFFD_MISSING);
 		}
-		goto setpte;
+		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
+
+		/* No need to invalidate - it was non-present before */
+		update_mmu_cache(vma, vmf->address, vmf->pte);
+		goto unlock;
 	}
 
-	/* Allocate our own private page. */
+retry:
+	/*
+	 * Estimate the folio order to allocate. We are not under the ptl here
+	 * so this estimate needs to be re-checked later once we have the lock.
+	 */
+	vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
+	calc_anonymous_folio_order(vmf, &order, &addr);
+	pte_unmap(vmf->pte);
+
+	/* Allocate our own private folio. */
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
-	folio = try_vma_alloc_zeroed_movable_folio(vma, vmf->address, 0);
+	folio = try_vma_alloc_zeroed_movable_folio(vma, addr, order);
 	if (!folio)
 		goto oom;
 
+	/* We may have been granted less than we asked for. */
+	order = folio_order(folio);
+	pgcount = folio_nr_pages(folio);
+
 	if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
 		goto oom_free_page;
-	cgroup_throttle_swaprate(&folio->page, GFP_KERNEL);
+	folio_throttle_swaprate(folio, GFP_KERNEL);
 
 	/*
 	 * The memory barrier inside __folio_mark_uptodate makes sure that
-	 * preceding stores to the page contents become visible before
-	 * the set_pte_at() write.
+	 * preceding stores to the folio contents become visible before
+	 * the set_ptes() write.
 	 */
 	__folio_mark_uptodate(folio);
 
@@ -4091,11 +4182,26 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
 
-	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
-			&vmf->ptl);
-	if (!pte_none(*vmf->pte)) {
-		update_mmu_tlb(vma, vmf->address, vmf->pte);
-		goto release;
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
+
+	/*
+	 * Ensure our estimate above is still correct; we could have raced with
+	 * another thread to service a fault in the region.
+	 */
+	if (check_all_ptes_none(vmf->pte, pgcount) != pgcount) {
+		pte_t *pte = vmf->pte + ((vmf->address - addr) >> PAGE_SHIFT);
+
+		/* If faulting pte was allocated by another, exit early. */
+		if (!pte_none(*pte)) {
+			update_mmu_tlb(vma, vmf->address, pte);
+			goto release;
+		}
+
+		/* Else try again, with a lower order. */
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		folio_put(folio);
+		order--;
+		goto retry;
 	}
 
 	ret = check_stable_address_space(vma->vm_mm);
@@ -4109,14 +4215,16 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
 
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-	folio_add_new_anon_rmap(folio, vma, vmf->address);
+	folio_ref_add(folio, pgcount - 1);
+
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, pgcount);
+	folio_add_new_anon_rmap_range(folio, vma, addr);
 	folio_add_lru_vma(folio, vma);
-setpte:
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
+
+	set_ptes(vma->vm_mm, addr, vmf->pte, entry, pgcount);
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, vmf->address, vmf->pte);
+	update_mmu_cache_range(vma, addr, vmf->pte, pgcount);
 unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
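
Illustration (not part of the patch): the order selection done by
calc_anonymous_folio_order() can be sketched in user space as follows. This
is a simplified model under stated assumptions, not the kernel code: one PTE
table is modelled as a bool array (true meaning a non-none pte), addresses
are replaced by slot indices, and the names pick_order(), first_populated(),
TABLE_ORDER and PTRS_PER_TABLE are hypothetical. The first_set shortcut from
the patch is left out for brevity, so every candidate order rescans its range.

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_ORDER	9			/* hypothetical: log2(entries per table) */
#define PTRS_PER_TABLE	(1 << TABLE_ORDER)

/* Returns index of the first populated slot, or nr if all are empty. */
static int first_populated(const bool *slots, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (slots[i])
			return i;
	}

	return nr;
}

/*
 * Pick the largest order <= max_order whose naturally aligned block around
 * 'fault' stays inside [vma_start, vma_end) and covers only empty slots.
 * Returns the order and stores the first slot of the block in *start_out.
 * Order 0 (a single slot at 'fault') is the fallback.
 */
static int pick_order(const bool *table, int fault, int vma_start,
		      int vma_end, int max_order, int *start_out)
{
	int order = max_order < TABLE_ORDER ? max_order : TABLE_ORDER;

	assert(vma_start >= 0 && vma_end <= PTRS_PER_TABLE);
	assert(fault >= vma_start && fault < vma_end);

	for (; order > 0; order--) {
		int nr = 1 << order;
		int start = fault & ~(nr - 1);	/* natural alignment */

		/* Block must not cross the vma bounds. */
		if (start < vma_start || start + nr > vma_end)
			continue;

		/* Block must not overlap any populated slot. */
		if (first_populated(table + start, nr) == nr) {
			*start_out = start;
			return order;
		}
	}

	*start_out = fault;
	return 0;
}
```

For example, with only slot 19 populated, a fault at slot 21, a vma covering
the whole table and max_order = 4, the blocks for orders 4 and 3 (slots
16..31 and 16..23) both include slot 19, so the sketch settles on order 2
starting at slot 20. As in the patch, a result computed without the lock is
only an estimate and would have to be re-checked under the lock.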