From patchwork Wed Jul 6 23:59:26 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12908932
Date: Wed, 6 Jul 2022 16:59:26 -0700
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
Message-Id: <20220706235936.2197195-9-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>
Subject: [mm-unstable v7 08/18] mm/khugepaged: record SCAN_PMD_MAPPED when
 scan_pmd() finds hugepage
From: "Zach O'Keefe"
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
 Michal Hocko, Pasha Tatashin, Peter Xu, Rongwei Wang, SeongJae Park,
 Song Liu, Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
 Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
 "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
 Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer,
 "Zach O'Keefe"

When scanning an anon pmd to see if it's eligible for collapse, return
SCAN_PMD_MAPPED if the pmd already maps a hugepage. Note that
SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the
file-collapse path, since the latter might identify pte-mapped compound
pages. This is required by MADV_COLLAPSE, which necessarily needs to
know which hugepage-aligned/sized regions are already pmd-mapped.

In order to determine if a pmd already maps a hugepage, refactor
mm_find_pmd():

Return mm_find_pmd() to its pre-commit f72e7dcdd252 ("mm: let
mm_find_pmd fix buggy race with THP fault") behavior. ksm was the only
caller that explicitly wanted a pte-mapping pmd, so open-code the
pte-mapping logic there (pmd_present() and pmd_trans_huge() checks).
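[Editor's illustration, not part of this patch: the helper
find_pmd_or_thp_or_none() introduced in the diff below turns the old
NULL-or-pmd contract into a tri-state result. A minimal, hypothetical
caller sketch showing how that result can be consumed, which is exactly
the distinction MADV_COLLAPSE needs (an already pmd-mapped range is a
no-op success, not a failure):

	/* Hypothetical caller, for illustration only. */
	static int hypothetical_collapse_scan(struct mm_struct *mm,
					      unsigned long address)
	{
		pmd_t *pmd;
		int result = find_pmd_or_thp_or_none(mm, address, &pmd);

		if (result == SCAN_PMD_MAPPED)
			return SCAN_SUCCEED;	/* already a hugepage: done */
		if (result != SCAN_SUCCEED)
			return result;		/* no pte-mapping pmd to collapse */
		/* ... otherwise scan the ptes mapped under *pmd ... */
		return SCAN_SUCCEED;
	}
]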
Undo the change in commit f72e7dcdd252 ("mm: let mm_find_pmd fix buggy
race with THP fault") that open-coded the split_huge_pmd_address() pmd
lookup, and use mm_find_pmd() instead.

Signed-off-by: Zach O'Keefe
Reviewed-by: Yang Shi
---
 include/trace/events/huge_memory.h |  1 +
 mm/huge_memory.c                   | 18 +--------
 mm/internal.h                      |  2 +-
 mm/khugepaged.c                    | 60 ++++++++++++++++++++++++------
 mm/ksm.c                           | 10 +++++
 mm/rmap.c                          | 15 +++-----
 6 files changed, 67 insertions(+), 39 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index d651f3437367..55392bf30a03 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -11,6 +11,7 @@
 	EM( SCAN_FAIL,			"failed")			\
 	EM( SCAN_SUCCEED,		"succeeded")			\
 	EM( SCAN_PMD_NULL,		"pmd_null")			\
+	EM( SCAN_PMD_MAPPED,		"page_pmd_mapped")		\
 	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")		\
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4fbe43dc1568..fb76db6c703e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2363,25 +2363,11 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
 		bool freeze, struct folio *folio)
 {
-	pgd_t *pgd;
-	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
+	pmd_t *pmd = mm_find_pmd(vma->vm_mm, address);
 
-	pgd = pgd_offset(vma->vm_mm, address);
-	if (!pgd_present(*pgd))
+	if (!pmd)
 		return;
 
-	p4d = p4d_offset(pgd, address);
-	if (!p4d_present(*p4d))
-		return;
-
-	pud = pud_offset(p4d, address);
-	if (!pud_present(*pud))
-		return;
-
-	pmd = pmd_offset(pud, address);
-
 	__split_huge_pmd(vma, pmd, address, freeze, folio);
 }
diff --git a/mm/internal.h b/mm/internal.h
index 6e14749ad1e5..ef8c23fb678f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -188,7 +188,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
 /*
  * in mm/rmap.c:
  */
-extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
+pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
 
 /*
  * in mm/page_alloc.c
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b0e20db3f805..c7a09cc9a0e8 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -28,6 +28,7 @@ enum scan_result {
 	SCAN_FAIL,
 	SCAN_SUCCEED,
 	SCAN_PMD_NULL,
+	SCAN_PMD_MAPPED,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
@@ -871,6 +872,45 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	return SCAN_SUCCEED;
 }
 
+static int find_pmd_or_thp_or_none(struct mm_struct *mm,
+				   unsigned long address,
+				   pmd_t **pmd)
+{
+	pmd_t pmde;
+
+	*pmd = mm_find_pmd(mm, address);
+	if (!*pmd)
+		return SCAN_PMD_NULL;
+
+	pmde = pmd_read_atomic(*pmd);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
+	barrier();
+#endif
+	if (!pmd_present(pmde))
+		return SCAN_PMD_NULL;
+	if (pmd_trans_huge(pmde))
+		return SCAN_PMD_MAPPED;
+	if (pmd_bad(pmde))
+		return SCAN_PMD_NULL;
+	return SCAN_SUCCEED;
+}
+
+static int check_pmd_still_valid(struct mm_struct *mm,
+				 unsigned long address,
+				 pmd_t *pmd)
+{
+	pmd_t *new_pmd;
+	int result = find_pmd_or_thp_or_none(mm, address, &new_pmd);
+
+	if (result != SCAN_SUCCEED)
+		return result;
+	if (new_pmd != pmd)
+		return SCAN_FAIL;
+	return SCAN_SUCCEED;
+}
+
 /*
  * Bring missing pages in from swap, to complete THP collapse.
  * Only done if khugepaged_scan_pmd believes it is worthwhile.
@@ -982,9 +1022,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		goto out_nolock;
 	}
 
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd) {
-		result = SCAN_PMD_NULL;
+	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (result != SCAN_SUCCEED) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}
@@ -1012,7 +1051,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 	/* check if the pmd is still valid */
-	if (mm_find_pmd(mm, address) != pmd)
+	result = check_pmd_still_valid(mm, address, pmd);
+	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 
 	anon_vma_lock_write(vma->anon_vma);
@@ -1115,11 +1155,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd) {
-		result = SCAN_PMD_NULL;
+	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (result != SCAN_SUCCEED)
 		goto out;
-	}
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
@@ -1373,8 +1411,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	if (!PageHead(hpage))
 		goto drop_hpage;
 
-	pmd = mm_find_pmd(mm, haddr);
-	if (!pmd)
+	if (find_pmd_or_thp_or_none(mm, haddr, &pmd) != SCAN_SUCCEED)
 		goto drop_hpage;
 
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
@@ -1492,8 +1529,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		if (vma->vm_end < addr + HPAGE_PMD_SIZE)
 			continue;
 		mm = vma->vm_mm;
-		pmd = mm_find_pmd(mm, addr);
-		if (!pmd)
+		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
 			continue;
 		/*
 		 * We need exclusive mmap_lock to retract page table.
diff --git a/mm/ksm.c b/mm/ksm.c
index 075123602bd0..3e0a0a42fa1f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1136,6 +1136,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t *pmd;
+	pmd_t pmde;
 	pte_t *ptep;
 	pte_t newpte;
 	spinlock_t *ptl;
@@ -1150,6 +1151,15 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	pmd = mm_find_pmd(mm, addr);
 	if (!pmd)
 		goto out;
+	/*
+	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
+	 * without holding anon_vma lock for write. So when looking for a
+	 * genuine pmde (in which to find pte), test present and !THP together.
+	 */
+	pmde = *pmd;
+	barrier();
+	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
+		goto out;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
diff --git a/mm/rmap.c b/mm/rmap.c
index edc06c52bc82..af775855e58f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -767,13 +767,17 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 	return vma_address(page, vma);
 }
 
+/*
+ * Returns the actual pmd_t* where we expect 'address' to be mapped from, or
+ * NULL if it doesn't exist. No guarantees / checks on what the pmd_t*
+ * represents.
+ */
 pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd = NULL;
-	pmd_t pmde;
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -788,15 +792,6 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 		goto out;
 
 	pmd = pmd_offset(pud, address);
-	/*
-	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
-	 * without holding anon_vma lock for write. So when looking for a
-	 * genuine pmde (in which to find pte), test present and !THP together.
-	 */
-	pmde = *pmd;
-	barrier();
-	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
-		pmd = NULL;
 out:
 	return pmd;
 }
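[Editor's usage note, not part of this patch: once the MADV_COLLAPSE
uapi lands later in this series, userspace exercises this path roughly
as in the sketch below. The MADV_COLLAPSE define is an assumption for
illustration (the real value comes from a later patch in the series);
verify it against your installed headers. If the range is already
pmd-mapped, the kernel can now observe SCAN_PMD_MAPPED and report
success without rescanning ptes.

	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/mman.h>

	#ifndef MADV_COLLAPSE
	#define MADV_COLLAPSE 25	/* assumed value from a later uapi patch */
	#endif

	int main(void)
	{
		size_t len = 2UL << 20;		/* one pmd-sized (2MiB, x86-64) region */
		char *buf = aligned_alloc(len, len);

		if (!buf)
			return 1;
		for (size_t i = 0; i < len; i += 4096)	/* fault in every page */
			buf[i] = 1;
		if (madvise(buf, len, MADV_COLLAPSE))
			perror("madvise(MADV_COLLAPSE)");
		free(buf);
		return 0;
	}
]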