From patchwork Sun Aug 2 19:15:24 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 11696843
Date: Sun, 2 Aug 2020 12:15:24 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: "Kirill A.
 Shutemov", Andrea Arcangeli, Song Liu,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
Sender: owner-linux-mm@kvack.org
Precedence: bulk

When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle
the case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held. That's not enough.

One machine has twice crashed under load, with "BUG: spinlock bad magic"
and GPF on 6b6b6b6b6b6b6b6b. Examining the second crash,
page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed
by the time it comes to unlock.

Follow the example of retract_page_tables(), but we only need one of
huge page lock or i_mmap_lock_write to secure against this: because it's
the narrower lock, and because it simplifies collapse_pte_mapped_thp()
to know the hpage earlier, choose to rely on huge page lock here.

Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins
Cc: stable@vger.kernel.org # v5.4+
Acked-by: Kirill A. Shutemov
---
 mm/khugepaged.c |   44 +++++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 25 deletions(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:51:02.127688808 -0700
@@ -1412,7 +1412,7 @@ void collapse_pte_mapped_thp(struct mm_s
 {
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
 	struct vm_area_struct *vma = find_vma(mm, haddr);
-	struct page *hpage = NULL;
+	struct page *hpage;
 	pte_t *start_pte, *pte;
 	pmd_t *pmd, _pmd;
 	spinlock_t *ptl;
@@ -1432,9 +1432,17 @@ void collapse_pte_mapped_thp(struct mm_s
 	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
 		return;
 
+	hpage = find_lock_page(vma->vm_file->f_mapping,
+			       linear_page_index(vma, haddr));
+	if (!hpage)
+		return;
+
+	if (!PageHead(hpage))
+		goto drop_hpage;
+
 	pmd = mm_find_pmd(mm, haddr);
 	if (!pmd)
-		return;
+		goto drop_hpage;
 
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
 
@@ -1453,30 +1461,11 @@ void collapse_pte_mapped_thp(struct mm_s
 
 		page = vm_normal_page(vma, addr, *pte);
 
-		if (!page || !PageCompound(page))
-			goto abort;
-
-		if (!hpage) {
-			hpage = compound_head(page);
-			/*
-			 * The mapping of the THP should not change.
-			 *
-			 * Note that uprobe, debugger, or MAP_PRIVATE may
-			 * change the page table, but the new page will
-			 * not pass PageCompound() check.
-			 */
-			if (WARN_ON(hpage->mapping != vma->vm_file->f_mapping))
-				goto abort;
-		}
-
 		/*
-		 * Confirm the page maps to the correct subpage.
-		 *
-		 * Note that uprobe, debugger, or MAP_PRIVATE may change
-		 * the page table, but the new page will not pass
-		 * PageCompound() check.
+		 * Note that uprobe, debugger, or MAP_PRIVATE may change the
+		 * page table, but the new page will not be a subpage of hpage.
 		 */
-		if (WARN_ON(hpage + i != page))
+		if (hpage + i != page)
 			goto abort;
 		count++;
 	}
@@ -1495,7 +1484,7 @@ void collapse_pte_mapped_thp(struct mm_s
 	pte_unmap_unlock(start_pte, ptl);
 
 	/* step 3: set proper refcount and mm_counters. */
-	if (hpage) {
+	if (count) {
 		page_ref_sub(hpage, count);
 		add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
 	}
@@ -1506,10 +1495,15 @@ void collapse_pte_mapped_thp(struct mm_s
 	spin_unlock(ptl);
 	mm_dec_nr_ptes(mm);
 	pte_free(mm, pmd_pgtable(_pmd));
+
+drop_hpage:
+	unlock_page(hpage);
+	put_page(hpage);
 	return;
 
 abort:
 	pte_unmap_unlock(start_pte, ptl);
+	goto drop_hpage;
 }
 
 static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)