From patchwork Sat Oct 10 03:07:59 2020
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 11829801
Date: Fri, 9 Oct 2020 20:07:59 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Linus Torvalds, Matthew Wilcox, Song Liu, "Kirill A. Shutemov",
    Yang Shi, Denis Lisov, Qian Cai, Suren Baghdasaryan, David Rientjes,
    Minchan Kim, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH] mm/khugepaged: fix filemap page_to_pgoff(page) != offset

There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.

Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.

collapse_file(old start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=old offset
  new_page->mapping = mapping
  xas_store(&xas, new_page)

                          filemap_fault
                            page = find_get_page(mapping, offset)
                            // if offset falls inside hpage then
                            // compound_head(page) == hpage
                            lock_page_maybe_drop_mmap()
                              __lock_page(page)

  // collapse fails
  xas_store(&xas, old page)
  new_page->mapping = NULL
  unlock_page(new_page)

collapse_file(new start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=new offset
  new_page->mapping = mapping // mapping becomes valid again

                            // since compound_head(page) == hpage
                            // page_to_pgoff(page) got changed
                            VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)

An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above.  But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.

It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.

Instead, non-NUMA khugepaged_prealloc_page() releases the old page
if anyone else has a reference to it (1% of cases when I tested).

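For readers who want to see the idea outside the kernel, here is a
minimal user-space sketch of the "release the cached page if anyone
else still holds a reference" rule.  It is only a model: struct toy_page
and the toy_*() helpers are hypothetical stand-ins invented for this
illustration, it is single-threaded, and it uses a plain int where the
kernel uses an atomic page reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page: a reference count and an index. */
struct toy_page {
	int refcount;
	unsigned long index;
};

/* Allocate a page holding one reference (the allocator's own). */
static struct toy_page *toy_alloc(void)
{
	struct toy_page *p = calloc(1, sizeof(*p));

	if (p)
		p->refcount = 1;
	return p;
}

/* Drop one reference, freeing the page when the last one goes. */
static void toy_put(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		free(p);
}

/*
 * Same shape as the check in the patch: if someone else still holds a
 * reference to the cached page, give it up and allocate a fresh one,
 * so that the old page is never reused under a racing lookup.
 */
static bool toy_prealloc(struct toy_page **hpage)
{
	if (*hpage && (*hpage)->refcount > 1) {
		toy_put(*hpage);
		*hpage = NULL;
	}
	if (!*hpage)
		*hpage = toy_alloc();
	return *hpage != NULL;
}

/* Returns 0 if the racing lookup still sees the index it expected. */
static int toy_demo(void)
{
	struct toy_page *hpage = NULL, *first, *looked_up;
	int ok;

	if (!toy_prealloc(&hpage))
		return -1;
	first = hpage;
	first->index = 0;		/* "old start" */

	looked_up = first;		/* racing lookup takes a reference */
	looked_up->refcount++;

	/* The collapse failed; the next attempt asks for a page again. */
	if (!toy_prealloc(&hpage)) {
		toy_put(looked_up);
		return -1;
	}
	hpage->index = 512;		/* "new start" */

	ok = hpage != first && looked_up->index == 0;
	toy_put(looked_up);		/* last reference: frees the old page */
	toy_put(hpage);
	return ok ? 0 : 1;
}
```

Calling toy_demo() from a main() should return 0: the second
toy_prealloc() hands back a different page, so the index seen through
looked_up is left alone.  When nobody else holds a reference,
toy_prealloc() keeps reusing the same page, as before.
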
Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.

Reported-by: Denis Lisov
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan
Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle)
---

 mm/khugepaged.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- 5.9-rc8/mm/khugepaged.c	2020-09-06 17:34:46.939306972 -0700
+++ linux/mm/khugepaged.c	2020-10-08 16:19:42.999765534 -0700
@@ -914,6 +914,18 @@ static struct page *khugepaged_alloc_hug
 
 static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 {
+	/*
+	 * If the hpage allocated earlier was briefly exposed in page cache
+	 * before collapse_file() failed, it is possible that racing lookups
+	 * have not yet completed, and would then be unpleasantly surprised by
+	 * finding the hpage reused for the same mapping at a different offset.
+	 * Just release the previous allocation if there is any danger of that.
+	 */
+	if (*hpage && page_count(*hpage) > 1) {
+		put_page(*hpage);
+		*hpage = NULL;
+	}
+
 	if (!*hpage)
 		*hpage = khugepaged_alloc_hugepage(wait);
 