From patchwork Thu Feb 17 06:07:43 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12749419
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Alistair Popple, Matthew Wilcox, peterx@redhat.com,
    David Hildenbrand, Andrea Arcangeli, Hugh Dickins, Yang Shi,
    Vlastimil Babka, John Hubbard, Andrew Morton,
    "Kirill A . Shutemov"
Subject: [PATCH v5 1/4] mm: Don't skip swap entry even if zap_details specified
Date: Thu, 17 Feb 2022 14:07:43 +0800
Message-Id: <20220217060746.71256-2-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220217060746.71256-1-peterx@redhat.com>
References: <20220217060746.71256-1-peterx@redhat.com>

The "details" pointer shouldn't be the token to decide whether we
should skip swap entries.

For example, when a caller specifies details->zap_mapping==NULL, it
means the user wants to zap all the pages (including COWed pages), so
we need to look into swap entries too, because there can be private
COWed pages that were swapped out.

Skipping swap entries whenever details is non-NULL can wrongly leave
behind swap entries that we should have zapped.

A reproducer of the problem:

===8<===
#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

int page_size;
int shmem_fd;
char *buffer;

void main(void)
{
	int ret;
	char val;

	page_size = getpagesize();
	shmem_fd = memfd_create("test", 0);
	assert(shmem_fd >= 0);

	ret = ftruncate(shmem_fd, page_size * 2);
	assert(ret == 0);

	buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE, shmem_fd, 0);
	assert(buffer != MAP_FAILED);

	/* Write private page, swap it out */
	buffer[page_size] = 1;
	madvise(buffer, page_size * 2, MADV_PAGEOUT);

	/* This should drop private buffer[page_size] already */
	ret = ftruncate(shmem_fd, page_size);
	assert(ret == 0);

	/* Recover the size */
	ret = ftruncate(shmem_fd, page_size * 2);
	assert(ret == 0);

	/* Re-read the data, it should be all zero */
	val = buffer[page_size];
	if (val == 0)
		printf("Good\n");
	else
		printf("BUG\n");
}
===8<===

We don't need to touch up the pmd path, because pmd never had an issue
with swap entries: for example, a shmem pmd migration entry will always
be split into pte level first, and the same holds for swapped-out
anonymous pages.

Add another helper should_zap_cows() so that we can also check whether
we should zap private mappings when there's no page pointer specified.

This patch drops the old trick of skipping all swap entries whenever
"details" is non-NULL, so swap ptes are now handled coherently.
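For context (a trimmed paraphrase of unmap_mapping_pages() in this
tree, not part of this diff, so the exact code may differ), the caller
side expresses "zap the COWed pages too" by leaving zap_mapping unset:

	void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
				 pgoff_t nr, bool even_cows)
	{
		struct zap_details details = { };

		/*
		 * zap_mapping==NULL means "zap everything, COWed pages
		 * included"; non-NULL means "only zap pages whose
		 * page->mapping matches", i.e. keep the private copies.
		 */
		details.zap_mapping = even_cows ? NULL : mapping;

		/* ... then walk the i_mmap tree and zap each vma range ... */
	}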
Meanwhile, the same check applies to migration entries, hwpoison
entries and genuine swap entries alike. To be explicit: the private
entries are still kept when even_cows==false, and always zapped when
even_cows==true.

The issue seems to exist starting from the initial git commit.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 mm/memory.c | 40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..533da5d6c32c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
 	struct folio *single_folio;	/* Locked folio to be unmapped */
 };
 
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+	/* By default, zap all pages */
+	if (!details)
+		return true;
+
+	/* Or, we zap COWed pages only if the caller wants to */
+	return !details->zap_mapping;
+}
+
 /*
  * We set details->zap_mapping when we want to unmap shared but keep private
  * pages. Return true if skip zapping this page, false otherwise.
@@ -1320,11 +1331,15 @@ struct zap_details {
 static inline bool
 zap_skip_check_mapping(struct zap_details *details, struct page *page)
 {
-	if (!details || !page)
+	/* If we can make a decision without *page.. */
+	if (should_zap_cows(details))
+		return false;
+
+	/* E.g. the caller passes NULL for the case of a zero page */
+	if (!page)
 		return false;
 
-	return details->zap_mapping &&
-	    (details->zap_mapping != page_rmapping(page));
+	return details->zap_mapping != page_rmapping(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,24 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
-			continue;
-
-		if (!non_swap_entry(entry))
+		if (!non_swap_entry(entry)) {
+			/* Genuine swap entry, hence a private anon page */
+			if (!should_zap_cows(details))
+				continue;
 			rss[MM_SWAPENTS]--;
-		else if (is_migration_entry(entry)) {
+		} else if (is_migration_entry(entry)) {
 			struct page *page;
 
 			page = pfn_swap_entry_to_page(entry);
+			if (zap_skip_check_mapping(details, page))
+				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_hwpoison_entry(entry)) {
+			if (!should_zap_cows(details))
+				continue;
+		} else {
+			/* We should have covered all the swap entry types */
+			WARN_ON_ONCE(1);
 		}
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
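A note on running the reproducer above: it needs swap to be enabled
(MADV_PAGEOUT has to actually push the private page out to swap) and a
glibc recent enough to provide the memfd_create() wrapper (2.27+).
Built with e.g. "gcc -o repro repro.c" (the file name is only an
example), it is expected to print "BUG" on a kernel without this fix
and "Good" once the fix is applied.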