From patchwork Wed Feb 16 09:48:07 2022
X-Patchwork-Submitter: Peter Xu <peterx@redhat.com>
X-Patchwork-Id: 12748344
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Kirill A. Shutemov, Matthew Wilcox, Yang Shi,
    Andrea Arcangeli, peterx@redhat.com, John Hubbard, Alistair Popple,
    David Hildenbrand, Vlastimil Babka, Hugh Dickins
Subject: [PATCH v4 1/4] mm: Don't skip swap entry even if zap_details specified
Date: Wed, 16 Feb 2022 17:48:07 +0800
Message-Id: <20220216094810.60572-2-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220216094810.60572-1-peterx@redhat.com>
References: <20220216094810.60572-1-peterx@redhat.com>

The "details" pointer shouldn't be the token that decides whether we skip
swap entries.  For example, when the caller specifies
details->zap_mapping==NULL, it means the caller wants to zap all pages
(including COWed pages), so we need to look into swap entries too, because
there can be private COWed pages that were swapped out.

Skipping swap entries whenever details is non-NULL may wrongly leave
behind swap entries that should have been zapped.

A reproducer of the problem (a build/run note follows further down):

===8<===
#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

int page_size;
int shmem_fd;
char *buffer;

int main(void)
{
        int ret;
        char val;

        page_size = getpagesize();
        shmem_fd = memfd_create("test", 0);
        assert(shmem_fd >= 0);

        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, shmem_fd, 0);
        assert(buffer != MAP_FAILED);

        /* Write private page, swap it out */
        buffer[page_size] = 1;
        madvise(buffer, page_size * 2, MADV_PAGEOUT);

        /* This should drop private buffer[page_size] already */
        ret = ftruncate(shmem_fd, page_size);
        assert(ret == 0);

        /* Recover the size */
        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        /* Re-read the data, it should be all zero */
        val = buffer[page_size];
        if (val == 0)
                printf("Good\n");
        else
                printf("BUG\n");

        return 0;
}
===8<===

We don't need to touch the pmd path, because pmds never had this issue
with swap entries: shmem pmd migration entries, for example, are always
split down to pte level first, and the same holds for swapped-out
anonymous memory.

Add another helper, should_zap_cows(), so that we can also check whether
we should zap private mappings when there is no page pointer specified.

This patch drops the old trick of skipping all swap entries whenever
details is non-NULL, so that swap ptes are handled coherently.  Meanwhile,
the same check is applied to migration entries, hwpoison entries and
genuine swap entries alike.

To be explicit, we should still keep the private entries when
even_cows==false, and always zap them when even_cows==true.
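A note on running the reproducer above (not part of the original problem
statement): it is a plain C program, so something like
"gcc repro.c -o repro && ./repro" should be all that is needed; "repro.c"
is just an assumed file name for the snippet.  Swap must be enabled,
otherwise MADV_PAGEOUT cannot actually swap the private page out.  On an
affected kernel, the swap entry of the COWed page survives the
truncation-time zap and the stale data comes back on the re-read, so the
program prints "BUG"; with this patch applied it should print "Good".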
The issue seems to have existed since the initial git commit.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: John Hubbard
---
 mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..4bfeaca7cbc7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
 	struct folio *single_folio;	/* Locked folio to be unmapped */
 };
 
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+	/* By default, zap all pages */
+	if (!details)
+		return true;
+
+	/* Or, we zap COWed pages only if the caller wants to */
+	return !details->zap_mapping;
+}
+
 /*
  * We set details->zap_mapping when we want to unmap shared but keep private
  * pages. Return true if skip zapping this page, false otherwise.
  */
@@ -1320,11 +1331,15 @@ struct zap_details {
 static inline bool
 zap_skip_check_mapping(struct zap_details *details, struct page *page)
 {
-	if (!details || !page)
+	/* If we can make a decision without *page.. */
+	if (should_zap_cows(details))
 		return false;
 
-	return details->zap_mapping &&
-	    (details->zap_mapping != page_rmapping(page));
+	/* E.g. zero page */
+	if (!page)
+		return false;
+
+	return details->zap_mapping != page_rmapping(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,29 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
-			continue;
-
-		if (!non_swap_entry(entry))
+		if (!non_swap_entry(entry)) {
+			/*
+			 * If this is a genuine swap entry, then it must be a
+			 * private anon page.  If the caller wants to skip
+			 * COWed pages, ignore it.
+			 */
+			if (!should_zap_cows(details))
+				continue;
 			rss[MM_SWAPENTS]--;
-		else if (is_migration_entry(entry)) {
+		} else if (is_migration_entry(entry)) {
 			struct page *page;
 
 			page = pfn_swap_entry_to_page(entry);
+			if (zap_skip_check_mapping(details, page))
+				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_hwpoison_entry(entry)) {
+			/* If the caller wants to skip COWed pages, ignore it */
+			if (!should_zap_cows(details))
+				continue;
+		} else {
+			/* We should have covered all the swap entry types */
+			WARN_ON_ONCE(1);
 		}
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
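As a postscript for reviewers: the decision matrix of the two helpers above
can be sanity-checked with a standalone userspace model.  This is only an
illustrative sketch, not kernel code; the trimmed-down struct definitions
and the use of page->mapping in place of page_rmapping() are assumptions
made purely so the snippet compiles on its own.

===8<===
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel types (illustration only) */
struct address_space;
struct page { struct address_space *mapping; };
struct zap_details { struct address_space *zap_mapping; };

/* Mirrors should_zap_cows() from the patch */
static bool should_zap_cows(struct zap_details *details)
{
        /* By default, zap all pages */
        if (!details)
                return true;
        /* Or, zap COWed pages only if the caller asked for that */
        return !details->zap_mapping;
}

/* Mirrors zap_skip_check_mapping(); true means "skip this page" */
static bool zap_skip_check_mapping(struct zap_details *details,
                                   struct page *page)
{
        if (should_zap_cows(details))
                return false;
        /* E.g. zero page */
        if (!page)
                return false;
        return details->zap_mapping != page->mapping;
}

int main(void)
{
        struct address_space *shmem = (struct address_space *)0x1;
        struct page shared = { shmem };         /* page cache page */
        struct page cowed = { NULL };           /* private COWed copy */
        struct zap_details keep_cows = { shmem };

        /* even_cows (details==NULL): nothing is skipped */
        printf("%d %d\n", zap_skip_check_mapping(NULL, &shared),
                          zap_skip_check_mapping(NULL, &cowed));
        /* !even_cows (zap_mapping set): the COWed copy is skipped */
        printf("%d %d\n", zap_skip_check_mapping(&keep_cows, &shared),
                          zap_skip_check_mapping(&keep_cows, &cowed));
        return 0;
}
===8<===

Running the model should print "0 0" and then "0 1": with no zap_mapping
set, everything gets zapped, while with zap_mapping set only the pages
still belonging to that mapping are zapped and the private COWed copy is
left alone.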