From patchwork Fri Jan 28 04:54:09 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12727933
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Alistair Popple, Andrew Morton, Andrea Arcangeli,
    David Hildenbrand, Matthew Wilcox, John Hubbard, Hugh Dickins,
    Vlastimil Babka, Yang Shi, "Kirill A . Shutemov"
Subject: [PATCH v3 1/4] mm: Don't skip swap entry even if zap_details specified
Date: Fri, 28 Jan 2022 12:54:09 +0800
Message-Id: <20220128045412.18695-2-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220128045412.18695-1-peterx@redhat.com>
References: <20220128045412.18695-1-peterx@redhat.com>

The "details" pointer shouldn't be the token to decide whether we should
skip swap entries.  For example, when the caller specifies
details->zap_mapping==NULL, it means the caller wants to zap all the pages
(including COWed pages), so we need to look into swap entries too, because
there can be private COWed pages that were swapped out.

Skipping swap entries whenever "details" is non-NULL can therefore wrongly
leave behind swap entries that should have been zapped.

A reproducer of the problem:

===8<===
#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

int page_size;
int shmem_fd;
char *buffer;

void main(void)
{
        int ret;
        char val;

        page_size = getpagesize();
        shmem_fd = memfd_create("test", 0);
        assert(shmem_fd >= 0);

        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, shmem_fd, 0);
        assert(buffer != MAP_FAILED);

        /* Write private page, swap it out */
        buffer[page_size] = 1;
        madvise(buffer, page_size * 2, MADV_PAGEOUT);

        /* This should drop private buffer[page_size] already */
        ret = ftruncate(shmem_fd, page_size);
        assert(ret == 0);

        /* Recover the size */
        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        /* Re-read the data, it should be all zero */
        val = buffer[page_size];
        if (val == 0)
                printf("Good\n");
        else
                printf("BUG\n");
}
===8<===
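To run the reproducer (assuming it is saved as repro.c): swap must be
enabled, so that MADV_PAGEOUT can actually push the private page out, and
glibc must be recent enough to declare memfd_create().  Something like:

  $ gcc -o repro repro.c
  $ ./repro

It prints "BUG" on affected kernels and "Good" once this patch is applied.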
We don't need to touch up the pmd path, because pmd never had an issue with
swap entries: shmem pmd migration entries are always split into pte-level
entries, and the same applies to swapping on anonymous memory.

Add another helper, should_zap_cows(), so that we can also check whether we
should zap private (COWed) mappings when there is no page pointer specified.

This patch drops the old trick of skipping all swap entries as soon as
"details" is non-NULL, so we handle swap ptes coherently.  Meanwhile we do
the same check upon migration entries, hwpoison entries and genuine swap
entries too.

To be explicit: we still keep the private entries when even_cows==false
(the flag callers pass down via unmap_mapping_range()), and always zap them
when even_cows==true.

The issue seems to exist starting from the initial git commit.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..4bfeaca7cbc7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
 	struct folio *single_folio;	/* Locked folio to be unmapped */
 };
 
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+	/* By default, zap all pages */
+	if (!details)
+		return true;
+
+	/* Or, we zap COWed pages only if the caller wants to */
+	return !details->zap_mapping;
+}
+
 /*
  * We set details->zap_mapping when we want to unmap shared but keep private
  * pages. Return true if skip zapping this page, false otherwise.
@@ -1320,11 +1331,15 @@ struct zap_details {
 static inline bool zap_skip_check_mapping(struct zap_details *details,
 					  struct page *page)
 {
-	if (!details || !page)
+	/* If we can make a decision without *page.. */
+	if (should_zap_cows(details))
 		return false;
 
-	return details->zap_mapping &&
-	    (details->zap_mapping != page_rmapping(page));
+	/* E.g. zero page */
+	if (!page)
+		return false;
+
+	return details->zap_mapping != page_rmapping(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,29 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
-			continue;
-
-		if (!non_swap_entry(entry))
+		if (!non_swap_entry(entry)) {
+			/*
+			 * If this is a genuine swap entry, then it must be a
+			 * private anon page.  If the caller wants to skip
+			 * COWed pages, ignore it.
+			 */
+			if (!should_zap_cows(details))
+				continue;
 			rss[MM_SWAPENTS]--;
-		else if (is_migration_entry(entry)) {
+		} else if (is_migration_entry(entry)) {
 			struct page *page;
 
 			page = pfn_swap_entry_to_page(entry);
+			if (zap_skip_check_mapping(details, page))
+				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_hwpoison_entry(entry)) {
+			/* If the caller wants to skip COWed pages, ignore it */
+			if (!should_zap_cows(details))
+				continue;
+		} else {
+			/* We should have covered all the swap entry types */
+			WARN_ON_ONCE(1);
 		}
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
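For intuition, the decision logic of the two helpers boils down to the
following minimal userspace C model.  This is a sketch only, not kernel
code: struct zap_details is reduced to the single field the helpers
consult, and a page is represented by its rmapping pointer (NULL standing
in for "no struct page", e.g. the zero page).

#include <stdbool.h>
#include <stdio.h>

/* Reduced stand-in for the kernel's struct zap_details (one field only). */
struct zap_details {
	const void *zap_mapping; /* if set, only zap pages of this mapping */
};

/* Mirrors should_zap_cows(): zap COWed pages unless a mapping filter is set. */
static bool should_zap_cows(const struct zap_details *details)
{
	if (!details)
		return true;             /* no details: zap everything */
	return !details->zap_mapping;    /* no filter: zap COWed pages too */
}

/* Mirrors zap_skip_check_mapping(): true means "skip zapping this page". */
static bool zap_skip_check_mapping(const struct zap_details *details,
				   const void *page_rmapping)
{
	if (should_zap_cows(details))
		return false;            /* zapping everything: never skip */
	if (!page_rmapping)
		return false;            /* no page (e.g. zero page): never skip */
	return details->zap_mapping != page_rmapping;
}

int main(void)
{
	char file_mapping[1] = {0}, anon_vma[1] = {0};
	struct zap_details shared_only = { .zap_mapping = file_mapping };

	/* No details at all: nothing is skipped. */
	printf("%d\n", zap_skip_check_mapping(NULL, file_mapping));         /* 0 */
	/* Filter set, page belongs to that mapping: still zapped. */
	printf("%d\n", zap_skip_check_mapping(&shared_only, file_mapping)); /* 0 */
	/* Filter set, COWed page (rmapping is an anon_vma): kept. */
	printf("%d\n", zap_skip_check_mapping(&shared_only, anon_vma));     /* 1 */
	return 0;
}

The last case is the even_cows==false behaviour described above: a private
COWed copy is left alone because its rmapping (an anon_vma) can never match
details->zap_mapping.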