From patchwork Thu Aug 4 20:39:50 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12936683
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Huang Ying, Andrea Arcangeli, David Hildenbrand, Minchan Kim,
    Andrew Morton, Vlastimil Babka, Nadav Amit, Hugh Dickins,
    Andi Kleen, peterx@redhat.com, "Kirill A. Shutemov"
Subject: [PATCH v2 0/2] mm: Remember a/d bits for migration entries
Date: Thu, 4 Aug 2022 16:39:50 -0400
Message-Id: <20220804203952.53665-1-peterx@redhat.com>

v2:
- Fix build for !CONFIG_SWAP [syzbot]
- Carry over dirty bit too [Nadav]

rfc: https://lore.kernel.org/all/20220729014041.21292-1-peterx@redhat.com
v1:  https://lore.kernel.org/all/20220803012159.36551-1-peterx@redhat.com

Problem
=======

When migrating a page, we currently always mark the migrated page as old
and clean.  That can lead to at least two problems:

  (1) We lose the real hot/cold information that we could have preserved.
      That information should not change just because the backing page is
      replaced after the migration.

  (2) There is always extra overhead on the first access to a migrated
      page, because the hardware MMU needs extra cycles to set the young
      bit again on read (and the dirty bit on write), as long as the MMU
      supports these bits.
Many recent upstream works have shown that (2) is not trivial and is in
fact very measurable.  In my test case, reading a 1G chunk of memory -
jumping in page-size intervals - could take 99ms just because of the extra
work of setting the young bit on a generic x86_64 system, compared to 4ms
when the young bit is already set.  This issue was originally reported by
Andrea Arcangeli.

Solution
========

To solve this problem, this patchset remembers the young/dirty bits in the
migration entries and carries them over when recovering the ptes.

We have the chance to do so because on many systems the swap offset is not
fully used.  Migration entries use the swp offset to store the PFN only,
and the PFN is normally smaller than what the swp offset field can hold.
That means there are free bits in the swp offset that we can use to store
things like the A/D bits, and that is how this series approaches the
problem (a rough userspace sketch of the bit-packing idea is appended
below the patch layout).

max_swapfile_size() is used to detect the per-arch offset length in swp
entries.  The A/D bits are remembered automatically whenever the swp
offset field is found to be large enough to keep both the PFN and the
extra bits.

This series only addresses the lost-bit issue for the migration procedure
itself.  Beyond that, a few related topics are worth mentioning:

(1) Page Idle Tracking

Before this series, idle tracking could report a false negative if an
accessed page got migrated, since the young bit was lost across the
migration.  After this series the young bit is preserved, so page idle
logic can detect it correctly when walking the pgtable.

However, nothing is done yet for the case where a page idle reset is
carried out while a migration is in progress; that should be a separate
topic to address (e.g. teaching the rmap pgtable walk code to walk both
present ptes and migration ptes).

(2) MADV_COLD/MADV_FREE

This series deliberately does not teach the madvise() code to recognize
the new entries.  Currently MADV_COLD does not handle migration entries
carrying the A bit.  Logically the A bit should be dropped when colding a
page, but the more important part there is probably the LRU operation,
which is still missing; that arguably shows COLD on migration entries is
not a major scenario.  A similar argument applies to MADV_FREE: logically
we should consider dropping the migration entry as a whole when one is
found.

In all cases, this series does not cover the scenarios above, assuming
they will either be kept as-is or be addressed in separate patchsets.

Tests
=====

With the patchset applied, the immediate read-access test [1] on the 1G
chunk above shrinks from 99ms to 4ms after migration.  The test moves 1G
of pages from node 0->1->0 and then reads the memory in page-size jumps,
on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.  A similar effect can be
measured when writing the memory for the first time after migration.

After applying the patchset, the initial read/write after a page is
migrated performs similarly to before the migration happened.

Patch Layout
============

Patch 1: Add swp_offset_pfn() and use it for all pfn swap entries.  We
         should stop treating swp_offset() as a PFN, because starting
         from the next patch it can contain more information.
Patch 2: The core patch to remember the young/dirty bits in swap offsets.

Please review, thanks.
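For readers unfamiliar with the trick, below is a minimal, userspace-only
sketch of the bit-packing idea described in the Solution section.  The
helper names (mig_entry_supports_ad(), make_mig_offset(), ...) and the bit
widths are hypothetical illustrations, not the helpers or constants this
series adds to the kernel; in the real code the available width is derived
per arch via max_swapfile_size().

/*
 * Hypothetical sketch: a "migration entry offset" that stores a PFN in
 * the low bits and the young/dirty bits just above it, but only when the
 * (assumed) offset field is wide enough to hold both.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PFN_BITS        52                       /* assumed max PFN width */
#define MIG_YOUNG_BIT   (1ULL << PFN_BITS)       /* accessed (A) bit */
#define MIG_DIRTY_BIT   (1ULL << (PFN_BITS + 1)) /* dirty (D) bit */
#define OFFSET_BITS     58                       /* assumed offset width */

/* Only use the extra bits when the offset field can hold PFN + A/D. */
static bool mig_entry_supports_ad(void)
{
        return OFFSET_BITS >= PFN_BITS + 2;
}

static uint64_t make_mig_offset(uint64_t pfn, bool young, bool dirty)
{
        uint64_t off = pfn;

        if (mig_entry_supports_ad()) {
                if (young)
                        off |= MIG_YOUNG_BIT;
                if (dirty)
                        off |= MIG_DIRTY_BIT;
        }
        return off;
}

static uint64_t mig_offset_pfn(uint64_t off)
{
        /* Mask off everything above the PFN, as a swp_offset_pfn() would. */
        return off & ((1ULL << PFN_BITS) - 1);
}

int main(void)
{
        uint64_t off = make_mig_offset(0x12345, true, false);

        assert(mig_offset_pfn(off) == 0x12345);
        printf("pfn=%#llx young=%d dirty=%d\n",
               (unsigned long long)mig_offset_pfn(off),
               !!(off & MIG_YOUNG_BIT), !!(off & MIG_DIRTY_BIT));
        return 0;
}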
[1] https://github.com/xzpeter/clibs/blob/master/misc/swap-young.c

Peter Xu (2):
  mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry
  mm: Remember young/dirty bit for page migrations

 arch/arm64/mm/hugetlbpage.c |   2 +-
 include/linux/swapops.h     | 126 ++++++++++++++++++++++++++++++++++--
 mm/hmm.c                    |   2 +-
 mm/huge_memory.c            |  26 +++++++-
 mm/memory-failure.c         |   2 +-
 mm/migrate.c                |   6 +-
 mm/migrate_device.c         |   4 ++
 mm/page_vma_mapped.c        |   6 +-
 mm/rmap.c                   |   5 +-
 9 files changed, 163 insertions(+), 16 deletions(-)
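As a rough illustration of how the measurement in the Tests section could
be reproduced, here is a hypothetical sketch (not the linked swap-young.c
test): the node numbers, buffer size and the use of move_pages(2) are
assumptions, and it needs two NUMA nodes plus libnuma (build with
gcc -O2 sketch.c -lnuma).

/*
 * Rough reproduction sketch: fault in a 1G buffer, migrate it node
 * 0 -> 1 -> 0 with move_pages(2), then time a read that touches one
 * byte per page (jumping in page-size intervals).
 */
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define SIZE (1UL << 30)        /* 1G, assumed to fit on both nodes */

static void migrate_to(char *buf, long npages, int node)
{
        void **pages = malloc(npages * sizeof(void *));
        int *nodes = malloc(npages * sizeof(int));
        int *status = malloc(npages * sizeof(int));
        long psize = sysconf(_SC_PAGESIZE), i;

        for (i = 0; i < npages; i++) {
                pages[i] = buf + i * psize;
                nodes[i] = node;
        }
        if (move_pages(0, npages, pages, nodes, status, MPOL_MF_MOVE) < 0)
                perror("move_pages");
        free(pages); free(nodes); free(status);
}

int main(void)
{
        long psize = sysconf(_SC_PAGESIZE);
        long npages = SIZE / psize, i;
        struct timespec t0, t1;
        volatile char sum = 0;
        char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        memset(buf, 1, SIZE);           /* fault everything in */
        migrate_to(buf, npages, 1);     /* node numbers are assumptions */
        migrate_to(buf, npages, 0);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < npages; i++)    /* one byte per page */
                sum += buf[i * psize];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("read-after-migration: %ld ms (sum=%d)\n",
               (t1.tv_sec - t0.tv_sec) * 1000 +
               (t1.tv_nsec - t0.tv_nsec) / 1000000, sum);
        return 0;
}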