From patchwork Fri Jul 29 01:40:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12931859
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Huang Ying, peterx@redhat.com, Andrea Arcangeli, Andrew Morton,
 "Kirill A . Shutemov", Nadav Amit, Hugh Dickins, David Hildenbrand,
 Vlastimil Babka
Subject: [PATCH RFC 0/4] mm: Remember young bit for migration entries
Date: Thu, 28 Jul 2022 21:40:37 -0400
Message-Id: <20220729014041.21292-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
Sender: owner-linux-mm@kvack.org
Precedence: bulk

[Marking as RFC; only x86 is supported for now. I plan to add a few more
 archs when there's a formal version.]

Problem
=======

When migrating a page, right now we always mark the migrated page as old.
The reason could be that we don't really know whether the page is hot or
cold, so we may have taken "old" as the default, assuming that's safer.

However that could lead to at least two problems:

  (1) We lose the real hot/cold information that we could have preserved.
      That information shouldn't change even if the backing page is changed
      after the migration.

  (2) There is always extra overhead on the immediate next access to any
      migrated page, because the hardware MMU needs cycles to set the young
      bit again (as long as the MMU supports that).

Many of the recent upstream works showed that (2) is not trivial and is
actually very measurable. In my test case, reading a 1G chunk of memory -
jumping in page size intervals - could take 99ms just because of the extra
setting of the young bit on a generic x86_64 system, compared to 4ms if the
young bit is already set.

This issue was originally reported by Andrea Arcangeli.

Solution
========

To solve this problem, this patchset tries to remember the young bit in the
migration entries and carry it over when recovering the ptes.

We have the chance to do so because in many systems the swap offset is not
really fully used. Migration entries use the swp offset to store the PFN
only, while the PFN normally needs fewer bits than the swp offset provides.
It means we do have some free bits in the swp offset that we can use to
store things like the young bit, and that's how this series approaches the
problem.

One tricky thing here is that even though we're embedding the information
into the swap entry, which seems to be a very generic data structure, the
number of free bits is still arch dependent. That's not only because the
size of swp_entry_t differs, but also because of the different layouts of
swap ptes on different archs.

Here, this series requires each supporting arch to define an extra macro
called __ARCH_SWP_OFFSET_BITS that represents the size of the swp offset.
With this information, the swap logic can know whether there are extra bits
to use, and then it'll remember the young bit when possible. By default,
it'll keep the old behavior of keeping all migrated pages cold.

Tests
=====

With the patchset applied, the immediate read access test [1] of the above
1G chunk after migration shrinks from 99ms to 4ms. The test is done by
moving 1G of pages from node 0->1->0 and then reading them in page size
jumps.

Currently __ARCH_SWP_OFFSET_BITS is only defined on x86 for this series, and
it has only been tested on x86_64 with an Intel(R) Xeon(R) CPU E5-2630 v4 @
2.20GHz.

Patch Layout
============

Patch 1:  Add swp_offset_pfn() and apply it to all pfn swap entries. We
          should also stop treating swp_offset() as the PFN, because it can
          contain more information starting from the next patch.
Patch 2:  The core patch to remember the young bit in swap offsets.
Patch 3:  A cleanup for x86 32 bits pgtable.h.
Patch 4:  Define __ARCH_SWP_OFFSET_BITS on x86, enable young bit for
          migration.

Please review, thanks.
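
To illustrate the idea described under "Solution", here is a minimal
userspace sketch of keeping a young flag in a spare bit above the PFN inside
a swap offset. It is not the code from this series: all sketch_* names and
the two bit widths are hypothetical and chosen only for illustration
(SKETCH_SWP_OFFSET_BITS stands in for __ARCH_SWP_OFFSET_BITS, and the real
values are arch dependent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical widths, for illustration only; real values are per-arch. */
#define SKETCH_SWP_OFFSET_BITS 50 /* stands in for __ARCH_SWP_OFFSET_BITS */
#define SKETCH_PFN_BITS        40 /* assumed number of bits a PFN can need */

#define SKETCH_PFN_MASK  ((UINT64_C(1) << SKETCH_PFN_BITS) - 1)
#define SKETCH_YOUNG_BIT (UINT64_C(1) << SKETCH_PFN_BITS)

/* The young bit can only be stored if the offset is wider than the PFN. */
static inline bool sketch_young_supported(void)
{
	return SKETCH_SWP_OFFSET_BITS > SKETCH_PFN_BITS;
}

/* Build an offset from a PFN, recording the young flag when there is room. */
static inline uint64_t sketch_make_offset(uint64_t pfn, bool young)
{
	assert(pfn <= SKETCH_PFN_MASK);

	uint64_t offset = pfn;

	if (young && sketch_young_supported())
		offset |= SKETCH_YOUNG_BIT;
	return offset;
}

/* Extract the PFN only, masking off any extra bits stored above it. */
static inline uint64_t sketch_offset_pfn(uint64_t offset)
{
	return offset & SKETCH_PFN_MASK;
}

/* Report the stored young flag; without spare bits it always reads as old. */
static inline bool sketch_offset_young(uint64_t offset)
{
	return sketch_young_supported() && (offset & SKETCH_YOUNG_BIT);
}
```

The sketch also shows why a dedicated PFN accessor is needed once extra bits
live in the offset: reading the raw offset as a PFN would include the young
bit, while masking returns the PFN alone. When the offset has no spare bits,
the flag is silently dropped and the page is treated as old, which matches
the default behavior described above.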

[1] https://github.com/xzpeter/clibs/blob/master/misc/swap-young.c

Peter Xu (4):
  mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry
  mm: Remember young bit for page migrations
  mm/x86: Use SWP_TYPE_BITS in 3-level swap macros
  mm/x86: Define __ARCH_SWP_OFFSET_BITS

 arch/arm64/mm/hugetlbpage.c           |  2 +-
 arch/x86/include/asm/pgtable-2level.h |  6 ++
 arch/x86/include/asm/pgtable-3level.h | 15 +++--
 arch/x86/include/asm/pgtable_64.h     |  5 ++
 include/linux/swapops.h               | 85 +++++++++++++++++++++++++--
 mm/hmm.c                              |  2 +-
 mm/huge_memory.c                      | 10 +++-
 mm/memory-failure.c                   |  2 +-
 mm/migrate.c                          |  4 +-
 mm/migrate_device.c                   |  2 +
 mm/page_vma_mapped.c                  |  6 +-
 mm/rmap.c                             |  3 +-
 12 files changed, 122 insertions(+), 20 deletions(-)