From patchwork Tue Aug 15 21:25:47 2023
X-Patchwork-Submitter: Peter Xu <peterx@redhat.com>
X-Patchwork-Id: 13354340
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Hugh Dickins, "Kirill A. Shutemov",
Shutemov" , Randy Dunlap , Mike Kravetz , Matthew Wilcox , Yu Zhao , Ryan Roberts , Andrew Morton , Yang Shi , David Hildenbrand Subject: [PATCH RFC v3] mm: Proper document tail pages fields for folio Date: Tue, 15 Aug 2023 17:25:47 -0400 Message-ID: <20230815212547.431693-1-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 0901AA0007 X-Stat-Signature: e1bfaokayiseu4gd9jbhmb9hq3p5etos X-HE-Tag: 1692134753-988762 X-HE-Meta: U2FsdGVkX19KSEuuAn0n8dqa0oZ9XHjfm55EltIJcPYjONDcCgIjs41Xy7CDg2mOQuRmHwZiM16eAx3WHjZRSZpkHaDvRw+iOrEcgTVBV6B4b4LiBSyiSOtNKKZODif40qg9yXEXAO5r6Fmrx+lL2xFX1LaDPv/LVcrzQzS2EuBXZ/KF9z0hwJKxZs0ydAJLpIvZ0POVtVbTKJvT7PwyVtmUvpLBYJQUs4fpF4EbQdlFj29dO4oXaipFVBBNgVDINH5G7XrFevs/Moe3xLBTRq1dMpLbbigINVOmzeom+Q8pXQ8Z+VJnpTmojt2k7zP/w3M5mMzVrngar2e0oshp9oZb3sbIxdPzkvCp8gqOfNBcVgEW5q5ejOidWfDBDpLWrNWNCbiFiaERFXSC9qGkwu9/ENAcHOeIs6HMrvTg6LwREPqfO0PW9zG+417ViXu4HpETjCuF2lnDgcqzvV2Wtnr5E+hnPKA3GVK40DVqQ8T6L0yP4rVgENEe7lx5AddRygb/CSiahc3MrN6brPP35WwhuhmgNewrQCgJnPsmiODHc4XDJX56oXItMQ7nIDh9YxirkA2at8r49wgvuieoZzXtbSR+0kmALTg7T21K+EOVT+iG/bD6KwzosC103DxlWMloT8nP8AWGekCXWYvN/HHAJev2B2o7p7U4mupFaKYEDucUtjRNL5O3sGnZTsFqzVhDBcvkBfg7Fz/NOY2dC09Sf62oqHOqQmVIe6knLpASgxHnJ/0y/EyqdYT6xvzKe3m8r5T2rICWhoENTTOmMB//TcH6FaDlXm2wvUMUqFYQqKfnVfGioTQWoq0nubkCxW8MZpGSjYy0d4VwBFErf4+a1aFocO4QsN7xoaAYBQR/j3ADGHg2pOjYL22wHu0Y+aBMgkkcc6zy0qE0fhtsO/YDE6owWjmufat+WqTcuLHKJIhV0G6mdTmfoSR6mPesiXSLdkdEXD+2PFgh+rD mj1BHUVQ 7vj+qhBKcCxQ5aJL1bsaCNxAgYGPv+3/UDSkU66DseYcg709yPlmCFQpvN4DaUP+DgUge8Nnyu9AXSFtSqJ4Bs/vZFsxDSBi14BnufruiOoBqqbyuM7U6JCkJh+0WMtiR1D/E2LseXrAEsn1X6X4UFdRs36MTPA8tQ6gbR18f6as5Xy3aRCj6Jn9MRuZJg6ObrYUutorIvVct1tT16kbkJqqZJsXa5kVKEAg5uGBe2ItGvsFkSsmFA1J9JO3H3gbOpLJtuHVI+jUB5VY7QP4pTpDwYBlkSnUUVrkhNejHNeqIGm17gH19iN+6dH0FAdsYAseKaJBuMu5XIysnQ3EXxwkh7e8LBShIwj//LVxfw8ibS2sxdgsLlLAFIa3AOfJKc1/r2AQeFgmlu7AntlIaAuXZjZxA8bzGin4zUp9LKNYt8ilScF+ZR0sRiPtqu1/DCpRN X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Tail page struct reuse is over-comlicated. Not only because we have implicit uses of tail page fields (mapcounts, or private for thp swap support, etc., that we may still use in the page structs, but not obvious the relationship between that and the folio definitions), but also because we have 32/64 bits layouts for struct page so it's unclear what we can use and what we cannot when trying to find a new spot in folio struct. It's also unclear on how many fields we can reuse for a tail page. The real answer is (after help from Matthew): we have 7 WORDs guaranteed on 64 bits and 8 WORDs on 32 bits. Nothing more than that is guaranteed to even exist. That means nothing over page->_refcount field can be reused. Let's document it clearly on what we can use and what we can't when extending folio on reusing tail page fields, with explanations on each of them. Hopefully after the doc update it will make it easier when: (1) Any reader to know exactly what folio field is where and for what, the relationships between folio tail pages and struct page definitions, (2) Any potential new fields to be added to a large folio, so we're clear which field one can still reuse. This is assuming WORD is defined as sizeof(void *) on any archs, just like the other comment in struct page we already have. 
The _mapcount/_refcount fields are also spelled out for each tail
page, to clamp the fields tight, with FOLIO_MATCH() making sure
nothing messes up the ordering.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand
---
rfcv1: https://lore.kernel.org/all/20230810204944.53471-1-peterx@redhat.com
rfcv2: https://lore.kernel.org/r/20230814184411.330496-1-peterx@redhat.com

No change log, since the patch changed quite a bit: I sent patch 1
separately as non-RFC, and I merged the remaining two patches because
I noticed I can avoid reordering the fields.  No functional change is
intended, hence no reason to split them either.

Matthew, I wanted to remove the whole chunk of comments above the
tail pages from the last version (which might fall into the
"over-documented" category), but in the end I kept it; not only
because I found it helpful for seeing the whole picture (maybe only
me?), but also because it is a good place to document a few important
things (e.g., the fact that refcount==0 is a must for all tail
pages).  I'm open to removing the chunk, or part of it, if you think
the rest is still OK.

This of course also conflicts with the other series that drops
folio_order/..., but I can always rebase if this is not NACKed.

Comments welcome, thanks.
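One more note on the refcount rule before the diff: a refcount of
zero is how page scanners recognize pages that are either free or
tails of compound pages.  The sketch below only shows the shape of
the check that e.g. has_unmovable_pages() performs; the function name
page_is_candidate_unmovable() is made up for illustration and is not
real kernel code:

/*
 * Simplified sketch, not the real has_unmovable_pages() body: a page
 * with page_ref_count() == 0 is either a free page or a tail page of
 * a compound page, so scanners can skip it.  Reusing the _refcount
 * slot of a tail page for anything else would break this assumption.
 */
#include <linux/page_ref.h>

static bool page_is_candidate_unmovable(struct page *page)
{
        if (!page_ref_count(page))
                return false;   /* free page or compound tail: skip */

        return true;            /* would need the full unmovable checks */
}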
---
 include/linux/mm_types.h | 69 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 81456fa5fda5..66f1b0814334 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -324,6 +324,35 @@ struct folio {
 		};
 		struct page page;
 	};
+	/*
+	 * Some of the tail page fields may not be reused by the folio
+	 * object, because they have already been used by the page
+	 * struct.  On 32-bit there are at least 8 WORDs, while on
+	 * 64-bit there are at least 7 WORDs, all ending at _refcount.
+	 *
+	 * |--------+-------------+-------------------|
+	 * | index  | 32 bits     | 64 bits           |
+	 * |--------+-------------+-------------------|
+	 * | 0      | flags       | flags             |
+	 * | 1      | head        | head              |
+	 * | 2      | FREE        | FREE              |
+	 * | 3      | FREE [1]    | FREE [1]          |
+	 * | 4      | FREE        | FREE              |
+	 * | 5      | FREE        | private [2]       |
+	 * | 6      | mapcnt      | mapcnt+refcnt [3] |
+	 * | 7      | refcnt [3]  |                   |
+	 * |--------+-------------+-------------------|
+	 *
+	 * [1] "mapping" field: free to use, but needs some caution due
+	 *     to poisoning; see TAIL_MAPPING_REUSED_MAX.
+	 *
+	 * [2] "private" field: used when THP_SWAP is on (but disabled
+	 *     on 32-bit, so this index is FREE on 32-bit or for
+	 *     hugetlb folios).  May need to be fixed eventually.
+	 *
+	 * [3] "refcount" field: must be zero for all tail pages; see
+	 *     e.g. the page_ref_count() check in has_unmovable_pages().
+	 */
 	union {
 		struct {
 			unsigned long _flags_1;
@@ -331,18 +360,29 @@ struct folio {
 	/* public: */
 			unsigned char _folio_dtor;
 			unsigned char _folio_order;
+	/* private: 2 bytes can be reused later */
+			unsigned char _free_1_0[2];
+	/* public: */
 			atomic_t _entire_mapcount;
 			atomic_t _nr_pages_mapped;
 			atomic_t _pincount;
 #ifdef CONFIG_64BIT
 			unsigned int _folio_nr_pages;
+	/* private: 4 bytes can be reused later (64 bits only) */
+			unsigned char _free_1_1[4];
+	/* Currently used by THP_SWAP, to be fixed */
+			void *_private_1;
+	/* public: */
 #endif
+	/* private: */
+			atomic_t _mapcount_1;
+			atomic_t _refcount_1;
 	/* private: the union with struct page is transitional */
 		};
 		struct page __page_1;
 	};
 	union {
-		struct {
+		struct {	/* hugetlb folios */
 			unsigned long _flags_2;
 			unsigned long _head_2;
 	/* public: */
@@ -351,13 +391,22 @@ struct folio {
 			void *_hugetlb_cgroup_rsvd;
 			void *_hugetlb_hwpoison;
 	/* private: the union with struct page is transitional */
+			atomic_t _mapcount_2;
+			atomic_t _refcount_2;
 		};
-		struct {
+		struct {	/* non-hugetlb folios */
 			unsigned long _flags_2a;
 			unsigned long _head_2a;
 	/* public: */
 			struct list_head _deferred_list;
-	/* private: the union with struct page is transitional */
+	/* private: 8 more free bytes for either 32/64 bits */
+			unsigned char _free_2_2[8];
+#ifdef CONFIG_64BIT
+	/* currently used by THP_SWAP, to be fixed */
+			void *_private_2a;
+#endif
+			atomic_t _mapcount_2a;
+			atomic_t _refcount_2a;
 		};
 		struct page __page_2;
 	};
@@ -382,12 +431,26 @@ FOLIO_MATCH(memcg_data, memcg_data);
 			offsetof(struct page, pg) + sizeof(struct page))
 FOLIO_MATCH(flags, _flags_1);
 FOLIO_MATCH(compound_head, _head_1);
+#ifdef CONFIG_64BIT
+FOLIO_MATCH(private, _private_1);
+#endif
+FOLIO_MATCH(_mapcount, _mapcount_1);
+FOLIO_MATCH(_refcount, _refcount_1);
 #undef FOLIO_MATCH
 #define FOLIO_MATCH(pg, fl)						\
 	static_assert(offsetof(struct folio, fl) ==			\
 			offsetof(struct page, pg) + 2 * sizeof(struct page))
 FOLIO_MATCH(flags, _flags_2);
 FOLIO_MATCH(compound_head, _head_2);
+FOLIO_MATCH(_mapcount, _mapcount_2);
+FOLIO_MATCH(_refcount, _refcount_2);
+FOLIO_MATCH(flags, _flags_2a);
+FOLIO_MATCH(compound_head, _head_2a);
+FOLIO_MATCH(_mapcount, _mapcount_2a);
+FOLIO_MATCH(_refcount, _refcount_2a);
+#ifdef CONFIG_64BIT
+FOLIO_MATCH(private, _private_2a);
+#endif
 #undef FOLIO_MATCH
 /*
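For readers new to the FOLIO_MATCH() pattern above: it is nothing
more than a static_assert() pinning each folio field to the offset of
the corresponding struct page field, one or two pages deep.  Below is
a simplified standalone demo of the idea; page_demo, folio_demo and
the asserted fields are invented for illustration and are not the
kernel types:

/*
 * Simplified demo of the FOLIO_MATCH() idea: assert that a folio
 * field overlays the same-offset field of the first tail struct page.
 * All names here are invented for illustration.
 */
#include <assert.h>
#include <stddef.h>

struct page_demo {
        unsigned long flags;
        unsigned long compound_head;
};

struct folio_demo {
        struct page_demo page;          /* page 0 (the head page) */
        unsigned long _flags_1;         /* overlays page 1's flags */
        unsigned long _head_1;          /* overlays page 1's compound_head */
};

#define FOLIO_MATCH(pg, fl)                                             \
        static_assert(offsetof(struct folio_demo, fl) ==               \
                      offsetof(struct page_demo, pg) +                 \
                      sizeof(struct page_demo),                        \
                      "folio field " #fl " misplaced")

FOLIO_MATCH(flags, _flags_1);
FOLIO_MATCH(compound_head, _head_1);
#undef FOLIO_MATCH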