From patchwork Wed Sep 22 10:24:07 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12510017
From: Muchun Song
To:
 mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
 mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
 chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net,
 willy@infradead.org, 21cnbao@gmail.com
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com,
 zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song
Subject: [PATCH v3 0/4] Free the 2nd vmemmap page associated with each HugeTLB page
Date: Wed, 22 Sep 2021 18:24:07 +0800
Message-Id: <20210922102411.34494-1-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)

This series significantly reduces the struct page overhead of 2MB HugeTLB
pages. Comments and reviews are welcome. Thanks.

After the "Free some vmemmap pages of HugeTLB page" feature is enabled, the
vmemmap addresses associated with a 2MB HugeTLB page are mapped as shown in
the figure below.

    HugeTLB                    struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | -------------> |     1     |
 |           |                     +-----------+                +-----------+
 |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
 |           |                     +-----------+                   | | | | |
 |           |                     |     3     | ------------------+ | | | |
 |           |                     +-----------+                     | | | |
 |           |                     |     4     | --------------------+ | | |
 |    2MB    |                     +-----------+                       | | |
 |           |                     |     5     | ----------------------+ | |
 |           |                     +-----------+                         | |
 |           |                     |     6     | ------------------------+ |
 |           |                     +-----------+                           |
 |           |                     |     7     | --------------------------+
 |           |                     +-----------+
 |           |
 |           |
 +-----------+

As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
remapped. However, the 2nd vmemmap page frame can also be freed to the buddy
allocator, which changes the mapping from the figure above to the figure
below.

    HugeTLB                    struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
 |           |                     +-----------+                  | | | | | |
 |           |                     |     2     | -----------------+ | | | | |
 |           |                     +-----------+                    | | | | |
 |           |                     |     3     | -------------------+ | | | |
 |           |                     +-----------+                      | | | |
 |           |                     |     4     | ---------------------+ | | |
 |    2MB    |                     +-----------+                        | | |
 |           |                     |     5     | -----------------------+ | |
 |           |                     +-----------+                          | |
 |           |                     |     6     | -------------------------+ |
 |           |                     +-----------+                            |
 |           |                     |     7     | ---------------------------+
 |           |                     +-----------+
 |           |
 |           |
 +-----------+

After we do this, all tail vmemmap pages (1-7) are mapped to the head vmemmap
page frame (0). In other words, more than one page struct with PG_head is
associated with each HugeTLB page. We __know__ that there is only one real
head page struct; the tail page structs with PG_head are fake head page
structs.
We need a way to distinguish between these two types of page structs so that
compound_head(), PageHead() and PageTail() work properly when the parameter
is a tail page struct that has PG_head set.

The following code snippet describes how to distinguish between real and fake
head page structs.

	if (test_bit(PG_head, &page->flags)) {
		unsigned long head = READ_ONCE(page[1].compound_head);

		if (head & 1) {
			if (head == (unsigned long)page + 1)
				==> head page struct
			else
				==> tail page struct
		} else
			==> head page struct
	}

We can safely access the field of @page[1] when @page has PG_head set,
because @page belongs to a compound page composed of at least two contiguous
pages.

The main implementation is in patch 1.

On our server, this patchset saves an extra 2GB of memory when there is 1TB
of 2MB HugeTLB pages: 1TB / 2MB = 524288 HugeTLB pages, each freeing one more
4KB vmemmap page. With 1GB HugeTLB pages it saves only 4MB (1024 pages x 4KB).
For 2MB HugeTLB pages, it is a nice gain.

Changelogs in v3:
  1. Rename page_head_if_fake() to page_fixed_fake_head().
  2. Introduce a new helper page_is_fake_head() to make the code more
     readable.
  3. Update commit log of patch 3 to add more judgements.
  4. Add some comments in check_page_flags() in patch 4.

  Thanks Barry for his suggestions and reviews.

Changelogs in v2:
  1. Drop two patches of introducing PAGEFLAGS_MASK from this series.
  2. Let page_head_if_fake() return page instead of NULL.
  3. Add a selftest to check if PageHead or PageTail work well.

Muchun Song (4):
  mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  mm: sparsemem: use page table lock to protect kernel pmd operations
  selftests: vm: add a hugetlb test case

 Documentation/admin-guide/kernel-parameters.txt |   2 +-
 include/linux/hugetlb.h                         |  10 +-
 include/linux/page-flags.h                      |  79 ++++++++++++-
 mm/hugetlb_vmemmap.c                            |  66 ++++++-----
 mm/memory_hotplug.c                             |   2 +-
 mm/ptdump.c                                     |  16 ++-
 mm/sparse-vmemmap.c                             |  70 +++++++++---
 tools/testing/selftests/vm/vmemmap_hugetlb.c    | 144 ++++++++++++++++++++++++
 8 files changed, 332 insertions(+), 57 deletions(-)
 create mode 100644 tools/testing/selftests/vm/vmemmap_hugetlb.c