From patchwork Fri Oct 11 23:22:35 2024
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13833255
Date: Fri, 11 Oct 2024 23:22:35 +0000
Subject: [RFC PATCH 0/3] Reduce dependence on vmas deep in hugetlb allocation code
From: Ackerley Tng
To: muchun.song@linux.dev, peterx@redhat.com, akpm@linux-foundation.org, rientjes@google.com, fvdl@google.com, jthoughton@google.com, david@redhat.com
Cc: isaku.yamahata@intel.com, zhiquan1.li@intel.com, fan.du@intel.com, jun.miao@intel.com, tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk, jgg@nvidia.com, jhubbard@nvidia.com, seanjc@google.com, pbonzini@redhat.com, erdemaktas@google.com, vannapurve@google.com, ackerleytng@google.com, pgonda@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
I hope to use these 3 patches to start a discussion on eventually
removing the need to pass a struct vma pointer when taking a folio from
the global pool (i.e. dequeue_hugetlb_folio_vma()).

Why eliminate passing the struct vma pointer? VMAs are more related to
mapping into userspace, and it would be cleaner if the HugeTLB folio
allocation process could just focus on returning a folio. Today the vma
is a convenient struct that carries the pieces of information the
allocation process needs, but dequeuing should not depend on the VMA
concept. Requiring a vma deep in the allocation process also makes
allocation awkward for callers that have no vma, such as HugeTLBfs's
fallocate, where there is no vma (yet) and a pseudo-vma has to be
created.

Separation will also help with HugeTLB unification. For comparison, the
buddy allocator's __alloc_pages_noprof() is conceptually separate from
VMAs.

I started looking into this because we want to use HugeTLB folios in
guest_memfd [1], and I found that the HugeTLB folio allocation process
is tightly coupled with VMAs. This makes it hard to use HugeTLB folios
in guest_memfd, which has no VMAs for private pages. I then watched
Peter Xu's talk at LSFMM [2] about HugeTLB unification and thought that
these patches could also contribute to the unification effort.

As discussed at LPC 2024 [3], the general preference is for guest_memfd
to use HugeTLB folios. While that is being worked out, I hope these
patches can be considered and merged separately. I believe they are
still useful in improving the understandability of the
resv_map/subpool/hstate reservation system in HugeTLB, and no
functional change is intended.

---

Why use HugeTLB folios in guest_memfd?

HugeTLB is *the* source of 1G pages in the kernel today, and it would
be best for all 1G page users (HugeTLB, HugeTLBfs, or guest_memfd) on a
host to draw from the same pool of 1G pages.
This allows central tracking of all 1G pages, a precious resource on a
machine. Having a separate 1G page allocator would not only require
rebuilding the features HugeTLB already has, but would also split the
1G pool. If both allocators were used on a machine, it would be
complicated to (a) predetermine how many pages to put in each
allocator's pool, or (b) transfer pages between the pools at runtime.

---

[1] https://lore.kernel.org/all/cover.1726009989.git.ackerleytng@google.com/T/
[2] https://youtu.be/7k-m2gTDu2k?si=ghWZ6qa1GAdaHOFP
[3] https://youtu.be/PVTjLLEpozE?si=HvdDlUc_4ElVXu5R

Ackerley Tng (3):
  mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma()
  mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv()
  mm: hugetlb: Remove unnecessary check for avoid_reserve

 mm/hugetlb.c | 57 +++++++++++++++++++++-------------------------------
 1 file changed, 23 insertions(+), 34 deletions(-)

-- 
2.47.0.rc1.288.g06298d1525-goog