[v3] mm: alloc_pages_bulk: support both simple and full-featured API

As mentioned in [1], it seems odd to check for NULL elements in
the middle of a bulk page allocation. For most existing users,
the caller can do a better job by filling the whole array
sequentially: pass 'page_array + allocated' and
'nr_pages - allocated' to each subsequent bulk allocation call
so that the NULL checking can be avoided, see the pattern in
mm/mempolicy.c.
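
The sequential-fill pattern can be sketched in userspace C as
below. This is only an illustration under stated assumptions:
mock_alloc_pages_bulk(), the static pool and the limit of 3 pages
per call are hypothetical stand-ins invented for this sketch, not
the kernel allocator or the code in mm/mempolicy.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel type. */
struct page { int id; };

#define POOL_SIZE 8
static struct page pool[POOL_SIZE];
static size_t pool_used;

/*
 * Mock bulk allocator (assumption): stores at most 3 pages per call
 * into array[0..ret) to imitate partial success, and returns the
 * number of pages stored. The array need not be initialised.
 */
static size_t mock_alloc_pages_bulk(size_t nr, struct page **array)
{
	size_t i, n = nr < 3 ? nr : 3;

	for (i = 0; i < n && pool_used < POOL_SIZE; i++)
		array[i] = &pool[pool_used++];
	return i;
}

/*
 * Caller-side pattern: advance the array pointer and shrink the
 * count by what was already allocated, so no NULL checks are needed.
 */
static size_t fill_page_array(struct page **page_array, size_t nr_pages)
{
	size_t allocated = 0;

	while (allocated < nr_pages) {
		size_t got = mock_alloc_pages_bulk(nr_pages - allocated,
						   page_array + allocated);

		assert(got <= nr_pages - allocated);
		if (!got)
			break;	/* allocator made no progress, give up */
		allocated += got;
	}
	return allocated;
}
```

The caller only tracks 'allocated'; the array never has to be
zeroed or scanned for holes.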

An analysis of the existing bulk allocation API users suggests
that only the fs users depend on the assumption that only NULL
elements are populated, see:
commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator")
commit d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()")
commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
commit 88e4d41a264d ("SUNRPC: Use __alloc_bulk_pages() in svc_init_buffer()")

The current API adds a mental burden for most users. Their code
would be much cleaner if the interface accepted an uninitialised
array with a length and reported how many pages had been stored
in that array. So support one simple and one full-featured API
to meet these different use cases:
- alloc_pages_bulk() would be given an uninitialised array of page
  pointers and a required count and would return the number of
  pages that were allocated.
- alloc_pages_bulk_refill() would be given an initialised array
  of page pointers, some of which might be NULL. It would attempt
  to allocate pages for the NULL pointers and return 0 if all
  pages are allocated, -EAGAIN if at least one page was allocated
  and it is ok to try again immediately, or -ENOMEM if there is
  no point in trying again soon. This provides more consistent
  semantics than the current API, as mentioned in [2], at the
  cost that the pages might get re-ordered to keep the
  implementation simpler.
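
The two contracts described above can be illustrated with a small
userspace mock. The names mock_alloc_page(), mock_alloc_pages_bulk()
and mock_alloc_pages_bulk_refill(), the argument order and the
fixed-size pool are assumptions made for this sketch only; they are
not the kernel signatures, and unlike the real refill variant this
mock does not re-order the array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel type. */
struct page { int id; };

#define POOL_SIZE 4
static struct page pool[POOL_SIZE];
static size_t pool_used;

/* Mock single-page allocator: returns NULL once the pool is empty. */
static struct page *mock_alloc_page(void)
{
	return pool_used < POOL_SIZE ? &pool[pool_used++] : NULL;
}

/*
 * Simple flavour: the array may be uninitialised on entry. Pages are
 * stored in array[0..ret) and the number stored is returned.
 */
static size_t mock_alloc_pages_bulk(size_t nr, struct page **array)
{
	size_t i;

	assert(array != NULL);
	for (i = 0; i < nr; i++) {
		struct page *p = mock_alloc_page();

		if (!p)
			break;
		array[i] = p;
	}
	return i;
}

/*
 * Refill flavour: the array must be initialised; only NULL slots are
 * filled. Returns 0 if no NULL slot remains, -EAGAIN if some but not
 * all NULL slots were filled, -ENOMEM if nothing could be allocated.
 */
static int mock_alloc_pages_bulk_refill(size_t nr, struct page **array)
{
	size_t i, filled = 0, missing = 0;

	for (i = 0; i < nr; i++) {
		if (array[i])
			continue;	/* slot already populated */
		array[i] = mock_alloc_page();
		if (array[i])
			filled++;
		else
			missing++;
	}
	if (!missing)
		return 0;
	return filled ? -EAGAIN : -ENOMEM;
}
```

A caller of the refill flavour would retry immediately on -EAGAIN
and back off on -ENOMEM, while a caller of the simple flavour only
looks at the returned count.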

Change the existing fs users to use the full-featured API, except
for svc_init_buffer() in net/sunrpc/svc.c. Other existing callers
can use the simple API, as they seem to pass arrays whose elements
are all NULL (via memset, kzalloc, etc). This patch only removes
the now unnecessary memset for existing users calling the simple
API.

The test result for xfstests full test:
Before this patch:
btrfs/default: 1061 tests, 3 failures, 290 skipped, 13152 seconds
  Failures: btrfs/012 btrfs/226
  Flaky: generic/301: 60% (3/5)
Totals: 1073 tests, 290 skipped, 13 failures, 0 errors, 12540s

nfs/loopback: 530 tests, 3 failures, 392 skipped, 3942 seconds
  Failures: generic/464 generic/551
  Flaky: generic/650: 40% (2/5)
Totals: 542 tests, 392 skipped, 12 failures, 0 errors, 3799s

After this patch:
btrfs/default: 1061 tests, 2 failures, 290 skipped, 13446 seconds
  Failures: btrfs/012 btrfs/226
Totals: 1069 tests, 290 skipped, 10 failures, 0 errors, 12853s

nfs/loopback: 530 tests, 3 failures, 392 skipped, 4103 seconds
  Failures: generic/464 generic/551
  Flaky: generic/650: 60% (3/5)
Totals: 542 tests, 392 skipped, 13 failures, 0 errors, 3933s

The stress test also suggests there is no regression for erofs.

Using the simple API also allows the caller to skip zeroing the
array before calling the bulk page allocation API, which gives
about 1~2 ns of performance improvement for the
time_bench_page_pool03_slow() test case of page_pool in an x86 vm
system: performance improves from 87.886 ns to 86.429 ns. This
reduces some of the performance impact of fixing the DMA API
misuse problem in [3].

Also, a temporary patch enabling the full-featured API in
page_pool suggests that the new full-featured API doesn't seem to
have a noticeable performance impact for the existing users, like
SUNRPC, btrfs and erofs.

1. https://lore.kernel.org/all/bd8c2f5c-464d-44ab-b607-390a87ea4cd5@huawei.com/
2. https://lore.kernel.org/all/180818a1-b906-4a0b-89d3-34cb71cc26c9@huawei.com/
3. https://lore.kernel.org/all/20250212092552.1779679-1-linyunsheng@huawei.com/
CC: Jesper Dangaard Brouer <hawk@kernel.org>
CC: Luiz Capitulino <luizcap@redhat.com>
CC: Mel Gorman <mgorman@techsingularity.net>
Suggested-by: Neil Brown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
V3:
1. Provide both simple and full-featured API as suggested by NeilBrown.
2. Do the fs testing as suggested in V2.

V2:
1. Drop RFC tag.
2. Fix a compile error for xfs.
3. Defragment the page_array for SUNRPC and btrfs.
---
 drivers/vfio/pci/mlx5/cmd.c       |  2 --
 drivers/vfio/pci/virtio/migrate.c |  2 --
 fs/btrfs/extent_io.c              | 21 +++++++++---------
 fs/erofs/zutil.c                  | 11 +++++----
 include/linux/gfp.h               | 37 +++++++++++++++++++++++++++++++
 include/trace/events/sunrpc.h     | 12 +++++-----
 kernel/bpf/arena.c                |  1 -
 mm/page_alloc.c                   | 32 +++++---------------------
 net/core/page_pool.c              |  3 ---
 net/sunrpc/svc_xprt.c             | 12 ++++++----
 10 files changed, 72 insertions(+), 61 deletions(-)

Message ID	20250414120819.3053967-1-linyunsheng@huawei.com (mailing list archive)
State	New
Headers	From: Yunsheng Lin <linyunsheng@huawei.com>
Subject: [PATCH v3] mm: alloc_pages_bulk: support both simple and full-featured API
Date: Mon, 14 Apr 2025 20:08:11 +0800