
[net-next,v13,04/14] mm: page_frag: add '_va' suffix to page_frag API

Message ID 20240808123714.462740-5-linyunsheng@huawei.com (mailing list archive)
State New
Series None

Commit Message

Yunsheng Lin Aug. 8, 2024, 12:37 p.m. UTC
Currently the page_frag API returns a 'virtual address' or
'va' when allocating and expects a 'virtual address' or 'va'
as input when freeing.

We are about to support new use cases in which the caller
needs to deal with 'struct page', or with both 'va' and
'struct page'. In order to differentiate the API handling
between 'va' and 'struct page', add a '_va' suffix to the
corresponding APIs, mirroring the page_pool_alloc_va() API
of the page_pool, so that callers expecting to deal with va,
page, or both va and page may call the page_frag_alloc_va*,
page_frag_alloc_pg*, or page_frag_alloc* API accordingly.

CC: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/net/ethernet/google/gve/gve_rx.c      |  4 ++--
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  2 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  2 +-
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  4 ++--
 .../marvell/octeontx2/nic/otx2_common.c       |  2 +-
 drivers/net/ethernet/mediatek/mtk_wed_wo.c    |  4 ++--
 drivers/nvme/host/tcp.c                       |  8 +++----
 drivers/nvme/target/tcp.c                     | 22 +++++++++----------
 drivers/vhost/net.c                           |  6 ++---
 include/linux/page_frag_cache.h               | 21 +++++++++---------
 include/linux/skbuff.h                        |  2 +-
 kernel/bpf/cpumap.c                           |  2 +-
 mm/page_frag_cache.c                          | 12 +++++-----
 net/core/skbuff.c                             | 16 +++++++-------
 net/core/xdp.c                                |  2 +-
 net/rxrpc/txbuf.c                             | 15 +++++++------
 net/sunrpc/svcsock.c                          |  6 ++---
 .../selftests/mm/page_frag/page_frag_test.c   | 13 ++++++-----
 19 files changed, 75 insertions(+), 70 deletions(-)
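
For reference, the renamed allocation/free entry points after this
patch, with signatures as in the include/linux/page_frag_cache.h hunk
below (the page_frag_alloc_pg* and combined page_frag_alloc* variants
referred to above are added elsewhere in the series, not by this patch):

void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
				 unsigned int fragsz, gfp_t gfp_mask,
				 unsigned int align_mask);
void *page_frag_alloc_va_align(struct page_frag_cache *nc,
			       unsigned int fragsz, gfp_t gfp_mask,
			       unsigned int align);
void *page_frag_alloc_va(struct page_frag_cache *nc, unsigned int fragsz,
			 gfp_t gfp_mask);
void page_frag_free_va(void *addr);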

Comments

Alexander Duyck Aug. 14, 2024, 3:49 p.m. UTC | #1
On Thu, 2024-08-08 at 20:37 +0800, Yunsheng Lin wrote:
> Currently the page_frag API is returning 'virtual address'
> or 'va' when allocing and expecting 'virtual address' or
> 'va' as input when freeing.
> 
> As we are about to support new use cases that the caller
> need to deal with 'struct page' or need to deal with both
> 'va' and 'struct page'. In order to differentiate the API
> handling between 'va' and 'struct page', add '_va' suffix
> to the corresponding API mirroring the page_pool_alloc_va()
> API of the page_pool. So that callers expecting to deal with
> va, page or both va and page may call page_frag_alloc_va*,
> page_frag_alloc_pg*, or page_frag_alloc* API accordingly.
> 
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> Acked-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/net/ethernet/google/gve/gve_rx.c      |  4 ++--
>  drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
>  drivers/net/ethernet/intel/ice/ice_txrx.h     |  2 +-
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  2 +-
>  .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  4 ++--
>  .../marvell/octeontx2/nic/otx2_common.c       |  2 +-
>  drivers/net/ethernet/mediatek/mtk_wed_wo.c    |  4 ++--
>  drivers/nvme/host/tcp.c                       |  8 +++----
>  drivers/nvme/target/tcp.c                     | 22 +++++++++----------
>  drivers/vhost/net.c                           |  6 ++---
>  include/linux/page_frag_cache.h               | 21 +++++++++---------
>  include/linux/skbuff.h                        |  2 +-
>  kernel/bpf/cpumap.c                           |  2 +-
>  mm/page_frag_cache.c                          | 12 +++++-----
>  net/core/skbuff.c                             | 16 +++++++-------
>  net/core/xdp.c                                |  2 +-
>  net/rxrpc/txbuf.c                             | 15 +++++++------
>  net/sunrpc/svcsock.c                          |  6 ++---
>  .../selftests/mm/page_frag/page_frag_test.c   | 13 ++++++-----
>  19 files changed, 75 insertions(+), 70 deletions(-)
> 

I still say no to this patch. It is an unnecessary name change and adds
no value. If you insist on this patch, I will reject the set every time.

The fact is it pollutes the git history and just makes things harder to
maintain without adding any value, as you aren't changing what the
functions do and there is no need for this. In addition, it makes it
that much harder to backport fixes in the future, as people will have
to work around the rename.
Yunsheng Lin Aug. 15, 2024, 2:59 a.m. UTC | #2
On 2024/8/14 23:49, Alexander H Duyck wrote:
> On Thu, 2024-08-08 at 20:37 +0800, Yunsheng Lin wrote:
>> Currently the page_frag API is returning 'virtual address'
>> or 'va' when allocing and expecting 'virtual address' or
>> 'va' as input when freeing.
>>
>> As we are about to support new use cases that the caller
>> need to deal with 'struct page' or need to deal with both
>> 'va' and 'struct page'. In order to differentiate the API
>> handling between 'va' and 'struct page', add '_va' suffix
>> to the corresponding API mirroring the page_pool_alloc_va()
>> API of the page_pool. So that callers expecting to deal with
>> va, page or both va and page may call page_frag_alloc_va*,
>> page_frag_alloc_pg*, or page_frag_alloc* API accordingly.
>>
>> CC: Alexander Duyck <alexander.duyck@gmail.com>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
>> Acked-by: Chuck Lever <chuck.lever@oracle.com>
>> Acked-by: Sagi Grimberg <sagi@grimberg.me>
>> ---
>>  drivers/net/ethernet/google/gve/gve_rx.c      |  4 ++--
>>  drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
>>  drivers/net/ethernet/intel/ice/ice_txrx.h     |  2 +-
>>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  2 +-
>>  .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  4 ++--
>>  .../marvell/octeontx2/nic/otx2_common.c       |  2 +-
>>  drivers/net/ethernet/mediatek/mtk_wed_wo.c    |  4 ++--
>>  drivers/nvme/host/tcp.c                       |  8 +++----
>>  drivers/nvme/target/tcp.c                     | 22 +++++++++----------
>>  drivers/vhost/net.c                           |  6 ++---
>>  include/linux/page_frag_cache.h               | 21 +++++++++---------
>>  include/linux/skbuff.h                        |  2 +-
>>  kernel/bpf/cpumap.c                           |  2 +-
>>  mm/page_frag_cache.c                          | 12 +++++-----
>>  net/core/skbuff.c                             | 16 +++++++-------
>>  net/core/xdp.c                                |  2 +-
>>  net/rxrpc/txbuf.c                             | 15 +++++++------
>>  net/sunrpc/svcsock.c                          |  6 ++---
>>  .../selftests/mm/page_frag/page_frag_test.c   | 13 ++++++-----
>>  19 files changed, 75 insertions(+), 70 deletions(-)
>>
> 
> I still say no to this patch. It is an unnecessary name change and adds
> no value. If you insist on this patch I will reject the set every time.
> 
> The fact is it is polluting the git history and just makes things
> harder to maintain without adding any value as you aren't changing what
> the function does and there is no need for this. In addition it just

I guess I have to disagree with the above 'no need for this' part for
now, as mentioned in [1]:

"There are three types of API as proposed in this patchset instead of
two types of API:
1. page_frag_alloc_va() returns [va].
2. page_frag_alloc_pg() returns [page, offset].
3. page_frag_alloc() returns [va] & [page, offset].

You seemed to miss that we need a third name for the type 3 API.
Do you see the type 3 API as a valid API? If yes, what name are you
suggesting for it? If no, why is it not a valid API?"


1. https://lore.kernel.org/all/ca6be29e-ab53-4673-9624-90d41616a154@huawei.com/

> makes it that much harder to backport fixes in the future as people
> will have to work around the rename.
>
Alexander Duyck Aug. 15, 2024, 3 p.m. UTC | #3
On Wed, Aug 14, 2024 at 8:00 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/8/14 23:49, Alexander H Duyck wrote:
> > On Thu, 2024-08-08 at 20:37 +0800, Yunsheng Lin wrote:
> >> Currently the page_frag API is returning 'virtual address'
> >> or 'va' when allocing and expecting 'virtual address' or
> >> 'va' as input when freeing.
> >>
> >> As we are about to support new use cases that the caller
> >> need to deal with 'struct page' or need to deal with both
> >> 'va' and 'struct page'. In order to differentiate the API
> >> handling between 'va' and 'struct page', add '_va' suffix
> >> to the corresponding API mirroring the page_pool_alloc_va()
> >> API of the page_pool. So that callers expecting to deal with
> >> va, page or both va and page may call page_frag_alloc_va*,
> >> page_frag_alloc_pg*, or page_frag_alloc* API accordingly.
> >>
> >> CC: Alexander Duyck <alexander.duyck@gmail.com>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> >> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
> >> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> >> Acked-by: Sagi Grimberg <sagi@grimberg.me>
> >> ---
> >>  drivers/net/ethernet/google/gve/gve_rx.c      |  4 ++--
> >>  drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
> >>  drivers/net/ethernet/intel/ice/ice_txrx.h     |  2 +-
> >>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  2 +-
> >>  .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  4 ++--
> >>  .../marvell/octeontx2/nic/otx2_common.c       |  2 +-
> >>  drivers/net/ethernet/mediatek/mtk_wed_wo.c    |  4 ++--
> >>  drivers/nvme/host/tcp.c                       |  8 +++----
> >>  drivers/nvme/target/tcp.c                     | 22 +++++++++----------
> >>  drivers/vhost/net.c                           |  6 ++---
> >>  include/linux/page_frag_cache.h               | 21 +++++++++---------
> >>  include/linux/skbuff.h                        |  2 +-
> >>  kernel/bpf/cpumap.c                           |  2 +-
> >>  mm/page_frag_cache.c                          | 12 +++++-----
> >>  net/core/skbuff.c                             | 16 +++++++-------
> >>  net/core/xdp.c                                |  2 +-
> >>  net/rxrpc/txbuf.c                             | 15 +++++++------
> >>  net/sunrpc/svcsock.c                          |  6 ++---
> >>  .../selftests/mm/page_frag/page_frag_test.c   | 13 ++++++-----
> >>  19 files changed, 75 insertions(+), 70 deletions(-)
> >>
> >
> > I still say no to this patch. It is an unnecessary name change and adds
> > no value. If you insist on this patch I will reject the set every time.
> >
> > The fact is it is polluting the git history and just makes things
> > harder to maintain without adding any value as you aren't changing what
> > the function does and there is no need for this. In addition it just
>
> I guess I have to disagree with the above 'no need for this' part for
> now, as mentioned in [1]:
>
> "There are three types of API as proposed in this patchset instead of
> two types of API:
> 1. page_frag_alloc_va() returns [va].
> 2. page_frag_alloc_pg() returns [page, offset].
> 3. page_frag_alloc() returns [va] & [page, offset].
>
> You seemed to miss that we need a third naming for the type 3 API.
> Do you see type 3 API as a valid API? if yes, what naming are you
> suggesting for it? if no, why it is not a valid API?"

I didn't. I just don't see the point in pushing out the existing API
to support that. In reality 2 and 3 are redundant. You probably only
need 3. Like I mentioned earlier you can essentially just pass a
page_frag via pointer to the function. With that you could also look
at just returning a virtual address as well if you insist on having
something that returns all of the above. No point in having 2 and 3 be
separate functions.

I am going to nack this patch set if you insist on this pointless
renaming. The fact is it is just adding noise that adds no value.
Yunsheng Lin Aug. 16, 2024, 11:55 a.m. UTC | #4
On 2024/8/15 23:00, Alexander Duyck wrote:
> On Wed, Aug 14, 2024 at 8:00 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2024/8/14 23:49, Alexander H Duyck wrote:
>>> On Thu, 2024-08-08 at 20:37 +0800, Yunsheng Lin wrote:
>>>> Currently the page_frag API is returning 'virtual address'
>>>> or 'va' when allocing and expecting 'virtual address' or
>>>> 'va' as input when freeing.
>>>>
>>>> As we are about to support new use cases that the caller
>>>> need to deal with 'struct page' or need to deal with both
>>>> 'va' and 'struct page'. In order to differentiate the API
>>>> handling between 'va' and 'struct page', add '_va' suffix
>>>> to the corresponding API mirroring the page_pool_alloc_va()
>>>> API of the page_pool. So that callers expecting to deal with
>>>> va, page or both va and page may call page_frag_alloc_va*,
>>>> page_frag_alloc_pg*, or page_frag_alloc* API accordingly.
>>>>
>>>> CC: Alexander Duyck <alexander.duyck@gmail.com>
>>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>>>> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
>>>> Acked-by: Chuck Lever <chuck.lever@oracle.com>
>>>> Acked-by: Sagi Grimberg <sagi@grimberg.me>
>>>> ---
>>>>  drivers/net/ethernet/google/gve/gve_rx.c      |  4 ++--
>>>>  drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
>>>>  drivers/net/ethernet/intel/ice/ice_txrx.h     |  2 +-
>>>>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  2 +-
>>>>  .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  4 ++--
>>>>  .../marvell/octeontx2/nic/otx2_common.c       |  2 +-
>>>>  drivers/net/ethernet/mediatek/mtk_wed_wo.c    |  4 ++--
>>>>  drivers/nvme/host/tcp.c                       |  8 +++----
>>>>  drivers/nvme/target/tcp.c                     | 22 +++++++++----------
>>>>  drivers/vhost/net.c                           |  6 ++---
>>>>  include/linux/page_frag_cache.h               | 21 +++++++++---------
>>>>  include/linux/skbuff.h                        |  2 +-
>>>>  kernel/bpf/cpumap.c                           |  2 +-
>>>>  mm/page_frag_cache.c                          | 12 +++++-----
>>>>  net/core/skbuff.c                             | 16 +++++++-------
>>>>  net/core/xdp.c                                |  2 +-
>>>>  net/rxrpc/txbuf.c                             | 15 +++++++------
>>>>  net/sunrpc/svcsock.c                          |  6 ++---
>>>>  .../selftests/mm/page_frag/page_frag_test.c   | 13 ++++++-----
>>>>  19 files changed, 75 insertions(+), 70 deletions(-)
>>>>
>>>
>>> I still say no to this patch. It is an unnecessary name change and adds
>>> no value. If you insist on this patch I will reject the set every time.
>>>
>>> The fact is it is polluting the git history and just makes things
>>> harder to maintain without adding any value as you aren't changing what
>>> the function does and there is no need for this. In addition it just
>>
>> I guess I have to disagree with the above 'no need for this' part for
>> now, as mentioned in [1]:
>>
>> "There are three types of API as proposed in this patchset instead of
>> two types of API:
>> 1. page_frag_alloc_va() returns [va].
>> 2. page_frag_alloc_pg() returns [page, offset].
>> 3. page_frag_alloc() returns [va] & [page, offset].
>>
>> You seemed to miss that we need a third naming for the type 3 API.
>> Do you see type 3 API as a valid API? if yes, what naming are you
>> suggesting for it? if no, why it is not a valid API?"
> 
> I didn't. I just don't see the point in pushing out the existing API
> to support that. In reality 2 and 3 are redundant. You probably only
> need 3. Like I mentioned earlier you can essentially just pass a

If the caller just expects [page, offset], do you expect the caller to
also call the type 3 API, which returns both [va] and [page, offset]?

I am not sure I understand why you think 2 and 3 are redundant here.
If you think 2 and 3 are redundant, aren't 1 and 3 also redundant by
a similar argument?

> page_frag via pointer to the function. With that you could also look
> at just returning a virtual address as well if you insist on having
> something that returns all of the above. No point in having 2 and 3 be
> seperate functions.

Let's be more specific about what your suggestion is here: which way is
the preferred way to return the virtual address? There seem to be two
options:

1. Return the virtual address as the function's return value:
void *page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio);

2. Return the virtual address through a double pointer:
int page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio,
		        void **va);

If one of the above options is what you have in mind, please be specific
about which one is preferred and why. If neither is what you have in
mind, please spell out the API declaration you have in mind.
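
To make the calling-convention difference concrete, a hypothetical
caller of each shape might look like the below (page_frag_alloc_bio()
is only the placeholder name used above, not an existing function, and
the error conventions are assumptions):

struct bio_vec bv;
void *va;

/* Option 1: the va is the return value; failure is a NULL return. */
va = page_frag_alloc_bio(nc, &bv);
if (!va)
	return -ENOMEM;

/* Option 2: the va comes back through a double pointer; failure is
 * assumed to be a negative errno return.
 */
if (page_frag_alloc_bio(nc, &bv, &va) < 0)
	return -ENOMEM;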

> 
> I am going to nack this patch set if you insist on this pointless
> renaming. The fact is it is just adding noise that adds no value.
Alexander Duyck Aug. 19, 2024, 3:54 p.m. UTC | #5
On Fri, Aug 16, 2024 at 4:55 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/8/15 23:00, Alexander Duyck wrote:
> > On Wed, Aug 14, 2024 at 8:00 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> On 2024/8/14 23:49, Alexander H Duyck wrote:
> >>> On Thu, 2024-08-08 at 20:37 +0800, Yunsheng Lin wrote:
> >>>> Currently the page_frag API is returning 'virtual address'
> >>>> or 'va' when allocing and expecting 'virtual address' or
> >>>> 'va' as input when freeing.
> >>>>
> >>>> As we are about to support new use cases that the caller
> >>>> need to deal with 'struct page' or need to deal with both
> >>>> 'va' and 'struct page'. In order to differentiate the API
> >>>> handling between 'va' and 'struct page', add '_va' suffix
> >>>> to the corresponding API mirroring the page_pool_alloc_va()
> >>>> API of the page_pool. So that callers expecting to deal with
> >>>> va, page or both va and page may call page_frag_alloc_va*,
> >>>> page_frag_alloc_pg*, or page_frag_alloc* API accordingly.
> >>>>
> >>>> CC: Alexander Duyck <alexander.duyck@gmail.com>
> >>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> >>>> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
> >>>> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> >>>> Acked-by: Sagi Grimberg <sagi@grimberg.me>
> >>>> ---
> >>>>  drivers/net/ethernet/google/gve/gve_rx.c      |  4 ++--
> >>>>  drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
> >>>>  drivers/net/ethernet/intel/ice/ice_txrx.h     |  2 +-
> >>>>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  2 +-
> >>>>  .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  4 ++--
> >>>>  .../marvell/octeontx2/nic/otx2_common.c       |  2 +-
> >>>>  drivers/net/ethernet/mediatek/mtk_wed_wo.c    |  4 ++--
> >>>>  drivers/nvme/host/tcp.c                       |  8 +++----
> >>>>  drivers/nvme/target/tcp.c                     | 22 +++++++++----------
> >>>>  drivers/vhost/net.c                           |  6 ++---
> >>>>  include/linux/page_frag_cache.h               | 21 +++++++++---------
> >>>>  include/linux/skbuff.h                        |  2 +-
> >>>>  kernel/bpf/cpumap.c                           |  2 +-
> >>>>  mm/page_frag_cache.c                          | 12 +++++-----
> >>>>  net/core/skbuff.c                             | 16 +++++++-------
> >>>>  net/core/xdp.c                                |  2 +-
> >>>>  net/rxrpc/txbuf.c                             | 15 +++++++------
> >>>>  net/sunrpc/svcsock.c                          |  6 ++---
> >>>>  .../selftests/mm/page_frag/page_frag_test.c   | 13 ++++++-----
> >>>>  19 files changed, 75 insertions(+), 70 deletions(-)
> >>>>
> >>>
> >>> I still say no to this patch. It is an unnecessary name change and adds
> >>> no value. If you insist on this patch I will reject the set every time.
> >>>
> >>> The fact is it is polluting the git history and just makes things
> >>> harder to maintain without adding any value as you aren't changing what
> >>> the function does and there is no need for this. In addition it just
> >>
> >> I guess I have to disagree with the above 'no need for this' part for
> >> now, as mentioned in [1]:
> >>
> >> "There are three types of API as proposed in this patchset instead of
> >> two types of API:
> >> 1. page_frag_alloc_va() returns [va].
> >> 2. page_frag_alloc_pg() returns [page, offset].
> >> 3. page_frag_alloc() returns [va] & [page, offset].
> >>
> >> You seemed to miss that we need a third naming for the type 3 API.
> >> Do you see type 3 API as a valid API? if yes, what naming are you
> >> suggesting for it? if no, why it is not a valid API?"
> >
> > I didn't. I just don't see the point in pushing out the existing API
> > to support that. In reality 2 and 3 are redundant. You probably only
> > need 3. Like I mentioned earlier you can essentially just pass a
>
> If the caller just expect [page, offset], do you expect the caller also
> type 3 API, which return both [va] and [page, offset]?
>
> I am not sure if I understand why you think 2 and 3 are redundant here?
> If you think 2 and 3 are redundant here, aren't 1 and 3 also redundant
> as the similar agrument?

The big difference is the need to return page and offset. Basically to
support returning page and offset you need to pass at least one value
as a pointer so you can store the return there.

The reason why 3 is just a redundant form of 2 is that you will
normally just be converting from a va to a page and offset so the va
should already be easily accessible.

> > page_frag via pointer to the function. With that you could also look
> > at just returning a virtual address as well if you insist on having
> > something that returns all of the above. No point in having 2 and 3 be
> > seperate functions.
>
> Let's be more specific about what are your suggestion here: which way
> is the prefer way to return the virtual address. It seems there are two
> options:
>
> 1. Return the virtual address by function returning as below:
> void *page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio);
>
> 2. Return the virtual address by double pointer as below:
> int page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio,
>                         void **va);

I was thinking more of option 1. Basically this is a superset of
page_frag_alloc_va that is also returning the page and offset via a
page frag. However instead of bio_vec I would be good with "struct
page_frag *" being the value passed to the function to play the role
of container. Basically the big difference between 1 and 2/3 if I am
not mistaken is the fact that for 1 you pass the size, whereas with
2/3 you are peeling off the page frag from the larger page frag cache
after the fact via a commit type action.
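
A rough sketch of that direction, assuming a caller-provided
"struct page_frag *" as the container (this only illustrates the
suggestion, it is not code from the series; the name and exact
parameters are placeholders):

/* Superset of page_frag_alloc_va(): return the va as before and also
 * fill the caller-provided page_frag with page, offset and size.
 */
void *page_frag_alloc(struct page_frag_cache *nc, struct page_frag *pfrag,
		      unsigned int fragsz, gfp_t gfp_mask);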
Yunsheng Lin Aug. 20, 2024, 1:07 p.m. UTC | #6
On 2024/8/19 23:54, Alexander Duyck wrote:

...

>>>>
>>>> "There are three types of API as proposed in this patchset instead of
>>>> two types of API:
>>>> 1. page_frag_alloc_va() returns [va].
>>>> 2. page_frag_alloc_pg() returns [page, offset].
>>>> 3. page_frag_alloc() returns [va] & [page, offset].
>>>>
>>>> You seemed to miss that we need a third naming for the type 3 API.
>>>> Do you see type 3 API as a valid API? if yes, what naming are you
>>>> suggesting for it? if no, why it is not a valid API?"
>>>
>>> I didn't. I just don't see the point in pushing out the existing API
>>> to support that. In reality 2 and 3 are redundant. You probably only
>>> need 3. Like I mentioned earlier you can essentially just pass a
>>
>> If the caller just expect [page, offset], do you expect the caller also
>> type 3 API, which return both [va] and [page, offset]?
>>
>> I am not sure if I understand why you think 2 and 3 are redundant here?
>> If you think 2 and 3 are redundant here, aren't 1 and 3 also redundant
>> as the similar agrument?
> 
> The big difference is the need to return page and offset. Basically to
> support returning page and offset you need to pass at least one value
> as a pointer so you can store the return there.
> 
> The reason why 3 is just a redundant form of 2 is that you will
> normally just be converting from a va to a page and offset so the va
> should already be easily accessible.

I am assuming that by 'easily accessible', you meant the 'va' can be
calculated as below, right?

va = encoded_page_address(encoded_va) +
		(page_frag_cache_page_size(encoded_va) - remaining);

I guess it is easily accessible, but it is not without some overhead
to calculate the 'va' here.

> 
>>> page_frag via pointer to the function. With that you could also look
>>> at just returning a virtual address as well if you insist on having
>>> something that returns all of the above. No point in having 2 and 3 be
>>> seperate functions.
>>
>> Let's be more specific about what are your suggestion here: which way
>> is the prefer way to return the virtual address. It seems there are two
>> options:
>>
>> 1. Return the virtual address by function returning as below:
>> void *page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio);
>>
>> 2. Return the virtual address by double pointer as below:
>> int page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio,
>>                         void **va);
> 
> I was thinking more of option 1. Basically this is a superset of
> page_frag_alloc_va that is also returning the page and offset via a
> page frag. However instead of bio_vec I would be good with "struct
> page_frag *" being the value passed to the function to play the role
> of container. Basically the big difference between 1 and 2/3 if I am
> not mistaken is the fact that for 1 you pass the size, whereas with
> 2/3 you are peeling off the page frag from the larger page frag cache

Let's be clear here: callers expecting only [page, offset] also need to
call the type 3 API, which returns both [va] and [page, offset]? And it
is OK to ignore the overhead of calculating the 'va' for those callers,
just because we don't want to rename an existing API and can't come up
with a good name for the new one?

> after the fact via a commit type action.

Just to be clear, there is no commit type action for some subtypes of
the type 2/3 API.

For example, the type 2 API in this patchset has the subtypes below:

subtype 1: it does not need a commit type action; it just returns
           [page, offset] where page_frag_alloc_va() returns [va],
           and, like page_frag_alloc_va(), it does not return the
           allocated fragsz back to the caller:
struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
                                unsigned int *offset, unsigned int fragsz,
                                gfp_t gfp)

subtype 2: it does need a commit type action; @fragsz is returned to
           the caller, and the caller uses it to decide how much to
           commit.
struct page *page_frag_alloc_pg_prepare(struct page_frag_cache *nc,
                                        unsigned int *offset,
                                        unsigned int *fragsz, gfp_t gfp)

Do you see subtype 1 as a valid API? If not, why?
If yes, do you also expect the caller to use "struct page_frag *" as the
container? If so, what is the caller expected to do with the size field
in "struct page_frag *" from the API perspective? Just ignore it?
Alexander Duyck Aug. 20, 2024, 4:02 p.m. UTC | #7
On Tue, Aug 20, 2024 at 6:07 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/8/19 23:54, Alexander Duyck wrote:
>
> ...
>
> >>>>
> >>>> "There are three types of API as proposed in this patchset instead of
> >>>> two types of API:
> >>>> 1. page_frag_alloc_va() returns [va].
> >>>> 2. page_frag_alloc_pg() returns [page, offset].
> >>>> 3. page_frag_alloc() returns [va] & [page, offset].
> >>>>
> >>>> You seemed to miss that we need a third naming for the type 3 API.
> >>>> Do you see type 3 API as a valid API? if yes, what naming are you
> >>>> suggesting for it? if no, why it is not a valid API?"
> >>>
> >>> I didn't. I just don't see the point in pushing out the existing API
> >>> to support that. In reality 2 and 3 are redundant. You probably only
> >>> need 3. Like I mentioned earlier you can essentially just pass a
> >>
> >> If the caller just expect [page, offset], do you expect the caller also
> >> type 3 API, which return both [va] and [page, offset]?
> >>
> >> I am not sure if I understand why you think 2 and 3 are redundant here?
> >> If you think 2 and 3 are redundant here, aren't 1 and 3 also redundant
> >> as the similar agrument?
> >
> > The big difference is the need to return page and offset. Basically to
> > support returning page and offset you need to pass at least one value
> > as a pointer so you can store the return there.
> >
> > The reason why 3 is just a redundant form of 2 is that you will
> > normally just be converting from a va to a page and offset so the va
> > should already be easily accessible.
>
> I am assuming that by 'easily accessible', you meant the 'va' can be
> calculated as below, right?
>
> va = encoded_page_address(encoded_va) +
>                 (page_frag_cache_page_size(encoded_va) - remaining);
>
> I guess it is easily accessible, but it is not without some overhead
> to calculate the 'va' here.

It is just the encoded_page_address + offset that you have to
calculate anyway. So the only bit you actually have to do is 2
instructions, one to mask the encoded_va and then the addition of the
offset that you provided to the page. As it stands those instructions
can easily be slipped in while you are working on converting the va to
a page.
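
In other words, restating the point above (assuming encoded_page_address()
masks the flag bits out of encoded_va as in this series, and that 'offset'
is the "size - remaining" value already being handed to the caller):

va = encoded_page_address(encoded_va) + offset;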


> >
> >>> page_frag via pointer to the function. With that you could also look
> >>> at just returning a virtual address as well if you insist on having
> >>> something that returns all of the above. No point in having 2 and 3 be
> >>> seperate functions.
> >>
> >> Let's be more specific about what are your suggestion here: which way
> >> is the prefer way to return the virtual address. It seems there are two
> >> options:
> >>
> >> 1. Return the virtual address by function returning as below:
> >> void *page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio);
> >>
> >> 2. Return the virtual address by double pointer as below:
> >> int page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio,
> >>                         void **va);
> >
> > I was thinking more of option 1. Basically this is a superset of
> > page_frag_alloc_va that is also returning the page and offset via a
> > page frag. However instead of bio_vec I would be good with "struct
> > page_frag *" being the value passed to the function to play the role
> > of container. Basically the big difference between 1 and 2/3 if I am
> > not mistaken is the fact that for 1 you pass the size, whereas with
> > 2/3 you are peeling off the page frag from the larger page frag cache
>
> Let's be clear here: The callers just expecting [page, offset] also need
> to call type 3 API, which return both [va] and [page, offset]? and it
> is ok to ignore the overhead of calculating the 'va' for those kinds
> of callers just because we don't want to do the renaming for a existing
> API and can't come up with good naming for that?
>
> > after the fact via a commit type action.
>
> Just be clear here, there is no commit type action for some subtype of
> type 2/3 API.
>
> For example, for type 2 API in this patchset, it has below subtypes:
>
> subtype 1: it does not need a commit type action, it just return
>            [page, offset] instead of page_frag_alloc_va() returning [va],
>            and it does not return the allocated fragsz back to the caller
>            as page_frag_alloc_va() does not too:
> struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
>                                 unsigned int *offset, unsigned int fragsz,
>                                 gfp_t gfp)
>
> subtype 2: it does need a commit type action, and @fragsz is returned to
>            the caller and caller used that to commit how much fragsz to
>            commit.
> struct page *page_frag_alloc_pg_prepare(struct page_frag_cache *nc,
>                                         unsigned int *offset,
>                                         unsigned int *fragsz, gfp_t gfp)
>
> Do you see subtype 1 as valid API? If no, why?

Not really, it is just a wrapper for page_frag_alloc that is
converting the virtual address to a page and offset. They are the same
data and don't justify the need for two functions. It kind of explains
one of the complaints I had about this code. Supposedly it was
refactoring and combining several different callers into one, but what
it is actually doing is fracturing the code path into 3 different
variants based on little if any actual difference as it is doing
unnecessary optimization.

> If yes, do you also expect the caller to use "struct page_frag *" as the
> container? If yes, what is the caller expected to do with the size field in
> "struct page_frag *" from API perspective? Just ignore it?

It should be populated. You passed a fragsz, so you should populate
the output fragsz so you can get the truesize in the case of network
packets. The removal of the page_frag from the other callers is making
it much harder to review your code anyway. If we keep the page_frag
there it should reduce the amount of change needed when you replace
page_frag with the page_frag_cache.

Honestly this is eating up too much of my time. As I said before this
patch set is too big and it is trying to squeeze in more than it
really should for a single patch set to be reviewable. Going forward
please split up the patch set as I had suggested before and address my
comments. Ideally you would have your first patch just be some
refactor and cleanup to get the "offset" pointer moving in the
direction you want. With that we can at least get half of this set
digested before we start chewing into all this refactor for the
replacement of page_frag with the page_frag_cache.
Yunsheng Lin Aug. 21, 2024, 12:30 p.m. UTC | #8
On 2024/8/21 0:02, Alexander Duyck wrote:
> On Tue, Aug 20, 2024 at 6:07 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2024/8/19 23:54, Alexander Duyck wrote:
>>
>> ...
>>
>>>>>>
>>>>>> "There are three types of API as proposed in this patchset instead of
>>>>>> two types of API:
>>>>>> 1. page_frag_alloc_va() returns [va].
>>>>>> 2. page_frag_alloc_pg() returns [page, offset].
>>>>>> 3. page_frag_alloc() returns [va] & [page, offset].
>>>>>>
>>>>>> You seemed to miss that we need a third naming for the type 3 API.
>>>>>> Do you see type 3 API as a valid API? if yes, what naming are you
>>>>>> suggesting for it? if no, why it is not a valid API?"
>>>>>
>>>>> I didn't. I just don't see the point in pushing out the existing API
>>>>> to support that. In reality 2 and 3 are redundant. You probably only
>>>>> need 3. Like I mentioned earlier you can essentially just pass a
>>>>
>>>> If the caller just expect [page, offset], do you expect the caller also
>>>> type 3 API, which return both [va] and [page, offset]?
>>>>
>>>> I am not sure if I understand why you think 2 and 3 are redundant here?
>>>> If you think 2 and 3 are redundant here, aren't 1 and 3 also redundant
>>>> as the similar agrument?
>>>
>>> The big difference is the need to return page and offset. Basically to
>>> support returning page and offset you need to pass at least one value
>>> as a pointer so you can store the return there.
>>>
>>> The reason why 3 is just a redundant form of 2 is that you will
>>> normally just be converting from a va to a page and offset so the va
>>> should already be easily accessible.
>>
>> I am assuming that by 'easily accessible', you meant the 'va' can be
>> calculated as below, right?
>>
>> va = encoded_page_address(encoded_va) +
>>                 (page_frag_cache_page_size(encoded_va) - remaining);
>>
>> I guess it is easily accessible, but it is not without some overhead
>> to calculate the 'va' here.
> 
> It is just the encoded_page_address + offset that you have to
> calculate anyway. So the only bit you actually have to do is 2
> instructions, one to mask the encoded_va and then the addition of the
> offset that you provided to the page. As it stands those instruction
> can easily be slipped in while you are working on converting the va to
> a page.

Well, combined with your suggestions against other optimizations, like
avoiding a check in the fast path and avoiding the virt_to_page() call,
the overhead does add up.

And I am really surprised by your suggestion above of deciding the API
for users according to an internal implementation detail. As the
overhead of calculating 'va' really depends on the layout of
'struct page_frag_cache', what if we change the implementation and the
overhead of calculating 'va' becomes bigger? Do we expect to change the
API for the callers when we change the internal implementation of
page_frag_cache?

> 
> 
>>>
>>>>> page_frag via pointer to the function. With that you could also look
>>>>> at just returning a virtual address as well if you insist on having
>>>>> something that returns all of the above. No point in having 2 and 3 be
>>>>> seperate functions.
>>>>
>>>> Let's be more specific about what are your suggestion here: which way
>>>> is the prefer way to return the virtual address. It seems there are two
>>>> options:
>>>>
>>>> 1. Return the virtual address by function returning as below:
>>>> void *page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio);
>>>>
>>>> 2. Return the virtual address by double pointer as below:
>>>> int page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio,
>>>>                         void **va);
>>>
>>> I was thinking more of option 1. Basically this is a superset of
>>> page_frag_alloc_va that is also returning the page and offset via a
>>> page frag. However instead of bio_vec I would be good with "struct
>>> page_frag *" being the value passed to the function to play the role
>>> of container. Basically the big difference between 1 and 2/3 if I am
>>> not mistaken is the fact that for 1 you pass the size, whereas with
>>> 2/3 you are peeling off the page frag from the larger page frag cache
>>
>> Let's be clear here: The callers just expecting [page, offset] also need
>> to call type 3 API, which return both [va] and [page, offset]? and it
>> is ok to ignore the overhead of calculating the 'va' for those kinds
>> of callers just because we don't want to do the renaming for a existing
>> API and can't come up with good naming for that?
>>
>>> after the fact via a commit type action.
>>
>> Just be clear here, there is no commit type action for some subtype of
>> type 2/3 API.
>>
>> For example, for type 2 API in this patchset, it has below subtypes:
>>
>> subtype 1: it does not need a commit type action, it just return
>>            [page, offset] instead of page_frag_alloc_va() returning [va],
>>            and it does not return the allocated fragsz back to the caller
>>            as page_frag_alloc_va() does not too:
>> struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
>>                                 unsigned int *offset, unsigned int fragsz,
>>                                 gfp_t gfp)
>>
>> subtype 2: it does need a commit type action, and @fragsz is returned to
>>            the caller and caller used that to commit how much fragsz to
>>            commit.
>> struct page *page_frag_alloc_pg_prepare(struct page_frag_cache *nc,
>>                                         unsigned int *offset,
>>                                         unsigned int *fragsz, gfp_t gfp)
>>
>> Do you see subtype 1 as valid API? If no, why?
> 
> Not really, it is just a wrapper for page_frag_alloc that is
> converting the virtual address to a page and offset. They are the same
> data and don't justify the need for two functions. It kind of explains

I am supposing you meant something like below:
struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
				unsigned int *offset, unsigned int fragsz,
				gfp_t gfp)
{
	struct page *page;
	void *va;

	va = page_frag_alloc_va(nc, fragsz, gfp);
	if (!va)
		return NULL;

	page = virt_to_head_page(va);
	*offset = va - page_to_virt(page);

	return page;
}

If yes, I really think you are weighing maintainability too heavily and
trading off too much performance here, by not only recalculating the
offset but also sometimes calling virt_to_head_page() unnecessarily.

If no, please share the pseudo code you have in mind.

> one of the complaints I had about this code. Supposedly it was
> refactoring and combining several different callers into one, but what
> it is actually doing is fracturing the code path into 3 different
> variants based on little if any actual difference as it is doing
> unnecessary optimization.

I am assuming the 3 different variants mean the below, right?
1. page_frag_alloc_va() returns [va].
2. page_frag_alloc_pg() returns [page, offset].
3. page_frag_alloc() returns [va] & [page, offset].

And there are 3 other variants for the prepare API too:
4. page_frag_alloc_va_prepare() returns [va].
5. page_frag_alloc_pg_prepare() returns [page, offset].
6. page_frag_alloc_prepare() returns [va] & [page, offset].

Side note: I just found that the '4. page_frag_alloc_va_prepare()' API
is not currently used/called and can be removed in the next revision of
this patchset.

It seems what you really want is 2 & 3 to be wrappers around 1, and
5 & 6 to be wrappers around 4?

If yes, my understanding is that too much performance is traded off
here. Doesn't the introduction of __page_frag_cache_reload() in patch 8
already balance performance and maintainability as much as possible?

> 
>> If yes, do you also expect the caller to use "struct page_frag *" as the
>> container? If yes, what is the caller expected to do with the size field in
>> "struct page_frag *" from API perspective? Just ignore it?
> 
> It should be populated. You passed a fragsz, so you should populate
> the output fragsz so you can get the truesize in the case of network
> packets. The removal of the page_frag from the other callers is making
> it much harder to review your code anyway. If we keep the page_frag
> there it should reduce the amount of change needed when you replace
> page_frag with the page_frag_cache.

I have not started using page_frag as the container yet, but the above
is something I can probably agree with.

> 
> Honestly this is eating up too much of my time. As I said before this
> patch set is too big and it is trying to squeeze in more than it
> really should for a single patch set to be reviewable. Going forward
> please split up the patch set as I had suggested before and address my
> comments. Ideally you would have your first patch just be some
> refactor and cleanup to get the "offset" pointer moving in the
> direction you want. With that we can at least get half of this set
> digested before we start chewing into all this refactor for the
> replacement of page_frag with the page_frag_cache.

I don't really think breaking this patchset into more patchsets without
a new use case helps speed up the process here; it might slow it down
instead, as the differing ideas about the refactoring and the new API
naming are not going to disappear by splitting the patchset, and the
split may make the discussion harder without the bigger picture and
context.

Patch

diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index acb73d4d0de6..b6c10100e462 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -729,7 +729,7 @@  static int gve_xdp_redirect(struct net_device *dev, struct gve_rx_ring *rx,
 
 	total_len = headroom + SKB_DATA_ALIGN(len) +
 		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	frame = page_frag_alloc(&rx->page_cache, total_len, GFP_ATOMIC);
+	frame = page_frag_alloc_va(&rx->page_cache, total_len, GFP_ATOMIC);
 	if (!frame) {
 		u64_stats_update_begin(&rx->statss);
 		rx->xdp_alloc_fails++;
@@ -742,7 +742,7 @@  static int gve_xdp_redirect(struct net_device *dev, struct gve_rx_ring *rx,
 
 	err = xdp_do_redirect(dev, &new, xdp_prog);
 	if (err)
-		page_frag_free(frame);
+		page_frag_free_va(frame);
 
 	return err;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 8d25b6981269..00c706de2b82 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -126,7 +126,7 @@  ice_unmap_and_free_tx_buf(struct ice_tx_ring *ring, struct ice_tx_buf *tx_buf)
 		dev_kfree_skb_any(tx_buf->skb);
 		break;
 	case ICE_TX_BUF_XDP_TX:
-		page_frag_free(tx_buf->raw_buf);
+		page_frag_free_va(tx_buf->raw_buf);
 		break;
 	case ICE_TX_BUF_XDP_XMIT:
 		xdp_return_frame(tx_buf->xdpf);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index feba314a3fe4..6379f57d8228 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -148,7 +148,7 @@  static inline int ice_skb_pad(void)
  * @ICE_TX_BUF_DUMMY: dummy Flow Director packet, unmap and kfree()
  * @ICE_TX_BUF_FRAG: mapped skb OR &xdp_buff frag, only unmap DMA
  * @ICE_TX_BUF_SKB: &sk_buff, unmap and consume_skb(), update stats
- * @ICE_TX_BUF_XDP_TX: &xdp_buff, unmap and page_frag_free(), stats
+ * @ICE_TX_BUF_XDP_TX: &xdp_buff, unmap and page_frag_free_va(), stats
  * @ICE_TX_BUF_XDP_XMIT: &xdp_frame, unmap and xdp_return_frame(), stats
  * @ICE_TX_BUF_XSK_TX: &xdp_buff on XSk queue, xsk_buff_free(), stats
  */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 2719f0e20933..a1a41a14df0d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -250,7 +250,7 @@  ice_clean_xdp_tx_buf(struct device *dev, struct ice_tx_buf *tx_buf,
 
 	switch (tx_buf->type) {
 	case ICE_TX_BUF_XDP_TX:
-		page_frag_free(tx_buf->raw_buf);
+		page_frag_free_va(tx_buf->raw_buf);
 		break;
 	case ICE_TX_BUF_XDP_XMIT:
 		xdp_return_frame_bulk(tx_buf->xdpf, bq);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 149911e3002a..eef16a909f85 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -302,7 +302,7 @@  static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 
 		/* free the skb */
 		if (ring_is_xdp(tx_ring))
-			page_frag_free(tx_buffer->data);
+			page_frag_free_va(tx_buffer->data);
 		else
 			napi_consume_skb(tx_buffer->skb, napi_budget);
 
@@ -2412,7 +2412,7 @@  static void ixgbevf_clean_tx_ring(struct ixgbevf_ring *tx_ring)
 
 		/* Free all the Tx ring sk_buffs */
 		if (ring_is_xdp(tx_ring))
-			page_frag_free(tx_buffer->data);
+			page_frag_free_va(tx_buffer->data);
 		else
 			dev_kfree_skb_any(tx_buffer->skb);
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 87d5776e3b88..a485e988fa1d 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -553,7 +553,7 @@  static int __otx2_alloc_rbuf(struct otx2_nic *pfvf, struct otx2_pool *pool,
 	*dma = dma_map_single_attrs(pfvf->dev, buf, pool->rbsize,
 				    DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
 	if (unlikely(dma_mapping_error(pfvf->dev, *dma))) {
-		page_frag_free(buf);
+		page_frag_free_va(buf);
 		return -ENOMEM;
 	}
 
diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
index 7063c78bd35f..c4228719f8a4 100644
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -142,8 +142,8 @@  mtk_wed_wo_queue_refill(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q,
 		dma_addr_t addr;
 		void *buf;
 
-		buf = page_frag_alloc(&q->cache, q->buf_size,
-				      GFP_ATOMIC | GFP_DMA32);
+		buf = page_frag_alloc_va(&q->cache, q->buf_size,
+					 GFP_ATOMIC | GFP_DMA32);
 		if (!buf)
 			break;
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a2a47d3ab99f..86906bc505de 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -506,7 +506,7 @@  static void nvme_tcp_exit_request(struct blk_mq_tag_set *set,
 {
 	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
 
-	page_frag_free(req->pdu);
+	page_frag_free_va(req->pdu);
 }
 
 static int nvme_tcp_init_request(struct blk_mq_tag_set *set,
@@ -520,7 +520,7 @@  static int nvme_tcp_init_request(struct blk_mq_tag_set *set,
 	struct nvme_tcp_queue *queue = &ctrl->queues[queue_idx];
 	u8 hdgst = nvme_tcp_hdgst_len(queue);
 
-	req->pdu = page_frag_alloc(&queue->pf_cache,
+	req->pdu = page_frag_alloc_va(&queue->pf_cache,
 		sizeof(struct nvme_tcp_cmd_pdu) + hdgst,
 		GFP_KERNEL | __GFP_ZERO);
 	if (!req->pdu)
@@ -1337,7 +1337,7 @@  static void nvme_tcp_free_async_req(struct nvme_tcp_ctrl *ctrl)
 {
 	struct nvme_tcp_request *async = &ctrl->async_req;
 
-	page_frag_free(async->pdu);
+	page_frag_free_va(async->pdu);
 }
 
 static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl)
@@ -1346,7 +1346,7 @@  static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl)
 	struct nvme_tcp_request *async = &ctrl->async_req;
 	u8 hdgst = nvme_tcp_hdgst_len(queue);
 
-	async->pdu = page_frag_alloc(&queue->pf_cache,
+	async->pdu = page_frag_alloc_va(&queue->pf_cache,
 		sizeof(struct nvme_tcp_cmd_pdu) + hdgst,
 		GFP_KERNEL | __GFP_ZERO);
 	if (!async->pdu)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 5bff0d5464d1..560df3db2f82 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1463,24 +1463,24 @@  static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue,
 	c->queue = queue;
 	c->req.port = queue->port->nport;
 
-	c->cmd_pdu = page_frag_alloc(&queue->pf_cache,
+	c->cmd_pdu = page_frag_alloc_va(&queue->pf_cache,
 			sizeof(*c->cmd_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO);
 	if (!c->cmd_pdu)
 		return -ENOMEM;
 	c->req.cmd = &c->cmd_pdu->cmd;
 
-	c->rsp_pdu = page_frag_alloc(&queue->pf_cache,
+	c->rsp_pdu = page_frag_alloc_va(&queue->pf_cache,
 			sizeof(*c->rsp_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO);
 	if (!c->rsp_pdu)
 		goto out_free_cmd;
 	c->req.cqe = &c->rsp_pdu->cqe;
 
-	c->data_pdu = page_frag_alloc(&queue->pf_cache,
+	c->data_pdu = page_frag_alloc_va(&queue->pf_cache,
 			sizeof(*c->data_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO);
 	if (!c->data_pdu)
 		goto out_free_rsp;
 
-	c->r2t_pdu = page_frag_alloc(&queue->pf_cache,
+	c->r2t_pdu = page_frag_alloc_va(&queue->pf_cache,
 			sizeof(*c->r2t_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO);
 	if (!c->r2t_pdu)
 		goto out_free_data;
@@ -1495,20 +1495,20 @@  static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue,
 
 	return 0;
 out_free_data:
-	page_frag_free(c->data_pdu);
+	page_frag_free_va(c->data_pdu);
 out_free_rsp:
-	page_frag_free(c->rsp_pdu);
+	page_frag_free_va(c->rsp_pdu);
 out_free_cmd:
-	page_frag_free(c->cmd_pdu);
+	page_frag_free_va(c->cmd_pdu);
 	return -ENOMEM;
 }
 
 static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c)
 {
-	page_frag_free(c->r2t_pdu);
-	page_frag_free(c->data_pdu);
-	page_frag_free(c->rsp_pdu);
-	page_frag_free(c->cmd_pdu);
+	page_frag_free_va(c->r2t_pdu);
+	page_frag_free_va(c->data_pdu);
+	page_frag_free_va(c->rsp_pdu);
+	page_frag_free_va(c->cmd_pdu);
 }
 
 static int nvmet_tcp_alloc_cmds(struct nvmet_tcp_queue *queue)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f16279351db5..6691fac01e0d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -686,8 +686,8 @@  static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 		return -ENOSPC;
 
 	buflen += SKB_DATA_ALIGN(len + pad);
-	buf = page_frag_alloc_align(&net->pf_cache, buflen, GFP_KERNEL,
-				    SMP_CACHE_BYTES);
+	buf = page_frag_alloc_va_align(&net->pf_cache, buflen, GFP_KERNEL,
+				       SMP_CACHE_BYTES);
 	if (unlikely(!buf))
 		return -ENOMEM;
 
@@ -734,7 +734,7 @@  static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 	return 0;
 
 err:
-	page_frag_free(buf);
+	page_frag_free_va(buf);
 	return ret;
 }
 
diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index a758cb65a9b3..ef038a07925c 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -9,23 +9,24 @@ 
 
 void page_frag_cache_drain(struct page_frag_cache *nc);
 void __page_frag_cache_drain(struct page *page, unsigned int count);
-void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
-			      gfp_t gfp_mask, unsigned int align_mask);
+void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
+				 unsigned int fragsz, gfp_t gfp_mask,
+				 unsigned int align_mask);
 
-static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
-					  unsigned int fragsz, gfp_t gfp_mask,
-					  unsigned int align)
+static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc,
+					     unsigned int fragsz,
+					     gfp_t gfp_mask, unsigned int align)
 {
 	WARN_ON_ONCE(!is_power_of_2(align));
-	return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
+	return __page_frag_alloc_va_align(nc, fragsz, gfp_mask, -align);
 }
 
-static inline void *page_frag_alloc(struct page_frag_cache *nc,
-				    unsigned int fragsz, gfp_t gfp_mask)
+static inline void *page_frag_alloc_va(struct page_frag_cache *nc,
+				       unsigned int fragsz, gfp_t gfp_mask)
 {
-	return __page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u);
+	return __page_frag_alloc_va_align(nc, fragsz, gfp_mask, ~0u);
 }
 
-void page_frag_free(void *addr);
+void page_frag_free_va(void *addr);
 
 #endif
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7482997c719f..f9f0393e6b12 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3381,7 +3381,7 @@  static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 
 static inline void skb_free_frag(void *addr)
 {
-	page_frag_free(addr);
+	page_frag_free_va(addr);
 }
 
 void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask);
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index fbdf5a1aabfe..3b70b6b071b9 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -323,7 +323,7 @@  static int cpu_map_kthread_run(void *data)
 
 			/* Bring struct page memory area to curr CPU. Read by
 			 * build_skb_around via page_is_pfmemalloc(), and when
-			 * freed written by page_frag_free call.
+			 * freed written by page_frag_free_va call.
 			 */
 			prefetchw(page);
 		}
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index c5bc72cf018a..70fb6dead624 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -59,9 +59,9 @@  void __page_frag_cache_drain(struct page *page, unsigned int count)
 }
 EXPORT_SYMBOL(__page_frag_cache_drain);
 
-void *__page_frag_alloc_align(struct page_frag_cache *nc,
-			      unsigned int fragsz, gfp_t gfp_mask,
-			      unsigned int align_mask)
+void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
+				 unsigned int fragsz, gfp_t gfp_mask,
+				 unsigned int align_mask)
 {
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
 	unsigned int size = nc->size;
@@ -130,16 +130,16 @@  void *__page_frag_alloc_align(struct page_frag_cache *nc,
 
 	return nc->va + (size - remaining);
 }
-EXPORT_SYMBOL(__page_frag_alloc_align);
+EXPORT_SYMBOL(__page_frag_alloc_va_align);
 
 /*
  * Frees a page fragment allocated out of either a compound or order 0 page.
  */
-void page_frag_free(void *addr)
+void page_frag_free_va(void *addr)
 {
 	struct page *page = virt_to_head_page(addr);
 
 	if (unlikely(put_page_testzero(page)))
 		free_unref_page(page, compound_order(page));
 }
-EXPORT_SYMBOL(page_frag_free);
+EXPORT_SYMBOL(page_frag_free_va);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index de2a044cc665..6cf2c51a34e1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -314,8 +314,8 @@  void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 	fragsz = SKB_DATA_ALIGN(fragsz);
 
 	local_lock_nested_bh(&napi_alloc_cache.bh_lock);
-	data = __page_frag_alloc_align(&nc->page, fragsz,
-				       GFP_ATOMIC | __GFP_NOWARN, align_mask);
+	data = __page_frag_alloc_va_align(&nc->page, fragsz,
+					  GFP_ATOMIC | __GFP_NOWARN, align_mask);
 	local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
 	return data;
 
@@ -330,9 +330,9 @@  void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 		struct page_frag_cache *nc = this_cpu_ptr(&netdev_alloc_cache);
 
 		fragsz = SKB_DATA_ALIGN(fragsz);
-		data = __page_frag_alloc_align(nc, fragsz,
-					       GFP_ATOMIC | __GFP_NOWARN,
-					       align_mask);
+		data = __page_frag_alloc_va_align(nc, fragsz,
+						  GFP_ATOMIC | __GFP_NOWARN,
+						  align_mask);
 	} else {
 		local_bh_disable();
 		data = __napi_alloc_frag_align(fragsz, align_mask);
@@ -751,14 +751,14 @@  struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 
 	if (in_hardirq() || irqs_disabled()) {
 		nc = this_cpu_ptr(&netdev_alloc_cache);
-		data = page_frag_alloc(nc, len, gfp_mask);
+		data = page_frag_alloc_va(nc, len, gfp_mask);
 		pfmemalloc = nc->pfmemalloc;
 	} else {
 		local_bh_disable();
 		local_lock_nested_bh(&napi_alloc_cache.bh_lock);
 
 		nc = this_cpu_ptr(&napi_alloc_cache.page);
-		data = page_frag_alloc(nc, len, gfp_mask);
+		data = page_frag_alloc_va(nc, len, gfp_mask);
 		pfmemalloc = nc->pfmemalloc;
 
 		local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
@@ -848,7 +848,7 @@  struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
 	} else {
 		len = SKB_HEAD_ALIGN(len);
 
-		data = page_frag_alloc(&nc->page, len, gfp_mask);
+		data = page_frag_alloc_va(&nc->page, len, gfp_mask);
 		pfmemalloc = nc->page.pfmemalloc;
 	}
 	local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index bcc5551c6424..7d4e09fb478f 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -387,7 +387,7 @@  void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 		page_pool_put_full_page(page->pp, page, napi_direct);
 		break;
 	case MEM_TYPE_PAGE_SHARED:
-		page_frag_free(data);
+		page_frag_free_va(data);
 		break;
 	case MEM_TYPE_PAGE_ORDER0:
 		page = virt_to_page(data); /* Assumes order0 page*/
diff --git a/net/rxrpc/txbuf.c b/net/rxrpc/txbuf.c
index c3913d8a50d3..dccb0353ee84 100644
--- a/net/rxrpc/txbuf.c
+++ b/net/rxrpc/txbuf.c
@@ -33,8 +33,8 @@  struct rxrpc_txbuf *rxrpc_alloc_data_txbuf(struct rxrpc_call *call, size_t data_
 
 	data_align = umax(data_align, L1_CACHE_BYTES);
 	mutex_lock(&call->conn->tx_data_alloc_lock);
-	buf = page_frag_alloc_align(&call->conn->tx_data_alloc, total, gfp,
-				    data_align);
+	buf = page_frag_alloc_va_align(&call->conn->tx_data_alloc, total, gfp,
+				       data_align);
 	mutex_unlock(&call->conn->tx_data_alloc_lock);
 	if (!buf) {
 		kfree(txb);
@@ -96,17 +96,18 @@  struct rxrpc_txbuf *rxrpc_alloc_ack_txbuf(struct rxrpc_call *call, size_t sack_s
 	if (!txb)
 		return NULL;
 
-	buf = page_frag_alloc(&call->local->tx_alloc,
-			      sizeof(*whdr) + sizeof(*ack) + 1 + 3 + sizeof(*trailer), gfp);
+	buf = page_frag_alloc_va(&call->local->tx_alloc,
+				 sizeof(*whdr) + sizeof(*ack) + 1 + 3 + sizeof(*trailer), gfp);
 	if (!buf) {
 		kfree(txb);
 		return NULL;
 	}
 
 	if (sack_size) {
-		buf2 = page_frag_alloc(&call->local->tx_alloc, sack_size, gfp);
+		buf2 = page_frag_alloc_va(&call->local->tx_alloc, sack_size,
+					  gfp);
 		if (!buf2) {
-			page_frag_free(buf);
+			page_frag_free_va(buf);
 			kfree(txb);
 			return NULL;
 		}
@@ -180,7 +181,7 @@  static void rxrpc_free_txbuf(struct rxrpc_txbuf *txb)
 			  rxrpc_txbuf_free);
 	for (i = 0; i < txb->nr_kvec; i++)
 		if (txb->kvec[i].iov_base)
-			page_frag_free(txb->kvec[i].iov_base);
+			page_frag_free_va(txb->kvec[i].iov_base);
 	kfree(txb);
 	atomic_dec(&rxrpc_nr_txbuf);
 }
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 6b3f01beb294..42d20412c1c3 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1222,8 +1222,8 @@  static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
 	/* The stream record marker is copied into a temporary page
 	 * fragment buffer so that it can be included in rq_bvec.
 	 */
-	buf = page_frag_alloc(&svsk->sk_frag_cache, sizeof(marker),
-			      GFP_KERNEL);
+	buf = page_frag_alloc_va(&svsk->sk_frag_cache, sizeof(marker),
+				 GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
 	memcpy(buf, &marker, sizeof(marker));
@@ -1235,7 +1235,7 @@  static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
 		      1 + count, sizeof(marker) + rqstp->rq_res.len);
 	ret = sock_sendmsg(svsk->sk_sock, &msg);
-	page_frag_free(buf);
+	page_frag_free_va(buf);
 	if (ret < 0)
 		return ret;
 	*sentp += ret;
diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c
index 4a009122991e..e522611452c9 100644
--- a/tools/testing/selftests/mm/page_frag/page_frag_test.c
+++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c
@@ -52,7 +52,7 @@  static int page_frag_pop_thread(void *arg)
 
 		if (obj) {
 			nr--;
-			page_frag_free(obj);
+			page_frag_free_va(obj);
 		} else {
 			cond_resched();
 		}
@@ -80,13 +80,16 @@  static int page_frag_push_thread(void *arg)
 		int ret;
 
 		if (test_align) {
-			va = page_frag_alloc_align(&test_frag, test_alloc_len,
-						   GFP_KERNEL, SMP_CACHE_BYTES);
+			va = page_frag_alloc_va_align(&test_frag,
+						      test_alloc_len,
+						      GFP_KERNEL,
+						      SMP_CACHE_BYTES);
 
 			WARN_ONCE((unsigned long)va & (SMP_CACHE_BYTES - 1),
 				  "unaligned va returned\n");
 		} else {
-			va = page_frag_alloc(&test_frag, test_alloc_len, GFP_KERNEL);
+			va = page_frag_alloc_va(&test_frag, test_alloc_len,
+						GFP_KERNEL);
 		}
 
 		if (!va)
@@ -94,7 +97,7 @@  static int page_frag_push_thread(void *arg)
 
 		ret = __ptr_ring_produce(ring, va);
 		if (ret) {
-			page_frag_free(va);
+			page_frag_free_va(va);
 			cond_resched();
 		} else {
 			nr--;