From patchwork Wed Jul 31 12:44:53 2024
X-Patchwork-Submitter: Yunsheng Lin
X-Patchwork-Id: 13748707
From: Yunsheng Lin <linyunsheng@huawei.com>
CC: Yunsheng Lin, Alexander Duyck, Andrew Morton
Subject: [PATCH net-next v12 03/14] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Date: Wed, 31 Jul 2024 20:44:53 +0800
Message-ID: <20240731124505.2903877-4-linyunsheng@huawei.com>
X-Mailer: git-send-email 2.30.0
In-Reply-To: <20240731124505.2903877-1-linyunsheng@huawei.com>
References: <20240731124505.2903877-1-linyunsheng@huawei.com>
MIME-Version: 1.0

We are about to use the page_frag_alloc_*() API not just to allocate memory for skb->data, but also to do the memory allocation for skb frags.
Currently the page_frag implementation in the mm subsystem runs the offset as a countdown rather than a count-up value. That has several advantages, as mentioned in [1], but it also has some disadvantages: for example, it may prevent skb frag coalescing and more effective cache prefetching.

There is a trade-off to make in order to have a unified implementation and API for page_frag, so use an initial zero offset in this patch; the following patch will try to optimize away the disadvantages as much as possible.

Rename 'offset' to 'remaining', retaining the countdown behavior as a 'remaining countdown' instead of an 'offset countdown'. The renaming also enables a single 'fragsz > remaining' check for the case of the cache not being big enough, which should be the fast path if we ensure 'remaining' is zero when 'va' == NULL by memset'ing 'struct page_frag_cache' in page_frag_cache_init() and page_frag_cache_drain().

1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/

CC: Alexander Duyck
Signed-off-by: Yunsheng Lin
---
 include/linux/mm_types_task.h |  4 +--
 mm/page_frag_cache.c          | 52 +++++++++++++++++------------------
 2 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index cdc1e3696439..b1c54b2b9308 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -52,10 +52,10 @@ struct page_frag {
 struct page_frag_cache {
 	void *va;
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	__u16 offset;
+	__u16 remaining;
 	__u16 size;
 #else
-	__u32 offset;
+	__u32 remaining;
 #endif
 	/* we maintain a pagecount bias, so that we dont dirty cache line
 	 * containing page->_refcount every time we allocate a fragment.
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index 609a485cd02a..c5bc72cf018a 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -63,9 +63,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 			      unsigned int fragsz, gfp_t gfp_mask,
 			      unsigned int align_mask)
 {
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	unsigned int size = nc->size;
+#else
 	unsigned int size = PAGE_SIZE;
+#endif
+	unsigned int remaining;
 	struct page *page;
-	int offset;
 
 	if (unlikely(!nc->va)) {
 refill:
@@ -82,14 +86,27 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		 */
 		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
 
-		/* reset page count bias and offset to start of new frag */
+		/* reset page count bias and remaining to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
+		nc->remaining = size;
 	}
 
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
+	remaining = nc->remaining & align_mask;
+	if (unlikely(remaining < fragsz)) {
+		if (unlikely(fragsz > PAGE_SIZE)) {
+			/*
+			 * The caller is trying to allocate a fragment
+			 * with fragsz > PAGE_SIZE but the cache isn't big
+			 * enough to satisfy the request, this may
+			 * happen in low memory conditions.
+			 * We don't release the cache page because
+			 * it could make memory pressure worse
+			 * so we simply return NULL here.
+			 */
+			return NULL;
+		}
+
 		page = virt_to_page(nc->va);
 
 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
@@ -100,35 +117,18 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 			goto refill;
 		}
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
 		/* OK, page count is 0, we can safely set it */
 		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
 
-		/* reset page count bias and offset to start of new frag */
+		/* reset page count bias and remaining to start of new frag */
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
-			/*
-			 * The caller is trying to allocate a fragment
-			 * with fragsz > PAGE_SIZE but the cache isn't big
-			 * enough to satisfy the request, this may
-			 * happen in low memory conditions.
-			 * We don't release the cache page because
-			 * it could make memory pressure worse
-			 * so we simply return NULL here.
-			 */
-			return NULL;
-		}
+		remaining = size;
 	}
 
 	nc->pagecnt_bias--;
-	offset &= align_mask;
-	nc->offset = offset;
+	nc->remaining = remaining - fragsz;
 
-	return nc->va + offset;
+	return nc->va + (size - remaining);
 }
 EXPORT_SYMBOL(__page_frag_alloc_align);