From: Yunsheng Lin
Date: Tue, 25 Jun 2024 21:52:06 +0800
Subject: [PATCH net-next v9 03/13] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Message-ID: <20240625135216.47007-4-linyunsheng@huawei.com>
In-Reply-To: <20240625135216.47007-1-linyunsheng@huawei.com>

We are about to use the page_frag_alloc_*() API not only to allocate
memory for skb->data, but also to allocate memory for skb frags.
Currently the page_frag implementation in the mm subsystem runs the
offset as a countdown rather than a count-up value. That may have
several advantages, as mentioned in [1], but it also has some
disadvantages: for example, it may prevent skb frag coalescing and
hurt cache prefetching.

We have a trade-off to make in order to have a unified implementation
and API for page_frag, so use an initial zero offset in this patch;
the following patch will try to make some optimizations to avoid the
disadvantages as much as possible.

As the offset is now rounded up to meet the alignment requirement
before checking whether the cache has enough room, this might become
exploitable if a caller mistakenly passes an align value bigger than
32K. As we allow order-3 page allocation to fail easily under low
memory conditions, an align value bigger than PAGE_SIZE is not really
allowed, so add an 'align > PAGE_SIZE' check in
page_frag_alloc_align(), netdev_alloc_frag_align() and
napi_alloc_frag_align() to catch that.

1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/

CC: Alexander Duyck
Signed-off-by: Yunsheng Lin
---
 include/linux/page_frag_cache.h |  2 +-
 include/linux/skbuff.h          |  4 ++--
 mm/page_frag_cache.c            | 26 +++++++++++---------------
 3 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index 3a44bfc99750..b9411f0db25a 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -32,7 +32,7 @@ static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
 					  unsigned int fragsz, gfp_t gfp_mask,
 					  unsigned int align)
 {
-	WARN_ON_ONCE(!is_power_of_2(align));
+	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
 	return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
 }
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index eb8ae8292c48..d1fea23ec386 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3320,7 +3320,7 @@ static inline void *netdev_alloc_frag(unsigned int fragsz)
 static inline void *netdev_alloc_frag_align(unsigned int fragsz,
 					    unsigned int align)
 {
-	WARN_ON_ONCE(!is_power_of_2(align));
+	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
 	return __netdev_alloc_frag_align(fragsz, -align);
 }
 
@@ -3391,7 +3391,7 @@ static inline void *napi_alloc_frag(unsigned int fragsz)
 static inline void *napi_alloc_frag_align(unsigned int fragsz,
 					  unsigned int align)
 {
-	WARN_ON_ONCE(!is_power_of_2(align));
+	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
 	return __napi_alloc_frag_align(fragsz, -align);
 }
 
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index 88f567ef0e29..da244851b8a4 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -72,10 +72,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		if (!page)
 			return NULL;
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
@@ -84,11 +80,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		/* reset page count bias and offset to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
+		nc->offset = 0;
 	}
 
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	/* if size can vary use size else just use PAGE_SIZE */
+	size = nc->size;
+#endif
+
+	offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
+	if (unlikely(offset + fragsz > size)) {
 		page = virt_to_page(nc->va);
 
 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
@@ -99,17 +100,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 			goto refill;
 		}
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
 		/* OK, page count is 0, we can safely set it */
 		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
 
 		/* reset page count bias and offset to start of new frag */
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
+		offset = 0;
+		if (unlikely(fragsz > PAGE_SIZE)) {
 			/*
 			 * The caller is trying to allocate a fragment
 			 * with fragsz > PAGE_SIZE but the cache isn't big
@@ -124,8 +121,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 	}
 
 	nc->pagecnt_bias--;
-	offset &= align_mask;
-	nc->offset = offset;
+	nc->offset = offset + fragsz;
 
 	return nc->va + offset;
 }
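The following is a minimal userspace sketch, not kernel code, of the two offset schemes discussed in this patch. Everything in it is hypothetical: struct frag_cache_sketch stands in for struct page_frag_cache, a plain buffer stands in for the page, SKETCH_PAGE_SIZE assumes 4K pages, and returning NULL stands in for the refill path. frag_alloc_countdown() follows the pre-patch logic (subtract fragsz, then round down with align_mask), while frag_alloc_countup() follows the patched logic (round nc->offset up with __ALIGN_KERNEL_MASK()-style arithmetic, where ~align_mask == align - 1 because callers pass -align, check offset + fragsz against size, then advance the offset past the fragment).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PAGE_SIZE; the real value is per-arch. */
#define SKETCH_PAGE_SIZE 4096u

/* Same rounding as the kernel's __ALIGN_KERNEL_MASK(x, mask). */
#define SKETCH_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))

/* Hypothetical userspace model of struct page_frag_cache. */
struct frag_cache_sketch {
	unsigned char *va;   /* backing buffer standing in for the page */
	unsigned int size;   /* usable bytes in the buffer */
	unsigned int offset; /* count-up: next free byte; count-down: end of free space */
};

static void frag_cache_sketch_init(struct frag_cache_sketch *nc,
				   unsigned char *buf, unsigned int size,
				   bool countdown)
{
	nc->va = buf;
	nc->size = size;
	/* Pre-patch caches start at the end, patched caches start at zero. */
	nc->offset = countdown ? size : 0;
}

/*
 * Mirrors the condition the patch adds to WARN_ON_ONCE(). The kernel only
 * warns; this sketch asserts (aborts) instead.
 */
static bool align_ok(unsigned int align)
{
	return align && !(align & (align - 1)) && align <= SKETCH_PAGE_SIZE;
}

/* Pre-patch behaviour: carve from the end, then round down. */
static void *frag_alloc_countdown(struct frag_cache_sketch *nc,
				  unsigned int fragsz, unsigned int align)
{
	int offset = (int)nc->offset - (int)fragsz;

	assert(align_ok(align));
	if (offset < 0)
		return NULL;             /* the kernel would refill here */
	offset &= -(int)align;           /* round down to a multiple of align */
	nc->offset = (unsigned int)offset;
	return nc->va + offset;
}

/* Post-patch behaviour: round the offset up, check the fit, advance. */
static void *frag_alloc_countup(struct frag_cache_sketch *nc,
				unsigned int fragsz, unsigned int align)
{
	unsigned int align_mask = -align; /* callers pass -align as the mask */
	unsigned int offset;

	assert(align_ok(align));
	/* ~align_mask == align - 1, so this rounds up to a multiple of align. */
	offset = SKETCH_ALIGN_MASK(nc->offset, ~align_mask);
	if (offset + fragsz > nc->size)
		return NULL;             /* the kernel would refill here */
	nc->offset = offset + fragsz;
	return nc->va + offset;
}
```

With the count-up variant, two back-to-back 100-byte fragments come out with b == a + 100, so the second fragment starts exactly where the first ends, which is the kind of contiguity skb_can_coalesce() looks for within one page; the count-down variant instead yields b + 100 == a. A 64-byte, 64-aligned request after those two lands at offset 256 in the count-up cache, showing that the alignment padding is added before the fit check, which is why the patch also guards against align > PAGE_SIZE.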