Message ID | 20230321163550.1574254-1-eric.dumazet@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [v2,net-next] net: introduce a config option to tweak MAX_SKB_FRAGS |
On 21/03/2023 18:35, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Currently, MAX_SKB_FRAGS value is 17.
>
> For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
> attempts order-3 allocations, stuffing 32768 bytes per frag.
>
> But with zero copy, we use order-0 pages.
>
> For BIG TCP to show its full potential, we add a config option
> to be able to fit up to 45 segments per skb.
>
> This is also needed for BIG TCP rx zerocopy, as zerocopy currently
> does not support skbs with frag list.
>
> We have used MAX_SKB_FRAGS=45 value for years at Google before
> we deployed 4K MTU, with no adverse effect, other than
> a recent issue in mlx4, fixed in commit 26782aad00cc
> ("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS")
>
> Back then, goal was to be able to receive full size (64KB) GRO
> packets without the frag_list overhead.
>
> Note that /proc/sys/net/core/max_skb_frags can also be used to limit
> the number of fragments TCP can use in tx packets.
>
> By default we keep the old/legacy value of 17 until we get
> more coverage for the updated values.
>
> Sizes of struct skb_shared_info on 64bit arches
>
> MAX_SKB_FRAGS | sizeof(struct skb_shared_info):
> ==============================================
>          17     320
>          21     320+64  = 384
>          25     320+128 = 448
>          29     320+192 = 512
>          33     320+256 = 576
>          37     320+320 = 640
>          41     320+384 = 704
>          45     320+448 = 768
>
> This inflation might cause problems for drivers assuming they could pack
> both the incoming packet and skb_shared_info in half a page, using build_skb().
>
> v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long"
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  drivers/scsi/cxgbi/libcxgbi.c |  4 ++--
>  include/linux/skbuff.h        | 14 ++------------
>  net/Kconfig                   | 12 ++++++++++++
>  net/packet/af_packet.c        |  4 ++--
>  4 files changed, 18 insertions(+), 16 deletions(-)
>

Nice!
I was statically increasing it for our datapath performance tests w/ BIG TCP
and zerocopy, had to implement custom header-data split for mlx to get it all
working but the improvements are impressive as expected.

FWIW,
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Hi Eric,
I love your patch! Yet something to improve:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-introduce-a-config-option-to-tweak-MAX_SKB_FRAGS/20230322-003641
patch link: https://lore.kernel.org/r/20230321163550.1574254-1-eric.dumazet%40gmail.com
patch subject: [PATCH v2 net-next] net: introduce a config option to tweak MAX_SKB_FRAGS
config: ia64-randconfig-r005-20230322 (https://download.01.org/0day-ci/archive/20230322/202303221833.CjbkODlQ-lkp@intel.com/config)
compiler: ia64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/69776fdcf56a3d545d8b37c25829fcadec2d9144
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Eric-Dumazet/net-introduce-a-config-option-to-tweak-MAX_SKB_FRAGS/20230322-003641
git checkout 69776fdcf56a3d545d8b37c25829fcadec2d9144
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=ia64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=ia64 SHELL=/bin/bash kernel/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303221833.CjbkODlQ-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/linux/filter.h:12,
from include/linux/bpf_verifier.h:9,
from kernel/bpf/btf.c:19:
include/linux/skbuff.h:348:23: error: 'CONFIG_MAX_SKB_FRAGS' undeclared here (not in a function); did you mean 'MAX_SKB_FRAGS'?
348 | #define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
| ^~~~~~~~~~~~~~~~~~~~
include/linux/skbuff.h:593:31: note: in expansion of macro 'MAX_SKB_FRAGS'
593 | skb_frag_t frags[MAX_SKB_FRAGS];
| ^~~~~~~~~~~~~
include/linux/skbuff.h: In function '__skb_fill_page_desc_noacc':
include/linux/skbuff.h:2392:51: warning: parameter 'i' set but not used [-Wunused-but-set-parameter]
2392 | int i, struct page *page,
| ~~~~^
include/linux/skbuff.h: In function 'skb_frag_ref':
include/linux/skbuff.h:3380:58: warning: parameter 'f' set but not used [-Wunused-but-set-parameter]
3380 | static inline void skb_frag_ref(struct sk_buff *skb, int f)
| ~~~~^
include/linux/skbuff.h: In function 'skb_frag_unref':
include/linux/skbuff.h:3411:60: warning: parameter 'f' set but not used [-Wunused-but-set-parameter]
3411 | static inline void skb_frag_unref(struct sk_buff *skb, int f)
| ~~~~^
include/linux/skbuff.h: In function 'skb_frag_set_page':
include/linux/skbuff.h:3478:63: warning: parameter 'f' set but not used [-Wunused-but-set-parameter]
3478 | static inline void skb_frag_set_page(struct sk_buff *skb, int f,
| ~~~~^
In file included from <command-line>:
include/linux/skmsg.h: In function 'sk_msg_init':
>> include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
| ^
include/linux/compiler_types.h:377:23: note: in definition of macro '__compiletime_assert'
377 | if (!(condition)) \
| ^~~~~~~~~
include/linux/compiler_types.h:397:9: note: in expansion of macro '_compiletime_assert'
397 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^~~~~~~~~~~~~~~~
include/linux/skmsg.h:177:9: note: in expansion of macro 'BUILD_BUG_ON'
177 | BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != NR_MSG_FRAG_IDS);
| ^~~~~~~~~~~~
include/linux/compiler.h:232:33: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
232 | #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
| ^~~~~~~~~~~~~~~~~
include/linux/kernel.h:55:59: note: in expansion of macro '__must_be_array'
55 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
| ^~~~~~~~~~~~~~~
include/linux/skmsg.h:177:22: note: in expansion of macro 'ARRAY_SIZE'
177 | BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != NR_MSG_FRAG_IDS);
| ^~~~~~~~~~
In file included from kernel/bpf/btf.c:23:
include/linux/skmsg.h: In function 'sk_msg_xfer':
include/linux/skmsg.h:183:36: warning: parameter 'which' set but not used [-Wunused-but-set-parameter]
183 | int which, u32 size)
| ~~~~^~~~~
include/linux/skmsg.h: In function 'sk_msg_elem':
include/linux/skmsg.h:209:71: warning: parameter 'which' set but not used [-Wunused-but-set-parameter]
209 | static inline struct scatterlist *sk_msg_elem(struct sk_msg *msg, int which)
| ~~~~^~~~~
include/linux/skmsg.h: In function 'sk_msg_elem_cpy':
include/linux/skmsg.h:214:74: warning: parameter 'which' set but not used [-Wunused-but-set-parameter]
214 | static inline struct scatterlist sk_msg_elem_cpy(struct sk_msg *msg, int which)
| ~~~~^~~~~
kernel/bpf/btf.c: In function 'btf_seq_show':
kernel/bpf/btf.c:7101:29: warning: function 'btf_seq_show' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
7101 | seq_vprintf((struct seq_file *)show->target, fmt, args);
| ^~~~~~~~
kernel/bpf/btf.c: In function 'btf_snprintf_show':
kernel/bpf/btf.c:7138:9: warning: function 'btf_snprintf_show' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
7138 | len = vsnprintf(show->target, ssnprintf->len_left, fmt, args);
| ^~~
vim +16 include/linux/build_bug.h
bc6245e5efd70c Ian Abbott 2017-07-10 6
bc6245e5efd70c Ian Abbott 2017-07-10 7 #ifdef __CHECKER__
bc6245e5efd70c Ian Abbott 2017-07-10 8 #define BUILD_BUG_ON_ZERO(e) (0)
bc6245e5efd70c Ian Abbott 2017-07-10 9 #else /* __CHECKER__ */
bc6245e5efd70c Ian Abbott 2017-07-10 10 /*
bc6245e5efd70c Ian Abbott 2017-07-10 11 * Force a compilation error if condition is true, but also produce a
8788994376d84d Rikard Falkeborn 2019-12-04 12 * result (of value 0 and type int), so the expression can be used
bc6245e5efd70c Ian Abbott 2017-07-10 13 * e.g. in a structure initializer (or where-ever else comma expressions
bc6245e5efd70c Ian Abbott 2017-07-10 14 * aren't permitted).
bc6245e5efd70c Ian Abbott 2017-07-10 15 */
8788994376d84d Rikard Falkeborn 2019-12-04 @16 #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
527edbc18a70e7 Masahiro Yamada 2019-01-03 17 #endif /* __CHECKER__ */
527edbc18a70e7 Masahiro Yamada 2019-01-03 18
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index af281e271f886041b397ea881e2ce7be00eff625..3e1de4c842cc6102e25a5972d6b11e05c3e4c060 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -2314,9 +2314,9 @@ static int cxgbi_sock_tx_queue_up(struct cxgbi_sock *csk, struct sk_buff *skb)
 		frags++;
 
 	if (frags >= SKB_WR_LIST_SIZE) {
-		pr_err("csk 0x%p, frags %u, %u,%u >%lu.\n",
+		pr_err("csk 0x%p, frags %u, %u,%u >%u.\n",
 		       csk, skb_shinfo(skb)->nr_frags, skb->len,
-		       skb->data_len, SKB_WR_LIST_SIZE);
+		       skb->data_len, (unsigned int)SKB_WR_LIST_SIZE);
 		return -EINVAL;
 	}
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fe661011644b8f468ff5e92075a6624f0557584c..43726ca7d20f232461a4d2e5b984032806e9c13e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -345,18 +345,8 @@ struct sk_buff_head {
 
 struct sk_buff;
 
-/* To allow 64K frame to be packed as single skb without frag_list we
- * require 64K/PAGE_SIZE pages plus 1 additional page to allow for
- * buffers which do not start on a page boundary.
- *
- * Since GRO uses frags we allocate at least 16 regardless of page
- * size.
- */
-#if (65536/PAGE_SIZE + 1) < 16
-#define MAX_SKB_FRAGS 16UL
-#else
-#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
-#endif
+#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
+
 extern int sysctl_max_skb_frags;
 
 /* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to
diff --git a/net/Kconfig b/net/Kconfig
index 48c33c2221999e575c83a409ab773b9cc3656eab..f806722bccf450c62e07bfdb245e5195ac4a156d 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -251,6 +251,18 @@ config PCPU_DEV_REFCNT
 	  network device refcount are using per cpu variables if this option is set.
 	  This can be forced to N to detect underflows (with a performance drop).
 
+config MAX_SKB_FRAGS
+	int "Maximum number of fragments per skb_shared_info"
+	range 17 45
+	default 17
+	help
+	  Having more fragments per skb_shared_info can help GRO efficiency.
+	  This helps BIG TCP workloads, but might expose bugs in some
+	  legacy drivers.
+	  This also increases memory overhead of small packets,
+	  and in drivers using build_skb().
+	  If unsure, say 17.
+
 config RPS
 	bool
 	depends on SMP && SYSFS
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 497193f73030c385a2d33b71dfbc299fbf9b763d..568f8d76e3c124f3b322a8d88dc3dcfbc45e7c0e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2622,8 +2622,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		nr_frags = skb_shinfo(skb)->nr_frags;
 
 		if (unlikely(nr_frags >= MAX_SKB_FRAGS)) {
-			pr_err("Packet exceed the number of skb frags(%lu)\n",
-			       MAX_SKB_FRAGS);
+			pr_err("Packet exceed the number of skb frags(%u)\n",
+			       (unsigned int)MAX_SKB_FRAGS);
 			return -EFAULT;
 		}