Message ID | 20241119222139.14338-1-justin.iurman@uliege.be (mailing list archive) |
---|---|
Headers | show |
Series | Mitigate the two-reallocations issue for iptunnels | expand |
On 11/19/24 23:21, Justin Iurman wrote: > v5: > - address Paolo's comments > - s/int dst_dev_overhead()/unsigned int dst_dev_overhead()/ > v4: > - move static inline function to include/net/dst.h > v3: > - fix compilation error in seg6_iptunnel > v2: > - add missing "static" keywords in seg6_iptunnel > - use a static-inline function to return the dev overhead (as suggested > by Olek, thanks) > > The same pattern is found in ioam6, rpl6, and seg6. Basically, it first > makes sure there is enough room for inserting a new header: > > (1) err = skb_cow_head(skb, len + skb->mac_len); > > Then, when the insertion (encap or inline) is performed, the input and > output handlers respectively make sure there is enough room for layer 2: > > (2) err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); > > skb_cow_head() does nothing when there is enough room. Otherwise, it > reallocates more room, which depends on the architecture. Briefly, > skb_cow_head() calls __skb_cow() which then calls pskb_expand_head() as > follows: > > pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0, GFP_ATOMIC); > > "delta" represents the number of bytes to be added. This value is > aligned with NET_SKB_PAD, which is defined as follows: > > NET_SKB_PAD = max(32, L1_CACHE_BYTES) > > ... where L1_CACHE_BYTES also depends on the architecture. In our case > (x86), it is defined as follows: > > L1_CACHE_BYTES = (1 << CONFIG_X86_L1_CACHE_SHIFT) > > ... where (again, in our case) CONFIG_X86_L1_CACHE_SHIFT equals 6 > (=X86_GENERIC). > > All this to say, skb_cow_head() would reallocate to the next multiple of > NET_SKB_PAD (in our case a 64-byte multiple) when there is not enough > room. > > Back to the main issue with the pattern: in some cases, two > reallocations are triggered, resulting in a performance drop (i.e., > lines (1) and (2) would both trigger an implicit reallocation). How's > that possible? Well, this is kind of bad luck as we hit an exact > NET_SKB_PAD boundary and when skb->mac_len (=14) is smaller than > LL_RESERVED_SPACE(dst->dev) (=16 in our case). For an x86 arch, it > happens in the following cases (with the default needed_headroom): > > - ioam6: > - (inline mode) pre-allocated data trace of 236 or 240 bytes > - (encap mode) pre-allocated data trace of 196 or 200 bytes > - seg6: > - (encap mode) for 13, 17, 21, 25, 29, 33, ...(+4)... prefixes > > Let's illustrate the problem, i.e., when we fall on the exact > NET_SKB_PAD boundary. In the case of ioam6, for the above problematic > values, the total overhead is 256 bytes for both modes. Based on line > (1), skb->mac_len (=14) is added, therefore passing 270 bytes to > skb_cow_head(). At that moment, the headroom has 206 bytes available (in > our case). Since 270 > 206, skb_cow_head() performs a reallocation and > the new headroom is now 206 + 64 (NET_SKB_PAD) = 270. Which is exactly > the room we needed. After the insertion, the headroom has 0 byte > available. But, there's line (2) where 16 bytes are still needed. Which, > again, triggers another reallocation. > > The same logic is applied to seg6 (although it does not happen with the > inline mode, i.e., -40 bytes). It happens with other L1 cache shifts too > (the larger the cache shift, the less often it happens). For example, > with a +32 cache shift (instead of +64), the following number of > segments would trigger two reallocations: 11, 15, 19, ... With a +128 > cache shift, the following number of segments would trigger two > reallocations: 17, 25, 33, ... And so on and so forth. Note that it is > the same for both the "encap" and "l2encap" modes. For the "encap.red" > and "l2encap.red" modes, it is the same logic but with "segs+1" (e.g., > 14, 18, 22, 26, etc for a +64 cache shift). Note also that it may happen > with rpl6 (based on some calculations), although it did not in our case. > > This series provides a solution to mitigate the aforementioned issue for > ioam6, seg6, and rpl6. It provides the dst_entry (in the cache) to > skb_cow_head() **before** the insertion (line (1)). As a result, the > very first iteration would still trigger two reallocations (i.e., empty > cache), while next iterations would only trigger a single reallocation. > > Justin Iurman (4): > include: net: add static inline dst_dev_overhead() to dst.h > net: ipv6: ioam6_iptunnel: mitigate 2-realloc issue > net: ipv6: seg6_iptunnel: mitigate 2-realloc issue > net: ipv6: rpl_iptunnel: mitigate 2-realloc issue > > include/net/dst.h | 9 +++++ > net/ipv6/ioam6_iptunnel.c | 73 ++++++++++++++++----------------- > net/ipv6/rpl_iptunnel.c | 46 +++++++++++---------- > net/ipv6/seg6_iptunnel.c | 85 ++++++++++++++++++++++++--------------- > 4 files changed, 123 insertions(+), 90 deletions(-) > Sorry, I just noticed I missed net-next window. Will resubmit when it reopens.