Message ID | 20250410152432.30246-3-justin.iurman@uliege.be (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Mitigate double allocations in ioam6_iptunnel | expand |
On 4/10/25 17:24, Justin Iurman wrote: > If the dst_entry is the same post transformation (which is a valid use > case for IOAM), we don't add it to the cache to avoid a reference loop. > Instead, we use a "fake" dst_entry and add it to the cache as a signal. > When we read the cache, we compare it with our "fake" dst_entry and > therefore detect if we're in the special case. FYI: double reallocation also happens in seg6_iptunnel and rpl_iptunnel and was fixed previously. However, they don't need this patch, since it's for the case "dst_entry is the same post transformation", which is due to pathological configurations in their cases.
On 4/10/25 5:24 PM, Justin Iurman wrote: > If the dst_entry is the same post transformation (which is a valid use > case for IOAM), we don't add it to the cache to avoid a reference loop. > Instead, we use a "fake" dst_entry and add it to the cache as a signal. > When we read the cache, we compare it with our "fake" dst_entry and > therefore detect if we're in the special case. > > Signed-off-by: Justin Iurman <justin.iurman@uliege.be> > --- > net/ipv6/ioam6_iptunnel.c | 40 +++++++++++++++++++++++++++++++++++++-- > 1 file changed, 38 insertions(+), 2 deletions(-) > > diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c > index 57200b9991a1..bbfb7dd7fa61 100644 > --- a/net/ipv6/ioam6_iptunnel.c > +++ b/net/ipv6/ioam6_iptunnel.c > @@ -38,6 +38,7 @@ struct ioam6_lwt_freq { > }; > > struct ioam6_lwt { > + struct dst_entry null_dst; > struct dst_cache cache; > struct ioam6_lwt_freq freq; > atomic_t pkt_cnt; > @@ -177,6 +178,16 @@ static int ioam6_build_state(struct net *net, struct nlattr *nla, > if (err) > goto free_lwt; > > + /* We set DST_NOCOUNT, even though this "fake" dst_entry will be stored > + * in a dst_cache, which will call dst_hold() and dst_release() on it. > + * These functions don't check for DST_NOCOUNT and modify the reference > + * count anyway. This is not really a problem, as long as we make sure > + * that dst_destroy() won't be called (which is the case since the > + * initial refcount is 1, then +1 to store it in the cache, and then > + * +1/-1 each time we read the cache and release it). AFAICS the DST_NOCOUNT flag is not related to the dst reference counting, but to the dst number accounting for gc's purpose. The above comment is misleading and should be rephrased, to avoid confusing whoever is going to look at this code later. > + */ > + dst_init(&ilwt->null_dst, NULL, NULL, DST_OBSOLETE_NONE, DST_NOCOUNT); > + > atomic_set(&ilwt->pkt_cnt, 0); > ilwt->freq.k = freq_k; > ilwt->freq.n = freq_n; > @@ -356,6 +367,17 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) > dst = dst_cache_get(&ilwt->cache); > local_bh_enable(); > > + /* This is how we notify that the destination does not change after > + * transformation and that we need to use orig_dst instead of the cache > + */ > + if (dst == &ilwt->null_dst) { > + dst_release(dst); > + > + dst = orig_dst; > + /* keep refcount balance: dst_release() is called at the end */ > + dst_hold(dst); > + } > + > switch (ilwt->mode) { > case IOAM6_IPTUNNEL_MODE_INLINE: > do_inline: > @@ -408,8 +430,18 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) > goto drop; > } > > - /* cache only if we don't create a dst reference loop */ > - if (orig_dst->lwtstate != dst->lwtstate) { > + /* If the destination is the same after transformation (which is > + * a valid use case for IOAM), then we don't want to add it to > + * the cache in order to avoid a reference loop. Instead, we add > + * our fake dst_entry to the cache as a way to detect this case. > + * Otherwise, we add the resolved destination to the cache. > + */ > + if (orig_dst->lwtstate == dst->lwtstate) { > + local_bh_disable(); > + dst_cache_set_ip6(&ilwt->cache, > + &ilwt->null_dst, &fl6.saddr); > + local_bh_enable(); > + } else { > local_bh_disable(); > dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr); > local_bh_enable(); Possibly move the BH disable/enable around the if statement for smaller/more readable code. Otherwise LGTM, thanks! Paolo
diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c index 57200b9991a1..bbfb7dd7fa61 100644 --- a/net/ipv6/ioam6_iptunnel.c +++ b/net/ipv6/ioam6_iptunnel.c @@ -38,6 +38,7 @@ struct ioam6_lwt_freq { }; struct ioam6_lwt { + struct dst_entry null_dst; struct dst_cache cache; struct ioam6_lwt_freq freq; atomic_t pkt_cnt; @@ -177,6 +178,16 @@ static int ioam6_build_state(struct net *net, struct nlattr *nla, if (err) goto free_lwt; + /* We set DST_NOCOUNT, even though this "fake" dst_entry will be stored + * in a dst_cache, which will call dst_hold() and dst_release() on it. + * These functions don't check for DST_NOCOUNT and modify the reference + * count anyway. This is not really a problem, as long as we make sure + * that dst_destroy() won't be called (which is the case since the + * initial refcount is 1, then +1 to store it in the cache, and then + * +1/-1 each time we read the cache and release it). + */ + dst_init(&ilwt->null_dst, NULL, NULL, DST_OBSOLETE_NONE, DST_NOCOUNT); + atomic_set(&ilwt->pkt_cnt, 0); ilwt->freq.k = freq_k; ilwt->freq.n = freq_n; @@ -356,6 +367,17 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) dst = dst_cache_get(&ilwt->cache); local_bh_enable(); + /* This is how we notify that the destination does not change after + * transformation and that we need to use orig_dst instead of the cache + */ + if (dst == &ilwt->null_dst) { + dst_release(dst); + + dst = orig_dst; + /* keep refcount balance: dst_release() is called at the end */ + dst_hold(dst); + } + switch (ilwt->mode) { case IOAM6_IPTUNNEL_MODE_INLINE: do_inline: @@ -408,8 +430,18 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) goto drop; } - /* cache only if we don't create a dst reference loop */ - if (orig_dst->lwtstate != dst->lwtstate) { + /* If the destination is the same after transformation (which is + * a valid use case for IOAM), then we don't want to add it to + * the cache in order to avoid a reference loop. Instead, we add + * our fake dst_entry to the cache as a way to detect this case. + * Otherwise, we add the resolved destination to the cache. + */ + if (orig_dst->lwtstate == dst->lwtstate) { + local_bh_disable(); + dst_cache_set_ip6(&ilwt->cache, + &ilwt->null_dst, &fl6.saddr); + local_bh_enable(); + } else { local_bh_disable(); dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr); local_bh_enable(); @@ -439,6 +471,10 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) static void ioam6_destroy_state(struct lwtunnel_state *lwt) { + /* Since the refcount of per-cpu dst_entry caches will never be 0 (see + * why above) when our "fake" dst_entry is used, it is not necessary to + * remove them before calling dst_cache_destroy() + */ dst_cache_destroy(&ioam6_lwt_state(lwt)->cache); }
If the dst_entry is the same post transformation (which is a valid use case for IOAM), we don't add it to the cache to avoid a reference loop. Instead, we use a "fake" dst_entry and add it to the cache as a signal. When we read the cache, we compare it with our "fake" dst_entry and therefore detect if we're in the special case. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> --- net/ipv6/ioam6_iptunnel.c | 40 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 38 insertions(+), 2 deletions(-)