diff mbox series

[net-next,2/2] net: ipv6: ioam6: fix double reallocation

Message ID 20250410152432.30246-3-justin.iurman@uliege.be (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Mitigate double allocations in ioam6_iptunnel | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 70 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-04-11--03-00 (tests: 900)

Commit Message

Justin Iurman April 10, 2025, 3:24 p.m. UTC
If the dst_entry is the same post transformation (which is a valid use
case for IOAM), we don't add it to the cache to avoid a reference loop.
Instead, we use a "fake" dst_entry and add it to the cache as a signal.
When we read the cache, we compare it with our "fake" dst_entry and
therefore detect if we're in the special case.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 net/ipv6/ioam6_iptunnel.c | 40 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

Comments

Justin Iurman April 10, 2025, 4:16 p.m. UTC | #1
On 4/10/25 17:24, Justin Iurman wrote:
> If the dst_entry is the same post transformation (which is a valid use
> case for IOAM), we don't add it to the cache to avoid a reference loop.
> Instead, we use a "fake" dst_entry and add it to the cache as a signal.
> When we read the cache, we compare it with our "fake" dst_entry and
> therefore detect if we're in the special case.

FYI: double reallocation also happens in seg6_iptunnel and rpl_iptunnel 
and was fixed previously. However, they don't need this patch, since 
it's for the case "dst_entry is the same post transformation", which is 
due to pathological configurations in their cases.
Paolo Abeni April 15, 2025, 9:59 a.m. UTC | #2
On 4/10/25 5:24 PM, Justin Iurman wrote:
> If the dst_entry is the same post transformation (which is a valid use
> case for IOAM), we don't add it to the cache to avoid a reference loop.
> Instead, we use a "fake" dst_entry and add it to the cache as a signal.
> When we read the cache, we compare it with our "fake" dst_entry and
> therefore detect if we're in the special case.
> 
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
>  net/ipv6/ioam6_iptunnel.c | 40 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
> index 57200b9991a1..bbfb7dd7fa61 100644
> --- a/net/ipv6/ioam6_iptunnel.c
> +++ b/net/ipv6/ioam6_iptunnel.c
> @@ -38,6 +38,7 @@ struct ioam6_lwt_freq {
>  };
>  
>  struct ioam6_lwt {
> +	struct dst_entry null_dst;
>  	struct dst_cache cache;
>  	struct ioam6_lwt_freq freq;
>  	atomic_t pkt_cnt;
> @@ -177,6 +178,16 @@ static int ioam6_build_state(struct net *net, struct nlattr *nla,
>  	if (err)
>  		goto free_lwt;
>  
> +	/* We set DST_NOCOUNT, even though this "fake" dst_entry will be stored
> +	 * in a dst_cache, which will call dst_hold() and dst_release() on it.
> +	 * These functions don't check for DST_NOCOUNT and modify the reference
> +	 * count anyway. This is not really a problem, as long as we make sure
> +	 * that dst_destroy() won't be called (which is the case since the
> +	 * initial refcount is 1, then +1 to store it in the cache, and then
> +	 * +1/-1 each time we read the cache and release it).

AFAICS the DST_NOCOUNT flag is not related to the dst reference
counting, but to the dst number accounting for gc's purpose.

The above comment is misleading and should be rephrased, to avoid
confusing whoever is going to look at this code later.



> +	 */
> +	dst_init(&ilwt->null_dst, NULL, NULL, DST_OBSOLETE_NONE, DST_NOCOUNT);
> +
>  	atomic_set(&ilwt->pkt_cnt, 0);
>  	ilwt->freq.k = freq_k;
>  	ilwt->freq.n = freq_n;
> @@ -356,6 +367,17 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>  	dst = dst_cache_get(&ilwt->cache);
>  	local_bh_enable();
>  
> +	/* This is how we notify that the destination does not change after
> +	 * transformation and that we need to use orig_dst instead of the cache
> +	 */
> +	if (dst == &ilwt->null_dst) {
> +		dst_release(dst);
> +
> +		dst = orig_dst;
> +		/* keep refcount balance: dst_release() is called at the end */
> +		dst_hold(dst);
> +	}
> +
>  	switch (ilwt->mode) {
>  	case IOAM6_IPTUNNEL_MODE_INLINE:
>  do_inline:
> @@ -408,8 +430,18 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>  			goto drop;
>  		}
>  
> -		/* cache only if we don't create a dst reference loop */
> -		if (orig_dst->lwtstate != dst->lwtstate) {
> +		/* If the destination is the same after transformation (which is
> +		 * a valid use case for IOAM), then we don't want to add it to
> +		 * the cache in order to avoid a reference loop. Instead, we add
> +		 * our fake dst_entry to the cache as a way to detect this case.
> +		 * Otherwise, we add the resolved destination to the cache.
> +		 */
> +		if (orig_dst->lwtstate == dst->lwtstate) {
> +			local_bh_disable();
> +			dst_cache_set_ip6(&ilwt->cache,
> +					  &ilwt->null_dst, &fl6.saddr);
> +			local_bh_enable();
> +		} else {
>  			local_bh_disable();
>  			dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr);
>  			local_bh_enable();

Possibly move the BH disable/enable around the if statement for
smaller/more readable code.

Otherwise LGTM, thanks!

Paolo
diff mbox series

Patch

diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
index 57200b9991a1..bbfb7dd7fa61 100644
--- a/net/ipv6/ioam6_iptunnel.c
+++ b/net/ipv6/ioam6_iptunnel.c
@@ -38,6 +38,7 @@  struct ioam6_lwt_freq {
 };
 
 struct ioam6_lwt {
+	struct dst_entry null_dst;
 	struct dst_cache cache;
 	struct ioam6_lwt_freq freq;
 	atomic_t pkt_cnt;
@@ -177,6 +178,16 @@  static int ioam6_build_state(struct net *net, struct nlattr *nla,
 	if (err)
 		goto free_lwt;
 
+	/* We set DST_NOCOUNT, even though this "fake" dst_entry will be stored
+	 * in a dst_cache, which will call dst_hold() and dst_release() on it.
+	 * These functions don't check for DST_NOCOUNT and modify the reference
+	 * count anyway. This is not really a problem, as long as we make sure
+	 * that dst_destroy() won't be called (which is the case since the
+	 * initial refcount is 1, then +1 to store it in the cache, and then
+	 * +1/-1 each time we read the cache and release it).
+	 */
+	dst_init(&ilwt->null_dst, NULL, NULL, DST_OBSOLETE_NONE, DST_NOCOUNT);
+
 	atomic_set(&ilwt->pkt_cnt, 0);
 	ilwt->freq.k = freq_k;
 	ilwt->freq.n = freq_n;
@@ -356,6 +367,17 @@  static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	dst = dst_cache_get(&ilwt->cache);
 	local_bh_enable();
 
+	/* This is how we notify that the destination does not change after
+	 * transformation and that we need to use orig_dst instead of the cache
+	 */
+	if (dst == &ilwt->null_dst) {
+		dst_release(dst);
+
+		dst = orig_dst;
+		/* keep refcount balance: dst_release() is called at the end */
+		dst_hold(dst);
+	}
+
 	switch (ilwt->mode) {
 	case IOAM6_IPTUNNEL_MODE_INLINE:
 do_inline:
@@ -408,8 +430,18 @@  static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 			goto drop;
 		}
 
-		/* cache only if we don't create a dst reference loop */
-		if (orig_dst->lwtstate != dst->lwtstate) {
+		/* If the destination is the same after transformation (which is
+		 * a valid use case for IOAM), then we don't want to add it to
+		 * the cache in order to avoid a reference loop. Instead, we add
+		 * our fake dst_entry to the cache as a way to detect this case.
+		 * Otherwise, we add the resolved destination to the cache.
+		 */
+		if (orig_dst->lwtstate == dst->lwtstate) {
+			local_bh_disable();
+			dst_cache_set_ip6(&ilwt->cache,
+					  &ilwt->null_dst, &fl6.saddr);
+			local_bh_enable();
+		} else {
 			local_bh_disable();
 			dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr);
 			local_bh_enable();
@@ -439,6 +471,10 @@  static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 
 static void ioam6_destroy_state(struct lwtunnel_state *lwt)
 {
+	/* Since the refcount of per-cpu dst_entry caches will never be 0 (see
+	 * why above) when our "fake" dst_entry is used, it is not necessary to
+	 * remove them before calling dst_cache_destroy()
+	 */
 	dst_cache_destroy(&ioam6_lwt_state(lwt)->cache);
 }