diff mbox series

[net,1/2] net: do not keep the dst cache when uncloning an skb dst and its metadata

Message ID 20220202110137.470850-2-atenart@kernel.org (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series net: fix issues when uncloning an skb dst+metadata | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 75 this patch: 75
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 25 this patch: 25
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 94 this patch: 94
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Antoine Tenart Feb. 2, 2022, 11:01 a.m. UTC
When uncloning an skb dst and its associated metadata a new dst+metadata
is allocated and the tunnel information from the old metadata is copied
over there.

The issue is the tunnel metadata has references to cached dst, which are
copied along the way. When a dst+metadata refcount drops to 0 the
metadata is freed including the cached dst entries. As they are also
referenced in the initial dst+metadata, this ends up in UaFs.

In practice the above did not happen because of another issue, the
dst+metadata was never freed because its refcount never dropped to 0
(this will be fixed in a subsequent patch).

Fix this by initializing the dst cache after copying the tunnel
information from the old metadata to also unshare the dst cache.

Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Vlad Buslov <vladbu@nvidia.com>
Tested-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 include/net/dst_metadata.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Antoine Tenart Feb. 2, 2022, 11:17 a.m. UTC | #1
Quoting Antoine Tenart (2022-02-02 12:01:36)
> When uncloning an skb dst and its associated metadata a new dst+metadata
> is allocated and the tunnel information from the old metadata is copied
> over there.
> 
> The issue is the tunnel metadata has references to cached dst, which are
> copied along the way. When a dst+metadata refcount drops to 0 the
> metadata is freed including the cached dst entries. As they are also
> referenced in the initial dst+metadata, this ends up in UaFs.
> 
> In practice the above did not happen because of another issue, the
> dst+metadata was never freed because its refcount never dropped to 0
> (this will be fixed in a subsequent patch).
> 
> Fix this by initializing the dst cache after copying the tunnel
> information from the old metadata to also unshare the dst cache.
> 
> Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
> Cc: Paolo Abeni <pabeni@redhat.com>
> Reported-by: Vlad Buslov <vladbu@nvidia.com>
> Tested-by: Vlad Buslov <vladbu@nvidia.com>
> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> ---
>  include/net/dst_metadata.h | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 14efa0ded75d..c8f8b7b56bba 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
>  static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>  {
>         struct metadata_dst *md_dst = skb_metadata_dst(skb);
> -       int md_size;
>         struct metadata_dst *new_md;
> +       int md_size, ret;

Hmmm ret should probably be defined inside a CONFIG_DST_CACHE section.

>         if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
>                 return ERR_PTR(-EINVAL);
> @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>  
>         memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
>                sizeof(struct ip_tunnel_info) + md_size);
> +#ifdef CONFIG_DST_CACHE
> +       ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
> +       if (ret) {
> +               /* We can't call metadata_dst_free directly as the still shared
> +                * dst cache would be released.
> +                */
> +               kfree(new_md);
> +               return ERR_PTR(ret);
> +       }
> +#endif
> +
>         skb_dst_drop(skb);
>         dst_hold(&new_md->dst);
>         skb_dst_set(skb, &new_md->dst);
> -- 
> 2.34.1
>
Daniel Borkmann Feb. 2, 2022, 12:13 p.m. UTC | #2
On 2/2/22 12:01 PM, Antoine Tenart wrote:
> When uncloning an skb dst and its associated metadata a new dst+metadata
> is allocated and the tunnel information from the old metadata is copied
> over there.
> 
> The issue is the tunnel metadata has references to cached dst, which are
> copied along the way. When a dst+metadata refcount drops to 0 the
> metadata is freed including the cached dst entries. As they are also
> referenced in the initial dst+metadata, this ends up in UaFs.
> 
> In practice the above did not happen because of another issue, the
> dst+metadata was never freed because its refcount never dropped to 0
> (this will be fixed in a subsequent patch).
> 
> Fix this by initializing the dst cache after copying the tunnel
> information from the old metadata to also unshare the dst cache.
> 
> Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
> Cc: Paolo Abeni <pabeni@redhat.com>
> Reported-by: Vlad Buslov <vladbu@nvidia.com>
> Tested-by: Vlad Buslov <vladbu@nvidia.com>
> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> ---
>   include/net/dst_metadata.h | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 14efa0ded75d..c8f8b7b56bba 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
>   static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>   {
>   	struct metadata_dst *md_dst = skb_metadata_dst(skb);
> -	int md_size;
>   	struct metadata_dst *new_md;
> +	int md_size, ret;
>   
>   	if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
>   		return ERR_PTR(-EINVAL);
> @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>   
>   	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
>   	       sizeof(struct ip_tunnel_info) + md_size);
> +#ifdef CONFIG_DST_CACHE
> +	ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
> +	if (ret) {
> +		/* We can't call metadata_dst_free directly as the still shared
> +		 * dst cache would be released.
> +		 */
> +		kfree(new_md);
> +		return ERR_PTR(ret);
> +	}
> +#endif

Could you elaborate (e.g. also in commit message) how this interacts or whether it is
needed for TUNNEL_NOCACHE users? (Among others, latter is used by BPF, for example.)

>   	skb_dst_drop(skb);
>   	dst_hold(&new_md->dst);
>   	skb_dst_set(skb, &new_md->dst);
>
Antoine Tenart Feb. 2, 2022, 1:44 p.m. UTC | #3
Quoting Daniel Borkmann (2022-02-02 13:13:30)
> On 2/2/22 12:01 PM, Antoine Tenart wrote:
> > When uncloning an skb dst and its associated metadata a new dst+metadata
> > is allocated and the tunnel information from the old metadata is copied
> > over there.
> > 
> > The issue is the tunnel metadata has references to cached dst, which are
> > copied along the way. When a dst+metadata refcount drops to 0 the
> > metadata is freed including the cached dst entries. As they are also
> > referenced in the initial dst+metadata, this ends up in UaFs.
> > 
> > In practice the above did not happen because of another issue, the
> > dst+metadata was never freed because its refcount never dropped to 0
> > (this will be fixed in a subsequent patch).
> > 
> > Fix this by initializing the dst cache after copying the tunnel
> > information from the old metadata to also unshare the dst cache.
> > 
> > Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Reported-by: Vlad Buslov <vladbu@nvidia.com>
> > Tested-by: Vlad Buslov <vladbu@nvidia.com>
> > Signed-off-by: Antoine Tenart <atenart@kernel.org>
> > ---
> >   include/net/dst_metadata.h | 13 ++++++++++++-
> >   1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> > index 14efa0ded75d..c8f8b7b56bba 100644
> > --- a/include/net/dst_metadata.h
> > +++ b/include/net/dst_metadata.h
> > @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
> >   static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> >   {
> >       struct metadata_dst *md_dst = skb_metadata_dst(skb);
> > -     int md_size;
> >       struct metadata_dst *new_md;
> > +     int md_size, ret;
> >   
> >       if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
> >               return ERR_PTR(-EINVAL);
> > @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> >   
> >       memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> >              sizeof(struct ip_tunnel_info) + md_size);
> > +#ifdef CONFIG_DST_CACHE
> > +     ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
> > +     if (ret) {
> > +             /* We can't call metadata_dst_free directly as the still shared
> > +              * dst cache would be released.
> > +              */
> > +             kfree(new_md);
> > +             return ERR_PTR(ret);
> > +     }
> > +#endif
> 
> Could you elaborate (e.g. also in commit message) how this interacts
> or whether it is needed for TUNNEL_NOCACHE users? (Among others,
> latter is used by BPF, for example.)

My understanding is that TUNNEL_NOCACHE is used to decide whether or not
to use a dst cache, that might or might not come from the tunnel info
attached to an skb. The dst cache being allocated in a tunnel info is
orthogonal to the use of TUNNEL_NOCACHE. While looking around I actually
found a code path explicitly setting both, in nft_tunnel_obj_init (that
might need to be investigated though but it is another topic).

It doesn't look like initializing the dst cache would break
TUNNEL_NOCACHE users as ip_tunnel_dst_cache_usable would return false
anyway. Having said that, we probably want to unshare the dst cache only
if there is one already, checking for
'md_dst->u.tun_info.dst_cache.cache != NULL' first.

Does that make sense?

Thanks!
Antoine
Paolo Abeni Feb. 2, 2022, 2:24 p.m. UTC | #4
On Wed, 2022-02-02 at 12:01 +0100, Antoine Tenart wrote:
> When uncloning an skb dst and its associated metadata a new dst+metadata
> is allocated and the tunnel information from the old metadata is copied
> over there.
> 
> The issue is the tunnel metadata has references to cached dst, which are
> copied along the way. When a dst+metadata refcount drops to 0 the
> metadata is freed including the cached dst entries. As they are also
> referenced in the initial dst+metadata, this ends up in UaFs.
> 
> In practice the above did not happen because of another issue, the
> dst+metadata was never freed because its refcount never dropped to 0
> (this will be fixed in a subsequent patch).
> 
> Fix this by initializing the dst cache after copying the tunnel
> information from the old metadata to also unshare the dst cache.
> 
> Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
> Cc: Paolo Abeni <pabeni@redhat.com>
> Reported-by: Vlad Buslov <vladbu@nvidia.com>
> Tested-by: Vlad Buslov <vladbu@nvidia.com>
> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> ---
>  include/net/dst_metadata.h | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 14efa0ded75d..c8f8b7b56bba 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
>  static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>  {
>  	struct metadata_dst *md_dst = skb_metadata_dst(skb);
> -	int md_size;
>  	struct metadata_dst *new_md;
> +	int md_size, ret;
>  
>  	if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
>  		return ERR_PTR(-EINVAL);
> @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>  
>  	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
>  	       sizeof(struct ip_tunnel_info) + md_size);
> +#ifdef CONFIG_DST_CACHE
> +	ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
> +	if (ret) {
> +		/* We can't call metadata_dst_free directly as the still shared
> +		 * dst cache would be released.
> +		 */
> +		kfree(new_md);

I think here you can use metadata_dst_free(): if dst_cache_init fails,
the dst_cache will be zeroed.

> +		return ERR_PTR(ret);
> +	}
> +#endif
> +
>  	skb_dst_drop(skb);
>  	dst_hold(&new_md->dst);
>  	skb_dst_set(skb, &new_md->dst);

Other than that LGTM, thanks!

/P
Antoine Tenart Feb. 2, 2022, 2:29 p.m. UTC | #5
Quoting Paolo Abeni (2022-02-02 15:24:51)
> On Wed, 2022-02-02 at 12:01 +0100, Antoine Tenart wrote:
> >       memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> >              sizeof(struct ip_tunnel_info) + md_size);
> > +#ifdef CONFIG_DST_CACHE
> > +     ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
> > +     if (ret) {
> > +             /* We can't call metadata_dst_free directly as the still shared
> > +              * dst cache would be released.
> > +              */
> > +             kfree(new_md);
> 
> I think here you can use metadata_dst_free(): if dst_cache_init fails,
> the dst_cache will be zeroed.

You're right, I'll use it in v2.

Thanks!
Antoine
Daniel Borkmann Feb. 4, 2022, 12:33 p.m. UTC | #6
On 2/2/22 2:44 PM, Antoine Tenart wrote:
> Quoting Daniel Borkmann (2022-02-02 13:13:30)
>> On 2/2/22 12:01 PM, Antoine Tenart wrote:
>>> When uncloning an skb dst and its associated metadata a new dst+metadata
>>> is allocated and the tunnel information from the old metadata is copied
>>> over there.
>>>
>>> The issue is the tunnel metadata has references to cached dst, which are
>>> copied along the way. When a dst+metadata refcount drops to 0 the
>>> metadata is freed including the cached dst entries. As they are also
>>> referenced in the initial dst+metadata, this ends up in UaFs.
>>>
>>> In practice the above did not happen because of another issue, the
>>> dst+metadata was never freed because its refcount never dropped to 0
>>> (this will be fixed in a subsequent patch).
>>>
>>> Fix this by initializing the dst cache after copying the tunnel
>>> information from the old metadata to also unshare the dst cache.
>>>
>>> Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>> Reported-by: Vlad Buslov <vladbu@nvidia.com>
>>> Tested-by: Vlad Buslov <vladbu@nvidia.com>
>>> Signed-off-by: Antoine Tenart <atenart@kernel.org>
>>> ---
>>>    include/net/dst_metadata.h | 13 ++++++++++++-
>>>    1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
>>> index 14efa0ded75d..c8f8b7b56bba 100644
>>> --- a/include/net/dst_metadata.h
>>> +++ b/include/net/dst_metadata.h
>>> @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
>>>    static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>>>    {
>>>        struct metadata_dst *md_dst = skb_metadata_dst(skb);
>>> -     int md_size;
>>>        struct metadata_dst *new_md;
>>> +     int md_size, ret;
>>>    
>>>        if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
>>>                return ERR_PTR(-EINVAL);
>>> @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>>>    
>>>        memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
>>>               sizeof(struct ip_tunnel_info) + md_size);
>>> +#ifdef CONFIG_DST_CACHE
>>> +     ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
>>> +     if (ret) {
>>> +             /* We can't call metadata_dst_free directly as the still shared
>>> +              * dst cache would be released.
>>> +              */
>>> +             kfree(new_md);
>>> +             return ERR_PTR(ret);
>>> +     }
>>> +#endif
>>
>> Could you elaborate (e.g. also in commit message) how this interacts
>> or whether it is needed for TUNNEL_NOCACHE users? (Among others,
>> latter is used by BPF, for example.)
> 
> My understanding is that TUNNEL_NOCACHE is used to decide whether or not
> to use a dst cache, that might or might not come from the tunnel info
> attached to an skb. The dst cache being allocated in a tunnel info is
> orthogonal to the use of TUNNEL_NOCACHE. While looking around I actually
> found a code path explicitly setting both, in nft_tunnel_obj_init (that
> might need to be investigated though but it is another topic).

Good point, this is coming from 3e511d5652ce ("netfilter: nft_tunnel: Add dst_cache
support") and was added only after af308b94a2a4 ("netfilter: nf_tables: add tunnel
support") which initially indicated TUNNEL_NOCACHE. This is indeed contradictory.
wenxu (+Cc), ptal.

> It doesn't look like initializing the dst cache would break
> TUNNEL_NOCACHE users as ip_tunnel_dst_cache_usable would return false
> anyway. Having said that, we probably want to unshare the dst cache only
> if there is one already, checking for
> 'md_dst->u.tun_info.dst_cache.cache != NULL' first.

Meaning, if that is the case, we wouldn't require the dst_cache_init() and thus
extra alloc, right? Would make sense afaics. db3c6139e6ea ("bpf, vxlan, geneve,
gre: fix usage of dst_cache on xmit") had some details related to BPF use.

Thanks again!
Daniel
Antoine Tenart Feb. 4, 2022, 4:19 p.m. UTC | #7
Quoting Daniel Borkmann (2022-02-04 13:33:20)
> On 2/2/22 2:44 PM, Antoine Tenart wrote:
> > Quoting Daniel Borkmann (2022-02-02 13:13:30)
> >> On 2/2/22 12:01 PM, Antoine Tenart wrote:
> >>> When uncloning an skb dst and its associated metadata a new dst+metadata
> >>> is allocated and the tunnel information from the old metadata is copied
> >>> over there.
> >>>
> >>> The issue is the tunnel metadata has references to cached dst, which are
> >>> copied along the way. When a dst+metadata refcount drops to 0 the
> >>> metadata is freed including the cached dst entries. As they are also
> >>> referenced in the initial dst+metadata, this ends up in UaFs.
> >>>
> >>> In practice the above did not happen because of another issue, the
> >>> dst+metadata was never freed because its refcount never dropped to 0
> >>> (this will be fixed in a subsequent patch).
> >>>
> >>> Fix this by initializing the dst cache after copying the tunnel
> >>> information from the old metadata to also unshare the dst cache.
> >>>
> >>> Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
> >>> Cc: Paolo Abeni <pabeni@redhat.com>
> >>> Reported-by: Vlad Buslov <vladbu@nvidia.com>
> >>> Tested-by: Vlad Buslov <vladbu@nvidia.com>
> >>> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> >>> ---
> >>>    include/net/dst_metadata.h | 13 ++++++++++++-
> >>>    1 file changed, 12 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> >>> index 14efa0ded75d..c8f8b7b56bba 100644
> >>> --- a/include/net/dst_metadata.h
> >>> +++ b/include/net/dst_metadata.h
> >>> @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
> >>>    static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> >>>    {
> >>>        struct metadata_dst *md_dst = skb_metadata_dst(skb);
> >>> -     int md_size;
> >>>        struct metadata_dst *new_md;
> >>> +     int md_size, ret;
> >>>    
> >>>        if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
> >>>                return ERR_PTR(-EINVAL);
> >>> @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> >>>    
> >>>        memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> >>>               sizeof(struct ip_tunnel_info) + md_size);
> >>> +#ifdef CONFIG_DST_CACHE
> >>> +     ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
> >>> +     if (ret) {
> >>> +             /* We can't call metadata_dst_free directly as the still shared
> >>> +              * dst cache would be released.
> >>> +              */
> >>> +             kfree(new_md);
> >>> +             return ERR_PTR(ret);
> >>> +     }
> >>> +#endif
> >>
> >> Could you elaborate (e.g. also in commit message) how this interacts
> >> or whether it is needed for TUNNEL_NOCACHE users? (Among others,
> >> latter is used by BPF, for example.)
> > 
> > My understanding is that TUNNEL_NOCACHE is used to decide whether or not
> > to use a dst cache, that might or might not come from the tunnel info
> > attached to an skb. The dst cache being allocated in a tunnel info is
> > orthogonal to the use of TUNNEL_NOCACHE. While looking around I actually
> > found a code path explicitly setting both, in nft_tunnel_obj_init (that
> > might need to be investigated though but it is another topic).
> 
> Good point, this is coming from 3e511d5652ce ("netfilter: nft_tunnel: Add dst_cache
> support") and was added only after af308b94a2a4 ("netfilter: nf_tables: add tunnel
> support") which initially indicated TUNNEL_NOCACHE. This is indeed contradictory.
> wenxu (+Cc), ptal.
> 
> > It doesn't look like initializing the dst cache would break
> > TUNNEL_NOCACHE users as ip_tunnel_dst_cache_usable would return false
> > anyway. Having said that, we probably want to unshare the dst cache only
> > if there is one already, checking for
> > 'md_dst->u.tun_info.dst_cache.cache != NULL' first.
> 
> Meaning, if that is the case, we wouldn't require the dst_cache_init()
> and thus extra alloc, right? Would make sense afaics.

Meaning:

          memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
                 sizeof(struct ip_tunnel_info) + md_size);
  +#ifdef CONFIG_DST_CACHE
  +       if (new_md->u.tun_info.dst_cache.cache) {
  +               int ret = dst_cache_init(&new_md->u.tun_info.dst_cache,
  +                                        GFP_ATOMIC);
  +               if (ret) {
  +                       metadata_dst_free(new_md);
  +                       return ERR_PTR(ret);
  +               }
  +       }
  +#endif

So that the cache is unshared only if there was one in the first place.
If there was no cache to unshare, we can save the extra alloc.

> db3c6139e6ea ("bpf, vxlan, geneve, gre: fix usage of dst_cache on
> xmit") had some details related to BPF use.

With the above commit if TUNNEL_NOCACHE is set the tunnel cache
shouldn't be used, regardless of it being allocated. I guess with that
and the above, we should be good.

Thanks!
Antoine
diff mbox series

Patch

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 14efa0ded75d..c8f8b7b56bba 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -110,8 +110,8 @@  static inline struct metadata_dst *tun_rx_dst(int md_size)
 static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 {
 	struct metadata_dst *md_dst = skb_metadata_dst(skb);
-	int md_size;
 	struct metadata_dst *new_md;
+	int md_size, ret;
 
 	if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
 		return ERR_PTR(-EINVAL);
@@ -123,6 +123,17 @@  static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 
 	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
 	       sizeof(struct ip_tunnel_info) + md_size);
+#ifdef CONFIG_DST_CACHE
+	ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
+	if (ret) {
+		/* We can't call metadata_dst_free directly as the still shared
+		 * dst cache would be released.
+		 */
+		kfree(new_md);
+		return ERR_PTR(ret);
+	}
+#endif
+
 	skb_dst_drop(skb);
 	dst_hold(&new_md->dst);
 	skb_dst_set(skb, &new_md->dst);