diff mbox series

[net,v2] net: vlan: fix a UAF in vlan_dev_real_dev()

Message ID 20211102021218.955277-1-william.xuanziyang@huawei.com (mailing list archive)
State Accepted
Commit 563bcbae3ba233c275c244bfce2efe12938f5363
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] net: vlan: fix a UAF in vlan_dev_real_dev() | expand

Checks

Context Check Description
netdev/cover_letter success Single patches do not need cover letters
netdev/fixes_present success Fixes tag present in non-next series
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers warning 6 maintainers not CCed: pablo@netfilter.org eranbe@nvidia.com nbd@nbd.name arnd@arndb.de keescook@chromium.org zhudi21@huawei.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Fixes tag looks correct
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 18 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success No static functions without inline keyword in header files

Commit Message

Ziyang Xuan (William) Nov. 2, 2021, 2:12 a.m. UTC
The real_dev of a vlan net_device may be freed after
unregister_vlan_dev(). Access the real_dev continually by
vlan_dev_real_dev() will trigger the UAF problem for the
real_dev like following:

==================================================================
BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
Call Trace:
 kasan_report.cold+0x83/0xdf
 vlan_dev_real_dev+0xf9/0x120
 is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
 is_eth_port_of_netdev_filter+0x28/0x40
 ib_enum_roce_netdev+0x1a3/0x300
 ib_enum_all_roce_netdevs+0xc7/0x140
 netdevice_event_work_handler+0x9d/0x210
...

Freed by task 9288:
 kasan_save_stack+0x1b/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x20/0x30
 __kasan_slab_free+0xfc/0x130
 slab_free_freelist_hook+0xdd/0x240
 kfree+0xe4/0x690
 kvfree+0x42/0x50
 device_release+0x9f/0x240
 kobject_put+0x1c8/0x530
 put_device+0x1b/0x30
 free_netdev+0x370/0x540
 ppp_destroy_interface+0x313/0x3d0
...

Move the put_device(real_dev) to vlan_dev_free(). Ensure
real_dev not be freed before vlan_dev unregistered.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
---
 net/8021q/vlan.c     | 3 ---
 net/8021q/vlan_dev.c | 3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Jason Gunthorpe Nov. 3, 2021, 1:50 p.m. UTC | #1
On Tue, Nov 02, 2021 at 10:12:18AM +0800, Ziyang Xuan wrote:
> The real_dev of a vlan net_device may be freed after
> unregister_vlan_dev(). Access the real_dev continually by
> vlan_dev_real_dev() will trigger the UAF problem for the
> real_dev like following:
> 
> ==================================================================
> BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> Call Trace:
>  kasan_report.cold+0x83/0xdf
>  vlan_dev_real_dev+0xf9/0x120
>  is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
>  is_eth_port_of_netdev_filter+0x28/0x40
>  ib_enum_roce_netdev+0x1a3/0x300
>  ib_enum_all_roce_netdevs+0xc7/0x140
>  netdevice_event_work_handler+0x9d/0x210
> ...
> 
> Freed by task 9288:
>  kasan_save_stack+0x1b/0x40
>  kasan_set_track+0x1c/0x30
>  kasan_set_free_info+0x20/0x30
>  __kasan_slab_free+0xfc/0x130
>  slab_free_freelist_hook+0xdd/0x240
>  kfree+0xe4/0x690
>  kvfree+0x42/0x50
>  device_release+0x9f/0x240
>  kobject_put+0x1c8/0x530
>  put_device+0x1b/0x30
>  free_netdev+0x370/0x540
>  ppp_destroy_interface+0x313/0x3d0
> ...
> 
> Move the put_device(real_dev) to vlan_dev_free(). Ensure
> real_dev not be freed before vlan_dev unregistered.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> ---
>  net/8021q/vlan.c     | 3 ---
>  net/8021q/vlan_dev.c | 3 +++
>  2 files changed, 3 insertions(+), 3 deletions(-)

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

(though I can't tell either if there is a possiblecircular dep problem
in some oddball case)

Jason
patchwork-bot+netdevbpf@kernel.org Nov. 3, 2021, 2:30 p.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue, 2 Nov 2021 10:12:18 +0800 you wrote:
> The real_dev of a vlan net_device may be freed after
> unregister_vlan_dev(). Access the real_dev continually by
> vlan_dev_real_dev() will trigger the UAF problem for the
> real_dev like following:
> 
> ==================================================================
> BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> Call Trace:
>  kasan_report.cold+0x83/0xdf
>  vlan_dev_real_dev+0xf9/0x120
>  is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
>  is_eth_port_of_netdev_filter+0x28/0x40
>  ib_enum_roce_netdev+0x1a3/0x300
>  ib_enum_all_roce_netdevs+0xc7/0x140
>  netdevice_event_work_handler+0x9d/0x210
> ...
> 
> [...]

Here is the summary with links:
  - [net,v2] net: vlan: fix a UAF in vlan_dev_real_dev()
    https://git.kernel.org/netdev/net/c/563bcbae3ba2

You are awesome, thank you!
Jakub Kicinski Nov. 3, 2021, 3:47 p.m. UTC | #3
On Wed, 3 Nov 2021 10:50:34 -0300 Jason Gunthorpe wrote:
> (though I can't tell either if there is a possiblecircular dep problem
> in some oddball case)

Same, luckily we're just starting a new dev cycle and syzbot can have
at it. 

We should probably not let this patch get into stable right away -
assuming you agree - would you mind nacking the selection if it happens?
I'm not sure I'll get CCed since it doesn't have my tags.
Jason Gunthorpe Nov. 3, 2021, 4:11 p.m. UTC | #4
On Wed, Nov 03, 2021 at 08:47:46AM -0700, Jakub Kicinski wrote:
> On Wed, 3 Nov 2021 10:50:34 -0300 Jason Gunthorpe wrote:
> > (though I can't tell either if there is a possiblecircular dep problem
> > in some oddball case)
> 
> Same, luckily we're just starting a new dev cycle and syzbot can have
> at it. 
> 
> We should probably not let this patch get into stable right away -
> assuming you agree - would you mind nacking the selection if it happens?
> I'm not sure I'll get CCed since it doesn't have my tags.

I will make an effort, sure. It is hard to be confident due to all the
stable selection emails I get :|

Thanks,
Jason
Petr Machata Nov. 15, 2021, 5:04 p.m. UTC | #5
Ziyang Xuan <william.xuanziyang@huawei.com> writes:

> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 55275ef9a31a..a3a0a5e994f5 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>  	}
>  
>  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
> -
> -	/* Get rid of the vlan's reference to real_dev */
> -	dev_put(real_dev);
>  }
>  
>  int vlan_check_real_dev(struct net_device *real_dev,
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index 0c21d1fec852..aeeb5f90417b 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>  
>  	free_percpu(vlan->vlan_pcpu_stats);
>  	vlan->vlan_pcpu_stats = NULL;
> +
> +	/* Get rid of the vlan's reference to real_dev */
> +	dev_put(vlan->real_dev);
>  }
>  
>  void vlan_setup(struct net_device *dev)

This is causing reference counting issues when vetoing is involved.
Consider the following snippet:

    ip link add name bond1 type bond mode 802.3ad
    ip link set dev swp1 master bond1
    ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
    # ^ vetoed, no netdevice created
    ip link del dev bond1

The setup process goes like this: vlan_newlink() calls
register_vlan_dev() calls netdev_upper_dev_link() calls
__netdev_upper_dev_link(), which issues a notifier
NETDEV_PRECHANGEUPPER, which yields a non-zero error,
because a listener vetoed it.

So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
up decreasing reference count of the real_dev. Then when when the bond
netdevice is removed, we get an endless loop of:

    kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 

Moving the dev_hold(real_dev) to always happen even if the
netdev_upper_dev_link() call makes the issue go away.

I'm not sure why this wasn't happening before. After the veto,
register_vlan_dev() follows with a goto out_unregister_netdev, which
calls unregister_netdevice() calls unregister_netdevice_queue(), which
issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
which calls unregister_vlan_dev(), which used to dev_put(real_dev),
which seems like it should have caused the same issue. Dunno.
Jakub Kicinski Nov. 15, 2021, 5:49 p.m. UTC | #6
On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
> Ziyang Xuan <william.xuanziyang@huawei.com> writes:
> 
> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> > index 55275ef9a31a..a3a0a5e994f5 100644
> > --- a/net/8021q/vlan.c
> > +++ b/net/8021q/vlan.c
> > @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
> >  	}
> >  
> >  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
> > -
> > -	/* Get rid of the vlan's reference to real_dev */
> > -	dev_put(real_dev);
> >  }
> >  
> >  int vlan_check_real_dev(struct net_device *real_dev,
> > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> > index 0c21d1fec852..aeeb5f90417b 100644
> > --- a/net/8021q/vlan_dev.c
> > +++ b/net/8021q/vlan_dev.c
> > @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
> >  
> >  	free_percpu(vlan->vlan_pcpu_stats);
> >  	vlan->vlan_pcpu_stats = NULL;
> > +
> > +	/* Get rid of the vlan's reference to real_dev */
> > +	dev_put(vlan->real_dev);
> >  }
> >  
> >  void vlan_setup(struct net_device *dev)  
> 
> This is causing reference counting issues when vetoing is involved.
> Consider the following snippet:
> 
>     ip link add name bond1 type bond mode 802.3ad
>     ip link set dev swp1 master bond1
>     ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>     # ^ vetoed, no netdevice created
>     ip link del dev bond1
> 
> The setup process goes like this: vlan_newlink() calls
> register_vlan_dev() calls netdev_upper_dev_link() calls
> __netdev_upper_dev_link(), which issues a notifier
> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
> because a listener vetoed it.
> 
> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
> up decreasing reference count of the real_dev. Then when when the bond
> netdevice is removed, we get an endless loop of:
> 
>     kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 
> 
> Moving the dev_hold(real_dev) to always happen even if the
> netdev_upper_dev_link() call makes the issue go away.
> 
> I'm not sure why this wasn't happening before. After the veto,
> register_vlan_dev() follows with a goto out_unregister_netdev, which
> calls unregister_netdevice() calls unregister_netdevice_queue(), which
> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
> which calls unregister_vlan_dev(), which used to dev_put(real_dev),
> which seems like it should have caused the same issue. Dunno.

Does the notifier trigger unregister_vlan_dev()? I thought the notifier
triggers when lower dev is unregistered.

I think we should move the dev_hold() to ndo_init(), otherwise 
it's hard to reason if destructor was invoked or not if
register_netdevice() errors out.
Petr Machata Nov. 16, 2021, 2:20 p.m. UTC | #7
Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>
>> I'm not sure why this wasn't happening before. After the veto,
>> register_vlan_dev() follows with a goto out_unregister_netdev, which
>> calls unregister_netdevice() calls unregister_netdevice_queue(), which
>> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
>> which calls unregister_vlan_dev(), which used to dev_put(real_dev),
>> which seems like it should have caused the same issue. Dunno.
>
> Does the notifier trigger unregister_vlan_dev()? I thought the notifier
> triggers when lower dev is unregistered.

Right, I misinterpreted this bit:

        vlan_info = rtnl_dereference(dev->vlan_info);
        if (!vlan_info)
                goto out;
Petr Machata Nov. 17, 2021, 11:50 a.m. UTC | #8
Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>> Ziyang Xuan <william.xuanziyang@huawei.com> writes:
>> 
>> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> > index 55275ef9a31a..a3a0a5e994f5 100644
>> > --- a/net/8021q/vlan.c
>> > +++ b/net/8021q/vlan.c
>> > @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>> >  	}
>> >  
>> >  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>> > -
>> > -	/* Get rid of the vlan's reference to real_dev */
>> > -	dev_put(real_dev);
>> >  }
>> >  
>> >  int vlan_check_real_dev(struct net_device *real_dev,
>> > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> > index 0c21d1fec852..aeeb5f90417b 100644
>> > --- a/net/8021q/vlan_dev.c
>> > +++ b/net/8021q/vlan_dev.c
>> > @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>> >  
>> >  	free_percpu(vlan->vlan_pcpu_stats);
>> >  	vlan->vlan_pcpu_stats = NULL;
>> > +
>> > +	/* Get rid of the vlan's reference to real_dev */
>> > +	dev_put(vlan->real_dev);
>> >  }
>> >  
>> >  void vlan_setup(struct net_device *dev)  
>> 
>> This is causing reference counting issues when vetoing is involved.
>> Consider the following snippet:
>> 
>>     ip link add name bond1 type bond mode 802.3ad
>>     ip link set dev swp1 master bond1
>>     ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>>     # ^ vetoed, no netdevice created
>>     ip link del dev bond1
>> 
>> The setup process goes like this: vlan_newlink() calls
>> register_vlan_dev() calls netdev_upper_dev_link() calls
>> __netdev_upper_dev_link(), which issues a notifier
>> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
>> because a listener vetoed it.
>> 
>> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
>> up decreasing reference count of the real_dev. Then when when the bond
>> netdevice is removed, we get an endless loop of:
>> 
>>     kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 
>> 
>> Moving the dev_hold(real_dev) to always happen even if the
>> netdev_upper_dev_link() call makes the issue go away.
>
> I think we should move the dev_hold() to ndo_init(), otherwise 
> it's hard to reason if destructor was invoked or not if
> register_netdevice() errors out.

Ziyang Xuan, do you intend to take care of this?
Ziyang Xuan (William) Nov. 18, 2021, 1:46 a.m. UTC | #9
> 
> Jakub Kicinski <kuba@kernel.org> writes:
> 
>> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>>> Ziyang Xuan <william.xuanziyang@huawei.com> writes:
>>>
>>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>>>> index 55275ef9a31a..a3a0a5e994f5 100644
>>>> --- a/net/8021q/vlan.c
>>>> +++ b/net/8021q/vlan.c
>>>> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>>>>  	}
>>>>  
>>>>  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>>>> -
>>>> -	/* Get rid of the vlan's reference to real_dev */
>>>> -	dev_put(real_dev);
>>>>  }
>>>>  
>>>>  int vlan_check_real_dev(struct net_device *real_dev,
>>>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>>>> index 0c21d1fec852..aeeb5f90417b 100644
>>>> --- a/net/8021q/vlan_dev.c
>>>> +++ b/net/8021q/vlan_dev.c
>>>> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>>>>  
>>>>  	free_percpu(vlan->vlan_pcpu_stats);
>>>>  	vlan->vlan_pcpu_stats = NULL;
>>>> +
>>>> +	/* Get rid of the vlan's reference to real_dev */
>>>> +	dev_put(vlan->real_dev);
>>>>  }
>>>>  
>>>>  void vlan_setup(struct net_device *dev)  
>>>
>>> This is causing reference counting issues when vetoing is involved.
>>> Consider the following snippet:
>>>
>>>     ip link add name bond1 type bond mode 802.3ad
>>>     ip link set dev swp1 master bond1
>>>     ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>>>     # ^ vetoed, no netdevice created
>>>     ip link del dev bond1
>>>
>>> The setup process goes like this: vlan_newlink() calls
>>> register_vlan_dev() calls netdev_upper_dev_link() calls
>>> __netdev_upper_dev_link(), which issues a notifier
>>> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
>>> because a listener vetoed it.
>>>
>>> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
>>> up decreasing reference count of the real_dev. Then when when the bond
>>> netdevice is removed, we get an endless loop of:
>>>
>>>     kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 
>>>
>>> Moving the dev_hold(real_dev) to always happen even if the
>>> netdev_upper_dev_link() call makes the issue go away.
>>
>> I think we should move the dev_hold() to ndo_init(), otherwise 
>> it's hard to reason if destructor was invoked or not if
>> register_netdevice() errors out.
> 
> Ziyang Xuan, do you intend to take care of this?
> .

I am reading the related processes according to the problem scenario.
And I will give a more clear sequence and root cause as soon as possible
by some necessary tests.

Thank you!
Jakub Kicinski Nov. 18, 2021, 2:17 p.m. UTC | #10
On Thu, 18 Nov 2021 09:46:24 +0800 Ziyang Xuan (William) wrote:
> >> I think we should move the dev_hold() to ndo_init(), otherwise 
> >> it's hard to reason if destructor was invoked or not if
> >> register_netdevice() errors out.  
> > 
> > Ziyang Xuan, do you intend to take care of this?
> > .  
> 
> I am reading the related processes according to the problem scenario.
> And I will give a more clear sequence and root cause as soon as possible
> by some necessary tests.

Okay, I still don't see the fix. 

Petr would you mind submitting since you have a repro handy?
Ziyang Xuan (William) Nov. 19, 2021, 3:04 a.m. UTC | #11
> 
> Ziyang Xuan <william.xuanziyang@huawei.com> writes:
> 
>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> index 55275ef9a31a..a3a0a5e994f5 100644
>> --- a/net/8021q/vlan.c
>> +++ b/net/8021q/vlan.c
>> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>>  	}
>>  
>>  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>> -
>> -	/* Get rid of the vlan's reference to real_dev */
>> -	dev_put(real_dev);
>>  }
>>  
>>  int vlan_check_real_dev(struct net_device *real_dev,
>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> index 0c21d1fec852..aeeb5f90417b 100644
>> --- a/net/8021q/vlan_dev.c
>> +++ b/net/8021q/vlan_dev.c
>> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>>  
>>  	free_percpu(vlan->vlan_pcpu_stats);
>>  	vlan->vlan_pcpu_stats = NULL;
>> +
>> +	/* Get rid of the vlan's reference to real_dev */
>> +	dev_put(vlan->real_dev);
>>  }
>>  
>>  void vlan_setup(struct net_device *dev)
> 
> This is causing reference counting issues when vetoing is involved.
> Consider the following snippet:
> 
>     ip link add name bond1 type bond mode 802.3ad
>     ip link set dev swp1 master bond1
>     ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>     # ^ vetoed, no netdevice created
>     ip link del dev bond1
> 
> The setup process goes like this: vlan_newlink() calls
> register_vlan_dev() calls netdev_upper_dev_link() calls
> __netdev_upper_dev_link(), which issues a notifier
> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
> because a listener vetoed it.
> 
> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
> up decreasing reference count of the real_dev. Then when when the bond
> netdevice is removed, we get an endless loop of:
> 
>     kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 
> 
> Moving the dev_hold(real_dev) to always happen even if the
> netdev_upper_dev_link() call makes the issue go away.
> 
> I'm not sure why this wasn't happening before. After the veto,
> register_vlan_dev() follows with a goto out_unregister_netdev, which
> calls unregister_netdevice() calls unregister_netdevice_queue(), which
> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
> which calls unregister_vlan_dev(), which used to dev_put(real_dev),
> which seems like it should have caused the same issue. Dunno.

netdev_upper_dev_link() failure in register_vlan_dev() will result in
no dev_hold(real_dev) and vlan_group_set_device(). So NETDEV_UNREGISTER
notifier caused by vlan_dev invokes vlan_device_event() will not find the
vlan_dev in vlan_group, and no unregister_vlan_dev() for the vlan_dev.

>
Ziyang Xuan (William) Nov. 19, 2021, 3:29 a.m. UTC | #12
> On Thu, 18 Nov 2021 09:46:24 +0800 Ziyang Xuan (William) wrote:
>>>> I think we should move the dev_hold() to ndo_init(), otherwise 
>>>> it's hard to reason if destructor was invoked or not if
>>>> register_netdevice() errors out.  
>>>
>>> Ziyang Xuan, do you intend to take care of this?
>>> .  
>>
>> I am reading the related processes according to the problem scenario.
>> And I will give a more clear sequence and root cause as soon as possible
>> by some necessary tests.
> 
> Okay, I still don't see the fix. 
> 
> Petr would you mind submitting since you have a repro handy?
> .
The sequence for the problem maybe as following:

=============================================================
# create vlan device
vlan_newlink [assume real_dev refcnt == 2 just referenced by itself]
	register_vlan_dev
		register_netdevice(vlan_dev) [success]
		netdev_upper_dev_link [failed]
		unregister_netdevice(vlan_dev) [failure process]
		...
		netdev_run_todo [vlan_dev]
			vlan_dev_free(vlan_dev) [priv_destructor cb]
				dev_put(real_dev) [real_dev refcnt == 1]

# delete real device
unregister_netdevice(real_dev) [real_dev refcnt == 1]
	unregister_netdevice_many
		dev_put(real_dev) [real_dev refcnt == 0]
		net_set_todo(real_dev)
...
netdev_run_todo [real_dev]
	netdev_wait_allrefs(real_dev) [real_dev refcnt == 0]
		pr_emerg("unregister_netdevice: ...", ...)

I am thinking about how to fix the problem. priv_destructor() of the
net_device referenced not only by net_set_todo() but also failure process
in register_netdevice().

I need some time to test my some ideas. And anyone has good ideas, please
do not be stingy.

Thank you!
Petr Machata Nov. 19, 2021, 10:07 a.m. UTC | #13
Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:

> I need some time to test my some ideas. And anyone has good ideas, please
> do not be stingy.

Jakub Kicinski <kuba@kernel.org> writes:

> I think we should move the dev_hold() to ndo_init(), otherwise it's
> hard to reason if destructor was invoked or not if
> register_netdevice() errors out.

That makes sense to me. We always put real_dev in the destructor, so we
should always hold it in the constructor...
Ziyang Xuan (William) Nov. 23, 2021, 9:01 a.m. UTC | #14
> 
> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
> 
>> I need some time to test my some ideas. And anyone has good ideas, please
>> do not be stingy.
> 
> Jakub Kicinski <kuba@kernel.org> writes:
> 
>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>> hard to reason if destructor was invoked or not if
>> register_netdevice() errors out.
> 
> That makes sense to me. We always put real_dev in the destructor, so we
> should always hold it in the constructor...

Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
the following testcase:

ip link add dev dummy1 type dummy
ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
ip link del dev dummy1

Make the problem repro. The problem is solved using the following fix
according to the Jakub's suggestion:

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index a3a0a5e994f5..abaa5d96ded2 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
        if (err)
                goto out_unregister_netdev;

-       /* Account for reference in struct vlan_dev_priv */
-       dev_hold(real_dev);
-
        vlan_stacked_transfer_operstate(real_dev, dev, vlan);
        linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index ab6dee28536d..a54535cbcf4c 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
        if (!vlan->vlan_pcpu_stats)
                return -ENOMEM;

+       /* Get vlan's reference to real_dev */
+       dev_hold(real_dev);


If there is not any other idea and objection, I will send the fix patch later.

Thank you!
Petr Machata Nov. 23, 2021, 12:35 p.m. UTC | #15
Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:

>> 
>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>> 
>>> I need some time to test my some ideas. And anyone has good ideas, please
>>> do not be stingy.
>> 
>> Jakub Kicinski <kuba@kernel.org> writes:
>> 
>>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>>> hard to reason if destructor was invoked or not if
>>> register_netdevice() errors out.
>> 
>> That makes sense to me. We always put real_dev in the destructor, so we
>> should always hold it in the constructor...
>
> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
> the following testcase:
>
> ip link add dev dummy1 type dummy
> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
> ip link del dev dummy1
>
> Make the problem repro. The problem is solved using the following fix
> according to the Jakub's suggestion:
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index a3a0a5e994f5..abaa5d96ded2 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
>         if (err)
>                 goto out_unregister_netdev;
>
> -       /* Account for reference in struct vlan_dev_priv */
> -       dev_hold(real_dev);
> -
>         vlan_stacked_transfer_operstate(real_dev, dev, vlan);
>         linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */
>
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index ab6dee28536d..a54535cbcf4c 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
>         if (!vlan->vlan_pcpu_stats)
>                 return -ENOMEM;
>
> +       /* Get vlan's reference to real_dev */
> +       dev_hold(real_dev);
>
>
> If there is not any other idea and objection, I will send the fix patch later.
>
> Thank you!

This fixes the issue in my repro, and does not seems to break anything.
We'll take it to full regression overnight, I'll report back tomorrow
about the result.
Petr Machata Nov. 25, 2021, 11:33 a.m. UTC | #16
Petr Machata <petrm@nvidia.com> writes:

> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>
>>> 
>>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>>> 
>>>> I need some time to test my some ideas. And anyone has good ideas, please
>>>> do not be stingy.
>>> 
>>> Jakub Kicinski <kuba@kernel.org> writes:
>>> 
>>>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>>>> hard to reason if destructor was invoked or not if
>>>> register_netdevice() errors out.
>>> 
>>> That makes sense to me. We always put real_dev in the destructor, so we
>>> should always hold it in the constructor...
>>
>> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
>> the following testcase:
>>
>> ip link add dev dummy1 type dummy
>> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
>> ip link del dev dummy1
>>
>> Make the problem repro. The problem is solved using the following fix
>> according to the Jakub's suggestion:
>>
>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> index a3a0a5e994f5..abaa5d96ded2 100644
>> --- a/net/8021q/vlan.c
>> +++ b/net/8021q/vlan.c
>> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
>>         if (err)
>>                 goto out_unregister_netdev;
>>
>> -       /* Account for reference in struct vlan_dev_priv */
>> -       dev_hold(real_dev);
>> -
>>         vlan_stacked_transfer_operstate(real_dev, dev, vlan);
>>         linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */
>>
>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> index ab6dee28536d..a54535cbcf4c 100644
>> --- a/net/8021q/vlan_dev.c
>> +++ b/net/8021q/vlan_dev.c
>> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
>>         if (!vlan->vlan_pcpu_stats)
>>                 return -ENOMEM;
>>
>> +       /* Get vlan's reference to real_dev */
>> +       dev_hold(real_dev);
>>
>>
>> If there is not any other idea and objection, I will send the fix patch later.
>>
>> Thank you!
>
> This fixes the issue in my repro, and does not seems to break anything.
> We'll take it to full regression overnight, I'll report back tomorrow
> about the result.

Sorry, was AFK yesterday. It does look good.
Ziyang Xuan (William) Nov. 26, 2021, 1:48 a.m. UTC | #17
> 
> Petr Machata <petrm@nvidia.com> writes:
> 
>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>>
>>>>
>>>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>>>>
>>>>> I need some time to test my some ideas. And anyone has good ideas, please
>>>>> do not be stingy.
>>>>
>>>> Jakub Kicinski <kuba@kernel.org> writes:
>>>>
>>>>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>>>>> hard to reason if destructor was invoked or not if
>>>>> register_netdevice() errors out.
>>>>
>>>> That makes sense to me. We always put real_dev in the destructor, so we
>>>> should always hold it in the constructor...
>>>
>>> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
>>> the following testcase:
>>>
>>> ip link add dev dummy1 type dummy
>>> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
>>> ip link del dev dummy1
>>>
>>> Make the problem repro. The problem is solved using the following fix
>>> according to the Jakub's suggestion:
>>>
>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>>> index a3a0a5e994f5..abaa5d96ded2 100644
>>> --- a/net/8021q/vlan.c
>>> +++ b/net/8021q/vlan.c
>>> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
>>>         if (err)
>>>                 goto out_unregister_netdev;
>>>
>>> -       /* Account for reference in struct vlan_dev_priv */
>>> -       dev_hold(real_dev);
>>> -
>>>         vlan_stacked_transfer_operstate(real_dev, dev, vlan);
>>>         linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */
>>>
>>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>>> index ab6dee28536d..a54535cbcf4c 100644
>>> --- a/net/8021q/vlan_dev.c
>>> +++ b/net/8021q/vlan_dev.c
>>> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
>>>         if (!vlan->vlan_pcpu_stats)
>>>                 return -ENOMEM;
>>>
>>> +       /* Get vlan's reference to real_dev */
>>> +       dev_hold(real_dev);
>>>
>>>
>>> If there is not any other idea and objection, I will send the fix patch later.
>>>
>>> Thank you!
>>
>> This fixes the issue in my repro, and does not seems to break anything.
>> We'll take it to full regression overnight, I'll report back tomorrow
>> about the result.
> 
> Sorry, was AFK yesterday. It does look good.
> .

Thank you for your efforts very much! I have sent the fix patch.
diff mbox series

Patch

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 55275ef9a31a..a3a0a5e994f5 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -123,9 +123,6 @@  void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
 	}
 
 	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
-
-	/* Get rid of the vlan's reference to real_dev */
-	dev_put(real_dev);
 }
 
 int vlan_check_real_dev(struct net_device *real_dev,
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 0c21d1fec852..aeeb5f90417b 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -843,6 +843,9 @@  static void vlan_dev_free(struct net_device *dev)
 
 	free_percpu(vlan->vlan_pcpu_stats);
 	vlan->vlan_pcpu_stats = NULL;
+
+	/* Get rid of the vlan's reference to real_dev */
+	dev_put(vlan->real_dev);
 }
 
 void vlan_setup(struct net_device *dev)