vhost_net: initialize rx_ring in vhost_net_open()

Message ID 20180308175642-mutt-send-email-mst@kernel.org
State New

Commit Message

Michael S. Tsirkin March 8, 2018, 4 p.m. UTC
On Thu, Mar 08, 2018 at 04:55:39PM +0100, Alexander Potapenko wrote:
> On Thu, Mar 8, 2018 at 4:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Mar 08, 2018 at 02:37:17PM +0100, Alexander Potapenko wrote:
> >> KMSAN reported a use of uninit memory in vhost_net_buf_unproduce()
> >> while trying to access n->vqs[VHOST_NET_VQ_TX].rx_ring:
> >>
> >> ==================================================================
> >> BUG: KMSAN: use of uninitialized memory in vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vhost/net.c:170
> >> CPU: 0 PID: 3021 Comm: syz-fuzzer Not tainted 4.16.0-rc4+ #3853
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> >> Call Trace:
> >>  __dump_stack lib/dump_stack.c:17 [inline]
> >>  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
> >>  kmsan_report+0x142/0x1f0 mm/kmsan/kmsan.c:1093
> >>  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
> >>  vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vhost/net.c:170
> >>  vhost_net_stop_vq drivers/vhost/net.c:974 [inline]
> >>  vhost_net_stop+0x146/0x380 drivers/vhost/net.c:982
> >>  vhost_net_release+0xb1/0x4f0 drivers/vhost/net.c:1015
> >>  __fput+0x49f/0xa00 fs/file_table.c:209
> >>  ____fput+0x37/0x40 fs/file_table.c:243
> >>  task_work_run+0x243/0x2c0 kernel/task_work.c:113
> >>  tracehook_notify_resume include/linux/tracehook.h:191 [inline]
> >>  exit_to_usermode_loop arch/x86/entry/common.c:166 [inline]
> >>  prepare_exit_to_usermode+0x349/0x3b0 arch/x86/entry/common.c:196
> >>  syscall_return_slowpath+0xf3/0x6d0 arch/x86/entry/common.c:265
> >>  do_syscall_64+0x34d/0x450 arch/x86/entry/common.c:292
> >> ...
> >> origin:
> >>  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:303 [inline]
> >>  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:213
> >>  kmsan_kmalloc_large+0x6f/0xd0 mm/kmsan/kmsan.c:392
> >>  kmalloc_large_node_hook mm/slub.c:1366 [inline]
> >>  kmalloc_large_node mm/slub.c:3808 [inline]
> >>  __kmalloc_node+0x100e/0x1290 mm/slub.c:3818
> >>  kmalloc_node include/linux/slab.h:554 [inline]
> >>  kvmalloc_node+0x1a5/0x2e0 mm/util.c:419
> >>  kvmalloc include/linux/mm.h:541 [inline]
> >>  vhost_net_open+0x64/0x5f0 drivers/vhost/net.c:921
> >>  misc_open+0x7b5/0x8b0 drivers/char/misc.c:154
> >>  chrdev_open+0xc28/0xd90 fs/char_dev.c:417
> >>  do_dentry_open+0xccb/0x1430 fs/open.c:752
> >>  vfs_open+0x272/0x2e0 fs/open.c:866
> >>  do_last fs/namei.c:3378 [inline]
> >>  path_openat+0x49ad/0x6580 fs/namei.c:3519
> >>  do_filp_open+0x267/0x640 fs/namei.c:3553
> >>  do_sys_open+0x6ad/0x9c0 fs/open.c:1059
> >>  SYSC_openat+0xc7/0xe0 fs/open.c:1086
> >>  SyS_openat+0x63/0x90 fs/open.c:1080
> >>  do_syscall_64+0x2f1/0x450 arch/x86/entry/common.c:287
> >> ==================================================================
> >>
> >> Signed-off-by: Alexander Potapenko <glider@google.com>
> >> ---
> >>  drivers/vhost/net.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 610cba276d47..60f1080bffc7 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -948,6 +948,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
> >>               n->vqs[i].done_idx = 0;
> >>               n->vqs[i].vhost_hlen = 0;
> >>               n->vqs[i].sock_hlen = 0;
> >> +             n->vqs[i].rx_ring = NULL;
> >>               vhost_net_buf_init(&n->vqs[i].rxq);
> >>       }
> >>       vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
> >> --
> >> 2.16.2.395.g2e18187dfd-goog
> >
> >
> > I suspect that's not sufficient. rx ring is tied to the tap device.
> > I think we need to drop it every time we drop the device.
> 
> Unfortunately I've no idea where is the device dropped. Are you
> referring to vhost_net_vq_reset()?
> I can fix that part if needed, but won't be able to validate it with KMSAN.
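
For context on the report quoted above: the origin trace points at the kvmalloc() call in vhost_net_open(). Assuming that allocation does not request zeroed memory, each field of the new structure holds stale data until it is assigned, which is why the one-line fix sets rx_ring explicitly. Below is a minimal userspace sketch of that pattern. struct net_dev, struct net_vq and net_dev_open() are hypothetical stand-ins, not the kernel definitions, and the 0xAA fill only emulates stale allocator contents.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct ring;                      /* opaque, stands in for struct ptr_ring */

/* Hypothetical, reduced model of a per-virtqueue structure. */
struct net_vq {
	int done_idx;
	size_t vhost_hlen;
	size_t sock_hlen;
	struct ring *rx_ring;
};

#define NVQS 2

struct net_dev {
	struct net_vq vqs[NVQS];
};

/* Allocate a device without zeroing and initialize every field by hand. */
static struct net_dev *net_dev_open(void)
{
	struct net_dev *n = malloc(sizeof(*n));

	if (!n)
		return NULL;

	/* Emulate non-zeroed allocator memory with a poison pattern. */
	memset(n, 0xAA, sizeof(*n));

	for (int i = 0; i < NVQS; i++) {
		n->vqs[i].done_idx = 0;
		n->vqs[i].vhost_hlen = 0;
		n->vqs[i].sock_hlen = 0;
		/* Without this line rx_ring would keep the poison value. */
		n->vqs[i].rx_ring = NULL;
	}
	return n;
}
```

Any later code path that tests rx_ring against NULL, as the release path in the trace appears to do, then sees a defined value even if no backend was ever attached.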

I see several issues. For example, in vhost_net_set_backend,
if there's already a value, then rx_ring will point to the
ring of the wrong socket.
Something like the below might help, but we really need
documentation of when rx_ring is valid. Is it only valid
when private_data is valid? If yes, we need to make sure
we reset it together with private_data.
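
The rule being asked for here, that rx_ring is meaningful only while private_data is set, can be sketched in isolation. This is an illustrative userspace model, not the vhost code: struct net_vq, struct sock_stub, and the net_vq_attach(), net_vq_detach() and net_vq_consistent() helpers are hypothetical names. The idea is that both fields are written only through helpers that update them together.

```c
#include <stddef.h>

struct ring { int unused; };

/* Hypothetical backend that owns the ring, like a tap socket would. */
struct sock_stub {
	struct ring ring;
};

struct net_vq {
	void *private_data;    /* backend, NULL when detached */
	struct ring *rx_ring;  /* NULL whenever private_data is NULL */
};

/* Attach a backend: both fields change in the same place. */
static void net_vq_attach(struct net_vq *nvq, struct sock_stub *sock, int is_rx)
{
	nvq->private_data = sock;
	/* Only the RX queue uses the ring; TX always gets NULL. */
	nvq->rx_ring = is_rx ? &sock->ring : NULL;
}

/* Detach the backend: the ring pointer is dropped with it. */
static void net_vq_detach(struct net_vq *nvq)
{
	nvq->private_data = NULL;
	nvq->rx_ring = NULL;
}

/* Invariant: a non-NULL rx_ring implies a non-NULL private_data. */
static int net_vq_consistent(const struct net_vq *nvq)
{
	return nvq->rx_ring == NULL || nvq->private_data != NULL;
}
```

With this shape, switching from one backend to another cannot leave rx_ring pointing at the previous socket's ring, because attach always overwrites it.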

Also I see __skb_array_destroy_skb used with ptr_ring which
seems suspicious: how do we know the entries are skbs?

Patch below is on top of yours, and

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

But I really would like Jason to look and come up with a
patch to address all these issues.

---

Comments

Jason Wang March 9, 2018, 2:30 a.m. UTC | #1
On 2018-03-09 00:00, Michael S. Tsirkin wrote:
> On Thu, Mar 08, 2018 at 04:55:39PM +0100, Alexander Potapenko wrote:
>> On Thu, Mar 8, 2018 at 4:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> I suspect that's not sufficient. rx ring is tied to the tap device.
>>> I think we need to drop it every time we drop the device.
>> Unfortunately I've no idea where is the device dropped. Are you
>> referring to vhost_net_vq_reset()?
>> I can fix that part if needed, but won't be able to validate it with KMSAN.
> I see several issues. For example in vhost_net_set_backend
> if there's a value then rx ring will point to the
> ring of the wrong socket.
> Something like the below might help but we really need
> documentation of when rx_ring is valid. Is it only valid
> when private_data is valid?

I think so; we need to keep rx_ring in sync with private_data.

> If yes need to make sure
> we reset it with private_data.
>
> Also I see __skb_array_destroy_skb used with ptr_ring which
> seems suspicious: how do we know the entries are skbs?

Good catch, will post an independent patch to fix this.

>
> Patch below is on top of yours, and
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> But I really would like Jason to look and come up with a
> patch to address all these issues.
>
> ---
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 610cba2..7a65b69 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -972,6 +973,7 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
>   	vhost_net_disable_vq(n, vq);
>   	vq->private_data = NULL;
>   	vhost_net_buf_unproduce(nvq);
> +	nvq->rx_ring = NULL;
>   	mutex_unlock(&vq->mutex);
>   	return sock;
>   }
> @@ -1161,8 +1163,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>   		vhost_net_disable_vq(n, vq);
>   		vq->private_data = sock;
>   		vhost_net_buf_unproduce(nvq);
> -		if (index == VHOST_NET_VQ_RX)
> -			nvq->rx_ring = get_tap_ptr_ring(fd);
>   		r = vhost_vq_init_access(vq);
>   		if (r)
>   			goto err_used;
> @@ -1172,6 +1172,10 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>   
>   		oldubufs = nvq->ubufs;
>   		nvq->ubufs = ubufs;
> +		if (index == VHOST_NET_VQ_RX)
> +			nvq->rx_ring = get_tap_ptr_ring(fd);
> +		else
> +			nvq->rx_ring = NULL;
>   

Any reason to move those after vhost_net_enable_vq()? And considering we
never try to assign rx_ring for TX, the "else" part seems unnecessary.

Thanks

>   		n->tx_packets = 0;
>   		n->tx_zcopy_err = 0;

Michael S. Tsirkin March 9, 2018, 3:29 a.m. UTC | #2
On Fri, Mar 09, 2018 at 10:30:17AM +0800, Jason Wang wrote:
> 
> 
> On 2018-03-09 00:00, Michael S. Tsirkin wrote:
> > On Thu, Mar 08, 2018 at 04:55:39PM +0100, Alexander Potapenko wrote:
> > > On Thu, Mar 8, 2018 at 4:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > I suspect that's not sufficient. rx ring is tied to the tap device.
> > > > I think we need to drop it every time we drop the device.
> > > Unfortunately I've no idea where is the device dropped. Are you
> > > referring to vhost_net_vq_reset()?
> > > I can fix that part if needed, but won't be able to validate it with KMSAN.
> > I see several issues. For example in vhost_net_set_backend
> > if there's a value then rx ring will point to the
> > ring of the wrong socket.
> > Something like the below might help but we really need
> > > documentation of when rx_ring is valid. Is it only valid
> > > when private_data is valid?
> 
> I think so, we need keep rx_ring synced with private_data.
> 
> > If yes need to make sure
> > we reset it with private_data.
> > 
> > Also I see __skb_array_destroy_skb used with ptr_ring which
> > seems suspicious: how do we know the entries are skbs?
> 
> Good catch, will post an independent patch to fix this.
> 
> > 
> > Patch below is on top of yours, and
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > But I really would like Jason to look and come up with a
> > patch to address all these issues.
> > 
> > ---
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 610cba2..7a65b69 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -972,6 +973,7 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
> >   	vhost_net_disable_vq(n, vq);
> >   	vq->private_data = NULL;
> >   	vhost_net_buf_unproduce(nvq);
> > +	vq->rx_ring = NULL;
> >   	mutex_unlock(&vq->mutex);
> >   	return sock;
> >   }
> > @@ -1161,8 +1163,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> >   		vhost_net_disable_vq(n, vq);
> >   		vq->private_data = sock;
> >   		vhost_net_buf_unproduce(nvq);
> > -		if (index == VHOST_NET_VQ_RX)
> > -			nvq->rx_ring = get_tap_ptr_ring(fd);
> >   		r = vhost_vq_init_access(vq);
> >   		if (r)
> >   			goto err_used;
> > @@ -1172,6 +1172,10 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> >   		oldubufs = nvq->ubufs;
> >   		nvq->ubufs = ubufs;
> > +		if (index == VHOST_NET_VQ_RX)
> > +			nvq->rx_ring = get_tap_ptr_ring(fd);
> > +		else
> > +			nvq->rx_ring = NULL;
> 
> Any reason to move those after vhost_net_enable_vq()?

Otherwise I see an issue if there is an error and 
we revert the change.
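
The ordering concern can be illustrated with a small userspace sketch, assuming the error path restores only private_data. The names here are hypothetical: init_access() stands in for the fallible vhost_vq_init_access() step and set_backend() is a reduced model, not the kernel function. Publishing rx_ring only after the last step that can fail means a rollback never leaves it pointing at the ring of a backend that was not kept.

```c
#include <stddef.h>

struct ring { int unused; };

struct backend {
	struct ring ring;
};

struct net_vq {
	struct backend *private_data;
	struct ring *rx_ring;
};

/* Hypothetical fallible step; fails on request for demonstration. */
static int init_access(int should_fail)
{
	return should_fail ? -1 : 0;
}

static int set_backend(struct net_vq *nvq, struct backend *newb, int should_fail)
{
	struct backend *oldb = nvq->private_data;
	int r;

	nvq->private_data = newb;

	r = init_access(should_fail);
	if (r) {
		/* Roll back: only private_data has been changed so far. */
		nvq->private_data = oldb;
		return r;
	}

	/* Past the last failure point: now switch the ring pointer. */
	nvq->rx_ring = newb ? &newb->ring : NULL;
	return 0;
}
```

If the assignment sat before init_access() instead, a failed switch would restore the old backend in private_data while rx_ring still referred to the new one.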

> And consider we won't
> try to assign rx_ring to TX, the "else" part seems unnecessary.
> 
> Thanks

OK, please pack up all the fixes as you see fit and post
a patchset.

> >   		n->tx_packets = 0;
> >   		n->tx_zcopy_err = 0;

Jason Wang March 9, 2018, 3:47 a.m. UTC | #3
On 2018-03-09 11:29, Michael S. Tsirkin wrote:
> On Fri, Mar 09, 2018 at 10:30:17AM +0800, Jason Wang wrote:
>>
>> On 2018-03-09 00:00, Michael S. Tsirkin wrote:
>>> On Thu, Mar 08, 2018 at 04:55:39PM +0100, Alexander Potapenko wrote:
>>>> On Thu, Mar 8, 2018 at 4:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> I suspect that's not sufficient. rx ring is tied to the tap device.
>>>>> I think we need to drop it every time we drop the device.
>>>> Unfortunately I've no idea where is the device dropped. Are you
>>>> referring to vhost_net_vq_reset()?
>>>> I can fix that part if needed, but won't be able to validate it with KMSAN.
>>> I see several issues. For example in vhost_net_set_backend
>>> if there's a value then rx ring will point to the
>>> ring of the wrong socket.
>>> Something like the below might help but we really need
>>> documentation of when rx_ring is valid. Is it only valid
>>> when private_data is valid?
>> I think so, we need keep rx_ring synced with private_data.
>>
>>> If yes need to make sure
>>> we reset it with private_data.
>>>
>>> Also I see __skb_array_destroy_skb used with ptr_ring which
>>> seems suspicious: how do we know the entries are skbs?
>> Good catch, will post an independent patch to fix this.
>>
>>> Patch below is on top of yours, and
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>> But I really would like Jason to look and come up with a
>>> patch to address all these issues.
>>>
>>> ---
>>>
>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>> index 610cba2..7a65b69 100644
>>> --- a/drivers/vhost/net.c
>>> +++ b/drivers/vhost/net.c
>>> @@ -972,6 +973,7 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
>>>    	vhost_net_disable_vq(n, vq);
>>>    	vq->private_data = NULL;
>>>    	vhost_net_buf_unproduce(nvq);
>>> +	nvq->rx_ring = NULL;
>>>    	mutex_unlock(&vq->mutex);
>>>    	return sock;
>>>    }
>>> @@ -1161,8 +1163,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>>>    		vhost_net_disable_vq(n, vq);
>>>    		vq->private_data = sock;
>>>    		vhost_net_buf_unproduce(nvq);
>>> -		if (index == VHOST_NET_VQ_RX)
>>> -			nvq->rx_ring = get_tap_ptr_ring(fd);
>>>    		r = vhost_vq_init_access(vq);
>>>    		if (r)
>>>    			goto err_used;
>>> @@ -1172,6 +1172,10 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>>>    		oldubufs = nvq->ubufs;
>>>    		nvq->ubufs = ubufs;
>>> +		if (index == VHOST_NET_VQ_RX)
>>> +			nvq->rx_ring = get_tap_ptr_ring(fd);
>>> +		else
>>> +			nvq->rx_ring = NULL;
>> Any reason to move those after vhost_net_enable_vq()?
> Otherwise I see an issue if there is an error and
> we revert the change.

I see.

>
>> And consider we won't
>> try to assign rx_ring to TX, the "else" part seems unnecessary.
>>
>> Thanks
> ok, pls pack up all fixes as you see fit and post
> a patchset.

Ok.

Thanks

>
>>>    		n->tx_packets = 0;
>>>    		n->tx_zcopy_err = 0;

Patch

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 610cba2..7a65b69 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -972,6 +973,7 @@  static struct socket *vhost_net_stop_vq(struct vhost_net *n,
 	vhost_net_disable_vq(n, vq);
 	vq->private_data = NULL;
 	vhost_net_buf_unproduce(nvq);
+	nvq->rx_ring = NULL;
 	mutex_unlock(&vq->mutex);
 	return sock;
 }
@@ -1161,8 +1163,6 @@  static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		vhost_net_disable_vq(n, vq);
 		vq->private_data = sock;
 		vhost_net_buf_unproduce(nvq);
-		if (index == VHOST_NET_VQ_RX)
-			nvq->rx_ring = get_tap_ptr_ring(fd);
 		r = vhost_vq_init_access(vq);
 		if (r)
 			goto err_used;
@@ -1172,6 +1172,10 @@  static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 
 		oldubufs = nvq->ubufs;
 		nvq->ubufs = ubufs;
+		if (index == VHOST_NET_VQ_RX)
+			nvq->rx_ring = get_tap_ptr_ring(fd);
+		else
+			nvq->rx_ring = NULL;
 
 		n->tx_packets = 0;
 		n->tx_zcopy_err = 0;