Message ID | 3f6be9c84782a0943ea21a8a6f8a5d055b65f2d5.1619018363.git.crobinso@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [PATCH-for-6.0] net: tap: fix crash on hotplug | expand |
Cc'ing Bin. On 4/21/21 5:22 PM, Cole Robinson wrote: > Attempting to hotplug a tap nic with libvirt will crash qemu: > > $ sudo virsh attach-interface f32 network default > error: Failed to attach interface > error: Unable to read from monitor: Connection reset by peer > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 > 206 if (!s->nc.peer->do_not_pad) { > gdb$ bt > > s->nc.peer may not be set at this point. This seems to be an > expected case, as qemu_send_packet_* explicitly checks for NULL > s->nc.peer later. > > Fix it by checking for s->nc.peer here too. Padding is applied if > s->nc.peer is not set. > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786 > Fixes: 969e50b61a2 > > Signed-off-by: Cole Robinson <crobinso@redhat.com> > --- > * Or should we skip padding if nc.peer is unset? I didn't dig into it > * tap-win3.c and slirp.c may need a similar fix, but the slirp case > didn't crash in a simple test. > > net/tap.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/tap.c b/net/tap.c > index dd42ac6134..937559dbb8 100644 > --- a/net/tap.c > +++ b/net/tap.c > @@ -203,7 +203,7 @@ static void tap_send(void *opaque) > size -= s->host_vnet_hdr_len; > } > > - if (!s->nc.peer->do_not_pad) { > + if (!s->nc.peer || !s->nc.peer->do_not_pad) { > if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) { > buf = min_pkt; > size = min_pktsz; >
On Wed, 21 Apr 2021 at 16:24, Cole Robinson <crobinso@redhat.com> wrote: > > Attempting to hotplug a tap nic with libvirt will crash qemu: > > $ sudo virsh attach-interface f32 network default > error: Failed to attach interface > error: Unable to read from monitor: Connection reset by peer > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 > 206 if (!s->nc.peer->do_not_pad) { > gdb$ bt > > s->nc.peer may not be set at this point. This seems to be an > expected case, as qemu_send_packet_* explicitly checks for NULL > s->nc.peer later. > > Fix it by checking for s->nc.peer here too. Padding is applied if > s->nc.peer is not set. > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786 > Fixes: 969e50b61a2 Is this a regression since 5.2 ? (I guess so given the Fixes tag.) Also, I'm kind of irritated that this was reported to RH on the 15th and we only get a patch now after rc4. I really really don't want to have to roll an rc5, so this now has a much higher hill to climb to get into 6.0 than if it had been reported (eg on the "Planning" wiki page) as a for-6.0 issue before rc4 was tagged... thanks -- PMM
On 4/21/21 3:54 PM, Peter Maydell wrote: > On Wed, 21 Apr 2021 at 16:24, Cole Robinson <crobinso@redhat.com> wrote: >> >> Attempting to hotplug a tap nic with libvirt will crash qemu: >> >> $ sudo virsh attach-interface f32 network default >> error: Failed to attach interface >> error: Unable to read from monitor: Connection reset by peer >> >> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 >> 206 if (!s->nc.peer->do_not_pad) { >> gdb$ bt >> >> s->nc.peer may not be set at this point. This seems to be an >> expected case, as qemu_send_packet_* explicitly checks for NULL >> s->nc.peer later. >> >> Fix it by checking for s->nc.peer here too. Padding is applied if >> s->nc.peer is not set. >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1949786 >> Fixes: 969e50b61a2 > > Is this a regression since 5.2 ? (I guess so given the Fixes tag.) > Yes > Also, I'm kind of irritated that this was reported to RH on the > 15th and we only get a patch now after rc4. Sorry about that, I was slow attempting the reproducer, only gave it a spin today. I saw Jason had some reverts in rc3 so I guessed that would fix things, I was surprised to see it still reproduced with rc4. I really really don't > want to have to roll an rc5, so this now has a much higher > hill to climb to get into 6.0 than if it had been reported > (eg on the "Planning" wiki page) as a for-6.0 issue before rc4 > was tagged.. I'm not too in tune to rules of the rc releases TBH, I used the subject prefix just to ensure this got attention. For Fedora's needs it's not a big deal if this isn't in 6.0.0 GA. But AFAICT most nic hotplug via libvirt will crash qemu 100% of the time so I imagine every distro will want to immediately backport this patch. Thanks, Cole
在 2021/4/21 下午11:22, Cole Robinson 写道: > Attempting to hotplug a tap nic with libvirt will crash qemu: > > $ sudo virsh attach-interface f32 network default > error: Failed to attach interface > error: Unable to read from monitor: Connection reset by peer > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 > 206 if (!s->nc.peer->do_not_pad) { > gdb$ bt > > s->nc.peer may not be set at this point. This seems to be an > expected case, as qemu_send_packet_* explicitly checks for NULL > s->nc.peer later. > > Fix it by checking for s->nc.peer here too. Padding is applied if > s->nc.peer is not set. > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786 > Fixes: 969e50b61a2 > > Signed-off-by: Cole Robinson <crobinso@redhat.com> > --- > * Or should we skip padding if nc.peer is unset? I think so, the padding is for the peer. > I didn't dig into it > * tap-win3.c and slirp.c may need a similar fix, but the slirp case > didn't crash in a simple test. Yes, the reason is because there's no packet go through slirp I think. Thanks. > > net/tap.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/tap.c b/net/tap.c > index dd42ac6134..937559dbb8 100644 > --- a/net/tap.c > +++ b/net/tap.c > @@ -203,7 +203,7 @@ static void tap_send(void *opaque) > size -= s->host_vnet_hdr_len; > } > > - if (!s->nc.peer->do_not_pad) { > + if (!s->nc.peer || !s->nc.peer->do_not_pad) { > if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) { > buf = min_pkt; > size = min_pktsz;
On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > > Cc'ing Bin. > > On 4/21/21 5:22 PM, Cole Robinson wrote: > > Attempting to hotplug a tap nic with libvirt will crash qemu: > > > > $ sudo virsh attach-interface f32 network default > > error: Failed to attach interface > > error: Unable to read from monitor: Connection reset by peer > > > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 > > 206 if (!s->nc.peer->do_not_pad) { > > gdb$ bt > > > > s->nc.peer may not be set at this point. This seems to be an > > expected case, as qemu_send_packet_* explicitly checks for NULL > > s->nc.peer later. > > > > Fix it by checking for s->nc.peer here too. Padding is applied if > > s->nc.peer is not set. > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786 > > Fixes: 969e50b61a2 > > > > Signed-off-by: Cole Robinson <crobinso@redhat.com> > > --- > > * Or should we skip padding if nc.peer is unset? I didn't dig into it > > * tap-win3.c and slirp.c may need a similar fix, but the slirp case > > didn't crash in a simple test. > > > > net/tap.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/tap.c b/net/tap.c > > index dd42ac6134..937559dbb8 100644 > > --- a/net/tap.c > > +++ b/net/tap.c > > @@ -203,7 +203,7 @@ static void tap_send(void *opaque) > > size -= s->host_vnet_hdr_len; > > } > > > > - if (!s->nc.peer->do_not_pad) { > > + if (!s->nc.peer || !s->nc.peer->do_not_pad) { I think we should do: if (s->nc.peer && !s->nc.peer->do_not_pad) > > if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) { > > buf = min_pkt; > > size = min_pktsz; > > And do the similar fix on tap-win32 and slirp codes. Regards, Bin
On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote: > > On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé > <philmd@redhat.com> wrote: > > > > Cc'ing Bin. > > > > On 4/21/21 5:22 PM, Cole Robinson wrote: > > > Attempting to hotplug a tap nic with libvirt will crash qemu: > > > > > > $ sudo virsh attach-interface f32 network default > > > error: Failed to attach interface > > > error: Unable to read from monitor: Connection reset by peer > > > > > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 > > > 206 if (!s->nc.peer->do_not_pad) { > > > gdb$ bt > > > > > > s->nc.peer may not be set at this point. This seems to be an > > > expected case, as qemu_send_packet_* explicitly checks for NULL > > > s->nc.peer later. > > > > > > Fix it by checking for s->nc.peer here too. Padding is applied if > > > s->nc.peer is not set. > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786 > > > Fixes: 969e50b61a2 > > > > > > Signed-off-by: Cole Robinson <crobinso@redhat.com> > > > --- > > > * Or should we skip padding if nc.peer is unset? I didn't dig into it > > > * tap-win3.c and slirp.c may need a similar fix, but the slirp case > > > didn't crash in a simple test. > > > > > > net/tap.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/net/tap.c b/net/tap.c > > > index dd42ac6134..937559dbb8 100644 > > > --- a/net/tap.c > > > +++ b/net/tap.c > > > @@ -203,7 +203,7 @@ static void tap_send(void *opaque) > > > size -= s->host_vnet_hdr_len; > > > } > > > > > > - if (!s->nc.peer->do_not_pad) { > > > + if (!s->nc.peer || !s->nc.peer->do_not_pad) { > > I think we should do: > > if (s->nc.peer && !s->nc.peer->do_not_pad) Yes. If there is no peer then the qemu_send_packet() that we're about to do is going to discard the packet anyway, so there's no point in padding it. Maybe consider static inline bool net_peer_needs_padding(NetClientState *nc) { return nc->peer && !nc->peer->do_not_pad; } since we want the same check in three places ? thanks -- PMM
On Thu, Apr 22, 2021 at 5:36 PM Peter Maydell <peter.maydell@linaro.org> wrote: > > On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote: > > > > On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé > > <philmd@redhat.com> wrote: > > > > > > Cc'ing Bin. > > > > > > On 4/21/21 5:22 PM, Cole Robinson wrote: > > > > Attempting to hotplug a tap nic with libvirt will crash qemu: > > > > > > > > $ sudo virsh attach-interface f32 network default > > > > error: Failed to attach interface > > > > error: Unable to read from monitor: Connection reset by peer > > > > > > > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 > > > > 206 if (!s->nc.peer->do_not_pad) { > > > > gdb$ bt > > > > > > > > s->nc.peer may not be set at this point. This seems to be an > > > > expected case, as qemu_send_packet_* explicitly checks for NULL > > > > s->nc.peer later. > > > > > > > > Fix it by checking for s->nc.peer here too. Padding is applied if > > > > s->nc.peer is not set. > > > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786 > > > > Fixes: 969e50b61a2 > > > > > > > > Signed-off-by: Cole Robinson <crobinso@redhat.com> > > > > --- > > > > * Or should we skip padding if nc.peer is unset? I didn't dig into it > > > > * tap-win3.c and slirp.c may need a similar fix, but the slirp case > > > > didn't crash in a simple test. > > > > > > > > net/tap.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/net/tap.c b/net/tap.c > > > > index dd42ac6134..937559dbb8 100644 > > > > --- a/net/tap.c > > > > +++ b/net/tap.c > > > > @@ -203,7 +203,7 @@ static void tap_send(void *opaque) > > > > size -= s->host_vnet_hdr_len; > > > > } > > > > > > > > - if (!s->nc.peer->do_not_pad) { > > > > + if (!s->nc.peer || !s->nc.peer->do_not_pad) { > > > > I think we should do: > > > > if (s->nc.peer && !s->nc.peer->do_not_pad) > > Yes. If there is no peer then the qemu_send_packet() that we're about > to do is going to discard the packet anyway, so there's no point in > padding it. > > Maybe consider > > static inline bool net_peer_needs_padding(NetClientState *nc) > { > return nc->peer && !nc->peer->do_not_pad; > } > > since we want the same check in three places ? Sounds good to me. Regards, Bin
On 4/22/21 5:42 AM, Bin Meng wrote: > On Thu, Apr 22, 2021 at 5:36 PM Peter Maydell <peter.maydell@linaro.org> wrote: >> >> On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote: >>> >>> On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé >>> <philmd@redhat.com> wrote: >>>> >>>> Cc'ing Bin. >>>> >>>> On 4/21/21 5:22 PM, Cole Robinson wrote: >>>>> Attempting to hotplug a tap nic with libvirt will crash qemu: >>>>> >>>>> $ sudo virsh attach-interface f32 network default >>>>> error: Failed to attach interface >>>>> error: Unable to read from monitor: Connection reset by peer >>>>> >>>>> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 >>>>> 206 if (!s->nc.peer->do_not_pad) { >>>>> gdb$ bt >>>>> >>>>> s->nc.peer may not be set at this point. This seems to be an >>>>> expected case, as qemu_send_packet_* explicitly checks for NULL >>>>> s->nc.peer later. >>>>> >>>>> Fix it by checking for s->nc.peer here too. Padding is applied if >>>>> s->nc.peer is not set. >>>>> >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1949786 >>>>> Fixes: 969e50b61a2 >>>>> >>>>> Signed-off-by: Cole Robinson <crobinso@redhat.com> >>>>> --- >>>>> * Or should we skip padding if nc.peer is unset? I didn't dig into it >>>>> * tap-win3.c and slirp.c may need a similar fix, but the slirp case >>>>> didn't crash in a simple test. >>>>> >>>>> net/tap.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/net/tap.c b/net/tap.c >>>>> index dd42ac6134..937559dbb8 100644 >>>>> --- a/net/tap.c >>>>> +++ b/net/tap.c >>>>> @@ -203,7 +203,7 @@ static void tap_send(void *opaque) >>>>> size -= s->host_vnet_hdr_len; >>>>> } >>>>> >>>>> - if (!s->nc.peer->do_not_pad) { >>>>> + if (!s->nc.peer || !s->nc.peer->do_not_pad) { >>> >>> I think we should do: >>> >>> if (s->nc.peer && !s->nc.peer->do_not_pad) >> >> Yes. If there is no peer then the qemu_send_packet() that we're about >> to do is going to discard the packet anyway, so there's no point in >> padding it. >> >> Maybe consider >> >> static inline bool net_peer_needs_padding(NetClientState *nc) >> { >> return nc->peer && !nc->peer->do_not_pad; >> } >> >> since we want the same check in three places ? > > Sounds good to me. > I did not get to this today. Bin/Jason/anyone want to write the patch, I will test it tomorrow (US EDT time). If not I'll write the patch tomorrow. Thanks, Cole
在 2021/4/23 上午5:34, Cole Robinson 写道: > On 4/22/21 5:42 AM, Bin Meng wrote: >> On Thu, Apr 22, 2021 at 5:36 PM Peter Maydell <peter.maydell@linaro.org> wrote: >>> On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote: >>>> On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé >>>> <philmd@redhat.com> wrote: >>>>> Cc'ing Bin. >>>>> >>>>> On 4/21/21 5:22 PM, Cole Robinson wrote: >>>>>> Attempting to hotplug a tap nic with libvirt will crash qemu: >>>>>> >>>>>> $ sudo virsh attach-interface f32 network default >>>>>> error: Failed to attach interface >>>>>> error: Unable to read from monitor: Connection reset by peer >>>>>> >>>>>> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 >>>>>> 206 if (!s->nc.peer->do_not_pad) { >>>>>> gdb$ bt >>>>>> >>>>>> s->nc.peer may not be set at this point. This seems to be an >>>>>> expected case, as qemu_send_packet_* explicitly checks for NULL >>>>>> s->nc.peer later. >>>>>> >>>>>> Fix it by checking for s->nc.peer here too. Padding is applied if >>>>>> s->nc.peer is not set. >>>>>> >>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1949786 >>>>>> Fixes: 969e50b61a2 >>>>>> >>>>>> Signed-off-by: Cole Robinson <crobinso@redhat.com> >>>>>> --- >>>>>> * Or should we skip padding if nc.peer is unset? I didn't dig into it >>>>>> * tap-win3.c and slirp.c may need a similar fix, but the slirp case >>>>>> didn't crash in a simple test. >>>>>> >>>>>> net/tap.c | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/net/tap.c b/net/tap.c >>>>>> index dd42ac6134..937559dbb8 100644 >>>>>> --- a/net/tap.c >>>>>> +++ b/net/tap.c >>>>>> @@ -203,7 +203,7 @@ static void tap_send(void *opaque) >>>>>> size -= s->host_vnet_hdr_len; >>>>>> } >>>>>> >>>>>> - if (!s->nc.peer->do_not_pad) { >>>>>> + if (!s->nc.peer || !s->nc.peer->do_not_pad) { >>>> I think we should do: >>>> >>>> if (s->nc.peer && !s->nc.peer->do_not_pad) >>> Yes. If there is no peer then the qemu_send_packet() that we're about >>> to do is going to discard the packet anyway, so there's no point in >>> padding it. >>> >>> Maybe consider >>> >>> static inline bool net_peer_needs_padding(NetClientState *nc) >>> { >>> return nc->peer && !nc->peer->do_not_pad; >>> } >>> >>> since we want the same check in three places ? >> Sounds good to me. >> > I did not get to this today. Bin/Jason/anyone want to write the patch, I will send a patch soon. Thanks > I > will test it tomorrow (US EDT time). If not I'll write the patch tomorrow. > > Thanks, > Cole > >
diff --git a/net/tap.c b/net/tap.c index dd42ac6134..937559dbb8 100644 --- a/net/tap.c +++ b/net/tap.c @@ -203,7 +203,7 @@ static void tap_send(void *opaque) size -= s->host_vnet_hdr_len; } - if (!s->nc.peer->do_not_pad) { + if (!s->nc.peer || !s->nc.peer->do_not_pad) { if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) { buf = min_pkt; size = min_pktsz;
Attempting to hotplug a tap nic with libvirt will crash qemu: $ sudo virsh attach-interface f32 network default error: Failed to attach interface error: Unable to read from monitor: Connection reset by peer 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206 206 if (!s->nc.peer->do_not_pad) { gdb$ bt s->nc.peer may not be set at this point. This seems to be an expected case, as qemu_send_packet_* explicitly checks for NULL s->nc.peer later. Fix it by checking for s->nc.peer here too. Padding is applied if s->nc.peer is not set. https://bugzilla.redhat.com/show_bug.cgi?id=1949786 Fixes: 969e50b61a2 Signed-off-by: Cole Robinson <crobinso@redhat.com> --- * Or should we skip padding if nc.peer is unset? I didn't dig into it * tap-win3.c and slirp.c may need a similar fix, but the slirp case didn't crash in a simple test. net/tap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)