Guest unresponsive after Virtqueue size exceeded error

Message ID CABdb7358YVXbk++Z7s+q6Z5O0Q=6CQiQvYsTdc35eqmQeUnP_Q@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Ladi Prosek June 16, 2017, 6:58 a.m. UTC
Hi,

On Wed, Jun 14, 2017 at 11:56 PM, Fernando Casas Schössow
<casasfernando@hotmail.com> wrote:
> Hi there,
>
> I recently migrated a Hyper-V host to qemu/kvm running on Alpine Linux 3.6.1 (kernel 4.9.30, with grsec patches, and qemu 2.8.1).
>
> Almost on a daily basis at least one of the guests shows the following error in the log, and then it needs to be terminated and restarted to recover:
>
> qemu-system-x86_64: Virtqueue size exceeded
>
> It's not always the same guest, and the error appears for both Linux (CentOS 7.3) and Windows (2012R2) guests.
> As soon as this error appears the guest is not really working anymore. It may respond to ping or you can even try to log in, but then everything is very slow or completely unresponsive. Restarting the guest from within the guest OS doesn't work either, and the only thing I can do is terminate it (virsh destroy) and start it again until the next failure.
>
> In Windows guests the error seems to be related to the disk:
> "Reset to device, \Device\RaidPort2, was issued" and the source is viostor
>
> And in Linux guests the error is always (with the process and pid changing):
>
> INFO: task <process>:<pid> blocked for more than 120 seconds
>
> But unfortunately I was not able to find any other indication of a problem in the guest logs or in the host logs, except for the error regarding the virtqueue size. The problem is happening at different times of day and I couldn't find any pattern yet.
>
> All the Windows guests are using virtio drivers version 126 and all Linux guests are CentOS 7.3 using the latest kernel available in the distribution (3.10.0-514.21.1). They all run the qemu-guest-agent as well.
> All the guest disks are qcow2 images with cache=none and aio=threads (tried aio=native before but with the same results).
>
> Example qemu command for a Linux guest:
>
> /usr/bin/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-24-DOCKER01/master-key.aes -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+vmx,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+osxsave,+arat,+xsaveopt -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-24-DOCKER01/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=45 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-24-DOCKER01/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5905,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on
>
> For what it's worth, the same guests were working fine for years on Hyper-V on the same hardware (Intel Xeon E3, 32GB RAM, Supermicro mainboard, 6x3TB Western Digital Red disks and 6x120GB Kingston V300 SSDs, all connected to an LSI LSISAS2008 controller).
> Except for this stability issue, which I hope to solve, everything else is working great and outperforming Hyper-V.
>
> Any ideas, thoughts or suggestions to try to narrow down the problem?

Would you be able to enhance the error message and rebuild QEMU?

--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -856,7 +856,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
-        virtio_error(vdev, "Virtqueue size exceeded");
+        virtio_error(vdev, "Virtqueue %u device %s size exceeded", vq->queue_index, vdev->name);
         goto done;
     }

This would at least confirm the theory that it's caused by virtio-blk-pci.

If rebuilding is not feasible I would start by removing other virtio
devices -- particularly balloon, which has had quite a few virtio-related
bugs fixed recently.

Does your environment involve VM migrations or saving/resuming, or
does the crashing QEMU process always run the VM from its boot?

Thanks!

> Thanks in advance and sorry for the long email but I wanted to be as descriptive as possible.
>
> Fer

Comments

Fernando Casas Schössow June 16, 2017, 10:11 a.m. UTC | #1
Hi Ladi,

Thanks a lot for looking into this and replying.
I will do my best to rebuild and deploy Alpine's qemu packages with this patch included, but I'm not sure it's feasible yet.
In any case, would it be possible to have this patch included in the next qemu release?
The current error message is helpful but knowing which device was involved will be much more helpful.

Regarding the environment, I'm not doing migrations, and a managed save is only done in case the host needs to be rebooted or shut down. The QEMU process has been running the VM since the host was started, and this failure is occurring randomly without any previous managed save.

As part of troubleshooting on one of the guests I switched from virtio_blk to virtio_scsi for the guest disks but I will need more time to see if that helped.
If I have this problem again I will follow your advice and remove virtio_balloon.

Another question: is there any way to monitor the virtqueue size either from the guest itself or from the host? Any file in sysfs or proc?
This may help to understand in which conditions this is happening and to react faster to mitigate the problem.

Thanks again for your help with this!

Fer

Ladi Prosek June 16, 2017, 10:25 a.m. UTC | #2
On Fri, Jun 16, 2017 at 12:11 PM, Fernando Casas Schössow
<casasfernando@hotmail.com> wrote:
> Hi Ladi,
>
> Thanks a lot for looking into this and replying.
> I will do my best to rebuild and deploy Alpine's qemu packages with this
> patch included, but I'm not sure it's feasible yet.
> In any case, would it be possible to have this patch included in the next
> qemu release?

Yes, I have already added this to my todo list.

> The current error message is helpful but knowing which device was involved
> will be much more helpful.
>
> Regarding the environment, I'm not doing migrations, and a managed save is
> only done in case the host needs to be rebooted or shut down. The QEMU process
> has been running the VM since the host was started, and this failure is
> occurring randomly without any previous managed save.
>
> As part of troubleshooting on one of the guests I switched from virtio_blk
> to virtio_scsi for the guest disks but I will need more time to see if that
> helped.
> If I have this problem again I will follow your advice and remove
> virtio_balloon.

Thanks, please keep us posted.

> Another question: is there any way to monitor the virtqueue size either from
> the guest itself or from the host? Any file in sysfs or proc?
> This may help to understand in which conditions this is happening and to
> react faster to mitigate the problem.

The problem is not in the virtqueue size but in one piece of internal
state ("inuse") which is meant to track the number of buffers "checked
out" by QEMU. It's being compared to virtqueue size merely as a sanity
check.
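
In code terms, the idea is roughly the following (a minimal standalone model of
the accounting, not the actual QEMU structures; vq_pop/vq_flush stand in for
virtqueue_pop()/virtqueue_flush()):

#include <stdio.h>

/* Toy model of the "inuse" accounting described above. */
typedef struct {
    unsigned int num;    /* virtqueue size (number of ring entries) */
    unsigned int inuse;  /* buffers currently checked out by QEMU */
} VQ;

/* Stand-in for virtqueue_pop(): check one buffer out of the ring. */
static int vq_pop(VQ *vq)
{
    if (vq->inuse >= vq->num) {      /* the sanity check that fires */
        fprintf(stderr, "Virtqueue size exceeded\n");
        return -1;
    }
    vq->inuse++;
    return 0;
}

/* Stand-in for virtqueue_flush(): return completed buffers to the guest. */
static void vq_flush(VQ *vq, unsigned int count)
{
    vq->inuse -= count;
}

int main(void)
{
    VQ vq = { .num = 128, .inuse = 0 };
    /* If pops are never matched by flushes (a leak or corruption),
     * inuse creeps up until the sanity check fires. */
    while (vq_pop(&vq) == 0) {
        /* request processing would happen here */
    }
    printf("check fired with %u buffers outstanding\n", vq.inuse);
    vq_flush(&vq, vq.inuse);   /* returning them all resets the counter */
    return 0;
}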

I'm afraid that there's no way to expose this variable without
rebuilding QEMU. The best you could do is attach gdb to the QEMU
process and use some clever data access breakpoints to catch
suspicious writes to the variable. Although it's likely that it just
creeps up slowly and you won't see anything interesting. It's probably
beyond reasonable at this point anyway.
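
For illustration, such a session could look something like this (hypothetical;
assumes QEMU was built with debug symbols):

$ gdb -p <QEMU pid>
(gdb) break virtqueue_pop        # stop in the function that checks inuse
(gdb) continue
(gdb) print vq->inuse            # once the breakpoint hits
(gdb) watch -l vq->inuse         # data access breakpoint on that field
(gdb) continue

The -l form watches the address of the expression, so the watchpoint survives
leaving the current frame.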

I would continue with the elimination process (virtio_scsi instead of
virtio_blk, no balloon, etc.) and then maybe once we know which device
it is, we can add some instrumentation to the code.

> Thanks again for your help with this!
>
> Fer
>
Fernando Casas Schössow June 19, 2017, 10:10 p.m. UTC | #3
Hi Ladi,

Today two guests failed again at different times of day.
One of them was the one I switched from virtio_blk to virtio_scsi, so this change didn't solve the problem.
Now in this guest I also disabled virtio_balloon, continuing with the elimination process.

Also this time I found a different error message in the guest console.
In the guest already switched to virtio_scsi:

virtio_scsi virtio2: request:id 44 is not a head!

Followed by the usual "task blocked for more than 120 seconds." error.

On the guest still running on virtio_blk the error was similar:

virtio_blk virtio2: req.0:id 42 is not a head!
blk_update_request: I/O error, dev vda, sector 645657736
Buffer I/O error on dev dm-1, logical block 7413821, lost async page write

Followed by the usual "task blocked for more than 120 seconds." error.

Do you think that the blk_update_request and the buffer I/O error may be a consequence of the previous "is not a head!" error, or should I be worried about a storage-level issue here?

Now I will wait to see if disabling virtio_balloon helps or not and report back.

Thanks.

Fer

Ladi Prosek June 20, 2017, 5:59 a.m. UTC | #4
Hi Fernando,

On Tue, Jun 20, 2017 at 12:10 AM, Fernando Casas Schössow
<casasfernando@hotmail.com> wrote:
> Hi Ladi,
>
> Today two guests failed again at different times of day.
> One of them was the one I switched from virtio_blk to virtio_scsi, so this
> change didn't solve the problem.
> Now in this guest I also disabled virtio_balloon, continuing with the
> elimination process.
>
> Also this time I found a different error message in the guest console.
> In the guest already switched to virtio_scsi:
>
> virtio_scsi virtio2: request:id 44 is not a head!
>
> Followed by the usual "task blocked for more than 120 seconds." error.
>
> On the guest still running on virtio_blk the error was similar:
>
> virtio_blk virtio2: req.0:id 42 is not a head!
> blk_update_request: I/O error, dev vda, sector 645657736
> Buffer I/O error on dev dm-1, logical block 7413821, lost async page write
>
> Followed by the usual "task blocked for more than 120 seconds." error.

Honestly this is starting to look more and more like a memory
corruption. Two different virtio devices and two different guest
operating systems, that would have to be a bug in the common virtio
code and we would have seen it somewhere else already.

Would it be possible run a thorough memtest on the host just in case?

> Do you think that the blk_update_request and the buffer I/O error may be a
> consequence of the previous "is not a head!" error, or should I be worried
> about a storage-level issue here?
>
> Now I will wait to see if disabling virtio_balloon helps or not and report
> back.
>
> Thanks.
>
> Fer
>
Fernando Casas Schössow June 20, 2017, 6:30 a.m. UTC | #5
Hi Ladi,

In this case both guests are CentOS 7.3 running the same kernel 3.10.0-514.21.1.
Also the guest that fails most frequently is running Docker with 4 or 5 containers.

Another thing I would like to mention is that the host is running on Alpine's default grsec patched kernel. I have the option to install also a vanilla kernel. Would it make sense to switch to the vanilla kernel on the host and see if that helps?
And last but not least KSM is enabled on the host. Should I disable it?

Following your advice I will run memtest on the host and report back. Just as a side comment, the host is running on ECC memory.

Thanks for all your help.

Fer.

Ladi Prosek June 20, 2017, 7:52 a.m. UTC | #6
On Tue, Jun 20, 2017 at 8:30 AM, Fernando Casas Schössow
<casasfernando@hotmail.com> wrote:
> Hi Ladi,
>
> In this case both guests are CentOS 7.3 running the same kernel
> 3.10.0-514.21.1.
> Also the guest that fails most frequently is running Docker with 4 or 5
> containers.
>
> Another thing I would like to mention is that the host is running on
> Alpine's default grsec patched kernel. I have the option to install also a
> vanilla kernel. Would it make sense to switch to the vanilla kernel on the
> host and see if that helps?

The host kernel is less likely to be responsible for this, in my
opinion. I'd hold off on that for now.

> And last but not least KSM is enabled on the host. Should I disable it?

Could be worth the try.
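
For reference, KSM can be toggled at runtime through the standard sysfs
interface:

# stop the KSM daemon (already-merged pages stay shared)
echo 0 > /sys/kernel/mm/ksm/run

# stop it and additionally unmerge all currently shared pages
echo 2 > /sys/kernel/mm/ksm/run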

> Following your advice I will run memtest on the host and report back. Just
> as a side comment, the host is running on ECC memory.

I see.

Would it be possible for you, once a guest is in the broken state, to
make it available for debugging? By attaching gdb to the QEMU process
for example and letting me poke around it remotely? Thanks!

> Thanks for all your help.
>
> Fer.
>
Fernando Casas Schössow June 21, 2017, 12:19 p.m. UTC | #7
Hi Ladi,

Sorry for the delay in my reply.
I will leave the host kernel alone for now then.
For the last 15 hours or so I'm running memtest86+ on the host. So far so good. Two passes no errors so far. I will try to leave it running for at least another 24hr and report back the results. Hopefully we can discard the memory issue at hardware level.

Regarding KSM, that will be the next thing I will disable if after removing the balloon device guests still crash.

About leaving a guest in a failed state for you to debug it remotely, that's absolutely an option. We just need to coordinate so I can give you remote access to the host and so on. Let me know if any preparation is needed in advance and which tools you need installed on the host.

Once again I would like to thank you for all your help and your great disposition!

Cheers,

Fer

Ladi Prosek June 22, 2017, 7:43 a.m. UTC | #8
Hi Fernando,

On Wed, Jun 21, 2017 at 2:19 PM, Fernando Casas Schössow
<casasfernando@hotmail.com> wrote:
> Hi Ladi,
>
> Sorry for the delay in my reply.
> I will leave the host kernel alone for now then.
> For the last 15 hours or so I'm running memtest86+ on the host. So far so
> good. Two passes no errors so far. I will try to leave it running for at
> least another 24hr and report back the results. Hopefully we can discard the
> memory issue at hardware level.
>
> Regarding KSM, that will be the next thing I will disable if after removing
> the balloon device guests still crash.
>
> About leaving a guest in a failed state for you to debug it remotely, that's
> absolutely an option. We just need to coordinate so I can give you remote
> access to the host and so on. Let me know if any preparation is needed in
> advance and which tools you need installed on the host.

I think that gdbserver attached to the QEMU process should be enough.
When the VM gets into the broken state please do something like:

gdbserver --attach host:12345 <QEMU pid>

and let me know the host name and port (12345 in the above example).
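
On the other end, the connection is then made with plain gdb (standard remote
debugging, nothing QEMU-specific):

$ gdb /usr/bin/qemu-system-x86_64   # same binary, for the symbols
(gdb) target remote host:12345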

> Once again I would like to thank you for all your help and your great
> disposition!

You're absolutely welcome, I don't think I've done anything helpful so far :)

> Cheers,
>
> Fer
>
Fernando Casas Schössow June 23, 2017, 6:29 a.m. UTC | #9
Hi Ladi,

Small update. Memtest86+ was running on the host for more than 54 hours. 8 passes were completed and no memory errors found. For now I think we can assume that the host memory is ok.

I just started all the guests one hour ago. I will monitor them and once one fails I will attach the debugger and let you know.

Thanks.

Fer

Fernando Casas Schössow June 24, 2017, 8:34 a.m. UTC | #10
Hi Ladi,

After running for about 15 hours, two different guests (one Windows, one Linux) crashed about an hour apart, with the same error in the qemu log: "Virtqueue size exceeded".

The Linux guest was already running on virtio_scsi and without virtio_balloon. :(
I compiled and attached gdbserver to the qemu process for this guest but when I did this I got the following warning in gdbserver:

warning: Cannot call inferior functions, Linux kernel PaX protection forbids return to non-executable pages!

The default Alpine kernel is a grsec kernel. I'm not sure whether this will interfere with debugging, but I suspect it will.
If you need me to replace the grsec kernel with a vanilla one (also available as an option in Alpine) let me know and I will do so.
Otherwise send me an email directly so I can share with you the host:port details so you can connect to gdbserver.

Thanks,

Fer

Fernando Casas Schössow Jan. 31, 2019, 11:32 a.m. UTC | #11
Hi,

Sorry for resurrecting this thread after so long but I just upgraded the host to Qemu 3.1 and libvirt 4.10 and I'm still facing this problem.
At the moment I cannot use virtio disks (neither virtio-blk nor virtio-scsi) with my guests in order to avoid this issue, so as a workaround I'm using emulated SATA storage, which is not ideal but perfectly stable.

Do you have any suggestions on how can I progress troubleshooting?
Qemu is not crashing so I don't have any dumps that can be analyzed. The guest is just "stuck" and all I can do is destroy it and start it again.
It's really frustrating that after all this time I couldn't find the cause for this issue so any ideas are welcome.

Thanks.

Fernando
Stefan Hajnoczi Feb. 1, 2019, 5:48 a.m. UTC | #12
On Thu, Jan 31, 2019 at 11:32:32AM +0000, Fernando Casas Schössow wrote:
> Sorry for resurrecting this thread after so long but I just upgraded the host to Qemu 3.1 and libvirt 4.10 and I'm still facing this problem.
> At the moment I cannot use virtio disks (neither virtio-blk nor virtio-scsi) with my guests in order to avoid this issue, so as a workaround I'm using emulated SATA storage, which is not ideal but perfectly stable.
> 
> Do you have any suggestions on how can I progress troubleshooting?
> Qemu is not crashing so I don't have any dumps that can be analyzed. The guest is just "stuck" and all I can do is destroy it and start it again.
> It's really frustrating that after all this time I couldn't find the cause for this issue so any ideas are welcome.

Hi Fernando,
Please post your QEMU command-line (ps aux | grep qemu) and the details
of the guest operating system and version.

Stefan
Fernando Casas Schössow Feb. 1, 2019, 8:17 a.m. UTC | #13
Hi Stefan,

Thanks for looking into this. Please find the requested details below.
If you need any further details (like the host storage details) just let me know.

Host details:
CPU: Intel(R) Xeon(R) CPU E3-1230 V2
Memory: 32GB (ECC)
OS: Alpine 3.9
Kernel version: 4.19.18-0-vanilla
Qemu version: 3.1
Libvirt version: 4.10
Networking: openvswitch-2.10.1


Windows guest:
OS: Windows Server 2012 R2 64bit (up to date)
Virtio drivers version: 141 (stable)
Command line: /usr/bin/qemu-system-x86_64 -name guest=DCHOMENET02,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-17-DCHOMENET02/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DCHOMENET02_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 84afccf7-f164-4d0d-b9c9-c92ffab00848 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=42,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/dchomenet02.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -drive file=/storage/storage-hdd-vms/virtual_machines_hdd/dchomenet02_storage.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-1,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1,write-cache=on -netdev tap,fd=44,id=hostnet0,vhost=on,vhostfd=45 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:91:aa:a1,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,fd=47,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5905,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

Linux guest:
OS: CentOS 7.6 64bit (up to date)
Kernel version: 3.10.0-957.1.3.el7.x86_64
Command line: /usr/bin/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=44,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev tap,fd=46,id=hostnet0,vhost=on,vhostfd=48 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=49,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

Thanks.
Kind regards,

Fernando

Stefan Hajnoczi Feb. 4, 2019, 6:06 a.m. UTC | #14
Are you sure this happens with both virtio-blk and virtio-scsi?

The following patch adds more debug output.  You can build as follows:

  $ git clone https://git.qemu.org/git/qemu.git
  $ cd qemu
  $ patch -p1
  ...paste the patch here...
  ^D

  # For info on build dependencies see https://wiki.qemu.org/Hosts/Linux
  $ ./configure --target-list=x86_64-softmmu
  $ make -j4

You can configure a libvirt domain to use your custom QEMU binary by
changing the <devices><emulator> tag to the
qemu/x86_64-softmmu/qemu-system-x86_64 path.
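
For example, after virsh edit <domain>, the relevant fragment would look
roughly like this (the path is just an illustration):

  <devices>
    <emulator>/home/user/qemu/x86_64-softmmu/qemu-system-x86_64</emulator>
    <!-- remaining devices unchanged -->
  </devices>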

---

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 22bd1ac34e..aa44bffa1f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -879,6 +879,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }
Fernando Casas Schössow Feb. 4, 2019, 7:24 a.m. UTC | #15
I can test again with qemu 3.1, but with previous versions, yes, the same was happening with both virtio-blk and virtio-scsi.
For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can test with virtio-blk again if that will add value to the investigation.
Also I'm attaching a guest console screenshot showing the errors displayed by the guest when it goes unresponsive in case it can help.

Thanks for the patch. I will build the custom qemu binary and reproduce the issue. This may take a couple of days since I cannot reproduce it at will; sometimes it takes 12 hours, sometimes 2 days until it happens.
Hopefully the extra debug output will shed more light on this problem.

Thanks,

Fernando

Fernando Casas Schössow Feb. 6, 2019, 7:15 a.m. UTC | #16
I can now confirm that the same happens with virtio-blk and virtio-scsi.
Please find below the qemu log enhanced with the new information added by the patch provided by Stefan:

vdev 0x55d22b8e10f0 ("virtio-blk")
vq 0x55d22b8ebe40 (idx 0)
inuse 128 vring.num 128
2019-02-06T00:40:41.742552Z qemu-system-x86_64: Virtqueue size exceeded

I just changed the disk back to virtio-scsi so I can repro this again with the patched qemu and report back.

Thanks.

Fernando Casas Schössow Feb. 6, 2019, 4:47 p.m. UTC | #17
I could also repro the same with virtio-scsi on the same guest a couple of hours later:

2019-02-06 07:10:37.672+0000: starting up libvirt version: 4.10.0, qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local
LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=46,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=51,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
2019-02-06 07:10:37.672+0000: Domain id=32 is tainted: high-privileges
char device redirected to /dev/pts/5 (label charserial0)
vdev 0x5585456ef6b0 ("virtio-scsi")
vq 0x5585456f90a0 (idx 2)
inuse 128 vring.num 128
2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded


I'm open to any tests or suggestions that can move the investigation forward and find the cause of this issue.
Thanks.

On mié, feb 6, 2019 at 8:15 AM, Fernando Casas Schössow <casasfernando@outlook.com> wrote:
I can now confirm that the same happens with virtio-blk and virtio-scsi.
Please find below the qemu log enhanced with the new information added by the patch provided by Stefan:

vdev 0x55d22b8e10f0 ("virtio-blk")
vq 0x55d22b8ebe40 (idx 0)
inuse 128 vring.num 128
2019-02-06T00:40:41.742552Z qemu-system-x86_64: Virtqueue size exceeded

I just changed the disk back to virtio-scsi so I can repro this again with the patched qemu and report back.

Thanks.

On lun, feb 4, 2019 at 8:24 AM, Fernando Casas Schössow <casasfernando@outlook.com> wrote:

I can test again with qemu 3.1, but with previous versions, yes, the same was happening with both virtio-blk and virtio-scsi.
For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can test with virtio-blk again if that will add value to the investigation.
Also, I'm attaching a guest console screenshot showing the errors displayed by the guest when it goes unresponsive, in case it can help.

Thanks for the patch. I will build the custom qemu binary and reproduce the issue. This may take a couple of days since I cannot reproduce it at will; sometimes it takes 12 hours, sometimes 2 days until it happens.
Hopefully the debug patch will shed more light on this problem.

Thanks,

Fernando

Stefan Hajnoczi Feb. 11, 2019, 3:17 a.m. UTC | #18
On Wed, Feb 06, 2019 at 04:47:19PM +0000, Fernando Casas Schössow wrote:
> I could also repro the same with virtio-scsi on the same guest a couple of hours later:
> 
> 2019-02-06 07:10:37.672+0000: starting up libvirt version: 4.10.0, qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local
> LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=46,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=51,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
> 2019-02-06 07:10:37.672+0000: Domain id=32 is tainted: high-privileges
> char device redirected to /dev/pts/5 (label charserial0)
> vdev 0x5585456ef6b0 ("virtio-scsi")
> vq 0x5585456f90a0 (idx 2)
> inuse 128 vring.num 128
> 2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded
> 
> 
> I'm open to any tests or suggestions that can move the investigation forward and find the cause of this issue.

Thanks for collecting the data!

The fact that both virtio-blk and virtio-scsi failed suggests it's not a
virtqueue element leak in the virtio-blk or virtio-scsi device emulation
code.

The hung task error messages from inside the guest are a consequence of
QEMU hitting the "Virtqueue size exceeded" error.  QEMU refuses to
process further requests after the error, causing tasks inside the guest
to get stuck on I/O.

I don't have a good theory regarding the root cause.  Two ideas:
1. The guest is corrupting the vring or submitting more requests than
   will fit into the ring.  Somewhat unlikely because it happens with
   both Windows and Linux guests.
2. QEMU's virtqueue code is buggy, maybe the memory region cache which
   is used for fast guest RAM accesses.
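
For reference, this is the guest-visible available-ring layout those
indices refer to, as defined by the virtio spec (QEMU mirrors it as
VRingAvail in hw/virtio/virtio.c):

struct VRingAvail {
    uint16_t flags;
    uint16_t idx;     /* written by the guest driver, read by QEMU */
    uint16_t ring[];  /* head descriptor indices, vring.num entries */
};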

Here is an expanded version of the debug patch which might help identify
which of these scenarios is likely.  Sorry, it requires running the
guest again!

This time let's make QEMU dump core so both QEMU state and guest RAM are
captured for further debugging.  That way it will be possible to extract
more information using gdb without rerunning.

Stefan
---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
+    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
     }
@@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
+        fprintf(stderr, "used_idx %u\n", vq->used_idx);
+        abort(); /* <--- core dump! */
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }
Fernando Casas Schössow Feb. 11, 2019, 9:48 a.m. UTC | #19
Thanks for looking into this Stefan.

I rebuilt QEMU with the new patch and got a few guests running with the new build: two of them using virtio-scsi and another one using virtio-blk. Now I'm waiting for any of them to crash.
I also set libvirt to include the guest memory in the QEMU dumps, as I understood you will want to look at both the QEMU dump and the guest memory dump.

I will reply to this thread once I have any news.

Kind regards.

Fernando

Fernando Casas Schössow Feb. 18, 2019, 7:21 a.m. UTC | #20
It took a few days but last night the problem was reproduced.
This is the information from the log:

vdev 0x55f261d940f0 ("virtio-blk")
vq 0x55f261d9ee40 (idx 0)
inuse 128 vring.num 128
old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
avail 0x3d87a800 avail_idx (cache bypassed) 58625
used_idx 58497
2019-02-18 03:20:08.605+0000: shutting down, reason=crashed

The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB).
I have now switched the guest to virtio-scsi so I can get the same information and a dump with that setup as well.

How should we proceed?

Thanks.

Fernando Casas Schössow Feb. 19, 2019, 7:26 a.m. UTC | #21
Problem reproduced with virtio-scsi as well on the same guest; this time it took less than a day.
Information from the log file:

vdev 0x55823f119f90 ("virtio-scsi")
vq 0x55823f122e80 (idx 2)
inuse 128 vring.num 128
old_shadow_avail_idx 58367 last_avail_idx 58113 avail_idx 58367
avail 0x3de8a800 avail_idx (cache bypassed) 58113
used_idx 57985
2019-02-19 04:20:43.291+0000: shutting down, reason=crashed

Got the dump file as well, including guest memory. Size is around 486MB after compression.
Is there any other information I should collect to progress the investigation?

Thanks.

On lun, feb 18, 2019 at 8:21 AM, Fernando Casas Schössow <casasfernando@outlook.com> wrote:
It took a few days but last night the problem was reproduced.
This is the information from the log:

vdev 0x55f261d940f0 ("virtio-blk")
vq 0x55f261d9ee40 (idx 0)
inuse 128 vring.num 128
old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
avail 0x3d87a800 avail_idx (cache bypassed) 58625
used_idx 58497
2019-02-18 03:20:08.605+0000: shutting down, reason=crashed

The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB).
I switched the guest now to virtio-scsi to get the information and dump with this setup as well.

How should we proceed?

Thanks.

Stefan Hajnoczi Feb. 20, 2019, 4:58 p.m. UTC | #22
On Mon, Feb 18, 2019 at 07:21:25AM +0000, Fernando Casas Schössow wrote:
> It took a few days but last night the problem was reproduced.
> This is the information from the log:
> 
> vdev 0x55f261d940f0 ("virtio-blk")
> vq 0x55f261d9ee40 (idx 0)
> inuse 128 vring.num 128
> old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
> avail 0x3d87a800 avail_idx (cache bypassed) 58625

Hi Paolo,
Are you aware of any recent MemoryRegionCache issues?  The avail_idx
value 58874 was read via the cache while a non-cached read produces
58625!

I suspect that 58625 is correct since the vring is already full and the
driver wouldn't bump avail_idx any further until requests complete.  The
numbers bear that out: last_avail_idx 58625 - used_idx 58497 = 128 =
vring.num, i.e. exactly a full ring, while the cached value 58874 would
imply 249 requests beyond last_avail_idx, far more than a 128-entry ring
can hold.

Fernando also hits this issue with virtio-scsi so it's not a
virtio_blk.ko driver bug or a virtio-blk device emulation issue.

A QEMU core dump is available for debugging.

Here is the patch that produced this debug output:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
+    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
     }
@@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
+        fprintf(stderr, "used_idx %u\n", vq->used_idx);
+        abort(); /* <--- core dump! */
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }

Stefan

> used_idx 58497
> 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed
> 
> The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB).
> I switched the guest now to virtio-scsi to get the information and dump with this setup as well.
> 
> How should we proceed?
> 
> Thanks.
Paolo Bonzini Feb. 20, 2019, 5:53 p.m. UTC | #23
On 20/02/19 17:58, Stefan Hajnoczi wrote:
> On Mon, Feb 18, 2019 at 07:21:25AM +0000, Fernando Casas Schössow wrote:
>> It took a few days but last night the problem was reproduced.
>> This is the information from the log:
>>
>> vdev 0x55f261d940f0 ("virtio-blk")
>> vq 0x55f261d9ee40 (idx 0)
>> inuse 128 vring.num 128
>> old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
>> avail 0x3d87a800 avail_idx (cache bypassed) 58625
> 
> Hi Paolo,
> Are you aware of any recent MemoryRegionCache issues?  The avail_idx
> value 58874 was read via the cache while a non-cached read produces
> 58625!
> 
> I suspect that 58625 is correct since the vring is already full and the
> driver wouldn't bump avail_idx any further until requests complete.
> 
> Fernando also hits this issue with virtio-scsi so it's not a
> virtio_blk.ko driver bug or a virtio-blk device emulation issue.

No, I am not aware of any issues.

How can I get the core dump (and the corresponding executable to get the
symbols)?  Alternatively, it should be enough to print the
vq->vring.caches->avail.mrs from the debugger.
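
For example, a sketch of such a session against the core dump (the frame
number and binary path here are illustrative):

$ gdb x86_64-softmmu/qemu-system-x86_64 core
(gdb) bt                                 # locate the virtqueue_pop() frame
(gdb) frame 2                            # illustrative frame number
(gdb) print vq->vring.caches->avail.mrs
(gdb) print vq->shadow_avail_idx
(gdb) print vq->last_avail_idx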

Also, one possibility is to add in vring_avail_idx an assertion like

   assert(vq->shadow_avail_idx == virtio_lduw_phys(vdev,
          vq->vring.avail + offsetof(VRingAvail, idx)));

and try to catch the error earlier.
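
A minimal sketch of where that check could sit, assuming vring_avail_idx()
in hw/virtio/virtio.c reads the index through the MemoryRegionCache (the
body below is illustrative, not the exact upstream code):

static uint16_t vring_avail_idx(VirtQueue *vq)
{
    /* Cached read, as QEMU normally does it (illustrative): */
    uint16_t idx = virtio_lduw_phys_cached(vq->vdev,
                                           &vq->vring.caches->avail,
                                           offsetof(VRingAvail, idx));

    /* Cross-check against an uncached read to catch a mismatch early: */
    assert(idx == virtio_lduw_phys(vq->vdev,
               vq->vring.avail + offsetof(VRingAvail, idx)));

    vq->shadow_avail_idx = idx;
    return idx;
}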

Paolo

> A QEMU core dump is available for debugging.
> 
> Here is the patch that produced this debug output:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index a1ff647a66..28d89fcbcb 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>          return NULL;
>      }
>      rcu_read_lock();
> +    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
>      if (virtio_queue_empty_rcu(vq)) {
>          goto done;
>      }
> @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>      max = vq->vring.num;
> 
>      if (vq->inuse >= vq->vring.num) {
> +        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
> +        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
> +        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
> +        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
> +        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
> +        fprintf(stderr, "used_idx %u\n", vq->used_idx);
> +        abort(); /* <--- core dump! */
>          virtio_error(vdev, "Virtqueue size exceeded");
>          goto done;
>      }
> 
> Stefan
> 
>> used_idx 58497
>> 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed
>>
>> The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB).
>> I switched the guest now to virtio-scsi to get the information and dump with this setup as well.
>>
>> How should we proceed?
>>
>> Thanks.
Fernando Casas Schössow Feb. 20, 2019, 6:56 p.m. UTC | #24
Hi Paolo,

This is Fernando, the one who reported the issue.
Regarding the dumps, I have three of them including guest memory (two for virtio-scsi, one for virtio-blk), in case a comparison may help to confirm where the problem is. I can upload them to a server you indicate, or I can put them on a server so you can download them as you see fit. Each dump, compressed, is around 500MB.

If it's more convenient for you I can try to get the requested information from gdb. But I will need some guidance since I'm not skilled enough with the debugger.

Another option, if you provide me with the right patch, is for me to apply it, rebuild QEMU and reproduce the problem again. With virtio-scsi I'm able to repro this in a matter of hours most of the time; with virtio-blk it will take a couple of days.

Just let me know how you prefer to move forward.

Thanks a lot for helping with this!

Kind regards,

Fernando

Stefan Hajnoczi Feb. 21, 2019, 11:11 a.m. UTC | #25
On Wed, Feb 20, 2019 at 06:56:04PM +0000, Fernando Casas Schössow wrote:
> Regarding the dumps I have three of them including guest memory, 2 for virtio-scsi, 1 for virtio-blk, in case a comparison may help to confirm which is the proble.) I can upload them to a server you indicate me or I can also put them on a server so you can download them as you see fit. Each dump, compressed, is around 500MB.

Hi Fernando,
It would be great if you could make a compressed coredump and the
corresponding QEMU executable (hopefully with debug symbols) available
on a server so Paolo and/or I can download them.

If you're wondering about the debug symbols, since you built QEMU from
source the binary in x86_64-softmmu/qemu-system-x86_64 should have the
debug symbols.

"gdb path/to/qemu-system-x86_64" will print "Reading symbols from
qemu-system-x86_64...done." if symbols are available.  Otherwise it will
say "Reading symbols from qemu-system-x86_64...(no debugging symbols
found)...done.".

Thanks,
Stefan
Fernando Casas Schössow Feb. 21, 2019, 11:33 a.m. UTC | #26
Hi Stefan,

I can confirm that the symbols are included in the binary using gdb.

I will send you and Paolo an email with the link and credentials (if needed) so you can download everything.

Thanks!

Stefan Hajnoczi Feb. 22, 2019, 2:04 p.m. UTC | #27
On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
<casasfernando@outlook.com> wrote:

I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.

Fernando: Can you confirm that the bug occurs with an unmodified
Alpine Linux qemu binary?

Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
host endian unaligned access functions") introduced the unaligned
memory access functions in question here.  Please see below for
details on the bug - basically QEMU code assumes they are atomic, but
that is not guaranteed :(.  Any ideas for how to fix this?

Natanael: It seems likely that the qemu package in Alpine Linux
suffers from a compilation issue resulting in a broken QEMU.  It may
be necessary to leave the compiler optimization flag alone in APKBUILD
to work around this problem.

Here are the details.  QEMU relies on the compiler turning
memcpy(&dst, &src, 2) into a load instruction in
include/qemu/bswap.h:lduw_he_p() (explanation below):

/* Any compiler worth its salt will turn these memcpy into native unaligned
   operations.  Thus we don't need to play games with packed attributes, or
   inline byte-by-byte stores.  */

static inline int lduw_he_p(const void *ptr)
{
    uint16_t r;
    memcpy(&r, ptr, sizeof(r));
    return r;
}

Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that
shows the load instruction:

  398166:       0f b7 42 02             movzwl 0x2(%rdx),%eax
  39816a:       66 89 43 32             mov    %ax,0x32(%rbx)

Here is the instruction sequence in the Alpine Linux binary:

  455562:    ba 02 00 00 00           mov    $0x2,%edx
  455567:    e8 74 24 f3 ff           callq  3879e0 <memcpy@plt>
  45556c:    0f b7 44 24 42           movzwl 0x42(%rsp),%eax
  455571:    66 41 89 47 32           mov    %ax,0x32(%r15)

It's calling memcpy instead of using a load instruction.

Fernando found that QEMU's virtqueue_pop() function sees bogus values
when loading a 16-bit guest RAM location.  Paolo figured out that the
bogus value can be produced by memcpy() when another thread is
updating the 16-bit memory location simultaneously.  This is a race
condition between one thread loading the 16-bit value and another
thread storing it (in this case a guest vcpu thread).  Sometimes
memcpy() may load one old byte and one new byte, resulting in a bogus
value.
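
As a standalone illustration of that race (a demo program, not QEMU
code; on x86-64 it typically reports a torn value within seconds):

/* demo.c - one thread stores a 16-bit value byte by byte, the way an
 * out-of-line memcpy() may; the other thread does single aligned
 * 16-bit loads and checks for a mix of old and new bytes. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static volatile uint16_t shared;

static void *writer(void *arg)
{
    uint16_t v = 0;
    (void)arg;
    for (;;) {
        v = ~v;                                      /* 0x0000 <-> 0xffff */
        ((volatile uint8_t *)&shared)[0] = v & 0xff; /* torn-read window */
        ((volatile uint8_t *)&shared)[1] = v >> 8;   /* between these    */
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);
    for (;;) {
        uint16_t v = shared;            /* single aligned 16-bit load */
        if (v != 0x0000 && v != 0xffff) {
            printf("torn read: 0x%04x\n", v);  /* one old, one new byte */
            return 0;
        }
    }
}

Build with: gcc -O2 -pthread demo.c -o demo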

The symptom that Fernando experienced is a "Virtqueue size exceeded"
error message from QEMU and then the virtio-blk or virtio-scsi device
stops working.  This issue potentially affects other device emulation
code in QEMU as well, not just virtio devices.

For the time being, I suggest tweaking the APKBUILD so the memcpy() is
not generated.  Hopefully QEMU can make the code more portable in the
future so the compiler always does the expected thing, but this may
not be easily possible.

Stefan
Paolo Bonzini Feb. 22, 2019, 2:38 p.m. UTC | #28
On 22/02/19 15:04, Stefan Hajnoczi wrote:
> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> <casasfernando@outlook.com> wrote:
> 
> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
> 
> Fernando: Can you confirm that the bug occurs with an unmodified
> Alpine Linux qemu binary?

I can confirm that the memcpy calls are there, at least.

Paolo

> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
> host endian unaligned access functions") introduced the unaligned
> memory access functions in question here.  Please see below for
> details on the bug - basically QEMU code assumes they are atomic, but
> that is not guaranteed :(.  Any ideas for how to fix this?
> 
> Natanael: It seems likely that the qemu package in Alpine Linux
> suffers from a compilation issue resulting in a broken QEMU.  It may
> be necessary to leave the compiler optimization flag alone in APKBUILD
> to work around this problem.
> 
> Here are the details.  QEMU relies on the compiler turning
> memcpy(&dst, &src, 2) into a load instruction in
> include/qemu/bswap.h:lduw_he_p() (explanation below):
> 
> /* Any compiler worth its salt will turn these memcpy into native unaligned
>    operations.  Thus we don't need to play games with packed attributes, or
>    inline byte-by-byte stores.  */
> 
> static inline int lduw_he_p(const void *ptr)
> {
>     uint16_t r;
>     memcpy(&r, ptr, sizeof(r));
>     return r;
> }
> 
> Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that
> shows the load instruction:
> 
>   398166:       0f b7 42 02             movzwl 0x2(%rdx),%eax
>   39816a:       66 89 43 32             mov    %ax,0x32(%rbx)
> 
> Here is the instruction sequence in the Alpine Linux binary:
> 
>   455562:    ba 02 00 00 00           mov    $0x2,%edx
>   455567:    e8 74 24 f3 ff           callq  3879e0 <memcpy@plt>
>   45556c:    0f b7 44 24 42           movzwl 0x42(%rsp),%eax
>   455571:    66 41 89 47 32           mov    %ax,0x32(%r15)
> 
> It's calling memcpy instead of using a load instruction.
> 
> Fernando found that QEMU's virtqueue_pop() function sees bogus values
> when loading a 16-bit guest RAM location.  Paolo figured out that the
> bogus value can be produced by memcpy() when another thread is
> updating the 16-bit memory location simultaneously.  This is a race
> condition between one thread loading the 16-bit value and another
> thread storing it (in this case a guest vcpu thread).  Sometimes
> memcpy() may load one old byte and one new byte, resulting in a bogus
> value.
> 
> The symptom that Fernando experienced is a "Virtqueue size exceeded"
> error message from QEMU and then the virtio-blk or virtio-scsi device
> stops working.  This issue potentially affects other device emulation
> code in QEMU as well, not just virtio devices.
> 
> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
> not generated.  Hopefully QEMU can make the code more portable in the
> future so the compiler always does the expected thing, but this may
> not be easily possible.
> 
> Stefan
>
Fernando Casas Schössow Feb. 22, 2019, 2:43 p.m. UTC | #29
Hi all,

Indeed I can confirm this issue occurs with the stock, unmodified QEMU binary provided with Alpine since at least Alpine 3.6 up to 3.9.

Is there any compiler flag I can tweak, add or remove to rebuild qemu and try to repro to confirm a possible workaround?

Thanks.

Fernando

Sent from my iPhone

> On 22 Feb 2019, at 15:04, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> <casasfernando@outlook.com> wrote:
> 
> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
> 
> Fernando: Can you confirm that the bug occurs with an unmodified
> Alpine Linux qemu binary?
> 
> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
> host endian unaligned access functions") introduced the unaligned
> memory access functions in question here.  Please see below for
> details on the bug - basically QEMU code assumes they are atomic, but
> that is not guaranteed :(.  Any ideas for how to fix this?
> 
> Natanael: It seems likely that the qemu package in Alpine Linux
> suffers from a compilation issue resulting in a broken QEMU.  It may
> be necessary to leave the compiler optimization flag alone in APKBUILD
> to work around this problem.
> 
> Here are the details.  QEMU relies on the compiler turning
> memcpy(&dst, &src, 2) into a load instruction in
> include/qemu/bswap.h:lduw_he_p() (explanation below):
> 
> /* Any compiler worth its salt will turn these memcpy into native unaligned
>   operations.  Thus we don't need to play games with packed attributes, or
>   inline byte-by-byte stores.  */
> 
> static inline int lduw_he_p(const void *ptr)
> {
>    uint16_t r;
>    memcpy(&r, ptr, sizeof(r));
>    return r;
> }
> 
> Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that
> shows the load instruction:
> 
>  398166:       0f b7 42 02             movzwl 0x2(%rdx),%eax
>  39816a:       66 89 43 32             mov    %ax,0x32(%rbx)
> 
> Here is the instruction sequence in the Alpine Linux binary:
> 
>  455562:    ba 02 00 00 00           mov    $0x2,%edx
>  455567:    e8 74 24 f3 ff           callq  3879e0 <memcpy@plt>
>  45556c:    0f b7 44 24 42           movzwl 0x42(%rsp),%eax
>  455571:    66 41 89 47 32           mov    %ax,0x32(%r15)
> 
> It's calling memcpy instead of using a load instruction.
> 
> Fernando found that QEMU's virtqueue_pop() function sees bogus values
> when loading a 16-bit guest RAM location.  Paolo figured out that the
> bogus value can be produced by memcpy() when another thread is
> updating the 16-bit memory location simultaneously.  This is a race
> condition between one thread loading the 16-bit value and another
> thread storing it (in this case a guest vcpu thread).  Sometimes
> memcpy() may load one old byte and one new byte, resulting in a bogus
> value.
> 
> The symptom that Fernando experienced is a "Virtqueue size exceeded"
> error message from QEMU and then the virtio-blk or virtio-scsi device
> stops working.  This issue potentially affects other device emulation
> code in QEMU as well, not just virtio devices.
> 
> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
> not generated.  Hopefully QEMU can make the code more portable in the
> future so the compiler always does the expected thing, but this may
> not be easily possible.
> 
> Stefan
Paolo Bonzini Feb. 22, 2019, 2:55 p.m. UTC | #30
On 22/02/19 15:43, Fernando Casas Schössow wrote:
> Hi all,
> 
> Indeed I can confirm this issue occurs with the stock, unmodified
> QEMU binary provided with Alpine since at least Alpine 3.6 up to
> 3.9.
> 
> Is there any compiler flag I can tweak, add or remove to rebuild qemu
> and try to repro to confirm a possible workaround?

Nope, not yet.  However, I can try some experiments if you can provide
some information on how to rebuild an apk.

Paolo

> Thanks.
> 
> Fernando
> 
> Sent from my iPhone
> 
>> On 22 Feb 2019, at 15:04, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>
>> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
>> <casasfernando@outlook.com> wrote:
>>
>> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
>>
>> Fernando: Can you confirm that the bug occurs with an unmodified
>> Alpine Linux qemu binary?
>>
>> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
>> host endian unaligned access functions") introduced the unaligned
>> memory access functions in question here.  Please see below for
>> details on the bug - basically QEMU code assumes they are atomic, but
>> that is not guaranteed :(.  Any ideas for how to fix this?
>>
>> Natanael: It seems likely that the qemu package in Alpine Linux
>> suffers from a compilation issue resulting in a broken QEMU.  It may
>> be necessary to leave the compiler optimization flag alone in APKBUILD
>> to work around this problem.
>>
>> Here are the details.  QEMU relies on the compiler turning
>> memcpy(&dst, &src, 2) into a load instruction in
>> include/qemu/bswap.h:lduw_he_p() (explanation below):
>>
>> /* Any compiler worth its salt will turn these memcpy into native unaligned
>>   operations.  Thus we don't need to play games with packed attributes, or
>>   inline byte-by-byte stores.  */
>>
>> static inline int lduw_he_p(const void *ptr)
>> {
>>    uint16_t r;
>>    memcpy(&r, ptr, sizeof(r));
>>    return r;
>> }
>>
>> Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that
>> shows the load instruction:
>>
>>  398166:       0f b7 42 02             movzwl 0x2(%rdx),%eax
>>  39816a:       66 89 43 32             mov    %ax,0x32(%rbx)
>>
>> Here is the instruction sequence in the Alpine Linux binary:
>>
>>  455562:    ba 02 00 00 00           mov    $0x2,%edx
>>  455567:    e8 74 24 f3 ff           callq  3879e0 <memcpy@plt>
>>  45556c:    0f b7 44 24 42           movzwl 0x42(%rsp),%eax
>>  455571:    66 41 89 47 32           mov    %ax,0x32(%r15)
>>
>> It's calling memcpy instead of using a load instruction.
>>
>> Fernando found that QEMU's virtqueue_pop() function sees bogus values
>> when loading a 16-bit guest RAM location.  Paolo figured out that the
>> bogus value can be produced by memcpy() when another thread is
>> updating the 16-bit memory location simultaneously.  This is a race
>> condition between one thread loading the 16-bit value and another
>> thread storing it (in this case a guest vcpu thread).  Sometimes
>> memcpy() may load one old byte and one new byte, resulting in a bogus
>> value.
>>
>> The symptom that Fernando experienced is a "Virtqueue size exceeded"
>> error message from QEMU and then the virtio-blk or virtio-scsi device
>> stops working.  This issue potentially affects other device emulation
>> code in QEMU as well, not just virtio devices.
>>
>> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
>> not generated.  Hopefully QEMU can make the code more portable in the
>> future so the compiler always does the expected thing, but this may
>> not be easily possible.
>>
>> Stefan
Fernando Casas Schössow Feb. 22, 2019, 3:48 p.m. UTC | #31
You can find the information here: https://wiki.alpinelinux.org/wiki/Creating_an_Alpine_package

For your convenience, I can set up an Alpine VM and give you SSH access with the build environment and the aports tree already set up.
The summary steps, in case you want to try in your own container/VM, would be:

1- Setup the build environment as described in the wiki.
2- Clone aports tree: git clone https://github.com/alpinelinux/aports
3- Go into the qemu package directory in the aports tree: cd aports/main/qemu
4- Run: abuild clean checksum unpack prepare deps build

This would get you the binary built.
If you need to add a patch, create the patch file, add it to the APKBUILD file, and then run "abuild checksum" to compute its checksum and add it to the APKBUILD file.
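
For instance, a sketch of the APKBUILD change (the patch file name here is hypothetical; keep the existing source= entries as they are):

source="...existing entries...
        fix-unaligned-loads.patch"

then run "abuild checksum" again so the new file's checksum lands in the checksums list.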

abuild quick ref:

abuild clean: Remove temp build and install dirs
abuild unpack: Unpack sources to $srcdir
abuild checksum: Generate checksum to be included in APKBUILD
abuild prepare: Apply patches
abuild deps: Install packages listed in makedepends and depends
abuild build: Compile and install package into $pkgdir

Once finished, run the following to clean up everything: abuild undeps clean

Thanks.

Fernando


Dr. David Alan Gilbert Feb. 22, 2019, 4:37 p.m. UTC | #32
* Stefan Hajnoczi (stefanha@gmail.com) wrote:
> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> <casasfernando@outlook.com> wrote:
> 
> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
> 
> Fernando: Can you confirm that the bug occurs with an unmodified
> Alpine Linux qemu binary?
> 
> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
> host endian unaligned access functions") introduced the unaligned
> memory access functions in question here.  Please see below for
> details on the bug - basically QEMU code assumes they are atomic, but
> that is not guaranteed :(.  Any ideas for how to fix this?


Why does vring_avail_idx use virtio_ld*u*w_phys_cached?
(similar for vring_avail_ring).
There's no generically safe way to do unaligned atomic loads - don't we know
in virtio that these are naturally aligned?

Dave

> Natanael: It seems likely that the qemu package in Alpine Linux
> suffers from a compilation issue resulting in a broken QEMU.  It may
> be necessary to leave the compiler optimization flag alone in APKBUILD
> to work around this problem.
> 
> Here are the details.  QEMU relies on the compiler turning
> memcpy(&dst, &src, 2) into a load instruction in
> include/qemu/bswap.h:lduw_he_p() (explanation below):
> 
> /* Any compiler worth its salt will turn these memcpy into native unaligned
>    operations.  Thus we don't need to play games with packed attributes, or
>    inline byte-by-byte stores.  */
> 
> static inline int lduw_he_p(const void *ptr)
> {
>     uint16_t r;
>     memcpy(&r, ptr, sizeof(r));
>     return r;
> }
> 
> Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that
> shows the load instruction:
> 
>   398166:       0f b7 42 02             movzwl 0x2(%rdx),%eax
>   39816a:       66 89 43 32             mov    %ax,0x32(%rbx)
> 
> Here is the instruction sequence in the Alpine Linux binary:
> 
>   455562:    ba 02 00 00 00           mov    $0x2,%edx
>   455567:    e8 74 24 f3 ff           callq  3879e0 <memcpy@plt>
>   45556c:    0f b7 44 24 42           movzwl 0x42(%rsp),%eax
>   455571:    66 41 89 47 32           mov    %ax,0x32(%r15)
> 
> It's calling memcpy instead of using a load instruction.
> 
> Fernando found that QEMU's virtqueue_pop() function sees bogus values
> when loading a 16-bit guest RAM location.  Paolo figured out that the
> bogus value can be produced by memcpy() when another thread is
> updating the 16-bit memory location simultaneously.  This is a race
> condition between one thread loading the 16-bit value and another
> thread storing it (in this case a guest vcpu thread).  Sometimes
> memcpy() may load one old byte and one new byte, resulting in a bogus
> value.
> 
> The symptom that Fernando experienced is a "Virtqueue size exceeded"
> error message from QEMU and then the virtio-blk or virtio-scsi device
> stops working.  This issue potentially affects other device emulation
> code in QEMU as well, not just virtio devices.
> 
> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
> not generated.  Hopefully QEMU can make the code more portable in the
> future so the compiler always does the expected thing, but this may
> not be easily possible.
> 
> Stefan
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Paolo Bonzini Feb. 22, 2019, 4:39 p.m. UTC | #33
On 22/02/19 17:37, Dr. David Alan Gilbert wrote:
> Why does vring_avail_idx use virtio_ld*u*w_phys_cached?
> (similar for vring_avail_ring).
> There's no generically safe way to do unaligned atomic loads - don't we know
> in virtio that these are naturally aligned?

u is for unsigned. :)  We know that these are aligned and so they will
be atomic.

Paolo
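
For reference, the alignment Paolo refers to follows from the ring layout;
a sketch following the split-virtqueue definition in the virtio spec (cf.
linux/virtio_ring.h; not QEMU's actual structs):

#include <stdint.h>

/* The available ring is a packed sequence of 16-bit fields, and the
 * driver must place the ring with at least 2-byte alignment, so every
 * field -- including idx at offset 2 -- is naturally aligned.  An
 * aligned 16-bit load is atomic on the hosts QEMU supports. */
struct vring_avail_sketch {
    uint16_t flags;    /* offset 0 */
    uint16_t idx;      /* offset 2 -- the field virtqueue_pop() reads */
    uint16_t ring[];   /* offset 4 onwards, one entry per descriptor */
};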
Dr. David Alan Gilbert Feb. 22, 2019, 4:47 p.m. UTC | #34
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 22/02/19 17:37, Dr. David Alan Gilbert wrote:
> > Why does vring_avail_idx use virtio_ld*u*w_phys_cached?
> > (similar for vring_avail_ring).
> > There's no generically safe way to do unaligned atomic loads - don't we know
> > in virtio that these are naturally aligned?
> 
> u is for unsigned. :)  We know that these are aligned and so they will
> be atomic.

Ah OK!

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Natanael Copa Feb. 23, 2019, 11:49 a.m. UTC | #35
On Fri, 22 Feb 2019 14:04:20 +0000
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> <casasfernando@outlook.com> wrote:
> 
> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.

Hi!

...

> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
> host endian unaligned access functions") introduced the unaligned
> memory access functions in question here.  Please see below for
> details on the bug - basically QEMU code assumes they are atomic, but
> that is not guaranteed :(.  Any ideas for how to fix this?
> 
> Natanael: It seems likely that the qemu package in Alpine Linux
> suffers from a compilation issue resulting in a broken QEMU.  It may
> be necessary to leave the compiler optimization flag alone in APKBUILD
> to work around this problem.
> 
> Here are the details.  QEMU relies on the compiler turning
> memcpy(&dst, &src, 2) into a load instruction in
> include/qemu/bswap.h:lduw_he_p() (explanation below):
> 
> /* Any compiler worth its salt will turn these memcpy into native unaligned
>    operations.  Thus we don't need to play games with packed attributes, or
>    inline byte-by-byte stores.  */
> 
> static inline int lduw_he_p(const void *ptr)
> {
>     uint16_t r;
>     memcpy(&r, ptr, sizeof(r));
>     return r;
> }
> 
> Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that
> shows the load instruction:
> 
>   398166:       0f b7 42 02             movzwl 0x2(%rdx),%eax
>   39816a:       66 89 43 32             mov    %ax,0x32(%rbx)
> 
> Here is the instruction sequence in the Alpine Linux binary:
> 
>   455562:    ba 02 00 00 00           mov    $0x2,%edx
>   455567:    e8 74 24 f3 ff           callq  3879e0 <memcpy@plt>
>   45556c:    0f b7 44 24 42           movzwl 0x42(%rsp),%eax
>   455571:    66 41 89 47 32           mov    %ax,0x32(%r15)
> 
> It's calling memcpy instead of using a load instruction.

My first reaction to this is: if the intention is to not actually call
the memcpy function, then maybe memcpy should not be used in the C code in
the first place?
 
> Fernando found that QEMU's virtqueue_pop() function sees bogus values
> when loading a 16-bit guest RAM location.  Paolo figured out that the
> bogus value can be produced by memcpy() when another thread is
> updating the 16-bit memory location simultaneously.  This is a race
> condition between one thread loading the 16-bit value and another
> thread storing it (in this case a guest vcpu thread).  Sometimes
> memcpy() may load one old byte and one new byte, resulting in a bogus
> value.
> 
> The symptom that Fernando experienced is a "Virtqueue size exceeded"
> error message from QEMU and then the virtio-blk or virtio-scsi device
> stops working.  This issue potentially affects other device emulation
> code in QEMU as well, not just virtio devices.
> 
> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
> not generated.  Hopefully QEMU can make the code more portable in the
> future so the compiler always does the expected thing, but this may
> not be easily possible.

I suspect this happens because the Alpine toolchain enables
_FORTIFY_SOURCE=2 by default, combined with the way this is implemented via fortify-headers:
http://git.2f30.org/fortify-headers/file/include/string.h.html#l39

Try building with -U_FORTIFY_SOURCE

> 
> Stefan
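
For background, the pattern in the linked fortify-headers string.h is
roughly the following (a simplified, hypothetical sketch, not the verbatim
header): the checking wrapper ends in a call to a separate __orig_memcpy
symbol, which the compiler cannot recognize as memcpy, so a fixed 2-byte
copy no longer collapses into a single load:

#include <stddef.h>

void *__orig_memcpy(void *d, const void *s, size_t n);  /* the real memcpy */

static inline void *memcpy_chk_sketch(void *d, const void *s, size_t n)
{
    /* ... overlap and __builtin_object_size() bounds checks ... */
    return __orig_memcpy(d, s, n);   /* opaque out-of-line call: the
                                        compiler must emit a real call */
}
#define memcpy memcpy_chk_sketch     /* hypothetical; the real header uses
                                        its own macro machinery */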
Natanael Copa Feb. 23, 2019, 3:55 p.m. UTC | #36
On Fri, 22 Feb 2019 14:04:20 +0000
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> <casasfernando@outlook.com> wrote:
> 
> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
> 
> Fernando: Can you confirm that the bug occurs with an unmodified
> Alpine Linux qemu binary?

I wonder exactly which binary it is. What architecture (x86 or x86_64)
and what binary? Is it a dynamically or statically linked qemu-system-x86_64?
 
> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
> host endian unaligned access functions") introduced the unaligned
> memory access functions in question here.  Please see below for
> details on the bug - basically QEMU code assumes they are atomic, but
> that is not guaranteed :(.  Any ideas for how to fix this?
> 
> Natanael: It seems likely that the qemu package in Alpine Linux
> suffers from a compilation issue resulting in a broken QEMU.  It may
> be necessary to leave the compiler optimization flag alone in APKBUILD
> to work around this problem.
> 
> Here are the details.  QEMU relies on the compiler turning
> memcpy(&dst, &src, 2) into a load instruction in
> include/qemu/bswap.h:lduw_he_p() (explanation below):
> 
> /* Any compiler worth its salt will turn these memcpy into native unaligned
>    operations.  Thus we don't need to play games with packed attributes, or
>    inline byte-by-byte stores.  */
> 
> static inline int lduw_he_p(const void *ptr)
> {
>     uint16_t r;
>     memcpy(&r, ptr, sizeof(r));
>     return r;
> }
> 
> Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that
> shows the load instruction:
> 
>   398166:       0f b7 42 02             movzwl 0x2(%rdx),%eax
>   39816a:       66 89 43 32             mov    %ax,0x32(%rbx)
> 
> Here is the instruction sequence in the Alpine Linux binary:
> 
>   455562:    ba 02 00 00 00           mov    $0x2,%edx
>   455567:    e8 74 24 f3 ff           callq  3879e0 <memcpy@plt>
>   45556c:    0f b7 44 24 42           movzwl 0x42(%rsp),%eax
>   455571:    66 41 89 47 32           mov    %ax,0x32(%r15)
> 
> It's calling memcpy instead of using a load instruction.

I tried to find this section. How do you get the assembly listing of
the relevant section? I tried to do "disas virtio_pop" from
`gdb /usr/bin/qemu-system-x86_64` on the binary in Alpine edge. I
could find 2 memcpy calls, but none of them looks like a 16-bit operation:

   0x00000000004551f1 <+353>:   mov    0x10(%rsp),%rdi
   0x00000000004551f6 <+358>:   mov    $0x10,%edx
   0x00000000004551fb <+363>:   callq  0x3879e0 <memcpy@plt>
   0x0000000000455200 <+368>:   movzwl 0x5c(%rsp),%eax
   0x0000000000455205 <+373>:   test   $0x4,%al
   0x0000000000455207 <+375>:   je     0x4552aa <virtqueue_pop+538>

....

   0x0000000000455291 <+513>:   mov    0x10(%rsp),%rdi
   0x0000000000455296 <+518>:   mov    $0x10,%edx
   0x000000000045529b <+523>:   callq  0x3879e0 <memcpy@plt>
   0x00000000004552a0 <+528>:   mov    %rbp,0x20(%rsp)
   0x00000000004552a5 <+533>:   movzwl 0x5c(%rsp),%eax
   0x00000000004552aa <+538>:   lea    0x20e0(%rsp),%rdi
   0x00000000004552b2 <+546>:   xor    %r11d,%r11d
   0x00000000004552b5 <+549>:   mov    %r15,0x38(%rsp)



> 
> Fernando found that QEMU's virtqueue_pop() function sees bogus values
> when loading a 16-bit guest RAM location.  Paolo figured out that the
> bogus value can be produced by memcpy() when another thread is
> updating the 16-bit memory location simultaneously.  This is a race
> condition between one thread loading the 16-bit value and another
> thread storing it (in this case a guest vcpu thread).  Sometimes
> memcpy() may load one old byte and one new byte, resulting in a bogus
> value.
> 
> The symptom that Fernando experienced is a "Virtqueue size exceeded"
> error message from QEMU and then the virtio-blk or virtio-scsi device
> stops working.  This issue potentially affects other device emulation
> code in QEMU as well, not just virtio devices.
> 
> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
> not generated.  Hopefully QEMU can make the code more portable in the
> future so the compiler always does the expected thing, but this may
> not be easily possible.

I was thinking of something along the lines of:

typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
static inline int lduw_he_p(const void *ptr)
{
     volatile_uint16_t r = *(volatile_uint16_t*)ptr;
     return r;
}

I can test different CFLAGS with and without _FORTIFY_SOURCE and
with different variants of memcpy (like __builtin_memcpy etc.), but I
need to find a way to get the correct assembly output so I know if/when I
have found something that works.

Thanks!

-nc


> 
> Stefan
Peter Maydell Feb. 23, 2019, 4:18 p.m. UTC | #37
On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote:
> I was thinking of something along the lines of:
>
> typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
> static inline int lduw_he_p(const void *ptr)
> {
>      volatile_uint16_t r = *(volatile_uint16_t*)ptr;
>      return r;
> }

This won't correctly handle accesses with unaligned pointers,
I'm afraid. We rely on these functions correctly working
with pointers that are potentially unaligned.

thanks
-- PMM
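
The distinction Peter draws here, as a sketch (hypothetical helper names,
not QEMU's API):

#include <stdint.h>
#include <string.h>

/* Works for any pointer: the compiler picks byte loads or an
 * unaligned-capable instruction, whatever the host requires.
 * No atomicity guarantee. */
static inline uint16_t load16_any(const void *ptr)
{
    uint16_t r;
    memcpy(&r, ptr, sizeof(r));
    return r;
}

/* One plain 16-bit load: atomic when aligned, but undefined behaviour
 * -- and a bus fault on strict-alignment hosts such as SPARC -- if
 * ptr is not 2-byte aligned. */
static inline uint16_t load16_aligned_only(const void *ptr)
{
    return *(const uint16_t *)ptr;
}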
Fernando Casas Schössow Feb. 23, 2019, 4:21 p.m. UTC | #38
Hi Natanael,

The package information is the following:

https://pkgs.alpinelinux.org/package/v3.9/main/x86_64/qemu-system-x86_64

The binary is for x86_64 architecture and dynamically linked:

fernando@vmsvr01:~$ file /usr/bin/qemu-system-x86_64
/usr/bin/qemu-system-x86_64: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, stripped

Thanks.

Peter Maydell Feb. 23, 2019, 4:57 p.m. UTC | #39
On Fri, 22 Feb 2019 at 14:07, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
> host endian unaligned access functions") introduced the unaligned
> memory access functions in question here.  Please see below for
> details on the bug - basically QEMU code assumes they are atomic, but
> that is not guaranteed :(.  Any ideas for how to fix this?

I suspect we want a separate family of access functions for
"I guarantee this will be an aligned access and I need the
atomicity". (The other place where we've talked about needing
the atomicity is in emulation of page-table-walk, where you
need the page table loads to be atomic w.r.t. other CPU
threads, especially in the case where you need to emulate
a hardware update of a dirty/access bit in the page table entry.)

Mostly this hasn't bitten us before because any sensible compiler
will turn the memcpy into a straight load on most common hosts,
which will be atomic (but accidentally rather than on purpose).

thanks
-- PMM
Natanael Copa Feb. 25, 2019, 10:24 a.m. UTC | #40
On Sat, 23 Feb 2019 16:18:15 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote:
> > I was thinking of something along the lines of:
> >
> > typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
> > static inline int lduw_he_p(const void *ptr)
> > {
> >      volatile_uint16_t r = *(volatile_uint16_t*)ptr;
> >      return r;
> > }  
> 
> This won't correctly handle accesses with unaligned pointers,
> I'm afraid. We rely on these functions correctly working
> with pointers that are potentially unaligned.

Well, the current code does not even handle accesses with aligned pointers,
depending on the FORTIFY_SOURCE implementation.

My thinking here is that we depend on the assumption that the compiler will
remove the memcpy call, so maybe we can find another way to generate the same
assembly, while still depending on compiler optimization.

I did some tests and compared the assembly output. The compiler will
generate the same assembly if volatile is not used. The attribute
__may_alias__ does not seem to make any difference. So the following
function generates exactly the same assembly:

static inline int lduw_he_p(const void *ptr)
{
	uint16_t r = *(uint16_t*)ptr;
	return r;
}

I created a script to compare on different Linux distros and
architectures. My conclusion is:

- The Alpine fortify implementation is non-optimal, but technically
  correct.
- `uint16_t r = *(uint16_t*)ptr;` generates the same assembly code,
  regardless of the fortify implementation, and is clearer about the
  intention than the use of memcpy.


test.sh:
---[CUT HERE]---8<--------------------------------------------------
#!/bin/sh -x
cat > testcase.c <<EOF
#include <stdint.h>

#ifdef WITH_MEMCPY
#include <string.h>
static inline int lduw_he_p(const void *ptr)
{
	uint16_t r;
	memcpy(&r, ptr, sizeof(r));
	return r;
}
#else
static inline int lduw_he_p(const void *ptr)
{
	uint16_t r = *(uint16_t*)ptr;
	return r;
}
#endif

int main(int argc, char *argv[])
{
	void *p = argv;
	int i;
	p++; // make sure we are unaligned
	i=lduw_he_p(p);
	return i;
}
EOF

: ${CC:=gcc}
$CC --version
uname -m

for fortify in "-D_FORTIFY_SOURCE=2" "-U_FORTIFY_SOURCE"; do
	for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY"; do
		$CC $cflag $fortify $@ -o testcase$cflag testcase.c
		gdb --batch -ex "disas ${func:-main}" ./testcase$cflag
	done
done
---[CUT HERE]---8<--------------------------------------------------


Output on Alpine:

$ sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (Alpine 8.2.0) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

+ uname -m
x86_64
+ gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x0000000000001070 <+0>:	sub    $0x18,%rsp
   0x0000000000001074 <+4>:	mov    %fs:0x28,%rax
   0x000000000000107d <+13>:	mov    %rax,0x8(%rsp)
   0x0000000000001082 <+18>:	xor    %eax,%eax
   0x0000000000001084 <+20>:	lea    0x6(%rsp),%rdi
   0x0000000000001089 <+25>:	lea    0x1(%rsi),%rax
   0x000000000000108d <+29>:	lea    0x2(%rdi),%rdx
   0x0000000000001091 <+33>:	cmp    %rdx,%rax
   0x0000000000001094 <+36>:	jae    0x109b <main+43>
   0x0000000000001096 <+38>:	cmp    %rdi,%rax
   0x0000000000001099 <+41>:	ja     0x10d0 <main+96>
   0x000000000000109b <+43>:	cmp    %rdi,%rax
   0x000000000000109e <+46>:	jb     0x10c7 <main+87>
   0x00000000000010a0 <+48>:	mov    $0x2,%edx
   0x00000000000010a5 <+53>:	mov    %rax,%rsi
   0x00000000000010a8 <+56>:	callq  0x1020 <memcpy@plt>
   0x00000000000010ad <+61>:	movzwl 0x6(%rsp),%eax
   0x00000000000010b2 <+66>:	mov    0x8(%rsp),%rcx
   0x00000000000010b7 <+71>:	xor    %fs:0x28,%rcx
   0x00000000000010c0 <+80>:	jne    0x10d2 <main+98>
   0x00000000000010c2 <+82>:	add    $0x18,%rsp
   0x00000000000010c6 <+86>:	retq   
   0x00000000000010c7 <+87>:	add    $0x3,%rsi
   0x00000000000010cb <+91>:	cmp    %rsi,%rdi
   0x00000000000010ce <+94>:	jae    0x10a0 <main+48>
   0x00000000000010d0 <+96>:	ud2    
   0x00000000000010d2 <+98>:	callq  0x1030 <__stack_chk_fail@plt>
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x0000000000001050 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000001054 <+4>:	retq   
End of assembler dump.
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x0000000000001050 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000001054 <+4>:	retq   
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x0000000000001050 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000001054 <+4>:	retq   
End of assembler dump.




Output on Debian:

root@19c3f13c6023:~# sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

+ uname -m
x86_64
+ gcc -DWITH_MEMCPY -D_FORTIFY_SOURCE=2 -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex disas main ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x0000000000000530 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000000534 <+4>:	retq   
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -D_FORTIFY_SOURCE=2 -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex disas main ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x0000000000000530 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000000534 <+4>:	retq   
End of assembler dump.
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex disas main ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x0000000000000530 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000000534 <+4>:	retq   
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex disas main ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x0000000000000530 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000000534 <+4>:	retq   
End of assembler dump.
root@19c3f13c6023:~# 



Output on Fedora:
[root@ac68847f5bb1 ~]# sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

+ uname -m
x86_64
+ for fortify in "-D_FORTIFY_SOURCE=2" "-U_FORTIFY_SOURCE"
+ for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY"
+ gcc -DWITH_MEMCPY -D_FORTIFY_SOURCE=2 -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x0000000000401020 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000401024 <+4>:	retq   
End of assembler dump.
+ for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY"
+ gcc -DWITHOUT_MEMCPY -D_FORTIFY_SOURCE=2 -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x0000000000401020 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000401024 <+4>:	retq   
End of assembler dump.
+ for fortify in "-D_FORTIFY_SOURCE=2" "-U_FORTIFY_SOURCE"
+ for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY"
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x0000000000401020 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000401024 <+4>:	retq   
End of assembler dump.
+ for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY"
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x0000000000401020 <+0>:	movzwl 0x1(%rsi),%eax
   0x0000000000401024 <+4>:	retq   
End of assembler dump.



Output on Alpine with clang:
$ CC=clang sh -x test.sh -O2
+ cat
+ : clang
+ clang --version
Alpine clang version 5.0.2 (tags/RELEASE_502/final) (based on LLVM 5.0.2)
Target: x86_64-alpine-linux-musl
Thread model: posix
InstalledDir: /usr/bin
+ uname -m
x86_64
+ clang -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000011a0 <+0>:	movzwl 0x1(%rsi),%eax
   0x00000000000011a4 <+4>:	retq   
End of assembler dump.
+ clang -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000011a0 <+0>:	movzwl 0x1(%rsi),%eax
   0x00000000000011a4 <+4>:	retq   
End of assembler dump.
+ clang -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000011a0 <+0>:	movzwl 0x1(%rsi),%eax
   0x00000000000011a4 <+4>:	retq   
End of assembler dump.
+ clang -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000011a0 <+0>:	movzwl 0x1(%rsi),%eax
   0x00000000000011a4 <+4>:	retq   
End of assembler dump.



Output on alpine armv7:
$ sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (Alpine 8.2.0) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

+ uname -m
armv8l
+ gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x000003f8 <+0>:     push    {r4, r5, lr}
   0x000003fa <+2>:     sub     sp, #12
   0x000003fc <+4>:     adds    r3, r1, #1
   0x000003fe <+6>:     add     r4, sp, #4
   0x00000400 <+8>:     cmp     r3, r4
   0x00000402 <+10>:    add.w   r0, sp, #2
   0x00000406 <+14>:    ite     cs
   0x00000408 <+16>:    movcs   r2, #0
   0x0000040a <+18>:    movcc   r2, #1
   0x0000040c <+20>:    cmp     r3, r0
   0x0000040e <+22>:    it      ls
   0x00000410 <+24>:    movls   r2, #0
   0x00000412 <+26>:    ldr     r4, [pc, #52]   ; (0x448 <main+80>)
   0x00000414 <+28>:    ldr     r5, [pc, #52]   ; (0x44c <main+84>)
   0x00000416 <+30>:    add     r4, pc
   0x00000418 <+32>:    ldr     r4, [r4, r5]
   0x0000041a <+34>:    ldr     r5, [r4, #0]
   0x0000041c <+36>:    str     r5, [sp, #4]
   0x0000041e <+38>:    cbnz    r2, 0x442 <main+74>
   0x00000420 <+40>:    cmp     r3, r0
   0x00000422 <+42>:    bcc.n   0x43c <main+68>
   0x00000424 <+44>:    mov     r1, r3
   0x00000426 <+46>:    movs    r2, #2
   0x00000428 <+48>:    blx     0x3b0 <memcpy@plt>
   0x0000042c <+52>:    ldr     r2, [sp, #4]
   0x0000042e <+54>:    ldr     r3, [r4, #0]
   0x00000430 <+56>:    ldrh.w  r0, [sp, #2]
   0x00000434 <+60>:    cmp     r2, r3
   0x00000436 <+62>:    bne.n   0x444 <main+76>
   0x00000438 <+64>:    add     sp, #12
   0x0000043a <+66>:    pop     {r4, r5, pc}
   0x0000043c <+68>:    adds    r1, #3
   0x0000043e <+70>:    cmp     r0, r1
   0x00000440 <+72>:    bcs.n   0x424 <main+44>
   0x00000442 <+74>:    udf     #255    ; 0xff
   0x00000444 <+76>:    blx     0x3c8 <__stack_chk_fail@plt>
   0x00000448 <+80>:    muleq   r1, lr, r11
   0x0000044c <+84>:    andeq   r0, r0, r4, lsr #32
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x0000036c <+0>:     ldrh.w  r0, [r1, #1]
   0x00000370 <+4>:     bx      lr
End of assembler dump.
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x0000036c <+0>:     ldrh.w  r0, [r1, #1]
   0x00000370 <+4>:     bx      lr
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x0000036c <+0>:     ldrh.w  r0, [r1, #1]
   0x00000370 <+4>:     bx      lr
End of assembler dump.


Output on Alpine aarch64:
$ sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (Alpine 8.2.0) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

+ uname -m
aarch64
+ gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000006b0 <+0>:     stp     x29, x30, [sp, #-48]!
   0x00000000000006b4 <+4>:     add     x2, x1, #0x1
   0x00000000000006b8 <+8>:     mov     x29, sp
   0x00000000000006bc <+12>:    str     x19, [sp, #16]
   0x00000000000006c0 <+16>:    adrp    x19, 0x10000
   0x00000000000006c4 <+20>:    add     x0, sp, #0x26
   0x00000000000006c8 <+24>:    ldr     x3, [x19, #4024]
   0x00000000000006cc <+28>:    add     x4, x0, #0x2
   0x00000000000006d0 <+32>:    cmp     x4, x2
   0x00000000000006d4 <+36>:    ldr     x4, [x3]
   0x00000000000006d8 <+40>:    str     x4, [sp, #40]
   0x00000000000006dc <+44>:    mov     x4, #0x0                        // #0
   0x00000000000006e0 <+48>:    ccmp    x0, x2, #0x2, hi  // hi = pmore
   0x00000000000006e4 <+52>:    b.cc    0x72c <main+124>  // b.lo, b.ul, b.last
   0x00000000000006e8 <+56>:    cmp     x2, x0
   0x00000000000006ec <+60>:    b.cc    0x720 <main+112>  // b.lo, b.ul, b.last
   0x00000000000006f0 <+64>:    mov     x1, x2
   0x00000000000006f4 <+68>:    mov     x2, #0x2                        // #2
   0x00000000000006f8 <+72>:    bl      0x650 <memcpy@plt>
   0x00000000000006fc <+76>:    ldr     x19, [x19, #4024]
   0x0000000000000700 <+80>:    ldrh    w0, [sp, #38]
   0x0000000000000704 <+84>:    ldr     x2, [sp, #40]
   0x0000000000000708 <+88>:    ldr     x1, [x19]
   0x000000000000070c <+92>:    eor     x1, x2, x1
   0x0000000000000710 <+96>:    cbnz    x1, 0x730 <main+128>
   0x0000000000000714 <+100>:   ldr     x19, [sp, #16]
   0x0000000000000718 <+104>:   ldp     x29, x30, [sp], #48
   0x000000000000071c <+108>:   ret
   0x0000000000000720 <+112>:   add     x1, x1, #0x3
   0x0000000000000724 <+116>:   cmp     x0, x1
   0x0000000000000728 <+120>:   b.cs    0x6f0 <main+64>  // b.hs, b.nlast
   0x000000000000072c <+124>:   brk     #0x3e8
   0x0000000000000730 <+128>:   bl      0x670 <__stack_chk_fail@plt>
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005e0 <+0>:     ldurh   w0, [x1, #1]
   0x00000000000005e4 <+4>:     ret
End of assembler dump.
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000005e0 <+0>:     ldurh   w0, [x1, #1]
   0x00000000000005e4 <+4>:     ret
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005e0 <+0>:     ldurh   w0, [x1, #1]
   0x00000000000005e4 <+4>:     ret
End of assembler dump.





Output on Alpine ppc64le:
$ sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (Alpine 8.2.0) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

+ uname -m
ppc64le
+ gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000005c0 <+0>:     lhz     r3,1(r4)
   0x00000000000005c4 <+4>:     blr
   0x00000000000005c8 <+8>:     .long 0x0
   0x00000000000005cc <+12>:    .long 0x0
   0x00000000000005d0 <+16>:    .long 0x0
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005c0 <+0>:     lhz     r3,1(r4)
   0x00000000000005c4 <+4>:     blr
   0x00000000000005c8 <+8>:     .long 0x0
   0x00000000000005cc <+12>:    .long 0x0
   0x00000000000005d0 <+16>:    .long 0x0
End of assembler dump.
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000005c0 <+0>:     lhz     r3,1(r4)
   0x00000000000005c4 <+4>:     blr
   0x00000000000005c8 <+8>:     .long 0x0
   0x00000000000005cc <+12>:    .long 0x0
   0x00000000000005d0 <+16>:    .long 0x0
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005c0 <+0>:     lhz     r3,1(r4)
   0x00000000000005c4 <+4>:     blr
   0x00000000000005c8 <+8>:     .long 0x0
   0x00000000000005cc <+12>:    .long 0x0
   0x00000000000005d0 <+16>:    .long 0x0
End of assembler dump.



Output on Alpine s390x:
$ sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (Alpine 8.2.0) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

+ uname -m
s390x
+ gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000006b0 <+0>:	stmg	%r14,%r15,112(%r15)
   0x00000000000006b6 <+6>:	ear	%r1,%a0
   0x00000000000006ba <+10>:	lay	%r15,-176(%r15)
   0x00000000000006c0 <+16>:	sllg	%r1,%r1,32
   0x00000000000006c6 <+22>:	ear	%r1,%a1
   0x00000000000006ca <+26>:	la	%r2,166(%r15)
   0x00000000000006ce <+30>:	mvc	168(8,%r15),40(%r1)
   0x00000000000006d4 <+36>:	la	%r1,1(%r3)
   0x00000000000006d8 <+40>:	clgr	%r1,%r2
   0x00000000000006dc <+44>:	jle	0x6e8 <main+56>
   0x00000000000006e0 <+48>:	la	%r4,168(%r15)
   0x00000000000006e4 <+52>:	clgrtl	%r1,%r4
   0x00000000000006e8 <+56>:	clgr	%r1,%r2
   0x00000000000006ec <+60>:	jl	0x724 <main+116>
   0x00000000000006f0 <+64>:	lgr	%r3,%r1
   0x00000000000006f4 <+68>:	lghi	%r4,2
   0x00000000000006f8 <+72>:	brasl	%r14,0x5f0 <memcpy@plt>
   0x00000000000006fe <+78>:	ear	%r1,%a0
   0x0000000000000702 <+82>:	llgh	%r2,166(%r15)
   0x0000000000000708 <+88>:	sllg	%r1,%r1,32
   0x000000000000070e <+94>:	ear	%r1,%a1
   0x0000000000000712 <+98>:	clc	168(8,%r15),40(%r1)
   0x0000000000000718 <+104>:	jne	0x730 <main+128>
   0x000000000000071c <+108>:	lmg	%r14,%r15,288(%r15)
   0x0000000000000722 <+114>:	br	%r14
   0x0000000000000724 <+116>:	la	%r3,3(%r3)
   0x0000000000000728 <+120>:	clgrtl	%r2,%r3
   0x000000000000072c <+124>:	j	0x6f0 <main+64>
   0x0000000000000730 <+128>:	brasl	%r14,0x630 <__stack_chk_fail@plt>
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005f8 <+0>:	llgh	%r2,1(%r3)
   0x00000000000005fe <+6>:	br	%r14
End of assembler dump.
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000005f8 <+0>:	llgh	%r2,1(%r3)
   0x00000000000005fe <+6>:	br	%r14
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005f8 <+0>:	llgh	%r2,1(%r3)
   0x00000000000005fe <+6>:	br	%r14
End of assembler dump.
Stefan Hajnoczi Feb. 25, 2019, 10:30 a.m. UTC | #41
On Sat, Feb 23, 2019 at 3:55 PM Natanael Copa <ncopa@alpinelinux.org> wrote:
> On Fri, 22 Feb 2019 14:04:20 +0000
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> > <casasfernando@outlook.com> wrote:
> I tried to find this section. How do you get the assembly listing of
> the relevant section? I tried to do "disas virtio_pop" from
> `gdb /usr/bin/qemu-system-x86_64` on the binary in Alpine edge. I
> could find 2 memcpy calls, but none of them looks like a 16-bit operation:
>
>    0x00000000004551f1 <+353>:   mov    0x10(%rsp),%rdi
>    0x00000000004551f6 <+358>:   mov    $0x10,%edx
>    0x00000000004551fb <+363>:   callq  0x3879e0 <memcpy@plt>
>    0x0000000000455200 <+368>:   movzwl 0x5c(%rsp),%eax
>    0x0000000000455205 <+373>:   test   $0x4,%al
>    0x0000000000455207 <+375>:   je     0x4552aa <virtqueue_pop+538>
>
> ....
>
>    0x0000000000455291 <+513>:   mov    0x10(%rsp),%rdi
>    0x0000000000455296 <+518>:   mov    $0x10,%edx
>    0x000000000045529b <+523>:   callq  0x3879e0 <memcpy@plt>
>    0x00000000004552a0 <+528>:   mov    %rbp,0x20(%rsp)
>    0x00000000004552a5 <+533>:   movzwl 0x5c(%rsp),%eax
>    0x00000000004552aa <+538>:   lea    0x20e0(%rsp),%rdi
>    0x00000000004552b2 <+546>:   xor    %r11d,%r11d
>    0x00000000004552b5 <+549>:   mov    %r15,0x38(%rsp)

Here is the beginning of the function:

0000000000455090 <virtqueue_pop@@Base>:
  455090:    41 57                    push   %r15
  455092:    49 89 ff                 mov    %rdi,%r15
  455095:    b9 0e 00 00 00           mov    $0xe,%ecx
  45509a:    41 56                    push   %r14
  45509c:    41 55                    push   %r13
  45509e:    41 54                    push   %r12
  4550a0:    55                       push   %rbp
  4550a1:    53                       push   %rbx
  4550a2:    48 81 ec f8 60 00 00     sub    $0x60f8,%rsp
  4550a9:    4d 8b 67 58              mov    0x58(%r15),%r12
  4550ad:    48 89 74 24 08           mov    %rsi,0x8(%rsp)
  4550b2:    48 8d 6c 24 70           lea    0x70(%rsp),%rbp
  4550b7:    48 89 ef                 mov    %rbp,%rdi
  4550ba:    64 48 8b 04 25 28 00     mov    %fs:0x28,%rax
  4550c1:    00 00
  4550c3:    48 89 84 24 e8 60 00     mov    %rax,0x60e8(%rsp)
  4550ca:    00
  4550cb:    31 c0                    xor    %eax,%eax
  4550cd:    41 80 bc 24 6b 01 00     cmpb   $0x0,0x16b(%r12)
  4550d4:    00 00
  4550d6:    f3 48 ab                 rep stos %rax,%es:(%rdi)
  4550d9:    0f 85 c1 04 00 00        jne    4555a0 <virtqueue_pop@@Base+0x510>
  4550df:    48 8b 1d f2 4e 99 00     mov    0x994ef2(%rip),%rbx
 # de9fd8 <rcu_reader@@Base+0xde9f20>
  4550e6:    64 8b 43 0c              mov    %fs:0xc(%rbx),%eax
  4550ea:    8d 50 01                 lea    0x1(%rax),%edx
  4550ed:    64 89 53 0c              mov    %edx,%fs:0xc(%rbx)
  4550f1:    85 c0                    test   %eax,%eax
  4550f3:    0f 84 d7 03 00 00        je     4554d0 <virtqueue_pop@@Base+0x440>
  4550f9:    49 83 7f 18 00           cmpq   $0x0,0x18(%r15)
  4550fe:    0f 84 7d 03 00 00        je     455481 <virtqueue_pop@@Base+0x3f1>
  455104:    41 0f b7 47 30           movzwl 0x30(%r15),%eax
  455109:    66 41 39 47 32           cmp    %ax,0x32(%r15)
  45510e:    0f 84 0c 04 00 00        je     455520 <virtqueue_pop@@Base+0x490>

This last branch leads to:

  455520:    4c 89 ff                 mov    %r15,%rdi
  455523:    e8 78 b4 ff ff           callq  4509a0
<virtio_queue_host_notifier_read@@Base+0x6d0>
  455528:    48 83 b8 90 00 00 00     cmpq   $0x3,0x90(%rax)
  45552f:    03
  455530:    0f 86 98 03 00 00        jbe    4558ce <virtqueue_pop@@Base+0x83e>
  455536:    48 8b 90 80 00 00 00     mov    0x80(%rax),%rdx
  45553d:    48 85 d2                 test   %rdx,%rdx
  455540:    0f 84 c2 02 00 00        je     455808 <virtqueue_pop@@Base+0x778>
  455546:    48 8d 72 02              lea    0x2(%rdx),%rsi
  45554a:    48 8d 7c 24 42           lea    0x42(%rsp),%rdi
  45554f:    48 39 fe                 cmp    %rdi,%rsi
  455552:    76 09                    jbe    45555d <virtqueue_pop@@Base+0x4cd>
  455554:    48 8d 47 02              lea    0x2(%rdi),%rax
  455558:    48 39 c6                 cmp    %rax,%rsi
  45555b:    72 3c                    jb     455599 <virtqueue_pop@@Base+0x509>
  45555d:    48 39 fe                 cmp    %rdi,%rsi
  455560:    72 2e                    jb     455590 <virtqueue_pop@@Base+0x500>
  455562:    ba 02 00 00 00           mov    $0x2,%edx
  455567:    e8 74 24 f3 ff           callq  3879e0 <memcpy@plt>

> I can test different CFLAGS with and without _FORTIFY_SOURCE and
> with different variants of memcpy (like __builtin_memcpy etc.), but I
> need to find a way to get the correct assembly output so I know if/when I
> have found something that works.

Good.  I think tweaking the compilation flags is the quickest
workaround to produce a working QEMU binary for Alpine.  Other distros
don't seem to hit this problem.

In the longer term QEMU will need to fix these memory accessor
functions in the way that Peter described, but doing that properly
requires auditing all the code so each instance that relies on
atomicity can be converted properly (and the other ones will stay
non-atomic because they must support misaligned addresses).

Stefan
Stefan Hajnoczi Feb. 25, 2019, 10:33 a.m. UTC | #42
On Mon, Feb 25, 2019 at 10:30 AM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Sat, Feb 23, 2019 at 3:55 PM Natanael Copa <ncopa@alpinelinux.org> wrote:
> > On Fri, 22 Feb 2019 14:04:20 +0000
> > Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > > On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> > > <casasfernando@outlook.com> wrote:
> > I tried to find this section. How do you get the assembly listing of
> > the relevant section? I tried to do "disas virtio_pop" from
> > `gdb /usr/bin/qemu-system-x86_64` on the binary in Alpine edge. I
> > could find 2 memcpy calls, but none of them looks like a 16-bit operation:
> >
> >    0x00000000004551f1 <+353>:   mov    0x10(%rsp),%rdi
> >    0x00000000004551f6 <+358>:   mov    $0x10,%edx
> >    0x00000000004551fb <+363>:   callq  0x3879e0 <memcpy@plt>
> >    0x0000000000455200 <+368>:   movzwl 0x5c(%rsp),%eax
> >    0x0000000000455205 <+373>:   test   $0x4,%al
> >    0x0000000000455207 <+375>:   je     0x4552aa <virtqueue_pop+538>
> >
> > ....
> >
> >    0x0000000000455291 <+513>:   mov    0x10(%rsp),%rdi
> >    0x0000000000455296 <+518>:   mov    $0x10,%edx
> >    0x000000000045529b <+523>:   callq  0x3879e0 <memcpy@plt>
> >    0x00000000004552a0 <+528>:   mov    %rbp,0x20(%rsp)
> >    0x00000000004552a5 <+533>:   movzwl 0x5c(%rsp),%eax
> >    0x00000000004552aa <+538>:   lea    0x20e0(%rsp),%rdi
> >    0x00000000004552b2 <+546>:   xor    %r11d,%r11d
> >    0x00000000004552b5 <+549>:   mov    %r15,0x38(%rsp)
>
> Here is the beginning of the function:

This was built from the docker alpine image:

REPOSITORY          TAG        IMAGE ID        CREATED       SIZE
docker.io/alpine    latest     caf27325b298    3 weeks ago   5.52 MB

aports git commit: be41538f0061b406a374564a0043a363efcb0293
gcc (Alpine 8.2.0) 8.2.0

Stefan
Peter Maydell Feb. 25, 2019, 10:34 a.m. UTC | #43
On Mon, 25 Feb 2019 at 10:24, Natanael Copa <ncopa@alpinelinux.org> wrote:
>
> On Sat, 23 Feb 2019 16:18:15 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
>
> > On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote:
> > > I was thinking of something along the lines of:
> > >
> > > typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
> > > static inline int lduw_he_p(const void *ptr)
> > > {
> > >      volatile_uint16_t r = *(volatile_uint16_t*)ptr;
> > >      return r;
> > > }
> >
> > This won't correctly handle accesses with unaligned pointers,
> > I'm afraid. We rely on these functions correctly working
> > with pointers that are potentially unaligned.
>
> Well, the current code does not even handle accesses with aligned pointers,
> depending on the FORTIFY_SOURCE implementation.

It correctly handles aligned and unaligned pointers for the
API guarantees that the function in QEMU provides, which is
to say "definitely works on any kind of aligned or unaligned
pointer, not guaranteed to be atomic". Unfortunately some
code in QEMU is implicitly assuming it is atomic, which this
QEMU function does not guarantee -- it just happens to provide
that most of the time.

> My thinking here is that we depend on the assumption that the compiler will
> remove the memcpy call, so maybe we can find another way to generate the same
> assembly, while still depending on compiler optimization.

> I did some tests and compared the assembly output. The compiler will
> generate the same assembly if volatile is not used. The attribute
> __may_alias__ does not seem to make any difference. So the following
> function generates exactly the same assembly:
>
> static inline int lduw_he_p(const void *ptr)
> {
>         uint16_t r = *(uint16_t*)ptr;
>         return r;
> }

This still is not guaranteed to work on unaligned pointers.
(For instance it probably won't work on SPARC hosts.)
More generally there is no way to have a single function
that is guaranteed to handle unaligned pointers and also
guaranteed to produce an atomic access on all host architectures
that we support, because not all host architectures allow
you to do both with the same instruction sequence. So
you can't just change this function to make it provide
atomic access without breaking the other guarantee it is
providing.

The long term fix for this is that we need to separate out
our APIs so we can have a family of functions that guarantee
to work on unaligned pointers, and a family that guarantee
to work atomically, and calling code can use the one that
provides the semantics it requires.

The short term fix is to fix your toolchain/compilation
environment options so that it isn't trying to override
the definition of memcpy().

thanks
-- PMM
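
A sketch of what the two families could look like (hypothetical names and
assertions, not QEMU's eventual API):

#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Family 1: tolerates misalignment, no atomicity guarantee. */
static inline uint16_t lduw_he_unaligned(const void *ptr)
{
    uint16_t r;
    memcpy(&r, ptr, sizeof(r));
    return r;
}

/* Family 2: atomic, caller must guarantee natural alignment. */
static inline uint16_t lduw_he_atomic(const void *ptr)
{
    assert(((uintptr_t)ptr & (sizeof(uint16_t) - 1)) == 0);
    return __atomic_load_n((const uint16_t *)ptr, __ATOMIC_RELAXED);
}

Callers like vring_avail_idx(), which read naturally aligned guest fields
that a vcpu may update concurrently, would use the atomic variant; code
that walks arbitrary guest memory would keep the unaligned one.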
Fernando Casas Schössow Feb. 25, 2019, 12:15 p.m. UTC | #44
I’ve been running the test guest on another host for the last three days and so far so good.
Yet because of the nature of this bug/issue it can take a few more hours or days to fail. The failure is unpredictable.

Does it make sense to continue running the guest on this different host to try to reproduce, or can I migrate the guest back to the original host? At this point I don’t think it will make any difference. The current host’s storage is far slower, causing latency issues in the guest, so if I can migrate it back it would be great.

Sent from my iPhone

> On 25 Feb 2019, at 11:34, Peter Maydell <peter.maydell@linaro.org> wrote:
> 
>> On Mon, 25 Feb 2019 at 10:24, Natanael Copa <ncopa@alpinelinux.org> wrote:
>> 
>> On Sat, 23 Feb 2019 16:18:15 +0000
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>> 
>>>> On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote:
>>>> I was thinking of something along the lines of:
>>>> 
>>>> typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
>>>> static inline int lduw_he_p(const void *ptr)
>>>> {
>>>>     volatile_uint16_t r = *(volatile_uint16_t*)ptr;
>>>>     return r;
>>>> }
>>> 
>>> This won't correctly handle accesses with unaligned pointers,
>>> I'm afraid. We rely on these functions correctly working
>>> with pointers that are potentially unaligned.
>> 
>> Well, the current code does not even handle accesses with aligned pointers,
>> depending on the FORTIFY_SOURCE implementation.
> 
> It correctly handles aligned and unaligned pointers for the
> API guarantees that the function in QEMU provides, which is
> to say "definitely works on any kind of aligned or unaligned
> pointer, not guaranteed to be atomic". Unfortunately some
> code in QEMU is implicitly assuming it is atomic, which this
> QEMU function does not guarantee -- it just happens to provide
> that most of the time.
> 
>> My thinking here is that we depend on the assumption that the compiler will
>> remove the memcpy call, so maybe we can find another way to generate the same
>> assembly, while still depending on compiler optimization.
> 
>> I did some tests and compared the assembly output. The compiler will
>> generate the same assembly if volatile is not used. The attribute
>> __may_alias__ does not seem to make any difference. So the following
>> function generates exactly the same assembly:
>> 
>> static inline int lduw_he_p(const void *ptr)
>> {
>>        uint16_t r = *(uint16_t*)ptr;
>>        return r;
>> }
> 
> This still is not guaranteed to work on unaligned pointers.
> (For instance it probably won't work on SPARC hosts.)
> More generally there is no way to have a single function
> that is guaranteed to handle unaligned pointers and also
> guaranteed to produce an atomic access on all host architectures
> that we support, because not all host architectures allow
> you to do both with the same instruction sequence. So
> you can't just change this function to make it provide
> atomic access without breaking the other guarantee it is
> providing.
> 
> The long term fix for this is that we need to separate out
> our APIs so we can have a family of functions that guarantee
> to work on unaligned pointers, and a family that guarantee
> to work atomically, and calling code can use the one that
> provides the semantics it requires.
> 
> The short term fix is to fix your toolchain/compilation
> environment options so that it isn't trying to override
> the definition of memcpy().
> 
> thanks
> -- PMM
Natanael Copa Feb. 25, 2019, 12:21 p.m. UTC | #45
On Mon, 25 Feb 2019 10:34:23 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Mon, 25 Feb 2019 at 10:24, Natanael Copa <ncopa@alpinelinux.org> wrote:
> >
> > On Sat, 23 Feb 2019 16:18:15 +0000
> > Peter Maydell <peter.maydell@linaro.org> wrote:
> >  
> > > On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote:  
> > > > I was thinking of something along the lines of:
> > > >
> > > > typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
> > > > static inline int lduw_he_p(const void *ptr)
> > > > {
> > > >      volatile_uint16_t r = *(volatile_uint16_t*)ptr;
> > > >      return r;
> > > > }  
> > >
> > > This won't correctly handle accesses with unaligned pointers,
> > > I'm afraid. We rely on these functions correctly working
> > > with pointers that are potentially unaligned.  
> >
> > Well, the current code does not even handle accesses with aligned pointers,
> > depending on the FORTIFY_SOURCE implementation.
> 
> It correctly handles aligned and unaligned pointers for the
> API guarantees that the function in QEMU provides, which is
> to say "definitely works on any kind of aligned or unaligned
> pointer, not guaranteed to be atomic". Unfortunately some
> code in QEMU is implicitly assuming it is atomic, which this
> QEMU function does not guarantee -- it just happens to provide
> that most of the time.
> 
> > My thinking here is that we depend on the assumption that the compiler will
> > remove the memcpy call, so maybe we can find another way to generate the same
> > assembly, while still depending on compiler optimization.
> 
> > I did some tests and compared the assembly output. The compiler will
> > generate the same assembly if volatile is not used. The attribute
> > __may_alias__ does not seem to make any difference. So the following
> > function generates exactly the same assembly:
> >
> > static inline int lduw_he_p(const void *ptr)
> > {
> >         uint16_t r = *(uint16_t*)ptr;
> >         return r;
> > }  
> 
> This still is not guaranteed to work on unaligned pointers.
> (For instance it probably won't work on SPARC hosts.)
> More generally there is no way to have a single function
> that is guaranteed to handle unaligned pointers and also
> guaranteed to produce an atomic access on all host architectures
> that we support, because not all host architectures allow
> you to do both with the same instruction sequence. So
> you can't just change this function to make it provide
> atomic access without breaking the other guarantee it is
> providing.

Right, so it is possible that there are other architectures (like
SPARC) that suffer from the same race here as Alpine, because memcpy is
not guaranteed to be atomic.

> The long term fix for this is that we need to separate out
> our APIs so we can have a family of functions that guarantee
> to work on unaligned pointers, and a family that guarantee
> to work atomically, and calling code can use the one that
> provides the semantics it requires.
> 
> The short term fix is to fix your toolchain/compilation
> environment options so that it isn't trying to override
> the definition of memcpy().

The easiest workaround is to simply disable FORTIFY_SOURCE, but that
will weaken the security for all implemented string functions, strcpy,
memmove etc., so I don't want to do that.

Is it only lduw_he_p that needs to be atomic or are the other functions
in include/qemu/bswap.h using memcpy also required to be atomic?

I intend to replace memcpy with __builtin_memcpy there.

> 
> thanks
> -- PMM
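
The replacement Natanael intends would look roughly like this (a sketch;
__builtin_memcpy is the GCC/clang builtin form, which a header's #define
of memcpy cannot intercept):

#include <stdint.h>

static inline int lduw_he_p(const void *ptr)
{
    uint16_t r;
    /* the builtin is expanded by the compiler itself, so a fixed
     * 2-byte copy becomes a single load again (movzwl on x86-64)
     * instead of a call through memcpy@plt */
    __builtin_memcpy(&r, ptr, sizeof(r));
    return r;
}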
Peter Maydell Feb. 25, 2019, 1:06 p.m. UTC | #46
On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
>
> On Mon, 25 Feb 2019 10:34:23 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > The short term fix is to fix your toolchain/compilation
> > environment options so that it isn't trying to override
> > the definition of memcpy().
>
> The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> will weaken the security for all implemented string functions, strcpy,
> memmove etc., so I don't want to do that.
>
> Is it only lduw_he_p that needs to be atomic or are the other functions
> in include/qemu/bswap.h using memcpy also required to be atomic?

Hard to say, since we haven't done the "audit all the callers"
step that Stefan mentioned. If you're going to replace memcpy
with __builtin_memcpy then the safest thing is to do it for
all those uses (this will also give you much better generated
code for performance purposes).

thanks
-- PMM
Natanael Copa Feb. 25, 2019, 1:25 p.m. UTC | #47
On Mon, 25 Feb 2019 13:06:16 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
> >
> > On Mon, 25 Feb 2019 10:34:23 +0000
> > Peter Maydell <peter.maydell@linaro.org> wrote:  
> > > The short term fix is to fix your toolchain/compilation
> > > environment options so that it isn't trying to override
> > > the definition of memcpy().  
> >
> > The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> > will weaken the security for all implemented string functions, strcpy,
> > memmove etc., so I don't want to do that.
> >
> > Is it only lduw_he_p that needs to be atomic or are the other functions
> > in include/qemu/bswap.h using memcpy also required to be atomic?  
> 
> Hard to say, since we haven't done the "audit all the callers"
> step that Stefan mentioned. If you're going to replace memcpy
> with __builtin_memcpy then the safest thing is to do it for
> all those uses (this will also give you much better generated
> code for performance purposes).

I figured that and that is exactly what I did.

Fernando: Can you please test the binary from
qemu-system-x86_64-3.1.0-r3 from alpine edge? I will backport the fix
if you can confirm it fixes the problem.

Thanks!

-nc

PS. Those issues are pretty hard to track down, so big thanks to
everyone who helped find the exact issue here. You have done great
work!
Fernando Casas Schössow Feb. 25, 2019, 1:32 p.m. UTC | #48
Thanks Natanael. Is the new package ready?

I will update as soon as the package is available, try to repro and report back.

Thanks everyone for looking into this!

Fernando Casas Schössow Feb. 25, 2019, 3:41 p.m. UTC | #49
Ok, the new package is deployed.

I stopped and started the test guest so it will use the new QEMU binary.

Will monitor and report back.

Paolo Bonzini Feb. 26, 2019, 1:30 p.m. UTC | #50
On 23/02/19 12:49, Natanael Copa wrote:
> I suspect this happens because the Alpine toolchain enables
> _FORTIFY_SOURCE=2 by default, combined with the way this is implemented via fortify-headers:
> http://git.2f30.org/fortify-headers/file/include/string.h.html#l39

The call to __orig_memcpy is the culprit there; is there any reason not
to do something like

_FORTIFY_FN(memcpy)
void *__memcpy_chk(void *__od, const void *__os, size_t __n)
{
	size_t __bd = __builtin_object_size(__od, 0);
	size_t __bs = __builtin_object_size(__os, 0);
	char *__d = (char *)__od;
	const char *__s = (const char *)__os;

	/* trap if pointers are overlapping but not if dst == src.
	 * gcc seems to like to generate code that relies on dst == src */
	if ((__d < __s && __d + __n > __s) ||
	    (__s < __d && __s + __n > __d))
		__builtin_trap();
	if (__n > __bd || __n > __bs)
		__builtin_trap();
	return memcpy(__od, __os, __n);
}
#define memcpy __memcpy_chk

?  That is, getting rid of _FORTIFY_ORIG altogether.

Paolo
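A side note on the proposal above, since the self-reference can look suspicious at first glance: the memcpy call inside the wrapper body is expanded before the #define takes effect, so it resolves to the real (builtin) memcpy and does not recurse. A standalone toy program showing the ordering, using a hypothetical demo name:

#include <stdio.h>
#include <string.h>

void *memcpy_chk_demo(void *d, const void *s, size_t n)
{
	/* "memcpy" here is still the real function: the #define below
	 * has not been processed yet, so there is no recursion and the
	 * compiler is free to expand this copy inline. */
	return memcpy(d, s, n);
}
#define memcpy(d, s, n) memcpy_chk_demo(d, s, n)

int main(void)
{
	char dst[8];
	memcpy(dst, "hi", 3);	/* expands to memcpy_chk_demo(dst, "hi", 3) */
	puts(dst);		/* prints "hi" */
	return 0;
}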
Fernando Casas Schössow Feb. 28, 2019, 7:35 a.m. UTC | #51
Just wanted to share a small update on the situation after updating QEMU to the new Alpine package carrying Natanael's patch.
So far so good; moreover, I switched a few other guests from SATA to virtio-scsi, and after two days there are no issues.
Unless I find any problem in the meantime, I will report back with an update a week from now.

Thanks everyone for all you did to help find a solution to this issue.

On Tue, Feb 26, 2019 at 2:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 23/02/19 12:49, Natanael Copa wrote:
> > I suspect this happens because the Alpine toolchain enables
> > _FORTIFY_SOURCE=2 by default, and because of the way this is implemented
> > via fortify-headers:
> > http://git.2f30.org/fortify-headers/file/include/string.h.html#l39
>
> The call to __orig_memcpy is the culprit there; is there any reason not
> to do something like
>
> _FORTIFY_FN(memcpy)
> void *__memcpy_chk(void *__od, const void *__os, size_t __n)
> {
> 	size_t __bd = __builtin_object_size(__od, 0);
> 	size_t __bs = __builtin_object_size(__os, 0);
> 	char *__d = (char *)__od;
> 	const char *__s = (const char *)__os;
>
> 	/* trap if pointers are overlapping but not if dst == src.
> 	 * gcc seems to like to generate code that relies on dst == src */
> 	if ((__d < __s && __d + __n > __s) ||
> 	    (__s < __d && __s + __n > __d))
> 		__builtin_trap();
> 	if (__n > __bd || __n > __bs)
> 		__builtin_trap();
> 	return memcpy(__od, __os, __n);
> }
> #define memcpy __memcpy_chk
>
> ?  That is, getting rid of _FORTIFY_ORIG altogether.
>
> Paolo
Peter Maydell Feb. 28, 2019, 9:58 a.m. UTC | #52
On Mon, 25 Feb 2019 at 13:06, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
> >
> > On Mon, 25 Feb 2019 10:34:23 +0000
> > Peter Maydell <peter.maydell@linaro.org> wrote:
> > > The short term fix is to fix your toolchain/compilation
> > > environment options so that it isn't trying to override
> > > the definition of memcpy().
> >
> > The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> > will weaken the security for all implemented string functions, strcpy,
> > memmove etc, so I don't want to do that.
> >
> > Is it only lduw_he_p that needs to be atomic or are the other functions
> > in include/qemu/bswap.h using memcpy also required to be atomic?
>
> Hard to say, since we haven't done the "audit all the callers"
> step that Stefan mentioned. If you're going to replace memcpy
> with __builtin_memcpy then the safest thing is to do it for
> all those uses (this will also give you much better generated
> code for performance purposes).

I mentioned this to Richard on IRC the other day, but I think there's
a good argument for making upstream QEMU use __builtin_memcpy in
these helpers:
 * it makes us robust against things like this fortify library
   in the short term (until we fix QEMU to have an equivalent set
   of functions for doing atomic accesses to AddressSpaces)
 * in the longer term it will mean that we don't end up with
   these functions being really badly-performing even if the
   semantics of the out-of-line memcpy() are correct

thanks
-- PMM
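For reference, the shape of the change being argued for, modeled on the helpers in include/qemu/bswap.h. This is a sketch of the idea rather than the exact upstream patch:

#include <stdint.h>

/* Host-endian 16-bit accessors in the style of include/qemu/bswap.h.
 * With __builtin_memcpy the fixed-size copy is expanded by the
 * compiler, in practice into a single 16-bit load or store, no matter
 * how the build environment redefines memcpy(). */
static inline int lduw_he_p(const void *ptr)
{
	uint16_t r;
	__builtin_memcpy(&r, ptr, sizeof(r));
	return r;
}

static inline void stw_he_p(void *ptr, uint16_t v)
{
	__builtin_memcpy(ptr, &v, sizeof(v));
}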
Fernando Casas Schössow March 7, 2019, 7:14 a.m. UTC | #53
After two weeks of running the new QEMU package, everything is fine.
Moreover, I migrated the rest of the guests (both Windows and Linux) to virtio-scsi, and there have been no issues so far.
I will monitor for another week, but this issue seems pretty much fixed!

Kudos to each and every one of you who helped find and solve this issue.

Fer

On Thu, Feb 28, 2019 at 10:58 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On Mon, 25 Feb 2019 at 13:06, Peter Maydell <peter.maydell@linaro.org> wrote:
> > On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
> > >
> > > On Mon, 25 Feb 2019 10:34:23 +0000
> > > Peter Maydell <peter.maydell@linaro.org> wrote:
> > > > The short term fix is to fix your toolchain/compilation
> > > > environment options so that it isn't trying to override
> > > > the definition of memcpy().
> > >
> > > The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> > > will weaken the security for all implemented string functions, strcpy,
> > > memmove etc, so I don't want to do that.
> > >
> > > Is it only lduw_he_p that needs to be atomic or are the other functions
> > > in include/qemu/bswap.h using memcpy also required to be atomic?
> >
> > Hard to say, since we haven't done the "audit all the callers" step that
> > Stefan mentioned. If you're going to replace memcpy with __builtin_memcpy
> > then the safest thing is to do it for all those uses (this will also give
> > you much better generated code for performance purposes).
>
> I mentioned this to Richard on IRC the other day, but I think there's
> a good argument for making upstream QEMU use __builtin_memcpy in
> these helpers:
>  * it makes us robust against things like this fortify library
>    in the short term (until we fix QEMU to have an equivalent set
>    of functions for doing atomic accesses to AddressSpaces)
>  * in the longer term it will mean that we don't end up with
>    these functions being really badly-performing even if the
>    semantics of the out-of-line memcpy() are correct
>
> thanks
> -- PMM
Fernando Casas Schössow March 18, 2019, 7:07 a.m. UTC | #54
Hi all,

Wanted to share one more update regarding this issue.
Since running the new package with the patch from Natanael, the issue has never happened again.
It's been three weeks with all guests running on virtio-scsi and no issues at all, so I guess we can consider this solved.

Thanks again to everyone who helped identify and fix this issue. Great work.

Cheers,

Fer

On Thu, Mar 7, 2019 at 8:14 AM, Fernando Casas Schössow <casasfernando@outlook.com> wrote:
> After two weeks of running the new QEMU package, everything is fine.
> Moreover, I migrated the rest of the guests (both Windows and Linux) to virtio-scsi, and there have been no issues so far.
> I will monitor for another week, but this issue seems pretty much fixed!
>
> Kudos to each and every one of you who helped find and solve this issue.
>
> Fer
Stefan Hajnoczi March 20, 2019, 1:57 p.m. UTC | #55
On Mon, Mar 18, 2019 at 07:07:25AM +0000, Fernando Casas Schössow wrote:
> Wanted to share one more update regarding this issue.
> Since running the new package with the patch from Natanael, the issue has never happened again.
> It's been three weeks with all guests running on virtio-scsi and no issues at all, so I guess we can consider this solved.
>
> Thanks again to everyone who helped identify and fix this issue. Great work.

Nice, thanks for reporting and providing all the debugging information
to track down this bug!

Stefan
diff mbox

Patch

--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -856,7 +856,7 @@  void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
-        virtio_error(vdev, "Virtqueue size exceeded");
+        virtio_error(vdev, "Virtqueue %u device %s size exceeded", vq->queue_index, vdev->name);
         goto done;
     }