Message ID | CABdb7358YVXbk++Z7s+q6Z5O0Q=6CQiQvYsTdc35eqmQeUnP_Q@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Hi Ladi,

Thanks a lot for looking into this and replying.

I will do my best to rebuild and deploy Alpine's qemu packages with this patch included, but I'm not sure it's feasible yet. In any case, would it be possible to have this patch included in the next qemu release? The current error message is helpful, but knowing which device was involved will be much more helpful.

Regarding the environment, I'm not doing migrations, and a managed save is only done in case the host needs to be rebooted or shut down. The QEMU process has been running the VM since the host was started, and this failure is occurring randomly without any previous managed save.

As part of troubleshooting, on one of the guests I switched from virtio_blk to virtio_scsi for the guest disks, but I will need more time to see if that helped. If I have this problem again I will follow your advice and remove virtio_balloon.

Another question: is there any way to monitor the virtqueue size, either from the guest itself or from the host? Any file in sysfs or proc? This may help to understand in which conditions this is happening and to react faster to mitigate the problem.

Thanks again for your help with this!

Fer

On vie, jun 16, 2017 at 8:58 , Ladi Prosek <lprosek@redhat.com> wrote:

Hi,

Would you be able to enhance the error message and rebuild QEMU?

--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -856,7 +856,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
-        virtio_error(vdev, "Virtqueue size exceeded");
+        virtio_error(vdev, "Virtqueue %u device %s size exceeded", vq->queue_index, vdev->name);
         goto done;
     }

This would at least confirm the theory that it's caused by virtio-blk-pci. If rebuilding is not feasible, I would start by removing other virtio devices -- particularly balloon, which has had quite a few virtio-related bugs fixed recently.

Does your environment involve VM migrations or saving/resuming, or does the crashing QEMU process always run the VM from its boot?

Thanks!
On Fri, Jun 16, 2017 at 12:11 PM, Fernando Casas Schössow <casasfernando@hotmail.com> wrote:
> In any case, would it be possible to have this patch included in the next
> qemu release?

Yes, I have already added this to my todo list.

> As part of troubleshooting, on one of the guests I switched from virtio_blk
> to virtio_scsi for the guest disks, but I will need more time to see if that
> helped. If I have this problem again I will follow your advice and remove
> virtio_balloon.

Thanks, please keep us posted.

> Another question: is there any way to monitor the virtqueue size, either
> from the guest itself or from the host? Any file in sysfs or proc?

The problem is not in the virtqueue size but in one piece of internal state ("inuse") which is meant to track the number of buffers "checked out" by QEMU. It's being compared to the virtqueue size merely as a sanity check.

I'm afraid that there's no way to expose this variable without rebuilding QEMU. The best you could do is attach gdb to the QEMU process and use some clever data access breakpoints to catch suspicious writes to the variable. Although it's likely that it just creeps up slowly and you won't see anything interesting. It's probably beyond reasonable at this point anyway.

I would continue with the elimination process (virtio_scsi instead of virtio_blk, no balloon, etc.), and then, once we know which device it is, maybe we can add some instrumentation to the code.
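[Editor's note: the "inuse" accounting described above can be illustrated with a minimal C model. This is not QEMU code -- the struct and function names are invented for illustration; only the sanity check mirrors the quoted patch. The point is that inuse counts buffers popped by the device and not yet returned to the guest, so a pop without a matching push makes it creep up until the check fires.]

```c
#include <assert.h>
#include <stdio.h>

/* Toy model: inuse tracks buffers "checked out" by the device model.
 * Comparing it against the ring size is only a sanity check; a leaked
 * buffer (pop with no matching push) eventually trips it. */
typedef struct {
    unsigned vring_num; /* virtqueue size */
    unsigned inuse;     /* buffers currently checked out */
} VirtQueueModel;

/* Check out one buffer. Returns 0 on success, -1 when the sanity
 * check fires (the condition guarded in virtqueue_pop()). */
int vq_pop(VirtQueueModel *vq)
{
    if (vq->inuse >= vq->vring_num) {
        fprintf(stderr, "Virtqueue size exceeded\n");
        return -1;
    }
    vq->inuse++;
    return 0;
}

/* Return a completed buffer to the guest. */
void vq_push(VirtQueueModel *vq)
{
    assert(vq->inuse > 0);
    vq->inuse--;
}
```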
Hi Ladi,

Today two guests failed again at different times of day. One of them was the one I switched from virtio_blk to virtio_scsi, so this change didn't solve the problem. Now in this guest I also disabled virtio_balloon, continuing with the elimination process.

Also, this time I found a different error message in the guest console. In the guest already switched to virtio_scsi:

virtio_scsi virtio2: request:id 44 is not a head!

Followed by the usual "task blocked for more than 120 seconds." error.

On the guest still running on virtio_blk the error was similar:

virtio_blk virtio2: req.0:id 42 is not a head!
blk_update_request: I/O error, dev vda, sector 645657736
Buffer I/O error on dev dm-1, logical block 7413821, lost async page write

Followed by the usual "task blocked for more than 120 seconds." error.

Do you think that the blk_update_request and the buffer I/O errors may be a consequence of the previous "is not a head!" error, or should I be worried about a storage-level issue here?

Now I will wait to see whether disabling virtio_balloon helps or not, and report back.

Thanks.

Fer
Hi Fernando,

On Tue, Jun 20, 2017 at 12:10 AM, Fernando Casas Schössow <casasfernando@hotmail.com> wrote:
> Today two guests failed again at different times of day.
> One of them was the one I switched from virtio_blk to virtio_scsi, so this
> change didn't solve the problem.
>
> In the guest already switched to virtio_scsi:
>
> virtio_scsi virtio2: request:id 44 is not a head!
>
> On the guest still running on virtio_blk the error was similar:
>
> virtio_blk virtio2: req.0:id 42 is not a head!

Honestly, this is starting to look more and more like memory corruption. Two different virtio devices and two different guest operating systems; that would have to be a bug in the common virtio code, and we would have seen it somewhere else already.

Would it be possible to run a thorough memtest on the host, just in case?
Hi Ladi,

In this case both guests are CentOS 7.3 running the same kernel, 3.10.0-514.21.1. Also, the guest that fails most frequently is running Docker with 4 or 5 containers.

Another thing I would like to mention is that the host is running Alpine's default grsec-patched kernel. I have the option to install a vanilla kernel as well. Would it make sense to switch to the vanilla kernel on the host and see if that helps?

And last but not least, KSM is enabled on the host. Should I disable it?

Following your advice I will run memtest on the host and report back. Just as a side comment, the host is running on ECC memory.

Thanks for all your help.

Fer.
On Tue, Jun 20, 2017 at 8:30 AM, Fernando Casas Schössow <casasfernando@hotmail.com> wrote:
> Another thing I would like to mention is that the host is running Alpine's
> default grsec-patched kernel. I have the option to install a vanilla kernel
> as well. Would it make sense to switch to the vanilla kernel on the host and
> see if that helps?

The host kernel is less likely to be responsible for this, in my opinion. I'd hold off on that for now.

> And last but not least, KSM is enabled on the host. Should I disable it?

Could be worth a try.

> Following your advice I will run memtest on the host and report back. Just
> as a side comment, the host is running on ECC memory.

I see. Would it be possible for you, once a guest is in the broken state, to make it available for debugging? By attaching gdb to the QEMU process, for example, and letting me poke around it remotely?

Thanks!
Hi Ladi,

Sorry for the delay in my reply.

I will leave the host kernel alone for now, then. For the last 15 hours or so I have been running memtest86+ on the host. So far so good: two passes, no errors. I will try to leave it running for at least another 24 hours and report back the results. Hopefully we can rule out a memory issue at the hardware level.

Regarding KSM, that will be the next thing I disable if the guests still crash after removing the balloon device.

About leaving a guest in a failed state for you to debug it remotely: that's absolutely an option. We just need to coordinate so I can give you remote access to the host and so on. Let me know if any preparation is needed in advance and which tools you need installed on the host.

Once again I would like to thank you for all your help and your great disposition!

Cheers,

Fer
Hi Fernando,

On Wed, Jun 21, 2017 at 2:19 PM, Fernando Casas Schössow <casasfernando@hotmail.com> wrote:
> About leaving a guest in a failed state for you to debug it remotely: that's
> absolutely an option. We just need to coordinate so I can give you remote
> access to the host and so on. Let me know if any preparation is needed in
> advance and which tools you need installed on the host.

I think that gdbserver attached to the QEMU process should be enough. When the VM gets into the broken state, please do something like:

  gdbserver --attach host:12345 <QEMU pid>

and let me know the host name and port (12345 in the above example).

> Once again I would like to thank you for all your help and your great
> disposition!

You're absolutely welcome; I don't think I've done anything helpful so far :)
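[Editor's note: the client side of the gdbserver setup above would look roughly like the following sketch. The host name, port, and binary path are placeholders, not values from the thread; the binary passed to gdb should match the QEMU build actually running on the host so that symbols resolve.]

```
$ gdb /usr/bin/qemu-system-x86_64        # placeholder path; use the host's QEMU binary
(gdb) target remote examplehost:12345    # host:port from the gdbserver invocation above
(gdb) info threads                       # sanity check: list the QEMU process's threads
```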
Hi Ladi,

Small update: memtest86+ was running on the host for more than 54 hours. Eight passes were completed and no memory errors were found, so for now I think we can assume that the host memory is OK.

I just started all the guests one hour ago. I will monitor them, and once one fails I will attach the debugger and let you know.

Thanks.

Fer
Hi Ladi, After running for about 15hrs two different guests (one Windows, one Linux) crashed with around 1 hour difference and the same error in qemu log "Virqueue size exceeded". The Linux guest was already running on virtio_scsi and without virtio_balloon. :( I compiled and attached gdbserver to the qemu process for this guest but when I did this I got the following warning in gdbserver: warning: Cannot call inferior functions, Linux kernel PaX protection forbids return to non-executable pages! The default Alpine kernel is a grsec kernel. Not sure if this will interfere with debugging or not but I suspect yes. If you need me to replace the grsec kernel with a vanilla one (also available as an option in Alpine) let me know and I will do so. Otherwise send me an email directly so I can share with you the host:port details so you can connect to gdbserver. Thanks, Fer On vie, jun 23, 2017 at 8:29 , Fernando Casas Schössow <casasfernando@hotmail.com> wrote: Hi Ladi, Small update. Memtest86+ was running on the host for more than 54 hours. 8 passes were completed and no memory errors found. For now I think we can assume that the host memory is ok. I just started all the guests one hour ago. I will monitor them and once one fails I will attach the debugger and let you know. Thanks. Fer On jue, jun 22, 2017 at 9:43 , Ladi Prosek <lprosek@redhat.com> wrote: Hi Fernando, On Wed, Jun 21, 2017 at 2:19 PM, Fernando Casas Schössow <casasfernando@hotmail.com<mailto:casasfernando@hotmail.com>> wrote: Hi Ladi, Sorry for the delay in my reply. I will leave the host kernel alone for now then. For the last 15 hours or so I'm running memtest86+ on the host. So far so good. Two passes no errors so far. I will try to leave it running for at least another 24hr and report back the results. Hopefully we can discard the memory issue at hardware level. Regarding KSM, that will be the next thing I will disable if after removing the balloon device guests still crash. 
About leaving a guest in a failed state for you to debug it remotely, that's absolutely an option. We just need to coordinate so I can give you remote access to the host and so on. Let me know if any preparation is needed in advance and which tools you need installed on the host. I think that gdbserver attached to the QEMU process should be enough. When the VM gets into the broken state please do something like: gdbserver --attach host:12345 <QEMU pid> and let me know the host name and port (12345 in the above example). Once again, I would like to thank you for all your help and your great disposition! You're absolutely welcome, I don't think I've done anything helpful so far :) Cheers, Fer On mar, jun 20, 2017 at 9:52 , Ladi Prosek <lprosek@redhat.com> wrote: The host kernel is less likely to be responsible for this, in my opinion. I'd hold off on that for now. And last but not least KSM is enabled on the host. Should I disable it? Could be worth the try. Following your advice I will run memtest on the host and report back. Just as a side comment, the host is running on ECC memory. I see. Would it be possible for you, once a guest is in the broken state, to make it available for debugging? By attaching gdb to the QEMU process for example and letting me poke around it remotely? Thanks!
Hi, Sorry for resurrecting this thread after so long, but I just upgraded the host to Qemu 3.1 and libvirt 4.10 and I'm still facing this problem. At the moment I cannot use virtio disks (neither virtio-blk nor virtio-scsi) with my guests without running into this issue, so as a workaround I'm using emulated SATA storage, which is not ideal but is perfectly stable. Do you have any suggestions on how I can make progress troubleshooting this? Qemu is not crashing, so I don't have any dumps that can be analyzed. The guest is just "stuck" and all I can do is destroy it and start it again. It's really frustrating that after all this time I couldn't find the cause of this issue, so any ideas are welcome. Thanks. Fernando
On Thu, Jan 31, 2019 at 11:32:32AM +0000, Fernando Casas Schössow wrote: > Sorry for resurrecting this thread after so long but I just upgraded the host to Qemu 3.1 and libvirt 4.10 and I'm still facing this problem. > At the moment I cannot use virtio disks (virtio-blk nor virtio-scsi) with my guests in order to avoid this issue so as a workaround I'm using SATA emulated storage which is not ideal but is perfectly stable. > > Do you have any suggestions on how can I progress troubleshooting? > Qemu is not crashing so I don't have any dumps that can be analyzed. The guest is just "stuck" and all I can do is destroy it and start it again. > It's really frustrating that after all this time I couldn't find the cause for this issue so any ideas are welcome. Hi Fernando, Please post your QEMU command-line (ps aux | grep qemu) and the details of the guest operating system and version. Stefan
Hi Stefan, Thanks for looking into this. Please find the requested details below. If you need any further details (like the host storage details) just let me know. Host details: CPU: Intel(R) Xeon(R) CPU E3-1230 V2 Memory: 32GB (ECC) OS: Alpine 3.9 Kernel version: 4.19.18-0-vanilla Qemu version: 3.1 Libvirt version: 4.10 Networking: openvswitch-2.10.1 Windows guest: OS: Windows Server 2012 R2 64bit (up to date) Virtio drivers version: 141 (stable) Command line: /usr/bin/qemu-system-x86_64 -name guest=DCHOMENET02,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-17-DCHOMENET02/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DCHOMENET02_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 84afccf7-f164-4d0d-b9c9-c92ffab00848 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=42,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/dchomenet02.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -drive file=/storage/storage-hdd-vms/virtual_machines_hdd/dchomenet02_storage.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-1,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1,write-cache=on -netdev tap,fd=44,id=hostnet0,vhost=on,vhostfd=45 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:91:aa:a1,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,fd=47,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5905,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on Linux guest: OS: CentOS 7.6 64bit (up to date) Kernel version: 3.10.0-957.1.3.el7.x86_64 Command line: /usr/bin/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive 
file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=44,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev tap,fd=46,id=hostnet0,vhost=on,vhostfd=48 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=49,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev 
spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on Thanks. Kind regards, Fernando On vie, feb 1, 2019 at 6:48 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: On Thu, Jan 31, 2019 at 11:32:32AM +0000, Fernando Casas Schössow wrote: Sorry for resurrecting this thread after so long but I just upgraded the host to Qemu 3.1 and libvirt 4.10 and I'm still facing this problem. At the moment I cannot use virtio disks (virtio-blk nor virtio-scsi) with my guests in order to avoid this issue so as a workaround I'm using SATA emulated storage which is not ideal but is perfectly stable. Do you have any suggestions on how can I progress troubleshooting? Qemu is not crashing so I don't have any dumps that can be analyzed. The guest is just "stuck" and all I can do is destroy it and start it again. It's really frustrating that after all this time I couldn't find the cause for this issue so any ideas are welcome. Hi Fernando, Please post your QEMU command-line (ps aux | grep qemu) and the details of the guest operating system and version. Stefan
Are you sure this happens with both virtio-blk and virtio-scsi? The following patch adds more debug output. You can build as follows: $ git clone https://git.qemu.org/git/qemu.git $ cd qemu $ patch -p1 ...paste the patch here... ^D # For info on build dependencies see https://wiki.qemu.org/Hosts/Linux $ ./configure --target-list=x86_64-softmmu $ make -j4 You can configure a libvirt domain to use your custom QEMU binary by changing the <devices><emulator> tag to the qemu/x86_64-softmmu/qemu-system-x86_64 path. --- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 22bd1ac34e..aa44bffa1f 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -879,6 +879,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); virtio_error(vdev, "Virtqueue size exceeded"); goto done; }
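For reference, the <devices><emulator> change mentioned above looks roughly like this in the domain XML (edited with virsh edit; the path is an example assuming an in-tree build under /home/user/qemu):

```xml
<domain type='kvm'>
  <!-- ... -->
  <devices>
    <!-- Example path; point this at your freshly built binary -->
    <emulator>/home/user/qemu/x86_64-softmmu/qemu-system-x86_64</emulator>
    <!-- ... -->
  </devices>
</domain>
```

libvirt will use this binary the next time the domain is started.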
I can test again with qemu 3.1 but with previous versions yes, it was happening the same with both virtio-blk and virtio-scsi. For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can test with virtio-blk again if that will add value to the investigation. Also I'm attaching a guest console screenshot showing the errors displayed by the guest when it goes unresponsive in case it can help. Thanks for the patch. I will build the custom qemu binary and reproduce the issue. This may take a couple of days since I cannot reproduce it at will. Sometimes it takes 12 hours, sometimes 2 days until it happens. Hopefully the code below will shed more light on this problem. Thanks, Fernando On lun, feb 4, 2019 at 7:06 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: Are you sure this happens with both virtio-blk and virtio-scsi? The following patch adds more debug output. You can build as follows: $ git clone https://git.qemu.org/git/qemu.git $ cd qemu $ patch apply -p1 ...paste the patch here... ^D # For info on build dependencies see https://wiki.qemu.org/Hosts/Linux $ ./configure --target-list=x86_64-softmmu $ make -j4 You can configure a libvirt domain to use your custom QEMU binary by changing the <devices><emulator> tag to the qemu/x86_64-softmmu/qemu-system-x86_64 path. --- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 22bd1ac34e..aa44bffa1f 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -879,6 +879,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); virtio_error(vdev, "Virtqueue size exceeded"); goto done; }
I can now confirm that the same happens with virtio-blk and virtio-scsi. Please find below the qemu log enhanced with the new information added by the patch provided by Stefan: vdev 0x55d22b8e10f0 ("virtio-blk") vq 0x55d22b8ebe40 (idx 0) inuse 128 vring.num 128 2019-02-06T00:40:41.742552Z qemu-system-x86_64: Virtqueue size exceeded I just changed the disk back to virtio-scsi so I can repro this again with the patched qemu and report back. Thanks. On lun, feb 4, 2019 at 8:24 AM, Fernando Casas Schössow <casasfernando@outlook.com> wrote: I can test again with qemu 3.1 but with previous versions yes, it was happening the same with both virtio-blk and virtio-scsi. For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can test with virtio-blk again if that will add value to the investigation. Also I'm attaching a guest console screenshot showing the errors displayed by the guest when it goes unresponsive in case it can help. Thanks for the patch. I will build the custom qemu binary and reproduce the issue. This may take a couple of days since I cannot reproduce it at will. Sometimes it takes 12 hours sometimes 2 days until it happens. Hopefully the code below will add more light on to this problem. Thanks, Fernando On lun, feb 4, 2019 at 7:06 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: Are you sure this happens with both virtio-blk and virtio-scsi? The following patch adds more debug output. You can build as follows: $ git clone https://git.qemu.org/git/qemu.git $ cd qemu $ patch apply -p1 ...paste the patch here... ^D # For info on build dependencies see https://wiki.qemu.org/Hosts/Linux $ ./configure --target-list=x86_64-softmmu $ make -j4 You can configure a libvirt domain to use your custom QEMU binary by changing the <devices><emulator> tag to the qemu/x86_64-softmmu/qemu-system-x86_64 path. 
--- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 22bd1ac34e..aa44bffa1f 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -879,6 +879,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); virtio_error(vdev, "Virtqueue size exceeded"); goto done; }
I could also repro the same with virtio-scsi on the same guest a couple of hours later: 2019-02-06 07:10:37.672+0000: starting up libvirt version: 4.10.0, qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=46,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev 
tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=51,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on 2019-02-06 07:10:37.672+0000: Domain id=32 is tainted: high-privileges char device redirected to /dev/pts/5 (label charserial0) vdev 0x5585456ef6b0 ("virtio-scsi") vq 0x5585456f90a0 (idx 2) inuse 128 vring.num 128 2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded I'm open to any tests or suggestions that can move the investigation forward and find the cause of this issue. Thanks. On mié, feb 6, 2019 at 8:15 AM, Fernando Casas Schössow <casasfernando@outlook.com> wrote: I can now confirm that the same happens with virtio-blk and virtio-scsi. 
Please find below the qemu log enhanced with the new information added by the patch provided by Stefan: vdev 0x55d22b8e10f0 ("virtio-blk") vq 0x55d22b8ebe40 (idx 0) inuse 128 vring.num 128 2019-02-06T00:40:41.742552Z qemu-system-x86_64: Virtqueue size exceeded I just changed the disk back to virtio-scsi so I can repro this again with the patched qemu and report back. Thanks. On lun, feb 4, 2019 at 8:24 AM, Fernando Casas Schössow <casasfernando@outlook.com> wrote: I can test again with qemu 3.1 but with previous versions yes, it was happening the same with both virtio-blk and virtio-scsi. For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can test with virtio-blk again if that will add value to the investigation. Also I'm attaching a guest console screenshot showing the errors displayed by the guest when it goes unresponsive in case it can help. Thanks for the patch. I will build the custom qemu binary and reproduce the issue. This may take a couple of days since I cannot reproduce it at will. Sometimes it takes 12 hours sometimes 2 days until it happens. Hopefully the code below will add more light on to this problem. Thanks, Fernando On lun, feb 4, 2019 at 7:06 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: Are you sure this happens with both virtio-blk and virtio-scsi? The following patch adds more debug output. You can build as follows: $ git clone https://git.qemu.org/git/qemu.git $ cd qemu $ patch apply -p1 ...paste the patch here... ^D # For info on build dependencies see https://wiki.qemu.org/Hosts/Linux $ ./configure --target-list=x86_64-softmmu $ make -j4 You can configure a libvirt domain to use your custom QEMU binary by changing the <devices><emulator> tag to the qemu/x86_64-softmmu/qemu-system-x86_64 path. 
--- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 22bd1ac34e..aa44bffa1f 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -879,6 +879,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); virtio_error(vdev, "Virtqueue size exceeded"); goto done; }
On Wed, Feb 06, 2019 at 04:47:19PM +0000, Fernando Casas Schössow wrote: > I could also repro the same with virtio-scsi on the same guest a couple of hours later: > > 2019-02-06 07:10:37.672+0000: starting up libvirt version: 4.10.0, qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local > LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=46,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=51,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on > 2019-02-06 07:10:37.672+0000: Domain id=32 is tainted: high-privileges > char device redirected to /dev/pts/5 (label charserial0) > vdev 0x5585456ef6b0 ("virtio-scsi") > vq 0x5585456f90a0 (idx 2) > inuse 128 vring.num 128 > 2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded > > > I'm open to any tests or suggestions that can move the investigation forward and find the cause of this issue. Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi failed suggests it's not a virtqueue element leak in the virtio-blk or virtio-scsi device emulation code. 
The hung task error messages from inside the guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. QEMU refuses to process further requests after the error, causing tasks inside the guest to get stuck on I/O. I don't have a good theory regarding the root cause. Two ideas: 1. The guest is corrupting the vring or submitting more requests than will fit into the ring. Somewhat unlikely because it happens with both Windows and Linux guests. 2. QEMU's virtqueue code is buggy, maybe the memory region cache which is used for fast guest RAM accesses. Here is an expanded version of the debug patch which might help identify which of these scenarios is likely. Sorry, it requires running the guest again! This time let's make QEMU dump core so both QEMU state and guest RAM are captured for further debugging. That way it will be possible to extract more information using gdb without rerunning. Stefan --- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- core dump! 
*/ virtio_error(vdev, "Virtqueue size exceeded"); goto done; }
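Once a core file exists, the same virtqueue state can be inspected offline with gdb against the patched binary. A rough sketch of such a session follows; the binary path, the core file path, and the VirtQueue address (which would come from the vq pointer the patch prints to stderr) are all placeholders, and the field names assume the QEMU 3.1 VirtQueue layout. Since VirtQueue is defined inside hw/virtio/virtio.c, this needs a binary built with debug info (e.g. ./configure --enable-debug):

```gdb
# Load the patched binary together with the core dump (paths are examples).
file /home/fernando/qemu-system-x86_64
core-file /var/lib/libvirt/qemu/core.qemu
# Use the 'vq' pointer from the stderr output to examine the queue state.
set $vq = (VirtQueue *) 0x5585456f90a0
print $vq->vring.num
print $vq->inuse
print $vq->last_avail_idx
print $vq->used_idx
print $vq->shadow_avail_idx
```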
Thanks for looking into this Stefan. I rebuilt Qemu with the new patch and got a couple of guests running with the new build. Two of them using virtio-scsi and another one using virtio-blk. Now I'm waiting for any of them to crash. I also set libvirt to include the guest memory in the qemu dumps as I understood you will want to look at both (qemu dump and guest memory dump). I will reply to this thread once I have any news. Kind regards. Fernando On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: On Wed, Feb 06, 2019 at 04:47:19PM +0000, Fernando Casas Schössow wrote: I could also repro the same with virtio-scsi on the same guest a couple of hours later: 2019-02-06 07:10:37.672+0000: starting up libvirt version: 4.10.0, qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=46,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=51,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on 2019-02-06 07:10:37.672+0000: Domain id=32 is tainted: high-privileges char device redirected to /dev/pts/5 (label charserial0) vdev 0x5585456ef6b0 ("virtio-scsi") vq 
0x5585456f90a0 (idx 2) inuse 128 vring.num 128 2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded I'm open to any tests or suggestions that can move the investigation forward and find the cause of this issue. Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi failed suggests it's not a virtqueue element leak in the virtio-blk or virtio-scsi device emulation code. The hung task error messages from inside the guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. QEMU refuses to process further requests after the error, causing tasks inside the guest to get stuck on I/O. I don't have a good theory regarding the root cause. Two ideas: 1. The guest is corrupting the vring or submitting more requests than will fit into the ring. Somewhat unlikely because it happens with both Windows and Linux guests. 2. QEMU's virtqueue code is buggy, maybe the memory region cache which is used for fast guest RAM accesses. Here is an expanded version of the debug patch which might help identify which of these scenarios is likely. Sorry, it requires running the guest again! This time let's make QEMU dump core so both QEMU state and guest RAM are captured for further debugging. That way it will be possible to extract more information using gdb without rerunning. 
Stefan --- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- core dump! */ virtio_error(vdev, "Virtqueue size exceeded"); goto done; }
It took a few days but last night the problem was reproduced. This is the information from the log: vdev 0x55f261d940f0 ("virtio-blk") vq 0x55f261d9ee40 (idx 0) inuse 128 vring.num 128 old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874 avail 0x3d87a800 avail_idx (cache bypassed) 58625 used_idx 58497 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB). I switched the guest now to virtio-scsi to get the information and dump with this setup as well. How should we proceed? Thanks. On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi failed suggests it's not a virtqueue element leak in the virtio-blk or virtio-scsi device emulation code. The hung task error messages from inside the guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. QEMU refuses to process further requests after the error, causing tasks inside the guest to get stuck on I/O. I don't have a good theory regarding the root cause. Two ideas: 1. The guest is corrupting the vring or submitting more requests than will fit into the ring. Somewhat unlikely because it happens with both Windows and Linux guests. 2. QEMU's virtqueue code is buggy, maybe the memory region cache which is used for fast guest RAM accesses. Here is an expanded version of the debug patch which might help identify which of these scenarios is likely. Sorry, it requires running the guest again! This time let's make QEMU dump core so both QEMU state and guest RAM are captured for further debugging. That way it will be possible to extract more information using gdb without rerunning. 
Stefan --- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- core dump! */ virtio_error(vdev, "Virtqueue size exceeded"); goto done; }
Problem reproduced with virtio-scsi as well on the same guest, this time it took less than a day. Information from the log file: vdev 0x55823f119f90 ("virtio-scsi") vq 0x55823f122e80 (idx 2) inuse 128 vring.num 128 old_shadow_avail_idx 58367 last_avail_idx 58113 avail_idx 58367 avail 0x3de8a800 avail_idx (cache bypassed) 58113 used_idx 57985 2019-02-19 04:20:43.291+0000: shutting down, reason=crashed Got the dump file as well, including guest memory. Size is around 486MB after compression. Is there any other information I should collect to progress the investigation? Thanks. On lun, feb 18, 2019 at 8:21 AM, Fernando Casas Schössow <casasfernando@outlook.com> wrote: It took a few days but last night the problem was reproduced. This is the information from the log: vdev 0x55f261d940f0 ("virtio-blk") vq 0x55f261d9ee40 (idx 0) inuse 128 vring.num 128 old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874 avail 0x3d87a800 avail_idx (cache bypassed) 58625 used_idx 58497 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB). I switched the guest now to virtio-scsi to get the information and dump with this setup as well. How should we proceed? Thanks. On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi failed suggests it's not a virtqueue element leak in the virtio-blk or virtio-scsi device emulation code. The hung task error messages from inside the guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. QEMU refuses to process further requests after the error, causing tasks inside the guest to get stuck on I/O. I don't have a good theory regarding the root cause. Two ideas: 1. The guest is corrupting the vring or submitting more requests than will fit into the ring. 
Somewhat unlikely because it happens with both Windows and Linux guests. 2. QEMU's virtqueue code is buggy, maybe the memory region cache which is used for fast guest RAM accesses. Here is an expanded version of the debug patch which might help identify which of these scenarios is likely. Sorry, it requires running the guest again! This time let's make QEMU dump core so both QEMU state and guest RAM are captured for further debugging. That way it will be possible to extract more information using gdb without rerunning. Stefan --- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- core dump! */ virtio_error(vdev, "Virtqueue size exceeded"); goto done; }
On Mon, Feb 18, 2019 at 07:21:25AM +0000, Fernando Casas Schössow wrote: > It took a few days but last night the problem was reproduced. > This is the information from the log: > > vdev 0x55f261d940f0 ("virtio-blk") > vq 0x55f261d9ee40 (idx 0) > inuse 128 vring.num 128 > old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874 > avail 0x3d87a800 avail_idx (cache bypassed) 58625 Hi Paolo, Are you aware of any recent MemoryRegionCache issues? The avail_idx value 58874 was read via the cache while a non-cached read produces 58625! I suspect that 58625 is correct since the vring is already full and the driver wouldn't bump avail_idx any further until requests complete. Fernando also hits this issue with virtio-scsi so it's not a virtio_blk.ko driver bug or a virtio-blk device emulation issue. A QEMU core dump is available for debugging. Here is the patch that produced this debug output: diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- core dump! 
*/ virtio_error(vdev, "Virtqueue size exceeded"); goto done; } Stefan > used_idx 58497 > 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed > > The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB). > I switched the guest now to virtio-scsi to get the information and dump with this setup as well. > > How should we proceed? > > Thanks. > > On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi failed suggests it's not a virtqueue element leak in the virtio-blk or virtio-scsi device emulation code. The hung task error messages from inside the guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. QEMU refuses to process further requests after the error, causing tasks inside the guest to get stuck on I/O. I don't have a good theory regarding the root cause. Two ideas: 1. The guest is corrupting the vring or submitting more requests than will fit into the ring. Somewhat unlikely because it happens with both Windows and Linux guests. 2. QEMU's virtqueue code is buggy, maybe the memory region cache which is used for fast guest RAM accesses. Here is an expanded version of the debug patch which might help identify which of these scenarios is likely. Sorry, it requires running the guest again! This time let's make QEMU dump core so both QEMU state and guest RAM are captured for further debugging. That way it will be possible to extract more information using gdb without rerunning. 
Stefan --- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- core dump! */ virtio_error(vdev, "Virtqueue size exceeded"); goto done; } > >
On 20/02/19 17:58, Stefan Hajnoczi wrote: > On Mon, Feb 18, 2019 at 07:21:25AM +0000, Fernando Casas Schössow wrote: >> It took a few days but last night the problem was reproduced. >> This is the information from the log: >> >> vdev 0x55f261d940f0 ("virtio-blk") >> vq 0x55f261d9ee40 (idx 0) >> inuse 128 vring.num 128 >> old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874 >> avail 0x3d87a800 avail_idx (cache bypassed) 58625 > > Hi Paolo, > Are you aware of any recent MemoryRegionCache issues? The avail_idx > value 58874 was read via the cache while a non-cached read produces > 58625! > > I suspect that 58625 is correct since the vring is already full and the > driver wouldn't bump avail_idx any further until requests complete. > > Fernando also hits this issue with virtio-scsi so it's not a > virtio_blk.ko driver bug or a virtio-blk device emulation issue. No, I am not aware of any issues. How can I get the core dump (and the corresponding executable to get the symbols)? Alternatively, it should be enough to print the vq->vring.caches->avail.mrs from the debugger. Also, one possibility is to add in vring_avail_idx an assertion like assert(vq->shadow_avail_idx == virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); and try to catch the error earlier. Paolo > A QEMU core dump is available for debugging. 
> > Here is the patch that produced this debug output: > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index a1ff647a66..28d89fcbcb 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) > return NULL; > } > rcu_read_lock(); > + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; > if (virtio_queue_empty_rcu(vq)) { > goto done; > } > @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) > max = vq->vring.num; > > if (vq->inuse >= vq->vring.num) { > + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); > + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); > + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); > + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); > + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); > + fprintf(stderr, "used_idx %u\n", vq->used_idx); > + abort(); /* <--- core dump! */ > virtio_error(vdev, "Virtqueue size exceeded"); > goto done; > } > > Stefan > >> used_idx 58497 >> 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed >> >> The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB). >> I switched the guest now to virtio-scsi to get the information and dump with this setup as well. >> >> How should we proceed? >> >> Thanks. >> >> On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote: >> Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi failed suggests it's not a virtqueue element leak in the virtio-blk or virtio-scsi device emulation code. The hung task error messages from inside the guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. 
QEMU refuses to process further requests after the error, causing tasks inside the guest to get stuck on I/O. I don't have a good theory regarding the root cause. Two ideas: 1. The guest is corrupting the vring or submitting more requests than will fit into the ring. Somewhat unlikely because it happens with both Windows and Linux guests. 2. QEMU's virtqueue code is buggy, maybe the memory region cache which is used for fast guest RAM accesses. Here is an expanded version of the debug patch which might help identify which of these scenarios is likely. Sorry, it requires running the guest again! This time let's make QEMU dump core so both QEMU state and guest RAM are captured for further debugging. That way it will be possible to extract more information using gdb without rerunning. Stefan --- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- core dump! */ virtio_error(vdev, "Virtqueue size exceeded"); goto done; } >> >>
Hi Paolo, This is Fernando, the one who reported the issue. Regarding the dumps I have three of them including guest memory, 2 for virtio-scsi, 1 for virtio-blk, in case a comparison may help to confirm which is the problem. I can upload them to a server you indicate, or I can also put them on a server so you can download them as you see fit. Each dump, compressed, is around 500MB. If it's more convenient for you I can try to get the requested information from gdb. But I will need some guidance since I'm not skilled enough with the debugger. Another option, if you provide me with the right patch, is for me to patch, rebuild QEMU and repro the problem again. With virtio-scsi I'm able to repro this in a matter of hours most of the time, with virtio-blk it will take a couple of days. Just let me know how you prefer to move forward. Thanks a lot for helping with this! Kind regards, Fernando On mié, feb 20, 2019 at 6:53 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: On 20/02/19 17:58, Stefan Hajnoczi wrote: On Mon, Feb 18, 2019 at 07:21:25AM +0000, Fernando Casas Schössow wrote: It took a few days but last night the problem was reproduced. This is the information from the log: vdev 0x55f261d940f0 ("virtio-blk") vq 0x55f261d9ee40 (idx 0) inuse 128 vring.num 128 old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874 avail 0x3d87a800 avail_idx (cache bypassed) 58625 Hi Paolo, Are you aware of any recent MemoryRegionCache issues? The avail_idx value 58874 was read via the cache while a non-cached read produces 58625! I suspect that 58625 is correct since the vring is already full and the driver wouldn't bump avail_idx any further until requests complete. Fernando also hits this issue with virtio-scsi so it's not a virtio_blk.ko driver bug or a virtio-blk device emulation issue. No, I am not aware of any issues. How can I get the core dump (and the corresponding executable to get the symbols)? 
Alternatively, it should be enough to print the vq->vring.caches->avail.mrs from the debugger. Also, one possibility is to add in vring_avail_idx an assertion like assert(vq->shadow_avail_idx == virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); and try to catch the error earlier. Paolo A QEMU core dump is available for debugging. Here is the patch that produced this debug output: diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- core dump! */ virtio_error(vdev, "Virtqueue size exceeded"); goto done; } Stefan used_idx 58497 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB). I switched the guest now to virtio-scsi to get the information and dump with this setup as well. How should we proceed? Thanks.
On Wed, Feb 20, 2019 at 06:56:04PM +0000, Fernando Casas Schössow wrote:
> Regarding the dumps I have three of them including guest memory, 2 for virtio-scsi, 1 for virtio-blk, in case a comparison may help to confirm which is the problem. I can upload them to a server you indicate, or I can also put them on a server so you can download them as you see fit. Each dump, compressed, is around 500MB.
Hi Fernando,
It would be great if you could make a compressed coredump and the
corresponding QEMU executable (hopefully with debug symbols) available
on a server so Paolo and/or I can download them.
If you're wondering about the debug symbols, since you built QEMU from
source the binary in x86_64-softmmu/qemu-system-x86_64 should have the
debug symbols.
"gdb path/to/qemu-system-x86_64" will print "Reading symbols from
qemu-system-x86_64...done." if symbols are available. Otherwise it will
say "Reading symbols from qemu-system-x86_64...(no debugging symbols
found)...done.".
Thanks,
Stefan
Hi Stefan, I can confirm that the symbols are included in the binary using gdb. I will send you and Paolo an email with the link and credentials (if needed) so you can download everything. Thanks! On jue, feb 21, 2019 at 12:11 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: On Wed, Feb 20, 2019 at 06:56:04PM +0000, Fernando Casas Schössow wrote: Regarding the dumps I have three of them including guest memory, 2 for virtio-scsi, 1 for virtio-blk, in case a comparison may help to confirm which is the problem. I can upload them to a server you indicate, or I can also put them on a server so you can download them as you see fit. Each dump, compressed, is around 500MB. Hi Fernando, It would be great if you could make a compressed coredump and the corresponding QEMU executable (hopefully with debug symbols) available on a server so Paolo and/or I can download them. If you're wondering about the debug symbols, since you built QEMU from source the binary in x86_64-softmmu/qemu-system-x86_64 should have the debug symbols. "gdb path/to/qemu-system-x86_64" will print "Reading symbols from qemu-system-x86_64...done." if symbols are available. Otherwise it will say "Reading symbols from qemu-system-x86_64...(no debugging symbols found)...done.". Thanks, Stefan
On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow <casasfernando@outlook.com> wrote: I have CCed Natanael Copa, qemu package maintainer in Alpine Linux. Fernando: Can you confirm that the bug occurs with an unmodified Alpine Linux qemu binary? Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add host endian unaligned access functions") introduced the unaligned memory access functions in question here. Please see below for details on the bug - basically QEMU code assumes they are atomic, but that is not guaranteed :(. Any ideas for how to fix this? Natanael: It seems likely that the qemu package in Alpine Linux suffers from a compilation issue resulting in a broken QEMU. It may be necessary to leave the compiler optimization flag alone in APKBUILD to work around this problem. Here are the details. QEMU relies on the compiler turning memcpy(&dst, &src, 2) into a load instruction in include/qemu/bswap.h:lduw_he_p() (explanation below): /* Any compiler worth its salt will turn these memcpy into native unaligned operations. Thus we don't need to play games with packed attributes, or inline byte-by-byte stores. */ static inline int lduw_he_p(const void *ptr) { uint16_t r; memcpy(&r, ptr, sizeof(r)); return r; } Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that shows the load instruction: 398166: 0f b7 42 02 movzwl 0x2(%rdx),%eax 39816a: 66 89 43 32 mov %ax,0x32(%rbx) Here is the instruction sequence in the Alpine Linux binary: 455562: ba 02 00 00 00 mov $0x2,%edx 455567: e8 74 24 f3 ff callq 3879e0 <memcpy@plt> 45556c: 0f b7 44 24 42 movzwl 0x42(%rsp),%eax 455571: 66 41 89 47 32 mov %ax,0x32(%r15) It's calling memcpy instead of using a load instruction. 
This is a race condition between one thread loading the 16-bit value and another thread storing it (in this case a guest vcpu thread). Sometimes memcpy() may load one old byte and one new byte, resulting in a bogus value. The symptom that Fernando experienced is a "Virtqueue size exceeded" error message from QEMU and then the virtio-blk or virtio-scsi device stops working. This issue potentially affects other device emulation code in QEMU as well, not just virtio devices. For the time being, I suggest tweaking the APKBUILD so the memcpy() is not generated. Hopefully QEMU can make the code more portable in the future so the compiler always does the expected thing, but this may not be easily possible. Stefan
On 22/02/19 15:04, Stefan Hajnoczi wrote: > On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow > <casasfernando@outlook.com> wrote: > > I have CCed Natanael Copa, qemu package maintainer in Alpine Linux. > > Fernando: Can you confirm that the bug occurs with an unmodified > Alpine Linux qemu binary? I can confirm that the memcpy calls are there, at least. Paolo > Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add > host endian unaligned access functions") introduced the unaligned > memory access functions in question here. Please see below for > details on the bug - basically QEMU code assumes they are atomic, but > that is not guaranteed :(. Any ideas for how to fix this? > > Natanael: It seems likely that the qemu package in Alpine Linux > suffers from a compilation issue resulting in a broken QEMU. It may > be necessary to leave the compiler optimization flag alone in APKBUILD > to work around this problem. > > Here are the details. QEMU relies on the compiler turning > memcpy(&dst, &src, 2) turning into a load instruction in > include/qemu/bswap.h:lduw_he_p() (explanation below): > > /* Any compiler worth its salt will turn these memcpy into native unaligned > operations. Thus we don't need to play games with packed attributes, or > inline byte-by-byte stores. */ > > static inline int lduw_he_p(const void *ptr) > { > uint16_t r; > memcpy(&r, ptr, sizeof(r)); > return r; > } > > Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that > shows the load instruction: > > 398166: 0f b7 42 02 movzwl 0x2(%rdx),%eax > 39816a: 66 89 43 32 mov %ax,0x32(%rbx) > > Here is the instruction sequence in the Alpine Linux binary: > > 455562: ba 02 00 00 00 mov $0x2,%edx > 455567: e8 74 24 f3 ff callq 3879e0 <memcpy@plt> > 45556c: 0f b7 44 24 42 movzwl 0x42(%rsp),%eax > 455571: 66 41 89 47 32 mov %ax,0x32(%r15) > > It's calling memcpy instead of using a load instruction. 
> > Fernando found that QEMU's virtqueue_pop() function sees bogus values > when loading a 16-bit guest RAM location. Paolo figured out that the > bogus value can be produced by memcpy() when another thread is > updating the 16-bit memory location simultaneously. This is a race > condition between one thread loading the 16-bit value and another > thread storing it (in this case a guest vcpu thread). Sometimes > memcpy() may load one old byte and one new byte, resulting in a bogus > value. > > The symptom that Fernando experienced is a "Virtqueue size exceeded" > error message from QEMU and then the virtio-blk or virtio-scsi device > stops working. This issue potentially affects other device emulation > code in QEMU as well, not just virtio devices. > > For the time being, I suggest tweaking the APKBUILD so the memcpy() is > not generated. Hopefully QEMU can make the code more portable in the > future so the compiler always does the expected thing, but this may > not be easily possible. > > Stefan >
Hi all, Indeed I can confirm this issue occurs with the stock, unmodified QEMU binary provided with Alpine since at least Alpine 3.6 up to 3.9. Is there any compiler flag I can tweak, add or remove to rebuild qemu and try to repro to confirm a possible workaround? Thanks. Fernando Sent from my iPhone > On 22 Feb 2019, at 15:04, Stefan Hajnoczi <stefanha@gmail.com> wrote: > > On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow > <casasfernando@outlook.com> wrote: > > I have CCed Natanael Copa, qemu package maintainer in Alpine Linux. > > Fernando: Can you confirm that the bug occurs with an unmodified > Alpine Linux qemu binary? > > Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add > host endian unaligned access functions") introduced the unaligned > memory access functions in question here. Please see below for > details on the bug - basically QEMU code assumes they are atomic, but > that is not guaranteed :(. Any ideas for how to fix this? > > Natanael: It seems likely that the qemu package in Alpine Linux > suffers from a compilation issue resulting in a broken QEMU. It may > be necessary to leave the compiler optimization flag alone in APKBUILD > to work around this problem. > > Here are the details. QEMU relies on the compiler turning > memcpy(&dst, &src, 2) turning into a load instruction in > include/qemu/bswap.h:lduw_he_p() (explanation below): > > /* Any compiler worth its salt will turn these memcpy into native unaligned > operations. Thus we don't need to play games with packed attributes, or > inline byte-by-byte stores. 
*/ > > static inline int lduw_he_p(const void *ptr) > { > uint16_t r; > memcpy(&r, ptr, sizeof(r)); > return r; > } > > Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that > shows the load instruction: > > 398166: 0f b7 42 02 movzwl 0x2(%rdx),%eax > 39816a: 66 89 43 32 mov %ax,0x32(%rbx) > > Here is the instruction sequence in the Alpine Linux binary: > > 455562: ba 02 00 00 00 mov $0x2,%edx > 455567: e8 74 24 f3 ff callq 3879e0 <memcpy@plt> > 45556c: 0f b7 44 24 42 movzwl 0x42(%rsp),%eax > 455571: 66 41 89 47 32 mov %ax,0x32(%r15) > > It's calling memcpy instead of using a load instruction. > > Fernando found that QEMU's virtqueue_pop() function sees bogus values > when loading a 16-bit guest RAM location. Paolo figured out that the > bogus value can be produced by memcpy() when another thread is > updating the 16-bit memory location simultaneously. This is a race > condition between one thread loading the 16-bit value and another > thread storing it (in this case a guest vcpu thread). Sometimes > memcpy() may load one old byte and one new byte, resulting in a bogus > value. > > The symptom that Fernando experienced is a "Virtqueue size exceeded" > error message from QEMU and then the virtio-blk or virtio-scsi device > stops working. This issue potentially affects other device emulation > code in QEMU as well, not just virtio devices. > > For the time being, I suggest tweaking the APKBUILD so the memcpy() is > not generated. Hopefully QEMU can make the code more portable in the > future so the compiler always does the expected thing, but this may > not be easily possible. > > Stefan
On 22/02/19 15:43, Fernando Casas Schössow wrote:
> Hi all,
>
> Indeed I can confirm this issue occurs with the stock, unmodified
> QEMU binary provided with Alpine since at least Alpine 3.6 up to
> 3.9.
>
> Is there any compiler flag I can tweak, add or remove to rebuild qemu
> and try to repro to confirm a possible workaround?

Nope, not yet. However, I can try some experiments if you can provide
some information on how to rebuild an apk.

Paolo
You can find the information here:
https://wiki.alpinelinux.org/wiki/Creating_an_Alpine_package

For your convenience, I can set up an Alpine VM and give you SSH access
with the build environment and the aports tree already set up.

The summary steps, in case you want to try on your own container/VM,
would be:

1. Set up the build environment as described in the wiki.
2. Clone the aports tree: git clone https://github.com/alpinelinux/aports
3. Go into the qemu package directory in the aports tree: cd aports/main/qemu
4. Run: abuild clean checksum unpack prepare deps build

This would get you the binary built. If you need to add a patch, create
the patch file, add it to the APKBUILD file and then run "abuild checksum"
to compute its checksum and add that to the APKBUILD file.

abuild quick ref:

abuild clean:    Remove temp build and install dirs
abuild unpack:   Unpack sources to $srcdir
abuild checksum: Generate checksums to be included in APKBUILD
abuild prepare:  Apply patches
abuild deps:     Install packages listed in makedepends and depends
abuild build:    Compile and install package into $pkgdir

Once finished, to clean up everything run: abuild undeps clean

Thanks.

Fernando

On vie, feb 22, 2019 at 3:55 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 22/02/19 15:43, Fernando Casas Schössow wrote:
>> Hi all,
>>
>> Indeed I can confirm this issue occurs with the stock, unmodified
>> QEMU binary provided with Alpine since at least Alpine 3.6 up to
>> 3.9.
>>
>> Is there any compiler flag I can tweak, add or remove to rebuild qemu
>> and try to repro to confirm a possible workaround?
>
> Nope, not yet. However, I can try some experiments if you can provide
> some information on how to rebuild an apk.
>
> Paolo
* Stefan Hajnoczi (stefanha@gmail.com) wrote:
> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> <casasfernando@outlook.com> wrote:
>
> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
>
> Fernando: Can you confirm that the bug occurs with an unmodified
> Alpine Linux qemu binary?
>
> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
> host endian unaligned access functions") introduced the unaligned
> memory access functions in question here. Please see below for
> details on the bug - basically QEMU code assumes they are atomic, but
> that is not guaranteed :(. Any ideas for how to fix this?

Why does vring_avail_idx use virtio_ld*u*w_phys_cached? (Similar for
vring_avail_ring.) There's no generically safe way to do unaligned
atomic loads - don't we know in virtio that these are naturally
aligned?

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 22/02/19 17:37, Dr. David Alan Gilbert wrote: > Why does vring_avail_idx use virtio_ld*u*w_phys_cached? > (similar for vring_avail_ring). > There's no generically safe way to do unaligned atomic loads - don't we know > in virtio that these are naturally aligned? u is for unsigned. :) We know that these are aligned and so they will be atomic. Paolo
* Paolo Bonzini (pbonzini@redhat.com) wrote: > On 22/02/19 17:37, Dr. David Alan Gilbert wrote: > > Why does vring_avail_idx use virtio_ld*u*w_phys_cached? > > (similar for vring_avail_ring). > > There's no generically safe way to do unaligned atomic loads - don't we know > > in virtio that these are naturally aligned? > > u is for unsigned. :) We know that these are aligned and so they will > be atomic. Ah OK! Dave > Paolo -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, 22 Feb 2019 14:04:20 +0000
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> <casasfernando@outlook.com> wrote:
>
> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.

Hi!

...

> Here are the details. QEMU relies on the compiler turning
> memcpy(&dst, &src, 2) into a load instruction in
> include/qemu/bswap.h:lduw_he_p():
>
> static inline int lduw_he_p(const void *ptr)
> {
>     uint16_t r;
>     memcpy(&r, ptr, sizeof(r));
>     return r;
> }
>
> Here is the instruction sequence in the Alpine Linux binary:
>
>   455562: ba 02 00 00 00   mov    $0x2,%edx
>   455567: e8 74 24 f3 ff   callq  3879e0 <memcpy@plt>
>   45556c: 0f b7 44 24 42   movzwl 0x42(%rsp),%eax
>   455571: 66 41 89 47 32   mov    %ax,0x32(%r15)
>
> It's calling memcpy instead of using a load instruction.

My first reaction to this is: if the intention is to not actually call
the memcpy function, then maybe memcpy should not be used in the C code
in the first place?

> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
> not generated. Hopefully QEMU can make the code more portable in the
> future so the compiler always does the expected thing, but this may
> not be easily possible.

I suspect this happens because the Alpine toolchain enables
_FORTIFY_SOURCE=2 by default, and because of the way this is
implemented via fortify-headers:

http://git.2f30.org/fortify-headers/file/include/string.h.html#l39

Try building with -U_FORTIFY_SOURCE.

> Stefan
On Fri, 22 Feb 2019 14:04:20 +0000
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> <casasfernando@outlook.com> wrote:
>
> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
>
> Fernando: Can you confirm that the bug occurs with an unmodified
> Alpine Linux qemu binary?

I wonder exactly which binary it is. What architecture (x86 or x86_64)
and which binary? Is it a dynamically or statically linked
qemu-system-x86_64?

> Here is the instruction sequence in the Alpine Linux binary:
>
>   455562: ba 02 00 00 00   mov    $0x2,%edx
>   455567: e8 74 24 f3 ff   callq  3879e0 <memcpy@plt>
>   45556c: 0f b7 44 24 42   movzwl 0x42(%rsp),%eax
>   455571: 66 41 89 47 32   mov    %ax,0x32(%r15)
>
> It's calling memcpy instead of using a load instruction.

I tried to find this section. How do you get the assembly listing of
the relevant section? I tried "disas virtqueue_pop" from
`gdb /usr/bin/qemu-system-x86_64` on the binary in Alpine edge. I could
find 2 memcpy calls but none of them looks like a 16-bit operation:

   0x00000000004551f1 <+353>:   mov    0x10(%rsp),%rdi
   0x00000000004551f6 <+358>:   mov    $0x10,%edx
   0x00000000004551fb <+363>:   callq  0x3879e0 <memcpy@plt>
   0x0000000000455200 <+368>:   movzwl 0x5c(%rsp),%eax
   0x0000000000455205 <+373>:   test   $0x4,%al
   0x0000000000455207 <+375>:   je     0x4552aa <virtqueue_pop+538>
   ....
   0x0000000000455291 <+513>:   mov    0x10(%rsp),%rdi
   0x0000000000455296 <+518>:   mov    $0x10,%edx
   0x000000000045529b <+523>:   callq  0x3879e0 <memcpy@plt>
   0x00000000004552a0 <+528>:   mov    %rbp,0x20(%rsp)
   0x00000000004552a5 <+533>:   movzwl 0x5c(%rsp),%eax
   0x00000000004552aa <+538>:   lea    0x20e0(%rsp),%rdi
   0x00000000004552b2 <+546>:   xor    %r11d,%r11d
   0x00000000004552b5 <+549>:   mov    %r15,0x38(%rsp)

> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
> not generated. Hopefully QEMU can make the code more portable in the
> future so the compiler always does the expected thing, but this may
> not be easily possible.

I was thinking of something in the lines of:

typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
static inline int lduw_he_p(const void *ptr)
{
    volatile_uint16_t r = *(volatile_uint16_t*)ptr;
    return r;
}

I can test different CFLAGS with and without _FORTIFY_SOURCE and with
different variants of memcpy (like __builtin_memcpy etc) but I need to
find a way to get the correct assembly output so I know if/when I have
found something that works.

Thanks!

-nc

> Stefan
On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote: > I was thinking of something in the lines of: > > typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t; > static inline int lduw_he_p(const void *ptr) > { > volatile_uint16_t r = *(volatile_uint16_t*)ptr; > return r; > } This won't correctly handle accesses with unaligned pointers, I'm afraid. We rely on these functions correctly working with pointers that are potentially unaligned. thanks -- PMM
Hi Natanael,

The package information is the following:
https://pkgs.alpinelinux.org/package/v3.9/main/x86_64/qemu-system-x86_64

The binary is for the x86_64 architecture and dynamically linked:

fernando@vmsvr01:~$ file /usr/bin/qemu-system-x86_64
/usr/bin/qemu-system-x86_64: ELF 64-bit LSB shared object, x86-64,
version 1 (SYSV), dynamically linked, interpreter
/lib/ld-musl-x86_64.so.1, stripped

Thanks.

On sáb, feb 23, 2019 at 4:55 PM, Natanael Copa <ncopa@alpinelinux.org> wrote:
> On Fri, 22 Feb 2019 14:04:20 +0000
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
>> <casasfernando@outlook.com> wrote:
>>
>> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
>>
>> Fernando: Can you confirm that the bug occurs with an unmodified
>> Alpine Linux qemu binary?
>
> I wonder exactly which binary it is. What architecture (x86 or x86_64)
> and which binary? Is it a dynamically or statically linked
> qemu-system-x86_64?
On Fri, 22 Feb 2019 at 14:07, Stefan Hajnoczi <stefanha@gmail.com> wrote: > Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add > host endian unaligned access functions") introduced the unaligned > memory access functions in question here. Please see below for > details on the bug - basically QEMU code assumes they are atomic, but > that is not guaranteed :(. Any ideas for how to fix this? I suspect we want a separate family of access functions for "I guarantee this will be an aligned access and I need the atomicity". (The other place where we've talked about needing the atomicity is in emulation of page-table-walk, where you need the page table loads to be atomic w.r.t. other CPU threads, especially in the case where you need to emulate a hardware update of a dirty/access bit in the page table entry.) Mostly this hasn't bitten us before because any sensible compiler will turn the memcpy into a straight load on most common hosts, which will be atomic (but accidentally rather than on purpose). thanks -- PMM
On Sat, 23 Feb 2019 16:18:15 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote:
> > I was thinking of something in the lines of:
> >
> > typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
> > static inline int lduw_he_p(const void *ptr)
> > {
> >     volatile_uint16_t r = *(volatile_uint16_t*)ptr;
> >     return r;
> > }
>
> This won't correctly handle accesses with unaligned pointers,
> I'm afraid. We rely on these functions correctly working
> with pointers that are potentially unaligned.

Well, the current code does not even handle accesses with aligned
pointers atomically, depending on the _FORTIFY_SOURCE implementation.

My thinking here is that we depend on the assumption that the compiler
will remove the memcpy call, so maybe we can find another way to
generate the same assembly, while still depending on compiler
optimization.

I did some tests and compared the assembly output. The compiler will
generate the same assembly if volatile is not used. The __may_alias__
attribute does not seem to make any difference. So the following
function generates exactly the same assembly:

static inline int lduw_he_p(const void *ptr)
{
    uint16_t r = *(uint16_t*)ptr;
    return r;
}

I created a script to compare on different Linux distros and
architectures. My conclusion is:

- The Alpine fortify implementation is non-optimal, but technically
  correct.
- `uint16_t r = *(uint16_t*)ptr;` generates the same assembly code
  regardless of the fortify implementation, and is clearer about the
  intention than the use of memcpy.
test.sh:

---[CUT HERE]---8<--------------------------------------------------
#!/bin/sh -x

cat > testcase.c <<EOF
#include <stdint.h>

#ifdef WITH_MEMCPY
#include <string.h>
static inline int lduw_he_p(const void *ptr)
{
    uint16_t r;
    memcpy(&r, ptr, sizeof(r));
    return r;
}
#else
static inline int lduw_he_p(const void *ptr)
{
    uint16_t r = *(uint16_t*)ptr;
    return r;
}
#endif

int main(int argc, char *argv[])
{
    void *p = argv;
    int i;
    p++; // make sure we are unaligned
    i = lduw_he_p(p);
    return i;
}
EOF

: ${CC:=gcc}
$CC --version
uname -m

for fortify in "-D_FORTIFY_SOURCE=2" "-U_FORTIFY_SOURCE"; do
    for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY"; do
        $CC $cflag $fortify $@ -o testcase$cflag testcase.c
        gdb --batch -ex "disas ${func:-main}" ./testcase$cflag
    done
done
---[CUT HERE]---8<--------------------------------------------------

Output on Alpine:

$ sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (Alpine 8.2.0) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ uname -m x86_64 + gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x0000000000001070 <+0>: sub $0x18,%rsp 0x0000000000001074 <+4>: mov %fs:0x28,%rax 0x000000000000107d <+13>: mov %rax,0x8(%rsp) 0x0000000000001082 <+18>: xor %eax,%eax 0x0000000000001084 <+20>: lea 0x6(%rsp),%rdi 0x0000000000001089 <+25>: lea 0x1(%rsi),%rax 0x000000000000108d <+29>: lea 0x2(%rdi),%rdx 0x0000000000001091 <+33>: cmp %rdx,%rax 0x0000000000001094 <+36>: jae 0x109b <main+43> 0x0000000000001096 <+38>: cmp %rdi,%rax 0x0000000000001099 <+41>: ja 0x10d0 <main+96> 0x000000000000109b <+43>: cmp %rdi,%rax 0x000000000000109e <+46>: jb 0x10c7 <main+87> 0x00000000000010a0 <+48>: mov $0x2,%edx 0x00000000000010a5 <+53>: mov %rax,%rsi 0x00000000000010a8 <+56>: callq 0x1020 <memcpy@plt> 0x00000000000010ad <+61>: movzwl 0x6(%rsp),%eax 0x00000000000010b2 <+66>: mov 0x8(%rsp),%rcx 0x00000000000010b7 <+71>: xor %fs:0x28,%rcx 0x00000000000010c0 <+80>: jne 0x10d2 <main+98> 0x00000000000010c2 <+82>: add $0x18,%rsp 0x00000000000010c6 <+86>: retq 0x00000000000010c7 <+87>: add $0x3,%rsi 0x00000000000010cb <+91>: cmp %rsi,%rdi 0x00000000000010ce <+94>: jae 0x10a0 <main+48> 0x00000000000010d0 <+96>: ud2 0x00000000000010d2 <+98>: callq 0x1030 <__stack_chk_fail@plt> End of assembler dump. + gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x0000000000001050 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000001054 <+4>: retq End of assembler dump. + gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x0000000000001050 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000001054 <+4>: retq End of assembler dump. 
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x0000000000001050 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000001054 <+4>: retq End of assembler dump. Output on Debian: root@19c3f13c6023:~# sh -x test.sh -O2 + cat + : gcc + gcc --version gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 Copyright (C) 2016 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + uname -m x86_64 + gcc -DWITH_MEMCPY -D_FORTIFY_SOURCE=2 -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex disas main ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x0000000000000530 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000000534 <+4>: retq End of assembler dump. + gcc -DWITHOUT_MEMCPY -D_FORTIFY_SOURCE=2 -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex disas main ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x0000000000000530 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000000534 <+4>: retq End of assembler dump. + gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex disas main ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x0000000000000530 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000000534 <+4>: retq End of assembler dump. + gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex disas main ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x0000000000000530 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000000534 <+4>: retq End of assembler dump. root@19c3f13c6023:~# Output on Fedora: [root@ac68847f5bb1 ~]# sh -x test.sh -O2 + cat + : gcc + gcc --version gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6) Copyright (C) 2018 Free Software Foundation, Inc. 
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + uname -m x86_64 + for fortify in "-D_FORTIFY_SOURCE=2" "-U_FORTIFY_SOURCE" + for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY" + gcc -DWITH_MEMCPY -D_FORTIFY_SOURCE=2 -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x0000000000401020 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000401024 <+4>: retq End of assembler dump. + for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY" + gcc -DWITHOUT_MEMCPY -D_FORTIFY_SOURCE=2 -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x0000000000401020 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000401024 <+4>: retq End of assembler dump. + for fortify in "-D_FORTIFY_SOURCE=2" "-U_FORTIFY_SOURCE" + for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY" + gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x0000000000401020 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000401024 <+4>: retq End of assembler dump. + for cflag in "-DWITH_MEMCPY" "-DWITHOUT_MEMCPY" + gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x0000000000401020 <+0>: movzwl 0x1(%rsi),%eax 0x0000000000401024 <+4>: retq End of assembler dump. 
Output on Alpine with clang: $ CC=clang sh -x test.sh -O2 + cat + : clang + clang --version Alpine clang version 5.0.2 (tags/RELEASE_502/final) (based on LLVM 5.0.2) Target: x86_64-alpine-linux-musl Thread model: posix InstalledDir: /usr/bin + uname -m x86_64 + clang -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x00000000000011a0 <+0>: movzwl 0x1(%rsi),%eax 0x00000000000011a4 <+4>: retq End of assembler dump. + clang -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x00000000000011a0 <+0>: movzwl 0x1(%rsi),%eax 0x00000000000011a4 <+4>: retq End of assembler dump. + clang -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x00000000000011a0 <+0>: movzwl 0x1(%rsi),%eax 0x00000000000011a4 <+4>: retq End of assembler dump. + clang -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x00000000000011a0 <+0>: movzwl 0x1(%rsi),%eax 0x00000000000011a4 <+4>: retq End of assembler dump. Output on alpine armv7: $ sh -x test.sh -O2 + cat + : gcc + gcc --version gcc (Alpine 8.2.0) 8.2.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
+ uname -m armv8l + gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x000003f8 <+0>: push {r4, r5, lr} 0x000003fa <+2>: sub sp, #12 0x000003fc <+4>: adds r3, r1, #1 0x000003fe <+6>: add r4, sp, #4 0x00000400 <+8>: cmp r3, r4 0x00000402 <+10>: add.w r0, sp, #2 0x00000406 <+14>: ite cs 0x00000408 <+16>: movcs r2, #0 0x0000040a <+18>: movcc r2, #1 0x0000040c <+20>: cmp r3, r0 0x0000040e <+22>: it ls 0x00000410 <+24>: movls r2, #0 0x00000412 <+26>: ldr r4, [pc, #52] ; (0x448 <main+80>) 0x00000414 <+28>: ldr r5, [pc, #52] ; (0x44c <main+84>) 0x00000416 <+30>: add r4, pc 0x00000418 <+32>: ldr r4, [r4, r5] 0x0000041a <+34>: ldr r5, [r4, #0] 0x0000041c <+36>: str r5, [sp, #4] 0x0000041e <+38>: cbnz r2, 0x442 <main+74> 0x00000420 <+40>: cmp r3, r0 0x00000422 <+42>: bcc.n 0x43c <main+68> 0x00000424 <+44>: mov r1, r3 0x00000426 <+46>: movs r2, #2 0x00000428 <+48>: blx 0x3b0 <memcpy@plt> 0x0000042c <+52>: ldr r2, [sp, #4] 0x0000042e <+54>: ldr r3, [r4, #0] 0x00000430 <+56>: ldrh.w r0, [sp, #2] 0x00000434 <+60>: cmp r2, r3 0x00000436 <+62>: bne.n 0x444 <main+76> 0x00000438 <+64>: add sp, #12 0x0000043a <+66>: pop {r4, r5, pc} 0x0000043c <+68>: adds r1, #3 0x0000043e <+70>: cmp r0, r1 0x00000440 <+72>: bcs.n 0x424 <main+44> 0x00000442 <+74>: udf #255 ; 0xff 0x00000444 <+76>: blx 0x3c8 <__stack_chk_fail@plt> 0x00000448 <+80>: muleq r1, lr, r11 0x0000044c <+84>: andeq r0, r0, r4, lsr #32 End of assembler dump. + gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x0000036c <+0>: ldrh.w r0, [r1, #1] 0x00000370 <+4>: bx lr End of assembler dump. 
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x0000036c <+0>: ldrh.w r0, [r1, #1] 0x00000370 <+4>: bx lr End of assembler dump. + gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY Dump of assembler code for function main: 0x0000036c <+0>: ldrh.w r0, [r1, #1] 0x00000370 <+4>: bx lr End of assembler dump. Output on Alpine aarch64: $ sh -x test.sh -O2 + cat + : gcc + gcc --version gcc (Alpine 8.2.0) 8.2.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + uname -m aarch64 + gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c + gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY Dump of assembler code for function main: 0x00000000000006b0 <+0>: stp x29, x30, [sp, #-48]! 
   0x00000000000006b4 <+4>:     add     x2, x1, #0x1
   0x00000000000006b8 <+8>:     mov     x29, sp
   0x00000000000006bc <+12>:    str     x19, [sp, #16]
   0x00000000000006c0 <+16>:    adrp    x19, 0x10000
   0x00000000000006c4 <+20>:    add     x0, sp, #0x26
   0x00000000000006c8 <+24>:    ldr     x3, [x19, #4024]
   0x00000000000006cc <+28>:    add     x4, x0, #0x2
   0x00000000000006d0 <+32>:    cmp     x4, x2
   0x00000000000006d4 <+36>:    ldr     x4, [x3]
   0x00000000000006d8 <+40>:    str     x4, [sp, #40]
   0x00000000000006dc <+44>:    mov     x4, #0x0                   // #0
   0x00000000000006e0 <+48>:    ccmp    x0, x2, #0x2, hi  // hi = pmore
   0x00000000000006e4 <+52>:    b.cc    0x72c <main+124>  // b.lo, b.ul, b.last
   0x00000000000006e8 <+56>:    cmp     x2, x0
   0x00000000000006ec <+60>:    b.cc    0x720 <main+112>  // b.lo, b.ul, b.last
   0x00000000000006f0 <+64>:    mov     x1, x2
   0x00000000000006f4 <+68>:    mov     x2, #0x2                   // #2
   0x00000000000006f8 <+72>:    bl      0x650 <memcpy@plt>
   0x00000000000006fc <+76>:    ldr     x19, [x19, #4024]
   0x0000000000000700 <+80>:    ldrh    w0, [sp, #38]
   0x0000000000000704 <+84>:    ldr     x2, [sp, #40]
   0x0000000000000708 <+88>:    ldr     x1, [x19]
   0x000000000000070c <+92>:    eor     x1, x2, x1
   0x0000000000000710 <+96>:    cbnz    x1, 0x730 <main+128>
   0x0000000000000714 <+100>:   ldr     x19, [sp, #16]
   0x0000000000000718 <+104>:   ldp     x29, x30, [sp], #48
   0x000000000000071c <+108>:   ret
   0x0000000000000720 <+112>:   add     x1, x1, #0x3
   0x0000000000000724 <+116>:   cmp     x0, x1
   0x0000000000000728 <+120>:   b.cs    0x6f0 <main+64>  // b.hs, b.nlast
   0x000000000000072c <+124>:   brk     #0x3e8
   0x0000000000000730 <+128>:   bl      0x670 <__stack_chk_fail@plt>
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005e0 <+0>:     ldurh   w0, [x1, #1]
   0x00000000000005e4 <+4>:     ret
End of assembler dump.
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000005e0 <+0>:     ldurh   w0, [x1, #1]
   0x00000000000005e4 <+4>:     ret
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005e0 <+0>:     ldurh   w0, [x1, #1]
   0x00000000000005e4 <+4>:     ret
End of assembler dump.

Output on Alpine ppc64le:

$ sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (Alpine 8.2.0) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ uname -m
ppc64le
+ gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000005c0 <+0>:     lhz     r3,1(r4)
   0x00000000000005c4 <+4>:     blr
   0x00000000000005c8 <+8>:     .long 0x0
   0x00000000000005cc <+12>:    .long 0x0
   0x00000000000005d0 <+16>:    .long 0x0
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005c0 <+0>:     lhz     r3,1(r4)
   0x00000000000005c4 <+4>:     blr
   0x00000000000005c8 <+8>:     .long 0x0
   0x00000000000005cc <+12>:    .long 0x0
   0x00000000000005d0 <+16>:    .long 0x0
End of assembler dump.
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000005c0 <+0>:     lhz     r3,1(r4)
   0x00000000000005c4 <+4>:     blr
   0x00000000000005c8 <+8>:     .long 0x0
   0x00000000000005cc <+12>:    .long 0x0
   0x00000000000005d0 <+16>:    .long 0x0
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005c0 <+0>:     lhz     r3,1(r4)
   0x00000000000005c4 <+4>:     blr
   0x00000000000005c8 <+8>:     .long 0x0
   0x00000000000005cc <+12>:    .long 0x0
   0x00000000000005d0 <+16>:    .long 0x0
End of assembler dump.

Output on Alpine s390x:

$ sh -x test.sh -O2
+ cat
+ : gcc
+ gcc --version
gcc (Alpine 8.2.0) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ uname -m
s390x
+ gcc -DWITH_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000006b0 <+0>:     stmg    %r14,%r15,112(%r15)
   0x00000000000006b6 <+6>:     ear     %r1,%a0
   0x00000000000006ba <+10>:    lay     %r15,-176(%r15)
   0x00000000000006c0 <+16>:    sllg    %r1,%r1,32
   0x00000000000006c6 <+22>:    ear     %r1,%a1
   0x00000000000006ca <+26>:    la      %r2,166(%r15)
   0x00000000000006ce <+30>:    mvc     168(8,%r15),40(%r1)
   0x00000000000006d4 <+36>:    la      %r1,1(%r3)
   0x00000000000006d8 <+40>:    clgr    %r1,%r2
   0x00000000000006dc <+44>:    jle     0x6e8 <main+56>
   0x00000000000006e0 <+48>:    la      %r4,168(%r15)
   0x00000000000006e4 <+52>:    clgrtl  %r1,%r4
   0x00000000000006e8 <+56>:    clgr    %r1,%r2
   0x00000000000006ec <+60>:    jl      0x724 <main+116>
   0x00000000000006f0 <+64>:    lgr     %r3,%r1
   0x00000000000006f4 <+68>:    lghi    %r4,2
   0x00000000000006f8 <+72>:    brasl   %r14,0x5f0 <memcpy@plt>
   0x00000000000006fe <+78>:    ear     %r1,%a0
   0x0000000000000702 <+82>:    llgh    %r2,166(%r15)
   0x0000000000000708 <+88>:    sllg    %r1,%r1,32
   0x000000000000070e <+94>:    ear     %r1,%a1
   0x0000000000000712 <+98>:    clc     168(8,%r15),40(%r1)
   0x0000000000000718 <+104>:   jne     0x730 <main+128>
   0x000000000000071c <+108>:   lmg     %r14,%r15,288(%r15)
   0x0000000000000722 <+114>:   br      %r14
   0x0000000000000724 <+116>:   la      %r3,3(%r3)
   0x0000000000000728 <+120>:   clgrtl  %r2,%r3
   0x000000000000072c <+124>:   j       0x6f0 <main+64>
   0x0000000000000730 <+128>:   brasl   %r14,0x630 <__stack_chk_fail@plt>
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY '-D_FORTIFY_SOURCE=2' -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005f8 <+0>:     llgh    %r2,1(%r3)
   0x00000000000005fe <+6>:     br      %r14
End of assembler dump.
+ gcc -DWITH_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITH_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITH_MEMCPY
Dump of assembler code for function main:
   0x00000000000005f8 <+0>:     llgh    %r2,1(%r3)
   0x00000000000005fe <+6>:     br      %r14
End of assembler dump.
+ gcc -DWITHOUT_MEMCPY -U_FORTIFY_SOURCE -O2 -o testcase-DWITHOUT_MEMCPY testcase.c
+ gdb --batch -ex 'disas main' ./testcase-DWITHOUT_MEMCPY
Dump of assembler code for function main:
   0x00000000000005f8 <+0>:     llgh    %r2,1(%r3)
   0x00000000000005fe <+6>:     br      %r14
End of assembler dump.
On Sat, Feb 23, 2019 at 3:55 PM Natanael Copa <ncopa@alpinelinux.org> wrote:
> On Fri, 22 Feb 2019 14:04:20 +0000
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> > <casasfernando@outlook.com> wrote:
> I tried to find this section. How do you get the assembly listing of
> the relevant section? I tried to do "disas virtio_pop" from
> `gdb /usr/bin/qemu-system-x86_64` from the binary in alpine edge. I
> could find 2 memcpy but none of them look like a 16 bit operation after:
>
> 0x00000000004551f1 <+353>:   mov    0x10(%rsp),%rdi
> 0x00000000004551f6 <+358>:   mov    $0x10,%edx
> 0x00000000004551fb <+363>:   callq  0x3879e0 <memcpy@plt>
> 0x0000000000455200 <+368>:   movzwl 0x5c(%rsp),%eax
> 0x0000000000455205 <+373>:   test   $0x4,%al
> 0x0000000000455207 <+375>:   je     0x4552aa <virtqueue_pop+538>
>
> ....
>
> 0x0000000000455291 <+513>:   mov    0x10(%rsp),%rdi
> 0x0000000000455296 <+518>:   mov    $0x10,%edx
> 0x000000000045529b <+523>:   callq  0x3879e0 <memcpy@plt>
> 0x00000000004552a0 <+528>:   mov    %rbp,0x20(%rsp)
> 0x00000000004552a5 <+533>:   movzwl 0x5c(%rsp),%eax
> 0x00000000004552aa <+538>:   lea    0x20e0(%rsp),%rdi
> 0x00000000004552b2 <+546>:   xor    %r11d,%r11d
> 0x00000000004552b5 <+549>:   mov    %r15,0x38(%rsp)

Here is the beginning of the function:

0000000000455090 <virtqueue_pop@@Base>:
  455090:       41 57                   push   %r15
  455092:       49 89 ff                mov    %rdi,%r15
  455095:       b9 0e 00 00 00          mov    $0xe,%ecx
  45509a:       41 56                   push   %r14
  45509c:       41 55                   push   %r13
  45509e:       41 54                   push   %r12
  4550a0:       55                      push   %rbp
  4550a1:       53                      push   %rbx
  4550a2:       48 81 ec f8 60 00 00    sub    $0x60f8,%rsp
  4550a9:       4d 8b 67 58             mov    0x58(%r15),%r12
  4550ad:       48 89 74 24 08          mov    %rsi,0x8(%rsp)
  4550b2:       48 8d 6c 24 70          lea    0x70(%rsp),%rbp
  4550b7:       48 89 ef                mov    %rbp,%rdi
  4550ba:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
  4550c1:       00 00
  4550c3:       48 89 84 24 e8 60 00    mov    %rax,0x60e8(%rsp)
  4550ca:       00
  4550cb:       31 c0                   xor    %eax,%eax
  4550cd:       41 80 bc 24 6b 01 00    cmpb   $0x0,0x16b(%r12)
  4550d4:       00 00
  4550d6:       f3 48 ab                rep stos %rax,%es:(%rdi)
  4550d9:       0f 85 c1 04 00 00       jne    4555a0 <virtqueue_pop@@Base+0x510>
  4550df:       48 8b 1d f2 4e 99 00    mov    0x994ef2(%rip),%rbx        # de9fd8 <rcu_reader@@Base+0xde9f20>
  4550e6:       64 8b 43 0c             mov    %fs:0xc(%rbx),%eax
  4550ea:       8d 50 01                lea    0x1(%rax),%edx
  4550ed:       64 89 53 0c             mov    %edx,%fs:0xc(%rbx)
  4550f1:       85 c0                   test   %eax,%eax
  4550f3:       0f 84 d7 03 00 00       je     4554d0 <virtqueue_pop@@Base+0x440>
  4550f9:       49 83 7f 18 00          cmpq   $0x0,0x18(%r15)
  4550fe:       0f 84 7d 03 00 00       je     455481 <virtqueue_pop@@Base+0x3f1>
  455104:       41 0f b7 47 30          movzwl 0x30(%r15),%eax
  455109:       66 41 39 47 32          cmp    %ax,0x32(%r15)
  45510e:       0f 84 0c 04 00 00       je     455520 <virtqueue_pop@@Base+0x490>

This last branch leads to:

  455520:       4c 89 ff                mov    %r15,%rdi
  455523:       e8 78 b4 ff ff          callq  4509a0 <virtio_queue_host_notifier_read@@Base+0x6d0>
  455528:       48 83 b8 90 00 00 00    cmpq   $0x3,0x90(%rax)
  45552f:       03
  455530:       0f 86 98 03 00 00       jbe    4558ce <virtqueue_pop@@Base+0x83e>
  455536:       48 8b 90 80 00 00 00    mov    0x80(%rax),%rdx
  45553d:       48 85 d2                test   %rdx,%rdx
  455540:       0f 84 c2 02 00 00       je     455808 <virtqueue_pop@@Base+0x778>
  455546:       48 8d 72 02             lea    0x2(%rdx),%rsi
  45554a:       48 8d 7c 24 42          lea    0x42(%rsp),%rdi
  45554f:       48 39 fe                cmp    %rdi,%rsi
  455552:       76 09                   jbe    45555d <virtqueue_pop@@Base+0x4cd>
  455554:       48 8d 47 02             lea    0x2(%rdi),%rax
  455558:       48 39 c6                cmp    %rax,%rsi
  45555b:       72 3c                   jb     455599 <virtqueue_pop@@Base+0x509>
  45555d:       48 39 fe                cmp    %rdi,%rsi
  455560:       72 2e                   jb     455590 <virtqueue_pop@@Base+0x500>
  455562:       ba 02 00 00 00          mov    $0x2,%edx
  455567:       e8 74 24 f3 ff          callq  3879e0 <memcpy@plt>

> I can test different CFLAGS with and without the _FORTIFY_SOURCE and
> with different variants of memcpy (like __builtin_memcpy etc) but I
> need to find a way to get the correct assembly output so I know if/when
> I have found something that works.

Good.  I think tweaking the compilation flags is the quickest workaround
to produce a working QEMU binary for Alpine.  Other distros don't seem
to hit this problem.
In the longer term QEMU will need to fix these memory accessor functions
in the way that Peter described, but doing that properly requires
auditing all the code so each instance that relies on atomicity can be
converted properly (and the other ones will stay non-atomic because they
must support misaligned addresses).

Stefan
On Mon, Feb 25, 2019 at 10:30 AM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Sat, Feb 23, 2019 at 3:55 PM Natanael Copa <ncopa@alpinelinux.org> wrote:
> > On Fri, 22 Feb 2019 14:04:20 +0000
> > Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > > On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> > > <casasfernando@outlook.com> wrote:
> > I tried to find this section. How do you get the assembly listing of
> > the relevant section? I tried to do "disas virtio_pop" from
> > `gdb /usr/bin/qemu-system-x86_64` from the binary in alpine edge. I
> > could find 2 memcpy but none of them look like a 16 bit operation after:
> >
> > 0x00000000004551f1 <+353>:   mov    0x10(%rsp),%rdi
> > 0x00000000004551f6 <+358>:   mov    $0x10,%edx
> > 0x00000000004551fb <+363>:   callq  0x3879e0 <memcpy@plt>
> > 0x0000000000455200 <+368>:   movzwl 0x5c(%rsp),%eax
> > 0x0000000000455205 <+373>:   test   $0x4,%al
> > 0x0000000000455207 <+375>:   je     0x4552aa <virtqueue_pop+538>
> >
> > ....
> >
> > 0x0000000000455291 <+513>:   mov    0x10(%rsp),%rdi
> > 0x0000000000455296 <+518>:   mov    $0x10,%edx
> > 0x000000000045529b <+523>:   callq  0x3879e0 <memcpy@plt>
> > 0x00000000004552a0 <+528>:   mov    %rbp,0x20(%rsp)
> > 0x00000000004552a5 <+533>:   movzwl 0x5c(%rsp),%eax
> > 0x00000000004552aa <+538>:   lea    0x20e0(%rsp),%rdi
> > 0x00000000004552b2 <+546>:   xor    %r11d,%r11d
> > 0x00000000004552b5 <+549>:   mov    %r15,0x38(%rsp)
>
> Here is the beginning of the function:

This was built from the docker alpine image:

REPOSITORY          TAG       IMAGE ID       CREATED       SIZE
docker.io/alpine    latest    caf27325b298   3 weeks ago   5.52 MB

aports git commit: be41538f0061b406a374564a0043a363efcb0293

gcc (Alpine 8.2.0) 8.2.0

Stefan
On Mon, 25 Feb 2019 at 10:24, Natanael Copa <ncopa@alpinelinux.org> wrote:
>
> On Sat, 23 Feb 2019 16:18:15 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
>
> > On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote:
> > > I was thinking of something in the lines of:
> > >
> > > typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
> > > static inline int lduw_he_p(const void *ptr)
> > > {
> > >     volatile_uint16_t r = *(volatile_uint16_t*)ptr;
> > >     return r;
> > > }
> >
> > This won't correctly handle accesses with unaligned pointers,
> > I'm afraid. We rely on these functions correctly working
> > with pointers that are potentially unaligned.
>
> Well, current code does not even handle access with aligned pointers,
> depending on FORTIFY_SOURCE implementation.

It correctly handles aligned and unaligned pointers for the
API guarantees that the function in QEMU provides, which is
to say "definitely works on any kind of aligned or unaligned
pointer, not guaranteed to be atomic". Unfortunately some
code in QEMU is implicitly assuming it is atomic, which this
QEMU function does not guarantee -- it just happens to provide
that most of the time.

> My thinking here is that we depend on assumption that compiler will
> remove the memcpy call, so maybe find other way to generate same
> assembly, while still depend on compiler optimization.
>
> I did some tests and compared the assembly output. The compiler will
> generate same assembly if volatile is not used. The attribute
> __may_alias__ does not seem to make any difference. So the following
> function generates exactly same assembly:
>
> static inline int lduw_he_p(const void *ptr)
> {
>     uint16_t r = *(uint16_t*)ptr;
>     return r;
> }

This still is not guaranteed to work on unaligned pointers.
(For instance it probably won't work on SPARC hosts.)
More generally there is no way to have a single function
that is guaranteed to handle unaligned pointers and also
guaranteed to produce an atomic access on all host architectures
that we support, because not all host architectures allow
you to do both with the same instruction sequence. So
you can't just change this function to make it provide
atomic access without breaking the other guarantee it is
providing.

The long term fix for this is that we need to separate out
our APIs so we can have a family of functions that guarantee
to work on unaligned pointers, and a family that guarantee
to work atomically, and calling code can use the one that
provides the semantics it requires.

The short term fix is to fix your toolchain/compilation
environment options so that it isn't trying to override
the definition of memcpy().

thanks
-- PMM
I'm running the test guest on another host for the last three days and
so far so good. Yet because of the nature of this bug/issue it can take
a few hours or a few days more to fail. The failure is unpredictable.

Does it make sense to continue running the guest on this different host
to try to repro, or can I migrate the guest back to the original host?
At this point I don't think it will make any difference. The current
host storage is far slower, causing latency issues in the guest, so if
I can migrate it back that would be great.

Sent from my iPhone

> On 25 Feb 2019, at 11:34, Peter Maydell <peter.maydell@linaro.org> wrote:
>
>> On Mon, 25 Feb 2019 at 10:24, Natanael Copa <ncopa@alpinelinux.org> wrote:
>>
>> On Sat, 23 Feb 2019 16:18:15 +0000
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>>>> On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote:
>>>> I was thinking of something in the lines of:
>>>>
>>>> typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
>>>> static inline int lduw_he_p(const void *ptr)
>>>> {
>>>>     volatile_uint16_t r = *(volatile_uint16_t*)ptr;
>>>>     return r;
>>>> }
>>>
>>> This won't correctly handle accesses with unaligned pointers,
>>> I'm afraid. We rely on these functions correctly working
>>> with pointers that are potentially unaligned.
>>
>> Well, current code does not even handle access with aligned pointers,
>> depending on FORTIFY_SOURCE implementation.
>
> It correctly handles aligned and unaligned pointers for the
> API guarantees that the function in QEMU provides, which is
> to say "definitely works on any kind of aligned or unaligned
> pointer, not guaranteed to be atomic". Unfortunately some
> code in QEMU is implicitly assuming it is atomic, which this
> QEMU function does not guarantee -- it just happens to provide
> that most of the time.
>
>> My thinking here is that we depend on assumption that compiler will
>> remove the memcpy call, so maybe find other way to generate same
>> assembly, while still depend on compiler optimization.
>
>> I did some tests and compared the assembly output. The compiler will
>> generate same assembly if volatile is not used. The attribute
>> __may_alias__ does not seem to make any difference. So the following
>> function generates exactly same assembly:
>>
>> static inline int lduw_he_p(const void *ptr)
>> {
>>     uint16_t r = *(uint16_t*)ptr;
>>     return r;
>> }
>
> This still is not guaranteed to work on unaligned pointers.
> (For instance it probably won't work on SPARC hosts.)
> More generally there is no way to have a single function
> that is guaranteed to handle unaligned pointers and also
> guaranteed to produce an atomic access on all host architectures
> that we support, because not all host architectures allow
> you to do both with the same instruction sequence. So
> you can't just change this function to make it provide
> atomic access without breaking the other guarantee it is
> providing.
>
> The long term fix for this is that we need to separate out
> our APIs so we can have a family of functions that guarantee
> to work on unaligned pointers, and a family that guarantee
> to work atomically, and calling code can use the one that
> provides the semantics it requires.
>
> The short term fix is to fix your toolchain/compilation
> environment options so that it isn't trying to override
> the definition of memcpy().
>
> thanks
> -- PMM
On Mon, 25 Feb 2019 10:34:23 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Mon, 25 Feb 2019 at 10:24, Natanael Copa <ncopa@alpinelinux.org> wrote:
> >
> > On Sat, 23 Feb 2019 16:18:15 +0000
> > Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > > On Sat, 23 Feb 2019 at 16:05, Natanael Copa <ncopa@alpinelinux.org> wrote:
> > > > I was thinking of something in the lines of:
> > > >
> > > > typedef volatile uint16_t __attribute__((__may_alias__)) volatile_uint16_t;
> > > > static inline int lduw_he_p(const void *ptr)
> > > > {
> > > >     volatile_uint16_t r = *(volatile_uint16_t*)ptr;
> > > >     return r;
> > > > }
> > >
> > > This won't correctly handle accesses with unaligned pointers,
> > > I'm afraid. We rely on these functions correctly working
> > > with pointers that are potentially unaligned.
> >
> > Well, current code does not even handle access with aligned pointers,
> > depending on FORTIFY_SOURCE implementation.
>
> It correctly handles aligned and unaligned pointers for the
> API guarantees that the function in QEMU provides, which is
> to say "definitely works on any kind of aligned or unaligned
> pointer, not guaranteed to be atomic". Unfortunately some
> code in QEMU is implicitly assuming it is atomic, which this
> QEMU function does not guarantee -- it just happens to provide
> that most of the time.
>
> > My thinking here is that we depend on assumption that compiler will
> > remove the memcpy call, so maybe find other way to generate same
> > assembly, while still depend on compiler optimization.
>
> > I did some tests and compared the assembly output. The compiler will
> > generate same assembly if volatile is not used. The attribute
> > __may_alias__ does not seem to make any difference. So the following
> > function generates exactly same assembly:
> >
> > static inline int lduw_he_p(const void *ptr)
> > {
> >     uint16_t r = *(uint16_t*)ptr;
> >     return r;
> > }
>
> This still is not guaranteed to work on unaligned pointers.
> (For instance it probably won't work on SPARC hosts.)
> More generally there is no way to have a single function
> that is guaranteed to handle unaligned pointers and also
> guaranteed to produce an atomic access on all host architectures
> that we support, because not all host architectures allow
> you to do both with the same instruction sequence. So
> you can't just change this function to make it provide
> atomic access without breaking the other guarantee it is
> providing.

Right, so it is possible that there are other architectures (like SPARC)
that suffer from the same race here as Alpine, because memcpy is not
guaranteed to be atomic.

> The long term fix for this is that we need to separate out
> our APIs so we can have a family of functions that guarantee
> to work on unaligned pointers, and a family that guarantee
> to work atomically, and calling code can use the one that
> provides the semantics it requires.
>
> The short term fix is to fix your toolchain/compilation
> environment options so that it isn't trying to override
> the definition of memcpy().

The easiest workaround is to simply disable FORTIFY_SOURCE, but that
will weaken the security for all implemented string functions, strcpy,
memmove etc, so I don't want to do that.

Is it only lduw_he_p that needs to be atomic, or are the other functions
in include/qemu/bswap.h using memcpy also required to be atomic? I intend
to replace memcpy with __builtin_memcpy there.

> thanks
> -- PMM
On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
>
> On Mon, 25 Feb 2019 10:34:23 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > The short term fix is to fix your toolchain/compilation
> > environment options so that it isn't trying to override
> > the definition of memcpy().
>
> The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> will weaken the security for all implemented string functions, strcpy,
> memmove etc, so I don't want to do that.
>
> Is it only lduw_he_p that needs to be atomic or are the other functions
> in include/qemu/bswap.h using memcpy also required to be atomic?

Hard to say, since we haven't done the "audit all the callers"
step that Stefan mentioned. If you're going to replace memcpy
with __builtin_memcpy then the safest thing is to do it for
all those uses (this will also give you much better generated
code for performance purposes).

thanks
-- PMM
On Mon, 25 Feb 2019 13:06:16 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
> >
> > On Mon, 25 Feb 2019 10:34:23 +0000
> > Peter Maydell <peter.maydell@linaro.org> wrote:
> > > The short term fix is to fix your toolchain/compilation
> > > environment options so that it isn't trying to override
> > > the definition of memcpy().
> >
> > The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> > will weaken the security for all implemented string functions, strcpy,
> > memmove etc, so I don't want to do that.
> >
> > Is it only lduw_he_p that needs to be atomic or are the other functions
> > in include/qemu/bswap.h using memcpy also required to be atomic?
>
> Hard to say, since we haven't done the "audit all the callers"
> step that Stefan mentioned. If you're going to replace memcpy
> with __builtin_memcpy then the safest thing is to do it for
> all those uses (this will also give you much better generated
> code for performance purposes).

I figured that, and that is exactly what I did.

Fernando: Can you please test the binary from qemu-system-x86_64-3.1.0-r3
from alpine edge? I will backport the fix if you can confirm it fixes the
problem.

Thanks!

-nc

PS. Those issues are pretty hard to track down, so big thanks to everyone
who helped find the exact issue here. You have done great work!
Thanks Natanael.

Is the new package ready? I will update as soon as the package is
available, try to repro and report back.

Thanks everyone for looking into this!

On lun, feb 25, 2019 at 2:25 PM, Natanael Copa <ncopa@alpinelinux.org> wrote:

On Mon, 25 Feb 2019 13:06:16 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
>
> On Mon, 25 Feb 2019 10:34:23 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > The short term fix is to fix your toolchain/compilation
> > environment options so that it isn't trying to override
> > the definition of memcpy().
>
> The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> will weaken the security for all implemented string functions, strcpy,
> memmove etc, so I don't want to do that.
>
> Is it only lduw_he_p that needs to be atomic or are the other functions
> in include/qemu/bswap.h using memcpy also required to be atomic?

Hard to say, since we haven't done the "audit all the callers"
step that Stefan mentioned. If you're going to replace memcpy
with __builtin_memcpy then the safest thing is to do it for
all those uses (this will also give you much better generated
code for performance purposes).

I figured that and that is exactly what I did.

Fernando: Can you please test the binary from qemu-system-x86_64-3.1.0-r3
from alpine edge? I will backport the fix if you can confirm it fixes the
problem.

Thanks!

-nc

PS. Those issues are pretty hard to track down, so big thanks to everyone
who helped find the exact issue here. You have done great work!
Ok, the new package is deployed. I stopped and started the test guest so
it will use the new QEMU binary. Will monitor and report back.

On lun, feb 25, 2019 at 2:32 PM, Fernando Casas Schössow <casasfernando@outlook.com> wrote:

Thanks Natanael.

Is the new package ready? I will update as soon as the package is
available, try to repro and report back.

Thanks everyone for looking into this!

On lun, feb 25, 2019 at 2:25 PM, Natanael Copa <ncopa@alpinelinux.org> wrote:

On Mon, 25 Feb 2019 13:06:16 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
>
> On Mon, 25 Feb 2019 10:34:23 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > The short term fix is to fix your toolchain/compilation
> > environment options so that it isn't trying to override
> > the definition of memcpy().
>
> The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> will weaken the security for all implemented string functions, strcpy,
> memmove etc, so I don't want to do that.
>
> Is it only lduw_he_p that needs to be atomic or are the other functions
> in include/qemu/bswap.h using memcpy also required to be atomic?

Hard to say, since we haven't done the "audit all the callers"
step that Stefan mentioned. If you're going to replace memcpy
with __builtin_memcpy then the safest thing is to do it for
all those uses (this will also give you much better generated
code for performance purposes).

I figured that and that is exactly what I did.

Fernando: Can you please test the binary from qemu-system-x86_64-3.1.0-r3
from alpine edge? I will backport the fix if you can confirm it fixes the
problem.

Thanks!

-nc

PS. Those issues are pretty hard to track down, so big thanks to everyone
who helped find the exact issue here. You have done great work!
On 23/02/19 12:49, Natanael Copa wrote:
> I suspect this happens due to the Alpine toolchain enabling
> _FORTIFY_SOURCE=2 by default and the way this is implemented via
> fortify-headers:
> http://git.2f30.org/fortify-headers/file/include/string.h.html#l39

The call to __orig_memcpy is the culprit there, is there any reason not
to do something like

_FORTIFY_FN(memcpy) void *__memcpy_chk(void *__od, const void *__os, size_t __n)
{
	size_t __bd = __builtin_object_size(__od, 0);
	size_t __bs = __builtin_object_size(__os, 0);
	char *__d = (char *)__od;
	const char *__s = (const char *)__os;

	/* trap if pointers are overlapping but not if dst == src.
	 * gcc seems to like to generate code that relies on dst == src */
	if ((__d < __s && __d + __n > __s) ||
	    (__s < __d && __s + __n > __d))
		__builtin_trap();
	if (__n > __bd || __n > __bs)
		__builtin_trap();
	return memcpy(__od, __os, __n);
}
#define memcpy __memcpy_chk

?  That is, getting rid of _FORTIFY_ORIG altogether.

Paolo
Just wanted to share a small update on the situation after updating QEMU
to the new Alpine package carrying Natanael's fix.

So far so good; moreover I switched a few other guests from SATA to
VirtIO SCSI and after two days no issues. Unless I find any problem I
will report back with an update in a week from now.

Thanks everyone for all you did to help find a solution to this issue.

On mar, feb 26, 2019 at 2:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:

On 23/02/19 12:49, Natanael Copa wrote:

I suspect this happens due to the Alpine toolchain enabling
_FORTIFY_SOURCE=2 by default and the way this is implemented via
fortify-headers:
http://git.2f30.org/fortify-headers/file/include/string.h.html#l39

The call to __orig_memcpy is the culprit there, is there any reason not
to do something like

_FORTIFY_FN(memcpy) void *__memcpy_chk(void *__od, const void *__os, size_t __n)
{
	size_t __bd = __builtin_object_size(__od, 0);
	size_t __bs = __builtin_object_size(__os, 0);
	char *__d = (char *)__od;
	const char *__s = (const char *)__os;

	/* trap if pointers are overlapping but not if dst == src.
	 * gcc seems to like to generate code that relies on dst == src */
	if ((__d < __s && __d + __n > __s) ||
	    (__s < __d && __s + __n > __d))
		__builtin_trap();
	if (__n > __bd || __n > __bs)
		__builtin_trap();
	return memcpy(__od, __os, __n);
}
#define memcpy __memcpy_chk

?  That is, getting rid of _FORTIFY_ORIG altogether.

Paolo
On Mon, 25 Feb 2019 at 13:06, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
> >
> > On Mon, 25 Feb 2019 10:34:23 +0000
> > Peter Maydell <peter.maydell@linaro.org> wrote:
> > > The short term fix is to fix your toolchain/compilation
> > > environment options so that it isn't trying to override
> > > the definition of memcpy().
> >
> > The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> > will weaken the security for all implemented string functions, strcpy,
> > memmove etc, so I don't want to do that.
> >
> > Is it only lduw_he_p that needs to be atomic or are the other functions
> > in include/qemu/bswap.h using memcpy also required to be atomic?
>
> Hard to say, since we haven't done the "audit all the callers"
> step that Stefan mentioned. If you're going to replace memcpy
> with __builtin_memcpy then the safest thing is to do it for
> all those uses (this will also give you much better generated
> code for performance purposes).

I mentioned this to Richard on IRC the other day, but I think there's
a good argument for making upstream QEMU use __builtin_memcpy in these
helpers:

* it makes us robust against things like this fortify library in the
  short term (until we fix QEMU to have an equivalent set of functions
  for doing atomic accesses to AddressSpaces)

* in the longer term it will mean that we don't end up with these
  functions being really badly-performing even if the semantics of
  the out-of-line memcpy() are correct

thanks
-- PMM
After two weeks of running the new QEMU package everything is fine. Moreover, I migrated the rest of the guests (both Windows and Linux) to virtio-scsi and there have been no issues so far. I will monitor for another week, but this issue seems pretty much fixed!

Kudos to each and every one of you who helped find and solve this issue.

Fer

On jue, feb 28, 2019 at 10:58 AM, Peter Maydell <peter.maydell@linaro.org> wrote:

On Mon, 25 Feb 2019 at 13:06, Peter Maydell <peter.maydell@linaro.org> wrote:

On Mon, 25 Feb 2019 at 12:22, Natanael Copa <ncopa@alpinelinux.org> wrote:
>
> On Mon, 25 Feb 2019 10:34:23 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > The short term fix is to fix your toolchain/compilation
> > environment options so that it isn't trying to override
> > the definition of memcpy().
>
> The easiest workaround is to simply disable FORTIFY_SOURCE, but that
> would weaken the security of all the implemented string functions
> (strcpy, memmove, etc.), so I don't want to do that.
>
> Is it only lduw_he_p that needs to be atomic, or are the other
> functions in include/qemu/bswap.h that use memcpy also required
> to be atomic?

Hard to say, since we haven't done the "audit all the callers"
step that Stefan mentioned. If you're going to replace memcpy
with __builtin_memcpy then the safest thing is to do it for
all those uses (this will also give you much better generated
code for performance purposes).
I mentioned this to Richard on IRC the other day, but I think there's
a good argument for making upstream QEMU use __builtin_memcpy in these
helpers:

* it makes us robust against things like this fortify library in the
  short term (until we fix QEMU to have an equivalent set of functions
  for doing atomic accesses to AddressSpaces)
* in the longer term it means we won't end up with these functions
  performing really badly even when the semantics of the out-of-line
  memcpy() are correct

thanks
-- PMM
Hi all,

Wanted to share one more update regarding this issue. Since running the new package with the patch from Natanael, the issue has never happened again. It's been three weeks with all guests running on virtio-scsi and no issues at all, so I guess we can consider this solved.

Thanks again to everyone who helped identify and fix this issue. Great work.

Cheers,
Fer

On jue, mar 7, 2019 at 8:14 AM, Fernando Casas Schössow <casasfernando@outlook.com> wrote:

After two weeks of running the new QEMU package everything is fine. Moreover, I migrated the rest of the guests (both Windows and Linux) to virtio-scsi and there have been no issues so far. I will monitor for another week, but this issue seems pretty much fixed!

Kudos to each and every one of you who helped find and solve this issue.

Fer
On Mon, Mar 18, 2019 at 07:07:25AM +0000, Fernando Casas Schössow wrote:
> Wanted to share one more update regarding this issue.
> Since running the new package with the patch from Natanael, the issue has never happened again.
> It's been three weeks with all guests running on virtio-scsi and no issues at all, so I guess we can consider this solved.
>
> Thanks again to everyone who helped identify and fix this issue. Great work.

Nice, thanks for reporting and providing all the debugging information to track down this bug!

Stefan
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -856,7 +856,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;
 
     if (vq->inuse >= vq->vring.num) {
-        virtio_error(vdev, "Virtqueue size exceeded");
+        virtio_error(vdev, "Virtqueue %u device %s size exceeded", vq->queue_index, vdev->name);
         goto done;
     }