Message ID | 20161108223153-mutt-send-email-mst@kernel.org (mailing list archive)
---|---
State | New, archived
On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> > Hi,
> >
> > I can relatively easily reproduce this bug:

How?

> > BUG: 'list_empty(&vgdev->free_vbufs)' is true!

> The following might be helpful for debugging - if the kernel still
> won't stop panicking, we are looking at some kind of memory
> corruption.

Looking carefully through the code I think it isn't impossible to
trigger this, but it takes all of the following:

(1) command queue full (quite possible),
(2) cursor queue full too (unlikely), and
(3) multiple threads trying to submit commands and waiting for free
    space in the command queue (possible with virgl enabled).

Do things improve if you allocate some extra bufs?

int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev)
{
	struct virtio_gpu_vbuffer *vbuf;
-	int i, size, count = 0;
+	int i, size, count = 16;
	void *ptr;

	INIT_LIST_HEAD(&vgdev->free_vbufs);

Memory corruption sounds plausible too. Redirect the console to ttyS0
for troubleshooting; trying to dump the oops to the display device that
triggered the oops in the first place isn't going to work very well ...

cheers,
  Gerd
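For context on why all three conditions are needed: the free_vbufs pool is sized to exactly cover both rings, one vbuf per descriptor, with no slack. The sketch below reconstructs the allocation path around the fragment Gerd quotes, from memory of the 4.9-era driver; everything beyond the quoted lines is an approximation, not authoritative source.

int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev)
{
	struct virtio_gpu_vbuffer *vbuf;
	int i, size, count = 0;		/* proposed stopgap: count = 16 */
	void *ptr;

	INIT_LIST_HEAD(&vgdev->free_vbufs);
	spin_lock_init(&vgdev->free_vbufs_lock);

	/* One vbuf per ring descriptor and nothing more: if both rings
	 * fill up while several submitters each hold a vbuf and sleep
	 * waiting for queue space, the free list can run dry. */
	count += virtqueue_get_vring_size(vgdev->ctrlq.vq);
	count += virtqueue_get_vring_size(vgdev->cursorq.vq);

	size = sizeof(struct virtio_gpu_vbuffer) +
	       MAX_INLINE_CMD_SIZE + MAX_INLINE_RESP_SIZE;
	vgdev->vbufs = kzalloc(count * size, GFP_KERNEL);
	if (!vgdev->vbufs)
		return -ENOMEM;

	for (i = 0, ptr = vgdev->vbufs; i < count; i++, ptr += size) {
		vbuf = ptr;
		list_add(&vbuf->list, &vgdev->free_vbufs);
	}
	return 0;
}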
On 11/08/2016, 09:37 PM, Michael S. Tsirkin wrote:
> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
>
> The following might be helpful for debugging - if the kernel still
> won't stop panicking, we are looking at some kind of memory
> corruption.
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 5a0f8a7..d5e1e72 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -127,7 +127,11 @@ virtio_gpu_get_vbuf(struct virtio_gpu_device *vgdev,
>  	struct virtio_gpu_vbuffer *vbuf;
>
>  	spin_lock(&vgdev->free_vbufs_lock);
> -	BUG_ON(list_empty(&vgdev->free_vbufs));
> +	WARN_ON(list_empty(&vgdev->free_vbufs));
> +	if (list_empty(&vgdev->free_vbufs)) {
> +		spin_unlock(&vgdev->free_vbufs_lock);
> +		return ERR_PTR(-EINVAL);
> +	}

Yeah, I already tried that, but it dies immediately after that:

WARNING: '1' is true!
------------[ cut here ]------------
WARNING: CPU: 2 PID: 5019 at /home/latest/linux/drivers/gpu/drm/virtio/virtgpu_vq.c:130 virtio_gpu_get_vbuf+0x415/0x6a0
Modules linked in:
CPU: 2 PID: 5019 Comm: kworker/2:3 Not tainted 4.9.0-rc2-next-20161028+ #33
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Workqueue: events drm_fb_helper_dirty_work
Call Trace:
 dump_stack+0xcd/0x134
 ? _atomic_dec_and_lock+0xcc/0xcc
 ? vprintk_default+0x1f/0x30
 ? printk+0x99/0xb5
 __warn+0x19e/0x1d0
 warn_slowpath_null+0x1d/0x20
 virtio_gpu_get_vbuf+0x415/0x6a0
 ? lock_pin_lock+0x4a0/0x4a0
 ? virtio_gpu_cmd_capset_cb+0x460/0x460
 ? debug_check_no_locks_freed+0x350/0x350
 virtio_gpu_cmd_resource_flush+0x8d/0x2d0
 ? virtio_gpu_cmd_set_scanout+0x310/0x310
 virtio_gpu_surface_dirty+0x364/0x930
 ? mark_held_locks+0xff/0x290
 ? virtio_gpufb_create+0xab0/0xab0
 ? _raw_spin_unlock_irqrestore+0x53/0x70
 ? trace_hardirqs_on_caller+0x46c/0x6b0
 virtio_gpu_framebuffer_surface_dirty+0x14/0x20
 drm_fb_helper_dirty_work+0x27a/0x400
 ? drm_fb_helper_is_bound+0x300/0x300
 process_one_work+0x834/0x1c90
 ? process_one_work+0x7a5/0x1c90
 ? pwq_dec_nr_in_flight+0x3a0/0x3a0
 ? worker_thread+0x1b2/0x1540
 worker_thread+0x650/0x1540
 ? process_one_work+0x1c90/0x1c90
 ? process_one_work+0x1c90/0x1c90
 kthread+0x206/0x310
 ? kthread_create_on_node+0xa0/0xa0
 ? trace_hardirqs_on+0xd/0x10
 ? kthread_create_on_node+0xa0/0xa0
 ? kthread_create_on_node+0xa0/0xa0
 ret_from_fork+0x2a/0x40
---[ end trace c723c98d382423f4 ]---
BUG: unable to handle kernel paging request at fffffc0000000000
IP: check_memory_region+0x7f/0x1a0
PGD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
Modules linked in:
CPU: 2 PID: 5019 Comm: kworker/2:3 Tainted: G        W       4.9.0-rc2-next-20161028+ #33
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Workqueue: events drm_fb_helper_dirty_work
task: ffff8800455f4980 task.stack: ffff88001fd78000
RIP: 0010:check_memory_region+0x7f/0x1a0
RSP: 0018:ffff88001fd7f938 EFLAGS: 00010282
RAX: fffffc0000000000 RBX: dffffc0000000001 RCX: ffffffff8260afb3
RDX: 0000000000000001 RSI: 0000000000000030 RDI: fffffffffffffff4
RBP: ffff88001fd7f948 R08: fffffc0000000001 R09: dffffc0000000004
R10: 0000000000000023 R11: dffffc0000000005 R12: 0000000000000030
R13: 0000000000000000 R14: 0000000000000050 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88007dd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffc0000000000 CR3: 00000000773a0000 CR4: 00000000000006e0
Call Trace:
Code: 83 fb 10 7f 3f 4d 85 db 74 34 48 bb 01 00 00 00 00 fc ff df 49 01 c3 49 01 d8 80 38 00 75 13 4d 39 c3 4c 89 c0 74 17 49 83 c0 01 <41> 80 78 ff 00 74 ed 49 89 c0 4d 85 c0 0f 85 8f 00 00 00 5b 41
RIP: check_memory_region+0x7f/0x1a0 RSP: ffff88001fd7f938
CR2: fffffc0000000000

thanks,
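The KASAN oops immediately after the WARN_ON is consistent with a caller dereferencing the returned ERR_PTR: the debugging patch makes virtio_gpu_get_vbuf() fail gracefully, but each of its callers would need a matching guard along these lines. This is a sketch of the caller shape with assumed argument values, not the actual driver code:

	vbuf = virtio_gpu_get_vbuf(vgdev, size,
				   sizeof(struct virtio_gpu_ctrl_hdr),
				   NULL, NULL);
	if (IS_ERR(vbuf)) {
		/* Propagate the error instead of letting later code
		 * treat the ERR_PTR value as a valid buffer. */
		*vbuffer_p = NULL;
		return ERR_CAST(vbuf);
	}
	*vbuffer_p = vbuf;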
On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
>>> Hi,
>>>
>>> I can relatively easily reproduce this bug:
>
> How?

Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
Run pps [1] without the exit(0); on e.g. the serial console.
Wait a bit. The flood of output triggers the BUG.

[1] https://github.com/jirislaby/collected_sources/blob/master/pps.c

>>> BUG: 'list_empty(&vgdev->free_vbufs)' is true!
>
>> The following might be helpful for debugging - if the kernel still
>> won't stop panicking, we are looking at some kind of memory
>> corruption.
>
> Looking carefully through the code I think it isn't impossible to
> trigger this, but it takes all of the following:
>
> (1) command queue full (quite possible),
> (2) cursor queue full too (unlikely), and
> (3) multiple threads trying to submit commands and waiting for free
>     space in the command queue (possible with virgl enabled).

I use -vga virtio with no -display option, so no virgl, I suppose:
[drm] virgl 3d acceleration not available

> Do things improve if you allocate some extra bufs?
>
> int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev)
> {
> 	struct virtio_gpu_vbuffer *vbuf;
> -	int i, size, count = 0;
> +	int i, size, count = 16;

This seems to help.

thanks,
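For readers without the link handy: later in the thread pps is called a fork bomb, so a minimal stand-in, run "without the exit(0)", would look roughly like this. This is an assumption about the linked source, not a copy of it; run it only in a disposable VM.

#include <unistd.h>

/* Every process, parent and children alike, keeps forking until the
 * process limit is hit. */
int main(void)
{
	for (;;)
		fork();
}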
On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
> > On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
> >> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> >>> Hi,
> >>>
> >>> I can relatively easily reproduce this bug:
> >
> > How?
>
> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.

fbcon? Or xorg/wayland with a terminal app?

> Run pps [1] without the exit(0); on e.g. the serial console.
> Wait a bit. The flood of output triggers the BUG.
>
> [1] https://github.com/jirislaby/collected_sources/blob/master/pps.c
>
> >>> BUG: 'list_empty(&vgdev->free_vbufs)' is true!
> >
> >> The following might be helpful for debugging - if the kernel still
> >> won't stop panicking, we are looking at some kind of memory
> >> corruption.
> >
> > Looking carefully through the code I think it isn't impossible to
> > trigger this, but it takes all of the following:
> >
> > (1) command queue full (quite possible),
> > (2) cursor queue full too (unlikely), and
> > (3) multiple threads trying to submit commands and waiting for free
> >     space in the command queue (possible with virgl enabled).
>
> I use -vga virtio with no -display option, so no virgl, I suppose:
> [drm] virgl 3d acceleration not available
>
> > Do things improve if you allocate some extra bufs?
> >
> > int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev)
> > {
> > 	struct virtio_gpu_vbuffer *vbuf;
> > -	int i, size, count = 0;
> > +	int i, size, count = 16;
>
> This seems to help.
>
> thanks,
On 11/15/2016, 09:46 AM, Gerd Hoffmann wrote:
> On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
>> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
>>> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
>>>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
>>>>> Hi,
>>>>>
>>>>> I can relatively easily reproduce this bug:
>>>
>>> How?
>>
>> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
>
> fbcon? Or xorg/wayland with a terminal app?

Ah, just the console, so fbcon. No X server running.

thanks,
On Di, 2016-11-15 at 09:55 +0100, Jiri Slaby wrote:
> On 11/15/2016, 09:46 AM, Gerd Hoffmann wrote:
> > On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
> >> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
> >>> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
> >>>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I can relatively easily reproduce this bug:
> >>>
> >>> How?
> >>
> >> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
> >
> > fbcon? Or xorg/wayland with a terminal app?
>
> Ah, just the console, so fbcon. No X server running.

Hmm, /me looks puzzled. fbcon doesn't do cursor updates, so the cursor
queue can hardly be full, and there should be enough buffers even
without allocating 16 extra ones.

I'll go try to reproduce and analyze that one. The +16 patch is
submitted nevertheless, as a temporary stopgap.

cheers,
  Gerd
On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
> > On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
> >> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> >>> Hi,
> >>>
> >>> I can relatively easily reproduce this bug:
> >
> > How?
>
> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
> Run pps [1] without the exit(0); on e.g. the serial console.
> Wait a bit. The flood of output triggers the BUG.
>
> [1] https://github.com/jirislaby/collected_sources/blob/master/pps.c

Doesn't reproduce here.

Running "while true; do dmesg; done" on the virtio-gpu fbcon.
Running the pps fork bomb on the serial console.

Can watch dmesg printing the kernel messages over and over, until the
shell can't spawn dmesg any more due to the fork bomb hitting the
process limit. No BUG() triggered.

Tried spice, gtk and sdl.

Hmm.

Any ideas what else might be needed to reproduce it?

cheers,
  Gerd
sparse produces these warnings:

drivers/gpu/drm/virtio/virtgpu_fb.c:340:27: warning: incorrect type in assignment (different address spaces)
drivers/gpu/drm/virtio/virtgpu_fb.c:340:27:    expected char [noderef] <asn:2>*screen_base
drivers/gpu/drm/virtio/virtgpu_fb.c:340:27:    got void *vmap

This is because the expected type is char __iomem *, while the virtio
gpu object's vmap is a plain void *.

We could just cast the warning away, but I'm not sure this isn't a
symptom of an actual problem. For example, might some code call
iounmap on this address?
On Do, 2016-11-24 at 04:57 +0200, Michael S. Tsirkin wrote:
> sparse produces these warnings:
>
> drivers/gpu/drm/virtio/virtgpu_fb.c:340:27: warning: incorrect type in assignment (different address spaces)
> drivers/gpu/drm/virtio/virtgpu_fb.c:340:27:    expected char [noderef] <asn:2>*screen_base
> drivers/gpu/drm/virtio/virtgpu_fb.c:340:27:    got void *vmap
>
> This is because the expected type is char __iomem *, while the virtio
> gpu object's vmap is a plain void *.
>
> We could just cast the warning away, but I'm not sure this isn't a
> symptom of an actual problem. For example, might some code call
> iounmap on this address?

Nobody is ever going to unmap that; the kernel will simply use the given
address to access the framebuffer.

Actually it looks like this (in include/linux/fb.h):

	union {
		char __iomem *screen_base;	/* Virtual address */
		char *screen_buffer;
	};

and given that virtio always uses normal RAM as backing storage for the
framebuffer, we should simply s/screen_base/screen_buffer/.

I'll go prepare a patch.

cheers,
  Gerd
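A minimal sketch of what that s/screen_base/screen_buffer/ substitution looks like at the assignment site sparse flagged; the identifiers around the assignment follow the warning text above and are not verified against virtgpu_fb.c:

-	info->screen_base = obj->vmap;	  /* char __iomem *: trips sparse */
+	info->screen_buffer = obj->vmap;  /* plain char *: matches the RAM backing */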
On 11/16/2016, 02:12 PM, Gerd Hoffmann wrote:
> On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
>> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
>>> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
>>>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
>>>>> Hi,
>>>>>
>>>>> I can relatively easily reproduce this bug:
>>>
>>> How?
>>
>> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
>> Run pps [1] without the exit(0); on e.g. the serial console.
>> Wait a bit. The flood of output triggers the BUG.
>>
>> [1] https://github.com/jirislaby/collected_sources/blob/master/pps.c
>
> Doesn't reproduce here.
>
> Running "while true; do dmesg; done" on the virtio-gpu fbcon.
> Running the pps fork bomb on the serial console.
>
> Can watch dmesg printing the kernel messages over and over, until the
> shell can't spawn dmesg any more due to the fork bomb hitting the
> process limit. No BUG() triggered.
>
> Tried spice, gtk and sdl.
>
> Hmm.
>
> Any ideas what else might be needed to reproduce it?

I can reproduce it even with count = 32 :(. And without the fork bomb
(i.e. with the code from the repository). This is how I start qemu:

/usr/bin/qemu-system-x86_64 -machine accel=kvm -k en-us -smp 4 -m 2371 -usb -device virtio-rng-pci -drive file=/home/new/suse-fact.img,format=raw,discard=unmap,if=none,id=hd -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd -soundhw hda -net user,tftp=/home/xslaby/tftp,bootfile=/pxelinux.0,hostfwd=tcp::2222-:22,hostfwd=tcp::3632-:3632 -net nic,model=virtio -serial pty -balloon virtio -device virtio-tablet-pci -vga virtio -kernel /home/latest/my/arch/x86/boot/bzImage -append root=/dev/sda1 console=ttyS0,115200 loglevel=debug -snapshot

On the console I run:

dmesg -w

and on the serial console:

while :; do for aa in `seq 1 10`; do ./pps & done; wait; done

Note the latter can cause an interrupt "storm" (~700 irqs per second),
as much output is generated. This can lead to some race condition:
serial is on IRQ4 and virtio gpu on IRQ10, which has lower priority
AFAIK.

thanks,
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 5a0f8a7..d5e1e72 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -127,7 +127,11 @@ virtio_gpu_get_vbuf(struct virtio_gpu_device *vgdev,
 	struct virtio_gpu_vbuffer *vbuf;
 
 	spin_lock(&vgdev->free_vbufs_lock);
-	BUG_ON(list_empty(&vgdev->free_vbufs));
+	WARN_ON(list_empty(&vgdev->free_vbufs));
+	if (list_empty(&vgdev->free_vbufs)) {
+		spin_unlock(&vgdev->free_vbufs_lock);
+		return ERR_PTR(-EINVAL);
+	}
 	vbuf = list_first_entry(&vgdev->free_vbufs,
 				struct virtio_gpu_vbuffer, list);
 	list_del(&vbuf->list);
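As an aside: since WARN_ON() evaluates to its condition, the guard above could avoid testing the list twice. An equivalent, slightly tighter form (an editor sketch, not the patch as posted):

	spin_lock(&vgdev->free_vbufs_lock);
	if (WARN_ON(list_empty(&vgdev->free_vbufs))) {
		/* Drop the lock before bailing out, as in the patch. */
		spin_unlock(&vgdev->free_vbufs_lock);
		return ERR_PTR(-EINVAL);
	}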