
[v5,0/7] drm/mgag200: Implement VBLANK support

Message ID 20240718104551.575912-1-tzimmermann@suse.de (mailing list archive)

Message

Thomas Zimmermann July 18, 2024, 10:44 a.m. UTC
Implement support for VBLANK events in mgag200.

Patches 1 to 5 prepare mgag200's modesetting code by renaming or
adding variables for various hardware fields. This makes the code
more readable and aligns it with the programming manuals for Matrox 
hardware.

Patch 6 implements support for VBLANK events. The patch has been
reviewed before at [1]. That old patchset never found its way into
the kernel, but the VBLANK support is still useful.

Patch 7 adds support for VBLANK timestamps.

v5:
- clear all interrupts before registering IRQ (Jocelyn)
- don't read from ICLEAR (Jocelyn)

v4:
- split off the patchset from an earlier series [1]

[1] https://patchwork.freedesktop.org/series/66442/

Thomas Zimmermann (7):
  drm/mgag200: Use hexadecimal register indeces
  drm/mgag200: Align register field names with documentation
  drm/mgag200: Use adjusted mode values for CRTCs
  drm/mgag200: Add dedicated variables for blanking fields
  drm/mgag200: Add dedicted variable for <linecomp> field
  drm/mgag200: Add vblank support
  drm/mgag200: Implement struct drm_crtc_funcs.get_vblank_timestamp

 drivers/gpu/drm/mgag200/mgag200_drv.c     |  40 ++++++
 drivers/gpu/drm/mgag200/mgag200_drv.h     |  14 +-
 drivers/gpu/drm/mgag200/mgag200_g200.c    |   5 +
 drivers/gpu/drm/mgag200/mgag200_g200eh.c  |   5 +
 drivers/gpu/drm/mgag200/mgag200_g200eh3.c |   5 +
 drivers/gpu/drm/mgag200/mgag200_g200er.c  |   5 +
 drivers/gpu/drm/mgag200/mgag200_g200ev.c  |   5 +
 drivers/gpu/drm/mgag200/mgag200_g200ew3.c |   5 +
 drivers/gpu/drm/mgag200/mgag200_g200se.c  |   5 +
 drivers/gpu/drm/mgag200/mgag200_g200wb.c  |   5 +
 drivers/gpu/drm/mgag200/mgag200_mode.c    | 167 ++++++++++++++++------
 drivers/gpu/drm/mgag200/mgag200_reg.h     |   7 +
 12 files changed, 223 insertions(+), 45 deletions(-)

Comments

Tony Luck Oct. 1, 2024, 10:41 p.m. UTC | #1
My system threw out a bunch of stack traces while booting
v6.12-rc1 and hung.

First of these looks like this:

[   33.639799] fbcon: mgag200drmfb (fb0) is primary device
[   33.651573] ixgbe 0000:03:00.0: Intel(R) 10 Gigabit Network Connection
[   33.652092] ixgbe 0000:03:00.1: enabling device (0100 -> 0102)
[   33.818328] ------------[ cut here ]------------
[   33.818362] [CRTC:34:crtc-0] vblank wait timed out
[   33.818422] WARNING: CPU: 44 PID: 1815 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x245/0x250 [drm_kms_helper]
[   33.818447] Modules linked in: crct10dif_pclmul mgag200(+) crc32_pclmul i2c_algo_bit crc32c_intel drm_shmem_helper ghash_clmulni_intel sha512_ssse3 drm_kms_helper sha256_ssse3 sha1_ssse3 ixgbe(+) mpt3sas mdio drm raid_class dca scsi_transport_sas wmi fuse
[   33.818475] CPU: 44 PID: 1815 Comm: systemd-udevd Not tainted 6.10.0-rc1+ #168
[   33.818478] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[   33.818481] RIP: 0010:drm_atomic_helper_wait_for_vblanks.part.0+0x245/0x250 [drm_kms_helper]
[   33.818490] Code: 00 48 8d 7b 08 e8 2b 7e 61 da 45 85 ff 0f 85 d3 fe ff ff 49 8b 56 20 41 8b b6 d8 00 00 00 48 c7 c7 38 52 ba c0 e8 8b 53 59 da <0f> 0b e9 b5 fe ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90
[   33.818493] RSP: 0018:ffffbf61a3faf690 EFLAGS: 00010282
[   33.818496] RAX: 0000000000000026 RBX: ffff99be04ad3028 RCX: 0000000000000000
[   33.818499] RDX: 0000000000000002 RSI: ffffffff9c9fd7c8 RDI: 00000000ffffffff
[   33.818501] RBP: ffff99be08a76c00 R08: 0000000000000001 R09: 0000000000000000
[   33.818503] R10: 0000000000000001 R11: ffff99f1011fffe8 R12: 0000000000000000
[   33.818504] R13: 0000000000000000 R14: ffff99be0bcf93f8 R15: 0000000000000000
[   33.818506] FS:  00007fbe18e7db40(0000) GS:ffff99ca61c00000(0000) knlGS:0000000000000000
[   33.818509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   33.818510] CR2: 000055b77636c1f8 CR3: 000000000e486004 CR4: 00000000003706f0
[   33.818513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   33.818514] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   33.818516] Call Trace:
[   33.818519]  <TASK>
[   33.818521]  ? __warn+0x8b/0x190
[   33.818535]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x245/0x250 [drm_kms_helper]
[   33.818545]  ? report_bug+0x1c3/0x1d0
[   33.818559]  ? handle_bug+0x42/0x70
[   33.818571]  ? exc_invalid_op+0x14/0x70
[   33.818575]  ? asm_exc_invalid_op+0x16/0x20
[   33.818589]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x245/0x250 [drm_kms_helper]
[   33.818602]  ? __pfx_autoremove_wake_function+0x10/0x10
[   33.818614]  drm_atomic_helper_commit_tail+0x71/0x80 [drm_kms_helper]
[   33.818625]  mgag200_mode_config_helper_atomic_commit_tail+0x28/0x40 [mgag200]
[   33.818633]  commit_tail+0x94/0x130 [drm_kms_helper]
[   33.818644]  drm_atomic_helper_commit+0x13e/0x170 [drm_kms_helper]
[   33.818655]  drm_atomic_commit+0x97/0xb0 [drm]
[   33.818717]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[   33.818750]  drm_client_modeset_commit_atomic+0x207/0x250 [drm]
[   33.818783]  drm_client_modeset_commit_locked+0x5b/0x190 [drm]
[   33.818807]  drm_client_modeset_commit+0x24/0x50 [drm]
[   33.818829]  __drm_fb_helper_restore_fbdev_mode_unlocked+0x92/0xc0 [drm_kms_helper]
[   33.818841]  drm_fb_helper_set_par+0x2e/0x40 [drm_kms_helper]
[   33.818850]  fbcon_init+0x2a8/0x560
[   33.818860]  visual_init+0xc4/0x120
[   33.818867]  do_bind_con_driver.isra.0+0x1a1/0x3d0
[   33.818875]  do_take_over_console+0x10b/0x1a0
[   33.818880]  do_fbcon_takeover+0x5c/0xc0
[   33.818883]  fbcon_fb_registered+0x49/0x70
[   33.818886]  register_framebuffer+0x190/0x250
[   33.818896]  __drm_fb_helper_initial_config_and_unlock+0x345/0x590 [drm_kms_helper]
[   33.818906]  ? drm_client_register+0x33/0xc0 [drm]
[   33.818934]  drm_fbdev_shmem_client_hotplug+0x6c/0xc0 [drm_shmem_helper]
[   33.818939]  drm_client_register+0x7b/0xc0 [drm]
[   33.818963]  mgag200_pci_probe+0x90/0x180 [mgag200]
[   33.818970]  local_pci_probe+0x46/0xa0
[   33.818978]  pci_device_probe+0xb5/0x230
[   33.818986]  really_probe+0xd9/0x380
[   33.818993]  __driver_probe_device+0x78/0x150
[   33.818997]  driver_probe_device+0x1e/0x90
[   33.819000]  __driver_attach+0xd6/0x1d0
[   33.819003]  ? __pfx___driver_attach+0x10/0x10
[   33.819005]  bus_for_each_dev+0x66/0xa0
[   33.819012]  bus_add_driver+0x111/0x240
[   33.819018]  driver_register+0x5c/0x120
[   33.819021]  ? __pfx_mgag200_pci_driver_init+0x10/0x10 [mgag200]
[   33.819026]  do_one_initcall+0x62/0x3a0
[   33.819035]  ? kmalloc_trace_noprof+0x2a0/0x340
[   33.819048]  do_init_module+0x64/0x240
[   33.819058]  init_module_from_file+0x7a/0xa0
[   33.819072]  idempotent_init_module+0x15a/0x210
[   33.819079]  ? __startup_64+0x70/0x3f0
[   33.819086]  __x64_sys_finit_module+0x5a/0xb0
[   33.819092]  do_syscall_64+0x73/0x190
[   33.819098]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

I peeked at changes to this driver between v6.11 and v6.12-rc1 and saw
this set:

$ git log --oneline linus ^v6.11 -- drivers/gpu/drm/mgag200
219b45d023ed drm/mgag200: Remove BMC output
0f9ff361ad82 drm/mgag200: vga-bmc: Control BMC scanout from encoder
9d09cac47de5 drm/mgag200: vga-bmc: Control CRTC VIDRST flag from encoder
dc06efbb7934 drm/mgag200: vga-bmc: Transparently handle BMC
f5510726608f drm/mgag200: Add VGA-BMC output
6c9e14ee9f51 drm/mgag200: Fix VBLANK interrupt handling
d5070c9b2944 drm/mgag200: Implement struct drm_crtc_funcs.get_vblank_timestamp
89c6ea2006e2 drm/mgag200: Add vblank support
5cd522b5331b drm/mgag200: Add dedicted variable for <linecomp> field
d6460bd52c27 drm/mgag200: Add dedicated variables for blanking fields
e8f834b55962 drm/mgag200: Use adjusted mode values for CRTCs
b345b3542d66 drm/mgag200: Align register field names with documentation
754c9129b949 drm/mgag200: Use hexadecimal register indeces
3ac9384061b2 drm/mgag200: Rename BMC vidrst names
7bb97cf91588 drm/mgag200: Remove vidrst callbacks from struct mgag200_device_funcs
cd3a2e8b0a03 drm/mgag200: Only set VIDRST bits in CRTC modesetting

I tried a mini-bisect across these changes and found the system boots
normally with:

5cd522b5331b drm/mgag200: Add dedicted variable for <linecomp> field

and fails with:

89c6ea2006e2 drm/mgag200: Add vblank support

I do see that there is a subsequent "Fix VBLANK" patch, but it appears
that whatever it fixed didn't help on my system.

-Tony
Thomas Zimmermann Oct. 2, 2024, 7:06 a.m. UTC | #2
Hi

On 02.10.24 at 00:41, Tony Luck wrote:
> My system threw out a bunch of stack traces while booting
> v6.12-rc1 and hung.

Thanks for the bug report. Can you provide the output of 'sudo lspci 
-vvv' for the graphics device?

Best regards
Thomas

>
> First of these looks like this:
>
> [   33.639799] fbcon: mgag200drmfb (fb0) is primary device
> [   33.651573] ixgbe 0000:03:00.0: Intel(R) 10 Gigabit Network Connection
> [   33.652092] ixgbe 0000:03:00.1: enabling device (0100 -> 0102)
> [   33.818328] ------------[ cut here ]------------
> [   33.818362] [CRTC:34:crtc-0] vblank wait timed out
> [   33.818422] WARNING: CPU: 44 PID: 1815 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x245/0x250 [drm_kms_helper]
> [   33.818447] Modules linked in: crct10dif_pclmul mgag200(+) crc32_pclmul i2c_algo_bit crc32c_intel drm_shmem_helper ghash_clmulni_intel sha512_ssse3 drm_kms_helper sha256_ssse3 sha1_ssse3 ixgbe(+) mpt3sas mdio drm raid_class dca scsi_transport_sas wmi fuse
> [   33.818475] CPU: 44 PID: 1815 Comm: systemd-udevd Not tainted 6.10.0-rc1+ #168
> [   33.818478] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
> [   33.818481] RIP: 0010:drm_atomic_helper_wait_for_vblanks.part.0+0x245/0x250 [drm_kms_helper]
> [   33.818490] Code: 00 48 8d 7b 08 e8 2b 7e 61 da 45 85 ff 0f 85 d3 fe ff ff 49 8b 56 20 41 8b b6 d8 00 00 00 48 c7 c7 38 52 ba c0 e8 8b 53 59 da <0f> 0b e9 b5 fe ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90
> [   33.818493] RSP: 0018:ffffbf61a3faf690 EFLAGS: 00010282
> [   33.818496] RAX: 0000000000000026 RBX: ffff99be04ad3028 RCX: 0000000000000000
> [   33.818499] RDX: 0000000000000002 RSI: ffffffff9c9fd7c8 RDI: 00000000ffffffff
> [   33.818501] RBP: ffff99be08a76c00 R08: 0000000000000001 R09: 0000000000000000
> [   33.818503] R10: 0000000000000001 R11: ffff99f1011fffe8 R12: 0000000000000000
> [   33.818504] R13: 0000000000000000 R14: ffff99be0bcf93f8 R15: 0000000000000000
> [   33.818506] FS:  00007fbe18e7db40(0000) GS:ffff99ca61c00000(0000) knlGS:0000000000000000
> [   33.818509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   33.818510] CR2: 000055b77636c1f8 CR3: 000000000e486004 CR4: 00000000003706f0
> [   33.818513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   33.818514] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   33.818516] Call Trace:
> [   33.818519]  <TASK>
> [   33.818521]  ? __warn+0x8b/0x190
> [   33.818535]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x245/0x250 [drm_kms_helper]
> [   33.818545]  ? report_bug+0x1c3/0x1d0
> [   33.818559]  ? handle_bug+0x42/0x70
> [   33.818571]  ? exc_invalid_op+0x14/0x70
> [   33.818575]  ? asm_exc_invalid_op+0x16/0x20
> [   33.818589]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x245/0x250 [drm_kms_helper]
> [   33.818602]  ? __pfx_autoremove_wake_function+0x10/0x10
> [   33.818614]  drm_atomic_helper_commit_tail+0x71/0x80 [drm_kms_helper]
> [   33.818625]  mgag200_mode_config_helper_atomic_commit_tail+0x28/0x40 [mgag200]
> [   33.818633]  commit_tail+0x94/0x130 [drm_kms_helper]
> [   33.818644]  drm_atomic_helper_commit+0x13e/0x170 [drm_kms_helper]
> [   33.818655]  drm_atomic_commit+0x97/0xb0 [drm]
> [   33.818717]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
> [   33.818750]  drm_client_modeset_commit_atomic+0x207/0x250 [drm]
> [   33.818783]  drm_client_modeset_commit_locked+0x5b/0x190 [drm]
> [   33.818807]  drm_client_modeset_commit+0x24/0x50 [drm]
> [   33.818829]  __drm_fb_helper_restore_fbdev_mode_unlocked+0x92/0xc0 [drm_kms_helper]
> [   33.818841]  drm_fb_helper_set_par+0x2e/0x40 [drm_kms_helper]
> [   33.818850]  fbcon_init+0x2a8/0x560
> [   33.818860]  visual_init+0xc4/0x120
> [   33.818867]  do_bind_con_driver.isra.0+0x1a1/0x3d0
> [   33.818875]  do_take_over_console+0x10b/0x1a0
> [   33.818880]  do_fbcon_takeover+0x5c/0xc0
> [   33.818883]  fbcon_fb_registered+0x49/0x70
> [   33.818886]  register_framebuffer+0x190/0x250
> [   33.818896]  __drm_fb_helper_initial_config_and_unlock+0x345/0x590 [drm_kms_helper]
> [   33.818906]  ? drm_client_register+0x33/0xc0 [drm]
> [   33.818934]  drm_fbdev_shmem_client_hotplug+0x6c/0xc0 [drm_shmem_helper]
> [   33.818939]  drm_client_register+0x7b/0xc0 [drm]
> [   33.818963]  mgag200_pci_probe+0x90/0x180 [mgag200]
> [   33.818970]  local_pci_probe+0x46/0xa0
> [   33.818978]  pci_device_probe+0xb5/0x230
> [   33.818986]  really_probe+0xd9/0x380
> [   33.818993]  __driver_probe_device+0x78/0x150
> [   33.818997]  driver_probe_device+0x1e/0x90
> [   33.819000]  __driver_attach+0xd6/0x1d0
> [   33.819003]  ? __pfx___driver_attach+0x10/0x10
> [   33.819005]  bus_for_each_dev+0x66/0xa0
> [   33.819012]  bus_add_driver+0x111/0x240
> [   33.819018]  driver_register+0x5c/0x120
> [   33.819021]  ? __pfx_mgag200_pci_driver_init+0x10/0x10 [mgag200]
> [   33.819026]  do_one_initcall+0x62/0x3a0
> [   33.819035]  ? kmalloc_trace_noprof+0x2a0/0x340
> [   33.819048]  do_init_module+0x64/0x240
> [   33.819058]  init_module_from_file+0x7a/0xa0
> [   33.819072]  idempotent_init_module+0x15a/0x210
> [   33.819079]  ? __startup_64+0x70/0x3f0
> [   33.819086]  __x64_sys_finit_module+0x5a/0xb0
> [   33.819092]  do_syscall_64+0x73/0x190
> [   33.819098]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> I peeked at changes to this driver between v6.11 and v6.12-rc1 and saw
> this set:
>
> $ git log --oneline linus ^v6.11 -- drivers/gpu/drm/mgag200
> 219b45d023ed drm/mgag200: Remove BMC output
> 0f9ff361ad82 drm/mgag200: vga-bmc: Control BMC scanout from encoder
> 9d09cac47de5 drm/mgag200: vga-bmc: Control CRTC VIDRST flag from encoder
> dc06efbb7934 drm/mgag200: vga-bmc: Transparently handle BMC
> f5510726608f drm/mgag200: Add VGA-BMC output
> 6c9e14ee9f51 drm/mgag200: Fix VBLANK interrupt handling
> d5070c9b2944 drm/mgag200: Implement struct drm_crtc_funcs.get_vblank_timestamp
> 89c6ea2006e2 drm/mgag200: Add vblank support
> 5cd522b5331b drm/mgag200: Add dedicted variable for <linecomp> field
> d6460bd52c27 drm/mgag200: Add dedicated variables for blanking fields
> e8f834b55962 drm/mgag200: Use adjusted mode values for CRTCs
> b345b3542d66 drm/mgag200: Align register field names with documentation
> 754c9129b949 drm/mgag200: Use hexadecimal register indeces
> 3ac9384061b2 drm/mgag200: Rename BMC vidrst names
> 7bb97cf91588 drm/mgag200: Remove vidrst callbacks from struct mgag200_device_funcs
> cd3a2e8b0a03 drm/mgag200: Only set VIDRST bits in CRTC modesetting
>
> I tried a mini-bisect across these changes and found the system boots
> normally with:
>
> 5cd522b5331b drm/mgag200: Add dedicted variable for <linecomp> field
>
> and fails with:
>
> 89c6ea2006e2 drm/mgag200: Add vblank support
>
> I do see that there is a subsequent "Fix VBLANK" patch, but it appears
> that whatever it fixed didn't help on my system.
>
> -Tony
Tony Luck Oct. 2, 2024, 4:15 p.m. UTC | #3
> Thanks for the bug report. Can you provide the output of 'sudo lspci 
> -vvv' for the graphics device?

Thomas,

Sure. Here's the output (run on the v6.11.0 kernel)

$ sudo lspci -vvv -s 0000:08:00.0
08:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Device 0103
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at 90000000 (32-bit, prefetchable) [size=16M]
        Region 1: Memory at 91800000 (32-bit, non-prefetchable) [size=16K]
        Region 2: Memory at 91000000 (32-bit, non-prefetchable) [size=8M]
        Expansion ROM at 91810000 [disabled] [size=64K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Kernel driver in use: mgag200
        Kernel modules: mgag200

-Tony
Thomas Zimmermann Oct. 4, 2024, 9:17 a.m. UTC | #4
Hi

On 02.10.24 at 18:15, Luck, Tony wrote:
>> Thanks for the bug report. Can you provide the output of 'sudo lspci
>> -vvv' for the graphics device?
> Thomas,
>
> Sure. Here's the output (run on the v6.11.0 kernel)

Thanks. It doesn't look much different from other systems. IRQ is also 
assigned.

Attached is a patch that fixes a possible off-by-one error in the 
register settings. This would affect the bug you're reporting. If 
possible, please apply the patch to your 6.12-rc1, test and report the 
result.

Best regards
Thomas

>
> $ sudo lspci -vvv -s 0000:08:00.0
> 08:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
>          Subsystem: Intel Corporation Device 0103
>          Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>          Interrupt: pin A routed to IRQ 16
>          Region 0: Memory at 90000000 (32-bit, prefetchable) [size=16M]
>          Region 1: Memory at 91800000 (32-bit, non-prefetchable) [size=16K]
>          Region 2: Memory at 91000000 (32-bit, non-prefetchable) [size=8M]
>          Expansion ROM at 91810000 [disabled] [size=64K]
>          Capabilities: [dc] Power Management version 2
>                  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>          Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
>                  DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                          ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>                  DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>                          RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                          MaxPayload 128 bytes, MaxReadReq 128 bytes
>                  DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
>                  LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
>                          ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>                  LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>                          ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                  LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>                          TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>          Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
>                  Address: 00000000  Data: 0000
>          Kernel driver in use: mgag200
>          Kernel modules: mgag200
>
> -Tony
Ville Syrjälä Oct. 4, 2024, 10:01 a.m. UTC | #5
On Fri, Oct 04, 2024 at 11:17:02AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> On 02.10.24 at 18:15, Luck, Tony wrote:
> >> Thanks for the bug report. Can you provide the output of 'sudo lspci
> >> -vvv' for the graphics device?
> > Thomas,
> >
> > Sure. Here's the output (run on the v6.11.0 kernel)
> 
> Thanks. It doesn't look much different from other systems. IRQ is also 
> assigned.
> 
> Attached is a patch that fixes a possible off-by-one error in the 
> register settings. This would affect the bug you're reporting. If 
> possible, please apply the patch to your 6.12-rc1, test and report the 
> result.

Didn't one of these weird variants have some bug where the
CRTC startadd was not working? Is this one of those?
That to me sounds like maybe linecomp has internally been
tied to be always active somehow. Perhaps that would
also prevent it from generating the interrupt...

Anyways, sounds like someone should just double check whether
the status bit ever gets asserted or not. If yes, then the
problem must be with interrupt delivery, otherwise the
problem is that the internal interrupt is never even
generated. In the latter case you could try using the
vsync interrupt instead.
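
The check suggested above can be sketched as a small decision helper. This is only an illustration under stated assumptions: the enum, the function and its parameters are hypothetical names and not part of mgag200, and how the two observations are collected (for example by polling the status register and counting handler invocations) is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical classification of a "vblank wait timed out" failure. */
enum vblank_fault {
	VBLANK_OK,               /* status bit observed and handler ran */
	VBLANK_FAULT_DELIVERY,   /* status bit observed, handler never ran */
	VBLANK_FAULT_GENERATION, /* status bit never observed */
};

/*
 * status_bit_seen: the hardware status bit was observed asserted at least once.
 * handler_ran:     the driver's interrupt handler was invoked at least once.
 * Both inputs are assumptions about what a debugging session would record.
 */
static enum vblank_fault classify_vblank_fault(bool status_bit_seen,
					       bool handler_ran)
{
	/* No status bit means the internal interrupt is never generated. */
	if (!status_bit_seen)
		return VBLANK_FAULT_GENERATION;

	/* The bit is raised, but nothing reaches the CPU: delivery problem. */
	if (!handler_ran)
		return VBLANK_FAULT_DELIVERY;

	return VBLANK_OK;
}
```

Only the VBLANK_FAULT_GENERATION outcome would motivate switching to the vsync interrupt; VBLANK_FAULT_DELIVERY would point at IRQ routing instead.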
Thomas Zimmermann Oct. 4, 2024, 11:03 a.m. UTC | #6
Hi

thanks for your help.


On 04.10.24 at 12:01, Ville Syrjälä wrote:
> On Fri, Oct 04, 2024 at 11:17:02AM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> On 02.10.24 at 18:15, Luck, Tony wrote:
>>>> Thanks for the bug report. Can you provide the output of 'sudo lspci
>>>> -vvv' for the graphics device?
>>> Thomas,
>>>
>>> Sure. Here's the output (run on the v6.11.0 kernel)
>> Thanks. It doesn't look much different from other systems. IRQ is also
>> assigned.
>>
>> Attached is a patch that fixes a possible off-by-one error in the
>> register settings. This would affect the bug you're reporting. If
>> possible, please apply the patch to your 6.12-rc1, test and report the
>> result.
> Didn't one of these weird variants have some bug where the
> CRTC startadd was not working? Is this one of those?

Yeah, but it seems unrelated.

> That to me sounds like maybe linecomp has internally been
> tied to be always active somehow. Perhaps that would
> also prevent it from generating the interrupt...

Linecomp is usually set to vtotal and that disables the irq. When set to 
vblank_start/vdisplay_end, it acts like a vblank IRQ. But the other 
matrox drivers I saw (fbdev, Xorg-video-matrox) set the value -1, while 
mgag200 doesn't. So there really is an off-by-one error.
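
A minimal sketch of the arithmetic being discussed, assuming a simplified timing structure. The struct and function names are hypothetical and do not come from mgag200 or the other Matrox drivers; the rule that a value equal to vtotal never matches is taken from the description above, not from the hardware manual.

```c
#include <stdint.h>

/* Hypothetical subset of display timings; not the driver's real structure. */
struct timings_sketch {
	uint32_t vdisplay; /* number of visible scanlines, assumed > 0 */
	uint32_t vtotal;   /* total scanlines per frame */
};

/* Value as described for the other Matrox drivers: vblank start minus one. */
static uint32_t linecomp_minus_one(const struct timings_sketch *t)
{
	return t->vdisplay - 1;
}

/* Value without the adjustment, i.e. the suspected off-by-one. */
static uint32_t linecomp_unadjusted(const struct timings_sketch *t)
{
	return t->vdisplay;
}

/* Assumption from the text: a line compare equal to vtotal never fires. */
static int linecomp_can_fire(const struct timings_sketch *t, uint32_t linecomp)
{
	return linecomp < t->vtotal;
}
```

For a mode with 768 visible lines and a vtotal of 806, the two variants give 767 and 768; both are below vtotal, so under this model either one could still raise the interrupt.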

>
> Anyways, sounds like someone should just double check whether
> the status bit ever gets asserted or not. If yes, then the
> problem must be with interrupt delivery, otherwise the
> problem is that the internal interrupt is never even
> generated. In the latter case you could try using the
> vsync interrupt instead.

I didn't want to go into full debugging while there's a low-hanging fix 
to try first. I'll probably take that patch anyway even if it doesn't 
fix the reported bug.

Wrt. vsync: isn't that way too late for vblank events? Does DRM give any
timing guarantees? (It doesn't AFAIK.) Or does it just mean that a 
vblank has happened at some point in the past?

Best regards
Thomas

>
Ville Syrjälä Oct. 4, 2024, 11:19 a.m. UTC | #7
On Fri, Oct 04, 2024 at 01:03:21PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> thanks for your help.
> 
> 
> On 04.10.24 at 12:01, Ville Syrjälä wrote:
> > On Fri, Oct 04, 2024 at 11:17:02AM +0200, Thomas Zimmermann wrote:
> >> Hi
> >>
> >> On 02.10.24 at 18:15, Luck, Tony wrote:
> >>>> Thanks for the bug report. Can you provide the output of 'sudo lspci
> >>>> -vvv' for the graphics device?
> >>> Thomas,
> >>>
> >>> Sure. Here's the output (run on the v6.11.0 kernel)
> >> Thanks. It doesn't look much different from other systems. IRQ is also
> >> assigned.
> >>
> >> Attached is a patch that fixes a possible off-by-one error in the
> >> register settings. This would affect the bug you're reporting. If
> >> possible, please apply the patch to your 6.12-rc1, test and report the
> >> result.
> > Didn't one of these weird variants have some bug where the
> > CRTC startadd was not working? Is this one of those?
> 
> Yeah, but it seems unrelated.
> 
> > That to me sounds like maybe linecomp has internally been
> > tied to be always active somehow. Perhaps that would
> > also prevent it from generating the interrupt...
> 
> Linecomp is usually set to vtotal and that disables the irq. When set to 
> vblank_start/vdisplay_end, it acts like a vblank IRQ. But the other 
> matrox drivers I saw (fbdev, Xorg-video-matrox) set the value -1, while 
> mgag200 doesn't. So there really is an off-by-one error.

For the purposes of the interrupt it shouldn't matter
at all what the linecomp value is, as long as it's
between 0 and vtotal. The patch seemed to just care
about vblkstr which doesn't seem relevant to me.

> 
> >
> > Anyways, sounds like someone should just double check whether
> > the status bit ever gets asserted or not. If yes, then the
> > problem must be with interrupt delivery, otherwise the
> > problem is that the internal interrupt is never even
> > generated. In the latter case you could try using the
> > vsync interrupt instead.
> 
> I didn't want to go into full debugging while there's a low-hanging fix 
> to try first. I'll probably take that patch anyway even if it doesn't 
> fix the reported bug.
> 
> Wrt. vsync: isn't that way too late for vblank events? Does DRM give any
> timing guarantees? (It doesn't AFAIK.) Or does it just mean that a 
> vblank has happened at some point in the past?

It doesn't really matter when the interrupt gets signalled
as long as it's after vblank start. And since the hardware
doesn't even have double-buffered registers and IIRC doesn't
really care when you reprogram eg. the start address it should
matter even less. Not that it looks like you even try to
do any atomic updates from the vblank handler, so I guess
you just want this for throttling purposes?
Thomas Zimmermann Oct. 4, 2024, 11:47 a.m. UTC | #8
Hi

On 04.10.24 at 13:19, Ville Syrjälä wrote:
> On Fri, Oct 04, 2024 at 01:03:21PM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> thanks for your help.
>>
>>
>> On 04.10.24 at 12:01, Ville Syrjälä wrote:
>>> On Fri, Oct 04, 2024 at 11:17:02AM +0200, Thomas Zimmermann wrote:
>>>> Hi
>>>>
>>>> On 02.10.24 at 18:15, Luck, Tony wrote:
>>>>>> Thanks for the bug report. Can you provide the output of 'sudo lspci
>>>>>> -vvv' for the graphics device?
>>>>> Thomas,
>>>>>
>>>>> Sure. Here's the output (run on the v6.11.0 kernel)
>>>> Thanks. It doesn't look much different from other systems. IRQ is also
>>>> assigned.
>>>>
>>>> Attached is a patch that fixes a possible off-by-one error in the
>>>> register settings. This would affect the bug you're reporting. If
>>>> possible, please apply the patch to your 6.12-rc1, test and report the
>>>> result.
>>> Didn't one of these weird variants have some bug where the
>>> CRTC startadd was not working? Is this one of those?
>> Yeah, but it seems unrelated.
>>
>>> That to me sounds like maybe linecomp has internally been
>>> tied to be always active somehow. Perhaps that would
>>> also prevent it from generating the interrupt...
>> Linecomp is usually set to vtotal and that disables the irq. When set to
>> vblank_start/vdisplay_end, it acts like a vblank IRQ. But the other
>> matrox drivers I saw (fbdev, Xorg-video-matrox) set the value -1, while
>> mgag200 doesn't. So there really is an off-by-one error.
> For the purposes of the interrupt it shouldn't matter
> at all what the linecomp value is, as long as it's
> between 0 and vtotal. The patch seemed to just care
> about vblkstr which doesn't seem relevant to me.

vblkstr is "vblank start" and equal to vdisplay_end. Then linecomp = 
vblkstr; happens at some later point in the function.

I've run into several mysterious vblank timeouts while making this 
patchset and they all seemed to be related to the exact values in these 
registers. So I'm not sure if linecomp really fires an interrupt if it 
happens too late after vdisplay_end/vblank_start. The official 
documentation is a bit confusing IIRC. So my first step here is to make 
mgag200 behave like other existing drivers and see if that fixes the 
issue. Hence the off-by-one fix.

>
>>> Anyways, sounds like someone should just double check whether
>>> the status bit ever gets asserted or not. If yes, then the
>>> problem must be with interrupt delivery, otherwise the
>>> problem is that the internal interrupt is never even
>>> generated. In the latter case you could try using the
>>> vsync interrupt instead.
>> I didn't want to go into full debugging while there's a low-hanging fix
>> to try first. I'll probably take that patch anyway even if it doesn't
>> fix the reported bug.
>>
>> Wrt. vsync: isn't that way too late for vblank events? Does DRM give any
>> timing guarantees? (It doesn't AFAIK.) Or does it just mean that a
>> vblank has happened at some point in the past?
> It doesn't really matter when the interrupt gets signalled
> as long as it's after vblank start. And since the hardware
> doesn't even have double-buffered registers and IIRC doesn't
> really care when you reprogram eg. the start address it should
> matter even less. Not that it looks like you even try to
> do any atomic updates from the vblank handler, so I guess
> you just want this for throttling purposes?

I see. VSYNC would likely work for that. Throttling is the main purpose.

Best regards
Thomas

>
Tony Luck Oct. 4, 2024, 4:58 p.m. UTC | #9
Thomas,

v6.12-rc1 plus your off-by-one patch is still broken.

Console log from when things went off the rails:

[   32.126676] Console: switching to colour dummy device 80x25
[   32.134887] mgag200 0000:08:00.0: vgaarb: deactivate vga console
[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Started Forward Password R…[   32.155183] mpt2sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(9), sge_per_io(128), chains_per_io(15)
s to Plymouth Di[   32.157213] [drm] Initialized mgag200 1.0.0 for 0000:08:00.0 on minor 0
rectory Watch   32.167994] mpt2sas_cm0: request pool(0x00000000b4be1d72) - dma(0xf880000): depth(3200), frame_size(128), pool_size(400 kB)
m.
[  OK  ] Reached target Path Units.
[  OK  ] Reached target Basic System.
[   32.190444] fbcon: mgag200drmfb (fb0) is primary device
[   32.224946] mpt2sas_cm0: sense pool(0x000000005610eff3) - dma(0x10100000): depth(2939), element_size(96), pool_size (275 kB)
[   32.225059] mpt2sas_cm0: reply pool(0x000000000f24e619) - dma(0x10180000): depth(3264), frame_size(128), pool_size(408 kB)
[   32.225073] mpt2sas_cm0: config page(0x00000000ba53d4ed) - dma(0xfea3000): size(512)
[   32.225076] mpt2sas_cm0: Allocated physical memory: size(7012 kB)
[   32.225078] mpt2sas_cm0: Current Controller Queue Depth(2936),Max Controller Queue Depth(3072)
[   32.225080] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[   32.242578] ixgbe 0000:03:00.0: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0
[   32.273473] mpt2sas_cm0: LSISAS2308: FWVersion(17.00.01.00), ChipRevision(0x05)
[   32.273486] mpt2sas_cm0: Intel(R) Controller: Subsystem ID: 0x3050
[   32.273490] mpt2sas_cm0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[   32.273693] scsi host6: Fusion MPT SAS Host
[   32.281337] mpt2sas_cm0: sending port enable !!
[   32.327525] ixgbe 0000:03:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:03.2 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[   32.349384] ------------[ cut here ]------------
[   32.349467] [CRTC:34:crtc-0] vblank wait timed out
[   32.349549] WARNING: CPU: 164 PID: 1820 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[   32.349600] Modules linked in: crct10dif_pclmul crc32_pclmul crc32c_intel mgag200(+) ghash_clmulni_intel i2c_algo_bit sha512_ssse3 drm_shmem_helper drm_kms_helper sha256_ssse3 sha1_ssse3 ixgbe(+) mpt3sas mdio drm raid_class scsi_transport_sas dca wmi fuse
[   32.349676] CPU: 164 UID: 0 PID: 1820 Comm: systemd-udevd Not tainted 6.12.0-rc1+ #170
[   32.349694] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[   32.349696] RIP: 0010:drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[   32.349706] Code: 00 48 8d 7b 08 e8 61 96 36 e8 45 85 ff 0f 85 d3 fe ff ff 49 8b 56 20 41 8b b6 d8 00 00 00 48 c7 c7 b0 e0 e1 c0 e8 21 3d 2e e8 <0f> 0b e9 b5 fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[   32.349708] RSP: 0018:ffffa373a4097680 EFLAGS: 00010282
[   32.349712] RAX: 0000000000000026 RBX: ffff94c556084028 RCX: 0000000000000000
[   32.349715] RDX: 0000000000000002 RSI: ffffffffaaa3ec38 RDI: 00000000ffffffff
[   32.349717] RBP: ffff94c55a259e00 R08: 0000000000000001 R09: 0000000000000000
[   32.349719] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[   32.349721] R13: 0000000000000000 R14: ffff94c55b1893f0 R15: 0000000000000000
[   32.349723] FS:  00007fc95bdc2b40(0000) GS:ffff94d1b2800000(0000) knlGS:0000000000000000
[   32.349726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   32.349728] CR2: 00007f906e3ca948 CR3: 000000000cd1c002 CR4: 00000000003706f0
[   32.349730] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   32.349732] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   32.349734] Call Trace:
[   32.349736]  <TASK>
[   32.349739]  ? __warn+0x90/0x1a0
[   32.349754]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[   32.349764]  ? report_bug+0x1c3/0x1d0
[   32.349779]  ? handle_bug+0x5b/0xa0
[   32.349789]  ? exc_invalid_op+0x14/0x70
[   32.349793]  ? asm_exc_invalid_op+0x16/0x20
[   32.349811]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[   32.349822]  ? __pfx_autoremove_wake_function+0x10/0x10
[   32.349833]  drm_atomic_helper_commit_tail+0x71/0x80 [drm_kms_helper]
[   32.349843]  mgag200_mode_config_helper_atomic_commit_tail+0x28/0x40 [mgag200]
[   32.349851]  commit_tail+0x94/0x130 [drm_kms_helper]
[   32.349862]  drm_atomic_helper_commit+0x13e/0x170 [drm_kms_helper]
[   32.349872]  drm_atomic_commit+0x97/0xb0 [drm]
[   32.349923]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[   32.349955]  drm_client_modeset_commit_atomic+0x207/0x250 [drm]
[   32.349991]  drm_client_modeset_commit_locked+0x5b/0x190 [drm]
[   32.350015]  drm_client_modeset_commit+0x24/0x50 [drm]
[   32.350038]  __drm_fb_helper_restore_fbdev_mode_unlocked+0x95/0xd0 [drm_kms_helper]
[   32.350050]  drm_fb_helper_set_par+0x2e/0x40 [drm_kms_helper]
[   32.350059]  fbcon_init+0x2a8/0x560
[   32.350070]  visual_init+0xc4/0x120
[   32.350078]  do_bind_con_driver.isra.0+0x1a1/0x3d0
[   32.350086]  do_take_over_console+0x10b/0x1a0
[   32.350092]  do_fbcon_takeover+0x5c/0xc0
[   32.350095]  fbcon_fb_registered+0x49/0x70
[   32.350098]  do_register_framebuffer+0x184/0x230
[   32.350109]  register_framebuffer+0x20/0x40
[   32.350112]  __drm_fb_helper_initial_config_and_unlock+0x33e/0x590 [drm_kms_helper]
[   32.350122]  ? drm_client_register+0x33/0xc0 [drm]
[   32.350154]  drm_fbdev_shmem_client_hotplug+0x6c/0xc0 [drm_shmem_helper]
[   32.350160]  drm_client_register+0x7b/0xc0 [drm]
[   32.350184]  mgag200_pci_probe+0x90/0x180 [mgag200]
[   32.350191]  local_pci_probe+0x46/0xa0
[   32.350199]  pci_device_probe+0xb5/0x220
[   32.350206]  really_probe+0xd9/0x380
[   32.350214]  __driver_probe_device+0x78/0x150
[   32.350249]  driver_probe_device+0x1e/0x90
[   32.350254]  __driver_attach+0xd6/0x1d0
[   32.350258]  ? __pfx___driver_attach+0x10/0x10
[   32.350261]  bus_for_each_dev+0x66/0xa0
[   32.350267]  bus_add_driver+0x111/0x240
[   32.350272]  driver_register+0x5c/0x120
[   32.350280]  ? __pfx_mgag200_pci_driver_init+0x10/0x10 [mgag200]
[   32.350285]  do_one_initcall+0x62/0x3a0
[   32.350299]  ? __kmalloc_cache_noprof+0x240/0x300
[   32.350315]  do_init_module+0x64/0x240
[   32.350329]  init_module_from_file+0x7a/0xa0
[   32.350341]  idempotent_init_module+0x15f/0x260
[   32.350353]  __x64_sys_finit_module+0x5a/0xb0
[   32.350358]  do_syscall_64+0x73/0x190
[   32.350364]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   32.350368] RIP: 0033:0x7fc95ca07e0d
[   32.350371] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
[   32.350373] RSP: 002b:00007ffef10b2468 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   32.350377] RAX: ffffffffffffffda RBX: 0000562610eb0f80 RCX: 00007fc95ca07e0d
[   32.350379] RDX: 0000000000000000 RSI: 00007fc95cb6132c RDI: 0000000000000010
[   32.350381] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[   32.350384] R10: 0000000000000010 R11: 0000000000000246 R12: 00007fc95cb6132c
[   32.350386] R13: 0000562610ed00d0 R14: 0000000000000007 R15: 0000562610ed0380
[   32.350397]  </TASK>
[   32.350399] irq event stamp: 51727
[   32.350402] hardirqs last  enabled at (51733): [<ffffffffa918f854>] vprintk_emit+0x3d4/0x3e0
[   32.350407] hardirqs last disabled at (51738): [<ffffffffa918f807>] vprintk_emit+0x387/0x3e0
[   32.350409] softirqs last  enabled at (51576): [<ffffffffa90e2891>] __irq_exit_rcu+0xa1/0x110
[   32.350420] softirqs last disabled at (51569): [<ffffffffa90e2891>] __irq_exit_rcu+0xa1/0x110
[   32.350423] ---[ end trace 0000000000000000 ]---
[   32.350452] Console: switching to colour frame buffer device 128x48
Thomas Zimmermann Oct. 7, 2024, 1:37 p.m. UTC | #10
Hi

Am 04.10.24 um 12:01 schrieb Ville Syrjälä:
> On Fri, Oct 04, 2024 at 11:17:02AM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 02.10.24 um 18:15 schrieb Luck, Tony:
>>>> Thanks for the bug report. Can you provide the output of 'sudo lspci
>>>> -vvv' for the graphics device?
>>> Thomas,
>>>
>>> Sure. Here's the output (run on the v6.11.0 kernel)
>> Thanks. It doesn't look much different from other systems. IRQ is also
>> assigned.
>>
>> Attached is a patch that fixes a possible off-by-one error in the
>> register settings. This would affect the bug you're reporting. If
>> possible, please apply the patch to your 6.12-rc1, test and report the
>> result.
> Didn't one of these weird variants have some bug where the
> CRTC startadd was not working? Is this one of those?
> That to me sounds like maybe linecomp has internally been
> tied to be always active somehow. Perhaps that would
> also prevent it from generating the interrupt...

Impressive debugging skills! The broken chip has device id 0x0522
according to commit 21e74bf99596 ("drm/mgag200: Store HW_BUG_NO_STARTADD
flag in device info"). And that's the same type that Tony reported. [1]
I'm just not sure if it's worth special-casing the chip again or simply
reverting the vblank IRQs.

Best regards
Thomas

[1] https://admin.pci-ids.ucw.cz/read/PC/102b/0522

>
> Anyways, sounds like someone should just double check whether
> the status bit ever gets asserted or not. If yes, then the
> problem must be with interrupt delivery, otherwise the
> problem is that the internal interrupt is never even
> generated. In the latter case you could try using the
> vsync interrupt instead.
>
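
The check Ville describes above can be written down as a small decision function. This is only a userspace sketch, not driver code: the function name, the enum, and the bit position HYP_STATUS_VLINE_PENDING are assumptions made for illustration, and the real status bit has to be taken from the Matrox manual or mgag200_reg.h. It takes a few polled status-register values plus a count of how often the IRQ handler ran, and separates "interrupt never generated" from "interrupt generated but not delivered".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position for the "vertical line pending" status flag.
 * The real value must come from the hardware manual or mgag200_reg.h. */
#define HYP_STATUS_VLINE_PENDING (1u << 5)

enum irq_diagnosis {
	IRQ_NEVER_GENERATED, /* status bit never asserts; try the vsync interrupt */
	IRQ_DELIVERY_BROKEN, /* status bit asserts, but the handler never runs */
	IRQ_WORKING,         /* the handler runs, so generation and delivery work */
};

/*
 * status_samples: values polled from the status register over a few frames
 * n:              number of samples
 * handler_calls:  how often the IRQ handler ran during the same period
 */
enum irq_diagnosis diagnose_vblank_irq(const uint32_t *status_samples,
				       size_t n, unsigned int handler_calls)
{
	bool bit_seen = false;
	size_t i;

	assert(status_samples != NULL || n == 0);

	/* A running handler may clear the bit before we poll it, so a
	 * non-zero handler count is sufficient evidence on its own. */
	if (handler_calls > 0)
		return IRQ_WORKING;

	for (i = 0; i < n; i++) {
		if (status_samples[i] & HYP_STATUS_VLINE_PENDING)
			bit_seen = true;
	}

	return bit_seen ? IRQ_DELIVERY_BROKEN : IRQ_NEVER_GENERATED;
}
```

On the system reported in this thread, an IRQ_NEVER_GENERATED result would point towards trying the vsync interrupt, as suggested above.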
Ville Syrjälä Oct. 8, 2024, 6:45 p.m. UTC | #11
On Mon, Oct 07, 2024 at 03:37:40PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 04.10.24 um 12:01 schrieb Ville Syrjälä:
> > On Fri, Oct 04, 2024 at 11:17:02AM +0200, Thomas Zimmermann wrote:
> >> Hi
> >>
> >> Am 02.10.24 um 18:15 schrieb Luck, Tony:
> >>>> Thanks for the bug report. Can you provide the output of 'sudo lspci
> >>>> -vvv' for the graphics device?
> >>> Thomas,
> >>>
> >>> Sure. Here's the output (run on the v6.11.0 kernel)
> >> Thanks. It doesn't look much different from other systems. IRQ is also
> >> assigned.
> >>
> >> Attached is a patch that fixes a possible off-by-one error in the
> >> register settings. This would affect the bug you're reporting. If
> >> possible, please apply the patch to your 6.12-rc1, test and report the
> >> result.
> > Didn't one of these weird variants have some bug where the
> > CRTC startadd was not working? Is this one of those?
> > That to me sounds like maybe linecomp has internally been
> > tied to be always active somehow. Perhaps that would
> > also prevent it from generating the interrupt...
> 
> Impressive debugging skills! The broken chip has device id 0x0522
> according to commit 21e74bf99596 ("drm/mgag200: Store HW_BUG_NO_STARTADD
> flag in device info"). And that's the same type that Tony reported. [1]
> I'm just not sure if it's worth special-casing the chip again or simply
> reverting the vblank IRQs.

Heh. Though I'm not sure if my theory is quite right. It
seems I've been confused about linecomp all these years;
I thought the split screen effect affected both VGA and
MGA modes (at least on the older chips), but looks like
it never affected MGA mode. I tested it here on a 2064w
based card, which is almost as old as you can go (I do
have an older Athena based card somewhere as well but
didn't bother digging it up).
Thomas Zimmermann Oct. 10, 2024, 9:21 a.m. UTC | #12
Hi

Am 04.10.24 um 18:58 schrieb Luck, Tony:
> Thomas,
>
> v6.12-rc1 plus your off-by-one patch is still broken.

Thanks for testing. Here's another patch to try Ville's suggestion. It 
should disable HW vblank IRQs on your system. Could you please test it 
and report on the results?

Best regards
Thomas


Tony Luck Oct. 10, 2024, 4:07 p.m. UTC | #13
> Thanks for testing. Here's another patch to try Ville's suggestion. It 
> should disable HW vblank IRQs on your system. Could you please test it 
> and report on the results?

Thomas,

Thanks for continuing to work on this. The output is different, but it still dies with vblank problems.

[  OK  ] Started GNOME Display Manager.
[  329.575813] mgag200 0000:08:00.0: [drm] *ERROR* flip_done timed out
[  329.582889] mgag200 0000:08:00.0: [drm] *ERROR* [PLANE:32:plane-0] commit wait timed out
[  329.719779] ------------[ cut here ]------------
[  329.725174] [CRTC:34:crtc-0] vblank wait timed out
[  329.730724] WARNING: CPU: 150 PID: 1402 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[  329.746264] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set rfkill nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac iTCO_wdt intel_pmc_bxt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp rapl intel_cstate joydev intel_uncore acpi_ipmi pcspkr mei_me i2c_i801 i2c_smbus ipmi_si lpc_ich mei ioatdma wmi ipmi_devintf ipmi_msghandler acpi_pad zram ip_tables crct10dif_pclmul crc32_pclmul mgag200 crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_shmem_helper sha512_ssse3
[  329.746604]  drm_kms_helper sha256_ssse3 mpt3sas sha1_ssse3 ixgbe raid_class mdio drm scsi_transport_sas dca fuse
[  329.858506] CPU: 150 UID: 0 PID: 1402 Comm: kworker/150:1 Tainted: G        W          6.12.0-rc2+ #171
[  329.869030] Tainted: [W]=WARN
[  329.872357] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[  329.883941] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper]
[  329.891472] RIP: 0010:drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[  329.900937] Code: 00 48 8d 7b 08 e8 41 b7 38 d1 45 85 ff 0f 85 d3 fe ff ff 49 8b 56 20 41 8b b6 d8 00 00 00 48 c7 c7 b0 40 df c0 e8 21 61 30 d1 <0f> 0b e9 b5 fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[  329.921932] RSP: 0018:ffffbb9f23277c00 EFLAGS: 00010286
[  329.927797] RAX: 0000000000000026 RBX: ffff9de18562e028 RCX: 0000000000000000
[  329.935793] RDX: 0000000000000002 RSI: ffffffff93a00e78 RDI: 00000000ffffffff
[  329.943786] RBP: ffff9e13d910dc80 R08: 0000000000000000 R09: ffffbb9f23277ac0
[  329.951778] R10: ffffbb9f23277ab8 R11: ffff9e33811fffe8 R12: 0000000000000000
[  329.959784] R13: 0000000000000000 R14: ffff9de0ada653f0 R15: 0000000000000000
[  329.967777] FS:  0000000000000000(0000) GS:ffff9e2032100000(0000) knlGS:0000000000000000
[  329.976838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  329.983280] CR2: 0000555ce9d0d030 CR3: 0000003eccc3a004 CR4: 00000000003706f0
[  329.991273] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  329.999268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  330.007262] Call Trace:
[  330.010011]  <TASK>
[  330.012383]  ? __warn+0x90/0x1a0
[  330.016022]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[  330.024803]  ? report_bug+0x1c3/0x1d0
[  330.028924]  ? __irq_work_queue_local+0x48/0x130
[  330.034116]  ? handle_bug+0x5b/0xa0
[  330.038043]  ? exc_invalid_op+0x14/0x70
[  330.042353]  ? asm_exc_invalid_op+0x16/0x20
[  330.047064]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[  330.055851]  ? __pfx_autoremove_wake_function+0x10/0x10
[  330.061723]  drm_atomic_helper_commit_tail+0x71/0x80 [drm_kms_helper]
[  330.068954]  mgag200_mode_config_helper_atomic_commit_tail+0x28/0x40 [mgag200]
[  330.077057]  commit_tail+0x94/0x130 [drm_kms_helper]
[  330.082642]  drm_atomic_helper_commit+0x13e/0x170 [drm_kms_helper]
[  330.089597]  drm_atomic_commit+0x97/0xb0 [drm]
[  330.094706]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[  330.100624]  drm_atomic_helper_dirtyfb+0x185/0x250 [drm_kms_helper]
[  330.107672]  drm_fbdev_shmem_helper_fb_dirty+0x4c/0xb0 [drm_shmem_helper]
[  330.115282]  drm_fb_helper_damage_work+0x83/0x150 [drm_kms_helper]
[  330.122221]  process_one_work+0x214/0x600
[  330.126727]  worker_thread+0x17f/0x320
[  330.130932]  ? __pfx_worker_thread+0x10/0x10
[  330.135714]  kthread+0xe0/0x110
[  330.139245]  ? __pfx_kthread+0x10/0x10
[  330.143455]  ret_from_fork+0x30/0x50
[  330.147473]  ? __pfx_kthread+0x10/0x10
[  330.151683]  ret_from_fork_asm+0x1a/0x30
[  330.156104]  </TASK>
[  330.158553] irq event stamp: 68963
[  330.162368] hardirqs last  enabled at (68975): [<ffffffff92183fae>] __up_console_sem+0x5e/0x70
[  330.172011] hardirqs last disabled at (68986): [<ffffffff92183f93>] __up_console_sem+0x43/0x70
[  330.181647] softirqs last  enabled at (68850): [<ffffffff920dac91>] __irq_exit_rcu+0xa1/0x110
[  330.191195] softirqs last disabled at (69007): [<ffffffff920dac91>] __irq_exit_rcu+0xa1/0x110
[  330.200734] ---[ end trace 0000000000000000 ]---
[  340.327342] mgag200 0000:08:00.0: [drm] *ERROR* flip_done timed out
[  340.334379] mgag200 0000:08:00.0: [drm] *ERROR* [CRTC:34:crtc-0] commit wait timed out
[  350.566891] mgag200 0000:08:00.0: [drm] *ERROR* flip_done timed out
[  350.573925] mgag200 0000:08:00.0: [drm] *ERROR* [PLANE:32:plane-0] commit wait timed out
[  350.710886] ------------[ cut here ]------------
Tony Luck Oct. 10, 2024, 6:12 p.m. UTC | #14
Apologies. The trace below isn't the first place where things went wrong. I dug up the full serial log
and found some earlier mgag200 errors. The actual first one is:

[  OK  ] Reached target Basic System.
[   32.366479] fbcon: mgag200drmfb (fb0) is primary device
[   32.405678] mpt2sas_cm0: sense pool(0x00000000dfa0f36f) - dma(0xf500000): depth(2939), element_size(96), pool_size (275 kB)
[   32.405790] mpt2sas_cm0: reply pool(0x000000004919fe15) - dma(0xf580000): depth(3264), frame_size(128), pool_size(408 kB)
[   32.405804] mpt2sas_cm0: config page(0x00000000ac9398d5) - dma(0xf2e4000): size(512)
[   32.405806] mpt2sas_cm0: Allocated physical memory: size(7012 kB)
[   32.405808] mpt2sas_cm0: Current Controller Queue Depth(2936),Max Controller Queue Depth(3072)
[   32.405810] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[   32.436831] ixgbe 0000:03:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:03.2 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[   32.454205] mpt2sas_cm0: LSISAS2308: FWVersion(17.00.01.00), ChipRevision(0x05)
[   32.454218] mpt2sas_cm0: Intel(R) Controller: Subsystem ID: 0x3050
[   32.454222] mpt2sas_cm0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[   32.454513] scsi host6: Fusion MPT SAS Host
[   32.461960] mpt2sas_cm0: sending port enable !!
[   32.520483] ------------[ cut here ]------------
[   32.520517] [CRTC:34:crtc-0] vblank wait timed out
[   32.520582] WARNING: CPU: 114 PID: 1783 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[   32.520603] Modules linked in: crct10dif_pclmul crc32_pclmul mgag200(+) crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_shmem_helper sha512_ssse3 drm_kms_helper sha256_ssse3 mpt3sas sha1_ssse3 ixgbe(+) raid_class mdio drm scsi_transport_sas dca fuse
[   32.520631] CPU: 114 UID: 0 PID: 1783 Comm: systemd-udevd Not tainted 6.12.0-rc2+ #171
[   32.520634] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[   32.520637] RIP: 0010:drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[   32.520648] Code: 00 48 8d 7b 08 e8 41 b7 38 d1 45 85 ff 0f 85 d3 fe ff ff 49 8b 56 20 41 8b b6 d8 00 00 00 48 c7 c7 b0 40 df c0 e8 21 61 30 d1 <0f> 0b e9 b5 fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[   32.520651] RSP: 0018:ffffbb9f23fc3680 EFLAGS: 00010282
[   32.520655] RAX: 0000000000000026 RBX: ffff9de18562e028 RCX: 0000000000000000
[   32.520657] RDX: 0000000000000002 RSI: ffffffff93a00e78 RDI: 00000000ffffffff
[   32.520659] RBP: ffff9de18540eb40 R08: 0000000000000001 R09: 0000000000000000
[   32.520662] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[   32.520664] R13: 0000000000000000 R14: ffff9de0ada653f0 R15: 0000000000000000
[   32.520667] FS:  00007f64988d9b40(0000) GS:ffff9de187900000(0000) knlGS:0000000000000000
[   32.520669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   32.520671] CR2: 000055daa91a4b40 CR3: 000000000c13c003 CR4: 00000000003706f0
[   32.520674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   32.520675] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   32.520677] Call Trace:
[   32.520680]  <TASK>
[   32.520682]  ? __warn+0x90/0x1a0
[   32.520693]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[   32.520703]  ? report_bug+0x1c3/0x1d0
[   32.520716]  ? handle_bug+0x5b/0xa0
[   32.520724]  ? exc_invalid_op+0x14/0x70
[   32.520727]  ? asm_exc_invalid_op+0x16/0x20
[   32.520741]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[   32.520753]  ? __pfx_autoremove_wake_function+0x10/0x10
[   32.520766]  drm_atomic_helper_commit_tail+0x71/0x80 [drm_kms_helper]
[   32.520776]  mgag200_mode_config_helper_atomic_commit_tail+0x28/0x40 [mgag200]
[   32.520784]  commit_tail+0x94/0x130 [drm_kms_helper]
[   32.520796]  drm_atomic_helper_commit+0x13e/0x170 [drm_kms_helper]
[   32.520807]  drm_atomic_commit+0x97/0xb0 [drm]
[   32.520850]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[   32.520881]  drm_client_modeset_commit_atomic+0x207/0x250 [drm]
[   32.520918]  drm_client_modeset_commit_locked+0x5b/0x190 [drm]
[   32.520945]  drm_client_modeset_commit+0x24/0x50 [drm]
[   32.520970]  __drm_fb_helper_restore_fbdev_mode_unlocked+0x95/0xd0 [drm_kms_helper]
[   32.520982]  drm_fb_helper_set_par+0x2e/0x40 [drm_kms_helper]
[   32.520992]  fbcon_init+0x2a8/0x560
[   32.521005]  visual_init+0xc4/0x120
[   32.521013]  do_bind_con_driver.isra.0+0x1a1/0x3d0
[   32.521020]  do_take_over_console+0x10b/0x1a0
[   32.521026]  do_fbcon_takeover+0x5c/0xc0
[   32.521028]  fbcon_fb_registered+0x49/0x70
[   32.521032]  do_register_framebuffer+0x184/0x230
[   32.521041]  register_framebuffer+0x20/0x40
[   32.521044]  __drm_fb_helper_initial_config_and_unlock+0x33e/0x590 [drm_kms_helper]
[   32.521054]  ? drm_client_register+0x33/0xc0 [drm]
[   32.521084]  drm_fbdev_shmem_client_hotplug+0x6c/0xc0 [drm_shmem_helper]
[   32.521090]  drm_client_register+0x7b/0xc0 [drm]
[   32.521116]  mgag200_pci_probe+0x90/0x180 [mgag200]
[   32.521124]  local_pci_probe+0x46/0xa0
[   32.521131]  pci_device_probe+0xb5/0x220
[   32.521138]  really_probe+0xd9/0x380
[   32.521146]  __driver_probe_device+0x78/0x150
[   32.521151]  driver_probe_device+0x1e/0x90
[   32.521155]  __driver_attach+0xd6/0x1d0
[   32.521159]  ? __pfx___driver_attach+0x10/0x10
[   32.521162]  bus_for_each_dev+0x66/0xa0
[   32.521167]  bus_add_driver+0x111/0x240
[   32.521173]  driver_register+0x5c/0x120
[   32.521176]  ? __pfx_mgag200_pci_driver_init+0x10/0x10 [mgag200]
[   32.521182]  do_one_initcall+0x62/0x3a0
[   32.521189]  ? __kmalloc_cache_noprof+0x240/0x300
[   32.521202]  do_init_module+0x64/0x240
[   32.521213]  init_module_from_file+0x7a/0xa0
[   32.521226]  idempotent_init_module+0x15f/0x260
[   32.521240]  __x64_sys_finit_module+0x5a/0xb0
[   32.521245]  do_syscall_64+0x73/0x190
[   32.521260]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   32.521265] RIP: 0033:0x7f649951ee0d
[   32.521271] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
[   32.521273] RSP: 002b:00007ffc84905b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   32.521278] RAX: ffffffffffffffda RBX: 000055daa9188020 RCX: 00007f649951ee0d
[   32.521280] RDX: 0000000000000000 RSI: 00007f649967832c RDI: 0000000000000010
[   32.521282] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[   32.521284] R10: 0000000000000010 R11: 0000000000000246 R12: 00007f649967832c
[   32.521286] R13: 000055daa9189330 R14: 0000000000000007 R15: 000055daa91adb10
[   32.521298]  </TASK>
[   32.521300] irq event stamp: 52913
[   32.521301] hardirqs last  enabled at (52919): [<ffffffff92187784>] vprintk_emit+0x3d4/0x3e0
[   32.521313] hardirqs last disabled at (52924): [<ffffffff92187737>] vprintk_emit+0x387/0x3e0
[   32.521317] softirqs last  enabled at (52274): [<ffffffff920dac91>] __irq_exit_rcu+0xa1/0x110
[   32.521326] softirqs last disabled at (52267): [<ffffffff920dac91>] __irq_exit_rcu+0xa1/0x110
[   32.521329] ---[ end trace 0000000000000000 ]---

-----Original Message-----
From: Luck, Tony 
Sent: Thursday, October 10, 2024 9:07 AM
To: Thomas Zimmermann <tzimmermann@suse.de>
Cc: jfalempe@redhat.com; airlied@redhat.com; sam@ravnborg.org; emil.l.velikov@gmail.com; maarten.lankhorst@linux.intel.com; mripard@kernel.org; airlied@gmail.com; daniel@ffwll.ch; dri-devel@lists.freedesktop.org
Subject: RE: [PATCH v5 0/7] drm/mgag200: Implement VBLANK support

> Thanks for testing. Here's another patch to try Ville's suggestion. It 
> should disable HW vblank IRQs on your system. Could you please test it 
> and report on the results?

Thomas,

Thanks for continuing to work on this. The output is different, but the system still dies with vblank problems.

[  OK  ] Started GNOME Display Manager.
[  329.575813] mgag200 0000:08:00.0: [drm] *ERROR* flip_done timed out
[  329.582889] mgag200 0000:08:00.0: [drm] *ERROR* [PLANE:32:plane-0] commit wait timed out
[  329.719779] ------------[ cut here ]------------
[  329.725174] [CRTC:34:crtc-0] vblank wait timed out
[  329.730724] WARNING: CPU: 150 PID: 1402 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[  329.746264] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set rfkill nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac iTCO_wdt intel_pmc_bxt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp rapl intel_cstate joydev intel_uncore acpi_ipmi pcspkr mei_me i2c_i801 i2c_smbus ipmi_si lpc_ich mei ioatdma wmi ipmi_devintf ipmi_msghandler acpi_pad zram ip_tables crct10dif_pclmul crc32_pclmul mgag200 crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_shmem_helper sha512_ssse3
[  329.746604]  drm_kms_helper sha256_ssse3 mpt3sas sha1_ssse3 ixgbe raid_class mdio drm scsi_transport_sas dca fuse
[  329.858506] CPU: 150 UID: 0 PID: 1402 Comm: kworker/150:1 Tainted: G        W          6.12.0-rc2+ #171
[  329.869030] Tainted: [W]=WARN
[  329.872357] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[  329.883941] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper]
[  329.891472] RIP: 0010:drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[  329.900937] Code: 00 48 8d 7b 08 e8 41 b7 38 d1 45 85 ff 0f 85 d3 fe ff ff 49 8b 56 20 41 8b b6 d8 00 00 00 48 c7 c7 b0 40 df c0 e8 21 61 30 d1 <0f> 0b e9 b5 fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[  329.921932] RSP: 0018:ffffbb9f23277c00 EFLAGS: 00010286
[  329.927797] RAX: 0000000000000026 RBX: ffff9de18562e028 RCX: 0000000000000000
[  329.935793] RDX: 0000000000000002 RSI: ffffffff93a00e78 RDI: 00000000ffffffff
[  329.943786] RBP: ffff9e13d910dc80 R08: 0000000000000000 R09: ffffbb9f23277ac0
[  329.951778] R10: ffffbb9f23277ab8 R11: ffff9e33811fffe8 R12: 0000000000000000
[  329.959784] R13: 0000000000000000 R14: ffff9de0ada653f0 R15: 0000000000000000
[  329.967777] FS:  0000000000000000(0000) GS:ffff9e2032100000(0000) knlGS:0000000000000000
[  329.976838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  329.983280] CR2: 0000555ce9d0d030 CR3: 0000003eccc3a004 CR4: 00000000003706f0
[  329.991273] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  329.999268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  330.007262] Call Trace:
[  330.010011]  <TASK>
[  330.012383]  ? __warn+0x90/0x1a0
[  330.016022]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[  330.024803]  ? report_bug+0x1c3/0x1d0
[  330.028924]  ? __irq_work_queue_local+0x48/0x130
[  330.034116]  ? handle_bug+0x5b/0xa0
[  330.038043]  ? exc_invalid_op+0x14/0x70
[  330.042353]  ? asm_exc_invalid_op+0x16/0x20
[  330.047064]  ? drm_atomic_helper_wait_for_vblanks.part.0+0x24f/0x260 [drm_kms_helper]
[  330.055851]  ? __pfx_autoremove_wake_function+0x10/0x10
[  330.061723]  drm_atomic_helper_commit_tail+0x71/0x80 [drm_kms_helper]
[  330.068954]  mgag200_mode_config_helper_atomic_commit_tail+0x28/0x40 [mgag200]
[  330.077057]  commit_tail+0x94/0x130 [drm_kms_helper]
[  330.082642]  drm_atomic_helper_commit+0x13e/0x170 [drm_kms_helper]
[  330.089597]  drm_atomic_commit+0x97/0xb0 [drm]
[  330.094706]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[  330.100624]  drm_atomic_helper_dirtyfb+0x185/0x250 [drm_kms_helper]
[  330.107672]  drm_fbdev_shmem_helper_fb_dirty+0x4c/0xb0 [drm_shmem_helper]
[  330.115282]  drm_fb_helper_damage_work+0x83/0x150 [drm_kms_helper]
[  330.122221]  process_one_work+0x214/0x600
[  330.126727]  worker_thread+0x17f/0x320
[  330.130932]  ? __pfx_worker_thread+0x10/0x10
[  330.135714]  kthread+0xe0/0x110
[  330.139245]  ? __pfx_kthread+0x10/0x10
[  330.143455]  ret_from_fork+0x30/0x50
[  330.147473]  ? __pfx_kthread+0x10/0x10
[  330.151683]  ret_from_fork_asm+0x1a/0x30
[  330.156104]  </TASK>
[  330.158553] irq event stamp: 68963
[  330.162368] hardirqs last  enabled at (68975): [<ffffffff92183fae>] __up_console_sem+0x5e/0x70
[  330.172011] hardirqs last disabled at (68986): [<ffffffff92183f93>] __up_console_sem+0x43/0x70
[  330.181647] softirqs last  enabled at (68850): [<ffffffff920dac91>] __irq_exit_rcu+0xa1/0x110
[  330.191195] softirqs last disabled at (69007): [<ffffffff920dac91>] __irq_exit_rcu+0xa1/0x110
[  330.200734] ---[ end trace 0000000000000000 ]---
[  340.327342] mgag200 0000:08:00.0: [drm] *ERROR* flip_done timed out
[  340.334379] mgag200 0000:08:00.0: [drm] *ERROR* [CRTC:34:crtc-0] commit wait timed out
[  350.566891] mgag200 0000:08:00.0: [drm] *ERROR* flip_done timed out
[  350.573925] mgag200 0000:08:00.0: [drm] *ERROR* [PLANE:32:plane-0] commit wait timed out
[  350.710886] ------------[ cut here ]------------
Thomas Zimmermann Oct. 11, 2024, 7:03 a.m. UTC | #15
Hi

On 10.10.24 at 20:12, Luck, Tony wrote:
> Apologies. The trace below isn't the first place where things went wrong. I dug up the full serial log
> and found some earlier mgag errors. Actual first one is:

I have to apologize, as the patch I sent was incorrect. The if condition 
was inverted. Here's a fixed patch for you to test.

Best regards
Thomas

Jocelyn Falempe Oct. 11, 2024, 7:09 a.m. UTC | #16
On 02/10/2024 00:41, Tony Luck wrote:
> My system threw out a bunch of stack traces while booting
> v6.12-rc1 and hung.

Sorry for the late reply. When writing DMA support for mgag200, I had 
a few servers where the IRQ wasn't working at all:
https://patchwork.freedesktop.org/series/117380/

Here are my notes from my testing:

hp-dl180 MGA G200e [102b:0522] (rev 02)
[   20.627122] mgag200 0000:02:00.0: [drm] *ERROR* DMA transfer timed out

dell-pem520  G200eR2 [102b:0534]
[  308.168976] mgag200 0000:1a:00.0: [drm] *ERROR* DMA transfer timed out

I don't have access to those machines anymore, but I suspect the IRQ is 
either misconfigured or not connected on them.
Tony Luck Oct. 11, 2024, 4:36 p.m. UTC | #17
Progress! My system now boots. But there's one WARN_ON dump along the way to the "login:" prompt.

Thanks

-Tony

---

[   33.111505] Console: switching to colour dummy device 80x25
[   33.119581] mgag200 0000:08:00.0: vgaarb: deactivate vga console
[   33.139574] [drm] Initialized mgag200 1.0.0 for 0000:08:00.0 on minor 0
[   33.157665] fbcon: mgag200drmfb (fb0) is primary device
[   33.196490] ixgbe 0000:03:00.1: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0
[   33.281367] ixgbe 0000:03:00.1: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:03.2 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[   33.282519] ------------[ cut here ]------------
[   33.282550] mgag200 0000:08:00.0: [drm] drm_WARN_ON(pipe >= dev->num_crtcs)
[   33.282610] WARNING: CPU: 123 PID: 1774 at drivers/gpu/drm/drm_vblank.c:1488 drm_crtc_vblank_on_config+0x1b5/0x210 [drm]
[   33.282687] Modules linked in: crct10dif_pclmul crc32_pclmul mgag200(+) crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_shmem_helper sha512_ssse3 drm_kms_helper sha256_ssse3 sha1_ssse3 mpt3sas ixgbe(+) raid_class mdio drm scsi_transport_sas dca fuse
[   33.282712] CPU: 123 UID: 0 PID: 1774 Comm: systemd-udevd Not tainted 6.12.0-rc2+ #171
[   33.282716] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[   33.282718] RIP: 0010:drm_crtc_vblank_on_config+0x1b5/0x210 [drm]
[   33.282743] Code: 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 34 ce 01 d6 48 c7 c1 78 9b b1 c0 4c 89 e2 48 c7 c7 1e d6 b1 c0 48 89 c6 e8 3b 9f 60 d5 <0f> 0b 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48
[   33.282745] RSP: 0018:ffffbd1ca3f8f660 EFLAGS: 00010282
[   33.282749] RAX: 000000000000003f RBX: ffff9ddf0a498000 RCX: 0000000000000000
[   33.282751] RDX: 0000000000000002 RSI: ffffffff97a00e78 RDI: 00000000ffffffff
[   33.282753] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   33.282755] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9df257758df0
[   33.282757] R13: ffff9ddf0a4993f0 R14: ffffffffc0b726c0 R15: ffff9ddf05d33450
[   33.282758] FS:  00007f66ab8e2b40(0000) GS:ffff9deb61f80000(0000) knlGS:0000000000000000
[   33.282761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   33.282763] CR2: 00007f66ab8c7c4b CR3: 000000000bc04003 CR4: 00000000003706f0
[   33.282765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   33.282766] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   33.282768] Call Trace:
[   33.282771]  <TASK>
[   33.282773]  ? __warn+0x90/0x1a0
[   33.282785]  ? drm_crtc_vblank_on_config+0x1b5/0x210 [drm]
[   33.282808]  ? report_bug+0x1c3/0x1d0
[   33.282819]  ? handle_bug+0x5b/0xa0
[   33.282824]  ? exc_invalid_op+0x14/0x70
[   33.282827]  ? asm_exc_invalid_op+0x16/0x20
[   33.282839]  ? drm_crtc_vblank_on_config+0x1b5/0x210 [drm]
[   33.282862]  ? mgag200_crtc_set_gamma_linear+0x17a/0x190 [mgag200]
[   33.282868]  ? mgag200_enable_display+0x13b/0x160 [mgag200]
[   33.282876]  drm_crtc_vblank_on+0x28/0x40 [drm]
[   33.282898]  drm_atomic_helper_commit_modeset_enables+0xa6/0x240 [drm_kms_helper]
[   33.282920]  drm_atomic_helper_commit_tail+0x50/0x80 [drm_kms_helper]
[   33.282931]  mgag200_mode_config_helper_atomic_commit_tail+0x28/0x40 [mgag200]
[   33.282951]  commit_tail+0x94/0x130 [drm_kms_helper]
[   33.282963]  drm_atomic_helper_commit+0x13e/0x170 [drm_kms_helper]
[   33.282975]  drm_atomic_commit+0x97/0xb0 [drm]
[   33.282996]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[   33.283027]  drm_client_modeset_commit_atomic+0x207/0x250 [drm]
[   33.283060]  drm_client_modeset_commit_locked+0x5b/0x190 [drm]
[   33.283086]  drm_client_modeset_commit+0x24/0x50 [drm]
[   33.283109]  __drm_fb_helper_restore_fbdev_mode_unlocked+0x95/0xd0 [drm_kms_helper]
[   33.283122]  drm_fb_helper_set_par+0x2e/0x40 [drm_kms_helper]
[   33.283132]  fbcon_init+0x2a8/0x560
[   33.283143]  visual_init+0xc4/0x120
[   33.283150]  do_bind_con_driver.isra.0+0x1a1/0x3d0
[   33.283158]  do_take_over_console+0x10b/0x1a0
[   33.283164]  do_fbcon_takeover+0x5c/0xc0
[   33.283167]  fbcon_fb_registered+0x49/0x70
[   33.283170]  do_register_framebuffer+0x184/0x230
[   33.283179]  register_framebuffer+0x20/0x40
[   33.283182]  __drm_fb_helper_initial_config_and_unlock+0x33e/0x590 [drm_kms_helper]
[   33.283193]  ? drm_client_register+0x33/0xc0 [drm]
[   33.283222]  drm_fbdev_shmem_client_hotplug+0x6c/0xc0 [drm_shmem_helper]
[   33.283228]  drm_client_register+0x7b/0xc0 [drm]
[   33.283254]  mgag200_pci_probe+0x90/0x180 [mgag200]
[   33.283262]  local_pci_probe+0x46/0xa0
[   33.283269]  pci_device_probe+0xb5/0x220
[   33.283277]  really_probe+0xd9/0x380
[   33.283288]  __driver_probe_device+0x78/0x150
[   33.283293]  driver_probe_device+0x1e/0x90
[   33.283297]  __driver_attach+0xd6/0x1d0
[   33.283301]  ? __pfx___driver_attach+0x10/0x10
[   33.283305]  bus_for_each_dev+0x66/0xa0
[   33.283311]  bus_add_driver+0x111/0x240
[   33.283317]  driver_register+0x5c/0x120
[   33.283320]  ? __pfx_mgag200_pci_driver_init+0x10/0x10 [mgag200]
[   33.283326]  do_one_initcall+0x62/0x3a0
[   33.283333]  ? __kmalloc_cache_noprof+0x240/0x300
[   33.283343]  do_init_module+0x64/0x240
[   33.283354]  init_module_from_file+0x7a/0xa0
[   33.283366]  idempotent_init_module+0x15f/0x260
[   33.283378]  __x64_sys_finit_module+0x5a/0xb0
[   33.283383]  do_syscall_64+0x73/0x190
[   33.283396]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   33.283399] RIP: 0033:0x7f66ac527e0d
[   33.283403] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
[   33.283406] RSP: 002b:00007ffff0c752b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   33.283410] RAX: ffffffffffffffda RBX: 0000557cd3b38d00 RCX: 00007f66ac527e0d
[   33.283412] RDX: 0000000000000000 RSI: 00007f66ac68132c RDI: 0000000000000010
[   33.283414] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[   33.283416] R10: 0000000000000010 R11: 0000000000000246 R12: 00007f66ac68132c
[   33.283418] R13: 0000557cd3b18eb0 R14: 0000000000000007 R15: 0000557cd3b38f80
[   33.283429]  </TASK>
[   33.283431] irq event stamp: 45133
[   33.283433] hardirqs last  enabled at (45139): [<ffffffff96187784>] vprintk_emit+0x3d4/0x3e0
[   33.283444] hardirqs last disabled at (45144): [<ffffffff96187737>] vprintk_emit+0x387/0x3e0
[   33.283448] softirqs last  enabled at (44822): [<ffffffff960dac91>] __irq_exit_rcu+0xa1/0x110
[   33.283456] softirqs last disabled at (44817): [<ffffffff960dac91>] __irq_exit_rcu+0xa1/0x110
[   33.283459] ---[ end trace 0000000000000000 ]---
[   33.283494] Console: switching to colour frame buffer device 128x48
[   33.379557] ixgbe 0000:03:00.1: MAC: 3, PHY: 0, PBA No: G36748-005
[   33.399852] mgag200 0000:08:00.0: [drm] fb0: mgag200drmfb frame buffer device
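
For readers following the trace: the warning text says the check `pipe >= dev->num_crtcs` failed inside drm_crtc_vblank_on_config(). The following is a minimal userspace sketch of that comparison only. It assumes (this is not confirmed anywhere in the thread) that num_crtcs was still 0 on this code path because vblank support had not been initialized for the device. The names `model_dev` and `vblank_pipe_check_warns` are hypothetical and exist only for illustration; they are not kernel API.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the single field of struct drm_device
 * that the quoted check reads. Not the real kernel structure.
 */
struct model_dev {
	unsigned int num_crtcs; /* assumed to stay 0 if vblank init never ran */
};

/*
 * Models the comparison printed in the log:
 *   drm_WARN_ON(pipe >= dev->num_crtcs)
 * Returns true when the real check would emit the warning.
 */
static inline bool vblank_pipe_check_warns(const struct model_dev *dev,
					   unsigned int pipe)
{
	return pipe >= dev->num_crtcs;
}
```

Under that assumption, even pipe 0 (the only CRTC on this device, crtc-0 in the logs) trips the check when num_crtcs is 0, whereas with num_crtcs set to 1 the same call passes and only an out-of-range pipe index would warn.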
Tony Luck Oct. 11, 2024, 4:44 p.m. UTC | #18
Posted too soon. Some time (kernel timestamps say a few minutes) after the
successful boot the console spewed another stack dump and the machine hung.

-Tony


brk-bdx-01 login: [  364.922549] ------------[ cut here ]------------
[  364.927987] mgag200 0000:08:00.0: [drm] drm_WARN_ON(pipe >= dev->num_crtcs)
[  364.928157] WARNING: CPU: 46 PID: 3556 at drivers/gpu/drm/drm_vblank.c:1347 drm_crtc_vblank_off+0x250/0x270 [drm]
[  364.947651] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set rfkill nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt intel_pmc_bxt iTCO_vendor_support rapl ipmi_ssif intel_cstate intel_uncore acpi_ipmi joydev pcspkr ipmi_si i2c_i801 mei_me ipmi_devintf i2c_smbus lpc_ich mei ioatdma wmi ipmi_msghandler acpi_pad zram ip_tables crct10dif_pclmul crc32_pclmul mgag200
[  364.948006]  crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_shmem_helper sha512_ssse3 drm_kms_helper sha256_ssse3 sha1_ssse3 mpt3sas ixgbe raid_class mdio drm scsi_transport_sas dca fuse
[  365.066964] CPU: 46 UID: 42 PID: 3556 Comm: gnome-shell Tainted: G        W          6.12.0-rc2+ #171
[  365.077283] Tainted: [W]=WARN
[  365.080617] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[  365.092189] RIP: 0010:drm_crtc_vblank_off+0x250/0x270 [drm]
[  365.098473] Code: 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 e9 be 01 d6 48 c7 c1 78 9b b1 c0 4c 89 e2 48 c7 c7 1e d6 b1 c0 48 89 c6 e8 f0 8f 60 d5 <0f> 0b 48 83 c4 20 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 66
[  365.119464] RSP: 0018:ffffbd1ca3d87b20 EFLAGS: 00010282
[  365.125316] RAX: 000000000000003f RBX: ffff9ddf0a498000 RCX: 0000000000000000
[  365.133297] RDX: 0000000000000002 RSI: ffffffff97a00e78 RDI: 00000000ffffffff
[  365.141283] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffbd1ca3d879e0
[  365.149274] R10: ffffbd1ca3d879d8 R11: ffff9e12011fffe8 R12: ffff9df257758df0
[  365.157266] R13: ffff9ddf0a4993f0 R14: ffff9ddf05087a00 R15: ffffffffc0b726c0
[  365.165244] FS:  00007f2ae47fad80(0000) GS:ffff9deb61d00000(0000) knlGS:0000000000000000
[  365.174299] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  365.180736] CR2: 00000be199403000 CR3: 00000000127a8001 CR4: 00000000003706f0
[  365.188726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  365.196718] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  365.204709] Call Trace:
[  365.207462]  <TASK>
[  365.209829]  ? __warn+0x90/0x1a0
[  365.213469]  ? drm_crtc_vblank_off+0x250/0x270 [drm]
[  365.219104]  ? report_bug+0x1c3/0x1d0
[  365.223226]  ? handle_bug+0x5b/0xa0
[  365.227150]  ? exc_invalid_op+0x14/0x70
[  365.231455]  ? asm_exc_invalid_op+0x16/0x20
[  365.236159]  ? drm_crtc_vblank_off+0x250/0x270 [drm]
[  365.241762]  ? _raw_spin_unlock_irq+0x24/0x50
[  365.246653]  ? lockdep_hardirqs_on+0x7b/0x100
[  365.251549]  mgag200_crtc_helper_atomic_disable+0xf/0x160 [mgag200]
[  365.258576]  disable_outputs+0x246/0x360 [drm_kms_helper]
[  365.264671]  drm_atomic_helper_commit_tail+0x1a/0x80 [drm_kms_helper]
[  365.271896]  mgag200_mode_config_helper_atomic_commit_tail+0x28/0x40 [mgag200]
[  365.279998]  commit_tail+0x94/0x130 [drm_kms_helper]
[  365.285578]  drm_atomic_helper_commit+0x13e/0x170 [drm_kms_helper]
[  365.292513]  drm_atomic_commit+0x97/0xb0 [drm]
[  365.297533]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[  365.303439]  drm_mode_atomic_ioctl+0x995/0xb80 [drm]
[  365.309061]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [drm]
[  365.315245]  drm_ioctl_kernel+0x85/0xf0 [drm]
[  365.320183]  drm_ioctl+0x23a/0x450 [drm]
[  365.324640]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [drm]
[  365.330825]  ? __pfx___fget_files+0xb/0x10
[  365.335438]  __x64_sys_ioctl+0x8a/0xc0
[  365.339656]  do_syscall_64+0x73/0x190
[  365.343780]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  365.349445] RIP: 0033:0x7f2ae87280ab
[  365.353462] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d bd 0c 00 f7 d8 64 89 01 48
[  365.374448] RSP: 002b:00007ffc89bc33c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  365.382925] RAX: ffffffffffffffda RBX: 00007ffc89bc3410 RCX: 00007f2ae87280ab
[  365.390908] RDX: 00007ffc89bc3410 RSI: 00000000c03864bc RDI: 000000000000000b
[  365.398904] RBP: 00000000c03864bc R08: 0000000000000002 R09: 0000000000000002
[  365.406901] R10: 00007f2ae87f4a00 R11: 0000000000000246 R12: 0000564bfe3dcc80
[  365.414889] R13: 000000000000000b R14: 0000564bfdead540 R15: 0000564bfdb6b5d0
[  365.422914]  </TASK>
[  365.425379] irq event stamp: 1043639
[  365.429393] hardirqs last  enabled at (1043651): [<ffffffff96183fae>] __up_console_sem+0x5e/0x70
[  365.439231] hardirqs last disabled at (1043662): [<ffffffff96183f93>] __up_console_sem+0x43/0x70
[  365.449074] softirqs last  enabled at (1043676): [<ffffffff960dac91>] __irq_exit_rcu+0xa1/0x110
[  365.458818] softirqs last disabled at (1043671): [<ffffffff960dac91>] __irq_exit_rcu+0xa1/0x110
[  365.468548] ---[ end trace 0000000000000000 ]---


-----Original Message-----
From: Luck, Tony 
Sent: Friday, October 11, 2024 9:37 AM
To: Thomas Zimmermann <tzimmermann@suse.de>
Cc: jfalempe@redhat.com; airlied@redhat.com; sam@ravnborg.org; emil.l.velikov@gmail.com; maarten.lankhorst@linux.intel.com; mripard@kernel.org; airlied@gmail.com; daniel@ffwll.ch; dri-devel@lists.freedesktop.org
Subject: RE: [PATCH v5 0/7] drm/mgag200: Implement VBLANK support

Progress! My system now boots. But there's one WARN_ON dump along the way to the "login:" prompt.

Thanks

-Tony

---

[   33.111505] Console: switching to colour dummy device 80x25
[   33.119581] mgag200 0000:08:00.0: vgaarb: deactivate vga console
[   33.139574] [drm] Initialized mgag200 1.0.0 for 0000:08:00.0 on minor 0
[   33.157665] fbcon: mgag200drmfb (fb0) is primary device
[   33.196490] ixgbe 0000:03:00.1: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0
[   33.281367] ixgbe 0000:03:00.1: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:03.2 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[   33.282519] ------------[ cut here ]------------
[   33.282550] mgag200 0000:08:00.0: [drm] drm_WARN_ON(pipe >= dev->num_crtcs)
[   33.282610] WARNING: CPU: 123 PID: 1774 at drivers/gpu/drm/drm_vblank.c:1488 drm_crtc_vblank_on_config+0x1b5/0x210 [drm]
[   33.282687] Modules linked in: crct10dif_pclmul crc32_pclmul mgag200(+) crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_shmem_helper sha512_ssse3 drm_kms_helper sha256_ssse3 sha1_ssse3 mpt3sas ixgbe(+) raid_class mdio drm scsi_transport_sas dca fuse
[   33.282712] CPU: 123 UID: 0 PID: 1774 Comm: systemd-udevd Not tainted 6.12.0-rc2+ #171
[   33.282716] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[   33.282718] RIP: 0010:drm_crtc_vblank_on_config+0x1b5/0x210 [drm]
[   33.282743] Code: 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 34 ce 01 d6 48 c7 c1 78 9b b1 c0 4c 89 e2 48 c7 c7 1e d6 b1 c0 48 89 c6 e8 3b 9f 60 d5 <0f> 0b 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48
[   33.282745] RSP: 0018:ffffbd1ca3f8f660 EFLAGS: 00010282
[   33.282749] RAX: 000000000000003f RBX: ffff9ddf0a498000 RCX: 0000000000000000
[   33.282751] RDX: 0000000000000002 RSI: ffffffff97a00e78 RDI: 00000000ffffffff
[   33.282753] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   33.282755] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9df257758df0
[   33.282757] R13: ffff9ddf0a4993f0 R14: ffffffffc0b726c0 R15: ffff9ddf05d33450
[   33.282758] FS:  00007f66ab8e2b40(0000) GS:ffff9deb61f80000(0000) knlGS:0000000000000000
[   33.282761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   33.282763] CR2: 00007f66ab8c7c4b CR3: 000000000bc04003 CR4: 00000000003706f0
[   33.282765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   33.282766] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   33.282768] Call Trace:
[   33.282771]  <TASK>
[   33.282773]  ? __warn+0x90/0x1a0
[   33.282785]  ? drm_crtc_vblank_on_config+0x1b5/0x210 [drm]
[   33.282808]  ? report_bug+0x1c3/0x1d0
[   33.282819]  ? handle_bug+0x5b/0xa0
[   33.282824]  ? exc_invalid_op+0x14/0x70
[   33.282827]  ? asm_exc_invalid_op+0x16/0x20
[   33.282839]  ? drm_crtc_vblank_on_config+0x1b5/0x210 [drm]
[   33.282862]  ? mgag200_crtc_set_gamma_linear+0x17a/0x190 [mgag200]
[   33.282868]  ? mgag200_enable_display+0x13b/0x160 [mgag200]
[   33.282876]  drm_crtc_vblank_on+0x28/0x40 [drm]
[   33.282898]  drm_atomic_helper_commit_modeset_enables+0xa6/0x240 [drm_kms_helper]
[   33.282920]  drm_atomic_helper_commit_tail+0x50/0x80 [drm_kms_helper]
[   33.282931]  mgag200_mode_config_helper_atomic_commit_tail+0x28/0x40 [mgag200]
[   33.282951]  commit_tail+0x94/0x130 [drm_kms_helper]
[   33.282963]  drm_atomic_helper_commit+0x13e/0x170 [drm_kms_helper]
[   33.282975]  drm_atomic_commit+0x97/0xb0 [drm]
[   33.282996]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[   33.283027]  drm_client_modeset_commit_atomic+0x207/0x250 [drm]
[   33.283060]  drm_client_modeset_commit_locked+0x5b/0x190 [drm]
[   33.283086]  drm_client_modeset_commit+0x24/0x50 [drm]
[   33.283109]  __drm_fb_helper_restore_fbdev_mode_unlocked+0x95/0xd0 [drm_kms_helper]
[   33.283122]  drm_fb_helper_set_par+0x2e/0x40 [drm_kms_helper]
[   33.283132]  fbcon_init+0x2a8/0x560
[   33.283143]  visual_init+0xc4/0x120
[   33.283150]  do_bind_con_driver.isra.0+0x1a1/0x3d0
[   33.283158]  do_take_over_console+0x10b/0x1a0
[   33.283164]  do_fbcon_takeover+0x5c/0xc0
[   33.283167]  fbcon_fb_registered+0x49/0x70
[   33.283170]  do_register_framebuffer+0x184/0x230
[   33.283179]  register_framebuffer+0x20/0x40
[   33.283182]  __drm_fb_helper_initial_config_and_unlock+0x33e/0x590 [drm_kms_helper]
[   33.283193]  ? drm_client_register+0x33/0xc0 [drm]
[   33.283222]  drm_fbdev_shmem_client_hotplug+0x6c/0xc0 [drm_shmem_helper]
[   33.283228]  drm_client_register+0x7b/0xc0 [drm]
[   33.283254]  mgag200_pci_probe+0x90/0x180 [mgag200]
[   33.283262]  local_pci_probe+0x46/0xa0
[   33.283269]  pci_device_probe+0xb5/0x220
[   33.283277]  really_probe+0xd9/0x380
[   33.283288]  __driver_probe_device+0x78/0x150
[   33.283293]  driver_probe_device+0x1e/0x90
[   33.283297]  __driver_attach+0xd6/0x1d0
[   33.283301]  ? __pfx___driver_attach+0x10/0x10
[   33.283305]  bus_for_each_dev+0x66/0xa0
[   33.283311]  bus_add_driver+0x111/0x240
[   33.283317]  driver_register+0x5c/0x120
[   33.283320]  ? __pfx_mgag200_pci_driver_init+0x10/0x10 [mgag200]
[   33.283326]  do_one_initcall+0x62/0x3a0
[   33.283333]  ? __kmalloc_cache_noprof+0x240/0x300
[   33.283343]  do_init_module+0x64/0x240
[   33.283354]  init_module_from_file+0x7a/0xa0
[   33.283366]  idempotent_init_module+0x15f/0x260
[   33.283378]  __x64_sys_finit_module+0x5a/0xb0
[   33.283383]  do_syscall_64+0x73/0x190
[   33.283396]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   33.283399] RIP: 0033:0x7f66ac527e0d
[   33.283403] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
[   33.283406] RSP: 002b:00007ffff0c752b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   33.283410] RAX: ffffffffffffffda RBX: 0000557cd3b38d00 RCX: 00007f66ac527e0d
[   33.283412] RDX: 0000000000000000 RSI: 00007f66ac68132c RDI: 0000000000000010
[   33.283414] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[   33.283416] R10: 0000000000000010 R11: 0000000000000246 R12: 00007f66ac68132c
[   33.283418] R13: 0000557cd3b18eb0 R14: 0000000000000007 R15: 0000557cd3b38f80
[   33.283429]  </TASK>
[   33.283431] irq event stamp: 45133
[   33.283433] hardirqs last  enabled at (45139): [<ffffffff96187784>] vprintk_emit+0x3d4/0x3e0
[   33.283444] hardirqs last disabled at (45144): [<ffffffff96187737>] vprintk_emit+0x387/0x3e0
[   33.283448] softirqs last  enabled at (44822): [<ffffffff960dac91>] __irq_exit_rcu+0xa1/0x110
[   33.283456] softirqs last disabled at (44817): [<ffffffff960dac91>] __irq_exit_rcu+0xa1/0x110
[   33.283459] ---[ end trace 0000000000000000 ]---
[   33.283494] Console: switching to colour frame buffer device 128x48
[   33.379557] ixgbe 0000:03:00.1: MAC: 3, PHY: 0, PBA No: G36748-005
[   33.399852] mgag200 0000:08:00.0: [drm] fb0: mgag200drmfb frame buffer device
Thomas Zimmermann Oct. 14, 2024, 12:39 p.m. UTC | #19
Hi

Am 11.10.24 um 18:44 schrieb Luck, Tony:
> Posted too soon. Some time (kernel timestamps say a few minutes) after the
> successful boot the console spewed another stack dump and the machine hung.

This warning is OK for the quick workaround.

Attached is a full revert of the vblank support for you to test. If that 
undoes the bug, I'll post it for review to the list.

Best regards
Thomas


Tony Luck Oct. 14, 2024, 5:14 p.m. UTC | #20
> Attached is a full revert of the vblank support for you to test. If that 
> undoes the bug, I'll post it for review to the list.

Thomas.

I applied that to v6.12-rc3. Builds cleanly.

System boots with no warnings.

MGAG device is present:

$ dmesg | grep mgag
[   31.277259] mgag200 0000:08:00.0: vgaarb: deactivate vga console
[   31.298138] [drm] Initialized mgag200 1.0.0 for 0000:08:00.0 on minor 0
[   31.324081] fbcon: mgag200drmfb (fb0) is primary device
[   31.414494] mgag200 0000:08:00.0: [drm] fb0: mgag200drmfb frame buffer device

VGA console working.

Thanks. Please apply my tags:

Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>

-Tony
Thomas Zimmermann Oct. 15, 2024, 7:13 a.m. UTC | #21
Hi

Am 14.10.24 um 19:14 schrieb Luck, Tony:
>> Attached is a full revert of the vblank support for you to test. If that
>> undoes the bug, I'll post it for review to the list.
> Thomas.
>
> I applied that to v6.12-rc3. Builds cleanly.
>
> System boots with no warnings.
>
> MGAG device is present:
>
> $ dmesg | grep mgag
> [   31.277259] mgag200 0000:08:00.0: vgaarb: deactivate vga console
> [   31.298138] [drm] Initialized mgag200 1.0.0 for 0000:08:00.0 on minor 0
> [   31.324081] fbcon: mgag200drmfb (fb0) is primary device
> [   31.414494] mgag200 0000:08:00.0: [drm] fb0: mgag200drmfb frame buffer device
>
> VGA console working.
>
> Thanks. Please apply my tags:
>
> Reported-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>

Thanks a lot for helping. The revert is at

https://lore.kernel.org/dri-devel/20241015063932.8620-1-tzimmermann@suse.de/T/#u

Best regards
Thomas

Tony Luck Oct. 25, 2024, 5:32 p.m. UTC | #22
> Thanks a lot for helping. The revert is at
>
> https://lore.kernel.org/dri-devel/20241015063932.8620-1-tzimmermann@suse.de/T/#u

Thomas,

Final closure. That patch was pulled by Linus into v6.12-rc4.  I just built and booted with no problems.

Thanks

-Tony