drm/radeon: Fix oops upon driver load on PowerXpress laptops

Message ID cfb91ba052af06117137eec0637543a2626a7979.1495135190.git.lukas@wunner.de (mailing list archive)
State New, archived

Commit Message

Lukas Wunner May 18, 2017, 7:33 p.m. UTC
Nicolai Stange reports the following oops which is caused by
dereferencing rdev->pdev before it's subsequently set by
radeon_device_init().  Fix it.

  BUG: unable to handle kernel NULL pointer dereference at 00000000000007cb
  IP: radeon_driver_load_kms+0xeb/0x230 [radeon]
  PGD 0
  P4D 0

  Oops: 0000 [#1] SMP
  Modules linked in: amdkfd amd_iommu_v2 i915(+) radeon(+) i2c_algo_bit drm_kms_helper ttm e1000e drm sdhci_pci sdhci_acpi ptp sdhci crc32c_intel serio_raw mmc_core pps_core video i2c_hid hid_plantronics
  CPU: 4 PID: 389 Comm: systemd-udevd Not tainted 4.12.0-rc1-next-20170515+ #1
  Hardware name: Dell Inc. Latitude E6540/0725FP, BIOS A10 06/26/2014
  task: ffff97d62c8f0000 task.stack: ffffb96f01478000
  RIP: 0010:radeon_driver_load_kms+0xeb/0x230 [radeon]
  RSP: 0018:ffffb96f0147b9d0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff97d620085000 RCX: 0000000000610037
  RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000000
  RBP: ffffb96f0147b9e8 R08: 0000000000000002 R09: ffffb96f0147b924
  R10: 0000000000000000 R11: ffff97d62edd2ec0 R12: ffff97d628d5c000
  R13: 0000000000610037 R14: ffffffffc0698280 R15: 0000000000000000
  FS:  00007f496363d8c0(0000) GS:ffff97d62eb00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000007cb CR3: 000000022c14c000 CR4: 00000000001406e0
  Call Trace:
   drm_dev_register+0x146/0x1d0 [drm]
   drm_get_pci_dev+0x9a/0x180 [drm]
   radeon_pci_probe+0xb8/0xe0 [radeon]
   local_pci_probe+0x45/0xa0
   pci_device_probe+0x14f/0x1a0
   driver_probe_device+0x29c/0x450
   __driver_attach+0xdf/0xf0
   ? driver_probe_device+0x450/0x450
   bus_for_each_dev+0x6c/0xc0
   driver_attach+0x1e/0x20
   bus_add_driver+0x170/0x270
   driver_register+0x60/0xe0
   ? 0xffffffffc0508000
   __pci_register_driver+0x4c/0x50
   drm_pci_init+0xeb/0x100 [drm]
   ? vga_switcheroo_register_handler+0x6a/0x90
   ? 0xffffffffc0508000
   radeon_init+0x98/0xb6 [radeon]
   do_one_initcall+0x52/0x1a0
   ? __vunmap+0x81/0xb0
   ? kmem_cache_alloc_trace+0x159/0x1b0
   ? do_init_module+0x27/0x1f8
   do_init_module+0x5f/0x1f8
   load_module+0x27ce/0x2be0
   SYSC_finit_module+0xdf/0x110
   ? SYSC_finit_module+0xdf/0x110
   SyS_finit_module+0xe/0x10
   do_syscall_64+0x67/0x150
   entry_SYSCALL64_slow_path+0x25/0x25
  RIP: 0033:0x7f4962295679
  RSP: 002b:00007ffdd8c4f878 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
  RAX: ffffffffffffffda RBX: 000055c014ed8200 RCX: 00007f4962295679
  RDX: 0000000000000000 RSI: 00007f4962dd19c5 RDI: 0000000000000010
  RBP: 00007f4962dd19c5 R08: 0000000000000000 R09: 00007ffdd8c4f990
  R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
  R13: 000055c014ed81a0 R14: 0000000000020000 R15: 000055c0149d1fca
  Code: 5d 5d c3 8b 05 a7 05 14 00 49 81 cd 00 00 08 00 85 c0 74 a3 e8 e7 c0 0e 00 84 c0 74 9a 41 f7 c5 00 00 02 00 75 91 49 8b 44 24 10 <0f> b6 90 cb 07 00 00 f6 c2 20 74 1e e9 7b ff ff ff 48 8b 40 38
  RIP: radeon_driver_load_kms+0xeb/0x230 [radeon] RSP: ffffb96f0147b9d0
  CR2: 00000000000007cb
  ---[ end trace 89cc4ba7e569c65c ]---

Reported-by: Nicolai Stange <nicstange@gmail.com>
Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---

Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
needs to be fixed, so sending out with a proper commit message now.
The bug was only introduced to radeon, not amdgpu.

@Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
land before -rc3 because Sean Paul has already sent out the -rc2 pull.
I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!

 drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
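
Editorial illustration, not part of the patch: a minimal userspace sketch of
the ordering problem described in the commit message. It assumes heavily
reduced stand-ins for struct drm_device, struct radeon_device and struct
pci_dev; the model_* helpers and pdev_seen_by_px_check() are hypothetical
names used only here. The register dump appears consistent with this reading:
RAX is 0 and CR2 is 0x7cb, which is what a field read at a small offset from
a NULL pointer would produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct pci_dev {
	bool is_thunderbolt;
};

struct radeon_device {
	struct pci_dev *pdev;	/* stays NULL until device init has run */
};

struct drm_device {
	struct pci_dev *pdev;	/* already valid when the load hook runs */
	struct radeon_device *dev_private;
};

/* Models radeon_device_init(): only here does rdev->pdev get assigned. */
static void model_device_init(struct drm_device *dev)
{
	dev->dev_private->pdev = dev->pdev;
}

/*
 * Returns the pointer the PX check would be handed at this point in load:
 * rdev->pdev before the fix, dev->pdev after it.
 */
static const struct pci_dev *pdev_seen_by_px_check(const struct drm_device *dev,
						   bool fixed)
{
	return fixed ? dev->pdev : dev->dev_private->pdev;
}

/* Models the Thunderbolt test; a NULL argument is the oops in this model. */
static bool model_is_thunderbolt_attached(const struct pci_dev *pdev)
{
	assert(pdev != NULL);
	return pdev->is_thunderbolt;
}
```

In this sketch the pre-fix path yields NULL until model_device_init() has
run, whereas the post-fix path yields a valid pointer from the start.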

Comments

Nicolai Stange May 21, 2017, 7:31 a.m. UTC | #1
On Thu, May 18 2017, Lukas Wunner wrote:

<snip>

> Reported-by: Nicolai Stange <nicstange@gmail.com>
> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>
> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> needs to be fixed, so sending out with a proper commit message now.
> The bug was only introduced to radeon, not amdgpu.

Tested-by: Nicolai Stange <nicstange@gmail.com>

Thanks for the quick fix!


> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
>
>  drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index 6a68d440bc44..d0ad03674250 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -116,7 +116,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
>  	if ((radeon_runtime_pm != 0) &&
>  	    radeon_has_atpx() &&
>  	    ((flags & RADEON_IS_IGP) == 0) &&
> -	    !pci_is_thunderbolt_attached(rdev->pdev))
> +	    !pci_is_thunderbolt_attached(dev->pdev))
>  		flags |= RADEON_IS_PX;
>  
>  	/* radeon_device_init should report only fatal error
Lukas Wunner May 22, 2017, 2:04 p.m. UTC | #2
On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> On Thu, May 18 2017, Lukas Wunner wrote:
[snip]
> > Reported-by: Nicolai Stange <nicstange@gmail.com>
> > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >
> > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > needs to be fixed, so sending out with a proper commit message now.
> > The bug was only introduced to radeon, not amdgpu.
> 
> Tested-by: Nicolai Stange <nicstange@gmail.com>
> 
> Thanks for the quick fix!
>
> > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!

I've learned this morning that Alex is on vacation.  I've pushed
the patch to drm-misc-fixes so that the issue is fixed in 4.12-rc3.

@Sean Paul: I've fast-forwarded to 4.12-rc2 before pushing, please
shout if I've done anything wrong.  First time I'm doing this.

Thanks,

Lukas

> >
> >  drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> > index 6a68d440bc44..d0ad03674250 100644
> > --- a/drivers/gpu/drm/radeon/radeon_kms.c
> > +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> > @@ -116,7 +116,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
> >  	if ((radeon_runtime_pm != 0) &&
> >  	    radeon_has_atpx() &&
> >  	    ((flags & RADEON_IS_IGP) == 0) &&
> > -	    !pci_is_thunderbolt_attached(rdev->pdev))
> > +	    !pci_is_thunderbolt_attached(dev->pdev))
> >  		flags |= RADEON_IS_PX;
> >  
> >  	/* radeon_device_init should report only fatal error
Daniel Vetter May 22, 2017, 7:24 p.m. UTC | #3
On Thu, May 18, 2017 at 09:33:44PM +0200, Lukas Wunner wrote:
> Nicolai Stange reports the following oops which is caused by
> dereferencing rdev->pdev before it's subsequently set by
> radeon_device_init().  Fix it.
> 
>   BUG: unable to handle kernel NULL pointer dereference at 00000000000007cb
>   IP: radeon_driver_load_kms+0xeb/0x230 [radeon]
>   PGD 0
>   P4D 0
> 
>   Oops: 0000 [#1] SMP
>   Modules linked in: amdkfd amd_iommu_v2 i915(+) radeon(+) i2c_algo_bit drm_kms_helper ttm e1000e drm sdhci_pci sdhci_acpi ptp sdhci crc32c_intel serio_raw mmc_core pps_core video i2c_hid hid_plantronics
>   CPU: 4 PID: 389 Comm: systemd-udevd Not tainted 4.12.0-rc1-next-20170515+ #1
>   Hardware name: Dell Inc. Latitude E6540/0725FP, BIOS A10 06/26/2014
>   task: ffff97d62c8f0000 task.stack: ffffb96f01478000
>   RIP: 0010:radeon_driver_load_kms+0xeb/0x230 [radeon]
>   RSP: 0018:ffffb96f0147b9d0 EFLAGS: 00010246
>   RAX: 0000000000000000 RBX: ffff97d620085000 RCX: 0000000000610037
>   RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000000
>   RBP: ffffb96f0147b9e8 R08: 0000000000000002 R09: ffffb96f0147b924
>   R10: 0000000000000000 R11: ffff97d62edd2ec0 R12: ffff97d628d5c000
>   R13: 0000000000610037 R14: ffffffffc0698280 R15: 0000000000000000
>   FS:  00007f496363d8c0(0000) GS:ffff97d62eb00000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00000000000007cb CR3: 000000022c14c000 CR4: 00000000001406e0
>   Call Trace:
>    drm_dev_register+0x146/0x1d0 [drm]
>    drm_get_pci_dev+0x9a/0x180 [drm]
>    radeon_pci_probe+0xb8/0xe0 [radeon]
>    local_pci_probe+0x45/0xa0
>    pci_device_probe+0x14f/0x1a0
>    driver_probe_device+0x29c/0x450
>    __driver_attach+0xdf/0xf0
>    ? driver_probe_device+0x450/0x450
>    bus_for_each_dev+0x6c/0xc0
>    driver_attach+0x1e/0x20
>    bus_add_driver+0x170/0x270
>    driver_register+0x60/0xe0
>    ? 0xffffffffc0508000
>    __pci_register_driver+0x4c/0x50
>    drm_pci_init+0xeb/0x100 [drm]
>    ? vga_switcheroo_register_handler+0x6a/0x90
>    ? 0xffffffffc0508000
>    radeon_init+0x98/0xb6 [radeon]
>    do_one_initcall+0x52/0x1a0
>    ? __vunmap+0x81/0xb0
>    ? kmem_cache_alloc_trace+0x159/0x1b0
>    ? do_init_module+0x27/0x1f8
>    do_init_module+0x5f/0x1f8
>    load_module+0x27ce/0x2be0
>    SYSC_finit_module+0xdf/0x110
>    ? SYSC_finit_module+0xdf/0x110
>    SyS_finit_module+0xe/0x10
>    do_syscall_64+0x67/0x150
>    entry_SYSCALL64_slow_path+0x25/0x25
>   RIP: 0033:0x7f4962295679
>   RSP: 002b:00007ffdd8c4f878 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>   RAX: ffffffffffffffda RBX: 000055c014ed8200 RCX: 00007f4962295679
>   RDX: 0000000000000000 RSI: 00007f4962dd19c5 RDI: 0000000000000010
>   RBP: 00007f4962dd19c5 R08: 0000000000000000 R09: 00007ffdd8c4f990
>   R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
>   R13: 000055c014ed81a0 R14: 0000000000020000 R15: 000055c0149d1fca
>   Code: 5d 5d c3 8b 05 a7 05 14 00 49 81 cd 00 00 08 00 85 c0 74 a3 e8 e7 c0 0e 00 84 c0 74 9a 41 f7 c5 00 00 02 00 75 91 49 8b 44 24 10 <0f> b6 90 cb 07 00 00 f6 c2 20 74 1e e9 7b ff ff ff 48 8b 40 38
>   RIP: radeon_driver_load_kms+0xeb/0x230 [radeon] RSP: ffffb96f0147b9d0
>   CR2: 00000000000007cb
>   ---[ end trace 89cc4ba7e569c65c ]---
> 
> Reported-by: Nicolai Stange <nicstange@gmail.com>
> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
> 
> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> needs to be fixed, so sending out with a proper commit message now.
> The bug was only introduced to radeon, not amdgpu.
> 
> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!

Just noticed that this has landed already in drm-misc-fixes, without any
r-b or at least an ack from radeon driver folks. That's breaking the
drm-misc rules, we need at least an ack for small drivers (which radeon
really isn't) and a full reviewed-by tag on everything else.

Patch doesn't look wrong, so not much harm, but please follow the ground
rules and especially don't ever push your own patches without any peer
feedback.

Thanks, Daniel

> 
>  drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index 6a68d440bc44..d0ad03674250 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -116,7 +116,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
>  	if ((radeon_runtime_pm != 0) &&
>  	    radeon_has_atpx() &&
>  	    ((flags & RADEON_IS_IGP) == 0) &&
> -	    !pci_is_thunderbolt_attached(rdev->pdev))
> +	    !pci_is_thunderbolt_attached(dev->pdev))
>  		flags |= RADEON_IS_PX;
>  
>  	/* radeon_device_init should report only fatal error
> -- 
> 2.11.0
Sean Paul May 22, 2017, 7:35 p.m. UTC | #4
On Mon, May 22, 2017 at 04:04:07PM +0200, Lukas Wunner wrote:
> On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> > On Thu, May 18 2017, Lukas Wunner wrote:
> [snip]
> > > Reported-by: Nicolai Stange <nicstange@gmail.com>
> > > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > ---
> > >
> > > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > > needs to be fixed, so sending out with a proper commit message now.
> > > The bug was only introduced to radeon, not amdgpu.
> > 
> > Tested-by: Nicolai Stange <nicstange@gmail.com>
> > 
> > Thanks for the quick fix!
> >
> > > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > > take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> 
> I've learned this morning that Alex is on vacation.  I've pushed
> the patch to drm-misc-fixes so that the issue is fixed in 4.12-rc3.
> 
> @Sean Paul: I've fast-forwarded to 4.12-rc2 before pushing, please
> shout if I've done anything wrong.  First time I'm doing this.

No shouting, but a heads-up on IRC is probably warranted for both pushing a
patch without R-b and fast-forwarding one of the branches.

Sean

> 
> Thanks,
> 
> Lukas
> 
> > >
> > >  drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> > > index 6a68d440bc44..d0ad03674250 100644
> > > --- a/drivers/gpu/drm/radeon/radeon_kms.c
> > > +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> > > @@ -116,7 +116,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
> > >  	if ((radeon_runtime_pm != 0) &&
> > >  	    radeon_has_atpx() &&
> > >  	    ((flags & RADEON_IS_IGP) == 0) &&
> > > -	    !pci_is_thunderbolt_attached(rdev->pdev))
> > > +	    !pci_is_thunderbolt_attached(dev->pdev))
> > >  		flags |= RADEON_IS_PX;
> > >  
> > >  	/* radeon_device_init should report only fatal error
Michel Dänzer May 23, 2017, 3:09 a.m. UTC | #5
On 22/05/17 11:04 PM, Lukas Wunner wrote:
> On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
>> On Thu, May 18 2017, Lukas Wunner wrote:
> [snip]
>>> Reported-by: Nicolai Stange <nicstange@gmail.com>
>>> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
>>> Signed-off-by: Lukas Wunner <lukas@wunner.de>
>>> ---
>>>
>>> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
>>> needs to be fixed, so sending out with a proper commit message now.
>>> The bug was only introduced to radeon, not amdgpu.
>>
>> Tested-by: Nicolai Stange <nicstange@gmail.com>
>>
>> Thanks for the quick fix!
>>
>>> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
>>> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
>>> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
>>> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> 
> I've learned this morning that Alex is on vacation.

Christian König is standing in for Alex.

> I've pushed the patch to drm-misc-fixes so that the issue is fixed in
> 4.12-rc3.

I don't think there was any particular need to bypass the normal radeon
tree for this. There was plenty of time for the fix to get into 4.12
final, even after Alex is back.
Lukas Wunner May 23, 2017, 3:50 a.m. UTC | #6
On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
> On 22/05/17 11:04 PM, Lukas Wunner wrote:
> > On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> >> On Thu, May 18 2017, Lukas Wunner wrote:
> > [snip]
> >>> Reported-by: Nicolai Stange <nicstange@gmail.com>
> >>> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> >>> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> >>> ---
> >>>
> >>> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> >>> needs to be fixed, so sending out with a proper commit message now.
> >>> The bug was only introduced to radeon, not amdgpu.
> >>
> >> Tested-by: Nicolai Stange <nicstange@gmail.com>
> >>
> >> Thanks for the quick fix!
> >>
> >>> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> >>> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> >>> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> >>> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> > 
> > I've learned this morning that Alex is on vacation.
> 
> Christian König is standing in for Alex.

By his own account, he already has "all hands full replacing him [Alex]",
explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
this reason and lacks permission to update branches in Alex' repo on fdo:

"One lesson learned from the past week is that Alex needs to stop using
his personal repository on fdo.
We were asked a couple of times if I couldn't update a branch there from 
different directions, which we obviously can't do."

https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html


> > I've pushed the patch to drm-misc-fixes so that the issue is fixed in
> > 4.12-rc3.
> 
> I don't think there was any particular need to bypass the normal radeon
> tree for this. There was plenty of time for the fix to get into 4.12
> final, even after Alex is back.

Well, just sitting on a fix and twiddling thumbs wouldn't be nice towards
users affected by the same issue, who may waste time bisecting.

Thanks,

Lukas
Michel Dänzer May 23, 2017, 3:55 a.m. UTC | #7
On 23/05/17 12:50 PM, Lukas Wunner wrote:
> On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
>> On 22/05/17 11:04 PM, Lukas Wunner wrote:
>>> On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
>>>> On Thu, May 18 2017, Lukas Wunner wrote:
>>> [snip]
>>>>> Reported-by: Nicolai Stange <nicstange@gmail.com>
>>>>> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
>>>>> Signed-off-by: Lukas Wunner <lukas@wunner.de>
>>>>> ---
>>>>>
>>>>> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
>>>>> needs to be fixed, so sending out with a proper commit message now.
>>>>> The bug was only introduced to radeon, not amdgpu.
>>>>
>>>> Tested-by: Nicolai Stange <nicstange@gmail.com>
>>>>
>>>> Thanks for the quick fix!
>>>>
>>>>> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
>>>>> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
>>>>> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
>>>>> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
>>>
>>> I've learned this morning that Alex is on vacation.
>>
>> Christian König is standing in for Alex.
> 
> By his own account, he already has "all hands full replacing him [Alex]",
> explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
> this reason and lacks permission to update branches in Alex' repo on fdo:
> 
> "One lesson learned from the past week is that Alex needs to stop using
> his personal repository on fdo.
> We were asked a couple of times if I couldn't update a branch there from 
> different directions, which we obviously can't do."
> 
> https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
> https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html

The important point being that Christian reviewed that patch and
explicitly asked Daniel to pick it up.


>>> I've pushed the patch to drm-misc-fixes so that the issue is fixed in
>>> 4.12-rc3.
>>
>> I don't think there was any particular need to bypass the normal radeon
>> tree for this. There was plenty of time for the fix to get into 4.12
>> final, even after Alex is back.
> 
> Well, just sitting on a fix and twiddling thumbs wouldn't be nice towards
> users affected by the same issue, who may waste time bisecting.

We all tend to think our latest fix is the most important one. :) But I
really don't see this one being special.
Christian König May 23, 2017, 7:32 a.m. UTC | #8
Am 23.05.2017 um 05:55 schrieb Michel Dänzer:
> On 23/05/17 12:50 PM, Lukas Wunner wrote:
>> On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
>>> On 22/05/17 11:04 PM, Lukas Wunner wrote:
>>>> On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
>>>>> On Thu, May 18 2017, Lukas Wunner wrote:
>>>> [snip]
>>>>>> Reported-by: Nicolai Stange <nicstange@gmail.com>
>>>>>> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
>>>>>> Signed-off-by: Lukas Wunner <lukas@wunner.de>
>>>>>> ---
>>>>>>
>>>>>> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
>>>>>> needs to be fixed, so sending out with a proper commit message now.
>>>>>> The bug was only introduced to radeon, not amdgpu.
>>>>> Tested-by: Nicolai Stange <nicstange@gmail.com>
>>>>>
>>>>> Thanks for the quick fix!
>>>>>
>>>>>> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
>>>>>> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
>>>>>> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
>>>>>> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
>>>> I've learned this morning that Alex is on vacation.
>>> Christian König is standing in for Alex.
>> By his own account, he already has "all hands full replacing him [Alex]",
>> explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
>> this reason and lacks permission to update branches in Alex' repo on fdo:
>>
>> "One lesson learned from the past week is that Alex needs to stop using
>> his personal repository on fdo.
>> We were asked a couple of times if I couldn't update a branch there from
>> different directions, which we obviously can't do."
>>
>> https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
>> https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html
> The important point being that Christian reviewed that patch and
> explicitly asked Daniel to pick it up.

Wow, wait a second. I'm just catching up on this thread.

Lukas, you didn't commit the patch to drm-misc without a review, did you?

I was intentionally holding back a rb because that isn't my field of 
expertise and I was only briefly involved in the original patch. Alex 
should be back by the end of the week, so no need for a rush like that.

Daniel's patch was a global cleanup of include paths done by search and
replace, a completely different story.

Regards,
Christian.
Daniel Vetter May 23, 2017, 7:36 a.m. UTC | #9
On Tue, May 23, 2017 at 09:32:38AM +0200, Christian König wrote:
> Am 23.05.2017 um 05:55 schrieb Michel Dänzer:
> > On 23/05/17 12:50 PM, Lukas Wunner wrote:
> > > On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
> > > > On 22/05/17 11:04 PM, Lukas Wunner wrote:
> > > > > On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> > > > > > On Thu, May 18 2017, Lukas Wunner wrote:
> > > > > [snip]
> > > > > > > Reported-by: Nicolai Stange <nicstange@gmail.com>
> > > > > > > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> > > > > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > > > > ---
> > > > > > > 
> > > > > > > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > > > > > > needs to be fixed, so sending out with a proper commit message now.
> > > > > > > The bug was only introduced to radeon, not amdgpu.
> > > > > > Tested-by: Nicolai Stange <nicstange@gmail.com>
> > > > > > 
> > > > > > Thanks for the quick fix!
> > > > > > 
> > > > > > > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > > > > > > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > > > > > > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > > > > > > take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> > > > > I've learned this morning that Alex is on vacation.
> > > > Christian König is standing in for Alex.
> > > By his own account, he already has "all hands full replacing him [Alex]",
> > > explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
> > > this reason and lacks permission to update branches in Alex' repo on fdo:
> > > 
> > > "One lesson learned from the past week is that Alex needs to stop using
> > > his personal repository on fdo.
> > > We were asked a couple of times if I couldn't update a branch there from
> > > different directions, which we obviously can't do."
> > > 
> > > https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
> > > https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html
> > The important point being that Christian reviewed that patch and
> > explicitly asked Daniel to pick it up.
> 
> Wow, wait a second. I'm just catching up on this thread.
> 
> Lukas, you didn't commit the patch to drm-misc without a review, did you?
> 
> I was intentionally holding back a rb because that isn't my field of
> expertise and I was only briefly involved in the original patch. Alex should
> be back by the end of the week, so no need for a rush like that.
> 
> Daniel's patch was a global cleanup of include paths done by search and
> replace, a completely different story.

Want me to drop the patch until Alex is back? We're generally trying
really hard to refrain from rebasing drm-misc branches, but for -fixes
it's doable (since much less patch traffic there).
-Daniel
Christian König May 23, 2017, 7:43 a.m. UTC | #10
Am 23.05.2017 um 09:36 schrieb Daniel Vetter:
> On Tue, May 23, 2017 at 09:32:38AM +0200, Christian König wrote:
>> Am 23.05.2017 um 05:55 schrieb Michel Dänzer:
>>> On 23/05/17 12:50 PM, Lukas Wunner wrote:
>>>> On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
>>>>> On 22/05/17 11:04 PM, Lukas Wunner wrote:
>>>>>> On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
>>>>>>> On Thu, May 18 2017, Lukas Wunner wrote:
>>>>>> [snip]
>>>>>>>> Reported-by: Nicolai Stange <nicstange@gmail.com>
>>>>>>>> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
>>>>>>>> Signed-off-by: Lukas Wunner <lukas@wunner.de>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
>>>>>>>> needs to be fixed, so sending out with a proper commit message now.
>>>>>>>> The bug was only introduced to radeon, not amdgpu.
>>>>>>> Tested-by: Nicolai Stange <nicstange@gmail.com>
>>>>>>>
>>>>>>> Thanks for the quick fix!
>>>>>>>
>>>>>>>> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
>>>>>>>> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
>>>>>>>> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
>>>>>>>> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
>>>>>> I've learned this morning that Alex is on vacation.
>>>>> Christian König is standing in for Alex.
>>>> By his own account, he already has "all hands full replacing him [Alex]",
>>>> explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
>>>> this reason and lacks permission to update branches in Alex' repo on fdo:
>>>>
>>>> "One lesson learned from the past week is that Alex needs to stop using
>>>> his personal repository on fdo.
>>>> We were asked a couple of times if I couldn't update a branch there from
>>>> different directions, which we obviously can't do."
>>>>
>>>> https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
>>>> https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html
>>> The important point being that Christian reviewed that patch and
>>> explicitly asked Daniel to pick it up.
>> Wow, wait a second. I'm just catching up on this thread.
>>
>> Lukas, you didn't commit the patch to drm-misc without a review, did you?
>>
>> I was intentionally holding back a rb because that isn't my field of
>> expertise and I was only briefly involved in the original patch. Alex should
>> be back by the end of the week, so no need for a rush like that.
>>
>> Daniel's patch was a global cleanup of include paths done by search and
>> replace, a completely different story.
> Want me to drop the patch until Alex is back? We're generally trying
> really hard to refrain from rebasing drm-misc branches, but for -fixes
> it's doable (since much less patch traffic there).

No, that would probably hurt more than it helps.

The patch is trivial and I just double checked the code and it should be 
fine.

Christian.
Daniel Vetter May 23, 2017, 7:43 a.m. UTC | #11
On Tue, May 23, 2017 at 09:36:44AM +0200, Daniel Vetter wrote:
> On Tue, May 23, 2017 at 09:32:38AM +0200, Christian König wrote:
> > Am 23.05.2017 um 05:55 schrieb Michel Dänzer:
> > > On 23/05/17 12:50 PM, Lukas Wunner wrote:
> > > > On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
> > > > > On 22/05/17 11:04 PM, Lukas Wunner wrote:
> > > > > > On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> > > > > > > On Thu, May 18 2017, Lukas Wunner wrote:
> > > > > > [snip]
> > > > > > > > Reported-by: Nicolai Stange <nicstange@gmail.com>
> > > > > > > > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> > > > > > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > > > > > ---
> > > > > > > > 
> > > > > > > > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > > > > > > > needs to be fixed, so sending out with a proper commit message now.
> > > > > > > > The bug was only introduced to radeon, not amdgpu.
> > > > > > > Tested-by: Nicolai Stange <nicstange@gmail.com>
> > > > > > > 
> > > > > > > Thanks for the quick fix!
> > > > > > > 
> > > > > > > > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > > > > > > > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > > > > > > > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > > > > > > > take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> > > > > > I've learned this morning that Alex is on vacation.
> > > > > Christian König is standing in for Alex.
> > > > By his own account, he already has "all hands full replacing him [Alex]",
> > > > explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
> > > > this reason and lacks permission to update branches in Alex' repo on fdo:
> > > > 
> > > > "One lesson learned from the past week is that Alex needs to stop using
> > > > his personal repository on fdo.
> > > > We were asked a couple of times if I couldn't update a branch there from
> > > > different directions, which we obviously can't do."
> > > > 
> > > > https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
> > > > https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html
> > > The important point being that Christian reviewed that patch and
> > > explicitly asked Daniel to pick it up.
> > 
> > Wow, wait a second. I'm just catching up on this thread.
> > 
> > > Lukas, you didn't commit the patch to drm-misc without a review, did you?
> > 
> > I was intentionally holding back a rb because that isn't my field of
> > expertise and I was only briefly involved in the original patch. Alex should
> > be back by the end of the week, so no need for a rush like that.
> > 
> > > Daniel's patch was a global cleanup of include paths done by search and
> > > replace, a completely different story.
> 
> Want me to drop the patch until Alex is back? We're generally trying
> really hard to refrain from rebasing drm-misc branches, but for -fixes
> it's doable (since much less patch traffic there).

Also, I guess putting this as requirement number one in the drm-misc docs
wasn't visible enough:

"Patch is properly reviewed or at least Ack, i.e. don't just push your own
stuff directly."

It would be nice if we could check this somehow using scripting, but since
we add the r-b/a-b tags only after they're applied that's a bit hard to
pull off. Maybe a sanity check when pushing that all the new patches
authored by the committer have an r-b/a-b would be useful ... Lukas, can
you pls look into that?

Thanks, Daniel
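
The push-time sanity check suggested above could be sketched roughly as
follows.  This is only an illustration, not existing drm-misc tooling:
the function names (line_starts_with, has_review_tag, push_allowed) are
hypothetical, and it assumes the Reviewed-by:/Acked-by: trailers are
already present in the commit message at the time of the push.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the line of length len begin with tag? */
static bool line_starts_with(const char *line, size_t len, const char *tag)
{
	size_t tag_len = strlen(tag);

	return len >= tag_len && strncmp(line, tag, tag_len) == 0;
}

/* Scan a commit message line by line for a Reviewed-by: or Acked-by: trailer. */
static bool has_review_tag(const char *msg)
{
	const char *line = msg;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (line_starts_with(line, len, "Reviewed-by:") ||
		    line_starts_with(line, len, "Acked-by:"))
			return true;

		line += len;
		if (*line == '\n')
			line++;
	}
	return false;
}

/*
 * Assumed policy: patches authored by someone other than the committer
 * pass, since the committer acts as a second pair of eyes.  Patches the
 * committer wrote must carry an r-b or a-b.
 */
static bool push_allowed(const char *author, const char *committer,
			 const char *msg)
{
	if (strcmp(author, committer) != 0)
		return true;
	return has_review_tag(msg);
}
```

With this policy a Tested-by: alone would not be enough for a
self-authored patch; whether that is the right threshold is a policy
question for the drm-misc maintainers.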
Lukas Wunner May 23, 2017, 9:40 a.m. UTC | #12
On Mon, May 22, 2017 at 03:35:48PM -0400, Sean Paul wrote:
> On Mon, May 22, 2017 at 04:04:07PM +0200, Lukas Wunner wrote:
> > On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> > > On Thu, May 18 2017, Lukas Wunner wrote:
> > [snip]
> > > > Reported-by: Nicolai Stange <nicstange@gmail.com>
> > > > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > ---
> > > >
> > > > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > > > needs to be fixed, so sending out with a proper commit message now.
> > > > The bug was only introduced to radeon, not amdgpu.
> > > 
> > > Tested-by: Nicolai Stange <nicstange@gmail.com>
> > > 
> > > Thanks for the quick fix!
> > >
> > > > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > > > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > > > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > > > take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> > 
> > I've learned this morning that Alex is on vacation.  I've pushed
> > the patch to drm-misc-fixes so that the issue is fixed in 4.12-rc3.
> > 
> > @Sean Paul: I've fast-forwarded to 4.12-rc2 before pushing, please
> > shout if I've done anything wrong.  First time I'm doing this.
> 
> No shouting, but a heads-up on IRC is probably warranted for both pushing a
> patch without R-b and fast-forwarding one of the branches.

Thanks, noted.  I'm not paid for work on the DRM subsystem, so I have to
do this during breaks at $DAYJOB where I have no access to IRC, but I will
ask via e-mail in the future before going out on a limb.

Not being able to dedicate my full attention to this all the time is also
the reason why it's hard for me to get the timing perfect:  I had already
submitted a fix before you sent out your -rc2 pull and I would have hated
missing another rc cycle, yet wasn't sure when exactly you were going to
send out your -rc3 pull this week and whether I would be able to carve out
enough time to push the patch before that date without hurriedly making
major mistakes.

Kind regards,

Lukas
Lukas Wunner May 23, 2017, 10 a.m. UTC | #13
On Mon, May 22, 2017 at 09:24:34PM +0200, Daniel Vetter wrote:
> On Thu, May 18, 2017 at 09:33:44PM +0200, Lukas Wunner wrote:
> > Nicolai Stange reports the following oops which is caused by
> > dereferencing rdev->pdev before it's subsequently set by
> > radeon_device_init().  Fix it.
> > 
> > [snip]
> > 
> > Reported-by: Nicolai Stange <nicstange@gmail.com>
> > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> > 
> > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > needs to be fixed, so sending out with a proper commit message now.
> > The bug was only introduced to radeon, not amdgpu.
> > 
> > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> 
> Just noticed that this has landed already in drm-misc-fixes, without any
> r-b or at least an ack from radeon driver folks. That's breaking the
> drm-misc rules, we need at least an ack for small drivers (which radeon
> really isn't) and a full reviewed-by tag on everything else.
> 
> Patch doesn't look wrong, so not much harm, but please follow the ground
> rules and especially don't ever push your own patches without any peer
> feedback.

I was aware of that rule and that the available peer feedback (Nicolai's
Tested-by) was thin.  I misinterpreted Christian's remark that he has
"all hands full replacing" Alex such that he is swamped in work and
didn't get the chance to look at my patch so far.  Christian was already
cc'ed on Nicolai's regression report and on every single e-mail that
followed.  I figured that nagging Christian wouldn't be helpful if he's
already overloaded, yet didn't want to miss another rc cycle, so I pushed
without waiting further for a response.  I'm sorry for the irritation this
has caused, I guess in the future nagging Christian despite regrets is the
only option in such a case.

Thanks,

Lukas
Lukas Wunner May 23, 2017, 10:14 a.m. UTC | #14
On Tue, May 23, 2017 at 09:32:38AM +0200, Christian König wrote:
> Am 23.05.2017 um 05:55 schrieb Michel Dänzer:
> >On 23/05/17 12:50 PM, Lukas Wunner wrote:
> >>On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
> >>>On 22/05/17 11:04 PM, Lukas Wunner wrote:
> >>>>On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> >>>>>On Thu, May 18 2017, Lukas Wunner wrote:
> >>>>[snip]
> >>>>>>Reported-by: Nicolai Stange <nicstange@gmail.com>
> >>>>>>Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> >>>>>>Signed-off-by: Lukas Wunner <lukas@wunner.de>
> >>>>>>---
> >>>>>>
> >>>>>>Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> >>>>>>needs to be fixed, so sending out with a proper commit message now.
> >>>>>>The bug was only introduced to radeon, not amdgpu.
> >>>>>Tested-by: Nicolai Stange <nicstange@gmail.com>
> >>>>>
> >>>>>Thanks for the quick fix!
> >>>>>
> >>>>>>@Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> >>>>>>land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> >>>>>>I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> >>>>>>take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> >>>>I've learned this morning that Alex is on vacation.
> >>>Christian König is standing in for Alex.
> >>By his own account, he already has "all hands full replacing him [Alex]",
> >>explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
> >>this reason and lacks permission to update branches in Alex' repo on fdo:
> >>
> >>"One lesson learned from the past week is that Alex needs to stop using
> >>his personal repository on fdo.
> >>We were asked a couple of times if I couldn't update a branch there from
> >>different directions, which we obviously can't do."
> >>
> >>https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
> >>https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html
> >The important point being that Christian reviewed that patch and
> >explicitly asked Daniel to pick it up.
> 
> Wow, wait a second. I'm just catching up on this thread.
> 
> Lukas, you didn't commit the patch to drm-misc without a review, did you?
> 
> I was intentionally holding back a rb because that isn't my field of
> expertise and I was only briefly involved in the original patch.

It would have been helpful if you had communicated that, I explicitly
asked Alex which tree he'd prefer merging through.  If you're his
stand-in then why didn't you reply?

I was already wondering why you took the time to reply to Daniel's patch
(which went into drm-misc-next, so queued for 4.13), but didn't reply
at all to my patch (which affects 4.12, so arguably has higher priority).

I'd dispute that the issue at hand requires specific domain knowledge;
it's a trivial dereference of a pointer before it's set.
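
For illustration, the ordering problem can be modelled in user space as
below.  This is a simplified sketch, not the radeon code: the *_model
types and load_sketch() are hypothetical stand-ins, and it assumes the
driver-private structure is zero-allocated so that its pdev member is
NULL until the init function copies it from the DRM device.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct pci_dev, drm_device and radeon_device. */
struct pci_dev_model { int thunderbolt_attached; };
struct drm_device_model { struct pci_dev_model *pdev; };
struct radeon_device_model { struct pci_dev_model *pdev; };

/* Models the init step that first sets rdev->pdev. */
static void device_init_model(struct radeon_device_model *rdev,
			      struct drm_device_model *dev)
{
	rdev->pdev = dev->pdev;
}

/*
 * Models the load path.  With use_fix == 0 the check reads rdev->pdev,
 * which is still NULL at that point; the sketch returns -2 where the
 * kernel would oops.  With use_fix != 0 it reads dev->pdev, which is
 * already valid.
 */
static int load_sketch(struct drm_device_model *dev, int use_fix, int *is_px)
{
	struct radeon_device_model *rdev = calloc(1, sizeof(*rdev));
	struct pci_dev_model *pdev;

	if (!rdev)
		return -1;

	pdev = use_fix ? dev->pdev : rdev->pdev;
	if (!pdev) {
		free(rdev);
		return -2;
	}
	*is_px = !pdev->thunderbolt_attached;

	device_init_model(rdev, dev);	/* only now is rdev->pdev set */
	free(rdev);
	return 0;
}
```

The faulting address in the oops (CR2 = 0x7cb with RAX = 0) is
consistent with this: a small offset read through a NULL base pointer.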


> Alex should
> be back by the end of the week, so no need for a rush like that.

End of the week means the patch would miss another rc cycle, and the
DRM subsystem is getting enough criticism for causing regressions as
it is, isn't it? :-(

Thanks,

Lukas
Christian König May 23, 2017, 10:32 a.m. UTC | #15
Am 23.05.2017 um 12:14 schrieb Lukas Wunner:
> On Tue, May 23, 2017 at 09:32:38AM +0200, Christian König wrote:
>> Am 23.05.2017 um 05:55 schrieb Michel Dänzer:
>>> On 23/05/17 12:50 PM, Lukas Wunner wrote:
>>>> On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
>>>>> On 22/05/17 11:04 PM, Lukas Wunner wrote:
>>>>>> On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
>>>>>>> On Thu, May 18 2017, Lukas Wunner wrote:
>>>>>> [snip]
>>>>>>>> Reported-by: Nicolai Stange <nicstange@gmail.com>
>>>>>>>> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
>>>>>>>> Signed-off-by: Lukas Wunner <lukas@wunner.de>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
>>>>>>>> needs to be fixed, so sending out with a proper commit message now.
>>>>>>>> The bug was only introduced to radeon, not amdgpu.
>>>>>>> Tested-by: Nicolai Stange <nicstange@gmail.com>
>>>>>>>
>>>>>>> Thanks for the quick fix!
>>>>>>>
>>>>>>>> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
>>>>>>>> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
>>>>>>>> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
>>>>>>>> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
>>>>>> I've learned this morning that Alex is on vacation.
>>>>> Christian König is standing in for Alex.
>>>> By his own account, he already has "all hands full replacing him [Alex]",
>>>> explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
>>>> this reason and lacks permission to update branches in Alex' repo on fdo:
>>>>
>>>> "One lesson learned from the past week is that Alex needs to stop using
>>>> his personal repository on fdo.
>>>> We were asked a couple of times if I couldn't update a branch there from
>>>> different directions, which we obviously can't do."
>>>>
>>>> https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
>>>> https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html
>>> The important point being that Christian reviewed that patch and
>>> explicitly asked Daniel to pick it up.
>> Wow, wait a second. I'm just catching up on this thread.
>>
>> Lukas, you didn't commit the patch to drm-misc without a review, did you?
>>
>> I was intentionally holding back a rb because that isn't my field of
>> expertise and I was only briefly involved in the original patch.
> It would have been helpful if you had communicated that, I explicitly
> asked Alex which tree he'd prefer merging through.  If you're his
> stand-in then why didn't you reply?

Alex will be back before the weekend and will probably send another fixes
pull for the rc.

PowerXpress is not my field of expertise, but Alex is deeply into it, so
I've ignored that issue for now.

> I was already wondering why you took the time to reply to Daniel's patch
> (which went into drm-misc-next, so queued for 4.13), but didn't reply
> at all to my patch (which affects 4.12, so arguably has higher priority).
>
> I'd dispute that the issue at hand requires specific domain knowledge;
> it's a trivial dereference of a pointer before it's set.

After taking the time this morning to take a look at the patch and the 
original code I can confirm that it is indeed completely trivial.

>> Alex should
>> be back by the end of the week, so no need for a rush like that.
> End of the week means the patch would miss another rc cycle, and the
> DRM subsystem is getting enough criticism for causing regressions as
> it is, isn't it? :-(

Yeah, but in this case just ping me instead of committing without any peer
review.

For stuff like this we will get even more criticism from Linus than for
causing regressions.

Anyway no harm done, let's just merge this one through drm-misc-fixes 
and everything is fine.

Regards,
Christian.

>
> Thanks,
>
> Lukas
Daniel Vetter May 23, 2017, 12:58 p.m. UTC | #16
On Tue, May 23, 2017 at 12:00:16PM +0200, Lukas Wunner wrote:
> On Mon, May 22, 2017 at 09:24:34PM +0200, Daniel Vetter wrote:
> > On Thu, May 18, 2017 at 09:33:44PM +0200, Lukas Wunner wrote:
> > > Nicolai Stange reports the following oops which is caused by
> > > dereferencing rdev->pdev before it's subsequently set by
> > > radeon_device_init().  Fix it.
> > > 
> > > [snip]
> > > 
> > > Reported-by: Nicolai Stange <nicstange@gmail.com>
> > > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > ---
> > > 
> > > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > > needs to be fixed, so sending out with a proper commit message now.
> > > The bug was only introduced to radeon, not amdgpu.
> > > 
> > > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > > take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> > 
> > Just noticed that this has landed already in drm-misc-fixes, without any
> > r-b or at least an ack from radeon driver folks. That's breaking the
> > drm-misc rules, we need at least an ack for small drivers (which radeon
> > really isn't) and a full reviewed-by tag on everything else.
> > 
> > Patch doesn't look wrong, so not much harm, but please follow the ground
> > rules and especially don't ever push your own patches without any peer
> > feedback.
> 
> I was aware of that rule and that the available peer feedback (Nicolai's
> Tested-by) was thin.  I misinterpreted Christian's remark that he has
> "all hands full replacing" Alex such that he is swamped in work and
> didn't get the chance to look at my patch so far.  Christian was already
> cc'ed on Nicolai's regression report and on every single e-mail that
> followed.  I figured that nagging Christian wouldn't be helpful if he's
> already overloaded, yet didn't want to miss another rc cycle, so I pushed
> without waiting further for a response.  I'm sorry for the irritation this
> has caused, I guess in the future nagging Christian despite regrets is the
> only option in such a case.

Don't worry too much. The entire point of drm-misc is to make contributing
to drm/gpu drivers as painless as possible. Occasionally things go a bit
wrong, but that's why I then want to focus on tooling & documentation to
make sure we'll get this right in the future (if every committer would
need to learn every implicit rule we have, we'd get nowhere at all).
Assigning blame to people doesn't help in getting better at this stuff as
a community.

Anyway, just my 2 cents of debriefing, looks like we're all good.
-Daniel
Alex Deucher May 23, 2017, 6:47 p.m. UTC | #17
> -----Original Message-----
> From: Lukas Wunner [mailto:lukas@wunner.de]
> Sent: Monday, May 22, 2017 11:51 PM
> To: Michel Dänzer
> Cc: Nicolai Stange; Sean Paul; Deucher, Alexander;
> dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
> Koenig, Christian
> Subject: Re: [PATCH] drm/radeon: Fix oops upon driver load on PowerXpress
> laptops
> 
> On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
> > On 22/05/17 11:04 PM, Lukas Wunner wrote:
> > > On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> > >> On Thu, May 18 2017, Lukas Wunner wrote:
> > > [snip]
> > >>> Reported-by: Nicolai Stange <nicstange@gmail.com>
> > >>> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
> > >>> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > >>> ---
> > >>>
> > >>> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > >>> needs to be fixed, so sending out with a proper commit message now.
> > >>> The bug was only introduced to radeon, not amdgpu.
> > >>
> > >> Tested-by: Nicolai Stange <nicstange@gmail.com>
> > >>
> > >> Thanks for the quick fix!
> > >>
> > >>> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > >>> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > >>> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > >>> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> > >
> > > I've learned this morning that Alex is on vacation.
> >
> > Christian König is standing in for Alex.
> 
> By his own account, he already has "all hands full replacing him [Alex]",
> explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
> this reason and lacks permission to update branches in Alex' repo on fdo:
> 
> "One lesson learned from the past week is that Alex needs to stop using
> his personal repository on fdo.
> We were asked a couple of times if I couldn't update a branch there from
> different directions, which we obviously can't do."
> 
> https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
> https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html
> 

What tree we use for pull requests is irrelevant.  We need to follow the proper protocol.  In the future patches like this should have an ack or rb and should flow through the radeon tree.

> 
> > > I've pushed the patch to drm-misc-fixes so that the issue is fixed in
> > > 4.12-rc3.
> >
> > I don't think there was any particular need to bypass the normal radeon
> > tree for this. There was plenty of time for the fix to get into 4.12
> > final, even after Alex is back.
> 
> Well, it wouldn't be nice towards users affected by the same issue
> who may waste time with bisecting to just sit on a fix twiddling thumbs.

We also need to try and avoid regressions and try and flow changes through proper trees.  There is always going to be some delay in getting changes upstream.

Alex

> 
> Thanks,
> 
> Lukas
Alex Deucher May 23, 2017, 8:23 p.m. UTC | #18
On Tue, May 23, 2017 at 2:47 PM, Deucher, Alexander
<Alexander.Deucher@amd.com> wrote:
>> -----Original Message-----
>> From: Lukas Wunner [mailto:lukas@wunner.de]
>> Sent: Monday, May 22, 2017 11:51 PM
>> To: Michel Dänzer
>> Cc: Nicolai Stange; Sean Paul; Deucher, Alexander;
>> dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>> Koenig, Christian
>> Subject: Re: [PATCH] drm/radeon: Fix oops upon driver load on PowerXpress
>> laptops
>>
>> On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
>> > On 22/05/17 11:04 PM, Lukas Wunner wrote:
>> > > On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
>> > >> On Thu, May 18 2017, Lukas Wunner wrote:
>> > > [snip]
>> > >>> Reported-by: Nicolai Stange <nicstange@gmail.com>
>> > >>> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
>> > >>> Signed-off-by: Lukas Wunner <lukas@wunner.de>
>> > >>> ---
>> > >>>
>> > >>> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
>> > >>> needs to be fixed, so sending out with a proper commit message now.
>> > >>> The bug was only introduced to radeon, not amdgpu.
>> > >>
>> > >> Tested-by: Nicolai Stange <nicstange@gmail.com>
>> > >>
>> > >> Thanks for the quick fix!
>> > >>
>> > >>> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
>> > >>> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
>> > >>> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
>> > >>> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
>> > >
>> > > I've learned this morning that Alex is on vacation.
>> >
>> > Christian König is standing in for Alex.
>>
>> By his own account, he already has "all hands full replacing him [Alex]",
>> explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
>> this reason and lacks permission to update branches in Alex' repo on fdo:
>>
>> "One lesson learned from the past week is that Alex needs to stop using
>> his personal repository on fdo.
>> We were asked a couple of times if I couldn't update a branch there from
>> different directions, which we obviously can't do."
>>
>> https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
>> https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html
>>
>
> What tree we use for pull requests is irrelevant.  We need to follow the proper protocol.  In the future patches like this should have an ack or rb and should flow through the radeon tree.
>
>>
>> > > I've pushed the patch to drm-misc-fixes so that the issue is fixed in
>> > > 4.12-rc3.
>> >
>> > I don't think there was any particular need to bypass the normal radeon
>> > tree for this. There was plenty of time for the fix to get into 4.12
>> > final, even after Alex is back.
>>
>> Well, it wouldn't be nice towards users affected by the same issue
>> who may waste time with bisecting to just sit on a fix twiddling thumbs.
>
> We also need to try and avoid regressions and try and flow changes through proper trees.  There is always going to be some delay in getting changes upstream.

Sorry for piling on, I hadn't quite caught up with the whole thread
yet.  In the end no harm done.

Alex

>
> Alex
>
>>
>> Thanks,
>>
>> Lukas

Patch

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 6a68d440bc44..d0ad03674250 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -116,7 +116,7 @@  int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
 	if ((radeon_runtime_pm != 0) &&
 	    radeon_has_atpx() &&
 	    ((flags & RADEON_IS_IGP) == 0) &&
-	    !pci_is_thunderbolt_attached(rdev->pdev))
+	    !pci_is_thunderbolt_attached(dev->pdev))
 		flags |= RADEON_IS_PX;
 
 	/* radeon_device_init should report only fatal error