Message ID | CAPM=9twFEv8AcRQG-WXg5owy_Xhxy3DqnvVCFHgtd4TYCcKWEQ@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [git,pull] drm for 5.20/6.0 |
On Tue, Aug 2, 2022 at 10:38 PM Dave Airlie <airlied@gmail.com> wrote:
>
> This is a conflicty one. The late revert in 5.19 of the amdgpu buddy
> allocator causes major conflict fallout. The buddy allocator code in
> this one works, so the resolutions are usually just to take stuff from
> this. It might actually be cleaner if you revert
> 925b6e59138cefa47275c67891c65d48d3266d57 (Revert "drm/amdgpu: add drm
> buddy support to amdgpu") first in your tree then merge this.

Ugh, what a pain.

The other conflicts are also due to just randomly duplicated commits,
with *usually* your drm tree having the superset (so "just take yours"
is the easy resolution), but not always (ie the Intel firmware-69 mess
was apparently not dealt with in the development tree).

Nasty.

I think I have it resolved, am still doing a full build test, and will
then compare against what your suggested merge is.

Linus
On Wed, Aug 3, 2022 at 7:46 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> I think I have it resolved, am still doing a full build test, and will
> then compare against what your suggested merge is.

Hmm.

I end up with *almost* the same thing.

Except I ended up with a

    select DRM_BUDDY

for the DRM_AMDGPU config entry, and you don't have that.

I *think* my version is correct, in that clearly the amdgpu driver now
uses that buddy logic (just doing a random "grep drm_buddy_block" to
see).

But this was messy enough to resolve that I think people should
double-check my end, and maybe I just got confused at some point in
the process.

And while I seem to have gotten the same result as you did on the i915
firmware side too, again, I'd like people to re-verify.

Linus
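The reasoning in the message above is that a driver which references the buddy allocator needs its config entry to select DRM_BUDDY, and a quick text search is enough to confirm the dependency. A rough Python equivalent of that grep, as a sketch only: the function name `files_referencing` and the suffix filter are illustrative assumptions and do not come from the thread.

```python
import os
from typing import List, Tuple


def files_referencing(root: str, symbol: str,
                      suffixes: Tuple[str, ...] = (".c", ".h")) -> List[str]:
    """Return sorted paths under root whose contents mention symbol.

    Hypothetical helper: a plain substring search, roughly what
    `grep -rl symbol root` would report for C sources and headers.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only look at source and header files (assumed filter).
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # Tolerate odd bytes rather than failing on a single file.
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                if symbol in fh.read():
                    hits.append(path)
    return sorted(hits)
```

Assuming a kernel checkout with the amdgpu sources under `drivers/gpu/drm/amd/amdgpu`, calling `files_referencing("drivers/gpu/drm/amd/amdgpu", "drm_buddy_block")` and getting a non-empty list would suggest the `select` line belongs in the config entry.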
The pull request you sent on Wed, 3 Aug 2022 15:37:43 +1000:
> git://anongit.freedesktop.org/drm/drm tags/drm-next-2022-08-03
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1
Thank you!
On Thu, 4 Aug 2022 at 13:16, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, Aug 3, 2022 at 7:46 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I think I have it resolved, am still doing a full build test, and will
> > then compare against what your suggested merge is.
>
> Hmm.
>
> I end up with *almost* the same thing.
>
> Except I ended up with a
>
>     select DRM_BUDDY
>
> for the DRM_AMDGPU config entry, and you don't have that.
>
> I *think* my version is correct, in that clearly the amdgpu driver now
> uses that buddy logic (just doing a random "grep drm_buddy_block" to
> see).

Actually I did miss that so that looks good.

>
> But this was messy enough to resolve that I think people should
> double-check my end, and maybe I just got confused at some point in
> the process.
>
> And while I seem to have gotten the same result as you did on the i915
> firmware side too, again, I'd like people to re-verify.

I'll pull it down and look over it.

Dave.
On Wed, Aug 3, 2022 at 8:37 PM Dave Airlie <airlied@gmail.com> wrote:
>
> Actually I did miss that so that looks good.

.. I wish it did, but I just actually test-booted my desktop with the
result, and it crashes the X server. This seems to be the splat in
Xorg.0.log:

  (II) Initializing extension DRI2
  (II) AMDGPU(0): Setting screen physical size to 2032 x 571
  (EE)
  (EE) Backtrace:
  (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x13d) [0x55b1dc61258d]
  (EE) 1: /lib64/libc.so.6 (__sigaction+0x50) [0x7f7972a3ea70]
  (EE) 2: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so (AMDGPUCreateWindow_oneshot+0x101) [0x7f797207ddd1]
  (EE) 3: /usr/libexec/Xorg (compIsAlternateVisual+0xdc4) [0x55b1dc545fa4]
  (EE) 4: /usr/libexec/Xorg (InitRootWindow+0x17) [0x55b1dc4e0047]
  (EE) 5: /usr/libexec/Xorg (miPutImage+0xd4c) [0x55b1dc49e60b]
  (EE) 6: /lib64/libc.so.6 (__libc_start_call_main+0x80) [0x7f7972a29550]
  (EE) 7: /lib64/libc.so.6 (__libc_start_main+0x89) [0x7f7972a29609]
  (EE) 8: /usr/libexec/Xorg (_start+0x25) [0x55b1dc49f2c5]
  (EE)
  (EE) Segmentation fault at address 0x4
  (EE)
  Fatal server error:
  (EE) Caught signal 11 (Segmentation fault). Server aborting

so something is going horribly wrong. No kernel oops, though.

It works on my intel laptop, so it's amdgpu somewhere.

I guess I will start bisecting. Oy vey.

Linus
On Thu, 4 Aug 2022 at 13:47, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, Aug 3, 2022 at 8:37 PM Dave Airlie <airlied@gmail.com> wrote:
> >
> > Actually I did miss that so that looks good.
>
> .. I wish it did, but I just actually test-booted my desktop with the
> result, and it crashes the X server. This seems to be the splat in
> Xorg.0.log:
>
>   (II) Initializing extension DRI2
>   (II) AMDGPU(0): Setting screen physical size to 2032 x 571
>   (EE)
>   (EE) Backtrace:
>   (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x13d) [0x55b1dc61258d]
>   (EE) 1: /lib64/libc.so.6 (__sigaction+0x50) [0x7f7972a3ea70]
>   (EE) 2: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
> (AMDGPUCreateWindow_oneshot+0x101) [0x7f797207ddd1]
>   (EE) 3: /usr/libexec/Xorg (compIsAlternateVisual+0xdc4) [0x55b1dc545fa4]
>   (EE) 4: /usr/libexec/Xorg (InitRootWindow+0x17) [0x55b1dc4e0047]
>   (EE) 5: /usr/libexec/Xorg (miPutImage+0xd4c) [0x55b1dc49e60b]
>   (EE) 6: /lib64/libc.so.6 (__libc_start_call_main+0x80) [0x7f7972a29550]
>   (EE) 7: /lib64/libc.so.6 (__libc_start_main+0x89) [0x7f7972a29609]
>   (EE) 8: /usr/libexec/Xorg (_start+0x25) [0x55b1dc49f2c5]
>   (EE)
>   (EE) Segmentation fault at address 0x4
>   (EE)
>   Fatal server error:
>   (EE) Caught signal 11 (Segmentation fault). Server aborting
>
> so something is going horribly wrong. No kernel oops, though.
>
> It works on my intel laptop, so it's amdgpu somewhere.

I'll spin my ryzen up to see if I can reproduce, and test against the
drm-next pre-merge tree as well.

Dave.
On Wed, Aug 3, 2022 at 8:53 PM Dave Airlie <airlied@gmail.com> wrote:
>
> > It works on my intel laptop, so it's amdgpu somewhere.
>
> I'll spin my ryzen up to see if I can reproduce, and test against the
> drm-next pre-merge tree as well.

So it's not my merge - I've had a bad result in the middle of the DRM
history too.

On a positive note, my arm64 machine works fine, but that's just using
fbdev so ...

But another datapoint to say that it's amdgpu-specific. Not that that
was really in doubt.

Linus
On Thu, 4 Aug 2022 at 14:02, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, Aug 3, 2022 at 8:53 PM Dave Airlie <airlied@gmail.com> wrote:
> >
> > > It works on my intel laptop, so it's amdgpu somewhere.
> >
> > I'll spin my ryzen up to see if I can reproduce, and test against the
> > drm-next pre-merge tree as well.
>
> So it's not my merge - I've had a bad result in the middle of the DRM
> history too.
>
> On a positive note, my arm64 machine works fine, but that's just using
> fbdev so ...
>
> But another datapoint to say that it's amdgpu-specific. Not that that
> was really in doubt.

I've reproduced it, I'll send you a revert pile when I confirm it is
the buddy allocator.

Dave.
On Wed, Aug 3, 2022 at 9:24 PM Dave Airlie <airlied@gmail.com> wrote:
>
> I've reproduced it, I'll send you a revert pile when I confirm it is
> the buddy allocator.

I've bisected it to 86bd6706c404..074293dd9f61 and don't see "buddy"
in any of those commits.

I'll do a few more. It's close enough already that it should be just
four more reboots to pinpoint exactly which commit breaks.

Linus
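The "four more reboots" estimate follows from how bisection works: each test halves the set of candidate commits, so about log2(N) tests isolate one bad commit among N. A minimal Python sketch of that arithmetic, assuming a linear history and a hypothetical `is_bad` predicate standing in for a build-and-boot test. This is not how git itself is implemented, and the real step count on a branching history can differ.

```python
from typing import Callable, Sequence


def steps_remaining(num_candidates: int) -> int:
    """Upper bound on tests needed to isolate one bad commit."""
    if num_candidates <= 1:
        return 0
    # ceil(log2(n)) computed with integers to avoid float rounding.
    return (num_candidates - 1).bit_length()


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest list.

    Assumes the last commit is known bad and that every commit after
    the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[hi] is known bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]
```

Under these assumptions, four remaining tests correspond to a range of at most 16 candidate commits, since `steps_remaining(16)` is 4.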
On Wed, Aug 3, 2022 at 9:27 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> I'll do a few more. It's close enough already that it should be just
> four more reboots to pinpoint exactly which commit breaks.

commit 5d945cbcd4b16a29d6470a80dfb19738f9a4319f is the first bad commit.

I think it's supposed to make no semantic changes, but it clearly does.

What a pain to figure out what's wrong in there, and I assume it
doesn't revert cleanly either.

Bringing in the guilty parties. See

  https://lore.kernel.org/all/CAHk-=wj+yzauNXiEwHfCrkbdLSQkizdR1Q3YJLAqPo6AVq2_4Q@mail.gmail.com/

for the beginning of this thread.

Linus
On Thu, 4 Aug 2022 at 14:46, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, Aug 3, 2022 at 9:27 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I'll do a few more. It's close enough already that it should be just
> > four more reboots to pinpoint exactly which commit breaks.
>
> commit 5d945cbcd4b16a29d6470a80dfb19738f9a4319f is the first bad commit.
>
> I think it's supposed to make no semantic changes, but it clearly does.
>
> What a pain to figure out what's wrong in there, and I assume it
> doesn't revert cleanly either.
>
> Bringing in the guilty parties. See
>
>   https://lore.kernel.org/all/CAHk-=wj+yzauNXiEwHfCrkbdLSQkizdR1Q3YJLAqPo6AVq2_4Q@mail.gmail.com/
>
> for the beginning of this thread.

I think I've tracked it down, looks like it would only affect GFX8
cards, which might explain why you and I have seen it, and I haven't
seen any other reports.

pretty sure you have an rx580, and I just happen to have a fiji card
in this machine right now.

I'll retest on master and send you a fixup patch.

Dave.
On Thu, 4 Aug 2022 at 15:25, Dave Airlie <airlied@gmail.com> wrote:
>
> On Thu, 4 Aug 2022 at 14:46, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Wed, Aug 3, 2022 at 9:27 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > I'll do a few more. It's close enough already that it should be just
> > > four more reboots to pinpoint exactly which commit breaks.
> >
> > commit 5d945cbcd4b16a29d6470a80dfb19738f9a4319f is the first bad commit.
> >
> > I think it's supposed to make no semantic changes, but it clearly does.
> >
> > What a pain to figure out what's wrong in there, and I assume it
> > doesn't revert cleanly either.
> >
> > Bringing in the guilty parties. See
> >
> >   https://lore.kernel.org/all/CAHk-=wj+yzauNXiEwHfCrkbdLSQkizdR1Q3YJLAqPo6AVq2_4Q@mail.gmail.com/
> >
> > for the beginning of this thread.
>
> I think I've tracked it down, looks like it would only affect GFX8
> cards, which might explain why you and I have seen it, and I haven't
> seen any other reports.
>
> pretty sure you have an rx580, and I just happen to have a fiji card
> in this machine right now.
>
> I'll retest on master and send you a fixup patch.

To close the loop

https://lore.kernel.org/all/20220804055036.691670-1-airlied@redhat.com/T/#u

Seems to fix it here.

Dave.