Message ID | CAPM=9tw-53PCvveRcdLUUQ+mjq2X2er5zp6n1KeE8Nu8x=VP2g@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [git,pull] drm for 6.10-rc1 |
The pull request you sent on Wed, 15 May 2024 16:20:56 +1000:
> https://gitlab.freedesktop.org/drm/kernel.git tags/drm-next-2024-05-15
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/db5d28c0bfe566908719bec8e25443aabecbb802
Thank you!
On Tue, 14 May 2024 at 23:21, Dave Airlie <airlied@gmail.com> wrote:
>
> In drivers the main thing is a new driver for ARM Mali firmware based
> GPUs, otherwise there are a lot of changes to amdgpu/xe/i915/msm and
> scattered changes to everything else.

Hmm. There's something seriously wrong with amdgpu.

I'm getting a ton of __force_merge warnings:

  WARNING: CPU: 0 PID: 1069 at drivers/gpu/drm/drm_buddy.c:199 __force_merge+0x14f/0x180 [drm_buddy]
  Modules linked in: hid_logitech_hidpp hid_logitech_dj uas usb_storage amdgpu drm_ttm_helper ttm video drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm ghash_clmulni_intel igb atlantic nvme dca macsec ccp i2c_algo_bit nvme_core sp5100_tco wmi ip6_tables ip_tables fuse
  CPU: 0 PID: 1069 Comm: plymouthd Not tainted 6.9.0-07381-g3860ca371740 #60
  Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS MASTER/TRX40 AORUS MASTER, BIOS F7 09/07/2022
  RIP: 0010:__force_merge+0x14f/0x180 [drm_buddy]
  Code: 74 0d 49 8b 44 24 18 48 d3 e0 49 29 44 24 30 4c 89 e7 ba 01 00 00 00 e8 9f 00 00 00 44 39 e8 73 1f 49 8b 04 24 e9 25 ff ff ff <0f> 0b 4c 39 c3 75 a3 eb 99 b8 f4 ff ff ff c3 b8 f4 ff ff ff eb 02
  RSP: 0018:ffffb87a81cb7908 EFLAGS: 00010246
  RAX: ffff9b1915de8000 RBX: ffff9b1919478288 RCX: 000000000ffff800
  RDX: ffff9b19194782f8 RSI: ffff9b19194782d0 RDI: ffff9b19194782b0
  RBP: 0000000000000000 R08: ffff9b1919478288 R09: 0000000000001000
  R10: 0000000000000800 R11: 0000000000000000 R12: ffff9b192590fa18
  R13: 000000000000000d R14: 0000000010000000 R15: 0000000000000000
  FS: 00007fa06bfa9740(0000) GS:ffff9b281e000000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000555adb857000 CR3: 000000011b516000 CR4: 0000000000350ef0
  Call Trace:
   ? __force_merge+0x14f/0x180 [drm_buddy]
   drm_buddy_alloc_blocks+0x249/0x400 [drm_buddy]
   ? __cond_resched+0x16/0x40
   amdgpu_vram_mgr_new+0x204/0x3f0 [amdgpu]
   ttm_resource_alloc+0x31/0x120 [ttm]
   ttm_bo_alloc_resource+0xbc/0x260 [ttm]
   ttm_bo_validate+0x9f/0x210 [ttm]
   ttm_bo_init_reserved+0x103/0x130 [ttm]
   amdgpu_bo_create+0x246/0x400 [amdgpu]
   ? amdgpu_bo_destroy+0x70/0x70 [amdgpu]
   amdgpu_bo_create_user+0x29/0x40 [amdgpu]
   amdgpu_mode_dumb_create+0x108/0x190 [amdgpu]
   ? amdgpu_bo_destroy+0x70/0x70 [amdgpu]
   ? drm_mode_create_dumb+0xa0/0xa0 [drm]
   drm_ioctl_kernel+0xad/0xd0 [drm]
   drm_ioctl+0x330/0x4b0 [drm]
   ? drm_mode_create_dumb+0xa0/0xa0 [drm]
   amdgpu_drm_ioctl+0x41/0x80 [amdgpu]
   __x64_sys_ioctl+0xd2a/0xe00
   ? update_process_times+0x89/0xa0
   ? tick_nohz_handler+0xe2/0x120
   ? timerqueue_add+0x94/0xa0
   ? __hrtimer_run_queues+0x12b/0x250
   ? ktime_get+0x34/0xb0
   ? lapic_next_event+0x12/0x20
   ? clockevents_program_event+0x78/0xd0
   ? hrtimer_interrupt+0x118/0x390
   ? sched_clock+0x5/0x10
   do_syscall_64+0x68/0x130
   ? __irq_exit_rcu+0x53/0xb0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

and eventually the whole thing just crashes entirely, with a bad page
state in the VM:

  BUG: Bad page state in process kworker/u261:13  pfn:31fb9a
  page: refcount:0 mapcount:0 mapping:00000000ff0b239e index:0x37ce8 pfn:0x31fb9a
  aops:btree_aops ino:1
  flags: 0x2fffc600000020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x3fff)
  page_type: 0xffffffff()

which comes from a btrfs worker (btrfs-delayed-meta btrfs_work_helper),
but I would not be surprised if that was caused by whatever odd thing
is going on with the DRM code.

IOW, it *looks* like this code ends up just corrupting memory in
horrible ways.

Linus
On Wed, 15 May 2024 at 13:06, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Hmm. There's something seriously wrong with amdgpu.
>
> I'm getting a ton of __force_merge warnings:
>
>   WARNING: CPU: 0 PID: 1069 at drivers/gpu/drm/drm_buddy.c:199
> __force_merge+0x14f/0x180 [drm_buddy]

Adding likely culprits to the participants, since it looks like this
is all new with commit 96950929eb23 ("drm/buddy: Implement tracking
clear page feature").

Sadly I can't just revert that commit to check, because there are many
subsequent commits that then depend on it.

I guess I'll try to revert the later commit that enables it for amdgpu
(commit a68c7eaa7a8f) and see if it at least makes the horrendous
messages go away.

Anyway, this is some old Radeon graphics card in my Threadripper:

  49:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) (prog-if 00 [VGA controller])
        Subsystem: Sapphire Technology Limited Radeon RX 570 Pulse 4GB
        Flags: bus master, fast devsel, latency 0, IRQ 130, IOMMU group 32
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at 8000 [size=256]
        Memory at d1c00000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

I think it's a "Sapphire Radeon Pulse RX 580" or something like that.

Linus
On Wed, 15 May 2024 at 13:21, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> I guess I'll try to revert the later commit that enables it for amdgpu
> (commit a68c7eaa7a8f) and see if it at least makes the horrendous
> messages go away.

I have to revert both

  a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
  e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")

to make things build cleanly. Next step: see if it boots and fixes the
problem for me.

Linus
On Wed, 15 May 2024 at 13:24, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> I have to revert both
>
>   a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
>   e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
>
> to make things build cleanly. Next step: see if it boots and fixes the
> problem for me.

Well, perhaps not surprisingly, the WARN_ON() no longer triggers with
this, and everything looks fine.

Let's see if the machine ends up being stable now. It took several
hours for the "scary messages" state to turn into the "hung machine"
state, so they *could* have been independent issues, but it seems a
bit unlikely.

Linus
On Tue, 14 May 2024 at 23:21, Dave Airlie <airlied@gmail.com> wrote:
>
> This is the main pull request for the drm subsystems for 6.10.

.. and now that I look more at this pull request, I find other things wrong.

Why is the DRM code asking if I want to enable -Werror? I have Werror
enabled *already*.

I hate stupid config questions. They only confuse users.

If the global WERROR config is enabled, then the DRM config certainly
shouldn't ask whether you want even more -Werror. It does nothing but
annoy people.

And no, we are not going to have subsystems that can *weaken* the
existing CONFIG_WERROR. Happily, that doesn't seem to be what the DRM
code wants to do, it just wants to add -Werror, but as mentioned, it's
crazy to do that when we already have it globally enabled.

Now, it might make more sense to ask if you want -Wextra. A lot of
those warnings are bogus.

Linus
On Thu, 16 May 2024 at 06:43, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Tue, 14 May 2024 at 23:21, Dave Airlie <airlied@gmail.com> wrote:
> >
> > This is the main pull request for the drm subsystems for 6.10.
>
> .. and now that I look more at this pull request, I find other things wrong.
>
> Why is the DRM code asking if I want to enable -Werror? I have Werror
> enabled *already*.
>
> I hate stupid config questions. They only confuse users.
>
> If the global WERROR config is enabled, then the DRM config certainly
> shouldn't ask whether you want even more -Werror. It does nothing but
> annoy people.
>
> And no, we are not going to have subsystems that can *weaken* the
> existing CONFIG_WERROR. Happily, that doesn't seem to be what the DRM
> code wants to do, it just wants to add -Werror, but as mentioned, it's
> crazy to do that when we already have it globally enabled.
>
> Now, it might make more sense to ask if you want -Wextra. A lot of
> those warnings are bogus.

The help says:

    The drm subsystem enables more warnings than the kernel default, so
    this config option is disabled by default.

It's also

depends on DRM && EXPERT

so we aren't throwing it at random users.

should we rename it CONFIG_DRM_WERROR_MORE or something?

Dave.
On Wed, 15 May 2024 at 15:45, Dave Airlie <airlied@gmail.com> wrote:
>
> The drm subsystem enables more warnings than the kernel default, so
> this config option is disabled by default.

Irrelevant.

If the *main* CONFIG_WERROR is on, then it does NOT MATTER if somebody
sets CONFIG_DRM_WERROR or not. It's a no-op. It's pointless.

And that means that it's also entirely pointless to ask. It's only annoying.

> depends on DRM && EXPERT
>
> so we aren't throwing it at random users.

Yes you are.

Because - rightly or wrongly - distros enable EXPERT by default. At
least Fedora does. So any user that starts from a distro config will
have EXPERT enabled.

> should we rename it CONFIG_DRM_WERROR_MORE or something?

Renaming does nothing. If it's pointless, it's pointless even if it's renamed.

It needs to have a

    depends on !WERROR

because if WERROR is already true, then it's stupid and wrong to ask AGAIN.

To summarize: if the main WERROR is enabled, then the DRM tree is
*ALREADY* built with WERROR. Asking for DRM_WERROR is wrong.

I keep harping on bad config variables because our kernel config thing
is already much too messy and is by far the most difficult part of
building your own kernel.

Everything else is literally just "make" followed by "make
modules_install" and "make install". Very straightforward.

But doing a kernel config? Nasty. And made nastier by bad and
nonsensical questions.

Linus
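To make the "it's a no-op" argument concrete, here is a small C model of the boolean logic. It is a sketch, not kernel code: it assumes every Kconfig symbol is a plain bool and ignores tristate values, defaults and `select`. The names `struct cfg`, `asked_current()`, `asked_proposed()`, `drm_gets_werror()` and `count_pointless_questions()` are hypothetical. `asked_current()` mirrors the `depends on DRM && EXPERT` quoted above, and `asked_proposed()` adds the suggested `!WERROR`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: every Kconfig symbol is a plain bool.
 * Tristate values, defaults and `select` are ignored. */
struct cfg {
	bool werror;     /* global CONFIG_WERROR */
	bool expert;     /* CONFIG_EXPERT */
	bool drm;        /* CONFIG_DRM */
	bool drm_answer; /* what the user would answer if asked about DRM_WERROR */
};

/* "depends on DRM && EXPERT": the question is asked even when WERROR=y. */
bool asked_current(struct cfg c)
{
	return c.drm && c.expert;
}

/* The same dependency with the suggested "!WERROR" added. */
bool asked_proposed(struct cfg c)
{
	return c.drm && c.expert && !c.werror;
}

/* A symbol with unmet dependencies stays n, so the user's answer only
 * counts when the question is asked. drm/ gets -Werror from either the
 * global switch or the DRM-specific one. */
bool drm_gets_werror(struct cfg c, bool (*asked)(struct cfg))
{
	bool drm_werror = asked(c) && c.drm_answer;

	return c.werror || drm_werror;
}

/* Walks all 16 configurations. Asserts that adding "!WERROR" never changes
 * whether drm/ gets -Werror, and returns how many configurations lose a
 * question whose answer could not have mattered. */
int count_pointless_questions(void)
{
	int pointless = 0;

	for (unsigned int bits = 0; bits < 16; bits++) {
		struct cfg c = {
			.werror = bits & 1u,
			.expert = bits & 2u,
			.drm = bits & 4u,
			.drm_answer = bits & 8u,
		};

		assert(drm_gets_werror(c, asked_current) ==
		       drm_gets_werror(c, asked_proposed));
		if (asked_current(c) && !asked_proposed(c))
			pointless++;
	}
	return pointless;
}
```

In this model `count_pointless_questions()` returns 2: the configurations with DRM, EXPERT and WERROR all enabled (user answering y or n), which is exactly the case where the question is asked but the DRM tree is already built with -Werror.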
On Thu, 16 May 2024 at 08:56, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, 15 May 2024 at 15:45, Dave Airlie <airlied@gmail.com> wrote:
> >
> > The drm subsystem enables more warnings than the kernel default, so
> > this config option is disabled by default.
>
> Irrelevant.
>
> If the *main* CONFIG_WERROR is on, then it does NOT MATTER if somebody
> sets CONFIG_DRM_WERROR or not. It's a no-op. It's pointless.
>
> And that means that it's also entirely pointless to ask. It's only annoying.
>
> > depends on DRM && EXPERT
> >
> > so we aren't throwing it at random users.
>
> Yes you are.
>
> Because - rightly or wrongly - distros enable EXPERT by default. At
> least Fedora does. So any user that starts from a distro config will
> have EXPERT enabled.
>
> > should we rename it CONFIG_DRM_WERROR_MORE or something?
>
> Renaming does nothing. If it's pointless, it's pointless even if it's renamed.
>
> It needs to have a
>
>     depends on !WERROR
>
> because if WERROR is already true, then it's stupid and wrong to ask AGAIN.
>
> To summarize: if the main WERROR is enabled, then the DRM tree is
> *ALREADY* built with WERROR. Asking for DRM_WERROR is wrong.
>
> I keep harping on bad config variables because our kernel config thing
> is already much too messy and is by far the most difficult part of
> building your own kernel.
>
> Everything else is literally just "make" followed by "make
> modules_install" and "make install". Very straightforward.
>
> But doing a kernel config? Nasty. And made nastier by bad and
> nonsensical questions.

It's also possible it's just that hey there's a few others in the tree

KVM_WERROR not tied to it
PPC_WERROR (why does CXL uses this?)
AMDGPU, I915 and XE all have !COMPILE_TEST on their variants

We should probably add !WERROR to all of these at this point.
Adding Jani who was the initial author of

commit f89632a9e5fa6c4787c14458cd42a9ef42025434
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Tue Mar 5 11:07:36 2024 +0200

    drm: Add CONFIG_DRM_WERROR

where I see we actually removed the !COMPILE_TEST check in v2.

Dave.
On Thu, 16 May 2024 at 06:29, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, 15 May 2024 at 13:24, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I have to revert both
> >
> >   a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
> >   e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
> >
> > to make things build cleanly. Next step: see if it boots and fixes the
> > problem for me.
>
> Well, perhaps not surprisingly, the WARN_ON() no longer triggers with
> this, and everything looks fine.
>
> Let's see if the machine ends up being stable now. It took several
> hours for the "scary messages" state to turn into the "hung machine"
> state, so they *could* have been independent issues, but it seems a
> bit unlikely.

I think that should be fine to do for now.

I think it is also fine to do like I've attached, but I'm not sure if
I'd take that chance.

Two questions for Arunpravin (and Alex):

Is this fix correct, and can we get a good explanation of it?

Where did this error sneak in? Is the problem in the amdgpu tree, or
was it a drm-next only problem? If so perhaps we need to discuss
moving amdgpu more into drm-tip to catch this sort of problem.

Dave.
On Thu, 16 May 2024 at 06:29, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, 15 May 2024 at 13:24, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I have to revert both
> >
> >   a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
> >   e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
> >
> > to make things build cleanly. Next step: see if it boots and fixes the
> > problem for me.
>
> Well, perhaps not surprisingly, the WARN_ON() no longer triggers with
> this, and everything looks fine.
>
> Let's see if the machine ends up being stable now. It took several
> hours for the "scary messages" state to turn into the "hung machine"
> state, so they *could* have been independent issues, but it seems a
> bit unlikely.

This worries me actually, it's possible this warn could cause a
problem, but I'm not convinced it should have machine ending
properties without some sort of different error at the end, so I'd
keep an eye open here.

Dave.
On Wed, 15 May 2024 at 16:17, Dave Airlie <airlied@gmail.com> wrote:
>
> It's also possible it's just that hey there's a few others in the tree
>
> KVM_WERROR not tied to it
> PPC_WERROR (why does CXL uses this?)

Yeah, that should be fixed too, but at least KVM_WERROR predates the
whole-kernel WERROR. And PPC_WERROR predates it by over a decade.

But yes, good catch - both of those should be silenced if we already
have the global WERROR enabled.

I mainly notice new questions (because I use "make oldconfig"), so old
pre-existing illogical ones don't trigger my "why are they asking?"
reaction.

> AMDGPU, I915 and XE all have !COMPILE_TEST on their variants

Hmm. It turns out that I didn't notice the AMDGPU one because my
Threadripper - that has AMDGPU enabled - I have actually turned off
EXPERT on, so it's hidden by that for me.

But yes, both of those should be "depends on !WERROR" too.

Or maybe they should just go away entirely, and be subsumed by the
DRM_WERROR thing.

Linus
On Wed, 15 May 2024 at 16:51, Dave Airlie <airlied@gmail.com> wrote:
>
> > Let's see if the machine ends up being stable now. It took several
> > hours for the "scary messages" state to turn into the "hung machine"
> > state, so they *could* have been independent issues, but it seems a
> > bit unlikely.
>
> This worries me actually, it's possible this warn could cause a
> problem, but I'm not convinced it should have machine ending
> properties without some sort of different error at the end, so I'd
> keep an eye open here.

Well, since I'm a big believer in dogfooding, I always run my own
kernel even during the merge window. I don't reboot between each
pull, but I try to basically reboot daily.

And it's entirely possible that the eventual "bad page flags" error -
which is what I think triggered the eventual hang - is something else
that came in during this merge window. I haven't actually gotten the
-mm changes from Andrew yet, but it did happen in the btrfs kworker,
and I have merged the btrfs changes for 6.10. So maybe they are the
cause.

I was blaming the DRM case mainly because it clearly *was* about some
kind of allocation management, and I got a *lot* of those warnings:

    $ journalctl -b -1 | grep 'WARNING: CPU' | wc -l
    16015

but let's see if it happens with my amdgpu reverts in place, and no
drm warnings.

It most definitely wouldn't be the first time we had multiple
independent bugs during the merge window ;/

Linus
On Thu, 16 May 2024 at 09:50, Dave Airlie <airlied@gmail.com> wrote:
>
> On Thu, 16 May 2024 at 06:29, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Wed, 15 May 2024 at 13:24, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > I have to revert both
> > >
> > >   a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
> > >   e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
> > >
> > > to make things build cleanly. Next step: see if it boots and fixes the
> > > problem for me.
> >
> > Well, perhaps not surprisingly, the WARN_ON() no longer triggers with
> > this, and everything looks fine.
> >
> > Let's see if the machine ends up being stable now. It took several
> > hours for the "scary messages" state to turn into the "hung machine"
> > state, so they *could* have been independent issues, but it seems a
> > bit unlikely.
>
> I think that should be fine to do for now.
>
> I think it is also fine to do like I've attached, but I'm not sure if
> I'd take that chance.

Scrap that idea, doesn't die, but it makes my system unhappy, like
fbdev missing,

so for quickest path forward, just make the two reverts seems best.

I've reproduced it here, so I'll track it down,

Dave.
On Thu, 16 May 2024 at 10:06, Dave Airlie <airlied@gmail.com> wrote:
>
> On Thu, 16 May 2024 at 09:50, Dave Airlie <airlied@gmail.com> wrote:
> >
> > On Thu, 16 May 2024 at 06:29, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Wed, 15 May 2024 at 13:24, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > I have to revert both
> > > >
> > > >   a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
> > > >   e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
> > > >
> > > > to make things build cleanly. Next step: see if it boots and fixes the
> > > > problem for me.
> > >
> > > Well, perhaps not surprisingly, the WARN_ON() no longer triggers with
> > > this, and everything looks fine.
> > >
> > > Let's see if the machine ends up being stable now. It took several
> > > hours for the "scary messages" state to turn into the "hung machine"
> > > state, so they *could* have been independent issues, but it seems a
> > > bit unlikely.
> >
> > I think that should be fine to do for now.
> >
> > I think it is also fine to do like I've attached, but I'm not sure if
> > I'd take that chance.
>
> Scrap that idea, doesn't die, but it makes my system unhappy, like
> fbdev missing,
>
> so for quickest path forward, just make the two reverts seems best.
>
> I've reproduced it here, so I'll track it down,

https://lore.kernel.org/amd-gfx/20240514145636.16253-1-Arunpravin.PaneerSelvam@amd.com/T/#t

This patch seems to fix it for me, I might just pull it into my tree
and send it to you.

Dave.
On 5/16/2024 8:12 AM, Dave Airlie wrote:
> On Thu, 16 May 2024 at 10:06, Dave Airlie <airlied@gmail.com> wrote:
>> On Thu, 16 May 2024 at 09:50, Dave Airlie <airlied@gmail.com> wrote:
>>> On Thu, 16 May 2024 at 06:29, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>> On Wed, 15 May 2024 at 13:24, Linus Torvalds
>>>> <torvalds@linux-foundation.org> wrote:
>>>>> I have to revert both
>>>>>
>>>>> a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
>>>>> e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
>>>>>
>>>>> to make things build cleanly. Next step: see if it boots and fixes the
>>>>> problem for me.
>>>> Well, perhaps not surprisingly, the WARN_ON() no longer triggers with
>>>> this, and everything looks fine.
>>>>
>>>> Let's see if the machine ends up being stable now. It took several
>>>> hours for the "scary messages" state to turn into the "hung machine"
>>>> state, so they *could* have been independent issues, but it seems a
>>>> bit unlikely.
>>> I think that should be fine to do for now.
>>>
>>> I think it is also fine to do like I've attached, but I'm not sure if
>>> I'd take that chance.
>> Scrap that idea, doesn't die, but it makes my system unhappy, like
>> fbdev missing,
>>
>> so for quickest path forward, just make the two reverts seems best.
>>
>> I've reproduced it here, so I'll track it down,
> https://lore.kernel.org/amd-gfx/20240514145636.16253-1-Arunpravin.PaneerSelvam@amd.com/T/#t
>
> This patch seems to fix it for me, I might just pull it into my tree
> and send it to you.

Sorry for the noise, Dave's link is the right fix for this issue. Have
you already picked it up or should I push it to drm-misc-next-fixes?

Thanks,
Arun.
>
> Dave.
On Wed, 15 May 2024, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, 15 May 2024 at 16:17, Dave Airlie <airlied@gmail.com> wrote:
>> AMDGPU, I915 and XE all have !COMPILE_TEST on their variants
>
> Hmm. It turns out that I didn't notice the AMDGPU one because my
> Threadripper - that has AMDGPU enabled - I have actually turned off
> EXPERT on, so it's hidden by that for me.
>
> But yes, both of those should be "depends on !WERROR" too.

Fair enough. Honestly it just didn't occur to me.

The main goal here was to ensure the drm subsystem does not have any
build warnings, but without halting CI on any non-drm warnings that
might occasionally creep in and that we can't fix as quickly.

If there was a way to somehow limit WERROR by subdirectories, without
config options, I'd love to ditch the config.

> Or maybe they should just go away entirely, and be subsumed by the
> DRM_WERROR thing.

For i915, this was the idea anyway, we just haven't gotten around to it
yet.

BR,
Jani.
On Thu, May 16, 2024 at 4:42 AM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>
> On Wed, 15 May 2024, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Wed, 15 May 2024 at 16:17, Dave Airlie <airlied@gmail.com> wrote:
> >> AMDGPU, I915 and XE all have !COMPILE_TEST on their variants
> >
> > Hmm. It turns out that I didn't notice the AMDGPU one because my
> > Threadripper - that has AMDGPU enabled - I have actually turned off
> > EXPERT on, so it's hidden by that for me.
> >
> > But yes, both of those should be "depends on !WERROR" too.
>
> Fair enough. Honestly it just didn't occur to me.
>
> The main goal here was to ensure the drm subsystem does not have any
> build warnings, but without halting CI on any non-drm warnings that
> might occasionally creep in and that we can't fix as quickly.
>
> If there was a way to somehow limit WERROR by subdirectories, without
> config options, I'd love to ditch the config.

Right. Same thing for amdgpu. Our CI was often breaking due to
-WERROR in other subsystems or with compiler updates. Maybe it's
better now.

Alex

>
> > Or maybe they should just go away entirely, and be subsumed by the
> > DRM_WERROR thing.
>
> For i915, this was the idea anyway, we just haven't gotten around to it
> yet.
>
>
> BR,
> Jani.
>
>
> --
> Jani Nikula, Intel
On Thu, May 16, 2024, Dave Airlie wrote:
> On Thu, 16 May 2024 at 08:56, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > If the *main* CONFIG_WERROR is on, then it does NOT MATTER if somebody
> > sets CONFIG_DRM_WERROR or not. It's a no-op. It's pointless.

+1

> It's also possible it's just that hey there's a few others in the tree
>
> KVM_WERROR not tied to it
> PPC_WERROR (why does CXL uses this?)
> AMDGPU, I915 and XE all have !COMPILE_TEST on their variants
>
> We should probably add !WERROR to all of these at this point.

That creates its own weirdness though, e.g. I guarantee I'll forget about the
global WERROR at some point and wonder why I'm seeing -Werror despite having
KVM_WERROR=n in my .config.

I would rather force KVM_WERROR if WERROR=y, so this?

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2a7f69abcac3..75082c4a9ac4 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -44,6 +44,7 @@ config KVM
 	select KVM_VFIO
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
+	select KVM_WERROR if WERROR
 	help
 	  Support hosting fully virtualized guest machines using hardware
 	  virtualization extensions.  You will need a fairly recent
@@ -66,7 +67,7 @@ config KVM_WERROR
 	# FRAME_WARN, i.e. KVM_WERROR=y with KASAN=y requires special tuning.
 	# Building KVM with -Werror and KASAN is still doable via enabling
 	# the kernel-wide WERROR=y.
-	depends on KVM && EXPERT && !KASAN
+	depends on KVM && ((EXPERT && !KASAN) || WERROR)
 	help
 	  Add -Werror to the build flags for KVM.
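Sean's objection is about what `.config` reports rather than how KVM is built. The sketch below models it in C, again assuming plain bool symbols (KVM's tristate `m` value is ignored) and using hypothetical names: `kvm_werror_hide()` applies the `!WERROR` dependency suggested earlier in the thread to the existing `depends on KVM && EXPERT && !KASAN`, `kvm_werror_select()` follows the two hunks of the diff above, and `count_mismatches()` counts configurations in which the `KVM_WERROR` value in `.config` disagrees with whether KVM is actually compiled with -Werror.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: every symbol is a plain bool (KVM=m is ignored). */
struct kcfg {
	bool werror; /* global CONFIG_WERROR */
	bool expert; /* CONFIG_EXPERT */
	bool kasan;  /* CONFIG_KASAN */
	bool kvm;    /* CONFIG_KVM */
	bool answer; /* what the user would answer if asked about KVM_WERROR */
};

/* "!WERROR" added to "depends on KVM && EXPERT && !KASAN": with WERROR=y
 * the question disappears and KVM_WERROR stays n. */
bool kvm_werror_hide(struct kcfg c)
{
	bool asked = c.kvm && c.expert && !c.kasan && !c.werror;

	return asked && c.answer;
}

/* The proposed diff: KVM does "select KVM_WERROR if WERROR", and
 * KVM_WERROR "depends on KVM && ((EXPERT && !KASAN) || WERROR)". */
bool kvm_werror_select(struct kcfg c)
{
	bool deps = c.kvm && ((c.expert && !c.kasan) || c.werror);

	if (c.kvm && c.werror) {
		/* Forced to y by the select; the widened "|| WERROR" keeps
		 * this from being an unmet dependency even with KASAN=y. */
		assert(deps);
		return true;
	}
	/* Otherwise deps reduce to KVM && EXPERT && !KASAN: user is asked. */
	return deps && c.answer;
}

/* Counts the configurations where the KVM_WERROR value in .config does not
 * match whether KVM is really compiled with -Werror. */
int count_mismatches(bool (*rule)(struct kcfg))
{
	int mismatches = 0;

	for (unsigned int bits = 0; bits < 32; bits++) {
		struct kcfg c = {
			.werror = bits & 1u,
			.expert = bits & 2u,
			.kasan = bits & 4u,
			.kvm = bits & 8u,
			.answer = bits & 16u,
		};
		bool value = rule(c);
		bool built_with_werror = c.kvm && (c.werror || value);

		if (value != built_with_werror)
			mismatches++;
	}
	return mismatches;
}
```

Under these assumptions `count_mismatches(kvm_werror_hide)` is 8 (every configuration with KVM=y and WERROR=y, whatever EXPERT, KASAN and the user's answer are), while `count_mismatches(kvm_werror_select)` is 0. That gap is the "-Werror despite KVM_WERROR=n" confusion the `select` approach is meant to avoid.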