diff mbox series

[RFC] net: ethtool: do runtime PM outside RTNL

Message ID 20231204200710.40c291e60cea.I2deb5804ef1739a2af307283d320ef7d82456494@changeid (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [RFC] net: ethtool: do runtime PM outside RTNL | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1117 this patch: 1117
netdev/cc_maintainers fail 6 maintainers not CCed: andrew@lunn.ch pabeni@redhat.com kuba@kernel.org d-tatianin@yandex-team.ru jdamato@fastly.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 1143 this patch: 1143
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api warning Found: 'dev_put(' was: 0 now: 1
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1144 this patch: 1144
netdev/checkpatch fail ERROR: code indent should use tabs where possible WARNING: line length of 89 exceeds 80 columns WARNING: please, no space before tabs
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Johannes Berg Dec. 4, 2023, 7:07 p.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

As reported by Marc MERLIN in [1], at least one driver (igc)
wants/needs to acquire the RTNL inside suspend/resume ops,
which can be called from here in ethtool if runtime PM is
enabled.

[1] https://lore.kernel.org/r/20231202221402.GA11155@merlins.org

Allow this by doing runtime PM transitions without the RTNL
held. For the ioctl to have the same operations order, this
required reworking the code to separately check validity and
do the operation. For the netlink code, this now has to do
the runtime_pm_put a bit later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/ethtool/ioctl.c   | 71 ++++++++++++++++++++++++++-----------------
 net/ethtool/netlink.c | 32 ++++++++-----------
 2 files changed, 56 insertions(+), 47 deletions(-)

Comments

Marc MERLIN Dec. 4, 2023, 8:36 p.m. UTC | #1
I'll re-add the list since I'm adding details that may matter to others

On Mon, Dec 04, 2023 at 09:01:58PM +0100, Johannes Berg wrote:
> On Mon, 2023-12-04 at 12:00 -0800, Marc MERLIN wrote:
> > Thanks for that, will try this now instead of the 2 line patch you
> > suggested in the list. I can repro this quickly so I'll let you know if
> > it helps.
> 
> You can probably just do any ethtool command instead of waiting for
> powertop to do it ... :)
 
powertop is indeed one of the triggers, but things do hang without me
running powertop.  I suppose I could also remove the ethtool binary
as a dirty workaround, at least for testing.
I'll try that after testing your patch if it does not help.

> > I do have ethernet on that laptop, I only use it rarely, so I can live
> > if it's broken for now as long as wifi works.
> 
> Then you can just blacklist igc I guess, and if you have wifi problems
> I'm actually the right person ;-)

So, ipc is not a module for me
sauron:/usr/src/linux-6.6.4-amd64-preempt-sysrq-20231203# lsmod | grep ipc
sauron:/usr/src/linux-6.6.4-amd64-preempt-sysrq-20231203# 

Is that a build config option I can reaonably turn off without breaking
networking altogether?

Previous boot, before your patch that I just compiled and will test now, 
it ended with this. I also see ethtool lower down, maybe run randomly by network manager
but isn't the hang happening before in these loglines, or the order of the logs is not
the actual sequence order of the problem?

[ 4230.596282] INFO: task kworker/0:2:132 blocked for more than 120 seconds.
[ 4230.596288]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4230.596289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4230.596291] task:kworker/0:2     state:D stack:0     pid:132   ppid:2      flags:0x00004000
[ 4230.596294] Workqueue: ipv6_addrconf addrconf_verify_work
[ 4230.596305] Call Trace:
[ 4230.596310]  <TASK>
[ 4230.596317]  __schedule+0x6c3/0x727
[ 4230.596330]  ? update_load_avg+0x43/0x3ba
[ 4230.596340]  schedule+0x8b/0xbc
[ 4230.596349]  schedule_preempt_disabled+0x11/0x1d
[ 4230.596359]  __mutex_lock.constprop.0+0x18b/0x291
[ 4230.596367]  addrconf_verify_work+0xe/0x20
[ 4230.596375]  process_scheduled_works+0x1da/0x2e0
[ 4230.596387]  worker_thread+0x1ca/0x224
[ 4230.596397]  ? __pfx_worker_thread+0x40/0x40
[ 4230.596407]  kthread+0xe9/0xf4
[ 4230.596415]  ? __pfx_kthread+0x40/0x40
[ 4230.596422]  ret_from_fork+0x21/0x38
[ 4230.596431]  ? __pfx_kthread+0x40/0x40
[ 4230.596438]  ret_from_fork_asm+0x1b/0x80
[ 4230.596446]  </TASK>
[ 4230.596450] INFO: task kworker/9:2:389 blocked for more than 120 seconds.
[ 4230.596451]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4230.596452] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4230.596453] task:kworker/9:2     state:D stack:0     pid:389   ppid:2      flags:0x00004000
[ 4230.596455] Workqueue: ipv6_addrconf addrconf_verify_work
[ 4230.596457] Call Trace:
[ 4230.596458]  <TASK>
[ 4230.596459]  __schedule+0x6c3/0x727
[ 4230.596461]  ? update_load_avg+0x392/0x3ba
[ 4230.596462]  schedule+0x8b/0xbc
[ 4230.596464]  schedule_preempt_disabled+0x11/0x1d
[ 4230.596466]  __mutex_lock.constprop.0+0x18b/0x291
[ 4230.596467]  addrconf_verify_work+0xe/0x20
[ 4230.596469]  process_scheduled_works+0x1da/0x2e0
[ 4230.596471]  worker_thread+0x1ca/0x224
[ 4230.596473]  ? __pfx_worker_thread+0x40/0x40
[ 4230.596475]  kthread+0xe9/0xf4
[ 4230.596477]  ? __pfx_kthread+0x40/0x40
[ 4230.596478]  ret_from_fork+0x21/0x38
[ 4230.596480]  ? __pfx_kthread+0x40/0x40
[ 4230.596481]  ret_from_fork_asm+0x1b/0x80
[ 4230.596483]  </TASK>
[ 4230.596506] INFO: task NetworkManager:3153 blocked for more than 120 seconds.
[ 4230.596507]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4230.596508] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4230.596509] task:NetworkManager  state:D stack:0     pid:3153  ppid:1      flags:0x00000002
[ 4230.596511] Call Trace:
[ 4230.596512]  <TASK>
[ 4230.596513]  __schedule+0x6c3/0x727
[ 4230.596515]  schedule+0x8b/0xbc
[ 4230.596517]  schedule_preempt_disabled+0x11/0x1d
[ 4230.596519]  __mutex_lock.constprop.0+0x18b/0x291
[ 4230.596521]  nl80211_prepare_wdev_dump+0x8b/0x19f [cfg80211 7e4322db9148b7482d4806580d3e58073444690f]
[ 4230.596563]  nl80211_dump_station+0x49/0x1d0 [cfg80211 7e4322db9148b7482d4806580d3e58073444690f]
[ 4230.596589]  ? __alloc_pages+0x131/0x1e8
[ 4230.596592]  ? __mod_lruvec_page_state+0x4c/0x86
[ 4230.596595]  ? mod_lruvec_page_state.constprop.0+0x1c/0x2e
[ 4230.596597]  ? __kmalloc_large_node+0xd5/0xfb
[ 4230.596599]  ? __alloc_skb+0xad/0x148
[ 4230.596602]  ? __kmalloc_node_track_caller+0x5b/0xb2
[ 4230.596604]  ? kmalloc_reserve+0xab/0xe6
[ 4230.596605]  genl_dumpit+0x32/0x4d
[ 4230.596608]  netlink_dump+0x143/0x2b2
[ 4230.596611]  __netlink_dump_start+0x145/0x197
[ 4230.596613]  genl_family_rcv_msg_dumpit+0xa3/0xd1
[ 4230.596615]  ? __pfx_genl_start+0x40/0x40
[ 4230.596617]  ? __pfx_genl_dumpit+0x40/0x40
[ 4230.596619]  ? __pfx_genl_done+0x40/0x40
[ 4230.596621]  genl_rcv_msg+0x1a0/0x1f2
[ 4230.596623]  ? __pfx_nl80211_dump_station+0x40/0x40 [cfg80211 7e4322db9148b7482d4806580d3e58073444690f]
[ 4230.596651]  ? __pfx_genl_rcv_msg+0x40/0x40
[ 4230.596653]  netlink_rcv_skb+0x89/0xe3
[ 4230.596655]  genl_rcv+0x24/0x31
[ 4230.596657]  netlink_unicast+0x114/0x1b4
[ 4230.596659]  netlink_sendmsg+0x321/0x361
[ 4230.596661]  sock_sendmsg_nosec+0x46/0x70
[ 4230.596664]  ____sys_sendmsg+0x144/0x1f5
[ 4230.596666]  ___sys_sendmsg+0x76/0xb3
[ 4230.596668]  ? __pfx_pollwake+0x40/0x40
[ 4230.596671]  ? __fget+0x38/0x47
[ 4230.596673]  __sys_sendmsg+0x60/0x97
[ 4230.596675]  do_syscall_64+0x7e/0xa7
[ 4230.596678]  ? __x64_sys_epoll_wait+0x45/0x69
[ 4230.596681]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4230.596682]  ? do_syscall_64+0x9d/0xa7
[ 4230.596684]  ? exit_to_user_mode_prepare+0xef/0x11c
[ 4230.596687]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4230.596688]  ? do_syscall_64+0x9d/0xa7
[ 4230.596690]  ? do_syscall_64+0x9d/0xa7
[ 4230.596692]  ? __x64_sys_epoll_wait+0x45/0x69
[ 4230.596694]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4230.596695]  ? do_syscall_64+0x9d/0xa7
[ 4230.596697]  ? do_syscall_64+0x9d/0xa7
[ 4230.596699]  ? do_syscall_64+0x9d/0xa7
[ 4230.596701]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 4230.596703] RIP: 0033:0x7f5d1754d9bd
[ 4230.596705] RSP: 002b:00007ffffdfefb50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[ 4230.596707] RAX: ffffffffffffffda RBX: 000055948809bdc0 RCX: 00007f5d1754d9bd
[ 4230.596708] RDX: 0000000000000000 RSI: 00007ffffdfefba0 RDI: 000000000000000b
[ 4230.596709] RBP: 00007ffffdfefba0 R08: 0000000000000000 R09: 0000000000000300
[ 4230.596710] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffffdfefc80
[ 4230.596711] R13: 00007f5d0c0038a0 R14: 00007ffffdfefeb0 R15: 00005594882d0180
[ 4230.596713]  </TASK>
[ 4230.596743] INFO: task ThreadPoolForeg:9425 blocked for more than 120 seconds.
[ 4230.596745]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4230.596745] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4230.596746] task:ThreadPoolForeg state:D stack:0     pid:9425  ppid:9416   flags:0x00000002
[ 4230.596748] Call Trace:
[ 4230.596749]  <TASK>
[ 4230.596750]  __schedule+0x6c3/0x727
[ 4230.596752]  schedule+0x8b/0xbc
[ 4230.596754]  schedule_preempt_disabled+0x11/0x1d
[ 4230.596756]  __mutex_lock.constprop.0+0x18b/0x291
[ 4230.596758]  ? __pfx_ioctl_standard_call+0x40/0x40
[ 4230.596760]  ? __pfx_ioctl_private_call+0x40/0x40
[ 4230.596762]  wext_ioctl_dispatch+0x4b/0x1a7
[ 4230.596764]  wext_handle_ioctl+0x6f/0xbd
[ 4230.596766]  sock_ioctl+0xc2/0x2c4
[ 4230.596767]  ? alloc_file_pseudo+0xba/0x103
[ 4230.596769]  vfs_ioctl+0x23/0x38
[ 4230.596772]  __do_sys_ioctl+0x63/0x8a
[ 4230.596774]  do_syscall_64+0x7e/0xa7
[ 4230.596776]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4230.596777]  ? do_syscall_64+0x9d/0xa7
[ 4230.596779]  ? vfs_ioctl+0x23/0x38
[ 4230.596780]  ? __do_sys_ioctl+0x75/0x8a
[ 4230.596782]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4230.596783]  ? do_syscall_64+0x9d/0xa7
[ 4230.596785]  ? __sys_socket+0xa8/0xdb
[ 4230.596787]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4230.596788]  ? do_syscall_64+0x9d/0xa7
[ 4230.596790]  ? do_syscall_64+0x9d/0xa7
[ 4230.596792]  ? exc_page_fault+0x122/0x146
[ 4230.596794]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 4230.596795] RIP: 0033:0x7fe7d4f2dbab
[ 4230.596796] RSP: 002b:00007fe7d0bfb8d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 4230.596798] RAX: ffffffffffffffda RBX: 00001e3005e47740 RCX: 00007fe7d4f2dbab
[ 4230.596799] RDX: 00007fe7d0bfbac0 RSI: 0000000000008b01 RDI: 000000000000002d
[ 4230.596800] RBP: 00007fe7d0bfbb10 R08: 0000000000000008 R09: 00001e30000212b8
[ 4230.596801] R10: 0000000000000011 R11: 0000000000000246 R12: 00007fe7d0bfb990
[ 4230.596801] R13: 00007fe7d0bfbac0 R14: 00001e3001692600 R15: 000000000000002d
[ 4230.596803]  </TASK>
[ 4230.596890] INFO: task ethtool:20272 blocked for more than 120 seconds.
[ 4230.596891]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4230.596892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4230.596893] task:ethtool         state:D stack:0     pid:20272 ppid:1      flags:0x00004006
[ 4230.596894] Call Trace:
[ 4230.596895]  <TASK>
[ 4230.596896]  __schedule+0x6c3/0x727
[ 4230.596899]  schedule+0x8b/0xbc
[ 4230.596900]  schedule_preempt_disabled+0x11/0x1d
[ 4230.596902]  __mutex_lock.constprop.0+0x18b/0x291
[ 4230.596904]  ? __pfx_pci_pm_runtime_resume+0x40/0x40
[ 4230.596908]  igc_resume+0x18b/0x1ca [igc 33cdaa7f35d000f01945bed5e87cab1e358c8327]
[ 4230.596916]  __rpm_callback+0x7a/0xe7
[ 4230.596919]  rpm_callback+0x35/0x64
[ 4230.596921]  ? __pfx_pci_pm_runtime_resume+0x40/0x40
[ 4230.596922]  rpm_resume+0x342/0x44a
[ 4230.596924]  ? dev_get_by_name_rcu+0x25/0x41
[ 4230.596926]  ? dev_get_by_name+0x3c/0x4a
[ 4230.596928]  __pm_runtime_resume+0x5a/0x7a
[ 4230.596930]  ethnl_ops_begin+0x2c/0x8d
[ 4230.596932]  ethnl_default_set_doit+0xdf/0x160
[ 4230.596934]  genl_family_rcv_msg_doit+0xc7/0x132
[ 4230.596936]  genl_rcv_msg+0x1b9/0x1f2
[ 4230.596938]  ? __pfx_ethnl_default_set_doit+0x40/0x40
[ 4230.596940]  ? __pfx_genl_rcv_msg+0x40/0x40
[ 4230.596942]  netlink_rcv_skb+0x89/0xe3
[ 4230.596944]  genl_rcv+0x24/0x31
[ 4230.596946]  netlink_unicast+0x114/0x1b4
[ 4230.596947]  netlink_sendmsg+0x321/0x361
[ 4230.596949]  sock_sendmsg_nosec+0x46/0x70
[ 4230.596951]  __sys_sendto+0xe7/0x126
[ 4230.596954]  ? syscall_trace_enter.constprop.0+0x58/0x110
[ 4230.596956]  ? __handle_mm_fault+0x4e9/0x5ba
[ 4230.596958]  __x64_sys_sendto+0x25/0x2c
[ 4230.596959]  do_syscall_64+0x7e/0xa7
[ 4230.596961]  ? cgroup_rstat_updated+0x21/0xa5
[ 4230.596964]  ? memcg_rstat_updated+0x16/0x62
[ 4230.596966]  ? count_memcg_event_mm+0x44/0x5e
[ 4230.596968]  ? handle_mm_fault+0x1f0/0x275
[ 4230.596969]  ? do_user_addr_fault+0x28f/0x4fb
[ 4230.596972]  ? exc_page_fault+0x122/0x146
[ 4230.596974]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 4230.596976] RIP: 0033:0x7f0252929a23
[ 4230.596977] RSP: 002b:00007ffc839efeb8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 4230.596978] RAX: ffffffffffffffda RBX: 0000559002fc1430 RCX: 00007f0252929a23
[ 4230.596979] RDX: 0000000000000030 RSI: 0000559002fc3510 RDI: 0000000000000003
[ 4230.596980] RBP: 0000000000000002 R08: 00007f0252a03be0 R09: 000000000000000c
[ 4230.596981] R10: 0000000000000000 R11: 0000000000000202 R12: 0000559002fb1340
[ 4230.596982] R13: 0000559002fc34d0 R14: 000055900242d400 R15: 0000559002fb1330
[ 4230.596983]  </TASK>
[ 4230.596984] INFO: task ifconfig:20322 blocked for more than 120 seconds.
[ 4230.596985]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4230.596986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4230.596987] task:ifconfig        state:D stack:0     pid:20322 ppid:20310  flags:0x00000002
[ 4230.596989] Call Trace:
[ 4230.596990]  <TASK>
[ 4230.596990]  __schedule+0x6c3/0x727
[ 4230.596992]  ? dput+0x45/0x13f
[ 4230.596995]  schedule+0x8b/0xbc
[ 4230.596996]  schedule_preempt_disabled+0x11/0x1d
[ 4230.596998]  __mutex_lock.constprop.0+0x18b/0x291
[ 4230.597000]  ? security_capable+0x3b/0x59
[ 4230.597003]  devinet_ioctl+0xfe/0x5dd
[ 4230.597005]  inet_ioctl+0x158/0x1b7
[ 4230.597007]  ? dev_get_flags+0x4a/0x6f
[ 4230.597009]  ? dev_ioctl+0x330/0x464
[ 4230.597011]  ? _copy_to_user+0x15/0x3c
[ 4230.597013]  ? put_user_ifreq+0x2a/0x3d
[ 4230.597015]  sock_do_ioctl+0x39/0xd8
[ 4230.597018]  sock_ioctl+0x29b/0x2c4
[ 4230.597019]  ? exit_to_user_mode_prepare+0xb5/0x11c
[ 4230.597021]  vfs_ioctl+0x23/0x38
[ 4230.597023]  __do_sys_ioctl+0x63/0x8a
[ 4230.597025]  do_syscall_64+0x7e/0xa7
[ 4230.597027]  ? exit_to_user_mode_prepare+0xb5/0x11c
[ 4230.597029]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4230.597030]  ? do_syscall_64+0x9d/0xa7
[ 4230.597032]  ? exc_page_fault+0x122/0x146
[ 4230.597034]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 4230.597036] RIP: 0033:0x7fd707153bab
[ 4230.597037] RSP: 002b:00007ffe0cbee900 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 4230.597038] RAX: ffffffffffffffda RBX: 00007ffe0cbeea40 RCX: 00007fd707153bab
[ 4230.597039] RDX: 00007ffe0cbee960 RSI: 0000000000008914 RDI: 0000000000000003
[ 4230.597040] RBP: 0000000000000003 R08: 000000000000000a R09: 0000000000000000
[ 4230.597041] R10: 00007fd70706d358 R11: 0000000000000246 R12: 00000000fffffffe
[ 4230.597042] R13: 0000000000000000 R14: 00007ffe0cbeeed0 R15: 0000000000000064
[ 4230.597043]  </TASK>
[ 4342.824939] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D0
[ 4342.824946] snd_hda_intel 0000:00:1f.3: ACPI _REG connect evaluation failed (5)
[ 4342.836199] snd_hda_intel 0000:00:1f.3: PME# disabled
[ 4342.836260] i915 0000:00:02.0: vgaarb: __vga_tryget: 1
[ 4342.836263] i915 0000:00:02.0: vgaarb: __vga_tryget: owns: 3
[ 4342.836266] i915 0000:00:02.0: vgaarb: __vga_put
[ 4351.428130] INFO: task kworker/0:2:132 blocked for more than 241 seconds.
[ 4351.428136]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4351.428138] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4351.428139] task:kworker/0:2     state:D stack:0     pid:132   ppid:2      flags:0x00004000
[ 4351.428142] Workqueue: ipv6_addrconf addrconf_verify_work
[ 4351.428148] Call Trace:
[ 4351.428150]  <TASK>
[ 4351.428152]  __schedule+0x6c3/0x727
[ 4351.428156]  ? update_load_avg+0x43/0x3ba
[ 4351.428159]  schedule+0x8b/0xbc
[ 4351.428161]  schedule_preempt_disabled+0x11/0x1d
[ 4351.428163]  __mutex_lock.constprop.0+0x18b/0x291
[ 4351.428165]  addrconf_verify_work+0xe/0x20
[ 4351.428167]  process_scheduled_works+0x1da/0x2e0
[ 4351.428171]  worker_thread+0x1ca/0x224
[ 4351.428173]  ? __pfx_worker_thread+0x40/0x40
[ 4351.428175]  kthread+0xe9/0xf4
[ 4351.428177]  ? __pfx_kthread+0x40/0x40
[ 4351.428179]  ret_from_fork+0x21/0x38
[ 4351.428182]  ? __pfx_kthread+0x40/0x40
[ 4351.428183]  ret_from_fork_asm+0x1b/0x80
[ 4351.428187]  </TASK>
[ 4351.428191] INFO: task kworker/9:2:389 blocked for more than 241 seconds.
[ 4351.428192]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4351.428193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4351.428194] task:kworker/9:2     state:D stack:0     pid:389   ppid:2      flags:0x00004000
[ 4351.428196] Workqueue: ipv6_addrconf addrconf_verify_work
[ 4351.428198] Call Trace:
[ 4351.428199]  <TASK>
[ 4351.428200]  __schedule+0x6c3/0x727
[ 4351.428201]  ? update_load_avg+0x392/0x3ba
[ 4351.428203]  schedule+0x8b/0xbc
[ 4351.428205]  schedule_preempt_disabled+0x11/0x1d
[ 4351.428207]  __mutex_lock.constprop.0+0x18b/0x291
[ 4351.428208]  addrconf_verify_work+0xe/0x20
[ 4351.428210]  process_scheduled_works+0x1da/0x2e0
[ 4351.428212]  worker_thread+0x1ca/0x224
[ 4351.428214]  ? __pfx_worker_thread+0x40/0x40
[ 4351.428216]  kthread+0xe9/0xf4
[ 4351.428218]  ? __pfx_kthread+0x40/0x40
[ 4351.428219]  ret_from_fork+0x21/0x38
[ 4351.428221]  ? __pfx_kthread+0x40/0x40
[ 4351.428222]  ret_from_fork_asm+0x1b/0x80
[ 4351.428224]  </TASK>
[ 4351.428247] INFO: task NetworkManager:3153 blocked for more than 241 seconds.
[ 4351.428248]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4351.428249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4351.428250] task:NetworkManager  state:D stack:0     pid:3153  ppid:1      flags:0x00000002
[ 4351.428252] Call Trace:
[ 4351.428253]  <TASK>
[ 4351.428253]  __schedule+0x6c3/0x727
[ 4351.428256]  schedule+0x8b/0xbc
[ 4351.428257]  schedule_preempt_disabled+0x11/0x1d
[ 4351.428259]  __mutex_lock.constprop.0+0x18b/0x291
[ 4351.428262]  nl80211_prepare_wdev_dump+0x8b/0x19f [cfg80211 7e4322db9148b7482d4806580d3e58073444690f]
[ 4351.428301]  nl80211_dump_station+0x49/0x1d0 [cfg80211 7e4322db9148b7482d4806580d3e58073444690f]
[ 4351.428328]  ? __alloc_pages+0x131/0x1e8
[ 4351.428330]  ? __mod_lruvec_page_state+0x4c/0x86
[ 4351.428333]  ? mod_lruvec_page_state.constprop.0+0x1c/0x2e
[ 4351.428335]  ? __kmalloc_large_node+0xd5/0xfb
[ 4351.428337]  ? __alloc_skb+0xad/0x148
[ 4351.428340]  ? __kmalloc_node_track_caller+0x5b/0xb2
[ 4351.428342]  ? kmalloc_reserve+0xab/0xe6
[ 4351.428344]  genl_dumpit+0x32/0x4d
[ 4351.428347]  netlink_dump+0x143/0x2b2
[ 4351.428349]  __netlink_dump_start+0x145/0x197
[ 4351.428351]  genl_family_rcv_msg_dumpit+0xa3/0xd1
[ 4351.428353]  ? __pfx_genl_start+0x40/0x40
[ 4351.428355]  ? __pfx_genl_dumpit+0x40/0x40
[ 4351.428357]  ? __pfx_genl_done+0x40/0x40
[ 4351.428359]  genl_rcv_msg+0x1a0/0x1f2
[ 4351.428361]  ? __pfx_nl80211_dump_station+0x40/0x40 [cfg80211 7e4322db9148b7482d4806580d3e58073444690f]
[ 4351.428387]  ? __pfx_genl_rcv_msg+0x40/0x40
[ 4351.428389]  netlink_rcv_skb+0x89/0xe3
[ 4351.428391]  genl_rcv+0x24/0x31
[ 4351.428393]  netlink_unicast+0x114/0x1b4
[ 4351.428395]  netlink_sendmsg+0x321/0x361
[ 4351.428397]  sock_sendmsg_nosec+0x46/0x70
[ 4351.428400]  ____sys_sendmsg+0x144/0x1f5
[ 4351.428402]  ___sys_sendmsg+0x76/0xb3
[ 4351.428404]  ? __pfx_pollwake+0x40/0x40
[ 4351.428407]  ? __fget+0x38/0x47
[ 4351.428409]  __sys_sendmsg+0x60/0x97
[ 4351.428411]  do_syscall_64+0x7e/0xa7
[ 4351.428414]  ? __x64_sys_epoll_wait+0x45/0x69
[ 4351.428416]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4351.428418]  ? do_syscall_64+0x9d/0xa7
[ 4351.428420]  ? exit_to_user_mode_prepare+0xef/0x11c
[ 4351.428423]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4351.428424]  ? do_syscall_64+0x9d/0xa7
[ 4351.428426]  ? do_syscall_64+0x9d/0xa7
[ 4351.428428]  ? __x64_sys_epoll_wait+0x45/0x69
[ 4351.428430]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4351.428431]  ? do_syscall_64+0x9d/0xa7
[ 4351.428433]  ? do_syscall_64+0x9d/0xa7
[ 4351.428434]  ? do_syscall_64+0x9d/0xa7
[ 4351.428436]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 4351.428439] RIP: 0033:0x7f5d1754d9bd
[ 4351.428441] RSP: 002b:00007ffffdfefb50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[ 4351.428443] RAX: ffffffffffffffda RBX: 000055948809bdc0 RCX: 00007f5d1754d9bd
[ 4351.428444] RDX: 0000000000000000 RSI: 00007ffffdfefba0 RDI: 000000000000000b
[ 4351.428445] RBP: 00007ffffdfefba0 R08: 0000000000000000 R09: 0000000000000300
[ 4351.428446] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffffdfefc80
[ 4351.428447] R13: 00007f5d0c0038a0 R14: 00007ffffdfefeb0 R15: 00005594882d0180
[ 4351.428449]  </TASK>
[ 4351.428478] INFO: task ThreadPoolForeg:9425 blocked for more than 241 seconds.
[ 4351.428479]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
[ 4351.428480] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4351.428481] task:ThreadPoolForeg state:D stack:0     pid:9425  ppid:9416   flags:0x00000002
[ 4351.428483] Call Trace:
[ 4351.428484]  <TASK>
[ 4351.428485]  __schedule+0x6c3/0x727
[ 4351.428487]  schedule+0x8b/0xbc
[ 4351.428489]  schedule_preempt_disabled+0x11/0x1d
[ 4351.428491]  __mutex_lock.constprop.0+0x18b/0x291
[ 4351.428492]  ? __pfx_ioctl_standard_call+0x40/0x40
[ 4351.428495]  ? __pfx_ioctl_private_call+0x40/0x40
[ 4351.428497]  wext_ioctl_dispatch+0x4b/0x1a7
[ 4351.428499]  wext_handle_ioctl+0x6f/0xbd
[ 4351.428501]  sock_ioctl+0xc2/0x2c4
[ 4351.428503]  ? alloc_file_pseudo+0xba/0x103
[ 4351.428504]  vfs_ioctl+0x23/0x38
[ 4351.428507]  __do_sys_ioctl+0x63/0x8a
[ 4351.428509]  do_syscall_64+0x7e/0xa7
[ 4351.428511]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4351.428512]  ? do_syscall_64+0x9d/0xa7
[ 4351.428514]  ? vfs_ioctl+0x23/0x38
[ 4351.428516]  ? __do_sys_ioctl+0x75/0x8a
[ 4351.428517]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4351.428519]  ? do_syscall_64+0x9d/0xa7
[ 4351.428521]  ? __sys_socket+0xa8/0xdb
[ 4351.428522]  ? syscall_exit_to_user_mode+0x26/0x2c
[ 4351.428523]  ? do_syscall_64+0x9d/0xa7
[ 4351.428525]  ? do_syscall_64+0x9d/0xa7
[ 4351.428527]  ? exc_page_fault+0x122/0x146
[ 4351.428529]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 4351.428531] RIP: 0033:0x7fe7d4f2dbab
[ 4351.428531] RSP: 002b:00007fe7d0bfb8d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 4351.428533] RAX: ffffffffffffffda RBX: 00001e3005e47740 RCX: 00007fe7d4f2dbab
[ 4351.428534] RDX: 00007fe7d0bfbac0 RSI: 0000000000008b01 RDI: 000000000000002d
[ 4351.428535] RBP: 00007fe7d0bfbb10 R08: 0000000000000008 R09: 00001e30000212b8
[ 4351.428536] R10: 0000000000000011 R11: 0000000000000246 R12: 00007fe7d0bfb990
[ 4351.428537] R13: 00007fe7d0bfbac0 R14: 00001e3001692600 R15: 000000000000002d
[ 4351.428538]  </TASK>
[ 4351.428539] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

If that helps, 2 boots ago, before your patch, things ended up in a loop
of these:
[36019.307346] xhci_hcd 0000:00:0d.0: ACPI _REG connect evaluation failed (5)
[36019.320969] xhci_hcd 0000:00:0d.0: PME# disabled
[36019.322001] xhci_hcd 0000:00:0d.0: enabling bus mastering
[36019.349811] xhci_hcd 0000:00:0d.0: save config 0x00: 0x9a178086
[36019.350822] xhci_hcd 0000:00:0d.0: save config 0x04: 0x02900402
[36019.351812] xhci_hcd 0000:00:0d.0: save config 0x08: 0x0c033005
[36019.352807] xhci_hcd 0000:00:0d.0: save config 0x0c: 0x00800000
[36019.353820] xhci_hcd 0000:00:0d.0: save config 0x10: 0x4d170004
[36019.354820] xhci_hcd 0000:00:0d.0: save config 0x14: 0x00000064
[36019.355815] xhci_hcd 0000:00:0d.0: save config 0x18: 0x00000000
[36019.356808] xhci_hcd 0000:00:0d.0: save config 0x1c: 0x00000000
[36019.357830] xhci_hcd 0000:00:0d.0: save config 0x20: 0x00000000
[36019.358828] xhci_hcd 0000:00:0d.0: save config 0x24: 0x00000000
[36019.359823] xhci_hcd 0000:00:0d.0: save config 0x28: 0x00000000
[36019.360809] xhci_hcd 0000:00:0d.0: save config 0x2c: 0x22de17aa
[36019.361814] xhci_hcd 0000:00:0d.0: save config 0x30: 0x00000000
[36019.362812] xhci_hcd 0000:00:0d.0: save config 0x34: 0x00000070
[36019.363800] xhci_hcd 0000:00:0d.0: save config 0x38: 0x00000000
[36019.364777] xhci_hcd 0000:00:0d.0: save config 0x3c: 0x000000ff
[36019.365827] xhci_hcd 0000:00:0d.0: PME# enabled
[36019.377435] xhci_hcd 0000:00:0d.0: ACPI _REG disconnect evaluation failed (5)
[36019.378498] xhci_hcd 0000:00:0d.0: power state changed by ACPI to D3cold
[36020.729692] xhci_hcd 0000:00:0d.0: power state changed by ACPI to D0
[36020.730691] xhci_hcd 0000:00:0d.0: ACPI _REG connect evaluation failed (5)
[36020.744340] xhci_hcd 0000:00:0d.0: PME# disabled
[36020.745367] xhci_hcd 0000:00:0d.0: enabling bus mastering
[36020.773617] xhci_hcd 0000:00:0d.0: save config 0x00: 0x9a178086
[36020.774689] xhci_hcd 0000:00:0d.0: save config 0x04: 0x02900402
[36020.775712] xhci_hcd 0000:00:0d.0: save config 0x08: 0x0c033005
[36020.776704] xhci_hcd 0000:00:0d.0: save config 0x0c: 0x00800000
[36020.777711] xhci_hcd 0000:00:0d.0: save config 0x10: 0x4d170004
[36020.778736] xhci_hcd 0000:00:0d.0: save config 0x14: 0x00000064
[36020.779735] xhci_hcd 0000:00:0d.0: save config 0x18: 0x00000000
[36020.780733] xhci_hcd 0000:00:0d.0: save config 0x1c: 0x00000000
[36020.781738] xhci_hcd 0000:00:0d.0: save config 0x20: 0x00000000
[36020.782744] xhci_hcd 0000:00:0d.0: save config 0x24: 0x00000000
[36020.783734] xhci_hcd 0000:00:0d.0: save config 0x28: 0x00000000
[36020.784724] xhci_hcd 0000:00:0d.0: save config 0x2c: 0x22de17aa
[36020.785729] xhci_hcd 0000:00:0d.0: save config 0x30: 0x00000000
[36020.786719] xhci_hcd 0000:00:0d.0: save config 0x34: 0x00000070
[36020.787707] xhci_hcd 0000:00:0d.0: save config 0x38: 0x00000000
[36020.788683] xhci_hcd 0000:00:0d.0: save config 0x3c: 0x000000ff
[36020.789730] xhci_hcd 0000:00:0d.0: PME# enabled
[36020.801364] xhci_hcd 0000:00:0d.0: ACPI _REG disconnect evaluation failed (5)
[36020.802429] xhci_hcd 0000:00:0d.0: power state changed by ACPI to D3cold
[36022.149641] xhci_hcd 0000:00:0d.0: power state changed by ACPI to D0
[36022.150642] xhci_hcd 0000:00:0d.0: ACPI _REG connect evaluation failed (5)
[36022.164238] xhci_hcd 0000:00:0d.0: PME# disabled
[36022.165255] xhci_hcd 0000:00:0d.0: enabling bus mastering
[36022.166445] xhci_hcd 0000:00:0d.0: save config 0x00: 0x9a178086
[36022.167436] xhci_hcd 0000:00:0d.0: save config 0x04: 0x02900402
[36022.168471] xhci_hcd 0000:00:0d.0: save config 0x08: 0x0c033005
[36022.169481] xhci_hcd 0000:00:0d.0: save config 0x0c: 0x00800000
[36022.170476] xhci_hcd 0000:00:0d.0: save config 0x10: 0x4d170004
[36022.171474] xhci_hcd 0000:00:0d.0: save config 0x14: 0x00000064
[36022.172466] xhci_hcd 0000:00:0d.0: save config 0x18: 0x00000000
[36022.173469] xhci_hcd 0000:00:0d.0: save config 0x1c: 0x00000000
[36022.174484] xhci_hcd 0000:00:0d.0: save config 0x20: 0x00000000
[36022.175477] xhci_hcd 0000:00:0d.0: save config 0x24: 0x00000000
[36022.176471] xhci_hcd 0000:00:0d.0: save config 0x28: 0x00000000
[36022.177471] xhci_hcd 0000:00:0d.0: save config 0x2c: 0x22de17aa
[36022.178461] xhci_hcd 0000:00:0d.0: save config 0x30: 0x00000000
[36022.179453] xhci_hcd 0000:00:0d.0: save config 0x34: 0x00000070
[36022.180438] xhci_hcd 0000:00:0d.0: save config 0x38: 0x00000000
[36022.181423] xhci_hcd 0000:00:0d.0: save config 0x3c: 0x000000ff
[36022.182430] xhci_hcd 0000:00:0d.0: PME# enabled
[36022.196019] xhci_hcd 0000:00:0d.0: ACPI _REG disconnect evaluation failed (5)
[36022.197111] xhci_hcd 0000:00:0d.0: power state changed by ACPI to D3cold
[36022.198174] xhci_hcd 0000:00:0d.0: power state changed by ACPI to D0
[36022.199158] xhci_hcd 0000:00:0d.0: ACPI _REG connect evaluation failed (5)
[36022.212754] xhci_hcd 0000:00:0d.0: PME# disabled
[36022.213808] xhci_hcd 0000:00:0d.0: enabling bus mastering
[36022.245566] xhci_hcd 0000:00:0d.0: save config 0x00: 0x9a178086
[36022.246568] xhci_hcd 0000:00:0d.0: save config 0x04: 0x02900402
[36022.247563] xhci_hcd 0000:00:0d.0: save config 0x08: 0x0c033005
[36022.248551] xhci_hcd 0000:00:0d.0: save config 0x0c: 0x00800000
[36022.249553] xhci_hcd 0000:00:0d.0: save config 0x10: 0x4d170004
[36022.250549] xhci_hcd 0000:00:0d.0: save config 0x14: 0x00000064
[36022.251549] xhci_hcd 0000:00:0d.0: save config 0x18: 0x00000000
[36022.252539] xhci_hcd 0000:00:0d.0: save config 0x1c: 0x00000000
[36022.253530] xhci_hcd 0000:00:0d.0: save config 0x20: 0x00000000
[36022.254514] xhci_hcd 0000:00:0d.0: save config 0x24: 0x00000000
[36022.255508] xhci_hcd 0000:00:0d.0: save config 0x28: 0x00000000
[36022.256494] xhci_hcd 0000:00:0d.0: save config 0x2c: 0x22de17aa
[36022.257496] xhci_hcd 0000:00:0d.0: save config 0x30: 0x00000000
[36022.258508] xhci_hcd 0000:00:0d.0: save config 0x34: 0x00000070
[36022.259517] xhci_hcd 0000:00:0d.0: save config 0x38: 0x00000000
[36022.260494] xhci_hcd 0000:00:0d.0: save config 0x3c: 0x000000ff
[36022.261541] xhci_hcd 0000:00:0d.0: PME# enabled
[36022.273388] xhci_hcd 0000:00:0d.0: ACPI _REG disconnect evaluation failed (5)
[36022.274479] xhci_hcd 0000:00:0d.0: power state changed by ACPI to D3cold
[36022.508764] sysrq: Terminate All Tasks
Johannes Berg Dec. 4, 2023, 8:40 p.m. UTC | #2
On Mon, 2023-12-04 at 12:36 -0800, Marc MERLIN wrote:
> 
> So, ipc is not a module for me
> sauron:/usr/src/linux-6.6.4-amd64-preempt-sysrq-20231203# lsmod | grep ipc

I think the driver is "igc", not "ipc".

> sauron:/usr/src/linux-6.6.4-amd64-preempt-sysrq-20231203# 
> 
> Is that a build config option I can reaonably turn off without breaking
> networking altogether?
> 
> Previous boot, before your patch that I just compiled and will test now, 
> it ended with this. I also see ethtool lower down, maybe run randomly by network manager
> but isn't the hang happening before in these loglines, or the order of the logs is not
> the actual sequence order of the problem?

The order of this doesn't tell us much, I believe.

> [ 4230.596282] INFO: task kworker/0:2:132 blocked for more than 120 seconds.
> [ 4230.596288]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
> [ 4230.596289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4230.596291] task:kworker/0:2     state:D stack:0     pid:132   ppid:2      flags:0x00004000
> [ 4230.596294] Workqueue: ipv6_addrconf addrconf_verify_work
> [ 4230.596305] Call Trace:
> [ 4230.596310]  <TASK>
> [ 4230.596317]  __schedule+0x6c3/0x727
> [ 4230.596330]  ? update_load_avg+0x43/0x3ba
> [ 4230.596340]  schedule+0x8b/0xbc
> [ 4230.596349]  schedule_preempt_disabled+0x11/0x1d
> [ 4230.596359]  __mutex_lock.constprop.0+0x18b/0x291
> [ 4230.596367]  addrconf_verify_work+0xe/0x20

This is just a worker trying to acquire RTNL.

> [ 4230.596506] INFO: task NetworkManager:3153 blocked for more than 120 seconds.
> [ 4230.596507]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
> [ 4230.596508] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4230.596509] task:NetworkManager  state:D stack:0     pid:3153  ppid:1      flags:0x00000002
> [ 4230.596511] Call Trace:
> [ 4230.596512]  <TASK>
> [ 4230.596513]  __schedule+0x6c3/0x727
> [ 4230.596515]  schedule+0x8b/0xbc
> [ 4230.596517]  schedule_preempt_disabled+0x11/0x1d
> [ 4230.596519]  __mutex_lock.constprop.0+0x18b/0x291
> [ 4230.596521]  nl80211_prepare_wdev_dump+0x8b/0x19f [cfg80211 7e4322db9148b7482d4806580d3e58073444690f]

This is just wireless trying to acquire the RTNL on behalf of NM.

> [ 4230.596743] INFO: task ThreadPoolForeg:9425 blocked for more than 120 seconds.
> [ 4230.596745]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
> [ 4230.596745] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4230.596746] task:ThreadPoolForeg state:D stack:0     pid:9425  ppid:9416   flags:0x00000002
> [ 4230.596748] Call Trace:
> [ 4230.596749]  <TASK>
> [ 4230.596750]  __schedule+0x6c3/0x727
> [ 4230.596752]  schedule+0x8b/0xbc
> [ 4230.596754]  schedule_preempt_disabled+0x11/0x1d
> [ 4230.596756]  __mutex_lock.constprop.0+0x18b/0x291
> [ 4230.596758]  ? __pfx_ioctl_standard_call+0x40/0x40
> [ 4230.596760]  ? __pfx_ioctl_private_call+0x40/0x40
> [ 4230.596762]  wext_ioctl_dispatch+0x4b/0x1a7

I don't know why something should be using wireless extensions, but same
thing. Also wext is going away for WiFi7 devices. ;)

> [ 4230.596890] INFO: task ethtool:20272 blocked for more than 120 seconds.
> [ 4230.596891]       Tainted: G     U             6.6.4-amd64-preempt-sysrq-20231203 #5
> [ 4230.596892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4230.596893] task:ethtool         state:D stack:0     pid:20272 ppid:1      flags:0x00004006
> [ 4230.596894] Call Trace:
> [ 4230.596895]  <TASK>
> [ 4230.596896]  __schedule+0x6c3/0x727
> [ 4230.596899]  schedule+0x8b/0xbc
> [ 4230.596900]  schedule_preempt_disabled+0x11/0x1d
> [ 4230.596902]  __mutex_lock.constprop.0+0x18b/0x291
> [ 4230.596904]  ? __pfx_pci_pm_runtime_resume+0x40/0x40
> [ 4230.596908]  igc_resume+0x18b/0x1ca [igc 33cdaa7f35d000f01945bed5e87cab1e358c8327]
> [ 4230.596916]  __rpm_callback+0x7a/0xe7
> [ 4230.596919]  rpm_callback+0x35/0x64
> [ 4230.596921]  ? __pfx_pci_pm_runtime_resume+0x40/0x40
> [ 4230.596922]  rpm_resume+0x342/0x44a
> [ 4230.596924]  ? dev_get_by_name_rcu+0x25/0x41
> [ 4230.596926]  ? dev_get_by_name+0x3c/0x4a
> [ 4230.596928]  __pm_runtime_resume+0x5a/0x7a
> [ 4230.596930]  ethnl_ops_begin+0x2c/0x8d

This one's still the problem, so I guess my 2-line hack didn't do
anything.

[snip rest, they're all just waiting for RTNL]

johannes
Marc MERLIN Dec. 4, 2023, 8:54 p.m. UTC | #3
On Mon, Dec 04, 2023 at 09:40:08PM +0100, Johannes Berg wrote:
> This one's still the problem, so I guess my 2-line hack didn't do
> anything.

sorry, I wasn't clear, this was the last hang before your patch. I
wanted to make sure it matched your analysis, which it seems to, so
that's good.  I now understand that the order in printk is not actually
the order of who is at fault.
I'm testing your patch now, will let you know ASAP

Marc
Marc MERLIN Dec. 4, 2023, 9:28 p.m. UTC | #4
On Mon, Dec 04, 2023 at 12:54:39PM -0800, Marc MERLIN wrote:
> On Mon, Dec 04, 2023 at 09:40:08PM +0100, Johannes Berg wrote:
> > This one's still the problem, so I guess my 2-line hack didn't do
> > anything.
> 
> sorry, I wasn't clear, this was the last hang before your patch. I
> wanted to make sure it matched your analysis, which it seems to, so
> that's good.  I now understand that the order in printk is not actually
> the order of who is at fault.
> I'm testing your patch now, will let you know ASAP

Well. Good news. So far so good.

sauron:~# ethtool -i enp11s0
driver: igc
version: 6.6.4-amd64-volpre-sysrq-202312
firmware-version: 1073:8754
expansion-rom-version: 
bus-info: 0000:0b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

sauron:~# iwconfig wlp9s0
wlp9s0    IEEE 802.11  ESSID:"magicnet"  
          Mode:Managed  Frequency:5.2 GHz  Access Point: E0:63:DA:28:03:67   
          Bit Rate=866.7 Mb/s   Tx-Power=22 dBm   
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link Quality=70/70  Signal level=-40 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:992   Missed beacon:0

sauron:~# lspci | grep -i net
09:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)
0b:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-LM (rev 03)

It's unfortunate that nouveau doesn't seem to support the nvidia chip at all, not even
well enough to turn it off, but thankfully I can do this via
  echo 'auto' > '/sys/bus/pci/devices/0000:01:00.0/power/control'

For power, I was able to get it to idle at 13W with tlp powering down chips to slow
speeds, and the screen fairly dim. It's not great but it's cmoparable to the P73, so
close enough.

The weird PME loops I attached in the last Email would also butn batteries on my P73
without hanging it. Your patch may have fixed that too. If so, thank you
(will report if I see the PME stuff again).

Either way, I'm in much better shape right now on a laptop I was about
to return after having spent 3 days of effort on it, so a heartfelt thank you!

Where do you we go from here? Is the patch obviously good/safe, or do we
need to narrow things down/test some more?

Marc
Johannes Berg Dec. 4, 2023, 9:32 p.m. UTC | #5
On Mon, 2023-12-04 at 13:28 -0800, Marc MERLIN wrote:
> 
> Where do you we go from here? Is the patch obviously good/safe, or do we
> need to narrow things down/test some more?
> 

Well, I was hoping that

 (a) ethtool folks / Jakub would comment if this makes sense, but I
     don't see a good reason to do things the other way around (other
     than "code is simpler"); and

 (b) Intel wired folks could help out with getting the patch across the
     finish line, seeing how their driver needs it :) I think the dev
     get/put needs to use the newer API, but I didn't immediately see
     how that works locally in a function without an allocated tracker

johannes
Jakub Kicinski Dec. 4, 2023, 10:22 p.m. UTC | #6
On Mon, 04 Dec 2023 22:32:25 +0100 Johannes Berg wrote:
> Well, I was hoping that
> 
>  (a) ethtool folks / Jakub would comment if this makes sense, but I
>      don't see a good reason to do things the other way around (other
>      than "code is simpler"); and

My opinion on RPM is pretty uneducated. But taking rtnl_lock to resume
strikes me as shortsighted. RPM functionality should be fairly
self-contained, and deserving of a separate lock.
Or at the very least having looked at the igc RPM code in the past,
I'm a bit cautious about bending the core to fit it, as it is hardly
a model...

>  (b) Intel wired folks could help out with getting the patch across the
>      finish line, seeing how their driver needs it :) I think the dev
>      get/put needs to use the newer API, but I didn't immediately see
>      how that works locally in a function without an allocated tracker
Johannes Berg Dec. 4, 2023, 10:25 p.m. UTC | #7
On Mon, 2023-12-04 at 14:22 -0800, Jakub Kicinski wrote:
> On Mon, 04 Dec 2023 22:32:25 +0100 Johannes Berg wrote:
> > Well, I was hoping that
> > 
> >  (a) ethtool folks / Jakub would comment if this makes sense, but I
> >      don't see a good reason to do things the other way around (other
> >      than "code is simpler"); and
> 
> My opinion on RPM is pretty uneducated. But taking rtnl_lock to resume
> strikes me as shortsighted. RPM functionality should be fairly
> self-contained, and deserving of a separate lock.
> Or at the very least having looked at the igc RPM code in the past,
> I'm a bit cautious about bending the core to fit it, as it is hardly
> a model...

I could agree with that. The reason it seems to do that is that it
invokes some other ifup/down related code in suspend/resume...

On the other hand it also seems a bit odd to do something as unrelated
to networking as runtime PM (which you can also trigger through various
other paths such as sysfs) under RTNL, holding RTNL for longer than
seems necessary, and creating more contention on it?

johannes
Marc MERLIN Dec. 5, 2023, 2:46 a.m. UTC | #8
On Mon, Dec 04, 2023 at 01:28:49PM -0800, Marc MERLIN wrote:
> sauron:~# iwconfig wlp9s0
> wlp9s0    IEEE 802.11  ESSID:"magicnet"  
>           Mode:Managed  Frequency:5.2 GHz  Access Point: E0:63:DA:28:03:67   
>           Bit Rate=866.7 Mb/s   Tx-Power=22 dBm   
>           Retry short limit:7   RTS thr:off   Fragment thr:off
>           Encryption key:off
>           Power Management:off
>           Link Quality=70/70  Signal level=-40 dBm  
>           Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
>           Tx excessive retries:0  Invalid misc:992   Missed beacon:0
> 
> sauron:~# lspci | grep -i net
> 09:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)
> 0b:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-LM (rev 03)

Probably not related and probably ok, but getting these bug workaround
messages:
[ 2448.505576] iwlwifi 0000:09:00.0: Detected RF GF, rfid=0x10d000
[ 2448.578917] iwlwifi 0000:09:00.0: base HW address: f4:4e:e3:fb:7f:93
[ 2448.601421] iwlwifi 0000:09:00.0 wlp9s0: renamed from wlan0
[ 2448.695483] iwlwifi 0000:09:00.0: WRT: Invalid buffer destination
[ 2448.863729] iwlwifi 0000:09:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x20
[ 2448.863785] iwlwifi 0000:09:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
[ 2448.863812] iwlwifi 0000:09:00.0: WFPM_AUTH_KEY_0: 0x90
[ 2448.863837] iwlwifi 0000:09:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
[ 2448.971242] iwlwifi 0000:09:00.0: Registered PHC clock: iwlwifi-PTP, with index: 1
[ 2449.063668] iwlwifi 0000:09:00.0: WRT: Invalid buffer destination
[ 2449.232085] iwlwifi 0000:09:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x20
[ 2449.232143] iwlwifi 0000:09:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
[ 2449.232169] iwlwifi 0000:09:00.0: WFPM_AUTH_KEY_0: 0x90
[ 2449.232188] iwlwifi 0000:09:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
[13323.572484] iwlwifi 0000:09:00.0: TB bug workaround: copied 152 bytes from 0xffffff68 to 0xfd080000
[13328.000825] iwlwifi 0000:09:00.0: TB bug workaround: copied 1272 bytes from 0xfffffb08 to 0xff42c000
[13367.278564] iwlwifi 0000:09:00.0: TB bug workaround: copied 1328 bytes from 0xfffffad0 to 0xfec41000
[13389.737971] iwlwifi 0000:09:00.0: TB bug workaround: copied 572 bytes from 0xfffffdc4 to 0xff091000
[13389.860480] iwlwifi 0000:09:00.0: TB bug workaround: copied 148 bytes from 0xffffff6c to 0xfe412000
[13393.435354] iwlwifi 0000:09:00.0: TB bug workaround: copied 360 bytes from 0xfffffe98 to 0xfedcd000
[13409.827199] iwlwifi 0000:09:00.0: TB bug workaround: copied 1348 bytes from 0xfffffabc to 0xfd057000

The other good news is that it now shuts down ok, which is
understandable with the deadlock fixed.

The only thing I'm still seeing is this, I'm not too sure how avad it
is, or whether it's ok-ish.
I understand it's likely totally unrelated, although there are some
iwlwifi lines below if that's helpful
Otherwise, laptop is still working ok so far.

[ 1879.679720] pcieport 0000:00:07.1: power state changed by ACPI to D0
[ 1879.679727] pcieport 0000:00:07.1: ACPI _REG connect evaluation failed (5)
[ 1879.691616] pcieport 0000:00:07.1: restore config 0x2c: 0x00000064 -> 0x00000064
[ 1879.691624] pcieport 0000:00:07.1: restore config 0x28: 0x00000064 -> 0x00000064
[ 1879.691627] pcieport 0000:00:07.1: restore config 0x24: 0x4bf13001 -> 0x4bf13001
[ 1879.691959] pcieport 0000:00:07.1: PME# disabled
[ 1879.691967] pcieport 0000:00:07.1: pciehp: pciehp_check_link_active: lnk_status = 1001
[ 1879.803706] pcieport 0000:00:07.0: power state changed by ACPI to D0
[ 1879.803714] pcieport 0000:00:07.0: ACPI _REG connect evaluation failed (5)
[ 1879.815840] pcieport 0000:00:07.1: save config 0x00: 0x9a2d8086
[ 1879.815847] pcieport 0000:00:07.1: save config 0x04: 0x00100407
[ 1879.815849] pcieport 0000:00:07.1: save config 0x08: 0x06040005
[ 1879.815851] pcieport 0000:00:07.1: save config 0x0c: 0x00810000
[ 1879.815852] pcieport 0000:00:07.1: save config 0x10: 0x00000000
[ 1879.815854] pcieport 0000:00:07.1: save config 0x14: 0x00000000
[ 1879.815856] pcieport 0000:00:07.1: save config 0x18: 0x00795000
[ 1879.815858] pcieport 0000:00:07.1: save config 0x1c: 0x00006060
[ 1879.815859] pcieport 0000:00:07.1: save config 0x20: 0x9e109200
[ 1879.815861] pcieport 0000:00:07.1: save config 0x24: 0x4bf13001
[ 1879.815863] pcieport 0000:00:07.1: save config 0x28: 0x00000064
[ 1879.815864] pcieport 0000:00:07.1: save config 0x2c: 0x00000064
[ 1879.815866] pcieport 0000:00:07.1: save config 0x30: 0x00000000
[ 1879.815868] pcieport 0000:00:07.1: save config 0x34: 0x00000040
[ 1879.815869] pcieport 0000:00:07.1: save config 0x38: 0x00000000
[ 1879.815871] pcieport 0000:00:07.1: save config 0x3c: 0x000202ff
[ 1879.815954] pcieport 0000:00:07.1: PME# enabled
[ 1879.816250] pcieport 0000:00:07.0: restore config 0x2c: 0x00000064 -> 0x00000064
[ 1879.816253] pcieport 0000:00:07.0: restore config 0x28: 0x00000064 -> 0x00000064
[ 1879.816256] pcieport 0000:00:07.0: restore config 0x24: 0x2bf11001 -> 0x2bf11001
[ 1879.816540] pcieport 0000:00:07.0: PME# disabled
[ 1879.816546] pcieport 0000:00:07.0: pciehp: pciehp_check_link_active: lnk_status = 1001
[ 1879.827631] pcieport 0000:00:07.1: ACPI _REG disconnect evaluation failed (5)
[ 1879.827765] pcieport 0000:00:07.1: power state changed by ACPI to D3cold
[ 1879.939979] pcieport 0000:00:07.0: save config 0x00: 0x9a2b8086
[ 1879.939986] pcieport 0000:00:07.0: save config 0x04: 0x00100407
[ 1879.939988] pcieport 0000:00:07.0: save config 0x08: 0x06040005
[ 1879.939989] pcieport 0000:00:07.0: save config 0x0c: 0x00810000
[ 1879.939991] pcieport 0000:00:07.0: save config 0x10: 0x00000000
[ 1879.939993] pcieport 0000:00:07.0: save config 0x14: 0x00000000
[ 1879.939995] pcieport 0000:00:07.0: save config 0x18: 0x00492000
[ 1879.939996] pcieport 0000:00:07.0: save config 0x1c: 0x00005050
[ 1879.939998] pcieport 0000:00:07.0: save config 0x20: 0xac10a000
[ 1879.940000] pcieport 0000:00:07.0: save config 0x24: 0x2bf11001
[ 1879.940001] pcieport 0000:00:07.0: save config 0x28: 0x00000064
[ 1879.940003] pcieport 0000:00:07.0: save config 0x2c: 0x00000064
[ 1879.940005] pcieport 0000:00:07.0: save config 0x30: 0x00000000
[ 1879.940007] pcieport 0000:00:07.0: save config 0x34: 0x00000040
[ 1879.940008] pcieport 0000:00:07.0: save config 0x38: 0x00000000
[ 1879.940010] pcieport 0000:00:07.0: save config 0x3c: 0x000201ff
[ 1879.940093] pcieport 0000:00:07.0: PME# enabled
[ 1879.951532] pcieport 0000:00:07.0: ACPI _REG disconnect evaluation failed (5)
[ 1879.951665] pcieport 0000:00:07.0: power state changed by ACPI to D3cold
[ 2085.430089] pci 0000:00:00.0: save config 0x00: 0x9a268086
[ 2085.430118] pci 0000:00:00.0: save config 0x04: 0x00900106
[ 2085.430131] pci 0000:00:00.0: save config 0x08: 0x06000005
[ 2085.430144] pci 0000:00:00.0: save config 0x0c: 0x00000000
[ 2085.430156] pci 0000:00:00.0: save config 0x10: 0x00000000
[ 2085.430167] pci 0000:00:00.0: save config 0x14: 0x00000000
[ 2085.430178] pci 0000:00:00.0: save config 0x18: 0x00000000
[ 2085.430189] pci 0000:00:00.0: save config 0x1c: 0x20000000
[ 2085.430200] pci 0000:00:00.0: save config 0x20: 0x00000000
[ 2085.430211] pci 0000:00:00.0: save config 0x24: 0x00000000
[ 2085.430222] pci 0000:00:00.0: save config 0x28: 0x00000000
[ 2085.430233] pci 0000:00:00.0: save config 0x2c: 0x22de17aa
[ 2085.430244] pci 0000:00:00.0: save config 0x30: 0x00000000
[ 2085.430255] pci 0000:00:00.0: save config 0x34: 0x00000000
[ 2085.430265] pci 0000:00:00.0: save config 0x38: 0x00000000
[ 2085.430276] pci 0000:00:00.0: save config 0x3c: 0x00000000
[ 2085.465583] xhci_hcd 0000:00:0d.0: save config 0x00: 0x9a178086
[ 2085.465609] xhci_hcd 0000:00:0d.0: save config 0x04: 0x02900402
[ 2085.465622] xhci_hcd 0000:00:0d.0: save config 0x08: 0x0c033005
[ 2085.465631] xhci_hcd 0000:00:0d.0: save config 0x0c: 0x00800000
[ 2085.465642] xhci_hcd 0000:00:0d.0: save config 0x10: 0x4d170004
[ 2085.465652] xhci_hcd 0000:00:0d.0: save config 0x14: 0x00000064
[ 2085.465663] xhci_hcd 0000:00:0d.0: save config 0x18: 0x00000000
[ 2085.465673] xhci_hcd 0000:00:0d.0: save config 0x1c: 0x00000000
[ 2085.465683] xhci_hcd 0000:00:0d.0: save config 0x20: 0x00000000
[ 2085.465693] xhci_hcd 0000:00:0d.0: save config 0x24: 0x00000000
[ 2085.465703] xhci_hcd 0000:00:0d.0: save config 0x28: 0x00000000
[ 2085.465712] xhci_hcd 0000:00:0d.0: save config 0x2c: 0x22de17aa
[ 2085.465722] xhci_hcd 0000:00:0d.0: save config 0x30: 0x00000000
[ 2085.465732] xhci_hcd 0000:00:0d.0: save config 0x34: 0x00000070
[ 2085.465742] xhci_hcd 0000:00:0d.0: save config 0x38: 0x00000000
[ 2085.465752] xhci_hcd 0000:00:0d.0: save config 0x3c: 0x000000ff
[ 2085.465920] xhci_hcd 0000:00:0d.0: PME# enabled
[ 2085.477664] thunderbolt 0000:00:0d.2: save config 0x00: 0x9a1f8086
[ 2085.477685] thunderbolt 0000:00:0d.2: save config 0x04: 0x00100407
[ 2085.477694] thunderbolt 0000:00:0d.2: save config 0x08: 0x0c034005
[ 2085.477701] thunderbolt 0000:00:0d.2: save config 0x0c: 0x00000000
[ 2085.477709] thunderbolt 0000:00:0d.2: save config 0x10: 0x4d100004
[ 2085.477717] thunderbolt 0000:00:0d.2: save config 0x14: 0x00000064
[ 2085.477725] thunderbolt 0000:00:0d.2: save config 0x18: 0x4d18b004
[ 2085.477732] thunderbolt 0000:00:0d.2: save config 0x1c: 0x00000064
[ 2085.477739] thunderbolt 0000:00:0d.2: save config 0x20: 0x00000000
[ 2085.477746] thunderbolt 0000:00:0d.2: save config 0x24: 0x00000000
[ 2085.477753] thunderbolt 0000:00:0d.2: save config 0x28: 0x00000000
[ 2085.477760] thunderbolt 0000:00:0d.2: save config 0x2c: 0x22de17aa
[ 2085.477767] thunderbolt 0000:00:0d.2: save config 0x30: 0x00000000
[ 2085.477774] thunderbolt 0000:00:0d.2: save config 0x34: 0x00000080
[ 2085.477781] thunderbolt 0000:00:0d.2: save config 0x38: 0x00000000
[ 2085.477789] thunderbolt 0000:00:0d.2: save config 0x3c: 0x000001ff
[ 2085.478239] thunderbolt 0000:00:0d.2: PME# enabled
[ 2085.478864] xhci_hcd 0000:00:0d.0: ACPI _REG disconnect evaluation failed (5)
[ 2085.479107] xhci_hcd 0000:00:0d.0: power state changed by ACPI to D3cold
[ 2085.486600] pci 0000:00:14.2: save config 0x00: 0x43ef8086
[ 2085.486624] pci 0000:00:14.2: save config 0x04: 0x00100000
[ 2085.486634] pci 0000:00:14.2: save config 0x08: 0x05000011
[ 2085.486642] pci 0000:00:14.2: save config 0x0c: 0x00000000
[ 2085.486651] pci 0000:00:14.2: save config 0x10: 0x4d184004
[ 2085.486659] pci 0000:00:14.2: save config 0x14: 0x00000064
[ 2085.486668] pci 0000:00:14.2: save config 0x18: 0x4d18a004
[ 2085.486675] pci 0000:00:14.2: save config 0x1c: 0x00000064
[ 2085.486683] pci 0000:00:14.2: save config 0x20: 0x00000000
[ 2085.486691] pci 0000:00:14.2: save config 0x24: 0x00000000
[ 2085.486699] pci 0000:00:14.2: save config 0x28: 0x00000000
[ 2085.486706] pci 0000:00:14.2: save config 0x2c: 0x22de17aa
[ 2085.486714] pci 0000:00:14.2: save config 0x30: 0x00000000
[ 2085.486722] pci 0000:00:14.2: save config 0x34: 0x00000080
[ 2085.486730] pci 0000:00:14.2: save config 0x38: 0x00000000
[ 2085.486738] pci 0000:00:14.2: save config 0x3c: 0x000000ff
[ 2085.490056] thunderbolt 0000:00:0d.2: ACPI _REG disconnect evaluation failed (5)
[ 2085.518094] pci 0000:00:1f.0: save config 0x00: 0x43898086
[ 2085.518119] pci 0000:00:1f.0: save config 0x04: 0x00000407
[ 2085.518129] pci 0000:00:1f.0: save config 0x08: 0x06010011
[ 2085.518137] pci 0000:00:1f.0: save config 0x0c: 0x00800000
[ 2085.518145] pci 0000:00:1f.0: save config 0x10: 0x00000000
[ 2085.518154] pci 0000:00:1f.0: save config 0x14: 0x00000000
[ 2085.518162] pci 0000:00:1f.0: save config 0x18: 0x00000000
[ 2085.518169] pci 0000:00:1f.0: save config 0x1c: 0x00000000
[ 2085.518177] pci 0000:00:1f.0: save config 0x20: 0x00000000
[ 2085.518184] pci 0000:00:1f.0: save config 0x24: 0x00000000
[ 2085.518192] pci 0000:00:1f.0: save config 0x28: 0x00000000
[ 2085.518200] pci 0000:00:1f.0: save config 0x2c: 0x22de17aa
[ 2085.518208] pci 0000:00:1f.0: save config 0x30: 0x00000000
[ 2085.518215] pci 0000:00:1f.0: save config 0x34: 0x00000000
[ 2085.518223] pci 0000:00:1f.0: save config 0x38: 0x00000000
[ 2085.518230] pci 0000:00:1f.0: save config 0x3c: 0x00000000
[ 2085.524402] thunderbolt 0000:00:0d.2: power state changed by ACPI to D3cold
[ 2085.533410] i801_smbus 0000:00:1f.4: save config 0x00: 0x43a38086
[ 2085.533435] i801_smbus 0000:00:1f.4: save config 0x04: 0x02800003
[ 2085.533448] i801_smbus 0000:00:1f.4: save config 0x08: 0x0c050011
[ 2085.533458] i801_smbus 0000:00:1f.4: save config 0x0c: 0x00000000
[ 2085.533469] i801_smbus 0000:00:1f.4: save config 0x10: 0x4d188004
[ 2085.533479] i801_smbus 0000:00:1f.4: save config 0x14: 0x00000064
[ 2085.533490] i801_smbus 0000:00:1f.4: save config 0x18: 0x00000000
[ 2085.533500] i801_smbus 0000:00:1f.4: save config 0x1c: 0x00000000
[ 2085.533509] i801_smbus 0000:00:1f.4: save config 0x20: 0x0000efa1
[ 2085.533520] i801_smbus 0000:00:1f.4: save config 0x24: 0x00000000
[ 2085.533529] i801_smbus 0000:00:1f.4: save config 0x28: 0x00000000
[ 2085.533539] i801_smbus 0000:00:1f.4: save config 0x2c: 0x22de17aa
[ 2085.533549] i801_smbus 0000:00:1f.4: save config 0x30: 0x00000000
[ 2085.533559] i801_smbus 0000:00:1f.4: save config 0x34: 0x00000000
[ 2085.533568] i801_smbus 0000:00:1f.4: save config 0x38: 0x00000000
[ 2085.533578] i801_smbus 0000:00:1f.4: save config 0x3c: 0x000001ff
[ 2085.540270] pci 0000:00:1f.5: save config 0x00: 0x43a48086
[ 2085.540292] pci 0000:00:1f.5: save config 0x04: 0x00000402
[ 2085.540302] pci 0000:00:1f.5: save config 0x08: 0x0c800011
[ 2085.540311] pci 0000:00:1f.5: save config 0x0c: 0x00000000
[ 2085.540320] pci 0000:00:1f.5: save config 0x10: 0x90800000
[ 2085.540329] pci 0000:00:1f.5: save config 0x14: 0x00000000
[ 2085.540337] pci 0000:00:1f.5: save config 0x18: 0x00000000
[ 2085.540345] pci 0000:00:1f.5: save config 0x1c: 0x00000000
[ 2085.540352] pci 0000:00:1f.5: save config 0x20: 0x00000000
[ 2085.540360] pci 0000:00:1f.5: save config 0x24: 0x00000000
[ 2085.540367] pci 0000:00:1f.5: save config 0x28: 0x00000000
[ 2085.540375] pci 0000:00:1f.5: save config 0x2c: 0x22de17aa
[ 2085.540383] pci 0000:00:1f.5: save config 0x30: 0x00000000
[ 2085.540391] pci 0000:00:1f.5: save config 0x34: 0x00000000
[ 2085.540398] pci 0000:00:1f.5: save config 0x38: 0x00000000
[ 2085.540406] pci 0000:00:1f.5: save config 0x3c: 0x00000000
[ 2085.555233] snd_hda_intel 0000:01:00.1: save config 0x00: 0x228b10de
[ 2085.555260] snd_hda_intel 0000:01:00.1: save config 0x04: 0x00100006
[ 2085.555276] snd_hda_intel 0000:01:00.1: save config 0x08: 0x040300a1
[ 2085.555289] snd_hda_intel 0000:01:00.1: save config 0x0c: 0x00800000
[ 2085.555301] snd_hda_intel 0000:01:00.1: save config 0x10: 0xae000000
[ 2085.555315] snd_hda_intel 0000:01:00.1: save config 0x14: 0x00000000
[ 2085.555328] snd_hda_intel 0000:01:00.1: save config 0x18: 0x00000000
[ 2085.555341] snd_hda_intel 0000:01:00.1: save config 0x1c: 0x00000000
[ 2085.555353] snd_hda_intel 0000:01:00.1: save config 0x20: 0x00000000
[ 2085.555365] snd_hda_intel 0000:01:00.1: save config 0x24: 0x00000000
[ 2085.555377] snd_hda_intel 0000:01:00.1: save config 0x28: 0x00000000
[ 2085.555389] snd_hda_intel 0000:01:00.1: save config 0x2c: 0x22de17aa
[ 2085.555401] snd_hda_intel 0000:01:00.1: save config 0x30: 0x00000000
[ 2085.555412] snd_hda_intel 0000:01:00.1: save config 0x34: 0x00000060
[ 2085.555425] snd_hda_intel 0000:01:00.1: save config 0x38: 0x00000000
[ 2085.555437] snd_hda_intel 0000:01:00.1: save config 0x3c: 0x000002ff
[ 2085.578054] pci 0000:01:00.0: save config 0x00: 0x24b610de
[ 2085.578079] pci 0000:01:00.0: save config 0x04: 0x00100003
[ 2085.578093] pci 0000:01:00.0: save config 0x08: 0x030000a1
[ 2085.578106] pci 0000:01:00.0: save config 0x0c: 0x00800000
[ 2085.578119] pci 0000:01:00.0: save config 0x10: 0xad000000
[ 2085.578131] pci 0000:01:00.0: save config 0x14: 0x0000000c
[ 2085.578144] pci 0000:01:00.0: save config 0x18: 0x00000060
[ 2085.578157] pci 0000:01:00.0: save config 0x1c: 0x0000000c
[ 2085.578170] pci 0000:01:00.0: save config 0x20: 0x00000064
[ 2085.578181] pci 0000:01:00.0: save config 0x24: 0x00003001
[ 2085.578193] pci 0000:01:00.0: save config 0x28: 0x00000000
[ 2085.578205] pci 0000:01:00.0: save config 0x2c: 0x22de17aa
[ 2085.578217] pci 0000:01:00.0: save config 0x30: 0xfff80000
[ 2085.578228] pci 0000:01:00.0: save config 0x34: 0x00000060
[ 2085.578240] pci 0000:01:00.0: save config 0x38: 0x00000000
[ 2085.578252] pci 0000:01:00.0: save config 0x3c: 0x000001ff
[ 2085.578322] pcieport 0000:00:01.0: save config 0x00: 0x9a018086
[ 2085.578337] pcieport 0000:00:01.0: save config 0x04: 0x00100507
[ 2085.578350] pcieport 0000:00:01.0: save config 0x08: 0x06040005
[ 2085.578362] pcieport 0000:00:01.0: save config 0x0c: 0x00810000
[ 2085.578374] pcieport 0000:00:01.0: save config 0x10: 0x00000000
[ 2085.578386] pcieport 0000:00:01.0: save config 0x14: 0x00000000
[ 2085.578398] pcieport 0000:00:01.0: save config 0x18: 0x00010100
[ 2085.578411] pcieport 0000:00:01.0: save config 0x1c: 0x20003030
[ 2085.578423] pcieport 0000:00:01.0: save config 0x20: 0xae00ad00
[ 2085.578434] pcieport 0000:00:01.0: save config 0x24: 0x01f10001
[ 2085.578446] pcieport 0000:00:01.0: save config 0x28: 0x00000060
[ 2085.578458] pcieport 0000:00:01.0: save config 0x2c: 0x00000064
[ 2085.578470] pcieport 0000:00:01.0: save config 0x30: 0x00000000
[ 2085.578482] pcieport 0000:00:01.0: save config 0x34: 0x00000040
[ 2085.578494] pcieport 0000:00:01.0: save config 0x38: 0x00000000
[ 2085.578506] pcieport 0000:00:01.0: save config 0x3c: 0x000201ff
[ 2085.578650] pcieport 0000:00:01.0: PME# enabled
[ 2085.590082] pcieport 0000:00:01.0: ACPI _REG disconnect evaluation failed (5)
[ 2085.826081] pcieport 0000:00:01.0: power state changed by ACPI to D3cold
(...)
[ 2153.122300] igc 0000:0b:00.0: enabling bus mastering
[ 2158.299859] igc 0000:0b:00.0: save config 0x00: 0x15f28086
[ 2158.300357] igc 0000:0b:00.0: save config 0x04: 0x00100002
[ 2158.300859] igc 0000:0b:00.0: save config 0x08: 0x02000003
[ 2158.301362] igc 0000:0b:00.0: save config 0x0c: 0x00000020
[ 2158.301874] igc 0000:0b:00.0: save config 0x10: 0xae400000
[ 2158.302361] igc 0000:0b:00.0: save config 0x14: 0x00000000
[ 2158.302445] igc 0000:0b:00.0: save config 0x18: 0x00000000
[ 2158.302453] igc 0000:0b:00.0: save config 0x1c: 0xae500000
[ 2158.302462] igc 0000:0b:00.0: save config 0x20: 0x00000000
[ 2158.302470] igc 0000:0b:00.0: save config 0x24: 0x00000000
[ 2158.302478] igc 0000:0b:00.0: save config 0x28: 0x00000000
[ 2158.302486] igc 0000:0b:00.0: save config 0x2c: 0x22d817aa
[ 2158.302494] igc 0000:0b:00.0: save config 0x30: 0x00000000
[ 2158.302502] igc 0000:0b:00.0: save config 0x34: 0x00000040
[ 2158.302510] igc 0000:0b:00.0: save config 0x38: 0x00000000
[ 2158.302518] igc 0000:0b:00.0: save config 0x3c: 0x000001ff
[ 2158.302596] igc 0000:0b:00.0: PME# enabled
[ 2158.313996] pcieport 0000:00:1c.4: save config 0x00: 0x43bc8086
[ 2158.314045] pcieport 0000:00:1c.4: save config 0x04: 0x00100407
[ 2158.314080] pcieport 0000:00:1c.4: save config 0x08: 0x06040011
[ 2158.314113] pcieport 0000:00:1c.4: save config 0x0c: 0x00810000
[ 2158.314121] pcieport 0000:00:1c.4: save config 0x10: 0x00000000
[ 2158.314154] pcieport 0000:00:1c.4: save config 0x14: 0x00000000
[ 2158.314187] pcieport 0000:00:1c.4: save config 0x18: 0x000b0b00
[ 2158.314219] pcieport 0000:00:1c.4: save config 0x1c: 0x200000f0
[ 2158.314252] pcieport 0000:00:1c.4: save config 0x20: 0xae50ae40
[ 2158.314284] pcieport 0000:00:1c.4: save config 0x24: 0x0001fff1
[ 2158.314292] pcieport 0000:00:1c.4: save config 0x28: 0x00000000
[ 2158.314324] pcieport 0000:00:1c.4: save config 0x2c: 0x00000000
[ 2158.314357] pcieport 0000:00:1c.4: save config 0x30: 0x00000000
[ 2158.314389] pcieport 0000:00:1c.4: save config 0x34: 0x00000040
[ 2158.314416] pcieport 0000:00:1c.4: save config 0x38: 0x00000000
[ 2158.314448] pcieport 0000:00:1c.4: save config 0x3c: 0x000201ff
[ 2158.314541] pcieport 0000:00:1c.4: PME# enabled
[ 2167.722083] wlp9s0: deauthenticating from e0:63:da:28:03:67 by local choice (Reason: 3=DEAUTH_LEAVI
[ 2167.997820] iwlwifi 0000:09:00.0: Unregistering PHC clock: iwlwifi-PTP, with index: 1
[ 2168.045860] iwlwifi 0000:09:00.0: vgaarb: pci_notify
[ 2168.414051] iwlwifi 0000:09:00.0: save config 0x00: 0x27258086
[ 2168.414255] iwlwifi 0000:09:00.0: save config 0x04: 0x00100406
[ 2168.414462] iwlwifi 0000:09:00.0: save config 0x08: 0x0280001a
[ 2168.414670] iwlwifi 0000:09:00.0: save config 0x0c: 0x00000000
[ 2168.414791] iwlwifi 0000:09:00.0: save config 0x10: 0xae300004
[ 2168.414929] iwlwifi 0000:09:00.0: save config 0x14: 0x00000000
[ 2168.415119] iwlwifi 0000:09:00.0: save config 0x18: 0x00000000
[ 2168.415257] iwlwifi 0000:09:00.0: save config 0x1c: 0x00000000
[ 2168.415448] iwlwifi 0000:09:00.0: save config 0x20: 0x00000000
[ 2168.415569] iwlwifi 0000:09:00.0: save config 0x24: 0x00000000
[ 2168.415689] iwlwifi 0000:09:00.0: save config 0x28: 0x00000000
[ 2168.415828] iwlwifi 0000:09:00.0: save config 0x2c: 0x00208086
[ 2168.416018] iwlwifi 0000:09:00.0: save config 0x30: 0x00000000
[ 2168.416139] iwlwifi 0000:09:00.0: save config 0x34: 0x000000c8
[ 2168.416260] iwlwifi 0000:09:00.0: save config 0x38: 0x00000000
[ 2168.416380] iwlwifi 0000:09:00.0: save config 0x3c: 0x000001ff
[ 2168.417051] pcieport 0000:00:1c.5: save config 0x00: 0x43bd8086
[ 2168.417089] pcieport 0000:00:1c.5: save config 0x04: 0x00100407
[ 2168.417126] pcieport 0000:00:1c.5: save config 0x08: 0x06040011
[ 2168.417162] pcieport 0000:00:1c.5: save config 0x0c: 0x00810000
[ 2168.417197] pcieport 0000:00:1c.5: save config 0x10: 0x00000000
[ 2168.417233] pcieport 0000:00:1c.5: save config 0x14: 0x00000000
[ 2168.417269] pcieport 0000:00:1c.5: save config 0x18: 0x00090900
[ 2168.417304] pcieport 0000:00:1c.5: save config 0x1c: 0x200000f0
[ 2168.417340] pcieport 0000:00:1c.5: save config 0x20: 0xae30ae30
[ 2168.417376] pcieport 0000:00:1c.5: save config 0x24: 0x0001fff1
[ 2168.417412] pcieport 0000:00:1c.5: save config 0x28: 0x00000000
[ 2168.417447] pcieport 0000:00:1c.5: save config 0x2c: 0x00000000
[ 2168.417483] pcieport 0000:00:1c.5: save config 0x30: 0x00000000
[ 2168.417517] pcieport 0000:00:1c.5: save config 0x34: 0x00000040
[ 2168.417553] pcieport 0000:00:1c.5: save config 0x38: 0x00000000
[ 2168.417626] pcieport 0000:00:1c.5: save config 0x3c: 0x000202ff
[ 2168.417876] pcieport 0000:00:1c.5: PME# enabled
[ 2168.421126] pci 0000:09:00.0: vgaarb: pci_notify
[ 2173.389773] pcieport 0000:00:1c.4: restore config 0x2c: 0x00000000 -> 0x00000000
[ 2173.389829] pcieport 0000:00:1c.4: restore config 0x28: 0x00000000 -> 0x00000000
[ 2173.389869] pcieport 0000:00:1c.4: restore config 0x24: 0x0001fff1 -> 0x0001fff1
[ 2173.390080] pcieport 0000:00:1c.4: PME# disabled
[ 2173.402003] igc 0000:0b:00.0: PME# disabled
[ 2173.402110] igc 0000:0b:00.0: save config 0x00: 0x15f28086
[ 2173.402133] igc 0000:0b:00.0: save config 0x04: 0x00100002
[ 2173.402142] igc 0000:0b:00.0: save config 0x08: 0x02000003
[ 2173.402151] igc 0000:0b:00.0: save config 0x0c: 0x00000020
[ 2173.402159] igc 0000:0b:00.0: save config 0x10: 0xae400000
[ 2173.402168] igc 0000:0b:00.0: save config 0x14: 0x00000000
[ 2173.402176] igc 0000:0b:00.0: save config 0x18: 0x00000000
[ 2173.402184] igc 0000:0b:00.0: save config 0x1c: 0xae500000
[ 2173.402192] igc 0000:0b:00.0: save config 0x20: 0x00000000
[ 2173.402200] igc 0000:0b:00.0: save config 0x24: 0x00000000
[ 2173.402207] igc 0000:0b:00.0: save config 0x28: 0x00000000
[ 2173.402215] igc 0000:0b:00.0: save config 0x2c: 0x22d817aa
[ 2173.402223] igc 0000:0b:00.0: save config 0x30: 0x00000000



Marc
Przemek Kitszel Dec. 5, 2023, 5:19 a.m. UTC | #9
On 12/4/23 20:07, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> As reported by Marc MERLIN in [1], at least one driver (igc)

perhaps Reported-by tag? (I know this is RFC as of now)

> wants/needs to acquire the RTNL inside suspend/resume ops,
> which can be called from here in ethtool if runtime PM is
> enabled.
> 
> [1] https://lore.kernel.org/r/20231202221402.GA11155@merlins.org
> 
> Allow this by doing runtime PM transitions without the RTNL
> held. For the ioctl to have the same operations order, this
> required reworking the code to separately check validity and
> do the operation. For the netlink code, this now has to do
> the runtime_pm_put a bit later.
> 
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
>   net/ethtool/ioctl.c   | 71 ++++++++++++++++++++++++++-----------------
>   net/ethtool/netlink.c | 32 ++++++++-----------
>   2 files changed, 56 insertions(+), 47 deletions(-)
> 
Thank you for the patch,

I like the idea of split into validate + do for dev_ethtool(),
what minimizes unneeded PM touching. Moving pm_runtime_get_sync() out of 
RTNL is also a great improvement per se. Also from the pure coding 
perspective I see no obvious flaws in the patch. I think that igc code
was just accidental to the issue, in a way that it was not deliberate to
hold RTNL for extended periods. With your patch fixing the bug, there is
no point with waiting IMO, so

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Johannes Berg Dec. 5, 2023, 7:33 p.m. UTC | #10
On Mon, 2023-12-04 at 18:46 -0800, Marc MERLIN wrote:
> 
> [13323.572484] iwlwifi 0000:09:00.0: TB bug workaround: copied 152 bytes from 0xffffff68 to 0xfd080000
> [13328.000825] iwlwifi 0000:09:00.0: TB bug workaround: copied 1272 bytes from 0xfffffb08 to 0xff42c000
> [13367.278564] iwlwifi 0000:09:00.0: TB bug workaround: copied 1328 bytes from 0xfffffad0 to 0xfec41000
> [13389.737971] iwlwifi 0000:09:00.0: TB bug workaround: copied 572 bytes from 0xfffffdc4 to 0xff091000
> [13389.860480] iwlwifi 0000:09:00.0: TB bug workaround: copied 148 bytes from 0xffffff6c to 0xfe412000
> [13393.435354] iwlwifi 0000:09:00.0: TB bug workaround: copied 360 bytes from 0xfffffe98 to 0xfedcd000
> [13409.827199] iwlwifi 0000:09:00.0: TB bug workaround: copied 1348 bytes from 0xfffffabc to 0xfd057000

That's fine, just working around a HW bug on 2^32 address boundaries.

I had a patch a long time ago to make those messages not appear ... not
sure where it ended up.

johannes
Johannes Berg Dec. 5, 2023, 7:48 p.m. UTC | #11
On Tue, 2023-12-05 at 06:19 +0100, Przemek Kitszel wrote:
> On 12/4/23 20:07, Johannes Berg wrote:
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > As reported by Marc MERLIN in [1], at least one driver (igc)
> 
> perhaps Reported-by tag? (I know this is RFC as of now)

I guess.

> > wants/needs to acquire the RTNL inside suspend/resume ops,
> > which can be called from here in ethtool if runtime PM is
> > enabled.
> > 
> > [1] https://lore.kernel.org/r/20231202221402.GA11155@merlins.org
> > 
> > Allow this by doing runtime PM transitions without the RTNL
> > held. For the ioctl to have the same operations order, this
> > required reworking the code to separately check validity and
> > do the operation. For the netlink code, this now has to do
> > the runtime_pm_put a bit later.
> > 
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > ---
> >   net/ethtool/ioctl.c   | 71 ++++++++++++++++++++++++++-----------------
> >   net/ethtool/netlink.c | 32 ++++++++-----------
> >   2 files changed, 56 insertions(+), 47 deletions(-)
> > 
> Thank you for the patch,
> 
> I like the idea of split into validate + do for dev_ethtool(),
> what minimizes unneeded PM touching. Moving pm_runtime_get_sync() out of 
> RTNL is also a great improvement per se. Also from the pure coding 
> perspective I see no obvious flaws in the patch. I think that igc code
> was just accidental to the issue, in a way that it was not deliberate to
> hold RTNL for extended periods.

Well Jakub was arguing igc shouldn't be taking rtnl in suspend/resume,
maybe, but dunno.

> With your patch fixing the bug, there is
> no point with waiting IMO, so
> 
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

Well, according to the checks, the patch really should use
netdev_get_by_name() and netdev_put()? But I don't know how to do that
on short-term stack thing ... maybe it doesn't have to?

johannes
Marc MERLIN Dec. 5, 2023, 11:15 p.m. UTC | #12
On Tue, Dec 05, 2023 at 08:33:02PM +0100, Johannes Berg wrote:
> On Mon, 2023-12-04 at 18:46 -0800, Marc MERLIN wrote:
> > 
> > [13323.572484] iwlwifi 0000:09:00.0: TB bug workaround: copied 152 bytes from 0xffffff68 to 0xfd080000
> > [13328.000825] iwlwifi 0000:09:00.0: TB bug workaround: copied 1272 bytes from 0xfffffb08 to 0xff42c000
> > [13367.278564] iwlwifi 0000:09:00.0: TB bug workaround: copied 1328 bytes from 0xfffffad0 to 0xfec41000
> > [13389.737971] iwlwifi 0000:09:00.0: TB bug workaround: copied 572 bytes from 0xfffffdc4 to 0xff091000
> > [13389.860480] iwlwifi 0000:09:00.0: TB bug workaround: copied 148 bytes from 0xffffff6c to 0xfe412000
> > [13393.435354] iwlwifi 0000:09:00.0: TB bug workaround: copied 360 bytes from 0xfffffe98 to 0xfedcd000
> > [13409.827199] iwlwifi 0000:09:00.0: TB bug workaround: copied 1348 bytes from 0xfffffabc to 0xfd057000
> 
> That's fine, just working around a HW bug on 2^32 address boundaries.
> 
> I had a patch a long time ago to make those messages not appear ... not
> sure where it ended up.

About this wifi chip (AX210/AX211/AX411), one last issue I have with it is when I go to
batteries, it burns a lot of power:
> The battery reports a discharge rate of 36.8 W
> The power consumed was 772 J
> The estimated remaining time is 1 hours, 56 minutes
> 
> Summary: 1657.3 wakeups/second,  0.0 GPU ops/seconds, 0.0 VFS ops/sec and 121.5% CPU use
> 
> Power est.              Usage       Events/s    Category       Description
>   24.1 W     59.6 pkts/s                Device         Network interface: wlp9s0 (iwlwifi)
>   200 mW     80.6%                      Device         Display backlight
>   139 mW      0.0 µs/s      0.00        Process        [PID 8005] /usr/bin/pipewire
>  5.37 mW     18.0 ms/s     537.1        Timer          tick_sched_timer
> 
> 
> The battery reports a discharge rate of 36.1 W
> The power consumed was 744 J
> System baseline power is estimated at 28.9 W
> 
> Power est.    Usage     Device name
>   24.0 W    100.0%        Radio device: iwlwifi
>   4.43 W    205.2%        CPU core
>   230 mW    100.0%        Audio codec alsa:hwC0D0: thinkpad (Realtek) (pipewire )
> 
> autosuspend is on:
>    Good          Runtime PM for PCI Device Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz


Obviously unloading iwlwifi fixes it:
> The battery reports a discharge rate of 26.0 W
> The power consumed was 534 J
> The estimated remaining time is 2 hours, 40 minutes
> 
> Summary: 927.2 wakeups/second,  0.0 GPU ops/seconds, 0.0 VFS ops/sec and 85.5% CPU use
> 
> Power est.              Usage       Events/s    Category       Description
>   6.95 W    100.0%                      Device         USB device: Yubico Gnubby (gnubby1) (Yubico)
>   2.09 W      0.0 µs/s      0.00        Process        [PID 8005] /usr/bin/pipewire
>   185 mW     80.6%                      Device         Display backlight
> 
> The battery reports a discharge rate of 26.0 W
> The power consumed was 534 J
> System baseline power is estimated at 11.5 W
> 
> Power est.    Usage     Device name
>   6.95 W    100.0%        USB device: Yubico Gnubby (gnubby1) (Yubico)
>   2.09 W    100.0%        Audio codec alsa:hwC0D0: thinkpad (Realtek) (pipewire )
>   2.02 W     85.5%        CPU core

What's very interesting is if I re-insert the iwlwifi module after that, 
it works and used a lot less power last time I tried, but this time around
it's still shown as high although probably with a wrong Watt value because powertop
is confused somehow:
  28.8 W    470.9 pkts/s                Device         Network interface: wlp9s0 (iwlwifi)

  24.4 W    100.0%        Radio device: iwlwifi

Is this something I should persue separately on
linux-wireless@vger.kernel.org or ilw@linux.intel.com or elsewhere?

Thanks,
Marc
Przemek Kitszel Dec. 6, 2023, 8:46 a.m. UTC | #13
On 12/5/23 20:48, Johannes Berg wrote:
> On Tue, 2023-12-05 at 06:19 +0100, Przemek Kitszel wrote:
>> On 12/4/23 20:07, Johannes Berg wrote:
>>> From: Johannes Berg <johannes.berg@intel.com>
>>>
>>> As reported by Marc MERLIN in [1], at least one driver (igc)
>>
>> perhaps Reported-by tag? (I know this is RFC as of now)
> 
> I guess.
> 
>>> wants/needs to acquire the RTNL inside suspend/resume ops,
>>> which can be called from here in ethtool if runtime PM is
>>> enabled.
>>>
>>> [1] https://lore.kernel.org/r/20231202221402.GA11155@merlins.org
>>>
>>> Allow this by doing runtime PM transitions without the RTNL
>>> held. For the ioctl to have the same operations order, this
>>> required reworking the code to separately check validity and
>>> do the operation. For the netlink code, this now has to do
>>> the runtime_pm_put a bit later.
>>>
>>> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>>> ---
>>>    net/ethtool/ioctl.c   | 71 ++++++++++++++++++++++++++-----------------
>>>    net/ethtool/netlink.c | 32 ++++++++-----------
>>>    2 files changed, 56 insertions(+), 47 deletions(-)
>>>
>> Thank you for the patch,
>>
>> I like the idea of split into validate + do for dev_ethtool(),
>> what minimizes unneeded PM touching. Moving pm_runtime_get_sync() out of
>> RTNL is also a great improvement per se. Also from the pure coding
>> perspective I see no obvious flaws in the patch. I think that igc code
>> was just accidental to the issue, in a way that it was not deliberate to
>> hold RTNL for extended periods.
> 
> Well Jakub was arguing igc shouldn't be taking rtnl in suspend/resume,
> maybe, but dunno.

That sounds right too; one could argue if your fix is orthogonal to that
or not. I would say that your fix makes core net code more robust
against drivers from past millennia. :)
igc folks are notified, no idea how much time it would take to propose
a fix.

> 
>> With your patch fixing the bug, there is
>> no point with waiting IMO, so
>>
>> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> 
> Well, according to the checks, the patch really should use
> netdev_get_by_name() and netdev_put()? But I don't know how to do that
> on short-term stack thing ... maybe it doesn't have to?

Nice to have such checks :)
You need some &netdevice_tracker, perhaps one added into struct net
or other place that would allow to track it at ethtool level.

"short term stack thing" does not relieve us from good coding practices,
but perhaps "you just replaced __dev_get_by_name() call by
dev_get_by_name()" to fix a bug would ;) - with transition to tracked
alloc as a next series to be promised :)

anyway, I'm fresh here, and would love to know what others think about

> 
> johannes
Johannes Berg Dec. 6, 2023, 9:37 a.m. UTC | #14
On Wed, 2023-12-06 at 09:46 +0100, Przemek Kitszel wrote:
> 
> That sounds right too; one could argue if your fix is orthogonal to that
> or not. I would say that your fix makes core net code more robust
> against drivers from past millennia. :)
> igc folks are notified, no idea how much time it would take to propose
> a fix.

Maybe it should be on whoever added runtime pm to ethtool ;-)

Heiner, the igc driver was already doing this when you added
pm_runtime_get_sync() ops, was there a discussion at the time, or just
missed?

I really don't know any of this ...

> > Well, according to the checks, the patch really should use
> > netdev_get_by_name() and netdev_put()? But I don't know how to do that
> > on short-term stack thing ... maybe it doesn't have to?
> 
> Nice to have such checks :)
> You need some &netdevice_tracker, perhaps one added into struct net
> or other place that would allow to track it at ethtool level.

Yeah but that's dynamic? Seems weird to add something to allocations for
something released in the same function ...

> "short term stack thing" does not relieve us from good coding practices,
> but perhaps "you just replaced __dev_get_by_name() call by
> dev_get_by_name()" to fix a bug would ;) - with transition to tracked
> alloc as a next series to be promised :)

All I want is to know how ;)
but I guess I can try to find examples.

> anyway, I'm fresh here, and would love to know what others think about

Not me, but me too ;-)

johannes
Johannes Berg Dec. 6, 2023, 9:50 a.m. UTC | #15
On Tue, 2023-12-05 at 15:15 -0800, Marc MERLIN wrote:
> > 
> > Power est.              Usage       Events/s    Category       Description
> >   24.1 W     59.6 pkts/s                Device         Network interface: wlp9s0 (iwlwifi)

That seems excessive, I'm not sure it's even _possible_ for it to draw
that much power?

> > The battery reports a discharge rate of 36.1 W
> 
> Obviously unloading iwlwifi fixes it:
> > The battery reports a discharge rate of 26.0 W

That would put it closer to 10W, but even that seems too much... no idea
though.

> Is this something I should persue separately on
> linux-wireless@vger.kernel.org or ilw@linux.intel.com or elsewhere?
> 

Certainly separately, maybe put a bug on bugzilla per
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging

johannes
Heiner Kallweit Dec. 6, 2023, 11:59 a.m. UTC | #16
On 06.12.2023 10:37, Johannes Berg wrote:
> On Wed, 2023-12-06 at 09:46 +0100, Przemek Kitszel wrote:
>>
>> That sounds right too; one could argue if your fix is orthogonal to that
>> or not. I would say that your fix makes core net code more robust
>> against drivers from past millennia. :)
>> igc folks are notified, no idea how much time it would take to propose
>> a fix.
> 
> Maybe it should be on whoever added runtime pm to ethtool ;-)
> 
> Heiner, the igc driver was already doing this when you added
> pm_runtime_get_sync() ops, was there a discussion at the time, or just
> missed?
> 
I think it went unnoticed at that time that igc is acquiring RTNL
in runtime-resume. I'm just astonished that this pops up only now,
considering that the change was done more than 2 yrs ago.

Note: In __dev_open() there's a similar scenario where the
runtime-resume callback may be executed under RTNL.

> I really don't know any of this ...
> 
>>> Well, according to the checks, the patch really should use
>>> netdev_get_by_name() and netdev_put()? But I don't know how to do that
>>> on short-term stack thing ... maybe it doesn't have to?
>>
>> Nice to have such checks :)
>> You need some &netdevice_tracker, perhaps one added into struct net
>> or other place that would allow to track it at ethtool level.
> 
> Yeah but that's dynamic? Seems weird to add something to allocations for
> something released in the same function ...
> 
>> "short term stack thing" does not relieve us from good coding practices,
>> but perhaps "you just replaced __dev_get_by_name() call by
>> dev_get_by_name()" to fix a bug would ;) - with transition to tracked
>> alloc as a next series to be promised :)
> 
> All I want is to know how ;)
> but I guess I can try to find examples.
> 
>> anyway, I'm fresh here, and would love to know what others think about
> 
> Not me, but me too ;-)
> 
> johannes
Stanislaw Gruszka Jan. 3, 2024, 8:20 a.m. UTC | #17
On Wed, Dec 06, 2023 at 12:59:09PM +0100, Heiner Kallweit wrote:
> On 06.12.2023 10:37, Johannes Berg wrote:
> > On Wed, 2023-12-06 at 09:46 +0100, Przemek Kitszel wrote:
> >>
> >> That sounds right too; one could argue if your fix is orthogonal to that
> >> or not. I would say that your fix makes core net code more robust
> >> against drivers from past millennia. :)
> >> igc folks are notified, no idea how much time it would take to propose
> >> a fix.
> > 
> > Maybe it should be on whoever added runtime pm to ethtool ;-)
> > 
> > Heiner, the igc driver was already doing this when you added
> > pm_runtime_get_sync() ops, was there a discussion at the time, or just
> > missed?
> > 
> I think it went unnoticed at that time that igc is acquiring RTNL
> in runtime-resume. I'm just astonished that this pops up only now,
> considering that the change was done more than 2 yrs ago.

PM runtime is disabled by default for igc (driver do not call
pm_runtime_allow()). It has to be enabled explicitly by user write
to pci sysfs power/control file. And after that link up/down or
ethtool has to be used to trigger this deadlock, so quite unlikely
scenario.

Is possible though, that some power saving daemon start enabling
pm runtime for devices, so that why issue become visible.

Regards
Stanislaw
diff mbox series

Patch

diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 0b0ce4f81c01..ae7e165dc471 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -2765,26 +2765,18 @@  static int ethtool_set_fecparam(struct net_device *dev, void __user *useraddr)
 /* The main entry point in this file.  Called from net/core/dev_ioctl.c */
 
 static int
-__dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr,
-	      u32 ethcmd, struct ethtool_devlink_compat *devlink_state)
+__dev_ethtool_check(struct net_device *dev, void __user *useraddr,
+		    u32 ethcmd, u32 *sub_cmd)
 {
-	struct net_device *dev;
-	u32 sub_cmd;
-	int rc;
-	netdev_features_t old_features;
-
-	dev = __dev_get_by_name(net, ifr->ifr_name);
-	if (!dev)
-		return -ENODEV;
-
 	if (ethcmd == ETHTOOL_PERQUEUE) {
-		if (copy_from_user(&sub_cmd, useraddr + sizeof(ethcmd), sizeof(sub_cmd)))
+		if (copy_from_user(sub_cmd, useraddr + sizeof(ethcmd), sizeof(*sub_cmd)))
 			return -EFAULT;
 	} else {
-		sub_cmd = ethcmd;
+		*sub_cmd = ethcmd;
 	}
+
 	/* Allow some commands to be done by anyone */
-	switch (sub_cmd) {
+	switch (*sub_cmd) {
 	case ETHTOOL_GSET:
 	case ETHTOOL_GDRVINFO:
 	case ETHTOOL_GMSGLVL:
@@ -2823,22 +2815,28 @@  __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr,
 	case ETHTOOL_GFECPARAM:
 		break;
 	default:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	}
 
-	if (dev->dev.parent)
-		pm_runtime_get_sync(dev->dev.parent);
+	return 0;
+}
 
-	if (!netif_device_present(dev)) {
-		rc = -ENODEV;
-		goto out;
-	}
+static int
+__dev_ethtool_do(struct net_device *dev, struct ifreq *ifr,
+		 void __user *useraddr, u32 ethcmd, u32 sub_cmd,
+		 struct ethtool_devlink_compat *devlink_state)
+{
+	netdev_features_t old_features;
+	int rc;
+
+	if (!netif_device_present(dev))
+		return -ENODEV;
 
 	if (dev->ethtool_ops->begin) {
 		rc = dev->ethtool_ops->begin(dev);
 		if (rc < 0)
-			goto out;
+			return rc;
 	}
 	old_features = dev->features;
 
@@ -3049,7 +3047,7 @@  __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr,
 		rc = ethtool_set_fecparam(dev, useraddr);
 		break;
 	default:
-		rc = -EOPNOTSUPP;
+		return -EOPNOTSUPP;
 	}
 
 	if (dev->ethtool_ops->complete)
@@ -3057,9 +3055,6 @@  __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr,
 
 	if (old_features != dev->features)
 		netdev_features_change(dev);
-out:
-	if (dev->dev.parent)
-		pm_runtime_put(dev->dev.parent);
 
 	return rc;
 }
@@ -3067,7 +3062,8 @@  __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr,
 int dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr)
 {
 	struct ethtool_devlink_compat *state;
-	u32 ethcmd;
+	struct net_device *dev = NULL;
+	u32 ethcmd, subcmd;
 	int rc;
 
 	if (copy_from_user(&ethcmd, useraddr, sizeof(ethcmd)))
@@ -3087,9 +3083,26 @@  int dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr)
 		break;
 	}
 
+	dev = dev_get_by_name(net, ifr->ifr_name);
+	if (!dev) {
+		rc = -ENODEV;
+		goto exit_free;
+	}
+
+	rc = __dev_ethtool_check(dev, useraddr, ethcmd, &subcmd);
+	if (rc)
+		goto exit_free;
+
+	if (dev->dev.parent)
+		pm_runtime_get_sync(dev->dev.parent);
+
 	rtnl_lock();
-	rc = __dev_ethtool(net, ifr, useraddr, ethcmd, state);
+	rc = __dev_ethtool_do(dev, ifr, useraddr, ethcmd, subcmd, state);
 	rtnl_unlock();
+
+	if (dev->dev.parent)
+		pm_runtime_put(dev->dev.parent);
+
 	if (rc)
 		goto exit_free;
 
@@ -3112,6 +3125,8 @@  int dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr)
 	}
 
 exit_free:
+	if (dev)
+		dev_put(dev);
 	if (state->devlink)
 		devlink_put(state->devlink);
 	kfree(state);
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index fe3553f60bf3..ee9ccf64688d 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -34,39 +34,23 @@  int ethnl_ops_begin(struct net_device *dev)
 {
 	int ret;
 
-	if (!dev)
-		return -ENODEV;
-
-	if (dev->dev.parent)
-		pm_runtime_get_sync(dev->dev.parent);
-
 	if (!netif_device_present(dev) ||
-	    dev->reg_state == NETREG_UNREGISTERING) {
-		ret = -ENODEV;
-		goto err;
-	}
+	    dev->reg_state == NETREG_UNREGISTERING)
+	    	return -ENODEV;
 
 	if (dev->ethtool_ops->begin) {
 		ret = dev->ethtool_ops->begin(dev);
 		if (ret)
-			goto err;
+			return ret;
 	}
 
 	return 0;
-err:
-	if (dev->dev.parent)
-		pm_runtime_put(dev->dev.parent);
-
-	return ret;
 }
 
 void ethnl_ops_complete(struct net_device *dev)
 {
 	if (dev->ethtool_ops->complete)
 		dev->ethtool_ops->complete(dev);
-
-	if (dev->dev.parent)
-		pm_runtime_put(dev->dev.parent);
 }
 
 /**
@@ -602,6 +586,14 @@  static int ethnl_default_set_doit(struct sk_buff *skb, struct genl_info *info)
 			goto out_dev;
 	}
 
+	if (!req_info.dev) {
+		ret = -ENODEV;
+		goto out_dev;
+	}
+
+	if (req_info.dev->dev.parent)
+		pm_runtime_get_sync(req_info.dev->dev.parent);
+
 	rtnl_lock();
 	ret = ethnl_ops_begin(req_info.dev);
 	if (ret < 0)
@@ -617,6 +609,8 @@  static int ethnl_default_set_doit(struct sk_buff *skb, struct genl_info *info)
 	ethnl_ops_complete(req_info.dev);
 out_rtnl:
 	rtnl_unlock();
+	if (req_info.dev->dev.parent)
+		pm_runtime_put(req_info.dev->dev.parent);
 out_dev:
 	ethnl_parse_header_dev_put(&req_info);
 	return ret;