diff mbox series

[RFC,iwl-net] e1000: Hold RTNL when e1000_down can be called

Message ID 20241022172153.217890-1-jdamato@fastly.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,iwl-net] e1000: Hold RTNL when e1000_down can be called | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes fail Problems with Fixes tag: 1
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: Unknown commit id '8f7ff18a5ec7', maybe rebased or not pulled?
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 30 this patch: 30
netdev/source_inline success Was 0 now: 0

Commit Message

Joe Damato Oct. 22, 2024, 5:21 p.m. UTC
e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.

There are a few paths for e1000_down to be called in e1000 where RTNL is
not currently being held:
  - e1000_shutdown (pci shutdown)
  - e1000_suspend (power management)
  - e1000_reinit_locked (via e1000_reset_task delayed work)

Hold RTNL in two places to fix this issue:
  - e1000_reset_task
  - __e1000_shutdown (which is called from both e1000_shutdown and
    e1000_suspend).

The other paths which call e1000_down seemingly hold RTNL and are OK:
  - e1000_close (ndo_stop)
  - e1000_change_mtu (ndo_change_mtu)

I'm submitting this is as an RFC because:
  - the e1000_reinit_locked issue appears very similar to commit
    21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
    fixes a similar issue in e1000e

however

  - adding rtnl to e1000_reinit_locked seemingly conflicts with an
    earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
    e1000_reset_task").

Hopefully Intel can weigh in and shed some light on the correct way to
go.

Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs")
Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++++
 1 file changed, 4 insertions(+)


base-commit: d811ac148f0afd2f3f7e1cd7f54de8da973ec5e3

Comments

Joe Damato Oct. 22, 2024, 8 p.m. UTC | #1
On Tue, Oct 22, 2024 at 05:21:53PM +0000, Joe Damato wrote:
> e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
> 
> There are a few paths for e1000_down to be called in e1000 where RTNL is
> not currently being held:
>   - e1000_shutdown (pci shutdown)
>   - e1000_suspend (power management)
>   - e1000_reinit_locked (via e1000_reset_task delayed work)
> 
> Hold RTNL in two places to fix this issue:
>   - e1000_reset_task
>   - __e1000_shutdown (which is called from both e1000_shutdown and
>     e1000_suspend).

It looks like there's one other spot I missed:

e1000_io_error_detected (pci error handler) which should also hold
rtnl_lock:

+       if (netif_running(netdev)) {
+               rtnl_lock();
                e1000_down(adapter);
+               rtnl_unlock();
+       }

I can send that update in the v2, but I'll wait to see if Intel has suggestions
on the below.
 
> The other paths which call e1000_down seemingly hold RTNL and are OK:
>   - e1000_close (ndo_stop)
>   - e1000_change_mtu (ndo_change_mtu)
> 
> I'm submitting this is as an RFC because:
>   - the e1000_reinit_locked issue appears very similar to commit
>     21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
>     fixes a similar issue in e1000e
> 
> however
> 
>   - adding rtnl to e1000_reinit_locked seemingly conflicts with an
>     earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
>     e1000_reset_task").
> 
> Hopefully Intel can weigh in and shed some light on the correct way to
> go.
> 
> Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs")
> Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
> Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
> Signed-off-by: Joe Damato <jdamato@fastly.com>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 4de9b156b2be..9ed99c75d59e 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3509,7 +3509,9 @@  static void e1000_reset_task(struct work_struct *work)
 		container_of(work, struct e1000_adapter, reset_task);
 
 	e_err(drv, "Reset adapter\n");
+	rtnl_lock();
 	e1000_reinit_locked(adapter);
+	rtnl_unlock();
 }
 
 /**
@@ -5074,7 +5076,9 @@  static int __e1000_shutdown(struct pci_dev *pdev, bool *enable_wake)
 			usleep_range(10000, 20000);
 
 		WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags));
+		rtnl_lock();
 		e1000_down(adapter);
+		rtnl_unlock();
 	}
 
 	status = er32(STATUS);