[net,0/4,pull,request] igb: fix igb_msix_other() handling for PREEMPT_RT

Message ID 20250204175243.810189-1-anthony.l.nguyen@intel.com (mailing list archive)

Message

Tony Nguyen Feb. 4, 2025, 5:52 p.m. UTC
Wander Lairson Costa says:

This is the second attempt at fixing the behavior of igb_msix_other()
for PREEMPT_RT. The previous attempt [1] was reverted [2] following
concerns raised by Sebastian [3].

The initial approach proposed converting vfs_lock to a raw_spinlock,
a minor change intended to make it safe. However, it became evident
that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
which is unsafe in interrupt context on PREEMPT_RT systems.

To address this, the solution involves splitting igb_msg_task()
into two parts:

    * One part invoked from the IRQ context.
    * Another part called from the threaded interrupt handler.
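
As a rough illustration of the split described above, here is a userspace sketch in plain C. It is not the driver code: the event bits, struct vf_state, and all function names are hypothetical, and calloc() merely stands in for kcalloc(). The only point is that the first half does bounded, allocation-free bookkeeping, while the work that needs memory is deferred to the second half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_VFS 4

/* Hypothetical event bits; the real driver reads its state from mailbox registers. */
#define VF_EVT_ACK (1u << 0)   /* cheap to handle, no allocation */
#define VF_EVT_MSG (1u << 1)   /* handling needs a temporary buffer */

struct vf_state {
    uint32_t pending;     /* events raised by the "hardware" */
    uint32_t deferred;    /* events handed over to the threaded part */
    int acks_seen;
    int msgs_handled;
};

/*
 * Models the part invoked from IRQ context: bounded loop, no allocation,
 * only bookkeeping. Returns true if the threaded part has work to do.
 */
static bool msg_task_irq_part(struct vf_state *vfs, int n)
{
    bool wake_thread = false;

    for (int i = 0; i < n; i++) {
        uint32_t evt = vfs[i].pending;

        vfs[i].pending = 0;
        if (evt & VF_EVT_ACK)
            vfs[i].acks_seen++;
        if (evt & VF_EVT_MSG) {
            vfs[i].deferred |= VF_EVT_MSG;
            wake_thread = true;
        }
    }
    return wake_thread;
}

/*
 * Models the part called from the threaded handler, where allocating is
 * acceptable. Returns the number of messages handled.
 */
static int msg_task_thread_part(struct vf_state *vfs, int n)
{
    int handled = 0;

    for (int i = 0; i < n; i++) {
        uint32_t *buf;

        if (!(vfs[i].deferred & VF_EVT_MSG))
            continue;
        vfs[i].deferred &= ~VF_EVT_MSG;

        buf = calloc(16, sizeof(*buf));   /* stand-in for kcalloc() */
        if (!buf)
            continue;                     /* drop the message on failure */
        buf[0] = (uint32_t)i;             /* pretend to parse the message */
        free(buf);

        vfs[i].msgs_handled++;
        handled++;
    }
    return handled;
}

/* Drives both halves once; returns the number of messages handled. */
int split_demo(void)
{
    struct vf_state vfs[NUM_VFS] = {0};

    vfs[0].pending = VF_EVT_ACK;
    vfs[1].pending = VF_EVT_MSG;
    vfs[3].pending = VF_EVT_ACK | VF_EVT_MSG;

    if (!msg_task_irq_part(vfs, NUM_VFS))
        return -1;
    assert(vfs[0].acks_seen == 1 && vfs[3].acks_seen == 1);
    return msg_task_thread_part(vfs, NUM_VFS);
}
```

Calling split_demo() runs the IRQ-side half first and the threaded half second. Which events the real patches handle in which half is not modelled here.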

To accommodate this, vfs_lock has been restructured into a double
lock: a spinlock_t and a raw_spinlock_t. In the revised design:

    * igb_disable_sriov() locks both spinlocks.
    * Each part of igb_msg_task() locks the appropriate spinlock for
    its execution context.
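
The locking rules in the list above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the driver's implementation: toy_lock is a made-up atomic_flag lock that only models mutual exclusion, and every function name is hypothetical. The teardown path takes the lock that may sleep on PREEMPT_RT first and the raw lock innermost, since a sleeping lock must not be acquired while a raw spinlock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock built on atomic_flag; it only models mutual exclusion. */
struct toy_lock { atomic_flag held; };

static void toy_lock_init(struct toy_lock *l)
{
    atomic_flag_clear(&l->held);
}

static bool toy_trylock(struct toy_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void toy_lock_acquire(struct toy_lock *l)
{
    while (!toy_trylock(l))
        ;   /* spin until the holder releases the lock */
}

static void toy_unlock(struct toy_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct toy_adapter {
    struct toy_lock vfs_lock;      /* models the spinlock_t */
    struct toy_lock vfs_raw_lock;  /* models the raw_spinlock_t */
    int num_vfs;
};

/* Teardown takes both locks: vfs_lock outermost, the raw lock innermost. */
static void disable_sriov_begin(struct toy_adapter *a)
{
    toy_lock_acquire(&a->vfs_lock);
    toy_lock_acquire(&a->vfs_raw_lock);
}

static void disable_sriov_end(struct toy_adapter *a)
{
    a->num_vfs = 0;
    toy_unlock(&a->vfs_raw_lock);
    toy_unlock(&a->vfs_lock);
}

/*
 * Each half takes only the lock for its own context. trylock is used so
 * that a single-threaded demo can observe exclusion without deadlocking.
 */
static bool irq_part_try(struct toy_adapter *a)
{
    if (!toy_trylock(&a->vfs_raw_lock))
        return false;
    /* bounded, allocation-free work on VF state would go here */
    toy_unlock(&a->vfs_raw_lock);
    return true;
}

static bool thread_part_try(struct toy_adapter *a)
{
    if (!toy_trylock(&a->vfs_lock))
        return false;
    /* work that may allocate would go here */
    toy_unlock(&a->vfs_lock);
    return true;
}

/* Returns how many of the two halves were kept out during teardown. */
int double_lock_demo(void)
{
    struct toy_adapter a;
    int blocked = 0;
    bool both_run_after;

    toy_lock_init(&a.vfs_lock);
    toy_lock_init(&a.vfs_raw_lock);
    a.num_vfs = 2;

    disable_sriov_begin(&a);
    blocked += !irq_part_try(&a);
    blocked += !thread_part_try(&a);
    disable_sriov_end(&a);

    both_run_after = irq_part_try(&a) && thread_part_try(&a);
    assert(both_run_after);
    (void)both_run_after;
    return blocked;
}
```

While the teardown path holds both locks, neither half can enter; once it releases them, both can run again.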

It is worth noting that the double lock mechanism is only active under
PREEMPT_RT. For non-PREEMPT_RT builds, the additional raw_spinlock_t
field is omitted.

If the extra raw_spinlock_t field can be tolerated under
!PREEMPT_RT (even though it remains unused), we can eliminate the
need for #ifdefs and simplify the code structure.
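
To make the trade-off concrete, here is a small userspace sketch of the two layouts. The toy lock types, struct names, and wrapper macros are hypothetical stand-ins (real kernel lock types have different, configuration-dependent sizes). Variant A needs conditional compilation for both the field and its accessors; variant B pays for one unused field on non-RT builds and needs no #ifdef.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; real kernel lock types differ and depend on the config. */
struct toy_spinlock { int locked; };
struct toy_raw_spinlock { int locked; };

/* Variant A: the raw lock exists only when CONFIG_PREEMPT_RT is defined. */
struct adapter_ifdef {
    struct toy_spinlock vfs_lock;
#ifdef CONFIG_PREEMPT_RT
    struct toy_raw_spinlock vfs_raw_lock;
#endif
};

/* Hypothetical wrappers: they compile to nothing without CONFIG_PREEMPT_RT. */
#ifdef CONFIG_PREEMPT_RT
#define vfs_raw_lock_take(a)  ((a)->vfs_raw_lock.locked = 1)
#define vfs_raw_lock_drop(a)  ((a)->vfs_raw_lock.locked = 0)
#else
#define vfs_raw_lock_take(a)  ((void)(a))
#define vfs_raw_lock_drop(a)  ((void)(a))
#endif

/* Variant B: the field is always present, so no #ifdef is needed. */
struct adapter_plain {
    struct toy_spinlock vfs_lock;
    struct toy_raw_spinlock vfs_raw_lock;
};

/* Bytes that variant B adds per adapter compared with variant A. */
size_t extra_bytes_without_ifdef(void)
{
    assert(sizeof(struct adapter_plain) >= sizeof(struct adapter_ifdef));
    return sizeof(struct adapter_plain) - sizeof(struct adapter_ifdef);
}
```

Built without CONFIG_PREEMPT_RT defined, extra_bytes_without_ifdef() reports the size of the one unused toy field, which is the cost of dropping the #ifdefs in this model.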

[1] https://lore.kernel.org/all/20240920185918.616302-2-wander@redhat.com/
[2] https://lore.kernel.org/all/20241104124050.22290-1-wander@redhat.com/
[3] https://lore.kernel.org/all/20241104110708.gFyxRFlC@linutronix.de/
---
IWL: https://lore.kernel.org/intel-wired-lan/20241204114229.21452-1-wander@redhat.com/

The following are changes since commit 4241a702e0d0c2ca9364cfac08dbf134264962de:
  rxrpc: Fix the rxrpc_connection attend queue handling
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE

Wander Lairson Costa (4):
  igb: narrow scope of vfs_lock in SR-IOV cleanup
  igb: introduce raw vfs_lock to igb_adapter
  igb: split igb_msg_task()
  igb: fix igb_msix_other() handling for PREEMPT_RT

 drivers/net/ethernet/intel/igb/igb.h      |   4 +
 drivers/net/ethernet/intel/igb/igb_main.c | 160 +++++++++++++++++++---
 2 files changed, 148 insertions(+), 16 deletions(-)

Comments

Sebastian Andrzej Siewior Feb. 5, 2025, 9:48 a.m. UTC | #1
On 2025-02-04 09:52:36 [-0800], Tony Nguyen wrote:
> Wander Lairson Costa says:
> 
> This is the second attempt at fixing the behavior of igb_msix_other()
> for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> concerns raised by Sebastian [3].

I still prefer a solution where we don't have the ifdef in the driver. I
was presented with two traces but I didn't get why it works in one case
but not in the other. Maybe it was too obvious.
In the meantime:

igb_msg_task_irq_safe()
-> vfs_raw_spin_lock_irqsave() // raw_spinlock_t
-> igb_vf_reset_event()
  -> igb_vf_reset()
    -> igb_set_rx_mode()
      -> igb_write_mc_addr_list()
         -> mta_list = kcalloc(netdev_mc_count(netdev), 6, GFP_ATOMIC); // kaboom?

By explicitly disabling preemption or using a raw_spinlock_t you need to
pay attention not to do anything that might lead to unbounded loops
(like iterating over many lists, polling on a bit for ages, …) and you
need to make sure that the whole API underneath is not doing anything
that it is not allowed to do.

Sebastian