[v4,net-next,0/4] net: Provide SMP threads for backlog NAPI

Message ID 20240305120002.1499223-1-bigeasy@linutronix.de (mailing list archive)

Message

Sebastian Andrzej Siewior March 5, 2024, 11:53 a.m. UTC
The RPS code and "deferred skb free" both send an IPI/function call
to a remote CPU, where a softirq is then raised. This leads to a warning on
PREEMPT_RT because raising softirqs from a function call led to undesired
behaviour in the past. I had duct tape in RT for the "deferred skb free"
case and Wander Lairson Costa reported the RPS case.

This series only provides support for SMP threads for backlog NAPI. To
avoid confusion, I did not attach the patch that makes it the default and
removes the IPI-related code. I can post it for reference if asked.

The Red Hat performance team was kind enough to provide some testing here.
The series (with the IPI code removed) has been tested and no regression
compared to a kernel without the series has been found. For testing, iperf3
was used on a 25G interface, provided by the mlx5, i40e or ice driver, with
RPS enabled. I can provide the individual test results if needed.
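
For anyone reproducing a similar setup: RPS is enabled per receive queue by
writing a hexadecimal CPU bitmap to
/sys/class/net/<dev>/queues/rx-<n>/rps_cpus. The sketch below only builds
that bitmap string. The helper name rps_mask_hex is hypothetical, and it is
limited to CPUs 0 to 31 so that the comma-separated multi-word format never
arises. Which CPUs and queues were used in the tests above is not stated
here, so the values in the usage note are purely illustrative.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the series): format a list of CPU ids
 * as the hex bitmap string that the rps_cpus sysfs file accepts.
 * Assumption: only CPUs 0..31 are handled.
 * Returns 0 on success, -1 on an out-of-range CPU or a too-small buffer.
 */
static int rps_mask_hex(const int *cpus, size_t n, char *buf, size_t len)
{
    uint32_t mask = 0;
    int written;

    assert(cpus != NULL && buf != NULL && len > 0);

    for (size_t i = 0; i < n; i++) {
        if (cpus[i] < 0 || cpus[i] > 31)
            return -1;
        /* Set the bit that corresponds to this CPU. */
        mask |= UINT32_C(1) << cpus[i];
    }

    written = snprintf(buf, len, "%" PRIx32, mask);
    if (written < 0 || (size_t)written >= len)
        return -1;
    return 0;
}
```

For example, passing CPUs 0, 1, 2 and 3 produces the string "f", which would
then be written (as root) to the rps_cpus file of the receive queue in
question.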

Changes:
- v3…v4 https://lore.kernel.org/all/20240228121000.526645-1-bigeasy@linutronix.de/

  - Rebase on top of current net-next, collect Acks.

  - Add struct softnet_data as an argument to kick_defer_list_purge().

  - Add sd_has_rps_ipi_waiting() check to napi_threaded_poll_loop() which was
    accidentally removed.

- v2…v3 https://lore.kernel.org/all/20240221172032.78737-1-bigeasy@linutronix.de/

  - Move the "if use_backlog_threads()" case into the CONFIG_RPS block
    within napi_schedule_rps().

  - Use __napi_schedule_irqoff() instead of napi_schedule_rps() in
    kick_defer_list_purge().

- v1…v2 https://lore.kernel.org/all/20230929162121.1822900-1-bigeasy@linutronix.de/

  - Patch #1 is new. It ensures that NAPI_STATE_SCHED_THREADED is always
    set (instead of conditionally, based on task state) and the smpboot
    thread logic relies on this bit now. In v1 NAPI_STATE_SCHED was used
    but is racy.

  - The defer list clean up is split out and also relies on
    NAPI_STATE_SCHED_THREADED. This fixes a different race.

- RFC…v1 https://lore.kernel.org/all/20230814093528.117342-1-bigeasy@linutronix.de/

   - Patch #2 has been removed. Removing the warning is still an option.

   - There are two patches in the series:
     - Patch #1 always creates backlog threads
     - Patch #2 creates the backlog threads if requested at boot time,
       mandatory on PREEMPT_RT.
     So it is either/or, and I wanted to show what both look like.

   - The kernel test robot reported a performance regression with
     loopback (stress-ng --udp X --udp-ops Y) against the RFC version.
     The regression is now avoided by using local-NAPI if backlog
     processing is requested on the local CPU.

Sebastian

Comments

Wander Lairson Costa March 5, 2024, 12:07 p.m. UTC | #1
Patch 0002 does not apply for me. I tried torvalds/master and
linux-rt-devel/linux-6.8.y-rt. Which tree should I use?
Denis Kirjanov March 5, 2024, 12:17 p.m. UTC | #2
> Patch 0002 does not apply for me. I tried torvalds/master and
> linux-rt-devel/linux-6.8.y-rt. Which tree should I use?

git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git

Sebastian Andrzej Siewior March 8, 2024, 3:33 p.m. UTC | #3
The v4 is marked as "Changes Requested". Is there anything for me to do?
I've been asked to rebase v3 on top of net-next which I did with v4. It
still applies onto net-next as of today.
 
Sebastian
Jakub Kicinski March 9, 2024, 4:29 a.m. UTC | #4
On Fri, 8 Mar 2024 16:33:02 +0100 Sebastian Andrzej Siewior wrote:
> The v4 is marked as "Changes Requested". Is there anything for me to do?
> I've been asked to rebase v3 on top of net-next which I did with v4. It
> still applies onto net-next as of today.

Hm, I tried to apply and it doesn't, sure you fetched?
Big set of changes from Eric got applied last night.
Sebastian Andrzej Siewior March 9, 2024, 9:09 a.m. UTC | #5
On 2024-03-08 20:29:54 [-0800], Jakub Kicinski wrote:
> On Fri, 8 Mar 2024 16:33:02 +0100 Sebastian Andrzej Siewior wrote:
> > The v4 is marked as "Changes Requested". Is there anything for me to do?
> > I've been asked to rebase v3 on top of net-next which I did with v4. It
> > still applies onto net-next as of today.
> 
> Hm, I tried to apply and it doesn't, sure you fetched?
> Big set of changes from Eric got applied last night.

So git merge did fine but importing the patches individually failed due to
recent changes. I have now rebased it on top of
   d7e14e5344933 ("Merge tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux")

and reposted it as 20240309090824.2956805-1-bigeasy@linutronix.de.

Sebastian