mbox series

[RFC,net-next,0/4] net: page_pool: a couple assorted optimizations

Message ID 20230629152305.905962-1-aleksander.lobakin@intel.com (mailing list archive)
Headers show
Series net: page_pool: a couple assorted optimizations | expand

Message

Alexander Lobakin June 29, 2023, 3:23 p.m. UTC
Here's spin off the IAVF PP series[0], with 2 runtime (hotpath) and 1
compile-time optimizations. They're based and tested on top of the
hybrid PP allocation series[1], but don't require it to work and
in general independent of it and each other.

Per-patch breakdown:
 #1: already was on the lists, but this time it's done the other way, the
     one that Alex Duyck proposed during the review of the previous series.
     Slightly reduce amount of C preprocessing by stopping including
     <net/page_pool.h> to <linux/skbuff.h> (which is included in the
     half of the kernel sources). Especially useful with the abovementioned
     series applied, as it makes page_pool.h heavier;
 #2: don't call to DMA sync externals when they won't do anything anyway
     by doing some heuristics a bit earlier (when allocating a new page),
     also was on the lists;
 #3: new, prereq to #4. Add NAPI state flag, which would indicate
     napi->poll() is running right now, so that napi->list_owner would
     point to the CPU where it's being run, not just scheduled;
 #4: new. In addition to recycling skb PP pages directly when @napi_safe
     is set, check for the flag from #3, which will mean the same if
     ->list_owner is pointing to us. This allows to use direct recycling
     anytime we're inside a NAPI polling loop or GRO stuff going right
     after it, covering way more cases than is right now.

(complete tree with [1] + this + [0] is available here: [2])

[0] https://lore.kernel.org/netdev/20230530150035.1943669-1-aleksander.lobakin@intel.com
[1] https://lore.kernel.org/netdev/20230629120226.14854-1-linyunsheng@huawei.com
[2] https://github.com/alobakin/linux/commits/iavf-pp-frag

Alexander Lobakin (4):
  net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h>
  net: page_pool: avoid calling no-op externals when possible
  net: add flag to indicate NAPI/GRO is running right now
  net: skbuff: always recycle PP pages directly when inside a NAPI loop

 drivers/net/ethernet/engleder/tsnep_main.c    |  1 +
 drivers/net/ethernet/freescale/fec_main.c     |  1 +
 .../marvell/octeontx2/nic/otx2_common.c       |  1 +
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  1 +
 .../ethernet/mellanox/mlx5/core/en/params.c   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  1 +
 drivers/net/wireless/mediatek/mt76/mt76.h     |  1 +
 include/linux/netdevice.h                     |  2 +
 include/linux/skbuff.h                        |  3 +-
 include/net/page_pool.h                       |  5 +-
 net/core/dev.c                                | 23 +++++--
 net/core/page_pool.c                          | 62 +++++++------------
 net/core/skbuff.c                             | 29 +++++++++
 13 files changed, 83 insertions(+), 48 deletions(-)

---
Really curious about #3. Implementing the idea correctly (this or other
way) potentially unblocks a lot more interesting stuff (besides #4).

Comments

Jakub Kicinski July 2, 2023, 12:01 a.m. UTC | #1
On Thu, 29 Jun 2023 17:23:01 +0200 Alexander Lobakin wrote:
>  #3: new, prereq to #4. Add NAPI state flag, which would indicate
>      napi->poll() is running right now, so that napi->list_owner would
>      point to the CPU where it's being run, not just scheduled;
>  #4: new. In addition to recycling skb PP pages directly when @napi_safe
>      is set, check for the flag from #3, which will mean the same if
>      ->list_owner is pointing to us. This allows to use direct recycling  
>      anytime we're inside a NAPI polling loop or GRO stuff going right
>      after it, covering way more cases than is right now.

You know NAPI pretty well so I'm worried I'm missing something.
I don't think the new flag adds any value. NAPI does not have to 
be running, you can drop patch 3 and use in_softirq() instead of 
the new flag, AFAIU.

The reason I did not do that is that I wasn't sure if there is no
weird (netcons?) case where skb gets freed from an IRQ :(
Alexander Lobakin July 3, 2023, 1:50 p.m. UTC | #2
From: Jakub Kicinski <kuba@kernel.org>
Date: Sat, 1 Jul 2023 17:01:55 -0700

> On Thu, 29 Jun 2023 17:23:01 +0200 Alexander Lobakin wrote:
>>  #3: new, prereq to #4. Add NAPI state flag, which would indicate
>>      napi->poll() is running right now, so that napi->list_owner would
>>      point to the CPU where it's being run, not just scheduled;
>>  #4: new. In addition to recycling skb PP pages directly when @napi_safe
>>      is set, check for the flag from #3, which will mean the same if
>>      ->list_owner is pointing to us. This allows to use direct recycling  
>>      anytime we're inside a NAPI polling loop or GRO stuff going right
>>      after it, covering way more cases than is right now.
> 
> You know NAPI pretty well so I'm worried I'm missing something.

I wouldn't say I know it well :D

> I don't think the new flag adds any value. NAPI does not have to 
> be running, you can drop patch 3 and use in_softirq() instead of 
> the new flag, AFAIU.

That's most likely true for the patch 4 case, but I wanted to add some
flag for wider usage.
For example, busy polling relies on whether ->poll() returned whole
budget to decide whether interrupts were reenabled to avoid possible
concurrent access, but I wouldn't say it's precise enough.
napi_complete_done() doesn't always return true.
OTOH, the new flag or, more precisely, flag + list_owner combo would
tell for sure.

> 
> The reason I did not do that is that I wasn't sure if there is no
> weird (netcons?) case where skb gets freed from an IRQ :(

Shouldn't they use dev_kfree_skb_any() or _irq()? Usage of plain
kfree_skb() is not allowed in the TH :s

Anyway, if the flag really makes no sense, I can replace it with
in_softirq(), it's my hobby to break weird drivers :D

Thanks,
Olek
Jakub Kicinski July 3, 2023, 6:57 p.m. UTC | #3
On Mon, 3 Jul 2023 15:50:55 +0200 Alexander Lobakin wrote:
> > The reason I did not do that is that I wasn't sure if there is no
> > weird (netcons?) case where skb gets freed from an IRQ :(  
> 
> Shouldn't they use dev_kfree_skb_any() or _irq()? Usage of plain
> kfree_skb() is not allowed in the TH :s

I haven't looked at the code so I could be lying but I thought that 
the only thing that can't run in hard IRQ context is the destructor,
so if the caller knows there's no destructor they can free the skb.

I'd ask you the inverse question. If the main use case is skb xdp
(which eh, uh, okay..) then why not make it use napi_consume_skb()?
I don't think skb XDP can run in hard IRQ context, can it?

> Anyway, if the flag really makes no sense, I can replace it with
> in_softirq(), it's my hobby to break weird drivers :D
Alexander Lobakin July 5, 2023, 12:31 p.m. UTC | #4
From: Jakub Kicinski <kuba@kernel.org>
Date: Mon, 3 Jul 2023 11:57:34 -0700

> On Mon, 3 Jul 2023 15:50:55 +0200 Alexander Lobakin wrote:
>>> The reason I did not do that is that I wasn't sure if there is no
>>> weird (netcons?) case where skb gets freed from an IRQ :(  
>>
>> Shouldn't they use dev_kfree_skb_any() or _irq()? Usage of plain
>> kfree_skb() is not allowed in the TH :s
> 
> I haven't looked at the code so I could be lying but I thought that 
> the only thing that can't run in hard IRQ context is the destructor,
> so if the caller knows there's no destructor they can free the skb.
> 
> I'd ask you the inverse question. If the main use case is skb xdp
> (which eh, uh, okay..) then why not make it use napi_consume_skb()?

Remember about Wi-Fi, DSA, and other poor citizens with no native XDP! :D
That was mostly a joke, but I thought of this, too. At the end my
thought was "let's try making it cover more usecases" and I found this
approach. I'm not saying it's optimal or even much needed, that's why I
sent it to discuss basically.

(e.g. I wanted to try speed up xdp_return_frame{,_bulk}() using it)

> I don't think skb XDP can run in hard IRQ context, can it?

skb XDP can't happen in the TH and I think we could assume it's safe to
use napi_consume_skb() there (with a fake non-zero budget :p).

> 
>> Anyway, if the flag really makes no sense, I can replace it with
>> in_softirq(), it's my hobby to break weird drivers :D

Thanks,
Olek