
[bpf-next,v2,0/4] XDP bonding support

Message ID: 20210624091843.5151-1-joamaki@gmail.com

Message

Jussi Maki June 24, 2021, 9:18 a.m. UTC
From: Jussi Maki <joamaki@gmail.com>

This patchset introduces XDP support to the bonding driver.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP, and also to transparently support bond devices for projects that
use XDP, given that most modern NICs have dual port adapters. An
alternative to this approach would be to implement 802.3ad in user
space and do the bonding load-balancing in the XDP program itself, but
this is a rather cumbersome endeavor in terms of slave device
management (e.g. by watching netlink) and requires the orchestrator to
ship separate programs for the native and bond cases. A native
in-kernel implementation overcomes these issues and provides more
flexibility.

Below are benchmark results from two machines, each with a 100Gbit
Intel E810 (ice) NIC, with a 32-core 3970X on the sending machine and
a 16-core 3950X on the receiving machine. 64-byte packets were sent
with pktgen-dpdk at full rate. Two issues [2, 3] were identified with
the ice driver, so the tests were performed with iommu=off and patch
[2] applied. Additionally, the bonding round-robin algorithm was
modified to use per-CPU tx counters, as the shared rr_tx_counter
caused high CPU load (50% vs 10%) and a high rate of cache misses; a
fix for this has already been merged into net-next (a rough sketch of
the per-CPU pattern is included after the results below). The
statistics were collected using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
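
For context on the round-robin change mentioned above, the difference
between the shared and the per-CPU counter is roughly the following
(an illustrative sketch only, not the exact patch that was merged into
net-next):

#include <linux/atomic.h>
#include <linux/percpu.h>

/* Before: a single shared counter, so every transmitting CPU bounces
 * the same cache line. */
static atomic_t rr_tx_counter;

static u32 pick_slave_shared(u32 slave_cnt)
{
	return atomic_inc_return(&rr_tx_counter) % slave_cnt;
}

/* After: a per-CPU counter keeps the hot increment CPU-local; a small
 * per-CPU skew in slave selection is fine for round robin. */
static u32 __percpu *rr_tx_counter_pcpu;	/* alloc_percpu(u32) at init */

static u32 pick_slave_percpu(u32 slave_cnt)
{
	return this_cpu_inc_return(*rr_tx_counter_pcpu) % slave_cnt;
}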

Patch 1 prepares bond_xmit_hash for hashing xdp_buffs.
Patch 2 adds hooks to implement redirection after the bpf prog run.
Patch 3 implements the hooks in the bonding driver.
Patch 4 modifies devmap to properly handle EXCLUDE_INGRESS with a slave device.
A short usage sketch for patches 3 and 4 follows below.
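
As a rough illustration of how patches 3 and 4 are meant to be used
together, below is a minimal XDP program that could be attached to a
bond device once this series is applied: it broadcasts frames to a
devmap while excluding the ingress device. The map name, map size and
program name are made up for the example; nothing in the program
itself is bond-specific.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Devmap holding the candidate egress devices (filled from user space
 * with ifindexes). */
struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(max_entries, 8);
	__type(key, __u32);
	__type(value, __u32);
} tx_devs SEC(".maps");

SEC("xdp")
int xdp_bond_broadcast(struct xdp_md *ctx)
{
	/* Broadcast to every device in the map except the one the
	 * frame arrived on. Patch 4 makes EXCLUDE_INGRESS behave
	 * properly when the ingress device is a bond slave. */
	return bpf_redirect_map(&tx_devs, 0,
				BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
}

char _license[] SEC("license") = "GPL";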

v1->v2:
- Split up into smaller, easier-to-review patches and address cosmetic
  review comments.
- Drop the INDIRECT_CALL optimization as it showed little improvement in tests.
- Drop the rr_tx_counter patch as that has already been merged into net-next.
- Separate the test suite into another patch set. This will follow later once
  a patch set from Magnus Karlsson is merged, as it provides test utilities
  that can be reused for the XDP bonding tests. v2 contains no major
  functional changes and was tested with the test suite included in v1.
  (https://lore.kernel.org/bpf/202106221509.kwNvAAZg-lkp@intel.com/T/#m464146d47299125d5868a08affd6d6ce526dfad1)

---

Jussi Maki (4):
  net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  net: core: Add support for XDP redirection to slave device
  net: bonding: Add XDP support to the bonding driver
  devmap: Exclude XDP broadcast to master device

 drivers/net/bonding/bond_main.c | 431 +++++++++++++++++++++++++++-----
 include/linux/filter.h          |  13 +-
 include/linux/netdevice.h       |   5 +
 include/net/bonding.h           |   1 +
 kernel/bpf/devmap.c             |  34 ++-
 net/core/filter.c               |  25 ++
 6 files changed, 445 insertions(+), 64 deletions(-)

Comments

Jay Vosburgh July 1, 2021, 6:20 p.m. UTC | #1
joamaki@gmail.com wrote:

>[...]
> -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
> without patch (1 dev):
>   XDP_DROP:              3.15%      48.6Mpps
>   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
>   XDP_DROP (RSS):        9.47%      116.5Mpps
>   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
> -----------------------
> with patch, bond (1 dev):
>   XDP_DROP:              3.14%      46.7Mpps
>   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
>   XDP_DROP (RSS):        10.33%     117.2Mpps
>   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
> -----------------------
> with patch, bond (2 devs):
>   XDP_DROP:              6.27%      92.7Mpps
>   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
>   XDP_DROP (RSS):       11.38%      117.2Mpps
>   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
> --------------------------------------------------------------

	To be clear, the fact that the performance numbers for XDP_DROP
and XDP_TX are lower for "with patch, bond (1 dev)" than "without patch
(1 dev)" is expected, correct?

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Jussi Maki July 5, 2021, 10:32 a.m. UTC | #2
On Thu, Jul 1, 2021 at 9:20 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> joamaki@gmail.com wrote:
>
> >[...]
>
>         To be clear, the fact that the performance numbers for XDP_DROP
> and XDP_TX are lower for "with patch, bond (1 dev)" than "without patch
> (1 dev)" is expected, correct?

Yes, that is correct. With the patch, the ndo callback for choosing
the slave device is invoked, which in this test (mode=xor) hashes the
L2&L3 headers (I seem to have failed to mention this in the original
message). In round-robin mode I recall it being about 16Mpps versus
the 18Mpps without the patch. I also tried INDIRECT_CALL to avoid
going via ndo_ops, but that had no discernible effect.
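
For reference, a bond setup along the lines assumed in these tests
(interface names, the program object and the exact options here are
placeholders, not necessarily what was used for the benchmarks) could
look like:

  ip link add bond0 type bond mode balance-xor xmit_hash_policy layer2+3
  ip link set eth0 down
  ip link set eth0 master bond0
  ip link set eth1 down
  ip link set eth1 master bond0
  ip link set dev bond0 up
  ip link set dev bond0 xdp obj xdp_prog.o sec xdp

With this series the XDP program can be attached to bond0 itself
rather than having to be managed on each slave device separately.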