Message ID | 20230621170244.1283336-7-sdf@google.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Headers | show |
Series | bpf: Netdev TX metadata | expand |
Stanislav Fomichev <sdf@google.com> writes: > Have a software-based example for kfuncs to showcase how it > can be used in the real devices and to have something to > test against in the selftests. > > Both path (skb & xdp) are covered. Only the skb path is really > tested though. > > Cc: netdev@vger.kernel.org > Signed-off-by: Stanislav Fomichev <sdf@google.com> Not really related to this patch, but to how it would work with different drivers/hardware. In some of our hardware (the ones handled by igc/igb, for example), the timestamp notification comes some time after the transmit completion event. From what I could gather, the idea would be for the driver to "hold" the completion until the timestamp is ready and then signal the completion of the frame. Is that right? Cheers,
On Fri, Jun 23, 2023 at 4:29 PM Vinicius Costa Gomes <vinicius.gomes@intel.com> wrote: > > Stanislav Fomichev <sdf@google.com> writes: > > > Have a software-based example for kfuncs to showcase how it > > can be used in the real devices and to have something to > > test against in the selftests. > > > > Both path (skb & xdp) are covered. Only the skb path is really > > tested though. > > > > Cc: netdev@vger.kernel.org > > Signed-off-by: Stanislav Fomichev <sdf@google.com> > > Not really related to this patch, but to how it would work with > different drivers/hardware. > > In some of our hardware (the ones handled by igc/igb, for example), the > timestamp notification comes some time after the transmit completion > event. > > From what I could gather, the idea would be for the driver to "hold" the > completion until the timestamp is ready and then signal the completion > of the frame. Is that right? Yeah, that might be the option. Do you think it could work?
Stanislav Fomichev <sdf@google.com> writes: > On Fri, Jun 23, 2023 at 4:29 PM Vinicius Costa Gomes > <vinicius.gomes@intel.com> wrote: >> >> Stanislav Fomichev <sdf@google.com> writes: >> >> > Have a software-based example for kfuncs to showcase how it >> > can be used in the real devices and to have something to >> > test against in the selftests. >> > >> > Both path (skb & xdp) are covered. Only the skb path is really >> > tested though. >> > >> > Cc: netdev@vger.kernel.org >> > Signed-off-by: Stanislav Fomichev <sdf@google.com> >> >> Not really related to this patch, but to how it would work with >> different drivers/hardware. >> >> In some of our hardware (the ones handled by igc/igb, for example), the >> timestamp notification comes some time after the transmit completion >> event. >> >> From what I could gather, the idea would be for the driver to "hold" the >> completion until the timestamp is ready and then signal the completion >> of the frame. Is that right? > > Yeah, that might be the option. Do you think it could work? > For the skb and XDP cases, yeah, just holding the completion for a while seems like it's going to work. XDP ZC looks more complicated to me, not sure if it's only a matter of adding something like: void xsk_tx_completed_one(struct xsk_buff_pool *pool, struct xdp_buff *xdp); Or if more changes would be needed. I am trying to think about the case that the user sent a single "timestamp" packet among a bunch of "non-timestamp" packets. Cheers,
On Mon, Jun 26, 2023 at 3:00 PM Vinicius Costa Gomes <vinicius.gomes@intel.com> wrote: > > Stanislav Fomichev <sdf@google.com> writes: > > > On Fri, Jun 23, 2023 at 4:29 PM Vinicius Costa Gomes > > <vinicius.gomes@intel.com> wrote: > >> > >> Stanislav Fomichev <sdf@google.com> writes: > >> > >> > Have a software-based example for kfuncs to showcase how it > >> > can be used in the real devices and to have something to > >> > test against in the selftests. > >> > > >> > Both path (skb & xdp) are covered. Only the skb path is really > >> > tested though. > >> > > >> > Cc: netdev@vger.kernel.org > >> > Signed-off-by: Stanislav Fomichev <sdf@google.com> > >> > >> Not really related to this patch, but to how it would work with > >> different drivers/hardware. > >> > >> In some of our hardware (the ones handled by igc/igb, for example), the > >> timestamp notification comes some time after the transmit completion > >> event. > >> > >> From what I could gather, the idea would be for the driver to "hold" the > >> completion until the timestamp is ready and then signal the completion > >> of the frame. Is that right? > > > > Yeah, that might be the option. Do you think it could work? > > > > For the skb and XDP cases, yeah, just holding the completion for a while > seems like it's going to work. > > XDP ZC looks more complicated to me, not sure if it's only a matter of > adding something like: [..] > void xsk_tx_completed_one(struct xsk_buff_pool *pool, struct xdp_buff *xdp); > > Or if more changes would be needed. I am trying to think about the case > that the user sent a single "timestamp" packet among a bunch of > "non-timestamp" packets. Since you're passing xdp_buff as an argument I'm assuming that is suggesting out-of-order completions? The completion queue is a single index, we can't do ooo stuff. So you'd have to hold a bunch of packets until you receive the timestamp completion; after this event, you can complete the whole batch (1 packet waiting for the timestamp + a bunch that have been transmitted afterwards but were still unacknowleged in the queue). (lmk if I've misinterpreted)
Stanislav Fomichev <sdf@google.com> writes: > On Mon, Jun 26, 2023 at 3:00 PM Vinicius Costa Gomes > <vinicius.gomes@intel.com> wrote: >> >> Stanislav Fomichev <sdf@google.com> writes: >> >> > On Fri, Jun 23, 2023 at 4:29 PM Vinicius Costa Gomes >> > <vinicius.gomes@intel.com> wrote: >> >> >> >> Stanislav Fomichev <sdf@google.com> writes: >> >> >> >> > Have a software-based example for kfuncs to showcase how it >> >> > can be used in the real devices and to have something to >> >> > test against in the selftests. >> >> > >> >> > Both path (skb & xdp) are covered. Only the skb path is really >> >> > tested though. >> >> > >> >> > Cc: netdev@vger.kernel.org >> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com> >> >> >> >> Not really related to this patch, but to how it would work with >> >> different drivers/hardware. >> >> >> >> In some of our hardware (the ones handled by igc/igb, for example), the >> >> timestamp notification comes some time after the transmit completion >> >> event. >> >> >> >> From what I could gather, the idea would be for the driver to "hold" the >> >> completion until the timestamp is ready and then signal the completion >> >> of the frame. Is that right? >> > >> > Yeah, that might be the option. Do you think it could work? >> > >> >> For the skb and XDP cases, yeah, just holding the completion for a while >> seems like it's going to work. >> >> XDP ZC looks more complicated to me, not sure if it's only a matter of >> adding something like: > > [..] > >> void xsk_tx_completed_one(struct xsk_buff_pool *pool, struct xdp_buff *xdp); >> >> Or if more changes would be needed. I am trying to think about the case >> that the user sent a single "timestamp" packet among a bunch of >> "non-timestamp" packets. > > Since you're passing xdp_buff as an argument I'm assuming that is > suggesting out-of-order completions? > The completion queue is a single index, we can't do ooo stuff. > So you'd have to hold a bunch of packets until you receive the > timestamp completion; after this event, you can complete the whole > batch (1 packet waiting for the timestamp + a bunch that have been > transmitted afterwards but were still unacknowleged in the queue). > > (lmk if I've misinterpreted) Not at all, it was me that wasn't aware that out-of-order completions were out of the picture. So, yeah, what you are proposing, accumulating the pending completions while there's a pending timestamp request, seems the only way to go. The logic seems simple enough, but the fact that the "timestamp ready" interrupt is not associated with any queue seems that it will make things a bit "interesting" to get it right :-) But I don't have any better suggestions. Thank you,
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 614f3e3efab0..632f0f3771e4 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -27,6 +27,7 @@ #include <linux/bpf_trace.h> #include <linux/net_tstamp.h> #include <net/page_pool.h> +#include <net/devtx.h> #define DRV_NAME "veth" #define DRV_VERSION "1.0" @@ -123,6 +124,13 @@ struct veth_xdp_buff { struct sk_buff *skb; }; +struct veth_devtx_frame { + struct devtx_frame frame; + bool request_timestamp; + ktime_t xdp_tx_timestamp; + struct sk_buff *skb; +}; + static int veth_get_link_ksettings(struct net_device *dev, struct ethtool_link_ksettings *cmd) { @@ -313,10 +321,43 @@ static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb) return NET_RX_SUCCESS; } +__weak noinline void veth_devtx_submit(struct devtx_frame *ctx) +{ +} + +__weak noinline void veth_devtx_complete(struct devtx_frame *ctx) +{ +} + +BTF_SET8_START(veth_devtx_hook_ids) +BTF_ID_FLAGS(func, veth_devtx_submit) +BTF_ID_FLAGS(func, veth_devtx_complete) +BTF_SET8_END(veth_devtx_hook_ids) + static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, - struct veth_rq *rq, bool xdp) + struct veth_rq *rq, bool xdp, bool request_timestamp) { - return __dev_forward_skb(dev, skb) ?: xdp ? + struct net_device *orig_dev = skb->dev; + int ret; + + ret = __dev_forward_skb(dev, skb); + if (ret) + return ret; + + if (devtx_enabled()) { + struct veth_devtx_frame ctx; + + if (unlikely(request_timestamp)) + __net_timestamp(skb); + + devtx_frame_from_skb(&ctx.frame, skb, orig_dev); + ctx.frame.data -= ETH_HLEN; /* undo eth_type_trans pull */ + ctx.frame.len += ETH_HLEN; + ctx.skb = skb; + veth_devtx_complete(&ctx.frame); + } + + return xdp ? veth_xdp_rx(rq, skb) : __netif_rx(skb); } @@ -343,6 +384,7 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev, static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); + bool request_timestamp = false; struct veth_rq *rq = NULL; struct net_device *rcv; int length = skb->len; @@ -356,6 +398,15 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) goto drop; } + if (devtx_enabled()) { + struct veth_devtx_frame ctx; + + devtx_frame_from_skb(&ctx.frame, skb, dev); + ctx.request_timestamp = false; + veth_devtx_submit(&ctx.frame); + request_timestamp = ctx.request_timestamp; + } + rcv_priv = netdev_priv(rcv); rxq = skb_get_queue_mapping(skb); if (rxq < rcv->real_num_rx_queues) { @@ -370,7 +421,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) } skb_tx_timestamp(skb); - if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) { + if (likely(veth_forward_skb(rcv, skb, rq, use_napi, request_timestamp) == NET_RX_SUCCESS)) { if (!use_napi) dev_lstats_add(dev, length); } else { @@ -483,6 +534,7 @@ static int veth_xdp_xmit(struct net_device *dev, int n, { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); int i, ret = -ENXIO, nxmit = 0; + ktime_t tx_timestamp = 0; struct net_device *rcv; unsigned int max_len; struct veth_rq *rq; @@ -511,9 +563,32 @@ static int veth_xdp_xmit(struct net_device *dev, int n, void *ptr = veth_xdp_to_ptr(frame); if (unlikely(xdp_get_frame_len(frame) > max_len || - __ptr_ring_produce(&rq->xdp_ring, ptr))) + __ptr_ring_full(&rq->xdp_ring))) + break; + + if (devtx_enabled()) { + struct veth_devtx_frame ctx; + + devtx_frame_from_xdp(&ctx.frame, frame, dev); + ctx.request_timestamp = false; + veth_devtx_submit(&ctx.frame); + + if (unlikely(ctx.request_timestamp)) + tx_timestamp = ktime_get_real(); + } + + if (unlikely(__ptr_ring_produce(&rq->xdp_ring, ptr))) break; nxmit++; + + if (devtx_enabled()) { + struct veth_devtx_frame ctx; + + devtx_frame_from_xdp(&ctx.frame, frame, dev); + ctx.xdp_tx_timestamp = tx_timestamp; + ctx.skb = NULL; + veth_devtx_complete(&ctx.frame); + } } spin_unlock(&rq->xdp_ring.producer_lock); @@ -1732,6 +1807,28 @@ static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash, return 0; } +static int veth_devtx_sb_request_timestamp(const struct devtx_frame *_ctx) +{ + struct veth_devtx_frame *ctx = (struct veth_devtx_frame *)_ctx; + + ctx->request_timestamp = true; + + return 0; +} + +static int veth_devtx_cp_timestamp(const struct devtx_frame *_ctx, u64 *timestamp) +{ + struct veth_devtx_frame *ctx = (struct veth_devtx_frame *)_ctx; + + if (ctx->skb) { + *timestamp = ctx->skb->tstamp; + return 0; + } + + *timestamp = ctx->xdp_tx_timestamp; + return 0; +} + static const struct net_device_ops veth_netdev_ops = { .ndo_init = veth_dev_init, .ndo_open = veth_open, @@ -1756,6 +1853,8 @@ static const struct net_device_ops veth_netdev_ops = { static const struct xdp_metadata_ops veth_xdp_metadata_ops = { .xmo_rx_timestamp = veth_xdp_rx_timestamp, .xmo_rx_hash = veth_xdp_rx_hash, + .xmo_sb_request_timestamp = veth_devtx_sb_request_timestamp, + .xmo_cp_timestamp = veth_devtx_cp_timestamp, }; #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \ @@ -2041,11 +2140,20 @@ static struct rtnl_link_ops veth_link_ops = { static __init int veth_init(void) { + int ret; + + ret = devtx_hooks_register(&veth_devtx_hook_ids, &veth_xdp_metadata_ops); + if (ret) { + pr_warn("failed to register devtx hooks: %d", ret); + return ret; + } + return rtnl_link_register(&veth_link_ops); } static __exit void veth_exit(void) { + devtx_hooks_unregister(&veth_devtx_hook_ids); rtnl_link_unregister(&veth_link_ops); }
Have a software-based example for kfuncs to showcase how it can be used in the real devices and to have something to test against in the selftests. Both path (skb & xdp) are covered. Only the skb path is really tested though. Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> --- drivers/net/veth.c | 116 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 112 insertions(+), 4 deletions(-)