
[RFC,bpf-next,v2,06/11] net: veth: Implement devtx timestamp kfuncs

Message ID 20230621170244.1283336-7-sdf@google.com (mailing list archive)
State RFC
Delegated to: BPF
Series bpf: Netdev TX metadata

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-next-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-5 fail Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-8 success Logs for veristat
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 10 this patch: 17
netdev/cc_maintainers warning 5 maintainers not CCed: kuba@kernel.org hawk@kernel.org davem@davemloft.net pabeni@redhat.com edumazet@google.com
netdev/build_clang fail Errors and warnings before: 8 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 10 this patch: 17
netdev/checkpatch warning WARNING: line length of 100 exceeds 80 columns WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Stanislav Fomichev June 21, 2023, 5:02 p.m. UTC
Add a software-based example of the kfuncs to showcase how they
can be used in real devices and to have something to test
against in the selftests.

Both paths (skb & XDP) are covered, though only the skb path is
really tested.

Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/veth.c | 116 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 112 insertions(+), 4 deletions(-)

Comments

Vinicius Costa Gomes June 23, 2023, 11:29 p.m. UTC | #1
Stanislav Fomichev <sdf@google.com> writes:

> Have a software-based example for kfuncs to showcase how it
> can be used in the real devices and to have something to
> test against in the selftests.
>
> Both path (skb & xdp) are covered. Only the skb path is really
> tested though.
>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Not really related to this patch, but to how it would work with
different drivers/hardware.

In some of our hardware (the ones handled by igc/igb, for example), the
timestamp notification comes some time after the transmit completion
event.

From what I could gather, the idea would be for the driver to "hold" the
completion until the timestamp is ready and then signal the completion
of the frame. Is that right?


Cheers,
Stanislav Fomichev June 26, 2023, 5 p.m. UTC | #2
On Fri, Jun 23, 2023 at 4:29 PM Vinicius Costa Gomes
<vinicius.gomes@intel.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > Have a software-based example for kfuncs to showcase how it
> > can be used in the real devices and to have something to
> > test against in the selftests.
> >
> > Both path (skb & xdp) are covered. Only the skb path is really
> > tested though.
> >
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>
> Not really related to this patch, but to how it would work with
> different drivers/hardware.
>
> In some of our hardware (the ones handled by igc/igb, for example), the
> timestamp notification comes some time after the transmit completion
> event.
>
> From what I could gather, the idea would be for the driver to "hold" the
> completion until the timestamp is ready and then signal the completion
> of the frame. Is that right?

Yeah, that might be the option. Do you think it could work?
Vinicius Costa Gomes June 26, 2023, 10 p.m. UTC | #3
Stanislav Fomichev <sdf@google.com> writes:

> On Fri, Jun 23, 2023 at 4:29 PM Vinicius Costa Gomes
> <vinicius.gomes@intel.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > Have a software-based example for kfuncs to showcase how it
>> > can be used in the real devices and to have something to
>> > test against in the selftests.
>> >
>> > Both path (skb & xdp) are covered. Only the skb path is really
>> > tested though.
>> >
>> > Cc: netdev@vger.kernel.org
>> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>
>> Not really related to this patch, but to how it would work with
>> different drivers/hardware.
>>
>> In some of our hardware (the ones handled by igc/igb, for example), the
>> timestamp notification comes some time after the transmit completion
>> event.
>>
>> From what I could gather, the idea would be for the driver to "hold" the
>> completion until the timestamp is ready and then signal the completion
>> of the frame. Is that right?
>
> Yeah, that might be the option. Do you think it could work?
>

For the skb and XDP cases, yeah, just holding the completion for a while
seems like it's going to work.

XDP ZC looks more complicated to me, not sure if it's only a matter of
adding something like:

void xsk_tx_completed_one(struct xsk_buff_pool *pool, struct xdp_buff *xdp);

Or if more changes would be needed. I am trying to think about the case
that the user sent a single "timestamp" packet among a bunch of
"non-timestamp" packets.


Cheers,
Stanislav Fomichev June 26, 2023, 11:29 p.m. UTC | #4
On Mon, Jun 26, 2023 at 3:00 PM Vinicius Costa Gomes
<vinicius.gomes@intel.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > On Fri, Jun 23, 2023 at 4:29 PM Vinicius Costa Gomes
> > <vinicius.gomes@intel.com> wrote:
> >>
> >> Stanislav Fomichev <sdf@google.com> writes:
> >>
> >> > Have a software-based example for kfuncs to showcase how it
> >> > can be used in the real devices and to have something to
> >> > test against in the selftests.
> >> >
> >> > Both path (skb & xdp) are covered. Only the skb path is really
> >> > tested though.
> >> >
> >> > Cc: netdev@vger.kernel.org
> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >>
> >> Not really related to this patch, but to how it would work with
> >> different drivers/hardware.
> >>
> >> In some of our hardware (the ones handled by igc/igb, for example), the
> >> timestamp notification comes some time after the transmit completion
> >> event.
> >>
> >> From what I could gather, the idea would be for the driver to "hold" the
> >> completion until the timestamp is ready and then signal the completion
> >> of the frame. Is that right?
> >
> > Yeah, that might be the option. Do you think it could work?
> >
>
> For the skb and XDP cases, yeah, just holding the completion for a while
> seems like it's going to work.
>
> XDP ZC looks more complicated to me, not sure if it's only a matter of
> adding something like:

[..]

> void xsk_tx_completed_one(struct xsk_buff_pool *pool, struct xdp_buff *xdp);
>
> Or if more changes would be needed. I am trying to think about the case
> that the user sent a single "timestamp" packet among a bunch of
> "non-timestamp" packets.

Since you're passing xdp_buff as an argument, I'm assuming that's
suggesting out-of-order completions?
The completion queue is a single index; we can't do out-of-order
completions there.
So you'd have to hold a bunch of packets until you receive the
timestamp completion; after this event, you can complete the whole
batch (1 packet waiting for the timestamp + a bunch that have been
transmitted afterwards but were still unacknowledged in the queue).

(lmk if I've misinterpreted)
Vinicius Costa Gomes June 27, 2023, 1:38 a.m. UTC | #5
Stanislav Fomichev <sdf@google.com> writes:

> On Mon, Jun 26, 2023 at 3:00 PM Vinicius Costa Gomes
> <vinicius.gomes@intel.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On Fri, Jun 23, 2023 at 4:29 PM Vinicius Costa Gomes
>> > <vinicius.gomes@intel.com> wrote:
>> >>
>> >> Stanislav Fomichev <sdf@google.com> writes:
>> >>
>> >> > Have a software-based example for kfuncs to showcase how it
>> >> > can be used in the real devices and to have something to
>> >> > test against in the selftests.
>> >> >
>> >> > Both path (skb & xdp) are covered. Only the skb path is really
>> >> > tested though.
>> >> >
>> >> > Cc: netdev@vger.kernel.org
>> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> >>
>> >> Not really related to this patch, but to how it would work with
>> >> different drivers/hardware.
>> >>
>> >> In some of our hardware (the ones handled by igc/igb, for example), the
>> >> timestamp notification comes some time after the transmit completion
>> >> event.
>> >>
>> >> From what I could gather, the idea would be for the driver to "hold" the
>> >> completion until the timestamp is ready and then signal the completion
>> >> of the frame. Is that right?
>> >
>> > Yeah, that might be the option. Do you think it could work?
>> >
>>
>> For the skb and XDP cases, yeah, just holding the completion for a while
>> seems like it's going to work.
>>
>> XDP ZC looks more complicated to me, not sure if it's only a matter of
>> adding something like:
>
> [..]
>
>> void xsk_tx_completed_one(struct xsk_buff_pool *pool, struct xdp_buff *xdp);
>>
>> Or if more changes would be needed. I am trying to think about the case
>> that the user sent a single "timestamp" packet among a bunch of
>> "non-timestamp" packets.
>
> Since you're passing xdp_buff as an argument I'm assuming that is
> suggesting out-of-order completions?
> The completion queue is a single index, we can't do ooo stuff.
> So you'd have to hold a bunch of packets until you receive the
> timestamp completion; after this event, you can complete the whole
> batch (1 packet waiting for the timestamp + a bunch that have been
> transmitted afterwards but were still unacknowledged in the queue).
>
> (lmk if I've misinterpreted)

Not at all, it was me that wasn't aware that out-of-order completions
were out of the picture.

So, yeah, what you are proposing, accumulating the pending completions
while there's a pending timestamp request, seems the only way to go.

The logic seems simple enough, but the fact that the "timestamp ready"
interrupt is not associated with any queue means it will be a bit
"interesting" to get right :-)

But I don't have any better suggestions.


Thank you,

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 614f3e3efab0..632f0f3771e4 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -27,6 +27,7 @@ 
 #include <linux/bpf_trace.h>
 #include <linux/net_tstamp.h>
 #include <net/page_pool.h>
+#include <net/devtx.h>
 
 #define DRV_NAME	"veth"
 #define DRV_VERSION	"1.0"
@@ -123,6 +124,13 @@  struct veth_xdp_buff {
 	struct sk_buff *skb;
 };
 
+struct veth_devtx_frame {
+	struct devtx_frame frame;
+	bool request_timestamp;
+	ktime_t xdp_tx_timestamp;
+	struct sk_buff *skb;
+};
+
 static int veth_get_link_ksettings(struct net_device *dev,
 				   struct ethtool_link_ksettings *cmd)
 {
@@ -313,10 +321,43 @@  static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
 	return NET_RX_SUCCESS;
 }
 
+__weak noinline void veth_devtx_submit(struct devtx_frame *ctx)
+{
+}
+
+__weak noinline void veth_devtx_complete(struct devtx_frame *ctx)
+{
+}
+
+BTF_SET8_START(veth_devtx_hook_ids)
+BTF_ID_FLAGS(func, veth_devtx_submit)
+BTF_ID_FLAGS(func, veth_devtx_complete)
+BTF_SET8_END(veth_devtx_hook_ids)
+
 static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb,
-			    struct veth_rq *rq, bool xdp)
+			    struct veth_rq *rq, bool xdp, bool request_timestamp)
 {
-	return __dev_forward_skb(dev, skb) ?: xdp ?
+	struct net_device *orig_dev = skb->dev;
+	int ret;
+
+	ret = __dev_forward_skb(dev, skb);
+	if (ret)
+		return ret;
+
+	if (devtx_enabled()) {
+		struct veth_devtx_frame ctx;
+
+		if (unlikely(request_timestamp))
+			__net_timestamp(skb);
+
+		devtx_frame_from_skb(&ctx.frame, skb, orig_dev);
+		ctx.frame.data -= ETH_HLEN; /* undo eth_type_trans pull */
+		ctx.frame.len += ETH_HLEN;
+		ctx.skb = skb;
+		veth_devtx_complete(&ctx.frame);
+	}
+
+	return xdp ?
 		veth_xdp_rx(rq, skb) :
 		__netif_rx(skb);
 }
@@ -343,6 +384,7 @@  static bool veth_skb_is_eligible_for_gro(const struct net_device *dev,
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+	bool request_timestamp = false;
 	struct veth_rq *rq = NULL;
 	struct net_device *rcv;
 	int length = skb->len;
@@ -356,6 +398,15 @@  static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}
 
+	if (devtx_enabled()) {
+		struct veth_devtx_frame ctx;
+
+		devtx_frame_from_skb(&ctx.frame, skb, dev);
+		ctx.request_timestamp = false;
+		veth_devtx_submit(&ctx.frame);
+		request_timestamp = ctx.request_timestamp;
+	}
+
 	rcv_priv = netdev_priv(rcv);
 	rxq = skb_get_queue_mapping(skb);
 	if (rxq < rcv->real_num_rx_queues) {
@@ -370,7 +421,7 @@  static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	skb_tx_timestamp(skb);
-	if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) {
+	if (likely(veth_forward_skb(rcv, skb, rq, use_napi, request_timestamp) == NET_RX_SUCCESS)) {
 		if (!use_napi)
 			dev_lstats_add(dev, length);
 	} else {
@@ -483,6 +534,7 @@  static int veth_xdp_xmit(struct net_device *dev, int n,
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
 	int i, ret = -ENXIO, nxmit = 0;
+	ktime_t tx_timestamp = 0;
 	struct net_device *rcv;
 	unsigned int max_len;
 	struct veth_rq *rq;
@@ -511,9 +563,32 @@  static int veth_xdp_xmit(struct net_device *dev, int n,
 		void *ptr = veth_xdp_to_ptr(frame);
 
 		if (unlikely(xdp_get_frame_len(frame) > max_len ||
-			     __ptr_ring_produce(&rq->xdp_ring, ptr)))
+			     __ptr_ring_full(&rq->xdp_ring)))
+			break;
+
+		if (devtx_enabled()) {
+			struct veth_devtx_frame ctx;
+
+			devtx_frame_from_xdp(&ctx.frame, frame, dev);
+			ctx.request_timestamp = false;
+			veth_devtx_submit(&ctx.frame);
+
+			if (unlikely(ctx.request_timestamp))
+				tx_timestamp = ktime_get_real();
+		}
+
+		if (unlikely(__ptr_ring_produce(&rq->xdp_ring, ptr)))
 			break;
 		nxmit++;
+
+		if (devtx_enabled()) {
+			struct veth_devtx_frame ctx;
+
+			devtx_frame_from_xdp(&ctx.frame, frame, dev);
+			ctx.xdp_tx_timestamp = tx_timestamp;
+			ctx.skb = NULL;
+			veth_devtx_complete(&ctx.frame);
+		}
 	}
 	spin_unlock(&rq->xdp_ring.producer_lock);
 
@@ -1732,6 +1807,28 @@  static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
 	return 0;
 }
 
+static int veth_devtx_sb_request_timestamp(const struct devtx_frame *_ctx)
+{
+	struct veth_devtx_frame *ctx = (struct veth_devtx_frame *)_ctx;
+
+	ctx->request_timestamp = true;
+
+	return 0;
+}
+
+static int veth_devtx_cp_timestamp(const struct devtx_frame *_ctx, u64 *timestamp)
+{
+	struct veth_devtx_frame *ctx = (struct veth_devtx_frame *)_ctx;
+
+	if (ctx->skb) {
+		*timestamp = ctx->skb->tstamp;
+		return 0;
+	}
+
+	*timestamp = ctx->xdp_tx_timestamp;
+	return 0;
+}
+
 static const struct net_device_ops veth_netdev_ops = {
 	.ndo_init            = veth_dev_init,
 	.ndo_open            = veth_open,
@@ -1756,6 +1853,8 @@  static const struct net_device_ops veth_netdev_ops = {
 static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
 	.xmo_rx_timestamp		= veth_xdp_rx_timestamp,
 	.xmo_rx_hash			= veth_xdp_rx_hash,
+	.xmo_sb_request_timestamp	= veth_devtx_sb_request_timestamp,
+	.xmo_cp_timestamp		= veth_devtx_cp_timestamp,
 };
 
 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
@@ -2041,11 +2140,20 @@  static struct rtnl_link_ops veth_link_ops = {
 
 static __init int veth_init(void)
 {
+	int ret;
+
+	ret = devtx_hooks_register(&veth_devtx_hook_ids, &veth_xdp_metadata_ops);
+	if (ret) {
+		pr_warn("failed to register devtx hooks: %d\n", ret);
+		return ret;
+	}
+
 	return rtnl_link_register(&veth_link_ops);
 }
 
 static __exit void veth_exit(void)
 {
+	devtx_hooks_unregister(&veth_devtx_hook_ids);
 	rtnl_link_unregister(&veth_link_ops);
 }