Message ID | 20231203165129.1740512-3-yoong.siang.song@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | xsk: TX metadata Launch Time support |
On 12/3/23 17:51, Song Yoong Siang wrote:
> This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> copy via XDP Tx metadata framework.
>
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++

As requested before, I think we need to see another driver implementing
this.

I propose driver igc and chip i225.

The interesting thing for me is to see how the LaunchTime max 1 second
into the future [1] is handled code-wise. One suggestion is to add a
section to Documentation/networking/xsk-tx-metadata.rst per driver that
documents these different hardware limitations. It is natural
that different types of hardware have limitations. This is a close-to
hardware-level abstraction/API, and IMHO as long as we document the
limitations we can expose this API without too many limitations for more
capable hardware.

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/tsn/code01_follow_qdisc_TSN_offload.org#setup-code-driver-igb

This stmmac driver and Intel Tiger Lake CPU must also have some limit on
how long into the future they can schedule packets?

People from the xdp-hints list must make their voices heard if they want
i210 and igb driver support, because that hardware has even more
limitations, see [1] (e.g. only TX queues 0 and 1 support LaunchTime).
BUT I know some have this hardware in production and might be motivated
to get a functioning driver with this feature?

--Jesper
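A minimal sketch of how the per-driver limitations discussed above might be written down as data and checked before a launch time is programmed. Every name here (`struct launch_time_limits`, `enum lt_status`, `lt_check()`) is hypothetical and does not exist in the kernel. The example values used in the usage note (a 1 second horizon, launch time on TX queues 0 and 1 only) are taken from the discussion above for illustration, not from a datasheet.

```c
#include <stdint.h>

/* Hypothetical per-device launch time limits, as a driver might document them. */
struct launch_time_limits {
	uint64_t max_horizon_ns; /* farthest offset from "now" the hardware accepts */
	uint32_t queue_mask;     /* bit n set: TX queue n supports launch time */
};

/* Hypothetical result codes for a pre-flight check. */
enum lt_status {
	LT_OK = 0,
	LT_ERR_QUEUE,   /* queue has no launch time support */
	LT_ERR_PAST,    /* requested time has already passed */
	LT_ERR_HORIZON, /* requested time is beyond the device horizon */
};

/* Check a requested launch time against the documented limits. */
static enum lt_status lt_check(const struct launch_time_limits *lim,
			       unsigned int queue,
			       uint64_t now_ns, uint64_t launch_ns)
{
	if (queue >= 32 || !(lim->queue_mask & (UINT32_C(1) << queue)))
		return LT_ERR_QUEUE;
	if (launch_ns < now_ns)
		return LT_ERR_PAST;
	if (launch_ns - now_ns > lim->max_horizon_ns)
		return LT_ERR_HORIZON;
	return LT_OK;
}
```

With `max_horizon_ns = 1000000000` and `queue_mask = 0x3`, a request on queue 2 would be rejected with `LT_ERR_QUEUE`, and a request more than one second ahead with `LT_ERR_HORIZON`. Whether such a check belongs in the driver, the stack, or user space is exactly what the thread leaves open.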
On Mon, 2023-12-04 at 11:36 +0100, Jesper Dangaard Brouer wrote:
> On 12/3/23 17:51, Song Yoong Siang wrote:
> > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > copy via XDP Tx metadata framework.
> >
> > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> > ---
> >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
>
> As requested before, I think we need to see another driver implementing
> this.
>
> I propose driver igc and chip i225.

igc support would be really nice and highly appreciated. There are a
lot of tests running here with that chip (i225/i226) / driver (igc)
combination. Let me know if we can support somehow, testing included.

>
> The interesting thing for me is to see how the LaunchTime max 1 second
> into the future[1] is handled code wise. One suggestion is to add a
> section to Documentation/networking/xsk-tx-metadata.rst per driver that
> mentions/documents these different hardware limitations. It is natural
> that different types of hardware have limitations. This is a close-to
> hardware-level abstraction/API, and IMHO as long as we document the
> limitations we can expose this API without too many limitations for more
> capable hardware.
>
> [1]
> https://github.com/xdp-project/xdp-project/blob/master/areas/tsn/code01_follow_qdisc_TSN_offload.org#setup-code-driver-igb
>
> This stmmac driver and Intel Tiger Lake CPU must also have some limit on
> how long into the future it will/can schedule packets?
>
>
> People from xdp-hints list must make their voice hear if they want i210
> and igb driver support, because it have even-more hardware limitations,
> see [1] (E.g. only TX queue 0 and 1 supports LaunchTime). BUT I know
> some have this hardware in production and might be motivated to get a
> functioning driver with this feature?

i210 support would be nice, that would allow us to compare some test
setups with different NICs. In addition it would simplify some test
setups.

For now, IMHO igc is more important.

>
> --Jesper
Jesper Dangaard Brouer wrote:
>
>
> On 12/3/23 17:51, Song Yoong Siang wrote:
> > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > copy via XDP Tx metadata framework.
> >
> > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> > ---
> >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
>
> As requested before, I think we need to see another driver implementing
> this.
>
> I propose driver igc and chip i225.
>
> The interesting thing for me is to see how the LaunchTime max 1 second
> into the future[1] is handled code wise. One suggestion is to add a
> section to Documentation/networking/xsk-tx-metadata.rst per driver that
> mentions/documents these different hardware limitations. It is natural
> that different types of hardware have limitations. This is a close-to
> hardware-level abstraction/API, and IMHO as long as we document the
> limitations we can expose this API without too many limitations for more
> capable hardware.

I would assume that the kfunc will fail when a value is passed that
cannot be programmed.

What is being implemented here already exists for qdiscs. The FQ
qdisc takes a horizon attribute and

  "
    when a packet is beyond the horizon
    at enqueue() time:
    - either drop the packet (default policy)
    - or cap its delivery time to the horizon.
  "
  commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")

Having the admin manually configure this on the qdisc based on
off-line knowledge of the device is more fragile than if the device
would somehow signal its limit to the stack.

But I don't think we should add enforcement of that as a requirement
for this xdp extension of pacing.
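The drop-or-cap choice that the FQ horizon attribute offers can be sketched in a few lines of standalone C. This is an illustration under assumptions, not the sch_fq code: `struct horizon_policy` and `horizon_apply()` are made-up names, and the horizon is simply passed in by the caller, whether it comes from admin configuration or from a limit signalled by the device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy state: horizon, mode, and two event counters. */
struct horizon_policy {
	uint64_t horizon_ns; /* farthest allowed offset from "now" */
	bool drop;           /* true: drop beyond horizon, false: cap to it */
	uint64_t stat_drops;
	uint64_t stat_caps;
};

/*
 * Apply the policy to a requested launch time.
 * Returns false if the packet must be dropped. Otherwise *launch_ns holds
 * the time to program, which may have been capped to now + horizon.
 */
static bool horizon_apply(struct horizon_policy *p, uint64_t now_ns,
			  uint64_t *launch_ns)
{
	if (*launch_ns > now_ns && *launch_ns - now_ns > p->horizon_ns) {
		if (p->drop) {
			p->stat_drops++;
			return false;
		}
		p->stat_caps++;
		*launch_ns = now_ns + p->horizon_ns;
	}
	return true;
}
```

The counters make both outcomes observable after the fact. They do not tell the sender which packet was affected, which is the gap the rest of the thread discusses.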
On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
>Jesper Dangaard Brouer wrote:
>>
>>
>> On 12/3/23 17:51, Song Yoong Siang wrote:
>> > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
>> > copy via XDP Tx metadata framework.
>> >
>> > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> > ---
>> >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
>>
>> As requested before, I think we need to see another driver implementing
>> this.
>>
>> I propose driver igc and chip i225.

Sure. I will include igc patches in the next version.

>>
>> The interesting thing for me is to see how the LaunchTime max 1 second
>> into the future[1] is handled code wise. One suggestion is to add a
>> section to Documentation/networking/xsk-tx-metadata.rst per driver that
>> mentions/documents these different hardware limitations. It is natural
>> that different types of hardware have limitations. This is a close-to
>> hardware-level abstraction/API, and IMHO as long as we document the
>> limitations we can expose this API without too many limitations for more
>> capable hardware.

Sure. I will try to add the hardware limitations to the documentation.

>
>I would assume that the kfunc will fail when a value is passed that
>cannot be programmed.
>

In the current design, xsk_tx_metadata_request() does not have a return
value, so the user won't know if their request failed.
It is complex to inform the user which request is failing.
Therefore, IMHO, it is good that we let the driver handle the error silently.

>What is being implemented here already exists for qdiscs. The FQ
>qdisc takes a horizon attribute and
>
>  "
>    when a packet is beyond the horizon
>    at enqueue() time:
>    - either drop the packet (default policy)
>    - or cap its delivery time to the horizon.
>  "
>  commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")
>
>Having the admin manually configure this on the qdisc based on
>off-line knowledge of the device is more fragile than if the device
>would somehow signal its limit to the stack.
>
>But I don't think we should add enforcement of that as a requirement
>for this xdp extension of pacing.
On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > Jesper Dangaard Brouer wrote:
> > >
> > >
> > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > copy via XDP Tx metadata framework.
> > > >
> > > > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> > > > ---
> > > >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > >
> > > As requested before, I think we need to see another driver implementing
> > > this.
> > >
> > > I propose driver igc and chip i225.
>
> Sure. I will include igc patches in next version.
>
> > >
> > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > into the future[1] is handled code wise. One suggestion is to add a
> > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > mentions/documents these different hardware limitations. It is natural
> > > that different types of hardware have limitations. This is a close-to
> > > hardware-level abstraction/API, and IMHO as long as we document the
> > > limitations we can expose this API without too many limitations for more
> > > capable hardware.
>
> Sure. I will try to add hardware limitations in documentation.
>
> >
> > I would assume that the kfunc will fail when a value is passed that
> > cannot be programmed.
> >
>
> In current design, the xsk_tx_metadata_request() dint got return value.
> So user won't know if their request is fail.
> It is complex to inform user which request is failing.
> Therefore, IMHO, it is good that we let driver handle the error silently.
>

If the programmed value is invalid, the packet will be "dropped" / will
never make it to the wire, right?

That is clearly a situation that the user should be informed about. For
RT systems this normally means that something is really wrong regarding
timing / cycle overflow. Such systems have to react to that situation.

> >
> > What is being implemented here already exists for qdiscs. The FQ
> > qdisc takes a horizon attribute and
> >
> >   "
> >     when a packet is beyond the horizon
> >     at enqueue() time:
> >     - either drop the packet (default policy)
> >     - or cap its delivery time to the horizon.
> >   "
> >   commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")
> >
> > Having the admin manually configure this on the qdisc based on
> > off-line knowledge of the device is more fragile than if the device
> > would somehow signal its limit to the stack.
> >
> > But I don't think we should add enforcement of that as a requirement
> > for this xdp extension of pacing.
On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka <florian.bezdeka@siemens.com> wrote:
>
> On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > Jesper Dangaard Brouer wrote:
> > > >
> > > >
> > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > copy via XDP Tx metadata framework.
> > > > >
> > > > > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> > > > > ---
> > > > >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > > >
> > > > As requested before, I think we need to see another driver implementing
> > > > this.
> > > >
> > > > I propose driver igc and chip i225.
> >
> > Sure. I will include igc patches in next version.
> >
> > > >
> > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > mentions/documents these different hardware limitations. It is natural
> > > > that different types of hardware have limitations. This is a close-to
> > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > limitations we can expose this API without too many limitations for more
> > > > capable hardware.
> >
> > Sure. I will try to add hardware limitations in documentation.
> >
> > >
> > > I would assume that the kfunc will fail when a value is passed that
> > > cannot be programmed.
> > >
> >
> > In current design, the xsk_tx_metadata_request() dint got return value.
> > So user won't know if their request is fail.
> > It is complex to inform user which request is failing.
> > Therefore, IMHO, it is good that we let driver handle the error silently.
> >
>
> If the programmed value is invalid, the packet will be "dropped" / will
> never make it to the wire, right?
>
> That is clearly a situation that the user should be informed about. For
> RT systems this normally means that something is really wrong regarding
> timing / cycle overflow. Such systems have to react on that situation.

In general, af_xdp is a bit lacking in this 'notify the user that they
somehow messed up' area :-(
For example, pushing a tx descriptor with a wrong addr/len in zc mode
will not give any visible signal back (besides driver potentially
spilling something into dmesg as it was in the mlx case).
We can probably start with having some counters for these events?
Stanislav Fomichev wrote:
> On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> <florian.bezdeka@siemens.com> wrote:
> >
> > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > Jesper Dangaard Brouer wrote:
> > > > >
> > > > >
> > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > copy via XDP Tx metadata framework.
> > > > > >
> > > > > > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> > > > > > ---
> > > > > >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > > > >
> > > > > As requested before, I think we need to see another driver implementing
> > > > > this.
> > > > >
> > > > > I propose driver igc and chip i225.
> > >
> > > Sure. I will include igc patches in next version.
> > >
> > > > >
> > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > mentions/documents these different hardware limitations. It is natural
> > > > > that different types of hardware have limitations. This is a close-to
> > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > limitations we can expose this API without too many limitations for more
> > > > > capable hardware.
> > >
> > > Sure. I will try to add hardware limitations in documentation.
> > >
> > > >
> > > > I would assume that the kfunc will fail when a value is passed that
> > > > cannot be programmed.
> > > >
> > >
> > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > So user won't know if their request is fail.
> > > It is complex to inform user which request is failing.
> > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > >
> >
> > If the programmed value is invalid, the packet will be "dropped" / will
> > never make it to the wire, right?

Programmable behavior is to either drop or cap to some boundary
value, such as the farthest programmable time in the future: the
horizon. In fq:

    /* Check if packet timestamp is too far in the future. */
    if (fq_packet_beyond_horizon(skb, q, now)) {
        if (q->horizon_drop) {
            q->stat_horizon_drops++;
            return qdisc_drop(skb, sch, to_free);
        }
        q->stat_horizon_caps++;
        skb->tstamp = now + q->horizon;
    }
    fq_skb_cb(skb)->time_to_send = skb->tstamp;

Drop is the more obviously correct mode.

Programming with a clock source that the driver does not support will
then be a persistent failure.

Preferably, this driver capability can be queried beforehand (rather
than only through reading error counters afterwards).

Perhaps it should not be a driver task to convert from possibly
multiple clock sources to the device native clock. Right now, we do
use per-device timecounters for this, implemented in the driver.

As for which clocks are relevant. For PTP, I suppose the device PHC,
converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.

> >
> > That is clearly a situation that the user should be informed about. For
> > RT systems this normally means that something is really wrong regarding
> > timing / cycle overflow. Such systems have to react on that situation.
>
> In general, af_xdp is a bit lacking in this 'notify the user that they
> somehow messed up' area :-(
> For example, pushing a tx descriptor with a wrong addr/len in zc mode
> will not give any visible signal back (besides driver potentially
> spilling something into dmesg as it was in the mlx case).
> We can probably start with having some counters for these events?

This is because the AF_XDP completion queue descriptor format is only
a u64 address?

Could error conditions be reported on tx completion in the metadata,
using xsk_tx_metadata_complete?
On 12/05, Willem de Bruijn wrote:
> Stanislav Fomichev wrote:
> > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> > <florian.bezdeka@siemens.com> wrote:
> > >
> > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > Jesper Dangaard Brouer wrote:
> > > > > >
> > > > > >
> > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > copy via XDP Tx metadata framework.
> > > > > > >
> > > > > > > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> > > > > > > ---
> > > > > > >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > > > > >
> > > > > > As requested before, I think we need to see another driver implementing
> > > > > > this.
> > > > > >
> > > > > > I propose driver igc and chip i225.
> > > >
> > > > Sure. I will include igc patches in next version.
> > > >
> > > > > >
> > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > mentions/documents these different hardware limitations. It is natural
> > > > > > that different types of hardware have limitations. This is a close-to
> > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > limitations we can expose this API without too many limitations for more
> > > > > > capable hardware.
> > > >
> > > > Sure. I will try to add hardware limitations in documentation.
> > > >
> > > > >
> > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > cannot be programmed.
> > > > >
> > > >
> > > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > > So user won't know if their request is fail.
> > > > It is complex to inform user which request is failing.
> > > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > > >
> > >
> > > If the programmed value is invalid, the packet will be "dropped" / will
> > > never make it to the wire, right?
>
> Programmable behavior is to either drop or cap to some boundary
> value, such as the farthest programmable time in the future: the
> horizon. In fq:
>
>     /* Check if packet timestamp is too far in the future. */
>     if (fq_packet_beyond_horizon(skb, q, now)) {
>         if (q->horizon_drop) {
>             q->stat_horizon_drops++;
>             return qdisc_drop(skb, sch, to_free);
>         }
>         q->stat_horizon_caps++;
>         skb->tstamp = now + q->horizon;
>     }
>     fq_skb_cb(skb)->time_to_send = skb->tstamp;
>
> Drop is the more obviously correct mode.
>
> Programming with a clock source that the driver does not support will
> then be a persistent failure.
>
> Preferably, this driver capability can be queried beforehand (rather
> than only through reading error counters afterwards).
>
> Perhaps it should not be a driver task to convert from possibly
> multiple clock sources to the device native clock. Right now, we do
> use per-device timecounters for this, implemented in the driver.
>
> As for which clocks are relevant. For PTP, I suppose the device PHC,
> converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.

Do we need to expose some generic netdev netlink apis to query/adjust
nic clock sources (or maybe there is something existing already)?
Then the userspace can be responsible for syncing/converting the
timestamps to the internal nic clocks. +1 to trying to avoid doing
this in the drivers.

> > > That is clearly a situation that the user should be informed about. For
> > > RT systems this normally means that something is really wrong regarding
> > > timing / cycle overflow. Such systems have to react on that situation.
> >
> > In general, af_xdp is a bit lacking in this 'notify the user that they
> > somehow messed up' area :-(
> > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > will not give any visible signal back (besides driver potentially
> > spilling something into dmesg as it was in the mlx case).
> > We can probably start with having some counters for these events?
>
> This is because the AF_XDP completion queue descriptor format is only
> a u64 address?

Yeah. XDP_COPY mode has the descriptor validation which is exported via
recvmsg errno, but zerocopy path seems to be too deep in the stack
to report something back. And there is no place, as you mention,
in the completion ring to report the status.

> Could error conditions be reported on tx completion in the metadata,
> using xsk_tx_metadata_complete?

That would be one way to do it, yes. But then the error reporting depends
on the metadata opt-in. Having a separate ring to export the errors,
or having a v2 tx-completions layout with extra 'status' field would also
work.

But this seems like something that should be handled separately? Because
we'd have to teach all existing zc drivers to report those errors back
instead of dropping these descriptors..
Stanislav Fomichev wrote:
> On 12/05, Willem de Bruijn wrote:
> > Stanislav Fomichev wrote:
> > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> > > <florian.bezdeka@siemens.com> wrote:
> > > >
> > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > > Jesper Dangaard Brouer wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > > copy via XDP Tx metadata framework.
> > > > > > > >
> > > > > > > > Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > > > > > >
> > > > > > > As requested before, I think we need to see another driver implementing
> > > > > > > this.
> > > > > > >
> > > > > > > I propose driver igc and chip i225.
> > > > >
> > > > > Sure. I will include igc patches in next version.
> > > > >
> > > > > > >
> > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > > mentions/documents these different hardware limitations. It is natural
> > > > > > > that different types of hardware have limitations. This is a close-to
> > > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > > limitations we can expose this API without too many limitations for more
> > > > > > > capable hardware.
> > > > >
> > > > > Sure. I will try to add hardware limitations in documentation.
> > > > >
> > > > > >
> > > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > > cannot be programmed.
> > > > > >
> > > > >
> > > > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > > > So user won't know if their request is fail.
> > > > > It is complex to inform user which request is failing.
> > > > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > > > >
> > > >
> > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > never make it to the wire, right?
> >
> > Programmable behavior is to either drop or cap to some boundary
> > value, such as the farthest programmable time in the future: the
> > horizon. In fq:
> >
> >     /* Check if packet timestamp is too far in the future. */
> >     if (fq_packet_beyond_horizon(skb, q, now)) {
> >         if (q->horizon_drop) {
> >             q->stat_horizon_drops++;
> >             return qdisc_drop(skb, sch, to_free);
> >         }
> >         q->stat_horizon_caps++;
> >         skb->tstamp = now + q->horizon;
> >     }
> >     fq_skb_cb(skb)->time_to_send = skb->tstamp;
> >
> > Drop is the more obviously correct mode.
> >
> > Programming with a clock source that the driver does not support will
> > then be a persistent failure.
> >
> > Preferably, this driver capability can be queried beforehand (rather
> > than only through reading error counters afterwards).
> >
> > Perhaps it should not be a driver task to convert from possibly
> > multiple clock sources to the device native clock. Right now, we do
> > use per-device timecounters for this, implemented in the driver.
> >
> > As for which clocks are relevant. For PTP, I suppose the device PHC,
> > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
>
> Do we need to expose some generic netdev netlink apis to query/adjust
> nic clock sources (or maybe there is something existing already)?
> Then the userspace can be responsible for syncing/converting the
> timestamps to the internal nic clocks. +1 to trying to avoid doing
> this in the drivers.

Perhaps. I'm just a bit hesitant since that is UAPI and this is all
quite hand-wavy still.

Some of the conversion necessarily has to be in the driver. Only the
driver knows the descriptor format, and limitations of that, such as
the bit-width that can be encoded.

If we cannot move anything out of the drivers (quite likely), then
agreed that a netdev/ethtool netlink query approach is helpful.

To be clear: I don't mean that that should be part of this series.
This is not an XSK specific concern.

> > > > That is clearly a situation that the user should be informed about. For
> > > > RT systems this normally means that something is really wrong regarding
> > > > timing / cycle overflow. Such systems have to react on that situation.
> > >
> > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > somehow messed up' area :-(
> > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > will not give any visible signal back (besides driver potentially
> > > spilling something into dmesg as it was in the mlx case).
> > > We can probably start with having some counters for these events?
> >
> > This is because the AF_XDP completion queue descriptor format is only
> > a u64 address?
>
> Yeah. XDP_COPY mode has the descriptor validation which is exported via
> recvmsg errno, but zerocopy path seems to be too deep in the stack
> to report something back. And there is no place, as you mention,
> in the completion ring to report the status.
>
> > Could error conditions be reported on tx completion in the metadata,
> > using xsk_tx_metadata_complete?
>
> That would be one way to do it, yes. But then the error reporting depends
> on the metadata opt-in. Having a separate ring to export the errors,
> or having a v2 tx-completions layout with extra 'status' field would also
> work.
>
> But this seems like something that should be handled separately? Because
> we'd have to teach all existing zc drivers to report those errors back
> instead of dropping these descriptors..

Agreed on both points :)

A v2 tx-completions that supports status could be useful. But again,
this is out of scope of this specific launch time feature.
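To make the "v2 tx-completions with status" idea concrete, here is one possible shape for such an entry. Today's completion ring entry is a bare u64 address, as noted in the thread. Everything else below (the struct, the status codes, the helper) is hypothetical and is neither an existing nor a proposed UAPI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-descriptor completion status codes. */
enum xsk_tx_status {
	XSK_TX_OK = 0,
	XSK_TX_ERR_INVALID_DESC, /* bad addr/len in the tx descriptor */
	XSK_TX_ERR_LAUNCH_TIME,  /* launch time could not be programmed */
};

/* Hypothetical v2 completion entry: the existing address plus a status word. */
struct xsk_tx_completion_v2 {
	uint64_t addr;     /* same meaning as the current u64 completion entry */
	uint32_t status;   /* one of enum xsk_tx_status */
	uint32_t reserved; /* keeps the entry at 16 bytes */
};

/* Count completions that did not make it to the wire. */
static size_t xsk_count_failed(const struct xsk_tx_completion_v2 *c, size_t n)
{
	size_t failed = 0;

	for (size_t i = 0; i < n; i++)
		if (c[i].status != XSK_TX_OK)
			failed++;
	return failed;
}
```

An entry like this doubles the size of each completion from 8 to 16 bytes, and, as pointed out above, every existing zero-copy driver would have to learn to report a status instead of silently dropping the descriptor.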
On Tue, 5 Dec 2023 at 20:39, Stanislav Fomichev <sdf@google.com> wrote: > > On 12/05, Willem de Bruijn wrote: > > Stanislav Fomichev wrote: > > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka > > > <florian.bezdeka@siemens.com> wrote: > > > > > > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote: > > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote: > > > > > > Jesper Dangaard Brouer wrote: > > > > > > > > > > > > > > > > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote: > > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero > > > > > > > > copy via XDP Tx metadata framework. > > > > > > > > > > > > > > > > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com> > > > > > > > > --- > > > > > > > > drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++ > > > > > > > > > > > > > > As requested before, I think we need to see another driver implementing > > > > > > > this. > > > > > > > > > > > > > > I propose driver igc and chip i225. > > > > > > > > > > Sure. I will include igc patches in next version. > > > > > > > > > > > > > > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second > > > > > > > into the future[1] is handled code wise. One suggestion is to add a > > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that > > > > > > > mentions/documents these different hardware limitations. It is natural > > > > > > > that different types of hardware have limitations. This is a close-to > > > > > > > hardware-level abstraction/API, and IMHO as long as we document the > > > > > > > limitations we can expose this API without too many limitations for more > > > > > > > capable hardware. > > > > > > > > > > Sure. I will try to add hardware limitations in documentation. > > > > > > > > > > > > > > > > > I would assume that the kfunc will fail when a value is passed that > > > > > > cannot be programmed. 
> > > > >
> > > > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > > > So user won't know if their request is fail.
> > > > > It is complex to inform user which request is failing.
> > > > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > > > >
> > > >
> > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > never make it to the wire, right?
> >
> > Programmable behavior is to either drop or cap to some boundary
> > value, such as the farthest programmable time in the future: the
> > horizon. In fq:
> >
> >     /* Check if packet timestamp is too far in the future. */
> >     if (fq_packet_beyond_horizon(skb, q, now)) {
> >             if (q->horizon_drop) {
> >                     q->stat_horizon_drops++;
> >                     return qdisc_drop(skb, sch, to_free);
> >             }
> >             q->stat_horizon_caps++;
> >             skb->tstamp = now + q->horizon;
> >     }
> >     fq_skb_cb(skb)->time_to_send = skb->tstamp;
> >
> > Drop is the more obviously correct mode.
> >
> > Programming with a clock source that the driver does not support will
> > then be a persistent failure.
> >
> > Preferably, this driver capability can be queried beforehand (rather
> > than only through reading error counters afterwards).
> >
> > Perhaps it should not be a driver task to convert from possibly
> > multiple clock sources to the device native clock. Right now, we do
> > use per-device timecounters for this, implemented in the driver.
> >
> > As for which clocks are relevant. For PTP, I suppose the device PHC,
> > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
>
> Do we need to expose some generic netdev netlink apis to query/adjust
> nic clock sources (or maybe there is something existing already)?
> Then the userspace can be responsible for syncing/converting the
> timestamps to the internal nic clocks. +1 to trying to avoid doing
> this in the drivers.
>
> > > > That is clearly a situation that the user should be informed about. For
> > > > RT systems this normally means that something is really wrong regarding
> > > > timing / cycle overflow. Such systems have to react on that situation.
> > >
> > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > somehow messed up' area :-(
> > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > will not give any visible signal back (besides driver potentially
> > > spilling something into dmesg as it was in the mlx case).
> > > We can probably start with having some counters for these events?
> >
> > This is because the AF_XDP completion queue descriptor format is only
> > a u64 address?
>
> Yeah. XDP_COPY mode has the descriptor validation which is exported via
> recvmsg errno, but zerocopy path seems to be too deep in the stack
> to report something back. And there is no place, as you mention,
> in the completion ring to report the status.
>
> > Could error conditions be reported on tx completion in the metadata,
> > using xsk_tx_metadata_complete?
>
> That would be one way to do it, yes. But then the error reporting depends
> on the metadata opt-in. Having a separate ring to export the errors,
> or having a v2 tx-completions layout with extra 'status' field would also
> work.

There are error counters for the non-metadata and offloading cases
above that can be retrieved with the XDP_STATISTICS getsockopt(). From
if_xdp.h:

struct xdp_statistics {
	__u64 rx_dropped; /* Dropped for other reasons */
	__u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
	__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
	__u64 rx_ring_full; /* Dropped due to rx ring being full */
	__u64 rx_fill_ring_empty_descs; /* Failed to retrieve item from fill ring */
	__u64 tx_ring_empty_descs; /* Failed to retrieve item from tx ring */
};

Albeit, these are aggregate statistics and do not say anything about
which packet that caused it. Works well for things that are
programming bugs that should not occur (such as rx_invalid_descs and
tx_invalid_descs) and requires the programmer to debug and fix his or
her program, but it does not work for requests that might fail even
though the program is correct and need to be handled on a packet by
packet basis. So something needs to be added for that as you both say.

Would prefer if we could avoid a v2 completion descriptor format or
another ring that needs to be checked all the time, so if we could
live with providing the error status in the metadata field of the
packet at completion time, that would be good. Though having the error
status in the completion ring would be faster as that cache line is
hot, while the metadata section of the packet is likely not at
completion time. So that speaks for a v2 completion ring format. Just
thinking out loud here.

> But this seems like something that should be handled separately? Because
> we'd have to teach all existing zc drivers to report those errors back
> instead of dropping these descriptors..
>
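As a rough illustration of the XDP_STATISTICS path mentioned above, the sketch below reads the aggregate counters from an AF_XDP socket and compares two snapshots. It assumes a Linux system with the `linux/if_xdp.h` UAPI header installed; `read_xsk_stats()` and `new_tx_invalid_descs()` are hypothetical helper names, not part of any library, and the fallback `SOL_XDP` value is only there for older libc headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

#ifndef SOL_XDP
#define SOL_XDP 283	/* fallback for libc headers that predate AF_XDP */
#endif

/*
 * Hypothetical helper: fetch the aggregate counters of one AF_XDP socket.
 * Returns 0 on success, -1 on failure (errno is set by getsockopt()).
 */
int read_xsk_stats(int xsk_fd, struct xdp_statistics *stats)
{
	socklen_t optlen = sizeof(*stats);

	assert(stats != NULL);
	/* Zero first: an older kernel may fill in only the leading fields. */
	memset(stats, 0, sizeof(*stats));
	if (getsockopt(xsk_fd, SOL_XDP, XDP_STATISTICS, stats, &optlen) < 0)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: number of TX descriptors rejected as invalid
 * between two snapshots. The counters are per socket, not per packet,
 * so this says how many failed, not which ones.
 */
uint64_t new_tx_invalid_descs(const struct xdp_statistics *prev,
			      const struct xdp_statistics *cur)
{
	return cur->tx_invalid_descs - prev->tx_invalid_descs;
}
```

Polling like this fits the "programming bug" class of errors described above; it does not help with per-packet failures such as a rejected launch time, which is the gap under discussion.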
On 12/06, Magnus Karlsson wrote:
> On Tue, 5 Dec 2023 at 20:39, Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On 12/05, Willem de Bruijn wrote:
> > > Stanislav Fomichev wrote:
> > > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> > > > <florian.bezdeka@siemens.com> wrote:
> > > >
> > > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > > > Jesper Dangaard Brouer wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > > > copy via XDP Tx metadata framework.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > > > > > > > > ---
> > > > > > > > > drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > > > > > > >
> > > > > > > > As requested before, I think we need to see another driver implementing
> > > > > > > > this.
> > > > > > > >
> > > > > > > > I propose driver igc and chip i225.
> > > > > >
> > > > > > Sure. I will include igc patches in next version.
> > > > > >
> > > > > > > >
> > > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > > > mentions/documents these different hardware limitations. It is natural
> > > > > > > > that different types of hardware have limitations. This is a close-to
> > > > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > > > limitations we can expose this API without too many limitations for more
> > > > > > > > capable hardware.
> > > > > >
> > > > > > Sure. I will try to add hardware limitations in documentation.
> > > > > >
> > > > > > >
> > > > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > > > cannot be programmed.
> > > > > > >
> > > > > >
> > > > > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > > > > So user won't know if their request is fail.
> > > > > > It is complex to inform user which request is failing.
> > > > > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > > > > >
> > > > >
> > > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > > never make it to the wire, right?
> > >
> > > Programmable behavior is to either drop or cap to some boundary
> > > value, such as the farthest programmable time in the future: the
> > > horizon. In fq:
> > >
> > >     /* Check if packet timestamp is too far in the future. */
> > >     if (fq_packet_beyond_horizon(skb, q, now)) {
> > >             if (q->horizon_drop) {
> > >                     q->stat_horizon_drops++;
> > >                     return qdisc_drop(skb, sch, to_free);
> > >             }
> > >             q->stat_horizon_caps++;
> > >             skb->tstamp = now + q->horizon;
> > >     }
> > >     fq_skb_cb(skb)->time_to_send = skb->tstamp;
> > >
> > > Drop is the more obviously correct mode.
> > >
> > > Programming with a clock source that the driver does not support will
> > > then be a persistent failure.
> > >
> > > Preferably, this driver capability can be queried beforehand (rather
> > > than only through reading error counters afterwards).
> > >
> > > Perhaps it should not be a driver task to convert from possibly
> > > multiple clock sources to the device native clock. Right now, we do
> > > use per-device timecounters for this, implemented in the driver.
> > >
> > > As for which clocks are relevant. For PTP, I suppose the device PHC,
> > > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
> >
> > Do we need to expose some generic netdev netlink apis to query/adjust
> > nic clock sources (or maybe there is something existing already)?
> > Then the userspace can be responsible for syncing/converting the
> > timestamps to the internal nic clocks. +1 to trying to avoid doing
> > this in the drivers.
> >
> > > > > That is clearly a situation that the user should be informed about. For
> > > > > RT systems this normally means that something is really wrong regarding
> > > > > timing / cycle overflow. Such systems have to react on that situation.
> > > >
> > > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > > somehow messed up' area :-(
> > > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > > will not give any visible signal back (besides driver potentially
> > > > spilling something into dmesg as it was in the mlx case).
> > > > We can probably start with having some counters for these events?
> > >
> > > This is because the AF_XDP completion queue descriptor format is only
> > > a u64 address?
> >
> > Yeah. XDP_COPY mode has the descriptor validation which is exported via
> > recvmsg errno, but zerocopy path seems to be too deep in the stack
> > to report something back. And there is no place, as you mention,
> > in the completion ring to report the status.
> >
> > > Could error conditions be reported on tx completion in the metadata,
> > > using xsk_tx_metadata_complete?
> >
> > That would be one way to do it, yes. But then the error reporting depends
> > on the metadata opt-in. Having a separate ring to export the errors,
> > or having a v2 tx-completions layout with extra 'status' field would also
> > work.
>
> There are error counters for the non-metadata and offloading cases
> above that can be retrieved with the XDP_STATISTICS getsockopt(). From
> if_xdp.h:
>
> struct xdp_statistics {
>         __u64 rx_dropped; /* Dropped for other reasons */
>         __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 rx_ring_full; /* Dropped due to rx ring being full */
>         __u64 rx_fill_ring_empty_descs; /* Failed to retrieve item
> from fill ring */
>         __u64 tx_ring_empty_descs; /* Failed to retrieve item from tx ring */
> };
>
> Albeit, these are aggregate statistics and do not say anything about
> which packet that caused it. Works well for things that are
> programming bugs that should not occur (such as rx_invalid_descs and
> tx_invalid_descs) and requires the programmer to debug and fix his or
> her program, but it does not work for requests that might fail even
> though the program is correct and need to be handled on a packet by
> packet basis. So something needs to be added for that as you both say.
>
> Would prefer if we could avoid a v2 completion descriptor format or
> another ring that needs to be checked all the time, so if we could
> live with providing the error status in the metadata field of the
> packet at completion time, that would be good. Though having the error
> status in the completion ring would be faster as that cache line is
> hot, while the metadata section of the packet is likely not at
> completion time. So that speaks for a v2 completion ring format. Just
> thinking out loud here.

In this case, maybe adding tx_over_horizon_dropped to XDP_STATISTICS
is all we need here? We can have some new api to query this horizon
per netdev.
Hi all,

FYI, I submitted a patch [1] to enable tx metadata for the igc driver as
preparation for adding launch time to it. After the patch is accepted, I
will include the igc driver in the next version of the launch time patch set.

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20231215162158.951925-1-yoong.siang.song@intel.com/

Thanks & Regards
Siang
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 686c94c2e8a7..e8538af6e207 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -105,6 +105,8 @@ struct stmmac_metadata_request {
 	struct stmmac_priv *priv;
 	struct dma_desc *tx_desc;
 	bool *set_ic;
+	struct dma_edesc *edesc;
+	int tbs;
 };
 
 struct stmmac_xsk_tx_complete {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c2ac88aaffed..1fe80bfae24b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2465,9 +2465,20 @@ static u64 stmmac_xsk_fill_timestamp(void *_priv)
 	return 0;
 }
 
+static void stmmac_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+	struct stmmac_metadata_request *meta_req = _priv;
+	struct timespec64 ts = ns_to_timespec64(launch_time);
+
+	if (meta_req->tbs & STMMAC_TBS_EN)
+		stmmac_set_desc_tbs(meta_req->priv, meta_req->edesc, ts.tv_sec,
+				    ts.tv_nsec);
+}
+
 static const struct xsk_tx_metadata_ops stmmac_xsk_tx_metadata_ops = {
 	.tmo_request_timestamp		= stmmac_xsk_request_timestamp,
 	.tmo_fill_timestamp		= stmmac_xsk_fill_timestamp,
+	.tmo_request_launch_time	= stmmac_xsk_request_launch_time,
 };
 
 static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
@@ -2545,6 +2556,8 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
 		meta_req.priv = priv;
 		meta_req.tx_desc = tx_desc;
 		meta_req.set_ic = &set_ic;
+		meta_req.tbs = tx_q->tbs;
+		meta_req.edesc = &tx_q->dma_entx[entry];
 		xsk_tx_metadata_request(meta, &stmmac_xsk_tx_metadata_ops,
 					&meta_req);
 		if (set_ic) {
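The callback in the patch converts the 64-bit nanosecond launch time with ns_to_timespec64() and hands ts.tv_sec and ts.tv_nsec to stmmac_set_desc_tbs(). The sketch below is a user-space analogue of that split for non-negative values only; `split_launch_time()` and `struct launch_time_fields` are hypothetical names chosen for illustration, and the field widths are an assumption rather than the stmmac descriptor layout.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical holder for the two values a TBS descriptor is given. */
struct launch_time_fields {
	uint64_t sec;	/* whole seconds */
	uint32_t nsec;	/* remaining nanoseconds, always < NSEC_PER_SEC */
};

/*
 * Split an absolute launch time in nanoseconds into seconds and
 * nanoseconds. Unlike the kernel's ns_to_timespec64(), this sketch
 * takes an unsigned value and so never sees negative times.
 */
struct launch_time_fields split_launch_time(uint64_t launch_time_ns)
{
	struct launch_time_fields f;

	f.sec = launch_time_ns / NSEC_PER_SEC;
	f.nsec = (uint32_t)(launch_time_ns % NSEC_PER_SEC);
	assert(f.nsec < NSEC_PER_SEC);
	return f;
}
```

For example, a launch time of 1500000000 ns splits into 1 second and 500000000 nanoseconds. Whether the hardware interprets those fields against its PHC or another clock is exactly the clock-source question raised in the discussion.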
This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
copy via XDP Tx metadata framework.

Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 +++++++++++++
 2 files changed, 15 insertions(+)