[net-next,v1,0/9] ethtool: Add support for frame preemption

Message ID: 20201202045325.3254757-1-vinicius.gomes@intel.com

Message

Vinicius Costa Gomes Dec. 2, 2020, 4:53 a.m. UTC
Hi,

Changes from RFC v2:
 - Reorganised the offload enabling/disabling on the driver side;
 - Added a few igc fixes;
 
Changes from RFC v1:
 - The per-queue preemptible/express setting is moved to applicable
   qdiscs (Jakub Kicinski and others);
 - "min-frag-size" now follows the 802.3br specification more closely,
   it's expressed as X in '64(1 + X) + 4' (Joergen Andreasen);

Another point that should be noted is the addition of the
TC_SETUP_PREEMPT offload type. The idea behind this is to allow other
qdiscs (I was thinking of mqprio) to also configure which queues should
be marked as express/preemptible.
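
To make that a bit more concrete, here is a minimal sketch of what the
driver side of TC_SETUP_PREEMPT could look like; the offload struct
layout and the foo_*() names are illustrative assumptions, not
necessarily what the patches implement:

  /* passed to ndo_setup_tc(dev, TC_SETUP_PREEMPT, type_data) */
  struct tc_preempt_qopt_offload {
          u32 preemptible_queues; /* bit i set => HW queue i is preemptible */
  };

  static int foo_setup_tc(struct net_device *dev, enum tc_setup_type type,
                          void *type_data)
  {
          struct foo_adapter *adapter = netdev_priv(dev);

          switch (type) {
          case TC_SETUP_PREEMPT: {
                  struct tc_preempt_qopt_offload *preempt = type_data;

                  /* remember the express/preemptible split; it only takes
                   * effect in hardware once frame preemption is enabled
                   * via ethtool
                   */
                  adapter->preemptible_queues = preempt->preemptible_queues;
                  return foo_tsn_reconfigure(adapter);
          }
          default:
                  return -EOPNOTSUPP;
          }
  }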

Original cover letter:

This is still an RFC for two main reasons. First, I want to confirm
that this approach (per-queue settings via qdiscs, device settings via
ethtool) looks good, even though there aren't many other options left ;-)
Second, while testing this I found some weirdness in the driver that I
need a bit more time to investigate.

(In case these patches are not enough to give an idea of how things
work, I can send the userspace patches, of course.)

The idea of this "hybrid" approach is that applications/users would do
the following steps to configure frame preemption:

$ tc qdisc replace dev $IFACE parent root handle 100 taprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      base-time $BASE_TIME \
      sched-entry S 0f 10000000 \
      preempt 1110 \
      flags 0x2 

The "preempt" parameter is the only difference, it configures which
queues are marked as preemptible, in this example, queue 0 is marked
as "not preemptible", so it is express, the rest of the four queues
are preemptible.

The next step in this example would be to enable frame preemption in
the device via ethtool and set the minimum fragment size to 2:

$ sudo ./ethtool --set-frame-preemption $IFACE fp on min-frag-size 2
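
(For a sense of scale, and assuming I am reading the '64(1 + X) + 4'
encoding above correctly: min-frag-size 2 means non-final fragments are
at least 64 * (1 + 2) + 4 = 196 octets, while the default X = 0 would
give 64 + 4 = 68 octets.)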


Cheers,


Vinicius Costa Gomes (9):
  ethtool: Add support for configuring frame preemption
  taprio: Add support for frame preemption offload
  igc: Set the RX packet buffer size for TSN mode
  igc: Only dump registers if configured to dump HW information
  igc: Avoid TX Hangs because long cycles
  igc: Add support for tuning frame preemption via ethtool
  igc: Add support for Frame Preemption offload
  igc: Add support for exposing frame preemption stats registers
  igc: Separate TSN configurations that can be updated

 drivers/net/ethernet/intel/igc/igc.h         |  10 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |   6 +
 drivers/net/ethernet/intel/igc/igc_dump.c    |   3 +
 drivers/net/ethernet/intel/igc/igc_ethtool.c |  63 ++++++++
 drivers/net/ethernet/intel/igc/igc_main.c    |  31 +++-
 drivers/net/ethernet/intel/igc/igc_regs.h    |  10 ++
 drivers/net/ethernet/intel/igc/igc_tsn.c     | 158 ++++++++++++++-----
 drivers/net/ethernet/intel/igc/igc_tsn.h     |   1 +
 include/linux/ethtool.h                      |  19 +++
 include/linux/netdevice.h                    |   1 +
 include/net/pkt_sched.h                      |   4 +
 include/uapi/linux/ethtool_netlink.h         |  17 ++
 include/uapi/linux/pkt_sched.h               |   1 +
 net/ethtool/Makefile                         |   2 +-
 net/ethtool/netlink.c                        |  19 +++
 net/ethtool/netlink.h                        |   4 +
 net/ethtool/preempt.c                        | 151 ++++++++++++++++++
 net/sched/sch_taprio.c                       |  41 ++++-
 18 files changed, 491 insertions(+), 50 deletions(-)
 create mode 100644 net/ethtool/preempt.c

Comments

Jakub Kicinski Dec. 5, 2020, 5:50 p.m. UTC | #1
On Tue,  1 Dec 2020 20:53:16 -0800 Vinicius Costa Gomes wrote:
> $ tc qdisc replace dev $IFACE parent root handle 100 taprio \
>       num_tc 3 \
>       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>       queues 1@0 1@1 2@2 \
>       base-time $BASE_TIME \
>       sched-entry S 0f 10000000 \
>       preempt 1110 \
>       flags 0x2 
> 
> The "preempt" parameter is the only difference, it configures which
> queues are marked as preemptible, in this example, queue 0 is marked
> as "not preemptible", so it is express, the rest of the four queues
> are preemptible.

Does it make more sense for the individual queues to be preemptible
or not, or is it better controlled at the traffic class level?
I was looking at patch 2, and 32 queues isn't that many these days.
We either need a larger type there or configure this based on classes.
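
As a side note on the sizing question, and purely as an illustration
(the field names below are made up): a fixed-width per-queue bitmask is
limited by the width of its type, while netdev traffic classes are
already capped at 16, so a per-class mask never outgrows a small type:

  u32 preemptible_queues; /* per-queue: a u32 only describes 32 queues */
  u16 preemptible_tcs;    /* per-TC: netdev traffic classes are capped at
                           * 16 (TC_MAX_QUEUE), so a u16 mask is enough */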
Vinicius Costa Gomes Dec. 7, 2020, 10:49 p.m. UTC | #2
Jakub Kicinski <kuba@kernel.org> writes:

> On Tue,  1 Dec 2020 20:53:16 -0800 Vinicius Costa Gomes wrote:
>> $ tc qdisc replace dev $IFACE parent root handle 100 taprio \
>>       num_tc 3 \
>>       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>>       queues 1@0 1@1 2@2 \
>>       base-time $BASE_TIME \
>>       sched-entry S 0f 10000000 \
>>       preempt 1110 \
>>       flags 0x2 
>> 
>> The "preempt" parameter is the only difference, it configures which
>> queues are marked as preemptible, in this example, queue 0 is marked
>> as "not preemptible", so it is express, the rest of the four queues
>> are preemptible.
>
> Does it make more sense for the individual queues to be preemptible 
> or not, or is it better controlled at traffic class level?
> I was looking at patch 2, and 32 queues isn't that many these days..
> We either need a larger type there or configure this based on classes.

I can use more future-proof sizes for expressing the queues, sure, but
the issue, I think, is that frame preemption has diminishing returns as
link speed increases: at 2.5G the latency improvements are on the order
of single-digit microseconds. At greater speeds the improvements are
even less noticeable.
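(Back-of-envelope: the worst-case blocking a full-size frame causes is
roughly its wire time, so about 12 us for a 1500-byte frame at 1 Gb/s
but only about 5 us at 2.5 Gb/s, and proportionally less at higher
speeds, which is where the single-digit-microsecond figure comes from.)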

The only adapters I have seen that support frame preemption have 8
queues or fewer.

The idea of configuring frame preemption based on classes is
interesting. I will play with it and see how it looks.


Cheers,
Vladimir Oltean Dec. 7, 2020, 11:12 p.m. UTC | #3
On Mon, Dec 07, 2020 at 02:49:35PM -0800, Vinicius Costa Gomes wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
>
> > On Tue,  1 Dec 2020 20:53:16 -0800 Vinicius Costa Gomes wrote:
> >> $ tc qdisc replace dev $IFACE parent root handle 100 taprio \
> >>       num_tc 3 \
> >>       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
> >>       queues 1@0 1@1 2@2 \
> >>       base-time $BASE_TIME \
> >>       sched-entry S 0f 10000000 \
> >>       preempt 1110 \
> >>       flags 0x2
> >>
> >> The "preempt" parameter is the only difference, it configures which
> >> queues are marked as preemptible, in this example, queue 0 is marked
> >> as "not preemptible", so it is express, the rest of the four queues
> >> are preemptible.
> >
> > Does it make more sense for the individual queues to be preemptible
> > or not, or is it better controlled at traffic class level?
> > I was looking at patch 2, and 32 queues isn't that many these days..
> > We either need a larger type there or configure this based on classes.
>
> I can set more future proof sizes for expressing the queues, sure, but
> the issue, I think, is that frame preemption has dimishing returns with
> link speed: at 2.5G the latency improvements are on the order of single
> digit microseconds. At greater speeds the improvements are even less
> noticeable.

You could look at it another way.
You can enable jumbo frames in your network, and your latency-sensitive
traffic would not suffer as long as the jumbo frames are preemptible.
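(Rough numbers, as an illustration of that point: a 9000-byte jumbo
frame occupies a 1 Gb/s link for about 72 us, so without preemption
that is the worst-case blocking an express frame can see; with
preemption the blocking drops to roughly one minimum fragment, on the
order of a microsecond or two at that speed.)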

> The only adapters that I see that support frame preemtion have 8 queues
> or less.
>
> The idea of configuring frame preemption based on classes is
> interesting. I will play with it, and see how it looks.

I admit I never understood why you insist on configuring TSN offloads
per hardware queue and not per traffic class.
Vinicius Costa Gomes Dec. 8, 2020, 12:34 a.m. UTC | #4
Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Mon, Dec 07, 2020 at 02:49:35PM -0800, Vinicius Costa Gomes wrote:
>> Jakub Kicinski <kuba@kernel.org> writes:
>>
>> > On Tue,  1 Dec 2020 20:53:16 -0800 Vinicius Costa Gomes wrote:
>> >> $ tc qdisc replace dev $IFACE parent root handle 100 taprio \
>> >>       num_tc 3 \
>> >>       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>> >>       queues 1@0 1@1 2@2 \
>> >>       base-time $BASE_TIME \
>> >>       sched-entry S 0f 10000000 \
>> >>       preempt 1110 \
>> >>       flags 0x2
>> >>
>> >> The "preempt" parameter is the only difference, it configures which
>> >> queues are marked as preemptible, in this example, queue 0 is marked
>> >> as "not preemptible", so it is express, the rest of the four queues
>> >> are preemptible.
>> >
>> > Does it make more sense for the individual queues to be preemptible
>> > or not, or is it better controlled at traffic class level?
>> > I was looking at patch 2, and 32 queues isn't that many these days..
>> > We either need a larger type there or configure this based on classes.
>>
>> I can set more future proof sizes for expressing the queues, sure, but
>> the issue, I think, is that frame preemption has dimishing returns with
>> link speed: at 2.5G the latency improvements are on the order of single
>> digit microseconds. At greater speeds the improvements are even less
>> noticeable.
>
> You could look at it another way.
> You can enable jumbo frames in your network, and your latency-sensitive
> traffic would not suffer as long as the jumbo frames are preemptible.
>

Speaking of jumbo frames, that's something the standards don't really
cover: combining TSN features with jumbo frames leaves a lot up to the
implementation.

>> The only adapters that I see that support frame preemtion have 8 queues
>> or less.
>>
>> The idea of configuring frame preemption based on classes is
>> interesting. I will play with it, and see how it looks.
>
> I admit I never understood why you insist on configuring TSN offloads
> per hardware queue and not per traffic class.

So, I am sorry that I wasn't able to fully understand what you were
saying, then.

I always thought you meant that the driver should be responsible for
making the 'traffic class to queue' translation, rather than that the
user-facing configuration interface for frame preemption (taprio,
mqprio, etc.) should be expressed in terms of traffic classes instead
of queues.

My bad.


Cheers,