
[net-next,3/3] net: ethernet: ti: am65-cpsw: enable broadcast/multicast rate limit support

Message ID 20201114035654.32658-4-grygorii.strashko@ti.com (mailing list archive)
State New, archived
Series net: ethernet: ti: cpsw: enable broadcast/multicast rate limit support

Commit Message

Grygorii Strashko Nov. 14, 2020, 3:56 a.m. UTC
This patch enables support for ingress broadcast (BC)/multicast (MC) rate limiting
in the TI AM65x CPSW driver (the corresponding ALE support was added in the
previous patch) by implementing HW offload for a simple tc-flower policer with a
match on dst_mac:
 - ff:ff:ff:ff:ff:ff has to be used for BC rate limiting
 - the fixed value 01:00:00:00:00:00 has to be used for MC rate limiting

The tc policer defines its rate limit in terms of bits per second, but the
ALE supports limiting only in terms of packets per second, so the rate limit
in bits/sec is converted to a number of packets per second assuming the minimum
Ethernet packet size ETH_ZLEN=60 bytes.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police rate 480kbit burst 64k

  rate 480kbit = 1000pps * 60 bytes * 8; burst is not used.

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00 \
  action police rate 9600kbit burst 64k

  rate 9600kbit = 20000pps * 60 bytes * 8; burst is not used.
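The arithmetic behind the example rates can be checked with a short sketch (the helper names below are illustrative and not part of the driver; only the ETH_ZLEN=60 assumption comes from the patch):

```python
ETH_ZLEN = 60  # minimum Ethernet frame size in bytes, as assumed by the driver

def rate_bits_for_pps(pps: int) -> int:
    """Convert a target packet rate to the tc 'rate' argument in bits/s."""
    return pps * ETH_ZLEN * 8

def pps_for_rate_bytes_ps(rate_bytes_ps: int) -> int:
    """Mirror of the driver's div_u64(rate_bytes_ps, ETH_ZLEN) conversion."""
    return rate_bytes_ps // ETH_ZLEN

# 1000 pps -> 480000 bit/s ("480kbit"); 20000 pps -> 9600000 bit/s ("9600kbit")
print(rate_bits_for_pps(1000), rate_bits_for_pps(20000))
```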

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/am65-cpsw-qos.c | 148 ++++++++++++++++++++++++
 drivers/net/ethernet/ti/am65-cpsw-qos.h |   8 ++
 2 files changed, 156 insertions(+)

Comments

Vladimir Oltean Nov. 14, 2020, 7:17 p.m. UTC | #1
On Sat, Nov 14, 2020 at 05:56:54AM +0200, Grygorii Strashko wrote:
> This patch enables support for ingress broadcast(BC)/multicast(MC) rate limiting
> in TI AM65x CPSW driver (the corresponding ALE support was added in previous
> patch) by implementing HW offload for simple tc-flower policer with matches
> on dst_mac:
>  - ff:ff:ff:ff:ff:ff has to be used for BC rate limiting
>  - 01:00:00:00:00:00 fixed value has to be used for MC rate limiting
> 
> Hence tc policer defines rate limit in terms of bits per second, but the
> ALE supports limiting in terms of packets per second - the rate limit
> bits/sec is converted to number of packets per second assuming minimum
> Ethernet packet size ETH_ZLEN=60 bytes.
> 
> Examples:
> - BC rate limit to 1000pps:
>   tc qdisc add dev eth0 clsact
>   tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
>   action police rate 480kbit burst 64k
> 
>   rate 480kbit - 1000pps * 60 bytes * 8, burst - not used.
> 
> - MC rate limit to 20000pps:
>   tc qdisc add dev eth0 clsact
>   tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00 \
>   action police rate 9600kbit burst 64k
> 
>   rate 9600kbit - 20000pps * 60 bytes * 8, burst - not used.
> 
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---

I understand this is unpleasant feedback, but don't you want to extend
tc-police to have an option to rate-limit based on packet count and not
based on byte count? The assumption you make in the driver that the
packets are all going to be minimum-sized is not a great one. I can
imagine that the user's policer budget is vastly exceeded if they enable
jumbo frames and they put a policer at 9.6 Mbps, and this is not at all
according to their expectation. 20Kpps assuming 60 bytes per packet
might be 9.6 Mbps, and the user will assume this bandwidth profile is
not exceeded, that's the whole point. But 20Kpps assuming 9KB per packet
is 1.44Gbps. Weird.
Grygorii Strashko Nov. 16, 2020, 6:39 p.m. UTC | #2
On 14/11/2020 21:17, Vladimir Oltean wrote:
> On Sat, Nov 14, 2020 at 05:56:54AM +0200, Grygorii Strashko wrote:
>> This patch enables support for ingress broadcast(BC)/multicast(MC) rate limiting
>> in TI AM65x CPSW driver (the corresponding ALE support was added in previous
>> patch) by implementing HW offload for simple tc-flower policer with matches
>> on dst_mac:
>>   - ff:ff:ff:ff:ff:ff has to be used for BC rate limiting
>>   - 01:00:00:00:00:00 fixed value has to be used for MC rate limiting
>>
>> Hence tc policer defines rate limit in terms of bits per second, but the
>> ALE supports limiting in terms of packets per second - the rate limit
>> bits/sec is converted to number of packets per second assuming minimum
>> Ethernet packet size ETH_ZLEN=60 bytes.
>>
>> Examples:
>> - BC rate limit to 1000pps:
>>    tc qdisc add dev eth0 clsact
>>    tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
>>    action police rate 480kbit burst 64k
>>
>>    rate 480kbit - 1000pps * 60 bytes * 8, burst - not used.
>>
>> - MC rate limit to 20000pps:
>>    tc qdisc add dev eth0 clsact
>>    tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00 \
>>    action police rate 9600kbit burst 64k
>>
>>    rate 9600kbit - 20000pps * 60 bytes * 8, burst - not used.
>>
>> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
>> ---
> 
> I understand this is unpleasant feedback, but don't you want to extend
> tc-police to have an option to rate-limit based on packet count and not
> based on byte count?

Huh.
I'd appreciate it if you could provide a more detailed suggestion of how that could look.
Sorry, this is my first experience with tc.

> The assumption you make in the driver that the
> packets are all going to be minimum-sized is not a great one.
> I can
> imagine that the user's policer budget is vastly exceeded if they enable
> jumbo frames and they put a policer at 9.6 Mbps, and this is not at all
> according to their expectation. 20Kpps assuming 60 bytes per packet
> might be 9.6 Mbps, and the user will assume this bandwidth profile is
> not exceeded, that's the whole point. But 20Kpps assuming 9KB per packet
> is 1.44Gbps. Weird.

Sorry, I'm not sure I completely understood the above. This is a specific HW feature, which can
limit packet rate only, and it is expected to be applied by an admin who knows what he is doing.
The main purpose is to preserve CPU resources, which are affected first of all by packet rate.
So I see it not as an "assumption", but as a requirement/agreement which will be reflected
in the docs and works for a specific use case, determined by the presence of:
  - the skip_sw flag
  - a specific dst_mac/dst_mac_mask pair
in which case the rate determines pps and not K/Mbps.


Also, some references to previous discussions: [1] [2]
[1] https://www.spinics.net/lists/netdev/msg494630.html
[2] https://lore.kernel.org/patchwork/patch/481285/
Vladimir Oltean Nov. 16, 2020, 6:59 p.m. UTC | #3
On Mon, Nov 16, 2020 at 08:39:54PM +0200, Grygorii Strashko wrote:
> 
> 
> On 14/11/2020 21:17, Vladimir Oltean wrote:
> > On Sat, Nov 14, 2020 at 05:56:54AM +0200, Grygorii Strashko wrote:
> > > This patch enables support for ingress broadcast(BC)/multicast(MC) rate limiting
> > > in TI AM65x CPSW driver (the corresponding ALE support was added in previous
> > > patch) by implementing HW offload for simple tc-flower policer with matches
> > > on dst_mac:
> > >   - ff:ff:ff:ff:ff:ff has to be used for BC rate limiting
> > >   - 01:00:00:00:00:00 fixed value has to be used for MC rate limiting
> > > 
> > > Hence tc policer defines rate limit in terms of bits per second, but the
> > > ALE supports limiting in terms of packets per second - the rate limit
> > > bits/sec is converted to number of packets per second assuming minimum
> > > Ethernet packet size ETH_ZLEN=60 bytes.
> > > 
> > > Examples:
> > > - BC rate limit to 1000pps:
> > >    tc qdisc add dev eth0 clsact
> > >    tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
> > >    action police rate 480kbit burst 64k
> > > 
> > >    rate 480kbit - 1000pps * 60 bytes * 8, burst - not used.
> > > 
> > > - MC rate limit to 20000pps:
> > >    tc qdisc add dev eth0 clsact
> > >    tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00 \
> > >    action police rate 9600kbit burst 64k
> > > 
> > >    rate 9600kbit - 20000pps * 60 bytes * 8, burst - not used.
> > > 
> > > Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> > > ---
> > 
> > I understand this is unpleasant feedback, but don't you want to extend
> > tc-police to have an option to rate-limit based on packet count and not
> > based on byte count?
> 
> Huh.
> I'd be appreciated if you could provide more detailed opinion of how it can look like?
> Sry, it's my first experience with tc.

Same as above, just in packets per second.

tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress flower skip_sw \
	dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
	action police rate 20kpps

> > The assumption you make in the driver that the
> > packets are all going to be minimum-sized is not a great one.
> > I can
> > imagine that the user's policer budget is vastly exceeded if they enable
> > jumbo frames and they put a policer at 9.6 Mbps, and this is not at all
> > according to their expectation. 20Kpps assuming 60 bytes per packet
> > might be 9.6 Mbps, and the user will assume this bandwidth profile is
> > not exceeded, that's the whole point. But 20Kpps assuming 9KB per packet
> > is 1.44Gbps. Weird.
> 
> Sry, not sure I completely understood above. This is specific HW feature, which can
> imit packet rate only. And it is expected to be applied by admin who know what he is doing.

Yes but you're not helping the admin to "know what he's doing" if you're
asking them to translate apples into oranges. A policer that counts
packets is not equivalent to a policer that counts bytes, unless all
packets are guaranteed to be of equal length, something which you cannot
ensure.
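The mismatch can be made concrete with a quick calculation (a plain illustration of the numbers quoted above, nothing driver-specific):

```python
def pps_to_bps(pps: int, frame_bytes: int) -> int:
    """Bit rate implied by a packets-per-second limit at a given frame size."""
    return pps * frame_bytes * 8

# The same 20 Kpps hardware limit implies wildly different bit rates:
minimum = pps_to_bps(20_000, 60)      # minimum-sized frames: 9.6 Mbps
jumbo = pps_to_bps(20_000, 9_000)     # 9 KB jumbo frames: 1.44 Gbps
print(minimum, jumbo)
```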

> The main purpose is to preserve CPU resource, which first of all affected by packet rate.
> So, I see it as not "assumption", but requirement/agreement which will be reflected
> in docs and works for a specific use case which is determined by presence of:
>  - skip_sw flag
>  - specific dst_mac/dst_mac_mask pair
> in which case rate determines pps and not K/Mbps.
> 
> 
> Also some ref on previous discussion [1] [2]
> [1] https://www.spinics.net/lists/netdev/msg494630.html
> [2] https://lore.kernel.org/patchwork/patch/481285/

ethtool coalescing as a tool to configure a policer makes zero sense.

You are definitely on the right path with the tc-police action. This was
just trying to be constructive feedback that the software implementation
of tc-police needs more work before your hardware could offload its job
in a way that would not violate its semantics.
Andrew Lunn Nov. 16, 2020, 11:30 p.m. UTC | #4
> Same as above, just in packets per second.
> 
> tc qdisc add dev eth0 clsact
> tc filter add dev eth0 ingress flower skip_sw \
> 	dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
> 	action police rate 20kpps

I agree with Vladimir here. Since the hardware does PPS limits, the TC
API should also be PPS limit based. And as you said, CPU load is more
a factor of PPS than BPS, so it is a useful feature in general to
have. You just need to implement the software version first, before
you offload it to the hardware.

    Andrew

Patch

diff --git a/drivers/net/ethernet/ti/am65-cpsw-qos.c b/drivers/net/ethernet/ti/am65-cpsw-qos.c
index 3bdd4dbcd2ff..a06207233cd5 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-qos.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-qos.c
@@ -8,10 +8,12 @@ 
 
 #include <linux/pm_runtime.h>
 #include <linux/time.h>
+#include <net/pkt_cls.h>
 
 #include "am65-cpsw-nuss.h"
 #include "am65-cpsw-qos.h"
 #include "am65-cpts.h"
+#include "cpsw_ale.h"
 
 #define AM65_CPSW_REG_CTL			0x004
 #define AM65_CPSW_PN_REG_CTL			0x004
@@ -588,12 +590,158 @@  static int am65_cpsw_setup_taprio(struct net_device *ndev, void *type_data)
 	return am65_cpsw_set_taprio(ndev, type_data);
 }
 
+static int am65_cpsw_qos_clsflower_add_policer(struct am65_cpsw_port *port,
+					       struct netlink_ext_ack *extack,
+					       struct flow_cls_offload *cls,
+					       u64 rate_bytes_ps)
+{
+	struct flow_rule *rule = flow_cls_offload_flow_rule(cls);
+	struct flow_dissector *dissector = rule->match.dissector;
+	u8 null_mac[] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+	u8 bc_mac[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
+	u8 mc_mac[] = {0x01, 0x00, 0x00, 0x00, 0x00, 0x00};
+	struct am65_cpsw_qos *qos = &port->qos;
+	struct flow_match_eth_addrs match;
+	u32 pps;
+	int ret;
+
+	if (dissector->used_keys &
+	    ~(BIT(FLOW_DISSECTOR_KEY_BASIC) |
+	      BIT(FLOW_DISSECTOR_KEY_CONTROL) |
+	      BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS))) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Unsupported keys used");
+		return -EOPNOTSUPP;
+	}
+
+	if (!flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+		NL_SET_ERR_MSG_MOD(extack, "Not matching on eth address");
+		return -EOPNOTSUPP;
+	}
+
+	flow_rule_match_eth_addrs(rule, &match);
+
+	if (!ether_addr_equal_masked(match.key->src, null_mac,
+				     match.mask->src)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Matching on source MAC not supported");
+		return -EOPNOTSUPP;
+	}
+
+	/* Calculate number of packets per second for given bps
+	 * assuming min ethernet packet size
+	 */
+	pps = div_u64(rate_bytes_ps, ETH_ZLEN);
+
+	if (ether_addr_equal(match.key->dst, bc_mac)) {
+		ret = cpsw_ale_rx_ratelimit_bc(port->common->ale, port->port_id, pps);
+		if (ret)
+			return ret;
+
+		qos->ale_bc_ratelimit.cookie = cls->cookie;
+		qos->ale_bc_ratelimit.rate_packet_ps = pps;
+	}
+
+	if (ether_addr_equal(match.key->dst, mc_mac)) {
+		ret = cpsw_ale_rx_ratelimit_mc(port->common->ale, port->port_id, pps);
+		if (ret)
+			return ret;
+
+		qos->ale_mc_ratelimit.cookie = cls->cookie;
+		qos->ale_mc_ratelimit.rate_packet_ps = pps;
+	}
+
+	return 0;
+}
+
+static int am65_cpsw_qos_configure_clsflower(struct am65_cpsw_port *port,
+					     struct flow_cls_offload *cls)
+{
+	struct flow_rule *rule = flow_cls_offload_flow_rule(cls);
+	struct netlink_ext_ack *extack = cls->common.extack;
+	const struct flow_action_entry *act;
+	int i;
+
+	flow_action_for_each(i, act, &rule->action) {
+		switch (act->id) {
+		case FLOW_ACTION_POLICE:
+			return am65_cpsw_qos_clsflower_add_policer(port, extack, cls,
+							 act->police.rate_bytes_ps);
+		default:
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Action not supported");
+			return -EOPNOTSUPP;
+		}
+	}
+	return -EOPNOTSUPP;
+}
+
+static int am65_cpsw_qos_delete_clsflower(struct am65_cpsw_port *port, struct flow_cls_offload *cls)
+{
+	struct am65_cpsw_qos *qos = &port->qos;
+
+	if (cls->cookie == qos->ale_bc_ratelimit.cookie) {
+		qos->ale_bc_ratelimit.cookie = 0;
+		qos->ale_bc_ratelimit.rate_packet_ps = 0;
+		cpsw_ale_rx_ratelimit_bc(port->common->ale, port->port_id, 0);
+	}
+
+	if (cls->cookie == qos->ale_mc_ratelimit.cookie) {
+		qos->ale_mc_ratelimit.cookie = 0;
+		qos->ale_mc_ratelimit.rate_packet_ps = 0;
+		cpsw_ale_rx_ratelimit_mc(port->common->ale, port->port_id, 0);
+	}
+
+	return 0;
+}
+
+static int am65_cpsw_qos_setup_tc_clsflower(struct am65_cpsw_port *port,
+					    struct flow_cls_offload *cls_flower)
+{
+	switch (cls_flower->command) {
+	case FLOW_CLS_REPLACE:
+		return am65_cpsw_qos_configure_clsflower(port, cls_flower);
+	case FLOW_CLS_DESTROY:
+		return am65_cpsw_qos_delete_clsflower(port, cls_flower);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static int am65_cpsw_qos_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv)
+{
+	struct am65_cpsw_port *port = cb_priv;
+
+	if (!tc_cls_can_offload_and_chain0(port->ndev, type_data))
+		return -EOPNOTSUPP;
+
+	switch (type) {
+	case TC_SETUP_CLSFLOWER:
+		return am65_cpsw_qos_setup_tc_clsflower(port, type_data);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static LIST_HEAD(am65_cpsw_qos_block_cb_list);
+
+static int am65_cpsw_qos_setup_tc_block(struct net_device *ndev, struct flow_block_offload *f)
+{
+	struct am65_cpsw_port *port = am65_ndev_to_port(ndev);
+
+	return flow_block_cb_setup_simple(f, &am65_cpsw_qos_block_cb_list,
+					  am65_cpsw_qos_setup_tc_block_cb,
+					  port, port, true);
+}
+
 int am65_cpsw_qos_ndo_setup_tc(struct net_device *ndev, enum tc_setup_type type,
 			       void *type_data)
 {
 	switch (type) {
 	case TC_SETUP_QDISC_TAPRIO:
 		return am65_cpsw_setup_taprio(ndev, type_data);
+	case TC_SETUP_BLOCK:
+		return am65_cpsw_qos_setup_tc_block(ndev, type_data);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/ti/am65-cpsw-qos.h b/drivers/net/ethernet/ti/am65-cpsw-qos.h
index e8f1b6b59e93..fb223b43b196 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-qos.h
+++ b/drivers/net/ethernet/ti/am65-cpsw-qos.h
@@ -14,11 +14,19 @@  struct am65_cpsw_est {
 	struct tc_taprio_qopt_offload taprio;
 };
 
+struct am65_cpsw_ale_ratelimit {
+	unsigned long cookie;
+	u64 rate_packet_ps;
+};
+
 struct am65_cpsw_qos {
 	struct am65_cpsw_est *est_admin;
 	struct am65_cpsw_est *est_oper;
 	ktime_t link_down_time;
 	int link_speed;
+
+	struct am65_cpsw_ale_ratelimit ale_bc_ratelimit;
+	struct am65_cpsw_ale_ratelimit ale_mc_ratelimit;
 };
 
 int am65_cpsw_qos_ndo_setup_tc(struct net_device *ndev, enum tc_setup_type type,