Message ID | 20230616201113.45510-8-saeed@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Commit | 791eb78285e8b81bc09bfc6bd928b981eaefb082 |
Delegated to: | Netdev Maintainers |
Series | [net-next,01/15] net/mlx5: Ack on sync_reset_request only if PF can do reset_now |
On Fri, 16 Jun 2023 13:11:05 -0700 Saeed Mahameed wrote:
> $ cat mlx5/0000\:08\:00.0/esw/bridge/bridge1/fdb
> DEV MAC VLAN PACKETS BYTES LASTUSE FLAGS
> enp8s0f0_1 e4:0a:05:08:00:06 2 2 204 4295567112 0x0
> enp8s0f0_0 e4:0a:05:08:00:03 2 3 278 4295567112 0x0

The flags here are the only thing that's mlx5 specific?
Why not add an API for dumping this kind of stats that other drivers
can reuse?

The rest of the patches LGTM
On Sat 17 Jun 2023 at 00:48, Jakub Kicinski <kuba@kernel.org> wrote:
> On Fri, 16 Jun 2023 13:11:05 -0700 Saeed Mahameed wrote:
>> $ cat mlx5/0000\:08\:00.0/esw/bridge/bridge1/fdb
>> DEV MAC VLAN PACKETS BYTES LASTUSE FLAGS
>> enp8s0f0_1 e4:0a:05:08:00:06 2 2 204 4295567112 0x0
>> enp8s0f0_0 e4:0a:05:08:00:03 2 3 278 4295567112 0x0
>
> The flags here are the only thing that's mlx5 specific?

Not exactly. This debugfs exposes the state of our bridge offload layer.
For example, when VF representors from different eswitches are added to
the same bridge every FDB entry on such bridge will have multiple
underlying offloaded steering rules (one per eswitch instance connected
to the bridge). User will observe the entries in all connected 'fdb'
debugfs' (all except the 'main' entry will have flag
MLX5_ESW_BRIDGE_FLAG_PEER set) and their corresponding counters will
increment only on the eswitch instance that is actually processing the
packets, which depends on the mode (when bonding device is added to the
bridge in single FDB LAG mode all traffic appears on eswitch 0, without
it the traffic is on the eswitch of parent uplink of the VF). I
understand that this is rather convoluted but this is exactly why we are
going with debugfs.

> Why not add an API for dumping this kind of stats that other drivers
> can reuse?

As explained in previous paragraph we would like to expose internal mlx5
bridge layer for debug purposes, not to design generic bridge FDB
counter interface. Also, the debugging needs of our implementation may
not correspond to other drivers because we don't have a 'hardware
switch' on our NIC, so we do things like learning and ageing in
software, and have to deal with multiple possible modes of operation
(single FDB vs merged eswitch from previous example, etc.).

>
> The rest of the patches LGTM
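The explanation above implies that a person reading several 'fdb' files needs to tell 'main' entries from peer ones by the FLAGS column. A minimal userspace sketch of that check follows. It assumes the seven-column layout shown in the sample output; the struct and function names are hypothetical, and the bit position used for MLX5_ESW_BRIDGE_FLAG_PEER is an assumption that should be checked against bridge_priv.h in the kernel tree being debugged.

```c
#include <stdbool.h>
#include <stdio.h>

/* ASSUMPTION: bit position of MLX5_ESW_BRIDGE_FLAG_PEER. Verify against
 * bridge_priv.h before relying on it. */
#define FDB_FLAG_PEER_ASSUMED (1u << 1)

/* Hypothetical container for one data row of the 'fdb' debugfs file. */
struct fdb_row {
	char dev[17];			/* netdev name */
	char mac[18];			/* aa:bb:cc:dd:ee:ff */
	unsigned int vlan;
	unsigned long long packets;
	unsigned long long bytes;
	unsigned long long lastuse;	/* treated as an opaque timestamp */
	unsigned int flags;
};

/* Parse one line; returns false for the header or any malformed line. */
bool fdb_parse_row(const char *line, struct fdb_row *row)
{
	int n = sscanf(line, "%16s %17s %u %llu %llu %llu %x",
		       row->dev, row->mac, &row->vlan, &row->packets,
		       &row->bytes, &row->lastuse, &row->flags);

	return n == 7;
}

/* True when the entry is a peer copy rather than the 'main' entry. */
bool fdb_row_is_peer(const struct fdb_row *row)
{
	return (row->flags & FDB_FLAG_PEER_ASSUMED) != 0;
}
```

Feeding each line of the file to `fdb_parse_row()` and skipping lines for which it returns false (such as the header) is enough to list which eswitch instance holds only peer copies of an entry.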
On Mon, 19 Jun 2023 11:37:30 +0300 Vlad Buslov wrote:
> On Sat 17 Jun 2023 at 00:48, Jakub Kicinski <kuba@kernel.org> wrote:
> > On Fri, 16 Jun 2023 13:11:05 -0700 Saeed Mahameed wrote:
> >> $ cat mlx5/0000\:08\:00.0/esw/bridge/bridge1/fdb
> >> DEV MAC VLAN PACKETS BYTES LASTUSE FLAGS
> >> enp8s0f0_1 e4:0a:05:08:00:06 2 2 204 4295567112 0x0
> >> enp8s0f0_0 e4:0a:05:08:00:03 2 3 278 4295567112 0x0
> >
> > The flags here are the only thing that's mlx5 specific?
>
> Not exactly. This debugfs exposes the state of our bridge offload layer.
> For example, when VF representors from different eswitches are added to
> the same bridge every FDB entry on such bridge will have multiple
> underlying offloaded steering rules (one per eswitch instance connected
> to the bridge). User will observe the entries in all connected 'fdb'
> debugfs' (all except the 'main' entry will have flag
> MLX5_ESW_BRIDGE_FLAG_PEER set) and their corresponding counters will
> increment only on the eswitch instance that is actually processing the
> packets, which depends on the mode (when bonding device is added to the
> bridge in single FDB LAG mode all traffic appears on eswitch 0, without
> it the traffic is on the eswitch of parent uplink of the VF). I
> understand that this is rather convoluted but this is exactly why we are
> going with debugfs.
>
> > Why not add an API for dumping this kind of stats that other drivers
> > can reuse?
>
> As explained in previous paragraph we would like to expose internal mlx5
> bridge layer for debug purposes, not to design generic bridge FDB
> counter interface. Also, the debugging needs of our implementation may
> not correspond to other drivers because we don't have a 'hardware
> switch' on our NIC, so we do things like learning and ageing in
> software, and have to deal with multiple possible modes of operation
> (single FDB vs merged eswitch from previous example, etc.).

Looks like my pw-bot shenanigans backfired / crashed, patches didn't
get marked as Changes Requested and Dave applied the series :S

I understand the motivation but the information is easy enough to
understand to potentially tempt a user to start depending on it for
production needs. Then another vendor may get asked to implement
similar but not exactly the same set of stats etc. etc.

Do you have customer who will need this?

At the very least please follow up to make the files readable to only
root. Normal users should never look at debugfs IMO.
On Mon 19 Jun 2023 at 11:28, Jakub Kicinski <kuba@kernel.org> wrote:
> On Mon, 19 Jun 2023 11:37:30 +0300 Vlad Buslov wrote:
>> On Sat 17 Jun 2023 at 00:48, Jakub Kicinski <kuba@kernel.org> wrote:
>> > On Fri, 16 Jun 2023 13:11:05 -0700 Saeed Mahameed wrote:
>> >> $ cat mlx5/0000\:08\:00.0/esw/bridge/bridge1/fdb
>> >> DEV MAC VLAN PACKETS BYTES LASTUSE FLAGS
>> >> enp8s0f0_1 e4:0a:05:08:00:06 2 2 204 4295567112 0x0
>> >> enp8s0f0_0 e4:0a:05:08:00:03 2 3 278 4295567112 0x0
>> >
>> > The flags here are the only thing that's mlx5 specific?
>>
>> Not exactly. This debugfs exposes the state of our bridge offload layer.
>> For example, when VF representors from different eswitches are added to
>> the same bridge every FDB entry on such bridge will have multiple
>> underlying offloaded steering rules (one per eswitch instance connected
>> to the bridge). User will observe the entries in all connected 'fdb'
>> debugfs' (all except the 'main' entry will have flag
>> MLX5_ESW_BRIDGE_FLAG_PEER set) and their corresponding counters will
>> increment only on the eswitch instance that is actually processing the
>> packets, which depends on the mode (when bonding device is added to the
>> bridge in single FDB LAG mode all traffic appears on eswitch 0, without
>> it the traffic is on the eswitch of parent uplink of the VF). I
>> understand that this is rather convoluted but this is exactly why we are
>> going with debugfs.
>>
>> > Why not add an API for dumping this kind of stats that other drivers
>> > can reuse?
>>
>> As explained in previous paragraph we would like to expose internal mlx5
>> bridge layer for debug purposes, not to design generic bridge FDB
>> counter interface. Also, the debugging needs of our implementation may
>> not correspond to other drivers because we don't have a 'hardware
>> switch' on our NIC, so we do things like learning and ageing in
>> software, and have to deal with multiple possible modes of operation
>> (single FDB vs merged eswitch from previous example, etc.).
>
> Looks like my pw-bot shenanigans backfired / crashed, patches didn't
> get marked as Changes Requested and Dave applied the series :S
>
> I understand the motivation but the information is easy enough to
> understand to potentially tempt a user to start depending on it for
> production needs. Then another vendor may get asked to implement
> similar but not exactly the same set of stats etc. etc.

That could happen (although consider that bridge offload functionality
significantly predates mlx5 implementation and apparently no one really
needed that until now), but such API would supplement, not replace the
debugfs since we would like to have per-eswitch FDB state exposed
together with our internal flags and everything as explained in my
previous email.

>
> Do you have customer who will need this?

Yes. But strictly for debugging (by human), not for building some
proprietary weird user-space switch-controller application that would
query this in normal mode of operation, if I understand your concern
correctly.

>
> At the very least please follow up to make the files readable to only
> root. Normal users should never look at debugfs IMO.

Hmm, all other debugfs' in mlx5 that I tend to use for switching-related
functionality debugging seems to be 0444 (lag, steering, tc hairpin).
Why would this one be any different?
On Mon, 19 Jun 2023 21:34:02 +0300 Vlad Buslov wrote:
> > Looks like my pw-bot shenanigans backfired / crashed, patches didn't
> > get marked as Changes Requested and Dave applied the series :S
> >
> > I understand the motivation but the information is easy enough to
> > understand to potentially tempt a user to start depending on it for
> > production needs. Then another vendor may get asked to implement
> > similar but not exactly the same set of stats etc. etc.
>
> That could happen (although consider that bridge offload functionality
> significantly predates mlx5 implementation and apparently no one really
> needed that until now), but such API would supplement, not replace the
> debugfs since we would like to have per-eswitch FDB state exposed
> together with our internal flags and everything as explained in my
> previous email.

Because crossing between eswitches incurs additional cost?

> > Do you have customer who will need this?
>
> Yes. But strictly for debugging (by human), not for building some
> proprietary weird user-space switch-controller application that would
> query this in normal mode of operation, if I understand your concern
> correctly.
>
> > At the very least please follow up to make the files readable to only
> > root. Normal users should never look at debugfs IMO.
>
> Hmm, all other debugfs' in mlx5 that I tend to use for switching-related
> functionality debugging seems to be 0444 (lag, steering, tc hairpin).
> Why would this one be any different?

Querying the stats seems generally useful, so I'd like to narrow down
the access as much as possible. This way if the usage spreads we'll hear
complaints and can go back to creating a more appropriate API.
On Mon 19 Jun 2023 at 12:05, Jakub Kicinski <kuba@kernel.org> wrote:
> On Mon, 19 Jun 2023 21:34:02 +0300 Vlad Buslov wrote:
>> > Looks like my pw-bot shenanigans backfired / crashed, patches didn't
>> > get marked as Changes Requested and Dave applied the series :S
>> >
>> > I understand the motivation but the information is easy enough to
>> > understand to potentially tempt a user to start depending on it for
>> > production needs. Then another vendor may get asked to implement
>> > similar but not exactly the same set of stats etc. etc.
>>
>> That could happen (although consider that bridge offload functionality
>> significantly predates mlx5 implementation and apparently no one really
>> needed that until now), but such API would supplement, not replace the
>> debugfs since we would like to have per-eswitch FDB state exposed
>> together with our internal flags and everything as explained in my
>> previous email.
>
> Because crossing between eswitches incurs additional cost?

It is not about performance. I install multiple steering rules (one per
eswitch), I would like to understand which one is processing the packets
when something goes wrong (main or peer). User/field engineer complains
that some FDB is (not) aged out according to the expectations, I would
like them to dump the file several times while running traffic to see
how the lastused and counters changed during that. Just the basic
debugging stuff because, again, ConnectX doesn't implement 802.1D in
hardware so all the FDB management is done purely in software and we
need a way to expose the state.

>
>> > Do you have customer who will need this?
>>
>> Yes. But strictly for debugging (by human), not for building some
>> proprietary weird user-space switch-controller application that would
>> query this in normal mode of operation, if I understand your concern
>> correctly.
>>
>> > At the very least please follow up to make the files readable to only
>> > root. Normal users should never look at debugfs IMO.
>>
>> Hmm, all other debugfs' in mlx5 that I tend to use for switching-related
>> functionality debugging seems to be 0444 (lag, steering, tc hairpin).
>> Why would this one be any different?
>
> Querying the stats seems generally useful, so I'd like to narrow down
> the access as much as possible. This way if the usage spreads we'll hear
> complaints and can go back to creating a more appropriate API.

Ack.
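The debugging workflow described in this reply (dump the file several times under traffic and see how the counters and last-use value moved) amounts to subtracting two readings of the same entry. A small sketch of that comparison is below. All type and function names are hypothetical, and LASTUSE is treated as an opaque value that either changed or did not, since the thread does not state its unit.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counter columns of one FDB entry. */
struct fdb_counters {
	unsigned long long packets;
	unsigned long long bytes;
	unsigned long long lastuse;
};

/* Hypothetical result of comparing two snapshots of the same entry. */
struct fdb_delta {
	unsigned long long packets;
	unsigned long long bytes;
	bool lastuse_advanced;
};

/* Compare two readings of one entry; 'before' must be the earlier one. */
struct fdb_delta fdb_counters_delta(struct fdb_counters before,
				    struct fdb_counters after)
{
	struct fdb_delta d = {
		.packets = after.packets - before.packets,
		.bytes = after.bytes - before.bytes,
		.lastuse_advanced = after.lastuse != before.lastuse,
	};

	return d;
}

/* An entry is considered active on this eswitch if anything moved. */
bool fdb_entry_active(struct fdb_delta d)
{
	return d.packets > 0 || d.lastuse_advanced;
}
```

Running the comparison against the 'fdb' file of each eswitch connected to the bridge would show which instance is actually processing the packets, and an entry that stays inactive past the ageing time but is still listed is the kind of case the reply mentions.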
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index ddf1e352f51d..35f00700a4d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -75,7 +75,8 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH) += esw/acl/helper.o \
 	esw/acl/egress_lgcy.o esw/acl/egress_ofld.o \
 	esw/acl/ingress_lgcy.o esw/acl/ingress_ofld.o
 
-mlx5_core-$(CONFIG_MLX5_BRIDGE) += esw/bridge.o esw/bridge_mcast.o en/rep/bridge.o
+mlx5_core-$(CONFIG_MLX5_BRIDGE) += esw/bridge.o esw/bridge_mcast.o esw/bridge_debugfs.o \
+	en/rep/bridge.o
 
 mlx5_core-$(CONFIG_THERMAL) += thermal.o
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c
index eaa9b328abd5..f4fe1daa4afd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c
@@ -863,6 +863,7 @@ static struct mlx5_esw_bridge *mlx5_esw_bridge_create(struct net_device *br_netd
 	bridge->ageing_time = clock_t_to_jiffies(BR_DEFAULT_AGEING_TIME);
 	bridge->vlan_proto = ETH_P_8021Q;
 	list_add(&bridge->list, &br_offloads->bridges);
+	mlx5_esw_bridge_debugfs_init(br_netdev, bridge);
 
 	return bridge;
 
@@ -886,6 +887,7 @@ static void mlx5_esw_bridge_put(struct mlx5_esw_bridge_offloads *br_offloads,
 	if (--bridge->refcnt)
 		return;
 
+	mlx5_esw_bridge_debugfs_cleanup(bridge);
 	mlx5_esw_bridge_egress_table_cleanup(bridge);
 	mlx5_esw_bridge_mcast_disable(bridge);
 	list_del(&bridge->list);
@@ -1904,6 +1906,7 @@ struct mlx5_esw_bridge_offloads *mlx5_esw_bridge_init(struct mlx5_eswitch *esw)
 	xa_init(&br_offloads->ports);
 	br_offloads->esw = esw;
 	esw->br_offloads = br_offloads;
+	mlx5_esw_bridge_debugfs_offloads_init(br_offloads);
 
 	return br_offloads;
 }
@@ -1919,6 +1922,7 @@ void mlx5_esw_bridge_cleanup(struct mlx5_eswitch *esw)
 
 	mlx5_esw_bridge_flush(br_offloads);
 	WARN_ON(!xa_empty(&br_offloads->ports));
+	mlx5_esw_bridge_debugfs_offloads_cleanup(br_offloads);
 
 	esw->br_offloads = NULL;
 	kvfree(br_offloads);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.h
index 2f7ad3bdba5e..c2c7c70d99eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.h
@@ -10,6 +10,7 @@
 #include <linux/xarray.h>
 #include "eswitch.h"
 
+struct dentry;
 struct mlx5_flow_table;
 struct mlx5_flow_group;
 
@@ -17,6 +18,7 @@ struct mlx5_esw_bridge_offloads {
 	struct mlx5_eswitch *esw;
 	struct list_head bridges;
 	struct xarray ports;
+	struct dentry *debugfs_root;
 
 	struct notifier_block netdev_nb;
 	struct notifier_block nb_blk;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_debugfs.c
new file mode 100644
index 000000000000..b6a45eff28f5
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_debugfs.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include <linux/debugfs.h>
+#include "bridge.h"
+#include "bridge_priv.h"
+
+static void *mlx5_esw_bridge_debugfs_start(struct seq_file *seq, loff_t *pos);
+static void *mlx5_esw_bridge_debugfs_next(struct seq_file *seq, void *v, loff_t *pos);
+static void mlx5_esw_bridge_debugfs_stop(struct seq_file *seq, void *v);
+static int mlx5_esw_bridge_debugfs_show(struct seq_file *seq, void *v);
+
+static const struct seq_operations mlx5_esw_bridge_debugfs_sops = {
+	.start = mlx5_esw_bridge_debugfs_start,
+	.next = mlx5_esw_bridge_debugfs_next,
+	.stop = mlx5_esw_bridge_debugfs_stop,
+	.show = mlx5_esw_bridge_debugfs_show,
+};
+DEFINE_SEQ_ATTRIBUTE(mlx5_esw_bridge_debugfs);
+
+static void *mlx5_esw_bridge_debugfs_start(struct seq_file *seq, loff_t *pos)
+{
+	struct mlx5_esw_bridge *bridge = seq->private;
+
+	rtnl_lock();
+	return *pos ? seq_list_start(&bridge->fdb_list, *pos - 1) : SEQ_START_TOKEN;
+}
+
+static void *mlx5_esw_bridge_debugfs_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct mlx5_esw_bridge *bridge = seq->private;
+
+	return seq_list_next(v == SEQ_START_TOKEN ? &bridge->fdb_list : v, &bridge->fdb_list, pos);
+}
+
+static void mlx5_esw_bridge_debugfs_stop(struct seq_file *seq, void *v)
+{
+	rtnl_unlock();
+}
+
+static int mlx5_esw_bridge_debugfs_show(struct seq_file *seq, void *v)
+{
+	struct mlx5_esw_bridge_fdb_entry *entry;
+	u64 packets, bytes, lastuse;
+
+	if (v == SEQ_START_TOKEN) {
+		seq_printf(seq, "%-16s %-17s %4s %20s %20s %20s %5s\n",
+			   "DEV", "MAC", "VLAN", "PACKETS", "BYTES", "LASTUSE", "FLAGS");
+		return 0;
+	}
+
+	entry = list_entry(v, struct mlx5_esw_bridge_fdb_entry, list);
+	mlx5_fc_query_cached_raw(entry->ingress_counter, &bytes, &packets, &lastuse);
+	seq_printf(seq, "%-16s %-17pM %4d %20llu %20llu %20llu %#5x\n",
+		   entry->dev->name, entry->key.addr, entry->key.vid, packets, bytes, lastuse,
+		   entry->flags);
+	return 0;
+}
+
+void mlx5_esw_bridge_debugfs_init(struct net_device *br_netdev, struct mlx5_esw_bridge *bridge)
+{
+	if (!bridge->br_offloads->debugfs_root)
+		return;
+
+	bridge->debugfs_dir = debugfs_create_dir(br_netdev->name,
+						 bridge->br_offloads->debugfs_root);
+	debugfs_create_file("fdb", 0444, bridge->debugfs_dir, bridge,
+			    &mlx5_esw_bridge_debugfs_fops);
+}
+
+void mlx5_esw_bridge_debugfs_cleanup(struct mlx5_esw_bridge *bridge)
+{
+	debugfs_remove_recursive(bridge->debugfs_dir);
+	bridge->debugfs_dir = NULL;
+}
+
+void mlx5_esw_bridge_debugfs_offloads_init(struct mlx5_esw_bridge_offloads *br_offloads)
+{
+	if (!br_offloads->esw->debugfs_root)
+		return;
+
+	br_offloads->debugfs_root = debugfs_create_dir("bridge", br_offloads->esw->debugfs_root);
+}
+
+void mlx5_esw_bridge_debugfs_offloads_cleanup(struct mlx5_esw_bridge_offloads *br_offloads)
+{
+	debugfs_remove_recursive(br_offloads->debugfs_root);
+	br_offloads->debugfs_root = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_priv.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_priv.h
index c9595801bdb4..4911cc32161b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_priv.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_priv.h
@@ -199,6 +199,7 @@ struct mlx5_esw_bridge {
 	int refcnt;
 	struct list_head list;
 	struct mlx5_esw_bridge_offloads *br_offloads;
+	struct dentry *debugfs_dir;
 
 	struct list_head fdb_list;
 	struct rhashtable fdb_ht;
@@ -241,4 +242,9 @@ void mlx5_esw_bridge_port_mdb_vlan_flush(struct mlx5_esw_bridge_port *port,
 					 struct mlx5_esw_bridge_vlan *vlan);
 void mlx5_esw_bridge_mdb_flush(struct mlx5_esw_bridge *bridge);
 
+void mlx5_esw_bridge_debugfs_offloads_init(struct mlx5_esw_bridge_offloads *br_offloads);
+void mlx5_esw_bridge_debugfs_offloads_cleanup(struct mlx5_esw_bridge_offloads *br_offloads);
+void mlx5_esw_bridge_debugfs_init(struct net_device *br_netdev, struct mlx5_esw_bridge *bridge);
+void mlx5_esw_bridge_debugfs_cleanup(struct mlx5_esw_bridge *bridge);
+
 #endif /* _MLX5_ESW_BRIDGE_PRIVATE_ */
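The iterator in bridge_debugfs.c reserves position 0 for a header token and maps position N to list element N - 1, which is why start() passes `*pos - 1` to the list helper. The following userspace sketch mimics that shape over a plain array so the indexing is easy to follow. Every name in it is hypothetical, locking is omitted, and next() is simplified to work from the position alone; it illustrates the pattern and is not a rendering of the kernel seq_file API.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for SEQ_START_TOKEN: a non-NULL marker meaning "emit header". */
static int header_token;
#define START_TOKEN ((const void *)&header_token)

/* Hypothetical table: an array of preformatted rows. */
struct table {
	const char *const *rows;
	size_t nrows;
};

/* Position 0 is the header; position N (N >= 1) is rows[N - 1]. */
const void *tbl_start(const struct table *t, size_t pos)
{
	if (pos == 0)
		return START_TOKEN;
	return pos - 1 < t->nrows ? &t->rows[pos - 1] : NULL;
}

/* Advance the position and return the element there, or NULL at the end. */
const void *tbl_next(const struct table *t, size_t *pos)
{
	++*pos;
	return tbl_start(t, *pos);
}

/* Format the header or one row into buf; returns snprintf()'s result. */
int tbl_show(const void *v, char *buf, size_t len)
{
	if (v == START_TOKEN)
		return snprintf(buf, len, "DEV MAC\n");
	return snprintf(buf, len, "%s\n", *(const char *const *)v);
}

/* Drive the iterator as one full read would; stops when out of room. */
size_t tbl_dump(const struct table *t, char *out, size_t len)
{
	size_t pos = 0, used = 0;
	const void *v;

	for (v = tbl_start(t, pos); v; v = tbl_next(t, &pos)) {
		int n = tbl_show(v, out + used, len - used);

		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
	return used;
}
```

Because the header occupies its own position, a read that resumes at a non-zero position continues with the right row without printing the header again, which is the property the `*pos ? ... : SEQ_START_TOKEN` expression in the patch relies on.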