Message ID | 20230321010325.897817-2-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 6acc72a43eac78a309160d0a7512bbc59bcdd757 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Fix trainwreck with Ocelot switch statistics counters | expand |
On Tue, Mar 21, 2023 at 03:03:23AM +0200, Vladimir Oltean wrote: > The blamed commit changed struct ocelot_stat_layout :: "u32 offset" to > "u32 reg". > > However, "u32 reg" is not quite a register address, but an enum > ocelot_reg, which in itself encodes an enum ocelot_target target in the > upper bits, and an index into the ocelot->map[target][] array in the > lower bits. > > So, whereas the previous code comparison between stats_layout[i].offset > and last + 1 was correct (because those "offsets" at the time were > 32-bit relative addresses), the new code, comparing layout[i].reg to > last + 4 is not correct, because the "reg" here is an enum/index, not an > actual register address. > > What we want to compare are indeed register addresses, but to do that, > we need to actually go through the same motions as > __ocelot_bulk_read_ix() itself. > > With this bug, all statistics counters are deemed by > ocelot_prepare_stats_regions() as constituting their own region. > (Truncated) log on VSC9959 (Felix) below (prints added by me): > > Before: > > region of 1 contiguous counters starting with SYS:STAT:CNT[0x000] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x001] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x002] > ... > region of 1 contiguous counters starting with SYS:STAT:CNT[0x041] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x042] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x080] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x081] > ... > region of 1 contiguous counters starting with SYS:STAT:CNT[0x0ac] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x100] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x101] > ... > region of 1 contiguous counters starting with SYS:STAT:CNT[0x111] > > After: > > region of 67 contiguous counters starting with SYS:STAT:CNT[0x000] > region of 45 contiguous counters starting with SYS:STAT:CNT[0x080] > region of 18 contiguous counters starting with SYS:STAT:CNT[0x100] Yes, I verified this with: `trace-cmd record -p function_graph -l ocelot_* sleep 3` Before the patch series, on the VSC7512 a call to ocelot_port_update_stats() takes about 14ms, with many calls to ocelot_spi_regmap_bus_read(). After the patch series, the calls take about 2ms, with four calls to ocelot_spi_regmap_bus_read(). Acked-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Colin Foster <colin.foster@in-advantage.com>
diff --git a/drivers/net/ethernet/mscc/ocelot_stats.c b/drivers/net/ethernet/mscc/ocelot_stats.c index bdb893476832..096c81ec9dd6 100644 --- a/drivers/net/ethernet/mscc/ocelot_stats.c +++ b/drivers/net/ethernet/mscc/ocelot_stats.c @@ -899,7 +899,8 @@ static int ocelot_prepare_stats_regions(struct ocelot *ocelot) if (!layout[i].reg) continue; - if (region && layout[i].reg == last + 4) { + if (region && ocelot->map[SYS][layout[i].reg & REG_MASK] == + ocelot->map[SYS][last & REG_MASK] + 4) { region->count++; } else { region = devm_kzalloc(ocelot->dev, sizeof(*region),
The blamed commit changed struct ocelot_stat_layout :: "u32 offset" to "u32 reg". However, "u32 reg" is not quite a register address, but an enum ocelot_reg, which in itself encodes an enum ocelot_target target in the upper bits, and an index into the ocelot->map[target][] array in the lower bits. So, whereas the previous code comparison between stats_layout[i].offset and last + 1 was correct (because those "offsets" at the time were 32-bit relative addresses), the new code, comparing layout[i].reg to last + 4 is not correct, because the "reg" here is an enum/index, not an actual register address. What we want to compare are indeed register addresses, but to do that, we need to actually go through the same motions as __ocelot_bulk_read_ix() itself. With this bug, all statistics counters are deemed by ocelot_prepare_stats_regions() as constituting their own region. (Truncated) log on VSC9959 (Felix) below (prints added by me): Before: region of 1 contiguous counters starting with SYS:STAT:CNT[0x000] region of 1 contiguous counters starting with SYS:STAT:CNT[0x001] region of 1 contiguous counters starting with SYS:STAT:CNT[0x002] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x041] region of 1 contiguous counters starting with SYS:STAT:CNT[0x042] region of 1 contiguous counters starting with SYS:STAT:CNT[0x080] region of 1 contiguous counters starting with SYS:STAT:CNT[0x081] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x0ac] region of 1 contiguous counters starting with SYS:STAT:CNT[0x100] region of 1 contiguous counters starting with SYS:STAT:CNT[0x101] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x111] After: region of 67 contiguous counters starting with SYS:STAT:CNT[0x000] region of 45 contiguous counters starting with SYS:STAT:CNT[0x080] region of 18 contiguous counters starting with SYS:STAT:CNT[0x100] Since commit d87b1c08f38a ("net: mscc: ocelot: use bulk reads for stats") intended bulking as a performance improvement, and since now, with trivial-sized regions, performance is even worse than without bulking at all, this could easily qualify as a performance regression. Fixes: d4c367650704 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- drivers/net/ethernet/mscc/ocelot_stats.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)