diff mbox series

[net-next,v14,5/7] net: dsa: mv88e6xxx: rmu: Add functionality to get RMON

Message ID 20220919110847.744712-6-mattias.forsblad@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: dsa: qca8k, mv88e6xxx: rmon: Add RMU support | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns WARNING: line length of 98 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Mattias Forsblad Sept. 19, 2022, 11:08 a.m. UTC
It's possible to access the RMON counters via RMU.
Add functionality to send DUMP_MIB frames.

Signed-off-by: Mattias Forsblad <mattias.forsblad@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 74 ++++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/chip.h |  6 +++
 drivers/net/dsa/mv88e6xxx/rmu.c  | 53 +++++++++++++++++++++++
 3 files changed, 133 insertions(+)

Comments

Vladimir Oltean Sept. 19, 2022, 10:49 p.m. UTC | #1
On Mon, Sep 19, 2022 at 01:08:45PM +0200, Mattias Forsblad wrote:
> @@ -255,6 +299,15 @@ int mv88e6xxx_rmu_setup(struct mv88e6xxx_chip *chip)
>  	chip->rmu.smi_ops = chip->smi_ops;
>  	chip->rmu.ds_ops = chip->ds->ops;
>  
> +	/* Change rmu ops with our own pointer */
> +	chip->rmu.smi_rmu_ops = (struct mv88e6xxx_bus_ops *)chip->rmu.smi_ops;
> +	chip->rmu.smi_rmu_ops->get_rmon = mv88e6xxx_rmu_stats_get;

The patch splitting is still so poor, that I can't even complain about
this bug properly, since no one uses this pointer for now (and when it
will be used, it will not be obvious how it's assigned).

In short, it's called like this:

static void mv88e6xxx_get_ethtool_stats(struct dsa_switch *ds, int port,
					uint64_t *data)
{
	struct mv88e6xxx_chip *chip = ds->priv;

	chip->smi_ops->get_rmon(chip, port, data);
}

When the switch doesn't support RMU operations, or when the RMU setup
simply failed, "ethtool -S" will dereference a very NULL pointer during
that indirect call, because mv88e6xxx_rmu_setup() is unconditionally
called for every switch during setup. Probably not a wise choice to
start off with the RMU ops for ethtool -S.

Also, not clear, if RMU fails, why we don't make an effort to fall back
to MDIO for that user space request.

> +
> +	/* Also change get stats strings for our own */

Who is "our own", and implicitly, who are they? You'd expect that there
aren't tribalist factions within the same driver.

> +	chip->rmu.ds_rmu_ops = (struct dsa_switch_ops *)chip->ds->ops;
> +	chip->rmu.ds_rmu_ops->get_sset_count = mv88e6xxx_stats_get_sset_count_rmu;
> +	chip->rmu.ds_rmu_ops->get_strings = mv88e6xxx_stats_get_strings_rmu;
> +

Those who cast a const to a non-const pointer and proceed to modify the
read-only structure behind it should go to patchwork arrest for one week.

>  	/* Start disabled, we'll enable RMU in master_state_change */
>  	if (chip->info->ops->rmu_disable)
>  		return chip->info->ops->rmu_disable(chip);
> -- 
> 2.25.1
>
Mattias Forsblad Sept. 20, 2022, 12:26 p.m. UTC | #2
On 2022-09-20 00:49, Vladimir Oltean wrote:
> On Mon, Sep 19, 2022 at 01:08:45PM +0200, Mattias Forsblad wrote:
>> @@ -255,6 +299,15 @@ int mv88e6xxx_rmu_setup(struct mv88e6xxx_chip *chip)
>>  	chip->rmu.smi_ops = chip->smi_ops;
>>  	chip->rmu.ds_ops = chip->ds->ops;
>>  
>> +	/* Change rmu ops with our own pointer */
>> +	chip->rmu.smi_rmu_ops = (struct mv88e6xxx_bus_ops *)chip->rmu.smi_ops;
>> +	chip->rmu.smi_rmu_ops->get_rmon = mv88e6xxx_rmu_stats_get;
> 
> The patch splitting is still so poor, that I can't even complain about
> this bug properly, since no one uses this pointer for now (and when it
> will be used, it will not be obvious how it's assigned).
> 
> In short, it's called like this:
> 
> static void mv88e6xxx_get_ethtool_stats(struct dsa_switch *ds, int port,
> 					uint64_t *data)
> {
> 	struct mv88e6xxx_chip *chip = ds->priv;
> 
> 	chip->smi_ops->get_rmon(chip, port, data);
> }
> 
> When the switch doesn't support RMU operations, or when the RMU setup
> simply failed, "ethtool -S" will dereference a very NULL pointer during
> that indirect call, because mv88e6xxx_rmu_setup() is unconditionally
> called for every switch during setup. Probably not a wise choice to
> start off with the RMU ops for ethtool -S.
> 
> Also, not clear, if RMU fails, why we don't make an effort to fall back
> to MDIO for that user space request.
> 
>> +
>> +	/* Also change get stats strings for our own */
> 
> Who is "our own", and implicitly, who are they? You'd expect that there
> aren't tribalist factions within the same driver.
> 
>> +	chip->rmu.ds_rmu_ops = (struct dsa_switch_ops *)chip->ds->ops;
>> +	chip->rmu.ds_rmu_ops->get_sset_count = mv88e6xxx_stats_get_sset_count_rmu;
>> +	chip->rmu.ds_rmu_ops->get_strings = mv88e6xxx_stats_get_strings_rmu;
>> +
> 
> Those who cast a const to a non-const pointer and proceed to modify the
> read-only structure behind it should go to patchwork arrest for one week.
> 
>>  	/* Start disabled, we'll enable RMU in master_state_change */
>>  	if (chip->info->ops->rmu_disable)
>>  		return chip->info->ops->rmu_disable(chip);
>> -- 
>> 2.25.1
>>
> 

This whole shebang was a suggestion from Andrew. I had a solution with
mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of.
The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon
member? I'm not really sure on how to solve this in a better way?
Suggestions any? Maybe I've misunderstood his suggestion.

/Mattias
Vladimir Oltean Sept. 20, 2022, 1:10 p.m. UTC | #3
On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote:
> This whole shebang was a suggestion from Andrew. I had a solution with
> mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of.
> The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon
> member? I'm not really sure on how to solve this in a better way?
> Suggestions any? Maybe I've misunderstood his suggestion.

Can you point me to the beginning of that exact suggestion? I've removed
everything older than v10 from my inbox, since the flow of patches was
preventing me from seeing other emails.
Mattias Forsblad Sept. 20, 2022, 1:40 p.m. UTC | #4
On 2022-09-20 15:10, Vladimir Oltean wrote:
> On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote:
>> This whole shebang was a suggestion from Andrew. I had a solution with
>> mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of.
>> The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon
>> member? I'm not really sure on how to solve this in a better way?
>> Suggestions any? Maybe I've misunderstood his suggestion.
> 
> Can you point me to the beginning of that exact suggestion? I've removed
> everything older than v10 from my inbox, since the flow of patches was
> preventing me from seeing other emails.

https://lore.kernel.org/netdev/Yxkc6Zav7XoKRLBt@lunn.ch/

/Mattias
Andrew Lunn Sept. 20, 2022, 9:04 p.m. UTC | #5
On Tue, Sep 20, 2022 at 04:10:53PM +0300, Vladimir Oltean wrote:
> On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote:
> > This whole shebang was a suggestion from Andrew. I had a solution with
> > mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of.
> > The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon
> > member? I'm not really sure on how to solve this in a better way?
> > Suggestions any? Maybe I've misunderstood his suggestion.
> 
> Can you point me to the beginning of that exact suggestion? I've removed
> everything older than v10 from my inbox, since the flow of patches was
> preventing me from seeing other emails.

What i want to do is avoid code like:

     if (have_rmu())
     	foo()
     else
	bar()

There is nothing common in the MDIO MIB code and the RMU MIB code,
just the table of statistics. When we get to dumping the ATU, i also
expect there will be little in common between the MDIO and the RMU
functions.

Doing MIB via RMU is a big gain, but i would also like normal register
read and write to go via RMU, probably with some level of
combining. Multiple writes can be combined into one RMU operation
ending with a read. That should give us an mv88e6xxx_bus_ops which
does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU
bus_ops.

But how do we mix RMU MIB and ATU dumps into this? My idea was to make
them additional members of mv88e6xxx_bus_ops. The MDIO bus_ops
structures would end up call the mv88e6xxx_ops method for MIB or
ATU. The rmu bus_ops and directly call an RMU function to do it.

What is messy at the moment is that we don't have register read/write
via RMU, so we have some horrible hybrid. We should probably just
implement simple read and write, without combining, so we can skip
this hybrid.

I am assuming here that RMU is reliable. The QCA8K driver currently
falls back to MDIO if its inband function is attempted but fails.  I
want to stress this part, lots of data packets and see if the RMU
frames get dropped, or delayed too much causing failures. If we do see
failures, is a couple of retires enough? Or do we need to fallback to
MDIO which should always work? If we do need to fallback, this
structure is not going to work too well.

	  Andrew
Mattias Forsblad Sept. 21, 2022, 5:35 a.m. UTC | #6
On 2022-09-20 23:04, Andrew Lunn wrote:
> On Tue, Sep 20, 2022 at 04:10:53PM +0300, Vladimir Oltean wrote:
>> On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote:
>>> This whole shebang was a suggestion from Andrew. I had a solution with
>>> mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of.
>>> The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon
>>> member? I'm not really sure on how to solve this in a better way?
>>> Suggestions any? Maybe I've misunderstood his suggestion.
>>
>> Can you point me to the beginning of that exact suggestion? I've removed
>> everything older than v10 from my inbox, since the flow of patches was
>> preventing me from seeing other emails.
> 
> What i want to do is avoid code like:
> 
>      if (have_rmu())
>      	foo()
>      else
> 	bar()
> 
> There is nothing common in the MDIO MIB code and the RMU MIB code,
> just the table of statistics. When we get to dumping the ATU, i also
> expect there will be little in common between the MDIO and the RMU
> functions.
> 
> Doing MIB via RMU is a big gain, but i would also like normal register
> read and write to go via RMU, probably with some level of
> combining. Multiple writes can be combined into one RMU operation
> ending with a read. That should give us an mv88e6xxx_bus_ops which
> does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU
> bus_ops.
> 
> But how do we mix RMU MIB and ATU dumps into this? My idea was to make
> them additional members of mv88e6xxx_bus_ops. The MDIO bus_ops
> structures would end up call the mv88e6xxx_ops method for MIB or
> ATU. The rmu bus_ops and directly call an RMU function to do it.
> 
> What is messy at the moment is that we don't have register read/write
> via RMU, so we have some horrible hybrid. We should probably just
> implement simple read and write, without combining, so we can skip
> this hybrid.
> 
> I am assuming here that RMU is reliable. The QCA8K driver currently
> falls back to MDIO if its inband function is attempted but fails.  I
> want to stress this part, lots of data packets and see if the RMU
> frames get dropped, or delayed too much causing failures. If we do see
> failures, is a couple of retires enough? Or do we need to fallback to
> MDIO which should always work? If we do need to fallback, this
> structure is not going to work too well.
> 
> 	  Andrew

I understand want you want but I can see a lot of risks and pitfalls with moving
ordinary read and writes to RMU, which I wanted to avoid by first doing
RMON dump and then dump ATU and at a later stage with a better architecture
for write/read combining doing that, instead of forcing through read/writes
with all associated testing it would require. Can we please do this in
steps?

/Mattias
Andrew Lunn Sept. 21, 2022, 3:50 p.m. UTC | #7
> I understand want you want but I can see a lot of risks and pitfalls with moving
> ordinary read and writes to RMU, which I wanted to avoid by first doing
> RMON dump and then dump ATU and at a later stage with a better architecture
> for write/read combining doing that, instead of forcing through read/writes
> with all associated testing it would require. Can we please do this in
> steps?

If we are going to fall back to MDIO when RMU fails, we need a
different code structure for these operations. That different code
structure should also help solve the messy _ops structure stuff.

RMU affects us in two different locations:

ATU and MIB dump: Controlled by struct mv88e6xxx_ops

register read/write: Controlled by struct mv88e6xxx_bus_ops

We could add to struct mv88e6xxx_ops:

        int (*stats_rmu_get_sset_count)(struct mv88e6xxx_chip *chip);
        int (*stats_rmu_get_strings)(struct mv88e6xxx_chip *chip,  uint8_t *data);
        int (*stats_rmu_get_stats)(struct mv88e6xxx_chip *chip,  int port,
                                   uint64_t *data);

and then mv88e6xxx_get_stats() would become something like:

static void mv88e6xxx_get_stats(struct mv88e6xxx_chip *chip, int port,
                                uint64_t *data)
{
        int count = 0;
	int err;

	if (chip->info->ops->stats_rmu_get_stats && mv88e6xxx_rmu_enabled(chip)) {
		err = chip->info->ops->stats_rmu_get_stats(chip, port, data)
		if (!err)
			return;
	}
			
        if (chip->info->ops->stats_get_stats)
                count = chip->info->ops->stats_get_stats(chip, port, data);

We then get fall back to MDIO, and clean, separate implementations of
RMU and MDIO stats operations.

I hope the ATU/fdb dump can be done in a similar way.

register read/writes, we probably need to extend mv88e6xxx_smi_read()
and mv88e6xxx_smi_write(). Try RMU first, and then fall back to MDIO.

Please try something in this direction. But please, lots of small,
simple patches with good commit messages.

       Andrew
Vladimir Oltean Sept. 22, 2022, 11:48 a.m. UTC | #8
On Tue, Sep 20, 2022 at 11:04:07PM +0200, Andrew Lunn wrote:
> On Tue, Sep 20, 2022 at 04:10:53PM +0300, Vladimir Oltean wrote:
> > On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote:
> > > This whole shebang was a suggestion from Andrew. I had a solution with
> > > mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of.
> > > The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon
> > > member? I'm not really sure on how to solve this in a better way?
> > > Suggestions any? Maybe I've misunderstood his suggestion.
> > 
> > Can you point me to the beginning of that exact suggestion? I've removed
> > everything older than v10 from my inbox, since the flow of patches was
> > preventing me from seeing other emails.
> 
> What i want to do is avoid code like:
> 
>      if (have_rmu())
>      	foo()
>      else
> 	bar()
> 
> There is nothing common in the MDIO MIB code and the RMU MIB code,
> just the table of statistics. When we get to dumping the ATU, i also
> expect there will be little in common between the MDIO and the RMU
> functions.

Sorry, I don't understand what it is about "if (have_rmu()) foo() else bar()"
that you don't like. Isn't the indirection via bus_ops the exact same
thing, but expressed as an indirect function call rather than an if/else?

Or is it that the if/else structure precludes the calling of bar() when
foo() fails? The bus_ops will suffer from the same problem.

> Doing MIB via RMU is a big gain, but i would also like normal register
> read and write to go via RMU, probably with some level of
> combining. Multiple writes can be combined into one RMU operation
> ending with a read. That should give us an mv88e6xxx_bus_ops which
> does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU
> bus_ops.

At what level would the combining be done? I think the mv88e6xxx doesn't
really make use of bulk operations, C45 MDIO reads with post-increment,
that sort of thing. I could be wrong. And at some higher level, the
register read/write code should not diverge (too much), even if the
operation may be done over Ethernet or MDIO. So we need to find places
which actually make useful sense of bulk reads.

> But how do we mix RMU MIB and ATU dumps into this? My idea was to make
> them additional members of mv88e6xxx_bus_ops. The MDIO bus_ops
> structures would end up call the mv88e6xxx_ops method for MIB or
> ATU. The rmu bus_ops and directly call an RMU function to do it.
> 
> What is messy at the moment is that we don't have register read/write
> via RMU, so we have some horrible hybrid. We should probably just
> implement simple read and write, without combining, so we can skip
> this hybrid.

I see in the manual that there are some registers which aren't
available or not recommended over RMU, for example the PHY registers if
the PPU is disabled, or phylink methods for the upstream facing ports.
There are also less obvious things, like accessing the PTP clock
gettime(). This will surely change the delay characteristic that phc2sys
sees, I'm not sure if for the better or for the worse, but the SMI code
at least had some tuning made to it, so people might care. So bottom
line, I'm not sure whether we do enough to prevent these pitfalls by
simply creating blanket replacements for all register reads/writes and
not inspecting whether the code is fine after we do that.

On the other hand, having RMU operations might also bring subtle
benefits to phc2sys. I think there were some issues with the PTP code
having trouble getting MDIO bus access (because of bridge fdb dumps or
some other things happening in the background). This may also be an
opportunity to have parallel access to independent IP blocks within the
switch, one goes through MDIO and the other through Ethernet.

But then, Mattias' code structure becomes inadequate. Currently we
serialize mv88e6xxx_master_state_change() with respect to bus accesses
via mv88e6xxx_reg_lock(). But if we permit RMU to run in parallel with
MDIO, we need a rwlock, such that multiple 'readers' of the conceptual
have_rmu() function can run in parallel with each other, and just
serialize with the RMU state changes (the 'writers').

> I am assuming here that RMU is reliable. The QCA8K driver currently
> falls back to MDIO if its inband function is attempted but fails.  I
> want to stress this part, lots of data packets and see if the RMU
> frames get dropped, or delayed too much causing failures.

I don't think you even have to stress it too much. Nothing prevents the
user from putting a policer on the DSA master which will randomly drop
responses. Or a shaper that will delay requests beyond the timeout.

> If we do see
> failures, is a couple of retires enough? Or do we need to fallback to
> MDIO which should always work? If we do need to fallback, this
> structure is not going to work too well.

Consider our previous discussion about the switchdev prepare/commit
transactional structure, where the commit stage is not supposed to fail
even if it writes to hardware. You said that the writes are supposed to
be reliable, or else. Either they all work or none do. Not sure how that
is going to hold up with a transport such as Ethernet which has such a
wide arsenal of foot guns. I think that leaving the MDIO fallback in is
inevitable.

In terms of retries, I'm not sure. With the qca8k code structure:

	if (have_rmu()) {
		ret = foo();
		if (ret == 0)
			return 0;
	}

	return bar();

we won't have retries for the _current_ operation, but all further
operations will still try to use Ethernet first. So here, it seems to me
that the timeout needs to be tuned such that everything does not grind
down to a halt even if we have a lossy Ethernet channel.

Otherwise, we need to build some more (perhaps too advanced) logic. Have
"if (have_rmu() && rmu_works())", periodically re-check if RMU works
after a failure, etc.
Andrew Lunn Sept. 22, 2022, 12:45 p.m. UTC | #9
On Thu, Sep 22, 2022 at 02:48:20PM +0300, Vladimir Oltean wrote:
> On Tue, Sep 20, 2022 at 11:04:07PM +0200, Andrew Lunn wrote:
> > On Tue, Sep 20, 2022 at 04:10:53PM +0300, Vladimir Oltean wrote:
> > > On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote:
> > > > This whole shebang was a suggestion from Andrew. I had a solution with
> > > > mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of.
> > > > The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon
> > > > member? I'm not really sure on how to solve this in a better way?
> > > > Suggestions any? Maybe I've misunderstood his suggestion.
> > > 
> > > Can you point me to the beginning of that exact suggestion? I've removed
> > > everything older than v10 from my inbox, since the flow of patches was
> > > preventing me from seeing other emails.
> > 
> > What i want to do is avoid code like:
> > 
> >      if (have_rmu())
> >      	foo()
> >      else
> > 	bar()
> > 
> > There is nothing common in the MDIO MIB code and the RMU MIB code,
> > just the table of statistics. When we get to dumping the ATU, i also
> > expect there will be little in common between the MDIO and the RMU
> > functions.
> 
> Sorry, I don't understand what it is about "if (have_rmu()) foo() else bar()"
> that you don't like. Isn't the indirection via bus_ops the exact same
> thing, but expressed as an indirect function call rather than an if/else?

There was code like this deep inside the MIB code, which just looked
ugly. That is now gone.

> > Doing MIB via RMU is a big gain, but i would also like normal register
> > read and write to go via RMU, probably with some level of
> > combining. Multiple writes can be combined into one RMU operation
> > ending with a read. That should give us an mv88e6xxx_bus_ops which
> > does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU
> > bus_ops.
> 
> At what level would the combining be done? I think the mv88e6xxx doesn't
> really make use of bulk operations, C45 MDIO reads with post-increment,
> that sort of thing. I could be wrong. And at some higher level, the
> register read/write code should not diverge (too much), even if the
> operation may be done over Ethernet or MDIO. So we need to find places
> which actually make useful sense of bulk reads.

I was thinking within mv88e6xxx_read() and mv88e6xxx_write(). Keep a
buffer for building requests. Each write call appends the write to the
buffer and returns 0. A read call gets appended to the buffer and then
executes the RMU. We probably also need to wrap the reg mutex, so that
when it is released, any buffered writes get executed. If the RMU
fails, we have all the information needed to do the same via MDIO.

What i was not aware of is that some registers are not supposed to be
accessed over RMU. I suppose we can make a list of them, and if there
is a read/write to such a register, execute the RMU and then perform
an MDIO operation for the restricted register.

> But then, Mattias' code structure becomes inadequate. Currently we
> serialize mv88e6xxx_master_state_change() with respect to bus accesses
> via mv88e6xxx_reg_lock(). But if we permit RMU to run in parallel with
> MDIO, we need a rwlock, such that multiple 'readers' of the conceptual
> have_rmu() function can run in parallel with each other, and just
> serialize with the RMU state changes (the 'writers').

I don't think we can allow RMU to run in parallel to MDIO. The reg
lock will probably prevent that anyway.

> 
> > I am assuming here that RMU is reliable. The QCA8K driver currently
> > falls back to MDIO if its inband function is attempted but fails.  I
> > want to stress this part, lots of data packets and see if the RMU
> > frames get dropped, or delayed too much causing failures.
> 
> I don't think you even have to stress it too much. Nothing prevents the
> user from putting a policer on the DSA master which will randomly drop
> responses. Or a shaper that will delay requests beyond the timeout.

That would be a self inflicted problem. But you are correct, we need
to fall back to MDIO.

This is one area we can experiment with. Maybe we can retry the
operation via RMU a few times? Two retries for MIBs is still going to
be a lot faster, if successful, compared to all the MDIO transactions
for all the statistics. We can also add some fall back tracking
logic. If RMU has failed for N times in a row, stop using it for 60
seconds, etc. That might be something we can put into the DSA core,
since it seems like a generic problem.

There is a lot of experimentation needed here, and it could be we need
to throw it all away and try again a few times until we have explored
the problem sufficiently to get it right...

    Andrew
Vladimir Oltean Sept. 22, 2022, 1:04 p.m. UTC | #10
On Thu, Sep 22, 2022 at 02:45:34PM +0200, Andrew Lunn wrote:
> > > Doing MIB via RMU is a big gain, but i would also like normal register
> > > read and write to go via RMU, probably with some level of
> > > combining. Multiple writes can be combined into one RMU operation
> > > ending with a read. That should give us an mv88e6xxx_bus_ops which
> > > does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU
> > > bus_ops.
> > 
> > At what level would the combining be done? I think the mv88e6xxx doesn't
> > really make use of bulk operations, C45 MDIO reads with post-increment,
> > that sort of thing. I could be wrong. And at some higher level, the
> > register read/write code should not diverge (too much), even if the
> > operation may be done over Ethernet or MDIO. So we need to find places
> > which actually make useful sense of bulk reads.
> 
> I was thinking within mv88e6xxx_read() and mv88e6xxx_write(). Keep a
> buffer for building requests. Each write call appends the write to the
> buffer and returns 0. A read call gets appended to the buffer and then
> executes the RMU. We probably also need to wrap the reg mutex, so that
> when it is released, any buffered writes get executed. If the RMU
> fails, we have all the information needed to do the same via MDIO.

Ah, so you want to make the mv88e6xxx_reg_unlock() become an implicit
write barrier.

That could work, but the trouble seems to be error propagation.
mv88e6xxx_write() will always return 0, the operation will be delayed
until the unlock, and mv88e6xxx_reg_unlock() does not return an error
code (why would it?).

> > But then, Mattias' code structure becomes inadequate. Currently we
> > serialize mv88e6xxx_master_state_change() with respect to bus accesses
> > via mv88e6xxx_reg_lock(). But if we permit RMU to run in parallel with
> > MDIO, we need a rwlock, such that multiple 'readers' of the conceptual
> > have_rmu() function can run in parallel with each other, and just
> > serialize with the RMU state changes (the 'writers').
> 
> I don't think we can allow RMU to run in parallel to MDIO. The reg
> lock will probably prevent that anyway.

Well, I was thinking the locking could get rearchitected, but it seems
you have bigger plans for it, so it becomes even more engrained in the
driver :)

> > > I am assuming here that RMU is reliable. The QCA8K driver currently
> > > falls back to MDIO if its inband function is attempted but fails.  I
> > > want to stress this part, lots of data packets and see if the RMU
> > > frames get dropped, or delayed too much causing failures.
> > 
> > I don't think you even have to stress it too much. Nothing prevents the
> > user from putting a policer on the DSA master which will randomly drop
> > responses. Or a shaper that will delay requests beyond the timeout.
> 
> That would be a self inflicted problem. But you are correct, we need
> to fall back to MDIO.

Here's one variation which is really not self inflicted. You have a 10G
CPU port, and 1G user ports. You use flow control on the DSA master to
avoid packet loss due to the 10G->1G rate adaptation. So the DSA master
goes periodically through states of TX congestion and holds back frames
until it goes away. This creates latency for packets in the TX queues,
including RMU requests, even if the RMU messages don't go to the
external ports. And even with a high skb->priority, you'd still need PFC
to avoid this problem. This can trip up the timeout timers we have for
RMU responses.

> This is one area we can experiment with. Maybe we can retry the
> operation via RMU a few times? Two retries for MIBs is still going to
> be a lot faster, if successful, compared to all the MDIO transactions
> for all the statistics. We can also add some fall back tracking
> logic. If RMU has failed for N times in a row, stop using it for 60
> seconds, etc. That might be something we can put into the DSA core,
> since it seems like a generic problem.

Or the driver might have a worker which periodically sends the GetID
message and tracks whether the switch responded. Maybe the rescheduling
intervals of that are dynamically adjusted based on feedback from
timeouts or successes of register reads/writes. In any case, now we're
starting to talk about really complex logic. And it's not clear how
effective any of these mechanisms would be against random and sporadic
timeouts rather than persistent issues.
Andrew Lunn Sept. 22, 2022, 5:27 p.m. UTC | #11
> > I was thinking within mv88e6xxx_read() and mv88e6xxx_write(). Keep a
> > buffer for building requests. Each write call appends the write to the
> > buffer and returns 0. A read call gets appended to the buffer and then
> > executes the RMU. We probably also need to wrap the reg mutex, so that
> > when it is released, any buffered writes get executed. If the RMU
> > fails, we have all the information needed to do the same via MDIO.
> 
> Ah, so you want to make the mv88e6xxx_reg_unlock() become an implicit
> write barrier.

I'm still thinking this through. It probably needs real code to get
all the details sorted out. The locking could be interesting...  We
need something to flush the queue before we exit from the driver, and
that seems like the obvious synchronisation point. If not that, we
need to add another function call at all the exit points.

> That could work, but the trouble seems to be error propagation.
> mv88e6xxx_write() will always return 0, the operation will be delayed
> until the unlock, and mv88e6xxx_reg_unlock() does not return an error
> code (why would it?).

If the RMU fails and the fallback MDIO also fails, we are in big
trouble, and the switch is probably dead. At which point, do we really
care. netdev_ratelimited_err('Switch has died...') and keep going,
everything afterwards is probably going to go wrong as well.

I don't think there are any instances in the driver where we try to
recover from an MDIO write failure, other than return -ETIMEDOUT or
-EIO or whatever.

> Or the driver might have a worker which periodically sends the GetID
> message and tracks whether the switch responded. Maybe the rescheduling
> intervals of that are dynamically adjusted based on feedback from
> timeouts or successes of register reads/writes. In any case, now we're
> starting to talk about really complex logic. And it's not clear how
> effective any of these mechanisms would be against random and sporadic
> timeouts rather than persistent issues.

I don't think we can assume every switch has an equivalent of
GetID. So a solution like this would have to be per driver. Given the
potential complexity, it would probably be better to have it in the
core, everybody shares it, debugs it, and makes sure it works well.

      Andrew
diff mbox series

Patch

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 294bf9bbaf3f..5b22fe4b3284 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1229,6 +1229,80 @@  static int mv88e6xxx_get_sset_count(struct dsa_switch *ds, int port, int sset)
 	return count;
 }
 
+int mv88e6xxx_stats_get_sset_count_rmu(struct dsa_switch *ds, int port, int sset)
+{
+	if (sset != ETH_SS_STATS)
+		return 0;
+
+	return ARRAY_SIZE(mv88e6xxx_hw_stats);
+}
+
+void mv88e6xxx_stats_get_strings_rmu(struct dsa_switch *ds, int port,
+				     u32 stringset, uint8_t *data)
+{
+	struct mv88e6xxx_hw_stat *stat;
+	int i;
+
+	if (stringset != ETH_SS_STATS)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(mv88e6xxx_hw_stats); i++) {
+		stat = &mv88e6xxx_hw_stats[i];
+		memcpy(data + i * ETH_GSTRING_LEN, stat->string, ETH_GSTRING_LEN);
+	}
+}
+
+int mv88e6xxx_state_get_stats_rmu(struct mv88e6xxx_chip *chip, int port,
+				  __be32 *raw_cnt, int num_cnt, uint64_t *data)
+{
+	struct mv88e6xxx_hw_stat *stat;
+	int offset = 0;
+	u64 high;
+	int i, j;
+
+	for (i = 0, j = 0; i < ARRAY_SIZE(mv88e6xxx_hw_stats); i++) {
+		stat = &mv88e6xxx_hw_stats[i];
+		offset = stat->reg;
+
+		if (stat->type & STATS_TYPE_PORT) {
+			/* The offsets for the port counters below
+			 * are different in the RMU packet.
+			 */
+			switch (stat->reg) {
+			case 0x10: /* sw_in_discards */
+				offset = 0x81;
+				break;
+			case 0x12: /* sw_in_filtered */
+				offset = 0x85;
+				break;
+			case 0x13: /* sw_out_filtered */
+				offset = 0x89;
+				break;
+			default:
+				dev_err(chip->dev,
+					"RMU: port %d wrong offset requested: %d\n",
+					port, stat->reg);
+				return j;
+			}
+		} else if (stat->type & STATS_TYPE_BANK1) {
+			offset += 32;
+		}
+
+		if (offset >= num_cnt)
+			continue;
+
+		data[j] = be32_to_cpu(raw_cnt[offset]);
+
+		if (stat->size == 8) {
+			high = be32_to_cpu(raw_cnt[offset + 1]);
+			data[j] += (high << 32);
+		}
+
+		j++;
+	}
+	return j;
+}
+
 static int mv88e6xxx_stats_get_stats(struct mv88e6xxx_chip *chip, int port,
 				     uint64_t *data, int types,
 				     u16 bank1_select, u16 histogram)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 440e9b274df4..aca7cfef196e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -433,6 +433,7 @@  struct mv88e6xxx_bus_ops {
 	int (*read)(struct mv88e6xxx_chip *chip, int addr, int reg, u16 *val);
 	int (*write)(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val);
 	int (*init)(struct mv88e6xxx_chip *chip);
+	void (*get_rmon)(struct mv88e6xxx_chip *chip, int port, uint64_t *data);
 };
 
 struct mv88e6xxx_mdio_bus {
@@ -822,4 +823,9 @@  static inline void mv88e6xxx_reg_unlock(struct mv88e6xxx_chip *chip)
 
 int mv88e6xxx_fid_map(struct mv88e6xxx_chip *chip, unsigned long *bitmap);
 
+int mv88e6xxx_state_get_stats_rmu(struct mv88e6xxx_chip *chip, int port,
+				  __be32 *raw_cnt, int num_cnt, uint64_t *data);
+int mv88e6xxx_stats_get_sset_count_rmu(struct dsa_switch *ds, int port, int sset);
+void mv88e6xxx_stats_get_strings_rmu(struct dsa_switch *ds, int port,
+				     u32 stringset, uint8_t *data);
 #endif /* _MV88E6XXX_CHIP_H */
diff --git a/drivers/net/dsa/mv88e6xxx/rmu.c b/drivers/net/dsa/mv88e6xxx/rmu.c
index c5b3c156de40..129535777c4b 100644
--- a/drivers/net/dsa/mv88e6xxx/rmu.c
+++ b/drivers/net/dsa/mv88e6xxx/rmu.c
@@ -137,6 +137,50 @@  static int mv88e6xxx_rmu_get_id(struct mv88e6xxx_chip *chip, int port)
 	return ret;
 }
 
+static void mv88e6xxx_rmu_stats_get(struct mv88e6xxx_chip *chip, int port, uint64_t *data)
+{
+	u16 req[4] = { MV88E6XXX_RMU_REQ_FORMAT_SOHO,
+		       MV88E6XXX_RMU_REQ_PAD,
+		       MV88E6XXX_RMU_REQ_CODE_DUMP_MIB,
+		       MV88E6XXX_RMU_REQ_DATA};
+	struct mv88e6xxx_dump_mib_resp resp;
+	struct mv88e6xxx_port *p;
+	u8 resp_port;
+	int resp_len;
+	int num_mibs;
+	int ret;
+
+	/* Populate port number in request */
+	req[3] = FIELD_PREP(MV88E6XXX_RMU_REQ_DUMP_MIB_PORT_MASK, port);
+
+	resp_len = sizeof(resp);
+	ret = mv88e6xxx_rmu_send_wait(chip, req, sizeof(req),
+				      &resp, &resp_len);
+	if (ret) {
+		dev_dbg(chip->dev, "RMU: error for command DUMP_MIB %pe port %d\n",
+			ERR_PTR(ret), port);
+		return;
+	}
+
+	/* Got response */
+	ret = mv88e6xxx_rmu_validate_response(&resp.rmu_header, MV88E6XXX_RMU_RESP_CODE_DUMP_MIB);
+	if (ret)
+		return;
+
+	resp_port = FIELD_GET(MV88E6XXX_SOURCE_PORT, resp.portnum);
+	p = &chip->ports[resp_port];
+	if (!p) {
+		dev_err_ratelimited(chip->dev, "RMU: illegal port number in response: %d\n",
+				    resp_port);
+		return;
+	}
+
+	/* Update MIB for port */
+	num_mibs = (resp_len - offsetof(struct mv88e6xxx_dump_mib_resp, mib)) / sizeof(__be32);
+
+	mv88e6xxx_state_get_stats_rmu(chip, port, resp.mib, num_mibs, data);
+}
+
 static void mv88e6xxx_disable_rmu(struct mv88e6xxx_chip *chip)
 {
 	chip->smi_ops = chip->rmu.smi_ops;
@@ -255,6 +299,15 @@  int mv88e6xxx_rmu_setup(struct mv88e6xxx_chip *chip)
 	chip->rmu.smi_ops = chip->smi_ops;
 	chip->rmu.ds_ops = chip->ds->ops;
 
+	/* Change rmu ops with our own pointer */
+	chip->rmu.smi_rmu_ops = (struct mv88e6xxx_bus_ops *)chip->rmu.smi_ops;
+	chip->rmu.smi_rmu_ops->get_rmon = mv88e6xxx_rmu_stats_get;
+
+	/* Also change get stats strings for our own */
+	chip->rmu.ds_rmu_ops = (struct dsa_switch_ops *)chip->ds->ops;
+	chip->rmu.ds_rmu_ops->get_sset_count = mv88e6xxx_stats_get_sset_count_rmu;
+	chip->rmu.ds_rmu_ops->get_strings = mv88e6xxx_stats_get_strings_rmu;
+
 	/* Start disabled, we'll enable RMU in master_state_change */
 	if (chip->info->ops->rmu_disable)
 		return chip->info->ops->rmu_disable(chip);