[net-next,RFC,0/6] Add support for qca8k mdio rw in Ethernet packet

Message ID	20211207145942.7444-1-ansuelsmth@gmail.com (mailing list archive)
Headers	show Return-Path: <netdev-owner@kernel.org> From: Ansuel Smith <ansuelsmth@gmail.com> To: Andrew Lunn <andrew@lunn.ch>, Vivien Didelot <vivien.didelot@gmail.com>, Florian Fainelli <f.fainelli@gmail.com>, Vladimir Oltean <olteanv@gmail.com>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: Ansuel Smith <ansuelsmth@gmail.com> Subject: [net-next RFC PATCH 0/6] Add support for qca8k mdio rw in Ethernet packet Date: Tue, 7 Dec 2021 15:59:36 +0100 Message-Id: <20211207145942.7444-1-ansuelsmth@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Add support for qca8k mdio rw in Ethernet packet \| expand [net-next,RFC,0/6] Add support for qca8k mdio rw in Ethernet packet [net-next,RFC,1/6] net: dsa: tag_qca: convert to FIELD macro [net-next,RFC,2/6] net: dsa: tag_qca: move define to include linux/dsa [net-next,RFC,3/6] net: dsa: tag_qca: add define for mdio read/write in ethernet packet [net-next,RFC,4/6] net: dsa: qca8k: Add support for mdio read/write in Ethernet packet [net-next,RFC,5/6] net: dsa: tag_qca: Add support for handling mdio read/write packet [net-next,RFC,6/6] net: dsa: qca8k: cache lo and hi for mdio write

Christian Marangi Dec. 7, 2021, 2:59 p.m. UTC

Hi, this is still WIP and currently has some problem but I would love if
someone can give this a superficial review and answer to some problem
with this.

The main reason for this is that we notice some routing problem in the
switch and it seems assisted learning is needed. Considering mdio is
quite slow due to the indirect write using this Ethernet alternative way
seems to be quicker.

The qca8k switch supports a special way to pass mdio read/write request
using specially crafted Ethernet packet.
This works by putting some defined data in the Ethernet header where the
mac source and dst should be placed. The Ethernet type header is set to qca
header and is set to a mdio read/write type.
This is used to communicate to the switch that this is a special packet
and should be parsed differently.

Current implementation of this use completion API to wait for the packet
to be processed by the tagger and has a timeout that fallback to the
legacy mdio way and mutex to enforce one transaction at time.

Here I list the main concern I have about this:
- Is the changes done to the tagger acceptable? (moving stuff to global
  include)
- Is it correct to put the skb generation code in the qca8k source?
- Is the changes generally correct? (referring to how this is
  implemented with part of the implementation split between the tagger
  and the driver)

I still have to find a solution to a slowdown problem and this is where
I would love to get some hint.
Currently I still didn't find a good way to understand when the tagger
starts to accept packets and because of this the initial setup is slow
as every completion timeouts. Am I missing something or is there a way
to check for this?
After the initial slowdown, as soon as the cpu port is ready and starts
to accept packet, every transaction is near instant and no completion
timeouts.

As I said this is still WIP but it does work correctly aside from the
initial slowdown problem. (the slowdown is in the first port init and at
the first port init... from port 2 the tagger starts to accept packet
and this starts to work)

Ansuel Smith (6):
  net: dsa: tag_qca: convert to FIELD macro
  net: dsa: tag_qca: move define to include linux/dsa
  net: dsa: tag_qca: add define for mdio read/write in ethernet packet
  net: dsa: qca8k: Add support for mdio read/write in Ethernet packet
  net: dsa: tag_qca: Add support for handling mdio read/write packet
  net: dsa: qca8k: cache lo and hi for mdio write

 drivers/net/dsa/qca8k.c     | 205 ++++++++++++++++++++++++++++++++++--
 drivers/net/dsa/qca8k.h     |   5 +
 include/linux/dsa/tag_qca.h |  75 +++++++++++++
 net/dsa/tag_qca.c           |  69 +++++++-----
 4 files changed, 320 insertions(+), 34 deletions(-)
 create mode 100644 include/linux/dsa/tag_qca.h

Andrew Lunn Dec. 7, 2021, 3:15 p.m. UTC | #1

On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote:
> Hi, this is still WIP and currently has some problem but I would love if
> someone can give this a superficial review and answer to some problem
> with this.
> 
> The main reason for this is that we notice some routing problem in the
> switch and it seems assisted learning is needed. Considering mdio is
> quite slow due to the indirect write using this Ethernet alternative way
> seems to be quicker.
> 
> The qca8k switch supports a special way to pass mdio read/write request
> using specially crafted Ethernet packet.

Oh! Cool! Marvell has this as well, and i suspect a few others. It is
something i've wanted to work on for a long long time, but never had
the opportunity.

This also means that, even if you are focusing on qca8k, please try to
think what could be generic, and what should specific to the
qca8k. The idea of sending an Ethernet frame and sometime later
receiving a reply should be generic and usable for other DSA
drivers. The contents of those frames needs to be driver specific.
How we hook this into MDIO might also be generic, maybe.

I will look at your questions later, but soon.

  Andrew

Christian Marangi Dec. 7, 2021, 3:33 p.m. UTC | #2

On Tue, Dec 07, 2021 at 04:15:51PM +0100, Andrew Lunn wrote:
> On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote:
> > Hi, this is still WIP and currently has some problem but I would love if
> > someone can give this a superficial review and answer to some problem
> > with this.
> > 
> > The main reason for this is that we notice some routing problem in the
> > switch and it seems assisted learning is needed. Considering mdio is
> > quite slow due to the indirect write using this Ethernet alternative way
> > seems to be quicker.
> > 
> > The qca8k switch supports a special way to pass mdio read/write request
> > using specially crafted Ethernet packet.
> 
> Oh! Cool! Marvell has this as well, and i suspect a few others. It is
> something i've wanted to work on for a long long time, but never had
> the opportunity.
>

Really? I tought this was very specific to qca8k.

> This also means that, even if you are focusing on qca8k, please try to
> think what could be generic, and what should specific to the
> qca8k. The idea of sending an Ethernet frame and sometime later
> receiving a reply should be generic and usable for other DSA
> drivers. The contents of those frames needs to be driver specific.
> How we hook this into MDIO might also be generic, maybe.

A generic implementation of this would be add an ops to the dsa generic
struct that driver can implement and find a clean way to use this
alternative way instead of normal mdio. (Implement something like
eth_mdio_read/write ops and the driver will decide when to use them?
The dsa then will correctly understand when the driver is ready to
accept packet and send the skb, in all the other case just send an error
and the driver will use normal mdio?)

I think the tagger require some modification anyway as it's the one that
receive packet and parse them. (I assume also other switch will use the
tagger to understand that the packet is used for mdio)
(A bool to declare that the tagger can correctly parse this kind of
stuff and complete the completion?)

> 
> I will look at your questions later, but soon.
> 

Thanks a lot for the quick response. I'm more than happy to generalize
this and find the a correct way.

>   Andrew

Andrew Lunn Dec. 7, 2021, 6:41 p.m. UTC | #3

> I still have to find a solution to a slowdown problem and this is where
> I would love to get some hint.
> Currently I still didn't find a good way to understand when the tagger
> starts to accept packets and because of this the initial setup is slow
> as every completion timeouts. Am I missing something or is there a way
> to check for this?

I've not looked at this particular driver, i just know the general
architecture.

The MDIO bus driver probes first, maybe as part of the Ethernet
driver, maybe as a standalone MDIO driver. The switch is found in DT
and the driver code will at some point later probe the switch driver.

The switch driver has working MDIO at this point. It should use MDIO
to talk to the switch, make sure it is there, maybe do some initial
configuration. Once it is happy, it registers the switch with the DSA
core using dsa_register_switch().

If this is a single switch, the DSA core will then start setting
things up. As part of dsa_switch_setup() it will call the switch
drivers setup() method. It then figures out what tag driver to use, by
calling dsa_switch_setup_tag_protocol(). However, the tag driver
itself is not inserted into the chain yet. That happens later.  Once
the switch is setup, dsa_tree_setup_master() is called which does
dsa_master_setup() and in the middle there is:

	/* If we use a tagging format that doesn't have an ethertype
	 * field, make sure that all packets from this point on get
	 * sent to the tag format's receive function.
	 */
	wmb();

	dev->dsa_ptr = cpu_dp;

This is the magic to actually enable the tagger receiving frames.

I need to look at your patches, but why is the tagger involved?  At
least for the Marvell switch, you send a pretty normal looking
Ethernet frame to a specific MAC address, and the switch replies using
that MAC address. And it has an Ether Type specific to switch
control. Since this is all normal looking, there are hooks in the
network stack which can be used to get these frames.

	Andrew

Florian Fainelli Dec. 7, 2021, 6:49 p.m. UTC | #4

On 12/7/21 7:15 AM, Andrew Lunn wrote:
> On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote:
>> Hi, this is still WIP and currently has some problem but I would love if
>> someone can give this a superficial review and answer to some problem
>> with this.
>>
>> The main reason for this is that we notice some routing problem in the
>> switch and it seems assisted learning is needed. Considering mdio is
>> quite slow due to the indirect write using this Ethernet alternative way
>> seems to be quicker.
>>
>> The qca8k switch supports a special way to pass mdio read/write request
>> using specially crafted Ethernet packet.
> 
> Oh! Cool! Marvell has this as well, and i suspect a few others. It is
> something i've wanted to work on for a long long time, but never had
> the opportunity.
> 
> This also means that, even if you are focusing on qca8k, please try to
> think what could be generic, and what should specific to the
> qca8k. The idea of sending an Ethernet frame and sometime later
> receiving a reply should be generic and usable for other DSA
> drivers. The contents of those frames needs to be driver specific.
> How we hook this into MDIO might also be generic, maybe.
> 
> I will look at your questions later, but soon.

There was a priori attempt from Vivien to add support for mv88e6xxx over
RMU frames:

https://www.mail-archive.com/netdev@vger.kernel.org/msg298317.html

This gets interesting because the switch's control path moves from MDIO
to Ethernet and there is not really an "ethernet bus" though we could
certainly come up with one. We have mdio-i2c, so maybe we should have
mdio-ethernet?

Christian Marangi Dec. 7, 2021, 6:53 p.m. UTC | #5

On Tue, Dec 07, 2021 at 07:41:23PM +0100, Andrew Lunn wrote:
> > I still have to find a solution to a slowdown problem and this is where
> > I would love to get some hint.
> > Currently I still didn't find a good way to understand when the tagger
> > starts to accept packets and because of this the initial setup is slow
> > as every completion timeouts. Am I missing something or is there a way
> > to check for this?
> 
> I've not looked at this particular driver, i just know the general
> architecture.
> 
> The MDIO bus driver probes first, maybe as part of the Ethernet
> driver, maybe as a standalone MDIO driver. The switch is found in DT
> and the driver code will at some point later probe the switch driver.
> 
> The switch driver has working MDIO at this point. It should use MDIO
> to talk to the switch, make sure it is there, maybe do some initial
> configuration. Once it is happy, it registers the switch with the DSA
> core using dsa_register_switch().
> 
> If this is a single switch, the DSA core will then start setting
> things up. As part of dsa_switch_setup() it will call the switch
> drivers setup() method. It then figures out what tag driver to use, by
> calling dsa_switch_setup_tag_protocol(). However, the tag driver
> itself is not inserted into the chain yet. That happens later.  Once
> the switch is setup, dsa_tree_setup_master() is called which does
> dsa_master_setup() and in the middle there is:
> 
> 	/* If we use a tagging format that doesn't have an ethertype
> 	 * field, make sure that all packets from this point on get
> 	 * sent to the tag format's receive function.
> 	 */
> 	wmb();
> 
> 	dev->dsa_ptr = cpu_dp;
> 
> This is the magic to actually enable the tagger receiving frames.
> 

Will check if using this is the correct way to prevent use of this
alternative way before it's available.

> I need to look at your patches, but why is the tagger involved?  At
> least for the Marvell switch, you send a pretty normal looking
> Ethernet frame to a specific MAC address, and the switch replies using
> that MAC address. And it has an Ether Type specific to switch
> control. Since this is all normal looking, there are hooks in the
> network stack which can be used to get these frames.
>

The qca tag header provide a TYPE value that refer to a big list of
Frame type. In all of this at value 2 we have the type that tells us
that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet)

The idea of using the tagger is to skip parsing the packet 2 times
considering the qca tag header is present at the same place in both
normal packet and mdio rw Ethernet packet.

Your idea would be hook this before the tagger and parse it?
I assume that is the only way if this has to be generilized. But I
wonder if this would create some overhead by the double parsing.

> 	Andrew
>

Andrew Lunn Dec. 7, 2021, 7:15 p.m. UTC | #6

> The qca tag header provide a TYPE value that refer to a big list of
> Frame type. In all of this at value 2 we have the type that tells us
> that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet)
> 
> The idea of using the tagger is to skip parsing the packet 2 times
> considering the qca tag header is present at the same place in both
> normal packet and mdio rw Ethernet packet.
> 
> Your idea would be hook this before the tagger and parse it?
> I assume that is the only way if this has to be generilized. But I
> wonder if this would create some overhead by the double parsing.

So it seems i remembered this incorrectly. Marvell call this Remote
Management Unit, RMU. And RMU makes use of bits inside the Marvell
Tag. I was thinking it was outside of the tag.

So, yes, the tagger does need to be involved in this.

The initial design of DSA was that the tagger and main driver were
kept separate. This has been causing us problems recently, we have use
cases where we need to share information between the tagger and the
driver. This looks like it is going to be another case of that.

	Andrew

Christian Marangi Dec. 7, 2021, 7:21 p.m. UTC | #7

On Tue, Dec 07, 2021 at 08:15:24PM +0100, Andrew Lunn wrote:
> > The qca tag header provide a TYPE value that refer to a big list of
> > Frame type. In all of this at value 2 we have the type that tells us
> > that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet)
> > 
> > The idea of using the tagger is to skip parsing the packet 2 times
> > considering the qca tag header is present at the same place in both
> > normal packet and mdio rw Ethernet packet.
> > 
> > Your idea would be hook this before the tagger and parse it?
> > I assume that is the only way if this has to be generilized. But I
> > wonder if this would create some overhead by the double parsing.
> 
> So it seems i remembered this incorrectly. Marvell call this Remote
> Management Unit, RMU. And RMU makes use of bits inside the Marvell
> Tag. I was thinking it was outside of the tag.
> 
> So, yes, the tagger does need to be involved in this.
> 
> The initial design of DSA was that the tagger and main driver were
> kept separate. This has been causing us problems recently, we have use
> cases where we need to share information between the tagger and the
> driver. This looks like it is going to be another case of that.
> 
> 	Andrew

I mean if you check the code this is still somewhat ""separate"".
I ""abuse"" the dsa port priv to share the required data.
(I allocate a different struct... i put it in qca8k_priv and i set every
port priv to this struct)

Wonder if we can add something to share data between the driver and the
port so the access that from the tagger. (something that doesn't use the
port priv)

Christian Marangi Dec. 7, 2021, 7:44 p.m. UTC | #8

On Tue, Dec 07, 2021 at 10:49:43AM -0800, Florian Fainelli wrote:
> On 12/7/21 7:15 AM, Andrew Lunn wrote:
> > On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote:
> >> Hi, this is still WIP and currently has some problem but I would love if
> >> someone can give this a superficial review and answer to some problem
> >> with this.
> >>
> >> The main reason for this is that we notice some routing problem in the
> >> switch and it seems assisted learning is needed. Considering mdio is
> >> quite slow due to the indirect write using this Ethernet alternative way
> >> seems to be quicker.
> >>
> >> The qca8k switch supports a special way to pass mdio read/write request
> >> using specially crafted Ethernet packet.
> > 
> > Oh! Cool! Marvell has this as well, and i suspect a few others. It is
> > something i've wanted to work on for a long long time, but never had
> > the opportunity.
> > 
> > This also means that, even if you are focusing on qca8k, please try to
> > think what could be generic, and what should specific to the
> > qca8k. The idea of sending an Ethernet frame and sometime later
> > receiving a reply should be generic and usable for other DSA
> > drivers. The contents of those frames needs to be driver specific.
> > How we hook this into MDIO might also be generic, maybe.
> > 
> > I will look at your questions later, but soon.
> 
> There was a priori attempt from Vivien to add support for mv88e6xxx over
> RMU frames:
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg298317.html
> 
> This gets interesting because the switch's control path moves from MDIO
> to Ethernet and there is not really an "ethernet bus" though we could
> certainly come up with one. We have mdio-i2c, so maybe we should have
> mdio-ethernet?
> -- 
> Florian

I checked that series and I notice that the proposed implementation used
a workqueue. The current implementation here use completion and mutex so
the transaction is really one command at time and wait for response.
Considering most of the time we do read and write operation is seems a
bit overkill to use a queue... Also to track the response.
Using a single queue simplify the implementation and should be just
good. (btw qca8k supports a way to queue packet using a seq int but we
don't use it to keep things simple)

Is that acceptable? Also I notice in that series mru have some
limitation and can be used only for some kind of data...
Should we add a way to blacklist some particular reg and use the legacy
mdio way?

Vladimir Oltean Dec. 7, 2021, 8:52 p.m. UTC | #9

On Tue, Dec 07, 2021 at 08:21:52PM +0100, Ansuel Smith wrote:
> On Tue, Dec 07, 2021 at 08:15:24PM +0100, Andrew Lunn wrote:
> > > The qca tag header provide a TYPE value that refer to a big list of
> > > Frame type. In all of this at value 2 we have the type that tells us
> > > that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet)
> > > 
> > > The idea of using the tagger is to skip parsing the packet 2 times
> > > considering the qca tag header is present at the same place in both
> > > normal packet and mdio rw Ethernet packet.
> > > 
> > > Your idea would be hook this before the tagger and parse it?
> > > I assume that is the only way if this has to be generilized. But I
> > > wonder if this would create some overhead by the double parsing.
> > 
> > So it seems i remembered this incorrectly. Marvell call this Remote
> > Management Unit, RMU. And RMU makes use of bits inside the Marvell
> > Tag. I was thinking it was outside of the tag.
> > 
> > So, yes, the tagger does need to be involved in this.
> > 
> > The initial design of DSA was that the tagger and main driver were
> > kept separate. This has been causing us problems recently, we have use
> > cases where we need to share information between the tagger and the
> > driver. This looks like it is going to be another case of that.
> > 
> > 	Andrew
> 
> I mean if you check the code this is still somewhat ""separate"".
> I ""abuse"" the dsa port priv to share the required data.
> (I allocate a different struct... i put it in qca8k_priv and i set every
> port priv to this struct)
> 
> Wonder if we can add something to share data between the driver and the
> port so the access that from the tagger. (something that doesn't use the
> port priv)

The one problem relevant to this submission among those referenced by
Andrew is that dp->priv needs to be allocated by the Ethernet switch
driver, although it is used by the tagging protocol driver. So they
aren't really 'separate', nor can they be, the way dp->priv is currently
designed, it can only be "abused", not really "used".

The DSA design allows in principle for any switch driver to return any
protocol it desires in ->get_tag_protocol(). I occasionally test various
tagger submissions by hacking dsa_loop to do just that. But your
tag_qca.c driver would have a pretty unpleasant surprise if it was to be
paired to any other switch driver than qca8k, because that other driver
would either not allocate memory for dp->priv, or (worse) allocate some
other type of structure, expected to be used differently etc.

An even bigger complication is created by the fact that we can
dynamically change tagging protocols in certain cases (dsa <-> edsa,
ocelot <-> ocelot-8021q), and the current design doesn't really scale:
if any tagging protocol required its own dp->priv format, we may end up
with bugs such as the driver not freeing the old dp->priv and setting up
the new one, when the tagging protocol changes. These mistakes are all
too easy to make currently.

Another potential issue, which I don't see present here, but still
worth watching out for, is that the tagger cannot use symbols exported
by the switch, and vice versa. Otherwise the tagger cannot be inserted
into the kernel when built as module, due to missing symbols provided by
the switch. And the switch will not probe until it has a tagger.

I'm afraid we need to make a decision now whether we keep supporting the
separation between taggers and switch drivers, especially since the
tagger could become a bus provider for the switch driver. We need to
weigh the pros and cons.

I thought about what would be needed and I think we'd need tagger-owned
per-switch-tree private data. But this implies that there needs to be a
hook in the tagging protocol driver that notifies it when a certain
struct dsa_switch_tree *dst binds and unbinds to a certain tagger.
Then it could pick and choose the ports that need dp->priv configured in
a certain way: some taggers need the dp->priv of user ports to hold
something per port, others need the dp->priv of _all_ user ports to
point to something shared, others (like yours) apparently need the
dp->priv of the CPU port to hold something. This would become something
handled and owned exclusively by the tagger.

Ansuel, would something like this help you in any way?

Vladimir Oltean Dec. 7, 2021, 9:10 p.m. UTC | #10

On Tue, Dec 07, 2021 at 10:49:43AM -0800, Florian Fainelli wrote:
> On 12/7/21 7:15 AM, Andrew Lunn wrote:
> > On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote:
> >> Hi, this is still WIP and currently has some problem but I would love if
> >> someone can give this a superficial review and answer to some problem
> >> with this.
> >>
> >> The main reason for this is that we notice some routing problem in the
> >> switch and it seems assisted learning is needed. Considering mdio is
> >> quite slow due to the indirect write using this Ethernet alternative way
> >> seems to be quicker.
> >>
> >> The qca8k switch supports a special way to pass mdio read/write request
> >> using specially crafted Ethernet packet.
> > 
> > Oh! Cool! Marvell has this as well, and i suspect a few others. It is
> > something i've wanted to work on for a long long time, but never had
> > the opportunity.
> > 
> > This also means that, even if you are focusing on qca8k, please try to
> > think what could be generic, and what should specific to the
> > qca8k. The idea of sending an Ethernet frame and sometime later
> > receiving a reply should be generic and usable for other DSA
> > drivers. The contents of those frames needs to be driver specific.
> > How we hook this into MDIO might also be generic, maybe.
> > 
> > I will look at your questions later, but soon.
> 
> There was a priori attempt from Vivien to add support for mv88e6xxx over
> RMU frames:
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg298317.html
> 
> This gets interesting because the switch's control path moves from MDIO
> to Ethernet and there is not really an "ethernet bus" though we could
> certainly come up with one. We have mdio-i2c, so maybe we should have
> mdio-ethernet?

This raises interesting questions. I see two distinct cases:

1. "dual I/O": the switch probes initially over a standard bus (MDIO,
   SPI, I2C) then at some point transitions towards I/O through the
   tagger.  This would be the case when there is some preparation work
   to be done (maybe the CPU port needs to be brought up, maybe there is
   a firmware to be uploaded to the switch's embedded microcontroller
   such that the expected remote management protocol is supported, etc).
   It would also be the case when multiple CPU ports are supported (and
   changing between CPU ports), because we could end up bringing a
   certain CPU port down, and the register I/O would need to be
   temporarily done over MDIO before bringing the other CPU port up.

2. "single I/O": the switch needs no such configuration, and in this
    case, it could in principle probe over an "Ethernet bus" rather than
    a standard bus as mentioned above.

I don't know which case is going to be more common, honestly. The
difference between the two is that the former would work using the
existing infrastructure (bus types) we have today, whereas the latter
would (maybe) need an "Ethernet bus" as mentioned by Vivien and Florian.

I'm not completely convinced, though. The argument for an "Ethernet bus"
seems to be that any Ethernet controller may need to set up such a
thing, independently of being a DSA master. In Vivien's link, an example
is given where we have "Control path via port 1, Data path via port port 3".
But I don't know, this separation seems pretty artificial and ultimately
boils down to configuration. Like it or not, in that particular example,
both ports 1 and 3 are CPU ports, and both eth1 and eth0 are DSA masters.
The fact that they are used asymmetrically should pretty much not matter.

I think a fair simplifying assumption is that switch management
protocols will never be spoken through interfaces that aren't DSA
masters (because being a DSA master _implies_ having a physical link to
a DSA switch). And if we have this simplifying factor, we could consider
moving dsa_tree_setup_master() somewhere earlier in the DSA probe path,
and just make "type 2" / "single I/O" switches be platform devices, with
a more-or-less empty probe function (just calls dsa_register_switch),
and do their hardware init in ->setup, which is _after_
dsa_tree_setup_master and therefore after the tagging protocol has bound
to the DSA switch tree and is prepared to handle I/O. So no bus really
needed.

Although I have to point this out. An "Ethernet bus" would be one of the
most unreliable buses out there. Consider that the user may run "ip link
set eth0 down" at any given time. What do you do, as a switch driver on
an Ethernet bus, when your DSA master goes down? Postpone all your I/O?
Error out on I/O? Unbind?

Even moreso, there are currently code paths in DSA that can only be done
while the DSA master is down (changing the tagging protocol comes to
mind). So does this mean that those code paths are simply not available
when the I/O is over Ethernet? Or does it mean that Ethernet cannot be
the sole I/O method of any switch driver, due to it being unreliable?
And if the latter, this is yet another argument against "Ethernet as bus".
A bus is basically only there for probing purposes, but if you need to
have a primary I/O method beside Ethernet, you already have a bus and
don't need another.

Christian Marangi Dec. 7, 2021, 9:47 p.m. UTC | #11

On Tue, Dec 07, 2021 at 10:52:19PM +0200, Vladimir Oltean wrote:
> On Tue, Dec 07, 2021 at 08:21:52PM +0100, Ansuel Smith wrote:
> > On Tue, Dec 07, 2021 at 08:15:24PM +0100, Andrew Lunn wrote:
> > > > The qca tag header provide a TYPE value that refer to a big list of
> > > > Frame type. In all of this at value 2 we have the type that tells us
> > > > that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet)
> > > > 
> > > > The idea of using the tagger is to skip parsing the packet 2 times
> > > > considering the qca tag header is present at the same place in both
> > > > normal packet and mdio rw Ethernet packet.
> > > > 
> > > > Your idea would be hook this before the tagger and parse it?
> > > > I assume that is the only way if this has to be generilized. But I
> > > > wonder if this would create some overhead by the double parsing.
> > > 
> > > So it seems i remembered this incorrectly. Marvell call this Remote
> > > Management Unit, RMU. And RMU makes use of bits inside the Marvell
> > > Tag. I was thinking it was outside of the tag.
> > > 
> > > So, yes, the tagger does need to be involved in this.
> > > 
> > > The initial design of DSA was that the tagger and main driver were
> > > kept separate. This has been causing us problems recently, we have use
> > > cases where we need to share information between the tagger and the
> > > driver. This looks like it is going to be another case of that.
> > > 
> > > 	Andrew
> > 
> > I mean if you check the code this is still somewhat ""separate"".
> > I ""abuse"" the dsa port priv to share the required data.
> > (I allocate a different struct... i put it in qca8k_priv and i set every
> > port priv to this struct)
> > 
> > Wonder if we can add something to share data between the driver and the
> > port so the access that from the tagger. (something that doesn't use the
> > port priv)
> 
> The one problem relevant to this submission among those referenced by
> Andrew is that dp->priv needs to be allocated by the Ethernet switch
> driver, although it is used by the tagging protocol driver. So they
> aren't really 'separate', nor can they be, the way dp->priv is currently
> designed, it can only be "abused", not really "used".
> 
> The DSA design allows in principle for any switch driver to return any
> protocol it desires in ->get_tag_protocol(). I occasionally test various
> tagger submissions by hacking dsa_loop to do just that. But your
> tag_qca.c driver would have a pretty unpleasant surprise if it was to be
> paired to any other switch driver than qca8k, because that other driver
> would either not allocate memory for dp->priv, or (worse) allocate some
> other type of structure, expected to be used differently etc.
>
> An even bigger complication is created by the fact that we can
> dynamically change tagging protocols in certain cases (dsa <-> edsa,
> ocelot <-> ocelot-8021q), and the current design doesn't really scale:
> if any tagging protocol required its own dp->priv format, we may end up
> with bugs such as the driver not freeing the old dp->priv and setting up
> the new one, when the tagging protocol changes. These mistakes are all
> too easy to make currently.
> 
> Another potential issue, which I don't see present here, but still
> worth watching out for, is that the tagger cannot use symbols exported
> by the switch, and vice versa. Otherwise the tagger cannot be inserted
> into the kernel when built as module, due to missing symbols provided by
> the switch. And the switch will not probe until it has a tagger.
> 
> I'm afraid we need to make a decision now whether we keep supporting the
> separation between taggers and switch drivers, especially since the
> tagger could become a bus provider for the switch driver. We need to
> weigh the pros and cons.
> 
> I thought about what would be needed and I think we'd need tagger-owned
> per-switch-tree private data. But this implies that there needs to be a
> hook in the tagging protocol driver that notifies it when a certain
> struct dsa_switch_tree *dst binds and unbinds to a certain tagger.
> Then it could pick and choose the ports that need dp->priv configured in
> a certain way: some taggers need the dp->priv of user ports to hold
> something per port, others need the dp->priv of _all_ user ports to
> point to something shared, others (like yours) apparently need the
> dp->priv of the CPU port to hold something. This would become something
> handled and owned exclusively by the tagger.
> 
> Ansuel, would something like this help you in any way?

I agree on all the concern you pointed out. IMHO, current implementation
of the dsa port priv is a bit confusing and someone can really do bad
things with it. (like it's done in this implementation)

The main problem here is that we really need a way to have shared data
between tagger and dsa driver. I also think that it would be limiting
using this only for mdio. For example qca8k can autocast mib with
Ethernet port and that would be another feature that the tagger would
handle.

I like the idea of tagger-owend per-switch-tree private data.
Do we really need to hook logic?
Wonder if something like that would work:
1. Each tagger declare size of his private data (if any).
2. Change tag dsa helper make sure the privata data in dst gets
   allocated and freed.
3. We create some helper to get the tagger private data pointer that
   dsa driver will use. (an error is returned if no data is present)
4. Tagger will use the dst to access his own data.

In theory that way we should be able to make a ""connection"" between
dsa driver and the tagger and prevent any sort of strange stuff that
could result in bug/kernel panic.

I mean for the current task (mdio in ethernet packet) we just need to
put data, send the skb and wait for a response (and after parsing) get
the data from a response skb.

Christian Marangi Dec. 7, 2021, 10:01 p.m. UTC | #12

On Tue, Dec 07, 2021 at 11:10:18PM +0200, Vladimir Oltean wrote:
> On Tue, Dec 07, 2021 at 10:49:43AM -0800, Florian Fainelli wrote:
> > On 12/7/21 7:15 AM, Andrew Lunn wrote:
> > > On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote:
> > >> Hi, this is still WIP and currently has some problem but I would love if
> > >> someone can give this a superficial review and answer to some problem
> > >> with this.
> > >>
> > >> The main reason for this is that we notice some routing problem in the
> > >> switch and it seems assisted learning is needed. Considering mdio is
> > >> quite slow due to the indirect write using this Ethernet alternative way
> > >> seems to be quicker.
> > >>
> > >> The qca8k switch supports a special way to pass mdio read/write request
> > >> using specially crafted Ethernet packet.
> > > 
> > > Oh! Cool! Marvell has this as well, and i suspect a few others. It is
> > > something i've wanted to work on for a long long time, but never had
> > > the opportunity.
> > > 
> > > This also means that, even if you are focusing on qca8k, please try to
> > > think what could be generic, and what should specific to the
> > > qca8k. The idea of sending an Ethernet frame and sometime later
> > > receiving a reply should be generic and usable for other DSA
> > > drivers. The contents of those frames needs to be driver specific.
> > > How we hook this into MDIO might also be generic, maybe.
> > > 
> > > I will look at your questions later, but soon.
> > 
> > There was a priori attempt from Vivien to add support for mv88e6xxx over
> > RMU frames:
> > 
> > https://www.mail-archive.com/netdev@vger.kernel.org/msg298317.html
> > 
> > This gets interesting because the switch's control path moves from MDIO
> > to Ethernet and there is not really an "ethernet bus" though we could
> > certainly come up with one. We have mdio-i2c, so maybe we should have
> > mdio-ethernet?
> 
> This raises interesting questions. I see two distinct cases:
> 
> 1. "dual I/O": the switch probes initially over a standard bus (MDIO,
>    SPI, I2C) then at some point transitions towards I/O through the
>    tagger.  This would be the case when there is some preparation work
>    to be done (maybe the CPU port needs to be brought up, maybe there is
>    a firmware to be uploaded to the switch's embedded microcontroller
>    such that the expected remote management protocol is supported, etc).
>    It would also be the case when multiple CPU ports are supported (and
>    changing between CPU ports), because we could end up bringing a
>    certain CPU port down, and the register I/O would need to be
>    temporarily done over MDIO before bringing the other CPU port up.
> 
> 2. "single I/O": the switch needs no such configuration, and in this
>     case, it could in principle probe over an "Ethernet bus" rather than
>     a standard bus as mentioned above.
> 
> I don't know which case is going to be more common, honestly. The
> difference between the two is that the former would work using the
> existing infrastructure (bus types) we have today, whereas the latter
> would (maybe) need an "Ethernet bus" as mentioned by Vivien and Florian.
>

Considering this is very specific for qca8k (it does use very not
standard Ethernet packet) and we have only another switch that more or
less supports this, I honestly think we should think about the dual I/0.
qca8k for example require to be setup first with mdio (as it does
require some bit to enable header mode, the tagger way to comunicate
ethernet mdio type packet) and generally the cpu port has to be
configured for the phy mode. We can't assume a bootloader correctly
setup the cpu port and the switch being able to receive packet so I
think a fallback is always necessary.

> I'm not completely convinced, though. The argument for an "Ethernet bus"
> seems to be that any Ethernet controller may need to set up such a
> thing, independently of being a DSA master. In Vivien's link, an example
> is given where we have "Control path via port 1, Data path via port port 3".
> But I don't know, this separation seems pretty artificial and ultimately
> boils down to configuration. Like it or not, in that particular example,
> both ports 1 and 3 are CPU ports, and both eth1 and eth0 are DSA masters.
> The fact that they are used asymmetrically should pretty much not matter.
> 
> I think a fair simplifying assumption is that switch management
> protocols will never be spoken through interfaces that aren't DSA
> masters (because being a DSA master _implies_ having a physical link to
> a DSA switch). And if we have this simplifying factor, we could consider
> moving dsa_tree_setup_master() somewhere earlier in the DSA probe path,
> and just make "type 2" / "single I/O" switches be platform devices, with
> a more-or-less empty probe function (just calls dsa_register_switch),
> and do their hardware init in ->setup, which is _after_
> dsa_tree_setup_master and therefore after the tagging protocol has bound
> to the DSA switch tree and is prepared to handle I/O. So no bus really
> needed.
> 
> Although I have to point this out. An "Ethernet bus" would be one of the
> most unreliable buses out there. Consider that the user may run "ip link
> set eth0 down" at any given time. What do you do, as a switch driver on
> an Ethernet bus, when your DSA master goes down? Postpone all your I/O?
> Error out on I/O? Unbind?

With something like an Ethernet bus only (no dual I/O) the operation
should just be rejected to prevent this kind of stuff. (return EBUSY error?)

> 
> Even moreso, there are currently code paths in DSA that can only be done
> while the DSA master is down (changing the tagging protocol comes to
> mind). So does this mean that those code paths are simply not available
> when the I/O is over Ethernet? Or does it mean that Ethernet cannot be
> the sole I/O method of any switch driver, due to it being unreliable?
> And if the latter, this is yet another argument against "Ethernet as bus".
> A bus is basically only there for probing purposes, but if you need to
> have a primary I/O method beside Ethernet, you already have a bus and
> don't need another.

As I said up, a ""stable"" I/O method must be present and the fast path
should be used only if available. About this I'm thinking if we should
create an helper and let dsa decide when a method can be used instead of
another one. A dsa driver will then use these helper function and mimic
the standard read/write/update_bits thing (but this would be very specific
and used only for Ethernet mdio and probably not correct)

Andrew Lunn Dec. 7, 2021, 10:22 p.m. UTC | #13

> The main problem here is that we really need a way to have shared data
> between tagger and dsa driver. I also think that it would be limiting
> using this only for mdio. For example qca8k can autocast mib with
> Ethernet port and that would be another feature that the tagger would
> handle.

The Marvell switches also have an efficient way to get the whole MIB
table. So this is something i would also want.

> I like the idea of tagger-owend per-switch-tree private data.
> Do we really need to hook logic?

We have two different things here.

1) The tagger needs somewhere to store its own private data.
2) The tagger needs to share state with the switch driver.

We can probably have the DSA core provide 1). Add the size to
dsa_device_ops structure, and provide helpers to go from either a
master or a slave netdev to the private data.

2) is harder. But as far as i know, we have an 1:N setup.  One switch
driver can use N tag drivers. So we need the switch driver to be sure
the tag driver is what it expects. We keep the shared state in the tag
driver, so it always has valid data, but when the switch driver wants
to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
if it does not match, the core should return -EINVAL or similar.

   Andrew

Christian Marangi Dec. 7, 2021, 10:30 p.m. UTC | #14

On Tue, Dec 07, 2021 at 11:22:41PM +0100, Andrew Lunn wrote:
> > The main problem here is that we really need a way to have shared data
> > between tagger and dsa driver. I also think that it would be limiting
> > using this only for mdio. For example qca8k can autocast mib with
> > Ethernet port and that would be another feature that the tagger would
> > handle.
> 
> The Marvell switches also have an efficient way to get the whole MIB
> table. So this is something i would also want.
>

Again same think... they just put the type in the qca hdr (placed in the
EthType position) and everything else is a mib. The switch send 7 packet
and each correspond to the MIB for the port (the port number is
comunicated in the qca hdr)

> > I like the idea of tagger-owend per-switch-tree private data.
> > Do we really need to hook logic?
> 
> We have two different things here.
> 
> 1) The tagger needs somewhere to store its own private data.
> 2) The tagger needs to share state with the switch driver.
> 
> We can probably have the DSA core provide 1). Add the size to
> dsa_device_ops structure, and provide helpers to go from either a
> master or a slave netdev to the private data.

I'm just implementing this. It doesn't look that hard.

> 
> 2) is harder. But as far as i know, we have an 1:N setup.  One switch
> driver can use N tag drivers. So we need the switch driver to be sure
> the tag driver is what it expects. We keep the shared state in the tag
> driver, so it always has valid data, but when the switch driver wants
> to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
> if it does not match, the core should return -EINVAL or similar.
> 

Mhh this looks a bit complex. I'm probably missing something but why the
tagger needs to share a state? To check if it does support some feature?
If it's ready to be used for mdio Ethernet? Or just to be future-proof?

>    Andrew

Andrew Lunn Dec. 7, 2021, 10:37 p.m. UTC | #15

> This raises interesting questions. I see two distinct cases:
> 
> 1. "dual I/O": the switch probes initially over a standard bus (MDIO,
>    SPI, I2C) then at some point transitions towards I/O through the
>    tagger.  This would be the case when there is some preparation work
>    to be done (maybe the CPU port needs to be brought up, maybe there is
>    a firmware to be uploaded to the switch's embedded microcontroller
>    such that the expected remote management protocol is supported, etc).
>    It would also be the case when multiple CPU ports are supported (and
>    changing between CPU ports), because we could end up bringing a
>    certain CPU port down, and the register I/O would need to be
>    temporarily done over MDIO before bringing the other CPU port up.

mv88e6xxx is very likely to take this path. You need to program some
registers to enable RMU. It is possible to enable this via EEPROM
configuration, but i've never seen any hardware with the necessary
EEPROM configuration. And you have the old chicken/egg, in order to be
able to program the EEPROM, you need access to the switch, or a header
and a cable.

> 2. "single I/O": the switch needs no such configuration, and in this
>     case, it could in principle probe over an "Ethernet bus" rather than
>     a standard bus as mentioned above.
> 
> I don't know which case is going to be more common, honestly.

Given the history, i think MDIO startup, and then transition to
Ethernet is going to be a lot more common.  If there was a lot of
hardware out there which could do Ethernet from the beginning, we
would of had patches or at least requests for it by now. 

I would keep it KISS.

      Andrew

Vladimir Oltean Dec. 7, 2021, 10:45 p.m. UTC | #16

On Tue, Dec 07, 2021 at 10:47:59PM +0100, Ansuel Smith wrote:
> The main problem here is that we really need a way to have shared data
> between tagger and dsa driver. I also think that it would be limiting
> using this only for mdio. For example qca8k can autocast mib with
> Ethernet port and that would be another feature that the tagger would
> handle.

This is cool. I suppose this is what QCA_HDR_RECV_TYPE_MIB is for.
But it wouldn't work with your design because the tagger doesn't hold
any queues, it is basically a request/response which is always initiated
by the switch driver. The hardware can't automatically push anything to
software on its own. Maybe if the tagger wouldn't be stateless, that
would be better? What if the qca8k switch driver would just provide some
function pointers to the switch driver (these would be protocol
sub-handlers for QCA_HDR_RECV_TYPE_MIB, QCA_HDR_RECV_TYPE_RW_REG_ACK
etc), and your completion structure would be both initialized, as well
as finalized, all from within the switch driver itself?

> I like the idea of tagger-owend per-switch-tree private data.
> Do we really need to hook logic?
> Wonder if something like that would work:
> 1. Each tagger declare size of his private data (if any).
> 2. Change tag dsa helper make sure the privata data in dst gets
>    allocated and freed.
> 3. We create some helper to get the tagger private data pointer that
>    dsa driver will use. (an error is returned if no data is present)
> 4. Tagger will use the dst to access his own data.

I considered a simplified form like this, but I think the tagger private
data will still stay in dp->priv, only its ownership will change.
It is less flexible to just have an autoalloc size. Ok, you allocate a
structure the size you need, but which dp->priv gets to have it?
Maybe a certain tagging protocol will need dp1->priv == dp2->priv ==
dp3->priv == ..., whereas a different tagging protocol will need unique
different structures for each dp.

> 
> In theory that way we should be able to make a ""connection"" between
> dsa driver and the tagger and prevent any sort of strange stuff that
> could result in bug/kernel panic.
> 
> I mean for the current task (mdio in ethernet packet) we just need to
> put data, send the skb and wait for a response (and after parsing) get
> the data from a response skb.

It would be a huge win IMO if we could avoid managing the lifetime of
dp->priv _directly_. I'm thinking something along the lines of:

- every time we make the "dst->tag_ops = tag_ops;" assignment (see dsa2.c)
  there is a connection event between the switch tree and the tagging
  protocol (and also a disconnection event, if dst->tag_ops wasn't
  previously NULL).

- we could add a new tag_ops->connect(dst) and tag_ops->disconnect(dst)
  and call them. These could allocate/free the dp->priv memory for each
  dp in &dst->ports.

- _after_ the tag_ops->connect() has been called (this makes sure that
  the tagger memory has been allocated) we could also emit a cross-chip
  notifier event:

	/* DSA_NOTIFIER_TAG_PROTO_CONNECT */
	struct dsa_notifier_tag_proto_connect_info {
		const struct dsa_device_ops *tag_ops;
	};

	struct dsa_notifier_tag_proto_connect_info info;

	dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO, &info);

  The role of a cross-chip notifier is to fan-out a call exactly once to
  every switch within a tree. This particular cross-chip notifier could
  end up with an implementation in switch.c that lands with a call to:

  ds->ops->tag_proto_connect(ds, tag_ops);

  At this point, I'm a bit fuzzy on the details. I'm thinking of
  something like this:

	struct qca8k_tagger_private {
		void (*rw_reg_ack_handler)(struct dsa_port *dp, void *buf);
		void (*mib_autocast_handler)(struct dsa_port *dp, void *buf);
	};

	static void qca8k_rw_reg_ack_handler(struct dsa_port *dp, void *buf)
	{
		... (code moved from tagger)
	}

	static void qca8k_mib_autocast_handler(struct dsa_port *dp, void *buf)
	{
		... (code moved from tagger)
	}

	static int qca8k_tag_proto_connect(struct dsa_switch *ds,
					   const struct dsa_device_ops *tag_ops)
	{
		switch (tag_ops->proto) {
		case DSA_TAG_PROTO_QCA:
			struct dsa_port *dp;

			dsa_switch_for_each_port(dp, ds) {
				struct qca8k_tagger_private *priv = dp->priv;

				priv->rw_reg_ack_handler = qca8k_rw_reg_ack_handler;
				priv->mib_autocast_handler = qca8k_mib_autocast_handler;
			}

			break;
		default:
			return -EOPNOTSUPP;
		}
	}

	static const struct dsa_switch_ops qca8k_switch_ops = {
		...
		.tag_proto_connect	= qca8k_tag_proto_connect,
	};

  My current idea is maybe not ideal and a bit fuzzy, because the switch
  driver would need to be aware of the fact that the tagger private data
  is in dp->priv, and some code in one folder needs to be in sync with
  some code in another folder. But at least it should be safer this way,
  because we are in more control over the exact connection that's being
  made.

- to avoid leaking memory, we also need to patch dsa_tree_put() to issue
  a disconnect event on unbind.

- the tagging protocol driver would always need to NULL-check the
  function pointer before dereferencing it, because it may connect to a
  switch driver that doesn't set them up (dsa_loop):

	struct qca8k_tagger_private *priv = dp->priv;

	if (priv->rw_reg_ack_handler)
		priv->rw_reg_ack_handler(dp, skb_mac_header(skb));

Andrew Lunn Dec. 7, 2021, 10:46 p.m. UTC | #17

> > 1) The tagger needs somewhere to store its own private data.
> > 2) The tagger needs to share state with the switch driver.
> > 
> > We can probably have the DSA core provide 1). Add the size to
> > dsa_device_ops structure, and provide helpers to go from either a
> > master or a slave netdev to the private data.
> 
> I'm just implementing this. It doesn't look that hard.
> 
> > 
> > 2) is harder. But as far as i know, we have an 1:N setup.  One switch
> > driver can use N tag drivers. So we need the switch driver to be sure
> > the tag driver is what it expects. We keep the shared state in the tag
> > driver, so it always has valid data, but when the switch driver wants
> > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
> > if it does not match, the core should return -EINVAL or similar.
> > 
> 
> Mhh this looks a bit complex. I'm probably missing something but why the
> tagger needs to share a state? To check if it does support some feature?
> If it's ready to be used for mdio Ethernet? Or just to be future-proof?

This is the general problem we have, and it might not be relevant for
this specific problem of MDIO over Ethernet.

tag_sja1105 wants access to a queue of frames and a work queue shared
with switch driver.

tag_ocelot_8021q has something similar to tag_sja1105.

tag_lan9303 wants to know if its two ports are in the same bridge.

	Andrew

Andrew Lunn Dec. 7, 2021, 10:54 p.m. UTC | #18

> I considered a simplified form like this, but I think the tagger private
> data will still stay in dp->priv, only its ownership will change.

Isn't dp a port structure. So there is one per port?

This is where i think we need to separate shared state from tagger
private data. Probably tagger private data is not per port. Shared
state between the switch driver and the tagger maybe is per port?

      Andrew

Christian Marangi Dec. 7, 2021, 11:05 p.m. UTC | #19

On Wed, Dec 08, 2021 at 12:45:25AM +0200, Vladimir Oltean wrote:
> On Tue, Dec 07, 2021 at 10:47:59PM +0100, Ansuel Smith wrote:
> > The main problem here is that we really need a way to have shared data
> > between tagger and dsa driver. I also think that it would be limiting
> > using this only for mdio. For example qca8k can autocast mib with
> > Ethernet port and that would be another feature that the tagger would
> > handle.
> 
> This is cool. I suppose this is what QCA_HDR_RECV_TYPE_MIB is for.

Exactly that.

> But it wouldn't work with your design because the tagger doesn't hold
> any queues, it is basically a request/response which is always initiated
> by the switch driver. The hardware can't automatically push anything to
> software on its own. Maybe if the tagger wouldn't be stateless, that
> would be better? What if the qca8k switch driver would just provide some
> function pointers to the switch driver (these would be protocol
> sub-handlers for QCA_HDR_RECV_TYPE_MIB, QCA_HDR_RECV_TYPE_RW_REG_ACK
> etc), and your completion structure would be both initialized, as well
> as finalized, all from within the switch driver itself?
> 

Hm. Interesting idea. So qca8k would provide the way to parse the packet
and made the request. The tagger would just detect the packet and
execute the dedicated function.
About mib considering the driver autocast counter for every port and
every packet have the relevant port to it (set in the qca tag), the
idea was to put a big array and directly write the data. The ethtool
function will then just read the data and report it. (or even work
directly on the ethtool data array).

> > I like the idea of tagger-owend per-switch-tree private data.
> > Do we really need to hook logic?
> > Wonder if something like that would work:
> > 1. Each tagger declare size of his private data (if any).
> > 2. Change tag dsa helper make sure the privata data in dst gets
> >    allocated and freed.
> > 3. We create some helper to get the tagger private data pointer that
> >    dsa driver will use. (an error is returned if no data is present)
> > 4. Tagger will use the dst to access his own data.
> 
> I considered a simplified form like this, but I think the tagger private
> data will still stay in dp->priv, only its ownership will change.
> It is less flexible to just have an autoalloc size. Ok, you allocate a
> structure the size you need, but which dp->priv gets to have it?
> Maybe a certain tagging protocol will need dp1->priv == dp2->priv ==
> dp3->priv == ..., whereas a different tagging protocol will need unique
> different structures for each dp.
> 
> > 
> > In theory that way we should be able to make a ""connection"" between
> > dsa driver and the tagger and prevent any sort of strange stuff that
> > could result in bug/kernel panic.
> > 
> > I mean for the current task (mdio in ethernet packet) we just need to
> > put data, send the skb and wait for a response (and after parsing) get
> > the data from a response skb.
> 
> It would be a huge win IMO if we could avoid managing the lifetime of
> dp->priv _directly_. I'm thinking something along the lines of:
> 
> - every time we make the "dst->tag_ops = tag_ops;" assignment (see dsa2.c)
>   there is a connection event between the switch tree and the tagging
>   protocol (and also a disconnection event, if dst->tag_ops wasn't
>   previously NULL).
> 
> - we could add a new tag_ops->connect(dst) and tag_ops->disconnect(dst)
>   and call them. These could allocate/free the dp->priv memory for each
>   dp in &dst->ports.
> 
> - _after_ the tag_ops->connect() has been called (this makes sure that
>   the tagger memory has been allocated) we could also emit a cross-chip
>   notifier event:
> 
> 	/* DSA_NOTIFIER_TAG_PROTO_CONNECT */
> 	struct dsa_notifier_tag_proto_connect_info {
> 		const struct dsa_device_ops *tag_ops;
> 	};
> 
> 	struct dsa_notifier_tag_proto_connect_info info;
> 
> 	dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO, &info);
> 
>   The role of a cross-chip notifier is to fan-out a call exactly once to
>   every switch within a tree. This particular cross-chip notifier could
>   end up with an implementation in switch.c that lands with a call to:
> 
>   ds->ops->tag_proto_connect(ds, tag_ops);
> 
>   At this point, I'm a bit fuzzy on the details. I'm thinking of
>   something like this:
> 
> 	struct qca8k_tagger_private {
> 		void (*rw_reg_ack_handler)(struct dsa_port *dp, void *buf);
> 		void (*mib_autocast_handler)(struct dsa_port *dp, void *buf);
> 	};
> 
> 	static void qca8k_rw_reg_ack_handler(struct dsa_port *dp, void *buf)
> 	{
> 		... (code moved from tagger)
> 	}
> 
> 	static void qca8k_mib_autocast_handler(struct dsa_port *dp, void *buf)
> 	{
> 		... (code moved from tagger)
> 	}
> 
> 	static int qca8k_tag_proto_connect(struct dsa_switch *ds,
> 					   const struct dsa_device_ops *tag_ops)
> 	{
> 		switch (tag_ops->proto) {
> 		case DSA_TAG_PROTO_QCA:
> 			struct dsa_port *dp;
> 
> 			dsa_switch_for_each_port(dp, ds) {
> 				struct qca8k_tagger_private *priv = dp->priv;
> 
> 				priv->rw_reg_ack_handler = qca8k_rw_reg_ack_handler;
> 				priv->mib_autocast_handler = qca8k_mib_autocast_handler;
> 			}
> 
> 			break;
> 		default:
> 			return -EOPNOTSUPP;
> 		}
> 	}
> 
> 	static const struct dsa_switch_ops qca8k_switch_ops = {
> 		...
> 		.tag_proto_connect	= qca8k_tag_proto_connect,
> 	};
> 
>   My current idea is maybe not ideal and a bit fuzzy, because the switch
>   driver would need to be aware of the fact that the tagger private data
>   is in dp->priv, and some code in one folder needs to be in sync with
>   some code in another folder. But at least it should be safer this way,
>   because we are in more control over the exact connection that's being
>   made.
> 
> - to avoid leaking memory, we also need to patch dsa_tree_put() to issue
>   a disconnect event on unbind.
> 
> - the tagging protocol driver would always need to NULL-check the
>   function pointer before dereferencing it, because it may connect to a
>   switch driver that doesn't set them up (dsa_loop):
> 
> 	struct qca8k_tagger_private *priv = dp->priv;
> 
> 	if (priv->rw_reg_ack_handler)
> 		priv->rw_reg_ack_handler(dp, skb_mac_header(skb));

Ok so your idea is to make the driver the one controlling ""everything""
and keep the tagger as dummy as possible. That would also remove all the
need to put stuff in the global include dir. Looks complex but handy. We
still need to understand the state part. Any hint about that?

In the mean time I will try implement this.

Vladimir Oltean Dec. 7, 2021, 11:14 p.m. UTC | #20

On Tue, Dec 07, 2021 at 11:54:07PM +0100, Andrew Lunn wrote:
> > I considered a simplified form like this, but I think the tagger private
> > data will still stay in dp->priv, only its ownership will change.
> 
> Isn't dp a port structure. So there is one per port?

Yes, but dp->priv is a pointer. The thing it points to may not
necessarily be per port.

> This is where i think we need to separate shared state from tagger
> private data. Probably tagger private data is not per port. Shared
> state between the switch driver and the tagger maybe is per port?

I don't know whether there's such a big difference between
"shared state" vs "private data"? The dp->priv model is flexible enough
to support both. For example, in tag_sja1105, dp->priv is a struct
sja1105_port. All struct sja1105_port of a switch have a common struct
sja1105_tagger_data *data pointer. We could certainly set up the
tag_ops->connect(dst) function to allocate memory in this way.

Vladimir Oltean Dec. 7, 2021, 11:20 p.m. UTC | #21

On Wed, Dec 08, 2021 at 12:05:11AM +0100, Ansuel Smith wrote:
> Hm. Interesting idea. So qca8k would provide the way to parse the packet
> and made the request. The tagger would just detect the packet and
> execute the dedicated function.
> About mib considering the driver autocast counter for every port and
> every packet have the relevant port to it (set in the qca tag), the
> idea was to put a big array and directly write the data. The ethtool
> function will then just read the data and report it. (or even work
> directly on the ethtool data array).

Apart from the fact that you'd be running inside the priv->rw_reg_ack_handler()
which runs in softirq context (so you need spinlocks to serialize with
the code that runs in process and/or workqueue context), you have access
to all the data structures from the switch driver that you're used to.
So you could copy from the void *buf into something owned by struct
qca8k_priv *priv, sure.

> >   My current idea is maybe not ideal and a bit fuzzy, because the switch
> >   driver would need to be aware of the fact that the tagger private data
> >   is in dp->priv, and some code in one folder needs to be in sync with
> >   some code in another folder. But at least it should be safer this way,
> >   because we are in more control over the exact connection that's being
> >   made.
> > 
> > - to avoid leaking memory, we also need to patch dsa_tree_put() to issue
> >   a disconnect event on unbind.
> > 
> > - the tagging protocol driver would always need to NULL-check the
> >   function pointer before dereferencing it, because it may connect to a
> >   switch driver that doesn't set them up (dsa_loop):
> > 
> > 	struct qca8k_tagger_private *priv = dp->priv;
> > 
> > 	if (priv->rw_reg_ack_handler)
> > 		priv->rw_reg_ack_handler(dp, skb_mac_header(skb));
> 
> Ok so your idea is to make the driver the one controlling ""everything""
> and keep the tagger as dummy as possible. That would also remove all the
> need to put stuff in the global include dir. Looks complex but handy. We
> still need to understand the state part. Any hint about that?
> 
> In the mean time I will try implement this.

What do you mean exactly by understanding the state?

Christian Marangi Dec. 7, 2021, 11:24 p.m. UTC | #22

On Wed, Dec 08, 2021 at 01:20:20AM +0200, Vladimir Oltean wrote:
> On Wed, Dec 08, 2021 at 12:05:11AM +0100, Ansuel Smith wrote:
> > Hm. Interesting idea. So qca8k would provide the way to parse the packet
> > and made the request. The tagger would just detect the packet and
> > execute the dedicated function.
> > About mib considering the driver autocast counter for every port and
> > every packet have the relevant port to it (set in the qca tag), the
> > idea was to put a big array and directly write the data. The ethtool
> > function will then just read the data and report it. (or even work
> > directly on the ethtool data array).
> 
> Apart from the fact that you'd be running inside the priv->rw_reg_ack_handler()
> which runs in softirq context (so you need spinlocks to serialize with
> the code that runs in process and/or workqueue context), you have access
> to all the data structures from the switch driver that you're used to.
> So you could copy from the void *buf into something owned by struct
> qca8k_priv *priv, sure.
> 
> > >   My current idea is maybe not ideal and a bit fuzzy, because the switch
> > >   driver would need to be aware of the fact that the tagger private data
> > >   is in dp->priv, and some code in one folder needs to be in sync with
> > >   some code in another folder. But at least it should be safer this way,
> > >   because we are in more control over the exact connection that's being
> > >   made.
> > > 
> > > - to avoid leaking memory, we also need to patch dsa_tree_put() to issue
> > >   a disconnect event on unbind.
> > > 
> > > - the tagging protocol driver would always need to NULL-check the
> > >   function pointer before dereferencing it, because it may connect to a
> > >   switch driver that doesn't set them up (dsa_loop):
> > > 
> > > 	struct qca8k_tagger_private *priv = dp->priv;
> > > 
> > > 	if (priv->rw_reg_ack_handler)
> > > 		priv->rw_reg_ack_handler(dp, skb_mac_header(skb));
> > 
> > Ok so your idea is to make the driver the one controlling ""everything""
> > and keep the tagger as dummy as possible. That would also remove all the
> > need to put stuff in the global include dir. Looks complex but handy. We
> > still need to understand the state part. Any hint about that?
> > 
> > In the mean time I will try implement this.
> 
> What do you mean exactly by understanding the state?

I was referring to the "shared state" problem but you already answer
that in the prev email.

Vladimir Oltean Dec. 7, 2021, 11:47 p.m. UTC | #23

On Tue, Dec 07, 2021 at 11:22:41PM +0100, Andrew Lunn wrote:
> > I like the idea of tagger-owend per-switch-tree private data.
> > Do we really need to hook logic?
> 
> We have two different things here.
> 
> 1) The tagger needs somewhere to store its own private data.
> 2) The tagger needs to share state with the switch driver.
> 
> We can probably have the DSA core provide 1). Add the size to
> dsa_device_ops structure, and provide helpers to go from either a
> master or a slave netdev to the private data.

We cannot "add the size to the dsa_device_ops structure", because it is
a singleton (const struct). It is not replicated at all, not per port,
not per switch, not per tree, but global to the kernel. Not to mention
const. Nobody needs state as shared as that.

Given this, do you have objections to the sja1105_port->data model for
shared state?

> 2) is harder. But as far as i know, we have an 1:N setup.  One switch
> driver can use N tag drivers. So we need the switch driver to be sure
> the tag driver is what it expects. We keep the shared state in the tag
> driver, so it always has valid data, but when the switch driver wants
> to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
> if it does not match, the core should return -EINVAL or similar.

In my proposal, the tagger will allocate the memory from its side of the
->connect() call. So regardless of whether the switch driver side
connects or not, the memory inside dp->priv is there for the tagger to
use. The switch can access it or it can ignore it.

Vladimir Oltean Dec. 8, 2021, 12:04 a.m. UTC | #24

On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote:
> > 2) is harder. But as far as i know, we have an 1:N setup.  One switch
> > driver can use N tag drivers. So we need the switch driver to be sure
> > the tag driver is what it expects. We keep the shared state in the tag
> > driver, so it always has valid data, but when the switch driver wants
> > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
> > if it does not match, the core should return -EINVAL or similar.
> 
> In my proposal, the tagger will allocate the memory from its side of the
> ->connect() call. So regardless of whether the switch driver side
> connects or not, the memory inside dp->priv is there for the tagger to
> use. The switch can access it or it can ignore it.

I don't think I actually said something useful here.

The goal would be to minimize use of dp->priv inside the switch driver,
outside of the actual ->connect() / ->disconnect() calls.
For example, in the felix driver which supports two tagging protocol
drivers, I think these two methods would be enough, and they would
replace the current felix_port_setup_tagger_data() and
felix_port_teardown_tagger_data() calls.

An additional benefit would be that in ->connect() and ->disconnect() we
get the actual tagging protocol in use. Currently the felix driver lacks
there, because felix_port_setup_tagger_data() just sets dp->priv up
unconditionally for the ocelot-8021q tagging protocol (luckily the
normal ocelot tagger doesn't need dp->priv).

In sja1105 the story is a bit longer, but I believe that can also be
cleaned up to stay within the confines of ->connect()/->disconnect().

So I guess we just need to be careful and push back against dubious use
during review.

Vladimir Oltean Dec. 8, 2021, 12:40 a.m. UTC | #25

On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote:
> On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote:
> > > 2) is harder. But as far as i know, we have an 1:N setup.  One switch
> > > driver can use N tag drivers. So we need the switch driver to be sure
> > > the tag driver is what it expects. We keep the shared state in the tag
> > > driver, so it always has valid data, but when the switch driver wants
> > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
> > > if it does not match, the core should return -EINVAL or similar.
> > 
> > In my proposal, the tagger will allocate the memory from its side of the
> > ->connect() call. So regardless of whether the switch driver side
> > connects or not, the memory inside dp->priv is there for the tagger to
> > use. The switch can access it or it can ignore it.
> 
> I don't think I actually said something useful here.
> 
> The goal would be to minimize use of dp->priv inside the switch driver,
> outside of the actual ->connect() / ->disconnect() calls.
> For example, in the felix driver which supports two tagging protocol
> drivers, I think these two methods would be enough, and they would
> replace the current felix_port_setup_tagger_data() and
> felix_port_teardown_tagger_data() calls.
> 
> An additional benefit would be that in ->connect() and ->disconnect() we
> get the actual tagging protocol in use. Currently the felix driver lacks
> there, because felix_port_setup_tagger_data() just sets dp->priv up
> unconditionally for the ocelot-8021q tagging protocol (luckily the
> normal ocelot tagger doesn't need dp->priv).
> 
> In sja1105 the story is a bit longer, but I believe that can also be
> cleaned up to stay within the confines of ->connect()/->disconnect().
> 
> So I guess we just need to be careful and push back against dubious use
> during review.

I've started working on a prototype for converting sja1105 to this model.
It should be clearer to me by tomorrow whether there is anything missing
from this proposal.

Christian Marangi Dec. 8, 2021, 12:42 a.m. UTC | #26

On Wed, Dec 08, 2021 at 02:40:51AM +0200, Vladimir Oltean wrote:
> On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote:
> > On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote:
> > > > 2) is harder. But as far as i know, we have an 1:N setup.  One switch
> > > > driver can use N tag drivers. So we need the switch driver to be sure
> > > > the tag driver is what it expects. We keep the shared state in the tag
> > > > driver, so it always has valid data, but when the switch driver wants
> > > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
> > > > if it does not match, the core should return -EINVAL or similar.
> > > 
> > > In my proposal, the tagger will allocate the memory from its side of the
> > > ->connect() call. So regardless of whether the switch driver side
> > > connects or not, the memory inside dp->priv is there for the tagger to
> > > use. The switch can access it or it can ignore it.
> > 
> > I don't think I actually said something useful here.
> > 
> > The goal would be to minimize use of dp->priv inside the switch driver,
> > outside of the actual ->connect() / ->disconnect() calls.
> > For example, in the felix driver which supports two tagging protocol
> > drivers, I think these two methods would be enough, and they would
> > replace the current felix_port_setup_tagger_data() and
> > felix_port_teardown_tagger_data() calls.
> > 
> > An additional benefit would be that in ->connect() and ->disconnect() we
> > get the actual tagging protocol in use. Currently the felix driver lacks
> > there, because felix_port_setup_tagger_data() just sets dp->priv up
> > unconditionally for the ocelot-8021q tagging protocol (luckily the
> > normal ocelot tagger doesn't need dp->priv).
> > 
> > In sja1105 the story is a bit longer, but I believe that can also be
> > cleaned up to stay within the confines of ->connect()/->disconnect().
> > 
> > So I guess we just need to be careful and push back against dubious use
> > during review.
> 
> I've started working on a prototype for converting sja1105 to this model.
> It should be clearer to me by tomorrow whether there is anything missing
> from this proposal.

I'm working on your suggestion and I should be able to post another RFC
this night if all works correctly with my switch.

Vladimir Oltean Dec. 8, 2021, 1:09 a.m. UTC | #27

On Wed, Dec 08, 2021 at 01:42:59AM +0100, Ansuel Smith wrote:
> On Wed, Dec 08, 2021 at 02:40:51AM +0200, Vladimir Oltean wrote:
> > On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote:
> > > On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote:
> > > > > 2) is harder. But as far as i know, we have an 1:N setup.  One switch
> > > > > driver can use N tag drivers. So we need the switch driver to be sure
> > > > > the tag driver is what it expects. We keep the shared state in the tag
> > > > > driver, so it always has valid data, but when the switch driver wants
> > > > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
> > > > > if it does not match, the core should return -EINVAL or similar.
> > > > 
> > > > In my proposal, the tagger will allocate the memory from its side of the
> > > > ->connect() call. So regardless of whether the switch driver side
> > > > connects or not, the memory inside dp->priv is there for the tagger to
> > > > use. The switch can access it or it can ignore it.
> > > 
> > > I don't think I actually said something useful here.
> > > 
> > > The goal would be to minimize use of dp->priv inside the switch driver,
> > > outside of the actual ->connect() / ->disconnect() calls.
> > > For example, in the felix driver which supports two tagging protocol
> > > drivers, I think these two methods would be enough, and they would
> > > replace the current felix_port_setup_tagger_data() and
> > > felix_port_teardown_tagger_data() calls.
> > > 
> > > An additional benefit would be that in ->connect() and ->disconnect() we
> > > get the actual tagging protocol in use. Currently the felix driver lacks
> > > there, because felix_port_setup_tagger_data() just sets dp->priv up
> > > unconditionally for the ocelot-8021q tagging protocol (luckily the
> > > normal ocelot tagger doesn't need dp->priv).
> > > 
> > > In sja1105 the story is a bit longer, but I believe that can also be
> > > cleaned up to stay within the confines of ->connect()/->disconnect().
> > > 
> > > So I guess we just need to be careful and push back against dubious use
> > > during review.
> > 
> > I've started working on a prototype for converting sja1105 to this model.
> > It should be clearer to me by tomorrow whether there is anything missing
> > from this proposal.
> 
> I'm working on your suggestion and I should be able to post another RFC
> this night if all works correctly with my switch.

Ok. The key point with my progress so far is that Andrew may be right
and we might need separate tagger priv pointers per port and per switch.
At least for the cleanliness of implementation. In the end I plan to
remove dp->priv and stay with dp->tagger_priv and ds->tagger_priv.

Here's what I have so far. I have more changes locally, but the rest it
isn't ready and overall also a bit irrelevant for the discussion.
I'm going to sleep now.

diff --git a/include/net/dsa.h b/include/net/dsa.h
index bdf308a5c55e..f0f702774c8d 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -82,12 +82,15 @@ enum dsa_tag_protocol {
 };
 
 struct dsa_switch;
+struct dsa_switch_tree;
 
 struct dsa_device_ops {
 	struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
 	struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev);
 	void (*flow_dissect)(const struct sk_buff *skb, __be16 *proto,
 			     int *offset);
+	int (*connect)(struct dsa_switch_tree *dst);
+	void (*disconnect)(struct dsa_switch_tree *dst);
 	unsigned int needed_headroom;
 	unsigned int needed_tailroom;
 	const char *name;
@@ -279,6 +282,8 @@ struct dsa_port {
 	 */
 	void *priv;
 
+	void *tagger_priv;
+
 	/*
 	 * Original copy of the master netdev ethtool_ops
 	 */
@@ -337,6 +342,8 @@ struct dsa_switch {
 	 */
 	void *priv;
 
+	void *tagger_priv;
+
 	/*
 	 * Configuration data for this switch.
 	 */
@@ -689,6 +696,8 @@ struct dsa_switch_ops {
 						  enum dsa_tag_protocol mprot);
 	int	(*change_tag_protocol)(struct dsa_switch *ds, int port,
 				       enum dsa_tag_protocol proto);
+	int	(*connect_tag_protocol)(struct dsa_switch *ds,
+					enum dsa_tag_protocol proto);
 
 	/* Optional switch-wide initialization and destruction methods */
 	int	(*setup)(struct dsa_switch *ds);
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 8814fa0e44c8..3787cbce1175 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -248,8 +248,12 @@ static struct dsa_switch_tree *dsa_tree_alloc(int index)
 
 static void dsa_tree_free(struct dsa_switch_tree *dst)
 {
-	if (dst->tag_ops)
+	if (dst->tag_ops) {
+		if (dst->tag_ops->disconnect)
+			dst->tag_ops->disconnect(dst);
+
 		dsa_tag_driver_put(dst->tag_ops);
+	}
 	list_del(&dst->list);
 	kfree(dst);
 }
@@ -1136,6 +1140,36 @@ static void dsa_tree_teardown(struct dsa_switch_tree *dst)
 	dst->setup = false;
 }
 
+static int dsa_tree_bind_tag_proto(struct dsa_switch_tree *dst,
+				   const struct dsa_device_ops *tag_ops)
+{
+	struct dsa_notifier_tag_proto_info info;
+	int err;
+
+	if (dst->tag_ops && dst->tag_ops->disconnect)
+		dst->tag_ops->disconnect(dst);
+
+	if (tag_ops->connect) {
+		err = tag_ops->connect(dst);
+		if (err)
+			return err;
+	}
+
+	info.tag_ops = tag_ops;
+	err = dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO_CONNECT, &info);
+	if (err && err != -EOPNOTSUPP)
+		goto out_disconnect;
+
+	dst->tag_ops = tag_ops;
+
+	return 0;
+
+out_disconnect:
+	if (tag_ops->disconnect)
+		tag_ops->disconnect(dst);
+	return err;
+}
+
 /* Since the dsa/tagging sysfs device attribute is per master, the assumption
  * is that all DSA switches within a tree share the same tagger, otherwise
  * they would have formed disjoint trees (different "dsa,member" values).
@@ -1173,7 +1207,9 @@ int dsa_tree_change_tag_proto(struct dsa_switch_tree *dst,
 	if (err)
 		goto out_unwind_tagger;
 
-	dst->tag_ops = tag_ops;
+	err = dsa_tree_bind_tag_proto(dst, tag_ops);
+	if (err)
+		goto out_unwind_tagger;
 
 	rtnl_unlock();
 
@@ -1307,7 +1343,9 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master,
 		 */
 		dsa_tag_driver_put(tag_ops);
 	} else {
-		dst->tag_ops = tag_ops;
+		err = dsa_tree_bind_tag_proto(dst, tag_ops);
+		if (err)
+			return err;
 	}
 
 	dp->master = master;
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 38ce5129a33d..0db2b26b0c83 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -37,6 +37,7 @@ enum {
 	DSA_NOTIFIER_VLAN_DEL,
 	DSA_NOTIFIER_MTU,
 	DSA_NOTIFIER_TAG_PROTO,
+	DSA_NOTIFIER_TAG_PROTO_CONNECT,
 	DSA_NOTIFIER_MRP_ADD,
 	DSA_NOTIFIER_MRP_DEL,
 	DSA_NOTIFIER_MRP_ADD_RING_ROLE,
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 9c92edd96961..06948f536829 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -647,6 +647,17 @@ static int dsa_switch_change_tag_proto(struct dsa_switch *ds,
 	return 0;
 }
 
+static int dsa_switch_connect_tag_proto(struct dsa_switch *ds,
+					struct dsa_notifier_tag_proto_info *info)
+{
+	const struct dsa_device_ops *tag_ops = info->tag_ops;
+
+	if (!ds->ops->connect_tag_protocol)
+		return -EOPNOTSUPP;
+
+	return ds->ops->connect_tag_protocol(ds, tag_ops->proto);
+}
+
 static int dsa_switch_mrp_add(struct dsa_switch *ds,
 			      struct dsa_notifier_mrp_info *info)
 {
@@ -766,6 +777,9 @@ static int dsa_switch_event(struct notifier_block *nb,
 	case DSA_NOTIFIER_TAG_PROTO:
 		err = dsa_switch_change_tag_proto(ds, info);
 		break;
+	case DSA_NOTIFIER_TAG_PROTO_CONNECT:
+		err = dsa_switch_connect_tag_proto(ds, info);
+		break;
 	case DSA_NOTIFIER_MRP_ADD:
 		err = dsa_switch_mrp_add(ds, info);
 		break;
diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
index 6c293c2a3008..53362a0f0aab 100644
--- a/net/dsa/tag_sja1105.c
+++ b/net/dsa/tag_sja1105.c
@@ -722,11 +722,59 @@ static void sja1110_flow_dissect(const struct sk_buff *skb, __be16 *proto,
 	*proto = ((__be16 *)skb->data)[(VLAN_HLEN / 2) - 1];
 }
 
+static void sja1105_disconnect(struct dsa_switch_tree *dst)
+{
+	struct dsa_port *dp;
+
+	dsa_tree_for_each_user_port(dp, dst) {
+		if (dp->tagger_priv) {
+			kfree(dp->tagger_priv);
+			dp->tagger_priv = NULL;
+		}
+
+		if (dp->ds->tagger_priv) {
+			kfree(dp->ds->tagger_priv);
+			dp->ds->tagger_priv = NULL;
+		}
+	}
+}
+
+static int sja1105_connect(struct dsa_switch_tree *dst)
+{
+	struct sja1105_tagger_data *data;
+	struct sja1105_port *sp;
+	struct dsa_port *dp;
+
+	dsa_tree_for_each_user_port(dp, dst) {
+		if (!dp->ds->tagger_priv) {
+			data = kzalloc(sizeof(*data), GFP_KERNEL);
+			if (!data)
+				goto out;
+
+			dp->ds->tagger_priv = data;
+		}
+
+		sp = kzalloc(sizeof(*sp), GFP_KERNEL);
+		if (!sp)
+			goto out;
+
+		dp->tagger_priv = sp;
+	}
+
+	return 0;
+
+out:
+	sja1105_disconnect(dst);
+	return -ENOMEM;
+}
+
 static const struct dsa_device_ops sja1105_netdev_ops = {
 	.name = "sja1105",
 	.proto = DSA_TAG_PROTO_SJA1105,
 	.xmit = sja1105_xmit,
 	.rcv = sja1105_rcv,
+	.connect = sja1105_connect,
+	.disconnect = sja1105_disconnect,
 	.needed_headroom = VLAN_HLEN,
 	.flow_dissect = sja1105_flow_dissect,
 	.promisc_on_master = true,
@@ -740,6 +788,8 @@ static const struct dsa_device_ops sja1110_netdev_ops = {
 	.proto = DSA_TAG_PROTO_SJA1110,
 	.xmit = sja1110_xmit,
 	.rcv = sja1110_rcv,
+	.connect = sja1105_connect,
+	.disconnect = sja1105_disconnect,
 	.flow_dissect = sja1110_flow_dissect,
 	.needed_headroom = SJA1110_HEADER_LEN + VLAN_HLEN,
 	.needed_tailroom = SJA1110_RX_TRAILER_LEN + SJA1110_MAX_PADDING_LEN,

Andrew Lunn Dec. 8, 2021, 1:15 a.m. UTC | #28

On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote:
> On Tue, Dec 07, 2021 at 11:22:41PM +0100, Andrew Lunn wrote:
> > > I like the idea of tagger-owend per-switch-tree private data.
> > > Do we really need to hook logic?
> > 
> > We have two different things here.
> > 
> > 1) The tagger needs somewhere to store its own private data.
> > 2) The tagger needs to share state with the switch driver.
> > 
> > We can probably have the DSA core provide 1). Add the size to
> > dsa_device_ops structure, and provide helpers to go from either a
> > master or a slave netdev to the private data.
> 
> We cannot "add the size to the dsa_device_ops structure", because it is
> a singleton (const struct). It is not replicated at all, not per port,
> not per switch, not per tree, but global to the kernel. Not to mention
> const. Nobody needs state as shared as that.

What i'm suggesting is 

static const struct dsa_device_ops edsa_netdev_ops = {
        .name     = "edsa",
        .proto    = DSA_TAG_PROTO_EDSA,
        .xmit     = edsa_xmit,
        .rcv      = edsa_rcv,
        .needed_headroom = EDSA_HLEN,
	.priv_size = 42;
};

The priv_size indicates that an instance of this tagger needs 42 bytes
of private data. More likely it will be a sizeof(struct dsa_priv), but
that is a detail.

When a master is setup and the tagger instantiated, 42 bytes of memory
will be allocated and put somewhere it can be found via a helper.

This is not meant for shared state between the tagger and the switch
driver, this is private to the tagger. As such it is less likely to be
dependent on the number of ports etc. It is somewhere to store an skb
pointer, maybe a sequence number for the management message expected
as a reply from the switch etc.

   Andrew

Andrew Lunn Dec. 8, 2021, 1:35 a.m. UTC | #29

On Wed, Dec 08, 2021 at 01:14:49AM +0200, Vladimir Oltean wrote:
> On Tue, Dec 07, 2021 at 11:54:07PM +0100, Andrew Lunn wrote:
> > > I considered a simplified form like this, but I think the tagger private
> > > data will still stay in dp->priv, only its ownership will change.
> > 
> > Isn't dp a port structure. So there is one per port?
> 
> Yes, but dp->priv is a pointer. The thing it points to may not
> necessarily be per port.
> 
> > This is where i think we need to separate shared state from tagger
> > private data. Probably tagger private data is not per port. Shared
> > state between the switch driver and the tagger maybe is per port?
> 
> I don't know whether there's such a big difference between
> "shared state" vs "private data"?

The difference is to do with stopping the kernel exploding when the
switch driver is not using the tagger it expects.

Anything which is private to the tagger is not a problem. Only the
tagger uses it, so it cannot be wrong.

Anything which is shared between the tagger and the switch driver we
have to be careful about. We are just passing void * pointers
about. There is no type checking. If i'm correct about the 1:N
relationship, we can store shared state in the tagger. The tagger
should be O.K, because it only ever needs to deal with one format of
shared state. The switch driver needs to handle N different formats of
shared state, depending on which of the N different taggers are in
operation. Ideally, when it asks for the void * pointer for shared
information, some sort of checking is performed to ensure the void *
is what the switch driver actually expects. Maybe it needs to pass the
tag driver it thinks it is talking to, or as well as getting the void
* back, it also gets the tag enum and it verifies it actually knows
about that tag driver.

     Andrew

Christian Marangi Dec. 8, 2021, 3:32 a.m. UTC | #30

On Wed, Dec 08, 2021 at 03:09:47AM +0200, Vladimir Oltean wrote:
> On Wed, Dec 08, 2021 at 01:42:59AM +0100, Ansuel Smith wrote:
> > On Wed, Dec 08, 2021 at 02:40:51AM +0200, Vladimir Oltean wrote:
> > > On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote:
> > > > On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote:
> > > > > > 2) is harder. But as far as i know, we have an 1:N setup.  One switch
> > > > > > driver can use N tag drivers. So we need the switch driver to be sure
> > > > > > the tag driver is what it expects. We keep the shared state in the tag
> > > > > > driver, so it always has valid data, but when the switch driver wants
> > > > > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
> > > > > > if it does not match, the core should return -EINVAL or similar.
> > > > > 
> > > > > In my proposal, the tagger will allocate the memory from its side of the
> > > > > ->connect() call. So regardless of whether the switch driver side
> > > > > connects or not, the memory inside dp->priv is there for the tagger to
> > > > > use. The switch can access it or it can ignore it.
> > > > 
> > > > I don't think I actually said something useful here.
> > > > 
> > > > The goal would be to minimize use of dp->priv inside the switch driver,
> > > > outside of the actual ->connect() / ->disconnect() calls.
> > > > For example, in the felix driver which supports two tagging protocol
> > > > drivers, I think these two methods would be enough, and they would
> > > > replace the current felix_port_setup_tagger_data() and
> > > > felix_port_teardown_tagger_data() calls.
> > > > 
> > > > An additional benefit would be that in ->connect() and ->disconnect() we
> > > > get the actual tagging protocol in use. Currently the felix driver lacks
> > > > there, because felix_port_setup_tagger_data() just sets dp->priv up
> > > > unconditionally for the ocelot-8021q tagging protocol (luckily the
> > > > normal ocelot tagger doesn't need dp->priv).
> > > > 
> > > > In sja1105 the story is a bit longer, but I believe that can also be
> > > > cleaned up to stay within the confines of ->connect()/->disconnect().
> > > > 
> > > > So I guess we just need to be careful and push back against dubious use
> > > > during review.
> > > 
> > > I've started working on a prototype for converting sja1105 to this model.
> > > It should be clearer to me by tomorrow whether there is anything missing
> > > from this proposal.
> > 
> > I'm working on your suggestion and I should be able to post another RFC
> > this night if all works correctly with my switch.
> 
> Ok. The key point with my progress so far is that Andrew may be right
> and we might need separate tagger priv pointers per port and per switch.
> At least for the cleanliness of implementation. In the end I plan to
> remove dp->priv and stay with dp->tagger_priv and ds->tagger_priv.
> 
> Here's what I have so far. I have more changes locally, but the rest it
> isn't ready and overall also a bit irrelevant for the discussion.
> I'm going to sleep now.
>

BTW, I notice we made the same mistake. Don't know if it was the problem
and you didn't notice... The notifier is not ready at times of the first
tagger setup and the tag_proto_connect is never called.
Anyway sending v2 with your suggestion applied.

> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index bdf308a5c55e..f0f702774c8d 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -82,12 +82,15 @@ enum dsa_tag_protocol {
>  };
>  
>  struct dsa_switch;
> +struct dsa_switch_tree;
>  
>  struct dsa_device_ops {
>  	struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
>  	struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev);
>  	void (*flow_dissect)(const struct sk_buff *skb, __be16 *proto,
>  			     int *offset);
> +	int (*connect)(struct dsa_switch_tree *dst);
> +	void (*disconnect)(struct dsa_switch_tree *dst);
>  	unsigned int needed_headroom;
>  	unsigned int needed_tailroom;
>  	const char *name;
> @@ -279,6 +282,8 @@ struct dsa_port {
>  	 */
>  	void *priv;
>  
> +	void *tagger_priv;
> +
>  	/*
>  	 * Original copy of the master netdev ethtool_ops
>  	 */
> @@ -337,6 +342,8 @@ struct dsa_switch {
>  	 */
>  	void *priv;
>  
> +	void *tagger_priv;
> +
>  	/*
>  	 * Configuration data for this switch.
>  	 */
> @@ -689,6 +696,8 @@ struct dsa_switch_ops {
>  						  enum dsa_tag_protocol mprot);
>  	int	(*change_tag_protocol)(struct dsa_switch *ds, int port,
>  				       enum dsa_tag_protocol proto);
> +	int	(*connect_tag_protocol)(struct dsa_switch *ds,
> +					enum dsa_tag_protocol proto);
>  
>  	/* Optional switch-wide initialization and destruction methods */
>  	int	(*setup)(struct dsa_switch *ds);
> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
> index 8814fa0e44c8..3787cbce1175 100644
> --- a/net/dsa/dsa2.c
> +++ b/net/dsa/dsa2.c
> @@ -248,8 +248,12 @@ static struct dsa_switch_tree *dsa_tree_alloc(int index)
>  
>  static void dsa_tree_free(struct dsa_switch_tree *dst)
>  {
> -	if (dst->tag_ops)
> +	if (dst->tag_ops) {
> +		if (dst->tag_ops->disconnect)
> +			dst->tag_ops->disconnect(dst);
> +
>  		dsa_tag_driver_put(dst->tag_ops);
> +	}
>  	list_del(&dst->list);
>  	kfree(dst);
>  }
> @@ -1136,6 +1140,36 @@ static void dsa_tree_teardown(struct dsa_switch_tree *dst)
>  	dst->setup = false;
>  }
>  
> +static int dsa_tree_bind_tag_proto(struct dsa_switch_tree *dst,
> +				   const struct dsa_device_ops *tag_ops)
> +{
> +	struct dsa_notifier_tag_proto_info info;
> +	int err;
> +
> +	if (dst->tag_ops && dst->tag_ops->disconnect)
> +		dst->tag_ops->disconnect(dst);
> +
> +	if (tag_ops->connect) {
> +		err = tag_ops->connect(dst);
> +		if (err)
> +			return err;
> +	}
> +
> +	info.tag_ops = tag_ops;
> +	err = dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO_CONNECT, &info);
> +	if (err && err != -EOPNOTSUPP)
> +		goto out_disconnect;
> +
> +	dst->tag_ops = tag_ops;
> +
> +	return 0;
> +
> +out_disconnect:
> +	if (tag_ops->disconnect)
> +		tag_ops->disconnect(dst);
> +	return err;
> +}
> +
>  /* Since the dsa/tagging sysfs device attribute is per master, the assumption
>   * is that all DSA switches within a tree share the same tagger, otherwise
>   * they would have formed disjoint trees (different "dsa,member" values).
> @@ -1173,7 +1207,9 @@ int dsa_tree_change_tag_proto(struct dsa_switch_tree *dst,
>  	if (err)
>  		goto out_unwind_tagger;
>  
> -	dst->tag_ops = tag_ops;
> +	err = dsa_tree_bind_tag_proto(dst, tag_ops);
> +	if (err)
> +		goto out_unwind_tagger;
>  
>  	rtnl_unlock();
>  
> @@ -1307,7 +1343,9 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master,
>  		 */
>  		dsa_tag_driver_put(tag_ops);
>  	} else {
> -		dst->tag_ops = tag_ops;
> +		err = dsa_tree_bind_tag_proto(dst, tag_ops);
> +		if (err)
> +			return err;
>  	}
>  
>  	dp->master = master;
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 38ce5129a33d..0db2b26b0c83 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -37,6 +37,7 @@ enum {
>  	DSA_NOTIFIER_VLAN_DEL,
>  	DSA_NOTIFIER_MTU,
>  	DSA_NOTIFIER_TAG_PROTO,
> +	DSA_NOTIFIER_TAG_PROTO_CONNECT,
>  	DSA_NOTIFIER_MRP_ADD,
>  	DSA_NOTIFIER_MRP_DEL,
>  	DSA_NOTIFIER_MRP_ADD_RING_ROLE,
> diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> index 9c92edd96961..06948f536829 100644
> --- a/net/dsa/switch.c
> +++ b/net/dsa/switch.c
> @@ -647,6 +647,17 @@ static int dsa_switch_change_tag_proto(struct dsa_switch *ds,
>  	return 0;
>  }
>  
> +static int dsa_switch_connect_tag_proto(struct dsa_switch *ds,
> +					struct dsa_notifier_tag_proto_info *info)
> +{
> +	const struct dsa_device_ops *tag_ops = info->tag_ops;
> +
> +	if (!ds->ops->connect_tag_protocol)
> +		return -EOPNOTSUPP;
> +
> +	return ds->ops->connect_tag_protocol(ds, tag_ops->proto);
> +}
> +
>  static int dsa_switch_mrp_add(struct dsa_switch *ds,
>  			      struct dsa_notifier_mrp_info *info)
>  {
> @@ -766,6 +777,9 @@ static int dsa_switch_event(struct notifier_block *nb,
>  	case DSA_NOTIFIER_TAG_PROTO:
>  		err = dsa_switch_change_tag_proto(ds, info);
>  		break;
> +	case DSA_NOTIFIER_TAG_PROTO_CONNECT:
> +		err = dsa_switch_connect_tag_proto(ds, info);
> +		break;
>  	case DSA_NOTIFIER_MRP_ADD:
>  		err = dsa_switch_mrp_add(ds, info);
>  		break;
> diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
> index 6c293c2a3008..53362a0f0aab 100644
> --- a/net/dsa/tag_sja1105.c
> +++ b/net/dsa/tag_sja1105.c
> @@ -722,11 +722,59 @@ static void sja1110_flow_dissect(const struct sk_buff *skb, __be16 *proto,
>  	*proto = ((__be16 *)skb->data)[(VLAN_HLEN / 2) - 1];
>  }
>  
> +static void sja1105_disconnect(struct dsa_switch_tree *dst)
> +{
> +	struct dsa_port *dp;
> +
> +	dsa_tree_for_each_user_port(dp, dst) {
> +		if (dp->tagger_priv) {
> +			kfree(dp->tagger_priv);
> +			dp->tagger_priv = NULL;
> +		}
> +
> +		if (dp->ds->tagger_priv) {
> +			kfree(dp->ds->tagger_priv);
> +			dp->ds->tagger_priv = NULL;
> +		}
> +	}
> +}
> +
> +static int sja1105_connect(struct dsa_switch_tree *dst)
> +{
> +	struct sja1105_tagger_data *data;
> +	struct sja1105_port *sp;
> +	struct dsa_port *dp;
> +
> +	dsa_tree_for_each_user_port(dp, dst) {
> +		if (!dp->ds->tagger_priv) {
> +			data = kzalloc(sizeof(*data), GFP_KERNEL);
> +			if (!data)
> +				goto out;
> +
> +			dp->ds->tagger_priv = data;
> +		}
> +
> +		sp = kzalloc(sizeof(*sp), GFP_KERNEL);
> +		if (!sp)
> +			goto out;
> +
> +		dp->tagger_priv = sp;
> +	}
> +
> +	return 0;
> +
> +out:
> +	sja1105_disconnect(dst);
> +	return -ENOMEM;
> +}
> +
>  static const struct dsa_device_ops sja1105_netdev_ops = {
>  	.name = "sja1105",
>  	.proto = DSA_TAG_PROTO_SJA1105,
>  	.xmit = sja1105_xmit,
>  	.rcv = sja1105_rcv,
> +	.connect = sja1105_connect,
> +	.disconnect = sja1105_disconnect,
>  	.needed_headroom = VLAN_HLEN,
>  	.flow_dissect = sja1105_flow_dissect,
>  	.promisc_on_master = true,
> @@ -740,6 +788,8 @@ static const struct dsa_device_ops sja1110_netdev_ops = {
>  	.proto = DSA_TAG_PROTO_SJA1110,
>  	.xmit = sja1110_xmit,
>  	.rcv = sja1110_rcv,
> +	.connect = sja1105_connect,
> +	.disconnect = sja1105_disconnect,
>  	.flow_dissect = sja1110_flow_dissect,
>  	.needed_headroom = SJA1110_HEADER_LEN + VLAN_HLEN,
>  	.needed_tailroom = SJA1110_RX_TRAILER_LEN + SJA1110_MAX_PADDING_LEN,
> -- 
> 2.25.1

Christian Marangi Dec. 8, 2021, 3:39 a.m. UTC | #31

On Wed, Dec 08, 2021 at 02:35:34AM +0100, Andrew Lunn wrote:
> On Wed, Dec 08, 2021 at 01:14:49AM +0200, Vladimir Oltean wrote:
> > On Tue, Dec 07, 2021 at 11:54:07PM +0100, Andrew Lunn wrote:
> > > > I considered a simplified form like this, but I think the tagger private
> > > > data will still stay in dp->priv, only its ownership will change.
> > > 
> > > Isn't dp a port structure. So there is one per port?
> > 
> > Yes, but dp->priv is a pointer. The thing it points to may not
> > necessarily be per port.
> > 
> > > This is where i think we need to separate shared state from tagger
> > > private data. Probably tagger private data is not per port. Shared
> > > state between the switch driver and the tagger maybe is per port?
> > 
> > I don't know whether there's such a big difference between
> > "shared state" vs "private data"?
> 
> The difference is to do with stopping the kernel exploding when the
> switch driver is not using the tagger it expects.
> 
> Anything which is private to the tagger is not a problem. Only the
> tagger uses it, so it cannot be wrong.
> 
> Anything which is shared between the tagger and the switch driver we
> have to be careful about. We are just passing void * pointers
> about. There is no type checking. If i'm correct about the 1:N
> relationship, we can store shared state in the tagger. The tagger
> should be O.K, because it only ever needs to deal with one format of
> shared state. The switch driver needs to handle N different formats of
> shared state, depending on which of the N different taggers are in
> operation. Ideally, when it asks for the void * pointer for shared
> information, some sort of checking is performed to ensure the void *
> is what the switch driver actually expects. Maybe it needs to pass the
> tag driver it thinks it is talking to, or as well as getting the void
> * back, it also gets the tag enum and it verifies it actually knows
> about that tag driver.
> 
>      Andrew

I'm sending v2 with Vladimir suggestion so we can start working on that.
Hope with a some split code it would be easier to find the problem with
this and find a way to correctly validate the shared data between tagger
and dsa driver. (you will probably have to rewrite this also for v2 and
sorry for this)

Vladimir Oltean Dec. 8, 2021, 11:51 a.m. UTC | #32

On Wed, Dec 08, 2021 at 02:35:34AM +0100, Andrew Lunn wrote:
> On Wed, Dec 08, 2021 at 01:14:49AM +0200, Vladimir Oltean wrote:
> > On Tue, Dec 07, 2021 at 11:54:07PM +0100, Andrew Lunn wrote:
> > > > I considered a simplified form like this, but I think the tagger private
> > > > data will still stay in dp->priv, only its ownership will change.
> > > 
> > > Isn't dp a port structure. So there is one per port?
> > 
> > Yes, but dp->priv is a pointer. The thing it points to may not
> > necessarily be per port.
> > 
> > > This is where i think we need to separate shared state from tagger
> > > private data. Probably tagger private data is not per port. Shared
> > > state between the switch driver and the tagger maybe is per port?
> > 
> > I don't know whether there's such a big difference between
> > "shared state" vs "private data"?
> 
> The difference is to do with stopping the kernel exploding when the
> switch driver is not using the tagger it expects.
> 
> Anything which is private to the tagger is not a problem. Only the
> tagger uses it, so it cannot be wrong.
> 
> Anything which is shared between the tagger and the switch driver we
> have to be careful about. We are just passing void * pointers
> about. There is no type checking. If i'm correct about the 1:N
> relationship, we can store shared state in the tagger. The tagger
> should be O.K, because it only ever needs to deal with one format of
> shared state. The switch driver needs to handle N different formats of
> shared state, depending on which of the N different taggers are in
> operation. Ideally, when it asks for the void * pointer for shared
> information, some sort of checking is performed to ensure the void *
> is what the switch driver actually expects. Maybe it needs to pass the
> tag driver it thinks it is talking to, or as well as getting the void
> * back, it also gets the tag enum and it verifies it actually knows
> about that tag driver.

Understood what you mean now (actually I don't know what was unclear yesterday).
I should start doing something else past a certain hour...

What I've started doing now is something like this:

/* include/linux/dsa/ocelot.h */
struct ocelot_8021q_tagger_data {
	void (*xmit_work_fn)(struct kthread_work *work);
};

static inline struct ocelot_8021q_tagger_data *
ocelot_8021q_tagger_data(struct dsa_switch *ds)
{
	BUG_ON(ds->dst->tag_ops->proto != DSA_TAG_PROTO_OCELOT_8021Q);

	return ds->tagger_data;
}

/* net/dsa/tag_ocelot_8021q.c */
struct ocelot_8021q_tagger_private {
	struct ocelot_8021q_tagger_data data; /* Must be first */
	struct kthread_worker *xmit_worker;
};

static struct sk_buff *ocelot_defer_xmit(struct dsa_port *dp,
					 struct sk_buff *skb)
{
	struct ocelot_8021q_tagger_private *priv = dp->ds->tagger_data;
	struct ocelot_8021q_tagger_data *data = &priv->data;
	void (*xmit_work_fn)(struct kthread_work *work);
	struct felix_deferred_xmit_work *xmit_work;
	struct kthread_worker *xmit_worker;

	xmit_work_fn = data->xmit_work_fn;
	xmit_worker = priv->xmit_worker;

	if (!xmit_work_fn || !xmit_worker)
		return NULL;

	xmit_work = kzalloc(sizeof(*xmit_work), GFP_ATOMIC);
	if (!xmit_work)
		return NULL;

	/* Calls felix_port_deferred_xmit in felix.c */
	kthread_init_work(&xmit_work->work, xmit_work_fn);
	/* Increase refcount so the kfree_skb in dsa_slave_xmit
	 * won't really free the packet.
	 */
	xmit_work->dp = dp;
	xmit_work->skb = skb_get(skb);

	kthread_queue_work(xmit_worker, &xmit_work->work);

	return NULL;
}

static void ocelot_disconnect(struct dsa_switch_tree *dst)
{
	struct ocelot_8021q_tagger_private *priv;
	struct dsa_port *dp;

	list_for_each_entry(dp, &dst->ports, list) {
		priv = dp->ds->tagger_data;

		if (!priv)
			continue;

		if (priv->xmit_worker)
			kthread_destroy_worker(priv->xmit_worker);

		kfree(priv);
		dp->ds->tagger_data = NULL;
	}
}

static int ocelot_connect(struct dsa_switch_tree *dst)
{
	struct ocelot_8021q_tagger_private *priv;
	struct dsa_port *dp;
	int err;

	list_for_each_entry(dp, &dst->ports, list) {
		if (dp->ds->tagger_data)
			continue;

		priv = kzalloc(sizeof(*priv), GFP_KERNEL);
		if (!priv) {
			err = -ENOMEM;
			goto out;
		}

		priv->xmit_worker = kthread_create_worker(0, "felix_xmit");
		if (IS_ERR(priv->xmit_worker)) {
			err = PTR_ERR(priv->xmit_worker);
			goto out;
		}

		dp->ds->tagger_data = priv;
	}

	return 0;

out:
	ocelot_disconnect(dst);
	return err;
}

/* drivers/net/dsa/felix.c */
static int felix_connect_tag_protocol(struct dsa_switch *ds,
				      enum dsa_tag_protocol proto)
{
	struct ocelot_8021q_tagger_data *tagger_data;

	switch (proto) {
	case DSA_TAG_PROTO_OCELOT_8021Q:
		tagger_data = ocelot_8021q_tagger_data(ds);
		tagger_data->xmit_work_fn = felix_port_deferred_xmit;
		return 0;
	case DSA_TAG_PROTO_OCELOT:
	case DSA_TAG_PROTO_SEVILLE:
		return 0;
	default:
		return -EPROTONOSUPPORT;
	}
}

Something like this shares memory between what's private and what's
public (it's part of the same allocation), but there's type checking at
least, and private data isn't exposed directly. Is this ok?

Vladimir Oltean Dec. 8, 2021, 11:54 a.m. UTC | #33

On Wed, Dec 08, 2021 at 04:32:43AM +0100, Ansuel Smith wrote:
> On Wed, Dec 08, 2021 at 03:09:47AM +0200, Vladimir Oltean wrote:
> > On Wed, Dec 08, 2021 at 01:42:59AM +0100, Ansuel Smith wrote:
> > > On Wed, Dec 08, 2021 at 02:40:51AM +0200, Vladimir Oltean wrote:
> > > > On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote:
> > > > > On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote:
> > > > > > > 2) is harder. But as far as i know, we have an 1:N setup.  One switch
> > > > > > > driver can use N tag drivers. So we need the switch driver to be sure
> > > > > > > the tag driver is what it expects. We keep the shared state in the tag
> > > > > > > driver, so it always has valid data, but when the switch driver wants
> > > > > > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and
> > > > > > > if it does not match, the core should return -EINVAL or similar.
> > > > > > 
> > > > > > In my proposal, the tagger will allocate the memory from its side of the
> > > > > > ->connect() call. So regardless of whether the switch driver side
> > > > > > connects or not, the memory inside dp->priv is there for the tagger to
> > > > > > use. The switch can access it or it can ignore it.
> > > > > 
> > > > > I don't think I actually said something useful here.
> > > > > 
> > > > > The goal would be to minimize use of dp->priv inside the switch driver,
> > > > > outside of the actual ->connect() / ->disconnect() calls.
> > > > > For example, in the felix driver which supports two tagging protocol
> > > > > drivers, I think these two methods would be enough, and they would
> > > > > replace the current felix_port_setup_tagger_data() and
> > > > > felix_port_teardown_tagger_data() calls.
> > > > > 
> > > > > An additional benefit would be that in ->connect() and ->disconnect() we
> > > > > get the actual tagging protocol in use. Currently the felix driver lacks
> > > > > there, because felix_port_setup_tagger_data() just sets dp->priv up
> > > > > unconditionally for the ocelot-8021q tagging protocol (luckily the
> > > > > normal ocelot tagger doesn't need dp->priv).
> > > > > 
> > > > > In sja1105 the story is a bit longer, but I believe that can also be
> > > > > cleaned up to stay within the confines of ->connect()/->disconnect().
> > > > > 
> > > > > So I guess we just need to be careful and push back against dubious use
> > > > > during review.
> > > > 
> > > > I've started working on a prototype for converting sja1105 to this model.
> > > > It should be clearer to me by tomorrow whether there is anything missing
> > > > from this proposal.
> > > 
> > > I'm working on your suggestion and I should be able to post another RFC
> > > this night if all works correctly with my switch.
> > 
> > Ok. The key point with my progress so far is that Andrew may be right
> > and we might need separate tagger priv pointers per port and per switch.
> > At least for the cleanliness of implementation. In the end I plan to
> > remove dp->priv and stay with dp->tagger_priv and ds->tagger_priv.
> > 
> > Here's what I have so far. I have more changes locally, but the rest it
> > isn't ready and overall also a bit irrelevant for the discussion.
> > I'm going to sleep now.
> >
> 
> BTW, I notice we made the same mistake. Don't know if it was the problem
> and you didn't notice... The notifier is not ready at times of the first
> tagger setup and the tag_proto_connect is never called.
> Anyway sending v2 with your suggestion applied.

I didn't go past the compilation stage yesterday. Anyway, now that you
mention it, I remember Tobias hitting this issue as well when he worked
on changing tagging protocol via device tree, and this is why
dsa_switch_setup_tag_protocol() exists.  I believe that's where we'd
need to call ds->ops->connect_tag_proto from, like this:

static int dsa_switch_setup_tag_protocol(struct dsa_switch *ds)
{
	const struct dsa_device_ops *tag_ops = ds->dst->tag_ops;
	struct dsa_switch_tree *dst = ds->dst;
	struct dsa_port *cpu_dp;
	int err;

	if (tag_ops->proto == dst->default_proto)
		goto connect;

	dsa_switch_for_each_cpu_port(cpu_dp, ds) {
		rtnl_lock();
		err = ds->ops->change_tag_protocol(ds, cpu_dp->index,
						   tag_ops->proto);
		rtnl_unlock();
		if (err) {
			dev_err(ds->dev, "Unable to use tag protocol \"%s\": %pe\n",
				tag_ops->name, ERR_PTR(err));
			return err;
		}
	}

connect:
	if (ds->ops->connect_tag_protocol) {
		err = ds->ops->connect_tag_protocol(ds, tag_ops->proto);
		if (err) {
			dev_err(ds->dev,
				"Unable to connect to tag protocol \"%s\": %pe\n",
				tag_ops->name, ERR_PTR(err));
			return err;
		}
	}

	return 0;
}

[net-next,RFC,0/6] Add support for qca8k mdio rw in Ethernet packet

Message

Comments