Message ID | 20211207145942.7444-1-ansuelsmth@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Add support for qca8k mdio rw in Ethernet packet | expand |
On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote: > Hi, this is still WIP and currently has some problem but I would love if > someone can give this a superficial review and answer to some problem > with this. > > The main reason for this is that we notice some routing problem in the > switch and it seems assisted learning is needed. Considering mdio is > quite slow due to the indirect write using this Ethernet alternative way > seems to be quicker. > > The qca8k switch supports a special way to pass mdio read/write request > using specially crafted Ethernet packet. Oh! Cool! Marvell has this as well, and i suspect a few others. It is something i've wanted to work on for a long long time, but never had the opportunity. This also means that, even if you are focusing on qca8k, please try to think what could be generic, and what should specific to the qca8k. The idea of sending an Ethernet frame and sometime later receiving a reply should be generic and usable for other DSA drivers. The contents of those frames needs to be driver specific. How we hook this into MDIO might also be generic, maybe. I will look at your questions later, but soon. Andrew
On Tue, Dec 07, 2021 at 04:15:51PM +0100, Andrew Lunn wrote: > On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote: > > Hi, this is still WIP and currently has some problem but I would love if > > someone can give this a superficial review and answer to some problem > > with this. > > > > The main reason for this is that we notice some routing problem in the > > switch and it seems assisted learning is needed. Considering mdio is > > quite slow due to the indirect write using this Ethernet alternative way > > seems to be quicker. > > > > The qca8k switch supports a special way to pass mdio read/write request > > using specially crafted Ethernet packet. > > Oh! Cool! Marvell has this as well, and i suspect a few others. It is > something i've wanted to work on for a long long time, but never had > the opportunity. > Really? I tought this was very specific to qca8k. > This also means that, even if you are focusing on qca8k, please try to > think what could be generic, and what should specific to the > qca8k. The idea of sending an Ethernet frame and sometime later > receiving a reply should be generic and usable for other DSA > drivers. The contents of those frames needs to be driver specific. > How we hook this into MDIO might also be generic, maybe. A generic implementation of this would be add an ops to the dsa generic struct that driver can implement and find a clean way to use this alternative way instead of normal mdio. (Implement something like eth_mdio_read/write ops and the driver will decide when to use them? The dsa then will correctly understand when the driver is ready to accept packet and send the skb, in all the other case just send an error and the driver will use normal mdio?) I think the tagger require some modification anyway as it's the one that receive packet and parse them. (I assume also other switch will use the tagger to understand that the packet is used for mdio) (A bool to declare that the tagger can correctly parse this kind of stuff and complete the completion?) > > I will look at your questions later, but soon. > Thanks a lot for the quick response. I'm more than happy to generalize this and find the a correct way. > Andrew
> I still have to find a solution to a slowdown problem and this is where > I would love to get some hint. > Currently I still didn't find a good way to understand when the tagger > starts to accept packets and because of this the initial setup is slow > as every completion timeouts. Am I missing something or is there a way > to check for this? I've not looked at this particular driver, i just know the general architecture. The MDIO bus driver probes first, maybe as part of the Ethernet driver, maybe as a standalone MDIO driver. The switch is found in DT and the driver code will at some point later probe the switch driver. The switch driver has working MDIO at this point. It should use MDIO to talk to the switch, make sure it is there, maybe do some initial configuration. Once it is happy, it registers the switch with the DSA core using dsa_register_switch(). If this is a single switch, the DSA core will then start setting things up. As part of dsa_switch_setup() it will call the switch drivers setup() method. It then figures out what tag driver to use, by calling dsa_switch_setup_tag_protocol(). However, the tag driver itself is not inserted into the chain yet. That happens later. Once the switch is setup, dsa_tree_setup_master() is called which does dsa_master_setup() and in the middle there is: /* If we use a tagging format that doesn't have an ethertype * field, make sure that all packets from this point on get * sent to the tag format's receive function. */ wmb(); dev->dsa_ptr = cpu_dp; This is the magic to actually enable the tagger receiving frames. I need to look at your patches, but why is the tagger involved? At least for the Marvell switch, you send a pretty normal looking Ethernet frame to a specific MAC address, and the switch replies using that MAC address. And it has an Ether Type specific to switch control. Since this is all normal looking, there are hooks in the network stack which can be used to get these frames. Andrew
On 12/7/21 7:15 AM, Andrew Lunn wrote: > On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote: >> Hi, this is still WIP and currently has some problem but I would love if >> someone can give this a superficial review and answer to some problem >> with this. >> >> The main reason for this is that we notice some routing problem in the >> switch and it seems assisted learning is needed. Considering mdio is >> quite slow due to the indirect write using this Ethernet alternative way >> seems to be quicker. >> >> The qca8k switch supports a special way to pass mdio read/write request >> using specially crafted Ethernet packet. > > Oh! Cool! Marvell has this as well, and i suspect a few others. It is > something i've wanted to work on for a long long time, but never had > the opportunity. > > This also means that, even if you are focusing on qca8k, please try to > think what could be generic, and what should specific to the > qca8k. The idea of sending an Ethernet frame and sometime later > receiving a reply should be generic and usable for other DSA > drivers. The contents of those frames needs to be driver specific. > How we hook this into MDIO might also be generic, maybe. > > I will look at your questions later, but soon. There was a priori attempt from Vivien to add support for mv88e6xxx over RMU frames: https://www.mail-archive.com/netdev@vger.kernel.org/msg298317.html This gets interesting because the switch's control path moves from MDIO to Ethernet and there is not really an "ethernet bus" though we could certainly come up with one. We have mdio-i2c, so maybe we should have mdio-ethernet?
On Tue, Dec 07, 2021 at 07:41:23PM +0100, Andrew Lunn wrote: > > I still have to find a solution to a slowdown problem and this is where > > I would love to get some hint. > > Currently I still didn't find a good way to understand when the tagger > > starts to accept packets and because of this the initial setup is slow > > as every completion timeouts. Am I missing something or is there a way > > to check for this? > > I've not looked at this particular driver, i just know the general > architecture. > > The MDIO bus driver probes first, maybe as part of the Ethernet > driver, maybe as a standalone MDIO driver. The switch is found in DT > and the driver code will at some point later probe the switch driver. > > The switch driver has working MDIO at this point. It should use MDIO > to talk to the switch, make sure it is there, maybe do some initial > configuration. Once it is happy, it registers the switch with the DSA > core using dsa_register_switch(). > > If this is a single switch, the DSA core will then start setting > things up. As part of dsa_switch_setup() it will call the switch > drivers setup() method. It then figures out what tag driver to use, by > calling dsa_switch_setup_tag_protocol(). However, the tag driver > itself is not inserted into the chain yet. That happens later. Once > the switch is setup, dsa_tree_setup_master() is called which does > dsa_master_setup() and in the middle there is: > > /* If we use a tagging format that doesn't have an ethertype > * field, make sure that all packets from this point on get > * sent to the tag format's receive function. > */ > wmb(); > > dev->dsa_ptr = cpu_dp; > > This is the magic to actually enable the tagger receiving frames. > Will check if using this is the correct way to prevent use of this alternative way before it's available. > I need to look at your patches, but why is the tagger involved? At > least for the Marvell switch, you send a pretty normal looking > Ethernet frame to a specific MAC address, and the switch replies using > that MAC address. And it has an Ether Type specific to switch > control. Since this is all normal looking, there are hooks in the > network stack which can be used to get these frames. > The qca tag header provide a TYPE value that refer to a big list of Frame type. In all of this at value 2 we have the type that tells us that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet) The idea of using the tagger is to skip parsing the packet 2 times considering the qca tag header is present at the same place in both normal packet and mdio rw Ethernet packet. Your idea would be hook this before the tagger and parse it? I assume that is the only way if this has to be generilized. But I wonder if this would create some overhead by the double parsing. > Andrew >
> The qca tag header provide a TYPE value that refer to a big list of > Frame type. In all of this at value 2 we have the type that tells us > that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet) > > The idea of using the tagger is to skip parsing the packet 2 times > considering the qca tag header is present at the same place in both > normal packet and mdio rw Ethernet packet. > > Your idea would be hook this before the tagger and parse it? > I assume that is the only way if this has to be generilized. But I > wonder if this would create some overhead by the double parsing. So it seems i remembered this incorrectly. Marvell call this Remote Management Unit, RMU. And RMU makes use of bits inside the Marvell Tag. I was thinking it was outside of the tag. So, yes, the tagger does need to be involved in this. The initial design of DSA was that the tagger and main driver were kept separate. This has been causing us problems recently, we have use cases where we need to share information between the tagger and the driver. This looks like it is going to be another case of that. Andrew
On Tue, Dec 07, 2021 at 08:15:24PM +0100, Andrew Lunn wrote: > > The qca tag header provide a TYPE value that refer to a big list of > > Frame type. In all of this at value 2 we have the type that tells us > > that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet) > > > > The idea of using the tagger is to skip parsing the packet 2 times > > considering the qca tag header is present at the same place in both > > normal packet and mdio rw Ethernet packet. > > > > Your idea would be hook this before the tagger and parse it? > > I assume that is the only way if this has to be generilized. But I > > wonder if this would create some overhead by the double parsing. > > So it seems i remembered this incorrectly. Marvell call this Remote > Management Unit, RMU. And RMU makes use of bits inside the Marvell > Tag. I was thinking it was outside of the tag. > > So, yes, the tagger does need to be involved in this. > > The initial design of DSA was that the tagger and main driver were > kept separate. This has been causing us problems recently, we have use > cases where we need to share information between the tagger and the > driver. This looks like it is going to be another case of that. > > Andrew I mean if you check the code this is still somewhat ""separate"". I ""abuse"" the dsa port priv to share the required data. (I allocate a different struct... i put it in qca8k_priv and i set every port priv to this struct) Wonder if we can add something to share data between the driver and the port so the access that from the tagger. (something that doesn't use the port priv)
On Tue, Dec 07, 2021 at 10:49:43AM -0800, Florian Fainelli wrote: > On 12/7/21 7:15 AM, Andrew Lunn wrote: > > On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote: > >> Hi, this is still WIP and currently has some problem but I would love if > >> someone can give this a superficial review and answer to some problem > >> with this. > >> > >> The main reason for this is that we notice some routing problem in the > >> switch and it seems assisted learning is needed. Considering mdio is > >> quite slow due to the indirect write using this Ethernet alternative way > >> seems to be quicker. > >> > >> The qca8k switch supports a special way to pass mdio read/write request > >> using specially crafted Ethernet packet. > > > > Oh! Cool! Marvell has this as well, and i suspect a few others. It is > > something i've wanted to work on for a long long time, but never had > > the opportunity. > > > > This also means that, even if you are focusing on qca8k, please try to > > think what could be generic, and what should specific to the > > qca8k. The idea of sending an Ethernet frame and sometime later > > receiving a reply should be generic and usable for other DSA > > drivers. The contents of those frames needs to be driver specific. > > How we hook this into MDIO might also be generic, maybe. > > > > I will look at your questions later, but soon. > > There was a priori attempt from Vivien to add support for mv88e6xxx over > RMU frames: > > https://www.mail-archive.com/netdev@vger.kernel.org/msg298317.html > > This gets interesting because the switch's control path moves from MDIO > to Ethernet and there is not really an "ethernet bus" though we could > certainly come up with one. We have mdio-i2c, so maybe we should have > mdio-ethernet? > -- > Florian I checked that series and I notice that the proposed implementation used a workqueue. The current implementation here use completion and mutex so the transaction is really one command at time and wait for response. Considering most of the time we do read and write operation is seems a bit overkill to use a queue... Also to track the response. Using a single queue simplify the implementation and should be just good. (btw qca8k supports a way to queue packet using a seq int but we don't use it to keep things simple) Is that acceptable? Also I notice in that series mru have some limitation and can be used only for some kind of data... Should we add a way to blacklist some particular reg and use the legacy mdio way?
On Tue, Dec 07, 2021 at 08:21:52PM +0100, Ansuel Smith wrote: > On Tue, Dec 07, 2021 at 08:15:24PM +0100, Andrew Lunn wrote: > > > The qca tag header provide a TYPE value that refer to a big list of > > > Frame type. In all of this at value 2 we have the type that tells us > > > that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet) > > > > > > The idea of using the tagger is to skip parsing the packet 2 times > > > considering the qca tag header is present at the same place in both > > > normal packet and mdio rw Ethernet packet. > > > > > > Your idea would be hook this before the tagger and parse it? > > > I assume that is the only way if this has to be generilized. But I > > > wonder if this would create some overhead by the double parsing. > > > > So it seems i remembered this incorrectly. Marvell call this Remote > > Management Unit, RMU. And RMU makes use of bits inside the Marvell > > Tag. I was thinking it was outside of the tag. > > > > So, yes, the tagger does need to be involved in this. > > > > The initial design of DSA was that the tagger and main driver were > > kept separate. This has been causing us problems recently, we have use > > cases where we need to share information between the tagger and the > > driver. This looks like it is going to be another case of that. > > > > Andrew > > I mean if you check the code this is still somewhat ""separate"". > I ""abuse"" the dsa port priv to share the required data. > (I allocate a different struct... i put it in qca8k_priv and i set every > port priv to this struct) > > Wonder if we can add something to share data between the driver and the > port so the access that from the tagger. (something that doesn't use the > port priv) The one problem relevant to this submission among those referenced by Andrew is that dp->priv needs to be allocated by the Ethernet switch driver, although it is used by the tagging protocol driver. So they aren't really 'separate', nor can they be, the way dp->priv is currently designed, it can only be "abused", not really "used". The DSA design allows in principle for any switch driver to return any protocol it desires in ->get_tag_protocol(). I occasionally test various tagger submissions by hacking dsa_loop to do just that. But your tag_qca.c driver would have a pretty unpleasant surprise if it was to be paired to any other switch driver than qca8k, because that other driver would either not allocate memory for dp->priv, or (worse) allocate some other type of structure, expected to be used differently etc. An even bigger complication is created by the fact that we can dynamically change tagging protocols in certain cases (dsa <-> edsa, ocelot <-> ocelot-8021q), and the current design doesn't really scale: if any tagging protocol required its own dp->priv format, we may end up with bugs such as the driver not freeing the old dp->priv and setting up the new one, when the tagging protocol changes. These mistakes are all too easy to make currently. Another potential issue, which I don't see present here, but still worth watching out for, is that the tagger cannot use symbols exported by the switch, and vice versa. Otherwise the tagger cannot be inserted into the kernel when built as module, due to missing symbols provided by the switch. And the switch will not probe until it has a tagger. I'm afraid we need to make a decision now whether we keep supporting the separation between taggers and switch drivers, especially since the tagger could become a bus provider for the switch driver. We need to weigh the pros and cons. I thought about what would be needed and I think we'd need tagger-owned per-switch-tree private data. But this implies that there needs to be a hook in the tagging protocol driver that notifies it when a certain struct dsa_switch_tree *dst binds and unbinds to a certain tagger. Then it could pick and choose the ports that need dp->priv configured in a certain way: some taggers need the dp->priv of user ports to hold something per port, others need the dp->priv of _all_ user ports to point to something shared, others (like yours) apparently need the dp->priv of the CPU port to hold something. This would become something handled and owned exclusively by the tagger. Ansuel, would something like this help you in any way?
On Tue, Dec 07, 2021 at 10:49:43AM -0800, Florian Fainelli wrote: > On 12/7/21 7:15 AM, Andrew Lunn wrote: > > On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote: > >> Hi, this is still WIP and currently has some problem but I would love if > >> someone can give this a superficial review and answer to some problem > >> with this. > >> > >> The main reason for this is that we notice some routing problem in the > >> switch and it seems assisted learning is needed. Considering mdio is > >> quite slow due to the indirect write using this Ethernet alternative way > >> seems to be quicker. > >> > >> The qca8k switch supports a special way to pass mdio read/write request > >> using specially crafted Ethernet packet. > > > > Oh! Cool! Marvell has this as well, and i suspect a few others. It is > > something i've wanted to work on for a long long time, but never had > > the opportunity. > > > > This also means that, even if you are focusing on qca8k, please try to > > think what could be generic, and what should specific to the > > qca8k. The idea of sending an Ethernet frame and sometime later > > receiving a reply should be generic and usable for other DSA > > drivers. The contents of those frames needs to be driver specific. > > How we hook this into MDIO might also be generic, maybe. > > > > I will look at your questions later, but soon. > > There was a priori attempt from Vivien to add support for mv88e6xxx over > RMU frames: > > https://www.mail-archive.com/netdev@vger.kernel.org/msg298317.html > > This gets interesting because the switch's control path moves from MDIO > to Ethernet and there is not really an "ethernet bus" though we could > certainly come up with one. We have mdio-i2c, so maybe we should have > mdio-ethernet? This raises interesting questions. I see two distinct cases: 1. "dual I/O": the switch probes initially over a standard bus (MDIO, SPI, I2C) then at some point transitions towards I/O through the tagger. This would be the case when there is some preparation work to be done (maybe the CPU port needs to be brought up, maybe there is a firmware to be uploaded to the switch's embedded microcontroller such that the expected remote management protocol is supported, etc). It would also be the case when multiple CPU ports are supported (and changing between CPU ports), because we could end up bringing a certain CPU port down, and the register I/O would need to be temporarily done over MDIO before bringing the other CPU port up. 2. "single I/O": the switch needs no such configuration, and in this case, it could in principle probe over an "Ethernet bus" rather than a standard bus as mentioned above. I don't know which case is going to be more common, honestly. The difference between the two is that the former would work using the existing infrastructure (bus types) we have today, whereas the latter would (maybe) need an "Ethernet bus" as mentioned by Vivien and Florian. I'm not completely convinced, though. The argument for an "Ethernet bus" seems to be that any Ethernet controller may need to set up such a thing, independently of being a DSA master. In Vivien's link, an example is given where we have "Control path via port 1, Data path via port port 3". But I don't know, this separation seems pretty artificial and ultimately boils down to configuration. Like it or not, in that particular example, both ports 1 and 3 are CPU ports, and both eth1 and eth0 are DSA masters. The fact that they are used asymmetrically should pretty much not matter. I think a fair simplifying assumption is that switch management protocols will never be spoken through interfaces that aren't DSA masters (because being a DSA master _implies_ having a physical link to a DSA switch). And if we have this simplifying factor, we could consider moving dsa_tree_setup_master() somewhere earlier in the DSA probe path, and just make "type 2" / "single I/O" switches be platform devices, with a more-or-less empty probe function (just calls dsa_register_switch), and do their hardware init in ->setup, which is _after_ dsa_tree_setup_master and therefore after the tagging protocol has bound to the DSA switch tree and is prepared to handle I/O. So no bus really needed. Although I have to point this out. An "Ethernet bus" would be one of the most unreliable buses out there. Consider that the user may run "ip link set eth0 down" at any given time. What do you do, as a switch driver on an Ethernet bus, when your DSA master goes down? Postpone all your I/O? Error out on I/O? Unbind? Even moreso, there are currently code paths in DSA that can only be done while the DSA master is down (changing the tagging protocol comes to mind). So does this mean that those code paths are simply not available when the I/O is over Ethernet? Or does it mean that Ethernet cannot be the sole I/O method of any switch driver, due to it being unreliable? And if the latter, this is yet another argument against "Ethernet as bus". A bus is basically only there for probing purposes, but if you need to have a primary I/O method beside Ethernet, you already have a bus and don't need another.
On Tue, Dec 07, 2021 at 10:52:19PM +0200, Vladimir Oltean wrote: > On Tue, Dec 07, 2021 at 08:21:52PM +0100, Ansuel Smith wrote: > > On Tue, Dec 07, 2021 at 08:15:24PM +0100, Andrew Lunn wrote: > > > > The qca tag header provide a TYPE value that refer to a big list of > > > > Frame type. In all of this at value 2 we have the type that tells us > > > > that is a READ_WRITE_REG_ACK (aka a mdio rw Ethernet packet) > > > > > > > > The idea of using the tagger is to skip parsing the packet 2 times > > > > considering the qca tag header is present at the same place in both > > > > normal packet and mdio rw Ethernet packet. > > > > > > > > Your idea would be hook this before the tagger and parse it? > > > > I assume that is the only way if this has to be generilized. But I > > > > wonder if this would create some overhead by the double parsing. > > > > > > So it seems i remembered this incorrectly. Marvell call this Remote > > > Management Unit, RMU. And RMU makes use of bits inside the Marvell > > > Tag. I was thinking it was outside of the tag. > > > > > > So, yes, the tagger does need to be involved in this. > > > > > > The initial design of DSA was that the tagger and main driver were > > > kept separate. This has been causing us problems recently, we have use > > > cases where we need to share information between the tagger and the > > > driver. This looks like it is going to be another case of that. > > > > > > Andrew > > > > I mean if you check the code this is still somewhat ""separate"". > > I ""abuse"" the dsa port priv to share the required data. > > (I allocate a different struct... i put it in qca8k_priv and i set every > > port priv to this struct) > > > > Wonder if we can add something to share data between the driver and the > > port so the access that from the tagger. (something that doesn't use the > > port priv) > > The one problem relevant to this submission among those referenced by > Andrew is that dp->priv needs to be allocated by the Ethernet switch > driver, although it is used by the tagging protocol driver. So they > aren't really 'separate', nor can they be, the way dp->priv is currently > designed, it can only be "abused", not really "used". > > The DSA design allows in principle for any switch driver to return any > protocol it desires in ->get_tag_protocol(). I occasionally test various > tagger submissions by hacking dsa_loop to do just that. But your > tag_qca.c driver would have a pretty unpleasant surprise if it was to be > paired to any other switch driver than qca8k, because that other driver > would either not allocate memory for dp->priv, or (worse) allocate some > other type of structure, expected to be used differently etc. > > An even bigger complication is created by the fact that we can > dynamically change tagging protocols in certain cases (dsa <-> edsa, > ocelot <-> ocelot-8021q), and the current design doesn't really scale: > if any tagging protocol required its own dp->priv format, we may end up > with bugs such as the driver not freeing the old dp->priv and setting up > the new one, when the tagging protocol changes. These mistakes are all > too easy to make currently. > > Another potential issue, which I don't see present here, but still > worth watching out for, is that the tagger cannot use symbols exported > by the switch, and vice versa. Otherwise the tagger cannot be inserted > into the kernel when built as module, due to missing symbols provided by > the switch. And the switch will not probe until it has a tagger. > > I'm afraid we need to make a decision now whether we keep supporting the > separation between taggers and switch drivers, especially since the > tagger could become a bus provider for the switch driver. We need to > weigh the pros and cons. > > I thought about what would be needed and I think we'd need tagger-owned > per-switch-tree private data. But this implies that there needs to be a > hook in the tagging protocol driver that notifies it when a certain > struct dsa_switch_tree *dst binds and unbinds to a certain tagger. > Then it could pick and choose the ports that need dp->priv configured in > a certain way: some taggers need the dp->priv of user ports to hold > something per port, others need the dp->priv of _all_ user ports to > point to something shared, others (like yours) apparently need the > dp->priv of the CPU port to hold something. This would become something > handled and owned exclusively by the tagger. > > Ansuel, would something like this help you in any way? I agree on all the concern you pointed out. IMHO, current implementation of the dsa port priv is a bit confusing and someone can really do bad things with it. (like it's done in this implementation) The main problem here is that we really need a way to have shared data between tagger and dsa driver. I also think that it would be limiting using this only for mdio. For example qca8k can autocast mib with Ethernet port and that would be another feature that the tagger would handle. I like the idea of tagger-owend per-switch-tree private data. Do we really need to hook logic? Wonder if something like that would work: 1. Each tagger declare size of his private data (if any). 2. Change tag dsa helper make sure the privata data in dst gets allocated and freed. 3. We create some helper to get the tagger private data pointer that dsa driver will use. (an error is returned if no data is present) 4. Tagger will use the dst to access his own data. In theory that way we should be able to make a ""connection"" between dsa driver and the tagger and prevent any sort of strange stuff that could result in bug/kernel panic. I mean for the current task (mdio in ethernet packet) we just need to put data, send the skb and wait for a response (and after parsing) get the data from a response skb.
On Tue, Dec 07, 2021 at 11:10:18PM +0200, Vladimir Oltean wrote: > On Tue, Dec 07, 2021 at 10:49:43AM -0800, Florian Fainelli wrote: > > On 12/7/21 7:15 AM, Andrew Lunn wrote: > > > On Tue, Dec 07, 2021 at 03:59:36PM +0100, Ansuel Smith wrote: > > >> Hi, this is still WIP and currently has some problem but I would love if > > >> someone can give this a superficial review and answer to some problem > > >> with this. > > >> > > >> The main reason for this is that we notice some routing problem in the > > >> switch and it seems assisted learning is needed. Considering mdio is > > >> quite slow due to the indirect write using this Ethernet alternative way > > >> seems to be quicker. > > >> > > >> The qca8k switch supports a special way to pass mdio read/write request > > >> using specially crafted Ethernet packet. > > > > > > Oh! Cool! Marvell has this as well, and i suspect a few others. It is > > > something i've wanted to work on for a long long time, but never had > > > the opportunity. > > > > > > This also means that, even if you are focusing on qca8k, please try to > > > think what could be generic, and what should specific to the > > > qca8k. The idea of sending an Ethernet frame and sometime later > > > receiving a reply should be generic and usable for other DSA > > > drivers. The contents of those frames needs to be driver specific. > > > How we hook this into MDIO might also be generic, maybe. > > > > > > I will look at your questions later, but soon. > > > > There was a priori attempt from Vivien to add support for mv88e6xxx over > > RMU frames: > > > > https://www.mail-archive.com/netdev@vger.kernel.org/msg298317.html > > > > This gets interesting because the switch's control path moves from MDIO > > to Ethernet and there is not really an "ethernet bus" though we could > > certainly come up with one. We have mdio-i2c, so maybe we should have > > mdio-ethernet? > > This raises interesting questions. I see two distinct cases: > > 1. "dual I/O": the switch probes initially over a standard bus (MDIO, > SPI, I2C) then at some point transitions towards I/O through the > tagger. This would be the case when there is some preparation work > to be done (maybe the CPU port needs to be brought up, maybe there is > a firmware to be uploaded to the switch's embedded microcontroller > such that the expected remote management protocol is supported, etc). > It would also be the case when multiple CPU ports are supported (and > changing between CPU ports), because we could end up bringing a > certain CPU port down, and the register I/O would need to be > temporarily done over MDIO before bringing the other CPU port up. > > 2. "single I/O": the switch needs no such configuration, and in this > case, it could in principle probe over an "Ethernet bus" rather than > a standard bus as mentioned above. > > I don't know which case is going to be more common, honestly. The > difference between the two is that the former would work using the > existing infrastructure (bus types) we have today, whereas the latter > would (maybe) need an "Ethernet bus" as mentioned by Vivien and Florian. > Considering this is very specific for qca8k (it does use very not standard Ethernet packet) and we have only another switch that more or less supports this, I honestly think we should think about the dual I/0. qca8k for example require to be setup first with mdio (as it does require some bit to enable header mode, the tagger way to comunicate ethernet mdio type packet) and generally the cpu port has to be configured for the phy mode. We can't assume a bootloader correctly setup the cpu port and the switch being able to receive packet so I think a fallback is always necessary. > I'm not completely convinced, though. The argument for an "Ethernet bus" > seems to be that any Ethernet controller may need to set up such a > thing, independently of being a DSA master. In Vivien's link, an example > is given where we have "Control path via port 1, Data path via port port 3". > But I don't know, this separation seems pretty artificial and ultimately > boils down to configuration. Like it or not, in that particular example, > both ports 1 and 3 are CPU ports, and both eth1 and eth0 are DSA masters. > The fact that they are used asymmetrically should pretty much not matter. > > I think a fair simplifying assumption is that switch management > protocols will never be spoken through interfaces that aren't DSA > masters (because being a DSA master _implies_ having a physical link to > a DSA switch). And if we have this simplifying factor, we could consider > moving dsa_tree_setup_master() somewhere earlier in the DSA probe path, > and just make "type 2" / "single I/O" switches be platform devices, with > a more-or-less empty probe function (just calls dsa_register_switch), > and do their hardware init in ->setup, which is _after_ > dsa_tree_setup_master and therefore after the tagging protocol has bound > to the DSA switch tree and is prepared to handle I/O. So no bus really > needed. > > Although I have to point this out. An "Ethernet bus" would be one of the > most unreliable buses out there. Consider that the user may run "ip link > set eth0 down" at any given time. What do you do, as a switch driver on > an Ethernet bus, when your DSA master goes down? Postpone all your I/O? > Error out on I/O? Unbind? With something like an Ethernet bus only (no dual I/O) the operation should just be rejected to prevent this kind of stuff. (return EBUSY error?) > > Even moreso, there are currently code paths in DSA that can only be done > while the DSA master is down (changing the tagging protocol comes to > mind). So does this mean that those code paths are simply not available > when the I/O is over Ethernet? Or does it mean that Ethernet cannot be > the sole I/O method of any switch driver, due to it being unreliable? > And if the latter, this is yet another argument against "Ethernet as bus". > A bus is basically only there for probing purposes, but if you need to > have a primary I/O method beside Ethernet, you already have a bus and > don't need another. As I said up, a ""stable"" I/O method must be present and the fast path should be used only if available. About this I'm thinking if we should create an helper and let dsa decide when a method can be used instead of another one. A dsa driver will then use these helper function and mimic the standard read/write/update_bits thing (but this would be very specific and used only for Ethernet mdio and probably not correct)
> The main problem here is that we really need a way to have shared data > between tagger and dsa driver. I also think that it would be limiting > using this only for mdio. For example qca8k can autocast mib with > Ethernet port and that would be another feature that the tagger would > handle. The Marvell switches also have an efficient way to get the whole MIB table. So this is something i would also want. > I like the idea of tagger-owend per-switch-tree private data. > Do we really need to hook logic? We have two different things here. 1) The tagger needs somewhere to store its own private data. 2) The tagger needs to share state with the switch driver. We can probably have the DSA core provide 1). Add the size to dsa_device_ops structure, and provide helpers to go from either a master or a slave netdev to the private data. 2) is harder. But as far as i know, we have an 1:N setup. One switch driver can use N tag drivers. So we need the switch driver to be sure the tag driver is what it expects. We keep the shared state in the tag driver, so it always has valid data, but when the switch driver wants to get a pointer to it, it needs to pass a enum dsa_tag_protocol and if it does not match, the core should return -EINVAL or similar. Andrew
On Tue, Dec 07, 2021 at 11:22:41PM +0100, Andrew Lunn wrote: > > The main problem here is that we really need a way to have shared data > > between tagger and dsa driver. I also think that it would be limiting > > using this only for mdio. For example qca8k can autocast mib with > > Ethernet port and that would be another feature that the tagger would > > handle. > > The Marvell switches also have an efficient way to get the whole MIB > table. So this is something i would also want. > Again same think... they just put the type in the qca hdr (placed in the EthType position) and everything else is a mib. The switch send 7 packet and each correspond to the MIB for the port (the port number is comunicated in the qca hdr) > > I like the idea of tagger-owend per-switch-tree private data. > > Do we really need to hook logic? > > We have two different things here. > > 1) The tagger needs somewhere to store its own private data. > 2) The tagger needs to share state with the switch driver. > > We can probably have the DSA core provide 1). Add the size to > dsa_device_ops structure, and provide helpers to go from either a > master or a slave netdev to the private data. I'm just implementing this. It doesn't look that hard. > > 2) is harder. But as far as i know, we have an 1:N setup. One switch > driver can use N tag drivers. So we need the switch driver to be sure > the tag driver is what it expects. We keep the shared state in the tag > driver, so it always has valid data, but when the switch driver wants > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and > if it does not match, the core should return -EINVAL or similar. > Mhh this looks a bit complex. I'm probably missing something but why the tagger needs to share a state? To check if it does support some feature? If it's ready to be used for mdio Ethernet? Or just to be future-proof? > Andrew
> This raises interesting questions. I see two distinct cases: > > 1. "dual I/O": the switch probes initially over a standard bus (MDIO, > SPI, I2C) then at some point transitions towards I/O through the > tagger. This would be the case when there is some preparation work > to be done (maybe the CPU port needs to be brought up, maybe there is > a firmware to be uploaded to the switch's embedded microcontroller > such that the expected remote management protocol is supported, etc). > It would also be the case when multiple CPU ports are supported (and > changing between CPU ports), because we could end up bringing a > certain CPU port down, and the register I/O would need to be > temporarily done over MDIO before bringing the other CPU port up. mv88e6xxx is very likely to take this path. You need to program some registers to enable RMU. It is possible to enable this via EEPROM configuration, but i've never seen any hardware with the necessary EEPROM configuration. And you have the old chicken/egg, in order to be able to program the EEPROM, you need access to the switch, or a header and a cable. > 2. "single I/O": the switch needs no such configuration, and in this > case, it could in principle probe over an "Ethernet bus" rather than > a standard bus as mentioned above. > > I don't know which case is going to be more common, honestly. Given the history, i think MDIO startup, and then transition to Ethernet is going to be a lot more common. If there was a lot of hardware out there which could do Ethernet from the beginning, we would of had patches or at least requests for it by now. I would keep it KISS. Andrew
On Tue, Dec 07, 2021 at 10:47:59PM +0100, Ansuel Smith wrote: > The main problem here is that we really need a way to have shared data > between tagger and dsa driver. I also think that it would be limiting > using this only for mdio. For example qca8k can autocast mib with > Ethernet port and that would be another feature that the tagger would > handle. This is cool. I suppose this is what QCA_HDR_RECV_TYPE_MIB is for. But it wouldn't work with your design because the tagger doesn't hold any queues, it is basically a request/response which is always initiated by the switch driver. The hardware can't automatically push anything to software on its own. Maybe if the tagger wouldn't be stateless, that would be better? What if the qca8k switch driver would just provide some function pointers to the switch driver (these would be protocol sub-handlers for QCA_HDR_RECV_TYPE_MIB, QCA_HDR_RECV_TYPE_RW_REG_ACK etc), and your completion structure would be both initialized, as well as finalized, all from within the switch driver itself? > I like the idea of tagger-owend per-switch-tree private data. > Do we really need to hook logic? > Wonder if something like that would work: > 1. Each tagger declare size of his private data (if any). > 2. Change tag dsa helper make sure the privata data in dst gets > allocated and freed. > 3. We create some helper to get the tagger private data pointer that > dsa driver will use. (an error is returned if no data is present) > 4. Tagger will use the dst to access his own data. I considered a simplified form like this, but I think the tagger private data will still stay in dp->priv, only its ownership will change. It is less flexible to just have an autoalloc size. Ok, you allocate a structure the size you need, but which dp->priv gets to have it? Maybe a certain tagging protocol will need dp1->priv == dp2->priv == dp3->priv == ..., whereas a different tagging protocol will need unique different structures for each dp. > > In theory that way we should be able to make a ""connection"" between > dsa driver and the tagger and prevent any sort of strange stuff that > could result in bug/kernel panic. > > I mean for the current task (mdio in ethernet packet) we just need to > put data, send the skb and wait for a response (and after parsing) get > the data from a response skb. It would be a huge win IMO if we could avoid managing the lifetime of dp->priv _directly_. I'm thinking something along the lines of: - every time we make the "dst->tag_ops = tag_ops;" assignment (see dsa2.c) there is a connection event between the switch tree and the tagging protocol (and also a disconnection event, if dst->tag_ops wasn't previously NULL). - we could add a new tag_ops->connect(dst) and tag_ops->disconnect(dst) and call them. These could allocate/free the dp->priv memory for each dp in &dst->ports. - _after_ the tag_ops->connect() has been called (this makes sure that the tagger memory has been allocated) we could also emit a cross-chip notifier event: /* DSA_NOTIFIER_TAG_PROTO_CONNECT */ struct dsa_notifier_tag_proto_connect_info { const struct dsa_device_ops *tag_ops; }; struct dsa_notifier_tag_proto_connect_info info; dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO, &info); The role of a cross-chip notifier is to fan-out a call exactly once to every switch within a tree. This particular cross-chip notifier could end up with an implementation in switch.c that lands with a call to: ds->ops->tag_proto_connect(ds, tag_ops); At this point, I'm a bit fuzzy on the details. I'm thinking of something like this: struct qca8k_tagger_private { void (*rw_reg_ack_handler)(struct dsa_port *dp, void *buf); void (*mib_autocast_handler)(struct dsa_port *dp, void *buf); }; static void qca8k_rw_reg_ack_handler(struct dsa_port *dp, void *buf) { ... (code moved from tagger) } static void qca8k_mib_autocast_handler(struct dsa_port *dp, void *buf) { ... (code moved from tagger) } static int qca8k_tag_proto_connect(struct dsa_switch *ds, const struct dsa_device_ops *tag_ops) { switch (tag_ops->proto) { case DSA_TAG_PROTO_QCA: struct dsa_port *dp; dsa_switch_for_each_port(dp, ds) { struct qca8k_tagger_private *priv = dp->priv; priv->rw_reg_ack_handler = qca8k_rw_reg_ack_handler; priv->mib_autocast_handler = qca8k_mib_autocast_handler; } break; default: return -EOPNOTSUPP; } } static const struct dsa_switch_ops qca8k_switch_ops = { ... .tag_proto_connect = qca8k_tag_proto_connect, }; My current idea is maybe not ideal and a bit fuzzy, because the switch driver would need to be aware of the fact that the tagger private data is in dp->priv, and some code in one folder needs to be in sync with some code in another folder. But at least it should be safer this way, because we are in more control over the exact connection that's being made. - to avoid leaking memory, we also need to patch dsa_tree_put() to issue a disconnect event on unbind. - the tagging protocol driver would always need to NULL-check the function pointer before dereferencing it, because it may connect to a switch driver that doesn't set them up (dsa_loop): struct qca8k_tagger_private *priv = dp->priv; if (priv->rw_reg_ack_handler) priv->rw_reg_ack_handler(dp, skb_mac_header(skb));
> > 1) The tagger needs somewhere to store its own private data. > > 2) The tagger needs to share state with the switch driver. > > > > We can probably have the DSA core provide 1). Add the size to > > dsa_device_ops structure, and provide helpers to go from either a > > master or a slave netdev to the private data. > > I'm just implementing this. It doesn't look that hard. > > > > > 2) is harder. But as far as i know, we have an 1:N setup. One switch > > driver can use N tag drivers. So we need the switch driver to be sure > > the tag driver is what it expects. We keep the shared state in the tag > > driver, so it always has valid data, but when the switch driver wants > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and > > if it does not match, the core should return -EINVAL or similar. > > > > Mhh this looks a bit complex. I'm probably missing something but why the > tagger needs to share a state? To check if it does support some feature? > If it's ready to be used for mdio Ethernet? Or just to be future-proof? This is the general problem we have, and it might not be relevant for this specific problem of MDIO over Ethernet. tag_sja1105 wants access to a queue of frames and a work queue shared with switch driver. tag_ocelot_8021q has something similar to tag_sja1105. tag_lan9303 wants to know if its two ports are in the same bridge. Andrew
> I considered a simplified form like this, but I think the tagger private > data will still stay in dp->priv, only its ownership will change. Isn't dp a port structure. So there is one per port? This is where i think we need to separate shared state from tagger private data. Probably tagger private data is not per port. Shared state between the switch driver and the tagger maybe is per port? Andrew
On Wed, Dec 08, 2021 at 12:45:25AM +0200, Vladimir Oltean wrote: > On Tue, Dec 07, 2021 at 10:47:59PM +0100, Ansuel Smith wrote: > > The main problem here is that we really need a way to have shared data > > between tagger and dsa driver. I also think that it would be limiting > > using this only for mdio. For example qca8k can autocast mib with > > Ethernet port and that would be another feature that the tagger would > > handle. > > This is cool. I suppose this is what QCA_HDR_RECV_TYPE_MIB is for. Exactly that. > But it wouldn't work with your design because the tagger doesn't hold > any queues, it is basically a request/response which is always initiated > by the switch driver. The hardware can't automatically push anything to > software on its own. Maybe if the tagger wouldn't be stateless, that > would be better? What if the qca8k switch driver would just provide some > function pointers to the switch driver (these would be protocol > sub-handlers for QCA_HDR_RECV_TYPE_MIB, QCA_HDR_RECV_TYPE_RW_REG_ACK > etc), and your completion structure would be both initialized, as well > as finalized, all from within the switch driver itself? > Hm. Interesting idea. So qca8k would provide the way to parse the packet and made the request. The tagger would just detect the packet and execute the dedicated function. About mib considering the driver autocast counter for every port and every packet have the relevant port to it (set in the qca tag), the idea was to put a big array and directly write the data. The ethtool function will then just read the data and report it. (or even work directly on the ethtool data array). > > I like the idea of tagger-owend per-switch-tree private data. > > Do we really need to hook logic? > > Wonder if something like that would work: > > 1. Each tagger declare size of his private data (if any). > > 2. Change tag dsa helper make sure the privata data in dst gets > > allocated and freed. > > 3. We create some helper to get the tagger private data pointer that > > dsa driver will use. (an error is returned if no data is present) > > 4. Tagger will use the dst to access his own data. > > I considered a simplified form like this, but I think the tagger private > data will still stay in dp->priv, only its ownership will change. > It is less flexible to just have an autoalloc size. Ok, you allocate a > structure the size you need, but which dp->priv gets to have it? > Maybe a certain tagging protocol will need dp1->priv == dp2->priv == > dp3->priv == ..., whereas a different tagging protocol will need unique > different structures for each dp. > > > > > In theory that way we should be able to make a ""connection"" between > > dsa driver and the tagger and prevent any sort of strange stuff that > > could result in bug/kernel panic. > > > > I mean for the current task (mdio in ethernet packet) we just need to > > put data, send the skb and wait for a response (and after parsing) get > > the data from a response skb. > > It would be a huge win IMO if we could avoid managing the lifetime of > dp->priv _directly_. I'm thinking something along the lines of: > > - every time we make the "dst->tag_ops = tag_ops;" assignment (see dsa2.c) > there is a connection event between the switch tree and the tagging > protocol (and also a disconnection event, if dst->tag_ops wasn't > previously NULL). > > - we could add a new tag_ops->connect(dst) and tag_ops->disconnect(dst) > and call them. These could allocate/free the dp->priv memory for each > dp in &dst->ports. > > - _after_ the tag_ops->connect() has been called (this makes sure that > the tagger memory has been allocated) we could also emit a cross-chip > notifier event: > > /* DSA_NOTIFIER_TAG_PROTO_CONNECT */ > struct dsa_notifier_tag_proto_connect_info { > const struct dsa_device_ops *tag_ops; > }; > > struct dsa_notifier_tag_proto_connect_info info; > > dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO, &info); > > The role of a cross-chip notifier is to fan-out a call exactly once to > every switch within a tree. This particular cross-chip notifier could > end up with an implementation in switch.c that lands with a call to: > > ds->ops->tag_proto_connect(ds, tag_ops); > > At this point, I'm a bit fuzzy on the details. I'm thinking of > something like this: > > struct qca8k_tagger_private { > void (*rw_reg_ack_handler)(struct dsa_port *dp, void *buf); > void (*mib_autocast_handler)(struct dsa_port *dp, void *buf); > }; > > static void qca8k_rw_reg_ack_handler(struct dsa_port *dp, void *buf) > { > ... (code moved from tagger) > } > > static void qca8k_mib_autocast_handler(struct dsa_port *dp, void *buf) > { > ... (code moved from tagger) > } > > static int qca8k_tag_proto_connect(struct dsa_switch *ds, > const struct dsa_device_ops *tag_ops) > { > switch (tag_ops->proto) { > case DSA_TAG_PROTO_QCA: > struct dsa_port *dp; > > dsa_switch_for_each_port(dp, ds) { > struct qca8k_tagger_private *priv = dp->priv; > > priv->rw_reg_ack_handler = qca8k_rw_reg_ack_handler; > priv->mib_autocast_handler = qca8k_mib_autocast_handler; > } > > break; > default: > return -EOPNOTSUPP; > } > } > > static const struct dsa_switch_ops qca8k_switch_ops = { > ... > .tag_proto_connect = qca8k_tag_proto_connect, > }; > > My current idea is maybe not ideal and a bit fuzzy, because the switch > driver would need to be aware of the fact that the tagger private data > is in dp->priv, and some code in one folder needs to be in sync with > some code in another folder. But at least it should be safer this way, > because we are in more control over the exact connection that's being > made. > > - to avoid leaking memory, we also need to patch dsa_tree_put() to issue > a disconnect event on unbind. > > - the tagging protocol driver would always need to NULL-check the > function pointer before dereferencing it, because it may connect to a > switch driver that doesn't set them up (dsa_loop): > > struct qca8k_tagger_private *priv = dp->priv; > > if (priv->rw_reg_ack_handler) > priv->rw_reg_ack_handler(dp, skb_mac_header(skb)); Ok so your idea is to make the driver the one controlling ""everything"" and keep the tagger as dummy as possible. That would also remove all the need to put stuff in the global include dir. Looks complex but handy. We still need to understand the state part. Any hint about that? In the mean time I will try implement this.
On Tue, Dec 07, 2021 at 11:54:07PM +0100, Andrew Lunn wrote: > > I considered a simplified form like this, but I think the tagger private > > data will still stay in dp->priv, only its ownership will change. > > Isn't dp a port structure. So there is one per port? Yes, but dp->priv is a pointer. The thing it points to may not necessarily be per port. > This is where i think we need to separate shared state from tagger > private data. Probably tagger private data is not per port. Shared > state between the switch driver and the tagger maybe is per port? I don't know whether there's such a big difference between "shared state" vs "private data"? The dp->priv model is flexible enough to support both. For example, in tag_sja1105, dp->priv is a struct sja1105_port. All struct sja1105_port of a switch have a common struct sja1105_tagger_data *data pointer. We could certainly set up the tag_ops->connect(dst) function to allocate memory in this way.
On Wed, Dec 08, 2021 at 12:05:11AM +0100, Ansuel Smith wrote: > Hm. Interesting idea. So qca8k would provide the way to parse the packet > and made the request. The tagger would just detect the packet and > execute the dedicated function. > About mib considering the driver autocast counter for every port and > every packet have the relevant port to it (set in the qca tag), the > idea was to put a big array and directly write the data. The ethtool > function will then just read the data and report it. (or even work > directly on the ethtool data array). Apart from the fact that you'd be running inside the priv->rw_reg_ack_handler() which runs in softirq context (so you need spinlocks to serialize with the code that runs in process and/or workqueue context), you have access to all the data structures from the switch driver that you're used to. So you could copy from the void *buf into something owned by struct qca8k_priv *priv, sure. > > My current idea is maybe not ideal and a bit fuzzy, because the switch > > driver would need to be aware of the fact that the tagger private data > > is in dp->priv, and some code in one folder needs to be in sync with > > some code in another folder. But at least it should be safer this way, > > because we are in more control over the exact connection that's being > > made. > > > > - to avoid leaking memory, we also need to patch dsa_tree_put() to issue > > a disconnect event on unbind. > > > > - the tagging protocol driver would always need to NULL-check the > > function pointer before dereferencing it, because it may connect to a > > switch driver that doesn't set them up (dsa_loop): > > > > struct qca8k_tagger_private *priv = dp->priv; > > > > if (priv->rw_reg_ack_handler) > > priv->rw_reg_ack_handler(dp, skb_mac_header(skb)); > > Ok so your idea is to make the driver the one controlling ""everything"" > and keep the tagger as dummy as possible. That would also remove all the > need to put stuff in the global include dir. Looks complex but handy. We > still need to understand the state part. Any hint about that? > > In the mean time I will try implement this. What do you mean exactly by understanding the state?
On Wed, Dec 08, 2021 at 01:20:20AM +0200, Vladimir Oltean wrote: > On Wed, Dec 08, 2021 at 12:05:11AM +0100, Ansuel Smith wrote: > > Hm. Interesting idea. So qca8k would provide the way to parse the packet > > and made the request. The tagger would just detect the packet and > > execute the dedicated function. > > About mib considering the driver autocast counter for every port and > > every packet have the relevant port to it (set in the qca tag), the > > idea was to put a big array and directly write the data. The ethtool > > function will then just read the data and report it. (or even work > > directly on the ethtool data array). > > Apart from the fact that you'd be running inside the priv->rw_reg_ack_handler() > which runs in softirq context (so you need spinlocks to serialize with > the code that runs in process and/or workqueue context), you have access > to all the data structures from the switch driver that you're used to. > So you could copy from the void *buf into something owned by struct > qca8k_priv *priv, sure. > > > > My current idea is maybe not ideal and a bit fuzzy, because the switch > > > driver would need to be aware of the fact that the tagger private data > > > is in dp->priv, and some code in one folder needs to be in sync with > > > some code in another folder. But at least it should be safer this way, > > > because we are in more control over the exact connection that's being > > > made. > > > > > > - to avoid leaking memory, we also need to patch dsa_tree_put() to issue > > > a disconnect event on unbind. > > > > > > - the tagging protocol driver would always need to NULL-check the > > > function pointer before dereferencing it, because it may connect to a > > > switch driver that doesn't set them up (dsa_loop): > > > > > > struct qca8k_tagger_private *priv = dp->priv; > > > > > > if (priv->rw_reg_ack_handler) > > > priv->rw_reg_ack_handler(dp, skb_mac_header(skb)); > > > > Ok so your idea is to make the driver the one controlling ""everything"" > > and keep the tagger as dummy as possible. That would also remove all the > > need to put stuff in the global include dir. Looks complex but handy. We > > still need to understand the state part. Any hint about that? > > > > In the mean time I will try implement this. > > What do you mean exactly by understanding the state? I was referring to the "shared state" problem but you already answer that in the prev email.
On Tue, Dec 07, 2021 at 11:22:41PM +0100, Andrew Lunn wrote: > > I like the idea of tagger-owend per-switch-tree private data. > > Do we really need to hook logic? > > We have two different things here. > > 1) The tagger needs somewhere to store its own private data. > 2) The tagger needs to share state with the switch driver. > > We can probably have the DSA core provide 1). Add the size to > dsa_device_ops structure, and provide helpers to go from either a > master or a slave netdev to the private data. We cannot "add the size to the dsa_device_ops structure", because it is a singleton (const struct). It is not replicated at all, not per port, not per switch, not per tree, but global to the kernel. Not to mention const. Nobody needs state as shared as that. Given this, do you have objections to the sja1105_port->data model for shared state? > 2) is harder. But as far as i know, we have an 1:N setup. One switch > driver can use N tag drivers. So we need the switch driver to be sure > the tag driver is what it expects. We keep the shared state in the tag > driver, so it always has valid data, but when the switch driver wants > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and > if it does not match, the core should return -EINVAL or similar. In my proposal, the tagger will allocate the memory from its side of the ->connect() call. So regardless of whether the switch driver side connects or not, the memory inside dp->priv is there for the tagger to use. The switch can access it or it can ignore it.
On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote: > > 2) is harder. But as far as i know, we have an 1:N setup. One switch > > driver can use N tag drivers. So we need the switch driver to be sure > > the tag driver is what it expects. We keep the shared state in the tag > > driver, so it always has valid data, but when the switch driver wants > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and > > if it does not match, the core should return -EINVAL or similar. > > In my proposal, the tagger will allocate the memory from its side of the > ->connect() call. So regardless of whether the switch driver side > connects or not, the memory inside dp->priv is there for the tagger to > use. The switch can access it or it can ignore it. I don't think I actually said something useful here. The goal would be to minimize use of dp->priv inside the switch driver, outside of the actual ->connect() / ->disconnect() calls. For example, in the felix driver which supports two tagging protocol drivers, I think these two methods would be enough, and they would replace the current felix_port_setup_tagger_data() and felix_port_teardown_tagger_data() calls. An additional benefit would be that in ->connect() and ->disconnect() we get the actual tagging protocol in use. Currently the felix driver lacks there, because felix_port_setup_tagger_data() just sets dp->priv up unconditionally for the ocelot-8021q tagging protocol (luckily the normal ocelot tagger doesn't need dp->priv). In sja1105 the story is a bit longer, but I believe that can also be cleaned up to stay within the confines of ->connect()/->disconnect(). So I guess we just need to be careful and push back against dubious use during review.
On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote: > On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote: > > > 2) is harder. But as far as i know, we have an 1:N setup. One switch > > > driver can use N tag drivers. So we need the switch driver to be sure > > > the tag driver is what it expects. We keep the shared state in the tag > > > driver, so it always has valid data, but when the switch driver wants > > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and > > > if it does not match, the core should return -EINVAL or similar. > > > > In my proposal, the tagger will allocate the memory from its side of the > > ->connect() call. So regardless of whether the switch driver side > > connects or not, the memory inside dp->priv is there for the tagger to > > use. The switch can access it or it can ignore it. > > I don't think I actually said something useful here. > > The goal would be to minimize use of dp->priv inside the switch driver, > outside of the actual ->connect() / ->disconnect() calls. > For example, in the felix driver which supports two tagging protocol > drivers, I think these two methods would be enough, and they would > replace the current felix_port_setup_tagger_data() and > felix_port_teardown_tagger_data() calls. > > An additional benefit would be that in ->connect() and ->disconnect() we > get the actual tagging protocol in use. Currently the felix driver lacks > there, because felix_port_setup_tagger_data() just sets dp->priv up > unconditionally for the ocelot-8021q tagging protocol (luckily the > normal ocelot tagger doesn't need dp->priv). > > In sja1105 the story is a bit longer, but I believe that can also be > cleaned up to stay within the confines of ->connect()/->disconnect(). > > So I guess we just need to be careful and push back against dubious use > during review. I've started working on a prototype for converting sja1105 to this model. It should be clearer to me by tomorrow whether there is anything missing from this proposal.
On Wed, Dec 08, 2021 at 02:40:51AM +0200, Vladimir Oltean wrote: > On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote: > > On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote: > > > > 2) is harder. But as far as i know, we have an 1:N setup. One switch > > > > driver can use N tag drivers. So we need the switch driver to be sure > > > > the tag driver is what it expects. We keep the shared state in the tag > > > > driver, so it always has valid data, but when the switch driver wants > > > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and > > > > if it does not match, the core should return -EINVAL or similar. > > > > > > In my proposal, the tagger will allocate the memory from its side of the > > > ->connect() call. So regardless of whether the switch driver side > > > connects or not, the memory inside dp->priv is there for the tagger to > > > use. The switch can access it or it can ignore it. > > > > I don't think I actually said something useful here. > > > > The goal would be to minimize use of dp->priv inside the switch driver, > > outside of the actual ->connect() / ->disconnect() calls. > > For example, in the felix driver which supports two tagging protocol > > drivers, I think these two methods would be enough, and they would > > replace the current felix_port_setup_tagger_data() and > > felix_port_teardown_tagger_data() calls. > > > > An additional benefit would be that in ->connect() and ->disconnect() we > > get the actual tagging protocol in use. Currently the felix driver lacks > > there, because felix_port_setup_tagger_data() just sets dp->priv up > > unconditionally for the ocelot-8021q tagging protocol (luckily the > > normal ocelot tagger doesn't need dp->priv). > > > > In sja1105 the story is a bit longer, but I believe that can also be > > cleaned up to stay within the confines of ->connect()/->disconnect(). > > > > So I guess we just need to be careful and push back against dubious use > > during review. > > I've started working on a prototype for converting sja1105 to this model. > It should be clearer to me by tomorrow whether there is anything missing > from this proposal. I'm working on your suggestion and I should be able to post another RFC this night if all works correctly with my switch.
On Wed, Dec 08, 2021 at 01:42:59AM +0100, Ansuel Smith wrote: > On Wed, Dec 08, 2021 at 02:40:51AM +0200, Vladimir Oltean wrote: > > On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote: > > > On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote: > > > > > 2) is harder. But as far as i know, we have an 1:N setup. One switch > > > > > driver can use N tag drivers. So we need the switch driver to be sure > > > > > the tag driver is what it expects. We keep the shared state in the tag > > > > > driver, so it always has valid data, but when the switch driver wants > > > > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and > > > > > if it does not match, the core should return -EINVAL or similar. > > > > > > > > In my proposal, the tagger will allocate the memory from its side of the > > > > ->connect() call. So regardless of whether the switch driver side > > > > connects or not, the memory inside dp->priv is there for the tagger to > > > > use. The switch can access it or it can ignore it. > > > > > > I don't think I actually said something useful here. > > > > > > The goal would be to minimize use of dp->priv inside the switch driver, > > > outside of the actual ->connect() / ->disconnect() calls. > > > For example, in the felix driver which supports two tagging protocol > > > drivers, I think these two methods would be enough, and they would > > > replace the current felix_port_setup_tagger_data() and > > > felix_port_teardown_tagger_data() calls. > > > > > > An additional benefit would be that in ->connect() and ->disconnect() we > > > get the actual tagging protocol in use. Currently the felix driver lacks > > > there, because felix_port_setup_tagger_data() just sets dp->priv up > > > unconditionally for the ocelot-8021q tagging protocol (luckily the > > > normal ocelot tagger doesn't need dp->priv). > > > > > > In sja1105 the story is a bit longer, but I believe that can also be > > > cleaned up to stay within the confines of ->connect()/->disconnect(). > > > > > > So I guess we just need to be careful and push back against dubious use > > > during review. > > > > I've started working on a prototype for converting sja1105 to this model. > > It should be clearer to me by tomorrow whether there is anything missing > > from this proposal. > > I'm working on your suggestion and I should be able to post another RFC > this night if all works correctly with my switch. Ok. The key point with my progress so far is that Andrew may be right and we might need separate tagger priv pointers per port and per switch. At least for the cleanliness of implementation. In the end I plan to remove dp->priv and stay with dp->tagger_priv and ds->tagger_priv. Here's what I have so far. I have more changes locally, but the rest it isn't ready and overall also a bit irrelevant for the discussion. I'm going to sleep now. diff --git a/include/net/dsa.h b/include/net/dsa.h index bdf308a5c55e..f0f702774c8d 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -82,12 +82,15 @@ enum dsa_tag_protocol { }; struct dsa_switch; +struct dsa_switch_tree; struct dsa_device_ops { struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev); struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev); void (*flow_dissect)(const struct sk_buff *skb, __be16 *proto, int *offset); + int (*connect)(struct dsa_switch_tree *dst); + void (*disconnect)(struct dsa_switch_tree *dst); unsigned int needed_headroom; unsigned int needed_tailroom; const char *name; @@ -279,6 +282,8 @@ struct dsa_port { */ void *priv; + void *tagger_priv; + /* * Original copy of the master netdev ethtool_ops */ @@ -337,6 +342,8 @@ struct dsa_switch { */ void *priv; + void *tagger_priv; + /* * Configuration data for this switch. */ @@ -689,6 +696,8 @@ struct dsa_switch_ops { enum dsa_tag_protocol mprot); int (*change_tag_protocol)(struct dsa_switch *ds, int port, enum dsa_tag_protocol proto); + int (*connect_tag_protocol)(struct dsa_switch *ds, + enum dsa_tag_protocol proto); /* Optional switch-wide initialization and destruction methods */ int (*setup)(struct dsa_switch *ds); diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index 8814fa0e44c8..3787cbce1175 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -248,8 +248,12 @@ static struct dsa_switch_tree *dsa_tree_alloc(int index) static void dsa_tree_free(struct dsa_switch_tree *dst) { - if (dst->tag_ops) + if (dst->tag_ops) { + if (dst->tag_ops->disconnect) + dst->tag_ops->disconnect(dst); + dsa_tag_driver_put(dst->tag_ops); + } list_del(&dst->list); kfree(dst); } @@ -1136,6 +1140,36 @@ static void dsa_tree_teardown(struct dsa_switch_tree *dst) dst->setup = false; } +static int dsa_tree_bind_tag_proto(struct dsa_switch_tree *dst, + const struct dsa_device_ops *tag_ops) +{ + struct dsa_notifier_tag_proto_info info; + int err; + + if (dst->tag_ops && dst->tag_ops->disconnect) + dst->tag_ops->disconnect(dst); + + if (tag_ops->connect) { + err = tag_ops->connect(dst); + if (err) + return err; + } + + info.tag_ops = tag_ops; + err = dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO_CONNECT, &info); + if (err && err != -EOPNOTSUPP) + goto out_disconnect; + + dst->tag_ops = tag_ops; + + return 0; + +out_disconnect: + if (tag_ops->disconnect) + tag_ops->disconnect(dst); + return err; +} + /* Since the dsa/tagging sysfs device attribute is per master, the assumption * is that all DSA switches within a tree share the same tagger, otherwise * they would have formed disjoint trees (different "dsa,member" values). @@ -1173,7 +1207,9 @@ int dsa_tree_change_tag_proto(struct dsa_switch_tree *dst, if (err) goto out_unwind_tagger; - dst->tag_ops = tag_ops; + err = dsa_tree_bind_tag_proto(dst, tag_ops); + if (err) + goto out_unwind_tagger; rtnl_unlock(); @@ -1307,7 +1343,9 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master, */ dsa_tag_driver_put(tag_ops); } else { - dst->tag_ops = tag_ops; + err = dsa_tree_bind_tag_proto(dst, tag_ops); + if (err) + return err; } dp->master = master; diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index 38ce5129a33d..0db2b26b0c83 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -37,6 +37,7 @@ enum { DSA_NOTIFIER_VLAN_DEL, DSA_NOTIFIER_MTU, DSA_NOTIFIER_TAG_PROTO, + DSA_NOTIFIER_TAG_PROTO_CONNECT, DSA_NOTIFIER_MRP_ADD, DSA_NOTIFIER_MRP_DEL, DSA_NOTIFIER_MRP_ADD_RING_ROLE, diff --git a/net/dsa/switch.c b/net/dsa/switch.c index 9c92edd96961..06948f536829 100644 --- a/net/dsa/switch.c +++ b/net/dsa/switch.c @@ -647,6 +647,17 @@ static int dsa_switch_change_tag_proto(struct dsa_switch *ds, return 0; } +static int dsa_switch_connect_tag_proto(struct dsa_switch *ds, + struct dsa_notifier_tag_proto_info *info) +{ + const struct dsa_device_ops *tag_ops = info->tag_ops; + + if (!ds->ops->connect_tag_protocol) + return -EOPNOTSUPP; + + return ds->ops->connect_tag_protocol(ds, tag_ops->proto); +} + static int dsa_switch_mrp_add(struct dsa_switch *ds, struct dsa_notifier_mrp_info *info) { @@ -766,6 +777,9 @@ static int dsa_switch_event(struct notifier_block *nb, case DSA_NOTIFIER_TAG_PROTO: err = dsa_switch_change_tag_proto(ds, info); break; + case DSA_NOTIFIER_TAG_PROTO_CONNECT: + err = dsa_switch_connect_tag_proto(ds, info); + break; case DSA_NOTIFIER_MRP_ADD: err = dsa_switch_mrp_add(ds, info); break; diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c index 6c293c2a3008..53362a0f0aab 100644 --- a/net/dsa/tag_sja1105.c +++ b/net/dsa/tag_sja1105.c @@ -722,11 +722,59 @@ static void sja1110_flow_dissect(const struct sk_buff *skb, __be16 *proto, *proto = ((__be16 *)skb->data)[(VLAN_HLEN / 2) - 1]; } +static void sja1105_disconnect(struct dsa_switch_tree *dst) +{ + struct dsa_port *dp; + + dsa_tree_for_each_user_port(dp, dst) { + if (dp->tagger_priv) { + kfree(dp->tagger_priv); + dp->tagger_priv = NULL; + } + + if (dp->ds->tagger_priv) { + kfree(dp->ds->tagger_priv); + dp->ds->tagger_priv = NULL; + } + } +} + +static int sja1105_connect(struct dsa_switch_tree *dst) +{ + struct sja1105_tagger_data *data; + struct sja1105_port *sp; + struct dsa_port *dp; + + dsa_tree_for_each_user_port(dp, dst) { + if (!dp->ds->tagger_priv) { + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + goto out; + + dp->ds->tagger_priv = data; + } + + sp = kzalloc(sizeof(*sp), GFP_KERNEL); + if (!sp) + goto out; + + dp->tagger_priv = sp; + } + + return 0; + +out: + sja1105_disconnect(dst); + return -ENOMEM; +} + static const struct dsa_device_ops sja1105_netdev_ops = { .name = "sja1105", .proto = DSA_TAG_PROTO_SJA1105, .xmit = sja1105_xmit, .rcv = sja1105_rcv, + .connect = sja1105_connect, + .disconnect = sja1105_disconnect, .needed_headroom = VLAN_HLEN, .flow_dissect = sja1105_flow_dissect, .promisc_on_master = true, @@ -740,6 +788,8 @@ static const struct dsa_device_ops sja1110_netdev_ops = { .proto = DSA_TAG_PROTO_SJA1110, .xmit = sja1110_xmit, .rcv = sja1110_rcv, + .connect = sja1105_connect, + .disconnect = sja1105_disconnect, .flow_dissect = sja1110_flow_dissect, .needed_headroom = SJA1110_HEADER_LEN + VLAN_HLEN, .needed_tailroom = SJA1110_RX_TRAILER_LEN + SJA1110_MAX_PADDING_LEN,
On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote: > On Tue, Dec 07, 2021 at 11:22:41PM +0100, Andrew Lunn wrote: > > > I like the idea of tagger-owend per-switch-tree private data. > > > Do we really need to hook logic? > > > > We have two different things here. > > > > 1) The tagger needs somewhere to store its own private data. > > 2) The tagger needs to share state with the switch driver. > > > > We can probably have the DSA core provide 1). Add the size to > > dsa_device_ops structure, and provide helpers to go from either a > > master or a slave netdev to the private data. > > We cannot "add the size to the dsa_device_ops structure", because it is > a singleton (const struct). It is not replicated at all, not per port, > not per switch, not per tree, but global to the kernel. Not to mention > const. Nobody needs state as shared as that. What i'm suggesting is static const struct dsa_device_ops edsa_netdev_ops = { .name = "edsa", .proto = DSA_TAG_PROTO_EDSA, .xmit = edsa_xmit, .rcv = edsa_rcv, .needed_headroom = EDSA_HLEN, .priv_size = 42; }; The priv_size indicates that an instance of this tagger needs 42 bytes of private data. More likely it will be a sizeof(struct dsa_priv), but that is a detail. When a master is setup and the tagger instantiated, 42 bytes of memory will be allocated and put somewhere it can be found via a helper. This is not meant for shared state between the tagger and the switch driver, this is private to the tagger. As such it is less likely to be dependent on the number of ports etc. It is somewhere to store an skb pointer, maybe a sequence number for the management message expected as a reply from the switch etc. Andrew
On Wed, Dec 08, 2021 at 01:14:49AM +0200, Vladimir Oltean wrote: > On Tue, Dec 07, 2021 at 11:54:07PM +0100, Andrew Lunn wrote: > > > I considered a simplified form like this, but I think the tagger private > > > data will still stay in dp->priv, only its ownership will change. > > > > Isn't dp a port structure. So there is one per port? > > Yes, but dp->priv is a pointer. The thing it points to may not > necessarily be per port. > > > This is where i think we need to separate shared state from tagger > > private data. Probably tagger private data is not per port. Shared > > state between the switch driver and the tagger maybe is per port? > > I don't know whether there's such a big difference between > "shared state" vs "private data"? The difference is to do with stopping the kernel exploding when the switch driver is not using the tagger it expects. Anything which is private to the tagger is not a problem. Only the tagger uses it, so it cannot be wrong. Anything which is shared between the tagger and the switch driver we have to be careful about. We are just passing void * pointers about. There is no type checking. If i'm correct about the 1:N relationship, we can store shared state in the tagger. The tagger should be O.K, because it only ever needs to deal with one format of shared state. The switch driver needs to handle N different formats of shared state, depending on which of the N different taggers are in operation. Ideally, when it asks for the void * pointer for shared information, some sort of checking is performed to ensure the void * is what the switch driver actually expects. Maybe it needs to pass the tag driver it thinks it is talking to, or as well as getting the void * back, it also gets the tag enum and it verifies it actually knows about that tag driver. Andrew
On Wed, Dec 08, 2021 at 03:09:47AM +0200, Vladimir Oltean wrote: > On Wed, Dec 08, 2021 at 01:42:59AM +0100, Ansuel Smith wrote: > > On Wed, Dec 08, 2021 at 02:40:51AM +0200, Vladimir Oltean wrote: > > > On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote: > > > > On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote: > > > > > > 2) is harder. But as far as i know, we have an 1:N setup. One switch > > > > > > driver can use N tag drivers. So we need the switch driver to be sure > > > > > > the tag driver is what it expects. We keep the shared state in the tag > > > > > > driver, so it always has valid data, but when the switch driver wants > > > > > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and > > > > > > if it does not match, the core should return -EINVAL or similar. > > > > > > > > > > In my proposal, the tagger will allocate the memory from its side of the > > > > > ->connect() call. So regardless of whether the switch driver side > > > > > connects or not, the memory inside dp->priv is there for the tagger to > > > > > use. The switch can access it or it can ignore it. > > > > > > > > I don't think I actually said something useful here. > > > > > > > > The goal would be to minimize use of dp->priv inside the switch driver, > > > > outside of the actual ->connect() / ->disconnect() calls. > > > > For example, in the felix driver which supports two tagging protocol > > > > drivers, I think these two methods would be enough, and they would > > > > replace the current felix_port_setup_tagger_data() and > > > > felix_port_teardown_tagger_data() calls. > > > > > > > > An additional benefit would be that in ->connect() and ->disconnect() we > > > > get the actual tagging protocol in use. Currently the felix driver lacks > > > > there, because felix_port_setup_tagger_data() just sets dp->priv up > > > > unconditionally for the ocelot-8021q tagging protocol (luckily the > > > > normal ocelot tagger doesn't need dp->priv). > > > > > > > > In sja1105 the story is a bit longer, but I believe that can also be > > > > cleaned up to stay within the confines of ->connect()/->disconnect(). > > > > > > > > So I guess we just need to be careful and push back against dubious use > > > > during review. > > > > > > I've started working on a prototype for converting sja1105 to this model. > > > It should be clearer to me by tomorrow whether there is anything missing > > > from this proposal. > > > > I'm working on your suggestion and I should be able to post another RFC > > this night if all works correctly with my switch. > > Ok. The key point with my progress so far is that Andrew may be right > and we might need separate tagger priv pointers per port and per switch. > At least for the cleanliness of implementation. In the end I plan to > remove dp->priv and stay with dp->tagger_priv and ds->tagger_priv. > > Here's what I have so far. I have more changes locally, but the rest it > isn't ready and overall also a bit irrelevant for the discussion. > I'm going to sleep now. > BTW, I notice we made the same mistake. Don't know if it was the problem and you didn't notice... The notifier is not ready at times of the first tagger setup and the tag_proto_connect is never called. Anyway sending v2 with your suggestion applied. > diff --git a/include/net/dsa.h b/include/net/dsa.h > index bdf308a5c55e..f0f702774c8d 100644 > --- a/include/net/dsa.h > +++ b/include/net/dsa.h > @@ -82,12 +82,15 @@ enum dsa_tag_protocol { > }; > > struct dsa_switch; > +struct dsa_switch_tree; > > struct dsa_device_ops { > struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev); > struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev); > void (*flow_dissect)(const struct sk_buff *skb, __be16 *proto, > int *offset); > + int (*connect)(struct dsa_switch_tree *dst); > + void (*disconnect)(struct dsa_switch_tree *dst); > unsigned int needed_headroom; > unsigned int needed_tailroom; > const char *name; > @@ -279,6 +282,8 @@ struct dsa_port { > */ > void *priv; > > + void *tagger_priv; > + > /* > * Original copy of the master netdev ethtool_ops > */ > @@ -337,6 +342,8 @@ struct dsa_switch { > */ > void *priv; > > + void *tagger_priv; > + > /* > * Configuration data for this switch. > */ > @@ -689,6 +696,8 @@ struct dsa_switch_ops { > enum dsa_tag_protocol mprot); > int (*change_tag_protocol)(struct dsa_switch *ds, int port, > enum dsa_tag_protocol proto); > + int (*connect_tag_protocol)(struct dsa_switch *ds, > + enum dsa_tag_protocol proto); > > /* Optional switch-wide initialization and destruction methods */ > int (*setup)(struct dsa_switch *ds); > diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c > index 8814fa0e44c8..3787cbce1175 100644 > --- a/net/dsa/dsa2.c > +++ b/net/dsa/dsa2.c > @@ -248,8 +248,12 @@ static struct dsa_switch_tree *dsa_tree_alloc(int index) > > static void dsa_tree_free(struct dsa_switch_tree *dst) > { > - if (dst->tag_ops) > + if (dst->tag_ops) { > + if (dst->tag_ops->disconnect) > + dst->tag_ops->disconnect(dst); > + > dsa_tag_driver_put(dst->tag_ops); > + } > list_del(&dst->list); > kfree(dst); > } > @@ -1136,6 +1140,36 @@ static void dsa_tree_teardown(struct dsa_switch_tree *dst) > dst->setup = false; > } > > +static int dsa_tree_bind_tag_proto(struct dsa_switch_tree *dst, > + const struct dsa_device_ops *tag_ops) > +{ > + struct dsa_notifier_tag_proto_info info; > + int err; > + > + if (dst->tag_ops && dst->tag_ops->disconnect) > + dst->tag_ops->disconnect(dst); > + > + if (tag_ops->connect) { > + err = tag_ops->connect(dst); > + if (err) > + return err; > + } > + > + info.tag_ops = tag_ops; > + err = dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO_CONNECT, &info); > + if (err && err != -EOPNOTSUPP) > + goto out_disconnect; > + > + dst->tag_ops = tag_ops; > + > + return 0; > + > +out_disconnect: > + if (tag_ops->disconnect) > + tag_ops->disconnect(dst); > + return err; > +} > + > /* Since the dsa/tagging sysfs device attribute is per master, the assumption > * is that all DSA switches within a tree share the same tagger, otherwise > * they would have formed disjoint trees (different "dsa,member" values). > @@ -1173,7 +1207,9 @@ int dsa_tree_change_tag_proto(struct dsa_switch_tree *dst, > if (err) > goto out_unwind_tagger; > > - dst->tag_ops = tag_ops; > + err = dsa_tree_bind_tag_proto(dst, tag_ops); > + if (err) > + goto out_unwind_tagger; > > rtnl_unlock(); > > @@ -1307,7 +1343,9 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master, > */ > dsa_tag_driver_put(tag_ops); > } else { > - dst->tag_ops = tag_ops; > + err = dsa_tree_bind_tag_proto(dst, tag_ops); > + if (err) > + return err; > } > > dp->master = master; > diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h > index 38ce5129a33d..0db2b26b0c83 100644 > --- a/net/dsa/dsa_priv.h > +++ b/net/dsa/dsa_priv.h > @@ -37,6 +37,7 @@ enum { > DSA_NOTIFIER_VLAN_DEL, > DSA_NOTIFIER_MTU, > DSA_NOTIFIER_TAG_PROTO, > + DSA_NOTIFIER_TAG_PROTO_CONNECT, > DSA_NOTIFIER_MRP_ADD, > DSA_NOTIFIER_MRP_DEL, > DSA_NOTIFIER_MRP_ADD_RING_ROLE, > diff --git a/net/dsa/switch.c b/net/dsa/switch.c > index 9c92edd96961..06948f536829 100644 > --- a/net/dsa/switch.c > +++ b/net/dsa/switch.c > @@ -647,6 +647,17 @@ static int dsa_switch_change_tag_proto(struct dsa_switch *ds, > return 0; > } > > +static int dsa_switch_connect_tag_proto(struct dsa_switch *ds, > + struct dsa_notifier_tag_proto_info *info) > +{ > + const struct dsa_device_ops *tag_ops = info->tag_ops; > + > + if (!ds->ops->connect_tag_protocol) > + return -EOPNOTSUPP; > + > + return ds->ops->connect_tag_protocol(ds, tag_ops->proto); > +} > + > static int dsa_switch_mrp_add(struct dsa_switch *ds, > struct dsa_notifier_mrp_info *info) > { > @@ -766,6 +777,9 @@ static int dsa_switch_event(struct notifier_block *nb, > case DSA_NOTIFIER_TAG_PROTO: > err = dsa_switch_change_tag_proto(ds, info); > break; > + case DSA_NOTIFIER_TAG_PROTO_CONNECT: > + err = dsa_switch_connect_tag_proto(ds, info); > + break; > case DSA_NOTIFIER_MRP_ADD: > err = dsa_switch_mrp_add(ds, info); > break; > diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c > index 6c293c2a3008..53362a0f0aab 100644 > --- a/net/dsa/tag_sja1105.c > +++ b/net/dsa/tag_sja1105.c > @@ -722,11 +722,59 @@ static void sja1110_flow_dissect(const struct sk_buff *skb, __be16 *proto, > *proto = ((__be16 *)skb->data)[(VLAN_HLEN / 2) - 1]; > } > > +static void sja1105_disconnect(struct dsa_switch_tree *dst) > +{ > + struct dsa_port *dp; > + > + dsa_tree_for_each_user_port(dp, dst) { > + if (dp->tagger_priv) { > + kfree(dp->tagger_priv); > + dp->tagger_priv = NULL; > + } > + > + if (dp->ds->tagger_priv) { > + kfree(dp->ds->tagger_priv); > + dp->ds->tagger_priv = NULL; > + } > + } > +} > + > +static int sja1105_connect(struct dsa_switch_tree *dst) > +{ > + struct sja1105_tagger_data *data; > + struct sja1105_port *sp; > + struct dsa_port *dp; > + > + dsa_tree_for_each_user_port(dp, dst) { > + if (!dp->ds->tagger_priv) { > + data = kzalloc(sizeof(*data), GFP_KERNEL); > + if (!data) > + goto out; > + > + dp->ds->tagger_priv = data; > + } > + > + sp = kzalloc(sizeof(*sp), GFP_KERNEL); > + if (!sp) > + goto out; > + > + dp->tagger_priv = sp; > + } > + > + return 0; > + > +out: > + sja1105_disconnect(dst); > + return -ENOMEM; > +} > + > static const struct dsa_device_ops sja1105_netdev_ops = { > .name = "sja1105", > .proto = DSA_TAG_PROTO_SJA1105, > .xmit = sja1105_xmit, > .rcv = sja1105_rcv, > + .connect = sja1105_connect, > + .disconnect = sja1105_disconnect, > .needed_headroom = VLAN_HLEN, > .flow_dissect = sja1105_flow_dissect, > .promisc_on_master = true, > @@ -740,6 +788,8 @@ static const struct dsa_device_ops sja1110_netdev_ops = { > .proto = DSA_TAG_PROTO_SJA1110, > .xmit = sja1110_xmit, > .rcv = sja1110_rcv, > + .connect = sja1105_connect, > + .disconnect = sja1105_disconnect, > .flow_dissect = sja1110_flow_dissect, > .needed_headroom = SJA1110_HEADER_LEN + VLAN_HLEN, > .needed_tailroom = SJA1110_RX_TRAILER_LEN + SJA1110_MAX_PADDING_LEN, > -- > 2.25.1
On Wed, Dec 08, 2021 at 02:35:34AM +0100, Andrew Lunn wrote: > On Wed, Dec 08, 2021 at 01:14:49AM +0200, Vladimir Oltean wrote: > > On Tue, Dec 07, 2021 at 11:54:07PM +0100, Andrew Lunn wrote: > > > > I considered a simplified form like this, but I think the tagger private > > > > data will still stay in dp->priv, only its ownership will change. > > > > > > Isn't dp a port structure. So there is one per port? > > > > Yes, but dp->priv is a pointer. The thing it points to may not > > necessarily be per port. > > > > > This is where i think we need to separate shared state from tagger > > > private data. Probably tagger private data is not per port. Shared > > > state between the switch driver and the tagger maybe is per port? > > > > I don't know whether there's such a big difference between > > "shared state" vs "private data"? > > The difference is to do with stopping the kernel exploding when the > switch driver is not using the tagger it expects. > > Anything which is private to the tagger is not a problem. Only the > tagger uses it, so it cannot be wrong. > > Anything which is shared between the tagger and the switch driver we > have to be careful about. We are just passing void * pointers > about. There is no type checking. If i'm correct about the 1:N > relationship, we can store shared state in the tagger. The tagger > should be O.K, because it only ever needs to deal with one format of > shared state. The switch driver needs to handle N different formats of > shared state, depending on which of the N different taggers are in > operation. Ideally, when it asks for the void * pointer for shared > information, some sort of checking is performed to ensure the void * > is what the switch driver actually expects. Maybe it needs to pass the > tag driver it thinks it is talking to, or as well as getting the void > * back, it also gets the tag enum and it verifies it actually knows > about that tag driver. > > Andrew I'm sending v2 with Vladimir suggestion so we can start working on that. Hope with a some split code it would be easier to find the problem with this and find a way to correctly validate the shared data between tagger and dsa driver. (you will probably have to rewrite this also for v2 and sorry for this)
On Wed, Dec 08, 2021 at 02:35:34AM +0100, Andrew Lunn wrote: > On Wed, Dec 08, 2021 at 01:14:49AM +0200, Vladimir Oltean wrote: > > On Tue, Dec 07, 2021 at 11:54:07PM +0100, Andrew Lunn wrote: > > > > I considered a simplified form like this, but I think the tagger private > > > > data will still stay in dp->priv, only its ownership will change. > > > > > > Isn't dp a port structure. So there is one per port? > > > > Yes, but dp->priv is a pointer. The thing it points to may not > > necessarily be per port. > > > > > This is where i think we need to separate shared state from tagger > > > private data. Probably tagger private data is not per port. Shared > > > state between the switch driver and the tagger maybe is per port? > > > > I don't know whether there's such a big difference between > > "shared state" vs "private data"? > > The difference is to do with stopping the kernel exploding when the > switch driver is not using the tagger it expects. > > Anything which is private to the tagger is not a problem. Only the > tagger uses it, so it cannot be wrong. > > Anything which is shared between the tagger and the switch driver we > have to be careful about. We are just passing void * pointers > about. There is no type checking. If i'm correct about the 1:N > relationship, we can store shared state in the tagger. The tagger > should be O.K, because it only ever needs to deal with one format of > shared state. The switch driver needs to handle N different formats of > shared state, depending on which of the N different taggers are in > operation. Ideally, when it asks for the void * pointer for shared > information, some sort of checking is performed to ensure the void * > is what the switch driver actually expects. Maybe it needs to pass the > tag driver it thinks it is talking to, or as well as getting the void > * back, it also gets the tag enum and it verifies it actually knows > about that tag driver. Understood what you mean now (actually I don't know what was unclear yesterday). I should start doing something else past a certain hour... What I've started doing now is something like this: /* include/linux/dsa/ocelot.h */ struct ocelot_8021q_tagger_data { void (*xmit_work_fn)(struct kthread_work *work); }; static inline struct ocelot_8021q_tagger_data * ocelot_8021q_tagger_data(struct dsa_switch *ds) { BUG_ON(ds->dst->tag_ops->proto != DSA_TAG_PROTO_OCELOT_8021Q); return ds->tagger_data; } /* net/dsa/tag_ocelot_8021q.c */ struct ocelot_8021q_tagger_private { struct ocelot_8021q_tagger_data data; /* Must be first */ struct kthread_worker *xmit_worker; }; static struct sk_buff *ocelot_defer_xmit(struct dsa_port *dp, struct sk_buff *skb) { struct ocelot_8021q_tagger_private *priv = dp->ds->tagger_data; struct ocelot_8021q_tagger_data *data = &priv->data; void (*xmit_work_fn)(struct kthread_work *work); struct felix_deferred_xmit_work *xmit_work; struct kthread_worker *xmit_worker; xmit_work_fn = data->xmit_work_fn; xmit_worker = priv->xmit_worker; if (!xmit_work_fn || !xmit_worker) return NULL; xmit_work = kzalloc(sizeof(*xmit_work), GFP_ATOMIC); if (!xmit_work) return NULL; /* Calls felix_port_deferred_xmit in felix.c */ kthread_init_work(&xmit_work->work, xmit_work_fn); /* Increase refcount so the kfree_skb in dsa_slave_xmit * won't really free the packet. */ xmit_work->dp = dp; xmit_work->skb = skb_get(skb); kthread_queue_work(xmit_worker, &xmit_work->work); return NULL; } static void ocelot_disconnect(struct dsa_switch_tree *dst) { struct ocelot_8021q_tagger_private *priv; struct dsa_port *dp; list_for_each_entry(dp, &dst->ports, list) { priv = dp->ds->tagger_data; if (!priv) continue; if (priv->xmit_worker) kthread_destroy_worker(priv->xmit_worker); kfree(priv); dp->ds->tagger_data = NULL; } } static int ocelot_connect(struct dsa_switch_tree *dst) { struct ocelot_8021q_tagger_private *priv; struct dsa_port *dp; int err; list_for_each_entry(dp, &dst->ports, list) { if (dp->ds->tagger_data) continue; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) { err = -ENOMEM; goto out; } priv->xmit_worker = kthread_create_worker(0, "felix_xmit"); if (IS_ERR(priv->xmit_worker)) { err = PTR_ERR(priv->xmit_worker); goto out; } dp->ds->tagger_data = priv; } return 0; out: ocelot_disconnect(dst); return err; } /* drivers/net/dsa/felix.c */ static int felix_connect_tag_protocol(struct dsa_switch *ds, enum dsa_tag_protocol proto) { struct ocelot_8021q_tagger_data *tagger_data; switch (proto) { case DSA_TAG_PROTO_OCELOT_8021Q: tagger_data = ocelot_8021q_tagger_data(ds); tagger_data->xmit_work_fn = felix_port_deferred_xmit; return 0; case DSA_TAG_PROTO_OCELOT: case DSA_TAG_PROTO_SEVILLE: return 0; default: return -EPROTONOSUPPORT; } } Something like this shares memory between what's private and what's public (it's part of the same allocation), but there's type checking at least, and private data isn't exposed directly. Is this ok?
On Wed, Dec 08, 2021 at 04:32:43AM +0100, Ansuel Smith wrote: > On Wed, Dec 08, 2021 at 03:09:47AM +0200, Vladimir Oltean wrote: > > On Wed, Dec 08, 2021 at 01:42:59AM +0100, Ansuel Smith wrote: > > > On Wed, Dec 08, 2021 at 02:40:51AM +0200, Vladimir Oltean wrote: > > > > On Wed, Dec 08, 2021 at 02:04:32AM +0200, Vladimir Oltean wrote: > > > > > On Wed, Dec 08, 2021 at 01:47:36AM +0200, Vladimir Oltean wrote: > > > > > > > 2) is harder. But as far as i know, we have an 1:N setup. One switch > > > > > > > driver can use N tag drivers. So we need the switch driver to be sure > > > > > > > the tag driver is what it expects. We keep the shared state in the tag > > > > > > > driver, so it always has valid data, but when the switch driver wants > > > > > > > to get a pointer to it, it needs to pass a enum dsa_tag_protocol and > > > > > > > if it does not match, the core should return -EINVAL or similar. > > > > > > > > > > > > In my proposal, the tagger will allocate the memory from its side of the > > > > > > ->connect() call. So regardless of whether the switch driver side > > > > > > connects or not, the memory inside dp->priv is there for the tagger to > > > > > > use. The switch can access it or it can ignore it. > > > > > > > > > > I don't think I actually said something useful here. > > > > > > > > > > The goal would be to minimize use of dp->priv inside the switch driver, > > > > > outside of the actual ->connect() / ->disconnect() calls. > > > > > For example, in the felix driver which supports two tagging protocol > > > > > drivers, I think these two methods would be enough, and they would > > > > > replace the current felix_port_setup_tagger_data() and > > > > > felix_port_teardown_tagger_data() calls. > > > > > > > > > > An additional benefit would be that in ->connect() and ->disconnect() we > > > > > get the actual tagging protocol in use. Currently the felix driver lacks > > > > > there, because felix_port_setup_tagger_data() just sets dp->priv up > > > > > unconditionally for the ocelot-8021q tagging protocol (luckily the > > > > > normal ocelot tagger doesn't need dp->priv). > > > > > > > > > > In sja1105 the story is a bit longer, but I believe that can also be > > > > > cleaned up to stay within the confines of ->connect()/->disconnect(). > > > > > > > > > > So I guess we just need to be careful and push back against dubious use > > > > > during review. > > > > > > > > I've started working on a prototype for converting sja1105 to this model. > > > > It should be clearer to me by tomorrow whether there is anything missing > > > > from this proposal. > > > > > > I'm working on your suggestion and I should be able to post another RFC > > > this night if all works correctly with my switch. > > > > Ok. The key point with my progress so far is that Andrew may be right > > and we might need separate tagger priv pointers per port and per switch. > > At least for the cleanliness of implementation. In the end I plan to > > remove dp->priv and stay with dp->tagger_priv and ds->tagger_priv. > > > > Here's what I have so far. I have more changes locally, but the rest it > > isn't ready and overall also a bit irrelevant for the discussion. > > I'm going to sleep now. > > > > BTW, I notice we made the same mistake. Don't know if it was the problem > and you didn't notice... The notifier is not ready at times of the first > tagger setup and the tag_proto_connect is never called. > Anyway sending v2 with your suggestion applied. I didn't go past the compilation stage yesterday. Anyway, now that you mention it, I remember Tobias hitting this issue as well when he worked on changing tagging protocol via device tree, and this is why dsa_switch_setup_tag_protocol() exists. I believe that's where we'd need to call ds->ops->connect_tag_proto from, like this: static int dsa_switch_setup_tag_protocol(struct dsa_switch *ds) { const struct dsa_device_ops *tag_ops = ds->dst->tag_ops; struct dsa_switch_tree *dst = ds->dst; struct dsa_port *cpu_dp; int err; if (tag_ops->proto == dst->default_proto) goto connect; dsa_switch_for_each_cpu_port(cpu_dp, ds) { rtnl_lock(); err = ds->ops->change_tag_protocol(ds, cpu_dp->index, tag_ops->proto); rtnl_unlock(); if (err) { dev_err(ds->dev, "Unable to use tag protocol \"%s\": %pe\n", tag_ops->name, ERR_PTR(err)); return err; } } connect: if (ds->ops->connect_tag_protocol) { err = ds->ops->connect_tag_protocol(ds, tag_ops->proto); if (err) { dev_err(ds->dev, "Unable to connect to tag protocol \"%s\": %pe\n", tag_ops->name, ERR_PTR(err)); return err; } } return 0; }