mbox series

[RFC,net-next,0/9] net: pcs: Add support for devices probed in the "usual" manner

Message ID 20220711160519.741990-1-sean.anderson@seco.com (mailing list archive)
Headers show
Series net: pcs: Add support for devices probed in the "usual" manner | expand

Message

Sean Anderson July 11, 2022, 4:05 p.m. UTC
For a long time, PCSs have been tightly coupled with their MACs. For
this reason, the MAC creates the "phy" or mdio device, and then passes
it to the PCS to initialize. This has a few disadvantages:

- Each MAC must re-implement the same steps to look up/create a PCS
- The PCS cannot use functions tied to device lifetime, such as devm_*.
- Generally, the PCS does not have easy access to its device tree node

I'm not sure if these are terribly large disadvantages. In fact, I'm not
sure if this series provides any benefit which could not be achieved
with judicious use of helper functions. In any case, here it is.

NB: Several (later) patches in this series should not be applied. See
the notes in each commit for details on when they can be applied.


Sean Anderson (9):
  dt-bindings: net: Add lynx PCS
  dt-bindings: net: Expand pcs-handle to an array
  net: pcs: Add helpers for registering and finding PCSs
  net: pcs: lynx: Convert to an mdio driver
  net: pcs: lynx: Use pcs_get_by_provider to get PCS
  net: pcs: lynx: Remove lynx_get_mdio_device and lynx_pcs_destroy
  arm64: dts: Add compatible strings for Lynx PCSs
  powerpc: dts: Add compatible strings for Lynx PCSs
  net: pcs: lynx: Remove remaining users of lynx_pcs_create

 .../bindings/net/ethernet-controller.yaml     |   7 +-
 .../devicetree/bindings/net/fsl,lynx-pcs.yaml |  47 ++++
 MAINTAINERS                                   |   1 +
 .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi |  30 ++-
 .../arm64/boot/dts/freescale/fsl-ls208xa.dtsi |  48 ++--
 .../arm64/boot/dts/freescale/fsl-lx2160a.dtsi |  54 +++--
 .../dts/freescale/qoriq-fman3-0-10g-0.dtsi    |   3 +-
 .../dts/freescale/qoriq-fman3-0-10g-1.dtsi    |   3 +-
 .../dts/freescale/qoriq-fman3-0-1g-0.dtsi     |   3 +-
 .../dts/freescale/qoriq-fman3-0-1g-1.dtsi     |   3 +-
 .../dts/freescale/qoriq-fman3-0-1g-2.dtsi     |   3 +-
 .../dts/freescale/qoriq-fman3-0-1g-3.dtsi     |   3 +-
 .../dts/freescale/qoriq-fman3-0-1g-4.dtsi     |   3 +-
 .../dts/freescale/qoriq-fman3-0-1g-5.dtsi     |   3 +-
 .../fsl/qoriq-fman3-0-10g-0-best-effort.dtsi  |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi     |   3 +-
 .../fsl/qoriq-fman3-0-10g-1-best-effort.dtsi  |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi     |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-0.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-1.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-2.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-3.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-4.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-5.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-10g-0.dtsi     |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-10g-1.dtsi     |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-0.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-1.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-2.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-3.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-4.dtsi      |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-5.dtsi      |   3 +-
 drivers/net/dsa/ocelot/Kconfig                |   2 +
 drivers/net/dsa/ocelot/felix_vsc9959.c        |  26 +-
 drivers/net/dsa/ocelot/seville_vsc9953.c      |  26 +-
 drivers/net/ethernet/freescale/dpaa2/Kconfig  |   1 +
 .../net/ethernet/freescale/dpaa2/dpaa2-mac.c  |  43 +---
 drivers/net/ethernet/freescale/enetc/Kconfig  |   1 +
 .../net/ethernet/freescale/enetc/enetc_pf.c   |  25 +-
 drivers/net/pcs/Kconfig                       |  23 +-
 drivers/net/pcs/Makefile                      |   2 +
 drivers/net/pcs/core.c                        | 226 ++++++++++++++++++
 drivers/net/pcs/pcs-lynx.c                    |  76 ++++--
 drivers/of/property.c                         |   2 +
 include/linux/pcs-lynx.h                      |   6 +-
 include/linux/pcs.h                           |  33 +++
 include/linux/phylink.h                       |   6 +
 47 files changed, 566 insertions(+), 197 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/fsl,lynx-pcs.yaml
 create mode 100644 drivers/net/pcs/core.c
 create mode 100644 include/linux/pcs.h

Comments

Vladimir Oltean July 19, 2022, 3:25 p.m. UTC | #1
Hi Sean,

On Mon, Jul 11, 2022 at 12:05:10PM -0400, Sean Anderson wrote:
> For a long time, PCSs have been tightly coupled with their MACs. For
> this reason, the MAC creates the "phy" or mdio device, and then passes
> it to the PCS to initialize. This has a few disadvantages:
> 
> - Each MAC must re-implement the same steps to look up/create a PCS
> - The PCS cannot use functions tied to device lifetime, such as devm_*.
> - Generally, the PCS does not have easy access to its device tree node
> 
> I'm not sure if these are terribly large disadvantages. In fact, I'm not
> sure if this series provides any benefit which could not be achieved
> with judicious use of helper functions. In any case, here it is.
> 
> NB: Several (later) patches in this series should not be applied. See
> the notes in each commit for details on when they can be applied.

Sorry to burst your bubble, but the networking drivers on NXP LS1028A
(device tree at arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi, drivers
at drivers/net/ethernet/freescale/enetc/ and drivers/net/dsa/ocelot/)
do not use the Lynx PCS through a pcs-handle, because the Lynx PCS in
fact has no backing OF node there, nor do the internal MDIO buses of the
ENETC and of the switch.

It seems that I need to point this out explicitly: you need to provide
at least a working migration path to your PCS driver model. Currently
there isn't one, and as a result, networking is broken on the LS1028A
with this patch set.
Sean Anderson July 19, 2022, 3:28 p.m. UTC | #2
Hi Vladimir,

On 7/19/22 11:25 AM, Vladimir Oltean wrote:
> Hi Sean,
> 
> On Mon, Jul 11, 2022 at 12:05:10PM -0400, Sean Anderson wrote:
>> For a long time, PCSs have been tightly coupled with their MACs. For
>> this reason, the MAC creates the "phy" or mdio device, and then passes
>> it to the PCS to initialize. This has a few disadvantages:
>> 
>> - Each MAC must re-implement the same steps to look up/create a PCS
>> - The PCS cannot use functions tied to device lifetime, such as devm_*.
>> - Generally, the PCS does not have easy access to its device tree node
>> 
>> I'm not sure if these are terribly large disadvantages. In fact, I'm not
>> sure if this series provides any benefit which could not be achieved
>> with judicious use of helper functions. In any case, here it is.
>> 
>> NB: Several (later) patches in this series should not be applied. See
>> the notes in each commit for details on when they can be applied.
> 
> Sorry to burst your bubble, but the networking drivers on NXP LS1028A
> (device tree at arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi, drivers
> at drivers/net/ethernet/freescale/enetc/ and drivers/net/dsa/ocelot/)
> do not use the Lynx PCS through a pcs-handle, because the Lynx PCS in
> fact has no backing OF node there, nor do the internal MDIO buses of the
> ENETC and of the switch.
> 
> It seems that I need to point this out explicitly: you need to provide
> at least a working migration path to your PCS driver model. Currently
> there isn't one, and as a result, networking is broken on the LS1028A
> with this patch set.
> 

Please refer to patches 4, 5, and 6.

--Sean
Vladimir Oltean July 19, 2022, 3:38 p.m. UTC | #3
On Tue, Jul 19, 2022 at 11:28:42AM -0400, Sean Anderson wrote:
> Hi Vladimir,
> 
> On 7/19/22 11:25 AM, Vladimir Oltean wrote:
> > Hi Sean,
> > 
> > On Mon, Jul 11, 2022 at 12:05:10PM -0400, Sean Anderson wrote:
> >> For a long time, PCSs have been tightly coupled with their MACs. For
> >> this reason, the MAC creates the "phy" or mdio device, and then passes
> >> it to the PCS to initialize. This has a few disadvantages:
> >> 
> >> - Each MAC must re-implement the same steps to look up/create a PCS
> >> - The PCS cannot use functions tied to device lifetime, such as devm_*.
> >> - Generally, the PCS does not have easy access to its device tree node
> >> 
> >> I'm not sure if these are terribly large disadvantages. In fact, I'm not
> >> sure if this series provides any benefit which could not be achieved
> >> with judicious use of helper functions. In any case, here it is.
> >> 
> >> NB: Several (later) patches in this series should not be applied. See
> >> the notes in each commit for details on when they can be applied.
> > 
> > Sorry to burst your bubble, but the networking drivers on NXP LS1028A
> > (device tree at arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi, drivers
> > at drivers/net/ethernet/freescale/enetc/ and drivers/net/dsa/ocelot/)
> > do not use the Lynx PCS through a pcs-handle, because the Lynx PCS in
> > fact has no backing OF node there, nor do the internal MDIO buses of the
> > ENETC and of the switch.
> > 
> > It seems that I need to point this out explicitly: you need to provide
> > at least a working migration path to your PCS driver model. Currently
> > there isn't one, and as a result, networking is broken on the LS1028A
> > with this patch set.
> > 
> 
> Please refer to patches 4, 5, and 6.

I don't understand, could you be more clear? Are you saying that I
shouldn't have applied patch 9 while testing? When would be a good
moment to apply patch 9?
Sean Anderson July 19, 2022, 3:46 p.m. UTC | #4
On 7/19/22 11:38 AM, Vladimir Oltean wrote:
> On Tue, Jul 19, 2022 at 11:28:42AM -0400, Sean Anderson wrote:
>> Hi Vladimir,
>> 
>> On 7/19/22 11:25 AM, Vladimir Oltean wrote:
>> > Hi Sean,
>> > 
>> > On Mon, Jul 11, 2022 at 12:05:10PM -0400, Sean Anderson wrote:
>> >> For a long time, PCSs have been tightly coupled with their MACs. For
>> >> this reason, the MAC creates the "phy" or mdio device, and then passes
>> >> it to the PCS to initialize. This has a few disadvantages:
>> >> 
>> >> - Each MAC must re-implement the same steps to look up/create a PCS
>> >> - The PCS cannot use functions tied to device lifetime, such as devm_*.
>> >> - Generally, the PCS does not have easy access to its device tree node
>> >> 
>> >> I'm not sure if these are terribly large disadvantages. In fact, I'm not
>> >> sure if this series provides any benefit which could not be achieved
>> >> with judicious use of helper functions. In any case, here it is.
>> >> 
>> >> NB: Several (later) patches in this series should not be applied. See
>> >> the notes in each commit for details on when they can be applied.
>> > 
>> > Sorry to burst your bubble, but the networking drivers on NXP LS1028A
>> > (device tree at arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi, drivers
>> > at drivers/net/ethernet/freescale/enetc/ and drivers/net/dsa/ocelot/)
>> > do not use the Lynx PCS through a pcs-handle, because the Lynx PCS in
>> > fact has no backing OF node there, nor do the internal MDIO buses of the
>> > ENETC and of the switch.
>> > 
>> > It seems that I need to point this out explicitly: you need to provide
>> > at least a working migration path to your PCS driver model. Currently
>> > there isn't one, and as a result, networking is broken on the LS1028A
>> > with this patch set.
>> > 
>> 
>> Please refer to patches 4, 5, and 6.
> 
> I don't understand, could you be more clear? Are you saying that I
> shouldn't have applied patch 9 while testing? When would be a good
> moment to apply patch 9?

I'm saying that patches 4 and 5 [1] provide "...a working migration
path to [my] PCS driver model." Since enetc/ocelot do not use
devicetree for the PCS, patch 9 should have no effect.

That said, if you've tested this on actual hardware, I'm interested
in your results. I do not have access to enetc/ocelot hardware, so
I was unable to test whether my proposed migration would work.

--Sean

[1] I listed 6 but it seems like it just has some small hunks which should have been in 5 instead
Vladimir Oltean July 19, 2022, 6:11 p.m. UTC | #5
On Tue, Jul 19, 2022 at 11:46:23AM -0400, Sean Anderson wrote:
> I'm saying that patches 4 and 5 [1] provide "...a working migration
> path to [my] PCS driver model." Since enetc/ocelot do not use
> devicetree for the PCS, patch 9 should have no effect.
> 
> That said, if you've tested this on actual hardware, I'm interested
> in your results. I do not have access to enetc/ocelot hardware, so
> I was unable to test whether my proposed migration would work.
> 
> --Sean
> 
> [1] I listed 6 but it seems like it just has some small hunks which should have been in 5 instead

Got it, thanks. So things actually work up until the end, after fixing
the compilation errors and warnings and applying my phy_mask patch first.
However, as mentioned by Russell King, this patch set now gives us the
possibility of doing this, which happily kills the system:

echo "0000:00:00.5-imdio:03" > /sys/bus/mdio_bus/drivers/lynx-pcs/unbind

For your information, pcs-rzn1-miic.c already has a device_link_add()
call to its consumer, and it does avoid the unbinding problem. It is a
bit of a heavy hammer as Russell points out (a DSA switch is a single
struct device, but has multiple net_devices and phylink instances, and
the switch device would be unregistered in its entirety), but on the
other hand, this is one of the simpler things we can do, until we have
something more fine-grained. I, for one, am perfectly happy with a
device link. The alternative would be reworking phylink to react on PCS
devices coming and going. I don't even know what the implications are
upon mac_select_pcs() and such...
Sean Anderson July 19, 2022, 7:34 p.m. UTC | #6
On 7/19/22 2:11 PM, Vladimir Oltean wrote:
> On Tue, Jul 19, 2022 at 11:46:23AM -0400, Sean Anderson wrote:
>> I'm saying that patches 4 and 5 [1] provide "...a working migration
>> path to [my] PCS driver model." Since enetc/ocelot do not use
>> devicetree for the PCS, patch 9 should have no effect.
>> 
>> That said, if you've tested this on actual hardware, I'm interested
>> in your results. I do not have access to enetc/ocelot hardware, so
>> I was unable to test whether my proposed migration would work.
>> 
>> --Sean
>> 
>> [1] I listed 6 but it seems like it just has some small hunks which should have been in 5 instead
> 
> Got it, thanks. So things actually work up until the end, after fixing
> the compilation errors and warnings and applying my phy_mask patch first.
> However, as mentioned by Russell King, this patch set now gives us the
> possibility of doing this, which happily kills the system:
> 
> echo "0000:00:00.5-imdio:03" > /sys/bus/mdio_bus/drivers/lynx-pcs/unbind
> 
> For your information, pcs-rzn1-miic.c already has a device_link_add()
> call to its consumer, and it does avoid the unbinding problem. It is a
> bit of a heavy hammer as Russell points out (a DSA switch is a single
> struct device, but has multiple net_devices and phylink instances, and
> the switch device would be unregistered in its entirety), but on the
> other hand, this is one of the simpler things we can do, until we have
> something more fine-grained. I, for one, am perfectly happy with a
> device link. The alternative would be reworking phylink to react on PCS
> devices coming and going. I don't even know what the implications are
> upon mac_select_pcs() and such...

We could do it, but it'd be a pretty big hack. Something like the
following. Phylink would need to be modified to grab the lock before
every op and check if the PCS is dead or not. This is of course still
not optimal, since there's no way to re-attach a PCS once it goes away.

IMO a better solution is to use devlink and submit a patch to add
notifications which the MAC driver can register for. That way it can
find out when the PCS goes away and potentially do something about it
(or just let itself get removed).

---
 drivers/net/pcs/core.c     | 115 +++++++++++++++++++++++++++----------
 drivers/net/pcs/pcs-lynx.c |  20 +++----
 include/linux/pcs.h        |  23 +++++++-
 include/linux/phylink.h    |  19 +-----
 4 files changed, 117 insertions(+), 60 deletions(-)

diff --git a/drivers/net/pcs/core.c b/drivers/net/pcs/core.c
index 782a4cdd19b2..46e4168802db 100644
--- a/drivers/net/pcs/core.c
+++ b/drivers/net/pcs/core.c
@@ -10,42 +10,83 @@
 #include <linux/phylink.h>
 #include <linux/property.h>
 
+struct phylink_pcs {
+	struct mutex lock;
+	struct list_head list;
+	struct device *dev;
+	void *priv;
+	const struct phylink_pcs_ops *ops;
+	int refs;
+	bool dead;
+	bool poll;
+};
+
 static LIST_HEAD(pcs_devices);
 static DEFINE_MUTEX(pcs_mutex);
 
 /**
  * pcs_register() - register a new PCS
- * @pcs: the PCS to register
+ * @init: Initialization data for a new PCS
  *
  * Registers a new PCS which can be automatically attached to a phylink.
  *
- * Return: 0 on success, or -errno on error
+ * Return: A new PCS, or an error pointer
  */
-int pcs_register(struct phylink_pcs *pcs)
+struct phylink_pcs *pcs_register(struct device *dev,
+				 struct pcs_init *init)
 {
-	if (!pcs->dev || !pcs->ops)
-		return -EINVAL;
-	if (!pcs->ops->pcs_an_restart || !pcs->ops->pcs_config ||
-	    !pcs->ops->pcs_get_state)
-		return -EINVAL;
+	struct phylink_pcs *pcs;
 
+	if (!init->ops)
+		return ERR_PTR(-EINVAL);
+	if (!init->ops->pcs_an_restart || !init->ops->pcs_config ||
+	    !init->ops->pcs_get_state)
+		return ERR_PTR(-EINVAL);
+
+	pcs = kzalloc(sizeof(*pcs), GFP_KERNEL);
+	if (!pcs)
+		return ERR_PTR(-ENOMEM);
+
+	pcs->dev = dev;
+	pcs->priv = init->priv;
+	pcs->ops = init->ops;
+	pcs->poll = init->poll;
+	pcs->refs = 1;
+	mutex_init(&pcs->lock);
 	INIT_LIST_HEAD(&pcs->list);
+
 	mutex_lock(&pcs_mutex);
 	list_add(&pcs->list, &pcs_devices);
 	mutex_unlock(&pcs_mutex);
-	return 0;
+	return pcs;
 }
 EXPORT_SYMBOL_GPL(pcs_register);
 
+static void pcs_free(struct phylink_pcs *pcs)
+{
+	int refs;
+
+	refs = --pcs->refs;
+	mutex_unlock(&pcs->lock);
+	if (refs)
+		return;
+
+	WARN_ON(!pcs->dead);
+	mutex_lock(&pcs_mutex);
+	list_del(&pcs->list);
+	mutex_unlock(&pcs_mutex);
+	kfree(pcs);
+}
+
 /**
  * pcs_unregister() - unregister a PCS
  * @pcs: a PCS previously registered with pcs_register()
  */
 void pcs_unregister(struct phylink_pcs *pcs)
 {
-	mutex_lock(&pcs_mutex);
-	list_del(&pcs->list);
-	mutex_unlock(&pcs_mutex);
+	mutex_lock(&pcs->lock);
+	pcs->dead = true;
+	pcs_free(pcs);
 }
 EXPORT_SYMBOL_GPL(pcs_unregister);
 
@@ -65,26 +106,25 @@ static void devm_pcs_release(struct device *dev, void *res)
  *
  * Return: 0 on success, or -errno on failure
  */
-int devm_pcs_register(struct device *dev, struct phylink_pcs *pcs)
+struct phylink_pcs *devm_pcs_register(struct device *dev,
+				      struct pcs_init *init)
 {
 	struct phylink_pcs **pcsp;
-	int ret;
+	struct phylink_pcs *pcs;
 
 	pcsp = devres_alloc(devm_pcs_release, sizeof(*pcsp),
 			    GFP_KERNEL);
 	if (!pcsp)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
-	ret = pcs_register(pcs);
-	if (ret) {
+	pcs = pcs_register(dev, init);
+	if (IS_ERR(pcs)) {
 		devres_free(pcsp);
-		return ret;
+	} else {
+		*pcsp = pcs;
+		devres_add(dev, pcsp);
 	}
-
-	*pcsp = pcs;
-	devres_add(dev, pcsp);
-
-	return ret;
+	return pcs;
 }
 EXPORT_SYMBOL_GPL(devm_pcs_register);
 
@@ -106,16 +146,20 @@ static struct phylink_pcs *pcs_find(const struct fwnode_handle *fwnode,
 
 	mutex_lock(&pcs_mutex);
 	list_for_each_entry(pcs, &pcs_devices, list) {
-		if (dev && pcs->dev == dev)
-			goto out;
-		if (fwnode && pcs->dev->fwnode == fwnode)
-			goto out;
+		mutex_lock(&pcs->lock);
+		if (!pcs->dead) {
+			if (dev && pcs->dev == dev)
+				goto out;
+			if (fwnode && pcs->dev->fwnode == fwnode)
+				goto out;
+		}
+		mutex_unlock(&pcs->lock);
 	}
 	pcs = NULL;
 
 out:
 	mutex_unlock(&pcs_mutex);
-	pr_devel("%s: looking for %pfwf or %s %s...%s found\n", __func__,
+	pr_debug("%s: looking for %pfwf or %s %s...%s found\n", __func__,
 		 fwnode, dev ? dev_driver_string(dev) : "(null)",
 		 dev ? dev_name(dev) : "(null)", pcs ? " not" : "");
 	return pcs;
@@ -132,10 +176,15 @@ static struct phylink_pcs *pcs_find(const struct fwnode_handle *fwnode,
  */
 static struct phylink_pcs *pcs_get_tail(struct phylink_pcs *pcs)
 {
+	bool got_module;
+
 	if (!pcs)
 		return ERR_PTR(-EPROBE_DEFER);
 
-	if (!try_module_get(pcs->ops->owner))
+	got_module = try_module_get(pcs->ops->owner);
+	pcs->refs += got_module;
+	mutex_unlock(&pcs->lock);
+	if (!got_module)
 		return ERR_PTR(-ENODEV);
 	get_device(pcs->dev);
 
@@ -222,5 +271,13 @@ void pcs_put(struct phylink_pcs *pcs)
 
 	put_device(pcs->dev);
 	module_put(pcs->ops->owner);
+	mutex_lock(&pcs->lock);
+	pcs_free(pcs);
 }
 EXPORT_SYMBOL_GPL(pcs_put);
+
+void *pcs_get_priv(struct phylink_pcs *pcs)
+{
+	return pcs->priv;
+}
+EXPORT_SYMBOL_GPL(pcs_get_priv);
diff --git a/drivers/net/pcs/pcs-lynx.c b/drivers/net/pcs/pcs-lynx.c
index c3e2c4a6fab6..f792f2a7cdf2 100644
--- a/drivers/net/pcs/pcs-lynx.c
+++ b/drivers/net/pcs/pcs-lynx.c
@@ -26,7 +26,6 @@
 #define IF_MODE_HALF_DUPLEX		BIT(4)
 
 struct lynx_pcs {
-	struct phylink_pcs pcs;
 	struct mdio_device *mdio;
 };
 
@@ -37,8 +36,7 @@ enum sgmii_speed {
 	SGMII_SPEED_2500	= 2,
 };
 
-#define phylink_pcs_to_lynx(pl_pcs) container_of((pl_pcs), struct lynx_pcs, pcs)
-#define lynx_to_phylink_pcs(lynx) (&(lynx)->pcs)
+#define phylink_pcs_to_lynx(pl_pcs) pcs_get_priv(pl_pcs)
 
 static void lynx_pcs_get_state_usxgmii(struct mdio_device *pcs,
 				       struct phylink_link_state *state)
@@ -318,21 +316,23 @@ static const struct phylink_pcs_ops lynx_pcs_phylink_ops = {
 static int lynx_pcs_probe(struct mdio_device *mdio)
 {
 	struct device *dev = &mdio->dev;
+	struct phylink_pcs *pcs;
 	struct lynx_pcs *lynx;
-	int ret;
+	struct pcs_init init;
 
 	lynx = devm_kzalloc(dev, sizeof(*lynx), GFP_KERNEL);
 	if (!lynx)
 		return -ENOMEM;
 
 	lynx->mdio = mdio;
-	lynx->pcs.dev = dev;
-	lynx->pcs.ops = &lynx_pcs_phylink_ops;
-	lynx->pcs.poll = true;
+	init.priv = lynx;
+	init.ops = &lynx_pcs_phylink_ops;
+	init.poll = true;
 
-	ret = devm_pcs_register(dev, &lynx->pcs);
-	if (ret)
-		return dev_err_probe(dev, ret, "could not register PCS\n");
+	pcs = devm_pcs_register(dev, &init);
+	if (IS_ERR(pcs))
+		return dev_err_probe(dev, PTR_ERR(pcs),
+				     "could not register PCS\n");
 	dev_info(dev, "probed\n");
 	return 0;
 }
diff --git a/include/linux/pcs.h b/include/linux/pcs.h
index 00e76594e03c..2605603149ec 100644
--- a/include/linux/pcs.h
+++ b/include/linux/pcs.h
@@ -6,12 +6,27 @@
 #ifndef _PCS_H
 #define _PCS_H
 
-struct phylink_pcs;
 struct fwnode;
+struct phylink_pcs;
+struct phylink_pcs_ops;
 
-int pcs_register(struct phylink_pcs *pcs);
+/**
+ * struct pcs_init - PCS initialization data
+ * @priv: the device's private data
+ * @ops: a pointer to the &struct phylink_pcs_ops structure
+ * @poll: poll the PCS for link changes
+ */
+struct pcs_init {
+	void *priv;
+	const struct phylink_pcs_ops *ops;
+	bool poll;
+};
+
+struct phylink_pcs *pcs_register(struct device *dev,
+				 struct pcs_init *init);
 void pcs_unregister(struct phylink_pcs *pcs);
-int devm_pcs_register(struct device *dev, struct phylink_pcs *pcs);
+struct phylink_pcs *devm_pcs_register(struct device *dev,
+				      struct pcs_init *init);
 struct phylink_pcs *_pcs_get_by_fwnode(const struct fwnode_handle *fwnode,
 				       const char *id, bool optional);
 struct phylink_pcs *pcs_get_by_provider(const struct device *dev);
@@ -30,4 +45,6 @@ static inline struct phylink_pcs
 	return _pcs_get_by_fwnode(fwnode, id, true);
 }
 
+void *pcs_get_priv(struct phylink_pcs *pcs);
+
 #endif /* PCS_H */
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index a713e70108a1..864536d1b293 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -392,24 +392,7 @@ void mac_link_up(struct phylink_config *config, struct phy_device *phy,
 		 int speed, int duplex, bool tx_pause, bool rx_pause);
 #endif
 
-struct phylink_pcs_ops;
-
-/**
- * struct phylink_pcs - PHYLINK PCS instance
- * @dev: the device associated with this PCS
- * @ops: a pointer to the &struct phylink_pcs_ops structure
- * @list: internal list of PCS devices
- * @poll: poll the PCS for link changes
- *
- * This structure is designed to be embedded within the PCS private data,
- * and will be passed between phylink and the PCS.
- */
-struct phylink_pcs {
-	struct device *dev;
-	const struct phylink_pcs_ops *ops;
-	struct list_head list;
-	bool poll;
-};
+struct phylink_pcs;
 
 /**
  * struct phylink_pcs_ops - MAC PCS operations structure.
Vladimir Oltean July 20, 2022, 1:53 p.m. UTC | #7
On Tue, Jul 19, 2022 at 03:34:45PM -0400, Sean Anderson wrote:
> We could do it, but it'd be a pretty big hack. Something like the
> following. Phylink would need to be modified to grab the lock before
> every op and check if the PCS is dead or not. This is of course still
> not optimal, since there's no way to re-attach a PCS once it goes away.

You assume it's just phylink who operates on a PCS structure, but if you
include your search pool to also cover include/linux/pcs/pcs-xpcs.h,
you'll see a bunch of exported functions which are called directly by
the client drivers (stmmac, sja1105). At this stage it gets pretty hard
to validate that drivers won't attempt from any code path to do
something stupid with a dead PCS. All in all it creates an environment
with insanely weak guarantees; that's pretty hard to get behind IMO.

> IMO a better solution is to use devlink and submit a patch to add
> notifications which the MAC driver can register for. That way it can
> find out when the PCS goes away and potentially do something about it
> (or just let itself get removed).

Not sure I understand what connection there is between devlink (device
links) and PCS {de}registration notifications. We could probably add those
notifications without any intervention from the device core: we would
just need to make this new PCS "core" to register an blocking_notifier_call_chain
to which interested drivers could add their notifier blocks. How a
certain phylink user is going to determine that "hey, this PCS is
definitely mine and I can use it" is an open question. In any case, my
expectation is that we have a notifier chain, we can at least continue
operating (avoid unbinding the struct device), but essentially move our
phylink_create/phylink_destroy calls to within those notifier blocks.

Again, retrofitting this model to existing drivers, phylink API (and
maybe even its internal structure) is something that's hard to hop on
board of; I think it's a solution waiting for a problem, and I don't
have an interest to develop or even review it.
Sean Anderson July 21, 2022, 9:42 p.m. UTC | #8
On 7/20/22 9:53 AM, Vladimir Oltean wrote:
> On Tue, Jul 19, 2022 at 03:34:45PM -0400, Sean Anderson wrote:
>> We could do it, but it'd be a pretty big hack. Something like the
>> following. Phylink would need to be modified to grab the lock before
>> every op and check if the PCS is dead or not. This is of course still
>> not optimal, since there's no way to re-attach a PCS once it goes away.
> 
> You assume it's just phylink who operates on a PCS structure, but if you
> include your search pool to also cover include/linux/pcs/pcs-xpcs.h,
> you'll see a bunch of exported functions which are called directly by
> the client drivers (stmmac, sja1105). At this stage it gets pretty hard
> to validate that drivers won't attempt from any code path to do
> something stupid with a dead PCS. All in all it creates an environment
> with insanely weak guarantees; that's pretty hard to get behind IMO.

Right. To do this properly, we'd need wrapper functions for all the PCS
operations. And the super-weak guarantees is why I referred to this as a
"hack". But we could certainly make it so that removing a PCS might not
bring down the MAC.

>> IMO a better solution is to use devlink and submit a patch to add
>> notifications which the MAC driver can register for. That way it can
>> find out when the PCS goes away and potentially do something about it
>> (or just let itself get removed).
> 
> Not sure I understand what connection there is between devlink (device
> links) and PCS {de}registration notifications. 

The default action when a supplier is going to be removed is to remove
the consumers. However, it'd be nice to notify the consumer beforehand.
If we used device links, this would need to be integrated (since otherwise
we'd only find out that a PCS was gone after the MAC was gone too).

> We could probably add those
> notifications without any intervention from the device core: we would
> just need to make this new PCS "core" to register an blocking_notifier_call_chain
> to which interested drivers could add their notifier blocks. How a> certain phylink user is going to determine that "hey, this PCS is
> definitely mine and I can use it" is an open question. In any case, my
> expectation is that we have a notifier chain, we can at least continue
> operating (avoid unbinding the struct device), but essentially move our
> phylink_create/phylink_destroy calls to within those notifier blocks.
> Again, retrofitting this model to existing drivers, phylink API (and
> maybe even its internal structure) is something that's hard to hop on
> board of; I think it's a solution waiting for a problem, and I don't
> have an interest to develop or even review it.

I don't either. I'd much rather just bring down the whole MAC when any
PCS gets removed. Whatever we decide on doing here should also be done
for (serdes) phys as well, since they have all the same pitfalls. For
that reason I'd rather use a generic, non-intrusive solution like device
links. I know Russell mentioned composite devices, but I think those
would have similar advantages/drawbacks as a device-link-based solution
(unbinding of one device unbinds the rest).

--Sean
Sean Anderson July 29, 2022, 10:15 p.m. UTC | #9
On 7/21/22 5:41 PM, Sean Anderson wrote:
> On 7/20/22 9:53 AM, Vladimir Oltean wrote:
>> On Tue, Jul 19, 2022 at 03:34:45PM -0400, Sean Anderson wrote:
>>> We could do it, but it'd be a pretty big hack. Something like the
>>> following. Phylink would need to be modified to grab the lock before
>>> every op and check if the PCS is dead or not. This is of course still
>>> not optimal, since there's no way to re-attach a PCS once it goes away.
>> 
>> You assume it's just phylink who operates on a PCS structure, but if you
>> include your search pool to also cover include/linux/pcs/pcs-xpcs.h,
>> you'll see a bunch of exported functions which are called directly by
>> the client drivers (stmmac, sja1105). At this stage it gets pretty hard
>> to validate that drivers won't attempt from any code path to do
>> something stupid with a dead PCS. All in all it creates an environment
>> with insanely weak guarantees; that's pretty hard to get behind IMO.
> 
> Right. To do this properly, we'd need wrapper functions for all the PCS
> operations. And the super-weak guarantees is why I referred to this as a
> "hack". But we could certainly make it so that removing a PCS might not
> bring down the MAC.
> 
>>> IMO a better solution is to use devlink and submit a patch to add
>>> notifications which the MAC driver can register for. That way it can
>>> find out when the PCS goes away and potentially do something about it
>>> (or just let itself get removed).
>> 
>> Not sure I understand what connection there is between devlink (device
>> links) and PCS {de}registration notifications. 
> 
> The default action when a supplier is going to be removed is to remove
> the consumers. However, it'd be nice to notify the consumer beforehand.
> If we used device links, this would need to be integrated (since otherwise
> we'd only find out that a PCS was gone after the MAC was gone too).
> 
>> We could probably add those
>> notifications without any intervention from the device core: we would
>> just need to make this new PCS "core" to register an blocking_notifier_call_chain
>> to which interested drivers could add their notifier blocks. How a> certain phylink user is going to determine that "hey, this PCS is
>> definitely mine and I can use it" is an open question. In any case, my
>> expectation is that we have a notifier chain, we can at least continue
>> operating (avoid unbinding the struct device), but essentially move our
>> phylink_create/phylink_destroy calls to within those notifier blocks.
>> Again, retrofitting this model to existing drivers, phylink API (and
>> maybe even its internal structure) is something that's hard to hop on
>> board of; I think it's a solution waiting for a problem, and I don't
>> have an interest to develop or even review it.
> 
> I don't either. I'd much rather just bring down the whole MAC when any
> PCS gets removed. Whatever we decide on doing here should also be done
> for (serdes) phys as well, since they have all the same pitfalls. For
> that reason I'd rather use a generic, non-intrusive solution like device
> links. I know Russell mentioned composite devices, but I think those
> would have similar advantages/drawbacks as a device-link-based solution
> (unbinding of one device unbinds the rest).

OK, I've thought about this a bit more. Right now, you can crash the
kernel by unbinding a phy (either serdes or networking) and waiting for
a state change. The serdes problem could probably be solved by
strengthening the existing device_link_add calls. This will of course
unbind the netdev if you unbind the serdes. I think this is not a common
use case; if a user does this they probably know what they're doing (or
not).

The problem with ethernet phys is a bit worse. It's very common to have
something like

	+ netdev
	|
	+-+ mdiobus
	  |
	  +-- phy

which poses a problem for device links. The phy is a child of the
netdev, so it should be unbound first. device_link_add will see this and
refuse to create a link, since such a link implies that netdev should be
unbound before phy.

One solution might be something like:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index a74b320f5b27..05894e1c3e59 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -27,6 +27,7 @@
 #include <linux/phy.h>
 #include <linux/phy_led_triggers.h>
 #include <linux/property.h>
+#include <linux/rtnetlink.h>
 #include <linux/sfp.h>
 #include <linux/skbuff.h>
 #include <linux/slab.h>
@@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev)
 {
        struct phy_device *phydev = to_phy_device(dev);
 
+	// I'm pretty sure this races with multiple unbinds...
+       rtnl_lock();
+       device_unlock(dev);
+       dev_close(phydev->attached_dev);
+       device_lock(dev);
+       rtnl_unlock();
+       WARN_ON(phydev->attached_dev);
+
        cancel_delayed_work_sync(&phydev->state_queue);
 
        mutex_lock(&phydev->lock);

which is probably the least intrusive we can get. But this isn't very
nice for PCSs.

First, PCSs are not always used by netdevs. So there's no generic way to
ask the user "please clean up this PCS." Additionally, most PCS users
expect the PCS to be around for the lifetime of the driver (or at least
until they're done using it).

Maybe the best solution is to provide some helper functions to use with
bus_register_notifier and just let the drivers fend for themselves. Or
possibly default to a devlink, but allow registering a notifier instead.

--Sean