diff mbox series

[net-next,08/12] net: pse-pd: pd692x0: Add support for PSE PI priority feature

Message ID 20241002-feature_poe_port_prio-v1-8-787054f74ed5@bootlin.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Add support for PSE port priority | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 27 insertions(+);
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 9 this patch: 9
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 9 this patch: 9
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning CHECK: From:/Signed-off-by: email comments mismatch: 'From: Kory Maincent (Dent Project) <kory.maincent@bootlin.com>' != 'Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-10-03--18-00 (tests: 772)

Commit Message

Kory Maincent Oct. 2, 2024, 4:28 p.m. UTC
From: Kory Maincent (Dent Project) <kory.maincent@bootlin.com>

This patch extends the PSE callbacks by adding support for the newly
introduced pi_set_prio() callback, enabling the configuration of PSE PI
priorities. The current port priority is now also included in the status
information returned to users.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
---
 drivers/net/pse-pd/pd692x0.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Andrew Lunn Oct. 2, 2024, 11:41 p.m. UTC | #1
> +	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
> +	msg.sub[2] = id;
> +	/* Controller priority from 1 to 3 */
> +	msg.data[4] = prio + 1;

Does 0 have a meaning? It just seems an odd design if it does not.

	Andrew
Kory Maincent Oct. 3, 2024, 8:01 a.m. UTC | #2
On Thu, 3 Oct 2024 01:41:02 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > +	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
> > +	msg.sub[2] = id;
> > +	/* Controller priority from 1 to 3 */
> > +	msg.data[4] = prio + 1;  
> 
> Does 0 have a meaning? It just seems an odd design if it does not.

PD692x0 has an odd firmware design from the beginning. ;)
Yes, the priority available are from 1 to 3. Setting it to 0 does nothing.

Regards,
Oleksij Rempel Oct. 8, 2024, 1:57 p.m. UTC | #3
On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote:
> > +	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
> > +	msg.sub[2] = id;
> > +	/* Controller priority from 1 to 3 */
> > +	msg.data[4] = prio + 1;
> 
> Does 0 have a meaning? It just seems an odd design if it does not.

0 is not documented. But there are sub-priority which are not directly
configured by user, but affect the system behavior.

Priority#: Critical – 1; high – 2; low – 3
 For ports with the same priority, the PoE Controller sets the
 sub-priority according to the logic port number. (Lower number gets
 higher priority).

Port priority affects:
1. Power-up order: After a reset, the ports are powered up according to
 their priority, highest to lowest, highest priority will power up first.
2. Shutdown order: When exceeding the power budget, lowest priority
 ports will turn off first.

Should we return sub priorities on the prio get request?

If i see it correctly, even if user do not actively configures priorities,
they are always present. For example port 0 will have always a Prio
higher than Port 10.

Regards,
Oleksij
Kory Maincent Oct. 8, 2024, 2:21 p.m. UTC | #4
On Tue, 8 Oct 2024 15:57:22 +0200
Oleksij Rempel <o.rempel@pengutronix.de> wrote:

> On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote:
> > > +	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
> > > +	msg.sub[2] = id;
> > > +	/* Controller priority from 1 to 3 */
> > > +	msg.data[4] = prio + 1;  
> > 
> > Does 0 have a meaning? It just seems an odd design if it does not.  
> 
> 0 is not documented. But there are sub-priority which are not directly
> configured by user, but affect the system behavior.
> 
> Priority#: Critical – 1; high – 2; low – 3
>  For ports with the same priority, the PoE Controller sets the
>  sub-priority according to the logic port number. (Lower number gets
>  higher priority).
> 
> Port priority affects:
> 1. Power-up order: After a reset, the ports are powered up according to
>  their priority, highest to lowest, highest priority will power up first.
> 2. Shutdown order: When exceeding the power budget, lowest priority
>  ports will turn off first.
> 
> Should we return sub priorities on the prio get request?
> 
> If i see it correctly, even if user do not actively configures priorities,
> they are always present. For example port 0 will have always a Prio
> higher than Port 10.

We could add a subprio ehtool attribute, but it won't be configurable.
In fact it could be configurable by changing the port matrix order but it is not
a good idea. Applying a new port matrix turn off all the ports.

I am not sure if it is specific to Microchip controller or if it is generic
enough to add the attribute.
I would say not to return it for now.

Regards,
Oleksij Rempel Oct. 8, 2024, 2:53 p.m. UTC | #5
On Tue, Oct 08, 2024 at 04:21:20PM +0200, Kory Maincent wrote:
> On Tue, 8 Oct 2024 15:57:22 +0200
> Oleksij Rempel <o.rempel@pengutronix.de> wrote:
> 
> > On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote:
> > > > +	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
> > > > +	msg.sub[2] = id;
> > > > +	/* Controller priority from 1 to 3 */
> > > > +	msg.data[4] = prio + 1;  
> > > 
> > > Does 0 have a meaning? It just seems an odd design if it does not.  
> > 
> > 0 is not documented. But there are sub-priority which are not directly
> > configured by user, but affect the system behavior.
> > 
> > Priority#: Critical – 1; high – 2; low – 3
> >  For ports with the same priority, the PoE Controller sets the
> >  sub-priority according to the logic port number. (Lower number gets
> >  higher priority).
> > 
> > Port priority affects:
> > 1. Power-up order: After a reset, the ports are powered up according to
> >  their priority, highest to lowest, highest priority will power up first.
> > 2. Shutdown order: When exceeding the power budget, lowest priority
> >  ports will turn off first.
> > 
> > Should we return sub priorities on the prio get request?
> > 
> > If i see it correctly, even if user do not actively configures priorities,
> > they are always present. For example port 0 will have always a Prio
> > higher than Port 10.
> 
> We could add a subprio ehtool attribute, but it won't be configurable.
> In fact it could be configurable by changing the port matrix order but it is not
> a good idea. Applying a new port matrix turn off all the ports.
> 
> I am not sure if it is specific to Microchip controller or if it is generic
> enough to add the attribute.
> I would say not to return it for now.

The generic attribute do not reflect the behavior of two different
controllers. Currently implemented prio attribute is in this case TI
specific and do not work for Microchip case.

Please note, I do not care about configurability in this case, I only
care about information the user get.
Andrew Lunn Oct. 8, 2024, 4:50 p.m. UTC | #6
On Tue, Oct 08, 2024 at 03:57:22PM +0200, Oleksij Rempel wrote:
> On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote:
> > > +	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
> > > +	msg.sub[2] = id;
> > > +	/* Controller priority from 1 to 3 */
> > > +	msg.data[4] = prio + 1;
> > 
> > Does 0 have a meaning? It just seems an odd design if it does not.
> 
> 0 is not documented. But there are sub-priority which are not directly
> configured by user, but affect the system behavior.
> 
> Priority#: Critical – 1; high – 2; low – 3
>  For ports with the same priority, the PoE Controller sets the
>  sub-priority according to the logic port number. (Lower number gets
>  higher priority).

With less priorities than ports, there is always going to be something
like this.

> 
> Port priority affects:
> 1. Power-up order: After a reset, the ports are powered up according to
>  their priority, highest to lowest, highest priority will power up first.
> 2. Shutdown order: When exceeding the power budget, lowest priority
>  ports will turn off first.
> 
> Should we return sub priorities on the prio get request?

I should be optional, since we might not actually know what a
particular device is doing. It could pick at random, it could pick a
port which is consuming just enough to cover the shortfall if it was
turned off, it could pick the highest consumer of the lowest priority
etc. Some of these conditions are not going to be easy to describe
even if we do know it.

	Andrew
Oleksij Rempel Oct. 9, 2024, 7:16 a.m. UTC | #7
Hi Andrew,

On Tue, Oct 08, 2024 at 06:50:25PM +0200, Andrew Lunn wrote:
> On Tue, Oct 08, 2024 at 03:57:22PM +0200, Oleksij Rempel wrote:
> > On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote:
> > > > +	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
> > > > +	msg.sub[2] = id;
> > > > +	/* Controller priority from 1 to 3 */
> > > > +	msg.data[4] = prio + 1;
> > > 
> > > Does 0 have a meaning? It just seems an odd design if it does not.
> > 
> > 0 is not documented. But there are sub-priority which are not directly
> > configured by user, but affect the system behavior.
> > 
> > Priority#: Critical – 1; high – 2; low – 3
> >  For ports with the same priority, the PoE Controller sets the
> >  sub-priority according to the logic port number. (Lower number gets
> >  higher priority).
> 
> With less priorities than ports, there is always going to be something
> like this.
> 
> > 
> > Port priority affects:
> > 1. Power-up order: After a reset, the ports are powered up according to
> >  their priority, highest to lowest, highest priority will power up first.
> > 2. Shutdown order: When exceeding the power budget, lowest priority
> >  ports will turn off first.
> > 
> > Should we return sub priorities on the prio get request?
> 
> I should be optional, since we might not actually know what a
> particular device is doing. It could pick at random, it could pick a
> port which is consuming just enough to cover the shortfall if it was
> turned off, it could pick the highest consumer of the lowest priority
> etc. Some of these conditions are not going to be easy to describe
> even if we do know it.

After reviewing the manuals for LTC4266 and TPS2388x, I realized that these
controllers expose interfaces, but they don't implement prioritization concepts
themselves.

The LTC4266 and TPS2388x controllers provide only interfaces that allow the
kernel to manage shutdown and prioritization policies. For TPS2388x, fast
shutdown is implemented as a port bitmask with only two priorities, handled via
the OSS pin. Fast shutdown is triggered by the kernel on request by toggling
the corresponding pin, and the policy - when and why this pin is toggled - is
defined by the kernel or user space. Slow shutdown, on the other hand, is
managed via the I2C bus and allows for more refined control, enabling a wider
range of priorities and more granular policies.

I'll tend to hope we can reuse the proposed ETHTOOL_A_C33_PSE_PRIO interface
across different PSE controllers. However, it is already being mapped to
different shutdown concepts: PD692x0 firmware seems to rely on a slow shutdown
backed by internal policies, while TPS2388x maps it to fast shutdown with
driver specific policy. This inconsistency could force us to either break the
UAPI or introduce a new, inconsistent one once we realize TPS2388x fast
shutdown isn't what we actually need.

Regards,
Oleksij
Andrew Lunn Oct. 9, 2024, 4:09 p.m. UTC | #8
On Wed, Oct 09, 2024 at 09:16:33AM +0200, Oleksij Rempel wrote:
> Hi Andrew,
> 
> On Tue, Oct 08, 2024 at 06:50:25PM +0200, Andrew Lunn wrote:
> > On Tue, Oct 08, 2024 at 03:57:22PM +0200, Oleksij Rempel wrote:
> > > On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote:
> > > > > +	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
> > > > > +	msg.sub[2] = id;
> > > > > +	/* Controller priority from 1 to 3 */
> > > > > +	msg.data[4] = prio + 1;
> > > > 
> > > > Does 0 have a meaning? It just seems an odd design if it does not.
> > > 
> > > 0 is not documented. But there are sub-priority which are not directly
> > > configured by user, but affect the system behavior.
> > > 
> > > Priority#: Critical – 1; high – 2; low – 3
> > >  For ports with the same priority, the PoE Controller sets the
> > >  sub-priority according to the logic port number. (Lower number gets
> > >  higher priority).
> > 
> > With less priorities than ports, there is always going to be something
> > like this.
> > 
> > > 
> > > Port priority affects:
> > > 1. Power-up order: After a reset, the ports are powered up according to
> > >  their priority, highest to lowest, highest priority will power up first.
> > > 2. Shutdown order: When exceeding the power budget, lowest priority
> > >  ports will turn off first.
> > > 
> > > Should we return sub priorities on the prio get request?
> > 
> > I should be optional, since we might not actually know what a
> > particular device is doing. It could pick at random, it could pick a
> > port which is consuming just enough to cover the shortfall if it was
> > turned off, it could pick the highest consumer of the lowest priority
> > etc. Some of these conditions are not going to be easy to describe
> > even if we do know it.
> 
> After reviewing the manuals for LTC4266 and TPS2388x, I realized that these
> controllers expose interfaces, but they don't implement prioritization concepts
> themselves.
> 
> The LTC4266 and TPS2388x controllers provide only interfaces that allow the
> kernel to manage shutdown and prioritization policies. For TPS2388x, fast
> shutdown is implemented as a port bitmask with only two priorities, handled via
> the OSS pin. Fast shutdown is triggered by the kernel on request by toggling
> the corresponding pin, and the policy - when and why this pin is toggled - is
> defined by the kernel or user space. Slow shutdown, on the other hand, is
> managed via the I2C bus and allows for more refined control, enabling a wider
> range of priorities and more granular policies.
> 
> I'll tend to hope we can reuse the proposed ETHTOOL_A_C33_PSE_PRIO interface
> across different PSE controllers. However, it is already being mapped to
> different shutdown concepts: PD692x0 firmware seems to rely on a slow shutdown
> backed by internal policies, while TPS2388x maps it to fast shutdown with
> driver specific policy. This inconsistency could force us to either break the
> UAPI or introduce a new, inconsistent one once we realize TPS2388x fast
> shutdown isn't what we actually need.

We should try to avoid fragmentation of the API, but given there is no
standardisation here, vendors are free to do whatever they want, this
may be difficult. Still, we should try to avoid it.

Maybe we want to consider a generic simple prioritization in the
kernel, plus export a generic model of PSE to user space to allow a
user space manager? This is really policy, and ideally we don't want
policy in the kernel. The in kernel implementation can use the
hardware prioritisation if it supports it, otherwise provide a library
of code which a driver can use to implement simple software
prioritisation?

For a user space manager, we already talked about an event to signal
that we are entering overload. I guess we additionally need an API to
get the current runtime state, what each consumer is actually
consuming, and a management API to shutdown or enable a port due to
overload, which is separate to the administrative state of a port.

The hardware/firmware might provide lots of options and capabilities,
but sometimes it is better it ignore them. Export a basic set which we
expect most PSE controllers can offer, and do the rest in our software
which we have full insight into, and debug and share, unlike firmware
which is a broken black box.

If we decide this is the way we want to go, we should not need to
extend the API for LTC4266 and TPS2388x, you just implement a basic
prioritisation in the driver. If it turns out to be not ideal, it does
not matter so much, so long as we have the understanding that
eventually we can have something better in userspace.

We can maybe do a rough sketch of what the kAPI looks like for a user
space manager, but we can kick the implementation down the road while
the in kernel prioritization is good enough.

	Andrew
diff mbox series

Patch

diff --git a/drivers/net/pse-pd/pd692x0.c b/drivers/net/pse-pd/pd692x0.c
index 0af7db80b2f8..3a4a9836d621 100644
--- a/drivers/net/pse-pd/pd692x0.c
+++ b/drivers/net/pse-pd/pd692x0.c
@@ -685,6 +685,8 @@  static int pd692x0_ethtool_get_status(struct pse_controller_dev *pcdev,
 	if (ret < 0)
 		return ret;
 	status->c33_avail_pw_limit = ret;
+	/* PSE core priority start at 0 */
+	status->c33_prio = buf.data[2] - 1;
 
 	memset(&buf, 0, sizeof(buf));
 	msg = pd692x0_msg_template_list[PD692X0_MSG_GET_PORT_CLASS];
@@ -1061,6 +1063,25 @@  static int pd692x0_pi_set_current_limit(struct pse_controller_dev *pcdev,
 	return pd692x0_sendrecv_msg(priv, &msg, &buf);
 }
 
+static int pd692x0_pi_set_prio(struct pse_controller_dev *pcdev, int id,
+			       unsigned int prio)
+{
+	struct pd692x0_priv *priv = to_pd692x0_priv(pcdev);
+	struct pd692x0_msg msg, buf = {0};
+	int ret;
+
+	ret = pd692x0_fw_unavailable(priv);
+	if (ret)
+		return ret;
+
+	msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
+	msg.sub[2] = id;
+	/* Controller priority from 1 to 3 */
+	msg.data[4] = prio + 1;
+
+	return pd692x0_sendrecv_msg(priv, &msg, &buf);
+}
+
 static const struct pse_controller_ops pd692x0_ops = {
 	.setup_pi_matrix = pd692x0_setup_pi_matrix,
 	.ethtool_get_status = pd692x0_ethtool_get_status,
@@ -1070,6 +1091,7 @@  static const struct pse_controller_ops pd692x0_ops = {
 	.pi_get_voltage = pd692x0_pi_get_voltage,
 	.pi_get_current_limit = pd692x0_pi_get_current_limit,
 	.pi_set_current_limit = pd692x0_pi_set_current_limit,
+	.pi_set_prio = pd692x0_pi_set_prio,
 };
 
 #define PD692X0_FW_LINE_MAX_SZ 0xff
@@ -1486,6 +1508,7 @@  static int pd692x0_i2c_probe(struct i2c_client *client)
 	priv->pcdev.ops = &pd692x0_ops;
 	priv->pcdev.dev = dev;
 	priv->pcdev.types = ETHTOOL_PSE_C33;
+	priv->pcdev.pis_prio_max = 2;
 	ret = devm_pse_controller_register(dev, &priv->pcdev);
 	if (ret)
 		return dev_err_probe(dev, ret,