Message ID | 20241002-feature_poe_port_prio-v1-8-787054f74ed5@bootlin.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Add support for PSE port priority | expand |
> + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; > + msg.sub[2] = id; > + /* Controller priority from 1 to 3 */ > + msg.data[4] = prio + 1; Does 0 have a meaning? It just seems an odd design if it does not. Andrew
On Thu, 3 Oct 2024 01:41:02 +0200 Andrew Lunn <andrew@lunn.ch> wrote: > > + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; > > + msg.sub[2] = id; > > + /* Controller priority from 1 to 3 */ > > + msg.data[4] = prio + 1; > > Does 0 have a meaning? It just seems an odd design if it does not. PD692x0 has an odd firmware design from the beginning. ;) Yes, the priority available are from 1 to 3. Setting it to 0 does nothing. Regards,
On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote: > > + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; > > + msg.sub[2] = id; > > + /* Controller priority from 1 to 3 */ > > + msg.data[4] = prio + 1; > > Does 0 have a meaning? It just seems an odd design if it does not. 0 is not documented. But there are sub-priority which are not directly configured by user, but affect the system behavior. Priority#: Critical – 1; high – 2; low – 3 For ports with the same priority, the PoE Controller sets the sub-priority according to the logic port number. (Lower number gets higher priority). Port priority affects: 1. Power-up order: After a reset, the ports are powered up according to their priority, highest to lowest, highest priority will power up first. 2. Shutdown order: When exceeding the power budget, lowest priority ports will turn off first. Should we return sub priorities on the prio get request? If i see it correctly, even if user do not actively configures priorities, they are always present. For example port 0 will have always a Prio higher than Port 10. Regards, Oleksij
On Tue, 8 Oct 2024 15:57:22 +0200 Oleksij Rempel <o.rempel@pengutronix.de> wrote: > On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote: > > > + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; > > > + msg.sub[2] = id; > > > + /* Controller priority from 1 to 3 */ > > > + msg.data[4] = prio + 1; > > > > Does 0 have a meaning? It just seems an odd design if it does not. > > 0 is not documented. But there are sub-priority which are not directly > configured by user, but affect the system behavior. > > Priority#: Critical – 1; high – 2; low – 3 > For ports with the same priority, the PoE Controller sets the > sub-priority according to the logic port number. (Lower number gets > higher priority). > > Port priority affects: > 1. Power-up order: After a reset, the ports are powered up according to > their priority, highest to lowest, highest priority will power up first. > 2. Shutdown order: When exceeding the power budget, lowest priority > ports will turn off first. > > Should we return sub priorities on the prio get request? > > If i see it correctly, even if user do not actively configures priorities, > they are always present. For example port 0 will have always a Prio > higher than Port 10. We could add a subprio ehtool attribute, but it won't be configurable. In fact it could be configurable by changing the port matrix order but it is not a good idea. Applying a new port matrix turn off all the ports. I am not sure if it is specific to Microchip controller or if it is generic enough to add the attribute. I would say not to return it for now. Regards,
On Tue, Oct 08, 2024 at 04:21:20PM +0200, Kory Maincent wrote: > On Tue, 8 Oct 2024 15:57:22 +0200 > Oleksij Rempel <o.rempel@pengutronix.de> wrote: > > > On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote: > > > > + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; > > > > + msg.sub[2] = id; > > > > + /* Controller priority from 1 to 3 */ > > > > + msg.data[4] = prio + 1; > > > > > > Does 0 have a meaning? It just seems an odd design if it does not. > > > > 0 is not documented. But there are sub-priority which are not directly > > configured by user, but affect the system behavior. > > > > Priority#: Critical – 1; high – 2; low – 3 > > For ports with the same priority, the PoE Controller sets the > > sub-priority according to the logic port number. (Lower number gets > > higher priority). > > > > Port priority affects: > > 1. Power-up order: After a reset, the ports are powered up according to > > their priority, highest to lowest, highest priority will power up first. > > 2. Shutdown order: When exceeding the power budget, lowest priority > > ports will turn off first. > > > > Should we return sub priorities on the prio get request? > > > > If i see it correctly, even if user do not actively configures priorities, > > they are always present. For example port 0 will have always a Prio > > higher than Port 10. > > We could add a subprio ehtool attribute, but it won't be configurable. > In fact it could be configurable by changing the port matrix order but it is not > a good idea. Applying a new port matrix turn off all the ports. > > I am not sure if it is specific to Microchip controller or if it is generic > enough to add the attribute. > I would say not to return it for now. The generic attribute do not reflect the behavior of two different controllers. Currently implemented prio attribute is in this case TI specific and do not work for Microchip case. Please note, I do not care about configurability in this case, I only care about information the user get.
On Tue, Oct 08, 2024 at 03:57:22PM +0200, Oleksij Rempel wrote: > On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote: > > > + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; > > > + msg.sub[2] = id; > > > + /* Controller priority from 1 to 3 */ > > > + msg.data[4] = prio + 1; > > > > Does 0 have a meaning? It just seems an odd design if it does not. > > 0 is not documented. But there are sub-priority which are not directly > configured by user, but affect the system behavior. > > Priority#: Critical – 1; high – 2; low – 3 > For ports with the same priority, the PoE Controller sets the > sub-priority according to the logic port number. (Lower number gets > higher priority). With less priorities than ports, there is always going to be something like this. > > Port priority affects: > 1. Power-up order: After a reset, the ports are powered up according to > their priority, highest to lowest, highest priority will power up first. > 2. Shutdown order: When exceeding the power budget, lowest priority > ports will turn off first. > > Should we return sub priorities on the prio get request? I should be optional, since we might not actually know what a particular device is doing. It could pick at random, it could pick a port which is consuming just enough to cover the shortfall if it was turned off, it could pick the highest consumer of the lowest priority etc. Some of these conditions are not going to be easy to describe even if we do know it. Andrew
Hi Andrew, On Tue, Oct 08, 2024 at 06:50:25PM +0200, Andrew Lunn wrote: > On Tue, Oct 08, 2024 at 03:57:22PM +0200, Oleksij Rempel wrote: > > On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote: > > > > + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; > > > > + msg.sub[2] = id; > > > > + /* Controller priority from 1 to 3 */ > > > > + msg.data[4] = prio + 1; > > > > > > Does 0 have a meaning? It just seems an odd design if it does not. > > > > 0 is not documented. But there are sub-priority which are not directly > > configured by user, but affect the system behavior. > > > > Priority#: Critical – 1; high – 2; low – 3 > > For ports with the same priority, the PoE Controller sets the > > sub-priority according to the logic port number. (Lower number gets > > higher priority). > > With less priorities than ports, there is always going to be something > like this. > > > > > Port priority affects: > > 1. Power-up order: After a reset, the ports are powered up according to > > their priority, highest to lowest, highest priority will power up first. > > 2. Shutdown order: When exceeding the power budget, lowest priority > > ports will turn off first. > > > > Should we return sub priorities on the prio get request? > > I should be optional, since we might not actually know what a > particular device is doing. It could pick at random, it could pick a > port which is consuming just enough to cover the shortfall if it was > turned off, it could pick the highest consumer of the lowest priority > etc. Some of these conditions are not going to be easy to describe > even if we do know it. After reviewing the manuals for LTC4266 and TPS2388x, I realized that these controllers expose interfaces, but they don't implement prioritization concepts themselves. The LTC4266 and TPS2388x controllers provide only interfaces that allow the kernel to manage shutdown and prioritization policies. For TPS2388x, fast shutdown is implemented as a port bitmask with only two priorities, handled via the OSS pin. Fast shutdown is triggered by the kernel on request by toggling the corresponding pin, and the policy - when and why this pin is toggled - is defined by the kernel or user space. Slow shutdown, on the other hand, is managed via the I2C bus and allows for more refined control, enabling a wider range of priorities and more granular policies. I'll tend to hope we can reuse the proposed ETHTOOL_A_C33_PSE_PRIO interface across different PSE controllers. However, it is already being mapped to different shutdown concepts: PD692x0 firmware seems to rely on a slow shutdown backed by internal policies, while TPS2388x maps it to fast shutdown with driver specific policy. This inconsistency could force us to either break the UAPI or introduce a new, inconsistent one once we realize TPS2388x fast shutdown isn't what we actually need. Regards, Oleksij
On Wed, Oct 09, 2024 at 09:16:33AM +0200, Oleksij Rempel wrote: > Hi Andrew, > > On Tue, Oct 08, 2024 at 06:50:25PM +0200, Andrew Lunn wrote: > > On Tue, Oct 08, 2024 at 03:57:22PM +0200, Oleksij Rempel wrote: > > > On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote: > > > > > + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; > > > > > + msg.sub[2] = id; > > > > > + /* Controller priority from 1 to 3 */ > > > > > + msg.data[4] = prio + 1; > > > > > > > > Does 0 have a meaning? It just seems an odd design if it does not. > > > > > > 0 is not documented. But there are sub-priority which are not directly > > > configured by user, but affect the system behavior. > > > > > > Priority#: Critical – 1; high – 2; low – 3 > > > For ports with the same priority, the PoE Controller sets the > > > sub-priority according to the logic port number. (Lower number gets > > > higher priority). > > > > With less priorities than ports, there is always going to be something > > like this. > > > > > > > > Port priority affects: > > > 1. Power-up order: After a reset, the ports are powered up according to > > > their priority, highest to lowest, highest priority will power up first. > > > 2. Shutdown order: When exceeding the power budget, lowest priority > > > ports will turn off first. > > > > > > Should we return sub priorities on the prio get request? > > > > I should be optional, since we might not actually know what a > > particular device is doing. It could pick at random, it could pick a > > port which is consuming just enough to cover the shortfall if it was > > turned off, it could pick the highest consumer of the lowest priority > > etc. Some of these conditions are not going to be easy to describe > > even if we do know it. > > After reviewing the manuals for LTC4266 and TPS2388x, I realized that these > controllers expose interfaces, but they don't implement prioritization concepts > themselves. > > The LTC4266 and TPS2388x controllers provide only interfaces that allow the > kernel to manage shutdown and prioritization policies. For TPS2388x, fast > shutdown is implemented as a port bitmask with only two priorities, handled via > the OSS pin. Fast shutdown is triggered by the kernel on request by toggling > the corresponding pin, and the policy - when and why this pin is toggled - is > defined by the kernel or user space. Slow shutdown, on the other hand, is > managed via the I2C bus and allows for more refined control, enabling a wider > range of priorities and more granular policies. > > I'll tend to hope we can reuse the proposed ETHTOOL_A_C33_PSE_PRIO interface > across different PSE controllers. However, it is already being mapped to > different shutdown concepts: PD692x0 firmware seems to rely on a slow shutdown > backed by internal policies, while TPS2388x maps it to fast shutdown with > driver specific policy. This inconsistency could force us to either break the > UAPI or introduce a new, inconsistent one once we realize TPS2388x fast > shutdown isn't what we actually need. We should try to avoid fragmentation of the API, but given there is no standardisation here, vendors are free to do whatever they want, this may be difficult. Still, we should try to avoid it. Maybe we want to consider a generic simple prioritization in the kernel, plus export a generic model of PSE to user space to allow a user space manager? This is really policy, and ideally we don't want policy in the kernel. The in kernel implementation can use the hardware prioritisation if it supports it, otherwise provide a library of code which a driver can use to implement simple software prioritisation? For a user space manager, we already talked about an event to signal that we are entering overload. I guess we additionally need an API to get the current runtime state, what each consumer is actually consuming, and a management API to shutdown or enable a port due to overload, which is separate to the administrative state of a port. The hardware/firmware might provide lots of options and capabilities, but sometimes it is better it ignore them. Export a basic set which we expect most PSE controllers can offer, and do the rest in our software which we have full insight into, and debug and share, unlike firmware which is a broken black box. If we decide this is the way we want to go, we should not need to extend the API for LTC4266 and TPS2388x, you just implement a basic prioritisation in the driver. If it turns out to be not ideal, it does not matter so much, so long as we have the understanding that eventually we can have something better in userspace. We can maybe do a rough sketch of what the kAPI looks like for a user space manager, but we can kick the implementation down the road while the in kernel prioritization is good enough. Andrew
diff --git a/drivers/net/pse-pd/pd692x0.c b/drivers/net/pse-pd/pd692x0.c index 0af7db80b2f8..3a4a9836d621 100644 --- a/drivers/net/pse-pd/pd692x0.c +++ b/drivers/net/pse-pd/pd692x0.c @@ -685,6 +685,8 @@ static int pd692x0_ethtool_get_status(struct pse_controller_dev *pcdev, if (ret < 0) return ret; status->c33_avail_pw_limit = ret; + /* PSE core priority start at 0 */ + status->c33_prio = buf.data[2] - 1; memset(&buf, 0, sizeof(buf)); msg = pd692x0_msg_template_list[PD692X0_MSG_GET_PORT_CLASS]; @@ -1061,6 +1063,25 @@ static int pd692x0_pi_set_current_limit(struct pse_controller_dev *pcdev, return pd692x0_sendrecv_msg(priv, &msg, &buf); } +static int pd692x0_pi_set_prio(struct pse_controller_dev *pcdev, int id, + unsigned int prio) +{ + struct pd692x0_priv *priv = to_pd692x0_priv(pcdev); + struct pd692x0_msg msg, buf = {0}; + int ret; + + ret = pd692x0_fw_unavailable(priv); + if (ret) + return ret; + + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; + msg.sub[2] = id; + /* Controller priority from 1 to 3 */ + msg.data[4] = prio + 1; + + return pd692x0_sendrecv_msg(priv, &msg, &buf); +} + static const struct pse_controller_ops pd692x0_ops = { .setup_pi_matrix = pd692x0_setup_pi_matrix, .ethtool_get_status = pd692x0_ethtool_get_status, @@ -1070,6 +1091,7 @@ static const struct pse_controller_ops pd692x0_ops = { .pi_get_voltage = pd692x0_pi_get_voltage, .pi_get_current_limit = pd692x0_pi_get_current_limit, .pi_set_current_limit = pd692x0_pi_set_current_limit, + .pi_set_prio = pd692x0_pi_set_prio, }; #define PD692X0_FW_LINE_MAX_SZ 0xff @@ -1486,6 +1508,7 @@ static int pd692x0_i2c_probe(struct i2c_client *client) priv->pcdev.ops = &pd692x0_ops; priv->pcdev.dev = dev; priv->pcdev.types = ETHTOOL_PSE_C33; + priv->pcdev.pis_prio_max = 2; ret = devm_pse_controller_register(dev, &priv->pcdev); if (ret) return dev_err_probe(dev, ret,