diff mbox series

net br_netlink.c: allow non "disabled" state for !netif_oper_up() links

Message ID 20221109152410.3572632-2-giometti@enneenne.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           warning  Target tree name not specified in the subject
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           warning  4 maintainers not CCed: bridge@lists.linux-foundation.org pabeni@redhat.com edumazet@google.com kuba@kernel.org
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 13 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Rodolfo Giometti Nov. 9, 2022, 3:24 p.m. UTC
A generic loop-free network protocol (such as STP, MRP and others) may
require that a link which is not operational be put into a state other
than "disabled" (such as listening).

For example, MRP states that an MRM should set one of its ring connections
into the "BLOCKED" state (which is equivalent to the LISTENING state for
Linux bridges) if it detects that this connection is "DOWN" (that is, it
has the NO-CARRIER status).

Signed-off-by: Rodolfo Giometti <giometti@enneenne.com>
---
 net/bridge/br_netlink.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

Andrew Lunn Nov. 9, 2022, 5:34 p.m. UTC | #1
On Wed, Nov 09, 2022 at 04:24:10PM +0100, Rodolfo Giometti wrote:
> A generic loop-free network protocol (such as STP, MRP and others) may
> require that a link which is not operational be put into a state other
> than "disabled" (such as listening).
> 
> For example, MRP states that an MRM should set one of its ring connections
> into the "BLOCKED" state (which is equivalent to the LISTENING state for
> Linux bridges) if it detects that this connection is "DOWN" (that is, it
> has the NO-CARRIER status).

Does MRP explain Why?

This change seems odd, and "Because the standard says so" is not the
best of explanations.

     Andrew
Rodolfo Giometti Nov. 9, 2022, 6:19 p.m. UTC | #2
On 09/11/22 18:34, Andrew Lunn wrote:
> Does MRP explain Why?
> 
> This change seems odd, and "Because the standard says so" is not the
> best of explanations.

An MRM instance has two ports: the primary port (PRM_RPort) and the secondary
port (SEC_RPort).

When both ports are UP (that is, the CARRIER is on), the MRM is in the
Ring_closed state: the PRM_RPort is in the forwarding state while the SEC_RPort
is in the blocking state (remember that MRP blocking is equivalent to Linux
bridge listening).

If the PRM_RPort loses its carrier and the link goes down, the standard states
that:

- the ports swap roles (PRM_RPort becomes SEC_RPort and vice versa).

- SEC_RPort must be set into the blocking state.

- PRM_RPort must be set into the forwarding state.

Then the MRM moves into a new state called Primary-UP. In this state, when the
SEC_RPort returns to the UP state (that is, the CARRIER is back), the MRM
returns to the Ring_closed state, where both ports already have the right
status: the PRM_RPort is in the forwarding state while the SEC_RPort is in the
blocking state.

This is just one example, but consider that, in general, when the carrier is
lost the port is moved into the blocking state so that, when the carrier
returns, the port is already in the right state.
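
As a rough illustration of the sequence above, the following C sketch models
only the two transitions described here. All type and function names (struct
mrm, mrm_primary_link_down() and so on) are hypothetical and chosen for this
example; the real MRM state machine has more states and events than shown.

```c
#include <stdbool.h>

/* Hypothetical port states; MRP "blocked" maps to Linux bridge LISTENING. */
enum port_state { PORT_BLOCKED, PORT_FORWARDING };

/* Only the two MRM states mentioned in the description are modelled. */
enum mrm_state { MRM_RING_CLOSED, MRM_PRIMARY_UP };

struct ring_port {
	int ifindex;            /* interface currently playing this role */
	bool carrier;           /* true while the link has carrier */
	enum port_state state;
};

struct mrm {
	enum mrm_state state;
	struct ring_port prm;   /* PRM_RPort */
	struct ring_port sec;   /* SEC_RPort */
};

/* Ring_closed: both links up, primary forwards, secondary blocks. */
void mrm_init_ring_closed(struct mrm *m, int prm_ifindex, int sec_ifindex)
{
	m->state = MRM_RING_CLOSED;
	m->prm = (struct ring_port){ prm_ifindex, true, PORT_FORWARDING };
	m->sec = (struct ring_port){ sec_ifindex, true, PORT_BLOCKED };
}

/* Primary link lost carrier: swap roles, then set both port states. */
void mrm_primary_link_down(struct mrm *m)
{
	struct ring_port tmp = m->prm;

	m->prm = m->sec;
	m->sec = tmp;
	m->sec.carrier = false;
	m->sec.state = PORT_BLOCKED;    /* programmed while carrier is down */
	m->prm.state = PORT_FORWARDING;
	m->state = MRM_PRIMARY_UP;
}

/* Secondary link regained carrier: it is already blocked, nothing to fix. */
void mrm_secondary_link_up(struct mrm *m)
{
	m->sec.carrier = true;
	m->state = MRM_RING_CLOSED;
}
```

The point of the model is the comment in mrm_primary_link_down(): the port
without carrier is placed in the blocked state immediately, so no port state
has to be changed when the carrier returns.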

Hope it's clearer now.

However, despite this special case, I think that kernel code should implement
mechanisms and not policies, shouldn't it? If user space needs a non-operational
port (that is, one with no carrier) in the listening state, why should we
prevent it?
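
For reference, a user-space daemon would typically request such a state change
with an RTM_SETLINK message in the AF_BRIDGE family, carrying IFLA_BRPORT_STATE
nested inside IFLA_PROTINFO. The sketch below only builds that message; it is a
minimal example under the assumption that the layout matches what the
"bridge link set" tool sends. The struct brport_state_req type and the helper
names are hypothetical, and sending the message and reading the ACK are left
out.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request buffer: header, ifinfomsg, room for attributes. */
struct brport_state_req {
	struct nlmsghdr nh;
	struct ifinfomsg ifi;
	char attrs[64];
};

/* Append one rtattr at the current end of the message. */
static struct rtattr *put_attr(struct nlmsghdr *nh, unsigned short type,
			       const void *data, unsigned short len)
{
	struct rtattr *rta =
		(struct rtattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));

	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	if (len)
		memcpy(RTA_DATA(rta), data, len);
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
	return rta;
}

/* Fill req with a "set bridge port ifindex to state" request. */
void build_brport_state_req(struct brport_state_req *req, int ifindex,
			    unsigned char state)
{
	struct rtattr *nest;

	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifi));
	req->nh.nlmsg_type = RTM_SETLINK;
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->ifi.ifi_family = AF_BRIDGE;
	req->ifi.ifi_index = ifindex;

	/* IFLA_PROTINFO { IFLA_BRPORT_STATE = state } */
	nest = put_attr(&req->nh, IFLA_PROTINFO | NLA_F_NESTED, NULL, 0);
	put_attr(&req->nh, IFLA_BRPORT_STATE, &state, sizeof(state));
	nest->rta_len = (unsigned short)((char *)&req->nh +
					 req->nh.nlmsg_len - (char *)nest);
}
```

With the check as it stands before this patch, such a request for any state
other than disabled is answered with -ENETDOWN while the port has no carrier.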

Ciao,

Rodolfo
Andrew Lunn Nov. 9, 2022, 6:46 p.m. UTC | #3
On Wed, Nov 09, 2022 at 07:19:22PM +0100, Rodolfo Giometti wrote:
> An MRM instance has two ports: the primary port (PRM_RPort) and the secondary
> port (SEC_RPort).
> 
> When both ports are UP (that is, the CARRIER is on), the MRM is in the
> Ring_closed state: the PRM_RPort is in the forwarding state while the
> SEC_RPort is in the blocking state (remember that MRP blocking is equivalent
> to Linux bridge listening).
> 
> If the PRM_RPort loses its carrier and the link goes down, the standard states
> that:
> 
> - the ports swap roles (PRM_RPort becomes SEC_RPort and vice versa).
> 
> - SEC_RPort must be set into the blocking state.
> 
> - PRM_RPort must be set into the forwarding state.
> 
> Then the MRM moves into a new state called Primary-UP. In this state, when
> the SEC_RPort returns to the UP state (that is, the CARRIER is back), the MRM
> returns to the Ring_closed state, where both ports already have the right
> status: the PRM_RPort is in the forwarding state while the SEC_RPort is in
> the blocking state.
> 
> This is just one example, but consider that, in general, when the carrier is
> lost the port is moved into the blocking state so that, when the carrier
> returns, the port is already in the right state.
> 
> Hope it's clearer now.

Yes, please add this to the commit message. The commit message is
supposed to explain Why, and this is a good example.
 
> However, despite this special case, I think that kernel code should
> implement mechanisms and not policies, shouldn't it? If user space needs a
> non-operational port (that is, one with no carrier) in the listening state,
> why should we prevent it?

Did you dig deeper? Does the bridge make use of switchdev to tell the
hardware about this state change while the carrier is down? I also
wonder what the hardware drivers do? Since this is a change in
behaviour, they might not actually do anything. So then you have to
consider does it make sense for the bridge to set the state again
after the carrier comes up?

       Andrew
Rodolfo Giometti Nov. 11, 2022, 9:43 a.m. UTC | #4
On 09/11/22 19:46, Andrew Lunn wrote:
> Yes, please add this to the commit message. The commit message is
> supposed to explain Why, and this is a good example.

OK. I'm going to do it in v2.

>> However, despite this special case, I think that kernel code should
>> implement mechanisms and not policies, shouldn't it? If user space needs a
>> non-operational port (that is, one with no carrier) in the listening state,
>> why should we prevent it?
> 
> Did you dig deeper? Does the bridge make use of switchdev to tell the
> hardware about this state change while the carrier is down?

I think so, since the function br_set_state() does it.

> I also
> wonder what the hardware drivers do? Since this is a change in
> behaviour, they might not actually do anything.

For instance Marvell switches just set the state (see 
linux/drivers/net/dsa/mv88e6xxx/port.c) without checking for carrier status:

int mv88e6xxx_port_set_state(struct mv88e6xxx_chip *chip, int port, u8 state)
{
	u16 reg;
	int err;

	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL0, &reg);
	if (err)
		return err;

	reg &= ~MV88E6XXX_PORT_CTL0_STATE_MASK;

	switch (state) {
	case BR_STATE_DISABLED:
		state = MV88E6XXX_PORT_CTL0_STATE_DISABLED;
		break;
	case BR_STATE_BLOCKING:
	case BR_STATE_LISTENING:
		state = MV88E6XXX_PORT_CTL0_STATE_BLOCKING;
		break;
	case BR_STATE_LEARNING:
		state = MV88E6XXX_PORT_CTL0_STATE_LEARNING;
		break;
	case BR_STATE_FORWARDING:
		state = MV88E6XXX_PORT_CTL0_STATE_FORWARDING;
		break;
	default:
		return -EINVAL;
	}

	reg |= state;

	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_CTL0, reg);
	if (err)
		return err;

	dev_dbg(chip->dev, "p%d: PortState set to %s\n", port,
		mv88e6xxx_port_state_names[state]);

	return 0;
}

> So then you have to
> consider does it make sense for the bridge to set the state again
> after the carrier comes up?

Yes, of course we can do it but (in the case of MRP) the state machine must be
altered at several points and, again, why should the kernel force such
behaviour (i.e. introduce a policy) when drivers just don't consider it (see
the above example)?

The kernel should implement mechanisms, while all policies should be in user space.

Ciao,

Rodolfo
Andrew Lunn Nov. 11, 2022, 1:09 p.m. UTC | #5
> > I also
> > wonder what the hardware drivers do? Since this is a change in
> > behaviour, they might not actually do anything.
> 
> For instance Marvell switches just set the state (see
> linux/drivers/net/dsa/mv88e6xxx/port.c) without checking for carrier status:

Yes, that was one I checked myself. I think I remember reviewing a DSA
driver which did not have a mechanism to disable a port, other than
the STP state. So there is a danger the mac_down() call is going to
change the STP state, and the mac_up() call will change it again.
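
To make that danger concrete, here is a small simulation of such a driver. It
is purely hypothetical: struct sim_port and the sim_*() functions are invented
for this example, and the assumption that link-up re-enables the port as
forwarding is just one plausible behaviour, not taken from any real driver.

```c
/* Local copies of the bridge port state values from <linux/if_bridge.h>. */
enum {
	BR_STATE_DISABLED   = 0,
	BR_STATE_LISTENING  = 1,
	BR_STATE_LEARNING   = 2,
	BR_STATE_FORWARDING = 3,
	BR_STATE_BLOCKING   = 4,
};

/* Hypothetical switch port whose only "off switch" is the STP state field. */
struct sim_port {
	unsigned char hw_state;
};

/* What the bridge asks the driver to program for this port. */
void sim_port_stp_state_set(struct sim_port *p, unsigned char state)
{
	p->hw_state = state;
}

/* Link lost: the driver disables the port by overwriting the STP state. */
void sim_mac_link_down(struct sim_port *p)
{
	p->hw_state = BR_STATE_DISABLED;
}

/*
 * Link back (assumed behaviour): the driver re-enables the port as
 * forwarding and forgets any state requested while the link was down.
 */
void sim_mac_link_up(struct sim_port *p)
{
	p->hw_state = BR_STATE_FORWARDING;
}
```

In this model, a LISTENING state written between sim_mac_link_down() and
sim_mac_link_up() is silently replaced by FORWARDING when the link returns,
which is exactly the case where a ring protocol would expect the port to stay
blocked.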

> Yes, of course we can do it but (in the case of MRP) the state machine must be
> altered at several points and, again, why should the kernel force such
> behaviour (i.e. introduce a policy) when drivers just don't consider it
> (see the above example)?
> 
> The kernel should implement mechanisms, while all policies should be in user space.

While I agree the policy should not be in the kernel, you have history
against you. Since this was never a requirement, and on first
mention it seems like an odd requirement, there is no guarantee
it will actually work for all drivers. So either you have to:

1) Say some kernel drivers are probably broken, will do horrible
   things to your network instead of being redundant, test it well
   before deploying.

2) Monitor for the link up event, and set the STP state as required.

The in-kernel bridge/STP code takes this second approach, which again
reinforces the fact that because drivers never needed to support this,
some probably don't.

     Andrew
Patch

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 5c6c4305ed23..3f9f45c3d274 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -841,11 +841,8 @@  static int br_set_port_state(struct net_bridge_port *p, u8 state)
 	if (p->br->stp_enabled == BR_KERNEL_STP)
 		return -EBUSY;
 
-	/* if device is not up, change is not allowed
-	 * if link is not present, only allowable state is disabled
-	 */
-	if (!netif_running(p->dev) ||
-	    (!netif_oper_up(p->dev) && state != BR_STATE_DISABLED))
+	/* if device is not up, change is not allowed */
+	if (!netif_running(p->dev))
 		return -ENETDOWN;
 
 	br_set_state(p, state);
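
The effect of the hunk can be summarised by modelling the two conditions as
plain functions. This is only a sketch: check_before() and check_after() are
hypothetical names, the boolean arguments stand in for netif_running() and
netif_oper_up(), and the other checks in br_set_port_state() are not modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* Local copies of the bridge port state values from <linux/if_bridge.h>. */
enum {
	BR_STATE_DISABLED   = 0,
	BR_STATE_LISTENING  = 1,
	BR_STATE_LEARNING   = 2,
	BR_STATE_FORWARDING = 3,
	BR_STATE_BLOCKING   = 4,
};

/* Condition before the patch: without link, only "disabled" is accepted. */
int check_before(bool running, bool oper_up, unsigned char state)
{
	if (!running || (!oper_up && state != BR_STATE_DISABLED))
		return -ENETDOWN;
	return 0;
}

/* Condition after the patch: only an administratively down device fails. */
int check_after(bool running, bool oper_up, unsigned char state)
{
	(void)oper_up;  /* carrier is no longer considered */
	(void)state;    /* any state is accepted here */
	if (!running)
		return -ENETDOWN;
	return 0;
}
```

The only inputs for which the two functions differ are a running device with
no operational link and a requested state other than BR_STATE_DISABLED:
check_before() returns -ENETDOWN there, check_after() returns 0.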