diff mbox series

[next] bonding: pair enable_port with slave_arr_updates

Message ID 20220127002605.4049593-1-maheshb@google.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [next] bonding: pair enable_port with slave_arr_updates | expand

Checks

Context Check Description
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: Possible repeated word: 'that'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

When 803.2ad mode enables a participating port, it should update
the slave-array. I have observed that the member links are participating
and are part of the active aggregator while the traffic is egressing via
only one member link (in a case where two links are participating). Via
krpobes I discovered that that slave-arr has only one link added while
the other participating link wasn't part of the slave-arr.

I couldn't see what caused that situation but the simple code-walk
through provided me hints that the enable_port wasn't always associated
with the slave-array update.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/bonding/bond_3ad.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jay Vosburgh Jan. 27, 2022, 11:54 p.m. UTC | #1
Mahesh Bandewar <maheshb@google.com> wrote:

>When 803.2ad mode enables a participating port, it should update
>the slave-array. I have observed that the member links are participating
>and are part of the active aggregator while the traffic is egressing via
>only one member link (in a case where two links are participating). Via
>krpobes I discovered that that slave-arr has only one link added while
>the other participating link wasn't part of the slave-arr.
>
>I couldn't see what caused that situation but the simple code-walk
>through provided me hints that the enable_port wasn't always associated
>with the slave-array update.
>
>Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>---
> drivers/net/bonding/bond_3ad.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 6006c2e8fa2b..f20bbc18a03f 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -1024,6 +1024,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
> 
> 					__enable_port(port);
> 				}
>+				/* Slave array needs update */
>+				*update_slave_arr = true;
> 			}

	Shouldn't this be in the same block as the __enable_port() call?
If I'm reading the code correctly, as written this will trigger an
update of the array on every pass of the state machine (every 100ms) if
any port is in COLLECTING_DISTRIBUTING state, which is the usual case.

> 			break;
> 		default:
>@@ -1779,6 +1781,8 @@ static void ad_agg_selection_logic(struct aggregator *agg,
> 			     port = port->next_port_in_aggregator) {
> 				__enable_port(port);
> 			}
>+			/* Slave array needs update. */
>+			*update_slave_arr = true;
> 		}

	I suspect this change would only affect your issue if the port
in question was failing to partner (i.e., the peer wasn't running LACP
or there was some failure in the LACP negotiation).  If the ports in
your test were in the same aggregator, that shouldn't be the case, as I
believe unpartnered ports are always individual (not in an aggregator).

	Do you have a test?

	-J

> 	}
> 
>-- 
>2.35.0.rc0.227.g00780c9af4-goog
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
On Thu, Jan 27, 2022 at 3:54 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Mahesh Bandewar <maheshb@google.com> wrote:
>
> >When 803.2ad mode enables a participating port, it should update
> >the slave-array. I have observed that the member links are participating
> >and are part of the active aggregator while the traffic is egressing via
> >only one member link (in a case where two links are participating). Via
> >krpobes I discovered that that slave-arr has only one link added while
> >the other participating link wasn't part of the slave-arr.
> >
> >I couldn't see what caused that situation but the simple code-walk
> >through provided me hints that the enable_port wasn't always associated
> >with the slave-array update.
> >
> >Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> >---
> > drivers/net/bonding/bond_3ad.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >index 6006c2e8fa2b..f20bbc18a03f 100644
> >--- a/drivers/net/bonding/bond_3ad.c
> >+++ b/drivers/net/bonding/bond_3ad.c
> >@@ -1024,6 +1024,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
> >
> >                                       __enable_port(port);
> >                               }
> >+                              /* Slave array needs update */
> >+                              *update_slave_arr = true;
> >                       }
>
>         Shouldn't this be in the same block as the __enable_port() call?
absolutely! It's inefficient to have outside of that if-clause and
would unnecessarily update slave-arr when it's not even needed. My
bad, I'll fix it in v2

> If I'm reading the code correctly, as written this will trigger an
> update of the array on every pass of the state machine (every 100ms) if
> any port is in COLLECTING_DISTRIBUTING state, which is the usual case.
>
> >                       break;
> >               default:
> >@@ -1779,6 +1781,8 @@ static void ad_agg_selection_logic(struct aggregator *agg,
> >                            port = port->next_port_in_aggregator) {
> >                               __enable_port(port);
> >                       }
> >+                      /* Slave array needs update. */
> >+                      *update_slave_arr = true;
> >               }
>
>         I suspect this change would only affect your issue if the port
> in question was failing to partner (i.e., the peer wasn't running LACP
> or there was some failure in the LACP negotiation).  If the ports in
> your test were in the same aggregator, that shouldn't be the case, as I
> believe unpartnered ports are always individual (not in an aggregator).
The condition seems to manifest randomly on some machines and not
always. All links are part of the same aggregator but some transient
situation does break the bond and almost always it reforms but
occasionally it gets into this state I mentioned.

My primary motive behind this fix/patch is to update the slave-arr
when LACP state is changing (for whatever reasons). Enabling port
seems to be an event which must be associated with updating the array
and found these two locations in the code where there is a chance that
update_array may not happen when a port gets enabled.

>
>         Do you have a test?
>
I'm not sure what triggers it and hence I don't have the exact repro
steps / test.

>         -J
>
> >       }
> >
> >--
> >2.35.0.rc0.227.g00780c9af4-goog
> >
>
> ---
>         -Jay Vosburgh, jay.vosburgh@canonical.com
diff mbox series

Patch

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 6006c2e8fa2b..f20bbc18a03f 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1024,6 +1024,8 @@  static void ad_mux_machine(struct port *port, bool *update_slave_arr)
 
 					__enable_port(port);
 				}
+				/* Slave array needs update */
+				*update_slave_arr = true;
 			}
 			break;
 		default:
@@ -1779,6 +1781,8 @@  static void ad_agg_selection_logic(struct aggregator *agg,
 			     port = port->next_port_in_aggregator) {
 				__enable_port(port);
 			}
+			/* Slave array needs update. */
+			*update_slave_arr = true;
 		}
 	}