[PATCHv2] teamd: add port_master_ifindex_changed for teamd_event_watch_ops
diff mbox series

Message ID 1533278629-26147-1-git-send-email-liuhangbin@gmail.com
State New
Headers show
Series
  • [PATCHv2] teamd: add port_master_ifindex_changed for teamd_event_watch_ops
Related show

Commit Message

Hangbin Liu Aug. 3, 2018, 6:43 a.m. UTC
When configure two active-backup teams, one slave each, put them
down and up in a loop and test traffic each loop, eventually one of
the teams will not accept/transmit traffic as there is no active port.
Like this:

\#/bin/bash
\# In this scenario we add eth1 to team1, eth2 to team2.
WAIT=2
COUNT=0

start_team()
{
        num=$1
        teamd -o -n -U -d -t team$num -c '{"runner": {"name": "activebackup"},"link_watch": {"name": "ethtool"}}' -gg
        teamdctl team$num port add eth$num
}

while :; do
        echo "==========================================================="
        let "COUNT++"
        echo "Loop $COUNT"
        if teamdctl team1 state | grep -q "active port: eth1" && \
                teamdctl team2 state | grep -q "active port: eth2"; then
                echo "Pass"
        else
                echo "FAIL"
                exit 1
        fi

        teamd -k -t team1
        teamd -k -t team2
        sleep "$WAIT"
        start_team 1 &
        start_team 2 &
        sleep "$WAIT"
done

Failure as follows:

]# teamdctl teamX state dump
    "runner": {
        "active_port": ""
    },

This happens because teamd_port_present() fails in the codepath
ab_link_watch_handler() -> teamd_for_each_tdport() -> teamd_get_next_tdport()
as the link states notification received later than interface up notification.

So the teamd_for_each_tdport() for loop can never run against the tdports
bound to the ctx and a port is never set active.

Currently we only reproduced this with active-backup mode. Fix it by adding a
new teamd_event_watch_ops port_master_ifindex_changed for ab mode.

V2: update commit description from Jamie Bainbridge's reply.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 teamd/teamd.h                     |  5 +++++
 teamd/teamd_events.c              | 19 +++++++++++++++++++
 teamd/teamd_ifinfo_watch.c        |  5 +++++
 teamd/teamd_runner_activebackup.c |  8 ++++++++
 4 files changed, 37 insertions(+)

Comments

Jiri Pirko Aug. 3, 2018, 12:36 p.m. UTC | #1
Fri, Aug 03, 2018 at 08:43:49AM CEST, liuhangbin@gmail.com wrote:
>When configure two active-backup teams, one slave each, put them
>down and up in a loop and test traffic each loop, eventually one of
>the teams will not accept/transmit traffic as there is no active port.

I don't think that Jamie intended his reply as a replacement of
description...


>Like this:
>
>\#/bin/bash
>\# In this scenario we add eth1 to team1, eth2 to team2.
>WAIT=2
>COUNT=0
>
>start_team()
>{
>        num=$1
>        teamd -o -n -U -d -t team$num -c '{"runner": {"name": "activebackup"},"link_watch": {"name": "ethtool"}}' -gg
>        teamdctl team$num port add eth$num
>}
>
>while :; do
>        echo "==========================================================="
>        let "COUNT++"
>        echo "Loop $COUNT"
>        if teamdctl team1 state | grep -q "active port: eth1" && \
>                teamdctl team2 state | grep -q "active port: eth2"; then
>                echo "Pass"
>        else
>                echo "FAIL"
>                exit 1
>        fi
>
>        teamd -k -t team1
>        teamd -k -t team2
>        sleep "$WAIT"
>        start_team 1 &
>        start_team 2 &
>        sleep "$WAIT"
>done

This reproducer does not work:
===========================================================
Loop 1
Device "team1" does not exist
FAIL



>
>Failure as follows:
>
>]# teamdctl teamX state dump
>    "runner": {
>        "active_port": ""
>    },
>
>This happens because teamd_port_present() fails in the codepath
>ab_link_watch_handler() -> teamd_for_each_tdport() -> teamd_get_next_tdport()
>as the link states notification received later than interface up notification.
>
>So the teamd_for_each_tdport() for loop can never run against the tdports
>bound to the ctx and a port is never set active.
>
>Currently we only reproduced this with active-backup mode. Fix it by adding a
>new teamd_event_watch_ops port_master_ifindex_changed for ab mode.

Okay, so if I understand that correctly, you are getting link up event
(handled by ab_event_watch_port_link_changed() before you get port_added
event (handled by ab_event_watch_port_added()). So wouldn't the
following patch resolve that?

diff --git a/teamd/teamd_runner_activebackup.c b/teamd/teamd_runner_activebackup.c
index 8a3447f1a63d..8cf455d9b3e1 100644
--- a/teamd/teamd_runner_activebackup.c
+++ b/teamd/teamd_runner_activebackup.c
@@ -509,8 +509,12 @@ static int ab_event_watch_port_added(struct teamd_context *ctx,
 				     struct teamd_port *tdport, void *priv)
 {
 	struct ab *ab = priv;
+	int err;
 
-	return teamd_port_priv_create(tdport, &ab_port_priv, ab);
+	err = teamd_port_priv_create(tdport, &ab_port_priv, ab);
+	if (err)
+		return err;
+	return ab_link_watch_handler(ctx, priv);
 }
 
 static int ab_event_watch_port_link_changed(struct teamd_context *ctx,
Hangbin Liu Aug. 6, 2018, 6:18 a.m. UTC | #2
On Fri, Aug 03, 2018 at 02:36:48PM +0200, Jiri Pirko wrote:
> Fri, Aug 03, 2018 at 08:43:49AM CEST, liuhangbin@gmail.com wrote:
> >When configure two active-backup teams, one slave each, put them
> >down and up in a loop and test traffic each loop, eventually one of
> >the teams will not accept/transmit traffic as there is no active port.
> 
> I don't think that Jamie intended his reply as a replacement of
> description...

OK..I thought his reply is more clear to describe the issue. I could re-update
the description if you and Jamie like.

> 
> >Like this:
> >
> >\#/bin/bash
> >\# In this scenario we add eth1 to team1, eth2 to team2.
> >WAIT=2
> >COUNT=0
> >
> >start_team()
> >{
> >        num=$1
> >        teamd -o -n -U -d -t team$num -c '{"runner": {"name": "activebackup"},"link_watch": {"name": "ethtool"}}' -gg
> >        teamdctl team$num port add eth$num
> >}
> >
> >while :; do
> >        echo "==========================================================="
> >        let "COUNT++"
> >        echo "Loop $COUNT"
> >        if teamdctl team1 state | grep -q "active port: eth1" && \
> >                teamdctl team2 state | grep -q "active port: eth2"; then
> >                echo "Pass"
> >        else
> >                echo "FAIL"
> >                exit 1
> >        fi
> >
> >        teamd -k -t team1
> >        teamd -k -t team2
> >        sleep "$WAIT"
> >        start_team 1 &
> >        start_team 2 &
> >        sleep "$WAIT"
> >done
> 
> This reproducer does not work:
> ===========================================================
> Loop 1
> Device "team1" does not exist
> FAIL

This reproducer need a team interface first. Here is an new one.

#!/bin/sh
if [ -z $1 ] || [ -z $2 ]; then
        echo "Usage: $0 iface1 iface2"
        exit 1
else
        iface1=$1
        iface2=$2
fi

WAIT=2
COUNT=0

start_team()
{
        local num=$1
        local iface=$2
        teamd -o -n -U -d -t team$num -c '{"runner": {"name": "activebackup"},"link_watch": {"name": "ethtool"}}' -gg
        teamdctl team$num port add $iface
}

while :; do
        echo "-----------------------------------------------------------"
        let "COUNT++"
        echo "Loop $COUNT"

        teamd -k -t team1
        teamd -k -t team2
        sleep "$WAIT"
        start_team 1 $iface1 &
        start_team 2 $iface2 &
        sleep "$WAIT"

        if teamdctl team1 state | grep -q "active port: $iface1" && \
                teamdctl team2 state | grep -q "active port: $iface2"; then
                echo "Pass"
        else
                echo "FAIL"
                exit 1
        fi
done


Note: I could reproduce the issue easily on VM. But hard to reproduce it on a
physical machine.

> 
> 
> 
> >
> >Failure as follows:
> >
> >]# teamdctl teamX state dump
> >    "runner": {
> >        "active_port": ""
> >    },
> >
> >This happens because teamd_port_present() fails in the codepath
> >ab_link_watch_handler() -> teamd_for_each_tdport() -> teamd_get_next_tdport()
> >as the link states notification received later than interface up notification.
> >
> >So the teamd_for_each_tdport() for loop can never run against the tdports
> >bound to the ctx and a port is never set active.
> >
> >Currently we only reproduced this with active-backup mode. Fix it by adding a
> >new teamd_event_watch_ops port_master_ifindex_changed for ab mode.
> 
> Okay, so if I understand that correctly, you are getting link up event
> (handled by ab_event_watch_port_link_changed() before you get port_added
> event (handled by ab_event_watch_port_added()). So wouldn't the
> following patch resolve that?

No, I got ab_event_watch_port_link_changed() after port_added() event. But
there is no master info in tdport when call ab_event_watch_port_link_changed(). 
That's why I added port_master_ifindex_changed() call back.

> 
> diff --git a/teamd/teamd_runner_activebackup.c b/teamd/teamd_runner_activebackup.c
> index 8a3447f1a63d..8cf455d9b3e1 100644
> --- a/teamd/teamd_runner_activebackup.c
> +++ b/teamd/teamd_runner_activebackup.c
> @@ -509,8 +509,12 @@ static int ab_event_watch_port_added(struct teamd_context *ctx,
>  				     struct teamd_port *tdport, void *priv)
>  {
>  	struct ab *ab = priv;
> +	int err;
>  
> -	return teamd_port_priv_create(tdport, &ab_port_priv, ab);
> +	err = teamd_port_priv_create(tdport, &ab_port_priv, ab);
> +	if (err)
> +		return err;
> +	return ab_link_watch_handler(ctx, priv);
>  }
>  
>  static int ab_event_watch_port_link_changed(struct teamd_context *ctx,

Based on upper reason, and I also tested, this patch not works.

Thanks
Hangbin

Patch
diff mbox series

diff --git a/teamd/teamd.h b/teamd/teamd.h
index 5dbfb9b..3934fc2 100644
--- a/teamd/teamd.h
+++ b/teamd/teamd.h
@@ -189,6 +189,9 @@  struct teamd_event_watch_ops {
 				   struct teamd_port *tdport, void *priv);
 	int (*port_ifname_changed)(struct teamd_context *ctx,
 				   struct teamd_port *tdport, void *priv);
+	int (*port_master_ifindex_changed)(struct teamd_context *ctx,
+					   struct teamd_port *tdport,
+					   void *priv);
 	int (*option_changed)(struct teamd_context *ctx,
 			      struct team_option *option, void *priv);
 	char *option_changed_match_name;
@@ -208,6 +211,8 @@  int teamd_event_ifinfo_hwaddr_changed(struct teamd_context *ctx,
 				      struct team_ifinfo *ifinfo);
 int teamd_event_ifinfo_ifname_changed(struct teamd_context *ctx,
 				      struct team_ifinfo *ifinfo);
+int teamd_event_ifinfo_master_ifindex_changed(struct teamd_context *ctx,
+					      struct team_ifinfo *ifinfo);
 int teamd_event_ifinfo_admin_state_changed(struct teamd_context *ctx,
 					   struct team_ifinfo *ifinfo);
 int teamd_events_init(struct teamd_context *ctx);
diff --git a/teamd/teamd_events.c b/teamd/teamd_events.c
index 1a95974..65aa46a 100644
--- a/teamd/teamd_events.c
+++ b/teamd/teamd_events.c
@@ -167,6 +167,25 @@  int teamd_event_ifinfo_ifname_changed(struct teamd_context *ctx,
 	return 0;
 }
 
+int teamd_event_ifinfo_master_ifindex_changed(struct teamd_context *ctx,
+					      struct team_ifinfo *ifinfo)
+{
+	struct event_watch_item *watch;
+	uint32_t ifindex = team_get_ifinfo_ifindex(ifinfo);
+	struct teamd_port *tdport = teamd_get_port(ctx, ifindex);
+	int err;
+
+	list_for_each_node_entry(watch, &ctx->event_watch_list, list) {
+		if (watch->ops->port_master_ifindex_changed && tdport) {
+			err = watch->ops->port_master_ifindex_changed(ctx, tdport,
+								      watch->priv);
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
+
 int teamd_event_ifinfo_admin_state_changed(struct teamd_context *ctx,
 					   struct team_ifinfo *ifinfo)
 {
diff --git a/teamd/teamd_ifinfo_watch.c b/teamd/teamd_ifinfo_watch.c
index f334ff6..6a19532 100644
--- a/teamd/teamd_ifinfo_watch.c
+++ b/teamd/teamd_ifinfo_watch.c
@@ -59,6 +59,11 @@  static int ifinfo_change_handler_func(struct team_handle *th, void *priv,
 			if (err)
 				return err;
 		}
+		if (team_is_ifinfo_master_ifindex_changed(ifinfo)) {
+			err = teamd_event_ifinfo_master_ifindex_changed(ctx, ifinfo);
+			if (err)
+				return err;
+		}
 	}
 	return 0;
 }
diff --git a/teamd/teamd_runner_activebackup.c b/teamd/teamd_runner_activebackup.c
index 8a3447f..f92d341 100644
--- a/teamd/teamd_runner_activebackup.c
+++ b/teamd/teamd_runner_activebackup.c
@@ -520,6 +520,13 @@  static int ab_event_watch_port_link_changed(struct teamd_context *ctx,
 	return ab_link_watch_handler(ctx, priv);
 }
 
+static int ab_event_watch_port_master_ifindex_changed(struct teamd_context *ctx,
+						      struct teamd_port *tdport,
+						      void *priv)
+{
+	return ab_link_watch_handler(ctx, priv);
+}
+
 static int ab_event_watch_prio_option_changed(struct teamd_context *ctx,
 					      struct team_option *option,
 					      void *priv)
@@ -532,6 +539,7 @@  static const struct teamd_event_watch_ops ab_event_watch_ops = {
 	.port_hwaddr_changed = ab_event_watch_port_hwaddr_changed,
 	.port_added = ab_event_watch_port_added,
 	.port_link_changed = ab_event_watch_port_link_changed,
+	.port_master_ifindex_changed = ab_event_watch_port_master_ifindex_changed,
 	.option_changed = ab_event_watch_prio_option_changed,
 	.option_changed_match_name = "priority",
 };