teamd: add a default value 100 for link_watch.interval
diff mbox series

Message ID 20190304063544.12297-1-liuhangbin@gmail.com
State New
Headers show
Series
  • teamd: add a default value 100 for link_watch.interval
Related show

Commit Message

Hangbin Liu March 4, 2019, 6:35 a.m. UTC
As we don't have a default value for link_watch.interval. If user
forgot to set this parameter, teamd will failed to init port priv and
exit at the end.

e.g.
teamd -g -c '{"runner":{"name":"activebackup"},"link_watch":{"name":"arp_ping","target_host":"198.51.100.1"}}'
teamdctl team0 port add p5p1
teamdctl team0 port add p5p2

teamd debug log shows:
p5p2: Got link watch from global config.
p5p2: Using sticky "0".
Failed to get "interval" link-watch option.
Failed to load options.
Failed to init port priv.
Callback named "lw_periodic" not found.
Callback named "lw_socket" not found.
Loop callback failed with: Invalid argument
Failed loop callback: libteam_events, 0x5624c28b9410
select() failed.
Exiting...
Removed loop callback: usock_acc_conn, 0x5624c28bab60
Removed loop callback: usock, 0x5624c28b9410
Removed loop callback: workq, 0x5624c28b9410
Removed loop callback: libteam_events, 0x5624c28b9410
Removed loop callback: daemon, 0x5624c28b9410
Failed: Bad file descriptor

Fix it by adding a default value for link_watch.interval.

Reported-by: LiLiang <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 man/teamd.conf.5     | 10 ++++++++++
 teamd/teamd_lw_psr.c | 11 ++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

Comments

Jamie Bainbridge March 5, 2019, 2:45 a.m. UTC | #1
A default interval is a good idea, bonding does this too for miimon,
but 100ms is too small for the ARP monitor.

It is possible for many ARP requests to overwhelm an ARP target. A /24
subnet of 100ms ARP monitors usually achieves this as it generates
2500 ARP requests per second (253 other hosts * 10 per second).

The default ARP monitor interval should be at least 1000ms, a tenfold
reduction in ARP requests from the above.

Suggest revise this patch to either:
- set default 100 for ethtool watch, and 1000 for ARP and NA/NS watch
- set default 1000 for all watchers

Jamie

On Mon, 4 Mar 2019 at 16:55, Hangbin Liu <liuhangbin@gmail.com> wrote:
>
> As we don't have a default value for link_watch.interval. If user
> forgot to set this parameter, teamd will failed to init port priv and
> exit at the end.
>
> e.g.
> teamd -g -c '{"runner":{"name":"activebackup"},"link_watch":{"name":"arp_ping","target_host":"198.51.100.1"}}'
> teamdctl team0 port add p5p1
> teamdctl team0 port add p5p2
>
> teamd debug log shows:
> p5p2: Got link watch from global config.
> p5p2: Using sticky "0".
> Failed to get "interval" link-watch option.
> Failed to load options.
> Failed to init port priv.
> Callback named "lw_periodic" not found.
> Callback named "lw_socket" not found.
> Loop callback failed with: Invalid argument
> Failed loop callback: libteam_events, 0x5624c28b9410
> select() failed.
> Exiting...
> Removed loop callback: usock_acc_conn, 0x5624c28bab60
> Removed loop callback: usock, 0x5624c28b9410
> Removed loop callback: workq, 0x5624c28b9410
> Removed loop callback: libteam_events, 0x5624c28b9410
> Removed loop callback: daemon, 0x5624c28b9410
> Failed: Bad file descriptor
>
> Fix it by adding a default value for link_watch.interval.
>
> Reported-by: LiLiang <liali@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  man/teamd.conf.5     | 10 ++++++++++
>  teamd/teamd_lw_psr.c | 11 ++++++++---
>  2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/man/teamd.conf.5 b/man/teamd.conf.5
> index 9bdf46a..6d5374a 100644
> --- a/man/teamd.conf.5
> +++ b/man/teamd.conf.5
> @@ -308,6 +308,11 @@ Default:
>  .TP
>  .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int)
>  Value is a positive number in milliseconds. It is the interval between ARP requests being sent.
> +.RS 7
> +.PP
> +Default:
> +.BR "100"
> +.RE
>  .TP
>  .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int)
>  Value is a positive number in milliseconds. It is the delay between link watch initialization and the first ARP request being sent.
> @@ -371,6 +376,11 @@ Default:
>  .TP
>  .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int)
>  Value is a positive number in milliseconds. It is the interval between sending NS packets.
> +.RS 7
> +.PP
> +Default:
> +.BR "100"
> +.RE
>  .TP
>  .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int)
>  Value is a positive number in milliseconds. It is the delay between link watch initialization and the first NS packet being sent.
> diff --git a/teamd/teamd_lw_psr.c b/teamd/teamd_lw_psr.c
> index c0772db..3ddf295 100644
> --- a/teamd/teamd_lw_psr.c
> +++ b/teamd/teamd_lw_psr.c
> @@ -29,6 +29,7 @@
>
>  static const struct timespec lw_psr_default_init_wait = { 0, 1 };
>  #define LW_PSR_DEFAULT_MISSED_MAX 3
> +#define LW_PSR_DEFAULT_INTERVAL 100
>
>  #define LW_PERIODIC_CB_NAME "lw_periodic"
>  static int lw_psr_callback_periodic(struct teamd_context *ctx, int events, void *priv)
> @@ -77,9 +78,13 @@ static int lw_psr_load_options(struct teamd_context *ctx,
>         int tmp;
>
>         err = teamd_config_int_get(ctx, &tmp, "@.interval", cpcookie);
> -       if (err) {
> -               teamd_log_err("Failed to get \"interval\" link-watch option.");
> -               return -EINVAL;
> +       if (!err) {
> +               if (tmp < 0) {
> +                       teamd_log_err("\"interval\" must not be negative number.");
> +                       return -EINVAL;
> +               }
> +       } else {
> +               tmp = LW_PSR_DEFAULT_INTERVAL;
>         }
>         teamd_log_dbg("interval \"%d\".", tmp);
>         ms_to_timespec(&psr_ppriv->interval, tmp);
> --
> 2.19.2
> _______________________________________________
> libteam mailing list -- libteam@lists.fedorahosted.org
> To unsubscribe send an email to libteam-leave@lists.fedorahosted.org
> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedorahosted.org/archives/list/libteam@lists.fedorahosted.org
Hangbin Liu March 5, 2019, 3:37 a.m. UTC | #2
Hi Jamie,
On Tue, Mar 05, 2019 at 12:45:05PM +1000, Jamie Bainbridge wrote:
> A default interval is a good idea, bonding does this too for miimon,
> but 100ms is too small for the ARP monitor.

Yes, I also thought about this before post the patch. At latest I set the
default value to 100 based on the config example teamd.conf doc.

> 
> It is possible for many ARP requests to overwhelm an ARP target. A /24
> subnet of 100ms ARP monitors usually achieves this as it generates
> 2500 ARP requests per second (253 other hosts * 10 per second).
> 
> The default ARP monitor interval should be at least 1000ms, a tenfold
> reduction in ARP requests from the above.
> 
> Suggest revise this patch to either:
> - set default 100 for ethtool watch, and 1000 for ARP and NA/NS watch

I'd prefer this one, as ethool is based on notify message and won't cause
message flood.

Thanks
Hangbin
> - set default 1000 for all watchers
> 
> Jamie
> 
> On Mon, 4 Mar 2019 at 16:55, Hangbin Liu <liuhangbin@gmail.com> wrote:
> >
> > As we don't have a default value for link_watch.interval. If user
> > forgot to set this parameter, teamd will failed to init port priv and
> > exit at the end.
> >
> > e.g.
> > teamd -g -c '{"runner":{"name":"activebackup"},"link_watch":{"name":"arp_ping","target_host":"198.51.100.1"}}'
> > teamdctl team0 port add p5p1
> > teamdctl team0 port add p5p2
> >
> > teamd debug log shows:
> > p5p2: Got link watch from global config.
> > p5p2: Using sticky "0".
> > Failed to get "interval" link-watch option.
> > Failed to load options.
> > Failed to init port priv.
> > Callback named "lw_periodic" not found.
> > Callback named "lw_socket" not found.
> > Loop callback failed with: Invalid argument
> > Failed loop callback: libteam_events, 0x5624c28b9410
> > select() failed.
> > Exiting...
> > Removed loop callback: usock_acc_conn, 0x5624c28bab60
> > Removed loop callback: usock, 0x5624c28b9410
> > Removed loop callback: workq, 0x5624c28b9410
> > Removed loop callback: libteam_events, 0x5624c28b9410
> > Removed loop callback: daemon, 0x5624c28b9410
> > Failed: Bad file descriptor
> >
> > Fix it by adding a default value for link_watch.interval.
> >
> > Reported-by: LiLiang <liali@redhat.com>
> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> > ---
> >  man/teamd.conf.5     | 10 ++++++++++
> >  teamd/teamd_lw_psr.c | 11 ++++++++---
> >  2 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/man/teamd.conf.5 b/man/teamd.conf.5
> > index 9bdf46a..6d5374a 100644
> > --- a/man/teamd.conf.5
> > +++ b/man/teamd.conf.5
> > @@ -308,6 +308,11 @@ Default:
> >  .TP
> >  .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int)
> >  Value is a positive number in milliseconds. It is the interval between ARP requests being sent.
> > +.RS 7
> > +.PP
> > +Default:
> > +.BR "100"
> > +.RE
> >  .TP
> >  .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int)
> >  Value is a positive number in milliseconds. It is the delay between link watch initialization and the first ARP request being sent.
> > @@ -371,6 +376,11 @@ Default:
> >  .TP
> >  .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int)
> >  Value is a positive number in milliseconds. It is the interval between sending NS packets.
> > +.RS 7
> > +.PP
> > +Default:
> > +.BR "100"
> > +.RE
> >  .TP
> >  .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int)
> >  Value is a positive number in milliseconds. It is the delay between link watch initialization and the first NS packet being sent.
> > diff --git a/teamd/teamd_lw_psr.c b/teamd/teamd_lw_psr.c
> > index c0772db..3ddf295 100644
> > --- a/teamd/teamd_lw_psr.c
> > +++ b/teamd/teamd_lw_psr.c
> > @@ -29,6 +29,7 @@
> >
> >  static const struct timespec lw_psr_default_init_wait = { 0, 1 };
> >  #define LW_PSR_DEFAULT_MISSED_MAX 3
> > +#define LW_PSR_DEFAULT_INTERVAL 100
> >
> >  #define LW_PERIODIC_CB_NAME "lw_periodic"
> >  static int lw_psr_callback_periodic(struct teamd_context *ctx, int events, void *priv)
> > @@ -77,9 +78,13 @@ static int lw_psr_load_options(struct teamd_context *ctx,
> >         int tmp;
> >
> >         err = teamd_config_int_get(ctx, &tmp, "@.interval", cpcookie);
> > -       if (err) {
> > -               teamd_log_err("Failed to get \"interval\" link-watch option.");
> > -               return -EINVAL;
> > +       if (!err) {
> > +               if (tmp < 0) {
> > +                       teamd_log_err("\"interval\" must not be negative number.");
> > +                       return -EINVAL;
> > +               }
> > +       } else {
> > +               tmp = LW_PSR_DEFAULT_INTERVAL;
> >         }
> >         teamd_log_dbg("interval \"%d\".", tmp);
> >         ms_to_timespec(&psr_ppriv->interval, tmp);
> > --
> > 2.19.2
> > _______________________________________________
> > libteam mailing list -- libteam@lists.fedorahosted.org
> > To unsubscribe send an email to libteam-leave@lists.fedorahosted.org
> > Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > List Archives: https://lists.fedorahosted.org/archives/list/libteam@lists.fedorahosted.org

Patch
diff mbox series

diff --git a/man/teamd.conf.5 b/man/teamd.conf.5
index 9bdf46a..6d5374a 100644
--- a/man/teamd.conf.5
+++ b/man/teamd.conf.5
@@ -308,6 +308,11 @@  Default:
 .TP
 .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int)
 Value is a positive number in milliseconds. It is the interval between ARP requests being sent.
+.RS 7
+.PP
+Default:
+.BR "100"
+.RE
 .TP
 .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int)
 Value is a positive number in milliseconds. It is the delay between link watch initialization and the first ARP request being sent.
@@ -371,6 +376,11 @@  Default:
 .TP
 .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int)
 Value is a positive number in milliseconds. It is the interval between sending NS packets.
+.RS 7
+.PP
+Default:
+.BR "100"
+.RE
 .TP
 .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int)
 Value is a positive number in milliseconds. It is the delay between link watch initialization and the first NS packet being sent.
diff --git a/teamd/teamd_lw_psr.c b/teamd/teamd_lw_psr.c
index c0772db..3ddf295 100644
--- a/teamd/teamd_lw_psr.c
+++ b/teamd/teamd_lw_psr.c
@@ -29,6 +29,7 @@ 
 
 static const struct timespec lw_psr_default_init_wait = { 0, 1 };
 #define LW_PSR_DEFAULT_MISSED_MAX 3
+#define LW_PSR_DEFAULT_INTERVAL 100
 
 #define LW_PERIODIC_CB_NAME "lw_periodic"
 static int lw_psr_callback_periodic(struct teamd_context *ctx, int events, void *priv)
@@ -77,9 +78,13 @@  static int lw_psr_load_options(struct teamd_context *ctx,
 	int tmp;
 
 	err = teamd_config_int_get(ctx, &tmp, "@.interval", cpcookie);
-	if (err) {
-		teamd_log_err("Failed to get \"interval\" link-watch option.");
-		return -EINVAL;
+	if (!err) {
+		if (tmp < 0) {
+			teamd_log_err("\"interval\" must not be negative number.");
+			return -EINVAL;
+		}
+	} else {
+		tmp = LW_PSR_DEFAULT_INTERVAL;
 	}
 	teamd_log_dbg("interval \"%d\".", tmp);
 	ms_to_timespec(&psr_ppriv->interval, tmp);