Message ID | 20190304063544.12297-1-liuhangbin@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | teamd: add a default value 100 for link_watch.interval | expand |
A default interval is a good idea, bonding does this too for miimon, but 100ms is too small for the ARP monitor. It is possible for many ARP requests to overwhelm an ARP target. A /24 subnet of 100ms ARP monitors usually achieves this as it generates 2500 ARP requests per second (253 other hosts * 10 per second). The default ARP monitor interval should be at least 1000ms, a tenfold reduction in ARP requests from the above. Suggest revise this patch to either: - set default 100 for ethtool watch, and 1000 for ARP and NA/NS watch - set default 1000 for all watchers Jamie On Mon, 4 Mar 2019 at 16:55, Hangbin Liu <liuhangbin@gmail.com> wrote: > > As we don't have a default value for link_watch.interval. If user > forgot to set this parameter, teamd will failed to init port priv and > exit at the end. > > e.g. > teamd -g -c '{"runner":{"name":"activebackup"},"link_watch":{"name":"arp_ping","target_host":"198.51.100.1"}}' > teamdctl team0 port add p5p1 > teamdctl team0 port add p5p2 > > teamd debug log shows: > p5p2: Got link watch from global config. > p5p2: Using sticky "0". > Failed to get "interval" link-watch option. > Failed to load options. > Failed to init port priv. > Callback named "lw_periodic" not found. > Callback named "lw_socket" not found. > Loop callback failed with: Invalid argument > Failed loop callback: libteam_events, 0x5624c28b9410 > select() failed. > Exiting... > Removed loop callback: usock_acc_conn, 0x5624c28bab60 > Removed loop callback: usock, 0x5624c28b9410 > Removed loop callback: workq, 0x5624c28b9410 > Removed loop callback: libteam_events, 0x5624c28b9410 > Removed loop callback: daemon, 0x5624c28b9410 > Failed: Bad file descriptor > > Fix it by adding a default value for link_watch.interval. > > Reported-by: LiLiang <liali@redhat.com> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> > --- > man/teamd.conf.5 | 10 ++++++++++ > teamd/teamd_lw_psr.c | 11 ++++++++--- > 2 files changed, 18 insertions(+), 3 deletions(-) > > diff --git a/man/teamd.conf.5 b/man/teamd.conf.5 > index 9bdf46a..6d5374a 100644 > --- a/man/teamd.conf.5 > +++ b/man/teamd.conf.5 > @@ -308,6 +308,11 @@ Default: > .TP > .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int) > Value is a positive number in milliseconds. It is the interval between ARP requests being sent. > +.RS 7 > +.PP > +Default: > +.BR "100" > +.RE > .TP > .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int) > Value is a positive number in milliseconds. It is the delay between link watch initialization and the first ARP request being sent. > @@ -371,6 +376,11 @@ Default: > .TP > .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int) > Value is a positive number in milliseconds. It is the interval between sending NS packets. > +.RS 7 > +.PP > +Default: > +.BR "100" > +.RE > .TP > .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int) > Value is a positive number in milliseconds. It is the delay between link watch initialization and the first NS packet being sent. > diff --git a/teamd/teamd_lw_psr.c b/teamd/teamd_lw_psr.c > index c0772db..3ddf295 100644 > --- a/teamd/teamd_lw_psr.c > +++ b/teamd/teamd_lw_psr.c > @@ -29,6 +29,7 @@ > > static const struct timespec lw_psr_default_init_wait = { 0, 1 }; > #define LW_PSR_DEFAULT_MISSED_MAX 3 > +#define LW_PSR_DEFAULT_INTERVAL 100 > > #define LW_PERIODIC_CB_NAME "lw_periodic" > static int lw_psr_callback_periodic(struct teamd_context *ctx, int events, void *priv) > @@ -77,9 +78,13 @@ static int lw_psr_load_options(struct teamd_context *ctx, > int tmp; > > err = teamd_config_int_get(ctx, &tmp, "@.interval", cpcookie); > - if (err) { > - teamd_log_err("Failed to get \"interval\" link-watch option."); > - return -EINVAL; > + if (!err) { > + if (tmp < 0) { > + teamd_log_err("\"interval\" must not be negative number."); > + return -EINVAL; > + } > + } else { > + tmp = LW_PSR_DEFAULT_INTERVAL; > } > teamd_log_dbg("interval \"%d\".", tmp); > ms_to_timespec(&psr_ppriv->interval, tmp); > -- > 2.19.2 > _______________________________________________ > libteam mailing list -- libteam@lists.fedorahosted.org > To unsubscribe send an email to libteam-leave@lists.fedorahosted.org > Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines > List Archives: https://lists.fedorahosted.org/archives/list/libteam@lists.fedorahosted.org
Hi Jamie, On Tue, Mar 05, 2019 at 12:45:05PM +1000, Jamie Bainbridge wrote: > A default interval is a good idea, bonding does this too for miimon, > but 100ms is too small for the ARP monitor. Yes, I also thought about this before post the patch. At latest I set the default value to 100 based on the config example teamd.conf doc. > > It is possible for many ARP requests to overwhelm an ARP target. A /24 > subnet of 100ms ARP monitors usually achieves this as it generates > 2500 ARP requests per second (253 other hosts * 10 per second). > > The default ARP monitor interval should be at least 1000ms, a tenfold > reduction in ARP requests from the above. > > Suggest revise this patch to either: > - set default 100 for ethtool watch, and 1000 for ARP and NA/NS watch I'd prefer this one, as ethool is based on notify message and won't cause message flood. Thanks Hangbin > - set default 1000 for all watchers > > Jamie > > On Mon, 4 Mar 2019 at 16:55, Hangbin Liu <liuhangbin@gmail.com> wrote: > > > > As we don't have a default value for link_watch.interval. If user > > forgot to set this parameter, teamd will failed to init port priv and > > exit at the end. > > > > e.g. > > teamd -g -c '{"runner":{"name":"activebackup"},"link_watch":{"name":"arp_ping","target_host":"198.51.100.1"}}' > > teamdctl team0 port add p5p1 > > teamdctl team0 port add p5p2 > > > > teamd debug log shows: > > p5p2: Got link watch from global config. > > p5p2: Using sticky "0". > > Failed to get "interval" link-watch option. > > Failed to load options. > > Failed to init port priv. > > Callback named "lw_periodic" not found. > > Callback named "lw_socket" not found. > > Loop callback failed with: Invalid argument > > Failed loop callback: libteam_events, 0x5624c28b9410 > > select() failed. > > Exiting... > > Removed loop callback: usock_acc_conn, 0x5624c28bab60 > > Removed loop callback: usock, 0x5624c28b9410 > > Removed loop callback: workq, 0x5624c28b9410 > > Removed loop callback: libteam_events, 0x5624c28b9410 > > Removed loop callback: daemon, 0x5624c28b9410 > > Failed: Bad file descriptor > > > > Fix it by adding a default value for link_watch.interval. > > > > Reported-by: LiLiang <liali@redhat.com> > > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> > > --- > > man/teamd.conf.5 | 10 ++++++++++ > > teamd/teamd_lw_psr.c | 11 ++++++++--- > > 2 files changed, 18 insertions(+), 3 deletions(-) > > > > diff --git a/man/teamd.conf.5 b/man/teamd.conf.5 > > index 9bdf46a..6d5374a 100644 > > --- a/man/teamd.conf.5 > > +++ b/man/teamd.conf.5 > > @@ -308,6 +308,11 @@ Default: > > .TP > > .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int) > > Value is a positive number in milliseconds. It is the interval between ARP requests being sent. > > +.RS 7 > > +.PP > > +Default: > > +.BR "100" > > +.RE > > .TP > > .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int) > > Value is a positive number in milliseconds. It is the delay between link watch initialization and the first ARP request being sent. > > @@ -371,6 +376,11 @@ Default: > > .TP > > .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int) > > Value is a positive number in milliseconds. It is the interval between sending NS packets. > > +.RS 7 > > +.PP > > +Default: > > +.BR "100" > > +.RE > > .TP > > .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int) > > Value is a positive number in milliseconds. It is the delay between link watch initialization and the first NS packet being sent. > > diff --git a/teamd/teamd_lw_psr.c b/teamd/teamd_lw_psr.c > > index c0772db..3ddf295 100644 > > --- a/teamd/teamd_lw_psr.c > > +++ b/teamd/teamd_lw_psr.c > > @@ -29,6 +29,7 @@ > > > > static const struct timespec lw_psr_default_init_wait = { 0, 1 }; > > #define LW_PSR_DEFAULT_MISSED_MAX 3 > > +#define LW_PSR_DEFAULT_INTERVAL 100 > > > > #define LW_PERIODIC_CB_NAME "lw_periodic" > > static int lw_psr_callback_periodic(struct teamd_context *ctx, int events, void *priv) > > @@ -77,9 +78,13 @@ static int lw_psr_load_options(struct teamd_context *ctx, > > int tmp; > > > > err = teamd_config_int_get(ctx, &tmp, "@.interval", cpcookie); > > - if (err) { > > - teamd_log_err("Failed to get \"interval\" link-watch option."); > > - return -EINVAL; > > + if (!err) { > > + if (tmp < 0) { > > + teamd_log_err("\"interval\" must not be negative number."); > > + return -EINVAL; > > + } > > + } else { > > + tmp = LW_PSR_DEFAULT_INTERVAL; > > } > > teamd_log_dbg("interval \"%d\".", tmp); > > ms_to_timespec(&psr_ppriv->interval, tmp); > > -- > > 2.19.2 > > _______________________________________________ > > libteam mailing list -- libteam@lists.fedorahosted.org > > To unsubscribe send an email to libteam-leave@lists.fedorahosted.org > > Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html > > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines > > List Archives: https://lists.fedorahosted.org/archives/list/libteam@lists.fedorahosted.org
diff --git a/man/teamd.conf.5 b/man/teamd.conf.5 index 9bdf46a..6d5374a 100644 --- a/man/teamd.conf.5 +++ b/man/teamd.conf.5 @@ -308,6 +308,11 @@ Default: .TP .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int) Value is a positive number in milliseconds. It is the interval between ARP requests being sent. +.RS 7 +.PP +Default: +.BR "100" +.RE .TP .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int) Value is a positive number in milliseconds. It is the delay between link watch initialization and the first ARP request being sent. @@ -371,6 +376,11 @@ Default: .TP .BR "link_watch.interval "| " ports.PORTIFNAME.link_watch.interval " (int) Value is a positive number in milliseconds. It is the interval between sending NS packets. +.RS 7 +.PP +Default: +.BR "100" +.RE .TP .BR "link_watch.init_wait "| " ports.PORTIFNAME.link_watch.init_wait " (int) Value is a positive number in milliseconds. It is the delay between link watch initialization and the first NS packet being sent. diff --git a/teamd/teamd_lw_psr.c b/teamd/teamd_lw_psr.c index c0772db..3ddf295 100644 --- a/teamd/teamd_lw_psr.c +++ b/teamd/teamd_lw_psr.c @@ -29,6 +29,7 @@ static const struct timespec lw_psr_default_init_wait = { 0, 1 }; #define LW_PSR_DEFAULT_MISSED_MAX 3 +#define LW_PSR_DEFAULT_INTERVAL 100 #define LW_PERIODIC_CB_NAME "lw_periodic" static int lw_psr_callback_periodic(struct teamd_context *ctx, int events, void *priv) @@ -77,9 +78,13 @@ static int lw_psr_load_options(struct teamd_context *ctx, int tmp; err = teamd_config_int_get(ctx, &tmp, "@.interval", cpcookie); - if (err) { - teamd_log_err("Failed to get \"interval\" link-watch option."); - return -EINVAL; + if (!err) { + if (tmp < 0) { + teamd_log_err("\"interval\" must not be negative number."); + return -EINVAL; + } + } else { + tmp = LW_PSR_DEFAULT_INTERVAL; } teamd_log_dbg("interval \"%d\".", tmp); ms_to_timespec(&psr_ppriv->interval, tmp);
As we don't have a default value for link_watch.interval. If user forgot to set this parameter, teamd will failed to init port priv and exit at the end. e.g. teamd -g -c '{"runner":{"name":"activebackup"},"link_watch":{"name":"arp_ping","target_host":"198.51.100.1"}}' teamdctl team0 port add p5p1 teamdctl team0 port add p5p2 teamd debug log shows: p5p2: Got link watch from global config. p5p2: Using sticky "0". Failed to get "interval" link-watch option. Failed to load options. Failed to init port priv. Callback named "lw_periodic" not found. Callback named "lw_socket" not found. Loop callback failed with: Invalid argument Failed loop callback: libteam_events, 0x5624c28b9410 select() failed. Exiting... Removed loop callback: usock_acc_conn, 0x5624c28bab60 Removed loop callback: usock, 0x5624c28b9410 Removed loop callback: workq, 0x5624c28b9410 Removed loop callback: libteam_events, 0x5624c28b9410 Removed loop callback: daemon, 0x5624c28b9410 Failed: Bad file descriptor Fix it by adding a default value for link_watch.interval. Reported-by: LiLiang <liali@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> --- man/teamd.conf.5 | 10 ++++++++++ teamd/teamd_lw_psr.c | 11 ++++++++--- 2 files changed, 18 insertions(+), 3 deletions(-)