diff mbox

[for,4.9] vif-common.sh: Have iptables wait for the xtables lock

Message ID 1496656950-15815-1-git-send-email-george.dunlap@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

George Dunlap June 5, 2017, 10:02 a.m. UTC
iptables has a system-wide lock on the xtables.  Strangely though, in
the case of two concurrent invocations, the default is for the
instance not grabbing the lock to exit out rather than waiting for it.
This means that when starting a large number of guests in parallel,
many will fail out with messages like this:

  2017-05-10 11:45:40 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge remove [18767] exited with error status 4
  2017-05-10 11:50:52 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge offline [1554] exited with error status 4

In order to instruct iptables to wait for the lock, you have to
specify '-w'.  Unfortunately, not all versions of iptables have the
'-w' option, so on first invocation check to see if it accepts the -w
command.

Reported-by: Antony Saba <awsaba@gmail.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/hotplug/Linux/vif-common.sh | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

Comments

George Dunlap June 5, 2017, 11:03 a.m. UTC | #1
Forgot to cc' the release manager.

On Mon, Jun 5, 2017 at 11:02 AM, George Dunlap <george.dunlap@citrix.com> wrote:
> iptables has a system-wide lock on the xtables.  Strangely though, in
> the case of two concurrent invocations, the default is for the
> instance not grabbing the lock to exit out rather than waiting for it.
> This means that when starting a large number of guests in parallel,
> many will fail out with messages like this:
>
>   2017-05-10 11:45:40 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge remove [18767] exited with error status 4
>   2017-05-10 11:50:52 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge offline [1554] exited with error status 4
>
> In order to instruct iptables to wait for the lock, you have to
> specify '-w'.  Unfortunately, not all versions of iptables have the
> '-w' option, so on first invocation check to see if it accepts the -w
> command.
>
> Reported-by: Antony Saba <awsaba@gmail.com>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/hotplug/Linux/vif-common.sh | 38 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/tools/hotplug/Linux/vif-common.sh b/tools/hotplug/Linux/vif-common.sh
> index 6e8d584..29cd8dd 100644
> --- a/tools/hotplug/Linux/vif-common.sh
> +++ b/tools/hotplug/Linux/vif-common.sh
> @@ -120,6 +120,38 @@ fi
>  ip=${ip:-}
>  ip=$(xenstore_read_default "$XENBUS_PATH/ip" "$ip")
>
> +IPTABLES_WAIT_RUNE="-w"
> +IPTABLES_WAIT_RUNE_CHECKED=false
> +
> +# When iptables introduced locking, in the event of lock contention,
> +# they made "fail" rather than "wait for the lock" the default
> +# behavior.  In order to select "wait for the lock" behavior, you have
> +# to add the '-w' parameter.  Unfortinately, both the locking and the
> +# option were only introduced in 2013, and older versions of iptables
> +# will fail if the '-w' parameter is included (since they don't
> +# recognize it).  So check to see if it's supported the first time we
> +# use it.
> +iptables_w()
> +{
> +    if ! $IPTABLES_WAIT_RUNE_CHECKED ; then
> +       iptables $IPTABLES_WAIT_RUNE -L -n >& /dev/null
> +       if [[ $? == 0 ]] ; then
> +           # If we succeed, then -w is supported; don't check again
> +           IPTABLES_WAIT_RUNE_CHECKED=true
> +       elif [[ $? == 2 ]] ; then
> +           iptables -L -n >& /dev/null
> +           if [[ $? != 2 ]] ; then
> +               # If we fail with PARAMETER_PROBLEM (2) with -w and
> +               # don't fail with PARAMETER_PROBLEM without it, then
> +               # it's the -w option
> +               IPTABLES_WAIT_RUNE_CHECKED=true
> +               IPTABLES_WAIT_RUNE=""
> +           fi
> +       fi
> +    fi
> +    iptables $IPTABLES_WAIT_RUNE "$@"
> +}
> +
>  frob_iptable()
>  {
>    if [ "$command" == "online" -o "$command" == "add" ]
> @@ -129,9 +161,9 @@ frob_iptable()
>      local c="-D"
>    fi
>
> -  iptables "$c" FORWARD -m physdev --physdev-is-bridged --physdev-in "$dev" \
> +  iptables_w "$c" FORWARD -m physdev --physdev-is-bridged --physdev-in "$dev" \
>      "$@" -j ACCEPT 2>/dev/null &&
> -  iptables "$c" FORWARD -m physdev --physdev-is-bridged --physdev-out "$dev" \
> +  iptables_w "$c" FORWARD -m physdev --physdev-is-bridged --physdev-out "$dev" \
>      -j ACCEPT 2>/dev/null
>
>    if [ \( "$command" == "online" -o "$command" == "add" \) -a $? -ne 0 ]
> @@ -154,7 +186,7 @@ handle_iptable()
>    # binary is not sufficient, because the user may not have the appropriate
>    # modules installed.  If iptables is not working, then there's no need to do
>    # anything with it, so we can just return.
> -  if ! iptables -L -n >&/dev/null
> +  if ! iptables_w -L -n >&/dev/null
>    then
>      return
>    fi
> --
> 2.1.4
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
Ian Jackson June 5, 2017, 4:07 p.m. UTC | #2
George Dunlap writes ("[PATCH for 4.9] vif-common.sh: Have iptables wait for the xtables lock"):
> iptables has a system-wide lock on the xtables.  Strangely though, in
> the case of two concurrent invocations, the default is for the
> instance not grabbing the lock to exit out rather than waiting for it.
> This means that when starting a large number of guests in parallel,
> many will fail out with messages like this:

What a mess, eh ?

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall June 6, 2017, 4:28 p.m. UTC | #3
Hi George,

On 05/06/17 12:03, George Dunlap wrote:
> Forgot to cc' the release manager.
>
> On Mon, Jun 5, 2017 at 11:02 AM, George Dunlap <george.dunlap@citrix.com> wrote:
>> iptables has a system-wide lock on the xtables.  Strangely though, in
>> the case of two concurrent invocations, the default is for the
>> instance not grabbing the lock to exit out rather than waiting for it.
>> This means that when starting a large number of guests in parallel,
>> many will fail out with messages like this:
>>
>>   2017-05-10 11:45:40 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge remove [18767] exited with error status 4
>>   2017-05-10 11:50:52 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge offline [1554] exited with error status 4
>>
>> In order to instruct iptables to wait for the lock, you have to
>> specify '-w'.  Unfortunately, not all versions of iptables have the
>> '-w' option, so on first invocation check to see if it accepts the -w
>> command.
>>
>> Reported-by: Antony Saba <awsaba@gmail.com>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> ---
>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> ---
>>  tools/hotplug/Linux/vif-common.sh | 38 +++++++++++++++++++++++++++++++++++---
>>  1 file changed, 35 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/hotplug/Linux/vif-common.sh b/tools/hotplug/Linux/vif-common.sh
>> index 6e8d584..29cd8dd 100644
>> --- a/tools/hotplug/Linux/vif-common.sh
>> +++ b/tools/hotplug/Linux/vif-common.sh
>> @@ -120,6 +120,38 @@ fi
>>  ip=${ip:-}
>>  ip=$(xenstore_read_default "$XENBUS_PATH/ip" "$ip")
>>
>> +IPTABLES_WAIT_RUNE="-w"
>> +IPTABLES_WAIT_RUNE_CHECKED=false
>> +
>> +# When iptables introduced locking, in the event of lock contention,
>> +# they made "fail" rather than "wait for the lock" the default
>> +# behavior.  In order to select "wait for the lock" behavior, you have
>> +# to add the '-w' parameter.  Unfortinately, both the locking and the

NIT: s/Unfortinately/Unfortunately/

Release-acked-by: Julien Grall <julien.grall@arm.com>

Cheers,
Wei Liu June 6, 2017, 4:46 p.m. UTC | #4
On Tue, Jun 06, 2017 at 05:28:58PM +0100, Julien Grall wrote:
> Hi George,
> 
> On 05/06/17 12:03, George Dunlap wrote:
> > Forgot to cc' the release manager.
> > 
> > On Mon, Jun 5, 2017 at 11:02 AM, George Dunlap <george.dunlap@citrix.com> wrote:
> > > iptables has a system-wide lock on the xtables.  Strangely though, in
> > > the case of two concurrent invocations, the default is for the
> > > instance not grabbing the lock to exit out rather than waiting for it.
> > > This means that when starting a large number of guests in parallel,
> > > many will fail out with messages like this:
> > > 
> > >   2017-05-10 11:45:40 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge remove [18767] exited with error status 4
> > >   2017-05-10 11:50:52 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge offline [1554] exited with error status 4
> > > 
> > > In order to instruct iptables to wait for the lock, you have to
> > > specify '-w'.  Unfortunately, not all versions of iptables have the
> > > '-w' option, so on first invocation check to see if it accepts the -w
> > > command.
> > > 
> > > Reported-by: Antony Saba <awsaba@gmail.com>
> > > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> > > ---
> > > CC: Ian Jackson <ian.jackson@eu.citrix.com>
> > > CC: Wei Liu <wei.liu2@citrix.com>
> > > ---
> > >  tools/hotplug/Linux/vif-common.sh | 38 +++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 35 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/tools/hotplug/Linux/vif-common.sh b/tools/hotplug/Linux/vif-common.sh
> > > index 6e8d584..29cd8dd 100644
> > > --- a/tools/hotplug/Linux/vif-common.sh
> > > +++ b/tools/hotplug/Linux/vif-common.sh
> > > @@ -120,6 +120,38 @@ fi
> > >  ip=${ip:-}
> > >  ip=$(xenstore_read_default "$XENBUS_PATH/ip" "$ip")
> > > 
> > > +IPTABLES_WAIT_RUNE="-w"
> > > +IPTABLES_WAIT_RUNE_CHECKED=false
> > > +
> > > +# When iptables introduced locking, in the event of lock contention,
> > > +# they made "fail" rather than "wait for the lock" the default
> > > +# behavior.  In order to select "wait for the lock" behavior, you have
> > > +# to add the '-w' parameter.  Unfortinately, both the locking and the
> 
> NIT: s/Unfortinately/Unfortunately/
> 
> Release-acked-by: Julien Grall <julien.grall@arm.com>
> 

Fixed the typo and committed to staging and staging-4.9.
diff mbox

Patch

diff --git a/tools/hotplug/Linux/vif-common.sh b/tools/hotplug/Linux/vif-common.sh
index 6e8d584..29cd8dd 100644
--- a/tools/hotplug/Linux/vif-common.sh
+++ b/tools/hotplug/Linux/vif-common.sh
@@ -120,6 +120,38 @@  fi
 ip=${ip:-}
 ip=$(xenstore_read_default "$XENBUS_PATH/ip" "$ip")
 
+IPTABLES_WAIT_RUNE="-w"
+IPTABLES_WAIT_RUNE_CHECKED=false
+
+# When iptables introduced locking, in the event of lock contention,
+# they made "fail" rather than "wait for the lock" the default
+# behavior.  In order to select "wait for the lock" behavior, you have
+# to add the '-w' parameter.  Unfortinately, both the locking and the
+# option were only introduced in 2013, and older versions of iptables
+# will fail if the '-w' parameter is included (since they don't
+# recognize it).  So check to see if it's supported the first time we
+# use it.
+iptables_w()
+{
+    if ! $IPTABLES_WAIT_RUNE_CHECKED ; then
+	iptables $IPTABLES_WAIT_RUNE -L -n >& /dev/null
+	if [[ $? == 0 ]] ; then
+	    # If we succeed, then -w is supported; don't check again
+	    IPTABLES_WAIT_RUNE_CHECKED=true
+	elif [[ $? == 2 ]] ; then
+	    iptables -L -n >& /dev/null
+	    if [[ $? != 2 ]] ; then
+		# If we fail with PARAMETER_PROBLEM (2) with -w and
+		# don't fail with PARAMETER_PROBLEM without it, then
+		# it's the -w option
+		IPTABLES_WAIT_RUNE_CHECKED=true
+		IPTABLES_WAIT_RUNE=""
+	    fi
+	fi
+    fi
+    iptables $IPTABLES_WAIT_RUNE "$@"
+}
+
 frob_iptable()
 {
   if [ "$command" == "online" -o "$command" == "add" ]
@@ -129,9 +161,9 @@  frob_iptable()
     local c="-D"
   fi
 
-  iptables "$c" FORWARD -m physdev --physdev-is-bridged --physdev-in "$dev" \
+  iptables_w "$c" FORWARD -m physdev --physdev-is-bridged --physdev-in "$dev" \
     "$@" -j ACCEPT 2>/dev/null &&
-  iptables "$c" FORWARD -m physdev --physdev-is-bridged --physdev-out "$dev" \
+  iptables_w "$c" FORWARD -m physdev --physdev-is-bridged --physdev-out "$dev" \
     -j ACCEPT 2>/dev/null
 
   if [ \( "$command" == "online" -o "$command" == "add" \) -a $? -ne 0 ]
@@ -154,7 +186,7 @@  handle_iptable()
   # binary is not sufficient, because the user may not have the appropriate
   # modules installed.  If iptables is not working, then there's no need to do
   # anything with it, so we can just return.
-  if ! iptables -L -n >&/dev/null
+  if ! iptables_w -L -n >&/dev/null
   then
     return
   fi