diff mbox

generic/084: check inotify limit before tail many files

Message ID 1439482592-29611-1-git-send-email-zlang@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Zorro Lang Aug. 13, 2015, 4:16 p.m. UTC
generic/084 try to run 'tail' command, tail will use
inotify, and there're some limit about inotify. I think
the most important is fs.inotify.max_user_instances, then
fs.inotify.max_user_watches is importand too.

When I test on a machine with 154 cpu cores, this case
run failed, and hit many warning likes:

    tail: inotify cannot be used, reverting to polling: Too many
    open files

Because the fs.inotify.max_user_instances is 128, so if
we try to tail 154 files, it will be failed.

Of course, open files limit will effect this too. But generally
max_user_instances is the minimum number, so I don't check open
file limit.

Signed-off-by: Zorro Lang <zlang@redhat.com>
---
 tests/generic/084 | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Eryu Guan Aug. 14, 2015, 6:58 a.m. UTC | #1
On Fri, Aug 14, 2015 at 12:16:32AM +0800, Zorro Lang wrote:
> generic/084 try to run 'tail' command, tail will use
> inotify, and there're some limit about inotify. I think
> the most important is fs.inotify.max_user_instances, then
> fs.inotify.max_user_watches is importand too.

max_user_watches is much larger (usually 8k?) than max_user_instances
(128), I think we can skip the max_user_watches check here.

> 
> When I test on a machine with 154 cpu cores, this case
> run failed, and hit many warning likes:
> 
>     tail: inotify cannot be used, reverting to polling: Too many
>     open files

No need to wrap lines for quoted logs.

> 
> Because the fs.inotify.max_user_instances is 128, so if
> we try to tail 154 files, it will be failed.
> 
> Of course, open files limit will effect this too. But generally
> max_user_instances is the minimum number, so I don't check open
> file limit.
> 
> Signed-off-by: Zorro Lang <zlang@redhat.com>
> ---
>  tests/generic/084 | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tests/generic/084 b/tests/generic/084
> index 3fec6c2..f34d4b2 100755
> --- a/tests/generic/084
> +++ b/tests/generic/084
> @@ -63,6 +63,15 @@ link_unlink_storm()
>  
>  rm -f $seqres.full
>  nr_cpu=`$here/src/feature -o`
> +user_watches=`sysctl -n fs.inotify.max_user_watches`
> +user_instances=`sysctl -n fs.inotify.max_user_instances`

I think we need some comments in the code too.

> +
> +user_limit=$((user_watches > user_instances ? user_instances : user_watches))

Please use plain if:then

> +if [ $nr_cpu -ge $((user_limit/2)) ]
> +then

if []; then
fi

this is the preferred format in xfstests

> +	nr_cpu=$((user_limit/2))

Need comments to explain why divide by 2.

Thanks,
Eryu
> +fi
> +
>  echo "Silence is golden"
>  
>  _scratch_mkfs >>$seqres.full 2>&1
> -- 
> 1.9.3
> 
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Aug. 17, 2015, 12:03 a.m. UTC | #2
On Fri, Aug 14, 2015 at 12:16:32AM +0800, Zorro Lang wrote:
> generic/084 try to run 'tail' command, tail will use
> inotify, and there're some limit about inotify. I think
> the most important is fs.inotify.max_user_instances, then
> fs.inotify.max_user_watches is importand too.
>
> When I test on a machine with 154 cpu cores, this case
> run failed, and hit many warning likes:
> 
>     tail: inotify cannot be used, reverting to polling: Too many
>     open files
> 
> Because the fs.inotify.max_user_instances is 128, so if
> we try to tail 154 files, it will be failed.

We use 'tail' all over the place in xfstests, so why is only
generic/084 affected?

And really, this seems more like a distro/environment bug and
doesn't need xfstests help to work around. i.e. changing the
sysctl before starting xfstests seems much more appropriate than
hacking it a random test. Especially as there may be more than one
test that is affected by this, and when run in a random order this
would cause those other tests to pass/fail depending on whether
generic/084 had already been run on that machine....

Cheers,

Dave.
Zorro Lang Aug. 17, 2015, 1:05 a.m. UTC | #3
Hi Dave,

Thanks for your reply.

----- ???? -----
> ???: "Dave Chinner" <david@fromorbit.com>
> ???: "Zorro Lang" <zlang@redhat.com>
> ??: fstests@vger.kernel.org, eguan@redhat.com
> ????: ???, 2015? 8 ? 17? ?? 8:03:36
> ??: Re: [PATCH] generic/084: check inotify limit before tail many files
> 
> On Fri, Aug 14, 2015 at 12:16:32AM +0800, Zorro Lang wrote:
> > generic/084 try to run 'tail' command, tail will use
> > inotify, and there're some limit about inotify. I think
> > the most important is fs.inotify.max_user_instances, then
> > fs.inotify.max_user_watches is importand too.
> >
> > When I test on a machine with 154 cpu cores, this case
> > run failed, and hit many warning likes:
> > 
> >     tail: inotify cannot be used, reverting to polling: Too many
> >     open files
> > 
> > Because the fs.inotify.max_user_instances is 128, so if
> > we try to tail 154 files, it will be failed.
> 
> We use 'tail' all over the place in xfstests, so why is only
> generic/084 affected?

Because generic/084 use try to create $nr_cpu tail processes:
for i in `seq 1 $nr_cpu`; do
    ...
    tail -f $testfile &
    ...
done

And nr_cpu=`$here/src/feature -o`.

Generally fs.inotify.max_user_instances is 128, when a machine
have more than(or nearly the same) this number, this test will
failed.

Maybe other cases don't try to create so many tail processes, so
they passed.

> 
> And really, this seems more like a distro/environment bug and
> doesn't need xfstests help to work around. i.e. changing the
> sysctl before starting xfstests seems much more appropriate than
> hacking it a random test. Especially as there may be more than one
> test that is affected by this, and when run in a random order this
> would cause those other tests to pass/fail depending on whether
> generic/084 had already been run on that machine....

As this situation generally won't effect other cases, so I try to
make the patch don't effect more other cases. Because I don't know
if someone try to test fs.inotify...

But if maintainer think it'll be better, I will change the patch:)

Thanks,
Zorro Lang

> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> 
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Aug. 17, 2015, 5:06 a.m. UTC | #4
On Sun, Aug 16, 2015 at 09:05:19PM -0400, Zirong Lang wrote:
> Hi Dave,
> 
> Thanks for your reply.
> 
> ----- ???? -----
> > ???: "Dave Chinner" <david@fromorbit.com>
> > ???: "Zorro Lang" <zlang@redhat.com>
> > ??: fstests@vger.kernel.org, eguan@redhat.com
> > ????: ???, 2015? 8 ? 17? ?? 8:03:36
> > ??: Re: [PATCH] generic/084: check inotify limit before tail many files
> > 
> > On Fri, Aug 14, 2015 at 12:16:32AM +0800, Zorro Lang wrote:
> > > generic/084 try to run 'tail' command, tail will use
> > > inotify, and there're some limit about inotify. I think
> > > the most important is fs.inotify.max_user_instances, then
> > > fs.inotify.max_user_watches is importand too.
> > >
> > > When I test on a machine with 154 cpu cores, this case
> > > run failed, and hit many warning likes:
> > > 
> > >     tail: inotify cannot be used, reverting to polling: Too many
> > >     open files
> > > 
> > > Because the fs.inotify.max_user_instances is 128, so if
> > > we try to tail 154 files, it will be failed.
> > 
> > We use 'tail' all over the place in xfstests, so why is only
> > generic/084 affected?
> 
> Because generic/084 use try to create $nr_cpu tail processes:
> for i in `seq 1 $nr_cpu`; do
>     ...
>     tail -f $testfile &
>     ...
> done
> 
> And nr_cpu=`$here/src/feature -o`.
> 
> Generally fs.inotify.max_user_instances is 128, when a machine
> have more than(or nearly the same) this number, this test will
> failed.

This information should have been in the patch description - it's
concurrently run tail commands that are the problem, not a single
execution of tail...

> Maybe other cases don't try to create so many tail processes, so
> they passed.

Exactly. The answer is obvious when you explain it fully :)

So, we have other tests that use hundreds of open unlinked files and
they don't have this problem. That means the issue is how
generic/084 is creating the unlinked files, not an issue with
inotify config.

Go look at src/multi_open_unlink.c and tests/xfs/1[28]1, and then
rewrite generic/084 to use multi_open_unlink to create and hold 
open unlinked files for a specified amount of time.

Cheers,

Dave.
Zorro Lang Aug. 17, 2015, 1:55 p.m. UTC | #5
----- ???? -----
> ???: "Dave Chinner" <david@fromorbit.com>
> ???: "Zirong Lang" <zlang@redhat.com>
> ??: fstests@vger.kernel.org, eguan@redhat.com
> ????: ???, 2015? 8 ? 17? ?? 1:06:12
> ??: Re: [PATCH] generic/084: check inotify limit before tail many files
> 
> On Sun, Aug 16, 2015 at 09:05:19PM -0400, Zirong Lang wrote:
> > Hi Dave,
> > 
> > Thanks for your reply.
> > 
> > ----- ???? -----
> > > ???: "Dave Chinner" <david@fromorbit.com>
> > > ???: "Zorro Lang" <zlang@redhat.com>
> > > ??: fstests@vger.kernel.org, eguan@redhat.com
> > > ????: ???, 2015? 8 ? 17? ?? 8:03:36
> > > ??: Re: [PATCH] generic/084: check inotify limit before tail many files
> > > 
> > > On Fri, Aug 14, 2015 at 12:16:32AM +0800, Zorro Lang wrote:
> > > > generic/084 try to run 'tail' command, tail will use
> > > > inotify, and there're some limit about inotify. I think
> > > > the most important is fs.inotify.max_user_instances, then
> > > > fs.inotify.max_user_watches is importand too.
> > > >
> > > > When I test on a machine with 154 cpu cores, this case
> > > > run failed, and hit many warning likes:
> > > > 
> > > >     tail: inotify cannot be used, reverting to polling: Too many
> > > >     open files
> > > > 
> > > > Because the fs.inotify.max_user_instances is 128, so if
> > > > we try to tail 154 files, it will be failed.
> > > 
> > > We use 'tail' all over the place in xfstests, so why is only
> > > generic/084 affected?
> > 
> > Because generic/084 use try to create $nr_cpu tail processes:
> > for i in `seq 1 $nr_cpu`; do
> >     ...
> >     tail -f $testfile &
> >     ...
> > done
> > 
> > And nr_cpu=`$here/src/feature -o`.
> > 
> > Generally fs.inotify.max_user_instances is 128, when a machine
> > have more than(or nearly the same) this number, this test will
> > failed.
> 
> This information should have been in the patch description - it's
> concurrently run tail commands that are the problem, not a single
> execution of tail...
> 
> > Maybe other cases don't try to create so many tail processes, so
> > they passed.
> 
> Exactly. The answer is obvious when you explain it fully :)
> 
> So, we have other tests that use hundreds of open unlinked files and
> they don't have this problem. That means the issue is how
> generic/084 is creating the unlinked files, not an issue with
> inotify config.
> 
> Go look at src/multi_open_unlink.c and tests/xfs/1[28]1, and then
> rewrite generic/084 to use multi_open_unlink to create and hold
> open unlinked files for a specified amount of time.

You're Great! This's a good idea to fix this problem. Thanks very much.
I have sent the V3 patch, and tested on an old kernel to sure
the bug still can be reproduced after I changed.

Thanks,
Zorro Lang

> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> 
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/tests/generic/084 b/tests/generic/084
index 3fec6c2..f34d4b2 100755
--- a/tests/generic/084
+++ b/tests/generic/084
@@ -63,6 +63,15 @@  link_unlink_storm()
 
 rm -f $seqres.full
 nr_cpu=`$here/src/feature -o`
+user_watches=`sysctl -n fs.inotify.max_user_watches`
+user_instances=`sysctl -n fs.inotify.max_user_instances`
+
+user_limit=$((user_watches > user_instances ? user_instances : user_watches))
+if [ $nr_cpu -ge $((user_limit/2)) ]
+then
+	nr_cpu=$((user_limit/2))
+fi
+
 echo "Silence is golden"
 
 _scratch_mkfs >>$seqres.full 2>&1