[0/2] Test overlayfs readdir cache

Message ID 20210421092317.68716-1-amir73il@gmail.com (mailing list archive)

Message

Amir Goldstein April 21, 2021, 9:23 a.m. UTC
Eryu,

This extends the generic t_dir_offset2 test to verify
some more test cases and adds a new generic test which
passes on overlayfs (and other fs) on upstream kernel.

The overlayfs specific test fails on upstream kernel
and the fix commit is currently in linux-next.
As usual, you may want to wait with merging until the fix
commit hits upstream.

Miklos,

I had noticed in the test full logs that readdir of
a merged dir behaves strangely - when seeking backwards
to offsets > 0, readdir returns unlinked entries in results.
The test does not fail on that behavior because the test
only asserts that this is not allowed after seek to offset 0.

Knowing the implementation of overlayfs readdir cache this is
not surprising to me, but I wonder if this behavior is POSIX
compliant, and if not, whether we should document it and/or
add a failing test for it.

Thanks,
Amir.

Amir Goldstein (2):
  generic: Test readdir of modified directory
  overlay: Test invalidate of readdir cache

 src/t_dir_offset2.c   |  63 +++++++++++++++++++++++--
 tests/generic/700     |  60 ++++++++++++++++++++++++
 tests/generic/700.out |   2 +
 tests/generic/group   |   1 +
 tests/overlay/077     | 105 ++++++++++++++++++++++++++++++++++++++++++
 tests/overlay/077.out |   2 +
 tests/overlay/group   |   1 +
 7 files changed, 231 insertions(+), 3 deletions(-)
 create mode 100755 tests/generic/700
 create mode 100644 tests/generic/700.out
 create mode 100755 tests/overlay/077
 create mode 100644 tests/overlay/077.out

Comments

Amir Goldstein April 22, 2021, 6:18 a.m. UTC | #1
On Wed, Apr 21, 2021 at 12:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Eryu,
>
> This extends the generic t_dir_offset2 test to verify
> some more test cases and adds a new generic test which
> passes on overlayfs (and other fs) on upstream kernel.
>
> The overlayfs specific test fails on upstream kernel
> and the fix commit is currently in linux-next.
> As usual, you may want to wait with merging until the fix
> commit hits upstream.
>
> Miklos,
>
> I had noticed in the test full logs that readdir of
> a merged dir behaves strangely - when seeking backwards
> to offsets > 0, readdir returns unlinked entries in results.
> The test does not fail on that behavior because the test
> only asserts that this is not allowed after seek to offset 0.
>
> Knowing the implementation of overlayfs readdir cache this is
> not surprising to me, but I wonder if this behavior is POSIX
> compliant, and if not, whether we should document it and/or
> add a failing test for it.
>

I think the outcome could be worse.
If a copied up entry is unlinked after populating the dir cache
but before ovl_cache_update_ino() then ovl_cache_update_ino()
and subsequently the getdents call will fail with ENOENT.

This test is not smart enough to cover this case (if it really exists).
I think we need to relax the case of negative lookup result in
ovl_cache_update_ino() and just set p->whiteout and
continue with no error.
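
A rough userspace analogy of the relaxed handling proposed above, not
kernel code: cached names are re-checked with lstat(), and a name that
has vanished is flagged and skipped instead of failing the whole pass.
The struct, its "stale" field (standing in for p->whiteout) and both
function names are hypothetical and exist only for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for a cached directory entry. */
struct cached_entry {
	char name[256];
	ino_t ino;
	int stale;	/* plays the role the message gives to p->whiteout */
};

/*
 * Refresh the inode number of every cached name under "dir".
 * A name that has vanished (ENOENT) is marked stale and skipped;
 * any other lstat() failure is returned as a negative errno.
 */
int refresh_cache(const char *dir, struct cached_entry *ents, size_t n)
{
	char path[4096];
	struct stat st;
	size_t i;

	for (i = 0; i < n; i++) {
		int len = snprintf(path, sizeof(path), "%s/%s", dir, ents[i].name);

		assert(len > 0 && (size_t)len < sizeof(path));
		if (lstat(path, &st) == 0) {
			ents[i].ino = st.st_ino;
			ents[i].stale = 0;
		} else if (errno == ENOENT) {
			ents[i].stale = 1;	/* relaxed: not an error */
		} else {
			return -errno;
		}
	}
	return 0;
}

/* Demo: cache two names, unlink one, refresh; returns the stale count or -1. */
int demo_stale_count(void)
{
	char tmpl[] = "/tmp/ovl-cache-demo-XXXXXX";
	struct cached_entry ents[2] = { { "a", 0, 0 }, { "b", 0, 0 } };
	char path[4096];
	char *dir = mkdtemp(tmpl);
	int i, fd, stale = 0;

	if (!dir)
		return -1;
	for (i = 0; i < 2; i++) {
		snprintf(path, sizeof(path), "%s/%s", dir, ents[i].name);
		fd = open(path, O_CREAT | O_WRONLY, 0644);
		if (fd < 0)
			return -1;
		close(fd);
	}
	snprintf(path, sizeof(path), "%s/b", dir);
	unlink(path);		/* "b" disappears after being cached */

	if (refresh_cache(dir, ents, 2) != 0)
		return -1;
	for (i = 0; i < 2; i++)
		stale += ents[i].stale;

	snprintf(path, sizeof(path), "%s/a", dir);
	unlink(path);
	rmdir(dir);
	return stale;
}
```

The point of the sketch is only the ENOENT branch: the refresh pass
completes, and the caller can drop flagged entries when emitting results.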

This can solve several test cases.
In principle, we can "semi-reset" the merge dir cache if the dir was
modified after every seek, not just seek to 0.
By "semi-reset" I mean use the list, but force ovl_cache_update_ino()
to all upper entries, similar to ovl_dir_read_impure().

OR.. we can just do that unconditionally in ovl_iterate().
The ovl dentry cache for the children will be populated after the first
ovl_iterate() anyway, so maybe the penalty is not so bad?

Thanks,
Amir.

Miklos Szeredi April 22, 2021, 7:53 a.m. UTC | #2
On Thu, Apr 22, 2021 at 8:18 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Apr 21, 2021 at 12:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Eryu,
> >
> > This extends the generic t_dir_offset2 test to verify
> > some more test cases and adds a new generic test which
> > passes on overlayfs (and other fs) on upstream kernel.
> >
> > The overlayfs specific test fails on upstream kernel
> > and the fix commit is currently in linux-next.
> > As usual, you may want to wait with merging until the fix
> > commit hits upstream.
> >
> > Miklos,
> >
> > I had noticed in the test full logs that readdir of
> > a merged dir behaves strangely - when seeking backwards
> > to offsets > 0, readdir returns unlinked entries in results.
> > The test does not fail on that behavior because the test
> > only asserts that this is not allowed after seek to offset 0.
> >
> > Knowing the implementation of overlayfs readdir cache this is
> > not surprising to me, but I wonder if this behavior is POSIX
> > compliant, and if not, whether we should document it and/or
> > add a failing test for it.
> >
>
> I think the outcome could be worse.
> If a copied up entry is unlinked after populating the dir cache
> but before ovl_cache_update_ino() then ovl_cache_update_ino()
> and subsequently the getdents call will fail with ENOENT.
>
> This test is not smart enough to cover this case (if it really exists).
> I think we need to relax the case of negative lookup result in
> ovl_cache_update_ino() and just set p->whiteout and
> continue with no error.
>
> This can solve several test cases.
> In principle, we can "semi-reset" the merge dir cache if the dir was
> modified after every seek, not just seek to 0.
> By "semi-reset" I mean use the list, but force ovl_cache_update_ino()
> to all upper entries, similar to ovl_dir_read_impure().
>
> OR.. we can just do that unconditionally in ovl_iterate().
> The ovl dentry cache for the children will be populated after the first
> ovl_iterate() anyway, so maybe the penalty is not so bad?

POSIX does allow stale readdir data (not just in case of non-zero seek):

"If a file is removed from or added to the directory after the most
recent call to opendir() or rewinddir(), whether a subsequent call to
readdir() returns an entry for that file is unspecified."

If you think about the way readdir(3) is implemented by the libc, this
is inevitable.
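
A small sketch of the rule quoted above, assuming a scratch directory
under /tmp (the path template and all function names are made up for
the example). libc typically fetches directory entries in batches, so
after an unlink the open stream may or may not still return the name;
only after rewinddir() does the stream have to reflect the current
directory state.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create an empty file "name" inside "dir"; returns 0 on success. */
static int touch_in(const char *dir, const char *name)
{
	char path[4096];
	int len = snprintf(path, sizeof(path), "%s/%s", dir, name);
	int fd;

	assert(len > 0 && (size_t)len < sizeof(path));
	fd = open(path, O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		return -1;
	return close(fd);
}

/* Return 1 if readdir() yields "name" between the current position and EOF. */
static int stream_has(DIR *d, const char *name)
{
	struct dirent *de;
	int found = 0;

	while ((de = readdir(d)) != NULL)
		if (strcmp(de->d_name, name) == 0)
			found = 1;
	return found;
}

/*
 * Returns 1 if an entry unlinked while the stream was open is gone
 * after rewinddir(), 0 if it is still listed, -1 on setup failure.
 */
int unlinked_entry_gone_after_rewind(void)
{
	char tmpl[] = "/tmp/readdir-demo-XXXXXX";
	char path[4096];
	char *dir = mkdtemp(tmpl);
	DIR *d;
	int gone;

	if (!dir || touch_in(dir, "a") || touch_in(dir, "b"))
		return -1;
	d = opendir(dir);
	if (!d)
		return -1;

	/* Read one entry so libc has fetched a batch of entries. */
	(void)readdir(d);

	snprintf(path, sizeof(path), "%s/b", dir);
	unlink(path);

	/* Whether "b" still shows up here is unspecified: no check. */
	(void)stream_has(d, "b");

	/* After rewinddir() the stream must refer to the current state. */
	rewinddir(d);
	gone = !stream_has(d, "b");

	closedir(d);
	snprintf(path, sizeof(path), "%s/a", dir);
	unlink(path);
	rmdir(dir);
	return gone;
}
```

A test built on this should assert only on the result after rewinddir()
(or a fresh opendir()), never on what the stream returns in between.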

Returning ENOENT from readdir(3) is obviously a bug.

The merge case being not super high performance is perfectly okay.
The only thing I've worried about in that case is unbounded memory use,
but apparently that hasn't been an issue in practice.

Thanks,
Miklos

Amir Goldstein April 22, 2021, 8:47 a.m. UTC | #3
On Thu, Apr 22, 2021 at 10:53 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, Apr 22, 2021 at 8:18 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Apr 21, 2021 at 12:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > Eryu,
> > >
> > > This extends the generic t_dir_offset2 test to verify
> > > some more test cases and adds a new generic test which
> > > passes on overlayfs (and other fs) on upstream kernel.
> > >
> > > The overlayfs specific test fails on upstream kernel
> > > and the fix commit is currently in linux-next.
> > > As usual, you may want to wait with merging until the fix
> > > commit hits upstream.
> > >
> > > Miklos,
> > >
> > > I had noticed in the test full logs that readdir of
> > > a merged dir behaves strangely - when seeking backwards
> > > to offsets > 0, readdir returns unlinked entries in results.
> > > The test does not fail on that behavior because the test
> > > only asserts that this is not allowed after seek to offset 0.
> > >
> > > Knowing the implementation of overlayfs readdir cache this is
> > > not surprising to me, but I wonder if this behavior is POSIX
> > > compliant, and if not, whether we should document it and/or
> > > add a failing test for it.
> > >
> >
> > I think the outcome could be worse.
> > If a copied up entry is unlinked after populating the dir cache
> > but before ovl_cache_update_ino() then ovl_cache_update_ino()
> > and subsequently the getdents call will fail with ENOENT.
> >
> > This test is not smart enough to cover this case (if it really exists).
> > I think we need to relax the case of negative lookup result in
> > ovl_cache_update_ino() and just set p->whiteout and
> > continue with no error.
> >
> > This can solve several test cases.
> > In principle, we can "semi-reset" the merge dir cache if the dir was
> > modified after every seek, not just seek to 0.
> > By "semi-reset" I mean use the list, but force ovl_cache_update_ino()
> > to all upper entries, similar to ovl_dir_read_impure().
> >
> > OR.. we can just do that unconditionally in ovl_iterate().
> > The ovl dentry cache for the children will be populated after the first
> > ovl_iterate() anyway, so maybe the penalty is not so bad?
>
> POSIX does allow stale readdir data (not just in case of non-zero seek):
>
> "If a file is removed from or added to the directory after the most
> recent call to opendir() or rewinddir(), whether a subsequent call to
> readdir() returns an entry for that file is unspecified."
>
> If you think about the way readdir(3) is implemented by the libc, this
> is inevitable.

I see. In that case, I would defer merging this test as it assumes too much
about readdir behavior (even though applications may expect this behavior).

>
> Returning ENOENT from readdir(3) is obviously a bug.
>
> The merge case being not super high performance is perfectly okay.
> The only thing I've worried about in that case is unbounded memory use,
> but apparently that hasn't been an issue in practice.
>

Okay, so I will try to reproduce the ENOENT and fix it.
In any case, even if the bug exists it's not urgent.

Thanks,
Amir.

Miklos Szeredi April 22, 2021, 9:03 a.m. UTC | #4
On Thu, Apr 22, 2021 at 10:47 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Apr 22, 2021 at 10:53 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Thu, Apr 22, 2021 at 8:18 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Wed, Apr 21, 2021 at 12:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > Eryu,
> > > >
> > > > This extends the generic t_dir_offset2 test to verify
> > > > some more test cases and adds a new generic test which
> > > > passes on overlayfs (and other fs) on upstream kernel.
> > > >
> > > > The overlayfs specific test fails on upstream kernel
> > > > and the fix commit is currently in linux-next.
> > > > As usual, you may want to wait with merging until the fix
> > > > commit hits upstream.
> > > >
> > > > Miklos,
> > > >
> > > > I had noticed in the test full logs that readdir of
> > > > a merged dir behaves strangely - when seeking backwards
> > > > to offsets > 0, readdir returns unlinked entries in results.
> > > > The test does not fail on that behavior because the test
> > > > only asserts that this is not allowed after seek to offset 0.
> > > >
> > > > Knowing the implementation of overlayfs readdir cache this is
> > > > not surprising to me, but I wonder if this behavior is POSIX
> > > > compliant, and if not, whether we should document it and/or
> > > > add a failing test for it.
> > > >
> > >
> > > I think the outcome could be worse.
> > > If a copied up entry is unlinked after populating the dir cache
> > > but before ovl_cache_update_ino() then ovl_cache_update_ino()
> > > and subsequently the getdents call will fail with ENOENT.
> > >
> > > This test is not smart enough to cover this case (if it really exists).
> > > I think we need to relax the case of negative lookup result in
> > > ovl_cache_update_ino() and just set p->whiteout and
> > > continue with no error.
> > >
> > > This can solve several test cases.
> > > In principle, we can "semi-reset" the merge dir cache if the dir was
> > > modified after every seek, not just seek to 0.
> > > By "semi-reset" I mean use the list, but force ovl_cache_update_ino()
> > > to all upper entries, similar to ovl_dir_read_impure().
> > >
> > > OR.. we can just do that unconditionally in ovl_iterate().
> > > The ovl dentry cache for the children will be populated after the first
> > > ovl_iterate() anyway, so maybe the penalty is not so bad?
> >
> > POSIX does allow stale readdir data (not just in case of non-zero seek):
> >
> > "If a file is removed from or added to the directory after the most
> > recent call to opendir() or rewinddir(), whether a subsequent call to
> > readdir() returns an entry for that file is unspecified."
> >
> > If you think about the way readdir(3) is implemented by the libc, this
> > is inevitable.
>
> I see. In that case, I would defer merging this test as it assumes too much
> about readdir behavior (even though applications may expect this behavior).

FWIW, I started writing a readdir stress/validator similar to
fsx-linux but for directories.

It's unfinished and has a performance problem as the directory grows,
due to linear searches.

Putting it out in case someone wants to continue working on it, or
just take some ideas.

Thanks,
Miklos

Amir Goldstein April 23, 2021, 10:20 a.m. UTC | #5
On Thu, Apr 22, 2021 at 10:53 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, Apr 22, 2021 at 8:18 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Apr 21, 2021 at 12:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > Eryu,
> > >
> > > This extends the generic t_dir_offset2 test to verify
> > > some more test cases and adds a new generic test which
> > > passes on overlayfs (and other fs) on upstream kernel.
> > >
> > > The overlayfs specific test fails on upstream kernel
> > > and the fix commit is currently in linux-next.
> > > As usual, you may want to wait with merging until the fix
> > > commit hits upstream.
> > >
> > > Miklos,
> > >
> > > I had noticed in the test full logs that readdir of
> > > a merged dir behaves strangely - when seeking backwards
> > > to offsets > 0, readdir returns unlinked entries in results.
> > > The test does not fail on that behavior because the test
> > > only asserts that this is not allowed after seek to offset 0.
> > >
> > > Knowing the implementation of overlayfs readdir cache this is
> > > not surprising to me, but I wonder if this behavior is POSIX
> > > compliant, and if not, whether we should document it and/or
> > > add a failing test for it.
> > >
> >
> > I think the outcome could be worse.
> > If a copied up entry is unlinked after populating the dir cache
> > but before ovl_cache_update_ino() then ovl_cache_update_ino()
> > and subsequently the getdents call will fail with ENOENT.
> >
> > This test is not smart enough to cover this case (if it really exists).
> > I think we need to relax the case of negative lookup result in
> > ovl_cache_update_ino() and just set p->whiteout and
> > continue with no error.
> >
> > This can solve several test cases.
> > In principle, we can "semi-reset" the merge dir cache if the dir was
> > modified after every seek, not just seek to 0.
> > By "semi-reset" I mean use the list, but force ovl_cache_update_ino()
> > to all upper entries, similar to ovl_dir_read_impure().
> >
> > OR.. we can just do that unconditionally in ovl_iterate().
> > The ovl dentry cache for the children will be populated after the first
> > ovl_iterate() anyway, so maybe the penalty is not so bad?
>
> POSIX does allow stale readdir data (not just in case of non-zero seek):
>
> "If a file is removed from or added to the directory after the most
> recent call to opendir() or rewinddir(), whether a subsequent call to
> readdir() returns an entry for that file is unspecified."
>
> If you think about the way readdir(3) is implemented by the libc, this
> is inevitable.
>

That makes the test I posted wrong, because it expects the
dir modifications to be visible after seek to 0.

The thing is, unlike the readdir(3) implementation, overlayfs keeps the
readdir cache on the inode, so by keeping the original fd open
and opening a new fd, I could reproduce stale and missing entries
for a new opendir, which is definitely a bug.
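
The scenario can be sketched roughly as follows. This is not the
updated test, only an illustration: the function names are invented,
and the demo wrapper runs on a plain scratch directory under /tmp, so
it shows the expected behavior rather than reproducing the overlayfs
problem. To exercise overlayfs, "dir" would have to be a merged
directory on an overlay mount, which this sketch does not set up.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create an empty file "name" inside "dir"; returns 0 on success. */
static int touch_in(const char *dir, const char *name)
{
	char path[4096];
	int len = snprintf(path, sizeof(path), "%s/%s", dir, name);
	int fd;

	assert(len > 0 && (size_t)len < sizeof(path));
	fd = open(path, O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		return -1;
	return close(fd);
}

/*
 * "dir" must be an existing writable directory without entries named
 * "old" or "new". Returns 1 if a fresh opendir() reflects changes made
 * while an older stream is still open, 0 if not, -1 on setup failure.
 */
int fresh_opendir_sees_changes(const char *dir)
{
	char path[4096];
	struct dirent *de;
	DIR *d1, *d2;
	int has_old = 0, has_new = 0;

	if (touch_in(dir, "old"))
		return -1;
	d1 = opendir(dir);
	if (!d1)
		return -1;
	while (readdir(d1) != NULL)
		;	/* read to EOF so any per-inode cache is populated */

	/* Modify the directory while the first stream stays open. */
	snprintf(path, sizeof(path), "%s/old", dir);
	if (unlink(path) || touch_in(dir, "new")) {
		closedir(d1);
		return -1;
	}

	/* A new opendir() must see the current directory contents. */
	d2 = opendir(dir);
	if (!d2) {
		closedir(d1);
		return -1;
	}
	while ((de = readdir(d2)) != NULL) {
		if (strcmp(de->d_name, "old") == 0)
			has_old = 1;	/* stale entry */
		if (strcmp(de->d_name, "new") == 0)
			has_new = 1;
	}
	closedir(d2);
	closedir(d1);

	snprintf(path, sizeof(path), "%s/new", dir);
	unlink(path);
	return has_new && !has_old;
}

/* Run the check on a scratch directory; returns 1 on expected behavior. */
int demo_on_tmpdir(void)
{
	char tmpl[] = "/tmp/ovl-newfd-demo-XXXXXX";
	char *dir = mkdtemp(tmpl);
	int ok;

	if (!dir)
		return -1;
	ok = fresh_opendir_sees_changes(dir);
	rmdir(dir);
	return ok;
}
```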

Will post the updated test.

Thanks,
Amir.

Amir Goldstein April 23, 2021, 7:03 p.m. UTC | #6
On Thu, Apr 22, 2021 at 10:53 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, Apr 22, 2021 at 8:18 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Apr 21, 2021 at 12:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > Eryu,
> > >
> > > This extends the generic t_dir_offset2 test to verify
> > > some more test cases and adds a new generic test which
> > > passes on overlayfs (and other fs) on upstream kernel.
> > >
> > > The overlayfs specific test fails on upstream kernel
> > > and the fix commit is currently in linux-next.
> > > As usual, you may want to wait with merging until the fix
> > > commit hits upstream.
> > >
> > > Miklos,
> > >
> > > I had noticed in the test full logs that readdir of
> > > a merged dir behaves strangely - when seeking backwards
> > > to offsets > 0, readdir returns unlinked entries in results.
> > > The test does not fail on that behavior because the test
> > > only asserts that this is not allowed after seek to offset 0.
> > >
> > > Knowing the implementation of overlayfs readdir cache this is
> > > not surprising to me, but I wonder if this behavior is POSIX
> > > compliant, and if not, whether we should document it and/or
> > > add a failing test for it.
> > >
> >
> > I think the outcome could be worse.
> > If a copied up entry is unlinked after populating the dir cache
> > but before ovl_cache_update_ino() then ovl_cache_update_ino()
> > and subsequently the getdents call will fail with ENOENT.
> >
> > This test is not smart enough to cover this case (if it really exists).
> > I think we need to relax the case of negative lookup result in
> > ovl_cache_update_ino() and just set p->whiteout and
> > continue with no error.
> >
> > This can solve several test cases.
> > In principle, we can "semi-reset" the merge dir cache if the dir was
> > modified after every seek, not just seek to 0.
> > By "semi-reset" I mean use the list, but force ovl_cache_update_ino()
> > to all upper entries, similar to ovl_dir_read_impure().
> >
> > OR.. we can just do that unconditionally in ovl_iterate().
> > The ovl dentry cache for the children will be populated after the first
> > ovl_iterate() anyway, so maybe the penalty is not so bad?
>
> POSIX does allow stale readdir data (not just in case of non-zero seek):
>
> "If a file is removed from or added to the directory after the most
> recent call to opendir() or rewinddir(), whether a subsequent call to
> readdir() returns an entry for that file is unspecified."
>
> If you think about the way readdir(3) is implemented by the libc, this
> is inevitable.
>
> Returning ENOENT from readdir(3) is obviously a bug.

There is no ENOENT bug. I misread the code in ovl_cache_update_ino().
The only exception would be if lookup_one_len() returned ERR_PTR(-ENOENT)
instead of a negative dentry, and I never understood whether that can
happen. So the most we need to do beyond the fix already in overlayfs-next
is to check for that -ENOENT case.
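
For readers less familiar with the distinction, here is a userspace
sketch of the two ways a lookup can report "no such entry". The three
helpers mimic the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() from
include/linux/err.h but are reimplemented here for illustration;
struct fake_dentry, the enum and classify_lookup() are hypothetical and
do not correspond to overlayfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical dentry stand-in: inode == NULL means "negative". */
struct fake_dentry {
	const void *inode;
};

enum lookup_result { ENTRY_PRESENT, ENTRY_GONE };

/*
 * Treat both a negative dentry and an -ENOENT error pointer as
 * "entry is gone"; pass any other error through as a negative errno.
 */
int classify_lookup(const struct fake_dentry *d)
{
	assert(d != NULL);
	if (IS_ERR(d))
		return PTR_ERR(d) == -ENOENT ? ENTRY_GONE : (int)PTR_ERR(d);
	return d->inode ? ENTRY_PRESENT : ENTRY_GONE;
}
```

For example, classify_lookup(ERR_PTR(-ENOENT)) and a lookup result with
a NULL inode both yield ENTRY_GONE, while ERR_PTR(-EIO) comes back as
-EIO for the caller to propagate.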

Thanks,
Amir.