diff mbox series

[GSoC,v7,06/10] dir-iterator: add flags parameter to dir_iterator_begin

Message ID 5a678ee74de42f1373deeed718fa24d368347d13.1560898723.git.matheus.bernardino@usp.br (mailing list archive)
State New, archived
Headers show
Series clone: dir-iterator refactoring with tests | expand

Commit Message

Matheus Tavares June 18, 2019, 11:27 p.m. UTC
Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are:
- DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort
immediately in the case of an error, instead of keep looking for the
next valid entry;
- DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow
symlinks and include linked directories' contents in the iteration.

These new flags will be used in a subsequent patch.

Also add tests for the flags' usage and adjust refs/files-backend.c to
the new dir_iterator_begin signature.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c               | 82 +++++++++++++++++++++++++------
 dir-iterator.h               | 51 ++++++++++++++-----
 refs/files-backend.c         |  2 +-
 t/helper/test-dir-iterator.c | 34 ++++++++++---
 t/t0066-dir-iterator.sh      | 95 ++++++++++++++++++++++++++++++++++++
 5 files changed, 229 insertions(+), 35 deletions(-)

Comments

Junio C Hamano June 25, 2019, 6 p.m. UTC | #1
Matheus Tavares <matheus.bernardino@usp.br> writes:

This hunk, which claims to have 25 lines in the postimage ...

> @@ -44,6 +45,25 @@
>   * dir_iterator_advance() again.
>   */
>  
> +/*
> + * Flags for dir_iterator_begin:
> + *
> + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
> + *   in case of an error at dir_iterator_advance(), which is to keep
> + *   looking for a next valid entry. With this flag, resources are freed
> + *   and ITER_ERROR is returned immediately. In both cases, a meaningful
> + *   warning is emitted. Note: ENOENT errors are always ignored so that
> + *   the API users may remove files during iteration.
> + *
> + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
> + *   i.e., linked directories' contents will be iterated over and
> + *   iter->base.st will contain information on the referred files,
> + *   not the symlinks themselves, which is the default behavior.
> + *   Recursive symlinks are skipped with a warning and broken symlinks
> + *   are ignored.
> + */
> +#define DIR_ITERATOR_PEDANTIC (1 << 0)
> +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
> +
>  struct dir_iterator {
>  	/* The current path: */
>  	struct strbuf path;
> @@ -58,29 +78,38 @@ struct dir_iterator {

... adds 20 lines, making the postimage 26 lines long.

Did you hand edit your patch?  It is OK to do so, as long as you
know what you are doing ;-).  Adjust the length of the postimage on
the @@ ... @@ line to make it consistent with the patch text, and
also make sure a tweak you do here won't make later patches not
apply.
Matheus Tavares June 25, 2019, 6:11 p.m. UTC | #2
On Tue, Jun 25, 2019 at 3:00 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> This hunk, which claims to have 25 lines in the postimage ...
>
> > @@ -44,6 +45,25 @@
> >   * dir_iterator_advance() again.
> >   */
> >
> > +/*
> > + * Flags for dir_iterator_begin:
> > + *
> > + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
> > + *   in case of an error at dir_iterator_advance(), which is to keep
> > + *   looking for a next valid entry. With this flag, resources are freed
> > + *   and ITER_ERROR is returned immediately. In both cases, a meaningful
> > + *   warning is emitted. Note: ENOENT errors are always ignored so that
> > + *   the API users may remove files during iteration.
> > + *
> > + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
> > + *   i.e., linked directories' contents will be iterated over and
> > + *   iter->base.st will contain information on the referred files,
> > + *   not the symlinks themselves, which is the default behavior.
> > + *   Recursive symlinks are skipped with a warning and broken symlinks
> > + *   are ignored.
> > + */
> > +#define DIR_ITERATOR_PEDANTIC (1 << 0)
> > +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
> > +
> >  struct dir_iterator {
> >       /* The current path: */
> >       struct strbuf path;
> > @@ -58,29 +78,38 @@ struct dir_iterator {
>
> ... adds 20 lines, making the postimage 26 lines long.
>
> Did you hand edit your patch?  It is OK to do so, as long as you
> know what you are doing ;-).  Adjust the length of the postimage on
> the @@ ... @@ line to make it consistent with the patch text, and
> also make sure a tweak you do here won't make later patches not
> apply.

Oh, I'm sorry for that, I'll be more careful with hand editing next
time. Thanks for the advice. I think for this time it won't affect the
later patches as it was a minor addition at one comment, but should I
perhaps re-send it?
Johannes Schindelin June 26, 2019, 1:34 p.m. UTC | #3
Hi Matheus,

On Tue, 18 Jun 2019, Matheus Tavares wrote:

>[...]
> +/*
> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> + * the previous stack levels. If it is found, return 1. If not, return 0.
> + */
> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> +{
> +	int i;
> +
> +	if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> +	    !S_ISDIR(iter->base.st.st_mode))
> +		return 0;
>
> +	for (i = 0; i < iter->levels_nr; ++i)
> +		if (iter->base.st.st_ino == iter->levels[i].ino)

This does not work on Windows. Remember, Git relies on (too) many areas
where Linux is strong, and the `lstat()` call is one of them. Therefore,
Git overuses that call.

In the Git for Windows project, we struggled a bit to emulate it in the
best way.

It is pretty expensive, for example, to find out the number of hard
links, the device ID, an equivalent of the inode, etc. Many `lstat()`
calls are really only interested in the `mtime`, though, meaning that we
would waste a ton of time if we tried to be more faithful in our `lstat()`
emulation.

Therefore, we simply assign `0` as inode.

Sure, this violates the POSIX standard, but imagine this: the FAT
filesystem (which is still in use!) does not have _anything_ resembling
inodes.

I fear, therefore, that we will require at least a workaround for the
situation where `st_ino` is always zero.

Ciao,
Johannes
Junio C Hamano June 26, 2019, 6:04 p.m. UTC | #4
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Matheus,
>
> On Tue, 18 Jun 2019, Matheus Tavares wrote:
>
>>[...]
>> +/*
>> + * Look for a recursive symlink at iter->base.path pointing to any directory on
>> + * the previous stack levels. If it is found, return 1. If not, return 0.
>> + */
>> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
>> +{
>> +	int i;
>> +
>> +	if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
>> +	    !S_ISDIR(iter->base.st.st_mode))
>> +		return 0;
>>
>> +	for (i = 0; i < iter->levels_nr; ++i)
>> +		if (iter->base.st.st_ino == iter->levels[i].ino)
>
> This does not work on Windows. [[ Windows port does not have
> usable st_ino field ]]]

And if you cross mountpoint, st_ino alone does not guarantee
uniqueness; you'd need to combine it with st_dev, I would think,
even on POSIX systems.
Duy Nguyen June 27, 2019, 9:20 a.m. UTC | #5
On Thu, Jun 27, 2019 at 1:04 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Hi Matheus,
> >
> > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> >
> >>[...]
> >> +/*
> >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> >> + */
> >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> >> +{
> >> +    int i;
> >> +
> >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> >> +        !S_ISDIR(iter->base.st.st_mode))
> >> +            return 0;
> >>
> >> +    for (i = 0; i < iter->levels_nr; ++i)
> >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> >
> > This does not work on Windows. [[ Windows port does not have
> > usable st_ino field ]]]
>
> And if you cross mountpoint, st_ino alone does not guarantee
> uniqueness; you'd need to combine it with st_dev, I would think,
> even on POSIX systems.

which should be protected by USE_STDEV. There's another code that
ignore st_ino on Windows in entry.c. Maybe it's time to define
USE_STINO instead of spreading "#if GIT_WINDOWS_NATIVE" more.
Matheus Tavares June 27, 2019, 5:23 p.m. UTC | #6
On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Hi Matheus,
> >
> > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> >
> >>[...]
> >> +/*
> >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> >> + */
> >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> >> +{
> >> +    int i;
> >> +
> >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> >> +        !S_ISDIR(iter->base.st.st_mode))
> >> +            return 0;
> >>
> >> +    for (i = 0; i < iter->levels_nr; ++i)
> >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> >
> > This does not work on Windows. [[ Windows port does not have
> > usable st_ino field ]]]
>
> And if you cross mountpoint, st_ino alone does not guarantee
> uniqueness; you'd need to combine it with st_dev, I would think,
> even on POSIX systems.

Ok, thanks for letting me know. I'm trying to think of another
approach to test for recursive symlinks that does not rely on inode:
Given any symlink, we could get its real_path() and compare it with
the path of the directory current being iterated. If the first is a
prefix of the second, than we mark it as a recursive symlink.

What do you think of this idea?
Johannes Schindelin June 27, 2019, 6:48 p.m. UTC | #7
Hi Matheus,

On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:

> On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >
> > > Hi Matheus,
> > >
> > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > >
> > >>[...]
> > >> +/*
> > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > >> + */
> > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > >> +{
> > >> +    int i;
> > >> +
> > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > >> +        !S_ISDIR(iter->base.st.st_mode))
> > >> +            return 0;
> > >>
> > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > >
> > > This does not work on Windows. [[ Windows port does not have
> > > usable st_ino field ]]]
> >
> > And if you cross mountpoint, st_ino alone does not guarantee
> > uniqueness; you'd need to combine it with st_dev, I would think,
> > even on POSIX systems.
>
> Ok, thanks for letting me know. I'm trying to think of another
> approach to test for recursive symlinks that does not rely on inode:
> Given any symlink, we could get its real_path() and compare it with
> the path of the directory current being iterated. If the first is a
> prefix of the second, than we mark it as a recursive symlink.
>
> What do you think of this idea?

I think this would be pretty expensive. Too expensive.

A better method might be to rely on st_ino/st_dev when we can, and just
not bother looking for recursive symlinks when we cannot, like I did in
https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a

Ciao,
Johannes
Matheus Tavares June 27, 2019, 7:33 p.m. UTC | #8
On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
>
> > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> > >
> > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > >
> > > > Hi Matheus,
> > > >
> > > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > > >
> > > >>[...]
> > > >> +/*
> > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > > >> + */
> > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > > >> +{
> > > >> +    int i;
> > > >> +
> > > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > > >> +        !S_ISDIR(iter->base.st.st_mode))
> > > >> +            return 0;
> > > >>
> > > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > > >
> > > > This does not work on Windows. [[ Windows port does not have
> > > > usable st_ino field ]]]
> > >
> > > And if you cross mountpoint, st_ino alone does not guarantee
> > > uniqueness; you'd need to combine it with st_dev, I would think,
> > > even on POSIX systems.
> >
> > Ok, thanks for letting me know. I'm trying to think of another
> > approach to test for recursive symlinks that does not rely on inode:
> > Given any symlink, we could get its real_path() and compare it with
> > the path of the directory current being iterated. If the first is a
> > prefix of the second, than we mark it as a recursive symlink.
> >
> > What do you think of this idea?
>
> I think this would be pretty expensive. Too expensive.

Hmm, yes unfortunately :(

> A better method might be to rely on st_ino/st_dev when we can, and just
> not bother looking for recursive symlinks when we cannot,

What if we fallback on the path prefix strategy when st_ino is not
available? I mean, if we don't look for recursive symlinks, they would
be iterated over and over until we get an ELOOP error. So I think
using real_path() should be less expensive in this case. (But just as
a fallback to st_ino, off course)

> like I did in
> https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a

Nice! At dir-iterator.h the documentation says that recursive symlinks
will be ignored. If we don't implement any fallback, should we add
that this is not available on Windows, perhaps?

> Ciao,
> Johannes
Johannes Schindelin June 28, 2019, 12:51 p.m. UTC | #9
Hi Matheus,

On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:

> On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
> >
> > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> > > >
> > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > > >
> > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > > > >
> > > > >>[...]
> > > > >> +/*
> > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > > > >> + */
> > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > > > >> +{
> > > > >> +    int i;
> > > > >> +
> > > > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > > > >> +        !S_ISDIR(iter->base.st.st_mode))
> > > > >> +            return 0;
> > > > >>
> > > > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > > > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > > > >
> > > > > This does not work on Windows. [[ Windows port does not have
> > > > > usable st_ino field ]]]
> > > >
> > > > And if you cross mountpoint, st_ino alone does not guarantee
> > > > uniqueness; you'd need to combine it with st_dev, I would think,
> > > > even on POSIX systems.
> > >
> > > Ok, thanks for letting me know. I'm trying to think of another
> > > approach to test for recursive symlinks that does not rely on inode:
> > > Given any symlink, we could get its real_path() and compare it with
> > > the path of the directory current being iterated. If the first is a
> > > prefix of the second, than we mark it as a recursive symlink.
> > >
> > > What do you think of this idea?
> >
> > I think this would be pretty expensive. Too expensive.
>
> Hmm, yes unfortunately :(
>
> > A better method might be to rely on st_ino/st_dev when we can, and just
> > not bother looking for recursive symlinks when we cannot,
>
> What if we fallback on the path prefix strategy when st_ino is not
> available? I mean, if we don't look for recursive symlinks, they would
> be iterated over and over until we get an ELOOP error. So I think
> using real_path() should be less expensive in this case. (But just as
> a fallback to st_ino, off course)
>
> > like I did in
> > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a
>
> Nice! At dir-iterator.h the documentation says that recursive symlinks
> will be ignored. If we don't implement any fallback, should we add
> that this is not available on Windows, perhaps?

I do not really care, unless it breaks things on Windows that were not
broken before.

You might also want to guard this behind `USE_STDEV` as Duy suggested (and
maybe use the opportunity to correct that constant to `USE_ST_DEV`; I
looked for it and did not find it because of that naming mistake).

Ciao,
Dscho
Matheus Tavares June 28, 2019, 2:16 p.m. UTC | #10
On Fri, Jun 28, 2019 at 9:50 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
>
> > On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > >
> > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
> > >
> > > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> > > > >
> > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > > > >
> > > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > > > > >
> > > > > >>[...]
> > > > > >> +/*
> > > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > > > > >> + */
> > > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > > > > >> +{
> > > > > >> +    int i;
> > > > > >> +
> > > > > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > > > > >> +        !S_ISDIR(iter->base.st.st_mode))
> > > > > >> +            return 0;
> > > > > >>
> > > > > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > > > > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > > > > >
> > > > > > This does not work on Windows. [[ Windows port does not have
> > > > > > usable st_ino field ]]]
> > > > >
> > > > > And if you cross mountpoint, st_ino alone does not guarantee
> > > > > uniqueness; you'd need to combine it with st_dev, I would think,
> > > > > even on POSIX systems.
> > > >
> > > > Ok, thanks for letting me know. I'm trying to think of another
> > > > approach to test for recursive symlinks that does not rely on inode:
> > > > Given any symlink, we could get its real_path() and compare it with
> > > > the path of the directory current being iterated. If the first is a
> > > > prefix of the second, than we mark it as a recursive symlink.
> > > >
> > > > What do you think of this idea?
> > >
> > > I think this would be pretty expensive. Too expensive.
> >
> > Hmm, yes unfortunately :(
> >
> > > A better method might be to rely on st_ino/st_dev when we can, and just
> > > not bother looking for recursive symlinks when we cannot,
> >
> > What if we fallback on the path prefix strategy when st_ino is not
> > available? I mean, if we don't look for recursive symlinks, they would
> > be iterated over and over until we get an ELOOP error. So I think
> > using real_path() should be less expensive in this case. (But just as
> > a fallback to st_ino, off course)
> >
> > > like I did in
> > > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a
> >
> > Nice! At dir-iterator.h the documentation says that recursive symlinks
> > will be ignored. If we don't implement any fallback, should we add
> > that this is not available on Windows, perhaps?
>
> I do not really care, unless it breaks things on Windows that were not
> broken before.
>
> You might also want to guard this behind `USE_STDEV` as Duy suggested (and
> maybe use the opportunity to correct that constant to `USE_ST_DEV`; I
> looked for it and did not find it because of that naming mistake).

Ok, just to confirm, what I should do is send your fixup patch with
the USE_STDEV guard addition, right? Also, USE_STDEV docs says it is
used "from the update-index perspective", should I make it more
generic as we're using it for other purposes or is it OK like this?

Thanks,
Matheus
Johannes Schindelin July 1, 2019, 12:15 p.m. UTC | #11
Hi Matheus,

On Fri, 28 Jun 2019, Matheus Tavares Bernardino wrote:

> On Fri, Jun 28, 2019 at 9:50 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Hi Matheus,
> >
> > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
> >
> > > On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin
> > > <Johannes.Schindelin@gmx.de> wrote:
> > > >
> > > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
> > > >
> > > > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> > > > > >
> > > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > > > > >
> > > > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > > > > > >
> > > > > > >>[...]
> > > > > > >> +/*
> > > > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > > > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > > > > > >> + */
> > > > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > > > > > >> +{
> > > > > > >> +    int i;
> > > > > > >> +
> > > > > > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > > > > > >> +        !S_ISDIR(iter->base.st.st_mode))
> > > > > > >> +            return 0;
> > > > > > >>
> > > > > > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > > > > > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > > > > > >
> > > > > > > This does not work on Windows. [[ Windows port does not have
> > > > > > > usable st_ino field ]]]
> > > > > >
> > > > > > And if you cross mountpoint, st_ino alone does not guarantee
> > > > > > uniqueness; you'd need to combine it with st_dev, I would think,
> > > > > > even on POSIX systems.
> > > > >
> > > > > Ok, thanks for letting me know. I'm trying to think of another
> > > > > approach to test for recursive symlinks that does not rely on inode:
> > > > > Given any symlink, we could get its real_path() and compare it with
> > > > > the path of the directory current being iterated. If the first is a
> > > > > prefix of the second, than we mark it as a recursive symlink.
> > > > >
> > > > > What do you think of this idea?
> > > >
> > > > I think this would be pretty expensive. Too expensive.
> > >
> > > Hmm, yes unfortunately :(
> > >
> > > > A better method might be to rely on st_ino/st_dev when we can, and just
> > > > not bother looking for recursive symlinks when we cannot,
> > >
> > > What if we fallback on the path prefix strategy when st_ino is not
> > > available? I mean, if we don't look for recursive symlinks, they would
> > > be iterated over and over until we get an ELOOP error. So I think
> > > using real_path() should be less expensive in this case. (But just as
> > > a fallback to st_ino, off course)
> > >
> > > > like I did in
> > > > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a
> > >
> > > Nice! At dir-iterator.h the documentation says that recursive symlinks
> > > will be ignored. If we don't implement any fallback, should we add
> > > that this is not available on Windows, perhaps?
> >
> > I do not really care, unless it breaks things on Windows that were not
> > broken before.
> >
> > You might also want to guard this behind `USE_STDEV` as Duy suggested (and
> > maybe use the opportunity to correct that constant to `USE_ST_DEV`; I
> > looked for it and did not find it because of that naming mistake).
>
> Ok, just to confirm, what I should do is send your fixup patch with
> the USE_STDEV guard addition, right? Also, USE_STDEV docs says it is
> used "from the update-index perspective", should I make it more
> generic as we're using it for other purposes or is it OK like this?

I thought Duy had verified that `USE_STDEV` would make sense in this
instance, but I agree with you that the idea of that compile time flag was
not to guard against a missing `st_dev` field, but about trusting it in
the presence of network filesystems.

So no, I revert my vote for using `USE_STDEV`.

Thanks for the sanity check.

Ciao,
Dscho
SZEDER Gábor July 3, 2019, 8:57 a.m. UTC | #12
> diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
> index c739ed7911..8f996a31fa 100755
> --- a/t/t0066-dir-iterator.sh
> +++ b/t/t0066-dir-iterator.sh
> @@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' '
>  	test_cmp expected-non-dir-output actual-non-dir-output
>  '
>  
> +test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
> +	cat >expected-no-permissions-output <<-EOF &&
> +	[d] (a) [a] ./dir3/a
> +	EOF
> +
> +	mkdir -p dir3/a &&
> +	> dir3/a/b &&

Style nit: space between redirection op and pathname.

> +	chmod 0 dir3/a &&
> +
> +	test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
> +	test_cmp expected-no-permissions-output actual-no-permissions-output &&
> +	chmod 755 dir3/a &&
> +	rm -rf dir3
> +'
> +
> +test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
> +	cat >expected-no-permissions-pedantic-output <<-EOF &&
> +	[d] (a) [a] ./dir3/a
> +	dir_iterator_advance failure
> +	EOF
> +
> +	mkdir -p dir3/a &&
> +	> dir3/a/b &&

Likewise.

> +	chmod 0 dir3/a &&
> +
> +	test_must_fail test-tool dir-iterator --pedantic ./dir3 \
> +		>actual-no-permissions-pedantic-output &&
> +	test_cmp expected-no-permissions-pedantic-output \
> +		actual-no-permissions-pedantic-output &&
> +	chmod 755 dir3/a &&
> +	rm -rf dir3
> +'
> +
> +test_expect_success SYMLINKS 'setup dirs with symlinks' '
> +	mkdir -p dir4/a &&
> +	mkdir -p dir4/b/c &&
> +	>dir4/a/d &&
> +	ln -s d dir4/a/e &&
> +	ln -s ../b dir4/a/f &&
> +
> +	mkdir -p dir5/a/b &&
> +	mkdir -p dir5/a/c &&
> +	ln -s ../c dir5/a/b/d &&
> +	ln -s ../ dir5/a/b/e &&
> +	ln -s ../../ dir5/a/b/f
> +'
> +
> +test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
> +	cat >expected-no-follow-sorted-output <<-EOF &&
> +	[d] (a) [a] ./dir4/a
> +	[d] (b) [b] ./dir4/b
> +	[d] (b/c) [c] ./dir4/b/c
> +	[f] (a/d) [d] ./dir4/a/d
> +	[s] (a/e) [e] ./dir4/a/e
> +	[s] (a/f) [f] ./dir4/a/f
> +	EOF
> +
> +	test-tool dir-iterator ./dir4 >out &&
> +	sort <out >actual-no-follow-sorted-output &&

Unnecessary redirection, 'sort' is capable to open the file on its
own.

> +
> +	test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
> +'
> +
> +test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
> +	cat >expected-follow-sorted-output <<-EOF &&
> +	[d] (a) [a] ./dir4/a
> +	[d] (a/f) [f] ./dir4/a/f
> +	[d] (a/f/c) [c] ./dir4/a/f/c
> +	[d] (b) [b] ./dir4/b
> +	[d] (b/c) [c] ./dir4/b/c
> +	[f] (a/d) [d] ./dir4/a/d
> +	[f] (a/e) [e] ./dir4/a/e
> +	EOF
> +
> +	test-tool dir-iterator --follow-symlinks ./dir4 >out &&
> +	sort <out >actual-follow-sorted-output &&

Likewise.

> +	test_cmp expected-follow-sorted-output actual-follow-sorted-output
> +'
> +
> +
> +test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
> +	cat >expected-rec-symlinks-sorted-output <<-EOF &&
> +	[d] (a) [a] ./dir5/a
> +	[d] (a/b) [b] ./dir5/a/b
> +	[d] (a/b/d) [d] ./dir5/a/b/d
> +	[d] (a/c) [c] ./dir5/a/c
> +	EOF
> +
> +	test-tool dir-iterator --follow-symlinks ./dir5 >out &&
> +	sort <out >actual-rec-symlinks-sorted-output &&

Likewise.

> +	test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output
> +'
> +
>  test_done
> -- 
> 2.22.0
>
Matheus Tavares July 8, 2019, 10:21 p.m. UTC | #13
Thanks for the review. I'll address those issues in v8.

Best,
Matheus


On Wed, Jul 3, 2019 at 5:57 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> > diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
> > index c739ed7911..8f996a31fa 100755
> > --- a/t/t0066-dir-iterator.sh
> > +++ b/t/t0066-dir-iterator.sh
> > @@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' '
> >       test_cmp expected-non-dir-output actual-non-dir-output
> >  '
> >
> > +test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
> > +     cat >expected-no-permissions-output <<-EOF &&
> > +     [d] (a) [a] ./dir3/a
> > +     EOF
> > +
> > +     mkdir -p dir3/a &&
> > +     > dir3/a/b &&
>
> Style nit: space between redirection op and pathname.
>
> > +     chmod 0 dir3/a &&
> > +
> > +     test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
> > +     test_cmp expected-no-permissions-output actual-no-permissions-output &&
> > +     chmod 755 dir3/a &&
> > +     rm -rf dir3
> > +'
> > +
> > +test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
> > +     cat >expected-no-permissions-pedantic-output <<-EOF &&
> > +     [d] (a) [a] ./dir3/a
> > +     dir_iterator_advance failure
> > +     EOF
> > +
> > +     mkdir -p dir3/a &&
> > +     > dir3/a/b &&
>
> Likewise.
>
> > +     chmod 0 dir3/a &&
> > +
> > +     test_must_fail test-tool dir-iterator --pedantic ./dir3 \
> > +             >actual-no-permissions-pedantic-output &&
> > +     test_cmp expected-no-permissions-pedantic-output \
> > +             actual-no-permissions-pedantic-output &&
> > +     chmod 755 dir3/a &&
> > +     rm -rf dir3
> > +'
> > +
> > +test_expect_success SYMLINKS 'setup dirs with symlinks' '
> > +     mkdir -p dir4/a &&
> > +     mkdir -p dir4/b/c &&
> > +     >dir4/a/d &&
> > +     ln -s d dir4/a/e &&
> > +     ln -s ../b dir4/a/f &&
> > +
> > +     mkdir -p dir5/a/b &&
> > +     mkdir -p dir5/a/c &&
> > +     ln -s ../c dir5/a/b/d &&
> > +     ln -s ../ dir5/a/b/e &&
> > +     ln -s ../../ dir5/a/b/f
> > +'
> > +
> > +test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
> > +     cat >expected-no-follow-sorted-output <<-EOF &&
> > +     [d] (a) [a] ./dir4/a
> > +     [d] (b) [b] ./dir4/b
> > +     [d] (b/c) [c] ./dir4/b/c
> > +     [f] (a/d) [d] ./dir4/a/d
> > +     [s] (a/e) [e] ./dir4/a/e
> > +     [s] (a/f) [f] ./dir4/a/f
> > +     EOF
> > +
> > +     test-tool dir-iterator ./dir4 >out &&
> > +     sort <out >actual-no-follow-sorted-output &&
>
> Unnecessary redirection, 'sort' is capable to open the file on its
> own.
>
> > +
> > +     test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
> > +'
> > +
> > +test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
> > +     cat >expected-follow-sorted-output <<-EOF &&
> > +     [d] (a) [a] ./dir4/a
> > +     [d] (a/f) [f] ./dir4/a/f
> > +     [d] (a/f/c) [c] ./dir4/a/f/c
> > +     [d] (b) [b] ./dir4/b
> > +     [d] (b/c) [c] ./dir4/b/c
> > +     [f] (a/d) [d] ./dir4/a/d
> > +     [f] (a/e) [e] ./dir4/a/e
> > +     EOF
> > +
> > +     test-tool dir-iterator --follow-symlinks ./dir4 >out &&
> > +     sort <out >actual-follow-sorted-output &&
>
> Likewise.
>
> > +     test_cmp expected-follow-sorted-output actual-follow-sorted-output
> > +'
> > +
> > +
> > +test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
> > +     cat >expected-rec-symlinks-sorted-output <<-EOF &&
> > +     [d] (a) [a] ./dir5/a
> > +     [d] (a/b) [b] ./dir5/a/b
> > +     [d] (a/b/d) [d] ./dir5/a/b/d
> > +     [d] (a/c) [c] ./dir5/a/c
> > +     EOF
> > +
> > +     test-tool dir-iterator --follow-symlinks ./dir5 >out &&
> > +     sort <out >actual-rec-symlinks-sorted-output &&
>
> Likewise.
>
> > +     test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output
> > +'
> > +
> >  test_done
> > --
> > 2.22.0
> >
diff mbox series

Patch

diff --git a/dir-iterator.c b/dir-iterator.c
index 594fe4d67b..52db87bdc9 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -6,6 +6,9 @@ 
 struct dir_iterator_level {
 	DIR *dir;
 
+	/* The inode number of this level's directory. */
+	ino_t ino;
+
 	/*
 	 * The length of the directory part of path at this level
 	 * (including a trailing '/'):
@@ -38,13 +41,16 @@  struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Combination of flags for this dir-iterator */
+	unsigned int flags;
 };
 
 /*
  * Push a level in the iter stack and initialize it with information from
  * the directory pointed by iter->base->path. It is assumed that this
  * strbuf points to a valid directory path. Return 0 on success and -1
- * otherwise, leaving the stack unchanged.
+ * otherwise, setting errno accordingly and leaving the stack unchanged.
  */
 static int push_level(struct dir_iterator_int *iter)
 {
@@ -56,14 +62,17 @@  static int push_level(struct dir_iterator_int *iter)
 	if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
 		strbuf_addch(&iter->base.path, '/');
 	level->prefix_len = iter->base.path.len;
+	level->ino = iter->base.st.st_ino;
 
 	level->dir = opendir(iter->base.path.buf);
 	if (!level->dir) {
+		int saved_errno = errno;
 		if (errno != ENOENT) {
 			warning_errno("error opening directory '%s'",
 				      iter->base.path.buf);
 		}
 		iter->levels_nr--;
+		errno = saved_errno;
 		return -1;
 	}
 
@@ -90,11 +99,13 @@  static int pop_level(struct dir_iterator_int *iter)
 /*
  * Populate iter->base with the necessary information on the next iteration
  * entry, represented by the given dirent de. Return 0 on success and -1
- * otherwise.
+ * otherwise, setting errno accordingly.
  */
 static int prepare_next_entry_data(struct dir_iterator_int *iter,
 				   struct dirent *de)
 {
+	int err, saved_errno;
+
 	strbuf_addstr(&iter->base.path, de->d_name);
 	/*
 	 * We have to reset these because the path strbuf might have
@@ -105,12 +116,34 @@  static int prepare_next_entry_data(struct dir_iterator_int *iter,
 	iter->base.basename = iter->base.path.buf +
 			      iter->levels[iter->levels_nr - 1].prefix_len;
 
-	if (lstat(iter->base.path.buf, &iter->base.st)) {
-		if (errno != ENOENT)
-			warning_errno("failed to stat '%s'", iter->base.path.buf);
-		return -1;
-	}
+	if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+		err = stat(iter->base.path.buf, &iter->base.st);
+	else
+		err = lstat(iter->base.path.buf, &iter->base.st);
+
+	saved_errno = errno;
+	if (err && errno != ENOENT)
+		warning_errno("failed to stat '%s'", iter->base.path.buf);
+
+	errno = saved_errno;
+	return err;
+}
+
+/*
+ * Look for a recursive symlink at iter->base.path pointing to any directory on
+ * the previous stack levels. If it is found, return 1. If not, return 0.
+ */
+static int find_recursive_symlinks(struct dir_iterator_int *iter)
+{
+	int i;
+
+	if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
+	    !S_ISDIR(iter->base.st.st_mode))
+		return 0;
 
+	for (i = 0; i < iter->levels_nr; ++i)
+		if (iter->base.st.st_ino == iter->levels[i].ino)
+			return 1;
 	return 0;
 }
 
@@ -119,11 +152,11 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
 
-	if (S_ISDIR(iter->base.st.st_mode)) {
-		if (push_level(iter) && iter->levels_nr == 0) {
-			/* Pushing the first level failed */
-			return dir_iterator_abort(dir_iterator);
-		}
+	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
+		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+			goto error_out;
+		if (iter->levels_nr == 0)
+			goto error_out;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -137,22 +170,38 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		de = readdir(level->dir);
 
 		if (!de) {
-			if (errno)
+			if (errno) {
 				warning_errno("error reading directory '%s'",
 					      iter->base.path.buf);
-			else if (pop_level(iter) == 0)
+				if (iter->flags & DIR_ITERATOR_PEDANTIC)
+					goto error_out;
+			} else if (pop_level(iter) == 0) {
 				return dir_iterator_abort(dir_iterator);
+			}
 			continue;
 		}
 
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
-		if (prepare_next_entry_data(iter, de))
+		if (prepare_next_entry_data(iter, de)) {
+			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+				goto error_out;
 			continue;
+		}
+
+		if (find_recursive_symlinks(iter)) {
+			warning("ignoring recursive symlink at '%s'",
+				iter->base.path.buf);
+			continue;
+		}
 
 		return ITER_OK;
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -178,7 +227,7 @@  int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -189,6 +238,7 @@  struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 	iter->levels_nr = 0;
+	iter->flags = flags;
 
 	/*
 	 * Note: stat already checks for NULL or empty strings and
diff --git a/dir-iterator.h b/dir-iterator.h
index 0822821e56..42751091a5 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -20,7 +20,8 @@ 
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     unsigned int flags = DIR_ITERATOR_PEDANTIC;
+ *     struct dir_iterator *iter = dir_iterator_begin(path, flags);
  *
  *     if (!iter)
  *             goto error_handler;
@@ -44,6 +45,25 @@ 
  * dir_iterator_advance() again.
  */
 
+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ *   in case of an error at dir_iterator_advance(), which is to keep
+ *   looking for a next valid entry. With this flag, resources are freed
+ *   and ITER_ERROR is returned immediately. In both cases, a meaningful
+ *   warning is emitted. Note: ENOENT errors are always ignored so that
+ *   the API users may remove files during iteration.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
+ *   i.e., linked directories' contents will be iterated over and
+ *   iter->base.st will contain information on the referred files,
+ *   not the symlinks themselves, which is the default behavior.
+ *   Recursive symlinks are skipped with a warning and broken symlinks
+ *   are ignored.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -58,29 +78,38 @@  struct dir_iterator {
 	/* The current basename: */
 	const char *basename;
 
-	/* The result of calling lstat() on path: */
+	/*
+	 * The result of calling lstat() on path; or stat(), if the
+	 * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+	 * dir_iterator's initialization.
+	 */
 	struct stat st;
 };
 
 /*
- * Start a directory iteration over path. On success, return a
- * dir_iterator that holds the internal state of the iteration.
- * In case of failure, return NULL and set errno accordingly.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. On success, return a dir_iterator
+ * that holds the internal state of the iteration. In case of
+ * failure, return NULL and set errno accordingly.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ *  - path is the starting directory. An internal copy will be made.
+ *  - flags is a combination of the possible flags to initialize a
+ *    dir-iterator or 0 for default behavior.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
  * If the iteration is exhausted, free the dir_iterator and any
- * resources associated with it and return ITER_DONE. On error, free
- * dir_iterator and associated resources and return ITER_ERROR. It is
- * a bug to use iterator or call this function again after it has
- * returned ITER_DONE or ITER_ERROR.
+ * resources associated with it and return ITER_DONE.
+ *
+ * It is a bug to use iterator or call this function again after it
+ * has returned ITER_DONE or ITER_ERROR (which may be returned iff
+ * the DIR_ITERATOR_PEDANTIC flag was set).
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 7ed81046d4..b1f8f53a09 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2150,7 +2150,7 @@  static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	strbuf_addf(&sb, "%s/logs", gitdir);
 
-	diter = dir_iterator_begin(sb.buf);
+	diter = dir_iterator_begin(sb.buf, 0);
 	if(!diter)
 		return empty_ref_iterator_begin();
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index fab1ff6237..a5b96cb0dc 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,29 +4,44 @@ 
 #include "iterator.h"
 #include "dir-iterator.h"
 
-/* Argument is a directory path to iterate over */
+/*
+ * usage:
+ * tool-test dir-iterator [--follow-symlinks] [--pedantic] directory_path
+ */
 int cmd__dir_iterator(int argc, const char **argv)
 {
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
+	unsigned int flags = 0;
+	int iter_status;
+
+	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
+		if (strcmp(*argv, "--follow-symlinks") == 0)
+			flags |= DIR_ITERATOR_FOLLOW_SYMLINKS;
+		else if (strcmp(*argv, "--pedantic") == 0)
+			flags |= DIR_ITERATOR_PEDANTIC;
+		else
+			die("invalid option '%s'", *argv);
+	}
 
-	if (argc < 2)
-		die("BUG: test-dir-iterator needs one argument");
-
-	strbuf_add(&path, argv[1], strlen(argv[1]));
+	if (!*argv || argc != 1)
+		die("dir-iterator needs exactly one non-option argument");
 
-	diter = dir_iterator_begin(path.buf);
+	strbuf_add(&path, *argv, strlen(*argv));
+	diter = dir_iterator_begin(path.buf, flags);
 
 	if (!diter) {
 		printf("dir_iterator_begin failure: %d\n", errno);
 		exit(EXIT_FAILURE);
 	}
 
-	while (dir_iterator_advance(diter) == ITER_OK) {
+	while ((iter_status = dir_iterator_advance(diter)) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
 			printf("[d] ");
 		else if (S_ISREG(diter->st.st_mode))
 			printf("[f] ");
+		else if (S_ISLNK(diter->st.st_mode))
+			printf("[s] ");
 		else
 			printf("[?] ");
 
@@ -34,5 +49,10 @@  int cmd__dir_iterator(int argc, const char **argv)
 		       diter->path.buf);
 	}
 
+	if (iter_status != ITER_DONE) {
+		printf("dir_iterator_advance failure\n");
+		return 1;
+	}
+
 	return 0;
 }
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index c739ed7911..8f996a31fa 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -65,4 +65,99 @@  test_expect_success 'begin should fail upon non directory paths' '
 	test_cmp expected-non-dir-output actual-non-dir-output
 '
 
+test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
+	cat >expected-no-permissions-output <<-EOF &&
+	[d] (a) [a] ./dir3/a
+	EOF
+
+	mkdir -p dir3/a &&
+	> dir3/a/b &&
+	chmod 0 dir3/a &&
+
+	test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
+	test_cmp expected-no-permissions-output actual-no-permissions-output &&
+	chmod 755 dir3/a &&
+	rm -rf dir3
+'
+
+test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
+	cat >expected-no-permissions-pedantic-output <<-EOF &&
+	[d] (a) [a] ./dir3/a
+	dir_iterator_advance failure
+	EOF
+
+	mkdir -p dir3/a &&
+	> dir3/a/b &&
+	chmod 0 dir3/a &&
+
+	test_must_fail test-tool dir-iterator --pedantic ./dir3 \
+		>actual-no-permissions-pedantic-output &&
+	test_cmp expected-no-permissions-pedantic-output \
+		actual-no-permissions-pedantic-output &&
+	chmod 755 dir3/a &&
+	rm -rf dir3
+'
+
+test_expect_success SYMLINKS 'setup dirs with symlinks' '
+	mkdir -p dir4/a &&
+	mkdir -p dir4/b/c &&
+	>dir4/a/d &&
+	ln -s d dir4/a/e &&
+	ln -s ../b dir4/a/f &&
+
+	mkdir -p dir5/a/b &&
+	mkdir -p dir5/a/c &&
+	ln -s ../c dir5/a/b/d &&
+	ln -s ../ dir5/a/b/e &&
+	ln -s ../../ dir5/a/b/f
+'
+
+test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
+	cat >expected-no-follow-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir4/a
+	[d] (b) [b] ./dir4/b
+	[d] (b/c) [c] ./dir4/b/c
+	[f] (a/d) [d] ./dir4/a/d
+	[s] (a/e) [e] ./dir4/a/e
+	[s] (a/f) [f] ./dir4/a/f
+	EOF
+
+	test-tool dir-iterator ./dir4 >out &&
+	sort <out >actual-no-follow-sorted-output &&
+
+	test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
+'
+
+test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
+	cat >expected-follow-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir4/a
+	[d] (a/f) [f] ./dir4/a/f
+	[d] (a/f/c) [c] ./dir4/a/f/c
+	[d] (b) [b] ./dir4/b
+	[d] (b/c) [c] ./dir4/b/c
+	[f] (a/d) [d] ./dir4/a/d
+	[f] (a/e) [e] ./dir4/a/e
+	EOF
+
+	test-tool dir-iterator --follow-symlinks ./dir4 >out &&
+	sort <out >actual-follow-sorted-output &&
+
+	test_cmp expected-follow-sorted-output actual-follow-sorted-output
+'
+
+
+test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
+	cat >expected-rec-symlinks-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir5/a
+	[d] (a/b) [b] ./dir5/a/b
+	[d] (a/b/d) [d] ./dir5/a/b/d
+	[d] (a/c) [c] ./dir5/a/c
+	EOF
+
+	test-tool dir-iterator --follow-symlinks ./dir5 >out &&
+	sort <out >actual-rec-symlinks-sorted-output &&
+
+	test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output
+'
+
 test_done