diff mbox series

mt/dir-iterator-updates, was Re: What's cooking in git.git (Jul 2019, #01; Wed, 3)

Message ID nycvar.QRO.7.76.6.1907041136530.44@tvgsbejvaqbjf.bet (mailing list archive)
State New, archived
Headers show
Series mt/dir-iterator-updates, was Re: What's cooking in git.git (Jul 2019, #01; Wed, 3) | expand

Commit Message

Johannes Schindelin July 4, 2019, 10:03 a.m. UTC
Hi Junio,

On Wed, 3 Jul 2019, Junio C Hamano wrote:

> * mt/dir-iterator-updates (2019-06-25) 10 commits
>  - clone: replace strcmp by fspathcmp
>  - clone: use dir-iterator to avoid explicit dir traversal
>  - clone: extract function from copy_or_link_directory
>  - clone: copy hidden paths at local clone
>  - dir-iterator: add flags parameter to dir_iterator_begin
>  - dir-iterator: refactor state machine model
>  - dir-iterator: use warning_errno when possible
>  - dir-iterator: add tests for dir-iterator API
>  - clone: better handle symlinked files at .git/objects/
>  - clone: test for our behavior on odd objects/* content
>
>  Adjust the dir-iterator API and apply it to the local clone
>  optimization codepath.
>
>  Is this ready for 'next'?

No. It still breaks many dozens of test cases on Windows (if not more)
because it thinks that it can rely on `st_ino` to detect circular
symlinks.

In
https://public-inbox.org/git/nycvar.QRO.7.76.6.1906272046180.44@tvgsbejvaqbjf.bet/
I had suggested to do something like this:

-- snip --
-- snap --

Duy had also suggested to guard part of this using `USE_STDEV`, but as
Matheus figured out that would not make sense, as the `USE_STDEV` flag
really is meant to work around issues with network filesystems where
`st_dev` can be unreliable.

However, in the meantime I thought about this a bit more and I remembered
how this is done elsewhere: I saw many recursive symlink resolvers that
just have an arbitrary cut-off after following, say, 32 links.

In fact, Git itself already has this in abspath.c:

	/* We allow "recursive" symbolic links. Only within reason, though. */
	#ifndef MAXSYMLINKS
	#define MAXSYMLINKS 32
	#endif

But then, the patch in question uses `stat()` instead of `lstat()`, so we
would not have any way to count the number of symbolic links we followed.

Do we _have_ to, though? At some stage the path we come up with is beyond
`PATH_MAX` and we can stop right then and there.

Besides, the way `find_recursive_symlinks()` is implemented adds quadratic
behavior.

So I would like to submit the idea of simplifying the code drastically,
by skipping the `find_recursive_symlinks()` function altogether.

This would solve another issue I have with that function, anyway: The name
suggests, to me at least, that we follow symlinks recursively. It does
not. I think that could have been addressed by using the adjective
"circular" instead of "recursive". But I also think there are enough
reasons to do away with this function in the first place.

Ciao,
Dscho

Comments

Matheus Tavares July 4, 2019, 4:39 p.m. UTC | #1
Hi, Dscho

On Thu, Jul 4, 2019 at 7:02 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Junio,
>
> On Wed, 3 Jul 2019, Junio C Hamano wrote:
>
> > * mt/dir-iterator-updates (2019-06-25) 10 commits
[...]
> >  Is this ready for 'next'?
>
> No. It still breaks many dozens of test cases on Windows (if not more)
> because it thinks that it can rely on `st_ino` to detect circular
> symlinks.

I wanted to take a look at the failures to see if I could help, but
I'm not very familiar with azure (and travis-ci doesn't run windows'
tests). So the only build I could find, containing the code from this
series, is this[1]. But it only shows 4 failures and they don't seem
to relate with dir-iterator... Could you point me to the right place,
please?

> In
> https://public-inbox.org/git/nycvar.QRO.7.76.6.1906272046180.44@tvgsbejvaqbjf.bet/
> I had suggested to do something like this:
>
[...]
>
> However, in the meantime I thought about this a bit more and I remembered
> how this is done elsewhere: I saw many recursive symlink resolvers that
> just have an arbitrary cut-off after following, say, 32 links.
>
> In fact, Git itself already has this in abspath.c:
>
>         /* We allow "recursive" symbolic links. Only within reason, though. */
>         #ifndef MAXSYMLINKS
>         #define MAXSYMLINKS 32
>         #endif
>
> But then, the patch in question uses `stat()` instead of `lstat()`, so we
> would not have any way to count the number of symbolic links we followed.

Hm, I think `stat()` itself uses this strategy of an arbitrary cut-off
when resolving a path. So we may also "ignore" circular symlinks and
let the iteration continue until the point where `stat()` will return
an ELOOP. (But it may be expensive...)

> Do we _have_ to, though? At some stage the path we come up with is beyond
> `PATH_MAX` and we can stop right then and there.
>
> Besides, the way `find_recursive_symlinks()` is implemented adds quadratic
> behavior.

Yes, indeed. But it only happens when we have a path like this:
`symlink_to_dir1/symlink_to_dir2/symlink_to_dir3/symlink_to_dir4/...`,
right? I think this case won't be very usual on actual filesystems,
thought.

> So I would like to submit the idea of simplifying the code drastically,
> by skipping the `find_recursive_symlinks()` function altogether.
>
> This would solve another issue I have with that function, anyway: The name
> suggests, to me at least, that we follow symlinks recursively. It does
> not. I think that could have been addressed by using the adjective
> "circular" instead of "recursive".

Indeed, "circular" sounds much better then "recursive".

> But I also think there are enough
> reasons to do away with this function in the first place.

We can delegate the circular symlinks problem to `stat()'s ELOOP` or
`path_len > PATH_MAX`. The only downside is the overhead of iterating
through directories which will be latter discarded for being in
circular symlinks' chains. I mean, the overhead at dir-iterator
shouldn't be much, but the one on the code using this API to do
something for each of these directories (and its contents), may be.
Also we would need to "undo" the work done for each of these
directories if we want to ignore circular symlinks and continue the
iteration, whereas if we try to detect it a priori, we can skip it
from the beginning.

> Ciao,
> Dscho

[1]: https://dev.azure.com/git-for-windows/git/_build/results?buildId=38852
Johannes Schindelin July 4, 2019, 9:31 p.m. UTC | #2
Hi Matheus,

On Thu, 4 Jul 2019, Matheus Tavares Bernardino wrote:

> On Thu, Jul 4, 2019 at 7:02 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Wed, 3 Jul 2019, Junio C Hamano wrote:
> >
> > > * mt/dir-iterator-updates (2019-06-25) 10 commits
> [...]
> > >  Is this ready for 'next'?
> >
> > No. It still breaks many dozens of test cases on Windows (if not more)
> > because it thinks that it can rely on `st_ino` to detect circular
> > symlinks.
>
> I wanted to take a look at the failures to see if I could help, but
> I'm not very familiar with azure (and travis-ci doesn't run windows'
> tests). So the only build I could find, containing the code from this
> series, is this[1]. But it only shows 4 failures and they don't seem
> to relate with dir-iterator... Could you point me to the right place,
> please?

Certainly. In
https://github.com/gitgitgadget/git/branches/all?utf8=%E2%9C%93&query=mt%2F,
you will find the red X next to the branch name `mt/dir-iterator-updates`,
and clicking on it will pop up the list of jobs (including the failing
ones).

If you click on any item, it will first get you to the "Checks" page where
you can find the link "View more details on Azure Pipelines" on the bottom
of the right-hand side pane. This will lead you to
https://dev.azure.com/gitgitgadget/git/_build/results?buildId=11495

I usually click on the "Tests" tab in that page:
https://dev.azure.com/gitgitgadget/git/_build/results?buildId=11495&view=ms.vss-test-web.build-test-results-tab

You can click on any of the 1,384 (!) failing test cases, it will pop up a
pane on the right-hand side that shows the trace of that failing test
case. For the full trace of that test script, go to "Attachments" and
download the `Standard_Error_Output.log` (via the horizontal bread-crumbs
menu you can see when hovering over the file name).

> > In
> > https://public-inbox.org/git/nycvar.QRO.7.76.6.1906272046180.44@tvgsbejvaqbjf.bet/
> > I had suggested to do something like this:
> >
> [...]
> >
> > However, in the meantime I thought about this a bit more and I remembered
> > how this is done elsewhere: I saw many recursive symlink resolvers that
> > just have an arbitrary cut-off after following, say, 32 links.
> >
> > In fact, Git itself already has this in abspath.c:
> >
> >         /* We allow "recursive" symbolic links. Only within reason, though. */
> >         #ifndef MAXSYMLINKS
> >         #define MAXSYMLINKS 32
> >         #endif
> >
> > But then, the patch in question uses `stat()` instead of `lstat()`, so we
> > would not have any way to count the number of symbolic links we followed.
>
> Hm, I think `stat()` itself uses this strategy of an arbitrary cut-off
> when resolving a path. So we may also "ignore" circular symlinks and
> let the iteration continue until the point where `stat()` will return
> an ELOOP. (But it may be expensive...)

This would not be equivalent, though, as your code also tried to address
circular references when two or more symlinks are involved, e.g. when
one symlink points to a directory that has another symlink that points to
the directory containing the first symlink.

> > Do we _have_ to, though? At some stage the path we come up with is beyond
> > `PATH_MAX` and we can stop right then and there.
> >
> > Besides, the way `find_recursive_symlinks()` is implemented adds quadratic
> > behavior.
>
> Yes, indeed. But it only happens when we have a path like this:
> `symlink_to_dir1/symlink_to_dir2/symlink_to_dir3/symlink_to_dir4/...`,
> right? I think this case won't be very usual on actual filesystems,
> thought.

No, the check is performed in a loop, and that loop is executed whether
you have symlinks or not. That loop is executed for every item in the
iteration, and as we cannot assume a flat directory in general (in fact,
we should assume a directory depth proportional to the total number of
files), that adds that quadratic behavior.

> > So I would like to submit the idea of simplifying the code drastically,
> > by skipping the `find_recursive_symlinks()` function altogether.
> >
> > This would solve another issue I have with that function, anyway: The name
> > suggests, to me at least, that we follow symlinks recursively. It does
> > not. I think that could have been addressed by using the adjective
> > "circular" instead of "recursive".
>
> Indeed, "circular" sounds much better then "recursive".
>
> > But I also think there are enough
> > reasons to do away with this function in the first place.
>
> We can delegate the circular symlinks problem to `stat()'s ELOOP`

Not really. I mean, we _already_ delegate to the `ELOOP` condition, we
cannot even avoid it as long as we keep using `stat()` instead of
`lstat()`, but as I demonstrated above, that only catches part of the
circular symlinks.

> or `path_len > PATH_MAX`.

This would have the advantage of _not_ adding quadratic behavior.

And just in case you think quadratic behavior would not matter much: Git
is used to manage the largest repository on this planet, which has over 3
million index entries when checked out.

Quadratic behavior matters.

I don't know where the dir-iterator is used, but we simply should try our
best to aim for the optimal time complexity in the first place.

> The only downside is the overhead of iterating through directories which
> will be latter discarded for being in circular symlinks' chains. I mean,
> the overhead at dir-iterator shouldn't be much, but the one on the code
> using this API to do something for each of these directories (and its
> contents), may be. Also we would need to "undo" the work done for each
> of these directories if we want to ignore circular symlinks and continue
> the iteration, whereas if we try to detect it a priori, we can skip it
> from the beginning.

Given that the intent of this patch series is a mere refactoring, I wonder
why the new, contentious circular symlink detection is slipped into it
anyway. It complicates the task, makes reviewing a lot harder, and it
holds up the refactoring.

Ciao,
Dscho
Matheus Tavares July 5, 2019, 2:29 p.m. UTC | #3
On Thu, Jul 4, 2019 at 6:30 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> On Thu, 4 Jul 2019, Matheus Tavares Bernardino wrote:
> >
> > I wanted to take a look at the failures to see if I could help, [...]
> > Could you point me to the right place, please?
[...]
>
> I usually click on the "Tests" tab in that page:
> https://dev.azure.com/gitgitgadget/git/_build/results?buildId=11495&view=ms.vss-test-web.build-test-results-tab
>
> You can click on any of the 1,384 (!) failing test cases, it will pop up a
> pane on the right-hand side that shows the trace of that failing test
> case. For the full trace of that test script, go to "Attachments" and
> download the `Standard_Error_Output.log` (via the horizontal bread-crumbs
> menu you can see when hovering over the file name).

Thanks for the explanation! I inspected some of the
`Standard_Error_Output.log`'s and it seems the problem is always with
local clone (which started to use dir-iterator in this series). It
seems all .git/objects/ dirs are being ignored. That makes sense,
since st_ino will always be 0 on Windows. But your fixup patch should
solve this. Is there any azure build for it?

[...]
> >
> > Hm, I think `stat()` itself uses this strategy of an arbitrary cut-off
> > when resolving a path. So we may also "ignore" circular symlinks and
> > let the iteration continue until the point where `stat()` will return
> > an ELOOP. (But it may be expensive...)
>
> This would not be equivalent, though, as your code also tried to address
> circular references when two or more symlinks are involved, e.g. when
> one symlink points to a directory that has another symlink that points to
> the directory containing the first symlink.

Hm, `stat()` also addresses this case doesn't it? For example:

$ mkdir a b
$ ln -s ../a b/s2a
$ ln -s ../b a/s2b
$ stat b/s2a/s2b/s2a/.../s2b

Gives me:
"too many levels of symbolic links"

> > > Do we _have_ to, though? At some stage the path we come up with is beyond
> > > `PATH_MAX` and we can stop right then and there.
> > >
> > > Besides, the way `find_recursive_symlinks()` is implemented adds quadratic
> > > behavior.
> >
> > Yes, indeed. But it only happens when we have a path like this:
> > `symlink_to_dir1/symlink_to_dir2/symlink_to_dir3/symlink_to_dir4/...`,
> > right? I think this case won't be very usual on actual filesystems,
> > thought.
>
> No, the check is performed in a loop, and that loop is executed whether
> you have symlinks or not. That loop is executed for every item in the
> iteration, and as we cannot assume a flat directory in general (in fact,
> we should assume a directory depth proportional to the total number of
> files), that adds that quadratic behavior.

Oh, you're right. Sorry for the noise, I forgot this function was not
only called for symlinks but for every directory entry :(

An alternative solution would be to use `lstat()` together with
`stat()` to only feed symlinked dirs to this function. This should
reduce a lot the number of calls. Although it'd still be quadratic in
the worst case, would that be any good?

[...]
> > > But I also think there are enough
> > > reasons to do away with this function in the first place.
> >
> > We can delegate the circular symlinks problem to `stat()'s ELOOP`
>
> Not really. I mean, we _already_ delegate to the `ELOOP` condition, we
> cannot even avoid it as long as we keep using `stat()` instead of
> `lstat()`

Yes, indeed. What I meant is that we may chose to _fully_ delegate to
ELOOP. The way it is now, we should detect circular symlinks way
before stat() fails. For example, with the case I showed above, we
would stop at "b/s2a/s2b" whereas stat() would only fail at a much
longer "b/s2a/s2b/s2a/s2b/...", far beyond in the iteration.

> but as I demonstrated above, that only catches part of the
> circular symlinks.
>
> > or `path_len > PATH_MAX`.
>
> This would have the advantage of _not_ adding quadratic behavior.
>
> And just in case you think quadratic behavior would not matter much: Git
> is used to manage the largest repository on this planet, which has over 3
> million index entries when checked out.
>
> Quadratic behavior matters.
>
> I don't know where the dir-iterator is used, but we simply should try our
> best to aim for the optimal time complexity in the first place.

Currently, with the follow symlinks option, dir-iterator is only being
used to iterate over .git/objects. As it's rather shallow, perhaps the
quadratic complexity wouldn't be a huge deal in this case. But I agree
with you that we should take care of performance so that this API may,
as well, be used in other places, in the future.

> > The only downside is the overhead of iterating through directories which
> > will be latter discarded for being in circular symlinks' chains. I mean,
> > the overhead at dir-iterator shouldn't be much, but the one on the code
> > using this API to do something for each of these directories (and its
> > contents), may be. Also we would need to "undo" the work done for each
> > of these directories if we want to ignore circular symlinks and continue
> > the iteration, whereas if we try to detect it a priori, we can skip it
> > from the beginning.
>
> Given that the intent of this patch series is a mere refactoring, I wonder
> why the new, contentious circular symlink detection is slipped into it
> anyway. It complicates the task, makes reviewing a lot harder, and it
> holds up the refactoring.

Yes, I apologize for that. I should have split this into 2 or maybe 3
patchsets... This series started really simple just trying to apply
the dir-iterator API into local clone. But then, other things became
necessary and it got more complex.

So, should I send a fixup patch removing find_recursive_symlinks() or
reroll the series? There's also the option to use stat()+lstat() to
reduce calls to this function, but it doesn't solve the problem on
Windows, anyway.
Johannes Schindelin July 5, 2019, 6:17 p.m. UTC | #4
Hi Matheus,

On Fri, 5 Jul 2019, Matheus Tavares Bernardino wrote:

> On Thu, Jul 4, 2019 at 6:30 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Hi Matheus,
> >
> > On Thu, 4 Jul 2019, Matheus Tavares Bernardino wrote:
> > >
> > > I wanted to take a look at the failures to see if I could help, [...]
> > > Could you point me to the right place, please?
> [...]
> >
> > I usually click on the "Tests" tab in that page:
> > https://dev.azure.com/gitgitgadget/git/_build/results?buildId=11495&view=ms.vss-test-web.build-test-results-tab
> >
> > You can click on any of the 1,384 (!) failing test cases, it will pop up a
> > pane on the right-hand side that shows the trace of that failing test
> > case. For the full trace of that test script, go to "Attachments" and
> > download the `Standard_Error_Output.log` (via the horizontal bread-crumbs
> > menu you can see when hovering over the file name).
>
> Thanks for the explanation! I inspected some of the
> `Standard_Error_Output.log`'s and it seems the problem is always with
> local clone (which started to use dir-iterator in this series). It
> seems all .git/objects/ dirs are being ignored. That makes sense,
> since st_ino will always be 0 on Windows. But your fixup patch should
> solve this. Is there any azure build for it?

There is no direct Azure Pipeline run for it, as I have not created a
branch with the fixup on top of your branch. But the shears/pu branch in
https://github.com/git-for-windows/git/branches does have the fixup, and
passes all tests.

> [...]
> > >
> > > Hm, I think `stat()` itself uses this strategy of an arbitrary cut-off
> > > when resolving a path. So we may also "ignore" circular symlinks and
> > > let the iteration continue until the point where `stat()` will return
> > > an ELOOP. (But it may be expensive...)
> >
> > This would not be equivalent, though, as your code also tried to address
> > circular references when two or more symlinks are involved, e.g. when
> > one symlink points to a directory that has another symlink that points to
> > the directory containing the first symlink.
>
> Hm, `stat()` also addresses this case doesn't it? For example:
>
> $ mkdir a b
> $ ln -s ../a b/s2a
> $ ln -s ../b a/s2b
> $ stat b/s2a/s2b/s2a/.../s2b
>
> Gives me:
> "too many levels of symbolic links"

Okay, then. Even better.

(With the caveat that I don't know how ubiquitous this behavior is, I
assume you only tested on Linux.)

> > > > Do we _have_ to, though? At some stage the path we come up with is beyond
> > > > `PATH_MAX` and we can stop right then and there.
> > > >
> > > > Besides, the way `find_recursive_symlinks()` is implemented adds quadratic
> > > > behavior.
> > >
> > > Yes, indeed. But it only happens when we have a path like this:
> > > `symlink_to_dir1/symlink_to_dir2/symlink_to_dir3/symlink_to_dir4/...`,
> > > right? I think this case won't be very usual on actual filesystems,
> > > thought.
> >
> > No, the check is performed in a loop, and that loop is executed whether
> > you have symlinks or not. That loop is executed for every item in the
> > iteration, and as we cannot assume a flat directory in general (in fact,
> > we should assume a directory depth proportional to the total number of
> > files), that adds that quadratic behavior.
>
> Oh, you're right. Sorry for the noise, I forgot this function was not
> only called for symlinks but for every directory entry :(
>
> An alternative solution would be to use `lstat()` together with
> `stat()` to only feed symlinked dirs to this function. This should
> reduce a lot the number of calls. Although it'd still be quadratic in
> the worst case, would that be any good?

Why not just skip this logic? At least for now? It really blocks the
development of this patch series, causing `pu` to be broken until the test
failures are resolved.

> [...]
> > > > But I also think there are enough
> > > > reasons to do away with this function in the first place.
> > >
> > > We can delegate the circular symlinks problem to `stat()'s ELOOP`
> >
> > Not really. I mean, we _already_ delegate to the `ELOOP` condition, we
> > cannot even avoid it as long as we keep using `stat()` instead of
> > `lstat()`
>
> Yes, indeed. What I meant is that we may chose to _fully_ delegate to
> ELOOP. The way it is now, we should detect circular symlinks way
> before stat() fails. For example, with the case I showed above, we
> would stop at "b/s2a/s2b" whereas stat() would only fail at a much
> longer "b/s2a/s2b/s2a/s2b/...", far beyond in the iteration.

Sounds like the solution to me that I wanted: drop the special code to
detect circular symlinks. In other words: I like that idea.

> > > The only downside is the overhead of iterating through directories which
> > > will be latter discarded for being in circular symlinks' chains. I mean,
> > > the overhead at dir-iterator shouldn't be much, but the one on the code
> > > using this API to do something for each of these directories (and its
> > > contents), may be. Also we would need to "undo" the work done for each
> > > of these directories if we want to ignore circular symlinks and continue
> > > the iteration, whereas if we try to detect it a priori, we can skip it
> > > from the beginning.
> >
> > Given that the intent of this patch series is a mere refactoring, I wonder
> > why the new, contentious circular symlink detection is slipped into it
> > anyway. It complicates the task, makes reviewing a lot harder, and it
> > holds up the refactoring.
>
> Yes, I apologize for that. I should have split this into 2 or maybe 3
> patchsets... This series started really simple just trying to apply
> the dir-iterator API into local clone. But then, other things became
> necessary and it got more complex.
>
> So, should I send a fixup patch removing find_recursive_symlinks() or
> reroll the series? There's also the option to use stat()+lstat() to
> reduce calls to this function, but it doesn't solve the problem on
> Windows, anyway.

I would suggest to send another iteration that removes
`find_recursive_symlinks()`. Junio most likely interpreted my objections
as a veto against advancing the current iteration to `next`, meaning that
you're good to even rewrite completely in the next iteration, should you
feel the need to. No need for "Oops, fix that" follow-up commits at this
stage.

Ciao,
Dscho
Matheus Tavares July 8, 2019, 10:30 p.m. UTC | #5
On Fri, Jul 5, 2019 at 3:17 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> On Fri, 5 Jul 2019, Matheus Tavares Bernardino wrote:
> >
> > So, should I send a fixup patch removing find_recursive_symlinks() or
> > reroll the series? There's also the option to use stat()+lstat() to
> > reduce calls to this function, but it doesn't solve the problem on
> > Windows, anyway.
>
> I would suggest to send another iteration that removes
> `find_recursive_symlinks()`. Junio most likely interpreted my objections
> as a veto against advancing the current iteration to `next`, meaning that
> you're good to even rewrite completely in the next iteration, should you
> feel the need to. No need for "Oops, fix that" follow-up commits at this
> stage.

OK, thanks! I'll send a v8, then, removing the circular symlink section, ASAP.

> Ciao,
> Dscho

Best,
Matheus
diff mbox series

Patch

diff --git a/dir-iterator.c b/dir-iterator.c
index 52db87bdc99f..85cd04b7b571 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -8,6 +8,7 @@  struct dir_iterator_level {

 	/* The inode number of this level's directory. */
 	ino_t ino;
+	dev_t dev;

 	/*
 	 * The length of the directory part of path at this level
@@ -63,6 +64,7 @@  static int push_level(struct dir_iterator_int *iter)
 		strbuf_addch(&iter->base.path, '/');
 	level->prefix_len = iter->base.path.len;
 	level->ino = iter->base.st.st_ino;
+	level->dev = iter->base.st.st_dev;

 	level->dir = opendir(iter->base.path.buf);
 	if (!level->dir) {
@@ -138,11 +140,14 @@  static int find_recursive_symlinks(struct dir_iterator_int *iter)
 	int i;

 	if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
-	    !S_ISDIR(iter->base.st.st_mode))
+	    !S_ISDIR(iter->base.st.st_mode) ||
+	    /* On Windows, st_ino is always set to 0 */
+	    !iter->base.st.st_ino)
 		return 0;

 	for (i = 0; i < iter->levels_nr; ++i)
-		if (iter->base.st.st_ino == iter->levels[i].ino)
+		if (iter->base.st.st_ino == iter->levels[i].ino &&
+		    iter->base.st.st_dev == iter->levels[i].dev)
 			return 1;
 	return 0;
 }