Message ID | 5a678ee74de42f1373deeed718fa24d368347d13.1560898723.git.matheus.bernardino@usp.br (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | clone: dir-iterator refactoring with tests | expand |
Matheus Tavares <matheus.bernardino@usp.br> writes: This hunk, which claims to have 25 lines in the postimage ... > @@ -44,6 +45,25 @@ > * dir_iterator_advance() again. > */ > > +/* > + * Flags for dir_iterator_begin: > + * > + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior > + * in case of an error at dir_iterator_advance(), which is to keep > + * looking for a next valid entry. With this flag, resources are freed > + * and ITER_ERROR is returned immediately. In both cases, a meaningful > + * warning is emitted. Note: ENOENT errors are always ignored so that > + * the API users may remove files during iteration. > + * > + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks. > + * i.e., linked directories' contents will be iterated over and > + * iter->base.st will contain information on the referred files, > + * not the symlinks themselves, which is the default behavior. > + * Recursive symlinks are skipped with a warning and broken symlinks > + * are ignored. > + */ > +#define DIR_ITERATOR_PEDANTIC (1 << 0) > +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1) > + > struct dir_iterator { > /* The current path: */ > struct strbuf path; > @@ -58,29 +78,38 @@ struct dir_iterator { ... adds 20 lines, making the postimage 26 lines long. Did you hand edit your patch? It is OK to do so, as long as you know what you are doing ;-). Adjust the length of the postimage on the @@ ... @@ line to make it consistent with the patch text, and also make sure a tweak you do here won't make later patches not apply.
On Tue, Jun 25, 2019 at 3:00 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matheus Tavares <matheus.bernardino@usp.br> writes: > > This hunk, which claims to have 25 lines in the postimage ... > > > @@ -44,6 +45,25 @@ > > * dir_iterator_advance() again. > > */ > > > > +/* > > + * Flags for dir_iterator_begin: > > + * > > + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior > > + * in case of an error at dir_iterator_advance(), which is to keep > > + * looking for a next valid entry. With this flag, resources are freed > > + * and ITER_ERROR is returned immediately. In both cases, a meaningful > > + * warning is emitted. Note: ENOENT errors are always ignored so that > > + * the API users may remove files during iteration. > > + * > > + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks. > > + * i.e., linked directories' contents will be iterated over and > > + * iter->base.st will contain information on the referred files, > > + * not the symlinks themselves, which is the default behavior. > > + * Recursive symlinks are skipped with a warning and broken symlinks > > + * are ignored. > > + */ > > +#define DIR_ITERATOR_PEDANTIC (1 << 0) > > +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1) > > + > > struct dir_iterator { > > /* The current path: */ > > struct strbuf path; > > @@ -58,29 +78,38 @@ struct dir_iterator { > > ... adds 20 lines, making the postimage 26 lines long. > > Did you hand edit your patch? It is OK to do so, as long as you > know what you are doing ;-). Adjust the length of the postimage on > the @@ ... @@ line to make it consistent with the patch text, and > also make sure a tweak you do here won't make later patches not > apply. Oh, I'm sorry for that, I'll be more careful with hand editing next time. Thanks for the advice. I think for this time it won't affect the later patches as it was a minor addition at one comment, but should I perhaps re-send it?
Hi Matheus, On Tue, 18 Jun 2019, Matheus Tavares wrote: >[...] > +/* > + * Look for a recursive symlink at iter->base.path pointing to any directory on > + * the previous stack levels. If it is found, return 1. If not, return 0. > + */ > +static int find_recursive_symlinks(struct dir_iterator_int *iter) > +{ > + int i; > + > + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || > + !S_ISDIR(iter->base.st.st_mode)) > + return 0; > > + for (i = 0; i < iter->levels_nr; ++i) > + if (iter->base.st.st_ino == iter->levels[i].ino) This does not work on Windows. Remember, Git relies on (too) many areas where Linux is strong, and the `lstat()` call is one of them. Therefore, Git overuses that call. In the Git for Windows project, we struggled a bit to emulate it in the best way. It is pretty expensive, for example, to find out the number of hard links, the device ID, an equivalent of the inode, etc. Many `lstat()` calls are really only interested in the `mtime`, though, meaning that we would waste a ton of time if we tried to be more faithful in our `lstat()` emulation. Therefore, we simply assign `0` as inode. Sure, this violates the POSIX standard, but imagine this: the FAT filesystem (which is still in use!) does not have _anything_ resembling inodes. I fear, therefore, that we will require at least a workaround for the situation where `st_ino` is always zero. Ciao, Johannes
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Hi Matheus, > > On Tue, 18 Jun 2019, Matheus Tavares wrote: > >>[...] >> +/* >> + * Look for a recursive symlink at iter->base.path pointing to any directory on >> + * the previous stack levels. If it is found, return 1. If not, return 0. >> + */ >> +static int find_recursive_symlinks(struct dir_iterator_int *iter) >> +{ >> + int i; >> + >> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || >> + !S_ISDIR(iter->base.st.st_mode)) >> + return 0; >> >> + for (i = 0; i < iter->levels_nr; ++i) >> + if (iter->base.st.st_ino == iter->levels[i].ino) > > This does not work on Windows. [[ Windows port does not have > usable st_ino field ]]] And if you cross mountpoint, st_ino alone does not guarantee uniqueness; you'd need to combine it with st_dev, I would think, even on POSIX systems.
On Thu, Jun 27, 2019 at 1:04 AM Junio C Hamano <gitster@pobox.com> wrote: > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > Hi Matheus, > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote: > > > >>[...] > >> +/* > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on > >> + * the previous stack levels. If it is found, return 1. If not, return 0. > >> + */ > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter) > >> +{ > >> + int i; > >> + > >> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || > >> + !S_ISDIR(iter->base.st.st_mode)) > >> + return 0; > >> > >> + for (i = 0; i < iter->levels_nr; ++i) > >> + if (iter->base.st.st_ino == iter->levels[i].ino) > > > > This does not work on Windows. [[ Windows port does not have > > usable st_ino field ]]] > > And if you cross mountpoint, st_ino alone does not guarantee > uniqueness; you'd need to combine it with st_dev, I would think, > even on POSIX systems. which should be protected by USE_STDEV. There's another code that ignore st_ino on Windows in entry.c. Maybe it's time to define USE_STINO instead of spreading "#if GIT_WINDOWS_NATIVE" more.
On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote: > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > Hi Matheus, > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote: > > > >>[...] > >> +/* > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on > >> + * the previous stack levels. If it is found, return 1. If not, return 0. > >> + */ > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter) > >> +{ > >> + int i; > >> + > >> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || > >> + !S_ISDIR(iter->base.st.st_mode)) > >> + return 0; > >> > >> + for (i = 0; i < iter->levels_nr; ++i) > >> + if (iter->base.st.st_ino == iter->levels[i].ino) > > > > This does not work on Windows. [[ Windows port does not have > > usable st_ino field ]]] > > And if you cross mountpoint, st_ino alone does not guarantee > uniqueness; you'd need to combine it with st_dev, I would think, > even on POSIX systems. Ok, thanks for letting me know. I'm trying to think of another approach to test for recursive symlinks that does not rely on inode: Given any symlink, we could get its real_path() and compare it with the path of the directory current being iterated. If the first is a prefix of the second, than we mark it as a recursive symlink. What do you think of this idea?
Hi Matheus, On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote: > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > > > Hi Matheus, > > > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote: > > > > > >>[...] > > >> +/* > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on > > >> + * the previous stack levels. If it is found, return 1. If not, return 0. > > >> + */ > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter) > > >> +{ > > >> + int i; > > >> + > > >> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || > > >> + !S_ISDIR(iter->base.st.st_mode)) > > >> + return 0; > > >> > > >> + for (i = 0; i < iter->levels_nr; ++i) > > >> + if (iter->base.st.st_ino == iter->levels[i].ino) > > > > > > This does not work on Windows. [[ Windows port does not have > > > usable st_ino field ]]] > > > > And if you cross mountpoint, st_ino alone does not guarantee > > uniqueness; you'd need to combine it with st_dev, I would think, > > even on POSIX systems. > > Ok, thanks for letting me know. I'm trying to think of another > approach to test for recursive symlinks that does not rely on inode: > Given any symlink, we could get its real_path() and compare it with > the path of the directory current being iterated. If the first is a > prefix of the second, than we mark it as a recursive symlink. > > What do you think of this idea? I think this would be pretty expensive. Too expensive. A better method might be to rely on st_ino/st_dev when we can, and just not bother looking for recursive symlinks when we cannot, like I did in https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a Ciao, Johannes
On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Matheus, > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote: > > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > > > > > Hi Matheus, > > > > > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote: > > > > > > > >>[...] > > > >> +/* > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0. > > > >> + */ > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter) > > > >> +{ > > > >> + int i; > > > >> + > > > >> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || > > > >> + !S_ISDIR(iter->base.st.st_mode)) > > > >> + return 0; > > > >> > > > >> + for (i = 0; i < iter->levels_nr; ++i) > > > >> + if (iter->base.st.st_ino == iter->levels[i].ino) > > > > > > > > This does not work on Windows. [[ Windows port does not have > > > > usable st_ino field ]]] > > > > > > And if you cross mountpoint, st_ino alone does not guarantee > > > uniqueness; you'd need to combine it with st_dev, I would think, > > > even on POSIX systems. > > > > Ok, thanks for letting me know. I'm trying to think of another > > approach to test for recursive symlinks that does not rely on inode: > > Given any symlink, we could get its real_path() and compare it with > > the path of the directory current being iterated. If the first is a > > prefix of the second, than we mark it as a recursive symlink. > > > > What do you think of this idea? > > I think this would be pretty expensive. Too expensive. Hmm, yes unfortunately :( > A better method might be to rely on st_ino/st_dev when we can, and just > not bother looking for recursive symlinks when we cannot, What if we fallback on the path prefix strategy when st_ino is not available? I mean, if we don't look for recursive symlinks, they would be iterated over and over until we get an ELOOP error. So I think using real_path() should be less expensive in this case. (But just as a fallback to st_ino, off course) > like I did in > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a Nice! At dir-iterator.h the documentation says that recursive symlinks will be ignored. If we don't implement any fallback, should we add that this is not available on Windows, perhaps? > Ciao, > Johannes
Hi Matheus, On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote: > On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote: > > > > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > > > > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote: > > > > > > > > > >>[...] > > > > >> +/* > > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on > > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0. > > > > >> + */ > > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter) > > > > >> +{ > > > > >> + int i; > > > > >> + > > > > >> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || > > > > >> + !S_ISDIR(iter->base.st.st_mode)) > > > > >> + return 0; > > > > >> > > > > >> + for (i = 0; i < iter->levels_nr; ++i) > > > > >> + if (iter->base.st.st_ino == iter->levels[i].ino) > > > > > > > > > > This does not work on Windows. [[ Windows port does not have > > > > > usable st_ino field ]]] > > > > > > > > And if you cross mountpoint, st_ino alone does not guarantee > > > > uniqueness; you'd need to combine it with st_dev, I would think, > > > > even on POSIX systems. > > > > > > Ok, thanks for letting me know. I'm trying to think of another > > > approach to test for recursive symlinks that does not rely on inode: > > > Given any symlink, we could get its real_path() and compare it with > > > the path of the directory current being iterated. If the first is a > > > prefix of the second, than we mark it as a recursive symlink. > > > > > > What do you think of this idea? > > > > I think this would be pretty expensive. Too expensive. > > Hmm, yes unfortunately :( > > > A better method might be to rely on st_ino/st_dev when we can, and just > > not bother looking for recursive symlinks when we cannot, > > What if we fallback on the path prefix strategy when st_ino is not > available? I mean, if we don't look for recursive symlinks, they would > be iterated over and over until we get an ELOOP error. So I think > using real_path() should be less expensive in this case. (But just as > a fallback to st_ino, off course) > > > like I did in > > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a > > Nice! At dir-iterator.h the documentation says that recursive symlinks > will be ignored. If we don't implement any fallback, should we add > that this is not available on Windows, perhaps? I do not really care, unless it breaks things on Windows that were not broken before. You might also want to guard this behind `USE_STDEV` as Duy suggested (and maybe use the opportunity to correct that constant to `USE_ST_DEV`; I looked for it and did not find it because of that naming mistake). Ciao, Dscho
On Fri, Jun 28, 2019 at 9:50 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Matheus, > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote: > > > On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin > > <Johannes.Schindelin@gmx.de> wrote: > > > > > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote: > > > > > > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > > > > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > > > > > > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote: > > > > > > > > > > > >>[...] > > > > > >> +/* > > > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on > > > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0. > > > > > >> + */ > > > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter) > > > > > >> +{ > > > > > >> + int i; > > > > > >> + > > > > > >> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || > > > > > >> + !S_ISDIR(iter->base.st.st_mode)) > > > > > >> + return 0; > > > > > >> > > > > > >> + for (i = 0; i < iter->levels_nr; ++i) > > > > > >> + if (iter->base.st.st_ino == iter->levels[i].ino) > > > > > > > > > > > > This does not work on Windows. [[ Windows port does not have > > > > > > usable st_ino field ]]] > > > > > > > > > > And if you cross mountpoint, st_ino alone does not guarantee > > > > > uniqueness; you'd need to combine it with st_dev, I would think, > > > > > even on POSIX systems. > > > > > > > > Ok, thanks for letting me know. I'm trying to think of another > > > > approach to test for recursive symlinks that does not rely on inode: > > > > Given any symlink, we could get its real_path() and compare it with > > > > the path of the directory current being iterated. If the first is a > > > > prefix of the second, than we mark it as a recursive symlink. > > > > > > > > What do you think of this idea? > > > > > > I think this would be pretty expensive. Too expensive. > > > > Hmm, yes unfortunately :( > > > > > A better method might be to rely on st_ino/st_dev when we can, and just > > > not bother looking for recursive symlinks when we cannot, > > > > What if we fallback on the path prefix strategy when st_ino is not > > available? I mean, if we don't look for recursive symlinks, they would > > be iterated over and over until we get an ELOOP error. So I think > > using real_path() should be less expensive in this case. (But just as > > a fallback to st_ino, off course) > > > > > like I did in > > > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a > > > > Nice! At dir-iterator.h the documentation says that recursive symlinks > > will be ignored. If we don't implement any fallback, should we add > > that this is not available on Windows, perhaps? > > I do not really care, unless it breaks things on Windows that were not > broken before. > > You might also want to guard this behind `USE_STDEV` as Duy suggested (and > maybe use the opportunity to correct that constant to `USE_ST_DEV`; I > looked for it and did not find it because of that naming mistake). Ok, just to confirm, what I should do is send your fixup patch with the USE_STDEV guard addition, right? Also, USE_STDEV docs says it is used "from the update-index perspective", should I make it more generic as we're using it for other purposes or is it OK like this? Thanks, Matheus
Hi Matheus, On Fri, 28 Jun 2019, Matheus Tavares Bernardino wrote: > On Fri, Jun 28, 2019 at 9:50 AM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > Hi Matheus, > > > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote: > > > > > On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin > > > <Johannes.Schindelin@gmx.de> wrote: > > > > > > > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote: > > > > > > > > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > > > > > > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > > > > > > > > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote: > > > > > > > > > > > > > >>[...] > > > > > > >> +/* > > > > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on > > > > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0. > > > > > > >> + */ > > > > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter) > > > > > > >> +{ > > > > > > >> + int i; > > > > > > >> + > > > > > > >> + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || > > > > > > >> + !S_ISDIR(iter->base.st.st_mode)) > > > > > > >> + return 0; > > > > > > >> > > > > > > >> + for (i = 0; i < iter->levels_nr; ++i) > > > > > > >> + if (iter->base.st.st_ino == iter->levels[i].ino) > > > > > > > > > > > > > > This does not work on Windows. [[ Windows port does not have > > > > > > > usable st_ino field ]]] > > > > > > > > > > > > And if you cross mountpoint, st_ino alone does not guarantee > > > > > > uniqueness; you'd need to combine it with st_dev, I would think, > > > > > > even on POSIX systems. > > > > > > > > > > Ok, thanks for letting me know. I'm trying to think of another > > > > > approach to test for recursive symlinks that does not rely on inode: > > > > > Given any symlink, we could get its real_path() and compare it with > > > > > the path of the directory current being iterated. If the first is a > > > > > prefix of the second, than we mark it as a recursive symlink. > > > > > > > > > > What do you think of this idea? > > > > > > > > I think this would be pretty expensive. Too expensive. > > > > > > Hmm, yes unfortunately :( > > > > > > > A better method might be to rely on st_ino/st_dev when we can, and just > > > > not bother looking for recursive symlinks when we cannot, > > > > > > What if we fallback on the path prefix strategy when st_ino is not > > > available? I mean, if we don't look for recursive symlinks, they would > > > be iterated over and over until we get an ELOOP error. So I think > > > using real_path() should be less expensive in this case. (But just as > > > a fallback to st_ino, off course) > > > > > > > like I did in > > > > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a > > > > > > Nice! At dir-iterator.h the documentation says that recursive symlinks > > > will be ignored. If we don't implement any fallback, should we add > > > that this is not available on Windows, perhaps? > > > > I do not really care, unless it breaks things on Windows that were not > > broken before. > > > > You might also want to guard this behind `USE_STDEV` as Duy suggested (and > > maybe use the opportunity to correct that constant to `USE_ST_DEV`; I > > looked for it and did not find it because of that naming mistake). > > Ok, just to confirm, what I should do is send your fixup patch with > the USE_STDEV guard addition, right? Also, USE_STDEV docs says it is > used "from the update-index perspective", should I make it more > generic as we're using it for other purposes or is it OK like this? I thought Duy had verified that `USE_STDEV` would make sense in this instance, but I agree with you that the idea of that compile time flag was not to guard against a missing `st_dev` field, but about trusting it in the presence of network filesystems. So no, I revert my vote for using `USE_STDEV`. Thanks for the sanity check. Ciao, Dscho
> diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh > index c739ed7911..8f996a31fa 100755 > --- a/t/t0066-dir-iterator.sh > +++ b/t/t0066-dir-iterator.sh > @@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' ' > test_cmp expected-non-dir-output actual-non-dir-output > ' > > +test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' ' > + cat >expected-no-permissions-output <<-EOF && > + [d] (a) [a] ./dir3/a > + EOF > + > + mkdir -p dir3/a && > + > dir3/a/b && Style nit: space between redirection op and pathname. > + chmod 0 dir3/a && > + > + test-tool dir-iterator ./dir3 >actual-no-permissions-output && > + test_cmp expected-no-permissions-output actual-no-permissions-output && > + chmod 755 dir3/a && > + rm -rf dir3 > +' > + > +test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' ' > + cat >expected-no-permissions-pedantic-output <<-EOF && > + [d] (a) [a] ./dir3/a > + dir_iterator_advance failure > + EOF > + > + mkdir -p dir3/a && > + > dir3/a/b && Likewise. > + chmod 0 dir3/a && > + > + test_must_fail test-tool dir-iterator --pedantic ./dir3 \ > + >actual-no-permissions-pedantic-output && > + test_cmp expected-no-permissions-pedantic-output \ > + actual-no-permissions-pedantic-output && > + chmod 755 dir3/a && > + rm -rf dir3 > +' > + > +test_expect_success SYMLINKS 'setup dirs with symlinks' ' > + mkdir -p dir4/a && > + mkdir -p dir4/b/c && > + >dir4/a/d && > + ln -s d dir4/a/e && > + ln -s ../b dir4/a/f && > + > + mkdir -p dir5/a/b && > + mkdir -p dir5/a/c && > + ln -s ../c dir5/a/b/d && > + ln -s ../ dir5/a/b/e && > + ln -s ../../ dir5/a/b/f > +' > + > +test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' ' > + cat >expected-no-follow-sorted-output <<-EOF && > + [d] (a) [a] ./dir4/a > + [d] (b) [b] ./dir4/b > + [d] (b/c) [c] ./dir4/b/c > + [f] (a/d) [d] ./dir4/a/d > + [s] (a/e) [e] ./dir4/a/e > + [s] (a/f) [f] ./dir4/a/f > + EOF > + > + test-tool dir-iterator ./dir4 >out && > + sort <out >actual-no-follow-sorted-output && Unnecessary redirection, 'sort' is capable to open the file on its own. > + > + test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output > +' > + > +test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' ' > + cat >expected-follow-sorted-output <<-EOF && > + [d] (a) [a] ./dir4/a > + [d] (a/f) [f] ./dir4/a/f > + [d] (a/f/c) [c] ./dir4/a/f/c > + [d] (b) [b] ./dir4/b > + [d] (b/c) [c] ./dir4/b/c > + [f] (a/d) [d] ./dir4/a/d > + [f] (a/e) [e] ./dir4/a/e > + EOF > + > + test-tool dir-iterator --follow-symlinks ./dir4 >out && > + sort <out >actual-follow-sorted-output && Likewise. > + test_cmp expected-follow-sorted-output actual-follow-sorted-output > +' > + > + > +test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' ' > + cat >expected-rec-symlinks-sorted-output <<-EOF && > + [d] (a) [a] ./dir5/a > + [d] (a/b) [b] ./dir5/a/b > + [d] (a/b/d) [d] ./dir5/a/b/d > + [d] (a/c) [c] ./dir5/a/c > + EOF > + > + test-tool dir-iterator --follow-symlinks ./dir5 >out && > + sort <out >actual-rec-symlinks-sorted-output && Likewise. > + test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output > +' > + > test_done > -- > 2.22.0 >
Thanks for the review. I'll address those issues in v8. Best, Matheus On Wed, Jul 3, 2019 at 5:57 AM SZEDER Gábor <szeder.dev@gmail.com> wrote: > > > diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh > > index c739ed7911..8f996a31fa 100755 > > --- a/t/t0066-dir-iterator.sh > > +++ b/t/t0066-dir-iterator.sh > > @@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' ' > > test_cmp expected-non-dir-output actual-non-dir-output > > ' > > > > +test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' ' > > + cat >expected-no-permissions-output <<-EOF && > > + [d] (a) [a] ./dir3/a > > + EOF > > + > > + mkdir -p dir3/a && > > + > dir3/a/b && > > Style nit: space between redirection op and pathname. > > > + chmod 0 dir3/a && > > + > > + test-tool dir-iterator ./dir3 >actual-no-permissions-output && > > + test_cmp expected-no-permissions-output actual-no-permissions-output && > > + chmod 755 dir3/a && > > + rm -rf dir3 > > +' > > + > > +test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' ' > > + cat >expected-no-permissions-pedantic-output <<-EOF && > > + [d] (a) [a] ./dir3/a > > + dir_iterator_advance failure > > + EOF > > + > > + mkdir -p dir3/a && > > + > dir3/a/b && > > Likewise. > > > + chmod 0 dir3/a && > > + > > + test_must_fail test-tool dir-iterator --pedantic ./dir3 \ > > + >actual-no-permissions-pedantic-output && > > + test_cmp expected-no-permissions-pedantic-output \ > > + actual-no-permissions-pedantic-output && > > + chmod 755 dir3/a && > > + rm -rf dir3 > > +' > > + > > +test_expect_success SYMLINKS 'setup dirs with symlinks' ' > > + mkdir -p dir4/a && > > + mkdir -p dir4/b/c && > > + >dir4/a/d && > > + ln -s d dir4/a/e && > > + ln -s ../b dir4/a/f && > > + > > + mkdir -p dir5/a/b && > > + mkdir -p dir5/a/c && > > + ln -s ../c dir5/a/b/d && > > + ln -s ../ dir5/a/b/e && > > + ln -s ../../ dir5/a/b/f > > +' > > + > > +test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' ' > > + cat >expected-no-follow-sorted-output <<-EOF && > > + [d] (a) [a] ./dir4/a > > + [d] (b) [b] ./dir4/b > > + [d] (b/c) [c] ./dir4/b/c > > + [f] (a/d) [d] ./dir4/a/d > > + [s] (a/e) [e] ./dir4/a/e > > + [s] (a/f) [f] ./dir4/a/f > > + EOF > > + > > + test-tool dir-iterator ./dir4 >out && > > + sort <out >actual-no-follow-sorted-output && > > Unnecessary redirection, 'sort' is capable to open the file on its > own. > > > + > > + test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output > > +' > > + > > +test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' ' > > + cat >expected-follow-sorted-output <<-EOF && > > + [d] (a) [a] ./dir4/a > > + [d] (a/f) [f] ./dir4/a/f > > + [d] (a/f/c) [c] ./dir4/a/f/c > > + [d] (b) [b] ./dir4/b > > + [d] (b/c) [c] ./dir4/b/c > > + [f] (a/d) [d] ./dir4/a/d > > + [f] (a/e) [e] ./dir4/a/e > > + EOF > > + > > + test-tool dir-iterator --follow-symlinks ./dir4 >out && > > + sort <out >actual-follow-sorted-output && > > Likewise. > > > + test_cmp expected-follow-sorted-output actual-follow-sorted-output > > +' > > + > > + > > +test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' ' > > + cat >expected-rec-symlinks-sorted-output <<-EOF && > > + [d] (a) [a] ./dir5/a > > + [d] (a/b) [b] ./dir5/a/b > > + [d] (a/b/d) [d] ./dir5/a/b/d > > + [d] (a/c) [c] ./dir5/a/c > > + EOF > > + > > + test-tool dir-iterator --follow-symlinks ./dir5 >out && > > + sort <out >actual-rec-symlinks-sorted-output && > > Likewise. > > > + test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output > > +' > > + > > test_done > > -- > > 2.22.0 > >
diff --git a/dir-iterator.c b/dir-iterator.c index 594fe4d67b..52db87bdc9 100644 --- a/dir-iterator.c +++ b/dir-iterator.c @@ -6,6 +6,9 @@ struct dir_iterator_level { DIR *dir; + /* The inode number of this level's directory. */ + ino_t ino; + /* * The length of the directory part of path at this level * (including a trailing '/'): @@ -38,13 +41,16 @@ struct dir_iterator_int { * that will be included in this iteration. */ struct dir_iterator_level *levels; + + /* Combination of flags for this dir-iterator */ + unsigned int flags; }; /* * Push a level in the iter stack and initialize it with information from * the directory pointed by iter->base->path. It is assumed that this * strbuf points to a valid directory path. Return 0 on success and -1 - * otherwise, leaving the stack unchanged. + * otherwise, setting errno accordingly and leaving the stack unchanged. */ static int push_level(struct dir_iterator_int *iter) { @@ -56,14 +62,17 @@ static int push_level(struct dir_iterator_int *iter) if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1])) strbuf_addch(&iter->base.path, '/'); level->prefix_len = iter->base.path.len; + level->ino = iter->base.st.st_ino; level->dir = opendir(iter->base.path.buf); if (!level->dir) { + int saved_errno = errno; if (errno != ENOENT) { warning_errno("error opening directory '%s'", iter->base.path.buf); } iter->levels_nr--; + errno = saved_errno; return -1; } @@ -90,11 +99,13 @@ static int pop_level(struct dir_iterator_int *iter) /* * Populate iter->base with the necessary information on the next iteration * entry, represented by the given dirent de. Return 0 on success and -1 - * otherwise. + * otherwise, setting errno accordingly. */ static int prepare_next_entry_data(struct dir_iterator_int *iter, struct dirent *de) { + int err, saved_errno; + strbuf_addstr(&iter->base.path, de->d_name); /* * We have to reset these because the path strbuf might have @@ -105,12 +116,34 @@ static int prepare_next_entry_data(struct dir_iterator_int *iter, iter->base.basename = iter->base.path.buf + iter->levels[iter->levels_nr - 1].prefix_len; - if (lstat(iter->base.path.buf, &iter->base.st)) { - if (errno != ENOENT) - warning_errno("failed to stat '%s'", iter->base.path.buf); - return -1; - } + if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) + err = stat(iter->base.path.buf, &iter->base.st); + else + err = lstat(iter->base.path.buf, &iter->base.st); + + saved_errno = errno; + if (err && errno != ENOENT) + warning_errno("failed to stat '%s'", iter->base.path.buf); + + errno = saved_errno; + return err; +} + +/* + * Look for a recursive symlink at iter->base.path pointing to any directory on + * the previous stack levels. If it is found, return 1. If not, return 0. + */ +static int find_recursive_symlinks(struct dir_iterator_int *iter) +{ + int i; + + if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || + !S_ISDIR(iter->base.st.st_mode)) + return 0; + for (i = 0; i < iter->levels_nr; ++i) + if (iter->base.st.st_ino == iter->levels[i].ino) + return 1; return 0; } @@ -119,11 +152,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator; - if (S_ISDIR(iter->base.st.st_mode)) { - if (push_level(iter) && iter->levels_nr == 0) { - /* Pushing the first level failed */ - return dir_iterator_abort(dir_iterator); - } + if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) { + if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC) + goto error_out; + if (iter->levels_nr == 0) + goto error_out; } /* Loop until we find an entry that we can give back to the caller. */ @@ -137,22 +170,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) de = readdir(level->dir); if (!de) { - if (errno) + if (errno) { warning_errno("error reading directory '%s'", iter->base.path.buf); - else if (pop_level(iter) == 0) + if (iter->flags & DIR_ITERATOR_PEDANTIC) + goto error_out; + } else if (pop_level(iter) == 0) { return dir_iterator_abort(dir_iterator); + } continue; } if (is_dot_or_dotdot(de->d_name)) continue; - if (prepare_next_entry_data(iter, de)) + if (prepare_next_entry_data(iter, de)) { + if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC) + goto error_out; continue; + } + + if (find_recursive_symlinks(iter)) { + warning("ignoring recursive symlink at '%s'", + iter->base.path.buf); + continue; + } return ITER_OK; } + +error_out: + dir_iterator_abort(dir_iterator); + return ITER_ERROR; } int dir_iterator_abort(struct dir_iterator *dir_iterator) @@ -178,7 +227,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator) return ITER_DONE; } -struct dir_iterator *dir_iterator_begin(const char *path) +struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags) { struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter)); struct dir_iterator *dir_iterator = &iter->base; @@ -189,6 +238,7 @@ struct dir_iterator *dir_iterator_begin(const char *path) ALLOC_GROW(iter->levels, 10, iter->levels_alloc); iter->levels_nr = 0; + iter->flags = flags; /* * Note: stat already checks for NULL or empty strings and diff --git a/dir-iterator.h b/dir-iterator.h index 0822821e56..42751091a5 100644 --- a/dir-iterator.h +++ b/dir-iterator.h @@ -20,7 +20,8 @@ * A typical iteration looks like this: * * int ok; - * struct iterator *iter = dir_iterator_begin(path); + * unsigned int flags = DIR_ITERATOR_PEDANTIC; + * struct dir_iterator *iter = dir_iterator_begin(path, flags); * * if (!iter) * goto error_handler; @@ -44,6 +45,25 @@ * dir_iterator_advance() again. */ +/* + * Flags for dir_iterator_begin: + * + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior + * in case of an error at dir_iterator_advance(), which is to keep + * looking for a next valid entry. With this flag, resources are freed + * and ITER_ERROR is returned immediately. In both cases, a meaningful + * warning is emitted. Note: ENOENT errors are always ignored so that + * the API users may remove files during iteration. + * + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks. + * i.e., linked directories' contents will be iterated over and + * iter->base.st will contain information on the referred files, + * not the symlinks themselves, which is the default behavior. + * Recursive symlinks are skipped with a warning and broken symlinks + * are ignored. + */ +#define DIR_ITERATOR_PEDANTIC (1 << 0) +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1) + struct dir_iterator { /* The current path: */ struct strbuf path; @@ -58,29 +78,38 @@ struct dir_iterator { /* The current basename: */ const char *basename; - /* The result of calling lstat() on path: */ + /* + * The result of calling lstat() on path; or stat(), if the + * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at + * dir_iterator's initialization. + */ struct stat st; }; /* - * Start a directory iteration over path. On success, return a - * dir_iterator that holds the internal state of the iteration. - * In case of failure, return NULL and set errno accordingly. + * Start a directory iteration over path with the combination of + * options specified by flags. On success, return a dir_iterator + * that holds the internal state of the iteration. In case of + * failure, return NULL and set errno accordingly. * * The iteration includes all paths under path, not including path * itself and not including "." or ".." entries. * - * path is the starting directory. An internal copy will be made. + * Parameters are: + * - path is the starting directory. An internal copy will be made. + * - flags is a combination of the possible flags to initialize a + * dir-iterator or 0 for default behavior. */ -struct dir_iterator *dir_iterator_begin(const char *path); +struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags); /* * Advance the iterator to the first or next item and return ITER_OK. * If the iteration is exhausted, free the dir_iterator and any - * resources associated with it and return ITER_DONE. On error, free - * dir_iterator and associated resources and return ITER_ERROR. It is - * a bug to use iterator or call this function again after it has - * returned ITER_DONE or ITER_ERROR. + * resources associated with it and return ITER_DONE. + * + * It is a bug to use iterator or call this function again after it + * has returned ITER_DONE or ITER_ERROR (which may be returned iff + * the DIR_ITERATOR_PEDANTIC flag was set). */ int dir_iterator_advance(struct dir_iterator *iterator); diff --git a/refs/files-backend.c b/refs/files-backend.c index 7ed81046d4..b1f8f53a09 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -2150,7 +2150,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store, strbuf_addf(&sb, "%s/logs", gitdir); - diter = dir_iterator_begin(sb.buf); + diter = dir_iterator_begin(sb.buf, 0); if(!diter) return empty_ref_iterator_begin(); diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c index fab1ff6237..a5b96cb0dc 100644 --- a/t/helper/test-dir-iterator.c +++ b/t/helper/test-dir-iterator.c @@ -4,29 +4,44 @@ #include "iterator.h" #include "dir-iterator.h" -/* Argument is a directory path to iterate over */ +/* + * usage: + * tool-test dir-iterator [--follow-symlinks] [--pedantic] directory_path + */ int cmd__dir_iterator(int argc, const char **argv) { struct strbuf path = STRBUF_INIT; struct dir_iterator *diter; + unsigned int flags = 0; + int iter_status; + + for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { + if (strcmp(*argv, "--follow-symlinks") == 0) + flags |= DIR_ITERATOR_FOLLOW_SYMLINKS; + else if (strcmp(*argv, "--pedantic") == 0) + flags |= DIR_ITERATOR_PEDANTIC; + else + die("invalid option '%s'", *argv); + } - if (argc < 2) - die("BUG: test-dir-iterator needs one argument"); - - strbuf_add(&path, argv[1], strlen(argv[1])); + if (!*argv || argc != 1) + die("dir-iterator needs exactly one non-option argument"); - diter = dir_iterator_begin(path.buf); + strbuf_add(&path, *argv, strlen(*argv)); + diter = dir_iterator_begin(path.buf, flags); if (!diter) { printf("dir_iterator_begin failure: %d\n", errno); exit(EXIT_FAILURE); } - while (dir_iterator_advance(diter) == ITER_OK) { + while ((iter_status = dir_iterator_advance(diter)) == ITER_OK) { if (S_ISDIR(diter->st.st_mode)) printf("[d] "); else if (S_ISREG(diter->st.st_mode)) printf("[f] "); + else if (S_ISLNK(diter->st.st_mode)) + printf("[s] "); else printf("[?] "); @@ -34,5 +49,10 @@ int cmd__dir_iterator(int argc, const char **argv) diter->path.buf); } + if (iter_status != ITER_DONE) { + printf("dir_iterator_advance failure\n"); + return 1; + } + return 0; } diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh index c739ed7911..8f996a31fa 100755 --- a/t/t0066-dir-iterator.sh +++ b/t/t0066-dir-iterator.sh @@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' ' test_cmp expected-non-dir-output actual-non-dir-output ' +test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' ' + cat >expected-no-permissions-output <<-EOF && + [d] (a) [a] ./dir3/a + EOF + + mkdir -p dir3/a && + > dir3/a/b && + chmod 0 dir3/a && + + test-tool dir-iterator ./dir3 >actual-no-permissions-output && + test_cmp expected-no-permissions-output actual-no-permissions-output && + chmod 755 dir3/a && + rm -rf dir3 +' + +test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' ' + cat >expected-no-permissions-pedantic-output <<-EOF && + [d] (a) [a] ./dir3/a + dir_iterator_advance failure + EOF + + mkdir -p dir3/a && + > dir3/a/b && + chmod 0 dir3/a && + + test_must_fail test-tool dir-iterator --pedantic ./dir3 \ + >actual-no-permissions-pedantic-output && + test_cmp expected-no-permissions-pedantic-output \ + actual-no-permissions-pedantic-output && + chmod 755 dir3/a && + rm -rf dir3 +' + +test_expect_success SYMLINKS 'setup dirs with symlinks' ' + mkdir -p dir4/a && + mkdir -p dir4/b/c && + >dir4/a/d && + ln -s d dir4/a/e && + ln -s ../b dir4/a/f && + + mkdir -p dir5/a/b && + mkdir -p dir5/a/c && + ln -s ../c dir5/a/b/d && + ln -s ../ dir5/a/b/e && + ln -s ../../ dir5/a/b/f +' + +test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' ' + cat >expected-no-follow-sorted-output <<-EOF && + [d] (a) [a] ./dir4/a + [d] (b) [b] ./dir4/b + [d] (b/c) [c] ./dir4/b/c + [f] (a/d) [d] ./dir4/a/d + [s] (a/e) [e] ./dir4/a/e + [s] (a/f) [f] ./dir4/a/f + EOF + + test-tool dir-iterator ./dir4 >out && + sort <out >actual-no-follow-sorted-output && + + test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output +' + +test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' ' + cat >expected-follow-sorted-output <<-EOF && + [d] (a) [a] ./dir4/a + [d] (a/f) [f] ./dir4/a/f + [d] (a/f/c) [c] ./dir4/a/f/c + [d] (b) [b] ./dir4/b + [d] (b/c) [c] ./dir4/b/c + [f] (a/d) [d] ./dir4/a/d + [f] (a/e) [e] ./dir4/a/e + EOF + + test-tool dir-iterator --follow-symlinks ./dir4 >out && + sort <out >actual-follow-sorted-output && + + test_cmp expected-follow-sorted-output actual-follow-sorted-output +' + + +test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' ' + cat >expected-rec-symlinks-sorted-output <<-EOF && + [d] (a) [a] ./dir5/a + [d] (a/b) [b] ./dir5/a/b + [d] (a/b/d) [d] ./dir5/a/b/d + [d] (a/c) [c] ./dir5/a/c + EOF + + test-tool dir-iterator --follow-symlinks ./dir5 >out && + sort <out >actual-rec-symlinks-sorted-output && + + test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output +' + test_done
Add the possibility of giving flags to dir_iterator_begin to initialize a dir-iterator with special options. Currently possible flags are: - DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort immediately in the case of an error, instead of keep looking for the next valid entry; - DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow symlinks and include linked directories' contents in the iteration. These new flags will be used in a subsequent patch. Also add tests for the flags' usage and adjust refs/files-backend.c to the new dir_iterator_begin signature. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- dir-iterator.c | 82 +++++++++++++++++++++++++------ dir-iterator.h | 51 ++++++++++++++----- refs/files-backend.c | 2 +- t/helper/test-dir-iterator.c | 34 ++++++++++--- t/t0066-dir-iterator.sh | 95 ++++++++++++++++++++++++++++++++++++ 5 files changed, 229 insertions(+), 35 deletions(-)