mbox series

[GSoC,v7,00/10] clone: dir-iterator refactoring with tests

Message ID cover.1560898723.git.matheus.bernardino@usp.br (mailing list archive)
Headers show
Series clone: dir-iterator refactoring with tests | expand

Message

Matheus Tavares June 18, 2019, 11:27 p.m. UTC
This patchset contains:
- tests to the dir-iterator API;
- dir-iterator refactoring to make its state machine simpler
  and feature adding with tests;
- a replacement of explicit recursive dir iteration at
  copy_or_link_directory for the dir-iterator API;
- some refactoring and behavior changes at local clone, mainly to
  take care of symlinks and hidden files at .git/objects, together
  with tests for these types of files.

Changes since v6:
- Rebased with master;
- Added to dir-iterator documentation that ENOENT errors and hence broken
  symlinks are both ignored.

With the changes brought by this patchset, dir_iterator_begin() may now
return NULL (setting errno) when it finds an error. Also, it's possible
to pass a pedantic flag to it so that dir_iterator_advance() return
immediately on errors. But at refs/files-backend.c, the only user of
the API so far, the flag wasn't used and an empty iterator is
returned in case of errors at dir_iterator_begin(). These actions were
taken in order to keep the files-backend's behavior as close as
possible to the one it previously had. But since it already has guards
for possible errors at dir_iterator_advance(), I'm wondering whether I
should send a follow-up patch making it use the pedantic flag.

Also, should I perhaps call die_errno() on dir_iterator_begin() errors
at files-backend? I mean, we should continue returning an empty
iterator on ENOENT errors since ".git/logs", the dir it iterates over,
may not be present. But we could possibly abort on other errors, just
to be sure...

Any comments on this possible follow-up patch will be highly appreciated.

v6: https://public-inbox.org/git/20190502144829.4394-1-matheus.bernardino@usp.br/
travis build: https://travis-ci.org/matheustavares/git/builds/547451528

Daniel Ferreira (1):
  dir-iterator: add tests for dir-iterator API

Matheus Tavares (8):
  clone: better handle symlinked files at .git/objects/
  dir-iterator: use warning_errno when possible
  dir-iterator: refactor state machine model
  dir-iterator: add flags parameter to dir_iterator_begin
  clone: copy hidden paths at local clone
  clone: extract function from copy_or_link_directory
  clone: use dir-iterator to avoid explicit dir traversal
  clone: replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
  clone: test for our behavior on odd objects/* content

 Makefile                     |   1 +
 builtin/clone.c              |  75 +++++----
 dir-iterator.c               | 289 +++++++++++++++++++++--------------
 dir-iterator.h               |  60 ++++++--
 refs/files-backend.c         |  17 ++-
 t/helper/test-dir-iterator.c |  58 +++++++
 t/helper/test-tool.c         |   1 +
 t/helper/test-tool.h         |   1 +
 t/t0066-dir-iterator.sh      | 163 ++++++++++++++++++++
 t/t5604-clone-reference.sh   | 133 ++++++++++++++++
 10 files changed, 635 insertions(+), 163 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0066-dir-iterator.sh

Comments

Matheus Tavares June 19, 2019, 4:36 a.m. UTC | #1
On Tue, Jun 18, 2019 at 8:28 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> This patchset contains:
> - tests to the dir-iterator API;
> - dir-iterator refactoring to make its state machine simpler
>   and feature adding with tests;
> - a replacement of explicit recursive dir iteration at
>   copy_or_link_directory for the dir-iterator API;
> - some refactoring and behavior changes at local clone, mainly to
>   take care of symlinks and hidden files at .git/objects, together
>   with tests for these types of files.
>
> Changes since v6:
> - Rebased with master;
> - Added to dir-iterator documentation that ENOENT errors and hence broken
>   symlinks are both ignored.
>
> With the changes brought by this patchset, dir_iterator_begin() may now
> return NULL (setting errno) when it finds an error. Also, it's possible
> to pass a pedantic flag to it so that dir_iterator_advance() return
> immediately on errors. But at refs/files-backend.c, the only user of
> the API so far, the flag wasn't used and an empty iterator is
> returned in case of errors at dir_iterator_begin(). These actions were
> taken in order to keep the files-backend's behavior as close as
> possible to the one it previously had. But since it already has guards
> for possible errors at dir_iterator_advance(), I'm wondering whether I
> should send a follow-up patch making it use the pedantic flag.
>
> Also, should I perhaps call die_errno() on dir_iterator_begin() errors
> at files-backend? I mean, we should continue returning an empty
> iterator on ENOENT errors since ".git/logs", the dir it iterates over,
> may not be present. But we could possibly abort on other errors, just
> to be sure...

I got ahead of myself in this last paragraph. ".git/logs" is one of the dirs
that files-backend.c is used to iterate over, but it doesn't mean it's the only
one. This dir, in particular, is iterated when we run 'git rev-list
--reflog', for
example. And upon ENOENTs, the iteration is expected to end
successfully but with no entries.

(also adding Michael and Daniel to CC, in case they have some input on
these ideas)
Junio C Hamano June 20, 2019, 8:18 p.m. UTC | #2
Matheus Tavares <matheus.bernardino@usp.br> writes:

> Daniel Ferreira (1):
>   dir-iterator: add tests for dir-iterator API
>
> Matheus Tavares (8):
>   clone: better handle symlinked files at .git/objects/
>   dir-iterator: use warning_errno when possible
>   dir-iterator: refactor state machine model
>   dir-iterator: add flags parameter to dir_iterator_begin
>   clone: copy hidden paths at local clone
>   clone: extract function from copy_or_link_directory
>   clone: use dir-iterator to avoid explicit dir traversal
>   clone: replace strcmp by fspathcmp
>
> Ævar Arnfjörð Bjarmason (1):
>   clone: test for our behavior on odd objects/* content
>
>  Makefile                     |   1 +
>  builtin/clone.c              |  75 +++++----
>  dir-iterator.c               | 289 +++++++++++++++++++++--------------
>  dir-iterator.h               |  60 ++++++--
>  refs/files-backend.c         |  17 ++-
>  t/helper/test-dir-iterator.c |  58 +++++++
>  t/helper/test-tool.c         |   1 +
>  t/helper/test-tool.h         |   1 +
>  t/t0066-dir-iterator.sh      | 163 ++++++++++++++++++++
>  t/t5604-clone-reference.sh   | 133 ++++++++++++++++
>  10 files changed, 635 insertions(+), 163 deletions(-)
>  create mode 100644 t/helper/test-dir-iterator.c
>  create mode 100755 t/t0066-dir-iterator.sh

A higher level question is what's the benefit of using dir-iterator
API in the first place.  After subtracting 356 added lines to t/,
it still adds 279 lines while removing only 163 lines, so it is not
like "we have a perfect dir-iterator API that can be applied as-is
but an older code that predates dir-iterator API was still using an
old way, so let's make the latter use the former."
Matheus Tavares June 21, 2019, 1:41 p.m. UTC | #3
On Thu, Jun 20, 2019 at 5:18 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > Daniel Ferreira (1):
> >   dir-iterator: add tests for dir-iterator API
> >
> > Matheus Tavares (8):
> >   clone: better handle symlinked files at .git/objects/
> >   dir-iterator: use warning_errno when possible
> >   dir-iterator: refactor state machine model
> >   dir-iterator: add flags parameter to dir_iterator_begin
> >   clone: copy hidden paths at local clone
> >   clone: extract function from copy_or_link_directory
> >   clone: use dir-iterator to avoid explicit dir traversal
> >   clone: replace strcmp by fspathcmp
> >
> > Ævar Arnfjörð Bjarmason (1):
> >   clone: test for our behavior on odd objects/* content
> >
> >  Makefile                     |   1 +
> >  builtin/clone.c              |  75 +++++----
> >  dir-iterator.c               | 289 +++++++++++++++++++++--------------
> >  dir-iterator.h               |  60 ++++++--
> >  refs/files-backend.c         |  17 ++-
> >  t/helper/test-dir-iterator.c |  58 +++++++
> >  t/helper/test-tool.c         |   1 +
> >  t/helper/test-tool.h         |   1 +
> >  t/t0066-dir-iterator.sh      | 163 ++++++++++++++++++++
> >  t/t5604-clone-reference.sh   | 133 ++++++++++++++++
> >  10 files changed, 635 insertions(+), 163 deletions(-)
> >  create mode 100644 t/helper/test-dir-iterator.c
> >  create mode 100755 t/t0066-dir-iterator.sh
>
> A higher level question is what's the benefit of using dir-iterator
> API in the first place.  After subtracting 356 added lines to t/,
> it still adds 279 lines while removing only 163 lines, so it is not
> like "we have a perfect dir-iterator API that can be applied as-is
> but an older code that predates dir-iterator API was still using an
> old way, so let's make the latter use the former."
>

Yes, indeed the dir-iterator API didn't nicely fit in clone without
some tweaking. Yet I think most of those line additions were not only
to adjust the API, but also trying to improve both dir-iterator and
local clone (I should have maybe split those changes into other
patchsets, though). For example, these changes make local clone better
handle possible symlinks and hidden files at git dir. And the API
changes should make it easier to apply it as-is in other sections of
the codebase from now on.

As for the benefit of using the API here, I think it mainly resides in
the security it brings, avoiding recursive iteration (even though it
should be shallow in local clone) and more carefully handling
symlinks.