[WIP,RFC,v2,1/5] dir-iterator: add flags parameter to dir_iterator_begin
diff mbox series

Message ID 20190226051804.10631-2-matheus.bernardino@usp.br
State New
Headers show
Series
  • clone: dir iterator refactoring with tests
Related show

Commit Message

Matheus Tavares Bernardino Feb. 26, 2019, 5:18 a.m. UTC
Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
dir_iterator_advance abort imediatelly in the case of an error while
trying to fetch next entry; and DIR_ITERATOR_FOLLOW_SYMLINKS, which
makes the iteration follow symlinks to directories and include its
contents in the iteration. These new flags will be used in a subsequent
patch.

Also adjust refs/files-backend.c to the new dir_iterator_begin
signature.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c       | 28 +++++++++++++++++++++++++---
 dir-iterator.h       | 40 ++++++++++++++++++++++++++++++++--------
 refs/files-backend.c |  2 +-
 3 files changed, 58 insertions(+), 12 deletions(-)

Comments

Duy Nguyen Feb. 26, 2019, 12:01 p.m. UTC | #1
On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>  int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  {
>         struct dir_iterator_int *iter =
>                 (struct dir_iterator_int *)dir_iterator;
> +       int ret;
>
>         while (1) {
>                 struct dir_iterator_level *level =
> @@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>
>                         level->dir = opendir(iter->base.path.buf);
>                         if (!level->dir && errno != ENOENT) {
> +                               if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +                                       goto error_out;
>                                 warning("error opening directory %s: %s",
>                                         iter->base.path.buf, strerror(errno));
>                                 /* Popping the level is handled below */
> @@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>                         if (!de) {
>                                 /* This level is exhausted; pop up a level. */
>                                 if (errno) {
> +                                       if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +                                               goto error_out;
>                                         warning("error reading directory %s: %s",
>                                                 iter->base.path.buf, strerror(errno));
>                                 } else if (closedir(level->dir))
> @@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>                                 continue;
>
>                         strbuf_addstr(&iter->base.path, de->d_name);
> -                       if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
> -                               if (errno != ENOENT)
> +
> +                       if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
> +                               ret = stat(iter->base.path.buf, &iter->base.st);
> +                       else
> +                               ret = lstat(iter->base.path.buf, &iter->base.st);
> +
> +                       if (ret < 0) {
> +                               if (errno != ENOENT) {
> +                                       if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +                                               goto error_out;
>                                         warning("error reading path '%s': %s",
>                                                 iter->base.path.buf,
>                                                 strerror(errno));
> +                               }
>                                 continue;
>                         }
>
> @@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>                         return ITER_OK;
>                 }
>         }
> +
> +error_out:
> +       dir_iterator_abort(dir_iterator);

Should this function call this or leaveit to the caller? The
description says "free resources" which to me sounds like something
the caller should be aware of. Although if double dir_iterator_abort()
has no bad side effects then I guess we can even leave it here.

PS. files-backend.c does call dir_iterator_abort() unconditionally.
Which sounds like this double-abort pattern should be dealt with even
if that call site does not use the pedantic flag (it could later on,
who knows; don't leave traps behind).

> +       return ITER_ERROR;
>  }
>
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 970793d07a..fe9eb9a04b 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -6,9 +6,10 @@
>  /*
>   * Iterate over a directory tree.
>   *
> - * Iterate over a directory tree, recursively, including paths of all
> - * types and hidden paths. Skip "." and ".." entries and don't follow
> - * symlinks except for the original path.
> + * With no flags to modify behaviour, iterate over a directory tree,

Nit but I think we can just skip "With no flags to modify behavior". It's given.

> + * recursively, including paths of all types and hidden paths. Skip
> + * "." and ".." entries and don't follow symlinks except for the
> + * original path.
>   *
>   * Every time dir_iterator_advance() is called, update the members of
>   * the dir_iterator structure to reflect the next path in the
> @@ -19,7 +20,7 @@
>   * A typical iteration looks like this:
>   *
>   *     int ok;
> - *     struct iterator *iter = dir_iterator_begin(path);
> + *     struct iterator *iter = dir_iterator_begin(path, 0);
>   *
>   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
>   *             if (want_to_stop_iteration()) {
> @@ -40,6 +41,20 @@
>   * dir_iterator_advance() again.
>   */
>
> +/*
> + * Flags for dir_iterator_begin:
> + *
> + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
> + *   in case of an error while trying to fetch the next entry, which is
> + *   to emit a warning and keep going. With this flag, resouces are
> + *   freed and ITER_ERROR is return immediately.
> + *
> + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
> + *   directories, i.e., iterate over linked directories' contents.

Do we really need this flag? If dir-iterator does not follow symlinks,
the caller _can_ check stat data to detect symlinks and look inside
anyway. So this flag is more about convenience (_if_ it has more than
one call site, convenience for one call site is just not worth it).

Or is there something else I'm missing?

> + */
> +#define DIR_ITERATOR_PEDANTIC (1 << 0)
> +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
> +
Matheus Tavares Bernardino Feb. 27, 2019, 1:59 p.m. UTC | #2
On Tue, Feb 26, 2019 at 9:02 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >  int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >  {
> >         struct dir_iterator_int *iter =
> >                 (struct dir_iterator_int *)dir_iterator;
> > +       int ret;
> >
> >         while (1) {
> >                 struct dir_iterator_level *level =
> > @@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >
> >                         level->dir = opendir(iter->base.path.buf);
> >                         if (!level->dir && errno != ENOENT) {
> > +                               if (iter->flags & DIR_ITERATOR_PEDANTIC)
> > +                                       goto error_out;
> >                                 warning("error opening directory %s: %s",
> >                                         iter->base.path.buf, strerror(errno));
> >                                 /* Popping the level is handled below */
> > @@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >                         if (!de) {
> >                                 /* This level is exhausted; pop up a level. */
> >                                 if (errno) {
> > +                                       if (iter->flags & DIR_ITERATOR_PEDANTIC)
> > +                                               goto error_out;
> >                                         warning("error reading directory %s: %s",
> >                                                 iter->base.path.buf, strerror(errno));
> >                                 } else if (closedir(level->dir))
> > @@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >                                 continue;
> >
> >                         strbuf_addstr(&iter->base.path, de->d_name);
> > -                       if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
> > -                               if (errno != ENOENT)
> > +
> > +                       if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
> > +                               ret = stat(iter->base.path.buf, &iter->base.st);
> > +                       else
> > +                               ret = lstat(iter->base.path.buf, &iter->base.st);
> > +
> > +                       if (ret < 0) {
> > +                               if (errno != ENOENT) {
> > +                                       if (iter->flags & DIR_ITERATOR_PEDANTIC)
> > +                                               goto error_out;
> >                                         warning("error reading path '%s': %s",
> >                                                 iter->base.path.buf,
> >                                                 strerror(errno));
> > +                               }
> >                                 continue;
> >                         }
> >
> > @@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >                         return ITER_OK;
> >                 }
> >         }
> > +
> > +error_out:
> > +       dir_iterator_abort(dir_iterator);
>
> Should this function call this or leaveit to the caller? The
> description says "free resources" which to me sounds like something
> the caller should be aware of. Although if double dir_iterator_abort()
> has no bad side effects then I guess we can even leave it here.
>

The documentation at dir-iterator.h says that in case of errors (or
iteration exhaustion) at dir_iterator_advance(), the dir-iterator and
associated resources will be freed, so dir_iterator_abort() must be
called here, not left to the user. (Also, the use-case example at the
top of dir-iterator.h confirms this.)

> PS. files-backend.c does call dir_iterator_abort() unconditionally.
> Which sounds like this double-abort pattern should be dealt with even
> if that call site does not use the pedantic flag (it could later on,
> who knows; don't leave traps behind).
>

I have read the code at files-backend.c again but couldn't find where
it could call dir_iterator_abort() after dir_iterator_advance() have
already called it. Could you please point me out that?

A double-abort would certainly lead to 'double free or corruption'
error, so API users must only call it to abort iteration for some
reason when treating the iteration files, not because of iteration
errors. Also, this behaviour is not only for the pedantic flag. When
an iteration is complete, for example, a call to
dir_iterator_advance() will free the resources and return ITER_DONE.

> > +       return ITER_ERROR;
> >  }
> >
> > diff --git a/dir-iterator.h b/dir-iterator.h
> > index 970793d07a..fe9eb9a04b 100644
> > --- a/dir-iterator.h
> > +++ b/dir-iterator.h
> > @@ -6,9 +6,10 @@
> >  /*
> >   * Iterate over a directory tree.
> >   *
> > - * Iterate over a directory tree, recursively, including paths of all
> > - * types and hidden paths. Skip "." and ".." entries and don't follow
> > - * symlinks except for the original path.
> > + * With no flags to modify behaviour, iterate over a directory tree,
>
> Nit but I think we can just skip "With no flags to modify behavior". It's given.
>

Ok!

> > + * recursively, including paths of all types and hidden paths. Skip
> > + * "." and ".." entries and don't follow symlinks except for the
> > + * original path.
> >   *
> >   * Every time dir_iterator_advance() is called, update the members of
> >   * the dir_iterator structure to reflect the next path in the
> > @@ -19,7 +20,7 @@
> >   * A typical iteration looks like this:
> >   *
> >   *     int ok;
> > - *     struct iterator *iter = dir_iterator_begin(path);
> > + *     struct iterator *iter = dir_iterator_begin(path, 0);
> >   *
> >   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
> >   *             if (want_to_stop_iteration()) {
> > @@ -40,6 +41,20 @@
> >   * dir_iterator_advance() again.
> >   */
> >
> > +/*
> > + * Flags for dir_iterator_begin:
> > + *
> > + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
> > + *   in case of an error while trying to fetch the next entry, which is
> > + *   to emit a warning and keep going. With this flag, resouces are
> > + *   freed and ITER_ERROR is return immediately.
> > + *
> > + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
> > + *   directories, i.e., iterate over linked directories' contents.
>
> Do we really need this flag? If dir-iterator does not follow symlinks,
> the caller _can_ check stat data to detect symlinks and look inside
> anyway. So this flag is more about convenience (_if_ it has more than
> one call site, convenience for one call site is just not worth it).
>

I don't think is just a convenience. If the API user wants to follow
symlinks a simple check at stat data to detect symlinks, wouldn't be
enough, since the user would want to traverse through all contents of
a symlinked directory and its subdirectories, recursively. Without
this flag, the caller would need to manually detect symlinks to
directories and start a new iteration on them (probably using a new
dir-iterator?). As that could lead to a more unnecessarily complex
code, and being this flag implementation so simple, I think it is
worthy.

> Or is there something else I'm missing?
>
> > + */
> > +#define DIR_ITERATOR_PEDANTIC (1 << 0)
> > +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
> > +
> --
> Duy

Patch
diff mbox series

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..17aca8ea41 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,12 +48,16 @@  struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Combination of flags for this dir-iterator */
+	unsigned flags;
 };
 
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
+	int ret;
 
 	while (1) {
 		struct dir_iterator_level *level =
@@ -71,6 +75,8 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			level->dir = opendir(iter->base.path.buf);
 			if (!level->dir && errno != ENOENT) {
+				if (iter->flags & DIR_ITERATOR_PEDANTIC)
+					goto error_out;
 				warning("error opening directory %s: %s",
 					iter->base.path.buf, strerror(errno));
 				/* Popping the level is handled below */
@@ -122,6 +128,8 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (!de) {
 				/* This level is exhausted; pop up a level. */
 				if (errno) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
 				} else if (closedir(level->dir))
@@ -138,11 +146,20 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
+
+			if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+				ret = stat(iter->base.path.buf, &iter->base.st);
+			else
+				ret = lstat(iter->base.path.buf, &iter->base.st);
+
+			if (ret < 0) {
+				if (errno != ENOENT) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading path '%s': %s",
 						iter->base.path.buf,
 						strerror(errno));
+				}
 				continue;
 			}
 
@@ -159,6 +176,10 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			return ITER_OK;
 		}
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -182,7 +203,7 @@  int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -195,6 +216,7 @@  struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
+	iter->flags = flags;
 	iter->levels_nr = 1;
 	iter->levels[0].initialized = 0;
 
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..fe9eb9a04b 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -6,9 +6,10 @@ 
 /*
  * Iterate over a directory tree.
  *
- * Iterate over a directory tree, recursively, including paths of all
- * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * With no flags to modify behaviour, iterate over a directory tree,
+ * recursively, including paths of all types and hidden paths. Skip
+ * "." and ".." entries and don't follow symlinks except for the
+ * original path.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
@@ -19,7 +20,7 @@ 
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, 0);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -40,6 +41,20 @@ 
  * dir_iterator_advance() again.
  */
 
+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ *   in case of an error while trying to fetch the next entry, which is
+ *   to emit a warning and keep going. With this flag, resouces are
+ *   freed and ITER_ERROR is return immediately.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
+ *   directories, i.e., iterate over linked directories' contents.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -59,15 +74,19 @@  struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. Return a dir_iterator that holds the
+ * internal state of the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ *  - path is the starting directory. An internal copy will be made.
+ *  - flags is a combination of the possible flags to initialize a
+ *    dir-iterator or 0 for default behaviour.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
@@ -76,6 +95,11 @@  struct dir_iterator *dir_iterator_begin(const char *path);
  * dir_iterator and associated resources and return ITER_ERROR. It is
  * a bug to use iterator or call this function again after it has
  * returned ITER_DONE or ITER_ERROR.
+ *
+ * Note that whether dir-iterator will return ITER_ERROR when failing
+ * to fetch the next entry or just emit a warning and try to fetch the
+ * next is defined by the 'pedantic' option at dir-iterator's
+ * initialization.
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index dd8abe9185..c3d3b6c454 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,7 +2143,7 @@  static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	strbuf_addf(&sb, "%s/logs", gitdir);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+	iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);