diff mbox series

[GSoC,1/3] dir-iterator: add pedantic option to dir_iterator_begin

Message ID 20190223190309.6728-2-matheus.bernardino@usp.br (mailing list archive)
State New, archived
Headers show
Series clone: convert explicit dir traversal to dir-iterator | expand

Commit Message

Matheus Tavares Feb. 23, 2019, 7:03 p.m. UTC
Add the pedantic option to dir-iterator's initialization function,
dir_iterator_begin. When this option is set to true,
dir_iterator_advance will immediately return ITER_ERROR when failing to
fetch the next entry. When set to false, dir_iterator_advance will emit
a warning and keep looking for the next entry.

Also adjust refs/files-backend.c to the new dir_iterator_begin
signature.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
Changes in v2:
 - Added in v2

 dir-iterator.c       | 23 +++++++++++++++++++++--
 dir-iterator.h       | 16 +++++++++++++---
 refs/files-backend.c |  2 +-
 3 files changed, 35 insertions(+), 6 deletions(-)

Comments

Thomas Gummerer Feb. 23, 2019, 9:35 p.m. UTC | #1
On 02/23, Matheus Tavares wrote:
> Add the pedantic option to dir-iterator's initialization function,
> dir_iterator_begin. When this option is set to true,
> dir_iterator_advance will immediately return ITER_ERROR when failing to
> fetch the next entry. When set to false, dir_iterator_advance will emit
> a warning and keep looking for the next entry.
> 
> Also adjust refs/files-backend.c to the new dir_iterator_begin
> signature.

Thanks, this makes sense to me.  This commit message describes what we
are doing in this patch, but not why we are doing it.  From previously
reviewing this series, I know this is going to be used in a subsequent
patch, but it is nice to give reviewers and future readers of this
patch that context as well.  Just something like "This behaviour is
going to be used in a subsequent patch." should be enough here.

A few comments inline.

> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
> Changes in v2:
>  - Added in v2
> 
>  dir-iterator.c       | 23 +++++++++++++++++++++--
>  dir-iterator.h       | 16 +++++++++++++---
>  refs/files-backend.c |  2 +-
>  3 files changed, 35 insertions(+), 6 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index f2dcd82fde..070a656555 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -48,6 +48,13 @@ struct dir_iterator_int {
>  	 * that will be included in this iteration.
>  	 */
>  	struct dir_iterator_level *levels;
> +
> +	/*
> +	 * Boolean value to define dir-iterator's behaviour when failing to
> +	 * fetch next entry. See comments on dir_iterator_begin at
> +	 * dir-iterator.h
> +	 */
> +	int pedantic;
>  };
>  
>  int dir_iterator_advance(struct dir_iterator *dir_iterator)
> @@ -71,6 +78,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  
>  			level->dir = opendir(iter->base.path.buf);
>  			if (!level->dir && errno != ENOENT) {
> +				if (iter->pedantic)
> +					goto error_out;

I think we should also print an error here.  The caller doesn't have
any context on what went wrong, and will probably just 'die()' if an
error is encountered.  I think it would make sense to call
'error(...)' here before 'goto error_out;' to give a useful error
message here.

>  				warning("error opening directory %s: %s",
>  					iter->base.path.buf, strerror(errno));
>  				/* Popping the level is handled below */
> @@ -122,6 +131,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			if (!de) {
>  				/* This level is exhausted; pop up a level. */
>  				if (errno) {
> +					if (iter->pedantic)
> +						goto error_out;
>  					warning("error reading directory %s: %s",
>  						iter->base.path.buf, strerror(errno));
>  				} else if (closedir(level->dir))
> @@ -139,10 +150,13 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  
>  			strbuf_addstr(&iter->base.path, de->d_name);
>  			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
> -				if (errno != ENOENT)
> +				if (errno != ENOENT) {
> +					if (iter->pedantic)
> +						goto error_out;
>  					warning("error reading path '%s': %s",
>  						iter->base.path.buf,
>  						strerror(errno));
> +				}
>  				continue;
>  			}
>  
> @@ -159,6 +173,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			return ITER_OK;
>  		}
>  	}
> +
> +error_out:
> +	dir_iterator_abort(dir_iterator);
> +	return ITER_ERROR;
>  }
>  
>  int dir_iterator_abort(struct dir_iterator *dir_iterator)
> @@ -182,7 +200,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  	return ITER_DONE;
>  }
>  
> -struct dir_iterator *dir_iterator_begin(const char *path)
> +struct dir_iterator *dir_iterator_begin(const char *path, int pedantic)

Thinking about the future evolution of this interface, it might make
more sense to have that second parameter be a "struct
dir_iterator_opts".  For now it would just have one member "pedantic",
but in the future we could add additional options there instead of
adding additional parameters.

For now there are not many callers of the dir iterator API, so adding
a parameter is not a huge problem.  However we will most likely add
more callers in the future, so changing all of them becomes more and
more costly.

It also becomes a problem for integrating these patches, in the case
where a topic in flight adds a new use of this API.  In that case the
breakage wouldn't be noticed on merges, as there would be no merge
conflicts, however it would break the build, which makes the
maintainers job harder.

We could of course add new functions in the future, rather than
changing dir_iterator_begin, however just having one function, that
allows combining different options sounds like the better choice to me
here.

>  {
>  	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
>  	struct dir_iterator *dir_iterator = &iter->base;
> @@ -195,6 +213,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)
>  
>  	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
>  
> +	iter->pedantic = pedantic;
>  	iter->levels_nr = 1;
>  	iter->levels[0].initialized = 0;
>  
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 970793d07a..50ca8e1a27 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -19,7 +19,7 @@
>   * A typical iteration looks like this:
>   *
>   *     int ok;
> - *     struct iterator *iter = dir_iterator_begin(path);
> + *     struct iterator *iter = dir_iterator_begin(path, 0);
>   *
>   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
>   *             if (want_to_stop_iteration()) {
> @@ -65,9 +65,15 @@ struct dir_iterator {
>   * The iteration includes all paths under path, not including path
>   * itself and not including "." or ".." entries.
>   *
> - * path is the starting directory. An internal copy will be made.
> + * Parameters are:
> + * - path is the starting directory. An internal copy will be made.
> + * - pedantic is a boolean value. If true, dir-iterator will free
> + *   resources and return ITER_ERROR immediately, in case of an error
> + *   while trying to fetch the next entry in dir_iterator_advance. If
> + *   false, it will just emit a warning and keep looking for the next
> + *   entry.
>   */
> -struct dir_iterator *dir_iterator_begin(const char *path);
> +struct dir_iterator *dir_iterator_begin(const char *path, int pedantic);
>  
>  /*
>   * Advance the iterator to the first or next item and return ITER_OK.
> @@ -76,6 +82,10 @@ struct dir_iterator *dir_iterator_begin(const char *path);
>   * dir_iterator and associated resources and return ITER_ERROR. It is
>   * a bug to use iterator or call this function again after it has
>   * returned ITER_DONE or ITER_ERROR.
> + *
> + * Note that whether dir_iterator_advance will return ITER_ERROR when
> + * failing to fetch the next entry or keep going is defined by the
> + * 'pedantic' option at dir-iterator's initialization.
>   */
>  int dir_iterator_advance(struct dir_iterator *iterator);
>  
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index dd8abe9185..c3d3b6c454 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2143,7 +2143,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
>  
>  	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
>  	strbuf_addf(&sb, "%s/logs", gitdir);
> -	iter->dir_iterator = dir_iterator_begin(sb.buf);
> +	iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
>  	iter->ref_store = ref_store;
>  	strbuf_release(&sb);
>  
> -- 
> 2.20.1
>
Christian Couder Feb. 24, 2019, 8:35 a.m. UTC | #2
On Sat, Feb 23, 2019 at 10:37 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 02/23, Matheus Tavares wrote:
> > Add the pedantic option to dir-iterator's initialization function,
> > dir_iterator_begin. When this option is set to true,
> > dir_iterator_advance will immediately return ITER_ERROR when failing to
> > fetch the next entry. When set to false, dir_iterator_advance will emit
> > a warning and keep looking for the next entry.
> >
> > Also adjust refs/files-backend.c to the new dir_iterator_begin
> > signature.
>
> Thanks, this makes sense to me.  This commit message describes what we
> are doing in this patch, but not why we are doing it.  From previously
> reviewing this series, I know this is going to be used in a subsequent
> patch, but it is nice to give reviewers and future readers of this
> patch that context as well.  Just something like "This behaviour is
> going to be used in a subsequent patch." should be enough here.

I agree that it's a good idea to add just that.

> >  int dir_iterator_advance(struct dir_iterator *dir_iterator)
> > @@ -71,6 +78,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >
> >                       level->dir = opendir(iter->base.path.buf);
> >                       if (!level->dir && errno != ENOENT) {
> > +                             if (iter->pedantic)
> > +                                     goto error_out;
>
> I think we should also print an error here.  The caller doesn't have
> any context on what went wrong, and will probably just 'die()' if an
> error is encountered.  I think it would make sense to call
> 'error(...)' here before 'goto error_out;' to give a useful error
> message here.

If we start to give error messages, then we might as well give error
messages all the times when we error out. This will avoid the callers
wondering if they need to give an error message or not.

I am not sure it's necessary here though. And I think if it's useful,
it can be added in another patch or another patch series.

> >                               warning("error opening directory %s: %s",
> >                                       iter->base.path.buf, strerror(errno));
> >                               /* Popping the level is handled below */

> > -struct dir_iterator *dir_iterator_begin(const char *path)
> > +struct dir_iterator *dir_iterator_begin(const char *path, int pedantic)
>
> Thinking about the future evolution of this interface, it might make
> more sense to have that second parameter be a "struct
> dir_iterator_opts".  For now it would just have one member "pedantic",
> but in the future we could add additional options there instead of
> adding additional parameters.

I think it's ok with `int pedantic` for now as improvements can be
done when they are really needed. And we will perhaps find out that
it's better to just change `int pedantic` to `unsigned flags` instead
of `struct dir_iterator_opts`.

Thanks,
Christian.
Matheus Tavares Feb. 24, 2019, 5:43 p.m. UTC | #3
On Sun, Feb 24, 2019 at 5:35 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Sat, Feb 23, 2019 at 10:37 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> >
> > On 02/23, Matheus Tavares wrote:
> > > Add the pedantic option to dir-iterator's initialization function,
> > > dir_iterator_begin. When this option is set to true,
> > > dir_iterator_advance will immediately return ITER_ERROR when failing to
> > > fetch the next entry. When set to false, dir_iterator_advance will emit
> > > a warning and keep looking for the next entry.
> > >
> > > Also adjust refs/files-backend.c to the new dir_iterator_begin
> > > signature.
> >
> > Thanks, this makes sense to me.  This commit message describes what we
> > are doing in this patch, but not why we are doing it.  From previously
> > reviewing this series, I know this is going to be used in a subsequent
> > patch, but it is nice to give reviewers and future readers of this
> > patch that context as well.  Just something like "This behaviour is
> > going to be used in a subsequent patch." should be enough here.
>
> I agree that it's a good idea to add just that.
>

Indeed! Thank you Thomas and Christian. I will be adding this in v3.

> > >  int dir_iterator_advance(struct dir_iterator *dir_iterator)
> > > @@ -71,6 +78,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> > >
> > >                       level->dir = opendir(iter->base.path.buf);
> > >                       if (!level->dir && errno != ENOENT) {
> > > +                             if (iter->pedantic)
> > > +                                     goto error_out;
> >
> > I think we should also print an error here.  The caller doesn't have
> > any context on what went wrong, and will probably just 'die()' if an
> > error is encountered.

To correctly handle the error, I assumed that the caller wouldn't need
to know which exact function returned an error, as long as it knew it
was a "failure to fetch the next entry" kind of error, which is the
"category" of errors caught with the 'pedantic' option. (currently, it
includes errors on lstat, opendir and readdir). Is this assumption
valid?

> > I think it would make sense to call
> > 'error(...)' here before 'goto error_out;' to give a useful error
> > message here.
>
> If we start to give error messages, then we might as well give error
> messages all the times when we error out. This will avoid the callers
> wondering if they need to give an error message or not.
>
> I am not sure it's necessary here though. And I think if it's useful,
> it can be added in another patch or another patch series.
>

I could just copy the warning messages bellow each 'goto error_out'
and use then at an 'error(...)' call before the goto. But as Christian
pointed out, I think this would confuse callers wether they should
print error messages or not. On the other side, it may just be
different 'layers' of errors... I don't have any strong opinion about
this.

> > >                               warning("error opening directory %s: %s",
> > >                                       iter->base.path.buf, strerror(errno));
> > >                               /* Popping the level is handled below */
>
> > > -struct dir_iterator *dir_iterator_begin(const char *path)
> > > +struct dir_iterator *dir_iterator_begin(const char *path, int pedantic)
> >
> > Thinking about the future evolution of this interface, it might make
> > more sense to have that second parameter be a "struct
> > dir_iterator_opts".  For now it would just have one member "pedantic",
> > but in the future we could add additional options there instead of
> > adding additional parameters.
>
> I think it's ok with `int pedantic` for now as improvements can be
> done when they are really needed. And we will perhaps find out that
> it's better to just change `int pedantic` to `unsigned flags` instead
> of `struct dir_iterator_opts`.
>

I did thought about using `unsigned flags` instead of `int pedantic`
for the same reasons Thomas pointed out, but as there would be just
one flag for now, it seemed to me that `int pedantic` would make more
sense (following the 'YAGNI' principle). But if it is already known
that more flags (or options) are coming in a very near future, I may
change this to `unsigned flags` or `struct dir_iterator_opts` in v3 if
you think it is needed. Just let me know, please.

> Thanks,
> Christian.
Thomas Gummerer Feb. 24, 2019, 9:06 p.m. UTC | #4
On 02/24, Matheus Tavares Bernardino wrote:
> On Sun, Feb 24, 2019 at 5:35 AM Christian Couder
> <christian.couder@gmail.com> wrote:
> > On Sat, Feb 23, 2019 at 10:37 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > > >  int dir_iterator_advance(struct dir_iterator *dir_iterator)
> > > > @@ -71,6 +78,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> > > >
> > > >                       level->dir = opendir(iter->base.path.buf);
> > > >                       if (!level->dir && errno != ENOENT) {
> > > > +                             if (iter->pedantic)
> > > > +                                     goto error_out;
> > >
> > > I think we should also print an error here.  The caller doesn't have
> > > any context on what went wrong, and will probably just 'die()' if an
> > > error is encountered.
> 
> To correctly handle the error, I assumed that the caller wouldn't need
> to know which exact function returned an error, as long as it knew it
> was a "failure to fetch the next entry" kind of error, which is the
> "category" of errors caught with the 'pedantic' option. (currently, it
> includes errors on lstat, opendir and readdir). Is this assumption
> valid?
>
> > > I think it would make sense to call
> > > 'error(...)' here before 'goto error_out;' to give a useful error
> > > message here.
> >
> > If we start to give error messages, then we might as well give error
> > messages all the times when we error out. This will avoid the callers
> > wondering if they need to give an error message or not.
> >
> > I am not sure it's necessary here though. And I think if it's useful,
> > it can be added in another patch or another patch series.
> >
>
> I could just copy the warning messages bellow each 'goto error_out'
> and use then at an 'error(...)' call before the goto. But as Christian
> pointed out, I think this would confuse callers wether they should
> print error messages or not. On the other side, it may just be
> different 'layers' of errors... I don't have any strong opinion about
> this.

Right, I think it just comes down to which amount of detail we want to
communicate back to the user.  I thought a bit more detail could be
helpful, but just giving a more generic error should also be okay.

> > > >                               warning("error opening directory %s: %s",
> > > >                                       iter->base.path.buf, strerror(errno));
> > > >                               /* Popping the level is handled below */
> >
> > > > -struct dir_iterator *dir_iterator_begin(const char *path)
> > > > +struct dir_iterator *dir_iterator_begin(const char *path, int pedantic)
> > >
> > > Thinking about the future evolution of this interface, it might make
> > > more sense to have that second parameter be a "struct
> > > dir_iterator_opts".  For now it would just have one member "pedantic",
> > > but in the future we could add additional options there instead of
> > > adding additional parameters.
> >
> > I think it's ok with `int pedantic` for now as improvements can be
> > done when they are really needed. And we will perhaps find out that
> > it's better to just change `int pedantic` to `unsigned flags` instead
> > of `struct dir_iterator_opts`.
> >
> 
> I did thought about using `unsigned flags` instead of `int pedantic`
> for the same reasons Thomas pointed out, but as there would be just
> one flag for now, it seemed to me that `int pedantic` would make more
> sense (following the 'YAGNI' principle). But if it is already known
> that more flags (or options) are coming in a very near future, I may
> change this to `unsigned flags` or `struct dir_iterator_opts` in v3 if
> you think it is needed. Just let me know, please.

Looking at the potential improvements that were suggested in the
initial commit adding dir-iterator, 'unsigned flags' would not be
enough to be able to pass all those options.  That's where my
suggestion for 'struct dir_iterator_opts' comes in.

But I don't feel too strongly about it, and am okay with just an 'int
pedantic' option for now, until we see different usages, if others
feel like that's the better option for now.

> > Thanks,
> > Christian.
diff mbox series

Patch

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..070a656555 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,6 +48,13 @@  struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/*
+	 * Boolean value to define dir-iterator's behaviour when failing to
+	 * fetch next entry. See comments on dir_iterator_begin at
+	 * dir-iterator.h
+	 */
+	int pedantic;
 };
 
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
@@ -71,6 +78,8 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			level->dir = opendir(iter->base.path.buf);
 			if (!level->dir && errno != ENOENT) {
+				if (iter->pedantic)
+					goto error_out;
 				warning("error opening directory %s: %s",
 					iter->base.path.buf, strerror(errno));
 				/* Popping the level is handled below */
@@ -122,6 +131,8 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (!de) {
 				/* This level is exhausted; pop up a level. */
 				if (errno) {
+					if (iter->pedantic)
+						goto error_out;
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
 				} else if (closedir(level->dir))
@@ -139,10 +150,13 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			strbuf_addstr(&iter->base.path, de->d_name);
 			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
+				if (errno != ENOENT) {
+					if (iter->pedantic)
+						goto error_out;
 					warning("error reading path '%s': %s",
 						iter->base.path.buf,
 						strerror(errno));
+				}
 				continue;
 			}
 
@@ -159,6 +173,10 @@  int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			return ITER_OK;
 		}
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -182,7 +200,7 @@  int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, int pedantic)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -195,6 +213,7 @@  struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
+	iter->pedantic = pedantic;
 	iter->levels_nr = 1;
 	iter->levels[0].initialized = 0;
 
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..50ca8e1a27 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -19,7 +19,7 @@ 
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, 0);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -65,9 +65,15 @@  struct dir_iterator {
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ * - path is the starting directory. An internal copy will be made.
+ * - pedantic is a boolean value. If true, dir-iterator will free
+ *   resources and return ITER_ERROR immediately, in case of an error
+ *   while trying to fetch the next entry in dir_iterator_advance. If
+ *   false, it will just emit a warning and keep looking for the next
+ *   entry.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, int pedantic);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
@@ -76,6 +82,10 @@  struct dir_iterator *dir_iterator_begin(const char *path);
  * dir_iterator and associated resources and return ITER_ERROR. It is
  * a bug to use iterator or call this function again after it has
  * returned ITER_DONE or ITER_ERROR.
+ *
+ * Note that whether dir_iterator_advance will return ITER_ERROR when
+ * failing to fetch the next entry or keep going is defined by the
+ * 'pedantic' option at dir-iterator's initialization.
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index dd8abe9185..c3d3b6c454 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,7 +2143,7 @@  static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	strbuf_addf(&sb, "%s/logs", gitdir);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+	iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);