[v2] fetch: emphasize failure during submodule fetch

Message ID 20200116025948.136479-1-emilyshaffer@google.com

Commit Message

Emily Shaffer Jan. 16, 2020, 2:59 a.m. UTC
When a submodule fetch fails and there are many submodules, the error
from the lone failing submodule fetch is buried under activity on the
other submodules if more than one fetch fell back on fetch-by-oid.
Call out the failure after all fetches complete, so the user is aware
that something went wrong, and where.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
V1's approach was to show a generic error based on the return status of
fetch_populated_submodules(); with a long list of submodules, this is
only marginally better than showing the error inline. For v2, we
instead gather a list of the submodules which failed during the
parallel processing.

The contents of stderr at the time fetch_finish() is called are not
available to us; the 'err' strbuf argument is for providing output
only. So, gather only the submodule name.

 - Emily

 submodule.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Jan. 16, 2020, 6:23 p.m. UTC | #1
Emily Shaffer <emilyshaffer@google.com> writes:

> @@ -1280,10 +1280,13 @@ struct submodule_parallel_fetch {
>  	/* Pending fetches by OIDs */
>  	struct fetch_task **oid_fetch_tasks;
>  	int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
> +
> +	struct strbuf submodules_with_errors;
> +	pthread_mutex_t submodule_errors_mutex;

Hmph, it is kind of surprising that we need a new mutex for this.

Isn't the task_finish handler, which is what accesses the
with_errors field this patch adds, called by pp_collect_finished()
one at a time, is it?

It seems oid_fetch_tasks[] array is also a shared resource in this
structure among the parallel fetch tasks, but there is no protection
against simultaneous access to it.  Am I missing what makes the new
field different?  Somewhat puzzled...

Other than that, I think this is a vast improvement relative to the
initial round.  I wonder if we want to _("i18n/l10n") the message,
though.

Thanks.


>  #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
>  		  STRING_LIST_INIT_DUP, \
> -		  NULL, 0, 0}
> +		  NULL, 0, 0, STRBUF_INIT, PTHREAD_MUTEX_INITIALIZER}
>  
>  static int get_fetch_recurse_config(const struct submodule *submodule,
>  				    struct submodule_parallel_fetch *spf)
> @@ -1547,7 +1550,10 @@ static int fetch_finish(int retvalue, struct strbuf *err,
>  	struct string_list_item *it;
>  	struct oid_array *commits;
>  
> -	if (retvalue)
> +	if (!task || !task->sub)
> +		BUG("callback cookie bogus");
> +
> +	if (retvalue) {
>  		/*
>  		 * NEEDSWORK: This indicates that the overall fetch
>  		 * failed, even though there may be a subsequent fetch
> @@ -1557,8 +1563,11 @@ static int fetch_finish(int retvalue, struct strbuf *err,
>  		 */
>  		spf->result = 1;
>  
> -	if (!task || !task->sub)
> -		BUG("callback cookie bogus");
> +		pthread_mutex_lock(&spf->submodule_errors_mutex);
> +		strbuf_addf(&spf->submodules_with_errors, "\t%s\n",
> +			    task->sub->name);
> +		pthread_mutex_unlock(&spf->submodule_errors_mutex);
> +	}
>  
>  	/* Is this the second time we process this submodule? */
>  	if (task->commits)
> @@ -1627,6 +1636,11 @@ int fetch_populated_submodules(struct repository *r,
>  				   &spf,
>  				   "submodule", "parallel/fetch");
>  
> +	if (spf.submodules_with_errors.len > 0)
> +		fprintf(stderr, "Errors during submodule fetch:\n%s",
> +			spf.submodules_with_errors.buf);
> +
> +
>  	argv_array_clear(&spf.args);
>  out:
>  	free_submodules_oids(&spf.changed_submodule_names);
Emily Shaffer Jan. 16, 2020, 9:55 p.m. UTC | #2
On Thu, Jan 16, 2020 at 10:23:58AM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > @@ -1280,10 +1280,13 @@ struct submodule_parallel_fetch {
> >  	/* Pending fetches by OIDs */
> >  	struct fetch_task **oid_fetch_tasks;
> >  	int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
> > +
> > +	struct strbuf submodules_with_errors;
> > +	pthread_mutex_t submodule_errors_mutex;
> 
> Hmph, it is kind of surprising that we need a new mutex for this.
> 
> Isn't the task_finish handler, which is what accesses the
> with_errors field this patch adds, called by pp_collect_finished()
> one at a time, is it?

Hm. It is called by pp_collect_finished() one at a time, but while other
processes may still be running. So I guess that is OK - spf might still
be read by other tasks but this field of it won't be touched by anybody
simultaneously. Ok, I'm convinced.

> It seems oid_fetch_tasks[] array is also a shared resource in this
> structure among the parallel fetch tasks, but there is no protection
> against simultaneous access to it.  Am I missing what makes the new
> field different?  Somewhat puzzled...

I think it's similar. As I understand it, it looks something like this:

  loop forever:
    can i start a new process?
      get_next_task cb (blocking)
      start work cb (nonblocking unless it failed to start)
    process stderr in/out once (blocking)
    is anybody done? (blocking)
      task_finished cb (blocking) <- My change is in here
        did fetch by ref fail? (blocking)
          put fetch by OID onto the process list (blocking)
    is everybody done?
      break

That is, everything but the work unit itself is blocking and runs in a
single-threaded infinite loop. So since oid_fetch_tasks is read in
get_next_task callback and modified in the task_finished callback, those
areas don't need thread protection.

Thanks for poking me to think it through better. I'll remove the mutex
and include a short note about why it's not needed in the commit message.
I suppose if I wanted to capture more precise error information during
the actual work, then I would need it, but I'm not sure that's
necessary or trivial, given how stdout/stderr are handled for cohesive
printing.

> Other than that, I think this is a vast improvement relative to the
> initial round.  I wonder if we want to _("i18n/l10n") the message,
> though.

Sure, sorry to have missed it.

Thanks for the thoughtful review. Will send a reroll in a moment.

 - Emily

> 
> 
> >  #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
> >  		  STRING_LIST_INIT_DUP, \
> > -		  NULL, 0, 0}
> > +		  NULL, 0, 0, STRBUF_INIT, PTHREAD_MUTEX_INITIALIZER}
> >  
> >  static int get_fetch_recurse_config(const struct submodule *submodule,
> >  				    struct submodule_parallel_fetch *spf)
> > @@ -1547,7 +1550,10 @@ static int fetch_finish(int retvalue, struct strbuf *err,
> >  	struct string_list_item *it;
> >  	struct oid_array *commits;
> >  
> > -	if (retvalue)
> > +	if (!task || !task->sub)
> > +		BUG("callback cookie bogus");
> > +
> > +	if (retvalue) {
> >  		/*
> >  		 * NEEDSWORK: This indicates that the overall fetch
> >  		 * failed, even though there may be a subsequent fetch
> > @@ -1557,8 +1563,11 @@ static int fetch_finish(int retvalue, struct strbuf *err,
> >  		 */
> >  		spf->result = 1;
> >  
> > -	if (!task || !task->sub)
> > -		BUG("callback cookie bogus");
> > +		pthread_mutex_lock(&spf->submodule_errors_mutex);
> > +		strbuf_addf(&spf->submodules_with_errors, "\t%s\n",
> > +			    task->sub->name);
> > +		pthread_mutex_unlock(&spf->submodule_errors_mutex);
> > +	}
> >  
> >  	/* Is this the second time we process this submodule? */
> >  	if (task->commits)
> > @@ -1627,6 +1636,11 @@ int fetch_populated_submodules(struct repository *r,
> >  				   &spf,
> >  				   "submodule", "parallel/fetch");
> >  
> > +	if (spf.submodules_with_errors.len > 0)
> > +		fprintf(stderr, "Errors during submodule fetch:\n%s",
> > +			spf.submodules_with_errors.buf);
> > +
> > +
> >  	argv_array_clear(&spf.args);
> >  out:
> >  	free_submodules_oids(&spf.changed_submodule_names);
Emily Shaffer Jan. 16, 2020, 10:04 p.m. UTC | #3
On Thu, Jan 16, 2020 at 01:55:26PM -0800, Emily Shaffer wrote:
> On Thu, Jan 16, 2020 at 10:23:58AM -0800, Junio C Hamano wrote:
> > Emily Shaffer <emilyshaffer@google.com> writes:
> > 
> > > @@ -1280,10 +1280,13 @@ struct submodule_parallel_fetch {
> > >  	/* Pending fetches by OIDs */
> > >  	struct fetch_task **oid_fetch_tasks;
> > >  	int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
> > > +
> > > +	struct strbuf submodules_with_errors;
> > > +	pthread_mutex_t submodule_errors_mutex;
> > 
> > Hmph, it is kind of surprising that we need a new mutex for this.
> > 
> > Isn't the task_finish handler, which is what accesses the
> > with_errors field this patch adds, called by pp_collect_finished()
> > one at a time, is it?
> 
> Hm. It is called by pp_collect_finished() one at a time, but while other
> processes may still be running. So I guess that is OK - spf might still
> be read by other tasks but this field of it won't be touched by anybody
> simultaneously. Ok, I'm convinced.
> 
> > It seems oid_fetch_tasks[] array is also a shared resource in this
> > structure among the parallel fetch tasks, but there is no protection
> > against simultaneous access to it.  Am I missing what makes the new
> > field different?  Somewhat puzzled...
> 
> I think it's similar. As I understand it, it looks something like this:
> 
>   loop forever:
>     can i start a new process?
>       get_next_task cb (blocking)
>       start work cb (nonblocking unless it failed to start)
>     process stderr in/out once (blocking)
>     is anybody done? (blocking)
>       task_finished cb (blocking) <- My change is in here
>         did fetch by ref fail? (blocking)
>           put fetch by OID onto the process list (blocking)
>     is everybody done?
>       break

Ah, as I look deeper I realize that it's a child process, not a thread,
so this code becomes even simpler to understand. I think then I don't
need to worry about thread safety at all here.

 - Emily

Patch

diff --git a/submodule.c b/submodule.c
index 9da7181321..13bc9354bc 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1280,10 +1280,13 @@  struct submodule_parallel_fetch {
 	/* Pending fetches by OIDs */
 	struct fetch_task **oid_fetch_tasks;
 	int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
+
+	struct strbuf submodules_with_errors;
+	pthread_mutex_t submodule_errors_mutex;
 };
 #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
 		  STRING_LIST_INIT_DUP, \
-		  NULL, 0, 0}
+		  NULL, 0, 0, STRBUF_INIT, PTHREAD_MUTEX_INITIALIZER}
 
 static int get_fetch_recurse_config(const struct submodule *submodule,
 				    struct submodule_parallel_fetch *spf)
@@ -1547,7 +1550,10 @@  static int fetch_finish(int retvalue, struct strbuf *err,
 	struct string_list_item *it;
 	struct oid_array *commits;
 
-	if (retvalue)
+	if (!task || !task->sub)
+		BUG("callback cookie bogus");
+
+	if (retvalue) {
 		/*
 		 * NEEDSWORK: This indicates that the overall fetch
 		 * failed, even though there may be a subsequent fetch
@@ -1557,8 +1563,11 @@  static int fetch_finish(int retvalue, struct strbuf *err,
 		 */
 		spf->result = 1;
 
-	if (!task || !task->sub)
-		BUG("callback cookie bogus");
+		pthread_mutex_lock(&spf->submodule_errors_mutex);
+		strbuf_addf(&spf->submodules_with_errors, "\t%s\n",
+			    task->sub->name);
+		pthread_mutex_unlock(&spf->submodule_errors_mutex);
+	}
 
 	/* Is this the second time we process this submodule? */
 	if (task->commits)
@@ -1627,6 +1636,11 @@  int fetch_populated_submodules(struct repository *r,
 				   &spf,
 				   "submodule", "parallel/fetch");
 
+	if (spf.submodules_with_errors.len > 0)
+		fprintf(stderr, "Errors during submodule fetch:\n%s",
+			spf.submodules_with_errors.buf);
+
+
 	argv_array_clear(&spf.args);
 out:
 	free_submodules_oids(&spf.changed_submodule_names);