diff mbox series

[v2,4/7] merge-ort: switch our strmaps over to using memory pools

Message ID dd8839b284330892a3bbcafbc03d71489fc9b01f.1627531121.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Final optimization batch (#15): use memory pools | expand

Commit Message

Elijah Newren July 29, 2021, 3:58 a.m. UTC
From: Elijah Newren <newren@gmail.com>

For all the strmaps (including strintmaps and strsets) whose memory is
unconditionally freed as part of clear_or_reinit_internal_opts(), switch
them over to using our new memory pool.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      202.5  ms ±  3.2  ms    198.1 ms ±  2.6 ms
    mega-renames:      1.072 s ±  0.012 s    715.8 ms ±  4.0 ms
    just-one-mega:   357.3  ms ±  3.9  ms    276.8 ms ±  4.2 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 125 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 75 insertions(+), 50 deletions(-)

Comments

Jeff King July 29, 2021, 3:28 p.m. UTC | #1
On Thu, Jul 29, 2021 at 03:58:38AM +0000, Elijah Newren via GitGitGadget wrote:

> diff --git a/merge-ort.c b/merge-ort.c
> index 2bca4b71f2a..5fd2a4ccd35 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -539,15 +539,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>  	void (*strset_func)(struct strset *) =
>  		reinitialize ? strset_partial_clear : strset_clear;
>  
> -	/*
> -	 * We marked opti->paths with strdup_strings = 0, so that we
> -	 * wouldn't have to make another copy of the fullpath created by
> -	 * make_traverse_path from setup_path_info().  But, now that we've
> -	 * used it and have no other references to these strings, it is time
> -	 * to deallocate them.
> -	 */
> -	free_strmap_strings(&opti->paths);
> -	strmap_func(&opti->paths, 1);
> +	if (opti->pool)
> +		strmap_func(&opti->paths, 0);

This isn't new in your patch here, but I did scratch my head a bit over
what "strmap_func" is. It's a bit less confusing if you read the whole
function (as opposed to a diff), since then you're more likely to see
the definition. But something like "strmap_clear_func()" would have been
a lot less confusing.

Arguably, the existence of these function indirections is perhaps a sign
that the strmap API should provide a version of the clear functions that
takes "partial / not-partial" as a parameter.

(Again, not really part of this patch series, but I hadn't looked at
some of the earlier optimization steps).

-Peff
Elijah Newren July 29, 2021, 6:37 p.m. UTC | #2
On Thu, Jul 29, 2021 at 9:29 AM Jeff King <peff@peff.net> wrote:
>
> On Thu, Jul 29, 2021 at 03:58:38AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 2bca4b71f2a..5fd2a4ccd35 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -539,15 +539,19 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
> >       void (*strset_func)(struct strset *) =
> >               reinitialize ? strset_partial_clear : strset_clear;
> >
> > -     /*
> > -      * We marked opti->paths with strdup_strings = 0, so that we
> > -      * wouldn't have to make another copy of the fullpath created by
> > -      * make_traverse_path from setup_path_info().  But, now that we've
> > -      * used it and have no other references to these strings, it is time
> > -      * to deallocate them.
> > -      */
> > -     free_strmap_strings(&opti->paths);
> > -     strmap_func(&opti->paths, 1);
> > +     if (opti->pool)
> > +             strmap_func(&opti->paths, 0);
>
> This isn't new in your patch here, but I did scratch my head a bit over
> what "strmap_func" is. It's a bit less confusing if you read the whole
> function (as opposed to a diff), since then you're more likely to see
> the definition. But something like "strmap_clear_func()" would have been
> a lot less confusing.

Makes sense.

> Arguably, the existence of these function indirections is perhaps a sign
> that the strmap API should provide a version of the clear functions that
> takes "partial / not-partial" as a parameter.

Are you suggesting a modification of str{map,intmap,set}_clear() to
take an extra parameter, or removing the
str{map,intmap,set}_partial_clear() functions and introducing new
functions that take a partial/not-partial parameter?  I think you're
suggesting the latter, and that makes more sense to me...but I'm
drawing blanks trying to come up with a reasonable function name.

(If it helps for context -- the only current callers of the
*_partial_clear() functions are found in diffcore-rename.c and
merge-ort.c, so it'd be a pretty easy change to make to those.  There
are additionally some callers of strmap_clear() and strset_clear() in
builtin/shortlog.c and rerere.c, and it'd be nice to avoid exposing
those to the complexity of the partial clearing.)

> (Again, not really part of this patch series, but I hadn't looked at
> some of the earlier optimization steps).

Yeah, but this is the kind of reason I wanted you to review this
series, because I figured you might have more good comments on the
str{map,intmap,set} API calls.  :-)
Jeff King July 29, 2021, 8:09 p.m. UTC | #3
On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:

> > Arguably, the existence of these function indirections is perhaps a sign
> > that the strmap API should provide a version of the clear functions that
> > takes "partial / not-partial" as a parameter.
> 
> Are you suggesting a modification of str{map,intmap,set}_clear() to
> take an extra parameter, or removing the
> str{map,intmap,set}_partial_clear() functions and introducing new
> functions that take a partial/not-partial parameter?  I think you're
> suggesting the latter, and that makes more sense to me...but I'm
> drawing blanks trying to come up with a reasonable function name.

It does seem a shame to add the "partial" parameter to strmap_clear(),
just because most callers don't need it (so they end up with this
inscrutable "0" parameter).

What if there was a flags field? Then it could be combined with the
free_values parameter. The result is kind of verbose in two ways:

 - now strset_clear(), etc, need a "flags" parameter, which they didn't
   before (and is just "0" most of the time!)

 - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
   That's a lot longer, though arguably it's easier to understand since
   the boolean is explained.

Having gone through the exercise, I am not sure it is actually making
anything more readable (messy patch is below for reference).

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 3e7ab1ca82..dfbdba53da 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -242,7 +242,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	}
 
-	strset_clear(&dups);
+	strset_clear(&dups, 0);
 	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 7e6b3e1b14..0c960111d1 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -665,9 +665,10 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
 
 	strmap_for_each_entry(dir_rename_count, &iter, entry) {
 		struct strintmap *counts = entry->value;
-		strintmap_clear(counts);
+		strintmap_clear(counts, 0);
 	}
-	strmap_partial_clear(dir_rename_count, 1);
+	strmap_clear(dir_rename_count,
+		     STRMAP_FREE_VALUES | STRMAP_PARTIAL_CLEAR);
 }
 
 static void cleanup_dir_rename_info(struct dir_rename_info *info,
@@ -683,15 +684,15 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
 		return;
 
 	/* idx_map */
-	strintmap_clear(&info->idx_map);
+	strintmap_clear(&info->idx_map, 0);
 
 	/* dir_rename_guess */
-	strmap_clear(&info->dir_rename_guess, 1);
+	strmap_clear(&info->dir_rename_guess, STRMAP_FREE_VALUES);
 
 	/* relevant_source_dirs */
 	if (info->relevant_source_dirs &&
 	    info->relevant_source_dirs != dirs_removed) {
-		strintmap_clear(info->relevant_source_dirs);
+		strintmap_clear(info->relevant_source_dirs, 0);
 		FREE_AND_NULL(info->relevant_source_dirs);
 	}
 
@@ -716,7 +717,7 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
 
 		if (!strintmap_get(dirs_removed, source_dir)) {
 			string_list_append(&to_remove, source_dir);
-			strintmap_clear(counts);
+			strintmap_clear(counts, 0);
 			continue;
 		}
 
@@ -1045,8 +1046,8 @@ static int find_basename_matches(struct diff_options *options,
 		}
 	}
 
-	strintmap_clear(&sources);
-	strintmap_clear(&dests);
+	strintmap_clear(&sources, 0);
+	strintmap_clear(&dests, 0);
 
 	return renames;
 }
@@ -1700,7 +1701,7 @@ void diffcore_rename_extended(struct diff_options *options,
 	FREE_AND_NULL(rename_src);
 	rename_src_nr = rename_src_alloc = 0;
 	if (break_idx) {
-		strintmap_clear(break_idx);
+		strintmap_clear(break_idx, 0);
 		FREE_AND_NULL(break_idx);
 	}
 	trace2_region_leave("diff", "write back to queue", options->repo);
diff --git a/merge-ort.c b/merge-ort.c
index 0fb942692a..0765e23577 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -532,15 +532,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 {
 	struct rename_info *renames = &opti->renames;
 	int i;
-	void (*strmap_func)(struct strmap *, int) =
-		reinitialize ? strmap_partial_clear : strmap_clear;
-	void (*strintmap_func)(struct strintmap *) =
-		reinitialize ? strintmap_partial_clear : strintmap_clear;
-	void (*strset_func)(struct strset *) =
-		reinitialize ? strset_partial_clear : strset_clear;
+	unsigned flags = reinitialize ? STRMAP_PARTIAL_CLEAR : 0;
 
 	if (opti->pool)
-		strmap_func(&opti->paths, 0);
+		strmap_clear(&opti->paths, flags);
 	else {
 		/*
 		 * We marked opti->paths with strdup_strings = 0, so that
@@ -550,15 +545,15 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		 * to these strings, it is time to deallocate them.
 		 */
 		free_strmap_strings(&opti->paths);
-		strmap_func(&opti->paths, 1);
+		strmap_clear(&opti->paths, flags | STRMAP_FREE_VALUES);
 	}
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
 	 * opti->paths.  We don't want to deallocate anything twice, so we
 	 * don't free the keys and we pass 0 for free_values.
 	 */
-	strmap_func(&opti->conflicted, 0);
+	strmap_clear(&opti->conflicted, flags);
 
 	if (!opti->pool) {
 		/*
@@ -579,24 +574,24 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 
 	/* Free memory used by various renames maps */
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		strintmap_func(&renames->dirs_removed[i]);
-		strmap_func(&renames->dir_renames[i], 0);
-		strintmap_func(&renames->relevant_sources[i]);
+		strintmap_clear(&renames->dirs_removed[i], flags);
+		strmap_clear(&renames->dir_renames[i], flags);
+		strintmap_clear(&renames->relevant_sources[i], flags);
 		if (!reinitialize)
 			assert(renames->cached_pairs_valid_side == 0);
 		if (i != renames->cached_pairs_valid_side &&
 		    -1 != renames->cached_pairs_valid_side) {
-			strset_func(&renames->cached_target_names[i]);
-			strmap_func(&renames->cached_pairs[i], 1);
-			strset_func(&renames->cached_irrelevant[i]);
+			strset_clear(&renames->cached_target_names[i], flags);
+			strmap_clear(&renames->cached_pairs[i], flags | STRMAP_FREE_VALUES);
+			strset_clear(&renames->cached_irrelevant[i], flags);
 			partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
 			if (!reinitialize)
 				strmap_clear(&renames->dir_rename_count[i], 1);
 		}
 	}
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		strintmap_func(&renames->deferred[i].possible_trivial_merges);
-		strset_func(&renames->deferred[i].target_dirs);
+		strintmap_clear(&renames->deferred[i].possible_trivial_merges, flags);
+		strset_clear(&renames->deferred[i].target_dirs, flags);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
 	}
 	renames->cached_pairs_valid_side = 0;
@@ -1482,7 +1477,7 @@ static int handle_deferred_entries(struct merge_options *opt,
 			if (ret < 0)
 				return ret;
 		}
-		strintmap_clear(&copy);
+		strintmap_clear(&copy, 0);
 		strintmap_for_each_entry(&renames->deferred[side].possible_trivial_merges,
 					 &iter, entry) {
 			const char *path = entry->key;
diff --git a/strmap.c b/strmap.c
index 4fb9f6100e..7343800df5 100644
--- a/strmap.c
+++ b/strmap.c
@@ -37,10 +37,11 @@ void strmap_init_with_options(struct strmap *map,
 	map->strdup_strings = strdup_strings;
 }
 
-static void strmap_free_entries_(struct strmap *map, int free_values)
+static void strmap_free_entries_(struct strmap *map, unsigned flags)
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *e;
+	int free_values = flags & STRMAP_FREE_VALUES;
 
 	if (!map)
 		return;
@@ -64,16 +65,13 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
 	}
 }
 
-void strmap_clear(struct strmap *map, int free_values)
+void strmap_clear(struct strmap *map, unsigned flags)
 {
-	strmap_free_entries_(map, free_values);
-	hashmap_clear(&map->map);
-}
-
-void strmap_partial_clear(struct strmap *map, int free_values)
-{
-	strmap_free_entries_(map, free_values);
-	hashmap_partial_clear(&map->map);
+	strmap_free_entries_(map, flags);
+	if (flags & STRMAP_PARTIAL_CLEAR)
+		hashmap_partial_clear(&map->map);
+	else
+		hashmap_clear(&map->map);
 }
 
 static struct strmap_entry *create_entry(struct strmap *map,
diff --git a/strmap.h b/strmap.h
index 1e152d832d..d03d451654 100644
--- a/strmap.h
+++ b/strmap.h
@@ -46,16 +46,14 @@ void strmap_init_with_options(struct strmap *map,
 			      struct mem_pool *pool,
 			      int strdup_strings);
 
-/*
- * Remove all entries from the map, releasing any allocated resources.
- */
-void strmap_clear(struct strmap *map, int free_values);
+#define STRMAP_FREE_VALUES 1 /* 1 for historical compat, but we should probably
+				update callers to use the correct name) */
+#define STRMAP_PARTIAL_CLEAR 2
 
 /*
- * Similar to strmap_clear() but leaves map->map->table allocated and
- * pre-sized so that subsequent uses won't need as many rehashings.
+ * Remove all entries from the map, releasing any allocated resources.
  */
-void strmap_partial_clear(struct strmap *map, int free_values);
+void strmap_clear(struct strmap *map, unsigned flags);
 
 /*
  * Insert "str" into the map, pointing to "data".
@@ -148,14 +146,10 @@ static inline void strintmap_init_with_options(struct strintmap *map,
 	map->default_value = default_value;
 }
 
-static inline void strintmap_clear(struct strintmap *map)
-{
-	strmap_clear(&map->map, 0);
-}
-
-static inline void strintmap_partial_clear(struct strintmap *map)
+static inline void strintmap_clear(struct strintmap *map, unsigned flags)
 {
-	strmap_partial_clear(&map->map, 0);
+	/* maybe clear STRMAP_FREE_VALUES bit for extra protection */
+	strmap_clear(&map->map, flags);
 }
 
 static inline int strintmap_contains(struct strintmap *map, const char *str)
@@ -232,14 +226,9 @@ static inline void strset_init_with_options(struct strset *set,
 	strmap_init_with_options(&set->map, pool, strdup_strings);
 }
 
-static inline void strset_clear(struct strset *set)
-{
-	strmap_clear(&set->map, 0);
-}
-
-static inline void strset_partial_clear(struct strset *set)
+static inline void strset_clear(struct strset *set, unsigned flags)
 {
-	strmap_partial_clear(&set->map, 0);
+	strmap_clear(&set->map, flags);
 }
 
 static inline int strset_contains(struct strset *set, const char *str)
Elijah Newren July 30, 2021, 2:30 a.m. UTC | #4
On Thu, Jul 29, 2021 at 2:09 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:
>
> > > Arguably, the existence of these function indirections is perhaps a sign
> > > that the strmap API should provide a version of the clear functions that
> > > takes "partial / not-partial" as a parameter.
> >
> > Are you suggesting a modification of str{map,intmap,set}_clear() to
> > take an extra parameter, or removing the
> > str{map,intmap,set}_partial_clear() functions and introducing new
> > functions that take a partial/not-partial parameter?  I think you're
> > suggesting the latter, and that makes more sense to me...but I'm
> > drawing blanks trying to come up with a reasonable function name.
>
> It does seem a shame to add the "partial" parameter to strmap_clear(),
> just because most callers don't need it (so they end up with this
> inscrutable "0" parameter).
>
> What if there was a flags field? Then it could be combined with the
> free_values parameter. The result is kind of verbose in two ways:
>
>  - now strset_clear(), etc, need a "flags" parameter, which they didn't
>    before (and is just "0" most of the time!)
>
>  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
>    That's a lot longer, though arguably it's easier to understand since
>    the boolean is explained.
>
> Having gone through the exercise, I am not sure it is actually making
> anything more readable (messy patch is below for reference).

Thanks for diving in.  Since it's not clear if it's helping, I'll just
take your earlier suggestion to rename the "strmap_func" variable to
"strmap_clear_func" instead.

>
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index 3e7ab1ca82..dfbdba53da 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -242,7 +242,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>                 insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
>         }
>
> -       strset_clear(&dups);
> +       strset_clear(&dups, 0);
>         strbuf_release(&ident);
>         strbuf_release(&oneline);
>  }
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 7e6b3e1b14..0c960111d1 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -665,9 +665,10 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
>
>         strmap_for_each_entry(dir_rename_count, &iter, entry) {
>                 struct strintmap *counts = entry->value;
> -               strintmap_clear(counts);
> +               strintmap_clear(counts, 0);
>         }
> -       strmap_partial_clear(dir_rename_count, 1);
> +       strmap_clear(dir_rename_count,
> +                    STRMAP_FREE_VALUES | STRMAP_PARTIAL_CLEAR);
>  }
>
>  static void cleanup_dir_rename_info(struct dir_rename_info *info,
> @@ -683,15 +684,15 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
>                 return;
>
>         /* idx_map */
> -       strintmap_clear(&info->idx_map);
> +       strintmap_clear(&info->idx_map, 0);
>
>         /* dir_rename_guess */
> -       strmap_clear(&info->dir_rename_guess, 1);
> +       strmap_clear(&info->dir_rename_guess, STRMAP_FREE_VALUES);
>
>         /* relevant_source_dirs */
>         if (info->relevant_source_dirs &&
>             info->relevant_source_dirs != dirs_removed) {
> -               strintmap_clear(info->relevant_source_dirs);
> +               strintmap_clear(info->relevant_source_dirs, 0);
>                 FREE_AND_NULL(info->relevant_source_dirs);
>         }
>
> @@ -716,7 +717,7 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
>
>                 if (!strintmap_get(dirs_removed, source_dir)) {
>                         string_list_append(&to_remove, source_dir);
> -                       strintmap_clear(counts);
> +                       strintmap_clear(counts, 0);
>                         continue;
>                 }
>
> @@ -1045,8 +1046,8 @@ static int find_basename_matches(struct diff_options *options,
>                 }
>         }
>
> -       strintmap_clear(&sources);
> -       strintmap_clear(&dests);
> +       strintmap_clear(&sources, 0);
> +       strintmap_clear(&dests, 0);
>
>         return renames;
>  }
> @@ -1700,7 +1701,7 @@ void diffcore_rename_extended(struct diff_options *options,
>         FREE_AND_NULL(rename_src);
>         rename_src_nr = rename_src_alloc = 0;
>         if (break_idx) {
> -               strintmap_clear(break_idx);
> +               strintmap_clear(break_idx, 0);
>                 FREE_AND_NULL(break_idx);
>         }
>         trace2_region_leave("diff", "write back to queue", options->repo);
> diff --git a/merge-ort.c b/merge-ort.c
> index 0fb942692a..0765e23577 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -532,15 +532,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>  {
>         struct rename_info *renames = &opti->renames;
>         int i;
> -       void (*strmap_func)(struct strmap *, int) =
> -               reinitialize ? strmap_partial_clear : strmap_clear;
> -       void (*strintmap_func)(struct strintmap *) =
> -               reinitialize ? strintmap_partial_clear : strintmap_clear;
> -       void (*strset_func)(struct strset *) =
> -               reinitialize ? strset_partial_clear : strset_clear;
> +       unsigned flags = reinitialize ? STRMAP_PARTIAL_CLEAR : 0;
>
>         if (opti->pool)
> -               strmap_func(&opti->paths, 0);
> +               strmap_clear(&opti->paths, flags);
>         else {
>                 /*
>                  * We marked opti->paths with strdup_strings = 0, so that
> @@ -550,15 +545,15 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>                  * to these strings, it is time to deallocate them.
>                  */
>                 free_strmap_strings(&opti->paths);
> -               strmap_func(&opti->paths, 1);
> +               strmap_clear(&opti->paths, flags | STRMAP_FREE_VALUES);
>         }
>
>         /*
>          * All keys and values in opti->conflicted are a subset of those in
>          * opti->paths.  We don't want to deallocate anything twice, so we
>          * don't free the keys and we pass 0 for free_values.
>          */
> -       strmap_func(&opti->conflicted, 0);
> +       strmap_clear(&opti->conflicted, flags);
>
>         if (!opti->pool) {
>                 /*
> @@ -579,24 +574,24 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>
>         /* Free memory used by various renames maps */
>         for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
> -               strintmap_func(&renames->dirs_removed[i]);
> -               strmap_func(&renames->dir_renames[i], 0);
> -               strintmap_func(&renames->relevant_sources[i]);
> +               strintmap_clear(&renames->dirs_removed[i], flags);
> +               strmap_clear(&renames->dir_renames[i], flags);
> +               strintmap_clear(&renames->relevant_sources[i], flags);
>                 if (!reinitialize)
>                         assert(renames->cached_pairs_valid_side == 0);
>                 if (i != renames->cached_pairs_valid_side &&
>                     -1 != renames->cached_pairs_valid_side) {
> -                       strset_func(&renames->cached_target_names[i]);
> -                       strmap_func(&renames->cached_pairs[i], 1);
> -                       strset_func(&renames->cached_irrelevant[i]);
> +                       strset_clear(&renames->cached_target_names[i], flags);
> +                       strmap_clear(&renames->cached_pairs[i], flags | STRMAP_FREE_VALUES);
> +                       strset_clear(&renames->cached_irrelevant[i], flags);
>                         partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
>                         if (!reinitialize)
>                                 strmap_clear(&renames->dir_rename_count[i], 1);
>                 }
>         }
>         for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
> -               strintmap_func(&renames->deferred[i].possible_trivial_merges);
> -               strset_func(&renames->deferred[i].target_dirs);
> +               strintmap_clear(&renames->deferred[i].possible_trivial_merges, flags);
> +               strset_clear(&renames->deferred[i].target_dirs, flags);
>                 renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
>         }
>         renames->cached_pairs_valid_side = 0;
> @@ -1482,7 +1477,7 @@ static int handle_deferred_entries(struct merge_options *opt,
>                         if (ret < 0)
>                                 return ret;
>                 }
> -               strintmap_clear(&copy);
> +               strintmap_clear(&copy, 0);
>                 strintmap_for_each_entry(&renames->deferred[side].possible_trivial_merges,
>                                          &iter, entry) {
>                         const char *path = entry->key;
> diff --git a/strmap.c b/strmap.c
> index 4fb9f6100e..7343800df5 100644
> --- a/strmap.c
> +++ b/strmap.c
> @@ -37,10 +37,11 @@ void strmap_init_with_options(struct strmap *map,
>         map->strdup_strings = strdup_strings;
>  }
>
> -static void strmap_free_entries_(struct strmap *map, int free_values)
> +static void strmap_free_entries_(struct strmap *map, unsigned flags)
>  {
>         struct hashmap_iter iter;
>         struct strmap_entry *e;
> +       int free_values = flags & STRMAP_FREE_VALUES;
>
>         if (!map)
>                 return;
> @@ -64,16 +65,13 @@ static void strmap_free_entries_(struct strmap *map, int free_values)
>         }
>  }
>
> -void strmap_clear(struct strmap *map, int free_values)
> +void strmap_clear(struct strmap *map, unsigned flags)
>  {
> -       strmap_free_entries_(map, free_values);
> -       hashmap_clear(&map->map);
> -}
> -
> -void strmap_partial_clear(struct strmap *map, int free_values)
> -{
> -       strmap_free_entries_(map, free_values);
> -       hashmap_partial_clear(&map->map);
> +       strmap_free_entries_(map, flags);
> +       if (flags & STRMAP_PARTIAL_CLEAR)
> +               hashmap_partial_clear(&map->map);
> +       else
> +               hashmap_clear(&map->map);
>  }
>
>  static struct strmap_entry *create_entry(struct strmap *map,
> diff --git a/strmap.h b/strmap.h
> index 1e152d832d..d03d451654 100644
> --- a/strmap.h
> +++ b/strmap.h
> @@ -46,16 +46,14 @@ void strmap_init_with_options(struct strmap *map,
>                               struct mem_pool *pool,
>                               int strdup_strings);
>
> -/*
> - * Remove all entries from the map, releasing any allocated resources.
> - */
> -void strmap_clear(struct strmap *map, int free_values);
> +#define STRMAP_FREE_VALUES 1 /* 1 for historical compat, but we should probably
> +                               update callers to use the correct name) */
> +#define STRMAP_PARTIAL_CLEAR 2
>
>  /*
> - * Similar to strmap_clear() but leaves map->map->table allocated and
> - * pre-sized so that subsequent uses won't need as many rehashings.
> + * Remove all entries from the map, releasing any allocated resources.
>   */
> -void strmap_partial_clear(struct strmap *map, int free_values);
> +void strmap_clear(struct strmap *map, unsigned flags);
>
>  /*
>   * Insert "str" into the map, pointing to "data".
> @@ -148,14 +146,10 @@ static inline void strintmap_init_with_options(struct strintmap *map,
>         map->default_value = default_value;
>  }
>
> -static inline void strintmap_clear(struct strintmap *map)
> -{
> -       strmap_clear(&map->map, 0);
> -}
> -
> -static inline void strintmap_partial_clear(struct strintmap *map)
> +static inline void strintmap_clear(struct strintmap *map, unsigned flags)
>  {
> -       strmap_partial_clear(&map->map, 0);
> +       /* maybe clear STRMAP_FREE_VALUES bit for extra protection */
> +       strmap_clear(&map->map, flags);
>  }
>
>  static inline int strintmap_contains(struct strintmap *map, const char *str)
> @@ -232,14 +226,9 @@ static inline void strset_init_with_options(struct strset *set,
>         strmap_init_with_options(&set->map, pool, strdup_strings);
>  }
>
> -static inline void strset_clear(struct strset *set)
> -{
> -       strmap_clear(&set->map, 0);
> -}
> -
> -static inline void strset_partial_clear(struct strset *set)
> +static inline void strset_clear(struct strset *set, unsigned flags)
>  {
> -       strmap_partial_clear(&set->map, 0);
> +       strmap_clear(&set->map, flags);
>  }
>
>  static inline int strset_contains(struct strset *set, const char *str)
Ævar Arnfjörð Bjarmason July 30, 2021, 1:30 p.m. UTC | #5
On Thu, Jul 29 2021, Jeff King wrote:

> On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:
>
>> > Arguably, the existence of these function indirections is perhaps a sign
>> > that the strmap API should provide a version of the clear functions that
>> > takes "partial / not-partial" as a parameter.
>> 
>> Are you suggesting a modification of str{map,intmap,set}_clear() to
>> take an extra parameter, or removing the
>> str{map,intmap,set}_partial_clear() functions and introducing new
>> functions that take a partial/not-partial parameter?  I think you're
>> suggesting the latter, and that makes more sense to me...but I'm
>> drawing blanks trying to come up with a reasonable function name.
>
> It does seem a shame to add the "partial" parameter to strmap_clear(),
> just because most callers don't need it (so they end up with this
> inscrutable "0" parameter).
>
> What if there was a flags field? Then it could be combined with the
> free_values parameter. The result is kind of verbose in two ways:
>
>  - now strset_clear(), etc, need a "flags" parameter, which they didn't
>    before (and is just "0" most of the time!)
>
>  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
>    That's a lot longer, though arguably it's easier to understand since
>    the boolean is explained.
>
> Having gone through the exercise, I am not sure it is actually making
> anything more readable (messy patch is below for reference).

I've got some WIP patches for string-list.h and strmap.h to make the API
nicer, and it's probably applicable to strset.h too.

I.e. I found when using strset.h that it was a weird API to use, because
unlike string-list.h it didn't pay attention to your "dup" field when
freeing, you had to do it explicitly.

And then in e.g. merge-ort.c there's this "strdup dance" pattern where
we flip the field back and forth.

The below diff is exctracted from that WIP work, with the relevant two
API headers and then two changed API users for show (the tree-wide
changes are much larger).

I think making the promise I make in the updated docs at "We guarantee
that the `clearfunc`[...]" in string-list.h makes for particularly nice
API behavior.

 builtin/remote.c | 37 ++++++++++++++++++++---------------
 merge-ort.c      | 32 +++++++-----------------------
 string-list.h    | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 strmap.h         | 13 +++++++++++++
 4 files changed, 98 insertions(+), 43 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index 7f88e6ce9de..ec1dbd49f71 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -340,10 +340,24 @@ static void read_branches(void)
 
 struct ref_states {
 	struct remote *remote;
-	struct string_list new_refs, stale, tracked, heads, push;
+
+	struct string_list new_refs;
+	struct string_list stale;
+	struct string_list tracked;
+	struct string_list heads;
+	struct string_list push;
+
 	int queried;
 };
 
+#define REF_STATES_INIT { \
+	.new_refs = STRING_LIST_INIT_DUP, \
+	.stale = STRING_LIST_INIT_DUP, \
+	.tracked = STRING_LIST_INIT_DUP, \
+	.heads = STRING_LIST_INIT_DUP, \
+	.push = STRING_LIST_INIT_DUP, \
+}
+
 static int get_ref_states(const struct ref *remote_refs, struct ref_states *states)
 {
 	struct ref *fetch_map = NULL, **tail = &fetch_map;
@@ -355,9 +369,6 @@ static int get_ref_states(const struct ref *remote_refs, struct ref_states *stat
 			die(_("Could not get fetch map for refspec %s"),
 				states->remote->fetch.raw[i]);
 
-	states->new_refs.strdup_strings = 1;
-	states->tracked.strdup_strings = 1;
-	states->stale.strdup_strings = 1;
 	for (ref = fetch_map; ref; ref = ref->next) {
 		if (!ref->peer_ref || !ref_exists(ref->peer_ref->name))
 			string_list_append(&states->new_refs, abbrev_branch(ref->name));
@@ -406,7 +417,6 @@ static int get_push_ref_states(const struct ref *remote_refs,
 
 	match_push_refs(local_refs, &push_map, &remote->push, MATCH_REFS_NONE);
 
-	states->push.strdup_strings = 1;
 	for (ref = push_map; ref; ref = ref->next) {
 		struct string_list_item *item;
 		struct push_info *info;
@@ -449,7 +459,6 @@ static int get_push_ref_states_noquery(struct ref_states *states)
 	if (remote->mirror)
 		return 0;
 
-	states->push.strdup_strings = 1;
 	if (!remote->push.nr) {
 		item = string_list_append(&states->push, _("(matching)"));
 		info = item->util = xcalloc(1, sizeof(struct push_info));
@@ -483,7 +492,6 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat
 	refspec.force = 0;
 	refspec.pattern = 1;
 	refspec.src = refspec.dst = "refs/heads/*";
-	states->heads.strdup_strings = 1;
 	get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
 	matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"),
 				    fetch_map, 1);
@@ -905,7 +913,7 @@ static void clear_push_info(void *util, const char *string)
 {
 	struct push_info *info = util;
 	free(info->dest);
-	free(info);
+	/* note: fixed memleak here */
 }
 
 static void free_remote_ref_states(struct ref_states *states)
@@ -1159,7 +1167,7 @@ static int get_one_entry(struct remote *remote, void *priv)
 		string_list_append(list, remote->name)->util =
 				strbuf_detach(&url_buf, NULL);
 	} else
-		string_list_append(list, remote->name)->util = NULL;
+		string_list_append(list, remote->name);
 	if (remote->pushurl_nr) {
 		url = remote->pushurl;
 		url_nr = remote->pushurl_nr;
@@ -1179,10 +1187,9 @@ static int get_one_entry(struct remote *remote, void *priv)
 
 static int show_all(void)
 {
-	struct string_list list = STRING_LIST_INIT_NODUP;
+	struct string_list list = STRING_LIST_INIT_DUP;
 	int result;
 
-	list.strdup_strings = 1;
 	result = for_each_remote(get_one_entry, &list);
 
 	if (!result) {
@@ -1212,7 +1219,7 @@ static int show(int argc, const char **argv)
 		OPT_BOOL('n', NULL, &no_query, N_("do not query remotes")),
 		OPT_END()
 	};
-	struct ref_states states;
+	struct ref_states states = REF_STATES_INIT;
 	struct string_list info_list = STRING_LIST_INIT_NODUP;
 	struct show_info info;
 
@@ -1334,8 +1341,7 @@ static int set_head(int argc, const char **argv)
 	if (!opt_a && !opt_d && argc == 2) {
 		head_name = xstrdup(argv[1]);
 	} else if (opt_a && !opt_d && argc == 1) {
-		struct ref_states states;
-		memset(&states, 0, sizeof(states));
+		struct ref_states states = REF_STATES_INIT;
 		get_remote_ref_states(argv[0], &states, GET_HEAD_NAMES);
 		if (!states.heads.nr)
 			result |= error(_("Cannot determine remote HEAD"));
@@ -1374,14 +1380,13 @@ static int set_head(int argc, const char **argv)
 static int prune_remote(const char *remote, int dry_run)
 {
 	int result = 0;
-	struct ref_states states;
+	struct ref_states states = REF_STATES_INIT;
 	struct string_list refs_to_prune = STRING_LIST_INIT_NODUP;
 	struct string_list_item *item;
 	const char *dangling_msg = dry_run
 		? _(" %s will become dangling!")
 		: _(" %s has become dangling!");
 
-	memset(&states, 0, sizeof(states));
 	get_remote_ref_states(remote, &states, GET_REF_STATES);
 
 	if (!states.stale.nr) {
diff --git a/merge-ort.c b/merge-ort.c
index ec0c5904211..53ed78e7a01 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -432,16 +432,6 @@ struct conflict_info {
 	assert((ci) && !(mi)->clean);        \
 } while (0)
 
-static void free_strmap_strings(struct strmap *map)
-{
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
-
-	strmap_for_each_entry(map, &iter, entry) {
-		free((char*)entry->key);
-	}
-}
-
 static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 					  int reinitialize)
 {
@@ -455,13 +445,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		reinitialize ? strset_partial_clear : strset_clear;
 
 	/*
-	 * We marked opti->paths with strdup_strings = 0, so that we
-	 * wouldn't have to make another copy of the fullpath created by
-	 * make_traverse_path from setup_path_info().  But, now that we've
-	 * used it and have no other references to these strings, it is time
-	 * to deallocate them.
+	 * We used the the pattern of re-using already allocated
+	 * strings strmap_clear_strings() in make_traverse_path from
+	 * setup_path_info(). Deallocate them.
 	 */
-	free_strmap_strings(&opti->paths);
+	strmap_clear_strings(&opti->paths, 0);
 	strmap_func(&opti->paths, 1);
 
 	/*
@@ -472,15 +460,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	strmap_func(&opti->conflicted, 0);
 
 	/*
-	 * opti->paths_to_free is similar to opti->paths; we created it with
-	 * strdup_strings = 0 to avoid making _another_ copy of the fullpath
-	 * but now that we've used it and have no other references to these
-	 * strings, it is time to deallocate them.  We do so by temporarily
-	 * setting strdup_strings to 1.
+	 * opti->paths_to_free is similar to opti->paths; it's memory
+	 * we borrowed and need to free with string_list_clear_strings().
 	 */
-	opti->paths_to_free.strdup_strings = 1;
-	string_list_clear(&opti->paths_to_free, 0);
-	opti->paths_to_free.strdup_strings = 0;
+	string_list_clear_strings(&opti->paths_to_free, 0);
 
 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
 		discard_index(&opti->attr_index);
@@ -2664,7 +2647,6 @@ static int collect_renames(struct merge_options *opt,
 	 * and have no other references to these strings, it is time to
 	 * deallocate them.
 	 */
-	free_strmap_strings(&collisions);
 	strmap_clear(&collisions, 1);
 	return clean;
 }
diff --git a/string-list.h b/string-list.h
index 0d6b4692396..9eeea996888 100644
--- a/string-list.h
+++ b/string-list.h
@@ -109,6 +109,9 @@ void string_list_init_dup(struct string_list *list);
  */
 void string_list_init(struct string_list *list, int strdup_strings);
 
+void string_list_cmp_init(struct string_list *list, int strdup_strings,
+			  compare_strings_fn cmp);
+
 /** Callback function type for for_each_string_list */
 typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
 
@@ -129,14 +132,66 @@ void filter_string_list(struct string_list *list, int free_util,
  */
 void string_list_clear(struct string_list *list, int free_util);
 
+/**
+ * Free a string list initialized without `strdup_strings = 1`, but
+ * where we also want to free() the strings. You usually want to just
+ * use string_list_clear() after initializing with
+ * `STRING_LIST_INIT_DUP' instead.
+ *
+ * Useful to free e.g. a string list whose strings came from
+ * strbuf_detach() or other memory that we didn't initially allocate
+ * on the heap, but which we now manage.
+ *
+ * Under the hood this is identical in behavior to temporarily setting
+ * `strbuf_strings` to `1` for the duration of this function call, but
+ * without the verbosity of performing that dance yourself.
+ */
+void string_list_clear_strings(struct string_list *list, int free_util);
+
+/**
+ * Clear only the `util` pointer, but not the `string`, even if
+ * `strdup_strings = 1` is set. Useful for the idiom of doing e.g.:
+ *
+ *    string_list_append(&list, str + offs)->util = str;
+ *
+ * Where we add a string at some offset, own the string (so
+ * effectively `strdup_strings = `), but can't free() the string
+ * itself at the changed offset, but need to free the original data in
+ * `util` instead.
+ */
+void string_list_clear_util(struct string_list *list);
+
 /**
  * Callback type for `string_list_clear_func`.  The string associated
  * with the util pointer is passed as the second argument
  */
 typedef void (*string_list_clear_func_t)(void *p, const char *str);
 
-/** Call a custom clear function on each util pointer */
-void string_list_clear_func(struct string_list *list, string_list_clear_func_t clearfunc);
+/**
+ * Like string_list_clear() except that it first calls a custom clear
+ * function on each util pointer.
+ *
+ * We guarantee that the `clearfunc` will be called on all util
+ * pointers in a list before we proceed to free the first string or
+ * util pointer, i.e. should you need to it's OK to peek at other util
+ * items in the list itself, or to otherwise iterate it from within
+ * the `clearfunc`.
+ *
+ * You do not need to free() the passed-in util pointer itself,
+ * i.e. after calling all `clearfunc` this has the seme behavior as
+ * string_list_clear() called with with `free_util = 1`.
+ */
+void string_list_clear_func(struct string_list *list,
+			    string_list_clear_func_t clearfunc);
+
+/**
+ * Like string_list_clear_func() but free the strings too, using the
+ * same dance as described for string_list_clear_strings()
+ * above. You'll usually want to initialize with
+ * `STRING_LIST_INIT_DUP` and use string_list_clear_strings() instead.
+ */
+void string_list_clear_strings_func(struct string_list *list,
+				    string_list_clear_func_t clearfunc);
 
 /**
  * Apply `func` to each item. If `func` returns nonzero, the
diff --git a/strmap.h b/strmap.h
index 1e152d832d6..337f6278e86 100644
--- a/strmap.h
+++ b/strmap.h
@@ -51,12 +51,25 @@ void strmap_init_with_options(struct strmap *map,
  */
 void strmap_clear(struct strmap *map, int free_values);
 
+/**
+ * To strmap_clear() what string_list_clear_strings() is to
+ * string_list_clear(). I.e. free your keys too, which we used as-is
+ * without `strdup_strings = 1`.
+ */
+void strmap_clear_strings(struct strmap *map, int free_values);
+
 /*
  * Similar to strmap_clear() but leaves map->map->table allocated and
  * pre-sized so that subsequent uses won't need as many rehashings.
  */
 void strmap_partial_clear(struct strmap *map, int free_values);
 
+/**
+ * To strmap_partial_clear() what string_list_clear_strings() is to
+ * string_list_clear(). See strmap_clear_strings() above.
+ */
+void strmap_partial_clear_strings(struct strmap *map, int free_values);
+
 /*
  * Insert "str" into the map, pointing to "data".
  *
Elijah Newren July 30, 2021, 2:36 p.m. UTC | #6
Hi Ævar,

On Fri, Jul 30, 2021 at 7:33 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Thu, Jul 29 2021, Jeff King wrote:
>
> > On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:
> >
> >> > Arguably, the existence of these function indirections is perhaps a sign
> >> > that the strmap API should provide a version of the clear functions that
> >> > takes "partial / not-partial" as a parameter.
> >>
> >> Are you suggesting a modification of str{map,intmap,set}_clear() to
> >> take an extra parameter, or removing the
> >> str{map,intmap,set}_partial_clear() functions and introducing new
> >> functions that take a partial/not-partial parameter?  I think you're
> >> suggesting the latter, and that makes more sense to me...but I'm
> >> drawing blanks trying to come up with a reasonable function name.
> >
> > It does seem a shame to add the "partial" parameter to strmap_clear(),
> > just because most callers don't need it (so they end up with this
> > inscrutable "0" parameter).
> >
> > What if there was a flags field? Then it could be combined with the
> > free_values parameter. The result is kind of verbose in two ways:
> >
> >  - now strset_clear(), etc, need a "flags" parameter, which they didn't
> >    before (and is just "0" most of the time!)
> >
> >  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
> >    That's a lot longer, though arguably it's easier to understand since
> >    the boolean is explained.
> >
> > Having gone through the exercise, I am not sure it is actually making
> > anything more readable (messy patch is below for reference).
>
> I've got some WIP patches for string-list.h and strmap.h to make the API
> nicer, and it's probably applicable to strset.h too.

There is no strset.h; strset and strintmap along with strmap are part
of strmap.h.

> I.e. I found when using strset.h that it was a weird API to use, because
> unlike string-list.h it didn't pay attention to your "dup" field when
> freeing, you had to do it explicitly.

Do you mean strmap.h instead of strset.h?

In general, if you are asking strmap/strset/strintmap to dup your keys
and are explicitly freeing the strings, then you are misusing the API
and either freeing pointers that were never allocated or getting
double frees.  It's wrong to explicitly deallocate them because:
  * When using a pool, we just allocate from the pool.  The memory
will be freed when the pool is freed.
  * When not using a pool, we use FLEXPTR_ALLOC_STR in order to make
the string be part of the allocated strmap_entry.  The string's memory
is deallocated when the strmap_entry is.

The only reason to explicitly free keys in a strmap/strset/strintmap
is if you do NOT have strdup_strings set and allocated the strings
elsewhere and left your strmap as the only thing tracking the strings.

> And then in e.g. merge-ort.c there's this "strdup dance" pattern where
> we flip the field back and forth.
>
> The below diff is exctracted from that WIP work, with the relevant two
> API headers and then two changed API users for show (the tree-wide
> changes are much larger).
>
> I think making the promise I make in the updated docs at "We guarantee
> that the `clearfunc`[...]" in string-list.h makes for particularly nice
> API behavior.
>
>  builtin/remote.c | 37 ++++++++++++++++++++---------------
>  merge-ort.c      | 32 +++++++-----------------------
>  string-list.h    | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  strmap.h         | 13 +++++++++++++
>  4 files changed, 98 insertions(+), 43 deletions(-)
>
> diff --git a/builtin/remote.c b/builtin/remote.c
> index 7f88e6ce9de..ec1dbd49f71 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -340,10 +340,24 @@ static void read_branches(void)
>
>  struct ref_states {
>         struct remote *remote;
> -       struct string_list new_refs, stale, tracked, heads, push;
> +
> +       struct string_list new_refs;
> +       struct string_list stale;
> +       struct string_list tracked;
> +       struct string_list heads;
> +       struct string_list push;
> +
>         int queried;
>  };
>
> +#define REF_STATES_INIT { \
> +       .new_refs = STRING_LIST_INIT_DUP, \
> +       .stale = STRING_LIST_INIT_DUP, \
> +       .tracked = STRING_LIST_INIT_DUP, \
> +       .heads = STRING_LIST_INIT_DUP, \
> +       .push = STRING_LIST_INIT_DUP, \
> +}
> +
>  static int get_ref_states(const struct ref *remote_refs, struct ref_states *states)
>  {
>         struct ref *fetch_map = NULL, **tail = &fetch_map;
> @@ -355,9 +369,6 @@ static int get_ref_states(const struct ref *remote_refs, struct ref_states *stat
>                         die(_("Could not get fetch map for refspec %s"),
>                                 states->remote->fetch.raw[i]);
>
> -       states->new_refs.strdup_strings = 1;
> -       states->tracked.strdup_strings = 1;
> -       states->stale.strdup_strings = 1;
>         for (ref = fetch_map; ref; ref = ref->next) {
>                 if (!ref->peer_ref || !ref_exists(ref->peer_ref->name))
>                         string_list_append(&states->new_refs, abbrev_branch(ref->name));
> @@ -406,7 +417,6 @@ static int get_push_ref_states(const struct ref *remote_refs,
>
>         match_push_refs(local_refs, &push_map, &remote->push, MATCH_REFS_NONE);
>
> -       states->push.strdup_strings = 1;
>         for (ref = push_map; ref; ref = ref->next) {
>                 struct string_list_item *item;
>                 struct push_info *info;
> @@ -449,7 +459,6 @@ static int get_push_ref_states_noquery(struct ref_states *states)
>         if (remote->mirror)
>                 return 0;
>
> -       states->push.strdup_strings = 1;
>         if (!remote->push.nr) {
>                 item = string_list_append(&states->push, _("(matching)"));
>                 info = item->util = xcalloc(1, sizeof(struct push_info));
> @@ -483,7 +492,6 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat
>         refspec.force = 0;
>         refspec.pattern = 1;
>         refspec.src = refspec.dst = "refs/heads/*";
> -       states->heads.strdup_strings = 1;
>         get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
>         matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"),
>                                     fetch_map, 1);
> @@ -905,7 +913,7 @@ static void clear_push_info(void *util, const char *string)
>  {
>         struct push_info *info = util;
>         free(info->dest);
> -       free(info);
> +       /* note: fixed memleak here */
>  }
>
>  static void free_remote_ref_states(struct ref_states *states)
> @@ -1159,7 +1167,7 @@ static int get_one_entry(struct remote *remote, void *priv)
>                 string_list_append(list, remote->name)->util =
>                                 strbuf_detach(&url_buf, NULL);
>         } else
> -               string_list_append(list, remote->name)->util = NULL;
> +               string_list_append(list, remote->name);
>         if (remote->pushurl_nr) {
>                 url = remote->pushurl;
>                 url_nr = remote->pushurl_nr;
> @@ -1179,10 +1187,9 @@ static int get_one_entry(struct remote *remote, void *priv)
>
>  static int show_all(void)
>  {
> -       struct string_list list = STRING_LIST_INIT_NODUP;
> +       struct string_list list = STRING_LIST_INIT_DUP;
>         int result;
>
> -       list.strdup_strings = 1;
>         result = for_each_remote(get_one_entry, &list);
>
>         if (!result) {
> @@ -1212,7 +1219,7 @@ static int show(int argc, const char **argv)
>                 OPT_BOOL('n', NULL, &no_query, N_("do not query remotes")),
>                 OPT_END()
>         };
> -       struct ref_states states;
> +       struct ref_states states = REF_STATES_INIT;
>         struct string_list info_list = STRING_LIST_INIT_NODUP;
>         struct show_info info;
>
> @@ -1334,8 +1341,7 @@ static int set_head(int argc, const char **argv)
>         if (!opt_a && !opt_d && argc == 2) {
>                 head_name = xstrdup(argv[1]);
>         } else if (opt_a && !opt_d && argc == 1) {
> -               struct ref_states states;
> -               memset(&states, 0, sizeof(states));
> +               struct ref_states states = REF_STATES_INIT;
>                 get_remote_ref_states(argv[0], &states, GET_HEAD_NAMES);
>                 if (!states.heads.nr)
>                         result |= error(_("Cannot determine remote HEAD"));
> @@ -1374,14 +1380,13 @@ static int set_head(int argc, const char **argv)
>  static int prune_remote(const char *remote, int dry_run)
>  {
>         int result = 0;
> -       struct ref_states states;
> +       struct ref_states states = REF_STATES_INIT;
>         struct string_list refs_to_prune = STRING_LIST_INIT_NODUP;
>         struct string_list_item *item;
>         const char *dangling_msg = dry_run
>                 ? _(" %s will become dangling!")
>                 : _(" %s has become dangling!");
>
> -       memset(&states, 0, sizeof(states));
>         get_remote_ref_states(remote, &states, GET_REF_STATES);
>
>         if (!states.stale.nr) {

Everything up to here looks like a very nice cleanup.

> diff --git a/merge-ort.c b/merge-ort.c
> index ec0c5904211..53ed78e7a01 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -432,16 +432,6 @@ struct conflict_info {
>         assert((ci) && !(mi)->clean);        \
>  } while (0)
>
> -static void free_strmap_strings(struct strmap *map)
> -{
> -       struct hashmap_iter iter;
> -       struct strmap_entry *entry;
> -
> -       strmap_for_each_entry(map, &iter, entry) {
> -               free((char*)entry->key);
> -       }
> -}
> -
>  static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>                                           int reinitialize)
>  {
> @@ -455,13 +445,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>                 reinitialize ? strset_partial_clear : strset_clear;
>
>         /*
> -        * We marked opti->paths with strdup_strings = 0, so that we
> -        * wouldn't have to make another copy of the fullpath created by
> -        * make_traverse_path from setup_path_info().  But, now that we've
> -        * used it and have no other references to these strings, it is time
> -        * to deallocate them.
> +        * We used the the pattern of re-using already allocated
> +        * strings strmap_clear_strings() in make_traverse_path from
> +        * setup_path_info(). Deallocate them.
>          */
> -       free_strmap_strings(&opti->paths);
> +       strmap_clear_strings(&opti->paths, 0);
>         strmap_func(&opti->paths, 1);
>
>         /*

It's not clear to me that strmap should handle the freeing of the keys
at all; maybe it should and strmap_clear_strings() makes sense to
introduce.  However, this change is clearly wrong regardless, for two
reasons: (1) You are double clearing since strmap_func() is also
called afterwards, and (2) you are also ignoring the potential partial
bit since strmap_func might be strmap_partial_clear() rather than
strmap_clear().

> @@ -472,15 +460,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>         strmap_func(&opti->conflicted, 0);
>
>         /*
> -        * opti->paths_to_free is similar to opti->paths; we created it with
> -        * strdup_strings = 0 to avoid making _another_ copy of the fullpath
> -        * but now that we've used it and have no other references to these
> -        * strings, it is time to deallocate them.  We do so by temporarily
> -        * setting strdup_strings to 1.
> +        * opti->paths_to_free is similar to opti->paths; it's memory
> +        * we borrowed and need to free with string_list_clear_strings().
>          */
> -       opti->paths_to_free.strdup_strings = 1;
> -       string_list_clear(&opti->paths_to_free, 0);
> -       opti->paths_to_free.strdup_strings = 0;
> +       string_list_clear_strings(&opti->paths_to_free, 0);

This is very nice.  I really like this new function and API.

>         if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
>                 discard_index(&opti->attr_index);
> @@ -2664,7 +2647,6 @@ static int collect_renames(struct merge_options *opt,
>          * and have no other references to these strings, it is time to
>          * deallocate them.
>          */
> -       free_strmap_strings(&collisions);
>         strmap_clear(&collisions, 1);
>         return clean;
>  }

This hunk is wrong.

> diff --git a/string-list.h b/string-list.h
> index 0d6b4692396..9eeea996888 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -109,6 +109,9 @@ void string_list_init_dup(struct string_list *list);
>   */
>  void string_list_init(struct string_list *list, int strdup_strings);
>
> +void string_list_cmp_init(struct string_list *list, int strdup_strings,
> +                         compare_strings_fn cmp);
> +

Seems unrelated to what you were trying to highlight?

>  /** Callback function type for for_each_string_list */
>  typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
>
> @@ -129,14 +132,66 @@ void filter_string_list(struct string_list *list, int free_util,
>   */
>  void string_list_clear(struct string_list *list, int free_util);
>
> +/**
> + * Free a string list initialized without `strdup_strings = 1`, but
> + * where we also want to free() the strings. You usually want to just
> + * use string_list_clear() after initializing with
> + * `STRING_LIST_INIT_DUP' instead.
> + *
> + * Useful to free e.g. a string list whose strings came from
> + * strbuf_detach() or other memory that we didn't initially allocate
> + * on the heap, but which we now manage.
> + *
> + * Under the hood this is identical in behavior to temporarily setting
> + * `strbuf_strings` to `1` for the duration of this function call, but
> + * without the verbosity of performing that dance yourself.
> + */
> +void string_list_clear_strings(struct string_list *list, int free_util);
> +
> +/**
> + * Clear only the `util` pointer, but not the `string`, even if
> + * `strdup_strings = 1` is set. Useful for the idiom of doing e.g.:
> + *
> + *    string_list_append(&list, str + offs)->util = str;
> + *
> + * Where we add a string at some offset, own the string (so
> + * effectively `strdup_strings = `), but can't free() the string
> + * itself at the changed offset, but need to free the original data in
> + * `util` instead.
> + */
> +void string_list_clear_util(struct string_list *list);
> +
>  /**
>   * Callback type for `string_list_clear_func`.  The string associated
>   * with the util pointer is passed as the second argument
>   */
>  typedef void (*string_list_clear_func_t)(void *p, const char *str);
>
> -/** Call a custom clear function on each util pointer */
> -void string_list_clear_func(struct string_list *list, string_list_clear_func_t clearfunc);
> +/**
> + * Like string_list_clear() except that it first calls a custom clear
> + * function on each util pointer.
> + *
> + * We guarantee that the `clearfunc` will be called on all util
> + * pointers in a list before we proceed to free the first string or
> + * util pointer, i.e. should you need to it's OK to peek at other util
> + * items in the list itself, or to otherwise iterate it from within
> + * the `clearfunc`.
> + *
> + * You do not need to free() the passed-in util pointer itself,
> + * i.e. after calling all `clearfunc` this has the seme behavior as
> + * string_list_clear() called with with `free_util = 1`.
> + */
> +void string_list_clear_func(struct string_list *list,
> +                           string_list_clear_func_t clearfunc);
> +
> +/**
> + * Like string_list_clear_func() but free the strings too, using the
> + * same dance as described for string_list_clear_strings()
> + * above. You'll usually want to initialize with
> + * `STRING_LIST_INIT_DUP` and use string_list_clear_strings() instead.
> + */
> +void string_list_clear_strings_func(struct string_list *list,
> +                                   string_list_clear_func_t clearfunc);
>
>  /**
>   * Apply `func` to each item. If `func` returns nonzero, the

string_list_clear_strings() looks very nice.  The others are probably
good too, though I'm curious about the need for double walking the
list to free it instead of doing it in a single walk; what callers
need to walk the list and check out other values?

> diff --git a/strmap.h b/strmap.h
> index 1e152d832d6..337f6278e86 100644
> --- a/strmap.h
> +++ b/strmap.h
> @@ -51,12 +51,25 @@ void strmap_init_with_options(struct strmap *map,
>   */
>  void strmap_clear(struct strmap *map, int free_values);
>
> +/**
> + * To strmap_clear() what string_list_clear_strings() is to
> + * string_list_clear(). I.e. free your keys too, which we used as-is
> + * without `strdup_strings = 1`.
> + */
> +void strmap_clear_strings(struct strmap *map, int free_values);

strmap.h doesn't depend on string-list.h, so the comment should be
self-standing.  The analogy also doesn't seem to hold since we do NOT
need to free the keys when strdup_strings is 1; Peff suggested
FLEXPTR_ALLOC_STR specifically to avoid that extra allocation in that
case.

> +
>  /*
>   * Similar to strmap_clear() but leaves map->map->table allocated and
>   * pre-sized so that subsequent uses won't need as many rehashings.
>   */
>  void strmap_partial_clear(struct strmap *map, int free_values);
>
> +/**
> + * To strmap_partial_clear() what string_list_clear_strings() is to
> + * string_list_clear(). See strmap_clear_strings() above.
> + */
> +void strmap_partial_clear_strings(struct strmap *map, int free_values);
> +

Same comment as above for strmap_clear_strings() applies here.
Jeff King July 30, 2021, 4:12 p.m. UTC | #7
On Thu, Jul 29, 2021 at 08:30:23PM -0600, Elijah Newren wrote:

> > What if there was a flags field? Then it could be combined with the
> > free_values parameter. The result is kind of verbose in two ways:
> >
> >  - now strset_clear(), etc, need a "flags" parameter, which they didn't
> >    before (and is just "0" most of the time!)
> >
> >  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
> >    That's a lot longer, though arguably it's easier to understand since
> >    the boolean is explained.
> >
> > Having gone through the exercise, I am not sure it is actually making
> > anything more readable (messy patch is below for reference).
> 
> Thanks for diving in.  Since it's not clear if it's helping, I'll just
> take your earlier suggestion to rename the "strmap_func" variable to
> "strmap_clear_func" instead.

That sounds just fine with me. Thanks for considering my tangent. :)

-Peff
Ævar Arnfjörð Bjarmason July 30, 2021, 4:23 p.m. UTC | #8
On Fri, Jul 30 2021, Elijah Newren wrote:

> Hi Ævar,
>
> On Fri, Jul 30, 2021 at 7:33 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Thu, Jul 29 2021, Jeff King wrote:
>>
>> > On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote:
>> >
>> >> > Arguably, the existence of these function indirections is perhaps a sign
>> >> > that the strmap API should provide a version of the clear functions that
>> >> > takes "partial / not-partial" as a parameter.
>> >>
>> >> Are you suggesting a modification of str{map,intmap,set}_clear() to
>> >> take an extra parameter, or removing the
>> >> str{map,intmap,set}_partial_clear() functions and introducing new
>> >> functions that take a partial/not-partial parameter?  I think you're
>> >> suggesting the latter, and that makes more sense to me...but I'm
>> >> drawing blanks trying to come up with a reasonable function name.
>> >
>> > It does seem a shame to add the "partial" parameter to strmap_clear(),
>> > just because most callers don't need it (so they end up with this
>> > inscrutable "0" parameter).
>> >
>> > What if there was a flags field? Then it could be combined with the
>> > free_values parameter. The result is kind of verbose in two ways:
>> >
>> >  - now strset_clear(), etc, need a "flags" parameter, which they didn't
>> >    before (and is just "0" most of the time!)
>> >
>> >  - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)".
>> >    That's a lot longer, though arguably it's easier to understand since
>> >    the boolean is explained.
>> >
>> > Having gone through the exercise, I am not sure it is actually making
>> > anything more readable (messy patch is below for reference).
>>
>> I've got some WIP patches for string-list.h and strmap.h to make the API
>> nicer, and it's probably applicable to strset.h too.
>
> There is no strset.h; strset and strintmap along with strmap are part
> of strmap.h.
>
>> I.e. I found when using strset.h that it was a weird API to use, because
>> unlike string-list.h it didn't pay attention to your "dup" field when
>> freeing, you had to do it explicitly.
>
> Do you mean strmap.h instead of strset.h?

Yes, sorry. Brainfart.

> In general, if you are asking strmap/strset/strintmap to dup your keys
> and are explicitly freeing the strings, then you are misusing the API
> and either freeing pointers that were never allocated or getting
> double frees.  It's wrong to explicitly deallocate them because:
>   * When using a pool, we just allocate from the pool.  The memory
> will be freed when the pool is freed.
>   * When not using a pool, we use FLEXPTR_ALLOC_STR in order to make
> the string be part of the allocated strmap_entry.  The string's memory
> is deallocated when the strmap_entry is.
>
> The only reason to explicitly free keys in a strmap/strset/strintmap
> is if you do NOT have strdup_strings set and allocated the strings
> elsewhere and left your strmap as the only thing tracking the strings.

Yes, sorry. I think I was trying to address that case, i.e. it started
with fixing some memory leaks, but it's part of a branch of mine that's
in a messy WIP state of not passing the tests. Please ignore the rest of
the hunks to do with strmap.h.

I might have some worthwhile fixes there, maybe not. It mainly started
with carrying things over from similar changes in string-list.h.

>> And then in e.g. merge-ort.c there's this "strdup dance" pattern where
>> we flip the field back and forth.
>>
>> The below diff is exctracted from that WIP work, with the relevant two
>> API headers and then two changed API users for show (the tree-wide
>> changes are much larger).
>>
>> I think making the promise I make in the updated docs at "We guarantee
>> that the `clearfunc`[...]" in string-list.h makes for particularly nice
>> API behavior.
>>
>>  builtin/remote.c | 37 ++++++++++++++++++++---------------
>>  merge-ort.c      | 32 +++++++-----------------------
>>  string-list.h    | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  strmap.h         | 13 +++++++++++++
>>  4 files changed, 98 insertions(+), 43 deletions(-)
>>
>> diff --git a/builtin/remote.c b/builtin/remote.c
>> index 7f88e6ce9de..ec1dbd49f71 100644
>> --- a/builtin/remote.c
>> +++ b/builtin/remote.c
>> @@ -340,10 +340,24 @@ static void read_branches(void)
>>
>>  struct ref_states {
>>         struct remote *remote;
>> -       struct string_list new_refs, stale, tracked, heads, push;
>> +
>> +       struct string_list new_refs;
>> +       struct string_list stale;
>> +       struct string_list tracked;
>> +       struct string_list heads;
>> +       struct string_list push;
>> +
>>         int queried;
>>  };
>>
>> +#define REF_STATES_INIT { \
>> +       .new_refs = STRING_LIST_INIT_DUP, \
>> +       .stale = STRING_LIST_INIT_DUP, \
>> +       .tracked = STRING_LIST_INIT_DUP, \
>> +       .heads = STRING_LIST_INIT_DUP, \
>> +       .push = STRING_LIST_INIT_DUP, \
>> +}
>> +
>>  static int get_ref_states(const struct ref *remote_refs, struct ref_states *states)
>>  {
>>         struct ref *fetch_map = NULL, **tail = &fetch_map;
>> @@ -355,9 +369,6 @@ static int get_ref_states(const struct ref *remote_refs, struct ref_states *stat
>>                         die(_("Could not get fetch map for refspec %s"),
>>                                 states->remote->fetch.raw[i]);
>>
>> -       states->new_refs.strdup_strings = 1;
>> -       states->tracked.strdup_strings = 1;
>> -       states->stale.strdup_strings = 1;
>>         for (ref = fetch_map; ref; ref = ref->next) {
>>                 if (!ref->peer_ref || !ref_exists(ref->peer_ref->name))
>>                         string_list_append(&states->new_refs, abbrev_branch(ref->name));
>> @@ -406,7 +417,6 @@ static int get_push_ref_states(const struct ref *remote_refs,
>>
>>         match_push_refs(local_refs, &push_map, &remote->push, MATCH_REFS_NONE);
>>
>> -       states->push.strdup_strings = 1;
>>         for (ref = push_map; ref; ref = ref->next) {
>>                 struct string_list_item *item;
>>                 struct push_info *info;
>> @@ -449,7 +459,6 @@ static int get_push_ref_states_noquery(struct ref_states *states)
>>         if (remote->mirror)
>>                 return 0;
>>
>> -       states->push.strdup_strings = 1;
>>         if (!remote->push.nr) {
>>                 item = string_list_append(&states->push, _("(matching)"));
>>                 info = item->util = xcalloc(1, sizeof(struct push_info));
>> @@ -483,7 +492,6 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat
>>         refspec.force = 0;
>>         refspec.pattern = 1;
>>         refspec.src = refspec.dst = "refs/heads/*";
>> -       states->heads.strdup_strings = 1;
>>         get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
>>         matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"),
>>                                     fetch_map, 1);
>> @@ -905,7 +913,7 @@ static void clear_push_info(void *util, const char *string)
>>  {
>>         struct push_info *info = util;
>>         free(info->dest);
>> -       free(info);
>> +       /* note: fixed memleak here */
>>  }
>>
>>  static void free_remote_ref_states(struct ref_states *states)
>> @@ -1159,7 +1167,7 @@ static int get_one_entry(struct remote *remote, void *priv)
>>                 string_list_append(list, remote->name)->util =
>>                                 strbuf_detach(&url_buf, NULL);
>>         } else
>> -               string_list_append(list, remote->name)->util = NULL;
>> +               string_list_append(list, remote->name);
>>         if (remote->pushurl_nr) {
>>                 url = remote->pushurl;
>>                 url_nr = remote->pushurl_nr;
>> @@ -1179,10 +1187,9 @@ static int get_one_entry(struct remote *remote, void *priv)
>>
>>  static int show_all(void)
>>  {
>> -       struct string_list list = STRING_LIST_INIT_NODUP;
>> +       struct string_list list = STRING_LIST_INIT_DUP;
>>         int result;
>>
>> -       list.strdup_strings = 1;
>>         result = for_each_remote(get_one_entry, &list);
>>
>>         if (!result) {
>> @@ -1212,7 +1219,7 @@ static int show(int argc, const char **argv)
>>                 OPT_BOOL('n', NULL, &no_query, N_("do not query remotes")),
>>                 OPT_END()
>>         };
>> -       struct ref_states states;
>> +       struct ref_states states = REF_STATES_INIT;
>>         struct string_list info_list = STRING_LIST_INIT_NODUP;
>>         struct show_info info;
>>
>> @@ -1334,8 +1341,7 @@ static int set_head(int argc, const char **argv)
>>         if (!opt_a && !opt_d && argc == 2) {
>>                 head_name = xstrdup(argv[1]);
>>         } else if (opt_a && !opt_d && argc == 1) {
>> -               struct ref_states states;
>> -               memset(&states, 0, sizeof(states));
>> +               struct ref_states states = REF_STATES_INIT;
>>                 get_remote_ref_states(argv[0], &states, GET_HEAD_NAMES);
>>                 if (!states.heads.nr)
>>                         result |= error(_("Cannot determine remote HEAD"));
>> @@ -1374,14 +1380,13 @@ static int set_head(int argc, const char **argv)
>>  static int prune_remote(const char *remote, int dry_run)
>>  {
>>         int result = 0;
>> -       struct ref_states states;
>> +       struct ref_states states = REF_STATES_INIT;
>>         struct string_list refs_to_prune = STRING_LIST_INIT_NODUP;
>>         struct string_list_item *item;
>>         const char *dangling_msg = dry_run
>>                 ? _(" %s will become dangling!")
>>                 : _(" %s has become dangling!");
>>
>> -       memset(&states, 0, sizeof(states));
>>         get_remote_ref_states(remote, &states, GET_REF_STATES);
>>
>>         if (!states.stale.nr) {
>
> Everything up to here looks like a very nice cleanup.
>
>> diff --git a/merge-ort.c b/merge-ort.c
>> index ec0c5904211..53ed78e7a01 100644
>> --- a/merge-ort.c
>> +++ b/merge-ort.c
>> @@ -432,16 +432,6 @@ struct conflict_info {
>>         assert((ci) && !(mi)->clean);        \
>>  } while (0)
>>
>> -static void free_strmap_strings(struct strmap *map)
>> -{
>> -       struct hashmap_iter iter;
>> -       struct strmap_entry *entry;
>> -
>> -       strmap_for_each_entry(map, &iter, entry) {
>> -               free((char*)entry->key);
>> -       }
>> -}
>> -
>>  static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>>                                           int reinitialize)
>>  {
>> @@ -455,13 +445,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>>                 reinitialize ? strset_partial_clear : strset_clear;
>>
>>         /*
>> -        * We marked opti->paths with strdup_strings = 0, so that we
>> -        * wouldn't have to make another copy of the fullpath created by
>> -        * make_traverse_path from setup_path_info().  But, now that we've
>> -        * used it and have no other references to these strings, it is time
>> -        * to deallocate them.
>> +        * We used the the pattern of re-using already allocated
>> +        * strings strmap_clear_strings() in make_traverse_path from
>> +        * setup_path_info(). Deallocate them.
>>          */
>> -       free_strmap_strings(&opti->paths);
>> +       strmap_clear_strings(&opti->paths, 0);
>>         strmap_func(&opti->paths, 1);
>>
>>         /*
>
> It's not clear to me that strmap should handle the freeing of the keys
> at all; maybe it should and strmap_clear_strings() makes sense to
> introduce.  However, this change is clearly wrong regardless, for two
> reasons: (1) You are double clearing since strmap_func() is also
> called afterwards, and (2) you are also ignoring the potential partial
> bit since strmap_func might be strmap_partial_clear() rather than
> strmap_clear().

*Nod*, see above.

>> @@ -472,15 +460,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>>         strmap_func(&opti->conflicted, 0);
>>
>>         /*
>> -        * opti->paths_to_free is similar to opti->paths; we created it with
>> -        * strdup_strings = 0 to avoid making _another_ copy of the fullpath
>> -        * but now that we've used it and have no other references to these
>> -        * strings, it is time to deallocate them.  We do so by temporarily
>> -        * setting strdup_strings to 1.
>> +        * opti->paths_to_free is similar to opti->paths; it's memory
>> +        * we borrowed and need to free with string_list_clear_strings().
>>          */
>> -       opti->paths_to_free.strdup_strings = 1;
>> -       string_list_clear(&opti->paths_to_free, 0);
>> -       opti->paths_to_free.strdup_strings = 0;
>> +       string_list_clear_strings(&opti->paths_to_free, 0);
>
> This is very nice.  I really like this new function and API.
>
>>         if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
>>                 discard_index(&opti->attr_index);
>> @@ -2664,7 +2647,6 @@ static int collect_renames(struct merge_options *opt,
>>          * and have no other references to these strings, it is time to
>>          * deallocate them.
>>          */
>> -       free_strmap_strings(&collisions);
>>         strmap_clear(&collisions, 1);
>>         return clean;
>>  }
>
> This hunk is wrong.

*nod*

>> diff --git a/string-list.h b/string-list.h
>> index 0d6b4692396..9eeea996888 100644
>> --- a/string-list.h
>> +++ b/string-list.h
>> @@ -109,6 +109,9 @@ void string_list_init_dup(struct string_list *list);
>>   */
>>  void string_list_init(struct string_list *list, int strdup_strings);
>>
>> +void string_list_cmp_init(struct string_list *list, int strdup_strings,
>> +                         compare_strings_fn cmp);
>> +
>
> Seems unrelated to what you were trying to highlight?

Yes, sorry. I just extracted all the diff WIP diff I had for
string-list.h, this bit was unrelated.

It's unrelated cleanup of various things that hardcoded all the fields
during their "init", just because they need strcasecmp or whatever
instead of strcmp.

>>  /** Callback function type for for_each_string_list */
>>  typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
>>
>> @@ -129,14 +132,66 @@ void filter_string_list(struct string_list *list, int free_util,
>>   */
>>  void string_list_clear(struct string_list *list, int free_util);
>>
>> +/**
>> + * Free a string list initialized without `strdup_strings = 1`, but
>> + * where we also want to free() the strings. You usually want to just
>> + * use string_list_clear() after initializing with
>> + * `STRING_LIST_INIT_DUP' instead.
>> + *
>> + * Useful to free e.g. a string list whose strings came from
>> + * strbuf_detach() or other memory that we didn't initially allocate
>> + * on the heap, but which we now manage.
>> + *
>> + * Under the hood this is identical in behavior to temporarily setting
>> + * `strbuf_strings` to `1` for the duration of this function call, but
>> + * without the verbosity of performing that dance yourself.
>> + */
>> +void string_list_clear_strings(struct string_list *list, int free_util);
>> +
>> +/**
>> + * Clear only the `util` pointer, but not the `string`, even if
>> + * `strdup_strings = 1` is set. Useful for the idiom of doing e.g.:
>> + *
>> + *    string_list_append(&list, str + offs)->util = str;
>> + *
>> + * Where we add a string at some offset, own the string (so
>> + * effectively `strdup_strings = `), but can't free() the string
>> + * itself at the changed offset, but need to free the original data in
>> + * `util` instead.
>> + */
>> +void string_list_clear_util(struct string_list *list);
>> +
>>  /**
>>   * Callback type for `string_list_clear_func`.  The string associated
>>   * with the util pointer is passed as the second argument
>>   */
>>  typedef void (*string_list_clear_func_t)(void *p, const char *str);
>>
>> -/** Call a custom clear function on each util pointer */
>> -void string_list_clear_func(struct string_list *list, string_list_clear_func_t clearfunc);
>> +/**
>> + * Like string_list_clear() except that it first calls a custom clear
>> + * function on each util pointer.
>> + *
>> + * We guarantee that the `clearfunc` will be called on all util
>> + * pointers in a list before we proceed to free the first string or
>> + * util pointer, i.e. should you need to it's OK to peek at other util
>> + * items in the list itself, or to otherwise iterate it from within
>> + * the `clearfunc`.
>> + *
>> + * You do not need to free() the passed-in util pointer itself,
>> + * i.e. after calling all `clearfunc` this has the seme behavior as
>> + * string_list_clear() called with with `free_util = 1`.
>> + */
>> +void string_list_clear_func(struct string_list *list,
>> +                           string_list_clear_func_t clearfunc);
>> +
>> +/**
>> + * Like string_list_clear_func() but free the strings too, using the
>> + * same dance as described for string_list_clear_strings()
>> + * above. You'll usually want to initialize with
>> + * `STRING_LIST_INIT_DUP` and use string_list_clear_strings() instead.
>> + */
>> +void string_list_clear_strings_func(struct string_list *list,
>> +                                   string_list_clear_func_t clearfunc);
>>
>>  /**
>>   * Apply `func` to each item. If `func` returns nonzero, the
>
> string_list_clear_strings() looks very nice.  The others are probably
> good too, though I'm curious about the need for double walking the
> list to free it instead of doing it in a single walk; what callers
> need to walk the list and check out other values?

I think there were a couple of users that needed that, but maybe I'm
wrong. I think even if there's not clearly defining the callback freeing
semantics makes sense for caller sanity.

We double-walk now in string_list_clear(), this is mostly documenting
and extending current behavior to being able to clear any arbitrary
combination of string/util with an optional cb, regardless of your
strdup_strings state.

>> diff --git a/strmap.h b/strmap.h
>> index 1e152d832d6..337f6278e86 100644
>> --- a/strmap.h
>> +++ b/strmap.h
>> @@ -51,12 +51,25 @@ void strmap_init_with_options(struct strmap *map,
>>   */
>>  void strmap_clear(struct strmap *map, int free_values);
>>
>> +/**
>> + * To strmap_clear() what string_list_clear_strings() is to
>> + * string_list_clear(). I.e. free your keys too, which we used as-is
>> + * without `strdup_strings = 1`.
>> + */
>> +void strmap_clear_strings(struct strmap *map, int free_values);
>
> strmap.h doesn't depend on string-list.h, so the comment should be
> self-standing.  The analogy also doesn't seem to hold since we do NOT
> need to free the keys when strdup_strings is 1; Peff suggested
> FLEXPTR_ALLOC_STR specifically to avoid that extra allocation in that
> case.

*nod*, see above (i.e. maybe all this strmap.h stuff is wrong). FWIW
that comment was meant as a "if you're familiar with X in API A, this Y
in B works similarly".

>> +
>>  /*
>>   * Similar to strmap_clear() but leaves map->map->table allocated and
>>   * pre-sized so that subsequent uses won't need as many rehashings.
>>   */
>>  void strmap_partial_clear(struct strmap *map, int free_values);
>>
>> +/**
>> + * To strmap_partial_clear() what string_list_clear_strings() is to
>> + * string_list_clear(). See strmap_clear_strings() above.
>> + */
>> +void strmap_partial_clear_strings(struct strmap *map, int free_values);
>> +
>
> Same comment as above for strmap_clear_strings() applies here.

*nod*
diff mbox series

Patch

diff --git a/merge-ort.c b/merge-ort.c
index 2bca4b71f2a..5fd2a4ccd35 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -539,15 +539,19 @@  static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	void (*strset_func)(struct strset *) =
 		reinitialize ? strset_partial_clear : strset_clear;
 
-	/*
-	 * We marked opti->paths with strdup_strings = 0, so that we
-	 * wouldn't have to make another copy of the fullpath created by
-	 * make_traverse_path from setup_path_info().  But, now that we've
-	 * used it and have no other references to these strings, it is time
-	 * to deallocate them.
-	 */
-	free_strmap_strings(&opti->paths);
-	strmap_func(&opti->paths, 1);
+	if (opti->pool)
+		strmap_func(&opti->paths, 0);
+	else {
+		/*
+		 * We marked opti->paths with strdup_strings = 0, so that
+		 * we wouldn't have to make another copy of the fullpath
+		 * created by make_traverse_path from setup_path_info().
+		 * But, now that we've used it and have no other references
+		 * to these strings, it is time to deallocate them.
+		 */
+		free_strmap_strings(&opti->paths);
+		strmap_func(&opti->paths, 1);
+	}
 
 	/*
 	 * All keys and values in opti->conflicted are a subset of those in
@@ -556,16 +560,19 @@  static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	 */
 	strmap_func(&opti->conflicted, 0);
 
-	/*
-	 * opti->paths_to_free is similar to opti->paths; we created it with
-	 * strdup_strings = 0 to avoid making _another_ copy of the fullpath
-	 * but now that we've used it and have no other references to these
-	 * strings, it is time to deallocate them.  We do so by temporarily
-	 * setting strdup_strings to 1.
-	 */
-	opti->paths_to_free.strdup_strings = 1;
-	string_list_clear(&opti->paths_to_free, 0);
-	opti->paths_to_free.strdup_strings = 0;
+	if (!opti->pool) {
+		/*
+		 * opti->paths_to_free is similar to opti->paths; we
+		 * created it with strdup_strings = 0 to avoid making
+		 * _another_ copy of the fullpath but now that we've used
+		 * it and have no other references to these strings, it is
+		 * time to deallocate them.  We do so by temporarily
+		 * setting strdup_strings to 1.
+		 */
+		opti->paths_to_free.strdup_strings = 1;
+		string_list_clear(&opti->paths_to_free, 0);
+		opti->paths_to_free.strdup_strings = 0;
+	}
 
 	if (opti->attr_index.cache_nr) /* true iff opt->renormalize */
 		discard_index(&opti->attr_index);
@@ -683,7 +690,6 @@  static void path_msg(struct merge_options *opt,
 	strbuf_addch(sb, '\n');
 }
 
-MAYBE_UNUSED
 static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	if (!pool)
@@ -691,7 +697,6 @@  static void *pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 	return mem_pool_calloc(pool, count, size);
 }
 
-MAYBE_UNUSED
 static void *pool_alloc(struct mem_pool *pool, size_t size)
 {
 	if (!pool)
@@ -699,7 +704,6 @@  static void *pool_alloc(struct mem_pool *pool, size_t size)
 	return mem_pool_alloc(pool, size);
 }
 
-MAYBE_UNUSED
 static void *pool_strndup(struct mem_pool *pool, const char *str, size_t len)
 {
 	if (!pool)
@@ -835,8 +839,9 @@  static void setup_path_info(struct merge_options *opt,
 	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
 	assert(resolved == (merged_version != NULL));
 
-	mi = xcalloc(1, resolved ? sizeof(struct merged_info) :
-				   sizeof(struct conflict_info));
+	mi = pool_calloc(opt->priv->pool, 1,
+			 resolved ? sizeof(struct merged_info) :
+				    sizeof(struct conflict_info));
 	mi->directory_name = current_dir_name;
 	mi->basename_offset = current_dir_name_len;
 	mi->clean = !!resolved;
@@ -1128,7 +1133,7 @@  static int collect_merge_info_callback(int n,
 	len = traverse_path_len(info, p->pathlen);
 
 	/* +1 in both of the following lines to include the NUL byte */
-	fullpath = xmalloc(len + 1);
+	fullpath = pool_alloc(opt->priv->pool, len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
@@ -1383,7 +1388,7 @@  static int handle_deferred_entries(struct merge_options *opt,
 		copy = renames->deferred[side].possible_trivial_merges;
 		strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges,
 					    0,
-					    NULL,
+					    opt->priv->pool,
 					    0);
 		strintmap_for_each_entry(&copy, &iter, entry) {
 			const char *path = entry->key;
@@ -2335,12 +2340,21 @@  static void apply_directory_rename_modifications(struct merge_options *opt,
 	VERIFY_CI(ci);
 
 	/* Find parent directories missing from opt->priv->paths */
-	cur_path = new_path;
+	if (opt->priv->pool) {
+		cur_path = mem_pool_strdup(opt->priv->pool, new_path);
+		free((char*)new_path);
+		new_path = (char *)cur_path;
+	} else {
+		cur_path = new_path;
+	}
+
 	while (1) {
 		/* Find the parent directory of cur_path */
 		char *last_slash = strrchr(cur_path, '/');
 		if (last_slash) {
-			parent_name = xstrndup(cur_path, last_slash - cur_path);
+			parent_name = pool_strndup(opt->priv->pool,
+						   cur_path,
+						   last_slash - cur_path);
 		} else {
 			parent_name = opt->priv->toplevel_dir;
 			break;
@@ -2349,7 +2363,8 @@  static void apply_directory_rename_modifications(struct merge_options *opt,
 		/* Look it up in opt->priv->paths */
 		entry = strmap_get_entry(&opt->priv->paths, parent_name);
 		if (entry) {
-			free((char*)parent_name);
+			if (!opt->priv->pool)
+				free((char*)parent_name);
 			parent_name = entry->key; /* reuse known pointer */
 			break;
 		}
@@ -2376,12 +2391,15 @@  static void apply_directory_rename_modifications(struct merge_options *opt,
 		parent_name = cur_dir;
 	}
 
-	/*
-	 * We are removing old_path from opt->priv->paths.  old_path also will
-	 * eventually need to be freed, but it may still be used by e.g.
-	 * ci->pathnames.  So, store it in another string-list for now.
-	 */
-	string_list_append(&opt->priv->paths_to_free, old_path);
+	if (!opt->priv->pool) {
+		/*
+		 * We are removing old_path from opt->priv->paths.
+		 * old_path also will eventually need to be freed, but it
+		 * may still be used by e.g.  ci->pathnames.  So, store it
+		 * in another string-list for now.
+		 */
+		string_list_append(&opt->priv->paths_to_free, old_path);
+	}
 
 	assert(ci->filemask == 2 || ci->filemask == 4);
 	assert(ci->dirmask == 0);
@@ -2416,7 +2434,8 @@  static void apply_directory_rename_modifications(struct merge_options *opt,
 		new_ci->stages[index].mode = ci->stages[index].mode;
 		oidcpy(&new_ci->stages[index].oid, &ci->stages[index].oid);
 
-		free(ci);
+		if (!opt->priv->pool)
+			free(ci);
 		ci = new_ci;
 	}
 
@@ -3623,7 +3642,8 @@  static void process_entry(struct merge_options *opt,
 		 * the directory to remain here, so we need to move this
 		 * path to some new location.
 		 */
-		CALLOC_ARRAY(new_ci, 1);
+		new_ci = pool_calloc(opt->priv->pool, 1, sizeof(*new_ci));
+
 		/* We don't really want new_ci->merged.result copied, but it'll
 		 * be overwritten below so it doesn't matter.  We also don't
 		 * want any directory mode/oid values copied, but we'll zero
@@ -3713,7 +3733,7 @@  static void process_entry(struct merge_options *opt,
 			const char *a_path = NULL, *b_path = NULL;
 			int rename_a = 0, rename_b = 0;
 
-			new_ci = xmalloc(sizeof(*new_ci));
+			new_ci = pool_alloc(opt->priv->pool, sizeof(*new_ci));
 
 			if (S_ISREG(a_mode))
 				rename_a = 1;
@@ -3777,12 +3797,14 @@  static void process_entry(struct merge_options *opt,
 				strmap_remove(&opt->priv->paths, path, 0);
 				/*
 				 * We removed path from opt->priv->paths.  path
-				 * will also eventually need to be freed, but
-				 * it may still be used by e.g.  ci->pathnames.
-				 * So, store it in another string-list for now.
+				 * will also eventually need to be freed if not
+				 * part of a memory pool...but it may still be
+				 * used by e.g. ci->pathnames.  So, store it in
+				 * another string-list for now in that case.
 				 */
-				string_list_append(&opt->priv->paths_to_free,
-						   path);
+				if (!opt->priv->pool)
+					string_list_append(&opt->priv->paths_to_free,
+							   path);
 			}
 
 			/*
@@ -4322,6 +4344,7 @@  static void merge_start(struct merge_options *opt, struct merge_result *result)
 {
 	struct rename_info *renames;
 	int i;
+	struct mem_pool *pool = NULL;
 
 	/* Sanity checks on opt */
 	trace2_region_enter("merge", "sanity checks", opt->repo);
@@ -4393,9 +4416,10 @@  static void merge_start(struct merge_options *opt, struct merge_result *result)
 #else
 	opt->priv->pool = NULL;
 #endif
+	pool = opt->priv->pool;
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->dirs_removed[i],
-					    NOT_RELEVANT, NULL, 0);
+					    NOT_RELEVANT, pool, 0);
 		strmap_init_with_options(&renames->dir_rename_count[i],
 					 NULL, 1);
 		strmap_init_with_options(&renames->dir_renames[i],
@@ -4409,7 +4433,7 @@  static void merge_start(struct merge_options *opt, struct merge_result *result)
 		 */
 		strintmap_init_with_options(&renames->relevant_sources[i],
 					    -1 /* explicitly invalid */,
-					    NULL, 0);
+					    pool, 0);
 		strmap_init_with_options(&renames->cached_pairs[i],
 					 NULL, 1);
 		strset_init_with_options(&renames->cached_irrelevant[i],
@@ -4419,9 +4443,9 @@  static void merge_start(struct merge_options *opt, struct merge_result *result)
 	}
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
 		strintmap_init_with_options(&renames->deferred[i].possible_trivial_merges,
-					    0, NULL, 0);
+					    0, pool, 0);
 		strset_init_with_options(&renames->deferred[i].target_dirs,
-					 NULL, 1);
+					 pool, 1);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
 	}
 
@@ -4434,9 +4458,10 @@  static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * In contrast, conflicted just has a subset of keys from paths, so
 	 * we don't want to free those (it'd be a duplicate free).
 	 */
-	strmap_init_with_options(&opt->priv->paths, NULL, 0);
-	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
-	string_list_init(&opt->priv->paths_to_free, 0);
+	strmap_init_with_options(&opt->priv->paths, pool, 0);
+	strmap_init_with_options(&opt->priv->conflicted, pool, 0);
+	if (!opt->priv->pool)
+		string_list_init(&opt->priv->paths_to_free, 0);
 
 	/*
 	 * keys & strbufs in output will sometimes need to outlive "paths",