[v3,07/14] apply: make parse_git_header public
diff mbox series

Message ID 20190708163315.29912-8-t.gummerer@gmail.com
State New
Headers show
Series
  • output improvements for git range-diff
Related show

Commit Message

Thomas Gummerer July 8, 2019, 4:33 p.m. UTC
Make parse_git_header a "public" function in apply.h, so we can re-use
it in range-diff in a subsequent commit.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 apply.c | 69 ++++++++++++++++-----------------------------------------
 apply.h | 48 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 50 deletions(-)

Comments

Junio C Hamano July 9, 2019, 7:39 p.m. UTC | #1
Thomas Gummerer <t.gummerer@gmail.com> writes:

> Make parse_git_header a "public" function in apply.h, so we can re-use
> it in range-diff in a subsequent commit.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---

Thanks for these refactoring patches on "apply" machinery in the
early part of the series.  I noticed two small things, though.

 - The apply_state instance *does* represent a state and various
   fields get updated as we read and process the patch.  The smaller
   structure you invented, on the other hand, does not carry any
   "state" at all.  Even its "linenr" field does not get incremented
   as we read/process---you create a new copy to take a snapshot of
   the current state from apply_state.  parse_git_header_data may
   have been a name that reflects the nature of the structure
   better.

 - I wonder if it makes the concept clearer if you did not create a
   new instance outside the apply_state, but instead replaced the
   three fields in the apply_state with an instance of this new
   structure.  When you call an API function with shrunk interface,
   you'd pass a pointer to a field inside the apply_state instance,
   instead of copying three fields manually.

But other than that, I think these patches are generally moving bits
in the right direction.

I do not have strong opinions on the later part of the series on
range-diff proper.

Thanks.
Thomas Gummerer July 9, 2019, 9:23 p.m. UTC | #2
On 07/09, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Make parse_git_header a "public" function in apply.h, so we can re-use
> > it in range-diff in a subsequent commit.

Eek, I just noticed that I forgot updating the name here.  This and
the Subject should say 'parse_git_diff_header()' now, instead of
parse_git_header of course.  Will fix that in the reroll.

> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> 
> Thanks for these refactoring patches on "apply" machinery in the
> early part of the series.  I noticed two small things, though.
> 
>  - The apply_state instance *does* represent a state and various
>    fields get updated as we read and process the patch.  The smaller
>    structure you invented, on the other hand, does not carry any
>    "state" at all.  Even its "linenr" field does not get incremented
>    as we read/process---you create a new copy to take a snapshot of
>    the current state from apply_state.  parse_git_header_data may
>    have been a name that reflects the nature of the structure
>    better.

Yeah, I think that's better.  Will change, thanks!

Maybe it would be even better to name it 'struct gitdiff_data', as
it's really only used for those few functions?

>  - I wonder if it makes the concept clearer if you did not create a
>    new instance outside the apply_state, but instead replaced the
>    three fields in the apply_state with an instance of this new
>    structure.  When you call an API function with shrunk interface,
>    you'd pass a pointer to a field inside the apply_state instance,
>    instead of copying three fields manually.

I had considered that.  However I struggled to come up with a name
that makes sense in both as an interface to 'parse_git_diff_header()',
and inside 'struct apply_state'.  'linenr' is not specific to parsing
git diff headers (or even parsing any type of diff header), but is
used all over the apply code.  So 'parse_git_header_data' doesn't make
sense as a name anymore (and gets complicated to explain to the
readers of the code I think).

At that point the name should also be <something>_state again, because
we do update the linenr inside 'parse_git_diff_header()', just not
inside any of the 'gitdiff_*' functions, though that is only a minor
point.

So unless there's a good name for this struct that I couldn't think
of, I think it's better to pass in the variables separately to
'parse_git_diff_header()', and then pass the struct just to the
'gitdiff_*' functions, as it's done currently.

> But other than that, I think these patches are generally moving bits
> in the right direction.

Thanks for the review!

> I do not have strong opinions on the later part of the series on
> range-diff proper.
> 
> Thanks.
Junio C Hamano July 9, 2019, 11:22 p.m. UTC | #3
Thomas Gummerer <t.gummerer@gmail.com> writes:

> Maybe it would be even better to name it 'struct gitdiff_data', as
> it's really only used for those few functions?

Is it really the case where "these three are only used by the
codepath you made public"?  If so, I agree that "gitdiff_data" is a
perfectly good name for it.

I however had an impression that it is the oppposite, i.e. "the
codepath you made public only needs these three, but these three are
used by other (still private) parts, too."  If this is the case,
then "gitdiff_data" is a misnomer, if we were to embed an instance
inside apply_state.

It seems that it is not a good idea to do such embedding, and if
that is the case, "gitdiff_data" is a fine for the three-field
struct.

Thanks.
Thomas Gummerer July 10, 2019, 8:48 a.m. UTC | #4
On 07/09, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Maybe it would be even better to name it 'struct gitdiff_data', as
> > it's really only used for those few functions?
> 
> Is it really the case where "these three are only used by the
> codepath you made public"?  If so, I agree that "gitdiff_data" is a
> perfectly good name for it.
> 
> I however had an impression that it is the oppposite, i.e. "the
> codepath you made public only needs these three, but these three are
> used by other (still private) parts, too."  If this is the case,
> then "gitdiff_data" is a misnomer, if we were to embed an instance
> inside apply_state.

Yeah, that's correct.  What I meant was that since we're only using
this struct for the private 'gitdiff_*()' functions, which are called
from 'parse_git_diff_header()', 'struct gitdiff_data' would be a
better name than 'struct parse_git_diff_header_data'.

I do agree that it wouldn't be a good name if we were to embed it
inside 'struct apply_state', and as mentioned in the previous email
I'd have a hard time coming up with a good name if we were to do that.

> It seems that it is not a good idea to do such embedding, and if
> that is the case, "gitdiff_data" is a fine for the three-field
> struct.

Yeah, I think that's the best way forward, thanks.

> Thanks.

Patch
diff mbox series

diff --git a/apply.c b/apply.c
index 468f1d3fee..32b5b072ee 100644
--- a/apply.c
+++ b/apply.c
@@ -207,40 +207,6 @@  struct fragment {
 #define BINARY_DELTA_DEFLATED	1
 #define BINARY_LITERAL_DEFLATED 2
 
-/*
- * This represents a "patch" to a file, both metainfo changes
- * such as creation/deletion, filemode and content changes represented
- * as a series of fragments.
- */
-struct patch {
-	char *new_name, *old_name, *def_name;
-	unsigned int old_mode, new_mode;
-	int is_new, is_delete;	/* -1 = unknown, 0 = false, 1 = true */
-	int rejected;
-	unsigned ws_rule;
-	int lines_added, lines_deleted;
-	int score;
-	int extension_linenr; /* first line specifying delete/new/rename/copy */
-	unsigned int is_toplevel_relative:1;
-	unsigned int inaccurate_eof:1;
-	unsigned int is_binary:1;
-	unsigned int is_copy:1;
-	unsigned int is_rename:1;
-	unsigned int recount:1;
-	unsigned int conflicted_threeway:1;
-	unsigned int direct_to_threeway:1;
-	unsigned int crlf_in_old:1;
-	struct fragment *fragments;
-	char *result;
-	size_t resultsize;
-	char old_oid_prefix[GIT_MAX_HEXSZ + 1];
-	char new_oid_prefix[GIT_MAX_HEXSZ + 1];
-	struct patch *next;
-
-	/* three-way fallback result */
-	struct object_id threeway_stage[3];
-};
-
 static void free_fragment_list(struct fragment *list)
 {
 	while (list) {
@@ -1320,12 +1286,13 @@  static int check_header_line(int linenr, struct patch *patch)
 	return 0;
 }
 
-/* Verify that we recognize the lines following a git header */
-static int parse_git_header(struct apply_state *state,
-			    const char *line,
-			    int len,
-			    unsigned int size,
-			    struct patch *patch)
+int parse_git_diff_header(struct strbuf *root,
+			  int *linenr,
+			  int p_value,
+			  const char *line,
+			  int len,
+			  unsigned int size,
+			  struct patch *patch)
 {
 	unsigned long offset;
 	struct parse_git_header_state parse_hdr_state;
@@ -1340,21 +1307,21 @@  static int parse_git_header(struct apply_state *state,
 	 * or removing or adding empty files), so we get
 	 * the default name from the header.
 	 */
-	patch->def_name = git_header_name(state->p_value, line, len);
-	if (patch->def_name && state->root.len) {
-		char *s = xstrfmt("%s%s", state->root.buf, patch->def_name);
+	patch->def_name = git_header_name(p_value, line, len);
+	if (patch->def_name && root->len) {
+		char *s = xstrfmt("%s%s", root->buf, patch->def_name);
 		free(patch->def_name);
 		patch->def_name = s;
 	}
 
 	line += len;
 	size -= len;
-	state->linenr++;
-	parse_hdr_state.root = &state->root;
-	parse_hdr_state.linenr = state->linenr;
-	parse_hdr_state.p_value = state->p_value;
+	(*linenr)++;
+	parse_hdr_state.root = root;
+	parse_hdr_state.linenr = *linenr;
+	parse_hdr_state.p_value = p_value;
 
-	for (offset = len ; size > 0 ; offset += len, size -= len, line += len, state->linenr++) {
+	for (offset = len ; size > 0 ; offset += len, size -= len, line += len, (*linenr)++) {
 		static const struct opentry {
 			const char *str;
 			int (*fn)(struct parse_git_header_state *, const char *, struct patch *);
@@ -1391,7 +1358,7 @@  static int parse_git_header(struct apply_state *state,
 			res = p->fn(&parse_hdr_state, line + oplen, patch);
 			if (res < 0)
 				return -1;
-			if (check_header_line(state->linenr, patch))
+			if (check_header_line(*linenr, patch))
 				return -1;
 			if (res > 0)
 				return offset;
@@ -1572,7 +1539,9 @@  static int find_header(struct apply_state *state,
 		 * or mode change, so we handle that specially
 		 */
 		if (!memcmp("diff --git ", line, 11)) {
-			int git_hdr_len = parse_git_header(state, line, len, size, patch);
+			int git_hdr_len = parse_git_diff_header(&state->root, &state->linenr,
+								state->p_value, line, len,
+								size, patch);
 			if (git_hdr_len < 0)
 				return -128;
 			if (git_hdr_len <= len)
diff --git a/apply.h b/apply.h
index 5948348133..ade50f66c5 100644
--- a/apply.h
+++ b/apply.h
@@ -117,6 +117,40 @@  struct apply_state {
 	int applied_after_fixing_ws;
 };
 
+/*
+ * This represents a "patch" to a file, both metainfo changes
+ * such as creation/deletion, filemode and content changes represented
+ * as a series of fragments.
+ */
+struct patch {
+	char *new_name, *old_name, *def_name;
+	unsigned int old_mode, new_mode;
+	int is_new, is_delete;	/* -1 = unknown, 0 = false, 1 = true */
+	int rejected;
+	unsigned ws_rule;
+	int lines_added, lines_deleted;
+	int score;
+	int extension_linenr; /* first line specifying delete/new/rename/copy */
+	unsigned int is_toplevel_relative:1;
+	unsigned int inaccurate_eof:1;
+	unsigned int is_binary:1;
+	unsigned int is_copy:1;
+	unsigned int is_rename:1;
+	unsigned int recount:1;
+	unsigned int conflicted_threeway:1;
+	unsigned int direct_to_threeway:1;
+	unsigned int crlf_in_old:1;
+	struct fragment *fragments;
+	char *result;
+	size_t resultsize;
+	char old_oid_prefix[GIT_MAX_HEXSZ + 1];
+	char new_oid_prefix[GIT_MAX_HEXSZ + 1];
+	struct patch *next;
+
+	/* three-way fallback result */
+	struct object_id threeway_stage[3];
+};
+
 int apply_parse_options(int argc, const char **argv,
 			struct apply_state *state,
 			int *force_apply, int *options,
@@ -127,6 +161,20 @@  int init_apply_state(struct apply_state *state,
 void clear_apply_state(struct apply_state *state);
 int check_apply_state(struct apply_state *state, int force_apply);
 
+/*
+ * Parse a get header, starting at line.  Fills the relevant metadata
+ * information in 'struct patch'.
+ *
+ * Returns -1 on failure, the length of the parsed header otherwise.
+ */
+int parse_git_diff_header(struct strbuf *root,
+			  int *linenr,
+			  int p_value,
+			  const char *line,
+			  int len,
+			  unsigned int size,
+			  struct patch *patch);
+
 /*
  * Some aspects of the apply behavior are controlled by the following
  * bits in the "options" parameter passed to apply_all_patches().