format-patch: warn if commit msg contains a patch delimiter

Message ID: d0b577825124ac684ab304d3a1395f3d2d0708e8.1662333027.git.matheus.bernardino@usp.br
State: Superseded

Commit Message

Matheus Tavares Sept. 4, 2022, 11:12 p.m. UTC
When applying a patch, `git am` looks for special delimiter strings
(such as "---") to know where the message ends and the actual diff
starts. If one of these strings appears in the commit message itself,
`am` might get confused and fail to apply the patch properly. This has
already caused inconveniences in the past [1][2]. To help avoid such
problems, let's make `git format-patch` warn on commit messages
containing one of these strings.

[1]: https://lore.kernel.org/git/20210113085846-mutt-send-email-mst@kernel.org/
[2]: https://lore.kernel.org/git/16297305.cDA1TJNmNo@earendil/

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/log.c           |  1 +
 log-tree.c              |  1 +
 mailinfo.c              |  4 ++--
 mailinfo.h              |  3 +++
 pretty.c                | 21 ++++++++++++++++++++-
 pretty.h                |  3 ++-
 revision.h              |  3 ++-
 t/t4014-format-patch.sh | 16 ++++++++++++++++
 8 files changed, 47 insertions(+), 5 deletions(-)
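
The delimiter check described in the commit message can be sketched as a
standalone function. This is only an illustration modeled on the
patchbreak() body quoted later in this thread; the name
looks_like_patch_break() is hypothetical, it uses the standard isspace()
rather than git's own ctype helpers, and it does not cover scissors lines.

```c
#include <ctype.h>
#include <string.h>

/*
 * Simplified stand-in for the delimiter check (hypothetical name).
 * Returns 1 if `line` (NUL-terminated, optionally ending in '\n')
 * would be taken as the start of the diff, 0 otherwise.
 */
static int looks_like_patch_break(const char *line)
{
	size_t len = strlen(line);
	size_t i;

	/* Beginning of a "diff -" header or a CVS "Index: " line? */
	if (!strncmp(line, "diff -", 6) || !strncmp(line, "Index: ", 7))
		return 1;

	/* Too short to be "--- <filename>" or a "---" separator line. */
	if (len < 4)
		return 0;

	if (!strncmp(line, "---", 3)) {
		/* "--- <filename>": a space followed by a non-space. */
		if (line[3] == ' ' && !isspace((unsigned char)line[4]))
			return 1;
		/* "---" followed only by whitespace up to the newline. */
		for (i = 3; i < len; i++) {
			unsigned char c = line[i];
			if (c == '\n')
				return 1;
			if (!isspace(c))
				break;
		}
	}
	return 0;
}
```

In this sketch a bare "---" line or a "--- a/file.c" line counts as a
delimiter, while an indented "    ---" line does not, which is why
indenting the offending line is a workable way out.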

Comments

Ævar Arnfjörð Bjarmason Sept. 5, 2022, 8:01 a.m. UTC | #1
On Sun, Sep 04 2022, Matheus Tavares wrote:

> When applying a patch, `git am` looks for special delimiter strings
> (such as "---") to know where the message ends and the actual diff
> starts. If one of these strings appears in the commit message itself,
> `am` might get confused and fail to apply the patch properly. This has
> already caused inconveniences in the past [1][2]. To help avoid such
> problem, let's make `git format-patch` warn on commit messages
> containing one of the said strings.
>
> [1]: https://lore.kernel.org/git/20210113085846-mutt-send-email-mst@kernel.org/
> [2]: https://lore.kernel.org/git/16297305.cDA1TJNmNo@earendil/

I followed this topic with one eye, and have run into this myself in the
past. I'm not against this warning, but I wonder if we can't fix
"am/apply" to just be smarter. The cases I've seen are all ones where:

 * We have a copy/pasted git diff, but we could disambiguate based on
   (at least) the "---" line being a telltale for the "real" patch, and
   the "X file changed..." diffstat.
 * We have a not-quite-git-looking patch diff in the commit message
   (which we'd normally detect and apply), as in your [2].

Couldn't we just be a bit smarter about applying these, and do a
look-ahead to find what the user meant?

In any case, having such a warning won't "settle" this issue, as commit
objects and the push/fetch protocol already carry such messages without
any ambiguity. It's just "format-patch/am" as a "wire protocol" that has
this issue.

But anyway, that's the state of the world now, so warning() about it is
fair. Even if we had a fix for the "apply" part, we might want to warn
for a while to note that it's an issue on older gits.

> +		if (pp->check_in_body_patch_breaks) {
> +			strbuf_reset(&linebuf);
> +			strbuf_add(&linebuf, line, linelen);
> +			if (patchbreak(&linebuf) || is_scissors_line(linebuf.buf)) {
> +				strbuf_strip_suffix(&linebuf, "\n");

Hrm, it's a (small) shame that the patchbreak() function takes a "struct
strbuf" rather than a "char *"/"size_t" pair in this case (seemingly for
no good reason, as it's "const"?).

Because of that you need to make a copy here, instead of just finding
the "\n" and using the "%.*s" format. Anyway, small potatoes.
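
To illustrate the copy-free alternative: the precision form "%.*s"
prints at most a given number of bytes, so a length-limited line can be
reported without its trailing newline and without an intermediate
strbuf. A minimal sketch, assuming a hypothetical helper
format_delimiter_warning() that writes into a caller-supplied buffer
(the real code would call warning() instead):

```c
#include <stdio.h>
#include <string.h>

/*
 * Format the warning for a length-limited line (not necessarily
 * NUL-terminated) without copying it first. A trailing '\n' is
 * dropped by shortening the length passed as the precision.
 * Returns the snprintf() result. Hypothetical helper.
 */
static int format_delimiter_warning(char *out, size_t outsz,
				    const char *line, size_t linelen)
{
	/* Look for the newline inside the given bounds only. */
	const char *eol = memchr(line, '\n', linelen);
	size_t n = eol ? (size_t)(eol - line) : linelen;

	/* "%.*s" takes an int precision: print at most n bytes. */
	return snprintf(out, outsz,
			"warning: commit message has a patch delimiter: '%.*s'",
			(int)n, line);
}
```

The cast to int is needed because the "*" precision argument is an int;
a line longer than INT_MAX would need separate handling.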

> +				warning("commit message has a patch delimiter: '%s'",
> +					linebuf.buf);

Missing _()?

> +test_expect_success 'warn if commit message contains patch delimiter' '
> +	>delim &&
> +	git add delim &&
> +	GIT_EDITOR="printf \"title\n\n---\" >" git commit &&

Maybe I'm missing something, but isn't this GIT_EDITOR/printf just
another way of saying something like:

	cat >msg <<-\EOF &&
	title

	---
	EOF
	git commit -F msg && ...

Untested, so maybe not.

René Scharfe Sept. 5, 2022, 10:57 a.m. UTC | #2
On 05.09.22 at 10:01, Ævar Arnfjörð Bjarmason wrote:
>
> On Sun, Sep 04 2022, Matheus Tavares wrote:
>
>> When applying a patch, `git am` looks for special delimiter strings
>> (such as "---") to know where the message ends and the actual diff
>> starts. If one of these strings appears in the commit message itself,
>> `am` might get confused and fail to apply the patch properly. This has
>> already caused inconveniences in the past [1][2]. To help avoid such
>> problem, let's make `git format-patch` warn on commit messages
>> containing one of the said strings.
>>
>> [1]: https://lore.kernel.org/git/20210113085846-mutt-send-email-mst@kernel.org/
>> [2]: https://lore.kernel.org/git/16297305.cDA1TJNmNo@earendil/
>
> I followed this topic with one eye, and have run into this myself in the
> past. I'm not against this warning, but I wonder if we can't fix
> "am/apply" to just be smarter. The cases I've seen are all ones where:
>
>  * We have a copy/pasted git diff, but we could disambiguate based on
>    (at least) the "---" line being a telltale for the "real" patch, and
>    the "X file changed..." diffstat.
>  * We have a not-quite-git-looking patch diff in the commit message
>    (which we'd normally detect and apply), as in your [2].
>
> Couldn't we just be a bit smarter about applying these, and do a
> look-ahead and find what the user meant.

Whatever we use to separate message from diff can be included in that
message by an unsuspecting user and "---" can be part of a diff.  An
earlier discussion yielded an idea, but no implementation:
https://lore.kernel.org/git/20200204010524-mutt-send-email-mst@kernel.org/

> Is any case, having such a warning won't "settle" this issue, as we're
> able to deal with this non-ambiguity in commit objects/the push/fetch
> protocol. It's just "format-patch/am" as a "wire protocol" that has this
> issue.
>
> But anyway, that's the state of the world now, so warning() about it is
> fair, even if we had a fix for the "apply" part we might want to warn
> for a while to note that it's an issue on older gits.
>
>> +		if (pp->check_in_body_patch_breaks) {
>> +			strbuf_reset(&linebuf);
>> +			strbuf_add(&linebuf, line, linelen);
>> +			if (patchbreak(&linebuf) || is_scissors_line(linebuf.buf)) {
>> +				strbuf_strip_suffix(&linebuf, "\n");
>
> Hrm, it's a (small) shame that the patchbreak() function takes a "struct
> strbuf" rather than a char */size_t in this case (seemingly for no good
> reason, as it's "const"?).

A strbuf is NUL-terminated, a length-limited string (char */size_t)
doesn't have to be.  That means the current implementation can use
functions like starts_with(), but a faithful version that promises to
stay within a given length cannot.  So the reason is probably
convenience.  With skip_prefix_mem() it wouldn't be that bad, though:

---
 mailinfo.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index 9621ba62a3..ae2e70e363 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -646,32 +646,30 @@ static void decode_transfer_encoding(struct mailinfo *mi, struct strbuf *line)
 	free(ret);
 }

-static inline int patchbreak(const struct strbuf *line)
+static int patchbreak(const char *buf, size_t len)
 {
-	size_t i;
-
 	/* Beginning of a "diff -" header? */
-	if (starts_with(line->buf, "diff -"))
+	if (skip_prefix_mem(buf, len, "diff -", &buf, &len))
 		return 1;

 	/* CVS "Index: " line? */
-	if (starts_with(line->buf, "Index: "))
+	if (skip_prefix_mem(buf, len, "Index: ", &buf, &len))
 		return 1;

 	/*
 	 * "--- <filename>" starts patches without headers
 	 * "---<sp>*" is a manual separator
 	 */
-	if (line->len < 4)
+	if (len < 4)
 		return 0;

-	if (starts_with(line->buf, "---")) {
+	if (skip_prefix_mem(buf, len, "---", &buf, &len)) {
 		/* space followed by a filename? */
-		if (line->buf[3] == ' ' && !isspace(line->buf[4]))
+		if (len > 1 && buf[0] == ' ' && !isspace(buf[1]))
 			return 1;
 		/* Just whitespace? */
-		for (i = 3; i < line->len; i++) {
-			unsigned char c = line->buf[i];
+		for (; len; buf++, len--) {
+			unsigned char c = buf[0];
 			if (c == '\n')
 				return 1;
 			if (!isspace(c))
@@ -682,14 +680,14 @@ static inline int patchbreak(const struct strbuf *line)
 	return 0;
 }

-static int is_scissors_line(const char *line)
+static int is_scissors_line(const char *line, size_t len)
 {
 	const char *c;
 	int scissors = 0, gap = 0;
 	const char *first_nonblank = NULL, *last_nonblank = NULL;
 	int visible, perforation = 0, in_perforation = 0;

-	for (c = line; *c; c++) {
+	for (c = line; len; c++, len--) {
 		if (isspace(*c)) {
 			if (in_perforation) {
 				perforation++;
@@ -705,12 +703,14 @@ static int is_scissors_line(const char *line)
 			perforation++;
 			continue;
 		}
-		if (starts_with(c, ">8") || starts_with(c, "8<") ||
-		    starts_with(c, ">%") || starts_with(c, "%<")) {
+		if (skip_prefix_mem(c, len, ">8", &c, &len) ||
+		    skip_prefix_mem(c, len, "8<", &c, &len) ||
+		    skip_prefix_mem(c, len, ">%", &c, &len) ||
+		    skip_prefix_mem(c, len, "%<", &c, &len)) {
 			in_perforation = 1;
 			perforation += 2;
 			scissors += 2;
-			c++;
+			c--, len++;
 			continue;
 		}
 		in_perforation = 0;
@@ -747,7 +747,8 @@ static int check_inbody_header(struct mailinfo *mi, const struct strbuf *line)
 {
 	if (mi->inbody_header_accum.len &&
 	    (line->buf[0] == ' ' || line->buf[0] == '\t')) {
-		if (mi->use_scissors && is_scissors_line(line->buf)) {
+		if (mi->use_scissors &&
+		    is_scissors_line(line->buf, line->len)) {
 			/*
 			 * This is a scissors line; do not consider this line
 			 * as a header continuation line.
@@ -808,7 +809,7 @@ static int handle_commit_msg(struct mailinfo *mi, struct strbuf *line)
 	if (convert_to_utf8(mi, line, mi->charset.buf))
 		return 0; /* mi->input_error already set */

-	if (mi->use_scissors && is_scissors_line(line->buf)) {
+	if (mi->use_scissors && is_scissors_line(line->buf, line->len)) {
 		int i;

 		strbuf_setlen(&mi->log_message, 0);
@@ -826,7 +827,7 @@ static int handle_commit_msg(struct mailinfo *mi, struct strbuf *line)
 		return 0;
 	}

-	if (patchbreak(line)) {
+	if (patchbreak(line->buf, line->len)) {
 		if (mi->message_id)
 			strbuf_addf(&mi->log_message,
 				    "Message-Id: %s\n", mi->message_id);
--
2.37.2
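
The length-limited prefix matching that the patch above relies on can be
sketched in isolation. This is a standalone stand-in written for
illustration, under the assumption that it behaves like the
skip_prefix_mem() calls in the diff; the name skip_prefix_len() is
hypothetical and is not part of git.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a length-limited prefix check.
 * If the first bytes of buf (at most len of them) equal prefix, store
 * the remainder in *out / *outlen and return 1; otherwise leave both
 * untouched and return 0. Never reads past buf + len, so buf does not
 * have to be NUL-terminated.
 */
static int skip_prefix_len(const char *buf, size_t len, const char *prefix,
			   const char **out, size_t *outlen)
{
	size_t prefix_len = strlen(prefix);

	if (prefix_len <= len && !memcmp(buf, prefix, prefix_len)) {
		*out = buf + prefix_len;
		*outlen = len - prefix_len;
		return 1;
	}
	return 0;
}
```

Because the outputs are only written on a match, a caller can pass the
same variables as input and output, as the patch does with
"&buf, &len", and still chain several checks in a row.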

Patch

diff --git a/builtin/log.c b/builtin/log.c
index 56e2d95e86..edc84abaef 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1973,6 +1973,7 @@  int cmd_format_patch(int argc, const char **argv, const char *prefix)
 	rev.diffopt.flags.recursive = 1;
 	rev.diffopt.no_free = 1;
 	rev.subject_prefix = fmt_patch_subject_prefix;
+	rev.check_in_body_patch_breaks = 1;
 	memset(&s_r_opt, 0, sizeof(s_r_opt));
 	s_r_opt.def = "HEAD";
 	s_r_opt.revarg_opt = REVARG_COMMITTISH;
diff --git a/log-tree.c b/log-tree.c
index 3e8c70ddcf..25ed5452b1 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -766,6 +766,7 @@  void show_log(struct rev_info *opt)
 	ctx.after_subject = extra_headers;
 	ctx.preserve_subject = opt->preserve_subject;
 	ctx.encode_email_headers = opt->encode_email_headers;
+	ctx.check_in_body_patch_breaks = opt->check_in_body_patch_breaks;
 	ctx.reflog_info = opt->reflog_info;
 	ctx.fmt = opt->commit_format;
 	ctx.mailmap = opt->mailmap;
diff --git a/mailinfo.c b/mailinfo.c
index 9621ba62a3..9945ea6267 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -646,7 +646,7 @@  static void decode_transfer_encoding(struct mailinfo *mi, struct strbuf *line)
 	free(ret);
 }
 
-static inline int patchbreak(const struct strbuf *line)
+int patchbreak(const struct strbuf *line)
 {
 	size_t i;
 
@@ -682,7 +682,7 @@  static inline int patchbreak(const struct strbuf *line)
 	return 0;
 }
 
-static int is_scissors_line(const char *line)
+int is_scissors_line(const char *line)
 {
 	const char *c;
 	int scissors = 0, gap = 0;
diff --git a/mailinfo.h b/mailinfo.h
index f2ffd0349e..8d4dda5deb 100644
--- a/mailinfo.h
+++ b/mailinfo.h
@@ -53,4 +53,7 @@  void setup_mailinfo(struct mailinfo *);
 int mailinfo(struct mailinfo *, const char *msg, const char *patch);
 void clear_mailinfo(struct mailinfo *);
 
+int patchbreak(const struct strbuf *line);
+int is_scissors_line(const char *line);
+
 #endif /* MAILINFO_H */
diff --git a/pretty.c b/pretty.c
index 6d819103fb..9f999029f5 100644
--- a/pretty.c
+++ b/pretty.c
@@ -5,6 +5,7 @@ 
 #include "diff.h"
 #include "revision.h"
 #include "string-list.h"
+#include "mailinfo.h"
 #include "mailmap.h"
 #include "log-tree.h"
 #include "notes.h"
@@ -2097,7 +2098,8 @@  void pp_remainder(struct pretty_print_context *pp,
 		  int indent)
 {
 	struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
-	int first = 1;
+	int first = 1, found_delimiter = 0;
+	struct strbuf linebuf = STRBUF_INIT;
 
 	for (;;) {
 		const char *line = *msg_p;
@@ -2107,6 +2109,17 @@  void pp_remainder(struct pretty_print_context *pp,
 		if (!linelen)
 			break;
 
+		if (pp->check_in_body_patch_breaks) {
+			strbuf_reset(&linebuf);
+			strbuf_add(&linebuf, line, linelen);
+			if (patchbreak(&linebuf) || is_scissors_line(linebuf.buf)) {
+				strbuf_strip_suffix(&linebuf, "\n");
+				warning("commit message has a patch delimiter: '%s'",
+					linebuf.buf);
+				found_delimiter = 1;
+			}
+		}
+
 		if (is_blank_line(line, &linelen)) {
 			if (first)
 				continue;
@@ -2133,6 +2146,12 @@  void pp_remainder(struct pretty_print_context *pp,
 		}
 		strbuf_addch(sb, '\n');
 	}
+
+	if (found_delimiter)
+		warning("git am might fail to apply this patch. "
+			"Consider indenting the offending lines.");
+
+	strbuf_release(&linebuf);
 }
 
 void pretty_print_commit(struct pretty_print_context *pp,
diff --git a/pretty.h b/pretty.h
index f34e24c53a..12df2f4a39 100644
--- a/pretty.h
+++ b/pretty.h
@@ -49,7 +49,8 @@  struct pretty_print_context {
 	struct string_list *mailmap;
 	int color;
 	struct ident_split *from_ident;
-	unsigned encode_email_headers:1;
+	unsigned encode_email_headers:1,
+		 check_in_body_patch_breaks:1;
 	struct pretty_print_describe_status *describe_status;
 
 	/*
diff --git a/revision.h b/revision.h
index 61a9b1316b..f384ab716f 100644
--- a/revision.h
+++ b/revision.h
@@ -230,7 +230,8 @@  struct rev_info {
 			date_mode_explicit:1,
 			preserve_subject:1,
 			encode_email_headers:1,
-			include_header:1;
+			include_header:1,
+			check_in_body_patch_breaks:1;
 	unsigned int	disable_stdin:1;
 	/* --show-linear-break */
 	unsigned int	track_linear:1,
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index fbec8ad2ef..4868ea2b91 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -2329,4 +2329,20 @@  test_expect_success 'interdiff: solo-patch' '
 	test_cmp expect actual
 '
 
+test_expect_success 'warn if commit message contains patch delimiter' '
+	>delim &&
+	git add delim &&
+	GIT_EDITOR="printf \"title\n\n---\" >" git commit &&
+	git format-patch -1 2>stderr &&
+	grep "warning: commit message has a patch delimiter" stderr
+'
+
+test_expect_success 'warn if commit message contains scissors' '
+	>scissors &&
+	git add scissors &&
+	GIT_EDITOR="printf \"title\n\n-- >8 --\" >" git commit &&
+	git format-patch -1 2>stderr &&
+	grep "warning: commit message has a patch delimiter" stderr
+'
+
 test_done