diff mbox series

[1/3] log-tree: take ownership of pointer

Message ID 3b12a8cf393b6d8f0877fd7d87173c565d7d5a90.1709841147.git.code@khaugsbakk.name (mailing list archive)
State Superseded
Headers show
Series format-patch: teach `--header-cmd` | expand

Commit Message

Kristoffer Haugsbakk March 7, 2024, 7:59 p.m. UTC
The MIME header handling started using string buffers in
d50b69b868d (log_write_email_headers: use strbufs, 2018-05-18). The
subject buffer is given to `extra_headers` without that variable taking
ownership; the commit “punts on that ownership” (in general, not just
for this buffer).

In an upcoming commit we will first assign `extra_headers` to the owned
pointer from another `strbuf`. In turn we need this variable to always
contain an owned pointer so that we can free it in the calling
function.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---
 log-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jeff King March 12, 2024, 9:29 a.m. UTC | #1
On Thu, Mar 07, 2024 at 08:59:35PM +0100, Kristoffer Haugsbakk wrote:

> The MIME header handling started using string buffers in
> d50b69b868d (log_write_email_headers: use strbufs, 2018-05-18). The
> subject buffer is given to `extra_headers` without that variable taking
> ownership; the commit “punts on that ownership” (in general, not just
> for this buffer).
> 
> In an upcoming commit we will first assign `extra_headers` to the owned
> pointer from another `strbuf`. In turn we need this variable to always
> contain an owned pointer so that we can free it in the calling
> function.

Hmm, OK. This patch by itself introduces a memory leak. It would be nice
if we could couple it with the matching free() so that we can see that
the issue is fixed. It sounds like your patch 2 is going to introduce
such a free, but I'm not sure it's complete. It frees the old
extra_headers before reassigning it, but nobody cleans it up after
handling the final commit.

We should also drop the "static" from subject_buffer, if it is no longer
needed. Likewise, any strings that start owning memory (here or in patch
2) should probably drop their "const". That makes the ownership more
clear, and avoids ugly casts when freeing.

-Peff
Kristoffer Haugsbakk March 12, 2024, 5:43 p.m. UTC | #2
Hi Jeff and thanks for taking a look

On Tue, Mar 12, 2024, at 10:29, Jeff King wrote:
> On Thu, Mar 07, 2024 at 08:59:35PM +0100, Kristoffer Haugsbakk wrote:
>
>> The MIME header handling started using string buffers in
>> d50b69b868d (log_write_email_headers: use strbufs, 2018-05-18). The
>> subject buffer is given to `extra_headers` without that variable taking
>> ownership; the commit “punts on that ownership” (in general, not just
>> for this buffer).
>>
>> In an upcoming commit we will first assign `extra_headers` to the owned
>> pointer from another `strbuf`. In turn we need this variable to always
>> contain an owned pointer so that we can free it in the calling
>> function.
>
> Hmm, OK. This patch by itself introduces a memory leak. It would be nice
> if we could couple it with the matching free() so that we can see that
> the issue is fixed. It sounds like your patch 2 is going to introduce
> such a free, but I'm not sure it's complete.

Is it okay if it is done in patch 2?

> It frees the old extra_headers before reassigning it, but nobody
> cleans it up after handling the final commit.

I didn’t get any leak errors from the CI. `extra_headers` in `show_log`
is populated by calling `log_write_email_headers`. Then later it is
assigned to

    ctx.after_subject = extra_headers;

Then `ctx.after_subject is freed later

    free((char *)ctx.after_subject);

Am I missing something?

> We should also drop the "static" from subject_buffer, if it is no longer
> needed. Likewise, any strings that start owning memory (here or in patch
> 2) should probably drop their "const". That makes the ownership more
> clear, and avoids ugly casts when freeing.

Okay, I’ll do that.

Thanks
Jeff King March 13, 2024, 6:54 a.m. UTC | #3
On Tue, Mar 12, 2024 at 06:43:55PM +0100, Kristoffer Haugsbakk wrote:

> > Hmm, OK. This patch by itself introduces a memory leak. It would be nice
> > if we could couple it with the matching free() so that we can see that
> > the issue is fixed. It sounds like your patch 2 is going to introduce
> > such a free, but I'm not sure it's complete.
> 
> Is it okay if it is done in patch 2?

I don't think it's the end of the world to do it in patch 2, as long as
we end up in a good spot. But IMHO it's really hard for reviewers to
understand what is going on, because it's intermingled with so many
other changes. It would be much easier to read if we had a preparatory
patch that switched the memory ownership of the field, and then built on
top of that.

But I recognize that sometimes that's hard to do, because the state is
so tangled that the functional change is what untangles it. I'm not sure
if that's the case here or not; you'd probably have a better idea as
somebody who looked carefully at it recently.

> > It frees the old extra_headers before reassigning it, but nobody
> > cleans it up after handling the final commit.
> 
> I didn’t get any leak errors from the CI. `extra_headers` in `show_log`
> is populated by calling `log_write_email_headers`. Then later it is
> assigned to
> 
>     ctx.after_subject = extra_headers;
> 
> Then `ctx.after_subject is freed later
> 
>     free((char *)ctx.after_subject);
> 
> Am I missing something?

Ah, I see. I was confused by looking for a free of an extra_headers
field. We have rev_info.extra_headers, and that is _not_ owned by
rev_info. We used to assign that to a variable in
log_write_email_headers(), but now we actually make a copy of it. And so
the copy is freed in that function when we replace it with a version
containing extra mime headers here:

                  strbuf_addf(&subject_buffer,
                           "%s"
                           "MIME-Version: 1.0\n"
                           "Content-Type: multipart/mixed;"
                           " boundary=\"%s%s\"\n"
                           "\n"
                           "This is a multi-part message in MIME "
                           "format.\n"
                           "--%s%s\n"
                           "Content-Type: text/plain; "
                           "charset=UTF-8; format=fixed\n"
                           "Content-Transfer-Encoding: 8bit\n\n",
                           extra_headers ? extra_headers : "",
                           mime_boundary_leader, opt->mime_boundary,
                           mime_boundary_leader, opt->mime_boundary);
                  free((char *)extra_headers);
                  extra_headers = strbuf_detach(&subject_buffer, NULL);

But the actual ownership is passed out via the extra_headers_p variable,
and that is what is assigned to ctx.after_subject (which now takes
ownership).

I think in the snippet I quoted above that extra_headers could never be
NULL now, right? We'll always return at least an empty string. But
moreover, we are formatting it into a strbuf, only to potentially copy
it it another strbuf. Couldn't we just do it all in one strbuf?

Something like this:

 log-tree.c | 29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 9196b4f1d4..0a703a0303 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -469,29 +469,22 @@ void fmt_output_email_subject(struct strbuf *sb, struct rev_info *opt)
 	}
 }
 
-static char *extra_and_pe_headers(const char *extra_headers, const char *pe_headers) {
-	struct strbuf all_headers = STRBUF_INIT;
-
-	if (extra_headers)
-		strbuf_addstr(&all_headers, extra_headers);
-	if (pe_headers) {
-		strbuf_addstr(&all_headers, pe_headers);
-	}
-	return strbuf_detach(&all_headers, NULL);
-}
-
 void log_write_email_headers(struct rev_info *opt, struct commit *commit,
 			     const char **extra_headers_p,
 			     int *need_8bit_cte_p,
 			     int maybe_multipart)
 {
-	const char *extra_headers =
-		extra_and_pe_headers(opt->extra_headers, opt->pe_headers);
+	struct strbuf headers = STRBUF_INIT;
 	const char *name = oid_to_hex(opt->zero_commit ?
 				      null_oid() : &commit->object.oid);
 
 	*need_8bit_cte_p = 0; /* unknown */
 
+	if (opt->extra_headers)
+		strbuf_addstr(&headers, opt->extra_headers);
+	if (opt->pe_headers)
+		strbuf_addstr(&headers, opt->pe_headers);
+
 	fprintf(opt->diffopt.file, "From %s Mon Sep 17 00:00:00 2001\n", name);
 	graph_show_oneline(opt->graph);
 	if (opt->message_id) {
@@ -508,16 +501,13 @@ void log_write_email_headers(struct rev_info *opt, struct commit *commit,
 		graph_show_oneline(opt->graph);
 	}
 	if (opt->mime_boundary && maybe_multipart) {
-		static struct strbuf subject_buffer = STRBUF_INIT;
 		static struct strbuf buffer = STRBUF_INIT;
 		struct strbuf filename =  STRBUF_INIT;
 		*need_8bit_cte_p = -1; /* NEVER */
 
-		strbuf_reset(&subject_buffer);
 		strbuf_reset(&buffer);
 
-		strbuf_addf(&subject_buffer,
-			 "%s"
+		strbuf_addf(&headers,
 			 "MIME-Version: 1.0\n"
 			 "Content-Type: multipart/mixed;"
 			 " boundary=\"%s%s\"\n"
@@ -528,11 +518,8 @@ void log_write_email_headers(struct rev_info *opt, struct commit *commit,
 			 "Content-Type: text/plain; "
 			 "charset=UTF-8; format=fixed\n"
 			 "Content-Transfer-Encoding: 8bit\n\n",
-			 extra_headers ? extra_headers : "",
 			 mime_boundary_leader, opt->mime_boundary,
 			 mime_boundary_leader, opt->mime_boundary);
-		free((char *)extra_headers);
-		extra_headers = strbuf_detach(&subject_buffer, NULL);
 
 		if (opt->numbered_files)
 			strbuf_addf(&filename, "%d", opt->nr);
@@ -552,7 +539,7 @@ void log_write_email_headers(struct rev_info *opt, struct commit *commit,
 		opt->diffopt.stat_sep = buffer.buf;
 		strbuf_release(&filename);
 	}
-	*extra_headers_p = extra_headers;
+	*extra_headers_p = headers.len ? strbuf_detach(&headers, NULL) : NULL;
 }
 
 static void show_sig_lines(struct rev_info *opt, int status, const char *bol)


The resulting code is shorter and (IMHO) easier to understand. It
avoids an extra allocation and copy when using mime. It also avoids the
allocation of an empty string when opt->extra_headers and
opt->pe_headers are both NULL. It does make an extra copy when
extra_headers is non-NULL but pe_headers is NULL (and you're not using
MIME), as we could just use opt->extra_headers as-is, then. But since
the caller needs to take ownership, we can't avoid that copy.

I think you could even do this cleanup before adding pe_headers,
especially if it was coupled with cleaning up the memory ownership
issues.

-Peff
Kristoffer Haugsbakk March 13, 2024, 5:49 p.m. UTC | #4
On Wed, Mar 13, 2024, at 07:54, Jeff King wrote:
> On Tue, Mar 12, 2024 at 06:43:55PM +0100, Kristoffer Haugsbakk wrote:
>
>> > Hmm, OK. This patch by itself introduces a memory leak. It would be nice
>> > if we could couple it with the matching free() so that we can see that
>> > the issue is fixed. It sounds like your patch 2 is going to introduce
>> > such a free, but I'm not sure it's complete.
>>
>> Is it okay if it is done in patch 2?
>
> I don't think it's the end of the world to do it in patch 2, as long as
> we end up in a good spot. But IMHO it's really hard for reviewers to
> understand what is going on, because it's intermingled with so many
> other changes. It would be much easier to read if we had a preparatory
> patch that switched the memory ownership of the field, and then built on
> top of that.

Sounds good. I’ll do that.

> But I recognize that sometimes that's hard to do, because the state is
> so tangled that the functional change is what untangles it. I'm not sure
> if that's the case here or not; you'd probably have a better idea as
> somebody who looked carefully at it recently.

Seems doable in this case.

By the way. I pretty much just elbowed in the changes I needed (like in
`revision.h`) in order to add this per-patch/cover letter headers
variable. Let me know if there are better ways to do it.

>> > It frees the old extra_headers before reassigning it, but nobody
>> > cleans it up after handling the final commit.
>>
>> I didn’t get any leak errors from the CI. `extra_headers` in `show_log`
>> is populated by calling `log_write_email_headers`. Then later it is
>> assigned to
>>
>>     ctx.after_subject = extra_headers;
>>
>> Then `ctx.after_subject is freed later
>>
>>     free((char *)ctx.after_subject);
>>
>> Am I missing something?
>
> Ah, I see. I was confused by looking for a free of an extra_headers
> field. We have rev_info.extra_headers, and that is _not_ owned by
> rev_info. We used to assign that to a variable in
> log_write_email_headers(), but now we actually make a copy of it. And so
> the copy is freed in that function when we replace it with a version
> containing extra mime headers here:
>
> [snip]
>
> But the actual ownership is passed out via the extra_headers_p variable,
> and that is what is assigned to ctx.after_subject (which now takes
> ownership).
>
> I think in the snippet I quoted above that extra_headers could never be
> NULL now, right? We'll always return at least an empty string. But
> moreover, we are formatting it into a strbuf, only to potentially copy
> it it another strbuf. Couldn't we just do it all in one strbuf?
>
> Something like this:
>
> [snip]
>
>
> The resulting code is shorter and (IMHO) easier to understand. It
> avoids an extra allocation and copy when using mime. It also avoids the
> allocation of an empty string when opt->extra_headers and
> opt->pe_headers are both NULL. It does make an extra copy when
> extra_headers is non-NULL but pe_headers is NULL (and you're not using
> MIME), as we could just use opt->extra_headers as-is, then. But since
> the caller needs to take ownership, we can't avoid that copy.
>
> I think you could even do this cleanup before adding pe_headers,
> especially if it was coupled with cleaning up the memory ownership
> issues.
>
> -Peff

I haven’t tried yet but this seems like a good plan. It was getting a
getting a bit too back and forth with my changes. So I’ll try to use
your patch and see if I can get a clean preparatory patch/commit before
the main change.

Cheers
diff mbox series

Patch

diff --git a/log-tree.c b/log-tree.c
index 337b9334cdb..2eabd19962b 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -519,7 +519,7 @@  void log_write_email_headers(struct rev_info *opt, struct commit *commit,
 			 extra_headers ? extra_headers : "",
 			 mime_boundary_leader, opt->mime_boundary,
 			 mime_boundary_leader, opt->mime_boundary);
-		extra_headers = subject_buffer.buf;
+		extra_headers = strbuf_detach(&subject_buffer, NULL);
 
 		if (opt->numbered_files)
 			strbuf_addf(&filename, "%d", opt->nr);