
[0/2] UNLEAK style fixes

Message ID 20200813155426.GA896769@coredump.intra.peff.net (mailing list archive)
Series UNLEAK style fixes

Message

Jeff King Aug. 13, 2020, 3:54 p.m. UTC
Although we introduced UNLEAK() long ago, I don't know that anybody has
really made a concerted effort to annotate enough variables to make
running a leak-checker useful. So I haven't paid too much attention to
its use.

But a few people have added some annotations, and I think some of them
aren't great examples. So I decided to clean them up. By definition this
has no impact on regular builds (since UNLEAK() is a no-op there), and even
in leak-checking builds it should cause no behavior change.
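
For readers who need to re-learn what UNLEAK() does, here is a stand-alone sketch that approximates the idea. It is not git's actual code. It assumes a compile-time switch named SUPPRESS_ANNOTATED_LEAKS (git uses a flag of that name for leak-checking builds, but treat the details here as an assumption). With the switch off, UNLEAK() expands to an empty statement, which is why these patches cannot affect regular builds. With it on, the annotated variable's bytes are copied into memory reachable from a static root, so heap pointers it holds are not reported as leaks at exit. `cmd_example()` is a hypothetical caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef SUPPRESS_ANNOTATED_LEAKS
/* Leak-checking build: keep a copy of each annotated variable reachable. */
struct unleak_node {
	struct unleak_node *next;
	unsigned char data[]; /* raw bytes of the annotated variable */
};

static struct unleak_node *unleak_roots; /* static root seen by the checker */

static void unleak_memory(const void *ptr, size_t len)
{
	struct unleak_node *node = malloc(sizeof(*node) + len);

	if (!node)
		abort();
	memcpy(node->data, ptr, len); /* copies pointer values, not pointees */
	node->next = unleak_roots;
	unleak_roots = node;
}
#define UNLEAK(var) unleak_memory(&(var), sizeof(var))
#else
/* Regular build: UNLEAK() compiles to nothing, so it costs nothing. */
#define UNLEAK(var) do {} while (0)
#endif

/*
 * Hypothetical command entry point. Its allocation is deliberately not
 * freed because the program is assumed to exit right after it returns.
 */
static int cmd_example(const char *msg)
{
	size_t len = strlen(msg);
	char *copy = malloc(len + 1);

	if (!copy)
		return -1;
	memcpy(copy, msg, len + 1);
	UNLEAK(copy); /* tells leak checkers this "leak" is intentional */
	return (int)len;
}
```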

Another category that I was tempted to change is when variables _could_
be freed, but we just don't bother to do so. E.g., at the end of
bugreport.c, we have:

  UNLEAK(buffer);
  UNLEAK(report_path);
  return !!launch_editor(report_path.buf, NULL, NULL);

Using UNLEAK(report_path) makes sense; we can't free it because we're
passing it to a function that runs until program end. But we _could_
free "buffer" here, which isn't otherwise used again (i.e., that could
be strbuf_release() instead of UNLEAK()).
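
To make the two options for "buffer" concrete, here is a stand-alone sketch. `struct buf`, `buf_set()`, `buf_release()` and `fake_launch_editor()` are hypothetical stand-ins for git's strbuf API and launch_editor(), and UNLEAK() is given its regular-build (no-op) definition. The unused `buffer` is actually released, in the spirit of strbuf_release(), while `report_path`, which is still handed to the editor, keeps its UNLEAK().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Regular-build definition: the annotation compiles away. */
#define UNLEAK(var) do {} while (0)

/* Hypothetical, much-reduced stand-in for git's strbuf. */
struct buf {
	char *data;
	size_t len;
};

static void buf_set(struct buf *b, const char *s)
{
	size_t len = strlen(s);

	b->data = malloc(len + 1);
	if (!b->data)
		abort(); /* give up on allocation failure, as git's xmalloc() dies */
	memcpy(b->data, s, len + 1);
	b->len = len;
}

/* Analogue of strbuf_release(): free the memory and reset the struct. */
static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = 0;
}

/* Hypothetical stand-in for launch_editor(); returns 0 on "success". */
static int fake_launch_editor(const char *path)
{
	return (path && *path) ? 0 : -1;
}

static int finish_report(void)
{
	struct buf buffer, report_path;

	buf_set(&buffer, "report body");
	buf_set(&report_path, "git-bugreport.txt");
	/* ... buffer would be written to the file named by report_path ... */

	buf_release(&buffer); /* never used again: free it for real */
	UNLEAK(report_path);  /* still needed by the editor: annotate it */
	return !!fake_launch_editor(report_path.data);
}
```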

But that does have a run-time cost (we'd actually free the memory, even
though we could just exit and let the OS handle it). My guess is that
it's not a measurable cost, and the code might be cleaner to actually
clean up instead of sprinkling more UNLEAKs around. But until we're
actually pushing forward with a real effort to get a leak-checker
running clean, I don't see much point in doing one or the other.

(As a side note, if we want to declare UNLEAK() a failure because nobody
cares enough to really use it, I'm OK with that, too).

  [1/2]: stop calling UNLEAK() before die()
  [2/2]: ls-remote: simplify UNLEAK() usage

 bugreport.c         | 4 +---
 builtin/ls-remote.c | 8 +++-----
 midx.c              | 8 ++------
 3 files changed, 6 insertions(+), 14 deletions(-)

-Peff

Comments

Eric Sunshine Aug. 13, 2020, 7:32 p.m. UTC | #1
On Thu, Aug 13, 2020 at 11:54 AM Jeff King <peff@peff.net> wrote:
> (As a side note, if we want to declare UNLEAK() a failure because nobody
> cares enough to really use it, I'm OK with that, too).

Perhaps the reason that UNLEAK() has not been particularly successful,
in general, is that it requires extra knowledge and reasoning to know
when to use it and how to do so properly. Couple that with the fact
that the scope of cases where it can be used is quite narrow compared
to the sum total of all code in the project, for which we simply free
resources when we're done with them, and it becomes hard to keep the
specialized UNLEAK() knowledge in one's head.

Speaking from personal experience, the several times I have had to
deal with UNLEAK(), I had to re-learn it from scratch each time. That
meant studying the header comment, studying the implementation, and
studying existing callers before things "clicked" enough to be able to
feel confident about how to use it (assuming it wasn't false
confidence).

Even today, reading this patch series, I had to go through all that
again just to understand the changes made by the patches, and
especially the commit message of patch [1/2]. It took several
re-reads, plus re-examining UNLEAK() documentation, plus looking at
the UNLEAK() implementation a couple times before the [1/2] commit
message finally "clicked".

That all represents a lot of cognitive overhead versus the common
practice of simply freeing resources when you're done with them, which
requires no extra cognitive load since it is something we think about
_always_ when working with a language like C with no built-in garbage
collection.

So, I for one would not be especially sad to see UNLEAK() retired.

(The patch series itself looked fine and made sense once I had
re-acquired the necessary UNLEAK() knowledge.)

Jeff King Aug. 14, 2020, 10:34 a.m. UTC | #2
On Thu, Aug 13, 2020 at 03:32:56PM -0400, Eric Sunshine wrote:

> On Thu, Aug 13, 2020 at 11:54 AM Jeff King <peff@peff.net> wrote:
> > (As a side note, if we want to declare UNLEAK() a failure because nobody
> > cares enough to really use it, I'm OK with that, too).
> 
> Perhaps the reason that UNLEAK() has not been particularly successful,
> in general, is that it requires extra knowledge and reasoning to know
> when to use it and how to do so properly. Couple that with the fact
> that the scope of cases where it can be used is quite narrow compared
> to the sum total of all code in the project, for which we simply free
> resources when we're done with them, and it becomes hard to keep the
> specialized UNLEAK() knowledge in one's head.
> 
> Speaking from personal experience, the several times I have had to
> deal with UNLEAK(), I had to re-learn it from scratch each time. That
> meant studying the header comment, studying the implementation, and
> studying existing callers before things "clicked" enough to be able to
> feel confident about how to use it (assuming it wasn't false
> confidence).

I think this is really the meat of it. I never intended UNLEAK() to be
something people dealt with unless they were trying to get LSAN or
valgrind to run without complaining.

> That all represents a lot of cognitive overhead versus the common
> practice of simply freeing resources when you're done with them, which
> requires no extra cognitive load since it is something we think about
> _always_ when working with a language like C with no built-in garbage
> collection.

To be clear, I have no problem with _actually_ freeing resources if
that's an option. The point of UNLEAK() was:

  - to help with structs that don't have an easy way to free all
    elements (e.g., rev_info)

  - to preempt arguments about whether calling free(buf) right before
    program exit is wasted effort. UNLEAK(), by contrast, is truly
    zero-cost in non-leak-checking builds.

  - to avoid asking people to rewrite:

      return foo(bar);

    into:

      ret = foo(bar);
      free(bar);
      return ret;
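
The three motivations above can be illustrated side by side. In this sketch `foo()`, `dup_str()` and `struct walk_state` are hypothetical (the struct only loosely resembles something like rev_info that lacks a single release function), and UNLEAK() uses its regular-build no-op definition. In a leak-checking build, one UNLEAK() on a struct should cover every heap pointer stored in it, assuming, as we understand git's implementation to do, that the macro copies the variable's bytes into reachable memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Regular-build definition: zero cost, nothing is emitted. */
#define UNLEAK(var) do {} while (0)

static char *dup_str(const char *s)
{
	size_t len = strlen(s);
	char *p = malloc(len + 1);

	if (!p)
		abort();
	return memcpy(p, s, len + 1);
}

/* Hypothetical worker standing in for foo() in the text. */
static int foo(const char *bar)
{
	return (int)strlen(bar);
}

/* Explicit cleanup: needs a temporary to hold the return value. */
static int run_with_free(const char *arg)
{
	char *bar = dup_str(arg);
	int ret = foo(bar);

	free(bar);
	return ret;
}

/* Annotation: the original one-line return survives unchanged. */
static int run_with_unleak(const char *arg)
{
	char *bar = dup_str(arg);

	UNLEAK(bar);
	return foo(bar);
}

/* Hypothetical struct with several allocations and no release helper. */
struct walk_state {
	char *name;
	int *counts;
	size_t nr;
};

static size_t run_walk(void)
{
	struct walk_state st;

	st.name = dup_str("HEAD");
	st.nr = 4;
	st.counts = calloc(st.nr, sizeof(*st.counts));
	if (!st.counts)
		abort();
	/* One annotation on the whole struct instead of freeing each member. */
	UNLEAK(st);
	return st.nr;
}
```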

So we could go that direction, but I'd wait on it until somebody feels
like sinking some time into getting us leak-checker-clean.

In the meantime, I have a slight preference to leave UNLEAK() there as a
potential tool for somebody digging into leak-checkers. But we almost
certainly shouldn't be asking new authors to use it in reviews, etc.
TBH, I'm not sure why people started sprinkling UNLEAK() around in the
first place. ;)

-Peff

Eric Sunshine Aug. 14, 2020, 4:23 p.m. UTC | #3
On Fri, Aug 14, 2020 at 6:35 AM Jeff King <peff@peff.net> wrote:
> On Thu, Aug 13, 2020 at 03:32:56PM -0400, Eric Sunshine wrote:
> > That all represents a lot of cognitive overhead versus the common
> > practice of simply freeing resources when you're done with them, which
> > requires no extra cognitive load since it is something we think about
> > _always_ when working with a language like C with no built-in garbage
> > collection.
>
> In the meantime, I have a slight preference to leave UNLEAK() there as a
> potential tool for somebody digging into leak-checkers. But we almost
> certainly shouldn't be asking new authors to use it in reviews, etc.

I don't think it works that way in practice, though. There are enough
UNLEAK()'s sprinkled around that anyone working on or around code with
an existing UNLEAK() is compelled to understand/[re-]study it in order
to avoid breaking existing uses and/or to correctly mirror existing
uses when dealing with new resource allocations.

The same applies to patches. As a reviewer, I have two choices when I
see UNLEAK(): either I ignore it because I don't have the specialized
knowledge in my head (which makes me feel like my review is
ineffective), or I re-acquire the knowledge. And it's not just patches
like the ones in this series, which actively adjust UNLEAK() callers;
it's any patch which adds or removes an UNLEAK() as part of its central,
meaty changes, or even a patch in which UNLEAK() appears only in context
lines. It even applies to patches which don't contain any UNLEAK() calls
at all, if the source file to which the patch applies does use UNLEAK()
and the reviewer consults the original source code in addition to the
patch.

> TBH, I'm not sure why people started sprinkling UNLEAK() around in the
> first place. ;)

For the same reason that people are concerned about calling free() or
otherwise releasing or unlocking resources which they have acquired:
they're trying to be responsible. When a programmer sees UNLEAK()
being used in or around the code being changed, he or she will attempt
to maintain the fidelity of the existing code by being careful to
mimic existing nearby resource handling practices.