diff mbox series

[RFC,2/2] reftable: don't memset() a NULL from failed malloc()

Message ID RFC-patch-2.2-364d1194a95-20220415T101740Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series reftable: remove poor man's SANITIZE=address, fix a memset() bug | expand

Commit Message

Ævar Arnfjörð Bjarmason April 15, 2022, 10:21 a.m. UTC
Return NULL instead of possibly feeding a NULL to memset() in
reftable_calloc(). This issue was noted by GCC 12's -fanalyzer:

	reftable/publicbasics.c: In function ‘reftable_calloc’:
	reftable/publicbasics.c:43:9: error: use of possibly-NULL ‘p’ where non-null expected [CWE-690] [-Werror=analyzer-possible-null-argument]
	   43 |         memset(p, 0, sz);
	      |         ^~~~~~~~~~~~~~~~
	[...]

This bug has been with us ever since this code was added in
ef8a6c62687 (reftable: utility functions, 2021-10-07).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 reftable/publicbasics.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

René Scharfe April 15, 2022, 1:37 p.m. UTC | #1
Am 15.04.22 um 12:21 schrieb Ævar Arnfjörð Bjarmason:
> Return NULL instead of possibly feeding a NULL to memset() in
> reftable_calloc(). This issue was noted by GCC 12's -fanalyzer:
>
> 	reftable/publicbasics.c: In function ‘reftable_calloc’:
> 	reftable/publicbasics.c:43:9: error: use of possibly-NULL ‘p’ where non-null expected [CWE-690] [-Werror=analyzer-possible-null-argument]
> 	   43 |         memset(p, 0, sz);
> 	      |         ^~~~~~~~~~~~~~~~
> 	[...]
>
> This bug has been with us ever since this code was added in
> ef8a6c62687 (reftable: utility functions, 2021-10-07).

Not sure it's a bug, or limited to this specific location.  AFAICS it's
more a general lack of handling of allocation failures in the reftable
code.  Perhaps it was a conscious decision to let the system deal with
them via segfaults?

When the code is used by Git eventually it should use xmalloc and
xrealloc instead of calling malloc(3) and realloc(3) directly, to
handle allocation failures explicitly, and to support GIT_ALLOC_LIMIT.
This will calm down the analyzer and extend the safety to the rest of
the reftable code, not just this memset call.

>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  reftable/publicbasics.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/reftable/publicbasics.c b/reftable/publicbasics.c
> index 0ad7d5c0ff2..a18167f5ab7 100644
> --- a/reftable/publicbasics.c
> +++ b/reftable/publicbasics.c
> @@ -40,6 +40,8 @@ void reftable_free(void *p)
>  void *reftable_calloc(size_t sz)
>  {
>  	void *p = reftable_malloc(sz);
> +	if (!p)
> +		return NULL;
>  	memset(p, 0, sz);

This function is calloc(3) reimplemented, except it can't make use of
pre-zero'd blocks of memory and doesn't multiply element size and count
for the caller while checking for overflow, making it slower and harder
to use securely. :-/

>  	return p;
>  }
Ævar Arnfjörð Bjarmason April 15, 2022, 1:53 p.m. UTC | #2
On Fri, Apr 15 2022, René Scharfe wrote:

> Am 15.04.22 um 12:21 schrieb Ævar Arnfjörð Bjarmason:
>> Return NULL instead of possibly feeding a NULL to memset() in
>> reftable_calloc(). This issue was noted by GCC 12's -fanalyzer:
>>
>> 	reftable/publicbasics.c: In function ‘reftable_calloc’:
>> 	reftable/publicbasics.c:43:9: error: use of possibly-NULL ‘p’ where non-null expected [CWE-690] [-Werror=analyzer-possible-null-argument]
>> 	   43 |         memset(p, 0, sz);
>> 	      |         ^~~~~~~~~~~~~~~~
>> 	[...]
>>
>> This bug has been with us ever since this code was added in
>> ef8a6c62687 (reftable: utility functions, 2021-10-07).
>
> Not sure it's a bug, or limited to this specific location.  AFAICS it's
> more a general lack of handling of allocation failures in the reftable
> code.  Perhaps it was a conscious decision to let the system deal with
> them via segfaults?

I think it's just buggy, it tries to deal with malloc returning NULL in
other contexts.

> When the code is used by Git eventually it should use xmalloc and
> xrealloc instead of calling malloc(3) and realloc(3) directly, to
> handle allocation failures explicitly, and to support GIT_ALLOC_LIMIT.
> This will calm down the analyzer and extend the safety to the rest of
> the reftable code, not just this memset call.

Yeah, its whole custom malloc handling should go away.

>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  reftable/publicbasics.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/reftable/publicbasics.c b/reftable/publicbasics.c
>> index 0ad7d5c0ff2..a18167f5ab7 100644
>> --- a/reftable/publicbasics.c
>> +++ b/reftable/publicbasics.c
>> @@ -40,6 +40,8 @@ void reftable_free(void *p)
>>  void *reftable_calloc(size_t sz)
>>  {
>>  	void *p = reftable_malloc(sz);
>> +	if (!p)
>> +		return NULL;
>>  	memset(p, 0, sz);
>
> This function is calloc(3) reimplemented, except it can't make use of
> pre-zero'd blocks of memory and doesn't multiply element size and count
> for the caller while checking for overflow, making it slower and harder
> to use securely. :-/

*nod*, this is really just re-arranging the deck chairs.

Maybe Han-Wen had something in mind, but I really don't see the point of
having it use malloc wrappers at all, as opposed to just having the
linker point it to the right "malloc".

So if it needs to be used outside of git.git it would just need the
trivial xmalloc() etc. wrappers.
Phillip Wood April 15, 2022, 2:30 p.m. UTC | #3
Hi Ævar and René

On 15/04/2022 14:53, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Apr 15 2022, René Scharfe wrote:
> 
>> Am 15.04.22 um 12:21 schrieb Ævar Arnfjörð Bjarmason:
>>> Return NULL instead of possibly feeding a NULL to memset() in
>>> reftable_calloc(). This issue was noted by GCC 12's -fanalyzer:
>>>
>>> 	reftable/publicbasics.c: In function ‘reftable_calloc’:
>>> 	reftable/publicbasics.c:43:9: error: use of possibly-NULL ‘p’ where non-null expected [CWE-690] [-Werror=analyzer-possible-null-argument]
>>> 	   43 |         memset(p, 0, sz);
>>> 	      |         ^~~~~~~~~~~~~~~~
>>> 	[...]
>>>
>>> This bug has been with us ever since this code was added in
>>> ef8a6c62687 (reftable: utility functions, 2021-10-07).
>>
>> Not sure it's a bug, or limited to this specific location.  AFAICS it's
>> more a general lack of handling of allocation failures in the reftable
>> code.  Perhaps it was a conscious decision to let the system deal with
>> them via segfaults?
> 
> I think it's just buggy, it tries to deal with malloc returning NULL in
> other contexts.

Does it? I just quickly scanned the output of

git grep -e 'reftable_[mc]alloc' -e 'reftable_realloc' -A2 origin/master

and I didn't see a single NULL check

>> When the code is used by Git eventually it should use xmalloc and
>> xrealloc instead of calling malloc(3) and realloc(3) directly, to
>> handle allocation failures explicitly, and to support GIT_ALLOC_LIMIT.
>> This will calm down the analyzer and extend the safety to the rest of
>> the reftable code, not just this memset call.
> 
> Yeah, its whole custom malloc handling should go away.

Isn't it there so the same code can be used by libgit2 and other things 
that let the caller specify a custom allocator?

Best Wishes

Phillip
Ævar Arnfjörð Bjarmason April 15, 2022, 3:20 p.m. UTC | #4
On Fri, Apr 15 2022, Phillip Wood wrote:

> Hi Ævar and René
>
> On 15/04/2022 14:53, Ævar Arnfjörð Bjarmason wrote:
>> On Fri, Apr 15 2022, René Scharfe wrote:
>> 
>>> Am 15.04.22 um 12:21 schrieb Ævar Arnfjörð Bjarmason:
>>>> Return NULL instead of possibly feeding a NULL to memset() in
>>>> reftable_calloc(). This issue was noted by GCC 12's -fanalyzer:
>>>>
>>>> 	reftable/publicbasics.c: In function ‘reftable_calloc’:
>>>> 	reftable/publicbasics.c:43:9: error: use of possibly-NULL ‘p’ where non-null expected [CWE-690] [-Werror=analyzer-possible-null-argument]
>>>> 	   43 |         memset(p, 0, sz);
>>>> 	      |         ^~~~~~~~~~~~~~~~
>>>> 	[...]
>>>>
>>>> This bug has been with us ever since this code was added in
>>>> ef8a6c62687 (reftable: utility functions, 2021-10-07).
>>>
>>> Not sure it's a bug, or limited to this specific location.  AFAICS it's
>>> more a general lack of handling of allocation failures in the reftable
>>> code.  Perhaps it was a conscious decision to let the system deal with
>>> them via segfaults?
>> I think it's just buggy, it tries to deal with malloc returning NULL
>> in
>> other contexts.
>
> Does it? I just quickly scanned the output of
>
> git grep -e 'reftable_[mc]alloc' -e 'reftable_realloc' -A2 origin/master
>
> and I didn't see a single NULL check

I think you're right, I retrieved that information from wetware. Looking
again I think I was confusing it with the if (x) free(x) fixes in
34230514b83 (Merge branch 'hn/reftable-coverity-fixes', 2022-02-16).

>>> When the code is used by Git eventually it should use xmalloc and
>>> xrealloc instead of calling malloc(3) and realloc(3) directly, to
>>> handle allocation failures explicitly, and to support GIT_ALLOC_LIMIT.
>>> This will calm down the analyzer and extend the safety to the rest of
>>> the reftable code, not just this memset call.
>> Yeah, its whole custom malloc handling should go away.
>
> Isn't it there so the same code can be used by libgit2 and other
> things that let the caller specify a custom allocator?

I don't think so, but I may be wrong. I think it's a case of
over-generalization where we'd be better off with something simpler,
i.e. just using our normal allocation functions.

That still allows for taking this code and plugging it into any custom
allocator. If you simply want to compile some code such as this to use
another allocator you can easily do that via the linker, as
e.g. git.git's build process allows you to do with nedmalloc.

So presumably if libgit2 would want to use this they'd just do that via
their build process.

I.e. they'd have "malloc" point to a symbol that would resolve to a
shimmy layer that would dispatch to functions using this:
https://github.com/libgit2/libgit2/blob/main/src/util/alloc.c

The reason you'd use these sorts of pluggable malloc functions is, I
think, a few:

 1. You are a library like libgit2, as opposed to libreftable itself, so
    you want to provide this sort of "set the alloc pointers to this"
    now. But in this case we either always want malloc/free (git.git) or
    the "parent" can provide these wrappers (libgit2).

 2. You want to support switching dynamically (or not-globally), or have
    N instances with different malloc, as e.g. the PCREv2 API does:
    cbe81e653fa (grep/pcre2: move back to thread-only PCREv2 structures,
    2021-02-18).

    For this code I think that's thoroughly in YAGNI territory.

 3. Your distribution model can't assume the user can adjust this via
    linking, e.g. C source that's intended to be #included as-is
    (although there you could use defines...) etc.

But maybe I'm missing something.

But a really good reason not to do this and just rely on the link-time
overriding is:

 A. Things look the same, and benefit from more general fixes (although
    to be fair the x*alloc() wrappers would work either way..)

 B. You don't get subtle bugs where you forget some_custom_malloc() and
    then use a generic free(). Having looked just now reftable/ has at
    least one such submarine-segfault waiting to happen.

 C. If you don't actually need the hyper-customize pluggable model of
    say PCREv2 where the library is trying to support having two
    patterns in memory managed by different allocators, or other such
    advanced use-cases it's just extra complexity.

Some of this was discussed for xdiff/* in this thread:
https://lore.kernel.org/git/220216.86leybszht.gmgdl@evledraar.gmail.com/
Junio C Hamano April 15, 2022, 4:23 p.m. UTC | #5
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Does it? I just quickly scanned the output of
>>
>> git grep -e 'reftable_[mc]alloc' -e 'reftable_realloc' -A2 origin/master
>>
>> and I didn't see a single NULL check
>
> I think you're right, I retrieved that information from wetware. Looking
> again I think I was confusing it with the if (x) free(x) fixes in
> 34230514b83 (Merge branch 'hn/reftable-coverity-fixes', 2022-02-16).

True.  Initially I hoped that these RFC changes gives us a better
solution that comes from stepping back and rethinking the issues
around the original "why are we calling memset() with NULL?", and
if it were only "well, in _return_block() functions, we clear the
block before calling _free()---that shouldn't be necessary, if the
calling custom malloc-free pair cares, they can do the clearing, and
plain vanilla free() certainly doesn't, so let's stop calling
memset()", it certainly would have fallen into that category, but
everything these RFC patches do beyond that seems "oh, it does not
look important to me, so let's rip it out to simplify", without
knowing enough to say if that is sensible.

But ...

>> Isn't it there so the same code can be used by libgit2 and other
>> things that let the caller specify a custom allocator?
>
> I don't think so, but I may be wrong. I think it's a case of
> over-generalization where we'd be better off with something simpler,
> i.e. just using our normal allocation functions.

... many points that was raised on the reftable code in this thread
may deserve response to explain the rationale behind the current
code and design, as nobody can improve, without breaking the
intended way to be used, without knowing what it is.  Thanks for
marking the patches with RFC.  I think the "patch" is a bit too
dense a form to point out the issues in the existing code, but the
discussion that follows by quoting the points in the original code
and suggested changes is a good way to think about the code before
these RFC patches.

Ideally, it would have been much better if these points were raised
during the review before the code was accepted to the code base, but
it is better to have one now, rather than never ;-)

THanks.
Han-Wen Nienhuys April 25, 2022, 10:30 a.m. UTC | #6
On Fri, Apr 15, 2022 at 6:23 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> git grep -e 'reftable_[mc]alloc' -e 'reftable_realloc' -A2 origin/master
> >>
> >> and I didn't see a single NULL check
> >
> > I think you're right, I retrieved that information from wetware. Looking
> > again I think I was confusing it with the if (x) free(x) fixes in
> > 34230514b83 (Merge branch 'hn/reftable-coverity-fixes', 2022-02-16).
>
> True.  Initially I hoped that these RFC changes gives us a better
> solution that comes from stepping back and rethinking the issues
> around the original "why are we calling memset() with NULL?", and

memset with NULL is an oversight.

The malloc routines are pluggable so the code could be reused for
libgit2. However, as use within Git itself is still not imminent, they
could just as well be removed as they are just a premature
generalization right now.

> if it were only "well, in _return_block() functions, we clear the
> block before calling _free()---that shouldn't be necessary, if the
> calling custom malloc-free pair cares, they can do the clearing, and
> plain vanilla free() certainly doesn't, so let's stop calling

The memset() calls on free() (eg. in  are there to tease out
use-after-free bugs in the unittests, but they should probably be
removed from file_return_block() as that is production code.
diff mbox series

Patch

diff --git a/reftable/publicbasics.c b/reftable/publicbasics.c
index 0ad7d5c0ff2..a18167f5ab7 100644
--- a/reftable/publicbasics.c
+++ b/reftable/publicbasics.c
@@ -40,6 +40,8 @@  void reftable_free(void *p)
 void *reftable_calloc(size_t sz)
 {
 	void *p = reftable_malloc(sz);
+	if (!p)
+		return NULL;
 	memset(p, 0, sz);
 	return p;
 }