[0/4] use xmalloc in more places

Message ID	20190411134736.GA28543@sigill.intra.peff.net (mailing list archive)
Headers	show Return-Path: <git-owner@kernel.org> Date: Thu, 11 Apr 2019 09:47:36 -0400 From: Jeff King <peff@peff.net> To: git@vger.kernel.org Cc: =?utf-8?b?546L5YGl5by6?= <jianqiang.wang@securitygossip.com> Subject: [PATCH 0/4] use xmalloc in more places Message-ID: <20190411134736.GA28543@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Sender: git-owner@vger.kernel.org Precedence: bulk
Series	use xmalloc in more places \| expand [0/4] use xmalloc in more places [1/4] test-prio-queue: use xmalloc [2/4] xdiff: use git-compat-util [3/4] xdiff: use xmalloc/xrealloc [4/4] progress: use xmalloc/xcalloc

Message ID

20190411134736.GA28543@sigill.intra.peff.net (mailing list archive)

Headers

Date: Thu, 11 Apr 2019 09:47:36 -0400
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: =?utf-8?b?546L5YGl5by6?= <jianqiang.wang@securitygossip.com>
Subject: [PATCH 0/4] use xmalloc in more places
Message-ID: <20190411134736.GA28543@sigill.intra.peff.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

use xmalloc in more places | expand

Message

Jeff King April 11, 2019, 1:47 p.m. UTC

It was reported on the Git security list that there are a few spots
which use a bare malloc() but don't check the return value, which could
dereference NULL. I don't think any of these are exploitable in an
interesting way, beyond Git just segfaulting more or less immediately.

But we should still be handling failures, and I think it makes sense to
be consistent about how we do it (and the other rules which come with
using xmalloc, like GIT_ALLOC_LIMIT).

This series cleans up most of the bare calls found by:

  git grep -E '[^a-z_](m|c|re)alloc\(' '*.c' :^compat :^contrib :^wrapper.c

The calls I've left are:

  - wrapper.c obviously needs to call the real functions :)

  - compat/ has functions emulating libc and system calls, and which are
    expected to return ENOMEM as appropriate

  - diff-delta will gracefully return NULL when trying to delta
    something too large, and pack-objects will skip past trying to find
    a delta. I've never seen this happen in practice, but then I
    primarily use Linux which is more than happy to overcommit on
    malloc(). I've left it unchanged, though possibly we could have an
    xmalloc_gently() if we want to enforce things like GIT_ALLOC_LIMIT
    but still do the graceful fallback thing.

  - test-hash tries to probe malloc() to see how big a buffer it can
    allocate. I doubt this even does anything useful on systems like
    Linux that overcommit. We also don't seem to ever invoke this with a
    buffer larger than 8k in the first place. So it could maybe go away
    entirely, but I left it here.

  [1/4]: test-prio-queue: use xmalloc
  [2/4]: xdiff: use git-compat-util
  [3/4]: xdiff: use xmalloc/xrealloc
  [4/4]: progress: use xmalloc/xcalloc

 progress.c                 | 18 +++++-------------
 t/helper/test-prio-queue.c |  2 +-
 xdiff/xdiff.h              |  4 ++--
 xdiff/xinclude.h           |  8 +-------
 4 files changed, 9 insertions(+), 23 deletions(-)

Comments

Taylor Blau April 11, 2019, 7:14 p.m. UTC | #1

Hi Peff,

On Thu, Apr 11, 2019 at 09:47:36AM -0400, Jeff King wrote:
> It was reported on the Git security list that there are a few spots
> which use a bare malloc() but don't check the return value, which could
> dereference NULL. I don't think any of these are exploitable in an
> interesting way, beyond Git just segfaulting more or less immediately.

Good; at least none of these seem to be exploitable for nefarious
purposes. Thanks for posting some patches on the public list.

> But we should still be handling failures, and I think it makes sense to
> be consistent about how we do it (and the other rules which come with
> using xmalloc, like GIT_ALLOC_LIMIT).
>
> This series cleans up most of the bare calls found by:
>
>   git grep -E '[^a-z_](m|c|re)alloc\(' '*.c' :^compat :^contrib :^wrapper.c

This (admittedly pretty awesome) 'git grep' invocation reminds me of a
series I was pretty sure you wrote to ban functions like 'strcpy' and
other obviously bad things.

Some quick searching turned up [1], which landed as f225611d1c
(automatically ban strcpy(), 2018-07-26). Do we want something similar
here? Of course, the locations below would have to be exempt, but it
seems worthwhile (and would save a review cycle in the case that someone
added a 'malloc' in a patch sent here).

FWIW, there isn't any mention of 'malloc' anywhere in that original
thread [1], so I _think_ this would be the first time discussing banning
malloc in this fashion.

> The calls I've left are:
>
>   - wrapper.c obviously needs to call the real functions :)

Yep -- this one needs to stay ;-).

>   - compat/ has functions emulating libc and system calls, and which are
>     expected to return ENOMEM as appropriate
>
>   - diff-delta will gracefully return NULL when trying to delta
>     something too large, and pack-objects will skip past trying to find
>     a delta. I've never seen this happen in practice, but then I
>     primarily use Linux which is more than happy to overcommit on
>     malloc(). I've left it unchanged, though possibly we could have an
>     xmalloc_gently() if we want to enforce things like GIT_ALLOC_LIMIT
>     but still do the graceful fallback thing.
>
>   - test-hash tries to probe malloc() to see how big a buffer it can
>     allocate. I doubt this even does anything useful on systems like
>     Linux that overcommit. We also don't seem to ever invoke this with a
>     buffer larger than 8k in the first place. So it could maybe go away
>     entirely, but I left it here.
>
>   [1/4]: test-prio-queue: use xmalloc
>   [2/4]: xdiff: use git-compat-util
>   [3/4]: xdiff: use xmalloc/xrealloc
>   [4/4]: progress: use xmalloc/xcalloc
>
>  progress.c                 | 18 +++++-------------
>  t/helper/test-prio-queue.c |  2 +-
>  xdiff/xdiff.h              |  4 ++--
>  xdiff/xinclude.h           |  8 +-------
>  4 files changed, 9 insertions(+), 23 deletions(-)
>
Thanks,
Taylor

[1]: https://public-inbox.org/git/20180719203901.GA8079@sigill.intra.peff.net/

Jeff King April 11, 2019, 7:37 p.m. UTC | #2

On Thu, Apr 11, 2019 at 12:14:52PM -0700, Taylor Blau wrote:

> > This series cleans up most of the bare calls found by:
> >
> >   git grep -E '[^a-z_](m|c|re)alloc\(' '*.c' :^compat :^contrib :^wrapper.c
> 
> This (admittedly pretty awesome) 'git grep' invocation reminds me of a
> series I was pretty sure you wrote to ban functions like 'strcpy' and
> other obviously bad things.
> 
> Some quick searching turned up [1], which landed as f225611d1c
> (automatically ban strcpy(), 2018-07-26). Do we want something similar
> here? Of course, the locations below would have to be exempt, but it
> seems worthwhile (and would save a review cycle in the case that someone
> added a 'malloc' in a patch sent here).

I don't think we can ban malloc, since we have to use it ourselves. :)

With some contortions, we probably could unban it specifically in
wrapper.c (though note there are a few other calls I've left which would
need to be handled somehow).

Another option would be coccinelle patches to convert malloc() to
xmalloc(), etc (with an exception for the wrappers). I'm not entirely
comfortable with automatic conversion here because there's often some
follow-up adjustments (i.e., we can stop handling allocation errors and
maybe delete some code). I think coccinelle can identify callers and
barf, though.

I'm not sure whether coccinelle saves review cycles (since that implies
people actually run it, though maybe that is better now that it's part
of Travis?). It seems to me that it's usually more helpful for people
periodically doing follow-up auditing.

So I dunno. If this was a common mistake I'd be more concerned with
saving review cycles, but all of the cases I found were actually just
leftovers from the very early days of Git.

-Peff

Taylor Blau April 11, 2019, 7:43 p.m. UTC | #3

Hi Peff,

On Thu, Apr 11, 2019 at 03:37:35PM -0400, Jeff King wrote:
> On Thu, Apr 11, 2019 at 12:14:52PM -0700, Taylor Blau wrote:
>
> > > This series cleans up most of the bare calls found by:
> > >
> > >   git grep -E '[^a-z_](m|c|re)alloc\(' '*.c' :^compat :^contrib :^wrapper.c
> >
> > This (admittedly pretty awesome) 'git grep' invocation reminds me of a
> > series I was pretty sure you wrote to ban functions like 'strcpy' and
> > other obviously bad things.
> >
> > Some quick searching turned up [1], which landed as f225611d1c
> > (automatically ban strcpy(), 2018-07-26). Do we want something similar
> > here? Of course, the locations below would have to be exempt, but it
> > seems worthwhile (and would save a review cycle in the case that someone
> > added a 'malloc' in a patch sent here).
>
> I don't think we can ban malloc, since we have to use it ourselves. :)
>
> With some contortions, we probably could unban it specifically in
> wrapper.c (though note there are a few other calls I've left which would
> need to be handled somehow).

Right. I think that I should have made this point clearer in my initial
reply. I was thinking that we could #undef the banned macro in
wrapper.c, or some similar hula-hooping.

> Another option would be coccinelle patches to convert malloc() to
> xmalloc(), etc (with an exception for the wrappers). I'm not entirely
> comfortable with automatic conversion here because there's often some
> follow-up adjustments (i.e., we can stop handling allocation errors and
> maybe delete some code). I think coccinelle can identify callers and
> barf, though.
>
> I'm not sure whether coccinelle saves review cycles (since that implies
> people actually run it, though maybe that is better now that it's part
> of Travis?). It seems to me that it's usually more helpful for people
> periodically doing follow-up auditing.
>
> So I dunno. If this was a common mistake I'd be more concerned with
> saving review cycles, but all of the cases I found were actually just
> leftovers from the very early days of Git.

Yeah... maybe that's the bigger question that I hadn't asked. I made the
suggestion thinking that it would help newcomers avoid writing
'malloc()' and sending it if they didn't know we use our 'xmalloc()'
instead.

But I'm not sure if the argument holds up. I think that in general
exactly the sorts of new-comers that I'm thinking of would have more
than one review cycle anyway, so it might not be worth the effort,
anyway...

> -Peff

Thanks,
Taylor

Jeff King April 11, 2019, 8:04 p.m. UTC | #4

On Thu, Apr 11, 2019 at 12:43:08PM -0700, Taylor Blau wrote:

> > I don't think we can ban malloc, since we have to use it ourselves. :)
> >
> > With some contortions, we probably could unban it specifically in
> > wrapper.c (though note there are a few other calls I've left which would
> > need to be handled somehow).
> 
> Right. I think that I should have made this point clearer in my initial
> reply. I was thinking that we could #undef the banned macro in
> wrapper.c, or some similar hula-hooping.

That _probably_ works, but I think technically falls afoul of platforms
on which there's a malloc macro in the first place. We need to not just
#undef it then, but restore the original macro, which is impossible. So
you're better off just not fudging it in the first place. Which would
probably be something like:

  #define SUPPRESS_BAN_MALLOC
  #include "git-compat-util.h"

or something, with the appropriate magic in banned.h.

This might be academic, but you wouldn't know until somebody's platform
subtly breaks. ;) We did run into it already with strcpy(), I think,
hence the defensive #undefs in banned.h.

> Yeah... maybe that's the bigger question that I hadn't asked. I made the
> suggestion thinking that it would help newcomers avoid writing
> 'malloc()' and sending it if they didn't know we use our 'xmalloc()'
> instead.
> 
> But I'm not sure if the argument holds up. I think that in general
> exactly the sorts of new-comers that I'm thinking of would have more
> than one review cycle anyway, so it might not be worth the effort,
> anyway...

I think it's still a reasonable thought, even if I'm not sure the
balance of cost/reward is quite there so far (but might change if it's
an error we see people start to make). Compared to coccinelle, the
banned-function approach is a little nicer for helping new submitters
because it catches the problem during a normal compile (and we know
nobody would ever submit a patch without having at least compiled it,
right? ;) ).

-Peff