[0/2] Generate temporary files using a CSPRNG

Message ID 20211116033542.3247094-1-sandals@crustytoothpaste.net
Series Generate temporary files using a CSPRNG

Message

brian m. carlson Nov. 16, 2021, 3:35 a.m. UTC
Currently, when we generate a temporary file name, we use the seconds,
microseconds, and the PID to generate a unique value.  The resulting
value, while changing frequently, is actually predictable and on some
systems, it may be possible to cause a DoS by creating all potential
temporary files when the temporary file is being created in TMPDIR.

The solution to this is to use the system CSPRNG to generate the
temporary file name.  This is the approach taken by FreeBSD, NetBSD, and
OpenBSD, and glibc also recently switched to this approach from an
approach that resembled ours in many ways.
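The cover letter describes the approach but not the code. Here is a minimal sketch, assuming a POSIX system with /dev/urandom. It fills the six X characters of a mkstemps()-style pattern from the CSPRNG instead of from the time and PID. It reuses the `v % num_letters` loop and the `O_CREAT | O_EXCL` open that appear in the git_mkstemps_mode() diff context quoted later in this thread. The function name, the 62-character alphabet, and the retry limit are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical retry limit; the series may use a different bound. */
#define SKETCH_MAX_ATTEMPTS 100

/* Assumed 62-character alphabet for the random part of the name. */
static const char letters[] =
	"abcdefghijklmnopqrstuvwxyz"
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	"0123456789";

/*
 * Hypothetical sketch: replace the "XXXXXX" that precedes a suffix of
 * suffix_len bytes in pattern with letters derived from /dev/urandom,
 * then create the file exclusively.  Returns an open fd, or -1 with
 * errno set.
 */
int mkstemps_random_sketch(char *pattern, int suffix_len, int mode)
{
	size_t len = strlen(pattern);
	size_t num_letters = sizeof(letters) - 1;
	char *tmpl;
	FILE *rng;
	int saved_errno;

	if (suffix_len < 0 || len < 6 + (size_t)suffix_len ||
	    strncmp(&pattern[len - 6 - suffix_len], "XXXXXX", 6)) {
		errno = EINVAL;
		return -1;
	}
	tmpl = &pattern[len - 6 - suffix_len];

	rng = fopen("/dev/urandom", "rb");
	if (!rng)
		return -1;

	for (int attempt = 0; attempt < SKETCH_MAX_ATTEMPTS; attempt++) {
		uint64_t v;
		int fd;

		/* Fresh, unpredictable 64-bit value for every attempt. */
		if (fread(&v, sizeof(v), 1, rng) != 1) {
			errno = EIO;
			break;
		}
		/* 62^6 is tiny compared with 2^64, so modulo bias is negligible. */
		for (int i = 0; i < 6; i++) {
			tmpl[i] = letters[v % num_letters];
			v /= num_letters;
		}
		fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
		if (fd >= 0) {
			fclose(rng);
			return fd;
		}
		if (errno != EEXIST)	/* e.g. EACCES or ENOSPC: retrying won't help */
			break;
	}

	saved_errno = errno;	/* fclose() may clobber errno */
	fclose(rng);
	errno = saved_errno;
	return -1;
}
```

Each attempt draws a new random value. An attacker who pre-creates files therefore cannot predict which names will be tried next, which is not the case when the value is derived from the time and PID.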

Even if this is not practically exploitable on many systems, it seems
prudent to be at least as careful about temporary file generation as
libc is.

This issue was mentioned on the security list and it was decided that
this was not sensitive enough to warrant a coordinated disclosure, a
sentiment with which I agree.  This is difficult to exploit on most
systems, but I think it's still worth fixing.

This series introduces two commits.  The first implements a generic
function which calls the system CSPRNG.  A reasonably exhaustive attempt
is made to pick from the options with a preference for performance.  The
second changes our temporary file code to use the CSPRNG.
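The helper's body is not shown in this thread. Here is a hedged sketch, assuming a Unix-like system where only the /dev/urandom fallback is used. It shows a function that fills a buffer from the CSPRNG and handles short reads and EINTR. The name csprng_bytes_sketch is hypothetical. The real series picks among several interfaces (arc4random and RtlGenRandom are among those mentioned in this thread) and prefers the faster ones.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill buf with len bytes from the system CSPRNG.
 * Only a /dev/urandom path is shown; a fuller version would first try
 * interfaces such as arc4random_buf() or RtlGenRandom() where available.
 * Returns 0 on success and -1 on failure.
 */
int csprng_bytes_sketch(void *buf, size_t len)
{
	unsigned char *p = buf;
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd < 0)
		return -1;
	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted by a signal: retry */
		if (n <= 0) {		/* read error or unexpected EOF */
			close(fd);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	close(fd);
	return 0;
}
```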

I have added a test helper that can emit bytes from the CSPRNG, as well
as a self-test mode.  The former is not used, but I anticipated it could
find utility in the testsuite, and it was useful for testing by hand, so
I included it.
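The self-test itself is not shown in this thread. The failure-probability formula given later in the cover letter counts the chance that at least one of the 256 byte values is missing from 65536 bytes. Judging from that, the self-test appears to read 64 KiB and check that every byte value occurs. Below is a minimal sketch of that check, assuming /dev/urandom as the source. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

#define SELFTEST_BYTES 65536	/* 64 KiB, the size discussed in the thread */

/* Return 1 if every one of the 256 byte values occurs in buf, else 0. */
int all_byte_values_present(const unsigned char *buf, size_t len)
{
	unsigned char seen[256] = {0};
	size_t distinct = 0;

	for (size_t i = 0; i < len && distinct < 256; i++) {
		if (!seen[buf[i]]) {
			seen[buf[i]] = 1;
			distinct++;
		}
	}
	return distinct == 256;
}

/*
 * Hypothetical self-test: read 64 KiB from /dev/urandom and check that no
 * byte value is missing.  Returns 0 on success, 1 if a value is missing,
 * and -1 if the bytes could not be read.
 */
int csprng_selftest_sketch(void)
{
	static unsigned char buf[SELFTEST_BYTES];
	FILE *f = fopen("/dev/urandom", "rb");
	size_t got;

	if (!f)
		return -1;
	got = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	if (got != sizeof(buf))
		return -1;
	return all_byte_values_present(buf, sizeof(buf)) ? 0 : 1;
}
```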

The careful reader will notice that the sole additional test is added to
t0000.  That's because temporary file generation is fundamental to how
Git operates and if it fails, the entire testsuite is broken.  Thus, a
simple test to verify that it's working seems prudent as part of t0000.
I was also unable to find a better place to put it, but am open to
suggestions if folks have ideas.

This passes our CI, including on Windows, and I have manually verified
the correctness of the other four branches on Linux (the HAVE_ARC4RANDOM
branch requiring a small patch which is not necessary on systems which
have it in libc and which is therefore not included here).

I am of course interested in hearing from anyone who lacks one of the
CSPRNG interfaces we have here.  Looking at the Go standard library,
/dev/urandom should be available on at least AIX, Darwin (macOS),
DragonflyBSD, FreeBSD, Linux, NetBSD, OpenBSD, and Solaris, and I
believe it is available on most other Unix systems as well.
RtlGenRandom is available on Windows back to XP, which we no longer
support.  The bizarre header contortion on Windows comes from Mozilla,
but is widely used in other codebases with no substantial changes.

For those who are interested, I computed the probability of spurious
failure for the self-test mode like so:

  256 * (255/256)^65536

This Ruby one-liner estimates the probability at approximately 10^-108:

  ruby -e 'a = 255 ** 65536; b = 256 ** 65536; puts b.to_s.length - a.to_s.length - 3'

If I have made an error in the calculation, please do feel free to point
it out.
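The formula reads as a union bound. Assume the N = 65536 bytes are independent and uniform. Then any given byte value is absent with probability (255/256)^N, and summing over the 256 possible values gives the expression above. Taking base-10 logarithms:

```latex
\begin{align*}
\Pr[\text{value } b \text{ absent}] &= \left(\tfrac{255}{256}\right)^{N}, \qquad N = 65536 \\
\Pr[\text{some value absent}] &\le 256\left(\tfrac{255}{256}\right)^{N} \\
\log_{10}\Pr[\text{some value absent}] &\le \log_{10}256 + N\log_{10}\tfrac{255}{256}
  \approx 2.408 - 111.397 \approx -108.99
\end{align*}
```

That is about 1.0 x 10^-109. The one-liner works from digit counts, so it only has integer precision, and its estimate of 10^-108 is, if anything, slightly conservative.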

brian m. carlson (2):
  wrapper: add a helper to generate numbers from a CSPRNG
  wrapper: use a CSPRNG to generate random file names

 Makefile                            | 25 ++++++++++
 compat/winansi.c                    |  6 +++
 config.mak.uname                    |  9 ++++
 contrib/buildsystems/CMakeLists.txt |  2 +-
 git-compat-util.h                   | 16 +++++++
 t/helper/test-csprng.c              | 63 +++++++++++++++++++++++++
 t/helper/test-tool.c                |  1 +
 t/helper/test-tool.h                |  1 +
 t/t0000-basic.sh                    |  4 ++
 wrapper.c                           | 71 ++++++++++++++++++++++++-----
 10 files changed, 186 insertions(+), 12 deletions(-)
 create mode 100644 t/helper/test-csprng.c

Comments

Jeff King Nov. 16, 2021, 3:44 p.m. UTC | #1
On Tue, Nov 16, 2021 at 03:35:40AM +0000, brian m. carlson wrote:

> For those who are interested, I computed the probability of spurious
> failure for the self-test mode like so:
> 
>   256 * (255/256)^65536
> 
> This Ruby one-liner estimates the probability at approximately 10^-108:
> 
>   ruby -e 'a = 255 ** 65536; b = 256 ** 65536; puts b.to_s.length - a.to_s.length - 3'
> 
> If I have made an error in the calculation, please do feel free to point
> it out.

Yes, I think your math is correct there.

A more interesting question is whether generating 64k of PRNG bytes per
test run is going to be a problem for system entropy pools. For that
matter, I guess the use of it for tempfiles will produce a similar
burden, since we run so many commands. My understanding is that modern
systems will just produce infinite output for /dev/urandom, etc, but I
wonder if there are any systems left where that is not true (because
they have a misguided notion that they need to stir in more "real"
entropy bits).

-Peff
Ævar Arnfjörð Bjarmason Nov. 16, 2021, 8:35 p.m. UTC | #2
On Tue, Nov 16 2021, brian m. carlson wrote:

> Currently, when we generate a temporary file name, we use the seconds,
> microseconds, and the PID to generate a unique value.  The resulting
> value, while changing frequently, is actually predictable and on some
> systems, it may be possible to cause a DoS by creating all potential
> temporary files when the temporary file is being created in TMPDIR.
>
> The solution to this is to use the system CSPRNG to generate the
> temporary file name.  This is the approach taken by FreeBSD, NetBSD, and
> OpenBSD, and glibc also recently switched to this approach from an
> approach that resembled ours in many ways.
>
> Even if this is not practically exploitable on many systems, it seems
> prudent to be at least as careful about temporary file generation as
> libc is.
>    
> This issue was mentioned on the security list and it was decided that
> this was not sensitive enough to warrant a coordinated disclosure, a
> sentiment with which I agree.  This is difficult to exploit on most
> systems, but I think it's still worth fixing.

I skimmed that report on the security list, and having skimmed this
patch series I think what's missing is something like this summary of
yours there (which I hope you don't mind me quoting):

    Now, in Git's case, I don't think our security model allows untrusted
    users to write directly into the repository, so I don't think this
    constitutes a vulnerability there.  We have a function that uses TMPDIR,
    which appears to be used for prepping temporary blobs in diffs and in
    GnuPG verification, which is definitely more questionable.

I tried testing this codepath real quick now with:
    
    diff --git a/wrapper.c b/wrapper.c
    index 36e12119d76..2f3755886fb 100644
    --- a/wrapper.c
    +++ b/wrapper.c
    @@ -497,6 +497,7 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
                            v /= num_letters;
                    }
     
    +               BUG("%s", pattern);
                    fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
                    if (fd >= 0)
                            return fd;
    
And then doing:

    grep BUG test-results/*.out

And the resulting output is all of the form:

    .git/objects/9f/tmp_obj_FOzEcZ
    .git/objects/pack/tmp_pack_fJC0RI

And a couple of:

    .git/info/refs_Lctaew

I.e. these are all cases where we're creating in-repo tempfiles, we're
not racing someone in /tmp/ for these, except perhaps in some cases I've
missed (but you allude to) where we presumably should just move those
into .git/tmp/, at least by default.

Doesn't that entirely solve this security problem going forward? If a
hostile actor can write into your .git/, they don't need to screw with
you in this way; they can just write executable aliases, or the same in
.git/hooks/.

Unless that is we do have some use-case for potentially racing others in
/tmp/, but then we could make that specifically configurable etc.

I really don't mind us having a better tempfile() function in principle,
but so far this sort of hardening just seems entirely unnecessary to me.

As seen from your implementation, this requires us to dip our toes into
seeding random data, which I'd think, from a security maintenance
perspective, we'd be much better off offloading to the OS going forward
if at all possible.

If there are cases where we actually need this hardening because we're
writing in a shared /tmp/ and not .git/, then surely we're better off
having those API users call a differently named function, or moving those
users to using a .git/tmp/ unless they configure things otherwise?
Jeff King Nov. 16, 2021, 9:06 p.m. UTC | #3
On Tue, Nov 16, 2021 at 09:35:59PM +0100, Ævar Arnfjörð Bjarmason wrote:

> I tried testing this codepath real quick now with:
>     
>     diff --git a/wrapper.c b/wrapper.c
>     index 36e12119d76..2f3755886fb 100644
>     --- a/wrapper.c
>     +++ b/wrapper.c
>     @@ -497,6 +497,7 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
>                             v /= num_letters;
>                     }
>      
>     +               BUG("%s", pattern);
>                     fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
>                     if (fd >= 0)
>                             return fd;
>     
> And then doing:
> 
>     grep BUG test-results/*.out
> 
> And the resulting output is all of the form:
> 
>     .git/objects/9f/tmp_obj_FOzEcZ
>     .git/objects/pack/tmp_pack_fJC0RI
> 
> And a couple of:
> 
>     .git/info/refs_Lctaew
> 
> I.e. these are all cases where we're creating in-repo tempfiles, we're
> not racing someone in /tmp/ for these, except perhaps in some cases I've
> missed (but you allude to) where we presumably should just move those
> into .git/tmp/, at least by default.

Your patch is way too aggressive. By bailing via BUG(), most commands
will fail, so we never get to the interesting ones (e.g., we would not
ever get to the point of writing out a tag signature for gpg to verify,
because we'd barf when trying to create the tag in the first place).

Try:

diff --git a/wrapper.c b/wrapper.c
index 36e12119d7..5218a4b3bd 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -497,6 +497,10 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 			v /= num_letters;
 		}
 
+		{
+			static struct trace_key t = TRACE_KEY_INIT(TEMPFILE);
+			trace_printf_key(&t, "%s", pattern);
+		}
 		fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
 		if (fd >= 0)
 			return fd;

And then:

  GIT_TRACE_TEMPFILE=/tmp/foo make test
  grep ^/tmp /tmp/foo | wc -l

turns up hundreds of hits.

> If there are cases where we actually need this hardening because we're
> writing in a shared /tmp/ and not .git/, then surely we're better off
> having those API users call a differently named function, or moving those
> users to using a .git/tmp/ unless they configure things otherwise?

Assuming you can write to .git/tmp means that conceptually read-only
operations (like verifying tags) require write access to the repository.

-Peff
brian m. carlson Nov. 16, 2021, 10:17 p.m. UTC | #4
On 2021-11-16 at 15:44:33, Jeff King wrote:
> On Tue, Nov 16, 2021 at 03:35:40AM +0000, brian m. carlson wrote:
> 
> > For those who are interested, I computed the probability of spurious
> > failure for the self-test mode like so:
> > 
> >   256 * (255/256)^65536
> > 
> > This Ruby one-liner estimates the probability at approximately 10^-108:
> > 
> >   ruby -e 'a = 255 ** 65536; b = 256 ** 65536; puts b.to_s.length - a.to_s.length - 3'
> > 
> > If I have made an error in the calculation, please do feel free to point
> > it out.
> 
> Yes, I think your math is correct there.
> 
> A more interesting question is whether generating 64k of PRNG bytes per
> test run is going to be a problem for system entropy pools. For that
> matter, I guess the use of it for tempfiles will produce a similar
> burden, since we run so many commands. My understanding is that modern
> systems will just produce infinite output for /dev/urandom, etc, but I
> wonder if there are any systems left where that is not true (because
> they have a misguided notion that they need to stir in more "real"
> entropy bits).

I have specifically avoided invoking any sort of potentially blocking
CSPRNG for that reason.  /dev/urandom is specifically not supposed to
block, and on the systems that I mentioned, the way Go uses it would
indicate that it should not.  There is a system, which is Plan 9, where
Go uses /dev/random to seed an X.917 generator, and there I assume there
is no /dev/urandom, but I also know full well that we are likely
completely broken on Plan 9 already, so this will be the least of the
required fixes.

RtlGenRandom is non-blocking, and as the commit message mentioned,
arc4random uses ChaCha20 in a non-blocking way on all systems I could
find, except MirBSD which uses RC4, also without blocking.  Linux's
CSPRNG is also non-blocking.

I've also looked at Rust's getrandom crate, which provides support for
various other systems, and I have no indication that any of the
interfaces I've provided are blocking in any way, since that crate would
not desire that behavior.  Looking at it just now, I did notice that
macOS supports getentropy, so if I need to do a reroll, I'll add an
option for that.
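getentropy() is available on macOS and on glibc 2.25 or later, but it rejects requests larger than 256 bytes, so a wrapper that accepts arbitrary lengths has to loop. Here is a hedged sketch. It assumes getentropy() is declared in <sys/random.h>, as on those systems; some other systems declare it in <unistd.h> instead. The wrapper name is hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>	/* getentropy() on macOS and glibc >= 2.25 */
#include <unistd.h>

/*
 * Hypothetical wrapper: getentropy() refuses requests larger than 256
 * bytes, so larger buffers are filled in 256-byte chunks.
 * Returns 0 on success and -1 on failure (errno set by getentropy()).
 */
int getentropy_bytes_sketch(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		size_t chunk = len < 256 ? len : 256;

		if (getentropy(p, chunk) != 0)
			return -1;
		p += chunk;
		len -= chunk;
	}
	return 0;
}
```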

So I don't think we're likely to run into a problem here.  If we do run
into systems with that problem, we can add an option to use libbsd,
which provides arc4random and company (using ChaCha20).  The tricky part
is that when using libbsd, arc4random is not in <stdlib.h> (since that's
a system header file) and is instead in <bsd/stdlib.h>.  However, it's
an easy change if we run into some uncommon system where that's the
case.

If we don't like the test, we can avoid running it by default at the
risk of seeing breakage go uncaught.
Randall S. Becker Nov. 16, 2021, 10:29 p.m. UTC | #5
On November 16, 2021 5:18 PM, brian m. carlson wrote:
> On 2021-11-16 at 15:44:33, Jeff King wrote:
> > On Tue, Nov 16, 2021 at 03:35:40AM +0000, brian m. carlson wrote:
> >
> > > For those who are interested, I computed the probability of spurious
> > > failure for the self-test mode like so:
> > >
> > >   256 * (255/256)^65536
> > >
> > > This Ruby one-liner estimates the probability at approximately 10^-108:
> > >
> > >   ruby -e 'a = 255 ** 65536; b = 256 ** 65536; puts b.to_s.length - a.to_s.length - 3'
> > >
> > > If I have made an error in the calculation, please do feel free to
> > > point it out.
> >
> > Yes, I think your math is correct there.
> >
> > A more interesting question is whether generating 64k of PRNG bytes
> > per test run is going to be a problem for system entropy pools. For that
> > matter, I guess the use of it for tempfiles will produce a similar
> > burden, since we run so many commands. My understanding is that modern
> > systems will just produce infinite output for /dev/urandom, etc, but I
> > wonder if there are any systems left where that is not true (because
> > they have a misguided notion that they need to stir in more "real"
> > entropy bits).
> 
> I have specifically avoided invoking any sort of potentially blocking CSPRNG
> for that reason.  /dev/urandom is specifically not supposed to block, and on
> the systems that I mentioned, the way Go uses it would indicate that it
> should not.  There is a system, which is Plan 9, where Go uses /dev/random
> to seed an X.917 generator, and there I assume there is no /dev/urandom,
> but I also know full well that we are likely completely broken on Plan 9
> already, so this will be the least of the required fixes.
> 
> RtlGenRandom is non-blocking, and as the commit message mentioned,
> arc4random uses ChaCha20 in a non-blocking way on all systems I could find,
> except MirBSD which uses RC4, also without blocking.  Linux's CSPRNG is also
> non-blocking.
> 
> I've also looked at Rust's getrandom crate, which provides support for
> various other systems, and I have no indication that any of the interfaces I've
> provided are blocking in any way, since that crate would not desire that
> behavior.  Looking at it just now, I did notice that macOS supports
> getentropy, so if I need to do a reroll, I'll add an option for that.
> 
> So I don't think we're likely to run into a problem here.  If we do run into
> systems with that problem, we can add an option to use libbsd, which
> provides arc4random and company (using ChaCha20).  The tricky part is that
> when using libbsd, arc4random is not in <stdlib.h> (since that's a system
> header file) and is instead in <bsd/stdlib.h>.  However, it's an easy change if
> we run into some uncommon system where that's the case.
> 
> If we don't like the test, we can avoid running it by default at the risk of
> seeing breakage go uncaught.

Adding these dependencies is also a problem. libbsd does not port to NonStop, and Go is not available there yet. Please stay at least somewhat POSIX-like. I'm begging because I do not want to lose git.
-Randall
Ævar Arnfjörð Bjarmason Nov. 17, 2021, 8:36 a.m. UTC | #6
On Tue, Nov 16 2021, Jeff King wrote:

> On Tue, Nov 16, 2021 at 09:35:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> I tried testing this codepath real quick now with:
>>     
>>     diff --git a/wrapper.c b/wrapper.c
>>     index 36e12119d76..2f3755886fb 100644
>>     --- a/wrapper.c
>>     +++ b/wrapper.c
>>     @@ -497,6 +497,7 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
>>                             v /= num_letters;
>>                     }
>>      
>>     +               BUG("%s", pattern);
>>                     fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
>>                     if (fd >= 0)
>>                             return fd;
>>     
>> And then doing:
>> 
>>     grep BUG test-results/*.out
>> 
>> And the resulting output is all of the form:
>> 
>>     .git/objects/9f/tmp_obj_FOzEcZ
>>     .git/objects/pack/tmp_pack_fJC0RI
>> 
>> And a couple of:
>> 
>>     .git/info/refs_Lctaew
>> 
>> I.e. these are all cases where we're creating in-repo tempfiles, we're
>> not racing someone in /tmp/ for these, except perhaps in some cases I've
>> missed (but you allude to) where we presumably should just move those
>> into .git/tmp/, at least by default.
>
> Your patch is way too aggressive. By bailing via BUG(), most commands
> will fail, so we never get to the interesting ones (e.g., we would not
> ever get to the point of writing out a tag signature for gpg to verify,
> because we'd barf when trying to create the tag in the first place).
>
> Try:
>
> diff --git a/wrapper.c b/wrapper.c
> index 36e12119d7..5218a4b3bd 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -497,6 +497,10 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
>  			v /= num_letters;
>  		}
>  
> +		{
> +			static struct trace_key t = TRACE_KEY_INIT(TEMPFILE);
> +			trace_printf_key(&t, "%s", pattern);
> +		}
>  		fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
>  		if (fd >= 0)
>  			return fd;
>
> And then:
>
>   GIT_TRACE_TEMPFILE=/tmp/foo make test
>   grep ^/tmp /tmp/foo | wc -l
>
> turns up hundreds of hits.

Thanks, there's a long tail of these, but I came up with this crappy
one-liner one regex at a time while looking at it:

    cat /tmp/git_mkstemps_mode.trace | perl -pe 's[/[0-9a-f]{2}/][/HH/]; s[/incoming-\K[^/]+][XXX]; s[/tmp/\K[^_]+][XXX]; s/tmp_(idx|obj|pack)_\K[a-zA-Z0-9]+$/XXX/; s[/objects/\K../][$1??/]g; s[^/run/user.*/objects/][<systemd run/user>/objects/]; s[(vtag_tmp|pack_|refs_)\K.*][XXX]; '|sort|uniq -c|sort -nr|less

Which gives us:

    893 .git/objects/pack/tmp_pack_XXX
    836 ./objects/??/tmp_obj_XXX
    722 .git/objects/pack/tmp_idx_XXX
    401 <systemd run/user>/objects/incoming-XXX/HH/tmp_obj_XXX
    366 /run/user/1001/tmp/XXX_pack_XXX
    289 <systemd run/user>/objects/??/tmp_obj_XXX
    261 .git/info/refs_XXX
    258 /tmp/XXX_vtag_tmpXXX
    185 clone.git/objects/??/tmp_obj_XXX
     77 /tmp/XXX_file
     72 marks-test/.git/objects/??/tmp_obj_XXX
     71 <systemd run/user>/objects/pack/tmp_pack_XXX
     69 <systemd run/user>/objects/pack/tmp_idx_XXX
     34 objects/pack/tmp_pack_XXX
     34 objects/pack/tmp_idx_XXX
     25 /run/user/1001/tmp/XXX.git/objects/??/tmp_obj_XXX
     20 info/refs_XXX
     12 /tmp/XXX_text
     12 foo.git/objects/??/tmp_obj_XXX

I.e. this is stuff that's either already in .git, or a small handful of
special-cases such as "git verify-tag".

>> If there are cases where we actually need this hardening because we're
>> writing in a shared /tmp/ and not .git/, then surely we're better off
>> having those API users call a differently named function, or moving those
>> users to using a .git/tmp/ unless they configure things otherwise?
>
> Assuming you can write to .git/tmp means that conceptually read-only
> operations (like verifying tags) require write access to the repository.

That leaves the "differently named function" which I think we should
really do in either case.

I.e. if I'm verifying lots of tags then I'm better off on a modern
systemd system using /run/user/`id -u`, as opposed to /tmp/ which is
often disk-backed. So being aware of $XDG_RUNTIME_DIR seems like a
sensible thing in either case.

And on those systems the DoS aspect of this becomes a non-issue, that
directory is only writable by one (non-super)user.

I think there's a big advantage to having any tricky CSPRNG-implementing
code in its own corner like that.

It means that, e.g., if gpg learns some mode to do this that doesn't
require tempfiles, and we're confident we don't otherwise create things
in /tmp, we could drop it, or users who don't want git shipping a
CSPRNG could compile it out.

But I really don't see why it isn't an acceptable solution for git to
just die here if we fail to create the Nth tempfile in a row.

Or something simpler like having the "git verify-tag" code fall back to
writing in say $HOME/.cache/git, which is another simple way to avoid
the issue entirely in most cases.
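The directory suggestions above can be combined into a selection order. Here is a hedged sketch that assumes three things: the caller creates the directory if it is missing, a per-user $XDG_RUNTIME_DIR is writable only by that user (as systemd sets it up), and only absolute paths are trusted. The function name and the exact ordering are hypothetical, and this is not what the series implements.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if s is set and is an absolute path. */
static int usable(const char *s)
{
	return s && s[0] == '/';
}

/*
 * Hypothetical ordering for files such as the signature "git verify-tag"
 * hands to gpg:
 *   1. $XDG_RUNTIME_DIR (per-user, typically not shared)
 *   2. $HOME/.cache/git
 *   3. $TMPDIR, then /tmp (shared directories)
 * Writes the chosen directory to out; returns 0, or -1 if it does not fit.
 */
int pick_tempdir_sketch(char *out, size_t outlen)
{
	const char *dir = getenv("XDG_RUNTIME_DIR");
	int n;

	if (usable(dir))
		n = snprintf(out, outlen, "%s", dir);
	else if (usable(dir = getenv("HOME")))
		n = snprintf(out, outlen, "%s/.cache/git", dir);
	else if (usable(dir = getenv("TMPDIR")))
		n = snprintf(out, outlen, "%s", dir);
	else
		n = snprintf(out, outlen, "%s", "/tmp");
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The last two fallbacks are still shared directories, so files created there would still benefit from unpredictable names.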