diff mbox series

[v3,2/2] wrapper: use a CSPRNG to generate random file names

Message ID 20220117215617.843190-3-sandals@crustytoothpaste.net (mailing list archive)
State Accepted
Commit 47efda967cfd4ef9d39de149e1e3654b051e5d19
Headers show
Series Generate temporary files using a CSPRNG | expand

Commit Message

brian m. carlson Jan. 17, 2022, 9:56 p.m. UTC
The current way we generate random file names is by taking the seconds
and microseconds, plus the PID, and mixing them together, then encoding
them.  If this fails, we increment the value by 7777, and try again up
to TMP_MAX times.

Unfortunately, this is not the best idea from a security perspective.
If we're writing into TMPDIR, an attacker can guess these values easily
and prevent us from creating any temporary files at all by creating them
all first.  Even though we set TMP_MAX to 16384, this may be achievable
in some contexts, even if unlikely to occur in practice.

Fortunately, we can simply solve this by using the system
cryptographically secure pseudorandom number generator (CSPRNG) to
generate a random 64-bit value, and use that as before.  Note that there
is still a small bias here, but because a six-character sequence chosen
out of 62 characters provides about 36 bits of entropy, the bias here is
less than 2^-28, which is acceptable, especially considering we'll retry
several times.

Note that the use of a CSPRNG in generating temporary file names is also
used in many libcs.  glibc recently changed from an approach similar to
ours to using a CSPRNG, and FreeBSD and OpenBSD also use a CSPRNG in
this case.  Even if the likelihood of an attack is low, we should still
be at least as responsible in creating temporary files as libc is.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 wrapper.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Jan. 18, 2022, 9:24 a.m. UTC | #1
On Mon, Jan 17 2022, brian m. carlson wrote:

> The current way we generate random file names is by taking the seconds
> and microseconds, plus the PID, and mixing them together, then encoding
> them.  If this fails, we increment the value by 7777, and try again up
> to TMP_MAX times.
>
> Unfortunately, this is not the best idea from a security perspective.
> If we're writing into TMPDIR, an attacker can guess these values easily
> and prevent us from creating any temporary files at all by creating them
> all first.  Even though we set TMP_MAX to 16384, this may be achievable
> in some contexts, even if unlikely to occur in practice.
> [...]

I had a comment on v1[1] of this series which still applies here,
i.e. the "if we're writing into TMPDIR[...]" here elides the fact that
much of the time we're writing a tempfile into .git, so the security
issue ostensibly being solved here won't be a practical issue at all.

Then for out-of-repo tempfiles some OS's have a per-user $TEMPDIR you
can use (e.g. systemd's /run/user/`id -u`). Finally...

> Note that the use of a CSPRNG in generating temporary file names is also
> used in many libcs.  glibc recently changed from an approach similar to
> ours to using a CSPRNG, and FreeBSD and OpenBSD also use a CSPRNG in
> this case.  Even if the likelihood of an attack is low, we should still
> be at least as responsible in creating temporary files as libc is.

...we already have in-tree users of mkstemp(), which on glibc ostensibly
tries to solve the same security issues you note here, and the
reftable/* user has been added since earlier iterations of this series:
o    
    $ git grep -E '\bmkstemp\(' -- '*.[ch]'
    compat/mingw.c:int mkstemp(char *template)
    compat/mingw.h:int mkstemp(char *template);
    entry.c:                return mkstemp(path);
    reftable/stack.c:       tab_fd = mkstemp(temp_tab_file_name.buf);
    reftable/stack.c:       tab_fd = mkstemp(temp_tab->buf);
    reftable/stack_test.c:  int fd = mkstemp(fn);
    wrapper.c:      fd = mkstemp(filename_template);

This series really feels like it's adding too much complexity and
potential auditing headaches (distributors worrying about us shipping a
CSPRNG, having to audit it) to a low-level codepath that most of the
time won't need this at all.

So instead of:

 A. Add CSPRNG with demo test helper
 B. Use it in git_mkstemps_mode()

I'd think we'd be much better off with:

 A. Split out callers of tempfile.c and mk.*temp in wrapper.c that create tempfiles in .git
 B. <the rest>

I honestly haven't looked too much at what <the rest> is, other than
what I wrote in [1], which seems to suggest that most of our codepaths
won't need this.

I'd also think that given the reference to CSPRNG in e.g. some glibc
versions that instead of the ifdefs in csprng_bytes() we should instead
directly use a secure mkstemp() (or similar) for the not-.git cases that
remain after the "mktemp in a dir we chown" v.s. "mktemp in shared /tmp"
are split up.

Maybe these are all things you looked at and considered, but from my
recollection (I didn't go back and re-read the whole discussion) you
didn't chime in on this point, so *bump* :)

1. https://lore.kernel.org/git/211116.864k8bq0xm.gmgdl@evledraar.gmail.com/
René Scharfe Jan. 18, 2022, 2:42 p.m. UTC | #2
Am 18.01.22 um 10:24 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Jan 17 2022, brian m. carlson wrote:
>
>> The current way we generate random file names is by taking the seconds
>> and microseconds, plus the PID, and mixing them together, then encoding
>> them.  If this fails, we increment the value by 7777, and try again up
>> to TMP_MAX times.
>>
>> Unfortunately, this is not the best idea from a security perspective.
>> If we're writing into TMPDIR, an attacker can guess these values easily
>> and prevent us from creating any temporary files at all by creating them
>> all first.  Even though we set TMP_MAX to 16384, this may be achievable
>> in some contexts, even if unlikely to occur in practice.
>> [...]
>
> I had a comment on v1[1] of this series which still applies here,
> i.e. the "if we're writing into TMPDIR[...]" here elides the fact that
> much of the time we're writing a tempfile into .git, so the security
> issue ostensibly being solved here won't be a practical issue at all.
>
> Then for out-of-repo tempfiles some OS's have a per-user $TEMPDIR you
> can use (e.g. systemd's /run/user/`id -u`). Finally...
>
>> Note that the use of a CSPRNG in generating temporary file names is also
>> used in many libcs.  glibc recently changed from an approach similar to
>> ours to using a CSPRNG, and FreeBSD and OpenBSD also use a CSPRNG in
>> this case.  Even if the likelihood of an attack is low, we should still
>> be at least as responsible in creating temporary files as libc is.
>
> ...we already have in-tree users of mkstemp(), which on glibc ostensibly
> tries to solve the same security issues you note here, and the
> reftable/* user has been added since earlier iterations of this series:
> o
>     $ git grep -E '\bmkstemp\(' -- '*.[ch]'
>     compat/mingw.c:int mkstemp(char *template)
>     compat/mingw.h:int mkstemp(char *template);
>     entry.c:                return mkstemp(path);
>     reftable/stack.c:       tab_fd = mkstemp(temp_tab_file_name.buf);
>     reftable/stack.c:       tab_fd = mkstemp(temp_tab->buf);
>     reftable/stack_test.c:  int fd = mkstemp(fn);
>     wrapper.c:      fd = mkstemp(filename_template);
>
> This series really feels like it's adding too much complexity and
> potential auditing headaches (distributors worrying about us shipping a
> CSPRNG, having to audit it) to a low-level codepath that most of the
> time won't need this at all.

Good point.  Please let me think out loud for a moment.

mkstemp() is secure (right?) and used already.  mkstemps() was used as
well until b2d593a779 (wrapper.c: remove unused gitmkstemps() function,
2017-02-28).  What's the difference?  The former requires the random
characters at the end (e.g. "name-XXXXXX"), while the latter allows a
suffix (e.g. "name-XXXXXX.ext"); that's what the added "s" in the name
means, I guess.  And apparently mkstemp is everywhere, but mkstemps is
a GNU extension.

git_mkstemps_mode is used by mks_tempfile_sm, mks_tempfile_tsm and
git_mkstemp_mode.  The latter uses no suffix, so could be implemented
using mkstemp and fchmod instead.

mks_tempfile_sm is called by write_locked_index, mks_tempfile_s,
mks_tempfile_m and mks_tempfile.  Only mks_tempfile_s uses a suffix,
but is itself unused.  So an implementation based on mkstemp and fchmod
would suffice for mks_tempfile_sm users as well.

mks_tempfile_tsm is used by mks_tempfile_ts, mks_tempfile_tm and
mks_tempfile_t.  Only mks_tempfile_ts uses a suffix.  Its only caller
is diff.c::prep_temp_blob, called only by diff.c::prepare_temp_file,
called by add_external_diff_name and run_textconv in the same file.

So if I didn't take a wrong turn somewhere the only temporary file name
templates with suffixes are those for external diffs and textconv
filters.  The rest of the git_mkstemps_mode users could be switched to
mkstemp+fchmod.

Temporary files for external diffs and textconv filters *should* be
placed in $TMPDIR.  Do they need suffixes?  I guess for testconv
filters it doesn't matter, but graphical diff viewers might do syntax
highlighting based on the extension.

Can we get extensions without mktemps or git_mkstemps_mode?  Yes, by
using mkdtemp to create a temporary directory with a random name and
creating the files in it.  There already are mkdtemp users.

So AFAICS all use cases of git_mkstemps_mode can be served by
mkstemp+fchmod or mkdtemp.  Would that help?

> So instead of:
>
>  A. Add CSPRNG with demo test helper
>  B. Use it in git_mkstemps_mode()
>
> I'd think we'd be much better off with:
>
>  A. Split out callers of tempfile.c and mk.*temp in wrapper.c that create tempfiles in .git
>  B. <the rest>
>
> I honestly haven't looked too much at what <the rest> is, other than
> what I wrote in [1], which seems to suggest that most of our codepaths
> won't need this.
>
> I'd also think that given the reference to CSPRNG in e.g. some glibc
> versions that instead of the ifdefs in csprng_bytes() we should instead
> directly use a secure mkstemp() (or similar) for the not-.git cases that
> remain after the "mktemp in a dir we chown" v.s. "mktemp in shared /tmp"
> are split up.
>
> Maybe these are all things you looked at and considered, but from my
> recollection (I didn't go back and re-read the whole discussion) you
> didn't chime in on this point, so *bump* :)
>
> 1. https://lore.kernel.org/git/211116.864k8bq0xm.gmgdl@evledraar.gmail.com/
Ævar Arnfjörð Bjarmason Jan. 18, 2022, 2:51 p.m. UTC | #3
On Tue, Jan 18 2022, René Scharfe wrote:

> Am 18.01.22 um 10:24 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Mon, Jan 17 2022, brian m. carlson wrote:
>>
>>> The current way we generate random file names is by taking the seconds
>>> and microseconds, plus the PID, and mixing them together, then encoding
>>> them.  If this fails, we increment the value by 7777, and try again up
>>> to TMP_MAX times.
>>>
>>> Unfortunately, this is not the best idea from a security perspective.
>>> If we're writing into TMPDIR, an attacker can guess these values easily
>>> and prevent us from creating any temporary files at all by creating them
>>> all first.  Even though we set TMP_MAX to 16384, this may be achievable
>>> in some contexts, even if unlikely to occur in practice.
>>> [...]
>>
>> I had a comment on v1[1] of this series which still applies here,
>> i.e. the "if we're writing into TMPDIR[...]" here elides the fact that
>> much of the time we're writing a tempfile into .git, so the security
>> issue ostensibly being solved here won't be a practical issue at all.
>>
>> Then for out-of-repo tempfiles some OS's have a per-user $TEMPDIR you
>> can use (e.g. systemd's /run/user/`id -u`). Finally...
>>
>>> Note that the use of a CSPRNG in generating temporary file names is also
>>> used in many libcs.  glibc recently changed from an approach similar to
>>> ours to using a CSPRNG, and FreeBSD and OpenBSD also use a CSPRNG in
>>> this case.  Even if the likelihood of an attack is low, we should still
>>> be at least as responsible in creating temporary files as libc is.
>>
>> ...we already have in-tree users of mkstemp(), which on glibc ostensibly
>> tries to solve the same security issues you note here, and the
>> reftable/* user has been added since earlier iterations of this series:
>> o
>>     $ git grep -E '\bmkstemp\(' -- '*.[ch]'
>>     compat/mingw.c:int mkstemp(char *template)
>>     compat/mingw.h:int mkstemp(char *template);
>>     entry.c:                return mkstemp(path);
>>     reftable/stack.c:       tab_fd = mkstemp(temp_tab_file_name.buf);
>>     reftable/stack.c:       tab_fd = mkstemp(temp_tab->buf);
>>     reftable/stack_test.c:  int fd = mkstemp(fn);
>>     wrapper.c:      fd = mkstemp(filename_template);
>>
>> This series really feels like it's adding too much complexity and
>> potential auditing headaches (distributors worrying about us shipping a
>> CSPRNG, having to audit it) to a low-level codepath that most of the
>> time won't need this at all.
>
> Good point.  Please let me think out loud for a moment.
>
> mkstemp() is secure (right?) and used already.  mkstemps() was used as
> well until b2d593a779 (wrapper.c: remove unused gitmkstemps() function,
> 2017-02-28).  What's the difference?  The former requires the random
> characters at the end (e.g. "name-XXXXXX"), while the latter allows a
> suffix (e.g. "name-XXXXXX.ext"); that's what the added "s" in the name
> means, I guess.  And apparently mkstemp is everywhere, but mkstemps is
> a GNU extension.
>
> git_mkstemps_mode is used by mks_tempfile_sm, mks_tempfile_tsm and
> git_mkstemp_mode.  The latter uses no suffix, so could be implemented
> using mkstemp and fchmod instead.
>
> mks_tempfile_sm is called by write_locked_index, mks_tempfile_s,
> mks_tempfile_m and mks_tempfile.  Only mks_tempfile_s uses a suffix,
> but is itself unused.  So an implementation based on mkstemp and fchmod
> would suffice for mks_tempfile_sm users as well.
>
> mks_tempfile_tsm is used by mks_tempfile_ts, mks_tempfile_tm and
> mks_tempfile_t.  Only mks_tempfile_ts uses a suffix.  Its only caller
> is diff.c::prep_temp_blob, called only by diff.c::prepare_temp_file,
> called by add_external_diff_name and run_textconv in the same file.
>
> So if I didn't take a wrong turn somewhere the only temporary file name
> templates with suffixes are those for external diffs and textconv
> filters.  The rest of the git_mkstemps_mode users could be switched to
> mkstemp+fchmod.
>
> Temporary files for external diffs and textconv filters *should* be
> placed in $TMPDIR.  Do they need suffixes?  I guess for testconv
> filters it doesn't matter, but graphical diff viewers might do syntax
> highlighting based on the extension.
>
> Can we get extensions without mktemps or git_mkstemps_mode?  Yes, by
> using mkdtemp to create a temporary directory with a random name and
> creating the files in it.  There already are mkdtemp users.
>
> So AFAICS all use cases of git_mkstemps_mode can be served by
> mkstemp+fchmod or mkdtemp.  Would that help?

That seems sensible, as a further practical suggestion it seems like
we'll retry around 16k times to create these files on failure, perhaps
trying with a custom extension 8k times would be OK, followed by the
minor UI degradation of doing the final 8k retries with the more-random
OS-provided extension-less variant.

More generally I'm still not sure if this is still a purely hypothetical
attack mitigation, or whether there are actually users out there that
have to deal with this.

Wouldn't something like this also be an acceptable "solution" to this
class of problem?

	#define TMP_MAX 1024
	[...]
	if (count >= TMP_MAX)
		die(_("we tried and failed to create a tempfile using the '%s' template after %d tries!\n\n"
                    "Is someone actively DoS you? Is your sysadmin known to be your mortal enemy?\n"
                    "are you using Satan's shared hosting services? In any case, we give up!\n\n"
                    "To \"retry\" set TEMPDIR to some location where other users won't race us to death"),
                    template, count);

For a bonus grade we could add a few more lines and try to stat() some
of the files we failed to create, and report what UID it is that's
racing you for all of these tempfile creations.

>> So instead of:
>>
>>  A. Add CSPRNG with demo test helper
>>  B. Use it in git_mkstemps_mode()
>>
>> I'd think we'd be much better off with:
>>
>>  A. Split out callers of tempfile.c and mk.*temp in wrapper.c that create tempfiles in .git
>>  B. <the rest>
>>
>> I honestly haven't looked too much at what <the rest> is, other than
>> what I wrote in [1], which seems to suggest that most of our codepaths
>> won't need this.
>>
>> I'd also think that given the reference to CSPRNG in e.g. some glibc
>> versions that instead of the ifdefs in csprng_bytes() we should instead
>> directly use a secure mkstemp() (or similar) for the not-.git cases that
>> remain after the "mktemp in a dir we chown" v.s. "mktemp in shared /tmp"
>> are split up.
>>
>> Maybe these are all things you looked at and considered, but from my
>> recollection (I didn't go back and re-read the whole discussion) you
>> didn't chime in on this point, so *bump* :)
>>
>> 1. https://lore.kernel.org/git/211116.864k8bq0xm.gmgdl@evledraar.gmail.com/
René Scharfe Jan. 18, 2022, 4:44 p.m. UTC | #4
Am 18.01.22 um 15:51 schrieb Ævar Arnfjörð Bjarmason:
>
> On Tue, Jan 18 2022, René Scharfe wrote:
>
>> Am 18.01.22 um 10:24 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> On Mon, Jan 17 2022, brian m. carlson wrote:
>>>
>>>> The current way we generate random file names is by taking the seconds
>>>> and microseconds, plus the PID, and mixing them together, then encoding
>>>> them.  If this fails, we increment the value by 7777, and try again up
>>>> to TMP_MAX times.
>>>>
>>>> Unfortunately, this is not the best idea from a security perspective.
>>>> If we're writing into TMPDIR, an attacker can guess these values easily
>>>> and prevent us from creating any temporary files at all by creating them
>>>> all first.  Even though we set TMP_MAX to 16384, this may be achievable
>>>> in some contexts, even if unlikely to occur in practice.
>>>> [...]
>>>
>>> I had a comment on v1[1] of this series which still applies here,
>>> i.e. the "if we're writing into TMPDIR[...]" here elides the fact that
>>> much of the time we're writing a tempfile into .git, so the security
>>> issue ostensibly being solved here won't be a practical issue at all.
>>>
>>> Then for out-of-repo tempfiles some OS's have a per-user $TEMPDIR you
>>> can use (e.g. systemd's /run/user/`id -u`). Finally...
>>>
>>>> Note that the use of a CSPRNG in generating temporary file names is also
>>>> used in many libcs.  glibc recently changed from an approach similar to
>>>> ours to using a CSPRNG, and FreeBSD and OpenBSD also use a CSPRNG in
>>>> this case.  Even if the likelihood of an attack is low, we should still
>>>> be at least as responsible in creating temporary files as libc is.
>>>
>>> ...we already have in-tree users of mkstemp(), which on glibc ostensibly
>>> tries to solve the same security issues you note here, and the
>>> reftable/* user has been added since earlier iterations of this series:
>>> o
>>>     $ git grep -E '\bmkstemp\(' -- '*.[ch]'
>>>     compat/mingw.c:int mkstemp(char *template)
>>>     compat/mingw.h:int mkstemp(char *template);
>>>     entry.c:                return mkstemp(path);
>>>     reftable/stack.c:       tab_fd = mkstemp(temp_tab_file_name.buf);
>>>     reftable/stack.c:       tab_fd = mkstemp(temp_tab->buf);
>>>     reftable/stack_test.c:  int fd = mkstemp(fn);
>>>     wrapper.c:      fd = mkstemp(filename_template);
>>>
>>> This series really feels like it's adding too much complexity and
>>> potential auditing headaches (distributors worrying about us shipping a
>>> CSPRNG, having to audit it) to a low-level codepath that most of the
>>> time won't need this at all.
>>
>> Good point.  Please let me think out loud for a moment.
>>
>> mkstemp() is secure (right?) and used already.  mkstemps() was used as
>> well until b2d593a779 (wrapper.c: remove unused gitmkstemps() function,
>> 2017-02-28).  What's the difference?  The former requires the random
>> characters at the end (e.g. "name-XXXXXX"), while the latter allows a
>> suffix (e.g. "name-XXXXXX.ext"); that's what the added "s" in the name
>> means, I guess.  And apparently mkstemp is everywhere, but mkstemps is
>> a GNU extension.
>>
>> git_mkstemps_mode is used by mks_tempfile_sm, mks_tempfile_tsm and
>> git_mkstemp_mode.  The latter uses no suffix, so could be implemented
>> using mkstemp and fchmod instead.
>>
>> mks_tempfile_sm is called by write_locked_index, mks_tempfile_s,
>> mks_tempfile_m and mks_tempfile.  Only mks_tempfile_s uses a suffix,
>> but is itself unused.  So an implementation based on mkstemp and fchmod
>> would suffice for mks_tempfile_sm users as well.
>>
>> mks_tempfile_tsm is used by mks_tempfile_ts, mks_tempfile_tm and
>> mks_tempfile_t.  Only mks_tempfile_ts uses a suffix.  Its only caller
>> is diff.c::prep_temp_blob, called only by diff.c::prepare_temp_file,
>> called by add_external_diff_name and run_textconv in the same file.
>>
>> So if I didn't take a wrong turn somewhere the only temporary file name
>> templates with suffixes are those for external diffs and textconv
>> filters.  The rest of the git_mkstemps_mode users could be switched to
>> mkstemp+fchmod.
>>
>> Temporary files for external diffs and textconv filters *should* be
>> placed in $TMPDIR.  Do they need suffixes?  I guess for testconv
>> filters it doesn't matter, but graphical diff viewers might do syntax
>> highlighting based on the extension.
>>
>> Can we get extensions without mktemps or git_mkstemps_mode?  Yes, by
>> using mkdtemp to create a temporary directory with a random name and
>> creating the files in it.  There already are mkdtemp users.
>>
>> So AFAICS all use cases of git_mkstemps_mode can be served by
>> mkstemp+fchmod or mkdtemp.  Would that help?
>
> That seems sensible, as a further practical suggestion it seems like
> we'll retry around 16k times to create these files on failure, perhaps
> trying with a custom extension 8k times would be OK, followed by the
> minor UI degradation of doing the final 8k retries with the more-random
> OS-provided extension-less variant.

You can't use the suffix-less mkstemp if a suffix is necessary.

Retries would be handled by mkstemp and mkdtemp IIUC.  To an extent.
E.g. https://cgit.freebsd.org/src/tree/lib/libc/stdio/mktemp.c seems
to just count up, which doesn't help if an attacker guessed the first
name correctly.  So maybe some kind of EEXIST loop is still necessary.

> More generally I'm still not sure if this is still a purely hypothetical
> attack mitigation, or whether there are actually users out there that
> have to deal with this.
>
> Wouldn't something like this also be an acceptable "solution" to this
> class of problem?
>
> 	#define TMP_MAX 1024
> 	[...]
> 	if (count >= TMP_MAX)
> 		die(_("we tried and failed to create a tempfile using the '%s' template after %d tries!\n\n"
>                     "Is someone actively DoS you? Is your sysadmin known to be your mortal enemy?\n"
>                     "are you using Satan's shared hosting services? In any case, we give up!\n\n"
>                     "To \"retry\" set TEMPDIR to some location where other users won't race us to death"),
>                     template, count);

You mean users should be educated to stay away from shared temporary
directories on multi-user systems?  Good advice.  I'm also not sure
how relevant it is these days -- doesn't everybody get their own VM
these days? :)  Anyway, there are probably some who cannot follow it,
e.g. because their $HOME quota is exhausted.

> For a bonus grade we could add a few more lines and try to stat() some
> of the files we failed to create, and report what UID it is that's
> racing you for all of these tempfile creations.

Sounds fun, can enliven the office.  Once the fisticuffs are over --
PLOT TWIST! Turns out the RNG handed out the same "random" numbers to
everyone. ;)

>
>>> So instead of:
>>>
>>>  A. Add CSPRNG with demo test helper
>>>  B. Use it in git_mkstemps_mode()
>>>
>>> I'd think we'd be much better off with:
>>>
>>>  A. Split out callers of tempfile.c and mk.*temp in wrapper.c that create tempfiles in .git
>>>  B. <the rest>
>>>
>>> I honestly haven't looked too much at what <the rest> is, other than
>>> what I wrote in [1], which seems to suggest that most of our codepaths
>>> won't need this.
>>>
>>> I'd also think that given the reference to CSPRNG in e.g. some glibc
>>> versions that instead of the ifdefs in csprng_bytes() we should instead
>>> directly use a secure mkstemp() (or similar) for the not-.git cases that
>>> remain after the "mktemp in a dir we chown" v.s. "mktemp in shared /tmp"
>>> are split up.
>>>
>>> Maybe these are all things you looked at and considered, but from my
>>> recollection (I didn't go back and re-read the whole discussion) you
>>> didn't chime in on this point, so *bump* :)
>>>
>>> 1. https://lore.kernel.org/git/211116.864k8bq0xm.gmgdl@evledraar.gmail.com/
>
Junio C Hamano Jan. 18, 2022, 5:25 p.m. UTC | #5
René Scharfe <l.s.r@web.de> writes:

>> This series really feels like it's adding too much complexity and
>> potential auditing headaches (distributors worrying about us shipping a
>> CSPRNG, having to audit it) to a low-level codepath that most of the
>> time won't need this at all.
>
> Good point.  Please let me think out loud for a moment.

Yeah, I agree you and Ævar that the topic may be over-engineering
the solution for problem that we shouldn't be the ones who solve.

I agree with your analysis that the "diff" tempfiles do need suffix,
we SHOULD create them in $TMPDIR (not in the working tree or
$GIT_DIR) to support operation in a read-only repository, but we can
create a unique temporary directory and place a file (even under its
original name) in it as a workaround.

Thanks.
brian m. carlson Jan. 19, 2022, 3:28 a.m. UTC | #6
On 2022-01-18 at 09:24:58, Ævar Arnfjörð Bjarmason wrote:
> I had a comment on v1[1] of this series which still applies here,
> i.e. the "if we're writing into TMPDIR[...]" here elides the fact that
> much of the time we're writing a tempfile into .git, so the security
> issue ostensibly being solved here won't be a practical issue at all.
> 
> Then for out-of-repo tempfiles some OS's have a per-user $TEMPDIR you
> can use (e.g. systemd's /run/user/`id -u`). Finally...

Some OSes do have that, but it is not universal and we can't rely on
environment variables being set.  They are stripped out by some
programs, including Homebrew, even on OSes where they're provided.

/run/user is also not a suitable temporary directory on Linux.
Temporary files can be large, and /run is almost always a tmpfs, which
means it sits entirely in memory.  Setting anything in /run as TMPDIR
is a mistake.

> ...we already have in-tree users of mkstemp(), which on glibc ostensibly
> tries to solve the same security issues you note here, and the
> reftable/* user has been added since earlier iterations of this series:
> o    
>     $ git grep -E '\bmkstemp\(' -- '*.[ch]'
>     compat/mingw.c:int mkstemp(char *template)
>     compat/mingw.h:int mkstemp(char *template);
>     entry.c:                return mkstemp(path);
>     reftable/stack.c:       tab_fd = mkstemp(temp_tab_file_name.buf);
>     reftable/stack.c:       tab_fd = mkstemp(temp_tab->buf);
>     reftable/stack_test.c:  int fd = mkstemp(fn);
>     wrapper.c:      fd = mkstemp(filename_template);
> 
> This series really feels like it's adding too much complexity and
> potential auditing headaches (distributors worrying about us shipping a
> CSPRNG, having to audit it) to a low-level codepath that most of the
> time won't need this at all.

Every Rust program or Go program includes code to call a CSPRNG because
it's required to avoid hash collision DoS attacks.  I do plan to look
into that in the future, because my guess right now is that we are in
fact vulnerable to DoS attacks if someone creates crafted object IDs.[0]
When I do that, I'll need this code.

I also don't think adding code for a CSPRNG is an auditing problem.  We
use the system CSPRNG, which is the thing that literally everybody
should be doing if they need good quality random numbers.  If we were
shipping a custom CSPRNG, then that would be an auditing problem, but I
am explicitly not doing that because it's not necessary here and I'm
willing to insist that the system provide one somewhere so we don't have
to.

> So instead of:
> 
>  A. Add CSPRNG with demo test helper
>  B. Use it in git_mkstemps_mode()
> 
> I'd think we'd be much better off with:
> 
>  A. Split out callers of tempfile.c and mk.*temp in wrapper.c that create tempfiles in .git
>  B. <the rest>
> 
> I honestly haven't looked too much at what <the rest> is, other than
> what I wrote in [1], which seems to suggest that most of our codepaths
> won't need this.
> 
> I'd also think that given the reference to CSPRNG in e.g. some glibc
> versions that instead of the ifdefs in csprng_bytes() we should instead
> directly use a secure mkstemp() (or similar) for the not-.git cases that
> remain after the "mktemp in a dir we chown" v.s. "mktemp in shared /tmp"
> are split up.
> 
> Maybe these are all things you looked at and considered, but from my
> recollection (I didn't go back and re-read the whole discussion) you
> didn't chime in on this point, so *bump* :)

I did explain it in the cover letter for v2, along with the explanation
in the paragraph above.  The situation is that mkstemp doesn't handle
all our use cases, and Randall said in the followups to v1 that mkstemp
is not available on NonStop.  I therefore must conclude that we'll need
to implement this somewhere, even if only as a fallback.

If we think mkstemp is going to work here and we don't need this, then
I'll drop this series.  However, I am annoyed that I'm getting
conflicting information about what code is portable on different
platforms, which is made especially difficult by trying to support Unix
systems that don't support a 21-year-old standard and which lack
suitable public documentation.  If I write and polish a series based on
a set of information I've been given and then it turns out that we
decide to do something completely different based on conflicting
information, that's not a motivator to continue sending patches.  This
will not be the first time I've dropped a series after several rounds of
review because we totally decided to change course and do something
different, and I don't want to repeat this again.

[0] This type of attack is extremely trivial because the number of
collisions necessary for this attack is usually on the order of 2^16,
which means the work can be done in a couple seconds on a typical
system.  I have a proof of concept of this for PHP online.
René Scharfe Jan. 19, 2022, 5:49 p.m. UTC | #7
Am 18.01.22 um 18:25 schrieb Junio C Hamano:
> I agree with your analysis that the "diff" tempfiles do need suffix,
> we SHOULD create them in $TMPDIR (not in the working tree or
> $GIT_DIR) to support operation in a read-only repository, but we can
> create a unique temporary directory and place a file (even under its
> original name) in it as a workaround.

Here's a proof-of-concept patch for handling just that diff use-case
using mkdtemp.  Indeed it's nice to see the actual filename with
difftool.


 diff.c                   |  4 ++--
 t/t4020-diff-external.sh |  2 +-
 tempfile.c               | 20 ++++++++++++++++++++
 tempfile.h               |  1 +
 wrapper.c                | 12 ++++++++++++
 5 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index c862771a58..37db4743b0 100644
--- a/diff.c
+++ b/diff.c
@@ -4098,8 +4098,8 @@ static void prep_temp_blob(struct index_state *istate,

 	init_checkout_metadata(&meta, NULL, NULL, oid);

-	/* Generate "XXXXXX_basename.ext" */
-	strbuf_addstr(&tempfile, "XXXXXX_");
+	/* Generate "git-blob-XXXXXX/basename.ext" */
+	strbuf_addstr(&tempfile, "git-blob-XXXXXX/");
 	strbuf_addstr(&tempfile, base);

 	temp->tempfile = mks_tempfile_ts(tempfile.buf, strlen(base) + 1);
diff --git a/t/t4020-diff-external.sh b/t/t4020-diff-external.sh
index 54bb8ef27e..e7f93f36f5 100755
--- a/t/t4020-diff-external.sh
+++ b/t/t4020-diff-external.sh
@@ -217,7 +217,7 @@ test_expect_success 'GIT_EXTERNAL_DIFF generates pretty paths' '
 	touch file.ext &&
 	git add file.ext &&
 	echo with extension > file.ext &&
-	GIT_EXTERNAL_DIFF=echo git diff file.ext | grep ......_file\.ext &&
+	GIT_EXTERNAL_DIFF=echo git diff file.ext | grep git-blob-....../file\.ext &&
 	git update-index --force-remove file.ext &&
 	rm file.ext
 '
diff --git a/tempfile.c b/tempfile.c
index 94aa18f3f7..80cc9fb512 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -56,6 +56,21 @@

 static VOLATILE_LIST_HEAD(tempfile_list);

+static void remove_template_directory(struct tempfile *tempfile,
+				      int in_signal_handler)
+{
+	char *end = tempfile->filename.buf + tempfile->filename.len;
+	char *suffix = end - tempfile->suffixlen;
+	if (*suffix != '/')
+		return;
+	*suffix = '\0';
+	if (in_signal_handler)
+		rmdir(tempfile->filename.buf);
+	else
+		rmdir_or_warn(tempfile->filename.buf);
+	*suffix = '/';
+}
+
 static void remove_tempfiles(int in_signal_handler)
 {
 	pid_t me = getpid();
@@ -74,6 +89,7 @@ static void remove_tempfiles(int in_signal_handler)
 			unlink(p->filename.buf);
 		else
 			unlink_or_warn(p->filename.buf);
+		remove_template_directory(p, in_signal_handler);

 		p->active = 0;
 	}
@@ -100,6 +116,7 @@ static struct tempfile *new_tempfile(void)
 	tempfile->owner = 0;
 	INIT_LIST_HEAD(&tempfile->list);
 	strbuf_init(&tempfile->filename, 0);
+	tempfile->suffixlen = 0;
 	return tempfile;
 }

@@ -170,6 +187,7 @@ struct tempfile *mks_tempfile_sm(const char *filename_template, int suffixlen, i
 	struct tempfile *tempfile = new_tempfile();

 	strbuf_add_absolute_path(&tempfile->filename, filename_template);
+	tempfile->suffixlen = suffixlen;
 	tempfile->fd = git_mkstemps_mode(tempfile->filename.buf, suffixlen, mode);
 	if (tempfile->fd < 0) {
 		deactivate_tempfile(tempfile);
@@ -189,6 +207,7 @@ struct tempfile *mks_tempfile_tsm(const char *filename_template, int suffixlen,
 		tmpdir = "/tmp";

 	strbuf_addf(&tempfile->filename, "%s/%s", tmpdir, filename_template);
+	tempfile->suffixlen = suffixlen;
 	tempfile->fd = git_mkstemps_mode(tempfile->filename.buf, suffixlen, mode);
 	if (tempfile->fd < 0) {
 		deactivate_tempfile(tempfile);
@@ -316,6 +335,7 @@ void delete_tempfile(struct tempfile **tempfile_p)

 	close_tempfile_gently(tempfile);
 	unlink_or_warn(tempfile->filename.buf);
+	remove_template_directory(tempfile, 0);
 	deactivate_tempfile(tempfile);
 	*tempfile_p = NULL;
 }
diff --git a/tempfile.h b/tempfile.h
index 4de3bc77d2..b9a60f2431 100644
--- a/tempfile.h
+++ b/tempfile.h
@@ -82,6 +82,7 @@ struct tempfile {
 	FILE *volatile fp;
 	volatile pid_t owner;
 	struct strbuf filename;
+	size_t suffixlen;
 };

 /*
diff --git a/wrapper.c b/wrapper.c
index 36e12119d7..358db282f9 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -481,6 +481,18 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 		return -1;
 	}

+	if (pattern[len - suffix_len] == '/') {
+		if (mode != 0600) {
+			errno = EINVAL;
+			return -1;
+		}
+		pattern[len - suffix_len] = '\0';
+		if (!mkdtemp(pattern))
+			return -1;
+		pattern[len - suffix_len] = '/';
+		return open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
+	}
+
 	/*
 	 * Replace pattern's XXXXXX characters with randomness.
 	 * Try TMP_MAX different filenames.
René Scharfe Jan. 22, 2022, 10:41 a.m. UTC | #8
Am 18.01.22 um 18:25 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> This series really feels like it's adding too much complexity and
>>> potential auditing headaches (distributors worrying about us shipping a
>>> CSPRNG, having to audit it) to a low-level codepath that most of the
>>> time won't need this at all.
>>
>> Good point.  Please let me think out loud for a moment.
>
> Yeah, I agree you and Ævar that the topic may be over-engineering
> the solution for problem that we shouldn't be the ones who solve.
>
> I agree with your analysis that the "diff" tempfiles do need suffix,
> we SHOULD create them in $TMPDIR (not in the working tree or
> $GIT_DIR) to support operation in a read-only repository, but we can
> create a unique temporary directory and place a file (even under its
> original name) in it as a workaround.

I forgot one crucial aspect, though: umask.  The "m" variants of the
tempfile functions set a mode, with umask applied.  git_mkstemps_mode()
does that by passing the mode to open(2), which applies the umask
internally.  Neither mkstemp(3) nor chmod(2) do that for us, so a
replacement needs to call umask(2) to get it and again to restore it,
which requires two system calls and is racy if multiple threads do this.

Diff doesn't need a custom mode, so we can still use mkdtemp() there and
thus make the suffix feature of git_mkstemps_mode() unnecessary.  But
a replacement for git_mkstemp_mode() with two umask(2) calls looks less
attractive to me than fortifying git_mkstemps_mode() with a good source
of randomness.

René
Junio C Hamano Jan. 24, 2022, 5:08 p.m. UTC | #9
René Scharfe <l.s.r@web.de> writes:

> ...  But
> a replacement for git_mkstemp_mode() with two umask(2) calls looks less
> attractive to me than fortifying git_mkstemps_mode() with a good source
> of randomness.

True.

Also, it is not like we are supplying our own implementation of
random source, but are just pluggig various system-supplied random
source into our code, so I do not see the "auditatiblity" problem we
heard earlier too much of an issue.
diff mbox series

Patch

diff --git a/wrapper.c b/wrapper.c
index 1052356703..3258cdb171 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -463,8 +463,6 @@  int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 	static const int num_letters = ARRAY_SIZE(letters) - 1;
 	static const char x_pattern[] = "XXXXXX";
 	static const int num_x = ARRAY_SIZE(x_pattern) - 1;
-	uint64_t value;
-	struct timeval tv;
 	char *filename_template;
 	size_t len;
 	int fd, count;
@@ -485,12 +483,13 @@  int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 	 * Replace pattern's XXXXXX characters with randomness.
 	 * Try TMP_MAX different filenames.
 	 */
-	gettimeofday(&tv, NULL);
-	value = ((uint64_t)tv.tv_usec << 16) ^ tv.tv_sec ^ getpid();
 	filename_template = &pattern[len - num_x - suffix_len];
 	for (count = 0; count < TMP_MAX; ++count) {
-		uint64_t v = value;
 		int i;
+		uint64_t v;
+		if (csprng_bytes(&v, sizeof(v)) < 0)
+			return error_errno("unable to get random bytes for temporary file");
+
 		/* Fill in the random bits. */
 		for (i = 0; i < num_x; i++) {
 			filename_template[i] = letters[v % num_letters];
@@ -506,12 +505,6 @@  int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 		 */
 		if (errno != EEXIST)
 			break;
-		/*
-		 * This is a random value.  It is only necessary that
-		 * the next TMP_MAX values generated by adding 7777 to
-		 * VALUE are different with (module 2^32).
-		 */
-		value += 7777;
 	}
 	/* We return the null string if we can't find a unique file name.  */
 	pattern[0] = '\0';