mbox series

[0/2] reftable/stack: stop dying on exhausted entropy pool

Message ID 20250107-b4-pks-reftable-csprng-v1-0-6109a54a8756@pks.im (mailing list archive)
Headers show
Series reftable/stack: stop dying on exhausted entropy pool | expand

Message

Patrick Steinhardt Jan. 7, 2025, 3:26 p.m. UTC
Hi,

this small patch series fixes the issue reported by Randall [1], where
an exhausted entropy pool can cause us to die when writing a new table
to the reftable stack. I _think_ that this is only an issue with the
OpenSSL backend of `csprng_bytes()`:

  - `arc4random_buf()` never returns an error.

  - `getrandom()` pulls from "/dev/urandom" by default.

  - `getentropy()` seems to block when there is not enough randomness
    available.

  - `GtlGenRandom()` I cannot really tell.

  - The fallback reads from "/dev/urandom", which also returns bytes in
    case the entropy pool is drained.

So OpenSSL's `RAND_bytes()` seems to be the only one that returns an
error when the entropy pool is empty. I did wonder whether we even need
to introduce the new flag in the first place, or whether we cannot just
use `RAND_pseudo_bytes()` unconditionally. But I'm a bit uneasy about it
given that OpenSSL has this doc:

    RAND_pseudo_bytes() puts num pseudo-random bytes into buf.
    Pseudo-random byte sequences generated by RAND_pseudo_bytes() will
    be unique if they are of sufficient length, but are not necessarily
    unpredictable. They can be used for non-cryptographic purposes and
    for certain purposes in cryptographic protocols, but usually not for
    key generation etc.

It might be too easy to accidentally rely on `csprng_bytes()` where it
actually requires strong cryptographic data, so I was erring on the side
of caution.

Thanks!

---
Patrick Steinhardt (2):
      wrapper: allow generating insecure random bytes
      reftable/stack: accept insecure random bytes

 builtin/gc.c                        |  2 +-
 reftable/stack.c                    |  4 ++--
 t/helper/test-csprng.c              |  2 +-
 t/unit-tests/t-reftable-readwrite.c |  6 +++---
 wrapper.c                           | 24 ++++++++++++++----------
 wrapper.h                           | 16 ++++++++++++----
 6 files changed, 33 insertions(+), 21 deletions(-)


---
base-commit: b74ff38af58464688b211140b90ec90598d340c6
change-id: 20250107-b4-pks-reftable-csprng-9ed4e8dd83c4

Comments

brian m. carlson Jan. 7, 2025, 11:21 p.m. UTC | #1
On 2025-01-07 at 15:26:58, Patrick Steinhardt wrote:
> Hi,
> 
> this small patch series fixes the issue reported by Randall [1], where
> an exhausted entropy pool can cause us to die when writing a new table
> to the reftable stack. I _think_ that this is only an issue with the
> OpenSSL backend of `csprng_bytes()`:
> 
>   - `arc4random_buf()` never returns an error.
> 
>   - `getrandom()` pulls from "/dev/urandom" by default.
> 
>   - `getentropy()` seems to block when there is not enough randomness
>     available.
> 
>   - `GtlGenRandom()` I cannot really tell.
> 
>   - The fallback reads from "/dev/urandom", which also returns bytes in
>     case the entropy pool is drained.
> 
> So OpenSSL's `RAND_bytes()` seems to be the only one that returns an
> error when the entropy pool is empty. I did wonder whether we even need
> to introduce the new flag in the first place, or whether we cannot just
> use `RAND_pseudo_bytes()` unconditionally. But I'm a bit uneasy about it
> given that OpenSSL has this doc:
> 
>     RAND_pseudo_bytes() puts num pseudo-random bytes into buf.
>     Pseudo-random byte sequences generated by RAND_pseudo_bytes() will
>     be unique if they are of sufficient length, but are not necessarily
>     unpredictable. They can be used for non-cryptographic purposes and
>     for certain purposes in cryptographic protocols, but usually not for
>     key generation etc.
> 
> It might be too easy to accidentally rely on `csprng_bytes()` where it
> actually requires strong cryptographic data, so I was erring on the side
> of caution.

The reason I didn't use RAND_pseudo_bytes is because it's been
deprecated since OpenSSL 1.1.0 and RAND_bytes uses a CSPRNG just like
RAND_pseudo_bytes as of that version.  Once it's seeded, it should be
able to generate plenty of bytes, because I believe it uses a CTR-DRBG,
which only needs to be reseeded after 2^48 bytes (which is far more than
we should be using).

We can full well use RAND_pseudo_bytes, but all operating systems should
provide an appropriate entropy source that can provide 256 bits of
entropy on startup.  arc4random will just kill the process if it can't
seed itself, so your changes won't actually prevent dying on a lack of
entropy.

I don't want an option that chooses "insecure" bytes.  My preference is
that we require people use a different backend or an up-to-date OpenSSL
version that shouldn't have this problem.  We can use RAND_pseudo_bytes
if we really need to support older versions, but there are also no major
operating systems which require that old of a version (CentOS 7, which
is dead, used OpenSSL 1.0.2, and CentOS 8 uses 1.1.1k), so it's probably
not within our support policy to do that.

Note also that if OpenSSL is being used for TLS, a lack of entropy will
result in TLS not working, which means that Git will be randomly broken
on that system, which is not really an experience that we want to
encourage, so that should be taken into account.

Can we get some more information about what version of OpenSSL is being
used and what the system entropy source is?
Randall S. Becker Jan. 7, 2025, 11:54 p.m. UTC | #2
On January 7, 2025 6:22 PM, brian m. carlson wrote:
>On 2025-01-07 at 15:26:58, Patrick Steinhardt wrote:
>> Hi,
>>
>> this small patch series fixes the issue reported by Randall [1], where
>> an exhausted entropy pool can cause us to die when writing a new table
>> to the reftable stack. I _think_ that this is only an issue with the
>> OpenSSL backend of `csprng_bytes()`:
>>
>>   - `arc4random_buf()` never returns an error.
>>
>>   - `getrandom()` pulls from "/dev/urandom" by default.
>>
>>   - `getentropy()` seems to block when there is not enough randomness
>>     available.
>>
>>   - `GtlGenRandom()` I cannot really tell.
>>
>>   - The fallback reads from "/dev/urandom", which also returns bytes in
>>     case the entropy pool is drained.
>>
>> So OpenSSL's `RAND_bytes()` seems to be the only one that returns an
>> error when the entropy pool is empty. I did wonder whether we even
>> need to introduce the new flag in the first place, or whether we
>> cannot just use `RAND_pseudo_bytes()` unconditionally. But I'm a bit
>> uneasy about it given that OpenSSL has this doc:
>>
>>     RAND_pseudo_bytes() puts num pseudo-random bytes into buf.
>>     Pseudo-random byte sequences generated by RAND_pseudo_bytes() will
>>     be unique if they are of sufficient length, but are not necessarily
>>     unpredictable. They can be used for non-cryptographic purposes and
>>     for certain purposes in cryptographic protocols, but usually not for
>>     key generation etc.
>>
>> It might be too easy to accidentally rely on `csprng_bytes()` where it
>> actually requires strong cryptographic data, so I was erring on the
>> side of caution.
>
>The reason I didn't use RAND_pseudo_bytes is because it's been deprecated since
>OpenSSL 1.1.0 and RAND_bytes uses a CSPRNG just like RAND_pseudo_bytes as of
>that version.  Once it's seeded, it should be able to generate plenty of bytes,
>because I believe it uses a CTR-DRBG, which only needs to be reseeded after 2^48
>bytes (which is far more than we should be using).
>
>We can full well use RAND_pseudo_bytes, but all operating systems should provide
>an appropriate entropy source that can provide 256 bits of entropy on startup.
>arc4random will just kill the process if it can't seed itself, so your changes won't
>actually prevent dying on a lack of entropy.
>
>I don't want an option that chooses "insecure" bytes.  My preference is that we
>require people use a different backend or an up-to-date OpenSSL version that
>shouldn't have this problem.  We can use RAND_pseudo_bytes if we really need to
>support older versions, but there are also no major operating systems which
>require that old of a version (CentOS 7, which is dead, used OpenSSL 1.0.2, and
>CentOS 8 uses 1.1.1k), so it's probably not within our support policy to do that.
>
>Note also that if OpenSSL is being used for TLS, a lack of entropy will result in TLS
>not working, which means that Git will be randomly broken on that system, which is
>not really an experience that we want to encourage, so that should be taken into
>account.
>
>Can we get some more information about what version of OpenSSL is being used
>and what the system entropy source is?

In my situation, OpenSSL 3.0.11 on ia64. 3.4.1 and 3.0.13 on x86 (x86 works fine
Because OpenSSL uses hardware. On ia64, we end up on PRNGD, which does fail.
Patrick Steinhardt Jan. 8, 2025, 7:18 a.m. UTC | #3
On Tue, Jan 07, 2025 at 06:54:02PM -0500, rsbecker@nexbridge.com wrote:
> On January 7, 2025 6:22 PM, brian m. carlson wrote:
> >On 2025-01-07 at 15:26:58, Patrick Steinhardt wrote:
> >> Hi,
> >>
> >> this small patch series fixes the issue reported by Randall [1], where
> >> an exhausted entropy pool can cause us to die when writing a new table
> >> to the reftable stack. I _think_ that this is only an issue with the
> >> OpenSSL backend of `csprng_bytes()`:
> >>
> >>   - `arc4random_buf()` never returns an error.
> >>
> >>   - `getrandom()` pulls from "/dev/urandom" by default.
> >>
> >>   - `getentropy()` seems to block when there is not enough randomness
> >>     available.
> >>
> >>   - `GtlGenRandom()` I cannot really tell.
> >>
> >>   - The fallback reads from "/dev/urandom", which also returns bytes in
> >>     case the entropy pool is drained.
> >>
> >> So OpenSSL's `RAND_bytes()` seems to be the only one that returns an
> >> error when the entropy pool is empty. I did wonder whether we even
> >> need to introduce the new flag in the first place, or whether we
> >> cannot just use `RAND_pseudo_bytes()` unconditionally. But I'm a bit
> >> uneasy about it given that OpenSSL has this doc:
> >>
> >>     RAND_pseudo_bytes() puts num pseudo-random bytes into buf.
> >>     Pseudo-random byte sequences generated by RAND_pseudo_bytes() will
> >>     be unique if they are of sufficient length, but are not necessarily
> >>     unpredictable. They can be used for non-cryptographic purposes and
> >>     for certain purposes in cryptographic protocols, but usually not for
> >>     key generation etc.
> >>
> >> It might be too easy to accidentally rely on `csprng_bytes()` where it
> >> actually requires strong cryptographic data, so I was erring on the
> >> side of caution.
> >
> >The reason I didn't use RAND_pseudo_bytes is because it's been deprecated since
> >OpenSSL 1.1.0 and RAND_bytes uses a CSPRNG just like RAND_pseudo_bytes as of
> >that version.  Once it's seeded, it should be able to generate plenty of bytes,
> >because I believe it uses a CTR-DRBG, which only needs to be reseeded after 2^48
> >bytes (which is far more than we should be using).
> >
> >We can full well use RAND_pseudo_bytes, but all operating systems should provide
> >an appropriate entropy source that can provide 256 bits of entropy on startup.
> >arc4random will just kill the process if it can't seed itself, so your changes won't
> >actually prevent dying on a lack of entropy.
> >
> >I don't want an option that chooses "insecure" bytes.  My preference is that we
> >require people use a different backend or an up-to-date OpenSSL version that
> >shouldn't have this problem.  We can use RAND_pseudo_bytes if we really need to
> >support older versions, but there are also no major operating systems which
> >require that old of a version (CentOS 7, which is dead, used OpenSSL 1.0.2, and
> >CentOS 8 uses 1.1.1k), so it's probably not within our support policy to do that.
> >
> >Note also that if OpenSSL is being used for TLS, a lack of entropy will result in TLS
> >not working, which means that Git will be randomly broken on that system, which is
> >not really an experience that we want to encourage, so that should be taken into
> >account.
> >
> >Can we get some more information about what version of OpenSSL is being used
> >and what the system entropy source is?
> 
> In my situation, OpenSSL 3.0.11 on ia64. 3.4.1 and 3.0.13 on x86 (x86 works fine
> Because OpenSSL uses hardware. On ia64, we end up on PRNGD, which does fail.

You reported in [1] that a couple more tests are indeed failing, not
only t0610. That changes things in my opinion as it shows that this is
not a localized issue in the reftable library, but likely in multiple
callsites where we use randomness. So my current patch series is not
sufficient as it only fixes up the reftable codebase. But in the case
where it's a general issue I tend to agree with brian, because I don't
want to play whack-a-mole with all the callsites of `git_rand()` where
we can indeed use insecure bytes.

Honestly, this rather makes me want to remove the OpenSSL backend for
our CSRNG completely. NonStop is the only platform that uses it right
now, and it seems to be easy to misconfigure. All the other backends we
have don't have the same issue as explained further up in my message. So
does NonStop support any of the alternative backends that Git has, like
`arc4random_buf()`, `getrandom()`, `getentropy()` or reading from
"/dev/urandom"?

Might be I'm coming to conclusions too fast, so if I'm missing obvious
usecases then please stop me :)

Randall, you mentioned that your platform had a maintenance window right
during the release of v2.48.0-rc2 [2]. You never mentioned issues with
randomness before that maintenance window, and after it you hit them in
many tests without any changes to the CSPRNG between v2.48.0-rc1 and
-rc2. Could it be that something broke on your end?

Patrick

[1]: https://lore.kernel.org/git/00ad01db615f$ce9b6740$6bd235c0$@nexbridge.com/
[2]: https://lore.kernel.org/git/000501db607c$40c009a0$c2401ce0$@nexbridge.com/
Randall S. Becker Jan. 8, 2025, 1:50 p.m. UTC | #4
On January 8, 2025 2:19 AM, Patrick Steinhardt wrote:
>On Tue, Jan 07, 2025 at 06:54:02PM -0500, rsbecker@nexbridge.com wrote:
>> On January 7, 2025 6:22 PM, brian m. carlson wrote:
>> >On 2025-01-07 at 15:26:58, Patrick Steinhardt wrote:
>> >> Hi,
>> >>
>> >> this small patch series fixes the issue reported by Randall [1],
>> >> where an exhausted entropy pool can cause us to die when writing a
>> >> new table to the reftable stack. I _think_ that this is only an
>> >> issue with the OpenSSL backend of `csprng_bytes()`:
>> >>
>> >>   - `arc4random_buf()` never returns an error.
>> >>
>> >>   - `getrandom()` pulls from "/dev/urandom" by default.
>> >>
>> >>   - `getentropy()` seems to block when there is not enough randomness
>> >>     available.
>> >>
>> >>   - `GtlGenRandom()` I cannot really tell.
>> >>
>> >>   - The fallback reads from "/dev/urandom", which also returns bytes
in
>> >>     case the entropy pool is drained.
>> >>
>> >> So OpenSSL's `RAND_bytes()` seems to be the only one that returns
>> >> an error when the entropy pool is empty. I did wonder whether we
>> >> even need to introduce the new flag in the first place, or whether
>> >> we cannot just use `RAND_pseudo_bytes()` unconditionally. But I'm a
>> >> bit uneasy about it given that OpenSSL has this doc:
>> >>
>> >>     RAND_pseudo_bytes() puts num pseudo-random bytes into buf.
>> >>     Pseudo-random byte sequences generated by RAND_pseudo_bytes() will
>> >>     be unique if they are of sufficient length, but are not
necessarily
>> >>     unpredictable. They can be used for non-cryptographic purposes and
>> >>     for certain purposes in cryptographic protocols, but usually not
for
>> >>     key generation etc.
>> >>
>> >> It might be too easy to accidentally rely on `csprng_bytes()` where
>> >> it actually requires strong cryptographic data, so I was erring on
>> >> the side of caution.
>> >
>> >The reason I didn't use RAND_pseudo_bytes is because it's been
>> >deprecated since OpenSSL 1.1.0 and RAND_bytes uses a CSPRNG just like
>> >RAND_pseudo_bytes as of that version.  Once it's seeded, it should be
>> >able to generate plenty of bytes, because I believe it uses a
>> >CTR-DRBG, which only needs to be reseeded after 2^48 bytes (which is far
more
>than we should be using).
>> >
>> >We can full well use RAND_pseudo_bytes, but all operating systems
>> >should provide an appropriate entropy source that can provide 256 bits
of
>entropy on startup.
>> >arc4random will just kill the process if it can't seed itself, so
>> >your changes won't actually prevent dying on a lack of entropy.
>> >
>> >I don't want an option that chooses "insecure" bytes.  My preference
>> >is that we require people use a different backend or an up-to-date
>> >OpenSSL version that shouldn't have this problem.  We can use
>> >RAND_pseudo_bytes if we really need to support older versions, but
>> >there are also no major operating systems which require that old of a
>> >version (CentOS 7, which is dead, used OpenSSL 1.0.2, and CentOS 8 uses
>1.1.1k), so it's probably not within our support policy to do that.
>> >
>> >Note also that if OpenSSL is being used for TLS, a lack of entropy
>> >will result in TLS not working, which means that Git will be randomly
>> >broken on that system, which is not really an experience that we want
>> >to encourage, so that should be taken into account.
>> >
>> >Can we get some more information about what version of OpenSSL is
>> >being used and what the system entropy source is?
>>
>> In my situation, OpenSSL 3.0.11 on ia64. 3.4.1 and 3.0.13 on x86 (x86
>> works fine Because OpenSSL uses hardware. On ia64, we end up on PRNGD,
>which does fail.
>
>You reported in [1] that a couple more tests are indeed failing, not only
t0610. That
>changes things in my opinion as it shows that this is not a localized issue
in the
>reftable library, but likely in multiple callsites where we use randomness.
So my
>current patch series is not sufficient as it only fixes up the reftable
codebase. But in
>the case where it's a general issue I tend to agree with brian, because I
don't want to
>play whack-a-mole with all the callsites of `git_rand()` where we can
indeed use
>insecure bytes.
>
>Honestly, this rather makes me want to remove the OpenSSL backend for our
>CSRNG completely. NonStop is the only platform that uses it right now, and
it seems
>to be easy to misconfigure. All the other backends we have don't have the
same
>issue as explained further up in my message. So does NonStop support any of
the
>alternative backends that Git has, like `arc4random_buf()`, `getrandom()`,
>`getentropy()` or reading from "/dev/urandom"?
>
>Might be I'm coming to conclusions too fast, so if I'm missing obvious
usecases then
>please stop me :)
>
>Randall, you mentioned that your platform had a maintenance window right
during
>the release of v2.48.0-rc2 [2]. You never mentioned issues with randomness
before
>that maintenance window, and after it you hit them in many tests without
any
>changes to the CSPRNG between v2.48.0-rc1 and -rc2. Could it be that
something
>broke on your end?

Unfortunately, ia64 is not a great platform for randomness. There are no
alternates
available. We have a case open on PRNGD, but it is unlikely to be fixed any
time soon.
The ia64 platform goes off support at the end of 2025, so we will stop
building git
for that platform when that happens. If there is some stopgap solution we
can use,
even PRNGD, but warn about reducing randomness load, it might work. For x86,
the hardware randomizer used in OpenSSL is fine.
brian m. carlson Jan. 8, 2025, 10:44 p.m. UTC | #5
On 2025-01-08 at 07:18:52, Patrick Steinhardt wrote:
> You reported in [1] that a couple more tests are indeed failing, not
> only t0610. That changes things in my opinion as it shows that this is
> not a localized issue in the reftable library, but likely in multiple
> callsites where we use randomness. So my current patch series is not
> sufficient as it only fixes up the reftable codebase. But in the case
> where it's a general issue I tend to agree with brian, because I don't
> want to play whack-a-mole with all the callsites of `git_rand()` where
> we can indeed use insecure bytes.
> 
> Honestly, this rather makes me want to remove the OpenSSL backend for
> our CSRNG completely. NonStop is the only platform that uses it right
> now, and it seems to be easy to misconfigure. All the other backends we
> have don't have the same issue as explained further up in my message. So
> does NonStop support any of the alternative backends that Git has, like
> `arc4random_buf()`, `getrandom()`, `getentropy()` or reading from
> "/dev/urandom"?

OpenSSL's backend is only as good as the system entropy source, which,
apparently in the case of PRNGD, is not very good.  The last release of
PRNGD was in 2007 apparently, so I don't think we should hold our breath
for a fix.

Or, of course, it could be simply that prngd works just fine and there
aren't enough sources for it.  If the machine has an analog microphone
input that isn't plugged in, streaming some data from that might be a
good source, since that will be noisy.  A second of recording signed
16-bit PCM data as 48 kHz might provide at least 64 bits of entropy[0].

I will say that libbsd provides a fallback implementation for its
getentropy code, which would allow the use of arc4random as a backend.
I know there were some portability problems with getting that to run on
NonStop, and of course I provide no guarantees about its suitability or
security, but it does appear that there is some alternative if the
porting problems can be overcome.  We explicitly have support for libbsd
in the Makefile already.

That doesn't avoid the problem of TLS and SSH not working, but it may
get the tests passing.

[0] That's 750 samples per bit of entropy, which I think should be
reasonably conservative.