Message ID | 20211025181634.3889666-1-willy@infradead.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | secretmem: Prevent secretmem_users from wrapping to zero | expand |
On Mon, Oct 25, 2021 at 07:16:34PM +0100, Matthew Wilcox (Oracle) wrote: > Commit 110860541f44 ("mm/secretmem: use refcount_t instead of atomic_t") > attempted to fix the problem of secretmem_users wrapping to zero and > allowing suspend once again. Prevent secretmem_users from wrapping to > zero by forbidding new users if the number of users has wrapped from > positive to negative. This stops a long way short of reaching the > necessary 4 billion users, so there's no need to be clever with special > anti-wrap types or checking the return value from atomic_inc(). I still prefer refcount_t here because it provides deterministic saturation, but the risk right now is so narrow ("don't hibernate"), I'm not going to fight for it. I think it'd be fine to use it initialized to 1, and have the removal check for == 0 as a failure state, which would deterministically cover the underflow case too. -Kees > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Jordy Zomer <jordy@pwning.systems> > Cc: Kees Cook <keescook@chromium.org>, > Cc: James Bottomley <James.Bottomley@HansenPartnership.com> > Cc: Mike Rapoport <rppt@kernel.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > --- > mm/secretmem.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/secretmem.c b/mm/secretmem.c > index 030f02ddc7c1..c2dda408bb36 100644 > --- a/mm/secretmem.c > +++ b/mm/secretmem.c > @@ -203,6 +203,8 @@ SYSCALL_DEFINE1(memfd_secret, unsigned int, flags) > > if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC)) > return -EINVAL; > + if (atomic_read(&secretmem_users) < 0) > + return -ENFILE; > > fd = get_unused_fd_flags(flags & O_CLOEXEC); > if (fd < 0) > -- > 2.33.0 >
On Mon, Oct 25, 2021 at 12:29:46PM -0700, Kees Cook wrote: > On Mon, Oct 25, 2021 at 07:16:34PM +0100, Matthew Wilcox (Oracle) wrote: > > Commit 110860541f44 ("mm/secretmem: use refcount_t instead of atomic_t") > > attempted to fix the problem of secretmem_users wrapping to zero and > > allowing suspend once again. Prevent secretmem_users from wrapping to > > zero by forbidding new users if the number of users has wrapped from > > positive to negative. This stops a long way short of reaching the > > necessary 4 billion users, so there's no need to be clever with special > > anti-wrap types or checking the return value from atomic_inc(). > > I still prefer refcount_t here because it provides deterministic > saturation, but the risk right now is so narrow ("don't hibernate"), > I'm not going to fight for it. I think it'd be fine to use it initialized > to 1, and have the removal check for == 0 as a failure state, which would > deterministically cover the underflow case too. I still think that's abusing the refcount_t pattern. refcount_t should be for ... reference counts. Not these other things.
On Mon, Oct 25, 2021 at 08:51:40PM +0100, Matthew Wilcox wrote: > On Mon, Oct 25, 2021 at 12:29:46PM -0700, Kees Cook wrote: > > On Mon, Oct 25, 2021 at 07:16:34PM +0100, Matthew Wilcox (Oracle) wrote: > > > Commit 110860541f44 ("mm/secretmem: use refcount_t instead of atomic_t") > > > attempted to fix the problem of secretmem_users wrapping to zero and > > > allowing suspend once again. Prevent secretmem_users from wrapping to > > > zero by forbidding new users if the number of users has wrapped from > > > positive to negative. This stops a long way short of reaching the > > > necessary 4 billion users, so there's no need to be clever with special > > > anti-wrap types or checking the return value from atomic_inc(). > > > > I still prefer refcount_t here because it provides deterministic > > saturation, but the risk right now is so narrow ("don't hibernate"), > > I'm not going to fight for it. I think it'd be fine to use it initialized > > to 1, and have the removal check for == 0 as a failure state, which would > > deterministically cover the underflow case too. > > I still think that's abusing the refcount_t pattern. refcount_t should > be for ... reference counts. Not these other things. Is secretmem different? We're trying to count how many of these we have, this is a common pattern in, for example, the network code which does this kind of thing a lot. It's just that for a "global" counter like here, we're not tied to a specific object's lifetime, but usually some global state that depends on having _any_ of the objects alive.
On Mon, Oct 25, 2021 at 2:04 PM Kees Cook <keescook@chromium.org> wrote: > > Is secretmem different? We're trying to count how many of these we have, > this is a common pattern in, for example, the network code which does > this kind of thing a lot. Yes, secretmem is different. A refcount being zero means that the data it referenced no longer exists. That's not what the secretmem counter meant at all. Making it a refcount was simply WRONG. It's not a refcount, it's a usage count, and the fact that syzbot caught the warning just shows how wrong it was. Stop arguing for garbage. It was wrong, just admit it. The semantics for "refcount" is something else than what that code had. As a result, it got reverted. I've applied Willy's patch that actually makes sense. Arguing for garbage in the name of "security" is still garbage. In fact, it only causes confusion, and as such is likely to result in problems - including security problems - later. Because confusion about semantics is bad. And that was what that patch was. And I want to state - again - how dangerous this "refcounts are always prefereable to atomics" mental model is. Refcounts are _not_ fundamentally preferable to atomics. They are slower, bigger, and have completely different semantics. So if something isn't a refcount, it damn well shouldn't use "refcount_t". Linus
On Mon, Oct 25, 2021 at 02:17:58PM -0700, Linus Torvalds wrote: > On Mon, Oct 25, 2021 at 2:04 PM Kees Cook <keescook@chromium.org> wrote: > > > > Is secretmem different? We're trying to count how many of these we have, > > this is a common pattern in, for example, the network code which does > > this kind of thing a lot. > > Yes, secretmem is different. Prefix: I'm fine with this being whatever; it's a corner case, so this reply is mainly about nailing down the rationales for future decisions. > A refcount being zero means that the data it referenced no longer exists. I don't disagree with this definition, but I would like to understand how some other use-cases fit into this. The case of basic allocated object memory lifetime reference counting we all agree on. What about the case of what I see that is more like a "shared resource usage count" where the shared resource doesn't necessarily disappear when we reach "no users"? i.e. there is some resource, and it starts its life with no one using it (count = 1). Then we have users declaring their use (count = 2) and later release (count = 1) of the resource. It's not really ever unallocated when users == 0 (count == 1), but we might use the usage counter for other things and don't want to let it go negative or ever increment from zero (in case zero is used for marking the resource unavailable forever). For example, protocols knowing if there are any sockets left open, crypto API usage counts, kernel module usage counts, etc. I don't see as clear a distinction between secretmem and the above examples. The question being answered is "how many users do I have?" and we want to make sure we don't end up with overflow or underflow given the (unfortunately) common case of reference counting kernel bugs. The fact that secretmem doesn't have its own allocation to free when it hits 0 seems like just an implementation detail of this particular resource usage counter. But, ignoring secretmem for a moment, what about a specific example from above: the module loader's refcnt atomic_t, which is actually maintaining an allocation (the module), and uses usage counts of 0 to mean "I am removing this module", and 1 is "I have no users", 2 is "1 user", etc. It seems like this should use refcount_t to me? > Stop arguing for garbage. It was wrong, just admit it. The semantics > for "refcount" is something else than what that code had. As a result, > it got reverted. I've applied Willy's patch that actually makes sense. > > Arguing for garbage in the name of "security" is still garbage. In > fact, it only causes confusion, and as such is likely to result in > problems - including security problems - later. Sure, and reasonable people can disagree about what is garbage. :) I see using a refcount_t here as a not unreasonable way to protect against potential future security problems if the scope of secretmem ever grows and more than just hibernation starts to care about this usage counter. > Because confusion about semantics is bad. And that was what that patch was. > > And I want to state - again - how dangerous this "refcounts are always > prefereable to atomics" mental model is. Refcounts are _not_ > fundamentally preferable to atomics. They are slower, bigger, and have > completely different semantics. I will push back on this; I don't think that's a fair assessment at all. For storage size, refcount_t is identical to atomic_t (refcount_t _is_ an atomic_t). But perhaps you meant code size? Yes, the refcount_t helpers are technically larger. But like speed, where refcount_t is also technically slower, neither size nor speed were so much changed that we kept around the routines that made it exactly the same speed and grew the instruction count by 1 (the original x86-specific refcount_t implementation was just as fast as atomic_t). I don't see speed nor size alone to be a good reason to say "don't use refcount_t". But yes, I agree about the different semantics: there are very specific memory ordering assumptions that tend to be more strict than atomic_t (which IMO actually makes it more suitable than atomic_t for shared usage counts). > So if something isn't a refcount, it damn well shouldn't use "refcount_t". Again, I don't disagree, but since it looked like a refcount to me, I'd like to understand what why we don't see this case the same way. Since I agree that secretmem is currently pretty iffy (nothing actually allocated to track, only system state), I'll ask a slightly different question: should the module loader use refcount_t? If not, why?
On Mon, Oct 25, 2021 at 3:30 PM Kees Cook <keescook@chromium.org> wrote: > > > A refcount being zero means that the data it referenced no longer exists. > > I don't disagree with this definition, but I would like to understand how > some other use-cases fit into this. I certainly hope that there are no other use-cases for 'recount_t', because that "zero is invalid" is very much part of the semantics. If we want other semantics, it should be a new type. > What about the case of what > I see that is more like a "shared resource usage count" where the shared > resource doesn't necessarily disappear when we reach "no users"? So I think that's really "atomic_t". And instead of saturating, people should always check such shared resources for limits. > i.e. there is some resource, and it starts its life with no one using it > (count = 1). You are already going off into the weeds. That's not a natural thing to do. It's already confusing. Really. Read that sentence yourself, and read it like an outsider. "No one is using it, so count == 1" is a nonsensican statement on the face of it. You are thinking of a refcount_t trick, not some sane semantics. Yes, we have played off-by-one games in the kernel before. We've done it for various subtle reasons. For example, traditionally, on x86, with atomic counting there are three special situations: negative, 0 and positive. So if you use the traditional x86 counting atomics (just add/sub/inc/dec, no xadd) then there are situations where you can get more information about the result in %eflags if you don't use zero as the initial value, but -1. Because then you can do "inc", and if ZF is set, you know you were the _first_ person to increment it. And when you use "dec", and SF is set afterwards, you know you are the _last_ person to decrement it. That was useful when things like "xadd" weren't available, and cmpxchg loops are expensive. So we used to have counters where -1 was that "zero point". Very similar to your "1 is the zero point". But was it _logical_? No. It was an implementation trick. I think we've removed all those cases because it was so subtle and confusing (but maybe we still have it somewhere - I did not check). So we've certainly played those kinds of games. But it had better be for a really good reason. > I don't see as clear a distinction between secretmem and the above > examples. I really don't see what's wrong with 'atomic_t', and just checking for limits. Saturating counters are EVIL AND BAD. They are a DoS waiting to happen. Once you saturate, the machine is basically dead. You may have "protected" against some attack, but you did so by killing the machine and making the resource accounting no longer work. So if a user can ever trigger a saturating counter, that's a big big problem in itself. In contrast, an 'atomic_t' with a simple limit? It just works. And it doesn't need illogical tricks to work. Stop thinking that refcount_t is a good type. Start realizing the downsides. Start understanding that saturation is a HORRENDOUSLY BAD solution, and horrible QoI. Linus
On Mon, Oct 25, 2021 at 4:37 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Stop thinking that refcount_t is a good type. Start realizing the > downsides. Start understanding that saturation is a HORRENDOUSLY BAD > solution, and horrible QoI. Basically, refcount_t should be used purely for internal kernel data structures - it makes perfect sense for things like the 'struct device' resource handling, for example. In fact, there it is good for two reasons: - it's not counting some user resource, so users shouldn't have any way to trigger overflow and saturation which causes problems - it's used by random driver stuff, which is often where kernel bugs happen and testing is fundamentally limited by hw availability etc but in general, anything that is user-accountable needs to have _limits_, not saturation. It's why the page count is a "atomic_t" even if the name of the field is "_refcount". Because refcount_t is the INFERIOR TYPE. Using an atomic_t properly is actually the much better option. It's just that "properly" might be a bit more code, involving actual limit checking. 'refcount_t' is basically a shorthand for "I didn't bother doing this right, so I'm using this type that adds debugging, warns and stops working and might DoS the kernel". It's a crutch. It's not the alpha and the omega of counting types. It has its place, but I really want to stress how people should ABSOLUTELY not think "oh, refcount_t is better than atomic_t". Linus
On Mon, Oct 25, 2021 at 04:37:01PM -0700, Linus Torvalds wrote: > If we want other semantics, it should be a new type. Okay, that's reasonable. > > I see that is more like a "shared resource usage count" where the shared > > resource doesn't necessarily disappear when we reach "no users"? > > So I think that's really "atomic_t". > > And instead of saturating, people should always check such shared > resources for limits. Right, but people make mistakes, etc. I agree about the limit being much more sane than saturating (though in the cases of "missed decrement"), we get to the same place: an open-coded check for the limit that never goes down doesn't matter if it's refcount_t nor atomic_t. :) > > i.e. there is some resource, and it starts its life with no one using it > > (count = 1). > > You are already going off into the weeds. > > That's not a natural thing to do. It's already confusing. Really. Read > that sentence yourself, and read it like an outsider. > > "No one is using it, so count == 1" is a nonsensican statement on the > face of it. > > You are thinking of a refcount_t trick, not some sane semantics. > > Yes, we have played off-by-one games in the kernel before. We've done > it for various subtle reasons. Right, sure, but it's not a rare pattern. Given that it exists, and that it _does_ get used for allocation management (e.g. module loader), it seems worth constructing a proper type for it so that all the open coded stuff around these instances can be consolidated, and the API can be defined in a way that will behave sanely. > I really don't see what's wrong with 'atomic_t', and just checking for limits. It's that last part. :) If we go through atomic_dec() see a zero and do something, okay, fine. But these places need to check for insane conditions too ("we got a -1 back -- this means there's a bug but what do we do?"). Same for atomic_inc(): "oh, we're at our limit, do something", but what above discovering ourselves above the limit? There's nothing about using the atomic_t primitives that enforces these kinds of checks. (And there likely shouldn't be for atomic_t -- it's a plain type.) But we likely need something that fills in this API gap between atomic_t and refcount_t. > So if a user can ever trigger a saturating counter, that's a big big > problem in itself. Yes! It is. :) But they don't get to gain control over a Use-after-Free. The risk to the system is DoS instead of loss of execution control. That's a meaningful risk downgrade. So, what's the right semantics for an atomic type that could be used in the module loader, that would catch kernel counting bugs in a safe manner? The "refcount_t but 1-based" is close, but clearly not the right name. :)
On Mon, Oct 25, 2021 at 5:18 PM Kees Cook <keescook@chromium.org> wrote: > > Right, sure, but it's not a rare pattern. Well, for an actual reference count it certainly isn't a rare pattern, and zero _is_ special, because at zero, you are now in use-after-free territory. But that's kind of the issue here: that really isn't what 'secretmem_users' was ever about. Zero isn't some "now we're use-after-free" situation. Quite the reverse. Zero ends up being the safe thing. So with that kind of "just count number of existing users", where zero isn't special, then refcount_t doesn't make sense. And refcount_t is for non-core stuff that has a lot of random kernel users that you can't easily verify. In contrast, 'secretmem_users' had exactly two sites that modified it, and one that tested it. > But these places need to check for insane > conditions too ("we got a -1 back -- this means there's a bug but what > do we do?"). Same for atomic_inc(): "oh, we're at our limit, do > something", but what above discovering ourselves above the limit? So honestly, "above the limit" is often perfectly fine too. It can be fine for two very different reasons: (a) racy checks are often much simpler and faster, and perfectly safe when the limit is "far away from overflow". (b) limits can change And (a) isn't just about "avoid special atomics". It's about doing the limit check optimistically outside locking etc. And (b) wasn't an issue here (where the only real use was ltierally "are there any users at all"), but in most _proper_ use cases you will want to have some resource limit that might be set by MIS. And might be changed dynamically. So it's entirely possible that somebody sets the limit to something smaller than the current user (prep for shutdown, whatever), without it being an error at all. The limit is for future work, not for past work. Easily happens with things like rlimits etc. > There's nothing about using the atomic_t primitives that enforces these > kinds of checks. (And there likely shouldn't be for atomic_t -- it's a > plain type.) But we likely need something that fills in this API gap > between atomic_t and refcount_t. I dispute the "need". This isn't as common as you claim. Most resource counting _is_ for "free when no longer used". And on the other end, you have the users that don't want refcount_t because they can't live with the limitations of that interface, like the page counts etc, that do it properly. So I think in 99% of all situations, the proper fix is to embed an "atomic_t" in the type it protects, and then have the helper functions that actually do it properly. Like we do for "get_page()" and friends. The "new type" isn't about the reference counting, it's about the data itself, and the atomic_t is just a part of it. Could we do something new type that warns on the "decrement past zero" and "overflow on increment"? Sure. But since they by _definition_ aren't about data lifetimes, they probably don't need saturation - you want the _warning_, but they aren't protecting data, since they aren't refcounts. Or could we have something even fancier, that is an actual defined range, and "overflow" is not "overflow in 32 bits", but "becomes bigger than X")? That gets more complex because now you'd have to encode the range in the type somehow. You could do it with actual static types (generate typedef names and code), or you could do it with types that have a more dynamic pointer to ranges (kind of like the sysfs interfaces do) or have the ranges embedded in the data structure itself. But honestly, the complexity downside seems to just dwarf the upside. Linus
On Mon, Oct 25, 2021 at 04:37:01PM -0700, Linus Torvalds wrote: > For example, traditionally, on x86, with atomic counting there are > three special situations: negative, 0 and positive. So if you use the > traditional x86 counting atomics (just add/sub/inc/dec, no xadd) then > there are situations where you can get more information about the > result in %eflags if you don't use zero as the initial value, but -1. > > Because then you can do "inc", and if ZF is set, you know you were the > _first_ person to increment it. And when you use "dec", and SF is set > afterwards, you know you are the _last_ person to decrement it. > > That was useful when things like "xadd" weren't available, and cmpxchg > loops are expensive. So we used to have counters where -1 was that > "zero point". Very similar to your "1 is the zero point". > > But was it _logical_? No. It was an implementation trick. I think > we've removed all those cases because it was so subtle and confusing > (but maybe we still have it somewhere - I did not check). We still do it for page->_mapcount; it's biased to -1 so that both "the page has exactly one mapping" and "the page has no mapping" are cheaply checkable conditions. But, as you say, that's invisible to the users. page_mapcount() is return atomic_read(&page->_mapcount) + 1;
diff --git a/mm/secretmem.c b/mm/secretmem.c index 030f02ddc7c1..c2dda408bb36 100644 --- a/mm/secretmem.c +++ b/mm/secretmem.c @@ -203,6 +203,8 @@ SYSCALL_DEFINE1(memfd_secret, unsigned int, flags) if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC)) return -EINVAL; + if (atomic_read(&secretmem_users) < 0) + return -ENFILE; fd = get_unused_fd_flags(flags & O_CLOEXEC); if (fd < 0)
Commit 110860541f44 ("mm/secretmem: use refcount_t instead of atomic_t") attempted to fix the problem of secretmem_users wrapping to zero and allowing suspend once again. Prevent secretmem_users from wrapping to zero by forbidding new users if the number of users has wrapped from positive to negative. This stops a long way short of reaching the necessary 4 billion users, so there's no need to be clever with special anti-wrap types or checking the return value from atomic_inc(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Jordy Zomer <jordy@pwning.systems> Cc: Kees Cook <keescook@chromium.org>, Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> --- mm/secretmem.c | 2 ++ 1 file changed, 2 insertions(+)