
[00/12] slab: Introduce kmalloc_size_roundup()

Message ID 20220922031013.2150682-1-keescook@chromium.org (mailing list archive)

Message

Kees Cook Sept. 22, 2022, 3:10 a.m. UTC
Hi,

This series fixes up the cases where callers of ksize() use it to
opportunistically grow their buffer sizes, which can run afoul of the
__alloc_size hinting that CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE
use to perform dynamic buffer bounds checking. Quoting the first patch:


In the effort to help the compiler reason about buffer sizes, the
__alloc_size attribute was added to allocators. This improves the scope
of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
as the vast majority of callers are not expecting to use more memory
than what they asked for.

There is, however, one common exception to this: anticipatory resizing
of kmalloc allocations. These cases all use ksize() to determine the
actual bucket size of a given allocation (e.g. 128 when 126 was asked
for). This comes in two styles in the kernel:

1) An allocation has been determined to be too small, and needs to be
   resized. Instead of the caller choosing its own next best size, it
   wants to minimize the number of calls to krealloc(), so it just uses
   ksize() plus some additional bytes, forcing the realloc into the next
   bucket size, from which it can learn how large it is now. For example:

	data = krealloc(data, ksize(data) + 1, gfp);
	data_len = ksize(data);

2) The minimum size of an allocation is calculated, but since it may
   grow in the future, just use all the space available in the chosen
   bucket immediately, to avoid needing to reallocate later. A good
   example of this is skbuff's allocators:

	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
	...
	/* kmalloc(size) might give us more room than requested.
	 * Put skb_shared_info exactly at the end of allocated zone,
	 * to allow max possible filling before reallocation.
	 */
	osize = ksize(data);
	size = SKB_WITH_OVERHEAD(osize);
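The two styles can be modeled in userspace with a toy size-class table. In this sketch, model_bucket_size() and model_grow_by_ksize() are hypothetical names rather than kernel functions, and the bucket list (powers of two from 8 bytes, plus the 96- and 192-byte caches) is an assumption: the real set depends on architecture and configuration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the kmalloc size classes: powers of two from
 * 8 bytes, plus the 96- and 192-byte caches. Illustrative only; the real
 * set depends on architecture and configuration.
 */
size_t model_bucket_size(size_t size)
{
	size_t bucket = 8;

	if (size > 64 && size <= 96)
		return 96;
	if (size > 128 && size <= 192)
		return 192;
	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/*
 * Style 1 in miniature: "ksize(data) + 1" always spills into the next
 * size class, and the caller only learns the new length afterwards.
 */
size_t model_grow_by_ksize(size_t current_bucket)
{
	/* The input is what ksize() would report, i.e. a bucket size. */
	assert(model_bucket_size(current_bucket) == current_bucket);
	return model_bucket_size(current_bucket + 1);
}
```

Under this model a 126-byte request lands in the 128-byte bucket, matching the example above, and repeated "ksize() + 1" growth steps through 64, 96, 128, 192 and so on, with the caller learning each new length only after the allocation has happened.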

In both cases, the "how large is the allocation?" question is answered
_after_ the allocation, where the compiler hinting is not in an easy place
to make the association any more. This mismatch between the compiler's
view of the buffer length and the code's intention about how much it is
going to actually use has already caused problems[1]. It is possible to
fix this by reordering the use of the "actual size" information.
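A small userspace sketch of that mismatch, using the GCC/Clang alloc_size attribute together with __builtin_object_size(). bucket_alloc() and compiler_view_of_126() are hypothetical helpers; the exact value reported depends on the compiler and optimization level, but it is never the 128 bytes the allocator actually handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical allocator: alloc_size(1) tells the compiler the object is
 * `size` bytes long, even though at least 128 bytes are handed out.
 * noinline keeps the compiler from folding in the internal rounding.
 */
void *bucket_alloc(size_t size) __attribute__((noinline, alloc_size(1)));

void *bucket_alloc(size_t size)
{
	/* Hand out at least one whole 128-byte "bucket". */
	return malloc(size < 128 ? 128 : size);
}

/*
 * The compiler's idea of the object size for a 126-byte request: 126
 * when it can track the allocation, or (size_t)-1 when it cannot
 * (commonly at -O0). It does not see the extra 2 bytes.
 */
size_t compiler_view_of_126(void)
{
	char *p = bucket_alloc(126);
	size_t seen = __builtin_object_size(p, 0);

	free(p);
	return seen;
}
```

Bounds checks built on that view (as with CONFIG_FORTIFY_SOURCE or CONFIG_UBSAN_BOUNDS) can therefore flag accesses to bytes 126 and 127, even though a later ksize() call would report them as usable.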

We can serve the needs of users of ksize() and still have accurate buffer
length hinting for the compiler by doing the bucket size calculation
_before_ the allocation. Code can instead ask "how large an allocation
would I get for a given size?".

Introduce kmalloc_size_roundup(), to serve this function so we can start
replacing the "anticipatory resizing" uses of ksize().
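Applied to the first style, the reordering looks roughly like the sketch below. model_size_roundup() is a hypothetical userspace stand-in for kmalloc_size_roundup() built on assumed size classes, and grow_buffer() is an illustrative helper, not code from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for kmalloc_size_roundup(), using assumed size
 * classes: powers of two from 8 bytes plus 96 and 192. Zero stays zero.
 */
size_t model_size_roundup(size_t size)
{
	size_t bucket = 8;

	if (size == 0)
		return 0;
	if (size > 64 && size <= 96)
		return 96;
	if (size > 128 && size <= 192)
		return 192;
	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/*
 * Style 1 rewritten: choose the new length before reallocating and
 * request exactly that, so the allocation size and the recorded length
 * agree. Returns 0 on success; on failure *data and *len are unchanged.
 */
int grow_buffer(char **data, size_t *len)
{
	size_t new_len;
	char *tmp;

	assert(data && len);
	new_len = model_size_roundup(*len + 1);
	tmp = realloc(*data, new_len);
	if (!tmp)
		return -1;
	*data = tmp;
	*len = new_len;
	return 0;
}
```

Because the rounded length is chosen first and then passed to the allocator, the size the compiler associates with the buffer and the length the caller records are the same number.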

[1] https://github.com/ClangBuiltLinux/linux/issues/1599
    https://github.com/KSPP/linux/issues/183
-------

And after adding kmalloc_size_roundup(), put it to use with the various
ksize() callers, restore the previously removed __alloc_size hint,
and fix the use of __malloc annotations.
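For the annotation fix, GCC's documentation of the malloc attribute notes that it suits functions returning fresh storage, but not realloc-like functions, whose result may point to storage that already holds pointers to existing objects. The hypothetical wrappers below illustrate that split; they are not the kernel's actual declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * A fresh allocation can carry "malloc": the new pointer aliases nothing
 * and the memory holds no pointers to live objects yet.
 */
void *wrap_alloc(size_t size) __attribute__((malloc, alloc_size(1)));

/*
 * A realloc-style function keeps the old contents, which may include
 * pointers to other objects, so it carries only alloc_size(2).
 */
void *wrap_realloc(void *p, size_t size) __attribute__((alloc_size(2)));

void *wrap_alloc(size_t size)
{
	return malloc(size);
}

void *wrap_realloc(void *p, size_t size)
{
	return realloc(p, size);
}
```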

I tried to trim the CC list on this series since it got rather long. I
kept all the suggested mailing lists, though. :)

Thanks!

-Kees

Kees Cook (12):
  slab: Introduce kmalloc_size_roundup()
  skbuff: Proactively round up to kmalloc bucket size
  net: ipa: Proactively round up to kmalloc bucket size
  btrfs: send: Proactively round up to kmalloc bucket size
  dma-buf: Proactively round up to kmalloc bucket size
  coredump: Proactively round up to kmalloc bucket size
  igb: Proactively round up to kmalloc bucket size
  openvswitch: Proactively round up to kmalloc bucket size
  x86/microcode/AMD: Track patch allocation size explicitly
  iwlwifi: Track scan_cmd allocation size explicitly
  slab: Remove __malloc attribute from realloc functions
  slab: Restore __alloc_size attribute to __kmalloc_track_caller

 arch/x86/include/asm/microcode.h              |  1 +
 arch/x86/kernel/cpu/microcode/amd.c           |  3 +-
 drivers/dma-buf/dma-resv.c                    |  9 +++-
 drivers/net/ethernet/intel/igb/igb_main.c     |  1 +
 drivers/net/ipa/gsi_trans.c                   |  7 ++-
 drivers/net/wireless/intel/iwlwifi/dvm/dev.h  |  1 +
 drivers/net/wireless/intel/iwlwifi/dvm/scan.c | 10 +++-
 drivers/net/wireless/intel/iwlwifi/mvm/mvm.h  |  3 +-
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c  |  3 +-
 drivers/net/wireless/intel/iwlwifi/mvm/scan.c |  6 +--
 fs/btrfs/send.c                               | 11 +++--
 fs/coredump.c                                 |  7 ++-
 include/linux/compiler_types.h                | 13 ++----
 include/linux/slab.h                          | 46 ++++++++++++++++---
 mm/slab_common.c                              | 17 +++++++
 net/core/skbuff.c                             | 34 +++++++-------
 net/openvswitch/flow_netlink.c                |  4 +-
 17 files changed, 125 insertions(+), 51 deletions(-)

Comments

Christian König Sept. 22, 2022, 7:10 a.m. UTC | #1
Am 22.09.22 um 05:10 schrieb Kees Cook:
> Hi,
>
> This series fixes up the cases where callers of ksize() use it to
> opportunistically grow their buffer sizes, which can run afoul of the
> __alloc_size hinting that CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE
> use to perform dynamic buffer bounds checking.

Good cleanup, but one question: what other use cases do we have for ksize()
other than the opportunistic growth of buffers?

Off hand I can't see any.

So if this patch set is going to clean up this use case, it should
probably also take care to remove ksize(), or at least limit it so that
it won't be used for this use case in the future.

Regards,
Christian.


Kees Cook Sept. 22, 2022, 3:55 p.m. UTC | #2
On Thu, Sep 22, 2022 at 09:10:56AM +0200, Christian König wrote:
> Am 22.09.22 um 05:10 schrieb Kees Cook:
> > Hi,
> > 
> > This series fixes up the cases where callers of ksize() use it to
> > opportunistically grow their buffer sizes, which can run afoul of the
> > __alloc_size hinting that CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE
> > use to perform dynamic buffer bounds checking.
> 
> Good cleanup, but one question: what other use cases do we have for ksize()
> other than the opportunistic growth of buffers?

The remaining cases all seem to be using it as a "do we need to resize
yet?" check, where they don't actually track the allocation size
themselves and want to just depend on the slab cache to answer it. This
is most clearly seen in the igb code:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/igb/igb_main.c?h=v6.0-rc6#n1204

My "solution" there kind of side-steps it, and leaves ksize() as-is:
https://lore.kernel.org/linux-hardening/20220922031013.2150682-8-keescook@chromium.org/

The more correct solution would be to add per-v_idx size tracking,
similar to the other changes I sent:
https://lore.kernel.org/linux-hardening/20220922031013.2150682-11-keescook@chromium.org/
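Explicit tracking might look like the following sketch. struct tracked_buf and its helpers are hypothetical names for illustration; the idea is simply to store the requested size next to the pointer so that the "do we need to resize yet?" question never has to consult ksize().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical per-vector bookkeeping: record how many bytes were
 * requested instead of asking the allocator (ksize()) later.
 */
struct tracked_buf {
	void *ptr;
	size_t size;	/* bytes requested at allocation time */
};

/* "Do we need to resize yet?" answered from our own record. */
bool tracked_buf_fits(const struct tracked_buf *b, size_t needed)
{
	return b->ptr && needed <= b->size;
}

/* Reallocate only when the recorded size is too small. */
int tracked_buf_reserve(struct tracked_buf *b, size_t needed)
{
	void *p;

	assert(b);
	if (tracked_buf_fits(b, needed))
		return 0;
	p = realloc(b->ptr, needed);
	if (!p)
		return -1;	/* b is left unchanged */
	b->ptr = p;
	b->size = needed;
	return 0;
}
```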

I wonder if perhaps I should just migrate some of this code to using
something like struct membuf.
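For reference, struct membuf, as found in include/linux/regset.h, pairs a buffer pointer with the number of bytes left. The sketch below re-creates that idea in userspace under hypothetical names (struct membuf_model, membuf_model_write()); it approximates the pattern and is not the kernel definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A write cursor that always knows how much room remains. */
struct membuf_model {
	char *p;	/* next byte to write */
	size_t left;	/* space remaining in the buffer */
};

/* Copy as much of v as fits; returns the space still left afterwards. */
size_t membuf_model_write(struct membuf_model *s, const void *v, size_t size)
{
	assert(s && (v || size == 0));
	if (size > s->left)
		size = s->left;
	if (size) {
		memcpy(s->p, v, size);
		s->p += size;
		s->left -= size;
	}
	return s->left;
}
```

Since the remaining space travels with the pointer, no caller ever needs to ask the allocator how big the underlying buffer really is.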

> Off hand I can't see any.
> 
> So when this patch set is about to clean up this use case it should probably
> also take care to remove ksize() or at least limit it so that it won't be
> used for this use case in the future.

Yeah, my goal would be to eliminate ksize(), and it seems possible if
other cases are satisfied with tracking their allocation sizes directly.

-Kees
Vlastimil Babka Sept. 22, 2022, 9:05 p.m. UTC | #3
On 9/22/22 17:55, Kees Cook wrote:
> On Thu, Sep 22, 2022 at 09:10:56AM +0200, Christian König wrote:
>> Am 22.09.22 um 05:10 schrieb Kees Cook:
>> > Hi,
>> > 
>> > This series fixes up the cases where callers of ksize() use it to
>> > opportunistically grow their buffer sizes, which can run afoul of the
>> > __alloc_size hinting that CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE
>> > use to perform dynamic buffer bounds checking.
>> 
>> Good cleanup, but one question: what other use cases do we have for ksize()
>> other than the opportunistic growth of buffers?
> 
> The remaining cases all seem to be using it as a "do we need to resize
> yet?" check, where they don't actually track the allocation size
> themselves and want to just depend on the slab cache to answer it. This
> is most clearly seen in the igb code:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/igb/igb_main.c?h=v6.0-rc6#n1204
> 
> My "solution" there kind of side-steps it, and leaves ksize() as-is:
> https://lore.kernel.org/linux-hardening/20220922031013.2150682-8-keescook@chromium.org/
> 
> The more correct solution would be to add per-v_idx size tracking,
> similar to the other changes I sent:
> https://lore.kernel.org/linux-hardening/20220922031013.2150682-11-keescook@chromium.org/
> 
> I wonder if perhaps I should just migrate some of this code to using
> something like struct membuf.
> 
>> Off hand I can't see any.
>> 
>> So when this patch set is about to clean up this use case it should probably
>> also take care to remove ksize() or at least limit it so that it won't be
>> used for this use case in the future.
> 
> Yeah, my goal would be to eliminate ksize(), and it seems possible if
> other cases are satisfied with tracking their allocation sizes directly.

I think we could leave ksize() to determine the size without a need for
external tracking, but from now on forbid callers from using that hint to
overflow the allocation size they actually requested? Once we remove the
kasan/kfence hooks in ksize() that make the current kinds of usage possible,
we should be able to catch any offenders of the new semantics that would appear?

> -Kees
>
Kees Cook Sept. 22, 2022, 9:49 p.m. UTC | #4
On Thu, Sep 22, 2022 at 11:05:47PM +0200, Vlastimil Babka wrote:
> On 9/22/22 17:55, Kees Cook wrote:
> > On Thu, Sep 22, 2022 at 09:10:56AM +0200, Christian König wrote:
> > [...]
> > > So when this patch set is about to clean up this use case it should probably
> > > also take care to remove ksize() or at least limit it so that it won't be
> > > used for this use case in the future.
> > 
> > Yeah, my goal would be to eliminate ksize(), and it seems possible if
> > other cases are satisfied with tracking their allocation sizes directly.
> 
> I think we could leave ksize() to determine the size without a need for
> external tracking, but from now on forbid callers from using that hint to
> overflow the allocation size they actually requested? Once we remove the
> kasan/kfence hooks in ksize() that make the current kinds of usage possible,
> we should be able to catch any offenders of the new semantics that would appear?

That's correct. I spent the morning working my way through the rest of
the ksize() users I didn't clean up yesterday, and in several places I
just swapped in __ksize(). But that wouldn't even be needed if we just
removed the kasan unpoisoning from ksize(), etc.

I am tempted to leave it as __ksize(), though, just to reinforce that it's
not supposed to be used "normally". What do you think?
Vlastimil Babka Sept. 23, 2022, 9:07 a.m. UTC | #5
On 9/22/22 23:49, Kees Cook wrote:
> On Thu, Sep 22, 2022 at 11:05:47PM +0200, Vlastimil Babka wrote:
>> On 9/22/22 17:55, Kees Cook wrote:
>> > On Thu, Sep 22, 2022 at 09:10:56AM +0200, Christian König wrote:
>> > [...]
>> > > So when this patch set is about to clean up this use case it should probably
>> > > also take care to remove ksize() or at least limit it so that it won't be
>> > > used for this use case in the future.
>> > 
>> > Yeah, my goal would be to eliminate ksize(), and it seems possible if
>> > other cases are satisfied with tracking their allocation sizes directly.
>> 
>> I think we could leave ksize() to determine the size without a need for
>> external tracking, but from now on forbid callers from using that hint to
>> overflow the allocation size they actually requested? Once we remove the
>> kasan/kfence hooks in ksize() that make the current kinds of usage possible,
>> we should be able to catch any offenders of the new semantics that would appear?
> 
> That's correct. I spent the morning working my way through the rest of
> the ksize() users I didn't clean up yesterday, and in several places I
> just swapped in __ksize(). But that wouldn't even be needed if we just
> removed the kasan unpoisoning from ksize(), etc.
> 
> I am tempted to leave it as __ksize(), though, just to reinforce that it's
> not supposed to be used "normally". What do you think?

Sounds good. Note that linux-next now has a series in slab.git, planned for
6.1, that moves the __ksize() declaration to mm/slab.h to make it more private.
But we don't want random users outside mm and the related kasan/kfence
subsystems to include mm/slab.h, so we'll have to expose it again instead of
ksize().