[00/62] Separate struct slab from struct page

Message ID: 20211004134650.4031813-1-willy@infradead.org

Message

Matthew Wilcox Oct. 4, 2021, 1:45 p.m. UTC
This is an offshoot of the folio work, although it does not depend on any
of the outstanding pieces of folio work.  One of the more complex parts
of the struct page definition is the parts used by the slab allocators.
It would be good for the MM in general if struct slab were its own data
type, and it also helps to prevent tail pages from slipping in anywhere.
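
Roughly, the new type looks something like this (an illustrative
sketch; field names approximate the series rather than copying it):

	/*
	 * Illustrative sketch only: a slab descriptor that overlays the
	 * slab-related fields of struct page.  Because a struct slab is
	 * only ever created from the head page of an allocation, code
	 * holding a struct slab pointer can never be looking at a tail
	 * page.
	 */
	struct slab {
		unsigned long flags;	/* must alias page->flags */
		union {
			struct list_head slab_list;	/* partial list */
			struct rcu_head rcu_head;	/* deferred freeing */
		};
		struct kmem_cache *slab_cache;	/* owning cache */
		void *freelist;		/* first free object, or NULL */
		union {
			unsigned long counters;
			struct {
				unsigned inuse:16;
				unsigned objects:15;
				unsigned frozen:1;
			};
		};
	};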

The slub conversion is done "properly", ie in individually reviewable,
bisectable chunks.  The slab & slob conversions are slapdash.  The
patches to switch bootmem & zsmalloc away from slab elements are also
expedient instead of well thought through.  The KASAN and memcg parts
would also benefit from a more thoughtful approach.

I'm not entirely happy with slab_test_cache() for a predicate name.
I actually liked SlabAllocation() better, but then I remembered that we're
trying to get away from InterCapping, and somehow slab_test_allocation()
didn't feel right either.
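
Whatever it ends up being called, the predicate itself is only a
PG_slab test on the descriptor.  A minimal sketch, assuming the flags
field aliases page->flags as in the layout above:

	/* Sketch: true iff this descriptor is really a slab allocation,
	 * i.e. the underlying head page has PG_slab set.
	 */
	static inline bool slab_test_cache(const struct slab *slab)
	{
		return test_bit(PG_slab, &slab->flags);
	}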

I don't know the slab allocators terribly well, so I would be very
grateful if one of the slab maintainers took over this effort.  This is
kind of a distraction from what I'm really trying to accomplish with
folios, although it has found one minor bug.

Matthew Wilcox (Oracle) (62):
  mm: Convert page_to_section() to pgflags_section()
  mm: Add pgflags_nid()
  mm: Split slab into its own type
  mm: Add account_slab() and unaccount_slab()
  mm: Convert virt_to_cache() to use struct slab
  mm: Convert __ksize() to struct slab
  mm: Use struct slab in kmem_obj_info()
  mm: Convert check_heap_object() to use struct slab
  mm/slub: Convert process_slab() to take a struct slab
  mm/slub: Convert detached_freelist to use a struct slab
  mm/slub: Convert kfree() to use a struct slab
  mm/slub: Convert __slab_free() to take a struct slab
  mm/slub: Convert new_slab() to return a struct slab
  mm/slub: Convert early_kmem_cache_node_alloc() to use struct slab
  mm/slub: Convert kmem_cache_cpu to struct slab
  mm/slub: Convert show_slab_objects() to struct slab
  mm/slub: Convert validate_slab() to take a struct slab
  mm/slub: Convert count_partial() to struct slab
  mm/slub: Convert bootstrap() to struct slab
  mm/slub: Convert __kmem_cache_do_shrink() to struct slab
  mm/slub: Convert free_partial() to use struct slab
  mm/slub: Convert list_slab_objects() to take a struct slab
  mm/slub: Convert slab_alloc_node() to use a struct slab
  mm/slub: Convert get_freelist() to take a struct slab
  mm/slub: Convert node_match() to take a struct slab
  mm/slub: Convert slab flushing to struct slab
  mm/slub: Convert __unfreeze_partials to take a struct slab
  mm/slub: Convert deactivate_slab() to take a struct slab
  mm/slub: Convert acquire_slab() to take a struct slab
  mm/slub: Convert partial slab management to struct slab
  mm/slub: Convert slab freeing to struct slab
  mm/slub: Convert shuffle_freelist to struct slab
  mm/slub: Remove struct page argument to next_freelist_entry()
  mm/slub: Remove struct page argument from setup_object()
  mm/slub: Convert freelist_corrupted() to struct slab
  mm/slub: Convert full slab management to struct slab
  mm/slub: Convert free_consistency_checks() to take a struct slab
  mm/slub: Convert alloc_debug_processing() to struct slab
  mm/slub: Convert check_object() to struct slab
  mm/slub: Convert on_freelist() to struct slab
  mm/slub: Convert check_slab() to struct slab
  mm/slub: Convert check_valid_pointer() to struct slab
  mm/slub: Convert object_err() to take a struct slab
  mm/slub: Convert print_trailer() to struct slab
  mm/slub: Convert slab_err() to take a struct slab
  mm/slub: Convert print_page_info() to print_slab_info()
  mm/slub: Convert trace() to take a struct slab
  mm/slub: Convert cmpxchg_double_slab to struct slab
  mm/slub: Convert get_map() and __fill_map() to struct slab
  mm/slub: Convert slab_lock() and slab_unlock() to struct slab
  mm/slub: Convert setup_page_debug() to setup_slab_debug()
  mm/slub: Convert pfmemalloc_match() to take a struct slab
  mm/slub: Remove pfmemalloc_match_unsafe()
  mm: Convert slab to use struct slab
  mm: Convert slob to use struct slab
  mm: Convert slub to use struct slab
  memcg: Convert object cgroups from struct page to struct slab
  mm/kasan: Convert to struct slab
  zsmalloc: Stop using slab fields in struct page
  bootmem: Use page->index instead of page->freelist
  iommu: Use put_pages_list
  mm: Remove slab from struct page

 arch/x86/mm/init_64.c              |    2 +-
 drivers/iommu/amd/io_pgtable.c     |   99 +--
 drivers/iommu/dma-iommu.c          |   11 +-
 drivers/iommu/intel/iommu.c        |   89 +--
 include/asm-generic/memory_model.h |    2 +-
 include/linux/bootmem_info.h       |    2 +-
 include/linux/iommu.h              |    3 +-
 include/linux/kasan.h              |    8 +-
 include/linux/memcontrol.h         |   34 +-
 include/linux/mm.h                 |   16 +-
 include/linux/mm_types.h           |   82 +-
 include/linux/page-flags.h         |   66 +-
 include/linux/slab.h               |    8 -
 include/linux/slab_def.h           |   16 +-
 include/linux/slub_def.h           |   25 +-
 mm/bootmem_info.c                  |    7 +-
 mm/kasan/common.c                  |   25 +-
 mm/kasan/generic.c                 |    8 +-
 mm/kasan/kasan.h                   |    2 +-
 mm/kasan/quarantine.c              |    2 +-
 mm/kasan/report.c                  |   16 +-
 mm/kasan/report_tags.c             |   10 +-
 mm/memcontrol.c                    |   33 +-
 mm/slab.c                          |  423 +++++-----
 mm/slab.h                          |  164 +++-
 mm/slab_common.c                   |    8 +-
 mm/slob.c                          |   42 +-
 mm/slub.c                          | 1143 ++++++++++++++--------------
 mm/sparse.c                        |    8 +-
 mm/usercopy.c                      |   13 +-
 mm/zsmalloc.c                      |   18 +-
 31 files changed, 1212 insertions(+), 1173 deletions(-)

Comments

Johannes Weiner Oct. 11, 2021, 8:07 p.m. UTC | #1
On Mon, Oct 04, 2021 at 02:45:48PM +0100, Matthew Wilcox (Oracle) wrote:
> This is an offshoot of the folio work, although it does not depend on any
> of the outstanding pieces of folio work.  One of the more complex parts
> of the struct page definition is the parts used by the slab allocators.
> It would be good for the MM in general if struct slab were its own data
> type, and it also helps to prevent tail pages from slipping in anywhere.
> 
> The slub conversion is done "properly", ie in individually reviewable,
> bisectable chunks.  The slab & slob conversions are slapdash.  The
> patches to switch bootmem & zsmalloc away from slab elements are also
> expedient instead of well thought through.  The KASAN and memcg parts
> would also benefit from a more thoughtful approach.

This looks great to me. It's a huge step in disentangling struct page,
and it's already showing very cool downstream effects in somewhat
unexpected places like the memory cgroup controller.

> I'm not entirely happy with slab_test_cache() for a predicate name.
> I actually liked SlabAllocation() better, but then I remembered that we're
> trying to get away from InterCapping, and somehow slab_test_allocation()
> didn't feel right either.

I touched on this in the reply to the memcg patch, but I wonder if it
would be better to not have this test against struct slab at all.

It's basically a type test that means slab_is_slab(), and its
existence implies that if you see 'struct slab' somewhere, you don't
know for sure - without having more context - whether it's been vetted
or not. This makes 'struct slab' an option/maybe type that needs a
method of self-identification. Right now it can use the PG_slab bit in
the flags field shared with the page, but if it were separately
allocated some day, that would occupy dedicated memory.

And describing memory, that's struct page's role: to identify what's
sitting behind a random pfn or an address. "struct page should be a
pointer and a type tag", or something like that ;-)

If instead of this:

	slab = virt_to_slab(p);
	if (!slab_test_cache(slab)) {
		page = (struct page *)slab;
		do_page_things(page);
	} else {
		...
	}

you wrote it like this:

	page = virt_to_page(p);
	if (PageSlab(page)) {			/* page->type */
		slab = page_slab(page);		/* page->pointer */
		do_slab_things(slab);
	} else {
		do_bypass_things(page);
	}

it would keep the ambiguity and type resolution in the domain of the
page, and it would make struct slab a strong type that doesn't need to
self-identify.
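
A minimal sketch of the page_slab() accessor implied here
(hypothetical; the series need not define it exactly this way):

	/* Sketch: resolve a page to its slab descriptor.  Callers test
	 * PageSlab() first; the assertion documents and enforces that
	 * contract, so struct slab itself never needs to self-identify.
	 */
	static inline struct slab *page_slab(struct page *page)
	{
		VM_BUG_ON_PAGE(!PageSlab(page), page);
		return (struct slab *)compound_head(page);
	}
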
Matthew Wilcox Oct. 12, 2021, 3:30 a.m. UTC | #2
On Mon, Oct 11, 2021 at 04:07:14PM -0400, Johannes Weiner wrote:
> This looks great to me. It's a huge step in disentangling struct page,
> and it's already showing very cool downstream effects in somewhat
> unexpected places like the memory cgroup controller.

Thanks!

> > I'm not entirely happy with slab_test_cache() for a predicate name.
> > I actually liked SlabAllocation() better, but then I remembered that we're
> > trying to get away from InterCapping, and somehow slab_test_allocation()
> > didn't feel right either.
> 
> I touched on this in the reply to the memcg patch, but I wonder if it
> would be better to not have this test against struct slab at all.
> 
> It's basically a type test that means slab_is_slab(), and its
> existence implies that if you see 'struct slab' somewhere, you don't
> know for sure - without having more context - whether it's been vetted
> or not. This makes 'struct slab' an option/maybe type that needs a
> method of self-identification. Right now it can use the PG_slab bit in
> the flags field shared with the page, but if it were separately
> allocated some day, that would occupy dedicated memory.
> 
> And describing memory, that's struct page's role: to identify what's
> sitting behind a random pfn or an address. "struct page should be a
> pointer and a type tag", or something like that ;-)
> 
> If instead of this:
> 
> 	slab = virt_to_slab(p);
> 	if (!slab_test_cache(slab)) {
> 		page = (struct page *)slab;
> 		do_page_things(page);
> 	} else {
> 		...
> 	}
> 
> you wrote it like this:
> 
> 	page = virt_to_page(p);
> 	if (PageSlab(page)) {			/* page->type */
> 		slab = page_slab(page);		/* page->pointer */
> 		do_slab_things(slab);
> 	} else {
> 		do_bypass_things(page);
> 	}
> 
> it would keep the ambiguity and type resolution in the domain of the
> page, and it would make struct slab a strong type that doesn't need to
> self-identify.

... yeah.  I'm absolutely on board with the goal of moving to each
struct page being very small and having essentially a single
discriminated pointer to its descriptor.  eg all 2^N pages allocated
to a slab would have a bit pattern of 0110 in the bottom 4 bits.
So PageSlab would still make sense, and calling page_slab() on something
that wasn't tagged with 0110 would cause some kind of error.
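
Something like the following, say (the 0110 tag value is from the
example above; the single-word 'desc' field and both helpers are
hypothetical):

	#define DESC_TYPE_MASK	0xfUL
	#define DESC_TYPE_SLAB	0x6UL	/* 0110: page belongs to a slab */

	/* Sketch: with struct page shrunk to a single tagged word, the
	 * type test reads the low bits and the accessor strips them off.
	 */
	static inline bool PageSlab(const struct page *page)
	{
		return (page->desc & DESC_TYPE_MASK) == DESC_TYPE_SLAB;
	}

	static inline struct slab *page_slab(const struct page *page)
	{
		if (WARN_ON((page->desc & DESC_TYPE_MASK) != DESC_TYPE_SLAB))
			return NULL;
		return (struct slab *)(page->desc & ~DESC_TYPE_MASK);
	}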

But until we get to that goal, it's hard to write code that's going to
work in either scenario.  We do need to keep the PG_slab bit to
distinguish slab pages from other kinds of memory.  And we don't want
to be unnecessarily calling compound_head() (although I'm willing to
do that for the sake of better looking code).

What if we replicated the PG_slab bit across all the tail pages
today?  So we could call PageSlab() on any page and instead of it
doing a compound_head() lookup, it just looked at the tail->flags.
Then page_slab() does the actual compound_head() lookup.  I think it
should be fairly cheap to set the extra bits at allocation because we
just set tail->compound_head, so the lines should still be in cache.
(PageSlab is marked as PF_NO_TAIL, so it calls compound_head() on
every access).
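
Setting the replicated bit could be as simple as this (hypothetical
helper; the real allocator would fold it into compound page setup):

	/* Sketch: mirror PG_slab into every tail page at allocation so
	 * PageSlab() can test tail->flags directly, without a
	 * compound_head() call.  The tail cachelines are already dirty
	 * from setting tail->compound_head, so this should be nearly
	 * free.
	 */
	static void slab_set_tail_flags(struct page *head, unsigned int order)
	{
		unsigned int i;

		for (i = 1; i < (1U << order); i++)
			__set_bit(PG_slab, &(head + i)->flags);
	}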