
[v2,7/7] s390/sparsemem: reduce section size to 128 MiB

Message ID 20241014144622.876731-8-david@redhat.com (mailing list archive)
State New
Series: virtio-mem: s390 support

Commit Message

David Hildenbrand Oct. 14, 2024, 2:46 p.m. UTC
Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.")
we've been using a section size of 256 MiB on s390x and 32 MiB on s390.
Before that, we were using a section size of 32 MiB on both
architectures.

Likely the reason was that we'd expect a storage increment size of
256 MiB under z/VM back then. As we didn't support memory blocks spanning
multiple memory sections, we would have had to handle having multiple
memory blocks for a single storage increment, which complicates things.
Although that issue reappeared later with even bigger storage increment
sizes, nowadays memory blocks can span multiple memory sections, so we
avoid any such issue completely.

Now that we have a new mechanism to expose additional memory to a VM --
virtio-mem -- reduce the section size to 128 MiB to allow for more
flexibility and reduce the metadata overhead when dealing with hot(un)plug
granularity smaller than 256 MiB.

128 MiB has been used by x86-64 since the very beginning. arm64 with 4k
base pages switched to 128 MiB as well: it's just big enough on these
architectures to allow using a huge page (2 MiB) in the vmemmap in
sane setups with sizeof(struct page) == 64 bytes and a huge page mapping
in the direct mapping, while still allowing for small hot(un)plug
granularity.

For s390, we could even switch to a 64 MiB section size, as our huge page
size is 1 MiB: but the smaller the section size, the more sections we'll
have to manage, especially on bigger machines. Making it consistent with
x86-64 and arm64 feels like the right thing for now.
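Worked through, assuming 4 KiB base pages and sizeof(struct page) == 64:
a 128 MiB section covers 128 MiB / 4 KiB = 32768 pages, so its vmemmap
occupies 32768 * 64 bytes = 2 MiB: exactly one 2 MiB huge page on
x86-64/arm64, or two 1 MiB huge pages on s390 (a 64 MiB section would
shrink that to a single 1 MiB huge page).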

Note that the smallest memory hot(un)plug granularity is also limited by
the memory block size, determined by extracting the memory increment
size from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose an
increment size of 0; therefore, we'll end up with a memory block size of
128 MiB given a 128 MiB section size.
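A minimal sketch of that fallback, in plain C (illustrative only: the
function name and signature are made up for this example, this is not
the actual SCLP code):

	/* Illustrative sketch of deriving the memory block size from the
	 * SCLP increment size; not the actual kernel implementation. */
	#define SECTION_SIZE_BITS	27
	#define SECTION_SIZE	(1UL << SECTION_SIZE_BITS) /* 128 MiB */

	static unsigned long memory_block_size(unsigned long sclp_increment)
	{
		/* An increment size of 0 (what QEMU/KVM exposes) imposes
		 * no minimum; fall back to the section size, 128 MiB. */
		if (!sclp_increment)
			return SECTION_SIZE;
		/* A memory block must cover at least one full section. */
		return sclp_increment > SECTION_SIZE ?
		       sclp_increment : SECTION_SIZE;
	}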

Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/include/asm/sparsemem.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Heiko Carstens Oct. 14, 2024, 5:53 p.m. UTC | #1
On Mon, Oct 14, 2024 at 04:46:19PM +0200, David Hildenbrand wrote:
> Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.")
> we've been using a section size of 256 MiB on s390x and 32 MiB on s390.
> Before that, we were using a section size of 32 MiB on both
> architectures.
> 
> Likely the reason was that we'd expect a storage increment size of
> 256 MiB under z/VM back then. As we didn't support memory blocks spanning
> multiple memory sections, we would have had to handle having multiple
> memory blocks for a single storage increment, which complicates things.
> Although that issue reappeared with even bigger storage increment sizes
> later, nowadays we have memory blocks that can span multiple memory
> sections and we avoid any such issue completely.

I doubt that z/VM had support for memory hotplug back then already; and the
sclp memory hotplug code was always written in a way that it could handle
increment sizes smaller than, larger than, or equal to section sizes.

If I remember correctly the section size was also used to represent each
piece of memory in sysfs (aka memory block). So the different sizes were
chosen to avoid an excessive number of sysfs entries on 64 bit.

This problem went away later with the introduction of memory_block_size.

Even further back in time I think there were static arrays which had
2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) elements.

I just gave it a try and, as expected nowadays, bloat-o-meter doesn't
indicate anything like that anymore.
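The static sizing in question is roughly the following (a simplified
excerpt in the spirit of the sparsemem code; details elided):

	#define SECTIONS_SHIFT	(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
	#define NR_MEM_SECTIONS	(1UL << SECTIONS_SHIFT)

	/* Without SPARSEMEM_EXTREME the section table is a static array,
	 * so halving the section size doubles the number of sections and
	 * with it the static footprint. With SPARSEMEM_EXTREME (typically
	 * enabled nowadays) the roots are allocated dynamically, which is
	 * why bloat-o-meter no longer shows a difference. */
	struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];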

> 128 MiB has been used by x86-64 since the very beginning. arm64 with 4k
> base pages switched to 128 MiB as well: it's just big enough on these
> architectures to allow using a huge page (2 MiB) in the vmemmap in
> sane setups with sizeof(struct page) == 64 bytes and a huge page mapping
> in the direct mapping, while still allowing for small hot(un)plug
> granularity.
> 
> For s390, we could even switch to a 64 MiB section size, as our huge page
> size is 1 MiB: but the smaller the section size, the more sections we'll
> have to manage, especially on bigger machines. Making it consistent with
> x86-64 and arm64 feels like the right thing for now.

That's fine with me.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
David Hildenbrand Oct. 14, 2024, 7:47 p.m. UTC | #2
On 14.10.24 19:53, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 04:46:19PM +0200, David Hildenbrand wrote:
>> Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.")
>> we've been using a section size of 256 MiB on s390x and 32 MiB on s390.
>> Before that, we were using a section size of 32 MiB on both
>> architectures.
>>
>> Likely the reason was that we'd expect a storage increment size of
>> 256 MiB under z/VM back then. As we didn't support memory blocks spanning
>> multiple memory sections, we would have had to handle having multiple
>> memory blocks for a single storage increment, which complicates things.
>> Although that issue reappeared with even bigger storage increment sizes
>> later, nowadays we have memory blocks that can span multiple memory
>> sections and we avoid any such issue completely.
> 
> I doubt that z/VM had support for memory hotplug back then already; and the
> sclp memory hotplug code was always written in a way that it could handle
> increment sizes smaller than, larger than, or equal to section sizes.
> 
> If I remember correctly the section size was also used to represent each
> piece of memory in sysfs (aka memory block). So the different sizes were
> chosen to avoid an excessive number of sysfs entries on 64 bit.
> 
> This problem went away later with the introduction of memory_block_size.
> 
> Even further back in time I think there were static arrays which had
> 2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) elements.

Interesting. I'll drop the "Likely ..." paragraph then!

> 
> I just gave it a try and, as nowadays expected, bloat-o-meter doesn't
> indicate anything like that anymore.
> 
>> 128 MiB has been used by x86-64 since the very beginning. arm64 with 4k
>> base pages switched to 128 MiB as well: it's just big enough on these
>> architectures to allow using a huge page (2 MiB) in the vmemmap in
>> sane setups with sizeof(struct page) == 64 bytes and a huge page mapping
>> in the direct mapping, while still allowing for small hot(un)plug
>> granularity.
>>
>> For s390, we could even switch to a 64 MiB section size, as our huge page
>> size is 1 MiB: but the smaller the section size, the more sections we'll
>> have to manage, especially on bigger machines. Making it consistent with
>> x86-64 and arm64 feels like the right thing for now.
> 
> That's fine with me.
> 
> Acked-by: Heiko Carstens <hca@linux.ibm.com>
> 

Thanks!

Patch

diff --git a/arch/s390/include/asm/sparsemem.h b/arch/s390/include/asm/sparsemem.h
index c549893602ea..ff628c50afac 100644
--- a/arch/s390/include/asm/sparsemem.h
+++ b/arch/s390/include/asm/sparsemem.h
@@ -2,7 +2,7 @@ 
 #ifndef _ASM_S390_SPARSEMEM_H
 #define _ASM_S390_SPARSEMEM_H
 
-#define SECTION_SIZE_BITS	28
+#define SECTION_SIZE_BITS	27
 #define MAX_PHYSMEM_BITS	CONFIG_MAX_PHYSMEM_BITS
 
 #endif /* _ASM_S390_SPARSEMEM_H */
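As a quick sanity check of the change: SECTION_SIZE_BITS going from 28
to 27 halves the section size from 2^28 bytes (256 MiB) to 2^27 bytes
(128 MiB). A guest with, say, 1 TiB of memory is then covered by
1 TiB / 128 MiB = 8192 sections instead of 4096, while the smallest
possible hot(un)plug granularity is halved.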