Message ID: 20180419182950.67199-1-marcorr@google.com (mailing list archive)
State: New, archived
On 04/19/2018 08:29 PM, Marc Orr wrote:
> The kvm struct is (currently) tens of kilobytes, which turns out to be a
> large amount of memory to allocate contiguously via kzalloc. Thus, this
> patch changes the kzalloc to a vzalloc, so that the memory allocation is
> less likely to fail in resource-constrained environments.
>
> Signed-off-by: Marc Orr <marcorr@google.com>
> ---
>  include/linux/kvm_host.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 6930c63126c7..cf7b6a3a4197 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -19,6 +19,7 @@
>  #include <linux/preempt.h>
>  #include <linux/msi.h>
>  #include <linux/slab.h>
> +#include <linux/vmalloc.h>
>  #include <linux/rcupdate.h>
>  #include <linux/ratelimit.h>
>  #include <linux/err.h>
> @@ -810,12 +811,12 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
>  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
>  static inline struct kvm *kvm_arch_alloc_vm(void)
>  {
> -	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
> +	return vzalloc(sizeof(struct kvm));
>  }
>
>  static inline void kvm_arch_free_vm(struct kvm *kvm)
>  {
> -	kfree(kvm);
> +	vfree(kvm);
>  }
>  #endif

I don't think this is necessarily safe. This contains kvm_arch, and you
would need to verify that no architecture has data in struct kvm_arch
that must be allocated with kmalloc. For example, on s390 kernel virtual
== kernel real for kmalloc, so some driver might rely on that.

FWIW, it looks like the s390 struct kvm_arch is fine, but this requires
more review I guess.

Another thing: on s390 this is 14760 bytes, and on x86 it seems to be
12440 on my system. This should never fail, no?
First off, thanks for reviewing my patch!

With respect to the size of the struct, here's what I'm seeing on an
upstream version of the kernel:

(gdb) p sizeof(struct kvm)
$1 = 42200
(gdb) p sizeof(struct kvm_arch)
$2 = 35496

I'm pretty new to KVM, so it's not obvious to me:
- Why the size of this structure differs on my machine vs. yours
- What fields in 'struct kvm_arch' vary by architecture

When I first read your email, my reaction was: let's vzalloc 'struct kvm'
and kzalloc 'struct kvm_arch', but after looking at the size of
'struct kvm_arch' (see above), that obviously won't help :-).

I'll add that my initial reaction when I was posed with this problem was
that 40 kB is NOTHING. But more experienced colleagues quickly explained
to me that 40 kB is actually a lot for Linux to guarantee contiguously
via kzalloc. I'm not sure if the <13 kB that you're seeing in your
environment is much better, but regardless, it seems like we're better
off fixing this now rather than dealing with it later when the structure
inevitably bloats, both in the upstream kernel and within people's
domain-specific use cases.

Thus, I wonder if folks have a suggestion on how to refactor these
structs so that we can use vzalloc for most of the fields and partition
architecture-specific fields into a sub-struct so that they can continue
being allocated via kzalloc.

Thanks,
Marc

On Fri, Apr 20, 2018 at 1:34 AM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> [quoted patch and review snipped]
On Fri, 20 Apr 2018, Marc Orr wrote:
> When I first read your email, my reaction was: let's vzalloc 'struct
> kvm' and kzalloc 'struct kvm_arch', but after looking at the size of
> 'struct kvm_arch' (see above), that obviously won't help :-).
>
> I'll add that my initial reaction when I was posed with this problem
> was that 40 kB is NOTHING. But more experienced colleagues quickly
> explained to me that 40 kB is actually a lot for Linux to guarantee
> contiguously via kzalloc.

The issue is that this generates an order-4 allocation, which results in
significant fragmentation, is unmovable, and has an unexpected life
cycle. This very often requires memory compaction to free up an
otherwise movable pageblock, since not many MIGRATE_UNMOVABLE pageblocks
have order-4 memory free. That makes the target pageblock also
MIGRATE_UNMOVABLE, because this is a GFP_KERNEL allocation and we can't
migrate it.

The struct would be *much* more fragmentation friendly if it were
order-0 or order-1, because we often have one or two pages available
from existing MIGRATE_UNMOVABLE pageblocks. The ultimate goal is to free
as many full pageblocks as possible for transparent hugepages.

> I'm not sure if the <13 kB that you're seeing in your environment is
> much better, but regardless, it seems like we're better off fixing
> this now rather than dealing with it later when the structure
> inevitably bloats, both in the upstream kernel and within people's
> domain-specific use cases.

Is it possible for someone to enumerate what exactly in struct kvm
absolutely needs to be contiguous? It seems like there is a concern
about struct kvm_arch and whether the requirements differ depending on
the platform. Are there others?

Worst case scenario is that each arch has its own kvm_arch allocator
that does kzalloc, vzalloc, or a combination depending on its
requirements.
On 04/21/2018 12:07 AM, Marc Orr wrote:
> First off, thanks for reviewing my patch!
>
> With respect to the size of the struct, here's what I'm seeing on an
> upstream version of the kernel:
>
> (gdb) p sizeof(struct kvm)
> $1 = 42200
> (gdb) p sizeof(struct kvm_arch)
> $2 = 35496

Yes, I looked at an older kernel on x86. Sorry about that. The newer
kernels certainly have grown. Would be interesting to see why. Can you
run pahole (usually in the package dwarves) on your kernel and search
for "struct kvm"?

> [...]
>
> Thus, I wonder if folks have a suggestion on how to refactor these
> structs so that we can use vzalloc for most of the fields and
> partition architecture-specific fields into a sub-struct so that they
> can continue being allocated via kzalloc.

If we split this structure into multiple allocations, then we might be
able to continue to use kzalloc. pahole might show if there is a big
data structure in there that wants to be split out.
On 04/23/2018 09:41 AM, Christian Borntraeger wrote:
> On 04/21/2018 12:07 AM, Marc Orr wrote:
>> With respect to the size of the struct, here's what I'm seeing on an
>> upstream version of the kernel:
>>
>> (gdb) p sizeof(struct kvm)
>> $1 = 42200
>> (gdb) p sizeof(struct kvm_arch)
>> $2 = 35496
>
> Yes, I looked at an older kernel on x86. Sorry about that. The newer
> kernels certainly have grown. Would be interesting to see why. Can you
> run pahole (usually in the package dwarves) on your kernel and search
> for "struct kvm"?

Seems to be

	struct hlist_head mmu_page_hash[4096];	/* 24 32768 */

in kvm_arch. This is now much larger, mostly due to

commit 114df303a7eeae8b50ebf68229b7e647714a9bea
kvm: x86: reduce collisions in mmu_page_hash

So maybe it is enough to allocate mmu_page_hash separately? Adding David
Matlack for opinions.
On Mon, Apr 23, 2018 at 12:59 AM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> Seems to be
>
> 	struct hlist_head mmu_page_hash[4096];	/* 24 32768 */
>
> in kvm_arch. This is now much larger, mostly due to
>
> commit 114df303a7eeae8b50ebf68229b7e647714a9bea
> kvm: x86: reduce collisions in mmu_page_hash
>
> So maybe it is enough to allocate mmu_page_hash separately? Adding
> David Matlack for opinions.

Allocating mmu_page_hash separately would not create any issues I can
think of. But Marc's concern about future bloat still remains.

One option to move forward would be to make this change x86-specific by
overriding kvm_arch_{alloc,free}_vm. Other architectures could do the
same once they check that it's safe.
On Mon, 23 Apr 2018, David Matlack wrote:
> Allocating mmu_page_hash separately would not create any issues I can
> think of. But Marc's concern about future bloat still remains.
>
> One option to move forward would be to make this change x86-specific
> by overriding kvm_arch_{alloc,free}_vm. Other architectures could do
> the same once they check that it's safe.

x86 already appears to be doing that in kvm_x86_ops, but struct kvm_arch
still is embedded into struct kvm. I'm wondering if there are any issues
with dynamically allocating kvm_arch for all architectures and then
having a per-arch override for how to allocate struct kvm_arch (kzalloc
or vzalloc or something else) depending on its needs?
On 04/23/2018 09:48 PM, David Rientjes wrote:
> On Mon, 23 Apr 2018, David Matlack wrote:
>> One option to move forward would be to make this change x86-specific
>> by overriding kvm_arch_{alloc,free}_vm. Other architectures could do
>> the same once they check that it's safe.

That would certainly work. On the other hand, it would be good if we
could keep as much code as possible common across architectures.
Paolo, Radim, any preference?

> x86 already appears to be doing that in kvm_x86_ops, but struct
> kvm_arch still is embedded into struct kvm. I'm wondering if there are
> any issues with dynamically allocating kvm_arch for all architectures
> and then having a per-arch override for how to allocate struct
> kvm_arch (kzalloc or vzalloc or something else) depending on its
> needs?

The disadvantage of allocation vs. embedding is that this is one
additional pointer chase for code that could be cache cold (e.g. the
guest was running). I have absolutely no idea if that matters at all or
not.
2018-04-24 07:48+0200, Christian Borntraeger:
> On 04/23/2018 09:48 PM, David Rientjes wrote:
>> [...]
>
> That would certainly work. On the other hand, it would be good if we
> could keep as much code as possible common across architectures.
> Paolo, Radim, any preference?

I'd prefer to have one patch that switches all architectures to vzalloc
(this patch + a hunk for x86), get acks from maintainers, and include it
in linux-next early (rc4 timeframe).

>> x86 already appears to be doing that in kvm_x86_ops, but struct
>> kvm_arch still is embedded into struct kvm. [...]
>
> The disadvantage of allocation vs. embedding is that this is one
> additional pointer chase for code that could be cache cold (e.g. the
> guest was running). I have absolutely no idea if that matters at all
> or not.

Yeah, better not to rely on CPU optimizations. :)

I would keep it embedded, as I imagine that the part of a KVM VM that
needs kzalloc would be small on any architecture and therefore better
allocated separately, instead of constraining the whole structure.
On Fri, 27 Apr 2018, Radim Krčmář wrote:
> I'd prefer to have one patch that switches all architectures to
> vzalloc (this patch + a hunk for x86), get acks from maintainers, and
> include it in linux-next early (rc4 timeframe).

Thanks Radim. Marc, does this sound possible?

> I would keep it embedded, as I imagine that the part of a KVM VM that
> needs kzalloc would be small on any architecture and therefore better
> allocated separately, instead of constraining the whole structure.
Actually, I don't follow. Radim said:
"I'd prefer to have one patch that switches all architectures to vzalloc
(this patch + hunk for x86)..."

But Christian said vzalloc doesn't work for s390.

Thus, I see 2 options:
1. Update kvm_arch_{alloc,free}_vm for x86 only.
2. Update kvm_arch_{alloc,free}_vm globally, for x86, and override the
   routines for s390 to maintain the kzalloc behavior.

Is option 2 what we've settled on? Let me know and I'll send a new patch
shortly.

Thanks,
Marc

On Mon, Apr 30, 2018 at 1:33 PM David Rientjes <rientjes@google.com> wrote:
> [quoted discussion snipped]
On 05/01/2018 12:01 AM, Marc Orr wrote:
> Actually, I don't follow. Radim said:
> "I'd prefer to have one patch that switches all architectures to
> vzalloc (this patch + hunk for x86)..."
>
> But Christian said vzalloc doesn't work for s390.

I said
---
I don't think this is necessarily safe. This contains kvm_arch and you
would need to verify that no architecture has data in struct kvm_arch
that must be allocated with kmalloc. For example, on s390 kernel virtual
== kernel real for kmalloc, so some driver might rely on that.
FWIW, it looks like the s390 struct kvm_arch is fine, but this requires
more review I guess.
---
so my statement was more on the "it could break" side.

Right now it looks like the most critical data structures (crypto
control block, stfle and others) are already pointers and being
allocated separately, but this needs some review.

David, Conny, Janosch, can you please also check struct kvm_arch on z
for data structures that are passed directly to the hardware? Those will
break if we switch to vzalloc.

> Thus, I see 2 options:
> 1. Update kvm_arch_{alloc,free}_vm for x86 only.
> 2. Update kvm_arch_{alloc,free}_vm globally, for x86, and override the
>    routines for s390 to maintain the kzalloc behavior.
>
> Is option 2 what we've settled on? Let me know and I'll send a new
> patch shortly.

[rest of quote snipped]
On Wed, 2 May 2018 09:16:50 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> David, Conny, Janosch, can you please also check struct kvm_arch on z
> for data structures that are passed directly to the hardware? Those
> will break if we switch to vzalloc.

From what I can see, the structures sensitive to kzalloc vs. vzalloc are
all in separately allocated blocks.
On 02.05.2018 09:48, Cornelia Huck wrote:
> On Wed, 2 May 2018 09:16:50 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>> David, Conny, Janosch, can you please also check struct kvm_arch on z
>> for data structures that are passed directly to the hardware? Those
>> will break if we switch to vzalloc.
>
> From what I can see, the structures sensitive to kzalloc vs. vzalloc
> are all in separately allocated blocks.

Yes, that's also what I can see.
On 02.05.2018 10:03, Janosch Frank wrote:
> On 02.05.2018 09:48, Cornelia Huck wrote:
>> From what I can see, the structures sensitive to kzalloc vs. vzalloc
>> are all in separately allocated blocks.
>
> Yes, that's also what I can see.

Agreed, everything seems to be manually allocated (SCA, cbrlo, vsie
pages), lies in kvm_run (sdnx, riccb), or lies on the sie_page
(sie_block, itdb, crycb, gisa, fac_list).
Thanks for all the discussion on this. I'll send an updated version of
the patch by tomorrow.

Thanks,
Marc

On Wed, May 2, 2018 at 7:53 AM David Hildenbrand <david@redhat.com> wrote:
> [quoted discussion snipped]
> --
> Thanks,
> David / dhildenb
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6930c63126c7..cf7b6a3a4197 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -19,6 +19,7 @@
 #include <linux/preempt.h>
 #include <linux/msi.h>
 #include <linux/slab.h>
+#include <linux/vmalloc.h>
 #include <linux/rcupdate.h>
 #include <linux/ratelimit.h>
 #include <linux/err.h>
@@ -810,12 +811,12 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
-	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
+	return vzalloc(sizeof(struct kvm));
 }
 
 static inline void kvm_arch_free_vm(struct kvm *kvm)
 {
-	kfree(kvm);
+	vfree(kvm);
 }
 #endif
The kvm struct is (currently) tens of kilobytes, which turns out to be a
large amount of memory to allocate contiguously via kzalloc. Thus, this
patch changes the kzalloc to a vzalloc, so that the memory allocation is
less likely to fail in resource-constrained environments.

Signed-off-by: Marc Orr <marcorr@google.com>
---
 include/linux/kvm_host.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)