
[1/1] kvm: Make VM ioctl do a vzalloc instead of a kzalloc

Message ID 20180419182950.67199-1-marcorr@google.com (mailing list archive)
State New, archived

Commit Message

Marc Orr April 19, 2018, 6:29 p.m. UTC
The kvm struct is (currently) tens of kilo-bytes, which turns out to be a
large amount of memory to allocate contiguously via kzalloc. Thus, this
patch changes the kzalloc to a vzalloc, so that the memory allocation is
less likely to fail in resource-constrained environments.

Signed-off-by: Marc Orr <marcorr@google.com>
---
 include/linux/kvm_host.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Christian Borntraeger April 20, 2018, 8:34 a.m. UTC | #1
On 04/19/2018 08:29 PM, Marc Orr wrote:
> The kvm struct is (currently) tens of kilo-bytes, which turns out to be a
> large amount of memory to allocate contiguously via kzalloc. Thus, this
> patch changes the kzalloc to a vzalloc, so that the memory allocation is
> less likely to fail in resource-constrained environments.
> 
> Signed-off-by: Marc Orr <marcorr@google.com>
> ---
>  include/linux/kvm_host.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 6930c63126c7..cf7b6a3a4197 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -19,6 +19,7 @@
>  #include <linux/preempt.h>
>  #include <linux/msi.h>
>  #include <linux/slab.h>
> +#include <linux/vmalloc.h>
>  #include <linux/rcupdate.h>
>  #include <linux/ratelimit.h>
>  #include <linux/err.h>
> @@ -810,12 +811,12 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
>  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
>  static inline struct kvm *kvm_arch_alloc_vm(void)
>  {
> -	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
> +	return vzalloc(sizeof(struct kvm));
>  }
> 
>  static inline void kvm_arch_free_vm(struct kvm *kvm)
>  {
> -	kfree(kvm);
> +	vfree(kvm);
>  }
>  #endif
> 

I don't think this is necessarily safe. This contains kvm_arch, and you
would need to verify that no architecture has data in struct kvm_arch
that must be allocated with kmalloc. For example, on s390 kernel virtual
== kernel real for kmalloc, so some driver might rely on that.
FWIW, it looks like the s390 struct kvm_arch is fine, but this requires
more review, I guess.

Another thing: 

On s390 this is 14760 bytes, and on x86 it seems to be 12440 bytes on my system.
This should never fail, no?
Marc Orr April 20, 2018, 10:07 p.m. UTC | #2
First off, thanks for reviewing my patch!

With respect to the size of the struct, here's what I'm seeing on an upstream
version of the kernel:

(gdb) p sizeof(struct kvm)
$1 = 42200
(gdb) p sizeof(struct kvm_arch)
$2 = 35496

I'm pretty new to KVM, so it's not obvious to me:
- Why the size of this structure differs on my machine vs. yours
- Which fields in 'struct kvm_arch' vary by architecture

When I first read your email, my reaction was: let's vzalloc 'struct kvm' and
kzalloc 'struct kvm_arch'. But after looking at the size of 'struct kvm_arch'
(see above), that obviously won't help :-).

I'll add that my initial reaction when I was posed with this problem was that
40 kB is NOTHING. But more experienced colleagues quickly explained to me that
40 kB is actually a lot for Linux to guarantee contiguously via kzalloc. I'm
not sure whether the <13 kB that you're seeing in your environment is much
better, but regardless, it seems like we're better off fixing this now rather
than dealing with it later when the structure inevitably bloats, both in the
upstream kernel and within people's domain-specific use cases.

Thus, I wonder if folks have a suggestion on how to refactor these structs so
that we can vzalloc most of the fields and partition the architecture-specific
fields into a sub-struct that can continue to be allocated via kzalloc.

Thanks,
Marc

David Rientjes April 23, 2018, 4:40 a.m. UTC | #3
On Fri, 20 Apr 2018, Marc Orr wrote:

> When I first read your email, my reaction was:
> Let's vzalloc 'struct kvm' and kzalloc 'struct kvm_arch', but after looking at
> the size of 'struct kvm_arch' (see above), that obviously won't help :-).
> 
> I'll add that my initial reaction when I was posed with this problem was that
> 40 kB is NOTHING. But more experienced colleagues quickly explained to me that
> 40 kB is actually a lot for Linux to guarantee contiguously via kzalloc.

The issue is that this generates an order-4 allocation which results in 
significant fragmentation, is unmovable, and has an unexpected life cycle.  
This very often requires memory compaction to free up an otherwise movable 
pageblock since not many MIGRATE_UNMOVABLE pageblocks have order-4 memory 
free.  That makes the target pageblock also MIGRATE_UNMOVABLE because this 
is a GFP_KERNEL allocation and we can't migrate it.  The struct would be 
*much* more fragmentation friendly if it were order-0 or order-1 because 
we often have one or two pages available from existing MIGRATE_UNMOVABLE 
pageblocks.  The ultimate goal is to free as many full pageblocks as 
possible for transparent hugepages.

> I'm not sure whether the <13 kB that you're seeing in your environment is
> much better, but regardless, it seems like we're better off fixing this now
> rather than dealing with it later when the structure inevitably bloats, both
> in the upstream kernel and within people's domain-specific use cases.
> 

Is it possible for someone to enumerate what exactly in struct kvm 
absolutely needs to be contiguous?  It seems like there is a concern about 
struct kvm_arch and whether the requirements differ depending on the 
platform.  Are there others?  Worst case scenario is that each arch has 
its own kvm_arch allocator that does kzalloc, vzalloc, or a combination 
depending on its requirements.
Christian Borntraeger April 23, 2018, 7:41 a.m. UTC | #4
On 04/21/2018 12:07 AM, Marc Orr wrote:
> First off---thanks for reviewing my patch!
> 
> With respect to the size of the struct, here's what I'm seeing on an
> upstream
> version of the kernel:
> 
> (gdb) p sizeof(struct kvm)
> $1 = 42200
> (gdb) p sizeof(struct kvm_arch)
> $2 = 35496

Yes, I looked at an older kernel on x86. Sorry about that. The newer kernels
certainly have grown. Would be interesting to see why. Can you run
pahole (usually in the package dwarves) on your kernel and search for
"struct kvm "?


> 
> I'm pretty new to KVM--so it's not obvious to me:
> - Why the size of this structure differs on my machine vs. yours
> - What fields in 'struct kvm_arch' vary by architecture
> 
> When I first read your email, my reaction was:
> Let's vzalloc 'struct kvm' and kzalloc 'struct kvm_arch', but after looking at
> the size of 'struct kvm_arch' (see above), that obviously won't help :-).
> 
> I'll add that my initial reaction when I was posed with this problem was that
> 40 kB is NOTHING. But more experienced colleagues quickly explained to me that
> 40 kB is actually a lot for Linux to guarantee contiguously via kzalloc. I'm
> not sure whether the <13 kB that you're seeing in your environment is much
> better, but regardless, it seems like we're better off fixing this now rather
> than dealing with it later when the structure inevitably bloats, both in the
> upstream kernel and within people's domain-specific use cases.
> 
> Thus, I wonder if folks have a suggestion on how to refactor these structs so
> that we can vzalloc most of the fields and partition the architecture-specific
> fields into a sub-struct that can continue to be allocated via kzalloc.

If we split this structure into multiple allocations then we might be able
to continue to use kzalloc.

pahole might show if there is a big datastructure in there that wants to
be split out.
Christian Borntraeger April 23, 2018, 7:59 a.m. UTC | #5
On 04/23/2018 09:41 AM, Christian Borntraeger wrote:
> 
> 
> On 04/21/2018 12:07 AM, Marc Orr wrote:
>> First off---thanks for reviewing my patch!
>>
>> With respect to the size of the struct, here's what I'm seeing on an
>> upstream
>> version of the kernel:
>>
>> (gdb) p sizeof(struct kvm)
>> $1 = 42200
>> (gdb) p sizeof(struct kvm_arch)
>> $2 = 35496
> 
> Yes, I looked at an older kernel on x86. Sorry about that. The newer kernels
> certainly have grown. Would be interesting to see why. Can you run
> pahole (usually in the package dwarves) on you kernel and search for 
> "struct kvm "

Seems to be 
        struct hlist_head          mmu_page_hash[4096];  /*    24 32768 */

in kvm_arch. 

This is now much larger mostly due to 

commit 114df303a7eeae8b50ebf68229b7e647714a9bea
    kvm: x86: reduce collisions in mmu_page_hash


So maybe it is enough to allocate mmu_page_hash separately? Adding David Matlack
for opinions.
David Matlack April 23, 2018, 3:50 p.m. UTC | #6
On Mon, Apr 23, 2018 at 12:59 AM Christian Borntraeger <borntraeger@de.ibm.com> wrote:



> On 04/23/2018 09:41 AM, Christian Borntraeger wrote:
> >
> >
> > On 04/21/2018 12:07 AM, Marc Orr wrote:
> >> First off---thanks for reviewing my patch!
> >>
> >> With respect to the size of the struct, here's what I'm seeing on an
> >> upstream
> >> version of the kernel:
> >>
> >> (gdb) p sizeof(struct kvm)
> >> $1 = 42200
> >> (gdb) p sizeof(struct kvm_arch)
> >> $2 = 35496
> >
> > Yes, I looked at an older kernel on x86. Sorry about that. The newer kernels
> > certainly have grown. Would be interesting to see why. Can you run
> > pahole (usually in the package dwarves) on you kernel and search for
> > "struct kvm "

> Seems to be
>          struct hlist_head          mmu_page_hash[4096];  /*    24 32768 */

> in kvm_arch.

> This is now much larger mostly due to

> commit 114df303a7eeae8b50ebf68229b7e647714a9bea
>      kvm: x86: reduce collisions in mmu_page_hash


> So maybe it is enough to allocate mmu_page_hash separately? Adding David Matlack
> for opinions.

Allocating mmu_page_hash separately would not create any issues I can think
of. But Marc's concern about future bloat still remains.

One option to move forward would be to make this change x86-specific by
overriding kvm_arch_{alloc,free}_vm. Other architectures could do the same
once they check that it's safe.
David Rientjes April 23, 2018, 7:48 p.m. UTC | #7
On Mon, 23 Apr 2018, David Matlack wrote:

> > Seems to be
> >          struct hlist_head          mmu_page_hash[4096];  /*    24 32768 */
> 
> > in kvm_arch.
> 
> > This is now much larger mostly due to
> 
> > commit 114df303a7eeae8b50ebf68229b7e647714a9bea
> >      kvm: x86: reduce collisions in mmu_page_hash
> 
> 
> > So maybe it is enough to allocate mmu_page_hash seperately? Adding David Matlack
> > for opinions.
> 
> Allocating mmu_page_hash separately would not create any issues I can think
> of. But Mark's concern about future bloat still remains.
> 
> One option to move forward  would be to make this change x86-specific by
> overriding kvm_arch_{alloc,free}_vm. Other architectures could do the same
> once they check that it's safe.
> 

x86 already appears to be doing that in kvm_x86_ops, but struct kvm_arch 
still is embedded into struct kvm.  I'm wondering if there are any issues 
with dynamically allocating kvm_arch for all architectures and then having 
a per-arch override for how to allocate struct kvm_arch (kzalloc or 
vzalloc or something else) depending on its needs?
Christian Borntraeger April 24, 2018, 5:48 a.m. UTC | #8
On 04/23/2018 09:48 PM, David Rientjes wrote:
> On Mon, 23 Apr 2018, David Matlack wrote:
> 
>>> Seems to be
>>>          struct hlist_head          mmu_page_hash[4096];  /*    24 32768 */
>>
>>> in kvm_arch.
>>
>>> This is now much larger mostly due to
>>
>>> commit 114df303a7eeae8b50ebf68229b7e647714a9bea
>>>      kvm: x86: reduce collisions in mmu_page_hash
>>
>>
>>> So maybe it is enough to allocate mmu_page_hash seperately? Adding David Matlack
>>> for opinions.
>>
>> Allocating mmu_page_hash separately would not create any issues I can think
>> of. But Mark's concern about future bloat still remains.
>>
>> One option to move forward  would be to make this change x86-specific by
>> overriding kvm_arch_{alloc,free}_vm. Other architectures could do the same
>> once they check that it's safe.

That would certainly work. On the other hand it would be good if we could keep
as much code as possible common across architectures.
Paolo, Radim, any preference?
>>
> 
> x86 already appears to be doing that in kvm_x86_ops, but struct kvm_arch 
> still is embedded into struct kvm.  I'm wondering if there are any issues 
> with dynamically allocating kvm_arch for all architectures and then having 
> a per-arch override for how to allocate struct kvm_arch (kzalloc or 
> vzalloc or something else) depending on its needs?

The disadvantage of allocation vs embedding is that this is one additional pointer
chase for code that could be cache cold (e.g. guest was running). I have absolutely
no idea if that matters at all or not.
Radim Krčmář April 27, 2018, 7:08 p.m. UTC | #9
2018-04-24 07:48+0200, Christian Borntraeger:
> On 04/23/2018 09:48 PM, David Rientjes wrote:
> > On Mon, 23 Apr 2018, David Matlack wrote:
> > 
> >>> Seems to be
> >>>          struct hlist_head          mmu_page_hash[4096];  /*    24 32768 */
> >>
> >>> in kvm_arch.
> >>
> >>> This is now much larger mostly due to
> >>
> >>> commit 114df303a7eeae8b50ebf68229b7e647714a9bea
> >>>      kvm: x86: reduce collisions in mmu_page_hash
> >>
> >>
> >>> So maybe it is enough to allocate mmu_page_hash seperately? Adding David Matlack
> >>> for opinions.
> >>
> >> Allocating mmu_page_hash separately would not create any issues I can think
> >> of. But Mark's concern about future bloat still remains.
> >>
> >> One option to move forward  would be to make this change x86-specific by
> >> overriding kvm_arch_{alloc,free}_vm. Other architectures could do the same
> >> once they check that it's safe.
> 
> That would certainly work. On the other hand it would be good if we could keep
> as much code as possible common across architectures.
> Paolo, Radim, any preference?

I'd prefer to have one patch that switches all architectures to vzalloc
(this patch + hunk for x86), get acks from maintainers, and include it
in linux/next early (rc4 timeframe).

> > x86 already appears to be doing that in kvm_x86_ops, but struct kvm_arch 
> > still is embedded into struct kvm.  I'm wondering if there are any issues 
> > with dynamically allocating kvm_arch for all architectures and then having 
> > a per-arch override for how to allocate struct kvm_arch (kzalloc or 
> > vzalloc or something else) depending on its needs?
> 
> The disadvantage of allocation vs embedding is that this is one additional pointer
> chase for code that could be cache cold (e.g. guest was running). I have absolutely
> no idea if that matters at all or not. 

Yeah, better not to rely on CPU optimizations. :)

I would keep it embedded as I imagine that the part of KVM VM that needs
kzalloc would be small on any architecture and therefore better
allocated separately, instead of constricting the whole structure.
David Rientjes April 30, 2018, 8:33 p.m. UTC | #10
On Fri, 27 Apr 2018, Radim Krčmář wrote:

> 2018-04-24 07:48+0200, Christian Borntraeger:
> > On 04/23/2018 09:48 PM, David Rientjes wrote:
> > > On Mon, 23 Apr 2018, David Matlack wrote:
> > > 
> > >>> Seems to be
> > >>>          struct hlist_head          mmu_page_hash[4096];  /*    24 32768 */
> > >>
> > >>> in kvm_arch.
> > >>
> > >>> This is now much larger mostly due to
> > >>
> > >>> commit 114df303a7eeae8b50ebf68229b7e647714a9bea
> > >>>      kvm: x86: reduce collisions in mmu_page_hash
> > >>
> > >>
> > >>> So maybe it is enough to allocate mmu_page_hash seperately? Adding David Matlack
> > >>> for opinions.
> > >>
> > >> Allocating mmu_page_hash separately would not create any issues I can think
> > >> of. But Mark's concern about future bloat still remains.
> > >>
> > >> One option to move forward  would be to make this change x86-specific by
> > >> overriding kvm_arch_{alloc,free}_vm. Other architectures could do the same
> > >> once they check that it's safe.
> > 
> > That would certainly work. On the other hand it would be good if we could keep
> > as much code as possible common across architectures.
> > Paolo, Radim, any preference?
> 
> I'd prefer to have one patch that switches all architectures to vzalloc
> (this patch + hunk for x86), get acks from maintainers, and include it
> in linux/next early (rc4 timeframe).
> 

Thanks Radim.  Marc, does this sound possible?

> > > x86 already appears to be doing that in kvm_x86_ops, but struct kvm_arch 
> > > still is embedded into struct kvm.  I'm wondering if there are any issues 
> > > with dynamically allocating kvm_arch for all architectures and then having 
> > > a per-arch override for how to allocate struct kvm_arch (kzalloc or 
> > > vzalloc or something else) depending on its needs?
> > 
> > The disadvantage of allocation vs embedding is that this is one additional pointer
> > chase for code that could be cache cold (e.g. guest was running). I have absolutely
> > no idea if that matters at all or not. 
> 
> Yeah, better not to rely on CPU optimizations. :)
> 
> I would keep it embedded as I imagine that the part of KVM VM that needs
> kzalloc would be small on any architecture and therefore better
> allocated separately, instead of constricting the whole structure.
>
Marc Orr April 30, 2018, 10:01 p.m. UTC | #11
Actually, I don't follow. Radim said:
"I'd prefer to have one patch that switches all architectures to vzalloc
(this patch + hunk for x86)..."

But Christian said vzalloc doesn't work for s390.

Thus, I see 2 options:
1. Update kvm_arch_{alloc,free}_vm for x86 only.
2. Update kvm_arch_{alloc,free}_vm globally, for x86, and override the
routines for s390 to maintain the kzalloc behavior.

Is option 2 what we've settled on? Let me know and I'll send a new patch
shortly.

Thanks,
Marc

Christian Borntraeger May 2, 2018, 7:16 a.m. UTC | #12
On 05/01/2018 12:01 AM, Marc Orr wrote:
> Actually, I don't follow. Radim said:
> "I'd prefer to have one patch that switches all architectures to vzalloc
> (this patch + hunk for x86)..."
> 
> But Christian said vzalloc doesn't work for s390.

I said 
---
I don't think this is necessarily safe. This contains kvm_arch, and you
would need to verify that no architecture has data in struct kvm_arch
that must be allocated with kmalloc. For example, on s390 kernel virtual
== kernel real for kmalloc, so some driver might rely on that.
FWIW, it looks like the s390 struct kvm_arch is fine, but this requires
more review, I guess.
---

so my statement was more on the "it could break" side.
Right now it looks like the most critical data structures (crypto control
block, stfle, and others) are already pointers allocated separately, but
this needs some review.

David, Conny, Janosch, can you please also check struct kvm_arch on z for
data structures that are passed directly to the hardware? Those will break
if we switch to vzalloc.


> 
> Thus, I see 2 options:
> 1. Update kvm_arch_{alloc,free}_vm for x86 only.
> 2. Update kvm_arch_{alloc,free}_vm globally, for x86, and override the
> routines for s390 to maintain the kzalloc behavior.
> 
> Is option 2 what we've settled on? Let me know and I'll send a new patch
> shortly.
> 
> Thanks,
> Marc
Cornelia Huck May 2, 2018, 7:48 a.m. UTC | #13
On Wed, 2 May 2018 09:16:50 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 05/01/2018 12:01 AM, Marc Orr wrote:
> > Actually, I don't follow. Radim said:
> > "I'd prefer to have one patch that switches all architectures to vzalloc
> > (this patch + hunk for x86)..."
> > 
> > But Christian said vzalloc doesn't work for s390.  
> 
> I said 
> ---
> I dont think this is necessarily safe. This contains kvm_arch and you
> would need to verify that no architecture has data in struct kvm_arch 
> that must be allocated with kmalloc. Fo example on s390 kernel virtual
> == kernel real for kmalloc so some driver might rely on that.
> FWIW, it looks like the s390 struct kvm_arch is fine, but this requires 
> more review I guess.
> ---
> 
> so my statement was more on the "it could break" side. 
> Right now it looks like that most critical data structure (crypto control
> block, stfle and others) are already pointers and being allocated
> separately but this needs some review.
> 
> David, Conny, Janosch, can you please also check struct kvm_arch on z for
> data structures that are passed directly to the hardware? Those will break
> if we switch to vzalloc.

From what I can see, the structures sensitive to kzalloc vs. vzalloc
are all in separately allocated blocks.
Janosch Frank May 2, 2018, 8:03 a.m. UTC | #14
On 02.05.2018 09:48, Cornelia Huck wrote:
> On Wed, 2 May 2018 09:16:50 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 05/01/2018 12:01 AM, Marc Orr wrote:
>>> Actually, I don't follow. Radim said:
>>> "I'd prefer to have one patch that switches all architectures to vzalloc
>>> (this patch + hunk for x86)..."
>>>
>>> But Christian said vzalloc doesn't work for s390.  
>>
>> I said 
>> ---
>> I dont think this is necessarily safe. This contains kvm_arch and you
>> would need to verify that no architecture has data in struct kvm_arch 
>> that must be allocated with kmalloc. Fo example on s390 kernel virtual
>> == kernel real for kmalloc so some driver might rely on that.
>> FWIW, it looks like the s390 struct kvm_arch is fine, but this requires 
>> more review I guess.
>> ---
>>
>> so my statement was more on the "it could break" side. 
>> Right now it looks like that most critical data structure (crypto control
>> block, stfle and others) are already pointers and being allocated
>> separately but this needs some review.
>>
>> David, Conny, Janosch, can you please also check struct kvm_arch on z for
>> data structures that are passed directly to the hardware? Those will break
>> if we switch to vzalloc.
> 
> From what I can see, the structures sensitive to kzalloc vs. vzalloc
> are all in separately allocated blocks.

Yes, that's also what I can see.
David Hildenbrand May 2, 2018, 2:53 p.m. UTC | #15
On 02.05.2018 10:03, Janosch Frank wrote:
> On 02.05.2018 09:48, Cornelia Huck wrote:
>> On Wed, 2 May 2018 09:16:50 +0200
>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>
>>> On 05/01/2018 12:01 AM, Marc Orr wrote:
>>>> Actually, I don't follow. Radim said:
>>>> "I'd prefer to have one patch that switches all architectures to vzalloc
>>>> (this patch + hunk for x86)..."
>>>>
>>>> But Christian said vzalloc doesn't work for s390.  
>>>
>>> I said 
>>> ---
>>> I dont think this is necessarily safe. This contains kvm_arch and you
>>> would need to verify that no architecture has data in struct kvm_arch 
>>> that must be allocated with kmalloc. Fo example on s390 kernel virtual
>>> == kernel real for kmalloc so some driver might rely on that.
>>> FWIW, it looks like the s390 struct kvm_arch is fine, but this requires 
>>> more review I guess.
>>> ---
>>>
>>> so my statement was more on the "it could break" side. 
>>> Right now it looks like that most critical data structure (crypto control
>>> block, stfle and others) are already pointers and being allocated
>>> separately but this needs some review.
>>>
>>> David, Conny, Janosch, can you please also check struct kvm_arch on z for
>>> data structures that are passed directly to the hardware? Those will break
>>> if we switch to vzalloc.
>>
>> From what I can see, the structures sensitive to kzalloc vs. vzalloc
>> are all in separately allocated blocks.
> 
> Yes, that's also what I can see.
> 

Agreed, everything seems to be manually allocated (SCA, cbrlo, vsie
pages), lies in kvm_run (sdnx, riccb) or lies on the sie_page
(sie_block, itdb, crycb, gisa, fac_list).
Marc Orr May 3, 2018, 1:50 a.m. UTC | #16
Thanks for all the discussion on this. I'll send an updated version of the
patch by tomorrow.
Thanks,
Marc


Patch

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6930c63126c7..cf7b6a3a4197 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -19,6 +19,7 @@ 
 #include <linux/preempt.h>
 #include <linux/msi.h>
 #include <linux/slab.h>
+#include <linux/vmalloc.h>
 #include <linux/rcupdate.h>
 #include <linux/ratelimit.h>
 #include <linux/err.h>
@@ -810,12 +811,12 @@  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
-	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
+	return vzalloc(sizeof(struct kvm));
 }
 
 static inline void kvm_arch_free_vm(struct kvm *kvm)
 {
-	kfree(kvm);
+	vfree(kvm);
 }
 #endif