
[v2] kvm, mm: account shadow page tables to kmemcg

Message ID 20180629140224.205849-1-shakeelb@google.com (mailing list archive)
State New, archived

Commit Message

Shakeel Butt June 29, 2018, 2:02 p.m. UTC
The size of kvm's shadow page tables corresponds to the size of the
guest virtual machines on the system. Large VMs can consume a significant
amount of memory as shadow page tables, which cannot simply be left as
unaccounted system memory overhead. So, account shadow page tables to the
kmemcg.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
---
Changelog since v1:
- replaced (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT
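
For reference, the v1→v2 change is purely cosmetic: GFP_KERNEL_ACCOUNT is
the accounted variant of GFP_KERNEL and is defined in include/linux/gfp.h
as the same flag combination the v1 patch spelled out by hand:

	/* include/linux/gfp.h */
	#define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)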

 arch/x86/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michal Hocko June 29, 2018, 2:30 p.m. UTC | #1
On Fri 29-06-18 07:02:24, Shakeel Butt wrote:
> The size of kvm's shadow page tables corresponds to the size of the
> guest virtual machines on the system. Large VMs can consume a significant
> amount of memory as shadow page tables, which cannot simply be left as
> unaccounted system memory overhead. So, account shadow page tables to the
> kmemcg.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Greg Thelen <gthelen@google.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Peter Feiner <pfeiner@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@vger.kernel.org

I am not familiar enough with kvm to judge, but if we are going to account
this memory we will probably want to let oom_badness know how much memory
to attribute to a specific process. Is this something that we can do?
We will probably need a new MM_KERNEL rss_stat stat for that purpose.

Just to make it clear: I am not opposing this patch, but considering
that shadow page tables might consume a lot of memory, it would be good
to know who is responsible for them from the OOM perspective. Something to
solve on top of this.

I would also love to see a note in the changelog on how this memory is bound
to the owner's lifetime. That would make the review much easier.

> ---
> Changelog since v1:
> - replaced (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT
> 
>  arch/x86/kvm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index d594690d8b95..6b8f11521c41 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -890,7 +890,7 @@ static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
>  	if (cache->nobjs >= min)
>  		return 0;
>  	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
> -		page = (void *)__get_free_page(GFP_KERNEL);
> +		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
>  		if (!page)
>  			return -ENOMEM;
>  		cache->objects[cache->nobjs++] = page;
> -- 
> 2.18.0.rc2.346.g013aa6912e-goog
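
A minimal, purely hypothetical sketch of the MM_KERNEL idea raised above.
No such counter exists in mainline at this point; the name and plumbing are
illustrative only, and only add_mm_counter()/get_mm_counter()/get_mm_rss()
are existing helpers:

	/* Illustrative only: a per-mm counter for accounted kernel memory. */
	enum {
		MM_FILEPAGES,	/* existing */
		MM_ANONPAGES,	/* existing */
		MM_SWAPENTS,	/* existing */
		MM_SHMEMPAGES,	/* existing */
		MM_KERNEL,	/* hypothetical: kernel pages charged to this mm */
		NR_MM_COUNTERS
	};

	/* A caller charging kernel pages on behalf of a process would bump it: */
	static inline void account_kernel_pages(struct mm_struct *mm, long nr_pages)
	{
		add_mm_counter(mm, MM_KERNEL, nr_pages);
	}

	/* oom_badness() could then fold the counter into its score, e.g.: */
	static unsigned long mm_kernel_aware_points(struct mm_struct *mm)
	{
		return get_mm_rss(mm) + get_mm_counter(mm, MM_SWAPENTS) +
		       get_mm_counter(mm, MM_KERNEL);
	}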
Paolo Bonzini June 29, 2018, 2:40 p.m. UTC | #2
On 29/06/2018 16:30, Michal Hocko wrote:
> I am not familiar enough with kvm to judge, but if we are going to account
> this memory we will probably want to let oom_badness know how much memory
> to attribute to a specific process. Is this something that we can do?
> We will probably need a new MM_KERNEL rss_stat stat for that purpose.
> 
> Just to make it clear: I am not opposing this patch, but considering
> that shadow page tables might consume a lot of memory, it would be good
> to know who is responsible for them from the OOM perspective. Something to
> solve on top of this.

The amount of memory is generally proportional to the size of the
virtual machine memory, which is reflected directly into RSS.  Because
KVM processes are usually huge, and will probably dwarf everything else
in the system (except firefox and chromium of course :)), the general
order of magnitude of the oom_badness should be okay.

> I would also love to see a note in the changelog on how this memory is bound
> to the owner's lifetime. That would make the review much easier.

--verbose for people that aren't well versed in linux mm, please...

Paolo
Michal Hocko June 29, 2018, 2:55 p.m. UTC | #3
On Fri 29-06-18 16:40:23, Paolo Bonzini wrote:
> On 29/06/2018 16:30, Michal Hocko wrote:
> > I am not familiar enough with kvm to judge, but if we are going to account
> > this memory we will probably want to let oom_badness know how much memory
> > to attribute to a specific process. Is this something that we can do?
> > We will probably need a new MM_KERNEL rss_stat stat for that purpose.
> > 
> > Just to make it clear: I am not opposing this patch, but considering
> > that shadow page tables might consume a lot of memory, it would be good
> > to know who is responsible for them from the OOM perspective. Something to
> > solve on top of this.
> 
> The amount of memory is generally proportional to the size of the
> virtual machine memory, which is reflected directly into RSS.  Because
> KVM processes are usually huge, and will probably dwarf everything else
> in the system (except firefox and chromium of course :)), the general
> order of magnitude of the oom_badness should be okay.

I think we will need MM_KERNEL longterm anyway. As I've said, this is not
a must for this patch to go in. But it is better to have a fair comparison
and kill larger processes if at all possible. It seems this should be
the case here.
 
> > I would also love to see a note in the changelog on how this memory is bound
> > to the owner's lifetime. That would make the review much easier.
> 
> --verbose for people that aren't well versed in linux mm, please...

Well, if the memory accounted to the memcg hits the hard limit and there
is no way to reclaim anything to reduce the charged memory, then we have
to kill something. Hopefully the memory hog. If that one dies, it would
be great if it released its charges along the way. My remark was just
asking for an explanation of how that would happen for this specific type
of memory: is it bound to a file, does it have its own teardown, etc.
Basically, make reviewers' lives easier by letting them understand the
lifetime of the charged objects without digging deep into the specific
subsystem.
Paolo Bonzini June 29, 2018, 3:43 p.m. UTC | #4
On 29/06/2018 16:55, Michal Hocko wrote:
>>> I would also love to see a note in the changelog on how this memory is bound
>>> to the owner's lifetime. That would make the review much easier.
>> --verbose for people that aren't well versed in linux mm, please...
> Well, if the memory accounted to the memcg hits the hard limit and there
> is no way to reclaim anything to reduce the charged memory, then we have
> to kill something. Hopefully the memory hog. If that one dies, it would
> be great if it released its charges along the way. My remark was just
> asking for an explanation of how that would happen for this specific type
> of memory: is it bound to a file, does it have its own teardown, etc.
> Basically, make reviewers' lives easier by letting them understand the
> lifetime of the charged objects without digging deep into the specific
> subsystem.

Oh I see.  Yes, it's all freed when the VM file descriptor (which you
get with an ioctl on /dev/kvm) is closed.

Thanks,

Paolo
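
For readers less familiar with the KVM side: the pages charged by this patch
live in the per-vCPU MMU memory caches, and the existing teardown path
releases them with free_page() when the caches are destroyed as part of VM
teardown (i.e. when that file descriptor is closed). Freeing a page that was
allocated with __GFP_ACCOUNT also uncharges it from the memcg. Roughly, from
arch/x86/kvm/mmu.c at the time of this patch:

	static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc)
	{
		while (mc->nobjs)
			free_page((unsigned long)mc->objects[--mc->nobjs]);
	}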
Shakeel Butt June 29, 2018, 4:47 p.m. UTC | #5
On Fri, Jun 29, 2018 at 7:55 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 29-06-18 16:40:23, Paolo Bonzini wrote:
> > On 29/06/2018 16:30, Michal Hocko wrote:
> > > I am not familiar enough with kvm to judge, but if we are going to account
> > > this memory we will probably want to let oom_badness know how much memory
> > > to attribute to a specific process. Is this something that we can do?
> > > We will probably need a new MM_KERNEL rss_stat stat for that purpose.
> > >
> > > Just to make it clear: I am not opposing this patch, but considering
> > > that shadow page tables might consume a lot of memory, it would be good
> > > to know who is responsible for them from the OOM perspective. Something to
> > > solve on top of this.
> >
> > The amount of memory is generally proportional to the size of the
> > virtual machine memory, which is reflected directly into RSS.  Because
> > KVM processes are usually huge, and will probably dwarf everything else
> > in the system (except firefox and chromium of course :)), the general
> > order of magnitude of the oom_badness should be okay.
>
> I think we will need MM_KERNEL longterm anyway. As I've said, this is not
> a must for this patch to go in. But it is better to have a fair comparison
> and kill larger processes if at all possible. It seems this should be
> the case here.
>

I will look more into the MM_KERNEL counter. I still have a couple more
kmem allocations in kvm (like the dirty bitmap) that I want to be
accounted. I will bundle them together.

thanks,
Shakeel
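
A hedged sketch of the kind of follow-up Shakeel mentions: the per-memslot
dirty bitmap is allocated in kvm_create_dirty_bitmap() in virt/kvm/kvm_main.c,
and assuming it still uses kvzalloc(dirty_bytes, GFP_KERNEL), accounting it
would look roughly like the hunk below (illustrative only, not an actual
posted patch):

	--- a/virt/kvm/kvm_main.c
	+++ b/virt/kvm/kvm_main.c
	@@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
	 	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
	 
	-	memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL);
	+	memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL_ACCOUNT);
	 	if (!memslot->dirty_bitmap)
	 		return -ENOMEM;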

Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d594690d8b95..6b8f11521c41 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -890,7 +890,7 @@  static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
 	if (cache->nobjs >= min)
 		return 0;
 	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-		page = (void *)__get_free_page(GFP_KERNEL);
+		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
 		if (!page)
 			return -ENOMEM;
 		cache->objects[cache->nobjs++] = page;