[GIT,PULL,00/19] KVM: s390: Features for 4.19

Message ID 20180731084405.28953-1-borntraeger@de.ibm.com (mailing list archive)
State New, archived

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-next-4.19-1

Message

Christian Borntraeger July 31, 2018, 8:43 a.m. UTC
Paolo, Radim,

here are the s390 updates for KVM for 4.19. Most important change is
the initial support for host large pages. As this touches KVM and
s390/mm in an intermingled way we have created a topic branch that
is merged here and in Martin's tree. In this way all merge conflicts
are nicely handled and avoided.


The following changes since commit 6f0d349d922ba44e4348a17a78ea51b7135965b1:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-06-25 15:58:17 +0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  tags/kvm-s390-next-4.19-1

for you to fetch changes up to 2375846193663a1282c0ef7093640ed3210dc09f:

  Merge tag 'hlp_stage1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvms390/next (2018-07-30 23:20:48 +0200)

----------------------------------------------------------------
KVM: s390: Features for 4.19

- initial version for host large page support. Must be enabled with
  module parameter hpage=1 and will conflict with the nested=1
  parameter.
- enable etoken facility for guests
- Fixes

----------------------------------------------------------------
Christian Borntraeger (2):
      KVM: s390/vsie: avoid sparse warning
      KVM: s390: add etoken support for guests

Claudio Imbrenda (2):
      KVM: s390: a utility function for migration
      KVM: s390: Fix storage attributes migration with memory slots

Dominik Dingel (2):
      s390/mm: Clear huge page storage keys on enable_skey
      s390/mm: hugetlb pages within a gmap can not be freed

Janosch Frank (14):
      KVM: s390: Replace clear_user with kvm_clear_guest
      s390/mm: Make gmap_protect_range more modular
      s390/mm: Abstract gmap notify bit setting
      s390/mm: Add gmap pmd linking
      s390/mm: Add gmap pmd notification bit setting
      s390/mm: Add gmap pmd invalidation and clearing
      s390/mm: Add huge page dirty sync support
      s390/mm: Clear skeys for newly mapped huge guest pmds
      s390/mm: Add huge pmd storage key handling
      KVM: s390: Add skey emulation fault handling
      KVM: s390: Beautify skey enable check
      s390/mm: Add huge page gmap linking support
      KVM: s390: Add huge page enablement control
      Merge tag 'hlp_stage1' of git://git.kernel.org/.../kvms390/linux into kvms390/next

 Documentation/virtual/kvm/api.txt   |  16 ++
 arch/s390/include/asm/gmap.h        |  10 +
 arch/s390/include/asm/hugetlb.h     |   5 +-
 arch/s390/include/asm/kvm_host.h    |  11 +-
 arch/s390/include/asm/mmu.h         |   2 +
 arch/s390/include/asm/mmu_context.h |   1 +
 arch/s390/include/asm/pgtable.h     |  13 +-
 arch/s390/include/uapi/asm/kvm.h    |   5 +-
 arch/s390/kvm/kvm-s390.c            | 387 +++++++++++++++++++-----------
 arch/s390/kvm/priv.c                | 143 ++++++++----
 arch/s390/kvm/vsie.c                |  11 +-
 arch/s390/mm/gmap.c                 | 454 ++++++++++++++++++++++++++++++++++--
 arch/s390/mm/hugetlbpage.c          |  24 ++
 arch/s390/mm/pageattr.c             |   6 +-
 arch/s390/mm/pgtable.c              | 159 +++++++++----
 arch/s390/tools/gen_facilities.c    |   3 +-
 include/linux/kvm_host.h            |   7 +
 include/uapi/linux/kvm.h            |   1 +
 virt/kvm/kvm_main.c                 |   2 +-
 19 files changed, 989 insertions(+), 271 deletions(-)
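
The parameter constraint stated in the tag message above (host large pages need hpage=1, and hpage=1 conflicts with nested=1) can be sketched as a small check. This is only an illustration of the rule as worded in the pull request, not the kernel code; the function names are hypothetical.

```python
def check_kvm_module_params(nested: bool, hpage: bool) -> None:
    """Reject the one combination the tag message calls a conflict."""
    if nested and hpage:
        raise ValueError("hpage=1 conflicts with nested=1")


def huge_page_backing_possible(nested: bool, hpage: bool) -> bool:
    """True only when the admin opted in with hpage=1 and nesting is off."""
    check_kvm_module_params(nested, hpage)
    return hpage


if __name__ == "__main__":
    # Walk the four parameter combinations and show the outcome of each.
    for nested in (False, True):
        for hpage in (False, True):
            label = f"nested={int(nested)} hpage={int(hpage)}"
            try:
                possible = huge_page_backing_possible(nested, hpage)
                print(f"{label}: huge page backing possible = {possible}")
            except ValueError as err:
                print(f"{label}: rejected ({err})")
```

Modelling the conflict as an error rather than silently preferring one parameter mirrors the "must be enabled" wording: huge page backing stays an explicit opt-in.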

Comments

Paolo Bonzini Aug. 2, 2018, 11:59 a.m. UTC | #1
On 31/07/2018 10:43, Christian Borntraeger wrote:
> [...]

Pulled, thanks.

Paolo

Daniel P. Berrangé Aug. 6, 2018, 10:39 a.m. UTC | #2
On Mon, Aug 06, 2018 at 12:17:39PM +0200, David Hildenbrand wrote:
> On 31.07.2018 10:43, Christian Borntraeger wrote:
> > Paolo, Radim,
> > 
> > here are the s390 updates for KVM for 4.19. Most important change is
> > the initial support for host large pages. As this touches KVM and
> > s390/mm in an intermingled way we have created a topic branch that
> > is merged here and in Martin's tree. In this way all merge conflicts
> > are nicely handled and avoided.
> > 
> > 
> > The following changes since commit 6f0d349d922ba44e4348a17a78ea51b7135965b1:
> > 
> >   Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-06-25 15:58:17 +0800)
> > 
> > are available in the Git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  tags/kvm-s390-next-4.19-1
> > 
> > for you to fetch changes up to 2375846193663a1282c0ef7093640ed3210dc09f:
> > 
> >   Merge tag 'hlp_stage1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvms390/next (2018-07-30 23:20:48 +0200)
> > 
> 
> Hi,
> 
> We had an internal discussion and Daniel (cc) wondered if we should
> drop the hpage module parameter and instead glue this to the nested
> parameter.
> 
> E.g. nested=1 -> hpage cannot be enabled for a VM
>      nested=0 -> hpage can be enabled for a VM
> 
> Are we ready to expose this feature as default to all VMs? Opinions?
> 
> This means that nested=0 (default) environments will get hpage support
> and hpage support cannot be disabled by an admin.

NB an admin would still have to reserve huge pages for use at boot time,
and explicitly configure QEMU guest to use them in libvirt XML, which is
common with other architectures.

> Benefit is that necessary setup to use huge pages is limited.

My chief concern was to avoid extra s390-specific options as a prerequisite
for using the feature. Most mgmt app developers focus on x86, so any case
where other architectures diverge in behaviour from x86 makes it more likely
that the feature will not be usable under s390.

There are already 2 explicit steps required to enable use of huge pages with
a KVM guest, so it wasn't clear to me what extra protection adding a 3rd
hurdle brings.

> Downside is, that this is somewhat hidden behind another parameter and
> cannot be disabled.

IIUC, eventually nested VMs would support huge pages too, so it's only a
short term limitation of nested=1 that this doesn't work with HP.

Regards,
Daniel

Janosch Frank Aug. 6, 2018, 1:47 p.m. UTC | #3
On 06.08.2018 13:59, David Hildenbrand wrote:
> On 06.08.2018 13:50, Paolo Bonzini wrote:
>> On 06/08/2018 12:17, David Hildenbrand wrote:
>>> Hi,
>>>
>>> We had an internal discussion and Daniel (cc) wondered if we should
>>> drop the hpage module parameter and instead glue this to the nested
>>> parameter.
>>>
>>> E.g. nested=1 -> hpage cannot be enabled for a VM
>>>      nested=0 -> hpage can be enabled for a VM
>>>
>>> Are we ready to expose this feature as default to all VMs? Opinions?
>>>
>>> This means that nested=0 (default) environments will get hpage support
>>> and hpage support cannot be disabled by an admin.
>>>
>>> Benefit is that necessary setup to use huge pages is limited.
>>> Downside is, that this is somewhat hidden behind another parameter and
>>> cannot be disabled.
>>
>> Regarding nested I agree with Daniel.  However, until dirty page logging
>> works at 4kb granularity (by the way---is it the actual KVM dirty page
>> logging, or storage keys, or both?), I think it's best to keep the
>> module parameter.
> 
> storage keys are right now not dirty tracked (there is no iterator model
> for it in QEMU yet, but there were plans to support it - we could use
> ordinary dirty tracking for it - pages that are marked dirty either have
> dirty page content or dirty storage keys - but it would require some
> changes).

An iterative approach to skey retrieval, and hence skey dirty tracking,
would only gain us something for really big guests that use keys
excessively, right?

That's currently not a scenario we optimize for, as Linux dropped skey
usage a while ago and is the only OS we run as a KVM VM.

> 
> So it is KVM dirty page logging that is done on a 1MB basis for now. To
> track pages dirty on 4k granularity, we'll have to create fake page
> tables ("split") just like x86 for huge pages and write-protect all PMD
> entries when dirty tracking is enabled (via memslot). Also, these fake
> page tables will be required to get proper nested virtualization support
> running.
> 
> We decided to postpone this complexity and get the basics running and
> upstream first.
> 
> I would also vote for the parameter until we are sure that everything is
> working as expected (4k dirty tracking, vsie support, some more testing ...)
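
To illustrate the granularity point in the quoted text (dirty logging on a 1MB basis versus 4k), here is a minimal sketch. It assumes a simplified model in which any write marks the whole tracking unit dirty; the names are hypothetical and nothing here is taken from the kernel implementation.

```python
PAGE_SIZE = 4096                 # 4k base page
SEGMENT_SIZE = 1 << 20           # 1MB huge page
PAGES_PER_SEGMENT = SEGMENT_SIZE // PAGE_SIZE


def dirty_pages_reported(written_addrs, granularity):
    """Return the 4k page frame numbers reported dirty for the given writes.

    Each write dirties the whole tracking unit (of size `granularity`)
    that contains the written address.
    """
    dirty = set()
    for addr in written_addrs:
        unit_start = (addr // granularity) * granularity
        for offset in range(0, granularity, PAGE_SIZE):
            dirty.add((unit_start + offset) // PAGE_SIZE)
    return dirty


if __name__ == "__main__":
    writes = [0x1234]  # a single write somewhere in the first segment
    coarse = dirty_pages_reported(writes, SEGMENT_SIZE)
    fine = dirty_pages_reported(writes, PAGE_SIZE)
    print(f"1MB tracking reports {len(coarse)} dirty 4k pages")
    print(f"4k tracking reports {len(fine)} dirty 4k page(s)")
```

In this model a single write costs a full 1MB of re-transfer during migration under coarse tracking, which is the motivation given in the thread for splitting huge mappings into fake page tables later.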

For now the parameter will stay until we fix all of that. The previous
mm code was very optimized for 4k and PGSTEs and even there I found
mistakes, so I don't want a user to be easily able to run a hp VM
without opting in first.

I might need to extend the documentation a bit, to list all
peculiarities of the current implementation.

Janosch Frank Aug. 6, 2018, 2:19 p.m. UTC | #4
On 06.08.2018 16:01, David Hildenbrand wrote:
> On 06.08.2018 15:47, Janosch Frank wrote:
>> On 06.08.2018 13:59, David Hildenbrand wrote:
>>> On 06.08.2018 13:50, Paolo Bonzini wrote:
>>>> On 06/08/2018 12:17, David Hildenbrand wrote:
>>>>> Hi,
>>>>>
>>>>> We had an internal discussion and Daniel (cc) wondered if we should
>>>>> drop the hpage module parameter and instead glue this to the nested
>>>>> parameter.
>>>>>
>>>>> E.g. nested=1 -> hpage cannot be enabled for a VM
>>>>>      nested=0 -> hpage can be enabled for a VM
>>>>>
>>>>> Are we ready to expose this feature as default to all VMs? Opinions?
>>>>>
>>>>> This means that nested=0 (default) environments will get hpage support
>>>>> and hpage support cannot be disabled by an admin.
>>>>>
>>>>> Benefit is that necessary setup to use huge pages is limited.
>>>>> Downside is, that this is somewhat hidden behind another parameter and
>>>>> cannot be disabled.
>>>>
>>>> Regarding nested I agree with Daniel.  However, until dirty page logging
>>>> works at 4kb granularity (by the way---is it the actual KVM dirty page
>>>> logging, or storage keys, or both?), I think it's best to keep the
>>>> module parameter.
>>>
>>> storage keys are right now not dirty tracked (there is no iterator model
>>> for it in QEMU yet, but there were plans to support it - we could use
>>> ordinary dirty tracking for it - pages that are marked dirty either have
>>> dirty page content or dirty storage keys - but it would require some
>>> changes).
>>
>> An iterative approach to skey retrieval, and hence skey dirty tracking,
>> would only gain us something for really big guests that use keys
>> excessively, right?
> 
> We migrate them right now as one blob in QEMU (write_keys()) , so an
> iterative approach would allow us to migrate them while the source side
> is still running (and not while it has to be stopped).
> 
> We have 8bit/1byte for 4k. So it is 1/4096 of guest memory. A 4TB guest
> with storage keys touched (or touched by some nested guest!) would
> transfer 1GB of storage keys.
> 
> But we migrate them only if the guest made use of them, which is already
> a pretty good optimization for recent guests.
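
The arithmetic in the quoted paragraph (one byte of storage key per 4k page, so 1/4096 of guest memory) can be checked with a short sketch. The helper name is hypothetical, and binary units (1 TB = 2^40 bytes) are assumed.

```python
PAGE_SIZE = 4096          # bytes covered by one storage key
SKEY_BYTES_PER_PAGE = 1   # 8 bit / 1 byte per 4k page, as stated above

GIB = 1 << 30
TIB = 1 << 40


def skey_bytes(guest_bytes: int) -> int:
    """Bytes of storage keys to transfer for a guest of the given size."""
    return (guest_bytes // PAGE_SIZE) * SKEY_BYTES_PER_PAGE


if __name__ == "__main__":
    size = 4 * TIB
    keys = skey_bytes(size)
    print(f"{size // TIB} TB guest -> {keys // GIB} GB of storage keys")
```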

A simple yes would have sufficed :-) Also I think we currently do not
filter out 0 keys, which might be a lower hanging fruit to optimize and
might open up the user space/kernel interface for implementing iterative
migration later on.

But when looking at the stages that cmma migration went through and the
fact that we don't want to use skeys at all it once again boils down to
the statement of my last mail:

> 
>>
>> That's currently not a scenario we optimize for, as Linux dropped skey
>> usage a while ago and is the only OS we run as a KVM VM.
>>
>>>
>>> So it is KVM dirty page logging that is done on a 1MB basis for now. To
>>> track pages dirty on 4k granularity, we'll have to create fake page
>>> tables ("split") just like x86 for huge pages and write-protect all PMD
>>> entries when dirty tracking is enabled (via memslot). Also, these fake
>>> page tables will be required to get proper nested virtualization support
>>> running.
>>>
>>> We decided to postpone this complexity and get the basics running and
>>> upstream first.
>>>
>>> I would also vote for the parameter until we are sure that everything is
>>> working as expected (4k dirty tracking, vsie support, some more testing ...)
>>
>> For now the parameter will stay until we fix all of that. The previous
>> mm code was very optimized for 4k and PGSTEs and even there I found
>> mistakes, so I don't want a user to be easily able to run a hp VM
>> without opting in first.
>>
>> I might need to extend the documentation a bit, to list all
>> peculiarities of the current implementation.
>>
> 
> Agreed.
>