diff mbox

[08/11] KVM: implement multiple address spaces

Message ID 1431956923-35602-9-git-send-email-pbonzini@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paolo Bonzini May 18, 2015, 1:48 p.m. UTC
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virtual/kvm/api.txt        | 12 ++++++
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +-
 include/linux/kvm_host.h                 | 26 ++++++++++--
 include/uapi/linux/kvm.h                 |  1 +
 virt/kvm/kvm_main.c                      | 70 ++++++++++++++++++++------------
 5 files changed, 79 insertions(+), 32 deletions(-)

Comments

Radim Krčmář May 19, 2015, 1:32 p.m. UTC | #1
2015-05-18 15:48+0200, Paolo Bonzini:
> +If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
> +specifies the address space which is being modified.  They must be
> +less than the value that KVM_CHECK_EXTENSION returns for the
> +KVM_CAP_MULTI_ADDRESS_SPACE capability.

> +                                         Slots in separate address spaces
> +are unrelated;

I'd prefer to decouple address spaces and slots.  KVM_GET_DIRTY_LOG
could stay the same if we said that a slot can be in multiple address
spaces.  (Well, we could do that even now, by searching for slots that
differ only in address space id and pointing them to same dirty bitmap.
This even makes sense, which is a sign of lacking design :)

The main drawback is that forcing decoupling on existing IOCTLs would be
really awkward ... we'd need to have new API for address spaces;
there are two basic operations on an address space:
  add and remove slot (needs: slot id, address space id)
which would give roughly the same functionality as this patch.
(To maximixe compatibility, there could be a default address space and a
 slot flag that doesn't automatically add it there.)

On top of this, we could allow hierarchical address spaces, so very similar
address spaces (like SMM) would be easier to set up.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 19, 2015, 4:19 p.m. UTC | #2
On 19/05/2015 15:32, Radim Kr?má? wrote:
> I'd prefer to decouple address spaces and slots.  KVM_GET_DIRTY_LOG
> could stay the same if we said that a slot can be in multiple address
> spaces.

Actually, my original plan was a third one.  I wanted to have a fixed
number of address spaces, where CPUs would use only the first ones (0
for non-x86, 0+1 for x86).  Then I would let userspace pick a "dirty log
address space" which, in the case of QEMU, would simply use ram_addr_t
as the address.

However, that doesn't work because when marking a page dirty you need
either a (slots, gfn) or a (memslot, relative address in memslot) pair.
 Given the gfn that the VCPU has faulted on, it would be very expensive
to find the corresponding gfn in the "dirty log address space".  On the
contrary, it's very easy if the VCPU can simply query its current memslots.

In your proposal, there is the problem of where to do the overlap check.
 You cannot do it in the slots because it messes up userspace seriously.
 QEMU for example wants to use as few slots as possible and merges
consecutive slots; but what is mergeable in one address space need not
be mergeable in another.  And if you do it in the address spaces, you
have not solved the problem of multiple dirty bitmaps pointing for the
same userspace address.

BTW, a few more calls have to be converted to kvm_vcpu_ equivalents.
I've now gone through all occurrences of "gfn_to_" and found that we
have more invocations of gfn_to_page_many_atomic, gfn_to_pfn_atomic,
gfn_to_pfn, and gfn_to_page to convert.  Also, mark_page_dirty must be
changed to kvm_vcpu_mark_page_dirty.

> (Well, we could do that even now, by searching for slots that
> differ only in address space id and pointing them to same dirty bitmap.
> This even makes sense, which is a sign of lacking design :)

I thought about this, but ultimately it sounded like premature
optimization (and also might not even work due to the aforementioned
problem with merging of adjacent slots).

A possible optimization is to set a flag when no bits are set in the
dirty bitmap, and skip the iteration.  This is the common case for the
SMM memory space for example.  But it can be done later, and is an
independent work.

Keeping slots separate for different address spaces also makes the most
sense in QEMU, because each address space has a separate MemoryListener
that only knows about one address space.  There is one log_sync callback
per address space and no code has to know about all address spaces.

> The main drawback is that forcing decoupling on existing IOCTLs would be
> really awkward ... we'd need to have new API for address spaces;
> there are two basic operations on an address space:
>   add and remove slot (needs: slot id, address space id)
> which would give roughly the same functionality as this patch.
> (To maximixe compatibility, there could be a default address space and a
>  slot flag that doesn't automatical 
> On top of this, we could allow hierarchical address spaces, so very similar
> address spaces (like SMM) would be easier to set up.

That would actually be really messy. :)

The regular and SMM address spaces are not hierarchical.  As soon as you
put a PCI resource underneath SMRAM---which is exactly what happens for
legacy VRAM at 0xa0000---they can be completely different.  Note that
QEMU can map legacy VRAM as a KVM memslot when using the VGA 320x200x256
color mode (this mapping is not correct from the VGA point of view, but
it cannot be changed in QEMU without breaking migration).

What I do dislike in the API is the 16-bit split of the slots field; but
the alternative of defining KVM_SET_USER_MEMORY_REGION2 and
KVM_GET_DIRTY_LOG2 ioctls is just as sad.  If you prefer it that way, I
can certainly do that.

Is it okay for you if I separate the cleanup patches, and then repost
the actual SMM series after the API discussions have settled?  The
cleanups should be easily reviewable, and they make some sense on their
own.  I'm currently autotesting them.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář May 19, 2015, 6:28 p.m. UTC | #3
2015-05-19 18:19+0200, Paolo Bonzini:
> On 19/05/2015 15:32, Radim Kr?má? wrote:
> > I'd prefer to decouple address spaces and slots.  KVM_GET_DIRTY_LOG
> > could stay the same if we said that a slot can be in multiple address
> > spaces.
> 
> Actually, my original plan was a third one.  I wanted to have a fixed
> number of address spaces, where CPUs would use only the first ones (0
> for non-x86, 0+1 for x86).  Then I would let userspace pick a "dirty log
> address space" which, in the case of QEMU, would simply use ram_addr_t
> as the address.
> 
> However, that doesn't work because when marking a page dirty you need
> either a (slots, gfn) or a (memslot, relative address in memslot) pair.
>  Given the gfn that the VCPU has faulted on, it would be very expensive
> to find the corresponding gfn in the "dirty log address space".

If I understand your aim correctly, we could go with just one dirty map
in this way:
 - have a dirty bitmap that covers userspace memory of all slots that
   have enabled dirty log
 - store an offset to this bitmap in those slots

It would be a framework that allows userspace to query dirty pages based
on userspace address space.  (Changing slots would be costly, but I
don't think it's done very often.)

>                                                                  On the
> contrary, it's very easy if the VCPU can simply query its current memslots.

Yeah, userspace drew the short stick.

> In your proposal, there is the problem of where to do the overlap check.
>  You cannot do it in the slots because it messes up userspace seriously.
> 
>  QEMU for example wants to use as few slots as possible and merges
> consecutive slots; but what is mergeable in one address space need not
> be mergeable in another.  And if you do it in the address spaces, you
> have not solved the problem of multiple dirty bitmaps pointing for the
> same userspace address.

The worst case is the same (two totally different mappings), but some
cases could make it with less slots.  With hierarchy (more below), we
could even allow overlapping over a large slot.

I didn't want to solve the "multiple bitmaps for same userspace" because
KVM_GET_DIRTY_LOG doesn't care about userspace memory space dirtiness
(thought it's probably what we use it for), but only about the guest
address space;  KVM_GET_DIRTY_LOG just says which which page was
modified in a slot :/
(If we wanted dirty bitmap for userspace memory, it would be better to
 use the solution at the top.)

Less slots just means less worries ...

> > (Well, we could do that even now, by searching for slots that
> > differ only in address space id and pointing them to same dirty bitmap.
> > This even makes sense, which is a sign of lacking design :)
> 
> I thought about this, but ultimately it sounded like premature
> optimization (and also might not even work due to the aforementioned
> problem with merging of adjacent slots).
> 
> A possible optimization is to set a flag when no bits are set in the
> dirty bitmap, and skip the iteration.  This is the common case for the
> SMM memory space for example.  But it can be done later, and is an
> independent work.

True.  (I'm not a huge fan of optimizations through exceptions.)

> Keeping slots separate for different address spaces also makes the most
> sense in QEMU, because each address space has a separate MemoryListener
> that only knows about one address space.  There is one log_sync callback
> per address space and no code has to know about all address spaces.

Ah, I would have thought this is done per slot, when there was no
concept of address space in KVM before this patch :)

> > The main drawback is that forcing decoupling on existing IOCTLs would be
> > really awkward ... we'd need to have new API for address spaces;
> > there are two basic operations on an address space:
> >   add and remove slot (needs: slot id, address space id)
> > which would give roughly the same functionality as this patch.
> > (To maximixe compatibility, there could be a default address space and a
> >  slot flag that doesn't automatical 
> > On top of this, we could allow hierarchical address spaces, so very similar
> > address spaces (like SMM) would be easier to set up.
> 
> That would actually be really messy. :)
> 
> The regular and SMM address spaces are not hierarchical.  As soon as you
> put a PCI resource underneath SMRAM---which is exactly what happens for
> legacy VRAM at 0xa0000---they can be completely different.  Note that
> QEMU can map legacy VRAM as a KVM memslot when using the VGA 320x200x256
> color mode (this mapping is not correct from the VGA point of view, but
> it cannot be changed in QEMU without breaking migration).

How is a PCI resource under SMRAM accessed?
I thought that outside SMM, PCI resource under SMRAM is working
normally, but it will be overshadowed, and made inaccessible, in SMM.

I'm not sure if we mean the same hierarchy.  I meant hierarchy in the
sense than one address space is considered before the other.
(Maybe layers would be a better word.)
SMM address space could have just one slot and be above regular, we'd
then decide how to handle overlapping.

> What I do dislike in the API is the 16-bit split of the slots field; but
> the alternative of defining KVM_SET_USER_MEMORY_REGION2 and
> KVM_GET_DIRTY_LOG2 ioctls is just as sad.  If you prefer it that way, I
> can certainly do that.

No, what we have now is much better.

I'd like to know why a taken route is better than a one I'd prefer,
which probably makes me look like I have a problem with everything,
but your solution is neat.

> Is it okay for you if I separate the cleanup patches, and then repost
> the actual SMM series after the API discussions have settled?  The
> cleanups should be easily reviewable, and they make some sense on their
> own.  I'm currently autotesting them.

Yes, it would be much cleaner.  (I wasn't able to apply these patches,
so I just skimmed it and a proper review will take time.)

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 20, 2015, 7:07 a.m. UTC | #4
On 19/05/2015 20:28, Radim Kr?má? wrote:
> If I understand your aim correctly, we could go with just one dirty map
> in this way:
>  - have a dirty bitmap that covers userspace memory of all slots that
>    have enabled dirty log
>  - store an offset to this bitmap in those slots
> 
> It would be a framework that allows userspace to query dirty pages based
> on userspace address space.  (Changing slots would be costly, but I
> don't think it's done very often.)

Yes, that would work.  However, it would be a significant change to the
KVM dirty bitmap model.  Right now it is GPA-indexed, and it would
become HVA-indexed.

The difference arises if the same memory is aliased at different parts
of the address space.  Aliasing can happen if the 0xA0000 VRAM is a
window inside a larger framebuffer, for example; or it can happen across
separate address spaces.  You cannot be sure that *some* userspace is
relying on the properties of a GPA-indexed dirty bitmap.  Certainly not
QEMU, but we try to be userspace-agnostic.

>>                                                                  On the
>> contrary, it's very easy if the VCPU can simply query its current memslots.
> 
> Yeah, userspace drew the short stick.

Userspace is the slow path, so it makes sense that it did. :)

> I didn't want to solve the "multiple bitmaps for same userspace" because
> KVM_GET_DIRTY_LOG doesn't care about userspace memory space dirtiness
> (thought it's probably what we use it for), but only about the guest
> address space;  KVM_GET_DIRTY_LOG just says which which page was
> modified in a slot :/
> (If we wanted dirty bitmap for userspace memory, it would be better to
>  use the solution at the top.)

Userspace implements dirty bitmap for userspace memory as the OR of the
KVM dirty bitmap with its own dirty bitmap.  There's no particular
advantage to a GPA-indexed dirty bitmap; as you noticed, GPA and HVA
indexing can be equally fast because GPA->HVA mapping is just an add.

It's just that the dirty bitmap it's always been GPA-indexed, and it's
too late to change it.

>> A possible optimization is to set a flag when no bits are set in the
>> dirty bitmap, and skip the iteration.  This is the common case for the
>> SMM memory space for example.  But it can be done later, and is an
>> independent work.
> 
> True.  (I'm not a huge fan of optimizations through exceptions.)

It depends on how often the exception kicks in... :)

>> Keeping slots separate for different address spaces also makes the most
>> sense in QEMU, because each address space has a separate MemoryListener
>> that only knows about one address space.  There is one log_sync callback
>> per address space and no code has to know about all address spaces.
> 
> Ah, I would have thought this is done per slot, when there was no
> concept of address space in KVM before this patch :)

Right; the callback is _called_ per slot, but _registered_ per address
space.  Remember that QEMU does have address spaces.

>> The regular and SMM address spaces are not hierarchical.  As soon as you
>> put a PCI resource underneath SMRAM---which is exactly what happens for
>> legacy VRAM at 0xa0000---they can be completely different.  Note that
>> QEMU can map legacy VRAM as a KVM memslot when using the VGA 320x200x256
>> color mode (this mapping is not correct from the VGA point of view, but
>> it cannot be changed in QEMU without breaking migration).
> 
> How is a PCI resource under SMRAM accessed?
> I thought that outside SMM, PCI resource under SMRAM is working
> normally, but it will be overshadowed, and made inaccessible, in SMM.

Yes, it is.  (There is some chipset magic to make instruction fetches
retrieve SMRAM and data fetches retrieve PCI resources.  I guess you
could use execute-only EPT permissions, but needless to say, we don't care).

> I'm not sure if we mean the same hierarchy.  I meant hierarchy in the
> sense than one address space is considered before the other.
> (Maybe layers would be a better word.)
> SMM address space could have just one slot and be above regular, we'd
> then decide how to handle overlapping.

Ah, now I understand.  That would be doable.

But as they say, "All programming is an exercise in caching."  In this
case, the caching is done by userspace.

QEMU implements the SMM address space exactly by overlaying SMRAM over
normal memory:

    /* Outer container */
   memory_region_init(&smram_as_root, OBJECT(kvm_state),
                      "mem-container-smm", ~0ull);
   memory_region_set_enabled(&smm_as_root, true);

   /* Normal memory with priority 0 */
   memory_region_init_alias(&smm_as_mem, OBJECT(kvm_state), "mem-smm",
                            get_system_memory(), 0, ~0ull);
   memory_region_add_subregion_overlap(&smm_as_root, 0, &smm_as_mem, 0);
   memory_region_set_enabled(&smm_as_mem, true);

   /* SMRAM with priority 10 */
   memory_region_add_subregion_overlap(&smm_as_root, 0, smram, 10);
   memory_region_set_enabled(smram, true);

The caching consists simply in resolving the overlaps beforehand, thus
giving KVM the complete address space.

Since slots do not change often, the simpler code is not worth the
potentially more expensive KVM_SET_USER_MEMORY_REGION (it _is_ more
expensive, if only because it has to be called twice per slot change).

>> What I do dislike in the API is the 16-bit split of the slots field; but
>> the alternative of defining KVM_SET_USER_MEMORY_REGION2 and
>> KVM_GET_DIRTY_LOG2 ioctls is just as sad.  If you prefer it that way, I
>> can certainly do that.
> 
> No, what we have now is much better. I'd like to know why a taken
> route is better than a one I'd prefer, which probably makes me look
> like I have a problem with everything, but your solution is neat.

No problem, it's good because otherwise I take too much advantage of,
well, being able to commit to kvm.git. :)

(Also, it's not entirely my solution.  My solution was v1 and the
userspace patches showed that it was a mess and a dead end...  Compared
to v1, this one requires much more care in the MMU.  The same gfn can
have two completely different mapping and you have to use the right one
when e.g. accessing the rmap.  But that's pretty much the only downside).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář May 20, 2015, 3:46 p.m. UTC | #5
2015-05-20 09:07+0200, Paolo Bonzini:
> On 19/05/2015 20:28, Radim Kr?má? wrote:
>>> The regular and SMM address spaces are not hierarchical.  As soon as you
>>> put a PCI resource underneath SMRAM---which is exactly what happens for
>>> legacy VRAM at 0xa0000---they can be completely different.  Note that
>>> QEMU can map legacy VRAM as a KVM memslot when using the VGA 320x200x256
>>> color mode (this mapping is not correct from the VGA point of view, but
>>> it cannot be changed in QEMU without breaking migration).
>>
>> How is a PCI resource under SMRAM accessed?
>> I thought that outside SMM, PCI resource under SMRAM is working
>> normally, but it will be overshadowed, and made inaccessible, in SMM.
>
> Yes, it is.  (There is some chipset magic to make instruction fetches
> retrieve SMRAM and data fetches retrieve PCI resources.  I guess you
> could use execute-only EPT permissions, but needless to say, we don't care).

Interesting, so that part of SMRAM is going to be useless for SMM data?
(Even worse, SMM will read and write to the PCI resource?)

>> I'm not sure if we mean the same hierarchy.  I meant hierarchy in the
>> sense than one address space is considered before the other.
>> (Maybe layers would be a better word.)
>> SMM address space could have just one slot and be above regular, we'd
>> then decide how to handle overlapping.
>
> Ah, now I understand.  That would be doable.
>
> But as they say, "All programming is an exercise in caching."  In this
> case, the caching is done by userspace.

(It's not caching if we wanted a different result ;])

> QEMU implements the SMM address space exactly by overlaying SMRAM over
> normal memory:
| [...]
> The caching consists simply in resolving the overlaps beforehand, thus
> giving KVM the complete address space.
>
> Since slots do not change often, the simpler code is not worth the
> potentially more expensive KVM_SET_USER_MEMORY_REGION (it _is_ more
> expensive, if only because it has to be called twice per slot change).

I am a bit worried about the explosion that would happen if we wanted,
for example, per-VCPU address spaces;  SMM would double their amount.

My main issue (orthogonal to layering) is that we don't allow a way to
let userspace tell us that some slots in different name spaces are the
same slot.  We're losing information that could be useful in the future
(I can only think of less slot queries for dirty log now).

What I like about your solution is that it fits existing code really
well, is easily modified if needs change, and that it already exists.
All my ideas would require more code in kernel, which really doesn't
seem to be worth the benefits it would bring to the SMM use case ...

I'm ok with this approach,

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 20, 2015, 6:17 p.m. UTC | #6
On 20/05/2015 17:46, Radim Kr?má? wrote:
> I am a bit worried about the explosion that would happen if we wanted,
> for example, per-VCPU address spaces

Those would be very expensive.  If we were to implement relocatable APIC
base, we would have to do it in a different way than with memslots.

> My main issue (orthogonal to layering) is that we don't allow a way to
> let userspace tell us that some slots in different name spaces are the
> same slot.  We're losing information that could be useful in the future
> (I can only think of less slot queries for dirty log now).

You're right.  On the other hand, I think the ship has sailed the moment
the dirty log was GPA-indexed.

> What I like about your solution is that it fits existing code really
> well, is easily modified if needs change, and that it already exists.

Yes, it does fit existing code really well.

Thanks for the discussion!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 51523b70b6b2..91ab5f2354aa 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -254,6 +254,11 @@  since the last call to this ioctl.  Bit 0 is the first page in the
 memory slot.  Ensure the entire structure is cleared to avoid padding
 issues.
 
+If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 specifies
+the address space for which you want to return the dirty bitmap.
+They must be less than the value that KVM_CHECK_EXTENSION returns for
+the KVM_CAP_MULTI_ADDRESS_SPACE capability.
+
 
 4.9 KVM_SET_MEMORY_ALIAS
 
@@ -924,6 +929,13 @@  slot.  When changing an existing slot, it may be moved in the guest
 physical memory space, or its flags may be modified.  It may not be
 resized.  Slots may not overlap in guest physical address space.
 
+If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
+specifies the address space which is being modified.  They must be
+less than the value that KVM_CHECK_EXTENSION returns for the
+KVM_CAP_MULTI_ADDRESS_SPACE capability.  Slots in separate address spaces
+are unrelated; the restriction on overlapping slots only applies within
+each address space.
+
 Memory for the region is taken starting at the address denoted by the
 field userspace_addr, which must point at user addressable memory for
 the entire memory slot size.  Any object may back this memory, including
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 3536d12eb798..2aa79c864e91 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -430,7 +430,7 @@  static inline void note_hpte_modification(struct kvm *kvm,
  */
 static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
 {
-	return rcu_dereference_raw_notrace(kvm->memslots);
+	return rcu_dereference_raw_notrace(kvm->memslots[0]);
 }
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 427a1034a70e..5c0ee82f6dbd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -44,6 +44,10 @@ 
 /* Two fragments for cross MMIO pages. */
 #define KVM_MAX_MMIO_FRAGMENTS	2
 
+#ifndef KVM_ADDRESS_SPACE_NUM
+#define KVM_ADDRESS_SPACE_NUM	1
+#endif
+
 /*
  * For the normal pfn, the highest 12 bits should be zero,
  * so we can mask bit 62 ~ bit 52  to indicate the error pfn,
@@ -331,6 +335,13 @@  struct kvm_kernel_irq_routing_entry {
 #define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 #endif
 
+#ifndef __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
+static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+#endif
+
 /*
  * Note:
  * memslots are not sorted by id anymore, please use id_to_memslot()
@@ -349,7 +360,7 @@  struct kvm {
 	spinlock_t mmu_lock;
 	struct mutex slots_lock;
 	struct mm_struct *mm; /* userspace tied to this vm */
-	struct kvm_memslots *memslots;
+	struct kvm_memslots *memslots[KVM_ADDRESS_SPACE_NUM];
 	struct srcu_struct srcu;
 	struct srcu_struct irq_srcu;
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
@@ -464,16 +475,23 @@  void kvm_exit(void);
 void kvm_get_kvm(struct kvm *kvm);
 void kvm_put_kvm(struct kvm *kvm);
 
-static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
+static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
 {
-	return rcu_dereference_check(kvm->memslots,
+	return rcu_dereference_check(kvm->memslots[as_id],
 			srcu_read_lock_held(&kvm->srcu)
 			|| lockdep_is_held(&kvm->slots_lock));
 }
 
+static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
+{
+	return __kvm_memslots(kvm, 0);
+}
+
 static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu)
 {
-	return kvm_memslots(vcpu->kvm);
+	int as_id = kvm_arch_vcpu_memslots_id(vcpu);
+
+	return __kvm_memslots(vcpu->kvm, as_id);
 }
 
 static inline struct kvm_memory_slot *
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index eace8babd227..5ff1038437e3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -816,6 +816,7 @@  struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_HWRNG 115
 #define KVM_CAP_DISABLE_QUIRKS 116
 #define KVM_CAP_X86_SMM 117
+#define KVM_CAP_MULTI_ADDRESS_SPACE 118
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2cbda443880b..6838818e2452 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -517,9 +517,11 @@  static struct kvm *kvm_create_vm(unsigned long type)
 	BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
 
 	r = -ENOMEM;
-	kvm->memslots = kvm_alloc_memslots();
-	if (!kvm->memslots)
-		goto out_err_no_srcu;
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		kvm->memslots[i] = kvm_alloc_memslots();
+		if (!kvm->memslots[i])
+			goto out_err_no_srcu;
+	}
 
 	if (init_srcu_struct(&kvm->srcu))
 		goto out_err_no_srcu;
@@ -561,7 +563,8 @@  out_err_no_srcu:
 out_err_no_disable:
 	for (i = 0; i < KVM_NR_BUSES; i++)
 		kfree(kvm->buses[i]);
-	kvm_free_memslots(kvm, kvm->memslots);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
+		kvm_free_memslots(kvm, kvm->memslots[i]);
 	kvm_arch_free_vm(kvm);
 	return ERR_PTR(r);
 }
@@ -611,7 +614,8 @@  static void kvm_destroy_vm(struct kvm *kvm)
 #endif
 	kvm_arch_destroy_vm(kvm);
 	kvm_destroy_devices(kvm);
-	kvm_free_memslots(kvm, kvm->memslots);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
+		kvm_free_memslots(kvm, kvm->memslots[i]);
 	cleanup_srcu_struct(&kvm->irq_srcu);
 	cleanup_srcu_struct(&kvm->srcu);
 	kvm_arch_free_vm(kvm);
@@ -730,9 +734,9 @@  static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
 }
 
 static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
-		struct kvm_memslots *slots)
+		int as_id, struct kvm_memslots *slots)
 {
-	struct kvm_memslots *old_memslots = kvm_memslots(kvm);
+	struct kvm_memslots *old_memslots = __kvm_memslots(kvm, as_id);
 
 	/*
 	 * Set the low bit in the generation, which disables SPTE caching
@@ -741,7 +745,7 @@  static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 	WARN_ON(old_memslots->generation & 1);
 	slots->generation = old_memslots->generation + 1;
 
-	rcu_assign_pointer(kvm->memslots, slots);
+	rcu_assign_pointer(kvm->memslots[as_id], slots);
 	synchronize_srcu_expedited(&kvm->srcu);
 
 	/*
@@ -773,6 +777,7 @@  int __kvm_set_memory_region(struct kvm *kvm,
 	struct kvm_memory_slot *slot;
 	struct kvm_memory_slot old, new;
 	struct kvm_memslots *slots = NULL, *old_memslots;
+	int as_id, id;
 	enum kvm_mr_change change;
 
 	r = check_memory_region_flags(mem);
@@ -780,24 +785,27 @@  int __kvm_set_memory_region(struct kvm *kvm,
 		goto out;
 
 	r = -EINVAL;
+	as_id = mem->slot >> 16;
+	id = mem->slot & 65535;
+
 	/* General sanity checks */
 	if (mem->memory_size & (PAGE_SIZE - 1))
 		goto out;
 	if (mem->guest_phys_addr & (PAGE_SIZE - 1))
 		goto out;
 	/* We can read the guest memory with __xxx_user() later on. */
-	if ((mem->slot < KVM_USER_MEM_SLOTS) &&
+	if ((id < KVM_USER_MEM_SLOTS) &&
 	    ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
 	     !access_ok(VERIFY_WRITE,
 			(void __user *)(unsigned long)mem->userspace_addr,
 			mem->memory_size)))
 		goto out;
-	if (mem->slot >= KVM_MEM_SLOTS_NUM)
+	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
 		goto out;
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
 		goto out;
 
-	slot = id_to_memslot(kvm_memslots(kvm), mem->slot);
+	slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
 	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
 	npages = mem->memory_size >> PAGE_SHIFT;
 
@@ -806,7 +814,7 @@  int __kvm_set_memory_region(struct kvm *kvm,
 
 	new = old = *slot;
 
-	new.id = mem->slot;
+	new.id = id;
 	new.base_gfn = base_gfn;
 	new.npages = npages;
 	new.flags = mem->flags;
@@ -837,9 +845,9 @@  int __kvm_set_memory_region(struct kvm *kvm,
 	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
 		/* Check for overlaps */
 		r = -EEXIST;
-		kvm_for_each_memslot(slot, kvm_memslots(kvm)) {
+		kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
 			if ((slot->id >= KVM_USER_MEM_SLOTS) ||
-			    (slot->id == mem->slot))
+			    (slot->id == id))
 				continue;
 			if (!((base_gfn + npages <= slot->base_gfn) ||
 			      (base_gfn >= slot->base_gfn + slot->npages)))
@@ -868,13 +876,13 @@  int __kvm_set_memory_region(struct kvm *kvm,
 	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
 	if (!slots)
 		goto out_free;
-	memcpy(slots, kvm_memslots(kvm), sizeof(struct kvm_memslots));
+	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
 
 	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
-		slot = id_to_memslot(slots, mem->slot);
+		slot = id_to_memslot(slots, id);
 		slot->flags |= KVM_MEMSLOT_INVALID;
 
-		old_memslots = install_new_memslots(kvm, slots);
+		old_memslots = install_new_memslots(kvm, as_id, slots);
 
 		/* slot was deleted or moved, clear iommu mapping */
 		kvm_iommu_unmap_pages(kvm, &old);
@@ -906,7 +914,7 @@  int __kvm_set_memory_region(struct kvm *kvm,
 	}
 
 	update_memslots(slots, &new);
-	old_memslots = install_new_memslots(kvm, slots);
+	old_memslots = install_new_memslots(kvm, as_id, slots);
 
 	kvm_arch_commit_memory_region(kvm, mem, &old, &new, change);
 
@@ -953,7 +961,7 @@  EXPORT_SYMBOL_GPL(kvm_set_memory_region);
 static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 					  struct kvm_userspace_memory_region *mem)
 {
-	if (mem->slot >= KVM_USER_MEM_SLOTS)
+	if ((mem->slot & 65535) >= KVM_USER_MEM_SLOTS)
 		return -EINVAL;
 
 	if (!mem->memory_size)
@@ -967,16 +975,18 @@  int kvm_get_dirty_log(struct kvm *kvm,
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
-	int r, i;
+	int r, i, as_id, id;
 	unsigned long n;
 	unsigned long any = 0;
 
 	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
+	as_id = log->slot >> 16;
+	id = log->slot & 65535;
+	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
 		goto out;
 
-	slots = kvm_memslots(kvm);
-	memslot = id_to_memslot(slots, log->slot);
+	slots = __kvm_memslots(kvm, as_id);
+	memslot = id_to_memslot(slots, id);
 	r = -ENOENT;
 	if (!memslot->dirty_bitmap)
 		goto out;
@@ -1027,17 +1037,19 @@  int kvm_get_dirty_log_protect(struct kvm *kvm,
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
-	int r, i;
+	int r, i, as_id, id;
 	unsigned long n;
 	unsigned long *dirty_bitmap;
 	unsigned long *dirty_bitmap_buffer;
 
 	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
+	as_id = log->slot >> 16;
+	id = log->slot & 65535;
+	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
 		goto out;
 
-	slots = kvm_memslots(kvm);
-	memslot = id_to_memslot(slots, log->slot);
+	slots = __kvm_memslots(kvm, as_id);
+	memslot = id_to_memslot(slots, id);
 
 	dirty_bitmap = memslot->dirty_bitmap;
 	r = -ENOENT;
@@ -2592,6 +2604,10 @@  static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_IRQ_ROUTING:
 		return KVM_MAX_IRQ_ROUTES;
 #endif
+#if KVM_ADDRESS_SPACE_NUM > 1
+	case KVM_CAP_MULTI_ADDRESS_SPACE:
+		return KVM_ADDRESS_SPACE_NUM;
+#endif
 	default:
 		break;
 	}