diff mbox series

[RFC,v1,2/9] KVM: guest_memfd: Add guest_memfd support to kvm_(read|/write)_guest_page()

Message ID 20250122152738.1173160-3-tabba@google.com (mailing list archive)
State Not Applicable
Headers show
Series KVM: Mapping of guest_memfd at the host and a software protected VM type | expand

Commit Message

Fuad Tabba Jan. 22, 2025, 3:27 p.m. UTC
Make kvm_(read|/write)_guest_page() capable of accessing guest
memory for slots that don't have a userspace address, but only if
the memory is mappable, which also indicates that it is
accessible by the host.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 virt/kvm/kvm_main.c | 118 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 99 insertions(+), 19 deletions(-)

Comments

David Hildenbrand Jan. 22, 2025, 10:10 p.m. UTC | #1
On 22.01.25 16:27, Fuad Tabba wrote:
> Make kvm_(read|/write)_guest_page() capable of accessing guest
> memory for slots that don't have a userspace address, but only if
> the memory is mappable, which also indicates that it is
> accessible by the host.

Interesting. So far my assumption was that, for shared memory, user 
space would simply mmap() guest_memdd and pass it as userspace address 
to the same memslot that has this guest_memfd for private memory.

Wouldn't that be easier in the first shot? (IOW, not require this patch 
with the cost of faulting the shared page into the page table on access)
Fuad Tabba Jan. 23, 2025, 9:48 a.m. UTC | #2
On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@redhat.com> wrote:
>
> On 22.01.25 16:27, Fuad Tabba wrote:
> > Make kvm_(read|/write)_guest_page() capable of accessing guest
> > memory for slots that don't have a userspace address, but only if
> > the memory is mappable, which also indicates that it is
> > accessible by the host.
>
> Interesting. So far my assumption was that, for shared memory, user
> space would simply mmap() guest_memdd and pass it as userspace address
> to the same memslot that has this guest_memfd for private memory.
>
> Wouldn't that be easier in the first shot? (IOW, not require this patch
> with the cost of faulting the shared page into the page table on access)

This has to do more with the ABI I had for pkvm and shared memory
implementations, in which you don't need to specify the userspace
address for memory in a guestmem memslot. The issue is there is no
obvious address to map it to. This would be the case in kvm:arm64 for
tracking paravirtualized time, which the userspace doesn't necessarily
need to interact with, but kvm does.

That said, we could always have a userspace address dedicated to
mapping shared locations, and use that address when the necessity
arises. Or we could always require that memslots have a userspace
address, even if not used. I don't really have a strong preference.

Thanks,
/fuad

> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Jan. 23, 2025, 11:39 a.m. UTC | #3
On 23.01.25 10:48, Fuad Tabba wrote:
> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 22.01.25 16:27, Fuad Tabba wrote:
>>> Make kvm_(read|/write)_guest_page() capable of accessing guest
>>> memory for slots that don't have a userspace address, but only if
>>> the memory is mappable, which also indicates that it is
>>> accessible by the host.
>>
>> Interesting. So far my assumption was that, for shared memory, user
>> space would simply mmap() guest_memdd and pass it as userspace address
>> to the same memslot that has this guest_memfd for private memory.
>>
>> Wouldn't that be easier in the first shot? (IOW, not require this patch
>> with the cost of faulting the shared page into the page table on access)
> 

In light of:

https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@linux.ibm.com

there can, in theory, be memslots that start at address 0 and have a 
"valid" mapping. This case is done from the kernel (and on special s390x 
hardware), though, so it does not apply here at all so far.

In practice, getting address 0 as a valid address is unlikely, because 
the default:

$ sysctl  vm.mmap_min_addr
vm.mmap_min_addr = 65536

usually prohibits it for good reason.

> This has to do more with the ABI I had for pkvm and shared memory
> implementations, in which you don't need to specify the userspace
> address for memory in a guestmem memslot. The issue is there is no
> obvious address to map it to. This would be the case in kvm:arm64 for
> tracking paravirtualized time, which the userspace doesn't necessarily
> need to interact with, but kvm does.

So I understand correctly: userspace wouldn't have to mmap it because it 
is not interested in accessing it, but there is nothing speaking against 
mmaping it, at least in the first shot.

I assume it would not be a private memslot (so far, my understanding is 
that internal memslots never have a guest_memfd attached). 
kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set 
on user-created memslots.

> 
> That said, we could always have a userspace address dedicated to
> mapping shared locations, and use that address when the necessity
> arises. Or we could always require that memslots have a userspace
> address, even if not used. I don't really have a strong preference.

So, the simpler version where user space would simply mmap guest_memfd 
to provide the address via userspace_addr would at least work for the 
use case of paravirtualized time?

It would get rid of the immediate need for this patch and patch #4 to 
get it flying.


One interesting question is: when would you want shared memory in 
guest_memfd and *not* provide it as part of the same memslot.


One nice thing about the mmap might be that access go via user-space 
page tables: E.g., __kvm_read_guest_page can just access the memory 
without requiring the folio lock and an additional temporary folio 
reference on every access -- it's handled implicitly via the mapcount.

(of course, to map the page we still need that once on the fault path)
Patrick Roy Jan. 23, 2025, 11:57 a.m. UTC | #4
On Thu, 2025-01-23 at 11:39 +0000, David Hildenbrand wrote:
> On 23.01.25 10:48, Fuad Tabba wrote:
>> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 22.01.25 16:27, Fuad Tabba wrote:
>>>> Make kvm_(read|/write)_guest_page() capable of accessing guest
>>>> memory for slots that don't have a userspace address, but only if
>>>> the memory is mappable, which also indicates that it is
>>>> accessible by the host.
>>>
>>> Interesting. So far my assumption was that, for shared memory, user
>>> space would simply mmap() guest_memdd and pass it as userspace address
>>> to the same memslot that has this guest_memfd for private memory.
>>>
>>> Wouldn't that be easier in the first shot? (IOW, not require this patch
>>> with the cost of faulting the shared page into the page table on access)
>>
> 
> In light of:
> 
> https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@linux.ibm.com
> 
> there can, in theory, be memslots that start at address 0 and have a
> "valid" mapping. This case is done from the kernel (and on special s390x
> hardware), though, so it does not apply here at all so far.
> 
> In practice, getting address 0 as a valid address is unlikely, because
> the default:
> 
> $ sysctl  vm.mmap_min_addr
> vm.mmap_min_addr = 65536
> 
> usually prohibits it for good reason.
> 
>> This has to do more with the ABI I had for pkvm and shared memory
>> implementations, in which you don't need to specify the userspace
>> address for memory in a guestmem memslot. The issue is there is no
>> obvious address to map it to. This would be the case in kvm:arm64 for
>> tracking paravirtualized time, which the userspace doesn't necessarily
>> need to interact with, but kvm does.
> 
> So I understand correctly: userspace wouldn't have to mmap it because it
> is not interested in accessing it, but there is nothing speaking against
> mmaping it, at least in the first shot.
> 
> I assume it would not be a private memslot (so far, my understanding is
> that internal memslots never have a guest_memfd attached).
> kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set
> on user-created memslots.
> 
>>
>> That said, we could always have a userspace address dedicated to
>> mapping shared locations, and use that address when the necessity
>> arises. Or we could always require that memslots have a userspace
>> address, even if not used. I don't really have a strong preference.
> 
> So, the simpler version where user space would simply mmap guest_memfd
> to provide the address via userspace_addr would at least work for the
> use case of paravirtualized time?

fwiw, I'm currently prototyping something like this for x86 (although
not by putting the gmem address into userspace_addr, but by adding a new
field to memslots, so that memory attributes continue working), based on
what we talked about at the last guest_memfd sync meeting (the whole
"how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
story). So I guess if we're going down this route for x86, maybe it
makes sense to do the same on ARM, for consistency?

> It would get rid of the immediate need for this patch and patch #4 to
> get it flying.
> 
> 
> One interesting question is: when would you want shared memory in
> guest_memfd and *not* provide it as part of the same memslot. 

In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite
far without giving KVM a way to internally access shared parts of gmem - 
it's why I was probing Fuad for this simplified series, because
KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a
working non-CoCo VM on ARM (although I admittedly never looked at clocks
inside the guest - maybe that's one thing that breaks if KVM can't
access gmem. How to guest and host agree on the guest memory range
used to exchange paravirtual timekeeping information? Could that exchange
be intercepted in userspace, and set to shared via memory attributes (e.g.
placed outside gmem)? That's the route I'm going down the paravirtual
time on x86).

> One nice thing about the mmap might be that access go via user-space
> page tables: E.g., __kvm_read_guest_page can just access the memory
> without requiring the folio lock and an additional temporary folio
> reference on every access -- it's handled implicitly via the mapcount.
> 
> (of course, to map the page we still need that once on the fault path)

Doing a direct map access in kvm_{read,write}_guest() and friends will
also get tricky if guest_memfd folios ever don't have direct map
entries. On-demand restoration is painful, both complexity and
performance wise [1], while going through a userspace mapping of
guest_memfd would "just work".

> -- 
> Cheers,
> 
> David / dhildenb
>
Fuad Tabba Jan. 23, 2025, 12:16 p.m. UTC | #5
Hi David,

On Thu, 23 Jan 2025 at 11:39, David Hildenbrand <david@redhat.com> wrote:
>
> On 23.01.25 10:48, Fuad Tabba wrote:
> > On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 22.01.25 16:27, Fuad Tabba wrote:
> >>> Make kvm_(read|/write)_guest_page() capable of accessing guest
> >>> memory for slots that don't have a userspace address, but only if
> >>> the memory is mappable, which also indicates that it is
> >>> accessible by the host.
> >>
> >> Interesting. So far my assumption was that, for shared memory, user
> >> space would simply mmap() guest_memdd and pass it as userspace address
> >> to the same memslot that has this guest_memfd for private memory.
> >>
> >> Wouldn't that be easier in the first shot? (IOW, not require this patch
> >> with the cost of faulting the shared page into the page table on access)
> >

Before answering your questions in detail, in light of this
discussion, and in other discussions I've had, I think the more
reasonable thing to do is to let the memslot for in-place shared
memory look the same as the other guest_memfd memslots, i.e., have a
userspace address regardless if it's needed or not. This makes the
interface the same across all guest_memfd implementations, and avoids
the need for patches such as this one, at least initially until we
realize that we need it.

That said...

>
> In light of:
>
> https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@linux.ibm.com
>
> there can, in theory, be memslots that start at address 0 and have a
> "valid" mapping. This case is done from the kernel (and on special s390x
> hardware), though, so it does not apply here at all so far.
>
> In practice, getting address 0 as a valid address is unlikely, because
> the default:
>
> $ sysctl  vm.mmap_min_addr
> vm.mmap_min_addr = 65536
>
> usually prohibits it for good reason.

Ack.

> > This has to do more with the ABI I had for pkvm and shared memory
> > implementations, in which you don't need to specify the userspace
> > address for memory in a guestmem memslot. The issue is there is no
> > obvious address to map it to. This would be the case in kvm:arm64 for
> > tracking paravirtualized time, which the userspace doesn't necessarily
> > need to interact with, but kvm does.
>
> So I understand correctly: userspace wouldn't have to mmap it because it
> is not interested in accessing it, but there is nothing speaking against
> mmaping it, at least in the first shot.

That's right.

> I assume it would not be a private memslot (so far, my understanding is
> that internal memslots never have a guest_memfd attached).
> kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set
> on user-created memslots.

Right, it wouldn't be a private memslot, but a user created one.

> >
> > That said, we could always have a userspace address dedicated to
> > mapping shared locations, and use that address when the necessity
> > arises. Or we could always require that memslots have a userspace
> > address, even if not used. I don't really have a strong preference.
>
> So, the simpler version where user space would simply mmap guest_memfd
> to provide the address via userspace_addr would at least work for the
> use case of paravirtualized time?
>
> It would get rid of the immediate need for this patch and patch #4 to
> get it flying.

I agree now.

>
> One interesting question is: when would you want shared memory in
> guest_memfd and *not* provide it as part of the same memslot.
>

My original thinking wasn't that we wouldn't _want_ to, rather that we
don't _need_ to. I mistakenly thought that it would simplify things,
but, especially after this discussion, I now realize that it makes
things more complicated instead.

> One nice thing about the mmap might be that access go via user-space
> page tables: E.g., __kvm_read_guest_page can just access the memory
> without requiring the folio lock and an additional temporary folio
> reference on every access -- it's handled implicitly via the mapcount.
>
> (of course, to map the page we still need that once on the fault path)

Ack.

Thanks,
/fuad


> --
> Cheers,
>
> David / dhildenb
>
Fuad Tabba Jan. 23, 2025, 12:28 p.m. UTC | #6
Hi Patrick,

On Thu, 23 Jan 2025 at 11:57, Patrick Roy <roypat@amazon.co.uk> wrote:
>
>
>
> On Thu, 2025-01-23 at 11:39 +0000, David Hildenbrand wrote:
> > On 23.01.25 10:48, Fuad Tabba wrote:
> >> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> On 22.01.25 16:27, Fuad Tabba wrote:
> >>>> Make kvm_(read|/write)_guest_page() capable of accessing guest
> >>>> memory for slots that don't have a userspace address, but only if
> >>>> the memory is mappable, which also indicates that it is
> >>>> accessible by the host.
> >>>
> >>> Interesting. So far my assumption was that, for shared memory, user
> >>> space would simply mmap() guest_memdd and pass it as userspace address
> >>> to the same memslot that has this guest_memfd for private memory.
> >>>
> >>> Wouldn't that be easier in the first shot? (IOW, not require this patch
> >>> with the cost of faulting the shared page into the page table on access)
> >>
> >
> > In light of:
> >
> > https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@linux.ibm.com
> >
> > there can, in theory, be memslots that start at address 0 and have a
> > "valid" mapping. This case is done from the kernel (and on special s390x
> > hardware), though, so it does not apply here at all so far.
> >
> > In practice, getting address 0 as a valid address is unlikely, because
> > the default:
> >
> > $ sysctl  vm.mmap_min_addr
> > vm.mmap_min_addr = 65536
> >
> > usually prohibits it for good reason.
> >
> >> This has to do more with the ABI I had for pkvm and shared memory
> >> implementations, in which you don't need to specify the userspace
> >> address for memory in a guestmem memslot. The issue is there is no
> >> obvious address to map it to. This would be the case in kvm:arm64 for
> >> tracking paravirtualized time, which the userspace doesn't necessarily
> >> need to interact with, but kvm does.
> >
> > So I understand correctly: userspace wouldn't have to mmap it because it
> > is not interested in accessing it, but there is nothing speaking against
> > mmaping it, at least in the first shot.
> >
> > I assume it would not be a private memslot (so far, my understanding is
> > that internal memslots never have a guest_memfd attached).
> > kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set
> > on user-created memslots.
> >
> >>
> >> That said, we could always have a userspace address dedicated to
> >> mapping shared locations, and use that address when the necessity
> >> arises. Or we could always require that memslots have a userspace
> >> address, even if not used. I don't really have a strong preference.
> >
> > So, the simpler version where user space would simply mmap guest_memfd
> > to provide the address via userspace_addr would at least work for the
> > use case of paravirtualized time?
>
> fwiw, I'm currently prototyping something like this for x86 (although
> not by putting the gmem address into userspace_addr, but by adding a new
> field to memslots, so that memory attributes continue working), based on
> what we talked about at the last guest_memfd sync meeting (the whole
> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
> story). So I guess if we're going down this route for x86, maybe it
> makes sense to do the same on ARM, for consistency?
>
> > It would get rid of the immediate need for this patch and patch #4 to
> > get it flying.
> >
> >
> > One interesting question is: when would you want shared memory in
> > guest_memfd and *not* provide it as part of the same memslot.
>
> In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite
> far without giving KVM a way to internally access shared parts of gmem -
> it's why I was probing Fuad for this simplified series, because
> KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a
> working non-CoCo VM on ARM (although I admittedly never looked at clocks
> inside the guest - maybe that's one thing that breaks if KVM can't
> access gmem. How to guest and host agree on the guest memory range
> used to exchange paravirtual timekeeping information? Could that exchange
> be intercepted in userspace, and set to shared via memory attributes (e.g.
> placed outside gmem)? That's the route I'm going down the paravirtual
> time on x86).

For an idea of what it looks like on arm64, here's how kvmtool handles it:
https://github.com/kvmtool/kvmtool/blob/master/arm/aarch64/pvtime.c

Cheers,
/fuad





> > One nice thing about the mmap might be that access go via user-space
> > page tables: E.g., __kvm_read_guest_page can just access the memory
> > without requiring the folio lock and an additional temporary folio
> > reference on every access -- it's handled implicitly via the mapcount.
> >
> > (of course, to map the page we still need that once on the fault path)
>
> Doing a direct map access in kvm_{read,write}_guest() and friends will
> also get tricky if guest_memfd folios ever don't have direct map
> entries. On-demand restoration is painful, both complexity and
> performance wise [1], while going through a userspace mapping of
> guest_memfd would "just work".
>
> > --
> > Cheers,
> >
> > David / dhildenb
> >
Patrick Roy Jan. 23, 2025, 1:57 p.m. UTC | #7
On Thu, 2025-01-23 at 12:28 +0000, Fuad Tabba wrote:
> Hi Patrick,
> 
> On Thu, 23 Jan 2025 at 11:57, Patrick Roy <roypat@amazon.co.uk> wrote:
>>
>>
>>
>> On Thu, 2025-01-23 at 11:39 +0000, David Hildenbrand wrote:
>>> On 23.01.25 10:48, Fuad Tabba wrote:
>>>> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 22.01.25 16:27, Fuad Tabba wrote:
>>>>>> Make kvm_(read|/write)_guest_page() capable of accessing guest
>>>>>> memory for slots that don't have a userspace address, but only if
>>>>>> the memory is mappable, which also indicates that it is
>>>>>> accessible by the host.
>>>>>
>>>>> Interesting. So far my assumption was that, for shared memory, user
>>>>> space would simply mmap() guest_memdd and pass it as userspace address
>>>>> to the same memslot that has this guest_memfd for private memory.
>>>>>
>>>>> Wouldn't that be easier in the first shot? (IOW, not require this patch
>>>>> with the cost of faulting the shared page into the page table on access)
>>>>
>>>
>>> In light of:
>>>
>>> https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@linux.ibm.com
>>>
>>> there can, in theory, be memslots that start at address 0 and have a
>>> "valid" mapping. This case is done from the kernel (and on special s390x
>>> hardware), though, so it does not apply here at all so far.
>>>
>>> In practice, getting address 0 as a valid address is unlikely, because
>>> the default:
>>>
>>> $ sysctl  vm.mmap_min_addr
>>> vm.mmap_min_addr = 65536
>>>
>>> usually prohibits it for good reason.
>>>
>>>> This has to do more with the ABI I had for pkvm and shared memory
>>>> implementations, in which you don't need to specify the userspace
>>>> address for memory in a guestmem memslot. The issue is there is no
>>>> obvious address to map it to. This would be the case in kvm:arm64 for
>>>> tracking paravirtualized time, which the userspace doesn't necessarily
>>>> need to interact with, but kvm does.
>>>
>>> So I understand correctly: userspace wouldn't have to mmap it because it
>>> is not interested in accessing it, but there is nothing speaking against
>>> mmaping it, at least in the first shot.
>>>
>>> I assume it would not be a private memslot (so far, my understanding is
>>> that internal memslots never have a guest_memfd attached).
>>> kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set
>>> on user-created memslots.
>>>
>>>>
>>>> That said, we could always have a userspace address dedicated to
>>>> mapping shared locations, and use that address when the necessity
>>>> arises. Or we could always require that memslots have a userspace
>>>> address, even if not used. I don't really have a strong preference.
>>>
>>> So, the simpler version where user space would simply mmap guest_memfd
>>> to provide the address via userspace_addr would at least work for the
>>> use case of paravirtualized time?
>>
>> fwiw, I'm currently prototyping something like this for x86 (although
>> not by putting the gmem address into userspace_addr, but by adding a new
>> field to memslots, so that memory attributes continue working), based on
>> what we talked about at the last guest_memfd sync meeting (the whole
>> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
>> story). So I guess if we're going down this route for x86, maybe it
>> makes sense to do the same on ARM, for consistency?
>>
>>> It would get rid of the immediate need for this patch and patch #4 to
>>> get it flying.
>>>
>>>
>>> One interesting question is: when would you want shared memory in
>>> guest_memfd and *not* provide it as part of the same memslot.
>>
>> In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite
>> far without giving KVM a way to internally access shared parts of gmem -
>> it's why I was probing Fuad for this simplified series, because
>> KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a
>> working non-CoCo VM on ARM (although I admittedly never looked at clocks
>> inside the guest - maybe that's one thing that breaks if KVM can't
>> access gmem. How to guest and host agree on the guest memory range
>> used to exchange paravirtual timekeeping information? Could that exchange
>> be intercepted in userspace, and set to shared via memory attributes (e.g.
>> placed outside gmem)? That's the route I'm going down the paravirtual
>> time on x86).
> 
> For an idea of what it looks like on arm64, here's how kvmtool handles it:
> https://github.com/kvmtool/kvmtool/blob/master/arm/aarch64/pvtime.c
> 
> Cheers,
> /fuad
 
Thanks! In that example, kvmtool actually allocates a separate memslot for
the pvclock stuff, so I guess it's always possible to simply put it into
a non-gmem memslot, which indeed sidesteps this issue as you mention in
your reply to David :D
  
>>> One nice thing about the mmap might be that access go via user-space
>>> page tables: E.g., __kvm_read_guest_page can just access the memory
>>> without requiring the folio lock and an additional temporary folio
>>> reference on every access -- it's handled implicitly via the mapcount.
>>>
>>> (of course, to map the page we still need that once on the fault path)
>>
>> Doing a direct map access in kvm_{read,write}_guest() and friends will
>> also get tricky if guest_memfd folios ever don't have direct map
>> entries. On-demand restoration is painful, both complexity and
>> performance wise [1], while going through a userspace mapping of
>> guest_memfd would "just work".
>>
>>> --
>>> Cheers,
>>>
>>> David / dhildenb
>>>
David Hildenbrand Jan. 23, 2025, 2:15 p.m. UTC | #8
>>
>> One interesting question is: when would you want shared memory in
>> guest_memfd and *not* provide it as part of the same memslot.
>>
> 
> My original thinking wasn't that we wouldn't _want_ to, rather that we
> don't _need_ to. I mistakenly thought that it would simplify things,
> but, especially after this discussion, I now realize that it makes
> things more complicated instead.

Right, and it's not off the table: we could always add support for this 
"no mmaped shared guest_memfd" memory later if there is good reason to 
have it.
David Hildenbrand Jan. 23, 2025, 2:18 p.m. UTC | #9
>>>
>>> That said, we could always have a userspace address dedicated to
>>> mapping shared locations, and use that address when the necessity
>>> arises. Or we could always require that memslots have a userspace
>>> address, even if not used. I don't really have a strong preference.
>>
>> So, the simpler version where user space would simply mmap guest_memfd
>> to provide the address via userspace_addr would at least work for the
>> use case of paravirtualized time?
> 
> fwiw, I'm currently prototyping something like this for x86 (although
> not by putting the gmem address into userspace_addr, but by adding a new
> field to memslots, so that memory attributes continue working), based on
> what we talked about at the last guest_memfd sync meeting (the whole
> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
> story).

Yes, I recall that discussion. Can you elaborate why the separate field 
is required to keep memory attributes working? (could it be sorted out 
differently, by reusing userspace_addr?).

> So I guess if we're going down this route for x86, maybe it
> makes sense to do the same on ARM, for consistency?
> 
>> It would get rid of the immediate need for this patch and patch #4 to
>> get it flying.
>>
>>
>> One interesting question is: when would you want shared memory in
>> guest_memfd and *not* provide it as part of the same memslot.
> 
> In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite
> far without giving KVM a way to internally access shared parts of gmem -
> it's why I was probing Fuad for this simplified series, because
> KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a
> working non-CoCo VM on ARM (although I admittedly never looked at clocks
> inside the guest - maybe that's one thing that breaks if KVM can't
> access gmem. How to guest and host agree on the guest memory range
> used to exchange paravirtual timekeeping information? Could that exchange
> be intercepted in userspace, and set to shared via memory attributes (e.g.
> placed outside gmem)? That's the route I'm going down the paravirtual
> time on x86).

Sounds reasonable to me.

> 
>> One nice thing about the mmap might be that access go via user-space
>> page tables: E.g., __kvm_read_guest_page can just access the memory
>> without requiring the folio lock and an additional temporary folio
>> reference on every access -- it's handled implicitly via the mapcount.
>>
>> (of course, to map the page we still need that once on the fault path)
> 
> Doing a direct map access in kvm_{read,write}_guest() and friends will
> also get tricky if guest_memfd folios ever don't have direct map
> entries. On-demand restoration is painful, both complexity and
> performance wise [1], while going through a userspace mapping of
> guest_memfd would "just work".

Indeed.
David Hildenbrand Jan. 23, 2025, 2:21 p.m. UTC | #10
On 23.01.25 14:57, Patrick Roy wrote:
> 
> 
> On Thu, 2025-01-23 at 12:28 +0000, Fuad Tabba wrote:
>> Hi Patrick,
>>
>> On Thu, 23 Jan 2025 at 11:57, Patrick Roy <roypat@amazon.co.uk> wrote:
>>>
>>>
>>>
>>> On Thu, 2025-01-23 at 11:39 +0000, David Hildenbrand wrote:
>>>> On 23.01.25 10:48, Fuad Tabba wrote:
>>>>> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 22.01.25 16:27, Fuad Tabba wrote:
>>>>>>> Make kvm_(read|/write)_guest_page() capable of accessing guest
>>>>>>> memory for slots that don't have a userspace address, but only if
>>>>>>> the memory is mappable, which also indicates that it is
>>>>>>> accessible by the host.
>>>>>>
>>>>>> Interesting. So far my assumption was that, for shared memory, user
>>>>>> space would simply mmap() guest_memdd and pass it as userspace address
>>>>>> to the same memslot that has this guest_memfd for private memory.
>>>>>>
>>>>>> Wouldn't that be easier in the first shot? (IOW, not require this patch
>>>>>> with the cost of faulting the shared page into the page table on access)
>>>>>
>>>>
>>>> In light of:
>>>>
>>>> https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@linux.ibm.com
>>>>
>>>> there can, in theory, be memslots that start at address 0 and have a
>>>> "valid" mapping. This case is done from the kernel (and on special s390x
>>>> hardware), though, so it does not apply here at all so far.
>>>>
>>>> In practice, getting address 0 as a valid address is unlikely, because
>>>> the default:
>>>>
>>>> $ sysctl  vm.mmap_min_addr
>>>> vm.mmap_min_addr = 65536
>>>>
>>>> usually prohibits it for good reason.
>>>>
>>>>> This has to do more with the ABI I had for pkvm and shared memory
>>>>> implementations, in which you don't need to specify the userspace
>>>>> address for memory in a guestmem memslot. The issue is there is no
>>>>> obvious address to map it to. This would be the case in kvm:arm64 for
>>>>> tracking paravirtualized time, which the userspace doesn't necessarily
>>>>> need to interact with, but kvm does.
>>>>
>>>> So I understand correctly: userspace wouldn't have to mmap it because it
>>>> is not interested in accessing it, but there is nothing speaking against
>>>> mmaping it, at least in the first shot.
>>>>
>>>> I assume it would not be a private memslot (so far, my understanding is
>>>> that internal memslots never have a guest_memfd attached).
>>>> kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set
>>>> on user-created memslots.
>>>>
>>>>>
>>>>> That said, we could always have a userspace address dedicated to
>>>>> mapping shared locations, and use that address when the necessity
>>>>> arises. Or we could always require that memslots have a userspace
>>>>> address, even if not used. I don't really have a strong preference.
>>>>
>>>> So, the simpler version where user space would simply mmap guest_memfd
>>>> to provide the address via userspace_addr would at least work for the
>>>> use case of paravirtualized time?
>>>
>>> fwiw, I'm currently prototyping something like this for x86 (although
>>> not by putting the gmem address into userspace_addr, but by adding a new
>>> field to memslots, so that memory attributes continue working), based on
>>> what we talked about at the last guest_memfd sync meeting (the whole
>>> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
>>> story). So I guess if we're going down this route for x86, maybe it
>>> makes sense to do the same on ARM, for consistency?
>>>
>>>> It would get rid of the immediate need for this patch and patch #4 to
>>>> get it flying.
>>>>
>>>>
>>>> One interesting question is: when would you want shared memory in
>>>> guest_memfd and *not* provide it as part of the same memslot.
>>>
>>> In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite
>>> far without giving KVM a way to internally access shared parts of gmem -
>>> it's why I was probing Fuad for this simplified series, because
>>> KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a
>>> working non-CoCo VM on ARM (although I admittedly never looked at clocks
>>> inside the guest - maybe that's one thing that breaks if KVM can't
>>> access gmem. How to guest and host agree on the guest memory range
>>> used to exchange paravirtual timekeeping information? Could that exchange
>>> be intercepted in userspace, and set to shared via memory attributes (e.g.
>>> placed outside gmem)? That's the route I'm going down the paravirtual
>>> time on x86).
>>
>> For an idea of what it looks like on arm64, here's how kvmtool handles it:
>> https://github.com/kvmtool/kvmtool/blob/master/arm/aarch64/pvtime.c
>>
>> Cheers,
>> /fuad
>   
> Thanks! In that example, kvmtool actually allocates a separate memslot for
> the pvclock stuff, so I guess it's always possible to simply put it into
> a non-gmem memslot, which indeed sidesteps this issue as you mention in
> your reply to David :D

Does that work on CC where all memory defaults to private first, and the 
VM explicitly has to opt into marking it shared first, or how exactly 
would the flow of operations be in the cases of the non-gmem ("good 
old") memslot?
Fuad Tabba Jan. 23, 2025, 2:25 p.m. UTC | #11
On Thu, 23 Jan 2025 at 14:21, David Hildenbrand <david@redhat.com> wrote:
>
> On 23.01.25 14:57, Patrick Roy wrote:
> >
> >
> > On Thu, 2025-01-23 at 12:28 +0000, Fuad Tabba wrote:
> >> Hi Patrick,
> >>
> >> On Thu, 23 Jan 2025 at 11:57, Patrick Roy <roypat@amazon.co.uk> wrote:
> >>>
> >>>
> >>>
> >>> On Thu, 2025-01-23 at 11:39 +0000, David Hildenbrand wrote:
> >>>> On 23.01.25 10:48, Fuad Tabba wrote:
> >>>>> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@redhat.com> wrote:
> >>>>>>
> >>>>>> On 22.01.25 16:27, Fuad Tabba wrote:
> >>>>>>> Make kvm_(read|/write)_guest_page() capable of accessing guest
> >>>>>>> memory for slots that don't have a userspace address, but only if
> >>>>>>> the memory is mappable, which also indicates that it is
> >>>>>>> accessible by the host.
> >>>>>>
> >>>>>> Interesting. So far my assumption was that, for shared memory, user
> >>>>>> space would simply mmap() guest_memdd and pass it as userspace address
> >>>>>> to the same memslot that has this guest_memfd for private memory.
> >>>>>>
> >>>>>> Wouldn't that be easier in the first shot? (IOW, not require this patch
> >>>>>> with the cost of faulting the shared page into the page table on access)
> >>>>>
> >>>>
> >>>> In light of:
> >>>>
> >>>> https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@linux.ibm.com
> >>>>
> >>>> there can, in theory, be memslots that start at address 0 and have a
> >>>> "valid" mapping. This case is done from the kernel (and on special s390x
> >>>> hardware), though, so it does not apply here at all so far.
> >>>>
> >>>> In practice, getting address 0 as a valid address is unlikely, because
> >>>> the default:
> >>>>
> >>>> $ sysctl  vm.mmap_min_addr
> >>>> vm.mmap_min_addr = 65536
> >>>>
> >>>> usually prohibits it for good reason.
> >>>>
> >>>>> This has to do more with the ABI I had for pkvm and shared memory
> >>>>> implementations, in which you don't need to specify the userspace
> >>>>> address for memory in a guestmem memslot. The issue is there is no
> >>>>> obvious address to map it to. This would be the case in kvm:arm64 for
> >>>>> tracking paravirtualized time, which the userspace doesn't necessarily
> >>>>> need to interact with, but kvm does.
> >>>>
> >>>> So I understand correctly: userspace wouldn't have to mmap it because it
> >>>> is not interested in accessing it, but there is nothing speaking against
> >>>> mmaping it, at least in the first shot.
> >>>>
> >>>> I assume it would not be a private memslot (so far, my understanding is
> >>>> that internal memslots never have a guest_memfd attached).
> >>>> kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set
> >>>> on user-created memslots.
> >>>>
> >>>>>
> >>>>> That said, we could always have a userspace address dedicated to
> >>>>> mapping shared locations, and use that address when the necessity
> >>>>> arises. Or we could always require that memslots have a userspace
> >>>>> address, even if not used. I don't really have a strong preference.
> >>>>
> >>>> So, the simpler version where user space would simply mmap guest_memfd
> >>>> to provide the address via userspace_addr would at least work for the
> >>>> use case of paravirtualized time?
> >>>
> >>> fwiw, I'm currently prototyping something like this for x86 (although
> >>> not by putting the gmem address into userspace_addr, but by adding a new
> >>> field to memslots, so that memory attributes continue working), based on
> >>> what we talked about at the last guest_memfd sync meeting (the whole
> >>> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
> >>> story). So I guess if we're going down this route for x86, maybe it
> >>> makes sense to do the same on ARM, for consistency?
> >>>
> >>>> It would get rid of the immediate need for this patch and patch #4 to
> >>>> get it flying.
> >>>>
> >>>>
> >>>> One interesting question is: when would you want shared memory in
> >>>> guest_memfd and *not* provide it as part of the same memslot.
> >>>
> >>> In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite
> >>> far without giving KVM a way to internally access shared parts of gmem -
> >>> it's why I was probing Fuad for this simplified series, because
> >>> KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a
> >>> working non-CoCo VM on ARM (although I admittedly never looked at clocks
> >>> inside the guest - maybe that's one thing that breaks if KVM can't
> >>> access gmem. How to guest and host agree on the guest memory range
> >>> used to exchange paravirtual timekeeping information? Could that exchange
> >>> be intercepted in userspace, and set to shared via memory attributes (e.g.
> >>> placed outside gmem)? That's the route I'm going down the paravirtual
> >>> time on x86).
> >>
> >> For an idea of what it looks like on arm64, here's how kvmtool handles it:
> >> https://github.com/kvmtool/kvmtool/blob/master/arm/aarch64/pvtime.c
> >>
> >> Cheers,
> >> /fuad
> >
> > Thanks! In that example, kvmtool actually allocates a separate memslot for
> > the pvclock stuff, so I guess it's always possible to simply put it into
> > a non-gmem memslot, which indeed sidesteps this issue as you mention in
> > your reply to David :D
>
> Does that work on CC where all memory defaults to private first, and the
> VM explicitly has to opt into marking it shared first, or how exactly
> would the flow of operations be in the cases of the non-gmem ("good
> old") memslot?

We use a normal memslot, without the KVM_MEM_GUEST_MEMFD flag, and
consider that kind of slot to be shared by default.

Cheers,
/fuad

> --
> Cheers,
>
> David / dhildenb
>
Patrick Roy Jan. 23, 2025, 3:22 p.m. UTC | #12
On Thu, 2025-01-23 at 14:18 +0000, David Hildenbrand wrote:
>>>>
>>>> That said, we could always have a userspace address dedicated to
>>>> mapping shared locations, and use that address when the necessity
>>>> arises. Or we could always require that memslots have a userspace
>>>> address, even if not used. I don't really have a strong preference.
>>>
>>> So, the simpler version where user space would simply mmap guest_memfd
>>> to provide the address via userspace_addr would at least work for the
>>> use case of paravirtualized time?
>>
>> fwiw, I'm currently prototyping something like this for x86 (although
>> not by putting the gmem address into userspace_addr, but by adding a new
>> field to memslots, so that memory attributes continue working), based on
>> what we talked about at the last guest_memfd sync meeting (the whole
>> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
>> story).
> 
> Yes, I recall that discussion. Can you elaborate why the separate field
> is required to keep memory attributes working? (could it be sorted out
> differently, by reusing userspace_addr?).

The scenario I ran into was that within the same memslots, I wanted some
gfns to be backed by guest_memfd, and others by traditional memory, so
that KVM can GUP some parts of guest memory even if guest_memfd itself
is direct map removed.

It actually also has to do with paravirtual time, but on x86. Here, the
guest chooses where in guest memory the clock structure is placed via an
MSR write (so I can't a priori use a traditional memslot, like we can on
ARM).  KVM internally wants to GUP the hva that corresponds to the gfn
the guest chooses, but if the hva is in a mapping of direct map removed
gmem, that won't work. So what I did was just intercept the MSR write in
userspace, and clear KVM_MEMORY_ATTRIBUTES_PRIVATE for the gfn. But for
this, I need userspace_addr to not point to the guest_memfd hva.
Although maybe it'd be possible to instead reconfigure the memslots when
intercepting the MSR? Not sure where we stand on KVM_MEM_GUEST_MEMFD
memslots though.

But also conceptually, doesn't KVM_MEMORY_ATTRIBUTES_PRIVATE kinda loose
any meaning if userspace_addr also points towards gmem? E.g. no matter
what we set, we'd get gmem mapped into the guest.

> -- 
> Cheers,
> 
> David / dhildenb
> 

Best, 
Patrick
David Hildenbrand Jan. 24, 2025, 2:44 p.m. UTC | #13
On 23.01.25 16:22, Patrick Roy wrote:
> On Thu, 2025-01-23 at 14:18 +0000, David Hildenbrand wrote:
>>>>>
>>>>> That said, we could always have a userspace address dedicated to
>>>>> mapping shared locations, and use that address when the necessity
>>>>> arises. Or we could always require that memslots have a userspace
>>>>> address, even if not used. I don't really have a strong preference.
>>>>
>>>> So, the simpler version where user space would simply mmap guest_memfd
>>>> to provide the address via userspace_addr would at least work for the
>>>> use case of paravirtualized time?
>>>
>>> fwiw, I'm currently prototyping something like this for x86 (although
>>> not by putting the gmem address into userspace_addr, but by adding a new
>>> field to memslots, so that memory attributes continue working), based on
>>> what we talked about at the last guest_memfd sync meeting (the whole
>>> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
>>> story).
>>
>> Yes, I recall that discussion. Can you elaborate why the separate field
>> is required to keep memory attributes working? (could it be sorted out
>> differently, by reusing userspace_addr?).
> 
> The scenario I ran into was that within the same memslots, I wanted some
> gfns to be backed by guest_memfd, and others by traditional memory, so
> that KVM can GUP some parts of guest memory even if guest_memfd itself
> is direct map removed.

Just summarizing what we discussed yesterday:

GUP will not be allowed if the direct map was removed, and paravirt time 
on x86 uses GUP to access the guest page right now.

> 
> It actually also has to do with paravirtual time, but on x86. Here, the
> guest chooses where in guest memory the clock structure is placed via an
> MSR write (so I can't a priori use a traditional memslot, like we can on
> ARM).  KVM internally wants to GUP the hva that corresponds to the gfn
> the guest chooses, but if the hva is in a mapping of direct map removed
> gmem, that won't work.

And as discussed here, Sean raised that these hypervisor updates happen 
rarely.

In the first step, it might be good enough to just update using the 
user-space page table mappings (user access), to avoid messing with 
kmap/direct-map reinstalling.

With that in place, it looks like that user space could simply mmap 
guest_memfd and provide the mmaped area "ordinarily" as the "shared 
memory" part of the guest_memfd -- using userspace_address.

GUP would fail, bit paravirt time would not be using GUP.

So we could handle the guest_memfd mmap just like on other 
architectures, without the need for other memslot fields.

Please correct me if I misunderstood something :)
diff mbox series

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index de2c11dae231..ad9802012a3f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3094,21 +3094,93 @@  static int next_segment(unsigned long len, int offset)
 		return len;
 }
 
-/* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
-static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
-				 void *data, int offset, int len)
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+static int __kvm_read_guest_memfd_page(struct kvm *kvm,
+				       struct kvm_memory_slot *slot,
+				       gfn_t gfn, void *data, int offset,
+				       int len)
+{
+	struct page *page;
+	u64 pfn;
+	int r;
+
+	/*
+	 * Holds the folio lock until after checking whether it can be faulted
+	 * in, to avoid races with paths that change a folio's mappability.
+	 */
+	r = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &page, NULL);
+	if (r)
+		return r;
+
+	memcpy(data, page_address(page) + offset, len);
+	kvm_release_page_clean(page);
+
+	return r;
+}
+
+static int __kvm_write_guest_memfd_page(struct kvm *kvm,
+					struct kvm_memory_slot *slot,
+					gfn_t gfn, const void *data,
+					int offset, int len)
 {
+	struct page *page;
+	u64 pfn;
 	int r;
+
+	/*
+	 * Holds the folio lock until after checking whether it can be faulted
+	 * in, to avoid races with paths that change a folio's mappability.
+	 */
+	r = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &page, NULL);
+	if (r)
+		return r;
+
+	memcpy(page_address(page) + offset, data, len);
+	kvm_release_page_dirty(page);
+
+	return r;
+}
+#else
+static int __kvm_read_guest_memfd_page(struct kvm *kvm,
+				       struct kvm_memory_slot *slot,
+				       gfn_t gfn, void *data, int offset,
+				       int len)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
+
+static int __kvm_write_guest_memfd_page(struct kvm *kvm,
+					struct kvm_memory_slot *slot,
+					gfn_t gfn, const void *data,
+					int offset, int len)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
+/* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
+
+static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot,
+				 gfn_t gfn, void *data, int offset, int len)
+{
 	unsigned long addr;
 
 	if (WARN_ON_ONCE(offset + len > PAGE_SIZE))
 		return -EFAULT;
 
+	if (kvm_arch_private_mem_inplace(kvm) &&
+	    kvm_slot_can_be_private(slot) &&
+	    !slot->userspace_addr) {
+		return __kvm_read_guest_memfd_page(kvm, slot, gfn, data,
+						   offset, len);
+	}
+
 	addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
-	r = __copy_from_user(data, (void __user *)addr + offset, len);
-	if (r)
+	if (__copy_from_user(data, (void __user *)addr + offset, len))
 		return -EFAULT;
 	return 0;
 }
@@ -3118,7 +3190,7 @@  int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 {
 	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 
-	return __kvm_read_guest_page(slot, gfn, data, offset, len);
+	return __kvm_read_guest_page(kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
 
@@ -3127,7 +3199,7 @@  int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
 {
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
-	return __kvm_read_guest_page(slot, gfn, data, offset, len);
+	return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
 
@@ -3204,22 +3276,30 @@  EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
 
 /* Copy @len bytes from @data into guest memory at '(@gfn * PAGE_SIZE) + @offset' */
 static int __kvm_write_guest_page(struct kvm *kvm,
-				  struct kvm_memory_slot *memslot, gfn_t gfn,
-			          const void *data, int offset, int len)
+				  struct kvm_memory_slot *slot, gfn_t gfn,
+				  const void *data, int offset, int len)
 {
-	int r;
-	unsigned long addr;
-
 	if (WARN_ON_ONCE(offset + len > PAGE_SIZE))
 		return -EFAULT;
 
-	addr = gfn_to_hva_memslot(memslot, gfn);
-	if (kvm_is_error_hva(addr))
-		return -EFAULT;
-	r = __copy_to_user((void __user *)addr + offset, data, len);
-	if (r)
-		return -EFAULT;
-	mark_page_dirty_in_slot(kvm, memslot, gfn);
+	if (kvm_arch_private_mem_inplace(kvm) &&
+	    kvm_slot_can_be_private(slot) &&
+	    !slot->userspace_addr) {
+		int r = __kvm_write_guest_memfd_page(kvm, slot, gfn, data,
+						     offset, len);
+
+		if (r)
+			return r;
+	} else {
+		unsigned long addr = gfn_to_hva_memslot(slot, gfn);
+
+		if (kvm_is_error_hva(addr))
+			return -EFAULT;
+		if (__copy_to_user((void __user *)addr + offset, data, len))
+			return -EFAULT;
+	}
+
+	mark_page_dirty_in_slot(kvm, slot, gfn);
 	return 0;
 }