diff mbox

[v6,12/12] KVM: indicate readonly access fault

Message ID 5034D4E3.2080801@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Xiao Guangrong Aug. 22, 2012, 12:47 p.m. UTC
On 08/22/2012 08:06 PM, Avi Kivity wrote:
> On 08/21/2012 06:03 AM, Xiao Guangrong wrote:
>> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
>> caused by write access on readonly memslot
> 
> Please document this in chapter 5 of apic.txt.
> 

Okay, please review this one.

Subject: [PATCH v6 12/12] KVM: indicate readonly access fault

Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
caused by write access on readonly memslot

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 Documentation/virtual/kvm/api.txt |    8 +++++++-
 arch/x86/kvm/x86.c                |   23 ++++++++++++++---------
 include/linux/kvm.h               |    3 +++
 include/linux/kvm_host.h          |    1 +
 virt/kvm/kvm_main.c               |    3 +++
 5 files changed, 28 insertions(+), 10 deletions(-)

Comments

Avi Kivity Sept. 6, 2012, 2:09 p.m. UTC | #1
On 08/22/2012 03:47 PM, Xiao Guangrong wrote:
> On 08/22/2012 08:06 PM, Avi Kivity wrote:
>> On 08/21/2012 06:03 AM, Xiao Guangrong wrote:
>>> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
>>> caused by write access on readonly memslot
>> 
>> Please document this in chapter 5 of apic.txt.
>> 
> 
> Okay, please review this one.
> 
> Subject: [PATCH v6 12/12] KVM: indicate readonly access fault
> 
> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
> caused by write access on readonly memslot
> 

I'm not sure whether this indication can be trusted by userspace.  By
the time userspace gets to process this, the slot may no longer exist,
or it may be writable.

(in the same way an mmio exit might actually hit RAM)
Xiao Guangrong Sept. 7, 2012, 9:56 a.m. UTC | #2
On 09/06/2012 10:09 PM, Avi Kivity wrote:
> On 08/22/2012 03:47 PM, Xiao Guangrong wrote:
>> On 08/22/2012 08:06 PM, Avi Kivity wrote:
>>> On 08/21/2012 06:03 AM, Xiao Guangrong wrote:
>>>> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
>>>> caused by write access on readonly memslot
>>>
>>> Please document this in chapter 5 of apic.txt.
>>>
>>
>> Okay, please review this one.
>>
>> Subject: [PATCH v6 12/12] KVM: indicate readonly access fault
>>
>> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
>> caused by write access on readonly memslot
>>
> 
> I'm not sure whether this indication can be trusted by userspace.  By
> the time userspace gets to process this, the slot may no longer exist,
> or it may be writable.

The case of deleting memslot is ok, because userspace just skips this fault
if no readonly mem or no fault handler can be found.

Switching memslot from readonly to writable sounds strange, i agree with you
that this flag is untrusty under this case.

Marcelo, any comments?

> 
> (in the same way an mmio exit might actually hit RAM)

So, in the userspace, for the safe reason, we should walk all memslots not
just walking mmio handlers?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity Sept. 9, 2012, 1:46 p.m. UTC | #3
On 09/07/2012 12:56 PM, Xiao Guangrong wrote:
> On 09/06/2012 10:09 PM, Avi Kivity wrote:
>> On 08/22/2012 03:47 PM, Xiao Guangrong wrote:
>>> On 08/22/2012 08:06 PM, Avi Kivity wrote:
>>>> On 08/21/2012 06:03 AM, Xiao Guangrong wrote:
>>>>> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
>>>>> caused by write access on readonly memslot
>>>>
>>>> Please document this in chapter 5 of apic.txt.
>>>>
>>>
>>> Okay, please review this one.
>>>
>>> Subject: [PATCH v6 12/12] KVM: indicate readonly access fault
>>>
>>> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
>>> caused by write access on readonly memslot
>>>
>> 
>> I'm not sure whether this indication can be trusted by userspace.  By
>> the time userspace gets to process this, the slot may no longer exist,
>> or it may be writable.
> 
> The case of deleting memslot is ok, because userspace just skips this fault
> if no readonly mem or no fault handler can be found.
> 
> Switching memslot from readonly to writable sounds strange, i agree with you
> that this flag is untrusty under this case.

It's not so strange, see the PAM registers in the 440fx chipset.

The strategy for shadowing the BIOS is to set PAM for (read: pci, write:
RAM) (=a readonly slot in kvm), memcpy the BIOS onto itself, then switch
the PAM to (read: RAM, write: RAM) (=rw slot in kvm) or (read: RAM,
write: PCI) (=different readonly slot in kvm).

Of course that usually happens with just one cpu running so there would
be no confusion.  But in general the indication is racy.
Marcelo Tosatti Sept. 10, 2012, 10:31 p.m. UTC | #4
On Fri, Sep 07, 2012 at 05:56:39PM +0800, Xiao Guangrong wrote:
> On 09/06/2012 10:09 PM, Avi Kivity wrote:
> > On 08/22/2012 03:47 PM, Xiao Guangrong wrote:
> >> On 08/22/2012 08:06 PM, Avi Kivity wrote:
> >>> On 08/21/2012 06:03 AM, Xiao Guangrong wrote:
> >>>> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
> >>>> caused by write access on readonly memslot
> >>>
> >>> Please document this in chapter 5 of apic.txt.
> >>>
> >>
> >> Okay, please review this one.
> >>
> >> Subject: [PATCH v6 12/12] KVM: indicate readonly access fault
> >>
> >> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
> >> caused by write access on readonly memslot
> >>
> > 
> > I'm not sure whether this indication can be trusted by userspace.  By
> > the time userspace gets to process this, the slot may no longer exist,
> > or it may be writable.
> 
> The case of deleting memslot is ok, because userspace just skips this fault
> if no readonly mem or no fault handler can be found.
> 
> Switching memslot from readonly to writable sounds strange, i agree with you
> that this flag is untrusty under this case.
> 
> Marcelo, any comments?

The same can happen with slot deletion, for example. 

Userspace (which performed the modification which can result in faults
to non-existant/read-only/.../new-tag memslot), must handle the faults 
properly or avoid the possibility for reference to memslot information 
from the past.

I think its worthwhile to add a note about this in the API
documentation: "The user of this interface is responsible for handling 
references to stale memslot information, either by handling
exit notifications which reference stale memslot information or not
allowing these notifications to exist by stopping all vcpus in userspace
before performing modifications to the memslots map".

> > (in the same way an mmio exit might actually hit RAM)
> 
> So, in the userspace, for the safe reason, we should walk all memslots not
> just walking mmio handlers?
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity Sept. 11, 2012, 9:18 a.m. UTC | #5
On 09/11/2012 01:31 AM, Marcelo Tosatti wrote:
> On Fri, Sep 07, 2012 at 05:56:39PM +0800, Xiao Guangrong wrote:
>> On 09/06/2012 10:09 PM, Avi Kivity wrote:
>> > On 08/22/2012 03:47 PM, Xiao Guangrong wrote:
>> >> On 08/22/2012 08:06 PM, Avi Kivity wrote:
>> >>> On 08/21/2012 06:03 AM, Xiao Guangrong wrote:
>> >>>> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
>> >>>> caused by write access on readonly memslot
>> >>>
>> >>> Please document this in chapter 5 of apic.txt.
>> >>>
>> >>
>> >> Okay, please review this one.
>> >>
>> >> Subject: [PATCH v6 12/12] KVM: indicate readonly access fault
>> >>
>> >> Introduce write_readonly_mem in mmio-exit-info to indicate this exit is
>> >> caused by write access on readonly memslot
>> >>
>> > 
>> > I'm not sure whether this indication can be trusted by userspace.  By
>> > the time userspace gets to process this, the slot may no longer exist,
>> > or it may be writable.
>> 
>> The case of deleting memslot is ok, because userspace just skips this fault
>> if no readonly mem or no fault handler can be found.
>> 
>> Switching memslot from readonly to writable sounds strange, i agree with you
>> that this flag is untrusty under this case.
>> 
>> Marcelo, any comments?
> 
> The same can happen with slot deletion, for example. 
> 
> Userspace (which performed the modification which can result in faults
> to non-existant/read-only/.../new-tag memslot), must handle the faults 
> properly or avoid the possibility for reference to memslot information 
> from the past.
> 
> I think its worthwhile to add a note about this in the API
> documentation: "The user of this interface is responsible for handling 
> references to stale memslot information, either by handling
> exit notifications which reference stale memslot information or not
> allowing these notifications to exist by stopping all vcpus in userspace
> before performing modifications to the memslots map".

Or we can drop the new interface and rely on userspace to perform the
lookup under its own locking rules.

It's slow, but writes to ROM or ROM/device are rare anyway.
Marcelo Tosatti Sept. 11, 2012, 2:39 p.m. UTC | #6
On Tue, Sep 11, 2012 at 12:18:22PM +0300, Avi Kivity wrote:
> > The same can happen with slot deletion, for example. 
> > 
> > Userspace (which performed the modification which can result in faults
> > to non-existant/read-only/.../new-tag memslot), must handle the faults 
> > properly or avoid the possibility for reference to memslot information 
> > from the past.
> > 
> > I think its worthwhile to add a note about this in the API
> > documentation: "The user of this interface is responsible for handling 
> > references to stale memslot information, either by handling
> > exit notifications which reference stale memslot information or not
> > allowing these notifications to exist by stopping all vcpus in userspace
> > before performing modifications to the memslots map".
> 
> Or we can drop the new interface and rely on userspace to perform the
> lookup under its own locking rules.
> 
> It's slow, but writes to ROM or ROM/device are rare anyway.

Lookup what information? 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Tosatti Sept. 12, 2012, 3:27 p.m. UTC | #7
On Tue, Sep 11, 2012 at 11:39:01AM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 11, 2012 at 12:18:22PM +0300, Avi Kivity wrote:
> > > The same can happen with slot deletion, for example. 
> > > 
> > > Userspace (which performed the modification which can result in faults
> > > to non-existant/read-only/.../new-tag memslot), must handle the faults 
> > > properly or avoid the possibility for reference to memslot information 
> > > from the past.
> > > 
> > > I think its worthwhile to add a note about this in the API
> > > documentation: "The user of this interface is responsible for handling 
> > > references to stale memslot information, either by handling
> > > exit notifications which reference stale memslot information or not
> > > allowing these notifications to exist by stopping all vcpus in userspace
> > > before performing modifications to the memslots map".
> > 
> > Or we can drop the new interface and rely on userspace to perform the
> > lookup under its own locking rules.
> > 
> > It's slow, but writes to ROM or ROM/device are rare anyway.
> 
> Lookup what information? 

Ping? 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity Sept. 12, 2012, 3:34 p.m. UTC | #8
On 09/11/2012 05:39 PM, Marcelo Tosatti wrote:
> On Tue, Sep 11, 2012 at 12:18:22PM +0300, Avi Kivity wrote:
>> > The same can happen with slot deletion, for example. 
>> > 
>> > Userspace (which performed the modification which can result in faults
>> > to non-existant/read-only/.../new-tag memslot), must handle the faults 
>> > properly or avoid the possibility for reference to memslot information 
>> > from the past.
>> > 
>> > I think its worthwhile to add a note about this in the API
>> > documentation: "The user of this interface is responsible for handling 
>> > references to stale memslot information, either by handling
>> > exit notifications which reference stale memslot information or not
>> > allowing these notifications to exist by stopping all vcpus in userspace
>> > before performing modifications to the memslots map".
>> 
>> Or we can drop the new interface and rely on userspace to perform the
>> lookup under its own locking rules.
>> 
>> It's slow, but writes to ROM or ROM/device are rare anyway.
> 
> Lookup what information? 

Where to dispatch the write.

In fact userspace has to do that anyway if it's a ROM/device.  There's
no way userspace can guess that unless we pass in the slot number (which
isn't synchronized with anything).
Marcelo Tosatti Sept. 12, 2012, 3:44 p.m. UTC | #9
On Wed, Sep 12, 2012 at 06:34:33PM +0300, Avi Kivity wrote:
> On 09/11/2012 05:39 PM, Marcelo Tosatti wrote:
> > On Tue, Sep 11, 2012 at 12:18:22PM +0300, Avi Kivity wrote:
> >> > The same can happen with slot deletion, for example. 
> >> > 
> >> > Userspace (which performed the modification which can result in faults
> >> > to non-existant/read-only/.../new-tag memslot), must handle the faults 
> >> > properly or avoid the possibility for reference to memslot information 
> >> > from the past.
> >> > 
> >> > I think its worthwhile to add a note about this in the API
> >> > documentation: "The user of this interface is responsible for handling 
> >> > references to stale memslot information, either by handling
> >> > exit notifications which reference stale memslot information or not
> >> > allowing these notifications to exist by stopping all vcpus in userspace
> >> > before performing modifications to the memslots map".
> >> 
> >> Or we can drop the new interface and rely on userspace to perform the
> >> lookup under its own locking rules.
> >> 
> >> It's slow, but writes to ROM or ROM/device are rare anyway.
> > 
> > Lookup what information? 
> 
> Where to dispatch the write.
> 
> In fact userspace has to do that anyway if it's a ROM/device.  There's
> no way userspace can guess that unless we pass in the slot number (which
> isn't synchronized with anything).

Alright, do you prefer the details of this exit to be worked out later,
when necessary, then? 

That is, not merge this particular patch of the series? 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity Sept. 12, 2012, 3:55 p.m. UTC | #10
On 09/12/2012 06:44 PM, Marcelo Tosatti wrote:
> On Wed, Sep 12, 2012 at 06:34:33PM +0300, Avi Kivity wrote:
>> On 09/11/2012 05:39 PM, Marcelo Tosatti wrote:
>> > On Tue, Sep 11, 2012 at 12:18:22PM +0300, Avi Kivity wrote:
>> >> > The same can happen with slot deletion, for example. 
>> >> > 
>> >> > Userspace (which performed the modification which can result in faults
>> >> > to non-existant/read-only/.../new-tag memslot), must handle the faults 
>> >> > properly or avoid the possibility for reference to memslot information 
>> >> > from the past.
>> >> > 
>> >> > I think its worthwhile to add a note about this in the API
>> >> > documentation: "The user of this interface is responsible for handling 
>> >> > references to stale memslot information, either by handling
>> >> > exit notifications which reference stale memslot information or not
>> >> > allowing these notifications to exist by stopping all vcpus in userspace
>> >> > before performing modifications to the memslots map".
>> >> 
>> >> Or we can drop the new interface and rely on userspace to perform the
>> >> lookup under its own locking rules.
>> >> 
>> >> It's slow, but writes to ROM or ROM/device are rare anyway.
>> > 
>> > Lookup what information? 
>> 
>> Where to dispatch the write.
>> 
>> In fact userspace has to do that anyway if it's a ROM/device.  There's
>> no way userspace can guess that unless we pass in the slot number (which
>> isn't synchronized with anything).
> 
> Alright, do you prefer the details of this exit to be worked out later,
> when necessary, then? 
> 
> That is, not merge this particular patch of the series? 
> 

Right.  I think it is unneeded.
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b91bfd4..baf477e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2089,12 +2089,18 @@  Unused.
 			__u8  data[8];
 			__u32 len;
 			__u8  is_write;
+#ifdef __KVM_HAVE_READONLY_MEM
+			__u8 write_readonly_mem;
+#endif
 		} mmio;

 If exit_reason is KVM_EXIT_MMIO, then the vcpu has
 executed a memory-mapped I/O instruction which could not be satisfied
 by kvm.  The 'data' member contains the written data if 'is_write' is
-true, and should be filled by application code otherwise.
+true, and should be filled by application code otherwise. When the
+KVM_CAP_READONLY_MEM capability, the 'write_readonly_mem' member indicates
+whether the exit is caused by write access on the readonly memslot
+(see KVM_MEM_READONLY in 4.35 KVM_SET_USER_MEMORY_REGION).

 NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding
 operations are complete (and guest state is consistent) only after userspace
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bbf41..6d7fe4a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3710,9 +3710,10 @@  int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,

 	ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
 	if (ret < 0)
-		return 0;
+		return ret;
+
 	kvm_mmu_pte_write(vcpu, gpa, val, bytes);
-	return 1;
+	return 0;
 }

 struct read_write_emulator_ops {
@@ -3742,7 +3743,7 @@  static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes)
 static int read_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
 			void *val, int bytes)
 {
-	return !kvm_read_guest(vcpu->kvm, gpa, val, bytes);
+	return kvm_read_guest(vcpu->kvm, gpa, val, bytes);
 }

 static int write_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -3807,7 +3808,8 @@  static int emulator_read_write_onepage(unsigned long addr, void *val,
 	if (ret)
 		goto mmio;

-	if (ops->read_write_emulate(vcpu, gpa, val, bytes))
+	ret = ops->read_write_emulate(vcpu, gpa, val, bytes);
+	if (!ret)
 		return X86EMUL_CONTINUE;

 mmio:
@@ -3829,6 +3831,7 @@  mmio:
 		frag->gpa = gpa;
 		frag->data = val;
 		frag->len = now;
+		frag->write_readonly_mem = (ret == -EPERM);

 		gpa += now;
 		val += now;
@@ -3842,8 +3845,8 @@  int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
 			struct x86_exception *exception,
 			struct read_write_emulator_ops *ops)
 {
+	struct kvm_mmio_fragment *frag;
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
-	gpa_t gpa;
 	int rc;

 	if (ops->read_write_prepare &&
@@ -3875,17 +3878,18 @@  int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
 	if (!vcpu->mmio_nr_fragments)
 		return rc;

-	gpa = vcpu->mmio_fragments[0].gpa;
+	frag = &vcpu->mmio_fragments[0];

 	vcpu->mmio_needed = 1;
 	vcpu->mmio_cur_fragment = 0;

-	vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
+	vcpu->run->mmio.len = frag->len;
 	vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
 	vcpu->run->exit_reason = KVM_EXIT_MMIO;
-	vcpu->run->mmio.phys_addr = gpa;
+	vcpu->run->mmio.phys_addr = frag->gpa;
+	vcpu->run->mmio.write_readonly_mem = frag->write_readonly_mem;

-	return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
+	return ops->read_write_exit_mmio(vcpu, frag->gpa, val, bytes);
 }

 static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
@@ -5525,6 +5529,7 @@  static int complete_mmio(struct kvm_vcpu *vcpu)
 		++frag;
 		run->exit_reason = KVM_EXIT_MMIO;
 		run->mmio.phys_addr = frag->gpa;
+		vcpu->run->mmio.write_readonly_mem = frag->write_readonly_mem;
 		if (vcpu->mmio_is_write)
 			memcpy(run->mmio.data, frag->data, frag->len);
 		run->mmio.len = frag->len;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d808694..9d8002f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -226,6 +226,9 @@  struct kvm_run {
 			__u8  data[8];
 			__u32 len;
 			__u8  is_write;
+#ifdef __KVM_HAVE_READONLY_MEM
+			__u8 write_readonly_mem;
+#endif
 		} mmio;
 		/* KVM_EXIT_HYPERCALL */
 		struct {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5972c98..c37b225 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -189,6 +189,7 @@  struct kvm_mmio_fragment {
 	gpa_t gpa;
 	void *data;
 	unsigned len;
+	bool write_readonly_mem;
 };

 struct kvm_vcpu {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3416f8a..0c7def7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1456,6 +1456,9 @@  int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
 	unsigned long addr;

 	addr = gfn_to_hva(kvm, gfn);
+	if (addr == KVM_HVA_ERR_RO_BAD)
+		return -EPERM;
+
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
 	r = __copy_to_user((void __user *)addr + offset, data, len);