[5/6,Resend] Vhost-pci RFC: Future Security Enhancement

Message ID	1464509494-159509-6-git-send-email-wei.w.wang@intel.com (mailing list archive)
State	New, archived
Headers
Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
From: Wei Wang <wei.w.wang@intel.com>
To: kvm@vger.kernel.org, qemu-devel@nongnu.org, virtio-comment@lists.oasis-open.org, mst@redhat.com, stefanha@redhat.com, pbonzini@redhat.com
Date: Sun, 29 May 2016 16:11:33 +0800
Message-Id: <1464509494-159509-6-git-send-email-wei.w.wang@intel.com>
In-Reply-To: <1464509494-159509-1-git-send-email-wei.w.wang@intel.com>
References: <1464509494-159509-1-git-send-email-wei.w.wang@intel.com>
Subject: [Qemu-devel] [PATCH 5/6 Resend] Vhost-pci RFC: Future Security Enhancement
Precedence: list
Cc: Wei Wang <wei.w.wang@intel.com>
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Comments

Jan Kiszka May 30, 2016, 6:23 a.m. UTC | #1

On 2016-05-29 10:11, Wei Wang wrote:
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>  FutureWorks | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>  create mode 100644 FutureWorks
> 
> diff --git a/FutureWorks b/FutureWorks
> new file mode 100644
> index 0000000..210edcd
> --- /dev/null
> +++ b/FutureWorks
> @@ -0,0 +1,21 @@
> +The vhost-pci design is currently suitable for a group of VMs who trust each
> +other. To extend it to a more general use case, two security features can be
> +added in the future.

Sounds a bit like security is just "nice to have" in the foreseen use
cases of this mechanism. Is that really true?

> +
> +1 vIOMMU
> +vIOMMU provides the driver VM with the ability to restrict the device VM to
> +transiently access a specified portion of its memory. The vhost-pci design
> +proposed in this RFC can be extended to access the driver VM's memory with
> +vIOMMU. Precisely, the vIOMMU engine in the driver VM configures access
> +permissions (R/W) for the vhost-pci device to access its memory. More details
> +can be found at https://wiki.opnfv.org/display/kvm/Vm2vm+Mst and
> +https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg03993.html

Do you have performance estimates on this approach already?

One challenge should be how to let the VMs reuse existing buffer
mappings so that the vIOMMU isn't continuously reprogrammed - which is
likely not very efficient.

The other is how to hand over packets/buffers in a chain of multiple
VMs. Ideally, there is already a hand-over from sender to the first
receiver so that the sender can no longer mess with the packet after the
receiver started processing it. However, that will work against efficiency.

Essentially, it's the old IPC question of remap vs. copy here. The rest
is "just" interfaces to exploit this elegantly.

> +
> +2 eptp switching
> +The idea of eptp swithing allows a vhost-pci device driver to access the mapped
> +driver VM's memory in an alternative view, where only a piece of trusted code
> +can access the driver VM's memory. More details can be found at
> +http://events.linuxfoundation.org/sites/events/files/slides/
> +Jun_Nakajima_NFV_KVM%202015_final.pdf

As we learned a while back, this one is not really secure. Any updates
on if/how this is going to be fixed?

Jan

Wang, Wei W May 31, 2016, 8 a.m. UTC | #2

On Mon 5/30/2016 2:24 PM, Jan Kiszka Wrote:
> On 2016-05-29 10:11, Wei Wang wrote:
> > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > ---
> >  FutureWorks | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> >  create mode 100644 FutureWorks
> >
> > diff --git a/FutureWorks b/FutureWorks new file mode 100644 index
> > 0000000..210edcd
> > --- /dev/null
> > +++ b/FutureWorks
> > @@ -0,0 +1,21 @@
> > +The vhost-pci design is currently suitable for a group of VMs who
> > +trust each other. To extend it to a more general use case, two
> > +security features can be added in the future.
> 
> Sounds a bit like security is just "nice to have" in the foreseen use cases of this
> mechanism. Is that really true?

Not really. It's usually a tradeoff between performance and security, so I think having security doesn't always mean "Nice" :-)

Instead of proposing a compromise solution, we can actually offer two independent solutions: a performance-oriented vhost-pci (let's call it fast vhost-pci) and a security-oriented vhost-pci (say, secure vhost-pci). It's up to the users to choose which one to use according to their use cases. So, the secure version of vhost-pci can be viewed as another option for users (not a replacement for this proposal).

Here is an example use case:
There are two groups of VMs running on the same host machine. VMs in Group A, which communicate with each other frequently, can use the fast vhost-pci mechanism. In the special case where a VM from Group A needs to communicate with a VM from Group B, each of them should set up a new NIC and specify the use of the secure vhost-pci.
Since the secure vhost-pci is only on our future plan, the traditional vhost-user can be an option for that inter-group communication for now.
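
The selection rule in this example can be written down as a small policy function. This is only a sketch of the rule as described above, not part of the RFC: the `Transport` enum, the `choose_transport` function, and the `secure_available` flag are hypothetical names introduced for illustration.

```python
from enum import Enum


class Transport(Enum):
    FAST_VHOST_PCI = "fast vhost-pci"
    SECURE_VHOST_PCI = "secure vhost-pci"
    VHOST_USER = "vhost-user"


def choose_transport(group_a: str, group_b: str,
                     secure_available: bool = False) -> Transport:
    """Pick a transport for a link between two VMs from their trust groups.

    Hypothetical policy: VMs in the same group trust each other and use the
    fast variant; cross-group links use the secure variant once it exists,
    and fall back to vhost-user until then.
    """
    if group_a == group_b:
        return Transport.FAST_VHOST_PCI
    if secure_available:
        return Transport.SECURE_VHOST_PCI
    return Transport.VHOST_USER
```

With `secure_available` left at its default, a Group A to Group B link resolves to vhost-user, which matches the interim option mentioned above.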

> > +
> > +1 vIOMMU
> > +vIOMMU provides the driver VM with the ability to restrict the device
> > +VM to transiently access a specified portion of its memory. The
> > +vhost-pci design proposed in this RFC can be extended to access the
> > +driver VM's memory with vIOMMU. Precisely, the vIOMMU engine in the
> > +driver VM configures access permissions (R/W) for the vhost-pci
> > +device to access its memory. More details can be found at
> > +https://wiki.opnfv.org/display/kvm/Vm2vm+Mst and
> > +https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg03993.html
> 
> Do you have performance estimates on this approach already?
> 
> One challenge should be how to let the VMs reuse existing buffer mappings so
> that the vIOMMU isn't continuously reprogrammed - which is likely not very
> efficient.

I think one option here is to reserve a large block of guest physical address (GPA) space (like a memory pool). The buffers are allocated from and freed to the pool.

Another one would be using batching. For example, set up a batch of 32 buffers each time (just give the starting guest physical address; the 32 buffers are contiguous in guest physical memory).
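
To make the two options concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not code from the RFC or from QEMU: `ToyIommu`, `BufferPool`, and `map_batch` are hypothetical names, and the "vIOMMU" only counts mapping operations so that the saving is visible.

```python
class ToyIommu:
    """Stand-in for a vIOMMU that only records and counts mapped ranges."""

    def __init__(self):
        self.map_ops = 0
        self.ranges = []  # list of (start_gpa, length)

    def map_range(self, start_gpa, length):
        self.map_ops += 1
        self.ranges.append((start_gpa, length))

    def is_mapped(self, gpa, length):
        return any(start <= gpa and gpa + length <= start + size
                   for start, size in self.ranges)


class BufferPool:
    """Option 1: reserve one large GPA block and map it a single time."""

    def __init__(self, iommu, base_gpa, buf_size, count):
        self.buf_size = buf_size
        iommu.map_range(base_gpa, buf_size * count)  # the only map operation
        self._free = [base_gpa + i * buf_size for i in range(count)]

    def alloc(self):
        return self._free.pop()

    def release(self, gpa):
        self._free.append(gpa)


def map_batch(iommu, start_gpa, buf_size, batch=32):
    """Option 2: map `batch` contiguous buffers with one map operation."""
    iommu.map_range(start_gpa, buf_size * batch)
    return [start_gpa + i * buf_size for i in range(batch)]
```

In this model, allocating and releasing buffers from the pool never touches the vIOMMU again, and a batch costs one mapping operation instead of 32.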

> The other is how to hand over packets/buffers in a chain of multiple VMs. Ideally,
> there is already a hand-over from sender to the first receiver so that the sender
> can no longer mess with the packet after the receiver started processing it.
> However, that will work against efficiency.
> 
> Essentially, it's the old IPC question of remap vs. copy here. The rest is "just"
> interfaces to exploit this elegantly.

There are several ways to do a remapping. The remapping we use here has the device VM map the driver VM's entire memory; that is, the driver VM's memory is completely shared with the device VM.

If I understand that old remapping-based IPC problem correctly, this kind of remapping requires a high degree of coordination between the two parties (i.e. the device VM and the driver VM). I think the virtqueue ("virtq") is exactly that coordination mechanism here: the device VM grabs a buffer from the available ring, fills it, and puts the filled buffer on the used ring, so I don't think the sender would mess with the packet after the receiver has started to process it.
Please point out if I didn't get your point correctly. Thanks.
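
The available/used ring hand-off described above can be sketched as follows. This is a deliberately simplified model with hypothetical names (`ToyVirtqueue` and its methods): a real virtqueue uses a descriptor table and ring indices rather than Python deques. Note that the model captures only the cooperative protocol; nothing in it prevents the driver side from touching a buffer it has already posted.

```python
from collections import deque


class ToyVirtqueue:
    """Simplified available/used ring pair shared by a driver and a device."""

    def __init__(self):
        self.avail = deque()   # buffer ids posted by the driver
        self.used = deque()    # (buffer id, bytes written) returned by the device
        self.buffers = {}      # buffer id -> shared memory, visible to both sides

    def driver_add(self, buf_id, size):
        """Driver posts an empty buffer on the available ring."""
        self.buffers[buf_id] = bytearray(size)
        self.avail.append(buf_id)

    def device_process(self, payload):
        """Device takes one available buffer, fills it, and marks it used."""
        if not self.avail:
            return None
        buf_id = self.avail.popleft()
        buf = self.buffers[buf_id]
        n = min(len(payload), len(buf))
        buf[:n] = payload[:n]
        self.used.append((buf_id, n))
        return buf_id

    def driver_reap(self):
        """Driver collects one filled buffer from the used ring."""
        if not self.used:
            return None
        buf_id, n = self.used.popleft()
        return buf_id, bytes(self.buffers[buf_id][:n])
```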

> 
> > +
> > +2 eptp switching
> > +The idea of eptp swithing allows a vhost-pci device driver to access
> > +the mapped driver VM's memory in an alternative view, where only a
> > +piece of trusted code can access the driver VM's memory. More details
> > +can be found at
> > +http://events.linuxfoundation.org/sites/events/files/slides/
> > +Jun_Nakajima_NFV_KVM%202015_final.pdf
> 
> As we learned a while back, this one is not really secure. Any updates on if/how
> this is going to be fixed?
> 

Right, that is why we described it as a protection mechanism. However, one option we are trying is to fix the related holes in hardware. We can give updates once it's ready.

Best,
Wei

Jan Kiszka June 2, 2016, 9:27 a.m. UTC | #3

On 2016-05-31 10:00, Wang, Wei W wrote:
> On Mon 5/30/2016 2:24 PM, Jan Kiszka Wrote:
>> On 2016-05-29 10:11, Wei Wang wrote:
>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>> ---
>>>  FutureWorks | 21 +++++++++++++++++++++
>>>  1 file changed, 21 insertions(+)
>>>  create mode 100644 FutureWorks
>>>
>>> diff --git a/FutureWorks b/FutureWorks new file mode 100644 index
>>> 0000000..210edcd
>>> --- /dev/null
>>> +++ b/FutureWorks
>>> @@ -0,0 +1,21 @@
>>> +The vhost-pci design is currently suitable for a group of VMs who
>>> +trust each other. To extend it to a more general use case, two
>>> +security features can be added in the future.
>>
>> Sounds a bit like security is just "nice to have" in the foreseen use cases of this
>> mechanism. Is that really true?
> 
> Not really. It's usually a tradeoff between performance and security, so I think having security doesn't always mean "Nice" :-)

I don't disagree. I'm rather wondering if the variant without isolation
has valid use-cases at all.

> 
> Instead of proposing a compromised solution, we can actually offer two independent solutions, performance oriented vhost-pci (let's call it fast vhost-pci) and security oriented vhost-pci (say, secure vhost-pci). It's up to the users to choose which one to use according to their use cases. So, the secured version of vhost-pci can be viewed as another option for users (not a replacement of this proposal).
> 
> Here is a use example:
> There are two groups of VMs running on the same host machine. The frequent inter-VM communication between VMs in Group A can choose the fast vhost-pci mechanism. In a special case that a VM from Group A needs to communicate with a VM from Group B, they should set up a new NIC each and specify the use of the secure vhost-pci. 
> Since the secure vhost-pci is on our future plan, the traditional vhost-user can be an option for that inter-Group communication currently.
> 
>>> +
>>> +1 vIOMMU
>>> +vIOMMU provides the driver VM with the ability to restrict the device
>>> +VM to transiently access a specified portion of its memory. The
>>> +vhost-pci design proposed in this RFC can be extended to access the
>>> +driver VM's memory with vIOMMU. Precisely, the vIOMMU engine in the
>>> +driver VM configures access permissions (R/W) for the vhost-pci
>>> +device to access its memory. More details can be found at
>>> +https://wiki.opnfv.org/display/kvm/Vm2vm+Mst and
>>> +https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg03993.html
>>
>> Do you have performance estimates on this approach already?
>>
>> One challenge should be how to let the VMs reuse existing buffer mappings so
>> that the vIOMMU isn't continuously reprogrammed - which is likely not very
>> efficient.
> 
> I think one option here is to reserve a large block of GPA area (like a memory pool). The buffers are allocated from and freed to the pool.

That's basically communication via shared memory regions (like ivshmem).
We'll do this for safety and security critical setups where we need to
keep the hypervisor complexity minimal (we cannot afford a vIOMMU
implementation, e.g.). Possibly, there is some room for sharing solution
details on the guest side here.

> 
> Another one would be using batching. For example, set up a batch of 32 buffers (just give the starting guest physical address, and the 32 buffers are guest-physically continuous) each time.
> 
>> The other is how to hand over packets/buffers in a chain of multiple VMs. Ideally,
>> there is already a hand-over from sender to the first receiver so that the sender
>> can no longer mess with the packet after the receiver started processing it.
>> However, that will work against efficiency.
>>
>> Essentially, it's the old IPC question of remap vs. copy here. The rest is "just"
>> interfaces to exploit this elegantly.
> 
> There are several ways to do a remapping. The remapping we are using here is to have the entire driver VM's memory mapped by the device VM, that is, the driver VM's memory is completely shared with the device VM. 

I know - that's what I would call "without isolation".

> 
> If I understand that old remapping based IPC problem correctly, this kind of remapping requires a high degree of coordination between the two parts (i.e. the device VM and the driver VM). I think "virtq" is right the coordination mechanism here - the device VM grabs and fills a buffer from the available ring and puts the filled buffer to the used ring,  so I don't think the sender would mess with the packet after the receiver started to process it.
> Please point out if I didn't get your point correctly. Thanks.

If you look at kdbus, e.g., they have some threshold value for handing
over messages from sender to receiver via changes to the page tables
(removal from sender side, addition to reader side - they don't rely on
both sides being "nice" with each other) vs. simply copying the data
between both address spaces. The costly paging changes - and it doesn't
matter if they affect a process or a VM - apparently only pay off if the
data blocks are large enough. A typical network packet is way below that
threshold (a few 100K on x86 IIRC).
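
The remap-versus-copy trade-off can be expressed as a simple cost comparison. The sketch below is an illustration only: `choose_handover` and its parameters are hypothetical names, and the cost figures in the usage note are made-up placeholders, not measurements of kdbus or any hypervisor.

```python
def copy_cost_ns(size_bytes, copy_ns_per_byte):
    """Cost of copying the payload between the two address spaces."""
    return size_bytes * copy_ns_per_byte


def choose_handover(size_bytes, remap_cost_ns, copy_ns_per_byte):
    """Return "remap" when the fixed paging cost is cheaper than a copy.

    The break-even size is remap_cost_ns / copy_ns_per_byte; below it,
    copying wins.
    """
    if copy_cost_ns(size_bytes, copy_ns_per_byte) > remap_cost_ns:
        return "remap"
    return "copy"
```

With placeholder costs of 100000 ns per remap and 0.5 ns per copied byte, the break-even point is 200000 bytes, so a 1500-byte packet is copied while a 1 MiB block is remapped.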

> 
>>
>>> +
>>> +2 eptp switching
>>> +The idea of eptp swithing allows a vhost-pci device driver to access
>>> +the mapped driver VM's memory in an alternative view, where only a
>>> +piece of trusted code can access the driver VM's memory. More details
>>> +can be found at
>>> +http://events.linuxfoundation.org/sites/events/files/slides/
>>> +Jun_Nakajima_NFV_KVM%202015_final.pdf
>>
>> As we learned a while back, this one is not really secure. Any updates on if/how
>> this is going to be fixed?
>>
> 
> Right, that is why we claimed it as a protection mechanism. However, one option we are trying is to fix the related holes from the hardware. We can give updates once it's ready.

Thanks, looking forward.

Jan

Wang, Wei W June 3, 2016, 5:54 a.m. UTC | #4

On Thu 6/2/2016 5:27 PM, Jan Kiszka wrote:
> On 2016-05-31 10:00, Wang, Wei W wrote:
> > On Mon 5/30/2016 2:24 PM, Jan Kiszka Wrote:
> >> On 2016-05-29 10:11, Wei Wang wrote:
> >>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> >>> ---
> >>>  FutureWorks | 21 +++++++++++++++++++++
> >>>  1 file changed, 21 insertions(+)
> >>>  create mode 100644 FutureWorks
> >>>
> >>> diff --git a/FutureWorks b/FutureWorks new file mode 100644 index
> >>> 0000000..210edcd
> >>> --- /dev/null
> >>> +++ b/FutureWorks
> >>> @@ -0,0 +1,21 @@
> >>> +The vhost-pci design is currently suitable for a group of VMs who
> >>> +trust each other. To extend it to a more general use case, two
> >>> +security features can be added in the future.
> >>
> >> Sounds a bit like security is just "nice to have" in the foreseen use
> >> cases of this mechanism. Is that really true?
> >
> > Not really. It's usually a tradeoff between performance and security,
> > so I think having security doesn't always mean "Nice" :-)
> 
> I don't disagree. I'm rather wondering if the variant without isolation has valid
> use-cases at all.


I think one example use case is Network Function Virtualization, where a group of
Network Function VMs are chained together. AFAIK, performance matters more to them
than isolation.


> > Instead of proposing a compromised solution, we can actually offer two
> independent solutions, performance oriented vhost-pci (let's call it fast vhost-pci)
> and security oriented vhost-pci (say, secure vhost-pci). It's up to the users to
> choose which one to use according to their use cases. So, the secured version of
> vhost-pci can be viewed as another option for users (not a replacement of this
> proposal).
> >
> > Here is a use example:
> > There are two groups of VMs running on the same host machine. The frequent
> inter-VM communication between VMs in Group A can choose the fast vhost-pci
> mechanism. In a special case that a VM from Group A needs to communicate
> with a VM from Group B, they should set up a new NIC each and specify the use
> of the secure vhost-pci.
> > Since the secure vhost-pci is on our future plan, the traditional vhost-user can
> be an option for that inter-Group communication currently.
> >
> >>> +
> >>> +1 vIOMMU
> >>> +vIOMMU provides the driver VM with the ability to restrict the
> >>> +device VM to transiently access a specified portion of its memory.
> >>> +The vhost-pci design proposed in this RFC can be extended to access
> >>> +the driver VM's memory with vIOMMU. Precisely, the vIOMMU engine in
> >>> +the driver VM configures access permissions (R/W) for the vhost-pci
> >>> +device to access its memory. More details can be found at
> >>> +https://wiki.opnfv.org/display/kvm/Vm2vm+Mst and
> >>> +https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg03993.html
> >>
> >> Do you have performance estimates on this approach already?
> >>
> >> One challenge should be how to let the VMs reuse existing buffer
> >> mappings so that the vIOMMU isn't continuously reprogrammed - which
> >> is likely not very efficient.
> >
> > I think one option here is to reserve a large block of GPA area (like a memory
> pool). The buffers are allocated from and freed to the pool.
> 
> That's basically communication via shared memory regions (like ivshmem).
> We'll do this for safety and security critical setups where we need to keep the
> hypervisor complexity minimal (we cannot afford a vIOMMU implementation,
> e.g.). Possibly, there is some room for sharing solution details on the guest side
> here.

I think people from Red Hat have started digging into the guest-side vIOMMU.

> >
> > Another one would be using batching. For example, set up a batch of 32
> buffers (just give the starting guest physical address, and the 32 buffers are
> guest-physically continuous) each time.
> >
> >> The other is how to hand over packets/buffers in a chain of multiple
> >> VMs. Ideally, there is already a hand-over from sender to the first
> >> receiver so that the sender can no longer mess with the packet after the
> receiver started processing it.
> >> However, that will work against efficiency.
> >>
> >> Essentially, it's the old IPC question of remap vs. copy here. The rest is "just"
> >> interfaces to exploit this elegantly.
> >
> > There are several ways to do a remapping. The remapping we are using here is
> to have the entire driver VM's memory mapped by the device VM, that is, the
> driver VM's memory is completely shared with the device VM.
> 
> I know - that's what I would call "without isolation".
> 
> >
> > If I understand that old remapping based IPC problem correctly, this kind of
> remapping requires a high degree of coordination between the two parts (i.e.
> the device VM and the driver VM). I think "virtq" is right the coordination
> mechanism here - the device VM grabs and fills a buffer from the available ring
> and puts the filled buffer to the used ring,  so I don't think the sender would
> mess with the packet after the receiver started to process it.
> > Please point out if I didn't get your point correctly. Thanks.
> 
> If you look at kdbus, e.g., they have some threshold value for handing over
> messages from sender to receiver via changes to the page tables (removal from
> sender side, addition to reader side - they don't rely on both sides being "nice"
> with each other) vs. simply copying the data between both address spaces. The
> costly paging changes - and it doesn't matter if they affect a process or a VM -
> apparently only pay off if the data blocks are large enough. A typical network
> packet is way below that threshold (a few 100K on x86 IIRC).

I think it's different. That method is like page flipping, which frequently modifies
both the sender's and the receiver's page tables. It is costly because frequently maintaining
complete TLB consistency across all the CPUs is expensive. I think that's why they
use a threshold to avoid TLB thrashing.

The mapping setup in vhost-pci is a one-time thing (we can ignore the memory hotplug
cases in this discussion, as they happen relatively rarely). We don't need such frequent
page table modifications to transfer data, so I think we also don't have that limitation.
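
The difference can be shown by counting page table updates in each scheme. This is a back-of-the-envelope model with hypothetical function names; it assumes one unmap plus one map per flipped page and ignores TLB shootdown costs and memory hotplug.

```python
def updates_page_flipping(packets, pages_per_packet=1):
    """Each hand-over unmaps the pages from the sender and maps them
    into the receiver, so the count grows with traffic."""
    return packets * pages_per_packet * 2


def updates_one_time_mapping(packets, driver_vm_pages):
    """The driver VM's memory is mapped once at setup; the count does
    not depend on how many packets are transferred afterwards."""
    return driver_vm_pages
```

In this model, the page-flipping count scales linearly with the number of packets, while the one-time mapping count stays fixed at the setup cost.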


Best,
Wei
> 
> >
> >>
> >>> +
> >>> +2 eptp switching
> >>> +The idea of eptp swithing allows a vhost-pci device driver to
> >>> +access the mapped driver VM's memory in an alternative view, where
> >>> +only a piece of trusted code can access the driver VM's memory.
> >>> +More details can be found at
> >>> +http://events.linuxfoundation.org/sites/events/files/slides/
> >>> +Jun_Nakajima_NFV_KVM%202015_final.pdf
> >>
> >> As we learned a while back, this one is not really secure. Any
> >> updates on if/how this is going to be fixed?
> >>
> >
> > Right, that is why we claimed it as a protection mechanism. However, one
> option we are trying is to fix the related holes from the hardware. We can give
> updates once it's ready.
> 
> Thanks, looking forward.
> 
> Jan
> 
> --
> Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence
> Center Embedded Linux

diff --git a/FutureWorks b/FutureWorks
new file mode 100644
index 0000000..210edcd
--- /dev/null
+++ b/FutureWorks
@@ -0,0 +1,21 @@ 
+The vhost-pci design is currently suitable for a group of VMs who trust each
+other. To extend it to a more general use case, two security features can be
+added in the future.
+
+1 vIOMMU
+vIOMMU provides the driver VM with the ability to restrict the device VM to
+transiently access a specified portion of its memory. The vhost-pci design
+proposed in this RFC can be extended to access the driver VM's memory with
+vIOMMU. Precisely, the vIOMMU engine in the driver VM configures access
+permissions (R/W) for the vhost-pci device to access its memory. More details
+can be found at https://wiki.opnfv.org/display/kvm/Vm2vm+Mst and
+https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg03993.html
+
+2 eptp switching
+The idea of eptp swithing allows a vhost-pci device driver to access the mapped
+driver VM's memory in an alternative view, where only a piece of trusted code
+can access the driver VM's memory. More details can be found at
+http://events.linuxfoundation.org/sites/events/files/slides/
+Jun_Nakajima_NFV_KVM%202015_final.pdf
+
+
