
[for-2.9,2/2] intel_iommu: extend supported guest aw to 48 bits

Message ID 1481089965-3888-3-git-send-email-peterx@redhat.com (mailing list archive)
State New, archived

Commit Message

Peter Xu Dec. 7, 2016, 5:52 a.m. UTC
Previously the vt-d code only supported a 39-bit IOVA address width. It
is not hard to extend it to 48 bits.

After enabling this, we should be able to map larger IOVA addresses.

To check whether the 48-bit address width is enabled, grep the guest
dmesg for the line "dmar: Host address width 48" (previously it was 39).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu_internal.h | 5 +++--
 include/hw/i386/intel_iommu.h  | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)
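
As a quick reference for the hunks below, here is a minimal standalone sketch (not part of the patch) of how these macros fold into the VT-d capability register that the guest decodes. The SAGAW shift value (CAP_REG bits 12:8) is filled in from the VT-d spec since the hunks do not show it.

/* Sketch: how MGAW/SAGAW land in the VT-d CAP_REG that the guest reads.
 * Mirrors the macros touched by this patch; standalone for illustration. */
#include <stdint.h>
#include <stdio.h>

#define VTD_MGAW                48ULL   /* 39 before this patch */
#define VTD_CAP_MGAW            (((VTD_MGAW - 1) & 0x3fULL) << 16)

#define VTD_CAP_SAGAW_SHIFT     8       /* CAP_REG bits 12:8, from the spec */
#define VTD_CAP_SAGAW_39bit     (0x2ULL << VTD_CAP_SAGAW_SHIFT) /* 3-level */
#define VTD_CAP_SAGAW_48bit     (0x4ULL << VTD_CAP_SAGAW_SHIFT) /* 4-level */
#define VTD_CAP_SAGAW           (VTD_CAP_SAGAW_39bit | VTD_CAP_SAGAW_48bit)

int main(void)
{
    uint64_t cap = VTD_CAP_MGAW | VTD_CAP_SAGAW;   /* other CAP bits omitted */

    /* MGAW is encoded as width-1 in CAP_REG bits 21:16. */
    printf("MGAW:  %llu bits\n", (unsigned long long)(((cap >> 16) & 0x3f) + 1));
    /* SAGAW bit 1 -> 39-bit/3-level, bit 2 -> 48-bit/4-level supported. */
    printf("SAGAW: 0x%llx\n", (unsigned long long)((cap >> 8) & 0x1f));
    return 0;
}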

Comments

Jason Wang Dec. 8, 2016, 2 a.m. UTC | #1
On 2016-12-07 13:52, Peter Xu wrote:
> Previously vt-d codes only supports 39 bits iova address width. It won't
> be hard to extend it to 48 bits.
>
> After enabling this, we should be able to map larger iova addresses.
>
> To check whether 48 bits aw is enabled, we can grep in the guest dmesg
> with line: "dmar: Host address width 48" (previously it was 39).
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   hw/i386/intel_iommu_internal.h | 5 +++--
>   include/hw/i386/intel_iommu.h  | 2 +-
>   2 files changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Jason Wang <jasowang@redhat.com>

>
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index e808c67..00e1e16 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -195,7 +195,7 @@
>   #define VTD_DOMAIN_ID_SHIFT         16  /* 16-bit domain id for 64K domains */
>   #define VTD_DOMAIN_ID_MASK          ((1UL << VTD_DOMAIN_ID_SHIFT) - 1)
>   #define VTD_CAP_ND                  (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
> -#define VTD_MGAW                    39  /* Maximum Guest Address Width */
> +#define VTD_MGAW                    48  /* Maximum Guest Address Width */
>   #define VTD_CAP_MGAW                (((VTD_MGAW - 1) & 0x3fULL) << 16)
>   #define VTD_MAMV                    18ULL
>   #define VTD_CAP_MAMV                (VTD_MAMV << 48)
> @@ -209,7 +209,8 @@
>   #define VTD_CAP_SAGAW_39bit         (0x2ULL << VTD_CAP_SAGAW_SHIFT)
>    /* 48-bit AGAW, 4-level page-table */
>   #define VTD_CAP_SAGAW_48bit         (0x4ULL << VTD_CAP_SAGAW_SHIFT)
> -#define VTD_CAP_SAGAW               VTD_CAP_SAGAW_39bit
> +#define VTD_CAP_SAGAW               (VTD_CAP_SAGAW_39bit | \
> +                                     VTD_CAP_SAGAW_48bit)
>   
>   /* IQT_REG */
>   #define VTD_IQT_QT(val)             (((val) >> 4) & 0x7fffULL)
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 405c9d1..8e0fe65 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -44,7 +44,7 @@
>   #define VTD_SID_TO_DEVFN(sid)       ((sid) & 0xff)
>   
>   #define DMAR_REG_SIZE               0x230
> -#define VTD_HOST_ADDRESS_WIDTH      39
> +#define VTD_HOST_ADDRESS_WIDTH      48
>   #define VTD_HAW_MASK                ((1ULL << VTD_HOST_ADDRESS_WIDTH) - 1)
>   
>   #define DMAR_REPORT_F_INTR          (1)
Michael S. Tsirkin Dec. 11, 2016, 3:13 a.m. UTC | #2
On Wed, Dec 07, 2016 at 01:52:45PM +0800, Peter Xu wrote:
> Previously vt-d codes only supports 39 bits iova address width. It won't
> be hard to extend it to 48 bits.
> 
> After enabling this, we should be able to map larger iova addresses.
> 
> To check whether 48 bits aw is enabled, we can grep in the guest dmesg
> with line: "dmar: Host address width 48" (previously it was 39).
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

I suspect we can't do this for old machine types.
Need to behave in compatible ways.
Also, is 48 always enough? 5-level paging with 57 bits
is just around the corner.
And is it always supported? For things like vfio
to work, don't we need to check what the host supports?


> ---
>  hw/i386/intel_iommu_internal.h | 5 +++--
>  include/hw/i386/intel_iommu.h  | 2 +-
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index e808c67..00e1e16 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -195,7 +195,7 @@
>  #define VTD_DOMAIN_ID_SHIFT         16  /* 16-bit domain id for 64K domains */
>  #define VTD_DOMAIN_ID_MASK          ((1UL << VTD_DOMAIN_ID_SHIFT) - 1)
>  #define VTD_CAP_ND                  (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
> -#define VTD_MGAW                    39  /* Maximum Guest Address Width */
> +#define VTD_MGAW                    48  /* Maximum Guest Address Width */
>  #define VTD_CAP_MGAW                (((VTD_MGAW - 1) & 0x3fULL) << 16)
>  #define VTD_MAMV                    18ULL
>  #define VTD_CAP_MAMV                (VTD_MAMV << 48)
> @@ -209,7 +209,8 @@
>  #define VTD_CAP_SAGAW_39bit         (0x2ULL << VTD_CAP_SAGAW_SHIFT)
>   /* 48-bit AGAW, 4-level page-table */
>  #define VTD_CAP_SAGAW_48bit         (0x4ULL << VTD_CAP_SAGAW_SHIFT)
> -#define VTD_CAP_SAGAW               VTD_CAP_SAGAW_39bit
> +#define VTD_CAP_SAGAW               (VTD_CAP_SAGAW_39bit | \
> +                                     VTD_CAP_SAGAW_48bit)
>  
>  /* IQT_REG */
>  #define VTD_IQT_QT(val)             (((val) >> 4) & 0x7fffULL)
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 405c9d1..8e0fe65 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -44,7 +44,7 @@
>  #define VTD_SID_TO_DEVFN(sid)       ((sid) & 0xff)
>  
>  #define DMAR_REG_SIZE               0x230
> -#define VTD_HOST_ADDRESS_WIDTH      39
> +#define VTD_HOST_ADDRESS_WIDTH      48
>  #define VTD_HAW_MASK                ((1ULL << VTD_HOST_ADDRESS_WIDTH) - 1)
>  
>  #define DMAR_REPORT_F_INTR          (1)
> -- 
> 2.7.4
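
On the question above about checking what the host supports: one minimal sketch, assuming a Linux host where the intel-iommu driver exposes the raw capability register in sysfs (the dmar0 path and unit name are assumptions), is to decode the host's MGAW/SAGAW the same way a guest would.

/* Sketch: read the host IOMMU capability register exposed by the Linux
 * intel-iommu driver and decode its address-width fields.  The sysfs
 * path (and the existence of a dmar0 unit) is an assumption. */
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/class/iommu/dmar0/intel-iommu/cap"; /* assumed */
    FILE *f = fopen(path, "r");
    uint64_t cap;

    if (!f || fscanf(f, "%" SCNx64, &cap) != 1) {
        perror(path);
        return 1;
    }
    fclose(f);

    /* MGAW: CAP_REG bits 21:16, encoded as width-1. */
    printf("host MGAW:  %" PRIu64 " bits\n", ((cap >> 16) & 0x3f) + 1);
    /* SAGAW: CAP_REG bits 12:8; bit 1 = 39-bit, bit 2 = 48-bit, bit 3 = 57-bit. */
    printf("host SAGAW: 0x%" PRIx64 "\n", (cap >> 8) & 0x1f);
    return 0;
}

If the host only advertises the 39-bit AGAW, a 48-bit vIOMMU invites exactly the mapping failures discussed later in this thread.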
Peter Xu Dec. 12, 2016, 2:01 a.m. UTC | #3
On Sun, Dec 11, 2016 at 05:13:45AM +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2016 at 01:52:45PM +0800, Peter Xu wrote:
> > Previously vt-d codes only supports 39 bits iova address width. It won't
> > be hard to extend it to 48 bits.
> > 
> > After enabling this, we should be able to map larger iova addresses.
> > 
> > To check whether 48 bits aw is enabled, we can grep in the guest dmesg
> > with line: "dmar: Host address width 48" (previously it was 39).
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> I suspect we can't do this for old machine types.
> Need to behave in compatible ways.

Sure. I can do that.

Btw, is the vt-d iommu still in the experimental stage? I am just
wondering whether it would be overkill to add lots of tunables before
we have a stable and mature vt-d emulation.

> Also, is 48 always enough? 5 level with 57 bits
> is just around the corner.

Please refer to the discussion with Jason - it looks like the vt-d spec
currently supports only 39/48-bit address widths? Please correct me if
I made a mistake.

> And is it always supported? for things like vfio
> to work, don't we need to check what does host support?

Hmm, yes, we should do that. But as of now, we still don't have
complete vfio support. IMHO we can postpone this issue until vfio is
fully supported.

Thanks,

-- peterx
Alex Williamson Dec. 12, 2016, 7:35 p.m. UTC | #4
On Mon, 12 Dec 2016 10:01:15 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Sun, Dec 11, 2016 at 05:13:45AM +0200, Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2016 at 01:52:45PM +0800, Peter Xu wrote:  
> > > Previously vt-d codes only supports 39 bits iova address width. It won't
> > > be hard to extend it to 48 bits.
> > > 
> > > After enabling this, we should be able to map larger iova addresses.
> > > 
> > > To check whether 48 bits aw is enabled, we can grep in the guest dmesg
> > > with line: "dmar: Host address width 48" (previously it was 39).
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>  
> > 
> > I suspect we can't do this for old machine types.
> > Need to behave in compatible ways.  
> 
> Sure. I can do that.
> 
> Btw, is vt-d iommu still in experimental stage? I am just thinking
> whether it'll be overkill we add lots of tunables before we have one
> stable and mature vt-d emulation.
> 
> > Also, is 48 always enough? 5 level with 57 bits
> > is just around the corner.  
> 
> Please refer to the discussion with Jason - looks like vt-d spec
> currently supports only 39/48 bits address width? Please correct if I
> made a mistake.
> 
> > And is it always supported? for things like vfio
> > to work, don't we need to check what does host support?  
> 
> Hmm, yes, we should do that. But until now, we still don't have a
> complete vfio support. IMHO we can postpone this issue until vfio is
> fully supported.

I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
relevant to vfio, we're not sharing page tables.  There is already a
case today, without vIOMMU that you can make a guest which has more
guest physical address space than the hardware IOMMU by overcommitting
system memory.  Generally this quickly resolves itself when we start
pinning pages since the physical address width of the IOMMU is
typically the same as the physical address width of the host system
(ie. we exhaust the host memory).  It is possible though that we could
create a sparse memory VM that makes use of gfns beyond the physical
IOMMU, with or without a vIOMMU.  You'll get an error from the mapping
ioctl when this occurs and the VM will abort.  It's not typically an
issue since the memory capacity of the host and the IOMMU physical
address width really had better align fairly well.  Thanks,

Alex
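
The "mapping ioctl" mentioned above is the vfio type1 DMA map; a minimal sketch, with the container setup omitted and the helper name and example page size as illustrative assumptions, of where such a failure would surface:

/* Sketch: a VFIO type1 DMA map at an IOVA the host IOMMU cannot cover
 * fails here, which QEMU then turns into a fatal error for the VM.
 * Container/group setup is omitted; map_one_page() is illustrative. */
#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <stdint.h>

int map_one_page(int container, uint64_t iova)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .iova  = iova,      /* e.g. above bit 39 on a 39-bit host IOMMU */
        .size  = 4096,
    };
    void *buf = mmap(NULL, map.size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED) {
        return -1;
    }
    map.vaddr = (uint64_t)(uintptr_t)buf;

    /* Returns an error if the IOVA exceeds what the host IOMMU can map. */
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}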
Peter Xu Dec. 13, 2016, 3:33 a.m. UTC | #5
On Mon, Dec 12, 2016 at 12:35:44PM -0700, Alex Williamson wrote:
> On Mon, 12 Dec 2016 10:01:15 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Sun, Dec 11, 2016 at 05:13:45AM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Dec 07, 2016 at 01:52:45PM +0800, Peter Xu wrote:  
> > > > Previously vt-d codes only supports 39 bits iova address width. It won't
> > > > be hard to extend it to 48 bits.
> > > > 
> > > > After enabling this, we should be able to map larger iova addresses.
> > > > 
> > > > To check whether 48 bits aw is enabled, we can grep in the guest dmesg
> > > > with line: "dmar: Host address width 48" (previously it was 39).
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>  
> > > 
> > > I suspect we can't do this for old machine types.
> > > Need to behave in compatible ways.  
> > 
> > Sure. I can do that.
> > 
> > Btw, is vt-d iommu still in experimental stage? I am just thinking
> > whether it'll be overkill we add lots of tunables before we have one
> > stable and mature vt-d emulation.
> > 
> > > Also, is 48 always enough? 5 level with 57 bits
> > > is just around the corner.  
> > 
> > Please refer to the discussion with Jason - looks like vt-d spec
> > currently supports only 39/48 bits address width? Please correct if I
> > made a mistake.
> > 
> > > And is it always supported? for things like vfio
> > > to work, don't we need to check what does host support?  
> > 
> > Hmm, yes, we should do that. But until now, we still don't have a
> > complete vfio support. IMHO we can postpone this issue until vfio is
> > fully supported.
> 
> I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> relevant to vfio, we're not sharing page tables.  There is already a
> case today, without vIOMMU that you can make a guest which has more
> guest physical address space than the hardware IOMMU by overcommitting
> system memory.  Generally this quickly resolves itself when we start
> pinning pages since the physical address width of the IOMMU is
> typically the same as the physical address width of the host system
> (ie. we exhaust the host memory).

Hi, Alex,

Here does "hardware IOMMU" means the IOMMU iova address space width?
For example, if guest has 48 bits physical address width (without
vIOMMU), but host hardware IOMMU only supports 39 bits for its iova
address space, could device assigment work in this case?

Thanks,

-- peterx
Alex Williamson Dec. 13, 2016, 3:51 a.m. UTC | #6
On Tue, 13 Dec 2016 11:33:41 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Dec 12, 2016 at 12:35:44PM -0700, Alex Williamson wrote:
> > On Mon, 12 Dec 2016 10:01:15 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Sun, Dec 11, 2016 at 05:13:45AM +0200, Michael S. Tsirkin wrote:  
> > > > On Wed, Dec 07, 2016 at 01:52:45PM +0800, Peter Xu wrote:    
> > > > > Previously vt-d codes only supports 39 bits iova address width. It won't
> > > > > be hard to extend it to 48 bits.
> > > > > 
> > > > > After enabling this, we should be able to map larger iova addresses.
> > > > > 
> > > > > To check whether 48 bits aw is enabled, we can grep in the guest dmesg
> > > > > with line: "dmar: Host address width 48" (previously it was 39).
> > > > > 
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>    
> > > > 
> > > > I suspect we can't do this for old machine types.
> > > > Need to behave in compatible ways.    
> > > 
> > > Sure. I can do that.
> > > 
> > > Btw, is vt-d iommu still in experimental stage? I am just thinking
> > > whether it'll be overkill we add lots of tunables before we have one
> > > stable and mature vt-d emulation.
> > >   
> > > > Also, is 48 always enough? 5 level with 57 bits
> > > > is just around the corner.    
> > > 
> > > Please refer to the discussion with Jason - looks like vt-d spec
> > > currently supports only 39/48 bits address width? Please correct if I
> > > made a mistake.
> > >   
> > > > And is it always supported? for things like vfio
> > > > to work, don't we need to check what does host support?    
> > > 
> > > Hmm, yes, we should do that. But until now, we still don't have a
> > > complete vfio support. IMHO we can postpone this issue until vfio is
> > > fully supported.  
> > 
> > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> > relevant to vfio, we're not sharing page tables.  There is already a
> > case today, without vIOMMU that you can make a guest which has more
> > guest physical address space than the hardware IOMMU by overcommitting
> > system memory.  Generally this quickly resolves itself when we start
> > pinning pages since the physical address width of the IOMMU is
> > typically the same as the physical address width of the host system
> > (ie. we exhaust the host memory).  
> 
> Hi, Alex,
> 
> Here does "hardware IOMMU" means the IOMMU iova address space width?
> For example, if guest has 48 bits physical address width (without
> vIOMMU), but host hardware IOMMU only supports 39 bits for its iova
> address space, could device assigment work in this case?

The current usage depends entirely on what the user (VM) tries to map.
You could expose a vIOMMU with a 64bit address width, but the moment
you try to perform a DMA mapping with IOVA beyond bit 39 (if that's the
host IOMMU address width), the ioctl will fail and the VM will abort.
IOW, you can claim whatever vIOMMU address width you want, but if you
lay out guest memory or devices in such a way that actually requires IOVA
mapping beyond the host capabilities, you're going to abort.  Likewise,
without a vIOMMU if the guest memory layout is sufficiently sparse to
require such IOVAs, you're going to abort.  Thanks,

Alex
Tian, Kevin Dec. 13, 2016, 5 a.m. UTC | #7
> From: Alex Williamson
> Sent: Tuesday, December 13, 2016 3:36 AM
> 
> On Mon, 12 Dec 2016 10:01:15 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Sun, Dec 11, 2016 at 05:13:45AM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Dec 07, 2016 at 01:52:45PM +0800, Peter Xu wrote:
> > > > Previously vt-d codes only supports 39 bits iova address width. It won't
> > > > be hard to extend it to 48 bits.
> > > >
> > > > After enabling this, we should be able to map larger iova addresses.
> > > >
> > > > To check whether 48 bits aw is enabled, we can grep in the guest dmesg
> > > > with line: "dmar: Host address width 48" (previously it was 39).
> > > >
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > >
> > > I suspect we can't do this for old machine types.
> > > Need to behave in compatible ways.
> >
> > Sure. I can do that.
> >
> > Btw, is vt-d iommu still in experimental stage? I am just thinking
> > whether it'll be overkill we add lots of tunables before we have one
> > stable and mature vt-d emulation.
> >
> > > Also, is 48 always enough? 5 level with 57 bits
> > > is just around the corner.
> >
> > Please refer to the discussion with Jason - looks like vt-d spec
> > currently supports only 39/48 bits address width? Please correct if I
> > made a mistake.
> >
> > > And is it always supported? for things like vfio
> > > to work, don't we need to check what does host support?
> >
> > Hmm, yes, we should do that. But until now, we still don't have a
> > complete vfio support. IMHO we can postpone this issue until vfio is
> > fully supported.
> 
> I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> relevant to vfio, we're not sharing page tables.  There is already a
> case today, without vIOMMU that you can make a guest which has more
> guest physical address space than the hardware IOMMU by overcommitting
> system memory.  Generally this quickly resolves itself when we start
> pinning pages since the physical address width of the IOMMU is
> typically the same as the physical address width of the host system
> (ie. we exhaust the host memory).  It is possible though that we could
> create a sparse memory VM that makes use of gfns beyond the physical
> IOMMU, with or without a vIOMMU.  You'll get an error from the mapping
> ioctl when this occurs and the VM will abort.  It's not typically an
> issue since the memory capacity of the host and the IOMMU physical
> address width really better align fairly well.  Thanks,
> 
> Alex

Hi, Alex,

I have a different thought here regarding w/ and w/o vIOMMU.

When there is no vIOMMU exposed, page pinning happens when
we assign a device. Exceeding the physical IOMMU address width
leads to failure of device assignment. It's just fine.

However, when a vIOMMU is exposed, guest IOVA programming
happens at run-time. If an IOVA happens to exceed the physical address
width, aborting the VM looks unreasonable since there's nothing
wrong with the guest's operation. We'd better provide a way to at least
notify the guest about such an error. Aligning the vIOMMU address width
to the pIOMMU allows us to trigger a virtual IOMMU fault - with a proper
error code indicating such an error condition to the guest.

Thanks
Kevin
Peter Xu Dec. 13, 2016, 5:24 a.m. UTC | #8
On Mon, Dec 12, 2016 at 08:51:50PM -0700, Alex Williamson wrote:

[...]

> > > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> > > relevant to vfio, we're not sharing page tables.  There is already a
> > > case today, without vIOMMU that you can make a guest which has more
> > > guest physical address space than the hardware IOMMU by overcommitting
> > > system memory.  Generally this quickly resolves itself when we start
> > > pinning pages since the physical address width of the IOMMU is
> > > typically the same as the physical address width of the host system
> > > (ie. we exhaust the host memory).  
> > 
> > Hi, Alex,
> > 
> > Here does "hardware IOMMU" means the IOMMU iova address space width?
> > For example, if guest has 48 bits physical address width (without
> > vIOMMU), but host hardware IOMMU only supports 39 bits for its iova
> > address space, could device assigment work in this case?
> 
> The current usage depends entirely on what the user (VM) tries to map.
> You could expose a vIOMMU with a 64bit address width, but the moment
> you try to perform a DMA mapping with IOVA beyond bit 39 (if that's the
> host IOMMU address width), the ioctl will fail and the VM will abort.
> IOW, you can claim whatever vIOMMU address width you want, but if you
> layout guest memory or devices in such a way that actually require IOVA
> mapping beyond the host capabilities, you're going to abort.  Likewise,
> without a vIOMMU if the guest memory layout is sufficiently sparse to
> require such IOVAs, you're going to abort.  Thanks,

Thanks for the explanation. I got the point.

However, should we allow guest behavior to affect the hypervisor? In
this case, if the guest maps an IOVA range over 39 bits (assuming the
vIOMMU declares itself with a 48-bit address width), the VM will crash.
How about we shrink the vIOMMU address width to 39 bits during boot if
we detect that assigned devices are configured? IMHO, no matter what we
do in the guest, the hypervisor should keep the guest alive from the
hypervisor's POV (emulation of the guest hardware should not be stopped
by guest behavior). If any operation in the guest can bring the
hypervisor down, isn't that a bug?

Thanks,

-- peterx
Alex Williamson Dec. 13, 2016, 5:31 a.m. UTC | #9
On Tue, 13 Dec 2016 05:00:07 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson
> > Sent: Tuesday, December 13, 2016 3:36 AM
> > 
> > On Mon, 12 Dec 2016 10:01:15 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Sun, Dec 11, 2016 at 05:13:45AM +0200, Michael S. Tsirkin wrote:  
> > > > On Wed, Dec 07, 2016 at 01:52:45PM +0800, Peter Xu wrote:  
> > > > > Previously vt-d codes only supports 39 bits iova address width. It won't
> > > > > be hard to extend it to 48 bits.
> > > > >
> > > > > After enabling this, we should be able to map larger iova addresses.
> > > > >
> > > > > To check whether 48 bits aw is enabled, we can grep in the guest dmesg
> > > > > with line: "dmar: Host address width 48" (previously it was 39).
> > > > >
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>  
> > > >
> > > > I suspect we can't do this for old machine types.
> > > > Need to behave in compatible ways.  
> > >
> > > Sure. I can do that.
> > >
> > > Btw, is vt-d iommu still in experimental stage? I am just thinking
> > > whether it'll be overkill we add lots of tunables before we have one
> > > stable and mature vt-d emulation.
> > >  
> > > > Also, is 48 always enough? 5 level with 57 bits
> > > > is just around the corner.  
> > >
> > > Please refer to the discussion with Jason - looks like vt-d spec
> > > currently supports only 39/48 bits address width? Please correct if I
> > > made a mistake.
> > >  
> > > > And is it always supported? for things like vfio
> > > > to work, don't we need to check what does host support?  
> > >
> > > Hmm, yes, we should do that. But until now, we still don't have a
> > > complete vfio support. IMHO we can postpone this issue until vfio is
> > > fully supported.  
> > 
> > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> > relevant to vfio, we're not sharing page tables.  There is already a
> > case today, without vIOMMU that you can make a guest which has more
> > guest physical address space than the hardware IOMMU by overcommitting
> > system memory.  Generally this quickly resolves itself when we start
> > pinning pages since the physical address width of the IOMMU is
> > typically the same as the physical address width of the host system
> > (ie. we exhaust the host memory).  It is possible though that we could
> > create a sparse memory VM that makes use of gfns beyond the physical
> > IOMMU, with or without a vIOMMU.  You'll get an error from the mapping
> > ioctl when this occurs and the VM will abort.  It's not typically an
> > issue since the memory capacity of the host and the IOMMU physical
> > address width really better align fairly well.  Thanks,
> > 
> > Alex  
> 
> Hi, Alex,
> 
> I have a different thought here regarding to w/ and w/o vIOMMU.
> 
> When there is no vIOMMU exposed, page pinning happens when 
> we assign a device. Exceeding physical IOMMU address weight 
> leads to failure of device assignment. It's just fine.
> 
> However when vIOMMU is exposed, guest IOVA programming 
> happens at run-time. If a IOVA happens to exceed physical address
> width, aborting the VM looks unreasonable since there's nothing
> wrong with guest's operation. We'd better provide a way to at least
> notify guest about such error. Aligning vIOMMU address width to 
> pIOMMU allows us triggering a virtual IOMMU fault - with proper
> error code to indicating such error condition to guest.

QEMU supports memory hotplug, so the first scenario can lead to
aborting the VM runtime as well.  However, I guess the vIOMMU case
really boils down to how the guest makes use of the vIOMMU IOVA space.
The guest could decide to hand out DMA tokens starting at the high end
of the supported vIOMMU width.  Once we enable guest vfio, we allow a
user to manage the IOVA space for a device, so a malicious user within
the guest could abort the entire guest.  So perhaps you're right that
the scenarios available to abort the VM are far more numerous than I
originally thought.

Exposing the host IOMMU address width through the vfio API has been on
my todo list for a long time, it's only been more recently that the
iommu_domain structure includes a geometry structure to expose this
information out through the IOMMU API.  A new capability within the
vfio iommu info data could describe the domain width.  Thanks,

Alex
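
For context, the hook point for such a capability would be the existing VFIO_IOMMU_GET_INFO ioctl; a minimal sketch of what userspace can query today, with the domain-width capability itself still hypothetical at this point in the thread:

/* Sketch: query the vfio type1 IOMMU info.  Only the page-size field is
 * defined as of this thread; a capability describing the domain/address
 * width is hypothetical here. */
#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <stdio.h>

void show_iommu_info(int container)
{
    struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

    if (ioctl(container, VFIO_IOMMU_GET_INFO, &info)) {
        perror("VFIO_IOMMU_GET_INFO");
        return;
    }
    if (info.flags & VFIO_IOMMU_INFO_PGSIZES) {
        printf("supported IOMMU page sizes: 0x%llx\n",
               (unsigned long long)info.iova_pgsizes);
    }
    /* A future info capability (hypothetical here) could report the usable
     * IOVA width for QEMU to compare against the vIOMMU's advertised MGAW. */
}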
Alex Williamson Dec. 13, 2016, 5:48 a.m. UTC | #10
On Tue, 13 Dec 2016 13:24:29 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Dec 12, 2016 at 08:51:50PM -0700, Alex Williamson wrote:
> 
> [...]
> 
> > > > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> > > > relevant to vfio, we're not sharing page tables.  There is already a
> > > > case today, without vIOMMU that you can make a guest which has more
> > > > guest physical address space than the hardware IOMMU by overcommitting
> > > > system memory.  Generally this quickly resolves itself when we start
> > > > pinning pages since the physical address width of the IOMMU is
> > > > typically the same as the physical address width of the host system
> > > > (ie. we exhaust the host memory).    
> > > 
> > > Hi, Alex,
> > > 
> > > Here does "hardware IOMMU" means the IOMMU iova address space width?
> > > For example, if guest has 48 bits physical address width (without
> > > vIOMMU), but host hardware IOMMU only supports 39 bits for its iova
> > > address space, could device assigment work in this case?  
> > 
> > The current usage depends entirely on what the user (VM) tries to map.
> > You could expose a vIOMMU with a 64bit address width, but the moment
> > you try to perform a DMA mapping with IOVA beyond bit 39 (if that's the
> > host IOMMU address width), the ioctl will fail and the VM will abort.
> > IOW, you can claim whatever vIOMMU address width you want, but if you
> > layout guest memory or devices in such a way that actually require IOVA
> > mapping beyond the host capabilities, you're going to abort.  Likewise,
> > without a vIOMMU if the guest memory layout is sufficiently sparse to
> > require such IOVAs, you're going to abort.  Thanks,  
> 
> Thanks for the explanation. I got the point.
> 
> However, should we allow guest behaviors affect hypervisor? In this
> case, if guest maps IOVA range over 39 bits (assuming vIOMMU is
> declaring itself with 48 bits address width), the VM will crash. How
> about we shrink vIOMMU address width to 39 bits during boot if we
> detected that assigned devices are configured? IMHO no matter what we
> do in the guest, the hypervisor should keep the guest alive from
> hypervisor POV (emulation of the guest hardware should not be stopped
> by guest behavior). If any operation in guest can cause hypervisor
> down, isn't it a bug?

Any case of the guest crashing the hypervisor (ie. the host) is a
serious bug, but a guest causing its own VM to abort is an entirely
different class, and in some cases justified.  For instance, you only
need a guest misbehaving in the virtio protocol to generate a VM
abort.  The cases Kevin raises make me reconsider because they are
cases of a VM behaving properly, within the specifications of the
hardware exposed to it, generating a VM abort, and in the case of vfio
exposed through to a guest user, allow the VM to be susceptible to the
actions of that user.

Of course any time we tie VM hardware to a host constraint, we're
asking for trouble.  Your example of shrinking the vIOMMU address
width to 39 bits on boot highlights that.  Clearly cold-plug devices are
only one scenario; what about hotplug devices?  We cannot dynamically
change the vIOMMU address width.  What about migration, we could start
the VM w/o an assigned device on a 48bit capable host and migrate it to
a 39bit host and then attempt to hot add an assigned device.  For the
most compatibility, why would we ever configure the VM with a vIOMMU
address width beyond the minimum necessary to support the potential
populated guest physical memory?  Thanks,

Alex
Peter Xu Dec. 13, 2016, 6:12 a.m. UTC | #11
On Mon, Dec 12, 2016 at 10:48:28PM -0700, Alex Williamson wrote:
> On Tue, 13 Dec 2016 13:24:29 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, Dec 12, 2016 at 08:51:50PM -0700, Alex Williamson wrote:
> > 
> > [...]
> > 
> > > > > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> > > > > relevant to vfio, we're not sharing page tables.  There is already a
> > > > > case today, without vIOMMU that you can make a guest which has more
> > > > > guest physical address space than the hardware IOMMU by overcommitting
> > > > > system memory.  Generally this quickly resolves itself when we start
> > > > > pinning pages since the physical address width of the IOMMU is
> > > > > typically the same as the physical address width of the host system
> > > > > (ie. we exhaust the host memory).    
> > > > 
> > > > Hi, Alex,
> > > > 
> > > > Here does "hardware IOMMU" means the IOMMU iova address space width?
> > > > For example, if guest has 48 bits physical address width (without
> > > > vIOMMU), but host hardware IOMMU only supports 39 bits for its iova
> > > > address space, could device assigment work in this case?  
> > > 
> > > The current usage depends entirely on what the user (VM) tries to map.
> > > You could expose a vIOMMU with a 64bit address width, but the moment
> > > you try to perform a DMA mapping with IOVA beyond bit 39 (if that's the
> > > host IOMMU address width), the ioctl will fail and the VM will abort.
> > > IOW, you can claim whatever vIOMMU address width you want, but if you
> > > layout guest memory or devices in such a way that actually require IOVA
> > > mapping beyond the host capabilities, you're going to abort.  Likewise,
> > > without a vIOMMU if the guest memory layout is sufficiently sparse to
> > > require such IOVAs, you're going to abort.  Thanks,  
> > 
> > Thanks for the explanation. I got the point.
> > 
> > However, should we allow guest behaviors affect hypervisor? In this
> > case, if guest maps IOVA range over 39 bits (assuming vIOMMU is
> > declaring itself with 48 bits address width), the VM will crash. How
> > about we shrink vIOMMU address width to 39 bits during boot if we
> > detected that assigned devices are configured? IMHO no matter what we
> > do in the guest, the hypervisor should keep the guest alive from
> > hypervisor POV (emulation of the guest hardware should not be stopped
> > by guest behavior). If any operation in guest can cause hypervisor
> > down, isn't it a bug?
> 
> Any case of the guest crashing the hypervisor (ie. the host) is a
> serious bug, but a guest causing it's own VM to abort is an entirely
> different class, and in some cases justified.  For instance, you only
> need a guest misbehaving in the virtio protocol to generate a VM
> abort.  The cases Kevin raises make me reconsider because they are
> cases of a VM behaving properly, within the specifications of the
> hardware exposed to it, generating a VM abort, and in the case of vfio
> exposed through to a guest user, allow the VM to be susceptible to the
> actions of that user.
> 
> Of course any time we tie VM hardware to a host constraint, we're
> asking for trouble.  You're example of shrinking the vIOMMU address
> width to 39bits on boot highlights that.  Clearly cold plug devices is
> only one scenario, what about hotplug devices?  We cannot dynamically
> change the vIOMMU address width.  What about migration, we could start
> the VM w/o an assigned device on a 48bit capable host and migrate it to
> a 39bit host and then attempt to hot add an assigned device.  For the
> most compatibility, why would we ever configure the VM with a vIOMMU
> address width beyond the minimum necessary to support the potential
> populated guest physical memory?  Thanks,

For now, I feel a tunable for the address width is more essential -
let's just name it "aw-bits", and it should only be used by advanced
users. By default, we can use an address width that is safe enough, like
39 bits (I assume that most pIOMMUs should support at least 39 bits).
User configuration can override it (for now, we can limit the options to
only 39/48 bits).

Then, we can temporarily live even without the interface to detect
host parameters - when the user specifies a specific width, he/she will
manage the rest (of course taking the risk of VM aborts).

Thanks,

-- peterx
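
A sketch of what such a tunable could look like as an intel-iommu device property, in the style of hw/i386/intel_iommu.c; the property name (shown with the 'x-' experimental prefix discussed just below), the aw_bits field, and the realize-time check are all hypothetical here:

/* Hypothetical fragment for hw/i386/intel_iommu.c: an experimental
 * address-width property defaulting to the pre-patch 39 bits. */
static Property vtd_properties[] = {
    /* ... existing intel-iommu properties ... */
    DEFINE_PROP_UINT8("x-aw-bits", IntelIOMMUState, aw_bits, 39),
    DEFINE_PROP_END_OF_LIST(),
};

/* Called from the device's realize path (placement is an assumption). */
static void vtd_check_aw_bits(IntelIOMMUState *s, Error **errp)
{
    if (s->aw_bits != 39 && s->aw_bits != 48) {
        error_setg(errp, "intel-iommu: x-aw-bits must be 39 or 48 (got %d)",
                   s->aw_bits);
    }
}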
Alex Williamson Dec. 13, 2016, 1:17 p.m. UTC | #12
On Tue, 13 Dec 2016 14:12:12 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Dec 12, 2016 at 10:48:28PM -0700, Alex Williamson wrote:
> > On Tue, 13 Dec 2016 13:24:29 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Mon, Dec 12, 2016 at 08:51:50PM -0700, Alex Williamson wrote:
> > > 
> > > [...]
> > >   
> > > > > > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> > > > > > relevant to vfio, we're not sharing page tables.  There is already a
> > > > > > case today, without vIOMMU that you can make a guest which has more
> > > > > > guest physical address space than the hardware IOMMU by overcommitting
> > > > > > system memory.  Generally this quickly resolves itself when we start
> > > > > > pinning pages since the physical address width of the IOMMU is
> > > > > > typically the same as the physical address width of the host system
> > > > > > (ie. we exhaust the host memory).      
> > > > > 
> > > > > Hi, Alex,
> > > > > 
> > > > > Here does "hardware IOMMU" means the IOMMU iova address space width?
> > > > > For example, if guest has 48 bits physical address width (without
> > > > > vIOMMU), but host hardware IOMMU only supports 39 bits for its iova
> > > > > address space, could device assigment work in this case?    
> > > > 
> > > > The current usage depends entirely on what the user (VM) tries to map.
> > > > You could expose a vIOMMU with a 64bit address width, but the moment
> > > > you try to perform a DMA mapping with IOVA beyond bit 39 (if that's the
> > > > host IOMMU address width), the ioctl will fail and the VM will abort.
> > > > IOW, you can claim whatever vIOMMU address width you want, but if you
> > > > layout guest memory or devices in such a way that actually require IOVA
> > > > mapping beyond the host capabilities, you're going to abort.  Likewise,
> > > > without a vIOMMU if the guest memory layout is sufficiently sparse to
> > > > require such IOVAs, you're going to abort.  Thanks,    
> > > 
> > > Thanks for the explanation. I got the point.
> > > 
> > > However, should we allow guest behaviors affect hypervisor? In this
> > > case, if guest maps IOVA range over 39 bits (assuming vIOMMU is
> > > declaring itself with 48 bits address width), the VM will crash. How
> > > about we shrink vIOMMU address width to 39 bits during boot if we
> > > detected that assigned devices are configured? IMHO no matter what we
> > > do in the guest, the hypervisor should keep the guest alive from
> > > hypervisor POV (emulation of the guest hardware should not be stopped
> > > by guest behavior). If any operation in guest can cause hypervisor
> > > down, isn't it a bug?  
> > 
> > Any case of the guest crashing the hypervisor (ie. the host) is a
> > serious bug, but a guest causing it's own VM to abort is an entirely
> > different class, and in some cases justified.  For instance, you only
> > need a guest misbehaving in the virtio protocol to generate a VM
> > abort.  The cases Kevin raises make me reconsider because they are
> > cases of a VM behaving properly, within the specifications of the
> > hardware exposed to it, generating a VM abort, and in the case of vfio
> > exposed through to a guest user, allow the VM to be susceptible to the
> > actions of that user.
> > 
> > Of course any time we tie VM hardware to a host constraint, we're
> > asking for trouble.  You're example of shrinking the vIOMMU address
> > width to 39bits on boot highlights that.  Clearly cold plug devices is
> > only one scenario, what about hotplug devices?  We cannot dynamically
> > change the vIOMMU address width.  What about migration, we could start
> > the VM w/o an assigned device on a 48bit capable host and migrate it to
> > a 39bit host and then attempt to hot add an assigned device.  For the
> > most compatibility, why would we ever configure the VM with a vIOMMU
> > address width beyond the minimum necessary to support the potential
> > populated guest physical memory?  Thanks,  
> 
> For now, I feel a tunable for the address width more essential - let's
> just name it as "aw-bits", which should only be used by advanced
> users. By default, we can use an address width safe enough, like 39
> bits (I assume that most pIOMMUs should support at least 39 bits).
> User configurations can override (for now, we can limit the options to
> only 39/48 bits).
> 
> Then, we can temporarily live even without the interface to detect
> host parameters - when user specify a specific width, he/she will
> manage the rest (of course taking the risk of VM aborts).

I'm sorry, what is the actual benefit of a 48-bit address width?
Simply to be able to support larger memory VMs?  In that case the
address width should be automatically configured when necessary rather
than providing yet another obscure user configuration.  Minimally, if
we don't have the support worked out for an option we should denote it
as an experimental option by prefixing it with 'x-'.  Once we make a
non-experimental option, we're stuck with it, and it feels like this is
being rushed through without a concrete requirement for supporting
it.  Thanks,

Alex
Michael S. Tsirkin Dec. 13, 2016, 2:38 p.m. UTC | #13
On Tue, Dec 13, 2016 at 06:17:47AM -0700, Alex Williamson wrote:
> On Tue, 13 Dec 2016 14:12:12 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, Dec 12, 2016 at 10:48:28PM -0700, Alex Williamson wrote:
> > > On Tue, 13 Dec 2016 13:24:29 +0800
> > > Peter Xu <peterx@redhat.com> wrote:
> > >   
> > > > On Mon, Dec 12, 2016 at 08:51:50PM -0700, Alex Williamson wrote:
> > > > 
> > > > [...]
> > > >   
> > > > > > > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> > > > > > > relevant to vfio, we're not sharing page tables.  There is already a
> > > > > > > case today, without vIOMMU that you can make a guest which has more
> > > > > > > guest physical address space than the hardware IOMMU by overcommitting
> > > > > > > system memory.  Generally this quickly resolves itself when we start
> > > > > > > pinning pages since the physical address width of the IOMMU is
> > > > > > > typically the same as the physical address width of the host system
> > > > > > > (ie. we exhaust the host memory).      
> > > > > > 
> > > > > > Hi, Alex,
> > > > > > 
> > > > > > Here does "hardware IOMMU" means the IOMMU iova address space width?
> > > > > > For example, if guest has 48 bits physical address width (without
> > > > > > vIOMMU), but host hardware IOMMU only supports 39 bits for its iova
> > > > > > address space, could device assigment work in this case?    
> > > > > 
> > > > > The current usage depends entirely on what the user (VM) tries to map.
> > > > > You could expose a vIOMMU with a 64bit address width, but the moment
> > > > > you try to perform a DMA mapping with IOVA beyond bit 39 (if that's the
> > > > > host IOMMU address width), the ioctl will fail and the VM will abort.
> > > > > IOW, you can claim whatever vIOMMU address width you want, but if you
> > > > > layout guest memory or devices in such a way that actually require IOVA
> > > > > mapping beyond the host capabilities, you're going to abort.  Likewise,
> > > > > without a vIOMMU if the guest memory layout is sufficiently sparse to
> > > > > require such IOVAs, you're going to abort.  Thanks,    
> > > > 
> > > > Thanks for the explanation. I got the point.
> > > > 
> > > > However, should we allow guest behaviors affect hypervisor? In this
> > > > case, if guest maps IOVA range over 39 bits (assuming vIOMMU is
> > > > declaring itself with 48 bits address width), the VM will crash. How
> > > > about we shrink vIOMMU address width to 39 bits during boot if we
> > > > detected that assigned devices are configured? IMHO no matter what we
> > > > do in the guest, the hypervisor should keep the guest alive from
> > > > hypervisor POV (emulation of the guest hardware should not be stopped
> > > > by guest behavior). If any operation in guest can cause hypervisor
> > > > down, isn't it a bug?  
> > > 
> > > Any case of the guest crashing the hypervisor (ie. the host) is a
> > > serious bug, but a guest causing it's own VM to abort is an entirely
> > > different class, and in some cases justified.  For instance, you only
> > > need a guest misbehaving in the virtio protocol to generate a VM
> > > abort.  The cases Kevin raises make me reconsider because they are
> > > cases of a VM behaving properly, within the specifications of the
> > > hardware exposed to it, generating a VM abort, and in the case of vfio
> > > exposed through to a guest user, allow the VM to be susceptible to the
> > > actions of that user.
> > > 
> > > Of course any time we tie VM hardware to a host constraint, we're
> > > asking for trouble.  You're example of shrinking the vIOMMU address
> > > width to 39bits on boot highlights that.  Clearly cold plug devices is
> > > only one scenario, what about hotplug devices?  We cannot dynamically
> > > change the vIOMMU address width.  What about migration, we could start
> > > the VM w/o an assigned device on a 48bit capable host and migrate it to
> > > a 39bit host and then attempt to hot add an assigned device.  For the
> > > most compatibility, why would we ever configure the VM with a vIOMMU
> > > address width beyond the minimum necessary to support the potential
> > > populated guest physical memory?  Thanks,  
> > 
> > For now, I feel a tunable for the address width more essential - let's
> > just name it as "aw-bits", which should only be used by advanced
> > users. By default, we can use an address width safe enough, like 39
> > bits (I assume that most pIOMMUs should support at least 39 bits).
> > User configurations can override (for now, we can limit the options to
> > only 39/48 bits).
> > 
> > Then, we can temporarily live even without the interface to detect
> > host parameters - when user specify a specific width, he/she will
> > manage the rest (of course taking the risk of VM aborts).
> 
> I'm sorry, what is the actual benefit of a 48-bit address width?
> Simply to be able to support larger memory VMs?  In that case the
> address width should be automatically configured when necessary rather
> than providing yet another obscure user configuration.

I think we need to map out all the issues, and a tunable
isn't a bad way to experiment in order do this.

>  Minimally, if
> we don't have the support worked out for an option we should denote it
> as an experimental option by prefixing it with 'x-'.  Once we make a
> non-experimental option, we're stuck with it, and if feels like this is
> being rushed through without an concrete requirement for supporting
> it.  Thanks,
> 
> Alex

That's a good idea, I think. We'll rename it once we have
a better understanding of what this depends on.

Patch

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e808c67..00e1e16 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -195,7 +195,7 @@ 
 #define VTD_DOMAIN_ID_SHIFT         16  /* 16-bit domain id for 64K domains */
 #define VTD_DOMAIN_ID_MASK          ((1UL << VTD_DOMAIN_ID_SHIFT) - 1)
 #define VTD_CAP_ND                  (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
-#define VTD_MGAW                    39  /* Maximum Guest Address Width */
+#define VTD_MGAW                    48  /* Maximum Guest Address Width */
 #define VTD_CAP_MGAW                (((VTD_MGAW - 1) & 0x3fULL) << 16)
 #define VTD_MAMV                    18ULL
 #define VTD_CAP_MAMV                (VTD_MAMV << 48)
@@ -209,7 +209,8 @@ 
 #define VTD_CAP_SAGAW_39bit         (0x2ULL << VTD_CAP_SAGAW_SHIFT)
  /* 48-bit AGAW, 4-level page-table */
 #define VTD_CAP_SAGAW_48bit         (0x4ULL << VTD_CAP_SAGAW_SHIFT)
-#define VTD_CAP_SAGAW               VTD_CAP_SAGAW_39bit
+#define VTD_CAP_SAGAW               (VTD_CAP_SAGAW_39bit | \
+                                     VTD_CAP_SAGAW_48bit)
 
 /* IQT_REG */
 #define VTD_IQT_QT(val)             (((val) >> 4) & 0x7fffULL)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 405c9d1..8e0fe65 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -44,7 +44,7 @@ 
 #define VTD_SID_TO_DEVFN(sid)       ((sid) & 0xff)
 
 #define DMAR_REG_SIZE               0x230
-#define VTD_HOST_ADDRESS_WIDTH      39
+#define VTD_HOST_ADDRESS_WIDTH      48
 #define VTD_HAW_MASK                ((1ULL << VTD_HOST_ADDRESS_WIDTH) - 1)
 
 #define DMAR_REPORT_F_INTR          (1)