diff mbox series

IOMMU: make DMA containment of quarantined devices optional

Message ID e43e17ea-6ad0-d125-216f-4798853e3116@suse.com (mailing list archive)
State Superseded
Headers show
Series IOMMU: make DMA containment of quarantined devices optional | expand

Commit Message

Jan Beulich Dec. 11, 2019, 12:53 p.m. UTC
Containing still in flight DMA was introduced to work around certain
devices / systems hanging hard upon hitting an IOMMU fault. Passing
through (such) devices (on such systems) is inherently insecure (as
guests could easily arrange for IOMMU faults to occur). Defaulting to
a mode where admins may not even become aware of issues with devices can
be considered undesirable. Therefore convert this mode of operation to
an optional one, not one enabled by default.

This involves resurrecting code commit ea38867831da ("x86 / iommu: set
up a scratch page in the quarantine domain") did remove, in a slightly
extended fashion. Here, instead of reintroducing a pretty pointless use
of "goto" in domain_context_unmap(), and instead of making the function
(at least temporarily) inconsistent, take the opportunity and replace
the other similarly pointless "goto" as well.

In order to key the re-instated bypasses off of there (not) being a root
page table this further requires moving the allocate_domain_resources()
invocation from reassign_device() to amd_iommu_setup_domain_device() (or
else reassign_device() would allocate a root page table anyway); this is
benign to the second caller of the latter function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I'm happy to take better suggestions to replace "full".

Comments

Roger Pau Monné Dec. 11, 2019, 3:54 p.m. UTC | #1
On Wed, Dec 11, 2019 at 01:53:00PM +0100, Jan Beulich wrote:
> Containing still in flight DMA was introduced to work around certain
> devices / systems hanging hard upon hitting an IOMMU fault. Passing
> through (such) devices (on such systems) is inherently insecure (as
> guests could easily arrange for IOMMU faults to occur). Defaulting to
> a mode where admins may not even become aware of issues with devices can
> be considered undesirable. Therefore convert this mode of operation to
> an optional one, not one enabled by default.
> 
> This involves resurrecting code commit ea38867831da ("x86 / iommu: set
> up a scratch page in the quarantine domain") did remove, in a slightly
> extended fashion. Here, instead of reintroducing a pretty pointless use
> of "goto" in domain_context_unmap(), and instead of making the function
> (at least temporarily) inconsistent, take the opportunity and replace
> the other similarly pointless "goto" as well.
> 
> In order to key the re-instated bypasses off of there (not) being a root
> page table this further requires moving the allocate_domain_resources()
> invocation from reassign_device() to amd_iommu_setup_domain_device() (or
> else reassign_device() would allocate a root page table anyway); this is
> benign to the second caller of the latter function.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I'm happy to take better suggestions to replace "full".
> 
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1232,7 +1232,7 @@ detection of systems known to misbehave
>  > Default: `new` unless directed-EOI is supported
>  
>  ### iommu
> -    = List of [ <bool>, verbose, debug, force, required, quarantine,
> +    = List of [ <bool>, verbose, debug, force, required, quarantine[=full],
>                  sharept, intremap, intpost, crash-disable,
>                  snoop, qinval, igfx, amd-iommu-perdev-intremap,
>                  dom0-{passthrough,strict} ]
> @@ -1270,11 +1270,13 @@ boolean (e.g. `iommu=no`) can override t
>      will prevent Xen from booting if IOMMUs aren't discovered and enabled
>      successfully.
>  
> -*   The `quarantine` boolean can be used to control Xen's behavior when
> -    de-assigning devices from guests.  If enabled (the default), Xen always
> -    quarantines such devices; they must be explicitly assigned back to Dom0
> -    before they can be used there again.  If disabled, Xen will only
> -    quarantine devices the toolstack hass arranged for getting quarantined.
> +*   The `quarantine` option can be used to control Xen's behavior when
> +    de-assigning devices from guests.  If set to true (the default), Xen
> +    always quarantines such devices; they must be explicitly assigned back
> +    to Dom0 before they can be used there again.  If set to "full", still
> +    active DMA will additionally be directed to a "sink" page.  If set to
> +    false, Xen will only quarantine devices the toolstack has arranged for
> +    getting quarantined.
>  
>  *   The `sharept` boolean controls whether the IOMMU pagetables are shared
>      with the CPU-side HAP pagetables, or allocated separately.  Sharing
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -85,18 +85,36 @@ int get_dma_requestor_id(uint16_t seg, u
>      return req_id;
>  }
>  
> -static void amd_iommu_setup_domain_device(
> +static int __must_check allocate_domain_resources(struct domain_iommu *hd)
> +{
> +    int rc;
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +    rc = amd_iommu_alloc_root(hd);
> +    spin_unlock(&hd->arch.mapping_lock);
> +
> +    return rc;
> +}
> +
> +static int __must_check amd_iommu_setup_domain_device(
>      struct domain *domain, struct amd_iommu *iommu,
>      uint8_t devfn, struct pci_dev *pdev)
>  {
>      struct amd_iommu_dte *table, *dte;
>      unsigned long flags;
> -    int req_id, valid = 1;
> +    int req_id, valid = 1, rc;
>      u8 bus = pdev->bus;
> -    const struct domain_iommu *hd = dom_iommu(domain);
> +    struct domain_iommu *hd = dom_iommu(domain);
> +
> +    /* dom_io is used as a sentinel for quarantined devices */
> +    if ( domain == dom_io && !hd->arch.root_table )

This condition (and it's Intel counterpart) would be better in a macro
IMO, so that vendor code regardless of the implementation can use the
same macro (and to avoid having to add the same comment in all
instances), ie: IS_DEVICE_QUARANTINED or some such would be fine IMO.

> +        return 0;
> +
> +    BUG_ON(!hd->arch.paging_mode || !iommu->dev_table.buffer);
>  
> -    BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
> -            !iommu->dev_table.buffer );
> +    rc = allocate_domain_resources(hd);
> +    if ( rc )
> +        return rc;
>  
>      if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
>          valid = 0;
> @@ -151,6 +169,8 @@ static void amd_iommu_setup_domain_devic
>  
>          amd_iommu_flush_iotlb(devfn, pdev, INV_IOMMU_ALL_PAGES_ADDRESS, 0);
>      }
> +
> +    return 0;
>  }
>  
>  int __init acpi_ivrs_init(void)
> @@ -220,17 +240,6 @@ int amd_iommu_alloc_root(struct domain_i
>      return 0;
>  }
>  
> -static int __must_check allocate_domain_resources(struct domain_iommu *hd)
> -{
> -    int rc;
> -
> -    spin_lock(&hd->arch.mapping_lock);
> -    rc = amd_iommu_alloc_root(hd);
> -    spin_unlock(&hd->arch.mapping_lock);
> -
> -    return rc;
> -}
> -
>  int amd_iommu_get_paging_mode(unsigned long entries)
>  {
>      int level = 1;
> @@ -287,6 +296,10 @@ static void amd_iommu_disable_domain_dev
>      int req_id;
>      u8 bus = pdev->bus;
>  
> +    /* dom_io is used as a sentinel for quarantined devices */
> +    if ( domain == dom_io && !dom_iommu(domain)->arch.root_table )
> +        return;
> +
>      BUG_ON ( iommu->dev_table.buffer == NULL );
>      req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn));
>      table = iommu->dev_table.buffer;
> @@ -333,7 +346,6 @@ static int reassign_device(struct domain
>  {
>      struct amd_iommu *iommu;
>      int bdf, rc;
> -    struct domain_iommu *t = dom_iommu(target);
>  
>      bdf = PCI_BDF2(pdev->bus, pdev->devfn);
>      iommu = find_iommu_for_device(pdev->seg, bdf);
> @@ -354,11 +366,10 @@ static int reassign_device(struct domain
>          pdev->domain = target;
>      }
>  
> -    rc = allocate_domain_resources(t);
> +    rc = amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
>      if ( rc )
>          return rc;
>  
> -    amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
>      AMD_IOMMU_DEBUG("Re-assign %04x:%02x:%02x.%u from dom%d to dom%d\n",
>                      pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>                      source->domain_id, target->domain_id);
> @@ -515,8 +526,7 @@ static int amd_iommu_add_device(u8 devfn
>          spin_unlock_irqrestore(&iommu->lock, flags);
>      }
>  
> -    amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
> -    return 0;
> +    return amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
>  }
>  
>  static int amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -30,13 +30,17 @@ bool_t __initdata iommu_enable = 1;
>  bool_t __read_mostly iommu_enabled;
>  bool_t __read_mostly force_iommu;
>  bool_t __read_mostly iommu_verbose;
> -bool __read_mostly iommu_quarantine = true;
>  bool_t __read_mostly iommu_igfx = 1;
>  bool_t __read_mostly iommu_snoop = 1;
>  bool_t __read_mostly iommu_qinval = 1;
>  bool_t __read_mostly iommu_intremap = 1;
>  bool_t __read_mostly iommu_crash_disable;
>  
> +#define IOMMU_quarantine_none  false
> +#define IOMMU_quarantine_basic true
> +#define IOMMU_quarantine_full  2
> +uint8_t __read_mostly iommu_quarantine = IOMMU_quarantine_basic;

I don't really like to use booleans with non-boolean variables.
Wouldn't it be better to just use plain numbers, or even better an
enum?

> +
>  static bool __hwdom_initdata iommu_hwdom_none;
>  bool __hwdom_initdata iommu_hwdom_strict;
>  bool __read_mostly iommu_hwdom_passthrough;
> @@ -81,6 +85,8 @@ static int __init parse_iommu_param(cons
>              force_iommu = val;
>          else if ( (val = parse_boolean("quarantine", s, ss)) >= 0 )
>              iommu_quarantine = val;
> +        else if ( ss == s + 15 && !strncmp(s, "quarantine=full", 15) )
> +            iommu_quarantine = IOMMU_quarantine_full;
>          else if ( (val = parse_boolean("igfx", s, ss)) >= 0 )
>              iommu_igfx = val;
>          else if ( (val = parse_boolean("verbose", s, ss)) >= 0 )
> @@ -451,7 +457,7 @@ static int __init iommu_quarantine_init(
>      dom_io->options |= XEN_DOMCTL_CDF_iommu;
>  
>      rc = iommu_domain_init(dom_io, 0);
> -    if ( rc )
> +    if ( rc || iommu_quarantine < IOMMU_quarantine_full )
>          return rc;
>  
>      if ( !hd->platform_ops->quarantine_init )
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1291,6 +1291,10 @@ int domain_context_mapping_one(
>      int agaw, rc, ret;
>      bool_t flush_dev_iotlb;
>  
> +    /* dom_io is used as a sentinel for quarantined devices */
> +    if ( domain == dom_io && !hd->arch.pgd_maddr )
> +        return 0;
> +
>      ASSERT(pcidevs_locked());
>      spin_lock(&iommu->lock);
>      maddr = bus_to_context_maddr(iommu, bus);
> @@ -1537,6 +1541,10 @@ int domain_context_unmap_one(
>      int iommu_domid, rc, ret;
>      bool_t flush_dev_iotlb;
>  
> +    /* dom_io is used as a sentinel for quarantined devices */
> +    if ( domain == dom_io && !dom_iommu(domain)->arch.pgd_maddr )
> +        return 0;
> +
>      ASSERT(pcidevs_locked());
>      spin_lock(&iommu->lock);
>  
> @@ -1598,7 +1606,7 @@ static int domain_context_unmap(struct d
>  {
>      struct acpi_drhd_unit *drhd;
>      struct vtd_iommu *iommu;
> -    int ret = 0;
> +    int ret;
>      u8 seg = pdev->seg, bus = pdev->bus, tmp_bus, tmp_devfn, secbus;
>      int found = 0;
>  
> @@ -1614,14 +1622,12 @@ static int domain_context_unmap(struct d
>              printk(VTDPREFIX "d%d:Hostbridge: skip %04x:%02x:%02x.%u unmap\n",
>                     domain->domain_id, seg, bus,
>                     PCI_SLOT(devfn), PCI_FUNC(devfn));
> -        if ( !is_hardware_domain(domain) )
> -            return -EPERM;
> -        goto out;
> +        return is_hardware_domain(domain) ? 0 : -EPERM;
>  
>      case DEV_TYPE_PCIe_BRIDGE:
>      case DEV_TYPE_PCIe2PCI_BRIDGE:
>      case DEV_TYPE_LEGACY_PCI_BRIDGE:
> -        goto out;
> +        return 0;
>  
>      case DEV_TYPE_PCIe_ENDPOINT:
>          if ( iommu_debug )
> @@ -1665,10 +1671,13 @@ static int domain_context_unmap(struct d
>          dprintk(XENLOG_ERR VTDPREFIX, "d%d:unknown(%u): %04x:%02x:%02x.%u\n",
>                  domain->domain_id, pdev->type,
>                  seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> -        ret = -EINVAL;
> -        goto out;
> +        return -EINVAL;
>      }
>  
> +    /* dom_io is used as a sentinel for quarantined devices */
> +    if ( domain == dom_io && !dom_iommu(domain)->arch.pgd_maddr )
> +        return ret;
> +
>      /*
>       * if no other devices under the same iommu owned by this domain,
>       * clear iommu in iommu_bitmap and clear domain_id in domid_bitmp
> @@ -1694,16 +1703,12 @@ static int domain_context_unmap(struct d
>  
>          iommu_domid = domain_iommu_domid(domain, iommu);
>          if ( iommu_domid == -1 )
> -        {
> -            ret = -EINVAL;
> -            goto out;
> -        }
> +            return -EINVAL;
>  
>          clear_bit(iommu_domid, iommu->domid_bitmap);
>          iommu->domid_map[iommu_domid] = 0;
>      }
>  
> -out:
>      return ret;
>  }
>  
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -53,8 +53,9 @@ static inline bool_t dfn_eq(dfn_t x, dfn
>  }
>  
>  extern bool_t iommu_enable, iommu_enabled;
> -extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
> +extern bool force_iommu, iommu_verbose, iommu_igfx;
>  extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
> +extern uint8_t iommu_quarantine;

Exporting this variable without the paired defines seems pointless,
how are external callers supposed to figure out the quarantine mode
without the IOMMU_quarantine_* defines?

Thanks, Roger.
Jan Beulich Dec. 12, 2019, 9:28 a.m. UTC | #2
On 11.12.2019 16:54, Roger Pau Monné wrote:
> On Wed, Dec 11, 2019 at 01:53:00PM +0100, Jan Beulich wrote:
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -85,18 +85,36 @@ int get_dma_requestor_id(uint16_t seg, u
>>      return req_id;
>>  }
>>  
>> -static void amd_iommu_setup_domain_device(
>> +static int __must_check allocate_domain_resources(struct domain_iommu *hd)
>> +{
>> +    int rc;
>> +
>> +    spin_lock(&hd->arch.mapping_lock);
>> +    rc = amd_iommu_alloc_root(hd);
>> +    spin_unlock(&hd->arch.mapping_lock);
>> +
>> +    return rc;
>> +}
>> +
>> +static int __must_check amd_iommu_setup_domain_device(
>>      struct domain *domain, struct amd_iommu *iommu,
>>      uint8_t devfn, struct pci_dev *pdev)
>>  {
>>      struct amd_iommu_dte *table, *dte;
>>      unsigned long flags;
>> -    int req_id, valid = 1;
>> +    int req_id, valid = 1, rc;
>>      u8 bus = pdev->bus;
>> -    const struct domain_iommu *hd = dom_iommu(domain);
>> +    struct domain_iommu *hd = dom_iommu(domain);
>> +
>> +    /* dom_io is used as a sentinel for quarantined devices */
>> +    if ( domain == dom_io && !hd->arch.root_table )
> 
> This condition (and it's Intel counterpart) would be better in a macro
> IMO, so that vendor code regardless of the implementation can use the
> same macro (and to avoid having to add the same comment in all
> instances), ie: IS_DEVICE_QUARANTINED or some such would be fine IMO.

The "device" in the name suggested is inapplicable, as there's no
device involved here. The conditional also isn't about
"quarantined", but about the extended for thereof. I further don't
understand "vendor code" in your remark: Different macros would be
needed for either vendor anyway. (I did actually consider having
some kind of predicate helper, but I couldn't come up with a
sufficiently good name. I also think such an abstraction should
then have been introduced when these conditionals were first added
in their then still vendor independent form.)

>> --- a/xen/drivers/passthrough/iommu.c
>> +++ b/xen/drivers/passthrough/iommu.c
>> @@ -30,13 +30,17 @@ bool_t __initdata iommu_enable = 1;
>>  bool_t __read_mostly iommu_enabled;
>>  bool_t __read_mostly force_iommu;
>>  bool_t __read_mostly iommu_verbose;
>> -bool __read_mostly iommu_quarantine = true;
>>  bool_t __read_mostly iommu_igfx = 1;
>>  bool_t __read_mostly iommu_snoop = 1;
>>  bool_t __read_mostly iommu_qinval = 1;
>>  bool_t __read_mostly iommu_intremap = 1;
>>  bool_t __read_mostly iommu_crash_disable;
>>  
>> +#define IOMMU_quarantine_none  false
>> +#define IOMMU_quarantine_basic true
>> +#define IOMMU_quarantine_full  2
>> +uint8_t __read_mostly iommu_quarantine = IOMMU_quarantine_basic;
> 
> I don't really like to use booleans with non-boolean variables.
> Wouldn't it be better to just use plain numbers, or even better an
> enum?

No option is really good here, I think. I did consider using an
enum, but I wanted to restrict the variable to 8 bits. If I was
to use an enum, of course I'd also want to have the variable this
(correct) type. And I'd also like to avoid the packed attribute
here. The above seemed to least bad option; I could be convinced
to use 0/1 instead of false/true, though.

>> --- a/xen/include/xen/iommu.h
>> +++ b/xen/include/xen/iommu.h
>> @@ -53,8 +53,9 @@ static inline bool_t dfn_eq(dfn_t x, dfn
>>  }
>>  
>>  extern bool_t iommu_enable, iommu_enabled;
>> -extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
>> +extern bool force_iommu, iommu_verbose, iommu_igfx;
>>  extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
>> +extern uint8_t iommu_quarantine;
> 
> Exporting this variable without the paired defines seems pointless,
> how are external callers supposed to figure out the quarantine mode
> without the IOMMU_quarantine_* defines?

Why pointless? Outside of the file knowing the IOMMU_quarantine_*
defines the variable continues to have boolean meaning.

Jan
Roger Pau Monné Dec. 12, 2019, 1:15 p.m. UTC | #3
On Thu, Dec 12, 2019 at 10:28:26AM +0100, Jan Beulich wrote:
> On 11.12.2019 16:54, Roger Pau Monné wrote:
> > On Wed, Dec 11, 2019 at 01:53:00PM +0100, Jan Beulich wrote:
> >> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> >> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> >> @@ -85,18 +85,36 @@ int get_dma_requestor_id(uint16_t seg, u
> >>      return req_id;
> >>  }
> >>  
> >> -static void amd_iommu_setup_domain_device(
> >> +static int __must_check allocate_domain_resources(struct domain_iommu *hd)
> >> +{
> >> +    int rc;
> >> +
> >> +    spin_lock(&hd->arch.mapping_lock);
> >> +    rc = amd_iommu_alloc_root(hd);
> >> +    spin_unlock(&hd->arch.mapping_lock);
> >> +
> >> +    return rc;
> >> +}
> >> +
> >> +static int __must_check amd_iommu_setup_domain_device(
> >>      struct domain *domain, struct amd_iommu *iommu,
> >>      uint8_t devfn, struct pci_dev *pdev)
> >>  {
> >>      struct amd_iommu_dte *table, *dte;
> >>      unsigned long flags;
> >> -    int req_id, valid = 1;
> >> +    int req_id, valid = 1, rc;
> >>      u8 bus = pdev->bus;
> >> -    const struct domain_iommu *hd = dom_iommu(domain);
> >> +    struct domain_iommu *hd = dom_iommu(domain);
> >> +
> >> +    /* dom_io is used as a sentinel for quarantined devices */
> >> +    if ( domain == dom_io && !hd->arch.root_table )
> > 
> > This condition (and it's Intel counterpart) would be better in a macro
> > IMO, so that vendor code regardless of the implementation can use the
> > same macro (and to avoid having to add the same comment in all
> > instances), ie: IS_DEVICE_QUARANTINED or some such would be fine IMO.
> 
> The "device" in the name suggested is inapplicable, as there's no
> device involved here. The conditional also isn't about
> "quarantined", but about the extended for thereof.

Maybe IS_QUARANTINE_FULL or IS_FULLY_QUARANTINED or something similar
in order to match the command line option then?

The comment above explicitly mentions that dom_io is used as a
sentinel for quarantined devices, hence the DEVICE in the name didn't
seem that far off.

> I further don't
> understand "vendor code" in your remark: Different macros would be
> needed for either vendor anyway.

Yes, but both macros would have the same name, hence you wouldn't need
to think whether you are in AMD or Intel code as the macro would
always have the same name.

> (I did actually consider having
> some kind of predicate helper, but I couldn't come up with a
> sufficiently good name. I also think such an abstraction should
> then have been introduced when these conditionals were first added
> in their then still vendor independent form.)

I would prefer some kind of macro, as I think there's quite a lot of
replication of those two checks, and IMO it's easy to by mistake use
the wrong one when moving between Intel and AMD code (the more that
it would build fine but lead to runtime issues).

> 
> >> --- a/xen/drivers/passthrough/iommu.c
> >> +++ b/xen/drivers/passthrough/iommu.c
> >> @@ -30,13 +30,17 @@ bool_t __initdata iommu_enable = 1;
> >>  bool_t __read_mostly iommu_enabled;
> >>  bool_t __read_mostly force_iommu;
> >>  bool_t __read_mostly iommu_verbose;
> >> -bool __read_mostly iommu_quarantine = true;
> >>  bool_t __read_mostly iommu_igfx = 1;
> >>  bool_t __read_mostly iommu_snoop = 1;
> >>  bool_t __read_mostly iommu_qinval = 1;
> >>  bool_t __read_mostly iommu_intremap = 1;
> >>  bool_t __read_mostly iommu_crash_disable;
> >>  
> >> +#define IOMMU_quarantine_none  false
> >> +#define IOMMU_quarantine_basic true
> >> +#define IOMMU_quarantine_full  2
> >> +uint8_t __read_mostly iommu_quarantine = IOMMU_quarantine_basic;
> > 
> > I don't really like to use booleans with non-boolean variables.
> > Wouldn't it be better to just use plain numbers, or even better an
> > enum?
> 
> No option is really good here, I think. I did consider using an
> enum, but I wanted to restrict the variable to 8 bits.

IMO I wouldn't be that worried about using 8 vs 32 bits.

> If I was
> to use an enum, of course I'd also want to have the variable this
> (correct) type. And I'd also like to avoid the packed attribute
> here. The above seemed to least bad option; I could be convinced
> to use 0/1 instead of false/true, though.

Yes please, 0/1 would be fine for me.

> 
> >> --- a/xen/include/xen/iommu.h
> >> +++ b/xen/include/xen/iommu.h
> >> @@ -53,8 +53,9 @@ static inline bool_t dfn_eq(dfn_t x, dfn
> >>  }
> >>  
> >>  extern bool_t iommu_enable, iommu_enabled;
> >> -extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
> >> +extern bool force_iommu, iommu_verbose, iommu_igfx;
> >>  extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
> >> +extern uint8_t iommu_quarantine;
> > 
> > Exporting this variable without the paired defines seems pointless,
> > how are external callers supposed to figure out the quarantine mode
> > without the IOMMU_quarantine_* defines?
> 
> Why pointless? Outside of the file knowing the IOMMU_quarantine_*
> defines the variable continues to have boolean meaning.

Do you think you could add a comment to that effect?

Thanks, Roger.
Jan Beulich Dec. 12, 2019, 2:06 p.m. UTC | #4
On 12.12.2019 14:15, Roger Pau Monné wrote:
> On Thu, Dec 12, 2019 at 10:28:26AM +0100, Jan Beulich wrote:
>> On 11.12.2019 16:54, Roger Pau Monné wrote:
>>> On Wed, Dec 11, 2019 at 01:53:00PM +0100, Jan Beulich wrote:
>>>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>>>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>>>> @@ -85,18 +85,36 @@ int get_dma_requestor_id(uint16_t seg, u
>>>>      return req_id;
>>>>  }
>>>>  
>>>> -static void amd_iommu_setup_domain_device(
>>>> +static int __must_check allocate_domain_resources(struct domain_iommu *hd)
>>>> +{
>>>> +    int rc;
>>>> +
>>>> +    spin_lock(&hd->arch.mapping_lock);
>>>> +    rc = amd_iommu_alloc_root(hd);
>>>> +    spin_unlock(&hd->arch.mapping_lock);
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +static int __must_check amd_iommu_setup_domain_device(
>>>>      struct domain *domain, struct amd_iommu *iommu,
>>>>      uint8_t devfn, struct pci_dev *pdev)
>>>>  {
>>>>      struct amd_iommu_dte *table, *dte;
>>>>      unsigned long flags;
>>>> -    int req_id, valid = 1;
>>>> +    int req_id, valid = 1, rc;
>>>>      u8 bus = pdev->bus;
>>>> -    const struct domain_iommu *hd = dom_iommu(domain);
>>>> +    struct domain_iommu *hd = dom_iommu(domain);
>>>> +
>>>> +    /* dom_io is used as a sentinel for quarantined devices */
>>>> +    if ( domain == dom_io && !hd->arch.root_table )
>>>
>>> This condition (and it's Intel counterpart) would be better in a macro
>>> IMO, so that vendor code regardless of the implementation can use the
>>> same macro (and to avoid having to add the same comment in all
>>> instances), ie: IS_DEVICE_QUARANTINED or some such would be fine IMO.
>>
>> The "device" in the name suggested is inapplicable, as there's no
>> device involved here. The conditional also isn't about
>> "quarantined", but about the extended for thereof.
> 
> Maybe IS_QUARANTINE_FULL or IS_FULLY_QUARANTINED or something similar
> in order to match the command line option then?

And IS_*() or is_*() ought to have an object to pass to. What would
the object be here? The domain isn't applicable. Maybe something
like FULL_QUARANTINE_MODE() (without any parameters) ...

>> I further don't
>> understand "vendor code" in your remark: Different macros would be
>> needed for either vendor anyway.
> 
> Yes, but both macros would have the same name, hence you wouldn't need
> to think whether you are in AMD or Intel code as the macro would
> always have the same name.
> 
>> (I did actually consider having
>> some kind of predicate helper, but I couldn't come up with a
>> sufficiently good name. I also think such an abstraction should
>> then have been introduced when these conditionals were first added
>> in their then still vendor independent form.)
> 
> I would prefer some kind of macro, as I think there's quite a lot of
> replication of those two checks, and IMO it's easy to by mistake use
> the wrong one when moving between Intel and AMD code (the more that
> it would build fine but lead to runtime issues).

Makes sense, as long as we can find a half way suitable name.

>>>> --- a/xen/include/xen/iommu.h
>>>> +++ b/xen/include/xen/iommu.h
>>>> @@ -53,8 +53,9 @@ static inline bool_t dfn_eq(dfn_t x, dfn
>>>>  }
>>>>  
>>>>  extern bool_t iommu_enable, iommu_enabled;
>>>> -extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
>>>> +extern bool force_iommu, iommu_verbose, iommu_igfx;
>>>>  extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
>>>> +extern uint8_t iommu_quarantine;
>>>
>>> Exporting this variable without the paired defines seems pointless,
>>> how are external callers supposed to figure out the quarantine mode
>>> without the IOMMU_quarantine_* defines?
>>
>> Why pointless? Outside of the file knowing the IOMMU_quarantine_*
>> defines the variable continues to have boolean meaning.
> 
> Do you think you could add a comment to that effect?

Will do.

Jan
diff mbox series

Patch

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1232,7 +1232,7 @@  detection of systems known to misbehave
 > Default: `new` unless directed-EOI is supported
 
 ### iommu
-    = List of [ <bool>, verbose, debug, force, required, quarantine,
+    = List of [ <bool>, verbose, debug, force, required, quarantine[=full],
                 sharept, intremap, intpost, crash-disable,
                 snoop, qinval, igfx, amd-iommu-perdev-intremap,
                 dom0-{passthrough,strict} ]
@@ -1270,11 +1270,13 @@  boolean (e.g. `iommu=no`) can override t
     will prevent Xen from booting if IOMMUs aren't discovered and enabled
     successfully.
 
-*   The `quarantine` boolean can be used to control Xen's behavior when
-    de-assigning devices from guests.  If enabled (the default), Xen always
-    quarantines such devices; they must be explicitly assigned back to Dom0
-    before they can be used there again.  If disabled, Xen will only
-    quarantine devices the toolstack hass arranged for getting quarantined.
+*   The `quarantine` option can be used to control Xen's behavior when
+    de-assigning devices from guests.  If set to true (the default), Xen
+    always quarantines such devices; they must be explicitly assigned back
+    to Dom0 before they can be used there again.  If set to "full", still
+    active DMA will additionally be directed to a "sink" page.  If set to
+    false, Xen will only quarantine devices the toolstack has arranged for
+    getting quarantined.
 
 *   The `sharept` boolean controls whether the IOMMU pagetables are shared
     with the CPU-side HAP pagetables, or allocated separately.  Sharing
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -85,18 +85,36 @@  int get_dma_requestor_id(uint16_t seg, u
     return req_id;
 }
 
-static void amd_iommu_setup_domain_device(
+static int __must_check allocate_domain_resources(struct domain_iommu *hd)
+{
+    int rc;
+
+    spin_lock(&hd->arch.mapping_lock);
+    rc = amd_iommu_alloc_root(hd);
+    spin_unlock(&hd->arch.mapping_lock);
+
+    return rc;
+}
+
+static int __must_check amd_iommu_setup_domain_device(
     struct domain *domain, struct amd_iommu *iommu,
     uint8_t devfn, struct pci_dev *pdev)
 {
     struct amd_iommu_dte *table, *dte;
     unsigned long flags;
-    int req_id, valid = 1;
+    int req_id, valid = 1, rc;
     u8 bus = pdev->bus;
-    const struct domain_iommu *hd = dom_iommu(domain);
+    struct domain_iommu *hd = dom_iommu(domain);
+
+    /* dom_io is used as a sentinel for quarantined devices */
+    if ( domain == dom_io && !hd->arch.root_table )
+        return 0;
+
+    BUG_ON(!hd->arch.paging_mode || !iommu->dev_table.buffer);
 
-    BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
-            !iommu->dev_table.buffer );
+    rc = allocate_domain_resources(hd);
+    if ( rc )
+        return rc;
 
     if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
         valid = 0;
@@ -151,6 +169,8 @@  static void amd_iommu_setup_domain_devic
 
         amd_iommu_flush_iotlb(devfn, pdev, INV_IOMMU_ALL_PAGES_ADDRESS, 0);
     }
+
+    return 0;
 }
 
 int __init acpi_ivrs_init(void)
@@ -220,17 +240,6 @@  int amd_iommu_alloc_root(struct domain_i
     return 0;
 }
 
-static int __must_check allocate_domain_resources(struct domain_iommu *hd)
-{
-    int rc;
-
-    spin_lock(&hd->arch.mapping_lock);
-    rc = amd_iommu_alloc_root(hd);
-    spin_unlock(&hd->arch.mapping_lock);
-
-    return rc;
-}
-
 int amd_iommu_get_paging_mode(unsigned long entries)
 {
     int level = 1;
@@ -287,6 +296,10 @@  static void amd_iommu_disable_domain_dev
     int req_id;
     u8 bus = pdev->bus;
 
+    /* dom_io is used as a sentinel for quarantined devices */
+    if ( domain == dom_io && !dom_iommu(domain)->arch.root_table )
+        return;
+
     BUG_ON ( iommu->dev_table.buffer == NULL );
     req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn));
     table = iommu->dev_table.buffer;
@@ -333,7 +346,6 @@  static int reassign_device(struct domain
 {
     struct amd_iommu *iommu;
     int bdf, rc;
-    struct domain_iommu *t = dom_iommu(target);
 
     bdf = PCI_BDF2(pdev->bus, pdev->devfn);
     iommu = find_iommu_for_device(pdev->seg, bdf);
@@ -354,11 +366,10 @@  static int reassign_device(struct domain
         pdev->domain = target;
     }
 
-    rc = allocate_domain_resources(t);
+    rc = amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
     if ( rc )
         return rc;
 
-    amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
     AMD_IOMMU_DEBUG("Re-assign %04x:%02x:%02x.%u from dom%d to dom%d\n",
                     pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
                     source->domain_id, target->domain_id);
@@ -515,8 +526,7 @@  static int amd_iommu_add_device(u8 devfn
         spin_unlock_irqrestore(&iommu->lock, flags);
     }
 
-    amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
-    return 0;
+    return amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
 }
 
 static int amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -30,13 +30,17 @@  bool_t __initdata iommu_enable = 1;
 bool_t __read_mostly iommu_enabled;
 bool_t __read_mostly force_iommu;
 bool_t __read_mostly iommu_verbose;
-bool __read_mostly iommu_quarantine = true;
 bool_t __read_mostly iommu_igfx = 1;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
 bool_t __read_mostly iommu_crash_disable;
 
+#define IOMMU_quarantine_none  false
+#define IOMMU_quarantine_basic true
+#define IOMMU_quarantine_full  2
+uint8_t __read_mostly iommu_quarantine = IOMMU_quarantine_basic;
+
 static bool __hwdom_initdata iommu_hwdom_none;
 bool __hwdom_initdata iommu_hwdom_strict;
 bool __read_mostly iommu_hwdom_passthrough;
@@ -81,6 +85,8 @@  static int __init parse_iommu_param(cons
             force_iommu = val;
         else if ( (val = parse_boolean("quarantine", s, ss)) >= 0 )
             iommu_quarantine = val;
+        else if ( ss == s + 15 && !strncmp(s, "quarantine=full", 15) )
+            iommu_quarantine = IOMMU_quarantine_full;
         else if ( (val = parse_boolean("igfx", s, ss)) >= 0 )
             iommu_igfx = val;
         else if ( (val = parse_boolean("verbose", s, ss)) >= 0 )
@@ -451,7 +457,7 @@  static int __init iommu_quarantine_init(
     dom_io->options |= XEN_DOMCTL_CDF_iommu;
 
     rc = iommu_domain_init(dom_io, 0);
-    if ( rc )
+    if ( rc || iommu_quarantine < IOMMU_quarantine_full )
         return rc;
 
     if ( !hd->platform_ops->quarantine_init )
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1291,6 +1291,10 @@  int domain_context_mapping_one(
     int agaw, rc, ret;
     bool_t flush_dev_iotlb;
 
+    /* dom_io is used as a sentinel for quarantined devices */
+    if ( domain == dom_io && !hd->arch.pgd_maddr )
+        return 0;
+
     ASSERT(pcidevs_locked());
     spin_lock(&iommu->lock);
     maddr = bus_to_context_maddr(iommu, bus);
@@ -1537,6 +1541,10 @@  int domain_context_unmap_one(
     int iommu_domid, rc, ret;
     bool_t flush_dev_iotlb;
 
+    /* dom_io is used as a sentinel for quarantined devices */
+    if ( domain == dom_io && !dom_iommu(domain)->arch.pgd_maddr )
+        return 0;
+
     ASSERT(pcidevs_locked());
     spin_lock(&iommu->lock);
 
@@ -1598,7 +1606,7 @@  static int domain_context_unmap(struct d
 {
     struct acpi_drhd_unit *drhd;
     struct vtd_iommu *iommu;
-    int ret = 0;
+    int ret;
     u8 seg = pdev->seg, bus = pdev->bus, tmp_bus, tmp_devfn, secbus;
     int found = 0;
 
@@ -1614,14 +1622,12 @@  static int domain_context_unmap(struct d
             printk(VTDPREFIX "d%d:Hostbridge: skip %04x:%02x:%02x.%u unmap\n",
                    domain->domain_id, seg, bus,
                    PCI_SLOT(devfn), PCI_FUNC(devfn));
-        if ( !is_hardware_domain(domain) )
-            return -EPERM;
-        goto out;
+        return is_hardware_domain(domain) ? 0 : -EPERM;
 
     case DEV_TYPE_PCIe_BRIDGE:
     case DEV_TYPE_PCIe2PCI_BRIDGE:
     case DEV_TYPE_LEGACY_PCI_BRIDGE:
-        goto out;
+        return 0;
 
     case DEV_TYPE_PCIe_ENDPOINT:
         if ( iommu_debug )
@@ -1665,10 +1671,13 @@  static int domain_context_unmap(struct d
         dprintk(XENLOG_ERR VTDPREFIX, "d%d:unknown(%u): %04x:%02x:%02x.%u\n",
                 domain->domain_id, pdev->type,
                 seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
-        ret = -EINVAL;
-        goto out;
+        return -EINVAL;
     }
 
+    /* dom_io is used as a sentinel for quarantined devices */
+    if ( domain == dom_io && !dom_iommu(domain)->arch.pgd_maddr )
+        return ret;
+
     /*
      * if no other devices under the same iommu owned by this domain,
      * clear iommu in iommu_bitmap and clear domain_id in domid_bitmp
@@ -1694,16 +1703,12 @@  static int domain_context_unmap(struct d
 
         iommu_domid = domain_iommu_domid(domain, iommu);
         if ( iommu_domid == -1 )
-        {
-            ret = -EINVAL;
-            goto out;
-        }
+            return -EINVAL;
 
         clear_bit(iommu_domid, iommu->domid_bitmap);
         iommu->domid_map[iommu_domid] = 0;
     }
 
-out:
     return ret;
 }
 
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -53,8 +53,9 @@  static inline bool_t dfn_eq(dfn_t x, dfn
 }
 
 extern bool_t iommu_enable, iommu_enabled;
-extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
+extern bool force_iommu, iommu_verbose, iommu_igfx;
 extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
+extern uint8_t iommu_quarantine;
 
 #if defined(CONFIG_IOMMU_FORCE_PT_SHARE)
 #define iommu_hap_pt_share true