From patchwork Wed Jun 7 19:24:32 2017
X-Patchwork-Submitter: Venu Busireddy
X-Patchwork-Id: 9772657
From: Venu Busireddy
To: venu.busireddy@oracle.com, Ian Jackson, Wei Liu
Cc: xen-devel@lists.xen.org
Date: Wed, 7 Jun 2017 14:24:32 -0500
Message-Id: <20170607192432.20500-1-venu.busireddy@oracle.com>
Subject: [Xen-devel] Containing unrecoverable AER errors...

Hi,

I am working on a patch to help contain the unrecoverable AER errors
generated by PCI devices that are assigned to guests in passthrough mode.

The overall approach is as follows:

1. Change the BIOS settings so that AER error handling is delegated to
   the host.
2. Change the xen_pciback driver to store the name (SBDF) of the erring
   device in xenstore.
3. When the guest is created, set up a xenstore watch for such writes.
4. When the watch fires because of an error, *shut down* the guest and
   mark the erring device unassignable until an administrator
   intervenes.

I got all of this working, but I was advised that shutting down the
guest is not the correct approach, because the guest may or may not
respond to the shutdown request. The suggestion was to destroy the
guest instead.

I ran into a problem with that. libxl_domain_destroy() cannot be called
from within libxl. I tried to create a new wrapper that calls
libxl__domain_destroy(), but its callback never gets called. That is
not surprising, because the description of asynchronous operations in
libxl/libxl_internal.h prohibits exactly this.

What is the best way to kill/destroy a guest from within libxl? Could
you please advise?

I am including the patches below for reference (please ignore the few
debug statements).
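A generic, self-contained sketch of why a busy-wait like the one in domain_destroy() in the patch below may never observe completion. Every name in it is hypothetical and none of it is libxl API. It assumes that libxl delivers the destroy's completion only after control returns to its event loop. That is consistent with the callback never being called, but the sketch does not confirm it. In a single-threaded loop, a step that is queued for later runs only after the current callback returns. aer_watch_fired_polling() has the same shape as the patch: it starts the destroy and then waits for it, with a bounded wait so that the sketch terminates. aer_watch_fired() shows the usual alternative: start the destroy, return to the loop, and do the follow-up work (here, marking the device hidden) in the completion callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy single-threaded event loop. All names are hypothetical and only
 * illustrate the control flow; none of them are libxl interfaces.
 */
typedef void (*event_fn)(void *arg);

#define QUEUE_MAX 16
static struct { event_fn fn; void *arg; } queue[QUEUE_MAX];
static size_t q_head, q_tail;

/* Queue an event; it runs only when run_loop() gets control back. */
static void post(event_fn fn, void *arg)
{
    assert(q_tail < QUEUE_MAX);
    queue[q_tail].fn = fn;
    queue[q_tail].arg = arg;
    q_tail++;
}

/* Dispatch queued events, including ones posted by earlier events. */
static void run_loop(void)
{
    while (q_head < q_tail) {
        size_t i = q_head++;
        queue[i].fn(queue[i].arg);
    }
}

struct guest {
    bool destroyed;
    bool device_hidden;
    void (*on_destroyed)(struct guest *g);
};

/* The asynchronous part of the destroy: it runs from the loop. */
static void destroy_step(void *arg)
{
    struct guest *g = arg;

    g->destroyed = true;
    g->on_destroyed(g);             /* completion callback */
}

/* Start the destroy and return immediately. */
static void async_destroy(struct guest *g, void (*done)(struct guest *))
{
    g->on_destroyed = done;
    post(destroy_step, g);
}

/* Follow-up work runs in the completion callback. */
static void hide_after_destroy(struct guest *g)
{
    g->device_hidden = true;        /* where xc_hide_device() would go */
}

/* Pattern: start the destroy, then return to the loop. */
static void aer_watch_fired(void *arg)
{
    async_destroy(arg, hide_after_destroy);
}

static bool polled_completion = true;

/* Anti-pattern: start the destroy, then wait for it inside the callback. */
static void aer_watch_fired_polling(void *arg)
{
    struct guest *g = arg;
    int tries = 1000;               /* the patch loops forever instead */

    async_destroy(g, hide_after_destroy);
    while (tries-- > 0 && !g->destroyed)
        ;                           /* destroy_step is queued behind us */
    polled_completion = g->destroyed;
}
```

The design choice is that no callback ever waits. Each step either finishes immediately or hands its continuation to the next event.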
The problem part is the function aer_backend_watch_callback() in
tools/libxl/libxl_pci.c.

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 6331a95..5a4bae5 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -656,11 +656,13 @@ static const struct pci_device_id pcistub_ids[] = {
 };
 
 #define PCI_NODENAME_MAX 40
+#define PCI_DEVICENAME_MAX 14
 static void kill_domain_by_device(struct pcistub_device *psdev)
 {
 	struct xenbus_transaction xbt;
 	int err;
 	char nodename[PCI_NODENAME_MAX];
+	char devicename[PCI_DEVICENAME_MAX];
 
 	BUG_ON(!psdev);
 	snprintf(nodename, PCI_NODENAME_MAX, "/local/domain/0/backend/pci/%d/0",
@@ -675,6 +677,18 @@ static void kill_domain_by_device(struct pcistub_device *psdev)
 	}
 	/*PV AER handlers will set this flag*/
 	xenbus_printf(xbt, nodename, "aerState" , "aerfail");
+
+	/*
+	 * Xend versions <= 4.4 depend on "aerState" and expect its value
+	 * to be set to "aerfail". Therefore, add a new node "aerFailedSBDF"
+	 * to set the device name.
+	 */
+	snprintf(devicename, PCI_DEVICENAME_MAX, "%04x:%02x:%02x.%x",
+		pci_domain_nr(psdev->dev->bus),
+		psdev->dev->bus->number,
+		PCI_SLOT(psdev->dev->devfn), PCI_FUNC(psdev->dev->devfn));
+	xenbus_printf(xbt, nodename, "aerFailedSBDF" , devicename);
+
 	err = xenbus_transaction_end(xbt, 0);
 	if (err) {
 		if (err == -EAGAIN)
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 951a5dc..5e0f123 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -393,9 +393,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     {
     case XEN_DOMCTL_createdomain:
     case XEN_DOMCTL_test_assign_device:
+    case XEN_DOMCTL_test_hidden_device:
     case XEN_DOMCTL_gdbsx_guestmemio:
         d = NULL;
         break;
+    case XEN_DOMCTL_hide_device:
+    case XEN_DOMCTL_unhide_device:
+        rcu_lock_domain(dom_xen);
+        d = dom_xen;
+        break;
     default:
         d = rcu_lock_domain_by_id(op->domain);
         if ( !d && op->cmd != XEN_DOMCTL_getdomaininfo )
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index c8e2d2d..eb8fd99 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "ats.h"
 
@@ -1333,19 +1334,31 @@ int iommu_remove_device(struct pci_dev *pdev)
     return hd->platform_ops->remove_device(pdev->devfn, pci_to_dev(pdev));
 }
 
+static bool device_assigned_to_domain(struct domain *d, u16 seg, u8 bus, u8 devfn)
+{
+    bool rc = false;
+
+    pcidevs_lock();
+
+    if ( pci_get_pdev_by_domain(d, seg, bus, devfn) )
+        rc = true;
+
+    pcidevs_unlock();
+    return rc;
+}
+
 /*
  * If the device isn't owned by the hardware domain, it means it already
  * has been assigned to other domain, or it doesn't exist.
  */
 static int device_assigned(u16 seg, u8 bus, u8 devfn)
 {
-    struct pci_dev *pdev;
-
-    pcidevs_lock();
-    pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus, devfn);
-    pcidevs_unlock();
+    return device_assigned_to_domain(hardware_domain, seg, bus, devfn) ? 0 : -EBUSY;
+}
 
-    return pdev ? 0 : -EBUSY;
+static int device_hidden(u16 seg, u8 bus, u8 devfn)
+{
+    return device_assigned_to_domain(dom_xen, seg, bus, devfn) ? -EBUSY : 0;
 }
 
 static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
@@ -1354,6 +1367,22 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     struct pci_dev *pdev;
     int rc = 0;
 
+    if ( device_hidden(seg, bus, devfn) )
+        return -EINVAL;
+
+    if ( d == dom_xen )
+    {
+        pdev = pci_get_pdev(seg, bus, devfn);
+        if ( pdev )
+        {
+            pdev->domain = dom_xen;
+            list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
+            return rc;
+        }
+        else
+            return -ENODEV;
+    }
+
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
@@ -1417,10 +1446,23 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
     struct pci_dev *pdev = NULL;
     int ret = 0;
 
+    ASSERT(pcidevs_locked());
+
+    if ( d == dom_xen )
+    {
+        pdev = pci_get_pdev(seg, bus, devfn);
+        if ( pdev )
+        {
+            pdev->domain = hardware_domain;
+            list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
+            return ret;
+        }
+        else return -ENODEV;
+    }
+
     if ( !iommu_enabled || !hd->platform_ops )
         return -EINVAL;
 
-    ASSERT(pcidevs_locked());
     pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
     if ( !pdev )
         return -ENODEV;
@@ -1600,6 +1642,15 @@ int iommu_do_pci_domctl(
                    seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
             ret = -EINVAL;
         }
+
+        if ( device_hidden(seg, bus, devfn) )
+        {
+            printk(XENLOG_G_INFO
+                   "%04x:%02x:%02x.%u device is hidden\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+            ret = -EINVAL;
+        }
+
         break;
 
     case XEN_DOMCTL_assign_device:
@@ -1636,8 +1687,15 @@ int iommu_do_pci_domctl(
             break;
         }
 
-        ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn, flag);
+        if ( device_hidden(seg, bus, devfn) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( !device_assigned(seg, bus, devfn) )
+            ret = assign_device(d, seg, bus, devfn, flag);
+
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                                 "h", u_domctl);
@@ -1671,6 +1729,12 @@ int iommu_do_pci_domctl(
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
 
+        if ( device_hidden(seg, bus, devfn) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
         pcidevs_lock();
         ret = deassign_device(d, seg, bus, devfn);
         pcidevs_unlock();
@@ -1679,7 +1743,86 @@ int iommu_do_pci_domctl(
                    "deassign %04x:%02x:%02x.%u from dom%d failed (%d)\n",
                    seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
                    d->domain_id, ret);
 
+        break;
+
+    case XEN_DOMCTL_hide_device:
+        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
+        ret = xsm_hide_device(XSM_HOOK, d, machine_sbdf);
+        if ( ret )
+            break;
+
+        if ( unlikely(d->is_dying) )
+        {
+            ret = -EAGAIN;
+            break;
+        }
+
+        seg = machine_sbdf >> 16;
+        bus = PCI_BUS(machine_sbdf);
+        devfn = PCI_DEVFN2(machine_sbdf);
+        flag = domctl->u.assign_device.flag;
+
+        if ( device_hidden(seg, bus, devfn) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        pcidevs_lock();
+        ret = assign_device(dom_xen, seg, bus, devfn, flag);
+        pcidevs_unlock();
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                "h", u_domctl);
+        else if ( ret )
+            printk(XENLOG_G_ERR "XEN_DOMCTL_hide_device: "
+                   "hide %04x:%02x:%02x.%u failed (%d)\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret);
+        break;
+
+    case XEN_DOMCTL_unhide_device:
+        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
+        ret = xsm_unhide_device(XSM_HOOK, d, machine_sbdf);
+        if ( ret )
+            break;
+
+        if ( unlikely(d->is_dying) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        seg = machine_sbdf >> 16;
+        bus = PCI_BUS(machine_sbdf);
+        devfn = PCI_DEVFN2(machine_sbdf);
+
+        if ( !device_hidden(seg, bus, devfn) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        pcidevs_lock();
+        ret = deassign_device(dom_xen, seg, bus, devfn);
+        pcidevs_unlock();
+
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                "h", u_domctl);
+        else if ( ret )
+            printk(XENLOG_G_ERR "XEN_DOMCTL_unhide_device: "
+                   "assign %04x:%02x:%02x.%u to dom%d failed (%d)\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                   d->domain_id, ret);
+        break;
+
+    case XEN_DOMCTL_test_hidden_device:
+        machine_sbdf = domctl->u.assign_device.u.pci.machine_sbdf;
+        seg = machine_sbdf >> 16;
+        bus = PCI_BUS(machine_sbdf);
+        devfn = PCI_DEVFN2(machine_sbdf);
+        ret = device_hidden(seg, bus, devfn);
         break;
 
     default:
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index e6cf211..1b043ea 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1222,6 +1222,9 @@ struct xen_domctl {
 #define XEN_DOMCTL_gdbsx_pausevcpu 1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu 1002
 #define XEN_DOMCTL_gdbsx_domstatus 1003
+#define XEN_DOMCTL_hide_device 2001
+#define XEN_DOMCTL_unhide_device 2002
+#define XEN_DOMCTL_test_hidden_device 2003
     uint32_t interface_version; /* XEN_DOMCTL_INTERFACE_VERSION */
     domid_t domain;
     union {
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 62fcea6..0b820e1 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -355,6 +355,24 @@ static XSM_INLINE int xsm_deassign_device(XSM_DEFAULT_ARG struct domain *d, uint
     return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_hide_device(XSM_DEFAULT_ARG struct domain *d, uint32_t machine_bdf)
+{
+    XSM_ASSERT_ACTION(XSM_HOOK);
+    return xsm_default_action(action, current->domain, d);
+}
+
+static XSM_INLINE int xsm_unhide_device(XSM_DEFAULT_ARG struct domain *d, uint32_t machine_bdf)
+{
+    XSM_ASSERT_ACTION(XSM_HOOK);
+    return xsm_default_action(action, current->domain, d);
+}
+
+static XSM_INLINE int xsm_test_hidden_device(XSM_DEFAULT_ARG uint32_t machine_bdf)
+{
+    XSM_ASSERT_ACTION(XSM_HOOK);
+    return xsm_default_action(action, current->domain, NULL);
+}
+
 #endif /* HAS_PASSTHROUGH && HAS_PCI */
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 60c0fd6..03dbeff 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -479,6 +479,22 @@ static inline int xsm_deassign_device(xsm_default_t def, struct domain *d, uint3
 {
     return xsm_ops->deassign_device(d, machine_bdf);
 }
+
+static inline int xsm_hide_device(xsm_default_t def, struct domain *d, uint32_t machine_bdf)
+{
+    return xsm_ops->hide_device(d, machine_bdf);
+}
+
+static inline int xsm_unhide_device(xsm_default_t def, struct domain *d, uint32_t machine_bdf)
+{
+    return xsm_ops->unhide_device(d, machine_bdf);
+}
+
+static inline int xsm_test_hidden_device(xsm_default_t def, uint32_t machine_bdf)
+{
+    return xsm_ops->test_hidden_device(machine_bdf);
+}
+
 #endif /* HAS_PASSTHROUGH && HAS_PCI) */
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 3cb5492..78111bb 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -94,6 +94,9 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, test_assign_device);
     set_to_dummy_if_null(ops, assign_device);
     set_to_dummy_if_null(ops, deassign_device);
+    set_to_dummy_if_null(ops, hide_device);
+    set_to_dummy_if_null(ops, unhide_device);
+    set_to_dummy_if_null(ops, test_hidden_device);
 #endif
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index fd84ac0..3695768 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1311,6 +1311,22 @@ static int flask_deassign_device(struct domain *d, uint32_t machine_bdf)
     return avc_current_has_perm(rsid, SECCLASS_RESOURCE, RESOURCE__REMOVE_DEVICE,
                                 NULL);
 }
+
+static int flask_unhide_device(struct domain *d, uint32_t machine_bdf)
+{
+    return flask_deassign_device(d, machine_bdf);
+}
+
+static int flask_hide_device(struct domain *d, uint32_t machine_bdf)
+{
+    return flask_assign_device(d, machine_bdf);
+}
+
+static int flask_test_hidden_device(struct domain *d, uint32_t machine_bdf)
+{
+    return flask_test_assign_device(d, machine_bdf);
+}
+
 #endif /* HAS_PASSTHROUGH && HAS_PCI */
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
@@ -1783,6 +1799,9 @@ static struct xsm_operations flask_ops = {
     .test_assign_device = flask_test_assign_device,
     .assign_device = flask_assign_device,
     .deassign_device = flask_deassign_device,
+    .hide_device = flask_hide_device,
+    .unhide_device = flask_unhide_device,
+    .test_hidden_device = flask_test_hidden_device,
 #endif
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_DEVICE_TREE)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 1f7eb35..873df59 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -437,13 +437,13 @@ class resource
 # XEN_DOMCTL_iomem_permission, XEN_DOMCTL_memory_mapping
     add_iomem
     remove_iomem
-# XEN_DOMCTL_get_device_group, XEN_DOMCTL_test_assign_device:
+# XEN_DOMCTL_get_device_group, XEN_DOMCTL_test_assign_device, XEN_DOMCTL_test_hidden_device:
 # source = domain making the hypercall
 # target = device being queried
     stat_device
-# XEN_DOMCTL_assign_device
+# XEN_DOMCTL_assign_device, XEN_DOMCTL_hide_device
     add_device
-# XEN_DOMCTL_deassign_device
+# XEN_DOMCTL_deassign_device, XEN_DOMCTL_unhide_device
     remove_device
 # checked for PCI hot and cold-plug hypercalls, with target as the PCI device
 # checked for CPU and memory hotplug with xen_t as the target
diff --git a/docs/man/xl.pod.1.in b/docs/man/xl.pod.1.in
index 78bf884..86f7089 100644
--- a/docs/man/xl.pod.1.in
+++ b/docs/man/xl.pod.1.in
@@ -1462,6 +1462,13 @@ These are devices in the system which are configured to be
 available for passthrough and are bound to a suitable PCI
 backend driver in domain 0 rather than a real driver.
 
+=item B
+
+List all the assignable PCI devices that are hidden.
+When a PCI device assigned to a guest in passthrough mode causes
+unrecoverable AER errors, the hypervisor shuts down the guest and hides
+the device from being assignable to the guests.
+
 =item B I
 
 Make the device at PCI Bus/Device/Function BDF assignable to guests.
@@ -1484,6 +1491,23 @@ it will also attempt to re-bind the device to its original driver,
 making it usable by Domain 0 again.  If the device is not bound to
 pciback, it will return success.
 
+=item B I
+
+Hide the device at PCI Bus/Device/Function BDF from being assignable
+to guests, similar to the way the hypervisor would hide the device that
+caused unrecoverable AER errors.
+When a PCI device assigned to a guest in passthrough mode causes
+unrecoverable AER errors, the hypervisor shuts down the guest and hides
+the device from being assignable to the guests.
+
+=item B I
+
+Unhide the device at PCI Bus/Device/Function BDF that was previously
+hidden by the hypervisor due to unrecoverable AER errors.
+When a PCI device assigned to a guest in passthrough mode causes
+unrecoverable AER errors, the hypervisor shuts down the guest and hides
+the device from being assignable to the guests.
+
 =item B I I
 
 Hot-plug a new pass-through pci device to the specified domain.
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 1629f41..9730285 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1670,6 +1670,10 @@ int xc_assign_device(xc_interface *xch,
                      uint32_t machine_sbdf,
                      uint32_t flag);
 
+int xc_hide_device(xc_interface *xch, uint32_t machine_bdf);
+int xc_unhide_device(xc_interface *xch, uint32_t machine_bdf);
+int xc_test_hidden_device(xc_interface *xch, uint32_t machine_bdf);
+
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
                      uint32_t machine_sbdf,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 00909ad4..714d632 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1501,6 +1501,44 @@ int xc_assign_device(
     return do_domctl(xch, &domctl);
 }
 
+int xc_hide_device(
+    xc_interface *xch,
+    uint32_t machine_sbdf)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_hide_device;
+    domctl.domain = DOMID_XEN;
+    domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_unhide_device(
+    xc_interface *xch,
+    uint32_t machine_sbdf)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_unhide_device;
+    domctl.domain = DOMID_XEN;
+    domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_test_hidden_device(
+    xc_interface *xch,
+    uint32_t machine_sbdf)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_test_hidden_device;
+    domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_get_device_group(
     xc_interface *xch,
     uint32_t domid,
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index cf8687a..5a5bd14 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1944,6 +1944,9 @@ int libxl_device_events_handler(libxl_ctx *ctx,
 int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
 int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
 libxl_device_pci *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
+int libxl_device_pci_assignable_hide(libxl_ctx *ctx, libxl_device_pci *pcidev);
+int libxl_device_pci_assignable_unhide(libxl_ctx *ctx, libxl_device_pci *pcidev);
+int libxl_device_pci_assignable_is_hidden(libxl_ctx *ctx, libxl_device_pci *pcidev);
 
 /* CPUID handling */
 int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str);
diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 1ea789e..4c78798 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -178,6 +178,8 @@ void libxl_event_register_callbacks(libxl_ctx *ctx,
 typedef struct libxl__evgen_domain_death libxl_evgen_domain_death;
 int libxl_evenable_domain_death(libxl_ctx *ctx, uint32_t domid,
                          libxl_ev_user, libxl_evgen_domain_death **evgen_out);
+int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t) LIBXL_EXTERNAL_CALLERS_ONLY;
+void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t);
 void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
   /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DEATH
    * events.  A domain which is destroyed before it shuts down
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index b14df16..ad7fb27 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -874,6 +874,42 @@ int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pcidev,
     return rc;
 }
 
+int libxl_device_pci_assignable_hide(libxl_ctx *ctx, libxl_device_pci *pcidev)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = xc_hide_device(ctx->xch, pcidev_encode_bdf(pcidev));
+    if (rc < 0)
+        LOGD(ERROR, 0, "xc_hide_device failed");
+
+    GC_FREE;
+    return rc;
+}
+
+int libxl_device_pci_assignable_unhide(libxl_ctx *ctx, libxl_device_pci *pcidev)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = xc_unhide_device(ctx->xch, pcidev_encode_bdf(pcidev));
+    if (rc < 0)
+        LOGD(ERROR, 0, "xc_unhide_device failed");
+
+    GC_FREE;
+    return rc;
+}
+
+int libxl_device_pci_assignable_is_hidden(libxl_ctx *ctx, libxl_device_pci *pcidev)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = xc_test_hidden_device(ctx->xch, pcidev_encode_bdf(pcidev));
+
+    GC_FREE;
+    return rc;
+}
 
 int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pcidev,
                                        int rebind)
@@ -1292,6 +1328,138 @@ out:
     return rc;
 }
 
+static void domain_destroy_callback(libxl__egc *egc,
+                                    libxl__domain_destroy_state *dds,
+                                    int rc)
+{
+    STATE_AO_GC(dds->ao);
+
+    LOGD(ERROR, dds->domid, "domain_destroy_callback(): Entered");
+    if (rc)
+        LOGD(ERROR, dds->domid, "Destruction of domain failed, rc = %d", rc);
+
+    libxl__ao_complete(egc, ao, rc);
+    LOGD(ERROR, dds->domid, "domain_destroy_callback(): Exiting");
+}
+
+
+static int domain_destroy(libxl_ctx *ctx, uint32_t domid)
+{
+    AO_CREATE(ctx, domid, (const libxl_asyncop_how *)0);
+    libxl__domain_destroy_state *dds;
+
+    LOGD(ERROR, domid, "domain_destroy(): Entered");
+    GCNEW(dds);
+    dds->ao = ao;
+    dds->domid = domid;
+    dds->callback = domain_destroy_callback;
+    libxl__domain_destroy(egc, dds);
+
+    LOGD(ERROR, domid, "domain_destroy(): ao->complete = %d", ao->complete);
+    while (!ao->complete) {
+        LOGD(ERROR, domid, "domain_destroy(): ao->complete = %d", ao->complete);
+        sleep(5);
+    }
+
+    LOGD(ERROR, domid, "domain_destroy(): Exiting, rc = 0x%08x", ao->rc);
+    return ao->rc;
+}
+
+typedef struct {
+    uint32_t domid;
+    libxl__ev_xswatch watch;
+} libxl_aer_watch;
+static libxl_aer_watch aer_watch;
+
+static void aer_backend_watch_callback(libxl__egc *egc,
+                                       libxl__ev_xswatch *watch,
+                                       const char *watch_path,
+                                       const char *event_path)
+{
+    EGC_GC;
+    libxl_aer_watch *l_aer_watch = CONTAINER_OF(watch, *l_aer_watch, watch);
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    uint32_t domid = l_aer_watch->domid;
+    uint32_t seg, bus, dev, fn;
+    int rc;
+    char *p, *path, *dst_path;
+    const char *aerFailedSBDF;
+    struct xs_permissions rwperm[1];
+
+    LOGD(ERROR, domid, " Entered, ctx = %p", ctx);
+    if (strlen(watch_path) > 0) LOGD(ERROR, domid, " watch_path = %s", watch_path);
+    if (strlen(event_path) > 0) LOGD(ERROR, domid, " event_path = %s", event_path);
+    LOGD(ERROR, domid, " domid = %d", domid);
+
+    /* Extract the backend directory. */
+    path = libxl__strdup(gc, event_path);
+    p = strrchr(path, '/');
+    if (p == NULL)
+        goto skip;
+    if (strcmp(p, "/aerFailedSBDF") != 0)
+        goto skip;
+    /* Truncate the string so it points to the backend directory. */
+    *p = '\0';
+
+    /* Fetch the value of the failed PCI device. */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+            GCSPRINTF("%s/aerFailedSBDF", path), &aerFailedSBDF);
+    if (rc || !aerFailedSBDF)
+        goto skip;
+
+    libxl_unreg_aer_events_handler(ctx, domid);
+
+    rc = domain_destroy(ctx, domid);
+
+    sscanf(aerFailedSBDF, "%x:%x:%x.%x", &seg, &bus, &dev, &fn);
+    rc = xc_hide_device(ctx->xch, seg << 16 | bus << 8 | dev << 3 | fn);
+    if (rc)
+        LOGD(ERROR, domid, " xc_hide_device() failed, rc = %d", rc);
+
+    rwperm[0].id = 0;
+    rwperm[0].perms = XS_PERM_READ | XS_PERM_WRITE;
+    dst_path = GCSPRINTF("/local/domain/0/backend/pci/0/0/%s", "aerFailedPCIs");
+    rc = libxl__xs_mknod(gc, XBT_NULL, dst_path, rwperm, 1);
+    if (rc) {
+        LOGD(ERROR, domid, " libxl__xs_mknod() failed, rc = %d", rc);
+        goto skip;
+    }
+
+    rc = libxl__xs_write_checked(gc, XBT_NULL, dst_path, aerFailedSBDF);
+    if (rc)
+        LOGD(ERROR, domid, " libxl__xs_write_checked() failed, rc = %d", rc);
+
+skip:
+    return;
+}
+
+/* Handler of events for device driver domains */
+int libxl_reg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
+{
+    int rc;
+    char *be_path;
+    GC_INIT(ctx);
+
+    /*
+     * We use absolute paths because we want xswatch to also return
+     * absolute paths that can be parsed by libxl__parse_backend_path.
+     */
+    aer_watch.domid = domid;
+    be_path = GCSPRINTF("/local/domain/0/backend/pci/%u/0/aerFailedSBDF", domid);
+    rc = libxl__ev_xswatch_register(gc, &aer_watch.watch,
+                                    aer_backend_watch_callback, be_path);
+    return rc;
+}
+
+/* Handler of events for device driver domains */
+void libxl_unreg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
+{
+    GC_INIT(ctx);
+
+    libxl__ev_xswatch_deregister(gc, &aer_watch.watch);
+    return;
+}
+
 static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid,
                                libxl_domain_config *d_config,
                                libxl__multidev *multidev)
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 5d112af..ad822df 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -609,6 +609,76 @@ static PyObject *pyxc_deassign_device(XcObject *self,
     return Py_BuildValue("i", sbdf);
 }
 
+static PyObject *pyxc_hide_device(XcObject *self,
+                                  PyObject *args,
+                                  PyObject *kwds)
+{
+    uint32_t sbdf = 0;
+    char *pci_str;
+    int seg, bus, dev, func;
+    static char *kwd_list[] = { "pci", NULL };
+
+    if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list, &pci_str) )
+    {
+        sbdf = -1;
+        goto end_hide;
+    }
+
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    {
+        sbdf = seg << 16;
+        sbdf |= (bus & 0xff) << 8;
+        sbdf |= (dev & 0x1f) << 3;
+        sbdf |= (func & 0x7);
+
+        if ( xc_hide_device(self->xc_handle, sbdf) != 0 )
+        {
+            if ( errno == ENOSYS )
+                sbdf = -1;
+            break;
+        }
+        sbdf = 0;
+    }
+
+end_hide:
+    return Py_BuildValue("i", sbdf);
+}
+
+static PyObject *pyxc_unhide_device(XcObject *self,
+                                    PyObject *args,
+                                    PyObject *kwds)
+{
+    uint32_t sbdf = 0;
+    char *pci_str;
+    int seg, bus, dev, func;
+    static char *kwd_list[] = { "pci", NULL };
+
+    if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list, &pci_str) )
+    {
+        sbdf = -1;
+        goto end_unhide;
+    }
+
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    {
+        sbdf = seg << 16;
+        sbdf |= (bus & 0xff) << 8;
+        sbdf |= (dev & 0x1f) << 3;
+        sbdf |= (func & 0x7);
+
+        if ( xc_unhide_device(self->xc_handle, sbdf) != 0 )
+        {
+            if ( errno == ENOSYS )
+                sbdf = -1;
+            break;
+        }
+        sbdf = 0;
+    }
+
+end_unhide:
+    return Py_BuildValue("i", sbdf);
+}
+
 static PyObject *pyxc_get_device_group(XcObject *self,
                                        PyObject *args)
 {
@@ -2233,7 +2303,21 @@ static PyMethodDef pyxc_methods[] = {
       " dom     [int]:      Domain to deassign device from.\n"
       " pci_str [str]:      PCI devices.\n"
       "Returns: [int] 0 on success, or device bdf that can't be deassigned.\n" },
+
+    { "hide_device",
+      (PyCFunction)pyxc_hide_device,
+      METH_VARARGS | METH_KEYWORDS, "\n"
+      "Hide device after AER fatal error trigger.\n"
+      " pci_str [str]: PCI devices.\n"
+      "Returns: [int] device bdf on success or -1 if it cant be hidden.\n" },
 
+    { "unhide_device",
+      (PyCFunction)pyxc_unhide_device,
+      METH_VARARGS | METH_KEYWORDS, "\n"
+      "Unhide hidden device after AER fatal error trigger.\n"
+      " pci_str [str]: PCI devices.\n"
+      "Returns: [int] device bdf on success or -1 if it cant be unhidden.\n" },
+
     { "sched_id_get",
       (PyCFunction)pyxc_sched_id_get,
       METH_NOARGS, "\n"
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index aa95b77..915fe86 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -121,9 +121,12 @@ int main_vncviewer(int argc, char **argv);
 int main_pcilist(int argc, char **argv);
 int main_pcidetach(int argc, char **argv);
 int main_pciattach(int argc, char **argv);
+int main_pciassignable_hide(int argc, char **argv);
+int main_pciassignable_unhide(int argc, char **argv);
 int main_pciassignable_add(int argc, char **argv);
 int main_pciassignable_remove(int argc, char **argv);
 int main_pciassignable_list(int argc, char **argv);
+int main_pciassignable_list_hidden(int argc, char **argv);
 #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
 int main_restore(int argc, char **argv);
 int main_migrate_receive(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 30eb93c..e23bd15 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -119,6 +119,23 @@ struct cmd_spec cmd_table[] = {
       "List all the assignable pci devices",
       "",
     },
+    { "pci-assignable-list-hidden",
+      &main_pciassignable_list_hidden, 0, 0,
+      "List all the pci devices hidden due to AER errors",
+      "",
+    },
+    { "pci-assignable-hide",
+      &main_pciassignable_hide, 0, 1,
+      "Hide a PCI device",
+      "",
+      "-h Print this help.\n"
+    },
+    { "pci-assignable-unhide",
+      &main_pciassignable_unhide, 0, 1,
+      "Unhide a PCI device",
+      "",
+      "-h Print this help.\n"
+    },
     { "pause",
       &main_pause, 0, 1,
       "Pause execution of a domain",
diff --git a/tools/xl/xl_pci.c b/tools/xl/xl_pci.c
index 58345bd..f48c469 100644
--- a/tools/xl/xl_pci.c
+++ b/tools/xl/xl_pci.c
@@ -163,8 +163,9 @@ static void pciassignable_list(void)
     if ( pcidevs == NULL )
         return;
     for (i = 0; i < num; i++) {
-        printf("%04x:%02x:%02x.%01x\n",
-               pcidevs[i].domain, pcidevs[i].bus, pcidevs[i].dev, pcidevs[i].func);
+        if (!libxl_device_pci_assignable_is_hidden(ctx, &pcidevs[i]))
+            printf("%04x:%02x:%02x.%01x\n",
+                   pcidevs[i].domain, pcidevs[i].bus, pcidevs[i].dev, pcidevs[i].func);
         libxl_device_pci_dispose(&pcidevs[i]);
     }
     free(pcidevs);
@@ -182,6 +183,126 @@ int main_pciassignable_list(int argc, char **argv)
     return 0;
 }
 
+static void pciassignable_list_hidden(void)
+{
+    libxl_device_pci *pcidevs;
+    int num, i;
+
+    pcidevs = libxl_device_pci_assignable_list(ctx, &num);
+
+    if ( pcidevs == NULL )
+        return;
+    for (i = 0; i < num; i++) {
+        if (libxl_device_pci_assignable_is_hidden(ctx, &pcidevs[i]))
+            printf("%04x:%02x:%02x.%01x\n",
+                   pcidevs[i].domain, pcidevs[i].bus, pcidevs[i].dev, pcidevs[i].func);
+        libxl_device_pci_dispose(&pcidevs[i]);
+    }
+    free(pcidevs);
+}
+
+int main_pciassignable_list_hidden(int argc, char **argv)
+{
+    int opt;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pci-assignable-list-hidden", 0) {
+        /* No options */
+    }
+
+    pciassignable_list_hidden();
+    return 0;
+}
+
+static int pciassignable_hide(const char *bdf)
+{
+    libxl_device_pci pcidev;
+    XLU_Config *config;
+    int r = EXIT_SUCCESS;
+
+    libxl_device_pci_init(&pcidev);
+
+    config = xlu_cfg_init(stderr, "command line");
+    if (!config) {
+        perror("xlu_cfg_init");
+        exit(-1);
+    }
+
+    if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
+        fprintf(stderr, "pci-assignable-hide: malformed BDF specification \"%s\"\n", bdf);
+        exit(2);
+    }
+
+    if (libxl_device_pci_assignable_hide(ctx, &pcidev))
+        r = EXIT_FAILURE;
+
+    libxl_device_pci_dispose(&pcidev);
+    xlu_cfg_destroy(config);
+
+    return r;
+}
+
+int main_pciassignable_hide(int argc, char **argv)
+{
+    int opt;
+    const char *bdf = NULL;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "main_pciassignable_hide", 1) {
+        /* No options */
+    }
+
+    bdf = argv[optind];
+
+    if (pciassignable_hide(bdf))
+        return EXIT_FAILURE;
+
+    return EXIT_SUCCESS;
+}
+
+static int pciassignable_unhide(const char *bdf)
+{
+    libxl_device_pci pcidev;
+    XLU_Config *config;
+    int r = EXIT_SUCCESS;
+
+    libxl_device_pci_init(&pcidev);
+
+    config = xlu_cfg_init(stderr, "command line");
+    if (!config) {
+        perror("xlu_cfg_init");
+        exit(-1);
+    }
+
+    if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
+        fprintf(stderr, "pci-assignable-unhide: malformed BDF specification \"%s\"\n", bdf);
+        exit(2);
+    }
+
+    if (libxl_device_pci_assignable_unhide(ctx, &pcidev))
+        r = EXIT_FAILURE;
+
+    libxl_device_pci_dispose(&pcidev);
+    xlu_cfg_destroy(config);
+
+    return r;
+}
+
+int main_pciassignable_unhide(int argc, char **argv)
+{
+    int opt;
+    const char *bdf = NULL;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "main_pciassignable_unhide", 1) {
+        /* No options */
+    }
+
+    bdf = argv[optind];
+
+    if (pciassignable_unhide(bdf))
+        return EXIT_FAILURE;
+
+    return EXIT_SUCCESS;
+}
+
 static int pciassignable_add(const char *bdf, int rebind)
 {
     libxl_device_pci pcidev;
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25..10a48a9 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -966,6 +966,15 @@ start:
     LOG("Waiting for domain %s (domid %u) to die [pid %ld]",
         d_config.c_info.name, domid, (long)getpid());
 
+    ret = libxl_reg_aer_events_handler(ctx, domid);
+    if (ret) {
+        /*
+         * This error may not be severe enough to fail the creation of the VM.
+         * Log the error, and continue with the creation.
+         */
+        LOG("libxl_reg_aer_events_handler() failed, ret = 0x%08x", ret);
+    }
+
     ret = libxl_evenable_domain_death(ctx, domid, 0, &deathw);
     if (ret) goto out;
 
@@ -993,6 +1002,7 @@ start:
             LOG("Domain %u has shut down, reason code %d 0x%x", domid,
                 event->u.domain_shutdown.shutdown_reason,
                 event->u.domain_shutdown.shutdown_reason);
+            libxl_unreg_aer_events_handler(ctx, domid);
             switch (handle_domain_death(&domid, event, &d_config)) {
             case DOMAIN_RESTART_SOFT_RESET:
                 domid_soft_reset = domid;
@@ -1059,6 +1069,7 @@ start:
 
         case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
             LOG("Domain %u has been destroyed.", domid);
+            libxl_unreg_aer_events_handler(ctx, domid);
             libxl_event_free(ctx, event);
             ret = 0;
             goto out;