diff mbox series

spapr_pci: Robustify support of PCI bridges

Message ID 159431476748.407044.16711294833569014964.stgit@bahia.lan (mailing list archive)
State New, archived
Headers show
Series spapr_pci: Robustify support of PCI bridges | expand

Commit Message

Greg Kurz July 9, 2020, 5:12 p.m. UTC
Some recent error handling cleanups unveiled issues with our support of
PCI bridges:

1) QEMU aborts when using non-standard PCI bridge types,
   unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"

$ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
Unexpected error in object_property_find() at qom/object.c:1240:
qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
Aborted (core dumped)

This happens because we assume all PCI bridge types to have a "chassis_nr"
property. This property only exists with the standard PCI bridge type
"pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
much simpler to check the presence of "chassis_nr" earlier.

2) QEMU abort if same "chassis_nr" value is used several times,
   unveiled by commit d2623129a7de "qom: Drop parameter @errp of
   object_property_add() & friends"

$ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
                        -device pci-bridge,chassis_nr=1
Unexpected error in object_property_try_add() at qom/object.c:1167:
qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
Aborted (core dumped)

This happens because we assume that "chassis_nr" values are unique, but
nobody enforces that and we end up generating duplicate DRC ids. The PCI
code doesn't really care for duplicate "chassis_nr" properties since it
is only used to initialize the "Chassis Number Register" of the bridge,
with no functional impact on QEMU. So, even if passing the same value
several times might look weird, it never broke anything before, so
I guess we don't necessarily want to enforce strict checking in the PCI
code now.

Workaround both issues in the PAPR code: check that the bridge has a
unique and non null "chassis_nr" when plugging it into its parent bus.

Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr_pci.c |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

Comments

David Gibson July 16, 2020, 4:45 a.m. UTC | #1
On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> Some recent error handling cleanups unveiled issues with our support of
> PCI bridges:
> 
> 1) QEMU aborts when using non-standard PCI bridge types,
>    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> 
> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> Unexpected error in object_property_find() at qom/object.c:1240:
> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> Aborted (core dumped)

Oops, I thought we had a check that we actually had a "pci-bridge"
device before continuing with the hotplug, but I guess not.

> This happens because we assume all PCI bridge types to have a "chassis_nr"
> property. This property only exists with the standard PCI bridge type
> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
> much simpler to check the presence of "chassis_nr" earlier.

Hrm, right, 7ef1553dac was not really correct since add_drcs() really
can fail.

> 2) QEMU abort if same "chassis_nr" value is used several times,
>    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
>    object_property_add() & friends"
> 
> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
>                         -device pci-bridge,chassis_nr=1
> Unexpected error in object_property_try_add() at qom/object.c:1167:
> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
> Aborted (core dumped)
> 
> This happens because we assume that "chassis_nr" values are unique, but
> nobody enforces that and we end up generating duplicate DRC ids. The PCI
> code doesn't really care for duplicate "chassis_nr" properties since it
> is only used to initialize the "Chassis Number Register" of the bridge,
> with no functional impact on QEMU. So, even if passing the same value
> several times might look weird, it never broke anything before, so
> I guess we don't necessarily want to enforce strict checking in the PCI
> code now.

Yeah, I guess.  I'm pretty sure that the chassis number of bridges is
supposed to be system-unique (well, unique within the PCI domain at
least, I guess) as part of the hardware spec.  So specifying multiple
chassis ids the same is a user error, but we need a better failure
mode.

> Workaround both issues in the PAPR code: check that the bridge has a
> unique and non null "chassis_nr" when plugging it into its parent bus.
>
> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")

Arguably, it's really fixing 7ef1553dac.

> Reported-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Greg Kurz <groug@kaod.org>

I had a few misgivings about the details of this, but I think I've
convinced myself they're fine.  There's a couple of things I'd like to
polish, but I'll do that as a follow up.

> ---
>  hw/ppc/spapr_pci.c |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 329002ac040e..09d52ef7954d 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1480,6 +1480,57 @@ static void spapr_pci_bridge_plug(SpaprPhbState *phb,
>      add_drcs(phb, bus);
>  }
>  
> +/* Returns non-zero if the value of "chassis_nr" is already in use */
> +static int check_chassis_nr(Object *obj, void *opaque)
> +{
> +    int new_chassis_nr =
> +        object_property_get_uint(opaque, "chassis_nr", &error_abort);
> +    int chassis_nr =
> +        object_property_get_uint(obj, "chassis_nr", NULL);
> +
> +    if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) {
> +        return 0;
> +    }
> +
> +    /* Skip unsupported bridge types */
> +    if (!chassis_nr) {
> +        return 0;
> +    }
> +
> +    /* Skip self */
> +    if (obj == opaque) {
> +        return 0;
> +    }
> +
> +    return chassis_nr == new_chassis_nr;
> +}
> +
> +static bool bridge_has_valid_chassis_nr(Object *bridge, Error **errp)
> +{
> +    int chassis_nr =
> +        object_property_get_uint(bridge, "chassis_nr", NULL);
> +
> +    /*
> +     * slotid_cap_init() already ensures that "chassis_nr" isn't null for
> +     * standard PCI bridges, so this really tells if "chassis_nr" is present
> +     * or not.
> +     */
> +    if (!chassis_nr) {
> +        error_setg(errp, "PCI Bridge lacks a \"chassis_nr\" property");
> +        error_append_hint(errp, "Try -device pci-bridge instead.\n");
> +        return false;
> +    }
> +
> +    /* We want unique values for "chassis_nr" */
> +    if (object_child_foreach_recursive(object_get_root(), check_chassis_nr,
> +                                       bridge)) {
> +        error_setg(errp, "Bridge chassis %d already in use", chassis_nr);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>  static void spapr_pci_plug(HotplugHandler *plug_handler,
>                             DeviceState *plugged_dev, Error **errp)
>  {
> @@ -1491,6 +1542,12 @@ static void spapr_pci_plug(HotplugHandler *plug_handler,
>      PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev)));
>      uint32_t slotnr = PCI_SLOT(pdev->devfn);
>  
> +    if (pc->is_bridge) {
> +        if (!bridge_has_valid_chassis_nr(OBJECT(plugged_dev), errp)) {
> +            return;
> +        }
> +    }
> +
>      /* if DR is disabled we don't need to do anything in the case of
>       * hotplug or coldplug callbacks
>       */
> 
>
Michael S. Tsirkin July 16, 2020, 6:53 a.m. UTC | #2
On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> Some recent error handling cleanups unveiled issues with our support of
> PCI bridges:
> 
> 1) QEMU aborts when using non-standard PCI bridge types,
>    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> 
> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> Unexpected error in object_property_find() at qom/object.c:1240:
> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> Aborted (core dumped)
> 
> This happens because we assume all PCI bridge types to have a "chassis_nr"
> property. This property only exists with the standard PCI bridge type
> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
> much simpler to check the presence of "chassis_nr" earlier.
> 
> 2) QEMU abort if same "chassis_nr" value is used several times,
>    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
>    object_property_add() & friends"
> 
> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
>                         -device pci-bridge,chassis_nr=1
> Unexpected error in object_property_try_add() at qom/object.c:1167:
> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
> Aborted (core dumped)
> 
> This happens because we assume that "chassis_nr" values are unique, but
> nobody enforces that and we end up generating duplicate DRC ids. The PCI
> code doesn't really care for duplicate "chassis_nr" properties since it
> is only used to initialize the "Chassis Number Register" of the bridge,
> with no functional impact on QEMU. So, even if passing the same value
> several times might look weird, it never broke anything before, so
> I guess we don't necessarily want to enforce strict checking in the PCI
> code now.
> 
> Workaround both issues in the PAPR code: check that the bridge has a
> unique and non null "chassis_nr" when plugging it into its parent bus.
> 
> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
> Reported-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Greg Kurz <groug@kaod.org>



> ---
>  hw/ppc/spapr_pci.c |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 329002ac040e..09d52ef7954d 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1480,6 +1480,57 @@ static void spapr_pci_bridge_plug(SpaprPhbState *phb,
>      add_drcs(phb, bus);
>  }
>  
> +/* Returns non-zero if the value of "chassis_nr" is already in use */
> +static int check_chassis_nr(Object *obj, void *opaque)
> +{
> +    int new_chassis_nr =
> +        object_property_get_uint(opaque, "chassis_nr", &error_abort);
> +    int chassis_nr =
> +        object_property_get_uint(obj, "chassis_nr", NULL);
> +
> +    if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) {
> +        return 0;
> +    }
> +
> +    /* Skip unsupported bridge types */
> +    if (!chassis_nr) {
> +        return 0;
> +    }
> +
> +    /* Skip self */
> +    if (obj == opaque) {
> +        return 0;
> +    }
> +
> +    return chassis_nr == new_chassis_nr;
> +}
> +
> +static bool bridge_has_valid_chassis_nr(Object *bridge, Error **errp)

I would rename this "bridge_has_unique_chassis_nr".

> +{
> +    int chassis_nr =
> +        object_property_get_uint(bridge, "chassis_nr", NULL);
> +
> +    /*
> +     * slotid_cap_init() already ensures that "chassis_nr" isn't null for
> +     * standard PCI bridges, so this really tells if "chassis_nr" is present
> +     * or not.
> +     */
> +    if (!chassis_nr) {
> +        error_setg(errp, "PCI Bridge lacks a \"chassis_nr\" property");
> +        error_append_hint(errp, "Try -device pci-bridge instead.\n");
> +        return false;
> +    }
> +
> +    /* We want unique values for "chassis_nr" */
> +    if (object_child_foreach_recursive(object_get_root(), check_chassis_nr,
> +                                       bridge)) {
> +        error_setg(errp, "Bridge chassis %d already in use", chassis_nr);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>  static void spapr_pci_plug(HotplugHandler *plug_handler,
>                             DeviceState *plugged_dev, Error **errp)
>  {
> @@ -1491,6 +1542,12 @@ static void spapr_pci_plug(HotplugHandler *plug_handler,
>      PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev)));
>      uint32_t slotnr = PCI_SLOT(pdev->devfn);
>  
> +    if (pc->is_bridge) {
> +        if (!bridge_has_valid_chassis_nr(OBJECT(plugged_dev), errp)) {
> +            return;
> +        }
> +    }
> +


Add a comment here explaning DRC ids are generated from chassis_nr and
these need to be unique?


>      /* if DR is disabled we don't need to do anything in the case of
>       * hotplug or coldplug callbacks
>       */
>

With above fixed:

Acked-by: Michael S. Tsirkin <mst@redhat.com>
 
Feel free to merge.
Greg Kurz July 16, 2020, 10:32 a.m. UTC | #3
On Thu, 16 Jul 2020 14:45:40 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> > Some recent error handling cleanups unveiled issues with our support of
> > PCI bridges:
> > 
> > 1) QEMU aborts when using non-standard PCI bridge types,
> >    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> > 
> > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> > Unexpected error in object_property_find() at qom/object.c:1240:
> > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> > Aborted (core dumped)
> 
> Oops, I thought we had a check that we actually had a "pci-bridge"
> device before continuing with the hotplug, but I guess not.
> 

Ah... are you suggesting we should explicitly check the actual type
of the bridge rather than looking for the "chassis_nr" property ?

> > This happens because we assume all PCI bridge types to have a "chassis_nr"
> > property. This property only exists with the standard PCI bridge type
> > "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
> > much simpler to check the presence of "chassis_nr" earlier.
> 
> Hrm, right, 7ef1553dac was not really correct since add_drcs() really
> can fail.
> 

Yes.

> > 2) QEMU abort if same "chassis_nr" value is used several times,
> >    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
> >    object_property_add() & friends"
> > 
> > $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
> >                         -device pci-bridge,chassis_nr=1
> > Unexpected error in object_property_try_add() at qom/object.c:1167:
> > qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
> > Aborted (core dumped)
> > 
> > This happens because we assume that "chassis_nr" values are unique, but
> > nobody enforces that and we end up generating duplicate DRC ids. The PCI
> > code doesn't really care for duplicate "chassis_nr" properties since it
> > is only used to initialize the "Chassis Number Register" of the bridge,
> > with no functional impact on QEMU. So, even if passing the same value
> > several times might look weird, it never broke anything before, so
> > I guess we don't necessarily want to enforce strict checking in the PCI
> > code now.
> 
> Yeah, I guess.  I'm pretty sure that the chassis number of bridges is
> supposed to be system-unique (well, unique within the PCI domain at
> least, I guess) as part of the hardware spec.  So specifying multiple
> chassis ids the same is a user error, but we need a better failure
> mode.
> 

According to section 3.2.6.4 of "PCI-to-PCI Bridge Architecture
Specification", the chassis number is exposed to the OS as a 
non-volatile r/w register. It seems to be expected that chassis
numbers might collide, in which case the system software can
overwrite the register with a new number. So I'm not sure that
specifying the same number multiple times is an actual user error.

> > Workaround both issues in the PAPR code: check that the bridge has a
> > unique and non null "chassis_nr" when plugging it into its parent bus.
> >
> > Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
> 
> Arguably, it's really fixing 7ef1553dac.
> 

True for issue 1 but not for issue 2, which is the result of
05929a6c5dfe (switch to "chassis_nr" introduces a condition
to end up with duplicate DRC ids) and d2623129a7de (assert
when trying to add a duplicated DRC).

I'm starting to think I should maybe split this in
two patches. One for each issue.

> > Reported-by: Thomas Huth <thuth@redhat.com>
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> 
> I had a few misgivings about the details of this, but I think I've
> convinced myself they're fine.  There's a couple of things I'd like to
> polish, but I'll do that as a follow up.
> 

Some fixes for d2623129a7de just got merged. Let me have a look
again.

> > ---
> >  hw/ppc/spapr_pci.c |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 57 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 329002ac040e..09d52ef7954d 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1480,6 +1480,57 @@ static void spapr_pci_bridge_plug(SpaprPhbState *phb,
> >      add_drcs(phb, bus);
> >  }
> >  
> > +/* Returns non-zero if the value of "chassis_nr" is already in use */
> > +static int check_chassis_nr(Object *obj, void *opaque)
> > +{
> > +    int new_chassis_nr =
> > +        object_property_get_uint(opaque, "chassis_nr", &error_abort);
> > +    int chassis_nr =
> > +        object_property_get_uint(obj, "chassis_nr", NULL);
> > +
> > +    if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) {
> > +        return 0;
> > +    }
> > +
> > +    /* Skip unsupported bridge types */
> > +    if (!chassis_nr) {
> > +        return 0;
> > +    }
> > +
> > +    /* Skip self */
> > +    if (obj == opaque) {
> > +        return 0;
> > +    }
> > +
> > +    return chassis_nr == new_chassis_nr;
> > +}
> > +
> > +static bool bridge_has_valid_chassis_nr(Object *bridge, Error **errp)
> > +{
> > +    int chassis_nr =
> > +        object_property_get_uint(bridge, "chassis_nr", NULL);
> > +
> > +    /*
> > +     * slotid_cap_init() already ensures that "chassis_nr" isn't null for
> > +     * standard PCI bridges, so this really tells if "chassis_nr" is present
> > +     * or not.
> > +     */
> > +    if (!chassis_nr) {
> > +        error_setg(errp, "PCI Bridge lacks a \"chassis_nr\" property");
> > +        error_append_hint(errp, "Try -device pci-bridge instead.\n");
> > +        return false;
> > +    }
> > +
> > +    /* We want unique values for "chassis_nr" */
> > +    if (object_child_foreach_recursive(object_get_root(), check_chassis_nr,
> > +                                       bridge)) {
> > +        error_setg(errp, "Bridge chassis %d already in use", chassis_nr);
> > +        return false;
> > +    }
> > +
> > +    return true;
> > +}
> > +
> >  static void spapr_pci_plug(HotplugHandler *plug_handler,
> >                             DeviceState *plugged_dev, Error **errp)
> >  {
> > @@ -1491,6 +1542,12 @@ static void spapr_pci_plug(HotplugHandler *plug_handler,
> >      PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev)));
> >      uint32_t slotnr = PCI_SLOT(pdev->devfn);
> >  
> > +    if (pc->is_bridge) {
> > +        if (!bridge_has_valid_chassis_nr(OBJECT(plugged_dev), errp)) {
> > +            return;
> > +        }
> > +    }
> > +
> >      /* if DR is disabled we don't need to do anything in the case of
> >       * hotplug or coldplug callbacks
> >       */
> > 
> > 
>
Greg Kurz July 16, 2020, 10:34 a.m. UTC | #4
On Thu, 16 Jul 2020 02:53:07 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> > Some recent error handling cleanups unveiled issues with our support of
> > PCI bridges:
> > 
> > 1) QEMU aborts when using non-standard PCI bridge types,
> >    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> > 
> > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> > Unexpected error in object_property_find() at qom/object.c:1240:
> > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> > Aborted (core dumped)
> > 
> > This happens because we assume all PCI bridge types to have a "chassis_nr"
> > property. This property only exists with the standard PCI bridge type
> > "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
> > much simpler to check the presence of "chassis_nr" earlier.
> > 
> > 2) QEMU abort if same "chassis_nr" value is used several times,
> >    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
> >    object_property_add() & friends"
> > 
> > $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
> >                         -device pci-bridge,chassis_nr=1
> > Unexpected error in object_property_try_add() at qom/object.c:1167:
> > qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
> > Aborted (core dumped)
> > 
> > This happens because we assume that "chassis_nr" values are unique, but
> > nobody enforces that and we end up generating duplicate DRC ids. The PCI
> > code doesn't really care for duplicate "chassis_nr" properties since it
> > is only used to initialize the "Chassis Number Register" of the bridge,
> > with no functional impact on QEMU. So, even if passing the same value
> > several times might look weird, it never broke anything before, so
> > I guess we don't necessarily want to enforce strict checking in the PCI
> > code now.
> > 
> > Workaround both issues in the PAPR code: check that the bridge has a
> > unique and non null "chassis_nr" when plugging it into its parent bus.
> > 
> > Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
> > Reported-by: Thomas Huth <thuth@redhat.com>
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> 
> 
> 
> > ---
> >  hw/ppc/spapr_pci.c |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 57 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 329002ac040e..09d52ef7954d 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1480,6 +1480,57 @@ static void spapr_pci_bridge_plug(SpaprPhbState *phb,
> >      add_drcs(phb, bus);
> >  }
> >  
> > +/* Returns non-zero if the value of "chassis_nr" is already in use */
> > +static int check_chassis_nr(Object *obj, void *opaque)
> > +{
> > +    int new_chassis_nr =
> > +        object_property_get_uint(opaque, "chassis_nr", &error_abort);
> > +    int chassis_nr =
> > +        object_property_get_uint(obj, "chassis_nr", NULL);
> > +
> > +    if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) {
> > +        return 0;
> > +    }
> > +
> > +    /* Skip unsupported bridge types */
> > +    if (!chassis_nr) {
> > +        return 0;
> > +    }
> > +
> > +    /* Skip self */
> > +    if (obj == opaque) {
> > +        return 0;
> > +    }
> > +
> > +    return chassis_nr == new_chassis_nr;
> > +}
> > +
> > +static bool bridge_has_valid_chassis_nr(Object *bridge, Error **errp)
> 
> I would rename this "bridge_has_unique_chassis_nr".
> 

Right.

> > +{
> > +    int chassis_nr =
> > +        object_property_get_uint(bridge, "chassis_nr", NULL);
> > +
> > +    /*
> > +     * slotid_cap_init() already ensures that "chassis_nr" isn't null for
> > +     * standard PCI bridges, so this really tells if "chassis_nr" is present
> > +     * or not.
> > +     */
> > +    if (!chassis_nr) {
> > +        error_setg(errp, "PCI Bridge lacks a \"chassis_nr\" property");
> > +        error_append_hint(errp, "Try -device pci-bridge instead.\n");
> > +        return false;
> > +    }
> > +
> > +    /* We want unique values for "chassis_nr" */
> > +    if (object_child_foreach_recursive(object_get_root(), check_chassis_nr,
> > +                                       bridge)) {
> > +        error_setg(errp, "Bridge chassis %d already in use", chassis_nr);
> > +        return false;
> > +    }
> > +
> > +    return true;
> > +}
> > +
> >  static void spapr_pci_plug(HotplugHandler *plug_handler,
> >                             DeviceState *plugged_dev, Error **errp)
> >  {
> > @@ -1491,6 +1542,12 @@ static void spapr_pci_plug(HotplugHandler *plug_handler,
> >      PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev)));
> >      uint32_t slotnr = PCI_SLOT(pdev->devfn);
> >  
> > +    if (pc->is_bridge) {
> > +        if (!bridge_has_valid_chassis_nr(OBJECT(plugged_dev), errp)) {
> > +            return;
> > +        }
> > +    }
> > +
> 
> 
> Add a comment here explaning DRC ids are generated from chassis_nr and
> these need to be unique?
> 

Makes sense.

> 
> >      /* if DR is disabled we don't need to do anything in the case of
> >       * hotplug or coldplug callbacks
> >       */
> >
> 
> With above fixed:
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>  
> Feel free to merge.
> 

Thanks !
David Gibson July 16, 2020, 1:11 p.m. UTC | #5
On Thu, Jul 16, 2020 at 12:32:44PM +0200, Greg Kurz wrote:
> On Thu, 16 Jul 2020 14:45:40 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> > > Some recent error handling cleanups unveiled issues with our support of
> > > PCI bridges:
> > > 
> > > 1) QEMU aborts when using non-standard PCI bridge types,
> > >    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> > > 
> > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> > > Unexpected error in object_property_find() at qom/object.c:1240:
> > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> > > Aborted (core dumped)
> > 
> > Oops, I thought we had a check that we actually had a "pci-bridge"
> > device before continuing with the hotplug, but I guess not.
> 
> Ah... are you suggesting we should explicitly check the actual type
> of the bridge rather than looking for the "chassis_nr" property ?

Uh.. I thought about it, but I don't think it matters much which way
we do it.

> > > This happens because we assume all PCI bridge types to have a "chassis_nr"
> > > property. This property only exists with the standard PCI bridge type
> > > "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
> > > much simpler to check the presence of "chassis_nr" earlier.
> > 
> > Hrm, right, 7ef1553dac was not really correct since add_drcs() really
> > can fail.
> 
> Yes.
> 
> > > 2) QEMU abort if same "chassis_nr" value is used several times,
> > >    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
> > >    object_property_add() & friends"
> > > 
> > > $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
> > >                         -device pci-bridge,chassis_nr=1
> > > Unexpected error in object_property_try_add() at qom/object.c:1167:
> > > qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
> > > Aborted (core dumped)
> > > 
> > > This happens because we assume that "chassis_nr" values are unique, but
> > > nobody enforces that and we end up generating duplicate DRC ids. The PCI
> > > code doesn't really care for duplicate "chassis_nr" properties since it
> > > is only used to initialize the "Chassis Number Register" of the bridge,
> > > with no functional impact on QEMU. So, even if passing the same value
> > > several times might look weird, it never broke anything before, so
> > > I guess we don't necessarily want to enforce strict checking in the PCI
> > > code now.
> > 
> > Yeah, I guess.  I'm pretty sure that the chassis number of bridges is
> > supposed to be system-unique (well, unique within the PCI domain at
> > least, I guess) as part of the hardware spec.  So specifying multiple
> > chassis ids the same is a user error, but we need a better failure
> > mode.
> 
> According to section 3.2.6.4 of "PCI-to-PCI Bridge Architecture
> Specification", the chassis number is exposed to the OS as a 
> non-volatile r/w register.

Argh.  Dammit.  I could have sworn I checked that chassis numbers were
supposed to be unique (and not guest alterable).  That's the whole
reason I chose it.

> It seems to be expected that chassis
> numbers might collide, in which case the system software can
> overwrite the register with a new number. So I'm not sure that
> specifying the same number multiple times is an actual user error.
> 
> > > Workaround both issues in the PAPR code: check that the bridge has a
> > > unique and non null "chassis_nr" when plugging it into its parent bus.
> > >
> > > Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
> > 
> > Arguably, it's really fixing 7ef1553dac.
> 
> True for issue 1 but not for issue 2, which is the result of
> 05929a6c5dfe (switch to "chassis_nr" introduces a condition
> to end up with duplicate DRC ids)

Well, technically.  But the method we had before was *way* more broken
than chassis numbers.  It was using bus number which is not stable
across the VM's lifetime, which is a non-starter.  Plus, the bus
number too won't be unique, until the guest has enumerated the bus,
which is too late for DRC creation.

The only reason we got away with it, was that it was basically dead
code - at that stage we didn't support hotplug under bridges, so we
never actually created DRCs except for the root bus.

> and d2623129a7de (assert
> when trying to add a duplicated DRC).

Well, again, only on a technicality.  It might not have immediatley
assert()ed, but adding a duplicated DRC was still completely broken
before that.

> I'm starting to think I should maybe split this in
> two patches. One for each issue.

At this stage, I'd kind of prefer to merge this fix now, with the
intention of doing a pull request tomorrow.  AFAICT none of the
suggested improvements can't be done as followups.

> > > Reported-by: Thomas Huth <thuth@redhat.com>
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > 
> > I had a few misgivings about the details of this, but I think I've
> > convinced myself they're fine.  There's a couple of things I'd like to
> > polish, but I'll do that as a follow up.
> 
> Some fixes for d2623129a7de just got merged. Let me have a look
> again.

In fact the main part of the polish I was thinking of didn't really
work out.  The only change I made was a tiny move to the check (it's
not really relevant until *after* we've checked if hotplug is enabled
at all).  So I just folded that in and put it into the ppc-for-5.1
tree.
Markus Armbruster July 16, 2020, 2:01 p.m. UTC | #6
David Gibson <david@gibson.dropbear.id.au> writes:

> On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
>> Some recent error handling cleanups unveiled issues with our support of
>> PCI bridges:
>> 
>> 1) QEMU aborts when using non-standard PCI bridge types,
>>    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
>> 
>> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
>> Unexpected error in object_property_find() at qom/object.c:1240:
>> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
>> Aborted (core dumped)
>
> Oops, I thought we had a check that we actually had a "pci-bridge"
> device before continuing with the hotplug, but I guess not.
>
>> This happens because we assume all PCI bridge types to have a "chassis_nr"
>> property. This property only exists with the standard PCI bridge type
>> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
>> much simpler to check the presence of "chassis_nr" earlier.
>
> Hrm, right, 7ef1553dac was not really correct since add_drcs() really
> can fail.

Right.  I failed to see that we can run into a bridge without a
"chassis_nr" here.

>> 2) QEMU abort if same "chassis_nr" value is used several times,
>>    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
>>    object_property_add() & friends"
>> 
>> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
>>                         -device pci-bridge,chassis_nr=1
>> Unexpected error in object_property_try_add() at qom/object.c:1167:
>> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
>> Aborted (core dumped)

Before d2623129a7de, the error got *ignored* in
spapr_dr_connector_new():

    SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
                                             uint32_t id)
    {
        SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
        char *prop_name;

        drc->id = id;
        drc->owner = owner;
        prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
                                    spapr_drc_index(drc));
        object_property_add_child(owner, prop_name, OBJECT(drc), &error_abort);
        object_unref(OBJECT(drc));
--->    object_property_set_bool(OBJECT(drc), true, "realized", NULL);
        g_free(prop_name);

        return drc;
    }

I doubt that's healthy.

>> This happens because we assume that "chassis_nr" values are unique, but
>> nobody enforces that and we end up generating duplicate DRC ids. The PCI
>> code doesn't really care for duplicate "chassis_nr" properties since it
>> is only used to initialize the "Chassis Number Register" of the bridge,
>> with no functional impact on QEMU. So, even if passing the same value
>> several times might look weird, it never broke anything before, so
>> I guess we don't necessarily want to enforce strict checking in the PCI
>> code now.
>
> Yeah, I guess.  I'm pretty sure that the chassis number of bridges is
> supposed to be system-unique (well, unique within the PCI domain at
> least, I guess) as part of the hardware spec.  So specifying multiple
> chassis ids the same is a user error, but we need a better failure
> mode.
>
>> Workaround both issues in the PAPR code: check that the bridge has a
>> unique and non null "chassis_nr" when plugging it into its parent bus.
>>
>> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
>
> Arguably, it's really fixing 7ef1553dac.

I agree 7ef1553dac broke the "use a bridge that doesn't have property
'chassis_nr' case.

I suspect the "duplicate chassis_nr" case has always been broken, and
d2623129a7de merely uncovered it.

If we can trigger the abort with hot-plug, then d2623129a7de made things
materially worse (new way to accidentally kill your guest and maybe lose
data), and I'd add a Fixes: blaming it.

>> Reported-by: Thomas Huth <thuth@redhat.com>
>> Signed-off-by: Greg Kurz <groug@kaod.org>
>
> I had a few misgivings about the details of this, but I think I've
> convinced myself they're fine.  There's a couple of things I'd like to
> polish, but I'll do that as a follow up.
Markus Armbruster July 16, 2020, 2:23 p.m. UTC | #7
David Gibson <david@gibson.dropbear.id.au> writes:

> On Thu, Jul 16, 2020 at 12:32:44PM +0200, Greg Kurz wrote:
>> On Thu, 16 Jul 2020 14:45:40 +1000
>> David Gibson <david@gibson.dropbear.id.au> wrote:
>> 
>> > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
>> > > Some recent error handling cleanups unveiled issues with our support of
>> > > PCI bridges:
>> > > 
>> > > 1) QEMU aborts when using non-standard PCI bridge types,
>> > >    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
>> > > 
>> > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
>> > > Unexpected error in object_property_find() at qom/object.c:1240:
>> > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
>> > > Aborted (core dumped)
>> > 
>> > Oops, I thought we had a check that we actually had a "pci-bridge"
>> > device before continuing with the hotplug, but I guess not.
>> 
>> Ah... are you suggesting we should explicitly check the actual type
>> of the bridge rather than looking for the "chassis_nr" property ?
>
> Uh.. I thought about it, but I don't think it matters much which way
> we do it.

Would it make sense to add the "chassis_nr" property to *all* PCI
bridge devices?

[...]
Greg Kurz July 16, 2020, 2:42 p.m. UTC | #8
On Thu, 16 Jul 2020 16:01:18 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> >> Some recent error handling cleanups unveiled issues with our support of
> >> PCI bridges:
> >> 
> >> 1) QEMU aborts when using non-standard PCI bridge types,
> >>    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> >> 
> >> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> >> Unexpected error in object_property_find() at qom/object.c:1240:
> >> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> >> Aborted (core dumped)
> >
> > Oops, I thought we had a check that we actually had a "pci-bridge"
> > device before continuing with the hotplug, but I guess not.
> >
> >> This happens because we assume all PCI bridge types to have a "chassis_nr"
> >> property. This property only exists with the standard PCI bridge type
> >> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
> >> much simpler to check the presence of "chassis_nr" earlier.
> >
> > Hrm, right, 7ef1553dac was not really correct since add_drcs() really
> > can fail.
> 
> Right.  I failed to see that we can run into a bridge without a
> "chassis_nr" here.
> 
> >> 2) QEMU abort if same "chassis_nr" value is used several times,
> >>    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
> >>    object_property_add() & friends"
> >> 
> >> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
> >>                         -device pci-bridge,chassis_nr=1
> >> Unexpected error in object_property_try_add() at qom/object.c:1167:
> >> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
> >> Aborted (core dumped)
> 
> Before d2623129a7de, the error got *ignored* in
> spapr_dr_connector_new():
> 
>     SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
>                                              uint32_t id)
>     {
>         SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
>         char *prop_name;
> 
>         drc->id = id;
>         drc->owner = owner;
>         prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
>                                     spapr_drc_index(drc));
>         object_property_add_child(owner, prop_name, OBJECT(drc), &error_abort);
>         object_unref(OBJECT(drc));
> --->    object_property_set_bool(OBJECT(drc), true, "realized", NULL);
>         g_free(prop_name);
> 
>         return drc;
>     }
> 
> I doubt that's healthy.
> 

This isn't. The object_property_set_bool() was later converted to
qdev_realize() (thanks again for the cleanups!) but the problem
remains. Realize can fail and I see now reason we don't do proper
error handling when it comes to the DRCs.

I'll look into fixing that.

> >> This happens because we assume that "chassis_nr" values are unique, but
> >> nobody enforces that and we end up generating duplicate DRC ids. The PCI
> >> code doesn't really care for duplicate "chassis_nr" properties since it
> >> is only used to initialize the "Chassis Number Register" of the bridge,
> >> with no functional impact on QEMU. So, even if passing the same value
> >> several times might look weird, it never broke anything before, so
> >> I guess we don't necessarily want to enforce strict checking in the PCI
> >> code now.
> >
> > Yeah, I guess.  I'm pretty sure that the chassis number of bridges is
> > supposed to be system-unique (well, unique within the PCI domain at
> > least, I guess) as part of the hardware spec.  So specifying multiple
> > chassis ids the same is a user error, but we need a better failure
> > mode.
> >
> >> Workaround both issues in the PAPR code: check that the bridge has a
> >> unique and non null "chassis_nr" when plugging it into its parent bus.
> >>
> >> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
> >
> > Arguably, it's really fixing 7ef1553dac.
> 
> I agree 7ef1553dac broke the "use a bridge that doesn't have property
> 'chassis_nr' case.
> 
> I suspect the "duplicate chassis_nr" case has always been broken, and
> d2623129a7de merely uncovered it.
> 

Yes.

> If we can trigger the abort with hot-plug, then d2623129a7de made things
> materially worse (new way to accidentally kill your guest and maybe lose
> data), and I'd add a Fixes: blaming it.
> 

Yes it does.

David,

Maybe consider folding a third Fixes: tag into this patch ?

> >> Reported-by: Thomas Huth <thuth@redhat.com>
> >> Signed-off-by: Greg Kurz <groug@kaod.org>
> >
> > I had a few misgivings about the details of this, but I think I've
> > convinced myself they're fine.  There's a couple of things I'd like to
> > polish, but I'll do that as a follow up.
>
Greg Kurz July 16, 2020, 2:57 p.m. UTC | #9
On Thu, 16 Jul 2020 16:23:52 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Thu, Jul 16, 2020 at 12:32:44PM +0200, Greg Kurz wrote:
> >> On Thu, 16 Jul 2020 14:45:40 +1000
> >> David Gibson <david@gibson.dropbear.id.au> wrote:
> >> 
> >> > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> >> > > Some recent error handling cleanups unveiled issues with our support of
> >> > > PCI bridges:
> >> > > 
> >> > > 1) QEMU aborts when using non-standard PCI bridge types,
> >> > >    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> >> > > 
> >> > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> >> > > Unexpected error in object_property_find() at qom/object.c:1240:
> >> > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> >> > > Aborted (core dumped)
> >> > 
> >> > Oops, I thought we had a check that we actually had a "pci-bridge"
> >> > device before continuing with the hotplug, but I guess not.
> >> 
> >> Ah... are you suggesting we should explicitly check the actual type
> >> of the bridge rather than looking for the "chassis_nr" property ?
> >
> > Uh.. I thought about it, but I don't think it matters much which way
> > we do it.
> 
> Would it make sense to add the "chassis_nr" property to *all* PCI
> bridge devices?
> 

I see that the "PCI Express to PCI/PCI-X Bridge Specification" mentions
a "Chassis Number Register" which looks very similar to the what exists
in standard PCI-to-PCI brdiges. This doesn't seem to be implemented in
our "pcie-pci-bridge" device model though, but of course I have no idea
why :)

Maybe Michael or Marcel (cc'd) can share some thoughts about that ?

> [...]
>
David Gibson July 16, 2020, 11:50 p.m. UTC | #10
On Thu, Jul 16, 2020 at 04:42:00PM +0200, Greg Kurz wrote:
> On Thu, 16 Jul 2020 16:01:18 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
> 
> > David Gibson <david@gibson.dropbear.id.au> writes:
> > 
> > > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> > >> Some recent error handling cleanups unveiled issues with our support of
> > >> PCI bridges:
> > >> 
> > >> 1) QEMU aborts when using non-standard PCI bridge types,
> > >>    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> > >> 
> > >> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> > >> Unexpected error in object_property_find() at qom/object.c:1240:
> > >> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> > >> Aborted (core dumped)
> > >
> > > Oops, I thought we had a check that we actually had a "pci-bridge"
> > > device before continuing with the hotplug, but I guess not.
> > >
> > >> This happens because we assume all PCI bridge types to have a "chassis_nr"
> > >> property. This property only exists with the standard PCI bridge type
> > >> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
> > >> much simpler to check the presence of "chassis_nr" earlier.
> > >
> > > Hrm, right, 7ef1553dac was not really correct since add_drcs() really
> > > can fail.
> > 
> > Right.  I failed to see that we can run into a bridge without a
> > "chassis_nr" here.

And I missed it on review, as well.

> > >> 2) QEMU abort if same "chassis_nr" value is used several times,
> > >>    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
> > >>    object_property_add() & friends"
> > >> 
> > >> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
> > >>                         -device pci-bridge,chassis_nr=1
> > >> Unexpected error in object_property_try_add() at qom/object.c:1167:
> > >> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
> > >> Aborted (core dumped)
> > 
> > Before d2623129a7de, the error got *ignored* in
> > spapr_dr_connector_new():
> > 
> >     SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> >                                              uint32_t id)
> >     {
> >         SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
> >         char *prop_name;
> > 
> >         drc->id = id;
> >         drc->owner = owner;
> >         prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
> >                                     spapr_drc_index(drc));
> >         object_property_add_child(owner, prop_name, OBJECT(drc), &error_abort);
> >         object_unref(OBJECT(drc));
> > --->    object_property_set_bool(OBJECT(drc), true, "realized", NULL);
> >         g_free(prop_name);
> > 
> >         return drc;
> >     }
> > 
> > I doubt that's healthy.

Indeed.

> This isn't. The object_property_set_bool() was later converted to
> qdev_realize() (thanks again for the cleanups!) but the problem
> remains. Realize can fail and I see now reason we don't do proper
> error handling when it comes to the DRCs.
> 
> I'll look into fixing that.
> 
> > >> This happens because we assume that "chassis_nr" values are unique, but
> > >> nobody enforces that and we end up generating duplicate DRC ids. The PCI
> > >> code doesn't really care for duplicate "chassis_nr" properties since it
> > >> is only used to initialize the "Chassis Number Register" of the bridge,
> > >> with no functional impact on QEMU. So, even if passing the same value
> > >> several times might look weird, it never broke anything before, so
> > >> I guess we don't necessarily want to enforce strict checking in the PCI
> > >> code now.
> > >
> > > Yeah, I guess.  I'm pretty sure that the chassis number of bridges is
> > > supposed to be system-unique (well, unique within the PCI domain at
> > > least, I guess) as part of the hardware spec.  So specifying multiple
> > > chassis ids the same is a user error, but we need a better failure
> > > mode.
> > >
> > >> Workaround both issues in the PAPR code: check that the bridge has a
> > >> unique and non null "chassis_nr" when plugging it into its parent bus.
> > >>
> > >> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
> > >
> > > Arguably, it's really fixing 7ef1553dac.
> > 
> > I agree 7ef1553dac broke the "use a bridge that doesn't have property
> > 'chassis_nr' case.
> > 
> > I suspect the "duplicate chassis_nr" case has always been broken, and
> > d2623129a7de merely uncovered it.
> 
> Yes.

I agree.

> > If we can trigger the abort with hot-plug, then d2623129a7de made things
> > materially worse (new way to accidentally kill your guest and maybe lose
> > data), and I'd add a Fixes: blaming it.
> > 
> 
> Yes it does.
> 
> David,
> 
> Maybe consider folding a third Fixes: tag into this patch ?

Done.

> > >> Reported-by: Thomas Huth <thuth@redhat.com>
> > >> Signed-off-by: Greg Kurz <groug@kaod.org>
> > >
> > > I had a few misgivings about the details of this, but I think I've
> > > convinced myself they're fine.  There's a couple of things I'd like to
> > > polish, but I'll do that as a follow up.
> > 
>
David Gibson July 16, 2020, 11:57 p.m. UTC | #11
On Thu, Jul 16, 2020 at 04:57:54PM +0200, Greg Kurz wrote:
> On Thu, 16 Jul 2020 16:23:52 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
> 
> > David Gibson <david@gibson.dropbear.id.au> writes:
> > 
> > > On Thu, Jul 16, 2020 at 12:32:44PM +0200, Greg Kurz wrote:
> > >> On Thu, 16 Jul 2020 14:45:40 +1000
> > >> David Gibson <david@gibson.dropbear.id.au> wrote:
> > >> 
> > >> > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> > >> > > Some recent error handling cleanups unveiled issues with our support of
> > >> > > PCI bridges:
> > >> > > 
> > >> > > 1) QEMU aborts when using non-standard PCI bridge types,
> > >> > >    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> > >> > > 
> > >> > > $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> > >> > > Unexpected error in object_property_find() at qom/object.c:1240:
> > >> > > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> > >> > > Aborted (core dumped)
> > >> > 
> > >> > Oops, I thought we had a check that we actually had a "pci-bridge"
> > >> > device before continuing with the hotplug, but I guess not.
> > >> 
> > >> Ah... are you suggesting we should explicitly check the actual type
> > >> of the bridge rather than looking for the "chassis_nr" property ?
> > >
> > > Uh.. I thought about it, but I don't think it matters much which way
> > > we do it.
> > 
> > Would it make sense to add the "chassis_nr" property to *all* PCI
> > bridge devices?
> > 
> 
> I see that the "PCI Express to PCI/PCI-X Bridge Specification" mentions
> a "Chassis Number Register" which looks very similar to the what exists
> in standard PCI-to-PCI brdiges. This doesn't seem to be implemented in
> our "pcie-pci-bridge" device model though, but of course I have no idea
> why :)

We could consider it, but I don't think there's a lot to be gained by
it at this stage.  I don't think there's really any reason to want to
use bridges other than plain "pci-bridge" on the pseries machine.

PCI is a bit weird on pseries, since it's explicitly paravirt.
Although you can use extended config space, and thereby PCI-E devices
on it, the topology really looks pretty much identical to vanilla
PCI.  So, I don't think there's any reason to use PCI-E bridges on
pseries.

Other than PCI-E bridges of various sorts, a quick scan suggests all
the other bridge types in qemu are weird variants that are mostly
specific to some particular platform.  I don't see any reason we'd
want those on pseries either.

> Maybe Michael or Marcel (cc'd) can share some thoughts about that ?
diff mbox series

Patch

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 329002ac040e..09d52ef7954d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1480,6 +1480,57 @@  static void spapr_pci_bridge_plug(SpaprPhbState *phb,
     add_drcs(phb, bus);
 }
 
+/* Returns non-zero if the value of "chassis_nr" is already in use */
+static int check_chassis_nr(Object *obj, void *opaque)
+{
+    int new_chassis_nr =
+        object_property_get_uint(opaque, "chassis_nr", &error_abort);
+    int chassis_nr =
+        object_property_get_uint(obj, "chassis_nr", NULL);
+
+    if (!object_dynamic_cast(obj, TYPE_PCI_BRIDGE)) {
+        return 0;
+    }
+
+    /* Skip unsupported bridge types */
+    if (!chassis_nr) {
+        return 0;
+    }
+
+    /* Skip self */
+    if (obj == opaque) {
+        return 0;
+    }
+
+    return chassis_nr == new_chassis_nr;
+}
+
+static bool bridge_has_valid_chassis_nr(Object *bridge, Error **errp)
+{
+    int chassis_nr =
+        object_property_get_uint(bridge, "chassis_nr", NULL);
+
+    /*
+     * slotid_cap_init() already ensures that "chassis_nr" isn't null for
+     * standard PCI bridges, so this really tells if "chassis_nr" is present
+     * or not.
+     */
+    if (!chassis_nr) {
+        error_setg(errp, "PCI Bridge lacks a \"chassis_nr\" property");
+        error_append_hint(errp, "Try -device pci-bridge instead.\n");
+        return false;
+    }
+
+    /* We want unique values for "chassis_nr" */
+    if (object_child_foreach_recursive(object_get_root(), check_chassis_nr,
+                                       bridge)) {
+        error_setg(errp, "Bridge chassis %d already in use", chassis_nr);
+        return false;
+    }
+
+    return true;
+}
+
 static void spapr_pci_plug(HotplugHandler *plug_handler,
                            DeviceState *plugged_dev, Error **errp)
 {
@@ -1491,6 +1542,12 @@  static void spapr_pci_plug(HotplugHandler *plug_handler,
     PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev)));
     uint32_t slotnr = PCI_SLOT(pdev->devfn);
 
+    if (pc->is_bridge) {
+        if (!bridge_has_valid_chassis_nr(OBJECT(plugged_dev), errp)) {
+            return;
+        }
+    }
+
     /* if DR is disabled we don't need to do anything in the case of
      * hotplug or coldplug callbacks
      */