Message ID | 20201218103400.689660-7-groug@kaod.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | spapr: Fix visibility and traversal of DR connectors | expand |
On 12/18/20 7:34 AM, Greg Kurz wrote: > Modeling DR connectors as individual devices raises some > concerns, as already discussed a year ago in this thread: > > https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/ > > First, high maxmem settings creates too many DRC devices. > This causes scalability issues. It severely increase boot s/increase/increases > time because the multiple traversals of the DRC list that > are performed during machine setup are quadratic operations. > This is directly related to the fact that DRCs are modeled > as individual devices and added to the composition tree. > > Second, DR connectors are really an internal concept of > PAPR. They aren't something that the user or management > layer can manipulate in any way. We already don't allow > their creation with device_add by clearing user_creatable. > > DR connectors don't even need to be modeled as actual > devices since they don't sit in a bus. They just need > to be associated to an 'owner' object and to have the > equivalent of realize/unrealize functions. > > Downgrade them to be simple objects. Convert the existing > realize() and unrealize() to be methods of the DR connector > base class. Also have the base class to inherit from the > vmstate_if interface directly. The get_id() hook simply > returns NULL, just as device_vmstate_if_get_id() does for > devices that don't sit in a bus. The DR connector is no > longer made a child object. This means that it must be > explicitely freed when no longer needed. This is only s/explicitely/explicitly > required for PHBs and PCI bridges actually : have them to > free the DRC with spapr_dr_connector_free() instead of > object_unparent(). > > No longer add the DRCs to the QOM composition tree. Track > them with a glib hash table using the global DRC index as > the key instead. This makes traversal a linear operation. > > Signed-off-by: Greg Kurz <groug@kaod.org> > --- Code LGTM. The maintainer is welcome to fix the nits while pushing. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> > include/hw/ppc/spapr_drc.h | 8 +- > hw/ppc/spapr_drc.c | 166 ++++++++++++++----------------------- > hw/ppc/spapr_pci.c | 2 +- > 3 files changed, 69 insertions(+), 107 deletions(-) > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h > index 8982927d5c24..a26aa8b9d4c3 100644 > --- a/include/hw/ppc/spapr_drc.h > +++ b/include/hw/ppc/spapr_drc.h > @@ -170,7 +170,7 @@ typedef enum { > > typedef struct SpaprDrc { > /*< private >*/ > - DeviceState parent; > + Object parent; > > uint32_t id; > Object *owner; > @@ -193,7 +193,7 @@ struct SpaprMachineState; > > typedef struct SpaprDrcClass { > /*< private >*/ > - DeviceClass parent; > + ObjectClass parent; > SpaprDrcState empty_state; > SpaprDrcState ready_state; > > @@ -209,6 +209,9 @@ typedef struct SpaprDrcClass { > > int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr, > void *fdt, int *fdt_start_offset, Error **errp); > + > + void (*realize)(SpaprDrc *drc); > + void (*unrealize)(SpaprDrc *drc); > } SpaprDrcClass; > > typedef struct SpaprDrcPhysical { > @@ -232,6 +235,7 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc); > > SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, > uint32_t id); > +void spapr_dr_connector_free(SpaprDrc *drc); > SpaprDrc *spapr_drc_by_index(uint32_t index); > SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id); > int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask); > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c > index 8571d5bafe4e..e26763f8b5a4 100644 > --- a/hw/ppc/spapr_drc.c > +++ b/hw/ppc/spapr_drc.c > @@ -27,7 +27,6 @@ > #include "sysemu/reset.h" > #include "trace.h" > > -#define DRC_CONTAINER_PATH "/dr-connector" > #define DRC_INDEX_TYPE_SHIFT 28 > #define DRC_INDEX_ID_MASK ((1ULL << DRC_INDEX_TYPE_SHIFT) - 1) > > @@ -503,65 +502,56 @@ static const VMStateDescription vmstate_spapr_drc = { > } > }; > > -static void drc_realize(DeviceState *d, Error **errp) > +static GHashTable *drc_hash_table(void) > { > - SpaprDrc *drc = SPAPR_DR_CONNECTOR(d); > - Object *root_container; > - gchar *link_name; > - const char *child_name; > + static GHashTable *dht; > > + if (!dht) { > + dht = g_hash_table_new(NULL, NULL); > + } > + > + return dht; > +} > + > + > +static void drc_realize(SpaprDrc *drc) > +{ > trace_spapr_drc_realize(spapr_drc_index(drc)); > - /* NOTE: we do this as part of realize/unrealize due to the fact > - * that the guest will communicate with the DRC via RTAS calls > - * referencing the global DRC index. By unlinking the DRC > - * from DRC_CONTAINER_PATH/<drc_index> we effectively make it > - * inaccessible by the guest, since lookups rely on this path > - * existing in the composition tree > - */ > - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > - link_name = g_strdup_printf("%x", spapr_drc_index(drc)); > - child_name = object_get_canonical_path_component(OBJECT(drc)); > - trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name); > - object_property_add_alias(root_container, link_name, > - drc->owner, child_name); > - g_free(link_name); > + > + g_hash_table_insert(drc_hash_table(), > + GUINT_TO_POINTER(spapr_drc_index(drc)), drc); > vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc, > drc); > trace_spapr_drc_realize_complete(spapr_drc_index(drc)); > } > > -static void drc_unrealize(DeviceState *d) > +static void drc_unrealize(SpaprDrc *drc) > { > - SpaprDrc *drc = SPAPR_DR_CONNECTOR(d); > - Object *root_container; > - gchar *name; > - > trace_spapr_drc_unrealize(spapr_drc_index(drc)); > vmstate_unregister(VMSTATE_IF(drc), &vmstate_spapr_drc, drc); > - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > - name = g_strdup_printf("%x", spapr_drc_index(drc)); > - object_property_del(root_container, name); > - g_free(name); > + g_hash_table_remove(drc_hash_table(), > + GUINT_TO_POINTER(spapr_drc_index(drc))); > } > > SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, > uint32_t id) > { > SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type)); > - char *prop_name; > > drc->id = id; > - drc->owner = owner; > - prop_name = g_strdup_printf("dr-connector[%"PRIu32"]", > - spapr_drc_index(drc)); > - object_property_add_child(owner, prop_name, OBJECT(drc)); > - object_unref(OBJECT(drc)); > - qdev_realize(DEVICE(drc), NULL, NULL); > - g_free(prop_name); > + drc->owner = object_ref(owner); > + SPAPR_DR_CONNECTOR_GET_CLASS(drc)->realize(drc); > > return drc; > } > > +void spapr_dr_connector_free(SpaprDrc *drc) > +{ > + SPAPR_DR_CONNECTOR_GET_CLASS(drc)->unrealize(drc); > + object_unref(drc->owner); > + object_unref(drc); > +} > + > static void spapr_dr_connector_instance_init(Object *obj) > { > SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj); > @@ -575,17 +565,19 @@ static void spapr_dr_connector_instance_init(Object *obj) > drc->state = drck->empty_state; > } > > +static char *drc_vmstate_if_get_id(VMStateIf *obj) > +{ > + return NULL; > +} > + > static void spapr_dr_connector_class_init(ObjectClass *k, void *data) > { > - DeviceClass *dk = DEVICE_CLASS(k); > + SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k); > + VMStateIfClass *vc = VMSTATE_IF_CLASS(k); > > - dk->realize = drc_realize; > - dk->unrealize = drc_unrealize; > - /* > - * Reason: DR connector needs to be wired to either the machine or to a > - * PHB in spapr_dr_connector_new(). > - */ > - dk->user_creatable = false; > + drck->realize = drc_realize; > + drck->unrealize = drc_unrealize; > + vc->get_id = drc_vmstate_if_get_id; > } > > static bool drc_physical_needed(void *opaque) > @@ -623,39 +615,32 @@ static void drc_physical_reset(void *opaque) > } > } > > -static void realize_physical(DeviceState *d, Error **errp) > +static void realize_physical(SpaprDrc *drc) > { > - SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d); > - Error *local_err = NULL; > - > - drc_realize(d, &local_err); > - if (local_err) { > - error_propagate(errp, local_err); > - return; > - } > + SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc); > > + drc_realize(drc); > vmstate_register(VMSTATE_IF(drcp), > spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)), > &vmstate_spapr_drc_physical, drcp); > qemu_register_reset(drc_physical_reset, drcp); > } > > -static void unrealize_physical(DeviceState *d) > +static void unrealize_physical(SpaprDrc *drc) > { > - SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d); > + SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc); > > - drc_unrealize(d); > - vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp); > qemu_unregister_reset(drc_physical_reset, drcp); > + vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp); > + drc_unrealize(drc); > } > > static void spapr_drc_physical_class_init(ObjectClass *k, void *data) > { > - DeviceClass *dk = DEVICE_CLASS(k); > SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k); > > - dk->realize = realize_physical; > - dk->unrealize = unrealize_physical; > + drck->realize = realize_physical; > + drck->unrealize = unrealize_physical; > drck->dr_entity_sense = physical_entity_sense; > drck->isolate = drc_isolate_physical; > drck->unisolate = drc_unisolate_physical; > @@ -731,12 +716,16 @@ static void spapr_drc_pmem_class_init(ObjectClass *k, void *data) > > static const TypeInfo spapr_dr_connector_info = { > .name = TYPE_SPAPR_DR_CONNECTOR, > - .parent = TYPE_DEVICE, > + .parent = TYPE_OBJECT, > .instance_size = sizeof(SpaprDrc), > .instance_init = spapr_dr_connector_instance_init, > .class_size = sizeof(SpaprDrcClass), > .class_init = spapr_dr_connector_class_init, > .abstract = true, > + .interfaces = (InterfaceInfo[]) { > + { TYPE_VMSTATE_IF }, > + { } > + }, > }; > > static const TypeInfo spapr_drc_physical_info = { > @@ -789,14 +778,9 @@ static const TypeInfo spapr_drc_pmem_info = { > > SpaprDrc *spapr_drc_by_index(uint32_t index) > { > - Object *obj; > - gchar *name; > - > - name = g_strdup_printf("%s/%x", DRC_CONTAINER_PATH, index); > - obj = object_resolve_path(name, NULL); > - g_free(name); > - > - return !obj ? NULL : SPAPR_DR_CONNECTOR(obj); > + return > + SPAPR_DR_CONNECTOR(g_hash_table_lookup(drc_hash_table(), > + GUINT_TO_POINTER(index))); > } > > SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id) > @@ -824,13 +808,12 @@ SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id) > */ > int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask) > { > - Object *root_container; > - ObjectProperty *prop; > - ObjectPropertyIterator iter; > + GHashTableIter iter; > uint32_t drc_count = 0; > GArray *drc_indexes, *drc_power_domains; > GString *drc_names, *drc_types; > int ret; > + SpaprDrc *drc; > > /* > * This should really be only called once per node since it overwrites > @@ -851,26 +834,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask) > drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t)); > drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t)); > > - /* aliases for all DRConnector objects will be rooted in QOM > - * composition tree at DRC_CONTAINER_PATH > - */ > - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > - > - object_property_iter_init(&iter, root_container); > - while ((prop = object_property_iter_next(&iter))) { > - Object *obj; > - SpaprDrc *drc; > + g_hash_table_iter_init(&iter, drc_hash_table()); > + while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) { > SpaprDrcClass *drck; > char *drc_name = NULL; > uint32_t drc_index, drc_power_domain; > > - if (!strstart(prop->type, "link<", NULL)) { > - continue; > - } > - > - obj = object_property_get_link(root_container, prop->name, > - &error_abort); > - drc = SPAPR_DR_CONNECTOR(obj); > drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); > > if (owner && (drc->owner != owner)) { > @@ -951,23 +920,12 @@ out: > > void spapr_drc_reset_all(SpaprMachineState *spapr) > { > - Object *drc_container; > - ObjectProperty *prop; > - ObjectPropertyIterator iter; > + GHashTableIter iter; > + SpaprDrc *drc; > > - drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > restart: > - object_property_iter_init(&iter, drc_container); > - while ((prop = object_property_iter_next(&iter))) { > - SpaprDrc *drc; > - > - if (!strstart(prop->type, "link<", NULL)) { > - continue; > - } > - drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container, > - prop->name, > - &error_abort)); > - > + g_hash_table_iter_init(&iter, drc_hash_table()); > + while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) { > /* > * This will complete any pending plug/unplug requests. > * In case of a unplugged PHB or PCI bridge, this will > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > index 76d7c91e9c64..ca0cca664e3c 100644 > --- a/hw/ppc/spapr_pci.c > +++ b/hw/ppc/spapr_pci.c > @@ -1262,7 +1262,7 @@ static void remove_drcs(SpaprPhbState *phb, PCIBus *bus) > SpaprDrc *drc = drc_from_devfn(phb, chassis, i); > > if (drc) { > - object_unparent(OBJECT(drc)); > + spapr_dr_connector_free(drc); > } > } > } >
On Fri, Dec 18, 2020 at 11:34:00AM +0100, Greg Kurz wrote: > Modeling DR connectors as individual devices raises some > concerns, as already discussed a year ago in this thread: > > https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/ > > First, high maxmem settings creates too many DRC devices. > This causes scalability issues. It severely increase boot > time because the multiple traversals of the DRC list that > are performed during machine setup are quadratic operations. > This is directly related to the fact that DRCs are modeled > as individual devices and added to the composition tree. > > Second, DR connectors are really an internal concept of > PAPR. They aren't something that the user or management > layer can manipulate in any way. We already don't allow > their creation with device_add by clearing user_creatable. > > DR connectors don't even need to be modeled as actual > devices since they don't sit in a bus. They just need > to be associated to an 'owner' object and to have the > equivalent of realize/unrealize functions. > > Downgrade them to be simple objects. Convert the existing > realize() and unrealize() to be methods of the DR connector > base class. Also have the base class to inherit from the > vmstate_if interface directly. The get_id() hook simply > returns NULL, just as device_vmstate_if_get_id() does for > devices that don't sit in a bus. The DR connector is no > longer made a child object. This means that it must be > explicitely freed when no longer needed. This is only > required for PHBs and PCI bridges actually : have them to > free the DRC with spapr_dr_connector_free() instead of > object_unparent(). > > No longer add the DRCs to the QOM composition tree. Track > them with a glib hash table using the global DRC index as > the key instead. This makes traversal a linear operation. I have some reservations about this one. The main thing is that attaching migration state to something that's not a device seems a bit odd to me. AFAICT exactly one other non-device implements TYPE_VMSTATE_IF, and what it does isn't very clear to me. As I might have mentioned to you I had a different idea for how to address this problem: still use a TYPE_DEVICE, but have it manage a whole array of DRCs as one unit, rather than just a single one. Specifically I was thinking: * one array per PCI bus (DRCs for each function on the bus) * one array for each block of memory (so one for base memory, one for each DIMM) * one array for all the cpus * one array for all the PHBs It has some disadvantages compared to your scheme: it still leaves (less) devices which can't be user managed, which is a bit ugly. On the other hand, each of those arrays can reasonably be dense, so we can use direct indexing rather than a hash table, which is a bit nicer. Thoughts? > > Signed-off-by: Greg Kurz <groug@kaod.org> > --- > include/hw/ppc/spapr_drc.h | 8 +- > hw/ppc/spapr_drc.c | 166 ++++++++++++++----------------------- > hw/ppc/spapr_pci.c | 2 +- > 3 files changed, 69 insertions(+), 107 deletions(-) > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h > index 8982927d5c24..a26aa8b9d4c3 100644 > --- a/include/hw/ppc/spapr_drc.h > +++ b/include/hw/ppc/spapr_drc.h > @@ -170,7 +170,7 @@ typedef enum { > > typedef struct SpaprDrc { > /*< private >*/ > - DeviceState parent; > + Object parent; > > uint32_t id; > Object *owner; > @@ -193,7 +193,7 @@ struct SpaprMachineState; > > typedef struct SpaprDrcClass { > /*< private >*/ > - DeviceClass parent; > + ObjectClass parent; > SpaprDrcState empty_state; > SpaprDrcState ready_state; > > @@ -209,6 +209,9 @@ typedef struct SpaprDrcClass { > > int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr, > void *fdt, int *fdt_start_offset, Error **errp); > + > + void (*realize)(SpaprDrc *drc); > + void (*unrealize)(SpaprDrc *drc); > } SpaprDrcClass; > > typedef struct SpaprDrcPhysical { > @@ -232,6 +235,7 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc); > > SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, > uint32_t id); > +void spapr_dr_connector_free(SpaprDrc *drc); > SpaprDrc *spapr_drc_by_index(uint32_t index); > SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id); > int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask); > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c > index 8571d5bafe4e..e26763f8b5a4 100644 > --- a/hw/ppc/spapr_drc.c > +++ b/hw/ppc/spapr_drc.c > @@ -27,7 +27,6 @@ > #include "sysemu/reset.h" > #include "trace.h" > > -#define DRC_CONTAINER_PATH "/dr-connector" > #define DRC_INDEX_TYPE_SHIFT 28 > #define DRC_INDEX_ID_MASK ((1ULL << DRC_INDEX_TYPE_SHIFT) - 1) > > @@ -503,65 +502,56 @@ static const VMStateDescription vmstate_spapr_drc = { > } > }; > > -static void drc_realize(DeviceState *d, Error **errp) > +static GHashTable *drc_hash_table(void) > { > - SpaprDrc *drc = SPAPR_DR_CONNECTOR(d); > - Object *root_container; > - gchar *link_name; > - const char *child_name; > + static GHashTable *dht; > > + if (!dht) { > + dht = g_hash_table_new(NULL, NULL); > + } > + > + return dht; > +} > + > + > +static void drc_realize(SpaprDrc *drc) > +{ > trace_spapr_drc_realize(spapr_drc_index(drc)); > - /* NOTE: we do this as part of realize/unrealize due to the fact > - * that the guest will communicate with the DRC via RTAS calls > - * referencing the global DRC index. By unlinking the DRC > - * from DRC_CONTAINER_PATH/<drc_index> we effectively make it > - * inaccessible by the guest, since lookups rely on this path > - * existing in the composition tree > - */ > - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > - link_name = g_strdup_printf("%x", spapr_drc_index(drc)); > - child_name = object_get_canonical_path_component(OBJECT(drc)); > - trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name); > - object_property_add_alias(root_container, link_name, > - drc->owner, child_name); > - g_free(link_name); > + > + g_hash_table_insert(drc_hash_table(), > + GUINT_TO_POINTER(spapr_drc_index(drc)), drc); > vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc, > drc); > trace_spapr_drc_realize_complete(spapr_drc_index(drc)); > } > > -static void drc_unrealize(DeviceState *d) > +static void drc_unrealize(SpaprDrc *drc) > { > - SpaprDrc *drc = SPAPR_DR_CONNECTOR(d); > - Object *root_container; > - gchar *name; > - > trace_spapr_drc_unrealize(spapr_drc_index(drc)); > vmstate_unregister(VMSTATE_IF(drc), &vmstate_spapr_drc, drc); > - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > - name = g_strdup_printf("%x", spapr_drc_index(drc)); > - object_property_del(root_container, name); > - g_free(name); > + g_hash_table_remove(drc_hash_table(), > + GUINT_TO_POINTER(spapr_drc_index(drc))); > } > > SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, > uint32_t id) > { > SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type)); > - char *prop_name; > > drc->id = id; > - drc->owner = owner; > - prop_name = g_strdup_printf("dr-connector[%"PRIu32"]", > - spapr_drc_index(drc)); > - object_property_add_child(owner, prop_name, OBJECT(drc)); > - object_unref(OBJECT(drc)); > - qdev_realize(DEVICE(drc), NULL, NULL); > - g_free(prop_name); > + drc->owner = object_ref(owner); > + SPAPR_DR_CONNECTOR_GET_CLASS(drc)->realize(drc); > > return drc; > } > > +void spapr_dr_connector_free(SpaprDrc *drc) > +{ > + SPAPR_DR_CONNECTOR_GET_CLASS(drc)->unrealize(drc); > + object_unref(drc->owner); > + object_unref(drc); > +} > + > static void spapr_dr_connector_instance_init(Object *obj) > { > SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj); > @@ -575,17 +565,19 @@ static void spapr_dr_connector_instance_init(Object *obj) > drc->state = drck->empty_state; > } > > +static char *drc_vmstate_if_get_id(VMStateIf *obj) > +{ > + return NULL; > +} > + > static void spapr_dr_connector_class_init(ObjectClass *k, void *data) > { > - DeviceClass *dk = DEVICE_CLASS(k); > + SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k); > + VMStateIfClass *vc = VMSTATE_IF_CLASS(k); > > - dk->realize = drc_realize; > - dk->unrealize = drc_unrealize; > - /* > - * Reason: DR connector needs to be wired to either the machine or to a > - * PHB in spapr_dr_connector_new(). > - */ > - dk->user_creatable = false; > + drck->realize = drc_realize; > + drck->unrealize = drc_unrealize; > + vc->get_id = drc_vmstate_if_get_id; > } > > static bool drc_physical_needed(void *opaque) > @@ -623,39 +615,32 @@ static void drc_physical_reset(void *opaque) > } > } > > -static void realize_physical(DeviceState *d, Error **errp) > +static void realize_physical(SpaprDrc *drc) > { > - SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d); > - Error *local_err = NULL; > - > - drc_realize(d, &local_err); > - if (local_err) { > - error_propagate(errp, local_err); > - return; > - } > + SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc); > > + drc_realize(drc); > vmstate_register(VMSTATE_IF(drcp), > spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)), > &vmstate_spapr_drc_physical, drcp); > qemu_register_reset(drc_physical_reset, drcp); > } > > -static void unrealize_physical(DeviceState *d) > +static void unrealize_physical(SpaprDrc *drc) > { > - SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d); > + SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc); > > - drc_unrealize(d); > - vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp); > qemu_unregister_reset(drc_physical_reset, drcp); > + vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp); > + drc_unrealize(drc); > } > > static void spapr_drc_physical_class_init(ObjectClass *k, void *data) > { > - DeviceClass *dk = DEVICE_CLASS(k); > SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k); > > - dk->realize = realize_physical; > - dk->unrealize = unrealize_physical; > + drck->realize = realize_physical; > + drck->unrealize = unrealize_physical; > drck->dr_entity_sense = physical_entity_sense; > drck->isolate = drc_isolate_physical; > drck->unisolate = drc_unisolate_physical; > @@ -731,12 +716,16 @@ static void spapr_drc_pmem_class_init(ObjectClass *k, void *data) > > static const TypeInfo spapr_dr_connector_info = { > .name = TYPE_SPAPR_DR_CONNECTOR, > - .parent = TYPE_DEVICE, > + .parent = TYPE_OBJECT, > .instance_size = sizeof(SpaprDrc), > .instance_init = spapr_dr_connector_instance_init, > .class_size = sizeof(SpaprDrcClass), > .class_init = spapr_dr_connector_class_init, > .abstract = true, > + .interfaces = (InterfaceInfo[]) { > + { TYPE_VMSTATE_IF }, > + { } > + }, > }; > > static const TypeInfo spapr_drc_physical_info = { > @@ -789,14 +778,9 @@ static const TypeInfo spapr_drc_pmem_info = { > > SpaprDrc *spapr_drc_by_index(uint32_t index) > { > - Object *obj; > - gchar *name; > - > - name = g_strdup_printf("%s/%x", DRC_CONTAINER_PATH, index); > - obj = object_resolve_path(name, NULL); > - g_free(name); > - > - return !obj ? NULL : SPAPR_DR_CONNECTOR(obj); > + return > + SPAPR_DR_CONNECTOR(g_hash_table_lookup(drc_hash_table(), > + GUINT_TO_POINTER(index))); > } > > SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id) > @@ -824,13 +808,12 @@ SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id) > */ > int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask) > { > - Object *root_container; > - ObjectProperty *prop; > - ObjectPropertyIterator iter; > + GHashTableIter iter; > uint32_t drc_count = 0; > GArray *drc_indexes, *drc_power_domains; > GString *drc_names, *drc_types; > int ret; > + SpaprDrc *drc; > > /* > * This should really be only called once per node since it overwrites > @@ -851,26 +834,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask) > drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t)); > drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t)); > > - /* aliases for all DRConnector objects will be rooted in QOM > - * composition tree at DRC_CONTAINER_PATH > - */ > - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > - > - object_property_iter_init(&iter, root_container); > - while ((prop = object_property_iter_next(&iter))) { > - Object *obj; > - SpaprDrc *drc; > + g_hash_table_iter_init(&iter, drc_hash_table()); > + while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) { > SpaprDrcClass *drck; > char *drc_name = NULL; > uint32_t drc_index, drc_power_domain; > > - if (!strstart(prop->type, "link<", NULL)) { > - continue; > - } > - > - obj = object_property_get_link(root_container, prop->name, > - &error_abort); > - drc = SPAPR_DR_CONNECTOR(obj); > drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); > > if (owner && (drc->owner != owner)) { > @@ -951,23 +920,12 @@ out: > > void spapr_drc_reset_all(SpaprMachineState *spapr) > { > - Object *drc_container; > - ObjectProperty *prop; > - ObjectPropertyIterator iter; > + GHashTableIter iter; > + SpaprDrc *drc; > > - drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > restart: > - object_property_iter_init(&iter, drc_container); > - while ((prop = object_property_iter_next(&iter))) { > - SpaprDrc *drc; > - > - if (!strstart(prop->type, "link<", NULL)) { > - continue; > - } > - drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container, > - prop->name, > - &error_abort)); > - > + g_hash_table_iter_init(&iter, drc_hash_table()); > + while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) { > /* > * This will complete any pending plug/unplug requests. > * In case of a unplugged PHB or PCI bridge, this will > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > index 76d7c91e9c64..ca0cca664e3c 100644 > --- a/hw/ppc/spapr_pci.c > +++ b/hw/ppc/spapr_pci.c > @@ -1262,7 +1262,7 @@ static void remove_drcs(SpaprPhbState *phb, PCIBus *bus) > SpaprDrc *drc = drc_from_devfn(phb, chassis, i); > > if (drc) { > - object_unparent(OBJECT(drc)); > + spapr_dr_connector_free(drc); > } > } > }
On Mon, 28 Dec 2020 19:28:39 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > On Fri, Dec 18, 2020 at 11:34:00AM +0100, Greg Kurz wrote: > > Modeling DR connectors as individual devices raises some > > concerns, as already discussed a year ago in this thread: > > > > https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/ > > > > First, high maxmem settings creates too many DRC devices. > > This causes scalability issues. It severely increase boot > > time because the multiple traversals of the DRC list that > > are performed during machine setup are quadratic operations. > > This is directly related to the fact that DRCs are modeled > > as individual devices and added to the composition tree. > > > > Second, DR connectors are really an internal concept of > > PAPR. They aren't something that the user or management > > layer can manipulate in any way. We already don't allow > > their creation with device_add by clearing user_creatable. > > > > DR connectors don't even need to be modeled as actual > > devices since they don't sit in a bus. They just need > > to be associated to an 'owner' object and to have the > > equivalent of realize/unrealize functions. > > > > Downgrade them to be simple objects. Convert the existing > > realize() and unrealize() to be methods of the DR connector > > base class. Also have the base class to inherit from the > > vmstate_if interface directly. The get_id() hook simply > > returns NULL, just as device_vmstate_if_get_id() does for > > devices that don't sit in a bus. The DR connector is no > > longer made a child object. This means that it must be > > explicitely freed when no longer needed. This is only > > required for PHBs and PCI bridges actually : have them to > > free the DRC with spapr_dr_connector_free() instead of > > object_unparent(). > > > > No longer add the DRCs to the QOM composition tree. Track > > them with a glib hash table using the global DRC index as > > the key instead. This makes traversal a linear operation. > > I have some reservations about this one. The main thing is that > attaching migration state to something that's not a device seems a bit > odd to me. AFAICT exactly one other non-device implements > TYPE_VMSTATE_IF, and what it does isn't very clear to me. > Even with your proposal below, the current SpaprDrc type, which is used all over the place, will stop being a TYPE_DEVICE but we still need to support migration with existing machine types for which DRC are devices. Implementing TYPE_VMSTATE_IF is essentially a hack that allows to do that without keeping the current TYPE_DEVICE based implementation around. > As I might have mentioned to you I had a different idea for how to > address this problem: still use a TYPE_DEVICE, but have it manage a > whole array of DRCs as one unit, rather than just a single one. > Specifically I was thinking: > > * one array per PCI bus (DRCs for each function on the bus) > * one array for each block of memory (so one for base memory, one for > each DIMM) > * one array for all the cpus > * one array for all the PHBs > > It has some disadvantages compared to your scheme: it still leaves > (less) devices which can't be user managed, which is a bit ugly. On > the other hand, each of those arrays can reasonably be dense, so we > can use direct indexing rather than a hash table, which is a bit > nicer. > > Thoughts? > I find it a bit overkill to introduce a new TYPE_DEVICE (let's call it a DRC manager) for something that: - doesn't sit on a bus - can't be user managed - isn't directly represented to the guest as a full node in the DT unlike all other devices, but just as indexes in some properties of actual DR capable devices. Given that the DRC index space is global and this is what the guest passes to DR RTAS calls, we can't do direct indexing, strictly speaking. We need at least some logic to dispatch operations on individual DRC states to the appropriate DRC manager. This logic belongs to the machine IMHO. This shouldn't be too complex for CPUs and PHBs since they sit directly under the machine and have 1:1 relation with the attached device. It just boils down to instantiate some DRC managers during machine init: - range [ 0x10000000 ... 0x10000000 + ${max_cpus} [ for CPUs - range [ 0x20000000 ... 0x20000000 + 31 [ for PHBs For memory, the code currently generates DRC indexes in the range: [ 0x80000000 ... 0x80000000 + ${base_ram_size}/256M ... ${max_ram_size}/256M [ ie. it doesn't generate DRC indexes for the base memory AFAICT. Also each DIMM can be of arbitrary size, ie. consume an arbitrary amount of DRC indexes. So the machine would instantiate SPAPR_MAX_RAM_SLOTS (32) DRC managers, each capable of managing the full set of LMB DRCs, just in case ? Likely a lot of zeroes with high maxmem settings but I guess we can live with it. PCI busses would need some extra care though since the machine doesn't know about them. This would require to be able to register/unregister DRC managers for SPAPR_DR_CONNECTOR_TYPE_PCI indexes, so that the dispatching logic know about the ranges they cover (PHB internals). And finally comes migration : I cannot think of a way to generate the VMState sections used by existing machine types out of a set of arrays of integers... We could keep the current implementation around and use it with older machine types, but this certainly looks terrible from a maintenance perspective. Did you have any suggestion to handle that ? I seem to remember that one of the motivation to have arrays of DRCs is to avoid the inflation of VMState sections that we currently get with high maxmem settings, and it is considered preferable to stream sparse arrays. This could be achieved by building these arrays out of the global DRC hash table in a machine pre-save handler and migrate them in a subsection for the default machine type. Older machine types would continue with the current VMState sections thanks to the TYPE_VMSTATE_IF hack. Does this seem a reasonable trade-off to be able to support older and newer machine types with the same implementation ? > > > > Signed-off-by: Greg Kurz <groug@kaod.org> > > --- > > include/hw/ppc/spapr_drc.h | 8 +- > > hw/ppc/spapr_drc.c | 166 ++++++++++++++----------------------- > > hw/ppc/spapr_pci.c | 2 +- > > 3 files changed, 69 insertions(+), 107 deletions(-) > > > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h > > index 8982927d5c24..a26aa8b9d4c3 100644 > > --- a/include/hw/ppc/spapr_drc.h > > +++ b/include/hw/ppc/spapr_drc.h > > @@ -170,7 +170,7 @@ typedef enum { > > > > typedef struct SpaprDrc { > > /*< private >*/ > > - DeviceState parent; > > + Object parent; > > > > uint32_t id; > > Object *owner; > > @@ -193,7 +193,7 @@ struct SpaprMachineState; > > > > typedef struct SpaprDrcClass { > > /*< private >*/ > > - DeviceClass parent; > > + ObjectClass parent; > > SpaprDrcState empty_state; > > SpaprDrcState ready_state; > > > > @@ -209,6 +209,9 @@ typedef struct SpaprDrcClass { > > > > int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr, > > void *fdt, int *fdt_start_offset, Error **errp); > > + > > + void (*realize)(SpaprDrc *drc); > > + void (*unrealize)(SpaprDrc *drc); > > } SpaprDrcClass; > > > > typedef struct SpaprDrcPhysical { > > @@ -232,6 +235,7 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc); > > > > SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, > > uint32_t id); > > +void spapr_dr_connector_free(SpaprDrc *drc); > > SpaprDrc *spapr_drc_by_index(uint32_t index); > > SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id); > > int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask); > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c > > index 8571d5bafe4e..e26763f8b5a4 100644 > > --- a/hw/ppc/spapr_drc.c > > +++ b/hw/ppc/spapr_drc.c > > @@ -27,7 +27,6 @@ > > #include "sysemu/reset.h" > > #include "trace.h" > > > > -#define DRC_CONTAINER_PATH "/dr-connector" > > #define DRC_INDEX_TYPE_SHIFT 28 > > #define DRC_INDEX_ID_MASK ((1ULL << DRC_INDEX_TYPE_SHIFT) - 1) > > > > @@ -503,65 +502,56 @@ static const VMStateDescription vmstate_spapr_drc = { > > } > > }; > > > > -static void drc_realize(DeviceState *d, Error **errp) > > +static GHashTable *drc_hash_table(void) > > { > > - SpaprDrc *drc = SPAPR_DR_CONNECTOR(d); > > - Object *root_container; > > - gchar *link_name; > > - const char *child_name; > > + static GHashTable *dht; > > > > + if (!dht) { > > + dht = g_hash_table_new(NULL, NULL); > > + } > > + > > + return dht; > > +} > > + > > + > > +static void drc_realize(SpaprDrc *drc) > > +{ > > trace_spapr_drc_realize(spapr_drc_index(drc)); > > - /* NOTE: we do this as part of realize/unrealize due to the fact > > - * that the guest will communicate with the DRC via RTAS calls > > - * referencing the global DRC index. By unlinking the DRC > > - * from DRC_CONTAINER_PATH/<drc_index> we effectively make it > > - * inaccessible by the guest, since lookups rely on this path > > - * existing in the composition tree > > - */ > > - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > > - link_name = g_strdup_printf("%x", spapr_drc_index(drc)); > > - child_name = object_get_canonical_path_component(OBJECT(drc)); > > - trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name); > > - object_property_add_alias(root_container, link_name, > > - drc->owner, child_name); > > - g_free(link_name); > > + > > + g_hash_table_insert(drc_hash_table(), > > + GUINT_TO_POINTER(spapr_drc_index(drc)), drc); > > vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc, > > drc); > > trace_spapr_drc_realize_complete(spapr_drc_index(drc)); > > } > > > > -static void drc_unrealize(DeviceState *d) > > +static void drc_unrealize(SpaprDrc *drc) > > { > > - SpaprDrc *drc = SPAPR_DR_CONNECTOR(d); > > - Object *root_container; > > - gchar *name; > > - > > trace_spapr_drc_unrealize(spapr_drc_index(drc)); > > vmstate_unregister(VMSTATE_IF(drc), &vmstate_spapr_drc, drc); > > - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > > - name = g_strdup_printf("%x", spapr_drc_index(drc)); > > - object_property_del(root_container, name); > > - g_free(name); > > + g_hash_table_remove(drc_hash_table(), > > + GUINT_TO_POINTER(spapr_drc_index(drc))); > > } > > > > SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, > > uint32_t id) > > { > > SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type)); > > - char *prop_name; > > > > drc->id = id; > > - drc->owner = owner; > > - prop_name = g_strdup_printf("dr-connector[%"PRIu32"]", > > - spapr_drc_index(drc)); > > - object_property_add_child(owner, prop_name, OBJECT(drc)); > > - object_unref(OBJECT(drc)); > > - qdev_realize(DEVICE(drc), NULL, NULL); > > - g_free(prop_name); > > + drc->owner = object_ref(owner); > > + SPAPR_DR_CONNECTOR_GET_CLASS(drc)->realize(drc); > > > > return drc; > > } > > > > +void spapr_dr_connector_free(SpaprDrc *drc) > > +{ > > + SPAPR_DR_CONNECTOR_GET_CLASS(drc)->unrealize(drc); > > + object_unref(drc->owner); > > + object_unref(drc); > > +} > > + > > static void spapr_dr_connector_instance_init(Object *obj) > > { > > SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj); > > @@ -575,17 +565,19 @@ static void spapr_dr_connector_instance_init(Object *obj) > > drc->state = drck->empty_state; > > } > > > > +static char *drc_vmstate_if_get_id(VMStateIf *obj) > > +{ > > + return NULL; > > +} > > + > > static void spapr_dr_connector_class_init(ObjectClass *k, void *data) > > { > > - DeviceClass *dk = DEVICE_CLASS(k); > > + SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k); > > + VMStateIfClass *vc = VMSTATE_IF_CLASS(k); > > > > - dk->realize = drc_realize; > > - dk->unrealize = drc_unrealize; > > - /* > > - * Reason: DR connector needs to be wired to either the machine or to a > > - * PHB in spapr_dr_connector_new(). > > - */ > > - dk->user_creatable = false; > > + drck->realize = drc_realize; > > + drck->unrealize = drc_unrealize; > > + vc->get_id = drc_vmstate_if_get_id; > > } > > > > static bool drc_physical_needed(void *opaque) > > @@ -623,39 +615,32 @@ static void drc_physical_reset(void *opaque) > > } > > } > > > > -static void realize_physical(DeviceState *d, Error **errp) > > +static void realize_physical(SpaprDrc *drc) > > { > > - SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d); > > - Error *local_err = NULL; > > - > > - drc_realize(d, &local_err); > > - if (local_err) { > > - error_propagate(errp, local_err); > > - return; > > - } > > + SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc); > > > > + drc_realize(drc); > > vmstate_register(VMSTATE_IF(drcp), > > spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)), > > &vmstate_spapr_drc_physical, drcp); > > qemu_register_reset(drc_physical_reset, drcp); > > } > > > > -static void unrealize_physical(DeviceState *d) > > +static void unrealize_physical(SpaprDrc *drc) > > { > > - SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d); > > + SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc); > > > > - drc_unrealize(d); > > - vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp); > > qemu_unregister_reset(drc_physical_reset, drcp); > > + vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp); > > + drc_unrealize(drc); > > } > > > > static void spapr_drc_physical_class_init(ObjectClass *k, void *data) > > { > > - DeviceClass *dk = DEVICE_CLASS(k); > > SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k); > > > > - dk->realize = realize_physical; > > - dk->unrealize = unrealize_physical; > > + drck->realize = realize_physical; > > + drck->unrealize = unrealize_physical; > > drck->dr_entity_sense = physical_entity_sense; > > drck->isolate = drc_isolate_physical; > > drck->unisolate = drc_unisolate_physical; > > @@ -731,12 +716,16 @@ static void spapr_drc_pmem_class_init(ObjectClass *k, void *data) > > > > static const TypeInfo spapr_dr_connector_info = { > > .name = TYPE_SPAPR_DR_CONNECTOR, > > - .parent = TYPE_DEVICE, > > + .parent = TYPE_OBJECT, > > .instance_size = sizeof(SpaprDrc), > > .instance_init = spapr_dr_connector_instance_init, > > .class_size = sizeof(SpaprDrcClass), > > .class_init = spapr_dr_connector_class_init, > > .abstract = true, > > + .interfaces = (InterfaceInfo[]) { > > + { TYPE_VMSTATE_IF }, > > + { } > > + }, > > }; > > > > static const TypeInfo spapr_drc_physical_info = { > > @@ -789,14 +778,9 @@ static const TypeInfo spapr_drc_pmem_info = { > > > > SpaprDrc *spapr_drc_by_index(uint32_t index) > > { > > - Object *obj; > > - gchar *name; > > - > > - name = g_strdup_printf("%s/%x", DRC_CONTAINER_PATH, index); > > - obj = object_resolve_path(name, NULL); > > - g_free(name); > > - > > - return !obj ? NULL : SPAPR_DR_CONNECTOR(obj); > > + return > > + SPAPR_DR_CONNECTOR(g_hash_table_lookup(drc_hash_table(), > > + GUINT_TO_POINTER(index))); > > } > > > > SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id) > > @@ -824,13 +808,12 @@ SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id) > > */ > > int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask) > > { > > - Object *root_container; > > - ObjectProperty *prop; > > - ObjectPropertyIterator iter; > > + GHashTableIter iter; > > uint32_t drc_count = 0; > > GArray *drc_indexes, *drc_power_domains; > > GString *drc_names, *drc_types; > > int ret; > > + SpaprDrc *drc; > > > > /* > > * This should really be only called once per node since it overwrites > > @@ -851,26 +834,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask) > > drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t)); > > drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t)); > > > > - /* aliases for all DRConnector objects will be rooted in QOM > > - * composition tree at DRC_CONTAINER_PATH > > - */ > > - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > > - > > - object_property_iter_init(&iter, root_container); > > - while ((prop = object_property_iter_next(&iter))) { > > - Object *obj; > > - SpaprDrc *drc; > > + g_hash_table_iter_init(&iter, drc_hash_table()); > > + while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) { > > SpaprDrcClass *drck; > > char *drc_name = NULL; > > uint32_t drc_index, drc_power_domain; > > > > - if (!strstart(prop->type, "link<", NULL)) { > > - continue; > > - } > > - > > - obj = object_property_get_link(root_container, prop->name, > > - &error_abort); > > - drc = SPAPR_DR_CONNECTOR(obj); > > drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); > > > > if (owner && (drc->owner != owner)) { > > @@ -951,23 +920,12 @@ out: > > > > void spapr_drc_reset_all(SpaprMachineState *spapr) > > { > > - Object *drc_container; > > - ObjectProperty *prop; > > - ObjectPropertyIterator iter; > > + GHashTableIter iter; > > + SpaprDrc *drc; > > > > - drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH); > > restart: > > - object_property_iter_init(&iter, drc_container); > > - while ((prop = object_property_iter_next(&iter))) { > > - SpaprDrc *drc; > > - > > - if (!strstart(prop->type, "link<", NULL)) { > > - continue; > > - } > > - drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container, > > - prop->name, > > - &error_abort)); > > - > > + g_hash_table_iter_init(&iter, drc_hash_table()); > > + while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) { > > /* > > * This will complete any pending plug/unplug requests. > > * In case of a unplugged PHB or PCI bridge, this will > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > index 76d7c91e9c64..ca0cca664e3c 100644 > > --- a/hw/ppc/spapr_pci.c > > +++ b/hw/ppc/spapr_pci.c > > @@ -1262,7 +1262,7 @@ static void remove_drcs(SpaprPhbState *phb, PCIBus *bus) > > SpaprDrc *drc = drc_from_devfn(phb, chassis, i); > > > > if (drc) { > > - object_unparent(OBJECT(drc)); > > + spapr_dr_connector_free(drc); > > } > > } > > } >
On Wed, Jan 06, 2021 at 07:15:36PM +0100, Greg Kurz wrote: > On Mon, 28 Dec 2020 19:28:39 +1100 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > On Fri, Dec 18, 2020 at 11:34:00AM +0100, Greg Kurz wrote: > > > Modeling DR connectors as individual devices raises some > > > concerns, as already discussed a year ago in this thread: > > > > > > https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/ > > > > > > First, high maxmem settings creates too many DRC devices. > > > This causes scalability issues. It severely increase boot > > > time because the multiple traversals of the DRC list that > > > are performed during machine setup are quadratic operations. > > > This is directly related to the fact that DRCs are modeled > > > as individual devices and added to the composition tree. > > > > > > Second, DR connectors are really an internal concept of > > > PAPR. They aren't something that the user or management > > > layer can manipulate in any way. We already don't allow > > > their creation with device_add by clearing user_creatable. > > > > > > DR connectors don't even need to be modeled as actual > > > devices since they don't sit in a bus. They just need > > > to be associated to an 'owner' object and to have the > > > equivalent of realize/unrealize functions. > > > > > > Downgrade them to be simple objects. Convert the existing > > > realize() and unrealize() to be methods of the DR connector > > > base class. Also have the base class to inherit from the > > > vmstate_if interface directly. The get_id() hook simply > > > returns NULL, just as device_vmstate_if_get_id() does for > > > devices that don't sit in a bus. The DR connector is no > > > longer made a child object. This means that it must be > > > explicitely freed when no longer needed. This is only > > > required for PHBs and PCI bridges actually : have them to > > > free the DRC with spapr_dr_connector_free() instead of > > > object_unparent(). > > > > > > No longer add the DRCs to the QOM composition tree. Track > > > them with a glib hash table using the global DRC index as > > > the key instead. This makes traversal a linear operation. > > > > I have some reservations about this one. The main thing is that > > attaching migration state to something that's not a device seems a bit > > odd to me. AFAICT exactly one other non-device implements > > TYPE_VMSTATE_IF, and what it does isn't very clear to me. > > > > Even with your proposal below, the current SpaprDrc type, which is > used all over the place, will stop being a TYPE_DEVICE but we still > need to support migration with existing machine types for which DRC > are devices. Ah... that's a good point. > Implementing TYPE_VMSTATE_IF is essentially a hack that > allows to do that without keeping the current TYPE_DEVICE based > implementation around. Ok, that makes things clearer. > > As I might have mentioned to you I had a different idea for how to > > address this problem: still use a TYPE_DEVICE, but have it manage a > > whole array of DRCs as one unit, rather than just a single one. > > Specifically I was thinking: > > > > * one array per PCI bus (DRCs for each function on the bus) > > * one array for each block of memory (so one for base memory, one for > > each DIMM) > > * one array for all the cpus > > * one array for all the PHBs > > > > It has some disadvantages compared to your scheme: it still leaves > > (less) devices which can't be user managed, which is a bit ugly. On > > the other hand, each of those arrays can reasonably be dense, so we > > can use direct indexing rather than a hash table, which is a bit > > nicer. > > > > Thoughts? > > > > I find it a bit overkill to introduce a new TYPE_DEVICE (let's > call it a DRC manager) for something that: > - doesn't sit on a bus > - can't be user managed > - isn't directly represented to the guest as a full node > in the DT unlike all other devices, but just as indexes > in some properties of actual DR capable devices. > > Given that the DRC index space is global and this is what > the guest passes to DR RTAS calls, we can't do direct > indexing, strictly speaking. We need at least some logic > to dispatch operations on individual DRC states to the > appropriate DRC manager. This logic belongs to the machine > IMHO. > > This shouldn't be too complex for CPUs and PHBs since they > sit directly under the machine and have 1:1 relation with > the attached device. It just boils down to instantiate > some DRC managers during machine init: > > - range [ 0x10000000 ... 0x10000000 + ${max_cpus} [ > for CPUs > - range [ 0x20000000 ... 0x20000000 + 31 [ > for PHBs > > For memory, the code currently generates DRC indexes in the range: > > [ 0x80000000 ... 0x80000000 + ${base_ram_size}/256M ... ${max_ram_size}/256M [ > > ie. it doesn't generate DRC indexes for the base memory AFAICT. Also > each DIMM can be of arbitrary size, ie. consume an arbitrary amount > of DRC indexes. So the machine would instantiate SPAPR_MAX_RAM_SLOTS (32) > DRC managers, each capable of managing the full set of LMB DRCs, just > in case ? Likely a lot of zeroes with high maxmem settings but I guess > we can live with it. Actually, I was thinking of just a single manager for all the (pluggable) LMB DRCs, a single manager for all CPU DRCs, a single manager for all PHB DRCs and one per bus for PCI DRCs. I'm not assuming a 1:1 correspondance between manager and user side hotplug operations. Although... actually the "manager" could be an interface rather than an object, in which case the DRC manager would be the machine for LMBs, CPUs, and PHBs and the parent bus for each PCI slot. > PCI busses would need some extra care though since the machine > doesn't know about them. This would require to be able to > register/unregister DRC managers for SPAPR_DR_CONNECTOR_TYPE_PCI > indexes, so that the dispatching logic know about the ranges > they cover (PHB internals). Right, but that wouldn't really be any different from the dynamic creation of DRCs we do add_drcs() / remove_drcs() right now, except that it would create /destroy one object instead of a bunch. > And finally comes migration : I cannot think of a way to generate > the VMState sections used by existing machine types out of a set > of arrays of integers... We could keep the current implementation > around and use it with older machine types, but this certainly > looks terrible from a maintenance perspective. Did you have any > suggestion to handle that ? Ugh, yeah.. that could be difficult. > I seem to remember that one of the motivation to have arrays > of DRCs is to avoid the inflation of VMState sections that > we currently get with high maxmem settings, and it is considered > preferable to stream sparse arrays. This could be achieved by > building these arrays out of the global DRC hash table in a machine > pre-save handler and migrate them in a subsection for the default > machine type. Older machine types would continue with the current > VMState sections thanks to the TYPE_VMSTATE_IF hack. > > Does this seem a reasonable trade-off to be able to support > older and newer machine types with the same implementation ? Hrm, maybe, yeah.
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h index 8982927d5c24..a26aa8b9d4c3 100644 --- a/include/hw/ppc/spapr_drc.h +++ b/include/hw/ppc/spapr_drc.h @@ -170,7 +170,7 @@ typedef enum { typedef struct SpaprDrc { /*< private >*/ - DeviceState parent; + Object parent; uint32_t id; Object *owner; @@ -193,7 +193,7 @@ struct SpaprMachineState; typedef struct SpaprDrcClass { /*< private >*/ - DeviceClass parent; + ObjectClass parent; SpaprDrcState empty_state; SpaprDrcState ready_state; @@ -209,6 +209,9 @@ typedef struct SpaprDrcClass { int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr, void *fdt, int *fdt_start_offset, Error **errp); + + void (*realize)(SpaprDrc *drc); + void (*unrealize)(SpaprDrc *drc); } SpaprDrcClass; typedef struct SpaprDrcPhysical { @@ -232,6 +235,7 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc); SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, uint32_t id); +void spapr_dr_connector_free(SpaprDrc *drc); SpaprDrc *spapr_drc_by_index(uint32_t index); SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id); int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask); diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c index 8571d5bafe4e..e26763f8b5a4 100644 --- a/hw/ppc/spapr_drc.c +++ b/hw/ppc/spapr_drc.c @@ -27,7 +27,6 @@ #include "sysemu/reset.h" #include "trace.h" -#define DRC_CONTAINER_PATH "/dr-connector" #define DRC_INDEX_TYPE_SHIFT 28 #define DRC_INDEX_ID_MASK ((1ULL << DRC_INDEX_TYPE_SHIFT) - 1) @@ -503,65 +502,56 @@ static const VMStateDescription vmstate_spapr_drc = { } }; -static void drc_realize(DeviceState *d, Error **errp) +static GHashTable *drc_hash_table(void) { - SpaprDrc *drc = SPAPR_DR_CONNECTOR(d); - Object *root_container; - gchar *link_name; - const char *child_name; + static GHashTable *dht; + if (!dht) { + dht = g_hash_table_new(NULL, NULL); + } + + return dht; +} + + +static void drc_realize(SpaprDrc *drc) +{ trace_spapr_drc_realize(spapr_drc_index(drc)); - /* NOTE: we do this as part of realize/unrealize due to the fact - * that the guest will communicate with the DRC via RTAS calls - * referencing the global DRC index. By unlinking the DRC - * from DRC_CONTAINER_PATH/<drc_index> we effectively make it - * inaccessible by the guest, since lookups rely on this path - * existing in the composition tree - */ - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); - link_name = g_strdup_printf("%x", spapr_drc_index(drc)); - child_name = object_get_canonical_path_component(OBJECT(drc)); - trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name); - object_property_add_alias(root_container, link_name, - drc->owner, child_name); - g_free(link_name); + + g_hash_table_insert(drc_hash_table(), + GUINT_TO_POINTER(spapr_drc_index(drc)), drc); vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc, drc); trace_spapr_drc_realize_complete(spapr_drc_index(drc)); } -static void drc_unrealize(DeviceState *d) +static void drc_unrealize(SpaprDrc *drc) { - SpaprDrc *drc = SPAPR_DR_CONNECTOR(d); - Object *root_container; - gchar *name; - trace_spapr_drc_unrealize(spapr_drc_index(drc)); vmstate_unregister(VMSTATE_IF(drc), &vmstate_spapr_drc, drc); - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); - name = g_strdup_printf("%x", spapr_drc_index(drc)); - object_property_del(root_container, name); - g_free(name); + g_hash_table_remove(drc_hash_table(), + GUINT_TO_POINTER(spapr_drc_index(drc))); } SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, uint32_t id) { SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type)); - char *prop_name; drc->id = id; - drc->owner = owner; - prop_name = g_strdup_printf("dr-connector[%"PRIu32"]", - spapr_drc_index(drc)); - object_property_add_child(owner, prop_name, OBJECT(drc)); - object_unref(OBJECT(drc)); - qdev_realize(DEVICE(drc), NULL, NULL); - g_free(prop_name); + drc->owner = object_ref(owner); + SPAPR_DR_CONNECTOR_GET_CLASS(drc)->realize(drc); return drc; } +void spapr_dr_connector_free(SpaprDrc *drc) +{ + SPAPR_DR_CONNECTOR_GET_CLASS(drc)->unrealize(drc); + object_unref(drc->owner); + object_unref(drc); +} + static void spapr_dr_connector_instance_init(Object *obj) { SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj); @@ -575,17 +565,19 @@ static void spapr_dr_connector_instance_init(Object *obj) drc->state = drck->empty_state; } +static char *drc_vmstate_if_get_id(VMStateIf *obj) +{ + return NULL; +} + static void spapr_dr_connector_class_init(ObjectClass *k, void *data) { - DeviceClass *dk = DEVICE_CLASS(k); + SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k); + VMStateIfClass *vc = VMSTATE_IF_CLASS(k); - dk->realize = drc_realize; - dk->unrealize = drc_unrealize; - /* - * Reason: DR connector needs to be wired to either the machine or to a - * PHB in spapr_dr_connector_new(). - */ - dk->user_creatable = false; + drck->realize = drc_realize; + drck->unrealize = drc_unrealize; + vc->get_id = drc_vmstate_if_get_id; } static bool drc_physical_needed(void *opaque) @@ -623,39 +615,32 @@ static void drc_physical_reset(void *opaque) } } -static void realize_physical(DeviceState *d, Error **errp) +static void realize_physical(SpaprDrc *drc) { - SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d); - Error *local_err = NULL; - - drc_realize(d, &local_err); - if (local_err) { - error_propagate(errp, local_err); - return; - } + SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc); + drc_realize(drc); vmstate_register(VMSTATE_IF(drcp), spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)), &vmstate_spapr_drc_physical, drcp); qemu_register_reset(drc_physical_reset, drcp); } -static void unrealize_physical(DeviceState *d) +static void unrealize_physical(SpaprDrc *drc) { - SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d); + SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc); - drc_unrealize(d); - vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp); qemu_unregister_reset(drc_physical_reset, drcp); + vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp); + drc_unrealize(drc); } static void spapr_drc_physical_class_init(ObjectClass *k, void *data) { - DeviceClass *dk = DEVICE_CLASS(k); SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k); - dk->realize = realize_physical; - dk->unrealize = unrealize_physical; + drck->realize = realize_physical; + drck->unrealize = unrealize_physical; drck->dr_entity_sense = physical_entity_sense; drck->isolate = drc_isolate_physical; drck->unisolate = drc_unisolate_physical; @@ -731,12 +716,16 @@ static void spapr_drc_pmem_class_init(ObjectClass *k, void *data) static const TypeInfo spapr_dr_connector_info = { .name = TYPE_SPAPR_DR_CONNECTOR, - .parent = TYPE_DEVICE, + .parent = TYPE_OBJECT, .instance_size = sizeof(SpaprDrc), .instance_init = spapr_dr_connector_instance_init, .class_size = sizeof(SpaprDrcClass), .class_init = spapr_dr_connector_class_init, .abstract = true, + .interfaces = (InterfaceInfo[]) { + { TYPE_VMSTATE_IF }, + { } + }, }; static const TypeInfo spapr_drc_physical_info = { @@ -789,14 +778,9 @@ static const TypeInfo spapr_drc_pmem_info = { SpaprDrc *spapr_drc_by_index(uint32_t index) { - Object *obj; - gchar *name; - - name = g_strdup_printf("%s/%x", DRC_CONTAINER_PATH, index); - obj = object_resolve_path(name, NULL); - g_free(name); - - return !obj ? NULL : SPAPR_DR_CONNECTOR(obj); + return + SPAPR_DR_CONNECTOR(g_hash_table_lookup(drc_hash_table(), + GUINT_TO_POINTER(index))); } SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id) @@ -824,13 +808,12 @@ SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id) */ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask) { - Object *root_container; - ObjectProperty *prop; - ObjectPropertyIterator iter; + GHashTableIter iter; uint32_t drc_count = 0; GArray *drc_indexes, *drc_power_domains; GString *drc_names, *drc_types; int ret; + SpaprDrc *drc; /* * This should really be only called once per node since it overwrites @@ -851,26 +834,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask) drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t)); drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t)); - /* aliases for all DRConnector objects will be rooted in QOM - * composition tree at DRC_CONTAINER_PATH - */ - root_container = container_get(object_get_root(), DRC_CONTAINER_PATH); - - object_property_iter_init(&iter, root_container); - while ((prop = object_property_iter_next(&iter))) { - Object *obj; - SpaprDrc *drc; + g_hash_table_iter_init(&iter, drc_hash_table()); + while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) { SpaprDrcClass *drck; char *drc_name = NULL; uint32_t drc_index, drc_power_domain; - if (!strstart(prop->type, "link<", NULL)) { - continue; - } - - obj = object_property_get_link(root_container, prop->name, - &error_abort); - drc = SPAPR_DR_CONNECTOR(obj); drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); if (owner && (drc->owner != owner)) { @@ -951,23 +920,12 @@ out: void spapr_drc_reset_all(SpaprMachineState *spapr) { - Object *drc_container; - ObjectProperty *prop; - ObjectPropertyIterator iter; + GHashTableIter iter; + SpaprDrc *drc; - drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH); restart: - object_property_iter_init(&iter, drc_container); - while ((prop = object_property_iter_next(&iter))) { - SpaprDrc *drc; - - if (!strstart(prop->type, "link<", NULL)) { - continue; - } - drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container, - prop->name, - &error_abort)); - + g_hash_table_iter_init(&iter, drc_hash_table()); + while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) { /* * This will complete any pending plug/unplug requests. * In case of a unplugged PHB or PCI bridge, this will diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 76d7c91e9c64..ca0cca664e3c 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -1262,7 +1262,7 @@ static void remove_drcs(SpaprPhbState *phb, PCIBus *bus) SpaprDrc *drc = drc_from_devfn(phb, chassis, i); if (drc) { - object_unparent(OBJECT(drc)); + spapr_dr_connector_free(drc); } } }
Modeling DR connectors as individual devices raises some concerns, as already discussed a year ago in this thread: https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/ First, high maxmem settings creates too many DRC devices. This causes scalability issues. It severely increase boot time because the multiple traversals of the DRC list that are performed during machine setup are quadratic operations. This is directly related to the fact that DRCs are modeled as individual devices and added to the composition tree. Second, DR connectors are really an internal concept of PAPR. They aren't something that the user or management layer can manipulate in any way. We already don't allow their creation with device_add by clearing user_creatable. DR connectors don't even need to be modeled as actual devices since they don't sit in a bus. They just need to be associated to an 'owner' object and to have the equivalent of realize/unrealize functions. Downgrade them to be simple objects. Convert the existing realize() and unrealize() to be methods of the DR connector base class. Also have the base class to inherit from the vmstate_if interface directly. The get_id() hook simply returns NULL, just as device_vmstate_if_get_id() does for devices that don't sit in a bus. The DR connector is no longer made a child object. This means that it must be explicitely freed when no longer needed. This is only required for PHBs and PCI bridges actually : have them to free the DRC with spapr_dr_connector_free() instead of object_unparent(). No longer add the DRCs to the QOM composition tree. Track them with a glib hash table using the global DRC index as the key instead. This makes traversal a linear operation. Signed-off-by: Greg Kurz <groug@kaod.org> --- include/hw/ppc/spapr_drc.h | 8 +- hw/ppc/spapr_drc.c | 166 ++++++++++++++----------------------- hw/ppc/spapr_pci.c | 2 +- 3 files changed, 69 insertions(+), 107 deletions(-)