From: Greg Kurz
To: qemu-devel@nongnu.org
Cc: Daniel Henrique Barboza, qemu-ppc@nongnu.org, Greg Kurz, David Gibson
Subject: [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors
Date: Fri, 18 Dec 2020 11:33:54 +0100
Message-Id: <20201218103400.689660-1-groug@kaod.org>
X-Patchwork-Submitter: Greg Kurz
X-Patchwork-Id: 11981683

Setting a high maxmem value seriously degrades the guest's boot time,
from 3 seconds for 1T up to more than 3 minutes for 8T.
All this time is spent during initial machine setup and CAS, preventing
use of the QEMU monitor in the meantime.

Profiling reveals:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 85.48     24.08    24.08   566117     0.00     0.00  object_get_canonical_path_component
 13.67     27.93     3.85 57623944     0.00     0.00  strstart

-----------------------------------------------
                0.00    0.00        1/566117       host_memory_backend_get_name [270]
                1.41    0.22    33054/566117       drc_realize [23]
               22.67    3.51   533062/566117       object_get_canonical_path [3]
[2]     98.7   24.08    3.73   566117          object_get_canonical_path_component [2]
                3.73    0.00 55802324/57623944     strstart [19]
-----------------------------------------------
                                   12              object_property_set_link [1267]
                                33074              device_set_realized [138]
                               231378              object_get_child_property [652]
[3]     93.0    0.01   26.18   264464          object_get_canonical_path [3]
               22.67    3.51   533062/566117       object_get_canonical_path_component [2]
                               264464              object_get_root [629]
-----------------------------------------------

This is because an 8T maxmem means QEMU can create up to 32768 LMB DRC
objects, each tracking the hot-plug/unplug state of 256M of contiguous
RAM. These objects are all created during machine init and live for the
machine lifetime. Their realize path involves several calls to
object_get_canonical_path_component(), which itself traverses all
properties of the parent node. This results in a quadratic operation.
Worse, the full list of DRCs is traversed 7 times during the boot
process, e.g. to populate the device tree, calling
object_get_canonical_path_component() on each DRC again. These are
further, even more costly, quadratic traversals.

Modeling DR connectors as individual devices raises some concerns, as
already discussed a year ago in this thread:

https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/

First, having so many devices to track the DRC states is excessive and
can cause scalability issues in various ways. The quadratic traversal
issue is one more instance of this.
Second, DR connectors are really PAPR internals that shouldn't be
exposed at all in the composition tree.

This series converts DR connectors to simple unparented objects tracked
in a separate hash table, rather than actual devices exposed in the QOM
tree. This doesn't address the overall scalability concern, but it makes
traversal of the DR connectors linear. The time penalty with an 8T
maxmem is reduced to less than 1 second, and we get a much shorter
'info qom-tree' output.

This is transparent to migration.

Greg Kurz (6):
  spapr: Call spapr_drc_reset() for all DRCs at CAS
  spapr: Fix reset of transient DR connectors
  spapr: Introduce spapr_drc_reset_all()
  spapr: Use spapr_drc_reset_all() at machine reset
  spapr: Add drc_ prefix to the DRC realize and unrealize functions
  spapr: Model DR connectors as simple objects

 include/hw/ppc/spapr_drc.h |  18 +++-
 hw/ppc/spapr.c             |  15 +--
 hw/ppc/spapr_drc.c         | 181 +++++++++++++++++--------------------
 hw/ppc/spapr_hcall.c       |  33 ++-----
 hw/ppc/spapr_pci.c         |   2 +-
 5 files changed, 106 insertions(+), 143 deletions(-)

Tested-by: Daniel Henrique Barboza
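
For illustration only, the scaling argument in this cover letter can be
sketched as a small counting model. This is not QEMU code: the function
names lmb_count(), scan_steps() and keyed_steps() are hypothetical, and
the model only counts how many entries get visited. It assumes that
resolving one object's name walks the parent's children until the object
is found, while a keyed table reaches each object in a single step.

```c
#include <assert.h>
#include <stddef.h>

/* Number of LMBs for a given maxmem, e.g. 8T / 256M (hypothetical helper). */
unsigned long long lmb_count(unsigned long long maxmem_bytes,
                             unsigned long long lmb_bytes)
{
    assert(lmb_bytes != 0);
    return maxmem_bytes / lmb_bytes;
}

/*
 * Model of the quadratic case: for each of the n objects, walk the
 * parent's list of children from the start until that object is found.
 * Returns the total number of entries visited.
 */
unsigned long long scan_steps(size_t n)
{
    unsigned long long steps = 0;

    for (size_t i = 0; i < n; i++) {        /* object being looked up */
        for (size_t j = 0; j < n; j++) {    /* walk the parent's children */
            steps++;
            if (j == i) {
                break;                      /* found it, stop scanning */
            }
        }
    }
    return steps;
}

/*
 * Model of the linear case: each object is reached directly through a
 * keyed table, so one step per object (assumed constant-time lookup).
 */
unsigned long long keyed_steps(size_t n)
{
    unsigned long long steps = 0;

    for (size_t i = 0; i < n; i++) {
        steps++;
    }
    return steps;
}
```

In this model, lmb_count(8ULL << 40, 256ULL << 20) gives the 32768
objects mentioned above, and scan_steps(n) grows as n * (n + 1) / 2
while keyed_steps(n) grows as n. The model says nothing about absolute
timings; it only shows why the per-object scan dominates as maxmem grows.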