From patchwork Fri Jan 26 12:48:45 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 10186589
From: Daniel Henrique Barboza
To: qemu-devel@nongnu.org
Cc: Daniel Henrique Barboza, dgilbert@redhat.com, quintela@redhat.com
Date: Fri, 26 Jan 2018 10:48:45 -0200
Message-Id: <20180126124845.19257-1-danielhb@linux.vnet.ibm.com>
X-Mailer: git-send-email 2.14.3
Subject: [Qemu-devel] [PATCH] migration/savevm.c: do not fail when len >
 MAX_VM_CMD_PACKAGED_SIZE

MAX_VM_CMD_PACKAGED_SIZE is a constant used in qemu_savevm_send_packaged
and loadvm_handle_cmd_packaged to determine whether a package is too big
to be sent or received. qemu_savevm_send_packaged is called inside
postcopy_start (migration/migration.c) to send the MigrationState in a
single blob to the destination using the MIG_CMD_PACKAGED subcommand.
The destination reads the blob in loadvm_handle_cmd_packaged. If the
blob is larger than MAX_VM_CMD_PACKAGED_SIZE, an error is reported and
the postcopy migration is aborted.

Both MAX_VM_CMD_PACKAGED_SIZE and MIG_CMD_PACKAGED were introduced by
commit 11cf1d984b ("MIG_CMD_PACKAGED: Send a packaged chunk ..."). The
constant still has its original value of 1ul << 24 (16MB).

The current MAX_VM_CMD_PACKAGED_SIZE value is not enough to support
postcopy migration of bigger pseries guests.
The blob size for a postcopy migration of a pseries guest with the
following setup:

    qemu-system-ppc64 --nographic -vga none -machine pseries,accel=kvm -m 64G \
    -smp 1,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
    -drive file=f27.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
    -netdev user,id=u1 -net nic,netdev=u1

is around 12MB. Bumping the RAM to 128G makes the blob size go to 20MB.
With 256G the blob goes to 37MB, more than twice the current maximum
size. At this moment the pseries machine can handle guests with up to
1TB of RAM, which makes this postcopy blob approximately 128MB in size.

One solution is to bump MAX_VM_CMD_PACKAGED_SIZE up to a bigger value.
A value of 1ul << 27 would be enough for pseries guests with up to 1TB
of RAM, but there are 2 problems with this approach:

- we'll keep supporting bigger and bigger guests as time goes by. This
constant would have to be bumped from time to time;

- if we're willing to bump the constant every time we need a bigger
blob, why have the constant in the first place? Its current value is
16MB, so bumping it to 128MB already makes it 'unreasonably large'
with respect to the original design of MIG_CMD_PACKAGED.

A better long term solution is to determine whether the design of
MIG_CMD_PACKAGED can be changed to send partial blobs of smaller sizes,
or even to get rid of the size limitation. Until then, this patch
changes both qemu_savevm_send_packaged and loadvm_handle_cmd_packaged
to not bail out if the blob len is greater than
MAX_VM_CMD_PACKAGED_SIZE.

To avoid ignoring the occurrence entirely (something can go wrong and
the MigrationState can inadvertently grow beyond what is expected), we
also change the traces of both functions to report both the current
blob size and the current recommended maximum. This way we allow big
guests to execute postcopy migration while retaining the information
for debug purposes.

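As a rough illustration of the sizes discussed above, here is a minimal
standalone sketch (not QEMU code). The helpers min_shift_for and
exceeds_current_max are hypothetical names used only for this example;
the only value taken from the tree is the 1ul << 24 limit.

```c
#include <assert.h>
#include <stdint.h>

/* Current limit quoted in this commit message: 1ul << 24 (16MB). */
#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)

/*
 * Hypothetical helper: smallest shift s such that (1 << s) >= size.
 * size must be in the range 1 .. 2^63.
 */
unsigned min_shift_for(uint64_t size)
{
    unsigned shift = 0;

    assert(size > 0 && size <= ((uint64_t)1 << 63));
    while (((uint64_t)1 << shift) < size) {
        shift++;
    }
    return shift;
}

/*
 * Hypothetical helper: returns 1 if a blob of 'mb' megabytes is larger
 * than the current limit, 0 otherwise.
 */
int exceeds_current_max(uint64_t mb)
{
    return (mb << 20) > MAX_VM_CMD_PACKAGED_SIZE;
}
```

With the figures above, exceeds_current_max(12) is false for the 64G
guest and true for the 20MB and 37MB blobs, and
min_shift_for(128 << 20) gives 27, matching the 1ul << 27 value
mentioned as sufficient for a 1TB guest.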
Signed-off-by: Daniel Henrique Barboza
Reported-by: Balamuruhan S
---
 migration/savevm.c     | 15 ++-------------
 migration/trace-events |  4 ++--
 2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index b7908f62be..c7b9d69578 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -861,15 +861,9 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len)
 {
     uint32_t tmp;
 
-    if (len > MAX_VM_CMD_PACKAGED_SIZE) {
-        error_report("%s: Unreasonably large packaged state: %zu",
-                     __func__, len);
-        return -1;
-    }
-
     tmp = cpu_to_be32(len);
 
-    trace_qemu_savevm_send_packaged();
+    trace_qemu_savevm_send_packaged(len, MAX_VM_CMD_PACKAGED_SIZE);
     qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
 
     qemu_put_buffer(f, buf, len);
@@ -1718,12 +1712,7 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
     QIOChannelBuffer *bioc;
 
     length = qemu_get_be32(mis->from_src_file);
-    trace_loadvm_handle_cmd_packaged(length);
-
-    if (length > MAX_VM_CMD_PACKAGED_SIZE) {
-        error_report("Unreasonably large packaged state: %zu", length);
-        return -1;
-    }
+    trace_loadvm_handle_cmd_packaged(length, MAX_VM_CMD_PACKAGED_SIZE);
 
     bioc = qio_channel_buffer_new(length);
     qio_channel_set_name(QIO_CHANNEL(bioc), "migration-loadvm-buffer");
diff --git a/migration/trace-events b/migration/trace-events
index 6f29fcc686..646963ffec 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -6,10 +6,10 @@ qemu_loadvm_state_section_command(int ret) "%d"
 qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
 qemu_loadvm_state_post_main(int ret) "%d"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
-qemu_savevm_send_packaged(void) ""
+qemu_savevm_send_packaged(size_t len, size_t max) "size=%zu, max recommended=%zu"
 loadvm_state_setup(void) ""
 loadvm_state_cleanup(void) ""
-loadvm_handle_cmd_packaged(unsigned int length) "%u"
+loadvm_handle_cmd_packaged(size_t len, size_t max) "size=%zu, max recommended=%zu"
 loadvm_handle_cmd_packaged_main(int ret) "%d"
 loadvm_handle_cmd_packaged_received(int ret) "%d"
 loadvm_postcopy_handle_advise(void) ""
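
For illustration only, a minimal standalone model of the behaviour
after this change: the blob length travels as a 4-byte big-endian
value ahead of the payload, and an oversized blob is reported rather
than rejected. All function names below (put_be32, get_be32,
build_packaged_frame, format_packaged_trace) are hypothetical stand-ins
and not QEMU APIs. The command header written by
qemu_savevm_command_send is left out, so this is a sketch of the idea,
not of the real wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24) /* 16MB, as in the patch */

/* Hypothetical stand-in for cpu_to_be32(): store v big-endian in out. */
void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Hypothetical stand-in for qemu_get_be32(): read a big-endian value. */
uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/*
 * Hypothetical model of the sender: a 4-byte length followed by the
 * payload. 'frame' must have room for 4 + len bytes.
 * Returns the number of bytes written.
 */
size_t build_packaged_frame(uint8_t *frame, const uint8_t *buf, uint32_t len)
{
    assert(frame != NULL && buf != NULL);
    put_be32(frame, len);
    memcpy(frame + 4, buf, len);
    return (size_t)len + 4;
}

/*
 * Format the message carried by the new trace points. Returns 1 when
 * len is above the recommended maximum and 0 otherwise; the caller is
 * expected to carry on in both cases.
 */
int format_packaged_trace(char *dst, size_t dstlen, size_t len)
{
    snprintf(dst, dstlen, "size=%zu, max recommended=%zu",
             len, (size_t)MAX_VM_CMD_PACKAGED_SIZE);
    return len > MAX_VM_CMD_PACKAGED_SIZE;
}
```

Note that the length field in this model is 32 bits wide, mirroring the
uint32_t tmp and qemu_get_be32 in the code touched by the patch, so the
model only covers blobs whose length fits in 32 bits.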