From patchwork Thu Dec 9 09:39:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12666233 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BF4E1C433F5 for ; Thu, 9 Dec 2021 09:41:23 +0000 (UTC) Received: from localhost ([::1]:51956 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mvFvW-0002g7-Pa for qemu-devel@archiver.kernel.org; Thu, 09 Dec 2021 04:41:22 -0500 Received: from eggs.gnu.org ([209.51.188.92]:33566) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFuN-0000Ry-MB for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:11 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:50620) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFuF-0008F7-IL for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:10 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639042802; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f5uLRbBkiDpEe+yiA6KOcwWWhtKLg/uvMiC1IBwEoqE=; b=MqAReemo6iZXKIzPlVG3mYSLTuH9IjEFf2NjZFw+y6KZEkclY7Q60QWC6ef12bymHiVjT+ IRlkQx1ZI/Sz+7aLmvNiCIYvkMv36AO/9BLZYSZjp0cg0CfoXtm2YgnpcM+znP9O2V9qJa LtrTadwarDolYLWi4gfSrIaYdD5jr7I= Received: from mail-ua1-f71.google.com (mail-ua1-f71.google.com [209.85.222.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-592-GUm8iVjHOjqqK60UvBuvag-1; Thu, 09 Dec 2021 04:40:01 -0500 X-MC-Unique: GUm8iVjHOjqqK60UvBuvag-1 Received: by mail-ua1-f71.google.com with SMTP id y21-20020ab025d5000000b002d3ca52eab5so3305041uan.22 for ; Thu, 09 Dec 2021 01:40:01 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=f5uLRbBkiDpEe+yiA6KOcwWWhtKLg/uvMiC1IBwEoqE=; b=5xbW142X/KCY9CKxjXeTFabB8s2wp7D3ijDtWmj9qvuGo9bgkn5n4I0KU3dSL6HPe5 sOfelbSX7EdGSpvAbpQTcqzTYS4ZyouZ7QkCVckm5iXmHyVDkRZRZ36EUXyhJHWi3FPh DP9Uep+iPuNtBtLKm8jBjfYq+9wVeaj0rC9Lq1baNbw3UGtdDpv9MoMA9P0sSpU+glOm GQMwmMMPu+vvX60zGvh0QHKFdC19H4N96K/y1cm/T9HSaZa1RGfFPHGkLEEKd4QXjyiG gmXOWxg3mkKAHCwKtOCPI5G76sFfjXY4ncMpe+2vZezUBcDAP7J5QoxHskxxsjBLowrG vWfw== X-Gm-Message-State: AOAM533RShLVQRnlVNNWQ6WFM9VEkapuB6tmJ+GTRqZuUk4k18fMg7Qm FBEzjMF/omC8w7Q1SRA+EIPUS/mlKgNH3PmsZbz5OEizeM+udXOVpwDbXoBLG2ifCtF5pxY5QtV l9+4EVXFulRlAaSI= X-Received: by 2002:ab0:6f0a:: with SMTP id r10mr18024966uah.106.1639042800567; Thu, 09 Dec 2021 01:40:00 -0800 (PST) X-Google-Smtp-Source: ABdhPJyOwehyTGYxxKzjKN279c5byQG31/Eij+jLr/kdN1/jrxmL+gNwC6Z0y87AS/UGjcMoeg8SGQ== X-Received: by 2002:ab0:6f0a:: with SMTP id r10mr18024917uah.106.1639042800244; Thu, 09 Dec 2021 01:40:00 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f0:dadb:5bd9:f52:7a4a:a8af]) by smtp.gmail.com with ESMTPSA id s22sm3460215vkm.28.2021.12.09.01.39.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Dec 2021 01:39:59 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr. 
David Alan Gilbert" , Eric Blake , Markus Armbruster Subject: [PATCH v6 1/6] QIOChannel: Add io_writev_zero_copy & io_flush_zero_copy callbacks Date: Thu, 9 Dec 2021 06:39:19 -0300 Message-Id: <20211209093923.1293701-2-leobras@redhat.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211209093923.1293701-1-leobras@redhat.com> References: <20211209093923.1293701-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -33 X-Spam_score: -3.4 X-Spam_bar: --- X-Spam_report: (-3.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.618, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

Add io_writev_zero_copy and io_flush_zero_copy as optional callbacks to QIOChannelClass, allowing subclasses to implement zero copy writes.

How to use them:
- Write data using qio_channel_writev_zero_copy(),
- Wait for write completion with qio_channel_flush_zero_copy().

Notes:
As some zero copy implementations work asynchronously, it's recommended to keep the write buffer untouched until qio_channel_flush_zero_copy() returns, to avoid the risk of sending an updated buffer instead of the contents it had at write time.
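The buffer caveat above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyChannel, toy_write_zero_copy(), toy_flush_zero_copy() and toy_demo() are hypothetical stand-ins invented for illustration, not the QIOChannel API, and the "wire" is just an in-memory array. The model queues the caller's buffer by pointer and reads it only at flush time, which is the asynchronous behavior the note warns about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PENDING 8

/* Hypothetical toy channel: remembers buffers by pointer, not by copy. */
typedef struct ToyChannel {
    const char *pending[TOY_MAX_PENDING]; /* queued caller buffers */
    size_t pending_len[TOY_MAX_PENDING];
    size_t npending;
    char wire[256];                       /* what the "peer" receives */
    size_t wire_len;
} ToyChannel;

/* Queue a buffer without copying it; returns bytes queued or -1 if full. */
static long toy_write_zero_copy(ToyChannel *c, const char *buf, size_t len)
{
    if (c->npending == TOY_MAX_PENDING) {
        return -1;
    }
    c->pending[c->npending] = buf;
    c->pending_len[c->npending] = len;
    c->npending++;
    return (long)len;
}

/* "Send" everything queued so far; the buffers are only read here. */
static int toy_flush_zero_copy(ToyChannel *c)
{
    for (size_t i = 0; i < c->npending; i++) {
        if (c->wire_len + c->pending_len[i] > sizeof(c->wire)) {
            return -1;
        }
        memcpy(c->wire + c->wire_len, c->pending[i], c->pending_len[i]);
        c->wire_len += c->pending_len[i];
    }
    c->npending = 0;
    return 0;
}

/* Returns 1 if the peer saw the buffer contents as they were at write time. */
static int toy_demo(int flush_before_reuse)
{
    ToyChannel c = {0};
    char buf[4] = "old";
    long n = toy_write_zero_copy(&c, buf, 3);

    assert(n == 3);
    if (flush_before_reuse) {
        toy_flush_zero_copy(&c);   /* safe: data leaves before the reuse */
    }
    memcpy(buf, "new", 3);         /* caller reuses the buffer */
    toy_flush_zero_copy(&c);
    return memcmp(c.wire, "old", 3) == 0;
}
```

In this model, flushing before the buffer is reused is the only thing that makes the peer see the original contents.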
As the new callbacks are optional, if a subclass does not implement them, then: - io_writev_zero_copy will return -1, - io_flush_zero_copy will return 0 without changing anything. Also, some functions like qio_channel_writev_full_all() were adapted to receive a flag parameter. That allows shared code between zero copy and non-zero copy writev, and also an easier implementation on new flags. Signed-off-by: Leonardo Bras --- include/io/channel.h | 98 +++++++++++++++++++++++++++++++++++++++----- io/channel.c | 66 +++++++++++++++++++++++------ 2 files changed, 142 insertions(+), 22 deletions(-) diff --git a/include/io/channel.h b/include/io/channel.h index 88988979f8..83fa970a19 100644 --- a/include/io/channel.h +++ b/include/io/channel.h @@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass, #define QIO_CHANNEL_ERR_BLOCK -2 +#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1 + typedef enum QIOChannelFeature QIOChannelFeature; enum QIOChannelFeature { QIO_CHANNEL_FEATURE_FD_PASS, QIO_CHANNEL_FEATURE_SHUTDOWN, QIO_CHANNEL_FEATURE_LISTEN, + QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY, }; @@ -136,6 +139,12 @@ struct QIOChannelClass { IOHandler *io_read, IOHandler *io_write, void *opaque); + ssize_t (*io_writev_zero_copy)(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + Error **errp); + int (*io_flush_zero_copy)(QIOChannel *ioc, + Error **errp); }; /* General I/O handling functions */ @@ -321,10 +330,11 @@ int qio_channel_readv_all(QIOChannel *ioc, /** - * qio_channel_writev_all: + * qio_channel_writev_all_flags: * @ioc: the channel object * @iov: the array of memory regions to write data from * @niov: the length of the @iov array + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*) * @errp: pointer to a NULL-initialized error object * * Write data to the IO channel, reading it from the @@ -337,12 +347,23 @@ int qio_channel_readv_all(QIOChannel *ioc, * to be written, yielding from the current coroutine * if required. 
* + * If QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is passed in flags, + * instead of waiting for all requested data to be written, + * this function will wait until it's all queued for writing. + * In this case, if the buffer gets changed between queueing and + * sending, the updated buffer will be sent. If this is not a + * desired behavior, it's suggested to call qio_channel_flush_zero_copy() + * before reusing the buffer. + * * Returns: 0 if all bytes were written, or -1 on error */ -int qio_channel_writev_all(QIOChannel *ioc, - const struct iovec *iov, - size_t niov, - Error **erp); +int qio_channel_writev_all_flags(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + int flags, + Error **errp); +#define qio_channel_writev_all(ioc, iov, niov, errp) \ + qio_channel_writev_all_flags(ioc, iov, niov, 0, errp) /** * qio_channel_readv: @@ -831,12 +852,13 @@ int qio_channel_readv_full_all(QIOChannel *ioc, Error **errp); /** - * qio_channel_writev_full_all: + * qio_channel_writev_full_all_flags: * @ioc: the channel object * @iov: the array of memory regions to write data from * @niov: the length of the @iov array * @fds: an array of file handles to send * @nfds: number of file handles in @fds + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*) * @errp: pointer to a NULL-initialized error object * * @@ -846,13 +868,69 @@ int qio_channel_readv_full_all(QIOChannel *ioc, * to be written, yielding from the current coroutine * if required. * + * If QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is passed in flags, + * instead of waiting for all requested data to be written, + * this function will wait until it's all queued for writing. + * In this case, if the buffer gets changed between queueing and + * sending, the updated buffer will be sent. If this is not a + * desired behavior, it's suggested to call qio_channel_flush_zero_copy() + * before reusing the buffer. 
+ * * Returns: 0 if all bytes were written, or -1 on error */ -int qio_channel_writev_full_all(QIOChannel *ioc, - const struct iovec *iov, - size_t niov, - int *fds, size_t nfds, +int qio_channel_writev_full_all_flags(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + int *fds, size_t nfds, + int flags, Error **errp); +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \ + qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp) + +/** + * qio_channel_writev_zero_copy: + * @ioc: the channel object + * @iov: the array of memory regions to write data from + * @niov: the length of the @iov array + * @errp: pointer to a NULL-initialized error object + * + * Behaves like qio_channel_writev, but may write + * data asynchronously while avoiding unnecessary data copy. + * This function may return before any data is actually written, + * but will queue every buffer for writing. + * + * Some implementations require the buffer region to be locked, + * so if there is not enough locked memory available to the process + * this function will fail. + * + * If at some point it's necessary to wait for all data to be + * written, use qio_channel_flush_zero_copy(). + * + * If zero copy is not available, returns -1 and set errp. + */ + +ssize_t qio_channel_writev_zero_copy(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + Error **errp); + +/** + * qio_channel_flush_zero_copy: + * @ioc: the channel object + * @errp: pointer to a NULL-initialized error object + * + * Will block until every packet queued with + * qio_channel_writev_zero_copy() is sent, or return + * in case of any error. + * + * If not implemented, acts as a no-op, and returns 0. + * + * Returns -1 if any error is found, + * 1 if every send failed to use zero copy. + * 0 otherwise. 
+ */ + +int qio_channel_flush_zero_copy(QIOChannel *ioc, Error **errp); #endif /* QIO_CHANNEL_H */ diff --git a/io/channel.c b/io/channel.c index e8b019dc36..d04a6772c8 100644 --- a/io/channel.c +++ b/io/channel.c @@ -212,19 +212,21 @@ int qio_channel_readv_full_all(QIOChannel *ioc, return ret; } -int qio_channel_writev_all(QIOChannel *ioc, - const struct iovec *iov, - size_t niov, - Error **errp) +int qio_channel_writev_all_flags(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + int flags, + Error **errp) { - return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp); + return qio_channel_writev_full_all_flags(ioc, iov, niov, NULL, 0, flags, + errp); } -int qio_channel_writev_full_all(QIOChannel *ioc, - const struct iovec *iov, - size_t niov, - int *fds, size_t nfds, - Error **errp) +int qio_channel_writev_full_all_flags(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + int *fds, size_t nfds, + int flags, Error **errp) { int ret = -1; struct iovec *local_iov = g_new(struct iovec, niov); @@ -237,8 +239,16 @@ int qio_channel_writev_full_all(QIOChannel *ioc, while (nlocal_iov > 0) { ssize_t len; - len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds, - errp); + + if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) { + assert(fds == NULL && nfds == 0); + len = qio_channel_writev_zero_copy(ioc, local_iov, nlocal_iov, + errp); + } else { + len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds, + errp); + } + if (len == QIO_CHANNEL_ERR_BLOCK) { if (qemu_in_coroutine()) { qio_channel_yield(ioc, G_IO_OUT); @@ -474,6 +484,38 @@ off_t qio_channel_io_seek(QIOChannel *ioc, } +ssize_t qio_channel_writev_zero_copy(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + Error **errp) +{ + QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc); + + if (!klass->io_writev_zero_copy || + !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) { + error_setg_errno(errp, EINVAL, + "Channel does not support zero copy 
writev"); + return -1; + } + + return klass->io_writev_zero_copy(ioc, iov, niov, errp); +} + + +int qio_channel_flush_zero_copy(QIOChannel *ioc, + Error **errp) +{ + QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc); + + if (!klass->io_flush_zero_copy || + !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) { + return 0; + } + + return klass->io_flush_zero_copy(ioc, errp); +} + + static void qio_channel_restart_read(void *opaque) { QIOChannel *ioc = opaque; From patchwork Thu Dec 9 09:39:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12666235 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E92D1C433EF for ; Thu, 9 Dec 2021 09:41:41 +0000 (UTC) Received: from localhost ([::1]:53324 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mvFvp-0003b1-1v for qemu-devel@archiver.kernel.org; Thu, 09 Dec 2021 04:41:41 -0500 Received: from eggs.gnu.org ([209.51.188.92]:33600) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFuN-0000T6-Oo for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:11 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:31814) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFuL-0008GA-Am for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:11 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639042805; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jGKoxdnf58ewGNAWIq5i0b14y2q/0mju46wmowQjwo8=; b=O4NdvjxuYa14QGOXgHkxiaWTm4c2AZyUnjvMMZCCHNhq3i77jNM9JPFL0/pGlZBUsALSrj MWSIluw/1nIuBiBsCMMT4yEbb+Sp85SCGqkrRLwDYj4hH/a3E472ElM5/2X1J33dkRYzs0 VxROtta3oR4qJEcyYoDRwV/j/4jAqlg= Received: from mail-ua1-f72.google.com (mail-ua1-f72.google.com [209.85.222.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-125-TBHSCIKsMMC9j6zjLbOSkw-1; Thu, 09 Dec 2021 04:40:04 -0500 X-MC-Unique: TBHSCIKsMMC9j6zjLbOSkw-1 Received: by mail-ua1-f72.google.com with SMTP id t22-20020ab02696000000b002e970ec14a3so3317634uao.19 for ; Thu, 09 Dec 2021 01:40:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jGKoxdnf58ewGNAWIq5i0b14y2q/0mju46wmowQjwo8=; b=OKbpu9g0kLhvozniMsYWl23AKGC6kWGTcvBC9MCZNKTG4a7W5WnEuPS+NssTHRL6B+ zql9oXZo88UzSUOy4dtV7ya6oDz8Xcsj8ltHmzJdMgoILiL9IFg3A8L77qwvQgEI7JSy cC/UqPpRMEABIeFAULTVCy/QEwr0rvT27b44LaEuFj2IeS9Infa48LsuvtVttx1mkXUl o9wavyzUMamxyJDMM1aDUpXSenazLM3kJExRt91Ey1QF2e4B7COjb7Ru0CPbOw5eYrP1 xBseIYv6SjcvUGLz7C9t6TC8OGpgHM6YoUfAnWl3qsiVLIeLqnslk0jnFavic8/72kZ0 zCFw== X-Gm-Message-State: AOAM53275eBbHO5f1/R20N3k3pNlODz+b6dxzd5pUCJwvk6APbmsFE+T vc6wokLDszSd0vtjj9jXedN4GilByuDGbE186bfDJDzJhszkmH+ST0bgJfbyvjv3D3gnSYAvLKl LaO7Qo9+WI+YHwQ0= X-Received: by 2002:ab0:4868:: with SMTP id c37mr17257396uad.33.1639042803475; Thu, 09 Dec 2021 01:40:03 -0800 (PST) X-Google-Smtp-Source: ABdhPJz0SAtZYfq/FI2DXo8uJhs/vb2TlSezgUn4SIRgJv9hKTA74Yc41VNMpqr+8XE6rZ+3epm5Hg== X-Received: by 2002:ab0:4868:: with SMTP id c37mr17257369uad.33.1639042803258; Thu, 09 Dec 2021 01:40:03 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f0:dadb:5bd9:f52:7a4a:a8af]) by smtp.gmail.com with ESMTPSA 
id s22sm3460215vkm.28.2021.12.09.01.40.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Dec 2021 01:40:02 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster Subject: [PATCH v6 2/6] QIOChannelSocket: Add flags parameter for writing Date: Thu, 9 Dec 2021 06:39:20 -0300 Message-Id: <20211209093923.1293701-3-leobras@redhat.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211209093923.1293701-1-leobras@redhat.com> References: <20211209093923.1293701-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -33 X-Spam_score: -3.4 X-Spam_bar: --- X-Spam_report: (-3.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.618, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

Change qio_channel_socket_writev() to accept flags, so it's possible to selectively make use of sendmsg() flags.

The contents of qio_channel_socket_writev() were moved to a helper function, qio_channel_socket_writev_flags(), which accepts an extra argument for flags. (This argument is passed directly to sendmsg().)

Signed-off-by: Leonardo Bras Reviewed-by: Daniel P.
Berrangé --- io/channel-socket.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/io/channel-socket.c b/io/channel-socket.c index 606ec97cf7..b57a27bf91 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -520,12 +520,13 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc, return ret; } -static ssize_t qio_channel_socket_writev(QIOChannel *ioc, - const struct iovec *iov, - size_t niov, - int *fds, - size_t nfds, - Error **errp) +static ssize_t qio_channel_socket_writev_flags(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + int *fds, + size_t nfds, + int flags, + Error **errp) { QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc); ssize_t ret; @@ -558,7 +559,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc, } retry: - ret = sendmsg(sioc->fd, &msg, 0); + ret = sendmsg(sioc->fd, &msg, flags); if (ret <= 0) { if (errno == EAGAIN) { return QIO_CHANNEL_ERR_BLOCK; @@ -572,6 +573,17 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc, } return ret; } + +static ssize_t qio_channel_socket_writev(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + int *fds, + size_t nfds, + Error **errp) +{ + return qio_channel_socket_writev_flags(ioc, iov, niov, fds, nfds, 0, errp); +} + #else /* WIN32 */ static ssize_t qio_channel_socket_readv(QIOChannel *ioc, const struct iovec *iov, From patchwork Thu Dec 9 09:39:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12666237 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5EF83C433F5 for ; Thu, 9 Dec 2021 09:43:39 +0000 (UTC) Received: from localhost ([::1]:57336 
helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mvFxi-0006MN-Hk for qemu-devel@archiver.kernel.org; Thu, 09 Dec 2021 04:43:38 -0500 Received: from eggs.gnu.org ([209.51.188.92]:33822) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFua-0000r3-6d for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:24 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:47826) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFuM-0008Gr-Rt for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639042810; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iOUPGpjpRkeh0PPRwOCu38b1HQTgEiG41NIQNvFH+wc=; b=Kqajhz4G0kdLvgomZOXGgLFauSabLQCW/MyRRuVlnL9xdO/8EltlpjyxJvM/alnn0d0IH/ NdIapQz08FQkC1jvxOa2FqyUIv5Ny8as4o/pDcMlEZKhIYQ8i6t0v+xeVtszxEO2zPnVsM sqnneEGt840XJf9sXQG8VMfukt5eLuE= Received: from mail-ua1-f72.google.com (mail-ua1-f72.google.com [209.85.222.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-319-OIbm-HlhP4aEJ0kjzOBEjA-1; Thu, 09 Dec 2021 04:40:07 -0500 X-MC-Unique: OIbm-HlhP4aEJ0kjzOBEjA-1 Received: by mail-ua1-f72.google.com with SMTP id w48-20020a9f31f3000000b002cf15fe9655so3316227uad.17 for ; Thu, 09 Dec 2021 01:40:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=iOUPGpjpRkeh0PPRwOCu38b1HQTgEiG41NIQNvFH+wc=; b=Lnv5n8U4f+YAK8Lp3Kme8LS4dg7aco+y1CExbsqIgiUEG1uWBpqHIT0/ppWab/1AOc 
9w0fQKaoqeQ49EqY7KEpcFAkEhozGiTujHBP72mMVqIltU/sOSWwH92nvKA3qS/tnnuk FHPlTV73MWrUmEeOzny+oaNi7q/GTfyiUesEqynhKSWQr0uRgSzdkSnGTw9y0wm92OFy z8/LKl9Uy7IkctawMRudmgoazRhFfkfSaBjNexClXi1HVT8YD3SB6UyJ+E5VcNee7WYD Ip5Hi8+ZLk/BgYGtfphlXLr34eksS5VXc5oslRqpsdUlbGrIBNl3Lt5ymbNcQucgArZa tVbA== X-Gm-Message-State: AOAM533KKOGwAm0w8bpTJ4DVTg+1UzQaml+d3JGYW7SEF7zHUt5hfcKd asPBI5SQkzQtNVS2bF4YGkPgJv4U/ccBBlpklCyApL+pawCKH9E6HHrLtHGRz27VcbRhthCBY13 phwc+PX1l10qplo0= X-Received: by 2002:a67:f9c6:: with SMTP id c6mr6263187vsq.20.1639042806571; Thu, 09 Dec 2021 01:40:06 -0800 (PST) X-Google-Smtp-Source: ABdhPJwfzFRtAiG3HOrwKR3yujerRspAO8cCr1F/LuHgvVpLESFZ2sBibUpcLxWWEMWFyVcSQSsK3Q== X-Received: by 2002:a67:f9c6:: with SMTP id c6mr6263158vsq.20.1639042806343; Thu, 09 Dec 2021 01:40:06 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f0:dadb:5bd9:f52:7a4a:a8af]) by smtp.gmail.com with ESMTPSA id s22sm3460215vkm.28.2021.12.09.01.40.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Dec 2021 01:40:05 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr. 
David Alan Gilbert" , Eric Blake , Markus Armbruster Subject: [PATCH v6 3/6] QIOChannelSocket: Implement io_writev_zero_copy & io_flush_zero_copy for CONFIG_LINUX Date: Thu, 9 Dec 2021 06:39:21 -0300 Message-Id: <20211209093923.1293701-4-leobras@redhat.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211209093923.1293701-1-leobras@redhat.com> References: <20211209093923.1293701-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -26 X-Spam_score: -2.7 X-Spam_bar: -- X-Spam_report: (-2.7 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.618, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

For CONFIG_LINUX, implement the new optional callbacks io_writev_zero_copy and io_flush_zero_copy on QIOChannelSocket, but enable them only when the MSG_ZEROCOPY feature is available in the host kernel, which is checked in qio_channel_socket_connect_sync().

qio_channel_socket_flush_zero_copy() was implemented by counting how many times sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the socket's error queue to find how many of them finished sending. Flush will loop until those counters are the same, or until some error occurs.
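The counter logic described above can be sketched without a socket. This is a simplified model, not the patch's code: ToyNotification and toy_flush() are hypothetical names, and an array of notifications stands in for the socket's error queue. Each notification covers an inclusive range of completed sendmsg() calls, as the ee_info and ee_data fields do in the diff. Unlike the real flush, which waits when the error queue is empty, the model returns -1 when it runs out of notifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one error-queue notification. */
typedef struct ToyNotification {
    uint32_t lo;   /* first completed sendmsg() index (like ee_info) */
    uint32_t hi;   /* last completed sendmsg() index (like ee_data) */
    int copied;    /* nonzero if the kernel fell back to copying */
} ToyNotification;

/*
 * Consume notifications until *sent catches up with queued.
 * Returns -1 if the notifications run out first (a stand-in for an error),
 *          1 if no completed send actually used zero copy,
 *          0 otherwise.
 */
static int toy_flush(uint64_t queued, uint64_t *sent,
                     const ToyNotification *n, size_t count)
{
    int ret = 1;
    size_t i = 0;

    while (*sent < queued) {
        if (i == count) {
            return -1;
        }
        assert(n[i].hi >= n[i].lo);
        /* One notification may acknowledge a whole range of sends. */
        *sent += (uint64_t)(n[i].hi - n[i].lo) + 1;
        if (!n[i].copied) {
            ret = 0;
        }
        i++;
    }
    return ret;
}
```

As in the diff, the result starts at 1 and drops to 0 as soon as any notification reports a send that was not copied.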
A new function qio_channel_socket_poll() was also created in order to avoid busy-looping recvmsg() in qio_channel_socket_flush_zero_copy() while waiting for updates in the socket's error queue.

Notes on using writev_zero_copy():
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying, some caution is necessary to avoid overwriting any buffer before it's sent. If this happens, a newer version of the buffer may be sent instead.
- If this is a problem, it's recommended to call flush_zero_copy() before freeing or re-using the buffer.
2: Locked memory
- When using MSG_ZEROCOPY, the buffer memory will be locked after it is queued, and unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may require a larger amount of locked memory than is usually available to a non-root user.
- If the required amount of locked memory is not available, writev_zero_copy will return an error, which can abort an operation like migration.
- Because of this, when user code wants to add zero copy as a feature, it requires a mechanism to disable it, so it can still be accessible to less privileged users.
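One way a caller might act on the locked memory notes is sketched below. It is an assumption-laden illustration, not part of this series: toy_fits_memlock(), toy_should_use_zero_copy() and toy_self_check() are hypothetical helpers, and comparing an estimate of in-flight bytes against the RLIMIT_MEMLOCK soft limit is only a heuristic (the kernel's accounting and privileged processes may behave differently). The user_enabled parameter stands for the "mechanism to disable it" mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/* Pure helper: do `needed` bytes fit under a locked-memory soft limit? */
static bool toy_fits_memlock(rlim_t soft_limit, uint64_t needed)
{
    return soft_limit == RLIM_INFINITY || needed <= (uint64_t)soft_limit;
}

/*
 * Hypothetical policy: use zero copy only if the user asked for it and
 * the estimated in-flight bytes fit under RLIMIT_MEMLOCK. Falls back to
 * "no" if the limit cannot be read.
 */
static bool toy_should_use_zero_copy(bool user_enabled, uint64_t needed)
{
    struct rlimit rl;

    if (!user_enabled) {
        return false;   /* explicit opt-out always wins */
    }
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return false;
    }
    return toy_fits_memlock(rl.rlim_cur, needed);
}

/* Small self-check of the pure helper. */
static void toy_self_check(void)
{
    assert(toy_fits_memlock(RLIM_INFINITY, UINT64_C(1) << 30));
    assert(toy_fits_memlock(4096, 4096));
    assert(!toy_fits_memlock(4096, 4097));
}
```

Even with such a pre-check, a write can still fail with ENOBUFS at run time, so the error path in the diff remains necessary.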
Signed-off-by: Leonardo Bras --- include/io/channel-socket.h | 2 + io/channel-socket.c | 119 +++++++++++++++++++++++++++++++++++- 2 files changed, 118 insertions(+), 3 deletions(-) diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h index e747e63514..513c428fe4 100644 --- a/include/io/channel-socket.h +++ b/include/io/channel-socket.h @@ -47,6 +47,8 @@ struct QIOChannelSocket { socklen_t localAddrLen; struct sockaddr_storage remoteAddr; socklen_t remoteAddrLen; + ssize_t zero_copy_queued; + ssize_t zero_copy_sent; }; diff --git a/io/channel-socket.c b/io/channel-socket.c index b57a27bf91..d0c91662c1 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -26,6 +26,10 @@ #include "io/channel-watch.h" #include "trace.h" #include "qapi/clone-visitor.h" +#ifdef CONFIG_LINUX +#include +#include +#endif #define SOCKET_MAX_FDS 16 @@ -55,6 +59,8 @@ qio_channel_socket_new(void) sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); sioc->fd = -1; + sioc->zero_copy_queued = 0; + sioc->zero_copy_sent = 0; ioc = QIO_CHANNEL(sioc); qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); @@ -154,6 +160,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc, return -1; } +#ifdef CONFIG_LINUX + int ret, v = 1; + ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v)); + if (ret == 0) { + /* Zero copy available on host */ + qio_channel_set_feature(QIO_CHANNEL(ioc), + QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY); + } +#endif + return 0; } @@ -561,12 +577,19 @@ static ssize_t qio_channel_socket_writev_flags(QIOChannel *ioc, retry: ret = sendmsg(sioc->fd, &msg, flags); if (ret <= 0) { - if (errno == EAGAIN) { + switch (errno) { + case EAGAIN: return QIO_CHANNEL_ERR_BLOCK; - } - if (errno == EINTR) { + case EINTR: goto retry; + case ENOBUFS: + if (flags & MSG_ZEROCOPY) { + error_setg_errno(errp, errno, + "Process can't lock enough memory for using MSG_ZEROCOPY"); + return -1; + } } + error_setg_errno(errp, errno, "Unable to write to 
socket"); return -1; @@ -670,6 +693,92 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc, } #endif /* WIN32 */ + +#ifdef CONFIG_LINUX + +static ssize_t qio_channel_socket_writev_zero_copy(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + Error **errp) +{ + QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc); + ssize_t ret; + + ret = qio_channel_socket_writev_flags(ioc, iov, niov, NULL, 0, + MSG_ZEROCOPY, errp); + if (ret > 0) { + sioc->zero_copy_queued++; + } + + return ret; +} + +static int qio_channel_socket_flush_zero_copy(QIOChannel *ioc, + Error **errp) +{ + QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc); + struct msghdr msg = {}; + struct sock_extended_err *serr; + struct cmsghdr *cm; + char control[CMSG_SPACE(sizeof(*serr))]; + int received; + int ret = 1; + + msg.msg_control = control; + msg.msg_controllen = sizeof(control); + memset(control, 0, sizeof(control)); + + while (sioc->zero_copy_sent < sioc->zero_copy_queued) { + received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE); + if (received < 0) { + switch (errno) { + case EAGAIN: + /* Nothing on errqueue, wait until something is available */ + qio_channel_wait(ioc, G_IO_ERR); + continue; + case EINTR: + continue; + default: + error_setg_errno(errp, errno, + "Unable to read errqueue"); + return -1; + } + } + + cm = CMSG_FIRSTHDR(&msg); + if (cm->cmsg_level != SOL_IP && + cm->cmsg_type != IP_RECVERR) { + error_setg_errno(errp, EPROTOTYPE, + "Wrong cmsg in errqueue"); + return -1; + } + + serr = (void *) CMSG_DATA(cm); + if (serr->ee_errno != SO_EE_ORIGIN_NONE) { + error_setg_errno(errp, serr->ee_errno, + "Error on socket"); + return -1; + } + if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) { + error_setg_errno(errp, serr->ee_origin, + "Error not from zero copy"); + return -1; + } + + /* No errors, count successfully finished sendmsg()*/ + sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1; + + /* If any sendmsg() succeeded using zero copy, return 0 at the end */ + if (serr->ee_code != 
SO_EE_CODE_ZEROCOPY_COPIED) { + ret = 0; + } + } + + return ret; +} + +#endif /* CONFIG_LINUX */ + static int qio_channel_socket_set_blocking(QIOChannel *ioc, bool enabled, @@ -799,6 +908,10 @@ static void qio_channel_socket_class_init(ObjectClass *klass, ioc_klass->io_set_delay = qio_channel_socket_set_delay; ioc_klass->io_create_watch = qio_channel_socket_create_watch; ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler; +#ifdef CONFIG_LINUX + ioc_klass->io_writev_zero_copy = qio_channel_socket_writev_zero_copy; + ioc_klass->io_flush_zero_copy = qio_channel_socket_flush_zero_copy; +#endif } static const TypeInfo qio_channel_socket_info = { From patchwork Thu Dec 9 09:39:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12666253 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D0605C433EF for ; Thu, 9 Dec 2021 09:44:24 +0000 (UTC) Received: from localhost ([::1]:58222 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mvFyR-0006zO-RX for qemu-devel@archiver.kernel.org; Thu, 09 Dec 2021 04:44:23 -0500 Received: from eggs.gnu.org ([209.51.188.92]:33734) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFuT-0000hD-NE for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:17 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:30827) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFuR-0008JV-GE for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:17 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=redhat.com; s=mimecast20190719; t=1639042814; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=F4NLDeIuuaxxceOZmivbo1XKjLJe4Xseu9LYJlJr5/M=; b=RKonH8PyGHD4VF7zAJqBeMp8lk3NYKIurOPySVOrBm8wc7A/MwxV4noFyHzZWTAogejBaR AHmTyOQFMTyqgdr1oKnXjtsgho9AN8CM7sYLFbpUOzMVylQGJYezCqkD/vCxPWYZw2ksBp x/3+J1p2Rof3vgmTLXoKZc1Oyjr2Hgo= Received: from mail-vk1-f199.google.com (mail-vk1-f199.google.com [209.85.221.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-109-g1goGKrYPre0VDZlG-bWqQ-1; Thu, 09 Dec 2021 04:40:13 -0500 X-MC-Unique: g1goGKrYPre0VDZlG-bWqQ-1 Received: by mail-vk1-f199.google.com with SMTP id t193-20020a1f78ca000000b002f7fbb56d22so2930927vkc.17 for ; Thu, 09 Dec 2021 01:40:13 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=F4NLDeIuuaxxceOZmivbo1XKjLJe4Xseu9LYJlJr5/M=; b=IMrPeJVZkqSmpi525yZihi1mk3P2M2efxdB2TF+fgeNexA0yi2VCoBrwGAKx6ybRoY FojD7t7iLMxDohzZ/ADLDerkx2VNmffZQNquI1Soe+ZeDWALf1rDnzrSmfWrr7bQpoMo Z85S/WcSGZNnWJy4+C240BoAtWEJobAzWtM63XGmVWe/Vg7rwoTr55/DmNycwDhpo8/Q HkIaMrvfXdETR5FiJprJ8sX8kXNxzyikjf+qIk3XeOCKbDHoagtrDCl6HsJSFF51v4jb 43KuzY/t6vUKWrMz8+cQ02oERX/Xe9pWr8fSFOhbdtAb8Ry/lUzuwzMdFcRAZ6nDRLwK sUHA== X-Gm-Message-State: AOAM531CH4nUflseL81vsCQ0SkgisGOKRWMTxQO9Md63V0ZOeQUduAxq 3Lno8Ee6qaf41y0nBE/MidqRRqAPgsQM1O8PSd2zOP7ZPMJduwaSUASDM3K8gwYpBVYMBktEtFQ Zz0Sinzn8uoxh0nc= X-Received: by 2002:a05:6122:548:: with SMTP id y8mr7315237vko.24.1639042813245; Thu, 09 Dec 2021 01:40:13 -0800 (PST) X-Google-Smtp-Source: ABdhPJzPtbYUW47MwnBSywo/z1lAdLVvICZYtBWSsGVy8dq1+jSH+vHKfcAoYHGTpvcrJUbAoxpeOQ== X-Received: by 2002:a05:6122:548:: 
with SMTP id y8mr7315201vko.24.1639042812997; Thu, 09 Dec 2021 01:40:12 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f0:dadb:5bd9:f52:7a4a:a8af]) by smtp.gmail.com with ESMTPSA id s22sm3460215vkm.28.2021.12.09.01.40.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Dec 2021 01:40:09 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster Subject: [PATCH v6 4/6] migration: Add zero-copy parameter for QMP/HMP for Linux Date: Thu, 9 Dec 2021 06:39:22 -0300 Message-Id: <20211209093923.1293701-5-leobras@redhat.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211209093923.1293701-1-leobras@redhat.com> References: <20211209093923.1293701-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -33 X-Spam_score: -3.4 X-Spam_bar: --- X-Spam_report: (-3.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.618, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Add a property that allows zero-copy migration of memory pages, and also include a helper function migrate_use_zero_copy() to check if it's enabled. 
No code is introduced to actually do the migration, but it allows future implementations to enable/disable this feature. On non-Linux builds this parameter is compiled out. Signed-off-by: Leonardo Bras --- qapi/migration.json | 24 ++++++++++++++++++++++++ migration/migration.h | 5 +++++ migration/migration.c | 32 ++++++++++++++++++++++++++++++++ migration/socket.c | 5 +++++ monitor/hmp-cmds.c | 6 ++++++ 5 files changed, 72 insertions(+) diff --git a/qapi/migration.json b/qapi/migration.json index bbfd48cf0b..2e62ea6ebd 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -730,6 +730,13 @@ # will consume more CPU. # Defaults to 1. (Since 5.0) # +# @zero-copy: Controls behavior on sending memory pages on migration. +# When true, enables a zero-copy mechanism for sending memory +# pages, if host supports it. +# Requires that QEMU be permitted to use locked memory for guest +# RAM pages. +# Defaults to false. (Since 7.0) +# # @block-bitmap-mapping: Maps block nodes and bitmaps on them to # aliases for the purpose of dirty bitmap migration. Such # aliases may for example be the corresponding names on the @@ -769,6 +776,7 @@ 'xbzrle-cache-size', 'max-postcopy-bandwidth', 'max-cpu-throttle', 'multifd-compression', 'multifd-zlib-level' ,'multifd-zstd-level', + { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'}, 'block-bitmap-mapping' ] } ## @@ -895,6 +903,13 @@ # will consume more CPU. # Defaults to 1. (Since 5.0) # +# @zero-copy: Controls behavior on sending memory pages on migration. +# When true, enables a zero-copy mechanism for sending memory +# pages, if host supports it. +# Requires that QEMU be permitted to use locked memory for guest +# RAM pages. +# Defaults to false. (Since 7.0) +# # @block-bitmap-mapping: Maps block nodes and bitmaps on them to # aliases for the purpose of dirty bitmap migration. 
Such # aliases may for example be the corresponding names on the @@ -949,6 +964,7 @@ '*multifd-compression': 'MultiFDCompression', '*multifd-zlib-level': 'uint8', '*multifd-zstd-level': 'uint8', + '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' }, '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } ## @@ -1095,6 +1111,13 @@ # will consume more CPU. # Defaults to 1. (Since 5.0) # +# @zero-copy: Controls behavior on sending memory pages on migration. +# When true, enables a zero-copy mechanism for sending memory +# pages, if host supports it. +# Requires that QEMU be permitted to use locked memory for guest +# RAM pages. +# Defaults to false. (Since 7.0) +# # @block-bitmap-mapping: Maps block nodes and bitmaps on them to # aliases for the purpose of dirty bitmap migration. Such # aliases may for example be the corresponding names on the @@ -1147,6 +1170,7 @@ '*multifd-compression': 'MultiFDCompression', '*multifd-zlib-level': 'uint8', '*multifd-zstd-level': 'uint8', + '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' }, '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } ## diff --git a/migration/migration.h b/migration/migration.h index 8130b703eb..1489eeb165 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -339,6 +339,11 @@ MultiFDCompression migrate_multifd_compression(void); int migrate_multifd_zlib_level(void); int migrate_multifd_zstd_level(void); +#ifdef CONFIG_LINUX +bool migrate_use_zero_copy(void); +#else +#define migrate_use_zero_copy() (false) +#endif int migrate_use_xbzrle(void); uint64_t migrate_xbzrle_cache_size(void); bool migrate_colo_enabled(void); diff --git a/migration/migration.c b/migration/migration.c index abaf6f9e3d..109d11e4a1 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -886,6 +886,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp) params->multifd_zlib_level = s->parameters.multifd_zlib_level; params->has_multifd_zstd_level = true; params->multifd_zstd_level = 
s->parameters.multifd_zstd_level; +#ifdef CONFIG_LINUX + params->has_zero_copy = true; + params->zero_copy = s->parameters.zero_copy; +#endif params->has_xbzrle_cache_size = true; params->xbzrle_cache_size = s->parameters.xbzrle_cache_size; params->has_max_postcopy_bandwidth = true; @@ -1538,6 +1542,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params, if (params->has_multifd_compression) { dest->multifd_compression = params->multifd_compression; } +#ifdef CONFIG_LINUX + if (params->has_zero_copy) { + dest->zero_copy = params->zero_copy; + } +#endif if (params->has_xbzrle_cache_size) { dest->xbzrle_cache_size = params->xbzrle_cache_size; } @@ -1650,6 +1659,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) if (params->has_multifd_compression) { s->parameters.multifd_compression = params->multifd_compression; } +#ifdef CONFIG_LINUX + if (params->has_zero_copy) { + s->parameters.zero_copy = params->zero_copy; + } +#endif if (params->has_xbzrle_cache_size) { s->parameters.xbzrle_cache_size = params->xbzrle_cache_size; xbzrle_cache_resize(params->xbzrle_cache_size, errp); @@ -2540,6 +2554,17 @@ int migrate_multifd_zstd_level(void) return s->parameters.multifd_zstd_level; } +#ifdef CONFIG_LINUX +bool migrate_use_zero_copy(void) +{ + MigrationState *s; + + s = migrate_get_current(); + + return s->parameters.zero_copy; +} +#endif + int migrate_use_xbzrle(void) { MigrationState *s; @@ -4190,6 +4215,10 @@ static Property migration_properties[] = { DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState, parameters.multifd_zstd_level, DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL), +#ifdef CONFIG_LINUX + DEFINE_PROP_BOOL("zero_copy", MigrationState, + parameters.zero_copy, false), +#endif DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState, parameters.xbzrle_cache_size, DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE), @@ -4287,6 +4316,9 @@ static void migration_instance_init(Object *obj) params->has_multifd_compression = true; 
params->has_multifd_zlib_level = true; params->has_multifd_zstd_level = true; +#ifdef CONFIG_LINUX + params->has_zero_copy = true; +#endif params->has_xbzrle_cache_size = true; params->has_max_postcopy_bandwidth = true; params->has_max_cpu_throttle = true; diff --git a/migration/socket.c b/migration/socket.c index 05705a32d8..f7a77aafd3 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -77,6 +77,11 @@ static void socket_outgoing_migration(QIOTask *task, } else { trace_migration_socket_outgoing_connected(data->hostname); } + + if (migrate_use_zero_copy()) { + error_setg(&err, "Zero copy not available in migration"); + } + migration_channel_connect(data->s, sioc, data->hostname, err); object_unref(OBJECT(sioc)); } diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c index 9c91bf93e9..71d16f86ac 100644 --- a/monitor/hmp-cmds.c +++ b/monitor/hmp-cmds.c @@ -1297,6 +1297,12 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict) p->has_multifd_zstd_level = true; visit_type_uint8(v, param, &p->multifd_zstd_level, &err); break; +#ifdef CONFIG_LINUX + case MIGRATION_PARAMETER_ZERO_COPY: + p->has_zero_copy = true; + visit_type_bool(v, param, &p->zero_copy, &err); + break; +#endif case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE: p->has_xbzrle_cache_size = true; if (!visit_type_size(v, param, &cache_size, &err)) { From patchwork Thu Dec 9 09:39:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12666255 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5CBA9C433F5 for ; Thu, 9 Dec 2021 09:46:33 +0000 (UTC) Received: from localhost ([::1]:35204 helo=lists1p.gnu.org) by lists.gnu.org 
with esmtp (Exim 4.90_1) (envelope-from ) id 1mvG0W-00028Z-A2 for qemu-devel@archiver.kernel.org; Thu, 09 Dec 2021 04:46:32 -0500 Received: from eggs.gnu.org ([209.51.188.92]:33920) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFui-0001EF-74 for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:32 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:49059) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFuU-0008N0-6b for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639042817; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZfXfBsToD0+qcYWGy5JM4yO2gVY45hZvZa4JsqMhZsk=; b=Mw1or5XIOCKjLL2IXLUa72TKfbF7emtO6rMzGxuxwHdkV0nVSxCfPCblTNLVzF0IKs8JGX +h+AsmPvNVBZ0f2eVCV+dZHRicezRhkYGKBSIZM6VxwqRJtgNyXUX+WBdY2rviONIjJyBl TdGIeIYzVsTESREHyP7uoJpnyfkaiOM= Received: from mail-vk1-f199.google.com (mail-vk1-f199.google.com [209.85.221.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-248--EnvzmEONMSXKzz46tAi0g-1; Thu, 09 Dec 2021 04:40:16 -0500 X-MC-Unique: -EnvzmEONMSXKzz46tAi0g-1 Received: by mail-vk1-f199.google.com with SMTP id m190-20020a1feec7000000b002fa5912bcd7so2923187vkh.21 for ; Thu, 09 Dec 2021 01:40:16 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ZfXfBsToD0+qcYWGy5JM4yO2gVY45hZvZa4JsqMhZsk=; b=J1uIgJOSohTZur6GOtdnWVPofgj9SCw3znLxgmNARS2CDolbtq4zGSrFdjBHkHId6d 
4szxKQyhWj4MkFT0fgoVbfBxZ15H/F+1MwV+4a2icDrZk26qh038nAbdBLB4vqkyWr5y gcej0MOvUbMZEZ3N9QnlXFtWKQ10BrZjvmlJ3pt8ka81+RvFuurz2D6QIeEGN9/+uN4d xxB9+zO7B/UNEZiXM77hLnpI6noltFiDEJ4MQkK3IAU8k85kRvwP3rnfX+dabJrt17Ps /GQYj7ODFAbnuzjYgXHr6Bm71bILfrWqwCbF+1VycDFklj5/AlkfihPDZj56CVcXNI/b JvsA== X-Gm-Message-State: AOAM532w1IMd9KSu/iiM2tE65KvvtlLqsN3bZb5g/RBRuJLOQyuGMfEO qSWN0uePdLehLbbrd9h7kDdL7EO6VlR8npp4rFDUSv+Ij7WLs4/rZavCIDS4TDC343zRzJdZM+q Ku0AelIT8JU/kB2Q= X-Received: by 2002:a05:6102:3ed1:: with SMTP id n17mr5981551vsv.3.1639042815941; Thu, 09 Dec 2021 01:40:15 -0800 (PST) X-Google-Smtp-Source: ABdhPJwqprxE7tyyxid4tLcY7kQh146rFIr2WEKeXv6gr4pCuReHc2HfhRG17dJdlF/OVShbIytMuQ== X-Received: by 2002:a05:6102:3ed1:: with SMTP id n17mr5981529vsv.3.1639042815796; Thu, 09 Dec 2021 01:40:15 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f0:dadb:5bd9:f52:7a4a:a8af]) by smtp.gmail.com with ESMTPSA id s22sm3460215vkm.28.2021.12.09.01.40.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Dec 2021 01:40:15 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr. 
David Alan Gilbert" , Eric Blake , Markus Armbruster Subject: [PATCH v6 5/6] migration: Add migrate_use_tls() helper Date: Thu, 9 Dec 2021 06:39:23 -0300 Message-Id: <20211209093923.1293701-6-leobras@redhat.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211209093923.1293701-1-leobras@redhat.com> References: <20211209093923.1293701-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -26 X-Spam_score: -2.7 X-Spam_bar: -- X-Spam_report: (-2.7 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.618, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" A lot of places check parameters.tls_creds in order to evaluate if TLS is in use, and sometimes call migrate_get_current() just for that test. Add new helper function migrate_use_tls() in order to simplify testing for TLS usage. 
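The check that this helper centralizes can be sketched outside of QEMU as follows. This is only a minimal illustration, not the code added by the patch: FakeMigrationParams and fake_use_tls() are hypothetical stand-ins for the migration parameters and for migrate_use_tls().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of the migration parameters. */
typedef struct {
    const char *tls_creds; /* NULL or "" means TLS is not configured */
} FakeMigrationParams;

/* TLS counts as in use only if the credentials ID is set and non-empty. */
static bool fake_use_tls(const FakeMigrationParams *p)
{
    assert(p != NULL);
    return p->tls_creds != NULL && *p->tls_creds != '\0';
}
```

Both conditions matter: an unset pointer and an empty string each mean that TLS is not in use, which is why callers previously had to repeat the two-part test.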
Signed-off-by: Leonardo Bras Reviewed-by: Juan Quintela --- migration/migration.h | 1 + migration/channel.c | 6 +++--- migration/migration.c | 9 +++++++++ migration/multifd.c | 5 +---- 4 files changed, 14 insertions(+), 7 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index 1489eeb165..445d95bbf2 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -344,6 +344,7 @@ bool migrate_use_zero_copy(void); #else #define migrate_use_zero_copy() (false) #endif +int migrate_use_tls(void); int migrate_use_xbzrle(void); uint64_t migrate_xbzrle_cache_size(void); bool migrate_colo_enabled(void); diff --git a/migration/channel.c b/migration/channel.c index c4fc000a1a..1a45b75d29 100644 --- a/migration/channel.c +++ b/migration/channel.c @@ -32,16 +32,16 @@ */ void migration_channel_process_incoming(QIOChannel *ioc) { - MigrationState *s = migrate_get_current(); Error *local_err = NULL; trace_migration_set_incoming_channel( ioc, object_get_typename(OBJECT(ioc))); - if (s->parameters.tls_creds && - *s->parameters.tls_creds && + if (migrate_use_tls() && !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS)) { + MigrationState *s = migrate_get_current(); + migration_tls_channel_process_incoming(s, ioc, &local_err); } else { migration_ioc_register_yank(ioc); diff --git a/migration/migration.c b/migration/migration.c index 109d11e4a1..8e50f7508a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2565,6 +2565,15 @@ bool migrate_use_zero_copy(void) } #endif +int migrate_use_tls(void) +{ + MigrationState *s; + + s = migrate_get_current(); + + return s->parameters.tls_creds && *s->parameters.tls_creds; +} + int migrate_use_xbzrle(void) { MigrationState *s; diff --git a/migration/multifd.c b/migration/multifd.c index 7c9deb1921..b32b756147 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -794,14 +794,11 @@ static bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc, Error *error) { - MigrationState *s = 
migrate_get_current(); - trace_multifd_set_outgoing_channel( ioc, object_get_typename(OBJECT(ioc)), p->tls_hostname, error); if (!error) { - if (s->parameters.tls_creds && - *s->parameters.tls_creds && + if (migrate_use_tls() && !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS)) { multifd_tls_channel_connect(p, ioc, &error); From patchwork Thu Dec 9 09:39:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12666257 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AB79DC433F5 for ; Thu, 9 Dec 2021 09:46:50 +0000 (UTC) Received: from localhost ([::1]:36182 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mvG0n-0002rA-R0 for qemu-devel@archiver.kernel.org; Thu, 09 Dec 2021 04:46:49 -0500 Received: from eggs.gnu.org ([209.51.188.92]:33846) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFub-0000vE-Qt for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:25 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:21717) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvFuZ-0008OG-T9 for qemu-devel@nongnu.org; Thu, 09 Dec 2021 04:40:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639042822; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lA323sJLUVxjAwZPCVNpDBJnVeKk+Zs9yrOAfD/pO4o=; 
b=IeSRtqf6CD149t62y21fQaH93LbGaRLAjDjNLPil6Vj9UyIN0DDCrzljjQHbIFfkQCnnX7 QEKkfppemKqgsN3FmVIkdhefhYskmrg0Y2ns8R1YtdHHYwfzMsn9gwwl32nhnLStWvagHP 8/hPU72zygYmjXkJUp745K1ZbY+o/x8= Received: from mail-vk1-f197.google.com (mail-vk1-f197.google.com [209.85.221.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-258-DXfGgoQvNmmX07zy3MWGHA-1; Thu, 09 Dec 2021 04:40:19 -0500 X-MC-Unique: DXfGgoQvNmmX07zy3MWGHA-1 Received: by mail-vk1-f197.google.com with SMTP id x5-20020a056122118500b002efcd7b7990so2939767vkn.15 for ; Thu, 09 Dec 2021 01:40:19 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=lA323sJLUVxjAwZPCVNpDBJnVeKk+Zs9yrOAfD/pO4o=; b=GObWmRGrGT0Mcs2d9VRSjw4/P7e3h6idkdysbHwhjA4W2zETf3DyNHXABLAZpt7nDu VAel2e7GK1tHqY6RdOTUR2oDQTBWUIp0OF+6iBvYVwKG78w1T/3CSQ2+pegqIrZtk2xn Ml6nRnBpMtZdApojoRH+TJWaq8BU4UfA2FbTEgRO1MKewPUPn/OqifDXGCDAftNEAt3r KQtaSirG9iXKbXf+CsGTIQLVivWmhVFJG5At+fIrCuxRGFw0gcuxxvZXNXBrX5YhTKwG bMsfHWC5gQfDupxuOelT8t6i39mHCar179X3Uf9T7CKvFrp/JachZE/Ai4yepJXRvsQz /QPA== X-Gm-Message-State: AOAM531gz+XOZBIqlpN2RYuIXCaOU6kiD9ztJslbLbqDQ0euX5ND4d6t RhKopMKsv/IQ3RKSoUjTGQlbDRhSit/0kZv3diaGUs7eriNEdrZSFIoy3DuvYKLQB1JKLTqx/3q KZdEjCK/XJhj+rng= X-Received: by 2002:a67:f88e:: with SMTP id h14mr6055452vso.37.1639042819240; Thu, 09 Dec 2021 01:40:19 -0800 (PST) X-Google-Smtp-Source: ABdhPJwguOL5PbzBRX8WmXzUevHjtwMTKFVwcfHzU/yQDfYwVaRrf/N5LY05/xVx57auDLMKiWmMtw== X-Received: by 2002:a67:f88e:: with SMTP id h14mr6055413vso.37.1639042818985; Thu, 09 Dec 2021 01:40:18 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f0:dadb:5bd9:f52:7a4a:a8af]) by smtp.gmail.com with ESMTPSA id s22sm3460215vkm.28.2021.12.09.01.40.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Dec 2021 01:40:18 -0800 (PST) From: 
Leonardo Bras To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster Subject: [PATCH v6 6/6] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Date: Thu, 9 Dec 2021 06:39:24 -0300 Message-Id: <20211209093923.1293701-7-leobras@redhat.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211209093923.1293701-1-leobras@redhat.com> References: <20211209093923.1293701-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -33 X-Spam_score: -3.4 X-Spam_bar: --- X-Spam_report: (-3.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.618, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Implement zero copy on nocomp_send_write(), by making use of QIOChannel zero copy interface. Change multifd_send_sync_main() so it can distinguish each iteration sync from the setup and the completion, so a flush_zero_copy() can be called after each iteration in order to make sure all dirty pages are sent before a new iteration is started. Also make it return -1 if flush_zero_copy() fails, in order to cancel the migration process, and avoid resuming the guest in the target host without receiving all current RAM. 
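The control flow described above can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the QEMU code: FakeChannel, fake_flush_zero_copy() and fake_send_sync() are hypothetical names, and the flush is simulated by marking every queued write as completed instead of waiting on the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical channel that only counts zero-copy writes. */
typedef struct {
    size_t queued;    /* writes handed to the kernel so far */
    size_t completed; /* writes the kernel has finished sending */
    bool broken;      /* set to simulate a failing flush */
} FakeChannel;

/* Simulated flush: succeed once every queued write has completed. */
static int fake_flush_zero_copy(FakeChannel *c)
{
    assert(c != NULL);
    if (c->broken) {
        return -1;
    }
    c->completed = c->queued;
    return 0;
}

/*
 * Flush only for per-iteration syncs (sync == true) with zero copy enabled.
 * Setup and completion syncs pass sync == false and skip the flush.
 * Returns -1 on the first failed flush so the caller can cancel migration.
 */
static int fake_send_sync(FakeChannel *chans, size_t n, bool sync,
                          bool zero_copy)
{
    bool flush = sync && zero_copy;

    for (size_t i = 0; i < n; i++) {
        if (flush && fake_flush_zero_copy(&chans[i]) < 0) {
            return -1;
        }
    }
    return 0;
}
```

In this model a caller would treat a negative return value the same way the patch does: stop the migration rather than resume the guest on the target with incomplete RAM.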
This works for RAM migration because RAM pages are not usually freed, and changing a page's content between writev_zero_copy() and the actual sending of the buffer is not a problem: the change dirties the page, so it is re-sent in a later iteration anyway. A lot of locked memory may be needed in order to use multifd migration with zero-copy enabled, so low-privileged users trying to perform multifd migrations may need to disable the feature. Signed-off-by: Leonardo Bras --- migration/multifd.h | 4 +++- migration/migration.c | 11 ++++++++++- migration/multifd.c | 40 +++++++++++++++++++++++++++++++++++----- migration/ram.c | 29 ++++++++++++++++++++++------- migration/socket.c | 5 +++-- 5 files changed, 73 insertions(+), 16 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 15c50ca0b2..37941c1872 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -22,7 +22,7 @@ int multifd_load_cleanup(Error **errp); bool multifd_recv_all_channels_created(void); bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp); void multifd_recv_sync_main(void); -void multifd_send_sync_main(QEMUFile *f); +int multifd_send_sync_main(QEMUFile *f, bool sync); int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset); /* Multifd Compression flags */ @@ -97,6 +97,8 @@ typedef struct { uint32_t packet_len; /* pointer to the packet */ MultiFDPacket_t *packet; + /* multifd flags for sending ram */ + int write_flags; /* multifd flags for each packet */ uint32_t flags; /* size of the next packet that contains pages */ diff --git a/migration/migration.c b/migration/migration.c index 8e50f7508a..4e154ff901 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1468,7 +1468,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp) error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: "); return false; } - +#ifdef CONFIG_LINUX + if (params->zero_copy && + 
(!migrate_use_multifd() || + params->multifd_compression != MULTIFD_COMPRESSION_NONE || + (params->tls_creds && *params->tls_creds))) { + error_setg(errp, + "Zero copy only available for non-compressed non-TLS multifd migration"); + return false; + } +#endif return true; } diff --git a/migration/multifd.c b/migration/multifd.c index b32b756147..4c718cddf7 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -105,7 +105,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used, */ static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp) { - return qio_channel_writev_all(p->c, p->pages->iov, used, errp); + return qio_channel_writev_all_flags(p->c, p->pages->iov, used, + p->write_flags, errp); } /** @@ -578,19 +579,28 @@ void multifd_save_cleanup(void) multifd_send_state = NULL; } -void multifd_send_sync_main(QEMUFile *f) +int multifd_send_sync_main(QEMUFile *f, bool sync) { int i; + bool flush_zero_copy; if (!migrate_use_multifd()) { - return; + return 0; } if (multifd_send_state->pages->used) { if (multifd_send_pages(f) < 0) { error_report("%s: multifd_send_pages fail", __func__); - return; + return 0; } } + + /* + * When using zero-copy, it's necessary to flush after each iteration to + * make sure pages from earlier iterations don't end up replacing newer + * pages. 
+ */ + flush_zero_copy = sync && migrate_use_zero_copy(); + for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -601,7 +611,7 @@ void multifd_send_sync_main(QEMUFile *f) if (p->quit) { error_report("%s: channel %d has already quit", __func__, i); qemu_mutex_unlock(&p->mutex); - return; + return 0; } p->packet_num = multifd_send_state->packet_num++; @@ -612,6 +622,17 @@ void multifd_send_sync_main(QEMUFile *f) ram_counters.transferred += p->packet_len; qemu_mutex_unlock(&p->mutex); qemu_sem_post(&p->sem); + + if (flush_zero_copy) { + int ret; + Error *err = NULL; + + ret = qio_channel_flush_zero_copy(p->c, &err); + if (ret < 0) { + error_report_err(err); + return -1; + } + } } for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -620,6 +641,8 @@ void multifd_send_sync_main(QEMUFile *f) qemu_sem_wait(&p->sem_sync); } trace_multifd_send_sync_main(multifd_send_state->packet_num); + + return 0; } static void *multifd_send_thread(void *opaque) @@ -917,6 +940,13 @@ int multifd_save_setup(Error **errp) p->packet->version = cpu_to_be32(MULTIFD_VERSION); p->name = g_strdup_printf("multifdsend_%d", i); p->tls_hostname = g_strdup(s->hostname); + + if (migrate_use_zero_copy()) { + p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY; + } else { + p->write_flags = 0; + } + socket_send_channel_create(multifd_new_send_channel_async, p); } diff --git a/migration/ram.c b/migration/ram.c index 863035d235..0b3ddbffc1 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2992,6 +2992,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) { RAMState **rsp = opaque; RAMBlock *block; + int ret; if (compress_threads_save_setup()) { return -1; @@ -3026,7 +3027,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) ram_control_before_iterate(f, RAM_CONTROL_SETUP); ram_control_after_iterate(f, RAM_CONTROL_SETUP); - multifd_send_sync_main(f); + ret = multifd_send_sync_main(f, 
false); + if (ret < 0) { + return ret; + } + qemu_put_be64(f, RAM_SAVE_FLAG_EOS); qemu_fflush(f); @@ -3135,7 +3140,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) out: if (ret >= 0 && migration_is_setup_or_active(migrate_get_current()->state)) { - multifd_send_sync_main(rs->f); + ret = multifd_send_sync_main(rs->f, true); + if (ret < 0) { + return ret; + } + qemu_put_be64(f, RAM_SAVE_FLAG_EOS); qemu_fflush(f); ram_counters.transferred += 8; @@ -3193,13 +3202,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque) ram_control_after_iterate(f, RAM_CONTROL_FINISH); } - if (ret >= 0) { - multifd_send_sync_main(rs->f); - qemu_put_be64(f, RAM_SAVE_FLAG_EOS); - qemu_fflush(f); + if (ret < 0) { + return ret; } - return ret; + ret = multifd_send_sync_main(rs->f, false); + if (ret < 0) { + return ret; + } + + qemu_put_be64(f, RAM_SAVE_FLAG_EOS); + qemu_fflush(f); + + return 0; } static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, diff --git a/migration/socket.c b/migration/socket.c index f7a77aafd3..23b03e6190 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -78,8 +78,9 @@ static void socket_outgoing_migration(QIOTask *task, trace_migration_socket_outgoing_connected(data->hostname); } - if (migrate_use_zero_copy()) { - error_setg(&err, "Zero copy not available in migration"); + if (migrate_use_zero_copy() && + !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) { + error_setg(&err, "Zero copy feature not detected in host kernel"); } migration_channel_connect(data->s, sioc, data->hostname, err);