From patchwork Wed Jan 8 11:50:19 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Levon
X-Patchwork-Id: 13930721
From: John Levon
To: qemu-devel@nongnu.org
Cc: elena.ufimtseva@oracle.com, alex.williamson@redhat.com, clg@redhat.com,
 jag.raman@oracle.com, thanos.makatos@nutanix.com
Subject: [PATCH 13/26] vfio-user: define socket send functions
Date: Wed, 8 Jan 2025 11:50:19 +0000
Message-Id: <20250108115032.1677686-14-john.levon@nutanix.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250108115032.1677686-1-john.levon@nutanix.com>
References: <20250108115032.1677686-1-john.levon@nutanix.com>

From: Jagannathan Raman

Also negotiate protocol version with remote server

Originally-by: John Johnson
Signed-off-by: Jagannathan Raman
Signed-off-by: Elena Ufimtseva
Signed-off-by: John Levon
---
 hw/vfio/trace-events    |   2 +
 hw/vfio/user-pci.c      |  18 +-
 hw/vfio/user-protocol.h |  62 +++++
 hw/vfio/user.c          | 495 ++++++++++++++++++++++++++++++++++++++++
 hw/vfio/user.h          |   9 +
 5 files changed, 584 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 0e3e7be10c..d66fc6c214 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -185,3 +185,5 @@ iommufd_cdev_pci_hot_reset_dep_devices(int domain, int bus, int slot, int functi
 vfio_user_recv_hdr(const char *name, uint16_t id, uint16_t cmd, uint32_t size, uint32_t flags) " (%s) id 0x%x cmd 0x%x size 0x%x flags 0x%x"
 vfio_user_recv_read(uint16_t id, int read) " id 0x%x read 0x%x"
 vfio_user_recv_request(uint16_t cmd) " command 0x%x"
+vfio_user_send_write(uint16_t id, int wrote) " id 0x%x wrote 0x%x"
+vfio_user_version(uint16_t major, uint16_t minor, const char *caps) " major %d minor %d caps: %s"
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index b62fd4edef..62259db473 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -39,6 +39,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(VFIOUserPCIDevice, VFIO_USER_PCI)
 struct VFIOUserPCIDevice {
     VFIOPCIDevice device;
     char *sock_name;
+    bool send_queued;   /* all sends are queued */
 };
 
 /*
@@ -98,6 +99,14 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     vbasedev->proxy = proxy;
     vfio_user_set_handler(vbasedev, vfio_user_pci_process_req, vdev);
 
+    if (udev->send_queued) {
+        proxy->flags |= VFIO_PROXY_FORCE_QUEUED;
+    }
+
+    if (!vfio_user_validate_version(proxy, errp)) {
+        goto error;
+    }
+
     vbasedev->name = g_strdup_printf("VFIO user <%s>", udev->sock_name);
     vbasedev->ops = &vfio_user_pci_ops;
     vbasedev->type = VFIO_DEVICE_TYPE_PCI;
@@ -112,9 +121,13 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     if (!vfio_attach_device_by_iommu_type(TYPE_VFIO_IOMMU_USER,
                                           vbasedev->name, vbasedev,
                                           as, errp)) {
-        error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->name);
-        return;
+        goto error;
     }
+
+    return;
+
+error:
+    error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
 }
 
 static void vfio_user_instance_init(Object *obj)
@@ -157,6 +170,7 @@ static void vfio_user_instance_finalize(Object *obj)
 
 static const Property vfio_user_pci_dev_properties[] = {
     DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
+    DEFINE_PROP_BOOL("x-send-queued", VFIOUserPCIDevice, send_queued, false),
 };
 
 static void vfio_user_pci_dev_class_init(ObjectClass *klass, void *data)
diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index d23877c958..5de5b2030c 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -51,4 +51,66 @@ enum vfio_user_command {
 #define VFIO_USER_NO_REPLY      0x10
 #define VFIO_USER_ERROR         0x20
 
+
+/*
+ * VFIO_USER_VERSION
+ */
+typedef struct {
+    VFIOUserHdr hdr;
+    uint16_t major;
+    uint16_t minor;
+    char capabilities[];
+} VFIOUserVersion;
+
+#define VFIO_USER_MAJOR_VER     0
+#define VFIO_USER_MINOR_VER     0
+
+#define VFIO_USER_CAP           "capabilities"
+
+/* "capabilities" members */
+#define VFIO_USER_CAP_MAX_FDS   "max_msg_fds"
+#define VFIO_USER_CAP_MAX_XFER  "max_data_xfer_size"
+#define VFIO_USER_CAP_PGSIZES   "pgsizes"
+#define VFIO_USER_CAP_MAP_MAX   "max_dma_maps"
+#define VFIO_USER_CAP_MIGR      "migration"
+
+/* "migration" members */
+#define VFIO_USER_CAP_PGSIZE            "pgsize"
+#define VFIO_USER_CAP_MAX_BITMAP        "max_bitmap_size"
+
+/*
+ * Max FDs mainly comes into play when a device supports multiple interrupts
+ * where each one uses an eventfd to inject it into the guest.
+ * It is clamped by the number of FDs the qio channel supports in a
+ * single message.
+ */
+#define VFIO_USER_DEF_MAX_FDS   8
+#define VFIO_USER_MAX_MAX_FDS   16
+
+/*
+ * Max transfer limits the amount of data in region and DMA messages.
+ * Region R/W will be very small (limited by how much a single instruction
+ * can process) so just use a reasonable limit here.
+ */
+#define VFIO_USER_DEF_MAX_XFER  (1024 * 1024)
+#define VFIO_USER_MAX_MAX_XFER  (64 * 1024 * 1024)
+
+/*
+ * Default supported page size is 4k.
+ */
+#define VFIO_USER_DEF_PGSIZE    4096
+
+/*
+ * Default max number of DMA mappings is taken from the
+ * Linux kernel "dma_entry_limit"
+ */
+#define VFIO_USER_DEF_MAP_MAX   65535
+
+/*
+ * Default max bitmap size is also taken from the Linux kernel,
+ * where usage of signed ints limits the VA range to 2^31 bytes.
+ * Dividing that by the number of bits per byte yields 256MB
+ */
+#define VFIO_USER_DEF_MAX_BITMAP (256 * 1024 * 1024)
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 1ab8e10739..4e48bc65fe 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -23,12 +23,18 @@
 #include "io/channel-socket.h"
 #include "io/channel-util.h"
 #include "system/iothread.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qnum.h"
 #include "user.h"
 #include "trace.h"
 
+static int wait_time = 5000;   /* wait up to 5 sec for busy servers */
 static IOThread *vfio_user_iothread;
 
 static void vfio_user_shutdown(VFIOUserProxy *proxy);
+static int vfio_user_send_qio(VFIOUserProxy *proxy, VFIOUserMsg *msg);
 static VFIOUserMsg *vfio_user_getmsg(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
                                      VFIOUserFDs *fds);
 static VFIOUserFDs *vfio_user_getfds(int numfds);
@@ -36,9 +42,16 @@ static void vfio_user_recycle(VFIOUserProxy *proxy, VFIOUserMsg *msg);
 
 static void vfio_user_recv(void *opaque);
 static int vfio_user_recv_one(VFIOUserProxy *proxy);
+static void vfio_user_send(void *opaque);
+static int vfio_user_send_one(VFIOUserProxy *proxy);
 static void vfio_user_cb(void *opaque);
 
 static void vfio_user_request(void *opaque);
+static int vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg);
+static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
+                                VFIOUserFDs *fds, int rsize);
+static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
+                                  uint32_t size, uint32_t flags);
 
 static inline void vfio_user_set_error(VFIOUserHdr *hdr, uint32_t err)
 {
@@ -57,6 +70,35 @@ static void vfio_user_shutdown(VFIOUserProxy *proxy)
                                    proxy->ctx, NULL, NULL);
 }
 
+static int vfio_user_send_qio(VFIOUserProxy *proxy, VFIOUserMsg *msg)
+{
+    VFIOUserFDs *fds = msg->fds;
+    struct iovec iov = {
+        .iov_base = msg->hdr,
+        .iov_len = msg->hdr->size,
+    };
+    size_t numfds = 0;
+    int ret, *fdp = NULL;
+    Error *local_err = NULL;
+
+    if (fds != NULL && fds->send_fds != 0) {
+        numfds = fds->send_fds;
+        fdp = fds->fds;
+    }
+
+    ret = qio_channel_writev_full(proxy->ioc, &iov, 1, fdp, numfds, 0,
+                                  &local_err);
+
+    if (ret == -1) {
+        vfio_user_set_error(msg->hdr, EIO);
+        vfio_user_shutdown(proxy);
+        error_report_err(local_err);
+    }
+    trace_vfio_user_send_write(msg->hdr->id, ret);
+
+    return ret;
+}
+
 static VFIOUserMsg *vfio_user_getmsg(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
                                      VFIOUserFDs *fds)
 {
@@ -97,6 +139,7 @@ static void vfio_user_recycle(VFIOUserProxy *proxy, VFIOUserMsg *msg)
     msg->hdr = NULL;
     msg->fds = NULL;
     msg->complete = false;
+    msg->pending = false;
     QTAILQ_INSERT_HEAD(&proxy->free, msg, next);
 }
 
@@ -391,6 +434,54 @@ err:
     return -1;
 }
 
+/*
+ * Send messages from outgoing queue when the socket buffer has space.
+ * If we deplete 'outgoing', remove ourselves from the poll list.
+ */
+static void vfio_user_send(void *opaque)
+{
+    VFIOUserProxy *proxy = opaque;
+
+    QEMU_LOCK_GUARD(&proxy->lock);
+
+    if (proxy->state == VFIO_PROXY_CONNECTED) {
+        while (!QTAILQ_EMPTY(&proxy->outgoing)) {
+            if (vfio_user_send_one(proxy) < 0) {
+                return;
+            }
+        }
+        qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx,
+                                       vfio_user_recv, NULL, NULL, proxy);
+    }
+}
+
+/*
+ * Send a single message.
+ *
+ * Sent async messages are freed, others are moved to pending queue.
+ */
+static int vfio_user_send_one(VFIOUserProxy *proxy)
+{
+    VFIOUserMsg *msg;
+    int ret;
+
+    msg = QTAILQ_FIRST(&proxy->outgoing);
+    ret = vfio_user_send_qio(proxy, msg);
+    if (ret < 0) {
+        return ret;
+    }
+
+    QTAILQ_REMOVE(&proxy->outgoing, msg, next);
+    if (msg->type == VFIO_MSG_ASYNC) {
+        vfio_user_recycle(proxy, msg);
+    } else {
+        QTAILQ_INSERT_TAIL(&proxy->pending, msg, next);
+        msg->pending = true;
+    }
+
+    return 0;
+}
+
 static void vfio_user_cb(void *opaque)
 {
     VFIOUserProxy *proxy = opaque;
@@ -452,6 +543,119 @@ static void vfio_user_request(void *opaque)
     }
 }
 
+/*
+ * Messages are queued onto the proxy's outgoing list.
+ *
+ * It handles 3 types of messages:
+ *
+ * async messages - replies and posted writes
+ *
+ *     There will be no reply from the server, so message
+ *     buffers are freed after they're sent.
+ *
+ * nowait messages - map/unmap during address space transactions
+ *
+ *     These are also sent async, but a reply is expected so that
+ *     vfio_wait_reqs() can wait for the youngest nowait request.
+ *     They transition from the outgoing list to the pending list
+ *     when sent, and are freed when the reply is received.
+ *
+ * wait messages - all other requests
+ *
+ *     The reply to these messages is waited for by their caller.
+ *     They also transition from outgoing to pending when sent, but
+ *     the message buffer is returned to the caller with the reply
+ *     contents.  The caller is responsible for freeing these messages.
+ *
+ * As an optimization, if the outgoing list and the socket send
+ * buffer are empty, the message is sent inline instead of being
+ * added to the outgoing list.  The rest of the transitions are
+ * unchanged.
+ *
+ * returns 0 if the message was sent or queued
+ * returns -1 on send error
+ */
+static int vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg)
+{
+    int ret;
+
+    /*
+     * Unsent outgoing msgs - add to tail
+     */
+    if (!QTAILQ_EMPTY(&proxy->outgoing)) {
+        QTAILQ_INSERT_TAIL(&proxy->outgoing, msg, next);
+        return 0;
+    }
+
+    /*
+     * Try inline - if blocked, queue it and kick send poller
+     */
+    if (proxy->flags & VFIO_PROXY_FORCE_QUEUED) {
+        ret = QIO_CHANNEL_ERR_BLOCK;
+    } else {
+        ret = vfio_user_send_qio(proxy, msg);
+    }
+    if (ret == QIO_CHANNEL_ERR_BLOCK) {
+        QTAILQ_INSERT_HEAD(&proxy->outgoing, msg, next);
+        qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx,
+                                       vfio_user_recv, proxy->ctx,
+                                       vfio_user_send, proxy);
+        return 0;
+    }
+    if (ret == -1) {
+        return ret;
+    }
+
+    /*
+     * Sent - free async, add others to pending
+     */
+    if (msg->type == VFIO_MSG_ASYNC) {
+        vfio_user_recycle(proxy, msg);
+    } else {
+        QTAILQ_INSERT_TAIL(&proxy->pending, msg, next);
+        msg->pending = true;
+    }
+
+    return 0;
+}
+
+static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
+                                VFIOUserFDs *fds, int rsize)
+{
+    VFIOUserMsg *msg;
+    int ret;
+
+    if (hdr->flags & VFIO_USER_NO_REPLY) {
+        error_printf("vfio_user_send_wait on async message\n");
+        vfio_user_set_error(hdr, EINVAL);
+        return;
+    }
+
+    qemu_mutex_lock(&proxy->lock);
+
+    msg = vfio_user_getmsg(proxy, hdr, fds);
+    msg->id = hdr->id;
+    msg->rsize = rsize ? rsize : hdr->size;
+    msg->type = VFIO_MSG_WAIT;
+
+    ret = vfio_user_send_queued(proxy, msg);
+
+    if (ret == 0) {
+        while (!msg->complete) {
+            if (!qemu_cond_timedwait(&msg->cv, &proxy->lock, wait_time)) {
+                VFIOUserMsgQ *list;
+
+                list = msg->pending ? &proxy->pending : &proxy->outgoing;
+                QTAILQ_REMOVE(list, msg, next);
+                vfio_user_set_error(hdr, ETIMEDOUT);
+                break;
+            }
+        }
+    }
+    vfio_user_recycle(proxy, msg);
+
+    qemu_mutex_unlock(&proxy->lock);
+}
 
 static QLIST_HEAD(, VFIOUserProxy) vfio_user_sockets =
     QLIST_HEAD_INITIALIZER(vfio_user_sockets);
@@ -480,6 +684,15 @@ VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp)
     proxy = g_malloc0(sizeof(VFIOUserProxy));
     proxy->sockname = g_strdup_printf("unix:%s", sockname);
     proxy->ioc = ioc;
+
+    /* init defaults */
+    proxy->max_xfer_size = VFIO_USER_DEF_MAX_XFER;
+    proxy->max_send_fds = VFIO_USER_DEF_MAX_FDS;
+    proxy->max_dma = VFIO_USER_DEF_MAP_MAX;
+    proxy->dma_pgsizes = VFIO_USER_DEF_PGSIZE;
+    proxy->max_bitmap = VFIO_USER_DEF_MAX_BITMAP;
+    proxy->migr_pgsize = VFIO_USER_DEF_PGSIZE;
+
     proxy->flags = VFIO_PROXY_CLIENT;
     proxy->state = VFIO_PROXY_CONNECTED;
 
@@ -577,3 +790,285 @@ void vfio_user_disconnect(VFIOUserProxy *proxy)
     g_free(proxy->sockname);
     g_free(proxy);
 }
+
+static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
+                                  uint32_t size, uint32_t flags)
+{
+    static uint16_t next_id;
+
+    hdr->id = qatomic_fetch_inc(&next_id);
+    hdr->command = cmd;
+    hdr->size = size;
+    hdr->flags = (flags & ~VFIO_USER_TYPE) | VFIO_USER_REQUEST;
+    hdr->error_reply = 0;
+}
+
+struct cap_entry {
+    const char *name;
+    bool (*check)(VFIOUserProxy *proxy, QObject *qobj, Error **errp);
+};
+
+static bool caps_parse(VFIOUserProxy *proxy, QDict *qdict,
+                       struct cap_entry caps[], Error **errp)
+{
+    QObject *qobj;
+    struct cap_entry *p;
+
+    for (p = caps; p->name != NULL; p++) {
+        qobj = qdict_get(qdict, p->name);
+        if (qobj != NULL) {
+            if (!p->check(proxy, qobj, errp)) {
+                return false;
+            }
+            qdict_del(qdict, p->name);
+        }
+    }
+
+    /* warning, for now */
+    if (qdict_size(qdict) != 0) {
+        warn_report("spurious capabilities");
+    }
+    return true;
+}
+
+static bool check_migr_pgsize(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t pgsize;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &pgsize)) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_PGSIZE);
+        return false;
+    }
+
+    /* must be larger than default */
+    if (pgsize & (VFIO_USER_DEF_PGSIZE - 1)) {
+        error_setg(errp, "pgsize 0x%"PRIx64" too small", pgsize);
+        return false;
+    }
+
+    proxy->migr_pgsize = pgsize;
+    return true;
+}
+
+static bool check_bitmap(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t bitmap_size;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &bitmap_size)) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MAX_BITMAP);
+        return false;
+    }
+
+    /* can only lower it */
+    if (bitmap_size > VFIO_USER_DEF_MAX_BITMAP) {
+        error_setg(errp, "%s too large", VFIO_USER_CAP_MAX_BITMAP);
+        return false;
+    }
+
+    proxy->max_bitmap = bitmap_size;
+    return true;
+}
+
+static struct cap_entry caps_migr[] = {
+    { VFIO_USER_CAP_PGSIZE, check_migr_pgsize },
+    { VFIO_USER_CAP_MAX_BITMAP, check_bitmap },
+    { NULL }
+};
+
+static bool check_max_fds(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t max_send_fds;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &max_send_fds) ||
+        max_send_fds > VFIO_USER_MAX_MAX_FDS) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MAX_FDS);
+        return false;
+    }
+    proxy->max_send_fds = max_send_fds;
+    return true;
+}
+
+static bool check_max_xfer(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t max_xfer_size;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &max_xfer_size) ||
+        max_xfer_size > VFIO_USER_MAX_MAX_XFER) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MAX_XFER);
+        return false;
+    }
+    proxy->max_xfer_size = max_xfer_size;
+    return true;
+}
+
+static bool check_pgsizes(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t pgsizes;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &pgsizes)) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_PGSIZES);
+        return false;
+    }
+
+    /* must be larger than default */
+    if (pgsizes & (VFIO_USER_DEF_PGSIZE - 1)) {
+        error_setg(errp, "pgsize 0x%"PRIx64" too small", pgsizes);
+        return false;
+    }
+
+    proxy->dma_pgsizes = pgsizes;
+    return true;
+}
+
+static bool check_max_dma(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t max_dma;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &max_dma)) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MAP_MAX);
+        return false;
+    }
+
+    /* can only lower it */
+    if (max_dma > VFIO_USER_DEF_MAP_MAX) {
+        error_setg(errp, "%s too large", VFIO_USER_CAP_MAP_MAX);
+        return false;
+    }
+
+    proxy->max_dma = max_dma;
+    return true;
+}
+
+static bool check_migr(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QDict *qdict = qobject_to(QDict, qobj);
+
+    if (qdict == NULL) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MIGR);
+        return false;
+    }
+    return caps_parse(proxy, qdict, caps_migr, errp);
+}
+
+static struct cap_entry caps_cap[] = {
+    { VFIO_USER_CAP_MAX_FDS, check_max_fds },
+    { VFIO_USER_CAP_MAX_XFER, check_max_xfer },
+    { VFIO_USER_CAP_PGSIZES, check_pgsizes },
+    { VFIO_USER_CAP_MAP_MAX, check_max_dma },
+    { VFIO_USER_CAP_MIGR, check_migr },
+    { NULL }
+};
+
+static bool check_cap(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QDict *qdict = qobject_to(QDict, qobj);
+
+    if (qdict == NULL) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP);
+        return false;
+    }
+    return caps_parse(proxy, qdict, caps_cap, errp);
+}
+
+static struct cap_entry ver_0_0[] = {
+    { VFIO_USER_CAP, check_cap },
+    { NULL }
+};
+
+static bool caps_check(VFIOUserProxy *proxy, int minor, const char *caps,
+                       Error **errp)
+{
+    QObject *qobj;
+    QDict *qdict;
+    bool ret;
+
+    qobj = qobject_from_json(caps, NULL);
+    if (qobj == NULL) {
+        error_setg(errp, "malformed capabilities %s", caps);
+        return false;
+    }
+    qdict = qobject_to(QDict, qobj);
+    if (qdict == NULL) {
+        error_setg(errp, "capabilities %s not an object", caps);
+        qobject_unref(qobj);
+        return false;
+    }
+    ret = caps_parse(proxy, qdict, ver_0_0, errp);
+
+    qobject_unref(qobj);
+    return ret;
+}
+
+static GString *caps_json(void)
+{
+    QDict *dict = qdict_new();
+    QDict *capdict = qdict_new();
+    QDict *migdict = qdict_new();
+    GString *str;
+
+    qdict_put_int(migdict, VFIO_USER_CAP_PGSIZE, VFIO_USER_DEF_PGSIZE);
+    qdict_put_int(migdict, VFIO_USER_CAP_MAX_BITMAP, VFIO_USER_DEF_MAX_BITMAP);
+    qdict_put_obj(capdict, VFIO_USER_CAP_MIGR, QOBJECT(migdict));
+
+    qdict_put_int(capdict, VFIO_USER_CAP_MAX_FDS, VFIO_USER_MAX_MAX_FDS);
+    qdict_put_int(capdict, VFIO_USER_CAP_MAX_XFER, VFIO_USER_DEF_MAX_XFER);
+    qdict_put_int(capdict, VFIO_USER_CAP_PGSIZES, VFIO_USER_DEF_PGSIZE);
+    qdict_put_int(capdict, VFIO_USER_CAP_MAP_MAX, VFIO_USER_DEF_MAP_MAX);
+
+    qdict_put_obj(dict, VFIO_USER_CAP, QOBJECT(capdict));
+
+    str = qobject_to_json(QOBJECT(dict));
+    qobject_unref(dict);
+    return str;
+}
+
+bool vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp)
+{
+    g_autofree VFIOUserVersion *msgp = NULL;
+    GString *caps;
+    char *reply;
+    int size, caplen;
+
+    caps = caps_json();
+    caplen = caps->len + 1;
+    size = sizeof(*msgp) + caplen;
+    msgp = g_malloc0(size);
+
+    vfio_user_request_msg(&msgp->hdr, VFIO_USER_VERSION, size, 0);
+    msgp->major = VFIO_USER_MAJOR_VER;
+    msgp->minor = VFIO_USER_MINOR_VER;
+    memcpy(&msgp->capabilities, caps->str, caplen);
+    g_string_free(caps, true);
+    trace_vfio_user_version(msgp->major, msgp->minor, msgp->capabilities);
+
+    vfio_user_send_wait(proxy, &msgp->hdr, NULL, 0);
+    if (msgp->hdr.flags & VFIO_USER_ERROR) {
+        error_setg_errno(errp, msgp->hdr.error_reply, "version reply");
+        return false;
+    }
+
+    if (msgp->major != VFIO_USER_MAJOR_VER ||
+        msgp->minor > VFIO_USER_MINOR_VER) {
+        error_setg(errp, "incompatible server version");
+        return false;
+    }
+
+    reply = msgp->capabilities;
+    if (reply[msgp->hdr.size - sizeof(*msgp) - 1] != '\0') {
+        error_setg(errp, "corrupt version reply");
+        return false;
+    }
+
+    if (!caps_check(proxy, msgp->minor, reply, errp)) {
+        return false;
+    }
+
+    trace_vfio_user_version(msgp->major, msgp->minor, msgp->capabilities);
+    return true;
+}
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index 30cf35d3e4..9c3b279839 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -35,6 +35,7 @@ typedef struct VFIOUserMsg {
     uint32_t id;
     QemuCond cv;
     bool complete;
+    bool pending;
     enum msg_type type;
 } VFIOUserMsg;
 
@@ -54,6 +55,12 @@ typedef struct VFIOUserProxy {
     struct QIOChannel *ioc;
     void (*request)(void *opaque, VFIOUserMsg *msg);
     void *req_arg;
+    uint64_t max_xfer_size;
+    uint64_t max_send_fds;
+    uint64_t max_dma;
+    uint64_t dma_pgsizes;
+    uint64_t max_bitmap;
+    uint64_t migr_pgsize;
     int flags;
     QemuCond close_cv;
     AioContext *ctx;
@@ -76,6 +83,7 @@ typedef struct VFIOUserProxy {
 
 /* VFIOProxy flags */
 #define VFIO_PROXY_CLIENT        0x1
+#define VFIO_PROXY_FORCE_QUEUED  0x4
 
 typedef struct VFIODevice VFIODevice;
 
@@ -84,5 +92,6 @@ void vfio_user_disconnect(VFIOUserProxy *proxy);
 void vfio_user_set_handler(VFIODevice *vbasedev,
                            void (*handler)(void *opaque, VFIOUserMsg *msg),
                            void *reqarg);
+bool vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp);
 
 #endif /* VFIO_USER_H */
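
Note on the reply checks in vfio_user_validate_version(): the three
tests it applies to the server's reply (version compatibility, NUL
termination of the capabilities string, page-size mask) can be tried in
isolation. The following is a minimal standalone sketch, not code from
this series. All sketch_* names and SKETCH_* constants are hypothetical
stand-ins, and fixed_len stands in for sizeof(VFIOUserVersion). Unlike
the patch, the termination helper also rejects a reply that carries no
capability bytes at all, which is an assumption about desirable
behaviour rather than something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values mirror the constants in the diff. */
#define SKETCH_MAJOR_VER   0
#define SKETCH_MINOR_VER   0
#define SKETCH_DEF_PGSIZE  4096u

/* Same rule as the patch: major must match, minor must not exceed ours. */
static bool sketch_version_ok(uint16_t major, uint16_t minor)
{
    return major == SKETCH_MAJOR_VER && minor <= SKETCH_MINOR_VER;
}

/*
 * The capabilities string occupies the bytes after the fixed part of
 * the message.  A reply with no payload is rejected up front so the
 * index computed below cannot underflow (assumption: not in the patch).
 */
static bool sketch_caps_terminated(const char *caps, uint32_t msg_size,
                                   size_t fixed_len)
{
    assert(caps != NULL);

    if (msg_size <= fixed_len) {
        return false;
    }
    return caps[msg_size - fixed_len - 1] == '\0';
}

/* Mirrors the `pgsize & (VFIO_USER_DEF_PGSIZE - 1)` test in the patch. */
static bool sketch_pgsize_ok(uint64_t pgsize)
{
    return (pgsize & (SKETCH_DEF_PGSIZE - 1)) == 0;
}
```

With these helpers, a reply whose header size equals the fixed message
length is reported as corrupt instead of being indexed at offset -1,
and any page size with bits set below 4096 is refused.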