From patchwork Fri Jun 10 23:21:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Lucas De Marchi X-Patchwork-Id: 12878152 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 51C1AC43334 for ; Fri, 10 Jun 2022 23:21:09 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D142B11AFAD; Fri, 10 Jun 2022 23:21:08 +0000 (UTC) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by gabe.freedesktop.org (Postfix) with ESMTPS id 144C011AFAD; Fri, 10 Jun 2022 23:21:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1654903268; x=1686439268; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=F73a8Ft10vACEWr3RICOQTrJWj1S3nEcP6DzQv42DfI=; b=SMGOwWLDPxCD3Gkzhxi6BwDkwS7YxUg8UbMsxSaa+ivMrjlu54tClMHO tC3sVXmEHevpMQEIQhE102Jhq7BZ/3blUhIX9hHv0peLk9iNEglQRQAAA 06etTbFAu0mXmV1rxI042QUpqUmv/ZCDDtL3RqD6EizcY6WsM2kSnkcva rJDqPtb/hfwO5bgVZv7lIjPQRSXmqxHTPPJvKcrUNtl2NsP0pgUwFX/Nv UnCwfL9GEQBU8tnytTgjQLZFrnbY7WjxziRhxItUxgAufzOolTESaXbkR iYOoLNv5BJ+Zgx1NSoRZaHI8gNasf6ymklOJjwItUDn2z6V5TE/wFmqfC Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10374"; a="258210104" X-IronPort-AV: E=Sophos;i="5.91,291,1647327600"; d="scan'208";a="258210104" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jun 2022 16:21:07 -0700 X-IronPort-AV: E=Sophos;i="5.91,291,1647327600"; d="scan'208";a="610919567" Received: from lucas-s2600cw.jf.intel.com ([10.165.21.202]) by orsmga008-auth.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jun 2022 16:21:07 -0700
From: Lucas De Marchi
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Date: Fri, 10 Jun 2022 16:21:28 -0700
Message-Id: <20220610232130.2865479-1-lucas.demarchi@intel.com>
X-Mailer: git-send-email 2.36.1
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 1/3] iosys-map: Add per-word read
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
Cc: daniel.vetter@ffwll.ch, Lucas De Marchi, christian.koenig@amd.com, tzimmermann@suse.de, chris@chris-wilson.co.uk
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"

Instead of always falling back to memcpy_fromio() for any size, prefer
using read{b,w,l}(). When reading struct members it's common to read
integer variables one at a time. Going through memcpy_fromio() for each
of them incurs a high penalty.

Employ a similar trick as __seqprop() by using _Generic() to generate
only the specific call based on a type-compatible variable.

For a particular i915 workload producing GPU context switches,
__get_engine_usage_record() is particularly hot since the engine usage
is read from device local memory with dgfx, possibly multiple times
since it's racy.
Execution time for this test shows a ~12.5% improvement with DG2:

Before:
	nrepeats = 1000; min = 7.63243e+06; max = 1.01817e+07; median = 9.52548e+06; var = 526149;
After:
	nrepeats = 1000; min = 7.03402e+06; max = 8.8832e+06; median = 8.33955e+06; var = 333113;

Other things attempted that didn't prove very useful:

1) Change the _Generic() on x86 to just dereference the memory address
2) Change __get_engine_usage_record() to do just 1 read per loop,
   comparing with the previous value read
3) Change __get_engine_usage_record() to access the fields directly as
   it was before the conversion to iosys-map

(3) did give a small improvement (~3%), but doesn't seem to scale well
to other similar cases in the driver.

Additional test by Chris Wilson using gem_create from igt with some
changes to track object creation time. This happens to stress this code
path:

	Pre iosys_map conversion of engine busyness:
	lmem0: Creating 262144 4KiB objects took 59274.2ms

	Unpatched:
	lmem0: Creating 262144 4KiB objects took 108830.2ms

	With readl (this patch):
	lmem0: Creating 262144 4KiB objects took 61348.6ms

	s/readl/READ_ONCE/
	lmem0: Creating 262144 4KiB objects took 61333.2ms

So we do take a little bit more time than before the conversion, but
that is due to other factors: bringing the READ_ONCE back would be as
good as just doing this conversion.
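
As a rough user-space illustration of the _Generic() dispatch described
above (not the kernel code): the sketch_* helpers below are hypothetical
stand-ins for readb()/readw()/readl() and memcpy_fromio(), and an
ordinary struct stands in for device memory. Note that every branch of a
_Generic() selection must still compile for the given type, even though
only one is selected, so this form assumes integer-like values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for readb()/readw()/readl(): plain volatile loads. */
static inline uint8_t sketch_read8(const volatile void *p)
{
	return *(const volatile uint8_t *)p;
}

static inline uint16_t sketch_read16(const volatile void *p)
{
	return *(const volatile uint16_t *)p;
}

static inline uint32_t sketch_read32(const volatile void *p)
{
	return *(const volatile uint32_t *)p;
}

/* Hypothetical stand-in for memcpy_fromio(): the any-size fallback. */
static inline void sketch_copy_from(void *dst, const volatile void *src,
				    size_t len)
{
	memcpy(dst, (const void *)src, len);
}

/* Select exactly one accessor at compile time from the type of val. */
#define SKETCH_RD_IO(val, addr) _Generic((val),				\
	uint8_t:  (val) = sketch_read8(addr),				\
	uint16_t: (val) = sketch_read16(addr),				\
	uint32_t: (val) = sketch_read32(addr),				\
	default:  sketch_copy_from(&(val), (addr), sizeof(val)))

/* Hypothetical record, laid out the way a device might expose it. */
struct sketch_rd_record {
	uint32_t total;
	uint16_t id;
	uint8_t flags;
	uint8_t pad;
	uint64_t stamp;
};

/* The single-load accessors assume naturally aligned fields. */
static_assert(offsetof(struct sketch_rd_record, total) % sizeof(uint32_t) == 0,
	      "total must be naturally aligned");

/* Read each field back by offset; returns 1 if all values match. */
static int sketch_rd_selftest(void)
{
	struct sketch_rd_record rec = {
		.total = 0xdeadbeef, .id = 0x1234, .flags = 0x5a,
		.stamp = 0x0123456789abcdefULL,
	};
	const unsigned char *base = (const unsigned char *)&rec;
	uint32_t total;
	uint16_t id;
	uint8_t flags;
	uint64_t stamp;

	SKETCH_RD_IO(total, base + offsetof(struct sketch_rd_record, total));
	SKETCH_RD_IO(id, base + offsetof(struct sketch_rd_record, id));
	SKETCH_RD_IO(flags, base + offsetof(struct sketch_rd_record, flags));
	/* No uint64_t case in this sketch: falls through to the copy. */
	SKETCH_RD_IO(stamp, base + offsetof(struct sketch_rd_record, stamp));

	return total == rec.total && id == rec.id &&
	       flags == rec.flags && stamp == rec.stamp;
}
```

Calling sketch_rd_selftest() exercises the three sized loads plus the
fallback copy. The point of the construct is that the choice is made by
the compiler from the variable's type, so no size check is needed at
run time.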

Signed-off-by: Lucas De Marchi
Reviewed-by: Christian König for the entire
---
 include/linux/iosys-map.h | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h
index e69a002d5aa4..cd28c7a1b79c 100644
--- a/include/linux/iosys-map.h
+++ b/include/linux/iosys-map.h
@@ -333,6 +333,20 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
 		memset(dst->vaddr + offset, value, len);
 }
 
+#ifdef CONFIG_64BIT
+#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \
+	u64: val_ = readq(vaddr_iomem_),
+#else
+#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_)
+#endif
+
+#define __iosys_map_rd_io(val__, vaddr_iomem__, type__) _Generic(val__, \
+	u8: val__ = readb(vaddr_iomem__), \
+	u16: val__ = readw(vaddr_iomem__), \
+	u32: val__ = readl(vaddr_iomem__), \
+	__iosys_map_rd_io_u64_case(val__, vaddr_iomem__) \
+	default: memcpy_fromio(&(val__), vaddr_iomem__, sizeof(val__)))
+
 /**
  * iosys_map_rd - Read a C-type value from the iosys_map
  *
@@ -346,10 +360,14 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
  * Returns:
  * The value read from the mapping.
  */
-#define iosys_map_rd(map__, offset__, type__) ({ \
-	type__ val; \
-	iosys_map_memcpy_from(&val, map__, offset__, sizeof(val)); \
-	val; \
+#define iosys_map_rd(map__, offset__, type__) ({ \
+	type__ val; \
+	if ((map__)->is_iomem) { \
+		__iosys_map_rd_io(val, (map__)->vaddr_iomem + offset__, type__);\
+	} else { \
+		memcpy(&val, (map__)->vaddr + offset__, sizeof(val)); \
+	} \
+	val; \
 })
 
 /**

From patchwork Fri Jun 10 23:21:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lucas De Marchi
X-Patchwork-Id: 12878153
From: Lucas De Marchi
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Date: Fri, 10 Jun 2022 16:21:29 -0700
Message-Id: <20220610232130.2865479-2-lucas.demarchi@intel.com>
X-Mailer: git-send-email 2.36.1
In-Reply-To: <20220610232130.2865479-1-lucas.demarchi@intel.com>
References: <20220610232130.2865479-1-lucas.demarchi@intel.com>
Subject: [Intel-gfx] [PATCH 2/3] iosys-map: Add per-word write
Cc: daniel.vetter@ffwll.ch, Lucas De Marchi, christian.koenig@amd.com, tzimmermann@suse.de, chris@chris-wilson.co.uk

As was done for read, provide the equivalent for write. Even if current
users are not in the hot path, this should future-proof it.
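
A rough user-space sketch of the write-side dispatch, under the same
caveats as any illustration: the sketch_* helpers are hypothetical
stand-ins for writeb()/writew()/writel() and memcpy_toio(), and an
ordinary struct stands in for device memory. The value must be an
lvalue because the fallback branch takes its address, which is why a
wrapper would first copy the caller's expression into a local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for writeb()/writew()/writel(): volatile stores. */
static inline void sketch_write8(uint8_t v, volatile void *p)
{
	*(volatile uint8_t *)p = v;
}

static inline void sketch_write16(uint16_t v, volatile void *p)
{
	*(volatile uint16_t *)p = v;
}

static inline void sketch_write32(uint32_t v, volatile void *p)
{
	*(volatile uint32_t *)p = v;
}

/* Hypothetical stand-in for memcpy_toio(): the any-size fallback. */
static inline void sketch_copy_to(volatile void *dst, const void *src,
				  size_t len)
{
	memcpy((void *)dst, src, len);
}

/*
 * Select one store at compile time from the type of val. Only macro
 * parameters are referenced, so the macro does not depend on what the
 * caller happens to name its variables.
 */
#define SKETCH_WR_IO(val, addr) _Generic((val),				\
	uint8_t:  sketch_write8((val), (addr)),				\
	uint16_t: sketch_write16((val), (addr)),			\
	uint32_t: sketch_write32((val), (addr)),			\
	default:  sketch_copy_to((addr), &(val), sizeof(val)))

/* Hypothetical record, laid out the way a device might expose it. */
struct sketch_wr_record {
	uint32_t total;
	uint16_t id;
	uint16_t pad;
	uint64_t stamp;
};

/* The single-store accessors assume naturally aligned fields. */
static_assert(offsetof(struct sketch_wr_record, total) % sizeof(uint32_t) == 0,
	      "total must be naturally aligned");

/* Write each field by offset; returns 1 if the record holds the values. */
static int sketch_wr_selftest(void)
{
	struct sketch_wr_record rec = { 0 };
	unsigned char *base = (unsigned char *)&rec;
	uint32_t total = 0xdeadbeef;
	uint16_t id = 0x1234;
	uint64_t stamp = 0x0123456789abcdefULL;

	SKETCH_WR_IO(total, base + offsetof(struct sketch_wr_record, total));
	SKETCH_WR_IO(id, base + offsetof(struct sketch_wr_record, id));
	/* No uint64_t case in this sketch: falls through to the copy. */
	SKETCH_WR_IO(stamp, base + offsetof(struct sketch_wr_record, stamp));

	return rec.total == total && rec.id == id && rec.stamp == stamp;
}
```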

Signed-off-by: Lucas De Marchi
---
 include/linux/iosys-map.h | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h
index cd28c7a1b79c..793e5cd50dbf 100644
--- a/include/linux/iosys-map.h
+++ b/include/linux/iosys-map.h
@@ -336,8 +336,11 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
 #ifdef CONFIG_64BIT
 #define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \
 	u64: val_ = readq(vaddr_iomem_),
+#define __iosys_map_wr_io_u64_case(val_, vaddr_iomem_) \
+	u64: writeq(val_, vaddr_iomem_),
 #else
 #define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_)
+#define __iosys_map_wr_io_u64_case(val_, vaddr_iomem_)
 #endif
 
 #define __iosys_map_rd_io(val__, vaddr_iomem__, type__) _Generic(val__, \
@@ -347,6 +350,13 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
 	__iosys_map_rd_io_u64_case(val__, vaddr_iomem__) \
 	default: memcpy_fromio(&(val__), vaddr_iomem__, sizeof(val__)))
 
+#define __iosys_map_wr_io(val__, vaddr_iomem__, type__) _Generic(val__, \
+	u8: writeb(val__, vaddr_iomem__), \
+	u16: writew(val__, vaddr_iomem__), \
+	u32: writel(val__, vaddr_iomem__), \
+	__iosys_map_wr_io_u64_case(val__, vaddr_iomem__) \
+	default: memcpy_toio(vaddr_iomem__, &(val__), sizeof(val__)))
+
 /**
  * iosys_map_rd - Read a C-type value from the iosys_map
  *
@@ -381,9 +391,13 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
  * Write a C-type value to the iosys_map, handling possible un-aligned accesses
  * to the mapping.
  */
-#define iosys_map_wr(map__, offset__, type__, val__) ({ \
-	type__ val = (val__); \
-	iosys_map_memcpy_to(map__, offset__, &val, sizeof(val)); \
+#define iosys_map_wr(map__, offset__, type__, val__) ({ \
+	type__ val = (val__); \
+	if ((map__)->is_iomem) { \
+		__iosys_map_wr_io(val, (map__)->vaddr_iomem + offset__, type__);\
+	} else { \
+		memcpy((map__)->vaddr + offset__, &val, sizeof(val)); \
+	} \
 })
 
 /**

From patchwork Fri Jun 10 23:21:30 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lucas De Marchi
X-Patchwork-Id: 12878154
From: Lucas De Marchi
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Date: Fri, 10 Jun 2022 16:21:30 -0700
Message-Id: <20220610232130.2865479-3-lucas.demarchi@intel.com>
X-Mailer: git-send-email 2.36.1
In-Reply-To: <20220610232130.2865479-1-lucas.demarchi@intel.com>
References: <20220610232130.2865479-1-lucas.demarchi@intel.com>
Subject: [Intel-gfx] [PATCH 3/3] iosys-map: Fix typo in documentation
Cc: daniel.vetter@ffwll.ch, Lucas De Marchi, christian.koenig@amd.com, tzimmermann@suse.de, chris@chris-wilson.co.uk

It's one argument, vaddr_iomem, not 2 (vaddr and _iomem).

Signed-off-by: Lucas De Marchi
---
 include/linux/iosys-map.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h
index 793e5cd50dbf..d092d30f5812 100644
--- a/include/linux/iosys-map.h
+++ b/include/linux/iosys-map.h
@@ -23,7 +23,7 @@
  *	memcpy(vaddr, src, len);
  *
  *	void *vaddr_iomem = ...; // pointer to I/O memory
- *	memcpy_toio(vaddr, _iomem, src, len);
+ *	memcpy_toio(vaddr_iomem, src, len);
  *
  * The user of such pointer may not have information about the mapping of that
  * region or may want to have a single code path to handle operations on that