Message ID | 20220627224751.3627465-1-lucas.demarchi@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [CI,1/2] iosys-map: Add per-word read | expand |
Am 28.06.22 um 00:47 schrieb Lucas De Marchi: > Instead of always falling back to memcpy_fromio() for any size, prefer > using read{b,w,l}(). When reading struct members it's common to read > individual integer variables individually. Going through memcpy_fromio() > for each of them poses a high penalty. > > Employ a similar trick as __seqprop() by using _Generic() to generate > only the specific call based on a type-compatible variable. > > For a pariticular i915 workload producing GPU context switches, > __get_engine_usage_record() is particularly hot since the engine usage > is read from device local memory with dgfx, possibly multiple times > since it's racy. Test execution time for this test shows a ~12.5% > improvement with DG2: > > Before: > nrepeats = 1000; min = 7.63243e+06; max = 1.01817e+07; > median = 9.52548e+06; var = 526149; > After: > nrepeats = 1000; min = 7.03402e+06; max = 8.8832e+06; > median = 8.33955e+06; var = 333113; > > Other things attempted that didn't prove very useful: > 1) Change the _Generic() on x86 to just dereference the memory address > 2) Change __get_engine_usage_record() to do just 1 read per loop, > comparing with the previous value read > 3) Change __get_engine_usage_record() to access the fields directly as it > was before the conversion to iosys-map > > (3) did gave a small improvement (~3%), but doesn't seem to scale well > to other similar cases in the driver. > > Additional test by Chris Wilson using gem_create from igt with some > changes to track object creation time. This happens to accidentally > stress this code path: > > Pre iosys_map conversion of engine busyness: > lmem0: Creating 262144 4KiB objects took 59274.2ms > > Unpatched: > lmem0: Creating 262144 4KiB objects took 108830.2ms > > With readl (this patch): > lmem0: Creating 262144 4KiB objects took 61348.6ms > > s/readl/READ_ONCE/ > lmem0: Creating 262144 4KiB objects took 61333.2ms > > So we do take a little bit more time than before the conversion, but > that is due to other factors: bringing the READ_ONCE back would be as > good as just doing this conversion. > > v2: > - Remove default from _Generic() - callers wanting to read more > than u64 should use iosys_map_memcpy_from() > - Add READ_ONCE() cases dereferencing the pointer when using system > memory > > Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> > Reviewed-by: Christian König <christian.koenig@amd.com> # v1 > Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Feel free to update my rb to v2 as well. Apart from that do you have commit rights to drm-misc-next? If not should we push this? Thanks, Christian. > --- > include/linux/iosys-map.h | 45 +++++++++++++++++++++++++++++++-------- > 1 file changed, 36 insertions(+), 9 deletions(-) > > diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h > index 4b8406ee8bc4..ec81ed995c59 100644 > --- a/include/linux/iosys-map.h > +++ b/include/linux/iosys-map.h > @@ -6,6 +6,7 @@ > #ifndef __IOSYS_MAP_H__ > #define __IOSYS_MAP_H__ > > +#include <linux/compiler_types.h> > #include <linux/io.h> > #include <linux/string.h> > > @@ -333,6 +334,26 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, > memset(dst->vaddr + offset, value, len); > } > > +#ifdef CONFIG_64BIT > +#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ > + u64: val_ = readq(vaddr_iomem_) > +#else > +#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ > + u64: memcpy_fromio(&(val_), vaddr_iomem_, sizeof(u64)) > +#endif > + > +#define __iosys_map_rd_io(val__, vaddr_iomem__, type__) _Generic(val__, \ > + u8: val__ = readb(vaddr_iomem__), \ > + u16: val__ = readw(vaddr_iomem__), \ > + u32: val__ = readl(vaddr_iomem__), \ > + __iosys_map_rd_io_u64_case(val__, vaddr_iomem__)) > + > +#define __iosys_map_rd_sys(val__, vaddr__, type__) ({ \ > + compiletime_assert(sizeof(type__) <= sizeof(u64), \ > + "Unsupported access size for __iosys_map_rd_sys()"); \ > + val__ = READ_ONCE(*((type__ *)vaddr__)); \ > +}) > + > /** > * iosys_map_rd - Read a C-type value from the iosys_map > * > @@ -340,16 +361,21 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, > * @offset__: The offset from which to read > * @type__: Type of the value being read > * > - * Read a C type value from iosys_map, handling possible un-aligned accesses to > - * the mapping. > + * Read a C type value (u8, u16, u32 and u64) from iosys_map. For other types or > + * if pointer may be unaligned (and problematic for the architecture supported), > + * use iosys_map_memcpy_from(). > * > * Returns: > * The value read from the mapping. > */ > -#define iosys_map_rd(map__, offset__, type__) ({ \ > - type__ val; \ > - iosys_map_memcpy_from(&val, map__, offset__, sizeof(val)); \ > - val; \ > +#define iosys_map_rd(map__, offset__, type__) ({ \ > + type__ val; \ > + if ((map__)->is_iomem) { \ > + __iosys_map_rd_io(val, (map__)->vaddr_iomem + (offset__), type__);\ > + } else { \ > + __iosys_map_rd_sys(val, (map__)->vaddr + (offset__), type__); \ > + } \ > + val; \ > }) > > /** > @@ -379,9 +405,10 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, > * > * Read a value from iosys_map considering its layout is described by a C struct > * starting at @struct_offset__. The field offset and size is calculated and its > - * value read handling possible un-aligned memory accesses. For example: suppose > - * there is a @struct foo defined as below and the value ``foo.field2.inner2`` > - * needs to be read from the iosys_map: > + * value read. If the field access would incur in un-aligned access, then either > + * iosys_map_memcpy_from() needs to be used or the architecture must support it. > + * For example: suppose there is a @struct foo defined as below and the value > + * ``foo.field2.inner2`` needs to be read from the iosys_map: > * > * .. code-block:: c > *
On Mon, Jul 04, 2022 at 08:30:08AM +0200, Christian König wrote: >Am 28.06.22 um 00:47 schrieb Lucas De Marchi: >>Instead of always falling back to memcpy_fromio() for any size, prefer >>using read{b,w,l}(). When reading struct members it's common to read >>individual integer variables individually. Going through memcpy_fromio() >>for each of them poses a high penalty. >> >>Employ a similar trick as __seqprop() by using _Generic() to generate >>only the specific call based on a type-compatible variable. >> >>For a pariticular i915 workload producing GPU context switches, >>__get_engine_usage_record() is particularly hot since the engine usage >>is read from device local memory with dgfx, possibly multiple times >>since it's racy. Test execution time for this test shows a ~12.5% >>improvement with DG2: >> >>Before: >> nrepeats = 1000; min = 7.63243e+06; max = 1.01817e+07; >> median = 9.52548e+06; var = 526149; >>After: >> nrepeats = 1000; min = 7.03402e+06; max = 8.8832e+06; >> median = 8.33955e+06; var = 333113; >> >>Other things attempted that didn't prove very useful: >>1) Change the _Generic() on x86 to just dereference the memory address >>2) Change __get_engine_usage_record() to do just 1 read per loop, >> comparing with the previous value read >>3) Change __get_engine_usage_record() to access the fields directly as it >> was before the conversion to iosys-map >> >>(3) did gave a small improvement (~3%), but doesn't seem to scale well >>to other similar cases in the driver. >> >>Additional test by Chris Wilson using gem_create from igt with some >>changes to track object creation time. This happens to accidentally >>stress this code path: >> >> Pre iosys_map conversion of engine busyness: >> lmem0: Creating 262144 4KiB objects took 59274.2ms >> >> Unpatched: >> lmem0: Creating 262144 4KiB objects took 108830.2ms >> >> With readl (this patch): >> lmem0: Creating 262144 4KiB objects took 61348.6ms >> >> s/readl/READ_ONCE/ >> lmem0: Creating 262144 4KiB objects took 61333.2ms >> >>So we do take a little bit more time than before the conversion, but >>that is due to other factors: bringing the READ_ONCE back would be as >>good as just doing this conversion. >> >>v2: >>- Remove default from _Generic() - callers wanting to read more >> than u64 should use iosys_map_memcpy_from() >>- Add READ_ONCE() cases dereferencing the pointer when using system >> memory >> >>Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> >>Reviewed-by: Christian König <christian.koenig@amd.com> # v1 >>Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> > >Feel free to update my rb to v2 as well. > >Apart from that do you have commit rights to drm-misc-next? If not >should we push this? I do have commit rights. I pushed this last week to drm-misc-next. 6fb5ee7cec06 ("iosys-map: Add per-word write") 5f278dbd540b ("iosys-map: Add per-word read") thanks Lucas De Marchi > >Thanks, >Christian. > >>--- >> include/linux/iosys-map.h | 45 +++++++++++++++++++++++++++++++-------- >> 1 file changed, 36 insertions(+), 9 deletions(-) >> >>diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h >>index 4b8406ee8bc4..ec81ed995c59 100644 >>--- a/include/linux/iosys-map.h >>+++ b/include/linux/iosys-map.h >>@@ -6,6 +6,7 @@ >> #ifndef __IOSYS_MAP_H__ >> #define __IOSYS_MAP_H__ >>+#include <linux/compiler_types.h> >> #include <linux/io.h> >> #include <linux/string.h> >>@@ -333,6 +334,26 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, >> memset(dst->vaddr + offset, value, len); >> } >>+#ifdef CONFIG_64BIT >>+#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ >>+ u64: val_ = readq(vaddr_iomem_) >>+#else >>+#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ >>+ u64: memcpy_fromio(&(val_), vaddr_iomem_, sizeof(u64)) >>+#endif >>+ >>+#define __iosys_map_rd_io(val__, vaddr_iomem__, type__) _Generic(val__, \ >>+ u8: val__ = readb(vaddr_iomem__), \ >>+ u16: val__ = readw(vaddr_iomem__), \ >>+ u32: val__ = readl(vaddr_iomem__), \ >>+ __iosys_map_rd_io_u64_case(val__, vaddr_iomem__)) >>+ >>+#define __iosys_map_rd_sys(val__, vaddr__, type__) ({ \ >>+ compiletime_assert(sizeof(type__) <= sizeof(u64), \ >>+ "Unsupported access size for __iosys_map_rd_sys()"); \ >>+ val__ = READ_ONCE(*((type__ *)vaddr__)); \ >>+}) >>+ >> /** >> * iosys_map_rd - Read a C-type value from the iosys_map >> * >>@@ -340,16 +361,21 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, >> * @offset__: The offset from which to read >> * @type__: Type of the value being read >> * >>- * Read a C type value from iosys_map, handling possible un-aligned accesses to >>- * the mapping. >>+ * Read a C type value (u8, u16, u32 and u64) from iosys_map. For other types or >>+ * if pointer may be unaligned (and problematic for the architecture supported), >>+ * use iosys_map_memcpy_from(). >> * >> * Returns: >> * The value read from the mapping. >> */ >>-#define iosys_map_rd(map__, offset__, type__) ({ \ >>- type__ val; \ >>- iosys_map_memcpy_from(&val, map__, offset__, sizeof(val)); \ >>- val; \ >>+#define iosys_map_rd(map__, offset__, type__) ({ \ >>+ type__ val; \ >>+ if ((map__)->is_iomem) { \ >>+ __iosys_map_rd_io(val, (map__)->vaddr_iomem + (offset__), type__);\ >>+ } else { \ >>+ __iosys_map_rd_sys(val, (map__)->vaddr + (offset__), type__); \ >>+ } \ >>+ val; \ >> }) >> /** >>@@ -379,9 +405,10 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, >> * >> * Read a value from iosys_map considering its layout is described by a C struct >> * starting at @struct_offset__. The field offset and size is calculated and its >>- * value read handling possible un-aligned memory accesses. For example: suppose >>- * there is a @struct foo defined as below and the value ``foo.field2.inner2`` >>- * needs to be read from the iosys_map: >>+ * value read. If the field access would incur in un-aligned access, then either >>+ * iosys_map_memcpy_from() needs to be used or the architecture must support it. >>+ * For example: suppose there is a @struct foo defined as below and the value >>+ * ``foo.field2.inner2`` needs to be read from the iosys_map: >> * >> * .. code-block:: c >> * >
diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h index 4b8406ee8bc4..ec81ed995c59 100644 --- a/include/linux/iosys-map.h +++ b/include/linux/iosys-map.h @@ -6,6 +6,7 @@ #ifndef __IOSYS_MAP_H__ #define __IOSYS_MAP_H__ +#include <linux/compiler_types.h> #include <linux/io.h> #include <linux/string.h> @@ -333,6 +334,26 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, memset(dst->vaddr + offset, value, len); } +#ifdef CONFIG_64BIT +#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ + u64: val_ = readq(vaddr_iomem_) +#else +#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ + u64: memcpy_fromio(&(val_), vaddr_iomem_, sizeof(u64)) +#endif + +#define __iosys_map_rd_io(val__, vaddr_iomem__, type__) _Generic(val__, \ + u8: val__ = readb(vaddr_iomem__), \ + u16: val__ = readw(vaddr_iomem__), \ + u32: val__ = readl(vaddr_iomem__), \ + __iosys_map_rd_io_u64_case(val__, vaddr_iomem__)) + +#define __iosys_map_rd_sys(val__, vaddr__, type__) ({ \ + compiletime_assert(sizeof(type__) <= sizeof(u64), \ + "Unsupported access size for __iosys_map_rd_sys()"); \ + val__ = READ_ONCE(*((type__ *)vaddr__)); \ +}) + /** * iosys_map_rd - Read a C-type value from the iosys_map * @@ -340,16 +361,21 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, * @offset__: The offset from which to read * @type__: Type of the value being read * - * Read a C type value from iosys_map, handling possible un-aligned accesses to - * the mapping. + * Read a C type value (u8, u16, u32 and u64) from iosys_map. For other types or + * if pointer may be unaligned (and problematic for the architecture supported), + * use iosys_map_memcpy_from(). * * Returns: * The value read from the mapping. */ -#define iosys_map_rd(map__, offset__, type__) ({ \ - type__ val; \ - iosys_map_memcpy_from(&val, map__, offset__, sizeof(val)); \ - val; \ +#define iosys_map_rd(map__, offset__, type__) ({ \ + type__ val; \ + if ((map__)->is_iomem) { \ + __iosys_map_rd_io(val, (map__)->vaddr_iomem + (offset__), type__);\ + } else { \ + __iosys_map_rd_sys(val, (map__)->vaddr + (offset__), type__); \ + } \ + val; \ }) /** @@ -379,9 +405,10 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, * * Read a value from iosys_map considering its layout is described by a C struct * starting at @struct_offset__. The field offset and size is calculated and its - * value read handling possible un-aligned memory accesses. For example: suppose - * there is a @struct foo defined as below and the value ``foo.field2.inner2`` - * needs to be read from the iosys_map: + * value read. If the field access would incur in un-aligned access, then either + * iosys_map_memcpy_from() needs to be used or the architecture must support it. + * For example: suppose there is a @struct foo defined as below and the value + * ``foo.field2.inner2`` needs to be read from the iosys_map: * * .. code-block:: c *