Message ID | 20180215120203.23115-7-lionel.g.landwerlin@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Lionel Landwerlin (2018-02-15 14:02:02) > With the introduction of asymmetric slices in CNL, we cannot rely on > the previous SUBSLICE_MASK getparam to tell userspace what subslices > are available. Here we introduce a more detailed way of querying the > Gen's GPU topology that doesn't aggregate numbers. > > This is essential for monitoring parts of the GPU with the OA unit, > because counters need to be normalized to the number of > EUs/subslices/slices. The current aggregated numbers like EU_TOTAL do > not gives us sufficient information. > > As a bonus we can draw representations of the GPU : > > https://imgur.com/a/vuqpa > > v2: Rename uapi struct s/_mask/_info/ (Tvrtko) > Report max_slice/subslice/eus_per_subslice rather than strides (Tvrtko) > Add uapi macros to read data from *_info structs (Tvrtko) > > v3: Use !!(v & DRM_I915_BIT()) for uapi macros instead of custom shifts (Tvrtko) > > v4: factorize query item writting (Tvrtko) > tweak uapi struct/define names (Tvrtko) > > v5: Replace ALIGN() macro (Chris) > > v6: Updated uapi comments (Tvrtko) > Moved flags != 0 checks into vfuncs (Tvrtko) > > v7: Use access_ok() before copying anything, to avoid overflows (Chris) > Switch BUG_ON() to GEM_WARN_ON() (Tvrtko) > > v8: Tweak uapi comments style to match the coding style (Lionel) > > v9: Fix error in comment about computation of enabled subslice (Tvrtko) > > v10: Fix/update comments in uAPI (Sagar) > > v11: Drop drm_i915_query_(slice|subslice|eu)_info in favor of a single > drm_i915_query_topology_info (Joonas) > > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> <SNIP> > +++ b/include/uapi/drm/i915_drm.h > @@ -1620,6 +1620,7 @@ struct drm_i915_perf_oa_config { > > struct drm_i915_query_item { > __u64 query_id; > +#define DRM_I915_QUERY_TOPOLOGY_INFO 0x01 Just a number should be sufficient? Hex would indicate a mask. > +/* > + * Data written by the kernel with query DRM_I915_QUERY_TOPOLOGY_INFO : > + * > + * data: contains the 3 pieces of information : > + * > + * - the slice mask with one bit per slice telling whether a slice is > + * available. The availability of slice X can be queried with the following > + * formula : > + * > + * (data[X / 8] >> (X % 8)) & 1 > + * > + * - the subslice mask for each slice with one bit per subslice telling > + * whether a subslice is available. The availability of subslice Y in slice > + * X can be queried with the following formula : > + * > + * (data[subslice_offset + > + * X * DIV_ROUND_UP(max_subslices, 8) + > + * Y / 8] >> (Y % 8)) & 1 > + * > + * - the EU mask for each subslice in each slice with one bit per EU telling > + * whether an EU is available. The availability of EU Z in subslice Y in > + * slice X can be queried with the following formula : > + * > + * (data[eu_offset + > + * X * max_subslices * DIV_ROUND_UP(max_eus_per_subslice, 8) + > + * Y * DIV_ROUND_UP(max_eus_per_subslice, 8) + > + * Z / 8] >> (Z % 8)) & 1 I'm still contemplating if providing *_stride to make this more straightofrward would be a good or bad thing. The cases would become: data[X / 8] & BIT(X % 8) data[subslice_offset + X * subslice_stride + Y/8] & BIT(Y % 8) data[eu_offset + (X * max_subslices + Y) * eu_stride + Z/8] & BIT(Z % 8) I think I'm heavily leaning towards that, as it comes with the option that we increase eu_stride two-fold to report more information per EU (or subslice for that matter). Thoughts? Regards, Joonas
On 16/02/18 12:28, Joonas Lahtinen wrote: > Quoting Lionel Landwerlin (2018-02-15 14:02:02) >> With the introduction of asymmetric slices in CNL, we cannot rely on >> the previous SUBSLICE_MASK getparam to tell userspace what subslices >> are available. Here we introduce a more detailed way of querying the >> Gen's GPU topology that doesn't aggregate numbers. >> >> This is essential for monitoring parts of the GPU with the OA unit, >> because counters need to be normalized to the number of >> EUs/subslices/slices. The current aggregated numbers like EU_TOTAL do >> not gives us sufficient information. >> >> As a bonus we can draw representations of the GPU : >> >> https://imgur.com/a/vuqpa >> >> v2: Rename uapi struct s/_mask/_info/ (Tvrtko) >> Report max_slice/subslice/eus_per_subslice rather than strides (Tvrtko) >> Add uapi macros to read data from *_info structs (Tvrtko) >> >> v3: Use !!(v & DRM_I915_BIT()) for uapi macros instead of custom shifts (Tvrtko) >> >> v4: factorize query item writting (Tvrtko) >> tweak uapi struct/define names (Tvrtko) >> >> v5: Replace ALIGN() macro (Chris) >> >> v6: Updated uapi comments (Tvrtko) >> Moved flags != 0 checks into vfuncs (Tvrtko) >> >> v7: Use access_ok() before copying anything, to avoid overflows (Chris) >> Switch BUG_ON() to GEM_WARN_ON() (Tvrtko) >> >> v8: Tweak uapi comments style to match the coding style (Lionel) >> >> v9: Fix error in comment about computation of enabled subslice (Tvrtko) >> >> v10: Fix/update comments in uAPI (Sagar) >> >> v11: Drop drm_i915_query_(slice|subslice|eu)_info in favor of a single >> drm_i915_query_topology_info (Joonas) >> >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > <SNIP> > >> +++ b/include/uapi/drm/i915_drm.h >> @@ -1620,6 +1620,7 @@ struct drm_i915_perf_oa_config { >> >> struct drm_i915_query_item { >> __u64 query_id; >> +#define DRM_I915_QUERY_TOPOLOGY_INFO 0x01 > Just a number should be sufficient? Hex would indicate a mask. > >> +/* >> + * Data written by the kernel with query DRM_I915_QUERY_TOPOLOGY_INFO : >> + * >> + * data: contains the 3 pieces of information : >> + * >> + * - the slice mask with one bit per slice telling whether a slice is >> + * available. The availability of slice X can be queried with the following >> + * formula : >> + * >> + * (data[X / 8] >> (X % 8)) & 1 >> + * >> + * - the subslice mask for each slice with one bit per subslice telling >> + * whether a subslice is available. The availability of subslice Y in slice >> + * X can be queried with the following formula : >> + * >> + * (data[subslice_offset + >> + * X * DIV_ROUND_UP(max_subslices, 8) + >> + * Y / 8] >> (Y % 8)) & 1 >> + * >> + * - the EU mask for each subslice in each slice with one bit per EU telling >> + * whether an EU is available. The availability of EU Z in subslice Y in >> + * slice X can be queried with the following formula : >> + * >> + * (data[eu_offset + >> + * X * max_subslices * DIV_ROUND_UP(max_eus_per_subslice, 8) + >> + * Y * DIV_ROUND_UP(max_eus_per_subslice, 8) + >> + * Z / 8] >> (Z % 8)) & 1 > I'm still contemplating if providing *_stride to make this more straightofrward > would be a good or bad thing. The cases would become: > > data[X / 8] & BIT(X % 8) > > data[subslice_offset + X * subslice_stride + Y/8] & BIT(Y % 8) > > data[eu_offset + (X * max_subslices + Y) * eu_stride + Z/8] & BIT(Z % 8) > > I think I'm heavily leaning towards that, as it comes with the option > that we increase eu_stride two-fold to report more information per EU > (or subslice for that matter). I'm not sure adding other types of information interleaved with availability mask is good thing. Sounds pretty annoying to parse/test. I'd much rather adding new structs. Regarding stride fields, I'm okay either way. Thanks! - Lionel > > Thoughts? > > Regards, Joonas >
On 15/02/2018 12:02, Lionel Landwerlin wrote: > With the introduction of asymmetric slices in CNL, we cannot rely on > the previous SUBSLICE_MASK getparam to tell userspace what subslices > are available. Here we introduce a more detailed way of querying the > Gen's GPU topology that doesn't aggregate numbers. > > This is essential for monitoring parts of the GPU with the OA unit, > because counters need to be normalized to the number of > EUs/subslices/slices. The current aggregated numbers like EU_TOTAL do > not gives us sufficient information. > > As a bonus we can draw representations of the GPU : > > https://imgur.com/a/vuqpa > > v2: Rename uapi struct s/_mask/_info/ (Tvrtko) > Report max_slice/subslice/eus_per_subslice rather than strides (Tvrtko) > Add uapi macros to read data from *_info structs (Tvrtko) > > v3: Use !!(v & DRM_I915_BIT()) for uapi macros instead of custom shifts (Tvrtko) > > v4: factorize query item writting (Tvrtko) > tweak uapi struct/define names (Tvrtko) > > v5: Replace ALIGN() macro (Chris) > > v6: Updated uapi comments (Tvrtko) > Moved flags != 0 checks into vfuncs (Tvrtko) > > v7: Use access_ok() before copying anything, to avoid overflows (Chris) > Switch BUG_ON() to GEM_WARN_ON() (Tvrtko) > > v8: Tweak uapi comments style to match the coding style (Lionel) > > v9: Fix error in comment about computation of enabled subslice (Tvrtko) > > v10: Fix/update comments in uAPI (Sagar) > > v11: Drop drm_i915_query_(slice|subslice|eu)_info in favor of a single > drm_i915_query_topology_info (Joonas) > > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > --- > drivers/gpu/drm/i915/i915_query.c | 71 +++++++++++++++++++++++++++++++++++++++ > include/uapi/drm/i915_drm.h | 40 ++++++++++++++++++++++ > 2 files changed, 111 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c > index 92ce3e24588e..828f1ba19248 100644 > --- a/drivers/gpu/drm/i915/i915_query.c > +++ b/drivers/gpu/drm/i915/i915_query.c > @@ -25,8 +25,79 @@ > #include "i915_drv.h" > #include <uapi/drm/i915_drm.h> > > +static int query_topology_info(struct drm_i915_private *dev_priv, > + struct drm_i915_query_item *query_item) > +{ > + const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu; > + struct drm_i915_query_topology_info topo_info; > + u32 slice_length, subslice_length, eu_length, total_length; > + > + if (query_item->flags != 0) > + return -EINVAL; > + > + if (sseu->max_slices == 0) > + return -ENODEV; > + > + /* > + * If we ever change the internal slice mask data type, we'll need to > + * update this function. > + */ > + BUILD_BUG_ON(sizeof(u8) != sizeof(sseu->slice_mask)); > + > + slice_length = sizeof(sseu->slice_mask); > + subslice_length = sseu->max_slices * > + DIV_ROUND_UP(sseu->max_subslices, > + sizeof(sseu->subslice_mask[0]) * BITS_PER_BYTE); > + eu_length = sseu->max_slices * sseu->max_subslices * > + DIV_ROUND_UP(sseu->max_eus_per_subslice, BITS_PER_BYTE); > + > + total_length = sizeof(topo_info) + slice_length + subslice_length + eu_length; > + > + if (query_item->length == 0) > + return total_length; > + > + if (query_item->length < total_length) > + return -EINVAL; > + > + if (!access_ok(VERIFY_WRITE, u64_to_user_ptr(query_item->data_ptr), > + total_length)) > + return -EFAULT; > + > + memset(&topo_info, 0, sizeof(topo_info)); > + topo_info.max_slices = sseu->max_slices; > + topo_info.max_subslices = sseu->max_subslices; > + topo_info.max_eus_per_subslice = sseu->max_eus_per_subslice; > + > + topo_info.subslice_offset = slice_length; > + topo_info.eu_offset = slice_length + subslice_length; > + > + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr), > + &topo_info, sizeof(topo_info))) > + return -EFAULT; > + > + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr + > + sizeof(topo_info)), > + &sseu->slice_mask, slice_length)) > + return -EFAULT; > + > + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr + > + sizeof(topo_info) + > + slice_length), > + sseu->subslice_mask, subslice_length)) > + return -EFAULT; > + > + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr + > + sizeof(topo_info) + > + slice_length + subslice_length), > + sseu->eu_mask, eu_length)) > + return -EFAULT; > + > + return total_length; > +} > + > static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, > struct drm_i915_query_item *query_item) = { > + query_topology_info, > }; > > int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file) > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index 7fd980176f25..9d5c4caf5a6d 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -1620,6 +1620,7 @@ struct drm_i915_perf_oa_config { > > struct drm_i915_query_item { > __u64 query_id; > +#define DRM_I915_QUERY_TOPOLOGY_INFO 0x01 > > /* > * When set to zero by userspace, this is filled with the size of the > @@ -1656,6 +1657,45 @@ struct drm_i915_query { > __u64 items_ptr; > }; > > +/* > + * Data written by the kernel with query DRM_I915_QUERY_TOPOLOGY_INFO : > + * > + * data: contains the 3 pieces of information : > + * > + * - the slice mask with one bit per slice telling whether a slice is > + * available. The availability of slice X can be queried with the following > + * formula : > + * > + * (data[X / 8] >> (X % 8)) & 1 > + * > + * - the subslice mask for each slice with one bit per subslice telling > + * whether a subslice is available. The availability of subslice Y in slice > + * X can be queried with the following formula : > + * > + * (data[subslice_offset + > + * X * DIV_ROUND_UP(max_subslices, 8) + > + * Y / 8] >> (Y % 8)) & 1 > + * > + * - the EU mask for each subslice in each slice with one bit per EU telling > + * whether an EU is available. The availability of EU Z in subslice Y in > + * slice X can be queried with the following formula : > + * > + * (data[eu_offset + > + * X * max_subslices * DIV_ROUND_UP(max_eus_per_subslice, 8) + > + * Y * DIV_ROUND_UP(max_eus_per_subslice, 8) + > + * Z / 8] >> (Z % 8)) & 1 > + */ > +struct drm_i915_query_topology_info { > + __u16 max_slices; > + __u16 max_subslices; > + __u16 max_eus_per_subslice; I'd add __u16 pad here. > + > + __u16 subslice_offset; > + __u16 eu_offset; > + And __u16 pad2[2] here. Or alternatively only __u15 pad[3] here to align to 64-bit. Whichever version looks better to you. > + __u8 data[]; > +}; > + > #if defined(__cplusplus) > } > #endif > Otherwise I did not do a full review but can say that one query is fine by me. Regards, Tvrtko
On 20/02/18 11:05, Tvrtko Ursulin wrote: > > On 15/02/2018 12:02, Lionel Landwerlin wrote: >> With the introduction of asymmetric slices in CNL, we cannot rely on >> the previous SUBSLICE_MASK getparam to tell userspace what subslices >> are available. Here we introduce a more detailed way of querying the >> Gen's GPU topology that doesn't aggregate numbers. >> >> This is essential for monitoring parts of the GPU with the OA unit, >> because counters need to be normalized to the number of >> EUs/subslices/slices. The current aggregated numbers like EU_TOTAL do >> not gives us sufficient information. >> >> As a bonus we can draw representations of the GPU : >> >> https://imgur.com/a/vuqpa >> >> v2: Rename uapi struct s/_mask/_info/ (Tvrtko) >> Report max_slice/subslice/eus_per_subslice rather than strides >> (Tvrtko) >> Add uapi macros to read data from *_info structs (Tvrtko) >> >> v3: Use !!(v & DRM_I915_BIT()) for uapi macros instead of custom >> shifts (Tvrtko) >> >> v4: factorize query item writting (Tvrtko) >> tweak uapi struct/define names (Tvrtko) >> >> v5: Replace ALIGN() macro (Chris) >> >> v6: Updated uapi comments (Tvrtko) >> Moved flags != 0 checks into vfuncs (Tvrtko) >> >> v7: Use access_ok() before copying anything, to avoid overflows (Chris) >> Switch BUG_ON() to GEM_WARN_ON() (Tvrtko) >> >> v8: Tweak uapi comments style to match the coding style (Lionel) >> >> v9: Fix error in comment about computation of enabled subslice (Tvrtko) >> >> v10: Fix/update comments in uAPI (Sagar) >> >> v11: Drop drm_i915_query_(slice|subslice|eu)_info in favor of a single >> drm_i915_query_topology_info (Joonas) >> >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> --- >> drivers/gpu/drm/i915/i915_query.c | 71 >> +++++++++++++++++++++++++++++++++++++++ >> include/uapi/drm/i915_drm.h | 40 ++++++++++++++++++++++ >> 2 files changed, 111 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/i915_query.c >> b/drivers/gpu/drm/i915/i915_query.c >> index 92ce3e24588e..828f1ba19248 100644 >> --- a/drivers/gpu/drm/i915/i915_query.c >> +++ b/drivers/gpu/drm/i915/i915_query.c >> @@ -25,8 +25,79 @@ >> #include "i915_drv.h" >> #include <uapi/drm/i915_drm.h> >> +static int query_topology_info(struct drm_i915_private *dev_priv, >> + struct drm_i915_query_item *query_item) >> +{ >> + const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu; >> + struct drm_i915_query_topology_info topo_info; >> + u32 slice_length, subslice_length, eu_length, total_length; >> + >> + if (query_item->flags != 0) >> + return -EINVAL; >> + >> + if (sseu->max_slices == 0) >> + return -ENODEV; >> + >> + /* >> + * If we ever change the internal slice mask data type, we'll >> need to >> + * update this function. >> + */ >> + BUILD_BUG_ON(sizeof(u8) != sizeof(sseu->slice_mask)); >> + >> + slice_length = sizeof(sseu->slice_mask); >> + subslice_length = sseu->max_slices * >> + DIV_ROUND_UP(sseu->max_subslices, >> + sizeof(sseu->subslice_mask[0]) * BITS_PER_BYTE); >> + eu_length = sseu->max_slices * sseu->max_subslices * >> + DIV_ROUND_UP(sseu->max_eus_per_subslice, BITS_PER_BYTE); >> + >> + total_length = sizeof(topo_info) + slice_length + >> subslice_length + eu_length; >> + >> + if (query_item->length == 0) >> + return total_length; >> + >> + if (query_item->length < total_length) >> + return -EINVAL; >> + >> + if (!access_ok(VERIFY_WRITE, u64_to_user_ptr(query_item->data_ptr), >> + total_length)) >> + return -EFAULT; >> + >> + memset(&topo_info, 0, sizeof(topo_info)); >> + topo_info.max_slices = sseu->max_slices; >> + topo_info.max_subslices = sseu->max_subslices; >> + topo_info.max_eus_per_subslice = sseu->max_eus_per_subslice; >> + >> + topo_info.subslice_offset = slice_length; >> + topo_info.eu_offset = slice_length + subslice_length; >> + >> + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr), >> + &topo_info, sizeof(topo_info))) >> + return -EFAULT; >> + >> + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr + >> + sizeof(topo_info)), >> + &sseu->slice_mask, slice_length)) >> + return -EFAULT; >> + >> + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr + >> + sizeof(topo_info) + >> + slice_length), >> + sseu->subslice_mask, subslice_length)) >> + return -EFAULT; >> + >> + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr + >> + sizeof(topo_info) + >> + slice_length + subslice_length), >> + sseu->eu_mask, eu_length)) >> + return -EFAULT; >> + >> + return total_length; >> +} >> + >> static int (* const i915_query_funcs[])(struct drm_i915_private >> *dev_priv, >> struct drm_i915_query_item *query_item) = { >> + query_topology_info, >> }; >> int i915_query_ioctl(struct drm_device *dev, void *data, struct >> drm_file *file) >> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h >> index 7fd980176f25..9d5c4caf5a6d 100644 >> --- a/include/uapi/drm/i915_drm.h >> +++ b/include/uapi/drm/i915_drm.h >> @@ -1620,6 +1620,7 @@ struct drm_i915_perf_oa_config { >> struct drm_i915_query_item { >> __u64 query_id; >> +#define DRM_I915_QUERY_TOPOLOGY_INFO 0x01 >> /* >> * When set to zero by userspace, this is filled with the size >> of the >> @@ -1656,6 +1657,45 @@ struct drm_i915_query { >> __u64 items_ptr; >> }; >> +/* >> + * Data written by the kernel with query DRM_I915_QUERY_TOPOLOGY_INFO : >> + * >> + * data: contains the 3 pieces of information : >> + * >> + * - the slice mask with one bit per slice telling whether a slice is >> + * available. The availability of slice X can be queried with the >> following >> + * formula : >> + * >> + * (data[X / 8] >> (X % 8)) & 1 >> + * >> + * - the subslice mask for each slice with one bit per subslice telling >> + * whether a subslice is available. The availability of subslice Y >> in slice >> + * X can be queried with the following formula : >> + * >> + * (data[subslice_offset + >> + * X * DIV_ROUND_UP(max_subslices, 8) + >> + * Y / 8] >> (Y % 8)) & 1 >> + * >> + * - the EU mask for each subslice in each slice with one bit per EU >> telling >> + * whether an EU is available. The availability of EU Z in >> subslice Y in >> + * slice X can be queried with the following formula : >> + * >> + * (data[eu_offset + >> + * X * max_subslices * >> DIV_ROUND_UP(max_eus_per_subslice, 8) + >> + * Y * DIV_ROUND_UP(max_eus_per_subslice, 8) + >> + * Z / 8] >> (Z % 8)) & 1 >> + */ >> +struct drm_i915_query_topology_info { >> + __u16 max_slices; >> + __u16 max_subslices; >> + __u16 max_eus_per_subslice; > > I'd add __u16 pad here. Sure. Joonas also emitted the idea of have a flag to change the meaning of data, so will add that too. > >> + >> + __u16 subslice_offset; >> + __u16 eu_offset; >> + > > And __u16 pad2[2] here. > > Or alternatively only __u15 pad[3] here to align to 64-bit. Whichever > version looks better to you. Okay, I thought because the following with u8 it wouldn't matter, but sure. > >> + __u8 data[]; >> +}; >> + >> #if defined(__cplusplus) >> } >> #endif >> > > Otherwise I did not do a full review but can say that one query is > fine by me. Thanks! > > Regards, > > Tvrtko > >
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index 92ce3e24588e..828f1ba19248 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -25,8 +25,79 @@ #include "i915_drv.h" #include <uapi/drm/i915_drm.h> +static int query_topology_info(struct drm_i915_private *dev_priv, + struct drm_i915_query_item *query_item) +{ + const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu; + struct drm_i915_query_topology_info topo_info; + u32 slice_length, subslice_length, eu_length, total_length; + + if (query_item->flags != 0) + return -EINVAL; + + if (sseu->max_slices == 0) + return -ENODEV; + + /* + * If we ever change the internal slice mask data type, we'll need to + * update this function. + */ + BUILD_BUG_ON(sizeof(u8) != sizeof(sseu->slice_mask)); + + slice_length = sizeof(sseu->slice_mask); + subslice_length = sseu->max_slices * + DIV_ROUND_UP(sseu->max_subslices, + sizeof(sseu->subslice_mask[0]) * BITS_PER_BYTE); + eu_length = sseu->max_slices * sseu->max_subslices * + DIV_ROUND_UP(sseu->max_eus_per_subslice, BITS_PER_BYTE); + + total_length = sizeof(topo_info) + slice_length + subslice_length + eu_length; + + if (query_item->length == 0) + return total_length; + + if (query_item->length < total_length) + return -EINVAL; + + if (!access_ok(VERIFY_WRITE, u64_to_user_ptr(query_item->data_ptr), + total_length)) + return -EFAULT; + + memset(&topo_info, 0, sizeof(topo_info)); + topo_info.max_slices = sseu->max_slices; + topo_info.max_subslices = sseu->max_subslices; + topo_info.max_eus_per_subslice = sseu->max_eus_per_subslice; + + topo_info.subslice_offset = slice_length; + topo_info.eu_offset = slice_length + subslice_length; + + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr), + &topo_info, sizeof(topo_info))) + return -EFAULT; + + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr + + sizeof(topo_info)), + &sseu->slice_mask, slice_length)) + return -EFAULT; + + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr + + sizeof(topo_info) + + slice_length), + sseu->subslice_mask, subslice_length)) + return -EFAULT; + + if (__copy_to_user(u64_to_user_ptr(query_item->data_ptr + + sizeof(topo_info) + + slice_length + subslice_length), + sseu->eu_mask, eu_length)) + return -EFAULT; + + return total_length; +} + static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, struct drm_i915_query_item *query_item) = { + query_topology_info, }; int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 7fd980176f25..9d5c4caf5a6d 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1620,6 +1620,7 @@ struct drm_i915_perf_oa_config { struct drm_i915_query_item { __u64 query_id; +#define DRM_I915_QUERY_TOPOLOGY_INFO 0x01 /* * When set to zero by userspace, this is filled with the size of the @@ -1656,6 +1657,45 @@ struct drm_i915_query { __u64 items_ptr; }; +/* + * Data written by the kernel with query DRM_I915_QUERY_TOPOLOGY_INFO : + * + * data: contains the 3 pieces of information : + * + * - the slice mask with one bit per slice telling whether a slice is + * available. The availability of slice X can be queried with the following + * formula : + * + * (data[X / 8] >> (X % 8)) & 1 + * + * - the subslice mask for each slice with one bit per subslice telling + * whether a subslice is available. The availability of subslice Y in slice + * X can be queried with the following formula : + * + * (data[subslice_offset + + * X * DIV_ROUND_UP(max_subslices, 8) + + * Y / 8] >> (Y % 8)) & 1 + * + * - the EU mask for each subslice in each slice with one bit per EU telling + * whether an EU is available. The availability of EU Z in subslice Y in + * slice X can be queried with the following formula : + * + * (data[eu_offset + + * X * max_subslices * DIV_ROUND_UP(max_eus_per_subslice, 8) + + * Y * DIV_ROUND_UP(max_eus_per_subslice, 8) + + * Z / 8] >> (Z % 8)) & 1 + */ +struct drm_i915_query_topology_info { + __u16 max_slices; + __u16 max_subslices; + __u16 max_eus_per_subslice; + + __u16 subslice_offset; + __u16 eu_offset; + + __u8 data[]; +}; + #if defined(__cplusplus) } #endif
With the introduction of asymmetric slices in CNL, we cannot rely on the previous SUBSLICE_MASK getparam to tell userspace what subslices are available. Here we introduce a more detailed way of querying the Gen's GPU topology that doesn't aggregate numbers. This is essential for monitoring parts of the GPU with the OA unit, because counters need to be normalized to the number of EUs/subslices/slices. The current aggregated numbers like EU_TOTAL do not gives us sufficient information. As a bonus we can draw representations of the GPU : https://imgur.com/a/vuqpa v2: Rename uapi struct s/_mask/_info/ (Tvrtko) Report max_slice/subslice/eus_per_subslice rather than strides (Tvrtko) Add uapi macros to read data from *_info structs (Tvrtko) v3: Use !!(v & DRM_I915_BIT()) for uapi macros instead of custom shifts (Tvrtko) v4: factorize query item writting (Tvrtko) tweak uapi struct/define names (Tvrtko) v5: Replace ALIGN() macro (Chris) v6: Updated uapi comments (Tvrtko) Moved flags != 0 checks into vfuncs (Tvrtko) v7: Use access_ok() before copying anything, to avoid overflows (Chris) Switch BUG_ON() to GEM_WARN_ON() (Tvrtko) v8: Tweak uapi comments style to match the coding style (Lionel) v9: Fix error in comment about computation of enabled subslice (Tvrtko) v10: Fix/update comments in uAPI (Sagar) v11: Drop drm_i915_query_(slice|subslice|eu)_info in favor of a single drm_i915_query_topology_info (Joonas) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> --- drivers/gpu/drm/i915/i915_query.c | 71 +++++++++++++++++++++++++++++++++++++++ include/uapi/drm/i915_drm.h | 40 ++++++++++++++++++++++ 2 files changed, 111 insertions(+)