Message ID | 1471596198-30748-18-git-send-email-akash.goel@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Aug 19, 2016 at 02:13:16PM +0530, akash.goel@intel.com wrote: > From: Akash Goel <akash.goel@intel.com> > > In order to have fast reads from the GuC log buffer, used SSE4.1 movntdqa > based memcpy function i915_memcpy_from_wc. > GuC log buffer has a WC type vmalloc mapping and copying using movntqda > from WC type memory is almost as fast as reading from WB memory. > This will further reduce the log buffer sampling time, so is needed dearly > to deal with the flush interrupt storm when GuC is generating logs at a > very high rate. > Ideally SSE 4.1 should be present on all chipsets supporting GuC based > submisssions, but if not then logging will not be enabled. > > v2: Rebase. > > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Akash Goel <akash.goel@intel.com> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Should be squashed with patch 16 (use MAP_WC). -Chris
On 8/19/2016 11:49 PM, Chris Wilson wrote: > On Fri, Aug 19, 2016 at 02:13:16PM +0530, akash.goel@intel.com wrote: >> From: Akash Goel <akash.goel@intel.com> >> >> In order to have fast reads from the GuC log buffer, used SSE4.1 movntdqa >> based memcpy function i915_memcpy_from_wc. >> GuC log buffer has a WC type vmalloc mapping and copying using movntqda >> from WC type memory is almost as fast as reading from WB memory. >> This will further reduce the log buffer sampling time, so is needed dearly >> to deal with the flush interrupt storm when GuC is generating logs at a >> very high rate. >> Ideally SSE 4.1 should be present on all chipsets supporting GuC based >> submisssions, but if not then logging will not be enabled. >> >> v2: Rebase. >> >> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> >> Signed-off-by: Akash Goel <akash.goel@intel.com> >> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Should be squashed with patch 16 (use MAP_WC). Fine will squash, but please could you tell that what issue could be there with 2 patches being separate. Either both will be merged or none of them will be merged. Best regards Akash > -Chris >
On Sat, Aug 20, 2016 at 12:14:46PM +0530, Goel, Akash wrote: > > > On 8/19/2016 11:49 PM, Chris Wilson wrote: > >On Fri, Aug 19, 2016 at 02:13:16PM +0530, akash.goel@intel.com wrote: > >>From: Akash Goel <akash.goel@intel.com> > >> > >>In order to have fast reads from the GuC log buffer, used SSE4.1 movntdqa > >>based memcpy function i915_memcpy_from_wc. > >>GuC log buffer has a WC type vmalloc mapping and copying using movntqda > >>from WC type memory is almost as fast as reading from WB memory. > >>This will further reduce the log buffer sampling time, so is needed dearly > >>to deal with the flush interrupt storm when GuC is generating logs at a > >>very high rate. > >>Ideally SSE 4.1 should be present on all chipsets supporting GuC based > >>submisssions, but if not then logging will not be enabled. > >> > >>v2: Rebase. > >> > >>Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> > >>Signed-off-by: Akash Goel <akash.goel@intel.com> > >>Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > > >Should be squashed with patch 16 (use MAP_WC). > Fine will squash, but please could you tell that what issue could be > there with 2 patches being separate. Either both will be merged or > none of them will be merged. Further reflection is that the initial patch to map the log should be with MAP_WC. Correctness first, then the performance patch. As it stands, the story being told is: 1. Enable readback. 2. Oops, that isn't correct, better map it WC 3. Oops, that is too slow, better use movntqa. A better story is 1. Enable readback, enforcing coherency. 2. Accelerate readback for !llc.
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c index 7c889e1..c214a7c 100644 --- a/drivers/gpu/drm/i915/i915_guc_submission.c +++ b/drivers/gpu/drm/i915/i915_guc_submission.c @@ -1128,13 +1128,13 @@ static void guc_read_update_log_buffer(struct intel_guc *guc) /* Just copy the newly written data */ if (read_offset > write_offset) { - memcpy(dst_data, src_data, write_offset); + i915_memcpy_from_wc(dst_data, src_data, write_offset); bytes_to_copy = buffer_size - read_offset; } else { bytes_to_copy = write_offset - read_offset; } - memcpy(dst_data + read_offset, - src_data + read_offset, bytes_to_copy); + i915_memcpy_from_wc(dst_data + read_offset, + src_data + read_offset, bytes_to_copy); src_data += buffer_size; dst_data += buffer_size; @@ -1247,6 +1247,16 @@ static void guc_create_log(struct intel_guc *guc) vma = guc->log.vma; if (!vma) { + /* We require SSE 4.1 for fast reads from the GuC log buffer and + * it should be present on the chipsets supporting GuC based + * submisssions. + */ + if (WARN_ON(!i915_memcpy_from_wc(NULL, NULL, 0))) { + /* logging will not be enabled */ + i915.guc_log_level = -1; + return; + } + vma = guc_allocate_vma(guc, size); if (IS_ERR(vma)) { /* logging will be off */