[RFC,v2,0/1] Replace shmem memory region and object backend

Message ID 20220517204513.429930-1-adrian.larumbe@collabora.com

Message

Adrián Larumbe May 17, 2022, 8:45 p.m. UTC
This patch is a second attempt at eliminating the old shmem memory region
and GEM object backend, in favour of a TTM-based one that is able to manage
objects placed on both system and local memory.

Questions addressed since previous revision:

* Creating an anonymous vfs mount for shmem files in TTM (see the sketch
after this list)
* Fixing LLC caching properties and bit 17 swizzling before setting a TTM
bo's pages when calling get_pages
* Added handling of phys backend from TTM functions
* Added pread callback to TTM gem object backend
* In shmem_create_from_object, ensuring a shmem object we just got a filp
for has its pages marked dirty and accessed. Otherwise, the engine won't be
able to read the initial state and a GPU hang will ensue
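
For reference, a minimal sketch of the anonymous mount part, built on
kern_mount() and shmem_file_setup_with_mnt() (the helper names below are
illustrative rather than the ones in the patch):

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/shmem_fs.h>

/* Illustrative: mount a private tmpfs instance so TTM-backed objects
 * get their own shmem files, independent of the global shm mount. */
static struct vfsmount *i915_ttm_gemfs;

static int i915_ttm_gemfs_init(void)
{
        struct file_system_type *type = get_fs_type("tmpfs");

        if (!type)
                return -ENODEV;

        i915_ttm_gemfs = kern_mount(type);
        return PTR_ERR_OR_ZERO(i915_ttm_gemfs);
}

/* Back an object of the given size with a file on that mount. */
static struct file *i915_ttm_shmem_file(size_t size)
{
        return shmem_file_setup_with_mnt(i915_ttm_gemfs, "i915-ttm",
                                         size, VM_NORESERVE);
}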

However, one of the issues persists:

Many GPU hangs on machines with GEN <= 5. My assumption is that this has
something to do with a caching pitfall, even though everywhere across the
TTM backend code I've tried to handle object creation and page acquisition
with the same set of caching and coherency properties as in the old shmem
backend.
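
For context, the init-time coherency policy of the old shmem backend that
I'm trying to reproduce boils down to something like the following (a
sketch, not the exact code; the helper name is made up):

/* Old shmem backend policy, roughly: objects on LLC machines may stay
 * CPU-cached, everything else must be treated as uncached. */
static void shmem_init_caching(struct drm_i915_gem_object *obj,
                               struct drm_i915_private *i915)
{
        unsigned int cache_level = HAS_LLC(i915) ? I915_CACHE_LLC :
                                                   I915_CACHE_NONE;

        i915_gem_object_set_cache_coherency(obj, cache_level);
}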

Adrian Larumbe (1):
  drm/i915: Replace shmem memory region and object backend with TTM

 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  12 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c     |  32 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h   |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_phys.c     |  32 +-
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c    | 390 +------------------
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 267 ++++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   3 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |   9 +-
 drivers/gpu/drm/i915/gt/shmem_utils.c        |  64 ++-
 drivers/gpu/drm/i915/intel_memory_region.c   |   7 +-
 10 files changed, 398 insertions(+), 422 deletions(-)

Comments

Matthew Auld May 18, 2022, 3 p.m. UTC | #1
On Tue, 17 May 2022 at 21:45, Adrian Larumbe
<adrian.larumbe@collabora.com> wrote:
>
> This patch is a second attempt at eliminating the old shmem memory region
> and GEM object backend, in favour of a TTM-based one that is able to manage
> objects placed on both system and local memory.
>
> Questions addressed since previous revision:
>
> * Creating an anonymous vfs mount for shmem files in TTM
> * Fixing LLC caching properties and bit 17 swizzling before setting a TTM
> bo's pages when calling get_pages
> * Added handling of phys backend from TTM functions
> * Added pread callback to TTM gem object backend
> * In shmem_create_from_object, ensuring a shmem object we just got a filp
> for has its pages marked dirty and accessed. Otherwise, the engine won't be
> able to read the initial state and a GPU hang will ensue
>
> However, one of the issues persists:
>
> Many GPU hangs on machines with GEN <= 5. My assumption is that this has
> something to do with a caching pitfall, even though everywhere across the
> TTM backend code I've tried to handle object creation and page acquisition
> with the same set of caching and coherency properties as in the old shmem
> backend.

Some thoughts in case it's helpful:

- We still look to be trampling the cache_level etc after object
creation. AFAICT i915_ttm_adjust_gem_after_move can be called in
various places after creation.

- The i915_ttm_pwrite hook won't play nice on non-llc platforms, since
it doesn't force a clflush or keep track of the writes with
cache_dirty. The existing ->shmem_pwrite hook only works because we
are guaranteed to have not yet populated the mm.pages, and on non-llc
platforms we always force a clflush in __set_pages(). In
i915_ttm_pwrite we are now just calling pin_pages() and then writing
through the page-cache without forcing a clflush, or ensuring that we
leave cache_dirty set. Also AFAIK the whole point of shmem_pwrite was
to avoid needing to populate the entire object like when calling
pin_pages(). Would it make sense to just fall back to using
i915_gem_shmem_pwrite, which should already take care of the required
flushing?

For reference a common usage pattern is something like:

bb = gem_create() <-- assume non-llc so must be CACHE_NONE
gem_write(bb, BATCH_BUFFER_END) <-- might use cached pwrite internally
execbuf(bb) <-- doesn't see BATCH_BUFFER_END if we don't clflush
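
For reference, the fallback could look something like this (only a
sketch, and it assumes i915_gem_shmem_pwrite can be made reachable from
the TTM backend):

/* Punt to the shmem pwrite path while the object has no pages yet,
 * since that path already clflushes and/or leaves cache_dirty set on
 * non-LLC platforms. Returning -ENODEV makes the ioctl fall back to
 * the generic pwrite path instead. */
static int i915_ttm_pwrite(struct drm_i915_gem_object *obj,
                           const struct drm_i915_gem_pwrite *arg)
{
        if (!i915_gem_object_has_pages(obj))
                return i915_gem_shmem_pwrite(obj, arg);

        return -ENODEV;
}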

>
> Adrian Larumbe (1):
>   drm/i915: Replace shmem memory region and object backend with TTM
>
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  12 +-
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c     |  32 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.h   |   4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_phys.c     |  32 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c    | 390 +------------------
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 267 ++++++++++++-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   3 +
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |   9 +-
>  drivers/gpu/drm/i915/gt/shmem_utils.c        |  64 ++-
>  drivers/gpu/drm/i915/intel_memory_region.c   |   7 +-
>  10 files changed, 398 insertions(+), 422 deletions(-)
>
> --
> 2.35.1
>
Adrián Larumbe May 27, 2022, 3:37 p.m. UTC | #2
On 18.05.2022 16:00, Matthew Auld wrote:
> On Tue, 17 May 2022 at 21:45, Adrian Larumbe
> <adrian.larumbe@collabora.com> wrote:
> >
> > This patch is a second attempt at eliminating the old shmem memory region
> > and GEM object backend, in favour of a TTM-based one that is able to manage
> > objects placed on both system and local memory.
> >
> > Questions addressed since previous revision:
> >
> > * Creating an anonymous vfs mount for shmem files in TTM
> > * Fixing LLC caching properties and bit 17 swizzling before setting a TTM
> > bo's pages when calling get_pages
> > * Added handling of phys backend from TTM functions
> > * Added pread callback to TTM gem object backend
> > * In shmem_create_from_object, ensuring a shmem object we just got a filp
> > for has its pages marked dirty and accessed. Otherwise, the engine won't be
> > able to read the initial state and a GPU hang will ensue
> >
> > However, one of the issues persists:
> >
> > Many GPU hangs on machines with GEN <= 5. My assumption is that this has
> > something to do with a caching pitfall, even though everywhere across the
> > TTM backend code I've tried to handle object creation and page acquisition
> > with the same set of caching and coherency properties as in the old shmem
> > backend.
> 
> Some thoughts in case it's helpful:
> 
> - We still look to be trampling the cache_level etc after object
> creation. AFAICT i915_ttm_adjust_gem_after_move can be called in
> various places after creation.

I traced this function to see everywhere it was being called when running
some IGT tests and kmscube, and the only place it was setting a cache
coherency value other than none was init_status_page, where I915_CACHE_LLC
is picked as the cache coherency mode even on machines that don't have an
LLC. However, this code was already present before my changes and didn't
seem to cause any issues, so I don't think it's involved.
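
For reference, the call in question in init_status_page looks roughly like
this (quoting from memory, so treat the surrounding context as approximate):

        /* I915_CACHE_LLC is picked unconditionally for the status page,
         * even on machines without an LLC. */
        i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);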

> - The i915_ttm_pwrite hook won't play nice on non-llc platforms, since
> it doesn't force a clflush or keep track of the writes with
> cache_dirty. The existing ->shmem_pwrite hook only works because we
> are guaranteed to have not yet populated the mm.pages, and on non-llc
> platforms we always force a clflush in __set_pages(). In
> i915_ttm_pwrite we are now just calling pin_pages() and then writing
> through the page-cache without forcing a clflush, or ensuring that we
> leave cache_dirty set. Also AFAIK the whole point of shmem_pwrite was
> to avoid needing to populate the entire object like when calling
> pin_pages(). Would it make sense to just fall back to using
> i915_gem_shmem_pwrite, which should already take care of the required
> flushing?
> 
> For reference a common usage pattern is something like:
> 
> bb = gem_create() <-- assume non-llc so must be CACHE_NONE
> gem_write(bb, BATCH_BUFFER_END) <-- might use cached pwrite internally
> execbuf(bb) <-- doesn't see BATCH_BUFFER_END if we don't clflush

It turns out this was the underlying issue causing machines with GEN <= 5 to
break in pretty much every single test. It seems that for older hardware, IGT
tests pick pwrite as the preferred method for filling BOs from user mode, so
my code wasn't calling clflush after getting the pages and writing the
user-mode pointer data into the shmem file.

The way I fixed it was to create a shmem file for this and the other cases
where the existing code requires one, but without getting the pages. In a
way, I just cut through the usual TTM populate path and instantiate a shmem
object so that I can avoid caching issues.
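
In rough terms, what I do now is something like the sketch below (the
helper name is made up; this is only a sketch of the idea):

/* Give the object a backing shmem file on demand, bypassing the usual
 * TTM populate path, so callers like pwrite can fill it through the
 * page cache while mm.pages is still unpopulated and the eventual
 * get_pages still forces the clflush, as in the old backend. */
static struct file *i915_ttm_get_shmem_file(struct drm_i915_gem_object *obj)
{
        if (!obj->base.filp) {
                struct file *filp;

                filp = shmem_file_setup("i915-ttm-bo", obj->base.size,
                                        VM_NORESERVE);
                if (IS_ERR(filp))
                        return filp;

                obj->base.filp = filp;
        }

        return obj->base.filp;
}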

Thanks a lot for catching this one!

> >
> > Adrian Larumbe (1):
> >   drm/i915: Replace shmem memory region and object backend with TTM
> >
> >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  12 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_mman.c     |  32 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h   |   4 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_phys.c     |  32 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c    | 390 +------------------
> >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 267 ++++++++++++-
> >  drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   3 +
> >  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |   9 +-
> >  drivers/gpu/drm/i915/gt/shmem_utils.c        |  64 ++-
> >  drivers/gpu/drm/i915/intel_memory_region.c   |   7 +-
> >  10 files changed, 398 insertions(+), 422 deletions(-)
> >
> > --
> > 2.35.1
Matthew Auld May 30, 2022, 6:58 p.m. UTC | #3
On Fri, 27 May 2022 at 16:37, Adrian Larumbe
<adrian.larumbe@collabora.com> wrote:
>
> On 18.05.2022 16:00, Matthew Auld wrote:
> > On Tue, 17 May 2022 at 21:45, Adrian Larumbe
> > <adrian.larumbe@collabora.com> wrote:
> > >
> > > This patch is a second attempt at eliminating the old shmem memory region
> > > and GEM object backend, in favour of a TTM-based one that is able to manage
> > > objects placed on both system and local memory.
> > >
> > > Questions addressed since previous revision:
> > >
> > > * Creating an anonymous vfs mount for shmem files in TTM
> > > * Fixing LLC caching properties and bit 17 swizzling before setting a TTM
> > > bo's pages when calling get_pages
> > > * Added handling of phys backend from TTM functions
> > > * Added pread callback to TTM gem object backend
> > > * In shmem_create_from_object, ensuring a shmem object we just got a filp
> > > for has its pages marked dirty and accessed. Otherwise, the engine won't be
> > > able to read the initial state and a GPU hang will ensue
> > >
> > > However, one of the issues persists:
> > >
> > > Many GPU hangs on machines with GEN <= 5. My assumption is that this has
> > > something to do with a caching pitfall, even though everywhere across the
> > > TTM backend code I've tried to handle object creation and page acquisition
> > > with the same set of caching and coherency properties as in the old shmem
> > > backend.
> >
> > Some thoughts in case it's helpful:
> >
> > - We still look to be trampling the cache_level etc after object
> > creation. AFAICT i915_ttm_adjust_gem_after_move can be called in
> > various places after creation.
>
> I traced this function to see everywhere it was being called when running
> some IGT tests and kmscube, and the only place it was setting a cache
> coherency value other than none was init_status_page, where I915_CACHE_LLC
> is picked as the cache coherency mode even on machines that don't have an
> LLC. However, this code was already present before my changes and didn't
> seem to cause any issues, so I don't think it's involved.

Yeah, I guess it's a different issue, but we still need to somehow
ensure we never trample the cache_level etc. on integrated platforms
after object creation. On non-LLC platforms (not discrete) it's normal
to set CACHE_LLC for certain types of buffers. And even on LLC
platforms it's normal to set CACHE_NONE, like for display buffers,
since the display engine is not coherent.

For reference we can have something like:

bb = gem_create() <-- assume non-llc so must be CACHE_NONE
gem_set_caching(bb, CACHED) <-- should now be CACHE_LLC/L3
ptr = mmap_wb(bb); *ptr = BATCH_BUFFER_END <-- gem_after_move() resets to CACHE_NONE
execbuf(bb) <-- doesn't see the BATCH_BUFFER_END
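
So we probably need a guard along these lines in
i915_ttm_adjust_gem_after_move, or wherever the caching state gets
rewritten (an untested sketch):

static void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
{
        struct drm_i915_private *i915 = to_i915(obj->base.dev);

        /* Only discrete parts should derive the caching state from the
         * TTM placement after a move; on integrated, the cache_level
         * chosen at creation time or via the set_caching ioctl must
         * survive. */
        if (!IS_DGFX(i915))
                return;

        /* ... existing placement-based cache_level adjustment ... */
}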

>
> > - The i915_ttm_pwrite hook won't play nice on non-llc platforms, since
> > it doesn't force a clflush or keep track of the writes with
> > cache_dirty. The existing ->shmem_pwrite hook only works because we
> > are guaranteed to have not yet populated the mm.pages, and on non-llc
> > platforms we always force a clflush in __set_pages(). In
> > i915_ttm_pwrite we are now just calling pin_pages() and then writing
> > through the page-cache without forcing a clflush, or ensuring that we
> > leave cache_dirty set. Also AFAIK the whole point of shmem_pwrite was
> > to avoid needing to populate the entire object like when calling
> > pin_pages(). Would it make sense to just fall back to using
> > i915_gem_shmem_pwrite, which should already take care of the required
> > flushing?
> >
> > For reference a common usage pattern is something like:
> >
> > bb = gem_create() <-- assume non-llc so must be CACHE_NONE
> > gem_write(bb, BATCH_BUFFER_END) <-- might use cached pwrite internally
> > execbuf(bb) <-- doesn't see BATCH_BUFFER_END if we don't clflush
>
> It turns out this was the underlying issue causing machines with GEN <= 5 to
> break in pretty much every single test. It seems that for older hardware, IGT
> tests pick pwrite as the preferred method for filling BOs from user mode, so
> my code wasn't calling clflush after getting the pages and writing the
> user-mode pointer data into the shmem file.
>
> The way I fixed it was to create a shmem file for this and the other cases
> where the existing code requires one, but without getting the pages. In a
> way, I just cut through the usual TTM populate path and instantiate a shmem
> object so that I can avoid caching issues.
>
> Thanks a lot for catching this one!
>
> > >
> > > Adrian Larumbe (1):
> > >   drm/i915: Replace shmem memory region and object backend with TTM
> > >
> > >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  12 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_mman.c     |  32 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.h   |   4 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_phys.c     |  32 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c    | 390 +------------------
> > >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 267 ++++++++++++-
> > >  drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   3 +
> > >  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |   9 +-
> > >  drivers/gpu/drm/i915/gt/shmem_utils.c        |  64 ++-
> > >  drivers/gpu/drm/i915/intel_memory_region.c   |   7 +-
> > >  10 files changed, 398 insertions(+), 422 deletions(-)
> > >
> > > --
> > > 2.35.1
>